WebKit Bugzilla
New
Browse
Log In
×
Sign in with GitHub
or
Remember my login
Create Account
·
Forgot Password
Forgotten password account recovery
RESOLVED FIXED
238558
REGRESSION(
r288847
): GPU Process crash under GPUProcess::canExitUnderMemoryPressure()
https://bugs.webkit.org/show_bug.cgi?id=238558
Summary
REGRESSION(r288847): GPU Process crash under GPUProcess::canExitUnderMemoryPr...
Simon Fraser (smfr)
Reported
2022-03-30 10:18:13 PDT
Created
attachment 456139
[details]
Crash log iOS layout tests show a crash with this backtrace: Thread 0 Crashed:: Dispatch queue: com.apple.main-thread 0 com.apple.JavaScriptCore 0x136c83ace WTFCrash 1 com.apple.WebKit 0x11903c278 WTFCrashWithInfo(int, char const*, char const*, int) 2 com.apple.WebKit 0x119c90970 WebKit::QualifiedResourceHeap::checkInvariants() const 3 com.apple.WebKit 0x119c996f6 WebKit::QualifiedResourceHeap::hasImageBuffer() const 4 com.apple.WebKit 0x119c83f76 WebKit::RemoteResourceCache::hasActiveDrawables() const 5 com.apple.WebKit 0x119c71346 WebKit::RemoteRenderingBackend::allowsExitUnderMemoryPressure() const 6 com.apple.WebKit 0x119c70dc0 WebKit::GPUConnectionToWebProcess::allowsExitUnderMemoryPressure() const 7 com.apple.WebKit 0x119c76640 WebKit::GPUProcess::canExitUnderMemoryPressure() const 8 com.apple.WebKit 0x119c756fc WebKit::GPUProcess::tryExitIfUnused() 9 com.apple.WebKit 0x119cb10fc decltype(*(std::__1::forward<WebKit::GPUProcess*&>(fp0)).*fp()) std::__1::__invoke<void (WebKit::GPUProcess::*&)(), WebKit::GPUProcess*&, void>(void (WebKit::GPUProcess::*&)(), WebKit::GPUProcess*&) 10 com.apple.WebKit 0x119cb107a std::__1::__bind_return<void (WebKit::GPUProcess::*)(), std::__1::tuple<WebKit::GPUProcess*>, std::__1::tuple<>, __is_valid_bind_return<void (WebKit::GPUProcess::*)(), std::__1::tuple<WebKit::GPUProcess*>, std::__1::tuple<> >::value>::type std::__1::__apply_functor<void (WebKit::GPUProcess::*)(), std::__1::tuple<WebKit::GPUProcess*>, 0ul, std::__1::tuple<> >(void (WebKit::GPUProcess::*&)(), std::__1::tuple<WebKit::GPUProcess*>&, std::__1::__tuple_indices<0ul>, std::__1::tuple<>&&) 11 com.apple.WebKit 0x119cb102e std::__1::__bind_return<void (WebKit::GPUProcess::*)(), std::__1::tuple<WebKit::GPUProcess*>, std::__1::tuple<>, __is_valid_bind_return<void (WebKit::GPUProcess::*)(), std::__1::tuple<WebKit::GPUProcess*>, std::__1::tuple<> >::value>::type std::__1::__bind<void (WebKit::GPUProcess::*&)(), WebKit::GPUProcess*>::operator()<>() 12 com.apple.WebKit 0x119cb0fc6 WTF::Detail::CallableWrapper<std::__1::__bind<void (WebKit::GPUProcess::*&)(), WebKit::GPUProcess*>, void>::call() 13 com.apple.WebKit 0x11909b0c0 WTF::Function<void ()>::operator()() const 14 com.apple.WebKit 0x1195f2876 WebCore::Timer::fired() 15 com.apple.WebCore 0x1637dd2f4 WebCore::ThreadTimers::sharedTimerFiredInternal() 16 com.apple.WebCore 0x1637e772e WebCore::ThreadTimers::setSharedTimer(WebCore::SharedTimer*)::$_0::operator()() const 17 com.apple.WebCore 0x1637e76c6 WTF::Detail::CallableWrapper<WebCore::ThreadTimers::setSharedTimer(WebCore::SharedTimer*)::$_0, void>::call() 18 com.apple.WebCore 0x15f0206f0 WTF::Function<void ()>::operator()() const 19 com.apple.WebCore 0x16377521e WebCore::MainThreadSharedTimer::fired()
https://build.webkit.org/results/Apple-iOS-15-Simulator-Debug-WK2-Tests/r292101%20(2082)/results.html
Attachments
Crash log
(19.91 KB, text/plain)
2022-03-30 10:18 PDT
,
Simon Fraser (smfr)
no flags
Details
Patch
(5.14 KB, patch)
2022-04-08 18:02 PDT
,
Said Abou-Hallawa
no flags
Details
Formatted Diff
Diff
Patch
(7.35 KB, patch)
2022-04-11 17:48 PDT
,
Said Abou-Hallawa
no flags
Details
Formatted Diff
Diff
Alternative
(6.92 KB, patch)
2022-04-13 05:17 PDT
,
Kimmo Kinnunen
no flags
Details
Formatted Diff
Diff
Another alternative
(6.83 KB, patch)
2022-04-13 21:47 PDT
,
Said Abou-Hallawa
no flags
Details
Formatted Diff
Diff
Patch
(10.05 KB, patch)
2022-04-14 19:47 PDT
,
Said Abou-Hallawa
no flags
Details
Formatted Diff
Diff
Patch
(13.79 KB, patch)
2022-04-15 15:13 PDT
,
Said Abou-Hallawa
no flags
Details
Formatted Diff
Diff
Show Obsolete
(5)
View All
Add attachment
proposed patch, testcase, etc.
Radar WebKit Bug Importer
Comment 1
2022-03-30 10:18:36 PDT
<
rdar://problem/91052033
>
Simon Fraser (smfr)
Comment 2
2022-03-30 10:24:45 PDT
Looks like a case where we access the QualifiedResourceHeap on the main thread; it should only be accessed from the thread that RemoteRenderingBackend lives on.
Said Abou-Hallawa
Comment 3
2022-04-08 18:02:20 PDT
Created
attachment 457131
[details]
Patch
Chris Dumez
Comment 4
2022-04-09 23:39:37 PDT
Comment on
attachment 457131
[details]
Patch View in context:
https://bugs.webkit.org/attachment.cgi?id=457131&action=review
> Source/WebKit/GPUProcess/graphics/RemoteRenderingBackend.cpp:533 > + m_workQueue->dispatchSync([&] {
We already tried that and it caused hangs:
https://trac.webkit.org/changeset/282693/webkit
Has the code changed in ways that it is now OK?
Chris Dumez
Comment 5
2022-04-09 23:42:47 PDT
(In reply to Chris Dumez from
comment #4
)
> Comment on
attachment 457131
[details]
> Patch > > View in context: >
https://bugs.webkit.org/attachment.cgi?id=457131&action=review
> > > Source/WebKit/GPUProcess/graphics/RemoteRenderingBackend.cpp:533 > > + m_workQueue->dispatchSync([&] { > > We already tried that and it caused hangs: >
https://trac.webkit.org/changeset/282693/webkit
> > Has the code changed in ways that it is now OK?
I don't remember all the details but I think the issue had to do with the fact that the rendering backend thread was stuck on an IPC Semaphore (because that's how we were passing display lists to the GPU process, at least at the time, not sure if this changed). As a result, the background thread would not process the dispatch_sync().
Kimmo Kinnunen
Comment 6
2022-04-11 06:10:36 PDT
Comment on
attachment 457131
[details]
Patch View in context:
https://bugs.webkit.org/attachment.cgi?id=457131&action=review
> Source/WebKit/GPUProcess/graphics/RemoteRenderingBackend.cpp:534 > + hasActiveDrawables = m_remoteResourceCache.hasActiveDrawables();
Yeah, this is not really sound in many ways. I think a better approach would be that: 1) RemoteResourceCache acquires ScopedRenderingResourcesRequest every time it starts having active drawables. 2) RemoteRenderingBackend::allowsExitUnderMemoryPressure() is removed altogether 3) GPUConnectionToWebProcess::allowsExitUnderMemoryPressure() checks for new function ScopedRenderingResourcesRequest::hasActiveUsers() or similar Additionally ScopedRenderingResourcesRequest could be renamed to something else, like ScopedGPUProcessActivity or something. ScopedRenderingResourcesRequest could also be modified to not purge anything but just maintain s_requests
Kimmo Kinnunen
Comment 7
2022-04-11 11:05:06 PDT
Comment on
attachment 457131
[details]
Patch View in context:
https://bugs.webkit.org/attachment.cgi?id=457131&action=review
> Source/WebKit/GPUProcess/graphics/RemoteRenderingBackend.cpp:535 > + });
A better thread safe implementation of this would be to have the dispatched function capture a normal binarysemaphore, see the implementation at WorkQueue::dispatchSync. (for your interest, it probably isn't good to do this sync call)
Said Abou-Hallawa
Comment 8
2022-04-11 13:50:55 PDT
Comment on
attachment 457131
[details]
Patch View in context:
https://bugs.webkit.org/attachment.cgi?id=457131&action=review
>> Source/WebKit/GPUProcess/graphics/RemoteRenderingBackend.cpp:535 >> + }); > > A better thread safe implementation of this would be to have the dispatched function capture a normal binarysemaphore, see the implementation at WorkQueue::dispatchSync. > (for your interest, it probably isn't good to do this sync call)
It seems this sync call might lead to deadlock. For instance RemoteDisplayListRecorder::paintFrameForMedia(), which runs on the "RemoteRenderingBackend work queue", makes a media call using callOnMainRunLoopAndWait(). So if RemoteRenderingBackend::allowsExitUnderMemoryPressure() blocks the execution on the main run loop till the "RemoteRenderingBackend work queue" executes the call to RemoteResourceCache::hasActiveDrawables() and there is a PaintFrameForMedia message which was dispatched to the StreamConnectionWorkQueue but has not been processed yet, a deadlock will happen. The main run loop and the "RemoteRenderingBackend work queue" will both be blocked; each is waiting for the other to finish.
Said Abou-Hallawa
Comment 9
2022-04-11 17:48:14 PDT
Created
attachment 457303
[details]
Patch
Said Abou-Hallawa
Comment 10
2022-04-11 19:05:27 PDT
Although this patch may fix the assertion, I am not convinced RemoteRenderingBackend::allowsExitUnderMemoryPressure() has the right implementation. This function returns a value which is not guaranteed to stay the same even while it is used by the caller GPUConnectionToWebProcess::allowsExitUnderMemoryPressure(). The "RemoteRenderingBackend work queue" can change the state of RemoteRenderingBackend::m_remoteResourceCache while GPUConnectionToWebProcess::allowsExitUnderMemoryPressure() makes a decision based on out-dated state.
Chris Dumez
Comment 11
2022-04-11 19:12:00 PDT
(In reply to Said Abou-Hallawa from
comment #10
)
> Although this patch may fix the assertion, I am not convinced > RemoteRenderingBackend::allowsExitUnderMemoryPressure() has the right > implementation. This function returns a value which is not guaranteed to > stay the same even while it is used by the caller > GPUConnectionToWebProcess::allowsExitUnderMemoryPressure(). > > The "RemoteRenderingBackend work queue" can change the state of > RemoteRenderingBackend::m_remoteResourceCache while > GPUConnectionToWebProcess::allowsExitUnderMemoryPressure() makes a decision > based on out-dated state.
allowsExitUnderMemoryPressure() is known to be racy, that is fine. It is always racy anyway due to IPC. Worse case scenario, we'll exit the GPUProcess and the WebProcess will get notified it exited and it is needs it, it will relaunch it. Note that eager exit on memory pressure is no longer useful once we enable DOM rendering in the GPUProcess I think (since the GPUProcess will always be needed for DOM rendering). As a result, we should probably consider disabling this idle exit logic altogether when DOM rendering in GPUProcess is enabled.
Chris Dumez
Comment 12
2022-04-12 11:23:43 PDT
API test failure looks related.
Kimmo Kinnunen
Comment 13
2022-04-13 05:17:29 PDT
Created
attachment 457526
[details]
Alternative
Said Abou-Hallawa
Comment 14
2022-04-13 19:29:11 PDT
Comment on
attachment 457526
[details]
Alternative View in context:
https://bugs.webkit.org/attachment.cgi?id=457526&action=review
> Source/WebKit/ChangeLog:18 > + the count is now counted by RemoteMediaPlayerProxy creation, destruction.
I think this solution is a good alternative except it does not consider the native image an ActiveDrawable resource as the current code does. RemoteRenderingBackend calls updateRenderingResourceRequest() from RemoteRenderingBackend::createImageBufferWithQualifiedIdentifier() and RemoteRenderingBackend::releaseRemoteResourceWithQualifiedIdentifier(). It does not call it from RemoteRenderingBackend::cacheNativeImageWithQualifiedIdentifier(). Do we need to call it from cacheNativeImageWithQualifiedIdentifier()? Do we even need to consider the NativeImage an ActiveDrawable resource? Do we need to consider the Font also as an ActiveDrawable resource? What makes a resource be ActiveDrawable? RemoteRenderingBackend::allowsExitUnderMemoryPressure() was added in
r276148
. This is the original version of it: bool RemoteRenderingBackend::allowsExitUnderMemoryPressure() const { return m_remoteResourceCache.imageBuffers().isEmpty() && m_remoteResourceCache.nativeImages().isEmpty(); }
Said Abou-Hallawa
Comment 15
2022-04-13 21:31:18 PDT
In fact this bug is a regression of
r288847
. This revision removes
r282754
which was fixing exactly the same bug.
Said Abou-Hallawa
Comment 16
2022-04-13 21:47:33 PDT
Created
attachment 457594
[details]
Another alternative
Kimmo Kinnunen
Comment 17
2022-04-14 03:08:49 PDT
(In reply to Said Abou-Hallawa from
comment #14
)
> I think this solution is a good alternative except it does not consider the > native image an ActiveDrawable resource as the current code does. > RemoteRenderingBackend calls updateRenderingResourceRequest() from > RemoteRenderingBackend::createImageBufferWithQualifiedIdentifier() and > RemoteRenderingBackend::releaseRemoteResourceWithQualifiedIdentifier(). It > does not call it from > RemoteRenderingBackend::cacheNativeImageWithQualifiedIdentifier().
Right, but that's just a bug about updateRenderingResourceRequest().
Said Abou-Hallawa
Comment 18
2022-04-14 19:47:45 PDT
Created
attachment 457664
[details]
Patch
Said Abou-Hallawa
Comment 19
2022-04-14 20:48:09 PDT
I talked to Simon and we agreed on counting the ImageBuffers only when allowing or disallowing the GPUP to exit under the memory pressure. Counting the NativeImages should not affect this decision because they are remotely cached only when they are drawn on the RemoteImageBuffers. The life cycle of the NativeImage in GPUP is different from the life cycle of the ImageBuffer. The NativeImages in GPUP are released only when the counterpart NativeImages in WebProcess are released. So allowing the GPUP process to exit when no ImageBuffer is cached but there are NativeImages cached should be fine.
Simon Fraser (smfr)
Comment 20
2022-04-15 14:10:37 PDT
Comment on
attachment 457664
[details]
Patch View in context:
https://bugs.webkit.org/attachment.cgi?id=457664&action=review
> Source/WebKit/ChangeLog:19 > + Remove RemoteMediaPlayerManagerProxy::allowsExitUnderMemoryPressure(), > + the count is now counted by RemoteMediaPlayerProxy creation, destruction.
I don't understand this part. Where does RemoteMediaPlayerManagerProxy use ScopedRenderingResourcesRequest?
Said Abou-Hallawa
Comment 21
2022-04-15 15:13:41 PDT
Created
attachment 457727
[details]
Patch
Said Abou-Hallawa
Comment 22
2022-04-15 15:18:39 PDT
(In reply to Said Abou-Hallawa from
comment #21
)
> Created
attachment 457727
[details]
> Patch
In this patch I added a nicer approach which Simon suggested. RemoteImageBuffer will have ScopedRenderingResourcesRequest. So we are going to count the active ImageBuffers and not the RemoteRenderingBackends which have active ImageBuffers as we used to do before.
EWS
Comment 23
2022-04-18 11:31:24 PDT
Committed
r292961
(
249726@main
): <
https://commits.webkit.org/249726@main
> All reviewed patches have been landed. Closing bug and clearing flags on
attachment 457727
[details]
.
Note
You need to
log in
before you can comment on or make changes to this bug.
Top of Page
Format For Printing
XML
Clone This Bug