RESOLVED FIXED 238558
REGRESSION(r288847): GPU Process crash under GPUProcess::canExitUnderMemoryPressure()
https://bugs.webkit.org/show_bug.cgi?id=238558
Summary REGRESSION(r288847): GPU Process crash under GPUProcess::canExitUnderMemoryPr...
Simon Fraser (smfr)
Reported 2022-03-30 10:18:13 PDT
Created attachment 456139 [details] Crash log iOS layout tests show a crash with this backtrace: Thread 0 Crashed:: Dispatch queue: com.apple.main-thread 0 com.apple.JavaScriptCore 0x136c83ace WTFCrash 1 com.apple.WebKit 0x11903c278 WTFCrashWithInfo(int, char const*, char const*, int) 2 com.apple.WebKit 0x119c90970 WebKit::QualifiedResourceHeap::checkInvariants() const 3 com.apple.WebKit 0x119c996f6 WebKit::QualifiedResourceHeap::hasImageBuffer() const 4 com.apple.WebKit 0x119c83f76 WebKit::RemoteResourceCache::hasActiveDrawables() const 5 com.apple.WebKit 0x119c71346 WebKit::RemoteRenderingBackend::allowsExitUnderMemoryPressure() const 6 com.apple.WebKit 0x119c70dc0 WebKit::GPUConnectionToWebProcess::allowsExitUnderMemoryPressure() const 7 com.apple.WebKit 0x119c76640 WebKit::GPUProcess::canExitUnderMemoryPressure() const 8 com.apple.WebKit 0x119c756fc WebKit::GPUProcess::tryExitIfUnused() 9 com.apple.WebKit 0x119cb10fc decltype(*(std::__1::forward<WebKit::GPUProcess*&>(fp0)).*fp()) std::__1::__invoke<void (WebKit::GPUProcess::*&)(), WebKit::GPUProcess*&, void>(void (WebKit::GPUProcess::*&)(), WebKit::GPUProcess*&) 10 com.apple.WebKit 0x119cb107a std::__1::__bind_return<void (WebKit::GPUProcess::*)(), std::__1::tuple<WebKit::GPUProcess*>, std::__1::tuple<>, __is_valid_bind_return<void (WebKit::GPUProcess::*)(), std::__1::tuple<WebKit::GPUProcess*>, std::__1::tuple<> >::value>::type std::__1::__apply_functor<void (WebKit::GPUProcess::*)(), std::__1::tuple<WebKit::GPUProcess*>, 0ul, std::__1::tuple<> >(void (WebKit::GPUProcess::*&)(), std::__1::tuple<WebKit::GPUProcess*>&, std::__1::__tuple_indices<0ul>, std::__1::tuple<>&&) 11 com.apple.WebKit 0x119cb102e std::__1::__bind_return<void (WebKit::GPUProcess::*)(), std::__1::tuple<WebKit::GPUProcess*>, std::__1::tuple<>, __is_valid_bind_return<void (WebKit::GPUProcess::*)(), std::__1::tuple<WebKit::GPUProcess*>, std::__1::tuple<> >::value>::type std::__1::__bind<void (WebKit::GPUProcess::*&)(), WebKit::GPUProcess*>::operator()<>() 12 com.apple.WebKit 0x119cb0fc6 WTF::Detail::CallableWrapper<std::__1::__bind<void (WebKit::GPUProcess::*&)(), WebKit::GPUProcess*>, void>::call() 13 com.apple.WebKit 0x11909b0c0 WTF::Function<void ()>::operator()() const 14 com.apple.WebKit 0x1195f2876 WebCore::Timer::fired() 15 com.apple.WebCore 0x1637dd2f4 WebCore::ThreadTimers::sharedTimerFiredInternal() 16 com.apple.WebCore 0x1637e772e WebCore::ThreadTimers::setSharedTimer(WebCore::SharedTimer*)::$_0::operator()() const 17 com.apple.WebCore 0x1637e76c6 WTF::Detail::CallableWrapper<WebCore::ThreadTimers::setSharedTimer(WebCore::SharedTimer*)::$_0, void>::call() 18 com.apple.WebCore 0x15f0206f0 WTF::Function<void ()>::operator()() const 19 com.apple.WebCore 0x16377521e WebCore::MainThreadSharedTimer::fired() https://build.webkit.org/results/Apple-iOS-15-Simulator-Debug-WK2-Tests/r292101%20(2082)/results.html
Attachments
Crash log (19.91 KB, text/plain)
2022-03-30 10:18 PDT, Simon Fraser (smfr)
no flags
Patch (5.14 KB, patch)
2022-04-08 18:02 PDT, Said Abou-Hallawa
no flags
Patch (7.35 KB, patch)
2022-04-11 17:48 PDT, Said Abou-Hallawa
no flags
Alternative (6.92 KB, patch)
2022-04-13 05:17 PDT, Kimmo Kinnunen
no flags
Another alternative (6.83 KB, patch)
2022-04-13 21:47 PDT, Said Abou-Hallawa
no flags
Patch (10.05 KB, patch)
2022-04-14 19:47 PDT, Said Abou-Hallawa
no flags
Patch (13.79 KB, patch)
2022-04-15 15:13 PDT, Said Abou-Hallawa
no flags
Radar WebKit Bug Importer
Comment 1 2022-03-30 10:18:36 PDT
Simon Fraser (smfr)
Comment 2 2022-03-30 10:24:45 PDT
Looks like a case where we access the QualifiedResourceHeap on the main thread; it should only be accessed from the thread that RemoteRenderingBackend lives on.
Said Abou-Hallawa
Comment 3 2022-04-08 18:02:20 PDT
Chris Dumez
Comment 4 2022-04-09 23:39:37 PDT
Comment on attachment 457131 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=457131&action=review > Source/WebKit/GPUProcess/graphics/RemoteRenderingBackend.cpp:533 > + m_workQueue->dispatchSync([&] { We already tried that and it caused hangs: https://trac.webkit.org/changeset/282693/webkit Has the code changed in ways that it is now OK?
Chris Dumez
Comment 5 2022-04-09 23:42:47 PDT
(In reply to Chris Dumez from comment #4) > Comment on attachment 457131 [details] > Patch > > View in context: > https://bugs.webkit.org/attachment.cgi?id=457131&action=review > > > Source/WebKit/GPUProcess/graphics/RemoteRenderingBackend.cpp:533 > > + m_workQueue->dispatchSync([&] { > > We already tried that and it caused hangs: > https://trac.webkit.org/changeset/282693/webkit > > Has the code changed in ways that it is now OK? I don't remember all the details but I think the issue had to do with the fact that the rendering backend thread was stuck on an IPC Semaphore (because that's how we were passing display lists to the GPU process, at least at the time, not sure if this changed). As a result, the background thread would not process the dispatch_sync().
Kimmo Kinnunen
Comment 6 2022-04-11 06:10:36 PDT
Comment on attachment 457131 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=457131&action=review > Source/WebKit/GPUProcess/graphics/RemoteRenderingBackend.cpp:534 > + hasActiveDrawables = m_remoteResourceCache.hasActiveDrawables(); Yeah, this is not really sound in many ways. I think a better approach would be that: 1) RemoteResourceCache acquires ScopedRenderingResourcesRequest every time it starts having active drawables. 2) RemoteRenderingBackend::allowsExitUnderMemoryPressure() is removed altogether 3) GPUConnectionToWebProcess::allowsExitUnderMemoryPressure() checks for new function ScopedRenderingResourcesRequest::hasActiveUsers() or similar Additionally ScopedRenderingResourcesRequest could be renamed to something else, like ScopedGPUProcessActivity or something. ScopedRenderingResourcesRequest could also be modified to not purge anything but just maintain s_requests
Kimmo Kinnunen
Comment 7 2022-04-11 11:05:06 PDT
Comment on attachment 457131 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=457131&action=review > Source/WebKit/GPUProcess/graphics/RemoteRenderingBackend.cpp:535 > + }); A better thread safe implementation of this would be to have the dispatched function capture a normal binarysemaphore, see the implementation at WorkQueue::dispatchSync. (for your interest, it probably isn't good to do this sync call)
Said Abou-Hallawa
Comment 8 2022-04-11 13:50:55 PDT
Comment on attachment 457131 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=457131&action=review >> Source/WebKit/GPUProcess/graphics/RemoteRenderingBackend.cpp:535 >> + }); > > A better thread safe implementation of this would be to have the dispatched function capture a normal binarysemaphore, see the implementation at WorkQueue::dispatchSync. > (for your interest, it probably isn't good to do this sync call) It seems this sync call might lead to deadlock. For instance RemoteDisplayListRecorder::paintFrameForMedia(), which runs on the "RemoteRenderingBackend work queue", makes a media call using callOnMainRunLoopAndWait(). So if RemoteRenderingBackend::allowsExitUnderMemoryPressure() blocks the execution on the main run loop till the "RemoteRenderingBackend work queue" executes the call to RemoteResourceCache::hasActiveDrawables() and there is a PaintFrameForMedia message which was dispatched to the StreamConnectionWorkQueue but has not been processed yet, a deadlock will happen. The main run loop and the "RemoteRenderingBackend work queue" will both be blocked; each is waiting for the other to finish.
Said Abou-Hallawa
Comment 9 2022-04-11 17:48:14 PDT
Said Abou-Hallawa
Comment 10 2022-04-11 19:05:27 PDT
Although this patch may fix the assertion, I am not convinced RemoteRenderingBackend::allowsExitUnderMemoryPressure() has the right implementation. This function returns a value which is not guaranteed to stay the same even while it is used by the caller GPUConnectionToWebProcess::allowsExitUnderMemoryPressure(). The "RemoteRenderingBackend work queue" can change the state of RemoteRenderingBackend::m_remoteResourceCache while GPUConnectionToWebProcess::allowsExitUnderMemoryPressure() makes a decision based on out-dated state.
Chris Dumez
Comment 11 2022-04-11 19:12:00 PDT
(In reply to Said Abou-Hallawa from comment #10) > Although this patch may fix the assertion, I am not convinced > RemoteRenderingBackend::allowsExitUnderMemoryPressure() has the right > implementation. This function returns a value which is not guaranteed to > stay the same even while it is used by the caller > GPUConnectionToWebProcess::allowsExitUnderMemoryPressure(). > > The "RemoteRenderingBackend work queue" can change the state of > RemoteRenderingBackend::m_remoteResourceCache while > GPUConnectionToWebProcess::allowsExitUnderMemoryPressure() makes a decision > based on out-dated state. allowsExitUnderMemoryPressure() is known to be racy, that is fine. It is always racy anyway due to IPC. Worse case scenario, we'll exit the GPUProcess and the WebProcess will get notified it exited and it is needs it, it will relaunch it. Note that eager exit on memory pressure is no longer useful once we enable DOM rendering in the GPUProcess I think (since the GPUProcess will always be needed for DOM rendering). As a result, we should probably consider disabling this idle exit logic altogether when DOM rendering in GPUProcess is enabled.
Chris Dumez
Comment 12 2022-04-12 11:23:43 PDT
API test failure looks related.
Kimmo Kinnunen
Comment 13 2022-04-13 05:17:29 PDT
Created attachment 457526 [details] Alternative
Said Abou-Hallawa
Comment 14 2022-04-13 19:29:11 PDT
Comment on attachment 457526 [details] Alternative View in context: https://bugs.webkit.org/attachment.cgi?id=457526&action=review > Source/WebKit/ChangeLog:18 > + the count is now counted by RemoteMediaPlayerProxy creation, destruction. I think this solution is a good alternative except it does not consider the native image an ActiveDrawable resource as the current code does. RemoteRenderingBackend calls updateRenderingResourceRequest() from RemoteRenderingBackend::createImageBufferWithQualifiedIdentifier() and RemoteRenderingBackend::releaseRemoteResourceWithQualifiedIdentifier(). It does not call it from RemoteRenderingBackend::cacheNativeImageWithQualifiedIdentifier(). Do we need to call it from cacheNativeImageWithQualifiedIdentifier()? Do we even need to consider the NativeImage an ActiveDrawable resource? Do we need to consider the Font also as an ActiveDrawable resource? What makes a resource be ActiveDrawable? RemoteRenderingBackend::allowsExitUnderMemoryPressure() was added in r276148. This is the original version of it: bool RemoteRenderingBackend::allowsExitUnderMemoryPressure() const { return m_remoteResourceCache.imageBuffers().isEmpty() && m_remoteResourceCache.nativeImages().isEmpty(); }
Said Abou-Hallawa
Comment 15 2022-04-13 21:31:18 PDT
In fact this bug is a regression of r288847. This revision removes r282754 which was fixing exactly the same bug.
Said Abou-Hallawa
Comment 16 2022-04-13 21:47:33 PDT
Created attachment 457594 [details] Another alternative
Kimmo Kinnunen
Comment 17 2022-04-14 03:08:49 PDT
(In reply to Said Abou-Hallawa from comment #14) > I think this solution is a good alternative except it does not consider the > native image an ActiveDrawable resource as the current code does. > RemoteRenderingBackend calls updateRenderingResourceRequest() from > RemoteRenderingBackend::createImageBufferWithQualifiedIdentifier() and > RemoteRenderingBackend::releaseRemoteResourceWithQualifiedIdentifier(). It > does not call it from > RemoteRenderingBackend::cacheNativeImageWithQualifiedIdentifier(). Right, but that's just a bug about updateRenderingResourceRequest().
Said Abou-Hallawa
Comment 18 2022-04-14 19:47:45 PDT
Said Abou-Hallawa
Comment 19 2022-04-14 20:48:09 PDT
I talked to Simon and we agreed on counting the ImageBuffers only when allowing or disallowing the GPUP to exit under the memory pressure. Counting the NativeImages should not affect this decision because they are remotely cached only when they are drawn on the RemoteImageBuffers. The life cycle of the NativeImage in GPUP is different from the life cycle of the ImageBuffer. The NativeImages in GPUP are released only when the counterpart NativeImages in WebProcess are released. So allowing the GPUP process to exit when no ImageBuffer is cached but there are NativeImages cached should be fine.
Simon Fraser (smfr)
Comment 20 2022-04-15 14:10:37 PDT
Comment on attachment 457664 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=457664&action=review > Source/WebKit/ChangeLog:19 > + Remove RemoteMediaPlayerManagerProxy::allowsExitUnderMemoryPressure(), > + the count is now counted by RemoteMediaPlayerProxy creation, destruction. I don't understand this part. Where does RemoteMediaPlayerManagerProxy use ScopedRenderingResourcesRequest?
Said Abou-Hallawa
Comment 21 2022-04-15 15:13:41 PDT
Said Abou-Hallawa
Comment 22 2022-04-15 15:18:39 PDT
(In reply to Said Abou-Hallawa from comment #21) > Created attachment 457727 [details] > Patch In this patch I added a nicer approach which Simon suggested. RemoteImageBuffer will have ScopedRenderingResourcesRequest. So we are going to count the active ImageBuffers and not the RemoteRenderingBackends which have active ImageBuffers as we used to do before.
EWS
Comment 23 2022-04-18 11:31:24 PDT
Committed r292961 (249726@main): <https://commits.webkit.org/249726@main> All reviewed patches have been landed. Closing bug and clearing flags on attachment 457727 [details].
Note You need to log in before you can comment on or make changes to this bug.