WebKit Bugzilla
New
Browse
Log In
×
Sign in with GitHub
or
Remember my login
Create Account
·
Forgot Password
Forgotten password account recovery
RESOLVED FIXED
197558
REGRESSION(
r243197
): [GStreamer] Web process hangs when scrolling twitter timeline which contains HLS videos
https://bugs.webkit.org/show_bug.cgi?id=197558
Summary
REGRESSION(r243197): [GStreamer] Web process hangs when scrolling twitter tim...
Jérémy Lal
Reported
2019-05-03 06:58:03 PDT
On debian/testing, webkit2gtk 2.24, scrolling down a timeline containing video(s) result in web process hanging. I suspect it happens when the video is destroyed after scrolling past it, but i might be wrong. I tried removing some extra gstreamer 1.14 plugins but i got the same result.
Attachments
Patch
(16.96 KB, patch)
2019-05-09 04:31 PDT
,
Philippe Normand
calvaris
: commit-queue-
Details
Formatted Diff
Diff
Archive of layout-test-results from ews215 for win-future
(13.59 MB, application/zip)
2019-05-09 08:33 PDT
,
EWS Watchlist
no flags
Details
WiP patch
(29.14 KB, patch)
2019-05-31 03:52 PDT
,
Philippe Normand
no flags
Details
Formatted Diff
Diff
Patch
(11.54 KB, patch)
2019-07-04 08:35 PDT
,
Charlie Turner
no flags
Details
Formatted Diff
Diff
Patch for landing
(15.29 KB, patch)
2019-07-08 03:34 PDT
,
Charlie Turner
no flags
Details
Formatted Diff
Diff
Patch for landing
(11.56 KB, patch)
2019-07-08 03:36 PDT
,
Charlie Turner
no flags
Details
Formatted Diff
Diff
Show Obsolete
(4)
View All
Add attachment
proposed patch, testcase, etc.
Jérémy Lal
Comment 1
2019-05-03 06:59:08 PDT
Also it did not happen with webkit2gtk 2.22.7.
Philippe Normand
Comment 2
2019-05-07 03:37:04 PDT
Please provide a URL test-case. By 2.24, do you mean 2.24.0 or 2.24.1 ?
Jérémy Lal
Comment 3
2019-05-07 03:40:11 PDT
2.24.1, the one currently in debian/testing.
https://twitter.com/search?q=video&src=typd
and scroll down until it hangs.
Philippe Normand
Comment 4
2019-05-07 04:12:54 PDT
Ok, this is a deadlock between adaptivedemux and the webkithttpsrc element used for HLS manifest or fragments downloading. I'm investigating this issue.
Philippe Normand
Comment 5
2019-05-09 04:31:09 PDT
Created
attachment 369484
[details]
Patch
Xabier Rodríguez Calvar
Comment 6
2019-05-09 06:31:18 PDT
Comment on
attachment 369484
[details]
Patch View in context:
https://bugs.webkit.org/attachment.cgi?id=369484&action=review
> Source/WebCore/platform/graphics/gstreamer/MainThreadNotifier.h:73 > + GST_CAT_LEVEL_LOG(webkit_media_player_debug, GST_LEVEL_WARNING, m_object.get(), "Notifier was invalidated, bailing out");
Do you think this notification should never happen? I don't think it's so weird that a task has been enqueued, notifier becomes invalid and the task is run. Wouldn't it be interesting to make this at least INFO?
> Source/WebCore/platform/graphics/gstreamer/MainThreadNotifier.h:97 > + m_synchronousNotificationCondition.wait(m_synchronousNotificationMutex);
Considering we are waking up everybody below, shouldn't we check for the notification flag to be still active? Maybe a conditional wait?
> Source/WebCore/platform/graphics/gstreamer/MainThreadNotifier.h:158 > + GRefPtr<GstObject> m_object;
I think this should be called m_gstObject.
> Source/WebCore/platform/graphics/gstreamer/WebKitWebSourceGStreamer.cpp:614 > + GST_DEBUG_OBJECT(src, "Closing session, loader: %p, resource: %p", priv->loader.get(), priv->resource.get());
You could put this as else of the next if
> Source/WebCore/platform/graphics/gstreamer/WebKitWebSourceGStreamer.cpp:649 > + return TRUE;
Shouldn't we unref the event before this here?
Philippe Normand
Comment 7
2019-05-09 06:38:58 PDT
Comment on
attachment 369484
[details]
Patch View in context:
https://bugs.webkit.org/attachment.cgi?id=369484&action=review
>> Source/WebCore/platform/graphics/gstreamer/MainThreadNotifier.h:73 >> + GST_CAT_LEVEL_LOG(webkit_media_player_debug, GST_LEVEL_WARNING, m_object.get(), "Notifier was invalidated, bailing out"); > > Do you think this notification should never happen? I don't think it's so weird that a task has been enqueued, notifier becomes invalid and the task is run. Wouldn't it be interesting to make this at least INFO?
Yes, makes sense :)
>> Source/WebCore/platform/graphics/gstreamer/MainThreadNotifier.h:97 >> + m_synchronousNotificationCondition.wait(m_synchronousNotificationMutex); > > Considering we are waking up everybody below, shouldn't we check for the notification flag to be still active? Maybe a conditional wait?
Hum, I'm not sure ... I'll try.
>> Source/WebCore/platform/graphics/gstreamer/WebKitWebSourceGStreamer.cpp:649 >> + return TRUE; > > Shouldn't we unref the event before this here?
No, gst_base_src_event() does it for us.
Philippe Normand
Comment 8
2019-05-09 07:20:49 PDT
Comment on
attachment 369484
[details]
Patch View in context:
https://bugs.webkit.org/attachment.cgi?id=369484&action=review
>>> Source/WebCore/platform/graphics/gstreamer/MainThreadNotifier.h:97 >>> + m_synchronousNotificationCondition.wait(m_synchronousNotificationMutex); >> >> Considering we are waking up everybody below, shouldn't we check for the notification flag to be still active? Maybe a conditional wait? > > Hum, I'm not sure ... I'll try.
Actually I'm not sure I understood what you meant. By notification flag, do you mean m_pendingNotifications? If so, I think the existing !isMainThread() test before the wait is more than enough already :)
EWS Watchlist
Comment 9
2019-05-09 08:33:09 PDT
Comment on
attachment 369484
[details]
Patch
Attachment 369484
[details]
did not pass win-ews (win): Output:
https://webkit-queues.webkit.org/results/12143904
New failing tests: security/contentSecurityPolicy/video-with-file-url-allowed-by-media-src-star-with-AllowContentSecurityPolicySourceStarToMatchAnyProtocol-enabled.html storage/indexeddb/modern/gc-closes-database.html http/tests/css/filters-on-iframes.html
EWS Watchlist
Comment 10
2019-05-09 08:33:11 PDT
Created
attachment 369493
[details]
Archive of layout-test-results from ews215 for win-future The attached test failures were seen while running run-webkit-tests on the win-ews. Bot: ews215 Port: win-future Platform: CYGWIN_NT-10.0-17763-3.0.5-338.x86_64-x86_64-64bit
Philippe Normand
Comment 11
2019-05-09 08:44:26 PDT
False positive (again).
Xabier Rodríguez Calvar
Comment 12
2019-05-09 23:06:06 PDT
Comment on
attachment 369484
[details]
Patch View in context:
https://bugs.webkit.org/attachment.cgi?id=369484&action=review
> Source/WebCore/platform/graphics/gstreamer/MainThreadNotifier.h:91 > + m_synchronousNotificationCondition.notifyOne();
While answering your comment below I realized this could be an issue as well because of having now shared lock and condition. Imagine you have two pending notifications, you could be waking the wrong one here, couldn't you? I think a conditional wait below could fix this as well, couldn't it?
>>>> Source/WebCore/platform/graphics/gstreamer/MainThreadNotifier.h:97 >>>> + m_synchronousNotificationCondition.wait(m_synchronousNotificationMutex); >>> >>> Considering we are waking up everybody below, shouldn't we check for the notification flag to be still active? Maybe a conditional wait? >> >> Hum, I'm not sure ... I'll try. > > Actually I'm not sure I understood what you meant. By notification flag, do you mean m_pendingNotifications? If so, I think the existing !isMainThread() test before the wait is more than enough already :)
Yes, I mean to check the m_pendingNotifications flags. The use case I am thinking of is you cancel a certain notification type but then you get notified for all other types and wake them here and shouldn't they be still waiting?
Philippe Normand
Comment 13
2019-05-10 02:09:25 PDT
There's another issue to address, 2 webkitwebsrc operating from different threads, both requiring access to the main thread at the same time, leading to another deadlock...
Philippe Normand
Comment 14
2019-05-31 03:52:48 PDT
Created
attachment 371048
[details]
WiP patch
Charlie Turner
Comment 15
2019-06-05 13:02:14 PDT
I spent a lot of time debugging this, and it seems like we can't achieve deadlock freedom with the design of the adaptive demuxer, and the constraint that all our network operations happen on the main thread. I tried using the AbortableTaskQueue abstraction instead of the MainThreadNotifier, which seemed in the same vain as philn's changes to the MainThreadNotifier, but there's no clear time when a startAborting phase is going to guarantee adaptivedemux doesn't kick off new get_range requests at an unfortunate time for us. It's not just in the destruction phase of the player when this can happen, it can also happen in changePipelineState calls to READY as well. I tried making the WebSrc behave asynchronously, such that no state change will block (see gst_base_src_set_async). That solves the MediaPlayer taking the main thread during shutdown and blocking on the fragment downloader threads, but there are other issues with this approach that caused it to be a dead-end that are internal to GStreamer. I tried adding signal handlers to collect all created websrcs inside the player, such that at shutdown time, before we issue a change_state to NULL, we can forcibly close all network sessions on the main thread before shutting the rest of the pipeline down. This helps a lot, but it doesn't stop the tricky case of adaptivedemux's EOS handling. When the websrc completes the download of a manifest, adaptive demux uses the EOS on the manifest file to signal it should start new srcs to download the fragments. This can happen, due to bad timing, at shutdown and render the careful collection of websrcs in the player pointless. I tried a few variations on philn's idea of adding timeouts to some of the conditional waits. These kinds of solutions can get real tricky, and indeed I didn't manage to eliminate the deadlocks, they were just a little more flakey rather than easy to trigger. I tried sending EOS's and/or GST_MESSAGE_ERRORS to the uridownloader, to try and get it to unblock itself in the range requests at the correct times. These are the only ways to signal that element that it should interrupt it's blocking operations. Again, improvements, but not complete solutions. The fundamental problem here is that the EOS handler takes an API_LOCK in the manifest, which blocks any state changes happening until fragments are downloaded (state changes also need the API lock). If, during the network request window, the player is destroyed, which can happen whenever with a scrolling view of players, the main thread is taken over and can not complete it's state change, and the network request completion signal never makes it back to the main thread. Deadlock. So, I only see three options now, 1) Someone else takes a look and hopefully spots a functional approach to this issue 2) We revert using webkitwebsrc in the adaptivedemuxers (bye sandboxing) 3) We stop advertising support for application/x-hls. I'd like for 1) to happen :), 2) is sad times, and 3) is not that crazy but seemingly a bit impractical. Chrome & FF don't support HLS natively AFAICT, what happens on Twitter is a JS library does the HLS protocol and exposes it as MSE blobs. Unfortunately, for reasons I didn't debug, when you fake the user agent as FF, no videos play. I suspect either an issue in our MSE player, or something FF specific in the HLS library twitter is using. Unfortunately because we're a WebKit, most sites will want to using the native HLS support, because Safari supports it well I assume.
Michael Catanzaro
Comment 16
2019-06-06 07:35:27 PDT
(In reply to Charlie Turner from
comment #15
)
> 2) We revert using webkitwebsrc in the adaptivedemuxers (bye sandboxing)
That would also reintroduce CVE-2019-11070.
Michael Catanzaro
Comment 17
2019-06-14 06:36:06 PDT
***
Bug 198857
has been marked as a duplicate of this bug. ***
Charlie Turner
Comment 18
2019-06-18 03:22:59 PDT
***
Bug 198958
has been marked as a duplicate of this bug. ***
Philippe Normand
Comment 19
2019-06-27 04:14:38 PDT
***
Bug 199217
has been marked as a duplicate of this bug. ***
Philippe Normand
Comment 20
2019-06-29 01:33:52 PDT
***
Bug 199287
has been marked as a duplicate of this bug. ***
Charlie Turner
Comment 21
2019-07-04 08:35:55 PDT
Created
attachment 373468
[details]
Patch
Charlie Turner
Comment 22
2019-07-04 08:36:37 PDT
Above patch also a bit of a WIP, since the upstream GStreamer review has not begun yet.
Michael Catanzaro
Comment 23
2019-07-04 09:01:09 PDT
Charlie, you're a hero! But what is the plan for older versions of GStreamer? I think it's OK to regress functionality if GStreamer isn't new enough, but of course not OK to deadlock.
Charlie Turner
Comment 24
2019-07-04 09:08:46 PDT
(In reply to Michael Catanzaro from
comment #23
)
> Charlie, you're a hero! But what is the plan for older versions of GStreamer? > > I think it's OK to regress functionality if GStreamer isn't new enough, but > of course not OK to deadlock.
Thanks :blush: but lets see how it settles with users, hopefully I can get some testers here with the patch. I think we have to just disable HLS support for now until the GStreamer patch makes its way to users through the distro channels. Especially on Twitter this is going to be painful for those users who enjoy watching videos on that site.
Charlie Turner
Comment 25
2019-07-08 03:34:37 PDT
Created
attachment 373621
[details]
Patch for landing
Charlie Turner
Comment 26
2019-07-08 03:36:11 PDT
Created
attachment 373622
[details]
Patch for landing
WebKit Commit Bot
Comment 27
2019-07-08 11:16:14 PDT
Comment on
attachment 373622
[details]
Patch for landing Clearing flags on attachment: 373622 Committed
r247215
: <
https://trac.webkit.org/changeset/247215
>
WebKit Commit Bot
Comment 28
2019-07-08 11:16:16 PDT
All reviewed patches have been landed. Closing bug.
Rich Coe
Comment 29
2019-07-14 08:44:50 PDT
the patch is working for me.
Note
You need to
log in
before you can comment on or make changes to this bug.
Top of Page
Format For Printing
XML
Clone This Bug