WebKit Bugzilla
New
Browse
Log In
×
Sign in with GitHub
or
Remember my login
Create Account
·
Forgot Password
Forgotten password account recovery
RESOLVED FIXED
31615
REGRESSION(
r50919
) Introduced a non-thread-safe refcounting in ScriptExecutionContext::postTaskToMainThread
https://bugs.webkit.org/show_bug.cgi?id=31615
Summary
REGRESSION(r50919) Introduced a non-thread-safe refcounting in ScriptExecutio...
Dmitry Titov
Reported
2009-11-17 22:24:21 PST
Using RefPtr on both sides of thread boundary have caused significant part of recent flakiness of xmlhttprequest/workers tests. Need to remove RefPtr as it was implemented in Document before
r50919
. This explains at least some flakiness captured in
bug 30835
.
Attachments
Proposed patch
(3.18 KB, patch)
2009-11-17 22:57 PST
,
Dmitry Titov
dimich
: commit-queue-
Details
Formatted Diff
Diff
Same patch, Added comment for the ASSERT.
(3.48 KB, patch)
2009-11-17 23:20 PST
,
Dmitry Titov
dimich
: commit-queue-
Details
Formatted Diff
Diff
Revert r50919 that has caused the regression.
(4.89 KB, patch)
2009-11-18 12:13 PST
,
Dmitry Titov
eric
: review+
dimich
: commit-queue-
Details
Formatted Diff
Diff
Show Obsolete
(2)
View All
Add attachment
proposed patch, testcase, etc.
Dmitry Titov
Comment 1
2009-11-17 22:57:44 PST
Created
attachment 43406
[details]
Proposed patch
Eric Seidel (no email)
Comment 2
2009-11-17 23:08:32 PST
If we add the mainThread() assert, we'll need to at least add a comment there.
Dmitry Titov
Comment 3
2009-11-17 23:20:15 PST
Created
attachment 43408
[details]
Same patch, Added comment for the ASSERT. Added comment for the ASSERT.
Eric Seidel (no email)
Comment 4
2009-11-18 01:35:38 PST
Comment on
attachment 43408
[details]
Same patch, Added comment for the ASSERT. So this is OK if we never plan to use anything TreeShared from a non-main thread, which is probably the case since Nodes are the only TreeShared types iirc. This is OK with me. Others should see this go by and comment. However since this may be the cause of our recent bout of bot crashes, it would be nice to get this in quickly so we can work on finding the real cause behind
bug 30835
. Please make sure you've run the layout tests in debug mode before landing this. I think we may need to solve the FIXME before landing, or at least shortly after: + // FIXME: Find out what guarantees that context is still alive when task is executed. + // If it is not so, it is another source of random crashes. We need to understand why a raw pointer there is safe, and if it is, document it. Would it be easy to write a layout test which caused this crash w/o needing to be run 100 times?
Darin Adler
Comment 5
2009-11-18 09:40:38 PST
(In reply to
comment #4
)
> (From update of
attachment 43408
[details]
) > So this is OK if we never plan to use anything TreeShared from a non-main > thread, which is probably the case since Nodes are the only TreeShared types > iirc.
The assertion is great. I would also do the assertion in all the other TreeShared member functions: The constructor, destructor, setParent, parent too. It's never safe to read or write these on other threads -- obviously we can't add such assertions to every single access to any field of any DOM node, but if there was an efficient way to do so we would! The practical way to get this right for the many other reference counted objects is by adding machinery in RefCounted to make it fire an assert when the ref is done from the wrong thread too. For RefCounted, we'll need to add a function to disable this for the few objects that are used cross-thread such as strings that are copied for that reason. The vast majority of RefCounted can either assert they are used only on the main thread or only on the same thread they were created on. Adding such assertions will catch problems like this in a way that makes them much easier to notice and understand. And a function to turn them off for certain objects is likely to be a practical way to do this without getting in the way of work on things like workers.
Darin Adler
Comment 6
2009-11-18 09:51:23 PST
Comment on
attachment 43408
[details]
Same patch, Added comment for the ASSERT.
> + // This is not a RefPtr because it would require ScriptExecutionContext-derived classes to be > + // refcountable in a thread-safe manner which is overkill because they always are created and deleted on > + // the same thread. > + // FIXME: Find out what guarantees that context is still alive when task is executed. > + // If it is not so, it is another source of random crashes.
This comment seems way too big. Mentioning "source of random crashes" seems too vague to me, but your FIXME is right on. I have reviewed the clients and I can tell you right now there is no guarantee the context is still alive when the task is executed. The design here is broken. The code has this idea that we can post a task for later execution on the main thread and have the document passed to the task, given that we can't add a ref to the document to guarantee it still still be around. That simply can't be done. So I think we need to do something different here. This patch trades one bug for another, but doesn't fix it, as you suspect. I don't see any way ScriptExecutionContext::postTaskToMainThread can work. It makes a promise it can't keep, that it can defer work to the main thread for a document, without being given a ref to that document, which is normally needed for such things. Either we need to already have a ref to the document, or the document needs a thread-safe way to cancel already-posted tasks. I don't know what to say about the patch. It's definitely no worse with the patch than without it, but the bug is still there.
Alexey Proskuryakov
Comment 7
2009-11-18 10:14:40 PST
> I don't see any way ScriptExecutionContext::postTaskToMainThread can work. It > makes a promise it can't keep, that it can defer work to the main thread for a > document, without being given a ref to that document, which is normally needed > for such things. Either we need to already have a ref to the document, or the > document needs a thread-safe way to cancel already-posted tasks.
The original idea was that tasks could only be sent via this mechanism when there was activity in a worker. So, worker messaging proxy kept the document alive. It seems that the mechanism has been generalized since then - I'm not sure if the guarantee still holds in practice, but it's certainly possible to abuse postTaskToMainThread in a way that makes it void.
Dmitry Titov
Comment 8
2009-11-18 10:36:48 PST
Thanks guys! I think we should have a way to remove tasks from the 'post to main thread' queue when Document is destroyed or earlier, since even if we have thread-safe ref it's unclear if it makes sense to run a task on a document that is only alive because it still have refcount > 0. I think this worked before w/o many problems because in practice there are always other 'tasks' that has to be also delivered via main thread's run loop before Document will be destroyed so there is some serialization of work. It is not a guarantee though so I agree this could use more thought. I can see 2 issues here: 1. Recent surge in test flakiness caused by me adding a RefPtr - the attached patch reverts this back to state as existed before
http://trac.webkit.org/changeset/50919
. If others think this is an ok intermediate solution, we could land the patch, with suggested modifications. Alternatively I could simply revert
r50919
. 2. postTaskToMainThread design that has to be redone. I'll be glad to give it a try and iterate on something that could be better. It could take a few days though.
Darin Adler
Comment 9
2009-11-18 10:39:19 PST
For simplicity, I am calling it "document", rather than "script execution context". (In reply to
comment #7
)
> The original idea was that tasks could only be sent via this mechanism when > there was activity in a worker. So, worker messaging proxy kept the document > alive.
That would be fine as long as the task doesn't get passed a reference to the document. The issue is really that the task passes a reference to the document, but can't guarantee that reference is good. If the task has its own pointer to the document, there is no problem.
> It seems that the mechanism has been generalized since then - I'm not sure if > the guarantee still holds in practice, but it's certainly possible to abuse > postTaskToMainThread in a way that makes it void.
The only benefit this seems to provide is callOnMainThread plus a delete call. There's no additional service provided here.
Alexey Proskuryakov
Comment 10
2009-11-18 11:06:51 PST
> That would be fine as long as the task doesn't get passed a reference to the > document.
A document cannot go away while there are (dedicated?) workers present, or any communication pending with them.
Dmitry Titov
Comment 11
2009-11-18 12:13:49 PST
Created
attachment 43444
[details]
Revert
r50919
that has caused the regression. The easiest way to deal with recent regression seems to revert the recent change. Created
bug 31633
for 'dangling Document pointer' issue discussed here.
Eric Seidel (no email)
Comment 12
2009-11-18 12:16:39 PST
Comment on
attachment 43444
[details]
Revert
r50919
that has caused the regression. If this is just a revert rs=me. I also think we'll need separate patches to add the ASSERTs to TreeShared and possibly RefCounted. Shared workers might still be the underlying cause of
bug 30835
, which started sometime before
r50171
, possibly due to another shared worker change.
Oliver Hunt
Comment 13
2009-11-18 12:27:30 PST
(In reply to
comment #12
)
> (From update of
attachment 43444
[details]
) > If this is just a revert rs=me. > > I also think we'll need separate patches to add the ASSERTs to TreeShared and > possibly RefCounted. > > Shared workers might still be the underlying cause of
bug 30835
, which started > sometime before
r50171
, possibly due to another shared worker change.
What assertion can we do with RefCounted? RefCounted can (and is) legitimately used on non-main threads, maybe refcounted would be expected to maintain a record of what thread it expects to be used on?
Eric Seidel (no email)
Comment 14
2009-11-18 12:31:31 PST
(In reply to
comment #13
)
> What assertion can we do with RefCounted? RefCounted can (and is) legitimately > used on non-main threads, maybe refcounted would be expected to maintain a > record of what thread it expects to be used on?
Yes. See
comment 5
.
Dmitry Titov
Comment 15
2009-11-18 12:44:36 PST
Landed:
http://trac.webkit.org/changeset/51127
Created
bug 31637
and
bug 31639
to add asserts in TreeShared and RefCounted.
Note
You need to
log in
before you can comment on or make changes to this bug.
Top of Page
Format For Printing
XML
Clone This Bug