Bug 243410 - REGRESSION(250037@main): wpt /service-workers/service-worker/registration-updateviacache.https.html has become flaky
Summary: REGRESSION(250037@main): wpt /service-workers/service-worker/registration-upd...
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: Service Workers (show other bugs)
Version: WebKit Nightly Build
Hardware: Unspecified Unspecified
: P2 Normal
Assignee: Chris Dumez
URL:
Keywords: InRadar
Depends on:
Blocks:
 
Reported: 2022-08-01 10:27 PDT by Sam Sneddon [:gsnedders]
Modified: 2022-08-05 10:09 PDT (History)
4 users (show)

See Also:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Sam Sneddon [:gsnedders] 2022-08-01 10:27:27 PDT
See https://wpt.fyi/results/service-workers/service-worker/registration-updateviacache.https.html?label=master&product=safari-15%5Bstable%5D&product=safari%5Bexperimental%5D&aligned&view=subtest

A variety of subtests have regressed:

register-with-updateViaCache-all
register-with-updateViaCache-undefined-then-all
register-with-updateViaCache-imports-then-all
register-with-updateViaCache-none-then-all

https://github.com/WebKit/WebKit/blob/main/LayoutTests/imported/w3c/web-platform-tests/service-workers/service-worker/registration-updateviacache.https-expected.txt makes it look like our test configuration passes there.
Comment 1 Radar WebKit Bug Importer 2022-08-01 10:27:41 PDT
<rdar://problem/97921749>
Comment 2 Chris Dumez 2022-08-04 11:53:05 PDT
The test is marked as flaky on iOS:
LayoutTests/platform/ios/TestExpectations:webkit.org/b/240074 imported/w3c/web-platform-tests/service-workers/service-worker/registration-updateviacache.https.html [ Pass Failure ]

But seems to be running on macOS.
Comment 3 Chris Dumez 2022-08-04 12:02:50 PDT
Similarly, in STP 150, I see all subtests passing expect the last one. Not sure why the wpt.fyi dashboard is saying differently.
Comment 4 Chris Dumez 2022-08-04 12:14:05 PDT
The flaky failure on iOS (https://bugs.webkit.org/show_bug.cgi?id=240074) looks a lot like the failure here. So I think the test is just flaky on iOS and macOS.

I guess I can try to reproduce the flakiness with iOS instead since our bots seem to indicate it is more prevalent on that platform.
Comment 5 Chris Dumez 2022-08-04 12:26:26 PDT
Previous attempt to fix it by Youenn (Bug 240074) was unsuccessful based on the flakiness dashboard and wpt.fyi results.
Comment 6 Chris Dumez 2022-08-04 13:01:13 PDT
I do see one weirdness in our code here:
```
    // If newestWorker is not null, newestWorker's script url equals job's script url with the exclude fragments
    // flag set, and script's source text is a byte-for-byte match with newestWorker's script resource's source
    // text, then:
    if (newestWorker && equalIgnoringFragmentIdentifier(newestWorker->scriptURL(), job.scriptURL) && newestWorker->type() == job.workerType && result.script == newestWorker->script() && doCertificatesMatch(result.certificateInfo, newestWorker->certificateInfo())) {
```

But the spec says at https://w3c.github.io/ServiceWorker/#start-register (step 5.2):
```
If newestWorker is not null, job’s script url equals newestWorker’s script url, job’s worker type equals newestWorker’s type, and job’s update via cache mode's value equals registration’s update via cache mode, then:

Invoke Resolve Job Promise with job and registration.
Invoke Finish Job with job and abort these steps.
```

It seems we are missing the "update via cache mode" check. I'll try to reproduce the flakiness locally and see if this fixes it.
Comment 7 Chris Dumez 2022-08-04 13:30:11 PDT
(In reply to Chris Dumez from comment #6)
> I do see one weirdness in our code here:
> ```
>     // If newestWorker is not null, newestWorker's script url equals job's
> script url with the exclude fragments
>     // flag set, and script's source text is a byte-for-byte match with
> newestWorker's script resource's source
>     // text, then:
>     if (newestWorker &&
> equalIgnoringFragmentIdentifier(newestWorker->scriptURL(), job.scriptURL) &&
> newestWorker->type() == job.workerType && result.script ==
> newestWorker->script() && doCertificatesMatch(result.certificateInfo,
> newestWorker->certificateInfo())) {
> ```
> 
> But the spec says at https://w3c.github.io/ServiceWorker/#start-register
> (step 5.2):
> ```
> If newestWorker is not null, job’s script url equals newestWorker’s script
> url, job’s worker type equals newestWorker’s type, and job’s update via
> cache mode's value equals registration’s update via cache mode, then:
> 
> Invoke Resolve Job Promise with job and registration.
> Invoke Finish Job with job and abort these steps.
> ```
> 
> It seems we are missing the "update via cache mode" check. I'll try to
> reproduce the flakiness locally and see if this fixes it.

Actually, we do have the updateViaCache check in runRegisterJob() already:
```
        auto* newestWorker = registration->getNewestWorker();
        if (newestWorker && equalIgnoringFragmentIdentifier(job.scriptURL, newestWorker->scriptURL()) && job.workerType == newestWorker->type() && job.registrationOptions.updateViaCache == registration->updateViaCache()) {
            RELEASE_LOG(ServiceWorker, "%p - SWServerJobQueue::runRegisterJob: Found directly reusable registration %llu for job %s (DONE)", this, registration->identifier().toUInt64(), job.identifier().loggingString().utf8().data());
            m_server.resolveRegistrationJob(job, registration->data(), ShouldNotifyWhenResolved::No);
            finishCurrentJob();
            return;
        }
```
Comment 8 Chris Dumez 2022-08-04 16:16:32 PDT
I can't reproduce the flakiness on either macOS or iOS so I am kind of stuck.
I have tried both via WebKitTestRunner and in Safari and did many iterations.
Comment 9 Chris Dumez 2022-08-04 16:48:39 PDT
(In reply to Chris Dumez from comment #8)
> I can't reproduce the flakiness on either macOS or iOS so I am kind of stuck.
> I have tried both via WebKitTestRunner and in Safari and did many iterations.

Ok, I think I have a flaky reproduction now when running the test in Safari using a local http server instead of wpt.fyi.

Since Youenn's original patch was suspecting the memory cache, I tried disabling the memory cache but it didn't fix it.
Comment 10 Chris Dumez 2022-08-04 18:31:11 PDT
(In reply to Chris Dumez from comment #9)
> (In reply to Chris Dumez from comment #8)
> > I can't reproduce the flakiness on either macOS or iOS so I am kind of stuck.
> > I have tried both via WebKitTestRunner and in Safari and did many iterations.
> 
> Ok, I think I have a flaky reproduction now when running the test in Safari
> using a local http server instead of wpt.fyi.
> 
> Since Youenn's original patch was suspecting the memory cache, I tried
> disabling the memory cache but it didn't fix it.

Never mind, I must have tested it wrong earlier. Disabling the memory cache definitely fixes it so Youenn was likely right about the issue. However, it seems his fix was incomplete.
Comment 11 Chris Dumez 2022-08-04 19:20:07 PDT
Pull request: https://github.com/WebKit/WebKit/pull/3034
Comment 12 EWS 2022-08-04 21:44:22 PDT
Committed 253140@main (3f019cf4b5d2): <https://commits.webkit.org/253140@main>

Reviewed commits have been landed. Closing PR #3034 and removing active labels.
Comment 13 Karl Rackler 2022-08-05 09:42:51 PDT
(In reply to Chris Dumez from comment #10)
> (In reply to Chris Dumez from comment #9)
> > (In reply to Chris Dumez from comment #8)
> > > I can't reproduce the flakiness on either macOS or iOS so I am kind of stuck.
> > > I have tried both via WebKitTestRunner and in Safari and did many iterations.
> > 
> > Ok, I think I have a flaky reproduction now when running the test in Safari
> > using a local http server instead of wpt.fyi.
> > 
> > Since Youenn's original patch was suspecting the memory cache, I tried
> > disabling the memory cache but it didn't fix it.
> 
> Never mind, I must have tested it wrong earlier. Disabling the memory cache
> definitely fixes it so Youenn was likely right about the issue. However, it
> seems his fix was incomplete.

imported/w3c/web-platform-tests/service-workers/service-worker/registration-updateviacache.https.html is consistently passing after landing 251742@main with the exception of a one-off on 8/3/2022, 5:10:36 PM at 253089@main for iOS.  

Here is the history with 'Filter expected results' off: 
https://results.webkit.org/?suite=layout-tests&test=imported%2Fw3c%2Fweb-platform-tests%2Fservice-workers%2Fservice-worker%2Fregistration-updateviacache.https.html&platform=ios&platform=mac&limit=50000
Comment 15 Chris Dumez 2022-08-05 09:47:01 PDT
(In reply to Karl Rackler from comment #14)
> (In reply to Chris Dumez from comment #11)
> > Pull request: https://github.com/WebKit/WebKit/pull/3034
> 
> After landing 253140@main, Debug is consistently crashing.
> 
> History: 
> https://results.webkit.org/?suite=layout-tests&test=imported%2Fw3c%2Fweb-
> platform-tests%2Fservice-workers%2Fservice-worker%2Fregistration-
> updateviacache.https.html&platform=ios&platform=mac&limit=50000

ASSERTION FAILED: jobData.connectionIdentifier() == Process::identifier()

Looking now.
Comment 16 Chris Dumez 2022-08-05 09:49:56 PDT
(In reply to Chris Dumez from comment #15)
> (In reply to Karl Rackler from comment #14)
> > (In reply to Chris Dumez from comment #11)
> > > Pull request: https://github.com/WebKit/WebKit/pull/3034
> > 
> > After landing 253140@main, Debug is consistently crashing.
> > 
> > History: 
> > https://results.webkit.org/?suite=layout-tests&test=imported%2Fw3c%2Fweb-
> > platform-tests%2Fservice-workers%2Fservice-worker%2Fregistration-
> > updateviacache.https.html&platform=ios&platform=mac&limit=50000
> 
> ASSERTION FAILED: jobData.connectionIdentifier() == Process::identifier()
> 
> Looking now.

Looks like an outdated assertion now that we do the refresh loads in the network process unconditionally. I'll fix shortly.
Comment 17 Chris Dumez 2022-08-05 10:09:25 PDT
(In reply to Chris Dumez from comment #16)
> (In reply to Chris Dumez from comment #15)
> > (In reply to Karl Rackler from comment #14)
> > > (In reply to Chris Dumez from comment #11)
> > > > Pull request: https://github.com/WebKit/WebKit/pull/3034
> > > 
> > > After landing 253140@main, Debug is consistently crashing.
> > > 
> > > History: 
> > > https://results.webkit.org/?suite=layout-tests&test=imported%2Fw3c%2Fweb-
> > > platform-tests%2Fservice-workers%2Fservice-worker%2Fregistration-
> > > updateviacache.https.html&platform=ios&platform=mac&limit=50000
> > 
> > ASSERTION FAILED: jobData.connectionIdentifier() == Process::identifier()
> > 
> > Looking now.
> 
> Looks like an outdated assertion now that we do the refresh loads in the
> network process unconditionally. I'll fix shortly.

Follow-up fix in https://commits.webkit.org/253148@main.