WebKit Bugzilla
New
Browse
Log In
×
Sign in with GitHub
or
Remember my login
Create Account
·
Forgot Password
Forgotten password account recovery
NEW
235114
REGRESSION (iOS 15): Massive variation in microbenchmark results when tab returns to foreground
https://bugs.webkit.org/show_bug.cgi?id=235114
Summary
REGRESSION (iOS 15): Massive variation in microbenchmark results when tab ret...
Simon Taylor
Reported
2022-01-12 07:43:01 PST
This is a weird one... I've got a simple microbenchmark that counts how many times Math.random() can be called in a single performance.now() tick (1ms on current iOS Safari). The idea is this provides a metric for the the current JS thread CPU performance (which varies depending on if it is running on an efficiency or performance core, and clock frequency). This is scheduled in a requestAnimationFrame loop. The page also has an option to further busy-loop in the rAF callback so the whole thing takes 20ms - checking with System Trace in Instruments this seems to reliably trigger the content thread being moved up to a Performance core, and gives higher call counts in the microbenchmark. I believe this is resulting in consistently high clocks but Instruments does not expose this data so I can't be 100% sure. Without the busy-loop option enabled the counts drop massively as the thread is moved to an efficiency core and clock speeds can be reduced. Interacting with the page (pinch zoom etc) does give a boost in the numbers, reflecting frequency ramping up to deal with the interaction and keep everything smooth. So far, all is as expected. Here's the weird thing: it looks like there are two distinct "operating points" for the microbenchmark results, and which one you get varies with page reload, or with events that cause the tab to make a background / foreground transition (tab switch, app switch, lock screen, etc). On my iPhone 12 Pro, iOS 15.2 the "fast" operating point has ~4k calls without busy looping and ~30k calls with busy looping. The "slow" operating point is ~1.2k without busy looping and ~12k with busy looping. These numbers seem to remain pretty consistent as long as the tab stays on screen, and only change on a background / foreground or reload event. I also have an iPhone 12 Mini with iOS 14.8.1, and the numbers are very similar to the "fast" operating point mentioned above. So I believe this to be an iOS 15.x regression. I've also been able to reproduce on an iPod touch 7 on 15.2. The best guess I have to explain the observed behaviour here is that some bug in tab re-awakening means it ends up using code from a lower JIT tier? This came to light in my testing for
Bug 235002
- note in either operating point for the microbenchmark, the webgl timings look consistent, so I do believe the clocks are consistent. One difference is on refresh the full webGL demo page seems to be in the "slow" mode for the benchmark but the simpler example seems to start out more often in "fast" mode. Some issue with tiering up in general that depends on the code size could be at play?
Attachments
Demo page
(3.17 KB, text/html)
2022-01-12 07:50 PST
,
Simon Taylor
no flags
Details
Screen recording from M1 Pro MBP
(6.28 MB, video/quicktime)
2022-01-12 08:15 PST
,
Simon Taylor
no flags
Details
Reproducing different operating points in STP 138
(1.76 MB, video/quicktime)
2022-02-03 09:23 PST
,
Simon Taylor
no flags
Details
View All
Add attachment
proposed patch, testcase, etc.
Simon Taylor
Comment 1
2022-01-12 07:50:39 PST
Created
attachment 448944
[details]
Demo page I've attached the demo page. It's also hosted at:
https://tango-bravo.net/webkit-bug-235114/index.html
Simon Taylor
Comment 2
2022-01-12 07:55:07 PST
I've also successfully reproduced this on both Intel and M1 Pro MacBooks with Safari 15.2 - both have two clear operating points. Quitting Safari and "Reopen Last Closed Window" seems to do the trick for rolling the dice and seeing which one you end up with.
Simon Taylor
Comment 3
2022-01-12 08:15:58 PST
Created
attachment 448946
[details]
Screen recording from M1 Pro MBP Screen recording showing this in action on an M1 MBP. Shows me doing Quit / Restart loop a few times (Safari set to re-open previous windows) until the page seems to be in the slower operating point. Then Refresh eventually seems to get one at the faster point. On Mac at least it seems refresh doesn't usually trigger the fast->slow transition, the Quit / Restart app is the more reliable way to observe that behaviour.
Radar WebKit Bug Importer
Comment 4
2022-01-12 09:16:12 PST
<
rdar://problem/87462931
>
Chris Dumez
Comment 5
2022-01-12 09:31:44 PST
I can reproduce on macOS / Intel. I just keep relaunching Safari and open
https://tango-bravo.net/webkit-bug-235114/index.html
. Sometimes, the math.random calls number is ~7k, sometimes the number is ~17k. I am doing some initial investigation but this may be a JSC issue having to do with tiering up or not.
Chris Dumez
Comment 6
2022-01-12 09:42:54 PST
The process priorities look similar when the benchmark is slow and fast: - Slow: 491 9 (FGSupport) 13 15MB 46MB 1000MB --- Y Y - 35711 com.apple.WebKit.WebContent policy: POLICY_TIMESHARE timeshare max priority: 63 timeshare base priority: 46 timeshare cur priority: 46 timeshare depressed: NO, prio -1 requested policy: req thread qos: THREAD_QOS_USER_INTERACTIVE, relprio: -1 req workqueue/pthread overides: req legacy qos override: THREAD_QOS_UNSPECIFIED req workqueue qos override: THREAD_QOS_UNSPECIFIED req kernel overides: req kevent overrides: THREAD_QOS_UNSPECIFIED req workloop servicer override: THREAD_QOS_UNSPECIFIED req turnstiles sync promotion qos: THREAD_QOS_UNSPECIFIED, user promotion base pri: 0 req latency qos: LATENCY_QOS_TIER_UNSPECIFIED req thruput qos: THROUGHPUT_QOS_TIER_UNSPECIFIED req darwin BG: NO req internal/external iotier: THROTTLE_LEVEL_TIER0 (IMPORTANT) / THROTTLE_LEVEL_TIER0 (IMPORTANT) req other: effective policy: eff thread qos: THREAD_QOS_USER_INTERACTIVE eff thread qos relprio: -1 eff promotion qos: THREAD_QOS_USER_INTERACTIVE eff latency qos: LATENCY_QOS_TIER_0 eff thruput qos: THROUGHPUT_QOS_TIER_0 eff darwin BG: NO eff iotier: THROTTLE_LEVEL_TIER0 (IMPORTANT) eff other: - Fast: 493 9 (FGSupport) 13 16MB 48MB 1000MB --- Y Y - 35877 com.apple.WebKit.WebContent policy: POLICY_TIMESHARE timeshare max priority: 63 timeshare base priority: 46 timeshare cur priority: 46 timeshare depressed: NO, prio -1 requested policy: req thread qos: THREAD_QOS_USER_INTERACTIVE, relprio: -1 req workqueue/pthread overides: req legacy qos override: THREAD_QOS_UNSPECIFIED req workqueue qos override: THREAD_QOS_UNSPECIFIED req kernel overides: req kevent overrides: THREAD_QOS_UNSPECIFIED req workloop servicer override: THREAD_QOS_UNSPECIFIED req turnstiles sync promotion qos: THREAD_QOS_UNSPECIFIED, user promotion base pri: 0 req latency qos: LATENCY_QOS_TIER_UNSPECIFIED req thruput qos: THROUGHPUT_QOS_TIER_UNSPECIFIED req darwin BG: NO req internal/external iotier: THROTTLE_LEVEL_TIER0 (IMPORTANT) / THROTTLE_LEVEL_TIER0 (IMPORTANT) req other: effective policy: eff thread qos: THREAD_QOS_USER_INTERACTIVE eff thread qos relprio: -1 eff promotion qos: THREAD_QOS_USER_INTERACTIVE eff latency qos: LATENCY_QOS_TIER_0 eff thruput qos: THROUGHPUT_QOS_TIER_0 eff darwin BG: NO eff iotier: THROTTLE_LEVEL_TIER0 (IMPORTANT) eff other: I don't think this is a process priority issue. I am curious what JSC engineers think about this. I guess I can try and compare profiles next.
Chris Dumez
Comment 7
2022-01-12 09:55:43 PST
Looking at the traces: 1. The WebContent process has about as many samples 2. The WebContent process has the same priorities 3. The WebContent process has about the same number of syscalls 4. The WebContent process has about the same amount of scheduled time I'll keep looking a bit but I am starting to wonder if something is wrong with the benchmark itself?
Chris Dumez
Comment 8
2022-01-12 09:59:41 PST
In the slow case, top samples are: Sample Count, Samples %, Normalized CPU %, Symbol 10167, 74.2%, 2.8%, vmEntryToJavaScript (in JavaScriptCore) 5205, 38.0%, 1.4%, ??? (in (null)) [0x53a6858040a4] 5107, 37.3%, 1.4%, operationGetFromScope (in JavaScriptCore) 2580, 18.8%, 0.7%, WebCore::JSDOMWindow::getOwnPropertySlot(JSC::JSObject*, JSC::JSGlobalObject*, JSC::PropertyName, JSC::PropertySlot&) (in WebCore) 1467, 10.7%, 0.4%, JSC::JSGlobalObject::getOwnPropertySlot(JSC::JSObject*, JSC::JSGlobalObject*, JSC::PropertyName, JSC::PropertySlot&) (in JavaScriptCore) 1136, 8.3%, 0.3%, JSC::JSObject::getOwnStaticPropertySlot(JSC::VM&, JSC::PropertyName, JSC::PropertySlot&) (in JavaScriptCore) 873, 6.4%, 0.2%, JSC::getStaticPropertySlotFromTable(JSC::VM&, JSC::ClassInfo const*, JSC::HashTable const&, JSC::JSObject*, JSC::PropertyName, JSC::PropertySlot&) (in JavaScriptCore) 510, 3.7%, 0.1%, WTF::equal(WTF::StringImpl const*, unsigned char const*) (in JavaScriptCore) In the fast case, top samples are: Sample Count, Samples %, Normalized CPU %, Symbol 10164, 72.1%, 2.8%, vmEntryToJavaScript (in JavaScriptCore) 3501, 24.9%, 1.0%, ??? (in (null)) [0x20370c40ae38] 3363, 23.9%, 0.9%, ??? (in (null)) [0x20370c4011d8] 3317, 23.5%, 0.9%, WebCore::jsPerformancePrototypeFunction_now(JSC::JSGlobalObject*, JSC::CallFrame*) (in WebCore) 2072, 14.7%, 0.6%, WTF::MonotonicTime::now() (in JavaScriptCore) 1195, 8.5%, 0.3%, mach_absolute_time (in libsystem_kernel.dylib) They do look very different and I suspect this is about JIT'ing vs not JIT'ing. Somebody from JSC should probably take a look (I attached the profiles to the radar).
Simon Taylor
Comment 9
2022-01-12 10:09:58 PST
(In reply to Chris Dumez from
comment #7
)
> I am starting to wonder if something is wrong with the benchmark itself?
Certainly possible. In the WebGL one I added console.log() calls (which show up as syscalls if WebInspector is open) and verified that the section where the calls are counted was 1ms long in both fast and slow cases. The idea of accumulating to a global var was to stop the optimizer removing the calls entirely. Different JIT levels being used does feel most likely. I'm not able to reproduce very readily on M1 Pro MBP with Safari Tech Preview. There's a vague hint of a 28k / 32k difference (perhaps some non-determinism but at the same JIT tier?), but I haven't seen the ~11k case there.
Mark Lam
Comment 10
2022-01-19 15:30:58 PST
FWIW, this issue no longer manifests after
r283102
.
Simon Taylor
Comment 11
2022-02-03 09:23:08 PST
Created
attachment 450781
[details]
Reproducing different operating points in STP 138 (In reply to Mark Lam from
comment #10
)
> FWIW, this issue no longer manifests after
r283102
.
I haven't tried to fully understand that commit, but does it replace the JS interpreter tier with a baseline JIT? So the issue of using less-optimised code on some loads of code may still be present, but with a less obvious gap between the operating points since that commit? I've updated the test page to try and tease this out. Now rather than showing the mean of the counts, I take the max (and median) of each set of 60 - these tend to be really quite consistent. I've also added a history of the "max" results to get more of a feel for the consistency of this over time.
https://tango-bravo.net/webkit-bug-235114/index.html
There do still seem to be distinct operating points in Safari Tech Preview 138 (the release notes mention unlinked baseline JIT being added in 134). I've attached a video showing what seem to be 3 different performance points from STP 138 on my M1 Pro MBP. 29000, 29600, 31950 - the gaps between these points are all significantly larger than the differences in samples in each of them, so I'm pretty confident there's still different code running in these different cases.
Simon Taylor
Comment 12
2022-02-03 09:30:10 PST
I've also added a "manual" benchmark runner page that just takes one set of 200 samples and shows them all each time you click a button:
https://tango-bravo.net/webkit-bug-235114/manual.html
You can observe the CPU frequency ramping up as the early samples have lower counts than the later ones, and the later ones are still quite consistent. For whatever reason this bug isn't as obvious on that page though, even on iOS 15.2 devices, though a do still see a couple of clear operating points, it's not the 3x difference that is visible in the rAF-based test. So the rAF test page is probably the more useful test case but thought I might as well share this one too.
Simon Taylor
Comment 13
2022-02-16 03:58:46 PST
This is readily reproducible on iOS 15.3.1 and not on 15.4 beta 3 (iPhone 12 Pro). Numbers are slightly more variable on iOS so it's not clear if there are still the slightly different operating points mentioned in
Comment 11
in-play.
Note
You need to
log in
before you can comment on or make changes to this bug.
Top of Page
Format For Printing
XML
Clone This Bug