RESOLVED FIXED 100057
Replace NodeRareData hash map with a union on m_renderer
https://bugs.webkit.org/show_bug.cgi?id=100057
Summary Replace NodeRareData hash map with a union on m_renderer
Elliott Sprehn
Reported 2012-10-22 17:00:41 PDT
Replace NodeRareData hash map with a union on m_renderer
Attachments
Patch (16.05 KB, patch)
2012-10-22 17:16 PDT, Elliott Sprehn
no flags
Patch (11.74 KB, patch)
2012-10-23 13:25 PDT, Elliott Sprehn
no flags
Patch (13.42 KB, patch)
2012-11-01 16:02 PDT, Elliott Sprehn
no flags
Patch for landing (13.65 KB, patch)
2012-11-01 17:56 PDT, Elliott Sprehn
no flags
Patch for landing (13.69 KB, patch)
2012-11-02 14:16 PDT, Elliott Sprehn
no flags
Patch for landing (13.93 KB, patch)
2012-11-02 14:26 PDT, Elliott Sprehn
no flags
Patch (16.67 KB, patch)
2012-11-02 15:23 PDT, Elliott Sprehn
no flags
Elliott Sprehn
Comment 1 2012-10-22 17:16:58 PDT
Eric Seidel (no email)
Comment 2 2012-10-22 17:34:05 PDT
Comment on attachment 170032 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=170032&action=review I think this is interesting. It's unclear to me how likely this is to cause perf changes (good or bad). > Source/WebCore/dom/Node.h:814 > + union DataUnion { You should probably add a comment, explaining why, etc.
Build Bot
Comment 3 2012-10-22 17:44:02 PDT
Build Bot
Comment 4 2012-10-22 17:59:57 PDT
Darin Adler
Comment 5 2012-10-22 18:41:20 PDT
Comment on attachment 170032 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=170032&action=review Looks like a good change, but requires some additional performance testing and tuning, I suspect. Mac build failed because what was an inline function is now non-inline. You’ll have to update the export file because of that. Not sure why the GTK and Windows builds failed. > Source/WebCore/ChangeLog:15 > + Use a union on Node::m_renderer between NodeRareData* and RenderObject*. This removes > + the overhead of accessing rare data and the memory from the map. We now get the 5% > + performance increase observed in Bug 90059 but when accessing node lists on any node. > + This should be a perf win on node-list-access.html of a similar percentage. > + > + The extra pointer and conditional does not regress performance when accessing node > + lists through methods like getElementsByClassName on the document. I also removed > + multiple accesses down hot code paths like recalcStyle. How did you measure performance outside of those DOM benchmarks? Saying “I removed multiple accesses down hot code paths” sounds like optimization by code inspection, but we need to actually test whether this measurably slows performance in code that was heavily accessing the renderer. Did you do some kind of testing? > Source/WebCore/dom/Node.cpp:483 > + data->setRenderer(renderer()); Really should be: data->setRenderer(m_data.m_renderer); No reason to check HasRareDataFlag an extra time here, and the more verbose version also has the advantage of being slightly clearer about what’s going on. > Source/WebCore/dom/Node.cpp:502 > + RenderObject* renderer = this->renderer(); Really should be: RenderObject* renderer = m_data.m_rareData->renderer(); No reason to check HasRareDataFlag an extra time here, and the more verbose version also has the advantage of being slightly clearer about what’s going on. > Source/WebCore/dom/Node.cpp:511 > +RenderObject* Node::renderer() const > +{ > + return hasRareData() ? m_data.m_rareData->renderer() : m_data.m_renderer; > +} Why isn’t this inlined? It’s really OK to have function overhead every time this is called? I’d expect that we’d at least put the !hasRareData() case in the header and inline it. > Source/WebCore/dom/Node.cpp:1427 > + RenderObject* renderer = this->renderer(); > + if (renderer) > + renderer->setAnimatableStyle(s); Our usual style would be to define the variable inside the if statement. > Source/WebCore/dom/Node.cpp:2842 > + RenderObject* renderer = this->renderer(); > + if (renderer) > + info.addMember(renderer->style()); Our usual style would be to define the variable inside the if statement. > Source/WebCore/dom/NodeRareData.h:183 > + : m_renderer(0) A shame to set m_renderer here since it will always get set again as soon as the createRareData function returns. > Source/WebCore/dom/NodeRenderStyle.h:39 > - if (m_renderer) > - return m_renderer->style(); > + if (renderer()) > + return renderer()->style(); Strange that you changed this, but did not put the renderer into a local variable.
Maciej Stachowiak
Comment 6 2012-10-22 22:04:20 PDT
I'd like to see some data about the performance effects of this change. Specifically: - What tests does it speed up and how much? Actual data, please, not just guesses. - Tests that cover rendering (including page loading tests, dom-driven animation tests, and anything else that might be rendering heavy). Reason for the latter is that this patch makes access to a node's renderer take a branch always, and dereference an extra pointer in the case where the node has rare data, so prima facie it seems like there is a risk of regressing anything that is hot in renderer access. Just as for behavior-affecting changes it is the patch submitter's job to provide evidence that the patch is correct, with performance-affecting patches it is the patch submitter's job to provide evidence that the patch improves performance as intended and does not regress things that it might be feared to regress. So I hope it is reasonable to ask for this perf test data up front.
Elliott Sprehn
Comment 7 2012-10-23 12:59:06 PDT
Comment on attachment 170032 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=170032&action=review >> Source/WebCore/ChangeLog:15 >> + multiple accesses down hot code paths like recalcStyle. > > How did you measure performance outside of those DOM benchmarks? Saying “I removed multiple accesses down hot code paths” sounds like optimization by code inspection, but we need to actually test whether this measurably slows performance in code that was heavily accessing the renderer. Did you do some kind of testing? I'll remove these for now and benchmark more. >> Source/WebCore/dom/Node.cpp:511 >> +} > > Why isn’t this inlined? It’s really OK to have function overhead every time this is called? I’d expect that we’d at least put the !hasRareData() case in the header and inline it. There's header cycles trying to make them inline and knowing the internal layout of a NodeRareData. I'll make the fast path inline.
Elliott Sprehn
Comment 8 2012-10-23 13:25:04 PDT
Elliott Sprehn
Comment 9 2012-10-23 13:26:55 PDT
(In reply to comment #8) > Created an attachment (id=170218) [details] > Patch I used a base class for NodeRareData that lets me inline the whole impl of setRenderer() and renderer(). The cast in rareData() is a little gross, but this saves you from all the exports and adding the method call. I'll run the bencharks now :)
Adam Barth
Comment 10 2012-10-25 23:42:44 PDT
Comment on attachment 170218 [details] Patch Marking r- to remove from EWS queue.
Elliott Sprehn
Comment 11 2012-11-01 16:02:18 PDT
Created attachment 171948 [details] Patch Did performance testing and removed original speculative optimizations in favor of ones that actually seem needed. There's no observable slowdown now, and there's a 15% improvement on Parser/textarea-parsing.html in addition to the 5% improvement for node lists (and treeScope)
Eric Seidel (no email)
Comment 12 2012-11-01 16:07:45 PDT
Comment on attachment 171948 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=171948&action=review This LGTM, but I expect we're going to have some minor perf fallout from this. Code has long assumed that renderer() was basically free, and now it's a branch. > Source/WebCore/dom/Node.cpp:503 > + delete m_data.m_rareData; I assume manual delete is needed because of our use of a union? > Source/WebCore/dom/Node.h:835 > + // When a node has rare data we move the renderer into the rare data. You might also mention somnewhere in this header that this makes renderer() slightly more expensive than before. Previously renderer() was assumed free. :)
Elliott Sprehn
Comment 13 2012-11-01 17:33:01 PDT
(In reply to comment #12) > (From update of attachment 171948 [details]) > View in context: https://bugs.webkit.org/attachment.cgi?id=171948&action=review > > This LGTM, but I expect we're going to have some minor perf fallout from this. Code has long assumed that renderer() was basically free, and now it's a branch. Yeah I'll keep an eye on it. > > > Source/WebCore/dom/Node.cpp:503 > > + delete m_data.m_rareData; > > I assume manual delete is needed because of our use of a union? Manual delete was always needed because we don't really use the OwnPtr from createRareData, we instead call leakPtr() and just manage it ourselves. You tried to fix this once: https://bugs.webkit.org/show_bug.cgi?id=17199 With the union approach we definitely can't use an OwnPtr, so I was going to just make createRareData return a bare ptr in a future patch.
Build Bot
Comment 14 2012-11-01 17:46:35 PDT
Comment on attachment 171948 [details] Patch Attachment 171948 [details] did not pass mac-ews (mac): Output: http://queues.webkit.org/results/14657091 New failing tests: svg/W3C-SVG-1.1/text-align-04-b.svg svg/custom/tref-remove-target-crash.svg svg/custom/text-tref-03-b-referenced-element-removal.svg svg/custom/text-tref-03-b-tref-removal.svg svg/custom/tref-own-content-removal.svg svg/batik/text/textProperties.svg svg/custom/text-linking.svg svg/W3C-SVG-1.1-SE/styling-pres-02-f.svg svg/custom/tref-update.svg svg/foreignObject/text-tref-02-b.svg svg/custom/tref-nested-events-crash.svg svg/text/text-align-04-b.svg svg/custom/text-tref-03-b-change-href.svg svg/custom/text-tref-03-b-change-href-dom.svg svg/W3C-SVG-1.1-SE/text-tref-03-b.svg
Elliott Sprehn
Comment 15 2012-11-01 17:56:36 PDT
Created attachment 171961 [details] Patch for landing
Elliott Sprehn
Comment 16 2012-11-01 18:00:53 PDT
(In reply to comment #14) > (From update of attachment 171948 [details]) > Attachment 171948 [details] did not pass mac-ews (mac): > Output: http://queues.webkit.org/results/14657091 > > New failing tests: > svg/W3C-SVG-1.1/text-align-04-b.svg > svg/custom/tref-remove-target-crash.svg > svg/custom/text-tref-03-b-referenced-element-removal.svg > svg/custom/text-tref-03-b-tref-removal.svg > svg/custom/tref-own-content-removal.svg > svg/batik/text/textProperties.svg > svg/custom/text-linking.svg > svg/W3C-SVG-1.1-SE/styling-pres-02-f.svg > svg/custom/tref-update.svg > svg/foreignObject/text-tref-02-b.svg > svg/custom/tref-nested-events-crash.svg > svg/text/text-align-04-b.svg > svg/custom/text-tref-03-b-change-href.svg > svg/custom/text-tref-03-b-change-href-dom.svg > svg/W3C-SVG-1.1-SE/text-tref-03-b.svg I fixed the crashes. I had assumed in Text::recalcTextStyle that if you have a renderer your parent does too, but this is not the case if your parent is a ShadowRoot. Incidentally the hack in SVGShadowText::willRecalcTextStyle is concerning, I don't think we handle Text styles properly in shadows for things outside SVG right now. I can't figure out where we ever call setStyle() on a Text if it's an immediate child of a ShadowRoot.
Maciej Stachowiak
Comment 17 2012-11-01 23:43:07 PDT
(In reply to comment #11) > Created an attachment (id=171948) [details] > Patch > > Did performance testing and removed original speculative optimizations in favor of ones that actually seem needed. There's no observable slowdown now, and there's a 15% improvement on Parser/textarea-parsing.html in addition to the 5% improvement for node lists (and treeScope) Could you please give some more detail on what tests you ran and what the actual raw results were (as per request in comment #6)? Did you run any benchmarks that test page loading speed?
Elliott Sprehn
Comment 18 2012-11-02 09:42:43 PDT
(In reply to comment #17) > (In reply to comment #11) > > Created an attachment (id=171948) [details] [details] > > Patch > > > > Did performance testing and removed original speculative optimizations in favor of ones that actually seem needed. There's no observable slowdown now, and there's a 15% improvement on Parser/textarea-parsing.html in addition to the 5% improvement for node lists (and treeScope) > > Could you please give some more detail on what tests you ran and what the actual raw results were (as per request in comment #6)? > I used run-perf-tests on a Mac Pro with yes > /dev/null in a background terminal to ensure no CPU throttling (I tested many times, and brining one core to a constant 100% CPU made all run-perf-tests results have less run-to-run delta). Results from run-perf-tests (exactly copied): Parser/html5-full-render ms Parser/textarea-parsing runs/s Baseline (with new RareData patch) 4590.78 ± 0.55% 55.42 ± 0.29% 4596.38 ± 0.23% 57.45 ± 0.19% 4565.54 ± 0.43% 3.66% Better 56.00 ± 0.11% 1.04% Better r133226 by comparison: 4584.45 ± 0.53% 48.22 ± 0.27% 13.00% Worse 4592.59 ± 0.85% 48.03 ± 0.62% 13.34% Worse 4531.03 ± 0.59% 1.30% Better 46.98 ± 0.24% 15.22% Worse I also ran the Bindings/* tests to ensure there were no changes and there weren't. The existing tests for getElementsByTagName and other node list getters showed no regression meaning this patch is indeed the same +5% Better as observed in 90059. Unfortunately the Bindings/get-elements-by* benchmarks only test document.* versions so this doesn't show that by not regressing performance on those tests with this patch I've also generalized the performance win. > Did you run any benchmarks that test page loading speed? I used html5-full-render as a micro benchark for that. It loads the HTML5 spec which is huge and hammers on recalcStyle, setRenderer and other page loading centric things. That's how I decided to change Text::recalcTextStyle. This was in response to Darin's comment on figuring out which places we should really care about accessing renderer() being slower. More detail of this optimization step is in the ChangeLog. Once this lands the Chromium bots can give us a clearer picture of page loading speed. As this patch is quite small it would be easy to just roll it out if there's a major issue just as was done the last time someone attempted to move m_document into the rare data.
Elliott Sprehn
Comment 19 2012-11-02 14:04:59 PDT
(In reply to comment #18) > (In reply to comment #17) > > (In reply to comment #11) > > > Created an attachment (id=171948) [details] [details] [details] > > > Patch > > > > > > Did performance testing and removed original speculative optimizations in favor of ones that actually seem needed. There's no observable slowdown now, and there's a 15% improvement on Parser/textarea-parsing.html in addition to the 5% improvement for node lists (and treeScope) > > > > Could you please give some more detail on what tests you ran and what the actual raw results were (as per request in comment #6)? > > > > I used run-perf-tests on a Mac Pro with yes > /dev/null in a background terminal to ensure no CPU throttling (I tested many times, and brining one core to a constant 100% CPU made all run-perf-tests results have less run-to-run delta). > > Results from run-perf-tests (exactly copied): > > Parser/html5-full-render ms > Parser/textarea-parsing runs/s > > Baseline (with new RareData patch) > > 4590.78 ± 0.55% > 55.42 ± 0.29% > > 4596.38 ± 0.23% > 57.45 ± 0.19% > > 4565.54 ± 0.43% 3.66% Better > 56.00 ± 0.11% 1.04% Better > > r133226 by comparison: > > 4584.45 ± 0.53% > 48.22 ± 0.27% 13.00% Worse > > 4592.59 ± 0.85% > 48.03 ± 0.62% 13.34% Worse > > 4531.03 ± 0.59% 1.30% Better > 46.98 ± 0.24% 15.22% Worse > Eric wanted me to clarify these numbers. The numbers show that for the Parser/html5-full-render.html test the patch has effectively no effect. The last one that's 1.3% better without my patch is just normal variation as the two previous runs don't show enough difference for run-perf-tests to even show the delta. Before I tuned the Text::recalcTextStyle method I saw consistent "2% Better" without my patch on html5-full-render which is why I made changes to that method to remove the overhead. The second test for Parser/textarea-parsing.html shows a consistent improvement of 13-15% with my patch. The other test I ran was Bindings/get-elements-by-tag-name.html which originally showed no difference on the first iteration of this patch, but I just reran with the current iteration of the patch and it does actually show some improvement: run-perf-tests --release --chromium Bindings/get-elements-by-tag-name.html Baseline of r133226: 224.28 ± 1.49% 2.52% Worse 227.44 ± 1.91% 230.08 ± 0.48% With my patch: 249.95 ± 1.27% 8.64% Better 249.27 ± 1.49% 8.34% Better 243.02 ± 2.21% 5.63% Better Which makes it look like removing the branch from inside Node::rareData() and the special casing for document rare data improved that benchmark by +5% as well. So in terms of these benchmarks this looks like all win, but of course the change that was needed in Text::recalcTextStyle highlights that there's possibly minor regressions in places where we call renderer() repeatedly. In a follow up patch we can fix those to cache the renderer in a local variable if needed just like I did in Text::recalcTextStyle and renderStyle().
Eric Seidel (no email)
Comment 20 2012-11-02 14:07:01 PDT
Comment on attachment 171961 [details] Patch for landing Thank you for clarifying. As we discussed, it's likely there may be some minor regressions due to repeated renderer() calls which can easily be avoided by using a local to store the renderer() result.
WebKit Review Bot
Comment 21 2012-11-02 14:09:22 PDT
Comment on attachment 171961 [details] Patch for landing Rejecting attachment 171961 [details] from commit-queue. Failed to run "['/mnt/git/webkit-commit-queue/Tools/Scripts/webkit-patch', '--status-host=queues.webkit.org', '-..." exit_code: 2 Last 500 characters of output: atching file Source/WebCore/dom/Node.h Hunk #2 succeeded at 510 (offset -4 lines). Hunk #3 succeeded at 827 with fuzz 2 (offset -6 lines). patching file Source/WebCore/dom/NodeRareData.h Hunk #1 succeeded at 178 (offset 2 lines). Hunk #2 succeeded at 200 (offset 2 lines). patching file Source/WebCore/dom/NodeRenderStyle.h patching file Source/WebCore/dom/Text.cpp Failed to run "[u'/mnt/git/webkit-commit-queue/Tools/Scripts/svn-apply', u'--force']" exit_code: 1 cwd: /mnt/git/webkit-commit-queue Full output: http://queues.webkit.org/results/14670960
Elliott Sprehn
Comment 22 2012-11-02 14:16:59 PDT
Created attachment 172133 [details] Patch for landing
Eric Seidel (no email)
Comment 23 2012-11-02 14:23:38 PDT
Comment on attachment 172133 [details] Patch for landing Engage!
Elliott Sprehn
Comment 24 2012-11-02 14:26:40 PDT
Created attachment 172136 [details] Patch for landing
Elliott Sprehn
Comment 25 2012-11-02 14:27:46 PDT
(In reply to comment #23) > (From update of attachment 172133 [details]) > Engage! Sorry, want to cq+ one more time? I fixed the changelog to have more details from the perf tests I just ran since it turns out this is an unexpected 8% win :)
Eric Seidel (no email)
Comment 26 2012-11-02 14:28:32 PDT
Comment on attachment 172136 [details] Patch for landing Re-engage!
Early Warning System Bot
Comment 27 2012-11-02 14:38:31 PDT
Comment on attachment 172136 [details] Patch for landing Attachment 172136 [details] did not pass qt-ews (qt): Output: http://queues.webkit.org/results/14670967
Early Warning System Bot
Comment 28 2012-11-02 14:39:02 PDT
Comment on attachment 172136 [details] Patch for landing Attachment 172136 [details] did not pass qt-wk2-ews (qt): Output: http://queues.webkit.org/results/14718016
Elliott Sprehn
Comment 29 2012-11-02 14:56:38 PDT
(In reply to comment #28) > (From update of attachment 172136 [details]) > Attachment 172136 [details] did not pass qt-wk2-ews (qt): > Output: http://queues.webkit.org/results/14718016 Woops, it looks like new code went in this morning to track the memory usage of the map itself: https://trac.webkit.org/changeset/133298 which means that removing the map in this patch also reduces the memory usage on nytimes.com by 250k.
WebKit Review Bot
Comment 30 2012-11-02 14:58:22 PDT
Comment on attachment 172136 [details] Patch for landing Attachment 172136 [details] did not pass chromium-ews (chromium-xvfb): Output: http://queues.webkit.org/results/14687988
Build Bot
Comment 31 2012-11-02 15:17:56 PDT
Comment on attachment 172136 [details] Patch for landing Attachment 172136 [details] did not pass mac-ews (mac): Output: http://queues.webkit.org/results/14677940
Build Bot
Comment 32 2012-11-02 15:18:45 PDT
Comment on attachment 172136 [details] Patch for landing Attachment 172136 [details] did not pass win-ews (win): Output: http://queues.webkit.org/results/14713101
Elliott Sprehn
Comment 33 2012-11-02 15:23:39 PDT
Created attachment 172156 [details] Patch Fix bug and update changelog to note the memory savings
Eric Seidel (no email)
Comment 34 2012-11-02 16:15:14 PDT
Comment on attachment 172156 [details] Patch By our powers combined!
WebKit Review Bot
Comment 35 2012-11-02 16:40:07 PDT
Comment on attachment 172156 [details] Patch Clearing flags on attachment: 172156 Committed r133372: <http://trac.webkit.org/changeset/133372>
WebKit Review Bot
Comment 36 2012-11-02 16:40:14 PDT
All reviewed patches have been landed. Closing bug.
Elliott Sprehn
Comment 37 2012-11-02 16:47:36 PDT
(In reply to comment #36) > All reviewed patches have been landed. Closing bug. By the power of Grayskull!
Maciej Stachowiak
Comment 38 2012-11-04 07:42:12 PST
Thanks for the performance data! I've asked Apple folks to run this through our internal page load speed benchmark as well.
Elliott Sprehn
Comment 39 2012-11-07 16:08:43 PST
(In reply to comment #38) > Thanks for the performance data! I've asked Apple folks to run this through our internal page load speed benchmark as well. The Chromium page cyclers showed no regression so the change looks good.
Ryosuke Niwa
Comment 40 2012-11-07 16:24:41 PST
*** Bug 89635 has been marked as a duplicate of this bug. ***
Stephanie Lewis
Comment 41 2012-11-07 16:37:58 PST
Ran it through our performance tests and there were no regressions.
Note You need to log in before you can comment on or make changes to this bug.