89635 – NodeRareData should be directly referred from Node instead of using HashMap

RESOLVED DUPLICATE of bug 10005789635

NodeRareData should be directly referred from Node instead of using HashMap

https://bugs.webkit.org/show_bug.cgi?id=89635

Summary NodeRareData should be directly referred from Node instead of using HashMap

Hajime Morrita

Reported 2012-06-20 22:57:16 PDT

Coined by Ryosuke, There is an observation that there is noticeable memory usage change even after haraken's 8 byte: http://trac.webkit.org/changeset/119802 because of some reasons like allocator fragmentation. This change is going to take back that bytes to put a pointer to NodeRareData. Then we can kill the hashtable lookup when fetching it. I understand this can be controversial. I'd like to hear your feedback.

Attachments
Patch (7.47 KB, patch) 2012-06-21 02:03 PDT, Hajime Morrita	no flags	Details Formatted Diff Diff
Archive of layout-test-results from ec2-cr-linux-01 (732.52 KB, application/zip) 2012-06-21 05:10 PDT, WebKit Review Bot	no flags	Details
Patch (17.11 KB, patch) 2012-06-21 22:09 PDT, Hajime Morrita	rniwa: review+	Details Formatted Diff Diff
Show Obsolete (1) View All Add attachment proposed patch, testcase, etc.

Hajime Morrita

Comment 1 2012-06-20 22:59:50 PDT

s/noticeable/no noticeable/

Ryosuke Niwa

Comment 2 2012-06-20 23:00:30 PDT

I'm supportive of this change. When I profiled some basic DOM APIs like appendChild, removeChild, etc... we were spending some significant amount of time 5-10% look up hash maps to look for nodeRareData() to obtain treeScope(). I suspect much of that will go away if we avoided the hash lookup for the rare node data.

Hajime Morrita

Comment 3 2012-06-21 02:03:12 PDT

Created attachment 148750 [details] Patch

WebKit Review Bot

Comment 4 2012-06-21 05:10:18 PDT

Comment on attachment 148750 [details] Patch Attachment 148750 [details] did not pass chromium-ews (chromium-xvfb): Output: http://queues.webkit.org/results/13026082 New failing tests: fast/forms/textarea-live-pseudo-selectors.html fast/dom/HTMLElement/attr-dir-auto-text-form-control-child.html fast/dom/HTMLElement/attr-dir-value-change.html fast/dom/HTMLElement/attr-dir-auto-change-child-node.html fast/text/international/bdi-dir-default-to-auto.html fast/dom/HTMLElement/attr-dir-auto-change-text.html fast/dom/HTMLElement/attr-dir-auto-remove-add-children.html fast/text/international/bdi-neutral-wrapped.html fast/dom/HTMLElement/attr-dir-auto.html fast/dom/HTMLElement/attr-dir-auto-change-before-text-node.html

WebKit Review Bot

Comment 5 2012-06-21 05:10:24 PDT

Created attachment 148770 [details] Archive of layout-test-results from ec2-cr-linux-01 The attached test failures were seen while running run-webkit-tests on the chromium-ews. Bot: ec2-cr-linux-01 Port: <class 'webkitpy.common.config.ports.ChromiumXVFBPort'> Platform: Linux-2.6.35-28-virtual-x86_64-with-Ubuntu-10.10-maverick

Andreas Kling

Comment 6 2012-06-21 06:30:47 PDT

I support this change as well. There's probably a bunch of smaller cleanups we can do where it's no longer necessary to cache rareData() in a local variable, etc.

Hajime Morrita

Comment 7 2012-06-21 22:09:48 PDT

Created attachment 148961 [details] Patch

Hajime Morrita

Comment 8 2012-06-21 22:11:42 PDT

Ryosuke, Kling, Thank you for your support :-) I updated the patch to fix test failures and to have some cleanup which Kling mentioned.

Ryosuke Niwa

Comment 9 2012-06-21 22:39:55 PDT

Comment on attachment 148961 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=148961&action=review > Source/WebCore/ChangeLog:9 > + Local benchmark shows 3% speedup on dom-modify.html and 7% speedup on dom-query.html Nice! > Source/WebCore/dom/Element.cpp:152 > inline ElementRareData* Element::rareData() const > { > - ASSERT(hasRareData()); > - return static_cast<ElementRareData*>(NodeRareData::rareDataFromMap(this)); > + return static_cast<ElementRareData*>(Node::rareData()); > } We can probably just inline this at the declaration. > Source/WebCore/dom/Element.cpp:1044 > if (hasRareData()) { Should we replace this with rareData() ? > Source/WebCore/dom/Node.h:724 > + bool hasRareData() const { return m_rareData; } Do we really need this function? > Source/WebCore/dom/Node.h:786 > + NodeRareData* m_rareData; Why don't we make this OwnPtr?

Hajime Morrita

Comment 10 2012-06-21 22:43:54 PDT

(In reply to comment #9) > (From update of attachment 148961 [details]) > View in context: https://bugs.webkit.org/attachment.cgi?id=148961&action=review > > > Source/WebCore/ChangeLog:9 > > + Local benchmark shows 3% speedup on dom-modify.html and 7% speedup on dom-query.html > > Nice! > > > Source/WebCore/dom/Element.cpp:152 > > inline ElementRareData* Element::rareData() const > > { > > - ASSERT(hasRareData()); > > - return static_cast<ElementRareData*>(NodeRareData::rareDataFromMap(this)); > > + return static_cast<ElementRareData*>(Node::rareData()); > > } > > We can probably just inline this at the declaration. Unfortunately we cannot since ElementRareData definition isn't visible from the header file. > > > Source/WebCore/dom/Element.cpp:1044 > > if (hasRareData()) { > > Should we replace this with rareData() ? > > > Source/WebCore/dom/Node.h:724 > > + bool hasRareData() const { return m_rareData; } > > Do we really need this function? I'd like to do that in separate patch to keep this change clean. > > > Source/WebCore/dom/Node.h:786 > > + NodeRareData* m_rareData; > > Why don't we make this OwnPtr? Because NodeRareData definition from the header file and Node::Node is inlined (in Document.h). This is a bit ugly bat not worse than before...

Ryosuke Niwa

Comment 11 2012-06-21 22:51:03 PDT

Comment on attachment 148961 [details] Patch ok, fair enough. r=me provided we don't break tests :)

Darin Adler

Comment 12 2012-06-22 10:18:50 PDT

Comment on attachment 148961 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=148961&action=review > Source/WebCore/ChangeLog:8 > + This change adds Node::m_rareData and eliminate old raredata-support codes. Why is adding a pointer to every Node an acceptable memory regression!? On 64-bit systems this will make every Node 8 bytes bigger! We should look at other options before we spend all this extra memory. I don’t see any data here on the change in memory use and why it’s OK.

Andreas Kling

Comment 13 2012-06-22 11:02:54 PDT

(In reply to comment #12) > (From update of attachment 148961 [details]) > View in context: https://bugs.webkit.org/attachment.cgi?id=148961&action=review > > > Source/WebCore/ChangeLog:8 > > + This change adds Node::m_rareData and eliminate old raredata-support codes. > > Why is adding a pointer to every Node an acceptable memory regression!? On 64-bit systems this will make every Node 8 bytes bigger! We should look at other options before we spend all this extra memory. I don’t see any data here on the change in memory use and why it’s OK. We just recently knocked one pointer off of every Node with the TreeShared devirtualization, so we're basically getting back to square one. Furthermore, I think we could move the "inline" JS wrapper pointer to NodeRareData after this change and make up for much of the regression.

Darin Adler

Comment 14 2012-06-22 11:14:13 PDT

I can understand adding the pointer since this is showing up as slowness on some realistic performance tests. But in general I am not sure I understand how to make the tradeoff between memory use and speed for this data. If the data was truly rare, then clearly the speed would not matter. So the heart of this is that the data is not all that rare. I could have used a pointer from the beginning when I added rare data, so I’m wondering how we will weigh this appropriately.

Ryosuke Niwa

Comment 15 2012-06-22 12:02:24 PDT

(In reply to comment #14) > I can understand adding the pointer since this is showing up as slowness on some realistic performance tests. But in general I am not sure I understand how to make the tradeoff between memory use and speed for this data. What we have observed from haraken's vtable pointer removal is that we adding a single pointer doesn't have any observable impact on memory usage probably because it fits within the fragment malloc allocates for each Node. > If the data was truly rare, then clearly the speed would not matter. So the heart of this is that the data is not all that rare. Possibly. The problem is with treeScope(), which is used in a lot of places. See https://bugs.webkit.org/show_bug.cgi?id=87034. There, morrita had tried to replace m_document with m_treeScope but caused a perf. regression. Also, node lists are rare but they need to be fast when they're accessed.

Darin Adler

Comment 16 2012-06-22 17:30:52 PDT

(In reply to comment #15) > What we have observed from haraken's vtable pointer removal is that we adding a single pointer doesn't have any observable impact on memory usage probably because it fits within the fragment malloc allocates for each Node. Sorry, I don’t buy it. Eventually you cross the fragment size threshold, and then size does matter! > > If the data was truly rare, then clearly the speed would not matter. So the heart of this is that the data is not all that rare. > > Possibly. The problem is with treeScope(), which is used in a lot of places. See https://bugs.webkit.org/show_bug.cgi?id=87034. There, morrita had tried to replace m_document with m_treeScope but caused a perf. regression. This worries me. We’re saying that the tree scope feature basically requires an additional pointer per node. I think that instead of tightening up we are letting size creep up. This treeScope regression is indeed a bad thing, and I suggest we find another way to fix it. > Also, node lists are rare but they need to be fast when they're accessed. Not sure what this means in practice. I think this patch is really about the treeScope regression.

Adam Barth

Comment 17 2012-06-22 18:24:19 PDT

> If the data was truly rare, then clearly the speed would not matter. I'm sure that's true as a general proposition. It's entirely possible that the data is present only on a small fraction of nodes but those nodes (and that data) are accessed frequently. In other words, it might be rare from a space point-of-view but not rare from an access frequency point-of-view. (Note: I haven't looked into any of the details here, so the above might not be what's going on here.)

Ryosuke Niwa

Comment 18 2012-06-22 18:26:27 PDT

(In reply to comment #17) > I'm sure that's true as a general proposition. It's entirely possible that the data is present only on a small fraction of nodes but those nodes (and that data) are accessed frequently. In other words, it might be rare from a space point-of-view but not rare from an access frequency point-of-view. That's certainly the case for getElementsByTagName.

Hajime Morrita

Comment 19 2012-06-24 18:12:48 PDT

My guess is that 3% speedup on dom-modify is from treeScope() and 7% speedup on dom-query is from nodelist cache. Here is my another plan to attack treeScope() speedup. - We could make it lazily-assigned value. - We could early return when a node isn't inside shadow dom. I personally believe this approach works well and can open opportunity for further optimization. But as Darin mentioned, we need some data to see how the tradeoff goes. Bug 78984 will help for that purpose. So let us come back here once it starts working.

Ryosuke Niwa

Comment 20 2012-06-25 10:19:24 PDT

I think we'll need to do this regardless of how we fix treeScope() regression. Given that jquery depends on our current optimization of getElementsByTagName('*'), this is the only way to mitigate the performance problem in the bug 73853.

Elliott Sprehn

Comment 21 2012-11-07 15:32:53 PST

This was resolved by the removal of the rare data map in Bug 100057

Ryosuke Niwa

Comment 22 2012-11-07 16:24:41 PST

*** This bug has been marked as a duplicate of bug 100057 ***

Note You need to log in before you can comment on or make changes to this bug.

Status RESOLVED

Resolution DUPLICATE

of bug 100057

Priority P2

Severity Normal

Classification Unclassified

Version 528+ (Nightly build)

Hardware Unspecified

OS Unspecified

Product WebKit

Component DOM

Assignee

Hajime Morrita

Reported

2012-06-20 22:57 PDT

Modified

2012-11-07 16:24 PST History

CC List

13 users Show

URL

Keywords

Depends on

100057

Blocks

87034 89737

Dependencies

tree graph