NEW 13351
O(N^2) behavior seen in deeply nested trees when constructing renderers, laying out renderers and destroying renderers
https://bugs.webkit.org/show_bug.cgi?id=13351
Summary O(N^2) behavior seen in deeply nested trees when constructing renderers, layi...
Geoffrey Garen
Reported 2007-04-14 16:30:12 PDT
An equal number of un-nested tags shows very fast O(N) behavior. I'd like to use this bug as a master bug for all the different reasons we might see O(N^2) behavior with nested tags. As we diagnose specific causes, we should break off new bugs to track them. I think our first task should be to test parsing, rendering, and destroying individually, and then to profile what's slow.
Attachments
nesting tests (309.04 KB, application/octet-stream)
2007-04-14 16:35 PDT, Geoffrey Garen
no flags
nesting results (410.75 KB, image/jpeg)
2007-04-14 16:36 PDT, Geoffrey Garen
no flags
Geoffrey Garen
Comment 1 2007-04-14 16:35:15 PDT
Created attachment 14034 [details] nesting tests Attached the tests I wrote, along with results. Basic summary: not nested (span): 5ms nested (span): 99ms nested (div): 212ms nested (div span): 38338ms The div/span pair is particularly slow because mixing inline and block children forces the render tree into contortions.
Geoffrey Garen
Comment 2 2007-04-14 16:36:03 PDT
Created attachment 14035 [details] nesting results Attached image of results graph.
Dave Hyatt
Comment 3 2007-04-14 16:56:59 PDT
Some comments on each case: (1) The nested span case is going to result in orders of magnitude more line boxes. If you nest to an extreme depth, then every line will have a gigantic nesting level. It should be possible to collapse together nested spans with identical styles and line box positions (e.g., if one span's ypos and height precisely match its enclosing span's ypos and height) and that would not do any painting of their own, but this change would obviously be involved and highly dangerous. That's how I would attack the problem though... avoid constructing line boxes for spans that would not have painted anything and that have a positioning and style that is identical to their parent inlines. (2) Nested divs (I assume there is content in each div like text) would result in a lot of anonymous blocks being made to enclose text. Nothing comes to mind regarding fixing it other than the hope that something obvious shows up on profiles. This behavior should not be O(n^2), so if it is, something is going wrong. (3) Doing block - inline - block - inline is going to be the hardest one to fix. The rendering contortions are necessary, but maybe profiling will show areas for improvement.
Dave Hyatt
Comment 4 2007-04-14 17:00:37 PDT
The principal reason I never bothered optimizing (1) (the nested inline case) is that this really only occurs if something has gone horribly wrong with a malformed page. In normal usage inlines just don't nest deeply at all, and setting a maximum nesting depth of 200 (like Firefox does) would probably be good enough to deal with this particular problem for now.
Geoffrey Garen
Comment 5 2007-04-14 18:35:10 PDT
(I assume there is content in each div like text) I put a single letter 'a' inside each tag.
Geoffrey Garen
Comment 6 2007-04-15 14:59:57 PDT
Let's use this bug to track the O(N^2) performance issue, regardless of whether we set a cap. We can track the cap -- or whatever solution we devise for hanging -- with bug 13337.
Dave Hyatt
Comment 7 2007-04-20 00:32:41 PDT
Analyzing the nested span case. Because everything is on one line, some pathological first-line behavior is occurring. Not sure this would occur in any real-world pages, since you inevitably get off the first line pretty quickly, but nevertheless: firstLineStyle() is O(n^2) verticalPositionHint() is O(n^2)
Dave Hyatt
Comment 8 2007-04-20 00:34:50 PDT
Also, parsing is not O(n^2) in the nested span case. It's just as fast as the span case. It's building up the render tree and laying it out that is O(n^2),.
Dave Hyatt
Comment 9 2007-04-20 00:45:17 PDT
isContentEditable is O(n^2). inlineWidth() in bidi.cpp is O(n^2).
Dave Hyatt
Comment 10 2007-04-20 01:07:26 PDT
I should clarify this and say that these functions are O(n) but are often called on each node in the chain.
Dave Hyatt
Comment 11 2007-04-20 01:08:07 PDT
Also, all four of these are small potatoes... removing them has little effect on the overall perf numbers, but I thought I'd list them as I keep investigating.
Dave Hyatt
Comment 12 2007-04-20 01:39:02 PDT
Actually these are huge. Geoff's tests don't measure layout (which is actually the bulk of the time).
Dave Hyatt
Comment 13 2007-04-20 01:41:26 PDT
Results for the nested span case for 8192 nesting level: Time to parse 8192 nested elements: 9ms Time to construct renderers for 8192 nested elements: 664ms Time to lay out 8192 nested elements: 3017ms TOTAL TIME 8192 nested elements: 3690ms Results for the non-nested span case for 8192 nesting level: Time to parse 8192 nested elements: 7ms Time to construct renderers 8192 nested elements: 5ms Time to lay out 8192 nested elements: 13ms TOTAL TIME 8192 nested elements: 25ms This is terrible. The layout result is just stunningly bad.
Dave Hyatt
Comment 14 2007-04-20 01:44:52 PDT
Here are the new results when the four functions mentioned are fixed (I just hacked them to be O(1)): Time to parse 8192 nested elements: 11ms Time to construct renderers for 8192 nested elements: 670ms Time to lay out 8192 nested elements: 381ms TOTAL TIME 8192 nested elements: 1063ms
Dave Hyatt
Comment 15 2007-04-20 01:55:22 PDT
Here are div results for construction and layout. Unnested case: Time to parse 8192 nested elements: 10ms Time to construct renderers for 8192 nested elements: 5ms Time to lay out 8192 nested elements: 26ms TOTAL TIME 8192 nested elements: 41ms Nested case: Time to parse 8192 nested elements: 9ms Time to construct renderers for 8192 nested elements: 1230ms Time to lay out 8192 nested elements: 51ms TOTAL TIME 8192 nested elements: 1291ms Note that layout of nested blocks is not adversely affected by render tree depth, and that the O(n^2) problem here now that layout has been taken into account is actually not as bad as the nested span case!
Dave Hyatt
Comment 16 2007-04-20 02:17:47 PDT
Numbers for the span/div case. Unnested: Time to parse 4096 nested elements: 8ms Time to construct renderers for 4096 nested elements: 9ms Time to lay out 4096 nested elements: 39ms TOTAL TIME 4096 nested elements: 56ms Nested: Time to parse 4096 nested elements: 9ms Time to construct renderers for 4096 nested elements: 1759ms Time to lay out 4096 nested elements: 73055ms TOTAL TIME 4096 nested elements: 74823ms Note the incredible result here. Layout itself is pathologically slow. The render tree construction, while bad, is nothing compared to the cost of layout.
Dave Hyatt
Comment 17 2007-04-20 02:47:16 PDT
By the way without the O(1) fixes to the nested span case, the inline/block nested case jumps to 230 seconds. So both are affected by those functions.
Dave Hyatt
Comment 18 2007-04-20 03:12:18 PDT
We are spending most of our time calculating minmaxwidth in this pathological case. Here are results if we could envision doing calcMinMaxWidth lazily. Time to parse 4096 nested elements: 9ms Time to construct renderers for 4096 nested elements: 1608ms Time to lay out 4096 nested elements: 14993ms TOTAL TIME 4096 nested elements: 16610ms
Dave Hyatt
Comment 19 2007-04-20 03:22:37 PDT
We always call nextOnLineExists/prevOnLineExists to try to figure out which edges of enclosing inlines should be included to get borders, padding and margins correct. Since 99% of the time, these spans have no borders, padding or margins, this should not be necessary. Simply turning off these calls (in determineSpacingForFlowBoxes) yields the following (very exciting) results: Time to parse 4096 nested elements: 10ms Time to construct renderers for 4096 nested elements: 1768ms Time to lay out 4096 nested elements: 2041ms TOTAL TIME 4096 nested elements: 3820ms This is down from 230 seconds assuming every problem described in this bug so far is fixed.
Dave Hyatt
Comment 20 2007-04-20 03:31:55 PDT
Here is the result for 8192 nested blocks inside inlines with all "fixes" applied (a level I doubt any of these real-world hangs have even achieved). Time to parse 8192 nested elements: 17ms Time to construct renderers for 8192 nested elements: 26192ms Time to lay out 8192 nested elements: 36236ms TOTAL TIME 8192 nested elements: 63005ms
Dave Hyatt
Comment 21 2007-04-20 03:35:32 PDT
Even if we don't do calcInlineMinMaxWidth lazily, there are some easy wins. The isBR() check is redundant and can be removed for a speedup. isInlineFlow() could be sped up to check a bit instead of making a virtual function call.
Dave Hyatt
Comment 22 2007-04-20 04:43:53 PDT
Another observation: it is both expected and natural that the nested-block-inside-inline case would be O(n^2). You will actually make n^2 RenderObjects in this case. Note that the block inside inline behavior is currently n^4, and that's why the slowdown is so pathological. The tree itself explodes into an O(n^2) phenomenon (unavoidable), and then O(n^2) algorithms run on each one (avoidable).
Dave Hyatt
Comment 23 2007-04-20 22:43:09 PDT
I just thought of a brilliant way to handle continuations. :) When a block is encountered inside an inline, wrap it in an anonymous inline-block. Make sure the anonymous inline-block has the following special characteristics: (1) It is set to width:100% so that it fills the width of the container. (2) It is flagged as special so that forced breaks occur before and after the inline-block. I think that should give us continuations practically for free (and even fix bugs where an enclosing inline was a relpositioned layer, etc.)
Gavin Sherlock
Comment 24 2009-02-01 10:06:26 PST
The performance of the test cases has improved enormously (tested with r40471 on 10.5.6, on a MacPro): not nested (span): Time to parse, render, and destroy 1024 nested elements: 1ms Time to parse, render, and destroy 2048 nested elements: 1ms Time to parse, render, and destroy 3072 nested elements: 3ms Time to parse, render, and destroy 4096 nested elements: 3ms Time to parse, render, and destroy 5120 nested elements: 5ms Time to parse, render, and destroy 6144 nested elements: 5ms Time to parse, render, and destroy 7168 nested elements: 7ms Time to parse, render, and destroy 8192 nested elements: 8ms Time to parse, render, and destroy 9216 nested elements: 8ms Time to parse, render, and destroy 10240 nested elements: 9ms Time to parse, render, and destroy 11264 nested elements: 10ms Time to parse, render, and destroy 12288 nested elements: 11ms Time to parse, render, and destroy 13312 nested elements: 12ms Time to parse, render, and destroy 14336 nested elements: 12ms Time to parse, render, and destroy 15360 nested elements: 13ms Time to parse, render, and destroy 16384 nested elements: 14ms nested (span): Time to parse, render, and destroy 1024 nested elements: 2ms Time to parse, render, and destroy 2048 nested elements: 7ms Time to parse, render, and destroy 3072 nested elements: 15ms Time to parse, render, and destroy 4096 nested elements: 27ms Time to parse, render, and destroy 5120 nested elements: 40ms Time to parse, render, and destroy 6144 nested elements: 52ms Time to parse, render, and destroy 7168 nested elements: 80ms Time to parse, render, and destroy 8192 nested elements: 98ms Time to parse, render, and destroy 9216 nested elements: 131ms Time to parse, render, and destroy 10240 nested elements: 140ms Time to parse, render, and destroy 11264 nested elements: 194ms Time to parse, render, and destroy 12288 nested elements: 227ms Time to parse, render, and destroy 13312 nested elements: 232ms Time to parse, render, and destroy 14336 nested elements: 307ms Time to parse, render, and destroy 15360 nested elements: 347ms Time to parse, render, and destroy 16384 nested elements: 395ms nested (div): Time to parse, render, and destroy 1024 nested elements: 2ms Time to parse, render, and destroy 2048 nested elements: 7ms Time to parse, render, and destroy 3072 nested elements: 16ms Time to parse, render, and destroy 4096 nested elements: 27ms Time to parse, render, and destroy 5120 nested elements: 42ms Time to parse, render, and destroy 6144 nested elements: 55ms Time to parse, render, and destroy 7168 nested elements: 79ms Time to parse, render, and destroy 8192 nested elements: 102ms Time to parse, render, and destroy 9216 nested elements: 129ms Time to parse, render, and destroy 10240 nested elements: 156ms Time to parse, render, and destroy 11264 nested elements: 189ms Time to parse, render, and destroy 12288 nested elements: 216ms Time to parse, render, and destroy 13312 nested elements: 265ms Time to parse, render, and destroy 14336 nested elements: 304ms Time to parse, render, and destroy 15360 nested elements: 319ms Time to parse, render, and destroy 16384 nested elements: 396ms nested (div span): Time to parse, render, and destroy 512 nested elements: 2ms Time to parse, render, and destroy 1024 nested elements: 4ms Time to parse, render, and destroy 1536 nested elements: 8ms Time to parse, render, and destroy 2048 nested elements: 15ms Time to parse, render, and destroy 2560 nested elements: 23ms It's still O(n^2) for everything except the not nested case, but these times are way better than they used to be, and significantly better than Safari 3.2.1 on the same machine. For example, the nested (div span) test case in Safari 3.2.1 gives: Time to parse, render, and destroy 512 nested elements: 46ms Time to parse, render, and destroy 1024 nested elements: 141ms Time to parse, render, and destroy 1536 nested elements: 251ms Time to parse, render, and destroy 2048 nested elements: 367ms Time to parse, render, and destroy 2560 nested elements: 491ms while the nested (div) case in Safari 3.2.1 gives: Time to parse, render, and destroy 1024 nested elements: 19ms Time to parse, render, and destroy 2048 nested elements: 72ms Time to parse, render, and destroy 3072 nested elements: 160ms Time to parse, render, and destroy 4096 nested elements: 281ms Time to parse, render, and destroy 5120 nested elements: 443ms Time to parse, render, and destroy 6144 nested elements: 630ms Time to parse, render, and destroy 7168 nested elements: 856ms Time to parse, render, and destroy 8192 nested elements: 1126ms Time to parse, render, and destroy 9216 nested elements: 1427ms Time to parse, render, and destroy 10240 nested elements: 1783ms Time to parse, render, and destroy 11264 nested elements: 2252ms Time to parse, render, and destroy 12288 nested elements: 2705ms Time to parse, render, and destroy 13312 nested elements: 3412ms Time to parse, render, and destroy 14336 nested elements: 4891ms Time to parse, render, and destroy 15360 nested elements: 6028ms Time to parse, render, and destroy 16384 nested elements: 7522ms so there have been enormous improvements.
Dave Hyatt
Comment 25 2009-08-22 22:21:04 PDT
It's just the new parser depth cap of 4096. This depth cap is basically making the tests faster.
Ryosuke Niwa
Comment 26 2012-04-15 11:12:06 PDT
It seems like the first step of resolving this bug will be to add a WebKit-style performance test. http://trac.webkit.org/wiki/Writing%20Performance%20Tests In fact, this might be a good demo on my perf. test talk at the contributor's meeting :)
Note You need to log in before you can comment on or make changes to this bug.