WebKit Bugzilla
New
Browse
Log In
×
Sign in with GitHub
or
Remember my login
Create Account
·
Forgot Password
Forgotten password account recovery
RESOLVED INVALID
46695
Optimize TextBreakIteratorQt to avoid deep-copying the strings used by cached iterators
https://bugs.webkit.org/show_bug.cgi?id=46695
Summary
Optimize TextBreakIteratorQt to avoid deep-copying the strings used by cached...
Matthew Figg
Reported
2010-09-27 21:16:53 PDT
The setUpIterator function in WebCore/platform/text/qt/TextBreakIteratorQt.cpp:53 has two problems: TextBreakIterator* setUpIterator(TextBreakIterator& iterator, QTextBoundaryFinder::BoundaryType type, const UChar* string, int length) { if (!string || !length) return 0; if (iterator.isValid() && type == iterator.type() && length == iterator.length && memcmp(string, iterator.string, length) == 0) { iterator.toStart(); return &iterator; } iterator = TextBreakIterator(type, string, length); return &iterator; } 1) The memcmp doesn't compare the entire string. It should instead be memcmp(string, iterator.string, length * sizeof(UChar)) This is easily proven by loading the following webpage. The iterator is reused for all inputs: <html> <body> <input value="search0"/> <input value="search1"/> <input value="search2"/> <input value="search3"/> <input value="search4"/> <input value="search5"/> <input value="search6"/> <input value="search7"/> <input value="search8"/> <input value="search9"/> </body> </html> 2) The iterator.string is sometimes freed between calls. If the iterator is valid, the type is the same and the length is the same, it ends up comparing the string to random memory and potentially (if the random memory matches) returning an iterator with the invalid string pointer. I'm not sure if this could cause larger issues further downstream? This bug was introduced when fixing
https://bugs.webkit.org/show_bug.cgi?id=39958
- See
http://trac.webkit.org/changeset/60847/trunk/WebCore/platform/text/qt/TextBreakIteratorQt.cpp
Attachments
Incomplete - seek feedback on this approach.
(7.94 KB, patch)
2011-02-11 13:01 PST
,
chris reiss
menard
: review-
Details
Formatted Diff
Diff
one big <div> block
(122.45 KB, text/html)
2011-02-14 08:59 PST
,
chris reiss
no flags
Details
proposed approach to safe recycling without doing a deep copy of the string
(10.00 KB, patch)
2011-02-17 07:43 PST
,
chris reiss
no flags
Details
Formatted Diff
Diff
(fix to chromium build break)
(9.94 KB, patch)
2011-02-17 07:56 PST
,
chris reiss
kling
: review-
Details
Formatted Diff
Diff
style fixes
(10.46 KB, patch)
2011-02-18 08:16 PST
,
chris reiss
no flags
Details
Formatted Diff
Diff
Using RefPtr's to safely reuse textiterators
(10.45 KB, patch)
2011-02-21 10:25 PST
,
chris reiss
no flags
Details
Formatted Diff
Diff
Using RefPtr's to safely reuse textiterators
(10.45 KB, patch)
2011-02-21 10:40 PST
,
chris reiss
kling
: review-
Details
Formatted Diff
Diff
Show Obsolete
(5)
View All
Add attachment
proposed patch, testcase, etc.
Manuel Lazzari
Comment 1
2010-12-28 01:00:55 PST
I managed to verify problem (2) using debugging tool for windows and enabling global flags: - page heap - heap tail check - heap free checking - heap parameters checking - heap validation on call To reproduce the problem and have the program to crash you can simply start to type in an input field, delete all the input via backspace and type something else: when you start retyping the program crashes. Calls stack following: 5caf8f50 MSVCR90D!free+0x00000010 10c9fe6d QtWebKitd4!WTF::fastFree+0x0000000d 1008af7d QtWebKitd4!WTF::FastAllocBase::operator delete+0x0000001d 1008af40 QtWebKitd4!WebCore::StringImpl::`scalar deleting destructor'+0x00000020 1008aeef QtWebKitd4!WebCore::StringImpl::deref+0x0000003f 1008aea1 QtWebKitd4!WTF::derefIfNotNull<WebCore::StringImpl>+0x00000011 1008ae52 QtWebKitd4!WTF::RefPtr<WebCore::StringImpl>::~RefPtr<WebCore::StringImpl>+0x00000012 1008ad5f QtWebKitd4!WebCore::String::~String+0x0000000f 108e3d08 QtWebKitd4!WebCore::RenderText::~RenderText+0x00000038 108e3848 QtWebKitd4!WebCore::RenderBR::~RenderBR+0x00000018 108e3c9f QtWebKitd4!WebCore::RenderBR::`scalar deleting destructor'+0x0000000f 1092df7b QtWebKitd4!WebCore::RenderObject::arenaDelete+0x0000013b 1092de3b QtWebKitd4!WebCore::RenderObject::destroy+0x0000013b 1095021a QtWebKitd4!WebCore::RenderText::destroy+0x000000ca 1053b93f QtWebKitd4!WebCore::Node::detach+0x0000003f 104e41d9 QtWebKitd4!WebCore::ContainerNode::detach+0x00000049 1052241a QtWebKitd4!WebCore::Element::detach+0x0000003a 104e3457 QtWebKitd4!WebCore::ContainerNode::removeChild+0x00000157 1053a4da QtWebKitd4!WebCore::Node::remove+0x0000003a 105e2a9f QtWebKitd4!WebCore::RemoveNodeCommand::doApply+0x0000008f 105aafa8 QtWebKitd4!WebCore::EditCommand::apply+0x000000c8 105959ff QtWebKitd4!WebCore::CompositeEditCommand::applyCommandToComposite+0x0000004f 105964e4 QtWebKitd4!WebCore::CompositeEditCommand::removeNode+0x00000074 10598988 QtWebKitd4!WebCore::CompositeEditCommand::removePlaceholderAt+0x00000048 105d4f53 QtWebKitd4!WebCore::InsertTextCommand::input+0x00000463 10601769 QtWebKitd4!WebCore::TypingCommand::insertTextRunWithoutNewlines+0x00000119
Manuel Lazzari
Comment 2
2010-12-28 01:04:50 PST
EDIT - call stack obviously reports who frees the string
Benjamin Poulain
Comment 3
2011-01-30 07:46:22 PST
Please follow
http://trac.webkit.org/wiki/QtWebKitBugs
when reporing bugs here (missing Qt keyword). Your bug was not tracked because of the missing Qt keyword :(
chris reiss
Comment 4
2011-01-31 10:29:06 PST
starting on this ...
chris reiss
Comment 5
2011-01-31 11:06:37 PST
Just seeing this code for the first time, but my initial take is that this if (iterator.isValid() && type == iterator.type() && length == iterator.length && memcmp(string, iterator.string, length) == 0) { iterator.toStart(); return &iterator; } should really be this ... if (iterator.isValid() && type == iterator.type() && length == iterator.length && string == iterator.string) { iterator.toStart(); return &iterator; } That is, we just want to make sure iterator.string is at the same place in memory as 'string'. Then it's safe to recycle it. Let me do some more digging and testing ...
Benjamin Poulain
Comment 6
2011-01-31 11:23:25 PST
(In reply to
comment #5
)
> should really be this ... > > if (iterator.isValid() && type == iterator.type() && length == iterator.length && string == iterator.string) { > iterator.toStart(); > return &iterator; > } > > That is, we just want to make sure iterator.string is at the same place in memory as 'string'. Then it's safe to recycle it. Let me do some more digging and testing ...
Uh... "string" is a UChar*. I don't think you'll skip the memcmp so easily.
chris reiss
Comment 7
2011-01-31 11:44:40 PST
It seems to me that : We can recycle the iterator if - a) the strings are at the same location in memory, hence identical. A comparison of (string== iterator.string) tests for this. b) the strings happen to be at different locations, but have the same content. But in this case, we run the risk that 'string' points to freed memory (as Manuel pointed out). We could work around this by doing a string copy, but this may incur a worse performance hit than recreating the iterator. Of course, a) is useless if there is some copying being done somewhere up the stack. I'm testing for this now, that is, does the test ever pass.
chris reiss
Comment 8
2011-01-31 11:47:45 PST
EDIT : we run the risk that iterator.string points to free memory.
Andreas Kling
Comment 9
2011-01-31 12:26:40 PST
(In reply to
comment #7
)
> It seems to me that : > > We can recycle the iterator if - > a) the strings are at the same location in memory, hence identical. A comparison of (string== iterator.string) tests for this.
This is not sufficient. 'string' can be == 'iterator.string' for different strings in the case where the allocator re-uses the same address in a subsequent allocation. There are two solutions, as I see it. Either we hold a strong reference to the string for the lifetime of the cached iterator or we take a deep copy. The ref fix is superior since it eliminates the bug and avoids the deep copy. Unfortunately the API of these functions is character/length based, which means that we have no string to bump the refcount on. Is it possible to alter the setUpIterator() (and friends) API to take String objects instead?
chris reiss
Comment 10
2011-01-31 14:04:10 PST
> This is not sufficient. 'string' can be == 'iterator.string' for different strings in the case where the allocator re-uses the same address in a subsequent allocation.
I see - in the case that 'string' has been freed, reallocated and overwritten by 'iterator.string'. In which case there is no first string anymore :) Is this a problem, so long as the first thing we do to the TextBreakIterator is call toStart() ? It doesn't seem to me that TextBreakIterator maintains any sort of state which clings to the string once its been rewound. (its parent, QTextBoundaryFinder, doesn't do a copy either - or we cd compare against *that*)
chris reiss
Comment 11
2011-02-11 13:01:05 PST
Created
attachment 82164
[details]
Incomplete - seek feedback on this approach. Curious to hear feedback on this approach to avoiding the deep string copy while safely reusing the TextIterator. I found that the big performance occurs hit occurs down the stack from RenderBlockLineLayout::findNextLineBreak, WebCore::isBreakable, WebCore::nextBreakablePosition and WebCore::lineBreakIterator and finally to TextBreakIteratorQt.cpp. Here i protect the memory using an optional argument and PassRefPtr . Is this too disruptive? Also - this patch isn't complete, some function signatures would need to change in other ports of TextIterator. I just wanted to see if there were any fundamental objections to this approach before I roll up a complete patch. i also have timing results comparing no caching, the current (unsafe) code, and this code if ppl are interested. Also - Am i using PassRefPtr correctly? It doesn't crash, anyway :)
WebKit Review Bot
Comment 12
2011-02-14 07:13:36 PST
Attachment 82164
[details]
did not pass style-queue: Failed to run "['Tools/Scripts/check-webkit-style', '--diff-files', u'Source/WebCore/platform/text/TextBreakIter..." exit_code: 1 Source/WebCore/platform/text/qt/TextBreakIteratorQt.cpp:25: Alphabetical sorting problem. [build/include_order] [4] Source/WebCore/platform/text/qt/TextBreakIteratorQt.cpp:50: Missing spaces around = [whitespace/operators] [4] Source/WebCore/platform/text/qt/TextBreakIteratorQt.cpp:50: Use 0 instead of NULL. [readability/null] [5] Source/WebCore/platform/text/qt/TextBreakIteratorQt.cpp:66: Missing spaces around = [whitespace/operators] [4] Source/WebCore/platform/text/qt/TextBreakIteratorQt.cpp:66: Use 0 instead of NULL. [readability/null] [5] Source/WebCore/platform/text/qt/TextBreakIteratorQt.cpp:72: Missing spaces around == [whitespace/operators] [3] Source/WebCore/rendering/break_lines.h:25: Alphabetical sorting problem. [build/include_order] [4] Source/WebCore/rendering/break_lines.h:27: Alphabetical sorting problem. [build/include_order] [4] Source/WebCore/rendering/break_lines.h:32: Code inside a namespace should not be indented. [whitespace/indent] [4] Source/WebCore/rendering/break_lines.h:32: Missing spaces around = [whitespace/operators] [4] Source/WebCore/rendering/break_lines.h:32: Use 0 instead of NULL. [readability/null] [5] Source/WebCore/rendering/break_lines.h:34: Missing spaces around = [whitespace/operators] [4] Source/WebCore/rendering/break_lines.h:34: Use 0 instead of NULL. [readability/null] [5] Source/WebCore/platform/text/TextBreakIterator.h:26: Alphabetical sorting problem. [build/include_order] [4] Source/WebCore/platform/text/TextBreakIterator.h:49: Missing spaces around = [whitespace/operators] [4] Source/WebCore/platform/text/TextBreakIterator.h:49: Use 0 instead of NULL. [readability/null] [5] Source/WebCore/platform/text/gtk/TextBreakIteratorGtk.cpp:29: Alphabetical sorting problem. [build/include_order] [4] Source/WebCore/platform/text/gtk/TextBreakIteratorGtk.cpp:244: Missing spaces around = [whitespace/operators] [4] Source/WebCore/platform/text/gtk/TextBreakIteratorGtk.cpp:244: Use 0 instead of NULL. [readability/null] [5] Total errors found: 19 in 6 files If any of these errors are false positives, please file a bug against check-webkit-style.
WebKit Review Bot
Comment 13
2011-02-14 07:16:43 PST
Attachment 82164
[details]
did not build on chromium: Build output:
http://queues.webkit.org/results/7907608
Collabora GTK+ EWS bot
Comment 14
2011-02-14 07:22:18 PST
Attachment 82164
[details]
did not build on gtk: Build output:
http://queues.webkit.org/results/7905638
Build Bot
Comment 15
2011-02-14 07:38:01 PST
Attachment 82164
[details]
did not build on win: Build output:
http://queues.webkit.org/results/7908612
chris reiss
Comment 16
2011-02-14 08:57:45 PST
Here are some timing results which motivate this caching. To recap, the code as it is has two errors, the length if the compare is incorrect and there is a risk of a dangling pointer if iterator.string gets destroyed. The simple fix is to *take a deep copy of the string in Iterator. The more extensive fix is the above patch (or something like it):
https://bugs.webkit.org/attachment.cgi?id=82164
, which (tries to) protect the string in memory using RefPtr's, and only recycles Iterators in the path where the performance hit seems to occur. Here is time spent in TextBreakIterator* setUpIterator(...), using QtTestBrowser on Ubuntu (debug build, 2.8Gz) for the different alternatives : ---- URL :
http://en.wikipedia.org/wiki/List_of_science_fiction_television_programs
No caching : 36 milliseconds (here we always create a new iterator.) deep copy : 23 milliseconds RefPtr appraoch : 16 milliseconds here is another set of results, using one enormous <div> (attached) : ---- No caching : 700 milliseconds (ouch) deep copy : 9 milliseconds RefPtr's : 5 milliseconds.
chris reiss
Comment 17
2011-02-14 08:59:07 PST
Created
attachment 82322
[details]
one big <div> block
Alexis Menard (darktears)
Comment 18
2011-02-14 10:46:51 PST
Comment on
attachment 82164
[details]
Incomplete - seek feedback on this approach. It breaks the build in many platforms you should take a look.
chris reiss
Comment 19
2011-02-17 07:43:38 PST
Created
attachment 82802
[details]
proposed approach to safe recycling without doing a deep copy of the string Posting this as sort of a straw man, a possible way to safely recycle the TextIterator without doing a deep copy of the string. Is this too disruptive to other ports? Am I using RefPtr's correctly. Please also see timing results above ...
WebKit Review Bot
Comment 20
2011-02-17 07:47:35 PST
Attachment 82802
[details]
did not build on chromium: Build output:
http://queues.webkit.org/results/7927642
chris reiss
Comment 21
2011-02-17 07:56:18 PST
Created
attachment 82806
[details]
(fix to chromium build break)
Andreas Kling
Comment 22
2011-02-18 02:47:23 PST
Comment on
attachment 82806
[details]
(fix to chromium build break) View in context:
https://bugs.webkit.org/attachment.cgi?id=82806&action=review
> Source/WebCore/ChangeLog:8 > + No new tests. (OOPS!)
This line should be replaced, either mention what tests are added, or explain why none are needed.
> Source/WebCore/ChangeLog:9 > +
I'm missing a comment block here explaining that you're adding a way for TextBreakIterators to hold a reference to the last string they operate on for reuse purposes.
> Source/WebCore/platform/text/qt/TextBreakIteratorQt.cpp:48 > + TextBreakIterator(QTextBoundaryFinder::BoundaryType type, const UChar* string, int length, PassRefPtr<StringImpl> origText = 0)
"origText" is a bad variable name, we stay away from abbreviations in WebKit style. Repeated below.
> Source/WebCore/platform/text/gtk/TextBreakIteratorGtk.cpp:244 > +TextBreakIterator* lineBreakIterator(const UChar* string, int length, PassRefPtr<StringImpl> origText = 0 /* used for memory-safe recycling of Iterator */)
No need for the " = 0 /* used for memory-safe recycling of Iterator */" here.
> Source/WebCore/rendering/break_lines.h:28 > +typedef PassRefPtr<StringImpl> Blerk;
Blerk?
chris reiss
Comment 23
2011-02-18 08:16:30 PST
Created
attachment 82961
[details]
style fixes (and, um, de-blerked :) )
Alexis Menard (darktears)
Comment 24
2011-02-21 09:17:46 PST
Comment on
attachment 82961
[details]
style fixes View in context:
https://bugs.webkit.org/attachment.cgi?id=82961&action=review
> Source/WebCore/ChangeLog:9 > + to the string they operate on. In this patch, the reference is only used to
Extra space here.
chris reiss
Comment 25
2011-02-21 10:25:17 PST
Created
attachment 83180
[details]
Using RefPtr's to safely reuse textiterators Fix spacing in changelog.
Alexis Menard (darktears)
Comment 26
2011-02-21 10:32:58 PST
Comment on
attachment 83180
[details]
Using RefPtr's to safely reuse textiterators View in context:
https://bugs.webkit.org/attachment.cgi?id=83180&action=review
> Source/WebCore/ChangeLog:9 > + to the string they operate on. In this patch, the reference is only used to
Sorry it's one space after the dot :(.
chris reiss
Comment 27
2011-02-21 10:40:38 PST
Created
attachment 83185
[details]
Using RefPtr's to safely reuse textiterators Another correction to spaces in Changelog. Happy to fix it, thanks Alexis.
Eric Seidel (no email)
Comment 28
2011-02-24 02:51:32 PST
I'm not sure I understand this patch. It's not Qt only however since you affect all platforms.
chris reiss
Comment 29
2011-02-24 08:39:34 PST
This is an optimization. When we're rendering lots of text in RenderBlock::findNextLineBreak( ), we make many calls to isBreakable( ), which in turns creates a lot of TextBreakIterators. This got very expensive in the Qt port, for a <div> with lots of text, 0.7 seconds were spent in setUpIterator. See
comment 16
. Within the RenderBlock::findNextLine( ) loop, we're passing the same string repeatedly to isBreakable( ). So we'd like to reuse the TextBreakIterator from last time. We'd like setupTextIterator to recognize that the string hasn't changed. The wrinkle is that a raw UChar* is passed to setUpIterator. The pointer may not have changed since last time, but there's an outside chance that the memory has been deleted and reallocated (isBreakable() has lots of call sites.) So setupTextIterator( ) is never sure that it can reuse the last iterator. This patch gives the call site the option to pass a RefPtr string to isBreakable( ). If the parameter is present, setupTextIterator will use it to avoid recreating iterators on the same string. So I needed to add that parameter to code outside the Qt port. The hope is - it's essentially a NOOP for other ports so they wouldn't mind too much, and perhaps even find it useful at some point.
mitz
Comment 30
2011-02-24 08:50:01 PST
I think the "optimization" has just been removed in
http://trac.webkit.org/changeset/79567
and Ned Holbrook has a patch that provides a safe, cross-platform optimization.
mitz
Comment 31
2011-02-24 08:51:29 PST
See
bug 54912
.
Andreas Kling
Comment 32
2011-03-18 09:13:55 PDT
Comment on
attachment 83185
[details]
Using RefPtr's to safely reuse textiterators This patch needs to be updated to match the current code in TextBreakIteratorQt. I'm not sure the patch makes sense anymore though, is there still a performance issue with the cross-platform optimization in place?
Andreas Kling
Comment 33
2011-03-18 09:16:14 PDT
Unblocking 2.2 release as this is no longer about the crash that was fixed in
bug 55139
.)
Jocelyn Turcotte
Comment 34
2014-02-03 03:50:48 PST
=== Bulk closing of Qt bugs === If you believe that this bug report is still relevant for a non-Qt port of webkit.org, please re-open it. If you believe that this is still an important QtWebKit bug, please fill a new report at
https://bugreports.qt-project.org
and add a link to this issue. See
http://qt-project.org/wiki/ReportingBugsInQt
for additional guidelines.
Note
You need to
log in
before you can comment on or make changes to this bug.
Top of Page
Format For Printing
XML
Clone This Bug