| Summary: | Text manipulation does not observe manipulated text after update | ||||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Product: | WebKit | Reporter: | Sihui Liu <sihui_liu> | ||||||||||||||||
| Component: | HTML Editing | Assignee: | Sihui Liu <sihui_liu> | ||||||||||||||||
| Status: | RESOLVED FIXED | ||||||||||||||||||
| Severity: | Normal | CC: | cdumez, darin, esprehn+autocc, ews-watchlist, ggaren, kangil.han, mifenton, webkit-bug-importer, wenson_hsieh | ||||||||||||||||
| Priority: | P2 | Keywords: | InRadar | ||||||||||||||||
| Version: | WebKit Nightly Build | ||||||||||||||||||
| Hardware: | Unspecified | ||||||||||||||||||
| OS: | Unspecified | ||||||||||||||||||
| Bug Depends on: | 213333 | ||||||||||||||||||
| Bug Blocks: | |||||||||||||||||||
| Attachments: |
|
||||||||||||||||||
|
Description
Sihui Liu
2020-06-17 15:13:30 PDT
Created attachment 402159 [details]
Patch
Created attachment 402160 [details]
Patch
Created attachment 402887 [details]
Patch
Comment on attachment 402887 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=402887&action=review > Source/WebCore/dom/CharacterData.cpp:70 > + if (is<Text>(*this) && !m_data.contains(oldData)) { This contains check seems peculiar. The new data might happen to contain the old data by coincidence; not sure if it’s a meaningful check. Also, the is<Text> check seems peculiar. Normally this would be done by making the setData function virtual rather than adding a derived class check. Comment on attachment 402887 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=402887&action=review >> Source/WebCore/dom/CharacterData.cpp:70 >> + if (is<Text>(*this) && !m_data.contains(oldData)) { > > This contains check seems peculiar. The new data might happen to contain the old data by coincidence; not sure if it’s a meaningful check. > > Also, the is<Text> check seems peculiar. Normally this would be done by making the setData function virtual rather than adding a derived class check. I believe `!m_data.contains(oldData)` is just a translation-specific heuristic in this case, since the translation framework does not handle text that has already been partially translated to the target language. For instance, if we have a text node like "one two" that is translated to "一 二", and then the page's script later sets this text to "three four", we'd want to re-mark it for translation so that it can be translated to "三 四". However, if the web page appends new text in the source language — for example, "一 二 three four" — it is better to avoid re-translating this text. That said, it does sound expensive to check for `contains` every time here; maybe we should just omit the `contains` check, or only do it if the text manipulation controller exists? Comment on attachment 402887 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=402887&action=review >>> Source/WebCore/dom/CharacterData.cpp:70 >>> + if (is<Text>(*this) && !m_data.contains(oldData)) { >> >> This contains check seems peculiar. The new data might happen to contain the old data by coincidence; not sure if it’s a meaningful check. >> >> Also, the is<Text> check seems peculiar. Normally this would be done by making the setData function virtual rather than adding a derived class check. > > I believe `!m_data.contains(oldData)` is just a translation-specific heuristic in this case, since the translation framework does not handle text that has already been partially translated to the target language. For instance, if we have a text node like "one two" that is translated to "一 二", and then the page's script later sets this text to "three four", we'd want to re-mark it for translation so that it can be translated to "三 四". However, if the web page appends new text in the source language — for example, "一 二 three four" — it is better to avoid re-translating this text. > > That said, it does sound expensive to check for `contains` every time here; maybe we should just omit the `contains` check, or only do it if the text manipulation controller exists? Can contains be a coincidence? Comment on attachment 402887 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=402887&action=review >>>> Source/WebCore/dom/CharacterData.cpp:70 >>>> + if (is<Text>(*this) && !m_data.contains(oldData)) { >>> >>> This contains check seems peculiar. The new data might happen to contain the old data by coincidence; not sure if it’s a meaningful check. >>> >>> Also, the is<Text> check seems peculiar. Normally this would be done by making the setData function virtual rather than adding a derived class check. >> >> I believe `!m_data.contains(oldData)` is just a translation-specific heuristic in this case, since the translation framework does not handle text that has already been partially translated to the target language. For instance, if we have a text node like "one two" that is translated to "一 二", and then the page's script later sets this text to "three four", we'd want to re-mark it for translation so that it can be translated to "三 四". However, if the web page appends new text in the source language — for example, "一 二 three four" — it is better to avoid re-translating this text. >> >> That said, it does sound expensive to check for `contains` every time here; maybe we should just omit the `contains` check, or only do it if the text manipulation controller exists? > > Can contains be a coincidence? Indeed — it could very well be a coincidence. For instance, maybe the original text was just a single space or symbol... My understanding is that this is just a heuristic that tends to work well in practice. @Sihui — were there any websites where this `contains` check is necessary for compat? If not, perhaps we could simplify this a bit by calling didUpdateContentForText if the text changed at all. (In reply to Wenson Hsieh from comment #8) > Comment on attachment 402887 [details] > Patch > > View in context: > https://bugs.webkit.org/attachment.cgi?id=402887&action=review > > >>>> Source/WebCore/dom/CharacterData.cpp:70 > >>>> + if (is<Text>(*this) && !m_data.contains(oldData)) { > >>> > >>> This contains check seems peculiar. The new data might happen to contain the old data by coincidence; not sure if it’s a meaningful check. > >>> > >>> Also, the is<Text> check seems peculiar. Normally this would be done by making the setData function virtual rather than adding a derived class check. > >> > >> I believe `!m_data.contains(oldData)` is just a translation-specific heuristic in this case, since the translation framework does not handle text that has already been partially translated to the target language. For instance, if we have a text node like "one two" that is translated to "一 二", and then the page's script later sets this text to "three four", we'd want to re-mark it for translation so that it can be translated to "三 四". However, if the web page appends new text in the source language — for example, "一 二 three four" — it is better to avoid re-translating this text. > >> > >> That said, it does sound expensive to check for `contains` every time here; maybe we should just omit the `contains` check, or only do it if the text manipulation controller exists? > > > > Can contains be a coincidence? > > Indeed — it could very well be a coincidence. For instance, maybe the > original text was just a single space or symbol... > > My understanding is that this is just a heuristic that tends to work well in > practice. @Sihui — were there any websites where this `contains` check is > necessary for compat? If not, perhaps we could simplify this a bit by > calling didUpdateContentForText if the text changed at all. This check is not necessary for the radar. I will remove it. (In reply to Darin Adler from comment #5) > Comment on attachment 402887 [details] > Patch > > View in context: > https://bugs.webkit.org/attachment.cgi?id=402887&action=review > > > Source/WebCore/dom/CharacterData.cpp:70 > > + if (is<Text>(*this) && !m_data.contains(oldData)) { > > This contains check seems peculiar. The new data might happen to contain the > old data by coincidence; not sure if it’s a meaningful check. > > Also, the is<Text> check seems peculiar. Normally this would be done by > making the setData function virtual rather than adding a derived class check. I found setData is called in multiple places: CharacterData::setNodeValue, replaceChildrenWithFragment, Text::replaceWholeText, etc, so I just added a general check. I can add didUpdateContentForText to each of them if it's preferred. Created attachment 402912 [details]
Patch
Comment on attachment 402912 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=402912&action=review > Source/WebCore/dom/CharacterData.cpp:190 > + if (is<Text>(*this)) { Can we make the setNodeValue function be virtual and overridden in the Text class, instead of doing if here? That’s the pattern for C++ programming. Or is that poorer for performance? And/or why isn’t this in the setData function? Are there calls to setData that must not do this? (In reply to Darin Adler from comment #12) > Comment on attachment 402912 [details] > Patch > > View in context: > https://bugs.webkit.org/attachment.cgi?id=402912&action=review > > > Source/WebCore/dom/CharacterData.cpp:190 > > + if (is<Text>(*this)) { > > Can we make the setNodeValue function be virtual and overridden in the Text > class, instead of doing if here? That’s the pattern for C++ programming. Or > is that poorer for performance? > Sure, will update the patch by overriding setData in Text. > And/or why isn’t this in the setData function? Are there calls to setData > that must not do this? No, I just misunderstood your previous comment. Created attachment 402922 [details]
Patch
Comment on attachment 402922 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=402922&action=review > Source/WebCore/dom/CharacterData.h:35 > + WEBCORE_EXPORT virtual void setData(const String&); Looking even more closely, it seems like the real bottleneck is CharacterData::setDataAndUpdate, called in various cases where setData is not called. So maybe it would be even better to put the new code there. (In reply to Darin Adler from comment #15) > Comment on attachment 402922 [details] > Patch > > View in context: > https://bugs.webkit.org/attachment.cgi?id=402922&action=review > > > Source/WebCore/dom/CharacterData.h:35 > > + WEBCORE_EXPORT virtual void setData(const String&); > > Looking even more closely, it seems like the real bottleneck is > CharacterData::setDataAndUpdate, called in various cases where setData is > not called. > > So maybe it would be even better to put the new code there. Sure, that should work too. Created attachment 403077 [details]
Patch
Comment on attachment 403077 [details]
Patch
r=me
Looks like all comments have been addressed and incorporated.
Created attachment 404153 [details]
Patch for landing
Committed r264305: <https://trac.webkit.org/changeset/264305> All reviewed patches have been landed. Closing bug and clearing flags on attachment 404153 [details]. |