Bug 213318

Summary: Text manipulation does not observe manipulated text after update
Product: WebKit Reporter: Sihui Liu <sihui_liu>
Component: HTML EditingAssignee: Sihui Liu <sihui_liu>
Status: RESOLVED FIXED    
Severity: Normal CC: cdumez, darin, esprehn+autocc, ews-watchlist, ggaren, kangil.han, mifenton, webkit-bug-importer, wenson_hsieh
Priority: P2 Keywords: InRadar
Version: WebKit Nightly Build   
Hardware: Unspecified   
OS: Unspecified   
Bug Depends on: 213333    
Bug Blocks:    
Attachments:
Description Flags
Patch
none
Patch
none
Patch
none
Patch
none
Patch
none
Patch
none
Patch for landing none

Description Sihui Liu 2020-06-17 15:13:30 PDT
...
Comment 1 Sihui Liu 2020-06-17 15:14:42 PDT
<rdar://problem/63766703>
Comment 2 Sihui Liu 2020-06-17 15:18:39 PDT
Created attachment 402159 [details]
Patch
Comment 3 Sihui Liu 2020-06-17 15:20:36 PDT
Created attachment 402160 [details]
Patch
Comment 4 Sihui Liu 2020-06-26 12:46:54 PDT
Created attachment 402887 [details]
Patch
Comment 5 Darin Adler 2020-06-26 12:49:32 PDT
Comment on attachment 402887 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=402887&action=review

> Source/WebCore/dom/CharacterData.cpp:70
> +    if (is<Text>(*this) && !m_data.contains(oldData)) {

This contains check seems peculiar. The new data might happen to contain the old data by coincidence; not sure if it’s a meaningful check.

Also, the is<Text> check seems peculiar. Normally this would be done by making the setData function virtual rather than adding a derived class check.
Comment 6 Wenson Hsieh 2020-06-26 13:50:41 PDT
Comment on attachment 402887 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=402887&action=review

>> Source/WebCore/dom/CharacterData.cpp:70
>> +    if (is<Text>(*this) && !m_data.contains(oldData)) {
> 
> This contains check seems peculiar. The new data might happen to contain the old data by coincidence; not sure if it’s a meaningful check.
> 
> Also, the is<Text> check seems peculiar. Normally this would be done by making the setData function virtual rather than adding a derived class check.

I believe `!m_data.contains(oldData)` is just a translation-specific heuristic in this case, since the translation framework does not handle text that has already been partially translated to the target language. For instance, if we have a text node like "one two" that is translated to "一 二", and then the page's script later sets this text to "three four", we'd want to re-mark it for translation so that it can be translated to "三 四". However, if the web page appends new text in the source language — for example, "一 二 three four" — it is better to avoid re-translating this text.

That said, it does sound expensive to check for `contains` every time here; maybe we should just omit the `contains` check, or only do it if the text manipulation controller exists?
Comment 7 Darin Adler 2020-06-26 13:51:30 PDT
Comment on attachment 402887 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=402887&action=review

>>> Source/WebCore/dom/CharacterData.cpp:70
>>> +    if (is<Text>(*this) && !m_data.contains(oldData)) {
>> 
>> This contains check seems peculiar. The new data might happen to contain the old data by coincidence; not sure if it’s a meaningful check.
>> 
>> Also, the is<Text> check seems peculiar. Normally this would be done by making the setData function virtual rather than adding a derived class check.
> 
> I believe `!m_data.contains(oldData)` is just a translation-specific heuristic in this case, since the translation framework does not handle text that has already been partially translated to the target language. For instance, if we have a text node like "one two" that is translated to "一 二", and then the page's script later sets this text to "three four", we'd want to re-mark it for translation so that it can be translated to "三 四". However, if the web page appends new text in the source language — for example, "一 二 three four" — it is better to avoid re-translating this text.
> 
> That said, it does sound expensive to check for `contains` every time here; maybe we should just omit the `contains` check, or only do it if the text manipulation controller exists?

Can contains be a coincidence?
Comment 8 Wenson Hsieh 2020-06-26 13:59:09 PDT
Comment on attachment 402887 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=402887&action=review

>>>> Source/WebCore/dom/CharacterData.cpp:70
>>>> +    if (is<Text>(*this) && !m_data.contains(oldData)) {
>>> 
>>> This contains check seems peculiar. The new data might happen to contain the old data by coincidence; not sure if it’s a meaningful check.
>>> 
>>> Also, the is<Text> check seems peculiar. Normally this would be done by making the setData function virtual rather than adding a derived class check.
>> 
>> I believe `!m_data.contains(oldData)` is just a translation-specific heuristic in this case, since the translation framework does not handle text that has already been partially translated to the target language. For instance, if we have a text node like "one two" that is translated to "一 二", and then the page's script later sets this text to "three four", we'd want to re-mark it for translation so that it can be translated to "三 四". However, if the web page appends new text in the source language — for example, "一 二 three four" — it is better to avoid re-translating this text.
>> 
>> That said, it does sound expensive to check for `contains` every time here; maybe we should just omit the `contains` check, or only do it if the text manipulation controller exists?
> 
> Can contains be a coincidence?

Indeed — it could very well be a coincidence. For instance, maybe the original text was just a single space or symbol...

My understanding is that this is just a heuristic that tends to work well in practice. @Sihui — were there any websites where this `contains` check is necessary for compat? If not, perhaps we could simplify this a bit by calling didUpdateContentForText if the text changed at all.
Comment 9 Sihui Liu 2020-06-26 14:08:18 PDT
(In reply to Wenson Hsieh from comment #8)
> Comment on attachment 402887 [details]
> Patch
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=402887&action=review
> 
> >>>> Source/WebCore/dom/CharacterData.cpp:70
> >>>> +    if (is<Text>(*this) && !m_data.contains(oldData)) {
> >>> 
> >>> This contains check seems peculiar. The new data might happen to contain the old data by coincidence; not sure if it’s a meaningful check.
> >>> 
> >>> Also, the is<Text> check seems peculiar. Normally this would be done by making the setData function virtual rather than adding a derived class check.
> >> 
> >> I believe `!m_data.contains(oldData)` is just a translation-specific heuristic in this case, since the translation framework does not handle text that has already been partially translated to the target language. For instance, if we have a text node like "one two" that is translated to "一 二", and then the page's script later sets this text to "three four", we'd want to re-mark it for translation so that it can be translated to "三 四". However, if the web page appends new text in the source language — for example, "一 二 three four" — it is better to avoid re-translating this text.
> >> 
> >> That said, it does sound expensive to check for `contains` every time here; maybe we should just omit the `contains` check, or only do it if the text manipulation controller exists?
> > 
> > Can contains be a coincidence?
> 
> Indeed — it could very well be a coincidence. For instance, maybe the
> original text was just a single space or symbol...
> 
> My understanding is that this is just a heuristic that tends to work well in
> practice. @Sihui — were there any websites where this `contains` check is
> necessary for compat? If not, perhaps we could simplify this a bit by
> calling didUpdateContentForText if the text changed at all.

This check is not necessary for the radar. I will remove it.
Comment 10 Sihui Liu 2020-06-26 14:52:12 PDT
(In reply to Darin Adler from comment #5)
> Comment on attachment 402887 [details]
> Patch
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=402887&action=review
> 
> > Source/WebCore/dom/CharacterData.cpp:70
> > +    if (is<Text>(*this) && !m_data.contains(oldData)) {
> 
> This contains check seems peculiar. The new data might happen to contain the
> old data by coincidence; not sure if it’s a meaningful check.
> 
> Also, the is<Text> check seems peculiar. Normally this would be done by
> making the setData function virtual rather than adding a derived class check.

I found setData is called in multiple places: CharacterData::setNodeValue, replaceChildrenWithFragment, Text::replaceWholeText, etc, so I just added a general check. I can add didUpdateContentForText to each of them if it's preferred.
Comment 11 Sihui Liu 2020-06-26 15:21:46 PDT
Created attachment 402912 [details]
Patch
Comment 12 Darin Adler 2020-06-26 15:38:37 PDT
Comment on attachment 402912 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=402912&action=review

> Source/WebCore/dom/CharacterData.cpp:190
> +    if (is<Text>(*this)) {

Can we make the setNodeValue function be virtual and overridden in the Text class, instead of doing if here? That’s the pattern for C++ programming. Or is that poorer for performance?

And/or why isn’t this in the setData function? Are there calls to setData that must not do this?
Comment 13 Sihui Liu 2020-06-26 16:25:14 PDT
(In reply to Darin Adler from comment #12)
> Comment on attachment 402912 [details]
> Patch
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=402912&action=review
> 
> > Source/WebCore/dom/CharacterData.cpp:190
> > +    if (is<Text>(*this)) {
> 
> Can we make the setNodeValue function be virtual and overridden in the Text
> class, instead of doing if here? That’s the pattern for C++ programming. Or
> is that poorer for performance?
> 

Sure, will update the patch by overriding setData in Text.

> And/or why isn’t this in the setData function? Are there calls to setData
> that must not do this?

No, I just misunderstood your previous comment.
Comment 14 Sihui Liu 2020-06-26 16:25:54 PDT
Created attachment 402922 [details]
Patch
Comment 15 Darin Adler 2020-06-28 12:50:48 PDT
Comment on attachment 402922 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=402922&action=review

> Source/WebCore/dom/CharacterData.h:35
> +    WEBCORE_EXPORT virtual void setData(const String&);

Looking even more closely, it seems like the real bottleneck is CharacterData::setDataAndUpdate, called in various cases where setData is not called.

So maybe it would be even better to put the new code there.
Comment 16 Sihui Liu 2020-06-29 09:57:08 PDT
(In reply to Darin Adler from comment #15)
> Comment on attachment 402922 [details]
> Patch
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=402922&action=review
> 
> > Source/WebCore/dom/CharacterData.h:35
> > +    WEBCORE_EXPORT virtual void setData(const String&);
> 
> Looking even more closely, it seems like the real bottleneck is
> CharacterData::setDataAndUpdate, called in various cases where setData is
> not called.
> 
> So maybe it would be even better to put the new code there.

Sure, that should work too.
Comment 17 Sihui Liu 2020-06-29 09:58:52 PDT
Created attachment 403077 [details]
Patch
Comment 18 Geoffrey Garen 2020-07-10 15:52:00 PDT
Comment on attachment 403077 [details]
Patch

r=me

Looks like all comments have been addressed and incorporated.
Comment 19 Sihui Liu 2020-07-13 10:01:48 PDT
Created attachment 404153 [details]
Patch for landing
Comment 20 EWS 2020-07-13 10:36:26 PDT
Committed r264305: <https://trac.webkit.org/changeset/264305>

All reviewed patches have been landed. Closing bug and clearing flags on attachment 404153 [details].