Bug 210502

Summary: [GTK] TextNode::splitText() can lose content visually
Product: WebKit
Reporter: Milan Crha <mcrha>
Component: WebKitGTK
Assignee: Nobody <webkit-unassigned>
Status: NEW
Severity: Normal
CC: bugs-noreply
Priority: P2
Version: Other
Hardware: Unspecified
OS: Unspecified
Attachments:
  Description: How it looks like in Firefox
  Flags: none

Description Milan Crha 2020-04-14 09:42:09 PDT
Created attachment 396429
How it looks like in Firefox

I just noticed that calling splitText() in the middle of a multi-unit character (one stored as two UTF-16 code units, such as an Emoji) causes content to be lost visually on both sides of the split. This is with trunk at r259630.

Steps:
a) run: MiniBrowser --editor-mode
b) open the Inspector and in its console run: document.body.innerText = "😏😉🙂"
c) still in the inspector run: document.body.firstChild.splitText(2)
   * all is fine, the Elements tab shows the text properly split into one and two Emojis
d) still in the inspector run: document.body.firstChild.nextSibling.splitText(1)

The outcome after d) is three text nodes in the body: the first shows the first Emoji, the second appears to be empty text, and the third appears to hold two characters that look like whitespace. The actual values are:

   document.body.firstChild.nextSibling.nodeValue.length
   1
   document.body.firstChild.nextSibling.nodeValue.charCodeAt(0)
   55357

   document.body.firstChild.nextSibling.nextSibling.nodeValue.length
   3
   document.body.firstChild.nextSibling.nextSibling.nodeValue.charCodeAt(0)
   56841
   document.body.firstChild.nextSibling.nextSibling.nodeValue.charCodeAt(1)
   55357
   document.body.firstChild.nextSibling.nextSibling.nodeValue.charCodeAt(2)
   56898
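
The values above are consistent with the offsets being counted in UTF-16 code units, so that step d) cuts a surrogate pair in half. A minimal sketch of the same two splits on a plain string, runnable outside the browser; splitAt() is a hypothetical stand-in for splitText(), not a DOM API:

```javascript
// Build the three Emojis from code points so the source stays ASCII-only.
const text = String.fromCodePoint(0x1F60F, 0x1F609, 0x1F642);

// Hypothetical helper: mimics splitText() on a plain string.
// The offset is counted in UTF-16 code units, as in the DOM.
function splitAt(data, offset) {
  return [data.slice(0, offset), data.slice(offset)];
}

const [first, rest] = splitAt(text, 2);   // like step c)
const [second, third] = splitAt(rest, 1); // like step d)

// Collect the code units of a string as numbers.
const units = (s) => Array.from({ length: s.length }, (_, i) => s.charCodeAt(i));

console.log(first.length, units(first));   // 2 [ 55357, 56847 ]
console.log(second.length, units(second)); // 1 [ 55357 ]  (lone high surrogate)
console.log(third.length, units(third));   // 3 [ 56841, 55357, 56898 ]
```

The second piece holds only a high surrogate and the third starts with a lone low surrogate, which matches the charCodeAt() values reported above.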

I do not know what to expect from this, but it is not ideal that one can break "a letter" in the middle and have it lost completely, together with the next letter.
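
As a possible caller-side workaround (an assumption, not something WebKit does), the offset could be moved to the nearest preceding code point boundary before calling splitText(). A sketch with a hypothetical helper, snapToCodePointBoundary():

```javascript
// Hypothetical helper: if `offset` falls between a high and a low surrogate,
// move it back by one code unit so the pair stays together.
function snapToCodePointBoundary(data, offset) {
  if (offset > 0 && offset < data.length) {
    const prev = data.charCodeAt(offset - 1);
    const next = data.charCodeAt(offset);
    const prevIsHigh = prev >= 0xD800 && prev <= 0xDBFF;
    const nextIsLow = next >= 0xDC00 && next <= 0xDFFF;
    if (prevIsHigh && nextIsLow) {
      return offset - 1;
    }
  }
  return offset;
}

// Two Emojis, four UTF-16 code units.
const twoEmojis = String.fromCodePoint(0x1F609, 0x1F642);
console.log(snapToCodePointBoundary(twoEmojis, 1)); // 0
console.log(snapToCodePointBoundary(twoEmojis, 2)); // 2
console.log(snapToCodePointBoundary(twoEmojis, 3)); // 2
```

In the browser this could be used as node.splitText(snapToCodePointBoundary(node.data, offset)). It only protects surrogate pairs; it does not keep "composite" Emojis together.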

Calling:
 - document.body.normalize() fixes the situation, restoring the state from after step b).
 - splitText() itself seems to behave correctly (see above), but the visual interpretation is broken (at least the second Emoji could be visible rather than looking like whitespace).

I tried with Firefox (67.0) and it behaves similarly (also two characters per Emoji), but there the splitText() call has no impact on the visual interpretation in the document body. It does have an impact on the rendering in the Inspector (the Inspector shows characters it cannot visualize as rectangles containing the hex code).

-------------------------------------------

Side notes:

Are there other scripts that use such multi-unit characters, for example some Chinese variants?

That an Emoji occupies two characters is impractical for line length calculations too, even though it is drawn as a single character. I know of "composite" Emojis, which are an even bigger nightmare on many fronts.
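
To illustrate the line length problem, here is a sketch comparing three ways of counting "characters" in JavaScript. The helper names are hypothetical, and the grapheme count assumes the runtime provides Intl.Segmenter (it returns null otherwise):

```javascript
// UTF-16 code units: what String.length and splitText() offsets use.
const countCodeUnits = (s) => s.length;

// Code points: string iteration yields whole surrogate pairs.
const countCodePoints = (s) => Array.from(s).length;

// Grapheme clusters: closest to "what is drawn as one character".
// Returns null when Intl.Segmenter is not available in this runtime.
function countGraphemes(s) {
  if (typeof Intl === "undefined" || typeof Intl.Segmenter !== "function") {
    return null;
  }
  const segmenter = new Intl.Segmenter(undefined, { granularity: "grapheme" });
  return Array.from(segmenter.segment(s)).length;
}

// The three Emojis from step b).
const three = String.fromCodePoint(0x1F60F, 0x1F609, 0x1F642);
// A "composite" Emoji: man + ZWJ + woman + ZWJ + girl.
const family = String.fromCodePoint(0x1F468, 0x200D, 0x1F469, 0x200D, 0x1F467);

console.log(countCodeUnits(three), countCodePoints(three));   // 6 3
console.log(countCodeUnits(family), countCodePoints(family)); // 8 5
console.log(countGraphemes(three), countGraphemes(family));
```

Counting code points handles the simple two-unit Emojis, but a composite Emoji still spans several code points, so only grapheme segmentation gets close to the number of drawn characters.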