Bug 210502 - [GTK] TextNode::splitText() can lose content visually
Status: NEW
Alias: None
Product: WebKit
Classification: Unclassified
Component: WebKitGTK
Version: Other
Hardware: Unspecified Unspecified
Importance: P2 Normal
Assignee: Nobody
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2020-04-14 09:42 PDT by Milan Crha
Modified: 2020-04-14 09:42 PDT

See Also:


Attachments
How it looks like in Firefox (8.67 KB, image/png)
2020-04-14 09:42 PDT, Milan Crha

Description Milan Crha 2020-04-14 09:42:09 PDT
Created attachment 396429
How it looks like in Firefox

I just noticed that calling splitText() in the middle of a character that occupies more than one UTF-16 code unit (a surrogate pair, such as an Emoji) causes content to be lost visually on both sides of the split. This is with trunk at r259630.

Steps:
a) run: MiniBrowser --editor-mode
b) open the Inspector and in its console run: document.body.innerText = "😏😉🙂"
c) still in the Inspector console, run: document.body.firstChild.splitText(2)
   * all is fine here: the Elements tab shows the text properly split into one Emoji and two Emojis
d) still in the Inspector console, run: document.body.firstChild.nextSibling.splitText(1)

After step d) the body contains three text nodes: the first shows the first Emoji, the second appears as empty text, and the third appears to contain two characters that render like whitespace. Their actual contents are:

   document.body.firstChild.nextSibling.nodeValue.length
   1
   document.body.firstChild.nextSibling.nodeValue.charCodeAt(0)
   55357

   document.body.firstChild.nextSibling.nextSibling.nodeValue.length
   3
   document.body.firstChild.nextSibling.nextSibling.nodeValue.charCodeAt(0)
   56841
   document.body.firstChild.nextSibling.nextSibling.nodeValue.charCodeAt(1)
   55357
   document.body.firstChild.nextSibling.nextSibling.nodeValue.charCodeAt(2)
   56898
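
The values above are the two halves of UTF-16 surrogate pairs: 55357 is 0xD83D (a high surrogate), while 56841 (0xDE09) and 56898 (0xDE42) are low surrogates. The following is a minimal sketch that reproduces the same numbers without a DOM. It assumes that splitText() cuts the node's data at a UTF-16 code unit offset, which is modelled here with plain string slicing; splitAt() and units() are hypothetical helpers, not WebKit or DOM APIs.

```javascript
// The string from step b), written with escapes so the code units are explicit:
// "\uD83D\uDE0F" is 😏, "\uD83D\uDE09" is 😉, "\uD83D\uDE42" is 🙂.
const text = "\uD83D\uDE0F\uD83D\uDE09\uD83D\uDE42";

// Hypothetical helper: models what splitText(offset) does to the node's data
// by cutting the string at a UTF-16 code unit offset.
function splitAt(data, offset) {
  return [data.slice(0, offset), data.slice(offset)];
}

// Hypothetical helper: lists the UTF-16 code units of a string.
function units(s) {
  return Array.from({ length: s.length }, (_, i) => s.charCodeAt(i));
}

const [first, rest] = splitAt(text, 2);   // step c): cut between two Emojis
const [second, third] = splitAt(rest, 1); // step d): cut inside the second Emoji

console.log(units(first));  // [55357, 56847]
console.log(units(second)); // [55357]
console.log(units(third));  // [56841, 55357, 56898]
```

The second node ends up holding a lone high surrogate and the third node starts with a lone low surrogate, which matches the lengths and charCodeAt() values reported above.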

I do not know what the expected behavior is, but being able to break "a letter" in the middle and have it disappear completely, together with the next letter, is not ideal.
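
As an illustration only, a caller could check whether an offset falls between the two halves of a surrogate pair before passing it to splitText(). This is a sketch of a caller-side guard under that assumption, not a proposal for what WebKit should do; splitsSurrogatePair() and safeSplitOffset() are hypothetical names.

```javascript
// Hypothetical helper: true when `offset` lies between a high surrogate
// (0xD800..0xDBFF) and the low surrogate (0xDC00..0xDFFF) that follows it.
function splitsSurrogatePair(data, offset) {
  if (offset <= 0 || offset >= data.length) {
    return false;
  }
  const before = data.charCodeAt(offset - 1);
  const after = data.charCodeAt(offset);
  return before >= 0xD800 && before <= 0xDBFF &&
         after >= 0xDC00 && after <= 0xDFFF;
}

// Hypothetical helper: moves an offset that would cut a pair back by one
// code unit, so that the whole character stays in the second part.
function safeSplitOffset(data, offset) {
  return splitsSurrogatePair(data, offset) ? offset - 1 : offset;
}

// The data of the second text node after step c): "😉🙂".
const data = "\uD83D\uDE09\uD83D\uDE42";
console.log(splitsSurrogatePair(data, 1)); // true, the offset used in step d)
console.log(splitsSurrogatePair(data, 2)); // false, between the two Emojis
console.log(safeSplitOffset(data, 1));     // 0
```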

Observations:
 - Calling document.body.normalize() fixes the situation; the document looks again as it did after step b).
 - It seems the result of splitText() itself is correct (see above), but the visual interpretation is broken (at least the second Emoji could be visible instead of looking like whitespace).

I tried Firefox (67.0) and it behaves similarly (it also uses two characters per Emoji), but there the splitText() call has no impact on the visual interpretation in the document body. It does have an impact on the interpretation in the Inspector, which shows letters it cannot visualize as rectangles containing the hexadecimal code.

-------------------------------------------

Side notes:

Are there any scripts that use such multi-code-unit characters in regular text, for example some Chinese variants?

That an Emoji occupies two characters is also impractical for line length calculations, even though it is drawn as a single character. I also know of "composite" Emojis, which are an even bigger nightmare on many fronts.
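
To illustrate the length problem, here is a small sketch comparing three ways of counting: UTF-16 code units (what String length reports), code points, and grapheme clusters. The helper names are hypothetical, and the grapheme count relies on Intl.Segmenter, which is assumed to be available in the runtime; the helper returns undefined when it is not.

```javascript
const emoji = "\uD83D\uDE09"; // 😉, one surrogate pair
// A "composite" Emoji: man, ZERO WIDTH JOINER, woman, ZERO WIDTH JOINER, girl.
const family = "\uD83D\uDC68\u200D\uD83D\uDC69\u200D\uD83D\uDC67";

// Hypothetical helper: counts UTF-16 code units.
const codeUnits = (s) => s.length;

// Hypothetical helper: counts code points; string iteration keeps
// surrogate pairs together.
const codePoints = (s) => Array.from(s).length;

// Hypothetical helper: counts user-perceived characters, or returns
// undefined when Intl.Segmenter is not available.
function graphemes(s) {
  if (typeof Intl === "undefined" || typeof Intl.Segmenter !== "function") {
    return undefined;
  }
  const segmenter = new Intl.Segmenter(undefined, { granularity: "grapheme" });
  return Array.from(segmenter.segment(s)).length;
}

console.log(codeUnits(emoji), codePoints(emoji));   // 2 1
console.log(codeUnits(family), codePoints(family)); // 8 5
console.log(graphemes(family)); // grapheme cluster count, if supported
```

None of these counts says anything about the rendered width, so a line length calculation for display purposes would still need to measure the text.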