Bug 242822 - Em dash should not be separated from preceding word
Summary: Em dash should not be separated from preceding word
Status: RESOLVED MOVED
Alias: None
Product: WebKit
Classification: Unclassified
Component: Text
Version: Safari 15
Hardware: Unspecified Unspecified
Importance: P2 Normal
Assignee: Nobody
URL:
Keywords:
Duplicates: 21677
Depends on:
Blocks:
 
Reported: 2022-07-15 15:12 PDT by Brad Andalman
Modified: 2022-09-08 08:36 PDT
CC List: 7 users

See Also:


Attachments
HTML that shows incorrect word wrap for a word followed by an em dash (1.33 KB, text/html)
2022-07-15 15:12 PDT, Brad Andalman
Screenshot of Safari, Chrome, and Firefox (601.96 KB, image/png)
2022-07-15 15:14 PDT, Brad Andalman
Test case (381 bytes, text/html)
2022-07-15 16:43 PDT, zalan
Apple Books showing em dash and quotation mark on its own line (653.75 KB, image/png)
2022-07-15 18:53 PDT, Brad Andalman

Description Brad Andalman 2022-07-15 15:12:08 PDT
Created attachment 460938 [details]
HTML that shows incorrect word wrap for a word followed by an em dash

When an em dash immediately follows a word, and that em dash can't fit on a line, then both the preceding word and the em dash should be moved to the next line. This works for hyphens, en dashes, and figure dashes, but does not work for em dashes.

Both Safari and Chrome exhibit this bug. Firefox, however, behaves correctly.
Comment 1 Brad Andalman 2022-07-15 15:14:44 PDT
Created attachment 460939 [details]
Screenshot of Safari, Chrome, and Firefox

Screenshot of Safari, Chrome, and Firefox rendering the HTML in the first attachment. Safari and Chrome both exhibit the bug. Firefox, on the right, behaves correctly.
Comment 2 zalan 2022-07-15 16:43:50 PDT
Created attachment 460941 [details]
Test case

Apparently ubrk_following() returns
position 2 for XX[em dash]XX
and
position 3 for XX[figure dash]XX
so we find a soft wrap opportunity between XX and [em dash].
(Not sure how Firefox resolves this; we rely strictly on ICU here.)
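
The two positions reported above can be reproduced with a deliberately simplified model of the UAX #14 line-break classes involved. This is a sketch, not ICU's rule set: it assumes only that EM DASH is class B2 (break opportunity before and after) and that the figure dash, en dash, and hyphen are class BA (break opportunity after only). The helper name `first_break_after` is hypothetical and only loosely mirrors what ubrk_following() reports.

```python
# Simplified model of a few UAX #14 classes; NOT ICU's real implementation.
EM_DASH = "\u2014"      # class B2: break opportunity before and after
FIGURE_DASH = "\u2012"  # class BA: break opportunity after only
EN_DASH = "\u2013"      # class BA
HYPHEN = "\u2010"       # class BA

BREAK_BEFORE_AND_AFTER = {EM_DASH}
BREAK_AFTER_ONLY = {FIGURE_DASH, EN_DASH, HYPHEN}


def first_break_after(text: str, start: int = 0) -> int:
    """Return the first break offset greater than `start`.

    Hypothetical helper: it only knows about the dash classes above and
    ignores every other UAX #14 rule (spaces, B2 pairs, quotes, ...).
    """
    for i in range(start + 1, len(text)):
        prev, cur = text[i - 1], text[i]
        if cur in BREAK_BEFORE_AND_AFTER:
            return i  # B2: a break is allowed before the dash
        if prev in BREAK_BEFORE_AND_AFTER or prev in BREAK_AFTER_ONLY:
            return i  # B2 or BA: a break is allowed after the dash
    return len(text)


print(first_break_after("XX" + EM_DASH + "XX"))      # offset between XX and the em dash
print(first_break_after("XX" + FIGURE_DASH + "XX"))  # offset after the figure dash
```

Under these assumptions, the em dash yields a break opportunity directly after the preceding word, while the figure dash stays attached to it.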
Comment 3 Alexey Proskuryakov 2022-07-15 18:06:11 PDT
This looks like correct behavior per UAX #14. It also matches TextEdit.
Comment 4 Brad Andalman 2022-07-15 18:52:35 PDT
UAX#14 does assert that "Line breaks can occur before and after an EM DASH." It also claims that the only use for an EM DASH is to "set off parenthetical text." That is only one of the ways that an EM DASH can be used, however.

The Chicago Manual of Style, for instance, enumerates EIGHT different, valid uses for an EM DASH. In entry 6.87 of the 17th edition, the Chicago Manual of Style mentions that an EM DASH should be used for "sudden breaks or interruptions." One of the examples it uses is as follows:

"Well, I don't know," I began tentatively. "I thought I might—"
"Might what?" she demanded.

If that trailing EM DASH followed by a quotation mark were to end on its own line, it would look terrible. This is easy to make happen on a simple web page, as in my original attachment, but it is easily seen in Apple Books as well. (I'll attach a screenshot of The Invisible Man that illustrates this.)

The Chicago Manual of Style also addresses the problem of line breaks directly (in 6.90): "In printed publications, line breaks should generally be made after an em dash but not before, in the manner of hyphens. In the case of a closing quotation mark (or any other mark of punctuation) immediately following the dash, however, the quotation mark and dash MUST NOT BE BROKEN AT THE END OF A LINE" [emphasis mine].
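
The quoted guidance from 6.90 can be written down as a small predicate, which may make the difference from UAX #14 easier to see. This is only a sketch of the rule as quoted above, not a line breaker: the function name `chicago_vetoes_break` is hypothetical, and treating "any other mark of punctuation" as Unicode general category P* is an assumption.

```python
import unicodedata

EM_DASH = "\u2014"


def chicago_vetoes_break(text: str, i: int) -> bool:
    """Return True if breaking between text[i-1] and text[i] would go
    against the quoted 6.90 guidance.

    Hypothetical helper: it only vetoes breaks around an em dash and says
    nothing about whether a break is otherwise permitted at that offset.
    """
    if i <= 0 or i >= len(text):
        return False  # not an interior position
    prev, cur = text[i - 1], text[i]
    if cur == EM_DASH:
        return True  # no break before an em dash
    if prev == EM_DASH and unicodedata.category(cur).startswith("P"):
        return True  # dash and following punctuation stay together
    return False


sample = "might" + EM_DASH + '"'
print(chicago_vetoes_break(sample, 5))  # position just before the em dash
print(chicago_vetoes_break(sample, 6))  # position between the dash and the quote
```

For the "I thought I might" example, both interior positions around the dash are vetoed, so the word, the dash, and the closing quotation mark would move to the next line together.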
Comment 5 Brad Andalman 2022-07-15 18:53:24 PDT
Created attachment 460950 [details]
Apple Books showing em dash and quotation mark on its own line
Comment 6 Alexey Proskuryakov 2022-07-16 13:34:57 PDT
An author can implement the desired behavior with a zero width joiner (e.g. "sir‍—" for the attached test), among other ways.
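
As a rough illustration of that workaround, an author could preprocess text so that every em dash directly following a word character is preceded by U+200D ZERO WIDTH JOINER. The helper name `glue_em_dash` is hypothetical, and the sketch only performs the string substitution; whether the joiner actually suppresses the break depends on the line breaker in use.

```python
import re

ZWJ = "\u200d"      # ZERO WIDTH JOINER, as suggested in the comment above
EM_DASH = "\u2014"


def glue_em_dash(text: str) -> str:
    """Insert a ZWJ between a word character and a following em dash.

    Hypothetical helper. Text that already has a ZWJ before the dash is
    left unchanged, because ZWJ is not matched by \\w.
    """
    return re.sub(r"(\w)" + EM_DASH, r"\1" + ZWJ + EM_DASH, text)


glued = glue_em_dash("sir" + EM_DASH)
print([hex(ord(c)) for c in glued])  # code points of the result
```

Running the helper a second time leaves the text unchanged, so it should be safe to apply to content that has already been processed.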

While the CSS spec is not fully prescriptive on exactly following UAX #14, it does reference it as the baseline. So WebKit is not wrong here, and given that Chrome behaves in the same way, keeping our current behavior is best for compatibility.

https://drafts.csswg.org/css-text/#soft-wrap-opportunity
Comment 7 Myles C. Maxfield 2022-07-16 21:05:07 PDT
WebKit treats ICU as the source-of-truth for line breaking behavior. If you want this to be fixed, I recommend reporting this to the ICU project instead at https://unicode-org.atlassian.net/jira/software/c/projects/ICU/issues/?filter=allissues
Comment 8 Myles C. Maxfield 2022-07-16 21:06:00 PDT
> If that trailing EM DASH followed by a quotation mark were to end on its own line, it would look terrible.

I agree, but this needs to be fixed in ICU, not WebKit.
Comment 9 Brad Andalman 2022-07-18 10:20:44 PDT
Filed with ICU here: https://unicode-org.atlassian.net/browse/ICU-22090
Comment 10 Brad Andalman 2022-07-18 10:27:28 PDT
Thanks for helping me find the right place to report this!
Comment 11 Brent Fulgham 2022-07-18 11:54:36 PDT
Reclassifying as MOVED (as the bug is in the ICU component). The bug is not INVALID.
Comment 12 Myles C. Maxfield 2022-07-18 12:14:53 PDT
Thank you for refiling!
Comment 13 Brad Andalman 2022-07-19 10:56:11 PDT
I was informed that filing with the ICU wasn't correct, so I refiled it as an error against UAX#14. My comments have been added to PRI #446 for feedback:
https://www.unicode.org/review/pri446/

Once again, thanks to everyone for helping me submit this to the right venue. I truly appreciate it!
Comment 14 zalan 2022-07-19 11:00:41 PDT
(In reply to Brad Andalman from comment #13)
> I was informed that filing with the ICU wasn't correct, so I refiled it as
> an error against UAX#14. My comments have been added to PRI #446 for
> feedback:
> https://www.unicode.org/review/pri446/
> 
> Once again, thanks to everyone for helping me submit this to the right
> venue. I truly appreciate it!
Thank you for filing it! When the fix comes through, both WebKit and Chrome will progress!
Comment 15 Myles C. Maxfield 2022-09-07 21:59:55 PDT
*** Bug 21677 has been marked as a duplicate of this bug. ***
Comment 16 Karl Dubost 2022-09-08 08:36:48 PDT
See also the opposite behavior in https://bugzilla.mozilla.org/show_bug.cgi?id=1269147