| Summary: | Em dash should not be separated from preceding word | ||
|---|---|---|---|
| Product: | WebKit | Reporter: | Brad Andalman <bya> |
| Component: | Text | Assignee: | Nobody <webkit-unassigned> |
| Status: | RESOLVED MOVED | ||
| Severity: | Normal | CC: | ap, bfulgham, jasneet, karlcow, mmaxfield, simon.fraser, zalan |
| Priority: | P2 | ||
| Version: | Safari 15 | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Attachments: | |||
Created attachment 460939 [details]
Screenshot of Safari, Chrome, and Firefox
Screenshot of Safari, Chrome, and Firefox rendering the HTML in the first attachment. Safari and Chrome both exhibit the bug. Firefox, on the right, behaves correctly.
Created attachment 460941 [details]
Test case
Apparently ubrk_following() returns
position 2 for XX[em dash]XX
and
position 3 for XX[figure dash]XX
so we find a soft wrap opportunity between XX and [em dash].
(Not sure how Firefox resolves this. We rely strictly on ICU here.)
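
The two positions reported above are consistent with the UAX #14 line-breaking classes for these characters: EM DASH is class B2 (break opportunity before and after), while FIGURE DASH is class BA (break after only). The sketch below is a deliberately simplified model of just that distinction. It is not ICU, and `first_break_after` is a hypothetical stand-in for `ubrk_following()`, not its real signature.

```python
# Simplified model of a few UAX #14 line-breaking classes for dash characters.
# This is NOT ICU: it only encodes "break before" / "break after" for dashes
# and ignores every other rule (spaces, quotation marks, numbers, ...).
BREAK_AFTER_ONLY = {
    "\u002D",  # HYPHEN-MINUS (class HY; treated like BA in this sketch)
    "\u2010",  # HYPHEN (class BA)
    "\u2012",  # FIGURE DASH (class BA)
    "\u2013",  # EN DASH (class BA)
}
BREAK_BEFORE_AND_AFTER = {"\u2014"}  # EM DASH (class B2)


def first_break_after(text: str, start: int = 0) -> int:
    """Return the first break offset greater than `start`, or len(text).

    Hypothetical stand-in for ubrk_following(); not the ICU API.
    """
    for pos in range(start + 1, len(text)):
        prev_ch, next_ch = text[pos - 1], text[pos]
        if prev_ch in BREAK_AFTER_ONLY or prev_ch in BREAK_BEFORE_AND_AFTER:
            return pos  # break opportunity after the dash
        if next_ch in BREAK_BEFORE_AND_AFTER:
            return pos  # break opportunity before an em dash
    return len(text)


print(first_break_after("XX\u2014XX"))  # em dash: 2
print(first_break_after("XX\u2012XX"))  # figure dash: 3
```

Under this model the em dash can be separated from the preceding word, while the figure dash cannot, which is the difference the test case shows.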
This looks like correct behavior per UAX #14. It also matches TextEdit.

UAX#14 does assert that "Line breaks can occur before and after an EM DASH." It also claims that the only use for an EM DASH is to "set off parenthetical text." That is only one of the ways that an EM DASH can be used, however.

The Chicago Manual of Style, for instance, enumerates EIGHT different, valid uses for an EM DASH. In entry 6.87 of the 17th edition, the Chicago Manual of Style mentions that an EM DASH should be used for "sudden breaks or interruptions." One of the examples it uses is as follows: "Well, I don't know," I began tentatively. "I thought I might—" "Might what?" she demanded.

If that trailing EM DASH followed by a quotation mark were to end on its own line, it would look terrible. This is easy to make happen on a simple web page, as in my original attachment, but it is easily seen in Apple Books as well. (I'll attach a screenshot of The Invisible Man that illustrates this.)

The Chicago Manual of Style also addresses the problem of line breaks directly (in 6.90): "In printed publications, line breaks should generally be made after an em dash but not before, in the manner of hyphens. In the case of a closing quotation mark (or any other mark of punctuation) immediately following the dash, however, the quotation mark and dash MUST NOT BE BROKEN AT THE END OF A LINE" [emphasis mine].

Created attachment 460950 [details]
Apple Books showing em dash and quotation mark on its own line

An author can implement the desired behavior with a zero width joiner (e.g. "sir‍—" for the attached test), among other ways. While the CSS spec is not fully prescriptive on exactly following UAX #14, it does reference it as the baseline. So WebKit is not wrong here, and given that Chrome behaves in the same way, keeping our current behavior is best for compatibility.
https://drafts.csswg.org/css-text/#soft-wrap-opportunity

WebKit treats ICU as the source-of-truth for line breaking behavior. If you want this to be fixed, I recommend reporting this to the ICU project instead at https://unicode-org.atlassian.net/jira/software/c/projects/ICU/issues/?filter=allissues

> If that trailing EM DASH followed by a quotation mark were to end on its own line, it would look terrible.
I agree, but this needs to be fixed in ICU, not WebKit.
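
For authors who want the zero width joiner workaround mentioned above without editing text by hand, a minimal sketch follows. It assumes the joiner is U+200D and that content can be pre-processed as a plain string. `glue_em_dashes` is a hypothetical helper, not part of WebKit or ICU, and whether the result wraps as desired still depends on the rendering engine.

```python
import re

ZERO_WIDTH_JOINER = "\u200D"  # the character named in the comment above
EM_DASH = "\u2014"


def glue_em_dashes(text: str, joiner: str = ZERO_WIDTH_JOINER) -> str:
    """Insert `joiner` between a non-space character and a following em dash."""
    # (?<=\S) requires a non-whitespace character right before the dash;
    # the negative lookbehind skips dashes that already have the joiner.
    pattern = rf"(?<=\S)(?<!{re.escape(joiner)}){EM_DASH}"
    return re.sub(pattern, joiner + EM_DASH, text)


sample = "Yes, sir\u2014"
glued = glue_em_dashes(sample)
print([hex(ord(ch)) for ch in glued[-3:]])  # code points of the last three characters
```

The helper leaves an em dash alone when it follows a space or starts the string, and running it twice does not insert a second joiner. Passing U+2060 WORD JOINER as `joiner` is another option, though that variant is an assumption and was not tried in this thread.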

Filed with ICU here: https://unicode-org.atlassian.net/browse/ICU-22090
Thanks for helping me find the right place to report this!

Reclassifying as MOVED (as the bug is in the ICU component). The bug is not INVALID. Thank you for refiling!

I was informed that filing with the ICU wasn't correct, so I refiled it as an error against UAX#14. My comments have been added to PRI #446 for feedback:
https://www.unicode.org/review/pri446/
Once again, thanks to everyone for helping me submit this to the right venue. I truly appreciate it!

(In reply to Brad Andalman from comment #13)
> I was informed that filing with the ICU wasn't correct, so I refiled it as
> an error against UAX#14. My comments have been added to PRI #446 for
> feedback:
> https://www.unicode.org/review/pri446/
>
> Once again, thanks to everyone for helping me submit this to the right
> venue. I truly appreciate it!
Thank you for filing it! When the fix comes through, both WebKit and Chrome will progress!

*** Bug 21677 has been marked as a duplicate of this bug. ***

See also the opposite behavior in https://bugzilla.mozilla.org/show_bug.cgi?id=1269147

Created attachment 460938 [details]
HTML that shows incorrect word wrap for a word followed by an em dash

When an em dash immediately follows a word, and that em dash can't fit on a line, then both the preceding word and the em dash should be moved to the next line. This works for hyphens, en dashes, and figure dashes, but does not work for em dashes.

Both Safari and Chrome exhibit this bug. Firefox, however, behaves correctly.