RESOLVED INVALID 53203
CSS 2.1 failure: bidi-breaking-002.htm test fails (PARAGRAPH SEPARATOR does not break line)
https://bugs.webkit.org/show_bug.cgi?id=53203
Summary CSS 2.1 failure: bidi-breaking-002.htm test fails (PARAGRAPH SEPARATOR does n...
Levi Weintraub
Reported 2011-01-26 16:47:03 PST
Created attachment 80264 [details] Test Case (bidi-breaking-002.htm) html4/bidi-breaking-002 fails
Attachments
Test Case (bidi-breaking-002.htm) (3.29 KB, text/html)
2011-01-26 16:47 PST, Levi Weintraub
no flags
Make lines break on LINE and PARAGRAPH SEPARATOR characters (3.12 KB, patch)
2011-01-27 16:20 PST, Levi Weintraub
no flags
Patch (50.03 KB, patch)
2011-01-31 13:57 PST, Levi Weintraub
no flags
Patch (51.55 KB, patch)
2011-02-01 14:41 PST, Levi Weintraub
no flags
Patch (52.83 KB, patch)
2011-05-02 18:12 PDT, Levi Weintraub
no flags
Patch (52.78 KB, patch)
2011-06-21 14:48 PDT, Levi Weintraub
eric: review-
the W3C test case that currently tests for this (http://test.csswg.org/source/approved/css2.1/src/bidi-text/bidi-breaking-003.xht) (80 bytes, text/plain)
2011-08-03 05:41 PDT, Aharon (Vladimir) Lanin
no flags
Updated Status in Safari 15.5 (1.03 MB, image/png)
2022-06-21 16:55 PDT, Ahmad Saleem
no flags
Levi Weintraub
Comment 1 2011-01-27 16:20:24 PST
Created attachment 80374 [details] Make lines break on LINE and PARAGRAPH SEPARATOR characters This adds the LINE and PARAGRAPH SEPARATOR characters as valid line break characters for the RenderBlock::findNextLineBreak algorithm. There is still a bug that the line is being measured as if the line break isn't happening. I'd love a hint about where to look to find how this is happening!
Levi Weintraub
Comment 2 2011-01-27 17:09:29 PST
(In reply to comment #1) > Created an attachment (id=80374) [details] > Make lines break on LINE and PARAGRAPH SEPARATOR characters > > This adds the LINE and PARAGRAPH SEPARATOR characters as valid line break characters for the RenderBlock::findNextLineBreak algorithm. > > There is still a bug that the line is being measured as if the line break isn't happening. I'd love a hint about where to look to find how this is happening! Got help finding it from Evan Martin. Spinning a patch now.
Levi Weintraub
Comment 3 2011-01-31 13:57:55 PST
Levi Weintraub
Comment 4 2011-01-31 13:59:33 PST
(In reply to comment #3) > Created an attachment (id=80678) [details] > Patch This renders properly and doesn't break any known layout tests, but we admittedly don't have real coverage for line and paragraph separators.
Darin Adler
Comment 5 2011-01-31 15:39:33 PST
Comment on attachment 80678 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=80678&action=review > Source/WebCore/rendering/RenderBlockLineLayout.cpp:1711 > + bool betweenWords = c == '\n' || category(c) & (Separator_Line | Separator_Paragraph) || (currWS != PRE && !atStart && isBreakable(str, pos, strlen, nextBreakable, breakNBSP) && (style->hyphens() != HyphensNone || (pos && str[pos - 1] != softHyphen))); I think this is going to be a huge slowdown. The category function is expensive.
Levi Weintraub
Comment 6 2011-01-31 15:43:25 PST
(In reply to comment #5) > (From update of attachment 80678 [details]) > View in context: https://bugs.webkit.org/attachment.cgi?id=80678&action=review > > > Source/WebCore/rendering/RenderBlockLineLayout.cpp:1711 > > + bool betweenWords = c == '\n' || category(c) & (Separator_Line | Separator_Paragraph) || (currWS != PRE && !atStart && isBreakable(str, pos, strlen, nextBreakable, breakNBSP) && (style->hyphens() != HyphensNone || (pos && str[pos - 1] != softHyphen))); > > I think this is going to be a huge slowdown. The category function is expensive. Thanks for taking a look! I worried that could be an issue... Any advice on an optimization? I want to keep things clean and I'm afraid that Unicode category stuff is outside of my expertise.
Levi Weintraub
Comment 7 2011-01-31 16:40:40 PST
(In reply to comment #6) > (In reply to comment #5) > > (From update of attachment 80678 [details] [details]) > > View in context: https://bugs.webkit.org/attachment.cgi?id=80678&action=review > > > > > Source/WebCore/rendering/RenderBlockLineLayout.cpp:1711 > > > + bool betweenWords = c == '\n' || category(c) & (Separator_Line | Separator_Paragraph) || (currWS != PRE && !atStart && isBreakable(str, pos, strlen, nextBreakable, breakNBSP) && (style->hyphens() != HyphensNone || (pos && str[pos - 1] != softHyphen))); > > > > I think this is going to be a huge slowdown. The category function is expensive. > > Thanks for taking a look! > > I worried that could be an issue... Any advice on an optimization? I want to keep things clean and I'm afraid that Unicode category stuff is outside of my expertise. I also realized that the constant re-use of category(c) & (Line_Separator & Paragraph_Separator) should be replaced with a function. Now I need an optimization to make that check efficient :)
WebKit Review Bot
Comment 8 2011-01-31 16:52:53 PST
Levi Weintraub
Comment 9 2011-01-31 17:09:39 PST
Comment on attachment 80678 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=80678&action=review > Source/WebCore/rendering/RenderBlockLineLayout.cpp:1796 > + if (c == '\n' && preserveNewline || category(c) & (Separator_Line | Separator_Paragraph)) { This needs parenthesis. Also removing r? while I figure out how to make this fast enough to be usable.
Early Warning System Bot
Comment 10 2011-01-31 17:41:30 PST
Levi Weintraub
Comment 11 2011-02-01 14:41:15 PST
Levi Weintraub
Comment 12 2011-02-01 14:45:30 PST
(In reply to comment #11) > Created an attachment (id=80831) [details] > Patch I'm now doing a direct comparison with the UChar values for Line and Paragraph Separator. Assuming this is acceptable, this should resolve the performance issue present before.
Levi Weintraub
Comment 13 2011-02-08 15:45:38 PST
(In reply to comment #11) > Created an attachment (id=80831) [details] > Patch Can someone take a look at this and see if it's acceptable? Also, if my method for checking against Paragraph and Line Separators is correct, I'd love to stick that re-occurring code into a function, but I'm not sure where the best place for it is.
Levi Weintraub
Comment 14 2011-02-22 17:47:33 PST
Anyone? Review please?
Eric Seidel (no email)
Comment 15 2011-02-24 03:00:08 PST
Mitz or Ap are your best bet here.
Levi Weintraub
Comment 16 2011-05-02 18:12:12 PDT
Levi Weintraub
Comment 17 2011-05-02 18:15:22 PDT
(In reply to comment #16) > Created an attachment (id=92025) [details] > Patch Updating since the original patch grew stale. Anyone willing to take a look before it happens again? You know you want WebKit to support Unicode Line and Paragraph Separators!
WebKit Review Bot
Comment 18 2011-05-02 20:47:13 PDT
Attachment 92025 [details] did not pass chromium-ews: Output: http://queues.webkit.org/results/8530853 New failing tests: fast/css/bidi-breaking-002.html
Adam Barth
Comment 19 2011-05-02 22:44:32 PDT
(In reply to comment #18) > Attachment 92025 [details] did not pass chromium-ews: > Output: http://queues.webkit.org/results/8530853 > New failing tests: > fast/css/bidi-breaking-002.html Looks like this patch will cause this test to fail on Chromium Linux.
Levi Weintraub
Comment 20 2011-05-04 14:08:10 PDT
(In reply to comment #19) > (In reply to comment #18) > > Attachment 92025 [details] [details] did not pass chromium-ews: > > Output: http://queues.webkit.org/results/8530853 > > New failing tests: > > fast/css/bidi-breaking-002.html > > Looks like this patch will cause this test to fail on Chromium Linux. Are the results available? This is the new test I'm adding with the patch :p
Levi Weintraub
Comment 21 2011-06-10 15:58:03 PDT
The disinterest in fixing this makes me sad :(
Eric Seidel (no email)
Comment 22 2011-06-10 17:53:14 PDT
You and I will make a date to go through this in person next week. Ok?
Levi Weintraub
Comment 23 2011-06-10 18:19:48 PDT
(In reply to comment #22) > You and I will make a date to go through this in person next week. Ok? Deal, thanks! :)
Levi Weintraub
Comment 24 2011-06-21 14:48:03 PDT
Eric Seidel (no email)
Comment 25 2011-06-21 15:05:08 PDT
Comment on attachment 98059 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=98059&action=review > Source/WebCore/rendering/RenderBlockLineLayout.cpp:2117 > bool betweenWords = c == '\n' || (currWS != PRE && !atStart && isBreakable(lineBreakIteratorInfo.second, current.m_pos, current.m_nextBreakablePosition, breakNBSP) > - && (style->hyphens() != HyphensNone || (current.previousInSameNode() != softHyphen))); > + && (style->hyphens() != HyphensNone || (current.previousInSameNode() != softHyphen))) || isUnicodeLineBreak(c); So this would be better if the \n check was rolled into the unicode one. > Source/WebCore/rendering/RenderBlockLineLayout.cpp:2198 > + if ((c == '\n' && preserveNewline) || isUnicodeLineBreak(c)) { We need a comment here as to why unicode linebreaks ignore preserveNewline. > Source/WebCore/rendering/RenderBlockLineLayout.cpp:2206 > - lineInfo.setPreviousLineBrokeCleanly(true); > + lineInfo.setPreviousLineBrokeCleanly(c != WTF::Unicode::lineSeparator); This needs a comment as well. :) If this only affects the bidi algorithm, we should have a better name for this. Like bidiAlgorithResetsOnNextLine... (not really that much better, but I'm sure we can come up with something better!) > Source/WebCore/rendering/RenderText.cpp:706 > + while (i + linelen < len && text[i + linelen] != '\n' && !(isUnicodeLineBreak(text[i + linelen]))) Good use for a function which includes both \n and the unicode ones. > Source/WebCore/rendering/RenderText.cpp:821 > + } else if (isUnicodeLineBreak(c)) { > + m_hasBreak = true; > + isNewline = true; > + isSpace = false; Seems OK. Should we be rolling this into the \n check above? > Source/WebCore/rendering/RenderText.cpp:852 > + while (c != '\n' && !isSpaceAccordingToStyle(c, style()) && c != '\t' && c != softHyphen && !(isUnicodeLineBreak(c))) { This should again use an inline whihc checks all of \n and the unicode newlines. Checking \n first of course. :) > Source/WebCore/rendering/break_lines.h:44 > +inline bool isUnicodeLineBreak(UChar c) > +{ > + return c == WTF::Unicode::lineSeparator || c == WTF::Unicode::paragraphSeparator; > +} > + I believe we should have a isLineBreak inline which includes \n and possibly \r. And then use that everywhere instead of calling this unicode-only one in addition to the explicit \n checks. When you add a second function here, we may need to add comments about when to use each of the two newline-checking-functions.
Aharon (Vladimir) Lanin
Comment 26 2011-07-18 14:14:10 PDT
This bug should block #50910 (Master bug for HTML5 Bidi).
Aharon (Vladimir) Lanin
Comment 27 2011-08-03 05:41:04 PDT
Aharon (Vladimir) Lanin
Comment 28 2011-08-03 05:42:54 PDT
(In reply to comment #26) > This bug should block #50910 (Master bug for HTML5 Bidi). Levi, could you please mark this bug as blocking 50910.
Levi Weintraub
Comment 29 2011-08-03 09:48:30 PDT
Of course :)
Randy Hudson
Comment 30 2013-04-24 09:42:08 PDT
The behavior requested by this bug is incorrect. This character is used in BiDi text. The purpose of the paragraph separator is to break apart unrelated segments of text within the SAME line/paragraph. For example: "A description of something that happened – 12 hours ago" When the "description" ends with RTL content, in order to make " – 12 hours ago" always appear at the trailing end of the line, a paragraph separator can be inserted in front of it. The only effect is should have is on bidi layout. The following should all render the same way ("WERBEH" is used to mean Hebrew, spelled using the Hebrew alphabet): A description ending in WERBEH &#x2029;- 12 hours ago &#x202a;A description ending in WERBEH&#x202c; - 12 hours ago <span style="unicode-bidi:isolate;direction=ltr;">A description ending in WERBEH </span>- 12 hours ago <bdi>A description ending in WERBEH </bdi>- 12 hours ago (202a is LRE, 202c is PDF) This character does not belong to the "line separator" category of the unicode specification.
Aharon (Vladimir) Lanin
Comment 31 2013-04-30 05:56:28 PDT
(In reply to comment #30) > The purpose of the paragraph separator is to break apart unrelated segments > of text within the SAME line > [...] > The only effect is should have is on bidi layout > [as opposed to causing a new line to be started - > if I understood you correctly] Can you give any reference for this? I believe this statement is incorrect. UTR#20 (http://www.unicode.org/reports/tr20/#Line) says that PARAGRAPH SEPARATOR is supposed to be used as an unambiguous replacement for newline (which implies it should indeed break a line), and that in HTML it should be replaced with the use of <xhtml:p> and </xhtml:p> (which, of course, will cause line breaks by default). (I obviously agree that in HTML, mark-up should be used to denote paragraphs. The test is about what should happen when a page nevertheless does use PARAGRAPH SEPARATOR, despite UTR#20 discouraging its use.)
Ahmad Saleem
Comment 32 2022-06-21 16:55:19 PDT
Created attachment 460391 [details] Updated Status in Safari 15.5 I am unclear on expected result across following test (attached) but please refer to attached screenshot for updated result across all browsers (Chrome Canary 105, Firefox Nightly) and Safari 15.5. Thanks!
Note You need to log in before you can comment on or make changes to this bug.