Bug 106303

Summary: text-align:justify separates U+3033 from U+3035
Product: WebKit Reporter: Yuki Sekiguchi <yuki.sekiguchi>
Component: Layout and Rendering Assignee: Nobody <webkit-unassigned>
Status: UNCONFIRMED    
Severity: Normal CC: ahmad.saleem792, dino, eric, glenn, koivisto, kojii, mitz, ojan.autocc, webkit.review.bot, zalan
Priority: P2    
Version: 528+ (Nightly build)   
Hardware: Mac   
OS: OS X 10.7   
Bug Depends on: 89235    
Bug Blocks:    
Attachments:
Description Flags
inseparable.html. Reproduced content for justification.
none
inseparable-line-break.html. Reproduced content for line breaking.
none
Patch none

Yuki Sekiguchi
Reported 2013-01-07 23:23:23 PST
Created attachment 181652 [details]
inseparable.html. Reproduced content for justification.

In the attached inseparable.html, U+3033 should not be separated from U+3035, but it is. This bug reproduces only on Mac, because other platforms don't expand between ideographs.

Requirements for Japanese Text Layout say not to separate the characters.
http://www.w3.org/TR/jlreq/#character_sequences_which_do_not_allow_space_insertion_as_part_of_line_adjustment_processing
> Combinations of character classes which allow spaces to be inserted for line alignment, are described as a complete table in Appendix E Opportunities for Inter-character Space Expansion during Line Adjustment, following 3.9 About Character Classes.

In 3.9 About Character Classes, U+3033 and U+3035 are inseparable characters (cl-08).

In the 4th note in Appendix E.2 Notes:
http://www.w3.org/TR/jlreq/#opportunities_for_intercharacter_space_expansion_during_line_adjustment
> A third order opportunity exists for inter-character space expansion, to take up to a maximum of a quarter em space, with respect to the corresponding character size, between two consecutive inseparable characters (cl-08) which are of different kinds.

Therefore, we should not separate U+3033 from U+3035.

A line break also occurs between U+3033 and U+3035. Please see inseparable-line-break.html.

Requirements for Japanese Text Layout say not to break the line between the characters.
http://www.w3.org/TR/jlreq/#possibilities_for_linebreaking_between_characters
In the 5th note in C.2 Notes:
> There is no line break opportunity between following couple of consecutive inseparable characters (cl-08) as follows:
> VERTICAL KANA REPEAT MARK UPPER HALF "〳", VERTICAL KANA REPEAT MARK LOWER HALF "〵"
> VERTICAL KANA REPEAT WITH VOICED SOUND MARK UPPER HALF "〴", VERTICAL KANA REPEAT MARK LOWER HALF "〵"
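The behavior requested in this report can be sketched as a small pair check. This is only an illustration in Python, not WebKit code: the names `is_inseparable_pair` and `expansion_opportunities` are hypothetical, and the sketch assumes every character in the input would otherwise accept inter-character expansion (as ideographs do on Mac).

```python
# Hypothetical sketch of the requested rule; not WebKit API.
UPPER_HALVES = {"\u3033", "\u3034"}  # 〳 and 〴 (upper halves)
LOWER_HALF = "\u3035"                # 〵 (lower half)


def is_inseparable_pair(first: str, second: str) -> bool:
    """True when first + second form a two-part vertical kana repeat mark."""
    return first in UPPER_HALVES and second == LOWER_HALF


def expansion_opportunities(text: str) -> list:
    """Return indices i where space may be inserted between text[i-1] and text[i].

    Assumption: every character in `text` is ideograph-like, so the only
    positions excluded are those inside an inseparable pair.
    """
    return [
        i
        for i in range(1, len(text))
        if not is_inseparable_pair(text[i - 1], text[i])
    ]
```

For the string "あ〳〵い", this sketch reports opportunities before 〳 and before い, but none between 〳 and 〵. The same predicate could gate line break opportunities, which is the second half of the report.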
Attachments
inseparable.html. Reproduced content for justification. (229 bytes, text/html)
2013-01-07 23:23 PST, Yuki Sekiguchi
no flags
inseparable-line-break.html. Reproduced content for line breaking. (171 bytes, text/html)
2013-01-07 23:24 PST, Yuki Sekiguchi
no flags
Patch (20.55 KB, patch)
2013-01-07 23:44 PST, Yuki Sekiguchi
no flags
Yuki Sekiguchi
Comment 1 2013-01-07 23:24:51 PST
Created attachment 181653 [details] inseparable-line-break.html. Reproduced content for line breaking.
Yuki Sekiguchi
Comment 2 2013-01-07 23:44:29 PST
Glenn Adams
Comment 3 2013-01-08 08:40:09 PST
(1) line break opportunities need to be determined by ICU and not use a hardcoded escape around ICU such as Font::isUnbreakableCharactersPair;

(2) the JLREQ document [1] is not a W3C recommendation; it is a collection of input requirements being considered for preparing normative recommendations, such as CSS3 Text, the current draft of which defines the recommended behavior in [2][3];

(3) the current Unicode Line Break class database marks U+3033 and U+3035 as ID (Ideograph) class, and not IN (Inseparable); in general, ICU and CSS3 Text make normative reference to this database for determining line break classes;

(4) there is already a pending patch in process [5] which will be adding line-break property support according to [3][6][7], so any change for JLREQ related line breaking should be handled as part of [5];

[1] http://www.w3.org/TR/2012/NOTE-jlreq-20120403/
[2] http://dev.w3.org/csswg/css3-text/#line-break-details
[3] http://dev.w3.org/csswg/css3-text/#line-break
[4] http://www.unicode.org/Public/UNIDATA/LineBreak.txt
[5] http://bugs.webkit.org/show_bug.cgi?id=89235
[6] http://trac.webkit.org/wiki/LineBreaking
[7] http://trac.webkit.org/wiki/LineBreakingCSS3Mapping
Glenn Adams
Comment 4 2013-01-08 08:41:04 PST
Marking as dependent on bug 89235 to resolve line break semantics for Japanese.
Yuki Sekiguchi
Comment 5 2013-01-08 20:14:06 PST
Thank you, Glenn. Your advice is very helpful to me. I will ask the CSS and Unicode folks to follow the JLREQ behavior. I am therefore removing the review flag for now.
Koji Ishii
Comment 6 2013-03-02 03:03:26 PST
Unicode 6.3 will change the line break property of U+3035 to CM. The change will propagate once ICU incorporates the new data from CLDR. Please be prepared: ANY * CM will not break, and we should not justify between them either.
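To illustrate the effect of that reclassification, here is a minimal Python sketch of a class-driven break decision. It is not ICU code: the tables `OLD_CLASSES` and `NEW_CLASSES` and both function names are hypothetical, the tables hold only the two code points discussed in this bug, and the rule set is cut down to the two cases mentioned in the thread (a break between two ID characters, and no break before CM).

```python
# Hypothetical, hand-written class tables for illustration only.
# Real engines read line break classes from ICU / Unicode data.
OLD_CLASSES = {"\u3033": "ID", "\u3035": "ID"}  # as described in comment 3
NEW_CLASSES = {"\u3033": "ID", "\u3035": "CM"}  # after the Unicode 6.3 change


def break_allowed(before_class: str, after_class: str) -> bool:
    """Simplified break decision modelling only two cases.

    - ANY x CM: never break before a combining mark.
    - ID / ID:  a break opportunity exists between two ideographs.
    Any other combination is not modelled and returns False here.
    """
    if after_class == "CM":
        return False
    return before_class == "ID" and after_class == "ID"


def can_break_between(first: str, second: str, classes: dict) -> bool:
    """Look up both characters in `classes` and apply break_allowed."""
    return break_allowed(classes[first], classes[second])
```

With `OLD_CLASSES`, `can_break_between("\u3033", "\u3035", OLD_CLASSES)` is true, which matches the reported line break. With `NEW_CLASSES` it is false, so the pair stays together without any hardcoded pair check.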
Ahmad Saleem
Comment 7 2023-02-03 05:56:25 PST
inseparable.html. Reproduced content for justification. <- WebKit Trunk, Chrome Canary 112 and Firefox Nightly 111 match each other.

inseparable-line-break.html. Reproduced content for line breaking. <- WebKit Trunk and Chrome Canary 112 match each other, but Firefox Nightly 111 differs.

I am not sure about the desired behavior in the last test, so I will tag others to comment on whether this is something that needs to be fixed in WebKit. Thanks!