RESOLVED INVALID 22962
Web page encoded as "Big 5 HKSCS" is not decoded properly
https://bugs.webkit.org/show_bug.cgi?id=22962
David Kilzer (:ddkilzer)
Reported 2008-12-22 08:11:32 PST
* SUMMARY
Web page with "Big5" encoding specified in <meta> tag (and Content-Type sent as "text/html") is not detected as having "Big 5 HKSCS" encoding and is thus not decoded properly. The same page loaded in Firefox 3 is detected and decoded properly.

* STEPS TO REPRODUCE
1. Launch Safari/WebKit.
2. Open URL: http://www.mingpaonews.com/20081222/gaa1h.htm

* RESULTS
Note square boxes in the text of the story, and how the text differs after switching to "Big 5 HKSCS" encoding via the "Text Encoding" item in the View menu.

* REGRESSION
Unknown. Tested Safari 3.2.1 on Mac OS X 10.5.6 and a local debug build of WebKit r39423. Both showed the same behavior.

* NOTES
Firefox 3 gets it right, so WebKit should be using a similar heuristic.
David Kilzer (:ddkilzer)
Comment 1 2008-12-22 08:19:56 PST
Alexey Proskuryakov
Comment 2 2008-12-22 08:50:50 PST
This page uses an encoding that is different from either Big5 variant supported by Safari - note the replacement characters that appear after forcing the encoding to Big 5 HKSCS.
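For anyone who wants to repeat this comparison outside a browser, here is a minimal sketch that decodes the same bytes under several candidate encodings and counts the U+FFFD replacement characters each one produces. It uses Python's built-in `big5` and `big5hkscs` codecs, which are not guaranteed to match the Big5 tables in Safari or Firefox. The function names and the sample bytes are illustrative assumptions, not taken from the page in question.

```python
REPLACEMENT = "\ufffd"  # U+FFFD, emitted by the decoder for bytes it cannot map


def count_replacements(data: bytes, encoding: str) -> int:
    """Decode `data` as `encoding` and count U+FFFD characters in the result."""
    return data.decode(encoding, errors="replace").count(REPLACEMENT)


def rank_encodings(data: bytes, candidates=("big5", "big5hkscs")):
    """Return (encoding, replacement_count) pairs, fewest replacements first."""
    scores = [(name, count_replacements(data, name)) for name in candidates]
    return sorted(scores, key=lambda pair: pair[1])


if __name__ == "__main__":
    # Hypothetical sample bytes: two ordinary Big5 double-byte characters.
    sample = b"\xa4\xa4\xa4\xe5"
    for name, bad in rank_encodings(sample):
        print(f"{name}: {bad} replacement character(s)")
```

A count of zero only shows that every byte sequence was mappable in that codec. It does not show that the decoded text is what the author intended.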
Alexey Proskuryakov
Comment 3 2008-12-22 08:54:45 PST
Dave, do you know for a fact that Firefox decodes the text 100% correctly? Or just that it has no square boxes, question marks and other obvious brokenness?
David Kilzer (:ddkilzer)
Comment 4 2008-12-22 09:04:34 PST
(In reply to comment #3)
> Dave, do you know for a fact that Firefox decodes the text 100% correctly? Or
> just that it has no square boxes, question marks and other obvious brokenness?

Scrolling down the page, I see replacement characters in Firefox 3 as well. They're "?" characters without black diamonds around them.
David Kilzer (:ddkilzer)
Comment 5 2008-12-22 09:04:57 PST
I wonder if MSIE 6/7/8 handle this page any better?
Alexey Proskuryakov
Comment 6 2008-12-22 09:11:26 PST
(In reply to comment #4)
> Scrolling down the page, I see replacement characters in Firefox 3 as well.
> They're "?" characters without black diamonds around them.

Are you sure about that? These looked like normal question marks to me.
David Kilzer (:ddkilzer)
Comment 7 2008-12-22 09:19:09 PST
(In reply to comment #6)
> (In reply to comment #4)
> > Scrolling down the page, I see replacement characters in Firefox 3 as well.
> > They're "?" characters without black diamonds around them.
>
> Are you sure about that? These looked like normal question marks to me.

No, I am not sure. I do not read Chinese. :)

I don't see any "square boxes" or question-marks-in-black-diamonds on the page in Firefox 3. I *do* see a character that looks like "No" with the "o" superscript and underlined (&#8470;) in the Firefox page that doesn't appear in the Safari page with "Big 5 HKSCS" encoding.

Also note that the black diamonds in Desktop Safari when switching text encoding to "Big 5 HKSCS" are simply colons on the Firefox 3 page. Could this be a missing glyph or a decoding bug?
David Kilzer (:ddkilzer)
Comment 8 2008-12-22 09:21:01 PST
The equivalent character from Desktop Safari (to the "No" character in Firefox 3): &#22050;
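To make the difference between the two characters in comments 7 and 8 concrete, a small sketch can decode both numeric character references and print their code points and Unicode names. It uses only Python's standard `html` and `unicodedata` modules. The helper name `describe` is made up for illustration.

```python
import html
import unicodedata


def describe(reference: str) -> str:
    """Decode one numeric character reference and describe the character."""
    char = html.unescape(reference)
    return f"U+{ord(char):04X} {unicodedata.name(char, '<unnamed>')}"


if __name__ == "__main__":
    print(describe("&#8470;"))   # character seen in Firefox 3
    print(describe("&#22050;"))  # character seen in Safari with "Big 5 HKSCS"
```

The first reference is a symbol (NUMERO SIGN) and the second is a CJK ideograph, which suggests the two browsers produced unrelated characters at that position rather than one of them simply lacking a glyph.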
Eric Seidel (no email)
Comment 9 2012-10-24 12:44:03 PDT
It's unclear to me if this is still an issue.
Sam Sneddon [:gsnedders]
Comment 10 2022-09-16 15:20:48 PDT
Archive.org doesn't seem to have archived this either, so it's not meaningfully actionable as far as I can tell.