Align ISO-2022-JP and Shift_JIS encodings with Chrome, Firefox, and the specification
Created attachment 407712 [details] Patch
Comment on attachment 407712 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=407712&action=review > Source/WebCore/platform/text/TextCodecCJK.cpp:244 > + auto characters = string.upconvertedCharacters(); > + for (WTF::CodePointIterator<UChar> iterator(characters.get(), characters.get() + string.length()); !iterator.atEnd(); ++iterator) { I didn’t notice this on previous passes. Why not just do this? for (auto codePoint : string.codePoints()) I don’t understand why it is better to use StringView::upconvertedCharacters and WTF::CodePointIterator<UChar> instead. > Source/WebCore/platform/text/TextCodecCJK.cpp:262 > + auto characters = string.upconvertedCharacters(); > + for (WTF::CodePointIterator<UChar> iterator(characters.get(), characters.get() + string.length()); !iterator.atEnd(); ++iterator) { Ditto. > Source/WebCore/platform/text/TextCodecCJK.h:40 > enum class Encoding : uint8_t { > EUC_JP, > + ISO2022JP, > + Shift_JIS, > Big5 > }; I guess you don’t agree, but I find that "all on one line" nicer for simple enumerations like this one. > Source/WebCore/platform/text/TextCodecCJK.h:60 > + enum class ISO2022JPEncoderState : uint8_t { > + ASCII, > + Roman, > + jis0208 > + }; Ditto.
Looks like I've got a few regressions to fix still. Will fix before landing. (In reply to Darin Adler from comment #2) > Comment on attachment 407712 [details] > Patch > > View in context: > https://bugs.webkit.org/attachment.cgi?id=407712&action=review > > > Source/WebCore/platform/text/TextCodecCJK.cpp:244 > > + auto characters = string.upconvertedCharacters(); > > + for (WTF::CodePointIterator<UChar> iterator(characters.get(), characters.get() + string.length()); !iterator.atEnd(); ++iterator) { > > I didn’t notice this on previous passes. Why not just do this? > > for (auto codePoint : string.codePoints()) > > I don’t understand why it is better to use StringView::upconvertedCharacters > and WTF::CodePointIterator<UChar> instead. StringView::CodePoints::Iterator has to check whether the StringView is 8-bit or 16-bit every time it increments to the next code unit. It also stores a pointer to the StringView which has a pointer to the code units, so there is lots of dereferencing. We should probably optimize it, but here CodePointIterator is faster. StringView::upconvertedCharacters is probably free if you are using CJK encodings because the string almost certainly contains at least one non-Latin1 code point. > > Source/WebCore/platform/text/TextCodecCJK.h:40 > > enum class Encoding : uint8_t { > > EUC_JP, > > + ISO2022JP, > > + Shift_JIS, > > Big5 > > }; > > I guess you don’t agree, but I find that "all on one line" nicer for simple > enumerations like this one. This one is being incrementally added to. This allows the version control history to be nicer. > > Source/WebCore/platform/text/TextCodecCJK.h:60 > > + enum class ISO2022JPEncoderState : uint8_t { > > + ASCII, > > + Roman, > > + jis0208 > > + }; > > Ditto. This one should probably be all on one line because it is simple and finished.
Comment on attachment 407712 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=407712&action=review > Source/WebCore/platform/text/TextCodecCJK.cpp:164 > + auto statefulUnencodableHandler = [&] (UChar32 codePoint, Vector<uint8_t>& result) { > + changeStateToASCIIIfNeeded(); Here I need to not switch the state to ASCII if the state is ISO2022JPEncoderState::Roman. I'll make that change to not regress anything. > Source/WebCore/platform/text/TextCodecCJK.cpp:169 > + if ((m_iso2022JPEncoderState == ISO2022JPEncoderState::ASCII || m_iso2022JPEncoderState == ISO2022JPEncoderState::Roman) && (codePoint == 0x000E || codePoint == 0x000F || codePoint == 0x001B)) { It looks like this line of the specification does not match what Chrome, Firefox, and Safari all do with these code points. I'll remove this statement to match behavior of browsers.
Created attachment 407720 [details] Patch
Committed r266448: <https://trac.webkit.org/changeset/266448> All reviewed patches have been landed. Closing bug and clearing flags on attachment 407720 [details].
<rdar://problem/68182742>