WebKit Bugzilla
Attachment 368529 Details for Bug 195535: WebKit has too much of its own UTF-8 code and should rely more on ICU's UTF-8 support
[patch]
Patch
bug-195535-20190429190303.patch (text/plain), 69.53 KB, created by Darin Adler on 2019-04-29 19:03:07 PDT
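A reading aid for the JSC::decode hunk in this patch: the removed lines build a UTF-16 surrogate pair by hand, and the added lines call ICU's U16_LEAD and U16_TRAIL instead. The sketch below is plain C++ with no ICU dependency and checks that the two computations agree. The helper names are hypothetical and exist only for illustration; the "ICU style" expressions are written to mirror the arithmetic of the macros in ICU's utf16.h and should be read as an assumption about the library, not a copy of it.

```cpp
#include <cassert>

// Hypothetical helpers, not part of WebKit or ICU.

// Same arithmetic as the hand-written code the patch removes from JSC::decode.
constexpr char16_t leadSurrogateByHand(char32_t c)
{
    return static_cast<char16_t>(0xD800 | ((c - 0x10000) >> 10));
}

constexpr char16_t trailSurrogateByHand(char32_t c)
{
    return static_cast<char16_t>(0xDC00 | ((c - 0x10000) & 0x3FF));
}

// Arithmetic in the style of ICU's U16_LEAD and U16_TRAIL (assumed, see lead-in).
constexpr char16_t leadSurrogateICUStyle(char32_t c)
{
    return static_cast<char16_t>((c >> 10) + 0xD7C0);
}

constexpr char16_t trailSurrogateICUStyle(char32_t c)
{
    return static_cast<char16_t>((c & 0x3FF) | 0xDC00);
}

// Walks every supplementary code point (U+10000 through U+10FFFF) and
// reports whether both forms produce the same surrogate pair.
inline bool surrogateFormsAgree()
{
    for (char32_t c = 0x10000; c <= 0x10FFFF; ++c) {
        if (leadSurrogateByHand(c) != leadSurrogateICUStyle(c))
            return false;
        if (trailSurrogateByHand(c) != trailSurrogateICUStyle(c))
            return false;
    }
    return true;
}

// Spot check with U+1F600, whose UTF-16 encoding is D83D DE00.
inline void selfCheck()
{
    assert(leadSurrogateByHand(0x1F600) == 0xD83D);
    assert(trailSurrogateByHand(0x1F600) == 0xDE00);
    assert(surrogateFormsAgree());
}
```

Both forms are only meaningful for supplementary code points, which is why the patch guards the ICU calls with an assertion on U_IS_SUPPLEMENTARY.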
>Subversion Revision: 244762 >diff --git a/Source/JavaScriptCore/ChangeLog b/Source/JavaScriptCore/ChangeLog >index d0b0abac68bb43ae2556fd0357216ada96dbead2..8e516fb25eb1417af15c54b19b297894340780a8 100644 >--- a/Source/JavaScriptCore/ChangeLog >+++ b/Source/JavaScriptCore/ChangeLog >@@ -1,3 +1,28 @@ >+2019-04-29 Darin Adler <darin@apple.com> >+ >+ WebKit has too much of its own UTF-8 code and should rely more on ICU's UTF-8 support >+ https://bugs.webkit.org/show_bug.cgi?id=195535 >+ >+ Reviewed by NOBODY (OOPS!). >+ >+ * API/JSClassRef.cpp: Removed unneeded include of UTF8Conversion.h. >+ >+ * API/JSStringRef.cpp: >+ (JSStringCreateWithUTF8CString): Updated for changes to convertUTF8ToUTF16. >+ (JSStringGetUTF8CString): Updated for changes to convertLatin1ToUTF8. >+ Removed unneeded "true" to get the strict version of convertUTF16ToUTF8, >+ since that is the default. Also updated for changes to ConversionResult. >+ >+ * runtime/JSGlobalObjectFunctions.cpp: >+ (JSC::decode): Stop using UTF8SequenceLength, and instead use U8_COUNT_TRAIL_BYTES >+ and U8_MAX_LENGTH. Instead of decodeUTF8Sequence, use U8_NEXT. Also use U_IS_BMP, >+ U_IS_SUPPLEMENTARY, U16_LEAD, U16_TRAIL, and U_IS_SURROGATE instead of our own >+ equivalents, since these macros from ICU are correct and efficient. >+ >+ * wasm/WasmParser.h: >+ (JSC::Wasm::Parser<SuccessType>::consumeUTF8String): Updated for changes to >+ convertUTF8ToUTF16.
>+ > 2019-04-29 Yusuke Suzuki <ysuzuki@apple.com> > > normalizeMapKey should normalize NaN to one PureNaN bit pattern to make MapHash same >diff --git a/Source/WTF/ChangeLog b/Source/WTF/ChangeLog >index 83599a922aed400447c99120798be0e01a1fbdee..e13d85d2d25a588ab1bb226c90f2193fcc70913d 100644 >--- a/Source/WTF/ChangeLog >+++ b/Source/WTF/ChangeLog >@@ -1,3 +1,73 @@ >+2019-04-29 Darin Adler <darin@apple.com> >+ >+ WebKit has too much of its own UTF-8 code and should rely more on ICU's UTF-8 support >+ https://bugs.webkit.org/show_bug.cgi?id=195535 >+ >+ Reviewed by NOBODY (OOPS!). >+ >+ * wtf/text/AtomicString.cpp: >+ (WTF::AtomicString::fromUTF8Internal): Added code to compute string length when the >+ end is nullptr; this behavior used to be implemented inside the >+ calculateStringHashAndLengthFromUTF8MaskingTop8Bits function. >+ >+ * wtf/text/AtomicStringImpl.cpp: >+ (WTF::HashAndUTF8CharactersTranslator::translate): Updated for change to >+ convertUTF8ToUTF16. >+ >+ * wtf/text/AtomicStringImpl.h: Took the WTF_EXPORT_PRIVATE off of the >+ AtomicStringImpl::addUTF8 function. This is used only inside a non-inlined function in >+ the AtomicString class and its behavior changed subtly in this patch; it's helpful >+ to document that it's not exported. >+ >+ * wtf/text/StringImpl.cpp: >+ (WTF::StringImpl::utf8Impl): Don't pass "true" for strictness to convertUTF16ToUTF8 >+ since strict is the default. Also updated for changes to ConversionResult. >+ (WTF::StringImpl::utf8ForCharacters): Updated for change to convertLatin1ToUTF8. >+ (WTF::StringImpl::tryGetUtf8ForRange const): Ditto. >+ >+ * wtf/text/StringView.cpp: Removed unneeded include of UTF8Conversion.h. >+ >+ * wtf/text/WTFString.cpp: >+ (WTF::String::fromUTF8): Updated for change to convertUTF8ToUTF16. >+ >+ * wtf/unicode/UTF8Conversion.cpp: >+ (WTF::Unicode::inlineUTF8SequenceLengthNonASCII): Deleted. >+ (WTF::Unicode::inlineUTF8SequenceLength): Deleted. >+ (WTF::Unicode::UTF8SequenceLength): Deleted.
>+ (WTF::Unicode::decodeUTF8Sequence): Deleted. >+ (WTF::Unicode::convertLatin1ToUTF8): Use U8_APPEND, enabling us to remove >+ almost everything in the function. Also changed return value to be a boolean >+ to indicate success since there is only one possible failure (target exhausted). >+ There is room for further simplification, since most callers have lengths rather >+ than end pointers for the source buffer, and all but one caller supplies a buffer >+ size known to be sufficient, so those don't need a return value, nor do they need >+ to pass an end of buffer pointer. >+ (WTF::Unicode::convertUTF16ToUTF8): Use U_IS_LEAD, U_IS_TRAIL, >+ U16_GET_SUPPLEMENTARY, U_IS_SURROGATE, and U8_APPEND. Also changed behavior >+ for non-strict mode so that unpaired surrogates will be turned into the >+ replacement character instead of invalid UTF-8 sequences, because U8_APPEND >+ won't create an invalid UTF-8 sequence, and because we don't need to do that >+ for any good reason at any call site. >+ (WTF::Unicode::isLegalUTF8): Deleted. >+ (WTF::Unicode::readUTF8Sequence): Deleted. >+ (WTF::Unicode::convertUTF8ToUTF16): Use U8_NEXT instead of >+ inlineUTF8SequenceLength, isLegalUTF8, and readUTF8Sequence. Use >+ U16_APPEND instead of lots of code that does the same thing. There is >+ room for further simplification since most callers don't need the "all ASCII" >+ feature and could probably pass the arguments in a more natural way. >+ (WTF::Unicode::calculateStringHashAndLengthFromUTF8MaskingTop8Bits): >+ Use U8_NEXT instead of isLegalUTF8, readUTF8Sequence, and various >+ error handling checks for things that are handled by U8_NEXT. Also removed >+ support for passing nullptr for end to specify a null-terminated string. >+ (WTF::Unicode::equalUTF16WithUTF8): Ditto. >+ >+ * wtf/unicode/UTF8Conversion.h: Removed UTF8SequenceLength and >+ decodeUTF8Sequence. Changed the ConversionResult to match WebKit coding >+ style, with an eye toward perhaps removing it in the future.
Changed >+ the convertUTF8ToUTF16 return value to a boolean and removed the "strict" >+ argument since no caller was passing false. Changed the convertLatin1ToUTF8 >+ return value to a boolean. Tweaked comments. >+ > 2019-04-29 Alex Christensen <achristensen@webkit.org> > > <rdar://problem/50299396> Fix internal High Sierra build >diff --git a/Source/WebCore/ChangeLog b/Source/WebCore/ChangeLog >index c0eea3e665eecc17a43323b8a922f2c3db0e7aa0..950246c40c503ceba267618748a512dc7e75f9b0 100644 >--- a/Source/WebCore/ChangeLog >+++ b/Source/WebCore/ChangeLog >@@ -1,3 +1,21 @@ >+2019-04-29 Darin Adler <darin@apple.com> >+ >+ WebKit has too much of its own UTF-8 code and should rely more on ICU's UTF-8 support >+ https://bugs.webkit.org/show_bug.cgi?id=195535 >+ >+ Reviewed by NOBODY (OOPS!). >+ >+ * platform/SharedBuffer.cpp: >+ (WebCore::utf8Buffer): Removed unnecessary "strict" argument to convertUTF16ToUTF8 since >+ that is the default behavior. Also updated for changes to return values. >+ >+ * xml/XSLTProcessorLibxslt.cpp: >+ (WebCore::writeToStringBuilder): Removed unnecessary use of StringBuffer for a temporary >+ buffer for characters. Rewrote to use U8_NEXT and U16_APPEND directly. >+ >+ * xml/parser/XMLDocumentParserLibxml2.cpp: >+ (WebCore::convertUTF16EntityToUTF8): Updated for changes to ConversionResult. >+ > 2019-04-29 Truitt Savell <tsavell@apple.com> > > Unreviewed, rolling out r244755. >diff --git a/Source/WebKit/ChangeLog b/Source/WebKit/ChangeLog >index b20332c1a574a9ba6d271c9ddf4a7cd48b53b9fb..b8031c5d03e4e7c5369f7bff4c55bd07b815f87e 100644 >--- a/Source/WebKit/ChangeLog >+++ b/Source/WebKit/ChangeLog >@@ -1,3 +1,15 @@ >+2019-04-29 Darin Adler <darin@apple.com> >+ >+ WebKit has too much of its own UTF-8 code and should rely more on ICU's UTF-8 support >+ https://bugs.webkit.org/show_bug.cgi?id=195535 >+ >+ Reviewed by NOBODY (OOPS!). >+ >+ * Shared/API/APIString.h: Removed unneeded includes and also switched to #pragma once.
>+ >+ * Shared/API/c/WKString.cpp: Moved include of UTF8Conversion.h here. >+ (WKStringGetUTF8CStringImpl): Updated for changes to return values. >+ > 2019-04-29 Truitt Savell <tsavell@apple.com> > > Unreviewed, rolling out r244755. >diff --git a/Source/JavaScriptCore/API/JSClassRef.cpp b/Source/JavaScriptCore/API/JSClassRef.cpp >index 4cc97de40f3baca6973ca80be47dc907e9676496..d4583d8e16515d614a59c9acc383a3cc111f60ad 100644 >--- a/Source/JavaScriptCore/API/JSClassRef.cpp >+++ b/Source/JavaScriptCore/API/JSClassRef.cpp >@@ -35,10 +35,8 @@ > #include "ObjectPrototype.h" > #include "JSCInlines.h" > #include <wtf/text/StringHash.h> >-#include <wtf/unicode/UTF8Conversion.h> > > using namespace JSC; >-using namespace WTF::Unicode; > > const JSClassDefinition kJSClassDefinitionEmpty = { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 }; > >diff --git a/Source/JavaScriptCore/API/JSStringRef.cpp b/Source/JavaScriptCore/API/JSStringRef.cpp >index 04d1ca73d3eabfcc062d5f7628e16ae05965c6ec..f5cb875c3d2edbea7654528040ddc37cd492955c 100644 >--- a/Source/JavaScriptCore/API/JSStringRef.cpp >+++ b/Source/JavaScriptCore/API/JSStringRef.cpp >@@ -49,7 +49,7 @@ JSStringRef JSStringCreateWithUTF8CString(const char* string) > UChar* p = buffer.data(); > bool sourceIsAllASCII; > const LChar* stringStart = reinterpret_cast<const LChar*>(string); >- if (conversionOK == convertUTF8ToUTF16(&string, string + length, &p, p + length, &sourceIsAllASCII)) { >+ if (convertUTF8ToUTF16(string, string + length, &p, p + length, &sourceIsAllASCII)) { > if (sourceIsAllASCII) > return &OpaqueJSString::create(stringStart, length).leakRef(); > return &OpaqueJSString::create(buffer.data(), p - buffer.data()).leakRef(); >@@ -102,20 +102,18 @@ size_t JSStringGetUTF8CString(JSStringRef string, char* buffer, size_t bufferSiz > return 0; > > char* destination = buffer; >- ConversionResult result; >+ bool failed = false; > if (string->is8Bit()) { > const LChar* source = string->characters8(); >- result = 
convertLatin1ToUTF8(&source, source + string->length(), &destination, destination + bufferSize - 1); >+ convertLatin1ToUTF8(&source, source + string->length(), &destination, destination + bufferSize - 1); > } else { > const UChar* source = string->characters16(); >- result = convertUTF16ToUTF8(&source, source + string->length(), &destination, destination + bufferSize - 1, true); >+ auto result = convertUTF16ToUTF8(&source, source + string->length(), &destination, destination + bufferSize - 1); >+ failed = result != ConversionOK && result != TargetExhausted; > } > > *destination++ = '\0'; >- if (result != conversionOK && result != targetExhausted) >- return 0; >- >- return destination - buffer; >+ return failed ? 0 : destination - buffer; > } > > bool JSStringIsEqual(JSStringRef a, JSStringRef b) >diff --git a/Source/JavaScriptCore/runtime/JSGlobalObjectFunctions.cpp b/Source/JavaScriptCore/runtime/JSGlobalObjectFunctions.cpp >index 8edf49f1c843a179221f68cf4201dcb00b8af88e..4c37285306798642418117be69a316b0fc15f29d 100644 >--- a/Source/JavaScriptCore/runtime/JSGlobalObjectFunctions.cpp >+++ b/Source/JavaScriptCore/runtime/JSGlobalObjectFunctions.cpp >@@ -58,12 +58,9 @@ > #include <wtf/MathExtras.h> > #include <wtf/dtoa.h> > #include <wtf/text/StringBuilder.h> >-#include <wtf/unicode/UTF8Conversion.h> > > namespace JSC { > >-using namespace WTF::Unicode; >- > const ASCIILiteral ObjectProtoCalledOnNullOrUndefinedError { "Object.prototype.__proto__ called on null or undefined"_s }; > > template<unsigned charactersCount> >@@ -184,10 +181,10 @@ static JSValue decode(ExecState* exec, const CharType* characters, int length, c > int charLen = 0; > if (k <= length - 3 && isASCIIHexDigit(p[1]) && isASCIIHexDigit(p[2])) { > const char b0 = Lexer<CharType>::convertHex(p[1], p[2]); >- const int sequenceLen = UTF8SequenceLength(b0); >- if (sequenceLen && k <= length - sequenceLen * 3) { >+ const int sequenceLen = 1 + U8_COUNT_TRAIL_BYTES(b0); >+ if (k <= length - sequenceLen * 3) 
{ > charLen = sequenceLen * 3; >- char sequence[5]; >+ uint8_t sequence[U8_MAX_LENGTH]; > sequence[0] = b0; > for (int i = 1; i < sequenceLen; ++i) { > const CharType* q = p + i * 3; >@@ -199,16 +196,20 @@ static JSValue decode(ExecState* exec, const CharType* characters, int length, c > } > } > if (charLen != 0) { >- sequence[sequenceLen] = 0; >- const int character = decodeUTF8Sequence(sequence); >- if (character < 0 || character >= 0x110000) >+ UChar32 character; >+ int32_t offset = 0; >+ U8_NEXT(sequence, offset, sequenceLen, character); >+ if (character < 0) > charLen = 0; >- else if (character >= 0x10000) { >+ else if (!U_IS_BMP(character)) { > // Convert to surrogate pair. >- builder.append(static_cast<UChar>(0xD800 | ((character - 0x10000) >> 10))); >- u = static_cast<UChar>(0xDC00 | ((character - 0x10000) & 0x3FF)); >- } else >+ ASSERT(U_IS_SUPPLEMENTARY(character)); >+ builder.append(U16_LEAD(character)); >+ u = U16_TRAIL(character); >+ } else { >+ ASSERT(!U_IS_SURROGATE(character)); > u = static_cast<UChar>(character); >+ } > } > } > } >diff --git a/Source/JavaScriptCore/wasm/WasmParser.h b/Source/JavaScriptCore/wasm/WasmParser.h >index fc500c033ac18bb4807ecf7607eaef2aa31e27a8..a9744e68b37421c6fedb37930636d5b16e75937a 100644 >--- a/Source/JavaScriptCore/wasm/WasmParser.h >+++ b/Source/JavaScriptCore/wasm/WasmParser.h >@@ -162,7 +162,7 @@ ALWAYS_INLINE bool Parser<SuccessType>::consumeUTF8String(Name& result, size_t s > > UChar* bufferCurrent = bufferStart; > const char* stringCurrent = reinterpret_cast<const char*>(stringStart); >- if (WTF::Unicode::convertUTF8ToUTF16(&stringCurrent, reinterpret_cast<const char *>(stringStart + stringLength), &bufferCurrent, bufferCurrent + buffer.size()) != WTF::Unicode::conversionOK) >+ if (!WTF::Unicode::convertUTF8ToUTF16(stringCurrent, reinterpret_cast<const char *>(stringStart + stringLength), &bufferCurrent, bufferCurrent + buffer.size())) > return false; > } > >diff --git a/Source/WTF/wtf/text/AtomicString.cpp 
b/Source/WTF/wtf/text/AtomicString.cpp >index fba7c9a774072869f58549d3a5117aea896d7bf7..ffdca7f9a76c0ca6df7c7a6d9c3912e7a57daad1 100644 >--- a/Source/WTF/wtf/text/AtomicString.cpp >+++ b/Source/WTF/wtf/text/AtomicString.cpp >@@ -113,19 +113,24 @@ AtomicString AtomicString::number(double number) > return numberToString(number, buffer); > } > >-AtomicString AtomicString::fromUTF8Internal(const char* charactersStart, const char* charactersEnd) >+AtomicString AtomicString::fromUTF8Internal(const char* start, const char* end) > { >- auto impl = AtomicStringImpl::addUTF8(charactersStart, charactersEnd); >- if (!impl) >- return nullAtom(); >- return impl.get(); >+ ASSERT(start); >+ >+ // Caller needs to handle empty string. >+ ASSERT(!end || end > start); >+ ASSERT(end || start[0]); >+ >+ return AtomicStringImpl::addUTF8(start, end ? end : start + std::strlen(start)); > } > > #ifndef NDEBUG >+ > void AtomicString::show() const > { > m_string.show(); > } >+ > #endif > > WTF_EXPORT_PRIVATE LazyNeverDestroyed<AtomicString> nullAtomData; >diff --git a/Source/WTF/wtf/text/AtomicStringImpl.cpp b/Source/WTF/wtf/text/AtomicStringImpl.cpp >index 14d82d2cc1cd066d7c44de37d9cab1fdbe9a7d91..601feb5fd33130bec46050236e2ec41d809c4fef 100644 >--- a/Source/WTF/wtf/text/AtomicStringImpl.cpp >+++ b/Source/WTF/wtf/text/AtomicStringImpl.cpp >@@ -219,7 +219,7 @@ struct HashAndUTF8CharactersTranslator { > > bool isAllASCII; > const char* source = buffer.characters; >- if (convertUTF8ToUTF16(&source, source + buffer.length, &target, target + buffer.utf16Length, &isAllASCII) != conversionOK) >+ if (!convertUTF8ToUTF16(source, source + buffer.length, &target, target + buffer.utf16Length, &isAllASCII)) > ASSERT_NOT_REACHED(); > > if (isAllASCII) >diff --git a/Source/WTF/wtf/text/AtomicStringImpl.h b/Source/WTF/wtf/text/AtomicStringImpl.h >index d84623c9c8ffbb55c2541a9dabb83bde2402b055..36a04f4c4b8c8c1518b18796168289df8a384ff2 100644 >--- a/Source/WTF/wtf/text/AtomicStringImpl.h >+++ 
b/Source/WTF/wtf/text/AtomicStringImpl.h >@@ -56,7 +56,8 @@ public: > WTF_EXPORT_PRIVATE static Ref<AtomicStringImpl> addLiteral(const char* characters, unsigned length); > > // Returns null if the input data contains an invalid UTF-8 sequence. >- WTF_EXPORT_PRIVATE static RefPtr<AtomicStringImpl> addUTF8(const char* start, const char* end); >+ static RefPtr<AtomicStringImpl> addUTF8(const char* start, const char* end); >+ > #if USE(CF) > WTF_EXPORT_PRIVATE static RefPtr<AtomicStringImpl> add(CFStringRef); > #endif >diff --git a/Source/WTF/wtf/text/StringImpl.cpp b/Source/WTF/wtf/text/StringImpl.cpp >index dd325ca0b3375faac065748afc4c6e5acfe070ec..0d80c32c7485d8f9c62934bb36efddefa475dde4 100644 >--- a/Source/WTF/wtf/text/StringImpl.cpp >+++ b/Source/WTF/wtf/text/StringImpl.cpp >@@ -1756,11 +1756,11 @@ UTF8ConversionError StringImpl::utf8Impl(const UChar* characters, unsigned lengt > char* bufferEnd = buffer + bufferSize; > while (characters < charactersEnd) { > // Use strict conversion to detect unpaired surrogates. >- ConversionResult result = convertUTF16ToUTF8(&characters, charactersEnd, &buffer, bufferEnd, true); >- ASSERT(result != targetExhausted); >+ auto result = convertUTF16ToUTF8(&characters, charactersEnd, &buffer, bufferEnd); >+ ASSERT(result != TargetExhausted); > // Conversion fails when there is an unpaired surrogate. > // Put replacement character (U+FFFD) instead of the unpaired surrogate. >- if (result != conversionOK) { >+ if (result != ConversionOK) { > ASSERT((0xD800 <= *characters && *characters <= 0xDFFF)); > // There should be room left, since one UChar hasn't been converted. 
> ASSERT((buffer + 3) <= bufferEnd); >@@ -1771,17 +1771,17 @@ UTF8ConversionError StringImpl::utf8Impl(const UChar* characters, unsigned lengt > } else { > bool strict = mode == StrictConversion; > const UChar* originalCharacters = characters; >- ConversionResult result = convertUTF16ToUTF8(&characters, characters + length, &buffer, buffer + bufferSize, strict); >- ASSERT(result != targetExhausted); // (length * 3) should be sufficient for any conversion >+ auto result = convertUTF16ToUTF8(&characters, characters + length, &buffer, buffer + bufferSize, strict); >+ ASSERT(result != TargetExhausted); // (length * 3) should be sufficient for any conversion > > // Only produced from strict conversion. >- if (result == sourceIllegal) { >+ if (result == SourceIllegal) { > ASSERT(strict); > return UTF8ConversionError::IllegalSource; > } > > // Check for an unconverted high surrogate. >- if (result == sourceExhausted) { >+ if (result == SourceExhausted) { > if (strict) > return UTF8ConversionError::SourceExhausted; > // This should be one unpaired high surrogate. 
Treat it the same >@@ -1809,8 +1809,8 @@ Expected<CString, UTF8ConversionError> StringImpl::utf8ForCharacters(const LChar > Vector<char, 1024> bufferVector(length * 3); > char* buffer = bufferVector.data(); > const LChar* source = characters; >- ConversionResult result = convertLatin1ToUTF8(&source, source + length, &buffer, buffer + bufferVector.size()); >- ASSERT_UNUSED(result, result != targetExhausted); // (length * 3) should be sufficient for any conversion >+ bool success = convertLatin1ToUTF8(&source, source + length, &buffer, buffer + bufferVector.size()); >+ ASSERT_UNUSED(success, success); // (length * 3) should be sufficient for any conversion > return CString(bufferVector.data(), buffer - bufferVector.data()); > } > >@@ -1854,9 +1854,8 @@ Expected<CString, UTF8ConversionError> StringImpl::tryGetUtf8ForRange(unsigned o > > if (is8Bit()) { > const LChar* characters = this->characters8() + offset; >- >- ConversionResult result = convertLatin1ToUTF8(&characters, characters + length, &buffer, buffer + bufferVector.size()); >- ASSERT_UNUSED(result, result != targetExhausted); // (length * 3) should be sufficient for any conversion >+ auto success = convertLatin1ToUTF8(&characters, characters + length, &buffer, buffer + bufferVector.size()); >+ ASSERT_UNUSED(success, success); // (length * 3) should be sufficient for any conversion > } else { > UTF8ConversionError error = utf8Impl(this->characters16() + offset, length, buffer, bufferVector.size(), mode); > if (error != UTF8ConversionError::None) >diff --git a/Source/WTF/wtf/text/StringView.cpp b/Source/WTF/wtf/text/StringView.cpp >index ee16b9cd0122c4469ce23fc714bc660d4d86c61f..78b5e2f5ad3f8fb2f746d23ddf28d0caad811ea3 100644 >--- a/Source/WTF/wtf/text/StringView.cpp >+++ b/Source/WTF/wtf/text/StringView.cpp >@@ -35,12 +35,9 @@ SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
> #include <wtf/NeverDestroyed.h> > #include <wtf/Optional.h> > #include <wtf/text/TextBreakIterator.h> >-#include <wtf/unicode/UTF8Conversion.h> > > namespace WTF { > >-using namespace Unicode; >- > bool StringView::containsIgnoringASCIICase(const StringView& matchString) const > { > return findIgnoringASCIICase(matchString) != notFound; >diff --git a/Source/WTF/wtf/text/WTFString.cpp b/Source/WTF/wtf/text/WTFString.cpp >index dcd42cfb8f10d5a8574d152e0f2392de33d8780e..f5acba8374cba236f86eec5e7b7400b6bb6bb360 100644 >--- a/Source/WTF/wtf/text/WTFString.cpp >+++ b/Source/WTF/wtf/text/WTFString.cpp >@@ -859,7 +859,7 @@ String String::fromUTF8(const LChar* stringStart, size_t length) > > UChar* bufferCurrent = bufferStart; > const char* stringCurrent = reinterpret_cast<const char*>(stringStart); >- if (convertUTF8ToUTF16(&stringCurrent, reinterpret_cast<const char *>(stringStart + length), &bufferCurrent, bufferCurrent + buffer.size()) != conversionOK) >+ if (!convertUTF8ToUTF16(stringCurrent, reinterpret_cast<const char *>(stringStart + length), &bufferCurrent, bufferCurrent + buffer.size())) > return String(); > > unsigned utf16Length = bufferCurrent - bufferStart; >diff --git a/Source/WTF/wtf/unicode/UTF8Conversion.cpp b/Source/WTF/wtf/unicode/UTF8Conversion.cpp >index 6a4ee831573f2565ac6607804fac1f87006074e0..06eb92615cc5221f00d46fae5822b9acac7415ab 100644 >--- a/Source/WTF/wtf/unicode/UTF8Conversion.cpp >+++ b/Source/WTF/wtf/unicode/UTF8Conversion.cpp >@@ -1,5 +1,5 @@ > /* >- * Copyright (C) 2007, 2014 Apple Inc. All rights reserved. >+ * Copyright (C) 2007-2019 Apple Inc. All rights reserved. 
> * Copyright (C) 2010 Patrick Gansterer <paroga@paroga.com> > * > * Redistribution and use in source and binary forms, with or without >@@ -34,425 +34,131 @@ > namespace WTF { > namespace Unicode { > >-inline int inlineUTF8SequenceLengthNonASCII(char b0) >+bool convertLatin1ToUTF8(const LChar** sourceStart, const LChar* sourceEnd, char** targetStart, char* targetEnd) > { >- if ((b0 & 0xC0) != 0xC0) >- return 0; >- if ((b0 & 0xE0) == 0xC0) >- return 2; >- if ((b0 & 0xF0) == 0xE0) >- return 3; >- if ((b0 & 0xF8) == 0xF0) >- return 4; >- return 0; >-} >- >-inline int inlineUTF8SequenceLength(char b0) >-{ >- return isASCII(b0) ? 1 : inlineUTF8SequenceLengthNonASCII(b0); >-} >- >-int UTF8SequenceLength(char b0) >-{ >- return isASCII(b0) ? 1 : inlineUTF8SequenceLengthNonASCII(b0); >-} >- >-int decodeUTF8Sequence(const char* sequence) >-{ >- // Handle 0-byte sequences (never valid). >- const unsigned char b0 = sequence[0]; >- const int length = inlineUTF8SequenceLength(b0); >- if (length == 0) >- return -1; >- >- // Handle 1-byte sequences (plain ASCII). >- const unsigned char b1 = sequence[1]; >- if (length == 1) { >- if (b1) >- return -1; >- return b0; >- } >- >- // Handle 2-byte sequences. >- if ((b1 & 0xC0) != 0x80) >- return -1; >- const unsigned char b2 = sequence[2]; >- if (length == 2) { >- if (b2) >- return -1; >- const int c = ((b0 & 0x1F) << 6) | (b1 & 0x3F); >- if (c < 0x80) >- return -1; >- return c; >- } >- >- // Handle 3-byte sequences. >- if ((b2 & 0xC0) != 0x80) >- return -1; >- const unsigned char b3 = sequence[3]; >- if (length == 3) { >- if (b3) >- return -1; >- const int c = ((b0 & 0xF) << 12) | ((b1 & 0x3F) << 6) | (b2 & 0x3F); >- if (c < 0x800) >- return -1; >- // UTF-16 surrogates should never appear in UTF-8 data. >- if (c >= 0xD800 && c <= 0xDFFF) >- return -1; >- return c; >- } >- >- // Handle 4-byte sequences. 
>- if ((b3 & 0xC0) != 0x80) >- return -1; >- const unsigned char b4 = sequence[4]; >- if (length == 4) { >- if (b4) >- return -1; >- const int c = ((b0 & 0x7) << 18) | ((b1 & 0x3F) << 12) | ((b2 & 0x3F) << 6) | (b3 & 0x3F); >- if (c < 0x10000 || c > 0x10FFFF) >- return -1; >- return c; >- } >- >- return -1; >-} >- >-// Once the bits are split out into bytes of UTF-8, this is a mask OR-ed >-// into the first byte, depending on how many bytes follow. There are >-// as many entries in this table as there are UTF-8 sequence types. >-// (I.e., one byte sequence, two byte... etc.). Remember that sequencs >-// for *legal* UTF-8 will be 4 or fewer bytes total. >-static const unsigned char firstByteMark[7] = { 0x00, 0x00, 0xC0, 0xE0, 0xF0, 0xF8, 0xFC }; >- >-ConversionResult convertLatin1ToUTF8( >- const LChar** sourceStart, const LChar* sourceEnd, >- char** targetStart, char* targetEnd) >-{ >- ConversionResult result = conversionOK; >- const LChar* source = *sourceStart; >+ const LChar* source; > char* target = *targetStart; >- while (source < sourceEnd) { >- UChar32 ch; >- unsigned short bytesToWrite = 0; >- const UChar32 byteMask = 0xBF; >- const UChar32 byteMark = 0x80; >- const LChar* oldSource = source; // In case we have to back up because of target overflow. >- ch = static_cast<unsigned short>(*source++); >- >- // Figure out how many bytes the result will require >- if (ch < (UChar32)0x80) >- bytesToWrite = 1; >- else >- bytesToWrite = 2; >- >- target += bytesToWrite; >- if (target > targetEnd) { >- source = oldSource; // Back up source pointer! >- target -= bytesToWrite; >- result = targetExhausted; >- break; >- } >- switch (bytesToWrite) { // note: everything falls through. 
>- case 2: >- *--target = (char)((ch | byteMark) & byteMask); >- ch >>= 6; >- FALLTHROUGH; >- case 1: >- *--target = (char)(ch | firstByteMark[bytesToWrite]); >- } >- target += bytesToWrite; >+ unsigned i = 0; >+ for (source = *sourceStart; source < sourceEnd; ++source) { >+ UBool sawError = false; >+ // Work around bug in either Windows compiler or old version of ICU, where passing a uint8_t to >+ // U8_APPEND warns, by converting from uint8_t to a wider type. >+ UChar32 character = *source; >+ U8_APPEND(reinterpret_cast<uint8_t*>(target), i, targetEnd - *targetStart, character, sawError); >+ if (sawError) >+ return false; > } > *sourceStart = source; >- *targetStart = target; >- return result; >+ *targetStart = target + i; >+ return true; > } > >-ConversionResult convertUTF16ToUTF8( >- const UChar** sourceStart, const UChar* sourceEnd, >- char** targetStart, char* targetEnd, bool strict) >+ConversionResult convertUTF16ToUTF8(const UChar** sourceStart, const UChar* sourceEnd, char** targetStart, char* targetEnd, bool strict) > { >- ConversionResult result = conversionOK; >+ ConversionResult result = ConversionOK; > const UChar* source = *sourceStart; > char* target = *targetStart; >+ UBool sawError = false; >+ unsigned i = 0; > while (source < sourceEnd) { > UChar32 ch; >- unsigned short bytesToWrite = 0; >- const UChar32 byteMask = 0xBF; >- const UChar32 byteMark = 0x80; >- const UChar* oldSource = source; // In case we have to back up because of target overflow. >- ch = static_cast<unsigned short>(*source++); >- // If we have a surrogate pair, convert to UChar32 first. >- if (ch >= 0xD800 && ch <= 0xDBFF) { >- // If the 16 bits following the high surrogate are in the source buffer... >- if (source < sourceEnd) { >- UChar32 ch2 = static_cast<unsigned short>(*source); >- // If it's a low surrogate, convert to UChar32.
>- if (ch2 >= 0xDC00 && ch2 <= 0xDFFF) { >- ch = ((ch - 0xD800) << 10) + (ch2 - 0xDC00) + 0x0010000; >- ++source; >- } else if (strict) { // it's an unpaired high surrogate >- --source; // return to the illegal value itself >- result = sourceIllegal; >- break; >- } >- } else { // We don't have the 16 bits following the high surrogate. >- --source; // return to the high surrogate >- result = sourceExhausted; >+ int j = 0; >+ U16_NEXT(source, j, sourceEnd - source, ch); >+ if (U_IS_SURROGATE(ch)) { >+ if (source + j == sourceEnd && U_IS_SURROGATE_LEAD(ch)) { >+ result = SourceExhausted; > break; > } >- } else if (strict) { >- // UTF-16 surrogate values are illegal in UTF-32 >- if (ch >= 0xDC00 && ch <= 0xDFFF) { >- --source; // return to the illegal value itself >- result = sourceIllegal; >+ if (strict) { >+ result = SourceIllegal; > break; > } >- } >- // Figure out how many bytes the result will require >- if (ch < (UChar32)0x80) { >- bytesToWrite = 1; >- } else if (ch < (UChar32)0x800) { >- bytesToWrite = 2; >- } else if (ch < (UChar32)0x10000) { >- bytesToWrite = 3; >- } else if (ch < (UChar32)0x110000) { >- bytesToWrite = 4; >- } else { >- bytesToWrite = 3; > ch = replacementCharacter; > } >- >- target += bytesToWrite; >- if (target > targetEnd) { >- source = oldSource; // Back up source pointer! >- target -= bytesToWrite; >- result = targetExhausted; >+ U8_APPEND(reinterpret_cast<uint8_t*>(target), i, targetEnd - target, ch, sawError); >+ if (sawError) { >+ result = TargetExhausted; > break; > } >- switch (bytesToWrite) { // note: everything falls through. 
>- case 4: *--target = (char)((ch | byteMark) & byteMask); ch >>= 6; FALLTHROUGH; >- case 3: *--target = (char)((ch | byteMark) & byteMask); ch >>= 6; FALLTHROUGH; >- case 2: *--target = (char)((ch | byteMark) & byteMask); ch >>= 6; FALLTHROUGH; >- case 1: *--target = (char)(ch | firstByteMark[bytesToWrite]); >- } >- target += bytesToWrite; >+ source += j; > } > *sourceStart = source; >- *targetStart = target; >+ *targetStart = target + i; > return result; > } > >-// This must be called with the length pre-determined by the first byte. >-// If presented with a length > 4, this returns false. The Unicode >-// definition of UTF-8 goes up to 4-byte sequences. >-static bool isLegalUTF8(const unsigned char* source, int length) >-{ >- unsigned char a; >- const unsigned char* srcptr = source + length; >- switch (length) { >- default: return false; >- // Everything else falls through when "true"... >- case 4: if ((a = (*--srcptr)) < 0x80 || a > 0xBF) return false; FALLTHROUGH; >- case 3: if ((a = (*--srcptr)) < 0x80 || a > 0xBF) return false; FALLTHROUGH; >- case 2: if ((a = (*--srcptr)) > 0xBF) return false; >- >- switch (*source) { >- // no fall-through in this inner switch >- case 0xE0: if (a < 0xA0) return false; break; >- case 0xED: if (a > 0x9F) return false; break; >- case 0xF0: if (a < 0x90) return false; break; >- case 0xF4: if (a > 0x8F) return false; break; >- default: if (a < 0x80) return false; >- } >- FALLTHROUGH; >- >- case 1: if (*source >= 0x80 && *source < 0xC2) return false; >- } >- if (*source > 0xF4) >- return false; >- return true; >-} >- >-// Magic values subtracted from a buffer value during UTF8 conversion. >-// This table contains as many values as there might be trailing bytes >-// in a UTF-8 sequence. 
>-static const UChar32 offsetsFromUTF8[6] = { 0x00000000UL, 0x00003080UL, 0x000E2080UL, 0x03C82080UL, static_cast<UChar32>(0xFA082080UL), static_cast<UChar32>(0x82082080UL) }; >- >-static inline UChar32 readUTF8Sequence(const char*& sequence, unsigned length) >-{ >- UChar32 character = 0; >- >- // The cases all fall through. >- switch (length) { >- case 6: character += static_cast<unsigned char>(*sequence++); character <<= 6; FALLTHROUGH; >- case 5: character += static_cast<unsigned char>(*sequence++); character <<= 6; FALLTHROUGH; >- case 4: character += static_cast<unsigned char>(*sequence++); character <<= 6; FALLTHROUGH; >- case 3: character += static_cast<unsigned char>(*sequence++); character <<= 6; FALLTHROUGH; >- case 2: character += static_cast<unsigned char>(*sequence++); character <<= 6; FALLTHROUGH; >- case 1: character += static_cast<unsigned char>(*sequence++); >- } >- >- return character - offsetsFromUTF8[length - 1]; >-} >- >-ConversionResult convertUTF8ToUTF16( >- const char** sourceStart, const char* sourceEnd, >- UChar** targetStart, UChar* targetEnd, bool* sourceAllASCII, bool strict) >+bool convertUTF8ToUTF16(const char* source, const char* sourceEnd, UChar** targetStart, UChar* targetEnd, bool* sourceAllASCII) > { >- ConversionResult result = conversionOK; >- const char* source = *sourceStart; >+ RELEASE_ASSERT(sourceEnd - source <= std::numeric_limits<int>::max()); >+ UBool error = false; > UChar* target = *targetStart; >- UChar orAllData = 0; >- while (source < sourceEnd) { >- int utf8SequenceLength = inlineUTF8SequenceLength(*source); >- if (sourceEnd - source < utf8SequenceLength) { >- result = sourceExhausted; >- break; >- } >- // Do this check whether lenient or strict >- if (!isLegalUTF8(reinterpret_cast<const unsigned char*>(source), utf8SequenceLength)) { >- result = sourceIllegal; >- break; >- } >- >- UChar32 character = readUTF8Sequence(source, utf8SequenceLength); >- >- if (target >= targetEnd) { >- source -= utf8SequenceLength; // 
Back up source pointer! >- result = targetExhausted; >- break; >- } >- >- if (U_IS_BMP(character)) { >- // UTF-16 surrogate values are illegal in UTF-32 >- if (U_IS_SURROGATE(character)) { >- if (strict) { >- source -= utf8SequenceLength; // return to the illegal value itself >- result = sourceIllegal; >- break; >- } else { >- *target++ = replacementCharacter; >- orAllData |= replacementCharacter; >- } >- } else { >- *target++ = character; // normal case >- orAllData |= character; >- } >- } else if (U_IS_SUPPLEMENTARY(character)) { >- // target is a character in range 0xFFFF - 0x10FFFF >- if (target + 1 >= targetEnd) { >- source -= utf8SequenceLength; // Back up source pointer! >- result = targetExhausted; >- break; >- } >- *target++ = U16_LEAD(character); >- *target++ = U16_TRAIL(character); >- orAllData = 0xffff; >- } else { >- if (strict) { >- source -= utf8SequenceLength; // return to the start >- result = sourceIllegal; >- break; // Bail out; shouldn't continue >- } else { >- *target++ = replacementCharacter; >- orAllData |= replacementCharacter; >- } >- } >+ UChar32 orAllData = 0; >+ unsigned targetOffset = 0; >+ for (int sourceOffset = 0; sourceOffset < sourceEnd - source; ) { >+ UChar32 character; >+ U8_NEXT(reinterpret_cast<const uint8_t*>(source), sourceOffset, sourceEnd - source, character); >+ if (character < 0) >+ return false; >+ U16_APPEND(target, targetOffset, targetEnd - target, character, error); >+ if (error) >+ return false; >+ orAllData |= character; > } >- *sourceStart = source; >- *targetStart = target; >- >+ *targetStart = target + targetOffset; > if (sourceAllASCII) >- *sourceAllASCII = !(orAllData & ~0x7f); >- >- return result; >+ *sourceAllASCII = isASCII(orAllData); >+ return true; > } > > unsigned calculateStringHashAndLengthFromUTF8MaskingTop8Bits(const char* data, const char* dataEnd, unsigned& dataLength, unsigned& utf16Length) > { >- if (!data) >- return 0; >- > StringHasher stringHasher; >- dataLength = 0; > utf16Length = 0; > >- 
while (data < dataEnd || (!dataEnd && *data)) { >- if (isASCII(*data)) { >- stringHasher.addCharacter(*data++); >- dataLength++; >- utf16Length++; >- continue; >- } >- >- int utf8SequenceLength = inlineUTF8SequenceLengthNonASCII(*data); >- dataLength += utf8SequenceLength; >- >- if (!dataEnd) { >- for (int i = 1; i < utf8SequenceLength; ++i) { >- if (!data[i]) >- return 0; >- } >- } else if (dataEnd - data < utf8SequenceLength) >+ int inputOffset = 0; >+ int inputLength = dataEnd - data; >+ while (inputOffset < inputLength) { >+ UChar32 character; >+ U8_NEXT(reinterpret_cast<const uint8_t*>(data), inputOffset, inputLength, character); >+ if (character < 0) > return 0; > >- if (!isLegalUTF8(reinterpret_cast<const unsigned char*>(data), utf8SequenceLength)) >- return 0; >- >- UChar32 character = readUTF8Sequence(data, utf8SequenceLength); >- ASSERT(!isASCII(character)); >- > if (U_IS_BMP(character)) { >- // UTF-16 surrogate values are illegal in UTF-32 >- if (U_IS_SURROGATE(character)) >- return 0; >- stringHasher.addCharacter(static_cast<UChar>(character)); // normal case >+ ASSERT(!U_IS_SURROGATE(character)); >+ stringHasher.addCharacter(character); > utf16Length++; >- } else if (U_IS_SUPPLEMENTARY(character)) { >- stringHasher.addCharacters(static_cast<UChar>(U16_LEAD(character)), >- static_cast<UChar>(U16_TRAIL(character))); >+ } else { >+ ASSERT(U_IS_SUPPLEMENTARY(character)); >+ stringHasher.addCharacters(U16_LEAD(character), U16_TRAIL(character)); > utf16Length += 2; >- } else >- return 0; >+ } > } > >+ dataLength = inputOffset; > return stringHasher.hashWithTop8BitsMasked(); > } > > bool equalUTF16WithUTF8(const UChar* a, const char* b, const char* bEnd) > { > while (b < bEnd) { >- if (isASCII(*a) || isASCII(*b)) { >- if (*a++ != *b++) >- return false; >- continue; >- } >- >- int utf8SequenceLength = inlineUTF8SequenceLengthNonASCII(*b); >- >- if (bEnd - b < utf8SequenceLength) >+ int offset = 0; >+ UChar32 character; >+ U8_NEXT(reinterpret_cast<const 
uint8_t*>(b), offset, bEnd - b, character); >+ if (character < 0) > return false; >- >- if (!isLegalUTF8(reinterpret_cast<const unsigned char*>(b), utf8SequenceLength)) >- return false; >- >- UChar32 character = readUTF8Sequence(b, utf8SequenceLength); >- ASSERT(!isASCII(character)); >+ b += offset; > > if (U_IS_BMP(character)) { >- // UTF-16 surrogate values are illegal in UTF-32 >- if (U_IS_SURROGATE(character)) >- return false; >+ ASSERT(!U_IS_SURROGATE(character)); > if (*a++ != character) > return false; >- } else if (U_IS_SUPPLEMENTARY(character)) { >+ } else { >+ ASSERT(U_IS_SUPPLEMENTARY(character)); > if (*a++ != U16_LEAD(character)) > return false; > if (*a++ != U16_TRAIL(character)) > return false; >- } else >- return false; >+ } > } > > return true; >diff --git a/Source/WTF/wtf/unicode/UTF8Conversion.h b/Source/WTF/wtf/unicode/UTF8Conversion.h >index 3764fe87f374a4899be4d8a3372dc41921d2e6ec..db598de235ac859236f6b20a00d69280a1a58c42 100644 >--- a/Source/WTF/wtf/unicode/UTF8Conversion.h >+++ b/Source/WTF/wtf/unicode/UTF8Conversion.h >@@ -1,5 +1,5 @@ > /* >- * Copyright (C) 2007 Apple Inc. All rights reserved. >+ * Copyright (C) 2007-2019 Apple Inc. All rights reserved. > * > * Redistribution and use in source and binary forms, with or without > * modification, are permitted provided that the following conditions >@@ -31,54 +31,28 @@ > namespace WTF { > namespace Unicode { > >- // Given a first byte, gives the length of the UTF-8 sequence it begins. >- // Returns 0 for bytes that are not legal starts of UTF-8 sequences. >- // Only allows sequences of up to 4 bytes, since that works for all Unicode characters (U-00000000 to U-0010FFFF). 
>- WTF_EXPORT_PRIVATE int UTF8SequenceLength(char); >+enum ConversionResult { >+ ConversionOK, // conversion successful >+ SourceExhausted, // partial character in source, but hit end >+ TargetExhausted, // insufficient room in target for conversion >+ SourceIllegal // source sequence is illegal/malformed >+}; > >- // Takes a null-terminated C-style string with a UTF-8 sequence in it and converts it to a character. >- // Only allows Unicode characters (U-00000000 to U-0010FFFF). >- // Returns -1 if the sequence is not valid (including presence of extra bytes). >- WTF_EXPORT_PRIVATE int decodeUTF8Sequence(const char*); >+// Conversion functions are strict, except for convertUTF16ToUTF8, which takes >+// "strict" argument. When strict, both illegal sequences and unpaired surrogates >+// will cause an error. When not, illegal sequences and unpaired surrogates are >+// converted to the replacement character, except for an unpaired lead surrogate >+// at the end of the source, which will instead cause a SourceExhausted error. > >- typedef enum { >- conversionOK, // conversion successful >- sourceExhausted, // partial character in source, but hit end >- targetExhausted, // insuff. room in target for conversion >- sourceIllegal // source sequence is illegal/malformed >- } ConversionResult; >+WTF_EXPORT_PRIVATE bool convertUTF8ToUTF16(const char* sourceStart, const char* sourceEnd, UChar** targetStart, UChar* targetEnd, bool* isSourceAllASCII = nullptr); >+WTF_EXPORT_PRIVATE bool convertLatin1ToUTF8(const LChar** sourceStart, const LChar* sourceEnd, char** targetStart, char* targetEnd); >+WTF_EXPORT_PRIVATE ConversionResult convertUTF16ToUTF8(const UChar** sourceStart, const UChar* sourceEnd, char** targetStart, char* targetEnd, bool strict = true); > >- // These conversion functions take a "strict" argument. When this >- // flag is set to strict, both irregular sequences and isolated surrogates >- // will cause an error. 
When the flag is set to lenient, both irregular >- // sequences and isolated surrogates are converted. >- // >- // Whether the flag is strict or lenient, all illegal sequences will cause >- // an error return. This includes sequences such as: <F4 90 80 80>, <C0 80>, >- // or <A0> in UTF-8, and values above 0x10FFFF in UTF-32. Conformant code >- // must check for illegal sequences. >- // >- // When the flag is set to lenient, characters over 0x10FFFF are converted >- // to the replacement character; otherwise (when the flag is set to strict) >- // they constitute an error. >+WTF_EXPORT_PRIVATE unsigned calculateStringHashAndLengthFromUTF8MaskingTop8Bits(const char* data, const char* dataEnd, unsigned& dataLength, unsigned& utf16Length); > >- WTF_EXPORT_PRIVATE ConversionResult convertUTF8ToUTF16( >- const char** sourceStart, const char* sourceEnd, >- UChar** targetStart, UChar* targetEnd, bool* isSourceAllASCII = 0, bool strict = true); >- >- WTF_EXPORT_PRIVATE ConversionResult convertLatin1ToUTF8( >- const LChar** sourceStart, const LChar* sourceEnd, >- char** targetStart, char* targetEnd); >- >- WTF_EXPORT_PRIVATE ConversionResult convertUTF16ToUTF8( >- const UChar** sourceStart, const UChar* sourceEnd, >- char** targetStart, char* targetEnd, bool strict = true); >- >- WTF_EXPORT_PRIVATE unsigned calculateStringHashAndLengthFromUTF8MaskingTop8Bits(const char* data, const char* dataEnd, unsigned& dataLength, unsigned& utf16Length); >- >- // The caller of these functions already knows that the lengths are the same, so we omit an end argument for UTF-16 and Latin-1. >- bool equalUTF16WithUTF8(const UChar* stringInUTF16, const char* stringInUTF8, const char* stringInUTF8End); >- bool equalLatin1WithUTF8(const LChar* stringInLatin1, const char* stringInUTF8, const char* stringInUTF8End); >+// Callers of these functions must check that the lengths are the same; accordingly we omit an end argument for UTF-16 and Latin-1. 
>+bool equalUTF16WithUTF8(const UChar* stringInUTF16, const char* stringInUTF8, const char* stringInUTF8End); >+bool equalLatin1WithUTF8(const LChar* stringInLatin1, const char* stringInUTF8, const char* stringInUTF8End); > > } // namespace Unicode > } // namespace WTF >diff --git a/Source/WebCore/platform/SharedBuffer.cpp b/Source/WebCore/platform/SharedBuffer.cpp >index 63cfe462581dbbc2fda1fb63993945f8fe06029c..be0149a4a6e539616942ed5253e8271111842908 100644 >--- a/Source/WebCore/platform/SharedBuffer.cpp >+++ b/Source/WebCore/platform/SharedBuffer.cpp >@@ -334,17 +334,16 @@ RefPtr<SharedBuffer> utf8Buffer(const String& string) > > // Convert to runs of 8-bit characters. > char* p = buffer.data(); >- WTF::Unicode::ConversionResult result; > if (length) { > if (string.is8Bit()) { > const LChar* d = string.characters8(); >- result = WTF::Unicode::convertLatin1ToUTF8(&d, d + length, &p, p + buffer.size()); >+ if (!WTF::Unicode::convertLatin1ToUTF8(&d, d + length, &p, p + buffer.size())) >+ return nullptr; > } else { > const UChar* d = string.characters16(); >- result = WTF::Unicode::convertUTF16ToUTF8(&d, d + length, &p, p + buffer.size(), true); >+ if (WTF::Unicode::convertUTF16ToUTF8(&d, d + length, &p, p + buffer.size()) != WTF::Unicode::ConversionOK) >+ return nullptr; > } >- if (result != WTF::Unicode::conversionOK) >- return nullptr; > } > > buffer.shrink(p - buffer.data()); >diff --git a/Source/WebCore/xml/XSLTProcessorLibxslt.cpp b/Source/WebCore/xml/XSLTProcessorLibxslt.cpp >index 95b1b6a51021282ab4660d7cd54e83f857657ffd..1e30fd1f9f664320251612d5c92230cc7c240748 100644 >--- a/Source/WebCore/xml/XSLTProcessorLibxslt.cpp >+++ b/Source/WebCore/xml/XSLTProcessorLibxslt.cpp >@@ -48,8 +48,6 @@ > #include <libxslt/xslt.h> > #include <libxslt/xsltutils.h> > #include <wtf/Assertions.h> >-#include <wtf/text/StringBuffer.h> >-#include <wtf/unicode/UTF8Conversion.h> > > #if OS(DARWIN) && !PLATFORM(GTK) > #include "SoftLinkLibxslt.h" >@@ -159,27 +157,41 @@ static inline 
void setXSLTLoadCallBack(xsltDocLoaderFunc func, XSLTProcessor* pr > globalCachedResourceLoader = cachedResourceLoader; > } > >-static int writeToStringBuilder(void* context, const char* buffer, int len) >+static int writeToStringBuilder(void* context, const char* buffer, int length) > { > StringBuilder& resultOutput = *static_cast<StringBuilder*>(context); > >- if (!len) >- return 0; >- >- StringBuffer<UChar> stringBuffer(len); >- UChar* bufferUChar = stringBuffer.characters(); >- UChar* bufferUCharEnd = bufferUChar + len; >- >- const char* stringCurrent = buffer; >- WTF::Unicode::ConversionResult result = WTF::Unicode::convertUTF8ToUTF16(&stringCurrent, buffer + len, &bufferUChar, bufferUCharEnd); >- if (result != WTF::Unicode::conversionOK && result != WTF::Unicode::sourceExhausted) { >- ASSERT_NOT_REACHED(); >- return -1; >+ // FIXME: Consider ways to make this more efficient by moving it into a >+ // StringBuilder::appendUTF8 function, and then optimizing to not need a >+ // Vector<UChar> and possibly optimize cases that can produce 8-bit Latin-1 >+ // strings, but that would need to be sophisticated about not processing >+ // trailing incomplete sequences and communicating that to the caller. 
>+ >+ Vector<UChar> outputBuffer(length); >+ >+ UBool error = false; >+ int inputOffset = 0; >+ int outputOffset = 0; >+ while (inputOffset < length) { >+ UChar32 character; >+ int nextInputOffset = inputOffset; >+ U8_NEXT(reinterpret_cast<const uint8_t*>(buffer), nextInputOffset, length, character); >+ if (character < 0) { >+ if (nextInputOffset == length) >+ break; >+ ASSERT_NOT_REACHED(); >+ return -1; >+ } >+ inputOffset = nextInputOffset; >+ U16_APPEND(outputBuffer.data(), outputOffset, length, character, error); >+ if (error) { >+ ASSERT_NOT_REACHED(); >+ return -1; >+ } > } > >- int utf16Length = bufferUChar - stringBuffer.characters(); >- resultOutput.append(stringBuffer.characters(), utf16Length); >- return stringCurrent - buffer; >+ resultOutput.append(outputBuffer.data(), outputOffset); >+ return inputOffset; > } > > static bool saveResultToString(xmlDocPtr resultDoc, xsltStylesheetPtr sheet, String& resultString) >diff --git a/Source/WebCore/xml/parser/XMLDocumentParserLibxml2.cpp b/Source/WebCore/xml/parser/XMLDocumentParserLibxml2.cpp >index 761ab10e31d3c3138384c9028efd1dbaa3f87d93..9dc2cb0963836f0c4aaf8d19c3a13e3d68d81a1a 100644 >--- a/Source/WebCore/xml/parser/XMLDocumentParserLibxml2.cpp >+++ b/Source/WebCore/xml/parser/XMLDocumentParserLibxml2.cpp >@@ -1154,7 +1154,7 @@ static size_t convertUTF16EntityToUTF8(const UChar* utf16Entity, size_t numberOf > { > const char* originalTarget = target; > auto conversionResult = WTF::Unicode::convertUTF16ToUTF8(&utf16Entity, utf16Entity + numberOfCodeUnits, &target, target + targetSize); >- if (conversionResult != WTF::Unicode::conversionOK) >+ if (conversionResult != WTF::Unicode::ConversionOK) > return 0; > > // Even though we must pass the length, libxml expects the entity string to be null terminated. 
>diff --git a/Source/WebKit/Shared/API/APIString.h b/Source/WebKit/Shared/API/APIString.h >index 339a7fae3ca4a8b10067be149a0c15a70a257f58..99d1a3e3d369e5e0b91329f8294632bc96043789 100644 >--- a/Source/WebKit/Shared/API/APIString.h >+++ b/Source/WebKit/Shared/API/APIString.h >@@ -23,14 +23,10 @@ > * THE POSSIBILITY OF SUCH DAMAGE. > */ > >-#ifndef APIString_h >-#define APIString_h >+#pragma once > > #include "APIObject.h" >-#include <wtf/Ref.h> > #include <wtf/text/StringView.h> >-#include <wtf/text/WTFString.h> >-#include <wtf/unicode/UTF8Conversion.h> > > namespace API { > >@@ -75,5 +71,3 @@ private: > }; > > } // namespace WebKit >- >-#endif // APIString_h >diff --git a/Source/WebKit/Shared/API/c/WKString.cpp b/Source/WebKit/Shared/API/c/WKString.cpp >index 02aa4cc0f0b6673e236a07a81c46fe39eab7feda..52bc355028c6be0fcc582364ba4663f1c1e3c575 100644 >--- a/Source/WebKit/Shared/API/c/WKString.cpp >+++ b/Source/WebKit/Shared/API/c/WKString.cpp >@@ -30,6 +30,7 @@ > #include "WKAPICast.h" > #include <JavaScriptCore/InitializeThreading.h> > #include <JavaScriptCore/OpaqueJSString.h> >+#include <wtf/unicode/UTF8Conversion.h> > > WKTypeID WKStringGetTypeID() > { >@@ -78,19 +79,18 @@ size_t WKStringGetUTF8CStringImpl(WKStringRef stringRef, char* buffer, size_t bu > auto stringView = WebKit::toImpl(stringRef)->stringView(); > > char* p = buffer; >- WTF::Unicode::ConversionResult result; > > if (stringView.is8Bit()) { > const LChar* characters = stringView.characters8(); >- result = WTF::Unicode::convertLatin1ToUTF8(&characters, characters + stringView.length(), &p, p + bufferSize - 1); >+ if (!WTF::Unicode::convertLatin1ToUTF8(&characters, characters + stringView.length(), &p, p + bufferSize - 1)) >+ return 0; > } else { > const UChar* characters = stringView.characters16(); >- result = WTF::Unicode::convertUTF16ToUTF8(&characters, characters + stringView.length(), &p, p + bufferSize - 1, strict); >+ auto result = WTF::Unicode::convertUTF16ToUTF8(&characters, characters + 
stringView.length(), &p, p + bufferSize - 1, strict); >+ if (result != WTF::Unicode::ConversionOK && result != WTF::Unicode::TargetExhausted) >+ return 0; > } > >- if (result != WTF::Unicode::conversionOK && result != WTF::Unicode::targetExhausted) >- return 0; >- > *p++ = '\0'; > return p - buffer; > } >diff --git a/LayoutTests/ChangeLog b/LayoutTests/ChangeLog >index 87c1bd7fe0c99d8f3fb1538ca81b6d9d35be00e5..f6688ab0da670ca72b02216bad527e55dedec337 100644 >--- a/LayoutTests/ChangeLog >+++ b/LayoutTests/ChangeLog >@@ -1,3 +1,26 @@ >+2019-04-29 Darin Adler <darin@apple.com> >+ >+ WebKit has too much of its own UTF-8 code and should rely more on ICU's UTF-8 support >+ https://bugs.webkit.org/show_bug.cgi?id=195535 >+ >+ Reviewed by NOBODY (OOPS!). >+ >+ * css3/escape-dom-api-expected.txt: >+ * fast/text/dangling-surrogates-expected.txt: >+ * js/dom/webidl-type-mapping-expected.txt: >+ * js/invalid-utf8-in-syntax-error-expected.txt: >+ Updated expected results to have the Unicode replacement character in cases where the >+ text contains unpaired surrogates. The tests are still doing the same operations, and >+ still getting the same results, but the text output no longer includes illegal UTF-8. >+ >+ * js/invalid-utf8-in-syntax-error.html: Added. Before adding this, the test was >+ run, but unlike the rest of the tests in this directory, was only run as part of >+ run-javascriptcore-tests. There are two reasons for adding this. One is to be >+ consistent with the rest of the tests here and run a second time as part of the >+ broader WebKit tests. The second is that we can now use "--reset-results" to generate >+ new expected results, something that run-webkit-tests has but run-javascriptcore-tests >+ does not have. >+ > 2019-04-29 Truitt Savell <tsavell@apple.com> > > Unreviewed, rolling out r244755. 
>diff --git a/LayoutTests/imported/w3c/ChangeLog b/LayoutTests/imported/w3c/ChangeLog >index 3f52a49fb91a496a499f821c4f5535362d4b26c9..b3d945402329d5488facaf634b5a7cb3c99162fa 100644 >--- a/LayoutTests/imported/w3c/ChangeLog >+++ b/LayoutTests/imported/w3c/ChangeLog >@@ -1,3 +1,15 @@ >+2019-04-29 Darin Adler <darin@apple.com> >+ >+ WebKit has too much of its own UTF-8 code and should rely more on ICU's UTF-8 support >+ https://bugs.webkit.org/show_bug.cgi?id=195535 >+ >+ Reviewed by NOBODY (OOPS!). >+ >+ * web-platform-tests/encoding/textdecoder-utf16-surrogates-expected.txt: >+ Updated expected results to have the Unicode replacement character in cases where the >+ text contains unpaired surrogates. The tests are still doing the same operations, and >+ still getting the same results, but the text output no longer includes illegal UTF-8. >+ > 2019-04-29 Javier Fernandez <jfernandez@igalia.com> > > line should not be broken before the first space after a word >diff --git a/LayoutTests/css3/escape-dom-api-expected.txt b/LayoutTests/css3/escape-dom-api-expected.txt >index c851c96ef745e0118673126ad8982a3104b8d2c1..f54fb3b77a927a05474f1e1374cb831ed1acb866 100644 >--- a/LayoutTests/css3/escape-dom-api-expected.txt >+++ b/LayoutTests/css3/escape-dom-api-expected.txt >@@ -4,14 +4,14 @@ On success, you will see a series of "PASS" messages, followed by "TEST COMPLETE > > > PASS CSS.escape.length is 1 >-PASS CSS.escape('\0') is "�" >-PASS CSS.escape('a\0') is "a�" >-PASS CSS.escape('\0b') is "�b" >-PASS CSS.escape('a\0b') is "a�b" >-PASS CSS.escape('�') is "�" >-PASS CSS.escape('a�') is "a�" >-PASS CSS.escape('�b') is "�b" >-PASS CSS.escape('a�b') is "a�b" >+PASS CSS.escape('\0') is "�" >+PASS CSS.escape('a\0') is "a�" >+PASS CSS.escape('\0b') is "�b" >+PASS CSS.escape('a\0b') is "a�b" >+PASS CSS.escape('�') is "�" >+PASS CSS.escape('a�') is "a�" >+PASS CSS.escape('�b') is "�b" >+PASS CSS.escape('a�b') is "a�b" > PASS CSS.escape() threw exception TypeError: Not enough 
arguments. > PASS CSS.escape(undefined) is "undefined" > PASS CSS.escape(true) is "true" >@@ -53,16 +53,16 @@ PASS CSS.escape('-') is "\\-" > PASS CSS.escape('-a') is "-a" > PASS CSS.escape('--') is "--" > PASS CSS.escape('--a') is "--a" >-PASS CSS.escape('ÃÂ-_é') is "ÃÂ-_é" >-PASS CSS.escape('ÃÂÃÂÃÂÃÂÃÂàÃÂÃÂÃÂÃÂÃÂÃÂÃÂÃÂÃÂÃÂÃÂÃÂÃÂÃÂÃÂÃÂÃÂÃÂÃÂÃÂÃÂÃÂÃÂÃÂÃÂÃÂ') is "\\7f ÃÂÃÂÃÂÃÂÃÂàÃÂÃÂÃÂÃÂÃÂÃÂÃÂÃÂÃÂÃÂÃÂÃÂÃÂÃÂÃÂÃÂÃÂÃÂÃÂÃÂÃÂÃÂÃÂÃÂÃÂÃÂ" >-PASS CSS.escape('àáâ') is "àáâ" >+PASS CSS.escape('Â-_©') is "Â-_©" >+PASS CSS.escape(' ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ') is "\\7f  ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ" >+PASS CSS.escape(' ¡¢') is " ¡¢" > PASS CSS.escape('a0123456789b') is "a0123456789b" > PASS CSS.escape('abcdefghijklmnopqrstuvwxyz') is "abcdefghijklmnopqrstuvwxyz" > PASS CSS.escape('ABCDEFGHIJKLMNOPQRSTUVWXYZ') is "ABCDEFGHIJKLMNOPQRSTUVWXYZ" > PASS CSS.escape(' !xy') is "\\ \\!xy" >-PASS CSS.escape('ðÂÂÂ') is "ðÂÂÂ" >-PASS CSS.escape('üÂ') is "\udf06" >-PASS CSS.escape('à´') is "\ud834" >+PASS CSS.escape('ð') is "ð" >+PASS CSS.escape('�') is "\udf06" >+PASS CSS.escape('�') is "\ud834" > PASS successfullyParsed is true > > TEST COMPLETE >diff --git a/LayoutTests/fast/text/dangling-surrogates-expected.txt b/LayoutTests/fast/text/dangling-surrogates-expected.txt >index cd1fd817f3a9e5aa8ae0af3d3f5a0bf668d7b7e3..45a378e0e00df56092dbb33d98e859efe2efd590 100644 >--- a/LayoutTests/fast/text/dangling-surrogates-expected.txt >+++ b/LayoutTests/fast/text/dangling-surrogates-expected.txt >@@ -3,8 +3,8 @@ This tests verifies that the test tools can handle a dangling surrogate characte > On success, you will see a series of "PASS" messages, followed by "TEST COMPLETE". 
> > >-PASS danglingFirst is "àÂ" >-PASS danglingSecond is "ðÂ" >+PASS danglingFirst is "�" >+PASS danglingSecond is "�" > PASS successfullyParsed is true > > TEST COMPLETE >diff --git a/LayoutTests/imported/w3c/web-platform-tests/encoding/textdecoder-utf16-surrogates-expected.txt b/LayoutTests/imported/w3c/web-platform-tests/encoding/textdecoder-utf16-surrogates-expected.txt >index b847acfbc257d470908f125323aa75caca5f3111..900fe37ded95dc67bb763c8916223a2ccbd50491 100644 >--- a/LayoutTests/imported/w3c/web-platform-tests/encoding/textdecoder-utf16-surrogates-expected.txt >+++ b/LayoutTests/imported/w3c/web-platform-tests/encoding/textdecoder-utf16-surrogates-expected.txt >@@ -1,21 +1,21 @@ > >-FAIL utf-16le - lone surrogate lead assert_equals: expected "\ufffd" but got "àÂ" >+FAIL utf-16le - lone surrogate lead assert_equals: expected "\ufffd" but got "�" > FAIL utf-16le - lone surrogate lead (fatal flag set) assert_throws: function "function () { > new TextDecoder(t.encoding, {fatal: true}).decode(new Uint8Array(t.input)) > }" did not throw >-FAIL utf-16le - lone surrogate trail assert_equals: expected "\ufffd" but got "ðÂ" >+FAIL utf-16le - lone surrogate trail assert_equals: expected "\ufffd" but got "�" > FAIL utf-16le - lone surrogate trail (fatal flag set) assert_throws: function "function () { > new TextDecoder(t.encoding, {fatal: true}).decode(new Uint8Array(t.input)) > }" did not throw >-FAIL utf-16le - unmatched surrogate lead assert_equals: expected "\ufffd\0" but got "àÂ\0" >+FAIL utf-16le - unmatched surrogate lead assert_equals: expected "\ufffd\0" but got "�\0" > FAIL utf-16le - unmatched surrogate lead (fatal flag set) assert_throws: function "function () { > new TextDecoder(t.encoding, {fatal: true}).decode(new Uint8Array(t.input)) > }" did not throw >-FAIL utf-16le - unmatched surrogate trail assert_equals: expected "\ufffd\0" but got "ðÂ\0" >+FAIL utf-16le - unmatched surrogate trail assert_equals: expected "\ufffd\0" but got "�\0" > FAIL utf-16le 
- unmatched surrogate trail (fatal flag set) assert_throws: function "function () { > new TextDecoder(t.encoding, {fatal: true}).decode(new Uint8Array(t.input)) > }" did not throw >-FAIL utf-16le - swapped surrogate pair assert_equals: expected "\ufffd\ufffd" but got "ðÂàÂ" >+FAIL utf-16le - swapped surrogate pair assert_equals: expected "\ufffd\ufffd" but got "��" > FAIL utf-16le - swapped surrogate pair (fatal flag set) assert_throws: function "function () { > new TextDecoder(t.encoding, {fatal: true}).decode(new Uint8Array(t.input)) > }" did not throw >diff --git a/LayoutTests/js/dom/webidl-type-mapping-expected.txt b/LayoutTests/js/dom/webidl-type-mapping-expected.txt >index 75a4da60b81ecfbbc72f6cd504deb9e2a15321c6..7ec70f9b8bccb0bf4f8fe737af3f5ffcf8cc9962 100644 >--- a/LayoutTests/js/dom/webidl-type-mapping-expected.txt >+++ b/LayoutTests/js/dom/webidl-type-mapping-expected.txt >@@ -1009,48 +1009,48 @@ PASS converter.testEnforceRangeUnsignedShort = {valueOf:function(){throw new Err > > converter.testUSVString = '!@#123ABCabc\x00\x80\xFF\r\n\t' > converter.testString = '!@#123ABCabc\x00\x80\xFF\r\n\t' >-PASS converter.testUSVString is "!@#123ABCabc\u0000ÃÂÿ\r\n\t" >-PASS converter.testString is "!@#123ABCabc\u0000ÃÂÿ\r\n\t" >+PASS converter.testUSVString is "!@#123ABCabc\u0000Âÿ\r\n\t" >+PASS converter.testString is "!@#123ABCabc\u0000Âÿ\r\n\t" > converter.testUSVString = '\u0100' > converter.testString = '\u0100' >-PASS converter.testUSVString is "ÃÂ" >-PASS converter.testString is "ÃÂ" >+PASS converter.testUSVString is "Ä" >+PASS converter.testString is "Ä" > PASS converter.testUSVString = {toString: function() { throw Error(); }} threw exception Error. > PASS converter.testString = {toString: function() { throw Error(); }} threw exception Error. 
>-PASS converter.testUSVString is "ÃÂ" >-PASS converter.testString is "ÃÂ" >+PASS converter.testUSVString is "Ä" >+PASS converter.testString is "Ä" > converter.testUSVString = "\ud800" > converter.testString = "\ud800" >-PASS converter.testUSVString is "�" >+PASS converter.testUSVString is "�" > PASS converter.testString is "\ud800" > converter.testUSVString = "\udc00" > converter.testString = "\udc00" >-PASS converter.testUSVString is "�" >+PASS converter.testUSVString is "�" > PASS converter.testString is "\udc00" > converter.testUSVString = "\ud800\u0000" > converter.testString = "\ud800\u0000" >-PASS converter.testUSVString is "�\u0000" >+PASS converter.testUSVString is "�\u0000" > PASS converter.testString is "\ud800\u0000" > converter.testUSVString = "\udc00\u0000" > converter.testString = "\udc00\u0000" >-PASS converter.testUSVString is "�\u0000" >+PASS converter.testUSVString is "�\u0000" > PASS converter.testString is "\udc00\u0000" > converter.testUSVString = "\udc00\ud800" > converter.testString = "\udc00\ud800" >-PASS converter.testUSVString is "��" >+PASS converter.testUSVString is "��" > PASS converter.testString is "\udc00\ud800" >-converter.testUSVString = "ðÂÂÂ" >-converter.testString = "ðÂÂÂ" >-PASS converter.testUSVString is "ðÂÂÂ" >-PASS converter.testString is "ðÂÂÂ" >+converter.testUSVString = "ð" >+converter.testString = "ð" >+PASS converter.testUSVString is "ð" >+PASS converter.testString is "ð" > converter.testByteString = '!@#123ABCabc\x00\x80\xFF\r\n\t' >-PASS converter.testByteString is "!@#123ABCabc\u0000ÃÂÿ\r\n\t" >+PASS converter.testByteString is "!@#123ABCabc\u0000Âÿ\r\n\t" > converter.testByteString = '\u00FF' >-PASS converter.testByteString is "ÿ" >+PASS converter.testByteString is "ÿ" > PASS converter.testByteString = '\u0100' threw exception TypeError: Type error. 
>-PASS converter.testByteString is "ÿ" >+PASS converter.testByteString is "ÿ" > PASS converter.testByteString = {toString: function() { throw Error(); }} threw exception Error. >-PASS converter.testByteString is "ÿ" >+PASS converter.testByteString is "ÿ" > converter.testUSVString = true > converter.testString = true > converter.testByteString = true >@@ -1180,37 +1180,37 @@ PASS converter.testNodeRecord().hasOwnProperty('key2') is true > PASS 'key2' in converter.testNodeRecord() is true > PASS converter.testNodeRecord()['key2'] is document.documentElement > PASS converter.setTestNodeRecord({ key: 'hello' }) threw exception TypeError: Type error. >-converter.setTestLongRecord({'àÂ': 1 }) >-PASS converter.testLongRecord()['àÂ'] is 1 >-converter.setTestNodeRecord({'àÂ': document }) >-PASS converter.testNodeRecord()['�'] is document >-converter.setTestLongRecord({'ðÂ': 1 }) >-PASS converter.testLongRecord()['ðÂ'] is 1 >-converter.setTestNodeRecord({'ðÂ': document }) >-PASS converter.testNodeRecord()['�'] is document >-converter.setTestLongRecord({'àÂ': 1 }) >-PASS converter.testLongRecord()['àÂ\0'] is 1 >-converter.setTestNodeRecord({'àÂ': document }) >-PASS converter.testNodeRecord()['�\0'] is document >-converter.setTestLongRecord({'ðÂ': 1 }) >-PASS converter.testLongRecord()['ðÂ\0'] is 1 >-converter.setTestNodeRecord({'ðÂ': document }) >-PASS converter.testNodeRecord()['�\0'] is document >-converter.setTestLongRecord({'ðÂàÂ': 1 }) >-PASS converter.testLongRecord()['ðÂàÂ'] is 1 >-converter.setTestNodeRecord({'ðÂàÂ': document }) >-PASS converter.testNodeRecord()['��'] is document >-converter.setTestLongRecord({'ðÂÂÂ': 1 }) >-PASS converter.testLongRecord()['ðÂÂÂ'] is 1 >-converter.setTestNodeRecord({'ðÂÂÂ': document }) >-PASS converter.testNodeRecord()['ðÂÂÂ'] is document >+converter.setTestLongRecord({'�': 1 }) >+PASS converter.testLongRecord()['�'] is 1 >+converter.setTestNodeRecord({'�': document }) >+PASS converter.testNodeRecord()['�'] is document 
>+converter.setTestLongRecord({'�': 1 }) >+PASS converter.testLongRecord()['�'] is 1 >+converter.setTestNodeRecord({'�': document }) >+PASS converter.testNodeRecord()['�'] is document >+converter.setTestLongRecord({'�': 1 }) >+PASS converter.testLongRecord()['�\0'] is 1 >+converter.setTestNodeRecord({'�': document }) >+PASS converter.testNodeRecord()['�\0'] is document >+converter.setTestLongRecord({'�': 1 }) >+PASS converter.testLongRecord()['�\0'] is 1 >+converter.setTestNodeRecord({'�': document }) >+PASS converter.testNodeRecord()['�\0'] is document >+converter.setTestLongRecord({'��': 1 }) >+PASS converter.testLongRecord()['��'] is 1 >+converter.setTestNodeRecord({'��': document }) >+PASS converter.testNodeRecord()['��'] is document >+converter.setTestLongRecord({'ð': 1 }) >+PASS converter.testLongRecord()['ð'] is 1 >+converter.setTestNodeRecord({'ð': document }) >+PASS converter.testNodeRecord()['ð'] is document > converter.setTestSequenceRecord({ key: ['value', 'other value'] }) > PASS converter.testSequenceRecord().hasOwnProperty('key') is true > PASS 'key' in converter.testSequenceRecord() is true > PASS converter.testSequenceRecord()['key'] is ['value', 'other value'] >-PASS converter.setTestSequenceRecord({ 'ÃÂ': ['value'] }) threw exception TypeError: Type error. >-converter.setTestSequenceRecord({ 'ÿ': ['value'] }) >-PASS converter.testSequenceRecord()['ÿ'] is ['value'] >+PASS converter.setTestSequenceRecord({ 'Ä': ['value'] }) threw exception TypeError: Type error. 
>+converter.setTestSequenceRecord({ 'ÿ': ['value'] }) >+PASS converter.testSequenceRecord()['ÿ'] is ['value'] > PASS converter.testImpureNaNUnrestrictedDouble is NaN > PASS converter.testImpureNaN2UnrestrictedDouble is NaN > PASS converter.testQuietNaNUnrestrictedDouble is NaN >diff --git a/LayoutTests/js/invalid-utf8-in-syntax-error-expected.txt b/LayoutTests/js/invalid-utf8-in-syntax-error-expected.txt >index 43dd9e463fbcaddb416ce73760acbe81cfb26a1b..32e1a7bf3cc443225996f479c1e16111cfb73876 100644 >--- a/LayoutTests/js/invalid-utf8-in-syntax-error-expected.txt >+++ b/LayoutTests/js/invalid-utf8-in-syntax-error-expected.txt >@@ -3,7 +3,7 @@ Ensures that we correctly propagate the error message for lexer errors containin > On success, you will see a series of "PASS" messages, followed by "TEST COMPLETE". > > >-PASS ({f("íº")}) threw exception SyntaxError: Unexpected string literal "úÂ". Expected a parameter pattern or a ')' in parameter list.. >+PASS ({f("�")}) threw exception SyntaxError: Unexpected string literal "�". Expected a parameter pattern or a ')' in parameter list.. > PASS successfullyParsed is true > > TEST COMPLETE >diff --git a/LayoutTests/js/invalid-utf8-in-syntax-error.html b/LayoutTests/js/invalid-utf8-in-syntax-error.html >new file mode 100644 >index 0000000000000000000000000000000000000000..8fb0a757bc0e3339dab604e670d8334c85d11290 >--- /dev/null >+++ b/LayoutTests/js/invalid-utf8-in-syntax-error.html >@@ -0,0 +1,10 @@ >+<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML//EN"> >+<html> >+<head> >+<script src="../resources/js-test-pre.js"></script> >+</head> >+<body> >+<script src="script-tests/invalid-utf8-in-syntax-error.js"></script> >+<script src="../resources/js-test-post.js"></script> >+</body> >+</html>
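Note on the writeToStringBuilder change above: the new loop converts only complete UTF-8 sequences, stops without error when the chunk ends in the middle of a sequence, and returns the number of bytes actually consumed so the caller can resend the remainder. The sketch below illustrates that contract only. It is not the patch's code: appendUTF8Chunk is a hypothetical name, std::u16string stands in for StringBuilder, and a small hand-written decoder stands in for ICU's U8_NEXT and U16_APPEND purely so the example compiles without ICU. Unlike the patch, it validates the bytes of a truncated trailing sequence before deferring it.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical helper (not part of the patch): appends the complete UTF-8
// sequences found in buffer[0, length) to `out` as UTF-16.
// Returns the number of bytes consumed, which is less than `length` when the
// chunk ends in the middle of a sequence, or -1 if the input is malformed.
int appendUTF8Chunk(std::u16string& out, const char* buffer, int length)
{
    const auto* bytes = reinterpret_cast<const uint8_t*>(buffer);
    int offset = 0;
    while (offset < length) {
        uint8_t lead = bytes[offset];
        int trailCount;
        char32_t character;
        char32_t minimum; // smallest code point legal for this length; rejects overlong forms
        if (lead < 0x80) {
            trailCount = 0; character = lead; minimum = 0;
        } else if (lead >= 0xC2 && lead <= 0xDF) {
            trailCount = 1; character = lead & 0x1F; minimum = 0x80;
        } else if (lead >= 0xE0 && lead <= 0xEF) {
            trailCount = 2; character = lead & 0x0F; minimum = 0x800;
        } else if (lead >= 0xF0 && lead <= 0xF4) {
            trailCount = 3; character = lead & 0x07; minimum = 0x10000;
        } else
            return -1; // stray trail byte or invalid lead byte

        // Validate whatever trail bytes are present, even if the sequence is cut short.
        int available = length - offset - 1;
        int toCheck = available < trailCount ? available : trailCount;
        for (int i = 1; i <= toCheck; ++i) {
            uint8_t trail = bytes[offset + i];
            if ((trail & 0xC0) != 0x80)
                return -1;
            character = (character << 6) | (trail & 0x3F);
        }
        if (available < trailCount)
            break; // incomplete trailing sequence: leave it for the next call

        if (character < minimum || character > 0x10FFFF || (character >= 0xD800 && character <= 0xDFFF))
            return -1;

        if (character < 0x10000)
            out.push_back(static_cast<char16_t>(character));
        else {
            // Supplementary code point: emit a lead/trail surrogate pair.
            character -= 0x10000;
            out.push_back(static_cast<char16_t>(0xD800 + (character >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (character & 0x3FF)));
        }
        offset += 1 + trailCount;
    }
    return offset;
}
```

A caller that receives output in arbitrary chunks would keep the unconsumed tail (length minus the return value) and prepend it to the next chunk, which is the behavior the libxslt write callback relies on when it reports fewer bytes written than it was given.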
Flags: ap: review+
Attachments on bug 195535: 364198 | 364200 | 364203 | 364204 | 364208 | 367915 | 367917 | 367919 | 368432 | 368529