| Summary: | [Intl] Locale validation/canonicalization should defer to ICU | ||||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Product: | WebKit | Reporter: | Ross Kirsling <ross.kirsling> | ||||||||||||||||
| Component: | New Bugs | Assignee: | Ross Kirsling <ross.kirsling> | ||||||||||||||||
| Status: | RESOLVED FIXED | ||||||||||||||||||
| Severity: | Normal | CC: | annulen, ap, darin, ews-watchlist, gyuyoung.kim, keith_miller, mark.lam, mmaxfield, msaboff, ryuan.choi, saam, sergio, tzagallo, webkit-bug-importer, ysuzuki | ||||||||||||||||
| Priority: | P2 | Keywords: | InRadar | ||||||||||||||||
| Version: | WebKit Nightly Build | ||||||||||||||||||
| Hardware: | Unspecified | ||||||||||||||||||
| OS: | Unspecified | ||||||||||||||||||
| Attachments: |
|
||||||||||||||||||
|
Description
Ross Kirsling
2020-04-22 02:18:39 PDT
Created attachment 397180 [details]
Patch
Additional notes:
1. For ease of review, here's what the two functions in the generated header end up looking like:
static String intlPreferredLanguageTag(const String& tag)
{
// 354 possible replacements
static const auto map = makeNeverDestroyed<HashMap<String, String>>({
{ "aam"_s, "aas"_s },
...
});
return map.get().get(tag);
}
2. A ~5000-line file is much nicer than a ~50000-line file, but you might ask why ICU can't do this for us, since it has the data. The answer is that uloc_[for|to]LanguageTag unfortunately don't produce results that conform to ECMA-402 (and are also supposedly quite slow?). Still, there are certain transformations that we still need to apply.
Created attachment 397227 [details]
Patch
(In reply to Ross Kirsling from comment #2) > 2. A ~5000-line file is much nicer than a ~50000-line file, but you might > ask why ICU can't do this for us, since it has the data. The answer is that > uloc_[for|to]LanguageTag unfortunately don't produce results that conform to > ECMA-402 (and are also supposedly quite slow?). Still, there are certain > transformations that we still need to apply. Hmm, it turns out there's an XML version of this that will be much easier (and future-proof) to parse. I'll try using that instead. Apparently just as we were going to catch up with the existing tests for locale canonicalization, many more landed that raise the bar significantly higher (https://github.com/tc39/test262/pull/2554). This seems utterly unreasonable to ask each engine to implement. We should probably just rely on ICU and accept whatever failures occur in a given version. Created attachment 397431 [details]
Patch
By Yusuke's suggestion, let's just roll with what ICU gives us. What's cool is that there's almost zero loss to doing so in ICU 64 -- the only thing missing is the "-true" elimination of r260151. (And we could even add it back temporarily if we felt it were worth the perf hit.) Comment on attachment 397431 [details]
Patch
I’ll review once the JSC stuff compiles.
Created attachment 397505 [details]
Patch
Comment on attachment 397505 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=397505&action=review > Source/JavaScriptCore/runtime/IntlObject.cpp:356 > + return String(result.data(), resultLength); Are these language tags ASCII? Latin-1? UTF-8? This code is fine if they are ASCII or Latin-1, not if they are UTF-8. (In reply to Darin Adler from comment #10) > Comment on attachment 397505 [details] > Patch > > View in context: > https://bugs.webkit.org/attachment.cgi?id=397505&action=review > > > Source/JavaScriptCore/runtime/IntlObject.cpp:356 > > + return String(result.data(), resultLength); > > Are these language tags ASCII? Latin-1? UTF-8? > > This code is fine if they are ASCII or Latin-1, not if they are UTF-8. Yeah, the output is all ASCII (just alphanumeric and separator characters). Created attachment 397528 [details]
Patch for landing
(I'll make sure I see JSC-ARMv7 pass tests before landing.) I like this new strategy. But: I think it’s OK to add extra code to perform better with older versions of ICU or with things ICU didn’t get to yet, depending on the importance. Created attachment 397538 [details]
Patch for landing
Created attachment 397549 [details]
Patch for landing
Committed r260697: <https://trac.webkit.org/changeset/260697> All reviewed patches have been landed. Closing bug and clearing flags on attachment 397549 [details]. For some reason this hits an unchecked exception assert only via run-javascript-core and not via run-jsc. Will fix... (In reply to Ross Kirsling from comment #19) > For some reason this hits an unchecked exception assert only via > run-javascript-core and not via run-jsc. Will fix... Committed r260710: <https://trac.webkit.org/changeset/260710> |