Bug 244174

Summary: Dataset setter should throw InvalidCharacterError when the parsed name doesn't match XML Name production
Product: WebKit
Component: DOM
Status: RESOLVED INVALID
Severity: Normal
Priority: P2
Version: WebKit Nightly Build
Hardware: Unspecified
OS: Unspecified
Reporter: Ryosuke Niwa <rniwa>
Assignee: Nobody <webkit-unassigned>
CC: annevk, ap, cdumez

Description Ryosuke Niwa 2022-08-21 16:02:27 PDT
Caught by discussion in https://bugs.webkit.org/show_bug.cgi?id=126200

Comment 1 Ryosuke Niwa 2022-08-21 16:53:00 PDT
Turns out WebKit's implementation matches the latest spec:
https://commits.webkit.org/246206@main

Comment 2 Alexey Proskuryakov 2022-08-21 17:14:59 PDT
I wonder why the spec changed if all browsers agreed. Fun, though.

Comment 3 Alexey Proskuryakov 2022-08-21 17:52:14 PDT
I'm not sure if the change we made in 246206@main was a good one. Compatibility characters are a footgun, as they are canonically equivalent to other characters. So what are we supposed to do when both variants appear? Attribute names in dataset are a good example here.

Simply referring to the QName production in the XML standard superficially seems like a simplification, but the restrictions that are still implemented in Blink (https://github.com/chromium/chromium/blob/main/third_party/blink/renderer/core/dom/document.cc#L444) seem more author-friendly - at least one can copy/paste code in the editor without changing its meaning.

Are any other browsers making this change?

Comment 4 Anne van Kesteren 2022-08-23 04:37:59 PDT
I filed https://github.com/whatwg/html/issues/8215 to consider aligning dataset with other planned changes (to which other browsers already agreed). If that goes through we can then consider all those changes as a package.

(I'm personally not too worried about canonical equivalence as we don't normalize in the DOM and intentionally so.)
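
The canonical-equivalence concern from comment 3 can be made concrete with a small sketch. It assumes the NameStartChar ranges of the XML 1.0 (Fifth Edition) Name production and uses an illustrative character pair that is not taken from the thread: U+212B ANGSTROM SIGN and U+00C5 LATIN CAPITAL LETTER A WITH RING ABOVE. The helper `isNameStartChar` and the `Map` standing in for an element's attribute list are hypothetical, and none of this is WebKit's or Blink's actual code.

```typescript
// Sketch only: NameStartChar ranges transcribed from the XML 1.0 (Fifth Edition)
// Name production. This is not WebKit's or Blink's implementation.
const NAME_START_RANGES: ReadonlyArray<[number, number]> = [
  [0x3a, 0x3a], [0x41, 0x5a], [0x5f, 0x5f], [0x61, 0x7a],
  [0xc0, 0xd6], [0xd8, 0xf6], [0xf8, 0x2ff], [0x370, 0x37d],
  [0x37f, 0x1fff], [0x200c, 0x200d], [0x2070, 0x218f], [0x2c00, 0x2fef],
  [0x3001, 0xd7ff], [0xf900, 0xfdcf], [0xfdf0, 0xfffd], [0x10000, 0xeffff],
];

// Hypothetical helper: true if the first code point of `ch` is a NameStartChar.
function isNameStartChar(ch: string): boolean {
  const cp = ch.codePointAt(0);
  if (cp === undefined) return false;
  return NAME_START_RANGES.some(([lo, hi]) => cp >= lo && cp <= hi);
}

// Illustrative pair (an assumption of this sketch, not from the thread).
const angstromSign: string = "\u212B"; // ANGSTROM SIGN
const aWithRing: string = "\u00C5";    // LATIN CAPITAL LETTER A WITH RING ABOVE

// Both characters fall inside the Fifth Edition NameStartChar ranges.
const bothAllowed = isNameStartChar(angstromSign) && isNameStartChar(aWithRing);

// They are different code points, yet identical after NFC normalization.
const sameString = angstromSign === aWithRing;
const sameAfterNfc =
  angstromSign.normalize("NFC") === aWithRing.normalize("NFC");

// A Map stands in for an attribute list that, like the DOM, does not normalize:
// the two visually identical names end up as two separate entries.
const attributes = new Map<string, string>();
attributes.set(`data-${angstromSign}`, "first");
attributes.set(`data-${aWithRing}`, "second");
const distinctAttributeCount = attributes.size;
```

Under these assumptions both characters are accepted as name characters, compare unequal as strings, and compare equal after NFC. That is the "both variants appear" situation: without normalization they are simply two distinct names that render the same.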