Caught by discussion in https://bugs.webkit.org/show_bug.cgi?id=126200
Turns out WebKit's implementation matches the latest spec: https://commits.webkit.org/246206@main
I wonder why the spec changed if all browsers agreed. Fun, though.
I'm not sure the change we made in 246206@main was a good one. Compatibility characters are a footgun, as they are canonically equivalent to other characters. So what are we supposed to do when both variants appear? Attribute names in dataset are a good example here. Simply referring to the QName production in the XML standard superficially seems like a simplification, but the restrictions that are still implemented in Blink (https://github.com/chromium/chromium/blob/main/third_party/blink/renderer/core/dom/document.cc#L444) seem more author-friendly: at least one can copy/paste code in the editor without changing its meaning. Are any other browsers making this change?
I filed https://github.com/whatwg/html/issues/8215 to consider aligning dataset with other planned changes (to which other browsers have already agreed). If that goes through, we can then consider all those changes as a package. (I'm personally not too worried about canonical equivalence, as we don't normalize in the DOM, and intentionally so.)
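
For anyone following along, here is a rough sketch of the canonical-equivalence concern raised above. It is plain Python, not WebKit or Blink code. The dict is only a hypothetical stand-in for an element's attribute map, and the attribute names are made up for illustration. U+212B ANGSTROM SIGN is used as the example compatibility character because NFC normalization maps it to U+00C5.

```python
import unicodedata

# U+212B ANGSTROM SIGN is canonically equivalent to
# U+00C5 LATIN CAPITAL LETTER A WITH RING ABOVE.
angstrom_sign = "\u212B"
a_with_ring = "\u00C5"

# Hypothetical stand-in for an attribute map. Like the DOM, a dict
# compares keys code point by code point and does not normalize them.
attributes = {}
attributes["data-" + angstrom_sign] = "first"
attributes["data-" + a_with_ring] = "second"

# Both variants coexist as separate entries.
print(len(attributes))

# The raw strings differ, but they become identical after NFC normalization,
# which is what an editor or tool that normalizes pasted text would produce.
print(angstrom_sign == a_with_ring)
print(unicodedata.normalize("NFC", angstrom_sign) == a_with_ring)
```

Without normalization the two names stay distinct, which is the behavior described in the last comment. The copy/paste worry only materializes if some tool in the author's pipeline normalizes the source text.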