| Summary: | Add ability to mask URLs from DOM and JavaScript | ||||||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Product: | WebKit | Reporter: | Timothy Hatcher <timothy> | ||||||||||||||||||
| Component: | WebKit API | Assignee: | Timothy Hatcher <timothy> | ||||||||||||||||||
| Status: | RESOLVED FIXED | ||||||||||||||||||||
| Severity: | Normal | CC: | achristensen, ap, bfulgham, cdumez, changseok, cmarcelo, darin, eoconnor, esprehn+autocc, ews-watchlist, ggaren, gyuyoung.kim, jond, kangil.han, keith_miller, kondapallykalyan, mark.lam, mifenton, msaboff, saam, tzagallo, webkit-bug-importer, wilander | ||||||||||||||||||
| Priority: | P2 | Keywords: | InRadar | ||||||||||||||||||
| Version: | WebKit Local Build | ||||||||||||||||||||
| Hardware: | Unspecified | ||||||||||||||||||||
| OS: | Unspecified | ||||||||||||||||||||
| See Also: | https://bugs.webkit.org/show_bug.cgi?id=246010 | ||||||||||||||||||||
| Bug Depends on: | |||||||||||||||||||||
| Bug Blocks: | 242278 | ||||||||||||||||||||
| Attachments: |
|
||||||||||||||||||||
|
Description
Timothy Hatcher
2022-03-30 15:00:30 PDT
Created attachment 456529 [details]
Patch
Comment on attachment 456529 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=456529&action=review > Source/JavaScriptCore/ChangeLog:3 > + Add ability to mask URLs from DOM and JavaScript Is this what other browsers do? Closing every way in which JS code can see URLs seems like a losing game. Also, there is some hot code being touched here, so I'm not optimistic about perf impact. Created attachment 456580 [details]
Patch
(In reply to Alexey Proskuryakov from comment #2) > Comment on attachment 456529 [details] > Patch > > View in context: > https://bugs.webkit.org/attachment.cgi?id=456529&action=review > > > Source/JavaScriptCore/ChangeLog:3 > > + Add ability to mask URLs from DOM and JavaScript > > Is this what other browsers do? Other browsers don't do this, but should consider it! > Closing every way in which JS code can see URLs seems like a losing game. Yeah, I completely understand that. Should we just give up and jet this be a fingerprinting vector then? > Also, there is some hot code being touched here, so I'm not optimistic about > perf impact. I'm also aware of this. Any suggestions? One suggestion is to please A/B test this change before landing.
> Yeah, I completely understand that. Should we just give up and jet this be a fingerprinting vector then?
"[extension code]" is a fingerprinting giveaway too, as is checking where exactly those appear, as is the actual impact of what extensions do. I know that this is not what the report is about, but (1) I don't feel comfortable discussing the specific report here, and (2) not sure how useful it is to focus just on that.
Do we have a forum where extension behaviors of this kind can be discussed with other browser engine vendors?
(In reply to Alexey Proskuryakov from comment #5) > One suggestion is to please A/B test this change before landing. > > > Yeah, I completely understand that. Should we just give up and jet this be a fingerprinting vector then? > > "[extension code]" is a fingerprinting giveaway too, as is checking where > exactly those appear, as is the actual impact of what extensions do. I know > that this is not what the report is about, but (1) I don't feel comfortable > discussing the specific report here, and (2) not sure how useful it is to > focus just on that. If extensions are active, that is almost impossible to hide since they change the can change the page in uniquely identifiable ways. It is the URL we want to hide, since that is 100% unique. Hiding the fact that extensions are active is not a goal. > Do we have a forum where extension behaviors of this kind can be discussed > with other browser engine vendors? Yes, we have the Web Extensions Community Group and I plan to bring this up. But some of this problem are unique to Safari too. Created attachment 456593 [details]
Patch
Created attachment 456598 [details]
Patch
Comment on attachment 456598 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=456598&action=review > Source/JavaScriptCore/runtime/StackFrame.cpp:71 > + // FIXME: This should not be hardcoded here, but come from some API with what schemes to mask. > + if (sourceURL.startsWithIgnoringASCIICase("safari-"_s)) This might be fine for performance testing this patch, but I don't think we should land code like this. This is a layering violation and if it's not easy to fix now then it will be even harder once we've committed to this functionality being here. (In reply to Alex Christensen from comment #9) > Comment on attachment 456598 [details] > Patch > > View in context: > https://bugs.webkit.org/attachment.cgi?id=456598&action=review > > > Source/JavaScriptCore/runtime/StackFrame.cpp:71 > > + // FIXME: This should not be hardcoded here, but come from some API with what schemes to mask. > > + if (sourceURL.startsWithIgnoringASCIICase("safari-"_s)) > > This might be fine for performance testing this patch, but I don't think we > should land code like this. This is a layering violation and if it's not > easy to fix now then it will be even harder once we've committed to this > functionality being here. Yeah, I agree. I'm continuing to look at the best way to add a JSC API for this. > It is the URL we want to hide, since that is 100% unique. Why does it have to be unique enough to identify the user, but not random enough to prevent identifying the user? > Hiding the fact that extensions are active is not a goal. I understand that it is not a goal of this patch. I'm seeking a confirmation that this patch solves a problem that is worth solving - in other words, whether having extensions provides enough bits to identify the user with or without this patch. (In reply to Alexey Proskuryakov from comment #11) > > It is the URL we want to hide, since that is 100% unique. > > Why does it have to be unique enough to identify the user, but not random > enough to prevent identifying the user? > > > Hiding the fact that extensions are active is not a goal. > > I understand that it is not a goal of this patch. I'm seeking a confirmation > that this patch solves a problem that is worth solving - in other words, > whether having extensions provides enough bits to identify the user with or > without this patch. I’m not sure I fully follow. Extensions in Safari get a new UUID for the host part of their URL on launch. This UUID is the same for all tabs, private or otherwise. That UUID is what we need to hide to prevent that cross site/tab tracking with a 128-bit id. Without that UUID, only a single bit of knowing an extension, and maybe a specific extension, could be determined if they modify a page. The more popular the extension, like a password manager, the less useful that bit is. I do not know what the average bucket size is today, so I'm not sure if installing an extension that 1% (or say 0.1%) or users have means that the browser becomes unique in the bucket. This is what I was seeking for someone to confirm. My other question was about why the UUID behaved like that, shared across all tabs. (and why it's a UUID, can we just have incrementing numbers instead, for example) (In reply to Alexey Proskuryakov from comment #13) > I do not know what the average bucket size is today, so I'm not sure if > installing an extension that 1% (or say 0.1%) or users have means that the > browser becomes unique in the bucket. This is what I was seeking for someone > to confirm. A good percentage of users have at least one extension active. > My other question was about why the UUID behaved like that, shared across > all tabs. > > (and why it's a UUID, can we just have incrementing numbers instead, for example) We register the UUID with a scheme handler to handle loading resources for extensions. It is the host name essentially. Do we need to meet about this? I feel we could clear things up quicker in a meeting. Created attachment 456658 [details]
Patch
I was able to A/B test this patch: Speedometer: A: 84.07 pt ± 3.0 pt B: 84.39 pt ± 2.3 pt 0.38% better (insignificant) PLT (Mac): A: 532.9 ms ± 3.8 ms B: 532.4 ms ± 4.8 ms 0.10% better (insignificant) PLT (iPhone): A: 489.8 ms ± 6.7 ms B: 487.3 ms ± 7.2 ms 0.51% better (insignificant) I'm still checking Jetstream (waiting on a retry for results). JetStream 2: A: 114.6 pt ± 1.1 pt B: 114.2 pt ± 1.8 pt 0.35% worse (insignificant) Thank you for running the performance tests. My other concerns stand, but I don't think that I'm the right person to decide if this should be landed, or to work with you on an alternative solution in Safari. So Im deferring to others' judgement. Seems like api-ios EWS failure is relevant. (In reply to Alexey Proskuryakov from comment #19) > Thank you for running the performance tests. My other concerns stand, but I > don't think that I'm the right person to decide if this should be landed, or > to work with you on an alternative solution in Safari. So Im deferring to > others' judgement. > > Seems like api-ios EWS failure is relevant. I need to look more into this, this look overly intrusive to me. I bet there is a better way. Created attachment 457274 [details]
Patch
Created attachment 457479 [details]
Patch
Created attachment 457495 [details]
Patch
Comment on attachment 457495 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=457495&action=review > Source/WebCore/page/Page.cpp:3737 > + return m_maskedURLSchemes.contains(url.protocol().toStringWithoutCopying()); This can be done without calling toString, using the StringView hash translator, which should be considerably more efficient. Comment on attachment 457495 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=457495&action=review Does the origin leak any information we don't want to leak or is it only about full URLs? Thinking about self.origin for example. And window.location.ancestorOrigins. Also, service workers can see the URL of their clients via ServiceWorkerClient.url. Not sure if that's covered and if it matters for your use cases. Also, HashChangeEvent contains the old and new URL when changing the hash of the URL. Wouldn't it be an issue here? > Source/WebCore/dom/Document.cpp:5635 > + static std::once_flag onceFlag; Why std::call_once()? I doubt this can be called from multiple threads given that this is on Document. Why not use a regular MainThreadNeverDestroyed? > Source/WebCore/dom/Document.h:723 > + bool shouldMaskURLForBindings(const URL&) const; Doesn't seem like this needs to be public. > Source/WebCore/dom/Document.h:727 > + const URL& maskedURLForBindings() const; ditto. > Source/WebCore/dom/Element.cpp:728 > + synchronizeAttribute(name); Can't this call getAttribute() and then do some post-processing based on resolveURLs? This would reduce code duplication. > Source/WebCore/dom/Element.cpp:1829 > +AtomString Element::getAttributeForBindings(const AtomString& localName, ResolveURLs resolveURLs) const localName -> qualifiedName, per spec. > Source/WebCore/dom/Element.cpp:1834 > + synchronizeAttribute(localName); Can't this call getAttribute() and then do some post-processing based on resolveURLs? This would reduce code duplication. > Source/WebCore/dom/Element.h:84 > +enum class ResolveURLs : uint8_t { No, NoExcludingURLsForPrivacy, Yes, YesExcludingURLsForPrivacy }; Maybe this should be 2 separate enums? One for resolvingURLs and one for excludingURLsForPrivacy? > Source/WebCore/dom/Element.h:144 > + WEBCORE_EXPORT const AtomString& getAttribute(const AtomString& localName) const; Please revert this change. The new name is wrong and the new name is correct. See the spec here: https://dom.spec.whatwg.org/#dom-element-getattribute This is a qualified name as a string (e.g. "ns:foo"). > Source/WebCore/dom/Element.h:147 > + AtomString getAttributeForBindings(const AtomString& localName, ResolveURLs = ResolveURLs::NoExcludingURLsForPrivacy) const; Should probably use qualifiedName too given that this is the equivalent function for bindings now. > Source/WebCore/dom/Element.idl:44 > DOMString? getAttributeNS(DOMString? namespaceURI, DOMString localName); I am a little surprised you didn't have to hijack getAttributeNS() too. After all, getAttribute() and getAttributeNS() are almost identical except that you can specify a namespace. As a matter of fact, I believe passing null as namespace will make it identical to getAttribute(). >> Source/WebCore/page/Page.cpp:3737 >> + return m_maskedURLSchemes.contains(url.protocol().toStringWithoutCopying()); > > This can be done without calling toString, using the StringView hash translator, which should be considerably more efficient. m_maskedURLSchemes.contains<StringViewHashTranslator>(url.protocol()); > Tools/TestWebKitAPI/Tests/WebKitCocoa/WKWebViewConfiguration.mm:236 > + We may want to add tests for window.location.href and probably window.location.protocol / port / host / pathname / search / hash, document.url, document.documentURI. Based on Chris’s comments above we must add getAttributeNS test coverage. (In reply to Chris Dumez from comment #25) > Comment on attachment 457495 [details] > Patch > > View in context: > https://bugs.webkit.org/attachment.cgi?id=457495&action=review > > Does the origin leak any information we don't want to leak or is it only > about full URLs? Thinking about self.origin for example. And > window.location.ancestorOrigins. The origin is the important part to hide, but the full URL can contain info too (depending on the extension). However, these URLs only need masked when the web view is showing external pages that can have extension content injected. The main document will never be an extension loaded origin, so things like self.origin, etc are not a concern for this use-case. I agree if other clients needed this SPI, we would need to protect more. > Also, service workers can see the URL of their clients via > ServiceWorkerClient.url. Not sure if that's covered and if it matters for > your use cases. Similarly, service workers are not a concern. > Also, HashChangeEvent contains the old and new URL when changing the hash of > the URL. Wouldn't it be an issue here? Ditto. (In reply to Darin Adler from comment #26) > Based on Chris’s comments above we must add getAttributeNS test coverage. Yep! Good catch Chris. We may need some adversarial testing to find places URLs might leak out. Comment on attachment 457495 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=457495&action=review >> Source/WebCore/dom/Document.cpp:5635 >> + static std::once_flag onceFlag; > > Why std::call_once()? I doubt this can be called from multiple threads given that this is on Document. Why not use a regular MainThreadNeverDestroyed? Fixed. >> Source/WebCore/dom/Document.h:723 >> + bool shouldMaskURLForBindings(const URL&) const; > > Doesn't seem like this needs to be public. It is called from Element. >> Source/WebCore/dom/Document.h:727 >> + const URL& maskedURLForBindings() const; > > ditto. Ditto. >> Source/WebCore/dom/Element.cpp:728 >> + synchronizeAttribute(name); > > Can't this call getAttribute() and then do some post-processing based on resolveURLs? This would reduce code duplication. I added an inline getAttributeInternal since we need the Attribute, and getAttribute returns an AtomString. >> Source/WebCore/dom/Element.cpp:1829 >> +AtomString Element::getAttributeForBindings(const AtomString& localName, ResolveURLs resolveURLs) const > > localName -> qualifiedName, per spec. Fixed. >> Source/WebCore/dom/Element.cpp:1834 >> + synchronizeAttribute(localName); > > Can't this call getAttribute() and then do some post-processing based on resolveURLs? This would reduce code duplication. See above. >> Source/WebCore/dom/Element.h:84 >> +enum class ResolveURLs : uint8_t { No, NoExcludingURLsForPrivacy, Yes, YesExcludingURLsForPrivacy }; > > Maybe this should be 2 separate enums? One for resolvingURLs and one for excludingURLsForPrivacy? That made the switches more unwieldy in my opinion. >> Source/WebCore/dom/Element.h:144 >> + WEBCORE_EXPORT const AtomString& getAttribute(const AtomString& localName) const; > > Please revert this change. The new name is wrong and the new name is correct. See the spec here: > https://dom.spec.whatwg.org/#dom-element-getattribute > > This is a qualified name as a string (e.g. "ns:foo"). Fixed. >> Source/WebCore/dom/Element.h:147 >> + AtomString getAttributeForBindings(const AtomString& localName, ResolveURLs = ResolveURLs::NoExcludingURLsForPrivacy) const; > > Should probably use qualifiedName too given that this is the equivalent function for bindings now. Fixed. >> Source/WebCore/dom/Element.idl:44 >> DOMString? getAttributeNS(DOMString? namespaceURI, DOMString localName); > > I am a little surprised you didn't have to hijack getAttributeNS() too. After all, getAttribute() and getAttributeNS() are almost identical except that you can specify a namespace. As a matter of fact, I believe passing null as namespace will make it identical to getAttribute(). Good catch! I did miss that one because an earlier version of my change just modified Element::getAttribute, which getAttributeNS uses. Once I switched to a getAttributeForBindings I needed to do similar for getAttributeNS. >>> Source/WebCore/page/Page.cpp:3737 >>> + return m_maskedURLSchemes.contains(url.protocol().toStringWithoutCopying()); >> >> This can be done without calling toString, using the StringView hash translator, which should be considerably more efficient. > > m_maskedURLSchemes.contains<StringViewHashTranslator>(url.protocol()); Awesome! >> Tools/TestWebKitAPI/Tests/WebKitCocoa/WKWebViewConfiguration.mm:236 >> + > > We may want to add tests for window.location.href and probably window.location.protocol / port / host / pathname / search / hash, document.url, document.documentURI. Those URLs are not covered by this API. The intent of this API is to hide sub resources and links, not the main resource URL. Pull request: https://github.com/WebKit/WebKit/pull/1783 Committed 251962@main (c0c652e4bd3c): <https://commits.webkit.org/251962@main> Reviewed commits have been landed. Closing PR #1783 and removing active labels. |