RESOLVED FIXED175597
XMLHttpRequest should not sniff content encoding
https://bugs.webkit.org/show_bug.cgi?id=175597
Summary XMLHttpRequest should not sniff content encoding
JF Bastien
Reported 2017-08-15 13:50:37 PDT
XHRs currently don't do content sniffing, whereas Chrome and Firefox seem to. This is a problem when Content-Encoding isn't set yet the content is gzip. This occurs on webpages such as https://s3.amazonaws.com/mozilla-games/ZenGarden/EpicZenGarden.html A smaller repro is to take the test from #175579 and change the URL to https://s3.amazonaws.com/mozilla-games/ZenGarden/EpicZenGarden.data.gz and try it out on each browser. Chrome and Firefox don't see any gzip header, but definitely unzip the content. A misleading thing is that both curl and nscurl get the Content-Endoding header from AWS!
Attachments
Patch and layout tests (54.74 KB, patch)
2017-10-27 19:25 PDT, Daniel Bates
no flags
Patch and layout tests (57.07 KB, patch)
2017-10-27 20:03 PDT, Daniel Bates
no flags
Patch (57.70 KB, patch)
2017-10-27 20:04 PDT, Daniel Bates
no flags
Patch (61.63 KB, patch)
2017-10-27 22:15 PDT, Daniel Bates
no flags
Patch (61.70 KB, patch)
2017-10-27 22:38 PDT, Daniel Bates
no flags
Patch (61.61 KB, patch)
2017-10-27 22:42 PDT, Daniel Bates
no flags
Patch (61.64 KB, patch)
2017-10-27 22:54 PDT, Daniel Bates
no flags
Patch (63.81 KB, patch)
2017-10-30 12:04 PDT, Daniel Bates
achristensen: review+
JF Bastien
Comment 1 2017-08-15 14:02:05 PDT
FWIW I tried the "obvious" fix in XMLHttpRequest.cpp: options.sniffContent = SniffContent; That doesn't seem to do the trick, or I did it wrong. Further, we'd need to check that what other browsers do is actually what the XHR spec imports indirectly from the MIME Sniffing spec.
JF Bastien
Comment 2 2017-08-15 14:05:18 PDT
Alexey Proskuryakov
Comment 3 2017-08-15 15:34:57 PDT
I think that the first question to answer is why this resource is served with different response headers to Safari than to curl. You can try adding request headers from Safari to curl until it matches. Checking with telnet, a very minimal request gets the header too. Without actually looking at it, my guess is that CFNetwork decodes the response for us, and removes the header before WebKit gets to see it.
Daniel Bates
Comment 4 2017-08-15 15:48:49 PDT
WebKit is receiving an HTTP response that is comparable with the response seen using curl. Setting a breakpoint in DocumentThreadableLoader::didReceiveResponse(), <https://trac.webkit.org/browser/trunk/Source/WebCore/loader/DocumentThreadableLoader.cpp?rev=220750#L299> (where we process the XHR response) I see: (lldb) po response.m_nsResponse.m_ptr <NSHTTPURLResponse: 0x6180000222c0> { URL: http://s3.amazonaws.com/mozilla-games/ZenGarden/EpicZenGarden.data.gz } { status code: 200, headers { "Accept-Ranges" = bytes; "Content-Encoding" = gzip; "Content-Length" = 90198148; "Content-Type" = "application/octet-stream"; Date = "Tue, 15 Aug 2017 22:44:19 GMT"; Etag = "\"abd8ecef2479f13cc29e248e294b5330-11\""; "Last-Modified" = "Thu, 08 Jun 2017 16:29:01 GMT"; Server = AmazonS3; "x-amz-id-2" = "DoBSu33Sovmsp3CEZKWhkhnQZXGWyq40Yb2h0UjCvUdtVIecMM0iFyCQIlUyA3pZUtdyCIlVzxs="; "x-amz-request-id" = 0A63140CDBC51938; } }
Daniel Bates
Comment 5 2017-08-17 14:26:40 PDT
(In reply to Daniel Bates from comment #4) > WebKit is receiving an HTTP response that is comparable with the response > seen using curl. Setting a breakpoint in > DocumentThreadableLoader::didReceiveResponse(), > <https://trac.webkit.org/browser/trunk/Source/WebCore/loader/ > DocumentThreadableLoader.cpp?rev=220750#L299> (where we process the XHR > response) I see: > > (lldb) po response.m_nsResponse.m_ptr > <NSHTTPURLResponse: 0x6180000222c0> { URL: > http://s3.amazonaws.com/mozilla-games/ZenGarden/EpicZenGarden.data.gz } { > status code: 200, headers { > "Accept-Ranges" = bytes; > "Content-Encoding" = gzip; > "Content-Length" = 90198148; > "Content-Type" = "application/octet-stream"; > Date = "Tue, 15 Aug 2017 22:44:19 GMT"; > Etag = "\"abd8ecef2479f13cc29e248e294b5330-11\""; > "Last-Modified" = "Thu, 08 Jun 2017 16:29:01 GMT"; > Server = AmazonS3; > "x-amz-id-2" = > "DoBSu33Sovmsp3CEZKWhkhnQZXGWyq40Yb2h0UjCvUdtVIecMM0iFyCQIlUyA3pZUtdyCIlVzxs= > "; > "x-amz-request-id" = 0A63140CDBC51938; > } } This response depends on the networking layer to compensate. If the intention of the web developer was to serve pre-compressed data for the browser to auto decompress then the server is misconfigured by definition of the HTTP header Content-Type in RFC 7231 and Chrome and Firefox are compensating for this misconfiguration. The Content-Type header is defined as: The "Content-Type" header field indicates the media type of the associated representation...The indicated media type defines .... how that data is intended to be processed by a recipient ... after any content codings indicated by Content-Encoding are decoded. The key phrase to look for is "after any content codings indicated by Content-Encoding are decoded". CFNetwork treats a response with "Content-Type: application/octet-stream" and "Content-Encoding: gzip" as a response with "Content-Type: application/gzip" and "Content-Encoding: identity". To achieve auto decompressing of the pre-compressed data, the web server should have returned a response with a Content-Type that is handled by WebKit, say text/plain, such that response.m_nsResponse.m_ptr would look like: (lldb) po response.m_nsResponse.m_ptr <NSHTTPURLResponse: 0x6180000222c0> { URL: http://s3.amazonaws.com/mozilla-games/ZenGarden/EpicZenGarden.data.gz } { status code: 200, headers { "Accept-Ranges" = bytes; "Content-Encoding" = gzip; "Content-Length" = 90198148; "Content-Type" = "text/plain"; Date = "Tue, 15 Aug 2017 22:44:19 GMT"; Etag = "\"abd8ecef2479f13cc29e248e294b5330-11\""; "Last-Modified" = "Thu, 08 Jun 2017 16:29:01 GMT"; Server = AmazonS3; "x-amz-id-2" = "DoBSu33Sovmsp3CEZKWhkhnQZXGWyq40Yb2h0UjCvUdtVIecMM0iFyCQIlUyA3pZUtdyCIlVzxs="; "x-amz-request-id" = 0A63140CDBC51938; } }
Daniel Bates
Comment 6 2017-08-17 14:32:02 PDT
(In reply to JF Bastien from comment #1) > FWIW I tried the "obvious" fix in XMLHttpRequest.cpp: > options.sniffContent = SniffContent; > That doesn't seem to do the trick, or I did it wrong. : ( I would have expected that enabling sniffing would have made CFNetwork override the Content-Type in the response with text/plain and hence auto-decompress it.
JF Bastien
Comment 7 2017-08-17 14:34:23 PDT
(In reply to Daniel Bates from comment #5) > (In reply to Daniel Bates from comment #4) [... snip ...] So you're saying we're doing the right thing, and the server is just misconfigured?
Daniel Bates
Comment 8 2017-08-17 14:40:34 PDT
(In reply to Daniel Bates from comment #6) > (In reply to JF Bastien from comment #1) > > FWIW I tried the "obvious" fix in XMLHttpRequest.cpp: > > options.sniffContent = SniffContent; > > That doesn't seem to do the trick, or I did it wrong. > > : ( I would have expected that enabling sniffing would have made CFNetwork > override the Content-Type in the response with text/plain and hence > auto-decompress it. Disregard this remark.
Daniel Bates
Comment 9 2017-08-17 14:44:33 PDT
(In reply to JF Bastien from comment #7) > (In reply to Daniel Bates from comment #5) > > (In reply to Daniel Bates from comment #4) > [... snip ...] > > So you're saying we're doing the right thing, and the server is just > misconfigured? Yes, we are doing the "right thing". We should check how Microsoft Edge and IE handle a response with "Content-Encoding: gzip" and "Content-Type: application/octet-stream". Do they treat it as "Content-Encoding: gzip" and "Content-Type: text/plain"? If WebKit is the only browser engine that does not auto decompress <http://s3.amazonaws.com/mozilla-games/ZenGarden/EpicZenGarden.data.gz> then we may want to consider adding a quirk either in WebKit or CFNetwork.
Daniel Bates
Comment 10 2017-08-17 15:47:17 PDT
After reading over the definition of Content-Type and Content-Encoding in RFC 7231 again, I no longer think this bug represents a server misconfiguration. We should look to understand why CF Network explicitly chose to treat "Content-Type: application/octet-stream" and "Content-Encoding: gzip" as "Content-Type: application/gzip" and "Content-Encoding: identity", respectively.
Daniel Bates
Comment 11 2017-10-27 18:25:02 PDT
(In reply to JF Bastien from comment #2) > <rdar://problem/33822249> This tracks the CFNetwork support infrastructure needed to fix this bug.
Daniel Bates
Comment 12 2017-10-27 18:25:35 PDT
The following radar tracks the WebKit work associated with this bug: <rdar://problem/34912624>
Daniel Bates
Comment 13 2017-10-27 19:25:13 PDT
Created attachment 325232 [details] Patch and layout tests
Daniel Bates
Comment 14 2017-10-27 20:03:47 PDT
Created attachment 325233 [details] Patch and layout tests
Daniel Bates
Comment 15 2017-10-27 20:04:44 PDT
JF Bastien
Comment 16 2017-10-27 21:23:17 PDT
Comment on attachment 325234 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=325234&action=review A small suggestion and bugs look sad: /home/ews/ltilve-gtk-wk2-ews/WebKit/Source/WebKit/NetworkProcess/NetworkDataTask.cpp:59:206: error: no matching function for call to 'WebKit::NetworkDataTaskSoup::create(WebKit::NetworkSession&, WebKit::NetworkDataTaskClient&, const WebCore::ResourceRequest&, const WebCore::StoredCredentialsPolicy&, const WebCore::ContentEncodingSniffingPolicy&, const bool&)' Otherwise r=me > Source/WebCore/PAL/pal/spi/cf/CFNetworkSPI.h:211 > +#if PLATFORM(MAC) && __MAC_OS_X_VERSION_MIN_REQUIRED >= 101302 This doesn't also affect iOS? > Source/WebCore/loader/ResourceLoader.cpp:219 > + m_handle = ResourceHandle::create(frameLoader()->networkingContext(), m_request, this, m_defersLoading, m_options.sniffContent == SniffContent, m_options.sniffContentEncoding == ContentEncodingSniffingPolicy::Sniff); I tend to like enum classes rather than booleans for this type of thing: passing in multiple booleans is error prone and harder to read, whereas having an enum class as parameter can't be messed up. So I'd just pass the ContentEncodingSnifingPolicy strait into create(). Is that not practical? > Source/WebCore/loader/appcache/ApplicationCacheGroup.cpp:480 > + bool shouldContentEncodingSniff = true; Even here, it's way more fool-proof to have an enum class.
Daniel Bates
Comment 17 2017-10-27 22:15:19 PDT
Build Bot
Comment 18 2017-10-27 22:17:31 PDT Comment hidden (obsolete)
Daniel Bates
Comment 19 2017-10-27 22:36:01 PDT
(In reply to JF Bastien from comment #16) > > Source/WebCore/PAL/pal/spi/cf/CFNetworkSPI.h:211 > > +#if PLATFORM(MAC) && __MAC_OS_X_VERSION_MIN_REQUIRED >= 101302 > > This doesn't also affect iOS? > No, according to <rdar://problem/33822249> this SPI does not affect iOS because this bug represents a macOS quirk. iOS does not sniff content encoding based on the URL. > > Source/WebCore/loader/ResourceLoader.cpp:219 > > + m_handle = ResourceHandle::create(frameLoader()->networkingContext(), m_request, this, m_defersLoading, m_options.sniffContent == SniffContent, m_options.sniffContentEncoding == ContentEncodingSniffingPolicy::Sniff); > > I tend to like enum classes rather than booleans for this type of thing: > passing in multiple booleans is error prone and harder to read, whereas > having an enum class as parameter can't be messed up. So I'd just pass the > ContentEncodingSnifingPolicy strait into create(). Is that not practical? > We should clean this up as passing consecutive booleans is error prone. I hope you do not mind that defer this to a follow up bug so as to keep this patch focused on the bug fix (*) due to its time sensitive nature. Filed bug #178979. (*) I thought to limit style changes to either lines I needed to touch or lines immediately above or below lines I needed to touch. Although I could make the change you mentioned now we may also want to consider whether passing a struct of options (similar to NetworkLoadParameters) to ResourceHandle::create()/ResourceHandle::ResourceHandle() could simplify this code and callers. > > Source/WebCore/loader/appcache/ApplicationCacheGroup.cpp:480 > > + bool shouldContentEncodingSniff = true; > > Even here, it's way more fool-proof to have an enum class. See above remark.
Daniel Bates
Comment 20 2017-10-27 22:38:34 PDT
Build Bot
Comment 21 2017-10-27 22:40:40 PDT Comment hidden (obsolete)
Daniel Bates
Comment 22 2017-10-27 22:42:45 PDT
Build Bot
Comment 23 2017-10-27 22:45:37 PDT Comment hidden (obsolete)
Daniel Bates
Comment 24 2017-10-27 22:54:43 PDT
Build Bot
Comment 25 2017-10-27 22:57:51 PDT Comment hidden (obsolete)
JF Bastien
Comment 26 2017-10-28 07:32:16 PDT
(In reply to Daniel Bates from comment #19) > (In reply to JF Bastien from comment #16) > > > Source/WebCore/PAL/pal/spi/cf/CFNetworkSPI.h:211 > > > +#if PLATFORM(MAC) && __MAC_OS_X_VERSION_MIN_REQUIRED >= 101302 > > > > This doesn't also affect iOS? > > > > No, according to <rdar://problem/33822249> this SPI does not affect iOS > because this bug represents a macOS quirk. iOS does not sniff content > encoding based on the URL. Ok good! > > > Source/WebCore/loader/ResourceLoader.cpp:219 > > > + m_handle = ResourceHandle::create(frameLoader()->networkingContext(), m_request, this, m_defersLoading, m_options.sniffContent == SniffContent, m_options.sniffContentEncoding == ContentEncodingSniffingPolicy::Sniff); > > > > I tend to like enum classes rather than booleans for this type of thing: > > passing in multiple booleans is error prone and harder to read, whereas > > having an enum class as parameter can't be messed up. So I'd just pass the > > ContentEncodingSnifingPolicy strait into create(). Is that not practical? > > > > We should clean this up as passing consecutive booleans is error prone. I > hope you do not mind that defer this to a follow up bug so as to keep this > patch focused on the bug fix (*) due to its time sensitive nature. Filed bug > #178979. No problem at all. > (*) I thought to limit style changes to either lines I needed to touch or > lines immediately above or below lines I needed to touch. Although I could > make the change you mentioned now we may also want to consider whether > passing a struct of options (similar to NetworkLoadParameters) to > ResourceHandle::create()/ResourceHandle::ResourceHandle() could simplify > this code and callers. > > > > Source/WebCore/loader/appcache/ApplicationCacheGroup.cpp:480 > > > + bool shouldContentEncodingSniff = true; > > > > Even here, it's way more fool-proof to have an enum class. > > See above remark.
Brady Eidson
Comment 27 2017-10-30 11:03:15 PDT
I think you meant for your Changelogs reference radar 34912624, not 33822249
Alex Christensen
Comment 28 2017-10-30 11:12:23 PDT
Comment on attachment 325249 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=325249&action=review > Source/WebKit/NetworkProcess/cocoa/NetworkDataTaskCocoa.mm:108 > + auto mutableRequest = adoptNS([nsRequest mutableCopy]); Before, the common case would make 0 copies of the NSURLRequest. Now the common case makes 1. Let's not increase that.
Daniel Bates
Comment 29 2017-10-30 11:40:56 PDT
(In reply to Brady Eidson from comment #27) > I think you meant for your Changelogs reference radar 34912624, not 33822249 Will fix.
Daniel Bates
Comment 30 2017-10-30 11:41:21 PDT
(In reply to Alex Christensen from comment #28) > Comment on attachment 325249 [details] > Patch > > View in context: > https://bugs.webkit.org/attachment.cgi?id=325249&action=review > > > Source/WebKit/NetworkProcess/cocoa/NetworkDataTaskCocoa.mm:108 > > + auto mutableRequest = adoptNS([nsRequest mutableCopy]); > > Before, the common case would make 0 copies of the NSURLRequest. Now the > common case makes 1. Let's not increase that. Will fix.
Daniel Bates
Comment 31 2017-10-30 12:04:23 PDT
Created attachment 325365 [details] Patch Updated patch based on feedback from Brady Eidson and Alex Christensen.
Build Bot
Comment 32 2017-10-30 12:06:49 PDT
Attachment 325365 [details] did not pass style-queue: ERROR: Source/WebCore/platform/network/BlobResourceHandle.cpp:137: Code inside a namespace should not be indented. [whitespace/indent] [4] Total errors found: 1 in 35 files If any of these errors are false positives, please file a bug against check-webkit-style.
Daniel Bates
Comment 33 2017-10-30 14:17:46 PDT
Comment on attachment 325365 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=325365&action=review > Source/WebCore/ChangeLog:13 > + Fixes an issue where the body of an HTTP response associated with an XHR request to a .gz file > + would be automatically gzipped decompressed if the HTTP response omitted a Content-Encoding HTTP > + header. Specifically, such a response would be treated analogous to a response with headers > + "Content-Type: application/gzip" and "Content-Encoding: identity". This behavior does not conform > + the HTTP 1.1 spec. and breaks Epic Zen Garden, <https://s3.amazonaws.com/mozilla-games/ZenGarden/EpicZenGarden.html>. Will change this paragraph and the same paragraph in Source/WebKit/ChangeLog to read: Fixes an issue where the body of an HTTP response with headers "Content-Type: application/octet-stream" and "Content-Encoding: gzip" associated with an XHR request to a .gz file would not be automatically gzipped decompressed. Specifically, such a response would be treated analogous to a response with headers "Content-Type: application/gzip" and "Content-Encoding: identity". This behavior does not conform to the behavior of the Content-Encoding header as defined in the HTTP 1.1 and later specs. Moreover this behavior breaks the Epic Zen Garden demo: <https://s3.amazonaws.com/mozilla-games/ZenGarden/EpicZenGarden.html>.
Alex Christensen
Comment 34 2017-11-01 10:58:33 PDT
Comment on attachment 325365 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=325365&action=review > Source/WebCore/platform/network/PingHandle.h:55 > + bool defersLoading = false; It would be nice if these were two-state enums. enum class DefersLoading { No, Yes }; > LayoutTests/platform/mac/TestExpectations:35 > +http/tests/xmlhttprequest/gzip-content-type-no-content-encoding.html [ Skip ] I think this could be marked as pass/fail instead of skipping.
Daniel Bates
Comment 35 2017-11-01 15:04:07 PDT
(In reply to Alex Christensen from comment #34) > Comment on attachment 325365 [details] > Patch > > View in context: > https://bugs.webkit.org/attachment.cgi?id=325365&action=review > > > Source/WebCore/platform/network/PingHandle.h:55 > > + bool defersLoading = false; > > It would be nice if these were two-state enums. > enum class DefersLoading { No, Yes }; > I plan to clean this up in bug #178979 as mentioned in comment #18. > > LayoutTests/platform/mac/TestExpectations:35 > > +http/tests/xmlhttprequest/gzip-content-type-no-content-encoding.html [ Skip ] > > I think this could be marked as pass/fail instead of skipping. OK.
Daniel Bates
Comment 36 2017-11-01 15:23:53 PDT
Daniel Bates
Comment 37 2017-11-01 21:04:11 PDT
Fixed change log description in <https://trac.webkit.org/changeset/224315>.
Alexey Proskuryakov
Comment 38 2018-10-25 12:10:26 PDT
Comment on attachment 325365 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=325365&action=review > Source/WebCore/PAL/pal/spi/cf/CFNetworkSPI.h:211 > +#if PLATFORM(MAC) && __MAC_OS_X_VERSION_MIN_REQUIRED >= 101302 This check is not correct. The deployment target for 10.13 stays at 101300, so this check is equivalent to "__MAC_OS_X_VERSION_MIN_REQUIRED >= 101400". > Source/WebCore/platform/network/cf/ResourceHandleCFNet.cpp:165 > +#if PLATFORM(MAC) && __MAC_OS_X_VERSION_MIN_REQUIRED >= 101302 Ditto. > Source/WebCore/platform/network/mac/ResourceHandleMac.mm:129 > +#if PLATFORM(MAC) && __MAC_OS_X_VERSION_MIN_REQUIRED < 101302 Ditto.
Alexey Proskuryakov
Comment 39 2018-10-25 13:05:58 PDT
> No, according to <rdar://problem/33822249> this SPI does not affect iOS because this bug represents a macOS quirk. iOS does not sniff content encoding based on the URL. If so, why was the test not enabled on iOS? Enabled the test for Mojave+ and for iOS in r237425 and r237426. Let's see if iOS actually passes.
jujjyl
Comment 40 2022-11-11 01:11:49 PST
I am the original author of https://s3.amazonaws.com/mozilla-games/ZenGarden/EpicZenGarden.html , and have now been made aware of this bug in Safari on Zen Garden the first time. This bug reads fixed, but essentially the same bug is still occurring in the wild in bug https://bugs.webkit.org/show_bug.cgi?id=247421 . Did the patch in this bug ever land?
Darin Adler
Comment 41 2022-11-11 01:43:09 PST
Yes, the patch did land.
Note You need to log in before you can comment on or make changes to this bug.