Bug 4120
| Summary: | Servers that need encoding sniffing to be rendered properly | | |
|---|---|---|---|
| Product: | WebKit | Reporter: | Alexey Proskuryakov <ap> |
| Component: | Layout and Rendering | Assignee: | Dave Hyatt <hyatt> |
| Status: | RESOLVED CONFIGURATION CHANGED | | |
| Severity: | Normal | CC: | gavin.sharp, ian, karlcow |
| Priority: | P2 | | |
| Version: | 312.x | | |
| Hardware: | Mac | | |
| OS: | OS X 10.3 | | |
| URL: | http://www.museum.ru/museum/Ostankino/5.htm | | |
| Bug Depends on: | 245305 | | |
| Bug Blocks: | | | |
Alexey Proskuryakov
This server (using Microsoft-IIS/5.0) auto-guesses the encoding, and sends Mac Cyrillic to Safari. For
whatever reason, the charset sent is quite broken - "mac" is ambiguous and thus unsupported by
WebKit.
Still, it should be possible to disambiguate "mac" by using the system primary language's Mac
encoding.
% curl -I --header "User-Agent: Mozilla/5.0 (Macintosh; U; PPC Mac OS X; ru-ru) Apple WebKit/312.1 (KHTML, like Gecko) Safari/312" http://www.museum.ru/museum/Ostankino/5.htm
HTTP/1.1 200 OK
Server: Microsoft-IIS/5.0
Date: Sun, 24 Jul 2005 08:58:33 GMT
Accept-Ranges: bytes
Last-Modified: Wed, 15 Jun 2005 12:58:15 GMT
ETag: "ed49fce4a971c51:804"
Content-Length: 7318
Set-Cookie: charset=mac; path=/; expires=Mon, 10 May 2032 23:12:40 GMT
Content-Type: text/html; charset=mac
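The disambiguation idea proposed above can be sketched roughly as follows. This is an illustration in Python, not WebKit code: the function name `resolve_mac_charset`, the table `MAC_CODEC_BY_LANGUAGE`, and the choice of languages in it are assumptions made for the example. The codec names are Python's own, and labels other than "mac" and "macintosh" are passed through unchanged.

```python
from email.message import Message
from typing import Optional

# Hypothetical mapping from a primary-language code to a Python Mac codec.
# The set of languages is an assumption for illustration only.
MAC_CODEC_BY_LANGUAGE = {
    "ru": "mac_cyrillic",
    "el": "mac_greek",
    "tr": "mac_turkish",
    "is": "mac_iceland",
}


def resolve_mac_charset(content_type: str, primary_language: str) -> Optional[str]:
    """Return a codec name for a Content-Type value.

    The labels "mac" and "macintosh" are resolved through the primary
    language and default to MacRoman; any other label is returned as-is,
    and None is returned when the header carries no charset parameter.
    """
    msg = Message()
    msg["Content-Type"] = content_type
    charset = msg.get_content_charset()  # lowercased, or None if absent
    if charset is None:
        return None
    if charset in ("mac", "macintosh"):
        language = primary_language.split("-")[0].lower()
        return MAC_CODEC_BY_LANGUAGE.get(language, "mac_roman")
    return charset


if __name__ == "__main__":
    print(resolve_mac_charset("text/html; charset=mac", "ru-ru"))
```

With the header shown above and a `ru-ru` primary language, the sketch picks `mac_cyrillic`; with any unlisted language it keeps the MacRoman reading.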
Alexey Proskuryakov
Oops, in fact "mac" and "macintosh" charsets are defined in RFC 1345 (as MacRoman), and WebKit
explicitly supports them.
So, the implementation is correct, and probably shouldn't be changed. However, this example may need to
be considered in a future encoding sniffer - museum.ru is a rather important server.
Alexey Proskuryakov
I propose to use this bug to track servers whose encoding cannot be determined via HTTP or HTML
headers, so content sniffing is required. Two more:
http://stats.distributed.net/team/tmsummary.php?project_id=8&team=11269
http://www.mdf.ru
Alexey Proskuryakov
http://www.zoo.ru (also sends charset=mac instead of x-mac-cyrillic).
Alexey Proskuryakov
Bug 17405: http://tianya.cn - no charset information; encoded as Simplified Chinese.
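For pages like this one, where neither HTTP nor HTML declares a charset, a very simple sniffer could look like the sketch below. It is a minimal Python illustration under stated assumptions, not the algorithm WebKit uses or the one tracked elsewhere: the function name `sniff_encoding` and the fallback table `LEGACY_FALLBACK_BY_LANGUAGE` are hypothetical. It checks for a byte order mark, then tries strict UTF-8, then falls back to a legacy encoding chosen by primary language.

```python
import codecs

# Assumed, illustrative legacy defaults keyed by primary language.
LEGACY_FALLBACK_BY_LANGUAGE = {
    "zh": "gb18030",
    "ru": "windows-1251",
}

# Byte order marks checked before any other heuristic.
BOMS = (
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
)


def sniff_encoding(body: bytes, primary_language: str = "en") -> str:
    """Guess an encoding for a body that arrived without charset information."""
    # 1. A byte order mark is treated as authoritative.
    for bom, name in BOMS:
        if body.startswith(bom):
            return name
    # 2. Bytes that decode as strict UTF-8 are assumed to be UTF-8.
    try:
        body.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        pass
    # 3. Otherwise fall back to a language-dependent legacy encoding.
    language = primary_language.split("-")[0].lower()
    return LEGACY_FALLBACK_BY_LANGUAGE.get(language, "windows-1252")


if __name__ == "__main__":
    sample = "\u5929\u6daf".encode("gb18030")  # Simplified Chinese sample bytes
    print(sniff_encoding(sample, "zh-cn"))
```

The language fallback is the weakest step: it depends on the user's settings rather than on the page, which is why a real sniffer would also need to look at byte statistics of the content.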
Karl Dubost
Current status of the sites listed in this bug:
PASS http://www.museum.ru/museum/Ostankino/5.htm
PASS https://stats.distributed.net/team/tmsummary.php?project_id=8&team=11269
PASS http://www.mdf.ru after redirect to https://www.mamm-mdf.ru
ERR http://www.zoo.ru Domain is for sale.
ERR http://tianya.cn Domain not available anymore.
Let's close this bug, since Bug 245305 already covers the requirements for content sniffing.