WebKit Bugzilla
New
Browse
Log In
×
Sign in with GitHub
or
Remember my login
Create Account
·
Forgot Password
Forgotten password account recovery
RESOLVED FIXED
82577
Work around an entity parsing bug in libxml2 2.7.3 (supplied with Lion) and unskip tests
https://bugs.webkit.org/show_bug.cgi?id=82577
Summary
Work around an entity parsing bug in libxml2 2.7.3 (supplied with Lion) and u...
Nikolas Zimmermann
Reported
2012-03-29 00:36:42 PDT
These tests are currently skipped on Lion, w/o a comment :( svg/W3C-SVG-1.1/coords-viewattr-01-b.svg svg/W3C-SVG-1.1/coords-viewattr-02-b.svg svg/custom/preserve-aspect-ratio-syntax.svg svg/custom/viewbox-syntax.svg svg/zoom/page/zoom-coords-viewattr-01-b.svg svg/zoom/text/zoom-coords-viewattr-01-b.svg svg/W3C-SVG-1.1/render-elems-03-t.svg I checked them, and it turns out entity parsing is broken, due to a libxml2 bug. Those tests use them like: <?xml version="1.0" encoding="UTF-8"?> <!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 1.1 Tiny//EN" "
http://www.w3.org/Graphics/SVG/1.1/DTD/svg11-tiny.dtd
" [ <!ENTITY shape " <path d='M60,0 l60,0 l60,60 l0,60 l-60,60 l-60,0 l-60,-60 l0,-60 z'/>"> ]> ... <g fill="yellow" stroke="red" stroke-width="8" >&shape;</g> We're creating our libxml2 parser with the NOENT mode, which means libxml2 replaces the entity, and parses the contained elements, just if they were part of the original tree. Though upon parsing them it doesn't see the "svg" namespace, and thus in the example above an Element is created instead of a SVGPathElement, as expected.
https://bugzilla.gnome.org/show_bug.cgi?id=502960
libxml 2.7.4 already fixed this in 2009 (!). Anyhow, we need to find a workaround, these are important tests that aren't run on Lion at the moment. (I stumbled across this as I broke preserve-aspect-ratio-syntax.svg for non-Mac platforms, and couldn't notice this, because the test is skipped).
Attachments
Patch
(787.15 KB, patch)
2012-04-01 04:33 PDT
,
Nikolas Zimmermann
fpizlo
: review+
Details
Formatted Diff
Diff
View All
Add attachment
proposed patch, testcase, etc.
Dirk Schulze
Comment 1
2012-03-29 06:55:31 PDT
Ha, did the fix of libxml finally land in lion? :) We have some hacks in the controller of libxml because of bugs in older versions of libxml, which where shipped till SL. IIRC we already have a bunch of these bug reports about broken entity support in newer versions of libxml.
Nikolas Zimmermann
Comment 2
2012-03-30 05:56:18 PDT
(In reply to
comment #1
)
> Ha, did the fix of libxml finally land in lion? :) We have some hacks in the controller of libxml because of bugs in older versions of libxml, which where shipped till SL. IIRC we already have a bunch of these bug reports about broken entity support in newer versions of libxml.
No Lion includes libxml 2.7.3, which does not include the fix. We need to add a new workaround for that :( I have a patch ready, need to tidy it up and submit.
Nikolas Zimmermann
Comment 3
2012-04-01 04:33:41 PDT
Created
attachment 134993
[details]
Patch
Nikolas Zimmermann
Comment 4
2012-04-01 04:33:51 PDT
Uploading my workaround with --suggest-reviewers, looking for comments! I tried to do it as un-intrusive as possible, I hope it's fine as-is.
Nikolas Zimmermann
Comment 5
2012-04-01 04:34:35 PDT
Forgot to say: the actual patch is only 7k, the rest are rebaselines - no worries ;-)
Adam Barth
Comment 6
2012-04-01 10:20:49 PDT
Comment on
attachment 134993
[details]
Patch Can you create a USE(LIBXML_ENTITY_PARSER_WORKAROUND) ifdef and put all this code behind the ifdef? That will make it easier for us to remove it in the future.
Nikolas Zimmermann
Comment 7
2012-04-01 10:56:07 PDT
(In reply to
comment #6
)
> (From update of
attachment 134993
[details]
) > Can you create a USE(LIBXML_ENTITY_PARSER_WORKAROUND) ifdef and put all this code behind the ifdef? That will make it easier for us to remove it in the future.
I followed the existing style for another libxml workaround in the same file. I don't see the benefit of adding USE() macro support for hacks like this - if you disable it in Platform.h at some point, you still need to remove it from the cpp file as well. If I shall change it, the other libxml workaround needs to be changed as well. Is it worth it?
Eric Seidel (no email)
Comment 8
2012-04-01 11:34:37 PDT
Comment on
attachment 134993
[details]
Patch View in context:
https://bugs.webkit.org/attachment.cgi?id=134993&action=review
> Source/WebCore/xml/parser/XMLDocumentParserLibxml2.cpp:548 > + , m_depthTriggeringEntityExpansion(-1) > + , m_isParsingEntityDeclaration(false)
Seems these should also be removed at compile time if the workaround is not in place.
> Source/WebCore/xml/parser/XMLDocumentParserLibxml2.cpp:743 > +#if LIBXML_VERSION >= 20704
Which suggests that this needs to be turned into a #define WORKAROUND_ENTITY_PARSING_BUG or similar, so it can be re-used as a compile time check in various places throughtout this file.
Filip Pizlo
Comment 9
2012-04-01 11:54:13 PDT
(In reply to
comment #6
)
> (From update of
attachment 134993
[details]
) > Can you create a USE(LIBXML_ENTITY_PARSER_WORKAROUND) ifdef and put all this code behind the ifdef? That will make it easier for us to remove it in the future.
I think that this patch makes removing the hack pretty easy already, since the things it does are guarded by hackAroundLibXMLEntityParsingBug(). Furthermore, the behavior of hackAroundLibXMLEntityParsingBug() already depends on the libxml version, and already has a comment describing that it is a workaround for a bug that existed before 2.7.4. I also think it's good for code quality to have this patch use a guard that can be checked using a C++ conditional rather than a C preprocessor conditional, since the latter tends to make the code ugly. Though in this patch, it may not make a big difference, and if it were me, I'd probably have used C preprocessor conditionals, for the reasons that Eric stated. The thing that C preprocessor conditionals instead of C++ conditionals would give us here is to remove the 8 bytes of data that seem to be added to the XMLDocumentParser class. But that does not seem like much, since the class already has a lot of state. In particular, most of the other cruft that this code adds ought to be removed at compile-time already. hackAroundLibXMLEntityParsingBug() is a static inline that returns a constant, so that should result in simplification of control flow. entityDeclarationHandler() is the only source of time overhead; at best it will be a tail call, leading to a useless jump. At worst, it'll be a handful of instructions that do some stack fiddling. But I can't imagine that matters much. Nikolas, do you have insight into what the performance and memory usage impacts of this patch are?
Alexey Proskuryakov
Comment 10
2012-04-01 12:45:57 PDT
> Which suggests that this needs to be turned into a #define WORKAROUND_ENTITY_PARSING_BUG or similar, so it can be re-used as a compile time check in various places throughtout this file.
This seems like a better approach to me, too.
Nikolas Zimmermann
Comment 11
2012-04-01 12:47:35 PDT
(In reply to
comment #9
)
> (In reply to
comment #6
) > > (From update of
attachment 134993
[details]
[details]) > > Can you create a USE(LIBXML_ENTITY_PARSER_WORKAROUND) ifdef and put all this code behind the ifdef? That will make it easier for us to remove it in the future. > > I think that this patch makes removing the hack pretty easy already, since the things it does are guarded by hackAroundLibXMLEntityParsingBug(). Furthermore, the behavior of hackAroundLibXMLEntityParsingBug() already depends on the libxml version, and already has a comment describing that it is a workaround for a bug that existed before 2.7.4.
Completely agreed.
> Nikolas, do you have insight into what the performance and memory usage impacts of this patch are?
Nope, I didn't measure any performance with this patch. Do you think this should be tested? Can you suggest a file that should be benchmarked?
Nikolas Zimmermann
Comment 12
2012-04-01 12:48:20 PDT
(In reply to
comment #10
)
> > Which suggests that this needs to be turned into a #define WORKAROUND_ENTITY_PARSING_BUG or similar, so it can be re-used as a compile time check in various places throughtout this file. > > This seems like a better approach to me, too.
I just followed the existing pattern, as I didn't specifically want to avoid initializing the variables if not needed. I thought it's not worth it.
Nikolas Zimmermann
Comment 13
2012-04-04 06:55:52 PDT
Any further comments?
Filip Pizlo
Comment 14
2012-04-04 17:49:27 PDT
Comment on
attachment 134993
[details]
Patch Sorry for the delay! I've been distracted this week. R=me.
Adam Barth
Comment 15
2012-04-04 18:02:44 PDT
Filip, any thoughts on
Comment #6
,
Comment #8
, and
Comment #10
?
Filip Pizlo
Comment 16
2012-04-04 18:13:19 PDT
(In reply to
comment #15
)
> Filip, any thoughts on
Comment #6
,
Comment #8
, and
Comment #10
?
#6, #8: see my previous reply. #10: spoke to Alexey in person. He still views Nikolas' style as being somewhat unusual. I still view it as being desirable, because it reduces #ifdef's, which make the code harder to read. I don't think these stylistic issues are so grotesque as to preclude this patch from landing. I don't think that using "if (fooBar()) { ... }" makes it any harder to remove the code than using "#if FOO_BAR \n ... \n #endif". I don't think there is a performance implication to using "if (fooBar())" instead of "#if FOO_BAR", since the function in question is static inline. I haven't heard further replies from you or Eric, and nobody flagged the patch as r-. This led me to believe that you didn't feel strongly enough about your views on the style of if statement being used to block the patch from landing as-is. The patch has sat around for long enough that I decided that it would be better to let the patch land than to continue arguing about these small things. If you do feel strongly about preferring "#if FOO_BAR" over "if (fooBar())" then I suppose there is still time to r- the patch. It's your call. I like the patch as is and would rather err on the side of letting it land but I don't have strong feelings either way.
Mark Rowe (bdash)
Comment 17
2012-04-04 18:17:53 PDT
For what it's worth, #if's *are* easier to remove since you can simply feed the files in question through unifdef.
Nikolas Zimmermann
Comment 18
2012-04-04 23:31:33 PDT
(In reply to
comment #17
)
> For what it's worth, #if's *are* easier to remove since you can simply feed the files in question through unifdef.
I'll take care of removing the code once Lion ships with a newer libxml, okay? Thanks for the review Filip! I'll land it as-is today. I can rework the patch to use ifdefs but only if we change the other libxml bug workaround to the same style, as a mixture doesn't make much sense. Agreed?
Nikolas Zimmermann
Comment 19
2012-04-05 01:18:45 PDT
Committed
r113299
: <
http://trac.webkit.org/changeset/113299
>
Simon Fraser (smfr)
Comment 20
2012-04-05 11:00:30 PDT
This appears to have made a bunch of tests start failing, esp. some in css3/selectors3/xml
Jon Lee
Comment 21
2012-04-05 15:23:42 PDT
Rolled out in
r113385
because of large number of failing tests. Reopening bug.
Nikolas Zimmermann
Comment 22
2012-04-06 23:29:01 PDT
(In reply to
comment #20
)
> This appears to have made a bunch of tests start failing, esp. some in css3/selectors3/xml
*sigh* I forgot that only cr-linux EWS runs tests. The cr-linux slave has a libxml2 version > 2.7.3. I'll investigate.
Nikolas Zimmermann
Comment 23
2012-04-07 00:07:01 PDT
(In reply to
comment #20
)
> This appears to have made a bunch of tests start failing, esp. some in css3/selectors3/xml
Funny, css3/selectors3/xhtml/css3-modsel-120.xml - this fails but has no entities at all. if (hackAroundLibXMLEntityParsingBug() && context()->depth > depthTriggeringEntityExpansion() && uri.isNull() && prefix.isNull()) I should check if depthTriggeringEntityExpansion is actually != -1. I'll reland this with this obvious fix, after rerunning all layout tests on Mac.
Nikolas Zimmermann
Comment 24
2012-04-07 00:40:07 PDT
Relanded in
r113542
.
Note
You need to
log in
before you can comment on or make changes to this bug.
Top of Page
Format For Printing
XML
Clone This Bug