WebKit Bugzilla
New
Browse
Log In
×
Sign in with GitHub
or
Remember my login
Create Account
·
Forgot Password
Forgotten password account recovery
NEW
134767
Add support for automated W3C test baseline computation
https://bugs.webkit.org/show_bug.cgi?id=134767
Summary
Add support for automated W3C test baseline computation
youenn fablet
Reported
2014-07-09 07:08:59 PDT
To ease the process of importing W3C tests as WebKit layout tests, it may be convenient to provide a tool that imports W3C tests, generates expected.txt files and related TestExpectation.
Attachments
Patch
(13.19 KB, patch)
2014-07-09 07:13 PDT
,
youenn fablet
no flags
Details
Formatted Diff
Diff
skipping all tests by default
(15.12 KB, patch)
2014-07-16 11:43 PDT
,
youenn fablet
no flags
Details
Formatted Diff
Diff
add support for merge of TestExpectations
(19.20 KB, patch)
2014-09-16 03:16 PDT
,
youenn fablet
no flags
Details
Formatted Diff
Diff
Improving test expectation merge
(22.65 KB, patch)
2014-12-23 07:27 PST
,
youenn fablet
no flags
Details
Formatted Diff
Diff
Patch
(22.60 KB, patch)
2014-12-23 15:39 PST
,
youenn fablet
no flags
Details
Formatted Diff
Diff
Adding CSS testsuite support
(12.67 KB, patch)
2015-01-13 07:55 PST
,
youenn fablet
no flags
Details
Formatted Diff
Diff
Show Obsolete
(5)
View All
Add attachment
proposed patch, testcase, etc.
youenn fablet
Comment 1
2014-07-09 07:13:27 PDT
Created
attachment 234636
[details]
Patch
youenn fablet
Comment 2
2014-07-16 11:43:52 PDT
Created
attachment 235013
[details]
skipping all tests by default
youenn fablet
Comment 3
2014-09-16 03:16:00 PDT
Created
attachment 238170
[details]
add support for merge of TestExpectations
youenn fablet
Comment 4
2014-12-23 07:27:56 PST
Created
attachment 243676
[details]
Improving test expectation merge
youenn fablet
Comment 5
2014-12-23 15:39:13 PST
Created
attachment 243704
[details]
Patch
Bem Jones-Bey
Comment 6
2015-01-05 14:08:56 PST
Comment on
attachment 243704
[details]
Patch I'm not convinced that we want to be automatically adding entries to TestExpectations for imported tests, especially without bugs. Also, often multiple imported tests will fail for the same reason, so it can be good to have a human look at them and intelligently add a single bug for these TestExpectations lines. That being said, I'm all for a tool that makes it easier to quickly see which imported tests have failed an in what way to make it easier to create the test expectations entries and relevant bugs.
Ryosuke Niwa
Comment 7
2015-01-05 14:55:11 PST
(In reply to
comment #6
)
> Comment on
attachment 243704
[details]
> Patch > > I'm not convinced that we want to be automatically adding entries to > TestExpectations for imported tests, especially without bugs. Also, often > multiple imported tests will fail for the same reason, so it can be good to > have a human look at them and intelligently add a single bug for these > TestExpectations lines. That being said, I'm all for a tool that makes it > easier to quickly see which imported tests have failed an in what way to > make it easier to create the test expectations entries and relevant bugs.
I agree. I think having a tool to help humans triage failures is good. Mindlessly adding test expectations for all failing tests is not.
youenn fablet
Comment 8
2015-01-06 01:40:14 PST
(In reply to
comment #7
)
> (In reply to
comment #6
) > > Comment on
attachment 243704
[details]
> > Patch > > > > I'm not convinced that we want to be automatically adding entries to > > TestExpectations for imported tests, especially without bugs. Also, often > > multiple imported tests will fail for the same reason, so it can be good to > > have a human look at them and intelligently add a single bug for these > > TestExpectations lines. That being said, I'm all for a tool that makes it > > easier to quickly see which imported tests have failed an in what way to > > make it easier to create the test expectations entries and relevant bugs. > > I agree. I think having a tool to help humans triage failures is good. > Mindlessly adding test expectations for all failing tests is not.
Right, manual inspection of the results and augmentation of the generated test expectations is needed. The tool to ease test results study may well be w3c_conformance_results.html from
bug 134766
. Automatically generating WPT test expectations has some additional benefits, like ordering of the tests, clustering of the test suites... One important use case is resyncing with the latest version of the WPT repo. We should strive to make that effort minimal. This patch is an initial step towards that goal.
Bem Jones-Bey
Comment 9
2015-01-06 08:26:21 PST
(In reply to
comment #8
)
> Right, manual inspection of the results and augmentation of the generated > test expectations is needed. > The tool to ease test results study may well be w3c_conformance_results.html > from
bug 134766
.
From a quick look, that does seem like a useful tool.
> Automatically generating WPT test expectations has some additional benefits, > like ordering of the tests, clustering of the test suites...
I think those are outweighed by the detriment that if no one is triaging or filing bugs, issues just won't get fixed. I'm also worried that a test that didn't fail before would start failing on import and the person running the import wouldn't notice.
> One important use case is resyncing with the latest version of the WPT repo. > We should strive to make that effort minimal. This patch is an initial step > towards that goal.
I don't think that updating the test expectations file is the hardest part of the import. Blink has a script that imports both the WPT tests and the CSSWG tests (based on the same import-w3c-tests infrastructure). I recently used it to import the CSS Shapes tests, and since it's all or nothing, it imported some new WPT tests as well. There were some new failures, and I don't think automatically adding expectations would have made the task any easier, since I would have had to still triage them and file bugs. One thing to note: on the Blink side, they use a whitelist to determine what parts of the CSSWG and W3C test suites to import. Perhaps if we went that route, it wouldn't seem like we need so many expectations? We could simply not import parts of the suite for things that we don't implement yet, and also could import in parts so that each piece could be triaged separately. Dirk Pranke built a lot of that stuff over there, and may have other suggestions.
youenn fablet
Comment 10
2015-01-06 13:12:43 PST
(In reply to
comment #9
)
> (In reply to
comment #8
) > > Right, manual inspection of the results and augmentation of the generated > > test expectations is needed. > > The tool to ease test results study may well be w3c_conformance_results.html > > from
bug 134766
. > > From a quick look, that does seem like a useful tool. > > > Automatically generating WPT test expectations has some additional benefits, > > like ordering of the tests, clustering of the test suites... > > I think those are outweighed by the detriment that if no one is triaging or > filing bugs, issues just won't get fixed. I'm also worried that a test that > didn't fail before would start failing on import and the person running the > import wouldn't notice.
Ah, maybe the term 'automated' in the title is a bit misleading... The purpose of this script is to help the authoring of test-import patches by developers. It is the responsibility of the developer to do the triage, augment test expectations accordingly and submit the patch. I guess the patch would be accepted if sufficient triage is done. This patch handles the case of partially passing tests: test expectations are added so that there is an incentive to fix them but they are marked as PASS since they should comply with the provided -expected.txt file. It also allows incremental import: (sub)folder after (sub)folder.
> > One important use case is resyncing with the latest version of the WPT repo. > > We should strive to make that effort minimal. This patch is an initial step > > towards that goal. > > I don't think that updating the test expectations file is the hardest part > of the import. Blink has a script that imports both the WPT tests and the > CSSWG tests (based on the same import-w3c-tests infrastructure). I recently > used it to import the CSS Shapes tests, and since it's all or nothing, it > imported some new WPT tests as well. There were some new failures, and I > don't think automatically adding expectations would have made the task any > easier, since I would have had to still triage them and file bugs.
From what I saw a while ago, Blink is not importing a lot of WPT tests right now. Also Blink is not importing any expected file, which simplifies things a bit.
> > One thing to note: on the Blink side, they use a whitelist to determine what > parts of the CSSWG and W3C test suites to import. Perhaps if we went that > route, it wouldn't seem like we need so many expectations? We could simply > not import parts of the suite for things that we don't implement yet, and > also could import in parts so that each piece could be triaged separately. > Dirk Pranke built a lot of that stuff over there, and may have other > suggestions.
To limit adding a lot of test expectation lines, we may only import tests that have/had at least one passing assertion.
Bug 134766
tool can be used to manage the other tests.
Dirk Pranke
Comment 11
2015-01-06 18:37:34 PST
> From what I saw a while ago, Blink is not importing a lot of WPT tests right now.
This is correct.
> Also Blink is not importing any expected file, which simplifies things a bit.
I'm not sure what is meant by "any expected file" in this context? Blink currently imports only reftests and script-only (js) tests, not tests that require pixel tests (manual review). script-only test may still produce partial failures, which can result in -expected.txt results. Are you referring to pixel tests not being imported?
youenn fablet
Comment 12
2015-01-07 04:35:54 PST
(In reply to
comment #11
)
> > From what I saw a while ago, Blink is not importing a lot of WPT tests right now. > > This is correct. > > > Also Blink is not importing any expected file, which simplifies things a bit. > > I'm not sure what is meant by "any expected file" in this context?
I was meaning the -expected.txt results.
> > Blink currently imports only reftests and script-only (js) tests, not tests > that require pixel tests (manual review). script-only test may still produce > partial failures, which can result in -expected.txt results.
Oh, I was not aware of that. That looks better this way. When running the whole test suite, it takes a lot of time due to fully failing (probably timeouting) tests. The current patch is importing them but marking them as Skip. Another approach may be to not import them until they get fixed, either in WPT repo side or in WebKit side.
Bem Jones-Bey
Comment 13
2015-01-07 15:59:15 PST
(In reply to
comment #10
)
> (In reply to
comment #9
) > > (In reply to
comment #8
) > > > Right, manual inspection of the results and augmentation of the generated > > > test expectations is needed. > > > The tool to ease test results study may well be w3c_conformance_results.html > > > from
bug 134766
. > > > > From a quick look, that does seem like a useful tool. > > > > > Automatically generating WPT test expectations has some additional benefits, > > > like ordering of the tests, clustering of the test suites... > > > > I think those are outweighed by the detriment that if no one is triaging or > > filing bugs, issues just won't get fixed. I'm also worried that a test that > > didn't fail before would start failing on import and the person running the > > import wouldn't notice. > > Ah, maybe the term 'automated' in the title is a bit misleading... > > The purpose of this script is to help the authoring of test-import patches > by developers. It is the responsibility of the developer to do the triage, > augment test expectations accordingly and submit the patch. I guess the > patch would be accepted if sufficient triage is done. > > This patch handles the case of partially passing tests: test expectations > are added so that there is an incentive to fix them but they are marked as > PASS since they should comply with the provided -expected.txt file. > > It also allows incremental import: (sub)folder after (sub)folder. >
I like these features. I just don't think that it should edit the test expectations files, I think it should just give the user a report (which could be as simple as listing the lines that would need to be added to the TestExpectations) so that the user can then triage and add the appropriate lines to TestExpectations. It looks like there is a lot of code in this change just to manage intelligently editing the TestExpectations files, and I don't think that's worth the maintenance burden when the user is just going to have to edit them by hand anyways.
> When running the whole test suite, it takes a lot of time due to fully > failing (probably timeouting) tests. The current patch is importing them but > marking them as Skip. Another approach may be to not import them until they > get fixed, either in WPT repo side or in WebKit side.
I think we should do like Blink is doing and import one part at at time, so that those tests can be fixed and made to work before going on to the next part of the suite. I don't think we get much value from having a large number of failing or skipped tests with no obvious owner to make them work.
youenn fablet
Comment 14
2015-01-08 04:46:00 PST
> I like these features. I just don't think that it should edit the test > expectations files, I think it should just give the user a report (which > could be as simple as listing the lines that would need to be added to the > TestExpectations) so that the user can then triage and add the appropriate
This is the default behavior of this patch (generation of a TestExpectationsWPT file in WebKitBuild folder).
> lines to TestExpectations. It looks like there is a lot of code in this > change just to manage intelligently editing the TestExpectations files, and > I don't think that's worth the maintenance burden when the user is just > going to have to edit them by hand anyways.
I think that kind of functionality will be useful in the future. But we need hands-on experience before going further.
> > When running the whole test suite, it takes a lot of time due to fully > > failing (probably timeouting) tests. The current patch is importing them but > > marking them as Skip. Another approach may be to not import them until they > > get fixed, either in WPT repo side or in WebKit side. > > I think we should do like Blink is doing and import one part at at time, so > that those tests can be fixed and made to work before going on to the next > part of the suite. I don't think we get much value from having a large > number of failing or skipped tests with no obvious owner to make them work.
Agreed, a separate whitelist file should be a better way to manage failing tests.
youenn fablet
Comment 15
2015-01-13 07:55:01 PST
Created
attachment 244514
[details]
Adding CSS testsuite support
youenn fablet
Comment 16
2015-01-13 11:41:00 PST
(In reply to
comment #15
)
> Created
attachment 244514
[details]
> Adding CSS testsuite support
As part of
bug 134764
, a whitelist is added to identify which test suites to add, similarly to blink. Another list identifies the infrastructure files and folders that need to be copied and skipped by run-webkit-tests.
WebKit Commit Bot
Comment 17
2016-05-09 19:30:41 PDT
Attachment 244514
[details]
did not pass style-queue: ERROR: Tools/Scripts/webkitpy/w3c/test_updater.py:79: [TestUpdater.test_paths] Class 'TestDownloader' has no 'CSS_NAME' member [pylint/E1101] [5] ERROR: Tools/Scripts/webkitpy/w3c/test_updater.py:79: [TestUpdater.test_paths] Class 'TestDownloader' has no 'WPT_NAME' member [pylint/E1101] [5] Total errors found: 2 in 6 files If any of these errors are false positives, please file a bug against check-webkit-style.
Ryosuke Niwa
Comment 18
2016-05-09 19:37:08 PDT
Comment on
attachment 244514
[details]
Adding CSS testsuite support View in context:
https://bugs.webkit.org/attachment.cgi?id=244514&action=review
> Tools/ChangeLog:11 > + update-w3c-tests is the script to update W3C imported test suites and their baselines. > + TestUpdater is the main class that does the work. > + It is similar in spirit to
https://codereview.chromium.org/148173016
update-w3c-deps script in Blink. > + It uses TestImporter to import the tests and run-webkit-tests to generate the missing baselines.
Instead of adding a new script, we should just add an option to import-w3c-tests.
youenn fablet
Comment 19
2016-06-02 00:00:13 PDT
(In reply to
comment #18
)
> Comment on
attachment 244514
[details]
> Adding CSS testsuite support > > View in context: >
https://bugs.webkit.org/attachment.cgi?id=244514&action=review
> > > Tools/ChangeLog:11 > > + update-w3c-tests is the script to update W3C imported test suites and their baselines. > > + TestUpdater is the main class that does the work. > > + It is similar in spirit to
https://codereview.chromium.org/148173016
update-w3c-deps script in Blink. > > + It uses TestImporter to import the tests and run-webkit-tests to generate the missing baselines. > > Instead of adding a new script, we should just add an option to > import-w3c-tests.
I upgraded import-w3c-tests to support most of the functionality present in this patch. What is missing from this patch is running tests through rwt to rebase the expectations and rerunning the tests to detect flaky tests. I am not sure this would add much though.
Frédéric Wang (:fredw)
Comment 20
2018-06-25 08:48:14 PDT
(In reply to youenn fablet from
comment #12
)
> (In reply to
comment #11
) > > > From what I saw a while ago, Blink is not importing a lot of WPT tests right now. > > > > This is correct. > > > > > Also Blink is not importing any expected file, which simplifies things a bit. > > > > I'm not sure what is meant by "any expected file" in this context? > > I was meaning the -expected.txt results. >
I guess this is
bug 82245
Sam Sneddon [:gsnedders]
Comment 21
2023-12-18 06:35:59 PST
(In reply to youenn fablet from
comment #19
)
> I upgraded import-w3c-tests to support most of the functionality present in > this patch. > What is missing from this patch is running tests through rwt to rebase the > expectations and rerunning the tests to detect flaky tests. > I am not sure this would add much though.
IMO, we're better off moving in the direction of update-test-expectations (originally from
bug 169538
), given that has knowledge of more platforms than just locally. We should make that able to also take a local layout-test-results directory and update from that (
bug 266575
), but at that point all this does is provide a single entry-point, which is probably not too useful when we want to move all these steps into automation.
Note
You need to
log in
before you can comment on or make changes to this bug.
Top of Page
Format For Printing
XML
Clone This Bug