134767 – Add support for automated W3C test baseline computation

NEW134767

Add support for automated W3C test baseline computation

https://bugs.webkit.org/show_bug.cgi?id=134767

Summary Add support for automated W3C test baseline computation

youenn fablet

Reported 2014-07-09 07:08:59 PDT

To ease the process of importing W3C tests as WebKit layout tests, it may be convenient to provide a tool that imports W3C tests, generates expected.txt files and related TestExpectation.

Attachments
Patch (13.19 KB, patch) 2014-07-09 07:13 PDT, youenn fablet	no flags	Details Formatted Diff Diff
skipping all tests by default (15.12 KB, patch) 2014-07-16 11:43 PDT, youenn fablet	no flags	Details Formatted Diff Diff
add support for merge of TestExpectations (19.20 KB, patch) 2014-09-16 03:16 PDT, youenn fablet	no flags	Details Formatted Diff Diff
Improving test expectation merge (22.65 KB, patch) 2014-12-23 07:27 PST, youenn fablet	no flags	Details Formatted Diff Diff
Patch (22.60 KB, patch) 2014-12-23 15:39 PST, youenn fablet	no flags	Details Formatted Diff Diff
Adding CSS testsuite support (12.67 KB, patch) 2015-01-13 07:55 PST, youenn fablet	no flags	Details Formatted Diff Diff
Show Obsolete (5) View All Add attachment proposed patch, testcase, etc.

youenn fablet

Comment 1 2014-07-09 07:13:27 PDT

Created attachment 234636 [details] Patch

youenn fablet

Comment 2 2014-07-16 11:43:52 PDT

Created attachment 235013 [details] skipping all tests by default

youenn fablet

Comment 3 2014-09-16 03:16:00 PDT

Created attachment 238170 [details] add support for merge of TestExpectations

youenn fablet

Comment 4 2014-12-23 07:27:56 PST

Created attachment 243676 [details] Improving test expectation merge

youenn fablet

Comment 5 2014-12-23 15:39:13 PST

Created attachment 243704 [details] Patch

Bem Jones-Bey

Comment 6 2015-01-05 14:08:56 PST

Comment on attachment 243704 [details] Patch I'm not convinced that we want to be automatically adding entries to TestExpectations for imported tests, especially without bugs. Also, often multiple imported tests will fail for the same reason, so it can be good to have a human look at them and intelligently add a single bug for these TestExpectations lines. That being said, I'm all for a tool that makes it easier to quickly see which imported tests have failed an in what way to make it easier to create the test expectations entries and relevant bugs.

Ryosuke Niwa

Comment 7 2015-01-05 14:55:11 PST

(In reply to comment #6) > Comment on attachment 243704 [details] > Patch > > I'm not convinced that we want to be automatically adding entries to > TestExpectations for imported tests, especially without bugs. Also, often > multiple imported tests will fail for the same reason, so it can be good to > have a human look at them and intelligently add a single bug for these > TestExpectations lines. That being said, I'm all for a tool that makes it > easier to quickly see which imported tests have failed an in what way to > make it easier to create the test expectations entries and relevant bugs. I agree. I think having a tool to help humans triage failures is good. Mindlessly adding test expectations for all failing tests is not.

youenn fablet

Comment 8 2015-01-06 01:40:14 PST

(In reply to comment #7) > (In reply to comment #6) > > Comment on attachment 243704 [details] > > Patch > > > > I'm not convinced that we want to be automatically adding entries to > > TestExpectations for imported tests, especially without bugs. Also, often > > multiple imported tests will fail for the same reason, so it can be good to > > have a human look at them and intelligently add a single bug for these > > TestExpectations lines. That being said, I'm all for a tool that makes it > > easier to quickly see which imported tests have failed an in what way to > > make it easier to create the test expectations entries and relevant bugs. > > I agree. I think having a tool to help humans triage failures is good. > Mindlessly adding test expectations for all failing tests is not. Right, manual inspection of the results and augmentation of the generated test expectations is needed. The tool to ease test results study may well be w3c_conformance_results.html from bug 134766. Automatically generating WPT test expectations has some additional benefits, like ordering of the tests, clustering of the test suites... One important use case is resyncing with the latest version of the WPT repo. We should strive to make that effort minimal. This patch is an initial step towards that goal.

Bem Jones-Bey

Comment 9 2015-01-06 08:26:21 PST

(In reply to comment #8) > Right, manual inspection of the results and augmentation of the generated > test expectations is needed. > The tool to ease test results study may well be w3c_conformance_results.html > from bug 134766. From a quick look, that does seem like a useful tool. > Automatically generating WPT test expectations has some additional benefits, > like ordering of the tests, clustering of the test suites... I think those are outweighed by the detriment that if no one is triaging or filing bugs, issues just won't get fixed. I'm also worried that a test that didn't fail before would start failing on import and the person running the import wouldn't notice. > One important use case is resyncing with the latest version of the WPT repo. > We should strive to make that effort minimal. This patch is an initial step > towards that goal. I don't think that updating the test expectations file is the hardest part of the import. Blink has a script that imports both the WPT tests and the CSSWG tests (based on the same import-w3c-tests infrastructure). I recently used it to import the CSS Shapes tests, and since it's all or nothing, it imported some new WPT tests as well. There were some new failures, and I don't think automatically adding expectations would have made the task any easier, since I would have had to still triage them and file bugs. One thing to note: on the Blink side, they use a whitelist to determine what parts of the CSSWG and W3C test suites to import. Perhaps if we went that route, it wouldn't seem like we need so many expectations? We could simply not import parts of the suite for things that we don't implement yet, and also could import in parts so that each piece could be triaged separately. Dirk Pranke built a lot of that stuff over there, and may have other suggestions.

youenn fablet

Comment 10 2015-01-06 13:12:43 PST

(In reply to comment #9) > (In reply to comment #8) > > Right, manual inspection of the results and augmentation of the generated > > test expectations is needed. > > The tool to ease test results study may well be w3c_conformance_results.html > > from bug 134766. > > From a quick look, that does seem like a useful tool. > > > Automatically generating WPT test expectations has some additional benefits, > > like ordering of the tests, clustering of the test suites... > > I think those are outweighed by the detriment that if no one is triaging or > filing bugs, issues just won't get fixed. I'm also worried that a test that > didn't fail before would start failing on import and the person running the > import wouldn't notice. Ah, maybe the term 'automated' in the title is a bit misleading... The purpose of this script is to help the authoring of test-import patches by developers. It is the responsibility of the developer to do the triage, augment test expectations accordingly and submit the patch. I guess the patch would be accepted if sufficient triage is done. This patch handles the case of partially passing tests: test expectations are added so that there is an incentive to fix them but they are marked as PASS since they should comply with the provided -expected.txt file. It also allows incremental import: (sub)folder after (sub)folder. > > One important use case is resyncing with the latest version of the WPT repo. > > We should strive to make that effort minimal. This patch is an initial step > > towards that goal. > > I don't think that updating the test expectations file is the hardest part > of the import. Blink has a script that imports both the WPT tests and the > CSSWG tests (based on the same import-w3c-tests infrastructure). I recently > used it to import the CSS Shapes tests, and since it's all or nothing, it > imported some new WPT tests as well. There were some new failures, and I > don't think automatically adding expectations would have made the task any > easier, since I would have had to still triage them and file bugs. From what I saw a while ago, Blink is not importing a lot of WPT tests right now. Also Blink is not importing any expected file, which simplifies things a bit. > > One thing to note: on the Blink side, they use a whitelist to determine what > parts of the CSSWG and W3C test suites to import. Perhaps if we went that > route, it wouldn't seem like we need so many expectations? We could simply > not import parts of the suite for things that we don't implement yet, and > also could import in parts so that each piece could be triaged separately. > Dirk Pranke built a lot of that stuff over there, and may have other > suggestions. To limit adding a lot of test expectation lines, we may only import tests that have/had at least one passing assertion. Bug 134766 tool can be used to manage the other tests.

Dirk Pranke

Comment 11 2015-01-06 18:37:34 PST

> From what I saw a while ago, Blink is not importing a lot of WPT tests right now. This is correct. > Also Blink is not importing any expected file, which simplifies things a bit. I'm not sure what is meant by "any expected file" in this context? Blink currently imports only reftests and script-only (js) tests, not tests that require pixel tests (manual review). script-only test may still produce partial failures, which can result in -expected.txt results. Are you referring to pixel tests not being imported?

youenn fablet

Comment 12 2015-01-07 04:35:54 PST

(In reply to comment #11) > > From what I saw a while ago, Blink is not importing a lot of WPT tests right now. > > This is correct. > > > Also Blink is not importing any expected file, which simplifies things a bit. > > I'm not sure what is meant by "any expected file" in this context? I was meaning the -expected.txt results. > > Blink currently imports only reftests and script-only (js) tests, not tests > that require pixel tests (manual review). script-only test may still produce > partial failures, which can result in -expected.txt results. Oh, I was not aware of that. That looks better this way. When running the whole test suite, it takes a lot of time due to fully failing (probably timeouting) tests. The current patch is importing them but marking them as Skip. Another approach may be to not import them until they get fixed, either in WPT repo side or in WebKit side.

Bem Jones-Bey

Comment 13 2015-01-07 15:59:15 PST

(In reply to comment #10) > (In reply to comment #9) > > (In reply to comment #8) > > > Right, manual inspection of the results and augmentation of the generated > > > test expectations is needed. > > > The tool to ease test results study may well be w3c_conformance_results.html > > > from bug 134766. > > > > From a quick look, that does seem like a useful tool. > > > > > Automatically generating WPT test expectations has some additional benefits, > > > like ordering of the tests, clustering of the test suites... > > > > I think those are outweighed by the detriment that if no one is triaging or > > filing bugs, issues just won't get fixed. I'm also worried that a test that > > didn't fail before would start failing on import and the person running the > > import wouldn't notice. > > Ah, maybe the term 'automated' in the title is a bit misleading... > > The purpose of this script is to help the authoring of test-import patches > by developers. It is the responsibility of the developer to do the triage, > augment test expectations accordingly and submit the patch. I guess the > patch would be accepted if sufficient triage is done. > > This patch handles the case of partially passing tests: test expectations > are added so that there is an incentive to fix them but they are marked as > PASS since they should comply with the provided -expected.txt file. > > It also allows incremental import: (sub)folder after (sub)folder. > I like these features. I just don't think that it should edit the test expectations files, I think it should just give the user a report (which could be as simple as listing the lines that would need to be added to the TestExpectations) so that the user can then triage and add the appropriate lines to TestExpectations. It looks like there is a lot of code in this change just to manage intelligently editing the TestExpectations files, and I don't think that's worth the maintenance burden when the user is just going to have to edit them by hand anyways. > When running the whole test suite, it takes a lot of time due to fully > failing (probably timeouting) tests. The current patch is importing them but > marking them as Skip. Another approach may be to not import them until they > get fixed, either in WPT repo side or in WebKit side. I think we should do like Blink is doing and import one part at at time, so that those tests can be fixed and made to work before going on to the next part of the suite. I don't think we get much value from having a large number of failing or skipped tests with no obvious owner to make them work.

youenn fablet

Comment 14 2015-01-08 04:46:00 PST

> I like these features. I just don't think that it should edit the test > expectations files, I think it should just give the user a report (which > could be as simple as listing the lines that would need to be added to the > TestExpectations) so that the user can then triage and add the appropriate This is the default behavior of this patch (generation of a TestExpectationsWPT file in WebKitBuild folder). > lines to TestExpectations. It looks like there is a lot of code in this > change just to manage intelligently editing the TestExpectations files, and > I don't think that's worth the maintenance burden when the user is just > going to have to edit them by hand anyways. I think that kind of functionality will be useful in the future. But we need hands-on experience before going further. > > When running the whole test suite, it takes a lot of time due to fully > > failing (probably timeouting) tests. The current patch is importing them but > > marking them as Skip. Another approach may be to not import them until they > > get fixed, either in WPT repo side or in WebKit side. > > I think we should do like Blink is doing and import one part at at time, so > that those tests can be fixed and made to work before going on to the next > part of the suite. I don't think we get much value from having a large > number of failing or skipped tests with no obvious owner to make them work. Agreed, a separate whitelist file should be a better way to manage failing tests.

youenn fablet

Comment 15 2015-01-13 07:55:01 PST

Created attachment 244514 [details] Adding CSS testsuite support

youenn fablet

Comment 16 2015-01-13 11:41:00 PST

(In reply to comment #15) > Created attachment 244514 [details] > Adding CSS testsuite support As part of bug 134764, a whitelist is added to identify which test suites to add, similarly to blink. Another list identifies the infrastructure files and folders that need to be copied and skipped by run-webkit-tests.

WebKit Commit Bot

Comment 17 2016-05-09 19:30:41 PDT

Attachment 244514 [details] did not pass style-queue: ERROR: Tools/Scripts/webkitpy/w3c/test_updater.py:79: [TestUpdater.test_paths] Class 'TestDownloader' has no 'CSS_NAME' member [pylint/E1101] [5] ERROR: Tools/Scripts/webkitpy/w3c/test_updater.py:79: [TestUpdater.test_paths] Class 'TestDownloader' has no 'WPT_NAME' member [pylint/E1101] [5] Total errors found: 2 in 6 files If any of these errors are false positives, please file a bug against check-webkit-style.

Ryosuke Niwa

Comment 18 2016-05-09 19:37:08 PDT

Comment on attachment 244514 [details] Adding CSS testsuite support View in context: https://bugs.webkit.org/attachment.cgi?id=244514&action=review > Tools/ChangeLog:11 > + update-w3c-tests is the script to update W3C imported test suites and their baselines. > + TestUpdater is the main class that does the work. > + It is similar in spirit to https://codereview.chromium.org/148173016 update-w3c-deps script in Blink. > + It uses TestImporter to import the tests and run-webkit-tests to generate the missing baselines. Instead of adding a new script, we should just add an option to import-w3c-tests.

youenn fablet

Comment 19 2016-06-02 00:00:13 PDT

(In reply to comment #18) > Comment on attachment 244514 [details] > Adding CSS testsuite support > > View in context: > https://bugs.webkit.org/attachment.cgi?id=244514&action=review > > > Tools/ChangeLog:11 > > + update-w3c-tests is the script to update W3C imported test suites and their baselines. > > + TestUpdater is the main class that does the work. > > + It is similar in spirit to https://codereview.chromium.org/148173016 update-w3c-deps script in Blink. > > + It uses TestImporter to import the tests and run-webkit-tests to generate the missing baselines. > > Instead of adding a new script, we should just add an option to > import-w3c-tests. I upgraded import-w3c-tests to support most of the functionality present in this patch. What is missing from this patch is running tests through rwt to rebase the expectations and rerunning the tests to detect flaky tests. I am not sure this would add much though.

Frédéric Wang (:fredw)

Comment 20 2018-06-25 08:48:14 PDT

(In reply to youenn fablet from comment #12) > (In reply to comment #11) > > > From what I saw a while ago, Blink is not importing a lot of WPT tests right now. > > > > This is correct. > > > > > Also Blink is not importing any expected file, which simplifies things a bit. > > > > I'm not sure what is meant by "any expected file" in this context? > > I was meaning the -expected.txt results. > I guess this is bug 82245

Sam Sneddon [:gsnedders]

Comment 21 2023-12-18 06:35:59 PST

(In reply to youenn fablet from comment #19) > I upgraded import-w3c-tests to support most of the functionality present in > this patch. > What is missing from this patch is running tests through rwt to rebase the > expectations and rerunning the tests to detect flaky tests. > I am not sure this would add much though. IMO, we're better off moving in the direction of update-test-expectations (originally from bug 169538), given that has knowledge of more platforms than just locally. We should make that able to also take a local layout-test-results directory and update from that (bug 266575), but at that point all this does is provide a single entry-point, which is probably not too useful when we want to move all these steps into automation.

Note You need to log in before you can comment on or make changes to this bug.

Status NEW

Resolution

Priority P2

Severity Normal

Classification Unclassified

Version 528+ (Nightly build)

Hardware Unspecified

OS Unspecified

Product WebKit

Component Tools / Tests

Assignee

youenn fablet

Reported

2014-07-09 07:08 PDT

Modified

2023-12-18 06:35 PST History

CC List

11 users Show

URL

Keywords

Depends on

134766

Blocks

Dependencies

tree graph