WebKit Bugzilla
New
Browse
Log In
×
Sign in with GitHub
or
Remember my login
Create Account
·
Forgot Password
Forgotten password account recovery
RESOLVED WONTFIX
96398
[Master][Chromium] Bring layout tests on Android on par with other Chromium platforms
https://bugs.webkit.org/show_bug.cgi?id=96398
Summary
[Master][Chromium] Bring layout tests on Android on par with other Chromium p...
Peter Beverloo
Reported
2012-09-11 09:18:40 PDT
There are a lot of failing tests right now, many of which are Android-specific. We should triage them and work towards getting the bot green.
Attachments
Add attachment
proposed patch, testcase, etc.
Xianzhu Wang
Comment 1
2012-09-11 10:07:54 PDT
Can we see the failures online?
Peter Beverloo
Comment 2
2012-09-11 10:10:06 PDT
I linked to them in the specific bugs. The results are available offline, the tests aren't.
Xianzhu Wang
Comment 3
2012-09-11 10:11:24 PDT
Never mind. I think I found the way :)
Peter Beverloo
Comment 4
2012-09-25 10:21:25 PDT
We're not in bad shape actually:
http://peter.sh/files/layout-test-results/results.html
In short: 1816 failing tests and 18 crashes. Most failures are due to: (1) Scrollbars (a) have a difference appearance and (b) overlay the content, and therefore don't take space. (2) Text selection color is a lighter shade of blue. (3) Android uses a lower quality image scaling algorithm, which in some cases is visually worse on desktop, but fine on mobile. As a plan of action for layout tests, I'd like to propose this: - Send an announcement to webkit-dev about 1816 new baselines being added, as it will cause quite some commit/update traffic. - Start uploading new baselines for all "failures" which actually are fine, i.e. fit in the three categories mentioned above. - Keep the bot red until we file bugs for the remaining failures and add them to TestExpectations. After TestWebKitAPI and webkit_unit_tests also work: - Integrate with garden-o-matic to keep us green. Does this seem reasonable? NB: The bot is crashy right now due to a JNI issue, which is fixed per
http://codereview.chromium.org/10980016/
. That just needs to roll into WebKit, which will certainly happen before the rebaselining.
Ojan Vafai
Comment 5
2012-09-25 10:32:02 PDT
This plan sounds good to me. You don't need to email webkit-dev though. Port-specific mass rebaselines happen all the time. Unless there were something unusual about this case, it's not email worthy.
Peter Beverloo
Comment 6
2012-09-25 12:13:33 PDT
Committed
r129538
: <
http://trac.webkit.org/changeset/129538
>
Peter Beverloo
Comment 7
2012-09-25 12:32:37 PDT
(In reply to
comment #5
)
> This plan sounds good to me. You don't need to email webkit-dev though. Port-specific mass rebaselines happen all the time. Unless there were something unusual about this case, it's not email worthy.
Cool, cheers :-). I just landed a first baseline, for the css1/ directory. There weren't any differences except for the different width (for tests with scrollbars) and the scrollbar rendering. Because of the larger rendering area, .txt baselines were necessary as well. Following this first commit, we now have 1722 failing layout tests (minus the crashing ones). I assume it's fine to summarize the description as I did in the commit? It saves a lot of spam on the commit overviews.
http://trac.webkit.org/changeset/129538
I'll work tomorrow to get us closer to 500, making sure that the tester doesn't abort early anymore :-). To confirm: yes, I'm checking every result manually.
Ojan Vafai
Comment 8
2012-09-25 13:30:59 PDT
Unfortunately, I have bad news for you. We bail early after 20 crashes, so we're only running a small percentage of the tests. Looking at
http://build.webkit.org/builders/Chromium%20Android%20Release%20%28Tests%29/builds/770/steps/layout-test/logs/stdio
=> Results: 1180/24298 tests passed (4.9%) You should setup a bot on the chromium webkit canary waterfall. We use much higher limits there before we bail early (e.g. 100 crashes). You'll need a bot there for garden-o-matic anyways. Also, then you could modify garden-o-matic locally so that you can use it for your android rebaselines. (In reply to
comment #7
)
> Cool, cheers :-). I just landed a first baseline, for the css1/ directory. There weren't any differences except for the different width (for tests with scrollbars) and the scrollbar rendering.
>
> I assume it's fine to summarize the description as I did in the commit? It saves a lot of spam on the commit overviews.
Would be nice to include your one sentence description above about what the differences are. Consider someone a year from now trying to understand the history of why a given test has a given baseline. All they have to go on is ChangeLog entries and bug numbers. You don't need per-test descriptions for large rebaselines like this though. Also, we have a loose policy of only doing one patch per bug. It's not super-strict, but it confuses code reviews and the tooling. For unreviewed patches like this, it's OK to just not file a bug. See
http://trac.webkit.org/changeset/128909
as an example. Finally, for any patches where you do hundreds of rebaselines, delete the list of changed files from the ChangeLog entry, as you did, is correct.
Peter Beverloo
Comment 9
2012-09-25 13:37:30 PDT
(In reply to
comment #8
)
> Unfortunately, I have bad news for you. We bail early after 20 crashes, so we're only running a small percentage of the tests. Looking at
http://build.webkit.org/builders/Chromium%20Android%20Release%20%28Tests%29/builds/770/steps/layout-test/logs/stdio
> => Results: 1180/24298 tests passed (4.9%)
As said, that's because of a Chromium issue already fixed (
http://codereview.chromium.org/10980016/
). I'll roll DEPS tomorrow.
> You should setup a bot on the chromium webkit canary waterfall. We use much higher limits there before we bail early (e.g. 100 crashes). You'll need a bot there for garden-o-matic anyways. Also, then you could modify garden-o-matic locally so that you can use it for your android rebaselines.
Hardware has been requested a while ago but is not available yet. I'm aware of this limitation, and will address it as soon as the hardware is there.
> (In reply to
comment #7
) > > Cool, cheers :-). I just landed a first baseline, for the css1/ directory. There weren't any differences except for the different width (for tests with scrollbars) and the scrollbar rendering. > > > > I assume it's fine to summarize the description as I did in the commit? It saves a lot of spam on the commit overviews. > > Would be nice to include your one sentence description above about what the differences are. Consider someone a year from now trying to understand the history of why a given test has a given baseline. All they have to go on is ChangeLog entries and bug numbers. You don't need per-test descriptions for large rebaselines like this though. > > Also, we have a loose policy of only doing one patch per bug. It's not super-strict, but it confuses code reviews and the tooling. > > For unreviewed patches like this, it's OK to just not file a bug. See
http://trac.webkit.org/changeset/128909
as an example.
I chose to divert from the policy as this bug contains that information, and there will be several commits with new baselines. Repeating the same sentence(s) with every batch seems redundant if we can group it here. Given that the patches aren't being uploaded to this bug either, would not mentioning it be preferred? The net difference is (a) a mention of the bug in the description, and (b) a comment such as
comment #6
. One thing I do plan on doing is separately mention baselines which look suspicious at first in the ChangeLog.
Ojan Vafai
Comment 10
2012-09-25 13:50:27 PDT
(In reply to
comment #9
)
> (In reply to
comment #8
) > > Unfortunately, I have bad news for you. We bail early after 20 crashes, so we're only running a small percentage of the tests. Looking at
http://build.webkit.org/builders/Chromium%20Android%20Release%20%28Tests%29/builds/770/steps/layout-test/logs/stdio
> > => Results: 1180/24298 tests passed (4.9%) > > As said, that's because of a Chromium issue already fixed (
http://codereview.chromium.org/10980016/
). I'll roll DEPS tomorrow.
I see. Missed that.
> I chose to divert from the policy as this bug contains that information, and there will be several commits with new baselines. Repeating the same sentence(s) with every batch seems redundant if we can group it here. Given that the patches aren't being uploaded to this bug either, would not mentioning it be preferred? The net difference is (a) a mention of the bug in the description, and (b) a comment such as
comment #6
.
Either one is fine as this is an unusual case.
Peter Beverloo
Comment 11
2012-09-26 11:26:51 PDT
Four commits with rebaselines landed today, bringing the amount of failing tests down to roughly 825. I've uploaded the latest complete results here:
http://peter.sh/files/layout-test-results2/results.html
http://trac.webkit.org/changeset/129618
http://trac.webkit.org/changeset/129619
http://trac.webkit.org/changeset/129642
http://trac.webkit.org/changeset/129659
The tester is now able to run 17,000 tests before aborting, which brings us significantly closer to doing a full run. Unfortunately, cycle time has gotten significantly worse as well: a run now takes 40 minutes, and will probably be an hour when we run everything. I'll be filing bugs for the issues I ran into today. Most notably, there are a lot of compositing failures, some color issues and we have very subtle differences in how we render buttons and select boxes.
Xianzhu Wang
Comment 12
2012-09-26 11:57:29 PDT
(In reply to
comment #11
)
> The tester is now able to run 17,000 tests before aborting, which brings us significantly closer to doing a full run. Unfortunately, cycle time has gotten significantly worse as well: a run now takes 40 minutes, and will probably be an hour when we run everything. >
How many devices are you using to run the layout tests? In downstream, multiple devices get good scalability for shortening the run time.
Peter Beverloo
Comment 13
2012-09-26 12:00:57 PDT
(In reply to
comment #12
)
> How many devices are you using to run the layout tests? In downstream, multiple devices get good scalability for shortening the run time.
Four Galaxy Nexus phones are attached. I imagine a large part of the slower runs is caused by us actually running pixel tests. I'll re-assess the bots' performances after they're able to run all of WebKit's tests. If this proves to be too slow then we'll just buy new devices, that's not a problem.
Xianzhu Wang
Comment 14
2012-09-26 12:05:38 PDT
(In reply to
comment #13
)
> Four Galaxy Nexus phones are attached. I imagine a large part of the slower runs is caused by us actually running pixel tests.
> I think so. We don't run pixel tests in downstream.
Peter Beverloo
Comment 15
2013-04-08 11:06:03 PDT
Resolving as WontFix given that Chromium moved to Blink.
Note
You need to
log in
before you can comment on or make changes to this bug.
Top of Page
Format For Printing
XML
Clone This Bug