RESOLVED FIXED 88134
new-run-webkit-tests should spin-up enough httpd processes to avoid timeouts
https://bugs.webkit.org/show_bug.cgi?id=88134
Ami Fischman
Reported 2012-06-01 14:12:17 PDT
I noticed recently that on my z600/chromium/linux/16-core setup, if I run a test (e.g. http/tests/media/video-buffered.html) with --iterations=100 (or 1000), the first 1-3 runs pass, then the next 13-15 runs time out, and then the rest pass. If I bump the {{Min,Max}Spare,Start}Servers values to 16 in http://trac.webkit.org/browser/trunk/LayoutTests/http/conf/apache2-debian-httpd.conf then I get a 100% pass rate. My guess is that spinning up the new apache processes is eating too much of the test's timeout budget, but once they're spun up the test does fine. Is there a reason not to always run with extra processes? (what's the minimum machine footprint n-r-w-t is expected to work on?) IWBN if n-r-w-t could configure its httpd to run as many apache processes as DRT processes, but absent that kind of setup, would bumping the counts to 32 (or something else likely to be >= #cores on dev machines) be acceptable?
Attachments
Patch (11.44 KB, patch)
2012-06-18 12:45 PDT, Dirk Pranke
no flags
add missing unittest file (14.99 KB, patch)
2012-06-18 13:04 PDT, Dirk Pranke
no flags
patch for landing (16.53 KB, patch)
2012-06-20 11:59 PDT, Dirk Pranke
no flags
Ojan Vafai
Comment 1 2012-06-01 14:17:24 PDT
Apache is started from Tools/Scripts/webkitpy/layout_tests/servers/apache_http_server.py. If you can set {{Min,Max}Spare,Start}Servers from the apache command line, then we could make it use the same number of processes as DRT processes. Seems worth a try.

(In reply to comment #0)
> My guess is that spinning up the new apache processes is eating too much of the test's timeout budget, but once they're spun up the test does fine.

If we're including the time to start apache in the test's timeout budget, that's also just a bug we should fix. Dirk, WDYT?
Ami Fischman
Comment 2 2012-06-01 14:22:03 PDT
(In reply to comment #1)
> Apache is started from Tools/Scripts/webkitpy/layout_tests/servers/apache_http_server.py. If you can set {{Min,Max}Spare,Start}Servers from the apache command line, then we could make it use the same number of processes as DRT processes.
> Seems worth a try.

Configuration directives can be passed on the apache2 command line using -c (to override httpd.conf) or -C (to allow httpd.conf to override the command line).
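For illustration, here is a minimal Python sketch of the -c approach described above: appending per-directive overrides to the httpd command line so they take precedence over the values baked into apache2-debian-httpd.conf. StartServers/MinSpareServers/MaxSpareServers are real Apache prefork directives, but the helper function, its arguments, and the example invocation are hypothetical and not taken from the actual patch on this bug.

# Hypothetical helper: build an httpd command line with -c process-count
# overrides so the shipped conf file does not need to be edited.
def add_process_count_overrides(httpd_cmd, number_of_servers):
    """Return a copy of httpd_cmd with -c overrides for the process counts."""
    cmd = list(httpd_cmd)
    for directive in ('StartServers %d' % number_of_servers,
                      'MinSpareServers %d' % number_of_servers,
                      'MaxSpareServers %d' % (number_of_servers * 2)):
        cmd += ['-c', directive]
    return cmd

# Example (hypothetical path):
# add_process_count_overrides(['httpd', '-f', '/path/to/apache2-debian-httpd.conf'], 16)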
Ojan Vafai
Comment 3 2012-06-01 14:22:42 PDT
> (In reply to comment #0)
> > My guess is that spinning up the new apache processes is eating too much of the test's timeout budget, but once they're spun up the test does fine.
>
> If we're including the time to start apache in the test's timeout budget that's also just a bug we should fix.

At a quick glance at the code, we wait and check that the http server is correctly started up and responding to requests before we try to run any http tests, so I doubt this is what's going on.

Unless our logic for testing whether the server is up and running is wrong. See _is_server_running_on_all_ports in Tools/Scripts/webkitpy/layout_tests/servers/http_server_base.py to see that code. It seems to me that connecting to the server via a socket should be a sufficient test, but maybe it's not.
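For reference, the readiness check being described is essentially a socket-connect probe. The following is only a sketch of that idea, not the actual _is_server_running_on_all_ports implementation in http_server_base.py, whose details may differ.

import socket

# Illustrative probe: the server is considered "up" once a TCP connection
# can be opened to every port it is expected to listen on.
def is_server_running_on_all_ports(host, ports):
    """Return True if a TCP connection can be opened to every port."""
    for port in ports:
        try:
            connection = socket.create_connection((host, port), timeout=1)
            connection.close()
        except (socket.timeout, socket.error):
            return False
    return True

# Example: is_server_running_on_all_ports('127.0.0.1', [8000, 8080, 8443])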
Ami Fischman
Comment 4 2012-06-01 14:26:21 PDT
(In reply to comment #3)
> > (In reply to comment #0)
> > > My guess is that spinning up the new apache processes is eating too much of the test's timeout budget, but once they're spun up the test does fine.
> >
> > If we're including the time to start apache in the test's timeout budget that's also just a bug we should fix.
>
> At a quick glance at the code, we wait and check that the http server is correctly started up and responding to requests before we try to run any http tests, so I doubt this is what's going on.
>
> Unless our logic for testing whether the server is up and running is wrong. See _is_server_running_on_all_ports in Tools/Scripts/webkitpy/layout_tests/servers/http_server_base.py to see that code. It seems to me that connecting to the server via a socket should be a sufficient test, but maybe it's not.

The httpd.conf only requests a single httpd process to be started initially. Apache uses a whole process ("Server" in httpd.conf-speak) to serve a request, and the only parallelism it supports is multi-process (one process per request). So the existing conf requests a single process to start up, and the controller declares it started up when it can connect to the socket, which is all well and good, but as soon as the other 15 DRTs pile on their own requests, apache has to spin up 15 more httpd processes. It's this spin-up that is currently (incorrectly) counted against the test's timeout budget.
Ojan Vafai
Comment 5 2012-06-01 14:36:43 PDT
I didn't know that! That would clearly lead to flaky http tests! Please put together a patch (plus unittests) if you're willing.
Ojan Vafai
Comment 6 2012-06-01 14:37:44 PDT
This might even mean we could switch windows back to using Apache instead of lighttpd. Last time we tried to make that switch, it made the http tests more flaky. Would be great to get rid of the lighttpd configuration entirely (only chromium-win uses it now, we moved chromium-mac/linux back to Apache ages ago).
Tony Chang
Comment 7 2012-06-01 14:38:56 PDT
I thought we ran http tests in serial? Anyway, I think it would be good to bump the StartServers value to somewhere around 4 and bump MinSpareServers to a similar value. It's probably common for a test to load 1 or 2 extra resources or make a few concurrent requests. Do we actually get as high as 32 requests at once on a 32-core machine? It would be nice if we could get Apache to log information about how many requests it's handling and pick a reasonable number based on that.
Dirk Pranke
Comment 8 2012-06-01 15:05:26 PDT
I think it makes sense to bump up StartServers to some smallish N * min(--max-locked-shards, --child-processes) (since that controls the number of concurrent HTTP tests), since I agree it's likely there is more than one http request in flight per test. I have definitely seen the initial timeout issues on Linux (although, oddly, only on Linux). If that would solve the timeout issues then perhaps we could also increase the default --max-locked-shards beyond one, since that would be a big win on the bigger multicore machines.

I will play around with some approaches and post something.
Ojan Vafai
Comment 9 2012-06-01 15:12:38 PDT
(In reply to comment #8)
> I think it makes sense to bump up StartServers to some smallish N * min(--max-locked-shards, --child-processes) (since that controls the number of concurrent HTTP tests), since I agree it's likely there is more than one http request in flight per test. I have definitely seen the initial timeout issues on Linux (although, oddly, only on Linux). If that would solve the timeout issues then perhaps we could also increase the default --max-locked-shards beyond one, since that would be a big win on the bigger multicore machines.
>
> I will play around with some approaches and post something.

SGTM. Tony also mentioned that we could try running all the http tests with max-locked-shards=1, see how many processes Apache starts up, and set N to that (or maybe to that + 1 or 2).
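As a sketch of the sizing rule being discussed here: the patch reviewed later in this bug passes 2 * min(num_workers, len(locked_shards)) to start_servers_with_lock (see the review in comment 31). The function and variable names below are illustrative, not the actual webkitpy code.

# Illustrative only: compute how many httpd processes to request, mirroring
# the 2 * min(num_workers, len(locked_shards)) expression quoted in the
# review below.
def number_of_httpd_servers(num_workers, locked_shards):
    return 2 * min(num_workers, len(locked_shards))

# Example: 16 workers but only 3 locked shards -> 6 servers.
assert number_of_httpd_servers(16, ['shard_a', 'shard_b', 'shard_c']) == 6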
Ami Fischman
Comment 10 2012-06-01 15:15:23 PDT
(In reply to comment #7)
> I thought we ran http tests in serial?

Nope; access_log confirms lots of requests coming in in the same second (for the same file).

> Anyway, I think it would be good to bump the StartServers value to somewhere around 4 and bump MinSpareServers to a similar value. It's probably common for a test to load 1 or 2 extra resources or make a few concurrent requests.

That's a good point - 2x#cores SGTM.

> Do we actually get as high as 32 requests at once on a 32-core machine? It would be nice if we could get Apache to log information about how many requests it's handling and pick a reasonable number based on that.

It's not easy to get that information from a conf change, but looking at the timestamps on the access log definitely confirms that it's serving a lot of concurrent DRTs (the test I pointed to above, video-buffered.html, uses the throttling CGI so each request ends up taking more than a second to serve, so I know the second-resolution timestamp is good enough to make this inference).

(In reply to comment #8)
> I think it makes sense to bump up StartServers

Note that the other directives should also be bumped, or else you'll get reaping.

> to some smallish N * min(--max-locked-shards, --child-processes)

OOC, what's a "locked shard"? I'm esp. curious because here:

> If that would solve the timeout issues then perhaps we could also increase the default --max-locked-shards beyond one, since that would be a big win on the bigger multicore machines.

I read you to say that max-locked-shards defaults to 1 on my machine, and that that means HTTP tests should run serially, but they clearly don't.

> I will play around with some approaches and post something.

Thanks!
Ojan Vafai
Comment 11 2012-06-01 15:19:16 PDT
(In reply to comment #10)
> (In reply to comment #7)
> > I thought we ran http tests in serial?
>
> Nope; access_log confirms lots of requests coming in in the same second (for the same file).

Are these all sub-resources? In theory, we run the http tests serially, but each test can load sub-resources in parallel.
Ojan Vafai
Comment 12 2012-06-01 15:20:02 PDT
> In theory, we run the http tests serially, but each test can load sub-resources in parallel.

We'd love to run the http tests in parallel as well, but historically every time we've tried, it's increased flakiness. Upping the number of processes might just be what we need to make it work.
Ami Fischman
Comment 13 2012-06-01 15:23:28 PDT
(In reply to comment #11)
> (In reply to comment #10)
> > (In reply to comment #7)
> > > I thought we ran http tests in serial?
> >
> > Nope; access_log confirms lots of requests coming in in the same second (for the same file).
>
> Are these all sub-resources?
>
> In theory, we run the http tests serially, but each test can load sub-resources in parallel.

Each test loads a single .wav file through the throttle CGI. I think this proves that multiple instances of the test are running in parallel:

127.0.0.1 - - [01/Jun/2012:14:59:46 -0700] "GET /media/video-throttled-load.cgi?nph=1&name=../../../media/content/test.wav&throttle=100&type=audio/wav HTTP/1.1" 206 121138
127.0.0.1 - - [01/Jun/2012:14:59:46 -0700] "GET /media/video-throttled-load.cgi?nph=1&name=../../../media/content/test.wav&throttle=100&type=audio/wav HTTP/1.1" 206 121138
127.0.0.1 - - [01/Jun/2012:14:59:46 -0700] "GET /media/video-throttled-load.cgi?nph=1&name=../../../media/content/test.wav&throttle=100&type=audio/wav HTTP/1.1" 206 121138
127.0.0.1 - - [01/Jun/2012:14:59:46 -0700] "GET /media/video-throttled-load.cgi?nph=1&name=../../../media/content/test.wav&throttle=100&type=audio/wav HTTP/1.1" 206 121138
127.0.0.1 - - [01/Jun/2012:14:59:46 -0700] "GET /media/video-throttled-load.cgi?nph=1&name=../../../media/content/test.wav&throttle=100&type=audio/wav HTTP/1.1" 206 121138
127.0.0.1 - - [01/Jun/2012:14:59:46 -0700] "GET /media/video-throttled-load.cgi?nph=1&name=../../../media/content/test.wav&throttle=100&type=audio/wav HTTP/1.1" 206 121138
127.0.0.1 - - [01/Jun/2012:14:59:46 -0700] "GET /media/video-throttled-load.cgi?nph=1&name=../../../media/content/test.wav&throttle=100&type=audio/wav HTTP/1.1" 206 121138
127.0.0.1 - - [01/Jun/2012:14:59:46 -0700] "GET /media/video-throttled-load.cgi?nph=1&name=../../../media/content/test.wav&throttle=100&type=audio/wav HTTP/1.1" 206 121138
127.0.0.1 - - [01/Jun/2012:14:59:46 -0700] "GET /media/video-throttled-load.cgi?nph=1&name=../../../media/content/test.wav&throttle=100&type=audio/wav HTTP/1.1" 206 121138
127.0.0.1 - - [01/Jun/2012:14:59:46 -0700] "GET /media/video-throttled-load.cgi?nph=1&name=../../../media/content/test.wav&throttle=100&type=audio/wav HTTP/1.1" 206 121138
127.0.0.1 - - [01/Jun/2012:14:59:46 -0700] "GET /media/video-throttled-load.cgi?nph=1&name=../../../media/content/test.wav&throttle=100&type=audio/wav HTTP/1.1" 206 121138
127.0.0.1 - - [01/Jun/2012:14:59:46 -0700] "GET /media/video-throttled-load.cgi?nph=1&name=../../../media/content/test.wav&throttle=100&type=audio/wav HTTP/1.1" 206 121138
127.0.0.1 - - [01/Jun/2012:14:59:46 -0700] "GET /media/video-throttled-load.cgi?nph=1&name=../../../media/content/test.wav&throttle=100&type=audio/wav HTTP/1.1" 206 121138
127.0.0.1 - - [01/Jun/2012:14:59:46 -0700] "GET /media/video-throttled-load.cgi?nph=1&name=../../../media/content/test.wav&throttle=100&type=audio/wav HTTP/1.1" 206 121138
127.0.0.1 - - [01/Jun/2012:14:59:46 -0700] "GET /media/video-throttled-load.cgi?nph=1&name=../../../media/content/test.wav&throttle=100&type=audio/wav HTTP/1.1" 206 121138

Perhaps the distinction is that I'm running a single test with --iterations=100, and that if I was running 100 distinct tests they'd be serialized? (and that the lack of serialization is considered a bug?)
Dirk Pranke
Comment 14 2012-06-01 15:26:04 PDT
(In reply to comment #10)
> > to some smallish N * min(--max-locked-shards, --child-processes)
>
> OOC, what's a "locked shard"? I'm esp. curious because here:
>

A "locked shard" is a shard that contains tests that require us to have the "global lock", meaning that only one NRWT can be running those tests at once (that's one manager, not one child process / DRT). This is designed to accommodate people like the Qt port that actually run multiple bots at a time on the same hardware.

That said, we also default to only one locked shard running at a time, and only run one test at a time per shard. So, there should only be one HTTP test running at a time. It is certainly possible that we're issuing multiple requests per test, and it's possible that since we're running multiple tests per second you could get multiple requests for the same subresources in the same second.

Note that if you run with -f/--experimental-fully-parallel, that overrides the default calculation, so, yeah, we'd be getting multiple http tests simultaneously, but you didn't mention that flag before. Failing that, running with --verbose (or at least --print config) will tell you how many locked shards will run concurrently.

If you have access logs showing something other than one test being run concurrently with locked shards == 1, I would be highly curious. That shouldn't be possible. (I'd be happy to look at / interpret / analyze the logs regardless.) --iterations shouldn't affect anything I'm talking about.

Do you by chance have (or can you make) an excerpt from the --verbose logging for the same time period of the testing?
Ami Fischman
Comment 15 2012-06-01 15:37:23 PDT
(In reply to comment #14)
> Note that if you run with -f/--experimental-fully-parallel, that overrides the default calculation, so, yeah, we'd be getting multiple http tests simultaneously, but you didn't mention that flag before.

Bingo. I'm using -f.

Let me know if you want me to generate any logs (assuming not, since -f explains the mismatch; 100 locked shards).
Ojan Vafai
Comment 16 2012-06-01 15:42:14 PDT
(In reply to comment #15)
> (In reply to comment #14)
> > Note that if you run with -f/--experimental-fully-parallel, that overrides the default calculation, so, yeah, we'd be getting multiple http tests simultaneously, but you didn't mention that flag before.
>
> Bingo. I'm using -f.
> Let me know if you want me to generate any logs (assuming not, since -f explains the mismatch; 100 locked shards).

Yes, -f runs everything in parallel, but increases flakiness. If we can get the http tests non-flaky with -f, then we can definitely start running them in parallel.
Ami Fischman
Comment 17 2012-06-01 15:45:39 PDT
(In reply to comment #16)
> Yes, -f runs everything in parallel, but increases flakiness. If we can get the http tests non-flaky with -f, then we can definitely start running them in parallel.

FWIW I always run -f, and this issue (#servers) is the only consistent source of flake I see, I think. Although I do tend to restrict myself to media/ http/tests/media/ fast/canvas/webgl/.
Dirk Pranke
Comment 18 2012-06-01 16:10:02 PDT
(In reply to comment #17)
> (In reply to comment #16)
> > Yes, -f runs everything in parallel, but increases flakiness. If we can get the http tests non-flaky with -f, then we can definitely start running them in parallel.
>
> FWIW I always run -f, and this issue (#servers) is the only consistent source of flake I see, I think.
> Although I do tend to restrict myself to media/ http/tests/media/ fast/canvas/webgl/.

There's flakiness elsewhere :). Yeah, no need to add more logs.
Tony Chang
Comment 19 2012-06-01 16:23:36 PDT
I tried 'new-run-webkit-tests -f http' with MinSpareServers 300, StartServers 300 and MaxSpareServers 300, and I get about the same flakiness as with the current values (~20-22 unexpected failures). Note that some of these failures are against the websocket server, which is separate.

I'm running on Lucid and I have 16 DRT processes.

I wonder if --iterations=100 is causing some sort of unrelated flakiness.
Ojan Vafai
Comment 20 2012-06-01 16:39:39 PDT
(In reply to comment #19)
> I tried 'new-run-webkit-tests -f http' with MinSpareServers 300, StartServers 300 and MaxSpareServers 300, and I get about the same flakiness as with the current values (~20-22 unexpected failures). Note that some of these failures are against the websocket server, which is separate.
>
> I'm running on Lucid and I have 16 DRT processes.
>
> I wonder if --iterations=100 is causing some sort of unrelated flakiness.

It's hard to evaluate what this means. You don't have data on the long tail of http tests that we have marked as flaky in test_expectations.txt.
Ami Fischman
Comment 21 2012-06-01 16:49:30 PDT
(In reply to comment #20)
> (In reply to comment #19)
> > I tried 'new-run-webkit-tests -f http' with MinSpareServers 300, StartServers 300 and MaxSpareServers 300, and I get about the same flakiness as with the current values (~20-22 unexpected failures). Note that some of these failures are against the websocket server, which is separate.
> >
> > I'm running on Lucid and I have 16 DRT processes.
> >
> > I wonder if --iterations=100 is causing some sort of unrelated flakiness.
>
> It's hard to evaluate what this means. You don't have data on the long tail of http tests that we have marked as flaky in test_expectations.txt.

For concreteness, this is my cmdline:

./Tools/Scripts/new-run-webkit-tests -f --no-retry --exit-after-n-crashes-or-timeouts=1 --iterations=100 -f --results-directory=/tmp/x2 http/tests/media/video-buffered.html

With the default *Servers configuration I see 13-15 failures (all up-front). With *Servers set to 16 I see no failures. Consistently on both counts.

(Note webkit has to be at or beyond r119268 for that particular test to work on the chromium port.)
Ami Fischman
Comment 22 2012-06-10 13:19:21 PDT
(In reply to comment #8)
> I will play around with some approaches and post something.

dpranke: any updates on this? Are you still planning to post a patch?
Dirk Pranke
Comment 23 2012-06-10 15:58:43 PDT
(In reply to comment #22)
> (In reply to comment #8)
> > I will play around with some approaches and post something.
>
> dpranke: any updates on this? Are you still planning to post a patch?

It's still in my to-do queue, but has been getting bumped by other things ... if you or someone else wanted to take a stab at it, that would be great.
Dirk Pranke
Comment 24 2012-06-18 12:45:54 PDT
Dirk Pranke
Comment 25 2012-06-18 13:04:20 PDT
Created attachment 148155 [details] add missing unittest file
Tony Chang
Comment 26 2012-06-18 13:50:03 PDT
Dirk, were you able to repro the improvements that Ami saw? I would like to understand why I'm not seeing any differences.
Dirk Pranke
Comment 27 2012-06-18 14:34:12 PDT
(In reply to comment #26)
> Dirk, were you able to repro the improvements that Ami saw? I would like to understand why I'm not seeing any differences.

I haven't actually tried to reproduce his results. The change seemed like the right thing to do (or at least conservatively more correct) regardless. I can bang on it a bit on linux and report if I see any real differences, if you like ...
Tony Chang
Comment 28 2012-06-18 14:38:14 PDT
(In reply to comment #27)
> (In reply to comment #26)
> > Dirk, were you able to repro the improvements that Ami saw? I would like to understand why I'm not seeing any differences.
>
> I haven't actually tried to reproduce his results. The change seemed like the right thing to do (or at least conservatively more correct) regardless.

Maybe. I'm hesitant to add code and complexity if it doesn't improve behavior.
Dirk Pranke
Comment 29 2012-06-18 19:41:21 PDT
Ami, can you try this patch and see if it makes any difference for you?
Ami Fischman
Comment 30 2012-06-19 17:30:29 PDT
(In reply to comment #29)
> Ami, can you try this patch and see if it makes any difference for you?

It does, but it needs a bit of finessing to see the difference.

For example, modify http/tests/media/video-cancel-load.html to change s/throttle=40/throttle=4/ and run:

./Tools/Scripts/new-run-webkit-tests -f --debug --results-directory=/tmp/x2 --iterations=100 http/tests/media/video-cancel-load.html

Without the patch I get 15/100 failures. With the patch, I get no failures.

Alternatively, without the patch and reverting the above test to ToT, running:

./Tools/Scripts/new-run-webkit-tests --time-out-ms=6000 -f --debug --results-directory=/tmp/x2 --iterations=100 http/tests/media/video-cancel-load.html

sees 15/100 failures, but applying the patch makes ALL the iterations pass.

Basically what this comes down to is that spinning up the extra httpd's takes non-trivial time. Counting that time against a particular test's execution is "unfair" and causes flakiness. Spinning up the httpd's in advance makes the time each test gets to run more predictable.
Tony Chang
Comment 31 2012-06-20 11:25:13 PDT
Comment on attachment 148155 [details]
add missing unittest file

View in context: https://bugs.webkit.org/attachment.cgi?id=148155&action=review

I was able to repro Ami's results on my machine. I think the key was to use a debug build of DRT.

On an unrelated note, new-run-webkit-httpd is broken (it always uses lighttpd), but that's a separate bug.

> Tools/Scripts/webkitpy/common/system/executive_mock.py:40
> -    def __init__(self, stdout='MOCK STDOUT\n'):
> +    def __init__(self, stdout='MOCK STDOUT\n', stderr=''):
>          self.pid = 42
>          self.stdout = StringIO.StringIO(stdout)
> +        self.stderr = StringIO.StringIO(stderr)

Why is this change necessary? Can you mention it in the ChangeLog?

> Tools/Scripts/webkitpy/layout_tests/controllers/manager.py:763
> +        self.start_servers_with_lock(num_servers=2 * min(num_workers, len(locked_shards)))

Nit: It's weird to me to name the param when it's not an optional param. If you want to make it clear what this is for, you could make num_servers a local var.

> Tools/Scripts/webkitpy/layout_tests/controllers/manager.py:972
>          self._printer.print_update('Starting HTTP server ...')
> -        self._port.start_http_server()
> +        self._port.start_http_server(num_servers=num_servers)
>          self._printer.print_update('Starting WebSocket server ...')

You probably need to rebase.

> Tools/Scripts/webkitpy/layout_tests/port/base.py:790
> +    def start_http_server(self, additional_dirs=None, num_servers=None):

Nit: num_servers -> number_of_servers

> Tools/Scripts/webkitpy/layout_tests/servers/apache_http_server_unittest.py:31
> +import unittest
> +import re
> +import sys

Nit: sort.

> Tools/Scripts/webkitpy/layout_tests/servers/apache_http_server_unittest.py:35
> +from webkitpy.common.system.outputcapture import OutputCapture
> +from webkitpy.common.host_mock import MockHost

Nit: sort.
Dirk Pranke
Comment 32 2012-06-20 11:45:41 PDT
Comment on attachment 148155 [details]
add missing unittest file

View in context: https://bugs.webkit.org/attachment.cgi?id=148155&action=review

>> Tools/Scripts/webkitpy/common/system/executive_mock.py:40
>> +        self.stderr = StringIO.StringIO(stderr)
>
> Why is this change necessary? Can you mention it in the ChangeLog?

We need this because we try to read stderr when starting httpd in apache_http_server.py. I will add a comment.

>> Tools/Scripts/webkitpy/layout_tests/controllers/manager.py:763
>> +        self.start_servers_with_lock(num_servers=2 * min(num_workers, len(locked_shards)))
>
> Nit: It's weird to me to name the param when it's not an optional param. If you want to make it clear what this is for, you could make num_servers a local var.

Will fix.

>> Tools/Scripts/webkitpy/layout_tests/controllers/manager.py:972
>>          self._printer.print_update('Starting WebSocket server ...')
>
> You probably need to rebase.

Yeah, will do.

>> Tools/Scripts/webkitpy/layout_tests/port/base.py:790
>> +    def start_http_server(self, additional_dirs=None, num_servers=None):
>
> Nit: num_servers -> number_of_servers

Will fix.

>> Tools/Scripts/webkitpy/layout_tests/servers/apache_http_server_unittest.py:31
>> +import sys
>
> Nit: sort.

Will fix.

>> Tools/Scripts/webkitpy/layout_tests/servers/apache_http_server_unittest.py:35
>> +from webkitpy.common.host_mock import MockHost
>
> Nit: sort.

Will fix.
Dirk Pranke
Comment 33 2012-06-20 11:59:51 PDT
Created attachment 148620 [details] patch for landing
Dirk Pranke
Comment 34 2012-06-20 12:01:11 PDT