81603 – webkitpy: clean up actually getting crash logs from DRT/WTR crashes

Dirk Pranke

Reported 2012-03-19 17:46:14 PDT

currently the logic is scattered between different files and doesn't actually handle WebProcess crashes properly.

Dirk Pranke

Comment 1 2012-03-19 17:47:42 PDT

Created attachment 132735 [details] Patch

Dirk Pranke

Comment 2 2012-03-19 17:49:37 PDT

Created attachment 132736 [details] tweak changelog slightly

Adam Roben (:aroben)

Comment 3 2012-03-20 12:12:04 PDT

Comment on attachment 132736 [details] tweak changelog slightly View in context: https://bugs.webkit.org/attachment.cgi?id=132736&action=review > Tools/ChangeLog:15 > + Mac will now slow-spin with a timeout waiting for the crash log, rather > + than waiting for the ReportCrash process to exit (it appears > + that the mac will manage multiple crashes with a single > + ReportCrash process, the process waits around longer than > + necessary, presumably to avoid thrashing if processes are > + repeatedly crashing). Seems like we could optimize this by just keeping track of all the process name/pid pairs that crashed during the test run, and only trying to retrieve crash logs for them at the very end of the run. Then in the common case ReportCrash would finish its business without us ever having to wait on it. We'd only have to wait for tests that crashed very near the end of the test run.

Dirk Pranke

Comment 4 2012-03-20 13:02:32 PDT

Comment on attachment 132736 [details] tweak changelog slightly View in context: https://bugs.webkit.org/attachment.cgi?id=132736&action=review >> Tools/ChangeLog:15 >> + repeatedly crashing). > > Seems like we could optimize this by just keeping track of all the process name/pid pairs that crashed during the test run, and only trying to retrieve crash logs for them at the very end of the run. Then in the common case ReportCrash would finish its business without us ever having to wait on it. We'd only have to wait for tests that crashed very near the end of the test run. Perhaps. I think it's a little conceptually clearer to return the crash as part of the test output itself. In the normal case, most of the waiting happens before we even get notified in the select() that the process has crashed; the only time we need to wait longer is when lots of things are crashing at once. I'd prefer to leave things as-is until we see a need to do another pass over things.

Adam Barth

Comment 5 2012-03-21 14:29:32 PDT

Comment on attachment 132736 [details] tweak changelog slightly View in context: https://bugs.webkit.org/attachment.cgi?id=132736&action=review Ok. I kind of like aroben's idea of waiting until the end. I'm not sure whether that's going to make things go significantly faster in practice, but it might be a good refinement to experiment with. > Tools/Scripts/webkitpy/layout_tests/port/chromium.py:611 > return DriverOutput(text, output_image, actual_checksum, audio=audio_bytes, > - crash=crash, crashed_process_name=crashed_process_name, test_time=run_time, timeout=timeout, error=error) > + crash=crash, crashed_process_name=crashed_process_name, crashed_pid=crashed_pid, crash_log=crash_log, > + test_time=run_time, timeout=timeout, error=error) This line of code is getting kind of out of control! > Tools/Scripts/webkitpy/layout_tests/port/mac.py:169 > + CRASH_LOG_WAIT_TIMEOUT_SECS = 5 * int(self.get_option('child_processes')) Is there a reason CRASH_LOG_WAIT_TIMEOUT_SECS is in all caps? Does get_option return the actual number of child processes we're using (e.g., if we have memory limitations)?

Dirk Pranke

Comment 6 2012-03-21 14:37:53 PDT

(In reply to comment #5) > (From update of attachment 132736 [details]) > View in context: https://bugs.webkit.org/attachment.cgi?id=132736&action=review > > Ok. I kind of like aroben's idea of waiting until the end. I'm not sure whether that's going to make things go significantly faster in practice, but it might be a good refinement to experiment with. > > > Tools/Scripts/webkitpy/layout_tests/port/chromium.py:611 > > return DriverOutput(text, output_image, actual_checksum, audio=audio_bytes, > > - crash=crash, crashed_process_name=crashed_process_name, test_time=run_time, timeout=timeout, error=error) > > + crash=crash, crashed_process_name=crashed_process_name, crashed_pid=crashed_pid, crash_log=crash_log, > > + test_time=run_time, timeout=timeout, error=error) > > This line of code is getting kind of out of control! > Yeah ... I'm not sure that the named arguments are really needed, either. > > Tools/Scripts/webkitpy/layout_tests/port/mac.py:169 > > + CRASH_LOG_WAIT_TIMEOUT_SECS = 5 * int(self.get_option('child_processes')) > > Is there a reason CRASH_LOG_WAIT_TIMEOUT_SECS is in all caps? Hm. No; it was a constant at one point. Will fix. > Does get_option return the actual number of child processes we're using (e.g., if we have memory limitations)? It represents the max number of child processes, so it respects the memory limitations. However, we may have fewer children actually running (either because we didn't need that many, or because the others have already finished). Getting the number actually running would be difficult if not impossible since it depends on what is going on in other processes.

Dirk Pranke

Comment 7 2012-03-21 14:41:32 PDT

(In reply to comment #5) > (From update of attachment 132736 [details]) > View in context: https://bugs.webkit.org/attachment.cgi?id=132736&action=review > > Ok. I kind of like aroben's idea of waiting until the end. I'm not sure whether that's going to make things go significantly faster in practice, but it might be a good refinement to experiment with. > Another downside to this is that currently *all* of the "handle test failure and put stuff into layout-test-results" is done in the worker/child processes, in parallel. I wouldn't really like to split that logic up, and pushing that into the manager to the end could end up slowing things down (would have to code it to see). There is a certain appeal to only having the manager process writing into layout-test-results, though.

Dirk Pranke

Comment 8 2012-03-21 15:21:44 PDT

Created attachment 133125 [details] patch for landing, updated w/ review feedback

Dirk Pranke

Comment 9 2012-03-21 15:24:52 PDT

Committed r111609: <http://trac.webkit.org/changeset/111609>

Csaba Osztrogonác

Comment 10 2012-03-22 01:05:47 PDT

(In reply to comment #9) > Committed r111609: <http://trac.webkit.org/changeset/111609> Reopen, because it broke Qt WK2 tester bot: http://build.webkit.sed.hu/builders/x86-32%20Linux%20Qt%20Release%20WebKit2/builds/21876/steps/layout-test/logs/stdio Could you check what happened?

Csaba Osztrogonác

Comment 11 2012-03-22 01:21:24 PDT

I rolled it out to unbreak Qt WK2 tester bot - http://trac.webkit.org/changeset/111661 (with accidentally change in r111615 too)

Dirk Pranke

Comment 12 2012-03-22 10:00:28 PDT

(In reply to comment #11) > I rolled it out to unbreak Qt WK2 tester bot - http://trac.webkit.org/changeset/111661 (with accidentally change in r111615 too) Argh. This was fixed in r111615 ... it was just the ChangeLog that was wrong :(. Starting with build 21879 on that bot, though, there appears to be another, different bug. I will look into that one as well.

Philippe Normand

Comment 13 2012-03-22 10:22:09 PDT

(In reply to comment #12) > (In reply to comment #11) > > I rolled it out to unbreak Qt WK2 tester bot - http://trac.webkit.org/changeset/111661 (with accidentally change in r111615 too) > > Argh. This was fixed in r111615 ... it was just the ChangeLog that was wrong :(. > > Starting with build 21879 on that bot, though, there appears to be another, different bug. I will look into that one as well. '\n'.join('STDERR: ' + line for line in stderr.decode('utf8').splitlines()) if stderr else '<empty>') should be: '\n'.join(['STDERR: '] + line for line in stderr.decode('utf8').splitlines()) if stderr else '<empty>') I think.

Dirk Pranke

Comment 14 2012-03-22 11:21:12 PDT

(In reply to comment #13) > (In reply to comment #12) > > (In reply to comment #11) > > > I rolled it out to unbreak Qt WK2 tester bot - http://trac.webkit.org/changeset/111661 (with accidentally change in r111615 too) > > > > Argh. This was fixed in r111615 ... it was just the ChangeLog that was wrong :(. > > > > Starting with build 21879 on that bot, though, there appears to be another, different bug. I will look into that one as well. > > '\n'.join('STDERR: ' + line for line in stderr.decode('utf8').splitlines()) if stderr else '<empty>') > > should be: > > > '\n'.join(['STDERR: '] + line for line in stderr.decode('utf8').splitlines()) if stderr else '<empty>') > > I think. No, that part's correct, as it is prefixing each line with STDERR: (although adding some parentheses will make that clearer). It's actually complaining about the %d getting handed a string two lines above. Clearly I need a unit test for this ...

Dirk Pranke

Comment 15 2012-03-22 11:43:19 PDT

Created attachment 133304 [details] fix BasePort._get_crash_log()

Dirk Pranke

Comment 16 2012-03-22 11:44:38 PDT

Committed r111736: <http://trac.webkit.org/changeset/111736>

Attachments
Patch (19.18 KB, patch) 2012-03-19 17:47 PDT, Dirk Pranke	no flags	Details Formatted Diff Diff
tweak changelog slightly (19.45 KB, patch) 2012-03-19 17:49 PDT, Dirk Pranke	no flags	Details Formatted Diff Diff
patch for landing, updated w/ review feedback (19.62 KB, patch) 2012-03-21 15:21 PDT, Dirk Pranke	no flags	Details Formatted Diff Diff
fix BasePort._get_crash_log() (21.49 KB, patch) 2012-03-22 11:43 PDT, Dirk Pranke	no flags	Details Formatted Diff Diff
Show Obsolete (3) View All Add attachment proposed patch, testcase, etc.