RESOLVED FIXED 81603
webkitpy: clean up actually getting crash logs from DRT/WTR crashes
https://bugs.webkit.org/show_bug.cgi?id=81603
Summary webkitpy: clean up actually getting crash logs from DRT/WTR crashes
Dirk Pranke
Reported 2012-03-19 17:46:14 PDT
currently the logic is scattered between different files and doesn't actually handle WebProcess crashes properly.
Attachments
Patch (19.18 KB, patch)
2012-03-19 17:47 PDT, Dirk Pranke
no flags
tweak changelog slightly (19.45 KB, patch)
2012-03-19 17:49 PDT, Dirk Pranke
no flags
patch for landing, updated w/ review feedback (19.62 KB, patch)
2012-03-21 15:21 PDT, Dirk Pranke
no flags
fix BasePort._get_crash_log() (21.49 KB, patch)
2012-03-22 11:43 PDT, Dirk Pranke
no flags
Dirk Pranke
Comment 1 2012-03-19 17:47:42 PDT
Dirk Pranke
Comment 2 2012-03-19 17:49:37 PDT
Created attachment 132736 [details] tweak changelog slightly
Adam Roben (:aroben)
Comment 3 2012-03-20 12:12:04 PDT
Comment on attachment 132736 [details] tweak changelog slightly View in context: https://bugs.webkit.org/attachment.cgi?id=132736&action=review > Tools/ChangeLog:15 > + Mac will now slow-spin with a timeout waiting for the crash log, rather > + than waiting for the ReportCrash process to exit (it appears > + that the mac will manage multiple crashes with a single > + ReportCrash process, the process waits around longer than > + necessary, presumably to avoid thrashing if processes are > + repeatedly crashing). Seems like we could optimize this by just keeping track of all the process name/pid pairs that crashed during the test run, and only trying to retrieve crash logs for them at the very end of the run. Then in the common case ReportCrash would finish its business without us ever having to wait on it. We'd only have to wait for tests that crashed very near the end of the test run.
Dirk Pranke
Comment 4 2012-03-20 13:02:32 PDT
Comment on attachment 132736 [details] tweak changelog slightly View in context: https://bugs.webkit.org/attachment.cgi?id=132736&action=review >> Tools/ChangeLog:15 >> + repeatedly crashing). > > Seems like we could optimize this by just keeping track of all the process name/pid pairs that crashed during the test run, and only trying to retrieve crash logs for them at the very end of the run. Then in the common case ReportCrash would finish its business without us ever having to wait on it. We'd only have to wait for tests that crashed very near the end of the test run. Perhaps. I think it's a little conceptually clearer to return the crash as part of the test output itself. In the normal case, most of the waiting happens before we even get notified in the select() that the process has crashed; the only time we need to wait longer is when lots of things are crashing at once. I'd prefer to leave things as-is until we see a need to do another pass over things.
Adam Barth
Comment 5 2012-03-21 14:29:32 PDT
Comment on attachment 132736 [details] tweak changelog slightly View in context: https://bugs.webkit.org/attachment.cgi?id=132736&action=review Ok. I kind of like aroben's idea of waiting until the end. I'm not sure whether that's going to make things go significantly faster in practice, but it might be a good refinement to experiment with. > Tools/Scripts/webkitpy/layout_tests/port/chromium.py:611 > return DriverOutput(text, output_image, actual_checksum, audio=audio_bytes, > - crash=crash, crashed_process_name=crashed_process_name, test_time=run_time, timeout=timeout, error=error) > + crash=crash, crashed_process_name=crashed_process_name, crashed_pid=crashed_pid, crash_log=crash_log, > + test_time=run_time, timeout=timeout, error=error) This line of code is getting kind of out of control! > Tools/Scripts/webkitpy/layout_tests/port/mac.py:169 > + CRASH_LOG_WAIT_TIMEOUT_SECS = 5 * int(self.get_option('child_processes')) Is there a reason CRASH_LOG_WAIT_TIMEOUT_SECS is in all caps? Does get_option return the actual number of child processes we're using (e.g., if we have memory limitations)?
Dirk Pranke
Comment 6 2012-03-21 14:37:53 PDT
(In reply to comment #5) > (From update of attachment 132736 [details]) > View in context: https://bugs.webkit.org/attachment.cgi?id=132736&action=review > > Ok. I kind of like aroben's idea of waiting until the end. I'm not sure whether that's going to make things go significantly faster in practice, but it might be a good refinement to experiment with. > > > Tools/Scripts/webkitpy/layout_tests/port/chromium.py:611 > > return DriverOutput(text, output_image, actual_checksum, audio=audio_bytes, > > - crash=crash, crashed_process_name=crashed_process_name, test_time=run_time, timeout=timeout, error=error) > > + crash=crash, crashed_process_name=crashed_process_name, crashed_pid=crashed_pid, crash_log=crash_log, > > + test_time=run_time, timeout=timeout, error=error) > > This line of code is getting kind of out of control! > Yeah ... I'm not sure that the named arguments are really needed, either. > > Tools/Scripts/webkitpy/layout_tests/port/mac.py:169 > > + CRASH_LOG_WAIT_TIMEOUT_SECS = 5 * int(self.get_option('child_processes')) > > Is there a reason CRASH_LOG_WAIT_TIMEOUT_SECS is in all caps? Hm. No; it was a constant at one point. Will fix. > Does get_option return the actual number of child processes we're using (e.g., if we have memory limitations)? It represents the max number of child processes, so it respects the memory limitations. However, we may have fewer children actually running (either because we didn't need that many, or because the others have already finished). Getting the number actually running would be difficult if not impossible since it depends on what is going on in other processes.
Dirk Pranke
Comment 7 2012-03-21 14:41:32 PDT
(In reply to comment #5) > (From update of attachment 132736 [details]) > View in context: https://bugs.webkit.org/attachment.cgi?id=132736&action=review > > Ok. I kind of like aroben's idea of waiting until the end. I'm not sure whether that's going to make things go significantly faster in practice, but it might be a good refinement to experiment with. > Another downside to this is that currently *all* of the "handle test failure and put stuff into layout-test-results" is done in the worker/child processes, in parallel. I wouldn't really like to split that logic up, and pushing that into the manager to the end could end up slowing things down (would have to code it to see). There is a certain appeal to only having the manager process writing into layout-test-results, though.
Dirk Pranke
Comment 8 2012-03-21 15:21:44 PDT
Created attachment 133125 [details] patch for landing, updated w/ review feedback
Dirk Pranke
Comment 9 2012-03-21 15:24:52 PDT
Csaba Osztrogonác
Comment 10 2012-03-22 01:05:47 PDT
Csaba Osztrogonác
Comment 11 2012-03-22 01:21:24 PDT
I rolled it out to unbreak Qt WK2 tester bot - http://trac.webkit.org/changeset/111661 (with accidentally change in r111615 too)
Dirk Pranke
Comment 12 2012-03-22 10:00:28 PDT
(In reply to comment #11) > I rolled it out to unbreak Qt WK2 tester bot - http://trac.webkit.org/changeset/111661 (with accidentally change in r111615 too) Argh. This was fixed in r111615 ... it was just the ChangeLog that was wrong :(. Starting with build 21879 on that bot, though, there appears to be another, different bug. I will look into that one as well.
Philippe Normand
Comment 13 2012-03-22 10:22:09 PDT
(In reply to comment #12) > (In reply to comment #11) > > I rolled it out to unbreak Qt WK2 tester bot - http://trac.webkit.org/changeset/111661 (with accidentally change in r111615 too) > > Argh. This was fixed in r111615 ... it was just the ChangeLog that was wrong :(. > > Starting with build 21879 on that bot, though, there appears to be another, different bug. I will look into that one as well. '\n'.join('STDERR: ' + line for line in stderr.decode('utf8').splitlines()) if stderr else '<empty>') should be: '\n'.join(['STDERR: '] + line for line in stderr.decode('utf8').splitlines()) if stderr else '<empty>') I think.
Dirk Pranke
Comment 14 2012-03-22 11:21:12 PDT
(In reply to comment #13) > (In reply to comment #12) > > (In reply to comment #11) > > > I rolled it out to unbreak Qt WK2 tester bot - http://trac.webkit.org/changeset/111661 (with accidentally change in r111615 too) > > > > Argh. This was fixed in r111615 ... it was just the ChangeLog that was wrong :(. > > > > Starting with build 21879 on that bot, though, there appears to be another, different bug. I will look into that one as well. > > '\n'.join('STDERR: ' + line for line in stderr.decode('utf8').splitlines()) if stderr else '<empty>') > > should be: > > > '\n'.join(['STDERR: '] + line for line in stderr.decode('utf8').splitlines()) if stderr else '<empty>') > > I think. No, that part's correct, as it is prefixing each line with STDERR: (although adding some parentheses will make that clearer). It's actually complaining about the %d getting handed a string two lines above. Clearly I need a unit test for this ...
Dirk Pranke
Comment 15 2012-03-22 11:43:19 PDT
Created attachment 133304 [details] fix BasePort._get_crash_log()
Dirk Pranke
Comment 16 2012-03-22 11:44:38 PDT
Note You need to log in before you can comment on or make changes to this bug.