WebKit Bugzilla
New
Browse
Log In
×
Sign in with GitHub
or
Remember my login
Create Account
·
Forgot Password
Forgotten password account recovery
RESOLVED FIXED
81603
webkitpy: clean up actually getting crash logs from DRT/WTR crashes
https://bugs.webkit.org/show_bug.cgi?id=81603
Summary
webkitpy: clean up actually getting crash logs from DRT/WTR crashes
Dirk Pranke
Reported
2012-03-19 17:46:14 PDT
currently the logic is scattered between different files and doesn't actually handle WebProcess crashes properly.
Attachments
Patch
(19.18 KB, patch)
2012-03-19 17:47 PDT
,
Dirk Pranke
no flags
Details
Formatted Diff
Diff
tweak changelog slightly
(19.45 KB, patch)
2012-03-19 17:49 PDT
,
Dirk Pranke
no flags
Details
Formatted Diff
Diff
patch for landing, updated w/ review feedback
(19.62 KB, patch)
2012-03-21 15:21 PDT
,
Dirk Pranke
no flags
Details
Formatted Diff
Diff
fix BasePort._get_crash_log()
(21.49 KB, patch)
2012-03-22 11:43 PDT
,
Dirk Pranke
no flags
Details
Formatted Diff
Diff
Show Obsolete
(3)
View All
Add attachment
proposed patch, testcase, etc.
Dirk Pranke
Comment 1
2012-03-19 17:47:42 PDT
Created
attachment 132735
[details]
Patch
Dirk Pranke
Comment 2
2012-03-19 17:49:37 PDT
Created
attachment 132736
[details]
tweak changelog slightly
Adam Roben (:aroben)
Comment 3
2012-03-20 12:12:04 PDT
Comment on
attachment 132736
[details]
tweak changelog slightly View in context:
https://bugs.webkit.org/attachment.cgi?id=132736&action=review
> Tools/ChangeLog:15 > + Mac will now slow-spin with a timeout waiting for the crash log, rather > + than waiting for the ReportCrash process to exit (it appears > + that the mac will manage multiple crashes with a single > + ReportCrash process, the process waits around longer than > + necessary, presumably to avoid thrashing if processes are > + repeatedly crashing).
Seems like we could optimize this by just keeping track of all the process name/pid pairs that crashed during the test run, and only trying to retrieve crash logs for them at the very end of the run. Then in the common case ReportCrash would finish its business without us ever having to wait on it. We'd only have to wait for tests that crashed very near the end of the test run.
Dirk Pranke
Comment 4
2012-03-20 13:02:32 PDT
Comment on
attachment 132736
[details]
tweak changelog slightly View in context:
https://bugs.webkit.org/attachment.cgi?id=132736&action=review
>> Tools/ChangeLog:15 >> + repeatedly crashing). > > Seems like we could optimize this by just keeping track of all the process name/pid pairs that crashed during the test run, and only trying to retrieve crash logs for them at the very end of the run. Then in the common case ReportCrash would finish its business without us ever having to wait on it. We'd only have to wait for tests that crashed very near the end of the test run.
Perhaps. I think it's a little conceptually clearer to return the crash as part of the test output itself. In the normal case, most of the waiting happens before we even get notified in the select() that the process has crashed; the only time we need to wait longer is when lots of things are crashing at once. I'd prefer to leave things as-is until we see a need to do another pass over things.
Adam Barth
Comment 5
2012-03-21 14:29:32 PDT
Comment on
attachment 132736
[details]
tweak changelog slightly View in context:
https://bugs.webkit.org/attachment.cgi?id=132736&action=review
Ok. I kind of like aroben's idea of waiting until the end. I'm not sure whether that's going to make things go significantly faster in practice, but it might be a good refinement to experiment with.
> Tools/Scripts/webkitpy/layout_tests/port/chromium.py:611 > return DriverOutput(text, output_image, actual_checksum, audio=audio_bytes, > - crash=crash, crashed_process_name=crashed_process_name, test_time=run_time, timeout=timeout, error=error) > + crash=crash, crashed_process_name=crashed_process_name, crashed_pid=crashed_pid, crash_log=crash_log, > + test_time=run_time, timeout=timeout, error=error)
This line of code is getting kind of out of control!
> Tools/Scripts/webkitpy/layout_tests/port/mac.py:169 > + CRASH_LOG_WAIT_TIMEOUT_SECS = 5 * int(self.get_option('child_processes'))
Is there a reason CRASH_LOG_WAIT_TIMEOUT_SECS is in all caps? Does get_option return the actual number of child processes we're using (e.g., if we have memory limitations)?
Dirk Pranke
Comment 6
2012-03-21 14:37:53 PDT
(In reply to
comment #5
)
> (From update of
attachment 132736
[details]
) > View in context:
https://bugs.webkit.org/attachment.cgi?id=132736&action=review
> > Ok. I kind of like aroben's idea of waiting until the end. I'm not sure whether that's going to make things go significantly faster in practice, but it might be a good refinement to experiment with. > > > Tools/Scripts/webkitpy/layout_tests/port/chromium.py:611 > > return DriverOutput(text, output_image, actual_checksum, audio=audio_bytes, > > - crash=crash, crashed_process_name=crashed_process_name, test_time=run_time, timeout=timeout, error=error) > > + crash=crash, crashed_process_name=crashed_process_name, crashed_pid=crashed_pid, crash_log=crash_log, > > + test_time=run_time, timeout=timeout, error=error) > > This line of code is getting kind of out of control! >
Yeah ... I'm not sure that the named arguments are really needed, either.
> > Tools/Scripts/webkitpy/layout_tests/port/mac.py:169 > > + CRASH_LOG_WAIT_TIMEOUT_SECS = 5 * int(self.get_option('child_processes')) > > Is there a reason CRASH_LOG_WAIT_TIMEOUT_SECS is in all caps?
Hm. No; it was a constant at one point. Will fix.
> Does get_option return the actual number of child processes we're using (e.g., if we have memory limitations)?
It represents the max number of child processes, so it respects the memory limitations. However, we may have fewer children actually running (either because we didn't need that many, or because the others have already finished). Getting the number actually running would be difficult if not impossible since it depends on what is going on in other processes.
Dirk Pranke
Comment 7
2012-03-21 14:41:32 PDT
(In reply to
comment #5
)
> (From update of
attachment 132736
[details]
) > View in context:
https://bugs.webkit.org/attachment.cgi?id=132736&action=review
> > Ok. I kind of like aroben's idea of waiting until the end. I'm not sure whether that's going to make things go significantly faster in practice, but it might be a good refinement to experiment with. >
Another downside to this is that currently *all* of the "handle test failure and put stuff into layout-test-results" is done in the worker/child processes, in parallel. I wouldn't really like to split that logic up, and pushing that into the manager to the end could end up slowing things down (would have to code it to see). There is a certain appeal to only having the manager process writing into layout-test-results, though.
Dirk Pranke
Comment 8
2012-03-21 15:21:44 PDT
Created
attachment 133125
[details]
patch for landing, updated w/ review feedback
Dirk Pranke
Comment 9
2012-03-21 15:24:52 PDT
Committed
r111609
: <
http://trac.webkit.org/changeset/111609
>
Csaba Osztrogonác
Comment 10
2012-03-22 01:05:47 PDT
(In reply to
comment #9
)
> Committed
r111609
: <
http://trac.webkit.org/changeset/111609
>
Reopen, because it broke Qt WK2 tester bot:
http://build.webkit.sed.hu/builders/x86-32%20Linux%20Qt%20Release%20WebKit2/builds/21876/steps/layout-test/logs/stdio
Could you check what happened?
Csaba Osztrogonác
Comment 11
2012-03-22 01:21:24 PDT
I rolled it out to unbreak Qt WK2 tester bot -
http://trac.webkit.org/changeset/111661
(with accidentally change in
r111615
too)
Dirk Pranke
Comment 12
2012-03-22 10:00:28 PDT
(In reply to
comment #11
)
> I rolled it out to unbreak Qt WK2 tester bot -
http://trac.webkit.org/changeset/111661
(with accidentally change in
r111615
too)
Argh. This was fixed in
r111615
... it was just the ChangeLog that was wrong :(. Starting with build 21879 on that bot, though, there appears to be another, different bug. I will look into that one as well.
Philippe Normand
Comment 13
2012-03-22 10:22:09 PDT
(In reply to
comment #12
)
> (In reply to
comment #11
) > > I rolled it out to unbreak Qt WK2 tester bot -
http://trac.webkit.org/changeset/111661
(with accidentally change in
r111615
too) > > Argh. This was fixed in
r111615
... it was just the ChangeLog that was wrong :(. > > Starting with build 21879 on that bot, though, there appears to be another, different bug. I will look into that one as well.
'\n'.join('STDERR: ' + line for line in stderr.decode('utf8').splitlines()) if stderr else '<empty>') should be: '\n'.join(['STDERR: '] + line for line in stderr.decode('utf8').splitlines()) if stderr else '<empty>') I think.
Dirk Pranke
Comment 14
2012-03-22 11:21:12 PDT
(In reply to
comment #13
)
> (In reply to
comment #12
) > > (In reply to
comment #11
) > > > I rolled it out to unbreak Qt WK2 tester bot -
http://trac.webkit.org/changeset/111661
(with accidentally change in
r111615
too) > > > > Argh. This was fixed in
r111615
... it was just the ChangeLog that was wrong :(. > > > > Starting with build 21879 on that bot, though, there appears to be another, different bug. I will look into that one as well. > > '\n'.join('STDERR: ' + line for line in stderr.decode('utf8').splitlines()) if stderr else '<empty>') > > should be: > > > '\n'.join(['STDERR: '] + line for line in stderr.decode('utf8').splitlines()) if stderr else '<empty>') > > I think.
No, that part's correct, as it is prefixing each line with STDERR: (although adding some parentheses will make that clearer). It's actually complaining about the %d getting handed a string two lines above. Clearly I need a unit test for this ...
Dirk Pranke
Comment 15
2012-03-22 11:43:19 PDT
Created
attachment 133304
[details]
fix BasePort._get_crash_log()
Dirk Pranke
Comment 16
2012-03-22 11:44:38 PDT
Committed
r111736
: <
http://trac.webkit.org/changeset/111736
>
Note
You need to
log in
before you can comment on or make changes to this bug.
Top of Page
Format For Printing
XML
Clone This Bug