WebKit Bugzilla
New
Browse
Log In
×
Sign in with GitHub
or
Remember my login
Create Account
·
Forgot Password
Forgotten password account recovery
RESOLVED FIXED
81600
webkitpy: get ServerProcess out of the reportcrash business
https://bugs.webkit.org/show_bug.cgi?id=81600
Summary
webkitpy: get ServerProcess out of the reportcrash business
Dirk Pranke
Reported
2012-03-19 17:28:29 PDT
webkitpy: get ServerProcess out of the reportcrash business
Attachments
Patch
(12.64 KB, patch)
2012-03-19 17:32 PDT
,
Dirk Pranke
no flags
Details
Formatted Diff
Diff
Patch
(12.41 KB, patch)
2012-03-20 15:39 PDT
,
Dirk Pranke
no flags
Details
Formatted Diff
Diff
fix another crash
(12.74 KB, patch)
2012-03-21 13:22 PDT
,
Dirk Pranke
no flags
Details
Formatted Diff
Diff
Show Obsolete
(2)
View All
Add attachment
proposed patch, testcase, etc.
Dirk Pranke
Comment 1
2012-03-19 17:32:16 PDT
Created
attachment 132732
[details]
Patch
Adam Barth
Comment 2
2012-03-19 17:39:47 PDT
Comment on
attachment 132732
[details]
Patch View in context:
https://bugs.webkit.org/attachment.cgi?id=132732&action=review
> Tools/Scripts/webkitpy/layout_tests/port/server_process.py:106 > + if self._proc.returncode in (-1073741510, -signal.SIGINT):
-1073741510 <- Can we name this constant?
Dirk Pranke
Comment 3
2012-03-19 17:52:55 PDT
Comment on
attachment 132732
[details]
Patch View in context:
https://bugs.webkit.org/attachment.cgi?id=132732&action=review
>> Tools/Scripts/webkitpy/layout_tests/port/server_process.py:106 >> + if self._proc.returncode in (-1073741510, -signal.SIGINT): > > -1073741510 <- Can we name this constant?
I don't actually know what a good name for this constant would be; given the FIXME above, I suspect we can actually delete the constant altogether, and even if we can't, I need to figure out if this value correlates to something. I'd like to defer doing that and removing the fixme to a subsequent patch if that's okay, since getting to the root of this is kind of a different effort.
Adam Barth
Comment 4
2012-03-19 17:54:28 PDT
Comment on
attachment 132732
[details]
Patch I'm not sure I understand. Where did the code that waits for ReportCrash to finish dumping the stack go?
Adam Barth
Comment 5
2012-03-19 17:55:09 PDT
Do we not need to wait for ReportCrash anymore?
Dirk Pranke
Comment 6
2012-03-19 18:04:13 PDT
(In reply to
comment #4
)
> (From update of
attachment 132732
[details]
) > I'm not sure I understand. Where did the code that waits for ReportCrash to finish dumping the stack go?
That logic moves into the Port classes (see
bug 81603
), but I should probably move some of it back here. The reason is slightly complicated. It turns out that waiting for ReportCrash to exit is the wrong thing to do. Under normal circumstances (things are crashing rarely), ReportCrash actually writes the crash log before python is even notified that the subprocess has crashed. When lots of things are crashing, however, the write may be somewhat delayed (I don't really know what's going on, but it's like there is a delay before the file gets flushed to disk). So, in some cases, we actually do want to wait a while longer to look for the log - hence, in the patch in
bug 81603
, there's a slow spin for a period of time derived from how many workers are running in parallel. However, in any case, the operating system appears to keep ReportCrash around for a long (~30-40second) time just so it doesn't have to re-launch it if there is another crash. Also, there is only ever one ReportCrash running, regardless of how many processes are crashing at once (it appears that the single ReportCrash handles multiple crashes simultaneously). Most of the time, not waiting at all seems to be fine. Sometimes - if the world is broken and so every test crashes - you want to wait for a while. Ideally (for me) we'd land this patch as-is and then get "more correct" when 81603 lands. However, if you like I can rework the division of labor between the two patches a bit so that we even slow-spin as part of this patch as well.
Dirk Pranke
Comment 7
2012-03-19 18:04:41 PDT
(In reply to
comment #6
)
> That logic moves into the Port classes (see
bug 81603
), but I should probably move some of it back here.
s/should probably/could if necessary/. :)
Adam Barth
Comment 8
2012-03-19 18:18:04 PDT
I see. I think I misunderstood how ReportCrash worked previously. I thought there was one for each crash dump.
Adam Barth
Comment 9
2012-03-19 18:19:02 PDT
Comment on
attachment 132732
[details]
Patch I'm not sure I understand 100% of the details in this patch, but I like the overall direction.
Dirk Pranke
Comment 10
2012-03-19 18:33:10 PDT
Committed
r111293
: <
http://trac.webkit.org/changeset/111293
>
Jessie Berlin
Comment 11
2012-03-19 20:11:26 PDT
This is still breaking the layout tests on the bots, even after
http://trac.webkit.org/changeset/111307
http://build.webkit.org/builders/Lion%20Intel%20Debug%20%28WebKit2%20Tests%29/builds/5070/steps/layout-test/logs/stdio
-03-19 19:51:42,660 43121 single_test_runner.py:209 DEBUG /Volumes/Data/slave/lion-intel-debug/build/Source/WebCore/loader/appcache/ApplicationCacheStorage.cpp(557) : bool WebCore::ApplicationCacheStorage::executeSQLCommand(const WTF::String &) 2012-03-19 19:51:42,660 43123 stack_utils.py:67 DEBUG File "/Volumes/Data/slave/lion-intel-debug-tests-wk2/build/Tools/Scripts/webkitpy/layout_tests/controllers/manager_worker_broker.py", line 280, in run 2012-03-19 19:51:42,661 43123 stack_utils.py:67 DEBUG self._worker_connection.run_message_loop() 2012-03-19 19:51:42,661 43123 stack_utils.py:67 DEBUG File "/Volumes/Data/slave/lion-intel-debug-tests-wk2/build/Tools/Scripts/webkitpy/layout_tests/controllers/manager_worker_broker.py", line 232, in run_message_loop 2012-03-19 19:51:42,661 43123 stack_utils.py:67 DEBUG self._broker.run_message_loop(self._run_topic, self._client, delay_secs) 2012-03-19 19:51:42,661 43123 stack_utils.py:67 DEBUG File "/Volumes/Data/slave/lion-intel-debug-tests-wk2/build/Tools/Scripts/webkitpy/layout_tests/controllers/manager_worker_broker.py", line 168, in run_message_loop 2012-03-19 19:51:42,661 43123 stack_utils.py:67 DEBUG self._run_loop(topic_name, client, block=True, delay_secs=delay_secs) 2012-03-19 19:51:42,661 43123 stack_utils.py:67 DEBUG File "/Volumes/Data/slave/lion-intel-debug-tests-wk2/build/Tools/Scripts/webkitpy/layout_tests/controllers/manager_worker_broker.py", line 182, in _run_loop 2012-03-19 19:51:42,662 43121 worker.py:206 DEBUG worker/5 fast/media/media-query-invalid-value.html passed 2012-03-19 19:51:42,662 43123 stack_utils.py:67 DEBUG self._dispatch_message(msg, client) 2012-03-19 19:51:42,662 43123 stack_utils.py:67 DEBUG File "/Volumes/Data/slave/lion-intel-debug-tests-wk2/build/Tools/Scripts/webkitpy/layout_tests/controllers/manager_worker_broker.py", line 191, in _dispatch_message 2012-03-19 19:51:42,662 43123 stack_utils.py:67 DEBUG message_handler(message.src, *optargs) 2012-03-19 19:51:42,662 43123 stack_utils.py:67 DEBUG File "/Volumes/Data/slave/lion-intel-debug-tests-wk2/build/Tools/Scripts/webkitpy/layout_tests/controllers/worker.py", line 128, in handle_test_list 2012-03-19 19:51:42,662 43123 stack_utils.py:67 DEBUG self._run_test(test_input) 2012-03-19 19:51:42,663 43123 stack_utils.py:67 DEBUG File "/Volumes/Data/slave/lion-intel-debug-tests-wk2/build/Tools/Scripts/webkitpy/layout_tests/controllers/worker.py", line 143, in _run_test 2012-03-19 19:51:42,663 43123 stack_utils.py:67 DEBUG result = self.run_test_with_timeout(test_input, test_timeout_sec) 2012-03-19 19:51:42,663 43123 stack_utils.py:67 DEBUG File "/Volumes/Data/slave/lion-intel-debug-tests-wk2/build/Tools/Scripts/webkitpy/layout_tests/controllers/worker.py", line 185, in run_test_with_timeout 2012-03-19 19:51:42,663 43123 stack_utils.py:67 DEBUG return self._run_test_in_this_thread(test_input) 2012-03-19 19:51:42,663 43123 stack_utils.py:67 DEBUG File "/Volumes/Data/slave/lion-intel-debug-tests-wk2/build/Tools/Scripts/webkitpy/layout_tests/controllers/worker.py", line 270, in _run_test_in_this_thread 2012-03-19 19:51:42,663 43123 stack_utils.py:67 DEBUG return self.run_single_test(self._driver, test_input) 2012-03-19 19:51:42,663 43123 stack_utils.py:67 DEBUG File "/Volumes/Data/slave/lion-intel-debug-tests-wk2/build/Tools/Scripts/webkitpy/layout_tests/controllers/worker.py", line 274, in run_single_test 2012-03-19 19:51:42,664 43123 stack_utils.py:67 DEBUG test_input, driver, self._name) 2012-03-19 19:51:42,664 43123 stack_utils.py:67 DEBUG File "/Volumes/Data/slave/lion-intel-debug-tests-wk2/build/Tools/Scripts/webkitpy/layout_tests/controllers/single_test_runner.py", line 46, in run_single_test 2012-03-19 19:51:42,664 43123 stack_utils.py:67 DEBUG return runner.run() 2012-03-19 19:51:42,664 43123 stack_utils.py:67 DEBUG File "/Volumes/Data/slave/lion-intel-debug-tests-wk2/build/Tools/Scripts/webkitpy/layout_tests/controllers/single_test_runner.py", line 106, in run 2012-03-19 19:51:42,664 43123 stack_utils.py:67 DEBUG return self._run_compare_test() 2012-03-19 19:51:42,664 43123 stack_utils.py:67 DEBUG File "/Volumes/Data/slave/lion-intel-debug-tests-wk2/build/Tools/Scripts/webkitpy/layout_tests/controllers/single_test_runner.py", line 119, in _run_compare_test 2012-03-19 19:51:42,664 43123 stack_utils.py:67 DEBUG test_result_writer.write_test_result(self._filesystem, self._port, self._test_name, driver_output, expected_driver_output, test_result.failures) 2012-03-19 19:51:42,665 43123 stack_utils.py:67 DEBUG File "/Volumes/Data/slave/lion-intel-debug-tests-wk2/build/Tools/Scripts/webkitpy/layout_tests/controllers/test_result_writer.py", line 66, in write_test_result 2012-03-19 19:51:42,665 43123 stack_utils.py:67 DEBUG writer.write_crash_report(crashed_driver_output.crashed_process_name, crashed_driver_output.error) 2012-03-19 19:51:42,665 43123 stack_utils.py:67 DEBUG File "/Volumes/Data/slave/lion-intel-debug-tests-wk2/build/Tools/Scripts/webkitpy/layout_tests/controllers/test_result_writer.py", line 165, in write_crash_report 2012-03-19 19:51:42,665 43123 stack_utils.py:67 DEBUG log = crash_logs.find_newest_log(crashed_process_name) 2012-03-19 19:51:42,665 43123 stack_utils.py:67 DEBUG File "/Volumes/Data/slave/lion-intel-debug-tests-wk2/build/Tools/Scripts/webkitpy/common/system/crashlogs.py", line 40, in find_newest_log 2012-03-19 19:51:42,665 43123 stack_utils.py:67 DEBUG return self._find_newest_log_darwin(process_name, pid, include_errors) 2012-03-19 19:51:42,665 43123 stack_utils.py:67 DEBUG File "/Volumes/Data/slave/lion-intel-debug-tests-wk2/build/Tools/Scripts/webkitpy/common/system/crashlogs.py", line 57, in _find_newest_log_darwin 2012-03-19 19:51:42,666 43123 stack_utils.py:67 DEBUG logs = self._filesystem.files_under(log_directory, file_filter=is_crash_log) 2012-03-19 19:51:42,666 43123 stack_utils.py:67 DEBUG File "/Volumes/Data/slave/lion-intel-debug-tests-wk2/build/Tools/Scripts/webkitpy/common/system/filesystem.py", line 111, in files_under 2012-03-19 19:51:42,666 43123 stack_utils.py:67 DEBUG if file_filter(self, dirpath, filename): 2012-03-19 19:51:42,666 43123 stack_utils.py:67 DEBUG File "/Volumes/Data/slave/lion-intel-debug-tests-wk2/build/Tools/Scripts/webkitpy/common/system/crashlogs.py", line 54, in is_crash_log 2012-03-19 19:51:42,666 43123 stack_utils.py:67 DEBUG return basename.startswith(process_name + "_") and basename.endswith(".crash") 2012-03-19 19:51:42,668 43123 manager_worker_broker.py:291 DEBUG worker/7 done with message loop, exception raised 2012-03-19 19:51:42,669 42883 manager.py:1424 ERROR worker/7 raised TypeError('unsupported operand type(s) for +: 'NoneType' and 'str''): 2012-03-19 19:51:42,669 42883 manager.py:1458 ERROR layout_tests/controllers/manager_worker_broker.py:280 (in run) 2012-03-19 19:51:42,669 42883 manager.py:1459 ERROR self._worker_connection.run_message_loop() 2012-03-19 19:51:42,669 42883 manager.py:1458 ERROR layout_tests/controllers/manager_worker_broker.py:232 (in run_message_loop) 2012-03-19 19:51:42,669 42883 manager.py:1459 ERROR self._broker.run_message_loop(self._run_topic, self._client, delay_secs) 2012-03-19 19:51:42,669 42883 manager.py:1458 ERROR layout_tests/controllers/manager_worker_broker.py:168 (in run_message_loop) 2012-03-19 19:51:42,669 42883 manager.py:1459 ERROR self._run_loop(topic_name, client, block=True, delay_secs=delay_secs) 2012-03-19 19:51:42,670 42883 manager.py:1458 ERROR layout_tests/controllers/manager_worker_broker.py:182 (in _run_loop) 2012-03-19 19:51:42,670 42883 manager.py:1459 ERROR self._dispatch_message(msg, client) 2012-03-19 19:51:42,670 42883 manager.py:1458 ERROR layout_tests/controllers/manager_worker_broker.py:191 (in _dispatch_message) 2012-03-19 19:51:42,670 42883 manager.py:1459 ERROR message_handler(message.src, *optargs) 2012-03-19 19:51:42,670 42883 manager.py:1458 ERROR layout_tests/controllers/worker.py:128 (in handle_test_list) 2012-03-19 19:51:42,670 42883 manager.py:1459 ERROR self._run_test(test_input) 2012-03-19 19:51:42,670 42883 manager.py:1458 ERROR layout_tests/controllers/worker.py:143 (in _run_test) 2012-03-19 19:51:42,670 42883 manager.py:1459 ERROR result = self.run_test_with_timeout(test_input, test_timeout_sec) 2012-03-19 19:51:42,670 42883 manager.py:1458 ERROR layout_tests/controllers/worker.py:185 (in run_test_with_timeout) 2012-03-19 19:51:42,670 42883 manager.py:1459 ERROR return self._run_test_in_this_thread(test_input) 2012-03-19 19:51:42,670 42883 manager.py:1458 ERROR layout_tests/controllers/worker.py:270 (in _run_test_in_this_thread) 2012-03-19 19:51:42,670 42883 manager.py:1459 ERROR return self.run_single_test(self._driver, test_input) 2012-03-19 19:51:42,670 42883 manager.py:1458 ERROR layout_tests/controllers/worker.py:274 (in run_single_test) 2012-03-19 19:51:42,671 42883 manager.py:1459 ERROR test_input, driver, self._name) 2012-03-19 19:51:42,671 42883 manager.py:1458 ERROR layout_tests/controllers/single_test_runner.py:46 (in run_single_test) 2012-03-19 19:51:42,671 42883 manager.py:1459 ERROR return runner.run() 2012-03-19 19:51:42,671 42883 manager.py:1458 ERROR layout_tests/controllers/single_test_runner.py:106 (in run) 2012-03-19 19:51:42,671 42883 manager.py:1459 ERROR return self._run_compare_test() 2012-03-19 19:51:42,671 42883 manager.py:1458 ERROR layout_tests/controllers/single_test_runner.py:119 (in _run_compare_test) 2012-03-19 19:51:42,671 42883 manager.py:1459 ERROR test_result_writer.write_test_result(self._filesystem, self._port, self._test_name, driver_output, expected_driver_output, test_result.failures) 2012-03-19 19:51:42,671 43123 worker.py:151 DEBUG worker/7 cleaning up 2012-03-19 19:51:42,671 42883 manager.py:1458 ERROR layout_tests/controllers/test_result_writer.py:66 (in write_test_result) 2012-03-19 19:51:42,671 42883 manager.py:1459 ERROR writer.write_crash_report(crashed_driver_output.crashed_process_name, crashed_driver_output.error) 2012-03-19 19:51:42,671 42883 manager.py:1458 ERROR layout_tests/controllers/test_result_writer.py:165 (in write_crash_report) 2012-03-19 19:51:42,671 42883 manager.py:1459 ERROR log = crash_logs.find_newest_log(crashed_process_name) 2012-03-19 19:51:42,671 43123 worker.py:178 DEBUG worker/7 killing driver 2012-03-19 19:51:42,671 42883 manager.py:1458 ERROR common/system/crashlogs.py:40 (in find_newest_log) 2012-03-19 19:51:42,672 42883 manager.py:1459 ERROR return self._find_newest_log_darwin(process_name, pid, include_errors) 2012-03-19 19:51:42,672 42883 manager.py:1458 ERROR common/system/crashlogs.py:57 (in _find_newest_log_darwin) 2012-03-19 19:51:42,672 42883 manager.py:1459 ERROR logs = self._filesystem.files_under(log_directory, file_filter=is_crash_log) 2012-03-19 19:51:42,672 42883 manager.py:1458 ERROR common/system/filesystem.py:111 (in files_under) 2012-03-19 19:51:42,672 42883 manager.py:1459 ERROR if file_filter(self, dirpath, filename): 2012-03-19 19:51:42,672 42883 manager.py:1458 ERROR common/system/crashlogs.py:54 (in is_crash_log) 2012-03-19 19:51:42,672 42883 manager.py:1459 ERROR return basename.startswith(process_name + "_") and basename.endswith(".crash")
Jessie Berlin
Comment 12
2012-03-19 20:12:19 PDT
I think I may have to roll out those patches soon if this isn't addressed.
Jessie Berlin
Comment 13
2012-03-19 20:48:06 PDT
I had to roll out the changes in
http://trac.webkit.org/changeset/111343
Maciej Stachowiak
Comment 14
2012-03-20 00:52:34 PDT
Comment on
attachment 132732
[details]
Patch Unflagging patch, since it got rolled out.
Dirk Pranke
Comment 15
2012-03-20 13:56:32 PDT
Thanks for rolling this out, and I'm sorry I wasn't around to debug/fix/roll it out myself!
Dirk Pranke
Comment 16
2012-03-20 15:39:02 PDT
Created
attachment 132911
[details]
Patch
Dirk Pranke
Comment 17
2012-03-20 15:39:48 PDT
Comment on
attachment 132911
[details]
Patch The actual fix: layout_tests $ git diff port/webkit.py 494c494,497 < return self._server_process.has_crashed() ---
> if self._server_process.has_crashed(): > self._crashed_subprocess_name = self._port.driver_name() > return True > return False
501c504,505 < elif error_line == "#CRASHED - WebProcess\n": ---
> return True > if error_line == "#CRASHED - WebProcess\n":
503a508
> return True
Dirk Pranke
Comment 18
2012-03-20 15:40:19 PDT
Committed
r111453
: <
http://trac.webkit.org/changeset/111453
>
mitz
Comment 19
2012-03-20 16:52:53 PDT
(In reply to
comment #18
)
> Committed
r111453
: <
http://trac.webkit.org/changeset/111453
>
This change broke Lion Intel Release (WebKit2 Tests), as seen here: <
http://build.webkit.org/builders/Lion%20Intel%20Release%20%28WebKit2%20Tests%29/builds/5535/steps/layout-test/logs/stdio/text
>.
Dirk Pranke
Comment 20
2012-03-20 17:00:17 PDT
Reverted
r111453
for reason: still broken Committed
r111470
: <
http://trac.webkit.org/changeset/111470
>
Philippe Normand
Comment 21
2012-03-21 02:27:29 PDT
(In reply to
comment #20
)
> Reverted
r111453
for reason: > > still broken > > Committed
r111470
: <
http://trac.webkit.org/changeset/111470
>
I still see this in GTK Debug when a crash happens: 2012-03-21 02:10:45,581 1482 manager.py:1424 ERROR worker/1 raised UnicodeDecodeError(''ascii' codec can't decode byte 0xce in position 235: ordinal not in range(128)'): 2012-03-21 02:10:45,582 1482 manager.py:1458 ERROR layout_tests/controllers/manager_worker_broker.py:280 (in run) 2012-03-21 02:10:45,582 1482 manager.py:1459 ERROR self._worker_connection.run_message_loop() ... 2012-03-21 02:10:45,584 1482 manager.py:1458 ERROR layout_tests/controllers/single_test_runner.py:119 (in _run_compare_test) 2012-03-21 02:10:45,584 1482 manager.py:1459 ERROR test_result_writer.write_test_result(self._filesystem, self._port, self._test_name, driver_output, expected_driver_output, test_result.failures) 2012-03-21 02:10:45,584 1482 manager.py:1458 ERROR layout_tests/controllers/test_result_writer.py:66 (in write_test_result) 2012-03-21 02:10:45,584 1482 manager.py:1459 ERROR writer.write_crash_report(crashed_driver_output.crashed_process_name, crashed_driver_output.error) 2012-03-21 02:10:45,584 1482 manager.py:1458 ERROR layout_tests/controllers/test_result_writer.py:168 (in write_crash_report) 2012-03-21 02:10:45,584 1482 manager.py:1459 ERROR fs.write_text_file(filename, log if log else error) 2012-03-21 02:10:45,584 1482 manager.py:1458 ERROR common/system/filesystem.py:223 (in write_text_file) 2012-03-21 02:10:45,584 1482 manager.py:1459 ERROR f.write(contents) 2012-03-21 02:10:45,584 1482 manager.py:1458 ERROR /usr/lib/python2.6/codecs.py:691 (in write) 2012-03-21 02:10:45,584 1482 manager.py:1459 ERROR return self.writer.write(data) 2012-03-21 02:10:45,585 1482 manager.py:1458 ERROR /usr/lib/python2.6/codecs.py:351 (in write) 2012-03-21 02:10:45,585 1482 manager.py:1459 ERROR data, consumed = self.encode(object, self.errors) The error variable needs to be: errors.decode('utf8')
Dirk Pranke
Comment 22
2012-03-21 13:11:57 PDT
(In reply to
comment #21
)
> The error variable needs to be: errors.decode('utf8')
Okay, will fix with the next attempt :).
Dirk Pranke
Comment 23
2012-03-21 13:22:14 PDT
Created
attachment 133102
[details]
fix another crash
Dirk Pranke
Comment 24
2012-03-21 13:23:28 PDT
Actual fix in patch #3: $ diff --git a/Tools/Scripts/webkitpy/layout_tests/port/webkit.py b/Tools/Scripts/webkitpy/layout_tests/port/webkit.py index 32a7a25..b998c47 100644 --- a/Tools/Scripts/webkitpy/layout_tests/port/webkit.py +++ b/Tools/Scripts/webkitpy/layout_tests/port/webkit.py @@ -588,7 +588,7 @@ class WebKitDriver(Driver): block = ContentBlock() out_seen_eof = False - while True: + while not self.has_crashed(): if out_seen_eof and (self.err_seen_eof or not wait_for_stderr_eof): break $ I need to revamp ServerProcess to be a little more bulletproof, but I'll do that in a separate patch.
Dirk Pranke
Comment 25
2012-03-21 13:29:45 PDT
Committed
r111599
: <
http://trac.webkit.org/changeset/111599
>
Note
You need to
log in
before you can comment on or make changes to this bug.
Top of Page
Format For Printing
XML
Clone This Bug