WebKit Bugzilla
New
Browse
Log In
×
Sign in with GitHub
or
Remember my login
Create Account
·
Forgot Password
Forgotten password account recovery
RESOLVED FIXED
61847
Worker may not be stopped after LT.
https://bugs.webkit.org/show_bug.cgi?id=61847
Summary
Worker may not be stopped after LT.
Hao Zheng
Reported
2011-06-01 01:48:01 PDT
In worker.py, only call to cleanup() is in __del__, which is not guaranteed to be called when the python interpreter exits. And self._driver is not killed after LT. After many runs of LT, there will be many zombie processes of driver. I think we need to run cleanup() explicitly for worker. I'm not sure where is better to put cleanup(): def handle_stop(self, src): self._done = True ** self.cleanup() Or, def run(self, port): self.safe_init(port) exception_msg = "" _log.debug("%s starting" % self._name) try: self._worker_connection.run_message_loop() if not self.is_done(): raise AssertionError("%s: ran out of messages in worker queue." % self._name) except KeyboardInterrupt: exception_msg = ", interrupted" except: exception_msg = ", exception raised" finally: _log.debug("%s done%s" % (self._name, exception_msg)) if exception_msg: exception_type, exception_value, exception_traceback = sys.exc_info() stack_utils.log_traceback(_log.debug, exception_traceback) # FIXME: Figure out how to send a message with a traceback. self._worker_connection.post_message('exception', (exception_type, exception_value, None)) self._worker_connection.post_message('done') ** self.cleanup() I'd like to hear your comments.
Attachments
Proposed patch.
(1.47 KB, patch)
2011-06-01 21:15 PDT
,
Hao Zheng
no flags
Details
Formatted Diff
Diff
Patch
(3.75 KB, patch)
2011-06-02 19:17 PDT
,
Dirk Pranke
no flags
Details
Formatted Diff
Diff
Patch
(3.55 KB, patch)
2011-06-03 12:34 PDT
,
Dirk Pranke
no flags
Details
Formatted Diff
Diff
Show Obsolete
(2)
View All
Add attachment
proposed patch, testcase, etc.
Dirk Pranke
Comment 1
2011-06-01 10:34:28 PDT
cleanup() should be getting called at the end of run(). I'm pretty sure it did at some point; I wonder if I deleted it? Feel free to add a patch for this, or I'll do so this afternoon.
Hao Zheng
Comment 2
2011-06-01 21:15:49 PDT
Created
attachment 95716
[details]
Proposed patch.
WebKit Commit Bot
Comment 3
2011-06-02 12:58:55 PDT
The commit-queue encountered the following flaky tests while processing
attachment 95716
[details]
: fast/history/history-subframe-with-name.html
bug 51039
(author:
mihaip@chromium.org
) http/tests/websocket/tests/error-detect.html
bug 54012
(author:
abarth@webkit.org
) The commit-queue is continuing to process your patch.
WebKit Commit Bot
Comment 4
2011-06-02 13:01:14 PDT
Comment on
attachment 95716
[details]
Proposed patch. Clearing flags on attachment: 95716 Committed
r87946
: <
http://trac.webkit.org/changeset/87946
>
WebKit Commit Bot
Comment 5
2011-06-02 13:01:18 PDT
All reviewed patches have been landed. Closing bug.
Adam Barth
Comment 6
2011-06-02 14:40:34 PDT
This broke NRWT in multiple configurations. Here's an example:
http://build.chromium.org/p/chromium.webkit/builders/Webkit%20Linux%20%28dbg%29%281%29/builds/6754/steps/webkit_tests/logs/stdio
2011-06-02 13:35:14,919 8291 manager.py:672 DEBUG No wedged threads 2011-06-02 13:35:15,419 8291 manager.py:690 INFO Exception raised, exiting Traceback (most recent call last): File "/mnt/data/b/build/slave/Webkit_Linux__dbg__1_/build/src/third_party/WebKit/Tools/Scripts/webkitpy/layout_tests/run_webkit_tests.py", line 460, in <module> sys.exit(main()) File "/mnt/data/b/build/slave/Webkit_Linux__dbg__1_/build/src/third_party/WebKit/Tools/Scripts/webkitpy/layout_tests/run_webkit_tests.py", line 455, in main return run(port_obj, options, args) File "/mnt/data/b/build/slave/Webkit_Linux__dbg__1_/build/src/third_party/WebKit/Tools/Scripts/webkitpy/layout_tests/run_webkit_tests.py", line 118, in run num_unexpected_results = manager.run(result_summary) File "/mnt/data/b/build/slave/Webkit_Linux__dbg__1_/build/src/third_party/WebKit/Tools/Scripts/webkitpy/layout_tests/layout_package/manager.py", line 779, in run self._run_tests(self._test_files_list, result_summary)) File "/mnt/data/b/build/slave/Webkit_Linux__dbg__1_/build/src/third_party/WebKit/Tools/Scripts/webkitpy/layout_tests/layout_package/manager.py", line 678, in _run_tests assert not worker_state.worker_connection.is_alive() AssertionError program finished with exit code 1
Dirk Pranke
Comment 7
2011-06-02 19:17:46 PDT
Created
attachment 95845
[details]
Patch
Hao Zheng
Comment 8
2011-06-02 20:13:12 PDT
Thanks for your help. Sorry to break the bots.
Tony Chang
Comment 9
2011-06-03 09:53:28 PDT
Comment on
attachment 95845
[details]
Patch View in context:
https://bugs.webkit.org/attachment.cgi?id=95845&action=review
> Tools/Scripts/webkitpy/layout_tests/layout_package/manager.py:681 > + _log.error('Worked %d did not exit in time. Aborting.' % worker_state.number) > + raise AssertionError('Worker %d did not exit in time' % worker_state.number)
Should we raise an Assertion here or just log the error and try to keep going?
> Tools/Scripts/webkitpy/layout_tests/layout_package/worker.py:150 > + self.kill_driver() > + self.stop_servers_with_lock()
Nit: We may want to remove the _driver and _has_http_lock checks in the other call sites (in a separate patch).
Dirk Pranke
Comment 10
2011-06-03 10:01:43 PDT
(In reply to
comment #9
)
> (From update of
attachment 95845
[details]
) > View in context:
https://bugs.webkit.org/attachment.cgi?id=95845&action=review
> > > Tools/Scripts/webkitpy/layout_tests/layout_package/manager.py:681 > > + _log.error('Worked %d did not exit in time. Aborting.' % worker_state.number) > > + raise AssertionError('Worker %d did not exit in time' % worker_state.number) > > Should we raise an Assertion here or just log the error and try to keep going? >
Good question. I'm not actually sure what the end result of the run will be in either case (hung process or process that eventually exits when the slow thread/process exits). I would prefer to leave it as an assertion for now just to help identify if it actually occurs.
> > Tools/Scripts/webkitpy/layout_tests/layout_package/worker.py:150 > > + self.kill_driver() > > + self.stop_servers_with_lock() > > Nit: We may want to remove the _driver and _has_http_lock checks in the other call sites (in a separate patch).
will do.
Tony Chang
Comment 11
2011-06-03 11:29:41 PDT
Comment on
attachment 95845
[details]
Patch View in context:
https://bugs.webkit.org/attachment.cgi?id=95845&action=review
>>> Tools/Scripts/webkitpy/layout_tests/layout_package/manager.py:681 >>> + raise AssertionError('Worker %d did not exit in time' % worker_state.number) >> >> Should we raise an Assertion here or just log the error and try to keep going? > > Good question. I'm not actually sure what the end result of the run will be in either case (hung process or process that eventually exits when the slow thread/process exits). I would prefer to leave it as an assertion for now just to help identify if it actually occurs.
I think this is OK, but In general I think we should try to have the bots keep going as much as possible. If the script crashes, it causes problems for the gardener who then needs to rollback and wait for the bots to cycle an extra time. The downside for you is much smaller: you just need to land the patch manually and watch the bots as they cycle to see if an error is logged. You might want to try landing this during non-busy hours or coordinate the landing with the gardener.
Dirk Pranke
Comment 12
2011-06-03 12:32:23 PDT
(In reply to
comment #11
)
> (From update of
attachment 95845
[details]
) > View in context:
https://bugs.webkit.org/attachment.cgi?id=95845&action=review
> > >>> Tools/Scripts/webkitpy/layout_tests/layout_package/manager.py:681 > >>> + raise AssertionError('Worker %d did not exit in time' % worker_state.number) > >> > >> Should we raise an Assertion here or just log the error and try to keep going? > > > > Good question. I'm not actually sure what the end result of the run will be in either case (hung process or process that eventually exits when the slow thread/process exits). I would prefer to leave it as an assertion for now just to help identify if it actually occurs. > > I think this is OK, but In general I think we should try to have the bots keep going as much as possible. If the script crashes, it causes problems for the gardener who then needs to rollback and wait for the bots to cycle an extra time. The downside for you is much smaller: you just need to land the patch manually and watch the bots as they cycle to see if an error is logged. > > You might want to try landing this during non-busy hours or coordinate the landing with the gardener.
Oh, I'm not worried about this patch in particular being broken, I'm worried about something happening down the road that causes this to fail. That said, I just tested it, and raising the assert won't by itself cause the program to exit (since there are still child processes running), so there's not much effective difference here between asserting and logging an error. I'll change this to a log for now. There's also two typos in that line that are real bugs, so I'll post another patch that actually works :)
Dirk Pranke
Comment 13
2011-06-03 12:34:24 PDT
Created
attachment 95947
[details]
Patch
Dirk Pranke
Comment 14
2011-06-03 12:44:03 PDT
Committed
r88040
: <
http://trac.webkit.org/changeset/88040
>
Note
You need to
log in
before you can comment on or make changes to this bug.
Top of Page
Format For Printing
XML
Clone This Bug