Bug 214257 - Delete old ews script DumpRenderTreeWatchDog.py
Summary: Delete old ews script DumpRenderTreeWatchDog.py
Status: ASSIGNED
Alias: None
Product: WebKit
Classification: Unclassified
Component: Tools / Tests (show other bugs)
Version: WebKit Nightly Build
Hardware: Unspecified Unspecified
: P2 Normal
Assignee: Aakash Jain
URL:
Keywords: InRadar
Depends on:
Blocks:
 
Reported: 2020-07-13 08:18 PDT by Aakash Jain
Modified: 2020-07-15 14:04 PDT (History)
6 users (show)

See Also:


Attachments
Patch (3.16 KB, patch)
2020-07-13 08:20 PDT, Aakash Jain
darin: review+
Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Aakash Jain 2020-07-13 08:18:27 PDT
We should delete old ews script DumpRenderTreeWatchDog.py. This script was used on Windows bots in old EWS to terminate hanging DRT processes. In new EWS, buildbot terminate the build automatically after 20 minute of inactivity. Further, in new EWS builds, we also run kill-old-processes which kills DumpRenderTree.exe and imagediff.exe

Therefore, we don't need this script anymore and it should be deleted.
Comment 1 Radar WebKit Bug Importer 2020-07-13 08:19:48 PDT
<rdar://problem/65477036>
Comment 2 Aakash Jain 2020-07-13 08:20:02 PDT
Created attachment 404148 [details]
Patch
Comment 3 Aakash Jain 2020-07-13 08:29:51 PDT
This script (dumprendertreewatchdog.py) killed "ImageDiff.exe", while kill-old-processes kills "imagediff.exe" (https://trac.webkit.org/browser/webkit/trunk/Tools/BuildSlaveSupport/kill-old-processes#L45). Not sure if the capitalization matters.
Comment 4 Per Arne Vollan 2020-07-13 09:14:17 PDT
(In reply to Aakash Jain from comment #3)
> This script (dumprendertreewatchdog.py) killed "ImageDiff.exe", while
> kill-old-processes kills "imagediff.exe"
> (https://trac.webkit.org/browser/webkit/trunk/Tools/BuildSlaveSupport/kill-
> old-processes#L45). Not sure if the capitalization matters.

I think this script is still in use on the EWS bots, or am I mistaken?
Comment 5 Aakash Jain 2020-07-13 09:19:55 PDT
(In reply to Per Arne Vollan from comment #4)
> I think this script is still in use on the EWS bots, or am I mistaken?
Some EWS bots were using it by mistake, I just stopped all that. See <rdar://problem/65142015>.

Is there a reason that this script is required on EWS bots? Buildbot already terminates any hanging build, and we use kill-old-processes to kill DumpRenderTree.exe in the beginning of every build.
Comment 6 Per Arne Vollan 2020-07-13 09:27:36 PDT
(In reply to Aakash Jain from comment #5)
> (In reply to Per Arne Vollan from comment #4)
> > I think this script is still in use on the EWS bots, or am I mistaken?
> Some EWS bots were using it by mistake, I just stopped all that. See
> <rdar://problem/65142015>.
> 
> Is there a reason that this script is required on EWS bots? Buildbot already
> terminates any hanging build, and we use kill-old-processes to kill
> DumpRenderTree.exe in the beginning of every build.

Yes, I believe this is still required. The script will kill hanging processes during the test run, making sure the test run completes. I believe we have had cases where the test run never finishes, which is fixed by having this script running. I don't believe the script kill-old-processes supports this.
Comment 7 Darin Adler 2020-07-13 09:32:50 PDT
Then it seems like a bad idea to stop running it, and a bad idea to delete it.
Comment 8 Jonathan Bedard 2020-07-14 17:25:45 PDT
If we think such a script is needed, why don't we also need it on build.webkit.org?

One of the purposes of the push to move EWS to buildbot was so that EWS was running with the same configuration as trunk testing. I'm pretty sure trunk deals with the hanging problem by leaning on run-webkit-tests to take care of it and, as a last resort, buildbot timing out.
Comment 9 Alexey Proskuryakov 2020-07-14 22:56:43 PDT
I think that the new behavior is:

1. Buildbot will terminate unresponsive steps.
2. kill-old-processes will be executed before the next test to clean up.

So we should be fine? In any case, this is dead code now that Aakash removed the scheduled task from all bots where we still had it.
Comment 10 Per Arne Vollan 2020-07-15 13:04:12 PDT
(In reply to Jonathan Bedard from comment #8)
> If we think such a script is needed, why don't we also need it on
> build.webkit.org?
> 
> One of the purposes of the push to move EWS to buildbot was so that EWS was
> running with the same configuration as trunk testing. I'm pretty sure trunk
> deals with the hanging problem by leaning on run-webkit-tests to take care
> of it and, as a last resort, buildbot timing out.

Yes, we should also have it on build.webkit.org (or a similar feature), since the same issue will be present there. However, it is more important on EWS, since without it, Windows EWS will often start lagging behind, and we loose test coverage.
Comment 11 Per Arne Vollan 2020-07-15 13:22:19 PDT
(In reply to Alexey Proskuryakov from comment #9)
> I think that the new behavior is:
> 
> 1. Buildbot will terminate unresponsive steps.
> 2. kill-old-processes will be executed before the next test to clean up.
> 
> So we should be fine? In any case, this is dead code now that Aakash removed
> the scheduled task from all bots where we still had it.

The script is able to unblock the waiting parent process of a stuck DumpRenderTree/ImageDiff while running the layout tests, making the test run succeed, although with some test failures. It seems buildbot will terminate the whole step, which means the test run would fail, or am I mistaken? Anyway, I see that this is dead code now after moving to buildbot. If we still need this feature, maybe it should be part of the run-webkit-tests script? Also, I haven't observed hangs after the buildbot move, which makes me think that this feature is not super critical at this point, since the Windows EWS bots are keeping up with the queue. Great job!
Comment 12 Jonathan Bedard 2020-07-15 14:04:04 PDT
(In reply to Per Arne Vollan from comment #11)
> (In reply to Alexey Proskuryakov from comment #9)
> > I think that the new behavior is:
> > 
> > 1. Buildbot will terminate unresponsive steps.
> > 2. kill-old-processes will be executed before the next test to clean up.
> > 
> > So we should be fine? In any case, this is dead code now that Aakash removed
> > the scheduled task from all bots where we still had it.
> 
> The script is able to unblock the waiting parent process of a stuck
> DumpRenderTree/ImageDiff while running the layout tests, making the test run
> succeed, although with some test failures. It seems buildbot will terminate
> the whole step, which means the test run would fail, or am I mistaken?
> Anyway, I see that this is dead code now after moving to buildbot. If we
> still need this feature, maybe it should be part of the run-webkit-tests
> script? Also, I haven't observed hangs after the buildbot move, which makes
> me think that this feature is not super critical at this point, since the
> Windows EWS bots are keeping up with the queue. Great job!

Yes, I would expect such a script to be part of run-webkit-tests, if we needed it. Seems like it would basically turn an error that broke the entire test suite into one that just broke a few tests. Although such behavior would definitely cause flakey test failures.