Bug 217972

Summary: Make report-non-inclusive-language ignore autoinstalled directories and certain file types
Product: WebKit Reporter: Aakash Jain <aakash_jain>
Component: Tools / TestsAssignee: Aakash Jain <aakash_jain>
Status: RESOLVED FIXED    
Severity: Normal CC: aakash_jain, bdakin, darin, jbedard, ryanhaddad, webkit-bug-importer
Priority: P2 Keywords: InRadar
Version: Other   
Hardware: Unspecified   
OS: Unspecified   
See Also: https://bugs.webkit.org/show_bug.cgi?id=215156
https://bugs.webkit.org/show_bug.cgi?id=217993
https://bugs.webkit.org/show_bug.cgi?id=224979
Bug Depends on:    
Bug Blocks: 213092    
Attachments:
Description Flags
Patch
bdakin: review+
Patch
none
Patch none

Description Aakash Jain 2020-10-20 10:42:09 PDT
Make report-non-inclusive-language ignore following directories and files:

1) 'autoinstalled': The source code of packages inside autoinstalled directories (e.g.: Scripts/libraries/autoinstalled) are not part of our repository and is not controlled by us. So report-non-inclusive-language shouldn't count them. Currently the count is really skewed because of counting such autoinstalled directories.

2) Files ending with .swp: usually these are files created for files opened in other tabs).

3) Files ending with .log: usually these are not part of source code, and are simply logs. For e.g.: twistd.log while running buildbot instance locally.
Comment 1 Aakash Jain 2020-10-20 10:47:41 PDT
Created attachment 411887 [details]
Patch
Comment 2 Aakash Jain 2020-10-20 11:30:02 PDT
It should also ignore files ending in .pyc
Comment 3 Aakash Jain 2020-10-20 11:31:13 PDT
Created attachment 411892 [details]
Patch
Comment 4 Beth Dakin 2020-10-20 11:32:00 PDT
Comment on attachment 411887 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=411887&action=review

> Tools/Scripts/report-non-inclusive-language:47
> +IGNORE_FILES_ENDING_WITH = ('.order', '.xcuserstate', '.swp', '.log')

I saw your comment about adding .pyc here, but R+ for now anyway. This is great!
Comment 5 EWS 2020-10-20 12:13:27 PDT
Committed r268752: <https://trac.webkit.org/changeset/268752>

All reviewed patches have been landed. Closing bug and clearing flags on attachment 411892 [details].
Comment 6 Radar WebKit Bug Importer 2020-10-20 12:14:20 PDT
<rdar://problem/70496351>
Comment 7 Darin Adler 2020-10-20 13:04:38 PDT
Comment on attachment 411892 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=411892&action=review

> Tools/Scripts/report-non-inclusive-language:61
> +    if any(directory in prefix for directory in IGNORE_DIRECTORIES):

This is IGNORE_DIRECTORIES or IGNORE_DIRECTORIES_CONTAINING?

It looks like we are using substring matching for directory names, but we want to know if this is exactly a directory name. Ideally this should not match "notautoinstalled".

I’d like a more correct implementation to be future-looking although for now none of those three strings seems like a problem.

I believe that some others at Apple have been using this script, so it’s good if it works well even outside WebKit, which is why it’s good to be careful about such things.
Comment 8 Aakash Jain 2020-10-20 13:14:43 PDT
(In reply to Darin Adler from comment #7)
> It looks like we are using substring matching for directory names, but we want to know if this is exactly a directory name. Ideally this should not match "notautoinstalled".
Good point. Should we do the exact match for the directories. That seems like the right thing to do. Proposed patch would be something like:

-    if any(directory in prefix for directory in IGNORE_DIRECTORIES):
+    if any(directory==prefix for directory in IGNORE_DIRECTORIES):
Comment 9 Aakash Jain 2020-10-20 13:18:07 PDT
(In reply to Aakash Jain from comment #8)
> (In reply to Darin Adler from comment #7)
> > It looks like we are using substring matching for directory names, but we want to know if this is exactly a directory name. Ideally this should not match "notautoinstalled".
> Good point. Should we do the exact match for the directories. That seems
> like the right thing to do. Proposed patch would be something like:
> 
> -    if any(directory in prefix for directory in IGNORE_DIRECTORIES):
> +    if any(directory==prefix for directory in IGNORE_DIRECTORIES):
Ignore my above comment, just noticed that prefix contains the complete path.
Comment 10 Aakash Jain 2020-10-20 13:24:36 PDT
Reopening to attach minor follow-up patch.
Comment 11 Aakash Jain 2020-10-20 13:25:58 PDT
Created attachment 411908 [details]
Patch
Comment 12 Darin Adler 2020-10-20 14:07:37 PDT
Comment on attachment 411908 [details]
Patch

For now, this is OK.
Comment 13 Darin Adler 2020-10-20 14:08:42 PDT
(In reply to Aakash Jain from comment #8)
> (In reply to Darin Adler from comment #7)
> > It looks like we are using substring matching for directory names, but we want to know if this is exactly a directory name. Ideally this should not match "notautoinstalled".
> Good point. Should we do the exact match for the directories. That seems
> like the right thing to do. Proposed patch would be something like:
> 
> -    if any(directory in prefix for directory in IGNORE_DIRECTORIES):
> +    if any(directory==prefix for directory in IGNORE_DIRECTORIES):

Maybe:

    if any(directory.endsWith("/" + prefix) for directory in IGNORE_DIRECTORIES):

Would that work?
Comment 14 EWS 2020-10-20 15:01:10 PDT
Committed r268763: <https://trac.webkit.org/changeset/268763>

All reviewed patches have been landed. Closing bug and clearing flags on attachment 411908 [details].
Comment 15 Aakash Jain 2020-10-20 15:39:07 PDT
(In reply to Darin Adler from comment #13)
> Maybe:
> 
>     if any(directory.endsWith("/" + prefix) for directory in
> IGNORE_DIRECTORIES):
> 
> Would that work?
That doesn't seem to work. Uploaded patch in https://bugs.webkit.org/show_bug.cgi?id=217993 which seems to work fine.