WebKit Bugzilla
New
Browse
Search+
Log In
×
Sign in with GitHub
or
Remember my login
Create Account
·
Forgot Password
Forgotten password account recovery
RESOLVED FIXED
38885
Add infrastructure to parse SVN property changes
https://bugs.webkit.org/show_bug.cgi?id=38885
Summary
Add infrastructure to parse SVN property changes
Daniel Bates
Reported
2010-05-10 20:52:58 PDT
Towards adding executable bit support to svn-apply and svn-unapply, we need to add support for parsing SVN property change diffs. SVN property change diffs have the following forms: Property changes on: Makefile ___________________________________________________________________ Added: svn:executable + * OR Property changes on: Makefile ___________________________________________________________________ Deleted: svn:executable + * OR Property changes on: Makefile ___________________________________________________________________ Name: svn:executable + *
Attachments
Patch with unit tests
(13.80 KB, patch)
2010-05-10 20:54 PDT
,
Daniel Bates
no flags
Details
Formatted Diff
Diff
Patch with unit tests
(14.07 KB, patch)
2010-05-10 21:17 PDT
,
Daniel Bates
no flags
Details
Formatted Diff
Diff
Patch with unit tests
(17.86 KB, patch)
2010-05-10 23:20 PDT
,
Daniel Bates
no flags
Details
Formatted Diff
Diff
Patch with unit tests
(14.38 KB, patch)
2010-05-19 23:35 PDT
,
Daniel Bates
cjerdonek
: review+
Details
Formatted Diff
Diff
Show Obsolete
(3)
View All
Add attachment
proposed patch, testcase, etc.
Daniel Bates
Comment 1
2010-05-10 20:54:48 PDT
Created
attachment 55654
[details]
Patch with unit tests
Daniel Bates
Comment 2
2010-05-10 21:07:21 PDT
Comment on
attachment 55654
[details]
Patch with unit tests Noticed an issue with this patch, will correct.
Daniel Bates
Comment 3
2010-05-10 21:17:45 PDT
Created
attachment 55655
[details]
Patch with unit tests
Chris Jerdonek
Comment 4
2010-05-10 21:26:20 PDT
(In reply to
comment #0
)
> SVN property change diffs have the following forms: > > Property changes on: Makefile > ___________________________________________________________________ > Added: svn:executable > + *
For completeness, can you also include enough examples so the pattern is clear when there are multiple additions or, say, additions combined with deletions, etc? In particular, does each added property get its own "Added" line, or do multiple additions get a single "Added" line?
Daniel Bates
Comment 5
2010-05-10 22:20:06 PDT
(In reply to
comment #4
)
> (In reply to
comment #0
) > > SVN property change diffs have the following forms: > > > > Property changes on: Makefile > > ___________________________________________________________________ > > Added: svn:executable > > + * > > For completeness, can you also include enough examples so the pattern is clear when there are multiple additions or, say, additions combined with deletions, etc? In particular, does each added property get its own "Added" line, or do multiple additions get a single "Added" line?
Sure, I'll add more test cases. Yes, each property gets its own Added/Deleted/Name line followed by a line of the form: +/- V, where V is the the value of the property. Take V := '*' for the svn:executable property.
Daniel Bates
Comment 6
2010-05-10 23:20:41 PDT
Created
attachment 55669
[details]
Patch with unit tests Added additional unit tests. Also, while testing found that property change lines can be separated by whitespace/empty lines, so added support for this.
WebKit Review Bot
Comment 7
2010-05-10 23:27:59 PDT
Attachment 55669
[details]
did not pass style-queue: Failed to run "['WebKitTools/Scripts/check-webkit-style', '--no-squash']" exit_code: 1 WebKitTools/Scripts/webkitperl/VCSUtils_unittest/parseSvnDiffFooter.pl:140: Line contains tab character. [whitespace/tab] [5] Total errors found: 1 in 4 files If any of these errors are false positives, please file a bug against check-webkit-style.
Chris Jerdonek
Comment 8
2010-05-11 09:26:09 PDT
(In reply to
comment #6
)
> Created an attachment (id=55669) [details]
Some comments on the first portion of the patch:
> Index: WebKitTools/ChangeLog > + * Scripts/VCSUtils.pm:
In the ChangeLog, it is generally good to comment on the per-file changes for each file. For example, can you at least include the names of the new subroutines added to VCSUtils.pm?
> Index: WebKitTools/Scripts/VCSUtils.pm > my $gitDiffStartRegEx = qr#^diff --git (\w/)?(.+) (\w/)?([^\r\n]+)#; > my $svnDiffStartRegEx = qr#^Index: ([^\r\n]+)#; > +my $svnFooterStartRegEx = qr#^Property changes on: ([^\r\n]+)#; > +my $svnPropertyNameRegEx = qr#^(Name|Added|Deleted): ([^\r\n]+)#;
It might be good to comment these two variables. For example, it might be nice to say here what $1 is (will normally be the same as the index path). For the second, it might be good to state what $1 is and also clarify that the reg ex marks the beginning of a change to a particular property instead of to a section of additions (for example). For the second, might $svnPropertyChangeStartRegEx or $svnPropertyStartRegEx be clearer since the line marks the beginning of a property change in the "Property changes" section and when parseSvnProperty() should be called?
> +# Args: > +# $fileHandle: advanced so the last line read from the handle is either the > +# the first occurence of an empty line or the first line of the
occurrence Since this is in the args section, the $fileHandle comment should be explaining the state of the file handle going into the subroutine. Can you clarify the scoping in the phrase "first occurrence of an empty line"? This could be taken to mean the first empty line in the diff, which is obviously not right. Also, do we really want to be saying that an empty line is what marks the beginning of the footer? It seems like we can always require it to be the "Property changes" line.
> +# next header to parse. For SVN-formatted diffs, this is a line > +# beginning with "Index:". For Git, this is a line beginning > +# with "diff --git".
It looks like you need to rewrite this in terms of footer language.
> +# $line: the line last read from $fileHandle
Missing a period.
> +# Returns ($footerHashRef, $lastReadLine): > +# $footerHashRef: a hash reference representing a diff footer > +# propertyPath: the path of the target file.
It might be good to add that this should normally be the same as the "index path" for the diff.
> +sub parseSvnDiffFooter($$)
> + if (/$svnFooterStartRegEx/) { > + $propertyPath = $1;
Would something like "filePath" be clearer? propertyPath makes it seem like it is a path to a property.
> + } elsif (/$svnPropertyNameRegEx/) { > + ($propertyHashRef, $_) = parseSvnProperty($fileHandle, $_); > + $footer{executableBitDelta} = $propertyHashRef->{propertyChangeDelta} if $propertyHashRef->{name} eq "svn:executable";
Since it doesn't look like you're storing any property changes other than the executable bit one, it might be good to add a FIXME somewhere (like in or near the description of the subroutine) to add support for parsing all properties. When this method is modified to support all properties, it seems like it would make sense to store all the properties as a generic hash of key-value pairs, so that this subroutine will not need to have any special knowledge of the property representing the executable bit. Then the caller could be responsible for examining the hash and converting that information into an executableBitDelta for the diff. Maybe you can include a FIXME here suggesting at future changes to this subroutine.
> + } elsif (/^\s*$/x || $_ eq '_' x 67) { > + # Skip empty/whitespace-only lines and the divider line. > + } else { > + # Some other content, such as the binary portion of an SVN binary diff. > + last;
Since the binary lines are getting skipped over, how are they getting preserved so the binary changes can later be applied?
> + } > + > + $_ = <$fileHandle>; # Not defined if end-of-file reached. > + > + last if (!defined($_) || /$svnDiffStartRegEx/ || /$gitDiffStartRegEx/);
Are we supporting patches with Git and SVN diffs mixed together? It doesn't seem like this SVN-specific method needs to know about Git diffs.
> + } > + > + if (!$propertyPath) { > + die("First line of SVN diff does not begin with \"Property changes on \": \"$_\".");
It seems like the logic above doesn't require the first line to be the "Property changes" line. It seems like it only requires that some line be a "Property changes" line. Is there any reason not to check the first line at the beginning of the method?
Chris Jerdonek
Comment 9
2010-05-11 09:48:38 PDT
(In reply to
comment #5
)
> Sure, I'll add more test cases. > > Yes, each property gets its own Added/Deleted/Name line followed by a line of the form: +/- V, where V is the the value of the property. Take V := '*' for the svn:executable property.
Thanks for this extra information and for the added test cases. Do you know when "Name" is used versus Added/Deleted? It looks like "Modified" is also a possibility, and for there to be both a + and - line after the first line. For example, I tried modifying the svn:ignore property for a directory and got this: Property changes on: WebKitTools/Scripts/webkitpy ___________________________________________________________________ Modified: svn:ignore - commands init irc patch steps *.pyc autoinstall.cache.d + commands init patch steps *.pyc autoinstall.cache.d test-add
Daniel Bates
Comment 10
2010-05-11 15:16:01 PDT
(In reply to
comment #8
)
> (In reply to
comment #6
) > > Created an attachment (id=55669) [details] [details] > > Some comments on the first portion of the patch: > > > Index: WebKitTools/ChangeLog > > + * Scripts/VCSUtils.pm: > > In the ChangeLog, it is generally good to comment on the per-file changes > for each file. For example, can you at least include the names of the new > subroutines added to VCSUtils.pm?
Yes, I agree. Will change.
> > > Index: WebKitTools/Scripts/VCSUtils.pm > > my $gitDiffStartRegEx = qr#^diff --git (\w/)?(.+) (\w/)?([^\r\n]+)#; > > my $svnDiffStartRegEx = qr#^Index: ([^\r\n]+)#; > > +my $svnFooterStartRegEx = qr#^Property changes on: ([^\r\n]+)#; > > +my $svnPropertyNameRegEx = qr#^(Name|Added|Deleted): ([^\r\n]+)#; > > It might be good to comment these two variables. For example, it might be > nice to say here what $1 is (will normally be the same as the index path).
Will add comment.
> For the second, it might be good to state what $1 is and also clarify that the > reg ex marks the beginning of a change to a particular property instead of > to a section of additions (for example).
Will add comment.
> > For the second, might $svnPropertyChangeStartRegEx or $svnPropertyStartRegEx > be clearer since the line marks the beginning of a property change in the > "Property changes" section and when parseSvnProperty() should be called?
Yeah, I was not sure what to call this. I'll go with $svnPropertyChangeStartRegEx.
> > > +# Args: > > +# $fileHandle: advanced so the last line read from the handle is either the > > +# the first occurence of an empty line or the first line of the > > occurrence > > Since this is in the args section, the $fileHandle comment should be explaining > the state of the file handle going into the subroutine. Can you clarify the scoping > in the phrase "first occurrence of an empty line"? This could be taken to mean > the first empty line in the diff, which is obviously not right. Also, do we really > want to be saying that an empty line is what marks the beginning of the footer? > It seems like we can always require it to be the "Property changes" line.
Oops, this used to describe the state of the fileHandle after the subroutine returns (in the old patch: <
https://bugs.webkit.org/attachment.cgi?id=55655
>). The empty line comment no longer applies to <
https://bugs.webkit.org/attachment.cgi?id=55669
>. Will update comment.
> > > +# next header to parse. For SVN-formatted diffs, this is a line > > +# beginning with "Index:". For Git, this is a line beginning > > +# with "diff --git". > > It looks like you need to rewrite this in terms of footer language.
See above response. Will update comment.
> > > +# $line: the line last read from $fileHandle > > Missing a period.
Will fix.
> > > +# Returns ($footerHashRef, $lastReadLine): > > +# $footerHashRef: a hash reference representing a diff footer > > +# propertyPath: the path of the target file. > > It might be good to add that this should normally be the same as the > "index path" for the diff.
Will update comment to reflect this.
> > > +sub parseSvnDiffFooter($$) > > > + if (/$svnFooterStartRegEx/) { > > + $propertyPath = $1; > > Would something like "filePath" be clearer? propertyPath makes it seem like > it is a path to a property. > > > + } elsif (/$svnPropertyNameRegEx/) { > > + ($propertyHashRef, $_) = parseSvnProperty($fileHandle, $_); > > + $footer{executableBitDelta} = $propertyHashRef->{propertyChangeDelta} if $propertyHashRef->{name} eq "svn:executable"; > > Since it doesn't look like you're storing any property changes other than > the executable bit one, it might be good to add a FIXME somewhere (like in > or near the description of the subroutine) to add support for parsing > all properties.
Will do.
> > When this method is modified to support all properties, it seems like it > would make sense to store all the properties as a generic hash of key-value > pairs, so that this subroutine will not need to have any special knowledge of > the property representing the executable bit. Then the caller could be > responsible for examining the hash and converting that information into an > executableBitDelta for the diff. Maybe you can include a FIXME here > suggesting at future changes to this subroutine.
Yes, this was my original thought as well. Will add a FIXME for now.
> > > + } elsif (/^\s*$/x || $_ eq '_' x 67) { > > + # Skip empty/whitespace-only lines and the divider line. > > + } else { > > + # Some other content, such as the binary portion of an SVN binary diff. > > + last; > > Since the binary lines are getting skipped over, how are they getting preserved > so the binary changes can later be applied?
The binary lines should not be skipped over. Disregarding the case where the caller of parseSvnFooterDiff does not pass the start of an SVN property change entry (i.e. starts with "Property changes on ..."), by the last "else" clause we break out of the loop and will return this line so that the caller (say, parseDiff() can continue processing the diff). Therefore, it will be the caller's responsibility to to preserve the binary patch contents. See test cases in parseSvnDiffFooter.pl: "binary file (diff snippet) with executable bit set", and "binary file (diff snippet) with executable bit set (with empty line)".
> > > + } > > + > > + $_ = <$fileHandle>; # Not defined if end-of-file reached. > > + > > + last if (!defined($_) || /$svnDiffStartRegEx/ || /$gitDiffStartRegEx/); > > Are we supporting patches with Git and SVN diffs mixed together? It doesn't > seem like this SVN-specific method needs to know about Git diffs.
I agree, we shouldn't have mixing. Will change.
> > > + } > > + > > + if (!$propertyPath) { > > + die("First line of SVN diff does not begin with \"Property changes on \": \"$_\"."); > > It seems like the logic above doesn't require the first line to be the > "Property changes" line. It seems like it only requires that some line > be a "Property changes" line. Is there any reason not to check the first > line at the beginning of the method?
In the process of refactoring this, I wound up moving it into the loop. I believe we can factor this out of the loop and do the testing at the beginning of the method.
Daniel Bates
Comment 11
2010-05-11 15:19:03 PDT
(In reply to
comment #9
)
> (In reply to
comment #5
) > > Sure, I'll add more test cases. > > > > Yes, each property gets its own Added/Deleted/Name line followed by a line of the form: +/- V, where V is the the value of the property. Take V := '*' for the svn:executable property. > > Thanks for this extra information and for the added test cases. Do you know when "Name" is used versus Added/Deleted?
For some versions of SVN < 1.6.5, such as 1.4.4 (I did not determine the exact SVN version that changed the format from "Name" to "Added", "Deleted", "Modified".
> > It looks like "Modified" is also a possibility, and for there to be both a + and - line after the first line. For example, I tried modifying the svn:ignore property for a directory and got this: > > Property changes on: WebKitTools/Scripts/webkitpy > ___________________________________________________________________ > Modified: svn:ignore > - commands > init > irc > patch > steps > *.pyc > autoinstall.cache.d > > + commands > init > patch > steps > *.pyc > autoinstall.cache.d > test-add
Will add "Modified".
Chris Jerdonek
Comment 12
2010-05-11 16:13:28 PDT
(In reply to
comment #10
) Thanks a lot for your responses, Dan, and for incorporating that feedback.
> > > + } else { > > > + # Some other content, such as the binary portion of an SVN binary diff. > > > + last; > > > > Since the binary lines are getting skipped over, how are they getting preserved > > so the binary changes can later be applied? > > The binary lines should not be skipped over. Disregarding the case where the
Sorry for the confusion on this point. I had misread that "last" to be "next".
Daniel Bates
Comment 13
2010-05-19 23:35:32 PDT
Created
attachment 56567
[details]
Patch with unit tests
Chris Jerdonek
Comment 14
2010-05-20 05:30:16 PDT
Comment on
attachment 56567
[details]
Patch with unit tests Thanks for the revised patch. Some suggestions (mostly about comments) to consider before landing.
> +# Parse the next SVN diff footer from the given file handle, and advance > +# the handle so the last line read is the first line after the footer. > +# > +# This subroutine dies if given leading junk.
I would just say "Parse an SVN diff footer...." I would also not use the phrase "leading junk" here since it is usually used in the context of leading junk at the beginning of a patch (e.g. the e-mail stuff that appears at the beginning of git-format-patch) rather than in the middle. It might be nice to add a comment here that for the special case of an SVN binary diff, there can be binary contents after the footer (so technically speaking, the "Property changes" section is not the full footer). Should we be calling this parseSvnPropertyChanges() since it's not really always the footer?
> + if (defined($_) && /^\Q$separator\E[\r\n]+$/) {
Do you need to be using \Q/\E here? If so, I would add a comment why since it's a bit more obscure.
> + # FIXME: We should expand this to support other SVN properties. > + # Notice, we keep processing until we hit end-of-file or some
I would add a blank line after FIXME line so it's clearer that the second and subsequent lines is not part of the FIXME. Also, did you want to hint at possible implementation for other SVN properties -- e.g. by returning a hash of property key-values that represents all properties? The line you wrote makes it seem like the FIXME is to make the function explicitly more aware of other properties. In some sense, returning a hash makes the function less aware of the specific properties since the special handling of svn:executable will be going in the calling code with the implementation we've discussed (returning a hash).
> + # FIXME: Should the diff specify multiple values for svn:executable > + # (hence be malformed) then we set the value to the last > + # svn:executable property. We may want to consider some > + # better error handling.
I probably wouldn't bother with this FIXME. We shouldn't need to be validating every aspect of a patch. And "last wins" is a valid logic to follow.
> +#### > +# Simple test cases > +##
This looks good. I like how you start the text closer in. :)
> +{ > + # New test > + diffName => "add svn:executable, followed by empty line and start of next property diff", > + inputText => <<'END', > +Property changes on: FileA > +___________________________________________________________________ > +Added: svn:executable > + + * > + > +Property changes on: Makefile.shared > +END
Can this case ever happen? Is this what happens if a property change is the only change to a file?
> +#### > +# Property value followed by empty line and start of binary patch > +##
I would say "binary contents" as binary patch makes it seem like it's the start of a new diff.
> +Added: documentation > + + A sentence.
This is not a complete sentence. JK. :)
Daniel Bates
Comment 15
2010-05-21 21:27:50 PDT
(In reply to
comment #14
)
> (From update of
attachment 56567
[details]
) > Thanks for the revised patch. Some suggestions (mostly about comments) to consider before landing. > > > +# Parse the next SVN diff footer from the given file handle, and advance > > +# the handle so the last line read is the first line after the footer. > > +# > > +# This subroutine dies if given leading junk. > > I would just say "Parse an SVN diff footer...." I would also not use the > phrase "leading junk" here since it is usually used in the context of > leading junk at the beginning of a patch (e.g. the e-mail stuff that appears > at the beginning of git-format-patch) rather than in the middle.
Will change.
> It might be nice to add a comment here that for the special case of an > SVN binary diff, there can be binary contents after the footer (so > technically speaking, the "Property changes" section is not the full > footer). Should we be calling this parseSvnPropertyChanges() since it's > not really always the footer?
I can add a comment about the binary diff. With regards to the name, what are your thoughts on parseSvnDiffPropertyChanges? This would follow our naming of parseDiff, parseDiffHeader, parseSvnDiffHeader, et cetera.
> > > + if (defined($_) && /^\Q$separator\E[\r\n]+$/) { > > Do you need to be using \Q/\E here? If so, I would add a comment why since > it's a bit more obscure.
I'll remove the \Q and \E.
> > > + # FIXME: We should expand this to support other SVN properties. > > + # Notice, we keep processing until we hit end-of-file or some > > I would add a blank line after FIXME line so it's clearer that the second > and subsequent lines is not part of the FIXME.
Will change.
> Also, did you want to hint at possible implementation for other SVN > properties -- e.g. by returning a hash of property key-values that > represents all properties? The line you wrote makes it seem like the > FIXME is to make the function explicitly more aware of other properties. > In some sense, returning a hash makes the function less aware of the > specific properties since the special handling of svn:executable will be > going in the calling code with the implementation we've discussed > (returning a hash).
Sure, I'll elaborate a bit further in the FIXME.
> > > + # FIXME: Should the diff specify multiple values for svn:executable > > + # (hence be malformed) then we set the value to the last > > + # svn:executable property. We may want to consider some > > + # better error handling. > > I probably wouldn't bother with this FIXME. We shouldn't need to be validating > every aspect of a patch. And "last wins" is a valid logic to follow.
Will remove.
> > > +#### > > +# Simple test cases > > +## > > This looks good. I like how you start the text closer in. :)
:-)
> > > +{ > > + # New test > > + diffName => "add svn:executable, followed by empty line and start of next property diff", > > + inputText => <<'END', > > +Property changes on: FileA > > +___________________________________________________________________ > > +Added: svn:executable > > + + * > > + > > +Property changes on: Makefile.shared > > +END > > Can this case ever happen? Is this what happens if a property change > is the only change to a file?
Yes, it can happen for the reason you stated. That is, a property change may be the only change to a file.
> > +#### > > +# Property value followed by empty line and start of binary patch > > +## > > I would say "binary contents" as binary patch makes it seem like it's the > start of a new diff.
Will change.
> > > +Added: documentation > > + + A sentence. > > This is not a complete sentence. JK. :)
Will change.
Chris Jerdonek
Comment 16
2010-05-21 22:00:26 PDT
(In reply to
comment #15
)
> > It might be nice to add a comment here that for the special case of an > > SVN binary diff, there can be binary contents after the footer (so > > technically speaking, the "Property changes" section is not the full > > footer). Should we be calling this parseSvnPropertyChanges() since it's > > not really always the footer? > > I can add a comment about the binary diff. With regards to the name, what are your thoughts on parseSvnDiffPropertyChanges? This would follow our naming of parseDiff, parseDiffHeader, parseSvnDiffHeader, et cetera.
In that case, perhaps parseSvnDiffProperties() to keep it a tad shorter?
> > > +Added: documentation > > > + + A sentence. > > > > This is not a complete sentence. JK. :) > > Will change.
That was a joke, so no need (JK = just kidding). But you can if you want.
Daniel Bates
Comment 17
2010-05-22 14:40:55 PDT
Committed
r60015
: <
http://trac.webkit.org/changeset/60015
>
Note
You need to
log in
before you can comment on or make changes to this bug.
Top of Page
Format For Printing
XML
Clone This Bug