Bug 239412

Summary: Drop String::truncate() and use String::left() instead
Product: WebKit Reporter: Chris Dumez <cdumez>
Component: Web Template FrameworkAssignee: Chris Dumez <cdumez>
Status: RESOLVED FIXED    
Severity: Normal CC: achristensen, benjamin, changseok, cmarcelo, darin, eric.carlson, esprehn+autocc, ews-watchlist, ggaren, glenn, gyuyoung.kim, japhet, jer.noble, kangil.han, mifenton, philipj, sam, sergio, webkit-bug-importer
Priority: P2 Keywords: InRadar
Version: WebKit Nightly Build   
Hardware: Unspecified   
OS: Unspecified   
Attachments:
Description Flags
Patch
none
Patch none

Description Chris Dumez 2022-04-15 17:13:24 PDT
String::truncate() and String::left() have identical behavior. The only difference is that truncate() modifies the String in place (which is a bit confusing), while left() returns a new String, without modifying the original.
To simplify our API, I am dropping String::truncate().
Comment 1 Chris Dumez 2022-04-15 17:16:44 PDT
Created attachment 457737 [details]
Patch
Comment 2 Darin Adler 2022-04-16 14:11:05 PDT
Comment on attachment 457737 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=457737&action=review

> Source/WebCore/dom/FragmentDirectiveParser.cpp:105
> +            firstToken = firstToken.left(tokens.first().length() - 2);
>              
>              if (auto prefix = WTF::URLParser::formURLDecode(tokens.takeFirst()))

It’s unnecessary to truncate in place. Instead it can be more like this:

    auto takenFirstToken = tokens.takeFirst();
    if (auto prefix = URLParser::formURLDecode(StringView(takenFirstToken).left(firstToken.length() - 2)))

Notice that this means we don’t have to call String::left, which allocates memory!

> Source/WebCore/html/HTMLImageElement.cpp:216
>              String type = typeAttribute.string();
> -            type.truncate(type.find(';'));
> -            type = stripLeadingAndTrailingHTMLSpaces(type);
> +            type = stripLeadingAndTrailingHTMLSpaces(type.left(type.find(';')));
>              if (!type.isEmpty() && !MIMETypeRegistry::isSupportedImageVideoOrSVGMIMEType(type))
>                  continue;

This allocates an intermediate string after finding the ";", but before stripping the spaces, so it can allocate two strings. We can do this without double allocation.

    StringView typeView = typeAttribute;
    typeView = typeView.left(typeView.find(';)).stripLeadingAndTrailingMatchedCharacters(isHTMLSpace<UChar>);
    if (!typeView.isEmpty()) {
        auto typeString = typeView == typeAttribute ? typeAttribute.string() : typeView.toString();
        if (!MIMETypeRegistry::isSupportedImageVideoOrSVGMIMEType(typeString))
            continue;
    }

But unfortunately it’s longer code. Not sure how you feel about this.

> Source/WebCore/html/TextFieldInputType.cpp:627
> +    eventText = eventText.left(textLength).replace("\r\n", " ").replace('\r', ' ').replace('\n', ' ');

New code is no worse, but so inefficient to make so many buffers! This operation could be written to use a maximum of one buffer by working on a copy of the string in place, but instead way this code works, we can have up to 3 intermediate String objects plus the final String object passed into limitText, which might create one more. A worst case of 5 malloc/free pairs instead of 1. Probably doesn’t matter, though.

> Source/WebCore/loader/FrameLoader.cpp:760
> +            headerContentLanguage = stripLeadingAndTrailingHTMLSpaces(headerContentLanguage.left(headerContentLanguage.find(','))); // notFound == -1 == don't truncate.

This can allocate two strings instead of 1. This is the exact same operation as HTMLImageElement::bestFitSourceFromPictureElement, but with "," instead of ";", with the same opportunity for a better algorithm.

> Source/WebCore/platform/graphics/avfoundation/objc/AVAssetTrackUtilities.mm:49
> +            auto codecStringView = StringView(codecString).left(codecString.find('.'));
> +            auto codecIdentifier = FourCC::fromString(codecStringView.left(4));

How about a one liner:

    if (auto identifier = FourCC::fromString(StringView(codecString).left(codecString.find('.')).left(4)))

> Source/WebKitLegacy/win/WebDownloadCFNet.cpp:195
> +        m_destination = StringView(m_bundlePath).left(m_destination.length() - DownloadBundle::fileExtension().length()).toString();

I’m not sure I understand why isolatedCopy is needed. If it’s not, then we should use String::left instead of StringView::left.

If it is needed, then not thrilled that it’s now hidden in the use of StringView::toString. It good for efficiency that we don’t make an extra copy, though, if we do need it!

> Tools/WebKitTestRunner/InjectedBundle/InjectedBundlePage.cpp:1342
> +    messageString = messageString.left(messageString.find(UChar(0)));

Could use nullCharacter from CharacterNames.h instead of UChar(0).
Comment 3 Chris Dumez 2022-04-16 15:36:52 PDT
(In reply to Darin Adler from comment #2)
> Comment on attachment 457737 [details]
> Patch
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=457737&action=review
> 
> > Source/WebCore/dom/FragmentDirectiveParser.cpp:105
> > +            firstToken = firstToken.left(tokens.first().length() - 2);
> >              
> >              if (auto prefix = WTF::URLParser::formURLDecode(tokens.takeFirst()))
> 
> It’s unnecessary to truncate in place. Instead it can be more like this:
> 
>     auto takenFirstToken = tokens.takeFirst();
>     if (auto prefix =
> URLParser::formURLDecode(StringView(takenFirstToken).left(firstToken.
> length() - 2)))
> 
> Notice that this means we don’t have to call String::left, which allocates
> memory!
> 
> > Source/WebCore/html/HTMLImageElement.cpp:216
> >              String type = typeAttribute.string();
> > -            type.truncate(type.find(';'));
> > -            type = stripLeadingAndTrailingHTMLSpaces(type);
> > +            type = stripLeadingAndTrailingHTMLSpaces(type.left(type.find(';')));
> >              if (!type.isEmpty() && !MIMETypeRegistry::isSupportedImageVideoOrSVGMIMEType(type))
> >                  continue;
> 
> This allocates an intermediate string after finding the ";", but before
> stripping the spaces, so it can allocate two strings. We can do this without
> double allocation.
> 
>     StringView typeView = typeAttribute;
>     typeView =
> typeView.left(typeView.find(';)).
> stripLeadingAndTrailingMatchedCharacters(isHTMLSpace<UChar>);
>     if (!typeView.isEmpty()) {
>         auto typeString = typeView == typeAttribute ? typeAttribute.string()
> : typeView.toString();
>         if
> (!MIMETypeRegistry::isSupportedImageVideoOrSVGMIMEType(typeString))
>             continue;
>     }
> 
> But unfortunately it’s longer code. Not sure how you feel about this.
> 
> > Source/WebCore/html/TextFieldInputType.cpp:627
> > +    eventText = eventText.left(textLength).replace("\r\n", " ").replace('\r', ' ').replace('\n', ' ');
> 
> New code is no worse, but so inefficient to make so many buffers! This
> operation could be written to use a maximum of one buffer by working on a
> copy of the string in place, but instead way this code works, we can have up
> to 3 intermediate String objects plus the final String object passed into
> limitText, which might create one more. A worst case of 5 malloc/free pairs
> instead of 1. Probably doesn’t matter, though.
> 
> > Source/WebCore/loader/FrameLoader.cpp:760
> > +            headerContentLanguage = stripLeadingAndTrailingHTMLSpaces(headerContentLanguage.left(headerContentLanguage.find(','))); // notFound == -1 == don't truncate.
> 
> This can allocate two strings instead of 1. This is the exact same operation
> as HTMLImageElement::bestFitSourceFromPictureElement, but with "," instead
> of ";", with the same opportunity for a better algorithm.
> 
> > Source/WebCore/platform/graphics/avfoundation/objc/AVAssetTrackUtilities.mm:49
> > +            auto codecStringView = StringView(codecString).left(codecString.find('.'));
> > +            auto codecIdentifier = FourCC::fromString(codecStringView.left(4));
> 
> How about a one liner:
> 
>     if (auto identifier =
> FourCC::fromString(StringView(codecString).left(codecString.find('.')).
> left(4)))
> 
> > Source/WebKitLegacy/win/WebDownloadCFNet.cpp:195
> > +        m_destination = StringView(m_bundlePath).left(m_destination.length() - DownloadBundle::fileExtension().length()).toString();
> 
> I’m not sure I understand why isolatedCopy is needed. If it’s not, then we
> should use String::left instead of StringView::left.
> 
> If it is needed, then not thrilled that it’s now hidden in the use of
> StringView::toString. It good for efficiency that we don’t make an extra
> copy, though, if we do need it!

I only dropped it because StringView().toString() was going to create a fresh (thus isolated) String anyway. That said, you are right that it is a bit obscure. I am going to use StringView().toString().isolatedCopy() for clarity. There should be no cost in perf anyway because isolatedCopy() will get optimized out since this is an r-value reference with a refcount of 1.

> 
> > Tools/WebKitTestRunner/InjectedBundle/InjectedBundlePage.cpp:1342
> > +    messageString = messageString.left(messageString.find(UChar(0)));
> 
> Could use nullCharacter from CharacterNames.h instead of UChar(0).
Comment 4 Chris Dumez 2022-04-16 15:43:11 PDT
Created attachment 457757 [details]
Patch
Comment 5 EWS 2022-04-16 20:57:43 PDT
Committed r292945 (249710@main): <https://commits.webkit.org/249710@main>

All reviewed patches have been landed. Closing bug and clearing flags on attachment 457757 [details].
Comment 6 Radar WebKit Bug Importer 2022-04-16 20:58:14 PDT
<rdar://problem/91856687>