| Summary: | Leverage StringView in more places to avoid some String allocations | ||||||||
|---|---|---|---|---|---|---|---|---|---|
| Product: | WebKit | Reporter: | Chris Dumez <cdumez> | ||||||
| Component: | WebCore Misc. | Assignee: | Chris Dumez <cdumez> | ||||||
| Status: | RESOLVED FIXED | ||||||||
| Severity: | Normal | CC: | achristensen, benjamin, calvaris, changseok, cmarcelo, darin, dino, eric.carlson, esprehn+autocc, ews-watchlist, ggaren, glenn, gyuyoung.kim, hi, hta, japhet, jer.noble, joepeck, kangil.han, keith_miller, kondapallykalyan, macpherson, mark.lam, menard, msaboff, pangle, pdr, philipj, saam, sam, sergio, tommyw, tzagallo, webkit-bug-importer | ||||||
| Priority: | P2 | Keywords: | InRadar | ||||||
| Version: | WebKit Nightly Build | ||||||||
| Hardware: | Unspecified | ||||||||
| OS: | Unspecified | ||||||||
| Attachments: |
|
||||||||
|
Description
Chris Dumez
2022-04-14 15:41:18 PDT
Created attachment 457653 [details]
Patch
Comment on attachment 457653 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=457653&action=review > Source/JavaScriptCore/inspector/ContentSearchUtilities.cpp:94 > + result.append(std::pair { lineNumber, line.toString() }); I guess we do need the explicit std::pair; I would omit it if we could. > Source/WebCore/Modules/fetch/FetchBodyConsumer.cpp:167 > + contentType = StringView(header).substring(contentTypeBegin + contentTypePrefixLength, contentTypeEnd - contentTypeBegin - contentTypePrefixLength).stripLeadingAndTrailingMatchedCharacters(isHTTPSpace).toString(); Is there a very-common case where the header string happens to have nothing that need to be trimmed, no "\r\n", no spaces? I ask because that is the case will become *less* efficient, now copying the String. I wish switching to StringView didn’t make the stripLeadingAndTrailingHTTPSpaces() function call *so* much longer stripLeadingAndTrailingMatchedCharacters(isHTTPSpace). On reflection, seems unnecessarily to have "matched characters" in that function name. > Source/WebCore/Modules/mediastream/libwebrtc/LibWebRTCRtpTransceiverBackend.cpp:94 > + rtcCodec.name = StringView(codec.mimeType).substring(6).utf8().data(); Since we are setting a std::string here it would be better if CString knew how to convert to std::string without writing .data(), which makes the code look unnecessarily dangerous, although it’s actually correct and fine. Kind of annoying that we need both CString and std::string in WebKit. Longer term, I wonder how many unique great things CString has; maybe we can drop it and switch to std::string. It uses our faster malloc, but otherwise seems to have no significantly different features from std::string. > Source/WebCore/html/HTMLMapElement.cpp:103 > m_name = mapName; WTFMove(mapName) > Source/WebCore/page/Location.cpp:202 > + if (!hash.isEmpty() && hash[0] == '#') We have startsWith(UChar) function for cases like this. > Source/WebCore/page/UserContentURLPattern.h:37 > + UserContentURLPattern(StringView pattern) Should we add explicit? > Source/WebCore/page/UserContentURLPattern.h:38 > : m_matchSubdomains(false) Normally we indent. > Source/WebCore/platform/network/HTTPParsers.cpp:681 > // FIXME: The rest of this should use StringView. Does this comment still make sense? Comment on attachment 457653 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=457653&action=review >> Source/JavaScriptCore/inspector/ContentSearchUtilities.cpp:94 >> + result.append(std::pair { lineNumber, line.toString() }); > > I guess we do need the explicit std::pair; I would omit it if we could. You're probably right that we can. I'll try. >> Source/WebCore/Modules/fetch/FetchBodyConsumer.cpp:167 >> + contentType = StringView(header).substring(contentTypeBegin + contentTypePrefixLength, contentTypeEnd - contentTypeBegin - contentTypePrefixLength).stripLeadingAndTrailingMatchedCharacters(isHTTPSpace).toString(); > > Is there a very-common case where the header string happens to have nothing that need to be trimmed, no "\r\n", no spaces? I ask because that is the case will become *less* efficient, now copying the String. > > I wish switching to StringView didn’t make the stripLeadingAndTrailingHTTPSpaces() function call *so* much longer stripLeadingAndTrailingMatchedCharacters(isHTTPSpace). On reflection, seems unnecessarily to have "matched characters" in that function name. I was careful about that and I don't think that's true? Previous code had a |header| String, then was allocating a new temporary String (let's call it |tmp|), then passing |tmp| to stripLeadingAndTrailingHTTPSpaces(), which in the common case would return |tmp| (because this case is optimized). However, there was still one String allocation for |tmp|, even in this optimized "common" case. With my code, we convert |header| to a StringView, do all operations on the StringView and then construct a String at the end. Still only one String allocation, and one String allocation is all cases, not just the "common" case. The one case where the previous code could be more efficient is if both the subString() AND the stripLeadingAndTrailingHTTPSpaces() are NO-OPs. But given the logic in this function, I don't think it is common for the substring() to return the original String as-is. Did I miss something? >> Source/WebCore/Modules/mediastream/libwebrtc/LibWebRTCRtpTransceiverBackend.cpp:94 >> + rtcCodec.name = StringView(codec.mimeType).substring(6).utf8().data(); > > Since we are setting a std::string here it would be better if CString knew how to convert to std::string without writing .data(), which makes the code look unnecessarily dangerous, although it’s actually correct and fine. Kind of annoying that we need both CString and std::string in WebKit. Longer term, I wonder how many unique great things CString has; maybe we can drop it and switch to std::string. It uses our faster malloc, but otherwise seems to have no significantly different features from std::string. The reason std::string is used here is because of libwebrtc here I believe. I agree it'd be nice if CString would know how to convert by itself though. Will look into that. >> Source/WebCore/html/HTMLMapElement.cpp:103 >> m_name = mapName; > > WTFMove(mapName) Indeed. Comment on attachment 457653 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=457653&action=review >>> Source/WebCore/Modules/fetch/FetchBodyConsumer.cpp:167 >>> + contentType = StringView(header).substring(contentTypeBegin + contentTypePrefixLength, contentTypeEnd - contentTypeBegin - contentTypePrefixLength).stripLeadingAndTrailingMatchedCharacters(isHTTPSpace).toString(); >> >> Is there a very-common case where the header string happens to have nothing that need to be trimmed, no "\r\n", no spaces? I ask because that is the case will become *less* efficient, now copying the String. >> >> I wish switching to StringView didn’t make the stripLeadingAndTrailingHTTPSpaces() function call *so* much longer stripLeadingAndTrailingMatchedCharacters(isHTTPSpace). On reflection, seems unnecessarily to have "matched characters" in that function name. > > I was careful about that and I don't think that's true? > > Previous code had a |header| String, then was allocating a new temporary String (let's call it |tmp|), then passing |tmp| to stripLeadingAndTrailingHTTPSpaces(), which in the common case would return |tmp| (because this case is optimized). However, there was still one String allocation for |tmp|, even in this optimized "common" case. > > With my code, we convert |header| to a StringView, do all operations on the StringView and then construct a String at the end. Still only one String allocation, and one String allocation is all cases, not just the "common" case. > > The one case where the previous code could be more efficient is if both the subString() AND the stripLeadingAndTrailingHTTPSpaces() are NO-OPs. But given the logic in this function, I don't think it is common for the substring() to return the original String as-is. > > Did I miss something? I didn’t spot why it’s not common for substring to return the original header as-is; I had guessed there might be a common degenerate case. But I trust you on that. Comment on attachment 457653 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=457653&action=review >>>> Source/WebCore/Modules/fetch/FetchBodyConsumer.cpp:167 >>>> + contentType = StringView(header).substring(contentTypeBegin + contentTypePrefixLength, contentTypeEnd - contentTypeBegin - contentTypePrefixLength).stripLeadingAndTrailingMatchedCharacters(isHTTPSpace).toString(); >>> >>> Is there a very-common case where the header string happens to have nothing that need to be trimmed, no "\r\n", no spaces? I ask because that is the case will become *less* efficient, now copying the String. >>> >>> I wish switching to StringView didn’t make the stripLeadingAndTrailingHTTPSpaces() function call *so* much longer stripLeadingAndTrailingMatchedCharacters(isHTTPSpace). On reflection, seems unnecessarily to have "matched characters" in that function name. >> >> I was careful about that and I don't think that's true? >> >> Previous code had a |header| String, then was allocating a new temporary String (let's call it |tmp|), then passing |tmp| to stripLeadingAndTrailingHTTPSpaces(), which in the common case would return |tmp| (because this case is optimized). However, there was still one String allocation for |tmp|, even in this optimized "common" case. >> >> With my code, we convert |header| to a StringView, do all operations on the StringView and then construct a String at the end. Still only one String allocation, and one String allocation is all cases, not just the "common" case. >> >> The one case where the previous code could be more efficient is if both the subString() AND the stripLeadingAndTrailingHTTPSpaces() are NO-OPs. But given the logic in this function, I don't think it is common for the substring() to return the original String as-is. >> >> Did I miss something? > > I didn’t spot why it’s not common for substring to return the original header as-is; I had guessed there might be a common degenerate case. But I trust you on that. The header always starts with: const char* contentTypeCharacters = "Content-Type:"; The substring() is meant to get the part of the header line after the "Content-Type:", thus the contentTypePrefixLength in the substring call. Only way substring() would return the same string is if `contentTypeBegin + contentTypePrefixLength` was 0 and `contentTypeEnd - contentTypeBegin - contentTypePrefixLength` was the length of the whole string (or greater). contentTypeBegin + contentTypePrefixLength cannot be 0 because contentTypePrefixLength is a constant greater than 0 (13). Comment on attachment 457653 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=457653&action=review >>>>> Source/WebCore/Modules/fetch/FetchBodyConsumer.cpp:167 >>>>> + contentType = StringView(header).substring(contentTypeBegin + contentTypePrefixLength, contentTypeEnd - contentTypeBegin - contentTypePrefixLength).stripLeadingAndTrailingMatchedCharacters(isHTTPSpace).toString(); >>>> >>>> Is there a very-common case where the header string happens to have nothing that need to be trimmed, no "\r\n", no spaces? I ask because that is the case will become *less* efficient, now copying the String. >>>> >>>> I wish switching to StringView didn’t make the stripLeadingAndTrailingHTTPSpaces() function call *so* much longer stripLeadingAndTrailingMatchedCharacters(isHTTPSpace). On reflection, seems unnecessarily to have "matched characters" in that function name. >>> >>> I was careful about that and I don't think that's true? >>> >>> Previous code had a |header| String, then was allocating a new temporary String (let's call it |tmp|), then passing |tmp| to stripLeadingAndTrailingHTTPSpaces(), which in the common case would return |tmp| (because this case is optimized). However, there was still one String allocation for |tmp|, even in this optimized "common" case. >>> >>> With my code, we convert |header| to a StringView, do all operations on the StringView and then construct a String at the end. Still only one String allocation, and one String allocation is all cases, not just the "common" case. >>> >>> The one case where the previous code could be more efficient is if both the subString() AND the stripLeadingAndTrailingHTTPSpaces() are NO-OPs. But given the logic in this function, I don't think it is common for the substring() to return the original String as-is. >>> >>> Did I miss something? >> >> I didn’t spot why it’s not common for substring to return the original header as-is; I had guessed there might be a common degenerate case. But I trust you on that. > > The header always starts with: > const char* contentTypeCharacters = "Content-Type:"; > > The substring() is meant to get the part of the header line after the "Content-Type:", thus the contentTypePrefixLength in the substring call. Only way substring() would return the same string is if `contentTypeBegin + contentTypePrefixLength` was 0 and `contentTypeEnd - contentTypeBegin - contentTypePrefixLength` was the length of the whole string (or greater). > contentTypeBegin + contentTypePrefixLength cannot be 0 because contentTypePrefixLength is a constant greater than 0 (13). Got it. Created attachment 457731 [details]
Patch
Comment on attachment 457731 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=457731&action=review > Source/WTF/wtf/text/CString.h:75 > + std::string toStdString() const { return m_buffer ? std::string(m_buffer->data()) : std::string(); } Why isn’t the function body just: { return data(); } (In reply to Darin Adler from comment #8) > Comment on attachment 457731 [details] > Patch > > View in context: > https://bugs.webkit.org/attachment.cgi?id=457731&action=review > > > Source/WTF/wtf/text/CString.h:75 > > + std::string toStdString() const { return m_buffer ? std::string(m_buffer->data()) : std::string(); } > > Why isn’t the function body just: > > { return data(); } I wasn't sure it was OK to pass null to std::string(const char*). (In reply to Chris Dumez from comment #9) > (In reply to Darin Adler from comment #8) > > Comment on attachment 457731 [details] > > Patch > > > > View in context: > > https://bugs.webkit.org/attachment.cgi?id=457731&action=review > > > > > Source/WTF/wtf/text/CString.h:75 > > > + std::string toStdString() const { return m_buffer ? std::string(m_buffer->data()) : std::string(); } > > > > Why isn’t the function body just: > > > > { return data(); } > > I wasn't sure it was OK to pass null to std::string(const char*). "If s is a null pointer, if n == npos, or if the range specified by [first,last) is not valid, it causes undefined behavior." from https://www.cplusplus.com/reference/string/string/string/ Comment on attachment 457731 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=457731&action=review >>>> Source/WTF/wtf/text/CString.h:75 >>>> + std::string toStdString() const { return m_buffer ? std::string(m_buffer->data()) : std::string(); } >>> >>> Why isn’t the function body just: >>> >>> { return data(); } >> >> I wasn't sure it was OK to pass null to std::string(const char*). > > "If s is a null pointer, if n == npos, or if the range specified by [first,last) is not valid, it causes undefined behavior." > from https://www.cplusplus.com/reference/string/string/string/ Oh, the old code had that problem, then! It was just passing .data() without checking for null. But maybe that one particular call site knew it could never be null. Committed r292939 (249704@main): <https://commits.webkit.org/249704@main> All reviewed patches have been landed. Closing bug and clearing flags on attachment 457731 [details]. |