After r257667, traces show 1.9x more CPU time is spent decoding IPC::SharedBufferDataReference objects during video playback on Netflix.
<rdar://problem/66637920>
Created attachment 406796 [details] Patch for EWS
Created attachment 406828 [details] Patch for EWS
Created attachment 406832 [details] Patch for EWS
Comment on attachment 406832 [details]
Patch for EWS

View in context: https://bugs.webkit.org/attachment.cgi?id=406832&action=review

> Source/WebKit/ChangeLog:24
> + 1. While both techniques copy the SharedBuffer only once during encoding, the new technique
> + combines the buffer's data segments (by calling SharedBuffer::data()) while the old
> + technique avoids this.

To fix a regression, a partial revert makes great sense. That said, should we also fix the encoder for SharedBuffer, so that it stops doing this inefficient thing?

> Source/WebKit/ChangeLog:30
> + 2. Once a SharedMemory handle is decoded and mapped, the new technique copies its memory
> + into a SharedBuffer for use by the message receiver. Unlike the old technique that
> + decodes as a pointer into the existing Decoder buffer, the new technique must decode a
> + handle, map a SharedMemory allocation, copy it into a new SharedBuffer on the heap, and
> + then destroy the handle and SharedMemory.

Same question here.

> Source/WebKit/Platform/IPC/LegacySharedBufferDataReference.h:36
> +class LegacySharedBufferDataReference {

Would be good to add a comment explaining why this should not be used, and what should be used instead.
Created attachment 406872 [details] Patch
(In reply to Geoffrey Garen from comment #5)
> Comment on attachment 406832 [details]
> Patch for EWS
>
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=406832&action=review
>
> > Source/WebKit/ChangeLog:24
> > + 1. While both techniques copy the SharedBuffer only once during encoding, the new technique
> > + combines the buffer's data segments (by calling SharedBuffer::data()) while the old
> > + technique avoids this.
>
> To fix a regression, a partial revert makes great sense. That said, should
> we also fix the encoder for SharedBuffer, so that it stops doing this
> inefficient thing?

Yes, I think we should fix the segment combining issue in a different bug.

> > Source/WebKit/ChangeLog:30
> > + 2. Once a SharedMemory handle is decoded and mapped, the new technique copies its memory
> > + into a SharedBuffer for use by the message receiver. Unlike the old technique that
> > + decodes as a pointer into the existing Decoder buffer, the new technique must decode a
> > + handle, map a SharedMemory allocation, copy it into a new SharedBuffer on the heap, and
> > + then destroy the handle and SharedMemory.
>
> Same question here.

Yes, also in a different bug. I'll file these and add FIXMEs.
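As an aside, the segment-combining cost discussed above can be sketched outside WebKit. This is a hypothetical illustration, not WebKit code: std::vector stands in for SharedBuffer's discontiguous data segments, and all names here are made up. Both functions produce identical bytes, but the flattening path (analogous to calling SharedBuffer::data() before encoding) walks every byte twice.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Illustrative stand-in (not WebKit code) for a segmented buffer like
// WebCore::SharedBuffer.
struct SegmentedBuffer {
    std::vector<std::vector<uint8_t>> segments;

    // The inefficient path: first combine the segments into one contiguous
    // allocation (analogous to SharedBuffer::data()), then copy that into
    // the outgoing message -- two passes over the bytes.
    std::vector<uint8_t> encodeByFlattening() const {
        std::vector<uint8_t> flat;
        for (const auto& segment : segments)
            flat.insert(flat.end(), segment.begin(), segment.end()); // copy #1
        return std::vector<uint8_t>(flat.begin(), flat.end());       // copy #2
    }

    // The cheaper alternative: reserve the total size up front and copy each
    // segment straight into the message, touching every byte exactly once.
    std::vector<uint8_t> encodeSegmentWise() const {
        size_t total = 0;
        for (const auto& segment : segments)
            total += segment.size();
        std::vector<uint8_t> message;
        message.reserve(total);
        for (const auto& segment : segments)
            message.insert(message.end(), segment.begin(), segment.end());
        return message;
    }
};
```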
(Note that https://bugs.webkit.org/attachment.cgi?id=406872 has a substantially reworded ChangeLog.)
Comment on attachment 406872 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=406872&action=review

r=me

> Source/WebKit/ChangeLog:43
> + LegacySharedBufferDataReference type, which reverts to the old behaior of producing an

behavior

> Source/WebKit/ChangeLog:44
> + Resolve this slowdown by reverting the argument encoding changes made to
> + IPC::SharedBufferDataReference in r257667. For compatibility with code added after r257667,
> + IPC::SharedBufferDataReference retains its new behavior of copying into a SharedBuffer when
> + decoded, but this type is now only used in cases where the SharedBuffer can be ref'd instead
> + of copied. Pre-r257667 instances of SharedBufferDataReference were changed to a new
> + LegacySharedBufferDataReference type, which reverts to the old behaior of producing an
> + IPC::DataReference when decoded, avoiding many copies.

So LegacySharedBufferDataReference is the fast IPC data type and SharedBufferDataReference is the slow IPC data type? That is... surprising.

Are we in danger of new adoptions of SharedBufferDataReference causing perf regressions?
Comment on attachment 406872 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=406872&action=review

Great work! So exciting to see this diagnosed.

> Source/WebKit/ChangeLog:3
> + REGRESSION (r257667): 1.9x more CPU time in IPC::SharedBufferDataReference decoding during Netflix playback

What’s our strategy to protect ourselves from making this kind of mistake again? Maybe we can find a way to count how often we copy data buffers during regression tests? There can’t be too many different places that do that "copy a lot of bytes" operation.

> Source/WebKit/ChangeLog:21
> + handle. It makes only one copy like the old encoder did, but pays the added cost of
> + combining SharedBuffer data segments prior to copying by needlessly calling
> + SharedBuffer::data() (this should be fixed independently).

Seems like we have a real opportunity there; doesn’t seem that hard to code!

> Source/WebKit/ChangeLog:43
> + LegacySharedBufferDataReference type, which reverts to the old behaior of producing an

I’m not sure that "Legacy" accurately captures the difference between these two. Normally we name something Legacy when the dependency on the old behavior is an unwanted remnant of the past, to eventually be eliminated. Here it seems from your description there are reasons to prefer the two different behaviors for different cases of sending data across the process boundary; ideally those would be reflected in the names.
Typo: "behaior"

> Tools/DumpRenderTree/DumpRenderTree.xcodeproj/project.pbxproj:414
> + BCB281EE0CFA713D007E533E /* Base.xcconfig */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = text.xcconfig; name = Base.xcconfig; path = mac/Configurations/Base.xcconfig; sourceTree = "<group>"; };
> + BCB281F00CFA713D007E533E /* DumpRenderTree.xcconfig */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = text.xcconfig; name = DumpRenderTree.xcconfig; path = mac/Configurations/DumpRenderTree.xcconfig; sourceTree = "<group>"; };
> + BCB282F40CFA7450007E533E /* DebugRelease.xcconfig */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = text.xcconfig; name = DebugRelease.xcconfig; path = mac/Configurations/DebugRelease.xcconfig; sourceTree = "<group>"; };
> + BCB283DE0CFA7C20007E533E /* TestNetscapePlugIn.xcconfig */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = text.xcconfig; name = TestNetscapePlugIn.xcconfig; path = mac/Configurations/TestNetscapePlugIn.xcconfig; sourceTree = "<group>"; };

Are these changes improvements? Is "4" a better file encoding than "30"? Or is this just a script doing some reverting?
(In reply to Geoffrey Garen from comment #9)
> So LegacySharedBufferDataReference is the fast IPC data type and
> SharedBufferDataReference is the slow IPC data type? That is... surprising.
>
> Are we in danger of new adoptions of SharedBufferDataReference causing perf
> regressions?

(In reply to Darin Adler from comment #10)
> I’m not sure that "Legacy" accurately captures the difference between these
> two. Normally we name something Legacy when the dependency on the old
> behavior is an unwanted remnant of the past, to eventually be eliminated.
> Here it seems from your description there are reasons to prefer the two
> different behaviors for different cases of sending data across the process
> boundary; ideally those would be reflected in the names.

How about I rename IPC::LegacySharedBufferDataReference to IPC::SharedBufferDataReference and call the current IPC::SharedBufferDataReference something like IPC::SharedBufferCopy?

The thinking that led to "legacy" was that while IPC::DataReference is currently the faster way to decode, IPC::SharedBufferDataReference provides the safer interface for message receivers: a SharedBuffer instead of a raw pointer to an unretained buffer. So decoding LegacySharedBufferDataReference as an IPC::DataReference felt like legacy behavior, but really what I want to convey is a faster-versus-safer tradeoff.

I think a modern design would phase out DataReference (and LegacySharedBufferDataReference) and make SharedBuffer faster to use for decoding. One thing that would help here is a type of SharedBuffer data segment that just refs a SharedMemory or IPC::Decoder, but that's out of scope for this change.
All of that sounds smart.
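For illustration, the "segment that just refs a SharedMemory or IPC::Decoder" idea floated above might look roughly like this. Everything here is hypothetical, not WebKit code: std::shared_ptr stands in for WTF-style ref counting, and a plain byte vector stands in for a SharedMemory mapping. The point is that the segment co-owns the backing storage rather than copying it, so no memcpy happens at decode time.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <vector>

// Hypothetical sketch: a buffer segment that refs its backing storage
// (a mapped SharedMemory region or a Decoder's buffer) instead of copying.
// None of these names exist in WebKit.
struct ZeroCopySegment {
    const uint8_t* data = nullptr;
    size_t size = 0;
    std::shared_ptr<const std::vector<uint8_t>> keepAlive; // co-owns the bytes
};

// Build a segment over a "mapping" without any memcpy; the segment's ref
// keeps the bytes alive even after the caller drops its own reference.
inline ZeroCopySegment adoptMapping(std::shared_ptr<const std::vector<uint8_t>> mapping)
{
    return { mapping->data(), mapping->size(), mapping };
}
```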
Created attachment 406980 [details] Patch for landing
Created attachment 406989 [details] Patch for EWS
Created attachment 406992 [details] Patch for landing
(In reply to Darin Adler from comment #10)
> Comment on attachment 406872 [details]
> Patch
>
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=406872&action=review
>
> Great work! So exciting to see this diagnosed.
>
> > Source/WebKit/ChangeLog:3
> > + REGRESSION (r257667): 1.9x more CPU time in IPC::SharedBufferDataReference decoding during Netflix playback
>
> What’s our strategy to protect ourselves from making this kind of mistake
> again?
>
> Maybe we can find a way to count how often we copy data buffers during
> regression tests? There can’t be too many different places that do that
> "copy a lot of bytes" operation.

One thing we could do is assert in IPC::SharedBufferCopy's destructor that m_buffer is either null or has more than one ref. That would detect cases where we created a SharedBuffer that the message receiver never ref'd or moved.
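A rough sketch of that destructor assertion, with std::shared_ptr::use_count() standing in for SharedBuffer's ref count. The class and method names below are illustrative, not WebKit's: if the wrapper still holds the only reference when it dies, the expensive decode-map-copy produced a buffer nobody consumed.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <vector>

// Hypothetical sketch (not WebKit code) of the proposed debug check: on
// destruction, assert the decoded buffer is either null or referenced by
// someone other than us, i.e. the message receiver actually ref'd or moved it.
class BufferCopyCheck {
public:
    explicit BufferCopyCheck(std::shared_ptr<std::vector<uint8_t>> buffer)
        : m_buffer(std::move(buffer)) { }

    ~BufferCopyCheck()
    {
        // Holding the only reference here means the decoded copy was wasted.
        assert(!m_buffer || m_buffer.use_count() > 1);
    }

    // The receiver "consumes" the buffer by taking a ref of its own.
    std::shared_ptr<std::vector<uint8_t>> take() { return m_buffer; }

private:
    std::shared_ptr<std::vector<uint8_t>> m_buffer;
};
```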
Committed r266003: <https://trac.webkit.org/changeset/266003>

All reviewed patches have been landed. Closing bug and clearing flags on attachment 406992 [details].