After r257667, traces show 1.9x more CPU time is spent decoding IPC::SharedBufferDataReference objects during video playback on Netflix.
<rdar://problem/66637920>
Created attachment 406796 [details] Patch for EWS
Created attachment 406828 [details] Patch for EWS
Created attachment 406832 [details] Patch for EWS
Comment on attachment 406832 [details]
Patch for EWS

View in context: https://bugs.webkit.org/attachment.cgi?id=406832&action=review

> Source/WebKit/ChangeLog:24
> + 1. While both techniques copy the SharedBuffer only once during encoding, the new technique
> + combines the buffer's data segments (by calling SharedBuffer::data()) while the old
> + technique avoids this.

To fix a regression, a partial revert makes great sense. That said, should we also fix the encoder for SharedBuffer, so that it stops doing this inefficient thing?

> Source/WebKit/ChangeLog:30
> + 2. Once a SharedMemory handle is decoded and mapped, the new technique copies its memory
> + into a SharedBuffer for use by the message receiver. Unlike the old technique that
> + decodes as a pointer into the existing Decoder buffer, the new technique must decode a
> + handle, map a SharedMemory allocation, copy it into a new SharedBuffer on the heap, and
> + then destroy the handle and SharedMemory.

Same question here.

> Source/WebKit/Platform/IPC/LegacySharedBufferDataReference.h:36
> +class LegacySharedBufferDataReference {

Would be good to add a comment explaining why this should not be used, and what should be used instead.
Created attachment 406872 [details] Patch
(In reply to Geoffrey Garen from comment #5)
> Comment on attachment 406832 [details]
> Patch for EWS
>
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=406832&action=review
>
> > Source/WebKit/ChangeLog:24
> > + 1. While both techniques copy the SharedBuffer only once during encoding, the new technique
> > + combines the buffer's data segments (by calling SharedBuffer::data()) while the old
> > + technique avoids this.
>
> To fix a regression, a partial revert makes great sense. That said, should
> we also fix the encoder for SharedBuffer, so that it stops doing this
> inefficient thing?

Yes, I think we should fix the segment combining issue in a different bug.

> > Source/WebKit/ChangeLog:30
> > + 2. Once a SharedMemory handle is decoded and mapped, the new technique copies its memory
> > + into a SharedBuffer for use by the message receiver. Unlike the old technique that
> > + decodes as a pointer into the existing Decoder buffer, the new technique must decode a
> > + handle, map a SharedMemory allocation, copy it into a new SharedBuffer on the heap, and
> > + then destroy the handle and SharedMemory.
>
> Same question here.

Yes, also in a different bug. I'll file these and add FIXMEs.
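As an aside, the segment-combining cost discussed above can be sketched outside WebKit. This is a hypothetical illustration, not WebKit code: std::vector stands in for SharedBuffer's discontiguous data segments, and all names here are made up. Both functions produce identical bytes, but the flattening path (analogous to calling SharedBuffer::data() before encoding) walks every byte twice.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Illustrative stand-in (not WebKit code) for a segmented buffer like
// WebCore::SharedBuffer.
struct SegmentedBuffer {
    std::vector<std::vector<uint8_t>> segments;

    // The inefficient path: first combine the segments into one contiguous
    // allocation (analogous to SharedBuffer::data()), then copy that into
    // the outgoing message -- two passes over the bytes.
    std::vector<uint8_t> encodeByFlattening() const {
        std::vector<uint8_t> flat;
        for (const auto& segment : segments)
            flat.insert(flat.end(), segment.begin(), segment.end()); // copy #1
        return std::vector<uint8_t>(flat.begin(), flat.end());       // copy #2
    }

    // The cheaper alternative: reserve the total size up front and copy each
    // segment straight into the message, touching every byte exactly once.
    std::vector<uint8_t> encodeSegmentWise() const {
        size_t total = 0;
        for (const auto& segment : segments)
            total += segment.size();
        std::vector<uint8_t> message;
        message.reserve(total);
        for (const auto& segment : segments)
            message.insert(message.end(), segment.begin(), segment.end());
        return message;
    }
};
```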
(Note that https://bugs.webkit.org/attachment.cgi?id=406872 has a substantially reworded ChangeLog.)
Comment on attachment 406872 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=406872&action=review

r=me

> Source/WebKit/ChangeLog:43
> + LegacySharedBufferDataReference type, which reverts to the old behaior of producing an

behavior

> Source/WebKit/ChangeLog:44
> + Resolve this slowdown by reverting the argument encoding changes made to
> + IPC::SharedBufferDataReference in r257667. For compatibility with code added after r257667,
> + IPC::SharedBufferDataReference retains its new behavior of copying into a SharedBuffer when
> + decoded, but this type is now only used in cases where the SharedBuffer can be ref'd instead
> + of copied. Pre-r257667 instances of SharedBufferDataReference were changed to a new
> + LegacySharedBufferDataReference type, which reverts to the old behaior of producing an
> + IPC::DataReference when decoded, avoiding many copies.

So LegacySharedBufferDataReference is the fast IPC data type and SharedBufferDataReference is the slow IPC data type? That is... surprising.

Are we in danger of new adoptions of SharedBufferDataReference causing perf regressions?
Comment on attachment 406872 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=406872&action=review

Great work! So exciting to see this diagnosed.

> Source/WebKit/ChangeLog:3
> + REGRESSION (r257667): 1.9x more CPU time in IPC::SharedBufferDataReference decoding during Netflix playback

What’s our strategy to protect ourselves from making this kind of mistake again? Maybe we can find a way to count how often we copy data buffers during regression tests? There can’t be too many different places that do that "copy a lot of bytes" operation.

> Source/WebKit/ChangeLog:21
> + handle. It makes only one copy like the old encoder did, but pays the added cost of
> + combining SharedBuffer data segments prior to copying by needlessly calling
> + SharedBuffer::data() (this should be fixed independently).

Seems like we have a real opportunity there; doesn’t seem that hard to code!

> Source/WebKit/ChangeLog:43
> + LegacySharedBufferDataReference type, which reverts to the old behaior of producing an

I’m not sure that "Legacy" accurately captures the difference between these two. Normally we name something Legacy when the dependency on the old behavior is an unwanted remnant of the past, to eventually be eliminated. Here it seems from your description there are reasons to prefer the two different behaviors for different cases of sending data across the process boundary; ideally those would be reflected in the names.
Typo: "behaior"

> Tools/DumpRenderTree/DumpRenderTree.xcodeproj/project.pbxproj:414
> + BCB281EE0CFA713D007E533E /* Base.xcconfig */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = text.xcconfig; name = Base.xcconfig; path = mac/Configurations/Base.xcconfig; sourceTree = "<group>"; };
> + BCB281F00CFA713D007E533E /* DumpRenderTree.xcconfig */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = text.xcconfig; name = DumpRenderTree.xcconfig; path = mac/Configurations/DumpRenderTree.xcconfig; sourceTree = "<group>"; };
> + BCB282F40CFA7450007E533E /* DebugRelease.xcconfig */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = text.xcconfig; name = DebugRelease.xcconfig; path = mac/Configurations/DebugRelease.xcconfig; sourceTree = "<group>"; };
> + BCB283DE0CFA7C20007E533E /* TestNetscapePlugIn.xcconfig */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = text.xcconfig; name = TestNetscapePlugIn.xcconfig; path = mac/Configurations/TestNetscapePlugIn.xcconfig; sourceTree = "<group>"; };

Are these changes improvements? Is "4" a better file encoding than "30"? Or is this just a script doing some reverting?
(In reply to Geoffrey Garen from comment #9)
> So LegacySharedBufferDataReference is the fast IPC data type and
> SharedBufferDataReference is the slow IPC data type? That is... surprising.
>
> Are we in danger of new adoptions of SharedBufferDataReference causing perf
> regressions?

(In reply to Darin Adler from comment #10)
> I’m not sure that "Legacy" accurately captures the difference between these
> two. Normally we name something Legacy when the dependency on the old
> behavior is an unwanted remnant of the past, to eventually be eliminated.
> Here it seems from your description there are reasons to prefer the two
> different behaviors for different cases of sending data across the process
> boundary; ideally those would be reflected in the names.

How about I rename IPC::LegacySharedBufferDataReference to IPC::SharedBufferDataReference and call the current IPC::SharedBufferDataReference something like IPC::SharedBufferCopy?

The thinking that led to "legacy" was that while IPC::DataReference is currently the faster way to decode, IPC::SharedBufferDataReference provides the safer interface for message receivers: a SharedBuffer instead of a raw pointer to an unretained buffer. So decoding LegacySharedBufferDataReference as an IPC::DataReference felt like legacy behavior, but really what I want to convey is a faster-versus-safer tradeoff.

I think a modern design would phase out DataReference (and LegacySharedBufferDataReference) and make SharedBuffer faster to use for decoding. One thing that would help here is a type of SharedBuffer data segment that just refs a SharedMemory or IPC::Decoder, but that's out of scope for this change.
All of that sounds smart.
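For illustration, the "segment that just refs a SharedMemory or IPC::Decoder" idea floated above might look roughly like this. Everything here is hypothetical, not WebKit code: std::shared_ptr stands in for WTF-style ref counting, and a plain byte vector stands in for a SharedMemory mapping. The point is that the segment co-owns the backing storage rather than copying it, so no memcpy happens at decode time.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <vector>

// Hypothetical sketch: a buffer segment that refs its backing storage
// (a mapped SharedMemory region or a Decoder's buffer) instead of copying.
// None of these names exist in WebKit.
struct ZeroCopySegment {
    const uint8_t* data = nullptr;
    size_t size = 0;
    std::shared_ptr<const std::vector<uint8_t>> keepAlive; // co-owns the bytes
};

// Build a segment over a "mapping" without any memcpy; the segment's ref
// keeps the bytes alive even after the caller drops its own reference.
inline ZeroCopySegment adoptMapping(std::shared_ptr<const std::vector<uint8_t>> mapping)
{
    return { mapping->data(), mapping->size(), mapping };
}
```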
Created attachment 406980 [details] Patch for landing
Created attachment 406989 [details] Patch for EWS
Created attachment 406992 [details] Patch for landing
(In reply to Darin Adler from comment #10)
> Comment on attachment 406872 [details]
> Patch
>
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=406872&action=review
>
> Great work! So exciting to see this diagnosed.
>
> > Source/WebKit/ChangeLog:3
> > + REGRESSION (r257667): 1.9x more CPU time in IPC::SharedBufferDataReference decoding during Netflix playback
>
> What’s our strategy to protect ourselves from making this kind of mistake
> again?
>
> Maybe we can find a way to count how often we copy data buffers during
> regression tests? There can’t be too many different places that do that
> "copy a lot of bytes" operation.

One thing we could do is assert in IPC::SharedBufferCopy's destructor that m_buffer is either null or has more than one ref. That would detect cases where we created a SharedBuffer that the message receiver never ref'd or moved.
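A rough sketch of that destructor assertion, with std::shared_ptr::use_count() standing in for SharedBuffer's ref count. The class and method names below are illustrative, not WebKit's: if the wrapper still holds the only reference when it dies, the expensive decode-map-copy produced a buffer nobody consumed.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <vector>

// Hypothetical sketch (not WebKit code) of the proposed debug check: on
// destruction, assert the decoded buffer is either null or referenced by
// someone other than us, i.e. the message receiver actually ref'd or moved it.
class BufferCopyCheck {
public:
    explicit BufferCopyCheck(std::shared_ptr<std::vector<uint8_t>> buffer)
        : m_buffer(std::move(buffer)) { }

    ~BufferCopyCheck()
    {
        // Holding the only reference here means the decoded copy was wasted.
        assert(!m_buffer || m_buffer.use_count() > 1);
    }

    // The receiver "consumes" the buffer by taking a ref of its own.
    std::shared_ptr<std::vector<uint8_t>> take() { return m_buffer; }

private:
    std::shared_ptr<std::vector<uint8_t>> m_buffer;
};
```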
Committed r266003: <https://trac.webkit.org/changeset/266003>

All reviewed patches have been landed. Closing bug and clearing flags on attachment 406992 [details].