231607 – REGRESSION (iOS 15): Tab crashes when trying to render Projector stories

RESOLVED FIXED 231607

REGRESSION (iOS 15): Tab crashes when trying to render Projector stories

https://bugs.webkit.org/show_bug.cgi?id=231607

Summary REGRESSION (iOS 15): Tab crashes when trying to render Projector stories

Sidhu Alluri

Reported 2021-10-12 12:37:08 PDT

The following link crashes the iOS 15 Safari tab on iPhone X, 12 Mini, and 13. Works fine in all devices with iOS 14 or earlier. https://staging.projector.com/story/be1ca8c7-a46a-41fc-9ad6-de8b8f178a8b?scene=6f3e4207 After reloading once, the tab displays "A problem repeatedly occurred on https://staging.projector.com/story/be1ca8c7-a46a-41fc-9ad6-de8b8f178a8b?scene=6f3e4207" No console errors when trying to debug.

Attachments
Patch (4.59 KB, patch) 2021-10-28 17:17 PDT, Kyle Piddington	no flags	Details Formatted Diff Diff
Patch (8.39 KB, patch) 2021-10-29 20:03 PDT, Kyle Piddington	no flags	Details Formatted Diff Diff
Patch for landing (8.33 KB, patch) 2021-11-03 12:31 PDT, Kyle Piddington	no flags	Details Formatted Diff Diff
Show Obsolete (2) View All Add attachment proposed patch, testcase, etc.

Kenneth Russell

Comment 1 2021-10-12 14:16:43 PDT

I only have the hardware to test this on macOS but the context creation attributes appear to be: alpha: false antialias: false powerPreference: "high-performance" premultipliedAlpha: true preserveDrawingBuffer: false stencil: false Not sure why this combination might cause the tab to crash. Kimmo, would it be possible for you to confirm this? It doesn't look like the xrCompatible context creation attribute is the issue here.

Sidhu Alluri

Comment 2 2021-10-12 18:41:41 PDT

Managed to get it to stop crashing after reducing the gl buffer size to 15MB (down from 96MB, we use a single large buffer). `createBuffer` call succeeds but leads to eventual tab crash after enough draw calls. I'll try to narrow it further, but wanted to give an update.

Sidhu Alluri

Comment 3 2021-10-13 08:21:51 PDT

(In reply to Sidhu Alluri from comment #0) > The following link crashes the iOS 15 Safari tab on iPhone X, 12 Mini, and > 13. Works fine in all devices with iOS 14 or earlier. > https://staging.projector.com/story/be1ca8c7-a46a-41fc-9ad6- > de8b8f178a8b?scene=6f3e4207 > > After reloading once, the tab displays "A problem repeatedly occurred on > https://staging.projector.com/story/be1ca8c7-a46a-41fc-9ad6- > de8b8f178a8b?scene=6f3e4207" > > No console errors when trying to debug. I'm deploying a lower sized gl buffer to work around the issue but I added a query param to manually set the buffer size. The issue can still be repro'd via this link https://staging.projector.com/story/be1ca8c7-a46a-41fc-9ad6-de8b8f178a8b?scene=6f3e4207&glBufferSize=100663296

Radar WebKit Bug Importer

Comment 4 2021-10-13 17:18:52 PDT

<rdar://problem/84224687>

Kimmo Kinnunen

Comment 5 2021-10-15 07:26:33 PDT

This appears to be due to WebContent process running over memory limit. It's unclear if it's Metal or GPU process difference to iOS 14.

Kimmo Kinnunen

Comment 6 2021-10-22 07:05:59 PDT

This is mtl::Buffer instances being leaked due to buffer pool creating them unbounded

Kyle Piddington

Comment 7 2021-10-22 12:47:09 PDT

Looking into this bug: It seems there’s some trouble at work with the per-frame data. I see the initial 96 MB allocation, but not any repetition of that initial allocation. Each frame, however, I'm seeing about 38 more MB of allocated data. We seem to hold on to the first few frames worth of data, and don't get around to clearing them, however, which I think is leading to this bug.

Kyle Piddington

Comment 8 2021-10-22 12:54:15 PDT

From analysis: We're creating a lot more buffer pools than we are buffers. It seems the issue isn't one buffer pool allocating too much, but too many buffer pools being created in the first place.

Le Hoang Quyen

Comment 9 2021-10-22 18:12:54 PDT

Sorry, the vertex conversion's buffer pool mechanism in the Metal back-end was originally written by me. So I did attempt to trace the Metal commands while running this website in order to know what's going on. Here what I observed: - The website created a single large buffer (roughly 100MB) then seemingly used many small portions of it for different draw calls: - portion from 0-100 for draw call 1 (only needs 10 vertices, this is an approximate number). - portion from 100-200 for draw call 2 (only needs 10 vertices). - ... - portion from 90000000-100000000 for draw calls n (only needs 10 vertices). - Due to irregular offsets, many of the portions above needed vertex attribute conversions. However the conversions were inefficient because they converted the whole buffer's worth of vertices starting from given offset to the end. - draw call 1 allocated a conversion buffer for 90000000 vertices' attributes even though it needs only 10 vertices. - draw call 2 allocated a conversion buffer for 80000000 vertices' attributes even though it needs only 10 vertices. - ... The problem is that the conversion buffer pool will always allocate a new sub-buffer to store the converted attribute values for a number of vertices starting from given offset to the end regardless of how many vertices the actually draw call needs. Leading to a lot of converted portions unused by any draw calls. Currently the conversion buffers are keyed by triplet {format, offset, stride}. It doesn't take into account how many vertices needed hence the issue above happened. The number of vertices would be included in the key i.e. {format, offset, stride, numVertices}. However, it would still lead to another problem in another use case: - users draw using a same offset but with different number of vertices many times => many conversion buffers would be created.

Kimmo Kinnunen

Comment 10 2021-10-25 06:34:11 PDT

(In reply to Le Hoang Quyen from comment #9) > Currently the conversion buffers are keyed by triplet {format, offset, > stride}. It doesn't take into account how many vertices needed hence the > issue above happened. > The number of vertices would be included in the key i.e. {format, offset, > stride, numVertices}. However, it would still lead to another problem in > another use case: > - users draw using a same offset but with different number of vertices many > times => many conversion buffers would be created. Thanks for the explanation! I think the general logic should be better. However, for this particular bug, it sounds like we need to just change the look up function from `offset == wantOffset` to `offset <= wantOffset` ?

Le Hoang Quyen

Comment 11 2021-10-25 07:30:32 PDT

(In reply to Kimmo Kinnunen from comment #10) > (In reply to Le Hoang Quyen from comment #9) > > Currently the conversion buffers are keyed by triplet {format, offset, > > stride}. It doesn't take into account how many vertices needed hence the > > issue above happened. > > The number of vertices would be included in the key i.e. {format, offset, > > stride, numVertices}. However, it would still lead to another problem in > > another use case: > > - users draw using a same offset but with different number of vertices many > > times => many conversion buffers would be created. > > Thanks for the explanation! I think the general logic should be better. > > However, for this particular bug, it sounds like we need to just change the > look up function from `offset == wantOffset` to `offset <= wantOffset` ? The above offsets are just approximate numbers. The actual numbers might not be multiples of 100. And each drawing portions might use different alignments from each other. Yes, the lookup could be changed to something better (perhaps using `offset mod alignment` as key instead of offset). However, `offset <= wantOffset` alone doesn't work because there are several limits of Metal I noticed: - The final offset of the vertex attribute is calculated by: final offset = bufferOffset + attribOffset. - `bufferOffset` = offset set via MTLRenderCommandEncoder.setVertexBuffer - `attribOffset` = MTLVertexAttributeDescriptor's offset field. - Metal requires: - `bufferOffset` to be multiples of the attribute's data type's alignment. - `attribOffset` must be multiples of 4. - Furthermore, `attribOffset` + attribute's size must be less than or equal to the fetching stride.

Kyle Piddington

Comment 12 2021-10-28 17:17:28 PDT

Created attachment 442768 [details] Patch

EWS Watchlist

Comment 13 2021-10-28 17:18:32 PDT

Note that there are important steps to take when updating ANGLE. See https://trac.webkit.org/wiki/UpdatingANGLE

Darin Adler

Comment 14 2021-10-28 17:31:22 PDT

Comment on attachment 442768 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=442768&action=review > Source/ThirdParty/ANGLE/src/libANGLE/renderer/metal/VertexArrayMtl.mm:1079 > + VertexConversionBufferMtl * vertexConverison = (VertexConversionBufferMtl *)conversion; Spelling error "conversion".

Kyle Piddington

Comment 15 2021-10-29 20:03:06 PDT

Created attachment 442895 [details] Patch

Dean Jackson

Comment 16 2021-11-01 15:37:21 PDT

Comment on attachment 442895 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=442895&action=review > Source/ThirdParty/ANGLE/src/libANGLE/renderer/metal/VertexArrayMtl.mm:1074 > + VertexConversionBufferMtl * vertexConverison = (VertexConversionBufferMtl *)conversion; Typo here: vertexConverison

Kyle Piddington

Comment 17 2021-11-03 12:31:09 PDT

Created attachment 443223 [details] Patch for landing

EWS

Comment 18 2021-11-03 13:10:28 PDT

Committed r285220 (243845@main): <https://commits.webkit.org/243845@main> All reviewed patches have been landed. Closing bug and clearing flags on attachment 443223 [details].

Note You need to log in before you can comment on or make changes to this bug.