Bug 207531

Summary: CachedResource should purge SharedBuffer if it is a particular type
Product: WebKit
Reporter: Yusuke Suzuki <ysuzuki>
Component: Page Loading
Assignee: Yusuke Suzuki <ysuzuki>
Status: NEW
Severity: Normal
CC: Basuke.Suzuki, beidson, ggaren, simon.fraser
Priority: P2
Version: WebKit Nightly Build   
Hardware: Unspecified   
OS: Unspecified   
See Also: https://bugs.webkit.org/show_bug.cgi?id=208683

Description Yusuke Suzuki 2020-02-11 01:05:02 PST
The detailed memgraph data collected from Membuster shows many large Vectors.
They are the data segments of the SharedBuffers of CachedResources, including CachedScript, CachedCSSStyleSheet, CachedImage, etc.
The important point is that these resources also hold decoded data. This basically means we hold the data twice for as long as the CachedScript etc. is held by CachedScriptSourceProvider.

For example, a CachedScript has a decoded String.
This means we have duplicate data for this script: one copy in the SharedBuffer and one in the decoded String.
The same can be said for CachedCSSStyleSheet, CachedImage, etc.

Instead of destroying the SharedBuffer, we have a mechanism for destroying decoded data (destroyDecodedData).
But it is not called so long as CachedScriptSourceProvider is holding the CachedScript.
This basically means that we keep duplicate data for as long as we stay on this page.
If we navigate to another page, we could purge the decoded data (and we could purge the CachedResource too).

For some CachedResource types, we should hold the decoded data and purge the SharedBuffer instead.
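
To make the duplication concrete, here is a minimal sketch of the idea in standard C++. ToyCachedScript and purgeEncodedDataIfDecoded are hypothetical names made up for illustration, not WebKit classes or methods; std::vector<char> stands in for SharedBuffer and std::string for the decoded String.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a cached resource; this is NOT WebKit's CachedScript.
class ToyCachedScript {
public:
    explicit ToyCachedScript(std::vector<char> encoded)
        : m_encoded(std::move(encoded))
    {
    }

    // Decode lazily (here decoding is just a byte copy) and keep the result.
    const std::string& script()
    {
        if (!m_decoded)
            m_decoded.emplace(m_encoded.begin(), m_encoded.end());
        return *m_decoded;
    }

    // Release the encoded bytes, but only once a decoded copy exists.
    // Returns true if the encoded storage was dropped.
    bool purgeEncodedDataIfDecoded()
    {
        if (!m_decoded)
            return false;
        std::vector<char>().swap(m_encoded); // swap trick actually frees the storage
        return true;
    }

    std::size_t encodedSize() const { return m_encoded.size(); }
    std::size_t decodedSize() const { return m_decoded ? m_decoded->size() : 0; }

private:
    std::vector<char> m_encoded;          // stand-in for SharedBuffer
    std::optional<std::string> m_decoded; // stand-in for the decoded String
};
```

Between the first script() call and the purge, both copies are alive, which is the double-sized state described above. After the purge only the decoded copy remains.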
Comment 1 Yusuke Suzuki 2020-02-11 01:12:54 PST
I’ll try this tomorrow. The plan is to use Variant<SharedBuffer, String, ...> as the data.
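
A rough sketch of what a variant-based holder could look like, using std::variant rather than WTF's Variant. ToyResourceData, EncodedBytes, and DecodedText are hypothetical names for illustration only; the real change might look quite different.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <variant>
#include <vector>

using EncodedBytes = std::vector<char>; // assumption: stand-in for SharedBuffer
using DecodedText = std::string;        // assumption: stand-in for WTF::String

// Hypothetical holder: stores either the encoded bytes or the decoded text, never both.
struct ToyResourceData {
    std::variant<EncodedBytes, DecodedText> data;

    // On first use, replace the encoded bytes with the decoded text.
    // Assigning the new alternative destroys the old one, so the encoded storage is released.
    const DecodedText& decoded()
    {
        if (auto* bytes = std::get_if<EncodedBytes>(&data)) {
            DecodedText text(bytes->begin(), bytes->end()); // "decoding" is a byte copy here
            data = std::move(text);
        }
        return std::get<DecodedText>(data);
    }

    bool holdsEncoded() const { return std::holds_alternative<EncodedBytes>(data); }
};
```

Since the variant holds exactly one alternative at a time, the duplicate-data state cannot be represented at all, at the cost of re-fetching or re-encoding if the raw bytes are needed again.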
Comment 2 Yusuke Suzuki 2020-02-11 01:53:42 PST
It seems the Blink folks are doing this. We should try it.
https://docs.google.com/document/d/1v0yTAZ6wkqX2U_M6BNIGUJpM1s0TIw1VsqpxoL7aciY/edit#heading=h.hydebxiwp5hv
Comment 3 Yusuke Suzuki 2020-02-11 20:11:39 PST
We have a path that uses the SharedBuffer as the content when it is ASCII. It seems Membuster mainly uses this path, so this change may not affect Membuster's memory usage.

But for the image case, we should do it.
We should still do this in general, but I'll check later since it would not affect the Membuster result.
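
As an illustration of why the ASCII path avoids the duplicate, here is a sketch in standard C++. isAllASCII and viewIfASCII are hypothetical helpers written for this example, not the actual WebKit code path: when every byte is ASCII, the bytes can be viewed as text in place, so no decoded copy is needed.

```cpp
#include <cassert>
#include <optional>
#include <string_view>
#include <vector>

// True if no byte has its high bit set.
inline bool isAllASCII(const std::vector<char>& bytes)
{
    for (char c : bytes) {
        if (static_cast<unsigned char>(c) & 0x80)
            return false;
    }
    return true;
}

// Returns a non-owning view over the buffer when it is all ASCII, so the
// caller can read it as text without allocating a decoded copy.
// The view is only valid while `bytes` is alive and unmodified.
inline std::optional<std::string_view> viewIfASCII(const std::vector<char>& bytes)
{
    if (!isAllASCII(bytes))
        return std::nullopt;
    return std::string_view(bytes.data(), bytes.size());
}
```

Non-ASCII content falls through to the decoding path, which is where the second copy comes from.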
Comment 4 Yusuke Suzuki 2020-02-12 23:40:21 PST
For the CachedImage case:

// On Mac the NSData inside the SharedBuffer can be secretly appended to without the SharedBuffer's knowledge.
// We use SharedBuffer's ability to wrap itself inside CFData to get around this, ensuring that ImageIO is
// really looking at the SharedBuffer.

We are already doing this, cool.

Other possibilities are:

1. non-ASCII string source code
2. script source code compression
3. style sheet source code compression
Comment 5 Yusuke Suzuki 2020-02-12 23:47:35 PST
(In reply to Yusuke Suzuki from comment #4)
> For CachedImage case,
> 
> // On Mac the NSData inside the SharedBuffer can be secretly appended to
> without the SharedBuffer's knowledge.
> // We use SharedBuffer's ability to wrap itself inside CFData to get around
> this, ensuring that ImageIO is
> // really looking at the SharedBuffer.
> 
> We are already doing this, cool.
> 
> Other possibility is,
> 
> 1. non-ASCII string source code
> 2. script source code compression
> 3. style sheet source code compression

Wait, I need to check whether ImageIO is using this buffer directly or doing some fancy buffering internally.