Bug 210208 - Fix handling non-linearized PDFs when incremental PDF loading is enabled
Summary: Fix handling non-linearized PDFs when incremental PDF loading is enabled
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: WebKit Misc. (show other bugs)
Version: WebKit Nightly Build
Hardware: Unspecified Unspecified
: P2 Normal
Assignee: Brady Eidson
URL:
Keywords: InRadar
Depends on:
Blocks:
 
Reported: 2020-04-08 11:55 PDT by Brady Eidson
Modified: 2020-04-08 15:15 PDT (History)
2 users (show)

See Also:


Attachments
Patch (7.84 KB, patch)
2020-04-08 12:16 PDT, Brady Eidson
no flags Details | Formatted Diff | Diff
Patch (8.38 KB, patch)
2020-04-08 13:40 PDT, Brady Eidson
no flags Details | Formatted Diff | Diff
Patch (8.37 KB, patch)
2020-04-08 13:49 PDT, Brady Eidson
no flags Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Brady Eidson 2020-04-08 11:55:41 PDT
Fix handling non-linearized PDFs when incremental PDF loading is enabled

<rdar://problem/60619506>
Comment 1 Brady Eidson 2020-04-08 12:16:39 PDT
Created attachment 395852 [details]
Patch
Comment 2 Geoffrey Garen 2020-04-08 12:24:07 PDT
/Volumes/Data/worker/macOS-Mojave-Debug-Build-EWS/build/Source/WebKit/WebProcess/Plugins/PDF/PDFPlugin.mm:1536:5: error: use of undeclared identifier 'pdfLog'
Comment 3 Tim Horton 2020-04-08 12:27:45 PDT
Comment on attachment 395852 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=395852&action=review

> Source/WebKit/WebProcess/Plugins/PDF/PDFPlugin.mm:142
> +static const int32_t maximumRangeRequestPosition = std::numeric_limits<int32_t>::max();

INT32_MAX is a reasonable maximum? Seems ... big.
Comment 4 Geoffrey Garen 2020-04-08 12:46:42 PDT
Comment on attachment 395852 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=395852&action=review

> Source/WebKit/ChangeLog:9
> +        When we try to load a non-linearized PDF with PDFKit, it makes an outlandishly large range request
> +        to try to verify the PDF file size.

Are we working around a bug in PDFKit here? If so, let's cite the Radar for that bug.

> Source/WebKit/WebProcess/Plugins/PDF/PDFPlugin.mm:142
> +static const int32_t maximumRangeRequestPosition = std::numeric_limits<int32_t>::max();

Should this be off_t, since the position argument passed to dataProviderGetBytesAtPositionCallback is off_t?

What is the actual request we get from PDFKit in practice? Maybe we can bump this number up, and do a direct comparison instead of a > comparison. This limit would prohibit streaming of any > 4GB PDF resource. While 4GB is very big, it doesn't strike me as impossibly big, and refusing to stream something because it was big would be a sad paradox.

> Source/WebKit/WebProcess/Plugins/PDF/PDFPlugin.mm:701
> +void PDFPlugin::receivedInvalidRangeRequest()

I think this code would be clearer if we named and/or commented it according to the behavior we are working around / modeling. Maybe something like

    maximumRangeRequestPosition => nonLinearizedPDFSentinel
    receivedInvalidRangeRequest => receivedNonLinearizedPDFSentinel

> Source/WebKit/WebProcess/Plugins/PDF/PDFPlugin.mm:741
> +    Ref<PDFPlugin> plugin = *((PDFPlugin*)info);

Let's put this at the top of the function so we can remove the casting above.

> Source/WebKit/WebProcess/Plugins/PDF/PDFPlugin.mm:744
> +    // It's possible we previously encountered an invalid range and therefore disabled incremental loading,
> +    // but PDFDocument is still trying to read data using a different strategy.

Does this happen?

> Source/WebKit/WebProcess/Plugins/PDF/PDFPlugin.mm:750
> +    auto debugPluginRef = plugin.copyRef();

I think you can just use plugin here.

> Source/WebKit/WebProcess/Plugins/PDF/PDFPlugin.mm:859
> +        // The main thread dispatch below removes the last reference to the PDF thread.
> +        // It must be the last code executed in this function.
> +        callOnMainThread([protectedPlugin = WTFMove(protectedPlugin)] { });

callOnMainThread can execute its function concurrently at any time; so, this is a suspicious lifetime idiom, and probably not fully correct. What are we trying to accomplish here?
Comment 5 Geoffrey Garen 2020-04-08 12:47:20 PDT
Comment on attachment 395852 [details]
Patch

I accidentally overwrote thorton's r+, but now it's on purpose cuz EWS is angry.
Comment 6 Brady Eidson 2020-04-08 13:04:09 PDT
(In reply to Geoffrey Garen from comment #4)
> Comment on attachment 395852 [details]
> Patch
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=395852&action=review
> 
> > Source/WebKit/ChangeLog:9
> > +        When we try to load a non-linearized PDF with PDFKit, it makes an outlandishly large range request
> > +        to try to verify the PDF file size.
> 
> Are we working around a bug in PDFKit here? If so, let's cite the Radar for
> that bug.
> 
> > Source/WebKit/WebProcess/Plugins/PDF/PDFPlugin.mm:142
> > +static const int32_t maximumRangeRequestPosition = std::numeric_limits<int32_t>::max();
> 
> Should this be off_t, since the position argument passed to
> dataProviderGetBytesAtPositionCallback is off_t?
> 
> What is the actual request we get from PDFKit in practice? Maybe we can bump
> this number up, and do a direct comparison instead of a > comparison. 

It's not constant, and it's closer to int64_t max.

> This limit would prohibit streaming of any > 4GB PDF resource. While 4GB is very
> big, it doesn't strike me as impossibly big, and refusing to stream
> something because it was big would be a sad paradox.

I can make it "bigger than 4gb"

> 
> > Source/WebKit/WebProcess/Plugins/PDF/PDFPlugin.mm:701
> > +void PDFPlugin::receivedInvalidRangeRequest()
> 
> I think this code would be clearer if we named and/or commented it according
> to the behavior we are working around / modeling. Maybe something like
> 
>     maximumRangeRequestPosition => nonLinearizedPDFSentinel
>     receivedInvalidRangeRequest => receivedNonLinearizedPDFSentinel

I'm fine with these

> 
> > Source/WebKit/WebProcess/Plugins/PDF/PDFPlugin.mm:741
> > +    Ref<PDFPlugin> plugin = *((PDFPlugin*)info);
> 
> Let's put this at the top of the function so we can remove the casting above.
> 
> > Source/WebKit/WebProcess/Plugins/PDF/PDFPlugin.mm:744
> > +    // It's possible we previously encountered an invalid range and therefore disabled incremental loading,
> > +    // but PDFDocument is still trying to read data using a different strategy.
> 
> Does this happen?

Yes.

> > Source/WebKit/WebProcess/Plugins/PDF/PDFPlugin.mm:750
> > +    auto debugPluginRef = plugin.copyRef();
> 
> I think you can just use plugin here.

Need a non-moved copy of plugin down at the bottom of the function for logging.

> 
> > Source/WebKit/WebProcess/Plugins/PDF/PDFPlugin.mm:859
> > +        // The main thread dispatch below removes the last reference to the PDF thread.
> > +        // It must be the last code executed in this function.
> > +        callOnMainThread([protectedPlugin = WTFMove(protectedPlugin)] { });
> 
> callOnMainThread can execute its function concurrently at any time; so, this
> is a suspicious lifetime idiom, and probably not fully correct. What are we
> trying to accomplish here?

It's a scope exit - It only runs the functor as the enclosing scope (e.g. ::threadEntry()) finishes up.

This is just moving the final "callOnMainThread" to a scope exit because there's now multiple return paths from this function.
Comment 7 Brady Eidson 2020-04-08 13:40:49 PDT
Created attachment 395861 [details]
Patch
Comment 8 Geoffrey Garen 2020-04-08 13:44:09 PDT
> > callOnMainThread can execute its function concurrently at any time; so, this
> > is a suspicious lifetime idiom, and probably not fully correct. What are we
> > trying to accomplish here?
> 
> It's a scope exit - It only runs the functor as the enclosing scope (e.g.
> ::threadEntry()) finishes up.
> 
> This is just moving the final "callOnMainThread" to a scope exit because
> there's now multiple return paths from this function.

How about changing the comment to "Keep PDFPlugin alive until the end of this function and the end of the last main thread task submitted by this function".
Comment 9 Brady Eidson 2020-04-08 13:49:36 PDT
Created attachment 395863 [details]
Patch
Comment 10 EWS 2020-04-08 15:15:22 PDT
Committed r259760: <https://trac.webkit.org/changeset/259760>

All reviewed patches have been landed. Closing bug and clearing flags on attachment 395863 [details].