Fix handling non-linearized PDFs when incremental PDF loading is enabled <rdar://problem/60619506>
Created attachment 395852 [details] Patch
/Volumes/Data/worker/macOS-Mojave-Debug-Build-EWS/build/Source/WebKit/WebProcess/Plugins/PDF/PDFPlugin.mm:1536:5: error: use of undeclared identifier 'pdfLog'
Comment on attachment 395852 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=395852&action=review > Source/WebKit/WebProcess/Plugins/PDF/PDFPlugin.mm:142 > +static const int32_t maximumRangeRequestPosition = std::numeric_limits<int32_t>::max(); INT32_MAX is a reasonable maximum? Seems ... big.
Comment on attachment 395852 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=395852&action=review > Source/WebKit/ChangeLog:9 > + When we try to load a non-linearized PDF with PDFKit, it makes an outlandishly large range request > + to try to verify the PDF file size. Are we working around a bug in PDFKit here? If so, let's cite the Radar for that bug. > Source/WebKit/WebProcess/Plugins/PDF/PDFPlugin.mm:142 > +static const int32_t maximumRangeRequestPosition = std::numeric_limits<int32_t>::max(); Should this be off_t, since the position argument passed to dataProviderGetBytesAtPositionCallback is off_t? What is the actual request we get from PDFKit in practice? Maybe we can bump this number up, and do a direct comparison instead of a > comparison. This limit would prohibit streaming of any > 4GB PDF resource. While 4GB is very big, it doesn't strike me as impossibly big, and refusing to stream something because it was big would be a sad paradox. > Source/WebKit/WebProcess/Plugins/PDF/PDFPlugin.mm:701 > +void PDFPlugin::receivedInvalidRangeRequest() I think this code would be clearer if we named and/or commented it according to the behavior we are working around / modeling. Maybe something like maximumRangeRequestPosition => nonLinearizedPDFSentinel receivedInvalidRangeRequest => receivedNonLinearizedPDFSentinel > Source/WebKit/WebProcess/Plugins/PDF/PDFPlugin.mm:741 > + Ref<PDFPlugin> plugin = *((PDFPlugin*)info); Let's put this at the top of the function so we can remove the casting above. > Source/WebKit/WebProcess/Plugins/PDF/PDFPlugin.mm:744 > + // It's possible we previously encountered an invalid range and therefore disabled incremental loading, > + // but PDFDocument is still trying to read data using a different strategy. Does this happen? > Source/WebKit/WebProcess/Plugins/PDF/PDFPlugin.mm:750 > + auto debugPluginRef = plugin.copyRef(); I think you can just use plugin here. > Source/WebKit/WebProcess/Plugins/PDF/PDFPlugin.mm:859 > + // The main thread dispatch below removes the last reference to the PDF thread. > + // It must be the last code executed in this function. > + callOnMainThread([protectedPlugin = WTFMove(protectedPlugin)] { }); callOnMainThread can execute its function concurrently at any time; so, this is a suspicious lifetime idiom, and probably not fully correct. What are we trying to accomplish here?
Comment on attachment 395852 [details] Patch I accidentally overwrote thorton's r+, but now it's on purpose cuz EWS is angry.
(In reply to Geoffrey Garen from comment #4) > Comment on attachment 395852 [details] > Patch > > View in context: > https://bugs.webkit.org/attachment.cgi?id=395852&action=review > > > Source/WebKit/ChangeLog:9 > > + When we try to load a non-linearized PDF with PDFKit, it makes an outlandishly large range request > > + to try to verify the PDF file size. > > Are we working around a bug in PDFKit here? If so, let's cite the Radar for > that bug. > > > Source/WebKit/WebProcess/Plugins/PDF/PDFPlugin.mm:142 > > +static const int32_t maximumRangeRequestPosition = std::numeric_limits<int32_t>::max(); > > Should this be off_t, since the position argument passed to > dataProviderGetBytesAtPositionCallback is off_t? > > What is the actual request we get from PDFKit in practice? Maybe we can bump > this number up, and do a direct comparison instead of a > comparison. It's not constant, and it's closer to int64_t max. > This limit would prohibit streaming of any > 4GB PDF resource. While 4GB is very > big, it doesn't strike me as impossibly big, and refusing to stream > something because it was big would be a sad paradox. I can make it "bigger than 4gb" > > > Source/WebKit/WebProcess/Plugins/PDF/PDFPlugin.mm:701 > > +void PDFPlugin::receivedInvalidRangeRequest() > > I think this code would be clearer if we named and/or commented it according > to the behavior we are working around / modeling. Maybe something like > > maximumRangeRequestPosition => nonLinearizedPDFSentinel > receivedInvalidRangeRequest => receivedNonLinearizedPDFSentinel I'm fine with these > > > Source/WebKit/WebProcess/Plugins/PDF/PDFPlugin.mm:741 > > + Ref<PDFPlugin> plugin = *((PDFPlugin*)info); > > Let's put this at the top of the function so we can remove the casting above. > > > Source/WebKit/WebProcess/Plugins/PDF/PDFPlugin.mm:744 > > + // It's possible we previously encountered an invalid range and therefore disabled incremental loading, > > + // but PDFDocument is still trying to read data using a different strategy. > > Does this happen? Yes. > > Source/WebKit/WebProcess/Plugins/PDF/PDFPlugin.mm:750 > > + auto debugPluginRef = plugin.copyRef(); > > I think you can just use plugin here. Need a non-moved copy of plugin down at the bottom of the function for logging. > > > Source/WebKit/WebProcess/Plugins/PDF/PDFPlugin.mm:859 > > + // The main thread dispatch below removes the last reference to the PDF thread. > > + // It must be the last code executed in this function. > > + callOnMainThread([protectedPlugin = WTFMove(protectedPlugin)] { }); > > callOnMainThread can execute its function concurrently at any time; so, this > is a suspicious lifetime idiom, and probably not fully correct. What are we > trying to accomplish here? It's a scope exit - It only runs the functor as the enclosing scope (e.g. ::threadEntry()) finishes up. This is just moving the final "callOnMainThread" to a scope exit because there's now multiple return paths from this function.
Created attachment 395861 [details] Patch
> > callOnMainThread can execute its function concurrently at any time; so, this > > is a suspicious lifetime idiom, and probably not fully correct. What are we > > trying to accomplish here? > > It's a scope exit - It only runs the functor as the enclosing scope (e.g. > ::threadEntry()) finishes up. > > This is just moving the final "callOnMainThread" to a scope exit because > there's now multiple return paths from this function. How about changing the comment to "Keep PDFPlugin alive until the end of this function and the end of the last main thread task submitted by this function".
Created attachment 395863 [details] Patch
Committed r259760: <https://trac.webkit.org/changeset/259760> All reviewed patches have been landed. Closing bug and clearing flags on attachment 395863 [details].