Bug 165993

Summary: WebAssembly: We should compile wasm functions in parallel
Product: WebKit Reporter: Saam Barati <saam>
Component: JavaScriptCoreAssignee: Saam Barati <saam>
Status: RESOLVED FIXED    
Severity: Normal CC: benjamin, commit-queue, fpizlo, ggaren, gskachkov, jfbastien, keith_miller, mark.lam, msaboff, oliver, ossy, ticaiolima, webkit-bug-importer, ysuzuki
Priority: P2 Keywords: InRadar
Version: WebKit Nightly Build   
Hardware: Unspecified   
OS: Unspecified   
Attachments:
Description Flags
WIP
none
patch
none
patch
none
patch
none
patch none

Saam Barati
Reported 2016-12-17 00:04:49 PST
Right now, compiling the unity benchmark takes 60 seconds on my machine. It's compiling 34341 functions. We should be way faster. We can fork off some threads that do compilation, and then allow the main thread to be woken up to do linking. Or we can allow the main thread to also compile, and then link on the main thread at the very end. It's probably more profitable to have the main thread also compile things.
Attachments
WIP (20.29 KB, patch)
2016-12-18 13:23 PST, Saam Barati
no flags
patch (20.77 KB, patch)
2016-12-19 14:14 PST, Saam Barati
no flags
patch (20.65 KB, patch)
2016-12-19 16:22 PST, Saam Barati
no flags
patch (22.71 KB, patch)
2016-12-20 11:57 PST, Saam Barati
no flags
patch (22.22 KB, patch)
2016-12-20 12:17 PST, Saam Barati
no flags
Saam Barati
Comment 1 2016-12-17 00:06:22 PST
It only takes ~99 milliseconds to validate those functions.
Saam Barati
Comment 2 2016-12-18 13:23:09 PST
Created attachment 297445 [details] WIP This is a pretty simple parallel compiler and I think it works. It makes compiling unity more than 4x faster on my MBP. I have some some other patch dependencies in here, so I'll wait for those to land before putting this one up for review. Comments welcome now, though.
Saam Barati
Comment 3 2016-12-18 13:24:03 PST
Comment on attachment 297445 [details] WIP View in context: https://bugs.webkit.org/attachment.cgi?id=297445&action=review > Source/JavaScriptCore/wasm/WasmPlan.cpp:144 > + constexpr uint32_t threadCount = 8; I need to figure out the API to get the number of core's on the CPU we're running on here, instead of the constant 8.
JF Bastien
Comment 4 2016-12-18 13:44:56 PST
Comment on attachment 297445 [details] WIP View in context: https://bugs.webkit.org/attachment.cgi?id=297445&action=review > Source/JavaScriptCore/wasm/WasmPlan.cpp:97 > + //dataLogLn("Took ", std::chrono::duration_cast<std::chrono::milliseconds>(std::chrono::steady_clock::now() - start).count(), "ms to validate"); Debug. > Source/JavaScriptCore/wasm/WasmPlan.cpp:118 > + //Vector<Vector<UnlinkedWasmToWasmCall>> unlinkedWasmToWasmCalls; ? > Source/JavaScriptCore/wasm/WasmPlan.cpp:140 > + m_compilationContexts.resize(m_functionLocationInBinary.size()); I'd do this right after tryReserveCapacity. >> Source/JavaScriptCore/wasm/WasmPlan.cpp:144 >> + constexpr uint32_t threadCount = 8; > > I need to figure out the API to get the number of core's on the CPU we're running on here, instead of the constant 8. std::thread::hardware_concurrency for number of cores. Or does WebKit have its own thing? > Source/JavaScriptCore/wasm/WasmPlan.cpp:167 > + ASSERT(validateFunction(functionStart, functionLength, signature, m_functionIndexSpace, *m_moduleInformation)); Why here as well? > Source/JavaScriptCore/wasm/WasmPlan.cpp:189 > + //dataLogLn("Took ", std::chrono::duration_cast<std::chrono::microseconds>(std::chrono::steady_clock::now() - start).count(), "us to do ending loop"); Debug.
Saam Barati
Comment 5 2016-12-19 14:14:41 PST
WebKit Commit Bot
Comment 6 2016-12-19 14:16:38 PST
Attachment 297478 [details] did not pass style-queue: ERROR: Source/JavaScriptCore/wasm/WasmPlan.cpp:212: This { should be at the end of the previous line [whitespace/braces] [4] Total errors found: 1 in 6 files If any of these errors are false positives, please file a bug against check-webkit-style.
Mark Lam
Comment 7 2016-12-19 14:36:15 PST
Comment on attachment 297478 [details] patch View in context: https://bugs.webkit.org/attachment.cgi?id=297478&action=review > Source/JavaScriptCore/ChangeLog:12 > + inside Wasmb3IRGenerator to be thread safe. drive by comment: /b3/B3/
Filip Pizlo
Comment 8 2016-12-19 14:49:47 PST
Comment on attachment 297478 [details] patch View in context: https://bugs.webkit.org/attachment.cgi?id=297478&action=review > Source/JavaScriptCore/wasm/WasmB3IRGenerator.cpp:934 > + B3::generate(procedure, *compilationContext.wasmEntrypointJIT); In the FTL we run generate() with heap access and prepareForGeneration without heap access, so the GC can interleave during the more expensive prepare phase. Not sure that matters at all here. > Source/JavaScriptCore/wasm/WasmPlan.cpp:64 > + std::chrono::steady_clock::time_point startTime; Please use MonotonicTime. > Source/JavaScriptCore/wasm/WasmPlan.cpp:101 > + dataLogLn("Took ", std::chrono::duration_cast<std::chrono::microseconds>(std::chrono::steady_clock::now() - startTime).count(), " us to validate module"); So much less code if you use MonotonicTime. > Source/JavaScriptCore/wasm/WasmPlan.cpp:164 > + m_lock.lock(); > + if (m_currentIndex >= m_functionLocationInBinary.size()) { > + m_lock.unlock(); > + return; > + } > + uint32_t functionIndex = m_currentIndex; > + ++m_currentIndex; > + m_lock.unlock(); Please use a RAII thing, like: { auto locker = holdLock(m_lock); if (stuff) return; things } Also, please consider the following: - ParallelHelperPool for compiling. This prevents you from starting 64 threads if 8 workers try to wasm compile at the same time. - ParallelVectorIterator for doing the m_currentIndex stuff. > Source/JavaScriptCore/wasm/WasmPlan.cpp:185 > + m_lock.lock(); > + if (!m_errorMessage) { > + // Multiple compiles could fail simultaneously. We arbitrarily choose the first. > + m_errorMessage = parseAndCompileResult.error(); // FIXME make this an Expected. > + } > + m_currentIndex = m_functionLocationInBinary.size(); > + m_lock.unlock(); Ditto. > Source/JavaScriptCore/wasm/WasmPlan.cpp:195 > + std::chrono::steady_clock::time_point startTime; MonotonicTime please.
Filip Pizlo
Comment 9 2016-12-19 14:52:00 PST
I think that out of my feedback, the most important is about std::chrono. Lets not use that anymore. MonotonicTime is what you want instead. ParallelVectorIterator seems like exactly what you're doing, so it might be a good idea. ParallelHelperPool is a neat optimization but we don't need it.
Saam Barati
Comment 10 2016-12-19 15:49:50 PST
(In reply to comment #9) > I think that out of my feedback, the most important is about std::chrono. > Lets not use that anymore. MonotonicTime is what you want instead. > > ParallelVectorIterator seems like exactly what you're doing, so it might be > a good idea. > > ParallelHelperPool is a neat optimization but we don't need it. ParallelHelperPool::runFunctionInParallel is what I want. However, I probably want a ParallelHelperPool per process for compiling Wasm, instead of separate ones per module. I'll start with just having one for the Module, but open a bug to have this shared across module compilations.
Saam Barati
Comment 11 2016-12-19 15:50:20 PST
(In reply to comment #10) > (In reply to comment #9) > > I think that out of my feedback, the most important is about std::chrono. > > Lets not use that anymore. MonotonicTime is what you want instead. > > > > ParallelVectorIterator seems like exactly what you're doing, so it might be > > a good idea. > > > > ParallelHelperPool is a neat optimization but we don't need it. > > ParallelHelperPool::runFunctionInParallel is what I want. However, I > probably want a ParallelHelperPool per process for compiling Wasm, instead > of separate ones per module. I'll start with just having one for the Module, > but open a bug to have this shared across module compilations. This will probably be more important once we have concurrent module compilation and may have multiple concurrent compilations happening simultaneously.
Saam Barati
Comment 12 2016-12-19 16:03:51 PST
(In reply to comment #11) > (In reply to comment #10) > > (In reply to comment #9) > > > I think that out of my feedback, the most important is about std::chrono. > > > Lets not use that anymore. MonotonicTime is what you want instead. > > > > > > ParallelVectorIterator seems like exactly what you're doing, so it might be > > > a good idea. > > > > > > ParallelHelperPool is a neat optimization but we don't need it. > > > > ParallelHelperPool::runFunctionInParallel is what I want. However, I > > probably want a ParallelHelperPool per process for compiling Wasm, instead > > of separate ones per module. I'll start with just having one for the Module, > > but open a bug to have this shared across module compilations. > > This will probably be more important once we have concurrent module > compilation and may have multiple concurrent compilations happening > simultaneously. Actually, I think that ParallelHelper might be overkill for what I'm doing now. I think we should move to it once we have truly concurrent compilations. I guess I could also create one ParallelHelperPool per process, but I think that's the only situation where this will make a difference now.
Saam Barati
Comment 13 2016-12-19 16:22:08 PST
Created attachment 297492 [details] patch Address Fil's feedback w.r.t MonotinicTime and LockHolder.
WebKit Commit Bot
Comment 14 2016-12-19 16:23:27 PST
Attachment 297492 [details] did not pass style-queue: ERROR: Source/JavaScriptCore/wasm/WasmPlan.cpp:214: This { should be at the end of the previous line [whitespace/braces] [4] Total errors found: 1 in 6 files If any of these errors are false positives, please file a bug against check-webkit-style.
Saam Barati
Comment 15 2016-12-19 16:46:44 PST
Hmm, I'm getting a debug crash while compiling unity where it looks like we're stack overflowing. I'm investigating.
Filip Pizlo
Comment 16 2016-12-19 16:48:43 PST
(In reply to comment #12) > (In reply to comment #11) > > (In reply to comment #10) > > > (In reply to comment #9) > > > > I think that out of my feedback, the most important is about std::chrono. > > > > Lets not use that anymore. MonotonicTime is what you want instead. > > > > > > > > ParallelVectorIterator seems like exactly what you're doing, so it might be > > > > a good idea. > > > > > > > > ParallelHelperPool is a neat optimization but we don't need it. > > > > > > ParallelHelperPool::runFunctionInParallel is what I want. However, I > > > probably want a ParallelHelperPool per process for compiling Wasm, instead > > > of separate ones per module. I'll start with just having one for the Module, > > > but open a bug to have this shared across module compilations. > > > > This will probably be more important once we have concurrent module > > compilation and may have multiple concurrent compilations happening > > simultaneously. > > Actually, I think that ParallelHelper might be overkill for what I'm doing > now. I think we should move to it once we have truly concurrent > compilations. I guess I could also create one ParallelHelperPool per > process, but I think that's the only situation where this will make a > difference now. What does it mean to have truly concurrent compilations?
Filip Pizlo
Comment 17 2016-12-19 16:49:52 PST
Comment on attachment 297492 [details] patch View in context: https://bugs.webkit.org/attachment.cgi?id=297492&action=review > Source/JavaScriptCore/wasm/WasmPlan.cpp:104 > + dataLogLn("Took ", Seconds(MonotonicTime::now() - startTime).microseconds(), " us to validate module"); You don't need Seconds(...). You can simply do (MonotonicTime::now() - startTime).microseconds(). The type of subtracting MonotonicTimes is Seconds. > Source/JavaScriptCore/wasm/WasmPlan.cpp:236 > + dataLogLn("Took ", Seconds(MonotonicTime::now() - startTime).microseconds(), Ditto.
Saam Barati
Comment 18 2016-12-19 16:55:27 PST
(In reply to comment #15) > Hmm, I'm getting a debug crash while compiling unity where it looks like > we're stack overflowing. > > I'm investigating. I think it's because the Wasm Bound checks was a SharedTask instead of an std::function. (This might also be why we're throwing out of bounds memory exceptions!)
Saam Barati
Comment 19 2016-12-19 16:56:14 PST
(In reply to comment #16) > (In reply to comment #12) > > (In reply to comment #11) > > > (In reply to comment #10) > > > > (In reply to comment #9) > > > > > I think that out of my feedback, the most important is about std::chrono. > > > > > Lets not use that anymore. MonotonicTime is what you want instead. > > > > > > > > > > ParallelVectorIterator seems like exactly what you're doing, so it might be > > > > > a good idea. > > > > > > > > > > ParallelHelperPool is a neat optimization but we don't need it. > > > > > > > > ParallelHelperPool::runFunctionInParallel is what I want. However, I > > > > probably want a ParallelHelperPool per process for compiling Wasm, instead > > > > of separate ones per module. I'll start with just having one for the Module, > > > > but open a bug to have this shared across module compilations. > > > > > > This will probably be more important once we have concurrent module > > > compilation and may have multiple concurrent compilations happening > > > simultaneously. > > > > Actually, I think that ParallelHelper might be overkill for what I'm doing > > now. I think we should move to it once we have truly concurrent > > compilations. I guess I could also create one ParallelHelperPool per > > process, but I think that's the only situation where this will make a > > difference now. > > What does it mean to have truly concurrent compilations? While JavaScript code is executing, we can be compiling Wasm. When we're done compiling, we need to resolve the promise with a WebAssembly.Module object.
Filip Pizlo
Comment 20 2016-12-19 17:01:21 PST
(In reply to comment #18) > (In reply to comment #15) > > Hmm, I'm getting a debug crash while compiling unity where it looks like > > we're stack overflowing. > > > > I'm investigating. > > I think it's because the Wasm Bound checks was a SharedTask instead of an > std::function. (This might also be why we're throwing out of bounds memory > exceptions!) I think that it's important to avoid using std::function in B3, because std::function involves expected copying and system malloc. The difference between them is that std::function copies the closure's state around on assignment. I think we want to avoid such state copying. That's why we use WTF::Function, WTF::ScopedLambda, and WTF::SharedTask to get different kinds of zero-copy closures.
Saam Barati
Comment 21 2016-12-19 17:06:22 PST
(In reply to comment #18) > (In reply to comment #15) > > Hmm, I'm getting a debug crash while compiling unity where it looks like > > we're stack overflowing. > > > > I'm investigating. > > I think it's because the Wasm Bound checks was a SharedTask instead of an > std::function. (This might also be why we're throwing out of bounds memory > exceptions!) Ignore me.
Saam Barati
Comment 22 2016-12-19 17:24:28 PST
(In reply to comment #21) > (In reply to comment #18) > > (In reply to comment #15) > > > Hmm, I'm getting a debug crash while compiling unity where it looks like > > > we're stack overflowing. > > > > > > I'm investigating. > > > > I think it's because the Wasm Bound checks was a SharedTask instead of an > > std::function. (This might also be why we're throwing out of bounds memory > > exceptions!) > > Ignore me. The bug is this code: ``` static CCallHelpers::Jump generate(Inst& inst, CCallHelpers& jit, GenerationContext& context) { WasmBoundsCheckValue* value = inst.origin->as<WasmBoundsCheckValue>(); CCallHelpers::Jump outOfBounds = Inst(Air::Branch64, value, Arg::relCond(CCallHelpers::AboveOrEqual), inst.args[0], inst.args[1]).generate(jit, context); context.latePaths.append(createSharedTask<GenerationContext::LatePathFunction>( [=] (CCallHelpers& jit, Air::GenerationContext&) { outOfBounds.link(&jit); context.code->wasmBoundsCheckGenerator()->run(jit, value->pinnedGPR(), value->offset()); })); // We said we were not a terminal. return CCallHelpers::Jump(); } ``` Which is copying the Context! That's a no-no. I'll make it so that we don't allow this to happen elsewhere.
Saam Barati
Comment 23 2016-12-19 17:30:25 PST
(In reply to comment #22) > (In reply to comment #21) > > (In reply to comment #18) > > > (In reply to comment #15) > > > > Hmm, I'm getting a debug crash while compiling unity where it looks like > > > > we're stack overflowing. > > > > > > > > I'm investigating. > > > > > > I think it's because the Wasm Bound checks was a SharedTask instead of an > > > std::function. (This might also be why we're throwing out of bounds memory > > > exceptions!) > > > > Ignore me. > > The bug is this code: > ``` > static CCallHelpers::Jump generate(Inst& inst, CCallHelpers& jit, > GenerationContext& context) > { > WasmBoundsCheckValue* value = > inst.origin->as<WasmBoundsCheckValue>(); > CCallHelpers::Jump outOfBounds = Inst(Air::Branch64, value, > Arg::relCond(CCallHelpers::AboveOrEqual), inst.args[0], > inst.args[1]).generate(jit, context); > > > context.latePaths.append(createSharedTask<GenerationContext:: > LatePathFunction>( > [=] (CCallHelpers& jit, Air::GenerationContext&) { > outOfBounds.link(&jit); > context.code->wasmBoundsCheckGenerator()->run(jit, > value->pinnedGPR(), value->offset()); > })); > > // We said we were not a terminal. > return CCallHelpers::Jump(); > } > ``` > > Which is copying the Context! That's a no-no. I'll make it so that we don't > allow this to happen elsewhere. I'm still not entirely sure why that's the bug. But it appears to be it.
Saam Barati
Comment 24 2016-12-20 11:57:25 PST
Saam Barati
Comment 25 2016-12-20 12:12:58 PST
Saam Barati
Comment 26 2016-12-20 12:17:07 PST
WebKit Commit Bot
Comment 27 2016-12-20 12:19:24 PST
Attachment 297546 [details] did not pass style-queue: ERROR: Source/JavaScriptCore/wasm/WasmPlan.cpp:210: This { should be at the end of the previous line [whitespace/braces] [4] Total errors found: 1 in 8 files If any of these errors are false positives, please file a bug against check-webkit-style.
Keith Miller
Comment 28 2016-12-20 15:55:28 PST
Comment on attachment 297546 [details] patch r=me.
WebKit Commit Bot
Comment 29 2016-12-20 16:30:18 PST
Comment on attachment 297546 [details] patch Clearing flags on attachment: 297546 Committed r210047: <http://trac.webkit.org/changeset/210047>
WebKit Commit Bot
Comment 30 2016-12-20 16:30:25 PST
All reviewed patches have been landed. Closing bug.
Note You need to log in before you can comment on or make changes to this bug.