WebKit Bugzilla
New
Browse
Log In
×
Sign in with GitHub
or
Remember my login
Create Account
·
Forgot Password
Forgotten password account recovery
RESOLVED FIXED
165993
WebAssembly: We should compile wasm functions in parallel
https://bugs.webkit.org/show_bug.cgi?id=165993
Summary
WebAssembly: We should compile wasm functions in parallel
Saam Barati
Reported
2016-12-17 00:04:49 PST
Right now, compiling the unity benchmark takes 60 seconds on my machine. It's compiling 34341 functions. We should be way faster. We can fork off some threads that do compilation, and then allow the main thread to be woken up to do linking. Or we can allow the main thread to also compile, and then link on the main thread at the very end. It's probably more profitable to have the main thread also compile things.
Attachments
WIP
(20.29 KB, patch)
2016-12-18 13:23 PST
,
Saam Barati
no flags
Details
Formatted Diff
Diff
patch
(20.77 KB, patch)
2016-12-19 14:14 PST
,
Saam Barati
no flags
Details
Formatted Diff
Diff
patch
(20.65 KB, patch)
2016-12-19 16:22 PST
,
Saam Barati
no flags
Details
Formatted Diff
Diff
patch
(22.71 KB, patch)
2016-12-20 11:57 PST
,
Saam Barati
no flags
Details
Formatted Diff
Diff
patch
(22.22 KB, patch)
2016-12-20 12:17 PST
,
Saam Barati
no flags
Details
Formatted Diff
Diff
Show Obsolete
(4)
View All
Add attachment
proposed patch, testcase, etc.
Saam Barati
Comment 1
2016-12-17 00:06:22 PST
It only takes ~99 milliseconds to validate those functions.
Saam Barati
Comment 2
2016-12-18 13:23:09 PST
Created
attachment 297445
[details]
WIP This is a pretty simple parallel compiler and I think it works. It makes compiling unity more than 4x faster on my MBP. I have some some other patch dependencies in here, so I'll wait for those to land before putting this one up for review. Comments welcome now, though.
Saam Barati
Comment 3
2016-12-18 13:24:03 PST
Comment on
attachment 297445
[details]
WIP View in context:
https://bugs.webkit.org/attachment.cgi?id=297445&action=review
> Source/JavaScriptCore/wasm/WasmPlan.cpp:144 > + constexpr uint32_t threadCount = 8;
I need to figure out the API to get the number of core's on the CPU we're running on here, instead of the constant 8.
JF Bastien
Comment 4
2016-12-18 13:44:56 PST
Comment on
attachment 297445
[details]
WIP View in context:
https://bugs.webkit.org/attachment.cgi?id=297445&action=review
> Source/JavaScriptCore/wasm/WasmPlan.cpp:97 > + //dataLogLn("Took ", std::chrono::duration_cast<std::chrono::milliseconds>(std::chrono::steady_clock::now() - start).count(), "ms to validate");
Debug.
> Source/JavaScriptCore/wasm/WasmPlan.cpp:118 > + //Vector<Vector<UnlinkedWasmToWasmCall>> unlinkedWasmToWasmCalls;
?
> Source/JavaScriptCore/wasm/WasmPlan.cpp:140 > + m_compilationContexts.resize(m_functionLocationInBinary.size());
I'd do this right after tryReserveCapacity.
>> Source/JavaScriptCore/wasm/WasmPlan.cpp:144 >> + constexpr uint32_t threadCount = 8; > > I need to figure out the API to get the number of core's on the CPU we're running on here, instead of the constant 8.
std::thread::hardware_concurrency for number of cores. Or does WebKit have its own thing?
> Source/JavaScriptCore/wasm/WasmPlan.cpp:167 > + ASSERT(validateFunction(functionStart, functionLength, signature, m_functionIndexSpace, *m_moduleInformation));
Why here as well?
> Source/JavaScriptCore/wasm/WasmPlan.cpp:189 > + //dataLogLn("Took ", std::chrono::duration_cast<std::chrono::microseconds>(std::chrono::steady_clock::now() - start).count(), "us to do ending loop");
Debug.
Saam Barati
Comment 5
2016-12-19 14:14:41 PST
Created
attachment 297478
[details]
patch
WebKit Commit Bot
Comment 6
2016-12-19 14:16:38 PST
Attachment 297478
[details]
did not pass style-queue: ERROR: Source/JavaScriptCore/wasm/WasmPlan.cpp:212: This { should be at the end of the previous line [whitespace/braces] [4] Total errors found: 1 in 6 files If any of these errors are false positives, please file a bug against check-webkit-style.
Mark Lam
Comment 7
2016-12-19 14:36:15 PST
Comment on
attachment 297478
[details]
patch View in context:
https://bugs.webkit.org/attachment.cgi?id=297478&action=review
> Source/JavaScriptCore/ChangeLog:12 > + inside Wasmb3IRGenerator to be thread safe.
drive by comment: /b3/B3/
Filip Pizlo
Comment 8
2016-12-19 14:49:47 PST
Comment on
attachment 297478
[details]
patch View in context:
https://bugs.webkit.org/attachment.cgi?id=297478&action=review
> Source/JavaScriptCore/wasm/WasmB3IRGenerator.cpp:934 > + B3::generate(procedure, *compilationContext.wasmEntrypointJIT);
In the FTL we run generate() with heap access and prepareForGeneration without heap access, so the GC can interleave during the more expensive prepare phase. Not sure that matters at all here.
> Source/JavaScriptCore/wasm/WasmPlan.cpp:64 > + std::chrono::steady_clock::time_point startTime;
Please use MonotonicTime.
> Source/JavaScriptCore/wasm/WasmPlan.cpp:101 > + dataLogLn("Took ", std::chrono::duration_cast<std::chrono::microseconds>(std::chrono::steady_clock::now() - startTime).count(), " us to validate module");
So much less code if you use MonotonicTime.
> Source/JavaScriptCore/wasm/WasmPlan.cpp:164 > + m_lock.lock(); > + if (m_currentIndex >= m_functionLocationInBinary.size()) { > + m_lock.unlock(); > + return; > + } > + uint32_t functionIndex = m_currentIndex; > + ++m_currentIndex; > + m_lock.unlock();
Please use a RAII thing, like: { auto locker = holdLock(m_lock); if (stuff) return; things } Also, please consider the following: - ParallelHelperPool for compiling. This prevents you from starting 64 threads if 8 workers try to wasm compile at the same time. - ParallelVectorIterator for doing the m_currentIndex stuff.
> Source/JavaScriptCore/wasm/WasmPlan.cpp:185 > + m_lock.lock(); > + if (!m_errorMessage) { > + // Multiple compiles could fail simultaneously. We arbitrarily choose the first. > + m_errorMessage = parseAndCompileResult.error(); // FIXME make this an Expected. > + } > + m_currentIndex = m_functionLocationInBinary.size(); > + m_lock.unlock();
Ditto.
> Source/JavaScriptCore/wasm/WasmPlan.cpp:195 > + std::chrono::steady_clock::time_point startTime;
MonotonicTime please.
Filip Pizlo
Comment 9
2016-12-19 14:52:00 PST
I think that out of my feedback, the most important is about std::chrono. Lets not use that anymore. MonotonicTime is what you want instead. ParallelVectorIterator seems like exactly what you're doing, so it might be a good idea. ParallelHelperPool is a neat optimization but we don't need it.
Saam Barati
Comment 10
2016-12-19 15:49:50 PST
(In reply to
comment #9
)
> I think that out of my feedback, the most important is about std::chrono. > Lets not use that anymore. MonotonicTime is what you want instead. > > ParallelVectorIterator seems like exactly what you're doing, so it might be > a good idea. > > ParallelHelperPool is a neat optimization but we don't need it.
ParallelHelperPool::runFunctionInParallel is what I want. However, I probably want a ParallelHelperPool per process for compiling Wasm, instead of separate ones per module. I'll start with just having one for the Module, but open a bug to have this shared across module compilations.
Saam Barati
Comment 11
2016-12-19 15:50:20 PST
(In reply to
comment #10
)
> (In reply to
comment #9
) > > I think that out of my feedback, the most important is about std::chrono. > > Lets not use that anymore. MonotonicTime is what you want instead. > > > > ParallelVectorIterator seems like exactly what you're doing, so it might be > > a good idea. > > > > ParallelHelperPool is a neat optimization but we don't need it. > > ParallelHelperPool::runFunctionInParallel is what I want. However, I > probably want a ParallelHelperPool per process for compiling Wasm, instead > of separate ones per module. I'll start with just having one for the Module, > but open a bug to have this shared across module compilations.
This will probably be more important once we have concurrent module compilation and may have multiple concurrent compilations happening simultaneously.
Saam Barati
Comment 12
2016-12-19 16:03:51 PST
(In reply to
comment #11
)
> (In reply to
comment #10
) > > (In reply to
comment #9
) > > > I think that out of my feedback, the most important is about std::chrono. > > > Lets not use that anymore. MonotonicTime is what you want instead. > > > > > > ParallelVectorIterator seems like exactly what you're doing, so it might be > > > a good idea. > > > > > > ParallelHelperPool is a neat optimization but we don't need it. > > > > ParallelHelperPool::runFunctionInParallel is what I want. However, I > > probably want a ParallelHelperPool per process for compiling Wasm, instead > > of separate ones per module. I'll start with just having one for the Module, > > but open a bug to have this shared across module compilations. > > This will probably be more important once we have concurrent module > compilation and may have multiple concurrent compilations happening > simultaneously.
Actually, I think that ParallelHelper might be overkill for what I'm doing now. I think we should move to it once we have truly concurrent compilations. I guess I could also create one ParallelHelperPool per process, but I think that's the only situation where this will make a difference now.
Saam Barati
Comment 13
2016-12-19 16:22:08 PST
Created
attachment 297492
[details]
patch Address Fil's feedback w.r.t MonotinicTime and LockHolder.
WebKit Commit Bot
Comment 14
2016-12-19 16:23:27 PST
Attachment 297492
[details]
did not pass style-queue: ERROR: Source/JavaScriptCore/wasm/WasmPlan.cpp:214: This { should be at the end of the previous line [whitespace/braces] [4] Total errors found: 1 in 6 files If any of these errors are false positives, please file a bug against check-webkit-style.
Saam Barati
Comment 15
2016-12-19 16:46:44 PST
Hmm, I'm getting a debug crash while compiling unity where it looks like we're stack overflowing. I'm investigating.
Filip Pizlo
Comment 16
2016-12-19 16:48:43 PST
(In reply to
comment #12
)
> (In reply to
comment #11
) > > (In reply to
comment #10
) > > > (In reply to
comment #9
) > > > > I think that out of my feedback, the most important is about std::chrono. > > > > Lets not use that anymore. MonotonicTime is what you want instead. > > > > > > > > ParallelVectorIterator seems like exactly what you're doing, so it might be > > > > a good idea. > > > > > > > > ParallelHelperPool is a neat optimization but we don't need it. > > > > > > ParallelHelperPool::runFunctionInParallel is what I want. However, I > > > probably want a ParallelHelperPool per process for compiling Wasm, instead > > > of separate ones per module. I'll start with just having one for the Module, > > > but open a bug to have this shared across module compilations. > > > > This will probably be more important once we have concurrent module > > compilation and may have multiple concurrent compilations happening > > simultaneously. > > Actually, I think that ParallelHelper might be overkill for what I'm doing > now. I think we should move to it once we have truly concurrent > compilations. I guess I could also create one ParallelHelperPool per > process, but I think that's the only situation where this will make a > difference now.
What does it mean to have truly concurrent compilations?
Filip Pizlo
Comment 17
2016-12-19 16:49:52 PST
Comment on
attachment 297492
[details]
patch View in context:
https://bugs.webkit.org/attachment.cgi?id=297492&action=review
> Source/JavaScriptCore/wasm/WasmPlan.cpp:104 > + dataLogLn("Took ", Seconds(MonotonicTime::now() - startTime).microseconds(), " us to validate module");
You don't need Seconds(...). You can simply do (MonotonicTime::now() - startTime).microseconds(). The type of subtracting MonotonicTimes is Seconds.
> Source/JavaScriptCore/wasm/WasmPlan.cpp:236 > + dataLogLn("Took ", Seconds(MonotonicTime::now() - startTime).microseconds(),
Ditto.
Saam Barati
Comment 18
2016-12-19 16:55:27 PST
(In reply to
comment #15
)
> Hmm, I'm getting a debug crash while compiling unity where it looks like > we're stack overflowing. > > I'm investigating.
I think it's because the Wasm Bound checks was a SharedTask instead of an std::function. (This might also be why we're throwing out of bounds memory exceptions!)
Saam Barati
Comment 19
2016-12-19 16:56:14 PST
(In reply to
comment #16
)
> (In reply to
comment #12
) > > (In reply to
comment #11
) > > > (In reply to
comment #10
) > > > > (In reply to
comment #9
) > > > > > I think that out of my feedback, the most important is about std::chrono. > > > > > Lets not use that anymore. MonotonicTime is what you want instead. > > > > > > > > > > ParallelVectorIterator seems like exactly what you're doing, so it might be > > > > > a good idea. > > > > > > > > > > ParallelHelperPool is a neat optimization but we don't need it. > > > > > > > > ParallelHelperPool::runFunctionInParallel is what I want. However, I > > > > probably want a ParallelHelperPool per process for compiling Wasm, instead > > > > of separate ones per module. I'll start with just having one for the Module, > > > > but open a bug to have this shared across module compilations. > > > > > > This will probably be more important once we have concurrent module > > > compilation and may have multiple concurrent compilations happening > > > simultaneously. > > > > Actually, I think that ParallelHelper might be overkill for what I'm doing > > now. I think we should move to it once we have truly concurrent > > compilations. I guess I could also create one ParallelHelperPool per > > process, but I think that's the only situation where this will make a > > difference now. > > What does it mean to have truly concurrent compilations?
While JavaScript code is executing, we can be compiling Wasm. When we're done compiling, we need to resolve the promise with a WebAssembly.Module object.
Filip Pizlo
Comment 20
2016-12-19 17:01:21 PST
(In reply to
comment #18
)
> (In reply to
comment #15
) > > Hmm, I'm getting a debug crash while compiling unity where it looks like > > we're stack overflowing. > > > > I'm investigating. > > I think it's because the Wasm Bound checks was a SharedTask instead of an > std::function. (This might also be why we're throwing out of bounds memory > exceptions!)
I think that it's important to avoid using std::function in B3, because std::function involves expected copying and system malloc. The difference between them is that std::function copies the closure's state around on assignment. I think we want to avoid such state copying. That's why we use WTF::Function, WTF::ScopedLambda, and WTF::SharedTask to get different kinds of zero-copy closures.
Saam Barati
Comment 21
2016-12-19 17:06:22 PST
(In reply to
comment #18
)
> (In reply to
comment #15
) > > Hmm, I'm getting a debug crash while compiling unity where it looks like > > we're stack overflowing. > > > > I'm investigating. > > I think it's because the Wasm Bound checks was a SharedTask instead of an > std::function. (This might also be why we're throwing out of bounds memory > exceptions!)
Ignore me.
Saam Barati
Comment 22
2016-12-19 17:24:28 PST
(In reply to
comment #21
)
> (In reply to
comment #18
) > > (In reply to
comment #15
) > > > Hmm, I'm getting a debug crash while compiling unity where it looks like > > > we're stack overflowing. > > > > > > I'm investigating. > > > > I think it's because the Wasm Bound checks was a SharedTask instead of an > > std::function. (This might also be why we're throwing out of bounds memory > > exceptions!) > > Ignore me.
The bug is this code: ``` static CCallHelpers::Jump generate(Inst& inst, CCallHelpers& jit, GenerationContext& context) { WasmBoundsCheckValue* value = inst.origin->as<WasmBoundsCheckValue>(); CCallHelpers::Jump outOfBounds = Inst(Air::Branch64, value, Arg::relCond(CCallHelpers::AboveOrEqual), inst.args[0], inst.args[1]).generate(jit, context); context.latePaths.append(createSharedTask<GenerationContext::LatePathFunction>( [=] (CCallHelpers& jit, Air::GenerationContext&) { outOfBounds.link(&jit); context.code->wasmBoundsCheckGenerator()->run(jit, value->pinnedGPR(), value->offset()); })); // We said we were not a terminal. return CCallHelpers::Jump(); } ``` Which is copying the Context! That's a no-no. I'll make it so that we don't allow this to happen elsewhere.
Saam Barati
Comment 23
2016-12-19 17:30:25 PST
(In reply to
comment #22
)
> (In reply to
comment #21
) > > (In reply to
comment #18
) > > > (In reply to
comment #15
) > > > > Hmm, I'm getting a debug crash while compiling unity where it looks like > > > > we're stack overflowing. > > > > > > > > I'm investigating. > > > > > > I think it's because the Wasm Bound checks was a SharedTask instead of an > > > std::function. (This might also be why we're throwing out of bounds memory > > > exceptions!) > > > > Ignore me. > > The bug is this code: > ``` > static CCallHelpers::Jump generate(Inst& inst, CCallHelpers& jit, > GenerationContext& context) > { > WasmBoundsCheckValue* value = > inst.origin->as<WasmBoundsCheckValue>(); > CCallHelpers::Jump outOfBounds = Inst(Air::Branch64, value, > Arg::relCond(CCallHelpers::AboveOrEqual), inst.args[0], > inst.args[1]).generate(jit, context); > > > context.latePaths.append(createSharedTask<GenerationContext:: > LatePathFunction>( > [=] (CCallHelpers& jit, Air::GenerationContext&) { > outOfBounds.link(&jit); > context.code->wasmBoundsCheckGenerator()->run(jit, > value->pinnedGPR(), value->offset()); > })); > > // We said we were not a terminal. > return CCallHelpers::Jump(); > } > ``` > > Which is copying the Context! That's a no-no. I'll make it so that we don't > allow this to happen elsewhere.
I'm still not entirely sure why that's the bug. But it appears to be it.
Saam Barati
Comment 24
2016-12-20 11:57:25 PST
Created
attachment 297544
[details]
patch
Saam Barati
Comment 25
2016-12-20 12:12:58 PST
<
rdar://problem/29758107
>
Saam Barati
Comment 26
2016-12-20 12:17:07 PST
Created
attachment 297546
[details]
patch
WebKit Commit Bot
Comment 27
2016-12-20 12:19:24 PST
Attachment 297546
[details]
did not pass style-queue: ERROR: Source/JavaScriptCore/wasm/WasmPlan.cpp:210: This { should be at the end of the previous line [whitespace/braces] [4] Total errors found: 1 in 8 files If any of these errors are false positives, please file a bug against check-webkit-style.
Keith Miller
Comment 28
2016-12-20 15:55:28 PST
Comment on
attachment 297546
[details]
patch r=me.
WebKit Commit Bot
Comment 29
2016-12-20 16:30:18 PST
Comment on
attachment 297546
[details]
patch Clearing flags on attachment: 297546 Committed
r210047
: <
http://trac.webkit.org/changeset/210047
>
WebKit Commit Bot
Comment 30
2016-12-20 16:30:25 PST
All reviewed patches have been landed. Closing bug.
Note
You need to
log in
before you can comment on or make changes to this bug.
Top of Page
Format For Printing
XML
Clone This Bug