Bug 153321

Summary: B3 CSE should be able to match a full redundancy even if none of the matches dominate the value in question
Product: WebKit Reporter: Filip Pizlo <fpizlo>
Component: JavaScriptCoreAssignee: Filip Pizlo <fpizlo>
Status: RESOLVED FIXED    
Severity: Normal CC: barraclough, benjamin, commit-queue, ggaren, keith_miller, mario, mark.lam, mhahnenb, msaboff, oliver, saam, sam
Priority: P2    
Version: WebKit Nightly Build   
Hardware: All   
OS: All   
Bug Depends on:    
Bug Blocks: 150507    
Attachments:
Description Flags
wrote some code
none
makes FTL B3 faster than FTL LLVM on Octane/richards
none
passing so many tests
none
current performance standings
none
the patch benjamin: review+

Filip Pizlo
Reported 2016-01-21 13:48:53 PST
Patch forthcoming. This is important for Octane/richards. And probably other things too.
Attachments
wrote some code (12.99 KB, patch)
2016-01-21 13:50 PST, Filip Pizlo
no flags
makes FTL B3 faster than FTL LLVM on Octane/richards (19.62 KB, patch)
2016-01-21 15:28 PST, Filip Pizlo
no flags
passing so many tests (22.97 KB, patch)
2016-01-21 16:17 PST, Filip Pizlo
no flags
current performance standings (80.92 KB, text/plain)
2016-01-21 17:11 PST, Filip Pizlo
no flags
the patch (26.42 KB, patch)
2016-01-21 18:40 PST, Filip Pizlo
benjamin: review+
Filip Pizlo
Comment 1 2016-01-21 13:50:05 PST
Created attachment 269488 [details] wrote some code
Filip Pizlo
Comment 2 2016-01-21 15:28:15 PST
Created attachment 269504 [details] makes FTL B3 faster than FTL LLVM on Octane/richards I should probably run more than just richards before r?.
Geoffrey Garen
Comment 3 2016-01-21 15:45:25 PST
Comment on attachment 269504 [details] makes FTL B3 faster than FTL LLVM on Octane/richards View in context: https://bugs.webkit.org/attachment.cgi?id=269504&action=review > Source/JavaScriptCore/dfg/DFGCommon.h:41 > -#define FTL_USES_B3 0 > +#define FTL_USES_B3 1 Revert.
Filip Pizlo
Comment 4 2016-01-21 15:46:46 PST
(In reply to comment #3) > Comment on attachment 269504 [details] > makes FTL B3 faster than FTL LLVM on Octane/richards > > View in context: > https://bugs.webkit.org/attachment.cgi?id=269504&action=review > > > Source/JavaScriptCore/dfg/DFGCommon.h:41 > > -#define FTL_USES_B3 0 > > +#define FTL_USES_B3 1 > > Revert. I will this time!
Filip Pizlo
Comment 5 2016-01-21 16:17:02 PST
Created attachment 269513 [details] passing so many tests
Filip Pizlo
Comment 6 2016-01-21 17:11:27 PST
Created attachment 269523 [details] current performance standings
Benjamin Poulain
Comment 7 2016-01-21 18:18:46 PST
Comment on attachment 269513 [details] passing so many tests View in context: https://bugs.webkit.org/attachment.cgi?id=269513&action=review I have a question for Load8Z and Load16Z. > Source/JavaScriptCore/b3/B3EliminateCommonSubexpressions.cpp:64 > typedef Vector<MemoryValue*, 1> MemoryMatches; Looks like this can be scoped inside the CSE now. > Source/JavaScriptCore/b3/B3EliminateCommonSubexpressions.cpp:211 > return candidate->offset() == offset > && (candidate->opcode() == Load8Z || candidate->opcode() == Store8); Say I have 3 blocks: A, B, C. A: @1 = Const32(0xffffffff) @2 = Store8(@1, @mem) @3 = Jump C. B: @4 = Const32(0xff00ff00) @5 = Store8(@1, @ mem) @6 = Jump C. C: @7 = Load8Z(@mem) -> The load-store would nuke the top bits of my constants. Now after changed into StackSlots and back to SSA, the full values is taken. Am I missing something? > Source/JavaScriptCore/b3/B3EliminateCommonSubexpressions.cpp:466 > typedef Vector<Value*, 1> Matches; Looks like this redefines Matches from B3PureCSE. That's probably unnecessary.
Filip Pizlo
Comment 8 2016-01-21 18:27:05 PST
(In reply to comment #7) > Comment on attachment 269513 [details] > passing so many tests > > View in context: > https://bugs.webkit.org/attachment.cgi?id=269513&action=review > > I have a question for Load8Z and Load16Z. > > > Source/JavaScriptCore/b3/B3EliminateCommonSubexpressions.cpp:64 > > typedef Vector<MemoryValue*, 1> MemoryMatches; > > Looks like this can be scoped inside the CSE now. > > > Source/JavaScriptCore/b3/B3EliminateCommonSubexpressions.cpp:211 > > return candidate->offset() == offset > > && (candidate->opcode() == Load8Z || candidate->opcode() == Store8); > > Say I have 3 blocks: A, B, C. > > A: > @1 = Const32(0xffffffff) > @2 = Store8(@1, @mem) > @3 = Jump C. > > B: > @4 = Const32(0xff00ff00) > @5 = Store8(@1, @ mem) > @6 = Jump C. > > C: > @7 = Load8Z(@mem) > > -> The load-store would nuke the top bits of my constants. > > Now after changed into StackSlots and back to SSA, the full values is taken. > Am I missing something? It's a bug! I'll add a test and fix. > > > Source/JavaScriptCore/b3/B3EliminateCommonSubexpressions.cpp:466 > > typedef Vector<Value*, 1> Matches; > > Looks like this redefines Matches from B3PureCSE. That's probably > unnecessary.
Filip Pizlo
Comment 9 2016-01-21 18:40:58 PST
Created attachment 269537 [details] the patch
Benjamin Poulain
Comment 10 2016-01-21 18:44:07 PST
Comment on attachment 269537 [details] the patch View in context: https://bugs.webkit.org/attachment.cgi?id=269537&action=review > Source/JavaScriptCore/b3/B3EliminateCommonSubexpressions.cpp:217 > + Value* sext = m_proc.add<Value>( sext->zext? > Source/JavaScriptCore/b3/B3EliminateCommonSubexpressions.cpp:257 > + Value* sext = m_proc.add<Value>( sext->zext? > Source/JavaScriptCore/b3/testb3.cpp:9400 > +void testStore8Load8Z(int32_t value) I think it would be good to have a new test for the new feature too (with the value being stored in predecessors).
Filip Pizlo
Comment 11 2016-01-21 19:32:21 PST
Note You need to log in before you can comment on or make changes to this bug.