Bug 153321

Summary:

B3 CSE should be able to match a full redundancy even if none of the matches dominate the value in question

Product:

WebKit

Reporter:

Filip Pizlo <fpizlo>

Component:

JavaScriptCore

Assignee:

Filip Pizlo <fpizlo>

Status:

RESOLVED FIXED

Severity:

Normal

CC:

barraclough, benjamin, commit-queue, ggaren, keith_miller, mario, mark.lam, mhahnenb, msaboff, oliver, saam, sam

Priority:

Version:

WebKit Nightly Build

Hardware:

All

OS:

All

Bug Depends on:

Bug Blocks:

150507

Attachments:

Description	Flags
wrote some code	none
makes FTL B3 faster than FTL LLVM on Octane/richards	none
passing so many tests	none
current performance standings	none
the patch	benjamin: review+

Filip Pizlo

Reported 2016-01-21 13:48:53 PST

Patch forthcoming. This is important for Octane/richards. And probably other things too.

Attachments
wrote some code (12.99 KB, patch) 2016-01-21 13:50 PST, Filip Pizlo	no flags	Details Formatted Diff Diff
makes FTL B3 faster than FTL LLVM on Octane/richards (19.62 KB, patch) 2016-01-21 15:28 PST, Filip Pizlo	no flags	Details Formatted Diff Diff
passing so many tests (22.97 KB, patch) 2016-01-21 16:17 PST, Filip Pizlo	no flags	Details Formatted Diff Diff
current performance standings (80.92 KB, text/plain) 2016-01-21 17:11 PST, Filip Pizlo	no flags	Details
the patch (26.42 KB, patch) 2016-01-21 18:40 PST, Filip Pizlo	benjamin: review+	Details Formatted Diff Diff
Show Obsolete (3) View All Add attachment proposed patch, testcase, etc.

Filip Pizlo

Comment 1 2016-01-21 13:50:05 PST

Created attachment 269488 [details] wrote some code

Filip Pizlo

Comment 2 2016-01-21 15:28:15 PST

Created attachment 269504 [details] makes FTL B3 faster than FTL LLVM on Octane/richards I should probably run more than just richards before r?.

Geoffrey Garen

Comment 3 2016-01-21 15:45:25 PST

Comment on attachment 269504 [details] makes FTL B3 faster than FTL LLVM on Octane/richards View in context: https://bugs.webkit.org/attachment.cgi?id=269504&action=review > Source/JavaScriptCore/dfg/DFGCommon.h:41 > -#define FTL_USES_B3 0 > +#define FTL_USES_B3 1 Revert.

Filip Pizlo

Comment 4 2016-01-21 15:46:46 PST

(In reply to comment #3) > Comment on attachment 269504 [details] > makes FTL B3 faster than FTL LLVM on Octane/richards > > View in context: > https://bugs.webkit.org/attachment.cgi?id=269504&action=review > > > Source/JavaScriptCore/dfg/DFGCommon.h:41 > > -#define FTL_USES_B3 0 > > +#define FTL_USES_B3 1 > > Revert. I will this time!

Filip Pizlo

Comment 5 2016-01-21 16:17:02 PST

Created attachment 269513 [details] passing so many tests

Filip Pizlo

Comment 6 2016-01-21 17:11:27 PST

Created attachment 269523 [details] current performance standings

Benjamin Poulain

Comment 7 2016-01-21 18:18:46 PST

Comment on attachment 269513 [details] passing so many tests View in context: https://bugs.webkit.org/attachment.cgi?id=269513&action=review I have a question for Load8Z and Load16Z. > Source/JavaScriptCore/b3/B3EliminateCommonSubexpressions.cpp:64 > typedef Vector<MemoryValue*, 1> MemoryMatches; Looks like this can be scoped inside the CSE now. > Source/JavaScriptCore/b3/B3EliminateCommonSubexpressions.cpp:211 > return candidate->offset() == offset > && (candidate->opcode() == Load8Z || candidate->opcode() == Store8); Say I have 3 blocks: A, B, C. A: @1 = Const32(0xffffffff) @2 = Store8(@1, @mem) @3 = Jump C. B: @4 = Const32(0xff00ff00) @5 = Store8(@1, @ mem) @6 = Jump C. C: @7 = Load8Z(@mem) -> The load-store would nuke the top bits of my constants. Now after changed into StackSlots and back to SSA, the full values is taken. Am I missing something? > Source/JavaScriptCore/b3/B3EliminateCommonSubexpressions.cpp:466 > typedef Vector<Value*, 1> Matches; Looks like this redefines Matches from B3PureCSE. That's probably unnecessary.

Filip Pizlo

Comment 8 2016-01-21 18:27:05 PST

(In reply to comment #7) > Comment on attachment 269513 [details] > passing so many tests > > View in context: > https://bugs.webkit.org/attachment.cgi?id=269513&action=review > > I have a question for Load8Z and Load16Z. > > > Source/JavaScriptCore/b3/B3EliminateCommonSubexpressions.cpp:64 > > typedef Vector<MemoryValue*, 1> MemoryMatches; > > Looks like this can be scoped inside the CSE now. > > > Source/JavaScriptCore/b3/B3EliminateCommonSubexpressions.cpp:211 > > return candidate->offset() == offset > > && (candidate->opcode() == Load8Z || candidate->opcode() == Store8); > > Say I have 3 blocks: A, B, C. > > A: > @1 = Const32(0xffffffff) > @2 = Store8(@1, @mem) > @3 = Jump C. > > B: > @4 = Const32(0xff00ff00) > @5 = Store8(@1, @ mem) > @6 = Jump C. > > C: > @7 = Load8Z(@mem) > > -> The load-store would nuke the top bits of my constants. > > Now after changed into StackSlots and back to SSA, the full values is taken. > Am I missing something? It's a bug! I'll add a test and fix. > > > Source/JavaScriptCore/b3/B3EliminateCommonSubexpressions.cpp:466 > > typedef Vector<Value*, 1> Matches; > > Looks like this redefines Matches from B3PureCSE. That's probably > unnecessary.

Filip Pizlo

Comment 9 2016-01-21 18:40:58 PST

Created attachment 269537 [details] the patch

Benjamin Poulain

Comment 10 2016-01-21 18:44:07 PST

Comment on attachment 269537 [details] the patch View in context: https://bugs.webkit.org/attachment.cgi?id=269537&action=review > Source/JavaScriptCore/b3/B3EliminateCommonSubexpressions.cpp:217 > + Value* sext = m_proc.add<Value>( sext->zext? > Source/JavaScriptCore/b3/B3EliminateCommonSubexpressions.cpp:257 > + Value* sext = m_proc.add<Value>( sext->zext? > Source/JavaScriptCore/b3/testb3.cpp:9400 > +void testStore8Load8Z(int32_t value) I think it would be good to have a new test for the new feature too (with the value being stored in predecessors).

Filip Pizlo

Comment 11 2016-01-21 19:32:21 PST

Landed in http://trac.webkit.org/changeset/195434

Note You need to log in before you can comment on or make changes to this bug.