WebKit Bugzilla
New
Browse
Log In
×
Sign in with GitHub
or
Remember my login
Create Account
·
Forgot Password
Forgotten password account recovery
RESOLVED WONTFIX
153535
IRC should know which Tmps hold constants, and emit spill code accordingly
https://bugs.webkit.org/show_bug.cgi?id=153535
Summary
IRC should know which Tmps hold constants, and emit spill code accordingly
Filip Pizlo
Reported
2016-01-26 21:58:25 PST
Patch forthcoming.
Attachments
the patch
(34.75 KB, patch)
2016-01-26 22:02 PST
,
Filip Pizlo
oliver
: review+
Details
Formatted Diff
Diff
View All
Add attachment
proposed patch, testcase, etc.
Filip Pizlo
Comment 1
2016-01-26 22:02:14 PST
Created
attachment 269981
[details]
the patch
WebKit Commit Bot
Comment 2
2016-01-26 22:03:11 PST
Attachment 269981
[details]
did not pass style-queue: ERROR: Source/JavaScriptCore/b3/air/AirTmpSummary.cpp:83: Place brace on its own line for function definitions. [whitespace/braces] [4] Total errors found: 1 in 6 files If any of these errors are false positives, please file a bug against check-webkit-style.
Oliver Hunt
Comment 3
2016-01-26 22:17:08 PST
Comment on
attachment 269981
[details]
the patch R=me -- do you have full perf output?
Benjamin Poulain
Comment 4
2016-01-26 22:27:45 PST
View in context:
https://bugs.webkit.org/attachment.cgi?id=269981&action=review
The idea looks okay but I am not convinced that is safe. You can have: Move(Imm, Tmp1) Move(Tmp1, Tmp2) Add(const, Tmp2) Use(Tmp2) In the first round, you figure that Tmp1 and 2 should really be merged. Due to back luck, Tmp1 is picked as the name. In the second round, you have to spill Tmp1. You check it's value and figure it is a constant. You would need some very bad luck but that seems feasible to me. It seems to me that you have to update the TmpSummary when aliasing. What was before a worse spill selection may end up into a bug now.
> Source/JavaScriptCore/b3/air/AirIteratedRegisterCoalescing.cpp:1226 > + plan.constant = summary->constant;
You can probably get away with less duplication by doing everything bug filling the SlotOrConstant outside the branch. SlotOrConstant plan; if () ... constant plan else ... stackslot plan Give it a temp
> Source/JavaScriptCore/b3/air/AirTmpSummary.h:49 > + int64_t constant { 0 };
Why not store the Air::Arg?
Filip Pizlo
Comment 5
2016-01-26 22:42:47 PST
(In reply to
comment #4
)
> View in context: >
https://bugs.webkit.org/attachment.cgi?id=269981&action=review
> > The idea looks okay but I am not convinced that is safe. > > You can have: > > Move(Imm, Tmp1) > Move(Tmp1, Tmp2) > Add(const, Tmp2) > Use(Tmp2) > > In the first round, you figure that Tmp1 and 2 should really be merged. Due > to back luck, Tmp1 is picked as the name. > In the second round, you have to spill Tmp1. You check it's value and figure > it is a constant. > > You would need some very bad luck but that seems feasible to me. > > It seems to me that you have to update the TmpSummary when aliasing. What > was before a worse spill selection may end up into a bug now.
Yup, you're right.
> > > Source/JavaScriptCore/b3/air/AirIteratedRegisterCoalescing.cpp:1226 > > + plan.constant = summary->constant; > > You can probably get away with less duplication by doing everything bug > filling the SlotOrConstant outside the branch. > > SlotOrConstant plan; > if () > ... constant plan > else > ... stackslot plan > > Give it a temp > > > Source/JavaScriptCore/b3/air/AirTmpSummary.h:49 > > + int64_t constant { 0 }; > > Why not store the Air::Arg?
Filip Pizlo
Comment 6
2016-01-26 22:47:52 PST
I'm not seeing a big speed-up from this. I will retest. But if implementing this means having to do a lot of analysis updates during spilling, then that may be too much complexity for no speed-up.
Filip Pizlo
Comment 7
2016-01-26 23:05:24 PST
Yeah I'm going to abandon this. "FixStackSlot" is really this patch. Benchmark report for SunSpider, V8Spider, Octane, Kraken, and AsmBench on shakezilla (MacBookPro11,3). VMs tested: "LLVM" at /Volumes/Data/secondary/OpenSource/WebKitBuild/Release/jsc (
r195637
) "B3ToT" at /Volumes/Data/tertiary/OpenSource/WebKitBuild/Release/jsc (
r195653
) "FixStackSlot" at /Volumes/Data/quinary/OpenSource/WebKitBuild/Release/jsc (
r195653
) Collected 6 samples per benchmark/VM, with 6 VM invocations per benchmark. Emitted a call to gc() between sample measurements. Used 1 benchmark iteration per VM invocation for warm-up. Used the jsc-specific preciseTime() function to get microsecond-level timing. Reporting benchmark execution times with 95% confidence intervals in milliseconds. LLVM B3ToT FixStackSlot FixStackSlot v. LLVM SunSpider: 3d-cube 4.7894+-0.2559 ? 4.9212+-0.4678 4.6844+-0.0956 might be 1.0224x faster 3d-morph 5.4443+-0.1522 5.4110+-0.2089 ? 5.5740+-0.2290 ? might be 1.0238x slower 3d-raytrace 5.6170+-0.2053 5.5291+-0.0769 5.4171+-0.0780 might be 1.0369x faster access-binary-trees 2.2149+-0.1390 ? 2.5319+-0.6268 2.5278+-0.8981 ? might be 1.1413x slower access-fannkuch 5.7083+-0.2275 ? 5.7520+-0.2915 5.6160+-0.0437 might be 1.0164x faster access-nbody 2.8453+-0.2651 2.8114+-0.3230 ? 2.9543+-0.3683 ? might be 1.0383x slower access-nsieve 3.3110+-0.3001 ? 3.7904+-0.4962 ^ 3.1191+-0.0811 might be 1.0615x faster bitops-3bit-bits-in-byte 1.1755+-0.0140 1.1733+-0.0181 ? 1.2276+-0.0572 ? might be 1.0444x slower bitops-bits-in-byte 3.1551+-0.0492 ? 3.1640+-0.0737 3.1606+-0.0467 ? bitops-bitwise-and 1.9755+-0.0293 ? 2.1064+-0.1746 1.9745+-0.0246 bitops-nsieve-bits 2.9530+-0.0587 ? 3.0345+-0.0485 ^ 2.9362+-0.0125 controlflow-recursive 2.6573+-0.7910 2.4211+-0.0975 2.2872+-0.0419 might be 1.1619x faster crypto-aes 4.0558+-0.0580 4.0356+-0.0843 ? 4.1029+-0.1861 ? might be 1.0116x slower crypto-md5 2.5686+-0.1678 2.4742+-0.0215 ? 2.6687+-0.3550 ? might be 1.0390x slower crypto-sha1 2.5059+-0.2783 2.4903+-0.1492 2.4428+-0.1370 might be 1.0258x faster date-format-tofte 7.3312+-0.2966 6.9319+-0.1673 6.8319+-0.0348 ^ definitely 1.0731x faster date-format-xparb 4.6788+-0.1351 ? 5.0529+-0.8775 4.9351+-0.3952 ? might be 1.0548x slower math-cordic 2.9197+-0.0281 ? 3.1418+-0.3785 2.8995+-0.0521 math-partial-sums 4.9575+-0.2886 4.8896+-0.0920 4.8880+-0.0905 might be 1.0142x faster math-spectral-norm 2.0713+-0.0082 ? 2.0731+-0.0459 2.0570+-0.0632 regexp-dna 6.3435+-0.1982 6.1319+-0.1590 ? 6.2488+-0.2575 might be 1.0152x faster string-base64 4.8879+-0.5220 4.8155+-0.1738 ^ 4.5019+-0.0843 might be 1.0857x faster string-fasta 5.8739+-0.0692 5.8568+-0.1465 5.8030+-0.0912 might be 1.0122x faster string-tagcloud 8.0479+-0.3202 8.0140+-0.3554 7.7443+-0.0373 might be 1.0392x faster string-unpack-code 19.2266+-1.7077 18.3458+-0.8107 ? 18.8305+-1.5089 might be 1.0210x faster string-validate-input 4.4493+-0.1844 ? 4.6125+-0.4244 4.3435+-0.2975 might be 1.0244x faster <arithmetic> 4.6833+-0.1094 4.6735+-0.0593 4.6068+-0.0730 might be 1.0166x faster LLVM B3ToT FixStackSlot FixStackSlot v. LLVM V8Spider: crypto 50.4202+-1.0291 ^ 38.0345+-0.4847 37.6337+-0.6090 ^ definitely 1.3398x faster deltablue 76.6903+-2.3078 ^ 51.4685+-0.8000 ? 53.4559+-1.3808 ^ definitely 1.4346x faster earley-boyer 45.5573+-0.8511 ^ 41.9361+-0.8742 41.8469+-0.4467 ^ definitely 1.0887x faster raytrace 30.8907+-1.3434 ^ 20.3018+-0.6365 ? 20.5273+-0.9701 ^ definitely 1.5049x faster regexp 63.8780+-2.8251 ? 64.8892+-3.2281 64.2653+-2.6902 ? richards 53.2545+-1.1488 ^ 40.1047+-0.5869 ? 47.0843+-16.2073 might be 1.1310x faster splay 38.9781+-1.9016 36.6893+-1.7494 36.2443+-1.8178 might be 1.0754x faster <geometric> 49.4164+-0.6504 ^ 39.8205+-0.3884 ? 40.6534+-1.8265 ^ definitely 1.2156x faster LLVM B3ToT FixStackSlot FixStackSlot v. LLVM Octane: encrypt 0.15251+-0.00208 ! 0.16297+-0.00296 0.15864+-0.00407 ? might be 1.0402x slower decrypt 2.90266+-0.00906 ^ 2.85430+-0.02266 2.83900+-0.00300 ^ definitely 1.0224x faster deltablue x2 0.13560+-0.00320 0.13395+-0.00335 ? 0.13412+-0.00481 might be 1.0111x faster earley 0.28595+-0.00715 0.27764+-0.00259 ? 0.28103+-0.00357 might be 1.0175x faster boyer 4.25335+-0.02297 ? 4.36977+-0.13231 4.29641+-0.00876 ! definitely 1.0101x slower navier-stokes x2 4.86960+-0.17996 4.83430+-0.02650 4.80787+-0.01486 might be 1.0128x faster raytrace x2 0.85306+-0.00544 ! 0.86828+-0.00268 ! 0.88498+-0.00581 ! definitely 1.0374x slower richards x2 0.08555+-0.00069 ^ 0.07882+-0.00122 ? 0.07907+-0.00081 ^ definitely 1.0819x faster splay x2 0.35048+-0.00259 ? 0.35344+-0.00386 0.34989+-0.00126 regexp x2 25.27622+-0.42318 ^ 24.27359+-0.15231 24.05678+-0.33669 ^ definitely 1.0507x faster pdfjs x2 38.03603+-0.31473 37.69347+-0.10104 ? 37.80200+-0.28297 mandreel x2 42.71740+-0.82776 ? 43.15691+-0.64838 ? 44.03764+-2.64013 ? might be 1.0309x slower gbemu x2 29.18268+-0.34543 ! 32.38942+-2.62173 ? 34.13318+-3.96887 ! definitely 1.1696x slower closure 0.56746+-0.00255 0.56601+-0.00445 0.56455+-0.00149 jquery 7.32850+-0.04465 7.32338+-0.03271 ? 7.33013+-0.07065 ? box2d x2 9.16494+-0.05151 ^ 9.02753+-0.08029 ! 9.28304+-0.05046 ! definitely 1.0129x slower zlib x2 376.69156+-16.47696 ? 385.74034+-10.18427 382.34942+-17.89433 ? might be 1.0150x slower typescript x2 686.20626+-22.49363 675.96883+-17.63547 668.51090+-12.70805 might be 1.0265x faster <geometric> 5.29248+-0.02903 ? 5.29388+-0.03583 ? 5.31485+-0.06388 ? might be 1.0042x slower LLVM B3ToT FixStackSlot FixStackSlot v. LLVM Kraken: ai-astar 129.422+-3.409 ^ 94.272+-3.070 93.845+-0.878 ^ definitely 1.3791x faster audio-beat-detection 49.567+-0.523 ! 55.176+-0.205 ? 55.540+-0.838 ! definitely 1.1205x slower audio-dft 95.289+-1.367 ? 96.266+-1.702 ? 96.545+-1.788 ? might be 1.0132x slower audio-fft 35.509+-0.656 ! 44.368+-0.817 43.882+-0.801 ! definitely 1.2358x slower audio-oscillator 59.799+-1.799 ^ 52.255+-1.896 51.419+-1.543 ^ definitely 1.1630x faster imaging-darkroom 60.026+-0.746 ? 60.150+-0.083 ? 60.180+-0.075 ? imaging-desaturate 49.252+-2.382 ! 86.944+-0.573 86.811+-0.858 ! definitely 1.7626x slower imaging-gaussian-blur 89.319+-4.651 ^ 68.899+-2.958 68.210+-0.399 ^ definitely 1.3095x faster json-parse-financial 37.175+-0.642 ? 37.700+-1.064 ? 43.903+-15.552 ? might be 1.1810x slower json-stringify-tinderbox 23.588+-0.719 23.321+-1.359 ? 23.795+-3.512 ? stanford-crypto-aes 43.097+-1.236 ^ 40.649+-1.166 40.378+-1.087 ^ definitely 1.0673x faster stanford-crypto-ccm 37.085+-1.656 ? 37.134+-1.540 ? 37.901+-1.152 ? might be 1.0220x slower stanford-crypto-pbkdf2 97.625+-0.655 ! 98.836+-0.420 ! 99.920+-0.597 ! definitely 1.0235x slower stanford-crypto-sha256-iterative 37.577+-1.650 ? 38.586+-1.140 ? 39.282+-1.340 ? might be 1.0454x slower <arithmetic> 60.309+-0.264 ^ 59.611+-0.375 ? 60.115+-1.551 might be 1.0032x faster LLVM B3ToT FixStackSlot FixStackSlot v. LLVM AsmBench: bigfib.cpp 446.1048+-7.0531 ? 460.2276+-16.1047 453.6863+-6.0451 ? might be 1.0170x slower cray.c 390.9513+-1.0508 ^ 375.1922+-5.9795 ? 376.4558+-4.4787 ^ definitely 1.0385x faster dry.c 426.2250+-2.4951 ^ 411.9704+-7.2703 ? 415.3663+-11.4569 might be 1.0261x faster FloatMM.c 680.7536+-1.3438 ! 734.1637+-2.4712 ? 734.8278+-4.1856 ! definitely 1.0794x slower gcc-loops.cpp 3414.7884+-16.0611 ! 3506.4532+-9.9958 ? 3510.0105+-22.5020 ! definitely 1.0279x slower n-body.c 820.3629+-1.0765 ! 852.7295+-2.3957 851.1387+-2.3577 ! definitely 1.0375x slower Quicksort.c 406.3832+-1.0965 ^ 397.3653+-4.6427 392.7680+-2.4018 ^ definitely 1.0347x faster stepanov_container.cpp 3478.7534+-26.8152 ^ 3220.7915+-31.0237 ? 3245.4424+-15.6376 ^ definitely 1.0719x faster Towers.c 232.1444+-1.1496 ! 250.9630+-2.9381 ? 253.3819+-5.0959 ! definitely 1.0915x slower <geometric> 711.1437+-1.5959 ? 717.0700+-5.3172 ? 717.3056+-2.8316 ! definitely 1.0087x slower LLVM B3ToT FixStackSlot FixStackSlot v. LLVM Geomean of preferred means: <scaled-result> 34.9997+-0.1968 ^ 33.4868+-0.1406 ? 33.6097+-0.4412 ^ definitely 1.0414x faster
Note
You need to
log in
before you can comment on or make changes to this bug.
Top of Page
Format For Printing
XML
Clone This Bug