Bug 213374

Summary: Bring back llint on x86_32
Product: WebKit Reporter: Angelos Oikonomopoulos <angelos>
Component: New BugsAssignee: Nobody <webkit-unassigned>
Status: NEW ---    
Severity: Normal CC: annulen, benjamin, cdumez, cmarcelo, ews-watchlist, gyuyoung.kim, jsc32, keith_miller, mark.lam, msaboff, ryuan.choi, saam, sergio, tzagallo, xan.lopez
Priority: P2    
Version: WebKit Nightly Build   
Hardware: Unspecified   
OS: Unspecified   
Attachments:
Description Flags
Patch
none
Patch angelos: review?, ews-feeder: commit-queue-

Description Angelos Oikonomopoulos 2020-06-19 07:00:29 PDT
Bring back llint on x86_32
Comment 1 Angelos Oikonomopoulos 2020-06-19 07:00:49 PDT
Created attachment 402279 [details]
Patch
Comment 2 Angelos Oikonomopoulos 2020-06-19 07:03:22 PDT
The idea behind bringing back llint on x86_32 is to have faster feedback when patches introduce 32-bit-only issues.

This is to get some feedback on the approach taken; it currently doesn't work on !X86 as it unconditionally treats PB as a 'special' register. That's something that can be fixed if this is a viable approach.
Comment 3 Angelos Oikonomopoulos 2020-06-23 07:38:33 PDT
Some indicative performance numbers (--outer=40 --sunspider):

Without llint:
3d-cube                        51.5102+-0.2449
3d-morph                       63.6048+-0.2603
3d-raytrace                    64.8813+-0.1663
access-binary-trees            25.9787+-0.1334
access-fannkuch                78.2869+-0.7029
access-nbody                   59.6337+-0.1311
access-nsieve                  21.9120+-0.2588
bitops-3bit-bits-in-byte       34.9708+-0.4621
bitops-bits-in-byte            60.5052+-0.2608
bitops-bitwise-and             54.8364+-0.2157
bitops-nsieve-bits             74.9143+-0.3179
controlflow-recursive          30.6324+-0.1998
crypto-aes                     40.6281+-0.2096
crypto-md5                     26.3450+-0.2055
crypto-sha1                    27.5882+-0.2081
date-format-tofte              46.7974+-0.2031
date-format-xparb              42.7510+-0.2802
math-cordic                    79.9764+-0.2333
math-partial-sums              60.7575+-0.1682
math-spectral-norm             34.5550+-0.1516
regexp-dna                    267.1121+-2.7087
string-base64                  36.3143+-0.2191
string-fasta                   71.9946+-0.3827
string-tagcloud                66.6965+-0.9069
string-unpack-code             95.6829+-0.5120
string-validate-input          42.4582+-0.1960

<arithmetic>                   60.0509+-0.1167

With llint:
3d-cube                        43.0637+-0.0248
3d-morph                       79.0947+-0.1341
3d-raytrace                    66.4359+-0.0810
access-binary-trees            32.2224+-0.0970
access-fannkuch               112.2811+-0.3679
access-nbody                   57.2866+-0.0982
access-nsieve                  31.4300+-0.1631
bitops-3bit-bits-in-byte       52.1172+-0.1066
bitops-bits-in-byte            83.6667+-0.2314
bitops-bitwise-and             94.8909+-0.2878
bitops-nsieve-bits             97.1129+-0.1766
controlflow-recursive          39.5990+-0.0967
crypto-aes                     52.2182+-0.1408
crypto-md5                     36.5657+-0.0971
crypto-sha1                    37.2783+-0.1114
date-format-tofte              54.5934+-0.1388
date-format-xparb              43.9826+-0.1481
math-cordic                    93.3353+-0.1531
math-partial-sums              63.9966+-0.1377
math-spectral-norm             40.6732+-0.2665
regexp-dna                    264.8693+-1.1019
string-base64                  43.1855+-0.1120
string-fasta                   83.9799+-0.1919
string-tagcloud                73.1036+-0.1208
string-unpack-code             95.8466+-0.1343
string-validate-input          47.0231+-0.0753

<arithmetic>                   69.9943+-0.0545
Comment 4 Saam Barati 2020-06-23 12:22:47 PDT
the unit here is score or time?
Comment 5 Angelos Oikonomopoulos 2020-06-24 07:46:18 PDT
Score, AFAICT. What run-javascript-benchmarks produces by default.
Comment 6 Angelos Oikonomopoulos 2020-06-24 07:47:00 PDT
I should note that the purpose of this patch is not better performance on x86_32. Rather, it is to enable faster testing of llint-related changes on 32-bit platforms.
Comment 7 Angelos Oikonomopoulos 2020-07-03 04:48:49 PDT
Created attachment 403450 [details]
Patch
Comment 8 Angelos Oikonomopoulos 2020-07-20 02:21:28 PDT
So, the main thing I'm looking for feedback on is the magical treatment of PB on x86 (32 bits).

Instead of abstracting away all accesses to PB behind offlineasm macros, we leave PB as a register and, on X86,
- - rewrite the actual PB-using instruction to reference a temporary reg (which we get by spilling a value)
- prepend instructions to load the value of PB from the stack in the temporary reg (if it's being read from)
- append instructions to write it back to the stack from the temporary reg

The alternative would be to mediate all accesses to PB via offlineasm macros on all platforms. This way, we don't complicate the llint .asm code for the benefit of a single platform.
Comment 9 Keith Miller 2020-07-20 09:00:16 PDT
(In reply to Angelos Oikonomopoulos from comment #6)
> I should note that the purpose of this patch is not better performance on
> x86_32. Rather, it is to enable faster testing of llint-related changes on
> 32-bit platforms.

Can you clarify how your patch achieves this goal? It's not immediately obvious. So it's hard for me to provide actionable feedback.

(In reply to Angelos Oikonomopoulos from comment #8)
> So, the main thing I'm looking for feedback on is the magical treatment of
> PB on x86 (32 bits).
> 
> Instead of abstracting away all accesses to PB behind offlineasm macros, we
> leave PB as a register and, on X86,
> - - rewrite the actual PB-using instruction to reference a temporary reg
> (which we get by spilling a value)
> - prepend instructions to load the value of PB from the stack in the
> temporary reg (if it's being read from)
> - append instructions to write it back to the stack from the temporary reg
> 
> The alternative would be to mediate all accesses to PB via offlineasm macros
> on all platforms. This way, we don't complicate the llint .asm code for the
> benefit of a single platform.

Generally, I'd rather have a special .asm file for X86, if it's particularly special. We don't look at the offline assembler code too often, so keeping that code as clean as possible has a lot of value.
Comment 10 Angelos Oikonomopoulos 2020-07-21 00:37:15 PDT
(In reply to Keith Miller from comment #9)
> (In reply to Angelos Oikonomopoulos from comment #6)
> > I should note that the purpose of this patch is not better performance on
> > x86_32. Rather, it is to enable faster testing of llint-related changes on
> > 32-bit platforms.
> 
> Can you clarify how your patch achieves this goal? It's not immediately
> obvious. So it's hard for me to provide actionable feedback.

Thanks for taking a look. The idea is that we get back test results faster on 32-bit X86 so, if llint is being tested there too, developers would get feedback sooner from the EWS if a change they introduce breaks 32-bit platforms.

> (In reply to Angelos Oikonomopoulos from comment #8)
[...]
> > The alternative would be to mediate all accesses to PB via offlineasm macros
> > on all platforms. This way, we don't complicate the llint .asm code for the
> > benefit of a single platform.
> 
> Generally, I'd rather have a special .asm file for X86, if it's particularly
> special. We don't look at the offline assembler code too often, so keeping
> that code as clean as possible has a lot of value.

While I'm not against this in principle, I think it would defeat our purpose here. We're only interested in 32-bit X86 in as much as it shares code paths with the other 32-bit platforms we do care about ;-)