WebKit Bugzilla
RESOLVED FIXED
150777
Consider still matching an address expression even if B3 has already assigned a Tmp to it
https://bugs.webkit.org/show_bug.cgi?id=150777
Filip Pizlo
Reported
2015-11-01 11:40:45 PST
Probably if we want to do hoisting of address expressions, we should have some better heuristics for it.
Attachments
the patch (2.49 KB, patch)
2015-12-10 09:16 PST, Filip Pizlo
ggaren: review+
Filip Pizlo
Comment 1
2015-12-10 09:16:24 PST
Created attachment 267111
the patch
Geoffrey Garen
Comment 2
2015-12-10 15:34:51 PST
Comment on attachment 267111
the patch

I would have guessed 2.
Filip Pizlo
Comment 3
2015-12-10 18:44:29 PST
(In reply to comment #2)
> Comment on attachment 267111
> the patch
>
> I would have guessed 2.

Depends on the architecture and a lot of other things. On Intel, any address expression takes 1 cycle, regardless of its complexity. So if you do this:

    mov (%rax), %rax

then you will spend 1 cycle deducing that the address is simply the thing in %rax. And if you do this:

    mov 42(%rcx,%rdx,4), %rax

then you will also spend 1 cycle deducing what the address is. Therefore, if you had to pick between the following two snippets, you would pick the one with fewer instructions even though it computes the same address twice.

This is faster:

    mov 42(%rcx,%rdx,4), %rax
    mov 46(%rcx,%rdx,4), %rdi

than this:

    lea 42(%rcx,%rdx,4), %rsi
    mov (%rsi), %rax
    mov 4(%rsi), %rdi

This would still be true even if the address expressions were identical rather than the second one being offset by +4. The first version is faster because even though we recompute the same seemingly complex address expression, it's actually free to do so.

Hence, setting the threshold to 2 is probably not a good thing. It's far too low. The last time I wrote a compiler, I set the threshold to +Inf. Later on, I learned through whispers in the wind that you want to have some kind of upper limit, though I never learned the reason.

I suspect that the reason is just registers. It's possible that in the "faster" example above, keeping %rcx and %rdx alive causes too much register pressure. You don't know if that's an issue at the time that you do instruction selection, and usually it won't be an issue. But you can see how if you really had a lot of uses of the same address, then the second form may be better because it requires only one register to be alive for the memory access to compute the address.

So, 2 is probably too low because it adds instructions without reducing the amount of work, but +Inf is probably too high because at some point the register pressure of keeping all of the inputs to the address computation alive is a bigger issue than the cost of the "lea" instruction.
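
The trade-off described in this comment can be sketched as a small cost model. This is only an illustration, not B3's actual instruction selector: `AddressUse`, `Strategy`, `chooseStrategy`, `instructionCount`, and `liveAddressRegisters` are hypothetical names, and the threshold passed in is an arbitrary assumption rather than the value used by the patch.

```cpp
#include <cstddef>

// Hypothetical model of one address expression such as 42(%rcx,%rdx,4).
struct AddressUse {
    std::size_t numUses;      // number of memory accesses that use this address
    std::size_t numInputRegs; // registers feeding the address (2 for base + index)
};

enum class Strategy {
    FoldIntoEachAccess, // repeat the full address expression in every mov
    MaterializeWithLea  // compute the address once with lea, then use (%reg)
};

// Assumption: folding costs nothing extra per access, so fold until the
// number of uses exceeds a tunable upper limit (maxFoldedUses).
inline Strategy chooseStrategy(const AddressUse& address, std::size_t maxFoldedUses)
{
    return address.numUses <= maxFoldedUses
        ? Strategy::FoldIntoEachAccess
        : Strategy::MaterializeWithLea;
}

// Instructions emitted: one per access, plus one lea when materializing.
inline std::size_t instructionCount(const AddressUse& address, Strategy strategy)
{
    return strategy == Strategy::FoldIntoEachAccess
        ? address.numUses
        : address.numUses + 1;
}

// Registers that must stay live across the accesses to form the address.
inline std::size_t liveAddressRegisters(const AddressUse& address, Strategy strategy)
{
    return strategy == Strategy::FoldIntoEachAccess
        ? address.numInputRegs
        : 1;
}
```

For the two-load example above, folding emits 2 instructions and keeps 2 address registers live, while the lea form emits 3 instructions and keeps 1 live. A larger limit favors fewer instructions; a smaller one favors lower register pressure.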
Filip Pizlo
Comment 4
2015-12-10 19:42:31 PST
Landed in http://trac.webkit.org/changeset/193941