Bug 169815: RESOLVED DUPLICATE of bug 169773
WebAssembly: eliminate redundant ARM64 TLS load
https://bugs.webkit.org/show_bug.cgi?id=169815
JF Bastien
Reported 2017-03-17 10:05:08 PDT
This is a small optimization; I'm not sure it'll pay off much, but it's neat.

As part of bug #169611 we're moving the WebAssembly context to a TLS slot. On x86 that's a single load / store off the segment register, but on ARM64 it uses mrs + mask + {load,store}. The `mrs TPIDRRO_EL0` instruction, coupled with the mask and the address generation, simply returns the location of our TLS slot (the offset is defined as WTF_WASM_CONTEXT_KEY in wtf/FastTls.h). That value is idempotent as long as we're executing in the same thread, and that's an invariant of WebAssembly: different instances are set in that context, but the location is the same per thread.

Right now this mrs+mask+memory combo is generated as a single unit by the ARM64 macro assembler, so the compiler cannot see that the mrs+mask part is redundant across accesses. We could instead teach the compiler about the idempotent part (i.e. "get TLS slot #x") and then split off the load / store from that slot. For x86 that could mean combining both operations after the fact, or keeping the same model we have now. For ARM64 that would allow us to eliminate redundant mrs+mask if profitable, or rematerialize them under register pressure.
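To make the proposed split concrete, here is a rough C++ model of the idea, not actual JSC or macro assembler code. Every name in it (`contextSlotAddress`, `kContextKey`, `g_tlsSlots`, the `Context` struct, the counter) is made up for illustration; `kContextKey` merely stands in for WTF_WASM_CONTEXT_KEY, and the counter stands in for the number of mrs+mask sequences that would be emitted.

```cpp
#include <cstddef>

// Hypothetical stand-in for the WebAssembly context stored in the TLS slot.
struct Context { int id; };

// Hypothetical model of the per-thread block that TPIDRRO_EL0 points at.
static thread_local void* g_tlsSlots[8];

// Counts how often the "mrs + mask + address generation" part runs.
static thread_local int g_addressComputations = 0;

// Assumed slot index, standing in for WTF_WASM_CONTEXT_KEY.
constexpr std::size_t kContextKey = 3;

// The idempotent part: "get TLS slot #x". On a given thread it always
// returns the same address.
inline void** contextSlotAddress()
{
    ++g_addressComputations;
    return &g_tlsSlots[kContextKey];
}

// Fused model: every access redoes the address computation, which is
// roughly what a combined mrs+mask+memory sequence amounts to.
inline Context* loadContextFused() { return static_cast<Context*>(*contextSlotAddress()); }
inline void storeContextFused(Context* context) { *contextSlotAddress() = context; }

int runFused(Context* a, Context* b)
{
    storeContextFused(a);
    int sum = loadContextFused()->id;
    storeContextFused(b);
    sum += loadContextFused()->id;
    return sum;
}

// Split model: the address is computed once, and only the plain loads and
// stores through it remain. A compiler that knows the address computation
// is idempotent could do this hoisting itself.
int runSplit(Context* a, Context* b)
{
    void** slot = contextSlotAddress();
    *slot = a;
    int sum = static_cast<Context*>(*slot)->id;
    *slot = b;
    sum += static_cast<Context*>(*slot)->id;
    return sum;
}
```

In this model both functions return the same result, but the fused version bumps the counter on every access while the split version bumps it once per function.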
JF Bastien
Comment 1 2017-03-17 10:56:07 PDT
Fil thinks we just want to pin a register on ARM because the optimization I propose will likely do the same thing by hoisting the redundant load to the top of each function. May as well get rid of the load entirely. Let's just do it as part of bug #169773 then.

*** This bug has been marked as a duplicate of bug 169773 ***