Xtensa LX Square Root/Reciprocal Square Root Inline ASM Sequence
or: how I learned to stop using 0x5F3759DF and love the RSQRT0.S
While working on a personal project with an ESP32-S3, I noticed that while the toolchain-provided GCC seemed to generate the proper assembly sequences for floating-point division and square root, there wasn't really a way to coax it into generating the reciprocal square root sequence. So for 1.0f / sqrtf(x) you'd be stuck running the square root sequence and then the division sequence back to back (it's been a while; it may have generated the actual RECIP.S sequence instead of the DIV.S one). That's slow and it sucks and I hate it, which are the three main factors that motivate me to write any software at all, so I made one of those fancy-pants GCC inline assembly functions with register constraints and clobbers and whatnot. I don't know how often this will actually be useful, but I couldn't find it online at the time, so I figure it's worth documenting.
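For reference, the reciprocal square root sequence is an initial estimate from RSQRT0.S refined by two Newton-Raphson steps on f(r) = 1/r^2 - x. The derivation below is my reading of the instruction sequence in the function that follows, assuming `const.s ..., 3` loads the constant 0.5 (CONST.S takes a small index into a table of constants, not a literal value). Each step works out to:

$$
\begin{aligned}
r_{n+1} &= r_n - \frac{f(r_n)}{f'(r_n)} = r_n + \frac{r_n^3}{2}\left(\frac{1}{r_n^2} - x\right) = r_n + \tfrac{1}{2}\,r_n\left(1 - x\,r_n^2\right) \\[4pt]
a &= x\,r_n && \texttt{mul.s} \\
c &= \tfrac{1}{2}\,r_n && \texttt{const.s 3},\ \texttt{mul.s} \\
d &= 1 - a\,r_n = 1 - x\,r_n^2 && \texttt{const.s 1},\ \texttt{msub.s} \\
r_{n+1} &= r_n + c\,d && \texttt{madd.s}\ /\ \texttt{maddn.s}
\end{aligned}
$$

If r_n = (1 + ε)/√x, one step gives roughly r_{n+1} ≈ (1 − 3ε²/2)/√x, so the relative error approximately squares on every iteration. That is why a small, fixed number of steps after the hardware seed is enough.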
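```c
static float rsqrtsf(float x) {
    float a, b, c, d;   /* scratch FP registers used by the sequence */
    float result;
    asm volatile(
        /* Initial estimate: result ~= 1/sqrt(x) */
        "rsqrt0.s %[r], %[x]\n"
        /* Newton-Raphson step 1: r = r + 0.5*r*(1 - x*r*r) */
        "mul.s %[a], %[x], %[r]\n"     /* a = x*r */
        "const.s %[b], 3\n"            /* b = 0.5 (CONST.S immediate 3) */
        "mul.s %[c], %[b], %[r]\n"     /* c = 0.5*r */
        "const.s %[d], 1\n"            /* d = 1.0 */
        "msub.s %[d], %[a], %[r]\n"    /* d = 1 - x*r*r */
        "madd.s %[r], %[c], %[d]\n"    /* r = r + c*d */
        /* Newton-Raphson step 2 (b still holds 0.5) */
        "mul.s %[a], %[x], %[r]\n"
        "mul.s %[c], %[b], %[r]\n"
        "const.s %[d], 1\n"
        "msub.s %[d], %[a], %[r]\n"
        "maddn.s %[r], %[c], %[d]\n"
        /* Outputs are written while x is still being read, so all of them
           are early-clobber ("=&f") to keep them out of x's register. */
        : [r]"=&f"(result),
          [a]"=&f"(a),
          [b]"=&f"(b),
          [c]"=&f"(c),
          [d]"=&f"(d)
        : [x]"f"(x)
    );
    return result;
}
```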
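To sanity-check the arithmetic on a desktop machine (where there is no RSQRT0.S), here's a portable C sketch that mirrors the asm step for step. The seed is the big assumption: `rsqrt_seed` is a hypothetical stand-in that uses the classic 0x5F3759DF bit trick from the title, not a model of what RSQRT0.S actually returns. Plain float multiplies and adds also stand in for the fused MSUB.S/MADD.S/MADDN.S, and all three function names are made up for this sketch.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for RSQRT0.S: the classic 0x5F3759DF bit trick.
   It only gives a rough seed (a few percent relative error); the real
   instruction's estimate comes from the FPU and will differ. */
static float rsqrt_seed(float x) {
    uint32_t bits;
    float y;
    memcpy(&bits, &x, sizeof bits);      /* reinterpret float bits safely */
    bits = 0x5F3759DFu - (bits >> 1);
    memcpy(&y, &bits, sizeof y);
    return y;
}

/* One Newton-Raphson step, with locals named after the asm operands:
   r_next = r + 0.5*r*(1 - x*r*r). Not fused, unlike the hardware. */
static float rsqrt_step(float x, float r) {
    const float b = 0.5f;       /* const.s %[b], 3 */
    float a = x * r;            /* mul.s   %[a], %[x], %[r] */
    float c = b * r;            /* mul.s   %[c], %[b], %[r] */
    float d = 1.0f - a * r;     /* const.s %[d], 1 ; msub.s %[d], %[a], %[r] */
    return r + c * d;           /* madd.s / maddn.s %[r], %[c], %[d] */
}

/* Portable model of rsqrtsf(): seed plus two Newton-Raphson steps.
   Intended for positive, finite x only. */
static float rsqrtsf_model(float x) {
    float r = rsqrt_seed(x);
    r = rsqrt_step(x, r);
    r = rsqrt_step(x, r);
    return r;
}
```

With this cruder seed, two steps land within a few parts per million of 1/√x rather than necessarily at full single precision, so treat it as a check on the algebra, not on the hardware's rounding.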