Xtensa LX Square Root/Reciprocal Square Root Inline ASM Sequence

or: how I learned to stop using 0x5F3759DF and love the RSQRT0.S

While working on a personal project with an ESP32-S3, I noticed that while the toolchain-provided GCC seemed to generate the proper assembly sequences for floating-point division and square root, there wasn’t really a way to coax it into generating the reciprocal square root sequence, so for 1/sqrt(x) you’d be stuck running both sequences back to back (it’s been a while; it may have generated the actual RECIP.S sequence instead of the DIV.S one). That’s slow and it sucks and I hate it, which are the three main factors that motivate me to write any software at all, so I made one of those fancy-pants GCC inline assembly functions with register constraints and clobbers and whatnot. I don’t know how often this will actually be useful, but I couldn’t find it online at the time, so I figure it’s worth documenting.

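/* Reciprocal square root, 1/sqrt(x): seed from rsqrt0.s, then two
 * Newton-Raphson refinement steps of the form r += (0.5*r) * (1 - x*r*r). */
static float rsqrtsf(float x) {
	float a,b,c,d;
	float result;
	asm volatile(
		"rsqrt0.s	%[r], %[x]\n"		/* r = seed estimate of 1/sqrt(x) */
		/* first refinement step */
		"mul.s		%[a], %[x], %[r]\n"	/* a = x*r */
		"const.s	%[b], 3\n"		/* b = 0.5 (const.s immediate 3) */
		"mul.s		%[c], %[b], %[r]\n"	/* c = 0.5*r */
		"const.s	%[d], 1\n"		/* d = 1.0 (const.s immediate 1) */
		"msub.s		%[d], %[a], %[r]\n"	/* d = 1 - x*r*r */
		"madd.s		%[r], %[c], %[d]\n"	/* r = r + c*d */
		/* second refinement step, same shape; b still holds 0.5 */
		"mul.s		%[a], %[x], %[r]\n"
		"mul.s		%[c], %[b], %[r]\n"
		"const.s	%[d], 1\n"
		"msub.s		%[d], %[a], %[r]\n"
		"maddn.s	%[r], %[c], %[d]\n"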
		: [r]"=&f"(result),
		  [a]"=&f"(a),
		  [b]"=&f"(b),
		  [c]"=&f"(c),
		  [d]"=&f"(d)
		: [x]"f"(x)
	);
	return result;
}
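
If you want to sanity-check the arithmetic without an ESP32-S3 on your desk, the refinement part of the sequence above can be mirrored in plain C. This is only a rough sketch, not the Xtensa code: the names rsqrt_step_ref and rsqrt_ref are made up for illustration, the hardware seed from rsqrt0.s is replaced by a guess the caller passes in, and the fused msub.s/madd.s operations are written as ordinary multiplies and adds, so the last bit or two may differ from what the FPU produces.

```c
/* Hypothetical portable reference (not from the Xtensa toolchain).
 * One Newton-Raphson step for 1/sqrt(x), using the same temporaries
 * as the inline asm: a = x*r, c = 0.5*r, d = 1 - a*r, r += c*d. */
static float rsqrt_step_ref(float x, float r) {
    float a = x * r;        /* mul.s  a, x, r                      */
    float c = 0.5f * r;     /* mul.s  c, b, r   with b = 0.5       */
    float d = 1.0f - a * r; /* msub.s d, a, r   (fused on Xtensa)  */
    return r + c * d;       /* madd.s r, c, d   (fused on Xtensa)  */
}

/* Two steps, like the asm. r0 stands in for the rsqrt0.s seed
 * (assumption: any reasonably close positive guess will do here). */
static float rsqrt_ref(float x, float r0) {
    return rsqrt_step_ref(x, rsqrt_step_ref(x, r0));
}
```

Calling rsqrt_ref(2.0f, 0.7f) starts about 1% away from 1/sqrt(2), and since each step roughly squares the relative error, two steps should already land close to single-precision accuracy. That is also why the real sequence gets away with only two rounds after the hardware seed.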