What task benefits from using such a complex instruction so easily dividable in ...

colechristensen · 2024-04-07T16:54:40 1712508880

Inverse square root is used for normalizing vectors, particularly in computer graphics calculations, where it has to run a great many times and very fast.

https://en.m.wikipedia.org/wiki/Fast_inverse_square_root#Mot...
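To make the normalization point concrete, here is a minimal sketch of where the inverse square root shows up. The function name `normalize` is purely illustrative, and the sketch assumes a nonzero input vector.

```python
import math

def normalize(v):
    """Scale v to unit length (assumes v is not the zero vector)."""
    length_sq = sum(c * c for c in v)      # dot(v, v)
    inv_len = 1.0 / math.sqrt(length_sq)   # the inverse square root
    return [c * inv_len for c in v]        # one multiply per component
```

Computing the reciprocal once and multiplying each component by it avoids one division per component, which is why a cheap inverse square root matters when this runs for every vertex or pixel.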

hinkley · 2024-04-07T19:44:07 1712519047

Famously the magic constant in the Quake engine that nobody remembers inventing.

That article does say there’s an SSE instruction rsqrtss that is better.
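For readers who have not seen the trick, the following is a rough Python transliteration of the widely quoted Quake III routine, using the constant `0x5F3759DF`. It is a sketch only: the original is C operating on 32-bit floats, while here the refinement step runs in Python's double precision, and the name `fast_rsqrt` is made up for this example. It assumes a positive, finite input.

```python
import struct

def fast_rsqrt(x):
    """Approximate 1/sqrt(x) for positive, finite x."""
    # Reinterpret the float32 bit pattern of x as an unsigned 32-bit integer.
    i = struct.unpack("<I", struct.pack("<f", x))[0]
    # The magic constant minus half the bit pattern gives a first guess.
    i = 0x5F3759DF - (i >> 1)
    # Reinterpret the integer bits as a float32 again.
    y = struct.unpack("<f", struct.pack("<I", i))[0]
    # One Newton-Raphson step to refine the guess.
    return y * (1.5 - 0.5 * x * y * y)
```

After the single refinement step the result is typically within a fraction of a percent of the true value, which was good enough for lighting calculations at the time.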

nanidin · 2024-04-07T16:45:10 1712508310

Neon is SIMD so I would presume these instructions let you vectorize those calculations and do them in parallel on a lot of data more efficiently than if you broke it down into simpler operations and did them one by one.

voidbert · 2024-04-07T17:18:08 1712510288

Yes, but the part that got me was the halving of the result followed by the clamping. SIMD generally makes sense, but for something like this to exist, there's usually something very specific (a certain video codec, for example) that greatly benefits from such a complex instruction.

ekelsen · 2024-04-08T03:09:14 1712545754

The halving could come from an intended use in a Newton-Raphson iteration that refines a square root.

See for example https://math.mit.edu/~stevenj/18.335/newton-sqrt.pdf

The initial guess is the approximate square root, but it needs to be halved as part of the calculation.
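As a sketch of where the factor of one half appears, here are the standard Newton-Raphson updates for the square root and for the reciprocal square root. The function names and the default step count are assumptions for illustration; nothing here claims to be what the instruction under discussion actually computes.

```python
def newton_sqrt(a, guess, steps=4):
    """Refine an estimate of sqrt(a), starting from a positive guess."""
    x = guess
    for _ in range(steps):
        x = 0.5 * (x + a / x)            # halving is part of every step
    return x

def newton_rsqrt(a, guess, steps=4):
    """Refine an estimate of 1/sqrt(a) without any division."""
    y = guess
    for _ in range(steps):
        y = 0.5 * y * (3.0 - a * y * y)  # again a multiply by one half
    return y
```

Both updates roughly double the number of correct digits per step, so a coarse hardware estimate followed by one or two steps is usually enough.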

creato · 2024-04-07T17:23:32 1712510612

It's probably not about avoiding extra instructions/performance, but making the range of the result more useful and avoiding overflow. Or in other words, the entire instruction may be useless if you don't do these things.

epcoa · 2024-04-07T21:30:00 1712525400

The halving and clamping are nothing remarkable when working with fixed-point numbers (scaled integers), where they are a standard way to avoid overflow. Reciprocal square root itself is a fundamental operation for DSP algorithms and, of course, computer graphics. This is a fairly generic instruction really, though FRSQRTE likely gets more real-world use.
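A minimal sketch of the halve-then-clamp pattern on scaled integers follows. The 16-bit width and the helper names `saturate` and `halve_and_saturate` are assumptions chosen for illustration, not the semantics of any particular instruction.

```python
def saturate(value, bits=16):
    """Clamp value to the signed range representable in `bits` bits."""
    lo = -(1 << (bits - 1))
    hi = (1 << (bits - 1)) - 1
    return max(lo, min(hi, value))

def halve_and_saturate(wide_result, bits=16):
    """Halve a wider intermediate result, then clamp it to `bits` bits."""
    return saturate(wide_result >> 1, bits)  # >> rounds toward minus infinity
```

Halving pulls more intermediate results back into range, and clamping makes the remaining out-of-range values stick at the nearest limit instead of wrapping around to the opposite sign.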