Message from @DanielKO

Note how the compiler understands that both formulas end with `/2`, so that part of the code is common and gets folded into the same instructions when optimization is enabled.
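The formulas themselves aren't reproduced in this log, so here's a made-up stand-in (the name `pick` and its parameters are hypothetical) with two branches that both end in `/ 2`. Compiled with optimization, the shared `/ 2` tail is the kind of thing the compiler can typically emit once for both paths.

```cpp
// Hypothetical example: two formulas that share a trailing "/ 2".
// With optimization on, the compiler may compute the sum or the product
// in a branch and then apply the halving once for both paths.
double pick(bool use_sum, double a, double b)
{
    return use_sum ? (a + b) / 2 : (a * b) / 2;
}
```

Comparing the output at `-O0` and `-O2` should make the difference visible.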

M4Gunner

2018-03-31 03:10:25 UTC

if I change -O0 it won't let me revert without reloading the page

DanielKO

2018-03-31 03:11:39 UTC

Really? Click the compiler output at the bottom, it should open the full compiler output to the right.

M4Gunner

2018-03-31 03:15:48 UTC

maybe it doesn't wanna play nice with uBlock Origin

DanielKO

2018-03-31 03:16:48 UTC

With `-fverbose-asm`, although the editor already uses colors to indicate what ASM line corresponds to what C++ line.

M4Gunner

2018-03-31 03:22:56 UTC

cool. kinda demotivating to see the comparison

DanielKO

2018-03-31 03:33:59 UTC

<https://godbolt.org/g/4FGukD>

M4Gunner

2018-03-31 03:35:14 UTC

the line correspondence is trippy here

DanielKO

2018-03-31 03:36:10 UTC

This is x86-64, so it's using the SSE (XMM) registers for the floating point.

DanielKO

2018-03-31 03:36:34 UTC

The calling convention allows the arguments to come in registers instead of on the stack.

DanielKO

2018-03-31 03:36:42 UTC

xmm0, xmm1, xmm2

DanielKO

2018-03-31 03:37:24 UTC

The first 3 lines multiply each register by itself, which squares the value.

DanielKO

2018-03-31 03:37:34 UTC

Then 2 lines to add them.

DanielKO

2018-03-31 03:38:36 UTC

Then it `pxor`s a register with itself, which always produces zero. It's the usual cheapest way to load a zero into a register on Intel/AMD processors.
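The source behind the link isn't reproduced in this log, so as a rough sketch, assuming it was a 3D vector length function taking three `float` arguments (the names `length` and `check_length` are made up here), something like this would give the pattern just described on x86-64 GCC with optimization: three multiplies, two adds, then the zeroed register used around the square root.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical stand-in for the function in the linked example.
// On x86-64 Linux (System V ABI), x, y and z arrive in xmm0, xmm1 and xmm2.
float length(float x, float y, float z)
{
    // Three multiplies (each value by itself), two adds, one square root.
    return std::sqrt(x * x + y * y + z * z);
}

// Small sanity check: 3*3 + 4*4 + 12*12 = 169, whose square root is 13.
void check_length()
{
    assert(length(3.0f, 4.0f, 12.0f) == 13.0f);
}
```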

M4Gunner

2018-03-31 03:38:45 UTC

okay so in the beginning it doesn't manually move your variables into the first 3 registers

DanielKO

2018-03-31 03:41:55 UTC

I'm not too familiar with the amd64/x86-64 ABI, but as far as I know, in the System V calling convention used on Linux the first few floating-point arguments are passed only in the XMM registers (xmm0 through xmm7), not on the stack.

M4Gunner

2018-03-31 03:42:13 UTC

ok

M4Gunner

2018-03-31 03:42:25 UTC

had to look this part up
https://c9x.me/x86/html/file_module_x86_id_180.html

M4Gunner

2018-03-31 03:42:58 UTC

Most of the other commands are somewhat familiar

M4Gunner

2018-03-31 03:46:49 UTC

it's using single-precision floating point, so if you typed in different numbers it may have automatically chosen a format with more precision?

DanielKO

2018-03-31 03:47:30 UTC

That's governed by how arithmetic on the basic types works in C++.
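A small sketch of those rules (not taken from the linked example; `sum_size` is a made-up helper): the precision comes from the operand types, never from the values you type in. `float` with `float` stays `float`, and mixing in a `double` makes the whole operation `double`.

```cpp
#include <cstddef>
#include <type_traits>

// The result type depends only on the operand types, never on their values.
static_assert(std::is_same<decltype(1.5f * 2.0f), float>::value,
              "float * float stays float");
static_assert(std::is_same<decltype(1.5f * 2.0), double>::value,
              "float * double is computed as double");
static_assert(std::is_same<decltype(1.5f + 1), float>::value,
              "the int operand is converted to float");

// Hypothetical helper: size in bytes of the type that a + b produces.
template <typename A, typename B>
constexpr std::size_t sum_size(A a, B b)
{
    return sizeof(a + b);
}
```

So a literal like `1.5f` keeps things in single precision, while a plain `1.5` is a `double` and pulls the expression up with it.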

DanielKO

2018-03-31 03:48:18 UTC

Replace `float` with `double`, and call `::sqrt()` to see it use different instructions.

DanielKO

2018-03-31 03:51:50 UTC

The `pxor`, `ucomiss`/`ucomisd` and `ja` serve to check whether the argument is negative; if it isn't, the result of the `sqrtss`/`sqrtsd` instruction can be used directly; otherwise, it needs to call the `sqrt()` function from the standard library, which handles the nasty error case for negative arguments (reporting it through `errno`).
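A rough way to see why the library call is kept around, assuming a typical Linux/glibc setup where `math_errhandling` includes `MATH_ERRNO`: a negative argument has to report a domain error through `errno`, which the bare instruction can't do. The helper names below are made up for this sketch.

```cpp
#include <cassert>
#include <cerrno>
#include <cmath>

// Hypothetical helper: true if sqrt(x) returned NaN and reported a
// domain error through errno.
bool sqrt_sets_errno(double x)
{
    errno = 0;
    double r = std::sqrt(x);
    return std::isnan(r) && errno == EDOM;
}

void check_sqrt()
{
    assert(!sqrt_sets_errno(4.0));        // non-negative: no error reported
    assert(std::isnan(std::sqrt(-1.0)));  // negative: the result is NaN
    // errno is only required where the implementation uses MATH_ERRNO.
    assert(!(math_errhandling & MATH_ERRNO) || sqrt_sets_errno(-1.0));
}
```

With GCC, `-fno-math-errno` tells the compiler it may skip that `errno` requirement, and the compare-and-branch around the instruction typically goes away.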

M4Gunner

2018-03-31 03:54:52 UTC

okay. so the chip has its own primitive math

DanielKO

2018-03-31 03:56:29 UTC

Yeah, it's a CISC architecture.

DanielKO

2018-03-31 03:56:58 UTC

Switch to the MIPS gcc, and you'll get a very different result.

M4Gunner

2018-03-31 03:57:13 UTC

RISC doesn't have its own math instructions?