Message from @DanielKO
Discord ID: 429484636226191370
Even then, just because you think your code is better than the compiler's, doesn't mean it's actually faster.
I dont think i'd ever be good enough at it to compete with the compiler
This website is good, if you want to quickly play around with assembly output from various compilers.
Simple test with ARM assembly.
Try changing the `-Ox` argument to change the optimization level. Try `-Og`, `-O1`, `-O3`...
Note how the compiler understands that both formulas end with `/2`, so that part of the code is common and gets folded into the same code, if optimization is enabled.
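The snippet behind that link isn't preserved in this log, so here is a minimal sketch of the kind of function being described. The name `pick` and its parameters are made up for illustration. Both branches end in `/2`, which gives the optimizer a common tail to share:

```cpp
// Hypothetical stand-in for the linked example (the original is not shown).
// Both formulas end with "/2", so an optimizing compiler is free to emit
// that final step once and share it between the two branches.
float pick(bool use_sum, float a, float b)
{
    if (use_sum)
        return (a + b) / 2;  // first formula
    else
        return (a * b) / 2;  // second formula, same "/2" tail
}
```

Pasting this into the site and flipping between `-O0` and `-O2` should show the difference, though the exact instructions depend on the compiler and version picked.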
if I change -O0 it wont let me revert without reloading page
Really? Click the compiler output at the bottom, it should open the full compiler output to the right.
You can also add `-fverbose-asm` to get comments in the assembly, although the editor already uses colors to indicate which ASM line corresponds to which C++ line.

cool. kinda demotivating to see the comparison
the line correspondence is trippy here
This is x86-64, so it's using the SSE (XMM) registers for the floating point.
Call convention allows the arguments to come in as registers instead of the stack.
xmm0, xmm1, xmm2
First 3 lines multiply each register by itself, which squares the value.
Then 2 lines to add them.
Then it `pxor`s a register with itself, which always generates a zero. It's the usual cheap way to load a zero into a register on Intel/AMD processors.
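For anyone reading without the link: the source being compiled isn't shown in this log, but a function along these lines would match the walkthrough (three squares, two adds, then a square root). The name `length` and the parameter names are assumptions:

```cpp
#include <cmath>

// Hypothetical reconstruction of the function under discussion.
// The three float arguments would arrive in xmm0, xmm1 and xmm2.
float length(float x, float y, float z)
{
    // Three multiplies (the squares), two adds, then the square root.
    return std::sqrt(x * x + y * y + z * z);
}
```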
okay so in the beginning he doesnt manually move your variables into the first 3 registers
I'm not too familiar with the amd64/x86-64 ABI, but as far as I know the first few floating-point arguments are passed directly in the XMM registers (`xmm0` onward) rather than on the stack, so there's nothing to move.
ok
had to look this part up
https://c9x.me/x86/html/file_module_x86_id_180.html
Most of the other commands are somewhat familiar
it's using single-precision floating point, so if you typed in different numbers it may have automatically chosen a format with more precision?
That's governed by how basic types arithmetic works in C++.
Replace the `float` by `double`, and call `::sqrt()` to see it use different instructions.
The `pxor`, `ucomiss`/`ucomisd` and `ja` serve to check that the argument is not negative; if so, it can just use the `sqrt` instruction; otherwise, it needs to call the `sqrt()` function from the standard library, which deals with the nasty cases like negative arguments (error reporting and so on).
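A rough way to see why the library call is still needed: the library `sqrt()` may report a domain error through `errno`, which a bare instruction can't do. Here's a small sketch to poke at that. The helper name `sqrt_sets_errno` is made up, and whether `errno` actually gets set depends on the platform's `math_errhandling`:

```cpp
#include <cerrno>
#include <cmath>

// Hypothetical helper: reports whether std::sqrt flagged a domain error
// through errno for the given argument.
bool sqrt_sets_errno(float x)
{
    errno = 0;
    // volatile keeps the compiler from discarding the unused result.
    volatile float result = std::sqrt(x);
    (void)result;
    return errno == EDOM;
}
```

For non-negative inputs this returns false. For a negative input the result is NaN, and on implementations that report math errors via `errno` the function would return true.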
okay. so the chip has its own primitive math
Yeah, it's a CISC architecture.
Switch to the MIPS gcc, and you'll get a very different result.
RISC dont have its own maths instructions?
ive only read a couple pages about MIPS so far
Reduced Instruction Set Computer, the whole point is to have so few instructions in the architecture that the circuitry is very small.
Being small means there's less need for synchronization, thus it can run faster, and there are more transistors that can be used for caches.
PowerISA stands for Performance Optimization With Enhanced Reduced Instruction Set Computer Instruction Set Architecture
So a typical RISC arch won't have any advanced instructions. No specialized math, no instructions that mix register operands with memory, etc.




