Message from @DanielKO

Discord ID: 429484636226191370


2018-03-31 02:59:04 UTC  

Even then, just because you think your code is better than the compiler's, doesn't mean it's actually faster.

2018-03-31 02:59:41 UTC  

I don't think i'd ever be good enough at it to compete with the compiler

2018-03-31 03:00:32 UTC  

This website is good, if you want to quickly play around with assembly output from various compilers.

2018-03-31 03:02:36 UTC  

Simple test with ARM assembly.

2018-03-31 03:05:23 UTC  
2018-03-31 03:06:28 UTC  

Try changing the `-Ox` argument to change the optimization level. Try `-Og`, `-O1`, `-O3`...

2018-03-31 03:07:47 UTC  

Note how the compiler understands that both formulas end with `/2`, so that part of the code is common and gets folded into the same instructions, if optimization is enabled.
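The code from the screenshot is not preserved in this log, so the following is only a hypothetical stand-in with the same shape: two formulas that both end in `/2`. The function name and parameters are made up for illustration. With optimization enabled, a compiler may compute the sum or the difference first and then share a single divide-by-two sequence for both paths.

```cpp
// Hypothetical example: both return expressions end with "/ 2".
// At -O1 and above, the compiler is free to emit the "/ 2" part only once.
int half_of_sum_or_diff(bool use_sum, int a, int b) {
    if (use_sum) {
        return (a + b) / 2;  // formula 1
    }
    return (a - b) / 2;      // formula 2, same trailing "/ 2"
}
```

Pasting this into Compiler Explorer and switching between `-O0` and `-O2` is one way to check whether the shared tail actually gets merged for a given compiler.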

2018-03-31 03:10:25 UTC  

if I change -O0 it won't let me revert without reloading the page

2018-03-31 03:11:39 UTC  

Really? Click the compiler output at the bottom, it should open the full compiler output to the right.

2018-03-31 03:14:20 UTC  

https://cdn.discordapp.com/attachments/423219052849397773/429478527474204681/godbolt-asm-test.png

2018-03-31 03:15:48 UTC  

maybe it doesn't wanna play nice with uBlock Origin

https://cdn.discordapp.com/attachments/423219052849397773/429478895629107211/2018-03-30_20_12_06-Compiler_Explorer.png

2018-03-31 03:16:48 UTC  

Here it is with `-fverbose-asm`, although the editor already uses colors to indicate which ASM line corresponds to which C++ line.

https://cdn.discordapp.com/attachments/423219052849397773/429479149372178435/fverbose-asm.png

2018-03-31 03:22:56 UTC  

cool. kinda demotivating to see the comparison

2018-03-31 03:33:59 UTC  

https://cdn.discordapp.com/attachments/423219052849397773/429483474861293569/sqrt-test.png

2018-03-31 03:35:14 UTC  

the line correspondence is trippy here

2018-03-31 03:36:10 UTC  

This is x86-64, so it's using the SSE (XMM) registers for the floating point.

2018-03-31 03:36:34 UTC  

The calling convention allows the arguments to come in registers instead of on the stack.

2018-03-31 03:36:42 UTC  

xmm0, xmm1, xmm2

2018-03-31 03:37:24 UTC  

The first 3 lines multiply each register by itself, which squares the value.

2018-03-31 03:37:34 UTC  

Then 2 lines to add them.

2018-03-31 03:38:36 UTC  

Then it `pxor`s a register with itself, which always produces zero. It's the fastest way to load a zero into a register on Intel/AMD processors.
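The source behind the `sqrt-test.png` screenshot is not reproduced in this log. A minimal sketch that matches the walkthrough (three `float` arguments, each squared, the squares added, then a square root) could look like the following. The function and parameter names are assumptions, not the original code.

```cpp
#include <cmath>

// Hypothetical reconstruction of the kind of function being discussed.
// Per the walkthrough: x, y, z arrive in xmm0, xmm1, xmm2;
// three multiplies square them, two adds sum them, then comes the sqrt.
float length_f(float x, float y, float z) {
    return std::sqrt(x * x + y * y + z * z);
}
```

For example, `length_f(1.0f, 2.0f, 2.0f)` evaluates to `3.0f`, since 1 + 4 + 4 = 9.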

2018-03-31 03:38:45 UTC  

okay so in the beginning he doesn't manually move your variables into the first 3 registers

2018-03-31 03:41:55 UTC  

I'm not too familiar with amd64/x86-64 ABI, I'm guessing it can assume the first few arguments are both on the stack and on the XMM registers.

2018-03-31 03:42:13 UTC  

ok

2018-03-31 03:42:25 UTC  
2018-03-31 03:42:58 UTC  

Most of the other commands are somewhat familiar

2018-03-31 03:46:49 UTC  

it's using single-precision floating point, so if you typed in different numbers it may have automatically chosen a format with more precision?

2018-03-31 03:47:30 UTC  

That's governed by how arithmetic on basic types works in C++.
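To make that concrete: the precision is decided at compile time by the operand types, not by the values passed in at run time. The sketch below uses a made-up helper named `square` and checks the result types with `static_assert`, so it only compiles if the stated types are right.

```cpp
#include <type_traits>

// Hypothetical helper. The checks run at compile time and depend only on
// the operand types, never on the numbers passed in.
float square(float x) {
    // float * float stays float (single precision).
    static_assert(std::is_same<decltype(x * x), float>::value,
                  "float * float is float");
    // Mixing in a double operand (2.0 is a double literal) promotes to double.
    static_assert(std::is_same<decltype(x * 2.0), double>::value,
                  "float * double is double");
    // A float literal (2.0f) keeps the arithmetic in single precision.
    static_assert(std::is_same<decltype(x * 2.0f), float>::value,
                  "float * float literal is float");
    return x * x;
}
```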

2018-03-31 03:48:18 UTC  

Replace the `float` by `double`, and call `::sqrt()` to see it use different instructions.
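As a rough sketch of that change, assuming a three-argument length function like the one discussed earlier (the name `length_d` is hypothetical), the `double` version calling the global `::sqrt()` would be:

```cpp
#include <math.h>  // declares ::sqrt(double) in the global namespace

// Hypothetical double-precision variant of the function under discussion.
double length_d(double x, double y, double z) {
    return ::sqrt(x * x + y * y + z * z);
}
```

On x86-64 one would expect the double-precision forms of the SSE instructions here instead of the single-precision ones, though the exact output depends on the compiler and flags.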

2018-03-31 03:49:12 UTC  

https://cdn.discordapp.com/attachments/423219052849397773/429487300972380160/sqrt-test-2.png

2018-03-31 03:51:50 UTC  

The `pxor`, `ucomis` and `ja` serve to check that the argument is not negative; if so, it can just use the `sqrt` instruction; otherwise, it needs to call the `sqrt()` function from the standard library, which handles all the nasty cases: NaN, Infinity, negative arguments.
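One way to see why the library call is kept around is to look at what the library function does beyond computing a root. The sketch below uses a hypothetical wrapper, `sqrt_with_errno`, that records `errno` after the call. Whether `errno` is actually set for a negative argument depends on the implementation and compiler flags, so treat that part as an assumption to verify on your own system.

```cpp
#include <cerrno>
#include <cmath>

// Hypothetical helper: returns the result together with the errno value
// observed right after the call.
struct SqrtResult {
    double value;
    int err;
};

SqrtResult sqrt_with_errno(double v) {
    errno = 0;                 // clear any stale error
    double r = std::sqrt(v);   // may take the library path for negative v
    return {r, errno};         // err may be EDOM for negative v (implementation-dependent)
}
```

For a non-negative input such as `4.0` the result is `2.0` and no error is reported; for `-1.0` the result is NaN.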

2018-03-31 03:54:52 UTC  

okay. so the chip has its own primitive math

2018-03-31 03:56:29 UTC  

Yeah, it's a CISC architecture.

2018-03-31 03:56:58 UTC  

Switch to the MIPS gcc, and you'll get a very different result.

2018-03-31 03:57:13 UTC  

RISC doesn't have its own math instructions?

2018-03-31 03:57:22 UTC  

https://cdn.discordapp.com/attachments/423219052849397773/429489359289319424/sqrt-test-3.png

2018-03-31 03:57:58 UTC  

I've only read a couple pages about MIPS so far

2018-03-31 03:59:18 UTC  

Reduced Instruction Set Computer: the whole point is to have so few instructions in the architecture that the circuitry is very small.

2018-03-31 03:59:59 UTC  

Being small means there's less need for synchronization, thus it can run faster, and there are more transistors that can be used for caches.

2018-03-31 04:00:36 UTC  

Power ISA stands for Performance Optimization With Enhanced RISC (Reduced Instruction Set Computer) Instruction Set Architecture

2018-03-31 04:01:02 UTC  

So a typical RISC arch won't have any advanced instructions. No specialized math, no instructions that mix register operands with memory, etc.
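A small function makes the "no mixing register operands with memory" point easy to try in Compiler Explorer. The function name is made up for illustration. On x86-64 a compiler can typically fold the memory read into the add itself (something like `add eax, DWORD PTR [rdi]`), while a load/store architecture such as MIPS needs a separate load (`lw`) into a register before the add.

```cpp
// Hypothetical example: one operand lives in memory (*p), the other in a register (x).
// x86-64 can usually use a single add with a memory operand;
// a load/store RISC target has to load *p into a register first.
int add_from_memory(const int* p, int x) {
    return x + *p;
}
```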