
Compiler Optimizations

I’m writing this post to share my findings on something that kept me up last night. Although I was pretty sure about the answer, I still wanted to test my assumptions in the morning just to make sure they were correct.

The matter at hand is C coding practices.

In this episode: Constant declaration

I think we can all agree that writing code that uses magic numbers or hardcoded strings directly is a huge mistake, one that can lead to great pain, especially when it comes to fixing bugs and refactoring.
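To make that concrete, here is a tiny hypothetical example (SECONDS_PER_DAY and the values are made up for illustration):

#include <stdio.h>

#define SECONDS_PER_DAY 86400

int main(void)
{
    long now = 1000000L;               /* some timestamp */
    long bad = now + 86400;            /* magic number: what is 86400? */
    long good = now + SECONDS_PER_DAY; /* named: intent is obvious, and there
                                          is exactly one place to change it */
    printf("%ld %ld\n", bad, good);
    return 0;
}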

The two main practices I’ve seen are:

I – #define -> Use the pre-processor for every constant.

Pros:

  1. There is no runtime overhead because the numbers get replaced by their real values at build time
  2. There is a nice color difference in most IDEs
  3. Scope is maintained (or so you might think; see the first con below)

Cons:

1. Scope is not maintained. This code would give some annoying warnings:

code:  C1

/* constTest.c */
int main(void)
{
    {
        #define MEANING_OF_LIFE 42
        int i = MEANING_OF_LIFE;   /* i = 42 */
    }
    {
        #define MEANING_OF_LIFE 33 /* redefined: block scope doesn't apply */
        int i = MEANING_OF_LIFE;   /* i = 33 */
    }
    return 0;
}

gcc -S constTest.c
constTest.c:17:1: warning: “MEANING_OF_LIFE” redefined
constTest.c:12:1: warning: this is the location of the previous definition

*note: To test the code I’m just compiling with gcc in the terminal. The -S flag means stop before assembling and linking; the code is only compiled down to assembly. The result is a .s file with the assembly code.

2. There is no real type checking, since a #define is just a string replacement performed at the source level before the compiler ever sees it (see the sketch below).
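Here is that string replacement biting, in a contrived sketch (kShifted is a made-up name, not from the original post):

#include <stdio.h>

#define kShifted 1 << 4           /* pure text: no parentheses, no type */
const int kShiftedTyped = 1 << 4; /* a real int holding the value 16 */

int main(void)
{
    printf("%i\n", 2 + kShifted);      /* expands to 2 + 1 << 4, which is
                                          (2 + 1) << 4 == 48 */
    printf("%i\n", 2 + kShiftedTyped); /* 2 + 16 == 18, as intended */
    return 0;
}

The const is a typed object evaluated once; the #define is raw text spliced into whatever expression surrounds it.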

II – const -> Define typed constants of the form kTheMeaningOfLife to store your magic numbers.

Pros:

  1. I love how the k looks in front of the name
  2. IDEs like Xcode can be configured to color code constants
  3. Scope is really maintained

Cons:

  1. Constants are actually stored values. With no optimization flags, gcc will actually make space for them on the stack in a regular scope, creating them and assigning their values, so there is a runtime overhead:

code: C2

#include <stdio.h>

int main(void)
{
    const int kA = 5;
    const int kB = 5;
    int a = kA;
    int b = kB;
    int c = a + b;
    printf("%i", c);
    return 0;
}

gcc -S constTest.c

movl $5, -4(%rbp)     ;kA = 5
movl $5, -8(%rbp)     ;kB = 5
movl -4(%rbp), %eax   ;eax = kA
movl %eax, -12(%rbp)  ;a = eax
movl -8(%rbp), %eax   ;eax = kB
movl %eax, -16(%rbp)  ;b = eax
movl -16(%rbp), %eax  ;eax = b
addl -12(%rbp), %eax  ;eax = eax + a
movl %eax, -20(%rbp)  ;c = eax
...                   ;Call printf

Now this is f*cking ridiculous… what a damn stupid compiler. And look at all that overhead! It’s actually writing the values into memory (the stack) <kA, kB>, then reading that memory back into other places on the stack <a, b>, to finally add them together.

This is a lot compared to the same code using #define:

code: C3

#include <stdio.h>

int main(void)
{
    #define kA 5
    #define kB 5
    int a = kA;
    int b = kB;
    int c = a + b;
    ...            /* printf as in C2 */
}

gcc -S constTest.c

movl $5, -4(%rbp) ;a = 5
movl $5, -8(%rbp) ;b = 5
movl -8(%rbp), %eax ;eax = b
addl -4(%rbp), %eax ;eax = eax + a
movl %eax, -12(%rbp) ;c = eax

This has less runtime overhead, and it takes less space, making the code smaller and tighter. Code using defines will always look like this, no matter the scope. A #define has file-level scope (from the point of definition to the end of the file), which is why most people put them all at the top. I like to keep my scopes very tight, which is one of the pros I count for const (though the preprocessor can fake tight scope with #undef, as sketched below).
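For completeness, here is what that #undef trick looks like (a sketch, not from the original tests):

#include <stdio.h>

int main(void)
{
    #define MEANING_OF_LIFE 42
    printf("%i\n", MEANING_OF_LIFE);   /* 42 */
    #undef MEANING_OF_LIFE             /* the macro is gone from here on */

    #define MEANING_OF_LIFE 33         /* no redefinition warning this time */
    printf("%i\n", MEANING_OF_LIFE);   /* 33 */
    #undef MEANING_OF_LIFE
    return 0;
}

This avoids the warnings from code C1, at the price of some ceremony.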

Let’s see what happens when we add a global const. I expect it to move off the stack and into the binary’s static data (a read-only section), rather than being created at runtime.

code: C4

#include <stdio.h>

const int kA = 5;     /* global const */

int main()
{
    const int kB = 5;
    int a = kA;
    int b = kB;
    int c = a + b;
    ...
}

gcc -S constTest.c

.globl _kA
.literal4
.align 2
_kA:
.long 5
...
movl $5, -4(%rbp)     ;kB = 5
movl _kA(%rip), %eax  ;eax = _kA (Global)
movl %eax, -8(%rbp)   ;a = eax
movl -4(%rbp), %eax   ;eax = kB
movl %eax, -12(%rbp)  ;b = kB
...                   ; you know the rest

So darn it… if I change the scope I’m adding more overhead and taking up space in the binary. These are the arguments that people who like to do micro-optimizations will throw at you. Let’s now turn on some optimization flags and see if the compiler gets any smarter. First, let’s see what compiling without optimization means:

-from man gcc:

-O0 Without any optimization option, the compiler’s goal is to reduce the cost of compilation and to make debugging produce the expected results. Statements are independent: if you stop the program with a breakpoint between statements, you can then assign a new value to any variable or change the program counter to any other statement in the function and get exactly the results you would expect from the source code.

Turning on optimization flags makes the compiler attempt to improve the performance and/or code size at the expense of compilation time and possibly the ability to debug the program.

This is probably what you get when running the program in Debug mode. In fact, in Xcode the default Debug optimization level is -O0, which means no optimization. On Release builds, however, the flag is -Os, which tries to produce the fastest and smallest binary:

-from man gcc:

-Os Optimize for size, but not at the expense of speed. -Os enables all -O2 optimizations that do not typically increase code size. However, instructions are chosen for best performance, regardless of size. To optimize solely for size on Darwin, use -Oz (APPLE ONLY).

The following options are set for -O2, but are disabled under -Os: -falign-functions -falign-jumps -falign-loops -falign-labels -freorder-blocks -freorder-blocks-and-partition -fprefetch-loop-arrays -ftree-vect-loop-version

When optimizing with -Os or -Oz (APPLE ONLY) on Darwin, any function up to 30 “estimated insns” in size will be considered for inlining. When compiling C and Objective-C source files with -Os or -Oz on Darwin, functions explicitly marked with the “inline” keyword up to 450 “estimated insns” in size will be considered for inlining.

When compiling for Apple POWERPC targets, -Os and -Oz (APPLE ONLY) disable use of the string instructions even though they would usually be smaller, because the kernel can’t emulate them correctly in some rare cases. This behavior is not portable to any other gcc environment, and will not affect most programs at all. If you really want the string instructions, use -mstring.

Let’s see the generated code using this flag:

Code: C2

gcc -S -Oz constTest.c

movl $10, %esi  ;Compute the end value at compile time!

*note: When possible, the compiler will use registers to pass function parameters; here %esi is used to pass the parameter to printf.

Code: C3

gcc -S -Oz constTest.c

movl $10, %esi ;Same thing

Code: C4

gcc -S -Oz constTest.c

movl $10, %esi ;The compiler is smart enough to see that the global is not used anywhere so it doesn't even define it.

So, as you can see: while micro-optimizations can make you feel smart while writing code, the truth is that the compiler is already very smart and makes all your puny attempts at optimization seem completely futile. Just write the most readable and maintainable code you possibly can and let the compiler do the rest for you. Only optimize when you have a performance issue, NEVER, EVER, EVER before, and always measure before and after you optimize. Also, whenever you have a performance problem and think you know where it is, chances are you are wrong, so learn to use the tools and measure to find the real issue.
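A zero-dependency way to start measuring is clock() from <time.h> (a minimal sketch; the loop and iteration count are arbitrary):

#include <stdio.h>
#include <time.h>

int main(void)
{
    clock_t start = clock();

    volatile long sum = 0;   /* volatile so the loop isn’t optimized away */
    for (long i = 0; i < 100000000L; i++)
        sum += i;

    clock_t end = clock();
    printf("elapsed: %f s\n", (double)(end - start) / CLOCKS_PER_SEC);
    return 0;
}

Compile it with -O0 and again with -Os and compare: measuring is the only way to know what a flag actually bought you.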

Ah, and as for #define vs const: const is obviously the way to go, unless you are in a header.
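The header caveat is about linkage; a sketch (constants.h and the names in it are hypothetical):

/* constants.h */
#ifndef CONSTANTS_H
#define CONSTANTS_H

#define kMaxRetries 5               /* always safe in a header: just text */

static const int kTimeoutSecs = 30; /* also fine: static gives every .c file
                                       that includes this its own private copy */

/* const int kBad = 30;                without static this is a definition with
                                       external linkage, duplicated in every .c
                                       that includes the header, which can end
                                       in duplicate-symbol linker errors */

#endif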

Written by mephl

June 12, 2011 at 6:58 am

Posted in C, Code