14. Efficiency

When CPU time comes at a premium (e.g. in some firmware applications), your knowledge of, and ability to control, execution efficiency is essential to producing successful applications. The following tips can help increase execution efficiency.

If you are ever in doubt about what a compiler is doing with your code, have a look at the assembly language it generates. This is easy to do by adding a command-line option and compiling just the one file you are interested in. The GCC compiler uses the -S (capital S) command-line option to signal that its output should be only the assembly listing rather than the normal binary object module. By default the listing is written to a file named after the source file (my_file.s here); use -o to choose a different name. Example:

$ gcc -S my_file.c

or

$ gcc -S my_file.c  -o my_file.s

Sometimes you can gauge efficiency by knowing how many clock cycles each assembly instruction takes. RISC CPUs typically contain a pipeline (a sort of “assembly line” within the CPU) so that the steps of each instruction are carried out in stages (e.g. instruction fetch, decode, address computation, and other things finally leading up to the execution step). Each stage is carried out in synchronization with the main SYSCLK. Because instructions are fed into this pipeline back-to-back, instead of each taking 5 SYSCLKs (e.g. for a 5-stage pipeline), the average instruction time winds up being 1 SYSCLK per instruction. Such is the case with the MIPS32 and ARM instruction sets, for example. The CPU’s assembly reference manual will tell the story (how many SYSCLKs it “consumes”) for each instruction. For MIPS32, 98% of the instructions take 1 SYSCLK, which is very convenient for gauging the efficiency of the generated assembly code: just count the lines!
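To make the “just count the lines” advice concrete, here is a rough sketch of counting instruction lines in a generated .s listing with standard grep. The sample.s fragment is hypothetical (a made-up MIPS32-style listing for illustration); the idea is that instruction lines are indented and begin with a mnemonic, while directives begin with “.” and labels end with “:”.

```shell
# Hypothetical listing fragment, stood in for a real compiler-generated .s file.
cat > sample.s <<'EOF'
    .text
    .globl  dot
dot:
    lw   $2,0($4)
    lw   $3,0($5)
    mul  $2,$2,$3
    jr   $31
EOF

# Count indented lines starting with a lowercase mnemonic: skips directives
# (leading "."), labels (column 0), and blank lines.  On a CPU where nearly
# every instruction takes 1 SYSCLK, this count approximates the cycle cost.
grep -c -E '^[[:space:]]+[a-z]' sample.s
```

This is only a first-order estimate; branches, loads, and loops change the real cycle count, as discussed below.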

It is also important to know — at least to some degree — what the assembly instructions mean, so you can identify loops and branches where instructions are skipped, and gauge accordingly. Be advised that branching instructions sometimes take longer depending on whether the branch is taken or not. Your assembly manual will tell the story: the SYSCLKs required for each instruction are usually listed with the details for that instruction.

Alternatively, you can simply generate a disassembly listing (with the C source interleaved, provided the executable was built with debugging information, e.g. -g) from the generated executable (e.g. .elf file) like this:

$ objdump  -Sz  app_name.production.elf   >app_name.s

14.1. Some Common C Efficiency Tips

Don’t define local variables that aren’t needed. Reason: this encourages the compiler to keep intermediate and return values in registers, avoiding the CPU overhead of storing to and retrieving from RAM. Sometimes, for efficiency purposes, local variables are declared with the register storage class, hinting to the compiler that they be kept in registers instead of RAM.

Example:

long double  Vector2_qldDotProduct(vector2_t * v1, vector2_t * v2) {
    return ((v1->x * v2->x) + (v1->y * v2->y));
}

Alternatively, this version is nicer for debugging because it permits a breakpoint on the return statement, allowing the programmer to view the arguments and result value, while still generating identical assembly language for the processor:

long double  Vector2_qldDotProduct(vector2_t * v1, vector2_t * v2) {
    register long double  ldResult;

    ldResult = (v1->x * v2->x) + (v1->y * v2->y);

    return ldResult;
}

Note on clarity: when an expression becomes hard to read because of its complexity, note that the following also generates identical assembly language for the processor, because of the register modifiers in the variable definitions. Breaking the expression apart this way can increase readability (and visibility under the debugger) when expressions get complex:

long double  Vector2_qldDotProduct(vector2_t * v1, vector2_t * v2) {
    register long double  ldResult;
    register long double  ldXPart;
    register long double  ldYPart;

    ldXPart = (v1->x * v2->x);
    ldYPart = (v1->y * v2->y);
    ldResult = ldXPart + ldYPart;

    return ldResult;
}