14. Efficiency
When CPU time comes at a premium (e.g. in some firmware applications), your knowledge of, and ability to control, execution efficiency is essential to producing successful applications. The following tips help increase execution efficiency.
If you are ever in doubt about what a compiler is doing with your code, have a look at the assembly language it generates. This is easy to do by adding a command-line argument to the compiler command line and compiling just the one file you are interested in. The GCC compiler uses the -S (capital S) command-line option to signal that its output should be the assembly listing rather than the normal binary object module. Example:
$ gcc -S my_file.c
which writes the listing to my_file.s by default, or, to name the output file explicitly:
$ gcc -S my_file.c -o my_file.s
Sometimes you can gauge the efficiency by knowing how many clock cycles each assembly instruction takes. In RISC CPUs there is often a sort of “assembly line” (a pipeline) within the CPU, so that the various steps of each instruction are chain-ganged in stages (e.g. address computations, and other things finally leading up to the execution step). Each stage is carried out in synchronization with the main SYSCLK. Because instructions are fed into this “assembly line”, instead of each taking 5 SYSCLKs (e.g. for a 5-stage execution sequence), the average instruction time winds up being 1 SYSCLK per instruction. Such is the case with the MIPS32 and ARM instruction sets, for example. The CPU’s assembly (instruction set) reference will tell the story — how many SYSCLKs each instruction “consumes”. For MIPS32, 98% of the instructions use 1 SYSCLK, which is very convenient for gauging the efficiency of the generated assembly code: just count the lines!
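As a rough sketch of the “count the lines” approach, the pipeline below filters a gcc -S listing down to just the instructions before counting them. The filtering pattern is an assumption based on GNU assembler syntax (labels end in a colon, directives start with a dot); adjust it for your toolchain. my_file.s is the listing generated in the example above.

```shell
# Approximate instruction count: drop blank lines, assembler
# directives (lines starting with '.'), comments, and labels,
# then count what remains. Assumes GNU as syntax from gcc -S.
grep -vE '^[[:space:]]*($|\.|#|[A-Za-z_.$][A-Za-z0-9_.$]*:)' my_file.s | wc -l
```

On an architecture where nearly every instruction costs 1 SYSCLK, this count is a serviceable first-order estimate of the cycle cost of straight-line code.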
It is also important to know — at least to some degree — what the assembly instructions mean, so you can identify loops and branches where instructions are skipped, and gauge accordingly. Be advised that branching instructions sometimes take longer depending on whether the branch is taken or not. Your assembly manual will tell the story: the SYSCLKs required for each instruction are usually listed with the details for that instruction.
Alternatively, you can simply generate a disassembly listing (with C source interleaved) from the generated executable (e.g. the .elf file) like this:
$ objdump -Sz app_name.production.elf >app_name.s
14.1. Some Common C Efficiency Tips
When local variables don’t have to be defined, don’t define them. Reason: this encourages the compiler to keep intermediate and return values in registers, avoiding the CPU overhead of storing them to and retrieving them from RAM. Sometimes, for efficiency purposes, local variables are declared with the register storage class, hinting to the compiler that they be kept in CPU registers instead of in RAM.
Example:
long double Vector2_qldDotProduct(vector2_t * v1, vector2_t * v2) {
    return ((v1->x * v2->x) + (v1->y * v2->y));
}
Alternatively, this version is nicer for debugging because it permits a breakpoint on the return statement, allowing the programmer to view the arguments and the result value, while generating identical assembly language for the processor:
long double Vector2_qldDotProduct(vector2_t * v1, vector2_t * v2) {
    register long double ldResult;
    ldResult = (v1->x * v2->x) + (v1->y * v2->y);
    return ldResult;
}
Note on clarity: when complexity causes expressions to become less readable, the following also generates identical assembly language for the processor, because of the register modifiers in the variable definitions. This can help increase readability (and visibility under the debugger) when the expressions get complex:
long double Vector2_qldDotProduct(vector2_t * v1, vector2_t * v2) {
    register long double ldResult;
    register long double ldXPart;
    register long double ldYPart;
    ldXPart = (v1->x * v2->x);
    ldYPart = (v1->y * v2->y);
    ldResult = ldXPart + ldYPart;
    return ldResult;
}
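To try the gcc -S comparison yourself, the snippet below is a self-contained sketch of the example above. The vector2_t layout shown here is an assumption for illustration — the real typedef would live elsewhere in the project — but it is enough to compile each version and diff the generated listings.

```c
/* Hypothetical layout for vector2_t; the real definition is
   assumed to live in a project header. */
typedef struct {
    long double x;
    long double y;
} vector2_t;

/* Multi-statement form: the register hints and intermediate
   variables let a debugger show each partial product, while an
   optimizing compiler typically emits the same instructions as
   the one-line return expression. */
long double Vector2_qldDotProduct(vector2_t * v1, vector2_t * v2) {
    register long double ldResult;
    register long double ldXPart;
    register long double ldYPart;
    ldXPart = (v1->x * v2->x);
    ldYPart = (v1->y * v2->y);
    ldResult = ldXPart + ldYPart;
    return ldResult;
}
```

Compiling this file with `gcc -S -O2` and doing the same for the one-line version, then diffing the two .s files, is a quick way to confirm the claim that they produce identical code on your toolchain.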