Friday, 14 March 2014

Using rarely used compiler options to generate *real* business value for software

Recently an article appeared in Dr. Dobb's Journal on the usage of some rarely used compilation options for MS VC++ Compiler:

The Most Underused Compiler Switches in Visual C++


The question of whether to use such switches is fairly generic and applies equally to gcc and other compilers. I have run into it many times myself, and so have other developers, especially during performance optimization, where it is quite tempting to enable these switches and squeeze out whatever performance we can.

Generally these compiler options are of the following types:

(1) Syntactical checking options, which usually have no bearing on the generated output.
(2) Optimization options, which influence the generated output:
     (a) Generic, target-independent optimizations such as loop unrolling, strength reduction, etc.
     (b) Optimization options which target a particular CPU architecture/family (like x86, MIPS, ARM, etc.)
     (c) Optimization options which target a specific CPU micro-architecture (like ARMv6, ARMv7, Nehalem, etc.)
     (d) Optimization options that tune the generated code to a particular CPU
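To make the categories above a little more concrete: with gcc, flags such as -Wall fall under (1), -O2 or -funroll-loops under (2)(a), and -march=... / -mtune=... under (2)(c) and (2)(d). One way to see which of these actually took effect is to ask the compiler through its predefined macros. The following is only a minimal sketch. The function names are made up for illustration, and `__OPTIMIZE__` is a gcc/clang macro, so other compilers will simply report 0 here.

```c
#include <stdio.h>
#include <stddef.h>

/* Illustrative helper (hypothetical name): target architecture family,
   as selected by the compiler target, i.e. category (2)(b). */
const char *build_arch(void)
{
#if defined(__x86_64__) || defined(_M_X64)
    return "x86-64";
#elif defined(__i386__) || defined(_M_IX86)
    return "x86";
#elif defined(__aarch64__)
    return "arm64";
#elif defined(__arm__)
    return "arm";
#elif defined(__mips__)
    return "mips";
#else
    return "unknown";
#endif
}

/* 1 if gcc/clang compiled this file with any -O level above -O0. */
int build_is_optimized(void)
{
#if defined(__OPTIMIZE__)
    return 1;
#else
    return 0;
#endif
}

/* 1 if the build was allowed to emit AVX2 instructions (e.g. via -mavx2
   or a -march value that implies it), i.e. categories (2)(c)/(2)(d). */
int build_uses_avx2(void)
{
#if defined(__AVX2__)
    return 1;
#else
    return 0;
#endif
}

/* Writes a one-line summary of the build configuration into buf. */
void build_summary(char *buf, size_t len)
{
    snprintf(buf, len, "arch=%s optimized=%d avx2=%d",
             build_arch(), build_is_optimized(), build_uses_avx2());
}
```

Logging such a summary at startup makes it easy to confirm later which variant of a binary a customer is actually running.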


The first question to ask is "Should we enable these options?". The answer is not so simple. My opinion is that we should be circumspect. If a compiler switch is new or not well tested, it is very risky to use. Compilers have bugs too, including bugs in the generated machine code. Developers already struggle to debug source code in a high-level language. Debugging generated machine/assembly code is beyond the skill of most, and it is not what you bargained for when you chose to write the program in a high-level language like C/C++.

I have faced this with gcc 2.7x versions: some of my C code worked fine with the -O2 option but *randomly* crashed in one specific place with the -O3 option. I had shipped the -O3 build as the production version because it ran 20% faster and the customer wanted as much throughput as possible. Ultimately I got someone skilled in hex debugging to look at the generated code, and he was able to pinpoint the faulty generated code. I wasted 1/2 weeks before that gentleman rescued me. This was my first lesson.

It's not that we shouldn't use the latest and greatest switch out there, but we should ensure that it is mature enough (widely used in the market, well tested, not many known bugs, etc.). Beyond this I will let your instinct guide your decision, which will be influenced by the domain you work in: whether safety/reliability is more important than the very last bit of performance you can squeeze out.
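With a modern gcc (this was not available in the 2.7x days), one way to narrow down such a crash without reading hex dumps is to build everything at -O3 but force one suspect function back to -O2, and see whether the crash goes away. Below is a minimal sketch of that idea. The macro name and the `checksum` function are hypothetical stand-ins, and the gcc documentation itself recommends the `optimize` attribute for debugging only, not for production code.

```c
#include <stddef.h>

/* gcc-only: compile the marked function as if -O2 had been given, whatever
   the command line says. clang does not support this attribute, so the
   macro expands to nothing there and on other compilers. */
#if defined(__GNUC__) && !defined(__clang__)
#define SUSPECT_AT_O2 __attribute__((optimize("O2")))
#else
#define SUSPECT_AT_O2
#endif

/* Stand-in for the function suspected of being miscompiled at -O3. */
SUSPECT_AT_O2
long checksum(const unsigned char *data, size_t n)
{
    long sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += data[i];
    return sum;
}
```

If moving the marker from function to function makes the crash appear and disappear, you have at least located the code whose generated output needs a closer look.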

Another, unrelated mitigation that may help is to ensure that the production build (not the debug build) is the one that goes through your unit, functional, performance and other tests. We should always test what we will ship, not a variant of it.
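A small illustration of how a debug build and a production build of the same source can behave differently: production builds are commonly compiled with -DNDEBUG, which removes every assert() together with whatever was written inside it. The sketch below uses made-up function names and a toy counter.

```c
#include <assert.h>

/* 1 when assert() is compiled out, as in a typical production build. */
int asserts_disabled(void)
{
#ifdef NDEBUG
    return 1;
#else
    return 0;
#endif
}

/* Toy state: number of work items left (hypothetical). */
static int remaining = 3;

/* Consumes one item; returns 1 on success, 0 if nothing was left. */
static int consume(void)
{
    if (remaining <= 0)
        return 0;
    remaining--;
    return 1;
}

/* Risky: the call sits inside assert(), so with NDEBUG it never runs. */
int drain_risky(void)
{
    remaining = 3;
    assert(consume());
    return remaining;   /* 2 in a debug build, 3 with NDEBUG */
}

/* Safe: the side effect happens in every build, only the check vanishes. */
int drain_safe(void)
{
    int ok;
    remaining = 3;
    ok = consume();
    assert(ok);
    (void)ok;           /* silences "unused" warnings when NDEBUG is set */
    return remaining;   /* 2 in every build */
}
```

A test suite that only ever runs the debug build would never notice that `drain_risky` misbehaves in the shipped binary.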

The next question is which of the five option types above you can use. The answer, IMO, lies in your software deployment & distribution model:

(a) You can use (1) as far as possible to improve your code quality, subject to your company and business domain policy. It is just a static code analyzer that helps you, and the developers who work on the code in future, avoid pitfalls. It is simply safe programming, if you are into this. And if your code does not have to compile with other, strictly standard-conforming compilers, there is no need for strict-conformance flags like -pedantic (in case of gcc).

My answer to the first question covers the use of (2)(a).

(b) If you are shipping a library or a pure general purpose software application (like Windows, Linux) which will run on a particular architecture (like x86, ARM), then the underlying CPU or micro-architecture where the software will execute is not in your control. You can try (2)(b), keeping in mind the answer to the first question. If you use the other options, you will end up building and shipping many variants of your libraries, which complicates your software management.

(c) If you are shipping the software for a particular CPU micro-architecture family (e.g., a very tightly related product line), then you could add (2)(c) too, because you know the user (product line) will only use CPUs of that micro-architecture. You should still keep the answer to the first question in mind if you do not want to lose sleep.

(d) And finally, if you are shipping an embedded system where the software and hardware (exact CPU) are tightly coupled and distributed together, then (2)(d) could also be used.
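The reason the target CPU matters so much in cases (b) to (d) is that a binary built with micro-architecture-specific options will typically crash with an illegal-instruction fault on a CPU that lacks those instructions. As a rough sketch, assuming gcc or clang on x86, a startup check along the following lines can turn that crash into a clean error message. The function name is invented for illustration, and only two features are checked.

```c
/* Returns 1 if the CPU we are running on provides the instruction-set
   extensions this file was compiled to use, 0 otherwise. On non-x86
   targets or other compilers this sketch checks nothing and returns 1. */
int cpu_matches_build(void)
{
#if (defined(__x86_64__) || defined(__i386__)) && defined(__GNUC__)
    __builtin_cpu_init();   /* gcc/clang builtin: populate CPU feature data */
#if defined(__AVX2__)       /* set when built with -mavx2 or a -march implying it */
    if (!__builtin_cpu_supports("avx2"))
        return 0;
#endif
#if defined(__SSE4_2__)     /* set when built with -msse4.2 or similar */
    if (!__builtin_cpu_supports("sse4.2"))
        return 0;
#endif
#endif
    return 1;
}
```

This is best-effort only: the check is compiled with the same flags as the rest of the program, so it cannot guarantee that no unsupported instruction runs before it.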


The overall idea is that as the range of target CPUs you have to support narrows, your freedom to use more and more optimization options grows.

And what Knuth said ("premature optimization is the root of all evil") applies even to compiler-driven optimization. There has to be a pressing reason to use the above optimizations. If the performance requirements of the application/library can be met without these options, and no competitive advantage or business value for the application/equipment can be derived from using them, you can always leave them alone. The risks would outweigh the benefits by far in this case.
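Whether there is a pressing reason is something you can measure before touching any compiler switch. The following is a minimal sketch of such a measurement. The workload, the function names and the idea of a fixed time budget are all assumptions made for illustration, and clock() measures CPU time only coarsely.

```c
#include <time.h>

/* Keeps the compiler from discarding the workload's result. */
static volatile unsigned long sink;

/* Hypothetical stand-in for the hot path you actually care about. */
void workload(void)
{
    unsigned long acc = 0;
    unsigned long i;
    for (i = 0; i < 1000000UL; i++)
        acc += i * i;           /* unsigned arithmetic, wraps harmlessly */
    sink = acc;
}

/* Average CPU seconds per call of fn over reps calls (reps must be > 0). */
double seconds_per_call(void (*fn)(void), int reps)
{
    clock_t start, end;
    int i;
    start = clock();
    for (i = 0; i < reps; i++)
        fn();
    end = clock();
    return (double)(end - start) / CLOCKS_PER_SEC / reps;
}

/* 1 if the measured time already fits the budget, in which case the
   riskier optimization options buy nothing. */
int meets_budget(double measured, double budget)
{
    return measured <= budget;
}
```

Run it once on the build with your current flags: if `meets_budget` already holds for the production workload, the extra switches add risk without adding value.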
