Monday, 21 April 2014

Transition to 64 bit - Pain or Panacea ?

Once in a while the 64 bit moniker raises its head and promises to administer a much needed testosterone boost to whatever is ailing the computing industry. Most recent example is Apple with its IOS 7 release. Yet the facts are far different detached in most cases with this marketing hype created.

64-bit at a programming level comes with the promise of more registers in the CPU, the ability to process larger data (64 bits at a time) than 32 bit and ofcourse an understated/unspecified improvement in the basic CPU architecture (say removing flaws of the 32 bit architecture). To be frank the first two capabilities effect a certain class of math intensive applications like image processing, signal processing, compression/decompression, security algorithms etc. But not everything under the sun can get significant jumps.

Infact most general purpose applications are not written from scratch for 64 bit (as it should be), but are ported. Those applications are optimized for a 32 bit architecture, and when compiled with minimum changes for 64 bit can work on 64 bit but are not tuned for it. It results in no gains whatsoever in many cases, despite the compiler using some of the extra registers and sometimes come with a performance penalty. Surprise ! Surprise !! Surprise !!! 

The reason are that many routines in C/C++ which are not CPU compute intensive cannot use all the extra registers and benefit somewhat little by the extra registers and ofcourse the understated newer & better CPU architecture. To make matters worse the data which was aligned for 32 bit (and multicore cache lines) is sometimes thrown out of alignment for 64 bit which imposes a performance penalty (memsets, memcopy, etc can become heavier for structures). The net result varies and is anybody's guess but definetely not what the 64 bit technology promised in the marketing materials. And as long & pointers (and sometimes ints in some architectures) data are now 64 bit instead of 32, in general the dynamic footprint also increases and this increase is sometimes upto 30%. And sometimes our programmer who ported has not used all the possible static analyzers correctly, or supressed compiler warnings that resulted in some bad type conversions to linger around that crash sometimes when values greater than a 32 bit integer range could hold but work fine when the values are within the 32 bit range !!! So instead of a higher performance you expected,  you end up getting  a 30% code bloat and maybe a 10% drop in performance and top it a few of random bugs (Heisenbugs) leading to a question mark on stability. I have seen scene this play out many a times. 

And if you are using an IOS 7 on iPad or IPhone 5S, you may have observed that after migrating to IOS 7 the low memory errors, random crashes Safari (with multiple tabs), frequent browser tab refreshes and crashes in other apps have increased with the diagnostic data pointing to low memory (which was otherwise absent in 32-bit IOS 6).

My intention is not to blow a  gun on your face and scare you off 64 bit Computing. 64 bit was meant for applications that need to use more than 2 or 3 GB addressable memory space in one process (IPhone/iPad have 1 GB RAM). Those are anyways not very common in general purpose applications. Maybe for large in-memory databases(caches) or scientific computing applications. And a majority of applications don't need 64 bit. So if you are a developer, look before you leap. And for those who can benefit from 64 bit, my advice would be to:

(1) Compile the code with the highest level of compiler warnings and ensure that the warnings don't point to 64 bit problems
(2) Use static PClint, KlocWorks, Coverity, Viva64 etc (better to include a special 64 bit code checker oif available) to detect 64 bit errors and get rid of them
(3) Wherever you are casting pointers to integers and vice-versa, check it carefully.
(4) Expect Code bloat. If its not acceptable you may have to use compilert size optimization and/or hand-tuning (wondering what apple and IOS app developers are going to do)
(5) Measure performance and tune data structures to play nicely on 64 bit and Multi-core cache lines
(6) Test Comprehensively.

I am not aware of a test method which can ensure that all pointers are greater than 32 bit range even if run on a machine with more than 4G physical RAM. So very strict static analysis and code review is your best bet to remove heisenbugs, even though you may be having a very comprehensive test net. 


Are there any low hanging fruits in memory footprint optimization ?

That could be one question asked after my last post. The answer is a very subjective "maybe" and here's why:

First of all the memory footprint optimization would frequently involve a space and time tradeoff. If you use code  transforms to reduce footprint, many transforms will also result in a decrease in speed (or performance). Quite common with data structures.

Second, the overall size optimization problems is actually two independent sub-problems viz. Static (or Code size) footprint optimization and Dynamic (or runtime) footprint optimization. Code size impacts you secondary storage (ROM, Flash or Disk) as well as primary storage (RAM), while runtime data influences your primary storage only. The importance of these are domain dependant. Mobile devices usually have 8/16/32 GB of flash storage (without occasionally a scope of explansion through SD Card by another 64 GB) while desktop/server storage can be in terabytes. Mobile devices have 1/2/3 GB RAM, but that available in desktop or server can range from 4-100 GB. Increasing any storagehas huge financial and battery consumption implications for mobile devices, not always so much for desktops or servers.  In some cases dynamic footprint optimization can help systems which are memory bound, by eliminating the need of extra machines. It is important, however, to remember that footprint optimization is two sub-problems and needs two seperate medicines.

Static Code sizes comprises two parts viz. Program (instructions in code segment) and Static Constant data (in bss & data segments). Usually the code segment is much larger than a data segment, unless you encounter a library which uses huge tables for some type of data mapping or transforms (like iconv).

To optimize data segment, you need to cut the data and sometimes the data is so linear that can be replaced by a mapping function which computes the result rather than look it up directly from the table. This is where my "maybe" comes from. And its very rare. Such replacing of data with computing code is good for size optimization, but negatively impacts speed. Data can also be cut by reducing duplication of constant data such as strings by storing it at one place and using a reference to it everywhere.

The code segment optimization needs more effort and is often the outcome of software bloat by enhancement or careless coding. The most efficient way to deal with it is not to take a hammer and beat out every function in sight, but to take a chisel first and carve out the software suited to the exact use case. The sculptor's toolbox should include the following methods and practices:

(i) Avoid Static linking of reused infrastructure. Consider dynamic linking as it will maintain only one copy of the instruction in memory but different data segments per process. 
(ii) if sharing is within a process space, consider developing the library without any global writable data but thread-specific (or context specific data)
(iii) Use compile/link time exclusions, code removal, etc to ensure that only the needed features are built-in and the rest are excluded from the build. This tends to give the maximum returns on investment.
(iv) After having exhaused all the above, you need to take the hammer. Unlike performance, I have not come across footprint hot-spots in instruction segments at a micro level. Maybe you will find one module/layer is very big but not one function. In such a case, you need to use the hammer on that module and not the rest. Otherwise you have no choice but to beat out everything in sight.
(v) Cutting instructions involve enabling compiler's size optimization flag, removing large loop unrolling, repetitive code, duplication & simplification of error handling, sharing of code between FSM handlers and so on. It will also involve removal of common log strings by having a table and using a pointer to that. The idea is to have less code for the same functionality.
(vi) in some cases, some implementations can compress sttaic code and uncompress it dynamically while loading into memory

In the worst outcome, you may even need to seperate the performance sensitive codebase and the size constrained one, where the latter is written completely differently. This used to be quite a common outcome if one starts by taking a server/desktop side librray and tries to port it to a memory constrained mobile device.

On the other hand, dynamic footprint is usually impacted by your runtime data structures in addition to code segment. if runtime data is the problem, then you can try transforms like structure alignment to natural boundaries, structure packing without alignment to any boundary, decreasing the storage width of elements based on range, opening less file descriptors at any stage, implementing custom memory mangement to avoid small allocations by preserving the data structure design. This may also be at times a low hanging fruit (my "maybe" again). Or you need to design a more compact (even if less efficient speed-wise) data structure. On more example is stack usage. If stack usage is the problem, things like recursion or very deep function calling tree sequence need to be avoided.

When we start any dynamic footprint optimization activity, we must clearly measure how much is the static and how much is the additional runtime footprint and then direct attention to the right place for maximum impact. Attacking impulsively any low-hanging fruit without judging how much value it can generate, is just an inefficient way of working that fails to produce tangible outcome. To summarize:

Static Footprint = Instruction segment + Data Segment
Dynamic Footprint = Static Footprint (its loaded into RAM) + Runtime Data

Usually most of the time the problem is the dynamic data, not just the static footprint in isolation on both server side and client (mobile) software.

Sunday, 20 April 2014

Low Hanging Fruits for Performance Optimization

If you are optimizing code, the biggest gains for you are likely going to come from optimizations in software design (i.e. your data structures and algorithms) and architecture (concerns like distribution, functionality, interface, multi core, etc). But usually when you run into a system which needs to be put on sterioids, this not the first thing you should be looking at. 

Let me begin break down the optimizatin problem into three stages:

First, I have seen many teams working hard to achieve some internal goal on performance of a software sub-system/module, when the software module has very little impact on the overall system performance. For eg., a protocol stack takes just 10% CPU usage in a product sub-system. If we improve the performance of this stack by 100% (in most cases by literally burning the midnight lamp), then the impact on the overall system is hardly 5%, something which is not significant. IMO optimizations like performance modeling have to be partitioned top-down, not bottom-up as software teams might be tempted to do. At times sub-system architected when challenged with a 50% improvment might be tempted to set every module in the critical path an improvment goal of 50% so that he can meet the goal. This is not the optimal way of working as the bottlnecks have to be identified quantitatively and then targets set. The benefits must be tangible for an optimization activity to be started on any peice (or whole) in the first place. 

Second, there is no end to the performance improvment that can be achieved on software. You identify & remove one bortleneck & something else gets significant. You remove that, a third might crop up and it goes one. At some stage you will not see one bottlneck but many equal contributors the conclusion being you need to deal with all of them to go to the next one. But their has to be an end to this practically. IMO, we must leran to accept that no design is final or perfect, but their is something in between that is "Just Good Enough" for that time. Optimization takes efforts. And as you optimize further it gets tougher each time. You have to consider the value (impact) of the optimization paarmeter (throughput, latency, resource usage, etc) on the whole (entire system). At some stage you are likely going to come to the conclusion that further optimization does not yield any *tangible* benefit to the end user or the product. That's when you need to stop.

Third, we assume that we have crossed both these stages, and you are convinced about the need to optimize one step, the question of how arises. As I have already said design & architecture chnages bring big benfits, but involve the maximum risk and efforts. Before you jump to this, you need to check the "hotspots"  in the present software design. Its likely that a few functions (or even routines) in the architecture are consuming a lot of CPU. For eg., an avoiadble socket open and close, an OS System call, a partcular hash lookup, etc. Many of them are low-level coding problems. You could consider if the usage of these hotspots can be reduced (avoidance), or the bottleneck itself can be improved.  A hotspot's contribution is  the product (multiplication) of both the number of times it invoked and the cost of each invocation. Optimize either (or both) and you will observe gains. These changes are localized  at times but the common trait is that they are very simple and less risky to make. That's why I call them the "Low Hanging Fruits".

A time will come when you can't do anything more with the the hotspots or their contribution to the overall system is not dominant. That's when you need to stop the hotspot based optimization and go to the  design or architecture changes (again). Data structure design & algorithm should be the first targets in this step and we can follow if required with architectural modifications.


Wednesday, 16 April 2014

Should we standardize technology or move one step ahead to standardizesoftware infrastructure

I have worked a better part of my career on standards related protocol stacks and middle-ware (SDKs, Frameworks, etc). Yet today the utility of the way the entire standardization activity is carried out can be questioned.  Standards involve the following main stages

(1) Identify a common direction or problem, encountered in the ecosystem.
(2) Multiple players get together to create a forum/working-group within a suitable SDO (that supports the ecosystem)
(3) The forum meets, brainstorms and designs a working solution on paper (i.e. charter, architecture and design). The technology is licensed under non-discriminatory to all players whether they are part f the forum or not.
(4) The ecosystem adopts the above work and everybody goes to the computer to write the supporting software infrastructure, carries out trials, inter-operability tests and the like. The problems are identified, solutions  proposed and this goes back as feedback [step (3)] for correction and enhancement.
(5) At some point the standards mature, the changes reduce and commercial deployments happen

This is how IETF, 3GPP, etc used to work. The entire cycle would take years (sometimes a decade) from start to finish. Very un-agile so as to say. Today we value frequent software releases, early feedback, welcome changes and prefer working software to bulky documentations.

 Now consider the following alternative:

(a) We have a (1) and (2), but instead of a big document (or revision) at the end of every 6 months, the SDO working group releases a simple architecture document and working software infrastructure as open-source to members every month (or even faster) highlighting how to use the new features, defects fixed rather than describe what is the wire-protocol or how it is working internally.
(b) The players immediately integrate (upgrade) the software infrastructure released by SDO in their Prototype products, carry out the tests and give feedback.
(c) Finally at some point the changes reduce and the software matures. At this point a small optimization phase is done on the software infrastructure, changes from multiple players (who are willing to share their innovation) merged into the forum OSS code-base.
(d) Subsequent testing achieves stabilization and now a final detail design document can be produced (as a subject of academic study and knowledge dissemination)  along with the working software.

How would (a)-(d) fare in terms of agility ? I think it would cut the overall development costs for every player, shorten the time to market drastically without sacrificing inter-operability (because everyone uses the  same implementation). In fact its likely that better inter-operability is achieved compared to the former process which leaves out chances of people interpreting (& implementing) some under specified parts of the standard differently (this is a very common problem). At the same time business logic innovations (that are the revenue spinners) are still protected.

Open source supporting such standardization work collaboratively done by leading players will result in a terrific force multiplier for the entire market. I already see the newer SDOs (like SDN forum with OpenFlow) working in a model similar to this by producing an implementation side by side of a document. 

Tuesday, 1 April 2014

Energy Efficiency for C Programmers

In one of my previous posts  titled "How software goes Green", I had propose that a design for resource efficiency leads to energy efficient software. What can we do as programmer ? Extend this to programming.

As with performance engineering, 70-80% of the energy profile of an application can be addressed by good design (better data structures, better algorithms). The remaining 20-30% needs some support from coding. I would warn the readers that code hand tuning is not the most rewarding process, unless for some reason you are hard pressed to squeeze the last bit of energy consumption out of the application. It is useful in thin infrastructure components and platforms, where their is a lot of functionality but the complexity (perhaps indicated by call stack depth) is low. Or some function/routine is so frqeuently used that it becomes a significant contributor to the overall energy consumption of the software.

Tune code for least runtime instruction cost (not size). As a general the fewer the runtime instructions, the more energy efficient the code will be. Unrolling small loops, were the cost of setting up a loop (initialization condition, exit condition and loop progression) is comparable to the cost of the loop body. Similarly using vector instructions will tak emuch less energy as one instruction does the jpb of many. avoid unwanted memsets, memcopy etc.  Also multi-core is the norm today, not exception. If a large computation can be paralleized across cores, its better to do that rather than run one core to its peak power for longer duration.

You can refer to any code performance tuning material for such examples. 


Monday, 31 March 2014

Difference between Software Architecture & Design

A lot of people are confused between what architecture mean and what design means for software. At times even i got confused early on in my career, as I worked only on parts of a big system (homogeneous if big) not the whole system.

What they mean ?
Architecture defines the big picture. It describes structure. What is done by the system, where its done and why it is done (strategy) like that. If you are given a problem statement and asked to solve it, you will do architecture to arrive at the most optimal solution. Architecture is thus essential to sub-divide the complexity of a big system into smaller parts (the where part). Architecture is externally visible. Architecture is a strategic (stable for long-term) activity. It can cover goals, scope, choice of frameworks, high level methodologies, etc. Architecture is therefore free from finer Implementation details.

Design on the other hand is the small picture and talks about how things are done in the sytem. If you are given a solution to a problem, then you wiill carry out design to move further. Design is tactical (or short term) activity dealing with code organization, interfaces between parts (layers, modules, classes, etc), design patterns, programming idioms & refactoring steps. Design is not visible externally. If a design has no implementation details, then it is not design because it does not help you write the software.

Both architecture and design can have patterns. Such as LAMP stack, MVC, N-Tier Web architecture for architecture and factory pattern, Proxy, etc for design.

If you are in the process of writing a framework, toolkit or SDK for a particular application and you do not know what is the level of abstraction (boundary) that you will provide, then that decision making process however small the software is has architectural underpinnings. If the API level is clear (like provide a parsing interface, or transaction based interface or session based or just limit to XXX functionality), then all you need is a design step. With my previous employer, we used to follow this principle to decide whether to have a  seperate architecture phase or not in the project lifecycle. 

Getting an architecture wrong is a a sure-fire step to product disaster while if you get design wrong (usually its limited to parts or sub-systems) you at-least have a chance to learn and correct yourself at the expense of some schedule slippage. Architects thus require to have more business, domain and general software skills (width) while designers need to be good in writing software (depth). I believe you can be a good architect without necessarily being great at software design. A person aiming to be a good software architect needs to work on many different sub-systems, domains & technologies to get the necessary knowledge & interest width for solving a problem.

It should be clear that the order in development lifecycle is Architecture followed by Design whether you are doing Waterfall or Agile or Rational or something else.


When they are different ?
When developing a very large system or software. There has to be a concept of the system defined before it can be constructed and that's why it requires a seperate architecture definition step, before a design can be even attempted. You need to know what you have to build, before you build it.

When they are same ?
When developing a small piece of software, where the purpose is clear from just the name of the software, component or project. Or a project where scope, language, functionality is decided and provided. A seperate architecture hardly conveys much by itself apart from what is the most obvious. Maybe a small architecture section could  be the first section of the design documentation in such cases.

Hope this discussion will restore order. I owe a lot of description of these ideas to the remarks/definitions/phrases coined by the folks at www.stackoverflow.com on a similar discussion. Acknowledging all of them here. One comment I found particularly intriguing:

"...long time ago in a faraway place philosophers worried about the distinction between the one and the many. Architecture is about relationship, which requires the many. Architecture has components. Design is about content, which requires the one. Design has properties, qualities, characteristics. We typically think that design is within architecture. Dualistic thinking gives the many as primordial. But architecture is also within design. It's all how we choose to view what is before us - the one or the many."

One can also refer to Part-I (about 20 crisp pages) of the book "Software Achitecture for Developers" by Simon Brown on this topic.





Sunday, 30 March 2014

Aim Small, Miss Small - How to engineer high quality and competitivesoftware

One of the most common issues while developing a big system is the lack of focus on the "-ilities" or quality attributes which are characterised as non-functional requirements (NFRs). The development starts with a 100% focus on functional requirements and NFRs only being an afterthought. Yet in this competitive environment, almost all the competitors are able to implement the FRs, leaving differentiation only in the NFR implementation. Its what in many cases differentiates a great product from a me-too one.

Their are 60+ of these quality attributes as per wikipedia and not each and every one is important to every product. Therefore one needs to devise a quality model specific to the product (may even be a common in a business unit with some tailoring for each product). This is the first step and serves as an table of contents for the NFR specification. Then each section is elaborated into requirements and requirements analysis. Usually each NFR has to be measurable (quality is) and that is one BIG difference from FRs and also puts them at a much higher risk to accomplish technically and effortwise. What is also different here is that techniques and tools applied to FR capturing & analysis (UML) may not apply (and usually don't) to NFR analysis. The overall list can get quite daunting if completed and can push people towards a BDUF approach which is so un-agile or waterfall. I also in the favour that design & coding of functional and non-functional features is not done together as the focus on NFR is usually not enough and cannot achieve desirable results as far as NFR implementation is concerned. 

But their is also a problem with this unbundled desiogn approach. Some requirements like performance are very tightly coupled to vast tranches of code and ignoring them is suitable only for a prototype (reserach project, proof-of-concept or testing that we are building the right product) not product development. There is a difference between refactoring (small transformations) & rework (massive transformation). My guiding principle is the "Open-Closed" principle which states that the software (module, sub-system, or whole) should be open for extension (adding new functionality) not modification (deleting or modifying existing functionality). We should be refactoring (modifying) software to accomodate new things or improve the code, not rework the algorithms, design & data structure completely. Its a tough ask from developers to do a big project all over again and expect them to be enthusiastic to the task


So what do I suggest for the development steps. Something like this:

(1) The first build (comprising multiple iterations or one waterfall project as you may decide) focusses only functional requirements + the most key NFRs which have tight dependency with functional requirement implementation. the architecture and design should be cohesive, loose-coupled and show all other attributes of good design & architecture so that this skeleton is very receptive to enhancements. Again the Open-Closed principle applies entirely here. Ideally I would prefer that the architecture team does the requirement analysis and produces the story cards/SRS requirements to implement along with the software design.

(2) Next while the implementation of first build is going on we can theoretically in parallel work on NFRs one by one and produce NFR specifications and Design for the "ilities" (DFx - the title of this blog). The architects can do thsi work in parallel with development of the first build. The focus is very clear for teh architects and designers, the method tailored to the quality attribute in question, the identified NFRs measurable as far as possible. NFR and "-ilities" need system thinking, not modular thinking to be effective.

(3) After the first development build is over and is being tested by user, we can implement teh NFRS in a series of small builds (iteration). I have seen too many times logs being written by developers based on some API specification and finallty being rendered useless in the market beacuse they are excessive and load the system if enabled (which masks the problem under study), or are not effective to isolate the problem in field or backend technical support.

Good quality is not a cliche. Its not an afterthought. On ground its a very hard engineering challenge to accomplish especially when the opinion of the customer is the very important for proclaiming victory. Focus is everything to produce the results that customer wants to see. 

Aim Small, Miss Small !!!