Monday, 21 April 2014

Are there any low hanging fruits in memory footprint optimization ?

That could be one question asked after my last post. The answer is a very subjective "maybe" and here's why:

First of all the memory footprint optimization would frequently involve a space and time tradeoff. If you use code  transforms to reduce footprint, many transforms will also result in a decrease in speed (or performance). Quite common with data structures.

Second, the overall size optimization problems is actually two independent sub-problems viz. Static (or Code size) footprint optimization and Dynamic (or runtime) footprint optimization. Code size impacts you secondary storage (ROM, Flash or Disk) as well as primary storage (RAM), while runtime data influences your primary storage only. The importance of these are domain dependant. Mobile devices usually have 8/16/32 GB of flash storage (without occasionally a scope of explansion through SD Card by another 64 GB) while desktop/server storage can be in terabytes. Mobile devices have 1/2/3 GB RAM, but that available in desktop or server can range from 4-100 GB. Increasing any storagehas huge financial and battery consumption implications for mobile devices, not always so much for desktops or servers.  In some cases dynamic footprint optimization can help systems which are memory bound, by eliminating the need of extra machines. It is important, however, to remember that footprint optimization is two sub-problems and needs two seperate medicines.

Static Code sizes comprises two parts viz. Program (instructions in code segment) and Static Constant data (in bss & data segments). Usually the code segment is much larger than a data segment, unless you encounter a library which uses huge tables for some type of data mapping or transforms (like iconv).

To optimize data segment, you need to cut the data and sometimes the data is so linear that can be replaced by a mapping function which computes the result rather than look it up directly from the table. This is where my "maybe" comes from. And its very rare. Such replacing of data with computing code is good for size optimization, but negatively impacts speed. Data can also be cut by reducing duplication of constant data such as strings by storing it at one place and using a reference to it everywhere.

The code segment optimization needs more effort and is often the outcome of software bloat by enhancement or careless coding. The most efficient way to deal with it is not to take a hammer and beat out every function in sight, but to take a chisel first and carve out the software suited to the exact use case. The sculptor's toolbox should include the following methods and practices:

(i) Avoid Static linking of reused infrastructure. Consider dynamic linking as it will maintain only one copy of the instruction in memory but different data segments per process. 
(ii) if sharing is within a process space, consider developing the library without any global writable data but thread-specific (or context specific data)
(iii) Use compile/link time exclusions, code removal, etc to ensure that only the needed features are built-in and the rest are excluded from the build. This tends to give the maximum returns on investment.
(iv) After having exhaused all the above, you need to take the hammer. Unlike performance, I have not come across footprint hot-spots in instruction segments at a micro level. Maybe you will find one module/layer is very big but not one function. In such a case, you need to use the hammer on that module and not the rest. Otherwise you have no choice but to beat out everything in sight.
(v) Cutting instructions involve enabling compiler's size optimization flag, removing large loop unrolling, repetitive code, duplication & simplification of error handling, sharing of code between FSM handlers and so on. It will also involve removal of common log strings by having a table and using a pointer to that. The idea is to have less code for the same functionality.
(vi) in some cases, some implementations can compress sttaic code and uncompress it dynamically while loading into memory

In the worst outcome, you may even need to seperate the performance sensitive codebase and the size constrained one, where the latter is written completely differently. This used to be quite a common outcome if one starts by taking a server/desktop side librray and tries to port it to a memory constrained mobile device.

On the other hand, dynamic footprint is usually impacted by your runtime data structures in addition to code segment. if runtime data is the problem, then you can try transforms like structure alignment to natural boundaries, structure packing without alignment to any boundary, decreasing the storage width of elements based on range, opening less file descriptors at any stage, implementing custom memory mangement to avoid small allocations by preserving the data structure design. This may also be at times a low hanging fruit (my "maybe" again). Or you need to design a more compact (even if less efficient speed-wise) data structure. On more example is stack usage. If stack usage is the problem, things like recursion or very deep function calling tree sequence need to be avoided.

When we start any dynamic footprint optimization activity, we must clearly measure how much is the static and how much is the additional runtime footprint and then direct attention to the right place for maximum impact. Attacking impulsively any low-hanging fruit without judging how much value it can generate, is just an inefficient way of working that fails to produce tangible outcome. To summarize:

Static Footprint = Instruction segment + Data Segment
Dynamic Footprint = Static Footprint (its loaded into RAM) + Runtime Data

Usually most of the time the problem is the dynamic data, not just the static footprint in isolation on both server side and client (mobile) software.

No comments:

Post a Comment