Evaluating Performance and Power of Object-oriented vs. Procedural Programming in Embedded Processors
A. Chatzigeorgiou, G. Stephanides
Department of Applied Informatics, University of Macedonia, Greece
Motivation
- Widespread application of embedded systems
- Low power requirements for portable systems: battery lifetime, integration scale, cooling/reliability issues
- Challenge: increased performance leads to increased power
- Existing low-level tools for energy estimation: processor power, memory power
Does Software Affect Power Consumption?
- Until recently, power reduction was pursued through hardware optimizations (transistor sizing, supply voltage reduction, etc.)
- Tiwari (1994, 1996) showed that software has a significant, measurable impact on the energy consumption of the underlying hardware
- Software addresses higher levels of the design hierarchy, so the achievable energy savings are larger
- Moreover, for software there is no tradeoff between performance and power: fewer instructions lead to reduced power
Sources of Power Consumption
- Power dissipation in digital systems is due to the charging/discharging of node capacitances
- However, dynamic power also depends on the switching activity
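A commonly used first-order expression for dynamic power in CMOS circuits is given below. It is the textbook form, stated here as an assumption about the relation this slide refers to, and the symbol names are not taken from the slides: \alpha is the switching activity, C_L the switched node capacitance, V_{dd} the supply voltage and f the clock frequency.

```latex
P_{\text{dyn}} = \alpha \, C_L \, V_{dd}^{2} \, f
```

Under this form, software can only influence \alpha and the number of cycles executed, which is why instruction count and memory traffic are the quantities tracked in the rest of the talk.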
Sources of Power Consumption
Sources of power consumption in an embedded system:
- Instruction-level power consumption (power consumed during processor operation)
- Instruction and data memories (power consumed when accessing memories)
- Interconnect switching (power consumed when bus lines change state)
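Instruction Level Power Models
Instruction energy:
- Base cost, e.g. ADD R2, R0, #1
- Overhead cost between consecutive instructions, e.g. ADD R2, R0, #1 followed by CMP R2, #0
Energy consumption of a program (Tiwari et al.): the base costs of the executed instructions plus the overhead costs of consecutive instruction pairs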
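The instruction-level model of Tiwari et al. can be sketched as a walk over an executed instruction trace: add the base cost of every instruction and the overhead cost of every adjacent pair. The sketch below is illustrative only. The `EnergyModel` struct, the `programEnergy` function, the opcode strings and any cost values fed to it are hypothetical, and other effects in the model (such as stalls) are left out.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical cost tables; real values come from measurements on the target core.
struct EnergyModel {
    std::map<std::string, double> baseCost;                              // energy per instruction
    std::map<std::pair<std::string, std::string>, double> overheadCost;  // energy per adjacent pair
};

// Energy of an executed trace: sum of base costs plus the overhead cost of
// every pair of consecutive instructions. Opcodes or pairs missing from the
// tables contribute zero (an assumption of this sketch).
double programEnergy(const EnergyModel& m, const std::vector<std::string>& trace) {
    double total = 0.0;
    for (std::size_t i = 0; i < trace.size(); ++i) {
        auto b = m.baseCost.find(trace[i]);
        if (b != m.baseCost.end()) total += b->second;
        if (i > 0) {
            auto o = m.overheadCost.find(std::make_pair(trace[i - 1], trace[i]));
            if (o != m.overheadCost.end()) total += o->second;
        }
    }
    return total;
}
```

Because the total grows with every executed instruction, any abstraction the compiler fails to remove shows up directly as extra processor energy.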
Processor Energy Consumption 6-8 %
Instruction Level Power Models
Memory Power Consumption
- Energy cost of a memory access >> instruction energy
- Depends on: number of accesses (directly proportional); size of memory (between linear and logarithmic); number of ports, power supply, technology
- Instruction memory power depends on code size (which sets the required memory size) and on the number of executed instructions (which sets the number of accesses)
- Data memory power depends on the amount of data being processed (which sets the memory size) and on whether the application is data-intensive (which sets the number of accesses)
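A minimal sketch of this dependence follows. Energy is proportional to the number of accesses, and the per-access cost grows with memory size. The square-root scaling is purely an assumption chosen because it lies between the logarithmic and linear bounds; the function names and the `scale` parameter are hypothetical and do not come from the memory model used in the study.

```cpp
#include <cmath>
#include <cstdint>

// Assumed per-access cost: grows with the square root of the memory size.
// Both the exponent and the scale factor are placeholders, not measured values.
double energyPerAccess(std::uint64_t sizeBytes, double scale) {
    return scale * std::sqrt(static_cast<double>(sizeBytes));
}

// Memory energy is directly proportional to the number of accesses.
double memoryEnergy(std::uint64_t accesses, std::uint64_t sizeBytes, double scale) {
    return static_cast<double>(accesses) * energyPerAccess(sizeBytes, scale);
}
```

The same function would serve for both memories: the instruction memory takes the executed instruction count and the code size, the data memory takes the data access count and the RAM requirement.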
OOPACK Benchmarks
Small suite of kernels that compares the relative performance of object-oriented programming in C++ versus plain C-style code.
Max: computes the maximum over a vector
- Aim: to measure how well a compiler inlines a function within a conditional
- C-style: performs the comparison between two elements explicitly
- OOP: performs the comparison by calling an inline function
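The two styles of the Max kernel can be sketched as follows. This is an illustration written for this summary, not the OOPACK source; the function names are hypothetical and both functions assume a non-empty vector.

```cpp
#include <cstddef>

// C-style: the comparison is written out explicitly in the loop.
// Assumes n >= 1.
double maxCStyle(const double* v, std::size_t n) {
    double m = v[0];
    for (std::size_t i = 1; i < n; ++i)
        if (v[i] > m) m = v[i];
    return m;
}

// OOP-style: the comparison goes through a small inline function that the
// compiler is expected to inline inside the conditional.
inline bool greaterThan(double a, double b) { return a > b; }

// Assumes n >= 1.
double maxOopStyle(const double* v, std::size_t n) {
    double m = v[0];
    for (std::size_t i = 1; i < n; ++i)
        if (greaterThan(v[i], m)) m = v[i];
    return m;
}
```

If inlining succeeds, both functions should compile to the same loop; if it fails, the OOP version pays a call per element.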
OOPACK Benchmarks
Matrix: multiplies two matrices containing real numbers
- Aim: to measure how well a compiler hoists simple invariants
- C-style: the index arithmetic is written out, where, for example, the term L*i is constant for each iteration of k and should be computed as an invariant outside the k loop
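A sketch of what hoisting means for the C-style kernel, assuming flat row-major L x L arrays. The array names and both function names are hypothetical. The first function leaves L*i inside the k loop; the second hoists it by hand to show the code an optimizer is expected to produce on its own.

```cpp
#include <cstddef>

// Straightforward C-style kernel. L*i does not change inside the k loop,
// so a compiler can compute it once per row.
void multiplyCStyle(const double* C, const double* D, double* E, std::size_t L) {
    for (std::size_t i = 0; i < L; ++i)
        for (std::size_t j = 0; j < L; ++j) {
            double sum = 0.0;
            for (std::size_t k = 0; k < L; ++k)
                sum += C[L * i + k] * D[L * k + j];
            E[L * i + j] = sum;
        }
}

// Same kernel with the invariant hoisted by hand.
void multiplyHoisted(const double* C, const double* D, double* E, std::size_t L) {
    for (std::size_t i = 0; i < L; ++i) {
        const double* rowC = C + L * i;  // invariant for all j and k
        double* rowE = E + L * i;
        for (std::size_t j = 0; j < L; ++j) {
            double sum = 0.0;
            for (std::size_t k = 0; k < L; ++k)
                sum += rowC[k] * D[L * k + j];
            rowE[j] = sum;
        }
    }
}
```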
OOPACK Benchmarks
- OOP: performs the multiplication employing member functions and overloading to access an element, given the row and the column
- Modern C compilers are good enough at this sort of optimization for scalars. However, in OOP style, invariants often concern members of objects, and optimizers that do not peer into objects miss the opportunities.
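The OOP-style access pattern can be sketched with a minimal matrix class. The `Matrix` class and the `multiply` function are hypothetical, written only to show where the invariant hides: the row offset is computed from the member `cols_` inside `operator()`.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical minimal matrix class: element access goes through an
// overloaded operator(), so the row offset is computed from a data member.
class Matrix {
public:
    Matrix(std::size_t rows, std::size_t cols)
        : rows_(rows), cols_(cols), data_(rows * cols, 0.0) {}
    double& operator()(std::size_t i, std::size_t j) { return data_[cols_ * i + j]; }
    double operator()(std::size_t i, std::size_t j) const { return data_[cols_ * i + j]; }
    std::size_t rows() const { return rows_; }
    std::size_t cols() const { return cols_; }
private:
    std::size_t rows_, cols_;
    std::vector<double> data_;
};

// OOP-style product: cols_ * i is invariant in the k loop, but the optimizer
// only sees that after inlining operator() and looking into the object.
Matrix multiply(const Matrix& a, const Matrix& b) {
    Matrix e(a.rows(), b.cols());
    for (std::size_t i = 0; i < a.rows(); ++i)
        for (std::size_t j = 0; j < b.cols(); ++j) {
            double sum = 0.0;
            for (std::size_t k = 0; k < a.cols(); ++k)
                sum += a(i, k) * b(k, j);
            e(i, j) = sum;
        }
    return e;
}
```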
OOPACK Benchmarks
Iterator: computes a dot-product
- Aim: to measure how well a compiler inlines short-lived small objects (a short-lived object should never reside in main memory; its entire lifetime should be spent inside registers)
- C-style: uses a common single index:
    for( int i=0; i<N; i++ ) sum += A[i]*B[i];
- OOP: employs iterators. Iterators are a common abstraction in OOP. Although iterators are usually called "light-weight" objects, they may incur a high cost if compiled inefficiently. All methods of the iterator are inline and in principle correspond exactly to the C-style code.
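A sketch of the two dot-product styles, assuming a simple pointer-based iterator. The `Iterator` class and its method names are hypothetical and are not taken from OOPACK; the point is only that every method is a one-line inline function.

```cpp
#include <cstddef>

// C-style: one shared index drives both arrays.
double dotCStyle(const double* A, const double* B, std::size_t N) {
    double sum = 0.0;
    for (std::size_t i = 0; i < N; ++i) sum += A[i] * B[i];
    return sum;
}

// Hypothetical light-weight iterator; all methods are defined in-class
// and are therefore implicitly inline.
class Iterator {
public:
    Iterator(const double* data, std::size_t n) : p_(data), end_(data + n) {}
    bool done() const { return p_ == end_; }
    double value() const { return *p_; }
    void next() { ++p_; }
private:
    const double* p_;
    const double* end_;
};

// OOP-style: two short-lived iterator objects that should live in registers.
double dotOopStyle(const double* A, const double* B, std::size_t N) {
    double sum = 0.0;
    Iterator b(B, N);
    for (Iterator a(A, N); !a.done(); a.next(), b.next())
        sum += a.value() * b.value();
    return sum;
}
```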
OOPACK Benchmarks
Complex: multiplies the elements of two arrays containing complex numbers
- Aim: to measure how well a compiler eliminates temporaries
- C-style: the calculation is performed by explicitly writing out the real and imaginary parts
- OOP: complex addition and multiplication are done using overloaded operators
Complex numbers are a common abstraction in scientific programming. The complex arithmetic is all inlined in the OOP style, so in principle the code should run as fast as the version using explicit real and imaginary parts.
OOPACK Benchmarks
SAXPY operation: Y = Y + c*X (c is a scalar, X and Y are vectors)
Calculation employing temporaries:
    tmp1.re = c.re * X[k].re - c.im * X[k].im;
    tmp1.im = c.re * X[k].im + c.im * X[k].re;
    tmp2.re = Y[k].re + tmp1.re;
    tmp2.im = Y[k].im + tmp1.im;
    Y[k] = tmp2;
Temporaries are eliminated:
    Y[k].re = Y[k].re + c.re*X[k].re - c.im*X[k].im;
    Y[k].im = Y[k].im + c.re*X[k].im + c.im*X[k].re;
Dynamically allocating and deleting temporaries causes severe performance loss for small vectors.
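The OOP and C-style versions of the complex SAXPY kernel can be put side by side as below. The `Complex` struct, its operators and the two function names are hypothetical, written for illustration rather than copied from OOPACK. The overloaded operators return by value, which is where the temporaries come from.

```cpp
#include <cstddef>

struct Complex { double re; double im; };

// Overloaded operators: each call returns a temporary Complex by value.
inline Complex operator+(Complex a, Complex b) {
    return {a.re + b.re, a.im + b.im};
}
inline Complex operator*(Complex a, Complex b) {
    return {a.re * b.re - a.im * b.im, a.re * b.im + a.im * b.re};
}

// OOP-style: Y = Y + c*X written with operators; a good compiler inlines
// both calls and keeps the temporaries in registers.
void saxpyOop(Complex* Y, Complex c, const Complex* X, std::size_t n) {
    for (std::size_t k = 0; k < n; ++k)
        Y[k] = Y[k] + c * X[k];
}

// C-style: real and imaginary parts written out, no temporaries at all.
void saxpyCStyle(Complex* Y, Complex c, const Complex* X, std::size_t n) {
    for (std::size_t k = 0; k < n; ++k) {
        Y[k].re = Y[k].re + c.re * X[k].re - c.im * X[k].im;
        Y[k].im = Y[k].im + c.re * X[k].im + c.im * X[k].re;
    }
}
```

The two versions associate the additions differently, so with general floating-point inputs their results may differ in the last bits even though they are mathematically equal.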
Target Architecture
- Processing unit: ARM7 TDMI
- Dedicated instruction memory (on-chip ROM)
- On-chip data memory
[Figure: block diagram. Within the chip boundary, the ARM7 integer processor core (3-stage pipeline) connects through a bus interface, with memory interface signals A[31:0] and D[31:0], to a ROM controller for the instruction memory and a RAM controller for the data memory.]
[Figure: experimental flow. Elements: OOPACK benchmark; ARM STD 2.50; ARM Debugger; trace file; profiler; code size, RAM requirements, #instructions, #memory accesses; memory model; processor energy, data memory energy, instruction memory energy; total power.]
Results – Performance Comparison
Results – Memory Comparison
Results – Energy Comparison (mJ)
OOPACK1 – Energy distribution (mJ)
Conclusions
- Power consumption should be taken into account in the design of an embedded system.
- OOP can result in a significant increase of both execution time and power consumption.
- If a compiler cannot optimize code to reach the level of procedural programming performance, the number of executed instructions increases, proportionally increasing the instruction-level power consumption.
- Especially in large programs, data abstraction can lead to a large code size increase, resulting in higher power consumption of the instruction memory.
Future Work
- Currently building an accurate energy profiler (considering cache layers, pipeline stalls)
- Compare large programs implemented following the object-oriented and the procedural programming paradigms
- Perform the comparisons for other compilers
- Identify energy-consuming programming structures and automatically convert them to energy-efficient ones