© 2008 Wayne Wolf, Overheads for Computers as Components, 2nd ed.

Program design and analysis
- Program-level performance analysis.
- Optimizing for:
  - Execution time.
  - Energy/power.
  - Program size.
- Program validation and testing.

Program-level performance analysis
- Need to understand performance in detail:
  - Real-time behavior, not just typical.
  - On complex platforms.
- Program performance ≠ CPU performance:
  - Pipeline, cache are windows into program.
  - We must analyze the entire program.

Complexities of program performance
- Varies with input data:
  - Different-length paths.
- Cache effects.
- Instruction-level performance variations:
  - Pipeline interlocks.
  - Fetch times.

How to measure program performance
- Simulate execution of the CPU.
  - Makes CPU state visible.
- Measure on real CPU using timer.
  - Requires modifying the program to control the timer.
- Measure on real CPU using logic analyzer.
  - Requires events visible on the pins.
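
The timer method can be sketched roughly as follows. This is only an illustration: the standard C `clock()` call stands in for a hardware timer (an embedded target would read a platform timer register instead), and the names `fir_once` and `time_fir` are hypothetical, not from the slides.

```c
#include <time.h>

#define N 8   /* assumed filter length for this sketch */

/* Hypothetical workload: one pass of an N-tap FIR filter. */
int fir_once(const int *c, const int *x)
{
    int f = 0;
    for (int i = 0; i < N; i++)
        f += c[i] * x[i];
    return f;
}

/* Time `reps` runs of the workload and return the elapsed clock() ticks.
 * The two clock() calls are the program modification that the
 * timer-based method requires. */
clock_t time_fir(const int *c, const int *x, int reps, int *result)
{
    volatile int sink = 0;        /* keeps the timed loop from being optimized away */
    clock_t start = clock();
    for (int k = 0; k < reps; k++)
        sink = fir_once(c, x);
    clock_t stop = clock();
    *result = sink;
    return stop - start;
}
```

Dividing the returned tick count by `reps` gives an average per-execution figure; it says nothing about worst-case time unless the inputs were chosen to exercise the worst-case path.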

Program performance metrics
- Average-case execution time.
  - Typically used in application programming.
- Worst-case execution time.
  - A component in deadline satisfaction.
- Best-case execution time.
  - Task-level interactions can cause best-case program behavior to result in worst-case system behavior.

Elements of program performance
- Basic program execution time formula:
  - execution time = program path + instruction timing
- Solving these problems independently helps simplify analysis.
  - Easier to separate on simpler CPUs.
- Accurate performance analysis requires:
  - Assembly/binary code.
  - Execution platform.

Data-dependent paths in an if statement

    if (a || b) {            /* T1 */
        if (c)               /* T2 */
            x = r*s+t;       /* A1 */
        else
            y = r+s;         /* A2 */
        z = r+s+u;           /* A3 */
    } else {
        if (c)               /* T3 */
            y = r-t;         /* A4 */
    }

    a b c   path
    0 0 0   T1=F, T3=F: no assignments
    0 0 1   T1=F, T3=T: A4
    0 1 0   T1=T, T2=F: A2, A3
    0 1 1   T1=T, T2=T: A1, A3
    1 0 0   T1=T, T2=F: A2, A3
    1 0 1   T1=T, T2=T: A1, A3
    1 1 0   T1=T, T2=F: A2, A3
    1 1 1   T1=T, T2=T: A1, A3
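
One way to check the path table is to instrument the same control structure and enumerate all eight inputs. The sketch below is illustrative only; the bit-mask encoding and the names `path_taken` and `print_paths` are assumptions introduced here.

```c
#include <stdio.h>

/* Returns a bit mask of the assignments executed:
 * bit 0 = A1, bit 1 = A2, bit 2 = A3, bit 3 = A4. */
unsigned path_taken(int a, int b, int c)
{
    unsigned mask = 0;
    if (a || b) {                /* T1 */
        if (c)                   /* T2 */
            mask |= 1u << 0;     /* A1 */
        else
            mask |= 1u << 1;     /* A2 */
        mask |= 1u << 2;         /* A3 */
    } else {
        if (c)                   /* T3 */
            mask |= 1u << 3;     /* A4 */
    }
    return mask;
}

/* Prints the assignment mask for every combination of a, b, c. */
void print_paths(void)
{
    for (int v = 0; v < 8; v++) {
        int a = (v >> 2) & 1, b = (v >> 1) & 1, c = v & 1;
        printf("%d%d%d: 0x%x\n", a, b, c, path_taken(a, b, c));
    }
}
```

Only four distinct masks appear across the eight inputs, which is why the execution time of this fragment takes at most four values for a fixed instruction timing.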

Instruction timing
- Not all instructions take the same amount of time.
  - Multi-cycle instructions.
  - Fetches.
- Execution times of instructions are not independent.
  - Pipeline interlocks.
  - Cache effects.
- Execution times may vary with operand value.
  - Floating-point operations.
  - Some multi-cycle integer operations.

Measurement-driven performance analysis
- Not as easy as it sounds:
  - Must actually have access to the CPU.
  - Must know data inputs that give worst/best case performance.
  - Must make state visible.
- Still an important method for performance analysis.

Feeding the program
- Need to know the desired input values.
- May need to write software scaffolding to generate the input values.
- Software scaffolding may also need to examine outputs to generate feedback-driven inputs.

Trace-driven measurement
- Trace-driven:
  - Instrument the program.
  - Save information about the path.
- Requires modifying the program.
- Trace files are large.
- Widely used for cache analysis.

Physical measurement
- In-circuit emulator allows tracing.
  - Affects execution timing.
- Logic analyzer can measure behavior at pins.
  - Address bus can be analyzed to look for events.
  - Code can be modified to make events visible.
- Particularly important for real-world input streams.

CPU simulation
- Some simulators are less accurate.
- Cycle-accurate simulator provides accurate clock-cycle timing.
  - Simulator models CPU internals.
  - Simulator writer must know how CPU works.

SimpleScalar FIR filter simulation

    int x[N] = {8, 17, … };
    int c[N] = {1, 2, … };

    main() {
        int i, k, f = 0;   /* initialized so the accumulation is well defined */
        for (k=0; k<COUNT; k++)
            for (i=0; i<N; i++)
                f += c[i]*x[i];
    }

    N        total sim cycles   sim cycles per filter execution
    100      25854              259
    1,000    155759             156
    10,000   1451840            145

Performance optimization motivation
- Embedded systems must often meet deadlines.
  - Faster may not be fast enough.
- Need to be able to analyze execution time.
  - Worst-case, not typical.
- Need techniques for reliably improving execution time.

Programs and performance analysis
- Best results come from analyzing optimized instructions, not high-level language code:
  - non-obvious translations of HLL statements into instructions;
  - code may move;
  - cache effects are hard to predict.

Loop optimizations
- Loops are good targets for optimization.
- Basic loop optimizations:
  - code motion;
  - induction-variable elimination;
  - strength reduction (x*2 -> x<<1).
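
As a rough illustration of code motion and strength reduction, the sketch below shows a loop before and after both transformations. The functions `scale_naive` and `scale_optimized` are made-up examples, not taken from the slides, and a modern compiler may well perform these rewrites on its own.

```c
/* Before: the loop-invariant product a*b is recomputed on every
 * iteration, and the index is scaled with a multiply. */
void scale_naive(int *out, int n, int a, int b)
{
    for (int i = 0; i < n; i++)
        out[i] = (a * b) + i * 2;
}

/* After: code motion hoists a*b out of the loop, and strength
 * reduction replaces i*2 with the cheaper shift i<<1
 * (i is non-negative here, so the shift is well defined). */
void scale_optimized(int *out, int n, int a, int b)
{
    int ab = a * b;                 /* code motion */
    for (int i = 0; i < n; i++)
        out[i] = ab + (i << 1);     /* strength reduction */
}
```

Both versions produce the same output array; only the work done per iteration differs.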

Induction variable elimination
- Induction variable: loop index.
- Consider loop:

      for (i=0; i<N; i++)
          for (j=0; j<M; j++)
              z[i][j] = b[i][j];

- Rather than recompute i*M+j for each array in each iteration, share induction variable between arrays, increment at end of loop body.
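
A minimal sketch of the transformation, assuming the two arrays are stored flattened in row-major order so that the offset i*M+j is explicit. The sizes `N` and `M`, the shared offset name `zbi`, and both function names are illustrative choices.

```c
#define N 3   /* assumed number of rows */
#define M 4   /* assumed number of columns */

/* Before: the offset i*M + j is recomputed for each array access. */
void copy_indexed(int *z, const int *b)
{
    for (int i = 0; i < N; i++)
        for (int j = 0; j < M; j++)
            z[i * M + j] = b[i * M + j];
}

/* After: one induction variable is shared by both arrays and
 * incremented at the end of the loop body. */
void copy_shared_offset(int *z, const int *b)
{
    int zbi = 0;   /* shared offset, always equal to i*M + j */
    for (int i = 0; i < N; i++)
        for (int j = 0; j < M; j++) {
            z[zbi] = b[zbi];
            zbi++;
        }
}
```

The second version removes the multiply and add from every access at the cost of one extra variable.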

Cache analysis
- Loop nest: set of loops, one inside other.
- Perfect loop nest: no conditionals in nest.
- Because loops use large quantities of data, cache conflicts are common.

Array conflicts, cont’d.
- Array elements conflict because they are in the same line, even if not mapped to same location.
- Solutions:
  - move one array;
  - pad array.
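
The effect of padding can be sketched with a toy model of a direct-mapped cache. The line size, number of lines, and base addresses below are assumptions chosen for illustration, and `cache_line` and `elements_conflict` are hypothetical helper names.

```c
#define LINE_BYTES 16u   /* assumed cache line size in bytes */
#define NUM_LINES  64u   /* assumed line count (a 1 KB direct-mapped cache) */

/* Index of the cache line that a byte address maps to. */
unsigned cache_line(unsigned addr)
{
    return (addr / LINE_BYTES) % NUM_LINES;
}

/* Returns 1 if element i of two int arrays, located at the given
 * base addresses, maps to the same cache line. */
int elements_conflict(unsigned base_a, unsigned base_b, unsigned i)
{
    unsigned off = i * (unsigned)sizeof(int);
    return cache_line(base_a + off) == cache_line(base_b + off);
}
```

In this model, arrays based at 0x1000 and 0x1400 are exactly one cache size apart, so every pair of corresponding elements conflicts. Padding the first array by one line, which moves the second to 0x1410, removes the conflict.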

Performance optimization hints
- Use registers efficiently.
- Use page mode memory accesses.
- Analyze cache behavior:
  - instruction conflicts can be handled by rewriting code, rescheduling;
  - conflicting scalar data can easily be moved;
  - conflicting array data can be moved, padded.

Energy/power optimization
- Energy: ability to do work.
  - Most important in battery-powered systems.
- Power: energy per unit time.
  - Important even in wall-plug systems: power becomes heat.

Sources of energy consumption
- Relative energy per operation (Catthoor et al.):
  - memory transfer: 33
  - external I/O: 10
  - SRAM write: 9
  - SRAM read: 4.4
  - multiply: 3.6
  - add: 1

Cache behavior is important
- Energy consumption has a sweet spot as cache size changes:
  - cache too small: program thrashes, burning energy on external memory accesses;
  - cache too large: cache itself burns too much power.

Optimizing for energy, cont’d.
- Use registers efficiently.
- Identify and eliminate cache conflicts.
- Moderate loop unrolling eliminates some loop overhead instructions.
- Eliminate pipeline stalls.
- Inlining procedures may help: reduces linkage, but may increase cache thrashing.
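
Moderate loop unrolling might look like the following sketch. The unroll factor of 4 and the function names are arbitrary choices for illustration; the right factor depends on the target's code size and cache behavior.

```c
/* Rolled: one compare-and-branch per element. */
int sum_rolled(const int *x, int n)
{
    int s = 0;
    for (int i = 0; i < n; i++)
        s += x[i];
    return s;
}

/* Unrolled by 4: one compare-and-branch per four elements.
 * A cleanup loop handles the remaining n % 4 elements. */
int sum_unrolled4(const int *x, int n)
{
    int s = 0;
    int i = 0;
    for (; i + 3 < n; i += 4)
        s += x[i] + x[i + 1] + x[i + 2] + x[i + 3];
    for (; i < n; i++)
        s += x[i];
    return s;
}
```

The unrolled body executes fewer loop-overhead instructions but occupies more instruction memory, which is the trade-off behind the word "moderate".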

Efficient loops
- General rules:
  - Don’t use function calls.
  - Keep loop body small to enable local repeat (only forward branches).
  - Use unsigned integer for loop counter.
  - Use <= to test loop counter.
  - Make use of the compiler: global optimization, software pipelining.

Single-instruction repeat loop example

    STM #4000h,AR2    ; load pointer to source
    STM #100h,AR3     ; load pointer to destination
    RPT #(1024-1)
    MVDD *AR2+,*AR3+  ; move

Optimizing for program size
- Goal:
  - reduce hardware cost of memory;
  - reduce power consumption of memory units.
- Two opportunities:
  - data;
  - instructions.

Data size minimization
- Reuse constants, variables, data buffers in different parts of code.
  - Requires careful verification of correctness.
- Generate data using instructions.

Program validation and testing
- But does it work?
- Concentrate here on functional verification.
- Major testing strategies:
  - Black box doesn’t look at the source code.
  - Clear box (white box) does look at the source code.

Clear-box testing
- Examine the source code to determine whether it works:
  - Can you actually exercise a path?
  - Do you get the value you expect along a path?
- Testing procedure:
  - Controllability: provide program with inputs.
  - Execute.
  - Observability: examine outputs.

Controlling and observing programs

    firout = 0.0;
    for (j=curr, k=0; j<N; j++, k++)
        firout += buff[j] * c[k];
    for (j=0; j<curr; j++, k++)
        firout += buff[j] * c[k];
    if (firout > 100.0) firout = 100.0;
    if (firout < -100.0) firout = -100.0;

- Controllability:
  - Must fill circular buffer with desired N values.
  - Other code governs how we access the buffer.
- Observability:
  - Want to examine firout before limit testing.
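
One way to get both controllability and observability is to wrap the fragment in a function that takes the buffer state as parameters and exposes the pre-limit value. This is a sketch under assumptions: the function name `fir_limited`, the `raw` output parameter, and the buffer length `N = 4` are all introduced here for illustration.

```c
#define N 4   /* assumed circular buffer length */

/* Circular-buffer FIR with output limiting.
 * Controllability: the caller supplies buff, c and curr directly.
 * Observability: if raw is non-null, it receives firout before
 * the limit tests are applied. */
double fir_limited(const double buff[N], const double c[N],
                   int curr, double *raw)
{
    double firout = 0.0;
    int j, k;
    for (j = curr, k = 0; j < N; j++, k++)
        firout += buff[j] * c[k];
    for (j = 0; j < curr; j++, k++)
        firout += buff[j] * c[k];
    if (raw)
        *raw = firout;            /* value before limiting */
    if (firout > 100.0) firout = 100.0;
    if (firout < -100.0) firout = -100.0;
    return firout;
}
```

A test can then fill the buffer with values that force saturation and confirm that the raw value and the limited value differ as expected.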

Execution paths and testing
- Paths are important in functional testing as well as performance analysis.
- In general, an exponential number of paths through the program.
  - Show that some paths dominate others.
  - Heuristically limit paths.

Choosing the paths to test
- Possible criteria:
  - Execute every statement at least once.
  - Execute every branch direction at least once.
- Equivalent for structured programs.
- Not true for gotos.

Basis paths
- Approximate CDFG with undirected graph.
- Undirected graphs have basis paths:
  - All paths are linear combinations of basis paths.

Cyclomatic complexity
- Cyclomatic complexity is a bound on the size of basis sets:
  - e = number of edges
  - n = number of nodes
  - p = number of graph components
  - M = e - n + 2p.
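
As a worked example of the formula, consider the control flow graph of a single if-else statement: four nodes (condition, then-branch, else-branch, join), four edges, and one connected component, giving M = 4 - 4 + 2 = 2. The helper below simply evaluates the formula; its name is an illustrative choice.

```c
/* Cyclomatic complexity M = e - n + 2p for a graph with
 * e edges, n nodes and p connected components. */
int cyclomatic_complexity(int edges, int nodes, int components)
{
    return edges - nodes + 2 * components;
}
```

A result of 2 for the if-else graph matches the intuition that two paths, one per branch direction, form a basis set.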

Branch testing
- Heuristic for testing branches.
  - Exercise true and false branches of conditional.
  - Exercise every simple condition at least once.

Branch testing example
- Correct:

      if (a || (b >= c)) { printf("OK\n"); }

- Incorrect:

      if (a && (b >= c)) { printf("OK\n"); }

- Test:
  - a = F
  - (b >= c) = T
- Example:
  - Correct: [0 || (3 >= 2)] = T
  - Incorrect: [0 && (3 >= 2)] = F
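
The test vector above can be checked mechanically by evaluating both conditions and comparing the results. The function names in this sketch are hypothetical.

```c
/* The intended condition and the buggy variant from the example. */
int correct_cond(int a, int b, int c)   { return a || (b >= c); }
int incorrect_cond(int a, int b, int c) { return a && (b >= c); }

/* Returns 1 if the test vector (a, b, c) makes the two versions
 * disagree, that is, if the test exposes the bug. */
int test_exposes_bug(int a, int b, int c)
{
    return correct_cond(a, b, c) != incorrect_cond(a, b, c);
}
```

With a = 0, b = 3, c = 2 the two versions disagree, so the bug is caught. With a = 1 and the same b and c, both are true and the bug goes unnoticed, which is why each simple condition needs to be exercised separately.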

Another branch testing example
- Correct:

      if ((x == good_pointer) && (x->field1 == 3)) { printf("got the value\n"); }

- Incorrect:

      if ((x = good_pointer) && (x->field1 == 3)) { printf("got the value\n"); }

- Incorrect code changes pointer.
  - Assignment returns new LHS in C.
- Test that catches error:
  - (x != good_pointer) && (x->field1 == 3)

Def-use pairs
- Variable def-use:
  - Def when value is assigned (defined).
  - Use when used on right-hand side.
- Exercise each def-use pair.
  - Requires testing correct path.

Loop testing
- Loops need specialized tests to be tested efficiently.
- Heuristic testing strategy:
  - Skip loop entirely.
  - One loop iteration.
  - Two loop iterations.
  - # iterations much below max.
  - n-1, n, n+1 iterations where n is max.
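
The heuristic can be turned into a small generator of iteration counts to test. In the sketch below, "much below max" is taken to be one quarter of the maximum, which is an arbitrary assumption, and `loop_test_counts` is a hypothetical name.

```c
/* Fills counts[] with iteration counts to test for a loop whose
 * maximum iteration count is max_iter: 0, 1, 2, a value well below
 * the maximum (max_iter / 4 here, an arbitrary choice), then
 * max_iter - 1, max_iter and max_iter + 1.
 * Returns the number of entries written (always 7). */
int loop_test_counts(int max_iter, int counts[7])
{
    counts[0] = 0;                /* skip the loop entirely */
    counts[1] = 1;                /* one iteration */
    counts[2] = 2;                /* two iterations */
    counts[3] = max_iter / 4;     /* much below max */
    counts[4] = max_iter - 1;
    counts[5] = max_iter;
    counts[6] = max_iter + 1;
    return 7;
}
```

For small maxima some of these counts coincide, so a test driver would normally remove duplicates before running the cases.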

Black-box test vectors
- Random tests.
  - May weight distribution based on software specification.
- Regression tests.
  - Tests of previous versions, bugs, etc.
  - May be clear-box tests of previous versions.

How much testing is enough?
- Exhaustive testing is impractical.
- One important measure of test quality: bugs escaping into the field.
- Good organizations can test software to give very low field bug report rates.
- Error injection measures test quality:
  - Add known bugs.
  - Run your tests.
  - Determine % injected bugs that are caught.
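
The last step of error injection is simple arithmetic, sketched below. The function name and the convention of returning 0 when nothing was injected are assumptions made for this example.

```c
/* Percentage of injected bugs that the test suite caught.
 * Returns 0.0 when no bugs were injected (a convention chosen
 * here to avoid dividing by zero). */
double injected_bug_coverage(int injected, int caught)
{
    if (injected <= 0)
        return 0.0;
    return 100.0 * caught / injected;
}
```

For instance, catching 15 of 20 injected bugs gives 75%, which suggests that roughly a quarter of similar real bugs could be escaping the tests.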
