© 2008 Wayne Wolf. Overheads for Computers as Components, 2nd ed. Program design and analysis.


Program design and analysis

- Program-level performance analysis.
- Optimizing for:
  - Execution time.
  - Energy/power.
  - Program size.
- Program validation and testing.

Program-level performance analysis

- Need to understand performance in detail:
  - Real-time behavior, not just typical.
  - On complex platforms.
- Program performance ≠ CPU performance:
  - Pipeline, cache are windows into program.
  - We must analyze the entire program.

Complexities of program performance

- Varies with input data:
  - Different-length paths.
- Cache effects.
- Instruction-level performance variations:
  - Pipeline interlocks.
  - Fetch times.

How to measure program performance

- Simulate execution of the CPU.
  - Makes CPU state visible.
- Measure on real CPU using timer.
  - Requires modifying the program to control the timer.
- Measure on real CPU using logic analyzer.
  - Requires events visible on the pins.
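Timer-based measurement can be sketched as below. This is a minimal illustration, assuming the standard C `clock()` function stands in for the platform timer; `workload`, `sink`, and `seconds_per_call` are hypothetical names, and an embedded target would usually read a hardware timer register instead.

```c
#include <time.h>

/* Hypothetical workload standing in for the code under measurement. */
volatile long sink;

void workload(void) {
    long s = 0;
    for (int i = 0; i < 1000; i++)
        s += i;
    sink = s;   /* store the result so the loop is not optimized away */
}

/* Average CPU time per call, in seconds, over reps calls.
   The program is modified to start and stop the timer around the code. */
double seconds_per_call(void (*fn)(void), int reps) {
    clock_t start = clock();
    for (int i = 0; i < reps; i++)
        fn();
    clock_t end = clock();
    return (double)(end - start) / CLOCKS_PER_SEC / reps;
}
```

Calling `seconds_per_call(workload, 1000)` averages over many runs, which hides timer granularity but also hides worst-case behavior.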

Program performance metrics

- Average-case execution time.
  - Typically used in application programming.
- Worst-case execution time.
  - A component in deadline satisfaction.
- Best-case execution time.
  - Task-level interactions can cause best-case program behavior to result in worst-case system behavior.

Elements of program performance

- Basic program execution time formula:
  - execution time = program path + instruction timing
- Solving these problems independently helps simplify analysis.
  - Easier to separate on simpler CPUs.
- Accurate performance analysis requires:
  - Assembly/binary code.
  - Execution platform.

Data-dependent paths in an if statement

    if (a || b) {            /* T1 */
        if (c)               /* T2 */
            x = r*s+t;       /* A1 */
        else
            y = r+s;         /* A2 */
        z = r+s+u;           /* A3 */
    } else {
        if (c)               /* T3 */
            y = r-t;         /* A4 */
    }

    a b c   path
    0 0 0   T1=F, T3=F: no assignments
    0 0 1   T1=F, T3=T: A4
    0 1 0   T1=T, T2=F: A2, A3
    0 1 1   T1=T, T2=T: A1, A3
    1 0 0   T1=T, T2=F: A2, A3
    1 0 1   T1=T, T2=T: A1, A3
    1 1 0   T1=T, T2=F: A2, A3
    1 1 1   T1=T, T2=T: A1, A3
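The path table can be checked mechanically. The sketch below wraps the slide's if statement in a hypothetical function, `path_taken`, that returns a bitmask of the assignments executed; the mask constants are illustrative names, not part of the original code.

```c
/* One bit per assignment label from the slide. */
enum { A1 = 1, A2 = 2, A3 = 4, A4 = 8 };

/* Returns which assignments execute for the given a, b, c. */
unsigned path_taken(int a, int b, int c) {
    unsigned mask = 0;
    if (a || b) {            /* T1 */
        if (c)               /* T2 */
            mask |= A1;
        else
            mask |= A2;
        mask |= A3;
    } else {
        if (c)               /* T3 */
            mask |= A4;
    }
    return mask;
}
```

Only four distinct paths appear across the eight input combinations, which is why path analysis and input enumeration are different problems.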

Paths in a loop

    for (i=0, f=0; i<N; i++)
        f = f + c[i] * x[i];

[Figure: loop flowchart. Initialize i=0, f=0; test i=N; on N, execute f = f + c[i] * x[i] and i = i + 1, then repeat the test; on Y, exit.]

Instruction timing

- Not all instructions take the same amount of time.
  - Multi-cycle instructions.
  - Fetches.
- Execution times of instructions are not independent.
  - Pipeline interlocks.
  - Cache effects.
- Execution times may vary with operand value.
  - Floating-point operations.
  - Some multi-cycle integer operations.

Measurement-driven performance analysis

- Not as easy as it sounds:
  - Must actually have access to the CPU.
  - Must know data inputs that give worst/best case performance.
  - Must make state visible.
- Still an important method for performance analysis.

Feeding the program

- Need to know the desired input values.
- May need to write software scaffolding to generate the input values.
- Software scaffolding may also need to examine outputs to generate feedback-driven inputs.

Trace-driven measurement

- Trace-driven:
  - Instrument the program.
  - Save information about the path.
- Requires modifying the program.
- Trace files are large.
- Widely used for cache analysis.

Physical measurement

- In-circuit emulator allows tracing.
  - Affects execution timing.
- Logic analyzer can measure behavior at pins.
  - Address bus can be analyzed to look for events.
  - Code can be modified to make events visible.
- Particularly important for real-world input streams.

CPU simulation

- Some simulators are less accurate.
- Cycle-accurate simulator provides accurate clock-cycle timing.
  - Simulator models CPU internals.
  - Simulator writer must know how CPU works.

SimpleScalar FIR filter simulation

    int x[N] = {8, 17, … };
    int c[N] = {1, 2, … };

    int main(void) {
        int i, k, f = 0;   /* accumulator must start at zero */
        for (k=0; k<COUNT; k++)
            for (i=0; i<N; i++)
                f += c[i]*x[i];
    }

[Table: N | total sim cycles | sim cycles per filter execution]

Performance optimization motivation

- Embedded systems must often meet deadlines.
  - Faster may not be fast enough.
- Need to be able to analyze execution time.
  - Worst-case, not typical.
- Need techniques for reliably improving execution time.

Programs and performance analysis

- Best results come from analyzing optimized instructions, not high-level language code:
  - non-obvious translations of HLL statements into instructions;
  - code may move;
  - cache effects are hard to predict.

Loop optimizations

- Loops are good targets for optimization.
- Basic loop optimizations:
  - code motion;
  - induction-variable elimination;
  - strength reduction (x*2 -> x<<1).

Code motion

    for (i=0; i<N*M; i++)
        z[i] = a[i] + b[i];

[Figure: loop flowcharts before and after code motion. Before: i=0; test i<N*M; body z[i] = a[i] + b[i]; i = i+1. After: i=0; X = N*M is computed before the loop and the test becomes i<X.]
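Written out in C, the transformation looks like the sketch below. The function names `add_before` and `add_after` are hypothetical; `X` follows the name used in the slide's flowchart.

```c
/* Before code motion: N*M appears in the loop test,
   so it is nominally re-evaluated on every iteration. */
void add_before(int *z, const int *a, const int *b, int N, int M) {
    for (int i = 0; i < N * M; i++)
        z[i] = a[i] + b[i];
}

/* After code motion: the loop-invariant product is computed once. */
void add_after(int *z, const int *a, const int *b, int N, int M) {
    int X = N * M;
    for (int i = 0; i < X; i++)
        z[i] = a[i] + b[i];
}
```

Both versions produce the same results; only the amount of work per iteration changes.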

Induction variable elimination

- Induction variable: loop index.
- Consider loop:

      for (i=0; i<N; i++)
          for (j=0; j<M; j++)
              z[i][j] = b[i][j];

- Rather than recompute i*M+j for each array in each iteration, share one induction variable between the arrays and increment it at the end of the loop body.
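A minimal sketch of the idea, assuming both arrays are stored row-major in flat memory. The names `copy_indexed`, `copy_induction`, and `zb` are hypothetical.

```c
/* Index arithmetic written out: i*M + j is computed for each access. */
void copy_indexed(int *z, const int *b, int N, int M) {
    for (int i = 0; i < N; i++)
        for (int j = 0; j < M; j++)
            z[i * M + j] = b[i * M + j];
}

/* One shared induction variable replaces i*M + j for both arrays
   and is incremented at the end of the loop body. */
void copy_induction(int *z, const int *b, int N, int M) {
    int zb = 0;
    for (int i = 0; i < N; i++)
        for (int j = 0; j < M; j++) {
            z[zb] = b[zb];
            zb++;
        }
}
```

The second version removes a multiply and an add from every iteration while visiting the elements in the same order.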

Cache analysis

- Loop nest: set of loops, one inside other.
- Perfect loop nest: no conditionals in nest.
- Because loops use large quantities of data, cache conflicts are common.

Array conflicts in cache

[Figure: array elements a[0,0] and b[0,0] in main memory and their mapping into the cache.]

Array conflicts, cont’d.

- Array elements conflict because they are in the same line, even if not mapped to same location.
- Solutions:
  - move one array;
  - pad array.
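The effect of padding can be illustrated with a small model. This sketch assumes a direct-mapped cache with 32-byte lines and 128 lines; those parameters and the name `cache_line_index` are illustrative assumptions, not values from the slides.

```c
#include <stdint.h>

/* Assumed cache geometry, for illustration only. */
enum { LINE_BYTES = 32, NUM_LINES = 128 };

/* Cache line that a byte address maps to in a direct-mapped cache. */
unsigned cache_line_index(uintptr_t addr) {
    return (unsigned)((addr / LINE_BYTES) % NUM_LINES);
}
```

With this geometry, arrays starting at 0x1000 and 0x2000 both begin in line 0 and evict each other when accessed together. Padding the second array by one line (starting it at 0x2020) moves its first element to line 1.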

Performance optimization hints

- Use registers efficiently.
- Use page mode memory accesses.
- Analyze cache behavior:
  - instruction conflicts can be handled by rewriting code, rescheduling;
  - conflicting scalar data can easily be moved;
  - conflicting array data can be moved, padded.

Energy/power optimization

- Energy: ability to do work.
  - Most important in battery-powered systems.
- Power: energy per unit time.
  - Important even in wall-plug systems: power becomes heat.

Measuring energy consumption

- Execute a small loop, measure current:

      while (TRUE)
          a();

[Figure: measurement setup showing the current I drawn while the loop runs.]

Sources of energy consumption

- Relative energy per operation (Catthoor et al.):
  - memory transfer: 33
  - external I/O: 10
  - SRAM write: 9
  - SRAM read: 4.4
  - multiply: 3.6
  - add: 1

Cache behavior is important

- Energy consumption has a sweet spot as cache size changes:
  - cache too small: program thrashes, burning energy on external memory accesses;
  - cache too large: cache itself burns too much power.

Cache sweet spot

[Figure from [Li98], © 1998 IEEE.]

Optimizing for energy

- First-order optimization:
  - high performance = low energy.
- Not many instructions trade speed for energy.

Optimizing for energy, cont’d.

- Use registers efficiently.
- Identify and eliminate cache conflicts.
- Moderate loop unrolling eliminates some loop overhead instructions.
- Eliminate pipeline stalls.
- Inlining procedures may help: reduces linkage, but may increase cache thrashing.
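Moderate loop unrolling can be sketched as follows. The functions `sum_rolled` and `sum_unrolled2` are hypothetical examples; the unroll factor of 2 is an arbitrary choice for illustration.

```c
/* Baseline: one test and one increment per element. */
long sum_rolled(const int *x, int n) {
    long s = 0;
    for (int i = 0; i < n; i++)
        s += x[i];
    return s;
}

/* Unrolled by 2: roughly half as many loop tests and increments. */
long sum_unrolled2(const int *x, int n) {
    long s = 0;
    int i = 0;
    for (; i + 1 < n; i += 2) {
        s += x[i];
        s += x[i + 1];
    }
    if (i < n)          /* leftover element when n is odd */
        s += x[i];
    return s;
}
```

Unrolling further keeps reducing loop overhead but grows the code, which can work against the cache-related hints on the same slide.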

Efficient loops

- General rules:
  - Don’t use function calls.
  - Keep loop body small to enable local repeat (only forward branches).
  - Use unsigned integer for loop counter.
  - Use <= to test loop counter.
  - Make use of the compiler: global optimization, software pipelining.

Single-instruction repeat loop example

    STM #4000h,AR2      ; load pointer to source
    STM #100h,AR3       ; load pointer to destination
    RPT #(1024-1)
    MVDD *AR2+,*AR3+    ; move

Optimizing for program size

- Goal:
  - reduce hardware cost of memory;
  - reduce power consumption of memory units.
- Two opportunities:
  - data;
  - instructions.

Data size minimization

- Reuse constants, variables, data buffers in different parts of code.
  - Requires careful verification of correctness.
- Generate data using instructions.

Reducing code size

- Avoid function inlining.
- Choose CPU with compact instructions.
- Use specialized instructions where possible.

Program validation and testing

- But does it work?
- Concentrate here on functional verification.
- Major testing strategies:
  - Black box doesn’t look at the source code.
  - Clear box (white box) does look at the source code.

Clear-box testing

- Examine the source code to determine whether it works:
  - Can you actually exercise a path?
  - Do you get the value you expect along a path?
- Testing procedure:
  - Controllability: provide program with inputs.
  - Execute.
  - Observability: examine outputs.

Controlling and observing programs

    firout = 0.0;
    for (j=curr, k=0; j<N; j++, k++)
        firout += buff[j] * c[k];
    for (j=0; j<curr; j++, k++)
        firout += buff[j] * c[k];
    if (firout > 100.0) firout = 100.0;
    if (firout < ) firout = ;

- Controllability:
  - Must fill circular buffer with desired N values.
  - Other code governs how we access the buffer.
- Observability:
  - Want to examine firout before limit testing.
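One way to make this code controllable and observable is sketched below. Everything outside the two summation loops is an assumption: the function name `fir`, the tap count `FIR_TAPS`, passing the limits `lo` and `hi` as parameters (the lower limit is missing from the transcript), and the `raw` output pointer.

```c
enum { FIR_TAPS = 4 };   /* assumed buffer length, stands in for N */

/* Controllability: the caller supplies the circular buffer and curr.
   Observability: if raw is non-NULL it receives the value of firout
   before limit testing. Returns the limited output. */
double fir(const double buff[FIR_TAPS], const double c[FIR_TAPS],
           int curr, double lo, double hi, double *raw) {
    double firout = 0.0;
    int j, k;
    for (j = curr, k = 0; j < FIR_TAPS; j++, k++)
        firout += buff[j] * c[k];
    for (j = 0; j < curr; j++, k++)
        firout += buff[j] * c[k];
    if (raw)
        *raw = firout;
    if (firout > hi) firout = hi;
    if (firout < lo) firout = lo;
    return firout;
}
```

A test can then fill the buffer with values that drive the sum past the limit and check both the raw and the limited result.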

Execution paths and testing

- Paths are important in functional testing as well as performance analysis.
- In general, an exponential number of paths through the program.
  - Show that some paths dominate others.
  - Heuristically limit paths.

Choosing the paths to test

- Possible criteria:
  - Execute every statement at least once.
  - Execute every branch direction at least once.
- Equivalent for structured programs.
- Not true for gotos.

[Figure: control-flow example with one branch labeled "not covered".]

Basis paths

- Approximate CDFG with undirected graph.
- Undirected graphs have basis paths:
  - All paths are linear combinations of basis paths.

Cyclomatic complexity

- Cyclomatic complexity is a bound on the size of basis sets:
  - e = # edges
  - n = # nodes
  - p = number of graph components
  - M = e - n + 2p.
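As a worked example of the formula, the sketch below evaluates M for small graphs. The function name `cyclomatic_complexity` is hypothetical.

```c
/* M = e - n + 2p, with e edges, n nodes, p graph components. */
int cyclomatic_complexity(int e, int n, int p) {
    return e - n + 2 * p;
}
```

A single if/else with a join node has 4 nodes (test, then, else, join) and 4 edges in one component, giving M = 4 - 4 + 2 = 2, matching the two paths through it. A straight-line pair of nodes with one edge gives M = 1.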

Branch testing

- Heuristic for testing branches.
  - Exercise true and false branches of conditional.
  - Exercise every simple condition at least once.

Branch testing example

- Correct:

      if (a || (b >= c)) { printf("OK\n"); }

- Incorrect:

      if (a && (b >= c)) { printf("OK\n"); }

- Test:
  - a = F
  - (b >= c) = T
- Example:
  - Correct: [0 || (3 >= 2)] = T
  - Incorrect: [0 && (3 >= 2)] = F
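The same test can be run as code. In this sketch the two conditions are wrapped in hypothetical functions, `ok_correct` and `ok_incorrect`, so their results can be compared directly.

```c
/* Intended condition from the slide. */
int ok_correct(int a, int b, int c) {
    return a || (b >= c);
}

/* Faulty version: || mistyped as &&. */
int ok_incorrect(int a, int b, int c) {
    return a && (b >= c);
}
```

With a = 0, b = 3, c = 2 the two versions disagree, so the test exposes the bug. With a = 1 and the same b and c they agree, which is why each simple condition has to be exercised separately.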

Another branch testing example

- Correct:

      if ((x == good_pointer) && (x->field1 == 3)) { printf("got the value\n"); }

- Incorrect:

      if ((x = good_pointer) && (x->field1 == 3)) { printf("got the value\n"); }

- Incorrect code changes pointer.
  - Assignment returns the new LHS value in C.
- Test that catches error:
  - (x != good_pointer) && (x->field1 == 3)

Domain testing

- Heuristic test for linear inequalities.
- Test on each side + boundary of inequality.
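A minimal sketch of why the boundary point matters, using a hypothetical inequality `x <= limit` and an off-by-one variant. Neither function comes from the slides.

```c
/* Intended domain test. */
int in_domain_correct(int x, int limit) {
    return x <= limit;
}

/* Faulty version: <= mistyped as <. */
int in_domain_buggy(int x, int limit) {
    return x < limit;
}
```

Points on either side of the boundary (limit - 1 and limit + 1) give the same answer in both versions; only the boundary point x == limit separates them.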

Def-use pairs

- Variable def-use:
  - Def when value is assigned (defined).
  - Use when used on right-hand side.
- Exercise each def-use pair.
  - Requires testing correct path.

Loop testing

- Loops need specialized tests to be tested efficiently.
- Heuristic testing strategy:
  - Skip loop entirely.
  - One loop iteration.
  - Two loop iterations.
  - # iterations much below max.
  - n-1, n, n+1 iterations where n is max.
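The heuristic can be turned into a list of iteration counts to try. In this sketch the function name `loop_test_iterations` is hypothetical, and n/4 is an arbitrary stand-in for "much below max".

```c
enum { LOOP_TEST_COUNT = 7 };

/* Fills out[] with the iteration counts suggested by the heuristic
   for a loop whose maximum iteration count is n. */
void loop_test_iterations(int n, int out[LOOP_TEST_COUNT]) {
    out[0] = 0;        /* skip loop entirely */
    out[1] = 1;        /* one iteration */
    out[2] = 2;        /* two iterations */
    out[3] = n / 4;    /* much below max (assumed fraction) */
    out[4] = n - 1;
    out[5] = n;
    out[6] = n + 1;    /* just past the intended maximum */
}
```

For small n several of these counts coincide, so a test driver would normally drop duplicates before running.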

Black-box testing

- Complements clear-box testing.
  - May require a large number of tests.
- Tests software in different ways.

Black-box test vectors

- Random tests.
  - May weight distribution based on software specification.
- Regression tests.
  - Tests of previous versions, bugs, etc.
  - May be clear-box tests of previous versions.

How much testing is enough?

- Exhaustive testing is impractical.
- One important measure of test quality: bugs escaping into the field.
- Good organizations can test software to give very low field bug report rates.
- Error injection measures test quality:
  - Add known bugs.
  - Run your tests.
  - Determine % injected bugs that are caught.
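The last step of error injection is simple arithmetic, sketched below. The function name `injected_bug_coverage` and the choice to return 0 when no bugs were injected are assumptions.

```c
/* Percentage of injected bugs that the test suite caught. */
double injected_bug_coverage(int caught, int injected) {
    if (injected <= 0)
        return 0.0;   /* nothing injected: no meaningful percentage */
    return 100.0 * caught / injected;
}
```

For example, catching 8 of 10 injected bugs gives 80%, which suggests the suite may also miss a share of the bugs nobody planted.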