CEN 316 Computer Organization and Design Assessing and Understanding Performance Mansour AL Zuair.

CEN316 - Chapter 2: Performance
Performance:
– What is it: measures of performance
The CPU performance equation:
– Execution time as the measure
– What affects execution time
– Examples
Popular alternative metrics:
– Why they don't work
Benchmarks
Amdahl's law

Performance is Time
Time to do the task (execution time):
– execution time, response time, latency
Tasks per unit time (sec, minute, ...):
– throughput, bandwidth

Plane             Speed      DC to Paris   Passengers   Throughput (PMPH)
Boeing 747        610 mph    6.5 hours     470          286,700
BAC/Sud Concorde  1350 mph   3 hours       132          178,200

(PMPH = passenger-miles per hour = speed × passengers)
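A minimal Python sketch of the two views of the airplane table. It assumes the passenger counts 470 (Boeing 747) and 132 (Concorde), the values consistent with the throughput column, and computes throughput as speed × passengers. The Concorde wins on response time (DC to Paris), while the 747 wins on throughput.

```python
# Airplane comparison: response time vs. throughput.
# Passenger counts (470, 132) are assumed: they reproduce the throughput figures.
planes = {
    "Boeing 747": {"speed_mph": 610, "passengers": 470, "dc_to_paris_h": 6.5},
    "BAC/Sud Concorde": {"speed_mph": 1350, "passengers": 132, "dc_to_paris_h": 3.0},
}


def throughput_pmph(speed_mph, passengers):
    """Throughput in passenger-miles per hour (PMPH) = speed x passengers."""
    return speed_mph * passengers


for name, p in planes.items():
    tput = throughput_pmph(p["speed_mph"], p["passengers"])
    print(f"{name}: response time {p['dc_to_paris_h']} h, throughput {tput:,} PMPH")
```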

Throughput and Response Time
Example
– What is the effect of the following changes on throughput and response time?
» Increasing processor speed?
» Increasing the number of processors on the same system (multitasking)?
» Is there any relation between response time and throughput?
» What about queuing?

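Performance as Response Time
Performance is most often measured as the response time or execution time for some task.
"X is n times faster than Y" means
$$n = \frac{\text{Performance}_X}{\text{Performance}_Y} = \frac{\text{Execution time}_Y}{\text{Execution time}_X}, \qquad \text{Performance} = \frac{1}{\text{Execution time}}$$
Example
– Execution time of program P:
» on X is 5 sec
» on Y is 10 sec
– X is 10 / 5 = 2 times faster than Y.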
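A small Python sketch of the "n times faster" definition, assuming performance is the reciprocal of execution time. Because the two reciprocals cancel, the ratio of performances reduces to the inverted ratio of execution times. The function name `times_faster` is hypothetical.

```python
def times_faster(exec_time_x, exec_time_y):
    """Return n such that 'X is n times faster than Y'.

    Performance = 1 / execution time, so
    n = Performance_X / Performance_Y = ExecTime_Y / ExecTime_X.
    """
    if exec_time_x <= 0 or exec_time_y <= 0:
        raise ValueError("execution times must be positive")
    return exec_time_y / exec_time_x


# Program P: 5 s on X, 10 s on Y.
print(times_faster(5.0, 10.0))  # 2.0
```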

What Time to Measure?
Elapsed time (wall-clock time):
– Actual time from start to completion.
– Depends on CPU, system, I/O, etc.
– Often used in real benchmarks.
– The only suitable choice when I/O is included.
CPU time:
– Measures/analyzes CPU performance only.
– May be suitable when the machine is time-shared.
– May include both user and system components.
– User CPU time is our focus for the first part of the course.
Elapsed time = CPU time + idle time
– This usually holds, assuming time is accurately accounted for.
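A minimal Python sketch contrasting the two measurements, assuming Python's standard `time` module: `time.perf_counter()` measures elapsed (wall-clock) time, including I/O waits and idle time, while `time.process_time()` returns the user plus system CPU time of the current process. The two task functions are hypothetical stand-ins for a blocked task and a CPU-bound task.

```python
import time


def measure(task):
    """Return (elapsed wall-clock seconds, CPU seconds) for one call of task()."""
    wall_start = time.perf_counter()  # elapsed time: includes waiting and idle time
    cpu_start = time.process_time()   # CPU time: user + system time of this process
    task()
    cpu = time.process_time() - cpu_start
    wall = time.perf_counter() - wall_start
    return wall, cpu


def idle_task():
    time.sleep(0.2)  # blocked: wall-clock time advances, CPU time barely does


def busy_task():
    total = 0
    for i in range(1_000_000):  # CPU-bound: CPU time tracks elapsed time closely
        total += i * i
    return total


for name, task in [("idle", idle_task), ("busy", busy_task)]:
    wall, cpu = measure(task)
    # For a single-threaded process, elapsed - CPU approximates the idle time.
    print(f"{name}: elapsed = {wall:.3f} s, CPU = {cpu:.3f} s, idle = {wall - cpu:.3f} s")
```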

Metrics of Performance
Different performance metrics are appropriate at different levels:
– Levels (from top to bottom): Application, Programming Language, Compiler, Instruction Set Architecture (ISA), Datapath and Control, Function Units, Transistors
– Metrics (from the application level down toward the hardware): answers per month; operations per second; (millions of) instructions per second, MIPS; (millions of) floating-point operations per second, MFLOP/s; cycles per instruction (CPI); cycles per second (clock rate)

Relating Processor Metrics
CPU execution time per program:
$$\text{CPU time} = \text{CPU clock cycles per program} \times \text{Clock cycle time} = \frac{\text{CPU clock cycles per program}}{\text{Clock rate}}$$
$$\text{CPU clock cycles per program} = \text{Instructions per program} \times \text{CPI}$$
Clock cycles per instruction (CPI) is an average measurement; it depends on:
– the ISA, the implementation, and the program measured
– CPI = CPU clock cycles per program ÷ Instructions per program
– Also, instructions per clock cycle: IPC = 1 / CPI
$$\text{CPU execution time} = \text{Instructions} \times \text{CPI} \times \text{Clock cycle time}$$
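The three-factor CPU performance equation can be sketched in Python as below. The function names and the example program (10^9 instructions, average CPI of 2, 2 GHz clock) are hypothetical; the arithmetic is just the equation above, using clock cycle time = 1 / clock rate.

```python
def cpu_time_seconds(instruction_count, cpi, clock_rate_hz):
    """CPU time = Instructions x CPI x Clock cycle time.

    Clock cycle time = 1 / clock rate, so this equals
    (Instructions x CPI) / clock rate.
    """
    clock_cycles = instruction_count * cpi  # CPU clock cycles for the program
    return clock_cycles / clock_rate_hz     # cycles x (seconds per cycle)


def cpi_and_ipc(clock_cycles, instruction_count):
    """Average CPI = cycles / instructions; IPC is its reciprocal."""
    cpi = clock_cycles / instruction_count
    return cpi, 1.0 / cpi


# Hypothetical program: 10^9 instructions, average CPI 2, 2 GHz clock.
print(cpu_time_seconds(1_000_000_000, 2.0, 2e9))  # 1.0
print(cpi_and_ipc(3_000_000_000, 2_000_000_000))  # (CPI, IPC)
```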

Aspects of CPU Performance
Instead of reporting execution time in seconds, we often use cycles.
Clock "ticks" indicate when to start activities (one abstraction):
– cycle time = time between ticks = seconds per cycle
– clock rate (frequency) = cycles per second (1 Hz = 1 cycle/sec)
A 200 MHz clock has a cycle time of 1 / (200 × 10^6 Hz) = 5 × 10^-9 s = 5 ns.

Example
Two machines A and B have the same ISA. For a certain program, A has a clock cycle time of 1 ns and a CPI of 2; B has a clock cycle time of 2 ns and a CPI of 1.2. Which machine is faster, and by how much?
– CPU time(A) = I × 2 × 1 ns = 2I ns
– CPU time(B) = I × 1.2 × 2 ns = 2.4I ns
– The instruction count I is the same on both machines (same ISA, same program), so A is 2.4I / 2I = 1.2 times faster than B.

Example of Improving Performance
Given the following information:
– A program runs in 10 seconds on computer A, which has a 4 GHz clock.
– We want to build a computer B that runs the same program in 6 seconds.
– Assumptions:
» The clock rate can be increased substantially.
» Increasing the clock rate increases the number of clock cycles this program needs by 20% (B needs 1.2 times as many cycles as A).
What is the target clock rate?
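One way to work the target clock rate, sketched in Python under the slide's assumptions (reading "4 GH" as 4 GHz): find A's cycle count from time × clock rate, scale it by 1.2 for B, then divide by B's target time. The function name is hypothetical. With these numbers, A needs 40 × 10^9 cycles, B needs 48 × 10^9, and 48 × 10^9 / 6 s gives a target of 8 GHz.

```python
def target_clock_rate_hz(time_a_s, clock_rate_a_hz, time_b_s, cycle_factor):
    """Clock rate B needs to finish in time_b_s, given B uses cycle_factor x A's cycles."""
    cycles_a = time_a_s * clock_rate_a_hz  # cycles = execution time x clock rate
    cycles_b = cycle_factor * cycles_a     # faster clock costs 20% more cycles here
    return cycles_b / time_b_s             # clock rate = cycles / execution time


rate = target_clock_rate_hz(10, 4e9, 6, 1.2)
print(f"Target clock rate: {rate / 1e9:.1f} GHz")
```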

Organizational Trade-offs
The instruction set (and hence the instruction count), CPI, and clock cycle time interact in complex ways.
[Figure: the levels Application, Programming Language, Compiler, ISA, Datapath, Function Units, and Transistors, related to the factors they influence: instruction mix, CPI, and cycle time.]

Cycles Per Instruction (CPI)
CPI = average number of clock cycles per instruction
– CPI = Clock cycles / Instruction count
Affected by both:
– the cost (in cycles) of each instruction type
– the frequency of the different instructions
» called the instruction mix
Useful way to compute CPI (for n instruction types):
$$\text{CPI} = \sum_{i=1}^{n} \text{CPI}_i \times F_i, \qquad F_i = \frac{\text{IC}_i}{\text{Instruction count}}$$

Example Comparing Code Sequences
Given the following information from the hardware designer:

Instruction class   A   B   C
CPI                 1   2   3

Instruction count for each instruction class:
Code sequence       A   B   C
1                   2   1   2
2                   4   1   1

Number of instructions? 5 for sequence 1 and 6 for sequence 2.
Which one is faster?
– CPU clock cycles1 = (2 × 1) + (1 × 2) + (2 × 3) = 10 cycles
– CPU clock cycles2 = (4 × 1) + (1 × 2) + (1 × 3) = 9 cycles
– Sequence 2 is faster, even though it executes more instructions.
What is the CPI for each sequence?
– CPI1 = 10 / 5 = 2
– CPI2 = 9 / 6 = 1.5
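The code-sequence comparison can be reproduced with a short Python sketch. It assumes class CPIs of 1, 2, 3 for A, B, C and per-class counts of (2, 1, 2) for sequence 1 and (4, 1, 1) for sequence 2; the sequence 2 split is assumed from the textbook version of this example and matches the 6 instructions and 9 cycles stated above. Total cycles are the sum of CPI_i × IC_i, and CPI is cycles divided by instruction count.

```python
# CPI of each instruction class, from the hardware designer's table.
CLASS_CPI = {"A": 1, "B": 2, "C": 3}

# Instruction counts per class (sequence 2's split is an assumption, see lead-in).
SEQUENCES = {
    1: {"A": 2, "B": 1, "C": 2},
    2: {"A": 4, "B": 1, "C": 1},
}


def clock_cycles(counts):
    """Total cycles = sum over classes of CPI_i x IC_i."""
    return sum(CLASS_CPI[cls] * n for cls, n in counts.items())


def average_cpi(counts):
    """CPI = total cycles / instruction count (equivalently sum of CPI_i x F_i)."""
    return clock_cycles(counts) / sum(counts.values())


for seq, counts in SEQUENCES.items():
    print(f"sequence {seq}: {sum(counts.values())} instructions, "
          f"{clock_cycles(counts)} cycles, CPI = {average_cpi(counts)}")
```

Fewer cycles, not fewer instructions, decides which sequence is faster, since both run on the same clock.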

Example CPI Computation
RISC processor with a register-register ISA; typical mix:

Instruction type   Frequency F_i   Cycles CPI_i   CPI contribution F_i × CPI_i   % of time
ALU                50%             1              0.5                            33%
Load               20%             2              0.4                            27%
Store              10%             2              0.2                            13%
Branch             20%             2              0.4                            27%
Typical mix                                       CPI = 1.5
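A Python sketch of the table's arithmetic, assuming cycle counts of 1 for ALU instructions and 2 for loads, stores, and branches (the values that reproduce the overall CPI of 1.5 and the time percentages). Each type contributes F_i × CPI_i to the CPI, and its share of execution time is that contribution divided by the total CPI.

```python
# Typical instruction mix: type -> (frequency F_i, cycles CPI_i).
MIX = {
    "ALU": (0.50, 1),
    "Load": (0.20, 2),
    "Store": (0.10, 2),
    "Branch": (0.20, 2),
}


def cpi_breakdown(mix):
    """Return (overall CPI, per-type contributions, per-type share of time)."""
    contributions = {name: f * cpi for name, (f, cpi) in mix.items()}
    total = sum(contributions.values())
    shares = {name: c / total for name, c in contributions.items()}
    return total, contributions, shares


total_cpi, contrib, share = cpi_breakdown(MIX)
print(f"CPI = {total_cpi:.2f}")
for name in MIX:
    print(f"{name:<7} F_i x CPI_i = {contrib[name]:.1f}  time = {share[name]:.0%}")
```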

Using the CPU Performance Equation
Example: consider adding ALU instructions that can have one memory operand to the MIPS ISA, producing MIPSE:
– MIPSE = MIPS + ALU instructions with a memory operand.
– Initial instruction mix on MIPS: Load 30%, Store 15%, ALU op 40%, Branch 15%; overall CPI = 1.6.
– Assume:
» The CPI of the new MIPSE ALU-with-memory instruction is 2.
» The MIPSE clock cycle is 1.25 times the MIPS clock cycle.
» One half of the load instructions, and a corresponding number of ALU instructions, are replaced by ALU-with-memory instructions.
– Which machine is faster?

Solution
Normalize the mix to 100 instructions:
– this can make the calculation easier and enhance intuition
MIPS execution time = 160 cycles × CC_MIPS (CC_MIPS = MIPS clock cycle time)
MIPSE execution time = 145 cycles × (1.25 × CC_MIPS)
MIPSE takes (145 × 1.25) / 160 ≈ 1.13 times as long as MIPS.
MIPS is therefore about 1.13 times faster; MIPSE has about 0.88 × the performance of MIPS.
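The comparison reduces to one ratio, sketched here in Python. It takes the slide's normalized cycle counts (160 for MIPS, 145 for MIPSE) and the 1.25× longer MIPSE clock cycle as given; the function name is hypothetical. A ratio above 1 means the new design is slower.

```python
def relative_execution_time(cycles_new, cycles_old, cycle_time_ratio):
    """Execution time of the new design divided by that of the old one.

    time = cycles x clock cycle time, so the ratio is
    (cycles_new x cycle_time_ratio) / cycles_old.
    """
    return (cycles_new * cycle_time_ratio) / cycles_old


# Per 100 original MIPS instructions: MIPS needs 160 cycles (CPI 1.6).
# MIPSE runs 85 instructions (15 loads + 15 ALU ops become 15 ALU-with-memory
# ops) in 145 cycles, on a clock cycle 1.25x longer.
ratio = relative_execution_time(145, 160, 1.25)
print(f"MIPSE takes {ratio:.2f}x as long as MIPS")
print(f"MIPSE performance = {1 / ratio:.2f} x MIPS performance")
```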

Alternative Performance Metrics: MIPS
Use something other than time:
– Often a good intention to find a simple metric
» bigger is better, general measure, summarizes performance
Most common metric: MIPS (millions of instructions per second):
$$\text{MIPS} = \frac{\text{Instruction count}}{\text{Execution time} \times 10^6} = \frac{\text{Clock rate}}{\text{CPI} \times 10^6}$$
Flaws in using MIPS:
– Machines with different instruction sets?
– Programs with different instruction mixes?
» dynamic frequency of instructions
– Can vary inversely with performance!

Example of Problems
Consider an optimized and an unoptimized version of the same program. Instruction counts:

                      Memory   ALU    Branch   FP    Total
Unoptimized program   100M     100M   30M      40M   270M
Optimized program     50M      50M    30M      40M   170M

Cycles per instruction: Memory 2, ALU 1, Branch 3, FP 5. Resulting cycle counts and CPI:

                      Memory   ALU    Branch   FP     CPI
Unoptimized program   200M     100M   90M      200M   2.2
Optimized program     100M     50M    90M      200M   2.6

Example Continued
Assuming a 200 MHz clock:
– MIPS(unoptimized) = 200 / 2.2 ≈ 91
– MIPS(optimized) = 200 / 2.6 ≈ 77
– By MIPS, performance(unoptimized) > performance(optimized)
But look at execution time!
– Execution time(unoptimized) = CPI × IC / clock rate = 2.2 × 270M / 200 MHz ≈ 3 s
– Execution time(optimized) = CPI × IC / clock rate = 2.6 × 170M / 200 MHz ≈ 2.2 s
– Performance(optimized) > performance(unoptimized)
The MIPS rating ranks the two versions in the opposite order from their actual performance!
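A Python sketch that recomputes both metrics from the per-class instruction counts and cycles per instruction (Memory 2, ALU 1, Branch 3, FP 5) at 200 MHz. It uses MIPS = instruction count / (execution time × 10^6). Because it keeps unrounded CPIs, it gives about 91.5 vs. 77.3 MIPS and 2.95 s vs. 2.2 s, slightly different from the rounded slide figures but with the same inverted ranking.

```python
CLOCK_RATE_HZ = 200_000_000  # 200 MHz

CYCLES_PER_INSTR = {"memory": 2, "alu": 1, "branch": 3, "fp": 5}

PROGRAMS = {
    "unoptimized": {"memory": 100_000_000, "alu": 100_000_000,
                    "branch": 30_000_000, "fp": 40_000_000},
    "optimized": {"memory": 50_000_000, "alu": 50_000_000,
                  "branch": 30_000_000, "fp": 40_000_000},
}


def analyze(counts):
    """Return (CPI, MIPS rating, execution time in seconds) for one program."""
    ic = sum(counts.values())
    cycles = sum(n * CYCLES_PER_INSTR[cls] for cls, n in counts.items())
    cpi = cycles / ic
    exec_time = cycles / CLOCK_RATE_HZ
    mips = ic / (exec_time * 1e6)  # equivalently CLOCK_RATE_HZ / (cpi * 1e6)
    return cpi, mips, exec_time


for name, counts in PROGRAMS.items():
    cpi, mips, t = analyze(counts)
    print(f"{name}: CPI = {cpi:.2f}, MIPS = {mips:.1f}, time = {t:.2f} s")
```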

Another Alternative: MFLOPS
MFLOPS (millions of floating-point operations per second):
– a common metric in the scientific/engineering and supercomputer arenas
$$\text{MFLOPS} = \frac{\text{Floating-point operations}}{\text{Execution time} \times 10^6}$$
– Machine dependent: what counts as a floating-point op?
– Often not where time is spent (i.e., not in FP operations)
– At best, no better than execution time
– At worst, much less informative and more deceptive

What Are Benchmarks?
Benchmarks: a set of programs that form a "workload" specifically chosen to measure performance.
SPEC (System Performance Evaluation Cooperative) creates standard sets of benchmarks, starting with SPEC89. The latest is SPEC CPU2006, which consists of 12 integer benchmarks (CINT2006) and 17 floating-point benchmarks (CFP2006).
There are also benchmark collections for power workloads (SPECpower_ssj2008), mail workloads (SPECmail2008), multimedia workloads (MediaBench), …

Comparing and Summarizing Performance
How do we summarize the performance of a benchmark set with a single number?
– First, the execution times are normalized to a reference machine, giving the "SPEC ratio" (reference time ÷ measured time, so bigger is faster: the SPEC ratio is inversely proportional to execution time).
– The SPEC ratios are then "averaged" using the geometric mean (GM):
$$\text{GM} = \sqrt[n]{\prod_{i=1}^{n} \text{SPEC ratio}_i}$$
The guiding principle in reporting performance measurements is reproducibility: list everything another experimenter would need to duplicate the experiment (version of the operating system, compiler settings, input set used, specific computer configuration (clock rate, cache sizes and speed, memory size and speed, etc.)).
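A minimal Python sketch of the summary step: normalize each benchmark's time to a reference time, then take the geometric mean of the resulting ratios. The benchmark times below are hypothetical, and the geometric mean is computed by averaging logarithms, which avoids overflow when many ratios are multiplied.

```python
import math


def spec_ratios(reference_times, measured_times):
    """SPEC ratio = reference time / measured time (bigger is faster)."""
    if len(reference_times) != len(measured_times):
        raise ValueError("need one measured time per reference time")
    return [ref / t for ref, t in zip(reference_times, measured_times)]


def geometric_mean(values):
    """n-th root of the product of n positive values."""
    if not values or any(v <= 0 for v in values):
        raise ValueError("need a non-empty list of positive values")
    # Averaging logarithms avoids overflow/underflow from a long product.
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Hypothetical times (seconds) for three benchmarks.
reference = [100.0, 200.0, 300.0]
measured = [50.0, 100.0, 100.0]
ratios = spec_ratios(reference, measured)  # [2.0, 2.0, 3.0]
print(f"SPEC ratios: {ratios}, GM = {geometric_mean(ratios):.3f}")
```

Python 3.8 and later also provide `statistics.geometric_mean`, which computes the same quantity.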

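Amdahl's Law
Handy for evaluating the impact of a change that is not tied to the CPU performance equation.
Insight: the improvement gained by enhancing a feature is limited by how much the feature is used.
Suppose that enhancement E accelerates a fraction F of a program by a factor S (the remainder of the task is unaffected):
$$\text{ExTime}_{\text{new}} = \text{ExTime}_{\text{old}} \times \left[(1 - F) + \frac{F}{S}\right], \qquad \text{Speedup}_{\text{overall}} = \frac{\text{ExTime}_{\text{old}}}{\text{ExTime}_{\text{new}}} = \frac{1}{(1 - F) + \frac{F}{S}}$$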
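Amdahl's Law translates directly into Python. This sketch also includes the limit as S grows without bound, 1 / (1 − F), which is the "speedup is limited by the unimproved part" bound. The function names and the sample fraction F = 0.8 are illustrative choices.

```python
def amdahl_speedup(fraction_enhanced, speedup_enhanced):
    """Overall speedup when a fraction F of execution time is sped up by S."""
    if not 0.0 <= fraction_enhanced <= 1.0 or speedup_enhanced <= 0:
        raise ValueError("need 0 <= F <= 1 and S > 0")
    return 1.0 / ((1.0 - fraction_enhanced) + fraction_enhanced / speedup_enhanced)


def amdahl_limit(fraction_enhanced):
    """Upper bound on overall speedup as S -> infinity: 1 / (1 - F)."""
    if fraction_enhanced == 1.0:
        return float("inf")
    return 1.0 / (1.0 - fraction_enhanced)


# 80% of the time sped up 4x, and then made arbitrarily fast.
print(f"speedup with S = 4: {amdahl_speedup(0.8, 4):.2f}")
print(f"limit as S grows:   {amdahl_limit(0.8):.2f}")
```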

Example on Amdahl's Law
Assume a program runs in 100 seconds on a machine where multiply operations consume 80 seconds of this time. How much do we have to improve the speed of multiplication if we want the program to run 5 times faster?
Solution:
$$\text{Execution time after improvement} = \frac{\text{Execution time affected by improvement}}{\text{Amount of improvement}} + \text{Execution time unaffected}$$
– 20 seconds = (80 / n) + 20 seconds
– What is the value of n?
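Solving the equation for n can be sketched in Python as below; the function name is hypothetical. If the target time is no larger than the unaffected time, no finite improvement works, because even an infinitely fast multiply leaves the unaffected 20 seconds. For the 5× target (20 s) the sketch returns None. For comparison, a 4× target (25 s) needs n = 80 / 5 = 16.

```python
def required_improvement(total_time, affected_time, target_time):
    """Factor n with target = affected / n + unaffected, or None if impossible."""
    unaffected = total_time - affected_time
    if target_time <= unaffected:
        return None  # even n -> infinity cannot get below the unaffected time
    return affected_time / (target_time - unaffected)


print(required_improvement(100, 80, 100 / 5))  # 5x faster target: 20 s
print(required_improvement(100, 80, 100 / 4))  # 4x faster target: 25 s
```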

Summary: Performance
Time is the measure of computer performance!
– The performance equation has three parts (instruction count, CPI, and clock cycle time); all three together determine performance.
Good products are created when we have:
– Good benchmarks
– Good ways to summarize performance
Different performance metrics, as well as different sets of applications to benchmark, are needed for embedded and desktop computers, which are more focused on response time, versus servers, which are more focused on throughput.
Remember Amdahl's Law: speedup is limited by the unimproved part of the program.