CSE 340 Computer Architecture Summer 2016 Understanding Performance.

Slides:



Advertisements
Similar presentations
CMSC 611: Advanced Computer Architecture Performance Some material adapted from Mohamed Younis, UMBC CMSC 611 Spr 2003 course slides Some material adapted.
Advertisements

CS1104: Computer Organisation School of Computing National University of Singapore.
ENEE350 Ankur Srivastava University of Maryland, College Park Based on Slides from Mary Jane Irwin ( )
Assessing and Understanding Performance Adapted from Computer Organization and Design, Patterson & Hennessy, © 2005, and from slides kindly made available.
810:142 Lecture 2: Performance Fall 2006 Chapter 4: Performance Adapted from Mary Jane Irwin at Penn State University for Computer Organization and Design,
ECE-C355 Computer Structures Winter 2008 Chapter 04: Understanding Performance Slides are adapted from Professor Mary Jane Irwin (
CSE431 L04 Understanding Performance.1Irwin, PSU, 2005 CSE 431 Computer Architecture Fall 2005 Lecture 04: Understanding Performance Mary Jane Irwin (
Read Section 1.4, Section 1.7 (pp )
TU/e Processor Design 5Z032 1 Processor Design 5Z032 The role of Performance Henk Corporaal Eindhoven University of Technology 2009.
Evaluating Performance
Computer Organization and Architecture 18 th March, 2008.
Chapter 1 CSF 2009 Computer Performance. Defining Performance Which airplane has the best performance? Chapter 1 — Computer Abstractions and Technology.
CSCE 212 Chapter 4: Assessing and Understanding Performance Instructor: Jason D. Bakos.
Chapter 4 Assessing and Understanding Performance Bo Cheng.
CS/ECE 3330 Computer Architecture Chapter 1 Performance / Power.
Copyright © 1998 Wanda Kunkle Computer Organization 1 Chapter 2.1 Introduction.
EET 4250: Chapter 1 Performance Measurement, Instruction Count & CPI Acknowledgements: Some slides and lecture notes for this course adapted from Prof.
Chapter 4 Assessing and Understanding Performance
Fall 2001CS 4471 Chapter 2: Performance CS 447 Jason Bakos.
Lecture 3: Computer Performance
1 Chapter 4. 2 Measure, Report, and Summarize Make intelligent choices See through the marketing hype Key to understanding underlying organizational motivation.
CMSC 611: Advanced Computer Architecture Performance Some material adapted from Mohamed Younis, UMBC CMSC 611 Spr 2003 course slides Some material adapted.
Chapter 1 Section 1.4 Dr. Iyad F. Jafar Evaluating Performance.
1 Computer Performance: Metrics, Measurement, & Evaluation.
CSE 340 Computer Architecture Summer 2014 Understanding Performance
CS3350B Computer Architecture Winter 2015 Performance Metrics I Marc Moreno Maza
Lecture 2b: Performance Metrics. Performance Metrics Measurable characteristics of a computer system: Count of an event Duration of a time interval Size.
Copyright 1995 by Coherence LTD., all rights reserved (Revised: Oct 97 by Rafi Lohev, Oct 99 by Yair Wiseman, Sep 04 Oren Kapah) IBM י ב מ 7-1 Measuring.
CSE431 L04 Understanding Performance.1Irwin, PSU, 2005 CSE 431 Computer Architecture Understanding Performance Mary Jane Irwin ( )
C OMPUTER O RGANIZATION AND D ESIGN The Hardware/Software Interface 5 th Edition Chapter 1 Computer Abstractions and Technology Sections 1.5 – 1.11.
CDA 3101 Fall 2013 Introduction to Computer Organization Computer Performance 28 August 2013.
10/19/2015Erkay Savas1 Performance Computer Architecture – CS401 Erkay Savas Sabanci University.
1 CS/EE 362 Hardware Fundamentals Lecture 9 (Chapter 2: Hennessy and Patterson) Winter Quarter 1998 Chris Myers.
Performance.
Performance Lecture notes from MKP, H. H. Lee and S. Yalamanchili.
CEN 316 Computer Organization and Design Assessing and Understanding Performance Mansour AL Zuair.
Morgan Kaufmann Publishers
1  1998 Morgan Kaufmann Publishers How to measure, report, and summarize performance (suorituskyky, tehokkuus)? What factors determine the performance.
CS35101 – Computer Architecture Week 9: Understanding Performance Paul Durand ( ) [Adapted from M Irwin (
Performance – Last Lecture Bottom line performance measure is time Performance A = 1/Execution Time A Comparing Performance N = Performance A / Performance.
Computer Organization Instruction Set Architecture (ISA) Instruction Set Architecture (ISA), or simply Architecture, of a computer is the.
4. Performance 4.1 Introduction 4.2 CPU Performance and Its Factors
Lecture 5: 9/10/2002CS170 Fall CS170 Computer Organization and Architecture I Ayman Abdel-Hamid Department of Computer Science Old Dominion University.
Chapter 4. Measure, Report, and Summarize Make intelligent choices See through the marketing hype Understanding underlying organizational aspects Why.
Lec2.1 Computer Architecture Chapter 2 The Role of Performance.
EGRE 426 Computer Organization and Design Chapter 4.
CMSC 611: Advanced Computer Architecture Performance & Benchmarks Some material adapted from Mohamed Younis, UMBC CMSC 611 Spr 2003 course slides Some.
Performance Computer Organization II 1 Computer Science Dept Va Tech January 2009 © McQuain & Ribbens Defining Performance Which airplane has.
COMPUTER ARCHITECTURE & OPERATIONS I Instructor: Yaohang Li.
Lecture 3. Performance Prof. Taeweon Suh Computer Science & Engineering Korea University COSE222, COMP212, CYDF210 Computer Architecture.
BITS Pilani, Pilani Campus Today’s Agenda Role of Performance.
Measuring Performance II and Logic Design
Computer Organization
Computer Architecture & Operations I
CS161 – Design and Architecture of Computer Systems
Performance Lecture notes from MKP, H. H. Lee and S. Yalamanchili.
Assessing and Understanding Performance
Defining Performance Which airplane has the best performance?
Computer Architecture & Operations I
Morgan Kaufmann Publishers
CSCE 212 Chapter 4: Assessing and Understanding Performance
CS2100 Computer Organisation
Computer Architecture
Morgan Kaufmann Publishers Computer Performance
Performance Lecture notes from MKP, H. H. Lee and S. Yalamanchili.
Parameters that affect it How to improve it and by how much
Performance.
Chapter 2: Performance CS 447 Jason Bakos Fall 2001 CS 447.
Computer Organization and Design Chapter 4
CS2100 Computer Organisation
Presentation transcript:

CSE 340 Computer Architecture Summer 2016 Understanding Performance

Performance Metrics Purchasing perspective – given a collection of machines, which has the best performance ? least cost ? best cost/performance? Design perspective – faced with design options, which has the best performance improvement ? least cost ? best cost/performance? Both require – basis for comparison – metric for evaluation Our goal is to understand what factors in the architecture contribute to overall system performance and the relative importance (and cost) of these factors 2

Defining (Speed) Performance Normally interested in reducing – Response time (aka execution time) – the time between the start and the completion of a task Important to individual users – Thus, to maximize performance, need to minimize execution time l Throughput – the total amount of work done in a given time -Important to data center managers l Decreasing response time almost always improves throughput performance X = 1 / execution_time X If X is n times faster than Y, then performance X execution_time Y = = n performance Y execution_time X 3

Performance Factors Want to distinguish elapsed time and the time spent on our task CPU execution time (CPU time) – time the CPU spends working on a task – Does not include time waiting for I/O or running other programs CPU execution time # CPU clock cycles for a program for a program = x clock cycle time CPU execution time # CPU clock cycles for a program for a program clock rate =  Can improve performance by reducing either the length of the clock cycle or the number of clock cycles required for a program or 4

Review: Machine Clock Rate Clock rate (MHz, GHz) is inverse of clock cycle time (clock period) CC = 1 / CR one clock period 10 nsec clock cycle => 100 MHz clock rate 5 nsec clock cycle => 200 MHz clock rate 2 nsec clock cycle => 500 MHz clock rate 1 nsec clock cycle => 1 GHz clock rate 500 psec clock cycle => 2 GHz clock rate 250 psec clock cycle => 4 GHz clock rate 200 psec clock cycle => 5 GHz clock rate 5

Clock Cycles per Instruction Not all instructions take the same amount of time to execute – One way to think about execution time is that it equals the number of instructions executed multiplied by the average time per instruction  Clock cycles per instruction (CPI) – the average number of clock cycles each instruction takes to execute l A way to compare two different implementations of the same ISA # CPU clock cycles # Instructions Average clock cycles for a program for a program per instruction = x CPI for this instruction class ABC CPI123

Effective CPI Computing the overall effective CPI is done by looking at the different types of instructions and their individual cycle counts and averaging Overall effective CPI =  (CPI i x IC i ) i = 1 n l Where IC i is the count (percentage) of the number of instructions of class i executed l CPI i is the (average) number of clock cycles per instruction for that instruction class l n is the number of instruction classes  The overall effective CPI varies by instruction mix – a measure of the dynamic frequency of instructions across one or many programs 7

THE Performance Equation Our basic performance equation is then CPU time = Instruction_count x CPI x clock_cycle Instruction_count x CPI clock_rate CPU time = or  These equations separate the three key factors that affect performance l Can measure the CPU execution time by running the program l The clock rate is usually given l Can measure overall instruction count by using profilers/ simulators without knowing all of the implementation details l CPI varies by instruction type and ISA implementation for which we must know the implementation details 8

Determinates of CPU Performance CPU time = Instruction_count x CPI x clock_cycle Instruction_ count CPIclock_cycle Algorithm Programming language Compiler ISA Processor organization Technology

Determinates of CPU Performance CPU time = Instruction_count x CPI x clock_cycle Instruction_ count CPIclock_cycle Algorithm Programming language Compiler ISA Processor organization Technology X XX XX XX X X X X X

Comparing and Summarizing Performance Guiding principle in reporting performance measurements is reproducibility – list everything another experimenter would need to duplicate the experiment (version of the operating system, compiler settings, input set used, specific computer configuration (clock rate, cache sizes and speed, memory size and speed, etc.))  How do we summarize the performance for benchmark set with a single number? l The average of execution times that is directly proportional to total execution time is the arithmetic mean (AM) AM = 1/n  Time i i = 1 n l Where Time i is the execution time for the i th program of a total of n programs in the workload l A smaller mean indicates a smaller average execution time and thus improved performance 11

Other Performance Metrics Power consumption – especially in the embedded market where battery life is important (and passive cooling) – For power-limited applications, the most important metric is energy efficiency

Aspects of CPU Execution Time CPU Time = Instruction count x CPI x Clock cycle Instruction Count I ClockCycle C CPI Depends on: CPU Organization Technology (VLSI) Depends on: Program Used Compiler ISA CPU Organization Depends on: Program Used Compiler ISA (Average CPI) T = I x CPI x C I =Dynamic instruction count executed

CPU Execution Time: Example A Program is running on a specific machine (CPU) with the following parameters: – Total executed instruction count: 10,000,000 instructions Average CPI for the program: 2.5 cycles/instruction. – CPU clock rate: 200 MHz. (clock cycle = 5x10 -9 seconds) 5 nsec clock cycle => 200 MHz clock rate What is the execution time for this program: CPU time = Instruction count x CPI x Clock cycle = 10,000,000 x 2.5 x 1 / clock rate = 10,000,000 x 2.5 x 5x10 -9 =.125 seconds T = I x CPI x C

Performance Comparison: Example From the previous example: A Program is running on a specific machine with the following parameters: – Total executed instruction count, I: 10,000,000 instructions – Average CPI for the program: 2.5 cycles/instruction. – CPU clock rate: 200 MHz. Using the same program with these changes: – A new compiler used: New instruction count 9,500,000 New CPI: 3.0 – Faster CPU implementation: New clock rate = 300 MHZ What is the speedup with the changes? Speedup = (10,000,000 x 2.5 x 5x10 -9 ) / (9,500,000 x 3 x 3.33x10 -9 ) =.125 /.095 = 1.32 or 32 % faster after changes. Speedup= Old Execution Time = I old x CPI old x Clock cycle old New Execution Time I new x CPI new x Clock Cycle new Speedup= Old Execution Time = I old x CPI old x Clock cycle old New Execution Time I new x CPI new x Clock Cycle new Clock Cycle = 1/ Clock Rate

Instruction Types & CPI: An Example An instruction set has three instruction classes: Two code sequences have the following instruction counts: CPU cycles for sequence 1 = 2 x x x 3 = 10 cycles CPI for sequence 1 = clock cycles / instruction count = 10 /5 = 2 CPU cycles for sequence 2 = 4 x x x 3 = 9 cycles CPI for sequence 2 = 9 / 6 = 1.5 Instruction class CPI A 1 B 2 C 3 Instruction counts for instruction class Code Sequence A B C CPI = CPU Cycles / I For a specific CPU design

Summary: Evaluating ISAs Design-time metrics: – Can it be implemented, in how long, at what cost? – Can it be programmed? Ease of compilation? Static Metrics: – How many bytes does the program occupy in memory? Dynamic Metrics: – How many instructions are executed? How many bytes does the processor fetch to execute the program? – How many clocks are required per instruction? – How "lean" a clock is practical? Best Metric: Time to execute the program! depends on the instructions set, the processor organization, and compilation techniques. 17

Next Lecture and Reminders Next lecture – MIPS non-pipelined datapath/control path review Reading assignment – PH, Chapter 5 18