Assessing and Understanding Performance

Slides:



Advertisements
Similar presentations
Reducing Leakage Power in Peripheral Circuits of L2 Caches Houman Homayoun and Alex Veidenbaum Dept. of Computer Science, UC Irvine {hhomayou,
Advertisements

ENEE350 Ankur Srivastava University of Maryland, College Park Based on Slides from Mary Jane Irwin ( )
Assessing and Understanding Performance Adapted from Computer Organization and Design, Patterson & Hennessy, © 2005, and from slides kindly made available.
810:142 Lecture 2: Performance Fall 2006 Chapter 4: Performance Adapted from Mary Jane Irwin at Penn State University for Computer Organization and Design,
ECE-C355 Computer Structures Winter 2008 Chapter 04: Understanding Performance Slides are adapted from Professor Mary Jane Irwin (
CSE431 L04 Understanding Performance.1Irwin, PSU, 2005 CSE 431 Computer Architecture Fall 2005 Lecture 04: Understanding Performance Mary Jane Irwin (
Read Section 1.4, Section 1.7 (pp )
Evaluating Performance
CSCE 212 Chapter 4: Assessing and Understanding Performance Instructor: Jason D. Bakos.
CPU Performance Evaluation: Cycles Per Instruction (CPI)
CS61C L3 Performance (1) Staley, Fall 2005 © UCB Titleless TA Aaron Staley inst.eecs./~cs61c-tc inst.eecs.berkeley.edu/~cs61c CS61C : Machine Structures.
Chapter 4 Assessing and Understanding Performance Bo Cheng.
EECC550 - Shaaban #1 Lec # 3 Spring Computer Performance Evaluation: Cycles Per Instruction (CPI) Most computers run synchronously utilizing.
1 Recap. 2 Measuring Performance  A computer user: response time (execution time).  A computer center manager - throughput - the total amount of work.
Computer ArchitectureFall 2007 © September 17, 2007 Karem Sakallah CS-447– Computer Architecture.
1  1998 Morgan Kaufmann Publishers and UCB Performance CEG3420 Computer Design Lecture 3.
Performance Comparison of Niagara, Xeon, and Itanium2 Daekyeong Moon
Copyright © 1998 Wanda Kunkle Computer Organization 1 Chapter 2.1 Introduction.
Chapter 4 Assessing and Understanding Performance
Fall 2001CS 4471 Chapter 2: Performance CS 447 Jason Bakos.
Lecture 3: Computer Performance
1 Chapter 4. 2 Measure, Report, and Summarize Make intelligent choices See through the marketing hype Key to understanding underlying organizational motivation.
CMSC 611: Advanced Computer Architecture Performance Some material adapted from Mohamed Younis, UMBC CMSC 611 Spr 2003 course slides Some material adapted.
Chapter 1 Section 1.4 Dr. Iyad F. Jafar Evaluating Performance.
1 Computer Performance: Metrics, Measurement, & Evaluation.
Where Has This Performance Improvement Come From? Technology –More transistors per chip –Faster logic Machine Organization/Implementation –Deeper pipelines.
Benchmarks Prepared By : Arafat El-madhoun Supervised By:eng. Mohammad temraz.
CSE 340 Computer Architecture Summer 2014 Understanding Performance
CS3350B Computer Architecture Winter 2015 Performance Metrics I Marc Moreno Maza
CSE431 L04 Understanding Performance.1Irwin, PSU, 2005 CSE 431 Computer Architecture Understanding Performance Mary Jane Irwin ( )
CDA 3101 Fall 2013 Introduction to Computer Organization Benchmarks 30 August 2013.
10/19/2015Erkay Savas1 Performance Computer Architecture – CS401 Erkay Savas Sabanci University.
ACMSE’04, ALDepartment of Electrical and Computer Engineering - UAH Execution Characteristics of SPEC CPU2000 Benchmarks: Intel C++ vs. Microsoft VC++
Performance.
1 CS/COE0447 Computer Organization & Assembly Language CHAPTER 4 Assessing and Understanding Performance.
CEN 316 Computer Organization and Design Assessing and Understanding Performance Mansour AL Zuair.
Morgan Kaufmann Publishers
1  1998 Morgan Kaufmann Publishers How to measure, report, and summarize performance (suorituskyky, tehokkuus)? What factors determine the performance.
Performance Performance
CS35101 – Computer Architecture Week 9: Understanding Performance Paul Durand ( ) [Adapted from M Irwin (
TEST 1 – Tuesday March 3 Lectures 1 - 8, Ch 1,2 HW Due Feb 24 –1.4.1 p.60 –1.4.4 p.60 –1.4.6 p.60 –1.5.2 p –1.5.4 p.61 –1.5.5 p.61.
4. Performance 4.1 Introduction 4.2 CPU Performance and Its Factors
Lec2.1 Computer Architecture Chapter 2 The Role of Performance.
Computer Architecture CSE 3322 Web Site crystal.uta.edu/~jpatters/cse3322 Send to Pramod Kumar, with the names and s.
Performance COE 301 / ICS 233 Computer Organization Prof. Muhamed Mudawar College of Computer Sciences and Engineering King Fahd University of Petroleum.
BITS Pilani, Pilani Campus Today’s Agenda Role of Performance.
CSE 340 Computer Architecture Summer 2016 Understanding Performance.
Two notions of performance
Computer Organization
CSCI206 - Computer Organization & Programming
Lecture 2: Performance Evaluation
CS161 – Design and Architecture of Computer Systems
Performance Lecture notes from MKP, H. H. Lee and S. Yalamanchili.
ECE 4100/6100 Advanced Computer Architecture Lecture 1 Performance
How do we evaluate computer architectures?
CSCE 212 Chapter 4: Assessing and Understanding Performance
CS2100 Computer Organisation
Performance COE 301 Computer Organization
CSCI206 - Computer Organization & Programming
Computer Architecture
CMSC 611: Advanced Computer Architecture
Performance ICS 233 Computer Architecture and Assembly Language
CMSC 611: Advanced Computer Architecture
COMS 361 Computer Organization
Parameters that affect it How to improve it and by how much
Performance.
Benchmarks Programs specifically chosen to measure performance
Chapter 2: Performance CS 447 Jason Bakos Fall 2001 CS 447.
CS161 – Design and Architecture of Computer Systems
CS2100 Computer Organisation
Presentation transcript:

Assessing and Understanding Performance Chapter 4 Assessing and Understanding Performance

Chapter 4 Table of Contents CPU Performance and Its Factors Evaluating Performance Real Stuff: Two SPEC Benchmarks and the Performance of Recent Intel Processors

Performance Metrics Purchasing perspective Design perspective Both require basis for comparison metric for evaluation Our goal is to understand what factors in the architecture contribute to overall system performance what are the relative importance (and cost) of these factors

Defining (Speed) Performance Normally interested in reducing response time (aka execution time) – the time between the start and the completion of a task To maximize performance, need to minimize execution time performanceX = 1 / execution_timeX If X is n times faster than Y, then Throughput – the total amount of work done in a given time Decreasing response time almost always improves throughput performanceX = execution_timeY = n performanceY execution_timeX

Performance Factors Distinguish elapsed time from time spent on our task CPU execution time (CPU time) – time the CPU spends working on a task (Does not include time waiting for I/O or running other programs) or Can improve performance by reducing either the length of the clock cycle or the number of clock cycles required for a program CPU execution time for a program = # CPU clock cycles for a program x clock cycle time CPU execution time for a program = # CPU clock cycles for a program clock rate

Review: Machine Clock Rate Clock rate (MHz, GHz) is inverse of clock cycle time (clock period): CC = 1/CR one clock period 10 nsec clock cycle = 100 MHz clock rate 5 nsec clock cycle = 200 MHz clock rate 2 nsec clock cycle = 500 MHz clock rate 1 nsec clock cycle = 1 GHz clock rate 500 psec clock cycle = 2 GHz clock rate 250 psec clock cycle = 4 GHz clock rate 200 psec clock cycle = 5 GHz clock rate

Clock Cycles per Instruction Not all instructions take the same amount of time to execute Clock cycles per instruction (CPI) – the average number of clock cycles each instruction takes to execute A way to compare two different implementations of the same ISA # CPU clock cycles for a program = # Instructions for a program x Average clock cycles per instruction Instruction Class A B C CPI 1 2 3

Effective CPI  Overall effective CPI (CPIi x ICi) = n  i = 1 (CPIi x ICi) Computing overall effective CPI is done by looking at different types of instructions and their individual cycle counts and averaging ICi is the count (percentage) of the number of instructions of class i executed CPIi is the (average) number of clock cycles per instruction for that instruction class n is the number of instruction classes The overall effective CPI varies by instruction mix – a measure of the dynamic frequency of instructions across one or many programs

The Performance Equation CPU time = Instruction_count x CPI clock_cycle CPU time = Instruction_count x CPI clock_rate These equations separate the three key factors that affect performance: Measure CPU execution time by running the program The clock rate is usually given Measure overall instruction count by using profilers or simulators CPI varies by instruction type and ISA implementation for which we must know the implementation details

Determinates of CPU Performance CPU time = Instruction_count x CPI x clock_cycle Instruction_count CPI clock_cycle Algorithm X Programming language Compiler ISA Processor organization Technology

A Simple Example Op Freq CPIi Freq x CPIi a) b) c) ALU 50% 1 . Load 20% 5 Store 10% 3 Branch 2  = a) How much faster would the machine be if a better data cache reduced the average load time to 2 cycles? b) How does this compare with using branch prediction to shave a cycle off the branch time? c) What if two ALU instructions could be executed at once?

Comparing and Summarizing Performance How do we summarize the performance for benchmark set with a single number? The average of execution times that is directly proportional to total execution time is the arithmetic mean (AM) AM = 1/n n  i = 1 Timei Where Timei is the execution time for the ith program of a total of n programs in the workload A smaller mean indicates a smaller average execution time and thus improved performance

Comparing and Summarizing Performance The guiding principle in reporting performance measurements is reproducibility List everything another experimenter would need to duplicate the experiment version of the operating system compiler settings input set used specific computer configuration clock rate cache sizes and speed memory size and speed etc.

SPEC Benchmarks www.spec.org Standard Performance Evaluation Corporation (SPEC) A non-profit corporation formed to establish, maintain and endorse a standardized set of relevant benchmarks that can be applied to the newest generation of high-performance computers Benchmarks available for the following applications CPU Graphics/Applications HPC/OMP Java Client/Server Mail Servers Network File System Web Servers

SPEC Benchmarks www.spec.org Integer benchmarks FP benchmarks gzip compression wupwise Quantum chromodynamics vpr FPGA place & route swim Shallow water model gcc GNU C compiler mgrid Multigrid solver in 3D fields mcf Combinatorial optimization applu Parabolic/elliptic pde crafty Chess program mesa 3D graphics library parser Word processing program galgel Computational fluid dynamics eon Computer visualization art Image recognition (NN) perlbmk perl application equake Seismic wave propagation simulation gap Group theory interpreter facerec Facial image recognition vortex Object oriented database ammp Computational chemistry bzip2 lucas Primality testing twolf Circuit place & route fma3d Crash simulation fem sixtrack Nuclear physics accel apsi Pollutant distribution

Example SPEC Ratings

Other Performance Metrics Power consumption – especially in the embedded market where battery life is important (and passive cooling) For power-limited applications, the most important metric is energy efficiency

Summary: Evaluating ISAs Design-time Metrics Can it be implemented, in how long, at what cost? Can it be programmed? What is the “ease” of compilation? Static Metrics How many bytes does the program occupy in memory? Dynamic Metrics How many instructions are executed? How many bytes are fetched to execute the program? How many clocks are required per instruction? How “lean” a clock is practical? Best Metric: Time to execute the program! Depends on the instruction set the processor organization, and compilation techniques.