CMSC 611: Advanced Computer Architecture Performance & Benchmarks Some material adapted from Mohamed Younis, UMBC CMSC 611 Spr 2003 course slides


CMSC 611: Advanced Computer Architecture
Performance & Benchmarks
Some material adapted from Mohamed Younis, UMBC CMSC 611 Spr 2003 course slides
Some material adapted from Hennessy & Patterson / © 2003 Elsevier Science

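Calculation of CPU Time
$$\text{CPU time} = \text{CPU clock cycles} \times \text{Clock cycle time}$$
Or
$$\text{CPU time} = \text{Instruction count} \times \text{CPI} \times \text{Clock cycle time}$$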

CPU Time (Cont.)
CPU execution time can be measured by running the program
The clock cycle time is usually published by the manufacturer
Measuring the CPI and instruction count is not trivial
– Instruction counts can be measured by software profiling, by using an architecture simulator, or by using hardware counters on some architectures
– The CPI depends on many factors, including processor structure, the memory system, the mix of instruction types, and the implementation of these instructions

CPU Time (Cont.)
Designers sometimes use the following formula:
$$\text{CPU clock cycles} = \sum_{i=1}^{n} \text{CPI}_i \times C_i$$
Where:
$C_i$ is the count of instructions of class i executed
$\text{CPI}_i$ is the average number of cycles per instruction for that instruction class
n is the number of different instruction classes

Example
Suppose we have two implementations of the same instruction set architecture. Machine “A” has a clock cycle time of 1 ns and a CPI of 2.0 for some program, and machine “B” has a clock cycle time of 2 ns and a CPI of 1.2 for the same program. Which machine is faster for this program, and by how much?
Both machines execute the same instructions for the program. Assume the number of instructions is “I”:
CPU clock cycles (A) = I × 2.0
CPU clock cycles (B) = I × 1.2
The CPU time required for each machine is as follows:
CPU time (A) = CPU clock cycles (A) × Clock cycle time (A) = I × 2.0 × 1 ns = 2 × I ns
CPU time (B) = CPU clock cycles (B) × Clock cycle time (B) = I × 1.2 × 2 ns = 2.4 × I ns
Therefore machine A is faster by the following ratio:
CPU time (B) / CPU time (A) = (2.4 × I ns) / (2 × I ns) = 1.2
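
The comparison in this example can be checked with a short Python sketch. The function name `cpu_time_ns` and the instruction count of one million are assumptions made for illustration; the CPI and clock cycle values are the ones given in the example, and the ratio does not depend on the instruction count chosen.

```python
def cpu_time_ns(instruction_count, cpi, clock_cycle_ns):
    """CPU time = instruction count x CPI x clock cycle time (in ns)."""
    return instruction_count * cpi * clock_cycle_ns


# Assumed instruction count "I"; any positive value gives the same ratio
# because both machines execute the same instructions.
instruction_count = 1_000_000

time_a = cpu_time_ns(instruction_count, cpi=2.0, clock_cycle_ns=1.0)
time_b = cpu_time_ns(instruction_count, cpi=1.2, clock_cycle_ns=2.0)

# Machine A is faster by the ratio of B's time to A's time.
speedup_a_over_b = time_b / time_a

print(f"CPU time (A) = {time_a:.0f} ns")
print(f"CPU time (B) = {time_b:.0f} ns")
print(f"A is faster than B by a factor of {speedup_a_over_b:.2f}")
```

Machine B has the lower CPI, but its clock cycle is twice as long, so machine A finishes first.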

Comparing Code Segments
A compiler designer is trying to decide between two code sequences for a particular machine. The hardware designers have supplied the following facts:
Instruction class    CPI for this class
A                    1
B                    2
C                    3
For a particular high-level language statement, the compiler writer is considering two code sequences that require the following instruction counts:
Code sequence    A    B    C
1                2    1    2
2                4    1    1
Which code sequence executes the most instructions? Which will be faster? What is the CPI for each sequence?
Answer:
Sequence 1: executes 2 + 1 + 2 = 5 instructions
Sequence 2: executes 4 + 1 + 1 = 6 instructions

Comparing Code Segments
Using the formula:
$$\text{CPU clock cycles} = \sum_{i=1}^{n} \text{CPI}_i \times C_i$$
Sequence 1: CPU clock cycles = (2 × 1) + (1 × 2) + (2 × 3) = 10 cycles
Sequence 2: CPU clock cycles = (4 × 1) + (1 × 2) + (1 × 3) = 9 cycles
Therefore Sequence 2 is faster, although it executes more instructions
Using the formula:
$$\text{CPI} = \frac{\text{CPU clock cycles}}{\text{Instruction count}}$$
Sequence 1: CPI = 10/5 = 2
Sequence 2: CPI = 9/6 = 1.5
Since Sequence 2 takes fewer overall clock cycles but has more instructions, it must have a lower CPI
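
The per-class calculation above can be sketched in Python as follows. The class labels `A`, `B`, `C` and the helper name `summarize` are placeholder names chosen for illustration; the per-class CPI values and instruction counts are read off the arithmetic shown for the two sequences.

```python
# Cycles per instruction for each instruction class (labels are placeholders).
CPI_PER_CLASS = {"A": 1, "B": 2, "C": 3}

# Instruction counts per class for each candidate code sequence.
SEQUENCES = {
    "Sequence 1": {"A": 2, "B": 1, "C": 2},
    "Sequence 2": {"A": 4, "B": 1, "C": 1},
}


def summarize(counts, cpi_per_class):
    """Return (instruction count, CPU clock cycles, average CPI)."""
    instructions = sum(counts.values())
    # CPU clock cycles = sum over classes of CPI_i x C_i
    cycles = sum(cpi_per_class[cls] * n for cls, n in counts.items())
    return instructions, cycles, cycles / instructions


for name, counts in SEQUENCES.items():
    instructions, cycles, cpi = summarize(counts, CPI_PER_CLASS)
    print(f"{name}: {instructions} instructions, {cycles} cycles, CPI = {cpi}")
```

Instruction count alone is a poor predictor here: the sequence with more instructions uses cheaper ones and needs fewer cycles.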

The Role of Performance
Hardware performance is a key to the effectiveness of the entire system
Performance has to be measured and compared to evaluate designs
To optimize the performance, the major affecting factors have to be known
For different types of applications
– different performance metrics may be appropriate
– different aspects of a computer system may be most significant
Instruction use and implementation, memory hierarchy, and I/O handling are among the factors that affect performance

Calculation of CPU Time
$$\text{CPU clock cycles} = \sum_{i=1}^{n} \text{CPI}_i \times C_i$$
Where:
$C_i$ is the count of instructions of class i executed
$\text{CPI}_i$ is the average number of cycles per instruction for that instruction class
n is the number of different instruction classes

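Important Equations (so far)
$$\text{CPU time} = \text{CPU clock cycles} \times \text{Clock cycle time}$$
$$\text{CPU clock cycles} = \sum_{i=1}^{n} \text{CPI}_i \times C_i$$
$$\text{CPI} = \frac{\text{CPU clock cycles}}{\text{Instruction count}}$$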

Amdahl’s Law
A common theme in hardware design is to make the common case fast
Increasing the clock rate would not affect memory access time
Using a floating point processing unit does not speed up integer ALU operations
Example: floating point instructions are improved to run 2× faster, but only 34% of actual instructions are floating point
Exec-Time_new = Exec-Time_old × (0.66 + 0.34/2) = 0.83 × Exec-Time_old
Speedup_overall = Exec-Time_old / Exec-Time_new = 1/0.83 ≈ 1.2
The performance enhancement possible with a given improvement is limited by the amount that the improved feature is used
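
A minimal Python sketch of the floating point example follows. The function name `amdahl_speedup` and its parameter names are assumptions for illustration. The sketch treats the 34% figure as the fraction of execution time spent in floating point instructions, which is what the slide's arithmetic implies.

```python
def amdahl_speedup(fraction_enhanced, speedup_enhanced):
    """Overall speedup when only part of the execution time is improved."""
    # The unimproved part keeps its share of the time; the improved part shrinks.
    new_time_fraction = (1.0 - fraction_enhanced) + fraction_enhanced / speedup_enhanced
    return 1.0 / new_time_fraction


# Floating point work is 34% of execution time and runs 2x faster.
overall = amdahl_speedup(fraction_enhanced=0.34, speedup_enhanced=2.0)
print(f"Overall speedup: {overall:.3f}")

# Even an enormous floating point speedup cannot beat 1 / (1 - 0.34).
limit = amdahl_speedup(fraction_enhanced=0.34, speedup_enhanced=1e12)
print(f"Upper bound on speedup: {limit:.3f}")
```

The second call shows the point of the law: the 66% of time that is not improved caps the overall speedup at roughly 1.5, however fast the floating point unit becomes.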

Amdahl’s Law Visually
[Figure: execution time before and after the enhancement, labeled Time_old and Time_new]

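Amdahl’s Law for Speedup
$$\text{Speedup}_{\text{overall}} = \frac{\text{Exec-Time}_{\text{old}}}{\text{Exec-Time}_{\text{new}}} = \frac{1}{(1 - \text{Fraction}_{\text{enhanced}}) + \dfrac{\text{Fraction}_{\text{enhanced}}}{\text{Speedup}_{\text{enhanced}}}}$$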

Performance Variations
Performance is dependent on workload
Task dependent
Need a measure of workload
Best = run your program
– Often cannot
Benchmark = “typical” workload
– Standardized for comparison

Synthetic Benchmarks
Synthetic benchmarks are artificial programs that are constructed to match the characteristics of a large set of programs
Whetstone (scientific programs), Dhrystone (systems programs), LINPACK (linear algebra), …

Synthetic Benchmark Drawbacks
1. They do not reflect the user’s interest since they are not real applications
2. They do not reflect real program behavior (e.g. memory access pattern)
3. Compilers and hardware can inflate the performance of these programs far beyond what the same optimizations can achieve for real programs

Application Benchmarks
Real applications typical of the expected workload
Both the applications and their mix are important

The SPEC Benchmarks
System Performance Evaluation Cooperative
Suite of benchmarks
– Created by a set of companies to improve the measurement and reporting of CPU performance
SPEC2006 is the latest suite
SPECint = 12 integer programs
SPECfloat = 17 floating point programs
Since SPEC requires running applications on real hardware, the memory system has a significant effect on performance
– Reported with results

Performance Reports
The guiding principle is reproducibility (report the environment and the experimental setup)

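The SPEC Benchmarks
SPEC ratio = execution time on the reference machine / execution time on the measured machine
Bigger numeric values of the SPEC ratio indicate a faster machine
The reference is a 1997 “historical” reference machine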

SPEC95 for Pentium and Pentium Pro
The performance measured may be different on other Pentium-based hardware with a different memory system and different compilers
– At the same clock rate, the SPECint95 measure shows that the Pentium Pro is … times faster, while SPECfp95 shows that it is … times faster
– When the clock rate is increased by a certain factor, the processor performance increases by a lower factor

Comparing & Summarizing Performance
A wrong summary can present a confusing picture
– A is 10 times faster than B for program 1
– B is 10 times faster than A for program 2
Total execution time is a consistent summary measure
Relative execution times for the same workload
– Assuming that programs 1 and 2 are executed the same number of times on computers A and B
Execution time is the only valid and unimpeachable measure of performance
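
The following Python sketch illustrates why total execution time gives a consistent summary. The execution times in `EXECUTION_TIMES` are hypothetical values, chosen only so that A is 10 times faster on program 1 and B is 10 times faster on program 2; they are not taken from the slides.

```python
# Hypothetical execution times in seconds (assumed for illustration).
EXECUTION_TIMES = {
    "A": {"program 1": 1.0, "program 2": 1000.0},
    "B": {"program 1": 10.0, "program 2": 100.0},
}


def total_time(machine):
    """Total time to run each program once on the given machine."""
    return sum(EXECUTION_TIMES[machine].values())


total_a = total_time("A")
total_b = total_time("B")

print(f"Total time on A: {total_a} s")
print(f"Total time on B: {total_b} s")
print(f"B is {total_a / total_b:.1f} times faster than A for this workload")
```

The two per-program ratios point in opposite directions, but the totals give one unambiguous answer for a workload that runs each program the same number of times.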

Effect of Compilation
Application-specific and architecture-specific optimizations can dramatically impact performance

Performance Summary
Weighted arithmetic means summarize performance while tracking execution time
$$\text{WAM} = \sum_{i=1}^{n} w_i \times \text{Time}_i$$
Where:
n is the number of programs executed
$w_i$ is a weighting factor that indicates the frequency of executing program i, with $0 \le w_i \le 1$ and $\sum_{i=1}^{n} w_i = 1$
Never use the arithmetic mean (AM) of execution times normalized to a reference machine
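
A short Python sketch of both points follows. The function names and the execution times are assumptions for illustration (the times are hypothetical, not from the slides). The first function computes a weighted arithmetic mean of execution times; the second shows what goes wrong when the arithmetic mean is taken over times normalized to a reference machine.

```python
import math


def weighted_arithmetic_mean(times, weights):
    """Sum of w_i x Time_i, with non-negative weights that sum to 1."""
    if len(times) != len(weights):
        raise ValueError("times and weights must have the same length")
    if any(w < 0 for w in weights) or not math.isclose(sum(weights), 1.0):
        raise ValueError("weights must be non-negative and sum to 1")
    return sum(w * t for w, t in zip(weights, times))


def mean_of_normalized_times(times, reference_times):
    """Arithmetic mean of execution times divided by a reference machine's times."""
    ratios = [t / r for t, r in zip(times, reference_times)]
    return sum(ratios) / len(ratios)


# Hypothetical execution times in seconds for programs 1 and 2.
TIMES_A = [1.0, 1000.0]
TIMES_B = [10.0, 100.0]
EQUAL_WEIGHTS = [0.5, 0.5]

wam_a = weighted_arithmetic_mean(TIMES_A, EQUAL_WEIGHTS)
wam_b = weighted_arithmetic_mean(TIMES_B, EQUAL_WEIGHTS)
print(f"Weighted mean time: A = {wam_a} s, B = {wam_b} s")

# Normalizing to A makes B look slower; normalizing to B makes A look slower.
b_relative_to_a = mean_of_normalized_times(TIMES_B, TIMES_A)
a_relative_to_b = mean_of_normalized_times(TIMES_A, TIMES_B)
print(f"B normalized to A: {b_relative_to_a:.2f}")
print(f"A normalized to B: {a_relative_to_b:.2f}")
```

With these numbers each machine appears about five times slower than the other, depending only on which one is chosen as the reference. The weighted mean of the raw times ranks the machines the same way total execution time does.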

Price-Performance Metric
[Figure: two charts, SPECbase CINT2000 and SPEC CINT2000 per $1000 in price]
Different results are obtained for other benchmarks, e.g. SPEC CFP2000
With the exception of the Sunblade, price-performance metrics were consistent with performance
Prices reflect those of July 2001