CWRU EECS 322: Benchmarks. EECS 322 Computer Architecture. Instructor: Francis G. Wolff, Case Western Reserve University.

Presentation transcript:

CWRU EECS 322, slide 1
Benchmarks
EECS 322 Computer Architecture
Instructor: Francis G. Wolff
Case Western Reserve University
This presentation uses PowerPoint animation: please view it as a slide show.

SPEC 2000 FAQ

What is SPEC? A non-profit group that includes computer vendors, systems integrators, universities and consultants from around the world. SPEC CPU2000 is its CPU benchmark suite, made up of CINT2000 and CFP2000.

What do CINT2000 and CFP2000 measure? Being compute-intensive benchmarks, they measure the performance of (1) the computer's processor, (2) the memory architecture and (3) the compiler. It is important to remember the contribution of the latter two components: performance is more than just the processor.

What is not measured? The CINT2000 and CFP2000 benchmarks do not stress I/O (disk drives), networking or graphics.

Reference:

SPECint2000 (Number of processors = 1)

Company   System          Clock, CPU      SPEC
Dell      Precision Ws    GHz P4          505
Intel     VC              GHz P3          464
Intel     SE440BX         MHz P3          344
SGI       SGI X           400 MHz R12k    347
Intel     SE440BX         MHz P3          330
SGI       Origin          MHz R12k        298
Dell      Precision Ws    GHz P

L2 cache: KB (I+D), 256 KB (I+D), 8M (I+D), 256 KB (I+D), 4M (I+D)

Pitfall: using MIPS or clock speed as the performance metric.

Doom benchmark results

"The Doom benchmark is more important than SPEC" (paraphrased). John Hennessy, in his plenary talk at FCRC '99.

avg. fps: the average number of video frames per second.

Columns on the slide: avg. fps, Processor, L1 Cache, Motherboard.

Processor      Motherboard
MIPS R         SGI Indigo
PentiumIIIE    ASUS P3B-F
PentiumIIIE    Abit BH6R
MIPS R         SGI Indigo
PentiumIII     Abit BX6 2
PentiumIII     ASUS CU4VX

Reference (Doom, Quake games):

Wow! A 250 MHz MIPS beats an 800 MHz Pentium.

Benchmark wars: Internet Servers

Reseller's January 1999 article, “Linux Is The Web Server’s Choice”, said: "Linux with Apache beats NT 4.0 with IIS, hands down."

In March 1999, Microsoft commissioned Mindcraft to carry out a comparison between NT and Linux.

Benchmark Wars: Linux/Solaris

...found that NT did a lot more disk accesses than Linux, which let Linux score about 50% better than NT.

Sun Microsystems' SPARC architecture now jumps in!

PC Magazine, September 1999

Performance

To maximize performance, we want to minimize response time or execution time:

    Performance = 1 / Execution time

To compare the relative performance, n, between machines X and Y, we use:

    Performance X / Performance Y = Execution time Y / Execution time X = n
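The ratio above can be sketched in a few lines of Python. The function names and the 10 s and 15 s timings are made up for illustration; they do not come from the slides.

```python
def performance(execution_time_s: float) -> float:
    """Performance is the reciprocal of execution time."""
    return 1.0 / execution_time_s


def relative_performance(time_x_s: float, time_y_s: float) -> float:
    """Return n = Performance X / Performance Y = time Y / time X."""
    return performance(time_x_s) / performance(time_y_s)


# Hypothetical timings: machine X takes 10 s, machine Y takes 15 s.
n = relative_performance(10.0, 15.0)
print(f"X is {n:.2f} times faster than Y")
```

A value of n above 1 means X is the faster machine; below 1 means Y is.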

Measuring Performance

    Execution time = Total program clock cycles executed / Clock frequency rate (MHz)

    Clock cycle time (us) = 1 / Clock frequency rate (MHz)

    Execution time = (Total program instructions executed x CPI) / Clock frequency rate (MHz)

CPI = average number of clock cycles per instruction.
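As a minimal sketch of these formulas, the Python below computes execution time in microseconds from an instruction count, a CPI and a clock rate in MHz. The function names and the sample workload (one million instructions, CPI of 2, 500 MHz) are illustrative assumptions.

```python
def clock_cycle_time_us(clock_mhz: float) -> float:
    """Clock cycle time in microseconds for a clock rate given in MHz."""
    return 1.0 / clock_mhz


def execution_time_us(instruction_count: int, cpi: float, clock_mhz: float) -> float:
    """Execution time = (instructions x CPI) / clock rate, in microseconds."""
    total_cycles = instruction_count * cpi
    return total_cycles / clock_mhz


# Hypothetical workload: 1,000,000 instructions, CPI = 2.0, 500 MHz clock.
print(execution_time_us(1_000_000, 2.0, 500.0), "us")
```

Dividing a cycle count by a rate in MHz yields microseconds, which is why the units on the slide pair MHz with us.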

CPI Example

Given the following instruction class execution times:

    alu=6ns, loads=8ns, stores=7ns, branches=5ns, jumps=2ns

With all five classes equally frequent (20% each), the average time per instruction is:

    (6ns+8ns+7ns+5ns+2ns)/5 = 28ns/5 = 5.6 ns
    = (0.2*6ns+0.2*8ns+0.2*7ns+0.2*5ns+0.2*2ns) = 5.6 ns

Given the following instruction class frequencies:

    alu=60%, loads=20%, stores=10%, branches=5%, jumps=5%

and the same execution times, the average time per instruction is:

    (0.6*6ns+0.2*8ns+0.1*7ns+0.05*5ns+0.05*2ns) = 6.25 ns

Because the inputs are times in ns, each result is an average time per instruction (CPI x clock cycle time) rather than a cycle count.
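The weighted average in this example can be checked with a short Python sketch. The dictionary and function names are illustrative; the times and frequencies are the ones on the slide.

```python
# Execution time per instruction class, in nanoseconds (from the slide).
TIME_NS = {"alu": 6, "loads": 8, "stores": 7, "branches": 5, "jumps": 2}

# Two instruction mixes: equal frequencies, and the 60/20/10/5/5 mix.
EQUAL_MIX = {name: 0.2 for name in TIME_NS}
SKEWED_MIX = {"alu": 0.60, "loads": 0.20, "stores": 0.10,
              "branches": 0.05, "jumps": 0.05}


def average_time_ns(mix: dict) -> float:
    """Frequency-weighted average execution time per instruction."""
    return sum(mix[name] * TIME_NS[name] for name in TIME_NS)


print(f"equal mix:  {average_time_ns(EQUAL_MIX):.2f} ns")
print(f"skewed mix: {average_time_ns(SKEWED_MIX):.2f} ns")
```

The skewed mix gives a larger average because 60% of the instructions fall in the 6 ns class.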

Performance example (PH page 64)

Instruction class   CPI
ALU (A)             1
Branches (B)        2
Load/Stores (L)     3

Benchmark   A   B   L   Total
1           2   1   2   5
2           4   1   1   6

Total CPU cycles 1 = (2xA) + (1xB) + (2xL) = (2x1) + (1x2) + (2x3) = 10 cycles
Total CPU cycles 2 = (4xA) + (1xB) + (1xL) = (4x1) + (1x2) + (1x3) = 9 cycles

CPI 1 = 10 cycles/5 = 2 average cycles per instruction
CPI 2 = 9 cycles/6 = 1.5 average cycles per instruction

Benchmark 2 executed more instructions, but was faster.
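A minimal Python sketch of the same calculation follows. The names `CPI_BY_CLASS`, `total_cycles` and `average_cpi` are illustrative; the counts and per-class CPI values are those of the example.

```python
# Cycles per instruction for each class: A = ALU, B = branches, L = load/stores.
CPI_BY_CLASS = {"A": 1, "B": 2, "L": 3}

# Instruction counts per class for the two benchmarks.
BENCHMARK_1 = {"A": 2, "B": 1, "L": 2}
BENCHMARK_2 = {"A": 4, "B": 1, "L": 1}


def total_cycles(counts: dict) -> int:
    """Sum of (instruction count x class CPI) over all classes."""
    return sum(counts[c] * CPI_BY_CLASS[c] for c in counts)


def average_cpi(counts: dict) -> float:
    """Total cycles divided by total instructions executed."""
    return total_cycles(counts) / sum(counts.values())


for name, counts in (("1", BENCHMARK_1), ("2", BENCHMARK_2)):
    print(f"Benchmark {name}: {total_cycles(counts)} cycles, "
          f"CPI = {average_cpi(counts)}")
```

On the same clock, fewer total cycles means a shorter execution time, so the instruction count alone does not decide which benchmark is faster.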

MIPS Performance example (PH page 78)

Instruction class   CPI
ALU (A)             1
Branches (B)        2
Load/Stores (L)     3

Instruction counts   A          B         L         Total
Compiler 1           5x10^9     1x10^9    1x10^9    7x10^9
Compiler 2           10x10^9    1x10^9    1x10^9    12x10^9

Total CPU cycles 1 = (5xA) + (1xB) + (1xL) = 10x10^9 cycles
Total CPU cycles 2 = (10xA) + (1xB) + (1xL) = 15x10^9 cycles

CPI 1 = 10x10^9 cycles / 7x10^9 = 1.43
CPI 2 = 15x10^9 cycles / 12x10^9 = 1.25

MIPS 1 = Clock rate/CPI = 500 MHz/1.43 = 350 MIPS
MIPS 2 = Clock rate/CPI = 500 MHz/1.25 = 400 MIPS

Execution time 1 = 10x10^9 cycles / 500 MHz = 20 seconds
Execution time 2 = 15x10^9 cycles / 500 MHz = 30 seconds

Although MIPS 2 > MIPS 1, compiler 2's code takes longer to run: a higher MIPS rating does not imply a shorter execution time.
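The pitfall can be reproduced with a short Python sketch. The `summarize` helper and the constant names are illustrative. MIPS is computed here as instruction count / (execution time x 10^6), which is algebraically the same as clock rate / CPI.

```python
CLOCK_HZ = 500_000_000  # 500 MHz clock

# Cycles per instruction for each class: A = ALU, B = branches, L = load/stores.
CPI_BY_CLASS = {"A": 1, "B": 2, "L": 3}

# Instruction counts produced by each compiler.
COMPILER_1 = {"A": 5_000_000_000, "B": 1_000_000_000, "L": 1_000_000_000}
COMPILER_2 = {"A": 10_000_000_000, "B": 1_000_000_000, "L": 1_000_000_000}


def summarize(counts: dict) -> dict:
    """Return CPI, execution time in seconds, and MIPS rating."""
    instructions = sum(counts.values())
    cycles = sum(counts[c] * CPI_BY_CLASS[c] for c in counts)
    time_s = cycles / CLOCK_HZ
    return {
        "cpi": cycles / instructions,
        "time_s": time_s,
        "mips": instructions / (time_s * 1e6),
    }


for name, counts in (("1", COMPILER_1), ("2", COMPILER_2)):
    s = summarize(counts)
    print(f"Compiler {name}: CPI={s['cpi']:.2f}, "
          f"time={s['time_s']:.0f} s, MIPS={s['mips']:.0f}")
```

Compiler 2 earns the higher MIPS rating only because it executes many more cheap ALU instructions; its total cycle count, and therefore its run time, is larger.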

Amdahl’s Law (the law of diminishing returns)

    Execution Time After Improvement = Execution Time Unaffected + (Execution Time Affected / Amount of Improvement)

Example: "Suppose a program runs in 100 seconds on a machine, with multiply responsible for 80 seconds of this time. How much do we have to improve the speed of multiplication if we want the program to run 4 times faster?"

How about making it 5 times faster?

Principle: Make the common case fast.

Well, let’s speed up the multiply!

Amdahl’s Law (the law of diminishing returns)

    Execution Time After Improvement = (Execution Time Affected / Amount of Improvement) + Execution Time Unaffected

With multiply improved by a factor of n:

    Execution time needed = 80 seconds/n + (100 - 80 seconds)

Let Execution Time After Improvement be old time / speedup:

    100 seconds / 5 times faster = 20 seconds

Equating both sides:

    20 seconds = 80 seconds/n + 20 seconds
    0 = 80 seconds/n

No finite n satisfies this equation. No amount of multiplier speedup can make a 5-fold increase.
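The same algebra can be sketched in Python to cover both questions in the example. The function names are illustrative, and returning `None` to signal an unreachable target is a choice made for this sketch. Under the slide's formula, the 4 times faster target works out to an improvement factor of 16, while the 5 times faster target leaves no time budget for multiply at all.

```python
from typing import Optional


def time_after_improvement(affected_s: float, unaffected_s: float,
                           improvement: float) -> float:
    """Amdahl's Law: affected time shrinks by `improvement`, the rest is unchanged."""
    return affected_s / improvement + unaffected_s


def required_improvement(total_s: float, affected_s: float,
                         target_speedup: float) -> Optional[float]:
    """Improvement factor n needed to reach target_speedup, or None if unreachable."""
    unaffected_s = total_s - affected_s
    target_time_s = total_s / target_speedup
    budget_s = target_time_s - unaffected_s  # time left for the improved part
    if budget_s <= 0:
        return None  # the unaffected part alone already uses up the target time
    return affected_s / budget_s


print(required_improvement(100, 80, 4))  # multiply must get this many times faster
print(required_improvement(100, 80, 5))  # None: not achievable
```

The unaffected 20 seconds sets a hard ceiling: the overall speedup approaches 100/20 = 5 only as the multiply time approaches zero.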

Sources of improvement

For a given instruction set architecture, increases in CPU performance can come from three sources:

1. Increase the clock rate.
2. Improve the hardware organization to lower the CPI.
3. Compiler enhancements that lower the instruction count or generate instructions with a lower average CPI.

In addition to the above, to improve the CPU efficiency of software benchmarks, improve the software organization (data structures, …).

Performance Summary

Execution time is the only valid and unimpeachable measure of performance.

Any measure that summarizes performance should reflect execution time.

Designers must balance high performance with low cost.

You should not always believe everything you read! Read carefully! (See newspaper articles, e.g., Exercise 2.37.)