CIS429/529 Winter 07 - Performance - 1 Performance Overview Execution time is the best measure of performance: simple, intuitive, straightforward. Two.

Slides:



Advertisements
Similar presentations
11 Measuring performance Kosarev Nikolay MIPT Feb, 2010.
Advertisements

Performance What differences do we see in performance? Almost all computers operate correctly (within reason) Most computers implement useful operations.
CS2100 Computer Organisation Performance (AY2014/2015) Semester 2.
Performance Evaluation of Architectures Vittorio Zaccaria.
TU/e Processor Design 5Z032 1 Processor Design 5Z032 The role of Performance Henk Corporaal Eindhoven University of Technology 2009.
100 Performance ENGR 3410 – Computer Architecture Mark L. Chang Fall 2006.
2-1 ECE 361 ECE C61 Computer Architecture Lecture 2 – performance Prof. Alok N. Choudhary
Computer Organization and Architecture 18 th March, 2008.
CIS429.S00: Lec3 - 1 CPU Time Analysis Terminology IC = instruction count = number of instructions in the program CPI = cycles per instruction (varies.
ENGS 116 Lecture 21 Performance and Quantitative Principles Vincent H. Berk September 26 th, 2008 Reading for today: Chapter , Amdahl article.
Chapter 4 Assessing and Understanding Performance Bo Cheng.
CIS629 Fall Lecture Performance Overview Execution time is the best measure of performance: simple, intuitive, straightforward. Two important.
Computer Performance Evaluation: Cycles Per Instruction (CPI)
CIS429.S00: Lec2- 1 Performance Overview Execution time is the best measure of performance: simple, intuitive, straightforward. Two important quantitative.
Assessing and Understanding Performance B. Ramamurthy Chapter 4.
Chapter 4 Assessing and Understanding Performance
CS430 – Computer Architecture Lecture - Introduction to Performance
1 Chapter 4. 2 Measure, Report, and Summarize Make intelligent choices See through the marketing hype Key to understanding underlying organizational motivation.
1 Measuring Performance Chris Clack B261 Systems Architecture.
CPU Performance Assessment As-Bahiya Abu-Samra *Moore’s Law *Clock Speed *Instruction Execution Rate - MIPS - MFLOPS *SPEC Speed Metric *Amdahl’s.
CMSC 611: Advanced Computer Architecture Benchmarking Some material adapted from Mohamed Younis, UMBC CMSC 611 Spr 2003 course slides Some material adapted.
Lecture 2: Technology Trends and Performance Evaluation Performance definition, benchmark, summarizing performance, Amdahl’s law, and CPI.
1 Computer Performance: Metrics, Measurement, & Evaluation.
Where Has This Performance Improvement Come From? Technology –More transistors per chip –Faster logic Machine Organization/Implementation –Deeper pipelines.
Lecture 2: Computer Performance
Memory/Storage Architecture Lab Computer Architecture Performance.
Recap Technology trends Cost/performance Measuring and Reporting Performance What does it mean to say “computer X is faster than computer Y”? E.g. Machine.
1 CHAPTER 2 THE ROLE OF PERFORMANCE. 2 Performance Measure, Report, and Summarize Make intelligent choices Why is some hardware better than others for.
C OMPUTER O RGANIZATION AND D ESIGN The Hardware/Software Interface 5 th Edition Chapter 1 Computer Abstractions and Technology Sections 1.5 – 1.11.
PerformanceCS510 Computer ArchitecturesLecture Lecture 3 Benchmarks and Performance Metrics Lecture 3 Benchmarks and Performance Metrics.
CDA 3101 Fall 2013 Introduction to Computer Organization Computer Performance 28 August 2013.
10/19/2015Erkay Savas1 Performance Computer Architecture – CS401 Erkay Savas Sabanci University.
1 CS/EE 362 Hardware Fundamentals Lecture 9 (Chapter 2: Hennessy and Patterson) Winter Quarter 1998 Chris Myers.
1 Acknowledgements Class notes based upon Patterson & Hennessy: Book & Lecture Notes Patterson’s 1997 course notes (U.C. Berkeley CS 152, 1997) Tom Fountain.
Advanced Computer Architecture Fundamental of Computer Design Instruction Set Principles and Examples Pipelining:Basic and Intermediate Concepts Memory.
Performance.
Digital System Architecture 1 28 ต.ค ต.ค ต.ค ต.ค ต.ค. 58 Lecture 2a Computer Performance and Cost Pradondet Nilagupta.
1 CS465 Performance Revisited (Chapter 1) Be able to compare performance of simple system configurations and understand the performance implications of.
1 CS/COE0447 Computer Organization & Assembly Language CHAPTER 4 Assessing and Understanding Performance.
Computer Architecture
1 Seoul National University Performance. 2 Performance Example Seoul National University Sonata Boeing 727 Speed 100 km/h 1000km/h Seoul to Pusan 10 hours.
Performance Lecture notes from MKP, H. H. Lee and S. Yalamanchili.
CEN 316 Computer Organization and Design Assessing and Understanding Performance Mansour AL Zuair.
CS252/Patterson Lec 1.1 1/17/01 CMPUT429/CMPE382 Winter 2001 Topic2: Technology Trend and Cost/Performance (Adapted from David A. Patterson’s CS252 lecture.
Cost and Performance.
Performance Enhancement. Performance Enhancement Calculations: Amdahl's Law The performance enhancement possible due to a given design improvement is.
CSE2021: Computer Organization Instructor: Dr. Amir Asif Department of Computer Science York University Handout # 2: Measuring Performance Topics: 1. Performance:
Performance Performance
TEST 1 – Tuesday March 3 Lectures 1 - 8, Ch 1,2 HW Due Feb 24 –1.4.1 p.60 –1.4.4 p.60 –1.4.6 p.60 –1.5.2 p –1.5.4 p.61 –1.5.5 p.61.
4. Performance 4.1 Introduction 4.2 CPU Performance and Its Factors
Lec2.1 Computer Architecture Chapter 2 The Role of Performance.
Performance Analysis Topics Measuring performance of systems Reasoning about performance Amdahl’s law Systems I.
EGRE 426 Computer Organization and Design Chapter 4.
CMSC 611: Advanced Computer Architecture Performance & Benchmarks Some material adapted from Mohamed Younis, UMBC CMSC 611 Spr 2003 course slides Some.
Performance Computer Organization II 1 Computer Science Dept Va Tech January 2009 © McQuain & Ribbens Defining Performance Which airplane has.
Jan. 5, 2000Systems Architecture II1 Machine Organization (CS 570) Lecture 2: Performance Evaluation and Benchmarking * Jeremy R. Johnson Wed. Oct. 4,
EEL-4713 Ann Gordon-Ross.1 EEL-4713 Computer Architecture Performance.
June 20, 2001Systems Architecture II1 Systems Architecture II (CS ) Lecture 1: Performance Evaluation and Benchmarking * Jeremy R. Johnson Wed.
Lecture 2: Performance Today’s topics:
Lecture 2: Performance Evaluation
September 2 Performance Read 3.1 through 3.4 for Tuesday
ECE 4100/6100 Advanced Computer Architecture Lecture 1 Performance
Defining Performance Which airplane has the best performance?
Prof. Hsien-Hsin Sean Lee
Morgan Kaufmann Publishers
CSCE 212 Chapter 4: Assessing and Understanding Performance
August 30, 2000 Prof. John Kubiatowicz
A Question to Ponder On [from last lecture]
Performance.
Computer Organization and Design Chapter 4
Presentation transcript:

CIS429/529 Winter 07 - Performance - 1 Performance Overview Execution time is the best measure of performance: simple, intuitive, straightforward. Two important quantitative methods : –Amdahl’s Law and Speedup –CPI - cycles per instruction Benchmarks Metrics for summarizing performance data Pitfalls

CIS429/529 Winter 07 - Performance - 2 Time-based Metrics Elapsed time to run the task –Execution time, response time, latency Rate: tasks completed per day, hour, week, sec, ns –Throughput, bandwidth Plane Boeing 747 Concorde Speed 610 mph 1350 mph DC to Paris 6.5 hours 3 hours Passengers Throughput 286, ,200 In passenger-mile/hour

CIS429/529 Winter 07 - Performance - 3 Performance comparisons "X is n times faster than Y" means ExTime(Y) Performance(X) = = n ExTime(X) Performance(Y)

CIS429/529 Winter 07 - Performance - 4 Execution time metric wall clock time = response time = elapsed time CPU time = user CPU time + system CPU time We will measure CPU performance using user CPU time on an unloaded system Example: Unix time command 90.7u 12.9s 2:39 65%

CIS429/529 Winter 07 - Performance - 5 Amdahl's Law Speedup due to enhancement E: ExTime w/o E Performance w/ E Speedup(E) = = ExTime w/ E Performance w/o E Suppose that enhancement E accelerates a fraction F of the task by a factor S, and the remainder of the task is unaffected

CIS429/529 Winter 07 - Performance - 6 Amdahl’s Law ExTime new = ExTime old x (1 - Fraction enhanced ) + Fraction enhanced Speedup overall = ExTime old ExTime new Speedup enhanced = 1 (1 - Fraction enhanced ) + Fraction enhanced Speedup enhanced

CIS429/529 Winter 07 - Performance - 7 Amdahl’s Law Floating point instructions improved to run 2X; but only 10% of actual instructions are FP Speedup overall = =1.053 ExTime new = ExTime old x ( /2) = 0.95 x ExTime old Corollary: Make the common case fast!

CIS429/529 Winter 07 - Performance - 8 Amdahl’s Law Corollary: Make the common case fast! All instructions require instruction fetch, only a fraction require data fetch/store. ==> optimize instruction access over data access Access to small memories is faster. ==> organize the storage hierarchy such that most frequent accesses are to the smallest/closest memory unit. Programs exhibit locality (spatial and temporal). ==> implement pre-fetching of nearby code/data

CIS429/529 Winter 07 - Performance - 9 CPU Time Analysis Terminology IC = instruction count = number of instructions in the program CPI = cycles per instruction (varies for different instructions) clock cycle = length of time between clock ticks Note: clock cycle = 1 / clock frequency where frequency is measured in MHz If we assume the CPI is constant for all instructions, we have: CPU time = IC x CPI x clock cycle

CIS429/529 Winter 07 - Performance - 10 More realistic CPU time analysis A given machine has several classes of instructions. Each class of instructions has its own cycle time. This equation includes separate IC and CPI for each instruction class: CPU time = x clock cycle Alternatively, if we know the frequency of occurrence of each instruction type: CPU time = IC x x clock cycle where = avg. CPI

CIS429/529 Winter 07 - Performance - 11 Example: Calculating CPI Typical Mix Base Machine (Reg / Reg) OpFreqCPI i CPI i *F i (% Time) ALU50%1.5(33%) Load20%2.4(27%) Store10%2.2(13%) Branch20%2.4(27%) 1.5 average CPI

CIS429/529 Winter 07 - Performance - 12 Speedup computation using the CPI eqn Speedup = IC(old) x CPI (old) x clock cycle (old) IC(new) x CPI (new) x clock cycle (new) When doing problems, identify which of the three components have changed (old --> new). Need only include those components in the speedup equation since unchanged ones will cancel out.

CIS429/529 Winter 07 - Performance - 13 Programs to evaluate performance Real benchmark programs Kernels Toy benchmarks Synthetic benchmarks

CIS429/529 Winter 07 - Performance - 14 SPEC Benchmark Suite (Standard Performance Evaluation Cooperative ) First Round 1989 –10 programs yielding a single number (“SPECmarks”) Second Round 1992 –SPECInt92 (6 integer programs) and SPECfp92 (14 floating point programs). –Compiler Flags unlimited –Normalized to VAX-11/780

CIS429/529 Winter 07 - Performance - 15 SPEC Benchmark Suite - continued Third Round 1995 “benchmarks useful for 3 years” –SPECint95 (8 integer programs) and SPECfp95 (10 floating point) –Single flag setting for all programs: SPECint_base95, SPECfp_base95 Fourth Round 2000 –CINT2000 (12 integer programs) and CFP2000 (14 floating point) –usable across Unix and Windows NT –Normalized to Sun Ultra5_10 workstation with 300-MHz SPARC and 256 MB memory –base and optimized versions Fifth Round 2006 HW problem !!

CIS429/529 Winter 07 - Performance - 16 Additional desktop (PC) benchmarks Winbench scripts that test CPU, video, and disk performance Business Winstone netscape, office suite applications CC Winstone content creation applications such as Photoshop

CIS429/529 Winter 07 - Performance - 17 Specialized Benchmarks Graphics benchmarks SPECviewperf, SPECapc Embedded systems benchmarks EEMBC - automotive, consumer, networking, office automation, telecommunications Server benchmarks SPEC benchmarks for CPU, files system, web server, transaction servers (

CIS429/529 Winter 07 - Performance - 18 How to Summarize Performance Two metrics for summarizing execution time Arithmetic mean (weighted arithmetic mean) tracks execution time:  (T i )/n or  (W i *T i ) Harmonic mean (weighted harmonic mean) of rates (e.g., MFLOPS) tracks execution time: n/  (1/R i ) or n/  (W i /R i )

CIS429/529 Winter 07 - Performance - 19 How to Summarize Performance Normalized execution time for scaling performance (e.g., X times faster than SPARCstation 10) –Arithmetic mean impacted by choice of reference machine Geometric mean for comparison of normalized execution times  (T i )^1/n –Independent of chosen machine –but not good metric for total execution time

CIS429/529 Winter 07 - Performance - 20 Fallacies and Pitfalls Fallacy: MIPS is an accurate measure of comparative performance. –MIPS = = Fallacy: MFLOPS is a consistent and useful measure of performance. –MFLOPS =

CIS429/529 Winter 07 - Performance - 21 More Fallacies and Pitfalls Fallacy: Synthetic benchmarks predict performance for real programs. Fallacy: Benchmarks remain valid indefinitely. Fallacy: Peak performance tracks observed performance. Fallacy: Your performance in CIS 429/529 depends on how much you eat in class.