Performance Analysis Topics Measuring performance of systems Reasoning about performance Amdahl’s law Systems I.

Presentation transcript:

Performance Analysis Topics Measuring performance of systems Reasoning about performance Amdahl’s law Systems I

2 Evaluation Tools
  Benchmarks, traces, & mixes
    macrobenchmarks & suites: application execution time
    microbenchmarks: measure one aspect of performance
    traces: replay recorded accesses
      » cache, branch, register
  Simulation at many levels
    ISA, cycle accurate, RTL, gate, circuit
    trade fidelity for simulation rate
  Area and delay estimation
  Analysis
    instructions, throughput, Amdahl’s law
    e.g., queuing theory
  Example instruction mix: MOVE 39%, BR 20%, LOAD 20%, STORE 10%, ALU 11%
  Example trace: LD 5EA3, ST 31FF, …, LD 1EA2, …

3 Metrics of Evaluation
  Level of design → performance metric
  Examples
    Applications perspective
      Time to run task (Response Time)
      Tasks run per second (Throughput)
    Systems perspective
      Millions of instructions per second (MIPS)
      Millions of FP operations per second (MFLOPS)
    Bus/network bandwidth: megabytes per second
    Function Units: cycles per instruction (CPI)
    Fundamental elements (transistors, wires, pins): clock rate

4 Basis of Evaluation
  Actual Target Workload
    Pros: representative
    Cons: very specific; non-portable; difficult to run, or measure; hard to identify cause
  Full Application Benchmarks
    Pros: portable; widely used; improvements useful in reality
    Cons: less representative
  Small “Kernel” Benchmarks
    Pros: easy to run, early in design cycle
    Cons: easy to “fool”
  Microbenchmarks
    Pros: identify peak capability and potential bottlenecks
    Cons: “peak” may be a long way from application performance
  Slide courtesy of D. Patterson

5 Some Warnings about Benchmarks
  Benchmarks measure the whole system
    application, compiler, operating system, architecture, implementation
  Popular benchmarks typically reflect yesterday’s programs
    what about the programs people are running today?
    need to design for tomorrow’s problems
  Benchmark timings are sensitive
    alignment in cache
    location of data on disk
    values of data
  Danger of inbreeding or positive feedback
    if you make an operation fast (slow) it will be used more (less) often
    therefore you make it faster (slower)
      » and so on, and so on…
    the optimized NOP

6 Know what you are measuring!
  Compare apples to apples
  Example
    Wall clock execution time:
      User CPU time
      System CPU time
      I/O wait time
      Idle time (multitasking, I/O)
    csh> time latex lecture2.tex
    csh> 0.68u 0.05s 0: %
    Fields, left to right: user, system, elapsed, % CPU time
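The split between wall-clock, user CPU, and system CPU time that `time` reports can also be observed from inside a program. The following is a minimal Python sketch, not part of the original slides; `busy_work` and `measure` are hypothetical names chosen for illustration. It assumes the standard-library calls `time.perf_counter()` for wall-clock time and `os.times()` for the process's user and system CPU time.

```python
import os
import time


def busy_work(n):
    """CPU-bound loop: time spent here shows up mostly as user CPU time."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def measure(func, *args):
    """Return (wall, user, system) seconds consumed while running func(*args)."""
    cpu_before = os.times()
    wall_before = time.perf_counter()
    func(*args)
    wall_after = time.perf_counter()
    cpu_after = os.times()
    return (
        wall_after - wall_before,
        cpu_after.user - cpu_before.user,
        cpu_after.system - cpu_before.system,
    )


if __name__ == "__main__":
    # Compute-bound: wall time is close to user + system time.
    wall, user, system = measure(busy_work, 2_000_000)
    print(f"compute: wall={wall:.3f}s user={user:.3f}s sys={system:.3f}s")

    # Sleeping stands in for I/O wait or idle time: wall time grows,
    # CPU time barely moves.
    wall, user, system = measure(time.sleep, 0.2)
    print(f"sleep:   wall={wall:.3f}s user={user:.3f}s sys={system:.3f}s")
```

Comparing the two printed lines shows why a single "time" figure is ambiguous: the sleeping case has a large elapsed time with almost no CPU time charged to the process.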

7 Two notions of “performance”
  Time to do the task (Execution Time)
    execution time, response time, latency
  Tasks per day, hour, week, sec, ns... (Performance)
    throughput, bandwidth
  Response time and throughput often are in opposition

  Plane         Speed       DC to Paris
  Boeing 747    610 mph     6.5 hours
  Concorde      1350 mph    3 hours
  Passengers Throughput (pmph) 286, ,200

  Which has higher performance?
  Slide courtesy of D. Patterson

8 Brief History of Benchmarking
  Early days (1960s)
    Single instruction execution time
    Average instruction time [Gibson 1970]
    Pure MIPS (1/AIT)
  Simple programs (early 70s)
    Synthetic benchmarks (Whetstone, etc.)
    Kernels (Livermore Loops)
  Relative Performance (late 70s)
    VAX 11/780 = 1-MIPS, but was it?
    MFLOPs
  “Real” Applications (late 80s-now)
    SPEC: Desktop, Scientific, Java, Media, Parallel, etc.
    TPC: Transaction Processing
    Graphics: 3D-Mark, real games (Assassin’s Creed, Call of Duty, Flight Simulator, etc.)

9 SPEC: Standard Performance Evaluation Corporation
  System Performance and Evaluation Cooperative
    HP, DEC, Mips, Sun
    Portable O/S and high level languages
  Spec89 → Spec92 → Spec95 → Spec2000 → SPEC
  Categories
    CPU (most popular)
    JVM, JBB
    SpecWeb - web server performance
    SFS - file server performance
  Benchmarks change with the times and technology
    Elimination of Matrix 300
    Compiler restrictions

10 How to Compromise a Benchmark

11 The compiler reorganized the code!
  Change the memory system performance
  Matrix multiply cache blocking
  You will see this later in “performance programming”
  [Figure: code before and after blocking]
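The slide's before/after code did not survive in this transcript. As a stand-in, here is a minimal Python sketch of what cache blocking (loop tiling) does to a matrix multiply. The function names and the block size `bs` are assumptions made for illustration, and pure Python will not show the cache benefit itself; the point is only the loop restructuring, which leaves the result unchanged.

```python
def matmul_naive(a, b, n):
    """Before: straightforward triple loop over n x n matrices."""
    c = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            s = 0.0
            for k in range(n):
                s += a[i][k] * b[k][j]
            c[i][j] = s
    return c


def matmul_blocked(a, b, n, bs=2):
    """After: same arithmetic, but iterated over bs x bs tiles so that each
    tile of a, b, and c is reused while it is still resident in the cache."""
    c = [[0.0] * n for _ in range(n)]
    for ii in range(0, n, bs):
        for jj in range(0, n, bs):
            for kk in range(0, n, bs):
                # Multiply one tile of a by one tile of b, accumulating
                # into the matching tile of c.
                for i in range(ii, min(ii + bs, n)):
                    for j in range(jj, min(jj + bs, n)):
                        s = c[i][j]
                        for k in range(kk, min(kk + bs, n)):
                            s += a[i][k] * b[k][j]
                        c[i][j] = s
    return c


if __name__ == "__main__":
    n = 5
    a = [[float(i + j) for j in range(n)] for i in range(n)]
    b = [[float(i * j + 1) for j in range(n)] for i in range(n)]
    print(matmul_naive(a, b, n) == matmul_blocked(a, b, n, bs=2))
```

Because the benchmark's source code is identical in both cases, a compiler that performs this transformation changes the measured memory-system behavior without any change to the program being "measured".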

12 Spec2006 Suite
  12 Integer benchmarks (C/C++)
    compression, C compiler, Perl interpreter, Database, Chess, Bioinformatics
  17 FP applications (Fortran/C)
    Shallow water model, 3D graphics, Quantum chromodynamics, Computer vision, Speech recognition
  Characteristics
    Computationally intensive
    Little I/O
    Relatively small code size
    Variable data set sizes

13 Improving Performance: Fundamentals
  Suppose we have a machine with two instructions
    Instruction A executes in 100 cycles
    Instruction B executes in 2 cycles
  We want better performance...
    Which instruction do we improve?
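The question cannot be answered from the cycle counts alone; it also depends on how often each instruction runs. A small Python sketch makes this concrete. The instruction counts below are hypothetical (the slide gives none), and `total_cycles` is an illustrative helper, not something from the lecture.

```python
def total_cycles(count_a, count_b, cycles_a=100, cycles_b=2):
    """Total cycles for a program that executes count_a A's and count_b B's."""
    return count_a * cycles_a + count_b * cycles_b


if __name__ == "__main__":
    # Hypothetical workload: A is slow but rare, B is fast but common.
    count_a, count_b = 1_000, 1_000_000

    baseline = total_cycles(count_a, count_b)
    faster_a = total_cycles(count_a, count_b, cycles_a=50)  # halve A's latency
    faster_b = total_cycles(count_a, count_b, cycles_b=1)   # halve B's latency

    print(f"baseline:      {baseline} cycles")
    print(f"A twice as fast: speedup {baseline / faster_a:.3f}")
    print(f"B twice as fast: speedup {baseline / faster_b:.3f}")
```

With this mix, halving the 2-cycle instruction helps far more than halving the 100-cycle one; with a different mix the conclusion could flip.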

14 CPU Performance Equation
  3 components to execution time:
  Factors affecting CPU execution time:
  Consider all three elements when optimizing
  Workloads change!
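The equation itself is missing from this transcript. The standard form of the CPU performance equation, which is presumably what the slide showed, breaks execution time into three factors:

```latex
\[
\text{CPU time}
  = \frac{\text{Seconds}}{\text{Program}}
  = \frac{\text{Instructions}}{\text{Program}}
    \times \frac{\text{Cycles}}{\text{Instruction}}
    \times \frac{\text{Seconds}}{\text{Cycle}}
\]
```

The three factors are instruction count, CPI, and clock cycle time. Improving one of them only helps if the other two do not get worse by as much, which is why all three have to be considered together.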

15 Cycles Per Instruction (CPI)
  Depends on the instruction
  Average cycles per instruction
  Example:
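The slide's own example is not in the transcript. As a hedged substitute, the sketch below computes an average CPI as a frequency-weighted sum, using the instruction mix quoted on the Evaluation Tools slide (MOVE 39%, BR 20%, LOAD 20%, STORE 10%, ALU 11%). The per-class cycle counts, the instruction count, and the clock rate are hypothetical values picked for illustration.

```python
# Instruction mix from the Evaluation Tools slide (fractions of all instructions).
MIX = {"MOVE": 0.39, "BR": 0.20, "LOAD": 0.20, "STORE": 0.10, "ALU": 0.11}

# Hypothetical cycles per instruction class (not from the slides).
CYCLES = {"MOVE": 1, "BR": 2, "LOAD": 3, "STORE": 2, "ALU": 1}


def average_cpi(mix, cycles):
    """Average CPI = sum over classes of (frequency * cycles for that class)."""
    return sum(freq * cycles[name] for name, freq in mix.items())


def cpu_time_seconds(instruction_count, cpi, clock_hz):
    """CPU time = instruction count * CPI / clock rate."""
    return instruction_count * cpi / clock_hz


if __name__ == "__main__":
    cpi = average_cpi(MIX, CYCLES)
    print(f"average CPI = {cpi:.2f}")
    # Hypothetical program: one billion instructions on a 1 GHz clock.
    print(f"CPU time    = {cpu_time_seconds(1e9, cpi, 1e9):.2f} s")
```

Changing the mix changes the average CPI even when the hardware is untouched, which is one reason the same machine scores differently on different workloads.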

16 Amdahl’s Law
  How much performance could you get if you could speed up some part of your program?
  Performance improvements depend on:
    how good is enhancement
    how often is it used
  Speedup due to enhancement E (fraction p sped up by factor S):
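The formula did not survive extraction. With the slide's own symbols, p for the fraction of execution time that is enhanced and S for the factor by which that fraction is sped up, the usual statement of Amdahl's law is:

```latex
\[
T_{\text{new}} = T_{\text{old}} \left[ (1 - p) + \frac{p}{S} \right],
\qquad
\text{Speedup}(E) = \frac{T_{\text{old}}}{T_{\text{new}}}
                  = \frac{1}{(1 - p) + \dfrac{p}{S}}
\]
```

As S grows without bound the speedup approaches 1/(1 - p), so the unenhanced fraction sets a hard ceiling on the overall gain.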

17 Amdahl’s Law: Example
  FP instructions improved by 2x
  But... only 10% of instructions are FP
  Speedup bounded by
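The numbers for this example can be worked out directly. The sketch below assumes that FP instructions account for 10% of execution time (the slide says 10% of instructions, which is the same thing only if all instructions take equally long); `amdahl_speedup` is an illustrative helper name.

```python
def amdahl_speedup(p, s):
    """Overall speedup when a fraction p of execution time is sped up by factor s."""
    return 1.0 / ((1.0 - p) + p / s)


if __name__ == "__main__":
    p = 0.10  # assumed fraction of execution time spent in FP instructions
    s = 2.0   # FP instructions run twice as fast

    print(f"speedup with 2x faster FP:       {amdahl_speedup(p, s):.4f}")
    # Upper bound: even infinitely fast FP leaves the other 90% untouched.
    print(f"bound with infinitely fast FP:   {1.0 / (1.0 - p):.4f}")
```

A 2x improvement to FP yields only about a 5% overall gain, and no FP improvement of any size can push the overall speedup past roughly 1.11.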

18 Amdahl’s Law: Example 2
  Parallelize (vectorize) some portion of your program
  Make it 100x faster?
  How much faster does the whole program get?
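The answer depends on how large the parallelized portion is, which the slide leaves open. The Python sketch below sweeps a few hypothetical fractions with the 100x factor from the slide, and also inverts the formula to ask what fraction would be needed for a given overall speedup. Both helper names are illustrative.

```python
def amdahl_speedup(p, s):
    """Overall speedup when a fraction p of execution time is sped up by factor s."""
    return 1.0 / ((1.0 - p) + p / s)


def required_fraction(target, s):
    """Fraction p that must be enhanced by factor s to reach an overall
    speedup of `target` (only meaningful for 1 <= target < s)."""
    return (1.0 - 1.0 / target) / (1.0 - 1.0 / s)


if __name__ == "__main__":
    s = 100.0
    # Hypothetical parallelizable fractions of execution time.
    for p in (0.50, 0.90, 0.99):
        print(f"p = {p:.2f}: overall speedup = {amdahl_speedup(p, s):.2f}")

    p_needed = required_fraction(10.0, s)
    print(f"fraction needed for 10x overall: {p_needed:.4f}")
```

Even with a 100x faster kernel, half the program being serial caps the overall gain at just under 2x, and a 10x overall gain needs roughly 91% of execution time to be in the accelerated portion.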

19 Amdahl’s Law: Summary message
  Make the Common Case fast
  Examples:
    All instructions require instruction fetch, only a fraction require data
      → optimize instruction access first
    Data locality (spatial, temporal), small memories faster
      → storage hierarchy: most frequent accesses to small, local memory

20 Is Speed the Last Word in Performance?
  Depends on the application!
  Cost
    Not just processor, but other components (e.g., memory)
  Power consumption
    Trade power for performance in many applications
  Capacity
    Many database applications are I/O bound and disk bandwidth is the precious commodity
  Throughput (a form of speed)
    An individual program isn’t faster, but many more programs can be completed per unit time
    Example: Google search (processes many, many searches simultaneously)