Two notions of performance

Slides:



Advertisements
Similar presentations
CS1104: Computer Organisation School of Computing National University of Singapore.
Advertisements

Performance What differences do we see in performance? Almost all computers operate correctly (within reason) Most computers implement useful operations.
Performance Evaluation of Architectures Vittorio Zaccaria.
100 Performance ENGR 3410 – Computer Architecture Mark L. Chang Fall 2006.
1. 2 Performance  Today we’ll discuss issues related to performance: —Latency/Response Time/Execution Time vs. Throughput —How do you make a reasonable.
Computer Organization and Architecture 18 th March, 2008.
Kevin Walsh CS 3410, Spring 2010 Computer Science Cornell University Performance See: P&H 1.4.
CIS629 Fall Lecture Performance Overview Execution time is the best measure of performance: simple, intuitive, straightforward. Two important.
Performance D. A. Patterson and J. L. Hennessey, Computer Organization & Design: The Hardware Software Interface, Morgan Kauffman, second edition 1998.
CIS429.S00: Lec2- 1 Performance Overview Execution time is the best measure of performance: simple, intuitive, straightforward. Two important quantitative.
Fall 2001CS 4471 Chapter 2: Performance CS 447 Jason Bakos.
Lecture 3: Computer Performance
CIS429/529 Winter 07 - Performance - 1 Performance Overview Execution time is the best measure of performance: simple, intuitive, straightforward. Two.
CMSC 611: Advanced Computer Architecture Performance Some material adapted from Mohamed Younis, UMBC CMSC 611 Spr 2003 course slides Some material adapted.
CPU Performance Assessment As-Bahiya Abu-Samra *Moore’s Law *Clock Speed *Instruction Execution Rate - MIPS - MFLOPS *SPEC Speed Metric *Amdahl’s.
Computer Organization and Design Performance Montek Singh Mon, April 4, 2011 Lecture 13.
Lecture 2: Computer Performance
Copyright 1995 by Coherence LTD., all rights reserved (Revised: Oct 97 by Rafi Lohev, Oct 99 by Yair Wiseman, Sep 04 Oren Kapah) IBM י ב מ 7-1 Measuring.
B0111 Performance Anxiety ENGR xD52 Eric VanWyk Fall 2012.
10/19/2015Erkay Savas1 Performance Computer Architecture – CS401 Erkay Savas Sabanci University.
1 Acknowledgements Class notes based upon Patterson & Hennessy: Book & Lecture Notes Patterson’s 1997 course notes (U.C. Berkeley CS 152, 1997) Tom Fountain.
Performance.
1 Components of the Virtual Memory System  Arrows indicate what happens on a lw virtual address data physical address TLB page table memory cache disk.
Computer Performance Computer Engineering Department.
1 CS465 Performance Revisited (Chapter 1) Be able to compare performance of simple system configurations and understand the performance implications of.
1 CS/COE0447 Computer Organization & Assembly Language CHAPTER 4 Assessing and Understanding Performance.
CEN 316 Computer Organization and Design Assessing and Understanding Performance Mansour AL Zuair.
EEL5708/Bölöni Lec 1.1 August 21, 2006 Lotzi Bölöni Fall 2006 EEL 5708 High Performance Computer Architecture Lecture 1 Introduction.
Morgan Kaufmann Publishers
Performance Enhancement. Performance Enhancement Calculations: Amdahl's Law The performance enhancement possible due to a given design improvement is.
Processor Structure and Function Chapter8:. CPU Structure  CPU must:  Fetch instructions –Read instruction from memory  Interpret instructions –Instruction.
Performance Performance
TEST 1 – Tuesday March 3 Lectures 1 - 8, Ch 1,2 HW Due Feb 24 –1.4.1 p.60 –1.4.4 p.60 –1.4.6 p.60 –1.5.2 p –1.5.4 p.61 –1.5.5 p.61.
September 10 Performance Read 3.1 through 3.4 for Wednesday Only 3 classes before 1 st Exam!
Computer Organization Instruction Set Architecture (ISA) Instruction Set Architecture (ISA), or simply Architecture, of a computer is the.
Lec2.1 Computer Architecture Chapter 2 The Role of Performance.
L12 – Performance 1 Comp 411 Computer Performance He said, to speed things up we need to squeeze the clock Study
Performance Computer Organization II 1 Computer Science Dept Va Tech January 2009 © McQuain & Ribbens Defining Performance Which airplane has.
COMPUTER ARCHITECTURE & OPERATIONS I Instructor: Yaohang Li.
Computer Architecture CSE 3322 Web Site crystal.uta.edu/~jpatters/cse3322 Send to Pramod Kumar, with the names and s.
B0111 Performance Anxiety ENGR xD52 Eric VanWyk Fall 2012.
CSE 340 Computer Architecture Summer 2016 Understanding Performance.
July 10, 2016ISA's, Compilers, and Assembly1 CS232 roadmap In the first 3 quarters of the class, we have covered 1.Understanding the relationship between.
Performance. Moore's Law Moore's Law Related Curves.
Computer Organization
Computer Architecture & Operations I
Performance Lecture notes from MKP, H. H. Lee and S. Yalamanchili.
September 2 Performance Read 3.1 through 3.4 for Tuesday
Assessing and Understanding Performance
ECE 4100/6100 Advanced Computer Architecture Lecture 1 Performance
How do we evaluate computer architectures?
Defining Performance Which airplane has the best performance?
Computer Architecture & Operations I
Prof. Hsien-Hsin Sean Lee
CSCE 212 Chapter 4: Assessing and Understanding Performance
Computer Architecture CSCE 350
CS2100 Computer Organisation
Chapter 1 Computer Abstractions & Technology Performance Evaluation
Computer Performance He said, to speed things up we need to squeeze the clock.
CMSC 611: Advanced Computer Architecture
Arrays versus Pointers
Performance of computer systems
CMSC 611: Advanced Computer Architecture
Parameters that affect it How to improve it and by how much
Computer Performance Read Chapter 4
Performance.
Chapter 2: Performance CS 447 Jason Bakos Fall 2001 CS 447.
Computer Organization and Design Chapter 4
CS2100 Computer Organisation
Presentation transcript:

Two notions of performance Which has higher performance? From a passenger’s viewpoint: latency (time to do the task) hours per flight, execution time, response time From an airline’s viewpoint: throughput (tasks per unit time) passengers per hour, bandwidth Latency and throughput are often in opposition Aircraft DC to Paris Passengers 747 6 hours 500 Concorde 3 hours 125

Some Definitions Relative performance: “x is N times faster than y” Performance(y) If we are primarily concerned with latency, Performance(x) = 1 Latency(x) If we are primarily concerned with throughput, Performance(x) = throughput(x) = N

CPU time = Instructions executed  CPI  Clock cycle time CPU performance The obvious metric: how long does it take to run a test program? This depends upon three factors: The number of instructions N in the program Obviously, time increases as N increases The kind of instructions in the program Some instructions take more CPU cycles than others Let c be the average number of cycles per instruction (CPI) The time t per CPU clock cycle (clock-cycle time) CPU time = Instructions executed  CPI  Clock cycle time Seconds = Instructions  Clock cycles Program Clock cycle

The three components of CPU performance Instructions executed: the dynamic instruction count (#instructions actually executed) not the (static) number of lines of code Average Cycles per instruction: function of the machine and program CPI(floating-point operations)  CPI(integer operations) Improved processor may execute same instructions in fewer cycles Single-cycle machine: each instruction takes 1 cycle (CPI = 1) CPI can be  1 due to memory stalls and slow instructions CPI can be  1 on superscalar machines Clock cycle time: 1 cycle = minimum time it takes the CPU to do any work clock cycle time = 1/ clock frequency 500MHz processor has a cycle time of 2ns (nanoseconds) 2GHz (2000MHz) CPU has a cycle time of just 0.5ns higher frequency is usually better

CPU time = Instructions executed  CPI  Clock cycle time Execution time, again CPU time = Instructions executed  CPI  Clock cycle time Make things faster by making any component smaller! Often easy to reduce one component by increasing another Program Compiler ISA Organization Technology Instruction Executed CPI Clock Cycle Time

Example: ISA-compatible processors Let’s compare the performances two x86-based processors An 800MHz AMD Duron, with a CPI of 1.2 for an MP3 compressor A 1GHz Pentium III with a CPI of 1.5 for the same program Compatible processors implement identical instruction sets and will use the same executable files, with the same number of instructions But they implement the ISA differently, which leads to different CPIs CPU timeAMD,P = InstructionsP  CPIAMD,P  Cycle timeAMD = CPU timeP3,P = InstructionsP  CPIP3,P  Cycle timeP3

Another Example: Comparing across ISAs Intel’s Itanium (IA-64) ISA is designed facilitate executing multiple instructions per cycle. If it achieves an average of 3 instructions per cycle, how much faster is it than a Pentium4 (which uses the x86 ISA) with an average CPI of 1? Itanium is three times faster Itanium is one third as fast Not enough information

Make the common case fast Improving CPI Some processor design techniques improve CPI Often they only improve CPI for certain types of instructions where Fi = fraction of instructions of type i First Law of Performance: Make the common case fast Second Law of Performance: Make the fast case common

Example: CPI improvements Base Machine: How much faster would the machine be if: we added a cache to reduce average load time to 3 cycles? we added a branch predictor to reduce branch time by 1 cycle? we could do two ALU operations in parallel? Op Type Freq (Fi) CPIi contribution to CPI ALU 50% 3 Load 20% 6 Store Branch 10% 2

Amdahl’s Law Amdahl’s Law states that optimizations are limited in their effectiveness Example: Suppose we double the speed of floating-point operations If only 10% of the program execution time T involves floating-point code, then the overall performance improves by just 5% Execution time after improvement = Time affected by improvement + Time unaffected by improvement Amount of improvement Execution time after improvement = 0.10 T + 0.90 T 0.95 T 2

Performance Optimization Flowchart “We should forget about small efficiencies, say about 97% of the time.” -- Sir Tony Hoare 11

Collecting data The process is called “instrumenting the code” One option is to do this by hand: record entry and exit times for suspected “hot” blocks of code but this is tedious and error prone Fortunately, there are tools to do this instrumenting for us: Gprof: The GNU profiler (run gcc/g++ with the -pg flag) Compiler keps track of source code  object code correspondence Also links in a profiling signal handler the program requests OS to periodically send it signals signal handler records instruction that was executing (gmon.out) Display results: gprof a.out Shows how much time is being spent in each function Shows the path of function calls to the hot spot 12

Performance Optimization, cont. How do we fix performance problems?