COMP 206: Computer Architecture and Implementation
Montek Singh
Mon., Sep 5, 2005
Lecture 2


2Outline  Quantitative Principles of Computer Design Amdahl’s law (make the common case fast) Amdahl’s law (make the common case fast)  Performance Metrics MIPS, FLOPS, and all that… MIPS, FLOPS, and all that…  Examples

Quantitative Principles of Computer Design
- Execution time (also called response time or latency)
- Performance: the rate of producing results (also called throughput or bandwidth)

Comparison
- “Y is n times larger than X”: $n = Y / X$
- “Y is n% larger than X”: $n = 100\,(Y - X) / X$
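A small sketch of the two comparison conventions, assuming the usual textbook definitions $n = Y/X$ and $n = 100(Y - X)/X$. The function names and the execution times (10 s and 15 s) are hypothetical, chosen only to show that the same pair of numbers reads as "1.5 times larger" or as "50% larger".

```python
def times_larger(y: float, x: float) -> float:
    """n such that 'Y is n times larger than X', i.e. n = Y / X."""
    return y / x


def percent_larger(y: float, x: float) -> float:
    """n such that 'Y is n% larger than X', i.e. n = 100 * (Y - X) / X."""
    return 100.0 * (y - x) / x


# Hypothetical execution times in seconds.
time_x, time_y = 10.0, 15.0
print(times_larger(time_y, time_x))    # 1.5
print(percent_larger(time_y, time_x))  # 50.0
```

Mixing the two conventions ("1.5 times larger" versus "150% larger") is a common source of off-by-one-unit errors when reporting speedups.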

“Validity of the single processor approach to achieving large scale computing capabilities”, G. M. Amdahl, AFIPS Conference Proceedings, pp, April 1967
Amdahl’s Law (1967)
- Historical context
  - Amdahl was demonstrating “the continued validity of the single processor approach and of the weaknesses of the multiple processor approach”
  - The paper contains no mathematical formulation, just arguments and simulation
- “The nature of this overhead appears to be sequential so that it is unlikely to be amenable to parallel processing techniques.”
- “A fairly obvious conclusion which can be drawn at this point is that the effort expended on achieving high parallel performance rates is wasted unless it is accompanied by achievements in sequential processing rates of very nearly the same magnitude.”
- Nevertheless, the law is widely applicable in all kinds of situations

Amdahl’s Law
If a fraction $f_i$ of the results is generated at rate $R_i$ (with $\sum_i f_i = 1$), the average execution rate (performance) is the weighted harmonic mean
$$R_{\text{avg}} = \frac{1}{\sum_i f_i / R_i}$$
Note: $f_i$ is the fraction of results generated at rate $R_i$, not the fraction of time spent working at that rate.
“Bottleneckology: Evaluating Supercomputers”, Jack Worlton, COMPCON 85, pp
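The weighted harmonic mean can be sketched directly: each result produced at rate $R_i$ costs $1/R_i$ time units, so the average time per result is $\sum_i f_i/R_i$ and the average rate is its reciprocal. The function name `average_rate` and the sample inputs are illustrative assumptions, not part of the lecture.

```python
from typing import Sequence


def average_rate(fractions: Sequence[float], rates: Sequence[float]) -> float:
    """Weighted harmonic mean of execution rates.

    fractions[i] is the fraction of *results* (not of time) produced at rates[i].
    """
    if len(fractions) != len(rates):
        raise ValueError("fractions and rates must have the same length")
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    # Average time per result is sum(f_i / R_i); the average rate is its reciprocal.
    return 1.0 / sum(f / r for f, r in zip(fractions, rates))


# Half the results at 1 MFLOPS, half at 3 MFLOPS (hypothetical mix).
print(round(average_rate([0.5, 0.5], [1.0, 3.0]), 6))  # 1.5
# A weighted *arithmetic* mean would report 0.5 * 1 + 0.5 * 3 = 2.0, overstating performance.
```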

Example of Amdahl’s Law
30% of results are generated at the rate of 1 MFLOPS, 20% at 10 MFLOPS, and 50% at 100 MFLOPS. What is the average performance? What is the bottleneck?
Bottleneck: the rate that consumes most of the time
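One way to work the example numerically, assuming only the weighted-harmonic-mean formula for the average rate. Since a rate of R MFLOPS means $1/R$ microseconds per result, each term $f_i/R_i$ is the time (in microseconds per result) spent at that rate, and the bottleneck is the rate with the largest share of that time.

```python
fractions = [0.30, 0.20, 0.50]  # fraction of results produced at each rate
rates = [1.0, 10.0, 100.0]      # MFLOPS

# Time per result (microseconds) contributed by each rate: f_i / R_i
times = [f / r for f, r in zip(fractions, rates)]
total = sum(times)              # 0.3 + 0.02 + 0.005 = 0.325 us per result
average = 1.0 / total           # average rate in MFLOPS

# Share of total time spent at each rate; the largest share is the bottleneck.
shares = [t / total for t in times]
bottleneck = rates[shares.index(max(shares))]

print(f"average = {average:.2f} MFLOPS")
for r, s in zip(rates, shares):
    print(f"{r:6.0f} MFLOPS: {100 * s:5.1f}% of time")
print(f"bottleneck: {bottleneck:.0f} MFLOPS")
```

The average comes out near 3.08 MFLOPS, and about 92% of the time is spent at the 1 MFLOPS rate, even though that rate produces only 30% of the results.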

Amdahl’s Law (HP3 book, pp )
$$\text{Speedup}_{\text{overall}} = \frac{1}{(1 - \text{Fraction}_{\text{enhanced}}) + \dfrac{\text{Fraction}_{\text{enhanced}}}{\text{Speedup}_{\text{enhanced}}}}$$
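A minimal sketch of the HP3 form, assuming (as in that book) that Fraction_enhanced is the fraction of the *original execution time* that can use the enhancement, which differs from the fraction-of-results weighting used with the harmonic mean. The function names and the value F = 0.4 are hypothetical. The second function shows the ceiling $1/(1 - F)$ reached as the enhancement becomes infinitely fast.

```python
def overall_speedup(fraction_enhanced: float, speedup_enhanced: float) -> float:
    """Amdahl's Law: 1 / ((1 - F) + F / S).

    F = fraction of the original execution time that can use the enhancement,
    S = speedup of that enhanced portion.
    """
    if not 0.0 <= fraction_enhanced <= 1.0:
        raise ValueError("fraction_enhanced must lie in [0, 1]")
    if speedup_enhanced <= 0.0:
        raise ValueError("speedup_enhanced must be positive")
    return 1.0 / ((1.0 - fraction_enhanced) + fraction_enhanced / speedup_enhanced)


def speedup_limit(fraction_enhanced: float) -> float:
    """Upper bound on overall speedup as S -> infinity: 1 / (1 - F)."""
    if fraction_enhanced >= 1.0:
        return float("inf")
    return 1.0 / (1.0 - fraction_enhanced)


# With 40% of the time enhanceable, even a 1000x speedup of that part stays below 1/0.6.
for s in (2, 10, 100, 1000):
    print(s, round(overall_speedup(0.4, s), 3))
print("limit:", round(speedup_limit(0.4), 3))
```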

Implications of Amdahl’s Law
- The performance improvement provided by a feature is limited by how often that feature is used
- As stated, Amdahl’s Law is valid only if the system always works at exactly one of the rates at any given time
  - If a non-blocking cache is used, or CPU and I/O operations overlap, Amdahl’s Law as given here is not applicable
- The bottleneck is the most promising target for improvements
  - “Make the common case fast”
  - Infrequent events, even if each occurrence takes a long time, make little difference to overall performance
- Typical use: change only one parameter of the system, and compute the effect of this change
  - The same program, with the same input data, should run on the machine in both cases

“Make The Common Case Fast”
- All instructions require an instruction fetch; only a fraction require a data fetch/store
  - Optimize instruction access over data access
- Programs exhibit locality
  - Spatial locality: items with addresses near one another tend to be referenced close together in time
  - Temporal locality: recently accessed items are likely to be accessed again in the near future
- Access to small memories is faster
  - Provide a storage hierarchy such that the most frequent accesses are to the smallest (closest) memories
  - Hierarchy, smallest and closest first: registers, cache, memory, disk / tape

“Make The Common Case Fast” (2)
- What is the common case?
  - The rate at which the system spends most of its time
  - The “bottleneck”
- What does this statement mean precisely?
  - Make the common case faster, rather than making some other case faster
  - Make the common case faster by a certain amount, rather than making some other case faster by the same amount
    - Absolute amount?
    - Relative amount?
- This principle is merely an informal statement of a frequently correct consequence of Amdahl’s Law

“Make The Common Case Fast” (3a)
A machine produces 20% and 80% of its results at the rates of 1 and 3 MFLOPS, respectively. Which is more advantageous: improving the 1 MFLOPS rate, or improving the 3 MFLOPS rate?
Generalize the problem: assume the rates are x and y MFLOPS. Then
$$R(x, y) = \frac{1}{0.2/x + 0.8/y}, \qquad \frac{\partial R}{\partial x} = \frac{0.2\,R^2}{x^2}, \qquad \frac{\partial R}{\partial y} = \frac{0.8\,R^2}{y^2}$$
At (x, y) = (1, 3), $\partial R/\partial x = 0.2R^2$ while $\partial R/\partial y = 0.8R^2/9 \approx 0.089R^2$. For the same absolute improvement, it is therefore better to improve x, the 1 MFLOPS rate, which is not the common case. The 3 MFLOPS rate is the common case in this example because the machine spends more time at it: of the average 0.467 µs per result, 0.8/3 ≈ 0.267 µs is spent at 3 MFLOPS and only 0.2/1 = 0.2 µs at 1 MFLOPS.
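A numeric check of the absolute-change argument. It adds the same finite step (0.1 MFLOPS, an arbitrary assumption) to one rate or the other and compares the resulting gain in average rate; the helper name `rate` is hypothetical.

```python
def rate(x: float, y: float) -> float:
    """Average rate (MFLOPS) when 20% of results come at x MFLOPS and 80% at y."""
    return 1.0 / (0.2 / x + 0.8 / y)


x, y, delta = 1.0, 3.0, 0.1  # same absolute improvement applied to either rate

base = rate(x, y)
gain_x = rate(x + delta, y) - base  # improve the 1 MFLOPS rate
gain_y = rate(x, y + delta) - base  # improve the 3 MFLOPS rate (the common case)
print(f"base {base:.4f}, improve x: +{gain_x:.4f}, improve y: +{gain_y:.4f}")

# Analytic partial derivatives for comparison: 0.2 R^2 / x^2 and 0.8 R^2 / y^2
print(0.2 * base**2 / x**2, 0.8 * base**2 / y**2)
```

Improving x by 0.1 MFLOPS raises the average rate by roughly twice as much as improving y by 0.1 MFLOPS, matching the sign of the derivative comparison.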

“Make The Common Case Fast” (3b)
Suppose instead that we want to make the same relative change to one rate or the other, rather than the same absolute change. The relevant quantities are then
$$x\,\frac{\partial R}{\partial x} = \frac{0.2\,R^2}{x}, \qquad y\,\frac{\partial R}{\partial y} = \frac{0.8\,R^2}{y}$$
At (x, y) = (1, 3), these are $0.2R^2$ and $0.267R^2$, which indicates that it is better to improve y, the 3 MFLOPS rate, which is the common case. If there are two different execution rates, making the common case faster by a given relative amount is always more advantageous than doing the same to the other rate. However, this does not necessarily hold for absolute changes of the same magnitude. For three or more rates, further analysis is needed.
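The same check for relative changes: each rate in turn is multiplied by the same factor (1.1, i.e. a 10% improvement, chosen arbitrarily). The helper `rate` is redefined so the block runs on its own; the names are hypothetical.

```python
def rate(x: float, y: float) -> float:
    """Average rate (MFLOPS) when 20% of results come at x MFLOPS and 80% at y."""
    return 1.0 / (0.2 / x + 0.8 / y)


x, y, factor = 1.0, 3.0, 1.1  # same relative (10%) improvement applied to either rate

base = rate(x, y)
gain_x = rate(x * factor, y) - base  # 1 MFLOPS rate made 10% faster
gain_y = rate(x, y * factor) - base  # 3 MFLOPS rate (common case) made 10% faster
print(f"improve x by 10%: +{gain_x:.4f}, improve y by 10%: +{gain_y:.4f}")

# Time per result spent at each rate: the larger one is the common case.
print("time at x:", 0.2 / x, "time at y:", 0.8 / y)
```

With equal relative steps the ranking flips: speeding up the 3 MFLOPS rate, where more time is spent, gives the larger gain.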

Basics of Performance

Details of CPI

16MIPS  Machines with different instruction sets?  Programs with different instruction mixes? Dynamic frequency of instructions Dynamic frequency of instructions  Uncorrelated with performance Marketing metric Marketing metric  “Meaningless Indicator of Processor Speed”

17MFLOP/s  Popular in supercomputing community  Often not where time is spent  Not all FP operations are equal “Normalized” MFLOP/s “Normalized” MFLOP/s  Can magnify performance differences A better algorithm (e.g., with better data reuse) can run faster even with higher FLOP count A better algorithm (e.g., with better data reuse) can run faster even with higher FLOP count DGEQRF vs. DGEQR2 in LAPACK DGEQRF vs. DGEQR2 in LAPACK

Aspects of CPU Performance