1 Lecture 2: System Metrics and Pipelining Today’s topics: (Sections 1.5 – 1.10)  Power/Energy examples  Performance summaries  Measuring cost and dependability.

Slides:



Advertisements
Similar presentations
1 Lecture 2: Metrics to Evaluate Performance Topics: Benchmark suites, Performance equation, Summarizing performance with AM, GM, HM Video 1: Using AM.
Advertisements

COMPUTER ARCHITECTURE & OPERATIONS I Instructor: Yaohang Li.
CS1104: Computer Organisation School of Computing National University of Singapore.
Computer Abstractions and Technology
TU/e Processor Design 5Z032 1 Processor Design 5Z032 The role of Performance Henk Corporaal Eindhoven University of Technology 2009.
Lecture 2: Modern Trends 1. 2 Microprocessor Performance Only 7% improvement in memory performance every year! 50% improvement in microprocessor performance.
1 Introduction Background: CS 3810 or equivalent, based on Hennessy and Patterson’s Computer Organization and Design Text for CS/EE 6810: Hennessy and.
1 Copyright © 2012, Elsevier Inc. All rights reserved. Chapter 1 Fundamentals of Quantitative Design and Analysis Computer Architecture A Quantitative.
Chapter 1 CSF 2009 Computer Performance. Defining Performance Which airplane has the best performance? Chapter 1 — Computer Abstractions and Technology.
1 Lecture 2: System Metrics and Pipelining Today’s topics: (Sections 1.6, 1.7, 1.9, A.1)  Quantitative principles of computer design  Measuring cost.
1 Lecture 2: System Metrics and Pipelining Today’s topics: (Sections 1.6, 1.7, 1.9, A.1)  Performance summaries  Quantitative principles of computer.
1 Lecture 11: Digital Design Today’s topics:  Evaluating a system  Intro to boolean functions.
1 Introduction Background: CS 3810 or equivalent, based on Hennessy and Patterson’s Computer Organization and Design Text for CS/EE 6810: Hennessy and.
Chapter 4 Assessing and Understanding Performance
Lecture: Pipelining Basics
1 Lecture 1: CS/ECE 3810 Introduction Today’s topics:  logistics  why computer organization is important  modern trends.
1 Introduction Background: CS 3810 or equivalent, based on Hennessy and Patterson’s Computer Organization and Design Text for CS/EE 6810: Hennessy and.
1 Lecture 10: FP, Performance Metrics Today’s topics:  IEEE 754 representations  FP arithmetic  Evaluating a system Reminder: assignment 4 due in a.
1 Lecture 2: Measuring Performance/Cost/Power Today’s topics: (Sections 1.6, 1.4, 1.7, 1.8)  Quantitative principles of computer design  Measuring cost.
1 Chapter 4. 2 Measure, Report, and Summarize Make intelligent choices See through the marketing hype Key to understanding underlying organizational motivation.
1 Lecture 2: Metrics to Evaluate Systems Topics: Power and technology trends wrap-up, benchmark suites, performance equation, summarizing performance with.
CMSC 611: Advanced Computer Architecture Performance Some material adapted from Mohamed Younis, UMBC CMSC 611 Spr 2003 course slides Some material adapted.
1 Copyright © 2012, Elsevier Inc. All rights reserved. Chapter 1 Fundamentals of Quantitative Design and Analysis Computer Architecture A Quantitative.
Lecture 2: Technology Trends and Performance Evaluation Performance definition, benchmark, summarizing performance, Amdahl’s law, and CPI.
1 Computer Performance: Metrics, Measurement, & Evaluation.
Lecture 03: Fundamentals of Computer Design - Trends and Performance Kai Bu
1 Lecture 1: CS/ECE 3810 Introduction Today’s topics:  Why computer organization is important  Logistics  Modern trends.
Lecture 1: Performance EEN 312: Processors: Hardware, Software, and Interfacing Department of Electrical and Computer Engineering Spring 2013, Dr. Rozier.
Sogang University Advanced Computing System Chap 1. Computer Architecture Hyuk-Jun Lee, PhD Dept. of Computer Science and Engineering Sogang University.
1 Introduction Background: CS 3810 or equivalent, based on Hennessy and Patterson’s Computer Organization and Design Text for CS/EE 5810/6810: Hennessy.
1 CS/EE 6810: Computer Architecture Class format:  Most lectures on YouTube *BEFORE* class  Use class time for discussions, clarifications, problem-solving,
C OMPUTER O RGANIZATION AND D ESIGN The Hardware/Software Interface 5 th Edition Chapter 1 Computer Abstractions and Technology Sections 1.5 – 1.11.
CDA 3101 Fall 2013 Introduction to Computer Organization Computer Performance 28 August 2013.
1 CS/EE 362 Hardware Fundamentals Lecture 9 (Chapter 2: Hennessy and Patterson) Winter Quarter 1998 Chris Myers.
Advanced Computer Architecture Fundamental of Computer Design Instruction Set Principles and Examples Pipelining:Basic and Intermediate Concepts Memory.
1 CS/COE0447 Computer Organization & Assembly Language CHAPTER 4 Assessing and Understanding Performance.
Performance Lecture notes from MKP, H. H. Lee and S. Yalamanchili.
CEN 316 Computer Organization and Design Assessing and Understanding Performance Mansour AL Zuair.
Morgan Kaufmann Publishers
Performance Performance
1 Lecture 2: Performance, MIPS ISA Today’s topics:  Performance equations  MIPS instructions Reminder: canvas and class webpage:
4. Performance 4.1 Introduction 4.2 CPU Performance and Its Factors
1 Lecture: Metrics to Evaluate Performance Topics: Benchmark suites, Performance equation, Summarizing performance with AM, GM, HM  Video 1: Using AM.
1 Lecture 23: Storage Systems Topics: disk access, bus design, evaluation metrics, RAID (Sections )
1 Lecture 2: Metrics to Evaluate Systems Topics: Metrics: power, reliability, cost, benchmark suites, performance equation, summarizing performance with.
Computer Organization Yasser F. O. Mohammad 1. 2 Lecture 1: Introduction Today’s topics:  Why computer organization is important  Logistics  Modern.
CMSC 611: Advanced Computer Architecture Performance & Benchmarks Some material adapted from Mohamed Younis, UMBC CMSC 611 Spr 2003 course slides Some.
1 Lecture: Static ILP Topics: predication, speculation (Sections C.5, 3.2)
1 Lecture 3: Pipelining Basics Today: chapter 1 wrap-up, basic pipelining implementation (Sections C.1 - C.4) Reminders:  Sign up for the class mailing.
CS203 – Advanced Computer Architecture
Chapter 1 Performance & Technology Trends. Outline What is computer architecture? Performance What is performance: latency (response time), throughput.
CSE 340 Computer Architecture Summer 2016 Understanding Performance.
Lecture 3. Performance Prof. Taeweon Suh Computer Science & Engineering Korea University COSE222, COMP212, CYDF210 Computer Architecture.
1 Lecture: Benchmarks, Pipelining Intro Topics: Performance equations wrap-up, Intro to pipelining.
Computer Architecture & Operations I
Lecture 2: Performance Today’s topics:
Lecture 2: Performance Evaluation
Lecture 3: MIPS Instruction Set
Performance Lecture notes from MKP, H. H. Lee and S. Yalamanchili.
Lecture 1: CS/ECE 3810 Introduction
Morgan Kaufmann Publishers
Lecture 2: Performance Today’s topics: Technology wrap-up
Computer Architecture
Performance of computer systems
The University of Adelaide, School of Computer Science
Performance of computer systems
Lecture 3: MIPS Instruction Set
The University of Adelaide, School of Computer Science
Performance of computer systems
The University of Adelaide, School of Computer Science
Presentation transcript:

1 Lecture 2: System Metrics and Pipelining Today’s topics: (Sections 1.5 – 1.10)  Power/Energy examples  Performance summaries  Measuring cost and dependability Class mailing list sign up Class notes Assignment 1 will be posted over weekend; due in 12 days

2 Reducing Power and Energy Can gate off transistors that are inactive (reduces leakage) Design for typical case and throttle down when activity exceeds a threshold DFS: Dynamic frequency scaling -- only reduces frequency and dynamic power, but hurts energy DVFS: Dynamic voltage and frequency scaling – can reduce voltage and frequency by (say) 10%; can slow a program by (say) 8%, but reduce dynamic power by 27%, reduce total power by (say) 23%, reduce total energy by 17% (Note: voltage drop  slow transistor  freq drop)

3 DVFS Example

4 Other Technology Trends DRAM density increases by 40-60% per year, latency has reduced by 33% in 10 years (the memory wall!), bandwidth improves twice as fast as latency decreases Disk density improves by 100% every year, latency improvement similar to DRAM Emergence of NVRAM technologies that can provide a bridge between DRAM and hard disk drives

5 Measuring Performance Two primary metrics: wall clock time (response time for a program) and throughput (jobs performed in unit time) To optimize throughput, must ensure that there is minimal waste of resources Performance is measured with benchmark suites: a collection of programs that are likely relevant to the user  SPEC CPU 2006: cpu-oriented programs (for desktops)  SPECweb, TPC: throughput-oriented (for servers)  EEMBC: for embedded processors/workloads

6 Summarizing Performance Consider 25 programs from a benchmark set – how do we capture the behavior of all 25 programs with a single number? P1 P2 P3 Sys-A Sys-B Sys-C  Total (average) execution time  Total (average) weighted execution time or Average of normalized execution times  Geometric mean of normalized execution times

7 AM Example

8 We fixed a reference machine X and ran 4 programs A, B, C, D on it such that each program ran for 1 second The exact same workload (the four programs execute the same number of instructions that they did on machine X) is run on a new machine Y and the execution times for each program are 0.8, 1.1, 0.5, 2 With AM of normalized execution times, we can conclude that Y is 1.1 times slower than X – perhaps, not for all workloads, but definitely for one specific workload (where all programs run on the ref-machine for an equal #cycles) With GM, you may find inconsistencies

9 GM Example Computer-A Computer-B Computer-C P1 1 sec 10 secs 20 secs P secs 100 secs 20 secs Conclusion with GMs: (i) A=B (ii) C is ~1.6 times faster For (i) to be true, P1 must occur 100 times for every occurrence of P2 With the above assumption, (ii) is no longer true Hence, GM can lead to inconsistencies

10 Summarizing Performance GM: does not require a reference machine, but does not predict performance very well  So we multiplied execution times and determined that sys-A is 1.2x faster…but on what workload? AM: does predict performance for a specific workload, but that workload was determined by executing programs on a reference machine  Every year or so, the reference machine will have to be updated

11 Normalized Execution Times Advantage of GM: no reference machine required Disadvantage of GM: does not represent any “real entity” and may not accurately predict performance Disadvantage of AM of normalized: need weights (which may change over time) Advantage: can represent a real workload

12 CPU Performance Equation Clock cycle time = 1 / clock speed CPU time = clock cycle time x cycles per instruction x number of instructions Influencing factors for each:  clock cycle time: technology and pipeline  CPI: architecture and instruction set design  instruction count: instruction set design and compiler CPI (cycles per instruction) or IPC (instructions per cycle) can not be accurately estimated analytically

13 Measuring System CPI Assume that an architectural innovation only affects CPI For 3 programs, base CPIs: 1.2, 1.8, 2.5 CPIs for proposed model: 1.4, 1.9, 2.3 What is the best way to summarize performance with a single number? AM, HM, or GM of CPIs?

14 Example AM of CPI for base case = 1.2 cyc cyc cyc /3 instr instr instr 5.5 cycles is execution time if each program ran for one instruction – therefore, AM of CPI defines a workload where every program runs for an equal #instrs HM of CPI = 1 / AM of IPC ; defines a workload where every program runs for an equal number of cycles GM of CPI: warm fuzzy number, not necessarily representing any workload

15 Speedup Vs. Percentage “Speedup” is a ratio “Improvement”, “Increase”, “Decrease” usually refer to percentage relative to the baseline A program ran in 100 seconds on my old laptop and in 70 seconds on my new laptop  What is the speedup?  What is the percentage increase in performance?  What is the reduction in execution time?

16 Wafers and Dies An entire wafer is produced and chopped into dies that undergo testing and packaging

17 Integrated Circuit Cost Cost of an integrated circuit = (cost of die + cost of packaging and testing) / final test yield Cost of die = cost of wafer / (dies per wafer x die yield) Dies/wafer = wafer area / die area -  wafer diam / die diag Die yield = wafer yield x (1 + (defect rate x die area) /  ) -  Thus, die yield depends on die area and complexity arising from multiple manufacturing steps (  ~ 4.0)

18 Integrated Circuit Cost Examples Bottomline: cost decreases dramatically if the chip area is smaller, if the chip has fewer manufacturing steps (less complex), if the chip is produced in high volume (10% lower cost if volume doubles) A 30 cm diameter wafer cost $5-6K in 2001 Such a wafer yields about 366 good 1 cm 2 dies and 1014 good 0.49 cm 2 dies (note the effect of area and yield) Die sizes: Alpha cm 2, Itanium 3.0 cm 2, embedded processors are between 0.1 – 0.25 cm 2

19 Contribution of IC Costs to Total System Cost SubsystemFraction of total cost Cabinet: sheet metal, plastic, power supply, fans, cables, nuts, bolts, manuals, shipping box 6% Processor22% DRAM (128 MB)5% Video card5% Motherboard5% Processor board subtotal37% Keyboard and mouse3% Monitor19% Hard disk (20 GB)9% DVD drive6% I/O devices subtotal37% Software (OS + Office)20%

20 Defining Fault, Error, and Failure A fault produces a latent error; it becomes effective when activated; it leads to failure when the observed actual behavior deviates from the ideal specified behavior Example I : a programming mistake is a fault; the buggy code is the latent error; when the code runs, it is effective; if the buggy code influences program output/behavior, a failure occurs Example II : an alpha particle strikes DRAM (fault); if it changes the memory bit, it produces a latent error; when the value is read, the error becomes effective; if program output deviates, failure occurs

21 Defining Reliability and Availability A system toggles between  Service accomplishment: service matches specifications  Service interruption: services deviates from specs The toggle is caused by failures and restorations Reliability measures continuous service accomplishment and is usually expressed as mean time to failure (MTTF) Availability measures fraction of time that service matches specifications, expressed as MTTF / (MTTF + MTTR)

22 Title Bullet