CET520 -- Gannod1 Chapter 1 Fundamentals of Computer Design.

Presentation transcript:

CET Gannod1 Chapter 1 Fundamentals of Computer Design

CET Gannod2 A bit of history 1945 – no stored-program computers Today, less than $1,000 will buy a personal computer more powerful than a computer bought in 1980 for $1 million. What has contributed to this rapid increase? –Technology –Computer Design

CET Gannod3 Technology vs. Design Improvements (figure; copyright 2003 Morgan Kaufmann Publishers, Inc., all rights reserved)

CET Gannod4 Course Emphasis “by 2001, the difference between the highest-performance microprocessors and what would have been obtained by relying solely on technology, including improved circuit design, was about a factor of 15” In this course we will discuss: –Architectural design techniques used –Associated compiler improvements –Quantitative approach to computer design and analysis

CET Gannod5 Figure 1.3
Feature: Desktop / Server / Embedded
Price of system: $1000-$10,000 / $10,000-$10,000,000 / $10-$100,000
Price of microprocessor module: $100-$1000 / $200-$2000 / $0.20-$200
Number sold per year: 150,000,000 / 4,000,000 / 300,000,000
System design issues: price-performance, graphics performance / throughput, availability, scalability / price, power consumption, application-specific performance

CET Gannod6 Definitions Instruction Set Architecture refers to visible instruction set. Implementation has 2 components: –Organization –Hardware Organization includes high-level aspects of design. –SPARC-2 and SPARC-20 have same ISA but different organizations. Hardware refers to specifics of a machine. –Two versions of Silicon Graphics Indy have same ISA, same organization, but different hardware (clock rate, cache structure). Architecture covers all 3 aspects of computer design.

CET Gannod7 Computer Design Architects must design computer under several constraints: –Price –Power –Performance –Functional requirements Application software often drives functional requirements (Figure 1.4, pg. 10) Architect must prioritize requirements and try to optimize design in light of all constraints.

CET Gannod8 Functional Requirements
Application area: general purpose, scientific, commercial servers, embedded computing
Level of software compatibility: programming language, binary
Operating system requirements: address space, memory management, protection
Standards: floating point, I/O bus, OS, networks, programming languages

CET Gannod9 Cost in a System
Cabinet: 6%
Processor board: 37%
I/O devices: 37%
Software: 20%
-- processor is single most expensive item (22%)
-- monitor is second most costly item (19%)
Cost vs. Price
-- cost is not the same as price
-- price includes direct costs (labor, scrap, warranty), gross margin (R&D, marketing, sales, building rental, etc.), discounts
-- changing cost by $1000 could change price by $3000-$4000

CET Gannod10 Performance When we say one computer has better performance than another, what do we mean? Different people may mean different things: –Single user –Manager of large, multi-user system

CET Gannod11 Performance – key terms
Execution time (response time)
–time to execute a job from start to finish
Wall-clock time (elapsed time)
CPU time
–user time
–system time
Throughput
–number of jobs processed per time unit
System performance refers to elapsed time on an unloaded system.
CPU performance refers to user CPU time on an unloaded system.

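CET Gannod12 Performance – key formulas Performance for completing task X is $\text{Performance}_X = 1 / \text{ExT}_X$. This allows us to compare the performance of different machines on the same task. Execution time for a program: $\text{ExT} = IC \times CPI \times CP$, where IC is the instruction count, CPI is the average number of clock cycles per instruction, and CP is the clock period.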

CET Gannod13 Improving Performance Improving performance is a hardware designer's main goal. Given the previous formulas, how can a designer improve performance?

CET Gannod14 If only it were that simple... Unfortunately, these factors are NOT independent –Changing instruction set to lower the instruction count may lead to an organization with a slower clock cycle time… –small IC may not be fastest because complex instructions require more clock cycles. There are many tradeoffs when designing for better performance.

CET Gannod15 Performance Equation: $\text{ExT} = IC \times CPI \times CP$
–IC is a function of the instruction set architecture and compiler technology.
–CPI is primarily a function of implementation.
–CP is primarily a function of the hardware technology.
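As a rough illustration of the performance equation ExT = IC × CPI × CP, and of why its factors cannot be tuned independently, the Python sketch below compares two designs. All numbers are hypothetical and chosen only for illustration; the function and variable names are not from the slides.

```python
def execution_time(ic, cpi, clock_period_s):
    """Performance equation: ExT = IC * CPI * CP (result in seconds)."""
    return ic * cpi * clock_period_s


def performance(ext_s):
    """Performance is the reciprocal of execution time."""
    return 1.0 / ext_s


# Hypothetical baseline: 1e9 instructions, CPI 2.0, 2 ns clock period.
baseline = execution_time(ic=1.0e9, cpi=2.0, clock_period_s=2.0e-9)

# Hypothetical redesign: 20% fewer instructions, but the more complex
# instructions raise the average CPI to 2.6 at the same clock period.
redesign = execution_time(ic=0.8e9, cpi=2.6, clock_period_s=2.0e-9)

print(f"baseline ExT = {baseline:.2f} s")
print(f"redesign ExT = {redesign:.2f} s")
print(f"relative performance = {performance(redesign) / performance(baseline):.3f}")
```

With these made-up numbers the redesign executes fewer instructions yet takes longer, because the increase in CPI outweighs the reduction in IC.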

CET Gannod16 Frequency vs. Period Execution time is based on the clock period. We are often given the clock rate (frequency) in MHz. What is MHz? What is relationship between rate (f) and clock period (cp)? Example –Clock rate 500 MHz –What’s the clock period?
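The relationship asked about here is cp = 1 / f, where 1 MHz means 10^6 cycles per second. A minimal Python sketch of the conversion follows; the function name is an illustrative choice, not something from the slides.

```python
def clock_period_ns(clock_rate_mhz):
    """Clock period in nanoseconds for a clock rate given in MHz (cp = 1 / f)."""
    clock_rate_hz = clock_rate_mhz * 1e6   # 1 MHz = 1e6 cycles per second
    return 1e9 / clock_rate_hz             # seconds per cycle, scaled to ns


# The slide's example: a 500 MHz clock.
print(f"{clock_period_ns(500):.1f} ns")
```

For the 500 MHz example this gives a clock period of 2 ns.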

CET Gannod17 Benchmarking -- key terms Workload: set of programs (instructions) that run on the computer system. Benchmarks: programs chosen specifically to measure performance. Workload is meant to predict typical performance. –Real Applications –Modified (or scripted) applications –Kernels – pieces of real programs –Toy benchmarks – small, well-known programs –Synthetic benchmarks – match average frequency of operations in real programs.

CET Gannod18 Benchmark suites A benchmark suite is a collection of benchmark programs that contain a variety of applications. –E.g., SPEC92 Advantage: weakness of one benchmark is lessened by presence of other benchmarks. When we have several benchmarks we need to summarize performance of entire suite to determine which system has better performance.

CET Gannod19 Summarizing Performance Typically we measure several different applications (not just 1) People like to have a single number to measure performance. Total Execution Time Weighted Execution Time Normalized Execution Time

CET Gannod20 Total Execution Time Simply add the execution times of all benchmarks in the suite. Prev Example: Arithmetic mean is closely related: $\frac{1}{n}\sum_{i=1}^{n} \text{ExT}_i$

CET Gannod21 Weighted Execution Time Are all programs run an equal number of times? Weighted arithmetic mean: $\sum_{i=1}^{n} w_i \times \text{ExT}_i$ Prev Example:
–w1 = .5, w2 = .5
–w1 = .909, w2 = .091
–w1 = .999, w2 = .001

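CET Gannod22 Normalized ExT Normalize times to a reference machine and then summarize. Geometric mean of normalized times: $\sqrt[n]{\prod_{i=1}^{n} \text{ExT ratio}_i}$, where $\text{ExT ratio}_i$ is the execution time of program i divided by its time on the reference machine. Prev Example:
–Normalize to A
–Normalize to C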

CET Gannod23 Pros and Cons of GM Pros: –GM is independent of running times of individual programs –Independent of base machine. Cons: –Does not predict execution time. –Encourages hardware and software designers to focus on benchmarks that are easiest to improve rather than the slowest ones.
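The summary measures discussed above can be sketched in Python as follows. The execution times are hypothetical (the example data the slides refer to is not present in this transcript), and the function names are illustrative.

```python
import math

# Hypothetical execution times in seconds for two programs (P1, P2).
# These are illustrative values, not the slides' example.
TIMES = {
    "A": [2.0, 40.0],
    "B": [8.0, 20.0],
}


def total_time(times):
    return sum(times)


def arithmetic_mean(times):
    return sum(times) / len(times)


def weighted_mean(times, weights):
    """Weights are the relative run frequencies and should sum to 1."""
    return sum(w * t for w, t in zip(weights, times))


def geometric_mean_normalized(times, reference):
    """Geometric mean of the per-program ratios time / reference time."""
    ratios = [t / r for t, r in zip(times, reference)]
    return math.prod(ratios) ** (1.0 / len(ratios))


for name, times in TIMES.items():
    print(name, total_time(times), arithmetic_mean(times),
          weighted_mean(times, [0.5, 0.5]))

# The B-to-A ratio of geometric means is the same for either reference machine.
ratio_ref_a = (geometric_mean_normalized(TIMES["B"], TIMES["A"])
               / geometric_mean_normalized(TIMES["A"], TIMES["A"]))
ratio_ref_b = (geometric_mean_normalized(TIMES["B"], TIMES["B"])
               / geometric_mean_normalized(TIMES["A"], TIMES["B"]))
print(ratio_ref_a, ratio_ref_b)
```

With these made-up times, machine B has the smaller total execution time, yet its geometric mean of normalized times is larger than A's whichever machine is the reference. That is one way to see both the "independent of base machine" property and the "does not predict execution time" drawback.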

CET Gannod24 Amdahl’s Law One important principle in computer design is: Make the common case fast. Amdahl’s Law defines the speedup that can be gained by using a particular feature: Not always possible to use enhancement:

CET Gannod25 Speedup: speedup = old ExT / new ExT

CET Gannod26 Example 1 Implementations of FP square root vary in performance. Suppose FPSQR is responsible for 20% of execution time, and all FP operations are responsible for 50% of execution time. Proposal 1: add hardware to speed up FPSQR by factor of 10. Proposal 2: make all FP instructions run 2 times faster. Which proposal should we accept?

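CET Gannod27 Solution $\text{Speedup}_{FPSQR} = \frac{1}{(1 - 0.2) + 0.2/10} = \frac{1}{0.82} \approx 1.22$; $\text{Speedup}_{FP} = \frac{1}{(1 - 0.5) + 0.5/2} = \frac{1}{0.75} \approx 1.33$. Speeding up all FP instructions gives the slightly larger overall speedup.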
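Amdahl's Law is commonly written as speedup = 1 / ((1 - F) + F / S), where F is the fraction of the original execution time that can use the enhancement and S is the speedup of that fraction. The Python sketch below applies this form to the two proposals of Example 1; the symbols F and S and the function name are illustrative choices, not notation from the slides.

```python
def amdahl_speedup(fraction_enhanced, speedup_enhanced):
    """Overall speedup when only part of the execution time is improved."""
    unaffected = 1.0 - fraction_enhanced
    improved = fraction_enhanced / speedup_enhanced
    return 1.0 / (unaffected + improved)


# Proposal 1: FPSQR is 20% of execution time and becomes 10 times faster.
speedup_fpsqr = amdahl_speedup(0.20, 10.0)

# Proposal 2: all FP operations are 50% of execution time and become 2 times faster.
speedup_fp = amdahl_speedup(0.50, 2.0)

print(f"FPSQR proposal: {speedup_fpsqr:.3f}")
print(f"FP proposal:    {speedup_fp:.3f}")
```

The comparison reflects the "make the common case fast" principle: a modest improvement to half of the execution time beats a large improvement to a fifth of it.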

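CET Gannod28 Calculating CPI Calculate from Performance Equation: $CPI = \text{ExT} / (IC \times CP)$ During design, we don't know what ExT is… Calculate CPI from detailed understanding of the architecture: $CPI = \sum_i (CPI_i \times IC_i) / IC$, where $CPI_i$ and $IC_i$ are the cycles per instruction and instruction count of instruction class i.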

CET Gannod29 Example 2 Consider 3 compilers (1, 2, 3) for the same machine. The machine has 3 classes (A, B, C) of instructions with the following characteristics:
Class  CPI
A      1
B      2
C      3
The clock rate is f = 100 MHz. For a particular program, the compilers generate code with the following IC values (in millions of instructions):
Compiler  IC A  IC B  IC C
Which compiler generates the fastest code?

CET Gannod30 Solution First, calculate the CPI for each compiler: Execution time for the 3 compilers:
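The per-compiler instruction counts for Example 2 are not present in this transcript, so the Python sketch below uses hypothetical counts purely to show the method: compute each compiler's average CPI from the class CPIs, then apply ExT = IC × CPI / f. Only the class CPIs (1, 2, 3) and the 100 MHz clock rate come from the slides.

```python
CLOCK_RATE_HZ = 100e6                     # f = 100 MHz (from the example)
CLASS_CPI = {"A": 1, "B": 2, "C": 3}      # class CPIs (from the example)

# Hypothetical instruction counts in millions; NOT the slide's values.
IC_MILLIONS = {
    1: {"A": 5, "B": 1, "C": 1},
    2: {"A": 10, "B": 1, "C": 1},
    3: {"A": 4, "B": 2, "C": 2},
}


def average_cpi(ic_by_class):
    """CPI = sum(CPI_i * IC_i) / IC."""
    cycles = sum(CLASS_CPI[c] * n for c, n in ic_by_class.items())
    return cycles / sum(ic_by_class.values())


def execution_time_s(ic_by_class):
    """ExT = IC * CPI / f, with IC converted from millions."""
    total_ic = sum(ic_by_class.values()) * 1e6
    return total_ic * average_cpi(ic_by_class) / CLOCK_RATE_HZ


results = {comp: execution_time_s(ics) for comp, ics in IC_MILLIONS.items()}
for comp, ics in IC_MILLIONS.items():
    print(f"compiler {comp}: CPI = {average_cpi(ics):.3f}, ExT = {results[comp]:.3f} s")

fastest = min(results, key=results.get)
print("fastest code from compiler", fastest)
```

Note that the compiler with the lowest average CPI is not necessarily the fastest; what matters is the product of IC and CPI.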

CET Gannod31 Measuring Performance factors
CP:
–Easy to do after the machine is built!
–During design, use timing estimators to measure critical paths in design
IC:
–Compilers, profilers
–Often interested in finding instruction mix as well (simulators/execution-based monitoring)
CPI:
–The simplistic example we did previously is not very accurate for modern processors (pipeline/cache effects)
–CPI_i = Pipeline CPI_i + Memory CPI_i

CET Gannod32 MIPS Million Instructions Per Second Suppose we want to compare performance of task on two different implementations of the SAME architecture. Same architecture => same IC Smaller sec/instr => Larger instr/sec

CET Gannod33 (native) MIPS Note: this measure is ONLY valid for comparing the SAME program on the SAME architecture! Consider two implementations:
–A: CPI = 1.5, freq = 400 MHz
–B: CPI = 1.8, freq = 500 MHz
Which has the better native MIPS rating?

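CET Gannod34 Solution $\text{Native MIPS}_A = \frac{400 \times 10^6}{1.5 \times 10^6} \approx 266.7$; $\text{Native MIPS}_B = \frac{500 \times 10^6}{1.8 \times 10^6} \approx 277.8$. Implementation B has the better native MIPS rating.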
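Native MIPS follows from the performance equation: MIPS = IC / (ExT × 10^6) = clock rate / (CPI × 10^6), which simplifies to the clock rate in MHz divided by CPI. A small Python sketch for the two implementations follows; the function name is an illustrative choice.

```python
def native_mips(clock_rate_mhz, cpi):
    """Native MIPS = clock rate in Hz / (CPI * 1e6) = clock rate in MHz / CPI."""
    return clock_rate_mhz / cpi


mips_a = native_mips(400, 1.5)   # implementation A
mips_b = native_mips(500, 1.8)   # implementation B

print(f"A: {mips_a:.1f} MIPS")
print(f"B: {mips_b:.1f} MIPS")
print("better rating:", "A" if mips_a > mips_b else "B")
```

Because both implementations run the same program on the same architecture, IC is identical and the higher MIPS rating does correspond to the shorter execution time in this case.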

CET Gannod35 MIPS, MOPS and other FLOPS MIPS : –Millions of Instructions Per Second MOPS: –Millions of Operations Per Second FLOPS –FLoating point Operations Per Second

CET Gannod36 MIPS is unreliable Recall Example #2
–f = 100 MHz
Compiler  IC A  IC B  IC C
Find MIPS rating
–C1:
–C2:
–C3:
Conclusions using MIPS:
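The instruction counts for this example are not present in the transcript, so the Python sketch below uses hypothetical counts to illustrate the point of the slide: when compilers produce different instruction counts, the highest MIPS rating need not belong to the fastest code. Only the class CPIs (A = 1, B = 2, C = 3) and the 100 MHz clock rate come from Example 2.

```python
CLOCK_RATE_HZ = 100e6                     # f = 100 MHz (from the example)
CLASS_CPI = {"A": 1, "B": 2, "C": 3}      # class CPIs (from the example)

# Hypothetical instruction counts in millions; NOT the slide's values.
IC_MILLIONS = {
    1: {"A": 5, "B": 1, "C": 1},
    2: {"A": 10, "B": 1, "C": 1},
    3: {"A": 4, "B": 2, "C": 2},
}


def execution_time_s(ic_by_class):
    """Total clock cycles divided by the clock rate."""
    cycles = sum(CLASS_CPI[c] * n for c, n in ic_by_class.items()) * 1e6
    return cycles / CLOCK_RATE_HZ


def mips(ic_by_class):
    """Millions of instructions executed per second of execution time."""
    return sum(ic_by_class.values()) / execution_time_s(ic_by_class)


for comp, ics in IC_MILLIONS.items():
    print(f"C{comp}: {mips(ics):.1f} MIPS, ExT = {execution_time_s(ics):.2f} s")

highest_mips = max(IC_MILLIONS, key=lambda c: mips(IC_MILLIONS[c]))
fastest = min(IC_MILLIONS, key=lambda c: execution_time_s(IC_MILLIONS[c]))
print("highest MIPS: C%d, fastest code: C%d" % (highest_mips, fastest))
```

With these made-up counts, the compiler with the highest MIPS rating also produces the slowest code, because it executes many more cheap instructions.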

CET Gannod37 Memory Hierarchy Computer systems usually have several different types of memory organized in a hierarchy. WHY? (Diagram: CPU, registers, cache, memory, disk)

CET Gannod38 Locality Locality of reference: programs tend to reuse data and instructions that they have used before (recently) Two types of locality –Temporal: recently accessed items are likely to be accessed in the near future –Spatial: items whose addresses are near one another tend to be referenced close together in time. Do we expect instructions or data to have a higher degree of locality?

CET Gannod39 Key Terms Cache hit: CPU finds requested data in cache Cache miss: requested data not in cache. Miss rate: fraction of cache accesses that result in miss Block: amount of data transferred between cache and memory Miss penalty: extra time taken to get requested cache block into cache. Page fault: requested data is not in memory.

CET Gannod40 Example 4 Suppose cache is 10 times faster than main memory and that cache can be used 90% of the time. How much speedup do we gain from using the cache? Using Amdahl's Law: $\text{speedup} = \frac{1}{(1 - 0.9) + 0.9/10} = \frac{1}{0.19} \approx 5.3$

CET Gannod41 CPU ExT and the Memory Hierarchy Need to expand our definition of CPU ExT: $\text{CPU ExT} = (\text{CPU clock cycles} + \text{memory stall cycles}) \times CP$

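CET Gannod42 Example 5 A machine has CPI 2 when all memory accesses hit cache. 40% of the instructions access data. If the miss penalty is 25 clock cycles and the miss rate is 2%, how much faster would the machine be if the hit rate were 100%? Assuming each instruction makes one instruction-fetch access plus 0.4 data accesses on average: $\text{ExT}_{ideal} = IC \times 2 \times CP$; $\text{ExT}_{cache} = IC \times (2 + 1.4 \times 0.02 \times 25) \times CP = IC \times 2.7 \times CP$, so the machine with a 100% hit rate would be $2.7 / 2 = 1.35$ times faster.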
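One way to work Example 5 is to add memory stall cycles per instruction to the ideal CPI. The Python sketch below assumes that every instruction makes one instruction-fetch access in addition to the 0.4 data accesses per instruction; that counting convention is an assumption, not something stated on the slide, and the function name is illustrative.

```python
def cpi_with_stalls(ideal_cpi, accesses_per_instr, miss_rate, miss_penalty):
    """Ideal CPI plus memory stall cycles per instruction."""
    stall_cycles = accesses_per_instr * miss_rate * miss_penalty
    return ideal_cpi + stall_cycles


IDEAL_CPI = 2.0        # CPI when all memory accesses hit the cache
MISS_RATE = 0.02       # 2% of accesses miss
MISS_PENALTY = 25      # clock cycles per miss

# Assumption: 1 instruction fetch + 0.4 data accesses per instruction.
ACCESSES_PER_INSTR = 1.0 + 0.4

real_cpi = cpi_with_stalls(IDEAL_CPI, ACCESSES_PER_INSTR, MISS_RATE, MISS_PENALTY)

# IC and the clock period are the same for both machines, so they cancel
# and the ratio of execution times equals the ratio of CPIs.
speedup_if_all_hits = real_cpi / IDEAL_CPI

print(f"CPI with misses = {real_cpi:.2f}")
print(f"speedup with 100% hit rate = {speedup_if_all_hits:.2f}")
```

If only the 0.4 data accesses per instruction were counted, the stall term would be smaller and the computed speedup would drop accordingly, so the answer depends on this convention.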