1 Chapter 1: Fundamentals of Computer Design Introduction, class of computers Instruction set architecture (ISA) Technology trend: performance, power,

Slides:



Advertisements
Similar presentations
COMPUTER ARCHITECTURE & OPERATIONS I Instructor: Yaohang Li.
Advertisements

CS1104: Computer Organisation School of Computing National University of Singapore.
Computer Abstractions and Technology
TU/e Processor Design 5Z032 1 Processor Design 5Z032 The role of Performance Henk Corporaal Eindhoven University of Technology 2009.
5/18/2015CPE 731, 4-Principles 1 Define and quantify dependability (1/3) How decide when a system is operating properly? Infrastructure providers now offer.
CSE 490/590, Spring 2011 CSE 490/590 Computer Architecture Intro II & Trends Steve Ko Computer Sciences and Engineering University at Buffalo.
Instruction Level Parallelism and Its Exploitation.
1 Copyright © 2012, Elsevier Inc. All rights reserved. Chapter 1 Fundamentals of Quantitative Design and Analysis Computer Architecture A Quantitative.
CPE 731 Advanced Computer Architecture Instruction Set Principles Dr. Gheith Abandah Adapted from the slides of Prof. David Patterson, University of California,
Chapter 1 CSF 2009 Computer Performance. Defining Performance Which airplane has the best performance? Chapter 1 — Computer Abstractions and Technology.
CPSC 614 Computer Architecture Lec 2 - Introduction EJ Kim Dept. of Computer Science Texas A&M University Adapted from CS 252 Spring 2006 UC Berkeley Copyright.
CIS629 Fall Lecture Performance Overview Execution time is the best measure of performance: simple, intuitive, straightforward. Two important.
Chapter 4 Assessing and Understanding Performance
1 Lecture 10: FP, Performance Metrics Today’s topics:  IEEE 754 representations  FP arithmetic  Evaluating a system Reminder: assignment 4 due in a.
CIS429/529 Winter 07 - Performance - 1 Performance Overview Execution time is the best measure of performance: simple, intuitive, straightforward. Two.
1 Chapter 4. 2 Measure, Report, and Summarize Make intelligent choices See through the marketing hype Key to understanding underlying organizational motivation.
CptS / E E COMPUTER ARCHITECTURE
CSE820 Lec 2 - Introduction Rich Enbody Based on slides by David Patterson.
1 Copyright © 2012, Elsevier Inc. All rights reserved. Chapter 1 Fundamentals of Quantitative Design and Analysis Computer Architecture A Quantitative.
Lecture 2: Technology Trends and Performance Evaluation Performance definition, benchmark, summarizing performance, Amdahl’s law, and CPI.
CS136, Advanced Architecture Introduction to Architecture (continued)
Eng. Mohammed Timraz Electronics & Communication Engineer University of Palestine Faculty of Engineering and Urban planning Software Engineering Department.
Computer Architecture- 1 Ping-Liang Lai ( 賴秉樑 ) Chapter 1 Fundamentals of Computer Design Computer Architecture 計算機結構.
Lecture 03: Fundamentals of Computer Design - Trends and Performance Kai Bu
1 Copyright © 2012, Elsevier Inc. All rights reserved. Chapter 1 Fundamentals of Quantitative Design and Analysis Computer Architecture A Quantitative.
Computer Architecture Lec 2 - Introduction. 01/19/10Lec 02-intro 2 Review from last lecture Computer Architecture >> instruction sets Computer Architecture.
Sogang University Advanced Computing System Chap 1. Computer Architecture Hyuk-Jun Lee, PhD Dept. of Computer Science and Engineering Sogang University.
The University of Adelaide, School of Computer Science
Eng. Mohammed Timraz Electronics & Communication Engineer University of Palestine Faculty of Engineering and Urban planning Software Engineering Department.
C OMPUTER O RGANIZATION AND D ESIGN The Hardware/Software Interface 5 th Edition Chapter 1 Computer Abstractions and Technology Sections 1.5 – 1.11.
1 CS/EE 362 Hardware Fundamentals Lecture 9 (Chapter 2: Hennessy and Patterson) Winter Quarter 1998 Chris Myers.
CS 5513: Computer Architecture Lecture 1: Introduction Daniel A. Jiménez The University of Texas at San Antonio
Advanced Computer Architecture Fundamental of Computer Design Instruction Set Principles and Examples Pipelining:Basic and Intermediate Concepts Memory.
CPE 731 Advanced Computer Architecture Technology Trends Dr. Gheith Abandah Adapted from the slides of Prof. David Patterson, University of California,
CS 5513 Computer Architecture Lecture 2 – More Introduction, Measuring Performance.
1 CS/COE0447 Computer Organization & Assembly Language CHAPTER 4 Assessing and Understanding Performance.
Performance Lecture notes from MKP, H. H. Lee and S. Yalamanchili.
CEN 316 Computer Organization and Design Assessing and Understanding Performance Mansour AL Zuair.
Computer Architecture CPSC 350
CS 3853/3851: Computer Architecture Lecture 1: Introduction Daniel A. Jiménez The University of Texas at San Antonio
Chapter 1 Technology Trends and Performance. Chapter 1 — Computer Abstractions and Technology — 2 Technology Trends Electronics technology continues to.
Morgan Kaufmann Publishers
Processor Structure and Function Chapter8:. CPU Structure  CPU must:  Fetch instructions –Read instruction from memory  Interpret instructions –Instruction.
Performance Performance
1 Lecture 2: Performance, MIPS ISA Today’s topics:  Performance equations  MIPS instructions Reminder: canvas and class webpage:
September 10 Performance Read 3.1 through 3.4 for Wednesday Only 3 classes before 1 st Exam!
EEL5708/Bölöni Lec 3.1 Fall 2006 Sept 1, 2006 Lotzi Bölöni EEL 5708 High Performance Computer Architecture Lecture 3 Review: Instruction Sets.
EGRE 426 Computer Organization and Design Chapter 4.
Chapter 1 — Computer Abstractions and Technology — 1 Uniprocessor Performance Constrained by power, instruction-level parallelism, memory latency.
EEL5708/Bölöni Lec 3.1 Fall 2004 Sept 1, 2004 Lotzi Bölöni Fall 2004 EEL 5708 High Performance Computer Architecture Lecture 3 Review: Instruction Sets.
Chapter 1 Performance & Technology Trends. Outline What is computer architecture? Performance What is performance: latency (response time), throughput.
Ch1. Fundamentals of Computer Design 1. Formulas ECE562 Advanced Computer Architecture Prof. Honggang Wang ECE Department University of Massachusetts Dartmouth.
Computer Architecture & Operations I
Morgan Kaufmann Publishers Technology Trends and Performance
Measuring Performance II and Logic Design
Lecture 2: Performance Evaluation
Fundamentals of Computer Design
Lecture 3: MIPS Instruction Set
Chapter 1: Fundamentals of Quantitative Design and Analysis
HY425 – Αρχιτεκτονική Υπολογιστών Διάλεξη 02
Morgan Kaufmann Publishers
COSC 3406: Computer Organization
Computer Architecture CSCE 350
CPE 432 Computer Design 1 – Introduction and Technology Trends
Appendix A Classifying Instruction Set Architecture
Computer Evolution and Performance
Lecture 3: MIPS Instruction Set
The University of Adelaide, School of Computer Science
Utsunomiya University
Presentation transcript:

1 Chapter 1: Fundamentals of Computer Design Introduction, class of computers Instruction set architecture (ISA) Technology trend: performance, power, cost Dependability Measuring performance CDA5155 Spring, 2008, Peir / University of Florida

2 Microprocessor Performance Trends

3 Conventional Wisdom Old CW: Uniprocessor performance 2X / 1.5 yrs New CW: Power Wall + ILP Wall + Memory Wall = New Brick Wall  Uniprocessor performance now 2X / 5(?) yrs  Sea change in chip design: multiple “cores” (2X processors per chip / ~ 2 years) More simpler processors are more power efficient Exploit TLP and DLP, not ILP Programmer / compiler involvement

4 Classes of Computers Desk top –Still largest market in dollar amount –Driven by price-performance –Application-driven performance evaluation Server –High performance, high power –Availability, scalability –Designed for efficient throughput Embedded system –Largest volume –Real-time performance requirement –Minimize memory and power

5 Computer Architecture Old Definition –Old definition of computer architecture = instruction set design Other aspects of computer design called implementation Insinuates implementation is uninteresting or less challenging –Right view is computer architecture >> ISA –Architect’s job much more than instruction set design; technical hurdles today more challenging than instruction set design New Definition –What really matters is the functioning of the complete system hardware, runtime system, compiler, operating system, application In networking, called the “End to End argument” –Computer architecture is not just about transistors, individual instructions, or particular implementations E.g., RISC replaced complex instr. with compiler + simple instr.

6 ISA An instruction set architecture is a specification of a standardized programmer-visible interface to hardware, comprised of: –A set of instructions (instruction types and operations) With associated argument fields, assembly syntax, and machine encoding. –A set of named storage locations and addressing Registers, memory, … Programmer-accessible caches? –A set of addressing modes (ways to name locations) –Types and sizes of operands –Control flow instructions –Often an I/O interface (usually memory-mapped)

7 Example: MIPS 0 r0 r1 ° r31 PC lo hi Programmable storage 2^32 x bytes 31 x 32-bit GPRs (R0=0) 32 x 32-bit FP regs (paired DP) HI, LO, PC Data types ? Format ? Addressing Modes? Arithmetic logical Add, AddU, Sub, SubU, And, Or, Xor, Nor, SLT, SLTU, AddI, AddIU, SLTI, SLTIU, AndI, OrI, XorI, LUI SLL, SRL, SRA, SLLV, SRLV, SRAV Memory Access LB, LBU, LH, LHU, LW, LWL,LWR SB, SH, SW, SWL, SWR Control J, JAL, JR, JALR BEq, BNE, BLEZ,BGTZ,BLTZ,BGEZ,BLTZAL,BGEZAL 32-bit instructions on word boundary

8 MIPS64 Instruction Format

9 Overview of This Course Understanding the design techniques, machine structures, technology factors, evaluation methods that determine the form of computers in 21st Century Technology Programming Languages Operating Systems History Applications Interface Design (ISA) Measurement & Evaluation Parallelism Computer Architecture: Organization Hardware/Software Boundary Compilers

10 Technology Trend Drill down into 4 technologies: –Disks, –Memory, –Network, –Processors Compare ~1980 vs. ~2000 –Performance Milestones in each technology Compare for Bandwidth vs. Latency improvements in performance over time –Bandwidth: number of events per unit time E.g., M bits / second over network, M bytes / second from disk –Latency: elapsed time for a single event E.g., one-way network delay in microseconds, average disk access time in milliseconds

11 Disk Comparison CDC Wren I, RPM 0.03 GBytes capacity Tracks/Inch: 800 Bits/Inch: 9550 Three 5.25” platters Bandwidth: 0.6 MBytes/sec Latency: 48.3 ms Cache: none Seagate , RPM (4X) 73.4 GBytes (2500X) Tracks/Inch: (80X) Bits/Inch: 533,000 (60X) Four 2.5” platters (in 3.5” form factor) Bandwidth: 86 MBytes/sec (140X) Latency: 5.7 ms (8X) Cache: 8 MBytes

12 Memory Comparison 1980 DRAM (asynchronous) 0.06 Mbits/chip 64,000 xtors, 35 mm2 16-bit data bus per module, 16 pins/chip 13 Mbytes/sec Latency: 225 ns (no block transfer) 2000 Double Data Rate Synchr. (clocked) DRAM Mbits/chip (4000X) 256,000,000 xtors, 204 mm 2 64-bit data bus per DIMM, 66 pins/chip (4X) 1600 Mbytes/sec (120X) Latency: 52 ns (4X) Block transfers (page mode)

13 LAN Comparison Ethernet Year of Standard: Mbits/s link speed Latency: 3000  sec Shared media Coaxial cable Ethernet 802.3ae Year of Standard: ,000 Mbits/s(1000X) link speed Latency: 190  sec (15X) Switched media Category 5 copper wire Copper core Insulator Braided outer conductor Plastic Covering Copper, 1mm thick, twisted to avoid antenna effect Twisted Pair: "Cat 5" is 4 twisted pairs in bundle

14 CPU Comparison 1982 Intel MHz 2 MIPS (peak) Latency 320 ns 134,000 xtors, 47 mm 2 16-bit data bus, 68 pins Microcode interpreter, separate FPU chip (no caches) 2001 Intel Pentium MHz(120X) 4500 MIPS (peak) (2250X) Latency 15 ns (20X) 42,000,000 xtors, 217 mm 2 64-bit data bus, 423 pins 3-way superscalar, Dynamic translate to RISC, Superpipelined (22 stage), Out-of-Order execution On-chip 8KB Data caches, 96KB Instr. Trace cache, 256KB L2 cache

15 Bandwidth vs. Latency Performance Milestones: Processor: ‘286, ‘386, ‘486, Pentium, Pentium Pro, Pentium 4 (21x,2250x) Ethernet: 10Mb, 100Mb, 1000Mb, Mb/s (16x,1000x) Memory Module: 16bit plain DRAM, Page Mode DRAM, 32b, 64b, SDRAM, DDR SDRAM (4x,120x) Disk : 3600, 5400, 7200, 10000, RPM (8x, 143x)

16 Summary on Technology Trend For disk, LAN, memory, and microprocessor, bandwidth improves by square of latency improvement –In the time that bandwidth doubles, latency improves by no more than 1.2X to 1.4X Lag probably even larger in real systems, as bandwidth gains multiplied by replicated components –Multiple processors in a cluster or even in a chip –Multiple disks in a disk array –Multiple memory modules in a large memory –Simultaneous communication in switched LAN HW and SW developers should innovate assuming Latency Lags Bandwidth –If everything improves at the same rate, then nothing really changes –When rates vary, require real innovation

17 Define and Quantity Power For CMOS, traditional dominant energy consumption has been in switching transistors, called dynamic power For mobile devices, energy better metric For fixed task, slowing clock rate (frequency switched) reduces power, but not energy Capacitive load, a function of number of transistors connected to output and technology, which determines capacitance of wires and transistors Dropping voltage helps both, so went from 5V to 1V Turn off clock to save energy & dynamic power

18 Example Suppose 15% reduction in voltage results in a 15% reduction in frequency. What is impact on dynamic power?

19 Static Power Because leakage current flows even when a transistor is off, now static power important too Leakage current increases in processors with smaller transistor sizes Increasing the number of transistors increases power even if they are turned off In 2006, goal for leakage is 25% of total power consumption; high performance designs at 40% Very low power systems even gate voltage to inactive modules to control loss due to leakage

20 Define and Quantity Dependability How decide when a system is operating properly? Infrastructure providers now offer Service Level Agreements (SLA) to guarantee that their networking or power service would be dependable Systems alternate between 2 states of service with respect to an SLA: Service accomplishment, where the service is delivered as specified in SLA Service interruption, where the delivered service is different from the SLA Failure = transition from state 1 to state 2 Restoration = transition from state 2 to state 1

21 Dependability (cont.) Module reliability = measure of continuous service accomplishment (or time to failure). 2 metrics: 1.Mean Time To Failure (MTTF) measures Reliability 2.Failures In Time (FIT) = 1/MTTF, the rate of failures –Traditionally reported as failures per billion hours of operation Mean Time To Repair (MTTR) measures Service Interruption –Mean Time Between Failures (MTBF) = MTTF+MTTR Module availability measures service as alternate between the 2 states of accomplishment and interruption (number between 0 and 1, e.g. 0.9) Module availability = MTTF / ( MTTF + MTTR)

22 Example If modules have exponentially distributed lifetimes (age of module does not affect probability of failure), overall failure rate is the sum of failure rates of the modules Calculate FIT and MTTF for 10 disks (1M hour MTTF per disk), 1 disk controller (0.5M hour MTTF), and 1 power supply (0.2M hour MTTF): 17,000 failure per billion hours

23 Performance Measurement Performance metrics: execution time Other metrics –Wall-clock time, response time, elapsed time –CPU time: user or system –We will focus on CPU performance, i.e. user CPU time on unloaded system

24 Benchmark Suites Desktop –New SPEC CPU2006 (Fig. 1.13) –SPEC CPU2000: 11 integer, 14 floating-point –SPECviewperf, SPECapc: graphics benchmarks Server –SPEC CPU2000: running multiple copies, SPECrate –SPECSFS: for NFS performance –SPECWeb: Web server benchmark –TPC-x: measure transaction-processing, queries, and decision making database applications Embedded Processor –New area –EEMBC: EDN Embedded Microprocessor Benchmark Consortium

25 SPEC CPU Benchmarks

26 Comparing Performance Arithmetic Mean: Weighted Arithmetic Mean: Geometric Mean: –Execution time ratio is normalized to a base machine –Is used to figure out SPECrate

27 SPECRatio SPECRatio: Normalize execution times to reference computer, yielding a ratio proportional to performance = time on reference computer time on computer being rated If program SPECRatio on Computer A is 1.25 times bigger than Computer B, then

28 Summarize Suite Performance Since ratios, proper mean is geometric mean (SPECRatio unitless, so arithmetic mean meaningless) Geometric mean of the ratios is the same as the ratio of the geometric means Ratio of geometric means = Geometric mean of performance ratios  choice of reference computer is irrelevant! These two points make geometric mean of ratios attractive to summarize performance

29 Performance, Price-Performance (SPEC)

30 Performance, Price-Performance (TPC-C)

31 Amdahl’s Law Where: f is a fraction of the execution time that can be enhanced n is the enhancement factor Example: f =.9, n = 10 => Speedup = 5.26

32 CPU Performance Equation Clock Cycle Time: Hardware technology and organization CPI: Organization and Inst Set Architecture (ISA) Instruction Count: ISA and compiler technology We will focus more on the organization issues

33 Example Parameters: –FP operations (including FPSQR) = 25% –CPI for FP operations = 4; CPI for others = 1.33 –Frequency of FPSQR = 2%; CPI of FPSQR = 20 Compare the following 2 designs: –Decrease CPI of FPSQR to 2; or CPI of all FP to 2.5

34 Misc. Items Check SPEC web site for more information, Read Fallacies and Pitfalls –For example, Fallacy MIPS is an accurate measure for comparing performance among computers is a Fallacy

35 Example Using MIPS Instruction distribution: –ALU: 43%, 1 cycle/inst –Load: 21%, 2 cycle/inst –Store: 12%, 2 cycle/inst –Branch: 24%, 2 cycle/inst Optimization compiler reduces 50% of ALU