Fundamentals of Computer Design

Presentation transcript:

Fundamentals of Computer Design. Outline: Introduction; Classes of Computers; Defining Computer Architecture; Trends in Technology; Trends in Power and Energy in Integrated Circuits; Trends in Cost; Dependability; Measuring and Reporting Performance; Quantitative Principles of Computer Design. CDA 5155 – Fall 2017. Copyright © 2017 Prabhat Mishra

Microprocessor Performance Trends [Figure; annotations: RISC, move to multi-processor]

Design Complexity: exponential growth, with the number of transistors doubling every couple of years.

Technology and Demand. Technology: the number of transistors doubles every 2 years. Demand: communication, multimedia, entertainment, networking. Result: exponential growth of design complexity and verification complexity.

Computer Market. Desktop: driven by price-performance; system price $1000 - $10,000 [$100 - $1000 per processor]. Server: throughput, availability, scalability; $10K - $10M [$200 - $2000 per processor]. Embedded Systems: application specific; low cost, low power, real-time performance; $10 - $100,000 [$0.20 - $200 per processor].

An Example Embedded System: Digital Camera Block Diagram

Components of Embedded Systems: processor, coprocessors, ASIC, memory, controllers, interface, converters (analog-to-digital and digital-to-analog), and software (application programs).

Computer Architecture. Definition: instruction set architecture (ISA), the programmer (user) view. Implementation: organization (CPU, memory, buses, I/O) and hardware (logic design, packaging technology). Computer design must meet functional requirements and area, performance, cost, and power goals. Optimize, evaluate, and explore to find the best possible architecture. Consider other factors: time-to-market, technology trend, safety, reliability, …

Instruction-Set Architecture (ISA). An instruction set architecture is a specification of a standardized programmer-visible interface to hardware, comprising: a set of instructions (instruction types and operations), with associated argument fields, assembly syntax, and binary encoding; a set of named storage locations and addressing (registers, memory, … programmer-accessible caches?); a set of addressing modes (ways to name locations); types and sizes of operands; control flow instructions; and often an I/O interface (usually memory-mapped).

Example: MIPS. Programmable storage: $2^{32}$ bytes of memory; 31 x 32-bit GPRs (R0 = 0); 32 x 32-bit FP regs (paired for DP); HI, LO, PC. Data types? Format? Addressing modes? Arithmetic and logical: ADD, ADDU, SUB, SUBU, AND, OR, XOR, NOR, SLT, SLTU, ADDI, ADDIU, SLTI, SLTIU, ANDI, ORI, XORI, LUI, SLL, SRL, SRA, SLLV, SRLV, SRAV. Memory access: LB, LBU, LH, LHU, LW, LWL, LWR, SB, SH, SW, SWL, SWR. Control: J, JAL, JR, JALR, BEQ, BNE, BLEZ, BGTZ, BLTZ, BGEZ, BLTZAL, BGEZAL.

MIPS64 Instruction Format

MIPS Implementation

Pipelined Implementation

Technology Trend. Component trends: IC technology: transistors per chip increase 55% per year. DRAM: density increases 40-60% per year. Magnetic disk: density increases 100% per year. Network: Ethernet took 10 years to go from 10 Mb/s to 100 Mb/s, and 5 years to go from 100 Mb/s to 1 Gb/s. Scaling of performance, wires and power: feature size was 10 micron in 1971 and 0.18 micron in 2001, …; microprocessor organization improvement; wiring delay; power issue: ~100 watts for a 2 GHz Pentium 4.

Bandwidth vs. Latency Latency improvement is 6-80X while bandwidth improvement is 300-25000X.

Power. The Intel 80386 consumed ~2 W. A 3.3 GHz Intel Core i7 consumes 130 W. Heat must be dissipated from a 1.5 x 1.5 cm chip. This is the limit of what can be cooled by air.

Power and Energy [Figure: power P versus time t, with energy E as the area under the curve.] In many cases, faster execution means less energy, but the opposite may be true if power has to be increased to allow faster execution.

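Power and Energy. Power is drawn from a voltage source. Power: $P(t) = I(t)\,V(t)$. Energy: $E = \int_0^T P(t)\,dt$. Average power: $P_{avg} = \frac{E}{T} = \frac{1}{T}\int_0^T P(t)\,dt$.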

Dynamic Power: the power needed to charge and discharge load capacitances when transistors switch. The capacitor needs to charge for the output to be ‘1’; for the output to be ‘0’, the capacitor needs to discharge. This repeats $T \cdot f_{sw}$ times over an interval of $T$, where $f_{sw} = \alpha f$ is the switching frequency, so $P_{dynamic} = \alpha C V_{DD}^2 f$. Here, $\alpha$ is the activity factor and $f$ is the clock frequency.
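
As a minimal sketch of the switching-power relation, the snippet below evaluates the standard CMOS model $P = \alpha C V^2 f$. The function name and all example values (activity factor, capacitance, voltage, frequency) are assumptions chosen for illustration, not figures from the slides.

```python
def dynamic_power(activity_factor, capacitance_f, voltage_v, clock_hz):
    """Switching power in watts under the standard model P = alpha * C * V^2 * f."""
    return activity_factor * capacitance_f * voltage_v ** 2 * clock_hz


# Hypothetical chip: alpha = 0.1, 1 nF of switched capacitance, 1.0 V supply, 1 GHz clock.
baseline = dynamic_power(0.1, 1e-9, 1.0, 1e9)

# Same hypothetical chip at half the supply voltage: power falls by a factor of four.
half_voltage = dynamic_power(0.1, 1e-9, 0.5, 1e9)

print(f"baseline: {baseline:.3f} W, half voltage: {half_voltage:.3f} W")
```

The quadratic dependence on voltage is why lowering the supply voltage is usually a more effective power lever than lowering frequency alone.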

Static Power. Because leakage current flows even when a transistor is off, static power is now important too. Leakage current increases in processors with smaller transistor sizes. Increasing the number of transistors increases power even if they are turned off. In 2006, the goal for leakage was 25% of total power consumption, with high-performance designs at 40%. Very low power systems even gate the voltage to inactive modules to control loss due to leakage.

Reducing Energy Consumption [Figure: thermal images of a Pentium and a Crusoe running the same multimedia application. Source: www.transmeta.com] Infrared cameras (FLIR) can be used to detect thermal distribution.

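Dynamic Power Management (DPM). RUN: operational. IDLE: a SW routine may stop the CPU when not in use, while monitoring interrupts. SLEEP: shutdown of on-chip activity. [Figure: power state machine of the StrongARM SA-1100. Power per state: RUN 400 mW, IDLE 50 mW, SLEEP 160 µW. Transition times: RUN to IDLE 10 µs, IDLE to RUN 10 µs, RUN to SLEEP 90 µs, IDLE to SLEEP 90 µs, SLEEP to RUN 160 ms.]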

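Dynamic Voltage Scaling (DVS). $E = P \times T$ and $P \propto V^2$, where E is energy, P is power, T is time, and V is voltage. Example: a task is given with workload (W) and deadline (D). Assume that idle energy is negligible. Running at voltage $V$, the task finishes in time $T$: $E_1 \propto V_1^2 \, T_1 = V^2 T$. Running at voltage $V/2$, the task takes $2T$ (assuming $2T \le D$): $E_2 \propto V_2^2 \, T_2 = \frac{V^2}{4} \cdot 2T = \frac{E_1}{2}$.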
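
The DVS example can be checked numerically. This sketch assumes, as the slide does, that power scales with $V^2$ and that idle energy is negligible, and it additionally assumes that execution time grows in inverse proportion to voltage (halving V doubles T). The function name is hypothetical.

```python
def relative_energy(voltage_scale):
    """Energy relative to full-voltage execution, assuming P ~ V^2 and T ~ 1/V."""
    relative_power = voltage_scale ** 2
    relative_time = 1.0 / voltage_scale
    return relative_power * relative_time


for scale in (1.0, 0.75, 0.5):
    print(f"V x {scale}: time x {1.0 / scale:.2f}, energy x {relative_energy(scale):.2f}")
```

Under these assumptions energy falls linearly with voltage, so a task should run as slowly as its deadline allows.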

Multicores – Low Power? One core with frequency 2 GHz versus two cores with 1 GHz frequency (each): same performance, but the two 1 GHz cores require half the power/energy. With $\text{Power} \propto \text{freq}^2$, a 1 GHz core needs one-fourth the power of a 2 GHz core. New challenges: performance concerns (how to keep the cores busy?), reliability concerns (MTTF gets worse!), and more …
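
A rough sketch of this comparison, using the slide's own simplification that per-core power grows with the square of frequency and assuming the workload splits perfectly across cores. The function and parameter names are illustrative.

```python
def total_relative_power(num_cores, freq_ghz, reference_freq_ghz=2.0):
    """Total power relative to one core at the reference frequency, assuming P ~ f^2 per core."""
    per_core = (freq_ghz / reference_freq_ghz) ** 2
    return num_cores * per_core


single = total_relative_power(1, 2.0)  # one 2 GHz core
dual = total_relative_power(2, 1.0)    # two 1 GHz cores

print(f"one 2 GHz core: {single:.2f}, two 1 GHz cores: {dual:.2f}")
```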

DRAM Pricing © 2003 Elsevier Science (USA). All rights reserved.

Processor Pricing (Intel Pentium III) © 2003 Elsevier Science (USA). All rights reserved.

Silicon Wafer This 300 mm wafer contains 280 Intel Core i7 dies, each 20.7 by 10.5 mm in a 32 nm process.

Intel Core i7 Die. The dimensions are 18.9 mm by 13.6 mm (257 mm²) in a 45 nm process. (Courtesy Intel.)

Floorplan of Intel Core i7

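Integrated Circuit Cost. Cost of integrated circuit $= \frac{\text{cost of die} + \text{cost of testing die} + \text{cost of packaging and final test}}{\text{final test yield}}$. Cost of die $= \frac{\text{cost of wafer}}{\text{dies per wafer} \times \text{die yield}}$. Dies per wafer $= \frac{\pi \times (\text{wafer diameter}/2)^2}{\text{die area}} - \frac{\pi \times \text{wafer diameter}}{\sqrt{2 \times \text{die area}}}$. Bose-Einstein formula: die yield $= \text{wafer yield} \times \frac{1}{(1 + \text{defects per unit area} \times \text{die area})^N}$. Defects per unit area = 0.016-0.057 defects per cm² (2010). N = process-complexity factor = 11.5-15.5 (40 nm, 2010).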
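
A minimal sketch of the yield calculation, assuming the usual textbook forms: dies per wafer = π(d/2)²/A − πd/√(2A), and the Bose-Einstein die yield = wafer yield / (1 + defect density × A)^N. The wafer diameter, die area, defect density, and wafer cost used below are example inputs picked for illustration; only the 2010 ranges for defect density and N come from the slide.

```python
import math


def dies_per_wafer(wafer_diameter_cm, die_area_cm2):
    """Gross dies per wafer: wafer area over die area, minus dies lost along the edge."""
    wafer_area = math.pi * (wafer_diameter_cm / 2) ** 2
    edge_loss = math.pi * wafer_diameter_cm / math.sqrt(2 * die_area_cm2)
    return wafer_area / die_area_cm2 - edge_loss


def die_yield(defects_per_cm2, die_area_cm2, complexity_n, wafer_yield=1.0):
    """Bose-Einstein yield model: fraction of dies that are good."""
    return wafer_yield / (1 + defects_per_cm2 * die_area_cm2) ** complexity_n


def cost_per_good_die(wafer_cost, wafer_diameter_cm, die_area_cm2,
                      defects_per_cm2, complexity_n):
    """Wafer cost divided by the expected number of good dies."""
    good_dies = (dies_per_wafer(wafer_diameter_cm, die_area_cm2)
                 * die_yield(defects_per_cm2, die_area_cm2, complexity_n))
    return wafer_cost / good_dies


# Example inputs (assumed): 30 cm wafer, 1.5 cm x 1.5 cm die, 0.031 defects/cm^2, N = 13.5.
print(f"dies per wafer: {dies_per_wafer(30, 2.25):.0f}")
print(f"die yield: {die_yield(0.031, 2.25, 13.5):.2f}")
# The wafer cost of 5000 is a made-up figure used only to exercise the function.
print(f"cost per good die: {cost_per_good_die(5000, 30, 2.25, 0.031, 13.5):.2f}")
```

Because die area appears both in the dies-per-wafer term and inside the yield exponent, cost per good die grows much faster than linearly with die area.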

Define and Quantify Dependability. How do we decide when a system is operating properly? Infrastructure providers now offer Service Level Agreements (SLAs) to guarantee that their networking or power service will be dependable. Systems alternate between 2 states of service with respect to an SLA. State 1: service accomplishment, where the service is delivered as specified in the SLA. State 2: service interruption, where the delivered service is different from the SLA. Failure = transition from state 1 to state 2. Restoration = transition from state 2 to state 1.

Dependability. Module reliability = measure of continuous service accomplishment (or time to failure). Two metrics: (1) Mean Time To Failure (MTTF) measures reliability; Failures In Time (FIT) = 1/MTTF is the rate of failures, traditionally reported as failures per billion hours of operation. (2) Mean Time To Repair (MTTR) measures service interruption. Mean Time Between Failures (MTBF) = MTTF + MTTR. Module availability measures service as it alternates between the 2 states of accomplishment and interruption (a number between 0 and 1, e.g. 0.9): module availability = MTTF / (MTTF + MTTR).
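
The definitions above translate directly into code. In this sketch the function names and the sample MTTF/MTTR values are assumptions made for illustration.

```python
def availability(mttf_hours, mttr_hours):
    """Fraction of time the module delivers service: MTTF / (MTTF + MTTR)."""
    return mttf_hours / (mttf_hours + mttr_hours)


def fit_from_mttf(mttf_hours):
    """Failures per billion hours of operation for a module with the given MTTF."""
    return 1e9 / mttf_hours


# Hypothetical module: fails on average every 1,000,000 hours and takes 24 hours to repair.
print(f"availability: {availability(1_000_000, 24):.6f}")
print(f"FIT: {fit_from_mttf(1_000_000):.0f}")
```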

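Example: If modules have exponentially distributed lifetimes (the age of a module does not affect its probability of failure), the overall failure rate is the sum of the failure rates of the modules. Calculate FIT and MTTF for 10 disks (1M hour MTTF per disk), 1 disk controller (0.5M hour MTTF), and 1 power supply (0.2M hour MTTF): failure rate $= 10 \times \frac{1}{1{,}000{,}000} + \frac{1}{500{,}000} + \frac{1}{200{,}000} = \frac{10 + 2 + 5}{1{,}000{,}000} = \frac{17}{1{,}000{,}000}$ per hour $= 17{,}000$ FIT, so MTTF $= 1{,}000{,}000{,}000 / 17{,}000 \approx 59{,}000$ hours.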
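
The disk-subsystem calculation can be reproduced with a short script. It relies on the stated assumption of exponentially distributed lifetimes, so that module failure rates simply add; the function name is illustrative.

```python
def system_failure_rate(mttf_hours_per_module):
    """Failures per hour for a system that fails when any one module fails."""
    return sum(1.0 / mttf for mttf in mttf_hours_per_module)


# 10 disks (1M hours each), 1 disk controller (0.5M hours), 1 power supply (0.2M hours).
modules = [1_000_000] * 10 + [500_000, 200_000]

rate = system_failure_rate(modules)
system_fit = rate * 1e9      # failures per billion hours
system_mttf = 1.0 / rate     # hours

print(f"FIT: {system_fit:.0f}, MTTF: {system_mttf:.0f} hours")
```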

Performance Measurement. Performance metric: execution time; increasing performance decreases execution time. Other metrics: wall-clock time, response time, elapsed time; CPU time (user or system). We will focus on CPU performance, i.e., user CPU time on an unloaded system.

Choosing Programs to Evaluate Performance (accuracy decreases down the list). Real applications, for example: gcc compiler, Microsoft Word. Modified (or scripted) applications, for example: remove I/O, script to simulate interactive behavior. Kernels, for example: Livermore loops, Linpack. Toy benchmarks, for example: Sieve of Eratosthenes, quicksort. Synthetic benchmarks, for example: Whetstone, Dhrystone.

Benchmark Suites. Desktop: new SPEC CPU2006; SPEC CPU2000 (11 integer, 14 floating-point); SPECviewperf, SPECapc (graphics benchmarks). Server: SPEC CPU2000 (running multiple copies); SPECSFS (for NFS performance); SPECWeb (Web server benchmark); TPC-x (measures transaction-processing, queries, and decision-making database applications). Embedded processor: EEMBC (EDN Embedded Microprocessor Benchmark Consortium).

SPEC CPU Benchmarks

Reporting Performance. Performance should be reproducible: give a description of the machine and compiler flags, and report results for both the baseline and the optimized version. Source code modifications: not allowed in SPEC benchmarks; allowed but difficult or impossible in practice (e.g., TPC-C using Oracle or SQL database); allowed in supercomputer benchmarks (modify or re-write algorithms); hand-coding in assembly is allowed for the EEMBC benchmark.

Comparing Performance. Arithmetic mean: $\frac{1}{n}\sum_{i=1}^{n} \text{Time}_i$. What is the mixture of programs in the workload? Arithmetic means for the three machines compared: 500.5, 55, and 20.

Comparing Performance. Weighted arithmetic mean: $\sum_{i=1}^{n} \text{Weight}_i \times \text{Time}_i$. What if programs are fixed and inputs are not?

Comparing Performance. Geometric mean: $\sqrt[n]{\prod_{i=1}^{n} \text{Execution time ratio}_i}$, where each execution time ratio is normalized to a base machine. The reference machine is not important: the arithmetic means of normalized times differ depending on which machine is used as the basis, but the geometric means are the same. The geometric mean does not predict execution time.

Normalized Execution Times (SPECRatio). The geometric mean does not predict execution time: the performance of machines A and B is the same only if program P1 is executed 100 times for every occurrence of program P2. It also rewards easy enhancements: improving program P3 (2 to 1) counts the same as improving program P4 (1000 to 500).
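
The behavior of the different means can be seen in a small experiment. The execution times below are assumed values for two programs on two machines, chosen to be consistent with the arithmetic means of 500.5 and 55 quoted earlier; the helper names are illustrative.

```python
from math import prod


def arithmetic_mean(values):
    return sum(values) / len(values)


def geometric_mean(ratios):
    return prod(ratios) ** (1.0 / len(ratios))


def normalized(times, reference):
    """Execution time ratios relative to a reference machine."""
    return [t / r for t, r in zip(times, reference)]


# Assumed execution times in seconds for programs P1 and P2 (illustrative values).
machine_a = [1.0, 1000.0]
machine_b = [10.0, 100.0]

print("arithmetic mean of times:", arithmetic_mean(machine_a), arithmetic_mean(machine_b))

# Arithmetic mean of normalized times depends on the reference:
# each machine looks slower than the other when the other is the baseline.
print("B normalized to A:", arithmetic_mean(normalized(machine_b, machine_a)))
print("A normalized to B:", arithmetic_mean(normalized(machine_a, machine_b)))

# Geometric mean gives the same verdict whichever machine is the reference.
gm_b_vs_a = geometric_mean(normalized(machine_b, machine_a))
gm_a_vs_b = geometric_mean(normalized(machine_a, machine_b))
print("geometric means:", gm_b_vs_a, gm_a_vs_b)
```

With these inputs the geometric mean rates the two machines as equal, yet total time is equal only when P1 runs 100 times per run of P2 (100 x 1 + 1000 versus 100 x 10 + 100), which is why the geometric mean does not predict execution time.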

Amdahl’s Law. Make the common case fast. The performance improvement to be gained from using some faster mode of execution is limited by the fraction of the time the faster mode can be used: $\text{Speedup} = \frac{1}{(1 - f) + f/n}$, where $f$ is the fraction of the execution time that can be enhanced and $n$ is the enhancement factor. Example: $f = 0.1$, $n = 10$ gives Speedup $= 1 / (0.9 + 0.01) \approx 1.1$.

Application of Amdahl’s Law. Amdahl’s law is useful for comparing the overall performance of two design alternatives. Example: floating-point (FP) operations consume 50% of the execution time of a graphics application, and FP square root (FPSQRT) accounts for 20% of the execution time. Case 1: improve FPSQRT execution by 10 times: Speedup = 1 / ((1-0.2) + 0.2/10) = 1.22. Case 2: improve all FP operations by 1.6 times: Speedup = 1 / ((1-0.5) + 0.5/1.6) = 1.23. Because FP operations account for a larger share of the execution time, the modest improvement to all FP operations (case 2) yields a slightly larger gain than the drastic improvement of FPSQRT (case 1).
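
Both design alternatives can be evaluated with a one-line implementation of Amdahl's law, Speedup = 1 / ((1 - f) + f/n). The function name is an assumption; the inputs are the ones given in the example.

```python
def amdahl_speedup(fraction_enhanced, enhancement_factor):
    """Overall speedup when a fraction f of execution time is made n times faster."""
    return 1.0 / ((1.0 - fraction_enhanced) + fraction_enhanced / enhancement_factor)


fast_fpsqrt = amdahl_speedup(0.2, 10)   # case 1: FPSQRT 10 times faster
fast_all_fp = amdahl_speedup(0.5, 1.6)  # case 2: all FP operations 1.6 times faster

print(f"case 1: {fast_fpsqrt:.2f}, case 2: {fast_all_fp:.2f}")
```

Even an infinitely fast FPSQRT unit would cap the speedup at 1 / (1 - 0.2) = 1.25, which illustrates why the enhanced fraction matters more than the enhancement factor.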

Measuring the Performance. Performance equation: CPU time = Instruction Count x Clock cycle time x CPI. How to compute these parameters: clock cycle time is known for existing processors; hardware counters in new processors give CPI and instruction count. Simulation for performance analysis: profile-based, trace-driven, or execution-driven.

CPU Performance Equation. The parameters are interdependent. Instruction count depends on the ISA and compiler technology. CPI depends on the organization and the ISA. Cycle time depends on hardware technology and organization. Many performance-enhancing techniques improve one parameter with small or predictable impacts on the other two.

Example. Parameters: frequency of FP operations (incl. FPSQR) = 25%; CPI for FP operations = 4; CPI for others = 1.33; frequency of FPSQR = 2%; CPI of FPSQR = 20. Compare 2 designs: (1) decrease the CPI of FPSQR to 2; (2) decrease the CPI of all FP operations to 2.5.
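
One way to work this comparison is to compute the average CPI of each design as a frequency-weighted sum. The sketch assumes that instruction count and clock cycle time are unchanged by either design, so the speedup equals the ratio of CPIs; the names are illustrative.

```python
def average_cpi(mix):
    """mix: (frequency, cpi) pairs whose frequencies sum to 1."""
    return sum(freq * cpi for freq, cpi in mix)


# Original design: 25% FP operations at CPI 4, 75% other instructions at CPI 1.33.
cpi_original = average_cpi([(0.25, 4.0), (0.75, 1.33)])

# Design 1: FPSQR (2% of instructions) drops from CPI 20 to CPI 2.
cpi_fast_fpsqr = cpi_original - 0.02 * (20.0 - 2.0)

# Design 2: all FP operations drop to CPI 2.5.
cpi_fast_fp = average_cpi([(0.25, 2.5), (0.75, 1.33)])

print(f"original CPI: {cpi_original:.4f}")
print(f"design 1 CPI: {cpi_fast_fpsqr:.4f}, speedup {cpi_original / cpi_fast_fpsqr:.2f}")
print(f"design 2 CPI: {cpi_fast_fp:.4f}, speedup {cpi_original / cpi_fast_fp:.2f}")
```

Under these assumptions, improving all FP operations gives the slightly lower CPI and is therefore marginally the better design.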

Fallacies and Pitfalls. Fallacy: the relative performance of two processors with the same ISA can be judged by clock rate or by the performance of a single benchmark suite. [Figure: performance of a 1.7 GHz Pentium 4 relative to a 1.0 GHz Pentium III.] © 2003 Elsevier Science (USA). All rights reserved.

Fallacies and Pitfalls. Fallacy: benchmarks remain valid indefinitely (one line in matrix300 (SPEC89) executes 99% of the time). Fallacy: peak performance tracks observed performance. Fallacy: the best design is the one that optimizes the primary objective without considering design costs. Fallacy: synthetic benchmarks predict performance for real programs (compiler/hardware optimizations can inflate performance). Fallacy: MIPS is an accurate measure for comparing performance among computers (consider using FP hardware instead of FP routines).