1
Fundamentals of Computer Design
- Introduction
- Classes of Computers
- Defining Computer Architecture
- Trends in Technology
- Trends in Power and Energy in Integrated Circuits
- Trends in Cost
- Dependability
- Measuring and Reporting Performance
- Quantitative Principles of Computer Design
CDA – Fall Copyright © Prabhat Mishra
2
Microprocessor Performance Trends
[Figure: microprocessor performance over time; chart annotations: "RISC", "Move to multi-processor"]
3
Design Complexity
Exponential growth: transistor counts double every couple of years
4
Technology and Demand
- Technology: the number of transistors doubles every 2 years
- Demand: communication, multimedia, entertainment, networking
- Exponential growth of design complexity and verification complexity
6
Computer Market
- Desktop
  - Driven by price-performance
  - Price: up to $10,000 [$100 - $1000 per processor]
- Server
  - Throughput, availability, scalability
  - Price: $10K - $10M [$200 - $2000 per processor]
- Embedded Systems
  - Application specific
  - Low cost, low power, real-time performance
  - Price: $10 - $100,000 [up to $200 per processor]
7
An Example Embedded System
Digital Camera Block Diagram
8
Components of Embedded Systems
[Block diagram; labeled components: Controllers, Memory, Interface, Software (Application Programs), Processor, Coprocessors, ASIC, Converters; signal domains: Analog, Digital, Analog]
10
Computer Architecture
Definition
- Instruction set architecture (ISA): programmer (user) view
- Implementation
  - Organization: CPU, memory, buses, I/O
  - Hardware: logic design, packaging technology
Computer design must meet
- Functional requirements
- Area, performance, cost, power goals
Optimize, evaluate, and explore to find the best possible architecture
Consider other factors
- Time-to-market, technology trend, safety, reliability, …
11
Instruction-Set Architecture (ISA)
An instruction set architecture is a specification of a standardized programmer-visible interface to hardware, comprising:
- A set of instructions (instruction types and operations)
  - With associated argument fields, assembly syntax, binary encoding
- A set of named storage locations and addressing
  - Registers, memory, … programmer-accessible caches?
- A set of addressing modes (ways to name locations)
- Types and sizes of operands
- Control flow instructions
- Often an I/O interface (usually memory-mapped)
12
Example: MIPS
Programmable storage
- 2^32 x bytes of memory
- 31 x 32-bit GPRs (R0=0)
- 32 x 32-bit FP regs (paired DP)
- HI, LO, PC
Data types? Format? Addressing modes?
Arithmetic logical
- ADD, ADDU, SUB, SUBU, AND, OR, XOR, NOR, SLT, SLTU
- ADDI, ADDIU, SLTI, SLTIU, ANDI, ORI, XORI, LUI
- SLL, SRL, SRA, SLLV, SRLV, SRAV
Memory Access
- LB, LBU, LH, LHU, LW, LWL, LWR
- SB, SH, SW, SWL, SWR
Control
- J, JAL, JR, JALR
- BEQ, BNE, BLEZ, BGTZ, BLTZ, BGEZ, BLTZAL, BGEZAL
13
MIPS64 Instruction Format
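The instruction-format figure is not reproduced in this transcript. As a rough illustration only, the sketch below packs the fields of a standard MIPS R-type instruction (opcode 6 bits, rs 5, rt 5, rd 5, shamt 5, funct 6) into a 32-bit word. The helper name `encode_r_type` is hypothetical, and the example uses the classic `ADD` encoding rather than anything taken from the slide.

```python
def encode_r_type(rs: int, rt: int, rd: int, shamt: int, funct: int, opcode: int = 0) -> int:
    """Pack MIPS R-type fields into one 32-bit instruction word.

    Field order and widths (most significant first):
    opcode(6) rs(5) rt(5) rd(5) shamt(5) funct(6).
    """
    fields = [(opcode, 6), (rs, 5), (rt, 5), (rd, 5), (shamt, 5), (funct, 6)]
    word = 0
    for value, width in fields:
        if not 0 <= value < (1 << width):
            raise ValueError(f"field value {value} does not fit in {width} bits")
        word = (word << width) | value
    return word


# ADD $t0, $t1, $t2: rd = $t0 (reg 8), rs = $t1 (reg 9), rt = $t2 (reg 10), funct = 0x20
word = encode_r_type(rs=9, rt=10, rd=8, shamt=0, funct=0x20)
print(f"0x{word:08X}")  # 0x012A4020
```

I-type and J-type words could be packed the same way by changing the field list.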
14
MIPS Implementation
Source: Morgan Kaufmann Publishers, 19 May 2018 (Chapter 4, The Processor)
15
Pipelined Implementation
17
Technology Trend
Component
- IC technology: transistors per chip increase 55% per year
- DRAM: density increases 40-60% per year
- Magnetic disk: density increases 100% per year
- Network: Ethernet from 10 to 100 Mb took 10 years; 100 Mb to 1 Gb in 5 years
Scaling of performance, wires and power
- Feature size: 10 micron in 1971; 0.18 micron in 2001, …
- Microprocessor organization improvement
- Wiring delay
- Power issue: ~100 watts for 2 GHz Pentium 4
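To put the annual growth rates above on a common footing, a small sketch can convert each compound rate into a doubling time. The function name `doubling_time_years` is made up for this illustration; the rates are the ones quoted on the slide.

```python
import math


def doubling_time_years(annual_growth: float) -> float:
    """Years needed to double at a compound annual growth rate (0.55 means 55%)."""
    return math.log(2) / math.log(1 + annual_growth)


# Growth rates quoted on the slide.
trends = [
    ("IC transistors per chip", 0.55),
    ("DRAM density (low end)", 0.40),
    ("DRAM density (high end)", 0.60),
    ("Magnetic disk density", 1.00),
]

for name, rate in trends:
    print(f"{name}: doubles about every {doubling_time_years(rate):.2f} years")
```

A 100% annual increase doubles every year by definition, which gives a quick sanity check on the formula.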
18
Bandwidth vs. Latency
Latency improvement is 6-80X, while bandwidth improvement is much larger.
21
Power
- An earlier Intel processor consumed ~2 W
- A 3.3 GHz Intel Core i7 consumes 130 W
- Heat must be dissipated from a 1.5 x 1.5 cm chip
- This is the limit of what can be cooled by air
Source: The University of Adelaide, School of Computer Science, 19 May 2018 (Chapter 2, Instructions: Language of the Computer)
22
Power and Energy
$E = P \times t$
In many cases, faster execution means less energy, but the opposite may be true if power has to be increased to allow faster execution.
23
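Power and Energy
- Power is drawn from a voltage source
- Power: $P(t) = I(t)\,V$, where $I(t)$ is the current drawn from a source at voltage $V$
- Energy: $E = \int_0^T P(t)\,dt$
- Average Power: $P_{avg} = \frac{E}{T} = \frac{1}{T}\int_0^T P(t)\,dt$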
24
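Dynamic Power
- Power needed to charge and discharge load capacitances when transistors switch.
- The capacitor needs to charge for the output to be ‘1’; for the output to be ‘0’, the capacitor needs to discharge.
- This repeats $T \cdot f_{sw}$ times over an interval of $T$.
- $P_{dynamic} = C V_{DD}^2 f_{sw} = \alpha C V_{DD}^2 f$, where $C$ is the load capacitance and $V_{DD}$ the supply voltage.
- Here, $\alpha$ is the activity factor and $f$ is the clock frequency.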
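The dependence of switching power on voltage and frequency can be tried out numerically. The sketch below uses the standard relation P = α · C · V² · f (activity factor, switched capacitance, supply voltage, clock frequency). All numeric inputs are assumed values for illustration and do not describe any particular chip.

```python
def dynamic_power(alpha: float, capacitance_f: float, vdd: float, freq_hz: float) -> float:
    """Switching power in watts: P = alpha * C * Vdd^2 * f."""
    return alpha * capacitance_f * vdd ** 2 * freq_hz


# Assumed example: activity factor 0.1, 1 nF switched capacitance, 1.0 V, 2 GHz.
baseline = dynamic_power(alpha=0.1, capacitance_f=1e-9, vdd=1.0, freq_hz=2e9)

# Halving the supply voltage alone cuts dynamic power to one quarter.
half_voltage = dynamic_power(alpha=0.1, capacitance_f=1e-9, vdd=0.5, freq_hz=2e9)

print(f"baseline: {baseline:.3f} W")
print(f"half voltage: {half_voltage:.3f} W ({half_voltage / baseline:.2f}x)")
```

The quadratic voltage term is why voltage reduction is usually the most effective lever on dynamic power.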
25
Static Power
- Because leakage current flows even when a transistor is off, static power is now important too
- Leakage current increases in processors with smaller transistor sizes
- Increasing the number of transistors increases power even if they are turned off
- In 2006, the goal for leakage was 25% of total power consumption; high-performance designs were at 40%
- Very low-power systems even gate the voltage to inactive modules to control loss due to leakage
26
Reducing Energy Consumption
[Figure: thermal images of a Pentium and a Crusoe processor running the same multimedia application]
Infrared cameras (FLIR) can be used to detect thermal distribution.
27
Dynamic Power Management (DPM)
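- RUN: operational
- IDLE: a SW routine may stop the CPU when not in use, while monitoring interrupts
- SLEEP: shutdown of on-chip activity
StrongARM SA-1100 power states (from the state diagram):
- Power: RUN 400 mW, IDLE 50 mW, SLEEP 160 µW
- Transition times: 10 µs between RUN and IDLE (either direction), 90 µs from RUN or IDLE into SLEEP, 160 ms from SLEEP back to RUN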
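One question the state diagram raises is how long an inactive period must be before SLEEP pays off compared with staying in IDLE. The sketch below estimates that break-even gap from the slide's power and timing figures. It assumes, purely for illustration, that the processor draws RUN power during the transitions into and out of SLEEP; the slide does not state transition power, and all function names are hypothetical.

```python
# Power per state in watts and transition times in seconds (values from the slide).
RUN_W, IDLE_W, SLEEP_W = 0.400, 0.050, 160e-6
TO_SLEEP_S, WAKE_S = 90e-6, 160e-3


def energy_stay_idle(gap_s: float) -> float:
    """Energy in joules if the CPU simply idles for the whole gap."""
    return IDLE_W * gap_s


def energy_use_sleep(gap_s: float, transition_power_w: float = RUN_W) -> float:
    """Energy in joules if the CPU sleeps during the gap.

    Assumption (not from the slide): transitions draw `transition_power_w`
    and both transitions must fit inside the gap.
    """
    overhead_s = TO_SLEEP_S + WAKE_S
    if gap_s < overhead_s:
        raise ValueError("gap too short to enter and leave SLEEP")
    return transition_power_w * overhead_s + SLEEP_W * (gap_s - overhead_s)


def break_even_gap_s(transition_power_w: float = RUN_W) -> float:
    """Gap length at which sleeping and idling cost the same energy."""
    overhead_s = TO_SLEEP_S + WAKE_S
    return overhead_s * (transition_power_w - SLEEP_W) / (IDLE_W - SLEEP_W)


print(f"break-even gap: {break_even_gap_s():.2f} s")
```

Under these assumptions the 160 ms wake-up dominates, so SLEEP only helps for gaps well above one second.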
28
Dynamic Voltage Scaling (DVS)
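- $E = P \times T$ and $P \propto V^2$, where E is energy, P is power, T is time, and V is voltage
- Example: a task is given with workload (W) and deadline (D). Assume that idle energy is negligible.
  - At voltage $V$ the task finishes in time $T$: $E_1 \propto V_1^2 \cdot T_1 = V^2 \cdot T$
  - At voltage $V/2$ the task takes $2T$, still within D: $E_2 \propto V_2^2 \cdot T_2 = \frac{V^2}{4} \cdot 2T = E_1/2$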
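The arithmetic in this example can be checked with a few lines of code. The sketch below models energy relative to a baseline as V² · T and adds a deadline check; the function names and the deadline value are assumptions made for illustration.

```python
def relative_energy(voltage_scale: float, time_scale: float) -> float:
    """Energy relative to the baseline, using E proportional to V^2 * T."""
    return voltage_scale ** 2 * time_scale


def meets_deadline(base_time: float, time_scale: float, deadline: float) -> bool:
    """True if the stretched execution still finishes by the deadline."""
    return base_time * time_scale <= deadline


# Halve the voltage; the task then takes twice as long (as in the slide's example).
e2_over_e1 = relative_energy(voltage_scale=0.5, time_scale=2.0)
print(e2_over_e1)  # 0.5

# Assumed timing: baseline time 1.0, deadline 2.5 (arbitrary units).
print(meets_deadline(base_time=1.0, time_scale=2.0, deadline=2.5))  # True
```

The saving only counts if the slower run still meets the deadline, which is why the slack between T and D matters.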
29
Multicores – Low Power?
Multicore
- One core with frequency 2 GHz
- Two cores with 1 GHz frequency (each)
  - Same performance
  - Two 1 GHz cores require half the power/energy
  - $P \propto f^2$: a 1 GHz core needs one-fourth the power of a 2 GHz core
New challenges
- Performance concerns: how to keep the cores busy?
- Reliability concerns: MTTF gets worse!
- and more …
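The half-power claim follows directly from the slide's model that power scales with the square of frequency. A minimal sketch, with a hypothetical helper `relative_power` and the 2 GHz single core as the baseline:

```python
def relative_power(freq_ghz: float, cores: int = 1, base_freq_ghz: float = 2.0) -> float:
    """Total power relative to one core at base_freq_ghz, assuming P ~ f^2 per core."""
    return cores * (freq_ghz / base_freq_ghz) ** 2


print(relative_power(1.0))           # 0.25: one 1 GHz core vs. one 2 GHz core
print(relative_power(1.0, cores=2))  # 0.5: two 1 GHz cores vs. one 2 GHz core
```

The model ignores static power and assumes the workload parallelizes perfectly across both cores, which is the "how to keep them busy" challenge the slide raises.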
31
DRAM Pricing
© 2003 Elsevier Science (USA). All rights reserved.
32
Processor Pricing (Intel Pentium III)
33
Silicon Wafer
This 300 mm wafer contains 280 Intel Core i7 dies, each 20.7 by 10.5 mm, in a 32 nm process.
34
Intel Core i7 Die
The dimensions are 18.9 mm by 13.6 mm (257 mm²) in a 45 nm process. (Courtesy Intel.)
35
Floorplan of Intel Core i7
36
Integrated Circuit Cost
Integrated circuit cost depends on die yield, which is modeled by the Bose-Einstein formula:
$\text{Die yield} = \text{Wafer yield} \times \frac{1}{(1 + \text{Defects per unit area} \times \text{Die area})^{N}}$
- Defects per unit area: measured in defects per cm² (2010 figures)
- N = process-complexity factor (40 nm process, 2010)
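To see how die area and defect density interact in the Bose-Einstein yield model, the following sketch evaluates the yield and a resulting cost per good die. The defect density, complexity factor, wafer cost, and die area below are assumed values chosen for illustration, not figures from the slide; the 280 dies per wafer echoes the wafer slide earlier in the deck. The cost relation (wafer cost divided by good dies per wafer) is the usual textbook form and is likewise an assumption here.

```python
def die_yield(defects_per_cm2: float, die_area_cm2: float, n: float,
              wafer_yield: float = 1.0) -> float:
    """Bose-Einstein model: wafer_yield / (1 + defect_density * die_area)^N."""
    return wafer_yield / (1.0 + defects_per_cm2 * die_area_cm2) ** n


def cost_per_good_die(wafer_cost: float, dies_per_wafer: int, yield_fraction: float) -> float:
    """Wafer cost spread over the dies that actually work."""
    return wafer_cost / (dies_per_wafer * yield_fraction)


# Assumed inputs: 0.03 defects/cm^2, 2.0 cm^2 die, N = 12, $5000 wafer.
y = die_yield(defects_per_cm2=0.03, die_area_cm2=2.0, n=12)
print(f"die yield: {y:.3f}")
print(f"cost per good die: ${cost_per_good_die(5000.0, 280, y):.2f}")

# Yield falls quickly as the die grows.
for area in (0.5, 1.0, 2.0, 4.0):
    print(area, round(die_yield(0.03, area, 12), 3))
```

Because the die area sits inside a term raised to the power N, doubling the area costs far more than a factor of two in good dies.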
38
Define and Quantify Dependability
- How to decide when a system is operating properly?
- Infrastructure providers now offer Service Level Agreements (SLAs) to guarantee that their networking or power service will be dependable
- Systems alternate between 2 states of service with respect to an SLA:
  - State 1: Service accomplishment, where the service is delivered as specified in the SLA
  - State 2: Service interruption, where the delivered service is different from the SLA
- Failure = transition from state 1 to state 2
- Restoration = transition from state 2 to state 1
39
Dependability
- Module reliability = measure of continuous service accomplishment (or time to failure)
- Two metrics:
  - Mean Time To Failure (MTTF) measures reliability
    - Failures In Time (FIT) = 1/MTTF, the rate of failures
    - Traditionally reported as failures per billion hours of operation
  - Mean Time To Repair (MTTR) measures service interruption
- Mean Time Between Failures (MTBF) = MTTF + MTTR
- Module availability measures service as it alternates between the 2 states of accomplishment and interruption (a number between 0 and 1, e.g. 0.9)
- Module availability = MTTF / (MTTF + MTTR)
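The availability and MTBF definitions translate directly into code. The sketch below is a minimal illustration; the MTTF and MTTR values in the usage lines are assumed numbers, not measurements.

```python
def availability(mttf_hours: float, mttr_hours: float) -> float:
    """Module availability = MTTF / (MTTF + MTTR), a number between 0 and 1."""
    return mttf_hours / (mttf_hours + mttr_hours)


def mtbf(mttf_hours: float, mttr_hours: float) -> float:
    """Mean Time Between Failures = MTTF + MTTR."""
    return mttf_hours + mttr_hours


# Assumed example: a module that runs 1,000,000 hours between failures
# and takes 24 hours to repair.
print(f"availability: {availability(1_000_000, 24):.6f}")
print(f"MTBF: {mtbf(1_000_000, 24)} hours")
```

Availability can be raised either by making failures rarer (larger MTTF) or by repairing faster (smaller MTTR).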
40
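Example
- If modules have exponentially distributed lifetimes (the age of a module does not affect its probability of failure), the overall failure rate is the sum of the failure rates of the modules
- Calculate FIT and MTTF for 10 disks (1M hour MTTF per disk), 1 disk controller (0.5M hour MTTF), and 1 power supply (0.2M hour MTTF):
  - $\text{FailureRate} = 10 \times \frac{1}{1{,}000{,}000} + \frac{1}{500{,}000} + \frac{1}{200{,}000} = \frac{10 + 2 + 5}{1{,}000{,}000} = \frac{17}{1{,}000{,}000}$ per hour $= 17{,}000$ FIT
  - $\text{MTTF} = \frac{1{,}000{,}000{,}000}{17{,}000} \approx 59{,}000$ hours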
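The calculation requested in this example can be reproduced as follows. The sketch sums the per-module failure rates, as the exponential-lifetime assumption allows; the function name `system_fit_and_mttf` is hypothetical.

```python
def system_fit_and_mttf(module_mttfs_hours):
    """Return (FIT, MTTF in hours) for a system that fails when any module fails.

    Assumes exponentially distributed lifetimes, so failure rates add.
    FIT is reported as failures per billion (1e9) hours.
    """
    failure_rate_per_hour = sum(1.0 / m for m in module_mttfs_hours)
    return failure_rate_per_hour * 1e9, 1.0 / failure_rate_per_hour


# 10 disks (1M hours each), 1 disk controller (0.5M hours), 1 power supply (0.2M hours)
modules = [1_000_000] * 10 + [500_000] + [200_000]
fit, mttf = system_fit_and_mttf(modules)
print(f"FIT: {fit:.0f}")            # about 17000
print(f"MTTF: {mttf:.0f} hours")    # about 58824, i.e. roughly 59,000 hours
```

The power supply alone contributes 5 of the 17 failures per million hours, so it is the first candidate for redundancy.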
42
Performance Measurement
- Performance metric: execution time
  - Increasing performance decreases execution time
- Other metrics
  - Wall-clock time, response time, elapsed time
  - CPU time: user or system
- We will focus on CPU performance, i.e., user CPU time on an unloaded system
43
Choosing Programs to Evaluate Performance
Listed in order of decreasing accuracy:
- Real applications, for example: gcc compiler, Microsoft Word
- Modified (or scripted) applications, for example: remove I/O, script to simulate interactive behavior
- Kernels, for example: Livermore loops, Linpack
- Toy benchmarks, for example: sieve of Eratosthenes, quicksort
- Synthetic benchmarks, for example: Whetstone, Dhrystone
44
Benchmark Suites
Desktop
- SPEC CPU2000: 11 integer, 14 floating-point (new: SPEC CPU2006)
- SPECviewperf, SPECapc: graphics benchmarks
Server
- SPEC CPU2000: running multiple copies
- SPECSFS: for NFS performance
- SPECWeb: Web server benchmark
- TPC-x: measure transaction-processing, queries, and decision-making database applications
Embedded Processor
- EEMBC: EDN Embedded Microprocessor Benchmark Consortium
45
SPEC CPU Benchmarks
46
Reporting Performance
- Performance should be reproducible
  - Description of the machine and compiler flags
  - Report for both baseline and optimized version
- Source code modifications
  - Not allowed in SPEC benchmarks
  - Allowed but difficult or impossible: TPC-C using Oracle or SQL database
  - Allowed in supercomputer benchmarks: modify or re-write algorithms
  - Hand-coding in assembly for EEMBC benchmark
47
Comparing Performance
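Arithmetic Mean: $\frac{1}{n} \sum_{i=1}^{n} \text{Time}_i$, where $\text{Time}_i$ is the execution time of the $i$-th of $n$ programs
What is the mixture of programs in the workload?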
48
Comparing Performance
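Weighted Arithmetic Mean: $\sum_{i=1}^{n} \text{Weight}_i \times \text{Time}_i$, where $\text{Weight}_i$ is the relative frequency of the $i$-th program in the workload
What if programs are fixed and inputs are not?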
49
Comparing Performance
Geometric Mean: $\sqrt[n]{\prod_{i=1}^{n} \text{Execution time ratio}_i}$
- Execution time ratio is normalized to a base machine.
- Reference machine is not important: the arithmetic means differ depending on which machine is used as the basis, but the geometric means are the same.
- Geometric mean does not predict execution time
50
Normalized Execution Times (SPECRatio)
- Geometric mean does not predict execution time
  - Performance of machines A and B is the same only if program P1 is executed 100 times for every occurrence of program P2
- Rewards easy enhancements
  - Improving program P3 (2 to 1) counts the same as improving program P4 (1000 to 500)
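The behavior of the different means is easy to check on a small example. The table behind this slide is not in the transcript, so the execution times below are assumed values, chosen so that 100 runs of P1 per run of P2 make machines A and B tie on total time. The sketch shows that the arithmetic mean of normalized times depends on the reference machine while the geometric mean does not.

```python
import math


def arithmetic_mean(values):
    return sum(values) / len(values)


def geometric_mean(values):
    # Computed via logarithms to avoid overflow on long lists of ratios.
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Assumed execution times in seconds for programs [P1, P2] (illustrative only).
times = {"A": [1.0, 1000.0], "B": [10.0, 100.0]}


def normalized(machine: str, reference: str):
    """Execution time ratios of `machine` relative to `reference`."""
    return [t / r for t, r in zip(times[machine], times[reference])]


for ref in ("A", "B"):
    for machine in ("A", "B"):
        ratios = normalized(machine, ref)
        print(f"ref={ref} machine={machine} "
              f"AM={arithmetic_mean(ratios):.2f} GM={geometric_mean(ratios):.2f}")

# Total time when P1 runs 100 times per run of P2: both machines tie.
print(100 * times["A"][0] + times["A"][1], 100 * times["B"][0] + times["B"][1])
```

With either reference, the arithmetic mean makes the other machine look about five times slower, while the geometric mean rates A and B as equal even though their total times match only for one specific workload mix.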
52
Amdahl’s Law
Make the common case fast
The performance improvement to be gained from using some faster mode of execution is limited by the fraction of the time the faster mode can be used.
$\text{Speedup} = \frac{1}{(1 - f) + f/n}$
Where:
- f is the fraction of the execution time that can be enhanced
- n is the enhancement factor
Example: f = 0.1, n = 10 gives Speedup = 1 / (0.9 + 0.01) ≈ 1.1
53
Application of Amdahl’s Law
Amdahl’s law is useful for comparing the overall performance of two design alternatives.
Example: Floating-point (FP) operations consume 50% of the execution time of a graphics application. FP square root (FPSQRT) is used 20% of the time.
- Case 1: improve FPSQRT operation execution by 10 times
  - Speedup = 1 / ((1-0.2) + 0.2/10) = 1.22
- Case 2: improve all FP operations by 1.6 times
  - Speedup = 1 / ((1-0.5) + 0.5/1.6) = 1.23
Because FP operations account for a larger share of execution time, the modest improvement to all FP operations (case 2) yields a slightly larger gain than the drastic improvement of FPSQRT alone (case 1).
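The speedups quoted for Amdahl's law can be reproduced with a one-line function. The sketch below uses the usual form Speedup = 1 / ((1 - f) + f/n), with f the enhanced fraction of execution time and n the enhancement factor; the function name is chosen for this illustration.

```python
def amdahl_speedup(f: float, n: float) -> float:
    """Overall speedup when a fraction f of execution time is sped up by a factor n."""
    return 1.0 / ((1.0 - f) + f / n)


print(round(amdahl_speedup(0.1, 10), 2))   # 1.1  (f = 0.1, n = 10)
print(round(amdahl_speedup(0.2, 10), 2))   # 1.22 (FPSQRT 10x faster)
print(round(amdahl_speedup(0.5, 1.6), 2))  # 1.23 (all FP 1.6x faster)

# Even an unbounded enhancement is capped at 1 / (1 - f).
print(round(amdahl_speedup(0.2, 1e12), 2))  # 1.25
```

The last line shows the limit that motivates "make the common case fast": with f = 0.2, no enhancement factor can push the overall speedup past 1.25.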
54
Measuring the Performance
Performance Equation
- CPU time = Instruction Count x Clock cycle time x CPI
How to compute these parameters
- Known for existing processors: clock cycle time
- Use of counters in new processors: CPI, instruction count
- Simulation for performance analysis
  - Profile based
  - Trace-driven
  - Execution-driven
55
CPU Performance Equation
The parameters are interdependent:
- Instruction Count: ISA and compiler technology
- CPI: Organization and ISA
- Cycle Time: Hardware technology and organization
Many performance-enhancing techniques improve one parameter with small or predictable impacts on the other two.
56
Example
Parameters:
- Frequency of FP operations (incl. FPSQR) = 25%
- CPI for FP operations = 4; CPI for others = 1.33
- Frequency of FPSQR = 2%; CPI of FPSQR = 20
Compare 2 designs:
- Decrease CPI of FPSQR to 2
- Decrease CPI of all FP to 2.5
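The comparison can be worked through with the CPU performance equation. Assuming that instruction count and clock cycle time are the same for both designs (an assumption the slide implies but does not state), the speedup is simply the ratio of average CPIs. The helper `average_cpi` is a hypothetical name.

```python
def average_cpi(mix):
    """Average CPI for a list of (instruction frequency, CPI) pairs."""
    return sum(freq * cpi for freq, cpi in mix)


# Original machine: 25% FP at CPI 4, 75% other instructions at CPI 1.33.
cpi_original = average_cpi([(0.25, 4.0), (0.75, 1.33)])

# Design 1: FPSQR (2% of instructions) drops from CPI 20 to CPI 2.
cpi_new_fpsqr = cpi_original - 0.02 * (20 - 2)

# Design 2: all FP instructions drop from CPI 4 to CPI 2.5.
cpi_new_fp = average_cpi([(0.25, 2.5), (0.75, 1.33)])

print(f"original CPI:    {cpi_original:.4f}")
print(f"design 1 CPI:    {cpi_new_fpsqr:.4f}  speedup {cpi_original / cpi_new_fpsqr:.2f}")
print(f"design 2 CPI:    {cpi_new_fp:.4f}  speedup {cpi_original / cpi_new_fp:.2f}")
```

Under these assumptions, improving all FP operations comes out marginally ahead, consistent with the Amdahl's law result for the same scenario.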
58
Fallacies and Pitfalls
Fallacy: The relative performance of two processors with the same ISA can be judged by clock rate or by the performance of a single benchmark suite.
[Figure: performance of a 1.7 GHz Pentium 4 relative to a 1.0 GHz Pentium III]
59
Fallacies and Pitfalls
- Fallacy: Benchmarks remain valid indefinitely.
  - One line in matrix300 (SPEC89) executes 99% of the time
- Fallacy: Peak performance tracks observed performance.
- Fallacy: The best design is the one that optimizes the primary objective without considering design costs.
- Fallacy: Synthetic benchmarks predict performance for real programs.
  - Compiler/hardware optimizations can inflate performance
- Fallacy: MIPS is an accurate measure for comparing performance among computers.
  - Consider using FP hardware instead of FP routines.