1
Measurement & Evaluation
Course Focus
Understanding the design techniques, machine structures, technology factors, and evaluation methods that will determine the form of programmable processors in the 21st century.
Computer Architecture:
• Instruction Set Design
• Organization
• Hardware
Surrounding areas: Applications, Technology, Programming Languages, Operating Systems, History, Interface Design (ISA), Measurement & Evaluation
2
1988 Computer Food Chain
Mainframe, Minicomputer, Workstation, PC, Mini-supercomputer, Supercomputer, Massively Parallel Processors
3
1997 Computer Food Chain
Mainframe, Server, Workstation, PC, PDA, Supercomputer
Also shown: Mini-supercomputer, Minicomputer, Massively Parallel Processors
4
Why Such Change?
Performance
- Technology advances: CMOS VLSI dominates older technologies (TTL, ECL) in cost AND performance, and is progressing rapidly
- Computer architecture advances improve the low end: RISC, superscalar, RAID, …
Price: lower costs due to …
- Simpler development: CMOS VLSI means smaller systems, fewer components
- Higher volumes: CMOS VLSI has the same device cost at 10,000 vs. 10,000,000 units
- Lower margins by class of computer, due to fewer services
Function
- Rise of networking/local interconnection technology
5
Technology Trends: Microprocessor Capacity
Transistors per chip:
- Alpha 21264: 15 million
- Alpha 21164: 9.3 million
- PowerPC 620: 6.9 million
- Pentium Pro: 5.5 million
- Sparc Ultra: 5.2 million
Moore's Law. CMOS improvements:
- Die size: 2X every 3 yrs
- Line width: halves every 7 yrs
ISSCC 2000: 25M+ transistor processors (Intel)
6
Memory Capacity (Single Chip DRAM)
[Table: year, size (Mb), cycle time (ns)]
7
Technology Trends (Summary)
        Capacity        Speed (latency)
Logic   2x in 3 years   2x in 3 years
DRAM    4x in 3 years   2x in 10 years
Disk    4x in 3 years   2x in 10 years
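To compare these trends on a common footing, each "factor in N years" figure can be converted into an equivalent annual growth rate. The following is a minimal sketch that assumes smooth exponential growth, which is an idealization; the function names are illustrative and do not come from the slides.

```python
import math

def annual_growth(factor: float, years: float) -> float:
    """Annual multiplier equivalent to growing by `factor` over `years`."""
    return factor ** (1.0 / years)

def doubling_time(annual_multiplier: float) -> float:
    """Years needed to double at a constant annual multiplier."""
    return math.log(2.0) / math.log(annual_multiplier)

# (factor, years) pairs as stated on the slide
trends = {
    "Logic capacity": (2, 3),
    "Logic speed": (2, 3),
    "DRAM capacity": (4, 3),
    "DRAM speed": (2, 10),
    "Disk capacity": (4, 3),
    "Disk speed": (2, 10),
}

for name, (factor, years) in trends.items():
    g = annual_growth(factor, years)
    print(f"{name}: {100 * (g - 1):.1f}% per year")
```

Under this assumption, capacity that grows 4x in 3 years compounds at close to 59% per year, while speed that grows 2x in 10 years compounds at only about 7% per year, which is one way to see the widening gap between capacity and latency.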
8
Processor frequency trend
Frequency doubles each generation
Number of gates per clock is reduced by 25%
9
Processor Performance Trends
[Figure: relative performance (log scale, 0.1 to 1000) vs. year (1965 to 2000) for Supercomputers, Mainframes, Minicomputers, and Microprocessors]
10
Processor Performance (1.35X before, 1.55X now)
1.54X/yr
11
Performance Trends (Summary)
Workstation performance (measured in SPECmarks) improves roughly 50% per year (2X every 18 months)
Improvement in cost/performance estimated at 70% per year
12
A glimpse into the future
Silicon in 2010
- Die Area: 2.5 x 2.5 cm
- Voltage: V
- Technology: 0.07 µm
Compared with today:
- 15 times denser
- 2.5 times power density
- 5 times clock rate
13
What is the next wave?
Source: Richard Newton
14
The Embedded Processor
What? A programmable processor whose programming interface is not accessible to the end-user of the product. The only user interaction is through the actual application.
Examples:
- Sharp PDAs are encapsulated products with fixed functionality
- 3COM Palm Pilots were originally intended as embedded systems. Opening up the programmer's interface turned them into more generic computer systems.
15
Some interesting numbers
- The Intel 4004 was intended for an embedded application (a calculator)
- Of today's microprocessors, 95% go into embedded applications
- SH3/4 (Hitachi): best-selling RISC microprocessor
- 50% of microprocessor revenue stems from embedded systems
- Often focused on a particular application area: microcontrollers, DSPs, media processors, graphics processors, network and communication processors
16
Some different evaluation metrics
Components of Cost
- Area of die / yield
- Code density (memory is the major part of die size)
- Packaging
- Design effort
- Programming cost
- Time-to-market
- Reusability
Power
Cost
Flexibility
Performance as a Functionality Constraint (“Just-in-Time Computing”)
17
The Secret of Architecture Design: Measurement and Evaluation
Architecture design is an iterative process:
- Searching the space of possible designs
- At all levels of computer systems
[Diagram: Creativity produces candidate designs; Cost / Performance Analysis sorts them into Good Ideas, Mediocre Ideas, and Bad Ideas]
18
Computer Architecture Topics
- Input/Output and Storage: Disks, WORM, Tape; RAID; Emerging Technologies
- Memory Hierarchy: L1 Cache, L2 Cache, DRAM; Interleaving, Bus protocols; Coherence, Bandwidth, Latency
- VLSI
- Instruction Set Architecture: Addressing, Protection, Exception Handling
- Pipelining and Instruction Level Parallelism: Pipelining, Hazard Resolution, Superscalar, Reordering, Prediction, Speculation, Vector, VLIW, DSP, Reconfiguration
19
Computer Architecture Topics
- Multiprocessors: Shared Memory, Message Passing, Data Parallelism
- Networks and Interconnections: Network Interfaces; Topologies, Routing, Bandwidth, Latency, Reliability
- Processor-Memory-Switch
[Diagram: processor (P) and memory (M) pairs attached through switches (S) to an Interconnection Network]
20
Computer Engineering Methodology
A repeating cycle:
- Analysis: Evaluate Existing Systems for Bottlenecks (input: Benchmarks)
- Design: Simulate New Designs and Organizations (inputs: Workloads, Technology Trends)
- Implementation: Implement Next Generation System (input: Implementation Complexity)
Implementation complexity:
- How hard to build
- Importance of simplicity (wearing a seat belt); avoiding a personal disaster
- Theory vs. practice
21
Measurement Tools
- Benchmarks, Traces, Mixes
- Hardware: cost, delay, area, power estimation
- Simulation (many levels): ISA, RT, Gate, Circuit
- Queuing Theory
- Rules of Thumb
- Fundamental “Laws”/Principles
22
Review: Performance, Cost, Power
23
Metric 1: Performance
Plane        DC to Paris   Speed      Passengers   Throughput (passenger-mile/hour)
Boeing 747   6.5 hours     610 mph    470          286,700
Concorde     3 hours       1350 mph   132          178,200
Fastest for 1 person? Which takes less time to transport 470 passengers?
- Time to run the task: execution time, response time, latency
- Tasks per day, hour, week, sec, ns, …: throughput, bandwidth
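The plane comparison separates two notions of performance: latency (time for one trip) and throughput (passenger-miles per hour). The sketch below recomputes both from the slide's figures; the dictionary layout and function names are illustrative assumptions, not part of the original deck.

```python
# Figures taken from the slide's plane comparison.
planes = {
    "Boeing 747": {"hours": 6.5, "mph": 610, "passengers": 470},
    "Concorde": {"hours": 3.0, "mph": 1350, "passengers": 132},
}

def throughput(plane: dict) -> float:
    """Passenger-miles per hour: cruising speed times passengers carried."""
    return plane["mph"] * plane["passengers"]

def times_faster(time_x: float, time_y: float) -> float:
    """How many times faster X is than Y, given their execution times."""
    return time_y / time_x

b747 = planes["Boeing 747"]
concorde = planes["Concorde"]

# Latency view: Concorde finishes one trip sooner.
latency_ratio = times_faster(concorde["hours"], b747["hours"])
# Throughput view: the 747 moves more passenger-miles per hour.
throughput_ratio = throughput(b747) / throughput(concorde)

print(f"Concorde trip time advantage: {latency_ratio:.2f}x")
print(f"Boeing 747 throughput advantage: {throughput_ratio:.2f}x")
```

Which plane is "faster" therefore depends on the metric: the Concorde wins on latency, the Boeing 747 on throughput.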
24
The Performance Metric
"X is n times faster than Y" means
$$ n = \frac{\text{ExTime}(Y)}{\text{ExTime}(X)} = \frac{\text{Performance}(X)}{\text{Performance}(Y)} $$
Speed of Concorde vs. Boeing 747: 1350 / 610 = 2.2X
Throughput of Boeing 747 vs. Concorde: 286,700 / 178,200 = 1.6X
25
Amdahl's Law
Speedup due to enhancement E:
$$ \text{Speedup}(E) = \frac{\text{ExTime w/o } E}{\text{ExTime w/ } E} = \frac{\text{Performance w/ } E}{\text{Performance w/o } E} $$
Suppose that enhancement E accelerates a fraction F of the task by a factor S, and the remainder of the task is unaffected.
26
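Amdahl’s Law
$$ \text{ExTime}_{new} = \text{ExTime}_{old} \times \left[ (1 - \text{Fraction}_{enhanced}) + \frac{\text{Fraction}_{enhanced}}{\text{Speedup}_{enhanced}} \right] $$
$$ \text{Speedup}_{overall} = \frac{\text{ExTime}_{old}}{\text{ExTime}_{new}} = \frac{1}{(1 - \text{Fraction}_{enhanced}) + \dfrac{\text{Fraction}_{enhanced}}{\text{Speedup}_{enhanced}}} $$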
27
Amdahl’s Law
Floating point instructions improved to run 2X; but only 10% of actual instructions are FP
$$ \text{ExTime}_{new} = \text{ExTime}_{old} \times (0.9 + 0.1/2) = 0.95 \times \text{ExTime}_{old} $$
$$ \text{Speedup}_{overall} = \frac{1}{0.95} = 1.053 $$
Law of diminishing return: Focus on the common case!
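The floating-point example can be checked numerically, and sweeping the enhancement factor shows the diminishing return directly. This is a minimal sketch of Amdahl's Law as stated on these slides; the function name and the sweep values are illustrative assumptions.

```python
def amdahl_speedup(fraction_enhanced: float, speedup_enhanced: float) -> float:
    """Overall speedup when `fraction_enhanced` of the original execution
    time is accelerated by `speedup_enhanced` and the rest is unchanged."""
    if not 0.0 <= fraction_enhanced <= 1.0:
        raise ValueError("fraction_enhanced must be between 0 and 1")
    if speedup_enhanced <= 0.0:
        raise ValueError("speedup_enhanced must be positive")
    return 1.0 / ((1.0 - fraction_enhanced) + fraction_enhanced / speedup_enhanced)

# Slide example: FP is 10% of the work and runs 2X faster.
print(f"FP 2X faster: {amdahl_speedup(0.10, 2.0):.3f}")

# Diminishing return: even a huge FP speedup is capped by the other 90%.
for s in (2.0, 10.0, 100.0, 1e9):
    print(f"S = {s:g}: overall speedup = {amdahl_speedup(0.10, s):.4f}")
```

As the enhancement factor grows without bound, the overall speedup approaches 1 / (1 - 0.1), roughly 1.11, which is why the slide advises focusing on the common case.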
28
Metrics of Performance
System levels, top to bottom, with the metric used at each:
- Application: answers per month, operations per second
- Programming Language, Compiler, ISA: (millions) of instructions per second (MIPS); (millions) of (FP) operations per second (MFLOP/s)
- Datapath, Control: megabytes per second
- Function Units, Transistors, Wires, Pins: cycles per second (clock rate)
29
Aspects of CPU Performance
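$$ \text{CPU time} = \frac{\text{Seconds}}{\text{Program}} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Cycles}}{\text{Instruction}} \times \frac{\text{Seconds}}{\text{Cycle}} $$
               Inst Count   CPI   Clock Rate
Program        X
Compiler       X            (X)
Inst. Set      X            X
Organization                X     X
Technology                        X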
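The CPU time equation is a product of three factors, so a change that helps one factor can hurt another. The sketch below uses made-up numbers (1 billion instructions, CPI of 1.5, 500 MHz clock) purely for illustration; none of these values come from the slides.

```python
def cpu_time(instruction_count: float, cpi: float, clock_rate_hz: float) -> float:
    """CPU time in seconds = instructions x cycles/instruction x seconds/cycle."""
    seconds_per_cycle = 1.0 / clock_rate_hz
    return instruction_count * cpi * seconds_per_cycle

# Hypothetical baseline machine and program.
baseline = cpu_time(instruction_count=1e9, cpi=1.5, clock_rate_hz=500e6)

# Hypothetical compiler change: 10% fewer instructions, but a higher
# average CPI because the remaining instructions are more complex.
optimized = cpu_time(instruction_count=0.9e9, cpi=1.6, clock_rate_hz=500e6)

print(f"baseline:  {baseline:.2f} s")
print(f"optimized: {optimized:.2f} s")
print(f"speedup:   {baseline / optimized:.3f}")
```

In this hypothetical case the net effect is still a win, but only execution time, not instruction count or CPI alone, shows that.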
30
Cycles Per Instruction
“Average Cycles per Instruction”
CPI = Cycles / Instruction Count = (CPU Time × Clock Rate) / Instruction Count
$$ \text{CPU time} = \text{CycleTime} \times \sum_{i=1}^{n} \text{CPI}_i \times I_i $$
“Instruction Frequency”
$$ \text{CPI} = \sum_{i=1}^{n} \text{CPI}_i \times F_i \quad \text{where} \quad F_i = \frac{I_i}{\text{Instruction Count}} $$
Invest resources where time is spent!
31
Example: Calculating CPI
Base Machine (Reg / Reg), Typical Mix
Op       Freq   CPIi   CPIi*Fi   (% Time)
ALU      50%    1      0.5       (33%)
Load     20%    2      0.4       (27%)
Store    10%    2      0.2       (13%)
Branch   20%    2      0.4       (27%)
Total                  1.5
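The average CPI of 1.5 and the time shares can be recomputed from the instruction mix. In this sketch the per-class CPI values (1 for ALU, 2 for Load, Store, and Branch) are inferred from the slide's frequencies, its time percentages, and its total of 1.5; treat them as an assumption rather than as quoted data.

```python
# Each entry is (frequency, cycles per instruction).
# Frequencies are from the slide; the CPI values are inferred (see above).
mix = {
    "ALU": (0.50, 1),
    "Load": (0.20, 2),
    "Store": (0.10, 2),
    "Branch": (0.20, 2),
}

def average_cpi(mix: dict) -> float:
    """Weighted sum of per-class CPI, weighted by instruction frequency."""
    return sum(freq * cpi for freq, cpi in mix.values())

def time_share(mix: dict) -> dict:
    """Fraction of total cycles spent in each instruction class."""
    total = average_cpi(mix)
    return {op: freq * cpi / total for op, (freq, cpi) in mix.items()}

print(f"average CPI = {average_cpi(mix):.2f}")
for op, share in time_share(mix).items():
    print(f"{op}: {100 * share:.0f}% of time")
```

Note that ALU operations are half of all instructions but only a third of the time, so the classes with higher CPI deserve proportionally more attention.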
32
Creating Benchmark Sets
- Real programs
- Kernels
- Toy benchmarks
- Synthetic benchmarks, e.g. Whetstone and Dhrystone
33
SPEC: System Performance Evaluation Cooperative
First Round 1989
- 10 programs yielding a single number (“SPECmarks”)
Second Round 1992
- SPECint92 (6 integer programs) and SPECfp92 (14 floating point programs)
- Compiler flags unlimited. March 93 flags for DEC 4000 Model 610:
  spice: unix.c:/def=(sysv,has_bcopy,”bcopy(a,b,c)= memcpy(b,a,c)”
  wave5: /ali=(all,dcom=nat)/ag=a/ur=4/ur=200
  nasa7: /norecu/ag=a/ur=4/ur2=200/lc=blas
Third Round 1995
- New set of programs: SPECint95 (8 integer programs) and SPECfp95 (10 floating point)
- “benchmarks useful for 3 years”
- Single flag setting for all programs: SPECint_base95, SPECfp_base95
34
How to Summarize Performance
- Arithmetic mean (weighted arithmetic mean) tracks execution time: $\sum T_i / n$ or $\sum W_i \times T_i$
- Harmonic mean (weighted harmonic mean) of rates (e.g., MFLOPS) tracks execution time: $n / \sum (1/R_i)$ or $1 / \sum (W_i / R_i)$, with weights that sum to 1
- Normalized execution time is handy for scaling performance (e.g., X times faster than SPARCstation 10)
- The arithmetic mean of normalized times is impacted by the choice of reference machine
- Use the geometric mean for comparison: $(\prod T_i)^{1/n}$
- The geometric mean is independent of the chosen machine, but it is not a good metric for total execution time
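The dependence of the arithmetic mean on the reference machine is easiest to see with a small example. The sketch below uses two hypothetical machines and two hypothetical programs (the times are invented for illustration) and computes each mean with plain helper functions.

```python
import math

def arithmetic_mean(values):
    return sum(values) / len(values)

def harmonic_mean(rates):
    """Appropriate for rates such as MFLOPS."""
    return len(rates) / sum(1.0 / r for r in rates)

def geometric_mean(values):
    return math.exp(sum(math.log(v) for v in values) / len(values))

def normalize(times, reference):
    """Execution times expressed relative to a reference machine."""
    return [t / r for t, r in zip(times, reference)]

# Hypothetical execution times in seconds for programs P1 and P2.
machine_a = [1.0, 1000.0]
machine_b = [10.0, 100.0]

# Arithmetic mean of normalized times: the answer flips with the reference.
print("B relative to A:", arithmetic_mean(normalize(machine_b, machine_a)))
print("A relative to B:", arithmetic_mean(normalize(machine_a, machine_b)))

# Geometric mean: consistent regardless of the reference machine.
print("geometric, B/A:", geometric_mean(normalize(machine_b, machine_a)))
print("geometric, A/B:", geometric_mean(normalize(machine_a, machine_b)))

# Total execution time tells a different story from the geometric mean.
print("total time A:", sum(machine_a), "total time B:", sum(machine_b))
```

With these numbers the arithmetic mean of normalized times claims each machine is about 5 times slower than the other, the geometric mean rates them equal, and total execution time shows machine B finishing the workload roughly 9 times sooner.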
35
SPEC First Round
- One program: 99% of time in a single line of code
- New front-end compiler could improve dramatically
[Figure: results for the IBM Powerstation 550 with 2 different compilers]
36
Impact of Means on SPECmark89 for IBM 550 (without and with special compiler option)
[Table: Before and After values of Ratio to VAX, Time, and Weighted Time for each program: gcc, espresso, spice, doduc, nasa, li, eqntott, matrix, fpppp, tomcatv]
Mean used for each column, with the resulting ratio:
- Ratio to VAX: geometric mean, ratio 1.33
- Time: arithmetic mean, ratio 1.16
- Weighted Time: weighted arithmetic mean, ratio 1.09
37
Performance Evaluation
“For better or worse, benchmarks shape a field”
Good products are created when we have:
- Good benchmarks
- Good ways to summarize performance
Given that sales are in part a function of performance relative to the competition, companies invest in improving the product as reported by the performance summary.
If the benchmarks or summary are inadequate, a company must choose between improving the product for real programs and improving the product to get more sales. Sales almost always wins!
Execution time is the measure of computer performance!
38
Integrated Circuits Costs
Die cost goes roughly with (die area)^4
39
Real World Examples
Chip         Die cost
386DX        $4
486DX        $12
PowerPC      $53
HP PA        $73
DEC Alpha    $149
SuperSPARC   $272
Pentium      $417
Other columns in the table: metal layers, line width, wafer cost ($), defects/cm2, area (mm2), dies/wafer, yield (%)
From "Estimating IC Manufacturing Costs,” by Linley Gwennap, Microprocessor Report, August 2, 1993, p. 15
40
Cost/Performance: What is the Relationship of Cost to Price?
- Component Costs
- Direct Costs (add 25% to 40%): recurring costs such as labor, purchasing, scrap, warranty
- Gross Margin (add 82% to 186%): non-recurring costs such as R&D, equipment maintenance, rental, marketing, sales, financing cost, pretax profits, taxes
- Average Discount to get List Price (add 33% to 66%): volume discounts and/or retailer markup
Share of List Price:
- Average Discount: 25% to 40%
- Gross Margin: 34% to 39%
- Direct Cost: 6% to 8%
- Component Cost: 15% to 33%
Avg. Selling Price = List Price less Average Discount
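The "add X%" markups and the "share of list price" percentages on this slide are two views of the same chain, and one can be derived from the other. The sketch below assumes each markup is applied to the running total from the previous step; the function name and the $100 component cost are illustrative choices, not values from the slide.

```python
def price_breakdown(component_cost, direct_add, margin_add, discount_add):
    """Build up list price from component cost using successive markups.

    Each *_add argument is a fraction added on top of the running total
    (for example, 0.25 means "add 25%").
    """
    direct_cost = component_cost * direct_add
    subtotal = component_cost + direct_cost
    gross_margin = subtotal * margin_add
    avg_selling_price = subtotal + gross_margin
    avg_discount = avg_selling_price * discount_add
    list_price = avg_selling_price + avg_discount
    return {
        "component": component_cost,
        "direct": direct_cost,
        "gross_margin": gross_margin,
        "avg_discount": avg_discount,
        "avg_selling_price": avg_selling_price,
        "list_price": list_price,
    }

def shares_of_list(breakdown):
    """Each cost element as a percentage of list price."""
    parts = ("component", "direct", "gross_margin", "avg_discount")
    return {p: 100.0 * breakdown[p] / breakdown["list_price"] for p in parts}

low = price_breakdown(100.0, 0.25, 0.82, 0.33)   # low end of each range
high = price_breakdown(100.0, 0.40, 1.86, 0.66)  # high end of each range

for label, b in (("low markups", low), ("high markups", high)):
    print(label, {k: round(v) for k, v in shares_of_list(b).items()})
```

Under this reading, a $100 component cost turns into a list price of roughly $300 to $665, and the component share of 15% to 33% quoted on the slide falls out of the arithmetic.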
41
Chip Prices (August 1993)
Assume purchase of 10,000 units
Chip        Area (mm2)   Mfg. cost   Price   Multiplier   Comment
386DX       43           $9                               Intense Competition
486DX2      81           $35                              No Competition
PowerPC                  $77
DEC Alpha   234          $202                             Recoup R&D?
Pentium     296          $473                             Early in shipments
42
Summary: Price vs. Cost
43
Power/Energy
- Lead processor power increases every generation
- Compactions provide higher performance at lower power
Source: Intel
44
Energy/Power
Power dissipation: rate at which energy is taken from the supply (power source) and transformed into heat
$$ P = E / t $$
Energy dissipation for a given instruction depends upon the type of instruction (and the state of the processor)
$$ P = \frac{1}{\text{CPU time}} \times \sum_{i=1}^{n} E_i \times I_i $$
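The instruction-level power formula sums the energy of each instruction class, weighted by how many instructions of that class run, and divides by CPU time. In the sketch below, the energy-per-instruction figures, instruction counts, and run time are all hypothetical values chosen only to make the arithmetic easy to follow.

```python
def total_energy(energy_per_instr_j: dict, instr_counts: dict) -> float:
    """Sum of E_i * I_i over instruction classes, in joules."""
    return sum(energy_per_instr_j[cls] * count for cls, count in instr_counts.items())

def average_power(energy_per_instr_j: dict, instr_counts: dict, cpu_time_s: float) -> float:
    """Average power in watts: total energy divided by CPU time."""
    return total_energy(energy_per_instr_j, instr_counts) / cpu_time_s

# Hypothetical energy per instruction (joules) and instruction counts.
energy_per_instr_j = {"alu": 1e-9, "load_store": 2e-9}
instr_counts = {"alu": 6e8, "load_store": 4e8}
cpu_time_s = 2.0

print(f"energy: {total_energy(energy_per_instr_j, instr_counts):.2f} J")
print(f"power:  {average_power(energy_per_instr_j, instr_counts, cpu_time_s):.2f} W")
```

Running the same instructions in half the time would double the average power while leaving the energy unchanged, which is why energy per task and power are reported separately.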
45
Transistors and Wires
The University of Adelaide, School of Computer Science, 8 November 2018
Trends in Technology
Feature size: minimum size of a transistor or wire in the x or y dimension
- 10 microns in 1971 to 0.032 microns in 2011
- Transistor performance scales linearly
- Wire delay does not improve with feature size!
- Integration density scales quadratically
Linear performance and quadratic density growth present a challenge and an opportunity, creating the need for the computer architect!
Copyright © 2012, Elsevier Inc. All rights reserved. Chapter 2 — Instructions: Language of the Computer
46
Power and Energy
Trends in Power and Energy
Problem: get power in, get power out
Thermal Design Power (TDP)
- Characterizes sustained power consumption
- Used as target for power supply and cooling system
- Lower than peak power, higher than average power consumption
Clock rate can be reduced dynamically to limit power consumption
Energy per task is often a better measurement
47
Dynamic Energy and Power
Dynamic energy
- Transistor switch from 0 -> 1 or 1 -> 0
- ½ x Capacitive load x Voltage^2
Dynamic power
- ½ x Capacitive load x Voltage^2 x Frequency switched
Reducing clock rate reduces power, not energy
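The claim that reducing clock rate reduces power but not energy follows from the two formulas: a task that needs a fixed number of cycles takes longer at a lower frequency, and the longer run time cancels the lower power. The sketch below illustrates this with hypothetical values for capacitive load, voltage, frequency, and cycle count; it also assumes every cycle switches the same load, which is a simplification.

```python
def dynamic_energy(cap_load_f: float, voltage_v: float) -> float:
    """Energy of one switching event: 1/2 x C x V^2 (joules)."""
    return 0.5 * cap_load_f * voltage_v ** 2

def dynamic_power(cap_load_f: float, voltage_v: float, freq_hz: float) -> float:
    """Power: 1/2 x C x V^2 x frequency switched (watts)."""
    return dynamic_energy(cap_load_f, voltage_v) * freq_hz

def task_energy(cap_load_f: float, voltage_v: float, freq_hz: float, cycles: float) -> float:
    """Energy for a task of `cycles` cycles: power multiplied by run time."""
    run_time_s = cycles / freq_hz
    return dynamic_power(cap_load_f, voltage_v, freq_hz) * run_time_s

# Hypothetical operating point.
C, V, F, CYCLES = 1e-9, 1.0, 2e9, 1e9

print("power at F:    ", dynamic_power(C, V, F), "W")
print("power at F/2:  ", dynamic_power(C, V, F / 2), "W")
print("energy at F:   ", task_energy(C, V, F, CYCLES), "J")
print("energy at F/2: ", task_energy(C, V, F / 2, CYCLES), "J")

# Lowering voltage is what reduces energy: it enters as V squared.
ratio = dynamic_energy(C, 0.85 * V) / dynamic_energy(C, V)
print(f"energy at 85% voltage: {100 * ratio:.1f}% of original")
```

Halving the frequency halves the power but leaves the task energy the same, whereas a 15% voltage reduction cuts switching energy to about 72% of the original.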
48
Power
- Intel 80386 consumed ~2 W
- 3.3 GHz Intel Core i7 consumes 130 W
- Heat must be dissipated from a 1.5 x 1.5 cm chip
- This is the limit of what can be cooled by air
49
Reducing Power
Techniques for reducing power:
- Do nothing well
- Dynamic Voltage-Frequency Scaling
- Low power state for DRAM, disks
- Overclocking, turning off cores
50
Static Power
Static power consumption
- Current_static x Voltage
- Scales with number of transistors
- To reduce: power gating
Race-to-halt
The new primary evaluation for design innovation
- Tasks per joule
- Performance per watt
51
Trends in Cost
Cost driven down by learning curve
- Yield
DRAM: price closely tracks cost
Microprocessors: price depends on volume
- 10% less for each doubling of volume
52
Integrated Circuit Cost
Integrated circuit
Bose-Einstein formula:
$$ \text{Die yield} = \text{Wafer yield} \times \frac{1}{(1 + \text{Defects per unit area} \times \text{Die area})^{N}} $$
- Defects per unit area: measured in defects per square cm (2010)
- N = process-complexity factor (40 nm, 2010)
The manufacturing process dictates the wafer cost, wafer yield and defects per unit area
The architect’s design affects the die area, which in turn affects the defects and cost per die
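To see how die area drives cost, the yield model can be combined with an estimate of dies per wafer. The sketch below assumes the Bose-Einstein yield form, Die yield = Wafer yield x 1 / (1 + Defects per unit area x Die area)^N, together with a common textbook approximation for dies per wafer. Every numeric value (wafer diameter, wafer cost, defect density, N) is a hypothetical placeholder, not a figure from the slide.

```python
import math

def dies_per_wafer(wafer_diameter_cm: float, die_area_cm2: float) -> int:
    """Common approximation: wafer area / die area, minus dies lost at the edge."""
    radius = wafer_diameter_cm / 2.0
    gross = math.pi * radius ** 2 / die_area_cm2
    edge_loss = math.pi * wafer_diameter_cm / math.sqrt(2.0 * die_area_cm2)
    return int(gross - edge_loss)

def die_yield(wafer_yield: float, defects_per_cm2: float,
              die_area_cm2: float, n: float) -> float:
    """Bose-Einstein yield model (assumed form, see the text above)."""
    return wafer_yield / (1.0 + defects_per_cm2 * die_area_cm2) ** n

def cost_per_die(wafer_cost: float, wafer_diameter_cm: float, die_area_cm2: float,
                 wafer_yield: float, defects_per_cm2: float, n: float) -> float:
    """Wafer cost spread over the good dies on the wafer."""
    good_dies = dies_per_wafer(wafer_diameter_cm, die_area_cm2) * die_yield(
        wafer_yield, defects_per_cm2, die_area_cm2, n)
    return wafer_cost / good_dies

# Hypothetical process parameters, for illustration only.
params = dict(wafer_cost=5000.0, wafer_diameter_cm=30.0,
              wafer_yield=1.0, defects_per_cm2=0.03, n=13.5)

for area in (1.0, 2.25, 4.0):
    print(f"die area {area} cm^2: ${cost_per_die(die_area_cm2=area, **params):.2f} per die")
```

With these placeholder numbers, cost per die rises much faster than die area, because a larger die both reduces the number of dies per wafer and lowers the yield of each one.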
53
Dependability
Systems alternate between two states of service with respect to SLA/SLO:
1. Service accomplishment, where service is delivered as specified by SLA
2. Service interruption, where the delivered service is different from the SLA
Module reliability: “failure (F) = transition from 1 to 2” and “repair (R) = transition from 2 to 1”
- Mean time to failure (MTTF)
- Mean time to repair (MTTR)
- Mean time between failures (MTBF) = MTTF + MTTR
- Availability = MTTF / MTBF
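The availability definition translates directly into code. The sketch below uses a hypothetical module with an MTTF of 1,000,000 hours and an MTTR of 24 hours; those values and the helper names are illustrative assumptions.

```python
HOURS_PER_YEAR = 24 * 365

def mtbf(mttf_hours: float, mttr_hours: float) -> float:
    """Mean time between failures = MTTF + MTTR."""
    return mttf_hours + mttr_hours

def availability(mttf_hours: float, mttr_hours: float) -> float:
    """Fraction of time the service is delivered as specified."""
    return mttf_hours / mtbf(mttf_hours, mttr_hours)

def expected_downtime_hours_per_year(mttf_hours: float, mttr_hours: float) -> float:
    """Expected hours per year spent in the service-interruption state."""
    return (1.0 - availability(mttf_hours, mttr_hours)) * HOURS_PER_YEAR

# Hypothetical module.
a = availability(1_000_000, 24)
print(f"availability: {a:.6f}")
print(f"downtime: {expected_downtime_hours_per_year(1_000_000, 24):.2f} hours/year")
```

Because MTTR appears only in the denominator, shortening repair time improves availability just as lengthening MTTF does.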
54
Summary, #1
Designing to Last through Trends
        Capacity        Speed
Logic   2x in 3 years   2x in 3 years
SPEC RATING: 2x in 1.5 years
DRAM    4x in 3 years   2x in 10 years
Disk    4x in 3 years   2x in 10 years
6 yrs to graduate => 16X CPU speed, DRAM/Disk size
Time to run the task
- Execution time, response time, latency
Tasks per day, hour, week, sec, ns, …
- Throughput, bandwidth
“X is n times faster than Y” means
$$ n = \frac{\text{ExTime}(Y)}{\text{ExTime}(X)} = \frac{\text{Performance}(X)}{\text{Performance}(Y)} $$
55
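Summary, #2
Amdahl’s Law:
$$ \text{Speedup}_{overall} = \frac{\text{ExTime}_{old}}{\text{ExTime}_{new}} = \frac{1}{(1 - \text{Fraction}_{enhanced}) + \dfrac{\text{Fraction}_{enhanced}}{\text{Speedup}_{enhanced}}} $$
CPI Law:
$$ \text{CPU time} = \frac{\text{Seconds}}{\text{Program}} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Cycles}}{\text{Instruction}} \times \frac{\text{Seconds}}{\text{Cycle}} $$
Execution time is the REAL measure of computer performance!
Good products are created when we have good benchmarks and good ways to summarize performance
A different set of metrics applies to embedded systems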