Measurement & Evaluation

Presentation transcript:

Measurement & Evaluation
Course Focus: understanding the design techniques, machine structures, technology factors, and evaluation methods that will determine the form of programmable processors in the 21st century.
[Diagram: Computer Architecture (instruction set design, organization, hardware) at the center, surrounded by Technology, Programming Languages, Operating Systems, History, Applications, Interface Design (ISA), and Measurement & Evaluation.]

1988 Computer Food Chain: PC, Workstation, Minicomputer, Mainframe, Mini-supercomputer, Supercomputer, Massively Parallel Processors.

1997 Computer Food Chain: PDA, PC, Workstation, Server, Mainframe, Supercomputer, Minicomputer, Mini-supercomputer, Massively Parallel Processors.

Why Such Change?
Performance:
- Technology advances: CMOS VLSI dominates older technologies (TTL, ECL) in cost AND performance and is progressing rapidly.
- Computer architecture advances improve the low end: RISC, superscalar, RAID, …
Price: lower costs due to …
- Simpler development. CMOS VLSI: smaller systems, fewer components.
- Higher volumes. CMOS VLSI: same device cost 10,000 vs. 10,000,000 units.
- Lower margins by class of computer, due to fewer services.
Function:
- Rise of networking/local interconnection technology.

Technology Trends: Microprocessor Capacity
Transistor counts: Alpha 21264: 15 million; Alpha 21164: 9.3 million; PowerPC 620: 6.9 million; Pentium Pro: 5.5 million; Sparc Ultra: 5.2 million.
Moore's Law. CMOS improvements: die size 2X every 3 years; line width halves every 7 years.
ISSCC 2000: 25M+ transistor processors (Intel).

Memory Capacity (Single Chip DRAM)
Year | Size (Mb) | Cycle time
1980 | 0.0625 | 250 ns
1983 | 0.25 | 220 ns
1986 | 1 | 190 ns
1989 | 4 | 165 ns
1992 | 16 | 145 ns
1996 | 64 | 120 ns
2000 | 256 | 100 ns

Technology Trends (Summary)
Technology | Capacity | Speed (latency)
Logic | 2x in 3 years | 2x in 3 years
DRAM | 4x in 3 years | 2x in 10 years
Disk | 4x in 3 years | 2x in 10 years

Processor frequency trend
Frequency doubles each generation. The number of gates per clock is reduced by 25%.

Processor Performance Trends
[Figure: performance on a log scale (0.1 to 1000) versus year (1965 to 2000) for supercomputers, mainframes, minicomputers, and microprocessors.]

Processor Performance (1.35X before, 1.55X now)
[Figure: processor performance over time, with a trend line annotated 1.54X/yr.]

Performance Trends (Summary)
Workstation performance (measured in SPECmarks) improves roughly 50% per year (2X every 18 months).
Improvement in cost-performance is estimated at 70% per year.
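
The growth rates quoted above can be converted into doubling times with a compound-growth calculation. The sketch below is illustrative only; the function names are hypothetical and not from the slides. Under compound growth, 50% per year doubles in about 20 months, so "2X every 18 months" reads as a rounded figure (doubling every 18 months corresponds to roughly 59% per year).

```python
import math


def years_to_double(annual_growth):
    """Years needed to double at a compound annual growth rate (0.5 means 50%)."""
    return math.log(2) / math.log(1 + annual_growth)


def annual_growth_for_doubling(years):
    """Compound annual growth rate implied by doubling every `years` years."""
    return 2 ** (1 / years) - 1


if __name__ == "__main__":
    # Rates taken from the summary slide: 50%/yr performance, 70%/yr cost-performance.
    for rate in (0.50, 0.70):
        months = years_to_double(rate) * 12
        print(f"{rate:.0%} per year doubles in about {months:.1f} months")
    print(f"2X every 18 months is about {annual_growth_for_doubling(1.5):.0%} per year")
```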

A glimpse into the future: Silicon in 2010
Die area: 2.5 x 2.5 cm. Voltage: 0.6 - 0.9 V. Technology: 0.07 µm.
15 times denser than today; 2.5 times the power density; 5 times the clock rate.

What is the next wave?
Source: Richard Newton

The Embedded Processor
What? A programmable processor whose programming interface is not accessible to the end user of the product. The only user interaction is through the actual application.
Examples:
- Sharp PDAs are encapsulated products with fixed functionality.
- 3Com Palm Pilots were originally intended as embedded systems. Opening up the programmer's interface turned them into more generic computer systems.

Some interesting numbers
- The Intel 4004 was intended for an embedded application (a calculator).
- Of today's microprocessors, 95% go into embedded applications.
- SH3/4 (Hitachi): best-selling RISC microprocessor.
- 50% of microprocessor revenue stems from embedded systems.
- Often focused on a particular application area: microcontrollers, DSPs, media processors, graphics processors, network and communication processors.

Some different evaluation metrics
Components of cost: area of die / yield; code density (memory is the major part of die size); packaging; design effort; programming cost; time-to-market; reusability.
Metrics: power; cost; flexibility; performance as a functionality constraint ("Just-in-Time Computing").

The Secret of Architecture Design: Measurement and Evaluation
Architecture design is an iterative process: searching the space of possible designs, at all levels of computer systems.
[Diagram: creativity produces ideas; cost / performance analysis sorts them into good ideas, mediocre ideas, and bad ideas.]

Computer Architecture Topics
Input/Output and Storage: disks, WORM, tape; RAID; emerging technologies; interleaving; bus protocols.
Memory Hierarchy: DRAM; L2 cache; L1 cache; coherence, bandwidth, latency.
Instruction Set Architecture: addressing, protection, exception handling; VLSI.
Pipelining and Instruction Level Parallelism: pipelining, hazard resolution, superscalar, reordering, prediction, speculation, vector, VLIW, DSP, reconfiguration.

Computer Architecture Topics
Multiprocessors; networks and interconnections: shared memory, message passing, data parallelism; network interfaces; processor-memory-switch; topologies, routing, bandwidth, latency, reliability.
[Diagram: processor (P) and memory (M) pairs connected through switches (S) to an interconnection network.]

Computer Engineering Methodology
[Diagram: a design cycle.] Evaluate existing systems for bottlenecks (input: benchmarks); simulate new designs and organizations (inputs: workloads, technology trends); implement the next-generation system (input: implementation complexity); then evaluate again.
Implementation complexity analysis: how hard to build; importance of simplicity (wearing a seat belt); avoiding a personal disaster; theory vs. practice.

Measurement Tools
- Hardware: cost, delay, area, power estimation
- Benchmarks, traces, mixes
- Simulation (many levels): ISA, RT, gate, circuit
- Queuing theory
- Rules of thumb
- Fundamental “Laws”/Principles

Review: Performance, Cost, Power

Metric 1: Performance
Plane | DC to Paris | Speed | Passengers | Throughput (passenger-mile/hour)
Boeing 747 | 6.5 hours | 610 mph | 470 | 286,700
Concorde | 3 hours | 1350 mph | 132 | 178,200
Fastest for 1 person? Which takes less time to transport 470 passengers?
Time to run the task: execution time, response time, latency.
Tasks per day, hour, week, sec, ns, …: throughput, bandwidth.

The Performance Metric
"X is n times faster than Y" means
\[ n = \frac{\text{ExTime}(Y)}{\text{ExTime}(X)} = \frac{\text{Performance}(X)}{\text{Performance}(Y)} \]
Speed of Concorde vs. Boeing 747: 1350 / 610 = 2.2X
Throughput of Boeing 747 vs. Concorde: 286,700 / 178,200 = 1.6X
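
The two comparisons above can be reproduced with a few lines of arithmetic. This is a minimal sketch using the speeds and passenger counts from the airplane slide; the function names are hypothetical and chosen only for illustration.

```python
def throughput(speed_mph, passengers):
    """Throughput in passenger-miles per hour."""
    return speed_mph * passengers


def times_faster(performance_x, performance_y):
    """n such that 'X is n times faster than Y', from two performance figures."""
    return performance_x / performance_y


if __name__ == "__main__":
    boeing = {"speed": 610, "passengers": 470}
    concorde = {"speed": 1350, "passengers": 132}

    # Latency view: raw speed favors the Concorde.
    print(f"Speed: Concorde is {times_faster(concorde['speed'], boeing['speed']):.1f}X")

    # Throughput view: passenger-miles per hour favors the Boeing 747.
    t_boeing = throughput(boeing["speed"], boeing["passengers"])
    t_concorde = throughput(concorde["speed"], concorde["passengers"])
    print(f"Throughput: Boeing 747 is {times_faster(t_boeing, t_concorde):.1f}X")
```

The same two machines rank differently depending on whether the metric is time for one task or tasks per unit time, which is the point of the slide.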

Amdahl's Law
Speedup due to enhancement E:
\[ \text{Speedup}(E) = \frac{\text{ExTime w/o } E}{\text{ExTime w/ } E} = \frac{\text{Performance w/ } E}{\text{Performance w/o } E} \]
Suppose that enhancement E accelerates a fraction F of the task by a factor S, and the remainder of the task is unaffected.

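Amdahl’s Law
\[ \text{ExTime}_{\text{new}} = \text{ExTime}_{\text{old}} \times \left[ (1 - \text{Fraction}_{\text{enhanced}}) + \frac{\text{Fraction}_{\text{enhanced}}}{\text{Speedup}_{\text{enhanced}}} \right] \]
\[ \text{Speedup}_{\text{overall}} = \frac{\text{ExTime}_{\text{old}}}{\text{ExTime}_{\text{new}}} = \frac{1}{(1 - \text{Fraction}_{\text{enhanced}}) + \dfrac{\text{Fraction}_{\text{enhanced}}}{\text{Speedup}_{\text{enhanced}}}} \]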

Amdahl’s Law
Floating point instructions improved to run 2X, but only 10% of actual instructions are FP:
\[ \text{ExTime}_{\text{new}} = \text{ExTime}_{\text{old}} \times (0.9 + 0.1/2) = 0.95 \times \text{ExTime}_{\text{old}} \]
\[ \text{Speedup}_{\text{overall}} = \frac{1}{0.95} = 1.053 \]
Law of diminishing return: focus on the common case!
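
The floating-point example can be checked numerically. The sketch below implements the overall-speedup formula from Amdahl's Law as stated on the slides; the function and parameter names are hypothetical, and the second loop is an added illustration of the diminishing-return remark rather than slide content.

```python
def amdahl_speedup(fraction_enhanced, speedup_enhanced):
    """Overall speedup when a fraction of execution time is sped up by a given factor."""
    if not 0.0 <= fraction_enhanced <= 1.0:
        raise ValueError("fraction_enhanced must be between 0 and 1")
    new_time = (1.0 - fraction_enhanced) + fraction_enhanced / speedup_enhanced
    return 1.0 / new_time


if __name__ == "__main__":
    # Slide example: 10% of instructions are FP, and FP runs 2X faster.
    print(f"Overall speedup: {amdahl_speedup(0.10, 2.0):.3f}")

    # Diminishing returns: even a huge FP speedup is capped near 1 / (1 - 0.10).
    for s in (2, 10, 100, 1_000_000):
        print(f"FP speedup {s:>9}: overall {amdahl_speedup(0.10, s):.4f}")
```

Because only 10% of the work is affected, the overall speedup can never exceed 1 / 0.9, about 1.11, no matter how fast the floating-point unit becomes.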

Metrics of Performance
[Diagram: levels of the system stack, each paired with its typical metric.]
Application: answers per month, operations per second.
Programming language, compiler, ISA: (millions of) instructions per second, MIPS; (millions of) (FP) operations per second, MFLOP/s.
Datapath, control: megabytes per second.
Function units, transistors, wires, pins: cycles per second (clock rate).

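Aspects of CPU Performance
\[ \text{CPU time} = \frac{\text{Seconds}}{\text{Program}} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Cycles}}{\text{Instruction}} \times \frac{\text{Seconds}}{\text{Cycle}} \]
Factor | Inst Count | CPI | Clock Rate
Program | X | |
Compiler | X | (X) |
Inst. Set | X | X |
Organization | | X | X
Technology | | | X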

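Cycles Per Instruction
“Average Cycles per Instruction”:
\[ \text{CPI} = \frac{\text{Cycles}}{\text{Instruction Count}} = \frac{\text{CPU Time} \times \text{Clock Rate}}{\text{Instruction Count}} \]
\[ \text{CPU time} = \text{CycleTime} \times \sum_{i=1}^{n} \text{CPI}_i \times I_i \]
“Instruction Frequency”:
\[ \text{CPI} = \sum_{i=1}^{n} \text{CPI}_i \times F_i \quad \text{where} \quad F_i = \frac{I_i}{\text{Instruction Count}} \]
Invest resources where time is spent!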

Example: Calculating CPI
Base Machine (Reg / Reg), typical mix:
Op | Freq | CPI_i | CPI_i*F_i | (% Time)
ALU | 50% | 1 | .5 | (33%)
Load | 20% | 2 | .4 | (27%)
Store | 10% | 2 | .2 | (13%)
Branch | 20% | 2 | .4 | (27%)
Total | | | 1.5 |
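
The table can be recomputed directly from the instruction mix. The following sketch uses the frequencies and per-class CPI values from the example; the instruction count and clock rate in the last step are hypothetical values chosen only to show the CPU time equation, and the function names are illustrative.

```python
def average_cpi(mix):
    """Weighted CPI: sum over classes of CPI_i * F_i. `mix` maps op -> (freq, cpi)."""
    return sum(freq * cpi for freq, cpi in mix.values())


def time_share(mix):
    """Fraction of execution time spent in each instruction class."""
    total = average_cpi(mix)
    return {op: freq * cpi / total for op, (freq, cpi) in mix.items()}


def cpu_time(instruction_count, cpi, clock_rate_hz):
    """CPU time = instructions x cycles/instruction x seconds/cycle."""
    return instruction_count * cpi / clock_rate_hz


if __name__ == "__main__":
    # Typical mix from the example slide: op -> (frequency, cycles per instruction).
    mix = {"ALU": (0.5, 1), "Load": (0.2, 2), "Store": (0.1, 2), "Branch": (0.2, 2)}

    cpi = average_cpi(mix)
    print(f"CPI = {cpi:.2f}")
    for op, share in time_share(mix).items():
        print(f"{op:<6} {share:.0%} of time")

    # Hypothetical program: 1e9 instructions on a 1 GHz clock (assumed values).
    print(f"CPU time = {cpu_time(1e9, cpi, 1e9):.2f} s")
```

Note that ALU operations are half of all instructions but only a third of the time, which is why the slide advises investing resources where time is spent.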

Creating Benchmark Sets
- Real programs
- Kernels
- Toy benchmarks
- Synthetic benchmarks, e.g., Whetstone and Dhrystone

SPEC: System Performance Evaluation Cooperative
First Round, 1989: 10 programs yielding a single number (“SPECmarks”).
Second Round, 1992: SPECInt92 (6 integer programs) and SPECfp92 (14 floating point programs). Compiler flags unlimited. March 93 flags for the DEC 4000 Model 610:
spice: unix.c:/def=(sysv,has_bcopy,”bcopy(a,b,c)= memcpy(b,a,c)”
wave5: /ali=(all,dcom=nat)/ag=a/ur=4/ur=200
nasa7: /norecu/ag=a/ur=4/ur2=200/lc=blas
Third Round, 1995: new set of programs, SPECint95 (8 integer programs) and SPECfp95 (10 floating point); “benchmarks useful for 3 years”. Single flag setting for all programs: SPECint_base95, SPECfp_base95.

How to Summarize Performance
Arithmetic mean (weighted arithmetic mean) tracks execution time: $\frac{1}{n}\sum_i T_i$ or $\sum_i W_i T_i$.
Harmonic mean (weighted harmonic mean) of rates (e.g., MFLOPS) tracks execution time: $n / \sum_i (1/R_i)$ or $1 / \sum_i (W_i / R_i)$, with weights $W_i$ that sum to 1.
Normalized execution time is handy for scaling performance (e.g., X times faster than SPARCstation 10).
The arithmetic mean of normalized execution times is affected by the choice of reference machine.
Use the geometric mean for comparison: $\left(\prod_i T_i\right)^{1/n}$. It is independent of the chosen reference machine but is not a good metric for total execution time.
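
The effect of the reference machine can be seen with a small experiment. The sketch below uses made-up execution times for two hypothetical machines, A and B, on two programs (these numbers are assumptions, not from the slides) and compares the arithmetic and geometric means of normalized times.

```python
import math


def arithmetic_mean(values):
    return sum(values) / len(values)


def harmonic_mean(rates):
    """n / sum(1/R_i); appropriate for rates such as MFLOPS."""
    return len(rates) / sum(1.0 / r for r in rates)


def geometric_mean(values):
    return math.prod(values) ** (1.0 / len(values))


def normalize(times, reference):
    """Execution times divided, program by program, by a reference machine's times."""
    return [t / r for t, r in zip(times, reference)]


if __name__ == "__main__":
    # Hypothetical execution times in seconds for two programs (assumed values).
    times_a = [1.0, 1000.0]
    times_b = [10.0, 100.0]

    # Arithmetic mean of normalized times flips with the reference machine.
    print("AM, B normalized to A:", arithmetic_mean(normalize(times_b, times_a)))
    print("AM, A normalized to B:", arithmetic_mean(normalize(times_a, times_b)))

    # Geometric mean gives the same verdict either way (both machines tie here).
    print("GM, B normalized to A:", geometric_mean(normalize(times_b, times_a)))
    print("GM, A normalized to B:", geometric_mean(normalize(times_a, times_b)))

    # Total time disagrees with the geometric mean: A takes 1001 s, B takes 110 s.
    print("Total time A, B:", sum(times_a), sum(times_b))
```

With these numbers each machine looks about five times slower than the other under the arithmetic mean of normalized times, the geometric mean calls it a tie, and total execution time clearly favors B, which matches the caveats listed on the slide.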

SPEC First Round
One program: 99% of time in a single line of code. A new front-end compiler could improve it dramatically.
[Figure: IBM Powerstation 550 results for 2 different compilers.]

Impact of Means on SPECmark89 for IBM 550 (without and with special compiler option)
Program | Ratio to VAX (Before) | Ratio to VAX (After) | Time (Before) | Time (After) | Weighted Time (Before) | Weighted Time (After)
gcc | 30 | 29 | 49 | 51 | 8.91 | 9.22
espresso | 35 | 34 | 65 | 67 | 7.64 | 7.86
spice | 47 | 47 | 510 | 510 | 5.69 | 5.69
doduc | 46 | 49 | 41 | 38 | 5.81 | 5.45
nasa7 | 78 | 144 | 258 | 140 | 3.43 | 1.86
li | 34 | 34 | 183 | 183 | 7.86 | 7.86
eqntott | 40 | 40 | 28 | 28 | 6.68 | 6.68
matrix300 | 78 | 730 | 58 | 6 | 3.43 | 0.37
fpppp | 90 | 87 | 34 | 35 | 2.97 | 3.07
tomcatv | 33 | 138 | 20 | 19 | 2.01 | 1.94
Mean | 54 | 72 | 124 | 108 | 54.42 | 49.99
Type of mean: geometric (Ratio to VAX), arithmetic (Time), weighted arithmetic (Weighted Time).
Ratio of the two means: 1.33 (geometric), 1.16 (arithmetic), 1.09 (weighted arithmetic).

Performance Evaluation
“For better or worse, benchmarks shape a field.”
Good products are created when you have: good benchmarks; good ways to summarize performance.
Given that sales is in part a function of performance relative to the competition, companies invest in improving the product as reported by the performance summary.
If the benchmarks/summary are inadequate, then the choice is between improving the product for real programs vs. improving the product to get more sales; sales almost always wins!
Execution time is the measure of computer performance!

Integrated Circuits Costs
Die cost goes roughly with $(\text{die area})^4$.

Real World Examples
Chip | Metal layers | Line width | Wafer cost | Defects/cm^2 | Area (mm^2) | Dies/wafer | Yield | Die cost
386DX | 2 | 0.90 | $900 | 1.0 | 43 | 360 | 71% | $4
486DX2 | 3 | 0.80 | $1200 | 1.0 | 81 | 181 | 54% | $12
PowerPC 601 | 4 | 0.80 | $1700 | 1.3 | 121 | 115 | 28% | $53
HP PA 7100 | 3 | 0.80 | $1300 | 1.0 | 196 | 66 | 27% | $73
DEC Alpha | 3 | 0.70 | $1500 | 1.2 | 234 | 53 | 19% | $149
SuperSPARC | 3 | 0.70 | $1700 | 1.6 | 256 | 48 | 13% | $272
Pentium | 3 | 0.80 | $1500 | 1.5 | 296 | 40 | 9% | $417
From "Estimating IC Manufacturing Costs," by Linley Gwennap, Microprocessor Report, August 2, 1993, p. 15.
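
The die cost column is consistent with a simple relation: die cost = wafer cost / (dies per wafer x yield). The slide does not state this formula in the transcript, so treat it as an assumption; the sketch below checks it against every row of the table. The function name is hypothetical.

```python
def die_cost(wafer_cost, dies_per_wafer, die_yield):
    """Cost per good die, assuming cost = wafer cost / (dies per wafer * yield)."""
    return wafer_cost / (dies_per_wafer * die_yield)


# Rows from the table: chip -> (wafer cost $, dies per wafer, yield, quoted die cost $).
CHIPS = {
    "386DX": (900, 360, 0.71, 4),
    "486DX2": (1200, 181, 0.54, 12),
    "PowerPC 601": (1700, 115, 0.28, 53),
    "HP PA 7100": (1300, 66, 0.27, 73),
    "DEC Alpha": (1500, 53, 0.19, 149),
    "SuperSPARC": (1700, 48, 0.13, 272),
    "Pentium": (1500, 40, 0.09, 417),
}

if __name__ == "__main__":
    for chip, (wafer, dies, yld, quoted) in CHIPS.items():
        computed = die_cost(wafer, dies, yld)
        print(f"{chip:<12} computed ${computed:7.2f}   table ${quoted}")
```

The larger dies lose twice: fewer fit on a wafer, and a smaller share of them work, which is why die cost grows much faster than die area.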

Cost/Performance: What is the Relationship of Cost to Price?
- Component costs.
- Direct costs (add 25% to 40%): recurring costs such as labor, purchasing, scrap, warranty.
- Gross margin (add 82% to 186%): non-recurring costs such as R&D, equipment maintenance, rental, marketing, sales, financing cost, pretax profits, taxes.
- Average discount to get list price (add 33% to 66%): volume discounts and/or retailer markup.
Share of list price: average discount 25% to 40%; gross margin 34% to 39%; direct cost 6% to 8%; component cost 15% to 33%.
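
The "add X%" figures and the "share of list price" figures describe the same build-up from two directions. A minimal sketch, assuming each markup is applied to the running total of the previous stages (an interpretation, not something the slide states explicitly), reproduces the quoted shares at both ends of the ranges. Names are hypothetical.

```python
def price_breakdown(component_cost, direct_add, margin_add, discount_add):
    """Build list price from component cost; each `*_add` is a fractional markup
    applied to the running total. Returns list price and each stage's share of it."""
    direct = component_cost * direct_add
    margin = (component_cost + direct) * margin_add
    selling_price = component_cost + direct + margin
    discount = selling_price * discount_add
    list_price = selling_price + discount
    return {
        "list_price": list_price,
        "component": component_cost / list_price,
        "direct": direct / list_price,
        "margin": margin / list_price,
        "discount": discount / list_price,
    }


if __name__ == "__main__":
    # Low and high ends of the markup ranges quoted on the slide.
    for label, adds in (("low", (0.25, 0.82, 0.33)), ("high", (0.40, 1.86, 0.66))):
        b = price_breakdown(100.0, *adds)
        print(f"{label:>4}: list ${b['list_price']:.0f}  component {b['component']:.0%}  "
              f"direct {b['direct']:.0%}  margin {b['margin']:.0%}  discount {b['discount']:.0%}")
```

Under this reading, list price ends up between roughly 3 and 6.6 times the component cost.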

Chip Prices (August 1993), assuming purchase of 10,000 units
Chip | Area (mm^2) | Mfg. cost | Price | Multiplier | Comment
386DX | 43 | $9 | $31 | 3.4 | Intense competition
486DX2 | 81 | $35 | $245 | 7.0 | No competition
PowerPC 601 | 121 | $77 | $280 | 3.6 |
DEC Alpha | 234 | $202 | $1231 | 6.1 | Recoup R&D?
Pentium | 296 | $473 | $965 | 2.0 | Early in shipments

Summary: Price vs. Cost

Power/Energy
Lead processor power increases every generation. Compactions provide higher performance at lower power.
Source: Intel

Energy/Power
Power dissipation: the rate at which energy is taken from the supply (power source) and transformed into heat:
\[ P = E / t \]
Energy dissipation for a given instruction depends upon the type of instruction (and the state of the processor):
\[ P = \frac{1}{\text{CPU Time}} \times \sum_{i=1}^{n} E_i \times I_i \]
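
As a rough illustration of the second formula, average power can be computed from per-class instruction energies and counts. All numbers below are hypothetical, chosen only to make the arithmetic easy to follow, and the function name is illustrative.

```python
def average_power(energy_per_instruction, instruction_counts, cpu_time):
    """P = (1 / CPU time) * sum(E_i * I_i), with E_i in joules and CPU time in seconds."""
    total_energy = sum(e * n for e, n in zip(energy_per_instruction, instruction_counts))
    return total_energy / cpu_time


if __name__ == "__main__":
    # Hypothetical values: two instruction classes costing 1 nJ and 2 nJ each.
    energies = [1e-9, 2e-9]   # joules per instruction (assumed)
    counts = [1e9, 5e8]       # instructions executed per class (assumed)
    cpu_time = 2.0            # seconds (assumed)
    print(f"Average power: {average_power(energies, counts, cpu_time):.2f} W")
```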

The University of Adelaide, School of Computer Science, 8 November 2018
Transistors and Wires (Trends in Technology)
Feature size: minimum size of transistor or wire in x or y dimension; 10 microns in 1971 to 0.032 microns in 2011.
Transistor performance scales linearly. Wire delay does not improve with feature size!
Integration density scales quadratically.
Linear performance and quadratic density growth present a challenge and opportunity, creating the need for computer architects!
Copyright © 2012, Elsevier Inc. All rights reserved. Chapter 2 — Instructions: Language of the Computer

Power and Energy (Trends in Power and Energy)
Problem: get power in, get power out.
Thermal Design Power (TDP): characterizes sustained power consumption; used as the target for the power supply and cooling system; lower than peak power, higher than average power consumption.
Clock rate can be reduced dynamically to limit power consumption.
Energy per task is often a better measurement.

Dynamic Energy and Power (Trends in Power and Energy)
Dynamic energy (transistor switch from 0 -> 1 or 1 -> 0):
\[ \tfrac{1}{2} \times \text{Capacitive load} \times \text{Voltage}^2 \]
Dynamic power:
\[ \tfrac{1}{2} \times \text{Capacitive load} \times \text{Voltage}^2 \times \text{Frequency switched} \]
Reducing clock rate reduces power, not energy.
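
The last remark follows directly from the two formulas: frequency appears in the power expression but not in the energy expression. A small sketch with hypothetical capacitance, voltage, and frequency values (assumptions, not slide data) makes this concrete.

```python
def dynamic_energy(capacitive_load, voltage):
    """Energy per transition: 1/2 * C * V^2 (joules, with C in farads, V in volts)."""
    return 0.5 * capacitive_load * voltage ** 2


def dynamic_power(capacitive_load, voltage, frequency):
    """Power: 1/2 * C * V^2 * f (watts, with f in switches per second)."""
    return dynamic_energy(capacitive_load, voltage) * frequency


if __name__ == "__main__":
    c, v, f = 1e-9, 1.0, 1e9  # hypothetical: 1 nF, 1 V, 1 GHz

    print(f"Power at f:     {dynamic_power(c, v, f):.3f} W")
    print(f"Power at f / 2: {dynamic_power(c, v, f / 2):.3f} W")   # halves
    print(f"Energy per switch: {dynamic_energy(c, v):.2e} J")      # unchanged by f

    # Lowering voltage reduces both, quadratically.
    ratio = dynamic_energy(c, 0.85 * v) / dynamic_energy(c, v)
    print(f"Energy at 85% voltage: {ratio:.2%} of original")
```

Halving the clock halves power, but the same task then takes twice as long, so the energy for the task is unchanged; only a lower voltage (or less switched capacitance) reduces energy.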

Power (Trends in Power and Energy)
The Intel 80386 consumed ~2 W. A 3.3 GHz Intel Core i7 consumes 130 W.
Heat must be dissipated from a 1.5 x 1.5 cm chip. This is the limit of what can be cooled by air.

Reducing Power (Trends in Power and Energy)
Techniques for reducing power:
- Do nothing well
- Dynamic voltage-frequency scaling
- Low power state for DRAM, disks
- Overclocking, turning off cores

Static Power (Trends in Power and Energy)
Static power consumption: $\text{Current}_{\text{static}} \times \text{Voltage}$. Scales with the number of transistors. To reduce: power gating.
Race-to-halt.
The new primary evaluation for design innovation: tasks per joule; performance per watt.

Trends in Cost
Cost driven down by the learning curve (yield).
DRAM: price closely tracks cost.
Microprocessors: price depends on volume; 10% less for each doubling of volume.

Integrated Circuit Cost (Trends in Cost)
Integrated circuit. Bose-Einstein formula:
Defects per unit area = 0.016-0.057 defects per square cm (2010).
N = process-complexity factor = 11.5-15.5 (40 nm, 2010).
The manufacturing process dictates the wafer cost, wafer yield, and defects per unit area.
The architect’s design affects the die area, which in turn affects the defects and cost per die.
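
The formula itself did not survive in the transcript. One common textbook form of the Bose-Einstein yield model, assumed here rather than taken from the slide, is die yield = wafer yield x 1 / (1 + defects per unit area x die area)^N. The sketch below evaluates it for values inside the ranges quoted above; the die size, wafer yield, and function name are hypothetical.

```python
def die_yield(wafer_yield, defects_per_cm2, die_area_cm2, n):
    """Assumed Bose-Einstein model: wafer_yield / (1 + defect density * die area) ** N."""
    return wafer_yield / (1.0 + defects_per_cm2 * die_area_cm2) ** n


if __name__ == "__main__":
    # Hypothetical inputs: 100% wafer yield, 0.031 defects/cm^2, N = 13.5
    # (both within the 2010 ranges quoted on the slide), for two die sizes.
    for side_cm in (1.0, 1.5):
        area = side_cm * side_cm
        y = die_yield(1.0, 0.031, area, 13.5)
        print(f"{side_cm} cm x {side_cm} cm die: yield {y:.2f}")
```

Under this model, growing the die from 1.0 cm to 1.5 cm on a side drops the yield from about two thirds to about 40%, which illustrates how the architect's choice of die area feeds into cost per die.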

Dependability
Systems alternate between two states of service with respect to the SLA/SLO:
1. Service accomplishment, where service is delivered as specified by the SLA.
2. Service interruption, where the delivered service is different from the SLA.
Module reliability: “failure (F) = transition from 1 to 2” and “repair (R) = transition from 2 to 1”.
Mean time to failure (MTTF). Mean time to repair (MTTR). Mean time between failures (MTBF) = MTTF + MTTR.
Availability = MTTF / MTBF
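
The availability definition can be applied directly. The following sketch uses a hypothetical module with an MTTF of 1,000,000 hours and an MTTR of 24 hours; those figures and the function names are assumptions for illustration.

```python
def mtbf(mttf_hours, mttr_hours):
    """Mean time between failures = MTTF + MTTR."""
    return mttf_hours + mttr_hours


def availability(mttf_hours, mttr_hours):
    """Fraction of time the service is delivered as specified: MTTF / MTBF."""
    return mttf_hours / mtbf(mttf_hours, mttr_hours)


if __name__ == "__main__":
    mttf, mttr = 1_000_000, 24  # hypothetical values, in hours
    a = availability(mttf, mttr)
    downtime_minutes_per_year = (1 - a) * 365 * 24 * 60
    print(f"Availability: {a:.6f}")
    print(f"Expected downtime: about {downtime_minutes_per_year:.1f} minutes per year")
```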

Summary, #1
Designing to last through trends:
Technology | Capacity | Speed
Logic | 2x in 3 years | 2x in 3 years
DRAM | 4x in 3 years | 2x in 10 years
Disk | 4x in 3 years | 2x in 10 years
SPEC rating: 2x in 1.5 years.
6 years to graduate => 16X CPU speed, DRAM/disk size.
Time to run the task: execution time, response time, latency.
Tasks per day, hour, week, sec, ns, …: throughput, bandwidth.
“X is n times faster than Y” means
\[ n = \frac{\text{ExTime}(Y)}{\text{ExTime}(X)} = \frac{\text{Performance}(X)}{\text{Performance}(Y)} \]

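Summary, #2
Amdahl’s Law:
\[ \text{Speedup}_{\text{overall}} = \frac{\text{ExTime}_{\text{old}}}{\text{ExTime}_{\text{new}}} = \frac{1}{(1 - \text{Fraction}_{\text{enhanced}}) + \dfrac{\text{Fraction}_{\text{enhanced}}}{\text{Speedup}_{\text{enhanced}}}} \]
CPI Law:
\[ \text{CPU time} = \frac{\text{Seconds}}{\text{Program}} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Cycles}}{\text{Instruction}} \times \frac{\text{Seconds}}{\text{Cycle}} \]
Execution time is the REAL measure of computer performance!
Good products are created when you have: good benchmarks, good ways to summarize performance.
A different set of metrics applies to embedded systems.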