Lecture 2: Intro to Computer Architecture


Lecture 2: Intro to Computer Architecture Michael B. Greenwald Computer Architecture CIS 501 Fall 1999

General Information
Class: TR 1:30-3, in LRSM Auditorium
Recitation: T 10:30-12 in Moore 225
Instructor: Professor Michael Greenwald
  Office: Moore (GRW), room 260
  Email: cis501@cis.upenn.edu
  Office hours: R 10:30-12 noon or by appt.
TA: Sotiris Ioannidis
  Office: Moore, room 102e
  Email: sotiris@dsl.cis.upenn.edu
  Office hours: TR 5-6 PM or by appt.
Secretary: Christine Metz
  Office: Moore, room 556

Outline
Review
Quantitative principles of computer design
Amdahl’s law
CPU performance equation
Quantitative measurements
Costs
Performance

Typos in HW 3c. New version on web page. D = defects/ Defects per layer

Technology Trends: Microprocessor Capacity
[Figure: microprocessor capacity following Moore’s Law, with a “Graduation Window” marked.]
Transistor counts: Alpha 21264: 15 million; Pentium Pro: 5.5 million; PowerPC 620: 6.9 million; Alpha 21164: 9.3 million; Sparc Ultra: 5.2 million.
CMOS improvements: die size doubles every 3 years; line width halves every 7 years.

Trends in application demands
Programs increase their memory demands by a factor of 1.5-2 per year (1/2 to 1 address bit per year).
Available disk space (or network bandwidth) is always consumed.
User I/O bandwidth grows: tty -> CRT -> bitmap -> video -> virtual reality?
Processing power: it is cheapest to produce one version of a program, so optimize for the mid-range. It runs slow on the low end and fast on the high end.
Are these demands growing because of increased capabilities or increased appetites?

The Quantitative Approach

Measurement and Evaluation: Quantitative Approach
Architecture is an iterative process: searching the space of possible designs, at all levels of computer systems.
[Figure labels: Creativity; Cost / Performance Analysis; Good Ideas, Mediocre Ideas, Bad Ideas]
Not a guarantee of good ideas, just a way to discard bad ideas.

Computer Engineering Methodology
[Figure: a design cycle built up over four slides, with Technology Trends as the driving input.]
Evaluate Existing Systems for Bottlenecks (Benchmarks). Where to start: the bottlenecks of existing systems.
Simulate New Designs and Organizations (Workloads).
Implement Next Generation System (Implementation Complexity). How hard is it to build? Importance of simplicity (wearing a seat belt); avoiding a personal disaster. Theory vs. practice.

Measurement Tools
Benchmarks, traces, mixes
Hardware: cost, delay, area, power estimation
Simulation (many levels): ISA, RT, gate, circuit
Queuing theory
Rules of thumb
Fundamental “laws”/principles
[Figure labels: Measure, Experiment, Analyze, Design]
All produce “measures”: what do measures mean? How do they compare?

The Bottom Line: Performance (and Cost)

Plane              DC to Paris   Speed      Passengers   Throughput (pmph)
Boeing 747         6.5 hours     610 mph    470          286,700
BAC/Sud Concorde   3 hours       1350 mph   132          178,200

(pmph = passenger-miles per hour, i.e. passengers times speed.)

Fastest for 1 person? Which takes less time to transport 470 passengers?
Time to run the task (ExTime): execution time, response time, latency.
Tasks per day, hour, week, sec, ns, ... (Performance): throughput, bandwidth.
Which is better? It depends on whether you are trying to win a race from DC to Paris or trying to move the most people.
Even if you are trying to move the most people, performance is useless without understanding cost. Otherwise, why not just fly two Concordes at once, doubling throughput? (747-400: $160M in ’98.)
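The two questions on the plane slide can be worked through numerically. The following is a minimal sketch using only the table's figures; the `Plane` class and its method names are hypothetical, and `hours_to_move` assumes back-to-back one-way trips with return legs ignored, which the slide does not specify.

```python
from dataclasses import dataclass


@dataclass
class Plane:
    name: str
    trip_hours: float  # DC to Paris flight time, from the table
    speed_mph: float
    passengers: int

    def throughput_pmph(self) -> float:
        """Passenger-miles per hour: passengers carried times speed."""
        return self.passengers * self.speed_mph

    def hours_to_move(self, people: int) -> float:
        """Flying hours to move `people` one way, one full trip after another.

        Assumption (not from the slide): return legs are ignored.
        """
        trips = -(-people // self.passengers)  # ceiling division
        return trips * self.trip_hours


boeing = Plane("Boeing 747", 6.5, 610, 470)
concorde = Plane("Concorde", 3.0, 1350, 132)

for plane in (boeing, concorde):
    print(
        f"{plane.name}: {plane.throughput_pmph():,.0f} pmph, "
        f"1 person in {plane.hours_to_move(1)} h, "
        f"470 people in {plane.hours_to_move(470)} h"
    )
```

Under these assumptions the Concorde wins for one passenger (latency), while the 747 wins for 470 passengers (throughput), which is the distinction the slide is drawing.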

Costs Performance metrics are mostly useless without understanding costs.

Integrated Circuits Costs

$$\text{IC cost} = \frac{\text{Die cost} + \text{Testing cost} + \text{Packaging cost}}{\text{Final test yield}}$$

$$\text{Die cost} = \frac{\text{Wafer cost}}{\text{Dies per wafer} \times \text{Die yield}}$$

[Figure labels: Wafer, Die, Defect]
Smaller dies are cheaper, and reduce cost per defect.

IC Cost parameters
α: number of masking levels (a measure of manufacturing complexity); was typically 3.0, and growing.
Wafer yield: the fraction of wafers that are not completely bad; typically close to 100%.
Defects per unit area: 0.6 to 1.2 per cm²; drops with the learning curve.
Die cost goes roughly with (die area)^4.

Integrated Circuits Costs

$$\text{Die cost} = \frac{\text{Wafer cost}}{\text{Dies per wafer} \times \text{Die yield}}$$

$$\text{Dies per wafer} = \frac{\pi \times (\text{Wafer\_diam}/2)^2}{\text{Die\_Area}} - \frac{\pi \times \text{Wafer\_diam}}{\sqrt{2 \times \text{Die\_Area}}} - \text{Test dies}$$

$$\text{Die yield} = \text{Wafer yield} \times \left\{ 1 + \frac{\text{Defects\_per\_unit\_area} \times \text{Die\_Area}}{\alpha} \right\}^{-\alpha}$$

Substituting, with wafer yield taken as 100% and test dies ignored:

$$\text{Die cost} = \text{Wafer cost} \times \frac{\left\{ 1 + \frac{\text{Defects\_per\_unit\_area} \times \text{Die\_Area}}{\alpha} \right\}^{\alpha}}{\frac{\pi \times (\text{Wafer\_diam}/2)^2}{\text{Die\_Area}} - \frac{\pi \times \text{Wafer\_diam}}{\sqrt{2 \times \text{Die\_Area}}}}$$

Die cost goes roughly with (die area)^4.
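The die cost formulas can be turned into a few small functions to see how quickly cost grows with die area. This is a sketch under stated assumptions: the function names are hypothetical, α defaults to the 3.0 quoted in the lecture, and the wafer size, wafer cost, and defect density in the usage lines are made-up illustrative inputs, not data from the slides. Areas are in cm² to match the defect density units.

```python
import math


def dies_per_wafer(wafer_diam_cm, die_area_cm2, test_dies=0):
    """Wafer area over die area, minus dies lost along the round edge."""
    wafer_area = math.pi * (wafer_diam_cm / 2) ** 2
    edge_loss = math.pi * wafer_diam_cm / math.sqrt(2 * die_area_cm2)
    return wafer_area / die_area_cm2 - edge_loss - test_dies


def die_yield(defects_per_cm2, die_area_cm2, alpha=3.0, wafer_yield=1.0):
    """Fraction of dies that work, per the lecture's yield model."""
    return wafer_yield * (1 + defects_per_cm2 * die_area_cm2 / alpha) ** (-alpha)


def die_cost(wafer_cost, wafer_diam_cm, die_area_cm2, defects_per_cm2,
             alpha=3.0, wafer_yield=1.0, test_dies=0):
    """Wafer cost spread over the good dies only."""
    good_dies = (dies_per_wafer(wafer_diam_cm, die_area_cm2, test_dies)
                 * die_yield(defects_per_cm2, die_area_cm2, alpha, wafer_yield))
    return wafer_cost / good_dies


# Hypothetical inputs: $1000 wafer, 20 cm diameter, 1.0 defect per cm^2.
for area in (0.5, 1.0, 2.0):
    cost = die_cost(1000, 20, area, 1.0)
    print(f"die area {area} cm^2 -> about ${cost:.2f} per good die")
```

With these inputs, doubling the die area more than doubles the cost per good die, because fewer dies fit on the wafer and a smaller fraction of them work.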

Die Cost goes roughly with die area+1 IC Cost parameters Defects per unit area = 0.6 to 1.2 per cm2 Technologies that can fix defects (e.g. lasers a’la Lincoln Labs (MIT)), reduce effective defects per unit area and increase yield. However, need to understand costs which differ from formula. Still: Die Cost goes roughly with die area+1

Real World Examples (circa ’93)

Chip          Metal    Line    Wafer   Defects   Area    Dies/   Yield   Die
              layers   width   cost    /cm²      (mm²)   wafer           cost
386DX         2        0.90    $900    1.0       43      360     71%     $4
486DX2        3        0.80    $1200   1.0       81      181     54%     $12
PowerPC 601   4        0.80    $1700   1.3       121     115     28%     $53
HP PA 7100    3        0.80    $1300   1.0       196     66      27%     $73
DEC Alpha     3        0.70    $1500   1.2       234     53      19%     $149
SuperSPARC    3        0.70    $1700   1.6       256     48      13%     $272
Pentium       3        0.80    $1500   1.5       296     40      9%      $417

From “Estimating IC Manufacturing Costs,” by Linley Gwennap, Microprocessor Report, August 2, 1993, p. 15

Other Costs

$$\text{Die test cost} = \frac{\text{Test jig cost} \times \text{Ave. test time}}{\text{Die yield}}$$

Packaging cost: depends on pins, heat dissipation.

Chip          Die     Package   Package   Package   Test &     Total
              cost    pins      type      cost      Assembly
386DX         $4      132       QFP       $1        $4         $9
486DX2        $12     168       PGA       $11       $12        $35
PowerPC 601   $53     304       QFP       $3        $21        $77
HP PA 7100    $73     504       PGA       $35       $16        $124
DEC Alpha     $149    431       PGA       $30       $23        $202
SuperSPARC    $272    293       PGA       $20       $34        $326
Pentium       $417    273       PGA       $19       $37        $473
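As a quick check on how the Total column relates to the IC cost formula from earlier in the lecture, the sketch below recomputes each total. The `ic_cost` helper is a hypothetical name. Judging from the numbers alone, the totals are the plain sum of die, package, and test and assembly costs, which corresponds to a final test yield of 1.0; the slide itself does not state this.

```python
def ic_cost(die_cost, testing_cost, packaging_cost, final_test_yield=1.0):
    """IC cost = (die + testing + packaging) / final test yield."""
    return (die_cost + testing_cost + packaging_cost) / final_test_yield


# (chip, die cost, package cost, test & assembly cost, total), in dollars,
# copied from the table.
rows = [
    ("386DX", 4, 1, 4, 9),
    ("486DX2", 12, 11, 12, 35),
    ("PowerPC 601", 53, 3, 21, 77),
    ("HP PA 7100", 73, 35, 16, 124),
    ("DEC Alpha", 149, 30, 23, 202),
    ("SuperSPARC", 272, 20, 34, 326),
    ("Pentium", 417, 19, 37, 473),
]

# Chips whose recomputed cost disagrees with the table's Total column.
mismatches = [
    chip for chip, die, package, test, total in rows
    if ic_cost(die, test, package) != total
]
print("rows that disagree with the table:", mismatches)
```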

Cost/Performance: What is the Relationship of Cost to Price?
Component costs.
Direct costs (add 25% to 40%): recurring costs: labor, purchasing, scrap, warranty.
Gross margin (add 82% to 186%): nonrecurring costs: R&D, marketing, sales, equipment maintenance, rental, financing cost, pretax profits, taxes.
Average discount to get list price (add 33% to 66%): volume discounts and/or retailer markup.
[Figure: breakdown of list price. Average discount: 25% to 40%; gross margin: 34% to 39%; direct cost: 6% to 8%; component cost: 15% to 33%. Other labels: List Price, Avg. Selling Price, Discretion.]
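The "add X%" markups and the share-of-list-price percentages describe the same chain from two directions. A minimal sketch of that arithmetic follows; `price_breakdown` and its parameter names are hypothetical, and the $100 component cost is an arbitrary illustrative input.

```python
def price_breakdown(component_cost, direct_add, margin_add, discount_add):
    """Apply each 'add X%' markup in turn; return each part's share of list price.

    Each *_add argument is a fraction, e.g. 0.25 for 'add 25%'.
    """
    direct_cost = component_cost * direct_add
    cost_with_direct = component_cost + direct_cost
    gross_margin = cost_with_direct * margin_add
    avg_selling_price = cost_with_direct + gross_margin
    avg_discount = avg_selling_price * discount_add
    list_price = avg_selling_price + avg_discount
    return {
        "component": component_cost / list_price,
        "direct": direct_cost / list_price,
        "gross_margin": gross_margin / list_price,
        "discount": avg_discount / list_price,
    }


low = price_breakdown(100, 0.25, 0.82, 0.33)   # smallest markups on the slide
high = price_breakdown(100, 0.40, 1.86, 0.66)  # largest markups on the slide

for label, shares in (("low markups", low), ("high markups", high)):
    print(label, {part: f"{share:.0%}" for part, share in shares.items()})
```

Running the two extremes gives shares that match the ranges in the figure: the lowest markups leave component cost at about a third of list price, and the highest leave it at about 15%.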

Chip Prices (August 1993)
Assume purchase of 10,000 units.

Chip          Area    Mfg.    Price    Multiplier   Comment
              (mm²)   cost
386DX         43      $9      $31      3.4          Intense competition
486DX2        81      $35     $245     7.0          No competition
PowerPC 601   121     $77     $280     3.6
DEC Alpha     234     $202    $1231    6.1          Recoup R&D?
Pentium       296     $473    $965     2.0          Early in shipments
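The Multiplier column is simply price divided by manufacturing cost. A short sketch recomputing it from the table's own figures (the variable names are illustrative):

```python
# (chip, manufacturing cost, price, multiplier as printed), from the table.
chips = [
    ("386DX", 9, 31, 3.4),
    ("486DX2", 35, 245, 7.0),
    ("PowerPC 601", 77, 280, 3.6),
    ("DEC Alpha", 202, 1231, 6.1),
    ("Pentium", 473, 965, 2.0),
]

# Recompute each multiplier as price / manufacturing cost.
computed = {chip: price / cost for chip, cost, price, _ in chips}

for chip, cost, price, printed in chips:
    print(f"{chip}: {price} / {cost} = {computed[chip]:.2f} (table: {printed})")
```

The spread in multipliers reflects market conditions (competition, recouping R&D, early shipments) rather than manufacturing cost, which is the point of the Comment column.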

Summary: Price vs. Cost

Cost/Price/Profit
How is R&D funded? R&D is 4% to 12% of income and contributes to the gross margin (it is an indirect cost).
Two views:
Expense: only 4% of income is spent on R&D!
Investment: every $1 spent on R&D should lead to $8 to $25 in sales (1/0.12 ≈ 8, 1/0.04 = 25)!

PERFORMANCE

Performance Terminology
Time versus performance: duration vs. rate.
Time: response time = execution time.
Rate: throughput.
Reciprocals: there is both a time and a performance measure for any performance metric.
“Improve performance”: time decreases, performance increases.
For computer systems, the key performance metric is total execution time.

Meaning of “Execution Time” (a.k.a. Response Time)
Wall-clock time, response time, elapsed time: latency (including idle time), vs. CPU time: non-idle time.
System vs. user time: both elapsed and CPU.
System performance: elapsed time on an unloaded system (includes OS + idle time).
CPU performance: user CPU time on an unloaded system.
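The difference between elapsed time and CPU time is easy to observe directly. The sketch below uses Python's standard `time.perf_counter` (wall clock) and `time.process_time` (user plus system CPU time of the current process); the `measure` and `mostly_idle` helpers are hypothetical names, and the 0.2 second sleep stands in for idle time such as waiting on I/O.

```python
import time


def measure(task):
    """Return (wall-clock seconds, CPU seconds) spent running task()."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    task()
    cpu = time.process_time() - cpu_start
    wall = time.perf_counter() - wall_start
    return wall, cpu


def mostly_idle():
    time.sleep(0.2)                      # idle: counts toward wall-clock time only
    sum(i * i for i in range(100_000))   # a little real CPU work


wall, cpu = measure(mostly_idle)
print(f"elapsed (wall-clock) time: {wall:.3f} s")
print(f"CPU time (user + system):  {cpu:.3f} s")
```

The elapsed time includes the sleep while the CPU time does not, which is why the slide separates system performance (elapsed time) from CPU performance (user CPU time).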

Terminology What do we mean when we compare two measures and say that “X is n times faster than Y”?

The Bottom Line: Performance (and Cost)
"X is n times faster than Y" means

$$\frac{\text{ExTime}(Y)}{\text{ExTime}(X)} = \frac{\text{Performance}(X)}{\text{Performance}(Y)} = n$$

Speed of Concorde vs. Boeing 747: 1350 / 610 = 2.2X
Throughput of Boeing 747 vs. Concorde:

$$\frac{\text{Performance}(X)}{\text{Performance}(Y)} = \frac{286{,}700}{178{,}200} = 1.6\text{X}$$

Note: use natural or meaningful units. Hours per passenger-mile is slightly weirder than passenger-miles per hour.
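The definition of "n times faster" can be computed from either execution times or rates. A minimal sketch with hypothetical helper names, using the plane figures from earlier in the lecture:

```python
def times_faster_by_time(extime_x, extime_y):
    """n such that X is n times faster than Y, from execution times."""
    return extime_y / extime_x


def times_faster_by_rate(perf_x, perf_y):
    """n such that X is n times faster than Y, from rates (performance)."""
    return perf_x / perf_y


# Latency view: Concorde (X) vs. Boeing 747 (Y).
by_trip_time = times_faster_by_time(3.0, 6.5)   # trip hours
by_speed = times_faster_by_rate(1350, 610)      # mph

# Throughput view: Boeing 747 (X) vs. Concorde (Y), in passenger-miles per hour.
by_throughput = times_faster_by_rate(286_700, 178_200)

print(f"Concorde vs. 747 by trip time:  {by_trip_time:.2f}X")
print(f"Concorde vs. 747 by speed:      {by_speed:.2f}X")
print(f"747 vs. Concorde by throughput: {by_throughput:.2f}X")
```

The trip-time and speed ratios differ slightly because the table's trip times are rounded; both round to the 2.2X quoted in the lecture.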

Measurement Tools
Benchmarks, traces, mixes
Hardware: cost, delay, area, power estimation
Simulation (many levels): ISA, RT, gate, circuit
Queuing theory
Rules of thumb
Fundamental “laws”/principles
[Figure labels: Measure, Experiment, Analyze, Design]
ENGINEERING: convert this to that

Fundamental Principle of Computer Design Make the common case fast In every trade-off, favor the frequent case over the infrequent case. But how do we quantify this? At what point is the cost to the infrequent case sufficiently large as to offset speedups to the frequent case? Amdahl’s Law quantifies this principle

Amdahl's Law
Speedup due to enhancement E:

$$\text{Speedup}(E) = \frac{\text{ExTime w/o } E}{\text{ExTime w/ } E} = \frac{\text{Performance w/ } E}{\text{Performance w/o } E}$$

Suppose that enhancement E accelerates a fraction F of the task by a factor S, and the remainder of the task is unaffected.

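Amdahl’s Law

$$\text{ExTime}_{new} = \text{ExTime}_{old} \times \left[ (1 - \text{Fraction}_{enhanced}) + \frac{\text{Fraction}_{enhanced}}{\text{Speedup}_{enhanced}} \right]$$

$$\text{Speedup}_{overall} = \frac{\text{ExTime}_{old}}{\text{ExTime}_{new}} = \frac{1}{(1 - \text{Fraction}_{enhanced}) + \dfrac{\text{Fraction}_{enhanced}}{\text{Speedup}_{enhanced}}}$$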
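Amdahl's Law translates directly into a one-line function. This is a minimal sketch; the function name and the argument checks are additions for illustration, not part of the lecture.

```python
def amdahl_speedup(fraction_enhanced, speedup_enhanced):
    """Overall speedup when `fraction_enhanced` of the original execution
    time is accelerated by `speedup_enhanced` and the rest is unchanged."""
    if not 0.0 <= fraction_enhanced <= 1.0:
        raise ValueError("fraction_enhanced must be between 0 and 1")
    if speedup_enhanced <= 0:
        raise ValueError("speedup_enhanced must be positive")
    new_time = (1.0 - fraction_enhanced) + fraction_enhanced / speedup_enhanced
    return 1.0 / new_time


# The lecture's two worked cases:
fp_case = amdahl_speedup(0.10, 2)        # 10% of instructions run 2X faster
cache_case = amdahl_speedup(0.30, 1000)  # 30% hit rate, hits 1000X faster

print(f"FP example:        {fp_case:.3f}")
print(f"Web cache example: {cache_case:.3f}")
```

In both cases the overall speedup is dominated by the fraction that is not enhanced, which is why the common case matters more than the size of the improvement.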

Amdahl’s Law: Example
Floating point instructions are improved to run 2X faster, but only 10% of actual instructions are FP.

$$\text{ExTime}_{new} = \text{ExTime}_{old} \times (0.9 + 0.1/2) = 0.95 \times \text{ExTime}_{old}$$

$$\text{Speedup}_{overall} = \frac{1}{0.95} = 1.053$$

Amdahl’s Law: Example
Suppose fetching a page from a web cache is 1000 times faster than getting the page over the net, but the hit rate on the cache is only 30%.

$$\text{ExTime}_{WCache} = \text{ExTime}_{old} \times (0.7 + 0.3/1000) = \text{ExTime}_{old} \times 0.7003$$

$$\text{Speedup}_{overall} = \frac{1}{0.7003} = 1.428$$

Amdahl’s Law: Example
Just because something seems quantifiable doesn’t mean it is meaningful.
Quality of class = .5 student effort + .25 quality of instructor + .25 value of material

$$\text{Metric}_{WProf+} = \text{Metric}_{old} \times (0.75 + 0.25/10^6) \approx \text{Metric}_{old} \times 0.75$$

$$\text{Speedup} = \frac{1}{0.75} = 1.333$$

So even if I were a million times better as a professor, the class would only be 1.333 times as good.
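The million-fold case is effectively the limit of Amdahl's Law as the enhancement's speedup grows without bound. The slides do not show this step; it follows from the formula, written here with F for the enhanced fraction and S for its speedup:

$$\lim_{S \to \infty} \frac{1}{(1 - F) + \dfrac{F}{S}} = \frac{1}{1 - F}$$

For the three examples in this lecture, the ceilings are:

$$F = 0.10: \ \frac{1}{0.90} \approx 1.11 \qquad F = 0.30: \ \frac{1}{0.70} \approx 1.43 \qquad F = 0.25: \ \frac{1}{0.75} \approx 1.33$$

So the web cache at 1000X (1.428) and the professor at a million times (1.333) are already at their ceilings, while the 2X floating point improvement (1.053) is still below its ceiling of about 1.11.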