A Question to Ponder On [from last lecture]

Slides:

Advertisements

Similar presentations

Performance What differences do we see in performance? Almost all computers operate correctly (within reason) Most computers implement useful operations.

Advertisements

Performance Evaluation of Architectures Vittorio Zaccaria.

1 CIS775: Computer Architecture Chapter 1: Fundamentals of Computer Design.

TU/e Processor Design 5Z032 1 Processor Design 5Z032 The role of Performance Henk Corporaal Eindhoven University of Technology 2009.

100 Performance ENGR 3410 – Computer Architecture Mark L. Chang Fall 2006.

ECE 4100/6100 Advanced Computer Architecture Lecture 3 Performance Prof. Hsien-Hsin Sean Lee School of Electrical and Computer Engineering Georgia Institute.

2-1 ECE 361 ECE C61 Computer Architecture Lecture 2 – performance Prof. Alok N. Choudhary

CSCE 212 Chapter 4: Assessing and Understanding Performance Instructor: Jason D. Bakos.

ENGS 116 Lecture 21 Performance and Quantitative Principles Vincent H. Berk September 26 th, 2008 Reading for today: Chapter , Amdahl article.

CIS629 Fall Lecture Performance Overview Execution time is the best measure of performance: simple, intuitive, straightforward. Two important.

Computer Performance Evaluation: Cycles Per Instruction (CPI)

1  1998 Morgan Kaufmann Publishers and UCB Performance CEG3420 Computer Design Lecture 3.

CIS429.S00: Lec2- 1 Performance Overview Execution time is the best measure of performance: simple, intuitive, straightforward. Two important quantitative.

CS430 – Computer Architecture Lecture - Introduction to Performance

CIS429/529 Winter 07 - Performance - 1 Performance Overview Execution time is the best measure of performance: simple, intuitive, straightforward. Two.

1 Chapter 4. 2 Measure, Report, and Summarize Make intelligent choices See through the marketing hype Key to understanding underlying organizational motivation.

CMSC 611: Advanced Computer Architecture Benchmarking Some material adapted from Mohamed Younis, UMBC CMSC 611 Spr 2003 course slides Some material adapted.

Lecture 2: Technology Trends and Performance Evaluation Performance definition, benchmark, summarizing performance, Amdahl’s law, and CPI.

1 Computer Performance: Metrics, Measurement, & Evaluation.

Where Has This Performance Improvement Come From? Technology –More transistors per chip –Faster logic Machine Organization/Implementation –Deeper pipelines.

Lecture 2: Computer Performance

CENG 450 Computer Systems & Architecture Lecture 3 Amirali Baniasadi

Memory/Storage Architecture Lab Computer Architecture Performance.

Performance Chapter 4 P&H. Introduction How does one measure report and summarise performance? Complexity of modern systems make it very more difficult.

PerformanceCS510 Computer ArchitecturesLecture Lecture 3 Benchmarks and Performance Metrics Lecture 3 Benchmarks and Performance Metrics.

1 CS/EE 362 Hardware Fundamentals Lecture 9 (Chapter 2: Hennessy and Patterson) Winter Quarter 1998 Chris Myers.

1 Acknowledgements Class notes based upon Patterson & Hennessy: Book & Lecture Notes Patterson’s 1997 course notes (U.C. Berkeley CS 152, 1997) Tom Fountain.

Digital System Architecture 1 28 ต.ค ต.ค ต.ค ต.ค ต.ค. 58 Lecture 2a Computer Performance and Cost Pradondet Nilagupta.

1 CS/COE0447 Computer Organization & Assembly Language CHAPTER 4 Assessing and Understanding Performance.

1 Seoul National University Performance. 2 Performance Example Seoul National University Sonata Boeing 727 Speed 100 km/h 1000km/h Seoul to Pusan 10 hours.

CEN 316 Computer Organization and Design Assessing and Understanding Performance Mansour AL Zuair.

CS252/Patterson Lec 1.1 1/17/01 CMPUT429/CMPE382 Winter 2001 Topic2: Technology Trend and Cost/Performance (Adapted from David A. Patterson’s CS252 lecture.

EEL5708/Bölöni Lec 1.1 August 21, 2006 Lotzi Bölöni Fall 2006 EEL 5708 High Performance Computer Architecture Lecture 1 Introduction.

Cost and Performance.

Performance Performance

TEST 1 – Tuesday March 3 Lectures 1 - 8, Ch 1,2 HW Due Feb 24 –1.4.1 p.60 –1.4.4 p.60 –1.4.6 p.60 –1.5.2 p –1.5.4 p.61 –1.5.5 p.61.

Performance – Last Lecture Bottom line performance measure is time Performance A = 1/Execution Time A Comparing Performance N = Performance A / Performance.

Lec2.1 Computer Architecture Chapter 2 The Role of Performance.

Performance Analysis Topics Measuring performance of systems Reasoning about performance Amdahl’s law Systems I.

EGRE 426 Computer Organization and Design Chapter 4.

CMSC 611: Advanced Computer Architecture Performance & Benchmarks Some material adapted from Mohamed Younis, UMBC CMSC 611 Spr 2003 course slides Some.

Jan. 5, 2000Systems Architecture II1 Machine Organization (CS 570) Lecture 2: Performance Evaluation and Benchmarking * Jeremy R. Johnson Wed. Oct. 4,

Computer Architecture CSE 3322 Web Site crystal.uta.edu/~jpatters/cse3322 Send to Pramod Kumar, with the names and s.

EEL-4713 Ann Gordon-Ross.1 EEL-4713 Computer Architecture Performance.

June 20, 2001Systems Architecture II1 Systems Architecture II (CS ) Lecture 1: Performance Evaluation and Benchmarking * Jeremy R. Johnson Wed.

CpE 442 Introduction to Computer Architecture The Role of Performance

Measuring Performance II and Logic Design

Computer Organization

COSC6385 Advanced Computer Architecture

Lecture 2: Performance Evaluation

ECE 4100/6100 Advanced Computer Architecture Lecture 1 Performance

Performance Performance The CPU Performance Equation:

How do we evaluate computer architectures?

Graduate Computer Architecture Lecture 1 Review of Technology Trends and Cost/Performance Ayman Alharbi.

Lecture 4: Performance (conclusion) & Instruction Set Architecture

Lecture 2: Intro to Computer Architecture

CSCE 212 Chapter 4: Assessing and Understanding Performance

CS775: Computer Architecture

Measurement & Evaluation

Performance COE 301 Computer Organization

Computer Performance He said, to speed things up we need to squeeze the clock.

Performance of computer systems

Performance of computer systems

Arrays versus Pointers

August 30, 2000 Prof. John Kubiatowicz

Performance of computer systems

CS 704 Advanced Computer Architecture

Computer Performance Read Chapter 4

Computer Organization and Design Chapter 4

Presentation transcript:

A Question to Ponder On [from last lecture] In the CISC vs. RISC debate a key argument of the RISC movement was that because of its simplicity, RISC would always remain ahead. If there were enough transistors to implement a CISC on chip, then those same transistors could implement a pipelined RISC If there was enough to allow for a pipelined CISC there would be enough to have an on-chip cache for RISC. And so on. After 20 years of this debate what do you think? Hint: Think of commercial PC’s, Moore’s law and some of the data in the first chapter of the book (and on these slides)

CIS 775:Lecture 2 Performance and Cost

The Bottom Line: Performance (and Cost) Plane DC to Paris 6.5 hours 3 hours Speed 610 mph 1350 mph Passengers 470 132 Throughput (pmph) 286,700 178,200 Boeing 747 BAD/Sud Concodre Fastest for 1 person? Which takes less time to transport 470 passengers? Time to run the task (ExTime) Execution time, response time, latency Tasks per day, hour, week, sec, ns … (Performance) Throughput, bandwidth

The Bottom Line: Performance (and Cost) "X is n times faster than Y" means ExTime(Y) Performance(X) --------- = --------------- ExTime(X) Performance(Y) Speed of Concorde vs. Boeing 747 Throughput of Boeing 747 vs. Concorde 1350 / 610 = 2.2X 286,700/ 178,200 1.6X

Amdahl’s Law ExTimenew = ExTimeold x (1 - Fractionenhanced) + Fractionenhanced Speedupenhanced 1 ExTimeold ExTimenew Speedupoverall = = (1 - Fractionenhanced) + Fractionenhanced Speedupenhanced

Amdahl’s Law Floating point instructions improved to run 2X; but only 10% of actual instructions are FP ExTimenew = Speedupoverall =

Aspects of CPU Performance CPU time = Seconds = Instructions x Cycles x Seconds Program Program Instruction Cycle Inst Count CPI Clock Rate Program X Compiler X (X) Inst. Set. X X Organization X X Technology X

Cycles Per Instruction “Average Cycles per Instruction” CPI = (CPU Time * Clock Rate) / Instruction Count = Cycles / Instruction Count n CPU time = CycleTime * CPI * I i i i = 1 “Instruction Frequency” n CPI = CPI * F where F = I i i i i i = 1 Instruction Count Invest Resources where time is Spent!

Example: Calculating CPI Base Machine (Reg / Reg) Op Freq Cycles CPI(i) (% Time) ALU 50% 1 .5 (33%) Load 20% 2 .4 (27%) Store 10% 2 .2 (13%) Branch 20% 2 .4 (27%) 1.5 Typical Mix

Metrics of Performance Application Answers per month Operations per second Programming Language Compiler (millions) of Instructions per second: MIPS (millions) of (FP) operations per second: MFLOP/s ISA Datapath Megabytes per second Control Function Units Cycles per second (clock rate) Transistors Wires Pins

Measurement and Evaluation Architecture is an iterative process: Searching the space of possible designs At all levels of computer systems Creativity Cost / Performance Analysis Good Ideas Mediocre Ideas Bad Ideas

Computer Engineering Methodology Evaluate Existing Systems for Bottlenecks Implementation Complexity Benchmarks Technology Trends How hard to build Importance of simplicity (wearing a seat belt); avoiding a personal disaster Theory vs. practice Implement Next Generation System Simulate New Designs and Organizations Workloads

Measurement Tools Benchmarks, Traces, Mixes Hardware: Cost, delay, area, power estimation Simulation (many levels) ISA, RT, Gate, Circuit Queuing Theory Rules of Thumb Fundamental “Laws”/Principles Understanding the limitations of any measurement tool is crucial.

SPEC First Round One program: 99% of time in single line of code New front-end compiler could improve dramatically

Cases of Benchmark Engineering The motivation is to tune the system to the benchmark to achieve peak performance. At the architecture level Specialized instructions At the compiler level (compiler flags) Blocking in Spec89  factor of 9 speedup Incorrect compiler optimizations/reordering. Would work fine on benchmark but not on other programs I/O level Spec92 spreadsheet program (sp) Companies noticed that the produced output was always out put to a file (so they stored the results in a memory buffer) and then expunged at the end (which was not measured). One company eliminated the I/O all together.

After putting in a blazing performance on the benchmark test, Sun issued a glowing press release claiming that it had outperformed Windows NT systems on the test. Pendragon president Ivan Phillips cried foul, saying the results weren't representative of real-world Java performance and that Sun had gone so far as to duplicate the test's code within Sun's Just-In-Time compiler. That's cheating, says Phillips, who claims that benchmark tests and real-world applications aren't the same thing. Did Sun issue a denial or a mea culpa? Initially, Sun neither denied optimizing for the benchmark test nor apologized for it. "If the test results are not representative of real-world Java applications, then that's a problem with the benchmark," Sun's Brian Croll said. After taking a beating in the press, though, Sun retreated and issued an apology for the optimization.[Excerpted from PC Online 1997]

Issues with Benchmark Engineering Motivated by the bottom dollar, good performance on classic suites  more customers, better sales. Benchmark Engineering  Limits the longevity of benchmark suites Technology and Applications Limits the longevity of benchmark suites.

SPEC: System Performance Evaluation Cooperative First Round 1989 10 programs yielding a single number (“SPECmarks”) Second Round 1992 SPECInt92 (6 integer programs) and SPECfp92 (14 floating point programs) Compiler Flags unlimited. March 93 new set of programs: SPECint95 (8 integer programs) and SPECfp95 (10 floating point) “benchmarks useful for 3 years” Single flag setting for all programs: SPECint_base95, SPECfp_base95

Reporting Performance Results Reproducability  Apply them on publicly available benchmarks. Pecking/Picking order Real Programs Real Kernels Toy Benchmarks Synthetic Benchmarks

How to Summarize Performance Arithmetic mean (weighted arithmetic mean) tracks execution time: sum(Ti)/n or sum(Wi*Ti) Harmonic mean (weighted harmonic mean) of rates (e.g., MFLOPS) tracks execution time: n/sum(1/Ri) or n/sum(Wi/Ri) Normalized execution time is handy for scaling performance (e.g., X times faster than SPARCstation 10) But do not take the arithmetic mean of normalized execution time, use the geometric mean = (Product(Ri)^1/n)

Performance Evaluation “For better or worse, benchmarks shape a field” Good products created when have: Good benchmarks Good ways to summarize performance Given sales is a function in part of performance relative to competition, investment in improving product as reported by performance summary If benchmarks/summary inadequate, then choose between improving product for real programs vs. improving product to get more sales; Sales almost always wins! Execution time is the measure of computer performance!

Simulations When are simulations useful? What are its limitations, I.e. what real world phenomenon does it not account for? The larger the simulation trace, the less tractable the post-processing analysis.

Queueing Theory What are the distributions of arrival rates and values for other parameters? Are they realistic? What happens when the parameters or distributions are changed?

Chapter Summary, #1 Designing to Last through Trends Capacity Speed Logic 2x in 3 years 2x in 3 years DRAM 4x in 3 years 2x in 10 years Disk 4x in 3 years 2x in 10 years 6yrs to graduate => 16X CPU speed, DRAM/Disk size Time to run the task Execution time, response time, latency Tasks per day, hour, week, sec, ns, … Throughput, bandwidth “X is n times faster than Y” means ExTime(Y) Performance(X) --------- = -------------- ExTime(X) Performance(Y)

Chapter Summary, #2 Amdahl’s Law: CPI Law: Execution time is the REAL measure of computer performance! Good products created when have: Good benchmarks, good ways to summarize performance Die Cost goes roughly with die area4 Speedupoverall = ExTimeold ExTimenew = 1 (1 - Fractionenhanced) + Fractionenhanced Speedupenhanced CPU time = Seconds = Instructions x Cycles x Seconds Program Program Instruction Cycle

Food for thought Two companies reports results on two benchmarks one on a Fortran benchmark suite and the other on a C++ benchmark suite. Company A’s product outperforms Company B’s on the Fortran suite, the reverse holds true for the C++ suite. Assume the performance differences are similar in both cases. Do you have enough information to compare the two products. What information will you need?

Amdahl’s Law (answer) Floating point instructions improved to run 2X; but only 10% of actual instructions are FP ExTimenew = ExTimeold x (0.9 + .1/2) = 0.95 x ExTimeold 1 Speedupoverall = = 1.053 0.95