CpE 442 Introduction to Computer Architecture The Role of Performance

Presentation transcript:

CpE 442 Introduction to Computer Architecture The Role of Performance Instructor: H. H. Ammar

Overview of Today's Lecture: The Role of Performance
° Review from Last Lecture
° Definition and Measures of Performance
° Summarizing Performance and Performance Pitfalls

Review: What is "Computer Architecture"?
° Co-ordination of levels of abstraction (top to bottom):
  - Application
  - Operating System
  - Compiler
  - Instruction Set Architecture (Instr. Set Proc., I/O system)
  - Digital Design
  - Circuit Design
° Under a set of rapidly changing forces

Review: Levels of Representation
High Level Language Program:
    temp = v[k];
    v[k] = v[k+1];
    v[k+1] = temp;
  (Compiler)
Assembly Language Program:
    lw $15, 0($2)
    lw $16, 4($2)
    sw $16, 0($2)
    sw $15, 4($2)
  (Assembler)
Machine Language Program:
    0000 1001 1100 0110 1010 1111 0101 1000
    1010 1111 0101 1000 0000 1001 1100 0110
    1100 0110 1010 1111 0101 1000 0000 1001
    0101 1000 0000 1001 1100 0110 1010 1111
  (Machine Interpretation)
Control Signal Specification

Review: Levels of Organization
SPARCstation 20 Computer
° SPARC Processor: Control, Datapath
° Memory
° Devices: Input, Output
That is, any computer, no matter how primitive or advanced, can be divided into five parts:
1. The input devices bring the data from the outside world into the computer.
2. These data are kept in the computer's memory until ...
3. The datapath requests and processes them.
4. The operation of the datapath is controlled by the computer's controller. All the work done by the computer will NOT do us any good unless we can get the data back to the outside world.
5. Getting the data back to the outside world is the job of the output devices.
The most COMMON way to connect these 5 components together is to use a network of busses.

Review: Summary from Last Lecture
° All computers consist of five components
  - Processor: (1) datapath and (2) control
  - (3) Memory
  - (4) Input devices and (5) Output devices
° Not all "memory" is created equal
  - Cache: fast (expensive) memory is placed closer to the processor
  - Main memory: less expensive memory, so we can have more
° Input and output (I/O) devices have the messiest organization
  - Wide range of speed: graphics vs. keyboard
  - Wide range of requirements: speed, standard, cost, etc.
  - Least amount of research (so far)
Let me summarize what I have said so far. The most important thing I want you to remember is that all computers, no matter how complicated or expensive, can be divided into five components: (1) the datapath and (2) the control that make up the processor, (3) the memory system that supplies data to the processor, and, last but not least, the (4) input and (5) output devices that get data in and out of the computer. One thing about memory is that not all "memory" is created equal. Some memory is faster but more expensive, so we place it closer to the processor and call it "cache." The main memory can be slower than the cache, so we usually use less expensive parts and can have more of them. Finally, as you can see from the last few slides, the input and output devices usually have the messiest organization. There are several reasons for this: (1) I/O devices can have a wide range of speeds. (2) I/O devices also have a wide range of requirements. (3) To make matters worse, historically I/O has attracted the least amount of research interest. But hopefully this is changing. In this class, you will learn about all five components, and we will try to make this as enjoyable as possible. So have fun.

Processor Performance

Metrics of performance
° Levels (top to bottom): Application; Programming Language; Compiler; ISA; Datapath and Control; Function Units; Transistors, Wires, Pins
° Metrics, from the application level down to the hardware level:
  - Answers per month
  - Operations per second
  - (Millions of) instructions per second: MIPS
  - (Millions of) floating-point operations per second: MFLOP/s
  - Megabytes per second
  - Cycles per second (clock rate)

Relating Processor Metrics
° CPU execution time = CPU clock cycles/pgm × clock cycle time
  or CPU execution time = CPU clock cycles/pgm ÷ clock rate
° CPU clock cycles/pgm = Instructions/pgm × CPI (the avg. clock cycles per instruction)
  or CPI = CPU clock cycles/pgm ÷ Instructions/pgm
° CPI tells us something about the Instruction Set Architecture, the implementation of that architecture, and the program measured
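The relations above can be checked with a few lines of Python. This is a minimal sketch: the function names and the sample numbers (1 million instructions, CPI of 1.5, 1 GHz clock) are illustrative assumptions, not part of the original slides.

```python
def cpu_clock_cycles(instruction_count, cpi):
    """CPU clock cycles per program = instructions per program x CPI."""
    return instruction_count * cpi


def cpu_time_seconds(instruction_count, cpi, clock_rate_hz):
    """CPU execution time = CPU clock cycles per program / clock rate."""
    return cpu_clock_cycles(instruction_count, cpi) / clock_rate_hz


def average_cpi(clock_cycles, instruction_count):
    """CPI = CPU clock cycles per program / instructions per program."""
    return clock_cycles / instruction_count


# Hypothetical program: 1 million instructions, CPI 1.5, 1 GHz clock.
instructions = 1_000_000
cycles = cpu_clock_cycles(instructions, 1.5)
seconds = cpu_time_seconds(instructions, 1.5, 1e9)

print(f"clock cycles: {cycles:.0f}")
print(f"CPU time:     {seconds * 1e3:.2f} ms")
print(f"CPI (check):  {average_cpi(cycles, instructions):.2f}")
```

Dividing by the clock rate and multiplying by the clock cycle time are the same operation, so only one form is implemented here.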

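Aspects of CPU Performance
$$\text{CPU time} = \frac{\text{Seconds}}{\text{Program}} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Cycles}}{\text{Instruction}} \times \frac{\text{Seconds}}{\text{Cycle}}$$
                  instr. count   CPI    clock rate
Program
Compiler
Instr. Set Arch.
Organization
Technology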

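Aspects of CPU Performance
$$\text{CPU time} = \frac{\text{Seconds}}{\text{Program}} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Cycles}}{\text{Instruction}} \times \frac{\text{Seconds}}{\text{Cycle}}$$
                  instr. count   CPI    clock rate
Program           X              (x)
Compiler          X              (x)
Instr. Set        X              X
Organization                     X      X
Technology                              X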

Organizational Trade-offs
° Levels (top to bottom): Application; Programming Language; Compiler; ISA; Datapath and Control; Function Units; Transistors, Wires, Pins
° Quantities traded off across these levels: Instruction Mix, CPI, Cycle Time

CPI: "Average cycles per instruction"
$$\text{CPI} = \frac{\text{CPU Time} \times \text{Clock Rate}}{\text{Instruction Count}} = \frac{\text{Clock Cycles}}{\text{Instruction Count}}$$
$$\text{CPU time} = \text{ClockCycleTime} \times \sum_{i=1}^{n} \text{CPI}_i \times I_i$$
$$\text{CPI} = \sum_{i=1}^{n} \text{CPI}_i \times F_i \quad \text{where } F_i = \frac{I_i}{\text{Instruction Count}} \text{ is the "instruction frequency"}$$
Invest resources where time is spent!
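The weighted-sum form of CPI can be sketched in Python as follows. The helper names (`weighted_cpi`, `time_share`) are hypothetical; the instruction mix is the base-machine mix used in this lecture's example (ALU 50% at 1 cycle; Load 20%, Store 10%, Branch 20% at 2 cycles each).

```python
def weighted_cpi(mix):
    """Return sum(F_i * CPI_i) for a mix of {op: (frequency, cycles)}."""
    total_freq = sum(freq for freq, _ in mix.values())
    if abs(total_freq - 1.0) > 1e-9:
        raise ValueError("instruction frequencies must sum to 1")
    return sum(freq * cycles for freq, cycles in mix.values())


def time_share(mix):
    """Fraction of execution time spent in each instruction class."""
    cpi = weighted_cpi(mix)
    return {op: freq * cycles / cpi for op, (freq, cycles) in mix.items()}


# Base machine mix from the lecture: op -> (frequency F_i, cycles CPI_i).
base_mix = {
    "ALU": (0.5, 1),
    "Load": (0.2, 2),
    "Store": (0.1, 2),
    "Branch": (0.2, 2),
}

print(f"CPI = {weighted_cpi(base_mix):.2f}")
for op, share in time_share(base_mix).items():
    print(f"{op:<7}{share:6.0%}")
```

The per-class time share is what "invest resources where time is spent" refers to: a class with a low frequency can still dominate execution time if its CPI is high.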

Example: Base Machine (Reg / Reg), Typical Mix
Op       Freq (F_i)   CPI_i   F_i × CPI_i   % Time
ALU      50%          1       0.5           33%
Load     20%          2       0.4           27%
Store    10%          2       0.2           13%
Branch   20%          2       0.4           27%
Total                         1.5
The CPI = 1.5 cycles per instruction.
Assignment 1: Turn in the solutions to the following problems from the textbook by Thursday, September 4: Chapter 2, Exercises section, problems 2.1, 2.2, 2.3, 2.4, 2.10, 2.11, 2.12, 2.13, and 2.15.

Assume a program of 1 million instructions. Compare the performance of the Base Machine (B), with the above CPI and a 1 GHz clock, with that of an Enhanced Machine (E), with a 1.333 GHz clock and a one-cycle increase for load/store and branch instructions.
Enhanced Machine (Reg / Reg)
Op       Freq (F_i)   CPI_i   F_i × CPI_i   % Time
ALU      50%          1       0.5           25%
Load     20%          3       0.6           30%
Store    10%          3       0.3           15%
Branch   20%          3       0.6           30%
Total                         2.0

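Perf. of machine X = 1 / exec. time of program on machine X
The clock cycle time is 1 ns for B (1 GHz) and 0.75 ns for E (1.333 GHz), and the instruction count is the same on both machines, so:
Perf. of E / Perf. of B = exec. time of B / exec. time of E = (1.5 × 1) / (2.0 × 0.75) = 1
The performance of B equals that of E: no gain in performance.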
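The comparison can be reproduced with a short Python sketch. Names such as `exec_time` and `enhanced_mix` are illustrative, and the 1.333 GHz clock is treated as exactly 4/3 GHz (a 0.75 ns cycle), which is an assumption about the rounding used on the slide.

```python
def weighted_cpi(mix):
    """Average CPI for a mix of {op: (frequency, cycles)}."""
    return sum(freq * cycles for freq, cycles in mix.values())


def exec_time(instruction_count, cpi, clock_rate_hz):
    """Execution time in seconds = instructions x CPI / clock rate."""
    return instruction_count * cpi / clock_rate_hz


base_mix = {"ALU": (0.5, 1), "Load": (0.2, 2), "Store": (0.1, 2), "Branch": (0.2, 2)}

# Enhanced machine: one extra cycle for load, store, and branch instructions.
enhanced_mix = {
    op: (freq, cycles if op == "ALU" else cycles + 1)
    for op, (freq, cycles) in base_mix.items()
}

instructions = 1_000_000
time_b = exec_time(instructions, weighted_cpi(base_mix), 1.0e9)
time_e = exec_time(instructions, weighted_cpi(enhanced_mix), 1.0e9 / 0.75)

# Performance ratio E/B is the inverse ratio of execution times.
perf_ratio = time_b / time_e
print(f"time B = {time_b * 1e3:.3f} ms, time E = {time_e * 1e3:.3f} ms")
print(f"Perf(E) / Perf(B) = {perf_ratio:.3f}")
```

The faster clock is cancelled out by the higher CPI, which is why both factors have to be considered together.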

Marketing Metrics
MIPS = Instruction Count / (Time × 10^6) = Clock Rate / (CPI × 10^6)
° machines with different instruction sets?
° programs with different instruction mixes?
  - dynamic frequency of instructions
° uncorrelated with performance
MFLOP/s = FP Operations / (Time × 10^6)
° machine dependent
° often not where time is spent

Example showing why MIPS can fail
Compare performance with Compilers 1 and 2 for a given program on a given machine.
Instruction counts (in billions) for instruction classes A, B, and C:
                 A     B     C
Compiler 1       5     1     1
Compiler 2       10    1     1
Clock cycles     1     2     3
Clock cycles using compiler 1 = 5×1 + 1×2 + 1×3 = 10 billion
Clock cycles using compiler 2 = 10×1 + 1×2 + 1×3 = 15 billion
Assuming a 1 GHz clock: CPU Time 1 = 10 secs, CPU Time 2 = 15 secs
Yet the MIPS rating, MIPS = instr. count / (CPU time in sec × 10^6), favors the slower compiler:
MIPS 1 = 700
MIPS 2 = 800
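The numbers in this example can be recomputed directly. The sketch below uses hypothetical names (`CYCLES_PER_CLASS`, `summarize`) and the instruction counts, cycle counts, and 1 GHz clock given on the slide.

```python
# Clock cycles per instruction for each instruction class (from the slide).
CYCLES_PER_CLASS = {"A": 1, "B": 2, "C": 3}

# Instruction counts per class produced by each compiler (from the slide).
COUNTS = {
    "compiler1": {"A": 5e9, "B": 1e9, "C": 1e9},
    "compiler2": {"A": 10e9, "B": 1e9, "C": 1e9},
}


def summarize(instr_counts, clock_rate_hz=1e9):
    """Return (CPU time in seconds, MIPS rating) for one compiled program."""
    instructions = sum(instr_counts.values())
    cycles = sum(n * CYCLES_PER_CLASS[cls] for cls, n in instr_counts.items())
    seconds = cycles / clock_rate_hz
    mips = instructions / (seconds * 1e6)
    return seconds, mips


for name, counts in COUNTS.items():
    seconds, mips = summarize(counts)
    print(f"{name}: CPU time = {seconds:.0f} s, MIPS = {mips:.0f}")
```

Compiler 2 executes more instructions in total, and most of the extra ones are cheap class-A instructions, so its instruction rate goes up even though the program takes longer.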

Why Do Benchmarks?
° How we evaluate differences
  - Different systems
  - Changes to a single system
° Provide a target
  - Benchmarks should represent a large class of important programs
  - Improving benchmark performance should help many programs
° For better or worse, benchmarks shape a field
° Good ones accelerate progress
  - good target for development
° Bad benchmarks hurt progress
  - help real programs v. sell machines/papers?
  - Inventions that help real programs don't help the benchmark

Programs to Evaluate Processor Performance
° (Toy) Benchmarks
  - 10-100 lines
  - e.g., sieve, puzzle, quicksort
° Synthetic Benchmarks
  - attempt to match average frequencies of real workloads
  - e.g., Whetstone, Dhrystone
° Kernels
  - Time-critical excerpts
° Real programs
  - e.g., gcc, spice

Successful Benchmark: SPEC
° EE Times + 5 companies band together to form the Systems Performance Evaluation Committee (SPEC) in 1988: Sun, MIPS, HP, Apollo, DEC
° Create standard list of programs, inputs, reporting: some real programs, includes OS calls, some I/O

SPEC first round
° First round 1989; 10 programs, single number to summarize performance
° One program: 99% of time in a single line of code
° New front-end compiler could improve dramatically

SPEC second round, SPEC95
° 8 integer benchmarks in C and 10 floating-point benchmarks in Fortran

Amdahl's Law
Speedup due to enhancement E:
$$\text{Speedup}(E) = \frac{\text{ExTime w/o } E}{\text{ExTime w/ } E} = \frac{\text{Performance w/ } E}{\text{Performance w/o } E}$$
Suppose that enhancement E accelerates a fraction F of the task by a factor S and the remainder of the task is unaffected. Then:
$$\text{ExTime(with } E) = \left((1-F) + \frac{F}{S}\right) \times \text{ExTime(without } E)$$
$$\text{Speedup(with } E) = \frac{\text{ExTime(without } E)}{\left((1-F) + \frac{F}{S}\right) \times \text{ExTime(without } E)} = \frac{1}{(1-F) + \frac{F}{S}} \le \frac{1}{1-F}$$
The speedup is bounded by this factor.
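A minimal Python sketch of Amdahl's Law follows. The function names and the sample values (F = 0.8, several values of S) are illustrative assumptions chosen to show how the speedup approaches the 1/(1-F) bound.

```python
def amdahl_speedup(fraction, factor):
    """Overall speedup when `fraction` of the task is sped up by `factor`."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    if factor <= 0:
        raise ValueError("factor must be positive")
    return 1.0 / ((1.0 - fraction) + fraction / factor)


def speedup_bound(fraction):
    """Upper bound 1 / (1 - F), reached only as the factor S grows without limit."""
    if fraction == 1.0:
        return float("inf")
    return 1.0 / (1.0 - fraction)


# Hypothetical enhancement that affects 80% of the task.
F = 0.8
for S in (2, 10, 100, 1_000_000):
    print(f"S = {S:>9}: speedup = {amdahl_speedup(F, S):.3f}")
print(f"bound 1/(1-F) = {speedup_bound(F):.3f}")
```

Even a very large S leaves the unaffected 20% of the task untouched, so the overall speedup in this sketch never exceeds 5.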

Performance Evaluation Summary
$$\text{CPU time} = \frac{\text{Seconds}}{\text{Program}} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Cycles}}{\text{Instruction}} \times \frac{\text{Seconds}}{\text{Cycle}}$$
° Time is the measure of computer performance!
° Good products are created when we have:
  - Good benchmarks
  - Good ways to summarize performance
° Without good benchmarks and a good summary, the choice is between improving the product for real programs and improving the product to get more sales => sales almost always wins
° Remember Amdahl's Law: speedup is limited by the unimproved part of the program