Lecture 1: Course Introduction, Technology Trends, Performance Professor Alvin R. Lebeck Compsci 220 / ECE 252 Fall 2004 Slides based on those of: Sorin,

Presentation transcript:

Lecture 1: Course Introduction, Technology Trends, Performance Professor Alvin R. Lebeck Compsci 220 / ECE 252 Fall 2004 Slides based on those of: Sorin, Roth, Hill, Wood, Sohi, Smith, Vijaykumar, Lipasti, Katz

2 © 2004 Lebeck, Sorin, Roth, Hill, Wood, Sohi, Smith, Vijaykumar, Lipasti, Katz CompSci 220 / ECE 252 Administrative Office Hours Office: D308 LSRC Hours: Mon 3:00-4:00, Thurs 1:00-2:00 or by appointment ( ) Phone: Teaching Assistant Shobana Ravi Office: D330 Hours: TBD Phone:

3 © 2004 Lebeck, Sorin, Roth, Hill, Wood, Sohi, Smith, Vijaykumar, Lipasti, Katz CompSci 220 / ECE 252 Administrative (Grading) 30% Homeworks –4 to 6 Homeworks –Late < 1 day = 50% –Late > 1 day = zero 45% Examinations (Midterm + Final) 25% Research Project (work in groups of 3 or 2) –No late term projects Academic Misconduct University policy will be followed strictly Zero tolerance for cheating and/or plagiarism This course requires hard work.

4 © 2004 Lebeck, Sorin, Roth, Hill, Wood, Sohi, Smith, Vijaykumar, Lipasti, Katz CompSci 220 / ECE 252 Administrative (Continued) Midterm Exam: In class (75 min) Closed book Final Exam: (3 hours) closed book CS Graduate Students---This is a “Quals” Course. –Quals pass based on Midterm and Final exams only

5 © 2004 Lebeck, Sorin, Roth, Hill, Wood, Sohi, Smith, Vijaykumar, Lipasti, Katz CompSci 220 / ECE 252 Administrative (Continued) Course Web Page – –Lectures posted there shortly before class (pdf) –Homework posted there –General information about course Course News Group –duke.cs.cps220 –Use it to 1.read announcements/comments on class or homework, 2.ask questions (help), 3.communicate with each other.

6 © 2004 Lebeck, Sorin, Roth, Hill, Wood, Sohi, Smith, Vijaykumar, Lipasti, Katz CompSci 220 / ECE 252 SPIDER: Systems Seminar Systems & Architecture Seminar –Wednesdays 4:00-5:00 in D344 –duke.cs.os-research (spider newsgroup) Presentations on current work –Practice talks for conferences –Discussion on recent papers –Your own research Why you should go? –If you want to work in Systems/Architecture… –Good time to practice public speaking in front of friendly crowd –Learn about current topics

7 © 2004 Lebeck, Sorin, Roth, Hill, Wood, Sohi, Smith, Vijaykumar, Lipasti, Katz CompSci 220 / ECE 252 Homework #0 Need a Duke CS account? Email me your 1. Duke ID 2. ACPUB account name. Read Chapters 1 & 2

8 © 2004 Lebeck, Sorin, Roth, Hill, Wood, Sohi, Smith, Vijaykumar, Lipasti, Katz CompSci 220 / ECE 252 What is This Course All About? State-of-the-art computer hardware design Topics –Uniprocessor architecture (i.e., microprocessors) –Memory architecture –I/O architecture –Brief look at multithreading and multiprocessors Fundamentals, current systems, and future systems Will read from textbook, classic papers, brand-new papers

9 © 2004 Lebeck, Sorin, Roth, Hill, Wood, Sohi, Smith, Vijaykumar, Lipasti, Katz CompSci 220 / ECE 252 Course Goals and Expectations Course Goals –Understand how current processors work –Understand how to evaluate/compare processors –Learn how to use simulator to perform experiments –Learn research skills by performing term project Course expectations: –Will loosely follow text –Major emphasis on cutting-edge issues –Students will read a list of research papers –Term project

10 © 2004 Lebeck, Sorin, Roth, Hill, Wood, Sohi, Smith, Vijaykumar, Lipasti, Katz CompSci 220 / ECE 252 CPS 220 Course Focus Understanding the design techniques, machine structures, technology factors, evaluation methods that will determine the form of computers in 21st Century Technology Programming Languages Operating Systems History Applications Interface Design (ISA) Measurement & Evaluation Parallelism Computer Architecture: Instruction Set Design Organization Hardware Power

11 © 2004 Lebeck, Sorin, Roth, Hill, Wood, Sohi, Smith, Vijaykumar, Lipasti, Katz CompSci 220 / ECE 252 Expected Background –Basic architecture (ECE 152 / CPS 104) –Basic OS (ECE 153 / CPS 110) Other useful and related courses: –Digital system design (ECE 251) – VLSI systems (ECE 261) – Multiprocessor architecture (ECE 259 / CPS 221) – Fault tolerant computing (ECE 254 / CPS 225) – Computer networks and systems (CPS 114 & 214) – Programming languages & compilers (CS 106 & 206) – Advanced OS (CPS 210)

12 © 2004 Lebeck, Sorin, Roth, Hill, Wood, Sohi, Smith, Vijaykumar, Lipasti, Katz CompSci 220 / ECE 252 Course Components Reading Materials Computer Architecture: A Quantitative Approach by Hennessy and Patterson, 3rd Edition Readings in Computer Architecture by Hill, Jouppi, Sohi Recent research papers (online)

13 © 2004 Lebeck, Sorin, Roth, Hill, Wood, Sohi, Smith, Vijaykumar, Lipasti, Katz CompSci 220 / ECE 252 Computer Architecture Is … “…the attributes of a [computing] system as seen by the programmer, i.e., the conceptual structure and functional behavior, as distinct from the organization of the data flows and controls, the logic design, and the physical implementation.” - Amdahl, Blaauw, and Brooks, IBM Journal of R&D, April 1964.

14 © 2004 Lebeck, Sorin, Roth, Hill, Wood, Sohi, Smith, Vijaykumar, Lipasti, Katz CompSci 220 / ECE 252 Computer Architecture Topics Instruction Set Architecture Pipelining, Hazard Resolution, Superscalar, Reordering, Prediction, Speculation Addressing, Protection, Exception Handling L1 Cache L2 Cache DRAM Disks, WORM, Tape Coherence, Bandwidth, Latency Emerging Technologies Interleaving Bus protocols RAID VLSI Input/Output and Storage Memory Hierarchy Pipelining and Instruction Level Parallelism

15 © 2004 Lebeck, Sorin, Roth, Hill, Wood, Sohi, Smith, Vijaykumar, Lipasti, Katz CompSci 220 / ECE 252 Architecture and Other Disciplines Architecture interacts with many other fields Can’t be studied in a vacuum Application Software Operating Systems, Compilers, Networking Computer Architecture Circuits, Wires, Devices, Network Hardware

16 © 2004 Lebeck, Sorin, Roth, Hill, Wood, Sohi, Smith, Vijaykumar, Lipasti, Katz CompSci 220 / ECE 252 Levels of Computer Architecture architecture –functional appearance to immediate user »opcodes, addressing modes, architected registers implementation (microarchitecture) –logical structure that performs the architecture »pipelining, functional units, caches, physical registers realization (circuits) –physical structure that embodies the implementation »gates, cells, transistors, wires

17 © 2004 Lebeck, Sorin, Roth, Hill, Wood, Sohi, Smith, Vijaykumar, Lipasti, Katz CompSci 220 / ECE 252 Role of the Computer Microarchitect architect: defines the hardware/software interface microarchitect: defines the hardware implementation –usually the same person decisions based on –applications –performance –cost –reliability –power...

18 © 2004 Lebeck, Sorin, Roth, Hill, Wood, Sohi, Smith, Vijaykumar, Lipasti, Katz CompSci 220 / ECE 252 Computer Engineering Methodology Technology Trends

19 © 2004 Lebeck, Sorin, Roth, Hill, Wood, Sohi, Smith, Vijaykumar, Lipasti, Katz CompSci 220 / ECE 252 Computer Engineering Methodology Technology Trends Evaluate Existing Systems for Bottlenecks Benchmarks

20 © 2004 Lebeck, Sorin, Roth, Hill, Wood, Sohi, Smith, Vijaykumar, Lipasti, Katz CompSci 220 / ECE 252 Computer Engineering Methodology Technology Trends Evaluate Existing Systems for Bottlenecks Benchmarks Simulate New Designs and Organizations Workloads

21 © 2004 Lebeck, Sorin, Roth, Hill, Wood, Sohi, Smith, Vijaykumar, Lipasti, Katz CompSci 220 / ECE 252 Technology Trends Evaluate Existing Systems for Bottlenecks Benchmarks Simulate New Designs and Organizations Workloads Computer Engineering Methodology Implement Next Generation System Implementation Complexity

22 © 2004 Lebeck, Sorin, Roth, Hill, Wood, Sohi, Smith, Vijaykumar, Lipasti, Katz CompSci 220 / ECE 252 Applications -> Requirements -> Designs scientific: weather prediction, molecular modeling –need: large memory, floating-point arithmetic –examples: CRAY-1, T3E, IBM DeepBlue, BlueGene commercial: inventory, payroll, web serving, e-commerce –need: integer arithmetic, high I/O –examples: Clusters, SUN SPARCcenter, Enterprise desktop: multimedia, games, entertainment –need: high data bandwidth, graphics –examples: Intel Pentium4, IBM Power4, Motorola PPC 620 mobile: laptops –need: low power (battery), good performance –examples: Intel Mobile Pentium III, Transmeta TM5400 embedded: cell phones, automobile engines, door knobs –need: low power (battery + heat), low cost –examples: Compaq/Intel StrongARM, X-Scale, Transmeta TM3200

23 © 2004 Lebeck, Sorin, Roth, Hill, Wood, Sohi, Smith, Vijaykumar, Lipasti, Katz CompSci 220 / ECE 252 Why Study Computer Architecture? answer #1: requirements are always changing aren’t computers fast enough already? –are they? –fast enough to do everything we will EVER want? »AI, VR, protein sequencing, ???? is speed the only goal? –power: heat dissipation + battery life –cost –reliability –etc

24 © 2004 Lebeck, Sorin, Roth, Hill, Wood, Sohi, Smith, Vijaykumar, Lipasti, Katz CompSci 220 / ECE 252 Why Study Computer Architecture? answer #2: technology playing field is always changing annual technology improvements (approximate) –SRAM (logic): density +25%, speed +20% –DRAM (memory): density + 60%, speed: + 4% –disk (magnetic): density +25%, speed: + 4% –fiber: ?? parameters change and change relative to one another! designs change even if requirements fixed but requirements are not fixed
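
A rough sketch of why "parameters change relative to one another" matters, assuming the approximate annual rates on the slide compound steadily. The 10-year horizon and the `compound` helper are illustrative choices, not part of the lecture.

```python
def compound(rate_per_year: float, years: int) -> float:
    """Growth factor after `years` of compounding at `rate_per_year` (0.25 means +25%)."""
    return (1.0 + rate_per_year) ** years


YEARS = 10  # assumed horizon, chosen only for illustration

# Annual speed improvements quoted on the slide (approximate).
logic_speed = compound(0.20, YEARS)
dram_speed = compound(0.04, YEARS)

print(f"logic speed grows x{logic_speed:.1f} over {YEARS} years")
print(f"DRAM speed grows x{dram_speed:.1f} over {YEARS} years")
print(f"logic/DRAM speed gap widens x{logic_speed / dram_speed:.1f}")
```

Even with fixed requirements, a gap that widens like this changes which designs make sense, which is the point of the caching example on the next slide.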

25 © 2004 Lebeck, Sorin, Roth, Hill, Wood, Sohi, Smith, Vijaykumar, Lipasti, Katz CompSci 220 / ECE 252 Examples of Changing Designs example I: caches 1970: 10K transistors, DRAM faster than logic ->  bad idea 1990: 1M transistors, logic faster than DRAM -> good idea will caches ever be a bad idea again? example II: out-of-order execution 1985: 100K transistors + no precise interrupts -> bad idea 1995: 2M transistors + precise interrupts -> good idea 2005: 100M transistors + 10GHz clock -> bad idea? semiconductor technology is an incredible driving force

26 © 2004 Lebeck, Sorin, Roth, Hill, Wood, Sohi, Smith, Vijaykumar, Lipasti, Katz CompSci 220 / ECE 252 Moore’s Law “Cramming More Components onto Integrated Circuits” – G.E. Moore, Electronics, 1965 observation: (DRAM) transistor density doubles annually –became known as “Moore’s Law” –wrong—density doubles every 18 months (had only 4 data points) corollaries –cost / transistor halves annually (18 months) –power per transistor decreases with scaling –speed increases with scaling –reliability increases with scaling (depends how small!)

27 © 2004 Lebeck, Sorin, Roth, Hill, Wood, Sohi, Smith, Vijaykumar, Lipasti, Katz CompSci 220 / ECE 252 Moore’s Law “performance doubles every 18 months” common interpretation of Moore’s Law, not original intent wrong! “performance” doubles every ~2 years self-fulfilling prophecy (Moore’s Curve) –2X every 2 years = ~3% increase per month –3% per month used to judge performance features –if feature adds 9 months to schedule... –...it should add at least 30% to performance ($1.03^9 \approx 1.30$, i.e., 30%) –Itanium: under Moore’s Curve in a big way
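
The Moore's Curve arithmetic can be checked with a short sketch. The function names are hypothetical; the inputs (2X every 24 months, a 9-month schedule slip) come from the slide.

```python
def monthly_growth(factor: float, months: int) -> float:
    """Per-month growth multiplier that compounds to `factor` over `months`."""
    return factor ** (1.0 / months)


def required_gain(slip_months: float, factor: float = 2.0, period_months: int = 24) -> float:
    """Fractional performance gain a feature needs to justify a schedule slip."""
    return factor ** (slip_months / period_months) - 1.0


per_month = monthly_growth(2.0, 24)
print(f"monthly growth: {100 * (per_month - 1):.1f}%")              # roughly 3%
print(f"gain needed for 9-month slip: {100 * required_gain(9):.0f}%")  # roughly 30%
```

Compounding the exact monthly rate gives slightly under 30%, while the slide's rounded 3% per month gives slightly over; both support the same rule of thumb.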

28 © 2004 Lebeck, Sorin, Roth, Hill, Wood, Sohi, Smith, Vijaykumar, Lipasti, Katz CompSci 220 / ECE 252 Technology Trends: Microprocessor Capacity CMOS improvements: Die size: 2X every 3 yrs Line width: halve / 7 yrs “Graduation Window” Pentium Pro: 5.5 million Sparc Ultra: 5.2 million PowerPC 620: 6.9 million Alpha 21164: 9.3 million Alpha 21264: 15 million Pentium III: 28 million Pentium 4: 42 million Alpha 21364: 100 million Alpha 21464: 250 million

29 © 2004 Lebeck, Sorin, Roth, Hill, Wood, Sohi, Smith, Vijaykumar, Lipasti, Katz CompSci 220 / ECE 252 Processor Performance

30 © 2004 Lebeck, Sorin, Roth, Hill, Wood, Sohi, Smith, Vijaykumar, Lipasti, Katz CompSci 220 / ECE 252 Alpha SPECint and SPECfp

31 © 2004 Lebeck, Sorin, Roth, Hill, Wood, Sohi, Smith, Vijaykumar, Lipasti, Katz CompSci 220 / ECE 252 Chip Area Reachable in One Clock Cycle [Figure; axes: Fraction of Chip Reached, Nanometers]

32 © 2004 Lebeck, Sorin, Roth, Hill, Wood, Sohi, Smith, Vijaykumar, Lipasti, Katz CompSci 220 / ECE 252 Power Density [Figure; axes: Power Density (W/cm^2), Microns]

33 © 2004 Lebeck, Sorin, Roth, Hill, Wood, Sohi, Smith, Vijaykumar, Lipasti, Katz CompSci 220 / ECE 252 Measurement and Evaluation Design Analysis Architecture is an iterative process: Searching the space of possible designs At all levels of computer systems Bad Ideas Good Ideas Creativity Mediocre Ideas Cost / Performance Analysis

34 © 2004 Lebeck, Sorin, Roth, Hill, Wood, Sohi, Smith, Vijaykumar, Lipasti, Katz CompSci 220 / ECE 252 Measurement Tools How do I evaluate an idea? Performance, Cost, Die Area, Power Estimation Benchmarks, Traces, Mixes Simulation (many levels) –ISA, RT, Gate, Circuit Queuing Theory Rules of Thumb Fundamental Laws Question: What is “better” Boeing 747 or Concorde?

35 © 2004 Lebeck, Sorin, Roth, Hill, Wood, Sohi, Smith, Vijaykumar, Lipasti, Katz CompSci 220 / ECE 252 The Bottom Line: Performance (and Cost)
Time to run the task (ExTime) –Execution time, response time, latency
Tasks per day, hour, week, sec, ns … (Performance) –Throughput, bandwidth
Plane              DC to Paris   Speed      Passengers   Throughput (pmph)
Boeing 747         6.5 hours     610 mph    470          286,700
BAC/Sud Concorde   3 hours       1350 mph   132          178,200

36 © 2004 Lebeck, Sorin, Roth, Hill, Wood, Sohi, Smith, Vijaykumar, Lipasti, Katz CompSci 220 / ECE 252 The Bottom Line: Performance (and Cost)
"X is n times faster than Y" means
$n = \frac{\text{ExTime}(Y)}{\text{ExTime}(X)} = \frac{\text{Performance}(X)}{\text{Performance}(Y)}$
Speed (latency) of Concorde vs. Boeing 747
Throughput of Boeing 747 vs. Concorde
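
A minimal sketch of the two comparisons the slide asks for. Flight times and speeds are from the preceding slide. The passenger counts (470 and 132) are an assumption: they are the figures usually quoted for this example. The helper names are hypothetical.

```python
def times_faster(extime_y: float, extime_x: float) -> float:
    """n such that X is n times faster than Y: ExTime(Y) / ExTime(X)."""
    return extime_y / extime_x


def throughput_pmph(passengers: int, speed_mph: float) -> float:
    """Throughput in passenger-miles per hour."""
    return passengers * speed_mph


# Latency: Concorde (3 hours) vs. Boeing 747 (6.5 hours), DC to Paris.
latency_ratio = times_faster(extime_y=6.5, extime_x=3.0)

# Throughput: Boeing 747 vs. Concorde (passenger counts assumed, see lead-in).
b747 = throughput_pmph(470, 610)
concorde = throughput_pmph(132, 1350)

print(f"Concorde is {latency_ratio:.2f} times faster (latency)")
print(f"Boeing 747 is {b747 / concorde:.2f} times faster (throughput)")
```

Which plane is "better" therefore depends on whether the metric is response time or throughput.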

37 © 2004 Lebeck, Sorin, Roth, Hill, Wood, Sohi, Smith, Vijaykumar, Lipasti, Katz CompSci 220 / ECE 252 Performance Terminology
“X is n% faster than Y” means:
$\frac{\text{ExTime}(Y)}{\text{ExTime}(X)} = \frac{\text{Performance}(X)}{\text{Performance}(Y)} = 1 + \frac{n}{100}$
$n = \frac{100\,(\text{Performance}(X) - \text{Performance}(Y))}{\text{Performance}(Y)}$
Example: Y takes 15 seconds to complete a task, X takes 10 seconds. What % faster is X?

38 © 2004 Lebeck, Sorin, Roth, Hill, Wood, Sohi, Smith, Vijaykumar, Lipasti, Katz CompSci 220 / ECE 252 Example
$\frac{\text{ExTime}(Y)}{\text{ExTime}(X)} = \frac{15}{10} = \frac{1.5}{1.0} = \frac{\text{Performance}(X)}{\text{Performance}(Y)}$
$n = \frac{100\,(1.5 - 1.0)}{1.0}$
$n = 50\%$
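
The "n% faster" definition can be sketched in both of its equivalent forms, using the 15-second and 10-second times from the example. The function names are hypothetical.

```python
def percent_faster(extime_y: float, extime_x: float) -> float:
    """n such that X is n% faster than Y, from execution times."""
    return 100.0 * (extime_y / extime_x - 1.0)


def percent_faster_from_perf(perf_x: float, perf_y: float) -> float:
    """Same quantity computed from performance (performance = 1 / execution time)."""
    return 100.0 * (perf_x - perf_y) / perf_y


n_time = percent_faster(extime_y=15.0, extime_x=10.0)
n_perf = percent_faster_from_perf(perf_x=1 / 10.0, perf_y=1 / 15.0)
print(f"from times: {n_time:.0f}%  from performance: {n_perf:.0f}%")
```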

39 © 2004 Lebeck, Sorin, Roth, Hill, Wood, Sohi, Smith, Vijaykumar, Lipasti, Katz CompSci 220 / ECE 252 Amdahl's Law
Speedup due to enhancement E:
$\text{Speedup}(E) = \frac{\text{ExTime w/o } E}{\text{ExTime w/ } E} = \frac{\text{Performance w/ } E}{\text{Performance w/o } E}$
Suppose that enhancement E accelerates a fraction F of the task by a factor S, and the remainder of the task is unaffected, then:
$\text{ExTime}(E) = \text{ExTime w/o } E \times \left((1 - F) + \frac{F}{S}\right)$
$\text{Speedup}(E) = \frac{1}{(1 - F) + \frac{F}{S}}$

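40 © 2004 Lebeck, Sorin, Roth, Hill, Wood, Sohi, Smith, Vijaykumar, Lipasti, Katz CompSci 220 / ECE 252 Amdahl’s Law
$\text{ExTime}_{\text{new}} = \text{ExTime}_{\text{old}} \times \left[(1 - \text{Fraction}_{\text{enhanced}}) + \frac{\text{Fraction}_{\text{enhanced}}}{\text{Speedup}_{\text{enhanced}}}\right]$
$\text{Speedup}_{\text{overall}} = \frac{\text{ExTime}_{\text{old}}}{\text{ExTime}_{\text{new}}} = \frac{1}{(1 - \text{Fraction}_{\text{enhanced}}) + \frac{\text{Fraction}_{\text{enhanced}}}{\text{Speedup}_{\text{enhanced}}}}$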

41 © 2004 Lebeck, Sorin, Roth, Hill, Wood, Sohi, Smith, Vijaykumar, Lipasti, Katz CompSci 220 / ECE 252 Amdahl’s Law Floating point instructions improved to run 2X; but only 10% of actual instruction execution time is FP Speedup overall = ExTime new =

42 © 2004 Lebeck, Sorin, Roth, Hill, Wood, Sohi, Smith, Vijaykumar, Lipasti, Katz CompSci 220 / ECE 252 Amdahl’s Law
Floating point instructions improved to run 2X; but only 10% of actual instruction execution time is FP
$\text{ExTime}_{\text{new}} = \text{ExTime}_{\text{old}} \times (0.9 + 0.1/2) = 0.95 \times \text{ExTime}_{\text{old}}$
$\text{Speedup}_{\text{overall}} = \frac{1}{0.95} = 1.053$
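
Amdahl's Law is easy to experiment with in code. This sketch reproduces the floating-point example (10% of time enhanced, 2X faster) and then shows the upper bound when the enhanced part becomes arbitrarily fast. The function name and the second call are illustrative additions.

```python
def amdahl_speedup(fraction_enhanced: float, speedup_enhanced: float) -> float:
    """Overall speedup when `fraction_enhanced` of execution time runs `speedup_enhanced` times faster."""
    if not 0.0 <= fraction_enhanced <= 1.0:
        raise ValueError("fraction_enhanced must be in [0, 1]")
    if speedup_enhanced <= 0.0:
        raise ValueError("speedup_enhanced must be positive")
    return 1.0 / ((1.0 - fraction_enhanced) + fraction_enhanced / speedup_enhanced)


# The slide's example: FP is 10% of execution time and runs 2X faster.
print(f"FP 2X faster:      {amdahl_speedup(0.10, 2.0):.3f}")

# Even an (almost) infinitely fast FP unit cannot beat 1 / (1 - 0.10).
print(f"FP ~infinitely fast: {amdahl_speedup(0.10, 1e12):.3f}")
```

The second result is capped by the 90% of time that is not floating point, which motivates the corollary on the next slide.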

43 © 2004 Lebeck, Sorin, Roth, Hill, Wood, Sohi, Smith, Vijaykumar, Lipasti, Katz CompSci 220 / ECE 252 Corollary: Make The Common Case Fast All instructions require an instruction fetch, only a fraction require a data fetch/store. –Optimize instruction access over data access Programs exhibit locality Spatial Locality Temporal Locality Access to small memories is faster –Provide a storage hierarchy such that the most frequent accesses are to the smallest (closest) memories. Reg's Cache Memory Disk / Tape

44 © 2004 Lebeck, Sorin, Roth, Hill, Wood, Sohi, Smith, Vijaykumar, Lipasti, Katz CompSci 220 / ECE 252 Occam's Toothbrush The simple case is usually the most frequent and the easiest to optimize! Do simple, fast things in hardware and be sure the rest can be handled correctly in software

45 © 2004 Lebeck, Sorin, Roth, Hill, Wood, Sohi, Smith, Vijaykumar, Lipasti, Katz CompSci 220 / ECE 252 Metrics of Performance [Diagram of system layers: Application, Programming Language, Compiler, ISA, Datapath, Control, Function Units, Transistors, Wires, Pins] Metrics shown alongside the layers: (millions) of Instructions per second: MIPS; (millions) of (FP) operations per second: MFLOP/s; Cycles per second (clock rate); Megabytes per second; Answers per month; Operations per second

46 © 2004 Lebeck, Sorin, Roth, Hill, Wood, Sohi, Smith, Vijaykumar, Lipasti, Katz CompSci 220 / ECE 252 Aspects of CPU Performance
$\text{CPU time} = \frac{\text{Seconds}}{\text{Program}} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Cycles}}{\text{Instruction}} \times \frac{\text{Seconds}}{\text{Cycle}}$
Which factors does each level affect? (table left blank on this slide)
               Instr. Cnt   CPI   Clock Rate
Program
Compiler
Instr. Set
Organization
Technology

47 © 2004 Lebeck, Sorin, Roth, Hill, Wood, Sohi, Smith, Vijaykumar, Lipasti, Katz CompSci 220 / ECE 252 Aspects of CPU Performance
$\text{CPU time} = \frac{\text{Seconds}}{\text{Program}} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Cycles}}{\text{Instruction}} \times \frac{\text{Seconds}}{\text{Cycle}}$
               Inst Count   CPI   Clock Rate
Program        X
Compiler       X            (X)
Inst. Set      X            X
Organization                X     X
Technology                        X
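
The CPU time equation translates directly into code. The numbers below (one billion instructions, CPI of 1.5, 1 GHz clock) are hypothetical values chosen only to exercise the formula, as is the function name.

```python
def cpu_time_seconds(instruction_count: float, cpi: float, clock_rate_hz: float) -> float:
    """CPU time = instructions/program x cycles/instruction x seconds/cycle."""
    seconds_per_cycle = 1.0 / clock_rate_hz
    return instruction_count * cpi * seconds_per_cycle


# Hypothetical machine and program, for illustration only.
base = cpu_time_seconds(instruction_count=1e9, cpi=1.5, clock_rate_hz=1e9)

# A compiler change that removes 10% of instructions but raises CPI to 1.6
# is a net win only if the product of the two factors goes down.
variant = cpu_time_seconds(instruction_count=0.9e9, cpi=1.6, clock_rate_hz=1e9)

print(f"base: {base:.2f} s, variant: {variant:.2f} s")
```

Because the three factors multiply, improving one while worsening another has to be judged on the product, not on any single term.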

48 © 2004 Lebeck, Sorin, Roth, Hill, Wood, Sohi, Smith, Vijaykumar, Lipasti, Katz CompSci 220 / ECE 252 Marketing Metrics
MIPS: Machines with different instruction sets? Programs with different instruction mixes? – Dynamic frequency of instructions. Uncorrelated with performance.
MFLOP/s: Machine dependent. Often not where time is spent.
Normalized: add, sub, compare, mult 1; divide, sqrt 4; exp, sin, ... 8

49 © 2004 Lebeck, Sorin, Roth, Hill, Wood, Sohi, Smith, Vijaykumar, Lipasti, Katz CompSci 220 / ECE 252 Cycles Per Instruction Invest Resources where time is Spent! “Average Cycles Per Instruction” “Instruction Frequency”

50 © 2004 Lebeck, Sorin, Roth, Hill, Wood, Sohi, Smith, Vijaykumar, Lipasti, Katz CompSci 220 / ECE 252 Organizational Trade-offs [Diagram of system layers: Application, Programming Language, Compiler, ISA, Datapath, Control, Function Units, Transistors, Wires, Pins] Trade-offs shown alongside the layers: Instruction Mix, CPI, Cycle Time

51 © 2004 Lebeck, Sorin, Roth, Hill, Wood, Sohi, Smith, Vijaykumar, Lipasti, Katz CompSci 220 / ECE 252 Example: Calculating CPI
Base Machine (Reg / Reg), Typical Mix
Op       Freq   Cycles   CPI_i   (% Time)
ALU      50%    1        0.5     (33%)
Load     20%    2        0.4     (27%)
Store    10%    2        0.2     (13%)
Branch   20%    2        0.4     (27%)
Total                    1.5
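
The table can be reproduced with a few lines, assuming CPI is the frequency-weighted sum of per-class cycle counts and each class's time share is its contribution divided by that sum. The dictionary layout and names are illustrative.

```python
# Instruction mix from the slide: class -> (frequency, cycles).
MIX = {
    "ALU": (0.50, 1),
    "Load": (0.20, 2),
    "Store": (0.10, 2),
    "Branch": (0.20, 2),
}


def average_cpi(mix: dict) -> float:
    """CPI = sum over classes of frequency_i * cycles_i."""
    return sum(freq * cycles for freq, cycles in mix.values())


def time_shares(mix: dict) -> dict:
    """Fraction of execution time spent in each instruction class."""
    cpi = average_cpi(mix)
    return {op: freq * cycles / cpi for op, (freq, cycles) in mix.items()}


print(f"CPI = {average_cpi(MIX):.2f}")
for op, share in time_shares(MIX).items():
    print(f"{op:7s} {100 * share:.0f}% of time")
```

Note that ALU operations are half of all instructions but only about a third of the time, which is why resources should follow time spent rather than instruction frequency.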

52 © 2004 Lebeck, Sorin, Roth, Hill, Wood, Sohi, Smith, Vijaykumar, Lipasti, Katz CompSci 220 / ECE 252 Example
Base Machine (Reg / Reg)
Op       Freq   Cycles
ALU      50%    1
Load     20%    2
Store    10%    2
Branch   20%    2
Add register / memory operations to traditional RISC: – One source operand in memory – One source operand in register – Cycle count of 2
Branch cycle count to increase to 3.
What fraction of the loads must be eliminated for this to pay off?
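
One way to work this example, sketched under stated assumptions: each new register/memory instruction replaces one load plus one ALU instruction, the clock cycle time is unchanged, and x is the number of such replacements per original instruction. These modeling choices and all names are assumptions, not taken from the slide.

```python
BASE_MIX = {"ALU": (0.50, 1), "Load": (0.20, 2), "Store": (0.10, 2), "Branch": (0.20, 2)}


def base_cycles() -> float:
    """Cycles per original instruction on the base machine (its CPI)."""
    return sum(freq * cycles for freq, cycles in BASE_MIX.values())


def new_cycles(x: float) -> float:
    """Cycles per ORIGINAL instruction after fusing x load+ALU pairs into reg/mem ops."""
    alu = (0.50 - x) * 1      # ALU ops folded into reg/mem ops disappear
    load = (0.20 - x) * 2     # so do the loads that fed them
    store = 0.10 * 2
    branch = 0.20 * 3         # branches now take 3 cycles
    reg_mem = x * 2           # each new reg/mem op takes 2 cycles
    return alu + load + store + branch + reg_mem


def break_even_x() -> float:
    """x at which new_cycles(x) equals base_cycles(); new_cycles is linear with slope -1."""
    return new_cycles(0.0) - base_cycles()


x = break_even_x()
load_freq = BASE_MIX["Load"][0]
print(f"break-even x = {x:.2f} of original instructions")
print(f"fraction of loads to eliminate = {x / load_freq:.0%}")
```

Under these assumptions the change breaks even only when every load is eliminated, so the slower branch makes it very hard for the new instructions to pay off.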

53 © 2004 Lebeck, Sorin, Roth, Hill, Wood, Sohi, Smith, Vijaykumar, Lipasti, Katz CompSci 220 / ECE 252 Next Time: Benchmarks; Performance Metrics; Cost; Instruction Set Architectures. TODO: Read Chapters 1 & 2; email me if you need a CS account; HW #1 will be up Wednesday, due Sep 6.