1 Lecture 1: Course Introduction, Technology Trends, Performance Professor Alvin R. Lebeck Compsci 220 / ECE 252 Fall 2004 Slides based on those of: Sorin, Roth, Hill, Wood, Sohi, Smith, Vijaykumar, Lipasti, Katz

2 Administrative
© 2004 Lebeck, Sorin, Roth, Hill, Wood, Sohi, Smith, Vijaykumar, Lipasti, Katz CompSci 220 / ECE 252
Office Hours
– Office: D308 LSRC
– Hours: Mon 3:00-4:00, Thurs 1:00-2:00, or by appointment (email)
– Email: alvy@cs.duke.edu
– Phone: 660-6551
Teaching Assistant: Shobana Ravi
– Office: D330
– Hours: TBD
– Email: shobana@cs.duke.edu
– Phone: 660-6589

3 Administrative (Grading)
30% Homeworks
– 4 to 6 Homeworks
– Late < 1 day = 50%
– Late > 1 day = zero
45% Examinations (Midterm + Final)
25% Research Project (work in groups of 2 or 3)
– No late term projects
Academic Misconduct
– University policy will be followed strictly
– Zero tolerance for cheating and/or plagiarism
This course requires hard work.

4 Administrative (Continued)
Midterm Exam: in class (75 min), closed book
Final Exam: 3 hours, closed book
CS Graduate Students: this is a “Quals” course.
– Quals pass based on Midterm and Final exams only

5 Administrative (Continued)
Course Web Page
– http://www.cs.duke.edu/courses/cps220/fall04
– Lectures posted there shortly before class (pdf)
– Homework posted there
– General information about course
Course News Group
– duke.cs.cps220
– Use it to
1. read announcements/comments on class or homework,
2. ask questions (help),
3. communicate with each other.

6 SPIDER: Systems Seminar
Systems & Architecture Seminar
– Wednesdays 4:00-5:00 in D344
– duke.cs.os-research (spider newsgroup)
Presentations on current work
– Practice talks for conferences
– Discussion on recent papers
– Your own research
Why should you go?
– If you want to work in Systems/Architecture…
– Good time to practice public speaking in front of a friendly crowd
– Learn about current topics

7 Homework #0
Need a Duke CS account? Email me (alvy@cs.duke.edu) your
1. Duke ID
2. ACPUB account name
Read Chapters 1 & 2

8 What is This Course All About?
State-of-the-art computer hardware design
Topics
– Uniprocessor architecture (i.e., microprocessors)
– Memory architecture
– I/O architecture
– Brief look at multithreading and multiprocessors
Fundamentals, current systems, and future systems
Will read from textbook, classic papers, brand-new papers

9 Course Goals and Expectations
Course goals:
– Understand how current processors work
– Understand how to evaluate/compare processors
– Learn how to use a simulator to perform experiments
– Learn research skills by performing a term project
Course expectations:
– Will loosely follow text
– Major emphasis on cutting-edge issues
– Students will read a list of research papers
– Term project

10 CPS 220 Course Focus
Understanding the design techniques, machine structures, technology factors, and evaluation methods that will determine the form of computers in the 21st Century
[Diagram: Computer Architecture (Instruction Set Design, Organization, Hardware, Power) surrounded by Technology, Programming Languages, Operating Systems, History, Applications, Interface Design (ISA), Measurement & Evaluation, Parallelism]

11 Expected Background
– Basic architecture (ECE 152 / CPS 104)
– Basic OS (ECE 153 / CPS 110)
Other useful and related courses:
– Digital system design (ECE 251)
– VLSI systems (ECE 261)
– Multiprocessor architecture (ECE 259 / CPS 221)
– Fault tolerant computing (ECE 254 / CPS 225)
– Computer networks and systems (CPS 114 & 214)
– Programming languages & compilers (CS 106 & 206)
– Advanced OS (CPS 210)

12 Course Components
Reading Materials
– Computer Architecture: A Quantitative Approach by Hennessy and Patterson, 3rd Edition
– Readings in Computer Architecture by Hill, Jouppi, Sohi
– Recent research papers (online)

13 Computer Architecture Is …
“…the attributes of a [computing] system as seen by the programmer, i.e., the conceptual structure and functional behavior, as distinct from the organization of the data flows and controls, the logic design, and the physical implementation.”
- Amdahl, Blaauw, and Brooks, IBM Journal of R&D, April 1964.

14 Computer Architecture Topics
[Diagram labels]
– Instruction Set Architecture
– Pipelining, Hazard Resolution, Superscalar, Reordering, Prediction, Speculation
– Addressing, Protection, Exception Handling
– L1 Cache, L2 Cache, DRAM
– Disks, WORM, Tape
– Coherence, Bandwidth, Latency
– Emerging Technologies, Interleaving, Bus protocols
– RAID
– VLSI
– Input/Output and Storage
– Memory Hierarchy
– Pipelining and Instruction Level Parallelism

15 Architecture and Other Disciplines
Architecture interacts with many other fields
Can’t be studied in a vacuum
[Diagram: layers, top to bottom]
– Application Software
– Operating Systems, Compilers, Networking
– Computer Architecture
– Circuits, Wires, Devices, Network Hardware

16 Levels of Computer Architecture
architecture
– functional appearance to immediate user
» opcodes, addressing modes, architected registers
implementation (microarchitecture)
– logical structure that performs the architecture
» pipelining, functional units, caches, physical registers
realization (circuits)
– physical structure that embodies the implementation
» gates, cells, transistors, wires

17 Role of the Computer Microarchitect
architect: defines the hardware/software interface
microarchitect: defines the hardware implementation
– usually the same person
decisions based on
– applications
– performance
– cost
– reliability
– power ...

21 Computer Engineering Methodology
[Diagram]
– Evaluate Existing Systems for Bottlenecks
– Simulate New Designs and Organizations
– Implement Next Generation System
Inputs shown in the diagram: Technology Trends, Benchmarks, Workloads, Implementation Complexity

22 Applications -> Requirements -> Designs
scientific: weather prediction, molecular modeling
– need: large memory, floating-point arithmetic
– examples: CRAY-1, T3E, IBM DeepBlue, BlueGene
commercial: inventory, payroll, web serving, e-commerce
– need: integer arithmetic, high I/O
– examples: Clusters, SUN SPARCcenter, Enterprise
desktop: multimedia, games, entertainment
– need: high data bandwidth, graphics
– examples: Intel Pentium4, IBM Power4, Motorola PPC 620
mobile: laptops
– need: low power (battery), good performance
– examples: Intel Mobile Pentium III, Transmeta TM5400
embedded: cell phones, automobile engines, door knobs
– need: low power (battery + heat), low cost
– examples: Compaq/Intel StrongARM, X-Scale, Transmeta TM3200

23 Why Study Computer Architecture?
answer #1: requirements are always changing
aren’t computers fast enough already?
– are they?
– fast enough to do everything we will EVER want?
» AI, VR, protein sequencing, ????
is speed the only goal?
– power: heat dissipation + battery life
– cost
– reliability
– etc.

24 Why Study Computer Architecture?
answer #2: technology playing field is always changing
annual technology improvements (approximate)
– SRAM (logic): density +25%, speed +20%
– DRAM (memory): density +60%, speed +4%
– disk (magnetic): density +25%, speed +4%
– fiber: ??
parameters change and change relative to one another!
designs change even if requirements are fixed
but requirements are not fixed
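To make "change relative to one another" concrete, here is a minimal sketch that compounds the approximate annual rates from the slide. The 10-year horizon and the names `compound` and `ANNUAL_RATES` are assumptions for illustration, not part of the lecture.

```python
# Approximate annual improvement rates quoted on the slide.
ANNUAL_RATES = {
    "logic density": 0.25,
    "logic speed": 0.20,
    "DRAM density": 0.60,
    "DRAM speed": 0.04,
    "disk density": 0.25,
    "disk speed": 0.04,
}


def compound(annual_rate: float, years: int) -> float:
    """Total improvement factor after `years` of steady annual growth."""
    return (1.0 + annual_rate) ** years


if __name__ == "__main__":
    years = 10  # assumed horizon, chosen only for illustration
    for name, rate in ANNUAL_RATES.items():
        print(f"{name:14s} x{compound(rate, years):8.1f} after {years} years")
```

Under these rates DRAM density grows by roughly two orders of magnitude in a decade while DRAM speed grows by less than 1.5x, which is the kind of widening gap that shifts design trade-offs.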

25 Examples of Changing Designs
example I: caches
– 1970: 10K transistors, DRAM faster than logic -> bad idea
– 1990: 1M transistors, logic faster than DRAM -> good idea
– will caches ever be a bad idea again?
example II: out-of-order execution
– 1985: 100K transistors + no precise interrupts -> bad idea
– 1995: 2M transistors + precise interrupts -> good idea
– 2005: 100M transistors + 10GHz clock -> bad idea?
semiconductor technology is an incredible driving force

26 Moore’s Law
“Cramming More Components onto Integrated Circuits”
– G.E. Moore, Electronics, 1965
observation: (DRAM) transistor density doubles annually
– became known as “Moore’s Law”
– wrong: density doubles every 18 months (had only 4 data points)
corollaries
– cost / transistor halves annually (18 months)
– power per transistor decreases with scaling
– speed increases with scaling
– reliability increases with scaling (depends how small!)

27 Moore’s Law
“performance doubles every 18 months”
– common interpretation of Moore’s Law, not original intent
– wrong! “performance” doubles every ~2 years
self-fulfilling prophecy (Moore’s Curve)
– 2X every 2 years = ~3% increase per month
– 3% per month used to judge performance features
– if feature adds 9 months to schedule...
– ...it should add at least 30% to performance ($1.03^9 \approx 1.30$, i.e. 30%)
– Itanium: under Moore’s Curve in a big way
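The Moore's Curve arithmetic can be checked with a short sketch. The function names `monthly_growth` and `required_gain` are made up for this illustration; the 24-month doubling period, 3% monthly rate, and 9-month slip come from the slide.

```python
def monthly_growth(doubling_months: float) -> float:
    """Per-month growth factor implied by doubling every `doubling_months` months."""
    return 2.0 ** (1.0 / doubling_months)


def required_gain(slip_months: float, monthly_rate: float = 0.03) -> float:
    """Performance factor a feature must add to justify a schedule slip."""
    return (1.0 + monthly_rate) ** slip_months


if __name__ == "__main__":
    print(f"{monthly_growth(24):.4f}")  # 1.0293, i.e. about 3% per month
    print(f"{required_gain(9):.2f}")    # 1.30, i.e. about 30%
```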

28 Technology Trends: Microprocessor Capacity
CMOS improvements:
– Die size: 2X every 3 yrs
– Line width: halve / 7 yrs
[Figure: microprocessor capacity plot, annotated “Graduation Window”]
| Processor | Transistors |
|---|---|
| Pentium Pro | 5.5 million |
| Sparc Ultra | 5.2 million |
| PowerPC 620 | 6.9 million |
| Alpha 21164 | 9.3 million |
| Alpha 21264 | 15 million |
| Pentium III | 28 million |
| Pentium 4 | 42 million |
| Alpha 21364 | 100 million |
| Alpha 21464 | 250 million |

29 Processor Performance

30 Alpha SPECint and SPECfp

31 Chip Area Reachable in One Clock Cycle
[Figure: axes labeled “Fraction of Chip Reached” and “Nanometers”]

32 Power Density
[Figure: axes labeled “Power Density (W/cm^2)” and “Microns”]

33 Measurement and Evaluation
Architecture is an iterative process:
– Searching the space of possible designs
– At all levels of computer systems
[Diagram: Design and Analysis loop; Creativity feeds Design, Cost / Performance Analysis sorts Good Ideas, Mediocre Ideas, and Bad Ideas]

34 Measurement Tools
How do I evaluate an idea?
– Performance, Cost, Die Area, Power Estimation
– Benchmarks, Traces, Mixes
– Simulation (many levels): ISA, RT, Gate, Circuit
– Queuing Theory
– Rules of Thumb
– Fundamental Laws
Question: What is “better”, Boeing 747 or Concorde?

35 The Bottom Line: Performance (and Cost)
Time to run the task (ExTime)
– Execution time, response time, latency
Tasks per day, hour, week, sec, ns … (Performance)
– Throughput, bandwidth
| Plane | DC to Paris | Speed | Passengers | Throughput (pmph) |
|---|---|---|---|---|
| Boeing 747 | 6.5 hours | 610 mph | 470 | 286,700 |
| BAC/Sud Concorde | 3 hours | 1350 mph | 132 | 178,200 |

36 The Bottom Line: Performance (and Cost)
"X is n times faster than Y" means
$$n = \frac{\mathrm{ExTime}(Y)}{\mathrm{ExTime}(X)} = \frac{\mathrm{Performance}(X)}{\mathrm{Performance}(Y)}$$
– Speed (latency) of Concorde vs. Boeing 747
– Throughput of Boeing 747 vs. Concorde
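As a rough check of the two comparisons, the sketch below applies the ratio definition to the plane figures from the previous slide (DC to Paris time and passenger-mph throughput). The function names are hypothetical.

```python
def times_faster(extime_y: float, extime_x: float) -> float:
    """n such that X is n times faster than Y, from execution times."""
    return extime_y / extime_x


def performance_ratio(perf_x: float, perf_y: float) -> float:
    """The same n expressed with performance (e.g. throughput) instead of time."""
    return perf_x / perf_y


if __name__ == "__main__":
    # Latency: Concorde (3 hours) vs. Boeing 747 (6.5 hours), DC to Paris.
    print(f"Concorde latency advantage: {times_faster(6.5, 3.0):.2f}x")
    # Throughput: Boeing 747 (286,700 pmph) vs. Concorde (178,200 pmph).
    print(f"747 throughput advantage:   {performance_ratio(286_700, 178_200):.2f}x")
```

Which plane is "better" therefore depends on whether latency or throughput is the metric that matters.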

37 Performance Terminology
“X is n% faster than Y” means:
$$\frac{\mathrm{ExTime}(Y)}{\mathrm{ExTime}(X)} = \frac{\mathrm{Performance}(X)}{\mathrm{Performance}(Y)} = 1 + \frac{n}{100}$$
$$n = \frac{100\,\bigl(\mathrm{Performance}(X) - \mathrm{Performance}(Y)\bigr)}{\mathrm{Performance}(Y)}$$
Example: Y takes 15 seconds to complete a task, X takes 10 seconds. What % faster is X?

38 Example
$$\frac{\mathrm{ExTime}(Y)}{\mathrm{ExTime}(X)} = \frac{15}{10} = \frac{1.5}{1.0} = \frac{\mathrm{Performance}(X)}{\mathrm{Performance}(Y)}$$
$$n = \frac{100\,(1.5 - 1.0)}{1.0} = 50\%$$
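The percentage definition and the 15 s versus 10 s example can be reproduced in a few lines. This is a minimal sketch; `percent_faster` is a name chosen here, not one from the lecture.

```python
def percent_faster(extime_y: float, extime_x: float) -> float:
    """n such that X is n% faster than Y: ExTime(Y) / ExTime(X) = 1 + n/100."""
    return 100.0 * (extime_y / extime_x - 1.0)


if __name__ == "__main__":
    print(percent_faster(15.0, 10.0))  # 50.0
```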

39 Amdahl's Law
Speedup due to enhancement E:
$$\mathrm{Speedup}(E) = \frac{\mathrm{ExTime\ w/o\ E}}{\mathrm{ExTime\ w/\ E}} = \frac{\mathrm{Performance\ w/\ E}}{\mathrm{Performance\ w/o\ E}}$$
Suppose that enhancement E accelerates a fraction F of the task by a factor S, and the remainder of the task is unaffected, then:
ExTime(E) =
Speedup(E) =

40 Amdahl’s Law
$$\mathrm{ExTime}_{new} = \mathrm{ExTime}_{old} \times \left[ (1 - \mathrm{Fraction}_{enhanced}) + \frac{\mathrm{Fraction}_{enhanced}}{\mathrm{Speedup}_{enhanced}} \right]$$
$$\mathrm{Speedup}_{overall} = \frac{\mathrm{ExTime}_{old}}{\mathrm{ExTime}_{new}} = \frac{1}{(1 - \mathrm{Fraction}_{enhanced}) + \dfrac{\mathrm{Fraction}_{enhanced}}{\mathrm{Speedup}_{enhanced}}}$$

41 Amdahl’s Law
Floating point instructions improved to run 2X; but only 10% of actual instruction execution time is FP
ExTime_new =
Speedup_overall =

42 Amdahl’s Law
Floating point instructions improved to run 2X; but only 10% of actual instruction execution time is FP
$$\mathrm{ExTime}_{new} = \mathrm{ExTime}_{old} \times (0.9 + 0.1/2) = 0.95 \times \mathrm{ExTime}_{old}$$
$$\mathrm{Speedup}_{overall} = \frac{1}{0.95} = 1.053$$
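A small sketch of Amdahl's Law as a function, applied to the floating-point example. The name `amdahl_speedup` and its parameter names are illustrative choices; the second call uses a hypothetical 1000x enhancement to show the limit.

```python
def amdahl_speedup(fraction_enhanced: float, speedup_enhanced: float) -> float:
    """Overall speedup when `fraction_enhanced` of execution time is sped up
    by `speedup_enhanced` and the rest is unaffected."""
    if not 0.0 <= fraction_enhanced <= 1.0:
        raise ValueError("fraction_enhanced must be between 0 and 1")
    if speedup_enhanced <= 0.0:
        raise ValueError("speedup_enhanced must be positive")
    new_time = (1.0 - fraction_enhanced) + fraction_enhanced / speedup_enhanced
    return 1.0 / new_time


if __name__ == "__main__":
    # FP is 10% of execution time and runs 2x faster.
    print(f"{amdahl_speedup(0.10, 2.0):.3f}")     # 1.053
    # Even a (hypothetical) 1000x faster FP unit barely helps.
    print(f"{amdahl_speedup(0.10, 1000.0):.3f}")  # 1.111
```

The second call shows why the unenhanced 90% bounds the overall speedup at 1/0.9, no matter how large the enhancement.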

43 Corollary: Make The Common Case Fast
All instructions require an instruction fetch, only a fraction require a data fetch/store.
– Optimize instruction access over data access
Programs exhibit locality
– Spatial Locality
– Temporal Locality
Access to small memories is faster
– Provide a storage hierarchy such that the most frequent accesses are to the smallest (closest) memories.
[Diagram: Reg's -> Cache -> Memory -> Disk / Tape]

44 Occam's Toothbrush
The simple case is usually the most frequent and the easiest to optimize!
Do simple, fast things in hardware and be sure the rest can be handled correctly in software

45 Metrics of Performance
[Diagram: system layers: Application; Programming Language; Compiler; ISA; Datapath, Control; Function Units; Transistors, Wires, Pins]
Metrics shown alongside the layers:
– Answers per month
– Operations per second
– (millions) of Instructions per second: MIPS
– (millions) of (FP) operations per second: MFLOP/s
– Megabytes per second
– Cycles per second (clock rate)

46 Aspects of CPU Performance
$$\text{CPU time} = \frac{\text{Seconds}}{\text{Program}} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Cycles}}{\text{Instruction}} \times \frac{\text{Seconds}}{\text{Cycle}}$$
| | Instr. Cnt | CPI | Clock Rate |
|---|---|---|---|
| Program | | | |
| Compiler | | | |
| Instr. Set | | | |
| Organization | | | |
| Technology | | | |

47 Aspects of CPU Performance
$$\text{CPU time} = \frac{\text{Seconds}}{\text{Program}} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Cycles}}{\text{Instruction}} \times \frac{\text{Seconds}}{\text{Cycle}}$$
| | Inst Count | CPI | Clock Rate |
|---|---|---|---|
| Program | X | | |
| Compiler | X | (X) | |
| Inst. Set | X | X | |
| Organization | | X | X |
| Technology | | | X |
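The CPU time equation translates directly into code. The sketch below uses made-up numbers (one billion instructions, CPI of 1.5, 1 GHz clock) purely for illustration; none of them come from the lecture.

```python
def cpu_time(instruction_count: float, cpi: float, clock_rate_hz: float) -> float:
    """CPU time in seconds = instructions x (cycles / instruction) x (seconds / cycle)."""
    seconds_per_cycle = 1.0 / clock_rate_hz
    return instruction_count * cpi * seconds_per_cycle


if __name__ == "__main__":
    # Hypothetical program and machine, chosen only to exercise the formula.
    base = cpu_time(1e9, 1.5, 1e9)
    # A compiler change that removes 10% of instructions but raises CPI to 1.6.
    tuned = cpu_time(0.9e9, 1.6, 1e9)
    print(f"base:  {base:.3f} s")
    print(f"tuned: {tuned:.3f} s")
```

Because the three factors multiply, a change that improves one factor only pays off if it does not degrade the others by more.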

48 Marketing Metrics
– Machines with different instruction sets?
– Programs with different instruction mixes?
» Dynamic frequency of instructions
– Uncorrelated with performance
– Machine dependent
– Often not where time is spent
Normalized:
| Operations | Weight |
|---|---|
| add, sub, compare, mult | 1 |
| divide, sqrt | 4 |
| exp, sin, ... | 8 |

49 Cycles Per Instruction
“Average Cycles Per Instruction”
“Instruction Frequency”
Invest resources where time is spent!
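The two labels on this slide usually refer to the standard CPI definitions sketched below. The symbols $I_i$ (dynamic count of instruction class $i$), $F_i$ (its frequency) and $\mathrm{CPI}_i$ (its cycle count) are assumptions, chosen to match the columns of the CPI example table later in the lecture.

```latex
% Average cycles per instruction
\mathrm{CPI} = \frac{\text{CPU clock cycles}}{\text{Instruction count}}
             = \frac{\text{CPU time} \times \text{Clock rate}}{\text{Instruction count}}

% Weighted by instruction frequency
\mathrm{CPI} = \sum_{i=1}^{n} \mathrm{CPI}_i \times F_i,
\qquad F_i = \frac{I_i}{\text{Instruction count}}
```

The share of execution time spent in class $i$ is then $\mathrm{CPI}_i F_i / \mathrm{CPI}$, which is what "invest resources where time is spent" asks the designer to look at.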

50 Organizational Trade-offs
[Diagram: system layers: Application; Programming Language; Compiler; ISA; Datapath, Control; Function Units; Transistors, Wires, Pins]
Trade-offs shown alongside the layers:
– Instruction Mix
– CPI
– Cycle Time

51 Example: Calculating CPI
Base Machine (Reg / Reg), Typical Mix
| Op | Freq | Cycles | CPI_i | (% Time) |
|---|---|---|---|---|
| ALU | 50% | 1 | 0.5 | (33%) |
| Load | 20% | 2 | 0.4 | (27%) |
| Store | 10% | 2 | 0.2 | (13%) |
| Branch | 20% | 2 | 0.4 | (27%) |
| Total | | | 1.5 | |
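The CPI table can be recomputed from the frequency and cycle columns. This is a minimal sketch; the dictionary layout and function names are choices made here for illustration.

```python
# (frequency, cycles) per instruction class, taken from the base machine table.
MIX = {
    "ALU": (0.50, 1),
    "Load": (0.20, 2),
    "Store": (0.10, 2),
    "Branch": (0.20, 2),
}


def average_cpi(mix: dict) -> float:
    """Average CPI as the frequency-weighted sum of per-class cycle counts."""
    return sum(freq * cycles for freq, cycles in mix.values())


def time_shares(mix: dict) -> dict:
    """Fraction of execution time spent in each instruction class."""
    cpi = average_cpi(mix)
    return {op: freq * cycles / cpi for op, (freq, cycles) in mix.items()}


if __name__ == "__main__":
    print(f"CPI = {average_cpi(MIX):.2f}")
    for op, share in time_shares(MIX).items():
        print(f"{op:7s} {share:.0%}")
```

Note that ALU operations are half of all instructions but only a third of the time, while loads and branches together account for more than half of it.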

52 Example
Base Machine (Reg / Reg)
| Op | Freq | Cycles |
|---|---|---|
| ALU | 50% | 1 |
| Load | 20% | 2 |
| Store | 10% | 2 |
| Branch | 20% | 2 |
Add register / memory operations to traditional RISC:
– One source operand in memory
– One source operand in register
– Cycle count of 2
Branch cycle count to increase to 3.
What fraction of the loads must be eliminated for this to pay off?
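One way to approach the question is sketched below. It assumes, beyond what the slide states, that the clock cycle time is unchanged and that each new register/memory operation replaces exactly one load plus one ALU operation. The variable `x` is the number of replaced pairs as a fraction of the original instruction count.

```python
BASE_FREQ = {"ALU": 0.5, "Load": 0.2, "Store": 0.1, "Branch": 0.2}
BASE_CYCLES = {"ALU": 1, "Load": 2, "Store": 2, "Branch": 2}


def base_cycles() -> float:
    """Cycles per original instruction on the base machine (its CPI)."""
    return sum(BASE_FREQ[op] * BASE_CYCLES[op] for op in BASE_FREQ)


def new_cycles(x: float) -> float:
    """Cycles per original instruction on the new machine when a fraction x
    of the original instructions are load+ALU pairs folded into reg/mem ops."""
    alu = (BASE_FREQ["ALU"] - x) * 1
    load = (BASE_FREQ["Load"] - x) * 2
    store = BASE_FREQ["Store"] * 2
    branch = BASE_FREQ["Branch"] * 3  # branch cost rises from 2 to 3 cycles
    reg_mem = x * 2                   # each new reg/mem op takes 2 cycles
    return alu + load + store + branch + reg_mem


def break_even_x() -> float:
    """new_cycles(x) is linear with slope -1, so solve new_cycles(x) = base."""
    return new_cycles(0.0) - base_cycles()


if __name__ == "__main__":
    x = break_even_x()
    print(f"break-even x = {x:.2f} of all instructions")
    print(f"fraction of loads = {x / BASE_FREQ['Load']:.0%}")
```

Under these assumptions the new machine needs 1.7 - x cycles per original instruction against 1.5 for the base machine, so it only breaks even when x = 0.2, that is, when every load is eliminated.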

53 Next Time
– Benchmarks
– Performance Metrics
– Cost
– Instruction Set Architectures
TODO
– Read Chapters 1 & 2
– Email me if you need CS account
– HW #1 will be up Wednesday
– Due Sep 6

