© 2006 Edward F. Gehringer ECE 463/521 Lecture Notes, Spring 2006 Lecture 1 An Overview of High-Performance Computer Architecture ECE 463/521 Spring 2006.


Lecture 1: An Overview of High-Performance Computer Architecture
ECE 463/521, Spring 2006
Edward F. Gehringer

Automobile Factory (note: non-animated version)
– Station 1: Build frame
– Station 2: Connect doors
– Station 3: Connect headlights
– Station 4: Embed engine
– Station 5: Connect wheels & transmission


Basic Assembly Line
Unchangeable truth
– It takes a long time to build one car.
– Example: Time spent in the assembly line is 1 hour (12 min. per station).
Basic assembly line
– Throughput = 1 car per hour
– We wait until the first car is fully assembled before starting the next one, so:
  – only 1 car is in the assembly line at a time
  – only 1 station is active at a time; the other 4 are idle


Pipelined Assembly Line
Unchangeable truth
– It still takes a long time to build one car.
Pipelining
– Time to fill pipeline = 1 hour
– Once filled, throughput = 1 car per 12 minutes
– Speedup due to pipelining is (unusual definition)...
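
The arithmetic behind the two assembly lines can be sketched in a few lines of Python. This is only an illustration under the slide's numbers (5 stations, 12 minutes each). The function name `total_minutes` and its parameters are made up for this sketch, and the ratio of total times used here is one common way to state speedup, not necessarily the definition the slide has in mind.

```python
def total_minutes(cars, stations=5, minutes_per_station=12, pipelined=True):
    """Minutes needed to finish `cars` cars (hypothetical helper)."""
    if cars <= 0:
        return 0
    latency = stations * minutes_per_station  # time to build one car
    if not pipelined:
        # Basic line: the next car starts only when the previous one is done.
        return cars * latency
    # Pipelined line: fill the pipeline once, then one car every station time.
    return latency + (cars - 1) * minutes_per_station


def speedup(cars):
    """Ratio of basic to pipelined total time for the same number of cars."""
    return total_minutes(cars, pipelined=False) / total_minutes(cars)
```

For a long run of cars the ratio approaches the number of stations, because the one-hour fill time is paid only once.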

Simple Processor Pipeline
– Stage 1: IF (instruction fetch): Fetch instruction
– Stage 2: ID (instruction decode): Decode & read registers
– Stage 3: EX (execute): Execute
– Stage 4: MEM (memory): Access memory
– Stage 5: WB (write-back): Write result

Example Instruction
ADD r1, r2, r3
– r1 ← r2 + r3
– IF: Fetch the ADD instruction from memory using the current PC (program counter), then PC ← PC + 1
– ID: Decode the ADD instruction to determine the opcode, read values of r2 and r3 from the register file
– EX: Perform r2 + r3 in the ALU (arithmetic/logic unit)
– MEM: Do nothing (only loads/stores access memory)
– WB: Write result of r2 + r3 into r1, in the register file
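
The five steps for ADD can be walked through in code. The sketch below is a toy model, not a description of any real processor: the instruction encoding as a Python tuple, the dictionary register file, and the name `run_add` are all assumptions made for illustration.

```python
def run_add(imem, regs, pc):
    """Carry one ADD instruction through the five stages; return the new PC."""
    # IF: fetch the instruction at the current PC, then PC <- PC + 1.
    instr = imem[pc]
    pc = pc + 1

    # ID: decode the opcode and read the source registers.
    opcode, dest, src1, src2 = instr
    if opcode != "ADD":
        raise ValueError("this sketch only models ADD")
    a = regs[src1]
    b = regs[src2]

    # EX: the ALU performs the addition.
    result = a + b

    # MEM: nothing to do, only loads and stores access memory.

    # WB: write the result into the destination register.
    regs[dest] = result
    return pc


regs = {"r1": 0, "r2": 4, "r3": 6}
next_pc = run_add([("ADD", "r1", "r2", "r3")], regs, 0)
```

In a real pipeline each commented step happens in a different clock cycle, which is what lets a second instruction start IF while ADD is in ID.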

Pipeline Performance Problems (1)
Data dependences
– ADD r1, r2, r3
– SUB r4, r1, r9
– SUB must wait (“stall”) in ID stage until ADD completes
  – ADD writes the result r1 into register file in WB
  – SUB reads the result r1 from register file in ID

Data Dependence Stalls
[Animated pipeline diagram, six frames, over the stages IF, ID, EX, MEM, WB: ADD enters the pipeline with SUB behind it. SUB is stalled in ID because the r1 it needs has not yet been written to the register file, and bubbles follow ADD through the later stages. Once ADD has written r1 in WB, SUB proceeds.]

Speedup with data dependences
What is the speedup of this pipeline (T_sequential / T_pipelined) if 1/10th of all instructions contain a data dependence? Can you give a general formula for a k-stage pipeline? What other information do you need to know?
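
One way to approach the question is sketched below. It rests on assumptions the slide does not state: the sequential machine spends k cycles per instruction at the same cycle time, the program is long enough to ignore pipeline fill time, and each dependent instruction stalls for a fixed number of cycles (`stall_cycles`, the "other information" needed). The function names are hypothetical.

```python
def pipelined_cpi(dependent_fraction, stall_cycles):
    """Average cycles per instruction: 1 ideal cycle plus average stall cycles."""
    return 1.0 + dependent_fraction * stall_cycles


def speedup_with_stalls(k, dependent_fraction, stall_cycles):
    """T_sequential / T_pipelined for a k-stage pipeline, under the stated assumptions."""
    return k / pipelined_cpi(dependent_fraction, stall_cycles)


# Example: 5 stages, 1/10 of instructions dependent, 3 stall cycles assumed.
example = speedup_with_stalls(5, 0.1, 3)
```

With an assumed 3-cycle stall, the 5-stage pipeline reaches a speedup of roughly 3.8 instead of the ideal 5. A different stall length gives a different answer, which is why the slide asks what else you need to know.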

Reducing Data Dependence Stalls
We could directly forward results from producer to consumer, bypassing the register file.
– The hardware is called “data bypass,” “result bypass,” or “register file bypass.”
– The technique is called “bypassing” or “forwarding.”

Data Bypass
[Animated pipeline diagram, four frames, over the stages IF, ID, EX, MEM, WB: SUB follows ADD through the pipeline. The r1 value in the register file is still stale (“garbage”) when SUB needs it, so the data bypass delivers the correct r1 from ADD directly to SUB.]
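
The selection that a bypass makes can be sketched as a lookup: prefer a result that is still in flight over the stale copy in the register file. This is a software caricature of what is really a set of multiplexers and comparators. The function `operand_value` and the `in_flight` list are names invented for this sketch.

```python
def operand_value(src_reg, regfile, in_flight):
    """Return the value a consumer should use for `src_reg`.

    `in_flight` lists (dest_reg, value) pairs for older instructions whose
    results are computed but not yet written back, youngest first.
    """
    for dest_reg, value in in_flight:
        if dest_reg == src_reg:
            return value          # bypass: take the result from the producer
    return regfile[src_reg]       # no producer in flight: register file is current


# ADD r1, r2, r3 has computed 4 + 6 = 10 but has not reached WB yet.
regfile = {"r1": 999, "r2": 4, "r3": 6, "r9": 1}   # r1 still holds garbage
in_flight = [("r1", 10)]

# SUB r4, r1, r9 reads r1 through the bypass and r9 from the register file.
r4 = operand_value("r1", regfile, in_flight) - operand_value("r9", regfile, in_flight)
```

Listing the youngest producer first matters when two in-flight instructions write the same register: the consumer must see the most recent value.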

Pipeline Performance Problems (2)
Branches
    ADD r1, r2, r3
    BEQ X, r5, r7
    SUB r4, r1, r9
    LD r4, 10(r4)
    ……
    X: AND r4, r10, r11
– Which instruction should be fetched after the branch?
– IF stage stalls until BEQ reaches EX stage.
– EX stage evaluates branch condition (r5 == r7). In this example the branch is “taken.”

Branch Stalls
[Animated pipeline diagram, five frames, running the ADD / BEQ / SUB / LD / AND sequence over the stages IF, ID, EX, MEM, WB: ADD enters the pipeline, followed by BEQ. A bubble is inserted behind BEQ while fetch waits for the branch outcome. The outcome is “taken,” so the next instruction fetched is AND at label X.]

Reducing Branch Stalls
Branch prediction
– “Learn” which way a given branch tends to go.
– Like predicting the economy, branch prediction is based on past history.
– Even simple predictors can be 80% accurate.
– If correct: no branch stalls.
– If incorrect:
  – “Quash” the instructions in earlier pipeline stages.
  – Performance degrades to the stall case.
  – May have additional penalties to “clean up” the pipeline.
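
To make "learning from past history" concrete, here is a sketch of a 2-bit saturating-counter predictor, a widely used simple scheme. The slides do not say which predictor the course uses, so treat this as one possible example. The class name, the per-branch dictionary, and the starting counter value are choices made for this sketch.

```python
class TwoBitPredictor:
    """One 2-bit saturating counter per branch address (0..1 = not taken, 2..3 = taken)."""

    def __init__(self):
        self.counters = {}  # branch PC -> counter in 0..3

    def predict(self, pc):
        # Unseen branches start at 1 ("weakly not taken"), an arbitrary choice.
        return self.counters.get(pc, 1) >= 2

    def update(self, pc, taken):
        count = self.counters.get(pc, 1)
        self.counters[pc] = min(3, count + 1) if taken else max(0, count - 1)


def accuracy(predictor, pc, outcomes):
    """Fraction of outcomes predicted correctly, training after each branch."""
    correct = 0
    for taken in outcomes:
        if predictor.predict(pc) == taken:
            correct += 1
        predictor.update(pc, taken)
    return correct / len(outcomes)


# A loop branch: taken 9 times, then not taken once, repeated 10 times.
loop_outcomes = ([True] * 9 + [False]) * 10
loop_accuracy = accuracy(TwoBitPredictor(), 0x40, loop_outcomes)
```

The two-bit counter needs two wrong guesses in a row to flip its prediction, so the single not-taken outcome at each loop exit does not disturb the "taken" prediction for the next pass.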

Branch Prediction (correct)
[Animated pipeline diagram, three frames, running the same ADD / BEQ / SUB / LD / AND sequence: ADD and BEQ enter the pipeline. The predictor says “taken,” so AND at label X is fetched immediately after BEQ, with no bubble.]

Branch Prediction (incorrect)
[Animated pipeline diagram, five frames, running the same ADD / BEQ / SUB / LD / AND sequence: The predictor says “not taken,” so SUB and then LD are fetched after BEQ. The branch outcome turns out to be “taken,” so SUB and LD were fetched down the wrong path, and AND at label X is fetched next.]

Speedup with branch stalls
What is the speedup of the pipeline if 1/5 of the instructions are branches, and 4/5 of those are correctly predicted? Can you give a general formula for a k-stage pipeline? What other information do you need to know?
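
A possible line of attack is sketched below, again under assumptions that go beyond the slide: the sequential machine takes k cycles per instruction at the same cycle time, fill time is ignored, correctly predicted branches cost nothing, and every misprediction costs a fixed `mispredict_penalty` in cycles. The value 2 used in the example is an assumption (the branch resolving in EX, two stages after IF). All names are hypothetical.

```python
def speedup_with_branches(k, branch_fraction, prediction_accuracy, mispredict_penalty):
    """T_sequential / T_pipelined for a k-stage pipeline with branch prediction."""
    mispredicted = branch_fraction * (1.0 - prediction_accuracy)
    cpi = 1.0 + mispredicted * mispredict_penalty  # ideal CPI plus average penalty
    return k / cpi


# Example: 5 stages, 1/5 branches, 4/5 predicted correctly, 2-cycle penalty assumed.
branch_example = speedup_with_branches(5, 1 / 5, 4 / 5, 2)
```

Under these assumptions only 1 instruction in 25 pays the penalty, so the speedup stays above 4.6. A deeper pipeline or a later branch resolution raises the penalty and lowers the result.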

Sears Tower Repairman
Repair shop is in the basement
– Has many tools.
– A few are used frequently, e.g., hammer, crescent wrench, screwdriver.
– Most are used infrequently, e.g., socket wrenches.
Problem
– Sears Tower has 110 stories!
– Today, you are working on the top floor.
– Can’t bring the entire shop with you.
– Don’t know exactly which tools to bring with you from the basement.
Solution
– Carry frequently used tools in your tool belt.
– The tool belt becomes a “cache” of tools, which drastically reduces the number of trips down to the basement.
– When you have to fetch a ¼" socket wrench, common sense says to also fetch the ½", ¾", etc., just in case.

Caches
The processor-memory speed gap
– Processor is very fast.
  – Intel Pentium-4: 1 GHz, 1 clock cycle = 1 ns
– Large memory is slow!
  – Main memory: 50 ns to access, 50 times as long as one Pentium-4 clock cycle!
– Processor wants large and fast memory.
  – LARGE: O/S and applications consume lots of memory.
  – FAST: Otherwise, the processor stalls nearly 100% of the time waiting for memory.

Caches
[Diagram: memory hierarchy]
– Processor: 1-ns clock
– Cache memory (the tool belt): 64 KB, 2-ns read time, “hits” 95% of the time
– Main memory (the basement shop): 256 MB, 50-ns read time

Average access time
What is the average access time in this memory system?
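
A sketch of the calculation, using the numbers from the hierarchy described earlier (2-ns cache, 95% hit rate, 50-ns main memory). It assumes that every access first pays the cache read time and that a miss then pays the full main-memory read time on top. The function name is made up for this sketch.

```python
def average_access_time(hit_time_ns, hit_rate, miss_penalty_ns):
    """Average access time = hit time + miss rate x miss penalty."""
    miss_rate = 1.0 - hit_rate
    return hit_time_ns + miss_rate * miss_penalty_ns


# 2-ns cache, 95% hits, 50-ns main memory.
avg_ns = average_access_time(2.0, 0.95, 50.0)
```

This gives about 4.5 ns. If a miss is instead assumed to cost only the 50-ns memory access, without the cache lookup, the weighted average 0.95 × 2 + 0.05 × 50 gives 4.4 ns. Either way the average is close to the cache's speed rather than main memory's.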

Caches
Caches are effective because of locality of reference.
– Temporal locality: If you access an item, you are likely to access it again in the near future.
  – The tool belt contains frequently used tools.
– Spatial locality: If you access an item, you are likely to access a nearby item in the near future.
  – This is why the repairman also fetched the ½" and ¾" socket wrenches when only the ¼" was needed.
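
Both kinds of locality show up in even a tiny cache model. The sketch below simulates a direct-mapped cache and counts hits. The organization, the sizes, and the name `hit_rate` are assumptions chosen to keep the example short, not a specification of the course's cache simulator project.

```python
def hit_rate(addresses, num_blocks=4, block_size=4):
    """Fraction of accesses that hit in a small direct-mapped cache."""
    cache = {}  # cache line index -> block number currently stored there
    hits = 0
    for addr in addresses:
        block = addr // block_size   # a whole block is fetched at once
        index = block % num_blocks   # direct-mapped: one possible line per block
        if cache.get(index) == block:
            hits += 1
        else:
            cache[index] = block     # miss: bring the block in, evicting the old one
    return hits / len(addresses)


# Spatial locality: walking addresses 0..15 misses once per 4-word block.
sequential = hit_rate(list(range(16)))

# Temporal locality: re-reading one address misses only the first time.
repeated = hit_rate([0] * 8)

# With one-word blocks, the sequential walk gets no help from its neighbors.
no_spatial = hit_rate(list(range(16)), block_size=1)
```

Fetching a whole block on a miss is the code's version of bringing the neighboring socket wrenches along with the one you came for.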

Overview of Topics in 463/521
1. Measuring performance and cost
2. Caches and memory hierarchies
3. Instruction-set architecture (ISA)
   – Defines software/hardware interface
4. Simple pipelining
   – Data and control (branch) dependences
   – Data bypasses
   – Branch prediction
5. Complex pipelining and instruction-level parallelism (ILP)
   – Data hazards
   – Dynamic instruction scheduling, register renaming, Tomasulo’s algorithm
   – Precise interrupts
   – Superscalar, VLIW, and vector processors

Projects
Three projects
– Cache simulator
– Branch predictor simulator
– Dynamic instruction scheduling pipeline simulator
Programming for projects is harder than anything most of you have encountered before.

Course Web site
These three homepages are linked to a “common” Web site. Any info specific to one course will be listed in the announcements section of that course’s home page.