Introduction to Computer Organization and Architecture

Slides:

Advertisements

Similar presentations

Pipeline Example: cycle 1 lw R10,9(R1) sub R11,R2, R3 and R12,R4, R5 or R13,R6, R7.

Advertisements

Review: MIPS Pipeline Data and Control Paths

ENEE350 Ankur Srivastava University of Maryland, College Park Based on Slides from Mary Jane Irwin ( )

ENEE350 Ankur Srivastava University of Maryland, College Park Based on Slides from Mary Jane Irwin ( )

©UCB CS 162 Computer Architecture Lecture 3: Pipelining Contd. Instructor: L.N. Bhuyan

Chapter Six Enhancing Performance with Pipelining

1 Stalling  The easiest solution is to stall the pipeline  We could delay the AND instruction by introducing a one-cycle delay into the pipeline, sometimes.

ENGS 116 Lecture 51 Pipelining and Hazards Vincent H. Berk September 30, 2005 Reading for today: Chapter A.1 – A.3, article: Patterson&Ditzel Reading for.

55:035 Computer Architecture and Organization Lecture 10.

Pipeline Data Hazards: Detection and Circumvention Adapted from Computer Organization and Design, Patterson & Hennessy, © 2005, and from slides kindly.

Pipelined Datapath and Control

CMPE 421 Parallel Computer Architecture

Electrical and Computer Engineering University of Cyprus LAB3: IMPROVING MIPS PERFORMANCE WITH PIPELINING.

CMPE 421 Parallel Computer Architecture Part 2: Hardware Solution: Forwarding.

CSE431 L07 Overcoming Data Hazards.1Irwin, PSU, 2005 CSE 431 Computer Architecture Fall 2005 Lecture 07: Overcoming Data Hazards Mary Jane Irwin (

CSIE30300 Computer Architecture Unit 05: Overcoming Data Hazards Hsin-Chou Chi [Adapted from material by and

Datapath and Control AddressInstruction Memory Write Data Reg Addr Register File ALU Data Memory Address Write Data Read Data PC Read Data Read Data.

Fall 2015, Oct 5... ELEC / Lecture 6 1 ELEC / Computer Architecture and Design Fall 2015 Pipelining (Chapter 6) Vishwani.

Spr 2016, Mar 9... ELEC / Lecture 7 1 ELEC / Computer Architecture and Design Spring 2016 Pipeline Control and Performance.

CSE 340 Computer Architecture Spring 2016 Overcoming Data Hazards.

Computer Architecture Lecture 6.  Our implementation of the MIPS is simplified memory-reference instructions: lw, sw arithmetic-logical instructions:

Access the Instruction from Memory

Computer Organization

Exceptions Another form of control hazard Could be caused by…

Stalling delays the entire pipeline

Note how everything goes left to right, except …

CDA 3101 Spring 2016 Introduction to Computer Organization

Pipelining Chapter 6.

Computer Architecture

CSCI206 - Computer Organization & Programming

Performance of Single-cycle Design

Single Clock Datapath With Control

Appendix C Pipeline implementation

ECS 154B Computer Architecture II Spring 2009

ECS 154B Computer Architecture II Spring 2009

CDA 3101 Spring 2016 Introduction to Computer Organization

\course\cpeg323-08F\Topic6b-323

ECE232: Hardware Organization and Design

Design of the Control Unit for Single-Cycle Instruction Execution

Forwarding Now, we’ll introduce some problems that data hazards can cause for our pipelined processor, and show how to handle them with forwarding.

Chapter 4 The Processor Part 3

Review: MIPS Pipeline Data and Control Paths

Chapter 6 Enhancing Performance with Pipelining

Chapter 4 The Processor Part 2

SOLUTIONS CHAPTER 4.

Pipelining review.

Single-cycle datapath, slightly rearranged

ELEC / Computer Architecture and Design Spring Pipelining (Chapter 6)

Pipelining Chapter 6.

A pipeline diagram Clock cycle lw $t0, 4($sp) IF ID

Pipelining in more detail

CSCI206 - Computer Organization & Programming

\course\cpeg323-05F\Topic6b-323

Data Hazards Data Hazard

The Processor Lecture 3.6: Control Hazards

The Processor Lecture 3.4: Pipelining Datapath and Control

Vishwani D. Agrawal James J. Danaher Professor

An Introduction to pipelining

The Processor Lecture 3.5: Data Hazards

Instruction Execution Cycle

Pipeline Control unit (highly abstracted)

Pipelining Appendix A and Chapter 3.

Morgan Kaufmann Publishers The Processor

Pipelining - 1.

A relevant question Assuming you’ve got: One washer (takes 30 minutes)

Stalls and flushes Last time, we discussed data hazards that can occur in pipelined CPUs if some instructions depend upon others that are still executing.

The Processor: Datapath & Control.

©2003 Craig Zilles (derived from slides by Howard Huang)

Pipelined datapath and control

ELEC / Computer Architecture and Design Spring 2015 Pipeline Control and Performance (Chapter 6) Vishwani D. Agrawal James J. Danaher.

Presentation transcript:

Introduction to Computer Organization and Architecture Lecture 12 By Juthawut Chantharamalee http://dusithost.dusit.ac.th/~juthawut_cha/home.htm

Introduction to Computer Organization and Architecture Outline Pipelining Basics Throughput Execution Time Pipeline Registers Program Execution Hazards Structural Data Control Introduction to Computer Organization and Architecture

ILP: Instruction Level Parallelism Single-cycle and multi-cycle datapaths execute one instruction at a time. How can we get better performance? Answer: Execute multiple instruction at a time: Pipelining – Enhance a multi-cycle datapath to fetch one instruction every cycle. Parallelism – Fetch multiple instructions every cycle. Introduction to Computer Organization and Architecture

Automobile Team Assembly 1 hour 1 car assembled every four hours 6 cars per day 180 cars per month 2,040 cars per year Introduction to Computer Organization and Architecture

Automobile Assembly Line Task 1 1 hour Task 2 Task 3 Task 4 Mecahnical Electrical Painting Testing First car assembled in 4 hours (pipeline latency) thereafter 1 car per hour 21 cars on first day, thereafter 24 cars per day 717 cars per month 8,637 cars per year Introduction to Computer Organization and Architecture

Throughput: Team Assembly Red car started Red car completed Mechanical Electrical Painting Testing Mechanical Electrical Painting Testing Blue car started Blue car completed Time Time of assembling one car = n hours where n is the number of nearly equal subtasks, each requiring 1 unit of time Throughput = 1/n cars per unit time Introduction to Computer Organization and Architecture

Throughput: Assembly Line Car 1 Car 2 Car 3 Car 4 . Mechanical Electrical Painting Testing Mechanical Electrical Painting Testing Mechanical Electrical Painting Testing Mechanical Electrical Painting Testing Car 1 complete Car 2 complete time Time to complete first car = n time units (latency) Cars completed in time T = T – n + 1 Throughput = 1- (n - 1)/ T car per unit time Throughput (assembly line) 1 – (n - 1)/ T n (n – 1) ─────────────────── = ──────── = n – ───── → n Throughput (team assembly) 1/n T as T→∞ Introduction to Computer Organization and Architecture

Some Features of Assembly Line Electrical parts delivered (JIT) Task 2 1 hour Task 3 1 hour Task 4 1 hour Task 1 1 hour Mechanical Electrical Painting Testing 3 cars in the assembly line are suspects, to be removed (flush pipeline) Stall assembly line to fix the cause of defect Defect found Introduction to Computer Organization and Architecture

Pipelining in a Computer Divide datapath into nearly equal tasks, to be performed serially and requiring non-overlapping resources. Insert registers at task boundaries in the datapath; registers pass the output data from one task as input data to the next task. Synchronize tasks with a clock having a cycle time that just exceeds the time required by the longest task. Break each instruction down into a fixed number of tasks so that instructions can be executed in a staggered fashion. Introduction to Computer Organization and Architecture

Single-Cycle Datapath Instruction class Instr. fetch (IF) Instr. Decode (also reg. file read) (ID) Execution (ALU Operation) (EX) Data access (MEM) Write Back (Reg. file write) (WB) Total time lw 2ns 1ns 8ns sw R-format add, sub, and, or, slt B-format, beq No operation on data; idle time equalizes instruction length to a fixed clock period. Introduction to Computer Organization and Architecture

Execution Time: Single-Cycle IF ID EX MEM WB IF ID EX MEM WB 0 2 4 6 8 10 12 14 16 . . Time (ns) lw $1, 100($0) lw $2, 200($0) lw $3, 300($0) Clock cycle time = 8 ns Total time for executing three lw instructions = 24 ns Introduction to Computer Organization and Architecture

Pipelined Datapath Instruction class Instr. fetch (IF) Instr. Decode (also reg. file read) (ID) Execu-tion (ALU Opera-tion) (EX) Data access (MEM) Write Back (Reg. file write) (WB) Total time lw 2ns 1ns 10ns sw R-format: add, sub, and, or, slt B-format: beq No operation on data; idle time inserted to equalize instruction lengths. Introduction to Computer Organization and Architecture

Execution Time: Pipeline IF ID EX MEM RW IF ID EX MEM RW IF ID EX MEM RW 0 2 4 6 8 10 12 14 16 . . Time (ns) lw $1, 100($0) lw $2, 200($0) lw $3, 300($0) Clock cycle time = 2 ns, four times faster than single-cycle clock Total time for executing three lw instructions = 14 ns Single-cycle time 24 Performance ratio = ──────────── = ── = 1.7 Pipeline time 14 Introduction to Computer Organization and Architecture

Introduction to Computer Organization and Architecture Pipeline Performance Clock cycle time = 2 ns 1,003 lw instructions: Total time for executing 1,003 lw instructions = 2,014 ns Single-cycle time 8,024 Performance ratio = ──────────── = ──── = 3.98 Pipeline time 2,014 10,003 lw instructions: Performance ratio = 80,024 / 20,014 = 3.998 → Clock cycle ratio (4) Pipeline performance approaches clock-cycle ratio for long programs. Introduction to Computer Organization and Architecture

Introduction to Computer Organization and Architecture Single-Cycle Datapath IF: Instr. fetch ID: Instr. decode, reg. file read EX: Execute, address calc. MEM: mem. access WB: writeback 4 Add 1 mux 0 ALU Branch opcode CONTROL MemtoReg 26-31 RegWrite ALUSrc 21-25 ALU zero MemWrite MemRead Instr. mem. PC Reg. File Data mem. 16-20 1 mux 0 0 mux 1 1 mux 0 11-15 RegDst ALUOp ALU Cont. Sign ext. Shift left 2 0-15 0-5 Introduction to Computer Organization and Architecture

Pipelining of RISC Instructions Fetch Instruction Examine Opcode Fetch Operands Perform Operation Store Result IF ID EX MEM WB Instruction Decode Execute Memory Write Fetch instruction and Operation Back Fetch operands to Reg file Although an instruction takes five clock cycles, one instruction is completed every cycle. Introduction to Computer Organization and Architecture

Introduction to Computer Organization and Architecture Pipeline Registers Instr. mem. PC Add Reg. File Data 1 mux 0 0 mux 1 4 Sign ext. Shift left 2 ALU Cont. CONTROL opcode MemWrite MemRead Branch zero 0-15 0-5 11-15 16-20 21-25 26-31 MemtoReg ALUOp ALUSrc RegDst RegWrite IF/ID ID/EX EX/MEM MEM/WB This requires a CONTROL not too different from single-cycle Introduction to Computer Organization and Architecture

Pipeline Register Functions Four pipeline registers are added: Register name Data held IF/ID PC+4, Instruction word (IW) ID/EX PC+4, R1, R2, IW(0-15) sign ext., IW(11-15) EX/MEM PC+4, zero, ALUResult, R2, IW(11-15) or IW(16-20) MEM/WB M[ALUResult], ALUResult, IW(11-15) or IW(16-20) Introduction to Computer Organization and Architecture

Introduction to Computer Organization and Architecture Pipelined Datapath IF/ID ID/EX EX/MEM MEM/WB 4 Add 1 mux 0 ALU opcode Shift left 2 26-31 zero 21-25 Instr mem ALU 16-20 PC Reg. File Data mem 1 mux 0 0 mux 1 Sgn ext 11-15 for R-type 16-20 for I-type lw 0-15 Introduction to Computer Organization and Architecture

Introduction to Computer Organization and Architecture Five-Cycle Pipeline CC1 CC2 CC3 CC4 CC5 IM ID, REG. FILE READ ALU DM REG. FILE WRITE IF/ID ID/EX MEM/WB EX/MEM Introduction to Computer Organization and Architecture

Introduction to Computer Organization and Architecture Add Instruction add $t0, $s1, $s2 Machine instruction word 000000 10001 10010 01000 00000 100000 opcode $s1 $s2 $t0 function CC1 CC2 CC3 CC4 CC5 DM IM ID, REG. FILE READ ALU REG. FILE WRITE IF/ID ID/EX EX/MEM MEM/WB IF ID EX MEM WB read $s1 add write $t0 read $s2 $s1+$s2 Introduction to Computer Organization and Architecture

Introduction to Computer Organization and Architecture Pipelined Datapath Executing add IF/ID ID/EX EX/MEM MEM/WB 4 Add 1 mux 0 ALU opcode Shift left 2 26-31 s1 zero $s1 21-25 ALU Instr mem addr Data mem data 16-20 PC Reg. File s2 $s2 1 mux 0 0 mux 1 Sign ext. 11-15 for R-type 16-20 for I-type lw t0 0-15 Introduction to Computer Organization and Architecture

Introduction to Computer Organization and Architecture Load Instruction lw $t0, 1200 ($t1) 100011 01001 01000 0001 0010 0000 0000 opcode $t1 $t0 1200 CC1 CC2 CC3 CC4 CC5 IM ID, REG. FILE READ ALU DM REG. FILE WRITE IF/ID ID/EX MEM/WB EX/MEM IF ID EX MEM WB read $t1 add read write $t0 sign ext $t1+1200 M[addr] 1200 Introduction to Computer Organization and Architecture

Introduction to Computer Organization and Architecture Pipelined Datapath Executing lw IF/ID ID/EX EX/MEM MEM/WB 4 Add 1 mux 0 ALU opcode Shift left 2 26-31 t1 zero $t1 21-25 ALU Instr mem addr Data mem data 16-20 PC Reg. File 1 mux 0 0 mux 1 Sign ext. 11-15 for R-type 16-20 for I-type lw t0 0-15 1200 Introduction to Computer Organization and Architecture

Introduction to Computer Organization and Architecture Store Instruction sw $t0, 1200 ($t1) 101011 01001 01000 0001 0010 0000 0000 opcode $t1 $t0 1200 CC1 CC2 CC3 CC4 CC5 IM ID, REG. FILE READ ALU DM REG. FILE WRITE IF/ID ID/EX MEM/WB EX/MEM IF ID EX MEM WB read $t1 add write sign ext $t1+1200 M[addr] 1200 (addr) ← $t0 Introduction to Computer Organization and Architecture

Introduction to Computer Organization and Architecture Pipelined Datapath Executing sw IF/ID ID/EX EX/MEM MEM/WB 4 Add 1 mux 0 ALU opcode Shift left 2 26-31 t1 zero $t1 21-25 ALU Instr mem addr Data mem data PC 16-20 Reg. File $t0 t0 1 mux 0 0 mux 1 Sign ext. 11-15 for R-type 16-20 for I-type lw 0-15 1200 Introduction to Computer Organization and Architecture

Introduction to Computer Organization and Architecture Executing a Program Consider a five-instruction segment: lw $10, 20($1) sub $11, $2, $3 add $12, $3, $4 lw $13, 24($1) add $14, $5, $6 Introduction to Computer Organization and Architecture

Introduction to Computer Organization and Architecture Program Execution CC1 CC2 CC3 CC4 CC5 time IM ID, REG. FILE READ ALU DM REG. FILE WRITE IF/ID ID/EX MEM/WB EX/MEM lw $10, 20($1) IM ID, REG. FILE READ ALU DM REG. FILE WRITE IF/ID ID/EX MEM/WB EX/MEM sub $11, $2, $3 IM ID, REG. FILE READ ALU DM REG. FILE WRITE IF/ID ID/EX MEM/WB EX/MEM Program instructions add $12, $3, $4 IM ID, REG. FILE READ ALU DM REG. FILE WRITE IF/ID ID/EX MEM/WB EX/MEM lw $13, 24($1) IM ID, REG. FILE READ ALU DM REG. FILE WRITE IF/ID ID/EX MEM/WB EX/MEM add $14, $5, $6 Introduction to Computer Organization and Architecture

Introduction to Computer Organization and Architecture CC5 MEM: sub $11, $2, $3 WB: lw $10, 20($1) IF: add $14, $5, $6 ID: lw $13, 24($1) EX: add $12, $3, $4 IF/ID ID/EX EX/MEM MEM/WB 4 Add 1 mux 0 ALU opcode Shift left 2 26-31 zero 21-25 Instr mem ALU 16-20 Data mem. PC Reg. File 1 mux 0 0 mux 1 Sign ext. 11-15 for R-type 16-20 for I-type lw 0-15 Introduction to Computer Organization and Architecture

Advantages of Pipeline After the fifth cycle (CC5), one instruction is completed each cycle; CPI ≈ 1, neglecting the initial pipeline latency of 5 cycles. Pipeline latency is defined as the number of stages in the pipeline, or The number of clock cycles after which the first instruction is completed. The clock cycle time is about four times shorter than that of single-cycle datapath and about the same as that of multicycle datapath. For multicycle datapath, CPI = 3. …. So, pipelined execution is faster, but . . . Introduction to Computer Organization and Architecture

Introduction to Computer Organization and Architecture Pipeline Hazards Definition: Hazard in a pipeline is a situation in which the next instruction cannot complete execution one clock cycle after completion of the present instruction. Three types of hazards: Structural hazard (resource conflict) Data hazard Control hazard Introduction to Computer Organization and Architecture

Introduction to Computer Organization and Architecture Structural Hazard Two instructions cannot execute due to a resource conflict. Example: Consider a computer with a common data and instruction memory. The fourth cycle of a lw instruction requires memory access (memory read) and at the same time the first cycle of the fourth instruction requires instruction fetch (memory read). This will cause a memory resource conflict. Introduction to Computer Organization and Architecture

Example of Structural Hazard CC1 CC2 CC3 CC4 CC5 time IM/DM IM/DM ID, REG. FILE READ ALU REG. FILE WRITE lw $10, 20($1) IF/ID ID/EX EX/MEM MEM/WB IM/DM IM/DM ID, REG. FILE READ ALU REG. FILE WRITE IF/ID ID/EX EX/MEM MEM/WB sub $11, $2, $3 Common data and instr. Mem. IM/DM IM/DM ID, REG. FILE READ ALU REG. FILE WRITE add $12, $3, $4 IF/ID ID/EX EX/MEM MEM/WB Program instructions IM/DM IM/DM ID, REG. FILE READ ALU REG. FILE WRITE lw $13, 24($1) IF/ID ID/EX EX/MEM MEM/WB Needed by two instructions Introduction to Computer Organization and Architecture

Possible Remedies for Structural Hazards Provide duplicate hardware resources in datapath. Control unit or compiler can insert delays (no-op cycles) between instructions. This is known as pipeline stall or bubble. Introduction to Computer Organization and Architecture

Stall (Bubble) for Structural Hazard CC1 CC2 CC3 CC4 CC5 IM/DM ID, REG. FILE READ ALU REG. FILE WRITE IF/ID ID/EX MEM/WB EX/MEM lw $10, 20($1) sub $11, $2, $3 add $12, $3, $4 lw $13, 24($1) time Program instructions Stall (bubble) Introduction to Computer Organization and Architecture

Introduction to Computer Organization and Architecture Data Hazard Data hazard means that an instruction cannot be completed because the needed data, being generated by another instruction in the pipeline, is not available. Example: consider two instructions: add $s0, $t0, $t1 sub $t2, $s0, $t3 # needs $s0 Introduction to Computer Organization and Architecture

Introduction to Computer Organization and Architecture Example of Data Hazard CC1 CC2 CC3 CC4 CC5 time Write s0 in CC5 DM IM ID, REG. FILE READ ALU REG. FILE WRITE add $s0, $t0, $t1 IF/ID ID/EX EX/MEM MEM/WB DM IM ID, REG. FILE READ ALU REG. FILE WRITE sub $t2, $s0, $t3 IF/ID ID/EX EX/MEM MEM/WB Read s0 and t3 in CC3 Program instructions We need to read s0 from reg file in cycle 3 But s0 will not be written in reg file until cycle 5 However, s0 will only be used in cycle 4 And it is available at the end of cycle 3 Introduction to Computer Organization and Architecture

Forwarding or Bypassing Output of a resource used by an instruction is forwarded to the input of some resource being used by another instruction. Forwarding can eliminate some, but not all, data hazards. Introduction to Computer Organization and Architecture

Forwarding for Data Hazard CC1 CC2 CC3 CC4 CC5 time Write s0 in CC5 DM IM ID, REG. FILE READ ALU REG. FILE WRITE add $s0, $t0, $t1 IF/ID ID/EX EX/MEM MEM/WB Forwarding DM IM ID, REG. FILE READ ALU REG. FILE WRITE sub $t2, $s0, $t3 IF/ID ID/EX EX/MEM MEM/WB Read s0 and t3 in CC3 Program instructions Introduction to Computer Organization and Architecture

Forwarding Unit Hardware ID/EX EX/MEM MEM/WB FORW. MUX ALU Data Mem. Data to reg. file MUX FORW. MUX Destination registers Source reg. IDs from opcode Forwarding Unit Introduction to Computer Organization and Architecture

Forwarding Alone May Not Work CC1 CC2 CC3 CC4 CC5 time Write s0 in CC5 DM IM ID, REG. FILE READ ALU REG. FILE WRITE lw $s0, 20($s1) IF/ID ID/EX EX/MEM MEM/WB DM IM ID, REG. FILE READ ALU REG. FILE WRITE sub $t2, $s0, $t3 IF/ID ID/EX EX/MEM MEM/WB Read s0 and t3 in CC3 Program instructions data needed by sub (data hazard) data available from memory only at the end of cycle 4 Introduction to Computer Organization and Architecture

Use Bubble and Forwarding CC1 CC2 CC3 CC4 CC5 time Write s0 in CC5 DM IM ID, REG. FILE READ ALU REG. FILE WRITE lw $s0, 20($s1) IF/ID ID/EX EX/MEM MEM/WB stall (bubble) Forwarding DM IM ID, REG. FILE READ ALU REG. FILE WRITE sub $t2, $s0, $t3 IF/ID ID/EX EX/MEM MEM/WB Program instructions Introduction to Computer Organization and Architecture

Hazard Detection Unit Hardware Disable write Hazard Detection Unit ID/EX EX/MEM MEM/WB Control FORW. MUX ALU NOP MUX Data Mem. PC FORW. MUX Instruction to reg. file IF/ID Control signals Forwarding Unit Source reg. IDs from opcode Introduction to Computer Organization and Architecture

Introduction to Computer Organization and Architecture Resolving Hazards Hazards are resolved by Hazard detection and forwarding units. Compiler’s understanding of how these units work can improve performance. Introduction to Computer Organization and Architecture

Avoiding Stall by Code Reorder C code: A = B + E; C = B + F; MIPS code: lw $t1, 0($t0) . $t1 written lw $t2, 4($t0) . . $t2 written add $t3, $t1, $t2 . . . $t1, $t2 needed sw $t3, 12($t0) . . . . lw $t4, 8($t0) . . . . . $t4 written add $t5, $t1, $t4 . . . . . $t4 needed sw $t5, 16,($t0) . . . . . . . . . . . . . . . Introduction to Computer Organization and Architecture

Introduction to Computer Organization and Architecture Reordered Code C code: A = B + E; C = B + F; MIPS code: lw $t1, 0($t0) lw $t2, 4($t0) lw $t4, 8($t0) add $t3, $t1, $t2 no hazard sw $t3, 12($t0) add $t5, $t1, $t4 no hazard sw $t5, 16,($t0) Introduction to Computer Organization and Architecture

Introduction to Computer Organization and Architecture Control Hazard Instruction to be fetched is not known! Example: Instruction being executed is branch-type, which will determine the next instruction: add $4, $5, $6 beq $1, $2, 40 next instruction . . . 40 and $7, $8, $9 Introduction to Computer Organization and Architecture

Stall on Branch time CC1 CC2 CC3 CC4 CC5 IM ID, REG. FILE READ ALU DM WRITE IF/ID ID/EX MEM/WB EX/MEM add $4, $5, $6 IM ID, REG. FILE READ ALU DM REG. FILE WRITE IF/ID ID/EX MEM/WB EX/MEM beq $1, $2, 40 Program instructions Stall (bubble) IM ID, REG. FILE READ ALU DM REG. FILE WRITE IF/ID ID/EX MEM/WB EX/MEM next instruction or and $7, $8, $9

Introduction to Computer Organization and Architecture Why Only One Stall? Extra hardware in ID phase: Additional ALU to compute branch address Comparator to generate zero signal Hazard detection unit writes the branch address in PC Introduction to Computer Organization and Architecture

Introduction to Computer Organization and Architecture Ways to Handle Branch Stall or bubble Branch prediction: Heuristics Next instruction Prediction based on statistics (dynamic) Hardware decision (dynamic) Prediction error: pipeline flush Delayed branch Introduction to Computer Organization and Architecture

Delayed Branch Example Stall on branch add $4, $5, $6 beq $1, $2, skip next instruction . . . skip or $7, $8, $9 Delayed branch beq $1, $2, skip add $4, $5, $6 next instruction . . . skip or $7, $8, $9 Instruction executed irrespective of branch decision Introduction to Computer Organization and Architecture

Introduction to Computer Organization and Architecture Delayed Branch CC1 CC2 CC3 CC4 CC5 time DM IM ID, REG. FILE READ ALU REG. FILE WRITE IF/ID ID/EX EX/MEM MEM/WB beq $1, $2, skip Program instructions DM IM ID, REG. FILE READ ALU REG. FILE WRITE add $4, $5, $6 IF/ID ID/EX EX/MEM MEM/WB DM next instruction or skip or $7, $8, $9 IM ID, REG. FILE READ ALU REG. FILE WRITE IF/ID ID/EX EX/MEM MEM/WB Introduction to Computer Organization and Architecture

Introduction to Computer Organization and Architecture Summary: Hazards Structural hazards Cause: resource conflict Remedies: (i) hardware resources, (ii) stall (bubble) Data hazards Cause: data unavailablity Remedies: (i) forwarding, (ii) stall (bubble), (iii) code reordering Control hazards Cause: out-of-sequence execution (branch or jump) Remedies: (i) stall (bubble), (ii) branch prediction/pipeline flush, (iii) delayed branch/pipeline flush Introduction to Computer Organization and Architecture

The End Lecture 12