CSCI206 - Computer Organization & Programming

Pipeline Datapath and Control (zyBook: 11.6)

The MIPS Pipeline

Hazard Summary

data: an instruction depends on a data value produced or consumed by another instruction.
  -- Reorder
  -- Forwarding (EX-EX, MEM-EX)
control: the execution of an instruction depends on a control decision made by an earlier instruction (e.g., a branch).
  -- Delay slot (nop)
  -- Compute the diff (the branch comparison) at the ID stage
structural: an instruction in the pipeline needs a resource that another instruction in the pipeline is using at the same moment.
  -- Reorder if possible
  -- Delay
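
Data hazards start from read-after-write (RAW) dependences, which can be found mechanically. The Python sketch below is illustrative only: `Instr` and `raw_dependences` are hypothetical helpers, not part of any course tool, and the default window of 2 assumes the register file is written in the first half of a cycle and read in the second half, so a producer three or more instructions back is already visible when the consumer decodes. It uses the lw/lw/add/sw/subi sequence from the examples below.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Instr:
    text: str                   # assembly text, only used for printing
    dest: Optional[str]         # register written, or None (e.g. sw)
    srcs: Tuple[str, ...] = ()  # registers read


def raw_dependences(program: List[Instr], window: int = 2) -> List[Tuple[int, int, str, int]]:
    """Return (producer, consumer, register, distance) for each read-after-write
    dependence whose producer is at most `window` instructions before the consumer.

    Only the most recent writer of a register counts, since it supplies the value.
    """
    deps = []
    for c, consumer in enumerate(program):
        for reg in consumer.srcs:
            # Walk backwards, nearest instruction first, stopping at the window.
            for p in range(c - 1, max(c - 1 - window, -1), -1):
                if program[p].dest == reg:
                    deps.append((p, c, reg, c - p))
                    break
    return deps


program = [
    Instr("lw   r1, 0(r4)", "r1", ("r4",)),
    Instr("lw   r2, 400(r4)", "r2", ("r4",)),
    Instr("add  r3, r1, r2", "r3", ("r1", "r2")),
    Instr("sw   r3, 0(r4)", None, ("r3", "r4")),
    Instr("subi r4, r4, 4", "r4", ("r4",)),
]

for p, c, reg, dist in raw_dependences(program):
    print(f"{program[c].text} reads {reg} from {program[p].text} (distance {dist})")
```

It reports that add reads r1 and r2 from the two loads and that sw reads r3 from add. The later write of r4 by subi is not reported because it is not a read-after-write dependence.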

EXAMPLES

Show the pipeline diagram for the following code. The MIPS branch uses 2 optimizations (a delay slot, and computing the comparison in the ID stage); assume the branch is NOT taken.

    li   v0, 100
    add  v1, v1, v2
    beq  v0, v1, loop
    addi v0, v0, 1
    li   v1, 64

Cycle 1: li v0, 100 is fetched (F).

Cycle 2: li v0, 100 is in decode (D).

Cycle 3: li v0, 100 is in execute (E).

Cycle 4: beq is in decode and wants to resolve the branch, but it needs the new value of v1, which add computes only by the end of cycle 4. So beq has to stall (-). A stall in decode also stalls every earlier stage in the pipeline, so addi cannot be fetched.

Cycle 5: The new value of v1 has now been computed by add's EX stage. But MIPS only has EX-EX, MEM-EX, and MEM-MEM forwarding, not EX-ID, so beq has to stall again (and addi is still not fetched).

Cycle 6: Finally, beq can decode and read the new value of v1 from the register file, without forwarding (add writes it back in this same cycle). Since beq decodes, the next instruction, addi, is fetched in cycle 6. Because the branch is resolved in decode, its E, M, and W stages are NOPs and are not drawn.
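
The two stall cycles follow from when the branch can first see the new v1. A minimal sketch, assuming an unstalled producer and the branch comparison in ID; `earliest_branch_decode` is a hypothetical helper, and its `ex_to_id_forward` switch models the EX-ID path that this pipeline does not have:

```python
def earliest_branch_decode(producer_fetch: int, branch_fetch: int,
                           ex_to_id_forward: bool = False) -> int:
    """Earliest decode cycle for a branch that compares its registers in ID,
    when one operand comes from an ALU instruction fetched in `producer_fetch`.

    Assumes an unstalled F D E M W producer, so its EX is producer_fetch + 2
    and its WB is producer_fetch + 4.
    """
    producer_ex = producer_fetch + 2
    producer_wb = producer_fetch + 4
    if ex_to_id_forward:
        # Hypothetical EX->ID path: the ALU result (ready at the end of EX)
        # feeds the ID comparator in the following cycle.
        ready = producer_ex + 1
    else:
        # No forwarding into ID: the branch must read the register file, which
        # is written in the first half of WB and read in the second half.
        ready = producer_wb
    return max(branch_fetch + 1, ready)


# add v1, v1, v2 is fetched in cycle 2 and beq in cycle 3.
decode = earliest_branch_decode(producer_fetch=2, branch_fetch=3)
print(decode, "stall cycles:", decode - (3 + 1))   # 6 stall cycles: 2
alt = earliest_branch_decode(2, 3, ex_to_id_forward=True)
print(alt, "stall cycles:", alt - (3 + 1))         # 5 stall cycles: 1
```

Without an EX-ID path the branch decodes in cycle 6 after 2 stalls; the hypothetical EX-ID path would save one of them.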

Cycle 7: No hazards for addi.

Cycle 11: Fast forward to cycle 11. Execution took 11 cycles, so IPC = 5 / 11 ≈ 0.45.

    Cycle                1  2  3  4  5  6  7  8  9 10 11
    li   v0, 100         F  D  E  M  W
    add  v1, v1, v2         F  D  E  M  W
    beq  v0, v1, loop          F  -  -  D
    addi v0, v0, 1                      F  D  E  M  W
    li   v1, 64                            F  D  E  M  W
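
A diagram like this can be drawn mechanically once each instruction's fetch and decode cycles are known. The sketch below is an illustration with hypothetical helper names (`stage_map`, `render`); it assumes stalls happen only in decode (drawn as '-') and that a branch resolved in ID shows only F and D:

```python
from typing import Dict, List, Tuple


def stage_map(fetch: int, decode: int, full: bool = True) -> Dict[int, str]:
    """Map cycle -> stage letter for one instruction.

    Cycles spent stalled between F and D are drawn as '-'. With full=False only
    F and D are drawn, as for a branch resolved in ID whose E, M, W are NOPs.
    """
    cells = {fetch: "F"}
    for cycle in range(fetch + 1, decode):
        cells[cycle] = "-"
    cells[decode] = "D"
    if full:
        cells.update({decode + 1: "E", decode + 2: "M", decode + 3: "W"})
    return cells


def render(rows: List[Tuple]) -> Tuple[str, int]:
    """rows: (label, fetch, decode[, full]). Returns (diagram text, total cycles)."""
    maps = [(label, stage_map(*timing)) for label, *timing in rows]
    n_cycles = max(max(cells) for _, cells in maps)   # last occupied cycle
    width = max(len(label) for label, _ in maps) + 2
    lines = ["Cycle".ljust(width) + "".join(f"{c:>3}" for c in range(1, n_cycles + 1))]
    for label, cells in maps:
        lines.append(label.ljust(width)
                     + "".join(f"{cells.get(c, ''):>3}" for c in range(1, n_cycles + 1)))
    return "\n".join(lines), n_cycles


# Fetch and decode cycles taken from the branch walkthrough above.
rows = [
    ("li   v0, 100", 1, 2),
    ("add  v1, v1, v2", 2, 3),
    ("beq  v0, v1, loop", 3, 6, False),
    ("addi v0, v0, 1", 6, 7),
    ("li   v1, 64", 7, 8),
]
diagram, cycles = render(rows)
print(diagram)
print(f"IPC = {len(rows)} / {cycles} = {len(rows) / cycles:.2f}")  # IPC = 5 / 11 = 0.45
```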

Show the pipeline diagram for:

    lw   r1, 0(r4)
    lw   r2, 400(r4)
    add  r3, r1, r2
    sw   r3, 0(r4)
    subi r4, r4, 4

The first two instructions are hazard free. add depends on both r1 and r2, and sw depends on r3 (produced by add).


Cycle 3: lw r1, 0(r4) is in execute (E). No issues until add reaches decode.

Cycle 4: If add decoded in cycle 4, it would read the old values of both r1 and r2. We could forward r1 from MEM to EX in cycle 5, but r2 is not loaded until the end of cycle 5, so add must stall (-). Since D stalls, sw cannot be fetched.

Cycle 5: add decodes in cycle 5 and reads the new value of r1 (the first lw writes it back in the same cycle, which is OK). r2 must be forwarded MEM->EX in cycle 6.
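
The one-cycle stall for add can be checked by comparing when each operand becomes ready with when add needs it in EX. A sketch, assuming unstalled producers and the EX-EX and MEM-EX forwarding paths; `ready_cycle`, `earliest_ex` and the offset constants are hypothetical names:

```python
from typing import List, Tuple

EX_OFFSET, MEM_OFFSET = 2, 3   # cycles after F at which EX and MEM occur without stalls


def ready_cycle(producer_fetch: int, is_load: bool) -> int:
    """Cycle at whose end the producer's result exists: end of EX for ALU
    instructions, end of MEM for loads (assumes the producer did not stall)."""
    return producer_fetch + (MEM_OFFSET if is_load else EX_OFFSET)


def earliest_ex(consumer_fetch: int, producers: List[Tuple[int, bool]]) -> int:
    """Earliest EX cycle for a consumer whose operands come from `producers`
    (each given as (fetch cycle, is_load)), with EX-EX and MEM-EX forwarding:
    a value can be used in the cycle after the one in which it becomes ready."""
    ex = consumer_fetch + EX_OFFSET          # EX cycle with no stall
    for fetch, is_load in producers:
        ex = max(ex, ready_cycle(fetch, is_load) + 1)
    return ex


# add r3, r1, r2 is fetched in cycle 3; r1 and r2 come from the loads
# fetched in cycles 1 and 2.
ex = earliest_ex(3, [(1, True), (2, True)])
print("add executes in cycle", ex, "after", ex - (3 + EX_OFFSET), "stall cycle(s)")
```

Only the second load forces the stall: with r1 alone, `earliest_ex(3, [(1, True)])` returns 5, the unstalled EX cycle.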

Cycle 6: Forward r2 MEM->EX: draw an arrow from the previous cycle's M (second lw) to the current cycle's E (add). sw decodes in cycle 6 and gets the old value of r3, but that is OK: sw does not need the new value until the start of MEM, so a forwarding path can supply it then.

Cycle 7: No issues.

Cycle 8: sw read the old r3 in decode, but the new value of r3 is at the output of add's MEM stage (cycle 7), so a MEM-MEM forward supplies it to sw's MEM stage.
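
The forwarding paths named in this walkthrough (MEM-EX for r2, MEM-MEM for r3) can be read off the timing. A sketch that follows the slides' naming (output of the producer's stage in the previous cycle, input of the consumer's stage); `forward_path` is a hypothetical helper, and treating every remaining case as a register-file read in decode is an assumption that holds for these examples:

```python
from typing import Optional

STAGE_NAMES = {"E": "EX", "M": "MEM"}


def forward_path(prod_ex: int, is_load: bool, need_stage: str,
                 need_cycle: int) -> Optional[str]:
    """Name the path that supplies a producer's result to a consumer.

    prod_ex    -- cycle in which the producer is in EX (its MEM is prod_ex + 1)
    is_load    -- loads have their value at the end of MEM, ALU ops at the end of EX
    need_stage -- "E" or "M": the consumer stage whose input needs the value
    need_cycle -- cycle in which the consumer is in that stage
    Returns None when the consumer must stall.
    """
    ready = prod_ex + 1 if is_load else prod_ex    # value exists at the end of this cycle
    if need_cycle <= ready:
        return None                                 # not computed yet: stall
    source_cycle = need_cycle - 1                   # the value leaves a stage output here
    producer_stage = {prod_ex: "E", prod_ex + 1: "M"}.get(source_cycle)
    if producer_stage is None:
        return "register file"                      # assumption: read in decode after WB
    return f"{STAGE_NAMES[producer_stage]}-{STAGE_NAMES[need_stage]}"


print(forward_path(4, True, "E", 6))    # MEM-EX (r2 from the second lw to add)
print(forward_path(3, True, "E", 6))    # register file (r1, written back in cycle 5)
print(forward_path(6, False, "M", 8))   # MEM-MEM (r3 from add to sw)
print(forward_path(3, True, "E", 4))    # None: a load result cannot reach the very next EX
```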


Cycle 10: IPC = 5 / 10 = 0.5.

    Cycle                1  2  3  4  5  6  7  8  9 10
    lw   r1, 0(r4)       F  D  E  M  W
    lw   r2, 400(r4)        F  D  E  M  W
    add  r3, r1, r2            F  -  D  E  M  W
    sw   r3, 0(r4)                   F  D  E  M  W
    subi r4, r4, 4                      F  D  E  M  W
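
Both cycle counts so far fit the standard count for an in-order pipeline with k stages running N instructions with S stall cycles, since each stall delays every later instruction by one cycle. Here k = 5; S = 2 in the branch example (beq stalls twice) and S = 1 in this example (add stalls once):

```latex
\begin{aligned}
\text{cycles} &= N + (k - 1) + S, \qquad \text{IPC} = \frac{N}{\text{cycles}} \\
\text{branch example: } & 5 + (5 - 1) + 2 = 11, \qquad \text{IPC} = 5/11 \approx 0.45 \\
\text{this example: } & 5 + (5 - 1) + 1 = 10, \qquad \text{IPC} = 5/10 = 0.5
\end{aligned}
```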

Show the pipeline diagram for:

    lw   r1, 0(sp)
    add  r1, r1, r2
    sw   r1, 0(sp)


Cycle 1: lw r1, 0(sp) is fetched (F).

Cycle 2: lw r1, 0(sp) is in decode (D).

Cycle 3: If add decoded in cycle 3, it would need r1 to execute in cycle 4, but the value isn't available until the end of cycle 4 (when lw finishes MEM). So add must stall in cycle 3 (-).

Cycle 4: add decodes. In the next cycle, r1 will be forwarded from the output of lw's MEM stage to the input of add's EX stage.

Cycle 5: r1 is forwarded from the output of lw's MEM stage to the input of add's EX stage. sw, now in decode, needs the new r1 at the beginning of its MEM stage in cycle 7; it can get it from the output of add's MEM stage.

Cycle 6: sw computes the memory address 0 + sp in EX. sp is not written by an earlier instruction, so no forward is needed.

Cycle 7: sw writes r1 to mem[0 + sp]. The new r1 is at the output of add's MEM stage (cycle 6), so it is forwarded to the input of sw's MEM stage (MEM-MEM).

Cycle 8: Done. IPC = 3 / 8 = 0.375.

    Cycle                1  2  3  4  5  6  7  8
    lw   r1, 0(sp)       F  D  E  M  W
    add  r1, r1, r2         F  -  D  E  M  W
    sw   r1, 0(sp)                F  D  E  M  W
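
Putting the rules from the three walkthroughs together gives a small in-order scheduler that reproduces their cycle counts. This is a sketch under stated assumptions, not a course simulator: `Instr`, `Timing`, `schedule` and `cycles_and_ipc` are hypothetical names, code is treated as straight-line (branch not taken, delay-slot instruction executed), every hazard stalls in ID, forwarding is EX-EX, MEM-EX and MEM-MEM only, and the register file is written in the first half of WB and read in the second half.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Instr:
    text: str
    dest: Optional[str] = None      # register written (None for sw and beq)
    is_load: bool = False           # result ready at the end of MEM instead of EX
    needs: Dict[str, str] = field(default_factory=dict)  # reg -> "ID", "EX" or "MEM"


@dataclass
class Timing:
    F: int  # fetch cycle
    D: int  # decode cycle; E, M and W follow without further stalls

    @property
    def E(self) -> int:
        return self.D + 1

    @property
    def M(self) -> int:
        return self.D + 2

    @property
    def W(self) -> int:
        return self.D + 3


def schedule(program: List[Instr]) -> List[Timing]:
    """In-order F D E M W timing in which every hazard is resolved by stalling in ID."""
    times: List[Timing] = []
    last_writer: Dict[str, int] = {}             # reg -> index of its latest producer
    for i, ins in enumerate(program):
        fetch = times[-1].D if times else 1      # nothing is fetched while ID stalls
        decode = fetch + 1
        for reg, stage in ins.needs.items():
            if reg not in last_writer:
                continue                         # value already in the register file
            p = times[last_writer[reg]]
            ready = p.M if program[last_writer[reg]].is_load else p.E  # end of this cycle
            if stage == "ID":
                decode = max(decode, p.W)        # read the register file during WB
            elif stage == "EX":
                decode = max(decode, ready)      # need E = D + 1 >= ready + 1
            else:                                # "MEM": store data
                decode = max(decode, ready - 1)  # need M = D + 2 >= ready + 1
        times.append(Timing(fetch, decode))
        if ins.dest:
            last_writer[ins.dest] = i
    return times


def cycles_and_ipc(program: List[Instr]) -> Tuple[int, float]:
    cycles = max(t.W for t in schedule(program))   # the last write-back ends the run
    return cycles, len(program) / cycles


example1 = [Instr("li v0, 100", dest="v0"),
            Instr("add v1, v1, v2", dest="v1", needs={"v1": "EX", "v2": "EX"}),
            Instr("beq v0, v1, loop", needs={"v0": "ID", "v1": "ID"}),
            Instr("addi v0, v0, 1", dest="v0", needs={"v0": "EX"}),
            Instr("li v1, 64", dest="v1")]
example2 = [Instr("lw r1, 0(r4)", dest="r1", is_load=True, needs={"r4": "EX"}),
            Instr("lw r2, 400(r4)", dest="r2", is_load=True, needs={"r4": "EX"}),
            Instr("add r3, r1, r2", dest="r3", needs={"r1": "EX", "r2": "EX"}),
            Instr("sw r3, 0(r4)", needs={"r3": "MEM", "r4": "EX"}),
            Instr("subi r4, r4, 4", dest="r4", needs={"r4": "EX"})]
example3 = [Instr("lw r1, 0(sp)", dest="r1", is_load=True, needs={"sp": "EX"}),
            Instr("add r1, r1, r2", dest="r1", needs={"r1": "EX", "r2": "EX"}),
            Instr("sw r1, 0(sp)", needs={"r1": "MEM", "sp": "EX"})]

for prog in (example1, example2, example3):
    cycles, ipc = cycles_and_ipc(prog)
    print(f"{prog[0].text} ...: {cycles} cycles, IPC = {ipc:.3f}")   # 11, 10 and 8 cycles
```

Each instruction declares the stage in which it needs each source register: a branch compares in ID, a store needs its data register in MEM but its base register in EX, and everything else needs its operands in EX. That per-register stage is what models the MEM-MEM forward for store data.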