1 RISC Pipeline Han Wang CS3410, Spring 2010 Computer Science Cornell University See: P&H Chapter 4.6.

Slides:

Advertisements

Similar presentations

MIPS Pipelined Datapath

Advertisements

Kevin Walsh CS 3410, Spring 2010 Computer Science Cornell University Pipeline Hazards See: P&H Chapter 4.7.

Kevin Walsh CS 3410, Spring 2010 Computer Science Cornell University RISC Pipeline See: P&H Chapter 4.6.

1 CS3410 Guest Lecture A Simple CPU: remaining branch instructions CPU Performance Pipelined CPU Tudor Marian.

Pipeline Hazards Hakim Weatherspoon CS 3410, Spring 2011 Computer Science Cornell University See P&H Appendix 4.7.

© Kavita Bala, Computer Science, Cornell University Kevin Walsh CS 3410, Spring 2010 Computer Science Cornell University Pipelining See: P&H Chapter 4.5.

Prof. Hakim Weatherspoon CS 3410, Spring 2015 Computer Science Cornell University See P&H Chapter: 1.6,

CS-447– Computer Architecture Lecture 12 Multiple Cycle Datapath

EECS 470 Pipeline Hazards Lecture 4 Coverage: Appendix A.

Computer Architecture Lecture 3 Coverage: Appendix A

©UCB CS 162 Computer Architecture Lecture 3: Pipelining Contd. Instructor: L.N. Bhuyan

Computer ArchitectureFall 2007 © October 3rd, 2007 Majd F. Sakr CS-447– Computer Architecture.

Lec 8: Pipelining Kavita Bala CS 3410, Fall 2008 Computer Science Cornell University.

Lec 9: Pipelining Kavita Bala CS 3410, Fall 2008 Computer Science Cornell University.

Computer ArchitectureFall 2008 © October 6th, 2008 Majd F. Sakr CS-447– Computer Architecture.

CSE378 Pipelining1 Pipelining Basic concept of assembly line –Split a job A into n sequential subjobs (A 1,A 2,…,A n ) with each A i taking approximately.

Prof. Hakim Weatherspoon CS 3410, Spring 2015 Computer Science Cornell University See P&H Chapter: 1.6,

1 Pipelining Reconsider the data path we just did Each instruction takes from 3 to 5 clock cycles However, there are parts of hardware that are idle many.

Pipelined Datapath and Control

Chapter 4 The Processor CprE 381 Computer Organization and Assembly Level Programming, Fall 2012 Revised from original slides provided by MKP.

MIPS Pipeline Hakim Weatherspoon CS 3410, Spring 2013 Computer Science Cornell University See P&H Chapter 4.6.

CSE 340 Computer Architecture Summer 2014 Basic MIPS Pipelining Review.

CS.305 Computer Architecture Enhancing Performance with Pipelining Adapted from Computer Organization and Design, Patterson & Hennessy, © 2005, and from.

1 Designing a Pipelined Processor In this Chapter, we will study 1. Pipelined datapath 2. Pipelined control 3. Data Hazards 4. Forwarding 5. Branch Hazards.

Sample Code (Simple) Run the following code on a pipelined datapath: add1 2 3 ; reg 3 = reg 1 + reg 2 nand ; reg 6 = reg 4 & reg 5 lw ; reg.

Electrical and Computer Engineering University of Cyprus LAB3: IMPROVING MIPS PERFORMANCE WITH PIPELINING.

COMP541 Multicycle MIPS Montek Singh Mar 25, 2010.

CDA 5155 Computer Architecture Week 1.5. Start with the materials: Conductors and Insulators Conductor: a material that permits electrical current to.

Hakim Weatherspoon CS 3410, Spring 2012 Computer Science Cornell University MIPS Pipeline See P&H Chapter 4.6.

Hakim Weatherspoon CS 3410, Spring 2012 Computer Science Cornell University MIPS Pipeline See P&H Chapter 4.6.

CMPE 421 REVIEW: MIDTERM 1. A MODIFIED FIVE-Stage Pipeline PC A Y R MD1 addr inst Inst Memory Imm Ext add rd1 GPRs rs1 rs2 ws wd rd2 we wdata addr wdata.

Pipelining: Implementation CPSC 252 Computer Organization Ellen Walker, Hiram College.

Interstage Buffers 1 Computer Organization II © McQuain Pipeline Timing Issues Consider executing: add $t2, $t1, $t0 sub $t3, $t1, $t0 or.

CPU Performance Pipelined CPU

CS161 – Design and Architecture of Computer Systems

CS161 – Design and Architecture of Computer Systems

Pipeline Timing Issues

Computer Organization

Morgan Kaufmann Publishers The Processor

Single Clock Datapath With Control

ECS 154B Computer Architecture II Spring 2009

School of Computing and Informatics Arizona State University

Hakim Weatherspoon CS 3410 Computer Science Cornell University

Hakim Weatherspoon CS 3410 Computer Science Cornell University

Morgan Kaufmann Publishers The Processor

Chapter 4 The Processor Part 2

Single-cycle datapath, slightly rearranged

Pipeline Hazards Hakim Weatherspoon CS 3410, Spring 2013

Current Design.

Pipelining and Hazards

Pipelining in more detail

Systems Architecture II

Pipelining Basic concept of assembly line

The Processor Lecture 3.6: Control Hazards

The Processor Lecture 3.4: Pipelining Datapath and Control

Control unit extension for data hazards

An Introduction to pipelining

Instruction Execution Cycle

Designing a Pipelined CPU

Multi-Cycle Datapath Lecture notes from MKP, H. H. Lee and S. Yalamanchili.

Pipelining Basic concept of assembly line

Pipelining Basic concept of assembly line

Pipelining Appendix A and Chapter 3.

Morgan Kaufmann Publishers The Processor

Introduction to Computer Organization and Architecture

Pipelining Hakim Weatherspoon CS 3410 Computer Science

Pipelining Hakim Weatherspoon CS 3410 Computer Science

MIPS Pipeline Hakim Weatherspoon CS 3410, Spring 2013 Computer Science

MIPS Pipelined Datapath

CS161 – Design and Architecture of Computer Systems

Presentation transcript:

1 RISC Pipeline Han Wang CS3410, Spring 2010 Computer Science Cornell University See: P&H Chapter 4.6

2 Homework

3 Announcements - Homework 2 due tomorrow midnight - Programming Assignment 1 release tomorrow -Pipelined MIPS processor (topic of today) -Subset of MIPS ISA - Feedback -We want to hear from you! -Content?

4 Absolute Jump tgt +4 || Data Mem addr ext 555 Reg. File PC Prog. Mem ALU inst control imm offset + =? cmp Could have used ALU for link add +4 opmnemonicdescription 0x3JAL targetr31 = PC+8 (+8 due to branch delay slot) PC = (PC+4) || (target << 2)

5 A Processor alu PC imm memory d in d out addr target offset cmp control =? new pc register file inst extend +4 Review: Single cycle processor

6 Single Cycle Processor Advantages Single Cycle per instruction make logic and clock simple Disadvantages Since instructions take different time to finish, memory and functional unit are not efficiently utilized. Cycle time is the longest delay. –Load instruction Best possible CPI is 1

7 Pipeline Hazards 0h1h2h3h…

8 Write- Back Memory Instruction Fetch Execute Instruction Decode register file control A Processor alu imm memory d in d out addr inst PC memory compute jump/branch targets new pc +4 extend

9 Basic Pipeline Five stage “RISC” load-store architecture 1.Instruction fetch (IF) –get instruction from memory, increment PC 2.Instruction Decode (ID) –translate opcode into control signals and read registers 3.Execute (EX) –perform ALU operation, compute jump/branch targets 4.Memory (MEM) –access memory if needed 5.Writeback (WB) –update register file Slides thanks to Sally McKee & Kavita Bala

10 Pipelined Implementation Break instructions across multiple clock cycles (five, in this case) Design a separate stage for the execution performed during each clock cycle Add pipeline registers to isolate signals between different stages

11 Write- Back Memory Instruction Fetch Execute Instruction Decode extend register file control Pipelined Processor alu memory d in d out addr PC memory new pc inst IF/ID ID/EXEX/MEMMEM/WB imm B A ctrl B DD M compute jump/branch targets +4

12 IF Stage 1: Instruction Fetch Fetch a new instruction every cycle Current PC is index to instruction memory Increment the PC at end of cycle (assume no branches for now) Write values of interest to pipeline register (IF/ID) Instruction bits (for later decoding) PC+4 (for later computing branch targets)

13 IF PC instruction memory new pc inst addrmc 00 = read word 1 IF/ID WE 1 Rest of pipeline +4 PC+4 pcsel pcreg pcrel pcabs

14 ID Stage 2: Instruction Decode On every cycle: Read IF/ID pipeline register to get instruction bits Decode instruction, generate control signals Read from register file Write values of interest to pipeline register (ID/EX) Control information, Rd index, immediates, offsets, … Contents of Ra, Rb PC+4 (for computing branch targets later)

15 ID ctrl ID/EX Rest of pipeline PC+4 inst IF/ID PC+4 Stage 1: Instruction Fetch register file WE Rd Ra Rb D B A B A extend imm decode result dest

16 EX Stage 3: Execute On every cycle: Read ID/EX pipeline register to get values and control bits Perform ALU operation Compute targets (PC+4+offset, etc.) in case this is a branch Decide if jump/branch should be taken Write values of interest to pipeline register (EX/MEM) Control information, Rd index, … Result of ALU operation Value in case this is a memory store instruction

17 Stage 2: Instruction Decode pcrel pcabs EX ctrl EX/MEM Rest of pipeline B D ctrl ID/EX PC+4 B A alu + || branch? imm pcsel pcreg

18 MEM Stage 4: Memory On every cycle: Read EX/MEM pipeline register to get values and control bits Perform memory load/store if needed –address is ALU result Write values of interest to pipeline register (MEM/WB) Control information, Rd index, … Result of memory operation Pass result of ALU operation

19 MEM ctrl MEM/WB Rest of pipeline Stage 3: Execute M D ctrl EX/MEM B D memory d in d out addr mc

20 WB Stage 5: Write-back On every cycle: Read MEM/WB pipeline register to get values and control bits Select value and write to register file

21 WB Stage 4: Memory ctrl MEM/WB M D result dest

22 IF/ID +4 ID/EXEX/MEMMEM/WB mem d in d out addr inst PC+4 OP B A Rd B D M D PC+4 imm OP Rd OP Rd PC inst mem Rd RaRb D B A

23 Example add r3, r1, r2; nand r6, r4, r5; lw r4, 20(r2); add r5, r2, r5; sw r7, 12(r3);

24 0:add 1:nand 2:lw 3:add 4:sw r0 r1 r2 r3 r4 r5 r6 r IF/ID +4 ID/EXEX/MEMMEM/WB mem d in d out addr inst PC+4 OP B A Rd B D M D PC+4 imm OP Rd OP Rd PC inst mem 77 add r3, r1, r2nand r6, r4, r5 add r3, r1, r2lw r4, 20(r2) nand r6, r4, r5 add r3, r1, r2add r5, r2, r5 lw r4, 20(r2) nand r6, r4, r5 add r3, r1, r2sw r7, 12(r3) add r5, r2, r5 lw r4, 20(r2) nand r6, r4, r5 add r3, r1, r2sw r7, 12(r3) add r5, r2, r5 lw r4, 20(r2) nand r6, r4, r5sw r7, 12(r3) add r5, r2, r5 lw r4, 20(r2)sw r7, 12(r3) add r5, r2, r5sw r7, 12(r3) Rd RaRb D B A

25 Time Graphs add nand lw add sw Clock cycle Latency: Throughput: Concurrency: CPI = IFIDEX MEM WB IFIDEX MEM WB IFIDEX MEM WB IFIDEX MEM WB IFIDEX MEM WB

26 Pipelining Recap Powerful technique for masking latencies Logically, instructions execute one at a time Physically, instructions execute in parallel –Instruction level parallelism Abstraction promotes decoupling Interface (ISA) vs. implementation (Pipeline)

27 The end

28 Sample Code (Simple) Assume eight-register machine Run the following code on a pipelined datapath add ; reg 3 = reg 1 + reg 2 nand ; reg 6 = ~(reg 4 & reg 5) lw 4 20 (2) ; reg 4 = Mem[reg2+20] add ; reg 5 = reg 2 + reg 5 sw 7 12(3) ; Mem[reg3+12] = reg 7 Slides thanks to Sally McKee

29 PC Inst mem Register file MUXMUX ALUALU MUXMUX 1 Data mem + MUXMUX MUXMUX Bits 0-2 Bits op dest offset valB valA PC+1 target ALU result op dest valB op dest ALU result mdata instruction 0 R2 R3 R4 R5 R1 R6 R0 R7 regA regB Bits data dest IF/ID ID/EXEX/MEMMEM/WB

30 data dest IF/ID ID/EXEX/MEMMEM/WB

31 PC Inst mem Register file MUXMUX ALUALU MUXMUX 1 Data mem + MUXMUX MUXMUX Bits 0-2 Bits nop add R2 R3 R4 R5 R1 R6 R0 R7 Bits data dest Fetch: add Time: 1 IF/ID ID/EXEX/MEMMEM/WB

32 PC Inst mem Register file MUXMUX ALUALU MUXMUX 1 Data mem + MUXMUX MUXMUX Bits 0-2 Bits add nop nand R2 R3 R4 R5 R1 R6 R0 R7 1 2 Bits data dest Fetch: nand nand add Time: 2 IF/ID ID/EXEX/MEMMEM/WB

33 PC Inst mem Register file MUXMUX ALUALU MUXMUX 1 Data mem + MUXMUX MUXMUX Bits 0-2 Bits nand add 3 9 nop lw 4 20(2) R2 R3 R4 R5 R1 R6 R0 R7 4 5 Bits data dest Fetch: lw 4 20(2) lw 4 20(2) nand add Time: IF/ID ID/EXEX/MEMMEM/WB

34 PC Inst mem Register file MUXMUX ALUALU MUXMUX 1 Data mem + MUXMUX MUXMUX Bits 0-2 Bits lw nand 6 7 add add R2 R3 R4 R5 R1 R6 R0 R7 2 4 Bits data dest Fetch: add add lw 4 20(2) nand add Time: IF/ID ID/EXEX/MEMMEM/WB

35 PC Inst mem Register file MUXMUX ALUALU MUXMUX 1 Data mem + MUXMUX MUXMUX Bits 0-2 Bits add lw 4 18 nand sw 7 12(3) R2 R3 R4 R5 R1 R6 R0 R7 2 5 Bits data dest Fetch: sw 7 12(3) sw 7 12(3) add lw 4 20 (2) nand add Time: IF/ID ID/EXEX/MEMMEM/WB

36 PC Inst mem Register file MUXMUX ALUALU MUXMUX 1 Data mem + MUXMUX MUXMUX Bits 0-2 Bits sw add 5 7 lw R2 R3 R4 R5 R1 R6 R0 R7 3 7 Bits data dest No more instructions sw 7 12(3) add lw 4 20(2) nand Time: IF/ID ID/EXEX/MEMMEM/WB

37 PC Inst mem Register file MUXMUX ALUALU MUXMUX 1 Data mem + MUXMUX MUXMUX Bits 0-2 Bits sw 7 22 add R2 R3 R4 R5 R1 R6 R0 R7 Bits data dest No more instructions nop nop sw 7 12(3) add lw 4 20(2) Time: IF/ID ID/EXEX/MEMMEM/WB

38 PC Inst mem Register file MUXMUX ALUALU MUXMUX 1 Data mem + MUXMUX MUXMUX Bits 0-2 Bits sw R2 R3 R4 R5 R1 R6 R0 R7 Bits data dest No more instructions nop nop nop sw 7 12(3) add Time: Slides thanks to Sally McKee IF/ID ID/EXEX/MEMMEM/WB

39 PC Inst mem Register file MUXMUX ALUALU MUXMUX 1 Data mem + MUXMUX MUXMUX Bits 0-2 Bits R2 R3 R4 R5 R1 R6 R0 R7 Bits data dest No more instructions nop nop nop nop sw 7 12(3) Time: 9 IF/ID ID/EXEX/MEMMEM/WB