ECE 232 L19.Pipeline2.1 Adapted from Patterson 97 ©UCBCopyright 1998 Morgan Kaufmann Publishers ECE 232 Hardware Organization and Design Lecture 19 Pipelining,

Slides:



Advertisements
Similar presentations
CMPT 334 Computer Organization
Advertisements

CPE 442 pipeline.1 Intro to Computer Architecture CpE 242 Computer Architecture and Engineering Designing a Pipeline Processor.
Computer Architecture
ECE 232 L22.Pipeline3.1 Adapted from Patterson 97 ©UCBCopyright 1998 Morgan Kaufmann Publishers ECE 232 Hardware Organization and Design Lecture 22 Pipelining,
CS 61C L19 Pipelining II (1) A Carle, Summer 2005 © UCB inst.eecs.berkeley.edu/~cs61c/su05 CS61C : Machine Structures Lecture #19: Pipelining II
Mary Jane Irwin ( ) [Adapted from Computer Organization and Design,
EE30332 Ch6 DP.1 Ch 6: Pipelining Modified from Dave Patterson’s notes  Laundry Example  Ann, Brian, Cathy, Dave each have one load of clothes to wash,
ENEE350 Ankur Srivastava University of Maryland, College Park Based on Slides from Mary Jane Irwin ( )
1  2004 Morgan Kaufmann Publishers Chapter Six. 2  2004 Morgan Kaufmann Publishers Pipelining The laundry analogy.
Chapter Six Enhancing Performance with Pipelining
CS152 / Kubiatowicz Lec13.1 3/17/03©UCB Spring 2003 CS152 Computer Architecture and Engineering Lecture 13 Introduction to Pipelining: Datapath and Control.
ECE 232 L15.Miulticycle.1 Adapted from Patterson 97 ©UCBCopyright 1998 Morgan Kaufmann Publishers ECE 232 Hardware Organization and Design Lecture 15 Multi-cycle.
CSCE 212 Quiz 9 – 3/30/11 1.What is the clock cycle time based on for single-cycle and for pipelining? 2.What two actions can be done to resolve data hazards?
CS 152 L10 Pipeline Intro (1)Fall 2004 © UC Regents CS152 – Computer Architecture and Engineering Fall 2004 Lecture 10: Basic MIPS Pipelining Review John.
Computer ArchitectureFall 2007 © October 22nd, 2007 Majd F. Sakr CS-447– Computer Architecture.
1 CSE SUNY New Paltz Chapter Six Enhancing Performance with Pipelining.
Pipelining Datapath Adapted from the lecture notes of Dr. John Kubiatowicz (UC Berkeley) and Hank Walker (TAMU)
1 COMP 206: Computer Architecture and Implementation Montek Singh Mon., Sep 9, 2002 Topic: Pipelining Basics.
1 Atanasoff–Berry Computer, built by Professor John Vincent Atanasoff and grad student Clifford Berry in the basement of the physics building at Iowa State.
CS152 / Kubiatowicz Lec13.1 3/17/03©UCB Spring 2003 CS152 Computer Architecture and Engineering Lecture 13 Introduction to Pipelining: Datapath and Control.
Pipelining - II Adapted from CS 152C (UC Berkeley) lectures notes of Spring 2002.
Ceg3420 L1 4.1 DAP Fa97,  U.CB CEG3420 Computer Design Introduction to Pipelining.
Computer ArchitectureFall 2008 © October 6th, 2008 Majd F. Sakr CS-447– Computer Architecture.
Ceg3420 L13.1 DAP Fa97,  U.CB CEG3420 Computer Design Introduction to Pipelining.
ECE 232 L12.Datapath.1 Adapted from Patterson 97 ©UCBCopyright 1998 Morgan Kaufmann Publishers ECE 232 Hardware Organization and Design Lecture 12 Datapath.
Pipelining - II Rabi Mahapatra Adapted from CS 152C (UC Berkeley) lectures notes of Spring 2002.
W.S Computer System Design Lecture 7 Wannarat Suntiamorntut.
Lecture 12: Pipeline Datapath Design Professor Mike Schulte Computer Architecture ECE 201.
CPE 442 pipeline.1 Intro to Computer Architecture CpE 242 Computer Architecture and Engineering Designing a Pipeline Processor.
1 Pipelining Reconsider the data path we just did Each instruction takes from 3 to 5 clock cycles However, there are parts of hardware that are idle many.
B 0000 Pipelining ENGR xD52 Eric VanWyk Fall
CSE 340 Computer Architecture Summer 2014 Basic MIPS Pipelining Review.
CS.305 Computer Architecture Enhancing Performance with Pipelining Adapted from Computer Organization and Design, Patterson & Hennessy, © 2005, and from.
CMPE 421 Parallel Computer Architecture
1 Designing a Pipelined Processor In this Chapter, we will study 1. Pipelined datapath 2. Pipelined control 3. Data Hazards 4. Forwarding 5. Branch Hazards.
ECE 232 L18.Pipeline.1 Adapted from Patterson 97 ©UCBCopyright 1998 Morgan Kaufmann Publishers ECE 232 Hardware Organization and Design Lecture 18 Pipelining.
EECS 322 March 27, 2000 Based on Dave Patterson slides Instructor: Francis G. Wolff Case Western Reserve University This presentation.

Cs 152 L1 3.1 DAP Fa97,  U.CB Pipelining Lessons °Pipelining doesn’t help latency of single task, it helps throughput of entire workload °Multiple tasks.
Chap 6.1 Computer Architecture Chapter 6 Enhancing Performance with Pipelining.
CSIE30300 Computer Architecture Unit 04: Basic MIPS Pipelining Hsin-Chou Chi [Adapted from material by and
1  1998 Morgan Kaufmann Publishers Chapter Six. 2  1998 Morgan Kaufmann Publishers Pipelining Improve perfomance by increasing instruction throughput.
Oct. 18, 2000Machine Organization1 Machine Organization (CS 570) Lecture 4: Pipelining * Jeremy R. Johnson Wed. Oct. 18, 2000 *This lecture was derived.
CMSC 611: Advanced Computer Architecture Pipelining Some material adapted from Mohamed Younis, UMBC CMSC 611 Spr 2003 course slides Some material adapted.
Pipelining CS365 Lecture 9. D. Barbara Pipeline CS465 2 Outline  Today’s topic  Pipelining is an implementation technique in which multiple instructions.
1. Convert the RISCEE 1 Architecture into a pipeline Architecture (like Figure 6.30) (showing the number data and control bits). 2. Build the control line.
CSE431 L06 Basic MIPS Pipelining.1Irwin, PSU, 2005 MIPS Pipeline Datapath Modifications  What do we need to add/modify in our MIPS datapath? l State registers.
Introduction to Computer Organization Pipelining.
CMSC 611: Advanced Computer Architecture Pipelining Some material adapted from Mohamed Younis, UMBC CMSC 611 Spr 2003 course slides Some material adapted.
Advanced Computer Architecture CS 704 Advanced Computer Architecture Lecture 10 Computer Hardware Design (Pipeline Datapath and Control Design) Prof. Dr.
Computer Architecture Lecture 6.  Our implementation of the MIPS is simplified memory-reference instructions: lw, sw arithmetic-logical instructions:
Computer Organization
CMSC 611: Advanced Computer Architecture
Pipelining Lessons 6 PM T a s k O r d e B C D A 30
Dave Patterson (http.cs.berkeley.edu/~patterson)
John Kubiatowicz (http.cs.berkeley.edu/~kubitron)
Chapter 4 The Processor Part 2
Pipelining review.
John Kubiatowicz (http.cs.berkeley.edu/~kubitron)
Pipelining Chapter 6.
Systems Architecture II
Pipelining Lessons 6 PM T a s k O r d e B C D A 30
John Kubiatowicz (http.cs.berkeley.edu/~kubitron)
Instruction Execution Cycle
CS152 Computer Architecture and Engineering Lecture 14 Pipelining Control Continued Introduction to Advanced Pipelining.
Morgan Kaufmann Publishers The Processor
CMCS Computer Architecture Lecture 20 Pipelined Datapath and Control April 11, CMSC411.htm Mohamed.
Guest Lecturer: Justin Hsia
Recall: Performance Evaluation
Presentation transcript:

ECE 232 L19.Pipeline2.1 Adapted from Patterson 97 ©UCBCopyright 1998 Morgan Kaufmann Publishers ECE 232 Hardware Organization and Design Lecture 19 Pipelining, cont’d Maciej Ciesielski

ECE 232 L19.Pipeline2.2 Adapted from Patterson 97 ©UCBCopyright 1998 Morgan Kaufmann Publishers Outline °Pipelining, review °Design Control Datapath Examples

ECE 232 L19.Pipeline2.3 Adapted from Patterson 97 ©UCBCopyright 1998 Morgan Kaufmann Publishers Designing a Pipelined Processor °Go back and examine your datapath and control diagram °Associate resources with states °Ensure that flows do not conflict, or figure out how to resolve the conflict °Assert control in appropriate stage

ECE 232 L19.Pipeline2.4 Adapted from Patterson 97 ©UCBCopyright 1998 Morgan Kaufmann Publishers Pipelined Datapath (as in book); hard to read

ECE 232 L19.Pipeline2.5 Adapted from Patterson 97 ©UCBCopyright 1998 Morgan Kaufmann Publishers Pipelined Processor (almost) for slides °What happens if we start a new instruction every cycle? Exec Reg. File Mem Access Data Mem A B S M Reg File Equal PC Next PC IR Inst. Mem Valid IRex Dcd Ctrl IRmem Ex Ctrl IRwb Mem Ctrl WB Ctrl

ECE 232 L19.Pipeline2.6 Adapted from Patterson 97 ©UCBCopyright 1998 Morgan Kaufmann Publishers Control and Datapath A <- R[rs]; B<– R[rt] S <– A + B; IR <- Mem[PC]; PC <– PC+4; R[rd] <– S; S <– A + SX; M <– Mem[S] R[rd] <– M; S <– A or ZX; R[rt] <– S; S <– A + SX; Mem[S] <- B If Cond PC < PC+SX; Equal Exec Reg. File Mem Access Data Mem A B S Reg File PC Next PC IR Inst. Mem D M

ECE 232 L19.Pipeline2.7 Adapted from Patterson 97 ©UCBCopyright 1998 Morgan Kaufmann Publishers Pipelining the Load Instruction °The five independent functional units in the pipeline datapath are: Instruction Memory for the Ifetch stage Register File’s Read ports (bus A and busB) for the Reg/Dec stage ALU for the Exec stage Data Memory for the Mem stage Register File’s Write port (bus W) for the Wr stage Clock Cycle 1Cycle 2Cycle 3Cycle 4Cycle 5Cycle 6Cycle 7 IfetchReg/DecExecMemWr1st lw IfetchReg/DecExecMemWr2nd lw IfetchReg/DecExecMemWr3rd lw

ECE 232 L19.Pipeline2.8 Adapted from Patterson 97 ©UCBCopyright 1998 Morgan Kaufmann Publishers The Four Stages of R-type °Ifetch: Instruction Fetch Fetch the instruction from the Instruction Memory °Reg/Dec: Registers Fetch and Instruction Decode °Exec: ALU operates on the two register operands Update PC °Wr: Write the ALU output back to the register file Cycle 1Cycle 2Cycle 3Cycle 4 IfetchReg/DecExecWrR-type

ECE 232 L19.Pipeline2.9 Adapted from Patterson 97 ©UCBCopyright 1998 Morgan Kaufmann Publishers Pipelining the R-type and Load Instruction °We have pipeline conflict or structural hazard: Two instructions try to write to the register file at the same time! Only one write port Clock Cycle 1Cycle 2Cycle 3Cycle 4Cycle 5Cycle 6Cycle 7Cycle 8Cycle 9 IfetchReg/DecExecWrR-type IfetchReg/DecExecWrR-type IfetchReg/DecExecMemWrLoad IfetchReg/DecExecWrR-type IfetchReg/DecExecWrR-type Ops! We have a problem!

ECE 232 L19.Pipeline2.10 Adapted from Patterson 97 ©UCBCopyright 1998 Morgan Kaufmann Publishers Important Observation °Each functional unit can only be used once per instruction °Each functional unit must be used at the same stage for all instructions: Load uses Register File’s Write Port during its 5th stage R-type uses Register File’s Write Port during its 4th stage IfetchReg/DecExecMemWrLoad IfetchReg/DecExecWrR-type 1234 °2 ways to solve this pipeline hazard.

ECE 232 L19.Pipeline2.11 Adapted from Patterson 97 ©UCBCopyright 1998 Morgan Kaufmann Publishers Solution 1: Insert “Bubble” into the Pipeline °Insert a “bubble” into the pipeline to prevent 2 writes at the same cycle The control logic can be complex. Lose instruction fetch and issue opportunity. °No instruction is started in Cycle 6! Clock Cycle 1Cycle 2Cycle 3Cycle 4Cycle 5Cycle 6Cycle 7Cycle 8Cycle 9 IfetchReg/DecExecWrR-type IfetchReg/DecExec IfetchReg/DecExecMemWrLoad IfetchReg/DecExecWr R-type IfetchReg/DecExecWr R-type Pipeline Bubble IfetchReg/DecExecWr

ECE 232 L19.Pipeline2.12 Adapted from Patterson 97 ©UCBCopyright 1998 Morgan Kaufmann Publishers Solution 2: Delay R-type’s Write by One Cycle °Delay R-type’s register write by one cycle: Now R-type instructions also use Reg File’s write port at Stage 5 Mem stage is a NOOP stage: nothing is being done. Clock Cycle 1Cycle 2Cycle 3Cycle 4Cycle 5Cycle 6Cycle 7Cycle 8Cycle 9 IfetchReg/DecMemWrR-type IfetchReg/DecMemWrR-type IfetchReg/DecExecMemWrLoad IfetchReg/DecMemWrR-type IfetchReg/DecMemWrR-type Exec IfetchReg/Dec Exec WrR-type Mem

ECE 232 L19.Pipeline2.13 Adapted from Patterson 97 ©UCBCopyright 1998 Morgan Kaufmann Publishers Modified Control & Datapath Exec Reg. File Mem Access Data Mem A B S Reg File Equal PC Next PC IR Inst. Mem D M IR <- Mem[PC]; PC <– PC+4; A <- R[rs]; B<– R[rt] S <– A + B; R[rd] <– M; S <– A + SX; M <– Mem[S] R[rd] <– M; S <– A or ZX; R[rt] <– M; S <– A + SX; Mem[S] <- B if Cond PC < PC+SX; M <– S

ECE 232 L19.Pipeline2.14 Adapted from Patterson 97 ©UCBCopyright 1998 Morgan Kaufmann Publishers The Four Stages of Store °Ifetch: Instruction Fetch Fetch the instruction from the Instruction Memory °Reg/Dec: Registers Fetch and Instruction Decode °Exec: Calculate the memory address °Mem: Write the data into the Data Memory Cycle 1Cycle 2Cycle 3Cycle 4 IfetchReg/DecExecMemStoreWr

ECE 232 L19.Pipeline2.15 Adapted from Patterson 97 ©UCBCopyright 1998 Morgan Kaufmann Publishers The Three Stages of Beq °Ifetch: Instruction Fetch Fetch the instruction from the Instruction Memory °Reg/Dec: Registers Fetch and Instruction Decode °Exec: Compares the two register operand, Select correct branch target address Latch into PC Cycle 1Cycle 2Cycle 3Cycle 4 IfetchReg/DecExecMemBeqWr

ECE 232 L19.Pipeline2.16 Adapted from Patterson 97 ©UCBCopyright 1998 Morgan Kaufmann Publishers Control Diagram IR <- Mem[PC]; PC < PC+4; A <- R[rs]; B<– R[rt] S <– A + B; R[rd] <– S; S <– A + SX; M <– Mem[S] R[rd] <– M; S <– A or ZX; R[rt] <– S; S <– A + SX; Mem[S] <- B If Cond PC < PC+SX; M <– S Exec Reg. File Mem Access Data Mem A B S Reg File Equal PC Next PC IR Inst. Mem D M

ECE 232 L19.Pipeline2.17 Adapted from Patterson 97 ©UCBCopyright 1998 Morgan Kaufmann Publishers Datapath + Data Stationary Control Exec Reg. File Mem Access Data Mem A B S Reg File PC Next PC IR Inst. Mem D Decode Mem Ctrl WB Ctrl M rsrt op rs rt fun im ex me wb rw v me wb rw v wb rw v

ECE 232 L19.Pipeline2.18 Adapted from Patterson 97 ©UCBCopyright 1998 Morgan Kaufmann Publishers Let’s Try it Out °Execute these instructions on pipelined machine 10lw r1, r2(35) 14addI r2, r2, 3 20subr3, r4, r5 24beqr6, r7, orir8, r9, 17 34addr10, r11, r12 ……. 100andr13, r14, 15 these addresses are octal

ECE 232 L19.Pipeline2.19 Adapted from Patterson 97 ©UCBCopyright 1998 Morgan Kaufmann Publishers Start: Fetch 10 10lw r1, r2(35) 14addI r2, r2, 3 20subr3, r4, r5 24beqr6, r7, orir8, r9, 17 34addr10, r11, r12 …. 100andr13, r14, 15 Exec Reg. File Mem Access Data Mem A B S Reg File PC Next PC IR Inst. Mem D Decode Mem Ctrl WB Ctrl M rsrt im nn nn 10

ECE 232 L19.Pipeline2.20 Adapted from Patterson 97 ©UCBCopyright 1998 Morgan Kaufmann Publishers Fetch 14, Decode 10 PC 10lw r1, r2(35) 14addI r2, r2, 3 20subr3, r4, r5 24beqr6, r7, orir8, r9, 17 34addr10, r11, r12 …. 100andr13, r14, 15 Exec Reg. File Mem Access Data Mem A B S Reg File Next PC IR Inst. Mem D Decode Mem Ctrl WB Ctrl M 2rt im n nn 14 lw r1, r2(35)

ECE 232 L19.Pipeline2.21 Adapted from Patterson 97 ©UCBCopyright 1998 Morgan Kaufmann Publishers Fetch 20, Decode 14, Exec 10 PC 10lw r1, r2(35) 14addI r2, r2, 3 20subr3, r4, r5 24beqr6, r7, orir8, r9, 17 34addr10, r11, r12 … 100andr13, r14, 15 Exec Reg. File Mem Access Data Mem r2 B S Reg File Next PC IR Inst. Mem D Decode Mem Ctrl WB Ctrl M 2rt 35 nn 20 lw r1 addI r2, r2, 3

ECE 232 L19.Pipeline2.22 Adapted from Patterson 97 ©UCBCopyright 1998 Morgan Kaufmann Publishers Fetch 24, Decode 20, Exec 14, Mem 10 PC 10lw r1, r2(35) 14addI r2, r2, 3 20subr3, r4, r5 24beqr6, r7, orir8, r9, 17 34addr10, r11, r12 … 100andr13, r14, 15 Exec Reg. File Mem Access Data Mem r2 B r2+35 Reg File Next PC IR Inst. Mem D Decode Mem Ctrl WB Ctrl M 45 3 n 24 lw r1 sub r3, r4, r5 addI r2, r2, 3

ECE 232 L19.Pipeline2.23 Adapted from Patterson 97 ©UCBCopyright 1998 Morgan Kaufmann Publishers Fetch 30, Dcd 24, Ex 20, Mem 14, WB 10 10lw r1, r2(35) 14addI r2, r2, 3 20subr3, r4, r5 24beqr6, r7, orir8, r9, 17 34addr10, r11, r12 … 100andr13, r14, 15 Exec Reg. File Mem Access Data Mem r4 r5 r2+3 Reg File PC Next PC IR Inst. Mem D Decode Mem Ctrl WB Ctrl M[r2+35] lw r1 beq r6, r7 100 addI r2 sub r3

ECE 232 L19.Pipeline2.24 Adapted from Patterson 97 ©UCBCopyright 1998 Morgan Kaufmann Publishers Fetch 34, Dcd 30, Ex 24, Mem 20, WB 14 10lw r1, r2(35) 14addI r2, r2, 3 20subr3, r4, r5 24beqr6, r7, orir8, r9, 17 34addr10, r11, r12 … 100andr13, r14, 15 r1= M [r2+35] Address 100 will be loaded into PC Exec Reg. File Mem Access Data Mem r6 r7 r2+3 Reg File PC Next PC IR Inst. Mem D Decode Mem Ctrl WB Ctrl 9xx 34 beq addI r2 sub r3 r4-r5 100 ori r8, r9 17 =? EXE Assume branch taken 30: first delayed instruction

ECE 232 L19.Pipeline2.25 Adapted from Patterson 97 ©UCBCopyright 1998 Morgan Kaufmann Publishers Fetch 100, Dcd 34, Ex 30, Mem 24, WB 20 PC r1= M [r2+35] 10lw r1, r2(35) 14addI r2, r2, 3 20subr3, r4, r5 24beqr6, r7, orir8, r9, 17 34addr10, r11, r12 … 100andr13, r14, 15 r2 = r2+3 Exec Reg. File Mem Access Data Mem r9 x Reg File Next PC IR Inst. Mem D Decode Mem Ctrl WB Ctrl beq sub r3 r4-r5 17 ori r8 xxx add r10, r11, r12 34: second delayed instruction

ECE 232 L19.Pipeline2.26 Adapted from Patterson 97 ©UCBCopyright 1998 Morgan Kaufmann Publishers Fetch 104, Dcd 100, Ex 34, Mem 30, WB 24 r1= M [r2+35] 10lw r1, r2(35) 14addI r2, r2, 3 20subr3, r4, r5 24beqr6, r7, orir8, r9, 17 34addr10, r11, r12 … 100andr13, r14, 15 r2 = r2+ 3 r3 = r4- r5 Two instructions squashed in the branch shadow! Exec Reg. File Mem Access Data Mem r11 r12 Reg File PC Next PC IR Inst. Mem D Decode Mem Ctrl WB Ctrl beq xx ori r8 xxx add r10 and r13, r14, r15 n r9 | 17

ECE 232 L19.Pipeline2.27 Adapted from Patterson 97 ©UCBCopyright 1998 Morgan Kaufmann Publishers Fetch 108, Dcd 104, Ex 100, Mem 34, WB 30 r1= M [r2+35] 10lw r1, r2(35) 14addI r2, r2, 3 20subr3, r4, r5 24beqr6, r7, orir8, r9, 17 34addr10, r11, r12 … 100andr13, r14, 15 r2 = r2+ 3 r3 = r4- r5 Exec Reg. File Mem Access Data Mem r14 r15 Reg File PC Next PC IR Inst. Mem D Decode Mem Ctrl WB Ctrl 110 xx ori r8 add r10 and r13 n r9 | 17 r11+r12

ECE 232 L19.Pipeline2.28 Adapted from Patterson 97 ©UCBCopyright 1998 Morgan Kaufmann Publishers Fetch 114, Dcd 110, Ex 104, Mem 100, WB 34 Exec Reg. File Mem Access Data Mem Reg File PC Next PC IR Inst. Mem D Decode Mem Ctrl WB Ctrl r1= M [r2+35] 10lw r1, r2(35) 14addI r2, r2, 3 20subr3, r4, r5 24beqr6, r7, orir8, r9, 17 34addr10, r11, r12 … 100andr13, r14, r2 = r2+ 3 r3 = r4- r5 r8 = r9 | 17 add r10 and r13 n r11+r12 NO WB NO Overflow r14 & R15

ECE 232 L19.Pipeline2.29 Adapted from Patterson 97 ©UCBCopyright 1998 Morgan Kaufmann Publishers Summary: Pipelining °What makes it easy all instructions are the same length just a few instruction formats memory operands appear only in loads and stores °What makes it hard? structural hazards: suppose we had only one memory control hazards: need to worry about branch instructions data hazards: an instruction depends on a previous instruction °We’ll build a simple pipeline and look at these issues °We’ll talk about modern processors and what really makes it hard: exception handling trying to improve performance with out-of-order execution, etc.

ECE 232 L19.Pipeline2.30 Adapted from Patterson 97 ©UCBCopyright 1998 Morgan Kaufmann Publishers Summary °Pipelining is a fundamental concept multiple steps using distinct resources °Utilize capabilities of the datapath by pipelined instruction processing start next instruction while working on the current one limited by length of longest stage (plus fill/flush) detect and resolve hazards