Pipelining - II
Rabi Mahapatra
Adapted from CS 152C (UC Berkeley) lecture notes of Spring 2002.

Revisiting Pipelining Lessons
Pipelining doesn’t help the latency of a single task; it helps the throughput of the entire workload
Pipeline rate is limited by the slowest pipeline stage
Multiple tasks operate simultaneously using different resources
Potential speedup = number of pipe stages
Unbalanced lengths of pipe stages reduce speedup
Time to “fill” the pipeline and time to “drain” it reduce speedup
Stall for dependences
[Figure: laundry timeline; tasks A, B, C, D in task order against time, starting at 6 PM]
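The speedup lessons above can be checked with a little arithmetic. The sketch below is only an illustration under stated assumptions: the pipeline is clocked at the slowest stage, the stage times are hypothetical values not taken from the slides, and the function names are made up for this example.

```python
def sequential_time(stage_times, n_tasks):
    """Total time when each task runs start to finish before the next begins."""
    return n_tasks * sum(stage_times)


def pipelined_time(stage_times, n_tasks):
    """Total time for a clocked pipeline whose cycle equals the slowest stage.

    The first task needs len(stage_times) cycles to fill the pipeline;
    each later task completes one cycle after the previous one.
    """
    cycle = max(stage_times)
    return (len(stage_times) + n_tasks - 1) * cycle


def speedup(stage_times, n_tasks):
    """Ratio of sequential time to pipelined time."""
    return sequential_time(stage_times, n_tasks) / pipelined_time(stage_times, n_tasks)


# Hypothetical stage times (minutes); assumptions, not values from the slides.
balanced = [30, 30, 30]
unbalanced = [30, 40, 20]

print(speedup(balanced, 4))        # few tasks: fill/drain time limits the gain
print(speedup(balanced, 1000))     # many tasks: approaches the number of stages
print(speedup(unbalanced, 1000))   # the slowest stage limits the rate
```

With many tasks the balanced pipeline approaches a speedup of 3 (its stage count), while the unbalanced one stays well below it.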

Revisiting Pipelining Hazards
Structural Hazards
–Hardware design
Control Hazard
–Decision based on results
Data Hazard
–Data dependency

Control Signals for Existing Datapath
The right-to-left control can lead to hazards

Place registers between each step

Example
 10  lw   r1, r2(35)
 14  addI r2, r2, 3
 20  sub  r3, r4, r5
 24  beq  r6, r7, 100
 30  ori  r8, r9, 17
 34  add  r10, r11, r12
100  and  r13, r14, 15

Start: Fetch 10
[Figure: pipelined datapath snapshot. IF: 10 lw r1, r2(35). PC = 10.]

Fetch 14, Decode 10
[Figure: pipelined datapath snapshot. IF: 14 addI r2, r2, 3. ID: lw r1, r2(35). PC = 14.]

Fetch 20, Decode 14, Exec 10
[Figure: pipelined datapath snapshot. IF: 20 sub r3, r4, r5. ID: addI r2, r2, 3. EX: lw r1 (A = r2, immediate 35). PC = 20.]

Fetch 24, Decode 20, Exec 14, Mem 10
[Figure: pipelined datapath snapshot. IF: 24 beq. ID: sub r3, r4, r5. EX: addI r2, r2, 3. MEM: lw r1 (S = r2+35). PC = 24.]

Fetch 30, Dcd 24, Ex 20, Mem 14, WB 10
[Figure: pipelined datapath snapshot. IF: 30 ori. ID: beq r6, r7, 100. EX: sub r3 (A = r4, B = r5). MEM: addI r2 (S = r2+3). WB: lw r1 (M = M[r2+35]). PC = 30.]

Fetch 100, Dcd 30, Ex 24, Mem 20, WB 14
[Figure: pipelined datapath snapshot. IF: 100 and. ID: ori r8, r9, 17. EX: beq (A = r6, B = r7). MEM: sub r3 (S = r4-r5). WB: addI r2 (r2+3). r1 = M[r2+35] has been written. PC = 100.]
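The cycle-by-cycle snapshots above follow a simple rule that can be sketched in a few lines. This is a minimal illustration, assuming one fetch per cycle and no stalls; the fetch order is read off the slide titles, it is not computed from branch logic, and the names `STAGES`, `FETCH_ORDER`, and `snapshot` are hypothetical.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]

# Fetch order read off the slide titles: the branch at 24 is followed by
# the instruction at 30 and then by the branch target at 100.
FETCH_ORDER = [10, 14, 20, 24, 30, 100]


def snapshot(fetch_order, cycle):
    """Return {stage: instruction address} for a 1-based clock cycle.

    Assumes one instruction is fetched per cycle with no stalls, so the
    instruction fetched in cycle c sits in stage s (0-based) in cycle c + s.
    """
    occupied = {}
    for s, stage in enumerate(STAGES):
        fetch_cycle = cycle - s  # cycle in which this stage's instruction was fetched
        if 1 <= fetch_cycle <= len(fetch_order):
            occupied[stage] = fetch_order[fetch_cycle - 1]
    return occupied


for cycle in range(1, 7):
    print(cycle, snapshot(FETCH_ORDER, cycle))
```

Cycle 5 reproduces "Fetch 30, Dcd 24, Ex 20, Mem 14, WB 10", and cycle 6 reproduces "Fetch 100, Dcd 30, Ex 24, Mem 20, WB 14".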

Pipelining Load Instruction
The five independent functional units in the pipeline datapath are:
–Instruction Memory for the Ifetch stage
–Register File’s Read ports (bus A and bus B) for the Reg/Dec stage
–ALU for the Exec stage
–Data Memory for the Mem stage
–Register File’s Write port (bus W) for the Wr stage

        Cycle 1  Cycle 2  Cycle 3  Cycle 4  Cycle 5  Cycle 6  Cycle 7
1st lw  Ifetch   Reg/Dec  Exec     Mem      Wr
2nd lw           Ifetch   Reg/Dec  Exec     Mem      Wr
3rd lw                    Ifetch   Reg/Dec  Exec     Mem      Wr

Pipelining the R Instruction
Ifetch: Instruction Fetch
–Fetch the instruction from the Instruction Memory
Reg/Dec: Registers Fetch and Instruction Decode
Exec:
–ALU operates on the two register operands
–Update PC
Wr: Write the ALU output back to the register file

        Cycle 1  Cycle 2  Cycle 3  Cycle 4
R-type  Ifetch   Reg/Dec  Exec     Wr

Pipelining Both Load and R-type
We have a pipeline conflict, or structural hazard:
–Two instructions try to write to the register file at the same time!
–Only one write port

        Cycle 1  Cycle 2  Cycle 3  Cycle 4  Cycle 5  Cycle 6  Cycle 7  Cycle 8  Cycle 9
R-type  Ifetch   Reg/Dec  Exec     Wr
R-type           Ifetch   Reg/Dec  Exec     Wr
Load                      Ifetch   Reg/Dec  Exec     Mem      Wr
R-type                             Ifetch   Reg/Dec  Exec     Wr
R-type                                      Ifetch   Reg/Dec  Exec     Wr

Oops! We have a problem: the Load and the R-type that follows it both reach Wr in Cycle 7.

Important Observations
Each functional unit can only be used once per instruction
Each functional unit must be used at the same stage for all instructions:
–Load uses Register File’s Write Port during its 5th stage
–R-type uses Register File’s Write Port during its 4th stage

        1        2        3        4        5
Load    Ifetch   Reg/Dec  Exec     Mem      Wr
R-type  Ifetch   Reg/Dec  Exec     Wr

Solution
Delay R-type’s register write by one cycle:
–Now R-type instructions also use Reg File’s write port at Stage 5
–For R-type, the Mem stage is a NOOP stage: nothing is being done.

        Cycle 1  Cycle 2  Cycle 3  Cycle 4  Cycle 5  Cycle 6  Cycle 7  Cycle 8  Cycle 9
R-type  Ifetch   Reg/Dec  Exec     Mem      Wr
R-type           Ifetch   Reg/Dec  Exec     Mem      Wr
Load                      Ifetch   Reg/Dec  Exec     Mem      Wr
R-type                             Ifetch   Reg/Dec  Exec     Mem      Wr
R-type                                      Ifetch   Reg/Dec  Exec     Mem      Wr
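The write-port conflict and its fix can be checked mechanically. The sketch below is an illustration only, assuming one instruction enters Ifetch per cycle; the stage lists mirror the slides, while the names `write_port_conflicts`, `R_TYPE_4`, and `R_TYPE_5` are hypothetical.

```python
from collections import defaultdict

LOAD = ["Ifetch", "Reg/Dec", "Exec", "Mem", "Wr"]
R_TYPE_4 = ["Ifetch", "Reg/Dec", "Exec", "Wr"]          # original 4-stage R-type
R_TYPE_5 = ["Ifetch", "Reg/Dec", "Exec", "Mem", "Wr"]   # Mem is a NOOP for R-type


def write_port_conflicts(program):
    """Return {cycle: [instruction indices]} for cycles with more than one Wr.

    `program` is a list of stage lists; instruction k (0-based) is assumed
    to enter Ifetch in cycle k + 1, one instruction per cycle.
    """
    writers = defaultdict(list)
    for k, stages in enumerate(program):
        wr_cycle = (k + 1) + stages.index("Wr")
        writers[wr_cycle].append(k)
    return {c: ks for c, ks in writers.items() if len(ks) > 1}


before = [R_TYPE_4, R_TYPE_4, LOAD, R_TYPE_4, R_TYPE_4]
after = [R_TYPE_5, R_TYPE_5, LOAD, R_TYPE_5, R_TYPE_5]

print(write_port_conflicts(before))  # the Load and the next R-type collide
print(write_port_conflicts(after))   # no collisions once every Wr is in stage 5
```

In the first sequence the Load (third instruction) and the R-type after it both write in Cycle 7; once every instruction writes in its fifth stage, each cycle has at most one writer.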

Datapath (Without Pipeline)
IR <- Mem[PC]; PC <- PC+4;
A <- R[rs]; B <- R[rt]
R-type: S <- A + B; R[rd] <- S;
Load: S <- A + SX; M <- Mem[S]; R[rd] <- M;
ori: S <- A or ZX; R[rt] <- S;
Store: S <- A + SX; Mem[S] <- B
Branch: if Cond PC <- PC+SX;
[Figure: datapath with Next PC, PC, Inst. Mem, IR, Reg File, A, B, Exec, S, Equal, Mem Access, Data Mem, M, and D]

Datapath (With Pipeline)
IR <- Mem[PC]; PC <- PC+4;
A <- R[rs]; B <- R[rt]
R-type: S <- A + B; M <- S; R[rd] <- M;
Load: S <- A + SX; M <- Mem[S]; R[rd] <- M;
ori: S <- A or ZX; M <- S; R[rt] <- M;
Store: S <- A + SX; Mem[S] <- B
Branch: if Cond PC <- PC+SX;
[Figure: datapath with Next PC, PC, Inst. Mem, IR, Reg File, A, B, Exec, S, Equal, Mem Access, Data Mem, M, and D]

Mem Structural Hazard and Solution
[Figure: instruction order (Load, Instr 1, Instr 2, Instr 3, Instr 4) against time in clock cycles; each instruction passes through Mem, Reg, ALU, Mem, Reg]

Control Hazard - #1 Stall
Stall: wait until decision is clear
Impact: 2 lost cycles (i.e. 3 clock cycles per branch instruction) => slow
[Figure: instruction order (Add, Beq, Load) against time in clock cycles; the Load is delayed, marked "Lost potential"]

Control Hazard - #2 Predict
Predict: guess one direction, then back up if wrong
Impact: 0 lost cycles per branch instruction if right, 1 if wrong (right ≈ 50% of time)
More dynamic scheme: history of 1 branch
[Figure: instruction order (Add, Beq, Load) against time in clock cycles]

Control Hazard - #3 Delayed Branch
Delayed Branch: redefine branch behavior (takes place after next instruction)
Impact: 0 clock cycles per branch instruction if an instruction can be found to put in the “slot” (≈ 50% of time)
[Figure: instruction order (Add, Beq, Misc, Load) against time in clock cycles]
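The three control-hazard options can be compared by their expected lost cycles per branch. This is a rough sketch rather than a definitive model: the 2-cycle stall, the 1-cycle misprediction cost, and the 50% figures come from the slides, while the cost of an unfilled delay slot (one wasted cycle) is an assumption, and the function names are hypothetical.

```python
def lost_cycles_stall():
    """Stall: the slides charge 2 lost cycles for every branch."""
    return 2.0


def lost_cycles_predict(p_right, penalty_wrong=1.0):
    """Predict: nothing is lost when right, `penalty_wrong` cycles when wrong."""
    return (1.0 - p_right) * penalty_wrong


def lost_cycles_delayed(p_slot_filled, penalty_empty=1.0):
    """Delayed branch: nothing is lost when the slot holds useful work.

    Assumption (not from the slides): an unfilled slot wastes
    `penalty_empty` cycles, for example by holding a no-op.
    """
    return (1.0 - p_slot_filled) * penalty_empty


print("stall  :", lost_cycles_stall())
print("predict:", lost_cycles_predict(p_right=0.5))
print("delayed:", lost_cycles_delayed(p_slot_filled=0.5))
```

Under these assumptions, prediction and delayed branching each lose half a cycle per branch on average, against two cycles for stalling.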

Data Hazards (RAW)
Dependencies backwards in time are hazards
add r1,r2,r3
sub r4,r1,r3
and r6,r1,r7
or  r8,r1,r9
xor r10,r1,r11
[Figure: instruction order against time in clock cycles; stages IF, ID/RF, EX, MEM, WB]

Data Hazards [contd…]
“Forward” result from one stage to another
add r1,r2,r3
sub r4,r1,r3
and r6,r1,r7
or  r8,r1,r9
xor r10,r1,r11
[Figure: instruction order against time in clock cycles; stages IF, ID/RF, EX, MEM, WB]

Data Hazards [contd…]
Dependencies backwards in time are hazards
Can’t solve with forwarding: must delay/stall the instruction dependent on a load
lw  r1,0(r2)
sub r4,r1,r3
[Figure: instruction order against time in clock cycles; stages IF, ID/RF, EX, MEM, WB; the sub is stalled]
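The difference between a RAW dependence that forwarding can cover and a load-use dependence that needs a stall can be sketched as a small classifier. This is an illustration under assumptions not stated on the slides: a five-stage pipeline, and results produced three or more instructions earlier are taken to be readable from the register file without forwarding. The tuple format and the name `classify_raw` are hypothetical.

```python
def classify_raw(program):
    """Classify RAW dependences as "forward" or "stall".

    `program` is a list of (op, dest, sources) tuples in program order.
    Returns a list of (consumer_index, producer_index, register, action).
    """
    results = []
    for i, (_, _, sources) in enumerate(program):
        for src in sources:
            # Only the nearest earlier writer of `src` matters.
            for j in range(i - 1, -1, -1):
                op_j, dest_j, _ = program[j]
                if dest_j == src:
                    distance = i - j
                    if op_j == "lw" and distance == 1:
                        # Load data arrives too late to forward to the very next instruction.
                        results.append((i, j, src, "stall"))
                    elif distance <= 2:
                        results.append((i, j, src, "forward"))
                    break
    return results


alu_example = [
    ("add", "r1", ["r2", "r3"]),
    ("sub", "r4", ["r1", "r3"]),
    ("and", "r6", ["r1", "r7"]),
    ("or", "r8", ["r1", "r9"]),
    ("xor", "r10", ["r1", "r11"]),
]
load_example = [
    ("lw", "r1", ["r2"]),
    ("sub", "r4", ["r1", "r3"]),
]

print(classify_raw(alu_example))
print(classify_raw(load_example))
```

For the add sequence, only sub and and need the forwarded value of r1; for the load sequence, sub has to stall.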

Hazard Detection
[Figure: pipeline timing diagrams illustrating a structural hazard, a control hazard (jump), a RAW (read after write) data hazard, a WAW (write after write) data hazard, and a WAR (write after read) data hazard]

Hazard Detection
Suppose instruction i is about to be issued and a predecessor instruction j is in the instruction pipeline.
A RAW hazard exists on register $\rho$ if $\rho \in \mathrm{Rregs}(i) \cap \mathrm{Wregs}(j)$
A WAW hazard exists on register $\rho$ if $\rho \in \mathrm{Wregs}(i) \cap \mathrm{Wregs}(j)$
A WAR hazard exists on register $\rho$ if $\rho \in \mathrm{Wregs}(i) \cap \mathrm{Rregs}(j)$
Window on execution: only pending instructions can cause hazards
[Figure: instruction movement through the pipeline: New Inst, Inst I, Inst J]
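The three conditions above are plain set intersections, which the following sketch spells out. The register sets are passed in directly; the function name `hazards` and the example instructions are illustrative assumptions.

```python
def hazards(i_reads, i_writes, j_reads, j_writes):
    """Registers with a hazard between instruction i and its predecessor j.

    i is about to be issued; j is still in the pipeline.
    """
    return {
        "RAW": set(i_reads) & set(j_writes),   # i reads what j writes
        "WAW": set(i_writes) & set(j_writes),  # i writes what j writes
        "WAR": set(i_writes) & set(j_reads),   # i writes what j reads
    }


# j: add r1, r2, r3    i: sub r4, r1, r3
result = hazards(
    i_reads={"r1", "r3"}, i_writes={"r4"},
    j_reads={"r2", "r3"}, j_writes={"r1"},
)
print(result)
```

Here only the RAW set is non-empty: sub reads r1, which the pending add writes.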

Computing CPI
Start with base CPI
Add stalls
Suppose:
–CPI_base = 1
–Freq_branch = 20%, Freq_load = 30%
–Branches always cause a 1-cycle stall
–Loads cause a 2-cycle stall
Then: $\mathrm{CPI} = 1 + (1 \times 0.20) + (2 \times 0.30) = 1.8$
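The same calculation generalizes to any mix of stall sources. A minimal sketch, with `effective_cpi` as a hypothetical helper name and the slide's numbers as inputs:

```python
def effective_cpi(base_cpi, stall_sources):
    """Base CPI plus the average stall cycles per instruction.

    `stall_sources` is an iterable of (frequency, stall_cycles) pairs,
    where frequency is the fraction of instructions of that kind.
    """
    return base_cpi + sum(freq * cycles for freq, cycles in stall_sources)


# Numbers from the slide: 20% branches with a 1-cycle stall,
# 30% loads with a 2-cycle stall.
cpi = effective_cpi(1.0, [(0.20, 1), (0.30, 2)])
print(round(cpi, 2))
```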

Summary
Control signals need to be propagated
Insert registers between every stage to “remember” and “propagate” values
Solutions to Control Hazard are Stall, Predict, and Delayed Branch
Solution to Data Hazard is “Forwarding”
Effective CPI = CPI_ideal + CPI_stall