Ceg3420 L1 4.1 DAP Fa97, © U.CB CEG3420 Computer Design Introduction to Pipelining.

Presentation transcript:

ceg3420 L1 4.1 DAP Fa97, © U.CB CEG3420 Computer Design Introduction to Pipelining

Recap: Sequential Laundry
°Sequential laundry takes 8 hours for 4 loads
°If they learned pipelining, how long would laundry take?
[Figure: timeline starting at 6 PM with 30-minute ticks; loads A, B, C, D shown in task order]
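The question on this slide can be checked with a little arithmetic. The sketch below assumes four stages of 30 minutes each, which is consistent with the 8-hour total for 4 loads but is not stated on the slide; the function names are illustrative.

```python
def sequential_minutes(loads, stage_minutes):
    """Each load runs through every stage before the next load starts."""
    return loads * sum(stage_minutes)


def pipelined_minutes(loads, stage_minutes):
    """A new load starts as soon as the slowest stage frees up.

    The first load takes the full latency; each later load finishes one
    slowest-stage interval after the previous one.
    """
    return sum(stage_minutes) + (loads - 1) * max(stage_minutes)


# Assumed stage times in minutes: four 30-minute steps per load.
STAGES = [30, 30, 30, 30]

seq = sequential_minutes(4, STAGES)
pipe = pipelined_minutes(4, STAGES)
print(f"sequential: {seq / 60} h, pipelined: {pipe / 60} h")
```

Under these assumptions the sequential schedule takes 480 minutes and the pipelined one 210 minutes.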

Recap: Pipelining Lessons (it’s intuitive!)
°Pipelining doesn’t help latency of a single task; it helps throughput of the entire workload
°Multiple tasks operate simultaneously using different resources
°Potential speedup = number of pipe stages
°Pipeline rate is limited by the slowest pipeline stage
°Unbalanced lengths of pipe stages reduce speedup
°Time to “fill” the pipeline and time to “drain” it reduce speedup
°Stall for dependences
[Figure: pipelined laundry timeline from 6 PM with 30-minute ticks; loads A, B, C, D shown in task order]

Recap: Ideal Pipelining
[Figure: five instructions, each passing through IF, DCD, EX, MEM, WB, staggered one stage apart]
Maximum speedup ≤ number of stages
Speedup ≤ (time for unpipelined operation) / (time for longest stage)
Example: 40 ns data path, 5 stages, longest stage is 10 ns, speedup ≤ 4
Assume instructions are completely independent!
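The bound in the example can be reproduced directly, and extending it to a finite run shows how fill time eats into the ideal figure. This is a sketch under the slide's assumption of fully independent instructions; the function names and the run lengths tried are illustrative.

```python
def ideal_speedup(unpipelined_ns, longest_stage_ns):
    """Upper bound on speedup: the clock can be no faster than the longest stage."""
    return unpipelined_ns / longest_stage_ns


def speedup_with_fill(n_instructions, unpipelined_ns, stages, longest_stage_ns):
    """Speedup for a finite run of independent instructions, including fill time."""
    unpipelined_total = n_instructions * unpipelined_ns
    pipelined_total = (stages + n_instructions - 1) * longest_stage_ns
    return unpipelined_total / pipelined_total


# Slide example: 40 ns data path, 5 stages, longest stage 10 ns.
bound = ideal_speedup(40, 10)
print(bound)  # 4.0

for n in (1, 10, 1000):
    print(n, round(speedup_with_fill(n, 40, 5, 10), 3))
```

A single instruction is actually slower in the pipelined design here (50 ns against 40 ns), and the speedup approaches 4 only as the run gets long.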

Recap: Graphically Representing Pipelines
°Can help with answering questions like:
  - How many cycles does it take to execute this code?
  - What is the ALU doing during cycle 4?
°Use this representation to help understand datapaths

Recap: Can pipelining get us into trouble?
°Yes: pipeline hazards
  Structural hazards: attempt to use the same resource in two different ways at the same time
  - e.g., multiple memory accesses, multiple register writes
  - solutions: multiple memories, stretch pipeline
  Control hazards: attempt to make a decision before the condition is evaluated
  - e.g., any conditional branch
  - solutions: prediction, delayed branch
  Data hazards: attempt to use an item before it is ready
  - e.g., add r1,r2,r3; sub r4,r1,r5; lw r6,0(r7); or r8,r6,r9
  - solutions: forwarding/bypassing, stall/bubble
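The data-hazard example can be checked mechanically. The sketch below encodes the four instructions from the slide by hand and looks for a register that is read by the instruction immediately after the one that writes it; the tuple encoding and the `window` parameter are assumptions made for illustration.

```python
# Each instruction is (text, destination register, source registers).
# The sequence is the data-hazard example from the slide.
PROGRAM = [
    ("add r1,r2,r3", "r1", ("r2", "r3")),
    ("sub r4,r1,r5", "r4", ("r1", "r5")),
    ("lw r6,0(r7)", "r6", ("r7",)),
    ("or r8,r6,r9", "r8", ("r6", "r9")),
]


def raw_dependences(program, window=1):
    """Return (producer, consumer, register) triples where the consumer reads
    a register written by an instruction at most `window` slots earlier."""
    found = []
    for i, (text_i, _, sources) in enumerate(program):
        for j in range(max(0, i - window), i):
            text_j, dest_j, _ = program[j]
            if dest_j in sources:
                found.append((text_j, text_i, dest_j))
    return found


deps = raw_dependences(PROGRAM)
for producer, consumer, reg in deps:
    print(f"{consumer} needs {reg} from {producer}")
```

It reports two pairs: `sub` needs `r1` from `add`, and `or` needs `r6` from `lw`.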

Recap: Pipelined Datapath with Data Stationary Control
[Figure: pipelined datapath (PC, IAU, npc, I mem, Regs, A, B, im, alu, S, D mem, m, Regs) with lw $2,20($5) in flight. The control fields (op, rw, n) travel down the pipe with the instruction and supply the operand register selects, ALU op, MEM op, and the result register select and enable.]
Just like Time-State!

Recap
°Pipelining is a fundamental concept
  - multiple steps using distinct resources
°Utilize capabilities of the datapath by pipelined instruction processing
  - start next instruction while working on the current one
  - limited by length of longest stage (plus fill/flush)
  - detect and resolve hazards
°What makes it easy
  - all instructions are the same length
  - just a few instruction formats
  - memory operands appear only in loads and stores
°Hazards make it hard
°We’ll build a simple pipeline and look at these issues

The Big Picture: Where are We Now?
°The Five Classic Components of a Computer: Processor (Control, Datapath), Memory, Input, Output
°Today’s Topics:
  - Recap last lecture
  - Pipelined Control / Do it yourself Pipelined Control
  - Administrivia
  - Hazards/Forwarding
  - Exceptions
  - Review MIPS R3000 pipeline
  - Advanced Pipelining?

Recap: Control Diagram
Common to all instructions:
  IR <- Mem[PC]; PC <- PC + 4
  A <- R[rs]; B <- R[rt]
Then, one of:
  S <- A + B; R[rd] <- S
  S <- A + SX; M <- Mem[S]; R[rd] <- M
  S <- A or ZX; R[rt] <- S
  S <- A + SX; Mem[S] <- B
  If Cond, PC <- PC + SX
[Figure: datapath with Inst. Mem, IR, PC, Next PC, Reg File, A, B, Exec, S, Equal, Mem Access, Data Mem, D, M (with M <- S), Reg File]

But recall use of “Data Stationary Control”
°The Main Control generates the control signals during Reg/Dec
  - Control signals for Exec (ExtOp, ALUSrc, ...) are used 1 cycle later
  - Control signals for Mem (MemWr, Branch) are used 2 cycles later
  - Control signals for Wr (MemtoReg, RegWr) are used 3 cycles later
[Figure: Main Control in the Reg/Dec stage emits ExtOp, ALUOp, RegDst, ALUSrc, Branch, MemWr, MemtoReg, RegWr. The ID/Ex register holds all of them, the Ex/Mem register holds Branch, MemWr, MemtoReg, RegWr, and the Mem/Wr register holds MemtoReg, RegWr. Pipeline registers: IF/ID, ID/Ex, Ex/Mem, Mem/Wr.]
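One way to picture data stationary control is as a control word that shrinks as it moves through the pipeline registers. The sketch below follows the signal grouping shown in the slide's diagram (Branch and MemWr to Mem, MemtoReg and RegWr to Wr); the decode table and its values for `lw` and `sw` are hypothetical and only illustrate the mechanism.

```python
# Which control signals each stage consumes, following the slide's diagram.
EXEC_SIGNALS = ("ExtOp", "ALUSrc", "ALUOp", "RegDst")
MEM_SIGNALS = ("MemWr", "Branch")
WR_SIGNALS = ("MemtoReg", "RegWr")


def main_control(op):
    """Hypothetical decode table for two opcodes; values are illustrative."""
    table = {
        "lw": {"ExtOp": 1, "ALUSrc": 1, "ALUOp": "add", "RegDst": 0,
               "MemWr": 0, "Branch": 0, "MemtoReg": 1, "RegWr": 1},
        "sw": {"ExtOp": 1, "ALUSrc": 1, "ALUOp": "add", "RegDst": 0,
               "MemWr": 1, "Branch": 0, "MemtoReg": 0, "RegWr": 0},
    }
    return dict(table[op])


def keep(word, names):
    """Pass along only the named signals to the next pipeline register."""
    return {name: word[name] for name in names}


def trace(op):
    """Follow one instruction's control word from Reg/Dec to Wr."""
    id_ex = main_control(op)                        # generated in Reg/Dec
    ex_mem = keep(id_ex, MEM_SIGNALS + WR_SIGNALS)  # Exec signals used up
    mem_wr = keep(ex_mem, WR_SIGNALS)               # Mem signals used up
    return {
        "Exec (1 cycle later)": keep(id_ex, EXEC_SIGNALS),
        "Mem (2 cycles later)": keep(ex_mem, MEM_SIGNALS),
        "Wr (3 cycles later)": keep(mem_wr, WR_SIGNALS),
    }


lw_trace = trace("lw")
for stage, signals in lw_trace.items():
    print(stage, signals)
```

The point of the design is that no stage needs to look back at the opcode: everything it needs arrives in the pipeline register alongside the data.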

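Datapath + Data Stationary Control
[Figure: pipelined datapath (Inst. Mem, IR, PC, Next PC, Reg File, A, B, Exec, S, Mem Access, Data Mem, D, M, Reg File) with Decode, Mem Ctrl, and WB Ctrl blocks. The fields op, rs, rt, fun, im feed Decode; the control fields ex, me, wb, rw, v are passed from stage to stage, becoming me, wb, rw, v and then wb, rw, v as each stage uses its part.]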

Let’s Try it Out
 10  lw   r1, r2(35)
 14  addI r2, r2, 3
 20  sub  r3, r4, r5
 24  beq  r6, r7, 100
 30  ori  r8, r9, 17
 34  add  r10, r11, r12
100  and  r13, r14, 15
These addresses are octal

Start: Fetch 10
[Figure: pipelined datapath at the start of the example. IF: fetching address 10, lw r1, r2(35). All later stages are marked n (not valid).]

Fetch 14, Decode 10
[Figure: IF: fetching address 14. ID: lw r1, r2(35), with rs = 2. Later stages are marked n.]

Fetch 20, Decode 14, Exec 10
[Figure: IF: fetching address 20. ID: addI r2, r2, 3, with rs = 2. EX: lw r1, with A = r2 and immediate 35. Later stages are marked n.]

Fetch 24, Decode 20, Exec 14, Mem 10
[Figure: IF: fetching address 24. ID: sub r3, r4, r5. EX: addI r2, r2, 3, with A = r2. M: lw r1, with S = r2+35. WB is marked n.]

Administrative Issues
°Schedule Ahead
°Course Feedback
  - Like on-line lecture notes!!
  - pace of class!!
  - Like Computers in the news!!
  - Prerequisite Quiz? 39 great, 2 so-so, 1 bad idea
  - Online Submission?
  - Spread TA office hours?
  - Slow lectures last 20 minutes?
°Computers in the news: Alpha/Intel patent squabble to be settled this week?
[Figure: schedule calendar (M T W T F) with entries: pipeline (5), cache (6), xtra & writeup, midterm, last lecture, proj present, final report]

Fetch 30, Dcd 24, Ex 20, Mem 14, WB 10
[Figure: IF: fetching address 30. ID: beq r6, r7 100, with rs = 6, rt = 7. EX: sub r3, with A = r4, B = r5. M: addI r2, with S = r2+3. WB: lw r1, with M = M[r2+35].]
Note Delayed Branch: always execute ori after beq

Fetch 100, Dcd 30, Ex 24, Mem 20, WB 14
[Figure: IF: fetching address 100. ID: ori r8, r9 17, with rs = 9. EX: beq, with A = r6, B = r7, target 100. M: sub r3, with S = r4-r5. WB: addI r2, with r2+3. The register file now holds r1 = M[r2+35].]

Fetch 104, Dcd 100, Ex 30, Mem 24, WB 20
[Figure: pipelined datapath with the stage contents left blank]
Fill it in yourself!

Fetch 110, Dcd 104, Ex 100, Mem 30, WB 24
[Figure: pipelined datapath with the stage contents left blank]
Fill it in yourself!

Fetch 114, Dcd 110, Ex 104, Mem 100, WB 30
[Figure: pipelined datapath with the stage contents left blank]
Fill it in yourself!
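The slide titles in this walk-through follow a regular pattern that a few lines of code can regenerate, which is a handy check when filling in the blank slides. The sketch below assumes the beq at 24 is taken to 100 (as the slides show) and that there is exactly one delay slot; the function names are illustrative, and only stage occupancy is tracked, not register contents.

```python
STAGES = ("Fetch", "Dcd", "Ex", "Mem", "WB")


def fetch_order(start, taken_branches, count):
    """Addresses fetched, one per cycle, with a one-slot delayed branch.

    `taken_branches` maps a branch address to its target. The instruction
    after a taken branch (the delay slot) is always fetched before the target.
    """
    order, pc, pending_target = [], start, None
    for _ in range(count):
        order.append(pc)
        if pending_target is not None:      # pc was the delay slot
            pc, pending_target = pending_target, None
        else:
            pending_target = taken_branches.get(pc)
            pc += 4
    return order


def occupancy(order):
    """One line per cycle naming the (octal) address held by each stage."""
    lines = []
    for cycle in range(len(order)):
        held = [f"{stage} {order[cycle - depth]:o}"
                for depth, stage in enumerate(STAGES) if cycle - depth >= 0]
        lines.append(", ".join(held))
    return lines


# Addresses are octal, as in the slides; beq at 24 is taken to 100.
order = fetch_order(0o10, {0o24: 0o100}, 9)
lines = occupancy(order)
for line in lines:
    print(line)
```

The instruction at 34 never appears in the output: the delay slot at 30 executes, and fetch then continues from the branch target at 100.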

Pipeline Hazards Again
°Structural hazard
  [Figure: one instruction in I-Fetch, DCD, MemOpFetch, OpFetch, Exec, Store while a later instruction is in IFetch, DCD, ...]
°Control hazard
  [Figure: I-Fetch, DCD, OpFetch, Jump, followed by IFetch, DCD, ...]
°RAW (read after write) data hazard
°WAW (write after write) data hazard
°WAR (write after read) data hazard
  [Figure: overlapping instructions with stage sequences IF, DCD, EX, Mem, WB and IF, DCD, OF, Ex, Mem illustrating each data hazard]

Data Hazards
°Avoid some “by design”
  - eliminate WAR by always fetching operands early (DCD) in pipe
  - eliminate WAW by doing all WBs in order (last stage, static)
°Detect and resolve remaining ones
  - stall or forward (if possible)
[Figure: overlapping instructions with stage sequences IF, DCD, EX, Mem, WB and IF, DCD, OF, Ex, Mem illustrating the RAW, WAW, and WAR data hazards]

Hazard Detection
°Suppose instruction i is about to be issued and a predecessor instruction j is in the instruction pipeline.
°A RAW hazard exists on register ρ if ρ ∈ Rregs(i) ∩ Wregs(j)
  - Keep a record of pending writes (for instructions in the pipe) and compare with operand regs of current instruction.
  - When an instruction issues, reserve its result register.
  - When an operation completes, remove its write reservation.
°A WAW hazard exists on register ρ if ρ ∈ Wregs(i) ∩ Wregs(j)
°A WAR hazard exists on register ρ if ρ ∈ Wregs(i) ∩ Rregs(j)
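The three conditions are plain set intersections, and the record of pending writes is a small reservation table. The sketch below is one possible rendering, not the lecture's implementation: the dict encoding of an instruction and the `PendingWrites` class are assumptions, and a plain set is a simplification that does not track two in-flight writes to the same register separately.

```python
def hazards(i, j):
    """Classify hazards between instruction i (about to issue) and an
    earlier instruction j still in the pipeline.

    Each instruction is a dict with 'reads' and 'writes' register sets.
    """
    return {
        "RAW": i["reads"] & j["writes"],
        "WAW": i["writes"] & j["writes"],
        "WAR": i["writes"] & j["reads"],
    }


class PendingWrites:
    """Record of result registers reserved by instructions in the pipe."""

    def __init__(self):
        self.reserved = set()

    def can_issue(self, instr):
        # A RAW hazard exists if any operand register has a pending write.
        return not (instr["reads"] & self.reserved)

    def issue(self, instr):
        self.reserved |= instr["writes"]   # reserve the result register

    def complete(self, instr):
        self.reserved -= instr["writes"]   # remove the write reservation


# add r1,r2,r3 followed by sub r4,r1,r5
add = {"reads": {"r2", "r3"}, "writes": {"r1"}}
sub = {"reads": {"r1", "r5"}, "writes": {"r4"}}

found = hazards(sub, add)
board = PendingWrites()
board.issue(add)
blocked = not board.can_issue(sub)   # r1 is still reserved
board.complete(add)
clear = board.can_issue(sub)         # reservation removed
print(found, blocked, clear)
```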

Record of Pending Writes
°Current operand registers
°Pending writes
°hazard <= ((rs == rw_ex) & regW_ex) OR
           ((rs == rw_mem) & regW_me) OR
           ((rs == rw_wb) & regW_wb) OR
           ((rt == rw_ex) & regW_ex) OR
           ((rt == rw_mem) & regW_me) OR
           ((rt == rw_wb) & regW_wb)
[Figure: pipelined datapath (PC, IAU, npc, I mem, Regs, A, B, im, alu, S, D mem, m, Regs) with the op, rw, n fields carried in each later stage and compared against rs and rt of the current instruction]
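The hazard expression compares each current operand register against the destination register held in each later stage, gated by that stage's write enable. A minimal sketch follows; the `pipe` dictionary layout is an assumption chosen for illustration.

```python
def raw_hazard(rs, rt, pipe):
    """Evaluate the slide's hazard expression.

    `pipe` maps each later stage ('ex', 'mem', 'wb') to a pair
    (rw, regW): the destination register number and its write enable.
    """
    return any(
        regW and operand == rw
        for operand in (rs, rt)
        for rw, regW in (pipe["ex"], pipe["mem"], pipe["wb"])
    )


# EX will write r1, MEM names r4 but has writes disabled, WB will write r6.
pipe = {"ex": (1, True), "mem": (4, False), "wb": (6, True)}

print(raw_hazard(1, 5, pipe))   # rs matches the pending write in EX
print(raw_hazard(4, 5, pipe))   # r4 matches only a stage that will not write
print(raw_hazard(2, 6, pipe))   # rt matches the pending write in WB
```

Like the expression on the slide, this treats both rs and rt as operands for every instruction, so it can report a hazard for an instruction that does not actually read rt.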

Resolve RAW by forwarding
°Detect the nearest valid write to an operand register and forward it into the op latches, bypassing the remainder of the pipe
  - Increase muxes to add paths from pipeline registers
  - Data Forwarding = Data Bypassing
[Figure: pipelined datapath (PC, IAU, npc, I mem, Regs, A, B, im, alu, S, D mem, m, Regs) with a forward mux in front of the operand latches, selected using the op, rw, n fields of the later stages and rs, rt of the current instruction]
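"Nearest valid write" translates into a priority search from the youngest in-flight instruction to the oldest. The sketch below models the forward mux select for one operand; the tuple layout, the use of `None` for a result that is not yet available, and the `"stall"` outcome are assumptions for illustration.

```python
def forward_select(operand_reg, regfile_value, pipe):
    """Pick the operand value, preferring the nearest pending write.

    `pipe` is ordered nearest-first: each entry is (rw, regW, value) for
    the instructions currently in EX, MEM, and WB. `value` is None when
    the result is not yet available at that stage (e.g. a load in EX).
    Returns (value, source) where source names the mux input chosen.
    """
    for source, (rw, regW, value) in zip(("ex", "mem", "wb"), pipe):
        if regW and rw == operand_reg:
            if value is None:
                return None, "stall"   # producer found but data not ready
            return value, source
    return regfile_value, "regfile"


# Two in-flight writes to r1: the one in EX is newer and must win.
pipe = [(1, True, 111), (1, True, 99), (3, False, 0)]
print(forward_select(1, 5, pipe))   # (111, 'ex')
print(forward_select(3, 7, pipe))   # (7, 'regfile'): WB entry has writes off

# A load still in EX has no data to forward yet.
load_pipe = [(2, True, None), (0, False, 0), (0, False, 0)]
print(forward_select(2, 8, load_pipe))   # (None, 'stall')
```

Searching nearest-first matters when several in-flight instructions write the same register: the consumer must see the most recent value in program order.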

What about memory operations?
°If instructions are initiated in order and operations always occur in the same stage, there can be no hazards between memory operations!
°What does delaying WB on arithmetic operations cost?
  - cycles?
  - hardware?
°What about data dependence on loads?
  R1 <- R4 + R5
  R2 <- Mem[R2 + I]
  R3 <- R2 + R1
  => “Delayed Loads”
[Figure: pipeline stages with latches A, B (op, Rd, Ra, Rb), R (Rd), T (Rd), and Rd to reg file]
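The load dependence in this sequence can be removed by reordering, since the first add is independent of the load. The sketch below counts load-use pairs before and after such a reordering. It assumes full forwarding, so that the only remaining penalty is the slot immediately after a load; the tuple encoding and the use of "lw" and "add" as op names are illustrative.

```python
def load_use_stalls(program):
    """Count instructions that use a load result in the very next slot.

    Each instruction is (op, dest, sources). Assumes full forwarding, so
    the only remaining penalty is the slot right after a load.
    """
    stalls = 0
    for prev, curr in zip(program, program[1:]):
        op, dest, _ = prev
        if op == "lw" and dest in curr[2]:
            stalls += 1
    return stalls


# The slide's sequence: R3 needs R2 right after it is loaded.
original = [
    ("add", "R1", ("R4", "R5")),   # R1 <- R4 + R5
    ("lw", "R2", ("R2",)),         # R2 <- Mem[R2 + I]
    ("add", "R3", ("R2", "R1")),   # R3 <- R2 + R1
]
# The first add does not touch R2, so it can move into the slot after the load.
reordered = [original[1], original[0], original[2]]

before = load_use_stalls(original)
after = load_use_stalls(reordered)
print(before, after)
```

On a machine with delayed loads and no interlock, the same count tells the compiler how many slots it must fill with an independent instruction or a nop.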

Compiler Avoiding Load Stalls:

What about Interrupts, Traps, Faults?
°External interrupts:
  - Allow pipeline to drain
  - Load PC with interrupt address
°Faults (within instruction, restartable)
  - Force trap instruction into IF
  - Disable writes till trap hits WB
  - Must save multiple PCs or PC + state
°Refer to MIPS solution

Exception Handling
[Figure: pipelined datapath (PC, IAU, npc, I mem, Regs, A, B, im, alu, S, D mem, m, Regs) with lw $2,20($5) in flight. Detection points along the pipe, in order: detect bad instruction address, detect bad instruction, detect overflow, detect bad data address. At the end of the pipe: allow exception to take effect.]

Exception Problem
°Exceptions/Interrupts: 5 instructions executing in 5 stage pipeline
  - How to stop the pipeline?
  - Restart?
  - Who caused the interrupt?
  Stage  Problem interrupts occurring
  IF     Page fault on instruction fetch; misaligned memory access; memory-protection violation
  ID     Undefined or illegal opcode
  EX     Arithmetic exception
  MEM    Page fault on data fetch; misaligned memory access; memory-protection violation; memory error
°Load with data page fault, Add with instruction page fault?
°Solution 1: interrupt vector/instruction
°Solution 2: interrupt ASAP, restart everything incomplete
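The load/add question is about ordering: the later instruction's fault is detected first in time. The sketch below makes that concrete under one reading of Solution 1, namely that each instruction carries its own exception status down the pipe and faults are acted on in program order. That reading, the stage offsets, and the function names are assumptions, not taken from the slides.

```python
# Cycles after fetch at which each stage's faults are detected.
STAGE_OFFSET = {"IF": 0, "ID": 1, "EX": 2, "MEM": 3}


def detection_cycle(fetch_cycle, stage):
    """Cycle in which a fault raised in `stage` is detected."""
    return fetch_cycle + STAGE_OFFSET[stage]


def first_detected(faults):
    """The fault an 'interrupt as soon as detected' scheme sees first."""
    return min(faults, key=lambda f: detection_cycle(f["fetch_cycle"], f["stage"]))


def first_in_program_order(faults):
    """The fault handled first when each instruction carries its own
    exception status and faults are taken in program order."""
    return min(faults, key=lambda f: f["fetch_cycle"])


faults = [
    {"instr": "load", "fetch_cycle": 0, "stage": "MEM"},  # data page fault
    {"instr": "add", "fetch_cycle": 1, "stage": "IF"},    # instruction page fault
]

early = first_detected(faults)["instr"]
precise = first_in_program_order(faults)["instr"]
print(early, precise)
```

The add's fault shows up in cycle 1 and the load's in cycle 3, yet the load comes first in program order, which is why the two solutions on the slide behave differently.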

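Resolution: Freeze above & Bubble Below
[Figure: pipelined datapath (PC, IAU, npc, I mem, Regs, A, B, im, alu, S, D mem, m, Regs) with the op, rw, n fields in each later stage; the stages above the hazard are frozen and a bubble is inserted into the stages below]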

FYI: MIPS R3000 clocking discipline
°2-phase non-overlapping clocks
°Pipeline stage is two (level sensitive) latches
[Figure: phi1 and phi2 clock waveforms and a phi1/phi2 latch pair, compared with an edge-triggered register]

MIPS R3000 Instruction Pipeline
  Stage:     Inst Fetch    | Decode / Reg. Read | ALU / E.A        | Memory       | Write Reg
  Activity:  TLB, I-Cache  | RF                 | Operation, E.A.  | TLB, D-Cache | WB
Resource usage: TLB, I-cache, RF, ALU, TLB, D-Cache, WB
Write in phase 1, read in phase 2 => eliminates bypass from WB

Recall: Data Hazard on r1
With MIPS R3000 pipeline, no need to forward from WB stage

MIPS R3000 Multicycle Operations
°Ex: Multiply, Divide, Cache Miss
°Stall all stages above multicycle operation in the pipeline
°Drain (bubble) stages below it
°Use control word of local stage state to step through multicycle operation
[Figure: pipeline stages with latches A, B (op, Rd, Ra, Rb), a mul stage (Rd, Ra, Rb), R (Rd), T (Rd), and Rd to reg file]

Summary
°Pipelines pass control information down the pipe just as data moves down pipe
°Forwarding/Stalls handled by local control
°Exceptions stop the pipeline
°MIPS I instruction set architecture made pipeline visible (delayed branch, delayed load)
°More performance from deeper pipelines, parallelism