CS510 Computer Architectures, Lecture 6: Introduction to Pipelining

1 Lecture 6: Introduction to Pipelining

2 Pipelining: It's Natural! Laundry Example
Ann, Brian, Cathy, and Dave (A, B, C, D) each have one load of clothes to wash, dry, and fold.
- Washer takes 30 minutes
- Dryer takes 40 minutes
- Folder takes 20 minutes

3 Sequential Laundry
[Figure: task order (loads A to D) against time, 6 PM to midnight; each load takes 30 + 40 + 20 = 90 minutes, and the loads run one after another.]
Sequential laundry takes 6 hours for 4 loads.
If they learned pipelining, how long would laundry take?

4 Pipelined Laundry: Start Work ASAP
[Figure: task order (loads A to D) against time, 6 PM to midnight; the 30-, 40-, and 20-minute stages of different loads overlap.]
Pipelined laundry takes 3.5 hours for 4 loads.
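
The 6-hour and 3.5-hour figures can be checked with a short simulation. This is a minimal sketch, assuming a load may wait between machines until the next one is free; the function name pipelined_finish_times is illustrative and not from the lecture.

```python
def pipelined_finish_times(stage_minutes, loads):
    """Return the finish time (minutes from the start) of each load.

    A load enters a stage once it has left the previous stage and the
    previous load has left this stage.
    """
    stage_free = [0] * len(stage_minutes)  # when each machine is next free
    finishes = []
    for _ in range(loads):
        t = 0  # time at which this load is ready for its next stage
        for s, minutes in enumerate(stage_minutes):
            start = max(t, stage_free[s])
            t = start + minutes
            stage_free[s] = t
        finishes.append(t)
    return finishes


STAGES = [30, 40, 20]  # washer, dryer, folder (minutes, from the slides)
LOADS = 4              # Ann, Brian, Cathy, Dave

sequential_total = LOADS * sum(STAGES)
pipelined_total = pipelined_finish_times(STAGES, LOADS)[-1]
print(sequential_total / 60, pipelined_total / 60)  # 6.0 3.5
```

After the first load, every later load finishes one dryer time (40 minutes) after the one before it, because the dryer is the slowest machine.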

5 Pipelining Lessons
- Pipelining doesn’t help the latency of a single task; it helps the throughput of the entire workload
- Pipeline rate is limited by the slowest pipeline stage
- Multiple tasks operate simultaneously
- Potential speedup = number of pipe stages
- Unbalanced lengths of pipe stages reduce speedup
- Time to “fill” the pipeline and time to “drain” it reduce speedup
[Figure: pipelined laundry for loads A to D, 6 PM to 9 PM, with the filling and draining periods marked.]
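
The last three lessons can be made concrete with a small calculation. This is a sketch under the assumption that the first task takes the full latency and each later task completes one slowest-stage time after its predecessor; pipeline_speedup is a hypothetical helper, not part of the lecture.

```python
def pipeline_speedup(stage_times, tasks):
    """Speedup of a pipeline over running the tasks one after another."""
    sequential = tasks * sum(stage_times)
    # Fill: the first task needs every stage. After that, one task
    # completes per slowest-stage time.
    pipelined = sum(stage_times) + (tasks - 1) * max(stage_times)
    return sequential / pipelined


laundry = [30, 40, 20]   # unbalanced stages from the laundry example
balanced = [30, 30, 30]  # same total work, evenly split (assumed for contrast)

for n in (4, 100, 10_000):
    print(n,
          round(pipeline_speedup(laundry, n), 2),
          round(pipeline_speedup(balanced, n), 2))
```

For the four laundry loads this gives 360 / 210, about 1.71, well below the potential speedup of 3. As the number of loads grows, fill and drain time matter less, but the unbalanced pipeline approaches only 90 / 40 = 2.25, while the balanced one approaches 3.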

6 DLX Instructions
Instruction type/opcode: Instruction meaning
Data transfers (the only memory addressing mode is 16-bit displacement + contents of a GPR)
  LB, LBU, SB: Load byte, load byte unsigned, store byte
  LH, LHU, SH: Load half word, load half word unsigned, store half word
  LW, SW: Load word, store word (to/from integer registers)
  LF, LD, SF, SD: Load SP float, load DP float, store SP float, store DP float
  MOVI2S, MOVS2I: Move from/to GPR to/from a special register
  MOVF, MOVD: Copy one FP register or a DP pair to another register or pair
  MOVFP2I, MOVI2FP: Move 32 bits from/to FP registers to/from integer registers
Arithmetic/logical
  ADD, ADDI, ADDU, ADDUI: Add, add immediate (16 bits); signed and unsigned
  SUB, SUBI, SUBU, SUBUI: Subtract, subtract immediate; signed and unsigned
  MULT, MULTU, DIV, DIVU: Multiply and divide, signed and unsigned; operands must be FP registers; all operations take and yield 32-bit values
  AND, ANDI: And, and immediate
  OR, ORI, XOR, XORI: Or, or immediate, exclusive or, exclusive or immediate
  LHI: Load high immediate (load upper half of register with immediate)

7 DLX Instructions
Shift
  SLL, SRL, SRA, SLLI, SRLI, SRAI: Shifts, both immediate (S__I) and variable form (S__); logical and arithmetic
  S__, S__I: Set conditional; “__” may be LT, GT, LE, GE, EQ, NE
Control (conditional branches and jumps; PC-relative or through register)
  BEQZ, BNEZ: Branch if GPR equal/not equal to zero; 16-bit offset from PC+4
  BFPT, BFPF: Test comparison bit in the FP status register and branch; 16-bit offset
  J, JR: Jumps; 26-bit offset or target in register
  JAL, JALR: Jump and link; save PC+4 in R31
  TRAP: Transfer to operating system at a vectored address
  RFE: Return to user code from an exception; restore user mode
Floating point (FP operations on DP and SP format)
  FcnD, FcnF: Fcn is one of ADD, SUB, MULT, DIV
  CVTF2D, CVTF2I, CVTD2F, CVTD2I, CVTI2F, CVTI2D: Convert instructions; F single precision, D double precision, I integer; both operands are FPRs
  __D, __F: DP and SP compares; “__” = LT, GT, LE, GE, EQ, NE; sets bits in FP status register

8 DLX Instruction Format
I-type instruction: Opcode (6 bits) | rs1 (5) | rd (5) | Immediate (16)
  Loads, stores, all immediates, conditional branches, jump register, jump and link register
R-type instruction: Opcode (6 bits) | rs1 (5) | rs2 (5) | rd (5) | func (11)
  Register-register ALU operations; func selects the operation: Add, Sub, ...
J-type instruction: Opcode (6 bits) | Offset added to PC (26)
  Jump and jump and link, trap and return from exception
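
Because every field sits at a fixed position, a 32-bit word can be split into all candidate fields without first knowing its format. The sketch below assumes the bit numbering used in the later slides (bit 0 is the most significant bit, so rs1 is bits 6..10); the helper names and the example opcode value are made up for illustration and are not real DLX encodings.

```python
def bits(word, first, last):
    """Extract bits first..last of a 32-bit word, numbering bit 0 as the MSB."""
    width = last - first + 1
    return (word >> (31 - last)) & ((1 << width) - 1)


def decode_fields(word):
    """Split a 32-bit word every way the three formats allow."""
    return {
        "opcode": bits(word, 0, 5),
        "rs1": bits(word, 6, 10),
        "rs2_or_rd": bits(word, 11, 15),  # rs2 (R-type) or rd (I-type)
        "rd_rtype": bits(word, 16, 20),   # rd (R-type only)
        "func": bits(word, 21, 31),       # R-type only
        "imm16": bits(word, 16, 31),      # I-type only
        "offset26": bits(word, 6, 31),    # J-type only
    }


# Hypothetical I-type word: opcode 0b100011, rs1 = 2, rd = 5, immediate = 8.
word = (0b100011 << 26) | (2 << 21) | (5 << 16) | 8
fields = decode_fields(word)
print(fields["opcode"], fields["rs1"], fields["rs2_or_rd"], fields["imm16"])  # 35 2 5 8
```

Which of the overlapping fields is meaningful depends on the opcode; the hardware can extract all of them in parallel and select later.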

9 5 Steps of DLX Instr. Execution: Step 1
Step 1: Instruction fetch cycle (IF)
- Read the instruction from memory and store it into IR: IR ← Mem[PC]
- Calculate the next instruction address: NPC ← PC + 4 (each instruction occupies 4 consecutive bytes)
[Figure: IF datapath with PC, instruction memory, the +4 adder, NPC, and IR.]

10 5 Steps of DLX Instr. Execution: Step 2
Step 2: Instruction decode/register fetch cycle (ID)
- Read the source registers into A and B: A ← Regs[IR[6..10]]; B ← Regs[IR[11..15]]
- Sign-extend the 16-bit immediate field to a 32-bit immediate value: Imm ← (IR[16])^16 ## IR[16..31], where (IR[16])^16 is bit 16 replicated 16 times and ## is concatenation
- Decoding is done in parallel with the register reads (fixed-field decoding)
- b ← Rd (the destination register address)
[Figure: ID datapath with IR, register file, 16-to-32-bit sign extension, and outputs A, B, Imm, b, and OP.]
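
The sign-extension step can be sketched in a few lines. This is an illustration only, assuming the immediate has already been extracted as a 16-bit value; the function names are hypothetical.

```python
def sign_extend16(imm16):
    """Sign-extend a 16-bit field to a 32-bit two's-complement pattern."""
    imm16 &= 0xFFFF
    sign = (imm16 >> 15) & 1            # IR bit 16, the top bit of the field
    upper = 0xFFFF if sign else 0x0000  # sixteen copies of the sign bit
    return (upper << 16) | imm16        # (sign)^16 ## imm16


def to_signed32(pattern):
    """Interpret a 32-bit pattern as a signed Python integer."""
    return pattern - (1 << 32) if pattern & 0x80000000 else pattern


print(hex(sign_extend16(0x0008)), hex(sign_extend16(0xFFFC)))  # 0x8 0xfffffffc
print(to_signed32(sign_extend16(0xFFFC)))                      # -4
```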

11 5 Steps of DLX Instr. Execution: Step 3
Step 3: Execution/effective address cycle (EX)
- Memory reference: effective address calculation
  » ALUOutput ← A + Imm
- Register-register ALU instruction: perform the ALU operation on the register operands A and B
  » ALUOutput ← A func B
- Register-immediate ALU instruction: perform the ALU operation with the immediate operand
  » ALUOutput ← A op Imm
- Branch: compute the branch target address and determine the condition
  » ALUOutput ← NPC + Imm; Cond ← (A op 0)

12 Step 3 EX
[Figure: EX-stage datapath with the Zero? test, MUX, and ALU; inputs NPC, A, B, Imm, and OP; outputs ALUOut and Cond.]

13 5 Steps of DLX Instr. Execution: Step 4
Step 4: Memory access/branch completion cycle (MEM)
- Memory reference: access memory, either for a load (LD): LMD ← Mem[ALUOutput], or for a store (ST): Mem[ALUOutput] ← B
- Branch: test the condition: if (cond) PC ← ALUOutput, else PC ← NPC
[Figure: MEM datapath with data memory and a MUX; inputs ALUOut, NPC, Cond, and B; outputs PC and LMD.]

14 5 Steps of DLX Instr. Execution: Step 5
Step 5: Write-back cycle (WB)
- Reg-Reg ALU: store the result into the destination register: Regs[IR[16..20]] ← ALUOutput
- Reg-Immediate ALU: store the result into the destination register: Regs[IR[11..15]] ← ALUOutput
- Load instruction: store the data read from memory into the destination register: Regs[IR[11..15]] ← LMD
[Figure: WB datapath with a MUX selecting LMD or ALUOut, under control of OP, for writing to the register file.]
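
The five steps can be strung together in a toy interpreter. This is a minimal sketch, not the lecture's implementation: instructions are assumed to be pre-decoded dictionaries rather than 32-bit words, only five opcodes are handled, the store value register is passed as "rs2", and the name execute_one is hypothetical.

```python
def execute_one(pc, regs, imem, dmem):
    """Run one instruction through IF, ID, EX, MEM, WB; return the new PC."""
    # Step 1, IF: fetch and compute the next sequential address.
    ir = imem[pc]
    npc = pc + 4
    # Step 2, ID: read both source registers and the immediate.
    a = regs[ir.get("rs1", 0)]
    b = regs[ir.get("rs2", 0)]
    imm = ir.get("imm", 0)
    op = ir["op"]
    # Step 3, EX: one ALU operation, chosen by instruction class.
    cond = False
    if op in ("LW", "SW", "ADDI"):
        alu_output = a + imm        # effective address or A op Imm
    elif op == "ADD":
        alu_output = a + b          # A func B
    elif op == "BEQZ":
        alu_output = npc + imm      # branch target
        cond = (a == 0)
    else:
        raise ValueError(f"unsupported op {op}")
    # Step 4, MEM: memory access or branch completion.
    lmd = None
    new_pc = npc
    if op == "LW":
        lmd = dmem[alu_output]
    elif op == "SW":
        dmem[alu_output] = b
    elif op == "BEQZ" and cond:
        new_pc = alu_output
    # Step 5, WB: write the result to the destination register.
    if op == "LW":
        regs[ir["rd"]] = lmd
    elif op in ("ADD", "ADDI"):
        regs[ir["rd"]] = alu_output
    return new_pc


imem = {
    0: {"op": "LW", "rs1": 0, "rd": 1, "imm": 100},     # R1 <- Mem[100]
    4: {"op": "ADDI", "rs1": 1, "rd": 2, "imm": 5},     # R2 <- R1 + 5
    8: {"op": "ADD", "rs1": 1, "rs2": 2, "rd": 3},      # R3 <- R1 + R2
    12: {"op": "SW", "rs1": 0, "rs2": 3, "imm": 104},   # Mem[104] <- R3
    16: {"op": "BEQZ", "rs1": 0, "imm": 8},             # taken: PC <- 20 + 8
}
regs = [0] * 32
dmem = {100: 7}
pc = 0
while pc in imem:
    pc = execute_one(pc, regs, imem, dmem)
print(regs[1], regs[2], regs[3], dmem[104], pc)  # 7 12 19 19 28
```

Each instruction here finishes all five steps before the next one starts, which is the unpipelined behavior the rest of the lecture improves on.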

15 5 Steps of DLX Datapath
[Figure: the complete datapath divided into IF, ID, EX, MEM, and WB stages, showing PC, the +4 adder, instruction memory, register file, 16-to-32-bit sign extension, Zero? test, MUXes, ALU, data memory, and the SMD, ALU Output, and LMD registers.]

16 A Simple Implementation
A multi-cycle implementation
- needs temporary registers: NPC, IR, A, B, Imm, Cond, ALUOutput, LMD
- CPI improvements: branches take 4 cycles and ALU instructions take 4 cycles; memory-reference (MR) instructions take 5 cycles
- if branch freq. is 12% and ALU instr. freq. is 44%, the remaining 44% are MR instructions: CPI = 0.12 x 4 + 0.44 x 4 + 0.44 x 5 = 4.44
A single-cycle implementation
- one long clock cycle
- very inefficient for most machines, because instructions vary considerably in the amount of work they need
- requires duplication of functional units (FUs) that could be shared in a multi-cycle implementation
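
The CPI figure is a frequency-weighted average of cycle counts. A small sketch of that arithmetic, using the slide's instruction mix (the function name average_cpi and the dictionary layout are illustrative):

```python
def average_cpi(mix):
    """mix maps a class name to (frequency, cycles); frequencies must sum to 1."""
    total_freq = sum(freq for freq, _ in mix.values())
    if abs(total_freq - 1.0) > 1e-9:
        raise ValueError("instruction frequencies must sum to 1")
    return sum(freq * cycles for freq, cycles in mix.values())


mix = {
    "branch": (0.12, 4),
    "alu": (0.44, 4),
    "memory_reference": (0.44, 5),
}
print(round(average_cpi(mix), 2))  # 4.44
```

If every instruction took all five cycles the CPI would be 5, so letting branches and ALU instructions finish in four cycles is where the improvement comes from.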

17 Visualizing Pipeline
[Figure: five instructions in program order against time in clock cycles (CC1 to CC9); each instruction passes through IM, Reg, ALU, DM, and Reg, and the filling and draining periods of the pipeline are marked.]
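
A chart of this kind can be generated with a few lines of code. The sketch below assumes an ideal pipeline with no stalls, where each instruction starts one clock cycle after the previous one; pipeline_chart is a hypothetical helper.

```python
STAGES = ["IM", "Reg", "ALU", "DM", "Reg"]


def pipeline_chart(n_instructions, stages=STAGES):
    """Return rows of stage names; row i is shifted right by i clock cycles."""
    total_cycles = n_instructions + len(stages) - 1
    rows = []
    for i in range(n_instructions):
        row = [""] * total_cycles
        for s, name in enumerate(stages):
            row[i + s] = name
        rows.append(row)
    return rows


chart = pipeline_chart(5)
print("      " + " ".join(f"CC{c + 1:<2}" for c in range(len(chart[0]))))
for i, row in enumerate(chart):
    print(f"I{i + 1:<4} " + " ".join(f"{cell:<4}" for cell in row))
```

Five instructions through five stages need 5 + 5 - 1 = 9 clock cycles, and only in CC5 is every stage busy; before that the pipeline is filling and after that it is draining.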

18 Saving Information Produced by Each Stage of Pipeline
- Each pipeline stage produces information (data, address, and control) by the end of the clock cycle
- That information needs to be stored at the end of the clock cycle, otherwise it will be lost
- Thus, we need storage (called an inter-stage buffer) at the end of each pipeline stage

19 Inter-Stage Buffer in DLX Pipeline
F/D Buffer
- IR, NPC
D/A Buffer
- A, B, Imm, b (destination register address for storing the result), OP (opcode), cond
- NPC
A/M Buffer
- ALUout (arithmetic result or effective address)
- NPC, cond, b, OP
M/W Buffer
- LMD (data for LD)
- ALUout (arithmetic result), b, OP
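
One way to see why b and OP appear in every buffer after decode is to model the buffers as a shift register that is clocked once per cycle. This is a deliberately simplified sketch: each record is assumed to be pre-decoded and carried whole from buffer to buffer, whereas the real buffers hold the different field sets listed above. The class and its names are hypothetical.

```python
class InterStageBuffers:
    """Four buffers clocked together, like a shift register of records."""

    NAMES = ("F/D", "D/A", "A/M", "M/W")

    def __init__(self):
        self.slots = [None] * 4  # None marks an empty slot

    def clock(self, fetched):
        """On a clock edge each buffer takes the record from the one before it."""
        leaving = self.slots[-1]
        self.slots = [fetched] + self.slots[:-1]
        return leaving


bufs = InterStageBuffers()
program = [{"OP": "ADD", "b": 3}, {"OP": "LW", "b": 1}, {"OP": "SUB", "b": 4}]
for cycle in range(1, 8):
    fetched = program[cycle - 1] if cycle <= len(program) else None
    bufs.clock(fetched)
    print(cycle, bufs.slots[3])  # contents of the M/W buffer after this cycle
```

The ADD fetched in cycle 1 reaches the M/W buffer at the end of cycle 4, by which time three younger instructions have overwritten the earlier buffers. The write-back stage therefore has to find the destination register b and the opcode OP in the M/W buffer itself.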

20 Pipelined DLX Datapath - Multicycle -
[Figure: the five-stage datapath (IF, ID, EX, MEM, WB) with the F/D, D/A, A/M, and M/W buffers placed between adjacent stages.]

21 Reminder
- In a conventional single-port memory, instruction memory and data memory are the same memory
  - Both the IF and MEM stages use memory
  - One instruction uses the same hardware resource in two different cycles
  - Two instructions in different pipeline stages try to use the same hardware resource at the same time
- For branch instructions, the branch target address is available in the MEM stage
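
The memory conflict described above can be located mechanically. This is a sketch assuming an ideal five-stage pipeline with no stalls, in which instruction i occupies stage s during cycle i + s (counting from 0); memory_conflicts is an illustrative name.

```python
IF_STAGE, MEM_STAGE = 0, 3  # positions of IF and MEM in IF, ID, EX, MEM, WB


def memory_conflicts(uses_data_memory):
    """Return (cycle, fetching, accessing) triples where IF and MEM collide.

    uses_data_memory[i] is True when instruction i is a load or store.
    """
    conflicts = []
    for i, is_mem in enumerate(uses_data_memory):
        fetching = i + (MEM_STAGE - IF_STAGE)  # instruction in IF that cycle
        if is_mem and fetching < len(uses_data_memory):
            conflicts.append((i + MEM_STAGE, fetching, i))
    return conflicts


# LW, ADD, SUB, AND, SW, OR: only the first and fifth touch data memory.
print(memory_conflicts([True, False, False, False, True, False]))  # [(3, 3, 0)]
```

In cycle 3 the load (instruction 0) is in MEM while instruction 3 is being fetched, so a single-port memory cannot serve both. The store near the end causes no conflict here only because no instruction is left to fetch when it reaches MEM.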

