Download presentation
Presentation is loading. Please wait.
Published byMaurice Cummings Modified over 9 years ago
1
Orange Coast College Business Division Computer Science Department CS 116- Computer Architecture Pipelining
2
OCC - CS/CIS CS116-Ch00-Orientation 2 1998 Morgan Kaufmann Publishers ( Augmented & Modified by M.Malaty) 2 OCC-CS 116 Fall 2003 1998 Morgan Kaufmann Publishers (Augmented & Modified by M.Malaty and M. Beers) Topics Explaining pipelining through example CPU pipelining MIPS instructions & pipelining Pipeline hazards Recent trends in performance
3
OCC - CS/CIS CS116-Ch00-Orientation 3 1998 Morgan Kaufmann Publishers ( Augmented & Modified by M.Malaty) 3 OCC-CS 116 Fall 2003 1998 Morgan Kaufmann Publishers (Augmented & Modified by M.Malaty and M. Beers) Introduction Still in CPU’s Datapath Control Datapath Memory Processor Input Output
4
OCC - CS/CIS CS116-Ch00-Orientation 4 1998 Morgan Kaufmann Publishers ( Augmented & Modified by M.Malaty) 4 OCC-CS 116 Fall 2003 1998 Morgan Kaufmann Publishers (Augmented & Modified by M.Malaty and M. Beers) What is pipelining? Implementation technique in which multiple instructions are overlapped in execution Real-life pipelining examples? – Laundry – Factory production lines – Traffic?? Key to making processors fast
5
OCC - CS/CIS CS116-Ch00-Orientation 5 1998 Morgan Kaufmann Publishers ( Augmented & Modified by M.Malaty) 5 OCC-CS 116 Fall 2003 1998 Morgan Kaufmann Publishers (Augmented & Modified by M.Malaty and M. Beers) Pipelining Example: Laundry You have 4 loads of cloths to wash: Steps (stages) required: – Wash – Dry – Fold – Store clothes into drawers Maybe let your roommate help here Each stage needs 30 minutes We can’t start the next step until the previous step is finished ABCD
6
OCC - CS/CIS CS116-Ch00-Orientation 6 1998 Morgan Kaufmann Publishers ( Augmented & Modified by M.Malaty) 6 OCC-CS 116 Fall 2003 1998 Morgan Kaufmann Publishers (Augmented & Modified by M.Malaty and M. Beers) Pipelining Example: Laundry There are 2 approaches to do this job: – Sequential (non-pipelined): Wait until the first load is put away in order to start the next load – Pipelined (ASAP): As soon as the washer is empty, start putting the next load, while the first load is put into dryer
7
OCC - CS/CIS CS116-Ch00-Orientation 7 1998 Morgan Kaufmann Publishers ( Augmented & Modified by M.Malaty) 7 OCC-CS 116 Fall 2003 1998 Morgan Kaufmann Publishers (Augmented & Modified by M.Malaty and M. Beers) Pipelining Example: Laundry Sequential – Needs 8 hours for 4 loads A B C D T i me 7 6 P M 89 1 0 11 12 1 2 AM T a sk order
8
OCC - CS/CIS CS116-Ch00-Orientation 8 1998 Morgan Kaufmann Publishers ( Augmented & Modified by M.Malaty) 8 OCC-CS 116 Fall 2003 1998 Morgan Kaufmann Publishers (Augmented & Modified by M.Malaty and M. Beers) Pipelining Example: Laundry Pipelined Laundry: – Start work ASAP – Needs only 3.5 hours for 4 loads!
9
OCC - CS/CIS CS116-Ch00-Orientation 9 1998 Morgan Kaufmann Publishers ( Augmented & Modified by M.Malaty) 9 OCC-CS 116 Fall 2003 1998 Morgan Kaufmann Publishers (Augmented & Modified by M.Malaty and M. Beers) Pipelined Laundry Observations: At some point, all stages of washing will be operating concurrently Pipelining doesn’t reduce number of stages – Doesn’t help latency of single task – Does help throughput of entire workload As long as we have separate resources, we can pipeline the tasks Multiple tasks operating simultaneously use different resources
10
OCC - CS/CIS CS116-Ch00-Orientation 10 1998 Morgan Kaufmann Publishers ( Augmented & Modified by M.Malaty) 10 OCC-CS 116 Fall 2003 1998 Morgan Kaufmann Publishers (Augmented & Modified by M.Malaty and M. Beers) Pipelined Laundry Observations: Speedup due to pipelining depends on the number of stages in the pipeline Pipeline rate limited by slowest pipeline stage – If dryer needs 45 min, time for all stages has to be 45 min to accommodate it – Unbalanced lengths of pipe stages reduces speedup Time to “fill” pipeline and time to “drain” it reduces speedup If one load depends on anther, we will have to wait (Delay/Stall for Dependencies)
11
OCC - CS/CIS CS116-Ch00-Orientation 11 1998 Morgan Kaufmann Publishers ( Augmented & Modified by M.Malaty) 11 OCC-CS 116 Fall 2003 1998 Morgan Kaufmann Publishers (Augmented & Modified by M.Malaty and M. Beers) Pipelining Example: Problems If we have washer-dryer combination instead of separate washer & dryer, we can’t make the two steps in parallel If your roommate is busy doing something else, you will have to add a stage of going to the drawer and putting your stuff yourself – One resource for folding – Another resource for put away => Structural Hazard
12
OCC - CS/CIS CS116-Ch00-Orientation 12 1998 Morgan Kaufmann Publishers ( Augmented & Modified by M.Malaty) 12 OCC-CS 116 Fall 2003 1998 Morgan Kaufmann Publishers (Augmented & Modified by M.Malaty and M. Beers) Pipelining Example: Problems If you are cleaning the uniform of a football team, you need to determine whether the detergent & water temperature settings are ok or should be changed, depending on how dirty are the uniforms – Stall ? Slow Wait until you get the dry uniforms to see if you need to change setup. Repeat until you have the right formula – Continue with the next load & check formula later? => Control hazards
13
OCC - CS/CIS CS116-Ch00-Orientation 13 1998 Morgan Kaufmann Publishers ( Augmented & Modified by M.Malaty) 13 OCC-CS 116 Fall 2003 1998 Morgan Kaufmann Publishers (Augmented & Modified by M.Malaty and M. Beers) Pipelining Example: Problems In the folding process, if you discover that the load of socks you have need the mate of each of the socks that is still the the washer, you will have to wait until they dry in order to be able to fold => Data hazards
14
OCC - CS/CIS CS116-Ch00-Orientation 14 1998 Morgan Kaufmann Publishers ( Augmented & Modified by M.Malaty) 14 OCC-CS 116 Fall 2003 1998 Morgan Kaufmann Publishers (Augmented & Modified by M.Malaty and M. Beers) CPU Pipelining Can we pipeline instruction execution? For the following instructions, which resources do you need for each of these steps? – store/ load word – add/ subtract/ and/ or/ slt – branch if equal
15
OCC - CS/CIS CS116-Ch00-Orientation 15 1998 Morgan Kaufmann Publishers ( Augmented & Modified by M.Malaty) 15 OCC-CS 116 Fall 2003 1998 Morgan Kaufmann Publishers (Augmented & Modified by M.Malaty and M. Beers) CPU Pipelining Review: 5 stages of a MIPS instruction – Fetch instruction from instruction memory – Read registers while decoding instruction – Execute operation or calculate address, depending on the instruction type – Access an operand from data memory – Write result into a register We can reduce the cycles to fit the stages. Cycle 1Cycle 2Cycle 3Cycle 4Cycle 5 IfetchReg/DecExecMemWr Load
16
OCC - CS/CIS CS116-Ch00-Orientation 16 1998 Morgan Kaufmann Publishers ( Augmented & Modified by M.Malaty) 16 OCC-CS 116 Fall 2003 1998 Morgan Kaufmann Publishers (Augmented & Modified by M.Malaty and M. Beers) CPU Pipelining Review: Datapath resources
17
OCC - CS/CIS CS116-Ch00-Orientation 17 1998 Morgan Kaufmann Publishers ( Augmented & Modified by M.Malaty) 17 OCC-CS 116 Fall 2003 1998 Morgan Kaufmann Publishers (Augmented & Modified by M.Malaty and M. Beers) CPU Pipelining Example: Resources for Load Instruction – Fetch instruction from instruction memory (Ifetch) Instruction memory (IM) – Read registers while decoding instruction(Reg/Dec) Register file & decoder (Reg) – Execute operation or calculate address ALU – Access an operand from data memory (Mem) Data memory (DM) – Write result into a register (Wr) Register file (Reg)
18
OCC - CS/CIS CS116-Ch00-Orientation 18 1998 Morgan Kaufmann Publishers ( Augmented & Modified by M.Malaty) 18 OCC-CS 116 Fall 2003 1998 Morgan Kaufmann Publishers (Augmented & Modified by M.Malaty and M. Beers) CPU Pipelining Note that accessing source & destination registers is performed in two different parts of the cycle We need to decide upon which part of the cycle should reading and writing to the register file take place.
19
OCC - CS/CIS CS116-Ch00-Orientation 19 1998 Morgan Kaufmann Publishers ( Augmented & Modified by M.Malaty) 19 OCC-CS 116 Fall 2003 1998 Morgan Kaufmann Publishers (Augmented & Modified by M.Malaty and M. Beers) CPU Pipelining
20
OCC - CS/CIS CS116-Ch00-Orientation 20 1998 Morgan Kaufmann Publishers ( Augmented & Modified by M.Malaty) 20 OCC-CS 116 Fall 2003 1998 Morgan Kaufmann Publishers (Augmented & Modified by M.Malaty and M. Beers) CPU Pipelining: Example Assumptions: – Only consider the following instructions: lw, sw, add, sub, and, or, slt, beq – Operation times for instruction classes are: Memory access 2 ns ALU operation 2 ns Register file read or write 1 ns – Use a single- cycle model – Clock cycle must accommodate the slowest instruction (2 ns) – Both pipelined & non-pipelined approaches use the same HW components
21
OCC - CS/CIS CS116-Ch00-Orientation 21 1998 Morgan Kaufmann Publishers ( Augmented & Modified by M.Malaty) 21 OCC-CS 116 Fall 2003 1998 Morgan Kaufmann Publishers (Augmented & Modified by M.Malaty and M. Beers) CPU Pipelining: Example Single-Cycle, non-pipelined execution Total time for 3 instructions: 24 ns
22
OCC - CS/CIS CS116-Ch00-Orientation 22 1998 Morgan Kaufmann Publishers ( Augmented & Modified by M.Malaty) 22 OCC-CS 116 Fall 2003 1998 Morgan Kaufmann Publishers (Augmented & Modified by M.Malaty and M. Beers) CPU Pipelining: Example Single-cycle, pipelined execution – Improve performance by increasing throughput Total time for 3 instructions = 14 ns Each instruction adds 2 ns to total execution time – Stage time limited by slowest resource (2 ns)
23
OCC - CS/CIS CS116-Ch00-Orientation 23 1998 Morgan Kaufmann Publishers ( Augmented & Modified by M.Malaty) 23 OCC-CS 116 Fall 2003 1998 Morgan Kaufmann Publishers (Augmented & Modified by M.Malaty and M. Beers) CPU Pipelining Example: Instructions total times: Assumption: – No delay in the following processor units: MUX, Control unit, PC access, Sign-extend units Slowest resources: – Instruction memory – ALU – Data memory
24
OCC - CS/CIS CS116-Ch00-Orientation 24 1998 Morgan Kaufmann Publishers ( Augmented & Modified by M.Malaty) 24 OCC-CS 116 Fall 2003 1998 Morgan Kaufmann Publishers (Augmented & Modified by M.Malaty and M. Beers) CPU Pipelining Example: Theoretically: – Speedup should be equal to number of stages Practically: – Stages are imperfectly balanced – Pipelining needs overhead – Speedup less than number of stages If we have 3 consecutive instructions – Non-pipelined needs 8 x 3 = 24 ns – Pipelined needs 14 ns => Speedup = 24 / 14 = 1.7
25
OCC - CS/CIS CS116-Ch00-Orientation 25 1998 Morgan Kaufmann Publishers ( Augmented & Modified by M.Malaty) 25 OCC-CS 116 Fall 2003 1998 Morgan Kaufmann Publishers (Augmented & Modified by M.Malaty and M. Beers) CPU Pipelining Example: If we have 1003 consecutive instructions – Add more time for 1000 instruction (i.e. 1003 instruction)on the previous example Non-pipelined total time= 1000 x 8 + 24 = 8024 ns Pipelined total time = 1000 x 2 + 14 = 2014 ns => Speedup ~ 3.98 ~(8 ns / 2 ns) ~ near perfect speedup => Performance increases for larger number of instructions – Higher throughput
26
OCC - CS/CIS CS116-Ch00-Orientation 26 1998 Morgan Kaufmann Publishers ( Augmented & Modified by M.Malaty) 26 OCC-CS 116 Fall 2003 1998 Morgan Kaufmann Publishers (Augmented & Modified by M.Malaty and M. Beers) Pipelining MIPS Instruction Set MIPS was designed with pipelining in mind => Pipelining is easy in MIPS: 1.All instruction are the same length 2.Limited instruction format 3.Memory operands appear only in lw & sw 4.Operands must be aligned in memory
27
OCC - CS/CIS CS116-Ch00-Orientation 27 1998 Morgan Kaufmann Publishers ( Augmented & Modified by M.Malaty) 27 OCC-CS 116 Fall 2003 1998 Morgan Kaufmann Publishers (Augmented & Modified by M.Malaty and M. Beers) Pipelining MIPS Instruction Set 1) All MIPS instruction are the same length – Fetch instruction in 1st pipeline stage – Decode instructions in 2nd stage – If instruction length varies (e.g. 80x86), pipelining will be more challenging
28
OCC - CS/CIS CS116-Ch00-Orientation 28 1998 Morgan Kaufmann Publishers ( Augmented & Modified by M.Malaty) 28 OCC-CS 116 Fall 2003 1998 Morgan Kaufmann Publishers (Augmented & Modified by M.Malaty and M. Beers) Pipelining MIPS Instruction Set 2) MIPS has limited instruction format – Source register in the same place for each instruction Symmetry – 2nd stage can begin reading at the same time as decoding – If instruction format wasn’t symmetric, stage 2 should be split into 2 distinct stages => Total stages = 6 (instead of 5)
29
OCC - CS/CIS CS116-Ch00-Orientation 29 1998 Morgan Kaufmann Publishers ( Augmented & Modified by M.Malaty) 29 OCC-CS 116 Fall 2003 1998 Morgan Kaufmann Publishers (Augmented & Modified by M.Malaty and M. Beers) Pipelining MIPS Instruction Set 3) Memory operands appear only in lw & sw – We can use the execute stage to calculate memory address – Access memory in the next stage – If we needed to operate on operands in memory (e.g. 80x86), stages 3 & 4 would expand to Address calculation Memory access Execute
30
OCC - CS/CIS CS116-Ch00-Orientation 30 1998 Morgan Kaufmann Publishers ( Augmented & Modified by M.Malaty) 30 OCC-CS 116 Fall 2003 1998 Morgan Kaufmann Publishers (Augmented & Modified by M.Malaty and M. Beers) Pipelining MIPS Instruction Set 4) Operands must be aligned in memory – Transfer of more than one data operand can be done in a single stage with no conflicts – Need not worry about single data transfer instruction requiring 2 data memory accesses – Requested data can be transferred between the CPU & memory in a single pipeline stage
31
OCC - CS/CIS CS116-Ch00-Orientation 31 1998 Morgan Kaufmann Publishers ( Augmented & Modified by M.Malaty) 31 OCC-CS 116 Fall 2003 1998 Morgan Kaufmann Publishers (Augmented & Modified by M.Malaty and M. Beers) Pipelined Datapath Basic Idea: – Resources for Different Stages should be Independent – Add registers & other HW to split datapath into stages I n st r u c on m e mo r y Ad d r es s 4 32 0 Add A dd r e sut Sh f t l e f t2 I n s r uct i o n M u x 0 1 Add P C 0 W r e d ata M u x 1 R eg ist e r s Re a d d ata 1 Re a d d ata 2 R e ad eg ist e r 1 R e a d r eg ist e r 2 1 6 S ig n e x t end W rit e r eg ist e r W ri t e d ata R e a d d ata Add r e s s Data mem o r y 1 AL U r e s u lt M u x A L U Z e r o I F : I n st ru c ti o n fetc h I D: I n st ruc t i o n d eco d e/ re g is t er f i le r e ad E X : Exec ut e / addr ess c al c u l ati on M E M: M e m o r y a cce s s WB: Wri te b a ck Registers should be here
32
OCC - CS/CIS CS116-Ch00-Orientation 32 1998 Morgan Kaufmann Publishers ( Augmented & Modified by M.Malaty) 32 OCC-CS 116 Fall 2003 1998 Morgan Kaufmann Publishers (Augmented & Modified by M.Malaty and M. Beers) Pipelined Datapath Example: add Instruction Terminology: – Shading indicates that the element is used Shading on the right means reading Shading on the left means writing – For register file Hashed means not used at the corresponding half of the cycle add $s0, $t0, $t1 T i me 2468 1 0 I F ID WB EX MEM ReadWrite
33
OCC - CS/CIS CS116-Ch00-Orientation 33 1998 Morgan Kaufmann Publishers ( Augmented & Modified by M.Malaty) 33 OCC-CS 116 Fall 2003 1998 Morgan Kaufmann Publishers (Augmented & Modified by M.Malaty and M. Beers) Pipelined Datapath Graphically Representing Pipelines Can help with answering questions like: – How many cycles does it take to execute this code? – What is the ALU doing during cycle 4?
34
OCC - CS/CIS CS116-Ch00-Orientation 34 1998 Morgan Kaufmann Publishers ( Augmented & Modified by M.Malaty) 34 OCC-CS 116 Fall 2003 1998 Morgan Kaufmann Publishers (Augmented & Modified by M.Malaty and M. Beers) Pipelined Datapath Another way to represent pipelining IFetchDcdExecMemWB IFetchDcdExecMemWB IFetchDcdExecMemWB IFetchDcdExecMemWB IFetchDcdExecMemWB IFetchDcdExecMemWB Program Flow Time
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.