Published by Emmanuel Ramsell. Modified over 9 years ago.
1 IKI20210 Pengantar Organisasi Komputer (Introduction to Computer Organization)
Lecture no. 25: Pipeline
10 January 2003
Bobby Nazief (nazief@cs.ui.ac.id), Johny Moningka (moningka@cs.ui.ac.id)
Lecture materials: http://www.cs.ui.ac.id/~iki20210/
Sources:
1. Hamacher. Computer Organization, 4th ed.
2. CS152 lecture materials, 1997, UCB.
2 Pipeline: One Way to Speed Up Instruction Execution
3 Pipelining is Natural!
° Laundry example
° Ann, Brian, Cathy, and Dave each have one load of clothes to wash, dry, and fold
° Washer takes 30 minutes
° Dryer takes 40 minutes
° “Folder” takes 20 minutes
[Figure: the four loads, labeled A, B, C, D]
4 Sequential Laundry
° Sequential laundry takes 6 hours for 4 loads
° If they learned pipelining, how long would laundry take?
[Figure: timeline from 6 PM to midnight; loads A, B, C, D in task order, each taking 30, 40, and 20 minutes back to back]
5 Pipelined Laundry: Start work ASAP
° Pipelined laundry takes 3.5 hours for 4 loads
[Figure: timeline from 6 PM to midnight; loads A, B, C, D in task order, overlapped across the washer (30), dryer (40), and folder (20)]
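The 6-hour and 3.5-hour figures can be checked with a short calculation. The following is a minimal sketch, not part of the original slides: the function names are made up for illustration, and it assumes each machine handles one load at a time and that a finished load can wait between machines.

```python
# Stage times in minutes from the laundry example: wash, dry, fold.
STAGES = [30, 40, 20]

def sequential_minutes(loads, stages=STAGES):
    """Each load finishes every stage before the next load starts."""
    return loads * sum(stages)

def pipelined_minutes(loads, stages=STAGES):
    """A load enters a stage as soon as it has left the previous stage
    and the stage itself is free (assumption: one load per stage)."""
    stage_free = [0] * len(stages)   # time at which each stage becomes free
    finish = 0
    for _ in range(loads):
        t = 0                        # time this load is ready for its next stage
        for i, duration in enumerate(stages):
            start = max(t, stage_free[i])
            t = start + duration
            stage_free[i] = t
        finish = t
    return finish

print(sequential_minutes(4) / 60)  # 6.0 (hours)
print(pipelined_minutes(4) / 60)   # 3.5 (hours)
```

The 40-minute dryer sets the pace: after the first wash, one load leaves the dryer every 40 minutes, so the total is 30 + 4 x 40 + 20 = 210 minutes.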
6 Pipelining Lessons
° Pipelining doesn’t help the latency of a single task; it helps the throughput of the entire workload
° Pipeline rate is limited by the slowest pipeline stage
° Multiple tasks operate simultaneously using different resources
° Potential speedup = number of pipe stages
° Unbalanced lengths of pipe stages reduce speedup
° Time to “fill” the pipeline and time to “drain” it reduce speedup
° Stall for dependences
[Figure: pipelined laundry timeline from 6 PM; loads A, B, C, D in task order, with stage times 30, 40, and 20 minutes]
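The effect of unbalanced stages and of fill/drain time can be made concrete with a small calculation. This is an illustrative sketch under stated assumptions: all tasks are identical, each stage serves one task at a time, and the function name `speedup` is hypothetical.

```python
def speedup(n_tasks, stages):
    """Sequential time divided by pipelined time for n identical tasks."""
    sequential = n_tasks * sum(stages)
    # The first task takes sum(stages); every later task finishes one
    # slowest-stage interval after the task before it.
    pipelined = sum(stages) + (n_tasks - 1) * max(stages)
    return sequential / pipelined

for n in (4, 100, 10_000):
    unbalanced = speedup(n, [30, 40, 20])   # laundry stage times
    balanced = speedup(n, [30, 30, 30])     # same total work, equal stages
    print(n, round(unbalanced, 2), round(balanced, 2))
```

With many tasks the balanced pipeline approaches a speedup of 3, the number of stages, while the laundry pipeline approaches only 90/40 = 2.25 because the dryer is the slowest stage. With only 4 tasks, fill and drain time keep both well below their limits.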
7 Pipelining Instruction Execution
8 Flashback: Instruction Execution Steps
Instruction: Add R1,(R3) ; R1 ← R1 + M[R3]
Steps:
1. Fetch the instruction
   1. PC out, MAR in, Read, Clear Y, Set carry-in to ALU, Add, Z in
   2. Z out, PC in, WMFC
   3. MDR out, IR in
2. Fetch operand #1 (the contents of the memory location pointed to by R3)
   4. R3 out, MAR in, Read
   5. R1 out, Y in, WMFC
3. Perform the addition
   6. MDR out, Add, Z in
4. Store the result of the addition in R1
   7. Z out, R1 in, End
9 The Five Stages of (MIPS) Load Instruction
° Ifetch: Instruction Fetch
° Reg/Dec: Register Fetch and Instruction Decode
° Exec: Calculate the memory address
° Mem: Read the data from the Data Memory
° Wr: Write the data back to the register file
[Figure: Load occupies Ifetch, Reg/Dec, Exec, Mem, Wr in Cycles 1 through 5]
Load/Store Architecture: access to/from memory only by Load/Store instructions
10 Pipelined Execution
° Overlapping instruction execution
° Maximum number of instructions executing simultaneously = number of stages
[Figure: six instructions in program flow order, each passing through IFetch, Dcd, Exec, Mem, WB and starting one cycle after the previous one]
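The staggered diagram can be reproduced in a few lines. This is a sketch only: it assumes an ideal pipeline with no stalls, one instruction issued per cycle, and the helper names `pipeline_chart` and `max_in_flight` are made up for illustration.

```python
STAGES = ["IFetch", "Dcd", "Exec", "Mem", "WB"]

def pipeline_chart(n_instructions, stages=STAGES):
    """One row per instruction; each row holds the stage occupied in every
    clock cycle, or '' when the instruction is not in the pipeline."""
    total_cycles = n_instructions + len(stages) - 1
    chart = []
    for i in range(n_instructions):
        row = [""] * total_cycles
        for s, name in enumerate(stages):
            row[i + s] = name      # instruction i is in stage s during cycle i + s
        chart.append(row)
    return chart

def max_in_flight(chart):
    """Largest number of instructions active in any single cycle."""
    return max(sum(1 for cell in column if cell) for column in zip(*chart))

chart = pipeline_chart(6)
for row in chart:
    print(" ".join(f"{cell:>6}" for cell in row))
print("max in flight:", max_in_flight(chart))
```

For six instructions and five stages the chart spans 10 cycles, and at most five instructions are active at once, matching the number of stages.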
11 Why Pipeline?
° Non-pipeline machine: 10 ns/cycle x 4.6 CPI (due to instr mix) x 100 inst = 4600 ns
° Ideal pipelined machine: 10 ns/cycle x (4 cycle fill + 1 CPI x 100 inst) = 1040 ns
[Figure: clock-cycle timelines, Cycles 1 to 10. Non-pipeline implementation: Load (Ifetch, Reg, Exec, Mem, Wr), then Store (Ifetch, Reg, Exec, Mem), then R-type (Ifetch, ...), one after another. Pipeline implementation: Load, Store, and R-type each pass through Ifetch, Reg, Exec, Mem, Wr, overlapped one cycle apart]
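The two totals follow directly from the numbers on the slide. A small sketch of the arithmetic (the variable names are illustrative, not from the lecture):

```python
CYCLE_NS = 10            # clock period
INSTRUCTIONS = 100
NONPIPELINED_CPI = 4.6   # average cycles per instruction for the instruction mix
FILL_CYCLES = 4          # cycles needed to fill the pipeline
PIPELINED_CPI = 1        # ideal pipeline: one instruction completes per cycle

nonpipelined_ns = CYCLE_NS * NONPIPELINED_CPI * INSTRUCTIONS
pipelined_ns = CYCLE_NS * (FILL_CYCLES + PIPELINED_CPI * INSTRUCTIONS)

print(round(nonpipelined_ns))                    # 4600
print(pipelined_ns)                              # 1040
print(round(nonpipelined_ns / pipelined_ns, 2))  # 4.42
```

The ideal pipelined machine is therefore about 4.4 times faster on this 100-instruction run; the 4 fill cycles become negligible as the instruction count grows.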
12 Why Pipeline? Because the resources are there!
[Figure: instruction order (Inst 0 to Inst 4) versus time in clock cycles; each instruction uses Im, Reg, ALU, Dm, Reg in successive cycles, starting one cycle after the previous instruction]
13 Restructuring Datapath
14 Partitioning the Datapath (1/2)
° Add registers between smallest steps
  - Store instruction
  - Store source (register) operands
  - Store results
  - Store read-data (from memory)
[Figure: datapath split into Instruction Fetch, Operand Fetch, Exec, Mem Access, and Result Store, with PC, Next PC, Reg. File, and Data Mem; control signals nPC_sel, RegDst, ALUSrc, ExtOp, ALUctr, MemWr, MemRd, RegWr]
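15 Partitioning the Datapath (2/2)
[Figure: pipelined datapath with PC, Next PC, Inst. Mem, IR, Reg File, registers A and B, ALU, Equal, register R, Mem Access, Data Mem, register M, and a second Reg. File port; the instruction register is carried along as Valid, IRexe, IRmem, IRwb, feeding Dcd Ctrl, Ex Ctrl, Mem Ctrl, and WB Ctrl. Load occupies Ifetch, Reg/Dec, Exec, Mem, Wr in Cycles 1 through 5]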
16 Pipeline Hazards
17 Can pipelining get us into trouble?
° Yes: pipeline hazards
  - Structural hazards: attempt to use the same resource in two different ways at the same time
    E.g., a combined washer/dryer would be a structural hazard, as would the folder being busy doing something else (watching TV)
  - Data hazards: attempt to use an item before it is ready
    E.g., one sock of a pair is in the dryer and the other in the washer; you can’t fold until the sock in the washer has gone through the dryer
    An instruction depends on the result of a prior instruction still in the pipeline
  - Control hazards: attempt to make a decision before the condition is evaluated
    E.g., when washing football uniforms you need the proper detergent level; you must see the result after the dryer before putting the next load in
    Branch instructions
° Can always resolve hazards by waiting
  - Pipeline control must detect the hazard
  - Take action (or delay action) to resolve hazards
18 Single Memory is a Structural Hazard
[Figure: instruction order (Load, Instr 1, Instr 2, Instr 3, Instr 4) versus time in clock cycles; each instruction uses Mem, Reg, ALU, Mem, Reg in successive cycles. The data access of Load and the instruction fetch of Instr 3 need the single memory in the same cycle]
Detection is easy in this case! (right half highlight means read, left half write)
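Why detection is easy here can be shown with a small check. This is a sketch under explicit assumptions: a five-stage pipeline (fetch, register read, ALU, memory, write-back), one instruction issued per cycle, and a single memory shared by instruction fetch and data access. The function name `memory_conflicts` is hypothetical.

```python
def memory_conflicts(program):
    """program: opcode strings in issue order, one issued per cycle.
    Returns (cycle, data_instr_index, fetch_instr_index) for every cycle in
    which a load/store data access and a later instruction fetch both need
    the single memory. Cycles are numbered from 1."""
    conflicts = []
    for i, op in enumerate(program):
        if op in ("load", "store"):
            mem_cycle = i + 4          # fetch = i+1, reg = i+2, ALU = i+3, mem = i+4
            fetcher = mem_cycle - 1    # instruction k is fetched in cycle k+1
            if fetcher < len(program):
                conflicts.append((mem_cycle, i, fetcher))
    return conflicts

print(memory_conflicts(["load", "instr1", "instr2", "instr3", "instr4"]))
# [(4, 0, 3)]
```

The result says that in cycle 4 the load (index 0) and the fetch of Instr 3 (index 3) collide. Only the opcode and the distance between instructions are needed, which is why the hardware check is simple.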
19 Control Hazard Solutions
° Stall: wait until the decision is clear
  - It’s possible to move the decision up to the 2nd stage by adding hardware to check registers as they are being read
° Impact: 2 clock cycles per branch instruction => slow
[Figure: instruction order (Add, Beq, Load) versus time in clock cycles; Load’s fetch is delayed until the branch decision is known]
20 Control Hazard Solutions
° Predict: guess one direction, then back up if wrong
  - Predict not taken
° Impact: 1 clock cycle per branch instruction if right, 2 if wrong (right 50% of the time)
° More dynamic scheme: history of 1 branch (≈ 90%)
[Figure: instruction order (Add, Beq, Load) versus time in clock cycles; Load is fetched immediately after Beq]
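The average cost of a branch under prediction is a weighted sum of the two cases. A minimal sketch using the slide's figures (1 cycle if the guess is right, 2 if wrong); the function name is illustrative:

```python
def cycles_per_branch(p_correct, right=1, wrong=2):
    """Expected clock cycles per branch for a given prediction accuracy."""
    return p_correct * right + (1 - p_correct) * wrong

print(round(cycles_per_branch(0.5), 2))  # 1.5  (predict not taken, right 50% of the time)
print(round(cycles_per_branch(0.9), 2))  # 1.1  (1-branch history, right about 90% of the time)
```

Both are better than the 2 cycles per branch paid when every branch stalls, and the gain grows with prediction accuracy.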
21 Control Hazard Solutions
° Redefine branch behavior (takes place after next instruction): “delayed branch”
° Impact: 0 clock cycles per branch instruction if an instruction can be found to put in the “slot” (≈ 50% of the time)
° Less useful as more instructions are launched per clock cycle
[Figure: instruction order (Add, Beq, Misc, Load) versus time in clock cycles; Misc fills the slot after Beq]
22 Data Hazard on r1
    add r1,r2,r3
    sub r4,r1,r3
    and r6,r1,r7
    or  r8,r1,r9
    xor r10,r1,r11
23 Data Hazard on r1:
° Dependencies backwards in time are hazards
[Figure: instruction order versus time in clock cycles, stages IF, ID/RF, EX, MEM, WB; add r1,r2,r3 is followed by sub r4,r1,r3, and r6,r1,r7, or r8,r1,r9, and xor r10,r1,r11, each using Im, Reg, ALU, Dm, Reg and each reading r1]
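A read-after-write check over the instruction sequence can be sketched as follows. The assumptions are mine, not the slides': there is no forwarding, the register file is written in the first half of a cycle and read in the second half, so only the two instructions right after a writer read a stale value (`window=2`). The helper names `parse` and `raw_hazards` are hypothetical, and the parser only handles the three-register `op rd,rs,rt` form used on the slide.

```python
def parse(instr):
    """Split 'op rd,rs,rt' into (op, dest, [sources])."""
    op, operands = instr.split(None, 1)
    regs = [r.strip() for r in operands.split(",")]
    return op, regs[0], regs[1:]

def raw_hazards(program, window=2):
    """Return (producer_index, consumer_index, register) for every read of a
    register written by one of the previous `window` instructions."""
    parsed = [parse(instr) for instr in program]
    hazards = []
    for j, (_, _, sources) in enumerate(parsed):
        for i in range(max(0, j - window), j):
            dest = parsed[i][1]
            if dest in sources:
                hazards.append((i, j, dest))
    return hazards

program = [
    "add r1,r2,r3",
    "sub r4,r1,r3",
    "and r6,r1,r7",
    "or r8,r1,r9",
    "xor r10,r1,r11",
]
print(raw_hazards(program))
# [(0, 1, 'r1'), (0, 2, 'r1')]
```

Under these assumptions `sub` and `and` read r1 before `add` has written it, while `or` and `xor` are safe. A pipeline whose register file cannot write and read in the same cycle would need a larger `window`.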
24 Data Hazard Solution:
° “Forward” result from one stage to another
[Figure: instruction order versus time in clock cycles, stages IF, ID/RF, EX, MEM, WB; the result of add r1,r2,r3 is forwarded to sub r4,r1,r3, and r6,r1,r7, or r8,r1,r9, and xor r10,r1,r11]
25 Forwarding Structure
° Detect the nearest valid write to the operand register and forward it into the op latches, bypassing the remainder of the pipe
° Increase muxes to add paths from pipeline registers
° Data Forwarding = Data Bypassing
[Figure: datapath with PC, IAU, npc, I mem, Regs, latches A, B, S, and m, alu, D mem, and a forward mux; the pipeline registers carry the op, rw, rs, and rt fields]
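The "nearest valid write" rule can be modeled as a simple selection function. This is an illustrative software sketch of the forward mux, not the lecture's hardware design: the name `select_operand` and the `(dest_reg, value, valid)` tuple layout are assumptions.

```python
def select_operand(reg, regfile_value, in_flight):
    """Choose the value for source register `reg`.

    in_flight: (dest_reg, value, valid) entries for older instructions still
    in the pipe, ordered nearest (most recently issued) first.
    Returns (value, forwarded_flag)."""
    for dest, value, valid in in_flight:
        if valid and dest == reg:
            return value, True        # bypass the register file
    return regfile_value, False       # no pending write: use the register file

# The nearest writer wins when two in-flight instructions write r1.
pending = [("r1", 7, True), ("r4", 3, True), ("r1", 99, True)]
print(select_operand("r1", 0, pending))   # (7, True)
print(select_operand("r9", 5, pending))   # (5, False)
```

Scanning from the nearest entry outward is what guarantees the consumer sees the most recent value; in hardware the same priority is fixed by the order of the mux inputs.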
26 Forwarding (or Bypassing): What about Loads?
° Dependencies backwards in time are hazards
° Can’t be solved with forwarding: an instruction that depends on a load must be delayed/stalled
[Figure: time in clock cycles, stages IF, ID/RF, EX, MEM, WB; lw r1,0(r2) followed by sub r4,r1,r3, which needs r1 before the load has read it from Dm]
27 Execution Delay/Stall
[Figure: time in clock cycles, stages IF, ID/RF, EX, MEM, WB; lw r1,0(r2), then a no-op, then sub r4,r1,r3, so that sub reaches EX after the load has read r1 from Dm]
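The stall shown above can be sketched as a pass over the instruction list that inserts a no-op after a load whose result is used by the very next instruction. This is a simplified illustration: it assumes forwarding covers every other case, treats the first register of each instruction as its destination (so stores are not handled), and the names `parse` and `insert_load_stalls` are made up.

```python
import re

def parse(instr):
    """Return (op, dest, [sources]); registers are tokens like r1, r12."""
    parts = instr.split(None, 1)
    op = parts[0]
    if len(parts) == 1:               # e.g. 'no-op' has no operands
        return op, None, []
    regs = re.findall(r"r\d+", parts[1])
    return op, regs[0], regs[1:]

def insert_load_stalls(program):
    """Insert 'no-op' after each lw whose destination is read by the
    instruction immediately following it."""
    out = []
    for k, instr in enumerate(program):
        out.append(instr)
        op, dest, _ = parse(instr)
        if op == "lw" and k + 1 < len(program):
            _, _, next_sources = parse(program[k + 1])
            if dest in next_sources:
                out.append("no-op")
    return out

print(insert_load_stalls(["lw r1,0(r2)", "sub r4,r1,r3"]))
# ['lw r1,0(r2)', 'no-op', 'sub r4,r1,r3']
```

A compiler can often avoid the wasted cycle by moving an independent instruction into the position of the no-op; hardware that detects the same condition stalls the dependent instruction for one cycle instead.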