Published by Miranda Curtis. Modified over 8 years ago.
1
CS 704 Advanced Computer Architecture
Lecture 10: Computer Hardware Design (Pipeline Datapath and Control Design)
Prof. Dr. M. Ashraf Chughtai
2
MAC/VU-Advanced Computer Architecture, Lecture 10: Computer Hardware Design (4)
Recap: Lecture 9
- Single-cycle versus multi-cycle datapath
- Key components of the multi-cycle datapath
- Design and information flow in the multi-cycle datapath
- Multi-cycle control unit design
- Finite state machine-based control unit
- Microprogram-based controller
3
What is pipelining?
- Pipelining is a fundamental concept.
- It utilizes the capabilities of the datapath by
4
Pipelining is Natural! Laundry Example
- Four loads: A, B, C, D
- Four laundry operations: wash, dry, fold, and place into drawers
- Washer takes 30 minutes
- Dryer takes 30 minutes
- "Folder" takes 30 minutes
- "Stasher" takes 30 minutes to put clothes into drawers
5
Sequential Laundry
[Figure: task order A, B, C, D plotted against a time axis from 6 PM to 2 AM, in 30-minute steps.]
6
Pipelined Laundry: Start work ASAP
Pipelined laundry takes 3.5 hours for 4 loads!
[Figure: task order A, B, C, D plotted against a time axis from 6 PM to 2 AM, in 30-minute steps.]
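The 3.5-hour figure can be checked with a short calculation. This is a minimal sketch, assuming every operation takes exactly 30 minutes and a new load can start as soon as the washer is free; the function names are illustrative and not part of the lecture.

```python
def sequential_minutes(loads, stages, stage_minutes):
    """Each load finishes all of its stages before the next load starts."""
    return loads * stages * stage_minutes


def pipelined_minutes(loads, stages, stage_minutes):
    """The first load needs `stages` steps; every later load adds one step."""
    return (stages + loads - 1) * stage_minutes


LOADS, STAGES, STAGE_MINUTES = 4, 4, 30  # values from the laundry example

seq = sequential_minutes(LOADS, STAGES, STAGE_MINUTES)
pipe = pipelined_minutes(LOADS, STAGES, STAGE_MINUTES)

print(f"sequential: {seq / 60} h")      # 8.0 h (6 PM to 2 AM)
print(f"pipelined:  {pipe / 60} h")     # 3.5 h
print(f"speedup:    {seq / pipe:.2f}")  # 2.29
```

With only four loads the speedup is about 2.3 rather than 4, because the time to fill and drain the pipeline is a large share of the total.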
7
Features of Pipelined Processor
- All the functional units operate independently.
- Multiple tasks operate simultaneously using different resources.
- Pipelining does not help the latency of a single task; it helps the throughput of the entire workload.
- Potential speedup = number of pipe stages.
8
Pipelining Lessons
- The pipeline rate is limited by the slowest pipeline stage.
- The time to "fill" the pipeline and the time to "drain" it reduce speedup.
- Unbalanced lengths of pipe stages reduce speedup: if the washer takes longer than the dryer, the dryer has to wait.
- Stall for dependences.
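The effect of fill/drain time and of an unbalanced stage can be illustrated numerically. The sketch below assumes a simple lock-step model in which every stage advances at the pace of the slowest one; the 60-minute washer is a hypothetical value chosen for illustration, not a figure from the lecture.

```python
def sequential_time(n_tasks, stage_times):
    """Unpipelined: each task runs through every stage on its own."""
    return n_tasks * sum(stage_times)


def pipelined_time(n_tasks, stage_times):
    """Lock-step pipeline: the step length is set by the slowest stage."""
    step = max(stage_times)
    return (len(stage_times) + n_tasks - 1) * step


def speedup(n_tasks, stage_times):
    return sequential_time(n_tasks, stage_times) / pipelined_time(n_tasks, stage_times)


balanced = [30, 30, 30, 30]    # laundry example: all stages take 30 minutes
unbalanced = [60, 30, 30, 30]  # hypothetical: the washer takes twice as long

for name, stages in (("balanced", balanced), ("unbalanced", unbalanced)):
    for n in (4, 1000):
        print(f"{name:<10} n={n:<5} speedup={speedup(n, stages):.2f}")
```

For the balanced pipeline the speedup approaches the number of stages (4) only when many loads amortize the fill and drain time. With the slow washer it stays below 150/60 = 2.5 however many loads are processed.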
9
Five Steps of Datapath
1. Instruction fetch (Ifetch)
2. Decode / register read (Dec/Reg)
3. Execute (Exec)
4. Memory access (Mem)
5. Write back (Wr)
10
Pipelined Processor Design
[Figure: five-stage datapath. Stages: Instruction Fetch; ID/Register Read; Execute/Address; Memory Rd/Wrt; Write Back (Reg. Wrt). Labelled blocks: PC, Next PC, Inst. Mem, IR, Reg File, A, B, Equal, Exec, S, Mem Access, Data Mem, M, Reg. File. Instruction registers and control: IRex, IRmem, IRwb, Dcd Ctrl, Ex Ctrl, Mem Ctrl, WB Ctrl.]
11
Pipeline Control
Register transfers by stage:
- Instruction Fetch: IR <- Mem[PC]; PC <- PC + 4
- ID/Reg. Rd: A <- R[rs]; B <- R[rt]
- Exe/Address: S <- A + B (R-type); S <- A or ZX (OR immediate); S <- A + SX (load and store); if Cond, PC <- PC + SX (branch)
- Memory Rd/Wrt: M <- Mem[S] (load); Mem[S] <- B (store)
- Reg. Wrt (WB): R[rd] <- S (R-type); R[rt] <- S (OR immediate); R[rd] <- M (load)
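The register transfers on this slide can be traced in software. The following is a minimal sketch, not the lecture's design: it runs one instruction at a time (no overlap between instructions) and covers only the add, OR-immediate and store transfers. The `ins` helper, the dictionary instruction format and the register/memory containers are hypothetical names introduced for illustration.

```python
def sign_extend16(imm):
    """SX: interpret a 16-bit immediate as a signed value."""
    imm &= 0xFFFF
    return imm - 0x10000 if imm & 0x8000 else imm


def ins(op, rs=0, rt=0, rd=0, imm=0):
    """Hypothetical instruction record used only in this sketch."""
    return {"op": op, "rs": rs, "rt": rt, "rd": rd, "imm": imm}


def execute(ir, regs, mem):
    """Apply the register transfers of one instruction, stage by stage."""
    a, b = regs[ir["rs"]], regs[ir["rt"]]  # ID/Reg. Rd: A <- R[rs]; B <- R[rt]
    if ir["op"] == "add":
        s = a + b                          # Exe: S <- A + B
        regs[ir["rd"]] = s                 # WB:  R[rd] <- S
    elif ir["op"] == "ori":
        s = a | (ir["imm"] & 0xFFFF)       # Exe: S <- A or ZX
        regs[ir["rt"]] = s                 # WB:  R[rt] <- S
    elif ir["op"] == "sw":
        s = a + sign_extend16(ir["imm"])   # Exe: S <- A + SX
        mem[s] = b                         # Mem: Mem[S] <- B
    else:
        raise ValueError(f"unsupported op: {ir['op']}")


def run(program, regs, mem):
    pc = 0
    while pc // 4 < len(program):
        ir = program[pc // 4]              # IF: IR <- Mem[PC]
        pc += 4                            # IF: PC <- PC + 4
        execute(ir, regs, mem)
    return pc


regs = [0] * 8
regs[1], regs[2] = 5, 7
mem = {}
program = [
    ins("add", rs=1, rt=2, rd=3),      # R3 = 5 + 7
    ins("ori", rs=3, rt=4, imm=0x10),  # R4 = R3 | 0x10
    ins("sw", rs=0, rt=4, imm=8),      # Mem[8] = R4
]
final_pc = run(program, regs, mem)
print(regs[3], regs[4], mem[8], final_pc)
```

In the pipelined processor the same transfers happen, but each stage works on a different instruction in the same clock cycle, which is why the intermediate values A, B, S and M are held in registers between stages.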
12
Pipelined Registers Included
[Figure: the five-stage datapath with pipeline registers. Stages: Instruction Fetch; ID/Register Read; Execute/Address; Memory Rd/Wrt; Write Back (Reg. Wrt). Labelled blocks: PC, Next PC, Inst. Mem, IR, Reg File, A, B, Equal, Exec, S, Mem Access, Data Mem, M, Reg. File. Instruction registers and control: IRex, IRmem, IRwb, Dcd Ctrl, Ex Ctrl, Mem Ctrl, WB Ctrl.]
13
Five Steps as Stages of Pipeline
Load instruction:
- Cycle 1: Ifetch
- Cycle 2: Reg/Dec
- Cycle 3: Exec
- Cycle 4: Mem
- Cycle 5: Wr
14
Multiple Cycle versus Pipeline: Pipeline enhances performance
[Figure: clock cycles 1 to 14.]
Multiple cycle implementation (one instruction at a time):
- Load: Ifetch, Reg, Exec, Mem, Wr
- Store: Ifetch, Reg, Exec, Mem
- R-type: Ifetch, ...
Pipeline implementation (each instruction starts one cycle after the previous one):
- Load: Ifetch, Reg, Exec, Mem, Wr
- Store: Ifetch, Reg, Exec, Mem, Wr
- R-type: Ifetch, Reg, Exec, Mem, Wr
15
3-instruction program reconsidered: Load, Store, R-type (ADD)
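The cycle counts for the three-instruction program can be tallied as follows. This is a rough sketch: the 5-cycle load and 4-cycle store match the multiple-cycle diagram, while the 4-cycle R-type is an assumption (the diagram is cut off after its Ifetch), and the pipelined count assumes no stalls.

```python
STAGES = 5  # Ifetch, Reg/Dec, Exec, Mem, Wr

# Cycles per instruction on the multi-cycle machine.
# "r-type": 4 is an assumption; it is not readable from the slide.
MULTICYCLE_CYCLES = {"load": 5, "store": 4, "r-type": 4}


def multicycle_total(program):
    """Instructions run back to back, each taking its own cycle count."""
    return sum(MULTICYCLE_CYCLES[op] for op in program)


def pipelined_total(program, stages=STAGES):
    """First instruction takes `stages` cycles; each later one adds a cycle."""
    return stages + len(program) - 1 if program else 0


program = ["load", "store", "r-type"]
print("multi-cycle:", multicycle_total(program), "cycles")  # 13 cycles
print("pipelined:  ", pipelined_total(program), "cycles")   # 7 cycles
```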
16
Example
The cycle time of a single-cycle machine is 45 ns, and that of the multi-cycle and pipelined machines is 10 ns; the average CPI due to the instruction mix on the multi-cycle machine is 4.6. What is the execution time of a 100-instruction program on each type of machine?
Answer:
- Single-cycle machine: 45 ns/cycle x 1 CPI x 100 inst = 4500 ns
- Multi-cycle machine: 10 ns/cycle x 4.6 CPI x 100 inst = 4600 ns
- Pipelined machine: 10 ns/cycle x (1 CPI x 100 inst + 4 cycles drain) = 1040 ns
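The three results can be reproduced with a few lines of arithmetic. This is a minimal sketch using the numbers from the example; the function name and the 100-instruction count (taken from the worked answer) are the only things introduced here.

```python
def execution_time_ns(cycle_ns, cpi, n_instructions, extra_cycles=0):
    """Execution time = cycle time x (CPI x instruction count + extra cycles)."""
    return cycle_ns * (cpi * n_instructions + extra_cycles)


N = 100  # instruction count used in the worked answer

single = execution_time_ns(cycle_ns=45, cpi=1, n_instructions=N)
multi = execution_time_ns(cycle_ns=10, cpi=4.6, n_instructions=N)
# A 5-stage pipeline needs 4 extra cycles to drain the last instruction.
pipelined = execution_time_ns(cycle_ns=10, cpi=1, n_instructions=N, extra_cycles=4)

print(f"single cycle: {single:.0f} ns")     # 4500 ns
print(f"multi cycle:  {multi:.0f} ns")      # 4600 ns
print(f"pipelined:    {pipelined:.0f} ns")  # 1040 ns
```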
17
Another Example
Consider a multi-cycle, unpipelined processor that requires 4 cycles for ALU and branch operations and 5 cycles for memory operations. Assume the relative frequencies of these operations are 40%, 25% and 35% respectively, and that the clock cycle is 1 ns. In the pipelined implementation, clock skew and setup add 0.2 ns of overhead to the clock cycle. Ignoring any latency impact, how much speedup does the pipelined processor achieve?
18
Solution
Unpipelined processor:
Average execution time per instruction = clock cycle x average CPI
= 1 ns x [(0.40 + 0.25) x 4 + 0.35 x 5]
= 1 ns x (2.60 + 1.75)
= 4.35 ns
Pipelined processor:
Average execution time per instruction = clock cycle + overhead
= 1 ns + 0.2 ns
= 1.2 ns
Speedup = 4.35 / 1.2 = 3.625, i.e. about 3.6 times
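The same calculation in code, as a small sketch. The dictionary layout and variable names are illustrative; the frequencies, cycle counts, clock cycle and overhead are the ones given in the problem, and the pipelined processor is assumed to complete one instruction per (lengthened) clock cycle.

```python
# (relative frequency, cycles) per operation class, from the problem statement
MIX = {
    "alu": (0.40, 4),
    "branch": (0.25, 4),
    "memory": (0.35, 5),
}
CLOCK_NS = 1.0
OVERHEAD_NS = 0.2  # clock skew and setup added by pipelining

average_cpi = sum(freq * cycles for freq, cycles in MIX.values())
unpipelined_ns = CLOCK_NS * average_cpi
pipelined_ns = CLOCK_NS + OVERHEAD_NS  # one instruction per clock, ideally
speedup = unpipelined_ns / pipelined_ns

print(f"average CPI:        {average_cpi:.2f}")
print(f"unpipelined time:   {unpipelined_ns:.2f} ns")
print(f"pipelined time:     {pipelined_ns:.2f} ns")
print(f"speedup:            {speedup:.3f}")
```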
19
Pipelined Execution Representation
Conventional representation: helps show the program flow vis-à-vis time.
[Figure: program flow (1st to 5th instruction) against time; each instruction passes through IFetch, Dcd, Exec, Mem, WB, and each row is offset by one stage from the row above.]
20
Graphical Representation
[Figure: instruction order (Instr 1 to Instr 5) against time in clock cycles CC1 to CC9; each instruction uses I.Mem, Reg, ALU, D.Mem, Reg in successive cycles.]
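The staircase layout of these diagrams can be generated programmatically. The sketch below assumes an ideal pipeline with no stalls; the function names and the text layout are illustrative only.

```python
STAGES = ["IFetch", "Dcd", "Exec", "Mem", "WB"]


def pipeline_chart(n_instructions, stages=STAGES):
    """Return one row per instruction; row[c] is the stage active in cycle c."""
    total_cycles = len(stages) + n_instructions - 1
    rows = []
    for i in range(n_instructions):
        row = [""] * total_cycles
        for s, name in enumerate(stages):
            row[i + s] = name  # instruction i enters stage s in cycle i + s
        rows.append(row)
    return rows


def render(rows):
    """Format the chart as text with CC1, CC2, ... column headers."""
    header = "".join(f"{'CC' + str(c + 1):<8}" for c in range(len(rows[0])))
    lines = [" " * 9 + header]
    for i, row in enumerate(rows, start=1):
        lines.append(f"Instr {i:<3}" + "".join(f"{cell:<8}" for cell in row))
    return "\n".join(lines)


rows = pipeline_chart(5)
print(render(rows))
```

Five instructions finish in nine clock cycles, and in CC5 all five stages are busy with different instructions.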
21
Why Pipeline? Because the resources are there!
[Figure: instruction order (Inst 0 to Inst 4) against time in clock cycles; each instruction uses Im, Reg, ALU, Dm, Reg in successive cycles.]
22
Can pipelining get us into trouble?
- Structural hazards
- Data hazards
- Control hazards
23
How do stalls degrade performance?
Pipelined CPI with stalls = ideal CPI + stall clock cycles per instruction
24
How do stalls degrade performance?
1. Speedup w.r.t. unpipelined = CPI unpipelined / (1 + stall cycles per instruction)
2. Speedup w.r.t. pipeline depth = pipeline depth / (1 + stall cycles per instruction)
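These formulas are easy to evaluate for a given stall rate. The sketch below assumes an ideal pipelined CPI of 1 and equal clock cycles on both machines; the 0.25 stall cycles per instruction and the 5-stage depth are hypothetical inputs chosen for illustration.

```python
def pipelined_cpi(stall_cycles_per_instr, ideal_cpi=1.0):
    """Pipelined CPI with stalls = ideal CPI + stall cycles per instruction."""
    return ideal_cpi + stall_cycles_per_instr


def speedup_vs_unpipelined(cpi_unpipelined, stall_cycles_per_instr):
    """Speedup = CPI unpipelined / (1 + stall cycles per instruction)."""
    return cpi_unpipelined / pipelined_cpi(stall_cycles_per_instr)


def speedup_vs_depth(pipeline_depth, stall_cycles_per_instr):
    """Speedup = pipeline depth / (1 + stall cycles per instruction)."""
    return pipeline_depth / pipelined_cpi(stall_cycles_per_instr)


STALLS = 0.25  # hypothetical: one stall cycle every four instructions
print(pipelined_cpi(STALLS))        # 1.25
print(speedup_vs_depth(5, STALLS))  # 4.0
print(speedup_vs_depth(5, 0.0))     # 5.0, the ideal case with no stalls
```

Even a quarter of a stall cycle per instruction costs a 5-stage pipeline one fifth of its ideal speedup.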
25
Summary
- Multi-cycle datapath versus pipelined datapath
- Key components of the pipelined datapath
- Performance enhancement due to pipelining
- Hazards in the pipelined datapath
26
Assalam-u-Alaikum and Allah Hafiz