ECE232: Hardware Organization and Design

ECE232: Hardware Organization and Design
Part 11: Pipelining
Chapter 4/6
http://www.ecs.umass.edu/ece/ece232/

CPI Calculation
CPI stands for the average number of Cycles Per Instruction.
Assume an instruction mix of 24% loads, 12% stores, 44% R-format, 18% branches, and 2% jumps.
CPI = 0.24 * 5 + 0.12 * 4 + 0.44 * 4 + 0.18 * 3 + 0.02 * 3 = 4.04
Speedup?
Question: Can we achieve a CPI of 1?
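
The weighted sum above can be checked with a few lines of Python. This is only a sketch: the mix and the per-class cycle counts come from the slide, while the dictionary keys and the function name are made up for illustration.

```python
# Instruction mix and cycles per instruction class, as given on the slide.
mix = {"load": 0.24, "store": 0.12, "r_format": 0.44, "branch": 0.18, "jump": 0.02}
cycles = {"load": 5, "store": 4, "r_format": 4, "branch": 3, "jump": 3}

def average_cpi(mix, cycles):
    """Weighted average of cycles per instruction over the instruction mix."""
    return sum(mix[kind] * cycles[kind] for kind in mix)

cpi = average_cpi(mix, cycles)
print(round(cpi, 2))
```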

Speeding up through pipelining
Ann, Brian, Cathy, and Dave (A, B, C, D) each have one load of clothes to wash, dry, fold, and put away.
- Washer takes 30 minutes
- Dryer takes 30 minutes
- "Folder" takes 30 minutes
- "Stasher" takes 30 minutes to put clothes into drawers

Sequential Laundry
[Figure: timeline from 6 PM to 2 AM; loads A, B, C, D in task order, each running its four 30-minute stages before the next load starts]
Sequential laundry takes 8 hours for 4 loads.
If they learned pipelining, how long would laundry take?

Pipelined Laundry: Start work ASAP
[Figure: timeline starting at 6 PM; loads A, B, C, D in task order, each starting 30 minutes after the previous one]
Pipelined laundry takes 3.5 hours for 4 loads!
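
A short sketch of the arithmetic behind the 8-hour and 3.5-hour figures, assuming four equal 30-minute stages (washer, dryer, folder, stasher) and no other overhead. The function names are illustrative only.

```python
def sequential_minutes(loads, stages, stage_minutes):
    """Each load runs all of its stages before the next load starts."""
    return loads * stages * stage_minutes

def pipelined_minutes(loads, stages, stage_minutes):
    """The first load needs `stages` steps; every later load finishes one step after the previous one."""
    return (stages + loads - 1) * stage_minutes

print(sequential_minutes(4, 4, 30) / 60)  # hours for 4 loads, one after another
print(pipelined_minutes(4, 4, 30) / 60)   # hours for 4 loads, overlapped
```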

Pipelining Lessons
[Figure: pipelined laundry timeline, 6 PM to 9 PM; loads A, B, C, D in task order, 30 minutes per stage]
- Pipelining doesn't help the latency of a single task; it helps the throughput of the entire workload
- Multiple tasks operate simultaneously using different resources
- Potential speedup = number of pipe stages
- Pipeline rate is limited by the slowest pipeline stage
- Unbalanced lengths of pipe stages reduce speedup
- Time to "fill" the pipeline and time to "drain" it reduce speedup

Pipelining Instructions
Stage times: Fetch = 10 ns, Decode = 6 ns, Execute = 8 ns, Memory = 10 ns, Write back = 6 ns
[Figure: instructions versus time (in cycles); each instruction passes through F, D, EX, M, W, and successive instructions start one cycle apart]
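
The stage times above illustrate the "slowest stage sets the rate" lesson. The sketch below assumes the clock period equals the longest stage time and ignores pipeline register overhead; neither assumption is stated on the slide.

```python
# Stage latencies in nanoseconds, as listed on the slide.
stage_ns = {"F": 10, "D": 6, "EX": 8, "M": 10, "W": 6}

# Assumption: the clock must be long enough for the slowest stage.
pipeline_cycle_ns = max(stage_ns.values())

# Time for one instruction if the stages simply ran back to back, unpipelined.
unpipelined_latency_ns = sum(stage_ns.values())

# Time for one instruction to traverse the pipeline: every stage takes a full cycle.
pipelined_latency_ns = pipeline_cycle_ns * len(stage_ns)

print(pipeline_cycle_ns, unpipelined_latency_ns, pipelined_latency_ns)
```

Under these assumptions a single instruction is no faster (its latency even grows because short stages are padded to a full cycle), but once the pipeline is full one instruction completes every cycle.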

Single Cycle, Multiple Cycle, vs. Pipeline
[Figure: clock timing diagrams. Single Cycle Implementation: Load, then Store, with the unused end of the Store cycle marked "Waste". Multiple Cycle Implementation (Cycle 1 to Cycle 10): Load (Ifetch, Reg, Exec, Mem, Wr), Store (Ifetch, Reg, Exec, Mem), R-type (Ifetch, ...). Pipeline Implementation: Load, Store, and R-type each pass through Ifetch, Reg, Exec, Mem, Wr, starting one cycle apart.]
These timing diagrams show the differences between the single cycle, multiple cycle, and pipeline implementations. For example, in the pipeline implementation, we can finish executing the Load, Store, and R-type instruction sequence in seven cycles. In the multiple cycle implementation, however, we cannot start executing the store until Cycle 6 because we must wait for the load instruction to complete. Similarly, we cannot start executing the R-type instruction until the store instruction has completed its execution in Cycle 9.
In the single cycle implementation, the cycle time is set to accommodate the longest instruction, the load. Consequently, the cycle time for the single cycle implementation can be up to five times longer than that of the multiple cycle implementation. Perhaps more importantly, since the cycle time has to be long enough for the load instruction, it is too long for the store instruction, so the last part of the store's cycle is wasted.
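
The cycle numbers quoted in these notes can be reproduced with a small sketch. It assumes the cycle counts used elsewhere in the deck (load 5, store 4, R-type 4) and an ideal 5-stage pipeline with no stalls; the helper names are hypothetical.

```python
# (instruction, cycles needed on the multiple cycle implementation)
sequence = [("load", 5), ("store", 4), ("r_type", 4)]

def multicycle_schedule(instrs):
    """Return (name, first_cycle, last_cycle); each instruction waits for the previous one to finish."""
    schedule = []
    next_start = 1
    for name, n_cycles in instrs:
        schedule.append((name, next_start, next_start + n_cycles - 1))
        next_start += n_cycles
    return schedule

def pipelined_finish_cycle(num_instrs, num_stages=5):
    """Ideal pipeline: a new instruction enters every cycle, so the last one finishes here."""
    return num_stages + num_instrs - 1

for row in multicycle_schedule(sequence):
    print(row)
print(pipelined_finish_cycle(len(sequence)))
```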

Why Pipeline?
Suppose we execute 100 instructions with an instruction mix of 24% loads, 12% stores, 44% R-format, 18% branches, and 2% jumps.
- Single cycle machine: 45 ns/cycle x 1 CPI x 100 inst = 4500 ns
- Multicycle machine: 10 ns/cycle x 4.04 CPI (for the given inst mix) x 100 inst = 4040 ns
- Ideal pipelined machine (with 5 stages): 10 ns/cycle x (1 CPI x 100 inst + 4 cycle drain) = 1040 ns
Speedup = 4.33 vs. single cycle, 3.88 vs. multicycle (for the given inst mix)
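
The three execution times and the two speedups can be recomputed as follows. All numbers are taken from the slide; only the variable names are invented.

```python
n_instructions = 100

single_cycle_ns = 45 * 1 * n_instructions        # 45 ns/cycle, CPI of 1
multicycle_ns = 10 * 4.04 * n_instructions       # 10 ns/cycle, CPI of 4.04
pipelined_ns = 10 * (1 * n_instructions + 4)     # 10 ns/cycle, CPI of 1, plus 4 extra cycles

speedup_vs_single = single_cycle_ns / pipelined_ns
speedup_vs_multi = multicycle_ns / pipelined_ns

print(single_cycle_ns, round(multicycle_ns), pipelined_ns)
print(round(speedup_vs_single, 2), round(speedup_vs_multi, 2))
```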

Why Pipeline? Because the resources are there!
[Figure: Inst 1 through Inst 5 in instruction order versus time (clock cycles); each instruction uses Im, Reg, ALU, Dm, and Reg in successive cycles]

Pipelining Rules
[Figure: Inst 1 through Inst 5 occupying the IMem, Reg, ALU, DMem, Reg stages]
- Forward traveling signals at each stage are latched
- Only perform logic on signals in the same stage
- Signal labeling is useful to prevent errors, e.g., IRR, IRA, IRM, IRW
- Backward traveling signals at each stage represent hazards

MIPS Pipelined Datapath
State registers between pipeline stages isolate the stages from one another.
Stages: IF (IFetch), ID (Dec), EX (Execute), MEM (MemAccess), WB (WriteBack)
[Figure: pipelined datapath with PC, Instruction Memory, Register File, Sign Extend (16 to 32), Shift left 2, two adders, ALU, Data Memory, the pipeline registers IFetch/Dec, Dec/Exec, Exec/Mem, and Mem/WB, and the System Clock; Inst 1 through Inst 5 occupy the five stages]
Note two exceptions to the left-to-right flow:
- WB, which writes the result back into the register file in the middle of the datapath
- Selection of the next value of the PC, where one input comes from the branch address calculated in the MEM stage
Only later instructions in the pipeline can be influenced by these two reverse data movements. The first one (WB to ID) leads to data hazards. The second one (MEM to IF) leads to control hazards.
All instructions must update some state in the processor (the register file, the memory, or the PC), so a separate pipeline register after the last stage would be redundant to the state that is updated and is not needed.
The PC can be thought of as a pipeline register: the one that feeds the IF stage of the pipeline. Unlike all of the other pipeline registers, the PC is part of the visible architecture state; its content must be saved when an exception occurs (the contents of the other pipeline registers are discarded).

Pipeline Hazards
Data hazards: an instruction uses the result of a previous instruction (RAW)
    ADD R1, R2, R3          SW R1, 4(R2)
    SUB R4, R1, R5    or    LW R3, 4(R2)
Control hazards: the address of the next instruction to be executed depends on a previous instruction
    BEQ R1,R2,CONT
    SUB R6,R7,R8
    …
    CONT: ADD R3,R4,R5
Structural hazards: two instructions need access to the same resource, e.g., a single memory shared for instruction fetch and load/store
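
A minimal sketch of how the register RAW case could be detected in a short instruction sequence. The `Instr` tuple, the `raw_hazards` helper, and the `window` of three following instructions are all assumptions made for illustration; the memory case (SW followed by LW to the same address) is not modeled.

```python
from typing import NamedTuple, Optional, Tuple

class Instr(NamedTuple):
    text: str                 # assembly text, for display only
    dest: Optional[str]       # register written, or None
    srcs: Tuple[str, ...]     # registers read

def raw_hazards(program, window=3):
    """Return (producer_index, consumer_index, register) for each read of a
    register written by one of the previous `window` instructions."""
    found = []
    for i, producer in enumerate(program):
        if producer.dest is None:
            continue
        for j in range(i + 1, min(i + 1 + window, len(program))):
            if producer.dest in program[j].srcs:
                found.append((i, j, producer.dest))
            if program[j].dest == producer.dest:
                break  # register overwritten; later readers depend on the newer write
    return found

program = [
    Instr("ADD R1, R2, R3", "R1", ("R2", "R3")),
    Instr("SUB R4, R1, R5", "R4", ("R1", "R5")),
]
print(raw_hazards(program))
```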

Structural Hazard
[Figure: lw followed by Inst 1 through Inst 4 in instruction order versus time (clock cycles), each using Mem, Reg, ALU, Mem, Reg; in one cycle lw is reading data from memory while Inst 3 is reading an instruction from memory]
Fix with separate instruction and data memories (I$ and D$)

Data Hazards (RAW)
    ADD R1, R2, R3    F D EX M W    (writes data to R1 in W)
    SUB R4, R1, R5      F D EX M W  (gets data from R1 in D)
[Figure: instructions versus time (in cycles); SUB needs R1 before ADD has written it]

One Way to Handle a Data Hazard
By waiting (introducing stalls), but this impacts CPI
[Figure: add $1,… followed by three stall cycles, then sub $4,$1,$5; each instruction passes through IM, Reg, ALU, DM, Reg]

Must allow Wr/Rd in REG in same cycle
Split the cycle into two halves
[Figure: Inst 1 through Inst 5 in instruction order versus time (clock cycles), each using Im, Reg, ALU, Dm, Reg]

Only two stall cycles
Write in 1st half, read in 2nd half
[Figure: add $1,… followed by two stall cycles, then sub $4,$1,$5 and and $6,$1,$7; each instruction passes through IM, Reg, ALU, DM, Reg]

Register File (write and then read)
Fix the register file access hazard by doing writes in the first half of the cycle and reads in the second half.
[Figure: add $1,… followed by Inst 1, Inst 2, and or $8,$1,$9 in instruction order versus time (clock cycles); the figure marks the clock edge that controls loading of the pipeline state registers]

Forwarding with Load-use Data Hazards
[Figure: lw $1,4($2) followed by sub $4,$1,$5, and $6,$1,$7, or $8,$1,$9, and xor $4,$1,$5 in instruction order; each instruction passes through IM, Reg, ALU, DM, Reg]
- lw is just another example of register usage (beyond ALU ops)
- A stall is needed even with forwarding when the data hazard involves a load: sub needs to stall
- One stall cycle is still needed even with forwarding
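
The stall counts from the last few slides (three stalls with plain waiting, two with a split register-file cycle, one for a load-use pair even with forwarding) can be summarized in one hypothetical helper. It assumes the classic 5-stage pipeline with full forwarding to the ALU inputs; `distance` is 1 when the consumer immediately follows the producer.

```python
def stall_cycles(producer_is_load, distance, forwarding, split_regfile=True):
    """Stall cycles a consumer needs before it can use the producer's result."""
    if forwarding:
        # ALU results can be forwarded in time; a loaded value arrives one cycle later.
        gap = 1 if producer_is_load else 0
    else:
        # Without forwarding the consumer waits for the register write:
        # 2 cycles if write and read share a cycle (split halves), otherwise 3.
        gap = 2 if split_regfile else 3
    return max(0, gap - (distance - 1))

print(stall_cycles(False, 1, forwarding=False, split_regfile=False))  # plain waiting
print(stall_cycles(False, 1, forwarding=False))                       # split register-file cycle
print(stall_cycles(True, 1, forwarding=True))                         # load-use with forwarding
```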

Injecting Bubbles
[Figure: the pipelined datapath (IF, ID, EX, MEM, WB) shown as a bubble is injected between lw and sub; the stage contents change from (and, sub, lw, Inst -1, Inst -2) to (and, sub, bubble, lw, Inst -1)]

3 Types of Data Hazards
RAW (read after write)
- only hazard for 'fixed' pipelines
- later instruction must read after earlier instruction writes
    add $1,$2,$3    F D EX M W
    sub $4,$1,$5    F D EX M W
WAW (write after write)
- variable-length pipeline
- later instruction must write after earlier instruction writes
    div $1,$4,$3    F D E1 E2 E3 E4 E5 W
    add $1,$2,$5    F D EX M W
WAR (write after read)
- instruction with late read (e.g., waiting for an execution unit)
- later instruction must write after earlier instruction reads
    mlt $4,$1,$3    F D s1 s2 s3 s4 s5 E1 E2 E3 W
    add $1,$2,$5    F D EX M W
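
The three definitions can be expressed as a small classifier over register names. This is a sketch only: it reports the dependences between two instructions, and whether a dependence becomes an actual hazard depends on the pipeline timing shown on the slide. The `(dest, srcs)` tuples and the function name are assumptions.

```python
def classify_dependences(earlier, later):
    """Each instruction is (dest, srcs). Return the set of dependence types
    that the later instruction has on the earlier one."""
    e_dest, e_srcs = earlier
    l_dest, l_srcs = later
    kinds = set()
    if e_dest is not None and e_dest in l_srcs:
        kinds.add("RAW")  # later reads what earlier writes
    if e_dest is not None and e_dest == l_dest:
        kinds.add("WAW")  # both write the same register
    if l_dest is not None and l_dest in e_srcs:
        kinds.add("WAR")  # later writes what earlier reads
    return kinds

add_1 = ("$1", ("$2", "$3"))   # add $1,$2,$3
sub_4 = ("$4", ("$1", "$5"))   # sub $4,$1,$5
div_1 = ("$1", ("$4", "$3"))   # div $1,$4,$3
add_1b = ("$1", ("$2", "$5"))  # add $1,$2,$5
mlt_4 = ("$4", ("$1", "$3"))   # mlt $4,$1,$3

print(classify_dependences(add_1, sub_4))
print(classify_dependences(div_1, add_1b))
print(classify_dependences(mlt_4, add_1b))
```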

Control Hazard
    JR R25          F D EX M W    (destination available here)
    ...
    XX: ADD ...     F D EX M W    (need destination here)
[Figure: instructions versus time (in cycles); the fetch of the next instruction needs the jump destination before JR has produced it]
Simple solution: flush instruction fetch until the branch is resolved
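
To get a feel for the cost of this simple solution, the sketch below adds a fixed penalty to every control instruction. The 18% branch and 2% jump fractions are the mix used earlier in the deck; the penalty values and the function name are assumptions, since the slide does not say how many cycles are lost.

```python
def cpi_with_control_stalls(base_cpi, control_fraction, penalty_cycles):
    """Average CPI when every control instruction costs `penalty_cycles` extra cycles."""
    return base_cpi + control_fraction * penalty_cycles

control_fraction = 0.18 + 0.02  # branches plus jumps

# Hypothetical penalties: the later the destination is resolved, the more cycles are lost.
for penalty in (1, 2, 3):
    print(penalty, round(cpi_with_control_stalls(1.0, control_fraction, penalty), 2))
```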