Orange Coast College Business Division Computer Science Department CS 116- Computer Architecture Pipelining.

Slides:

Advertisements

Similar presentations

PipelineCSCE430/830 Pipeline: Introduction CSCE430/830 Computer Architecture Lecturer: Prof. Hong Jiang Courtesy of Prof. Yifeng Zhu, U of Maine Fall,

Advertisements

Lecture Objectives: 1)Define pipelining 2)Calculate the speedup achieved by pipelining for a given number of instructions. 3)Define how pipelining improves.

CMPT 334 Computer Organization

Review: Pipelining. Pipelining Laundry Example Ann, Brian, Cathy, Dave each have one load of clothes to wash, dry, and fold Washer takes 30 minutes Dryer.

Pipelining I (1) Fall 2005 Lecture 18: Pipelining I.

Pipelining Hwanmo Sung CS147 Presentation Professor Sin-Min Lee.

Goal: Describe Pipelining

Computer Architecture

CS252/Patterson Lec 1.1 1/17/01 Pipelining: Its Natural! Laundry Example Ann, Brian, Cathy, Dave each have one load of clothes to wash, dry, and fold Washer.

1  1998 Morgan Kaufmann Publishers Chapter Six Enhancing Performance with Pipelining.

Mary Jane Irwin ( ) [Adapted from Computer Organization and Design,

ENEE350 Ankur Srivastava University of Maryland, College Park Based on Slides from Mary Jane Irwin ( )

Prof. John Nestor ECE Department Lafayette College Easton, Pennsylvania Computer Organization Pipelined Processor Design 1.

1  2004 Morgan Kaufmann Publishers Chapter Six. 2  2004 Morgan Kaufmann Publishers Pipelining The laundry analogy.

Pipelining Andreas Klappenecker CPSC321 Computer Architecture.

Computer ArchitectureFall 2007 © October 22nd, 2007 Majd F. Sakr CS-447– Computer Architecture.

Pipelining Datapath Adapted from the lecture notes of Dr. John Kubiatowicz (UC Berkeley) and Hank Walker (TAMU)

1 COMP 206: Computer Architecture and Implementation Montek Singh Mon., Sep 9, 2002 Topic: Pipelining Basics.

1 Atanasoff–Berry Computer, built by Professor John Vincent Atanasoff and grad student Clifford Berry in the basement of the physics building at Iowa State.

CS 61C L30 Introduction to Pipelined Execution (1) Garcia, Fall 2004 © UCB Lecturer PSOE Dan Garcia inst.eecs.berkeley.edu/~cs61c.

Appendix A Pipelining: Basic and Intermediate Concepts

Computer ArchitectureFall 2008 © October 6th, 2008 Majd F. Sakr CS-447– Computer Architecture.

1  1998 Morgan Kaufmann Publishers Chapter Six Enhancing Performance with Pipelining.

Prof. John Nestor ECE Department Lafayette College Easton, Pennsylvania ECE Computer Organization Lecture 17 - Pipelined.

Introduction to Pipelining Rabi Mahapatra Adapted from the lecture notes of Dr. John Kubiatowicz (UC Berkeley)

9.2 Pipelining Suppose we want to perform the combined multiply and add operations with a stream of numbers: A i * B i + C i for i =1,2,3,…,7.

CS1104: Computer Organisation School of Computing National University of Singapore.

B 0000 Pipelining ENGR xD52 Eric VanWyk Fall

Pipelining (I). Pipelining Example  Laundry Example  Four students have one load of clothes each to wash, dry, fold, and put away  Washer takes 30.

Analogy: Gotta Do Laundry

CSE 340 Computer Architecture Summer 2014 Basic MIPS Pipelining Review.

CS.305 Computer Architecture Enhancing Performance with Pipelining Adapted from Computer Organization and Design, Patterson & Hennessy, © 2005, and from.

CMPE 421 Parallel Computer Architecture

1 Designing a Pipelined Processor In this Chapter, we will study 1. Pipelined datapath 2. Pipelined control 3. Data Hazards 4. Forwarding 5. Branch Hazards.

CSCI-365 Computer Organization Lecture Note: Some slides and/or pictures in the following are adapted from: Computer Organization and Design, Patterson.

ECE 232 L18.Pipeline.1 Adapted from Patterson 97 ©UCBCopyright 1998 Morgan Kaufmann Publishers ECE 232 Hardware Organization and Design Lecture 18 Pipelining.

Cs 152 L1 3.1 DAP Fa97,  U.CB Pipelining Lessons °Pipelining doesn’t help latency of single task, it helps throughput of entire workload °Multiple tasks.

Chap 6.1 Computer Architecture Chapter 6 Enhancing Performance with Pipelining.

CSIE30300 Computer Architecture Unit 04: Basic MIPS Pipelining Hsin-Chou Chi [Adapted from material by and

Oct. 18, 2000Machine Organization1 Machine Organization (CS 570) Lecture 4: Pipelining * Jeremy R. Johnson Wed. Oct. 18, 2000 *This lecture was derived.

Pipelining Example Laundry Example: Three Stages

Instructor: Senior Lecturer SOE Dan Garcia CS 61C: Great Ideas in Computer Architecture Pipelining Hazards 1.

Pipelining CS365 Lecture 9. D. Barbara Pipeline CS465 2 Outline  Today’s topic  Pipelining is an implementation technique in which multiple instructions.

LECTURE 7 Pipelining. DATAPATH AND CONTROL We started with the single-cycle implementation, in which a single instruction is executed over a single cycle.

CSE431 L06 Basic MIPS Pipelining.1Irwin, PSU, 2005 MIPS Pipeline Datapath Modifications  What do we need to add/modify in our MIPS datapath? l State registers.

Lecture 9. MIPS Processor Design – Pipelined Processor Design #1 Prof. Taeweon Suh Computer Science Education Korea University 2010 R&E Computer System.

CSCI-365 Computer Organization Lecture Note: Some slides and/or pictures in the following are adapted from: Computer Organization and Design, Patterson.

Advanced Computer Architecture CS 704 Advanced Computer Architecture Lecture 10 Computer Hardware Design (Pipeline Datapath and Control Design) Prof. Dr.

Lecture 18: Pipelining I.

Computer Organization

Pipelines An overview of pipelining

Morgan Kaufmann Publishers

Performance of Single-cycle Design

Pipeline Implementation (4.6)

ECE232: Hardware Organization and Design

Pipelining Lessons 6 PM T a s k O r d e B C D A 30

Chapter 4 The Processor Part 2

Lecturer: Alan Christopher

Serial versus Pipelined Execution

Pipelining Lessons 6 PM T a s k O r d e B C D A 30

The Processor Lecture 3.4: Pipelining Datapath and Control

An Introduction to pipelining

Pipelining Appendix A and Chapter 3.

Guest Lecturer: Justin Hsia

A relevant question Assuming you’ve got: One washer (takes 30 minutes)

Recall: Performance Evaluation

Presentation transcript:

Orange Coast College Business Division Computer Science Department CS 116- Computer Architecture Pipelining

OCC - CS/CIS CS116-Ch00-Orientation Morgan Kaufmann Publishers ( Augmented & Modified by M.Malaty) 2 OCC-CS 116 Fall Morgan Kaufmann Publishers (Augmented & Modified by M.Malaty and M. Beers) Topics Explaining pipelining through example CPU pipelining MIPS instructions & pipelining Pipeline hazards Recent trends in performance

OCC - CS/CIS CS116-Ch00-Orientation Morgan Kaufmann Publishers ( Augmented & Modified by M.Malaty) 3 OCC-CS 116 Fall Morgan Kaufmann Publishers (Augmented & Modified by M.Malaty and M. Beers) Introduction Still in CPU’s Datapath Control Datapath Memory Processor Input Output

OCC - CS/CIS CS116-Ch00-Orientation Morgan Kaufmann Publishers ( Augmented & Modified by M.Malaty) 4 OCC-CS 116 Fall Morgan Kaufmann Publishers (Augmented & Modified by M.Malaty and M. Beers) What is pipelining? Implementation technique in which multiple instructions are overlapped in execution Real-life pipelining examples? – Laundry – Factory production lines – Traffic?? Key to making processors fast

OCC - CS/CIS CS116-Ch00-Orientation Morgan Kaufmann Publishers ( Augmented & Modified by M.Malaty) 5 OCC-CS 116 Fall Morgan Kaufmann Publishers (Augmented & Modified by M.Malaty and M. Beers) Pipelining Example: Laundry You have 4 loads of cloths to wash: Steps (stages) required: – Wash – Dry – Fold – Store clothes into drawers Maybe let your roommate help here Each stage needs 30 minutes We can’t start the next step until the previous step is finished ABCD

OCC - CS/CIS CS116-Ch00-Orientation Morgan Kaufmann Publishers ( Augmented & Modified by M.Malaty) 6 OCC-CS 116 Fall Morgan Kaufmann Publishers (Augmented & Modified by M.Malaty and M. Beers) Pipelining Example: Laundry There are 2 approaches to do this job: – Sequential (non-pipelined): Wait until the first load is put away in order to start the next load – Pipelined (ASAP): As soon as the washer is empty, start putting the next load, while the first load is put into dryer

OCC - CS/CIS CS116-Ch00-Orientation Morgan Kaufmann Publishers ( Augmented & Modified by M.Malaty) 7 OCC-CS 116 Fall Morgan Kaufmann Publishers (Augmented & Modified by M.Malaty and M. Beers) Pipelining Example: Laundry Sequential – Needs 8 hours for 4 loads A B C D T i me 7 6 P M AM T a sk order

OCC - CS/CIS CS116-Ch00-Orientation Morgan Kaufmann Publishers ( Augmented & Modified by M.Malaty) 8 OCC-CS 116 Fall Morgan Kaufmann Publishers (Augmented & Modified by M.Malaty and M. Beers) Pipelining Example: Laundry Pipelined Laundry: – Start work ASAP – Needs only 3.5 hours for 4 loads!

OCC - CS/CIS CS116-Ch00-Orientation Morgan Kaufmann Publishers ( Augmented & Modified by M.Malaty) 9 OCC-CS 116 Fall Morgan Kaufmann Publishers (Augmented & Modified by M.Malaty and M. Beers) Pipelined Laundry Observations: At some point, all stages of washing will be operating concurrently Pipelining doesn’t reduce number of stages – Doesn’t help latency of single task – Does help throughput of entire workload As long as we have separate resources, we can pipeline the tasks Multiple tasks operating simultaneously use different resources

OCC - CS/CIS CS116-Ch00-Orientation Morgan Kaufmann Publishers ( Augmented & Modified by M.Malaty) 10 OCC-CS 116 Fall Morgan Kaufmann Publishers (Augmented & Modified by M.Malaty and M. Beers) Pipelined Laundry Observations: Speedup due to pipelining depends on the number of stages in the pipeline Pipeline rate limited by slowest pipeline stage – If dryer needs 45 min, time for all stages has to be 45 min to accommodate it – Unbalanced lengths of pipe stages reduces speedup Time to “fill” pipeline and time to “drain” it reduces speedup If one load depends on anther, we will have to wait (Delay/Stall for Dependencies)

OCC - CS/CIS CS116-Ch00-Orientation Morgan Kaufmann Publishers ( Augmented & Modified by M.Malaty) 11 OCC-CS 116 Fall Morgan Kaufmann Publishers (Augmented & Modified by M.Malaty and M. Beers) Pipelining Example: Problems If we have washer-dryer combination instead of separate washer & dryer, we can’t make the two steps in parallel If your roommate is busy doing something else, you will have to add a stage of going to the drawer and putting your stuff yourself – One resource for folding – Another resource for put away => Structural Hazard

OCC - CS/CIS CS116-Ch00-Orientation Morgan Kaufmann Publishers ( Augmented & Modified by M.Malaty) 12 OCC-CS 116 Fall Morgan Kaufmann Publishers (Augmented & Modified by M.Malaty and M. Beers) Pipelining Example: Problems If you are cleaning the uniform of a football team, you need to determine whether the detergent & water temperature settings are ok or should be changed, depending on how dirty are the uniforms – Stall ? Slow Wait until you get the dry uniforms to see if you need to change setup. Repeat until you have the right formula – Continue with the next load & check formula later? => Control hazards

OCC - CS/CIS CS116-Ch00-Orientation Morgan Kaufmann Publishers ( Augmented & Modified by M.Malaty) 13 OCC-CS 116 Fall Morgan Kaufmann Publishers (Augmented & Modified by M.Malaty and M. Beers) Pipelining Example: Problems In the folding process, if you discover that the load of socks you have need the mate of each of the socks that is still the the washer, you will have to wait until they dry in order to be able to fold => Data hazards

OCC - CS/CIS CS116-Ch00-Orientation Morgan Kaufmann Publishers ( Augmented & Modified by M.Malaty) 14 OCC-CS 116 Fall Morgan Kaufmann Publishers (Augmented & Modified by M.Malaty and M. Beers) CPU Pipelining Can we pipeline instruction execution? For the following instructions, which resources do you need for each of these steps? – store/ load word – add/ subtract/ and/ or/ slt – branch if equal

OCC - CS/CIS CS116-Ch00-Orientation Morgan Kaufmann Publishers ( Augmented & Modified by M.Malaty) 15 OCC-CS 116 Fall Morgan Kaufmann Publishers (Augmented & Modified by M.Malaty and M. Beers) CPU Pipelining Review: 5 stages of a MIPS instruction – Fetch instruction from instruction memory – Read registers while decoding instruction – Execute operation or calculate address, depending on the instruction type – Access an operand from data memory – Write result into a register We can reduce the cycles to fit the stages. Cycle 1Cycle 2Cycle 3Cycle 4Cycle 5 IfetchReg/DecExecMemWr Load

OCC - CS/CIS CS116-Ch00-Orientation Morgan Kaufmann Publishers ( Augmented & Modified by M.Malaty) 16 OCC-CS 116 Fall Morgan Kaufmann Publishers (Augmented & Modified by M.Malaty and M. Beers) CPU Pipelining Review: Datapath resources

OCC - CS/CIS CS116-Ch00-Orientation Morgan Kaufmann Publishers ( Augmented & Modified by M.Malaty) 17 OCC-CS 116 Fall Morgan Kaufmann Publishers (Augmented & Modified by M.Malaty and M. Beers) CPU Pipelining Example: Resources for Load Instruction – Fetch instruction from instruction memory (Ifetch) Instruction memory (IM) – Read registers while decoding instruction(Reg/Dec) Register file & decoder (Reg) – Execute operation or calculate address ALU – Access an operand from data memory (Mem) Data memory (DM) – Write result into a register (Wr) Register file (Reg)

OCC - CS/CIS CS116-Ch00-Orientation Morgan Kaufmann Publishers ( Augmented & Modified by M.Malaty) 18 OCC-CS 116 Fall Morgan Kaufmann Publishers (Augmented & Modified by M.Malaty and M. Beers) CPU Pipelining Note that accessing source & destination registers is performed in two different parts of the cycle We need to decide upon which part of the cycle should reading and writing to the register file take place.

OCC - CS/CIS CS116-Ch00-Orientation Morgan Kaufmann Publishers ( Augmented & Modified by M.Malaty) 19 OCC-CS 116 Fall Morgan Kaufmann Publishers (Augmented & Modified by M.Malaty and M. Beers) CPU Pipelining

OCC - CS/CIS CS116-Ch00-Orientation Morgan Kaufmann Publishers ( Augmented & Modified by M.Malaty) 20 OCC-CS 116 Fall Morgan Kaufmann Publishers (Augmented & Modified by M.Malaty and M. Beers) CPU Pipelining: Example Assumptions: – Only consider the following instructions: lw, sw, add, sub, and, or, slt, beq – Operation times for instruction classes are: Memory access 2 ns ALU operation 2 ns Register file read or write 1 ns – Use a single- cycle model – Clock cycle must accommodate the slowest instruction (2 ns) – Both pipelined & non-pipelined approaches use the same HW components

OCC - CS/CIS CS116-Ch00-Orientation Morgan Kaufmann Publishers ( Augmented & Modified by M.Malaty) 21 OCC-CS 116 Fall Morgan Kaufmann Publishers (Augmented & Modified by M.Malaty and M. Beers) CPU Pipelining: Example Single-Cycle, non-pipelined execution Total time for 3 instructions: 24 ns

OCC - CS/CIS CS116-Ch00-Orientation Morgan Kaufmann Publishers ( Augmented & Modified by M.Malaty) 22 OCC-CS 116 Fall Morgan Kaufmann Publishers (Augmented & Modified by M.Malaty and M. Beers) CPU Pipelining: Example Single-cycle, pipelined execution – Improve performance by increasing throughput Total time for 3 instructions = 14 ns Each instruction adds 2 ns to total execution time – Stage time limited by slowest resource (2 ns)

OCC - CS/CIS CS116-Ch00-Orientation Morgan Kaufmann Publishers ( Augmented & Modified by M.Malaty) 23 OCC-CS 116 Fall Morgan Kaufmann Publishers (Augmented & Modified by M.Malaty and M. Beers) CPU Pipelining Example: Instructions total times: Assumption: – No delay in the following processor units: MUX, Control unit, PC access, Sign-extend units Slowest resources: – Instruction memory – ALU – Data memory

OCC - CS/CIS CS116-Ch00-Orientation Morgan Kaufmann Publishers ( Augmented & Modified by M.Malaty) 24 OCC-CS 116 Fall Morgan Kaufmann Publishers (Augmented & Modified by M.Malaty and M. Beers) CPU Pipelining Example: Theoretically: – Speedup should be equal to number of stages Practically: – Stages are imperfectly balanced – Pipelining needs overhead – Speedup less than number of stages If we have 3 consecutive instructions – Non-pipelined needs 8 x 3 = 24 ns – Pipelined needs 14 ns => Speedup = 24 / 14 = 1.7

OCC - CS/CIS CS116-Ch00-Orientation Morgan Kaufmann Publishers ( Augmented & Modified by M.Malaty) 25 OCC-CS 116 Fall Morgan Kaufmann Publishers (Augmented & Modified by M.Malaty and M. Beers) CPU Pipelining Example: If we have 1003 consecutive instructions – Add more time for 1000 instruction (i.e instruction)on the previous example Non-pipelined total time= 1000 x = 8024 ns Pipelined total time = 1000 x = 2014 ns => Speedup ~ 3.98 ~(8 ns / 2 ns) ~ near perfect speedup => Performance increases for larger number of instructions – Higher throughput

OCC - CS/CIS CS116-Ch00-Orientation Morgan Kaufmann Publishers ( Augmented & Modified by M.Malaty) 26 OCC-CS 116 Fall Morgan Kaufmann Publishers (Augmented & Modified by M.Malaty and M. Beers) Pipelining MIPS Instruction Set MIPS was designed with pipelining in mind => Pipelining is easy in MIPS: 1.All instruction are the same length 2.Limited instruction format 3.Memory operands appear only in lw & sw 4.Operands must be aligned in memory

OCC - CS/CIS CS116-Ch00-Orientation Morgan Kaufmann Publishers ( Augmented & Modified by M.Malaty) 27 OCC-CS 116 Fall Morgan Kaufmann Publishers (Augmented & Modified by M.Malaty and M. Beers) Pipelining MIPS Instruction Set 1) All MIPS instruction are the same length – Fetch instruction in 1st pipeline stage – Decode instructions in 2nd stage – If instruction length varies (e.g. 80x86), pipelining will be more challenging

OCC - CS/CIS CS116-Ch00-Orientation Morgan Kaufmann Publishers ( Augmented & Modified by M.Malaty) 28 OCC-CS 116 Fall Morgan Kaufmann Publishers (Augmented & Modified by M.Malaty and M. Beers) Pipelining MIPS Instruction Set 2) MIPS has limited instruction format – Source register in the same place for each instruction Symmetry – 2nd stage can begin reading at the same time as decoding – If instruction format wasn’t symmetric, stage 2 should be split into 2 distinct stages => Total stages = 6 (instead of 5)

OCC - CS/CIS CS116-Ch00-Orientation Morgan Kaufmann Publishers ( Augmented & Modified by M.Malaty) 29 OCC-CS 116 Fall Morgan Kaufmann Publishers (Augmented & Modified by M.Malaty and M. Beers) Pipelining MIPS Instruction Set 3) Memory operands appear only in lw & sw – We can use the execute stage to calculate memory address – Access memory in the next stage – If we needed to operate on operands in memory (e.g. 80x86), stages 3 & 4 would expand to Address calculation Memory access Execute

OCC - CS/CIS CS116-Ch00-Orientation Morgan Kaufmann Publishers ( Augmented & Modified by M.Malaty) 30 OCC-CS 116 Fall Morgan Kaufmann Publishers (Augmented & Modified by M.Malaty and M. Beers) Pipelining MIPS Instruction Set 4) Operands must be aligned in memory – Transfer of more than one data operand can be done in a single stage with no conflicts – Need not worry about single data transfer instruction requiring 2 data memory accesses – Requested data can be transferred between the CPU & memory in a single pipeline stage

OCC - CS/CIS CS116-Ch00-Orientation Morgan Kaufmann Publishers ( Augmented & Modified by M.Malaty) 31 OCC-CS 116 Fall Morgan Kaufmann Publishers (Augmented & Modified by M.Malaty and M. Beers) Pipelined Datapath Basic Idea: – Resources for Different Stages should be Independent – Add registers & other HW to split datapath into stages I n st r u c on m e mo r y Ad d r es s Add A dd r e sut Sh f t l e f t2 I n s r uct i o n M u x 0 1 Add P C 0 W r e d ata M u x 1 R eg ist e r s Re a d d ata 1 Re a d d ata 2 R e ad eg ist e r 1 R e a d r eg ist e r S ig n e x t end W rit e r eg ist e r W ri t e d ata R e a d d ata Add r e s s Data mem o r y 1 AL U r e s u lt M u x A L U Z e r o I F : I n st ru c ti o n fetc h I D: I n st ruc t i o n d eco d e/ re g is t er f i le r e ad E X : Exec ut e / addr ess c al c u l ati on M E M: M e m o r y a cce s s WB: Wri te b a ck Registers should be here

OCC - CS/CIS CS116-Ch00-Orientation Morgan Kaufmann Publishers ( Augmented & Modified by M.Malaty) 32 OCC-CS 116 Fall Morgan Kaufmann Publishers (Augmented & Modified by M.Malaty and M. Beers) Pipelined Datapath Example: add Instruction Terminology: – Shading indicates that the element is used Shading on the right means reading Shading on the left means writing – For register file Hashed means not used at the corresponding half of the cycle add $s0, $t0, $t1 T i me I F ID WB EX MEM ReadWrite

OCC - CS/CIS CS116-Ch00-Orientation Morgan Kaufmann Publishers ( Augmented & Modified by M.Malaty) 33 OCC-CS 116 Fall Morgan Kaufmann Publishers (Augmented & Modified by M.Malaty and M. Beers) Pipelined Datapath Graphically Representing Pipelines Can help with answering questions like: – How many cycles does it take to execute this code? – What is the ALU doing during cycle 4?

OCC - CS/CIS CS116-Ch00-Orientation Morgan Kaufmann Publishers ( Augmented & Modified by M.Malaty) 34 OCC-CS 116 Fall Morgan Kaufmann Publishers (Augmented & Modified by M.Malaty and M. Beers) Pipelined Datapath Another way to represent pipelining IFetchDcdExecMemWB IFetchDcdExecMemWB IFetchDcdExecMemWB IFetchDcdExecMemWB IFetchDcdExecMemWB IFetchDcdExecMemWB Program Flow Time