Computer Architecture Pipelines Diagrams are from Computer Architecture: A Quantitative Approach, 2nd, Hennessy and Patterson.

Slides:

Advertisements

Similar presentations

Lecture 4: CPU Performance

Advertisements

CMSC 611: Advanced Computer Architecture Pipelining Some material adapted from Mohamed Younis, UMBC CMSC 611 Spr 2003 course slides Some material adapted.

Lecture Objectives: 1)Define pipelining 2)Calculate the speedup achieved by pipelining for a given number of instructions. 3)Define how pipelining improves.

CMPT 334 Computer Organization

Review: Pipelining. Pipelining Laundry Example Ann, Brian, Cathy, Dave each have one load of clothes to wash, dry, and fold Washer takes 30 minutes Dryer.

CS252/Patterson Lec 1.1 1/17/01 Pipelining: Its Natural! Laundry Example Ann, Brian, Cathy, Dave each have one load of clothes to wash, dry, and fold Washer.

1 RISC Pipeline Han Wang CS3410, Spring 2010 Computer Science Cornell University See: P&H Chapter 4.6.

CIS429/529 Winter 2007 Pipelining-1 1 Pipeling RISC/MIPS64 five stage pipeline Basic pipeline performance Pipeline hazards Branch hazards More pipeline.

Mary Jane Irwin ( ) [Adapted from Computer Organization and Design,

ENEE350 Ankur Srivastava University of Maryland, College Park Based on Slides from Mary Jane Irwin ( )

Computer ArchitectureFall 2007 © October 24nd, 2007 Majd F. Sakr CS-447– Computer Architecture.

Chapter 13 (In Part) Computer Architecture Pipelines and caches Diagrams are from Computer Architecture: A Quantitative Approach, 2nd, Hennessy and Patterson.

CIS629 Fall 2002 Pipelining 2- 1 Control Hazards Created by branch statements BEQZLOC ADDR1,R2,R3. LOCSUBR1,R2,R3 PC needs to be computed but it happens.

Computer Architecture Pipelines Diagrams are from Computer Architecture: A Quantitative Approach, 2nd, Hennessy and Patterson.

DLX Instruction Format

Lec 9: Pipelining Kavita Bala CS 3410, Fall 2008 Computer Science Cornell University.

Appendix A Pipelining: Basic and Intermediate Concepts

ENGS 116 Lecture 51 Pipelining and Hazards Vincent H. Berk September 30, 2005 Reading for today: Chapter A.1 – A.3, article: Patterson&Ditzel Reading for.

Pipelining Basics Assembly line concept An instruction is executed in multiple steps Multiple instructions overlap in execution A step in a pipeline is.

CSE378 Pipelining1 Pipelining Basic concept of assembly line –Split a job A into n sequential subjobs (A 1,A 2,…,A n ) with each A i taking approximately.

-1.1- PIPELINING 2 nd week. -2- Khoa Coâng Ngheä Thoâng Tin – Ñaïi Hoïc Baùch Khoa Tp.HCM PIPELINING 2 nd week References Pipelining concepts The DLX.

Pipeline Hazard CT101 – Computing Systems. Content Introduction to pipeline hazard Structural Hazard Data Hazard Control Hazard.

COSC 3430 L08 Basic MIPS Architecture.1 COSC 3430 Computer Architecture Lecture 08 Processors Single cycle Datapath PH 3: Sections

CSC 4250 Computer Architectures September 15, 2006 Appendix A. Pipelining.

1 Pipelining Reconsider the data path we just did Each instruction takes from 3 to 5 clock cycles However, there are parts of hardware that are idle many.

Pipelining. 10/19/ Outline 5 stage pipelining Structural and Data Hazards Forwarding Branch Schemes Exceptions and Interrupts Conclusion.

Chapter 2 Summary Classification of architectures Features that are relatively independent of instruction sets “Different” Processors –DSP and media processors.

1 Appendix A Pipeline implementation Pipeline hazards, detection and forwarding Multiple-cycle operations MIPS R4000 CDA5155 Spring, 2007, Peir / University.

EEL5708 Lotzi Bölöni EEL 5708 High Performance Computer Architecture Pipelining.

CSE 340 Computer Architecture Summer 2014 Basic MIPS Pipelining Review.

CS.305 Computer Architecture Enhancing Performance with Pipelining Adapted from Computer Organization and Design, Patterson & Hennessy, © 2005, and from.

CMPE 421 Parallel Computer Architecture

1 Designing a Pipelined Processor In this Chapter, we will study 1. Pipelined datapath 2. Pipelined control 3. Data Hazards 4. Forwarding 5. Branch Hazards.

1 Pipelining Part I CS What is Pipelining? Like an Automobile Assembly Line for Instructions –Each step does a little job of processing the instruction.

CECS 440 Pipelining.1(c) 2014 – R. W. Allison [slides adapted from D. Patterson slides with additional credits to M.J. Irwin]

CSIE30300 Computer Architecture Unit 04: Basic MIPS Pipelining Hsin-Chou Chi [Adapted from material by and

Processor Design CT101 – Computing Systems. Content GPR processor – non pipeline implementation Pipeline GPR processor – pipeline implementation Performance.

Oct. 18, 2000Machine Organization1 Machine Organization (CS 570) Lecture 4: Pipelining * Jeremy R. Johnson Wed. Oct. 18, 2000 *This lecture was derived.

CMSC 611: Advanced Computer Architecture Pipelining Some material adapted from Mohamed Younis, UMBC CMSC 611 Spr 2003 course slides Some material adapted.

CS252/Patterson Lec 1.1 1/17/01 معماري کامپيوتر - درس نهم pipeline برگرفته از درس : Prof. David A. Patterson.

EE524/CptS561 Jose G. Delgado-Frias 1 Processor Basic steps to process an instruction IFID/OFEXMEMWB Instruction Fetch Instruction Decode / Operand Fetch.

11 Pipelining Kosarev Nikolay MIPT Oct, Pipelining Implementation technique whereby multiple instructions are overlapped in execution Each pipeline.

Lecture 9. MIPS Processor Design – Pipelined Processor Design #1 Prof. Taeweon Suh Computer Science Education Korea University 2010 R&E Computer System.

CMPE 421 REVIEW: MIDTERM 1. A MODIFIED FIVE-Stage Pipeline PC A Y R MD1 addr inst Inst Memory Imm Ext add rd1 GPRs rs1 rs2 ws wd rd2 we wdata addr wdata.

CMSC 611: Advanced Computer Architecture Pipelining Some material adapted from Mohamed Younis, UMBC CMSC 611 Spr 2003 course slides Some material adapted.

CSCI-365 Computer Organization Lecture Note: Some slides and/or pictures in the following are adapted from: Computer Organization and Design, Patterson.

CS203 – Advanced Computer Architecture Pipelining Review.

Computer Organization

CDA3101 Recitation Section 8

Review: Instruction Set Evolution

ELEN 468 Advanced Logic Design

CMSC 611: Advanced Computer Architecture

5 Steps of MIPS Datapath Figure A.2, Page A-8

Morgan Kaufmann Publishers The Processor

Appendix C Pipeline implementation

ECE232: Hardware Organization and Design

\course\cpeg323-08F\Topic6b-323

Chapter 3: Pipelining 순천향대학교 컴퓨터학부 이 상 정 Adapted from

Chapter 4 The Processor Part 2

Design of the Control Unit for One-cycle Instruction Execution

Pipelining in more detail

\course\cpeg323-05F\Topic6b-323

Pipeline control unit (highly abstracted)

Control unit extension for data hazards

An Introduction to pipelining

Instruction Execution Cycle

Pipeline control unit (highly abstracted)

Pipeline Control unit (highly abstracted)

Pipelining Appendix A and Chapter 3.

Pipelining Hazards.

Presentation transcript:

Computer Architecture Pipelines Diagrams are from Computer Architecture: A Quantitative Approach, 2nd, Hennessy and Patterson.

CMPE12cCyrus Bazeghi 2 Instruction Execution Can break down the process of “running” an instruction into stages. These stages are what needs to be done to complete the execution of each instruction. Some instructions will not require some stages.

CMPE12cCyrus Bazeghi 3 “Instruction Execution” The DLX (MIPS) datapath allows every instruction to be executed in 4 or 5 cycles

CMPE12cCyrus Bazeghi 4 Instruction Execution 1.Instruction Fetch (IF) - Get the instruction to be execute. IR  M[PC] NPC  PC + 4 IR – Instruction register NPC – Next program counter

CMPE12cCyrus Bazeghi 5 2.Instruction Decode/Register Fetch (ID) – Figure out what the instruction is supposed to do and what it needs. A  Register File[Rs1] B  Register File[Rs2] Imm  {(IR16)16, IR15..0} Instruction Execution A & B & Imm are registers that hold inputs to the ALU which is in the execute stage

CMPE12cCyrus Bazeghi 6 Instruction Execution 3.Execution (EX) -The instruction has been decoded, so execution can be split according to instruction type. Reg-Reg:ALUout  A op B Reg-Imm:ALUout  A op Imm Branch:ALUout  NPC + Imm cond  (A {==, !=} 0) LD/ST:ALUout  A op Imm to form effective Address Jump:ALUout  {PC[31:28],Imm[25:0],00}

CMPE12cCyrus Bazeghi 7 Instruction Execution 4.Memory Access/Branch Completion (MEM) – Besides the IF stage this is the only stage that access the memory to load and store data. Load:LMD = Mem[ALUout] Store:Mem[ALUout]  B Branch:PC  (cond) ? ALUout : NPC Jump/JAL:PC  ALUout JR:PC  A ELSE:PC  NPC LMD=Load Memory Data Register

CMPE12cCyrus Bazeghi 8 5.Write-Back (WB) – Store all the results and loads back to registers. ALU Instruction: Rd  ALUoutput Load: Rd  LMD JAL: R31  Old NPC Instruction Execution

CMPE12cCyrus Bazeghi 9 What is Pipelining?? Pipelining is an implementation technique whereby multiple instructions are overlapped in execution.

CMPE12cCyrus Bazeghi 10 How instructions move through the pipeline Instr 1IFIDEXMEMWB Instr 2IFIDEXMEMWB Instr 3IFIDEXMEMWB Instr 4IFIDEXMEM Instr 5IFIDEX Time 

CMPE12cCyrus Bazeghi 11 Pipelining The pipeline can be thought of as a series of data paths shifted in time

CMPE12cCyrus Bazeghi 12 Pipeline Considerations What are the overheads of pipelining? What are the complexities? What types of pipelines are there?

CMPE12cCyrus Bazeghi 13 Pipelining Pipeline Overhead Latches between stages Must latch data and control Expansion of stage delay… Original time was IF + ID + EX + MEM + WB With phases skipped for some instructions New best time is MAX (IF,ID,EX,MEM,WB) + Latch Delay More memory ports or separate memories Requires complicated control to make as fast as possible But so do non-pipelined high performance designs

CMPE12cCyrus Bazeghi 14 Pipelining Pipeline complexities 3 or 5 or 10 instructions are executing at the same time What if the instructions are dependent? Must have hardware or software ways to ensure proper execution. What if an interrupt occurs? Would like to be able to figure out what instruction caused it!!

CMPE12cCyrus Bazeghi 15 Pipeline Stalls – Structural What causes pipeline stalls? 1. Structural Hazard Inability to perform 2 tasks. Example: Memory needed for both instructions and data SpecInt92 has 35% load and store If this remains a hazard, each load or store will stall one IF.

CMPE12cCyrus Bazeghi 16 Pipeline Stalls – Data 2.1 Data Hazards – Read after Write (RAW) Data is needed before it is written. ADD R1,R2,R3IFIDEXMEMWB ADD R4,R1,R5IFIDEXMEMWB ADD R4,R1,R5IFIDEXMEMWB ADD R4,R2,R3IFIDEXMEMWB TIME = Calc Done R1 Written R1 Read?R1 Used R1 Read?

CMPE12cCyrus Bazeghi 17 Pipeline Stalls – Data Data must be forwarded to other instructions in pipeline if ready for use but not yet stored. In 5-stage MIPS pipeline, this means forwarding to next three instructions. Can reduce to 2 instructions if register bank can process a simultaneous read and write. Text indicates this as writing in first half of WB and reading in second half of ID. More likely, both occur at once, with read receiving the data as it is written. Read after Write (RAW)

CMPE12cCyrus Bazeghi 18 Pipeline Stalls – Data Load Delays Data available at end of MEM, not in time for the Instr + 1’s EX phase.

CMPE12cCyrus Bazeghi 19 Pipeline Stalls – Data So insert a bubble…

CMPE12cCyrus Bazeghi 20 Pipeline Stalls – Data A Store requires two registers at two different times: Rs1 is required for address calculation during EX Data to be stored (Rd) is required during MEM Data can be loaded and immediately stored Data from the ALU must be forwarded to MEM for stores

CMPE12cCyrus Bazeghi 21 Pipeline Stalls – Data 2.2 Data Hazards – Write After Write (WAW) Out-of-order completion of instructions IF ID/RF ISSUE FMUL1 FMUL2 FDIV1 FDIV2 FDIV3 FDIV4 WB Length of FDIV pipe could cause WAW error FDIV R1, R2, R3 FMUL R1, R2, R3

CMPE12cCyrus Bazeghi 22 Pipeline Stalls – Data 2.3 Data Hazards – Write After Read (WAR) A slow reader followed by a fast writer IF ID ISSUE FLOAD1 FMADD1 FMADD2 FMADD3 FMADD4 WB Read registers for multiply Read registers for add FMADD R1, R2, R3, R4IF ID IS FM1 FM2 FM3 FM4 WB FLOAD R4, #300(R5) IF ID IS F1 WB

CMPE12cCyrus Bazeghi 23 Pipeline Stalls – Data 2.4 Data Hazards – Read After Read (RAR) A read after a read is not a hazard (Assuming that a read does not change any state)

CMPE12cCyrus Bazeghi 24 Pipeline Stalls - Control 3.Control Hazards Branch, Jump, Interrupt Or why computers like straight line code… 3.1 Control Hazards – Jump Jump IF ID EX MEM WB IF ID EX MEM WB When is knowledge of jump available? When is target available? With absolute addressing, may be able to fit jump into IF phase with zero overhead.

CMPE12cCyrus Bazeghi 25 Pipeline Stalls – Control This relies on performing simple decode in IF. If IF is currently the slowest phase, this will slow the clock. Relative jumps would require an adder and its delay.

CMPE12cCyrus Bazeghi 26 Pipeline Stalls – Control 3.2 Control Hazards – Branch Branch IF ID EX MEM WB IF ID EX MEM WB IF ID EX MEM WB IF ID EX MEM WB IF ID EX MEM WB Branch is resolved in MEM and information forwarded to IF, causing three stalls. 16% of the SpecInt instructions are branches, and about 67% are taken, CPI = CPI IDEAL + STALLS = 1 + 3(0.67)(0.16) = 1.32 Must Resolve branch sooner Update PC sooner Fetch correct next instruction more frequently

CMPE12cCyrus Bazeghi 27 Pipeline Stalls – Control Delayed branch and branch delay slot Ideally, fill slot with instruction from before branch Otherwise from target (since branches are more often taken than not) Or from fall-through Slots filled 50-70% of time Interrupt recovery complicated

CMPE12cCyrus Bazeghi 28 Why use a pipeline? Seems so complicated. What are some of the overheads? Summary of Pipelines Pipelines

CMPE12cCyrus Bazeghi 29 Computer Architectures Have discussed MIPS and HC11 Other common ones: –Motorola –Intel x86 –Sun SPARC

CMPE12cCyrus Bazeghi 30 Other Architectures Motorola Family (circa late ’70’s) Instructions can specify 8, 16, or 32 bit operands. Has 32 bit registers. Has 2 register files A and D both have 8 32-bit registers. (A is for addresses and D is for Data). Used to save bits in the opcode since implicit use of A or D is in the instruction. Uses a two-address instruction set. Instructions are usually 16-bits but can have up to 4 more 16-bit words. Supports lots of addressing modes.

CMPE12cCyrus Bazeghi 31 Other Architectures Intel x86 Family (circa late ’70’s) Probably the least elegant design available. Uses 16-bit words, so only 64k unique address can be provided. Used segment addressing to overcome this limit. Basically shift the address by 4 (multiply by 16) and then add a 16-bit offset giving and effective 20-bit address, thus 1MB of memory. Has 4 registers (segment registers) used to hold segment addresses: CS, DS, SS, ES. Has 4 general registers: AX, BX, CX, and DX. Expanded to 32-bit addressing with the 386.

CMPE12cCyrus Bazeghi 32 Other Architectures Sun SPARC Architecture (late 1980’s) Scalable Processor ARChitecture Developed around the same time as the MIPS. Very similar to MIPS except for register usage. Has a set of global registers (8) like MIPS. Has a large set of registers from which only a subset can be accessed at any time, called a register file. Is a load/store architecture with 3-addressing instructions. Has 32-bit registers and all instructions are 32-bits. Designed to support procedure call and returns efficiently. Number of register in the register file vary with implementation but at most 24 can be accessed. Those 24 plus the 8 global give a register window.

CMPE12cCyrus Bazeghi 33 Summary of Computer Architectures Lots of computer architectures out there. Expensive to develop. Want to keep around as long as possible. RISC is easier to pipeline. Want to aid software development.