Tor Aamodt EECE 476: Computer Architecture Slide Set #6: Multicycle Operations.

Slides:



Advertisements
Similar presentations
ILP: IntroductionCSCE430/830 Instruction-level parallelism: Introduction CSCE430/830 Computer Architecture Lecturer: Prof. Hong Jiang Courtesy of Yifeng.
Advertisements

1 Pipelining Part 2 CS Data Hazards Data hazards occur when the pipeline changes the order of read/write accesses to operands that differs from.
Advanced Computer Architectures Laboratory on DLX Pipelining Vittorio Zaccaria.
COMP 4211 Seminar Presentation Based On: Computer Architecture A Quantitative Approach by Hennessey and Patterson Presenter : Feri Danes.
Instruction Set Issues MIPS easy –Instructions are only committed at MEM  WB transition Other architectures are more difficult –Instructions may update.
CMSC 611: Advanced Computer Architecture Scoreboard Some material adapted from Mohamed Younis, UMBC CMSC 611 Spr 2003 course slides Some material adapted.
Pipeline Computer Organization II 1 Hazards Situations that prevent starting the next instruction in the next cycle Structural hazards – A required resource.
Pipeline Hazards Pipeline hazards These are situations that inhibit that the next instruction can be processed in the next stage of the pipeline. This.
Pipelining - Hazards.
Lecture 6: Pipelining MIPS R4000 and More Kai Bu
Instruction-Level Parallelism (ILP)
Chapter 8. Pipelining. Instruction Hazards Overview Whenever the stream of instructions supplied by the instruction fetch unit is interrupted, the pipeline.
1 IF IDEX MEM L.D F4,0(R2) MUL.D F0, F4, F6 ADD.D F2, F0, F8 L.D F2, 0(R2) WB IF IDM1 MEM WBM2M3M4M5M6M7 stall.
COMP381 by M. Hamdi 1 Pipeline Hazards. COMP381 by M. Hamdi 2 Pipeline Hazards Hazards are situations in pipelining where one instruction cannot immediately.
1 Lecture: Pipeline Wrap-Up and Static ILP Topics: multi-cycle instructions, precise exceptions, deep pipelines, compiler scheduling, loop unrolling, software.
1 Lecture 4: Advanced Pipelines Data hazards, control hazards, multi-cycle in-order pipelines (Appendix A.4-A.10)
EECS 470 Pipeline Hazards Lecture 4 Coverage: Appendix A.
1  2004 Morgan Kaufmann Publishers Chapter Six. 2  2004 Morgan Kaufmann Publishers Pipelining The laundry analogy.
1 Lecture 5: Pipeline Wrap-up, Static ILP Basics Topics: loop unrolling, VLIW (Sections 2.1 – 2.2) Assignment 1 due at the start of class on Thursday.
1 Stalling  The easiest solution is to stall the pipeline  We could delay the AND instruction by introducing a one-cycle delay into the pipeline, sometimes.
Computer ArchitectureFall 2007 © October 24nd, 2007 Majd F. Sakr CS-447– Computer Architecture.
COMP381 by M. Hamdi 1 Pipelining Control Hazards and Deeper pipelines.
King Fahd University of Petroleum and Minerals King Fahd University of Petroleum and Minerals Computer Engineering Department Computer Engineering Department.
DLX Instruction Format
1 Lecture 4: Advanced Pipelines Data hazards, control hazards, multi-cycle in-order pipelines (Appendix A.4-A.10)
1 Lecture 4: Advanced Pipelines Control hazards, multi-cycle in-order pipelines, static ILP (Appendix A.4-A.10, Sections )
Appendix A Pipelining: Basic and Intermediate Concepts
EENG449b/Savvides Lec 5.1 1/27/04 January 27, 2004 Prof. Andreas Savvides Spring EENG 449bG/CPSC 439bG Computer.
Pipelining. Overview Pipelining is widely used in modern processors. Pipelining improves system performance in terms of throughput. Pipelined organization.
Pipeline Hazard CT101 – Computing Systems. Content Introduction to pipeline hazard Structural Hazard Data Hazard Control Hazard.
Lecture 7: Pipelining Review Kai Bu
1 Appendix A Pipeline implementation Pipeline hazards, detection and forwarding Multiple-cycle operations MIPS R4000 CDA5155 Spring, 2007, Peir / University.
CSC 4250 Computer Architectures September 26, 2006 Appendix A. Pipelining.
Computer Architecture Pipelines & Superscalars Sunset over the Pacific Ocean Taken from Iolanthe II about 100nm north of Cape Reanga.
CMPE 421 Parallel Computer Architecture
EECE 476: Computer Architecture Slide Set #5: Implementing Pipelining Tor Aamodt Slide background: Die photo of the MIPS R2000 (first commercial MIPS microprocessor)
CS 1104 Help Session IV Five Issues in Pipelining Colin Tan, S
EE524/CptS561 Jose G. Delgado-Frias 1 Processor Basic steps to process an instruction IFID/OFEXMEMWB Instruction Fetch Instruction Decode / Operand Fetch.
11 Pipelining Kosarev Nikolay MIPT Oct, Pipelining Implementation technique whereby multiple instructions are overlapped in execution Each pipeline.
Adapted from Computer Organization and Design, Patterson & Hennessy, UCB ECE232: Hardware Organization and Design Part 13: Branch prediction (Chapter 4/6)
Introduction to Computer Organization Pipelining.
Lecture 9. MIPS Processor Design – Pipelined Processor Design #1 Prof. Taeweon Suh Computer Science Education Korea University 2010 R&E Computer System.
LECTURE 10 Pipelining: Advanced ILP. EXCEPTIONS An exception, or interrupt, is an event other than regular transfers of control (branches, jumps, calls,
CSC 4250 Computer Architectures September 22, 2006 Appendix A. Pipelining.
Out-of-order execution Lihu Rappoport 11/ MAMAS – Computer Architecture Out-Of-Order Execution Dr. Lihu Rappoport.
Dynamic Scheduling Why go out of style?
Instruction-Level Parallelism
Images from Patterson-Hennessy Book
CDA3101 Recitation Section 8
Lecture 07: Pipelining Multicycle, MIPS R4000, and More
Single Clock Datapath With Control
Appendix C Pipeline implementation
Pipelining: Advanced ILP
Morgan Kaufmann Publishers The Processor
CS 5513 Computer Architecture Pipelining Examples
Lecture 6: Advanced Pipelines
Pipelining Multicycle, MIPS R4000, and More
CSC 4250 Computer Architectures
CS 704 Advanced Computer Architecture
Data Hazards Data Hazard
How to improve (decrease) CPI
Instruction Level Parallelism (ILP)
The Processor Lecture 3.5: Data Hazards
Overview What are pipeline hazards? Types of hazards
Pipelining Multicycle, MIPS R4000, and More
CS203 – Advanced Computer Architecture
Lecture 4: Advanced Pipelines
Pipelining.
CMSC 611: Advanced Computer Architecture
CS 3853 Computer Architecture Pipelining Examples
Presentation transcript:

Tor Aamodt EECE 476: Computer Architecture Slide Set #6: Multicycle Operations

Learning Objectives By the end of this slide set, be able to: –Describe three types of data dependencies and hazards and explain the difference between a data dependency and a data hazard –Describe the challenges of adding multicycle execution units to a pipelined processor –Explain why hazard penalties are larger in the 8-stage MIPS R4000 pipeline 2

Data Dependencies and Hazards A data dependence is a restriction on the order instructions can execute so that they compute the same result they would if they were executed in the original program order (specified by the programmer). A data hazard is an ordering of instruction execution in hardware that violates a data dependency. “Dependence” = A property of the program completely independent of which processor it runs on. “Hazard” = a property exhibited by a program running on a particular microprocessor microarchitecture. 3

Below, Instrunction “J” has a true data dependence on Instruction “I” since “I” writes to R1, then “J” reads from R1: “true” dependence because a value was actually communicated. A true dependence leads to a “read-after-write” (RAW) hazard if the hardware can perform the read before the write. True Dependence I: DADD R1,R2,R3 J: DSUB R4,R1,R3 4 Clock Number DADD R1,R2,R3FDXMW DSUB R4,R1,R3FDXMW

A hazard is created whenever there is a dependence between instructions AND they are close enough that the overlap of execution would change the order of access to the operand involved in the dependence. Presence of dependence indicates potential for a hazard, but actual hazard and length of any stall is a property of the pipeline Data Dependence and Hazards 5

A name dependence occurs when two instructions use same register or memory location but no communication between the instructions is intended. Dependence is on data location (name) not value (contents at location). Name Dependence 6

Below, instruction “I” should read R1 before instruction “J” writes to R1. Reordering “I” and “J” would change result computed by “I”. This is called an “anti-dependence”. It results from reuse of the “name” “R1”. If anti-dependence causes a hazard in the pipeline, it is called a Write After Read (WAR) hazard I: DSUB R4,R1,R3 J: DADD R1,R2,R3 K: XOR R6,R1,R7 Name Dependence #1: Anti-dependence 7

Name Dependence #2: Output dependence Below, instruction “I” should write to R1 before instruction “J” writes to R1. Reordering I and J would leave the wrong value in R1 (causing K to compute the wrong value): Called an “output dependence” This also results from the reuse of name “R1” If an output dependence causes a hazard in the pipeline, called a Write After Write (WAW) hazard I: DSUB R1,R4,R3 J: DADD R1,R2,R3 K: XOR R6,R1,R7 8

Example Problem Find true, anti and output dependencies in this code: L.D F6,34(R2) L.D F2,45(R3) MUL.D F0,F2,F4 SUB.D F8,F6,F2 DIV.D F10,F0,F6 ADD.D F6,F8,F2 True dependencies (possibly leading to RAW hazards) L.D F6,34(R2) L.D F2,45(R3) MUL.D F0,F2,F4 SUB.D F8,F6,F2 DIV.D F10,F0,F6 ADD.D F6,F8,F2 Anti dependencies (possibly leading to WAR hazards) Output dependencies (possibly leading to WAW hazards) L.D F6,34(R2) L.D F2,45(R3) MUL.D F0,F2,F4 SUB.D F8,F6,F2 DIV.D F10,F0,F6 ADD.D F6,F8,F

Multicycle Operations RAW hazards? WAW hazards? WAR hazards? Yes Yes (this is new) No! (why?) 10

Pipelined Function Units Clock Number MUL.DIFIDM1M2M3M4M5M6M7MEMWB ADD.DIFIDA1A2A3A4MEMWB L.DIFIDEXMEMWB S.DIFIDEXMEMWB To increase throughput, pipeline function units that are used frequently. Challenges: 1. Longer RAW-hazards, 2. Structural hazard for MEM, WB. 11

RAW hazards with pipelined function units Clock Number L.D F4,0(R2)IFIDEXMW MUL.D F0,F4,F6IFIDsm1m2m3m4m5m6m7MW ADD.D F2,F0,F8IFsIDssssssa1a2a3a4MW S.D F2,0(R2)IFssssssIDEXsssM (Figure A.33 in H&P 4 th edition) Above, timing assumes we stall instructions at the latest pipeline stage possible before operand value is used (rather than stalling in IF/ID). 12

Preventing RAW and WAW hazards with in-order Multicycle Pipelines To prevent RAW and WAW hazards, we can add a hardware table (memory) with a single bit for each register in the register file. This bit tracks if an instruction in the pipeline will write to the the associated register. Stall instruction in decode stage if it reads or will write a register with the bit set. Set bit for destination register when complete decode, clear when instruction reaches writeback. 13 Regs[R1] Regs[R2] Regs[R3] Regs[R31]

In-order Scoreboard Scoreboard: a bit-array, 1-bit for each GPR –If the bit is not set: the register has valid data –If the bit is set: the register has stale data i.e., some outstanding instruction is going to change it Issue in Order: RD  Fn (RS, RT) –If SB[RS] or SB[RT] is set  RAW, stall –If SB[RD] is set  WAW, stall –Else, dispatch to FU (Fn) and set SB[RD] Complete out-of-order –Update GPR[RD], clear SB[RD] Later in this slide set we will see a more complex “scoreboard” when we discuss “out-of-order execution”. To distinguish them, we will call the above hardware an “in-order” scoreboard. 14

Structural Hazards In theory, only L.D uses MEM on cycle 10, but definitely MUL.D, ADD.D and L.D all want to use WB on cycle 11 which causes a structural hazard. Let’s say we wish to eliminate this structural hazard by stalling. In the next slide we see a simple hardware mechanism for inserting correct number of stall cycles before letting instructions leave ID stage. The key idea is to use a “shift register” (just like the shift registers you saw how to write VHDL for in EECE 353). We use this shift register to maintain a “schedule” in hardware for when the WB stage is busy. If bit “N” is ‘1’ then the WB stage will be busy N+M cycles from now (where M is the minimum number of cycles between decode and writeback). Clock Number MUL.D F0,F4,F6IFIDM1M2M3M4M5M6M7MEMWB IFIDEXMEMWB...IFIDEXMEMWB ADD.D F2,F4,F6IFIDA1A2A3A4MEMWB IFIDEXMEMWB...IFIDEXMEMWB L.D F2,0(R2)IFIDEXMEMWB i1 i2 i3 i4 i5 i6 i7 15 Clock Number MUL.D F0,F4,F6IFIDM1M2M3M4M5M6M7MEMWB IFIDEXMEMWB...IFIDEXMEMWB ADD.D F2,F4,F6IFIDsA1A2A3A4MEMWB IFIDEXMEMWB...IFIDEXMEMWB L.D F2,0(R2)IFIDssEXMEMWB

Detection of Structural Hazard at ID Using Shift Register i1:MUL.DF0, F4, F6 Shift Register Direction of Movement Cycles to WBUsed? Cycles to WBUsed? cycle 1 2 i2:DADDR1,R2,R2 i3:DADDR3,R4,R5 i4:ADD.D F2, F4, F6 Cycles to WBUsed? Cycles to WBUsed? Cycles to WBUsed? ??? Latency from ID: MUL.D 9 cycles to WB ADD.D 6 cycles to WB DADD 3 cycles to WB 16 Cycles to WBUsed? Cycles to WBUsed? Cycles to WBUsed?

MIPS R4000 IF - first half of instruction fetch IS - second half of instruction fetch RF - instruction decode and register fetch EX - execution DF - data fetch, first half data cache access DS - second half of data cache access TC - tag check, determine whether the data cache access hit. WB - write back. 17 MIPS R4000 was introduced in 1991 (first 64-bit microprocessor). Used an 8-stage pipeline to achieve higher clock frequency.

MIPS R4000: 2-cycle load delay Clock Number LD R1,...IFISRFEXDFDSTCWB DADD R2,R1,...IFISRFstall EXDFDS DSUB R3,R1,...IFISstall RFEXDF OR R4,R1,...IFstall ISRFEX 18

MIPS R4000: 3-cycle branch penalty Delayed branch can “hide” one cycle out of three (if we fill branch delay slot). We could try to increase number of delay slots, but quickly becomes impossible to fill all of them. 19

MIPS R4000: Delayed Branch Execution TAKEN BRANCH Clock Number BranchIFISRFEXDFDSTCWB Branch inst + 1IFISRFEXDFDSTCWB Branch inst + 2IFISnop Branch inst + 3IFnop Branch targetIFISRFEXDF UNTAKEN BRANCH Clock Number BranchIFISRFEXDFDSTCWB Branch inst + 1IFISRFEXDFDSTCWB Branch inst + 2IFISRFEXDFDSTC Branch inst + 3IFISRFEXDFDS 20 squashed delay slot

MIPS R4000: Floating Point Pipeline Add, multiple, divide floating-point operations share some logic => Opportunity to save area. Terminology : Latency = Time for single operation to complete Initiation interval = How often a new operation of same type can start 21

Multiply followed by Add Structural Hazards! Add issued 4 cycles after Mult => 2 stall cycles Add issued 5 cycles after Mult => 1 stall cycles 22

Add followed by Mult No stall since the shorter operation (Add) clears shared stages before longer operation (Mult) reaches them. 23

MIPS R4000 Performance Evaluation Five bars on left are integer benchmarks; five on right are floating-point benchmarks FP structural hazards due to initiation interval restrictions and shared pipeline stages do not increase average CPI by much. This implies that increased pipelining of execute stage or reducing structural hazards by duplicating shared hardware units will not increase performance. 24

Superpipelining and Hazard Penalties Dividing up logical pipeline stage into multiple stages is sometimes referred to as “superpipelining” Deeper pipelines increase delay (in clock cycles) from stage producing result to stage consuming result. CPI increases with increasing pipeline depth due to increase in stall cycles Increase in stall cycles result from “loose loops” in which forwarding loops have an increased number of pipeline stages between begin and end of forwarding path. Aside: Can think of a branch a “forwarding” a value to PC register in fetch stage when predicted path is incorrect. 25