Pipelining the Beta bet·ta ('be-t&) n. Any of various species

Slides:



Advertisements
Similar presentations
1 Pipelining Part 2 CS Data Hazards Data hazards occur when the pipeline changes the order of read/write accesses to operands that differs from.
Advertisements

CMSC 611: Advanced Computer Architecture Pipelining Some material adapted from Mohamed Younis, UMBC CMSC 611 Spr 2003 course slides Some material adapted.
ISA Issues; Performance Considerations. Testing / System Verilog: ECE385.
COMP25212 Further Pipeline Issues. Cray 1 COMP25212 Designed in 1976 Cost $8,800,000 8MB Main Memory Max performance 160 MFLOPS Weight 5.5 Tons Power.
Mehmet Can Vuran, Instructor University of Nebraska-Lincoln Acknowledgement: Overheads adapted from those provided by the authors of the textbook.
Pipeline Computer Organization II 1 Hazards Situations that prevent starting the next instruction in the next cycle Structural hazards – A required resource.
Intro to Computer Org. Pipelining, Part 2 – Data hazards + Stalls.
ELEN 468 Advanced Logic Design
Computer Organization and Architecture
Review: Pipelining. Pipelining Laundry Example Ann, Brian, Cathy, Dave each have one load of clothes to wash, dry, and fold Washer takes 30 minutes Dryer.
Pipeline Issues This pipeline stuff makes my head hurt! Maybe it’s that dumb hat.
CS Computer Architecture 1 CS 430 – Computer Architecture Pipelined Execution - Review William J. Taffe using slides of David Patterson.
Chapter 12 Pipelining Strategies Performance Hazards.
1  2004 Morgan Kaufmann Publishers Chapter Six. 2  2004 Morgan Kaufmann Publishers Pipelining The laundry analogy.
1 Chapter Six - 2nd Half Pipelined Processor Forwarding, Hazards, Branching EE3055 Web:
L18 – Pipeline Issues 1 Comp 411 – Spring /03/08 CPU Pipelining Issues Finishing up Chapter 6 This pipe stuff makes my head hurt! What have you.
Computer ArchitectureFall 2007 © October 24nd, 2007 Majd F. Sakr CS-447– Computer Architecture.
L17 – Pipeline Issues 1 Comp 411 – Fall /1308 CPU Pipelining Issues Finishing up Chapter 6 This pipe stuff makes my head hurt! What have you been.
Pipeline Exceptions & ControlCSCE430/830 Pipeline: Exceptions & Control CSCE430/830 Computer Architecture Lecturer: Prof. Hong Jiang Courtesy of Yifeng.
Pipelined Processor II CPSC 321 Andreas Klappenecker.
EENG449b/Savvides Lec 4.1 1/22/04 January 22, 2004 Prof. Andreas Savvides Spring EENG 449bG/CPSC 439bG Computer.
Lec 9: Pipelining Kavita Bala CS 3410, Fall 2008 Computer Science Cornell University.
Appendix A Pipelining: Basic and Intermediate Concepts
ENGS 116 Lecture 51 Pipelining and Hazards Vincent H. Berk September 30, 2005 Reading for today: Chapter A.1 – A.3, article: Patterson&Ditzel Reading for.
Prof. John Nestor ECE Department Lafayette College Easton, Pennsylvania ECE Computer Organization Lecture 17 - Pipelined.
1 Manchester Mark I, This was the second (the first was a small- scale prototype) machine built at Cambridge. A production version of this computer.
Instruction Sets and Pipelining Cover basics of instruction set types and fundamental ideas of pipelining Later in the course we will go into more depth.
1 Pipelining Reconsider the data path we just did Each instruction takes from 3 to 5 clock cycles However, there are parts of hardware that are idle many.
Memory/Storage Architecture Lab Computer Architecture Pipelining Basics.
1 Appendix A Pipeline implementation Pipeline hazards, detection and forwarding Multiple-cycle operations MIPS R4000 CDA5155 Spring, 2007, Peir / University.
6.004 – Fall /29/0L15 – Building a Beta 1 Building the Beta.
EEL5708 Lotzi Bölöni EEL 5708 High Performance Computer Architecture Pipelining.
Comp Sci pipelining 1 Ch. 13 Pipelining. Comp Sci pipelining 2 Pipelining.
CSE 340 Computer Architecture Summer 2014 Basic MIPS Pipelining Review.
L16 – Pipelining 1 Comp 411 – Spring /20/2011 Pipelining Between 411 problems sets, I haven’t had a minute to do laundry Now that’s what I call dirty.
CS.305 Computer Architecture Enhancing Performance with Pipelining Adapted from Computer Organization and Design, Patterson & Hennessy, © 2005, and from.
CMPE 421 Parallel Computer Architecture
CS 1104 Help Session IV Five Issues in Pipelining Colin Tan, S
Winter 2002CSE Topic Branch Hazards in the Pipelined Processor.
How Computers Work Lecture 12 Page 1 How Computers Work Lecture 12 Introduction to Pipelining.
Chap 6.1 Computer Architecture Chapter 6 Enhancing Performance with Pipelining.
CSIE30300 Computer Architecture Unit 04: Basic MIPS Pipelining Hsin-Chou Chi [Adapted from material by and
Implementing Precise Interrupts in Pipelined Processors James E. Smith Andrew R.Pleszkun Presented By: Shrikant G.
1  1998 Morgan Kaufmann Publishers Chapter Six. 2  1998 Morgan Kaufmann Publishers Pipelining Improve perfomance by increasing instruction throughput.
Oct. 18, 2000Machine Organization1 Machine Organization (CS 570) Lecture 4: Pipelining * Jeremy R. Johnson Wed. Oct. 18, 2000 *This lecture was derived.
Computer Organization and Design Pipelining Montek Singh Mon, Dec 2, 2013 Lecture 16.
Computer Organization and Design Pipelining Montek Singh Dec 2, 2015 Lecture 16 (SELF STUDY – not covered on the final exam)
11 Pipelining Kosarev Nikolay MIPT Oct, Pipelining Implementation technique whereby multiple instructions are overlapped in execution Each pipeline.
Lecture 9. MIPS Processor Design – Pipelined Processor Design #1 Prof. Taeweon Suh Computer Science Education Korea University 2010 R&E Computer System.
PROCESSOR PIPELINING YASSER MOHAMMAD. SINGLE DATAPATH DESIGN.
L17 – Pipeline Issues 1 Comp 411 – Fall /23/09 CPU Pipelining Issues Read Chapter This pipe stuff makes my head hurt! What have you been.
CSC 4250 Computer Architectures September 22, 2006 Appendix A. Pipelining.
Interrupts and Exception Handling. Execution We are quite aware of the Fetch, Execute process of the control unit of the CPU –Fetch and instruction as.
ECE/CS 552: Pipeline Hazards © Prof. Mikko Lipasti Lecture notes based in part on slides created by Mark Hill, David Wood, Guri Sohi, John Shen and Jim.
Real-World Pipelines Idea –Divide process into independent stages –Move objects through stages in sequence –At any given times, multiple objects being.
Real-World Pipelines Idea Divide process into independent stages
Chapter Six.
Advanced Architectures
Computer Organization
ELEN 468 Advanced Logic Design
Appendix C Pipeline implementation
How Computers Work Lecture 13 Details of the Pipelined Beta
Pipelining in more detail
Chapter Six.
Chapter Six.
November 5 No exam results today. 9 Classes to go!
CS203 – Advanced Computer Architecture
Control Hazards Branches (conditional, unconditional, call-return)
Interrupts and exceptions
Presentation transcript:

Pipelining the Beta bet·ta ('be-t&) n. Any of various species of small, brightly colored, long-finned freshwater fishes of the genus Betta, found in southeast Asia. be·ta (‘bA-t&, ‘bE-) n. 1. The second letter of the Greek alphabet. 2. The exemplary computer system used in 6.004. I don’t think they mean the fish... maybe they’ll give me partial credit... 6.004 – Fall 2002 10/31/0

Exception Processing Plan: • Interrupt running program • Invoke exception handler (like a procedure call) • Return to continue execution. We’d like RECOVERABLE INTERRUPTS for • Synchronous events, generated by CPU or system FAULTS (eg, Illegal Instruction, divide-by-0, illegal mem address) TRAPS & system calls (eg, read-a-character) • Asynchronous events, generated by I/O (eg, key struck, packet received, disk transfer complete) KEY: TRANSPARENCY to interrupted program. • Most difficult for asynchronous interrupts 6.004 – Fall 2002 10/31/0

Implementation… How exceptions work: • Don’t execute current instruction • Instead fake a “forced” procedure call • save current PC (actually current PC + 4) • load PC with exception vector • 0x4 for synch. exception, 0x8 for asynch. exceptions Question: where to save current PC + 4? • Our approach: reserve a register (R30, aka XP) • Prohibit user programs from using XP. Why? Example: DIV unimplemented LD(R31,A,R0) LD(R31,B,R1) DIV(R0,R1,R2) ST(R2,C,R31) IllOp: PUSH(XP) Fetch inst. at Mem[Reg[XP]–4] check for DIV opcode, get reg numbers perform operation in SW, fill result reg POP(XP) JMP(XP) Forced by hardware 6.004 – Fall 2002 10/31/0

Exceptions Other: Reg[XP] ← PC+4; PC ←“Xadr” Bad Opcode: Reg[XP] ← PC+4; PC ← “IllOp” Other: Reg[XP] ← PC+4; PC ←“Xadr” 6.004 – Fall 2002 10/31/0

Increasing CPU Performance MIPS = Millions of Instructions/Second Freq = Clock Frequency, MHz CPI = Clocks per Instruction To Increase MIPS: 1. DECREASE CPI. - RISC simplicity reduces CPI to 1.0. - CPI below 1.0? Tough... you’ll see multiple instruction issue machines in 6.823. 2. INCREASE Freq. - Freq limited by delay along longest combinational path; hence - PIPELINING is the key to improved performance through fast clocks. 6.004 – Fall 2002 10/31/0

Beta Timing Wanted: longest paths Complications: • some apparent paths aren’t “possible” • blobs have variable execution times (eg, ALU) • time axis is not to scale (eg, tPD,MEM is very big!) 6.004 – Fall 2002 10/31/0

Why this isn’t a 20-minute lecture? We’ve learned how to pipeline combinational circuits. What’s the big deal? 1. The Beta isn’t combinational… • Explicit state in register file, memory; • Hidden state in PC. 2. Consecutive operations – instruction executions – interact: • Jumps, branches dynamically change instruction sequence • Communication thru registers, memory Our goals: • Move slow components into separate pipeline stages, running clock faster • Maintain instruction semantics of unpipelined Beta as far as possible 6.004 – Fall 2002 10/31/0

Ultimate Goal: 5-Stage Pipeline GOAL: Maintain (nearly) 1.0 CPI, but increase clock speed to barely include slowest components (mems, regfile, ALU) APPROACH: structure processor as 5-stage pipeline: Instruction Fetch stage: Maintains PC, fetches one instruction per cycle and passes it to Register File stage: Reads source operands from register file, passes them to ALU stage: Performs indicated operation, passes result to Memory stage: If it’s a LD, use ALU result as an address, pass mem data (or ALU result if not LD) to Write-Back stage: writes result back into register file. 6.004 – Fall 2002 10/31/0

First Steps: A Simple 2-Stage Pipeline 6.004 – Fall 2002 10/31/0

2-Stage Pipelined Beta Operation Consider a sequence of instructions: Executed on our 2-stage pipeline: TIME (cycles) Pipeline 6.004 – Fall 2002 10/31/0

Pipeline Control Hazards BUT consider instead: This is the cycle where the branch decision is made… but we’ve already fetched the following instruction which should be executed only if branch is not taken! 6.004 – Fall 2002 10/31/0

Branch Delay Slots PROBLEM: One (or more) following instructions have been prefetched by the time a branch is taken. POSSIBLE SOLUTIONS: 1. Make hardware “annul” instructions following branches which are taken, e.g., by disabling WERF and WR. 2. “Program around it”. Either a) Follow each BR with a NOP instruction; or b) Make compiler clever enough to move USEFUL instructions into the branch delay slots i. Always execute instructions in delay slots ii. Conditionally execute instructions in delay slots 6.004 – Fall 2002 10/31/0

Branch Alternative 1 Make the hardware annul instructions in the branch delay slots of a taken branch. NOP Branch taken Pros: same program runs on both unpipelined and pipelined hardware Cons: in SPEC benchmarks 14% of instructions are taken branches → 12% of total cycles are annulled 6.004 – Fall 2002 10/31/0

Branch Annulment Hardware 6.004 – Fall 2002 10/31/0

Branch Alternative 2a Fill branch delay slots with NOP instructions (i.e., the software equivalent of alternative 1) Branch taken Pros: same as alternative 1 Cons: NOPs make code longer; 12% of cycles spent executing NOPs 6.004 – Fall 2002 10/31/0

Branch Alternative 2b(i) We need to add this silly instruction to UNDO the effects of that last ADD Put USEFUL instructions in the branch delay slots; remember they will be executed whether the branch is taken or not Branch taken Pros: only two “extra” instructions are executed (on last iteration) Cons: finding “useful” instructions that are always executed is difficult; clever rewrite may be required. Program executes differently on naive unpipelined implementation. 6.004 – Fall 2002 10/31/0

Branch Alternative 2b(ii) Put USEFUL instructions in the branch delay slots; annul them if branch doesn’t behave as predicted Branch taken Pros: only one instruction is annulled (on last iteration); about 70% of branch delay slots can be filled with useful instructions Cons: Program executes differently on naive unpipelined implementation; not really useful with more than one delay slot. 6.004 – Fall 2002 10/31/0

Architectural Issue: Branch Decision Timing BETA approach: • SIMPLE branch condition logic ... Test for Reg[Ra] = 0! • ADVANTAGE: early decision, single delay slot ALTERNATIVES: • Compare-and-branch... (eg, if Reg[Ra] > Reg[Rb]) • MORE powerful, but • LATER decision (hence more delay slots) Suppose decision were made in the ALU stage ... then there would be 2 branch delay slots (and instructions to annul!) 6.004 – Fall 2002 10/31/0

4-Stage Beta Pipeline Treat register file as two separate devices: combinational READ, clocked WRITE at end of pipe. What other information do we have to pass down pipeline? PC (return addresses) instruction fields (decoding) What sort of improvement should expect in cycle time? 6.004 – Fall 2002 10/31/0

4-Stage Beta Operation Consider a sequence of instructions: Executed on our 4-stage pipeline: TIME (cycles) Pipeline 6.004 – Fall 2002 10/31/0

Pipeline “Data Hazard” BUT consider instead: Oops! CMP is trying to read Reg[R3] during cycle I+2 but ADD doesn’t write its result into Reg[R3] until the end of cycle I+3! 6.004 – Fall 2002 10/31/0

Data Hazard Solution 1 “Program around it” ... document weirdo semantics, declare it a software problem. - Breaks sequential semantics! - Costs code efficiency. EXAMPLE: Rewrite How often can we do this? 6.004 – Fall 2002 10/31/0

Data Hazard Solution 2 Stall the pipeline: Freeze IF, RF stages for 2 cycles, inserting NOPs into ALU-stage instruction register Drawback: NOPs mean “wasted” cycles 6.004 – Fall 2002 10/31/0

Data Hazard Solution 3 Bypass (aka forwarding) Paths: Add extra data paths & control logic to re-route data in problem cases. Idea: the result from the ADD which will be written into the register file at the end of cycle I+3 is actually available at output of ALU during cycle I+2 – just in time for it to be used by CMP in the RF stage! 6.004 – Fall 2002 10/31/0

Bypass Paths (I) SELECT this BYPASS path if OpCodeRF = reads Ra and OpCodeALU = OP, OPC and RaRF = RcALU i.e., instructions which use ALU to compute result and RaRF != R31 6.004 – Fall 2002 10/31/0

Bypass Paths (II) SELECT this BYPASS path if OpCodeRF = reads Ra and RaRF != R31 and not using ALU bypass and WERF = 1 and RaRF = WA But why can’t we get It from the register file? It’s being written this cycle! 6.004 – Fall 2002 10/31/0