B10001 Pipelining Hazards ENGR xD52 Eric VanWyk Fall 2012.



Today
Review Pipelined CPUs
Discuss Hazards of Pipelining
Amdahl’s Law

Review
Pipelining allows multiple instructions to be “in flight” in the data path at the same time
Temporal Parallelism breaks instructions into small tasks that run in multiple stages
Potential Throughput Speedup = # Stages
Hazards reduce these benefits
  – Can always be “solved” with a No-Op (but that sucks)
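The "Potential Throughput Speedup = # Stages" claim can be checked with a small calculation. The sketch below assumes an idealized pipeline in which every stage takes one cycle, the unpipelined machine spends `stages` cycles on each instruction, and every hazard is resolved by inserting no-op cycles. The function names and the `stall_cycles` parameter are illustrative and do not come from the slides.

```python
def unpipelined_cycles(n_instr: int, stages: int) -> int:
    """Each instruction runs through all stages before the next one starts."""
    return n_instr * stages


def pipelined_cycles(n_instr: int, stages: int, stall_cycles: int = 0) -> int:
    """Fill the pipeline once, then finish one instruction per cycle,
    plus any no-op cycles inserted to resolve hazards."""
    return stages + (n_instr - 1) + stall_cycles


def speedup(n_instr: int, stages: int, stall_cycles: int = 0) -> float:
    """Ratio of unpipelined to pipelined cycle counts."""
    return unpipelined_cycles(n_instr, stages) / pipelined_cycles(
        n_instr, stages, stall_cycles
    )


if __name__ == "__main__":
    # Five stages, as in the IF/RF/EX/MEM/WB pipeline used in this lecture.
    for n in (5, 100, 10_000):
        print(n, round(speedup(n, 5), 3))
    # Assumed for illustration: one stall cycle for every five instructions.
    print(round(speedup(10_000, 5, stall_cycles=2_000), 3))
```

For long instruction streams the speedup approaches the stage count, and every stall cycle pulls it back down, which is why the rest of the lecture looks for cheaper fixes than the no-op.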


In Flight Entertainment
What does “in flight” mean in this context?
What state does each instruction need?
Where is this state stored?
[Figure: five-stage datapath (IF Instruction Fetch, RF Register Fetch, EX Execute, MEM Data Memory, WB Writeback) with blocks labeled Registers, PC, Instr. Memory, Register File, and Data Memory]

In Flight Entertainment
One instruction is in a stage at a time
  – No “smearing” across stages
Entire instruction state is in the stage’s registers
[Figure: five-stage datapath (IF Instruction Fetch, RF Register Fetch, EX Execute, MEM Data Memory, WB Writeback) with blocks labeled Registers, PC, Instr. Memory, Register File, and Data Memory]

Pipelined CPU w/ Controls
[Figure: pipelined CPU datapath with control signals. Credit: Montek Singh, COMP541]

The Life and Death of State
Control Signals are “Born” in the Decoder
  – Propagated until they are needed
Data Signals are “Born” later
  – e.g. Reg File Reads, ALU Result
Signals “Die” when they are no longer needed
  – Shed no tears for me. My glory lives forever.

State Check
Annotate control signals on the 5 stage CPU
  – Spawn Point, Usage(s), Cull Point
  – Width

Signal            Width   IF/ID   ID/EX   EX/MEM   MEM/WB
Read Reg Addrs    5+5
Read Reg Data A   32
Read Reg Data B   32
Write Reg Addr    5
Write Reg Data    32
ALU Cntl          5
ALU Src           1
RegWrite          1
MemWrite          1
ALU Result        32
ALU Zero          1


Jumping and Branching
When does Jump update PC?
Is this ok?
Can we do better?
A Control Hazard is when the wrong instruction gets executed because Instruction Fetch fetched from the wrong address (an “IFetch fail”)


Jumping and Branching
How about Branch?
[Figure: datapath with PC, Instr. Memory, Register File, and Data Memory, plus an added adder (+) and test unit]
Add hardware -> Update PC after RegFetch/Decode


Branch is still a Hazard
PC is updated at the end of Reg/Dec
What does this do to this sample program?

Cycle   1        2        3        4        5        6        7        8        9
R-type  Ifetch   Reg/Dec  Exec     Mem      Wr
beq              Ifetch   Reg/Dec  Exec     Mem      Wr
load                      Ifetch   Reg/Dec  Exec     Mem      Wr
R-type                             Ifetch   Reg/Dec  Exec     Mem      Wr
R-type                                      Ifetch   Reg/Dec  Exec     Mem      Wr

What to do?
LW is sneaking in past the branch!!
How can we solve this problem?
This is exactly why Comp Arch is so damn cool

Control Hazard Solution: Stall
Delay Fetch/Decoding the next instruction
What is the impact on performance?
[Pipeline timing diagram, clock cycles 1 to 9: R-type, beq, a bubble (stall), then the remaining instructions, each passing through Ifetch, Reg/Dec, Exec, Mem, Wr]

Control Hazard Solution: Embrace It
Redefine it not as a hazard, but as a feature!
Compiler moves an instruction into the “Branch Delay Slot”
Very common in embedded / DSP processors
  – Total control over instruction set / compiler / etc

Control Hazard Solution: Guess&Check
Easier to beg forgiveness than ask permission
  – Make an assumption, execute accordingly
  – If it was wrong, abort the speculative instructions

I shall be telling this with a sigh
Somewhere ages and ages hence:
Two roads diverged in a wood, and I,
I took the one less traveled by,
And that has made all the difference.
- Robert Frost

Control Hazard: Guess&Check
How do we pick which way to go?
Invent a scheme, apply it to example code
  – How many did you get right?
  – Does the nature of the code matter?
  – Does the nature of the inputs matter?
How would this be implemented in HW?

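Control Hazard: Guess&Check

int num_positive(const int *sensor_values, int length) {
    int num = 0;  /* count of readings above zero */
    for (int i = 0; i < length; i++)
        if (sensor_values[i] > 0)  /* the branch to predict */
            num += 1;
    return num;
}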
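One way to try the "invent a scheme, apply it to example code" exercise is to score a few guessing schemes against the `sensor_values[i] > 0` test. The sketch below is a software model only: it treats the outcome as True when the condition holds, and the three schemes (always guess one way, repeat the last outcome, 2-bit saturating counter) are common textbook choices assumed here, not schemes named in the slides. All function names and the sample inputs are made up for illustration.

```python
from typing import Iterable, List


def branch_outcomes(sensor_values: Iterable[int]) -> List[bool]:
    """True when `sensor_values[i] > 0` holds, i.e. the increment runs."""
    return [v > 0 for v in sensor_values]


def static_accuracy(outcomes: List[bool], guess: bool) -> float:
    """Always predict the same direction."""
    return sum(o == guess for o in outcomes) / len(outcomes)


def last_outcome_accuracy(outcomes: List[bool], initial: bool = False) -> float:
    """1-bit scheme: predict whatever the branch did last time."""
    prediction, correct = initial, 0
    for o in outcomes:
        correct += (o == prediction)
        prediction = o
    return correct / len(outcomes)


def two_bit_accuracy(outcomes: List[bool], state: int = 1) -> float:
    """2-bit saturating counter: states 0-1 predict False, 2-3 predict True."""
    correct = 0
    for o in outcomes:
        correct += ((state >= 2) == o)
        state = min(state + 1, 3) if o else max(state - 1, 0)
    return correct / len(outcomes)


if __name__ == "__main__":
    # Hypothetical sensor traces, chosen to show that the inputs matter.
    traces = {
        "mostly positive": [5, 3, 8, -1, 4, 7, 2, -2, 6, 9],
        "alternating": [1, -1] * 5,
    }
    for name, values in traces.items():
        o = branch_outcomes(values)
        print(name,
              static_accuracy(o, True),
              last_outcome_accuracy(o),
              two_bit_accuracy(o))
```

On the alternating trace both history-based schemes guess wrong every time while a fixed guess gets half right, so the nature of the inputs clearly matters.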

Control Hazard Summary
Branch Penalty is Architecture Dependent
  – We reduced the BEQ penalty from 3 cycles to 1 with extra hardware
Uncertainty is expensive
  – Stalling costs time
  – Predicting costs power and area
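The cost of the branch penalty can be estimated with a simple average. This sketch assumes a base CPI of 1, that every branch pays the full penalty (no prediction, no delay slot), and a branch fraction of 20%, which is an assumed workload mix and not a number from the slides. Only the penalties of 3 and 1 come from the summary above.

```python
def effective_cpi(branch_fraction: float, penalty_cycles: int,
                  base_cpi: float = 1.0) -> float:
    """Average cycles per instruction when every branch stalls for
    `penalty_cycles` cycles."""
    return base_cpi + branch_fraction * penalty_cycles


if __name__ == "__main__":
    BRANCH_FRACTION = 0.20  # assumed workload mix, not from the slides
    slow = effective_cpi(BRANCH_FRACTION, penalty_cycles=3)
    fast = effective_cpi(BRANCH_FRACTION, penalty_cycles=1)
    print(f"penalty 3: CPI = {slow:.2f}")
    print(f"penalty 1: CPI = {fast:.2f}")
    print(f"speedup from the extra hardware: {slow / fast:.2f}x")
```

Changing `BRANCH_FRACTION` shows how strongly the answer depends on the code being run.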


Data Hazards
What happens with the following code?

add $t0, $t1, $t2
sub $t3, $t0, $t4
and $t5, $t0, $t7
or  $t8, $t0, $s0
xor $s1, $t0, $s2

Cycle   1        2        3        4        5        6        7        8        9
add     Ifetch   Reg/Dec  Exec     Mem      Wr
sub              Ifetch   Reg/Dec  Exec     Mem      Wr
and                       Ifetch   Reg/Dec  Exec     Mem      Wr
or                                 Ifetch   Reg/Dec  Exec     Mem      Wr
xor                                         Ifetch   Reg/Dec  Exec     Mem      Wr
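The dependences in this sequence can be found mechanically. The sketch below lists every read-after-write pair that falls within a given distance of the producing instruction. The `window` parameter is an assumption that depends on the pipeline: without forwarding, `add` writes `$t0` in cycle 5, so `sub` and `and` read a stale value, and whether `or` does too depends on whether the register file can write and read in the same cycle. The `Instr` type and function name are illustrative.

```python
from typing import List, NamedTuple, Tuple


class Instr(NamedTuple):
    op: str
    dest: str
    srcs: Tuple[str, ...]


# The five-instruction sequence from the slide.
PROGRAM = [
    Instr("add", "$t0", ("$t1", "$t2")),
    Instr("sub", "$t3", ("$t0", "$t4")),
    Instr("and", "$t5", ("$t0", "$t7")),
    Instr("or", "$t8", ("$t0", "$s0")),
    Instr("xor", "$s1", ("$t0", "$s2")),
]


def raw_hazards(program: List[Instr], window: int) -> List[Tuple[int, int, str]]:
    """Return (producer_index, consumer_index, register) for every read that
    follows a write to the same register by `window` instructions or fewer."""
    hazards = []
    for i, producer in enumerate(program):
        for j in range(i + 1, min(i + 1 + window, len(program))):
            if producer.dest in program[j].srcs:
                hazards.append((i, j, producer.dest))
            if program[j].dest == producer.dest:
                break  # a later write replaces the value being tracked
    return hazards


if __name__ == "__main__":
    # window=2 assumes same-cycle register file write then read.
    for producer, consumer, reg in raw_hazards(PROGRAM, window=2):
        print(f"{PROGRAM[consumer].op} needs {reg} from {PROGRAM[producer].op}")
```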


Data Hazards: Forwarding
Result isn’t committed until Writeback!
  – … but is available after Execute
  – … and really only needed in time for Execute
[Pipeline timing diagram, clock cycles 1 to 9: add, sub, and, or, xor, each passing through Ifetch, Reg/Dec, Exec, Mem, Wr]

Data Hazards: Forwarding
Allows immediate use of a result
Requires decoder to track where things are
Try implementing forwarding in HW
  – What new registers are needed?
  – New Muxes?
  – Control logic?
  – Can you forward with LW?
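As a starting point for the control-logic question, here is a software model of one forwarding mux select, following the usual textbook formulation for a five-stage MIPS pipeline. It is a sketch under assumptions rather than the design intended by the slides: the signal names (`ex_mem_reg_write`, `mem_wb_dest`, and so on) are hypothetical, and registers are identified by number. The second function models the LW case, where forwarding alone is not enough because the loaded value only exists after the memory stage.

```python
from enum import Enum
from typing import Iterable


class Forward(Enum):
    REG_FILE = 0      # use the value read during Register Fetch
    FROM_MEM_WB = 1   # take the value that is about to be written back
    FROM_EX_MEM = 2   # take the ALU result of the previous instruction


def forward_select(src_reg: int,
                   ex_mem_reg_write: bool, ex_mem_dest: int,
                   mem_wb_reg_write: bool, mem_wb_dest: int) -> Forward:
    """Choose the source for one ALU operand of the instruction in Execute."""
    # Register 0 is hard-wired to zero in MIPS, so it is never forwarded.
    if ex_mem_reg_write and ex_mem_dest != 0 and ex_mem_dest == src_reg:
        return Forward.FROM_EX_MEM  # the newest value wins
    if mem_wb_reg_write and mem_wb_dest != 0 and mem_wb_dest == src_reg:
        return Forward.FROM_MEM_WB
    return Forward.REG_FILE


def load_use_stall(id_ex_mem_read: bool, id_ex_dest: int,
                   next_srcs: Iterable[int]) -> bool:
    """A load's data is not available until after the memory stage, so an
    instruction that needs it in the very next cycle must wait one cycle."""
    return id_ex_mem_read and id_ex_dest != 0 and id_ex_dest in tuple(next_srcs)
```

For example, with `$t0` as register 8, `forward_select(8, True, 8, False, 0)` selects the EX/MEM value, which is the `add` followed by `sub` case from the earlier slide.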

In Groups
Branch Prediction Forwarding Hardware Design
Create a program to show a hazard
  – Calculate performance with ‘vanilla’ MIPS pipeline
  – Improve the pipeline
  – Calculate performance with ‘better’ MIPS pipeline

Feedback
Give answers anonymously before class is over
How many hours per week are you spending on Computer Architecture outside of class?
How many should you be spending?
What can I do to make these numbers match?
What can you do?