CS/COE 1541 (term 2174) Jarrett Billingsley


Pipelining - Details

Class Announcements
Today's the last day of add/drop! Though by now you'd probably have a hard time switching classes. Homework is due today! If you have a physical copy, please hand it in now. If you're turning it in digitally, it's due by midnight tonight. Please type it up – don't scan a paper assignment, as that's kinda the worst of both worlds... If you're still working on it: don't worry too much about whether to stall the ID or EX phases during data hazards.
1/18/2017 CS/COE 1541 term 2174

Forwarding considerations

The hardware
For every forwarding path, you've gotta add more wires and muxes. These muxes are switched when there are data hazards. The ID phase might have fetched the wrong register values, but that's OK: the mux picks the forwarded value instead. But there's an important issue to consider...
[Diagram: the EX stage (ALU) and the MEM stage (Memory), with 32-bit-wide EX→EX and MEM→EX forwarding paths. Caption: "Them's some big wires."]
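Here is a minimal sketch of how the control might switch one of those muxes, assuming a classic five-stage pipeline. The function and parameter names (`select_operand_source`, `ex_mem_dest`, `mem_wb_dest`) are made up for illustration, register 0 is assumed to be hardwired to zero (as in MIPS), and the load-use case that needs a stall is ignored.

```python
def select_operand_source(src_reg, ex_mem_dest, mem_wb_dest):
    """Pick where one ALU operand comes from (hypothetical model).

    src_reg      -- register the instruction now in EX wants to read
    ex_mem_dest  -- register written by the instruction one ahead (None if none)
    mem_wb_dest  -- register written by the instruction two ahead (None if none)
    """
    # Assumption: register 0 always reads as zero, so it is never forwarded.
    if src_reg == 0:
        return "REGFILE"
    # The newest result wins, so check the closer stage first.
    if ex_mem_dest == src_reg:
        return "EX->EX"
    if mem_wb_dest == src_reg:
        return "MEM->EX"
    # No hazard: the value fetched during ID is already correct.
    return "REGFILE"


# add t0, ... followed immediately by an instruction reading t0 (register 8)
print(select_operand_source(8, 8, None))
```

Each ALU operand needs its own copy of this decision, which is part of why the mux and wire count grows so quickly.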

Oh dear
What if we had more pipeline stages? Maybe 5 EX stages and 6 MEM stages? Should we connect every stage to every other stage? If you want "full" forwarding, the interconnect becomes the limiting factor. And what happens when you have more circuitry per stage? The stage takes longer. Which means... your clock has to run slower. Which means... you might end up reducing performance! More circuitry also uses more power. Engineering is all about tradeoffs, and this applies to any design. Eventually you reach a point of diminishing returns, where more complexity doesn't get you any more performance.
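To get a feel for how fast the interconnect grows, here is a rough back-of-the-envelope sketch. It assumes a simplified model that is not from the slides: every EX and MEM stage can produce a value, every EX stage can consume one, each path is 32 bits wide, and there are two ALU operands. The function names are hypothetical.

```python
def forwarding_paths(ex_stages, mem_stages):
    """Count producer-to-consumer paths under "full" forwarding (rough model)."""
    producers = ex_stages + mem_stages  # any EX or MEM stage may hold a result
    consumers = ex_stages               # results are forwarded into EX stages
    return producers * consumers


def forwarding_wires(ex_stages, mem_stages, width=32, operands=2):
    """Count individual wires: one full-width bus per path per ALU operand."""
    return forwarding_paths(ex_stages, mem_stages) * width * operands


# Classic pipeline: just EX->EX and MEM->EX.
print(forwarding_paths(1, 1), forwarding_wires(1, 1))  # 2 128
# The "5 EX stages and 6 MEM stages" case.
print(forwarding_paths(5, 6), forwarding_wires(5, 6))  # 55 3520
```

Under these assumptions, going from 2 stages to 11 multiplies the number of paths by more than 25, and every consumer needs a correspondingly wider mux.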

How stalls work

Watching a stall
Suppose we have an add that depends on an lw.
[Animation: the five-stage pipeline (IF, ID, EX, MEM, WB; hardware labels: Ins. Memory, Decoder, Register File, ALU, Memory, WB) holding sub, add, and lw. The add has to WAIT for the lw.]

How does a stall happen?
If the control detects a stall condition, it does the following: It stops fetching instructions (it doesn't update the PC). It stops clocking the pipeline registers for the stalled stages. The stage right after the stalled instructions is filled with a nop on each stalled cycle. To do that, just change the control signals in the pipeline registers! In this way, the stalled instructions sit still while the instructions ahead of them keep moving. What happens as we make the pipeline deeper? What if we had 6 memory stages? How many cycles would a memory stall cost us? Oh dear.
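The steps above can be sketched as a toy model of one clock tick. This assumes a five-stage pipeline represented as a list `[IF, ID, EX, MEM, WB]` of instruction names and a stall detected in ID; `step` and `stall_id` are hypothetical names, and real hardware does this with control signals rather than lists.

```python
NOP = "nop"


def step(stages, next_instr, stall_id=False):
    """Advance a pipeline [IF, ID, EX, MEM, WB] by one clock.

    Returns (new_stages, fetched); fetched is False when the PC was held.
    """
    if_, id_, ex, mem, _wb = stages
    if stall_id:
        # The PC and the IF/ID registers are not clocked: IF and ID sit still.
        # A nop (bubble) goes into EX; the later stages drain as normal.
        return [if_, id_, NOP, ex, mem], False
    # Normal cycle: everything moves one stage to the right.
    return [next_instr, if_, id_, ex, mem], True


# lw is in EX, the dependent add is in ID, sub is in IF.
pipeline = ["sub", "add", "lw", NOP, NOP]
pipeline, fetched = step(pipeline, "or", stall_id=True)
print(pipeline, fetched)  # ['sub', 'add', 'nop', 'lw', 'nop'] False
pipeline, fetched = step(pipeline, "or")
print(pipeline, fetched)  # ['or', 'sub', 'add', 'nop', 'lw'] True
```

After the one-cycle stall, the add reaches EX just as the lw reaches WB, so the loaded value can be forwarded to it.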

How flushes work

What's a flush?
We saw an example of a flush last time.
[Pipeline diagram, cycles 1-7: blt s0,10,top goes through IF ID EX MEM WB. When the branch resolves (s0 < 10... OOPS!), la a0,done_msg (in ID) and jal printf (in IF) are flushed (POW, BOOM). move a0,s0 then enters the pipeline and goes through IF ID EX MEM WB.]

Watching a flush
Let's watch the previous example.
[Animation: the five-stage pipeline (IF, ID, EX, MEM, WB; hardware labels: Ins. Memory, Decoder, Register File, ALU, Memory, WB) with blt, la, jal, and move passing through; the flushed instructions become nops.]

How do flushes work?
If the control detects a flushing situation: Any "newer" instructions (those fetched AFTER the branch that are already in the pipeline) are transformed into nops. Any "older" instructions (those that came BEFORE the branch) are left alone to finish executing as normal. And just like stalls... as the pipeline gets longer, flushes get costlier. If you have to flush 13 instructions after a wrong branch, well, crap. (Actually, this is exactly what happens in modern CPUs.) Again, it's a balancing act. Do you make the pipeline deeper for speed, but make wrong branches unreasonably costly?
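A flush can be sketched the same way as a toy model. This assumes the pipeline is a list ordered newest-first, `[IF, ID, EX, MEM, WB]`, and that the branch resolves in EX as in the earlier example; `flush_newer` is a hypothetical name.

```python
NOP = "nop"


def flush_newer(stages, branch_index):
    """Turn every instruction newer than the branch into a nop.

    stages[0] is the newest instruction (IF); higher indices are older.
    Returns (new_stages, number_of_instructions_flushed).
    """
    new_stages = [
        NOP if i < branch_index else instr  # newer than the branch: squash it
        for i, instr in enumerate(stages)
    ]
    return new_stages, branch_index


# blt is in EX (index 2); la and jal were fetched after it.
stages, wasted = flush_newer(["jal", "la", "blt", NOP, NOP], 2)
print(stages, wasted)  # ['nop', 'nop', 'blt', 'nop', 'nop'] 2
```

The number of flushed instructions equals the number of stages before the one where the branch resolves, which is why deeper pipelines pay more for each wrong branch.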

Two memories!

Von Neumann vs. Harvard
Historically there were two types of memory arrangements:
Von Neumann: the CPU is connected to a single Memory.
Harvard: the CPU is connected to a separate Program Memory and Data Memory.

Striking a balance
Each memory arrangement has pros and cons. Number of physical memories? Von Neumann wins. Read from two places at once? Harvard wins. Change program code? Von Neumann wins. Caching? Harvard wins. Obviously we only have a single system memory today, but... internally, the CPU pretends like it has two!

Caching
Having one memory for program code and one for data solves the structural hazard of wanting to read/write to two places at once. But there's another big problem: Access time for modern RAM: 12 ns. Cycle length of a 3.2 GHz CPU: about 0.3 ns. This means it would take about 40 cycles to access RAM! These two problems are solved by using caches: smaller but much faster memories integrated into the CPU itself. Caching is extremely important for pipelining – without it, stalls would be the norm, not the exception.
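The arithmetic behind that cycle count is just the access time divided by the cycle length. A small sketch (the function name `cycles_for_access` is made up for illustration):

```python
def cycles_for_access(access_time_ns, clock_ghz):
    """How many clock cycles fit into one memory access."""
    cycle_ns = 1.0 / clock_ghz  # a 1 GHz clock has a 1 ns cycle
    return access_time_ns / cycle_ns


# 12 ns RAM on a 3.2 GHz CPU, using the unrounded cycle length (0.3125 ns).
print(round(cycles_for_access(12, 3.2), 1))  # 38.4
```

Rounding the cycle length to 0.3 ns first gives the slide's figure of 40 cycles; either way, one RAM access costs dozens of cycles.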

Branch prediction

Predicting the future, poorly
We said previously that we can use some kind of statistical analysis to decide whether or not a conditional branch is taken. So we looked at a bunch of programs, and ran them, and watched what happened, and... it was found that on average, conditional branches were taken 2/3 of the time. Based on this info, should we predict that branches will be taken, or not taken?
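One way to reason about the question is to compare the average cost of each fixed guess. This is a simplified sketch with assumptions that are not from the slides: a correct prediction costs nothing, a wrong one costs a fixed number of flushed cycles, and the branch target is available early enough to fetch from it. The name `avg_branch_penalty` is hypothetical.

```python
def avg_branch_penalty(p_taken, flush_cycles, predict_taken):
    """Expected wasted cycles per branch for an always-the-same prediction."""
    # We are wrong whenever the branch does the opposite of our guess.
    p_wrong = (1.0 - p_taken) if predict_taken else p_taken
    return p_wrong * flush_cycles


# Branches taken 2/3 of the time, 2 instructions flushed per wrong guess.
always_taken = avg_branch_penalty(2 / 3, 2, predict_taken=True)
never_taken = avg_branch_penalty(2 / 3, 2, predict_taken=False)
print(always_taken < never_taken)  # True
```

Under these assumptions, guessing "taken" is wrong only 1/3 of the time, so it wastes half as many cycles on average as guessing "not taken".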

Kinds of branches
Branches come in several kinds: loops, conditionals, calculated jumps (switch statements), and virtual method calls. Depending on the program, each of these kinds will be more or less prevalent and behave differently. Depending on the inputs, they can behave very differently! We need something more robust.