EE3A1 Computer Hardware and Digital Design Lecture 9 Pipelining.



Introduction
- Pipelining is a technique to speed up hardware
- Used in:
  - ASICs
  - Microprocessors

Flip-flops
- On rising_edge(clk):
  - The value of D at that precise instant is read into memory
  - The new value appears at Q a delta delay later
- At all other times:
  - Q holds its old value
- [Figure: D flip-flop, labelled "Memory bit"]

Registers
- On rising_edge(clk):
  - The value of D(7 downto 0) at that precise instant is read into memory
  - The new value appears at Q(7 downto 0) a delta delay later
- At all other times:
  - Q holds its old value
- [Figure: 8-bit register, labelled "Memory bits"]

A Chain of Registers
- Feed in a stream of numbers: 8, 3, 7, 4, …
- Before the first rising clock edge: 8
- After the first rising clock edge: 3 8
- After the second rising clock edge: 7 3 8
- After the third rising clock edge: every value has moved one more stage along
- The function is a shift register
- It shifts one stage on each clock cycle
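
The shifting behaviour above can be sketched in a few lines of Python. This is only an illustrative model, not part of the lecture: the three-stage chain length, the `None` initial contents and the name `clock_edge` are assumptions made for the example.

```python
def clock_edge(stages, d_in):
    """Return the register contents after one rising clock edge.

    Every stage samples its input at the same instant, so each stage
    takes the value its predecessor held *before* the edge.
    """
    return [d_in] + stages[:-1]


stages = [None, None, None]      # three registers, contents initially unknown
history = []
for value in [8, 3, 7, 4]:       # the input stream from the slide
    stages = clock_edge(stages, value)
    history.append(list(stages))

for edge, contents in enumerate(history, start=1):
    print(f"after edge {edge}: {contents}")
```

Each printed row lists the stages from first to last, so the 8 is seen moving exactly one stage to the right per clock edge.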

This is NOT what happens
- Feed in a stream of numbers: 8, 3, 7, 4, …
- Before the first rising clock edge: 8
- After the first rising clock edge: 8

Why not?
- Stage 1 has a finite (delta) delay
- Stage 2 is no longer reading by the time stage 1 outputs the 8
- [Figure: after the first rising clock edge, the 8 is held by stage 1 and blocked at stage 2]
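
The difference between the two pictures can be illustrated with a small Python model. It is a sketch under assumptions: two stages initialised to 0, and the function names `edge_simultaneous` and `edge_fall_through` are invented for the example. The first function models what real edge-triggered registers do; the second models the incorrect "fall-through" picture.

```python
def edge_simultaneous(stages, d_in):
    """Correct model: all stages sample their old inputs at the same instant."""
    old = list(stages)                 # snapshot taken at the clock edge
    return [d_in] + old[:-1]


def edge_fall_through(stages, d_in):
    """Incorrect model: stages are updated one after another, so a new
    value ripples through every stage on a single edge."""
    new = list(stages)
    carry = d_in
    for i in range(len(new)):
        new[i] = carry                 # stage i is overwritten first...
        carry = new[i]                 # ...and its NEW value feeds stage i + 1
    return new


start = [0, 0]                         # assumed initial register contents
good = edge_simultaneous(start, 8)     # the 8 reaches stage 1 only
bad = edge_fall_through(start, 8)      # the 8 wrongly reaches both stages
print(good, bad)
```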

Pipelines
- We add some useful logic between the shift register stages
- Before the first rising clock edge: 1st inputs
- After the first rising clock edge: 2nd inputs, 1st inputs
- After the second rising clock edge: 3rd inputs, 2nd inputs, 1st inputs
- After the third rising clock edge: 4th inputs, 3rd inputs, 2nd inputs, 1st inputs

Abbreviated notation
- Registers are shown by a dashed line
- Before the first rising clock edge: 1st inputs
- After the first rising clock edge: 2nd inputs, 1st inputs
- After the second rising clock edge: 3rd inputs, 2nd inputs, 1st inputs
- After the third rising clock edge: 4th inputs, 3rd inputs, 2nd inputs, 1st inputs
- Inputs proceed one stage per clock cycle

Registered Logic
- The register gives a clean output
- The output is valid one cycle after the corresponding inputs
- The worst-case output settling time for this adder is 1.9 ns
- We could use a faster clock
- [Figure: waveforms with a 4 ns clock and 1.9 ns settling time; c is glitchy, c_reg is clean]

Measures of Speed
- Latency
  - Time between the inputs appearing and the corresponding outputs appearing
  - Latency is 1 clock cycle: 4 ns
- Throughput
  - Rate at which we put new inputs into our circuit
  - 1 / 4 ns = 250 MHz
- [Figure: waveform with a 4 ns clock period]

Measures of Speed
- Latency measures delay (in seconds):
  - High latency means slow
- Throughput measures rate (in Hz):
  - High throughput means fast
- For this circuit: latency = 4 ns, throughput = 250 MHz
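
As a quick check of the two measures, the following Python sketch computes them from the clock period. The function names and the `stages` parameter are assumptions for illustration; the model assumes one new input per clock cycle and a latency of one clock cycle per register stage.

```python
def latency_ns(clock_period_ns, stages=1):
    """Delay from inputs appearing to the corresponding outputs appearing."""
    return stages * clock_period_ns


def throughput_mhz(clock_period_ns):
    """Rate at which new inputs are accepted: one per clock cycle.

    1 / (period in ns) is a rate in GHz, so multiply by 1000 for MHz.
    """
    return 1000.0 / clock_period_ns


print(latency_ns(4.0), "ns")        # registered adder with a 4 ns clock
print(throughput_mhz(4.0), "MHz")
```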

Turning up the speed
- Now use a 2 ns clock
- No problem, but in the worst case the register input has only just settled before the clock edge
- [Figure: 2 ns clock period, 1.9 ns worst-case delay]

Turning up the speed too high
- Now use a 1.8 ns clock
- The answer is sometimes wrong
- Our adder does not add: unacceptable
- [Figure: 1.8 ns clock period, 1.9 ns worst-case delay]
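
The three clock choices discussed so far can be compared with a minimal timing check. This is a simplified sketch: it only compares the clock period with the 1.9 ns worst-case delay and ignores register setup time and clock skew, which a real timing analysis would include. The names `meets_timing` and `slack_ns` are assumptions for the example.

```python
WORST_CASE_DELAY_NS = 1.9    # worst-case adder settling time from the slides


def slack_ns(clock_period_ns, worst_case_delay_ns=WORST_CASE_DELAY_NS):
    """Spare time between the logic settling and the next clock edge."""
    return clock_period_ns - worst_case_delay_ns


def meets_timing(clock_period_ns, worst_case_delay_ns=WORST_CASE_DELAY_NS):
    """True if the logic has always settled before the next clock edge."""
    return slack_ns(clock_period_ns, worst_case_delay_ns) >= 0


results = {period: meets_timing(period) for period in (4.0, 2.0, 1.8)}
for period, ok in results.items():
    print(f"{period} ns clock: {'OK' if ok else 'too fast'}")
```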

Timing Diagrams
- Notation used on data sheets and in textbooks
- c is untrustworthy until 1.9 ns after a transition
- Untrustworthy c is shown as X
- [Figure: timing diagram with a 2 ns clock and 1.9 ns worst-case delay]

Timing Diagrams
- For this clock speed c never becomes trustworthy
- Need to interpret this diagram with care
- If you inspect the output of a real device it looks mostly normal, with just a few wrong results
- [Figure: timing diagram with a 1.8 ns clock and 1.9 ns worst-case delay]


Datapaths
- Datapath:
  - Data flows in one end
  - Flows out the other end
  - Is modified on the way
- A simple datapath: an adder tree
  - g <= a + b + d + e

Speed of a combinational datapath
- Combinational:
  - No memory; no registers
- Settling time from a to g is
  - (time a → c) plus (time c → g)
  - 1.9 ns + 1.9 ns = 3.8 ns
- Overall settling time is the slowest of (a → g, b → g, d → g, e → g)
  - = 3.8 ns
- [Figure: adder tree, each adder with a worst-case settling time of 1.9 ns]
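
The "slowest path" rule can be expressed as a short recursive calculation. In this sketch the intermediate signal name `f` (the sum d + e) is an assumption, since the transcript only names `c`; every adder is given the same 1.9 ns worst-case delay, as in the slides.

```python
ADDER_DELAY_NS = 1.9

# Each computed signal maps to the signals that drive it through one adder.
# "f" is a hypothetical name for the intermediate sum d + e.
DRIVERS = {
    "c": ["a", "b"],
    "f": ["d", "e"],
    "g": ["c", "f"],
}


def settling_time_ns(signal):
    """Worst-case time for `signal` to settle after the inputs change."""
    if signal not in DRIVERS:          # a primary input settles at time 0
        return 0.0
    slowest_driver = max(settling_time_ns(s) for s in DRIVERS[signal])
    return ADDER_DELAY_NS + slowest_driver


print(f"c settles after {settling_time_ns('c'):.1f} ns")
print(f"g settles after {settling_time_ns('g'):.1f} ns")
```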

Speed of a Registered datapath
- Outputs appear 1 cycle after the corresponding inputs
- The clock edge must come after the circuit has settled
- The clock period must be > 3.8 ns; let's use 4 ns
- Latency = 4 ns
- Throughput = 1 / 4 ns = 250 MHz
- [Figure labels: first adder settles 1.9 ns after a or b change; second adder settles 1.9 ns after c changes; apply the clock edge after at least 3.8 ns]

Speed of a Pipelined datapath
- We can do better
- c settles after only 1.9 ns
- Catch this value in a register that holds it stable for a cycle
- Can use a 2 ns clock
- How does this help?
- [Figure labels: first adder settles 1.9 ns after a or b change; second adder settles 1.9 ns after c changes]

Comparison: Sequence of inputs
- [Animation: successive items entering both datapaths at Time = 0, 2, 4, 6, 8 and 10 ns]
- 1-stage pipeline (4 ns clock):
  - Insert a new item every 4 ns
  - Output valid 1 cycle after inputs
  - Throughput = 1 / 4 ns = 250 MHz
- 2-stage pipeline (2 ns clock):
  - Insert a new item every 2 ns
  - Output valid 2 cycles after inputs
  - Throughput = 1 / 2 ns = 500 MHz
- In both cases each item takes 4 ns to traverse the datapath: latency = 4 ns
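
The comparison can be reproduced with a small cycle-by-cycle model of the adder tree g <= a + b + d + e. This is an illustrative sketch, not the lecture's own code: the function names, the tuple layout and the sample input values are assumptions, and time is counted from the clock edge on which an item is loaded into the input registers.

```python
def run_one_stage(items, clock_ns=4.0):
    """Registered datapath: g is captured one 4 ns cycle after the inputs."""
    return [(k * clock_ns, (k + 1) * clock_ns, sum(item))
            for k, item in enumerate(items)]


def run_two_stage(items, clock_ns=2.0):
    """Pipelined datapath with a register between the two adder levels.

    Returns (time_in_ns, time_out_ns, g) for each item.
    """
    stage1 = None          # (t_in, (a, b, d, e)) held in the input registers
    stage2 = None          # (t_in, c, f) held in the intermediate registers
    results = []
    stream = list(items) + [None, None]    # two extra edges drain the pipeline
    for edge, new_item in enumerate(stream):
        now = edge * clock_ns
        # All registers sample at the same instant, so every next value is
        # computed from what was held *before* this edge.
        if stage2 is not None:
            t_in, c, f = stage2
            results.append((t_in, now, c + f))     # output register loads g
        if stage1 is not None:
            t_in, (a, b, d, e) = stage1
            next_stage2 = (t_in, a + b, d + e)     # first-level adders
        else:
            next_stage2 = None
        stage1 = (now, new_item) if new_item is not None else None
        stage2 = next_stage2
    return results


items = [(1, 2, 3, 4), (5, 6, 7, 8), (9, 10, 11, 12)]   # arbitrary test data
one_stage = run_one_stage(items)
two_stage = run_two_stage(items)
print(one_stage)
print(two_stage)
```

In both result lists the output time is 4 ns after the input time, but the pipelined version accepts and delivers an item every 2 ns instead of every 4 ns.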

Waveforms for 1-stage Pipeline
- Outputs valid 1 cycle after inputs
- [Figure: 1-stage waveforms with a 4 ns clock]

Waveforms for 2-stage Pipeline
- Outputs valid 2 cycles after inputs
- Latency is the same
- Throughput is double
- [Figure: 1-stage and 2-stage waveforms shown together]

Speed of n-Stage Datapath
- n-stage datapath with no pipelining:
- [Figure and equations not preserved in the transcript]

Speed of 1-Stage Pipeline
- Register the input and output
- This is a 1-stage pipeline
- A 1-stage pipeline is not normally regarded as a "true" pipeline

Speed of n-Stage Pipeline
- n-stage pipelined datapath
- Clock rate is n times higher
- Throughput is higher by a factor of n
- Latency is unchanged
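
The relationships stated above can be written out as follows. This is a reconstruction under assumptions, not the slide's own equations: n is the number of stages, t is an assumed symbol for the worst-case delay of one stage, every stage is taken to have the same delay, register overheads are ignored, and the clock is run at its minimum period.

```latex
% Assumed symbols: n = number of stages, t = worst-case delay of one stage
\begin{align*}
\text{No pipelining:}\quad
  & T_{\text{clk}} \ge n\,t, & \text{latency} &= n\,t, & \text{throughput} &= \frac{1}{n\,t} \\
\text{$n$-stage pipeline:}\quad
  & T_{\text{clk}} \ge t,    & \text{latency} &= n\,t, & \text{throughput} &= \frac{1}{t}
\end{align*}
```

For the adder tree with n = 2 and t rounded up to 2 ns, these give the 250 MHz and 500 MHz figures used earlier, with a latency of 4 ns in both cases.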

Data Rate on an n-Stage Pipeline
- Suppose we have m data items to process
- Time taken to process m items is: [equation not preserved in the transcript]

Numerical example
- 10,000 items to process
- 10-stage pipeline
- Clock rate of 500 MHz (i.e. a clock period of 2 ns)
- Pipeline latency is 10 stages × 2 ns clock period = 20 ns
- It takes 20 ns to fill the pipeline
- Then the answers emerge at a rate of one per clock cycle
- Total time is the 20 ns fill time plus one clock cycle for each remaining answer
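
The slide's own expression for the total time is not preserved in the transcript, so the following Python sketch is an assumption about how to count: the first answer emerges after the fill time of n clock cycles, and each of the remaining m - 1 answers emerges one cycle later. The function name `total_time_ns` is hypothetical.

```python
def total_time_ns(items, stages, clock_period_ns):
    """Time until the last of `items` results has emerged from the pipeline."""
    fill_ns = stages * clock_period_ns              # first answer emerges
    remaining_ns = (items - 1) * clock_period_ns    # one more answer per cycle
    return fill_ns + remaining_ns


total = total_time_ns(items=10_000, stages=10, clock_period_ns=2)
print(total, "ns")    # 20 ns fill time + 9,999 further cycles of 2 ns
```

Under this counting the total is 20,018 ns, roughly 20 µs. Counting a full cycle for every item after the fill time would give 20,020 ns instead; either way the fill time is negligible when the number of items is much larger than the number of stages.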

Summary
- Pipelining places registers at intermediate points in the datapath
- This means that new inputs can be inserted before previous inputs have emerged
- An n-stage pipeline has n times the throughput of the non-pipelined datapath; its latency is unchanged