EE3A1 Computer Hardware and Digital Design Lecture 9 Pipelining.


1 EE3A1 Computer Hardware and Digital Design Lecture 9 Pipelining

2 Introduction
• Pipelining
• Technique to speed up hardware
• Used in
  - ASICs
  - Microprocessors

3 Flip flops
• On rising_edge(clk):
  - Value of D at that precise instant is read into memory
  - New value appears at Q delta later
• At all other times
  - Q holds old value
[Figure label: memory bit]

4 Registers
• On rising_edge(clk):
  - Value of D (7 downto 0) at that precise instant is read into memory
  - New value appears at Q (7 downto 0) delta later
• At all other times
  - Q holds old value
[Figure label: memory bits]

5 A Chain of Registers
• Feed in a stream of numbers 8, 3, 7, 4, …
[Figure, before the first rising clock edge: 8]

6 A Chain of Registers
• Feed in a stream of numbers 8, 3, 7, 4, …
[Figure, after the first rising clock edge: 3 8]

7 A Chain of Registers
• Feed in a stream of numbers 8, 3, 7, 4, …
[Figure, after the second rising clock edge: 7 3 8]

8 A Chain of Registers
• Feed in a stream of numbers 8, 3, 7, 4, …
• Function is shift register
• Shift one stage on each clock cycle
[Figure, after the third rising clock edge: 4 7 3 8]

9 This is NOT what happens
• Feed in a stream of numbers 8, 3, 7, 4, …
[Figure, before the first rising clock edge: 8]

10 This is NOT what happens
• Feed in a stream of numbers 8, 3, 7, 4, …
[Figure, after the first rising clock edge: 8]

11 Why not?
• Stage 1 has finite (delta) delay
• Stage 2 is no longer reading by the time stage 1 outputs the 8
[Figure, after the first rising clock edge: 8; labels: Blocked, Stage 1, Stage 2]
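The behaviour of the register chain on the slides above can be checked with a small software model. This is a sketch in Python rather than a hardware description language, and the function names `clock_edge` and `clock_edge_wrong` are illustrative, not taken from the lecture. The assumption it encodes is that every register samples its D input at the same instant, so each stage sees the old output of the stage before it.

```python
def clock_edge(regs, d_in):
    """Correct model: all registers sample their D inputs simultaneously.

    Each stage captures the OLD value of the previous stage, so data
    advances exactly one stage per clock edge.
    """
    return [d_in] + regs[:-1]


def clock_edge_wrong(regs, d_in):
    """Incorrect model: stages updated one after another within one edge.

    Stage 2 reads stage 1 AFTER stage 1 has changed, so the input
    falls straight through the whole chain in a single cycle.
    """
    new = list(regs)
    new[0] = d_in
    for i in range(1, len(new)):
        new[i] = new[i - 1]
    return new


stream = [8, 3, 7, 4]
regs = [None, None, None]          # three-stage chain, contents initially unknown
for value in stream[:3]:           # three rising clock edges
    regs = clock_edge(regs, value)
print(regs)                        # [7, 3, 8]; the 4 is waiting at the input

print(clock_edge_wrong([None, None, None], 8))   # [8, 8, 8]: NOT what happens
```

The first function reproduces the shift-register behaviour; the second shows the fall-through that the finite register delay prevents.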

12 Pipelines
• We add some useful logic between the shift register stages

13 Pipelines
[Figure, before the first rising clock edge: 1st inputs]

14 Pipelines
• We add some useful logic between the shift register stages
[Figure, after the first rising clock edge: 2nd inputs, 1st inputs]

15 Pipelines
• We add some useful logic between the shift register stages
[Figure, after the second rising clock edge: 3rd inputs, 2nd inputs, 1st inputs]

16 Pipelines
• We add some useful logic between the shift register stages
[Figure, after the third rising clock edge: 4th inputs, 3rd inputs, 2nd inputs, 1st inputs]

17 Abbreviated notation
• Registers shown by dashed line

18 Abbreviated notation
• Registers shown by dashed line
[Figure, before the first rising clock edge: 1st inputs]

19 Abbreviated notation
• Registers shown by dashed line
[Figure, after the first rising clock edge: 2nd inputs, 1st inputs]

20 Abbreviated notation
• Registers shown by dashed line
[Figure, after the second rising clock edge: 3rd inputs, 2nd inputs, 1st inputs]

21 Abbreviated notation
• Registers shown by dashed line
• Inputs proceed one stage per clock cycle
[Figure, after the third rising clock edge: 4th inputs, 3rd inputs, 2nd inputs, 1st inputs]

22 Registered Logic
• Register gives clean output
• Output is valid one cycle after corresponding inputs
• Worst case output settling time for this adder is 1.9 ns
• We could use a faster clock
[Figure: c is glitchy; c_reg is clean; clock period 4 ns; settling time 1.9 ns]

23 Measures of Speed
• Latency
  - Time between the inputs appearing and corresponding outputs appearing
  - Latency is 1 clock cycle: 4 ns
• Throughput
  - Rate at which we put new inputs into our circuit
  - 1 / 4 ns = 250 MHz
[Figure: clock period 4 ns]
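The two measures can be written as one-line calculations. This is a minimal sketch; the function names are illustrative, and it assumes one new input is accepted on every clock cycle, as in the registered adder above.

```python
def latency_ns(stages, clock_period_ns):
    """Latency: time from inputs appearing to the corresponding outputs appearing."""
    return stages * clock_period_ns


def throughput_mhz(clock_period_ns):
    """Throughput: one new input per clock cycle, i.e. 1000 / (period in ns) MHz."""
    return 1000 / clock_period_ns


print(latency_ns(1, 4))       # 4 (ns)
print(throughput_mhz(4))      # 250.0 (MHz)
```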

24 Measures of Speed
• Latency measures delay (in seconds):
  - high latency means slow
• Throughput measures rate (in Hz):
  - high throughput means fast
• For this circuit:

25 Turning up the speed
• Now use 2 ns clock
• No problem, but
• Worst case input has only just settled before clock edge
[Figure: 1.9 ns worst case delay; 2 ns clock period]

26 Turning up the speed too high
• Now use 1.8 ns clock
• Answer is sometimes wrong
• Our adder does not add: unacceptable
[Figure: 1.9 ns worst case delay; 1.8 ns clock period]
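The three clock periods considered so far (4 ns, 2 ns and 1.8 ns) can be compared against the adder's worst case delay with a simple check. This is a sketch under simplifying assumptions: the names are hypothetical, times are held as integer picoseconds to avoid rounding, and register setup time and clock skew, which the slides do not discuss, are ignored.

```python
ADDER_WORST_CASE_PS = 1900   # 1.9 ns worst case settling time, in picoseconds


def meets_timing(clock_period_ps, worst_case_delay_ps=ADDER_WORST_CASE_PS):
    """True if the logic is guaranteed to settle before the next clock edge."""
    return clock_period_ps > worst_case_delay_ps


def slack_ps(clock_period_ps, worst_case_delay_ps=ADDER_WORST_CASE_PS):
    """Margin between settling and the clock edge; negative means a violation."""
    return clock_period_ps - worst_case_delay_ps


for period in (4000, 2000, 1800):
    print(period, meets_timing(period), slack_ps(period))
# 4000 True 2100
# 2000 True 100
# 1800 False -100
```

A negative slack corresponds to the 1.8 ns case: the output is sampled before the worst case input has settled, so some results are wrong.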

27 Timing Diagrams
• Notation used on data sheets and in text books
• c is untrustworthy until 1.9 ns after transition
• c is shown as X
[Figure: 1.9 ns worst case delay; 2 ns clock period]

28 Timing Diagrams
• For this clock speed c never becomes trustworthy
• Need to interpret this diagram with care
• If you inspect output of a real device it looks mostly normal, with just a few wrong results
[Figure: 1.9 ns worst case delay; 1.8 ns clock period]

30 Datapaths
• Datapath:
  - Data flows in one end
  - Flows out the other end
  - Is modified on the way
• A simple datapath: an adder tree
  - g <= a + b + d + e

31 Speed of a combinational datapath
• Combinational:
  - No memory; no registers
• Settling time from a to g is
  - (time a → c) plus (time c → g)
  - 1.9 ns + 1.9 ns = 3.8 ns
• Overall settling time is slowest of (a → g, b → g, d → g, e → g)
  - = 3.8 ns
[Figure label: worst case settling time 1.9 ns]
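The settling-time argument is a longest-path calculation over the adder tree, which can be sketched as follows. The signal name `f` for the sum d + e is an assumption (the slides name only c), and every adder is assumed to have the same 1.9 ns worst case delay.

```python
ADDER_DELAY_PS = 1900  # worst case settling time of one adder (1.9 ns)

# Adder tree for g <= a + b + d + e. Each node lists the signals it depends on.
# The name "f" for the second first-level sum is an assumption; the slides only name c.
DEPENDS_ON = {
    "c": ["a", "b"],
    "f": ["d", "e"],
    "g": ["c", "f"],
}


def settling_time_ps(signal):
    """Worst case time for `signal` to settle after the primary inputs change."""
    if signal not in DEPENDS_ON:          # primary input: settles at time 0
        return 0
    slowest_input = max(settling_time_ps(s) for s in DEPENDS_ON[signal])
    return slowest_input + ADDER_DELAY_PS


print(settling_time_ps("c"))   # 1900 ps = 1.9 ns
print(settling_time_ps("g"))   # 3800 ps = 3.8 ns
```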

32 Speed of a Registered datapath
• Outputs appear 1 cycle after corresponding inputs
• Clock edge must come after circuit has settled
• Clock period must be > 3.8 ns; let's use 4 ns
• Latency = 4 ns
• Throughput = 1 / 4 ns = 250 MHz
[Figure labels: this settles 1.9 ns after a or b change; this settles 1.9 ns after c changes; apply clock edge after at least 3.8 ns]

33 Speed of a Pipelined datapath
• We can do better
• c settles after only 1.9 ns
• Catch this value in a register that holds it stable for a cycle
• Can use a 2 ns clock
• How does this help?
[Figure labels: this settles 1.9 ns after a or b change; this settles 1.9 ns after c changes]
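One way to see how the extra register helps is a cycle-by-cycle model of the pipelined adder tree. This is a sketch, not the lecture's own code: the function name and the register names `c_reg`, `f_reg` and `g_reg` are illustrative, and `f` again stands for the sum d + e.

```python
def simulate_pipelined_adder_tree(inputs):
    """Cycle-by-cycle model of g <= a + b + d + e with a register after each adder level.

    `inputs` is a list of (a, b, d, e) tuples, one applied per clock cycle.
    Returns the value of the output register g after each rising clock edge.
    """
    c_reg = f_reg = g_reg = None      # register contents are unknown at start
    outputs = []
    for a, b, d, e in inputs:
        # All registers sample at the same edge, so stage 2 uses the OLD
        # c_reg and f_reg while stage 1 captures the new sums.
        next_g = None if c_reg is None else c_reg + f_reg
        next_c, next_f = a + b, d + e
        c_reg, f_reg, g_reg = next_c, next_f, next_g
        outputs.append(g_reg)
    return outputs


stream = [(1, 2, 3, 4), (10, 20, 30, 40), (5, 5, 5, 5), (0, 0, 0, 0)]
print(simulate_pipelined_adder_tree(stream))   # [None, 10, 100, 20]
```

The sum of the first inputs (10) appears after the second clock edge, while the second set of inputs is already halfway through the datapath: a new result emerges on every edge once the pipeline is full.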

34 Comparison: Sequence of inputs 4 ns clock 2 ns clock 1 Time = 0 1

35 Comparison: Sequence of inputs 4 ns clock 2 ns clock 1 Time = 2 ns 2 1

36 Comparison: Sequence of inputs 4 ns clock 2 ns clock 2 Time = 4 ns 3 2 1 1

37 Comparison: Sequence of inputs 4 ns clock 2 ns clock 2 Time = 6 ns 4 3 2 1

38 Comparison: Sequence of inputs 4 ns clock 2 ns clock 3 Time = 8 ns 5 4 3 2

39 Comparison: Sequence of inputs 4 ns clock 2 ns clock 3 Time = 10 ns 6 5 4 2

40 Comparison: Sequence of inputs 4 ns clock 2 ns clock 3 6 5 4 2 Output valid 2 cycles after inputs Output valid 1 cycle after inputs

41 Comparison: Sequence of inputs 4 ns clock 2 ns clock 3 6 5 4 2 Each item takes 4 ns to traverse datapath Output valid 2 cycles after inputs Output valid 1 cycle after inputs

42 Comparison: Sequence of inputs 4 ns clock 2 ns clock 3 6 5 4 2 Each item takes 4 ns to traverse datapath Latency = 4 ns Output valid 2 cycles after inputs Output valid 1 cycle after inputs

43 Comparison: Sequence of inputs 4 ns clock 2 ns clock 3 6 5 4 2 Each item takes 4 ns to traverse datapath Latency = 4 ns Insert new item every 2 ns Insert new item every 4 ns Output valid 2 cycles after inputs Output valid 1 cycle after inputs

44 Comparison: Sequence of inputs 4 ns clock 2 ns clock 3 6 5 4 2 Each item takes 4 ns to traverse datapath Latency = 4 ns Insert new item every 2 ns Insert new item every 4 ns Throughput = 1 / 2 ns = 500 MHz Throughput = 1 / 4 ns = 250 MHz Output valid 2 cycles after inputs Output valid 1 cycle after inputs

45 Comparison: Sequence of inputs
1-stage pipeline (4 ns clock):
• Output valid 1 cycle after inputs
• Insert new item every 4 ns
• Throughput = 1 / 4 ns = 250 MHz
2-stage pipeline (2 ns clock):
• Output valid 2 cycles after inputs
• Insert new item every 2 ns
• Throughput = 1 / 2 ns = 500 MHz
Both: each item takes 4 ns to traverse datapath; latency = 4 ns

46 Waveforms for 1-stage Pipeline Outputs valid 1 cycle after inputs. 4 ns clock 1-stage

47 Waveforms for 2-stage Pipeline
[Figure labels: 1-stage, 2-stage]

48 Waveforms for 2-stage Pipeline Outputs valid 2 cycles after inputs. Latency is same Throughput is double 1-stage 2-stage

49 Speed of n-Stage Datapath
• n-stage datapath with no pipelining:

50 Speed of 1-Stage Pipeline
• Register input and output
• 1-stage pipeline
• 1-stage is not normally regarded as "true" pipeline

51 Speed of n-Stage Pipeline
• n-stage pipelined datapath
• Clock rate is n times higher
• Throughput is higher by factor of n
• Latency is unchanged
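The three claims for the n-stage pipeline follow from dividing the total delay by n, as the sketch below shows. It assumes the logic splits into stages of equal delay and ignores any register overhead; both are idealisations, and the function name is illustrative.

```python
def pipeline_speed(total_delay_ns, n_stages):
    """Speed of a datapath split into n equal stages.

    Assumes the logic divides into stages of equal delay and ignores
    register overhead (both are idealisations).
    """
    clock_period_ns = total_delay_ns / n_stages
    latency_ns = n_stages * clock_period_ns      # unchanged by pipelining
    throughput_mhz = 1000 / clock_period_ns      # n times higher
    return clock_period_ns, latency_ns, throughput_mhz


print(pipeline_speed(4, 1))   # (4.0, 4.0, 250.0)
print(pipeline_speed(4, 2))   # (2.0, 4.0, 500.0)
```

The two calls reproduce the 4 ns and 2 ns clock cases from the adder-tree comparison.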

52 Data Rate on an n-Stage Pipeline
• Suppose we have m data items to process.
• Time taken to process m items is

53 Numerical example
• 10,000 items to process
• 10 stage pipeline
• Clock rate of 500 MHz (i.e. a clock period of 2 ns)
• Pipeline latency is 10 stages × 2 ns clock period = 20 ns
• It takes 20 ns to fill the pipeline
• Then the answers emerge at a rate of one per clock cycle
• Total time is
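The slide's final figure is not preserved in this transcript, so the calculation below is a sketch under a stated counting convention rather than the lecture's own result. It assumes the first answer emerges after the 10-cycle fill time and the remaining 9,999 answers follow one per cycle. Counting the fill time plus one full cycle per item instead would give 20 ns + 10,000 × 2 ns = 20,020 ns. The function name is illustrative.

```python
def total_time_ns(items, stages, clock_period_ns):
    """Time to process `items` inputs through a pipeline of `stages` stages.

    Assumption: the first result emerges after `stages` cycles (the fill
    time) and the remaining items - 1 results follow one per cycle.
    """
    cycles = stages + (items - 1)
    return cycles * clock_period_ns


pipelined = total_time_ns(items=10_000, stages=10, clock_period_ns=2)
unpipelined = 10_000 * (10 * 2)      # each item occupies the datapath for 20 ns

print(pipelined)                     # 20018 (ns)
print(unpipelined)                   # 200000 (ns)
print(unpipelined / pipelined)       # just under 10
```

For a long stream of items the fill time is negligible, so the speed-up approaches the number of stages.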

54 Summary
• Pipelining places registers at intermediate points in datapath
• This means that new inputs can be inserted before previous inputs have emerged
• n-stage pipeline has n times the throughput of the non-pipelined datapath; latency is unchanged

