Published by Amani Tidball. Modified over 9 years ago.
1
ECE734 VLSI Arrays for Digital Signal Processing
Chapter 3: Parallel and Pipelined Processing
2
ECE734 VLSI Arrays for Digital Signal Processing (C) 1997-2006 by Yu Hen Hu
Basic Ideas
Parallel processing
P1: a1 a2 a3 a4
P2: b1 b2 b3 b4
P3: c1 c2 c3 c4
P4: d1 d2 d3 d4
–Less inter-processor communication
–Complicated processor hardware
Pipelined processing
P1: a1 b1 c1 d1
P2: a2 b2 c2 d2
P3: a3 b3 c3 d3
P4: a4 b4 c4 d4
–More inter-processor communication
–Simpler processor hardware
Legend: the horizontal axis of each chart is time. Colors: different types of operations performed. a, b, c, d: different data streams processed.
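The two charts can be regenerated with a short sketch. It assumes that the digit in each label (a1, a2, ...) stands for the operation type that the slide encodes by color, which the transcript does not state explicitly; the function names are hypothetical.

```python
def parallel_schedule(streams, num_ops):
    """Each processor owns one data stream and performs every operation on it."""
    return {f"P{i + 1}": [f"{s}{op}" for op in range(1, num_ops + 1)]
            for i, s in enumerate(streams)}


def pipelined_schedule(streams, num_ops):
    """Each processor performs a single operation type on every stream in turn."""
    return {f"P{op}": [f"{s}{op}" for s in streams]
            for op in range(1, num_ops + 1)}


if __name__ == "__main__":
    for name, sched in (("parallel", parallel_schedule("abcd", 4)),
                        ("pipelined", pipelined_schedule("abcd", 4))):
        print(name)
        for proc, tasks in sched.items():
            print(" ", proc, " ".join(tasks))
```

Under this reading, a parallel processor must implement all four operation types (more complicated hardware) but never hands data to a neighbor, while a pipelined processor implements one operation type and passes every stream on to the next stage.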
3
Data Dependence
Parallel processing requires NO data dependence between processors.
Pipelined processing will involve inter-processor communication.
(Figure: charts of processors P1 to P4 versus time for the two cases.)
4
Usage of Pipelined Processing
By inserting latches or registers between combinational logic circuits, the critical path can be shortened.
Consequence:
–reduce clock cycle time,
–increase clock frequency.
Suitable for DSP applications that have (infinitely) long data streams.
Method to incorporate pipelining: cut-set retiming
Cut set:
–A cut set is a set of edges of a graph. If these edges are removed from the original graph, the remaining graph splits into two separate graphs.
Retiming:
–The timing of an algorithm is re-adjusted while keeping the partial ordering of execution unchanged, so that the results remain correct.
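The effect of inserting registers on the critical path can be illustrated with a minimal sketch. It assumes a purely linear chain of combinational blocks (not a general graph, so it does not implement cut-set retiming itself); the function name and the delay values are hypothetical.

```python
def min_clock_period(stage_delays, register_after=()):
    """Longest register-to-register combinational delay in a linear chain.

    stage_delays:   delays of the combinational blocks, in signal-flow order.
    register_after: indices i such that a pipeline register follows block i.
    """
    cuts = set(register_after)
    longest = current = 0.0
    for i, delay in enumerate(stage_delays):
        current += delay
        if i in cuts:
            # A register ends the current combinational segment.
            longest = max(longest, current)
            current = 0.0
    return max(longest, current)


if __name__ == "__main__":
    delays = [3, 2, 4]                                       # hypothetical delays
    print(min_clock_period(delays))                          # no pipelining
    print(min_clock_period(delays, register_after=[0]))      # one register
    print(min_clock_period(delays, register_after=[0, 1]))   # fully pipelined
```

Each added register shortens the longest segment, which lowers the clock cycle time at the cost of extra latency. That trade-off is acceptable for long data streams.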
5
Graphic Transpose Theorem
The transfer function of a signal flow graph remains unchanged if
–The direction of each arc is reversed
–The input and output labels are switched.
(Figure: a 3-tap FIR signal flow graph with input x[n], output y[n], coefficients h[0], h[1], h[2] and two $z^{-1}$ delays, shown next to its transpose with x[n] and y[n] exchanged. A node in the figure is labeled u[n] = ?)
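The theorem can be checked numerically for the 3-tap FIR case. The sketch below simulates a direct-form filter and its transposed form and compares their outputs; the function names and test data are hypothetical, and zero initial state is assumed.

```python
def fir_direct(h, x):
    """Direct form: delay line on the input, tap products summed at the output."""
    delay = [0.0] * len(h)                 # delay[i] holds x[n - i]
    y = []
    for sample in x:
        delay = [sample] + delay[:-1]
        y.append(sum(hi * di for hi, di in zip(h, delay)))
    return y


def fir_transposed(h, x):
    """Transposed form: input broadcast to all taps, delays on the partial sums."""
    state = [0.0] * (len(h) - 1)           # state[i] is the register feeding adder i
    y = []
    for sample in x:
        products = [hi * sample for hi in h]
        y.append(products[0] + (state[0] if state else 0.0))
        # Shift the partial sums one register closer to the output.
        state = [products[i + 1] + (state[i + 1] if i + 1 < len(state) else 0.0)
                 for i in range(len(state))]
    return y


if __name__ == "__main__":
    h = [1.0, 2.0, 3.0]
    x = [1.0, 0.0, 0.0, 4.0, 5.0]
    print(fir_direct(h, x))
    print(fir_transposed(h, x))
```

Both functions compute y[n] = h[0]x[n] + h[1]x[n-1] + h[2]x[n-2]; only the placement of the delays differs.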
6
Data broadcast structure
An algorithm transformation may lead to a pipelined structure without adding any delays.
Given an FIR filter SFG:
–Critical path: $T_M + 2T_A$
Use the graph transposition theorem:
–Reverse all arcs
–Reverse input/output
We obtain:
–Critical path: $T_M + T_A$
No additional delay added!
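As a small worked example, the two critical paths can be compared for concrete delays. The sketch assumes a direct-form filter whose adders form a linear chain, so an N-tap filter has N - 1 adders in series after one multiplier; the delay values and function names are hypothetical.

```python
def critical_path_direct(num_taps, t_mul, t_add):
    """Direct form: one multiplier followed by a chain of (num_taps - 1) adders."""
    return t_mul + (num_taps - 1) * t_add


def critical_path_broadcast(t_mul, t_add):
    """Data-broadcast (transposed) form: one multiplier and one adder between registers."""
    return t_mul + t_add


if __name__ == "__main__":
    t_mul, t_add = 10, 2                           # hypothetical delays, arbitrary units
    print(critical_path_direct(3, t_mul, t_add))   # T_M + 2*T_A for 3 taps
    print(critical_path_broadcast(t_mul, t_add))   # T_M + T_A
```

In the broadcast form the critical path no longer grows with the number of taps, because the existing delays already separate the adders.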
7
Fine-grain pipelining
To further reduce $T_M$, the multiplier itself is pipelined into two stages with delays $T_{M1}$ and $T_{M2}$.
Critical path = $\max\{T_{M1}, T_{M2}, T_A\}$
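A brief numeric illustration of the formula, using hypothetical delays (a 10-unit multiplier split into 6-unit and 4-unit stages, and a 2-unit adder):

```python
def critical_path_fine_grain(t_m1, t_m2, t_add):
    """Clock period bound once the multiplier is cut into two pipeline stages."""
    return max(t_m1, t_m2, t_add)


if __name__ == "__main__":
    before = 10 + 2                               # T_M + T_A, multiplier not pipelined
    after = critical_path_fine_grain(6, 4, 2)     # max{T_M1, T_M2, T_A}
    print(before, after)                          # prints: 12 6
```

The benefit is largest when the multiplier splits into stages of roughly equal delay, since the slowest stage sets the clock period.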
8
Block Processing
One form of vectorized parallel processing of DSP algorithms. (Not parallel processing in the most general sense.)
Block vector: [x(3k) x(3k+1) x(3k+2)]
Clock cycle: can be 3 times longer
Original (FIR filter): (equation not preserved)
Rewrite 3 equations at a time: (equations not preserved)
Define block vector: (equation not preserved)
Block formulation: (equation not preserved)
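Since the slide's equations did not survive, the following is only a sketch of the idea under stated assumptions: a generic 3-tap FIR filter y(n) = h[0]x(n) + h[1]x(n-1) + h[2]x(n-2) with hypothetical coefficients, zero input before n = 0, and block size 3. Each slow clock tick k consumes the block [x(3k) x(3k+1) x(3k+2)] and produces three outputs.

```python
def fir_sample_by_sample(h, x):
    """Reference: y(n) = sum_i h[i] * x(n - i), with x(n) = 0 for n < 0."""
    return [sum(h[i] * x[n - i] for i in range(len(h)) if n - i >= 0)
            for n in range(len(x))]


def fir_block(h, x, block_size=3):
    """Produce block_size outputs per slow clock tick k."""
    if len(x) % block_size != 0:
        raise ValueError("pad x to a multiple of block_size")

    def sample(n):
        return x[n] if n >= 0 else 0

    y = []
    for k in range(len(x) // block_size):
        base = block_size * k
        # The outputs of one block depend only on inputs, so hardware could
        # evaluate all of them at the same time within one long clock cycle.
        y.extend(sum(h[i] * sample(base + j - i) for i in range(len(h)))
                 for j in range(block_size))
    return y


if __name__ == "__main__":
    h = [1, 2, 3]                      # hypothetical coefficients
    x = [1, 2, 3, 4, 5, 6]
    print(fir_block(h, x) == fir_sample_by_sample(h, x))
```

The block version performs the same arithmetic as the sample-by-sample version; the gain is that three samples are handled per clock, so the clock cycle can be three times longer for the same sample rate.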
9
Block Processing
10
General approach for block processing
11
Block Processing for IIR Digital Filter
Original formulation: (equation not preserved)
Rewrite: (equations not preserved)
Define block vectors: (equations not preserved)
Then: (equation not preserved)
Time indices
–n: sampling period
–k: clock period (processor)
–k = 2n (one clock period spans two sampling periods)
Note:
–Pipelining: clock period = sampling period.
–Block (parallel): clock period not equal to sampling period.
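The slide's recurrence is not preserved, so the sketch below assumes a first-order filter y(n) = a·y(n-1) + x(n) purely for illustration; it may differ from the filter on the slide. With block size 2, substituting the recurrence into itself gives y(2k) = a·y(2k-1) + x(2k) and y(2k+1) = a²·y(2k-1) + a·x(2k) + x(2k+1), so both outputs of a block depend only on the previous block's last output.

```python
def iir_serial(a, x):
    """Sample-by-sample reference: y(n) = a * y(n - 1) + x(n), with y(-1) = 0."""
    y, prev = [], 0
    for sample in x:
        prev = a * prev + sample
        y.append(prev)
    return y


def iir_block2(a, x):
    """Block size 2: one loop iteration per clock tick k, two samples per tick."""
    if len(x) % 2 != 0:
        raise ValueError("pad x to an even length")
    y, prev_odd = [], 0                    # prev_odd holds y(2k - 1)
    for k in range(len(x) // 2):
        x_even, x_odd = x[2 * k], x[2 * k + 1]
        y_even = a * prev_odd + x_even                    # y(2k)
        y_odd = a * a * prev_odd + a * x_even + x_odd     # y(2k + 1)
        y.extend([y_even, y_odd])
        prev_odd = y_odd
    return y


if __name__ == "__main__":
    x = [1, 1, 1, 1]
    print(iir_serial(2, x))
    print(iir_block2(2, x))
```

The feedback loop in the block version runs once per clock tick k rather than once per sample n, which is the sense in which the clock period differs from the sampling period.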
12
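Block IIR Filter
(Figure: block diagram. A serial-to-parallel converter (S/P) splits x(n) into x(2k) and x(2k+1); two adders (+) produce y(2k) and y(2k+1); two delay elements (D) feed back y(2(k-1)) and y(2(k-1)+1); a parallel-to-serial converter (P/S) merges the outputs into y(n).)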
13
Timing Comparison
Pipelining versus block processing.
(Figure: timing charts against clock ticks 1 to 8, showing input samples x(1), x(2), ..., the MAC, Mul and Add operations, and output samples y(1), y(2), ... In the block-processing chart the input is split into two streams, x(1), x(3), x(5), x(7) and x(2), x(4), x(6), x(8), and each clock tick spans two sample periods.)