Dr. Elwin Chandra Monie Department of ECE, RMK Engineering College


1 VLSI Signal Processing
Dr. Elwin Chandra Monie, Department of ECE, RMK Engineering College

2 256K Memory Chip Dept. of ECE, RMK Engineering College

3 VL7101 VLSI SIGNAL PROCESSING
OBJECTIVES
- To understand the various VLSI architectures for digital signal processing.
- To know the techniques of critical path reduction and algorithmic strength reduction in filter structures.
- To study the performance parameters, viz. area, speed and power.
OUTCOMES
- To be able to design architectures for DSP algorithms.
- To be able to optimize a design in terms of area, speed and power.
- To be able to incorporate pipeline-based architectures in the design.
- To be able to carry out HDL simulation of various DSP algorithms.

4 UNIT I INTRODUCTION (6 periods)
Overview of DSP – FPGA technology – DSP technology requirements – Design implementation.
UNIT II METHODS OF CRITICAL PATH REDUCTION (12 periods)
Binary adders – Binary multipliers – Multiply-accumulator (MAC) and sum of products (SOP) – Pipelining and parallel processing – Retiming – Unfolding – Systolic architecture design.
UNIT III ALGORITHMIC STRENGTH REDUCTION METHODS AND RECURSIVE FILTER DESIGN (9 periods)
Fast convolution – Pipelined and parallel processing of recursive and adaptive filters – Fast IIR filter design.
UNIT IV DESIGN OF PIPELINED DIGITAL FILTERS (9 periods)
Designing FIR filters – Digital lattice filter structures – Bit-level arithmetic architecture – Redundant arithmetic – Scaling and round-off noise.
UNIT V SYNCHRONOUS ASYNCHRONOUS PIPELINING AND PROGRAMMABLE DSP (9 periods)
Numeric strength reduction – Synchronous, wave and asynchronous pipelines – Low power design – Programmable DSPs – DSP architectural features/alternatives for high performance and low power.
TOTAL: 45 PERIODS

5 REFERENCES
1. Keshab K. Parhi, "VLSI Digital Signal Processing Systems: Design and Implementation", John Wiley, Indian Reprint, 2007.
2. U. Meyer-Baese, "Digital Signal Processing with Field Programmable Gate Arrays", Springer, Second Edition, Indian Reprint, 2007.
3. S.Y. Kung, H.J. Whitehouse, T. Kailath, "VLSI and Modern Signal Processing", Prentice Hall, 1995.

6 Applications

7 Need for VLSI DSP System
Processors for DSP systems
- General-purpose microprocessors/microcontrollers
- General-purpose DSPs
- Custom processors in VLSI: FPGA, ASIC
Real-time throughput
- Sampling rates from 20 kHz to 500 MHz
- Unless the input is buffered, each sample must be processed before the next sample arrives
- Processing rates of up to 100 GOPS are required

8 Need for VLSI DSP System …
Data-driven property
- Systems are synchronized by data and not by a clock
- Asynchronous operation is possible
Reduced size
- For portable and mobile applications
- High-density circuits are available: 90 million transistors/cm²
- Density increases according to Moore's Law
- Submicron fabrication technology is feasible: 0.07 µm down to 22 nm

9 Typical DSP Algorithms
Filtering
- FIR and IIR filters: $y(n) = \sum_k a_k\, y(n-k) + \sum_k b_k\, x(n-k)$
- With feedback (recursive) and without feedback
Convolution and correlation
- Convolution: $y(n) = \sum_k x(k)\, h(n-k)$
- Correlation: $y(n) = \sum_k a(k)\, x(n+k)$
- n = 1 to ∞: non-terminating programs that execute the same code repetitively
Adaptive filters
- LMS algorithm
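The filtering equation on this slide can be evaluated directly in software. The following is a minimal Python sketch, not part of the original slides; the function name `difference_equation` and the convention that the list `a` starts at the coefficient of y(n-1) are assumptions made for illustration.

```python
def difference_equation(b, a, x):
    """Evaluate y(n) = sum_k a_k*y(n-k) + sum_k b_k*x(n-k).

    b: feed-forward coefficients b_0, b_1, ... (k starts at 0)
    a: feedback coefficients a_1, a_2, ... (k starts at 1); empty for an FIR filter
    x: input samples; samples before n = 0 are taken as zero
    """
    y = []
    for n in range(len(x)):
        acc = 0.0
        # Feed-forward (non-recursive) part
        for k, bk in enumerate(b):
            if n - k >= 0:
                acc += bk * x[n - k]
        # Feedback (recursive) part: uses past outputs only
        for k, ak in enumerate(a, start=1):
            if n - k >= 0:
                acc += ak * y[n - k]
        y.append(acc)
    return y


impulse = [1.0, 0.0, 0.0, 0.0]
fir = difference_equation([1.0, 2.0, 3.0], [], impulse)  # FIR: no feedback
iir = difference_equation([1.0], [0.5], impulse)         # y(n) = x(n) + 0.5 y(n-1)
```

With an impulse input, the FIR output simply reproduces the coefficients, while the recursive filter keeps producing a decaying response.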

10 Typical DSP Algorithms …
Transforms
- FFT, DCT, DWT
- FFT: $X(k) = \sum_n x(n)\, e^{-j 2\pi k n / N}$, with real and imaginary components
Decomposition
- SVD, LU matrix factorization, QR decomposition
Operations involved
- Arithmetic: multiplication, addition, MAC operation
- Logic: shifting, barrel shifting
- Delay
- Dot product / matrix-vector operations

11 Data Flow Graph
A DSP program is often represented using a Data Flow Graph (DFG), which is a directed graph that describes the program.
Consider the following IIR filter: y[n] = x[n] + a y[n − 1]

12 Data Flow Graph …
- In the DFG, nodes represent the tasks or computations (multiplication/addition)
- Each task is associated with its execution time
- The edges represent the communication between nodes, e.g. A → B
- Associated with each edge is a non-negative number representing its delay
- An iteration of a node is the execution of the node exactly once

13 Data Flow Graph …
- Each edge describes a precedence constraint between two nodes
- The precedence constraint is an intra-iteration constraint if the edge has zero delays (i.e. the computations at the nodes joined by the edge occur in the same clock cycle)
- The precedence constraint is an inter-iteration constraint if the edge has one or more delays (i.e. the computations at the nodes joined by the edge occur in different clock cycles)
- A1 → B1 => A2 → B2 => A3 …
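As an illustration of these definitions, the DFG of the IIR filter y[n] = x[n] + a y[n − 1] can be written down as a small data structure. This is a sketch only: the node names `add` and `mul`, the `Edge` class and the execution times are hypothetical and do not come from the slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    src: str
    dst: str
    delays: int  # number of delay elements (D) on the edge, never negative


# Execution time of each node (hypothetical values, in units of time)
exec_time = {"add": 1, "mul": 2}

# y[n] = x[n] + a*y[n-1]:
#   the adder output y[n] reaches the multiplier through one delay,
#   the multiplier output a*y[n-1] goes straight back to the adder.
edges = [
    Edge("add", "mul", delays=1),
    Edge("mul", "add", delays=0),
]


def constraint_type(edge):
    """Classify the precedence constraint that an edge imposes."""
    return "intra-iteration" if edge.delays == 0 else "inter-iteration"


kinds = {(e.src, e.dst): constraint_type(e) for e in edges}
```

Here the edge from the multiplier to the adder is an intra-iteration constraint, and the delayed edge from the adder to the multiplier is an inter-iteration constraint.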

14 VLSI Performance Metrics
- Speed: highest frequency of operation
- Area: minimum area required
- Power: minimum power to operate
All three may not be satisfied simultaneously.

15 Critical Path
- The critical path is the path with the longest computation time among all paths that contain zero delays
- The critical path length is the lower bound on the clock period
- To achieve high speed, the length of the critical path should be reduced
- In the example on the slide, the critical path length is 26 units
[Figure: DFG with input x(n), output y(n) and delay elements D, annotated with the values 4, 10, 14, 18, 22, 26]
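The definition of the critical path translates into a longest-path search restricted to zero-delay edges. The sketch below uses a hypothetical DFG (one multiplier of 10 units feeding a chain of four adders of 4 units each), chosen only because it reproduces the 26 units quoted on the slide; the actual graph on the slide is not available in the transcript.

```python
from functools import lru_cache

# Hypothetical DFG: one multiplier feeding a chain of four adders.
exec_time = {"m0": 10, "a1": 4, "a2": 4, "a3": 4, "a4": 4}
# (source, destination, number of delays)
edges = [
    ("m0", "a1", 0),
    ("a1", "a2", 0),
    ("a2", "a3", 0),
    ("a3", "a4", 0),
    ("a4", "m0", 1),  # edge with a delay: not part of any zero-delay path
]


def critical_path(exec_time, edges):
    """Longest computation time over all paths made of zero-delay edges.

    Assumes the zero-delay edges form no cycle (a delay-free loop would
    not be computable anyway).
    """
    succ = {node: [] for node in exec_time}
    for src, dst, delays in edges:
        if delays == 0:
            succ[src].append(dst)

    @lru_cache(maxsize=None)
    def longest_from(node):
        # Time of this node plus the longest zero-delay continuation, if any
        return exec_time[node] + max((longest_from(s) for s in succ[node]), default=0)

    return max(longest_from(node) for node in exec_time)


length = critical_path(exec_time, edges)
```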

16 Loop Bound
Loop bound
- A recursive DFG has one or more loops
- The loop bound of the L-th loop is defined as tL / wL, where tL is the loop computation time and wL is the number of delays in the loop
Iteration bound T∞
- The iteration bound is the maximum loop bound over all loops in the DFG
- The loop that gives the iteration bound is called the critical loop
- The iteration bound determines the minimum critical path of a recursive system represented by that DFG structure
- In other words, no matter how you pipeline or retime the DFG, you cannot get a circuit with a critical path shorter than the iteration bound

17 Example of Iteration Bound
[Figure: DFG for the example; recoverable labels: (1), (2), 2D, D, A, B, C, E, F]
Loops
- Loop 1: ADBA, loop bound = 4/2
- Loop 2: AECBA, loop bound = 5/3
- Loop 3: AFCBA, loop bound = 5/4
Critical loop
- Loop 1
Iteration bound
- T∞ = max{4/2, 5/3, 5/4} = 4/2 = 2 units of time
- This is the minimum clock period (and therefore sets the maximum frequency) at which the circuit can operate after pipelining and retiming
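The arithmetic of this example can be checked with a few lines of Python. The loop times and delay counts are taken from the slide; the variable names are illustrative, and exact fractions are used to avoid rounding when comparing loop bounds.

```python
from fractions import Fraction

# (loop name, loop computation time t_L, number of delays w_L), from the slide
loops = [
    ("Loop 1", 4, 2),
    ("Loop 2", 5, 3),
    ("Loop 3", 5, 4),
]

# Loop bound t_L / w_L for every loop
loop_bounds = {name: Fraction(t, w) for name, t, w in loops}

# The critical loop is the one with the largest loop bound
critical_loop = max(loop_bounds, key=loop_bounds.get)
iteration_bound = loop_bounds[critical_loop]
```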

18 Longest path matrix algorithm-1
Let d be the number of delays in the DFG and define K = [1, 2, ..., d]. Form the matrix L(1) as follows:
$L^{(1)}_{i,j} = \begin{cases} \max_q\, t_q^{\,d_i \to d_j} & \text{if at least one path exists} \\ -1 & \text{if no such path exists} \end{cases}$
where $\max_q\, t_q^{\,d_i \to d_j}$ is the longest computation time among all zero-delay paths from delay element $d_i$ to delay element $d_j$.
In general, $L^{(m)}_{i,j}$ is the longest computation time of all paths from $d_i$ to $d_j$ that pass through m-1 delays.

19 Longest path matrix algorithm-2
Compute the successive matrices:
$L^{(m+1)}_{i,j} = \max_{k \in S_{i,j}} \left( -1,\; L^{(1)}_{i,k} + L^{(m)}_{k,j} \right)$
in which $S_{i,j} = \{\, k \in K \mid (L^{(1)}_{i,k} \neq -1) \;\&\; (L^{(m)}_{k,j} \neq -1) \,\}$ and K is the set of integers k in the interval 1 to d.
The iteration bound is computed from the diagonal entries $L^{(m)}_{i,i}$:
$T_\infty = \max_{i,m \in K} \left\{ \dfrac{L^{(m)}_{i,i}}{m} \right\}$

20 Longest path matrix algorithm-3
L(1) = [matrix shown on the slide; not present in the transcript]
$L^{(2)}_{2,1} = \max_{k \in \{1,2,3,4\}} \left( -1,\; L^{(1)}_{2,k} + L^{(1)}_{k,1} \right)$

21 Longest path matrix algorithm-4
With k ∈ {1, 2, 3, 4}:
$L^{(2)}_{2,1} = \max_k ( -1,\; L^{(1)}_{2,k} + L^{(1)}_{k,1} ) = \max(-1,\; 0+5) = 5$
$L^{(2)}_{2,2} = \max_k ( -1,\; L^{(1)}_{2,k} + L^{(1)}_{k,2} ) = \max(-1,\; 4+0) = 4$
$L^{(2)}_{2,3} = \max_k ( -1,\; L^{(1)}_{2,k} + L^{(1)}_{k,3} ) = \max(-1) = -1$
$L^{(2)}_{2,4} = \max_k ( -1,\; L^{(1)}_{2,k} + L^{(1)}_{k,4} ) = \max(-1,\; 0+0) = 0$

22 Longest path matrix algorithm-5
L(2), L(3), L(4) = [matrices shown on the slide; not present in the transcript]
$T_\infty = \max\{4/2,\; 4/2,\; 5/3,\; 5/3,\; 5/3,\; 8/4,\; 8/4,\; 5/4,\; 5/4\} = 2$
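The longest path matrix procedure can be sketched in Python as follows. The matrices themselves did not survive in the transcript, so the `L1` used here is an assumption: it is a 4 × 4 matrix chosen to agree with every entry quoted in the worked steps (for example L(1)2,1 = 4, L(1)2,3 = 0, L(1)3,1 = 5) and with the list of diagonal ratios above. The function names are illustrative.

```python
from fractions import Fraction


def next_matrix(L1, Lm):
    """L(m+1)[i][j] = max over valid k of L(1)[i][k] + L(m)[k][j], else -1."""
    d = len(L1)
    out = [[-1] * d for _ in range(d)]
    for i in range(d):
        for j in range(d):
            for k in range(d):
                # k is valid only if both partial paths exist
                if L1[i][k] != -1 and Lm[k][j] != -1:
                    out[i][j] = max(out[i][j], L1[i][k] + Lm[k][j])
    return out


def iteration_bound(L1):
    """Maximum of L(m)[i][i] / m over i and m = 1..d (None if no loop exists)."""
    d = len(L1)
    Lm, best = L1, None
    for m in range(1, d + 1):
        for i in range(d):
            if Lm[i][i] != -1:
                ratio = Fraction(Lm[i][i], m)
                best = ratio if best is None else max(best, ratio)
        Lm = next_matrix(L1, Lm)
    return best


# Assumed L(1), consistent with the entries used in the worked steps
L1 = [
    [-1, 0, -1, -1],
    [4, -1, 0, -1],
    [5, -1, -1, 0],
    [5, -1, -1, -1],
]
L2 = next_matrix(L1, L1)
t_inf = iteration_bound(L1)
```

Under this assumption, the second row of `L2` matches the four values computed by hand on the previous slide, and the iteration bound evaluates to 2.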

23 Data Dependence Graph
y(n) = b0 x(n) + b1 x(n-1) + b2 x(n-2)
[Figure: data dependence graph of the 3-tap FIR filter, with inputs x0 to x5, coefficients b0 to b2, outputs y0 to y5, and node operations x' = x, b' = b, y' = y + bx]

24 Pipelining in FIR filters
Pipelining introduces latches along the data path in order to:
- reduce the critical path
- increase the clock speed or sample speed
- reduce power consumption

25 Pipelining in FIR filters
Critical path: TM + 2TA => TM + TA

26 Signal Flow Graph
[Figure: block diagram and the corresponding signal flow graph, with the source node and sink node labelled]

27 General Method of Pipelining
Pipelining latches can only be placed across a feed-forward cutset of the graph without affecting the functionality of the structure.
- Cutset: a set of edges of a graph such that if these edges are removed from the graph, the graph becomes disjoint
- Feed-forward cutset: a cutset in which the data move in the forward direction on all of its edges
Limitations of pipelining
- Increase in latency: the difference in the availability of the first output in the pipelined and the sequential system
- Increase in the number of latches

28 General Method of Pipelining
[Figure: example graph with critical path 4; a placement of latches labelled "Not Correct!"; and a placement across a feed-forward cutset, with critical path 2]

29 Transposition Theorem
Reversing the direction of all edges in a given SFG and interchanging the input and output ports preserves the functionality of the system.
Critical path: TM + 2TA => TM + TA
[Figure: FIR filter before and after transposition, with input x(n), output y(n), coefficients a, b, c and delays Z^-1]
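A quick way to see that transposition preserves functionality is to simulate both structures for a 3-tap filter with coefficients a, b, c. The sketch below is illustrative only; the function names and the test values are assumptions, and the register names `s1`, `s2` stand for the two delay elements of the transposed structure.

```python
def fir_direct(coeffs, x):
    """Direct form: delay line on the input, adders chained at the output."""
    a, b, c = coeffs
    x1 = x2 = 0  # x(n-1) and x(n-2)
    y = []
    for xn in x:
        # Output needs one multiplication followed by two additions
        y.append(a * xn + b * x1 + c * x2)
        x2, x1 = x1, xn
    return y


def fir_transposed(coeffs, x):
    """Transposed form: input feeds all multipliers, delays sit between adders."""
    a, b, c = coeffs
    s1 = s2 = 0  # contents of the two delay registers
    y = []
    for xn in x:
        # Output needs one multiplication followed by a single addition
        y.append(a * xn + s1)
        s1 = b * xn + s2  # uses the old s2; available at the next sample
        s2 = c * xn
    return y


x = [1, 2, 3, 4, 5]
coeffs = (2, 3, 5)
direct = fir_direct(coeffs, x)
transposed = fir_transposed(coeffs, x)
```

Both functions return the same sequence; the difference is only in how much arithmetic lies between two registers, which is what shortens the critical path from TM + 2TA to TM + TA.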

30 Fine-Grain Pipelining
A multiplier with a processing time of 10 is split into two units with processing times of 6 and 4.
Critical path: 12 => 6

31 Parallel processing FIR Filters
Sequential system:
y(n) = a x(n) + b x(n-1) + c x(n-2)
Parallel system:
y(3k) = a x(3k) + b x(3k-1) + c x(3k-2)
y(3k+1) = a x(3k+1) + b x(3k) + c x(3k-1)
y(3k+2) = a x(3k+2) + b x(3k+1) + c x(3k)
The sample speed is increased because multiple samples are processed at the same time. The clock speed remains the same.

32 Parallel processing FIR Filters
- Tclock = L Tsample
- Iteration time = Tsample ≥ (1/3)(TM + 2TA)
- 3 sets of resources are used for the 3-parallel system
- 3 samples are processed in 1 clock cycle
- In an L-parallel system, a delay element D produces an effective delay of L clock cycles at the sample rate
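The 3-parallel equations can be checked against the sequential filter with a short simulation. This is a sketch under simple assumptions: the input length is a multiple of 3, samples before n = 0 are zero, and the function names are illustrative.

```python
def fir_sequential(a, b, c, x):
    """y(n) = a x(n) + b x(n-1) + c x(n-2), one sample per step."""
    def at(i):
        return x[i] if i >= 0 else 0
    return [a * at(n) + b * at(n - 1) + c * at(n - 2) for n in range(len(x))]


def fir_3_parallel(a, b, c, x):
    """Produce three outputs per clock cycle; len(x) must be a multiple of 3."""
    def at(i):
        return x[i] if i >= 0 else 0
    y = []
    for k in range(len(x) // 3):  # k counts clock cycles
        y0 = a * at(3 * k) + b * at(3 * k - 1) + c * at(3 * k - 2)
        y1 = a * at(3 * k + 1) + b * at(3 * k) + c * at(3 * k - 1)
        y2 = a * at(3 * k + 2) + b * at(3 * k + 1) + c * at(3 * k)
        y.extend([y0, y1, y2])
    return y


x = [1, 2, 3, 4, 5, 6]
seq = fir_sequential(2, 3, 5, x)
par = fir_3_parallel(2, 3, 5, x)
clock_cycles = len(x) // 3
```

The parallel version computes the same six outputs in two clock cycles, using three copies of the multiply-add hardware per cycle.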

33 Pipelining for Low Power
Propagation delay: $T_{seq} = \dfrac{C_{charge} V_0}{k (V_0 - V_t)^2}$
Power consumption: $P_{seq} = C_{total} V_0^2 f$
Pipelining reduces the critical path and hence the capacitance to be charged/discharged in one clock period.
- For M-level pipelining, $C_{charge}$ is reduced to $C_{charge}/M$, but $C_{total}$ does not change
- If the same clock frequency f is maintained, only a fraction 1/M of the original capacitance is charged/discharged in the same amount of time, so the supply voltage can be reduced to $\beta V_0$ with 0 < β < 1
- $P_{pip} = C_{total} \beta^2 V_0^2 f = \beta^2 P_{seq}$
Propagation delay of the pipelined system: $T_{pip} = \dfrac{(C_{charge}/M)\, \beta V_0}{k (\beta V_0 - V_t)^2}$
If the clock period is kept the same:
$\dfrac{C_{charge} V_0}{k (V_0 - V_t)^2} = \dfrac{(C_{charge}/M)\, \beta V_0}{k (\beta V_0 - V_t)^2}$
$M (\beta V_0 - V_t)^2 = \beta (V_0 - V_t)^2$
Solve for β.

34 Example on Pipelining
Consider an original 3-tap FIR filter and its fine-grain pipelined version. Assume TM = 10 u.t., TA = 2 u.t., Vt = 0.6 V, V0 = 5 V, and CM = 5CA. In the fine-grain pipelined filter, the multiplier is broken into two parts, m1 and m2, with computation times of 6 u.t. and 4 u.t. and capacitances 3 times and 2 times that of an adder, respectively.
(a) What is the supply voltage of the pipelined filter if the clock period remains unchanged?
(b) What is the power consumption of the pipelined filter as a percentage of the original filter?

35 Solution
Original: $C_{charge} = C_M + C_A = 6 C_A$
Pipelined: $C_{charge} = 3 C_A$, so M = 2
$2 (5\beta - 0.6)^2 = \beta (5 - 0.6)^2$
β ≈ 0.6033 or β ≈ 0.0239 (not valid, because βV0 would then be below Vt)
Vpip = βV0 ≈ 3.02 V
Ppip = β² Pseq ≈ 0.364 Pseq (about 36.4% of the original)
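The quadratic in β can also be solved numerically. The sketch below expands M(βV0 − Vt)² = β(V0 − Vt)² into standard quadratic form and keeps only the physically meaningful root; the function name `solve_beta` and its parameter names are assumptions made for illustration.

```python
import math


def solve_beta(ratio, v0, vt):
    """Solve ratio*(beta*v0 - vt)**2 = beta*(v0 - vt)**2 for the valid root.

    Expanded form:
        ratio*v0^2 * beta^2 - (2*ratio*v0*vt + (v0 - vt)^2) * beta + ratio*vt^2 = 0
    Only a root with 0 < beta < 1 and beta*v0 > vt is meaningful, because the
    reduced supply voltage must stay above the threshold voltage.
    """
    qa = ratio * v0 ** 2
    qb = -(2 * ratio * v0 * vt + (v0 - vt) ** 2)
    qc = ratio * vt ** 2
    disc = math.sqrt(qb ** 2 - 4 * qa * qc)
    roots = [(-qb + disc) / (2 * qa), (-qb - disc) / (2 * qa)]
    valid = [r for r in roots if 0 < r < 1 and r * v0 > vt]
    return valid[0]


# Values from the example: M = 2, V0 = 5 V, Vt = 0.6 V
beta = solve_beta(ratio=2, v0=5.0, vt=0.6)
v_pip = beta * 5.0        # supply voltage of the pipelined filter
power_ratio = beta ** 2   # P_pip / P_seq
```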

36 Parallel System for Low Power
- In parallel processing, the charging capacitance does not change, while the total capacitance is increased L times
- To maintain the same sample rate, the clock period is increased to L Tseq
- The charging capacitance therefore has L times longer (L Tseq) to charge/discharge, so the supply voltage can be reduced to βV0
Power consumption of an L-parallel system: $P_{par} = (L\, C_{total}) (\beta V_0)^2 \dfrac{f}{L} = \beta^2 P_{seq}$
Propagation delay: $T_{seq} = \dfrac{C_{charge} V_0}{k (V_0 - V_t)^2}$, $T_{par} = \dfrac{C_{charge}\, \beta V_0}{k (\beta V_0 - V_t)^2}$
For the same sample rate, $L\, T_{seq} = T_{par}$, which gives
$\beta (V_0 - V_t)^2 = L (\beta V_0 - V_t)^2$
Solve for β.

37 Example on Parallel system
Consider a 4-tap FIR filter shown in Fig. 3.18(a) and its 2-parallel version in Fig. 3.18(b). The two architectures are operated at a sample period of 9 u.t. Assume TM = 8 u.t., TA = 1 u.t., Vt = 0.45 V, V0 = 3.3 V, and CM = 8CA.
(a) What is the supply voltage of the 2-parallel filter?
(b) What is the power consumption of the 2-parallel filter as a percentage of the original filter?

38 Solution
Original: $C_{charge} = C_M + C_A = 9 C_A$
2-parallel: $C_{charge} = C_M + 2 C_A = 10 C_A$
$T_{seq} = \dfrac{9 C_A V_0}{k (V_0 - V_t)^2}$, $T_{par} = \dfrac{10 C_A\, \beta V_0}{k (\beta V_0 - V_t)^2}$
$T_{par} = 2\, T_{seq}$
$9 (3.3\beta - 0.45)^2 = 5 \beta (3.3 - 0.45)^2$
β ≈ 0.6589 or β ≈ 0.0282 (not valid)
Vpar = βV0 ≈ 2.17 V
Ppar = β² Pseq ≈ 0.434 Pseq (about 43.4% of the original)
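As a cross-check, β can be found directly from the delay model by bisection instead of solving the quadratic by hand. This is a sketch: the function names are hypothetical, capacitances are expressed in units of CA, and the constant k is set to 1 because it cancels in the comparison.

```python
def delay(c_charge, v, vt, k=1.0):
    """Propagation delay model from the slides: C_charge * V / (k (V - Vt)^2)."""
    return c_charge * v / (k * (v - vt) ** 2)


def solve_beta_parallel(L, c_seq, c_par, v0, vt, tol=1e-9):
    """Find beta such that delay_par(beta*v0) = L * delay_seq(v0) by bisection."""
    target = L * delay(c_seq, v0, vt)
    # For beta*v0 > vt the delay decreases monotonically as beta grows
    lo, hi = vt / v0 + 1e-9, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if delay(c_par, mid * v0, vt) > target:
            lo = mid  # too slow: the supply voltage must be higher
        else:
            hi = mid
    return (lo + hi) / 2


# Values from the example: L = 2, C_seq = 9 CA, C_par = 10 CA
beta = solve_beta_parallel(L=2, c_seq=9, c_par=10, v0=3.3, vt=0.45)
v_par = beta * 3.3        # supply voltage of the 2-parallel filter
power_ratio = beta ** 2   # P_par / P_seq
```

Bisection only ever searches above Vt/V0, so the invalid root of the quadratic is excluded by construction.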

39 Problems & Assignments
Prob (a) Prob
Assignment
1) Design a low-pass filter with a sample rate of 48 kHz, order 40 and a cut-off frequency of 10 kHz. Write VHDL/Verilog code and simulate it. Hint: use MATLAB to find the coefficients, and verify the filter functionality by testing the impulse response.
2) Implement a 4-tap filter in direct form and in transposed form. Introduce pipelining and compare the performance.

