Dr. Elwin Chandra Monie Department of ECE, RMK Engineering College

Presentation transcript:

VLSI Signal Processing

256K Memory Chip

Applications

Syllabus
Anna University syllabus for VL9253 VLSI Signal Processing.
Text: Keshab K. Parhi, "VLSI Digital Signal Processing Systems: Design and Implementation", Wiley India Pvt. Ltd., 2009.

Need for VLSI DSP System
Processors for DSP systems: general-purpose microprocessors and microcontrollers; general-purpose DSPs; custom processors in VLSI (FPGA, ASIC).
Real-time throughput: sampling rates range from 20 kHz to 500 MHz. Unless the input is buffered, the present sample must be processed before the next sample arrives. Processing rates of up to 100 GOPs/sec are required.

Need for VLSI DSP System (continued)
Data-driven property: systems are synchronized by data and not by a clock, so asynchronous operation is possible.
Reduced size: needed for portable and mobile applications. High-density circuits are available (90 million transistors/cm², increasing according to Moore's Law), and submicron fabrication technology (0.07 µm) is feasible.

Typical DSP Algorithms
Filtering (FIR and IIR filters), with feedback (recursive) or without: $y(n) = \sum_k a_k\, y(n-k) + \sum_k b_k\, x(n-k)$
Convolution and correlation: $y(n) = \sum_k x(k)\, h(n-k)$ and $y(n) = \sum_k a(k)\, x(n+k)$
These run for $n = 1$ to $\infty$: they are non-terminating programs that execute the same code repetitively.
Adaptive filters: LMS algorithm.
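The filtering equation above can be evaluated directly, one output sample at a time. The following is a minimal sketch, assuming zero initial conditions and coefficient lists `a` (feedback, starting at k = 1) and `b` (feed-forward, starting at k = 0); the function name and argument layout are illustrative, not taken from the slides.

```python
def difference_equation(x, a, b):
    """Evaluate y(n) = sum_k a_k*y(n-k) + sum_k b_k*x(n-k) with zero initial state.

    a[0] multiplies y(n-1), a[1] multiplies y(n-2), and so on (assumed layout).
    b[0] multiplies x(n),   b[1] multiplies x(n-1), and so on.
    """
    y = []
    for n in range(len(x)):
        acc = 0.0
        for k, bk in enumerate(b):            # feed-forward (FIR) part
            if n - k >= 0:
                acc += bk * x[n - k]
        for k, ak in enumerate(a, start=1):   # feedback (recursive) part
            if n - k >= 0:
                acc += ak * y[n - k]
        y.append(acc)
    return y


impulse = [1.0, 0.0, 0.0, 0.0]
fir_response = difference_equation(impulse, a=[], b=[1.0, 2.0, 3.0])
iir_response = difference_equation(impulse, a=[0.5], b=[1.0])
print(fir_response, iir_response)
```

With an empty `a` the loop reduces to an FIR filter whose impulse response is just the `b` coefficients; a non-empty `a` makes the response recursive.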

Typical DSP Algorithms (continued)
Transforms: FFT, DCT, DWT. For the FFT, $X(k) = \sum_n x(n)\, e^{-j 2\pi k n / N}$, with real and imaginary components.
Decomposition: SVD, LU matrix factorization, QR decomposition.
Operations involved: arithmetic (multiplication, addition, MAC operation); logic (shifting, barrel shifting); delay; dot product and matrix-vector operations.

Data Flow Graph
A DSP program is often represented using a data flow graph (DFG), which is a directed graph that describes the program. Consider the following IIR filter: $y[n] = x[n] + a\,y[n-1]$

Data Flow Graph (continued)
In the DFG, nodes represent the tasks or computations (multiplication, addition), and each task is associated with its execution time. The edges represent the communication between nodes (A → B), and each edge carries a non-negative number representing its delay. An iteration of a node is the execution of the node exactly once.

Data Flow Graph (continued)
Each edge describes a precedence constraint between two nodes. The constraint is an intra-iteration constraint if the edge has zero delays (i.e. the computations at the nodes joined by the edge occur in the same clock cycle). It is an inter-iteration constraint if the edge has one or more delays (i.e. the computations at the nodes joined by the edge occur in different clock cycles).
$A_1 \to B_1 \Rightarrow A_2 \to B_2 \Rightarrow A_3 \to \dots$

Data Flow Graph (continued)
Critical path: the path with the longest computation time among all paths that contain zero delays. Its length is a lower bound on the clock period, so to achieve high speed the critical path should be shortened. In the example, the critical path length is 26 units.
[Figure: FIR filter DFG from x(n) to y(n) with four delays (D), five multipliers of 10 units each and four adders of 4 units each; path lengths marked 14, 18, 22 and 26 units.]
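The critical-path definition above amounts to a longest-path search over the zero-delay edges of the DFG. The sketch below is one way to compute it, assuming the figure is a direct-form FIR filter whose adders form a single chain; the node names (`m0`..`m4`, `a1`..`a4`) and the edge list are reconstructions for illustration, not taken from the slide.

```python
from functools import lru_cache


def critical_path(times, edges):
    """Longest total node time over paths that use only zero-delay edges.

    times: {node: computation time}; edges: [(src, dst, number_of_delays)].
    Assumes the zero-delay subgraph is acyclic.
    """
    successors = {node: [] for node in times}
    for src, dst, delays in edges:
        if delays == 0:                      # edges with delays break the path
            successors[src].append(dst)

    @lru_cache(maxsize=None)
    def longest_from(node):
        tail = max((longest_from(nxt) for nxt in successors[node]), default=0)
        return times[node] + tail

    return max(longest_from(node) for node in times)


# Assumed topology: five multipliers (10 units) feeding a chain of four adders (4 units).
times = {f"m{i}": 10 for i in range(5)}
times.update({f"a{i}": 4 for i in range(1, 5)})
edges = [("m0", "a1", 0)]
edges += [(f"m{i}", f"a{i}", 0) for i in range(1, 5)]
edges += [(f"a{i}", f"a{i + 1}", 0) for i in range(1, 4)]

print(critical_path(times, edges))  # one multiplier plus four adders
```

The delay-line edges are left out of `edges` because any edge that carries a delay cannot be part of a zero-delay path.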

Loop Bound
A recursive DFG has one or more loops. The loop bound for the L-th loop is defined as $t_L / w_L$, where $t_L$ is the loop computation time and $w_L$ is the number of delays in the loop.
Iteration bound $T_\infty$: the maximum loop bound over all loops in the DFG. The loop that gives the iteration bound is called the critical loop.
The iteration bound determines the minimum critical path of a recursive system represented by that DFG structure. In other words, no matter how you pipeline or retime the DFG, you cannot get a circuit with a critical path lower than the iteration bound.

Example of Iteration Bound (1)
Loop 1: ADBA, loop bound = 4/2
Loop 2: AECBA, loop bound = 5/3
Loop 3: AFCBA, loop bound = 5/4
Critical loop: Loop 1
Iteration bound: $T_\infty = \max\{4/2,\ 5/3,\ 5/4\} = 4/2 = 2$ units of time.
That is the minimum clock period (maximum frequency) this circuit can operate at after pipelining and retiming.
[Figure: DFG with nodes A to F, computation times in parentheses, and delays (2D, D, D, D) on the edges.]
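The example above can be checked with a few lines of arithmetic. This is a small sketch that takes the (computation time, delay count) pairs quoted on the slide as given; the dictionary layout and function name are illustrative.

```python
from fractions import Fraction


def loop_bounds(loops):
    """Map each loop name to its loop bound t_L / w_L as an exact fraction."""
    return {name: Fraction(t, w) for name, (t, w) in loops.items()}


# (loop computation time, number of delays) as listed in the example.
loops = {"Loop 1": (4, 2), "Loop 2": (5, 3), "Loop 3": (5, 4)}

bounds = loop_bounds(loops)
critical_loop = max(bounds, key=bounds.get)   # loop with the largest bound
iteration_bound = bounds[critical_loop]

print(critical_loop, iteration_bound)
```

Using `Fraction` keeps the comparison exact, which matters when two loop bounds are close.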

Longest path matrix algorithm-1
Let $d$ be the number of delays in the DFG and define $K = \{1, 2, \dots, d\}$. Form the matrix $L^{(1)}$ as follows:
$$L^{(1)}_{i,j} = \begin{cases} \max_q\, t^{q}_{d_i \to d_j} & \text{if at least one path exists} \\ -1 & \text{if no such path exists} \end{cases}$$
where $\max_q t^{q}_{d_i \to d_j}$ is the longest computation time among all zero-delay paths $q$ from delay element $d_i$ to delay element $d_j$.

Longest path matrix algorithm-2
Compute the successive matrices
$$L^{(m+1)}_{i,j} = \max_{k \in S_{i,j}} \left( -1,\; L^{(1)}_{i,k} + L^{(m)}_{k,j} \right)$$
in which $S_{i,j} = \{\, k \in K \mid L^{(1)}_{i,k} \neq -1 \ \text{and}\ L^{(m)}_{k,j} \neq -1 \,\}$.
The iteration bound is computed from the diagonal entries:
$$T_\infty = \max_{i,\,m \in K} \frac{L^{(m)}_{i,i}}{m}$$

Longest path matrix algorithm-3
$$L^{(1)} = \begin{pmatrix} -1 & 0 & -1 & -1 \\ 4 & -1 & 0 & -1 \\ 5 & -1 & -1 & 0 \\ 5 & -1 & -1 & -1 \end{pmatrix}$$
$$L^{(2)}_{2,1} = \max_{k \in \{1,2,3,4\}} \left( -1,\; L^{(1)}_{2,k} + L^{(1)}_{k,1} \right)$$

Longest path matrix algorithm-4
With $k \in \{1,2,3,4\}$ throughout:
$L^{(2)}_{2,1} = \max(-1,\ L^{(1)}_{2,k} + L^{(1)}_{k,1}) = \max(-1,\ 0+5) = 5$
$L^{(2)}_{2,2} = \max(-1,\ L^{(1)}_{2,k} + L^{(1)}_{k,2}) = \max(-1,\ 4+0) = 4$
$L^{(2)}_{2,3} = \max(-1,\ L^{(1)}_{2,k} + L^{(1)}_{k,3}) = \max(-1) = -1$
$L^{(2)}_{2,4} = \max(-1,\ L^{(1)}_{2,k} + L^{(1)}_{k,4}) = \max(-1,\ 0+0) = 0$

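Longest path matrix algorithm-5
$$L^{(2)} = \begin{pmatrix} 4 & -1 & 0 & -1 \\ 5 & 4 & -1 & 0 \\ 5 & 5 & -1 & -1 \\ -1 & 5 & -1 & -1 \end{pmatrix} \qquad L^{(3)} = \begin{pmatrix} 5 & 4 & -1 & 0 \\ 8 & 5 & 4 & -1 \\ 9 & 5 & 5 & -1 \\ 9 & -1 & 5 & -1 \end{pmatrix} \qquad L^{(4)} = \begin{pmatrix} 8 & 5 & 4 & -1 \\ 9 & 8 & 5 & 4 \\ 10 & 9 & 5 & 5 \\ 10 & 9 & -1 & 5 \end{pmatrix}$$
$$T_\infty = \max\left\{ \tfrac{4}{2}, \tfrac{4}{2}, \tfrac{5}{3}, \tfrac{5}{3}, \tfrac{5}{3}, \tfrac{8}{4}, \tfrac{8}{4}, \tfrac{5}{4}, \tfrac{5}{4} \right\} = 2$$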
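The matrix recurrence and the diagonal rule from the preceding slides can be sketched as a short program. It assumes $L^{(1)}_{1,3} = -1$ in the starting matrix, which is the value implied by the $L^{(2)}$ entries worked out on the slides; the function names are illustrative.

```python
from fractions import Fraction

NO_PATH = -1


def next_matrix(L1, Lm):
    """Compute L(m+1) from L(1) and L(m); entries stay -1 when no k qualifies."""
    d = len(L1)
    result = [[NO_PATH] * d for _ in range(d)]
    for i in range(d):
        for j in range(d):
            sums = [L1[i][k] + Lm[k][j] for k in range(d)
                    if L1[i][k] != NO_PATH and Lm[k][j] != NO_PATH]
            result[i][j] = max(sums, default=NO_PATH)
    return result


def lpm_iteration_bound(L1):
    """Return max over i and m of L(m)[i][i] / m, or None if the DFG has no loop."""
    d = len(L1)
    Lm, best = L1, None
    for m in range(1, d + 1):
        if m > 1:
            Lm = next_matrix(L1, Lm)
        for i in range(d):
            if Lm[i][i] != NO_PATH:
                ratio = Fraction(Lm[i][i], m)
                best = ratio if best is None else max(best, ratio)
    return best


L1 = [[-1, 0, -1, -1],
      [4, -1, 0, -1],
      [5, -1, -1, 0],
      [5, -1, -1, -1]]

print(next_matrix(L1, L1))
print(lpm_iteration_bound(L1))
```

Only the diagonal entries contribute to the bound, because $L^{(m)}_{i,i}$ is the longest computation time of a loop that passes through exactly $m$ delays and returns to delay element $d_i$.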

Data Dependence Graph
$y(n) = b_0\,x(n) + b_1\,x(n-1) + b_2\,x(n-2)$
[Figure: dependence graph for the 3-tap FIR filter. Inputs x0 to x5 and coefficients b0, b1, b2 feed a grid of nodes; each node computes x' = x, b' = b, y' = y + bx; outputs y0 to y5.]

Pipelining in FIR filters
Pipelining reduces the critical path, which can be used to increase the clock speed or sample speed, or to reduce power consumption. It is done by introducing pipelining latches along the data path.

Pipelining in FIR filters (continued)
Critical path: $T_M + 2T_A \Rightarrow T_M + T_A$

General Method of Pipelining
Pipelining latches can be placed only across a feed-forward cutset of the graph if the functionality of the structure is to be preserved.
Cutset: a set of edges of a graph such that if these edges are removed from the graph, the graph becomes disjoint.
Feed-forward cutset: a cutset in which the data move in the forward direction on all of its edges.
Limitations of pipelining: an increase in latency (the difference in the availability of the first output) and an increase in the number of latches.

General Method of Pipelining (example)
[Figure: three versions of a graph, labelled "Critical path: 4", "Not correct!" and "Feed-forward cutset, critical path: 2".]

Transposition Theorem
Reversing the direction of all edges in a given signal flow graph (SFG) and interchanging the input and output ports preserves the functionality of the system.
Critical path: $T_M + 2T_A \Rightarrow T_M + T_A$
[Figure: 3-tap FIR filter from x(n) to y(n) with coefficients c, b, a and two $z^{-1}$ delays.]
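The theorem can be checked numerically on a 3-tap filter. The sketch below compares a direct-form and a transposed-form evaluation of $y(n) = a\,x(n) + b\,x(n-1) + c\,x(n-2)$, assuming zero initial state; the state variable names (`x1`, `x2`, `s1`, `s2`) are illustrative and do not come from the slide.

```python
def fir_direct(x, a, b, c):
    """Direct form: the delay line holds past inputs x(n-1) and x(n-2)."""
    x1 = x2 = 0
    y = []
    for xn in x:
        y.append(a * xn + b * x1 + c * x2)   # products summed by a chain of two adders
        x2, x1 = x1, xn
    return y


def fir_transposed(x, a, b, c):
    """Transposed form: the delays hold partial sums instead of past inputs."""
    s1 = s2 = 0
    y = []
    for xn in x:
        y.append(a * xn + s1)                # only one adder after the multiplier
        s1, s2 = b * xn + s2, c * xn
    return y


samples = [1, 2, 3, 4, 5]
print(fir_direct(samples, 2, 3, 5))
print(fir_transposed(samples, 2, 3, 5))
```

In the transposed form each output needs one multiplication followed by one addition, which is where the shorter critical path $T_M + T_A$ comes from.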

Fine-Grain Pipelining
A multiplier with a processing time of 10 is split into two units with processing times of 6 and 4.
Critical path: $12 \Rightarrow 6$

Parallel processing FIR Filters
$y(n) = a\,x(n) + b\,x(n-1) + c\,x(n-2)$
$y(3k) = a\,x(3k) + b\,x(3k-1) + c\,x(3k-2)$
$y(3k+1) = a\,x(3k+1) + b\,x(3k) + c\,x(3k-1)$
$y(3k+2) = a\,x(3k+2) + b\,x(3k+1) + c\,x(3k)$
The sample speed is increased because multiple samples are processed at the same time; the clock speed remains the same.
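The three block equations above should produce exactly the same output stream as the sequential filter. The following sketch checks that, assuming samples before n = 0 are zero and the input length is a multiple of 3; the function names are illustrative.

```python
def fir_sequential(x, a, b, c):
    """y(n) = a*x(n) + b*x(n-1) + c*x(n-2), one sample per iteration."""
    def at(n):
        return x[n] if n >= 0 else 0
    return [a * at(n) + b * at(n - 1) + c * at(n - 2) for n in range(len(x))]


def fir_3_parallel(x, a, b, c):
    """Produce y(3k), y(3k+1), y(3k+2) together in each iteration k."""
    if len(x) % 3 != 0:
        raise ValueError("input length must be a multiple of 3")

    def at(n):
        return x[n] if n >= 0 else 0

    y = []
    for k in range(len(x) // 3):
        y0 = a * at(3 * k) + b * at(3 * k - 1) + c * at(3 * k - 2)
        y1 = a * at(3 * k + 1) + b * at(3 * k) + c * at(3 * k - 1)
        y2 = a * at(3 * k + 2) + b * at(3 * k + 1) + c * at(3 * k)
        y.extend([y0, y1, y2])
    return y


samples = [1, 4, 2, 8, 5, 7]
print(fir_3_parallel(samples, 2, 3, 5) == fir_sequential(samples, 2, 3, 5))
```

In hardware the three expressions inside the loop are separate datapaths working in the same clock cycle, which is why three outputs appear per clock period.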

Parallel processing FIR Filters (continued)
Iteration time $= \tfrac{1}{3}(T_M + 2T_A)$
Three sets of resources are used for the 3-parallel system.

Pipelining for Low Power
Propagation delay: $T_{seq} = \dfrac{C_{charge}\,V_0}{k\,(V_0 - V_t)^2}$
Power consumption: $P_{seq} = C_{total}\,V_0^2\,f$
For M-level pipelining, the charging capacitance is reduced to $C_{charge}/M$. Keeping $f$ the same, the supply voltage can be reduced from $V_0$ to $\beta V_0$, where $0 < \beta < 1$:
$P_{pip} = C_{total}\,\beta^2 V_0^2\,f = \beta^2 P_{seq}$
$T_{pip} = \dfrac{(C_{charge}/M)\,\beta V_0}{k\,(\beta V_0 - V_t)^2}$
If the clock period is kept the same, the two delays are equal:
$$\frac{C_{charge}\,V_0}{k\,(V_0 - V_t)^2} = \frac{(C_{charge}/M)\,\beta V_0}{k\,(\beta V_0 - V_t)^2} \quad\Longrightarrow\quad M\,(\beta V_0 - V_t)^2 = \beta\,(V_0 - V_t)^2$$
Solve for $\beta$.

Example on Pipelining
Consider an original 3-tap FIR filter and its fine-grain pipelined version. Assume $T_M = 10$ u.t., $T_A = 2$ u.t., $V_t = 0.6$ V, $V_0 = 5$ V and $C_M = 5C_A$. In the fine-grain pipelined filter, the multiplier is broken into two parts, m1 and m2, with computation times of 6 u.t. and 4 u.t. and capacitances 3 times and 2 times that of an adder, respectively.
(a) What is the supply voltage of the pipelined filter if the clock period remains unchanged?
(b) What is the power consumption of the pipelined filter as a percentage of the original filter?

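Solution
Original: $C_{charge} = C_M + C_A = 6C_A$
Pipelined: $C_{charge} = 3C_A$, so the charging capacitance is halved.
$2\,(5\beta - 0.6)^2 = \beta\,(5 - 0.6)^2$
$\beta = 0.6033$ or $0.0239$ (not valid, since $\beta V_0$ would be below $V_t$)
(a) $V_{pip} = \beta V_0 = 3.0165$ V
(b) $P_{pip} = \beta^2 P_{seq} = 0.364\,P_{seq}$, i.e. 36.4% of the original.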
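The numbers in this solution can be reproduced by solving the delay-matching condition as a quadratic in $\beta$. The sketch below assumes the capacitance ratio $6C_A / 3C_A = 2$ multiplies the left-hand side, i.e. $2(\beta V_0 - V_t)^2 = \beta (V_0 - V_t)^2$, which is the form that yields the quoted $\beta = 0.6033$; the function name and arguments are illustrative.

```python
import math


def pipelined_voltage_scaling(c_seq, c_pip, v0, vt):
    """Solve (c_seq/c_pip) * (beta*v0 - vt)**2 = beta * (v0 - vt)**2 for beta.

    Returns the root for which the scaled supply beta*v0 stays above vt.
    """
    ratio = c_seq / c_pip
    qa = ratio * v0 ** 2
    qb = -(2 * ratio * v0 * vt + (v0 - vt) ** 2)
    qc = ratio * vt ** 2
    root = math.sqrt(qb * qb - 4 * qa * qc)
    candidates = [(-qb - root) / (2 * qa), (-qb + root) / (2 * qa)]
    valid = [beta for beta in candidates if beta * v0 > vt]
    return valid[0]


# Capacitances in units of C_A: 6 for the original filter, 3 after fine-grain pipelining.
beta = pipelined_voltage_scaling(c_seq=6, c_pip=3, v0=5.0, vt=0.6)
print(round(beta, 4), round(beta * 5.0, 4), round(beta ** 2, 3))
```

The smaller root is rejected because it would put the supply voltage below the threshold voltage, where the delay model no longer applies.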

Parallel System for Low Power
Power consumption of an L-parallel system: $P_{par} = (L\,C_{total})\,(\beta V_0)^2\,\dfrac{f}{L} = \beta^2 P_{seq}$
Propagation delay: $T_{seq} = \dfrac{C_{charge}\,V_0}{k\,(V_0 - V_t)^2}$ and $T_{par} = \dfrac{C_{charge}\,\beta V_0}{k\,(\beta V_0 - V_t)^2}$
Setting $L\,T_{seq} = T_{par}$ gives $L\,(\beta V_0 - V_t)^2 = \beta\,(V_0 - V_t)^2$. Solve for $\beta$.

Example on Parallel System
Consider the 4-tap FIR filter shown in Fig. 3.18(a) and its 2-parallel version in Fig. 3.18(b). The two architectures are operated at a sample period of 9 u.t. Assume $T_M = 8$, $T_A = 1$, $V_t = 0.45$ V, $V_0 = 3.3$ V and $C_M = 8C_A$.
(a) What is the supply voltage of the 2-parallel filter?
(b) What is the power consumption of the 2-parallel filter as a percentage of the original filter?

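Solution
Original: $C_{charge} = C_M + C_A = 9C_A$
2-parallel: $C_{charge} = C_M + 2C_A = 10C_A$
$9\,(3.3\beta - 0.45)^2 = 5\beta\,(3.3 - 0.45)^2$
$\beta = 0.6589$ or $0.0282$ (not valid)
(a) $V_{par} = \beta V_0 = 2.1743$ V
(b) $P_{par} = \beta^2 P_{seq} = 0.4341\,P_{seq}$, i.e. 43.41% of the original.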
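The quadratic in this solution can be solved numerically to check the quoted voltage and power figures. The sketch below starts from $T_{par} = L\,T_{seq}$ with different charging capacitances in the two designs, assuming $C_{charge} = 9C_A$ for the original filter (from $C_M = 8C_A$ plus one adder); the function name and arguments are illustrative.

```python
import math


def parallel_voltage_scaling(level, c_seq, c_par, v0, vt):
    """Solve level*c_seq*(beta*v0 - vt)**2 = c_par*beta*(v0 - vt)**2 for beta.

    Returns the root for which the scaled supply beta*v0 stays above vt.
    """
    qa = level * c_seq * v0 ** 2
    qb = -(2 * level * c_seq * v0 * vt + c_par * (v0 - vt) ** 2)
    qc = level * c_seq * vt ** 2
    root = math.sqrt(qb * qb - 4 * qa * qc)
    candidates = [(-qb - root) / (2 * qa), (-qb + root) / (2 * qa)]
    valid = [beta for beta in candidates if beta * v0 > vt]
    return valid[0]


# Capacitances in units of C_A: 9 for the original filter, 10 for the 2-parallel one.
beta_par = parallel_voltage_scaling(level=2, c_seq=9, c_par=10, v0=3.3, vt=0.45)
print(round(beta_par * 3.3, 4), round(beta_par ** 2, 4))
```

With `level=2`, `c_seq=9` and `c_par=10` the equation reduces to $9(\beta V_0 - V_t)^2 = 5\beta (V_0 - V_t)^2$, the form used in the solution.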

Problems & Assignments
Problems: Prob. 2.7.1 (a), Prob. 2.7.4
Assignment:
1) Design a low-pass filter of order 40 with a sample rate of 48 kHz and a cutoff frequency of 10 kHz. Write VHDL/Verilog code and simulate it. Hint: use Matlab to find the coefficients, and verify the filter by testing its impulse response.
2) Implement a 4-tap filter in direct form and in transposed form. Introduce pipelining and compare the performance.