ELEC692 VLSI Signal Processing Architecture Lecture 1

Introduction to DSP Systems

Issues of VLSI Signal Processing Architecture
- Performance: area/cost; speed of execution, throughput, and clock rate; power dissipation, or the amount of energy required to perform a given task
- Fixed-point DSP systems: finite-wordlength performance; quantization and roundoff noise
- Special features of DSP systems: real-time throughput requirements; data-driven property

Typical DSP algorithm and applications (I)
- Speech coding and decoding, speech encryption and decryption: cell phones, cordless phones, multimedia computers, secure communications
- Speech recognition: advanced user interfaces, phones, consumer products, machine/human interfaces
- Speech synthesis: advanced user interfaces, consumer products, machine/human interfaces
- Modem algorithms: phones, wireless communications, data/fax modems, secure communications

Typical DSP algorithm and applications (II)
- Noise cancellation: audio applications, wireless communications
- Audio equalization: audio applications
- Image compression and decompression: digital cameras, video, multimedia applications
- Beamforming: navigation, radar/sonar, wireless communications
- Echo cancellation: speakerphones, modems, telephone switches

Issues in wireless system design
- Ubiquitous services put wireless spectrum at a premium
- Current spectral efficiency is far below theoretical limits
- Emerging solutions:
  - Adoption of better spectrum-utilization techniques, e.g., interference cancellation, multiple antennas, MIMO systems
  - Multifunctional, adaptive systems
  - Even higher bit-rate wireless applications: IEEE a, wireless IEEE 1394

Improving spectral efficiency and achieving higher bit rates come at a performance and power cost
Digital baseband processing requirements (from Jan Rabaey of UC Berkeley)
Columns: matched filter, blind MMSE, and exact decorrelator (wide-band CDMA); SVD (FDMA with multiple antennas)
- Performance (bits/sec/Hz): 1, 2, 6
- Multiplications: 124, 496, 230,000, 736
- Memory: 248, 1240, 640,000, 2120
- ALU: 502, 240,000, 800
- Word length: 8-bit, 12-bit, 16-bit

Shannon beats Moore’s Law

Energy plays a critical role
Battery capacity

Programmable processor vs. ASIC
DSP Selection guide for mobile multimedia

DSP computation - Convolution
Convolution is used to describe and analyze linear time-invariant (LTI) systems, which are completely characterized by their unit-sample (or impulse) response h(n). The output for an input x(n) is
$$ y(n) = x(n) * h(n) = \sum_{k=-\infty}^{\infty} x(k)\, h(n-k) $$
- Finite impulse response (FIR): h(n) contains a finite number of nonzero samples, i.e., h(n) is of finite duration.
- Infinite impulse response (IIR): h(n) is of infinite duration.
- A system is causal if y(n_0) depends only on present and past input samples x(k), k <= n_0.
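For finite-length sequences the convolution sum can be evaluated directly. The following is a minimal Python sketch, assuming both sequences start at n = 0 and are zero elsewhere; the function name `convolve` is illustrative, not taken from the lecture.

```python
def convolve(x, h):
    """Linear convolution y(n) = sum_k x(k) h(n-k) of two finite sequences.

    Both sequences are assumed to start at n = 0 and to be zero outside
    their stored samples, so the result has len(x) + len(h) - 1 samples.
    """
    if not x or not h:
        return []
    y = [0] * (len(x) + len(h) - 1)
    for n in range(len(y)):
        for k in range(len(x)):
            # h(n - k) is zero outside 0 <= n - k < len(h)
            if 0 <= n - k < len(h):
                y[n] += x[k] * h[n - k]
    return y


# Convolving a unit sample with h(n) returns h(n) itself.
print(convolve([1], [4, 5, 6]))     # [4, 5, 6]
print(convolve([1, 2, 3], [1, 1]))  # [1, 3, 5, 3]
```

The double loop mirrors the sum term by term; it costs len(x) * len(h) multiply-accumulates, which is what a direct FIR implementation pays per block of outputs.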

DSP computation - Correlation
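Correlation is widely used in digital communication. The correlation of two sequences a(n) and x(n) is
$$ r(n) = \sum_{k=-\infty}^{\infty} a(k)\, x(n+k) $$
It can be described as a convolution as follows:
$$ r(n) = a(-n) * x(n) $$
If a(n) and x(n) have finite length N, i.e., they are nonzero only for n = 0, 1, ..., N-1, the digital correlation operation is given as:
$$ r(n) = \sum_{k=0}^{N-1-n} a(k)\, x(n+k), \qquad n = 0, 1, \ldots, N-1 $$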
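Both forms of the correlation can be checked against each other numerically. This is a minimal Python sketch, assuming length-N sequences indexed from 0; the helper names `convolve`, `correlate`, and `correlate_via_convolution` are hypothetical. Reversing a(n) to form a(-n) shifts the time origin by N-1, so r(n) is sample n + N - 1 of the full convolution.

```python
def convolve(x, h):
    """Full linear convolution of two finite sequences starting at n = 0."""
    y = [0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y


def correlate(a, x):
    """r(n) = sum_{k=0}^{N-1-n} a(k) x(n+k) for n = 0, ..., N-1."""
    n_len = len(a)
    if len(x) != n_len:
        raise ValueError("a and x must have the same length N")
    return [sum(a[k] * x[n + k] for k in range(n_len - n)) for n in range(n_len)]


def correlate_via_convolution(a, x):
    """Same result from r(n) = a(-n) * x(n).

    a[::-1] stores a(-n) shifted right by N-1 samples, so r(n) sits at
    index n + N - 1 of the full convolution.
    """
    n_len = len(a)
    full = convolve(a[::-1], x)
    return full[n_len - 1 : 2 * n_len - 1]


a, x = [1, 2, 3], [4, 5, 6]
print(correlate(a, x))                  # [32, 17, 6]
print(correlate_via_convolution(a, x))  # [32, 17, 6]
```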

DSP computation – Digital Filters
A causal digital filter is characterized by its unit-sample response h(n), its frequency response H(e^{jω}), or a difference equation. A linear, time-invariant, causal filter is given by
$$ y(n) = \sum_{k=1}^{N} a_k\, y(n-k) + \sum_{k=0}^{M-1} b_k\, x(n-k) $$
If a_k = 0 for 1 <= k <= N, we have
$$ y(n) = \sum_{k=0}^{M-1} b_k\, x(n-k) $$
This is a non-recursive M-tap finite impulse response (FIR) filter, where h(k) = b_k. If any a_k is nonzero, the filter is recursive and its unit-sample response has infinite duration; this is referred to as an IIR filter.
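The difference equation can be evaluated sample by sample. This is a minimal Python sketch, assuming zero initial conditions (x(n) = y(n) = 0 for n < 0) and the sign convention of the equation above; `lti_filter` is a hypothetical name. Passing an empty feedback list gives the FIR case, where the impulse response simply reproduces the b_k.

```python
def lti_filter(b, a, x):
    """Evaluate y(n) = sum_{k=1}^{N} a_k y(n-k) + sum_{k=0}^{M-1} b_k x(n-k).

    b = [b_0, ..., b_{M-1}] (feed-forward), a = [a_1, ..., a_N] (feedback).
    Initial conditions are zero: x(n) = y(n) = 0 for n < 0.
    An empty `a` gives the non-recursive (FIR) case.
    """
    y = []
    for n in range(len(x)):
        acc = sum(b[k] * x[n - k] for k in range(len(b)) if n - k >= 0)
        # a[k - 1] holds a_k, the coefficient of y(n - k)
        acc += sum(a[k - 1] * y[n - k] for k in range(1, len(a) + 1) if n - k >= 0)
        y.append(acc)
    return y


impulse = [1, 0, 0, 0, 0]
print(lti_filter([1, 2, 3], [], impulse))  # FIR: h(k) = b_k -> [1, 2, 3, 0, 0]
print(lti_filter([1], [0.5], impulse))     # IIR: 1, 0.5, 0.25, ... never returns to 0
```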

DSP computation – Digital Filters
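Linear-phase FIR filter
- The unit-sample response is symmetric, so only about half the number of multiplications is required.
- For an M-tap linear-phase FIR filter: h(n) = h(M-1-n).
- E.g., a 7-tap linear-phase FIR filter with impulse response h(0) = h(6) = b_0, h(1) = h(5) = b_1, h(2) = h(4) = b_2, h(3) = b_3:
$$ y(n) = b_0 x(n) + b_1 x(n-1) + b_2 x(n-2) + b_3 x(n-3) + b_2 x(n-4) + b_1 x(n-5) + b_0 x(n-6) $$
Grouping the terms that share a coefficient gives
$$ y(n) = b_0 [x(n) + x(n-6)] + b_1 [x(n-1) + x(n-5)] + b_2 [x(n-2) + x(n-4)] + b_3 x(n-3) $$
which needs 4 multiplications per output instead of 7.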
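The folded (pre-add) form can be compared against the direct sum. This is a minimal Python sketch, assuming an odd number of taps M = 2c + 1 and zero initial conditions; `linear_phase_fir` and `direct_fir` are hypothetical names, and the coefficients and input samples are arbitrary test values.

```python
def linear_phase_fir(b, x):
    """Folded odd-length linear-phase FIR filter.

    b = [b_0, ..., b_c] are the distinct coefficients; the impulse response is
    b_0, ..., b_c, ..., b_0 with M = 2c + 1 taps, i.e. h(n) = h(M-1-n).
    Each output uses c + 1 multiplications instead of M.
    """
    c = len(b) - 1
    m_taps = 2 * c + 1

    def sample(i):
        # x(i) with zero initial conditions
        return x[i] if i >= 0 else 0

    y = []
    for n in range(len(x)):
        acc = b[c] * sample(n - c)  # the centre tap has no partner
        for k in range(c):
            # pre-add the two samples that share coefficient b_k
            acc += b[k] * (sample(n - k) + sample(n - (m_taps - 1 - k)))
        y.append(acc)
    return y


def direct_fir(h, x):
    """Reference direct-form FIR: one multiplication per tap."""
    return [sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
            for n in range(len(x))]


b = [1, 2, 3, 4]   # b_0..b_3 of the 7-tap example
h = b + b[-2::-1]  # [1, 2, 3, 4, 3, 2, 1]
x = [3, -1, 4, 1, -5, 9, 2, -6, 5, 3]
print(linear_phase_fir(b, x) == direct_fir(h, x))  # True
```

In hardware the saving is the same: one adder per coefficient pair replaces a multiplier.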

DSP computation – Adaptive Filter
- The filter coefficients change: they are updated at each iteration.
- Used for applications such as echo cancellation, channel equalization, and voiceband modems, among many others.
- An adaptive filter predicts one random process y(n) from observations of another random process x(n) using linear models such as digital filters.
- The coefficients are updated in order to minimize the difference between the filter output and the desired signal; the updating continues until the coefficients converge.
- It consists of two blocks: a general filter block and a coefficient-update block.

DSP computation – LMS Adaptive Filter
Notation:
- W^T(n) = [w_1(n), w_2(n), ..., w_N(n)]: weight vector
- U^T(n) = [u(n), u(n-1), ..., u(n-N+1)]: vector of current and past input samples
- ŷ(n) is the estimated signal and e(n) is the estimation error. We have
$$ \hat{y}(n) = W^T(n)\, U(n), \qquad e(n) = y(n) - \hat{y}(n) $$

In the n-th iteration, the LMS algorithm updates W(n) in the direction that reduces the squared error e^2(n). An LMS adaptive filter consists of an FIR filter block with coefficient vector W^T(n) and input sequence u(n), and a weight-update block.

Weight update algorithm
$$ W(n+1) = W(n) + \mu\, e(n)\, U(n) $$
where μ is the step size.
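The filter block and the weight-update block can be exercised on a small system-identification problem. This is a minimal Python sketch, assuming real-valued signals, zero initial weights, and the update W(n+1) = W(n) + μ e(n) U(n). The "unknown" system `w_true`, the step size `mu = 0.05`, and the seeded uniform input are hypothetical choices for illustration only.

```python
import random


def lms(u, y, n_taps, mu):
    """LMS adaptive filter.

    u: input sequence u(n); y: desired signal y(n); mu: step size.
    Returns the final weight vector W and the error sequence e(n).
    """
    w = [0.0] * n_taps
    taps = [0.0] * n_taps  # U(n) = [u(n), u(n-1), ..., u(n-N+1)]
    errors = []
    for u_n, y_n in zip(u, y):
        taps = [u_n] + taps[:-1]                           # shift in the new sample
        y_hat = sum(wi * ui for wi, ui in zip(w, taps))    # filter block: W^T(n) U(n)
        e = y_n - y_hat                                    # estimation error e(n)
        w = [wi + mu * e * ui for wi, ui in zip(w, taps)]  # W(n+1) = W(n) + mu e(n) U(n)
        errors.append(e)
    return w, errors


# Hypothetical set-up: the desired signal is the output of an "unknown"
# 4-tap FIR system w_true driven by the same white input.
rng = random.Random(0)
w_true = [0.5, -0.3, 0.2, 0.1]
u = [rng.uniform(-1.0, 1.0) for _ in range(5000)]
y = [sum(w_true[k] * u[n - k] for k in range(len(w_true)) if n - k >= 0)
     for n in range(len(u))]

w, errors = lms(u, y, n_taps=4, mu=0.05)
print([round(wi, 3) for wi in w])  # close to w_true once converged
```

Because the desired signal here is noise-free, the weights settle on w_true; with measurement noise they would instead fluctuate around it, by an amount that grows with μ.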

Other common DSP computations
- Motion estimation: used in interframe predictive coding
- Discrete cosine transform (DCT): frequency transform used in image processing
- Fast Fourier transform (FFT): frequency transform used in communication and audio/voice processing
- Vector quantization: used for data compression in speech, image, and video coding
- Viterbi algorithm: error-control coding, used in communication and other data-correction applications
- Decimators and expanders: multirate systems for image compression, digital audio, and adaptive signal processing

Implementation of DSP algorithms
- Many applications can be implemented on a programmable DSP processor or a media microprocessor.
- For some applications, complexity and power issues still require special VLSI architectures or ASICs.
- E.g., an MPEG-2 encoder for HDTV: block matching for motion estimation (ME) needs about 370 GOPs/sec; the 2-D DCT needs 3.84 GOPs/sec.

DSP representation Non-terminating programs and iteration based
Input x(n) -> DSP -> output y(n); the program runs for n = 1 to ∞, one iteration per sample.
- Iteration period: time required to execute one iteration
- Sampling rate (throughput): number of samples processed per second
- Latency: difference between the time an output is generated and the time at which its corresponding input was received
- Critical path delay
- Clock period (the clock rate is not necessarily equal to the sampling rate)
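The "for n = 1 to ∞" structure maps naturally onto a generator that never terminates on its own. This is a minimal Python sketch, assuming the per-iteration body is the first-order recursion y(n) = a y(n-1) + x(n) from the iteration-period example of this lecture; `first_order_iir` is a hypothetical name and the endless input stream is illustrative.

```python
import itertools


def first_order_iir(samples, a):
    """Non-terminating DSP program: one iteration per input sample."""
    y_prev = 0.0  # contents of the delay element, y(n-1)
    for x_n in samples:         # for n = 1 to infinity
        y_n = a * y_prev + x_n  # the computations of one iteration
        y_prev = y_n            # the delay element passes y(n) to iteration n+1
        yield y_n


# An endless input stream; islice takes just the first four outputs.
out = list(itertools.islice(first_order_iir(itertools.repeat(1.0), a=0.5), 4))
print(out)  # [1.0, 1.5, 1.75, 1.875]
```

The generator makes the distinction on the slide concrete: one pass of the loop body is one iteration, while the delay element (`y_prev`) is the only state carried between iterations.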

DSP representation Mathematical formulation
- Behavioral description languages
  - Applicative languages: sets of equations
  - Prescriptive languages: specify the order of assignment statements, e.g., Pascal, C, SystemC
  - Descriptive languages: represent the structure of the DSP system, e.g., VHDL, Verilog
- Graphical representation
  - For investigating and analyzing data-flow properties
  - Exhibits parallelism and data-driven (dependency) properties, and provides insight into space-time tradeoffs
  - Used for mapping DSP algorithms to hardware implementations
  - Block diagram, signal-flow graph (SFG), data-flow graph (DFG), and dependence graph (DG)

Block Diagram
Consists of functional blocks connected by directed edges, each of which represents the data flow from its input block to its output block. Edges may or may not contain delay elements.

Signal Flow Graph (SFG)
- An SFG is a graph whose nodes represent computations or tasks; a directed edge e(j,k) denotes a branch originating at node j and terminating at node k.
- With the input signal at node j and the output signal at node k, e(j,k) denotes a linear transformation from the signal at node j to the signal at node k.
- In digital networks, the edges are usually restricted to constant-gain multipliers or delay elements.
- Adders (summing nodes) are described by a node with multiple incoming edges and one outgoing edge; constant multipliers appear as edge gains.
- Two special kinds of nodes: source nodes (inputs) and sink nodes (outputs).

Example SFG of a direct-form 3-tap FIR filter

Transposition of SFG
Linear SFGs can be transformed into different forms.
- Flow-graph reversal, or transposition, applies to single-input single-output (SISO) systems.
- Transform operations: reverse the direction of all edges, and exchange the input and output nodes, keeping each edge gain or edge delay unchanged.
- The resulting SFG has the same functionality (the same transfer function).
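Transposing the direct-form FIR SFG yields the transposed-form FIR, and the two produce identical outputs. This is a minimal Python sketch, assuming zero initial state in both structures; `fir_direct`, `fir_transposed`, and the test coefficients are hypothetical. The direct form keeps past inputs in its delay line, while the transposed form keeps partial sums in its delay elements.

```python
def fir_direct(h, x):
    """Direct form: the delay line stores past inputs x(n-1), ..., x(n-N+1)."""
    line = [0] * len(h)  # line[k] holds x(n-k)
    y = []
    for sample in x:
        line = [sample] + line[:-1]
        y.append(sum(hk * xk for hk, xk in zip(h, line)))
    return y


def fir_transposed(h, x):
    """Transposed form: the delay elements store partial sums.

    s[k] holds h[k+1] x(n-1) + h[k+2] x(n-2) + ... + h[N-1] x(n-N+1+k).
    """
    n_taps = len(h)
    s = [0] * (n_taps - 1)
    y = []
    for sample in x:
        y.append(h[0] * sample + (s[0] if s else 0))
        # each register takes its own tap product plus the next register's old value
        s = [h[k + 1] * sample + (s[k + 1] if k + 1 < n_taps - 1 else 0)
             for k in range(n_taps - 1)]
    return y


h = [1, 2, 3, 4]
x = [1, 0, 0, 0, 5, -1, 2]
print(fir_direct(h, x) == fir_transposed(h, x))  # True
```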

Data Flow Graph (DFG)
A graph G = (N, E) whose nodes represent computations (or functions, or subtasks) and whose directed edges represent data paths (communication between nodes). Each edge has a non-negative number of delays associated with it.

Data Flow Graph (DFG)
A DFG captures the data-driven property:
- A node can execute only when all its input data are available; nodes whose inputs are ready can execute concurrently.
- A node with multiple input edges can execute only after all its predecessor nodes have executed; the edges thus describe the precedence constraints.
- An edge with zero delay describes an intra-iteration precedence; an edge with nonzero delay describes an inter-iteration precedence.
- DFGs are generally used in high-level synthesis to map concurrent implementations of DSP applications onto parallel hardware: task scheduling and resource allocation.

Example of DFG

Synchronous Data Flow graph (SDFG)
- Special case of a DFG: the number of data samples produced or consumed by each node in each execution is specified a priori
- Applies to both single-rate and multirate systems
- Multirate systems can be unrolled (unfolded) into single-rate systems

Dependence Graph
A directed graph that shows the dependences of the computation: nodes represent computations and edges represent precedence constraints. It is similar to a DFG, except that the nodes of a DFG cover only the computations in one iteration, whereas a DG contains the computations for all iterations. A DFG contains delay elements that store and pass data between iterations, while a DG does not contain delay elements.
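The relationship between a DFG and its DG can be made concrete by unrolling the DFG over a finite number of iterations. This is a minimal Python sketch, assuming a DFG edge (u, v, w) with w delays means node v in iteration n consumes the value produced by node u in iteration n - w; `unroll_to_dg` and the two-node DFG (with its delay placed on edge A -> B) are hypothetical.

```python
def unroll_to_dg(dfg_edges, n_iters):
    """Expand a DFG into a DG covering iterations 0 .. n_iters-1.

    A DFG edge (u, v, w) becomes a plain DG edge from (u, n - w) to (v, n)
    for every iteration n; no delay elements remain in the DG.
    """
    dg_edges = []
    for n in range(n_iters):
        for u, v, w in dfg_edges:
            if n - w >= 0:  # the producer lies inside the unrolled window
                dg_edges.append(((u, n - w), (v, n)))
    return dg_edges


# Hypothetical DFG of y(n) = a*y(n-1) + x(n): the adder A feeds the
# multiplier B through one delay, and B feeds A with no delay.
dfg = [("A", "B", 1), ("B", "A", 0)]
for src, dst in unroll_to_dg(dfg, 3):
    print(src, "->", dst)
```

Each zero-delay DFG edge stays within one iteration of the DG, while each delayed edge becomes an edge between iterations, matching the intra-/inter-iteration distinction.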

Example of a DG

Critical Path of a DFG
- Critical path: the path with the longest computation time among all paths that contain zero delays (i.e., no delay element)
- The minimum clock period of the DSP system is set by the critical path delay: the clock period cannot be shorter than it
- In DSP systems such as filters, the critical path is the longest of the following kinds of zero-delay paths:
  - input to a delay element
  - input to the output
  - delay element to the output
  - delay element to delay element
E.g., [figure: DFG with multipliers, adders, and delay elements D between In and Out]

Critical path comparison
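Direct-form 4-tap FIR (input x(n), output y(n)):
- Critical path = T_mult + (N-1) T_add: one multiplication followed by the whole chain of additions
- Delay elements store input samples: shorter bitwidth
Transposed-form 4-tap FIR:
- Critical path = T_mult + T_add
- Delay elements store partial sums: longer bitwidth
- The fanout of the input is larger, since x(n) drives all multipliers at once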
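The two critical paths can be computed mechanically as the longest path over zero-delay edges of each DFG. This is a minimal Python sketch, assuming hypothetical computation times T_mult = 2 and T_add = 1 time units and modeling each multiplier and adder as a DFG node; `critical_path`, `direct_form`, and `transposed_form` are illustrative names. The input node and its delay line add no computation time, so paths are modeled as starting at the multipliers.

```python
from collections import deque

T_MULT, T_ADD = 2, 1  # hypothetical computation times (time units)


def critical_path(times, edges):
    """Longest computation time over all paths that use only zero-delay edges.

    times: {node: computation time}; edges: list of (src, dst, delays).
    """
    succ = {v: [] for v in times}
    indeg = {v: 0 for v in times}
    for u, v, w in edges:
        if w == 0:  # an edge carrying a delay breaks the path
            succ[u].append(v)
            indeg[v] += 1
    # finish[v]: longest zero-delay path ending at v, including v's own time
    finish = dict(times)
    ready = deque(v for v in times if indeg[v] == 0)
    visited = 0
    while ready:  # Kahn's topological order over the zero-delay subgraph
        u = ready.popleft()
        visited += 1
        for v in succ[u]:
            finish[v] = max(finish[v], finish[u] + times[v])
            indeg[v] -= 1
            if indeg[v] == 0:
                ready.append(v)
    if visited != len(times):
        raise ValueError("zero-delay loop: the DFG is not computable")
    return max(finish.values())


def direct_form(n_taps):
    """N-tap direct-form FIR (N >= 2): products M_k feed the adder chain A_1..A_{N-1}."""
    times = {f"M{k}": T_MULT for k in range(n_taps)}
    times.update({f"A{k}": T_ADD for k in range(1, n_taps)})
    edges = [("M0", "A1", 0)]
    for k in range(1, n_taps):
        edges.append((f"M{k}", f"A{k}", 0))
        if k >= 2:
            edges.append((f"A{k - 1}", f"A{k}", 0))
    return times, edges


def transposed_form(n_taps):
    """N-tap transposed-form FIR (N >= 2): A_k adds M_k to the delayed A_{k+1}.

    A_{N-2} adds the delayed product M_{N-1}; y(n) is taken from A_0.
    """
    times = {f"M{k}": T_MULT for k in range(n_taps)}
    times.update({f"A{k}": T_ADD for k in range(n_taps - 1)})
    edges = []
    for k in range(n_taps - 1):
        edges.append((f"M{k}", f"A{k}", 0))
        upstream = f"A{k + 1}" if k + 1 < n_taps - 1 else f"M{n_taps - 1}"
        edges.append((upstream, f"A{k}", 1))  # one delay element on this edge
    return times, edges


print("direct form:", critical_path(*direct_form(4)))          # 5 = T_mult + 3 T_add
print("transposed form:", critical_path(*transposed_form(4)))  # 3 = T_mult + T_add
```

The transposed form's critical path does not grow with the number of taps, which is the usual reason for choosing it in high-speed designs despite the wider registers.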

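Iteration Period
- Iteration: one execution of all computations of an algorithm
- Iteration period: the time required to execute one iteration
E.g., y(n) = a y(n-1) + x(n). Its DFG has two nodes, A (computation time 2) and B (computation time 4), for the addition and the multiplication by a, connected in a loop through one delay element D. Since A and B must execute one after the other within an iteration, the iteration period is 2 + 4 = 6 time units.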

Loop Bound
- Loop: a directed path that begins and ends at the same node.
- Loop bound: a lower bound on the loop computation time per iteration, defined as t_l/w_l, where t_l is the loop computation time and w_l is the number of delays in the loop.
E.g., in the DFG of y(n) = a y(n-1) + x(n) with nodes A (2) and B (4), A -> B -> A is a loop with t_l = 2 + 4 = 6 and w_l = 1, hence the loop bound is 6/1 = 6.

Loop Bound
Another example: A -> B -> A is a loop with t_l = 2 + 4 = 6 and w_l = 2 (the loop contains 2D), hence the loop bound is 6/2 = 3. On average, one iteration of the loop can be executed every 3 time units: with two delays, the loop splits into two independent sets of precedence constraints (even and odd iterations), which can be processed in an interleaved way.
Another example, with nodes A, B, and C and two loops:
- A -> B -> A: t = 6, w = 2, bound = 3
- A -> B -> C -> A: t = 11, w = 1, bound = 11
Hence the largest loop bound is max{3, 11} = 11.

Iteration Bound
- Critical loop: the loop with the maximum loop bound.
- Iteration bound (T_it): the loop bound of the critical loop,
$$ T_{it} = \max_{l} \left\{ \frac{t_l}{w_l} \right\} $$
- It is not possible to achieve an iteration period lower than the iteration bound, even with infinite processing power.
E.g., a DFG with nodes A, B, C, and D has three loops:
- A -> B -> A: T/W = 7/1 = 7
- A -> B -> C -> A: T/W = 9/2 = 4.5
- B -> C -> D -> B: T/W = 9/3 = 3
Iteration bound = max(7, 4.5, 3) = 7

Algorithms for computing iteration bound
- Longest Path Matrix Algorithm
- Minimum Cycle Mean Algorithm

Longest Path Matrix Algorithm (LPM)
Construct a series of matrices; the iteration bound is found by examining their diagonal elements.
Let d be the number of delay elements in the DFG, and d_i the i-th delay element. Construct matrices L^{(m)}, m = 1, 2, ..., d, such that the entry $l^{(m)}_{i,j}$ is the longest computation time of all paths from delay element d_i to d_j that pass through exactly m-1 delays; $l^{(m)}_{i,j} = -1$ if no such path exists.
L^{(m+1)} can be obtained recursively from L^{(1)} and L^{(m)}:
$$ l^{(m+1)}_{i,j} = \max_{k \in K} \left( l^{(1)}_{i,k} + l^{(m)}_{k,j} \right) $$
where K is the set of k in [1, d] such that $l^{(1)}_{i,k} \neq -1$ and $l^{(m)}_{k,j} \neq -1$. If no such k exists, $l^{(m+1)}_{i,j} = -1$.

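LPM algorithm
The diagonal element $l^{(m)}_{i,i}$ represents the longest computation time of all loops with m delays that contain d_i. The iteration bound is then
$$ T_{it} = \max_{i,\, m \in \{1, 2, \ldots, d\}} \left\{ \frac{l^{(m)}_{i,i}}{m} \right\} $$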
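The LPM construction can be sketched directly from these definitions. This is a minimal Python sketch, assuming an edge with w delays contributes w delay elements in series and that the zero-delay subgraph is acyclic; `lpm_iteration_bound` is a hypothetical name, and -1 marks "no such path" as in the definition. The example DFG (nodes A, B, C with times 2, 4, 5 and the edge delays shown) is an assumed placement consistent with the two-loop loop-bound example.

```python
from fractions import Fraction

NO_PATH = -1  # "no such path"


def lpm_iteration_bound(times, edges):
    """Iteration bound via the longest path matrix (LPM) algorithm.

    times: {node: computation time}; edges: list of (src, dst, delays).
    """
    # Number the delay elements: (edge index, position 1..w) -> i
    delay_ids = {}
    for e_idx, (_, _, w) in enumerate(edges):
        for pos in range(1, w + 1):
            delay_ids[(e_idx, pos)] = len(delay_ids)
    d = len(delay_ids)
    if d == 0:
        return Fraction(0)

    zero_succ = {v: [] for v in times}
    for u, v, w in edges:
        if w == 0:
            zero_succ[u].append(v)
    memo, active = {}, set()

    def reach(v):
        """{x: longest computation time of zero-delay paths v -> ... -> x}."""
        if v in active:
            raise ValueError("zero-delay loop: the DFG is not computable")
        if v not in memo:
            active.add(v)
            best = {v: times[v]}
            for nxt in zero_succ[v]:
                for x, t in reach(nxt).items():
                    best[x] = max(best.get(x, NO_PATH), times[v] + t)
            active.discard(v)
            memo[v] = best
        return memo[v]

    # L^(1): paths from d_i to d_j that pass through no other delay
    L1 = [[NO_PATH] * d for _ in range(d)]
    for (e_idx, pos), i in delay_ids.items():
        _, v, w = edges[e_idx]
        if pos < w:  # next delay on the same edge: no computation in between
            L1[i][delay_ids[(e_idx, pos + 1)]] = 0
            continue
        for x, t in reach(v).items():  # last delay feeds node v
            for f_idx, (src, _, f_w) in enumerate(edges):
                if src == x and f_w > 0:
                    j = delay_ids[(f_idx, 1)]
                    L1[i][j] = max(L1[i][j], t)

    def step(Lm):
        """l^(m+1)_{i,j} = max_k (l^(1)_{i,k} + l^(m)_{k,j}) over valid k."""
        nxt = [[NO_PATH] * d for _ in range(d)]
        for i in range(d):
            for j in range(d):
                for k in range(d):
                    if L1[i][k] != NO_PATH and Lm[k][j] != NO_PATH:
                        nxt[i][j] = max(nxt[i][j], L1[i][k] + Lm[k][j])
        return nxt

    bound, Lm = Fraction(0), L1
    for m in range(1, d + 1):
        for i in range(d):
            if Lm[i][i] != NO_PATH:  # loop through d_i with m delays
                bound = max(bound, Fraction(Lm[i][i]) / m)
        if m < d:
            Lm = step(Lm)
    return bound


times = {"A": 2, "B": 4, "C": 5}
edges = [("A", "B", 0), ("B", "A", 2), ("B", "C", 0), ("C", "A", 1)]
print(lpm_iteration_bound(times, edges))  # 11
```

Each matrix product costs O(d^3), so the whole computation is O(d^4) in the number of delay elements, independent of how many loops the DFG has.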

LPM algorithm (example)
[Figure: example DFG with nodes 1 to 6 and delay elements d1 to d4]
E.g., all paths from d3 to d1 that pass through exactly zero delays: d3 -> 5 -> 3 -> 2 -> 1 -> d1, so $l^{(1)}_{3,1} = 5$.

LPM algorithm (another example)
[Figure: DFG with nodes 1 to 7 and two delay elements d1, d2]
