1 ECE734 VLSI Arrays for Digital Signal Processing Chapter 3 Parallel and Pipelined Processing.

Presentation transcript:

1 ECE734 VLSI Arrays for Digital Signal Processing Chapter 3 Parallel and Pipelined Processing

2 ECE734 VLSI Arrays for Digital Signal Processing (C) by Yu Hen Hu Basic Ideas
Parallel processing:
– P1: a1 a2 a3 a4
– P2: b1 b2 b3 b4
– P3: c1 c2 c3 c4
– P4: d1 d2 d3 d4
– Less inter-processor communication
– Complicated processor hardware
Pipelined processing:
– P1: a1 b1 c1 d1
– P2: a2 b2 c2 d2
– P3: a3 b3 c3 d3
– P4: a4 b4 c4 d4
– More inter-processor communication
– Simpler processor hardware
Legend (horizontal axis is time in both diagrams):
– Colors: different types of operations performed
– a, b, c, d: different data streams processed

3 Data Dependence
– Parallel processing requires NO data dependence between processors.
– Pipelined processing will involve inter-processor communication.
[Figure: processors P1 to P4 plotted against time, once for the parallel case and once for the pipelined case]

4 Usage of Pipelined Processing
By inserting latches or registers between combinational logic circuits, the critical path can be shortened. Consequences:
– reduced clock cycle time,
– increased clock frequency.
Suitable for DSP applications that have (infinitely) long data streams.
Method to incorporate pipelining: cut-set retiming.
Cut set:
– A cut set is a set of edges of a graph. If these edges are removed from the original graph, the remaining graph becomes two separate graphs.
Retiming:
– The timing of an algorithm is re-adjusted while keeping the partial ordering of execution unchanged, so that the results remain correct.
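A minimal sketch of cut-set pipelining, assuming a 3-tap direct-form FIR filter with made-up coefficients (the filter, the function names and the test data are illustrative, not taken from the slides). The three multiplier-to-adder edges form a feed-forward cut set, so placing one register on each of them shortens the critical path while only delaying the output by one sample.

```python
def fir_direct(x, h):
    """Reference FIR: y[n] = sum_k h[k] * x[n-k], with zero initial state."""
    y = []
    for n in range(len(x)):
        acc = 0
        for k, hk in enumerate(h):
            if n - k >= 0:
                acc += hk * x[n - k]
        y.append(acc)
    return y


def fir_cutset_pipelined(x, h):
    """Same filter with one register on every multiplier-to-adder edge."""
    taps = [0] * len(h)   # delay line holding x[n], x[n-1], ...
    regs = [0] * len(h)   # pipeline registers on the cut-set edges
    y = []
    for sample in x:
        # The adders only see the products latched in the previous cycle.
        y.append(sum(regs))
        taps = [sample] + taps[:-1]
        regs = [hk * t for hk, t in zip(h, taps)]
    return y


h = [1, 2, 3]            # hypothetical coefficients
x = [1, 2, 3, 4, 5]      # hypothetical input stream
reference = fir_direct(x, h)
pipelined = fir_cutset_pipelined(x, h)
```

In this sketch `pipelined` equals `reference` shifted by one sample: the added registers cost one cycle of latency but leave the ordering of the computation unchanged.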

5 Graphic Transpose Theorem
The transfer function of a signal flow graph remains unchanged if
– the direction of each arc is reversed, and
– the input and output labels are switched.
[Figure: a 3-tap FIR signal flow graph with input x[n], output y[n], coefficients h[0], h[1], h[2] and two $z^{-1}$ delays, shown next to its transposed graph. The slide poses the question u[n] = ?]
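The theorem can be checked numerically. The sketch below assumes an FIR filter with coefficients h[0], h[1], h[2] as in the figure; the coefficient values, function names and input data are hypothetical. It simulates the direct form and the transposed form, in which x[n] is broadcast to all multipliers and the delays sit in the adder chain.

```python
def fir_direct(x, h):
    """Direct form: delays on the input side, y[n] = sum_k h[k] * x[n-k]."""
    taps = [0] * len(h)
    y = []
    for sample in x:
        taps = [sample] + taps[:-1]
        y.append(sum(hk * t for hk, t in zip(h, taps)))
    return y


def fir_transposed(x, h):
    """Transposed form: x[n] feeds every multiplier, delays sit between adders."""
    state = [0] * (len(h) - 1)     # registers in the adder chain
    y = []
    for sample in x:
        products = [hk * sample for hk in h]
        padded = state + [0]       # padded[k] is the register feeding adder k
        y.append(products[0] + padded[0])
        state = [products[k] + padded[k] for k in range(1, len(h))]
    return y


h = [1, 2, 3]                      # hypothetical h[0], h[1], h[2]
x = [1, 2, 3, 4, 5]                # hypothetical input samples
y_direct = fir_direct(x, h)
y_transposed = fir_transposed(x, h)
```

Both lists agree sample for sample, which is what the theorem predicts for a single-input, single-output graph.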

6 Data broadcast structure
An algorithm transformation may lead to a pipelined structure without adding additional delays.
Given an FIR filter SFG:
– Critical path: $T_M + 2T_A$
Use the graph transposition theorem:
– Reverse all arcs
– Reverse input/output
We obtain:
– Critical path: $T_M + T_A$
No additional delay added!
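The two critical-path figures can be reproduced with a small graph computation. This is a sketch under assumptions: the filter is taken to have 3 taps, the node names, the edge lists and the timing values $T_M = 10$, $T_A = 2$ are invented for illustration, and the critical path is computed as the longest chain of operations connected by edges that carry no delay.

```python
from functools import lru_cache


def critical_path(node_time, edges):
    """Longest total computation time along edges that carry zero delays.

    node_time: {node: computation time}
    edges: list of (source, destination, number_of_delays)
    Assumes the zero-delay subgraph has no cycles.
    """
    successors = {node: [] for node in node_time}
    for src, dst, delays in edges:
        if delays == 0:
            successors[src].append(dst)

    @lru_cache(maxsize=None)
    def longest_from(node):
        tail = max((longest_from(nxt) for nxt in successors[node]), default=0)
        return node_time[node] + tail

    return max(longest_from(node) for node in node_time)


T_M, T_A = 10, 2   # hypothetical multiplier and adder delays
times = {"m0": T_M, "m1": T_M, "m2": T_M, "a1": T_A, "a2": T_A}

# Direct form: all three products flow through the adder chain in one cycle.
direct_edges = [("m0", "a1", 0), ("m1", "a1", 0), ("a1", "a2", 0), ("m2", "a2", 0)]

# Transposed (data broadcast) form: the delays now separate the adders.
transposed_edges = [("m2", "a2", 1), ("m1", "a2", 0), ("a2", "a1", 1), ("m0", "a1", 0)]

t_direct = critical_path(times, direct_edges)          # T_M + 2*T_A
t_transposed = critical_path(times, transposed_edges)  # T_M + T_A
```

With these numbers the direct form needs a clock period of at least 14 time units and the transposed form 12, using the same number of delay elements.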

7 Fine-grain pipelining
To further reduce the critical path, the multiplier delay $T_M$ is split into two pipeline stages with delays $T_{M1}$ and $T_{M2}$.
Critical path = $\max\{T_{M1}, T_{M2}, T_A\}$
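As a worked example of the formula, the sketch below assumes hypothetical delays $T_M = 10$ and $T_A = 2$ and tries every integer split of the multiplier into two stages. The function name and the numbers are illustrative only.

```python
def fine_grain_period(t_m1, t_m2, t_a):
    """Critical path after fine-grain pipelining: max{T_M1, T_M2, T_A}."""
    return max(t_m1, t_m2, t_a)


T_M, T_A = 10, 2                    # hypothetical multiplier and adder delays
period_before = T_M + T_A           # data broadcast structure, no fine-grain cut

# Critical path for every integer split T_M = T_M1 + T_M2.
periods = {t_m1: fine_grain_period(t_m1, T_M - t_m1, T_A) for t_m1 in range(1, T_M)}
best_split = min(periods, key=periods.get)
```

In this example an uneven 6 + 4 split gives a critical path of 6, while the balanced 5 + 5 split gives 5, so the benefit depends on how evenly the multiplier can be cut.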

8 Block Processing
One form of vectorized parallel processing of DSP algorithms. (Not parallel processing in the most general sense.)
– Block vector: [x(3k) x(3k+1) x(3k+2)]
– Clock cycle: can be 3 times longer
Derivation steps (the equations are not preserved in this transcript):
– Original (FIR filter)
– Rewrite 3 equations at a time
– Define block vector
– Block formulation
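Since the slide's equations are missing, the following sketch uses an assumed 3-tap FIR filter $y(n) = h_0 x(n) + h_1 x(n-1) + h_2 x(n-2)$ to illustrate block processing with block size 3. The coefficients, function names and data are hypothetical and may differ from the filter used on the slide. Each loop iteration stands for one (slower) clock cycle that consumes x(3k), x(3k+1), x(3k+2) and produces three outputs.

```python
def fir_direct(x, h):
    """Sample-by-sample reference: y[n] = h0*x[n] + h1*x[n-1] + h2*x[n-2]."""
    h0, h1, h2 = h
    padded = [0, 0] + list(x)
    return [h0 * padded[n + 2] + h1 * padded[n + 1] + h2 * padded[n]
            for n in range(len(x))]


def fir_block3(x, h):
    """Block formulation: three outputs are computed per clock cycle k."""
    if len(x) % 3 != 0:
        raise ValueError("input length must be a multiple of the block size 3")
    h0, h1, h2 = h
    prev1 = prev2 = 0     # x(3k-1) and x(3k-2), kept from the previous block
    y = []
    for k in range(len(x) // 3):
        x0, x1, x2 = x[3 * k: 3 * k + 3]
        y.append(h0 * x0 + h1 * prev1 + h2 * prev2)   # y(3k)
        y.append(h0 * x1 + h1 * x0 + h2 * prev1)      # y(3k+1)
        y.append(h0 * x2 + h1 * x1 + h2 * x0)         # y(3k+2)
        prev1, prev2 = x2, x1
    return y


h = [1, 2, 3]                 # hypothetical coefficients
x = [1, 2, 3, 4, 5, 6]        # hypothetical input, two blocks of three
y_block = fir_block3(x, h)
y_reference = fir_direct(x, h)
```

The three output equations inside the loop have no dependence on each other, so in hardware they can be evaluated in parallel within one clock cycle.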

9 Block Processing

10 General approach for block processing

11 Block Processing for IIR Digital Filter
Derivation steps (the equations are not preserved in this transcript):
– Original formulation
– Rewrite
– Define block vectors
– Then (block formulation)
Time indices:
– n: sampling period
– k: clock period (processor)
– k = 2n (one clock period spans two sampling periods)
Note:
– Pipelining: clock period = sampling period.
– Block (parallel): clock period not equal to sampling period.
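The original recursion is not preserved, so the sketch below assumes a first-order IIR filter $y(n) = a\,y(n-1) + x(n)$ purely for illustration; the slide's actual filter may differ. With block size 2 the recursion is unrolled once, giving $y(2k) = a\,y(2k-1) + x(2k)$ and $y(2k+1) = a^2 y(2k-1) + a\,x(2k) + x(2k+1)$, so both outputs of a block depend only on the previous block. The function names, the coefficient and the data are hypothetical.

```python
from fractions import Fraction


def iir_direct(x, a):
    """Sample-by-sample reference for the assumed filter y(n) = a*y(n-1) + x(n)."""
    y, prev = [], 0
    for sample in x:
        prev = a * prev + sample
        y.append(prev)
    return y


def iir_block2(x, a):
    """Block size 2: each clock cycle k produces y(2k) and y(2k+1)."""
    if len(x) % 2 != 0:
        raise ValueError("input length must be a multiple of the block size 2")
    y = []
    prev_odd = 0                       # y(2(k-1)+1), last output of the previous block
    for k in range(len(x) // 2):
        x_even, x_odd = x[2 * k], x[2 * k + 1]
        y_even = a * prev_odd + x_even                      # y(2k)
        y_odd = a * a * prev_odd + a * x_even + x_odd       # y(2k+1)
        y.extend([y_even, y_odd])
        prev_odd = y_odd
    return y


a = Fraction(1, 2)                     # hypothetical coefficient, exact arithmetic
x = [1, 2, 3, 4]                       # hypothetical input samples
y_block = iir_block2(x, a)
y_reference = iir_direct(x, a)
```

`Fraction` is used only so the comparison between the two versions is exact; with floating-point coefficients the results would agree up to rounding.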

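12 Block IIR Filter
[Figure: block IIR filter. A serial-to-parallel converter (S/P) splits x(n) into x(2k) and x(2k+1). Adders and delay elements D, with the delayed outputs y(2(k-1)) and y(2(k-1)+1) fed back, produce y(2k) and y(2k+1). A parallel-to-serial converter (P/S) merges them into y(n).]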

13 Timing Comparison
[Figure: timing diagrams comparing pipelining and block processing. Recoverable labels: clock cycles 1 2 3 4; inputs x(1) to x(4) with outputs y(1) to y(4); a MAC row with inputs x(1) to x(7) and outputs y(1) to y(7); Add and Mul rows; even samples x(2) x(4) x(6) x(8) and odd samples x(1) x(3) x(5) x(7).]