Copyright 1995-1999 SCRA. Methodology Reinventing Electronic Design Architecture Infrastructure, DARPA Tri-Service RASSP: Scheduling and Assignment for DSP

Presentation transcript:

Copyright SCRA. Methodology Reinventing Electronic Design Architecture Infrastructure, DARPA Tri-Service RASSP

Scheduling and Assignment for DSP
RASSP Education & Facilitation Program, Module 22 Version

Copyright SCRA. All rights reserved. This information is copyrighted by the SCRA, through its Advanced Technology Institute (ATI), and may only be used for non-commercial educational purposes. Any other use of this information without the express written permission of the ATI is prohibited. Certain parts of this work belong to other copyright holders and are used with their permission. All information contained may be duplicated for non-commercial educational use only, provided this copyright notice and the copyright acknowledgements herein are included. No warranty of any kind is provided or implied, nor is any liability accepted regardless of use.

The United States Government holds "Unlimited Rights" in all data contained herein under Contract F C. Such data may be liberally reproduced and disseminated by the Government, in whole or in part, without restriction except as follows: certain parts of this work belong to other copyright holders and are used with their permission; the information contained herein may be duplicated only for non-commercial educational use. Any vehicle into which part or all of this data is incorporated shall carry this notice.

RASSP Rapid Prototyping Design Process: Scheduling/Assignment
[Figure: RASSP design flow. Stages: system definition, function design, HW & SW partitioning, HW design, SW design, HW fabrication, SW code, integration & test. Annotations: HW & SW codesign, virtual prototype, RASSP design libraries and database, "primarily software", "primarily hardware".]

Module Goals
- Present a state-of-the-art review of scheduling and assignment methodologies for DSP
- Examine and study the impact of these methods on RASSP design environments
- Consolidate understanding through examples

Module Outline
- Introduction
- FSFG as a Representation of DSP Algorithms
- Measures of Performance
- Retiming
- Iteration Period and Iteration Period Bounds
- Formal Methods for FSFGs
- Perfect-rate FSFGs and Unfolding
- Lookahead
- Scheduling Approaches
- Optimal Scheduling and Assignment Considering Communication Delays and Finite Resources
- Summary

Introduction
- Capturing an algorithm graphically in a block-oriented environment is equivalent to representing its timing as a fully specified flow graph (FSFG)
- A formal approach to scheduling allows the timing graphs to be transformed to optimize execution times, processor utilization, and sample periods
- Most commercial and research tools begin with the flow graph methodology that follows, but only recently have communications and I/O been included within the cost constraints

Scheduling and Assignment for DSP
- Common representations of DSP algorithms
  1. Language-driven executable specification
  2. Graphics/flow graph-driven specification
- Option 2 is preferred by most optimization algorithms
- Goals of optimization
  - Provide an efficient implementation of a given algorithm with respect to sampling rate, clock period, latency, area, power, etc.
  - Provide a modular and scalable implementation that can be upgraded in performance and functionality
  - Include communications cost and the cost of memories and other peripherals

Difference Between DSP and General-purpose Schedulers
- DSP schedules are optimized over multiple iterations of data (essentially a semi-infinite stream of data); general-purpose (GP) schedules are executed only once
- Real-time and memory/buffer constraints exist for DSP
- DSP uses numerically intensive arithmetic with high data bandwidths (most arithmetic is deterministic)
- A library-based synthesis and design procedure is suitable for DSP applications

FSFG Used to Represent DSP Functional/Timing
FSFG = Fully Specified Flow Graph (Data/Control Flow Graph)
- A convolution: a fine-grain depiction
  [Figure: convolution FSFG with coefficients h(0), h(1), h(2), inputs x(0), x(1), outputs y(0), y(1), and two unit delays D. Legend: x(.) = data input, y(.) = data output, h(.) = coefficients, D = unit delays.]
- IRST: a coarse-grain depiction
  [Figure: block diagram with blocks Video In, Registration, Scene Register, Scene Segment, Velocity Filter, Clutter Filter, Restore, and a delay D.]

Example: FSFG for an Adaptive Filter
[Figure: N-tap adaptive filter FSFG with delay elements, input samples x[n] through x[n-N+1], coefficients h[0] through h[N-1], output y[n], and a feedback path.]
d[n] = desired signal, e[n] = error signal

Second Order IIR Filter
[Figure: FSFG of a second-order IIR filter with input x[n], output y[n], and nodes numbered 1 to 11.]
- Nodes 5, 7, 9, 11 = multipliers (computational delay)
- Nodes 1, 3, 4, 8 = adders (computational delay)
- Nodes 2, 6, 10 = data routers (zero delay)

Performance Measures
- Sample period = average time between the arrival/acceptance of two successive data samples
- Latency = the time for the DSP to produce a result y[0] in response to input x[0]
- In DSP applications, minimizing the sample period is usually considered more important than minimizing latency
[Figure: timeline of input data x[0], x[1], x[2] entering the DSP and output data y[0], y[1], y[2] leaving it, with the latency and the sample period marked.]

Fully Specified Flow Graph
- FSFG = $(V_i, E_i, D_i)$
  - $V_i$: set of computational nodes
  - $E_i$: set of directed edges (implying data precedence)
  - $D_i$: initial assignment of delays
- Goal: transform $(V_i, E_i, D_i)$ into $(V_f, E_f, D_f)$ to improve performance without changing functionality (i = initial, f = final)

Clock Period
[Figure: two versions of a clocked flow graph from x[n] to y[n] whose nodes take 2 units, 1 unit, and 2 units. In the first version the minimum clock period is 3 units; in the second version, which contains an additional delay D, the minimum clock period is 2 units.]
Minimum clock period = computational delay of the longest zero-delay path

Clock Period (Cont.)
- Formally, $T_c = \max_{p \in P_l} T(p)$, where
  - $T(u)$ = computational latency (in wall-clock time) of node u
  - $p$ = path of edges, with $T(p)$ the total latency of the nodes along p
  - $P_l$ = set of paths with zero delays (purely combinatorial)
  - $T_c$ = smallest clock period
- Goal: optimize the clock period
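
The definition above can be sketched in a few lines of Python. This is only an illustrative sketch: the graph encoding (a latency dictionary plus `(src, dst, delays)` edge triples), the function name `min_clock_period`, and the three-node chain used as input are assumptions, not part of the module. The chain loosely mimics nodes of 2, 1, and 2 units, before and after a register is placed between the first two nodes.

```python
from collections import defaultdict

def min_clock_period(latency, edges):
    """Largest total node latency along any path made only of zero-delay edges.

    latency: dict mapping node -> computational latency T(u)
    edges:   list of (src, dst, delays) triples
    """
    succ = defaultdict(list)
    for u, v, d in edges:
        if d == 0:                      # only combinational (zero-delay) edges count
            succ[u].append(v)

    memo = {}
    on_stack = set()

    def longest_from(u):
        if u in memo:
            return memo[u]
        if u in on_stack:
            raise ValueError("zero-delay loop: the graph cannot be computed")
        on_stack.add(u)
        best = latency[u] + max((longest_from(v) for v in succ[u]), default=0)
        on_stack.discard(u)
        memo[u] = best
        return best

    return max(longest_from(u) for u in latency)

# Hypothetical chain a -> b -> c with latencies 2, 1, 2.
T = {"a": 2, "b": 1, "c": 2}
clock_before = min_clock_period(T, [("a", "b", 0), ("b", "c", 1)])
clock_after = min_clock_period(T, [("a", "b", 1), ("b", "c", 1)])
```

With one register only between b and c, the zero-delay path a -> b costs 2 + 1 units; once every edge carries a register, the longest combinational path is a single 2-unit node.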

Cutset Definition
- A cutset is a minimal set of edges whose removal divides an FSFG into two disjoint parts
- In the example, A-A, B-B, and D-D are valid cutsets; C-C is not a valid cutset
- Edge 1-2 and edge 4-5 are directed in opposite directions with respect to cutset A-A
[Figure: FSFG with candidate cuts A-A, B-B, C-C, and D-D.]

Time Scaling
- All delays in an FSFG can be scaled by a positive number "s" without changing the functionality (except the input/output rates)
- Scaling is useful (as will be seen) but leads to a loss in efficiency
[Figure: a node with delay D processing x[0], x[1], x[2] and producing y[0], y[1], y[2], next to the scaled version with delay 2D, where an empty slot separates successive input samples (x[0], -, x[1], -, x[2]) and successive outputs.]

Nodal Transfer
- r(u) = number of delays removed from each input arc of node u and moved to each of its output arcs (can be positive or negative)
- For an edge e directed from v to u: $D_f(e) = D_i(e) + r(v) - r(u)$
- Example from the figure: with r(v) = 1 and r(u) = -1, the edge e that carries 2D before the transfer carries 4D afterwards, since 4 = 2 + 1 - (-1)
- Same functionality, different timing: RETIMING
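
A minimal sketch of the nodal-transfer rule $D_f(e) = D_i(e) + r(v) - r(u)$ follows. The edge-list encoding, the helper names `retime`, `is_legal`, and `is_systolic`, and the three-node loop are illustrative assumptions; the single-edge example assumes the edge starts with two delays, as the 2D and 4D labels in the figure suggest.

```python
def retime(edges, r):
    """Apply D_f(e) = D_i(e) + r(v) - r(u) to every edge e directed from v to u.

    edges: list of (v, u, delays) triples
    r:     dict node -> retiming value (nodes that are absent default to 0)
    """
    return [(v, u, d_i + r.get(v, 0) - r.get(u, 0)) for v, u, d_i in edges]

def is_legal(edges):
    """A retiming is legal when no edge ends up with a negative delay count."""
    return all(d >= 0 for _, _, d in edges)

def is_systolic(edges):
    """A systolic FSFG has at least one delay on every edge."""
    return all(d >= 1 for _, _, d in edges)

# Single edge, r(v) = 1 and r(u) = -1: 2 + 1 - (-1) = 4 delays.
single = retime([("v", "u", 2)], {"v": 1, "u": -1})

# Hypothetical loop 1 -> 2 -> 3 -> 1 carrying 0, 2, and 1 delays.
loop = [(1, 2, 0), (2, 3, 2), (3, 1, 1)]
retimed_loop = retime(loop, {2: -1})
```

In the loop example the retiming moves one delay from edge 2 -> 3 to edge 1 -> 2, so every edge ends with one delay while the loop total stays at three.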

Legal Retiming
- $D_f(e) \ge 0$ for all edges
- $D_f(e) \ge 1$ for all edges of a SYSTOLIC FSFG
  - Systolic FSFGs have the minimum clock period, i.e. the highest clock frequency
- The total number of delays in a loop is the same with and without retiming (no loop can have zero delay)
[Figure: retiming example on a loop whose edges carry 2D and D before retiming and D and 2D afterwards; the retiming values shown are r(1) = 1 and r(1) = 0, r(2) = 1, r(3) = 0.]

Delay Transfer Across Cutsets
- Remove k delays from the positive edges of a cutset and move them to its negative edges (k can be positive or negative)
- No algorithmic change in behavior
[Figure: FSFG with cutset A-A before the delay transfer (edge delays 2D, 4D, D, 2D, D) and after it (edge delays 3D, D).]

Results from Retiming
- Theorem I: A systolic FSFG can always be realized
- Theorem II: If the number of delays in a loop is less than the number of arcs, then time scaling is required to obtain systolic schedules
- Theorem III: The delays in a retimed circuit can be minimized by setting up the following linear integer program for r(v), where e is a directed edge from v to u:
  $\min_{r} \sum_{v} r(v)\,[\mathrm{Fanout}(v) - \mathrm{Fanin}(v)]$ subject to $r(u) - r(v) \le D_i(e) - 1$

Iteration Period and Delays in a Loop
- Case 1: loop of three nodes with T(1) = T(2) = T(3) = 2 and two delays (2D)
  - Bound on iteration period = (2+2+2)/2 = 3 units
  - A data sample can be accepted every 3 units (on average)
- Case 2: the same loop with three delays (2D and D)
  - Bound on iteration period = (2+2+2)/3 = 2 units
- Case 1 and Case 2 are different algorithms, though!

Iteration Period Bound
- The iteration period of an FSFG is bounded below by the Iteration Period Bound (IPB):
  $IPB = \max_{\text{all loops}} \frac{\text{total computational latency of the loop}}{\text{total number of delays in the loop}}$
- Thus the IPB can be reduced by increasing the delays in the loops or by reducing the computational latency of the loops
- Scheduling and assignment efficiency is limited by loops in the DSP flow graph!
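
As a rough illustration of the bound, the sketch below enumerates the simple loops of a small graph and takes the largest latency-to-delay ratio. The encoding, the function name `iteration_period_bound`, and the placement of delays on particular edges are assumptions; node labels must be mutually comparable (integers here), and latencies and delay counts are assumed to be integers. Exhaustive loop enumeration is only practical for small graphs.

```python
from fractions import Fraction

def iteration_period_bound(latency, edges):
    """Max over all simple loops of (sum of node latencies) / (sum of edge delays)."""
    succ = {}
    for u, v, d in edges:
        succ.setdefault(u, []).append((v, d))
    best = Fraction(0)

    def dfs(start, node, time, delays, visited):
        nonlocal best
        for nxt, d in succ.get(node, []):
            if nxt == start:                       # closed a loop back to its start
                total = delays + d
                if total == 0:
                    raise ValueError("loop without any delay")
                best = max(best, Fraction(time, total))
            elif nxt not in visited and nxt > start:
                # visit each loop once, rooted at its smallest node
                dfs(start, nxt, time + latency[nxt], delays + d, visited | {nxt})

    for s in sorted(latency):
        dfs(s, s, latency[s], 0, {s})
    return best

T = {1: 2, 2: 2, 3: 2}
case1 = iteration_period_bound(T, [(1, 2, 0), (2, 3, 0), (3, 1, 2)])  # two delays
case2 = iteration_period_bound(T, [(1, 2, 0), (2, 3, 1), (3, 1, 2)])  # three delays
```

The two calls correspond to a three-node loop with 2 units per node: two delays give a bound of 6/2, three delays give 6/3.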

Retiming as a Mechanism to Reduce Area/Conserve Resources
[Figure: FSFG with input x[n], cutset A-A, and delays D, shown together with its initial schedule and its re-timed schedule as Cycle / Multiply / Addition tables (table entries not recoverable). Annotations: "Large area, low utilization" and "Too many multiplies in one cycle".]
[Madisetti95] © IEEE 1995

Top-down Methodology for Scheduling
[Figure: top-down flow from DSP algorithm dataflow specifications through transformations, estimation of performance, scheduling, assignment, and allocation, driven by objective functions and constraints: low power, small area, high clock speed, small latency.]

Partial Summary of Results
- Flow graph transformations can improve the efficiency of an implementation of the same algorithm
- Retiming cannot improve upon the iteration period bound, but it can help achieve it
- A formal approach is necessary for top-down design, optimization, and implementation

FSFG Transformation and Scheduling
- Various transformations exist that lead to efficient implementation of flow graphs
  - Unfolding
  - Retiming
  - Lookahead
  - Loop shrinking
- Approach
  - Step 1: Classify the different types of schedules
  - Step 2: Try to find methods to construct these schedules, if they exist

Example
[Figure: FSFG with nodes A, B, C, D, delays D and 2D, and IPB = 2, scheduled over time on processors P1 and P2 in three ways; the panels are annotated IP = 3 and IP = 2.]
a) Non-overlapping static schedule
b) Overlapping static schedule
c) Cyclostatic schedule

Questions To Be Answered at This Point
- Can every FSFG achieve the iteration period bound?
- Can every FSFG achieve the IPB in addition to the processor bound?
- Can transformations assist in achieving the IPB?
- Are optimal schedules static or cyclostatic?
- Can the IPB itself be reduced?
- Can transformations reduce the processor bound?

Perfect-rate FSFGs
[Figure: two perfect-rate FSFGs over nodes A, B, C, D, E with delays D, each shown with a static schedule over time on processors P1 and P2 that achieves IP = IPB.]
Optimal static schedules are possible for PERFECT-RATE FSFGs

Perfect-rate FSFGs (Cont.): Conclusion
- Because perfect-rate FSFGs have static rate-optimal schedules, can all FSFGs be transformed to PR-FSFGs?
- Answer: Yes, all FSFGs can be transformed to PR-FSFGs
  - Note that communication cost and processor cost are not optimized; only the sample period is

Unfolding
- How does one convert FSFGs to perfect-rate FSFGs?
- Answer: Through UNFOLDING! (method 1)

Example of Unfolding
[Figure: original FSFG, a loop through nodes A, B, C with two delays D, and the unfolded-by-2 FSFG, which consists of two loops, (A2, B1, C1) and (A1, B2, C2), each with one delay D.]
By unfolding-by-2 we expose two iterations of the DSP algorithm at the same time
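
The unfolding transformation itself can be sketched with the standard construction: an edge u -> v carrying w delays becomes, for each copy i, an edge from copy i of u to copy (i + w) mod J of v carrying floor((i + w) / J) delays. The function name `unfold`, the edge encoding, and the placement of the two delays on the A -> B and C -> A edges are assumptions made for illustration; copies are numbered from 0 here, whereas the slide numbers them from 1.

```python
def unfold(edges, J):
    """Unfold a flow graph by a factor J.

    edges: list of (u, v, w) triples, where w is the delay count on edge u -> v.
    Returns edges between (node, copy) pairs with their new delay counts.
    """
    unfolded = []
    for u, v, w in edges:
        for i in range(J):
            unfolded.append(((u, i), (v, (i + w) % J), (i + w) // J))
    return unfolded

# Hypothetical loop A -> B -> C -> A with two delays in total.
original = [("A", "B", 1), ("B", "C", 0), ("C", "A", 1)]
unfolded_by_2 = unfold(original, 2)
```

For this input the result splits into two separate loops, (A0, B1, C1) and (A1, B0, C0), each with a single delay, and the total delay count of the graph is preserved.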

Optimum Unfolding
- Unfold the flow graph by the LCM of the number of delays in each loop
- Example: unfolding by 6 results in IP = 16 units on 4 processors
[Figure: FSFG with nodes A, B, C, D, E and several delays D. Node latencies: A = 20 units, B = 5 units, C = 10 units, D = 10 units, E = 2 units. Before unfolding, IP = 20 units; IPB = 16 units.]
Note: Node A takes 20 units, which is larger than the IPB
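
Computing the unfolding factor is a one-line reduction. In the sketch below the function name and the loop delay counts 2 and 3 are hypothetical; they were chosen only because their LCM equals the unfold-by-6 factor quoted above, and the actual loop structure of the figure is not recoverable from the transcript.

```python
from functools import reduce
from math import gcd

def optimum_unfolding_factor(loop_delays):
    """Least common multiple of the delay counts of all loops."""
    return reduce(lambda a, b: a * b // gcd(a, b), loop_delays, 1)

factor = optimum_unfolding_factor([2, 3])
```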

Method 2: Lookahead
- Lookahead actually reduces the iteration period bound
  - Pipelining and unfolding reduce the iteration period, but do not affect the iteration period bound
- Design flow
  - Step 1: Check whether the sample period is satisfied by the IPB
  - Step 2: Lookahead can be used to reduce the IPB
  - Step 3: Retiming, unfolding, or pipelining can then be used to reduce the iteration period to the IPB

What is Lookahead?
- Statement 1: $x[n] = a\,x[n-1] + u[n]$
- Statement 2: $x[n+1] = a\,x[n] + u[n+1]$
- Statement 2 cannot be executed until x[n] is computed
- Rewriting the equations with a lookahead of two, where $b = a^2$:
  - Statement 1: $x[n] = a(a\,x[n-2] + u[n-1]) + u[n] = b\,x[n-2] + a\,u[n-1] + u[n]$
  - Statement 2: $x[n+1] = a(a\,x[n-1] + u[n]) + u[n+1] = a^2 x[n-1] + a\,u[n] + u[n+1]$
- Note that x[n+1] now depends on x[n-1] and not on x[n]; therefore, these statements can be executed in parallel
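
A quick numerical check that a lookahead of two leaves the output unchanged: substituting the recurrence into itself gives x[n] = a^2 x[n-2] + a u[n-1] + u[n]. The sketch below is illustrative only; the function names, the integer test data, and the convention that `x_prev` holds x[-1] are assumptions.

```python
def first_order(a, u, x_prev):
    """Direct form: x[n] = a*x[n-1] + u[n], starting from x[-1] = x_prev."""
    out = []
    for sample in u:
        x_prev = a * x_prev + sample
        out.append(x_prev)
    return out

def first_order_lookahead2(a, u, x_prev):
    """Lookahead of two: x[n] = a^2*x[n-2] + a*u[n-1] + u[n] for n >= 1."""
    b = a * a
    out = [a * x_prev + u[0]]          # x[0] still needs the original recurrence
    hist = [x_prev, out[0]]            # holds x[n-2] and x[n-1]
    for n in range(1, len(u)):
        x_n = b * hist[0] + a * u[n - 1] + u[n]
        out.append(x_n)
        hist = [hist[1], x_n]
    return out

inputs = [1, 2, 3, 4, 5, 6]
direct = first_order(3, inputs, 1)
ahead = first_order_lookahead2(3, inputs, 1)
```

Inside the loop, x[n] is computed from x[n-2] only, so the computations of two consecutive outputs no longer depend on each other. The price is one extra multiply-add per output.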

What is Lookahead? (Cont.)
[Figure: timing diagrams over time for the original first-order filter (stages Read Ops, Multiply, Add, Store State; the second instruction begins only after x[n] is ready) and for the filter with lookahead of 2 (stages Read Ops, Multiply, Add 1, Add 2, Store State).]
Note as transcribed from the slide: "Factor sample period due to increased lookahead"

What is Lookahead? (Cont.)
- Lookahead reduces the IPB, but at the cost of additional hardware (adders and multipliers)
- Lookahead adds delay (or memory) into the feedback loop
  - E.g., x[n] depends on x[n-1] => one delay; x[n] depends on x[n-2] => two delays
- $\text{Iteration period bound} = \frac{\text{total computational latency in loop}}{\text{total number of delays}}$
  - Lookahead increases the denominator
- Alternative
[Figure: a loop through node A with one delay D becomes a loop through A with 10D after lookahead by 10.]

Flowchart for Optimization with Lookahead and Retiming
- Step 1: Use lookahead by M
- Step 2: Retime to distribute the delays
- Step 3: Obtain a static rate-optimal schedule
[Figure: a loop through nodes A, B, C with MD delays on one edge, and the same loop after retiming with the delays D distributed between the nodes.]

Taxonomy of Scheduling Approaches
[Figure: taxonomy tree rooted at Scheduling. Labels under Static: Sub-Optimal, Optimal, Heuristic, Approximate. Labels under Dynamic: Physically Non-Distributed, Physically Distributed, Non-Cooperative, Cooperative, Sub-Optimal, Optimal, Heuristic, Approximate.]

FSFG for Second Order IIR Filter
[Figure: second-order IIR filter FSFG with input x[n], output y[n], and four delays D.]
d_add = 1 unit time, d_mult = 2 unit times
Flow graph bounds (definition, value):
- Iteration period bound: $IPB = \max_{l \in L} \frac{\sum_{j \in l} d_j}{n_l}$, value 3
- Processor bound: $PB = \left\lceil \frac{\sum_{j} d_j}{IPB} \right\rceil$, value 4
- Periodic delay bound: $D_{i/o} = \max_{p \in P} \left( \sum_{j \in p} d_j - n_p\,IPB \right)$, value 3

Second Order Filter: A Cyclostatic Schedule
[Figure: iteration tile repeated in P x T (processor x time) space, annotated with the cycling vector C and the period matrix S.]

Comparison of Several Scheduling Algorithms
Columns: Scheduling | Iteration Period | No. of Processors | Throughput Delay | Comm. Considered
- Single Iteration: not optimal; no
- Direct Blocking: not optimal; no
- Maximum-Spanning Tree: not optimal | optimal | not optimal | no
- Optimum Unfolding: optimized | optimal | not optimal | no
- Cyclo-Static: optimal; no
- CSPP: optimal; no
- Generalized PSSIMD: optimal; yes
- Scheduling-Range (Fixed rate, Max TP): optimized, fixed, optimal, optimized, not optimal; yes
- DSMP-C1: optimal; yes
- MULTIPROC: optimal; yes
Rows with fewer than four entries are reproduced as they appear in the transcript.

Optimal Scheduling and Assignment
- Focus on an optimal scheduling and assignment methodology
- Optimize an objective function subject to constraints

IPB Model
- Minimize IPB subject to
  - Precedence constraints: node v precedes node w (arc v -> w)
    $t_w - t_v \ge d_w - n_{vw}\,IPB$ for $e(v,w) \in E$
  - Non-negativity constraints: $IPB, t_i \ge 0$ for $i = 1, \dots, N$
- For the second order IIR filter, this problem becomes: minimize IPB subject to
  - Precedence constraints
    $t_1 - t_3 \ge 1$
    $t_2 - t_1 \ge 1$
    $t_2 - t_4 \ge 1$
    $t_4 - t_2 + IPB \ge 2$
    $t_3 - t_2 + 2\,IPB \ge 2$
    $t_8 - t_2 \ge 1$
    $t_8 - t_7 \ge 1$
    $t_7 - t_6 \ge 1$
    $t_7 - t_5 \ge 1$
    $t_6 - t_2 + IPB \ge 2$
    $t_5 - t_2 + 2\,IPB \ge 2$
  - Non-negativity constraints
    $t_1, t_2, t_3, t_4, t_5, t_6, t_7, t_8, IPB \ge 0$
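
For a fixed candidate IPB, the precedence constraints form a system of difference constraints, so feasibility can be tested with a Bellman-Ford style relaxation instead of a general LP solver. The sketch below swaps in that technique and is not the solution method of the module. The function name `schedule_for` and the edge list are assumptions: adders 1, 2, 7, 8 take 1 unit, multipliers 3, 4, 5, 6 take 2 units, and the two constraints that are garbled in the transcript are taken to be two-delay edges 2 -> 3 and 2 -> 5.

```python
def schedule_for(ipb, d, edges):
    """Find times t with t[w] - t[v] >= d[w] - n*ipb for every edge (v, w, n).

    Returns a dict of non-negative times, or None when the constraints contain
    a positive-weight cycle, i.e. ipb is below the iteration period bound.
    """
    t = {v: 0 for v in d}                  # start at the non-negativity floor
    for _ in range(len(d)):
        changed = False
        for v, w, n in edges:
            bound = t[v] + d[w] - n * ipb
            if t[w] < bound:               # raise t[w] just enough to satisfy the edge
                t[w] = bound
                changed = True
        if not changed:
            return t
    return None                            # still changing after N passes: infeasible

# Second-order IIR filter, assumed from the constraint list: (v, w, delays on arc).
d = {1: 1, 2: 1, 7: 1, 8: 1, 3: 2, 4: 2, 5: 2, 6: 2}
edges = [(3, 1, 0), (1, 2, 0), (4, 2, 0), (2, 4, 1), (2, 3, 2), (2, 8, 0),
         (7, 8, 0), (6, 7, 0), (5, 7, 0), (2, 6, 1), (2, 5, 2)]

# Smallest integer IPB for which a schedule exists.
min_ipb = next(p for p in range(1, sum(d.values()) + 1)
               if schedule_for(p, d, edges) is not None)
times = schedule_for(min_ipb, d, edges)
```

The search restricts the IPB to integers for simplicity, whereas the LP allows fractional values. Under these assumptions the result agrees with the loop-ratio bound, since the loop through nodes 2 and 4 has 3 units of latency and one delay.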

Finite Resources Model
- Minimize the number of processors P and the periodic throughput delay $D_{i/o}$:
  $\min \sum_{j=0}^{U} 2^j p_j + D_{i/o}$
- Subject to
  - Precedence constraints: node v precedes node w (arc v -> w)
    $t_w - t_v \ge d_w - n_{vw}\,IP$ for $e(v,w) \in E$, where $t_i = \sum_{t=1}^{T} t\,x_{it}$
  - Job completion constraints: all nodes must be scheduled
    $\sum_{t=1}^{T} x_{it} = 1$ for $i = 1, \dots, N$
  - Processor modulo and non-preemption constraints: at most P processors can be scheduled at a time
    $\sum_{i=1}^{N} \sum_{q \in Q_n} x_{iq} \le \sum_{j=0}^{U} 2^j p_j$ for $t = 1, \dots, T$ and $n = 0, \dots, IPB-1$, where $Q_n = \{q \mid q \bmod IP = n,\ q = t, \dots, t + d_i - 1\}$
  - 0-1 constraints: $x_{it} \in \{0,1\}$ for $i = 1, \dots, N$ and $t = 1, \dots, T$; $p_j \in \{0,1\}$ for $j = 1, \dots, U$

Adding Communications
[Figure: each edge e_i (edges 1 to E) is mapped from an originating PE j to a destination PE k (PEs 1 to P) over a communication path cp_jk; decision variable x_ejk.]

Multi-processor System
[Figure: second order IIR filter (input x[n], output y[n], delay elements D; d_add = 1 unit time, d_mult = 1 unit time) mapped onto processing elements PE 1, PE 2, PE 3, ... joined by a communication network with costs c_jk; period matrix S.]

Second Order IIR Filter
[Figure: edge mapping, PE assignment, and communications among PE 1, PE 2, PE 3, and PE 4. Table of Edge versus Comm. Path with 11 rows.]
PE assignment (PE: nodes)
- PE 1: 3, 1
- PE 2: 4, 2
- PE 3: 6, 7
- PE 4: 5, [...]
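A small Python sketch of how an edge mapping of this kind could be derived from a PE assignment. The arc list is reconstructed from the precedence constraints and communication labels elsewhere in the deck, and the edge numbering is hypothetical because the slide's own numbering is not available. The assignment of node 8 to PE 4 is an assumption, since the table entry for PE 4 is incomplete in the source. The indicator x[(e, j, k)] is presumably what the deck calls x_ejk: it is 1 when edge e originates on PE j and terminates on PE k.

```python
# Arcs (v, w) of the second order IIR filter graph; numbering is hypothetical.
EDGES = [(3, 1), (1, 2), (4, 2), (2, 4), (2, 3), (2, 8),
         (7, 8), (6, 7), (5, 7), (2, 6), (2, 5)]

# Node -> PE. ASSUMPTION: node 8 is placed on PE 4 (incomplete in the source).
PE_OF = {3: 1, 1: 1, 4: 2, 2: 2, 6: 3, 7: 3, 5: 4, 8: 4}


def build_x(edges, pe_of):
    """x[(e, j, k)] = 1 if edge number e runs from PE j to PE k."""
    return {(e, pe_of[v], pe_of[w]): 1
            for e, (v, w) in enumerate(edges, start=1)}


def inter_pe_edges(edges, pe_of):
    """Arcs whose endpoints sit on different PEs and so need the network."""
    return [(v, w) for v, w in edges if pe_of[v] != pe_of[w]]


x = build_x(EDGES, PE_OF)
for (e, j, k) in sorted(x):
    v, w = EDGES[e - 1]
    kind = "local" if j == k else "network"
    print(f"edge {e}: {v}->{w}  PE {j} -> PE {k}  ({kind})")
print("inter-PE arcs:", inter_pe_edges(EDGES, PE_OF))
```

Arcs that stay on one PE cost no network transfer, so only the inter-PE arcs are candidates for lengthening the iteration period once communication delays are modeled.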

Randomly Connected Cost (RCC) Model for Second Order IIR Filter
[Figure: edge mapping and communications among PE 1, PE 2, PE 3, and PE 4 under the RCC model, marking the communications that increase the IP. Table of Edge versus Comm. Path with 11 rows.]

Scheduled Communications for Second Order IIR Filter
[Figure: four panels showing the communications among PE 1, PE 2, PE 3, and PE 4 in each inserted cycle.]
- Inserted cycle T1A: 2->3,4,8; 3->1; 5->7; 2->5,6
- Inserted cycle T1B: 2->8 (route 2->8 through PE 2)
- Inserted cycle T1C: 2->8
- Inserted cycle after T2: 4->2; 6->7

Including Memory and Registers
[Figure: edge e_i (edges 1 to E) between PE j and PE k, with registers (R1, R2) and memories (M1, ...) inside a PE alongside its functional units and the network; decision variable.]

Modified FSFG for Second Order IIR Filter

Comparison of Processor Mapping Models

Model                                                               | Min Increase IP | Max of Comm | Num Proc | Network | Multiple FU | Reg & Mem
Fully connected cost                                                | Yes             | No          | Given    | Full    | No          | No
Fully connected cost & capacity                                     | Yes             | Yes         | Given    | Full    | No          | No
Randomly connected cost                                             | Yes             | No          | Given    | Random  | No          | No
Randomly connected cost & capacity                                  | Yes             | Yes         | Given    | Random  | No          | No
Randomly connected cost & capacity with Multiple FU                 | Yes             | Yes         | Given    | Random  | Yes         | No
Randomly connected cost & capacity with Multiple FU & Memory & Reg  | Yes             | Yes         | Given    | Random  | Yes         | Yes
Combined Scheduler                                                  | Min IP          | No          | Given    | Random  | No          | No
Combined Scheduler with Multiple FU                                 | Min IP          | Yes         | Given    | Random  | Yes         | Yes

Review of Approach to Scheduling
[Figure: flow from DSP algorithm dataflow specifications through transformations and estimation of performance to scheduling, assignment, and allocation, with objective functions and constraints: low power, small area, high clock speed, small latency.]

Summary
- A formal approach to the scheduling and assignment of non-ideal systems (finite resources, communication delays), based on recent results, has been presented
- The framework is powerful and flexible across the range of computational granularities found in signal processing
- RASSP is currently developing a number of constraint-based schedulers
