Design and Synthesis of Image Processing Systems using Reconfigurable Dataflow Graphs Mainak Sen and Shuvra S. Bhattacharyya Department of Electrical and.

Presentation transcript:

Design and Synthesis of Image Processing Systems using Reconfigurable Dataflow Graphs
Mainak Sen and Shuvra S. Bhattacharyya
Department of Electrical and Computer Engineering, and Institute for Advanced Computer Studies
University of Maryland at College Park
Maryland DSPCAD Research Group
November 22, 2005
Leiden University, The Netherlands

Outline
- Dataflow-based model of computation for modeling the behavior of DSP applications
- Decidable dataflow models
- Example: use of decidable dataflow as a model of computation for modeling the mapping of (decidable) dataflow behaviors onto embedded multiprocessors
- Structured reconfiguration of dataflow graphs
- Examples of meta-modeling techniques that can be classified as structured, reconfigurable dataflow
  - Parameterized dataflow and its application to SDF
  - Homogeneous-parameterized dataflow and its application to SDF and CSDF
- Experiments on a gesture recognition application
- Summary

Dataflow-based design for DSP (example from the Agilent ADS tool)

DSP-oriented Dataflow Models of Computation
- Used widely in design tools for DSP
- Application is modeled as a directed graph
  - Nodes (actors) represent functions
  - Edges represent communication channels between functions
  - Nodes produce data onto, and consume data from, edges
  - Edges buffer data in FIFO (first-in first-out) fashion
- Data-driven execution model
  - A node can execute whenever it has sufficient data on its input edges
  - The order in which nodes execute is not part of the specification
  - The order is typically determined by the compiler, the hardware, or both
- Iterative execution
  - Body of loop to be iterated a large or infinite number of times
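The data-driven firing rule can be illustrated with a minimal Python sketch. The `Actor` class, the `run` helper, and the two-actor pipeline are hypothetical names chosen for illustration, not part of any tool named in the talk. Every actor here has at least one input edge, so the loop stops once the tokens run out.

```python
from collections import deque


class Actor:
    """Dataflow node that fires only when every input edge holds enough tokens."""

    def __init__(self, name, func, inputs, outputs, need=1):
        self.name = name
        self.func = func        # maps the consumed tokens to one output token
        self.inputs = inputs    # FIFO edges (deques) this actor reads
        self.outputs = outputs  # FIFO edges (deques) this actor writes
        self.need = need        # tokens consumed from each input edge per firing

    def can_fire(self):
        return all(len(edge) >= self.need for edge in self.inputs)

    def fire(self):
        tokens = [edge.popleft() for edge in self.inputs for _ in range(self.need)]
        for edge in self.outputs:
            edge.append(self.func(tokens))


def run(actors):
    """Fire enabled actors until none remains; the firing order is the scheduler's choice."""
    trace = []
    while True:
        ready = [a for a in actors if a.can_fire()]
        if not ready:
            return trace
        ready[0].fire()
        trace.append(ready[0].name)


# Hypothetical pipeline: samples -> pair_sum -> double -> result
samples, sums, result = deque([1, 2, 3, 4]), deque(), deque()
pair_sum = Actor("pair_sum", sum, [samples], [sums], need=2)
double = Actor("double", lambda t: 2 * t[0], [sums], [result])
trace = run([pair_sum, double])
print(trace, list(result))
```

The specification only fixes which tokens each actor needs; picking `ready[0]` is one arbitrary scheduling policy among many that produce the same result tokens.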

Dataflow Features and Advantages
- Exposes coarse-grain parallelism.
- Exposes high-level structure that facilitates analysis, verification, and optimization.
- Captures multi-rate behavior.
- Complementary to ongoing advances in DSP compiler technology for procedural languages, such as C and MATLAB.
- Encourages desirable software engineering practices: modularity and code reuse.
  - Amenable also to aspect-oriented design.
- Intuitive to DSP algorithm designers: signal flow graphs.

Evolution of Dataflow Models for DSP
- Synchronous dataflow: static multirate behavior
  - Agilent ADS, Cadence SPW, etc.
- Well-behaved dataflow: schemas for bounded dynamics
- Boolean/integer dataflow: Turing complete models
- Multidimensional synchronous dataflow: image and video
- Scalable synchronous dataflow: block processing
  - Synopsys COSSAP
- Cyclo-static dataflow: phased behavior
  - Synopsys El Greco, Eonic Systems Virtuoso Synchro, System Canvas
- Bounded dynamic dataflow: bounded dynamics
- The processing graph method: reconfigurable dynamic DF
  - US Naval Research Laboratory, MCCI Autocoding Toolset
- Parameterized dataflow: dynamically-reconfigurable static DF
- Blocked dataflow: image and video in terms of reconfigurable dataflow

Modeling Design Space
[Figure: models plotted on two axes, expressive power and verification / synthesis power. Labeled points: C, BDF, DDF; SDF; CSDF; CSDF, SSDF; MDSDF, WBDF; PSDF; PCSDF. Third dimension: simplicity and intuitive appeal.]

Decidable Dataflow Models
- Modeling flow for representing static flowgraph behavior:
  - Cyclo-static dataflow (CSDF), multiphase modeling
  - Synchronous dataflow (SDF), multirate modeling
  - Homogeneous synchronous dataflow (HSDF)
  - Acyclic homogeneous synchronous dataflow (task graphs)
- These are in decreasing order of generality
- Designs represented in the more general models can be converted to equivalent representations in the less general ones
  - e.g., CSDF → SDF → HSDF → task graph
- HSDF: each actor (graph node) produces/consumes exactly one data value to/from each incident output/input edge
  - Suitable for exposing parallelism
  - Not the best model for minimizing memory requirements
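Converting SDF to HSDF expands each actor into one copy per firing in a graph iteration, so the first step is to solve the balance equations for the firing counts. The sketch below is one straightforward way to do that in Python, assuming a connected graph; the function name `repetitions` and the three-actor example rates are hypothetical.

```python
from fractions import Fraction
from functools import reduce
from math import gcd


def repetitions(actors, edges):
    """Solve the SDF balance equations q[src] * produced == q[dst] * consumed.

    edges is a list of (src, dst, produced, consumed) tuples for a connected graph.
    Returns the smallest positive integer firing counts, or None if the rates
    are inconsistent (no bounded-memory periodic schedule exists).
    """
    rate = {actors[0]: Fraction(1)}
    while len(rate) < len(actors):
        before = len(rate)
        for src, dst, prod, cons in edges:
            if src in rate and dst not in rate:
                rate[dst] = rate[src] * prod / cons
            elif dst in rate and src not in rate:
                rate[src] = rate[dst] * cons / prod
        if len(rate) == before:
            raise ValueError("graph is not connected")
    if any(rate[s] * p != rate[d] * c for s, d, p, c in edges):
        return None
    # Scale the rational rates by the least common multiple of their denominators.
    scale = reduce(lambda a, b: a * b // gcd(a, b),
                   (r.denominator for r in rate.values()))
    return {a: int(rate[a] * scale) for a in actors}


# Hypothetical multirate chain: A produces 2, B consumes 3; B produces 1, C consumes 2.
chain = [("A", "B", 2, 3), ("B", "C", 1, 2)]
q = repetitions(["A", "B", "C"], chain)
hsdf_nodes = sum(q.values())  # one HSDF node per firing
inconsistent = repetitions(["A", "B", "C"], chain + [("A", "C", 1, 1)])
```

The growth from 3 SDF actors to `hsdf_nodes` HSDF nodes is why HSDF exposes parallelism well but is not the best model for minimizing memory.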

Synthesis Techniques for Decidable Models
- Static scheduling: low overhead, predictability
- Performance analysis through synchronization graphs
- Loop scheduling
  - Implicit repetition in the dataflow graph (through changes in sample rate) needs to be translated into explicit repetition in the form of loops on the execution target.
  - Complex design space exists for such translation
  - Complementary to procedural language techniques for nested loop compilation
- Loop scheduling techniques
  - Simulation speedup (minimization of scheduling complexity)
  - Code/data minimization
  - Hierarchical parallel scheduling
  - Block processing
- Task scheduling for latency/throughput optimization
- Probabilistic design: exploiting tolerances to deadline misses

Example: Intermediate representations for synthesis from decidable dataflow models
- Consider a decidable dataflow behavior that is to be implemented on a self-timed, embedded multiprocessor
  - Natural way to implement DSP multiprocessors from decidable dataflow
  - Actor assignment and ordering are performed statically
  - Invocation (dispatch) of actors is performed dynamically, through synchronization
- Candidate mappings of the behavior onto the architecture can be represented through an intermediate representation that also has decidable dataflow semantics
  - This representation is useful for understanding the performance, communication overhead, and synchronization structure associated with the candidate mapping
  - Facilitates the separation of communication and synchronization functionality
- This is a useful modeling methodology for design space exploration

Interprocessor Communication Graph (G_ipc)
- Every edge (v_i, v_j) induces the precedence constraint [...]
- Self-timed schedule
  - Proc 1: (1, 2, 3, 4, 6)
  - Proc 2: (5, 7, 8)
  - Proc 3: (9)
[Figure: self-timed schedule on Proc 1, Proc 2, and Proc 3, and its IPC graph.]

The synchronization graph G_s
- Derived from the interprocessor communication graph
- Synchronization edges are distinguished from interprocessor communication (IPC) edges
  - Synchronization edges represent precedence constraints that are enforced by synchronization protocols
  - IPC edges represent data transfers
- Interprocessor connections
  - Coincident synchronization and IPC edges: communication together with synchronization protocol (conventional approach)
  - IPC edge only: communication without synchronization protocol
  - Synchronization edge only: synchronization protocol only

Applications of Synchronization Graphs
- Simulation
- Throughput estimation through cycle mean analysis
- Removal of redundant synchronizations
- Resynchronization
- Conversion to more efficient synchronization protocols (strongly connected synchronization graphs)
- Statically determining and minimizing the sizes of interprocessor communication buffers
- All are post-processing methods that can be applied to improve a wide range of existing task graph scheduling techniques on a wide range of multiprocessor architectures.
- These techniques benefit from good execution time estimates, but do not depend on exact execution time values to deliver useful results.
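Cycle mean analysis can be sketched as follows. For each cycle of the graph, divide the total execution time of its actors by the total delay on its edges; the largest ratio bounds the iteration period, and its reciprocal estimates throughput. The brute-force enumeration below is only suitable for small graphs, and the graph, execution times, and function name are hypothetical.

```python
def max_cycle_mean(exec_time, edges):
    """Largest (sum of execution times) / (sum of delays) over all simple cycles.

    exec_time maps actor -> estimated execution time.
    edges is a list of (src, dst, delay) tuples; actor names must be orderable.
    """
    succ = {}
    for src, dst, delay in edges:
        succ.setdefault(src, []).append((dst, delay))
    best = 0.0
    for start in exec_time:
        # Enumerate the simple cycles whose smallest actor is `start`.
        stack = [(start, {start}, exec_time[start], 0)]
        while stack:
            node, seen, time, delay_sum = stack.pop()
            for nxt, delay in succ.get(node, []):
                if nxt == start:
                    if delay_sum + delay == 0:
                        raise ValueError("zero-delay cycle: the graph deadlocks")
                    best = max(best, time / (delay_sum + delay))
                elif nxt > start and nxt not in seen:
                    stack.append((nxt, seen | {nxt},
                                  time + exec_time[nxt], delay_sum + delay))
    return best


# Hypothetical mapping: Proc 1 runs A then B, Proc 2 runs C.
times = {"A": 2, "B": 3, "C": 4}
graph = [
    ("A", "B", 0), ("B", "A", 1),  # Proc 1 ordering, wrapped around with one delay
    ("C", "C", 1),                 # Proc 2 ordering
    ("A", "C", 0), ("C", "A", 2),  # interprocessor edge and its feedback edge
]
period = max_cycle_mean(times, graph)
throughput = 1 / period
```

Here the Proc 1 loop (A, B) is the critical cycle, so speeding up C or enlarging the buffer behind the feedback edge would not raise the estimate.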

Beyond Decidable Models
- Limited expressive power: DSP applications increasingly employ high-level dynamics in their behavior
  - User interface functionality
  - Mode changes
  - Adaptive algorithms
  - Reconfiguration of processing resources/parameters
- However, key subsystems still exhibit large amounts of quasi-static structure: structure that stays fixed across significant windows of time.
- Various dynamic dataflow models have been proposed that address the limitation above by abandoning most or all restrictions related to decidable dataflow
- However, these methods are correspondingly limited in their ability to exploit the quasi-static structure described above

Parameterized Dataflow: Structured Control of Dynamic Parameters
- The key discipline imposed on reconfiguration is that each subsystem must have a consistent view of each of its actors (hierarchical or primitive) throughout any given iteration of that subsystem.

Parameterized Dataflow
- Hierarchical modeling
- [Figure: a subsystem inside its parent graph, made up of subinit, init, and body graphs; parameter n, ... is written on the configuration side and read by the body.]
- A parameterized DF subsystem is composed of 3 parameterized DF graphs: init, subinit, body
- Subsystem parameters are configured in init/subinit and used in body
- Dynamically reconfigurable
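A loose Python analogy of this discipline: parameters are set once at an iteration boundary and stay read-only while the body runs. The `Subsystem` class and its `configure` callback (standing in for init/subinit) are hypothetical, and the sketch does not model the distinct roles of the init and subinit graphs.

```python
from types import MappingProxyType


class Subsystem:
    """Hypothetical stand-in for a parameterized dataflow subsystem."""

    def __init__(self, configure, body):
        self.configure = configure  # plays the role of init/subinit: returns parameter values
        self.body = body            # reads the parameters; must not change them

    def iterate(self, data):
        # Parameters are set once, at the iteration boundary, then frozen for the body.
        params = MappingProxyType(dict(self.configure(data)))
        return self.body(params, data)


def choose_block_size(data):
    return {"n": 2 if len(data) < 6 else 3}


def block_sums(params, data):
    n = params["n"]
    return [sum(data[k:k + n]) for k in range(0, len(data), n)]


sub = Subsystem(choose_block_size, block_sums)
first = sub.iterate([1, 2, 3, 4])         # n = 2 throughout this iteration
second = sub.iterate([1, 2, 3, 4, 5, 6])  # reconfigured to n = 3 for the next one
```

Because the body only sees a read-only view, every actor in it works from the same value of `n` for the whole iteration, which is the consistency that makes quasi-static scheduling possible.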

Meta-modeling with parameterized dataflow
- Parameterized dataflow can be applied to any dataflow model of computation (base model) to augment that model with dynamic reconfiguration capabilities in a structured way
  - Provides for efficient quasi-static scheduling
  - Enables execution to be viewed in terms of a sequence of dataflow graphs in the base model
- Parameterized dataflow + XYZ → Parameterized XYZ
- Examples of parameterized dataflow models of computation that we are developing and experimenting with
  - Parameterized synchronous dataflow (PSDF)
  - Parameterized cyclo-static dataflow (PCSDF)

Parameterized Synchronous Dataflow (PSDF)
- Local synchrony conditions can be formulated and checked in a quasi-static fashion to ensure that bounded token production and consumption, along with bounded delays, lead to bounded memory requirements overall.
  - This is not true of unstructured dynamic dataflow models, such as general dynamic dataflow, Boolean dataflow, and bounded dynamic dataflow
- Techniques for construction of streamlined looped schedules for synchronous dataflow graphs have natural and efficient extensions to the construction of parameterized looped schedules for PSDF graphs.

PSDF Example: CD to DAT Conversion
[Figure: init graph with actor setFac (sets i1, ..., d4); body graph with actors CD, PF1, PF2, PF3, PF4, and DAT, with edge rates drawn from the parameters i1, d1, ..., i4, d4. Other labels in the figure: initChild, preamble.]

repeat 5 times {
    fire setFac    /* sets i1, d1, i2, d2, i3, d3, i4, d4 */
    int _g1 = gcd(i1, d2);
    int _g2 = gcd((i2 * i1) / _g1, d3);
    int _g3 = gcd((i3 * i2 * i1) / (_g2 * _g1), d4);
    repeat (d4 / _g3) times {
        repeat (d3 / _g2) times {
            repeat (d2 / _g1) times {
                repeat (d1) times { fire CD }
                fire PF1
            }
            repeat (i1 / _g1) times { fire PF2 }
        }
        repeat ((i2 * i1) / (_g2 * _g1)) times { fire PF3 }
    }
    repeat ((i3 * i2 * i1) / (_g3 * _g2 * _g1)) times {
        fire PF4
        repeat (i4) times { fire DAT }
    }
}
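The parameterized loop nest can be checked by counting firings for one concrete parameter setting. The Python sketch below makes several assumptions that are not stated on the slide: stage PFk consumes d_k tokens and produces i_k tokens per firing, CD and DAT move one token per firing, DAT fires i4 times for every PF4 firing (as token balance on the PF4 to DAT edge requires), and the sample factors 2/1, 4/3, 4/7, 5/7 are a common factorization of the 160:147 rate ratio.

```python
from math import gcd


def cd_dat_firings(i, d):
    """Count actor firings for one pass of the parameterized looped schedule.

    i and d hold the four interpolation and decimation factors (assumed roles).
    """
    i1, i2, i3, i4 = i
    d1, d2, d3, d4 = d
    g1 = gcd(i1, d2)
    g2 = gcd(i2 * i1 // g1, d3)
    g3 = gcd(i3 * i2 * i1 // (g2 * g1), d4)
    count = dict.fromkeys(["CD", "PF1", "PF2", "PF3", "PF4", "DAT"], 0)
    for _ in range(d4 // g3):
        for _ in range(d3 // g2):
            for _ in range(d2 // g1):
                count["CD"] += d1
                count["PF1"] += 1
            count["PF2"] += i1 // g1
        count["PF3"] += (i2 * i1) // (g2 * g1)
    for _ in range((i3 * i2 * i1) // (g3 * g2 * g1)):
        count["PF4"] += 1
        count["DAT"] += i4  # assumed: DAT fires i4 times per PF4 firing
    return count


# Assumed sample parameters: (2/1)(4/3)(4/7)(5/7) = 160/147
i, d = (2, 4, 4, 5), (1, 3, 7, 7)
firings = cd_dat_firings(i, d)

# Tokens produced on each edge must equal tokens consumed on it.
stages = ["PF1", "PF2", "PF3", "PF4"]
balanced = (
    firings["CD"] == firings["PF1"] * d[0]
    and all(firings[stages[k]] * i[k] == firings[stages[k + 1]] * d[k + 1]
            for k in range(3))
    and firings["PF4"] * i[3] == firings["DAT"]
)
```

With these values the schedule fires CD 147 times and DAT 160 times per pass, matching the 44.1 kHz to 48 kHz ratio, and the same loop structure stays valid when setFac picks other factors.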

PSDF Example: Speech Compression

PCSDF Version of Speech Compression

Homogeneous Parameterized Dataflow (HPDF)
- Parameterized dataflow model that can encapsulate the dynamicity of an application.
- Meta-modeling technique. Hierarchical actors can have any other underlying dataflow model (SDF, CSDF, PSDF, etc.)
- Data production and consumption rates, though dynamic, are equal across an edge for a large number of applications; thus the name homogeneous.
- Reconfiguration can be performed without introducing hierarchy when more natural to do so (advantage over parameterized dataflow).
- Parameterized dataflow is a more powerful technique and thus can be used to represent a wider set of applications.

Applications
- Applications with dynamic run-time data and aggregated final-stage processes perform especially well with HPDF over SDF semantics.
- Many applications in image and speech processing seem well suited for our model.
- We applied the model to two applications:
  - A real-time video processing algorithm for a smart camera, developed at Princeton
  - A face detection algorithm developed at CFAR labs in UMD

Application characteristics
- [Figure: actors A, B, M, N, annotated "Dynamic but balanced amount of data" and "Aggregating final-stage".]
- This structure seems to be abundant in many audio/video applications.
- Our HPDF model is a natural fit for applications with the above structure.

Gesture recognition algorithm
- Real-time video processing for gesture recognition.
- Does low-level (red oval) and high-level processing.
  - Low-level processing recognizes body parts and identifies movements.
  - High-level processing recognizes actions.
- We concentrate on low-level processing.
- Ref: W. Wolf, B. Ozer, and T. Lv. Smart cameras as embedded systems. IEEE Computer Magazine, Vol. 35, Iss. 9, Sept. 2002, pages 48-53.

HPDF model of gesture recognition algorithm
- [Figure: actors Region finding, Contour following, Ellipse fitting, and Graph matching; annotations "Dynamic data" (twice, with parameters n and p) and "Aggregating final-stage".]
- Ptolemy II implementation

Modeling with HPDF/CSDF
- [Figure: VIDEO INPUT, REGION EXTRACTION, CONTOUR FOLLOWING, ELLIPSE FITTING, and MATCH, with edge annotations (s 1), (s 1), (X_i, Y_i), (I_0, I_k_i), (n 1), p, and (p_i 1, q_i 0).]
- p phases with 1 token and (n-p) phases with 0 token production
- #phases = #pixels = s

Integrating HPDF and CSDF
- Number of phases in a fundamental period can vary dynamically.
- Number of tokens produced or consumed in a given phase can also vary dynamically.
- HPDF constraint: the total number of tokens produced by a source actor of a given edge in a given invocation (a fundamental period) must equal the total number of tokens consumed by the sink in its corresponding invocation.
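The HPDF constraint on a single edge reduces to comparing per-invocation totals, whatever the phase structure on either side. A minimal Python sketch, with a hypothetical helper name and made-up phase lists:

```python
def hpdf_balanced(produced_by_phase, consumed_by_phase):
    """True when one source invocation produces exactly what one sink invocation consumes."""
    return sum(produced_by_phase) == sum(consumed_by_phase)


# Hypothetical invocation: n = 6 phases, p = 3 of which produce one token each.
source_phases = [0, 1, 1, 0, 1, 0]
sink_phases = [3]  # the sink reads all p tokens in a single phase
ok = hpdf_balanced(source_phases, sink_phases)

# The next invocation may use different phase counts and rates, as long as totals match.
ok_next = hpdf_balanced([1, 1, 0, 0], [1, 1])
bad = hpdf_balanced([1, 1, 0, 0], [3])
```

The check is per invocation, so the number of phases and the tokens per phase are free to change from one fundamental period to the next.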

Finer granularity and Input modeling
- Each frame has 384x240 pixels, so we model the input as a CSDF actor with 384x240 = s phases.
- The model captures the pixel-level parallelism present in Region.
- It also captures the frame-level parallelism through the number of phases in Input (s).
- [Figure: VIDEO INPUT feeding REGION EXTRACTION over an edge annotated (s 1); #phases = #pixels = s.]

Modeling dynamicity - Contour
- 2 phases for Contour
  - The first one scans until it finds a contour. Output = 0 tokens
  - The second one follows this contour and all the overlapping ones. Output = k_i tokens; each token is a list of pixels from a contour
- Homogeneous condition remains: =s

Scheduling
- Actors: V, R, C, E, M
- (s V)(s R)(2 I C)(n E)M
- (s VR)(2 I C)(n E)M
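The two schedules differ in how the V and R loops are grouped. A small Python sketch can expand each looped schedule and measure the peak number of tokens waiting between V and R. It assumes V produces one token and R consumes one token per firing, uses s = 4 only for illustration, and leaves out the rest of the schedule; the nested-tuple notation and both helper names are hypothetical.

```python
def expand(schedule):
    """Flatten a looped schedule given as actor names and (count, [body]) tuples."""
    firings = []
    for item in schedule:
        if isinstance(item, tuple):
            count, body = item
            firings.extend(expand(body) * count)
        else:
            firings.append(item)
    return firings


def peak_tokens(firings, src, dst):
    """Peak occupancy of the src -> dst edge, assuming one token per firing on both ends."""
    level = peak = 0
    for actor in firings:
        if actor == src:
            level += 1
            peak = max(peak, level)
        elif actor == dst:
            level -= 1
    return peak


s = 4  # illustrative value; in the talk s is the number of pixels per frame
split = [(s, ["V"]), (s, ["R"])]  # (s V)(s R)
merged = [(s, ["V", "R"])]        # (s VR)
peak_split = peak_tokens(expand(split), "V", "R")
peak_merged = peak_tokens(expand(merged), "V", "R")
```

Both schedules fire V and R the same number of times; merging the loops interleaves the firings, so the edge between them needs room for one token instead of s.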

Results
- We also applied HPDF to successfully model a face detection algorithm.
- We developed a TI DSP implementation of the HPDF model of the gesture recognition algorithm.
- The application was run on a TMS320C64xx fixed-point processor.
- When implemented with our HPDF model, the runtime was [...] cycles. With a 40 ns cycle period, the execution time for the application was 0.86 sec.

Results (contd.)
- Scheduling overhead was minimal, as a highly streamlined quasi-static schedule was obtained.
- Worst-case buffer size was 642 Kb when the input images were 384x240 pixels.
- HPDF modeling suggested buffer reuse between the edges.
- The original C code had a runtime of [...] cycles; its execution time was 1.11 sec with the same clock period of 40 ns.
- HPDF improved runtime by 23%.
- Efficient hardware code generation is being looked into, using a hardware synthesis framework developed in our research group.
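The reported improvement follows from the two execution times. A short Python check, using only the figures quoted on the slides (the variable names are illustrative):

```python
hpdf_seconds = 0.86      # execution time with the HPDF-based implementation
original_seconds = 1.11  # execution time of the original C code

improvement = (original_seconds - hpdf_seconds) / original_seconds
speedup = original_seconds / hpdf_seconds
print(f"{improvement:.1%} less run time, {speedup:.2f}x speedup")
```

The reduction works out to about 22.5%, which rounds to the 23% quoted above.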

Summary
- A dataflow-based model of computation is attractive for modeling the behavior of DSP applications
- Decidable dataflow models are useful for exposing and exploiting static structure in synthesis tools for DSP
- Decidable dataflow models in conjunction with structured reconfigurable techniques allow for efficient handling of application dynamics
- Examples of structured, reconfigurable dataflow techniques that we discussed:
  - Parameterized dataflow and its application to SDF
  - Homogeneous-parameterized dataflow and its application to SDF and CSDF
- Experiments on a gesture recognition application
- Other examples include dynamic configuration of graph topologies, and blocked dataflow modeling.
