Courseware Force-Directed Scheduling Sune Fallgaard Nielsen Informatics and Mathematical Modelling Technical University of Denmark Richard Petersens Plads,

Presentation transcript:

courseware Force-Directed Scheduling Sune Fallgaard Nielsen Informatics and Mathematical Modelling Technical University of Denmark Richard Petersens Plads, Building 321 DK2800 Lyngby, Denmark

SoC-MOBINET courseware [M-1] High-Level Synthesis

Overview / Agenda
- Motivation
- Introduction
- Defining problem and model
- Solution
- Results
- Conclusion

Motivation
- The mapping from a behavioral model to a physical Register-Transfer level description is an NP-complete problem.
- Therefore an exhaustive comparison of all possible solutions is infeasible.
- An algorithm that finds an efficient mapping is required.

Introduction
- The paper was written in
- Interest in high-level languages like VHDL was increasing at that time.
- Synthesis tools capable of generating an RTL realization from a behavioral model were needed.

Introduction
Design flow from specification to implementation, moving between the physical domain and the mathematical domain:
- Specification: create a model of the physical problem.
- Synthesis: create an algorithm to solve the problem.
- Implementation: transform the optimized model back to the physical domain.


Defining problem and model
- The problem: For a given function, find an optimal solution. An optimal solution can be constrained by area, power, performance or flexibility requirements, depending on the application.
- The model: Scheduling. Determine for each operation the time at which it should be performed without violating any precedence constraints.

Solution
- Common approach: List Scheduling. Resources given; minimize delay.
- New approach: Force-Directed Scheduling. Time constraints given; minimize required resources.


Force-Directed Scheduling
The Force-Directed Scheduling approach reduces the number of:
- Functional units
- Registers
- Interconnect
This is achieved by balancing the concurrency of operations to ensure a high utilization of each unit.

Force-Directed Scheduling
The Force-Directed Scheduling algorithm consists of 3 steps:
1. Determine a time frame for each operation
2. Create a distribution graph
3. Calculate the force (a new metric)

Scheduling – An example
- Solve the differential equation y’’ + 3zy’ + 3y = 0
- This can be calculated using this iterative algorithm:

    while (z < a) repeat
        zl := z + dz;
        ul := u - (3 · z · u · dz) - (3 · y · dz);
        yl := y + (u · dz);
        z := zl;
        u := ul;
        y := yl;
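The iterative algorithm above can be transcribed almost line for line. The following is a minimal Python sketch, not part of the original slides; the function name `solve_diffeq` and the starting values in the usage note are illustrative assumptions, and `u` is presumed to stand for y’.

```python
def solve_diffeq(z, y, u, dz, a):
    """Iterate the update rule from the slide until z reaches a.

    Each pass through the loop body uses 6 multiplications, 2 subtractions,
    2 additions and 1 comparison; these are the operations to be scheduled.
    """
    while z < a:
        zl = z + dz
        ul = u - (3 * z * u * dz) - (3 * y * dz)
        yl = y + (u * dz)
        z, u, y = zl, ul, yl
    return z, y, u


# Hypothetical starting values, chosen only to exercise the loop twice.
result = solve_diffeq(z=0.0, y=1.0, u=0.0, dz=0.5, a=1.0)
```

With these assumed inputs the loop body runs twice before `z` reaches `a`.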

Scheduling – An example
[Figures: data-flow and control-flow graphs; ASAP scheduling; ALAP scheduling]


Scheduling – An example
Step 1: Determine a time frame for each operation.
[Figure omitted; the slide notes an error in the figure.]
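A time frame is the range of control steps between an operation's ASAP and ALAP positions. The sketch below is one way to compute it in Python, assuming unit-latency operations. The graph `PREDS` is a hypothetical decomposition of the loop body of the example (names such as `m1` or `s2` are made up for illustration and do not come from the slides), and the deadline of 4 control steps is likewise an assumption.

```python
# Hypothetical data-flow graph: operation -> list of predecessor operations.
# m* are multiplications, s* subtractions, a* additions, c1 the comparison.
PREDS = {
    "m1": [], "m2": [], "m3": ["m1", "m2"],   # (3*z) * (u*dz)
    "m4": [], "m5": ["m4"],                   # (3*y) * dz
    "m6": [],                                 # u*dz used for yl
    "s1": ["m3"], "s2": ["s1", "m5"],         # ul
    "a1": [], "a2": ["m6"], "c1": ["a1"],     # zl, yl, zl < a
}


def asap(preds):
    """Earliest control step per operation (steps are numbered from 1)."""
    start = {}

    def visit(op):
        if op not in start:
            start[op] = 1 + max((visit(p) for p in preds[op]), default=0)
        return start[op]

    for op in preds:
        visit(op)
    return start


def alap(preds, deadline):
    """Latest control step per operation such that all finish by `deadline`."""
    succs = {op: [] for op in preds}
    for op, ps in preds.items():
        for p in ps:
            succs[p].append(op)
    start = {}

    def visit(op):
        if op not in start:
            start[op] = min((visit(s) for s in succs[op]), default=deadline + 1) - 1
        return start[op]

    for op in preds:
        visit(op)
    return start


def time_frames(preds, deadline):
    """Time frame (ASAP step, ALAP step) of every operation."""
    early, late = asap(preds), alap(preds, deadline)
    return {op: (early[op], late[op]) for op in preds}


frames = time_frames(PREDS, deadline=4)
```

Operations on the critical path get a frame of width one; the others keep some freedom that the later steps exploit.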

Scheduling – An example
Step 2: Create a distribution graph
DG(1) = 2.833
DG(2) = 2.333
DG(3) =
DG(4) = 0
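A distribution graph can be built by spreading each operation uniformly over its time frame and summing the resulting probabilities per control step. The Python sketch below illustrates this under stated assumptions: the six multiplication frames in `MUL_FRAMES` are hypothetical, chosen so that the result agrees with the DG(1), DG(2) and DG(4) values on the slide. The value it yields for step 3 follows from those assumed frames and should not be read as the figure missing from the slide.

```python
def distribution_graph(frames, n_steps):
    """Sum, per control step, the probability of each operation being there.

    An operation with time frame (first, last) is assumed to be equally
    likely in every step of that frame.
    """
    dg = {step: 0.0 for step in range(1, n_steps + 1)}
    for first, last in frames.values():
        probability = 1.0 / (last - first + 1)
        for step in range(first, last + 1):
            dg[step] += probability
    return dg


# Assumed time frames of the six multiplications (hypothetical names).
MUL_FRAMES = {
    "m1": (1, 1), "m2": (1, 1), "m3": (2, 2),
    "m4": (1, 2), "m5": (2, 3), "m6": (1, 3),
}

dg = distribution_graph(MUL_FRAMES, n_steps=4)
```

One distribution graph is built per operation type, since only operations of the same type compete for the same kind of functional unit.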

Scheduling – An example
Step 3: Calculate the force (a new metric)
A metric called force is introduced. The force is used to optimize the utilization of units. A high positive force value indicates poor utilization.

Scheduling – An example
Step 3: Calculate the force (a new metric)
[Figure: operations marked as fixed or free]

Scheduling – An example
Step 3: Calculate the force (a new metric)
With the operation x’ in control-step 1:
DG(1) = 2.833
DG(2) = 2.333
DG(3) =
DG(4) = 0
Result: poor utilization.

Scheduling – An example
Step 3: Calculate the force (a new metric)
With the operation x’ in control-step 2 (x’’ must then be in control-step 3):
DG(1) = 2.833
DG(2) = 2.333
DG(3) =
DG(4) = 0
The total force is the sum of:
- Direct force (calculated as before)
- Indirect force (on x’’ in control-step 3)
Result: good utilization.
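The formulas on these slides did not survive extraction. The sketch below follows the usual formulation of force-directed scheduling, in which the force of narrowing a time frame is the sum over its control steps of DG(step) times the change in the operation's probability for that step. Several inputs are assumptions made for illustration: the DG(3) value of 0.833 (absent from the slide), the frame (1, 2) for x’ and the frame (2, 3) for its successor x’’.

```python
def frame_probability(frame, step):
    """Probability of an operation sitting in `step`, uniform over its frame."""
    first, last = frame
    if first <= step <= last:
        return 1.0 / (last - first + 1)
    return 0.0


def force(dg, old_frame, new_frame):
    """Force of narrowing old_frame to new_frame: sum of DG * probability change."""
    total = 0.0
    for step in range(old_frame[0], old_frame[1] + 1):
        change = frame_probability(new_frame, step) - frame_probability(old_frame, step)
        total += dg[step] * change
    return total


# DG(1), DG(2), DG(4) are from the slide; DG(3) is an ASSUMED value.
DG = {1: 2.833, 2: 2.333, 3: 0.833, 4: 0.0}
OP_FRAME = (1, 2)     # assumed time frame of x'
SUCC_FRAME = (2, 3)   # assumed time frame of its successor x''

# x' in control-step 1: the successor keeps its frame, so no indirect force.
force_step1 = force(DG, OP_FRAME, (1, 1))

# x' in control-step 2: the successor is pushed into control-step 3.
direct = force(DG, OP_FRAME, (2, 2))
indirect = force(DG, SUCC_FRAME, (3, 3))
force_step2 = direct + indirect

# The assignment with the lowest force is preferred.
best_step = min((force_step1, 1), (force_step2, 2))[1]
```

Under these assumptions the force for control-step 1 is positive and the force for control-step 2 is negative, which matches the slide's labels of poor and good utilization.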

Scheduling – An example
By repeatedly assigning operations to various control-steps and calculating the force associated with each choice, several force values become available. The Force-Directed Scheduling algorithm chooses the assignment with the lowest force value, which also balances the concurrency of operations most efficiently.

Force-Directed Scheduling
The Force-Directed Scheduling approach reduces the number of:
- Functional units
- Registers
- Interconnect
By introducing registers and interconnect as storage operations, the force is calculated for these as well. The 3 steps of the algorithm are also carried out for these operations.

Force-Directed Scheduling

Force-Directed Scheduling
Introducing registers and interconnect as operations: since the area consumed by registers and interconnect is reduced in solution b, it might be optimal.

Force-Directed List Scheduling
- List Scheduling: resources given; minimize delay.
- Force-Directed Scheduling: time constraints given; minimize required resources.
If the problem is the other way around, i.e. if the resources are given, a Force-Directed List Scheduling algorithm can be applied.

Force-Directed List Scheduling
Force-Directed List Scheduling combines the strengths of Force-Directed Scheduling and List Scheduling. It is similar to List Scheduling, except that force is the priority function, not mobility.

Force-Directed List Scheduling
Example with 1 adder and 1 multiplier:
- List Scheduling: mobility is a poor metric.
- Force-Directed List Scheduling: force is a better metric.

Results
Test application: Fifth-order Elliptic Wave Filter (x_p is a pipelined multiplier)
- Many optimal results, depending on the application.
- One result: ASAP is not optimal.

Results – The Design Space
Force-Directed List Scheduling gives better means for exploring the design space. It offers many results and, depending on the application, the designer can choose to use more or fewer resources and see what implications this has for the delay.

One more optimization
Force-Directed List Scheduling is implemented in a system called HAL. HAL also uses techniques called register merging and multiplexer merging. These techniques minimize the cost of registers and interconnections.

Results
Comparison of various synthesis tools, normalized to HAL86
Test application: Differential Equation
[Tables: non-pipelined multiplier; pipelined multiplier]

Results
Physical unit and interconnection requirements
Test application: Wave Filter

Conclusions
- The Force-Directed Scheduling and Force-Directed List Scheduling algorithms allow the designer to explore the design space.
- By taking into account the cost of interconnection, registers and multiplexers, a more precise algorithm is realized.
- The HAL system, using these algorithms, shows promising results compared to other systems.