Co-Synthesis Algorithms: Distributed System Co-Synthesis. Part of the HW/SW Codesign of Embedded Systems Course (Winter-Spring 2001).

Presentation transcript:

Co-Synthesis Algorithms: Distributed System Co-Synthesis
Part of HW/SW Codesign of Embedded Systems Course (CE )

Topics
- Introduction
- Preliminaries
- Hardware/Software Partitioning
- Distributed System Co-Synthesis

Topics
- Introduction
- An Integer Linear Programming Model
- A Heuristic Algorithm

Introduction to Distributed System Co-Synthesis
- Does not use an architectural template
- Instead, creates a multiprocessor architecture during co-synthesis
- The resulting multiprocessor is usually heterogeneous in its
  - Processing elements (PEs)
  - Communication channels
  - Topology
- Less emphasis on the design of ASICs
- More emphasis on the design of the multiprocessor topology

Introduction to Distributed System Co-Synthesis (cont’d)
- Very common in practice
- Especially: a large CPU + small microcontrollers + small ASICs

Co-Synthesis Algorithms: Distributed System Co-Synthesis
Integer Linear Programming Model

ILP Model
- Introduction
  - Linear Programming (LP)
    - Minimizes or maximizes a linear target function
    - Subject to a set of linear constraints
    - Current algorithms find the optimal solution, or else show that the problem is not feasible at all
    - Example: knapsack problem in which items may be taken fractionally
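As a small illustration of an LP, the fractional knapsack problem can be sketched as follows. This is a minimal sketch, not part of the original slides: the function name `fractional_knapsack` and the three-item instance are assumptions chosen for illustration. For this particular LP, a greedy pass over the items in decreasing value-to-weight order is known to reach the optimum, so no general LP solver is needed.

```python
def fractional_knapsack(values, weights, capacity):
    """Maximize sum(v_i * x_i) subject to sum(w_i * x_i) <= capacity, 0 <= x_i <= 1.

    Assumes every weight is strictly positive.
    """
    # Visit items from the best value-per-weight ratio to the worst.
    order = sorted(range(len(values)),
                   key=lambda i: values[i] / weights[i],
                   reverse=True)
    x = [0.0] * len(values)
    remaining = capacity
    for i in order:
        if remaining <= 0:
            break
        # Take the whole item if it fits, otherwise the fraction that fits.
        take = min(1.0, remaining / weights[i])
        x[i] = take
        remaining -= take * weights[i]
    total = sum(v * xi for v, xi in zip(values, x))
    return total, x


# Hypothetical instance: three items, knapsack capacity 50.
total, x = fractional_knapsack([60, 100, 120], [10, 20, 30], 50)
print(total, x)
```

In this instance the first two items are taken whole and the third only partially, which is exactly what the integer-solution constraint of ILP forbids.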

ILP Model (cont’d)
- Introduction (cont’d)
  - Integer Linear Programming (ILP)
    - Integer-solution counterpart of LP
    - Example: knapsack problem with an integer-solution constraint
    - Current algorithms
      - Find the exact optimal solution
      - Take much CPU time
      - Are only feasible for fairly small problems
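The effect of the integer-solution constraint can be seen on a tiny 0/1 knapsack instance. The sketch below is an assumption-laden illustration, not how production ILP solvers work (those typically rely on techniques such as branch and bound): it simply enumerates all 2^n item selections, which makes the exponential growth in run-time visible. The function name and the instance are hypothetical.

```python
from itertools import product


def knapsack_01_bruteforce(values, weights, capacity):
    """Exhaustively solve the 0/1 knapsack problem; cost grows as 2**n."""
    best_value = 0
    best_choice = tuple([0] * len(values))
    # Each choice is a tuple of 0/1 decisions, one per item.
    for choice in product((0, 1), repeat=len(values)):
        weight = sum(w * x for w, x in zip(weights, choice))
        if weight <= capacity:
            value = sum(v * x for v, x in zip(values, choice))
            if value > best_value:
                best_value, best_choice = value, choice
    return best_value, best_choice


# Hypothetical instance: three items, knapsack capacity 50.
best_value, best_choice = knapsack_01_bruteforce([60, 100, 120], [10, 20, 30], 50)
print(best_value, best_choice)
```

For this instance the integer optimum is 220, whereas the continuous (LP) version of the same instance reaches 240 by taking two thirds of the third item. Rounding an LP solution is therefore not guaranteed to give the ILP optimum.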

Prakash-Parker ILP Model
- By Prakash and Parker, 1992
  - Developed an ILP formulation
  - Used general ILP solvers to solve it
- Inputs to the algorithm
  - Single-rate task graph
  - Technology model for the PEs, the communication channels, and the processes’ execution characteristics on them
- Target function
  - Minimize system implementation cost
- Constraints
  - Describe the requirements of the system

Prakash-Parker ILP Model (cont’d)
- Algorithm classification criteria
  - Input Model
    - Single-rate task graph
  - Target Architecture
    - Distributed multiprocessor
  - Quantum
    - Processes of the task graph
  - Cost Estimation
    - Based on technology models provided to the algorithm
    - Represented as the target function of the ILP

Prakash-Parker ILP Model (cont’d)
- Algorithm classification criteria (cont’d)
  - Performance Estimation
    - Based on technology models provided to the algorithm
  - Scheduling, Allocation
    - Embedded in the ILP formulation constraints
  - Algorithm details

Prakash-Parker ILP Model (cont’d)
- Algorithm classification criteria (cont’d)
  - Algorithm details
    - Target function
      - Minimize cost
    - Sets of constraints
      - Allocation (PEs and communication links)
      - Scheduling (processes on PEs, and communications on links)

Prakash-Parker ILP Model (cont’d)
- Sets of constraints (cont’d)
  - Allocation
    - Processor-selection constraints
      - Each process must be assigned to exactly one processor
    - Data-transfer type constraints
      - Each communication must be either local or multi-hop, but not both and not neither
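The two allocation constraints above can be written as linear equalities over binary variables. The notation below is an assumption introduced for illustration and is not taken from the Prakash-Parker formulation itself: $x_{i,p}$ is 1 when process $i$ is assigned to PE $p$, and $\lambda_e$, $\mu_e$ mark communication $e$ as local or multi-hop, respectively.

```latex
% Assumed notation (hypothetical, for illustration only):
%   x_{i,p} \in \{0,1\} : process i is assigned to PE p
%   \lambda_e, \mu_e \in \{0,1\} : communication e is local / multi-hop

% Processor selection: every process sits on exactly one PE.
\sum_{p} x_{i,p} = 1 \qquad \forall i

% Data-transfer type: every communication has exactly one type.
\lambda_e + \mu_e = 1 \qquad \forall e
```

Because the variables are binary, each equality rules out both the "more than one" and the "none" case at once.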

Prakash-Parker ILP Model (cont’d)
- Sets of constraints (cont’d)
  - Scheduling
    - Input-availability constraints
      - Data cannot be used by the sink process until after it has been produced by the source process
    - Output-availability constraints
      - Data must obey the fractional output generation parameters
    - Process execution start/end constraints
      - A process’s finish time depends on its start time and on the PE on which it executes
    - Data-transfer start/end constraints
      - Similar to the previous ones, but using data transfer times
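A sketch of how the process start/end and availability constraints might look as linear (in)equalities is given below. All symbols are assumptions made for illustration, not the notation of the original formulation: $t^{s}_i$ and $t^{e}_i$ are the start and end times of process $i$, $d_{i,p}$ is its known run-time on PE $p$, $x_{i,p}$ is the binary assignment variable, $f_c \in [0,1]$ is a fractional output parameter for communication $c$ from process $i$ to process $j$, and $\tau^{s}_c$, $\tau^{e}_c$ are the start and end times of that transfer.

```latex
% Process execution start/end: the finish time depends on the start time
% and on the PE chosen (d_{i,p} is a constant, so the term stays linear).
t^{e}_i = t^{s}_i + \sum_{p} x_{i,p}\, d_{i,p}

% Output availability: the data of c exists once a fraction f_c of the
% source process i has executed.
\tau^{s}_c \ge t^{s}_i + f_c \left( t^{e}_i - t^{s}_i \right)

% Input availability: the sink process j cannot use the data before the
% transfer has completed.
t^{s}_j \ge \tau^{e}_c
```

Under this assumed form, setting $f_c = 1$ reduces the output-availability constraint to the common case in which data becomes available only when the source process finishes.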

Prakash-Parker ILP Model (cont’d)
- Sets of constraints (cont’d)
  - Scheduling (cont’d)
    - Processor-usage-exclusion
      - Processes on a single PE must not execute simultaneously
    - Communication-usage-exclusion
      - Multiple communications must not be scheduled on the same link simultaneously
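Both exclusion constraints say the same thing about different resources: two activities that share a PE or a link must not overlap in time. The sketch below checks a given schedule against that rule. It verifies a finished schedule rather than encoding the constraint inside an ILP, and the tuple layout, the function name `usage_conflicts`, and the sample schedule are all assumptions made for illustration.

```python
from itertools import combinations


def usage_conflicts(schedule):
    """Return (resource, first, second) for every overlapping pair.

    schedule: list of (name, resource, start, end) tuples, where the
    resource is a PE or a communication link and end is exclusive.
    """
    conflicts = []
    for (n1, r1, s1, e1), (n2, r2, s2, e2) in combinations(schedule, 2):
        # Two half-open intervals overlap when each starts before the other ends.
        if r1 == r2 and s1 < e2 and s2 < e1:
            conflicts.append((r1, n1, n2))
    return conflicts


# Hypothetical schedule: processes on PEs and transfers on one link.
schedule = [
    ("P1", "PE0", 0, 4),
    ("P2", "PE0", 4, 7),      # starts exactly when P1 ends: allowed
    ("P3", "PE1", 2, 6),
    ("c12", "link0", 4, 5),
    ("c32", "link0", 4, 6),   # overlaps c12 on the same link
]
print(usage_conflicts(schedule))
```

The pairwise check is quadratic in the number of activities, which is acceptable for task graphs of the size discussed here.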

Prakash-Parker ILP Model (cont’d)
- Experimental results
  - Applied only to relatively small problems
    - Reason: use of general ILP solvers
  - Their largest task graph: 9 processes
    - Took 6000 CPU minutes on an unspecified processor
- Significance of the work
  - Achieved exactly optimal solutions on the examples they could solve
  - These solutions are used as benchmarks for heuristic co-synthesis algorithms

Co-Synthesis Algorithms: Distributed System Co-Synthesis
Wolf’s Heuristic Algorithm

Wolf’s Heuristic Algorithm
- As ever, topics of importance:
  - System Specification Language/Model
  - Target Architecture
  - Functionality (Allocation/Scheduling) Quantum
  - Allocation Strategy
  - Scheduling Strategy
  - Cost Estimation
  - Performance Estimation
  - Algorithm Details

Wolf’s Heuristic Algorithm (cont’d)
- System Specification Language/Model
  - Algorithm input: single-rate task graph
- Target Architecture
  - Heterogeneous multiprocessor architecture
- Allocation
  - Primal approach: performance is the major objective
- Scheduling
  - ?
- Functionality Quantum
  - Processes in a single-rate task graph

Wolf’s Heuristic Algorithm (cont’d)
- Performance Estimation
  - Component Technology Library
  - The run-time of each process on each available PE is assumed to be known
- Cost Estimation
  - Component Technology Library
  - $\text{Total Cost} = \sum_i \text{Cost}(\text{PE}_i) + \sum_j \text{Cost}(\text{Comm\_Channel}_j)$
- Algorithm Details
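The cost and performance estimates above both come down to table lookups in a component technology library. The following is a minimal sketch under that reading. The library contents, the component type names, and the function name `total_cost` are hypothetical values invented for the example.

```python
# Hypothetical component technology library (all numbers are made up).
PE_COST = {"cpu": 100, "mcu": 20, "asic": 50}
CHANNEL_COST = {"bus": 10, "serial": 3}

# Run-time of each process on each PE type, assumed known in advance.
RUNTIME = {
    ("filter", "cpu"): 4, ("filter", "mcu"): 15, ("filter", "asic"): 1,
    ("control", "cpu"): 2, ("control", "mcu"): 6, ("control", "asic"): 2,
}


def total_cost(pes, channels):
    """Sum the library cost of every allocated PE and every channel."""
    return (sum(PE_COST[kind] for kind in pes)
            + sum(CHANNEL_COST[kind] for kind in channels))


# Example architecture: one CPU, two microcontrollers, a bus and a serial link.
cost = total_cost(["cpu", "mcu", "mcu"], ["bus", "serial"])
print(cost)                         # 153
print(RUNTIME[("filter", "mcu")])   # run-time lookup for one process on one PE type
```

Keeping cost and run-time in plain tables means that evaluating a candidate allocation is cheap, which a heuristic that tries many reallocations depends on.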

Wolf’s Heuristic Algorithm Details
- First ignore communication costs; take them into account later
- Steps:
  1. Create an initial feasible solution, and perform an initial scheduling on it
     - Initial feasible solution: assign each process to a separate PE
  2. Reallocate processes to PEs to minimize total PE cost
     - Possibly eliminate PEs from the initial feasible solution
  3. Reallocate processes again to minimize the amount of communication required between PEs
  4. Allocate communication channels
  5. Allocate I/O devices (internal or external to PEs)

Wolf’s Heuristic Algorithm Details (cont’d)
- The most important step: 2. Initial reallocation
  - Reason: PE cost is the dominant hardware cost
- Initial reallocation
  1. PE cost reduction
     1.1 Scan the PEs, starting with the least-utilized PE
     1.2 Try to reallocate that PE’s processes to other existing PEs
     1.3 If no process is left on the PE, eliminate it; otherwise, replace the PE with a suitable lower-cost one
  2. Pair-wise merge
     - Merge a pair of PEs into a single, more powerful one
  3. Load balancing
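The PE cost reduction phase (sub-steps 1.1 to 1.3) can be sketched as below. This is a heavily simplified illustration and not Wolf's actual implementation. It assumes that an allocation is feasible when the summed run-times on each PE stay within a single `deadline`, it ignores precedence and communication, it moves one process at a time, and it computes the scan order only once. All names and numbers are hypothetical.

```python
def load(procs, kind, runtime):
    """Total run-time of a set of processes on a PE of the given type."""
    return sum(runtime[(p, kind)] for p in procs)


def pe_cost_reduction(alloc, pe_type, runtime, cost, deadline):
    """One pass of PE cost reduction over a feasible allocation.

    alloc: PE name -> list of processes; pe_type: PE name -> type name;
    runtime: (process, type) -> run-time; cost: type -> cost.
    """
    alloc = {pe: list(procs) for pe, procs in alloc.items()}
    pe_type = dict(pe_type)
    # 1.1 Scan the PEs, least-utilized first (order fixed up front).
    for pe in sorted(alloc, key=lambda q: load(alloc[q], pe_type[q], runtime)):
        # 1.2 Try to move each process to another existing PE with room.
        for proc in list(alloc[pe]):
            for other in alloc:
                if other == pe:
                    continue
                kind = pe_type[other]
                if load(alloc[other], kind, runtime) + runtime[(proc, kind)] <= deadline:
                    alloc[other].append(proc)
                    alloc[pe].remove(proc)
                    break
        if not alloc[pe]:
            # 1.3a Nothing left on this PE: eliminate it.
            del alloc[pe]
            del pe_type[pe]
        else:
            # 1.3b Otherwise switch to the cheapest type that still fits.
            fitting = [k for k in cost if load(alloc[pe], k, runtime) <= deadline]
            pe_type[pe] = min(fitting, key=cost.get)
    return alloc, pe_type


# Hypothetical library and initial solution: one fast PE per process.
cost = {"fast": 100, "slow": 30}
runtime = {("a", "fast"): 2, ("a", "slow"): 5,
           ("b", "fast"): 3, ("b", "slow"): 7,
           ("c", "fast"): 4, ("c", "slow"): 9}
alloc, kinds = pe_cost_reduction(
    {"PE0": ["a"], "PE1": ["b"], "PE2": ["c"]},
    {"PE0": "fast", "PE1": "fast", "PE2": "fast"},
    runtime, cost, deadline=8)
print(alloc, kinds)
```

In this example one PE is eliminated and another is replaced by a cheaper type, so the PE cost drops from 300 to 130. The pair-wise merge and load-balancing phases are not modelled here.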

Wolf’s Heuristic Algorithm Details (cont’d)
- Initial reallocation (cont’d)
  - The “PE cost reduction” phase tries to reallocate multiple processes at a time
  - The above 3 phases are repeated as far as possible
- Experimental results
  - Finds optimal solutions to most of the ILP-solved examples
  - Finds near-optimal solutions for the remaining examples
  - Showed good results on larger examples
  - Requires very little run-time
    - Due to the multiple-move strategy during the PE cost minimization phase

What we learned today
- Distributed System Co-Synthesis: the other broad category of co-synthesis algorithms