Design & Co-design of Embedded Systems Distributed System Co-synthesis (2) Maziar Goudarzi.


Design & Co-design of Embedded Systems Distributed System Co-synthesis (2) Maziar Goudarzi

Fall 2005, Design & Co-design of Embedded Systems

Today’s Program
- Introduction
- Preliminaries
- Hardware/Software Partitioning
- Distributed System Co-Synthesis (part 2)

References:
- Wayne Wolf, “Hardware/Software Co-Synthesis Algorithms,” Chapter 2, Hardware/Software Co-Design: Principles and Practice, Eds: J. Staunstrup, W. Wolf, Kluwer Academic Publishers.
- W. Wolf, “An architectural co-synthesis algorithm for distributed, embedded computing systems,” IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 5, no. 2, 1997.

Topics
- Introduction
- An Integer Linear Programming Model
- A Heuristic Algorithm
  - On ordinary task graphs
  - On an Object-Oriented model

Co-Synthesis Algorithms: Distributed System Co-Synthesis Wolf’s Heuristic Algorithm on Ordinary Task Graphs

Wolf’s Heuristic Algorithm
- As ever, topics of importance:
  - System Specification Language/Model
  - Target Architecture
  - Functionality (Allocation/Scheduling) Quantum
  - Allocation Strategy
  - Scheduling Strategy
  - Cost Estimation
  - Performance Estimation
  - Algorithm Details

Wolf’s Heuristic Algorithm (cont’d)
- System Specification Language/Model
  - Algorithm input: single-rate task graph
- Target Architecture
  - Heterogeneous multiprocessor architecture
- Allocation
  - Primal approach: performance is the major objective
- Scheduling
  - ?
- Functionality Quantum
  - Processes in a single-rate task graph

Wolf’s Heuristic Algorithm (cont’d)
- Performance Estimation
  - Component Technology Library
  - The run time of each process on each available PE is assumed to be known
- Cost Estimation
  - Component Technology Library
  - $\text{Total Cost} = \sum_i \text{Cost}(\text{PE}_i) + \sum_j \text{Cost}(\text{Device}_j) + \sum_k \text{Cost}(\text{Comm. Channel}_k)$
- Algorithm Details
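The cost estimate above can be sketched as a plain sum over the allocated components. The component names and prices below are made up for illustration; in practice they would come from the component technology library.

```python
# Hypothetical component technology library: component type -> cost.
# All names and prices are illustrative assumptions, not data from the slides.
PE_COST = {"cpu_fast": 100, "cpu_slow": 40}
DEVICE_COST = {"adc": 15, "timer": 5}
CHANNEL_COST = {"bus": 10}


def total_cost(pes, devices, channels):
    """Total cost = sum of PE costs + device costs + channel costs."""
    return (
        sum(PE_COST[p] for p in pes)
        + sum(DEVICE_COST[d] for d in devices)
        + sum(CHANNEL_COST[c] for c in channels)
    )


# Two PEs, one device, and one communication channel.
example_cost = total_cost(["cpu_fast", "cpu_slow"], ["adc"], ["bus"])
```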

Wolf’s Heuristic Algorithm Details
- Four major steps in co-design
  - Partitioning: dividing the spec. into smaller parts (e.g. processes)
  - Allocation: assigning each process to a multiprocessor node (PE)
  - Scheduling: serializing processes assigned to each PE
  - Mapping: selecting a particular component for each PE
- Problem: These steps (especially allocation, scheduling, and mapping) have a circular relationship
- Solution: Break the loop

Wolf’s Heuristic Algorithm Details (cont’d)
- Wolf:
  1. Give an initial allocation
  2. Refine it to reduce cost
- Order of satisfying design criteria:
  1. Satisfy all deadlines
  2. Minimize PE cost
  3. Minimize comm. port cost
  4. Minimize device cost
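One way to read the ordered design criteria is as a lexicographic comparison: an earlier criterion always outweighs a later one. The sketch below illustrates that reading with Python tuple comparison. The `Solution` fields and the numbers are assumptions made for the example, not part of Wolf’s algorithm.

```python
from typing import NamedTuple


class Solution(NamedTuple):
    """Summary of one candidate architecture (field names are assumptions)."""
    missed_deadlines: int
    pe_cost: int
    port_cost: int
    device_cost: int


def priority_key(s: Solution):
    # Tuples compare element by element, so earlier criteria dominate:
    # deadlines first, then PE cost, then port cost, then device cost.
    return (s.missed_deadlines > 0, s.pe_cost, s.port_cost, s.device_cost)


candidates = [
    Solution(missed_deadlines=1, pe_cost=50, port_cost=5, device_cost=5),
    Solution(missed_deadlines=0, pe_cost=120, port_cost=10, device_cost=20),
    Solution(missed_deadlines=0, pe_cost=120, port_cost=8, device_cost=30),
]
best = min(candidates, key=priority_key)
```

Here the cheapest candidate is rejected because it misses a deadline, and the tie on PE cost between the other two is broken by port cost.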

Wolf’s Heuristic Algorithm Details (cont’d)
- First ignore communication costs. Later, take them into account.
- Steps:
  1. Create an initial feasible solution, and perform an initial scheduling on it. Initial feasible solution: assign each process to a separate PE.
  2. Reallocate processes to PEs to minimize total PE cost. Possibly eliminate PEs from the initial feasible solution.
  3. Reallocate processes again to minimize the amount of communication required between PEs.
  4. Allocate communication channels.
  5. Allocate I/O devices (internal or external to PEs).
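Step 1 can be sketched as follows. The slides only say that each process gets its own PE; choosing the PE type that runs the process fastest is an assumption made here, and the run-time table is invented for illustration.

```python
# Hypothetical library data: run time of each process on each PE type.
RUN_TIME = {
    "p1": {"cpu_fast": 2, "cpu_slow": 5},
    "p2": {"cpu_fast": 3, "cpu_slow": 4},
    "p3": {"cpu_fast": 1, "cpu_slow": 6},
}


def initial_allocation(run_time):
    """Give every process its own PE instance.

    Assumption: each PE is of the type that runs its process fastest.
    Returns process -> (PE instance name, PE type).
    """
    allocation = {}
    for i, (proc, times) in enumerate(sorted(run_time.items())):
        pe_type = min(times, key=times.get)
        allocation[proc] = (f"PE{i}", pe_type)
    return allocation


alloc = initial_allocation(RUN_TIME)
```

This starting point is expensive on purpose: it is easy to make feasible, and the later steps then remove or downgrade PEs.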

Wolf’s Heuristic Algorithm Details (cont’d)
- The most important step: step 2, initial reallocation
  - Reason: PE cost is the dominant hardware cost
- Initial reallocation
  1. PE cost reduction:
     1.1 Scan the PEs, starting with the least-utilized PE.
     1.2 Try to reallocate that PE’s processes to other existing PEs.
     1.3 If no process is left on the PE, eliminate it; otherwise, replace the PE with a suitable lower-cost one.
  2. Pair-wise merge: merge a pair of PEs into a single, more powerful one.
  3. Load balancing
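A much simplified sketch of the PE cost reduction phase is shown below. It assumes identical PEs, treats a PE as feasible while the total run time of its processes fits within a common period, moves one process at a time, and leaves out the replacement part of step 1.3. None of these simplifications come from the slides, and the run times are invented.

```python
PERIOD = 10  # assumed common deadline for all processes
RUN_TIME = {"p1": 2, "p2": 3, "p3": 4, "p4": 6}  # illustrative run times


def load(procs):
    """Total run time of the processes placed on one PE."""
    return sum(RUN_TIME[p] for p in procs)


def pe_cost_reduction(allocation):
    """allocation: PE name -> list of processes. Returns a reduced copy."""
    alloc = {pe: list(procs) for pe, procs in allocation.items()}
    # 1.1 Scan the PEs, starting with the least-utilized one.
    for pe in sorted(alloc, key=lambda name: load(alloc[name])):
        for proc in list(alloc[pe]):
            # 1.2 Try to move the process to another existing PE.
            for other in alloc:
                if other != pe and load(alloc[other] + [proc]) <= PERIOD:
                    alloc[other].append(proc)
                    alloc[pe].remove(proc)
                    break
        # 1.3 (first half) Eliminate the PE if nothing is left on it.
        if not alloc[pe]:
            del alloc[pe]
    return alloc


start = {"PE0": ["p1"], "PE1": ["p2"], "PE2": ["p3"], "PE3": ["p4"]}
reduced = pe_cost_reduction(start)
```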

Wolf’s Heuristic Algorithm Details (cont’d)
- Initial reallocation (cont’d)
  - The “PE cost reduction” phase tries to reallocate multiple processes at a time
  - The three phases of initial reallocation are repeated as far as possible

Wolf’s Heuristic Algorithm: Experimental Results
[Table; only the column headers survive: Example | #processes | Period | Impl. Cost (Wolf, P&P) | CPU time in sec (Wolf, P&P)]

Wolf’s Heuristic Algorithm Experimental Results (cont’d)
- Finds optimal solutions to most of the ILP-solved examples
- Finds near-optimal solutions for the remaining examples
- Showed good results on larger examples
- Requires very little run time
  - Due to the multiple-move strategy during the PE cost minimization phase

Co-Synthesis Algorithms: Distributed System Co-Synthesis Wolf’s Heuristic Algorithm for Object-Oriented Models

Introduction
- Target
  - Co-synthesis of a distributed system from an object-oriented specification
- Significance
  - OO is a promising approach to designing embedded systems at ESL

Reference: W. Wolf, “Object-Oriented Co-Synthesis of Distributed Embedded Systems,” ACM Transactions on Design Automation of Electronic Systems, 1996.

OO Co-Synthesis Algorithm
- Again, our eight topics
  - System Specification Language/Model
  - Target Architecture
  - Functionality (Allocation/Scheduling) Quantum
  - Allocation Strategy
  - Scheduling Strategy
  - Cost Estimation
  - Performance Estimation
  - Algorithm Details

OO Co-Synthesis Algorithm (cont’d)
- System Specification Model/Language
  - An object-oriented specification as input
  - Method dataflow graph as model

Example specification (from the slide figure):
  Object O1
    method m1: variables v1, v2
    method m2: variables v2, v3
  Object O2
    method m4: variables v10, v20
  Object O3
    method m3: variables v8, v9
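The example specification can be written down as a small data structure. The sketch below also lists which methods of an object touch the same variable, since such pairs would presumably need to exchange data if they were placed on different PEs. That interpretation, and the helper name `shared_variables`, are assumptions made for illustration.

```python
# The example specification: object -> method -> variables it uses.
SPEC = {
    "O1": {"m1": {"v1", "v2"}, "m2": {"v2", "v3"}},
    "O2": {"m4": {"v10", "v20"}},
    "O3": {"m3": {"v8", "v9"}},
}


def shared_variables(spec):
    """Return {(method_a, method_b): shared variables} within each object."""
    shared = {}
    for methods in spec.values():
        names = sorted(methods)
        for i, a in enumerate(names):
            for b in names[i + 1:]:
                common = methods[a] & methods[b]
                if common:
                    shared[(a, b)] = common
    return shared


pairs = shared_variables(SPEC)
```

In this example only `m1` and `m2` of object `O1` share a variable (`v2`), which is one reason to keep the methods of an object on the same PE.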

OO Co-Synthesis Algorithm (cont’d)
- Target Architecture
  - Distributed system
    - An arbitrary-topology network of PEs
- Functionality Quantum
  - Methods of objects in an OO specification
  - As far as possible, keeps together all methods of an object
  - Partitioning is done during algorithm execution

OO Co-Synthesis Algorithm (cont’d)
- Cost and Performance Estimation
  - Pre-specified
    - A technology description of available components is input to the algorithm
- Allocation, Scheduling, and Algorithm Details
  - Much like Wolf’s previous heuristic algorithm
  - Includes modifications in order to:
    - handle large sets of methods
    - consider effects of splitting objects across PEs

OO Co-Synthesis Algorithm (cont’d)
- Allocation, Scheduling, and Algorithm Details
  1. Initial allocation and scheduling. Allocate processes to PEs such that all tasks are placed on PEs fast enough to ensure that all deadlines are met, keeping objects together as much as possible.
  2. Minimize PE cost. Reallocate processes to PEs to minimize PE cost, splitting objects when necessary.
  3. Minimize communication. Reallocate processes again to minimize inter-PE communication, taking into account traffic generated by splitting objects across PEs.

OO Co-Synthesis Algorithm (cont’d)
- Allocation, … Details (cont’d)
  4. Allocate channels. Allocate communication channels.
  5. Allocate devices, either as on-chip devices or as external devices on communication channels.

OO Co-synthesis Details
- Step 1 (initial allocation)
  - One PE per object
- Step 2 (minimize PE cost)
  - oo_balance_load()
    - Tries to redistribute methods to better balance the system load
  - PE_replacement()
    - Uses a cheaper PE without disturbing the allocation
  - oo_pairwise_merge()
    - Tries to eliminate a PE by moving its methods to other PEs
- Step 2 is done repeatedly
  - Methods are re-scheduled after each new allocation
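The idea behind PE_replacement() can be sketched as follows. Only the function name comes from the slide. The body, the PE type table, the speed model, and the single deadline are assumptions chosen to keep the example small.

```python
# Hypothetical PE types: cost and relative speed (run time = work / speed).
PE_TYPES = {
    "fast": {"cost": 100, "speed": 4},
    "medium": {"cost": 60, "speed": 2},
    "slow": {"cost": 30, "speed": 1},
}
DEADLINE = 10  # assumed common deadline for the methods on one PE


def pe_replacement(methods_work, current_type):
    """Return the cheapest PE type that still runs all methods in time.

    The allocation (which methods sit on this PE) is left unchanged.
    """
    total_work = sum(methods_work.values())
    feasible = [
        name for name, spec in PE_TYPES.items()
        if total_work <= DEADLINE * spec["speed"]
    ]
    if not feasible:
        return current_type  # nothing fits, keep what we have
    cheapest = min(feasible, key=lambda name: PE_TYPES[name]["cost"])
    if PE_TYPES[cheapest]["cost"] < PE_TYPES[current_type]["cost"]:
        return cheapest
    return current_type


new_type = pe_replacement({"m1": 8, "m2": 6}, "fast")
```

With 14 units of work, the slow PE would need 14 time units and miss the deadline, so the medium PE is the cheapest feasible choice.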

OO Co-synthesis Details (cont’d)
Note: This operation may cause “hidden communication”.

OO Co-synthesis Details (cont’d)

OO Co-Synthesis Algorithm (cont’d)
- Experimental Results
  - Algorithm implemented in C++
    - Using the NIH class library
    - 8600 lines of code
    - Executed on an SGI Indigo workstation
  - Algorithm applied to examples from software engineering books on OO design

    Example   #objects/methods   CPU Time
    cfuge     2/3                0.05
    dye       3/15               2.0
    juice     3/4                0.05
    train     5/6                0.05

Reason for the highest CPU time (dye): it has the most methods, and scheduling is required in each inner loop of step 2. This implementation had a simple, inefficient scheduler.

OO Co-Synthesis Algorithm (cont’d)
- Main contribution
  - An OO specification is an important aid to automatic partitioning
    - The specification is naturally divided into two levels of granularity:
      - The system is composed of objects
      - Objects are composed of data members and methods
  - The heuristic:
    - Preserve the specification’s partitioning as much as possible

What we learned today
- Distributed System Co-Synthesis
  - A heuristic approach
    - Non-OO algorithm
    - Customization to OO specifications
    - Heuristic: First minimize the PE cost since it is the dominant factor