Winter-Spring 2001, Codesign of Embedded Systems

Co-Synthesis Algorithms: Distributed System Co-Synthesis
  Part of HW/SW Codesign of Embedded Systems Course (CE )

Topics
  Introduction
  Preliminaries
  Hardware/Software Partitioning
  Distributed System Co-Synthesis

Topics
  Introduction
  An Integer Linear Programming Model
  A Heuristic Algorithm

Introduction to Distributed System Co-Synthesis
  Does not use an architectural template
  Instead, creates a multiprocessor architecture during co-synthesis
  Usually a multiprocessor that is heterogeneous in
    Processing Elements (PEs)
    Communication Channels
    Topologies
  Less emphasis on the design of ASICs
  More emphasis on the design of the multiprocessor topology

Introduction to Distributed System Co-Synthesis (cont'd)
  Very common in practice
  Especially: large CPU + small microcontrollers + small ASICs

Co-Synthesis Algorithms: Distributed System Co-Synthesis
  Integer Linear Programming Model

ILP Model
  Introduction
    Linear Programming (LP):
      Minimizing/maximizing a linear target function
      Subject to a set of linear constraints
    Current algorithms:
      Either find the optimal solution, or establish that the problem is not feasible at all
    Example: Knapsack problem (fractional amounts of items allowed)

ILP Model (cont'd)
  Introduction (cont'd)
    Integer Linear Programming (ILP)
      Integer-solution counterpart of LP
      Example: Knapsack problem with integer-solution constraint
    Current algorithms:
      Find the exact optimal solution
      Take much CPU time
      Only feasible for fairly small problems

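To make the LP/ILP contrast concrete, here is a minimal Python sketch of the knapsack example in both forms. It is not a general LP or ILP solver: the fractional case is solved with the greedy value-per-weight rule (which is optimal for that special case), and the integer case with a standard dynamic program. The item values, weights, and capacity are made-up illustration data.

```python
from fractions import Fraction

def fractional_knapsack(items, capacity):
    """Knapsack with fractional items allowed (the LP-style version).

    items: list of (value, weight) pairs with positive integer weights.
    Greedy by value/weight ratio is optimal for this special case.
    """
    total = Fraction(0)
    remaining = capacity
    by_ratio = sorted(items, key=lambda vw: Fraction(vw[0], vw[1]), reverse=True)
    for value, weight in by_ratio:
        if remaining <= 0:
            break
        take = min(weight, remaining)          # possibly only part of the item
        total += Fraction(value * take, weight)
        remaining -= take
    return total

def integer_knapsack(items, capacity):
    """Knapsack where each item is taken whole or not at all (the ILP-style version)."""
    best = [0] * (capacity + 1)                # best[c] = best value within capacity c
    for value, weight in items:
        for c in range(capacity, weight - 1, -1):
            best[c] = max(best[c], best[c - weight] + value)
    return best[capacity]

# Hypothetical data: (value, weight) pairs and a capacity of 50.
ITEMS = [(60, 10), (100, 20), (120, 30)]
CAPACITY = 50

lp_value = fractional_knapsack(ITEMS, CAPACITY)   # 240
ilp_value = integer_knapsack(ITEMS, CAPACITY)     # 220
```

The integer optimum (220) is lower than the fractional one (240): adding the integrality constraint can only shrink the feasible set, and it is this constraint that makes the general problem expensive to solve.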
Prakash-Parker ILP Model
  By Prakash and Parker, 1992
    Developed an ILP formulation
    Used general ILP solvers to solve it
  Inputs to the algorithm
    Single-rate task graph
    Technology model for the PEs, communication channels, and processes' execution characteristics on them
  Target function
    Minimize system implementation cost
  Constraints
    Describe the requirements of the system

Prakash-Parker ILP Model (cont'd)
  Algorithm classification criteria
    Input Model
      Single-rate task graph
    Target Architecture
      Distributed multiprocessor
    Quantum
      Processes of the task graph
    Cost Estimation
      Based on technology models provided to the algorithm
      Represented as the target function of the ILP

Prakash-Parker ILP Model (cont'd)
  Algorithm classification criteria (cont'd)
    Performance Estimation
      Based on technology models provided to the algorithm
    Scheduling, Allocation
      Embedded in the ILP formulation constraints
    Algorithm details

Prakash-Parker ILP Model (cont'd)
  Algorithm classification criteria (cont'd)
    Algorithm details
      Target Function
        Minimize cost
      Sets of Constraints
        Allocation (PEs and communication links)
        Scheduling (processes on PEs, and communications on links)

Prakash-Parker ILP Model (cont'd)
  Sets of Constraints (cont'd)
    Allocation
      Processor-selection constraints
        Each process must be assigned to exactly one processor
      Data-transfer type constraints
        Each communication must be either local or multi-hop, but not both and not neither

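The processor-selection constraint can be illustrated with 0/1 decision variables. The sketch below is an assumption-laden illustration, not the Prakash-Parker formulation itself: it uses a hypothetical matrix x where x[p][e] = 1 means process p is mapped to PE e, requires every row to sum to exactly 1, and finds the feasible mappings by brute-force enumeration rather than with an ILP solver.

```python
from itertools import product

def satisfies_processor_selection(x):
    """x[p][e] is 1 if process p is mapped to PE e, else 0.

    The constraint holds when every process (row) is mapped to exactly one PE.
    """
    return all(sum(row) == 1 for row in x)

def valid_allocations(num_processes, num_pes):
    """Enumerate every 0/1 matrix and keep those satisfying the constraint.

    Brute force is only workable for tiny instances; it is used here purely
    to show what the constraint admits and what it rejects.
    """
    valid = []
    for bits in product((0, 1), repeat=num_processes * num_pes):
        x = [list(bits[p * num_pes:(p + 1) * num_pes]) for p in range(num_processes)]
        if satisfies_processor_selection(x):
            valid.append(x)
    return valid

# 2 processes on 3 candidate PEs: 64 candidate matrices, 3 * 3 = 9 feasible ones.
feasible = valid_allocations(2, 3)
```

The data-transfer type constraint has the same "exactly one" shape: for each communication, one 0/1 variable per transfer type, with the variables summing to 1.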
Prakash-Parker ILP Model (cont'd)
  Sets of Constraints (cont'd)
    Scheduling
      Input-availability constraints
        Data cannot be used by the sink process until after it has been produced by the source process
      Output-availability constraints
        Data must obey the fractional output generation parameters
      Process execution start/end constraints
        Process finish time depends on its start time and the PE on which it executes
      Data-transfer start/end constraints
        Similar to the previous, but using data-transfer times

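As a rough illustration of the start/end and input-availability constraints, the following Python sketch checks them on concrete numbers. The execution-time table, process names, and PE names are hypothetical, and the model is simplified: the sink is assumed to need all of its input before it starts, and the fractional output generation parameters are ignored.

```python
# Hypothetical technology model: execution time of each process on each PE type.
EXEC_TIME = {
    ("P1", "cpu"): 4, ("P1", "mcu"): 9,
    ("P2", "cpu"): 3, ("P2", "mcu"): 5,
}

def finish_time(process, pe, start):
    """Finish time depends on the start time and on the PE the process runs on."""
    return start + EXEC_TIME[(process, pe)]

def input_available(source_finish, transfer_time, sink_start):
    """The sink may start only after the data has been produced and transferred.

    transfer_time is 0 when both processes run on the same PE (local transfer).
    """
    return sink_start >= source_finish + transfer_time

# P1 runs on the microcontroller from t=2; P2 consumes its output on the CPU.
p1_end = finish_time("P1", "mcu", start=2)                     # 2 + 9
ok = input_available(p1_end, transfer_time=2, sink_start=13)
too_early = input_available(p1_end, transfer_time=2, sink_start=12)
```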
Prakash-Parker ILP Model (cont'd)
  Sets of Constraints (cont'd)
    Scheduling (cont'd)
      Processor-usage-exclusion
        Processes on a single PE must not execute simultaneously
      Communication-usage-exclusion
        Multiple communications must not be scheduled on the same link simultaneously

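Both usage-exclusion constraints say the same thing about a shared resource: no two activities on it may overlap in time. A minimal Python sketch of a checker follows. It verifies a given schedule rather than constructing one, and the resource names and time intervals are made up for illustration. Intervals are treated as half-open, so one activity may start exactly when another ends.

```python
def violates_usage_exclusion(intervals):
    """intervals: list of (resource, start, end) with start < end.

    Returns True if two intervals on the same resource (a PE or a
    communication link) overlap in time.
    """
    by_resource = {}
    for resource, start, end in intervals:
        by_resource.setdefault(resource, []).append((start, end))
    for spans in by_resource.values():
        spans.sort()
        # After sorting by start time, any overlap shows up between neighbours.
        for (_, prev_end), (next_start, _) in zip(spans, spans[1:]):
            if next_start < prev_end:
                return True
    return False

# Hypothetical schedules: two processes on one PE, one transfer on a link.
good = [("pe0", 0, 4), ("pe0", 4, 7), ("link0", 2, 5)]
bad = [("pe0", 0, 4), ("pe0", 3, 7), ("link0", 2, 5)]
```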
Prakash-Parker ILP Model (cont'd)
  Experimental Results
    Applied only to relatively small problems
      Reason: use of general ILP solvers
    Their largest task graph: 9 processes
      Took 6000 CPU minutes on an unspecified processor
  Significance of the work
    Achieved exactly optimal solutions on the examples they could solve
    Used as benchmarks for heuristic co-synthesis algorithms

Co-Synthesis Algorithms: Distributed System Co-Synthesis
  Wolf's Heuristic Algorithm

Wolf's Heuristic Algorithm
  As ever, topics of importance:
    System Specification Language/Model
    Target Architecture
    Functionality (Allocation/Scheduling) Quantum
    Allocation Strategy
    Scheduling Strategy
    Cost Estimation
    Performance Estimation
    Algorithm Details

Wolf's Heuristic Algorithm (cont'd)
  System Specification Language/Model
    Algorithm input: single-rate task graph
  Target Architecture
    Heterogeneous multiprocessor architecture
  Allocation
    Primal approach: performance is the major objective
  Scheduling
    ?
  Functionality Quantum
    Processes in a single-rate task graph

Wolf's Heuristic Algorithm (cont'd)
  Performance Estimation
    Component Technology Library
      Run-time of each process on each available PE is assumed to be known
  Cost Estimation
    Component Technology Library
      $\text{Total Cost} = \sum_i \text{Cost}(\text{PE}_i) + \sum_j \text{Cost}(\text{Comm\_Channel}_j)$
  Algorithm Details

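The cost estimate is simply the sum of the library costs of the allocated PEs and communication channels. A small Python sketch follows, in which the component names and cost figures are invented for illustration and do not come from any real technology library.

```python
# Hypothetical component technology library: component type -> cost.
PE_COST = {"big_cpu": 100, "mcu": 15, "asic": 40}
CHANNEL_COST = {"bus": 10, "serial": 3}

def total_cost(pes, channels):
    """Total Cost = sum of allocated PE costs + sum of allocated channel costs.

    pes and channels list the types of the allocated instances, so a type
    that is instantiated twice appears twice.
    """
    pe_part = sum(PE_COST[pe] for pe in pes)
    channel_part = sum(CHANNEL_COST[ch] for ch in channels)
    return pe_part + channel_part

# One large CPU, two microcontrollers, one bus and two serial links.
cost = total_cost(["big_cpu", "mcu", "mcu"], ["bus", "serial", "serial"])  # 146
```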
Wolf's Heuristic Algorithm Details
  First ignore communication costs; later, take them into account
  Steps:
    1. Create an initial feasible solution, and perform an initial scheduling on it.
       Initial feasible solution: assign each process to a separate PE
    2. Reallocate processes to PEs to minimize total PE cost.
       Possibly eliminate PEs from the initial feasible solution
    3. Reallocate processes again to minimize the amount of communication required between PEs
    4. Allocate communication channels
    5. Allocate I/O devices (internal or external to PEs)

Wolf's Heuristic Algorithm Details (cont'd)
  The most important step: step 2, initial reallocation
    Reason: PE cost is the dominant hardware cost
  Initial reallocation
    1. PE cost reduction:
       1.1 Scan the PEs, starting with the least-utilized PE
       1.2 Try to reallocate that PE's processes to other existing PEs
       1.3 If no process is left on the PE, eliminate it; otherwise, replace the PE with a suitable lower-cost one
    2. Pair-wise merge
       Merge a pair of PEs into a single, more powerful one
    3. Load balancing

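The "PE cost reduction" phase (steps 1.1 to 1.3) can be sketched in Python under strong simplifying assumptions. This is not Wolf's actual algorithm: the sketch assumes identical PEs with a single load capacity, measures utilization as summed process load, moves one process at a time, ignores scheduling and communication, and omits the replacement of a non-empty PE by a lower-cost one. All names and numbers are hypothetical.

```python
def pe_cost_reduction(allocation, load, capacity):
    """One pass of a simplified PE cost reduction.

    allocation: dict mapping PE name -> list of process names
    load:       dict mapping process name -> load it puts on a PE
    capacity:   maximum total load a PE can carry (same for every PE here)
    Returns a new allocation in which emptied PEs have been eliminated.
    """
    alloc = {pe: list(procs) for pe, procs in allocation.items()}

    def used(pe):
        return sum(load[p] for p in alloc[pe])

    # 1.1 Scan the PEs, starting with the least-utilized one.
    for pe in sorted(alloc, key=used):
        # 1.2 Try to move each of its processes to another existing PE.
        for proc in list(alloc[pe]):
            for other in alloc:
                if other != pe and used(other) + load[proc] <= capacity:
                    alloc[other].append(proc)
                    alloc[pe].remove(proc)
                    break
        # 1.3 If nothing is left on the PE, eliminate it.
        if not alloc[pe]:
            del alloc[pe]
    return alloc

# Initial feasible solution: every process on its own PE.
initial = {"PE_A": ["a"], "PE_B": ["b"], "PE_C": ["c"], "PE_D": ["d"]}
loads = {"a": 3, "b": 4, "c": 5, "d": 6}
reduced = pe_cost_reduction(initial, loads, capacity=10)
```

With this data the four initial PEs shrink to two, which is the minimum possible for a total load of 18 and a capacity of 10 per PE. A single greedy pass like this one gives no such guarantee in general, which is one reason the slides follow it with merging and load balancing.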
Wolf's Heuristic Algorithm Details (cont'd)
  Initial reallocation (cont'd)
    The "PE cost reduction" phase tries to reallocate multiple processes at a time
    The above 3 phases are repeated for as long as possible
  Experimental results
    Finds optimal solutions to most of the ILP-solved examples
    Finds near-optimal solutions for the remaining examples
    Showed good results on larger examples
    Requires very little run-time
      Due to the multiple-move strategy during the PE cost minimization phase

What we learned today
  Distributed System Co-Synthesis:
    The other broad category of co-synthesis algorithms