CS244: Introduction to Embedded Systems and Ubiquitous Computing
Instructor: Eli Bozorgzadeh
Computer Science Department, UC Irvine
Winter 2010
CS244 – Lecture 5: Hardware/Software Co-design
Review: Design Objectives
[Figure: performance, cost, and quality shown against thresholds; direction label: Better]
- Improving quality beyond threshold is desired
- Improving performance beyond threshold is a waste
- Improving cost is desired
Co-design Flow
[Figure: flow diagram. Box labels: Informal Specification, Algorithmic Design, System Model, System Simulation, Hardware/Software Partitioning, Partitioned Model, Schedule, Partitioned Model & Schedule, HW/SW Co-simulation. Feedback label: Refine]

Co-design Flow
[Figure: flow diagram. Box labels: Partitioned Model + Schedule, Communication Synthesis, Software Model, Hardware Model, HW/SW Co-simulation, Compilation, Synthesis, HW/SW Co-simulation, Gate-level Model, Binary Exec. Model. Feedback label: Refine]

Co-design Flow
[Figure: flow diagram. Box labels: Gate-level Model, Binary Exec. Model, Emulate or Prototype, Fabrication. Feedback label: Refine]
Informal Specification & System Level Model
- Informal specification
  - Loosely defines high-level behavior, constraints, and optimization objectives of the system
  - Algorithmic and implementation details absent
  - Performance estimates not present
- System level model
  - Formally captures behavior, constraints, and optimization objectives
  - Can be simulated to obtain early performance estimates
  - Feedback to refine the system specification
  - Can serve as a golden model for validation of intermediate or final stages
- Algorithmic design
Hardware/Software Partitioning
- Decompose (i.e., partition) the function F of the system into N sub-functions F_1, F_2, F_3 … F_N
- Decompose the constraints and design objectives of the system into sub-constraints and design sub-objectives
- Cluster F_1, F_2, F_3 … F_N into M partitions to run on M processors
[Figure: F decomposed into {F_1, F_2, F_3 … F_N}, clustered onto processors P_1, P_2, P_3 … P_M]
Scheduling
- Scheduling obtains an execution sequence such that dependencies are obeyed
- Static: the schedule is fixed at design time (the common case)
- Dynamic: the schedule is determined at execution time (reconfigurable computing)
[Figure: dependency graph over F1 … F8 with the schedule]
  P1: F1 F2 F8
  P2: F4 F5
  P3: F3 F6
  P4: F7
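As a rough illustration of "an execution sequence such that dependencies are obeyed", the sketch below derives one static order from a dependency table and checks it. The dependency edges are hypothetical (the arrows of the slide figure are not recoverable), and the helper names `static_schedule` and `obeys_dependencies` are made up for this example. It assumes Python 3.9 or later for `graphlib`.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical dependencies: each key lists the functions it must wait for.
deps = {
    "F1": [],
    "F2": ["F1"],
    "F3": ["F1"],
    "F4": ["F2"],
    "F5": ["F2", "F3"],
    "F6": ["F4", "F5"],
}

def static_schedule(deps):
    """Return one execution order in which every function follows its predecessors."""
    return list(TopologicalSorter(deps).static_order())

def obeys_dependencies(order, deps):
    """Check that each function appears after all functions it depends on."""
    position = {f: i for i, f in enumerate(order)}
    return all(position[p] < position[f] for f, preds in deps.items() for p in preds)

order = static_schedule(deps)
print(order, obeys_dependencies(order, deps))
```

Because the order is computed once, before execution, this corresponds to the static case; a dynamic scheduler would make the same kind of decision at run time.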
Scheduling
- A deadline D for the entire schedule
- An execution time T_i for each F_i
- ASAP (as soon as possible)
- ALAP (as late as possible)
[Figure: dependency graph over F1 … F8 with the schedule]
  P1: F1 F2 F8
  P2: F4 F5
  P3: F3 F6
  P4: F
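The ASAP and ALAP start times can be sketched as a forward and a backward sweep over the dependency graph. This is a minimal sketch, assuming unlimited processors (so only dependencies constrain start times) and that every function appears as a key in `deps`. The graph, the execution times, and the deadline are hypothetical values, not the ones from the slide figure.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

def asap_alap(deps, exec_time, deadline):
    """Return (asap, alap) start times for every function.

    deps maps each function to the list of functions it depends on;
    every function must appear as a key.
    """
    order = list(TopologicalSorter(deps).static_order())

    # ASAP: start as soon as every predecessor has finished.
    asap = {}
    for f in order:
        asap[f] = max((asap[p] + exec_time[p] for p in deps[f]), default=0)

    # Successor lists, needed for the backward (ALAP) sweep.
    succs = {f: [] for f in deps}
    for f, preds in deps.items():
        for p in preds:
            succs[p].append(f)

    # ALAP: finish just in time for the earliest-starting successor,
    # or at the deadline D when there is no successor.
    alap = {}
    for f in reversed(order):
        latest_finish = min((alap[s] for s in succs[f]), default=deadline)
        alap[f] = latest_finish - exec_time[f]
    return asap, alap

# Hypothetical example data.
deps = {"F1": [], "F2": ["F1"], "F3": ["F1"], "F4": ["F2", "F3"]}
exec_time = {"F1": 2, "F2": 3, "F3": 1, "F4": 2}
asap, alap = asap_alap(deps, exec_time, deadline=10)
slack = {f: alap[f] - asap[f] for f in deps}
print(asap, alap, slack)
```

The difference between the ALAP and ASAP start times is the slack of each function. Under the unlimited-processor assumption, a negative slack for any function means the deadline D cannot be met.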
Functional Co-simulation
- Some of the M processors are single-purpose (e.g., those with a single function mapped onto them), others are general-purpose
- Functions mapped onto the general-purpose processors are implemented in software and simulated on virtual machines with performance models
- Functions mapped onto the single-purpose processors are simulated at the behavioral level with performance models
- Communication is done via abstract channels
- Feedback is used to refine the partitioning and scheduling tasks
Communication Synthesis & Bus-accurate Co-simulation
- Abstract channels A_1, A_2 … A_n are mapped onto a set of communication channels C_1, C_2 … C_m
  - Similar to functional partitioning
  - Similar to hardware/software scheduling
- Channels correspond to physical artifacts of the architecture
- Hardware and software models are annotated with detailed communication constructs
- A hardware model and a software model are obtained and co-simulated
- Communication synthesis (or possibly higher levels of design) is refined
Compilation & Synthesis & Cycle-accurate Co-simulation
- Compiler used to generate binary executables for general-purpose processors
- Synthesis used to generate gate-level models of single-purpose processors
- Synthesis used to generate gate-level models of general-purpose processors
- Cycle-accurate co-simulation of the entire system
- Note: mixed-level co-simulation is common
Emulate/Prototype and Fabrication
- Use hardware (e.g., FPGAs) to emulate a system as fast as possible (relative to real-time)
- Fabrication
  - Place & route
  - Mask design
  - Chip testing
  - Manufacturing fault models
  - Test vector generation
  - Packaging
Partitioning (Clustering)
- Given: F = { F_1, F_2, F_3 … F_N } and P = { P_1, P_2, P_3 … P_M }
- Find a lowest-cost partition (cluster), as computed by an objective function
- Exhaustive approach: O(M^N)
- Heuristics
  - Constructive partitioning (based on closeness function)
    - Random (good for seeding iterative approaches)
    - Cluster growth
    - Hierarchical clustering
  - Iterative partitioning
    - Start with a partition and improve
    - Gradient search
    - Controlled random search
    - Modified Kernighan/Lin and FM algorithm
      - Partitions a set of nodes (functions) into two bins (processors)
      - Minimize edges between bins (communication cost, wires, etc.)
      - Cost function for moving a node from one partition to another
- ILP
- Genetic evolution
- Simulated annealing
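To make the O(M^N) figure concrete, the sketch below enumerates every assignment of N functions to M processors and keeps the cheapest one. The objective function is an assumption chosen for illustration (weight of communication edges that cross processors plus the largest processor load); the slides do not prescribe one. The names `exhaustive_partition` and `example_cost` and all data values are hypothetical.

```python
from itertools import product

def exhaustive_partition(functions, num_processors, cost):
    """Try all M**N assignments and return (best_mapping, best_cost)."""
    best, best_cost = None, float("inf")
    for assignment in product(range(num_processors), repeat=len(functions)):
        mapping = dict(zip(functions, assignment))
        c = cost(mapping)
        if c < best_cost:
            best, best_cost = mapping, c
    return best, best_cost

# Hypothetical example data.
functions = ["F1", "F2", "F3", "F4"]
load = {"F1": 3, "F2": 3, "F3": 3, "F4": 3}
comm = {("F1", "F2"): 5, ("F2", "F3"): 1, ("F3", "F4"): 5}

def example_cost(mapping):
    """Assumed objective: cut communication weight plus the maximum processor load."""
    cut = sum(w for (a, b), w in comm.items() if mapping[a] != mapping[b])
    per_proc = {}
    for f, p in mapping.items():
        per_proc[p] = per_proc.get(p, 0) + load[f]
    return cut + max(per_proc.values())

best, best_cost = exhaustive_partition(functions, 2, example_cost)
print(best, best_cost)
```

With M = 2 and N = 4 the loop visits 2^4 = 16 candidates. The count grows exponentially in N, which is why the heuristics listed above are used in practice.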
Hierarchical Clustering – Example
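The figure for this example is not available, so the following is only a sketch of the general idea: start with one cluster per function and repeatedly merge the two closest clusters until M clusters remain. The closeness function used here (sum of pairwise communication weights between two clusters) is an assumption, as are the name `hierarchical_clustering` and the data. It also assumes `num_clusters` is at least 1.

```python
from itertools import combinations

def hierarchical_clustering(nodes, closeness, num_clusters):
    """Merge the closest pair of clusters until num_clusters remain.

    closeness maps frozenset({a, b}) to a weight; missing pairs count as 0.
    Returns the final clusters and the list of merges in the order performed.
    """
    clusters = [frozenset([n]) for n in nodes]
    merges = []

    def cluster_closeness(a, b):
        # Assumed closeness between clusters: sum over all cross pairs.
        return sum(closeness.get(frozenset((x, y)), 0) for x in a for y in b)

    while len(clusters) > num_clusters:
        a, b = max(combinations(clusters, 2),
                   key=lambda pair: cluster_closeness(*pair))
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
        merges.append((a, b))
    return clusters, merges

# Hypothetical example data.
nodes = ["F1", "F2", "F3", "F4"]
closeness = {
    frozenset(("F1", "F2")): 6,
    frozenset(("F2", "F3")): 1,
    frozenset(("F3", "F4")): 4,
}
clusters, merges = hierarchical_clustering(nodes, closeness, num_clusters=2)
print(clusters)
```

The recorded `merges` list is the hierarchy: cutting it at a different depth yields a different number of clusters without recomputing the closeness values from scratch.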
Clustering w/ several criteria
Iterative Partitioning Algorithms
- The computation time in an iterative algorithm is spent evaluating large numbers of partitions
- Iterative algorithms differ from one another primarily in the ways in which they modify the partition and in which they accept or reject bad modifications
Kernighan-Lin (Min-Cut) Algorithms
- Two-way partitioning example
- Start with 2 equal subgraphs
- Exchange k pairs in each iteration
- Continue until no further improvement
- Gain function: f(internal - external) cost
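The steps above can be sketched in Python as follows. This follows the textbook Kernighan-Lin formulation, which may differ in detail from the modified variant the lecture refers to: each node v gets D(v) = external cost minus internal cost, swapping a pair (x, y) has gain D(x) + D(y) - 2 w(x, y), and after tentatively swapping every pair only the best prefix of k swaps is applied. The function names and the example graph are hypothetical.

```python
def cut_cost(w, a, b):
    """Total weight of edges between bins a and b."""
    return sum(w.get(frozenset((x, y)), 0) for x in a for y in b)

def kl_pass(w, a, b):
    """One Kernighan-Lin pass; returns the new bins and the gain achieved."""
    a, b = set(a), set(b)

    def weight(x, y):
        return w.get(frozenset((x, y)), 0)

    # D[v] = external cost - internal cost for the current bins.
    D = {}
    for v in a | b:
        own, other = (a, b) if v in a else (b, a)
        D[v] = (sum(weight(v, u) for u in other)
                - sum(weight(v, u) for u in own if u != v))

    free_a, free_b = set(a), set(b)
    swaps, gains = [], []
    while free_a and free_b:
        # Pick the unlocked pair with the largest swap gain, then lock it.
        g, x, y = max(((D[x] + D[y] - 2 * weight(x, y), x, y)
                       for x in free_a for y in free_b),
                      key=lambda t: t[0])
        free_a.remove(x)
        free_b.remove(y)
        swaps.append((x, y))
        gains.append(g)
        # Update D for the remaining unlocked nodes as if x and y were swapped.
        for u in free_a:
            D[u] += 2 * weight(u, x) - 2 * weight(u, y)
        for u in free_b:
            D[u] += 2 * weight(u, y) - 2 * weight(u, x)

    # Apply only the prefix of k swaps with the best cumulative gain.
    best_k, best_total, total = 0, 0, 0
    for k, g in enumerate(gains, 1):
        total += g
        if total > best_total:
            best_k, best_total = k, total
    for x, y in swaps[:best_k]:
        a.remove(x); a.add(y)
        b.remove(y); b.add(x)
    return a, b, best_total

def kernighan_lin(w, a, b):
    """Repeat passes until a pass yields no further improvement."""
    while True:
        a, b, gain = kl_pass(w, a, b)
        if gain <= 0:
            return a, b

# Hypothetical example graph: edge weights stand for communication cost.
w = {frozenset(("F1", "F2")): 5, frozenset(("F2", "F3")): 1,
     frozenset(("F3", "F4")): 5}
start_a, start_b = {"F1", "F3"}, {"F2", "F4"}
final_a, final_b = kernighan_lin(w, start_a, start_b)
print(cut_cost(w, start_a, start_b), cut_cost(w, final_a, final_b))
```

Because nodes are only ever exchanged in pairs, the two bins keep their original sizes. Accepting a best prefix rather than stopping at the first negative gain lets a pass move through a temporarily worse partition.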
Alternate Partitioning Techniques
- Software-oriented partitioning: start with all functionality in software and move into hardware the portions that are time-critical and cannot be allocated to software
- Hardware-oriented partitioning: start with all functionality in hardware and move portions into software implementation
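One way to picture software-oriented partitioning is a greedy loop: keep everything in software and, while the deadline is missed, move the function with the best time saving per unit of hardware cost into hardware. This is only a sketch under strong assumptions: functions run sequentially, so total time is a plain sum, every hardware cost is positive, and the selection rule is one of many possible choices. The name `software_oriented_partition` and all numbers are hypothetical.

```python
def software_oriented_partition(sw_time, hw_time, hw_cost, deadline):
    """Start all-software; move functions to hardware until the deadline is met.

    Returns the set of functions placed in hardware and the resulting total time.
    """
    hardware = set()

    def total_time():
        # Assumed timing model: sequential execution, times simply add up.
        return sum(hw_time[f] if f in hardware else sw_time[f] for f in sw_time)

    while total_time() > deadline:
        candidates = [f for f in sw_time
                      if f not in hardware and sw_time[f] > hw_time[f]]
        if not candidates:
            raise ValueError("deadline cannot be met even with hardware")
        # Greedy choice: largest time saving per unit of hardware cost.
        best = max(candidates,
                   key=lambda f: (sw_time[f] - hw_time[f]) / hw_cost[f])
        hardware.add(best)
    return hardware, total_time()

# Hypothetical example data.
sw_time = {"F1": 10, "F2": 6, "F3": 4}
hw_time = {"F1": 2, "F2": 3, "F3": 1}
hw_cost = {"F1": 4, "F2": 1, "F3": 3}
hardware, t = software_oriented_partition(sw_time, hw_time, hw_cost, deadline=12)
print(hardware, t)
```

Hardware-oriented partitioning would mirror this loop: start with every function in hardware and move functions back to software for as long as the deadline still holds, reducing hardware cost.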
More Partitioning Issues
- Partitioning into hardware and software affects overall system cost and performance
- Hardware implementation
  - Provides higher performance via hardware speeds and parallel execution of operations
  - Incurs additional design expense
- Software implementation
  - Lower performance
  - Incurs high cost of developing and maintaining (complex) software
Conclusion
- Satisfying performance, cost, and quality metrics of a system entails hardware and software co-design
- Partitioning is at the heart of co-design
  - Functional
  - Communication
  - Scheduling
- Partitioning techniques
  - Constructive
  - Iterative
- Heuristics often used to bound the running time