Partitioning. Presented by AMIT KUMAR GUPTA (2001VLS007) and RAM BABU ROY (2001VLS022)
Agenda: Motivation; Objective; System partitioning (structural and functional); Major partitioning issues; Survey of basic algorithms; Partitioning functionality among hardware components; Partitioning functionality among both hardware and software components (hardware components are implemented by designing structure, software components by compiling software); Applications of partitioning
Motivation: Extend design automation to the system level; Support integrated design of HW/SW
What does HW/SW partitioning actually mean? Selecting the appropriate parts of the system for hardware or software implementation. This has a crucial impact on the cost and overall performance of the system. For small systems, partitioning can be done using the designer's experience and intuition. For large systems, it needs high-performance heuristics and CAD tools.
Steps in partitioning: HW/SW partitioning problem formulation; Optimize performance; Minimize cost; Satisfy all design constraints. Some performance constraints can only be met in hardware, so a dedicated ASIC or FPGA is used.
Partitioning approaches differ in: Initial specification; Level of granularity; Degree of automation; Cost function; Partitioning algorithm
Structural partitioning: Can be easily mapped to a graph partitioning problem; Size/performance tradeoffs are difficult; Large number of objects; Limited to hardware designs only
Functional partitioning: Divide the functionality into indivisible pieces called functional objects. Advantages: Easier size/performance tradeoffs; Small number of objects; Both hardware and software solutions are possible
Partitioning issues: Specification abstraction level; Granularity; System-component allocation; Metrics and estimations; Objective and closeness functions; Partitioning algorithms; Output; Flow of control and designer interaction
Metrics: Monetary cost; Execution time; Communication bit rates; Power consumption, area, pins; Testability, reliability, program size, data size, memory size
Partitioning algorithms: Constructive algorithms; Iterative algorithms
Basic partitioning algorithms: Random mapping; Hierarchical clustering; Multistage clustering; Group migration; Ratio cut; Kernighan-Lin algorithm; Simulated annealing; Genetic evolution; Integer linear programming
Hardware/software partitioning algorithms: Greedy algorithms; Hill-climbing algorithms; Binary constraint search (BCS) HW/SW partitioning; Energy-conscious HW/SW partitioning; Preference-driven hierarchical HW/SW partitioning; Simulated annealing; Tabu search
Energy-Conscious HW/SW-Partitioning of Embedded Systems: Basic Concept. Energy dissipation is a hot topic in the design of embedded systems, especially mobile ones. This is because applications like digital video cameras and cellular phones draw their current from batteries that can supply only a limited amount of energy. We show that energy-conscious HW/SW partitioning can lead to drastic reductions in the energy dissipation of a whole embedded system. The obtained results show energy savings of up to 59% while the performance remains approximately the same or even improves slightly. As a main result, energy-conscious HW/SW partitioning is a promising method to be deployed in addition to classical energy and/or power reduction methods. Since the power dissipation varies according to the executed instruction, the term software energy is justified.
Preference-Driven Hierarchical Hardware/Software Partitioning: We present a hierarchical evolutionary approach to hardware/software partitioning for real-time embedded systems. In contrast to most previous approaches, we apply a hierarchical structure and dynamically determine the granularity of tasks and hardware modules to adaptively optimize the solution while keeping the search space as small as possible. Efficient ranking is another problem addressed in this paper. Experimental results show that our algorithm is both effective and efficient.
Hierarchical Models and our approach
Hierarchical Evolutionary Algorithm: In the hardware/software partitioning problem, for a nonhierarchical task graph, each node is to be assigned to a hardware module. In an evolutionary algorithm (EA), such a node-hardware tuple becomes a gene in an individual. For a hierarchical task graph (HTG), however, how to encode genes needs some careful consideration. A simple approach is to associate each gene with a finest-level task node. Note that no task is represented more than once in an HTG instance. This guarantees correctness when constructing the individual. We use the notation (Vi, Mk) to denote that task Vi is assigned to (hardware) module Mk. Then a gene list for the instance in Figure 2(a) might be {(V1, M1), (V21, M2), (V22, M3), (V31, M4), (V32, M5)}, and {(V1, M'1), (V21, M'2), (V22, M'3), (V31a, M'4), (V31b, M'5), (V31c, M'6), (V31d, M'7), (V32, M'8)} for Figure 2(b).
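The gene encoding described above can be sketched in a few lines. This is only an illustration: the nested-dictionary representation of the task graph, the grouping of V21/V22 under V2 and V31/V32 under V3, and the function names are assumptions, since Figure 2 is not reproduced here.

```python
from typing import Dict, List, Tuple

# Hypothetical HTG representation: each task maps to its sub-tasks.
# An empty dict marks a task that is not refined further in this instance.
HTG = Dict[str, "HTG"]

def finest_level_tasks(htg: HTG) -> List[str]:
    """Return the finest-level (leaf) tasks of an HTG instance, depth first."""
    leaves: List[str] = []
    for task, children in htg.items():
        if children:
            leaves.extend(finest_level_tasks(children))
        else:
            leaves.append(task)
    return leaves

def encode(htg: HTG, assignment: Dict[str, str]) -> List[Tuple[str, str]]:
    """Build the gene list [(task, module), ...] for one individual."""
    leaves = finest_level_tasks(htg)
    # No task may appear more than once in an instance.
    assert len(leaves) == len(set(leaves))
    return [(task, assignment[task]) for task in leaves]

# Assumed structure for an instance with five finest-level tasks.
instance_a: HTG = {
    "V1": {},
    "V2": {"V21": {}, "V22": {}},
    "V3": {"V31": {}, "V32": {}},
}
modules_a = {"V1": "M1", "V21": "M2", "V22": "M3", "V31": "M4", "V32": "M5"}
genes_a = encode(instance_a, modules_a)
```

Refining a complex node (for example, splitting V31 into V31a to V31d) only changes the nested dictionary; the encoding step stays the same, which is what lets the granularity vary between individuals.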
Hierarchical Evolutionary Algorithm (continued): Ns is the total number of nodes at the finest level; Ni is the number of nodes in the present instance; K is a user-defined constant; G is the number of iterations; Θ is the probability of descending into a complex node; 1 - Θ is the probability of mapping Vi to another hardware module
Preference-Driven Ranking: When solving the partitioning problem, handling multiple, often conflicting design objectives is not easy. ISMAUT offers an efficient way to compare alternative designs according to the designer's preferences. Let the fitness of a design x be represented by Vx, and denote the kth attribute of x by ak(x); then $V_x = \sum_{k=1}^{n} w_k \, v_k(a_k(x))$, where vk() maps the raw attribute values to the interval [0, 1] and wk is the corresponding weight.
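A minimal sketch of such a fitness computation, assuming the fitness is the weighted sum of the normalized attribute values (consistent with the weights wk and value functions vk named above). The linear value function, the attribute ranges, and the two example designs are hypothetical.

```python
from typing import Callable, List, Sequence

def linear_decreasing(lo: float, hi: float) -> Callable[[float], float]:
    """Hypothetical value function: maps [lo, hi] onto [1, 0], so smaller
    raw values (e.g. lower cost) score higher. Inputs are clamped."""
    def v(a: float) -> float:
        a = min(max(a, lo), hi)
        return (hi - a) / (hi - lo)
    return v

def fitness(attributes: Sequence[float],
            value_fns: Sequence[Callable[[float], float]],
            weights: Sequence[float]) -> float:
    """Weighted sum of normalized attribute values (assumed additive form)."""
    return sum(w * v(a) for w, v, a in zip(weights, value_fns, attributes))

# Two attributes per design: cost (0..100) and execution time (0..50).
value_fns: List[Callable[[float], float]] = [
    linear_decreasing(0.0, 100.0),
    linear_decreasing(0.0, 50.0),
]
weights = [0.5, 0.5]

design_x = [20.0, 10.0]
design_y = [60.0, 25.0]
x_score = fitness(design_x, value_fns, weights)
y_score = fitness(design_y, value_fns, weights)
```

With these made-up numbers, design_x scores higher than design_y on both attributes, so it ranks first for any non-negative weights; the interesting cases are those where the attributes conflict and the weights are only partially known.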
Preference-Driven Ranking (continued): Designs with a larger fitness value are considered to be more desirable. Let x and x' be two individuals with attributes ak(x) and ak(x'), k = 1, 2, ..., n. Suppose that, according to the designer's preferences, x is considered preferable to x', denoted by x > x'.
Preference-Driven Ranking (continued): Solving the linear programming problems can therefore be reduced to checking the objective function values at each of these extreme points. To compare two indifferent individuals x and y, calculate
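The extreme-point idea in the first sentence above can be illustrated as follows. A linear function over a bounded polyhedral set of admissible weights attains its minimum at an extreme point, so "x beats y for every admissible weight vector" can be checked by evaluating only those points. This is a sketch under assumptions, not the comparison formula from the slide: the function name, the list-of-vectors representation of the extreme points, and the example numbers are all hypothetical.

```python
from typing import Sequence

def preferred_for_all_weights(vx: Sequence[float],
                              vy: Sequence[float],
                              extreme_points: Sequence[Sequence[float]]) -> bool:
    """True if sum_k w_k * (vx_k - vy_k) > 0 at every extreme weight vector,
    i.e. x has the higher weighted fitness for all admissible weights."""
    diffs = [a - b for a, b in zip(vx, vy)]
    return all(
        sum(w * d for w, d in zip(ws, diffs)) > 0
        for ws in extreme_points
    )

# Normalized attribute values v_k(a_k(.)) of two designs (made-up numbers).
vx = [0.8, 0.3]
vy = [0.5, 0.5]

# Admissible weights: w1 + w2 = 1, w >= 0, and w1 >= w2 (designer preference).
constrained = [(1.0, 0.0), (0.5, 0.5)]
# Without the preference constraint the weight set is the whole simplex.
unconstrained = [(1.0, 0.0), (0.0, 1.0)]

x_wins_constrained = preferred_for_all_weights(vx, vy, constrained)
x_wins_unconstrained = preferred_for_all_weights(vx, vy, unconstrained)
```

In this example the designer's preference constraint is what makes x comparable to y: over the full simplex neither design dominates the other.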
Example Results:
Summary of preference-driven HW/SW partitioning: We presented several techniques to improve the hardware/software partitioning process for large, complex embedded systems. We proposed the use of both hierarchical task specifications and hierarchical hardware modules. To facilitate the partitioning process, we extended the existing EA approach so that it can effectively handle hierarchical structures. We introduced the idea of employing the extreme points in multi-objective linear programming to eliminate the time-consuming procedure of solving multiple linear programming problem instances. The experimental results obtained so far have clearly demonstrated the advantages of our proposed approach.
Overview of the co-synthesis environment
Overview of the co-synthesis environment (continued): The initial system specification is a set of processes interacting through communication channels. This specification is further decomposed into units of smaller granularity. The partitioning algorithm generates as output a model consisting of two sets of interacting processes. The processes in one set are marked as candidates for hardware implementation, while the processes in the other set are marked as software implementation candidates. The main goal of partitioning is to maximize performance in terms of execution speed.
The partitioning steps
The partitioning steps (continued): 1. Extraction of blocks of statements, loops, and subprograms from processes: identify the regions that are responsible for most of the execution time spent inside a process (regions with a large computation load, CL). Candidate regions are typically loops and subprograms, but can also be blocks of statements with a high CL. The designer guides the identification and extraction of the regions and decides implicitly on the granularity of further partitioning: (a) by identifying a certain region to be extracted (regardless of its CL) and assigning it to the hardware or software partition; (b) by imposing boundary values. 2. Process graph generation. 3. Partitioning of the process graph. 4. Process merging: during the first step, one or several child processes may be extracted from a parent process. If, as a result of step 3, some of the child processes are assigned to the same partition as their parent process, they are optionally merged back together.
Objectives to be considered: 1. To identify basic regions (processes, subprograms, loops, and blocks of statements) which are responsible for most of the execution time, in order to assign them to the hardware partition; 2. To minimize communication between the hardware and software domains; 3. To increase parallelism within the resulting system at the following three levels: (a) internal parallelism of each hardware process (during high-level synthesis, operations are scheduled to be executed in parallel by the available functional units); (b) parallelism between processes assigned to the hardware partition; (c) parallelism between the hardware coprocessor and the microprocessor executing the software processes.
Statistics used: Two types of statistics are used by the partitioning algorithm. 1. Computation load (CL) of a basic region is a quantitative measure of the total computation executed by that region, considering all its activations during the simulation process. It is expressed as the total number of operations executed inside that region, where each operation is weighted with a coefficient depending on its relative complexity. The relative computation load (RCL) of a block of statements, loop, or subprogram is the computation load of the respective basic region divided by the computation load of the process the region belongs to. The relative computation load of a process is the computation load of that process divided by the total computation load of the system. 2. Communication intensity (CI) on a channel connecting two processes is expressed as the total number of send operations executed on the respective channel.
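The three statistics above are simple enough to sketch directly. The operation types, their complexity coefficients, and the simulation counts below are invented for illustration; only the definitions of CL, RCL, and CI follow the text.

```python
from collections import Counter
from typing import Dict, Iterable

# Hypothetical complexity coefficient per operation type.
OP_WEIGHT: Dict[str, float] = {"add": 1.0, "mul": 3.0, "div": 8.0}

def computation_load(op_counts: Dict[str, int]) -> float:
    """CL: operations executed in a region, weighted by relative complexity."""
    return sum(OP_WEIGHT[op] * n for op, n in op_counts.items())

def relative_computation_load(region_cl: float, enclosing_cl: float) -> float:
    """RCL: CL of a region divided by the CL of what it belongs to
    (the process for a region, the whole system for a process)."""
    return region_cl / enclosing_cl

def communication_intensity(send_log: Iterable[str]) -> Counter:
    """CI: number of send operations observed on each channel."""
    return Counter(send_log)

# Made-up simulation results: one loop inside one process.
loop_ops = {"add": 100, "mul": 50}
process_ops = {"add": 200, "mul": 50, "div": 10}

loop_cl = computation_load(loop_ops)          # 100*1 + 50*3
process_cl = computation_load(process_ops)    # 200*1 + 50*3 + 10*8
loop_rcl = relative_computation_load(loop_cl, process_cl)

ci = communication_intensity(["ch1", "ch2", "ch1"])
```

A loop with a high RCL, like this one, is the kind of region step 1 of the partitioning flow would consider extracting as a hardware candidate.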
Simulated annealing: Iterative improvement algorithms based on neighborhood search are widely used for hardware/software partitioning. To avoid being trapped in a local minimum, heuristics are implemented, which are very often based on simulated annealing. Simulated annealing selects the neighboring solution randomly and always accepts an improved solution. It also accepts worse solutions with a certain probability that depends on the deterioration of the cost function and on a control parameter called temperature. Simulated annealing algorithms can be quickly implemented and are widely applicable to many different problems. Limitations: long execution time and the large number of experiments needed to tune the algorithm.
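The acceptance rule described above can be sketched as follows. This is a generic simulated annealing loop, not the algorithm from the referenced paper: the geometric cooling schedule, the parameter values, the single-node flip as the move, and the four-node toy cost function are all assumptions chosen for illustration.

```python
import math
import random
from typing import Callable, List, Tuple

def simulated_annealing(initial: List[int],
                        cost: Callable[[List[int]], float],
                        t_start: float = 10.0, t_end: float = 0.01,
                        alpha: float = 0.95, moves_per_temp: int = 50,
                        seed: int = 0) -> Tuple[List[int], float]:
    """Partition nodes into software (0) and hardware (1)."""
    rng = random.Random(seed)
    current = list(initial)
    current_cost = cost(current)
    best, best_cost = list(current), current_cost
    t = t_start
    while t > t_end:
        for _ in range(moves_per_temp):
            i = rng.randrange(len(current))
            current[i] = 1 - current[i]              # move one node across
            delta = cost(current) - current_cost
            # Improvements are always accepted; a worse solution is accepted
            # with a probability that shrinks with delta and with temperature.
            if delta <= 0 or rng.random() < math.exp(-delta / t):
                current_cost += delta
                if current_cost < best_cost:
                    best, best_cost = list(current), current_cost
            else:
                current[i] = 1 - current[i]          # rejected: undo the move
        t *= alpha                                   # assumed geometric cooling
    return best, best_cost

# Hypothetical toy problem: communication weights, software times, hardware areas.
EDGES = {(0, 1): 4, (1, 2): 1, (2, 3): 4, (0, 3): 1}
SW_TIME = [5, 5, 1, 1]
HW_AREA = [2, 2, 3, 3]

def toy_cost(part: List[int]) -> int:
    cut = sum(w for (a, b), w in EDGES.items() if part[a] != part[b])
    time = sum(t for t, p in zip(SW_TIME, part) if p == 0)
    area = sum(a for a, p in zip(HW_AREA, part) if p == 1)
    return cut + time + area

sa_best, sa_cost = simulated_annealing([0, 0, 0, 0], toy_cost)
```

The all-software starting point is a local minimum of this toy cost (every single flip makes it worse), so a pure downhill search would stop immediately; accepting some uphill moves is what lets the search leave it.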
Simulated annealing algorithm
Cooling schedules
Generation of a new solution with improved move
Partitioning times with SA: simple moves (SM) and improved moves (IM)
Variation of cost function during simulated annealing with simple moves for 100 nodes
Variation of cost function during simulated annealing with improved moves for 100 nodes
Partitioning time with SA
Tabu search algorithm: Tabu search (TS) controls uphill moves not purely randomly but in an intelligent way. The tabu search approach accepts uphill moves and stimulates convergence toward a global optimum by creating and exploiting data structures that take advantage of the search history when selecting the next move. Two key elements of the TS algorithm are the data structures called short-term and long-term memory. Short-term memory stores information about the most recent history of the search. It is used to avoid the cycling that could occur if a certain move returns to a recently visited solution. Long-term memory, on the other hand, stores information on the global evolution of the algorithm. These are typically frequency measures relating to the occurrence of a certain event. They can be applied to perform diversification, which is meant to improve exploration of the solution space by broadening the spectrum of visited solutions.
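The two memories described above can be sketched as follows. This is a generic tabu search, not the algorithm from the referenced paper: the tenure, the frequency penalty used for diversification, the aspiration rule (a tabu move is allowed if it yields a new best solution), and the four-node toy cost function are assumptions.

```python
from typing import Callable, List, Tuple

def tabu_search(initial: List[int],
                cost: Callable[[List[int]], float],
                iterations: int = 100, tenure: int = 2,
                freq_penalty: float = 0.1) -> Tuple[List[int], float]:
    """Partition nodes into software (0) and hardware (1)."""
    current = list(initial)
    best, best_cost = list(current), cost(current)
    tabu_until = [0] * len(current)   # short-term memory: node i is tabu until this iteration
    flips = [0] * len(current)        # long-term memory: how often node i was moved
    for it in range(1, iterations + 1):
        candidates = []
        for i in range(len(current)):
            current[i] = 1 - current[i]
            c = cost(current)
            current[i] = 1 - current[i]
            # Skip tabu moves unless they would give a new best (aspiration).
            if tabu_until[i] >= it and c >= best_cost:
                continue
            # Penalize frequently moved nodes to diversify the search.
            candidates.append((c + freq_penalty * flips[i], c, i))
        if not candidates:
            continue
        _, c, i = min(candidates)
        current[i] = 1 - current[i]   # best admissible move, even if uphill
        tabu_until[i] = it + tenure
        flips[i] += 1
        if c < best_cost:
            best, best_cost = list(current), c
    return best, best_cost

# Hypothetical toy problem: communication weights, software times, hardware areas.
EDGES = {(0, 1): 4, (1, 2): 1, (2, 3): 4, (0, 3): 1}
SW_TIME = [5, 5, 1, 1]
HW_AREA = [2, 2, 3, 3]

def toy_cost(part: List[int]) -> int:
    cut = sum(w for (a, b), w in EDGES.items() if part[a] != part[b])
    time = sum(t for t, p in zip(SW_TIME, part) if p == 0)
    area = sum(a for a, p in zip(HW_AREA, part) if p == 1)
    return cut + time + area

ts_best, ts_cost = tabu_search([0, 0, 0, 0], toy_cost)
```

Starting from the all-software solution, the first move is necessarily uphill; the short-term memory then forbids flipping the same node straight back, which pushes the search on to a better solution instead of cycling.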
Tabu search algorithm
Parameters and CPU time with TS
Partitioning times with SA and TS
Future possibilities: Many systems exhibit a high degree of regularity (many of the behaviors in the system are identical, differing only in the data on which they operate). Future algorithms should include techniques to partition regular and semi-regular behaviors. Feed metrics back to the system and partition again. Since partitioning is quite a mature field, the majority of future tasks will involve adapting existing techniques for applicability at the functional level. Develop an algorithm which partitions at multiple levels of granularity. Combine functional partitioning with high-level synthesis.
References: Petru Eles, Zebo Peng, Krzysztof Kuchcinski, Alexa Doboli, “System Level Hardware/Software Partitioning Based on Simulated Annealing and Tabu Search”; Gang Quan, Xiaobo (Sharon) Hu, Garrison Greenwood, “Preference-Driven Hierarchical Hardware/Software Partitioning”; Jörg Henkel, Yanbing Li, “Energy-Conscious HW/SW-Partitioning of Embedded Systems: A Case Study on an MPEG-2 Encoder”; D. D. Gajski, F. Vahid, S. Narayan, J. Gong, “Specification and Design of Embedded Systems”