1 Simulated Evolution Algorithm for Multiobjective VLSI Netlist Bi-Partitioning By Dr Sadiq M. Sait Dr Aiman El-Maleh Raslan Al Abaji King Fahd University Computer Engineering Department MS Thesis Presentation
2 Outline: Introduction; Problem Formulation; Cost Functions; Proposed Approaches; Experimental Results; Conclusion.
3 VLSI Technology Trend
Design characteristics and key CAD capabilities:
  0.06M, 2MHz, 6um: SPICE simulation
  0.13M, 12MHz, 1.5um: CAE systems, silicon compilation
  1.2M, 50MHz, 0.8um: HDLs, synthesis
  3.3M, 200MHz, 0.6um: top-down design, emulation
  7.5M, 333MHz, 0.25um: cycle-based simulation, formal verification
The challenge of sustaining such exponential growth toward gigascale integration has shifted, to a large degree, from process and manufacturing technology to design technology.
4 The VLSI Chip in 2006
  Technology: 0.1 um
  Transistors: 200 M
  Logic gates: 40 M
  Size: 520 mm^2
  Clock: [...] GHz
  Chip I/Os: 4,000
  Wiring levels: [...]
  Voltage: [...]
  Power: 160 Watts
  Supply current: ~160 Amps
Tradeoffs: performance, power consumption, noise immunity, area, cost, time-to-market.
5 VLSI Design Cycle: the VLSI design process is carried out at a number of levels.
  1. System Specification
  2. Functional Design
  3. Logic Design
  4. Circuit Design
  5. Physical Design
  6. Design Verification
  7. Fabrication
  8. Packaging, Testing and Debugging
6 Physical Design: physical design converts a circuit description into a geometric description, which is used to manufacture the chip. The physical design cycle consists of:
  1. Partitioning
  2. Floorplanning and Placement
  3. Routing
  4. Compaction
7 Why do we need partitioning? Partitioning is the decomposition of a complex system into smaller subsystems. Each subsystem can be designed independently, which speeds up the design process (divide-and-conquer approach). A complex IC is decomposed into a number of functional blocks, each designed by one engineer or a team of engineers. The decomposition scheme has to minimize the interconnections between subsystems.
8 Levels of Partitioning: system level partitioning (system into PCBs), board level partitioning (PCBs into chips), chip level partitioning (chips into subcircuits / blocks).
9 Classification of Partitioning Algorithms
Group Migration: 1. Kernighan-Lin 2. Fiduccia-Mattheyses (FM) 3. Multilevel K-way Partitioning
Iterative (simulation based): 1. Simulated annealing 2. Simulated evolution 3. Tabu Search 4. Genetic
Performance Driven: 1. Lawler et al. 2. Vaishnav 3. Choi et al. 4. Jun’ichiro et al.
Others: 1. Spectral 2. Multilevel Spectral
10 Related Previous Work
  1969: A bottom-up approach for delay optimization (clustering) was proposed by Lawler et al.
  1998: A circuit partitioning algorithm under a path delay constraint was proposed by Jun’ichiro et al. The algorithm consists of clustering and iterative improvement phases.
  1999: Two low-power-oriented techniques based on the simulated annealing (SA) algorithm were proposed by Choi et al.
  1999: An enumerative partitioning algorithm targeting low power was proposed by Vaishnav et al. It enumerates alternate partitionings and selects one that has the same delay but less power dissipation (not feasible for huge circuits).
11 Motivation
Need for power optimization: portable devices; power consumption is a hindrance to further integration; increasing clock frequency.
Need for delay optimization: in current submicron designs, wire delay tends to dominate gate delay; larger die sizes imply long on-chip global routes, which affect performance; delay due to off-chip capacitance must be optimized.
12 Objective: design a class of iterative algorithms for multiobjective VLSI partitioning. Explore partitioning from a wider angle by considering circuit delay, power dissipation, and interconnect at the same time, under a balance constraint.
13 Problem formulation. Objectives: power cost is optimized AND delay cost is optimized AND cutset cost is optimized. Constraint: balanced partitions, within a certain tolerance (10%).
14 Problem formulation: the circuit is modeled as a hypergraph $H(V, E)$, where $V = \{v_1, v_2, v_3, \ldots, v_n\}$ is a set of modules (cells) and $E = \{e_1, e_2, e_3, \ldots, e_k\}$ is a set of hyperedges representing the signal nets. Each net is a subset of $V$ containing the modules that the net connects. A two-way partitioning of a set of nodes $V$ determines two subsets $V_A$ and $V_B$ such that $V_A \cup V_B = V$ and $V_A \cap V_B = \emptyset$.
15 Cutset: based on the hypergraph model $H = (V, E)$. Cost 1: $c(e) = 1$ if $e$ spans more than one block, and 0 otherwise. Cutset = sum of hyperedge costs. This cost allows efficient gain computation and update. Example in the figure: cutset = 3.
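The cutset cost can be sketched in a few lines of Python. This is a minimal illustration, assuming each net is stored as a set of cell names and the partition as a dictionary mapping each cell to block 0 or 1; the names `cutset_size`, `nets`, and `block_of` are hypothetical and not taken from the thesis.

```python
def cutset_size(nets, block_of):
    """Count the hyperedges (nets) whose cells lie in more than one block."""
    cut = 0
    for net in nets:
        blocks = {block_of[cell] for cell in net}
        if len(blocks) > 1:  # c(e) = 1 when the net spans more than one block
            cut += 1
    return cut


# Toy data (hypothetical): four cells, three nets, two blocks.
nets = [{"a", "b"}, {"b", "c", "d"}, {"c", "d"}]
block_of = {"a": 0, "b": 0, "c": 1, "d": 1}
print(cutset_size(nets, block_of))  # 1: only the net {"b", "c", "d"} is cut
```

A production implementation would update the count incrementally when a cell moves instead of rescanning every net.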
16 Delay Model. Path: SE1 -> C1 -> C4 -> C5 -> SE2. $Delay = CD_{SE1} + CD_{C1} + CD_{C4} + CD_{C5} + CD_{SE2}$, where $CD_{C1} = BD_{C1} + LF_{C1} \cdot (C_{offchip} + CINP_{C2} + CINP_{C3} + CINP_{C4})$.
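A small Python sketch of this delay model may help. It assumes that CD is the delay of a cell, BD its base delay, LF its load factor, and CINP the input capacitance of a driven cell; the slide does not spell these abbreviations out, so this reading is an assumption. The function names and the numbers are hypothetical.

```python
def cell_delay(base_delay, load_factor, fanout_input_caps, offchip_cap=0.0):
    """CD = BD + LF * (C_offchip + sum of input capacitances of driven cells)."""
    return base_delay + load_factor * (offchip_cap + sum(fanout_input_caps))


def path_delay(cell_delays):
    """Delay of a path = sum of the delays of the cells along it."""
    return sum(cell_delays)


# Hypothetical values: C1 drives three cells with input capacitances 1, 1, 2.
cd_c1_uncut = cell_delay(1.0, 0.5, [1.0, 1.0, 2.0])
# Same cell when its output net is cut and sees an off-chip capacitance of 4.
cd_c1_cut = cell_delay(1.0, 0.5, [1.0, 1.0, 2.0], offchip_cap=4.0)
print(cd_c1_uncut, cd_c1_cut)  # 3.0 5.0
```

The example shows why cutting a net on a critical path is costly: the off-chip capacitance term raises the delay of the driving cell.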
17 Delay: $Delay(P_i) = \sum_{c \in P_i} CD_c$, the sum of the cell delays along the path, where $P_i$ is any path between two cells or nodes and $P$ is the set of all paths of the circuit.
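18 Power: the average dynamic power consumed by a CMOS logic gate in a synchronous circuit is given by $P_i^{avg} = 0.5 \cdot \frac{V_{dd}^2}{T_{cycle}} \cdot C_i^{load} \cdot N_i$, where $V_{dd}$ is the supply voltage, $T_{cycle}$ is the clock period, $N_i$ is the number of output gate transitions per cycle (switching probability), and $C_i^{load}$ is the load capacitance.
19 Power: $C_i^{load} = C_i^{basic} + C_i^{extra}$, where $C_i^{basic}$ is the load capacitance driven by a cell before partitioning and $C_i^{extra}$ is the additional load due to off-chip capacitance (cut net). Total power dissipation of a circuit: $P = \sum_{i \in V} 0.5 \cdot \frac{V_{dd}^2}{T_{cycle}} \cdot (C_i^{basic} + C_i^{extra}) \cdot N_i$.
20 Power: $C_i^{extra}$ can be assumed identical for all nets, so the power cost of a partition reduces to $\sum_{i \in \psi} N_i$, where $\psi$ is the set of visible gates driving a load outside the partition.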
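As a rough Python sketch of this power cost: if the additional off-chip load is taken to be identical for all nets, the partition-dependent part of the power is proportional to the summed switching probabilities of the visible gates, that is, the cells that drive a cut net. The data layout (a driver recorded per net) and the names `power_cost`, `driver_of`, and `switching_prob` are assumptions made for this illustration.

```python
def power_cost(nets, driver_of, block_of, switching_prob):
    """Sum the switching probabilities of cells that drive at least one cut net."""
    visible = set()
    for net_id, cells in nets.items():
        if len({block_of[c] for c in cells}) > 1:  # the net is cut
            visible.add(driver_of[net_id])
    # Each visible gate is counted once, even if it drives several cut nets.
    return sum(switching_prob[c] for c in visible)


# Toy data (hypothetical).
nets = {"n1": {"a", "b"}, "n2": {"b", "c"}, "n3": {"c", "d"}}
driver_of = {"n1": "a", "n2": "b", "n3": "c"}
block_of = {"a": 0, "b": 0, "c": 1, "d": 1}
switching_prob = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.5}
print(power_cost(nets, driver_of, block_of, switching_prob))  # 0.25 (only n2 is cut)
```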
21 Balance: balance can be expressed as a constraint. However, balance as a constraint is not appealing because it may prohibit many good moves. It is therefore treated as an objective: minimize |Cells(block1) - Cells(block2)|.
22 Unifying objectives: how? Weighted sum approach: 1. Problems in choosing weights. 2. Need to tune for every circuit.
23 Fuzzy logic for cost function: imprecise values of the objectives are best represented by linguistic terms, which are the basis of fuzzy algebra. Conflicting objectives. Operators for the aggregating function.
24 Use of fuzzy logic for multi-objective cost function:
  1. The cost-to-membership mapping.
  2. A linguistic fuzzy rule for combining the membership values in an aggregating function.
  3. Translation of the linguistic rule into appropriate fuzzy operators.
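25 Some fuzzy operators
And-like operators:
  – Min operator: $\mu = \min(\mu_1, \mu_2)$
  – And-like OWA: $\mu = \beta \cdot \min(\mu_1, \mu_2) + \frac{1}{2}(1 - \beta)(\mu_1 + \mu_2)$
Or-like operators:
  – Max operator: $\mu = \max(\mu_1, \mu_2)$
  – Or-like OWA: $\mu = \beta \cdot \max(\mu_1, \mu_2) + \frac{1}{2}(1 - \beta)(\mu_1 + \mu_2)$
where $\beta$ is a constant in the range [0, 1].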
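The two OWA operators are easy to try out in Python. The sketch below uses `mu1` and `mu2` for the two membership values and `beta` for the constant in [0, 1]; these names are chosen for the illustration only.

```python
def and_like_owa(mu1, mu2, beta):
    """beta * min(mu1, mu2) + 1/2 * (1 - beta) * (mu1 + mu2)."""
    return beta * min(mu1, mu2) + 0.5 * (1 - beta) * (mu1 + mu2)


def or_like_owa(mu1, mu2, beta):
    """beta * max(mu1, mu2) + 1/2 * (1 - beta) * (mu1 + mu2)."""
    return beta * max(mu1, mu2) + 0.5 * (1 - beta) * (mu1 + mu2)


print(and_like_owa(0.25, 0.75, 0.5))  # 0.375
print(or_like_owa(0.25, 0.75, 0.5))   # 0.625
print(and_like_owa(0.25, 0.75, 1.0))  # 0.25, the pure min operator
print(and_like_owa(0.25, 0.75, 0.0))  # 0.5, the plain average
```

With the constant equal to 1 the operators reduce to pure min and max; with 0 both reduce to the average, so the constant controls how strictly the "and" or "or" is enforced.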
26 Membership functions: $O_i$ and $C_i$ are the lower bound and the actual cost of objective $i$; $\mu_i(x)$ is the membership of solution $x$ in the set "good $i$"; $g_i$ is the relative acceptance limit for each objective.
27 Fuzzy linguistic rule: a good partitioning can be described by the following fuzzy rule. IF a solution has small cutset AND low power AND short delay AND good balance THEN it is a good solution.
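28 Fuzzy cost function: the above rule is translated to the and-like OWA $\mu(x) = \beta \cdot \min(\mu_C(x), \mu_P(x), \mu_D(x), \mu_B(x)) + (1 - \beta) \cdot \frac{1}{4} \sum_{j \in \{C, P, D, B\}} \mu_j(x)$. $\mu(x)$ represents the total fuzzy fitness of the solution; our aim is to maximize this fitness. $\mu_C$, $\mu_P$, $\mu_D$, and $\mu_B$ are respectively the cutset, power, delay, and balance fitness.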
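The following Python sketch combines a cost-to-membership mapping with an and-like OWA over several objectives. The piecewise-linear shape of `membership` (1 at or below the lower bound, falling linearly to 0 once the cost reaches the acceptance limit times the lower bound) is an assumption made for illustration, as are the function names and the sample numbers. It requires a positive lower bound and an acceptance limit greater than 1.

```python
def membership(cost, lower_bound, g):
    """Assumed linear membership of a solution in the fuzzy set 'good'."""
    ratio = cost / lower_bound
    if ratio <= 1.0:
        return 1.0
    if ratio >= g:
        return 0.0
    return (g - ratio) / (g - 1.0)


def fuzzy_fitness(memberships, beta):
    """And-like OWA over n memberships: beta * min + (1 - beta) * mean."""
    return beta * min(memberships) + (1 - beta) * sum(memberships) / len(memberships)


# Hypothetical costs for (cutset, power, delay, balance), lower bound 100 each.
mus = [membership(c, 100.0, 2.0) for c in (100.0, 150.0, 150.0, 80.0)]
print(mus)                      # [1.0, 0.5, 0.5, 1.0]
print(fuzzy_fitness(mus, 0.5))  # 0.625
```

The min term makes the fitness sensitive to the worst objective, while the mean term still rewards improvement in the others.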
29 Simulated Evolution Algorithm
Simulated evolution
Begin
  Start with an initial feasible partition S
  Repeat
    Evaluation: evaluate the goodness G_i of all modules
    Selection: For each cell V_i Do
      if Random R_m > G_i then select the cell
    End For
    Allocation: For each selected cell V_i Do
      move the cell to the destination block
    End For
  Until stopping criteria are satisfied
  Return best solution
End
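The evaluation, selection, and allocation loop can be sketched in Python as follows. This is a simplified illustration under several assumptions: the stopping criterion is a fixed iteration count, selection compares a uniform random number with the goodness, allocation simply moves a selected cell to the other block, and balance is ignored. The toy goodness (fraction of a cell's nets that are not cut) and all names are hypothetical.

```python
import random


def simulated_evolution(block_of, goodness, cost, iterations=50, seed=0):
    """Simplified SimE loop for bi-partitioning (blocks are 0 and 1)."""
    rng = random.Random(seed)
    current = dict(block_of)
    best, best_cost = dict(current), cost(current)
    for _ in range(iterations):
        # Evaluation: goodness in [0, 1] for every cell.
        g = {cell: goodness(cell, current) for cell in current}
        # Selection: cells with low goodness are more likely to be selected.
        selected = [cell for cell in current if rng.random() > g[cell]]
        # Allocation: with two blocks, the destination is the other block.
        for cell in selected:
            current[cell] = 1 - current[cell]
        c = cost(current)
        if c < best_cost:
            best, best_cost = dict(current), c
    return best, best_cost


# Toy problem (hypothetical): four cells on a ring of two-pin nets.
nets = [{"a", "b"}, {"b", "c"}, {"c", "d"}, {"a", "d"}]


def cut_cost(p):
    return sum(1 for n in nets if len({p[c] for c in n}) > 1)


def cut_goodness(cell, p):
    mine = [n for n in nets if cell in n]
    uncut = sum(1 for n in mine if len({p[c] for c in n}) == 1)
    return uncut / len(mine)


start = {"a": 0, "b": 1, "c": 0, "d": 1}
best, best_cost = simulated_evolution(start, cut_goodness, cut_cost)
print(best_cost, "<=", cut_cost(start))
```

Because balance is not enforced here, the sketch may drift toward placing all cells in one block; a real implementation would let the allocation step and the fitness account for balance.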
30 Simulated evolution implementation: each cell has a cut goodness, a power goodness, and a delay goodness. The overall goodness is a fuzzy goodness.
31 Cut goodness: $d_i$ is the set of all nets connected to cell $i$ and not cut; $w_i$ is the set of all nets connected to cell $i$ and cut.
32 Power goodness: $V_i$ is the set of all nets connected to cell $i$; $U_i$ is the set of all nets connected to cell $i$ and cut.
33 Delay goodness: $K_i$ is the set of cells in all paths passing through cell $i$; $L_i$ is the set of cells in all paths passing through cell $i$ that are not in the same block as $i$.
34 Final selection fuzzy rule: IF cell $i$ is near its optimal cutset goodness as compared to other cells AND near its optimal power goodness as compared to other cells AND (near its optimal net delay goodness as compared to other cells OR $T_{max}(i)$ is much smaller than $T_{max}$) THEN it has a high goodness.
35 Fuzzy Goodness: $T_{max}$ is the delay of the most critical path in the current iteration. $T_{max}(i)$ is the delay of the longest path traversing cell $i$. $X_{path} = T_{max} / T_{max}(i)$. The fuzzy goodness of a cell combines its cutset, power, and delay goodness.
36 Selection implementation: biasless selection scheme. The goodness distribution among the cells is Gaussian, with mean $G_m$ and standard deviation $G_\sigma$. A Gaussian random number is generated with mean $R_m$ and standard deviation $R_\sigma$. This eliminates cells with zero selection probability.
37 Selection implementation: $R_m = G_m - G_\sigma$ and $R_\sigma = G_\sigma$. Selection rule: if the generated random number is greater than Goodness($i$), then select the cell.
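A minimal Python sketch of this selection scheme is given below. It assumes a fresh Gaussian random number is drawn for every cell and that the population standard deviation is used for the goodness spread; neither detail is stated on the slides. The name `select_cells` and the sample goodness values are hypothetical.

```python
import random
import statistics


def select_cells(goodness, seed=0):
    """Select cells whose goodness is below a Gaussian random threshold."""
    rng = random.Random(seed)
    values = list(goodness.values())
    g_mean = statistics.mean(values)
    g_std = statistics.pstdev(values)  # assumption: population std deviation
    r_mean, r_std = g_mean - g_std, g_std
    selected = []
    for cell, g in goodness.items():
        r = rng.gauss(r_mean, r_std)  # assumption: one draw per cell
        if r > g:
            selected.append(cell)
    return selected


goodness = {"a": 0.9, "b": 0.2, "c": 0.6, "d": 0.4}
print(select_cells(goodness))
```

Since the threshold is drawn from an unbounded distribution, even a cell with high goodness keeps a small, nonzero chance of being selected.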
38 Experimental Results: ISCAS benchmark circuits.
39 SimE vs. TS vs. GA against time, circuit S13207.
40 Experimental Results, SimE vs. TS vs. GA: SimE results were better than those of TS and GA, with faster execution time.
41 Thank you. Questions?