1 Simulated Evolution Algorithm for Multiobjective VLSI Netlist Bi-Partitioning By Dr Sadiq M. Sait Dr Aiman El-Maleh Raslan Al Abaji King Fahd University.

Presentation transcript:

1 Simulated Evolution Algorithm for Multiobjective VLSI Netlist Bi-Partitioning By Dr Sadiq M. Sait Dr Aiman El-Maleh Raslan Al Abaji King Fahd University Computer Engineering Department MS Thesis Presentation

2 Outline
Introduction
Problem Formulation
Cost Functions
Proposed Approaches
Experimental Results
Conclusion

3 VLSI Technology Trend
Design characteristics: key CAD capabilities
0.06M, 2 MHz, 6 um: SPICE simulation
0.13M, 12 MHz, 1.5 um: CAE systems, silicon compilation
1.2M, 50 MHz, 0.8 um: HDLs, synthesis
3.3M, 200 MHz, 0.6 um: top-down design, emulation
7.5M, 333 MHz, 0.25 um: cycle-based simulation, formal verification
The challenge of sustaining such exponential growth and achieving gigascale integration has shifted, to a large degree, from manufacturing process technology to design technology.

4 The VLSI Chip in 2006
Technology: 0.1 um
Transistors: 200 M
Logic gates: 40 M
Size: 520 mm^2
Clock: ... GHz
Chip I/Os: 4,000
Wiring levels: ...
Voltage: ...
Power: 160 Watts
Supply current: ~160 Amps
Tradeoffs: performance, power consumption, noise immunity, area, cost, time-to-market.

5 VLSI Design Cycle
The VLSI design process is carried out at a number of levels:
1. System Specification
2. Functional Design
3. Logic Design
4. Circuit Design
5. Physical Design
6. Design Verification
7. Fabrication
8. Packaging, Testing and Debugging

6 Physical Design
Physical design converts a circuit description into a geometric description, which is used to manufacture the chip. The physical design cycle consists of:
1. Partitioning
2. Floorplanning and Placement
3. Routing
4. Compaction

7 Why Do We Need Partitioning?
Partitioning decomposes a complex system into smaller subsystems.
Each subsystem can be designed independently, which speeds up the design process (divide-and-conquer approach).
A complex IC is decomposed into a number of functional blocks, each designed by one engineer or a team of engineers.
The decomposition scheme has to minimize the interconnections between subsystems.

8 Levels of Partitioning
System-level partitioning: system to PCBs
Board-level partitioning: PCBs to chips
Chip-level partitioning: chips to subcircuits / blocks

9 Classification of Partitioning Algorithms
Group migration: 1. Kernighan-Lin 2. Fiduccia-Mattheyses (FM) 3. Multilevel K-way partitioning
Simulation-based iterative: 1. Simulated annealing 2. Simulated evolution 3. Tabu search 4. Genetic
Performance driven: 1. Lawler et al. 2. Vaishnav 3. Choi et al. 4. Jun'ichiro et al.
Others: 1. Spectral 2. Multilevel spectral

10 Related Previous Works
1969: A bottom-up approach for delay optimization (clustering) was proposed by Lawler et al.
1998: A circuit partitioning algorithm under a path delay constraint was proposed by Jun'ichiro et al. The algorithm consists of clustering and iterative improvement phases.
1999: Two low-power-oriented techniques based on the simulated annealing (SA) algorithm were proposed by Choi et al.
1999: An enumerative partitioning algorithm targeting low power was proposed by Vaishnav et al. It enumerates alternate partitionings and selects one that has the same delay but less power dissipation (not feasible for huge circuits).

11 Motivation
Need for power optimization:
Portable devices.
Power consumption is a hindrance to further integration.
Increasing clock frequency.
Need for delay optimization:
In current submicron designs, wire delay tends to dominate gate delay.
Larger die sizes imply long on-chip global routes, which affect performance.
Optimizing delay due to off-chip capacitance.

12 Objective
Design a class of iterative algorithms for VLSI multiobjective partitioning.
Explore partitioning from a wider angle and consider circuit delay, power dissipation, and interconnect at the same time, under a balance constraint.

13 Problem Formulation
Objectives: power cost is optimized AND delay cost is optimized AND cutset cost is optimized.
Constraint: balanced partitions to within a certain tolerance (10%).

14 Problem Formulation
The circuit is modeled as a hypergraph $H(V, E)$, where $V = \{v_1, v_2, v_3, \dots, v_n\}$ is a set of modules (cells) and $E = \{e_1, e_2, e_3, \dots, e_k\}$ is a set of hyperedges representing the signal nets. Each net is a subset of $V$ containing the modules that the net connects.
A two-way partitioning of a set of nodes $V$ determines two subsets $V_A$ and $V_B$ such that $V_A \cup V_B = V$ and $V_A \cap V_B = \emptyset$.

15 Cutset
Based on the hypergraph model $H = (V, E)$.
Cost 1: $c(e) = 1$ if $e$ spans more than one block.
Cutset = sum of hyperedge costs.
Efficient gain computation and update.
Figure example: cutset = 3.
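The cutset definition can be sketched in a few lines of Python. This is a minimal illustration, assuming each net is stored as a set of cell names and the partition as a dictionary from cell to block index; the function name `cutset_size` and the six-cell netlist are hypothetical and not taken from the slides.

```python
from typing import Dict, Hashable, Iterable, Set


def cutset_size(nets: Iterable[Set[Hashable]], block_of: Dict[Hashable, int]) -> int:
    """Count the hyperedges (nets) whose cells lie in more than one block."""
    cut = 0
    for net in nets:
        blocks = {block_of[cell] for cell in net}
        if len(blocks) > 1:  # c(e) = 1 when e spans more than one block
            cut += 1
    return cut


# Hypothetical netlist (illustration only): five nets over six cells.
example_nets = [{"a", "b"}, {"b", "c", "d"}, {"d", "e"}, {"e", "f"}, {"a", "f"}]
example_partition = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}

print(cutset_size(example_nets, example_partition))  # nets {b,c,d} and {a,f} are cut
```

Recomputing the cutset from scratch like this costs time proportional to the total net size; the "efficient gain computation and update" mentioned on the slide refers to incremental schemes that avoid this, which the sketch does not attempt.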

16 Delay Model
Path $\pi$: $SE_1 \to C_1 \to C_4 \to C_5 \to SE_2$.
$Delay_\pi = CD_{SE_1} + CD_{C_1} + CD_{C_4} + CD_{C_5} + CD_{SE_2}$
$CD_{C_1} = BD_{C_1} + LF_{C_1} \times (C_{offchip} + CINP_{C_2} + CINP_{C_3} + CINP_{C_4})$
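The delay model above can be illustrated with a small Python sketch. It assumes that BD is a cell's base delay, LF its load factor, and CINP the input capacitance of each driven cell, as the symbols suggest; the function names and the numeric values are hypothetical. Taking the circuit delay as the maximum over all paths is also an assumption, consistent with the later reference to the most critical path.

```python
from typing import Iterable, Sequence


def cell_delay(base_delay: float, load_factor: float,
               input_caps: Iterable[float], offchip_cap: float = 0.0) -> float:
    """CD = BD + LF * (C_offchip + sum of the input capacitances driven)."""
    return base_delay + load_factor * (offchip_cap + sum(input_caps))


def path_delay(cell_delays: Iterable[float]) -> float:
    """Delay of a path is the sum of the delays of the elements on it."""
    return sum(cell_delays)


def circuit_delay(paths: Sequence[Sequence[float]]) -> float:
    """Assumed objective: the delay of the slowest (most critical) path."""
    return max(path_delay(p) for p in paths)


# Hypothetical numbers: C1 drives three inputs and one cut (off-chip) net.
cd_c1 = cell_delay(base_delay=1.0, load_factor=0.5,
                   input_caps=[0.25, 0.25, 0.5], offchip_cap=2.0)
print(cd_c1)
```

Setting `offchip_cap` to zero models a net that is not cut, which shows how moving a cell across the partition boundary changes the delay of every path through its driver.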

17 Delay
$Delay(P_i) = \sum_{c \in P_i} CD_c$
$P_i$: any path between two cells or nodes.
$P$: the set of all paths of the circuit.

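18 Power
The average dynamic power consumed by a CMOS logic gate in a synchronous circuit is given by:
$P_i^{average} = \frac{1}{2} \cdot \frac{V_{dd}^2}{T_{cycle}} \cdot N_i \cdot C_i^{Load}$
$N_i$: the number of gate output transitions per cycle (switching probability).
$C_i^{Load}$: the load capacitance.
$V_{dd}$ and $T_{cycle}$ denote the supply voltage and the clock period.

19 Power
$C_i^{Load} = C_i^{basic} + C_i^{extra}$
$C_i^{basic}$: load capacitances driven by a cell before partitioning.
$C_i^{extra}$: additional load due to off-chip capacitance (cut net).
Total power dissipation of a circuit:
$P_{total} = \frac{V_{dd}^2}{2\,T_{cycle}} \sum_{i \in V} N_i \, (C_i^{basic} + C_i^{extra})$

20 Power
$C_i^{extra}$ can be assumed identical for all nets, so the power cost of a partition reduces to:
$Cost(power) = \sum_{i \in V_S} N_i$
$V_S$: the set of visible gates driving a load outside the partition.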
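As a rough illustration of the power cost, the sketch below sums the switching probabilities of the visible gates, that is, cells driving at least one load in the other block. It assumes the extra off-chip capacitance is the same for every cut net, so only the switching activity of the visible gates matters. The net representation (a driver plus its sinks), the function name, and the numbers are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Net = Tuple[str, Sequence[str]]  # (driver cell, sink cells)


def power_cost(nets: List[Net], block_of: Dict[str, int],
               switching_prob: Dict[str, float]) -> float:
    """Sum switching probabilities of cells that drive a load outside their block."""
    visible = set()
    for driver, sinks in nets:
        if any(block_of[s] != block_of[driver] for s in sinks):
            visible.add(driver)  # driver sees extra off-chip capacitance
    return sum(switching_prob[c] for c in visible)


# Hypothetical circuit: a drives b and c, b drives d, c drives d.
power_nets = [("a", ["b", "c"]), ("b", ["d"]), ("c", ["d"])]
power_partition = {"a": 0, "b": 0, "c": 1, "d": 1}
activity = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.0}

print(power_cost(power_nets, power_partition, activity))  # a and b are visible
```

Under this cost, cutting a net driven by a rarely switching cell is cheaper than cutting one driven by a highly active cell, even though both contribute equally to the cutset.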

21 Balance
Balance can be expressed as a constraint. However, balance as a constraint is not appealing because it may prohibit many good moves.
It is therefore expressed as an objective: |Cells(block1) - Cells(block2)|

22 Unifying Objectives, How?
Weighted sum approach:
1. Problems in choosing weights.
2. Need to tune for every circuit.

23 Fuzzy Logic for Cost Function
Imprecise values of the objectives are best represented by linguistic terms, which are the basis of fuzzy algebra.
Conflicting objectives.
Operators for the aggregating function.

24 Use of Fuzzy Logic for Multiobjective Cost Function
1. The cost-to-membership mapping.
2. A linguistic fuzzy rule for combining the membership values in an aggregating function.
3. Translation of the linguistic rule into appropriate fuzzy operators.

25 Some Fuzzy Operators
And-like operators:
Min operator: $\mu = \min(\mu_1, \mu_2)$
And-like OWA: $\mu = \beta \times \min(\mu_1, \mu_2) + \frac{1}{2}(1 - \beta)(\mu_1 + \mu_2)$
Or-like operators:
Max operator: $\mu = \max(\mu_1, \mu_2)$
Or-like OWA: $\mu = \beta \times \max(\mu_1, \mu_2) + \frac{1}{2}(1 - \beta)(\mu_1 + \mu_2)$
where $\beta$ is a constant in the range [0, 1].
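The two OWA operators can be written compactly in Python. The sketch below generalizes the two-argument form to any number of memberships by replacing the factor ½(μ1 + μ2) with the arithmetic mean, which is identical for two arguments; the function names and the symbol `beta` for the constant are choices made here for illustration.

```python
from typing import Sequence


def _check(memberships: Sequence[float], beta: float) -> None:
    if not memberships:
        raise ValueError("at least one membership value is required")
    if not 0.0 <= beta <= 1.0:
        raise ValueError("beta must lie in [0, 1]")


def and_like_owa(memberships: Sequence[float], beta: float) -> float:
    """beta * min(mu) + (1 - beta) * mean(mu)."""
    _check(memberships, beta)
    mean = sum(memberships) / len(memberships)
    return beta * min(memberships) + (1.0 - beta) * mean


def or_like_owa(memberships: Sequence[float], beta: float) -> float:
    """beta * max(mu) + (1 - beta) * mean(mu)."""
    _check(memberships, beta)
    mean = sum(memberships) / len(memberships)
    return beta * max(memberships) + (1.0 - beta) * mean


print(and_like_owa([0.25, 0.75], beta=0.5))  # 0.375
print(or_like_owa([0.25, 0.75], beta=0.5))   # 0.625
```

With `beta = 1` the operators reduce to the pure min and max operators, and with `beta = 0` both reduce to the plain average, so the constant controls how strictly the worst (or best) objective dominates.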

26 Membership Functions
$O_i$ and $C_i$ are the lower bound and the actual cost of objective $i$.
$\mu_i(x)$ is the membership of solution $x$ in the set "good $i$".
$g_i$ is the relative acceptance limit for each objective.
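The transcript gives only the symbol definitions, not the shape of the membership function. One plausible shape, assumed here purely for illustration, is full membership when the cost reaches its lower bound, zero membership once the cost exceeds $g_i$ times the lower bound, and a linear decrease in between. The function name and the numbers are hypothetical.

```python
def membership(cost: float, lower_bound: float, g: float) -> float:
    """Assumed piecewise-linear membership in the fuzzy set 'good objective i'.

    cost        -- actual cost C_i of the solution for objective i
    lower_bound -- lower bound O_i on that cost (must be positive)
    g           -- relative acceptance limit g_i (must be greater than 1)
    """
    if lower_bound <= 0 or g <= 1:
        raise ValueError("requires lower_bound > 0 and g > 1")
    ratio = cost / lower_bound
    if ratio <= 1.0:
        return 1.0  # cost at or below the lower bound: fully 'good'
    if ratio >= g:
        return 0.0  # cost beyond the acceptance limit: not 'good' at all
    return (g - ratio) / (g - 1.0)  # linear decrease in between


print(membership(cost=150.0, lower_bound=100.0, g=2.0))  # 0.5
```

A larger acceptance limit makes the objective more tolerant: the same cost maps to a higher membership, so that objective weighs less when memberships are aggregated.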

27 Fuzzy Linguistic Rule
A good partitioning can be described by the following fuzzy rule:
IF a solution has small cutset AND low power AND short delay AND good balance, THEN it is a good solution.

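28 Fuzzy Cost Function
The above rule is translated to an AND-like OWA:
$\mu(x) = \beta \times \min(\mu_c, \mu_p, \mu_d, \mu_b) + (1 - \beta) \times \frac{1}{4} \sum_{j \in \{c, p, d, b\}} \mu_j$
$\mu(x)$ represents the total fuzzy fitness of the solution; the aim is to maximize this fitness.
$\mu_c$, $\mu_p$, $\mu_d$, and $\mu_b$ are, respectively, the cutset, power, delay, and balance fitness, and $\beta$ is a constant in the range [0, 1].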

29 Simulated Evolution Algorithm
Simulated evolution
Begin
  Start with an initial feasible partition S
  Repeat
    Evaluation: evaluate the goodness G_i of all modules
    Selection: For each cell V_i DO
      begin
        if Random R_m > G_i then select the cell
      End For
    Allocation: For each selected cell V_i DO
      begin
        Move the cell to the destination block
      End For
  Until stopping criteria are satisfied
  Return best solution
End
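The evaluation, selection, and allocation loop can be sketched in Python as follows. This is a toy version under several assumptions that are not in the slides: goodness is the fraction of a cell's nets that are not cut, the only cost is the cutset, allocation simply flips a selected cell to the other block when a hypothetical `max_imbalance` limit allows it, and the stopping criterion is a fixed iteration count.

```python
import random
from typing import Dict, List, Set, Tuple


def cut_count(nets: List[Set[str]], block_of: Dict[str, int]) -> int:
    return sum(1 for net in nets if len({block_of[c] for c in net}) > 1)


def goodness(cell: str, nets: List[Set[str]], block_of: Dict[str, int]) -> float:
    """Assumed goodness: fraction of the cell's nets that are not cut."""
    mine = [n for n in nets if cell in n]
    if not mine:
        return 1.0
    uncut = sum(1 for n in mine if len({block_of[c] for c in n}) == 1)
    return uncut / len(mine)


def imbalance(block_of: Dict[str, int]) -> int:
    ones = sum(block_of.values())
    return abs(len(block_of) - 2 * ones)  # |cells(block0) - cells(block1)|


def simulated_evolution(cells: List[str], nets: List[Set[str]],
                        initial: Dict[str, int], iterations: int = 50,
                        max_imbalance: int = 2, seed: int = 0
                        ) -> Tuple[Dict[str, int], int]:
    rng = random.Random(seed)
    current = dict(initial)
    best, best_cost = dict(current), cut_count(nets, current)
    for _ in range(iterations):
        # Evaluation: goodness of every cell in the current partition.
        g = {c: goodness(c, nets, current) for c in cells}
        # Selection: cells with low goodness are more likely to be picked.
        selected = [c for c in cells if rng.random() > g[c]]
        # Allocation (naive): flip the cell if the balance limit permits.
        for c in selected:
            current[c] = 1 - current[c]
            if imbalance(current) > max_imbalance:
                current[c] = 1 - current[c]  # undo the move
        cost = cut_count(nets, current)
        if cost < best_cost:
            best, best_cost = dict(current), cost
    return best, best_cost


sim_cells = ["a", "b", "c", "d", "e", "f"]
sim_nets = [{"a", "b"}, {"b", "c"}, {"a", "c"}, {"d", "e"}, {"e", "f"}, {"c", "d"}]
sim_initial = {"a": 0, "b": 1, "c": 0, "d": 1, "e": 0, "f": 1}
sim_best, sim_cost = simulated_evolution(sim_cells, sim_nets, sim_initial)
print(sim_cost, sim_best)
```

Because the best partition seen so far is kept separately, the loop may pass through worse partitions without losing progress, which is the property that lets simulated evolution escape local minima.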

30 Simulated Evolution Implementation
Cut goodness
Power goodness
Delay goodness
The overall goodness is a fuzzy goodness.

31 Cut Goodness
$d_i$: the set of all nets connected to cell $i$ and not cut.
$w_i$: the set of all nets connected to cell $i$ and cut.

32 Power Goodness
$V_i$: the set of all nets connected to cell $i$.
$U_i$: the set of all nets connected to cell $i$ and cut.

33 Delay Goodness
$K_i$: the set of cells in all paths passing through cell $i$.
$L_i$: the set of cells in all paths passing through cell $i$ that are not in the same block as $i$.

34 Final Selection Fuzzy Rule
IF cell $i$ is near its optimal cutset goodness as compared to other cells
AND near its optimal power goodness as compared to other cells
AND (near its optimal net delay goodness as compared to other cells OR $T_{max}(i)$ is much smaller than $T_{max}$)
THEN it has a high goodness.

35 Fuzzy Goodness
$T_{max}$: delay of the most critical path in the current iteration.
$T_{max}(i)$: delay of the longest path traversing cell $i$.
$X_{path} = T_{max} / T_{max}(i)$
The fuzzy goodness combines, respectively, the cutset, power, and delay goodness.

36 Selection Implementation
Biasless selection scheme:
The goodness distribution among the cells is Gaussian, with mean $G_m$ and standard deviation $G_\sigma$.
A random Gaussian number $R_m$ is generated with standard deviation $R_\sigma$.
This avoids having cells with zero selection probability.

37 Selection Implementation
$R_m = G_m - G_\sigma$
$R_\sigma = G_\sigma$
Selection rule: if $R_m$ > Goodness($i$), then select the cell.
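A minimal Python sketch of this selection scheme follows. It reads the slide as drawing a Gaussian random number with mean $G_m - G_\sigma$ and standard deviation $G_\sigma$ and comparing it with each cell's goodness. Drawing a fresh number per cell and using the population standard deviation are assumptions, and the function name and goodness values are hypothetical.

```python
import random
import statistics
from typing import Dict, List


def select_cells(goodness: Dict[str, float], rng: random.Random) -> List[str]:
    """Select cells whose goodness falls below a Gaussian random threshold."""
    values = list(goodness.values())
    g_mean = statistics.mean(values)   # G_m
    g_std = statistics.pstdev(values)  # G_sigma
    r_mean = g_mean - g_std            # R_m = G_m - G_sigma
    r_std = g_std                      # R_sigma = G_sigma
    selected = []
    for cell, g in goodness.items():
        r = rng.gauss(r_mean, r_std)   # assumption: one draw per cell
        if r > g:
            selected.append(cell)
    return selected


cell_goodness = {"a": 0.9, "b": 0.2, "c": 0.5, "d": 0.7, "e": 0.1}
picked = select_cells(cell_goodness, random.Random(1))
print(picked)
```

Because the threshold adapts to the current goodness distribution, no fixed bias parameter needs tuning, and since a Gaussian draw is unbounded even a cell with high goodness keeps a small chance of being selected.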

38 Experimental Results ISCAS Benchmark Circuits

39 SimE vs. TS vs. GA against time, circuit S13207

40 Experimental Results: SimE vs. TS vs. GA
SimE results were better than those of TS and GA, with faster execution time.

41 Thank you. Questions?