ICCAD Nov-2000 Timing Driven Gate Duplication: Complexity Issues and Algorithms Ankur Srivastava, Ryan Kastner and Majid Sarrafzadeh Embedded & Reconfigurable.

ICCAD Nov-2000
Timing Driven Gate Duplication: Complexity Issues and Algorithms
Ankur Srivastava, Ryan Kastner and Majid Sarrafzadeh
Embedded & Reconfigurable System Design ER-Group, UCLA

Motivation
Designers face increasingly stringent timing constraints, so new methodologies for improving delay are needed. Gate duplication has been studied primarily for cut-set minimization. Its applicability to delay improvement has not yet been studied by the research community.

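Load Dependent Delay Model (LDDM)
[Figure: gates i and j, each annotated with its own pair of delay parameters.]
The delay of gate i is $d(i) = \alpha_i + \beta_i \cdot C_{OUT}$, where $C_{OUT}$ is the load driven by gate i. Wire delays are assumed to be zero.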
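A minimal sketch of the delay model, together with the input-pin rule from the next slide (required time at an input pin = required time at the output minus the gate delay). The `Gate` class and its field names are hypothetical, and the sample values alpha = 1, beta = 1 are assumptions chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Gate:
    """A gate under the load dependent delay model (hypothetical field names)."""
    alpha: float  # load-independent part of the delay
    beta: float   # delay added per unit of output load


def gate_delay(g: Gate, c_out: float) -> float:
    """LDDM delay alpha + beta * C_OUT; wire delay is taken as zero."""
    return g.alpha + g.beta * c_out


def input_required_time(g: Gate, r_out: float, c_out: float) -> float:
    """Required time at the input pins = required time at the output - gate delay."""
    return r_out - gate_delay(g, c_out)


if __name__ == "__main__":
    d = Gate(alpha=1, beta=1)                 # assumed parameter values
    print(gate_delay(d, 15))                  # delay with a load of 15
    print(input_required_time(d, 2, 15))      # output required time 2, load 15
```

Because the delay grows linearly with load, any reduction in the load a gate drives translates directly into a larger required time at its inputs, which is the effect duplication exploits.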

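Gate Duplication for Delay Improvement
[Figure: example circuit with gates A, B, C, D and E before duplication, annotated with required times r = 2 and r = -14, further gate parameter values 5, 1, 1 and 0.1, and loads C_D = 15 and C_E = 0.1.]
Input pin required time = required time at the output - gate delay.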

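Gate Duplication for Delay Improvement (after duplicating D)
[Figure: the same circuit after gate D is duplicated into D and D'. The load is split as C_D = 10 and C_D' = 5, the load on E grows to C_E = 0.2, and the annotated required time becomes r = -9 (versus r = -14 before duplication).]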
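The annotated numbers are consistent with a simple calculation. Assuming (an inference, not stated on the slides) that gate D has $\alpha_D = 1$, $\beta_D = 1$ and a required time of 2 at its output, the LDDM and the input-pin rule give:

$$
\begin{aligned}
\text{before duplication:}\quad & d(D) = \alpha_D + \beta_D C_D = 1 + 1 \cdot 15 = 16, && r_{in}(D) = 2 - 16 = -14\\
\text{after duplication:}\quad & d(D) = 1 + 1 \cdot 10 = 11, && r_{in}(D) = 2 - 11 = -9
\end{aligned}
$$

Under this reading, moving 5 units of load onto D' shortens D's delay by 5 and raises its input required time by the same amount, at the price of extra load on E (0.1 grows to 0.2).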

Complexity Issues
Theorem: Global gate duplication is NP-complete under the LDDM. Proof idea: MONO3SAT is transformed into an instance of the global problem.
Theorem: Local gate duplication is NP-complete. Proof idea: the PARTITION problem is transformed into an instance of the local problem.

Complexity Issues (Comparison with Buffer Insertion)
The local buffer insertion problem is polynomially solvable if the net topology is fixed.
The global buffer insertion problem is polynomially solvable if the delay model has the same pin-to-pin parameters.
Even in the situations where buffer insertion is polynomially solvable, gate duplication is NP-complete.

Algorithm for Gate Duplication
The algorithm follows the structure of dynamic programming. It considers duplication for every gate in the circuit and hence works in a pro-active mode.
Assumption: the circuit contains only single-output combinational gates.

Algorithm for Gate Duplication
Stage 1: Traverse the network from the POs to the PIs in reverse topological order, evaluating tuples at every gate.
Stage 2: Traverse the network from the PIs to the POs in topological order, deciding which gates to duplicate.
Stage 3: Traverse the network from the POs to the PIs, physically duplicating the gates.
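The PO-to-PI visiting order of Stage 1 can be sketched with Python's `graphlib`: registering each gate's fanouts as its predecessors makes `static_order()` emit gates from the POs back toward the PIs. This sketch only propagates plain LDDM required times in that order; the duplication tuples the algorithm actually evaluates are omitted, and the netlist format (dicts keyed by gate name, one input capacitance per gate) is a hypothetical choice.

```python
from graphlib import TopologicalSorter


def required_times(fanouts, alpha, beta, pin_cap, po_required):
    """Backward (PO -> PI) required-time propagation under the LDDM.

    fanouts:     dict gate -> list of gates it drives (hypothetical format)
    alpha, beta: dict gate -> LDDM parameters; keys list every gate
    pin_cap:     dict gate -> input pin capacitance (assumed equal for all pins)
    po_required: dict primary-output gate -> required time at its output
    Returns a dict gate -> required time at the gate's input pins.
    """
    # Fanouts act as "predecessors", so every gate is visited after all of
    # the gates it drives, i.e. in reverse topological order.
    graph = {g: set(fanouts.get(g, ())) for g in alpha}
    r_in = {}
    for g in TopologicalSorter(graph).static_order():
        outs = fanouts.get(g, [])
        candidates = [r_in[f] for f in outs]
        if g in po_required:
            candidates.append(po_required[g])
        r_out = min(candidates)  # assumes each gate drives a gate or is a PO
        c_out = sum(pin_cap[f] for f in outs)
        r_in[g] = r_out - (alpha[g] + beta[g] * c_out)
    return r_in


if __name__ == "__main__":
    ones = {"a": 1, "b": 1}
    print(required_times({"a": ["b"]}, ones, ones, ones, {"b": 10}))
```

Stage 2 would walk the same graph in the opposite direction (ordinary topological order), which `TopologicalSorter` also provides if fanins are registered as predecessors instead.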

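Stage 1
For each gate g, find the best duplication strategy for its fanouts, such that the required time at g's input pin is maximized.
[Figure: gate g (and its duplicate g') driving fanout i and, when i is duplicated, its copy i'; annotated with the tuple entries tup(i,g).nodup, tup(i,g).dup.r_small and tup(i,g).dup.r_large.]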

Stage 1 (continued)
When g itself is duplicated, find both the best duplication strategy for the fanouts and the best partitioning of the fanouts between g and g', such that the required time at the input pin is maximized.
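To see why choosing the partition between g and g' is the hard part, the sketch below simply tries every assignment of fanouts to g or g'. It is an exhaustive illustration, not the paper's algorithm, and it simplifies the problem: fanouts are not themselves duplicated, and the extra load the copy places on g's fanins is ignored. Each fanout is a hypothetical `(required_time, pin_cap)` pair. Because g and g' share their fanins, the value seen at the fanins is the smaller of the two copies' input required times.

```python
from itertools import product


def best_split(alpha, beta, fanouts):
    """Exhaustively split the fanouts of g between g (side 0) and g' (side 1).

    fanouts: non-empty list of (required_time, pin_cap) for the pins g drives.
    Returns (best input required time over both copies, assignment tuple).
    """
    best = (float("-inf"), None)
    for assign in product((0, 1), repeat=len(fanouts)):  # 2^n candidates
        r_in = []
        for side in (0, 1):
            group = [fo for fo, s in zip(fanouts, assign) if s == side]
            if not group:
                continue  # this copy drives nothing, so it is simply unused
            r_out = min(r for r, _ in group)
            load = sum(c for _, c in group)
            r_in.append(r_out - (alpha + beta * load))
        value = min(r_in)  # g and g' share fanins: the worse copy decides
        if value > best[0]:
            best = (value, assign)
    return best


if __name__ == "__main__":
    print(best_split(1, 1, [(2, 10), (10, 5)]))
```

In the usage line, the required time 10 for the second fanout is a made-up value; the loads and the first required time mirror the D/D' example. When all fanouts share one required time, the best split is the one that balances the loads, which gives some intuition for the link to PARTITION.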

Stage 1: NODUP
When g is not duplicated, sort the fanouts and duplicate them in that order, which gives n + 1 candidate duplication strategies for n fanouts.
Result: this algorithm is optimal.
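A minimal sketch of the NODUP evaluation, assuming the sort key is each fanout's required time when it is not duplicated, and assuming each fanout is described by a hypothetical triple `(r_nodup, r_dup_small, extra_cap)`. Here `r_dup_small` is the smaller required time of i and i' (g drives both copies, so the smaller one constrains g), and `extra_cap` is the additional load g sees when i is duplicated. Under these assumptions, the prefix rule is optimal: g's output required time is a minimum over its fanout pins and duplication only adds load, so any chosen set can be replaced by the sorted prefix ending just before its first undecided fanout without loss.

```python
def best_nodup(alpha, beta, base_load, fanouts):
    """Evaluate the n + 1 prefix strategies for a gate g that is not duplicated.

    fanouts: non-empty list of (r_nodup, r_dup_small, extra_cap), one per fanout.
    base_load: load on g when no fanout is duplicated.
    Returns (best required time at g's input, number k of duplicated fanouts).
    """
    # Most critical fanout (smallest non-duplicated required time) first.
    order = sorted(fanouts, key=lambda f: f[0])
    best = (float("-inf"), 0)
    for k in range(len(order) + 1):  # duplicate the first k fanouts
        dup, keep = order[:k], order[k:]
        r_out = min([f[1] for f in dup] + [f[0] for f in keep])
        load = base_load + sum(f[2] for f in dup)
        r_in = r_out - (alpha + beta * load)
        if r_in > best[0]:
            best = (r_in, k)
    return best


if __name__ == "__main__":
    print(best_nodup(1, 1, 2, [(10, 10, 1), (0, 5, 1)]))
```

Only n + 1 candidates are evaluated instead of 2^n subsets, which is what keeps this case tractable.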

Stage 1: DUP
[Figure: gate g duplicated into g and g'.]

Stage 2
Forward traversal in topologically sorted order.

Stage 3
Traverse the circuit backwards from the POs to the PIs, physically duplicating the gates.
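The physical step of Stage 3 can be sketched as a netlist edit: create a copy g' with the same fanins as g, then move a chosen subset of g's fanouts onto the copy. The netlist format (a dict of gates with `fanins`/`fanouts` lists) and the naming of the copy are hypothetical; how the subset is chosen is left to the earlier stages.

```python
def duplicate_gate(netlist, g, moved_fanouts, new_name=None):
    """Copy gate g and move moved_fanouts from g to the copy.

    netlist: dict gate -> {"fanins": [...], "fanouts": [...]} (hypothetical format)
    Returns the name of the new gate.
    """
    new_name = new_name or g + "'"
    moved = set(moved_fanouts)
    if not moved <= set(netlist[g]["fanouts"]):
        raise ValueError("can only move existing fanouts of g")
    netlist[new_name] = {
        "fanins": list(netlist[g]["fanins"]),  # same logic function, same inputs
        "fanouts": [f for f in netlist[g]["fanouts"] if f in moved],
    }
    netlist[g]["fanouts"] = [f for f in netlist[g]["fanouts"] if f not in moved]
    # Every fanin now drives the copy too, so its load grows
    # (as C_E grows from 0.1 to 0.2 in the duplication example).
    for fi in netlist[new_name]["fanins"]:
        netlist[fi]["fanouts"].append(new_name)
    # Moved fanouts now read from the copy instead of g.
    for fo in moved:
        pins = netlist[fo]["fanins"]
        pins[pins.index(g)] = new_name
    return new_name


if __name__ == "__main__":
    net = {
        "E": {"fanins": [], "fanouts": ["D"]},
        "D": {"fanins": ["E"], "fanouts": ["x", "y"]},
        "x": {"fanins": ["D"], "fanouts": []},
        "y": {"fanins": ["D"], "fanouts": []},
    }
    print(duplicate_gate(net, "D", ["y"]), net["E"]["fanouts"])
```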

Experimental Results
Each circuit was first optimized using script.rugged in SIS, followed by speed_up. Results were obtained in two categories: minimum-delay technology mapping (map -n 1), and minimum-delay technology mapping with fanout optimization (map -n 1 -AFG).

Experimental Results (map -n 1)

Experimental Results (map -n 1 -AFG)

Conclusion
We presented an algorithm for gate duplication and showed its effectiveness in reducing circuit delay, both with and without buffer insertion. We proved the local problem NP-complete. Future work includes extending the algorithm to a layout-driven framework.
