Timing Model Reduction for Hierarchical Timing Analysis Shuo Zhou Synopsys November 7, 2006.

Presentation transcript:

Timing Model Reduction for Hierarchical Timing Analysis Shuo Zhou Synopsys November 7, 2006

2 Outline: Static Timing Analysis in Design Flow; Hierarchical Timing Analysis; Proposed Techniques (an iterative timing model reduction algorithm based on a biclique-star replacement technique); Experimental Results; Conclusions.

3 Static Timing Analysis in Design Flow. A static timer is integrated into each stage of the design flow, so the static timer needs to be efficient. [Figure: design flow (floorplanning, synthesis, placement and routing) with static timing analysis attached to each stage.]

4 Hierarchical Timing Analysis. Hierarchical timing analysis is essential for hierarchical design. The design is partitioned into blocks, and each block is characterized into a timing model; the circuits inside the blocks are considered fixed. Complexity is O(n), where n is the number of edges in the timing models. [Figure: gate-level blocks replaced by their timing models.]

5 Problem Statement Timing model minimization for hierarchical timing analysis: –Given a hierarchical block, construct an abstract timing model with minimal number of edges that covers the longest and shortest path delays of each pair of input and output in the block.

6 Previous Works. Transform the timing graph [Visweswariah ICCAD’99, Moon DAC’02]: –Perform serial/parallel edge merging. Represent the delay matrix with a minimal number of edges: –Optimal realization of a distance matrix [Hakimi Quart. Appl. Math. 22 (1964), Chung]. –Biclique-star replacement for bicliques with unit edge delay [Feder Symp. on Theoretical Aspects of Computer Science (2003)].

7 Terminology: Bipartite Timing Model G = (B, D, E) –Input set B, output set D, and edge set E. –Edges carry the longest and shortest delays.

8 Bipartite timing model. [Figure: a timing graph and the bipartite timing model derived from it; example timing graph path: 1->4->5->7->8->10.]

9 Delay matrix. The element on row i, column j is the delay d_{i,j} from input i to output j; a disconnected input i and output j get a null entry. Row i gives the input delay vector I_i = {d_{i,j} | d_{i,j} is a delay from input i}. [Figure: a bipartite timing model with inputs I1, I2, I3 and outputs O4, O5, O6, next to its delay matrix (rows are inputs, columns are outputs).]

10 Star G_s = (B_s, D_s, s, E_s) –B_s is the input set, D_s the output set, and s the center vertex. –Edges are (i,s) and (s,j). [Figure: a star with center vertex s.]

11 Biclique-Star Replacement Basic idea: match various input delay vectors to a pattern and cover each input delay vector by one edge plus the pattern.

12 [Figure: a biclique over inputs I1, I2, I3 and outputs O4, O5, O6 is replaced by a star with center vertex s and #edge = 6, where d_{i,j} = d_{i,s} + d_{s,j}; each input vector is written as one edge delay plus the shared pattern.]

13 Bipartite Timing Model Reduction. Flow: biclique search -> reduction ratio evaluation, ratio = #edges_covered/(r+c) -> biclique-star replacement if the reduction ratio > 1 -> re-evaluation -> repeat.
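The ratio test can be illustrated with a small sketch. It assumes that r and c are the numbers of inputs and outputs of the candidate biclique (the slide does not spell this out), so that the replacing star has r + c edges. The function name `reduction_ratio` is hypothetical.

```python
def reduction_ratio(edges_covered: int, r: int, c: int) -> float:
    """ratio = #edges_covered / (r + c).

    Assumption: r and c are the input and output counts of the biclique,
    so a star over the same vertices needs r + c edges.
    """
    return edges_covered / (r + c)


# A full 3 x 3 biclique covers 9 edges; the star needs 6.
ratio_3x3 = reduction_ratio(9, 3, 3)
# A full 2 x 2 biclique covers 4 edges; the star also needs 4, so nothing is gained.
ratio_2x2 = reduction_ratio(4, 2, 2)

print(ratio_3x3 > 1)  # replacement is worthwhile
print(ratio_2x2 > 1)  # replacement is not worthwhile
```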

14 Delay Vector Subtraction. Input delay vector subtraction Sub(I_a, I_b): –Distance vector V(I_a, I_b) = {δ_j^{Ia,Ib} = d_{a,j} - d_{b,j} | j ∈ [1..c]}. Input vectors I_a and I_b share a pattern if all δ_j^{Ia,Ib} are equal. [Figure: V(I_2, I_1) = Sub(I_2, I_1) = (1, 1, 1) over outputs O4, O5, O6.]
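A minimal sketch of the subtraction and the shared-pattern test follows. The function names `sub` and `share_pattern` and the delay values are made up for illustration; only the resulting distance vector (1, 1, 1) comes from the slide. Integer delays are assumed so that exact equality is meaningful.

```python
from typing import List


def sub(ia: List[int], ib: List[int]) -> List[int]:
    """Distance vector V(I_a, I_b): element-wise d_{a,j} - d_{b,j}."""
    if len(ia) != len(ib):
        raise ValueError("delay vectors must cover the same outputs")
    return [da - db for da, db in zip(ia, ib)]


def share_pattern(ia: List[int], ib: List[int]) -> bool:
    """Two input vectors share a pattern if all distances are equal."""
    return len(set(sub(ia, ib))) == 1


# Hypothetical delay vectors over outputs O4, O5, O6.
i1 = [2, 2, 3]
i2 = [3, 3, 4]
i3 = [4, 4, 6]

print(sub(i2, i1), share_pattern(i2, i1))
print(sub(i3, i1), share_pattern(i3, i1))
```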

15 Biclique Expansion for Replacements. Choose an input delay vector as the pattern vector. Expand the biclique of the pattern vector by covering as many input vectors as possible. Replace the biclique by a star.
Biclique Expansion (G, I_a, G_c)
I. Add edges (a,j) to biclique G_c;
II. For each input vector I_i
1. Vector subtraction Sub(I_i, I_a);
2. If all δ_j^{Ii,Ia} = δ_0^{Ii,Ia}, add edges (i,j) to G_c.
Biclique-star Replacement (G_c, I_a, G_s)
I. Add inputs, outputs, center vertex s, and edges (i,s), (s,j) to G_s;
II. d_{a,s} = 0, d_{s,j} = I_{a,j};
III. For each edge (i,s) in G_s
1. d_{i,s} = δ_0^{Ii,Ia}.
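The expansion and replacement steps can be sketched as follows. This is an illustration under assumptions, not the authors' implementation: the delay matrix is a dictionary of fully connected integer rows, the names `expand_biclique` and `replace_with_star` are hypothetical, and the delay values are made up. Negative offsets are not treated specially.

```python
from typing import Dict, List, Tuple


def expand_biclique(delays: Dict[str, List[int]], pattern: str) -> Dict[str, int]:
    """Return {input: offset} for every input whose vector is pattern + constant."""
    base = delays[pattern]
    members: Dict[str, int] = {}
    for name, vec in delays.items():
        diff = [d - b for d, b in zip(vec, base)]
        if len(set(diff)) == 1:        # all distances equal: shares the pattern
            members[name] = diff[0]    # the pattern input itself gets offset 0
    return members


def replace_with_star(
    delays: Dict[str, List[int]], pattern: str, members: Dict[str, int]
) -> Tuple[Dict[str, int], List[int]]:
    """Star edges: d_{i,s} = offset of input i, d_{s,j} = pattern delay to output j."""
    in_edges = dict(members)
    out_edges = list(delays[pattern])
    return in_edges, out_edges


# Hypothetical 3 x 3 delay matrix (9 edges), pattern vector I1.
delays = {"I1": [2, 2, 3], "I2": [3, 3, 4], "I3": [4, 4, 6]}
members = expand_biclique(delays, "I1")
in_edges, out_edges = replace_with_star(delays, "I1", members)

# Inputs outside the biclique keep their original direct edges.
uncovered = sum(len(vec) for name, vec in delays.items() if name not in members)
total_edges = len(in_edges) + len(out_edges) + uncovered
print(members, total_edges)
```

In this made-up instance I3 does not share the pattern, so it stays outside the star and the edge count drops from 9 to 8.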

16 [Figure: worked example with pattern vector I_1 over outputs O4, O5, O6. Step 1: V(I_2, I_1) = Sub(I_2, I_1) = (1, 1, 1), so d_{2,s} = δ_0^{I2,I1}. Step 2: V(I_3, I_1) = Sub(I_3, I_1) = (2, 2, 3). After replacing the biclique by a star with center vertex s, #edge = 8.]

17 Don’t Care Edges. Edge (i,j) is a don’t care edge in a biclique-star replacement if the path delay d_{i,s} + d_{s,j} < d_{i,j}. [Figure: a biclique replaced by a star with center vertex s and one don’t care edge.]

18 Biclique Expansion with Don’t Cares. Choice: try each δ in the distance vector as d_{i,s}. For d_{3,s} = δ: –d_{i,j} is covered if d_{i,s} + d_{s,j} = d_{i,j}, i.e., δ_j = δ. –d_{i,j} is a don’t care edge if δ_j > δ. –Output j has to be removed if δ_j < δ.
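The three cases can be illustrated with a short sketch. The helper name `classify_outputs` and the example distance vector are assumptions for illustration; outputs are referred to by their position in the vector.

```python
from typing import List, Tuple


def classify_outputs(distance: List[int], delta: int) -> Tuple[List[int], List[int], List[int]]:
    """Split output indices by comparing each distance δ_j with the chosen d_{i,s} = delta."""
    covered = [j for j, dj in enumerate(distance) if dj == delta]    # δ_j = δ
    dont_care = [j for j, dj in enumerate(distance) if dj > delta]   # δ_j > δ
    removed = [j for j, dj in enumerate(distance) if dj < delta]     # δ_j < δ
    return covered, dont_care, removed


# Hypothetical distance vector over three outputs.
distance = [2, 2, 3]
low_choice = classify_outputs(distance, 2)    # two covered, one don't care
high_choice = classify_outputs(distance, 3)   # one covered, two outputs removed
print(low_choice)
print(high_choice)
```

Choosing the smaller δ here covers two edges at the cost of one don't care edge, while the larger δ covers one edge and forces two outputs out of the biclique.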

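19 [Figure: the distance vector V(I_3, I_1) over outputs O4, O5, O6 evaluated for different choices of d_{3,s} on the star with center vertex s; one choice increases the number of edges covered, another decreases it.]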

20 Biclique Expansion and Replacement with Don’t Cares.
Biclique Expansion with Don’t Cares (G, I_a, G_c)
I. Add edges (a,j) to G_c;
II. For each input vector I_i
1. Vector subtraction Sub(I_i, I_a);
2. For each δ_j in the distance vector
   For each δ_k in the distance vector
     if δ_k = δ_j, #covered++;
     else if δ_k < δ_j, #removed += edges to output k;
3. If maximum (#covered - #removed of δ_j) > 1
   For each δ_k in the distance vector
     if δ_k ≥ δ_j, add edge (i,k) to G_c;
     else remove output k and edges to k.
Replacement with Don’t Cares (G_c, I_a, G_s)
I. Add inputs, outputs, center vertex s, and edges to G_s;
II. d_{a,s} = 0, d_{s,j} = I_{a,j};
III. For each edge (i,s) in G_s
1. d_{i,s} = min(δ^{Ii,Ia}).
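One possible reading of this procedure is sketched below. Several points are assumptions rather than statements from the slides: "edges to output k" is taken to be the number of inputs already in the biclique, the candidate δ with the largest (#covered - #removed) is chosen, and the function names and delay values are hypothetical.

```python
from typing import Dict, List, Tuple


def expand_with_dont_cares(
    delays: Dict[str, List[int]], pattern: str
) -> Tuple[Dict[str, int], List[int]]:
    """Return ({input: d_{i,s}}, outputs kept in the biclique)."""
    base = delays[pattern]
    outputs = list(range(len(base)))
    offsets = {pattern: 0}                      # d_{a,s} = 0 for the pattern input
    for name, vec in delays.items():
        if name == pattern:
            continue
        dist = {k: vec[k] - base[k] for k in outputs}
        best_gain, best_delta = None, None
        for delta in sorted(set(dist.values())):
            covered = sum(1 for d in dist.values() if d == delta)
            # Assumption: removing an output loses one edge per input already in the biclique.
            removed = len(offsets) * sum(1 for d in dist.values() if d < delta)
            gain = covered - removed
            if best_gain is None or gain > best_gain:
                best_gain, best_delta = gain, delta
        if best_gain is not None and best_gain > 1:
            outputs = [k for k in outputs if dist[k] >= best_delta]
            offsets[name] = best_delta          # minimum distance over the kept outputs
    return offsets, outputs


def dont_care_edges(
    delays: Dict[str, List[int]], pattern: str, offsets: Dict[str, int], outputs: List[int]
) -> List[Tuple[str, int]]:
    """Edges whose star path is shorter than the original delay; they stay as direct edges."""
    base = delays[pattern]
    return [
        (name, k)
        for name, off in offsets.items()
        for k in outputs
        if delays[name][k] - base[k] > off
    ]


# Hypothetical 3 x 3 delay matrix (9 edges), pattern vector I1.
delays = {"I1": [2, 2, 3], "I2": [3, 3, 4], "I3": [4, 4, 6]}
offsets, outputs = expand_with_dont_cares(delays, "I1")
kept_direct = dont_care_edges(delays, "I1", offsets, outputs)
total_edges = len(offsets) + len(outputs) + len(kept_direct)
print(offsets, outputs, kept_direct, total_edges)
```

The sketch does not track edges from earlier inputs to an output that is removed later; a fuller version would return those to the model as direct edges.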

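21 [Figure: the same worked example with don’t cares. Step 1: V(I_2, I_1) = Sub(I_2, I_1) = (1, 1, 1) over outputs O4, O5, O6. Step 2: V(I_3, I_1) = Sub(I_3, I_1), with d_{3,s} set to the minimum δ. Replacing the biclique (#edge = 9) by a star with center vertex s plus one don’t care edge gives #edge = 7.]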

22 Bipartite Timing Model Reduction. Flow: biclique search -> reduction ratio evaluation, ratio = #edges_covered/(r+c) -> biclique-star replacement if the reduction ratio > 1 -> re-evaluation; the final step is the star graph to bipartite graph transformation.

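23 Star Graph to Bipartite Graph Transformation. [Figure: a star timing model with center vertices s1 and s2 is turned into a bipartite graph by splitting s1 and s2 (into s1, s1' and s2, s2'); the stars are recovered from the bipartite graph afterwards.]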

24 Correctness. G: the bipartite timing model before the reduction. G': the timing model after the reduction. The edge delay d_{i,j} of any connected input i and output j in G is covered by the longest path delay d_{i,j}' from input i to output j in G'.
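A small check of this property might look as follows. It is a sketch under assumptions: G' consists of a single star plus optional direct edges, only longest delays are considered, "covered" is read as equality within an error bound (zero by default), and all names and delay values are hypothetical.

```python
from typing import Dict, List, Optional, Tuple


def longest_delay_after(
    i: str, j: int,
    in_edges: Dict[str, int], out_edges: Dict[int, int],
    direct: Dict[Tuple[str, int], int],
) -> Optional[int]:
    """Longest i -> j path delay in G': via the star center s or via a direct edge."""
    candidates = []
    if i in in_edges and j in out_edges:
        candidates.append(in_edges[i] + out_edges[j])   # d_{i,s} + d_{s,j}
    if (i, j) in direct:
        candidates.append(direct[(i, j)])
    return max(candidates) if candidates else None


def covers(
    delays: Dict[str, List[int]],
    in_edges: Dict[str, int], out_edges: Dict[int, int],
    direct: Dict[Tuple[str, int], int],
    err_bound: int = 0,
) -> bool:
    """True if every original delay d_{i,j} is matched in G' within err_bound."""
    for i, vec in delays.items():
        for j, d in enumerate(vec):
            d_after = longest_delay_after(i, j, in_edges, out_edges, direct)
            if d_after is None or abs(d - d_after) > err_bound:
                return False
    return True


# Hypothetical original model G and a star-based model G'.
delays = {"I1": [2, 2, 3], "I2": [3, 3, 4], "I3": [4, 4, 6]}
in_edges = {"I1": 0, "I2": 1, "I3": 2}
out_edges = {0: 2, 1: 2, 2: 3}
with_dont_care = covers(delays, in_edges, out_edges, {("I3", 2): 6})
without_dont_care = covers(delays, in_edges, out_edges, {})
within_bound = covers(delays, in_edges, out_edges, {}, err_bound=1)
print(with_dont_care, without_dont_care, within_bound)
```

Dropping the direct edge from I3 to the third output leaves only the star path of delay 5 against an original delay of 6, which fails the exact check but passes with an error bound of 1.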

25 Experimental Results. Test cases: –Block 1: 8499 inputs, outputs, and 138,360 edges. –Block 2: 4260 inputs, 7728 outputs, and 103,414 edges. Notation: –E_G: #edges in the original timing graph of the block. –E_B: #edges in the bipartite timing model. –E_m: #edges after timing model reduction. Reduction r_G = (E_G - E_m)/E_G. Reduction r_B = (E_B - E_m)/E_B.
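The two reduction measures are simple ratios; the sketch below evaluates them on made-up edge counts (not the measured results) with a hypothetical helper `reduction`.

```python
def reduction(e_before: int, e_after: int) -> float:
    """Relative edge reduction (E_before - E_after) / E_before."""
    return (e_before - e_after) / e_before


# Made-up edge counts, chosen only to show the arithmetic.
e_g, e_b = 1000, 2000      # timing graph and bipartite model
e_m_small = 500            # reduced model smaller than both
e_m_large = 1500           # reduced model larger than the timing graph

r_g_small, r_b_small = reduction(e_g, e_m_small), reduction(e_b, e_m_small)
r_g_large, r_b_large = reduction(e_g, e_m_large), reduction(e_b, e_m_large)
print(r_g_small, r_b_small)
print(r_g_large, r_b_large)
```

When the reduced model still has more edges than the original timing graph, r_G is negative even though r_B is positive.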

26 Block 1: E_G = 138,360, E_B = 262,491
Err_bound (ns) | E_m | r_G | r_B
0 | 249, | % | 5.1%
0.1 | 41, | % | 84.1%
1.0 | 36, | % | 85.9%
 | , | % | 86.3%
 | , | % | 86.2%
|d_{i,j} - d_{i,j}'| <= Err_bound, where d_{i,j} and d_{i,j}' are the delays from input i to output j before and after the reduction. Buffer  1 delay = 1.34 ns.

27 Block 2: E_G = 103,414, E_B = 465,190
Err_bound (ns) | E_m | r_G | r_B
0 | 397, | % | 14.6%
 | , | % | 89.3%
 | , | % | 93.7%
1.0 | 21, | % | 95.4%
 | , | % | 95.6%
Buffer  1 delay = 0.74 ns.

28 Conclusions. We propose a biclique-star replacement technique and develop an iterative timing model reduction algorithm based on it. The experimental results show that, when reasonable error bounds are allowed, the proposed algorithm effectively reduces the number of edges in the timing model.

29 Thanks!

30 References
C. W. Moon, H. Kriplani, and K. P. Belkhale, “Timing model extraction of hierarchical blocks by graph reduction,” in DAC’02.
C. Visweswariah and A. R. Conn, “Formulation of static circuit optimization with reduced size, degeneracy and redundancy by timing graph manipulation,” in ICCAD’99.
S. L. Hakimi and S. S. Yau, “Distance matrix of a graph and its realizability,” Quart. Appl. Math. 22 (1964), 305–317.
F. Chung, M. Garrett, R. Graham, and D. Shallcross, “Distance realization problems with applications to internet tomography.”
T. Feder, A. Meyerson, R. Motwani, L. O’Callaghan, and R. Panigrahy, “Representing graph metrics with fewest edges,” in Proc. of Symp. on Theoretical Aspects of Computer Science (2003).