1
NTHU-CS
Performance-Optimal Clustering with Retiming for Sequential Circuits
Tzu-Chieh Tien and Youn-Long Lin
Department of Computer Science
National Tsing Hua University
Hsin-Chu, Taiwan, R.O.C.
2
Outline
  Introduction
  Previous Work
  Proposed Approach
  Experimental Results
  Conclusion and Future Research
3
Retiming
  [Figure: an example circuit before retiming (critical path delay = 8) and after retiming (critical path delay = 7).]
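A minimal Python sketch of the retiming idea on this slide. The four-gate ring, its gate delays, and its register placement are hypothetical values chosen only so that the critical path drops from 8 to 7; they are not a reconstruction of the slide's figure. The helper names `clock_period` and `retime` are also illustrative. The sketch uses the standard retiming rule w_r(u, v) = w(u, v) + r(v) - r(u) and assumes every cycle holds at least one register.

```python
def clock_period(delay, edges):
    """Longest register-free path delay.

    delay: dict gate -> gate delay
    edges: dict (u, v) -> number of FFs on the edge
    Assumes there is no cycle made only of register-free edges.
    """
    # Only edges without FFs extend a combinational path.
    succ = {v: [] for v in delay}
    for (u, v), ffs in edges.items():
        if ffs == 0:
            succ[u].append(v)

    memo = {}

    def longest_from(u):
        if u not in memo:
            memo[u] = delay[u] + max((longest_from(v) for v in succ[u]), default=0)
        return memo[u]

    return max(longest_from(u) for u in delay)


def retime(edges, r):
    """Apply lags r: each edge (u, v) gets w + r[v] - r[u] FFs."""
    new_edges = {(u, v): w + r[v] - r[u] for (u, v), w in edges.items()}
    if any(w < 0 for w in new_edges.values()):
        raise ValueError("illegal retiming: negative register count")
    return new_edges


# Hypothetical ring a -> b -> c -> d -> a with two FFs.
delay = {"a": 3, "b": 5, "c": 2, "d": 1}
edges = {("a", "b"): 0, ("b", "c"): 1, ("c", "d"): 0, ("d", "a"): 1}

before = clock_period(delay, edges)                  # path a -> b
retimed = retime(edges, {"a": 0, "b": 1, "c": 0, "d": 1})
after = clock_period(delay, retimed)                 # path b -> c
print(before, after)
```

Retiming moves the two registers around the ring without changing their total count, which is why the loop behaves the same while the longest register-free path gets shorter.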
4
Performance-Driven Clustering
  Minimize the clock period under a cluster-size constraint
  [Figure: example circuit.]
5
Combining Clustering and Retiming
  [Figure: the example circuit clustered without and with retiming consideration, inter-cluster delay = 2. One result has a critical path delay of 7 and the other 8.]
6
Problem Definition
  Given a sequential circuit G, a target clock period c, and an area bound M
  Find a clustered, retimed, node-replicated circuit G_r such that
    the clock period is less than or equal to c
    each cluster is of size M or less
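The two constraints in the problem definition can be checked mechanically for a candidate solution. The sketch below is illustrative only: the function names, the unit area per gate, the absence of node replication, and the choice to charge the inter-cluster delay only on register-free edges are all simplifying assumptions, and the circuit is a hypothetical four-gate ring.

```python
from collections import Counter


def clustered_period(delay, edges, cluster_of, inter_delay):
    """Longest register-free path delay, adding inter_delay whenever a
    register-free edge crosses a cluster boundary (assumed delay model)."""
    succ = {v: [] for v in delay}
    for (u, v), ffs in edges.items():
        if ffs == 0:
            extra = inter_delay if cluster_of[u] != cluster_of[v] else 0
            succ[u].append((v, extra))

    memo = {}

    def longest_from(u):
        if u not in memo:
            memo[u] = delay[u] + max(
                (extra + longest_from(v) for v, extra in succ[u]), default=0
            )
        return memo[u]

    return max(longest_from(u) for u in delay)


def is_feasible(delay, edges, cluster_of, c, M, inter_delay):
    """True if every cluster has at most M gates and the period is <= c."""
    sizes = Counter(cluster_of.values())
    if max(sizes.values()) > M:
        return False
    return clustered_period(delay, edges, cluster_of, inter_delay) <= c


# Hypothetical ring a -> b -> c -> d -> a with FFs on a->b and c->d.
delay = {"a": 3, "b": 5, "c": 2, "d": 1}
edges = {("a", "b"): 1, ("b", "c"): 0, ("c", "d"): 1, ("d", "a"): 0}

aligned = {"b": 0, "c": 0, "d": 1, "a": 1}     # cuts fall on registered edges
misaligned = {"a": 0, "b": 0, "c": 1, "d": 1}  # cuts fall on register-free edges

print(is_feasible(delay, edges, aligned, c=7, M=2, inter_delay=2))
print(is_feasible(delay, edges, misaligned, c=7, M=2, inter_delay=2))
```

In this toy case the same retimed circuit meets the target only when the cluster boundaries coincide with the registers, which is the interaction between clustering and retiming that the talk is about.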
7
Previous Work
  P. Pan, A. K. Karandikar, and C. L. Liu, “Optimal Clock Period Clustering for Sequential Circuits with Retiming,” IEEE T-CAD, June 1998.
    Optimal under the unit gate delay model
    Near-optimal for the general gate delay model
  J. Cong, H. Li, and C. Wu, “Simultaneous Circuit Partitioning/Clustering with Retiming for Performance Optimization,” DAC’99.
    100X more efficient but still near-optimal
8
This Work
  Optimal for the general gate delay model
  More (2X) efficient than Pan’s approach
9
Pan’s Approach
  Label each node v with an l-value, l(v)
  Find a clustered, retimed circuit such that every PO’s l-value is less than or equal to c
  Retiming solution
    Resulting clock period is less than c + max. gate delay
10
Pan’s l-value of a Node
  l(v): total w_1 edge weight of the longest path from the PIs to the node
  w_1 weight of an edge e from u to v: w_1(e) = -c * w(e) + d(v)
    w(e): number of FFs along e
  [Figure: example with target c = 6 and gate delays 2, 5, 3. The w_1(e) values are 2, -1, 3, 0; the l(v) values are 0, 2, 1, 4, 4; and 4 < 6.]
11
Pan’s l-value Labeling
  Traverse the whole circuit, updating l-values, until no node’s l-value changes
  Time complexity: [expression not captured in the transcript]
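A minimal Python sketch of labeling by repeated sweeps, using the edge weight w_1(e) = -c * w(e) + d(v) defined on the previous slide. It covers only the longest-path part of the labeling and ignores clustering entirely, so it is not Pan's full algorithm. The function name `l_values_by_sweeps`, the chain topology (PI, three gates with delays 2, 5, 3, one FF between the first two gates, then PO), and the convergence guard are assumptions made for illustration; the topology was picked because it yields the l-values listed on the previous slide.

```python
def l_values_by_sweeps(delay, edges, primary_inputs, c):
    """Sweep all edges until no l-value changes.

    delay: dict node -> delay (PIs and POs have delay 0)
    edges: list of (u, v, ffs) with ffs = number of FFs on the edge
    """
    unreached = float("-inf")
    label = {v: unreached for v in delay}
    for pi in primary_inputs:
        label[pi] = 0

    # Without a positive-weight cycle, labels settle within len(delay) sweeps.
    for _ in range(len(delay)):
        changed = False
        for u, v, ffs in edges:
            if label[u] == unreached:
                continue
            candidate = label[u] - c * ffs + delay[v]  # l(u) + w_1(e)
            if candidate > label[v]:
                label[v] = candidate
                changed = True
        if not changed:
            return label
    raise ValueError("l-values keep growing: some loop cannot meet target c")


# Assumed chain: PI -> g1 -> [FF] -> g2 -> g3 -> PO
delay = {"PI": 0, "g1": 2, "g2": 5, "g3": 3, "PO": 0}
edges = [("PI", "g1", 0), ("g1", "g2", 1), ("g2", "g3", 0), ("g3", "PO", 0)]

labels = l_values_by_sweeps(delay, edges, ["PI"], c=6)
print(labels)
```

Every sweep visits every edge, even in regions where nothing changed in the previous sweep, which is the cost a worklist-driven traversal avoids.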
12
Our Approach
  Modified l-value definition
    Optimal for the general gate delay model
    Based on W.-J. Chen, “A Study on the Relationship Between Retiming and Loop Folding,” Master’s thesis, National Tsing Hua Univ., Taiwan, R.O.C., Aug. 1994.
  FIFO to aid circuit traversal during labeling
    Improves run time
    Time complexity: [expression not captured in the transcript]
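The FIFO idea can be illustrated with a generic worklist: only the fanouts of nodes whose l-value just changed are re-examined. The sketch below is a standard queue-based longest-path relaxation, not the authors' implementation; it uses Pan's unmodified edge weight w_1(e) = -c * w(e) + d(v), ignores clustering, and the function name, example circuits, and dequeue-count guard are assumptions.

```python
from collections import deque


def l_values_fifo(delay, edges, primary_inputs, c):
    """Worklist labeling: re-examine only fanouts of nodes that changed.

    delay: dict node -> delay; edges: list of (u, v, ffs).
    """
    fanout = {v: [] for v in delay}
    for u, v, ffs in edges:
        fanout[u].append((v, ffs))

    label = {v: float("-inf") for v in delay}
    pops = {v: 0 for v in delay}
    queue = deque()
    queued = set()
    for pi in primary_inputs:
        label[pi] = 0
        queue.append(pi)
        queued.add(pi)

    while queue:
        u = queue.popleft()
        queued.discard(u)
        pops[u] += 1
        # Guard (assumption): too many visits means the labels never settle.
        if pops[u] > len(delay):
            raise ValueError("l-values keep growing: some loop cannot meet target c")
        for v, ffs in fanout[u]:
            candidate = label[u] - c * ffs + delay[v]
            if candidate > label[v]:
                label[v] = candidate
                if v not in queued:
                    queue.append(v)
                    queued.add(v)
    return label


# Assumed chain: PI -> g1 -> [FF] -> g2 -> g3 -> PO (gate delays 2, 5, 3)
chain_delay = {"PI": 0, "g1": 2, "g2": 5, "g3": 3, "PO": 0}
chain_edges = [("PI", "g1", 0), ("g1", "g2", 1), ("g2", "g3", 0), ("g3", "PO", 0)]
chain_labels = l_values_fifo(chain_delay, chain_edges, ["PI"], c=6)

# Hypothetical loop g1 -> g2 -> [FF] -> g1 with total gate delay 7
loop_delay = {"PI": 0, "g1": 2, "g2": 5}
loop_edges = [("PI", "g1", 0), ("g1", "g2", 0), ("g2", "g1", 1)]
loop_labels = l_values_fifo(loop_delay, loop_edges, ["PI"], c=7)
print(chain_labels, loop_labels)
```

With c = 7 the loop's total weight is exactly zero, so the labels settle; with c = 6 the same loop gains weight on every trip and the guard reports that the target cannot be met.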
13
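Modified l-value Labeling
  If an FF’s position is occupied by a gate v, detected by [condition not captured in the transcript]
  [Figure: example with target c = 6 and gate delays 2, 5, 3. The l(v) values shown are 0, 2, 1, 5, 8, 8, with 8 > 6.]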
14
Example (target c = 7, inter-cluster delay = 2)
  [Figure: l(v) labeling of the example circuit. The individual node labels are not recoverable from the transcript.]
15
Example (Cont’d) (target c = 7, inter-cluster delay = 2)
  [Figure: the example circuit after clustering, after connecting and retiming, and after merging.]
16
Example (target c = 6, inter-cluster delay = 2)
  [Figure: l(v) labeling of the example circuit. The individual node labels are not recoverable from the transcript.]
17
Example of Pan’s Approach (target c = 6, inter-cluster delay = 2)
  [Figure: l(v) labeling of the example circuit under Pan’s approach. The individual node labels are not recoverable from the transcript.]
18
Example of Pan’s Approach (Cont’d) (target c = 6, inter-cluster delay = 2)
  [Figure: the example circuit after clustering, after connecting and retiming, and after merging.]
19
Experimental Results
  26 ISCAS-89 benchmark circuits
  Pan’s approach produces suboptimal results for 11 circuits
  Our approach produces the optimal result for every circuit
  Our CPU time is 50% of Pan’s
20
Conclusion and Future Research
  First exact algorithm for performance-optimal clustering with retiming under the general gate delay model
  Twice as fast as Pan’s near-optimal heuristic
  Future research: improve run-time efficiency
24
Experimental Results
25
Experimental Results (Cont’d)