1
Power Optimal Dual-Vdd Buffered Tree Considering Buffer Stations and Blockages
King Ho Tam and Lei He
Electrical Engineering Department, University of California, Los Angeles
Sponsors: NSF CAREER, UC MICRO (Fujitsu, Intel and Mindspeed), and IBM Faculty Partner Award.
2
Motivation
- Increasing interconnect power
  - 35% of cells are buffers at the 65nm technology node [Saxena, TCAD 04]
- Previous work
  - Power-optimal single-Vdd buffer insertion [Lillis, JSSC 96]
  - Delay-optimal buffered tree generation [Cong, DAC 00; Alpert, TCAD 02]
- No existing algorithms consider dual Vdd for buffer insertion or buffered tree generation
3
Major Contributions
- First in-depth study of dual-Vdd buffer insertion and buffered tree generation
  - Large power saving over single-Vdd buffering
- Efficient algorithms for power optimality
  - 17x faster than [Lillis, JSSC 96] when a single Vdd is considered
4
Outline
- Dual-Vdd buffer insertion and sizing (DVB)
  - Problem formulation
  - Sampling for speedup
  - Experimental results
- Dual-Vdd buffered tree generation (D-Tree)
  - Problem formulation
  - Improved augmented orthogonal search tree
  - Experimental results
5
Delay, Slew and Power Modeling
- Elmore delay for wires and buffers
- Bakoglu's slew metric (ln 9 · Elmore delay)
- Power = energy per switch
  - Wire power and lumped buffer dynamic/short-circuit power
- Can be easily extended to leakage power
  - Low Vdd (VL) reduces leakage
  - Requires assuming a clock rate and switching activity
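The modeling on this slide can be illustrated with a minimal Python sketch. The slide's own wire and buffer formulas did not survive in this transcript, so the expressions below are the standard textbook Elmore forms, and the 1/2 · C · Vdd² energy expression, all function names and all parameter names are illustrative assumptions rather than the authors' definitions.

```python
import math

def wire_delay(r_per_len, c_per_len, length, c_load):
    """Elmore delay of a uniform wire driving c_load (textbook form, assumed)."""
    r_wire = r_per_len * length
    c_wire = c_per_len * length
    return r_wire * (c_wire / 2.0 + c_load)

def buffer_delay(d_intrinsic, r_out, c_load):
    """Buffer delay: intrinsic delay plus output resistance times load (assumed form)."""
    return d_intrinsic + r_out * c_load

def slew(elmore_delay):
    """Bakoglu's slew metric: ln 9 times the Elmore delay."""
    return math.log(9.0) * elmore_delay

def wire_energy(c_per_len, length, vdd):
    """Energy per switch of the wire capacitance, assuming E = 1/2 * C * Vdd^2."""
    return 0.5 * c_per_len * length * vdd ** 2
```

Because the energy term is quadratic in Vdd, the ratio between a low-Vdd and a high-Vdd wire of the same length is (VL/VH)², independent of the wire parameters.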
6
Introducing Dual-Vdd Buffering
- Achieves power saving since power ∝ Vdd²
  - Suffers no loss of delay optimality
- VL => VH requires a level converter (LC)
  - Restores voltage level and reduces leakage
  - Ext-CVS for logic [Srivastava, ISLPED 04]
  - LC delay and power overhead amortized
[Figure: VL and VH stages; labels: reduced noise margin, leakage]
7
Key Observation in Dual-Vdd Buffering
- Disallowing VL => VH will not affect optimality
- Optimality empirically illustrated (@ 65nm):
  - (a) has an LC and VH drives Cl, so power (a) > (b)
  - Delay (b) > (a) only if Cl > 0.5pF (~9mm wire)
[Figure: configurations (a) and (b) with VH and VL buffers]
8
DVB Formulation
Dual-Vdd Buffer Insertion (DVB)
- Given an interconnect tree
- Find buffer placement, Vdd assignment for buffers, and buffer sizes
  - VH buffers driving VL buffers within the tree
  - Level converters at VH sinks driven by VL buffers
- Minimize power subject to
  - Required arrival time (RAT) at the source
  - Slew rate constraint at buffer inputs and sinks
9
DVB Algorithm
- Based on [Lillis, JSSC 96]
  - Dynamic programming with partial solution (option) pruning
- Options must now record downstream Vdd levels for buffering
  - This prevents VL => VH, which removes unnecessary search of the solution space
  - Still quite slow for large nets
- Challenge
  - Considering power causes super-linear growth in the number of options (w.r.t. tree size)
  - Dual-Vdd buffers => 2x options at each node
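The option bookkeeping described on this slide can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the `Option` and `Buffer` records, the `vh_below` flag, the lumped per-buffer power and the rule that only options with the same Vdd state compare against each other are all hypothetical choices made for the example.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Option:
    cap: float      # downstream capacitance seen at this node
    power: float    # power accumulated downstream
    rat: float      # required arrival time at this node
    vh_below: bool  # True if the downstream buffering uses VH (assumed encoding)

@dataclass(frozen=True)
class Buffer:
    is_vh: bool     # True for a VH buffer, False for a VL buffer
    c_in: float     # input capacitance
    r_out: float    # output resistance
    d_int: float    # intrinsic delay
    power: float    # lumped power per buffer (assumption)

def dominates(a, b):
    """a makes b redundant: same Vdd state, no worse in cap, power and RAT."""
    return (a.vh_below == b.vh_below and a.cap <= b.cap
            and a.power <= b.power and a.rat >= b.rat)

def prune(options):
    """Keep only options that no other distinct option dominates."""
    unique = list(dict.fromkeys(options))  # drop exact duplicates, keep order
    return [o for o in unique
            if not any(dominates(p, o) for p in unique if p != o)]

def add_buffer(opt, buf):
    """Return the option after inserting buf, or None for a VL => VH connection."""
    if not buf.is_vh and opt.vh_below:
        return None  # a VL buffer may not drive VH buffering
    delay = buf.d_int + buf.r_out * opt.cap
    return Option(buf.c_in, opt.power + buf.power, opt.rat - delay,
                  buf.is_vh or opt.vh_below)
```

Rejecting the VL => VH case at insertion time is what keeps those combinations out of the search, at the cost of carrying one extra flag per option.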
10
Speed-up Technique
- Approximate by power-delay sampling
  - Sampling under each distinct cap value
  - Uniformly pick options from the entire RAT-power trade-off curve
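One way to picture the sampling step is the sketch below. It is a simplified illustration, assuming options are plain `(cap, power, rat)` tuples and that "uniform" means evenly spaced positions along the RAT-sorted curve of each capacitance group; the function name and the parameter `k` are hypothetical, and the authors' actual sampling grid may be defined differently.

```python
from collections import defaultdict

def sample_options(options, k):
    """Keep at most k (k >= 2) options per distinct cap value.

    options: iterable of (cap, power, rat) tuples.
    """
    by_cap = defaultdict(list)
    for opt in options:
        by_cap[opt[0]].append(opt)
    kept = []
    for cap in sorted(by_cap):
        curve = sorted(by_cap[cap], key=lambda o: o[2])  # order by RAT
        n = len(curve)
        if n <= k:
            kept.extend(curve)  # nothing to drop for small groups
            continue
        # Evenly spaced indices; the first and last are always included,
        # so both ends of the trade-off curve survive.
        picks = sorted({i * (n - 1) // (k - 1) for i in range(k)})
        kept.extend(curve[i] for i in picks)
    return kept
```

Keeping both endpoints of each curve means the largest-RAT option in every capacitance group is never discarded by this particular sketch.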
11
Experimental Settings for DVB
- Testcases: randomly generated Steiner trees
  - 20 to 800 terminals in a 1cm x 1cm routing area
- Buffer sizes: 16x, 32x, 64x
- Sampling grid set to 20x20
- Comparison
  - Exact power-optimal algorithm (PB) [Lillis, JSSC 96]
  - Our algorithm with single (SVB) and dual (DVB) Vdd buffers
12
Sampling Preserves Optimality
- Sampling has little impact on optimality
  - SVB follows PB closely
  - Still optimal delay, 1.7% larger power than PB
13
Dual Vdd Reduces Power
- Dual Vdd shifts the power-delay curve to the left
14
Experimental Results for DVB
- DVB saves 23% power over SVB
  - More power saving in larger nets
- Power saving becomes larger w/ delay slack
  - e.g. relax delay 5%, saving becomes 26%
Power at optimal RAT (fJ)
Net   # nodes   # sinks   SVB      DVB
S5    375       199       18699    13808 [-26%]
S6    515       299       23443    17239 [-26%]
S7    784       499       33552    23804 [-29%]
S8    1054      699       38351    25799 [-33%]
S9    1188      799       40228    26646 [-34%]
avg                                [-23%]
15
Runtime
- SVB scales a lot better for larger testcases
  - Achieved 17x speedup over PB [Lillis, JSSC 96]
- DVB takes ~2.5x more runtime than SVB
Runtime (s)
Net   # nodes   # sinks   PB        SVB     DVB
S5    375       199       719       86      212
S6    515       299       2121      139     371
S7    784       499       33419     393     635
S8    1054      699       > 1 day   598     1072
S9    1188      799       > 1 day   853     1859
avg                       1x        1/17x   1/7x
16
Outline
- Dual-Vdd buffer insertion and sizing (DVB)
  - Problem formulation
  - "Sampling" speed-up technique
  - Experimental results
- Dual-Vdd buffered tree generation (D-Tree)
  - Problem formulation
  - Improved augmented orthogonal search tree
  - Experimental results
17
D-Tree Formulation
Dual-Vdd Buffered Tree (D-Tree)
- Given locations of terminals, buffer stations and blockages
- Find a rectilinear Steiner tree (RST) and buffer placement/size/Vdd assignment
  - VH buffers driving VL buffers only
  - Level converters at VH sinks driven by VL buffers
- Minimize power subject to
  - Required arrival time (RAT) at the source
  - Slew rate constraint at buffer inputs and sinks
- D-Tree is NP-hard
  - Finding a minimum RST alone is NP-complete
18
Buffered Tree Construction
- Delay optimization only [Cong, DAC 00] by
  1. Building a Hanan graph w/ buffer insertion nodes according to the locations of buffer stations
  2. Path search on the grid by option propagation
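Step 1 above can be sketched in a few lines of Python. This is an illustrative construction only: the function name and return shape are hypothetical, buffer insertion nodes are simply taken to be the grid nodes that coincide with buffer stations, and blockages are ignored for brevity.

```python
def build_hanan_graph(terminals, buffer_stations):
    """Build a Hanan grid graph over terminals and buffer stations.

    Returns (nodes, edges, buffer_nodes). Blockages are not modeled here.
    """
    pts = list(terminals) + list(buffer_stations)
    xs = sorted({x for x, _ in pts})
    ys = sorted({y for _, y in pts})
    # Every crossing of a vertical and a horizontal line through a point.
    nodes = [(x, y) for x in xs for y in ys]
    edges = []
    for i, x in enumerate(xs):
        for j, y in enumerate(ys):
            if i + 1 < len(xs):
                edges.append(((x, y), (xs[i + 1], y)))  # horizontal neighbor
            if j + 1 < len(ys):
                edges.append(((x, y), (x, ys[j + 1])))  # vertical neighbor
    buffer_nodes = set(buffer_stations)  # nodes where a buffer may be inserted
    return nodes, edges, buffer_nodes
```

With m distinct x-coordinates and n distinct y-coordinates the grid has m·n nodes and (m−1)·n + m·(n−1) edges, which is the search space that option propagation then walks.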
19
D-Tree Algorithm Overview
- Challenges
  - Growth of options is exponential
    - An artifact of D-Tree's NP-hardness
  - Considering power worsens option growth
- Solution: sampling + efficient prune tree
20
Prune Tree in [Lillis, JSSC 96]
- Options are inserted in sorted capacitance order
  - Never need to clear options out of the tree
- Each new option is checked against the tree
  - Automatically avoids redundant options in the tree
  - e.g. Φ_new = (c = 20, p = 100, q = 600)
- Not applicable to the D-Tree problem
  - Order of new options is not known a priori
[Figure: prune tree for P = 100 with nodes (c=20, q=600), (c=10, q=500), (c=8, q=400), (c=15, q=550), (c=12, q=520), (c=7, q=380)]
21
Our Improvement on Prune Tree
- Indexing w/ capacitance results in fewer trees
  - # capacitance values < # power values
- Efficient "tree cleaning"
  - Enables out-of-order option insertion
  - Guarantees no redundancy in the tree
22
Tree Cleaning
- To add an option Φ_new in O(|c| · log(|T|)) time
  1. Check whether Φ_new is dominated by any option in the data structure
  2. If not, remove options in the tree dominated by Φ_new in two downward tree traversals
- e.g. Φ_new = (c = 10, p = 70, q = 410, …)
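The two steps above can be shown with a deliberately simple stand-in. The sketch below keeps options as `(c, p, q)` tuples in a flat list and scans it linearly, so it reproduces the check-then-clean behavior for out-of-order insertion but not the O(|c| · log(|T|)) bound of the tree-based structure; `dominates` and `insert_option` are hypothetical names.

```python
def dominates(a, b):
    """a = (c, p, q) dominates b if it has no more cap or power and no less RAT."""
    return a[0] <= b[0] and a[1] <= b[1] and a[2] >= b[2]

def insert_option(store, new):
    """Insert new into store in any order, keeping store free of redundancy.

    Returns True if new was kept, False if an existing option dominates it.
    """
    # Step 1: reject the new option if anything already stored dominates it.
    if any(dominates(old, new) for old in store):
        return False
    # Step 2: "clean" the store by removing options the new one dominates.
    store[:] = [old for old in store if not dominates(new, old)]
    store.append(new)
    return True
```

For example, inserting (c = 10, p = 70, q = 410) after (c = 20, p = 100, q = 600) keeps both, since neither is at least as good as the other in all three fields.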
23
Experimental Settings for D-Tree
- Random testcases
  - All based on a random floorplan of 1cm x 1cm
  - Blockages ~30%, buffer stations ~1mm apart
- Comparison
  - Delay-optimal tree (RMP) [Cong, DAC 00]
  - Ours with single (S-Tree) and dual (D-Tree) Vdd buffers
24
Experimental Results for D-Tree
- Significant power saving over RMP
  - S-Tree: 7%, D-Tree: 18%
  - Larger saving for large testcases (e.g. T4)
- Handles up to 6-sink nets (T5 takes 23 mins)
  - Similar capability compared with delay-optimal approaches [Cong, DAC 00; Chen, ASP-DAC 02]
Power @ optimal RAT (pJ)
Net   # nodes   # sinks   RMP   S-Tree       D-Tree
T3    137       4         3.9   3.5 [-10%]   2.9 [-23%]
T4    261       5         4.9   4.4 [-13%]   3.1 [-37%]
T5    235       6         4.2   3.8 [-10%]   3.4 [-18%]
avg                             -7%          -18%
25
Conclusion
- Formulated dual-Vdd buffer insertion/tree generation without level converters
- Proposed 2 speedup techniques
  - "Sampling" w/ negligible loss of optimality
  - "Improved prune tree" for solution pruning
  - Applied to single-Vdd buffer insertion, 17x faster than existing work
- Large power saving over single-Vdd buffering
  - 23% in buffer insertion: dual Vdd vs single Vdd
  - 18% in buffered tree: dual Vdd vs delay optimal
26
Future Work
- Speed up tree construction
- Slack allocation for more power reduction
- Path-based buffer insertion [Sze, DAC 05]
  - Allocates slack along one interconnect path
  - Considers single-Vdd buffers only
- Chip-level FPGA dual-Vdd assignment [Lin, DAC 05]
  - Fixed buffer locations, assign Vdd levels
  - Considers multiple critical paths
  - Solved as a linear programming problem