Variability-Driven Formulation for Simultaneous Gate Sizing and Post-Silicon Tunability Allocation
Vishal Khandelwal and Ankur Srivastava
Department of Electrical and Computer Engineering, University of Maryland, College Park
2 Introduction
Process variations cause a significant spread in design performance in sub-90nm technologies, which impacts yield and reliability. It is therefore necessary to explicitly consider the impact of process variations on design parameters. Several statistical analysis and optimization techniques have been proposed to improve timing and power yields.
3 Handling Process Variations
Design-Time Optimization:
Statistical Gate Sizing [Davoodi, DAC’06] [Sapatnekar, DAC’05] [Zhou, ICCAD’05]
Statistical Buffer Insertion [He, ISPD’06] [Davoodi, ICCD’05] [Wong, ICCAD’05] [Khandelwal, ICCAD’03]
Post-Fabrication Tunability:
Post-Silicon Tunable Clock-Tree Buffers [Chen, ICCAD’05] [Mahoney, ISSC’05] [Takahashi, 2003] [Tam, JSSC’00]
Adaptive Body-Biasing [Kim, ISLPED’03] [Orshansky, ICCAD’06]
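4 Traditional Gate Sizing
Minimize area or power, with the gate size s_i as the variable for each gate i, subject to meeting a delay constraint T_cons at the output and to size constraints [Fishburn, Dunlop 1985] [Sapatnekar, 1993].
[Figure: timing graph with nodes 0 and n; gate i with delay d_i and arrival times t_i and t_j; delay constraint T_cons at the output]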
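The sizing problem on this slide is commonly written in the following form. This is a sketch of the standard arrival-time formulation rather than the exact expression from the slide: the area proxy (sum of sizes), the bounds s_min and s_max, and the reading of n as the output node are assumptions.

```latex
\begin{aligned}
\min_{s,\,t}\quad & \sum_i s_i \\
\text{s.t.}\quad & t_j + d_i(s) \le t_i \quad \forall i,\ \forall j \in \mathrm{fanin}(i), \\
& t_n \le T_{cons}, \\
& s_{\min} \le s_i \le s_{\max} \quad \forall i .
\end{aligned}
```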
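5 Traditional Gate Sizing
Gate delay is modeled by a posynomial expression in the gate sizes [Fishburn, Dunlop 1985] [Sapatnekar, 1993]. With this delay model, minimizing area, power, and similar objectives is a convex formulation.
[Figure: gates i and j]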
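To make the posynomial delay model concrete, the sketch below evaluates the output arrival time of a tiny netlist. The netlist, the coefficients A and B, the output load C_LOAD, and the specific delay form (intrinsic delay plus load divided by drive size) are illustrative assumptions, not the expression from the slide.

```python
# Hypothetical 4-gate netlist: fanout[i] lists the gates driven by gate i.
fanout = {0: [1, 2], 1: [3], 2: [3], 3: []}
ORDER = [0, 1, 2, 3]          # a topological order of the gates
A, B, C_LOAD = 1.0, 2.0, 1.0  # assumed delay coefficients and output load


def gate_delay(i, sizes):
    """Posynomial delay: intrinsic term plus load over drive strength."""
    load = sum(sizes[j] for j in fanout[i]) or C_LOAD
    return A + B * load / sizes[i]


def output_arrival(sizes):
    """Longest-path arrival time at the last gate in ORDER."""
    arrival = {i: 0.0 for i in ORDER}
    for i in ORDER:
        # arrival[i] holds the latest fanin arrival; add this gate's delay.
        arrival[i] += gate_delay(i, sizes)
        for j in fanout[i]:
            arrival[j] = max(arrival[j], arrival[i])
    return arrival[ORDER[-1]]


unit = {0: 1.0, 1: 1.0, 2: 1.0, 3: 1.0}
print(output_arrival(unit))              # all gates at unit size
print(output_arrival({**unit, 0: 2.0}))  # upsizing gate 0 shortens the path
```

Every delay term has positive coefficients and is a ratio of sizes, which is what keeps the model posynomial.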
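6 Effects of Process Variations
The delay of each gate becomes a random variable. Process parameters (the figure labels T_ox and L_eff) form a set of random variables with arbitrary distributions. This motivates statistical gate sizing [Davoodi, DAC’06] [Sapatnekar, DAC’05] [Zhou, ICCAD’05].
[Figure: transistor cross-section with n+ regions, annotated with T_ox and L_eff]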
7 Post-Silicon Tunable (PST) Clock Tree Buffers
Tunable clock buffers can introduce extra slack into critical paths after fabrication [Chen, ICCAD’05] [Mahoney, ISSC’05] [Takahashi, 2003] [Tam, JSSC’00]. Design overhead: area and clock-tree power.
[Figure: clock tree with tunable buffers B1 to B7 driving flip-flops FF1 to FF8]
8 Post-Silicon Tunable Clock Tree Buffers
Let D_ij be the delay of the longest path between flip-flops i and j. Consider flip-flops 2 and 7: the buffers can be tuned to change the clock skew between them.
[Figure: clock tree with tunable buffers B1 to B7 driving flip-flops FF1 to FF8]
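The effect of tuning on a single launch/capture pair can be sketched as follows. This is a minimal illustration under a standard setup-time model; the function names, the t_setup parameter, and the choice to tune only the capture clock are assumptions, not details from the slides.

```python
def setup_slack(d_ij, clk_i, clk_j, period, t_setup=0.0):
    """Setup slack of the longest path from launch flop i to capture flop j.

    clk_i and clk_j are the clock arrival times at the two flip-flops.
    """
    return (clk_j + period) - (clk_i + d_ij + t_setup)


def required_tuning(d_ij, clk_i, clk_j, period, tune_max):
    """Extra capture-clock delay needed to fix the path, or None if the
    requirement exceeds the buffer's tuning range."""
    needed = max(0.0, -setup_slack(d_ij, clk_i, clk_j, period))
    return needed if needed <= tune_max else None


# A path that misses timing by 0.5 can be fixed with a range of 1.0,
# but not with a range of 0.25.
print(required_tuning(10.5, 1.0, 1.0, 10.0, tune_max=1.0))
print(required_tuning(10.5, 1.0, 1.0, 10.0, tune_max=0.25))
```

Delaying the capture clock takes the same amount of slack away from paths that launch at that flip-flop, which is why the tuning amounts have to be chosen jointly across the design.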
9 Optimization Objective: Tunability Cost
A metric that captures the overhead due to PST buffers in the design: silicon area and clock-tree power.
10 Optimization Objective: Binning Yield Loss (BYL)
A convex loss function Q(.) penalizes delay beyond T_cons [V. Zolotov, DAC’04] [D. Blaauw, GLSVLSI’05].
[Figure: loss versus delay (t); the loss Q(.) is incurred beyond T_cons]
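One way to read this objective is as the expected value of Q over the circuit delay distribution, estimated from samples. The sketch below assumes that reading; the quadratic form of Q, the Gaussian delay samples, and all names are illustrative choices rather than the definitions used in the talk.

```python
import random


def quadratic_loss(excess):
    """One convex choice of Q: zero when timing is met, quadratic beyond."""
    return max(0.0, excess) ** 2


def binning_yield_loss(delays, t_cons, q=quadratic_loss):
    """Sample average of Q(t - T_cons) over circuit delay samples."""
    return sum(q(t - t_cons) for t in delays) / len(delays)


# Hypothetical delay samples standing in for Monte Carlo timing results.
rng = random.Random(0)
samples = [rng.gauss(10.0, 0.5) for _ in range(10000)]
print(binning_yield_loss(samples, t_cons=10.5))
```

Unlike a plain pass/fail yield count, a loss of this shape distinguishes chips that miss T_cons slightly from chips that miss it badly.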
11 Problem Statement
Given a sequential design with a synthesized PST clock tree (known buffer locations), simultaneously perform statistical gate sizing and PST buffer tuning range determination such that binning yield loss and tunability cost are minimized.
[Figure: PST clock tree with buffers B1 to B7 and flip-flops FF1 to FF8; timing graph with gate i, delay d_i, nodes 0 and n, and constraint T_cons]
12 Two-Stage Formulation
Variables: gate sizes x and tuning buffer ranges r.
First stage:
1. Deterministic constraints: meet the timing requirement assuming no variations.
2. Capture variability in the objective.
13 Second Stage Formulation
Second stage: given a solution to the first-stage problem and a variability sample, the second stage evaluates the loss Q against T_cons for that sample. Each sample of variability requires a different amount of tuning for maximum timing yield. No statistical timing analysis scheme exists to estimate the timing distribution of a circuit given gate sizes and tuning buffer ranges.
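The two stages can be sketched schematically as below. The slide equations are not available, so every symbol other than x, r, Q, BYL, D_ij and T_cons is an assumption: C(r) stands for the tunability cost, ξ for a variability sample, τ_b for the tuning applied to buffer b, c_i(τ) for the clock arrival time at flip-flop i, and T for the achieved cycle time.

```latex
\begin{aligned}
\text{First stage:}\quad & \min_{x,\,r}\ C(r) + \mathrm{BYL}(x, r)
  \quad \text{s.t. nominal (variation-free) timing constraints on } x, r, \\
& \mathrm{BYL}(x, r) = \mathbb{E}_{\xi}\big[\, L(x, r, \xi) \,\big], \\
\text{Second stage:}\quad & L(x, r, \xi) = \min_{\tau,\,T}\ Q(T - T_{cons}) \\
& \text{s.t. } 0 \le \tau_b \le r_b \ \ \forall b, \qquad
  D_{ij}(x, \xi) + c_i(\tau) - c_j(\tau) \le T \ \ \forall (i, j).
\end{aligned}
```

In this reading, the tuning amounts τ are chosen separately for each sample, while the sizes x and ranges r are fixed once at design time.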
14 Convex Problem
THEOREM: The proposed two-stage stochastic programming formulation is convex.
PROOF: Detailed proof omitted for brevity. The first-stage constraints are convex. The first-stage objective is convex if BYL(x, r) is convex. From the second-stage formulation, one can show that BYL(x, r) is convex; this requires showing that the loss for each sample is convex.
15 Kelley’s Cutting Plane Algorithm
Iteratively solve the first-stage and second-stage formulations. Given a solution to the first-stage formulation, use the method of finite differences on the second-stage formulation to generate a lower bound on BYL. Add this lower bound as a constraint to the first-stage formulation at each iteration.
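The iteration can be illustrated on a one-dimensional toy problem. In the sketch below, the convex function f stands in for BYL as evaluated by the second stage, and the piecewise-linear model built from the cuts stands in for the first-stage problem; the real first stage is a multi-variable convex program, so this is only an illustration of the cutting-plane mechanics, with all names hypothetical.

```python
def finite_difference_slope(f, x, h=1e-6):
    """Central-difference estimate of the slope of f at x."""
    return (f(x + h) - f(x - h)) / (2.0 * h)


def model_value(cuts, x):
    """Piecewise-linear lower model: the maximum over all linear cuts."""
    return max(fx + g * (x - xk) for xk, fx, g in cuts)


def minimize_model(cuts, lo, hi):
    """Minimize the model over [lo, hi] by checking endpoints and the
    pairwise intersections of cuts (enough for a 1-D convex model)."""
    candidates = [lo, hi]
    for a in range(len(cuts)):
        for b in range(a + 1, len(cuts)):
            xa, fa, ga = cuts[a]
            xb, fb, gb = cuts[b]
            if abs(ga - gb) > 1e-12:
                x = ((fb - gb * xb) - (fa - ga * xa)) / (ga - gb)
                if lo <= x <= hi:
                    candidates.append(x)
    return min(candidates, key=lambda x: model_value(cuts, x))


def kelley(f, lo, hi, tol=1e-6, max_iter=100):
    """Kelley's cutting-plane method for a convex f on [lo, hi]."""
    x, cuts = lo, []
    best_x, best_f = x, f(x)
    for _ in range(max_iter):
        fx = f(x)                       # "second stage": evaluate at x
        if fx < best_f:
            best_x, best_f = x, fx
        cuts.append((x, fx, finite_difference_slope(f, x)))  # new cut
        x = minimize_model(cuts, lo, hi)                     # "first stage"
        if best_f - model_value(cuts, x) <= tol:             # bound gap
            break
    return best_x, best_f


print(kelley(lambda x: (x - 2.0) ** 2 + 1.0, 0.0, 5.0))
```

Each cut is a valid lower bound only because f is convex, and only up to the accuracy of the finite-difference slope, which is why the convexity result matters for this scheme.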
16 Shortest-Path Constraints
Shortest-path constraints are inherently non-convex. Gate delay is therefore approximated by a linear lower bound. With this approximation, the two-stage stochastic programming formulation can be modified to consider shortest-path constraints.
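The reason a linear lower bound helps can be written out as follows. This is a sketch of the general argument, not the slide's derivation: the symbols T_hold (the minimum delay required on a short path p) and the linear function ℓ_i with coefficients a_i, b_i are assumed for illustration.

```latex
\begin{aligned}
& \sum_{i \in p} d_i(x) \ \ge\ T_{hold}
  && \text{(non-convex: a convex function bounded from below)} \\
& \ell_i(x) = a_i^{\top} x + b_i \ \le\ d_i(x)
  && \text{(linear lower bound on each gate delay)} \\
& \sum_{i \in p} \ell_i(x) \ \ge\ T_{hold}
  \ \Longrightarrow\ \sum_{i \in p} d_i(x) \ \ge\ T_{hold}
  && \text{(linear constraint, sufficient for the original)}
\end{aligned}
```

The replacement constraint is conservative: it may reject some sizings that would actually satisfy the short-path requirement.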
17 Experimental Results
Implemented the framework in SIS, using MOSEK to solve the convex formulation. Used CAPO to place the netlist in order to obtain spatially correlated gate delays. Assumed 15% V_th variation at the 90nm technology node [Predictive Technology Model]. Synthesized the PST clock tree using the technique proposed in [Chen et al., ICCAD’05].
[Figure: gates i and j at placement coordinates (x_i, y_i) and (x_j, y_j)]
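One common way to turn placement coordinates into correlated variation samples is sketched below. The slides do not state the correlation model, so the exponential decay with distance, the corr_length parameter, and the treatment of sigma as a relative standard deviation are all assumptions made for illustration.

```python
import math
import random


def correlation_matrix(coords, corr_length):
    """Assumed model: correlation decays exponentially with distance."""
    return [[math.exp(-math.dist(p, q) / corr_length) for q in coords]
            for p in coords]


def cholesky(m):
    """Lower-triangular L with L * L^T = m, for positive-definite m."""
    n = len(m)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = m[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(s) if i == j else s / low[j][j]
    return low


def sample_vth_shifts(coords, sigma, corr_length, rng):
    """One correlated sample of relative V_th shifts, one value per gate."""
    low = cholesky(correlation_matrix(coords, corr_length))
    z = [rng.gauss(0.0, 1.0) for _ in coords]
    return [sigma * sum(low[i][k] * z[k] for k in range(i + 1))
            for i in range(len(coords))]


# Hypothetical placement: two neighbouring gates and one distant gate.
coords = [(0.0, 0.0), (1.0, 0.0), (10.0, 10.0)]
print(sample_vth_shifts(coords, sigma=0.15, corr_length=5.0,
                        rng=random.Random(1)))
```

Gates placed close together receive strongly correlated shifts, while distant gates vary almost independently. The gate coordinates must be distinct for the factorization to succeed.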
18 Experimental Results
Experimental comparison on ISCAS benchmarks:
[Chen]: Nominal gate sizing, with PST clock-tree generation using [Chen et al., ICCAD’05].
Sensitivity: Retain the PST clock-tree location and range. Sensitivity-driven statistical gate sizing algorithm that iteratively and greedily sizes the gate with the maximum yield gain, similar in spirit to [Zhou ICCAD’05, Zolotov DAC’05].
Stochastic: Retain the PST clock-tree buffer locations. Proposed simultaneous gate sizing and post-silicon tunability allocation algorithm.
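The greedy loop behind the Sensitivity baseline can be sketched as follows. This is a generic illustration of sensitivity-driven sizing, not the implementation used in the experiments: the step size, the size cap, and the toy loss function (which stands in for a yield-loss estimate plus an area penalty) are hypothetical.

```python
def greedy_sizing(sizes, loss, step=0.5, max_size=4.0, max_iters=100):
    """Repeatedly upsize the single gate that reduces `loss` the most."""
    sizes = dict(sizes)
    for _ in range(max_iters):
        current = loss(sizes)
        best_gate, best_gain = None, 0.0
        for g in sizes:
            if sizes[g] + step > max_size:
                continue  # gate is already at its size limit
            trial = dict(sizes)
            trial[g] += step
            gain = current - loss(trial)
            if gain > best_gain:
                best_gate, best_gain = g, gain
        if best_gate is None:
            break  # no single upsizing step improves the loss
        sizes[best_gate] += step
    return sizes


def toy_loss(s):
    """Stand-in objective: delay-like terms plus an area penalty."""
    return 4.0 / s["a"] + 1.0 / s["b"] + 0.3 * (s["a"] + s["b"])


start = {"a": 1.0, "b": 1.0}
print(greedy_sizing(start, toy_loss))
```

Because each step looks at one gate at a time, a loop of this kind cannot trade gate sizes against tuning ranges, which is the coupling the Stochastic approach is designed to exploit.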
19 BYL, Area and Tuning Range Comparison
20 Timing Yield Loss Comparison
[Figure: average timing yield loss for [Chen], Sensitivity, and Stochastic]
21 Runtime Comparison
[Table: runtime of the Sensitivity and Stochastic techniques, and number of iterations, on benchmarks s344, s382, s400, s526, s635]
22 Summary and Future Work
Variability-driven framework for simultaneous gate sizing and post-silicon tunability allocation to minimize binning yield loss and tunability cost. Efficient stochastic-programming-based scheme to solve the formulation. No assumptions about parameter distributions or their correlations. Future work: develop a statistical timing analysis scheme that can consider the effect of post-silicon tunability.
23 Thank You!