Placement and Timing for FPGAs Considering Variations
Yan Lin¹, Mike Hutton² and Lei He¹
¹ EE Department, UCLA
² Altera Corporation, San Jose
© 2005 Altera Corporation © 2006 Altera Corporation

Outline
- Preliminaries and Motivation
- Timing with Guard-banding/Speed-binning
- Stochastic Placement
- Experimental Results
- Conclusions and Discussions

Background
- Process variations
  - More and more significant in nanometer technology
  - Affect timing and power in both ASICs and FPGAs
- Delay with variations
  - Variation sources: threshold voltage (V_th) and effective channel length (L_eff)
  - Independent Gaussians for global/local variations
  - First-order canonical form
- Related work
  - FPGA device and architecture evaluation with process variations [Wong et al, ICCAD ’05]
  - SSTA [Chang et al, ICCAD ’03] [Visweswariah et al, DAC ’04]
  - Statistical criticality analysis [Visweswariah et al, DAC ’04] [Li et al, ICCAD ’05] [Xiong et al, TAU ’06]
  - Statistical gate sizing for ASICs [Guthaus et al, ICCAD ’05] [Sinha et al, ICCAD ’05]

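The first-order canonical form mentioned above can be sketched in a few lines. This is a minimal illustration, assuming each delay is written as mean + Σ a_i·ΔX_i + b·ΔR, with independent standard-normal ΔX_i (global sources shared by all elements) and ΔR (an element's own local source). The class name `Canonical` and its fields are hypothetical and are not taken from the talk.

```python
import math
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Canonical:
    """Delay = mean + sum(glob[i] * dX_i) + local * dR, with dX_i, dR ~ N(0, 1)."""
    mean: float
    glob: Tuple[float, ...]  # sensitivities to shared global sources (e.g. V_th, L_eff)
    local: float             # sensitivity to this element's own independent local source

    def __add__(self, other: "Canonical") -> "Canonical":
        # Global terms are shared, so their coefficients add linearly.
        # Local terms are independent, so they combine as a root-sum-square.
        return Canonical(
            self.mean + other.mean,
            tuple(a + b for a, b in zip(self.glob, other.glob)),
            math.hypot(self.local, other.local),
        )

    @property
    def sigma(self) -> float:
        """Standard deviation of the delay."""
        return math.sqrt(sum(a * a for a in self.glob) + self.local ** 2)


# Two identical stages in series (illustrative numbers only).
stage = Canonical(mean=1.0, glob=(0.03, 0.0), local=0.04)
path = stage + stage
```

In this sketch the global sensitivity of `path` doubles while its local sensitivity grows only by √2, so `path.sigma` is less than twice `stage.sigma`.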
Motivation
- STA is inaccurate under variation
  - Slack ignores near-criticality
  - Near-critical paths may be statistically timing-critical
- Deterministic timing-driven placer (e.g. T-VPlace in VPR)
  - Based on STA
  - Optimizes for the static critical path
  - May not optimize timing under variation
- A stochastic placer is needed under variation
  - Same placement for one application across chips

Pre-routing Interconnect Uncertainty vs. Process Variation in Placement
- Existing timing-driven placer
  - Leverages timing slack in STA
  - With interconnect delay estimated
  - May incur uncertainty along with process variation
- Process variation leads to a more significant delay variance at the placement stage
- Therefore, only process variation is considered for placement

Uniqueness for Timing in FPGAs
FPGAs vs. ASICs
- Similarity
  - Susceptible to process variations
- Advantages
  - Long switching paths dampen (average out) local variation
  - Binned for speed grades to isolate global variation
  - Can be programmed repeatedly and differently during timing chip-test
- Disadvantages
  - Critical paths unknown at test time
  - Same timing model to be applied to unknown applications at unknown clock frequencies and varied conditions
  - Guard-banded timing model can be arbitrarily conservative or aggressive

Timing with Guard-banding
- A guard-band is applied to each individual node to model uncertainty in STA
- A constant guard-banded delay is µ + cσ
  - µ and σ are the nominal delay and standard deviation, respectively
  - c is constant for all circuit elements
- Guard-band cost: (T_grd / T_norm) - 1
  - T_grd: critical path delay in STA with guard-banding
  - T_norm: critical path delay in STA with the nominal timing model
- Pessimistic/optimistic for designs with longer/shorter critical paths
- Actual timing yield is analyzed by SSTA

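The guard-band definitions above can be sketched as follows. This is an illustration only: the critical path is modeled as a hypothetical chain of `n` identical elements, and the "statistical" path delay assumes purely independent variation per element (so the path sigma is √n·σ). The helper `chain_margins` and all numeric values are assumptions made for the example.

```python
import math


def guard_banded_delay(mu: float, sigma: float, c: float) -> float:
    """Per-element guard-banded delay: mu + c * sigma."""
    return mu + c * sigma


def guard_band_cost(t_grd: float, t_norm: float) -> float:
    """Relative cost of guard-banding: (T_grd / T_norm) - 1."""
    return t_grd / t_norm - 1.0


def chain_margins(n: int, mu: float, sigma: float, c: float):
    """Hypothetical critical path made of n identical elements in series.

    Returns (guard-band cost, excess of the guard-banded delay over the
    c-sigma statistical path delay).
    """
    t_norm = n * mu
    t_grd = n * guard_banded_delay(mu, sigma, c)
    # Assumption: independent variation only, so path sigma = sqrt(n) * sigma.
    t_stat = t_norm + c * math.sqrt(n) * sigma
    return guard_band_cost(t_grd, t_norm), t_grd - t_stat


cost_short, excess_short = chain_margins(n=4, mu=1.0, sigma=0.05, c=3.0)
cost_long, excess_long = chain_margins(n=16, mu=1.0, sigma=0.05, c=3.0)
```

Under these assumptions the guard-band cost is the same for both chains, but the excess margin over the statistical delay is larger for the longer chain, which is one way to read the "pessimistic for longer critical paths" remark.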
Timing with Speed-binning
- Test and eliminate local variation by testing multiple similar paths across the test chip
- Model the global variation Gaussians ΔX_i as a single ΔG_a
- Speed-binning = categorizing ΔG_a
  - All chips that fall into the same bin share the same guard-banded timing model
  - e.g., µ - σ_g / µ + σ_g / µ + 3σ_g for the fast/medium/slow bin
- STA gives the circuit delay T_bin for each bin

Yield Analysis with Speed-binning
- Yield loss due to ignored local variation
- Yield loss due to unknown critical paths
- Timing yield analysis for a bin
  - Circuit delay: T = µ + σ_Tg·ΔG_a + σ_Tl·ΔR_a
  - Bin k: [G_low(k), G_up(k)]
  - Cut-off delay: γ·T_bin(k)
  - Timing yield for bin k: [equation not recoverable from the source]
- Overall timing yield: [equation not recoverable from the source]

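A per-bin timing yield consistent with the quantities above can be sketched numerically. This is not the formula from the slide: it is a minimal sketch that assumes ΔG_a and ΔR_a are independent standard normals, defines the yield of bin k as P(T ≤ cut-off | G_low(k) ≤ ΔG_a ≤ G_up(k)), and combines bins by weighting each with its probability mass. The function names and the midpoint integration are choices made for this example.

```python
from statistics import NormalDist

STD = NormalDist()  # standard normal distribution


def bin_yield(mu, s_g, s_l, g_low, g_up, cutoff, steps=2000):
    """P(T <= cutoff | g_low <= dG <= g_up) for T = mu + s_g*dG + s_l*dR (s_l > 0)."""
    h = (g_up - g_low) / steps
    acc = 0.0
    for i in range(steps):
        g = g_low + (i + 0.5) * h  # midpoint rule over the global variable dG
        # For a fixed dG = g, the chip passes if dR <= (cutoff - mu - s_g*g) / s_l.
        acc += STD.pdf(g) * STD.cdf((cutoff - mu - s_g * g) / s_l) * h
    return acc / (STD.cdf(g_up) - STD.cdf(g_low))


def overall_yield(mu, s_g, s_l, bins):
    """bins: list of (g_low, g_up, cutoff); each bin is weighted by its probability mass."""
    return sum(
        (STD.cdf(hi) - STD.cdf(lo)) * bin_yield(mu, s_g, s_l, lo, hi, cut)
        for lo, hi, cut in bins
    )
```

For example, `bin_yield(10.0, 0.5, 0.1, -1.0, 1.0, cutoff)` can be evaluated for several cut-off values γ·T_bin(k) to see how relaxing the cut-off trades performance for yield.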
Timing-Driven Placement
- T-VPlace [Marquardt et al, FPGA 2000]
  - Simulated-annealing-based placement
  - Both wiring and timing are considered in the cost function
    - Wiring cost
    - Timing cost for a connection and for a placement solution
    - Overall cost
  - STA is performed at each annealing temperature to update critical path delay and slack

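The cost terms listed above can be sketched as follows. The slide's own equations did not survive extraction, so this sketch assumes the formulation published for T-VPlace in the cited paper: criticality = 1 - slack / D_max, connection timing cost = delay × criticality^exponent, and a normalized, weighted change in cost for each annealing move. The function names, the default trade-off `lam=0.5`, and the numbers in the usage note are illustrative assumptions; the wiring cost is taken as an input rather than computed.

```python
def static_criticality(slack: float, d_max: float) -> float:
    """Criticality of a connection from its STA slack and the critical path delay."""
    return 1.0 - slack / d_max


def connection_timing_cost(delay: float, crit: float, exponent: float) -> float:
    """Timing cost of one connection: delay weighted by criticality^exponent."""
    return delay * crit ** exponent


def placement_timing_cost(connections, exponent: float) -> float:
    """connections: iterable of (delay, criticality) pairs for the whole placement."""
    return sum(connection_timing_cost(d, c, exponent) for d, c in connections)


def delta_cost(d_timing, prev_timing, d_wiring, prev_wiring, lam=0.5):
    """Normalized change in overall cost for one move; lam trades timing vs. wiring."""
    return lam * d_timing / prev_timing + (1.0 - lam) * d_wiring / prev_wiring
```

As a usage example, a move that lowers the timing cost by 1.0 out of 10.0 but raises the wiring cost by 2.0 out of 20.0 gives `delta_cost(-1.0, 10.0, 2.0, 20.0)` equal to zero when both terms are weighted equally.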
Stochastic Placement: ST-VPlace
- Main differences between ST-VPlace and T-VPlace
  - Estimates the delay matrix in canonical form instead of just the nominal delay matrix
    - Used in SSTA for the statistical timing cost during placement
  - Performs SSTA instead of STA at each temperature in the simulated annealing framework
  - Uses statistical criticality instead of static criticality in the cost function
    - Statistical criticality of an edge/node is the probability that the edge/node is statistically timing-critical in SSTA
    - Statistical criticality exponent θ
    - Static criticality is based on slack and the longest path delay in STA

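The notion of statistical criticality above can be illustrated with a small Monte Carlo experiment. This is a sketch under stated assumptions, not the SSTA procedure used in ST-VPlace: path delays are modeled as independent Gaussians, the criticality of a path is estimated as the fraction of samples in which it is the longest, and the cost function simply reuses the delay × criticality^θ shape with the statistical quantities swapped in. All names and numbers are hypothetical.

```python
import random


def path_criticality(paths, samples=20000, seed=0):
    """paths: list of (mean, sigma) for independent Gaussian path delays.

    Returns, per path, the fraction of samples in which that path is the longest.
    """
    rng = random.Random(seed)
    wins = [0] * len(paths)
    for _ in range(samples):
        delays = [rng.gauss(m, s) for m, s in paths]
        wins[delays.index(max(delays))] += 1
    return [w / samples for w in wins]


def statistical_timing_cost(mean_delay: float, stat_crit: float, theta: float) -> float:
    # Assumption: same shape as a criticality-weighted delay cost, with
    # statistical criticality and the exponent theta swapped in.
    return mean_delay * stat_crit ** theta


# A critical path, a near-critical path, and a clearly non-critical path.
crit = path_criticality([(10.0, 0.3), (9.8, 0.3), (8.0, 0.3)])
```

In this example the second path is only 2% shorter than the first in nominal delay, yet it is the longest path in a substantial fraction of samples, which STA slack alone would not reveal.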
Experimental Settings
- Variation and device setting
  - 10% as 3 sigma for global and local variation in V_th and L_eff at the ITRS 65 nm technology node
  - Min-ED device setting: V_dd = 0.9 V, V_th = 0.3 V [Wong et al, ICCAD ’05]
- Architecture similar to Altera’s Stratix™
  - Island-style FPGA architecture
  - Cluster size 10 and LUT size 4
  - 60% length-4 and 40% length-8 wires in the interconnect
  - 1.2X the routing channel width obtained by T-VPlace
- Yield loss in failed parts per 10K parts (pp10K)
- Evaluated using MCNC and QUIP designs

Cost Function Tuning
- Perform ST-VPlace and SSTA to obtain the mean delay and standard deviation over all designs for each statistical criticality exponent θ
- θ = 0.3 leads to
  - the smallest mean and deviation
  - the highest timing yield

T-VPlace vs. ST-VPlace
- Some correlation between mean delay and deviation
- ST-VPlace achieves
  - smaller mean delay for all designs
  - smaller variance for most designs
  - a higher timing yield

Statistical Criticality vs. Static Criticality
- Statistical criticality vs. static criticality
  - Statistical criticality does not increase monotonically with static criticality
  - Statistical criticality may vary significantly for similar static criticality
- ST-VPlace considers statistical criticality explicitly
  - Optimizes near-critical paths under variations
  - Leads to a higher timing yield

Impact on Path-length Distribution
- The path-length distribution in ST-VPlace is almost on top of that in T-VPlace
- ST-VPlace reduces the top-10% near-critical paths from 1.3% to 0.8%
  - It has a larger nominal delay
  - But it has a smaller mean and variance, and a higher timing yield

Effect of Guard-banding
[Figure: two panels, "Variation (3 sigma): global 5%, local 5%" and "Variation (3 sigma): global 20%, local 20%". Axes: guard-band factor, guard-band cost, yield loss (pp10K). Series: guard-band cost, T-VPlace yield loss, ST-VPlace yield loss.]
- ST-VPlace obtains a higher timing yield under varied variations and guard-band factors
  - Larger gain with smaller variation
  - Similar gain with varied local variation when no global variation is considered
- Yield loss reduced by 3.4X with 3-sigma guard-banding under 10%/10% variations

Effect of Speed-binning
- Fast/Medium/Slow = 40%/30%/29.999%
  - Discard the slowest 0.001% (0.1 pp10K) of chips
- T_bin may be relaxed by γ for a higher timing yield
- Yield loss is due to local variation and unknown critical paths
- ST-VPlace consistently achieves a higher timing yield
  - Yield loss is reduced by 25X with γ = 5%

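The bin fractions above translate into cut points on the global variation. The sketch below assumes the global variation ΔG is a standard normal and that larger ΔG means a slower chip; under those assumptions the upper boundary of each bin is the quantile at the cumulative fraction. The function name `bin_boundaries` is hypothetical.

```python
from statistics import NormalDist


def bin_boundaries(fractions):
    """Upper boundary of each consecutive bin, as quantiles of a standard-normal dG."""
    std = NormalDist()
    bounds, cumulative = [], 0.0
    for fraction in fractions:
        cumulative += fraction
        bounds.append(std.inv_cdf(cumulative))
    return bounds


# Fast / medium / slow bins; chips beyond the last boundary (slowest 0.001%) are discarded.
fast_up, medium_up, slow_up = bin_boundaries([0.40, 0.30, 0.29999])
```

With these fractions the fast bin ends slightly below the mean of ΔG, the medium bin ends about half a sigma above it, and the discard threshold sits a little beyond four sigma.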
Conclusions and Discussions
- Conclusions
  - Quantified the effects of guard-banding and speed-binning with variations
  - Developed a novel stochastic placer
  - Evaluated with MCNC and QUIP designs, reduced yield loss by
    - 3.4X with guard-banding
    - 25X with speed-binning
- Ongoing and future work
  - Extend timing models with spatially correlated variations
  - Develop stochastic physical synthesis algorithms, e.g., clustering, routing, re-timing