© 2005 Altera Corporation © 2006 Altera Corporation Placement and Timing for FPGAs Considering Variations Yan Lin 1, Mike Hutton 2 and Lei He 1 1 EE Department,

Presentation transcript:

© 2005 Altera Corporation © 2006 Altera Corporation
Placement and Timing for FPGAs Considering Variations
Yan Lin (1), Mike Hutton (2) and Lei He (1)
(1) EE Department, UCLA; (2) Altera Corporation, San Jose

Outline
- Preliminaries and Motivation
- Timing with Guard-banding/Speed-binning
- Stochastic Placement
- Experimental Results
- Conclusions and Discussions

Background
- Process variations
  - More and more significant in nanometer technology
  - Affect timing and power in both ASICs and FPGAs
- Delay with variations
  - Variation sources: threshold voltage (V_th) and effective channel length (L_eff)
  - Independent Gaussians for global/local variations
  - First-order canonical form
- Related work
  - FPGA device and architecture evaluation with process variations [Wong et al, ICCAD’05]
  - SSTA [Chang et al, ICCAD’03] [Visweswariah et al, DAC’04]
  - Statistical criticality analysis [Visweswariah et al, DAC’04] [Li et al, ICCAD’05] [Xiong et al, TAU’06]
  - Statistical gate sizing for ASICs [Guthaus et al, ICCAD’05] [Sinha et al, ICCAD’05]
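For reference, the first-order canonical form named under "Delay with variations" is usually written as below in the cited SSTA literature. The slide does not preserve the formula, so the coefficient names a_0, a_i and a_{n+1} are an assumption borrowed from that convention; ΔX_i stands for the global variation sources and ΔR_a for the independent local term.

```latex
d = a_0 + \sum_{i=1}^{n} a_i\,\Delta X_i + a_{n+1}\,\Delta R_a,
\qquad \Delta X_i,\ \Delta R_a \sim \mathcal{N}(0,1)\ \text{independent}

\mathrm{E}[d] = a_0, \qquad
\mathrm{Var}[d] = \sum_{i=1}^{n} a_i^2 + a_{n+1}^2
```

Under this form the nominal delay is the mean, and the global and local contributions to the variance simply add because the sources are independent.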

Motivation
- STA is inaccurate under variation
  - Slack ignores near-criticality
  - Near-critical paths may be statistically timing-critical
- Deterministic timing-driven placer (e.g., T-VPlace in VPR)
  - Based on STA
  - Optimizes for the static critical path
  - May not optimize timing under variation
- A stochastic placer is needed under variation
  - Same placement for one application across chips

Pre-routing Interconnect Uncertainty vs. Process Variation in Placement
- Existing timing-driven placer
  - Leverages timing slack in STA
  - With interconnect delay estimated
  - May incur uncertainty along with process variation
- Clearly, process variation leads to a more significant delay variance in the placement stage
  - Therefore, only process variation is considered for placement


Uniqueness for Timing in FPGAs
- FPGAs vs. ASICs
  - Similarity
    - Susceptible to process variations
  - Advantages
    - Long switching paths dampen (average out) local variation
    - Binned into speed grades to isolate global variation
    - Can be programmed repeatedly and differently during timing chip-test
  - Disadvantages
    - Critical paths unknown at test time
    - Same timing model to be applied to unknown applications at unknown clock frequency and varied conditions
    - Guard-banded timing model can be arbitrarily conservative or aggressive

Timing with Guard-banding
- A guard-band is applied to each individual node to model uncertainty in STA
- A constant guard-banded delay is µ + cσ
  - µ and σ are the nominal delay and standard deviation, respectively
  - c is constant for all circuit elements
- Guard-band cost: (T_grd / T_norm) − 1
  - T_grd: critical path delay in STA with guard-banding
  - T_norm: critical path delay in STA with the nominal timing model
  - Pessimistic/optimistic for designs with longer/shorter critical paths
  - Actual timing yield analyzed by SSTA
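The guard-band cost and the remark about longer critical paths can be illustrated with a minimal sketch. It assumes a critical path is just a chain of (µ, σ) elements with purely local, independent variation, which is a simplification and not the deck's full model; the function names and the example numbers are hypothetical.

```python
import math

def nominal_delay(stages):
    """T_norm: sum of nominal delays mu along the path."""
    return sum(mu for mu, _ in stages)

def guard_banded_delay(stages, c):
    """T_grd: every element is padded to mu + c*sigma, then summed as in STA."""
    return sum(mu + c * sigma for mu, sigma in stages)

def guard_band_cost(stages, c):
    """Guard-band cost (T_grd / T_norm) - 1."""
    return guard_banded_delay(stages, c) / nominal_delay(stages) - 1.0

def statistical_delay(stages, c):
    """mu_path + c*sigma_path, ASSUMING independent per-element variation,
    so element variances (not standard deviations) add along the path."""
    sigma_path = math.sqrt(sum(sigma ** 2 for _, sigma in stages))
    return nominal_delay(stages) + c * sigma_path

def pessimism(stages, c):
    """Ratio of the guard-banded delay to the statistical c-sigma delay."""
    return guard_banded_delay(stages, c) / statistical_delay(stages, c)

# Hypothetical paths: identical elements, different lengths.
short_path = [(1.0, 0.05)] * 4
long_path = [(1.0, 0.05)] * 16

for name, path in (("short", short_path), ("long", long_path)):
    print(name, round(guard_band_cost(path, 3.0), 3), round(pessimism(path, 3.0), 3))
```

Both paths pay the same guard-band cost, but the longer one overshoots its statistical c-sigma delay by more, because local variation averages out along the chain. With fully correlated (global) variation the two delays would coincide, so this shows only one of the mechanisms behind the slide's remark.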

Timing with Speed-binning
- Test and eliminate local variation by testing multiple similar paths across the test chip
- Model the global variation Gaussians ΔX_i as a single ΔG_a
- Speed-binning = categorizing ΔG_a
- All chips that fall into the same bin share the same guard-banded timing model
  - e.g., µ − σ_g / µ + σ_g / µ + 3σ_g for the fast/medium/slow bin
  - STA gives the circuit delay T_bin for each bin

Yield Analysis with Speed-binning
- Yield loss due to ignored local variation
- Yield loss due to unknown critical paths
- Timing yield analysis for a bin
  - Circuit delay: T = µ + σ_Tg ΔG_a + σ_Tl ΔR_a
  - Bin k: [G_low(k), G_up(k)]
  - Cut-off delay: γ T_bin(k)
  - Timing yield for bin k is [equation not preserved in the transcript]
- The overall timing yield is [equation not preserved in the transcript]
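Since the yield formulas did not survive in the transcript, the following Monte Carlo sketch shows one plausible reading of the setup rather than the authors' exact expressions. It assumes that the yield of bin k is the probability that T ≤ γ·T_bin(k) given that ΔG_a falls in bin k, that T_bin(k) is taken at the bin's upper bound (µ + σ_Tg·G_up(k)), and that γ is a multiplier (1.05 for a 5% relaxation). The function name, bin limits and numeric values are hypothetical.

```python
import random

def binned_timing_yield(mu, sigma_tg, sigma_tl, bins, gamma, n=50_000, seed=0):
    """Estimate per-bin and overall timing yield by sampling
    T = mu + sigma_tg*dG + sigma_tl*dR with dG, dR ~ N(0, 1).

    bins: list of (g_low, g_up) limits on dG; chips outside every bin are discarded.
    ASSUMPTION: T_bin(k) = mu + sigma_tg*g_up(k), cut-off delay = gamma*T_bin(k).
    """
    rng = random.Random(seed)
    counts = [0] * len(bins)
    passes = [0] * len(bins)
    for _ in range(n):
        dg = rng.gauss(0.0, 1.0)  # global variation, shared by the whole chip
        dr = rng.gauss(0.0, 1.0)  # local variation, not visible to binning
        t = mu + sigma_tg * dg + sigma_tl * dr
        for k, (g_low, g_up) in enumerate(bins):
            if g_low <= dg < g_up:
                t_bin = mu + sigma_tg * g_up
                counts[k] += 1
                if t <= gamma * t_bin:
                    passes[k] += 1
                break
    per_bin = [p / c if c else float("nan") for p, c in zip(passes, counts)]
    binned = sum(counts)
    overall = sum(passes) / binned if binned else float("nan")
    return per_bin, overall

# Hypothetical fast/medium/slow bins matching mu - s_g / mu + s_g / mu + 3 s_g.
BINS = [(float("-inf"), -1.0), (-1.0, 1.0), (1.0, 3.0)]

for gamma in (1.00, 1.05):
    per_bin, overall = binned_timing_yield(10.0, 0.3, 0.1, BINS, gamma)
    print(gamma, [round(y, 4) for y in per_bin], round(overall, 4))
```

In this sketch the only yield loss comes from the local term σ_Tl·ΔR_a, which binning on ΔG_a cannot see; setting `sigma_tl` to zero removes the loss entirely, and relaxing γ trades speed for yield.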


Timing-Driven Placement
- T-VPlace [Marquardt et al, FPGA 2000]
- Simulated-annealing-based placement
- Both wiring and timing are considered in the cost function
  - Wiring cost [equation not preserved in the transcript]
  - Timing cost for a connection and for a placement solution [equations not preserved in the transcript]
  - Overall cost [equation not preserved in the transcript]
- STA is performed at each annealing temperature to update critical path delay and slack
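The cost terms listed above can be sketched as follows, using the forms generally given for T-VPlace in the cited FPGA 2000 paper: criticality 1 − slack/D_max, connection timing cost delay × criticality^exponent, a bounding-box wiring cost, and a normalized, λ-weighted cost change. The slide's own equations are missing, so this is a hedged reconstruction; the function names and the data layout are hypothetical.

```python
def criticality(slack, d_max):
    """Static criticality of a connection: 1 - slack / D_max."""
    return 1.0 - slack / d_max

def timing_cost(connections, d_max, crit_exp):
    """Sum over connections of delay * criticality**crit_exp.
    connections: iterable of (delay, slack) pairs."""
    return sum(delay * criticality(slack, d_max) ** crit_exp
               for delay, slack in connections)

def wiring_cost(nets):
    """Bounding-box wiring cost: sum over nets of q * (bb_x + bb_y).
    nets: iterable of (q, bb_x, bb_y), q being the fanout correction factor."""
    return sum(q * (bb_x + bb_y) for q, bb_x, bb_y in nets)

def delta_cost(d_timing, d_wiring, prev_timing, prev_wiring, lam=0.5):
    """Normalized cost change of a proposed move; lam trades timing vs. wiring."""
    return lam * d_timing / prev_timing + (1.0 - lam) * d_wiring / prev_wiring

# Hypothetical numbers: one critical and one relaxed connection.
conns = [(2.0, 0.0), (3.0, 5.0)]
print(timing_cost(conns, d_max=10.0, crit_exp=2.0))
print(wiring_cost([(1.0, 4, 3), (1.2, 2, 2)]))
print(delta_cost(-1.0, 2.0, prev_timing=10.0, prev_wiring=20.0))
```

The exponent sharpens the weighting toward connections with little slack, which is also why a purely slack-based cost can overlook near-critical connections once delays vary.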

Stochastic Placement
- ST-VPlace: main differences between ST-VPlace and T-VPlace
  - Estimates the delay matrix in canonical form instead of just the nominal delay matrix
    - Used in SSTA for the statistical timing cost during placement
  - Performs SSTA instead of STA at each temperature in the simulated annealing framework
  - Uses statistical criticality instead of static criticality in the cost function
    - Statistical criticality for an edge/node is the probability that this edge/node is statistically timing-critical in SSTA
    - Statistical criticality exponent θ
    - Static criticality is based on slack and the longest path delay in STA
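To make the difference between the two criticalities concrete, here is a small Monte Carlo sketch. It works at path level on two hypothetical parallel paths with independent Gaussian edge delays, whereas the deck defines statistical criticality per edge/node and computes it inside SSTA; sampling is swapped in here only because it is easy to verify.

```python
import random

def static_criticality(paths):
    """1 - slack / D_max per path, using nominal (mean) delays only."""
    nominal = {name: sum(mu for mu, _ in edges) for name, edges in paths.items()}
    d_max = max(nominal.values())
    return {name: 1.0 - (d_max - d) / d_max for name, d in nominal.items()}

def statistical_criticality(paths, n=20_000, seed=0):
    """Fraction of sampled chips on which each path is the longest one.
    ASSUMPTION: edge delays are independent Gaussians N(mu, sigma)."""
    rng = random.Random(seed)
    wins = {name: 0 for name in paths}
    for _ in range(n):
        delays = {name: sum(rng.gauss(mu, sigma) for mu, sigma in edges)
                  for name, edges in paths.items()}
        wins[max(delays, key=delays.get)] += 1
    return {name: w / n for name, w in wins.items()}

# Hypothetical example: A is nominally critical, B is 2% shorter but noisier.
PATHS = {
    "A": [(5.0, 0.1), (5.0, 0.1)],
    "B": [(4.9, 0.3), (4.9, 0.3)],
}

print(static_criticality(PATHS))
print(statistical_criticality(PATHS))
```

Path B has positive slack in STA, yet it ends up as the longest path on a substantial fraction of sampled chips, which is the situation a slack-driven placer does not account for.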


Experimental Settings
- Variation and device setting
  - 10% as 3 sigma for global and local variation in V_th and L_eff at the ITRS 65nm technology node
  - Min-ED device setting: V_dd = 0.9 V, V_th = 0.3 V [Wong et al, ICCAD’05]
- Architecture similar to Altera’s Stratix™
  - Island-style FPGA architecture
  - Cluster size 10 and LUT size 4
  - 60% length-4 and 40% length-8 wires in interconnects
  - 1.2X routing channel width obtained by T-VPlace
- Yield loss in failed parts per 10K parts (pp10K)
- Evaluated using MCNC and QUIP designs

Cost Function Tuning
- Perform ST-VPlace and SSTA to obtain mean delay and standard deviation over all designs for each statistical criticality exponent θ
- θ = 0.3 leads to the smallest mean and deviation
  - The highest timing yield

T-VPlace vs. ST-VPlace
- Some correlation between mean delay and deviation
- ST-VPlace achieves
  - Smaller mean delay for all designs
  - Smaller variance for most designs
  - Hence a higher timing yield

Statistical Criticality vs. Static Criticality
- Statistical criticality vs. static criticality
  - Statistical criticality does not increase monotonically with static criticality
  - Statistical criticality may vary significantly for similar static criticality
- ST-VPlace considers statistical criticality explicitly
  - Optimizes near-critical paths under variations
  - Leads to a higher timing yield

Impact on Path-length Distribution
- Path-length distribution in ST-VPlace is almost on top of that in T-VPlace
- ST-VPlace reduces top 10% near-critical paths from 1.3% to 0.8%
  - Although it has a larger nominal delay
  - It has a smaller mean and variance, hence a higher timing yield


Effect of Guard-banding
[Figure: guard-band cost and yield loss (pp10K) versus guard-band factor for T-VPlace and ST-VPlace; one panel for 3-sigma variation of 5% global / 5% local, one for 20% global / 20% local]
- ST-VPlace obtains a higher timing yield under varied variations and guard-band factors
  - Larger gain with smaller variation
  - Similar gain with varied local variation when no global variation is considered
- Yield loss reduced by 3.4X with 3-sigma guard-banding under 10%/10% variations

Effect of Speed-binning
- Fast/Medium/Slow = 40%/30%/29.999%
  - Discard the slowest 0.001% (0.1 pp10K) chips
- T_bin may be relaxed by γ for a higher timing yield
- Yield loss is due to local variation and unknown critical paths
- ST-VPlace consistently achieves a higher timing yield
- Yield loss is reduced by 25X with γ = 5%
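The bin populations above translate into limits on the global variation variable. The sketch below assumes ΔG_a is a standard normal and that bins are filled from the fastest (lowest ΔG_a) to the slowest chip; the function name is hypothetical, and the deck does not state how its own bin limits were derived.

```python
from statistics import NormalDist

def bin_upper_bounds(fractions):
    """Upper dG limit (in sigma units) of each bin, fastest bin first.

    fractions: share of all chips in each bin; whatever is left over
    (here the slowest 0.001%) lies above the last limit and is discarded.
    ASSUMPTION: dG is a standard normal variable.
    """
    nd = NormalDist()  # mean 0, standard deviation 1
    bounds = []
    cumulative = 0.0
    for fraction in fractions:
        cumulative += fraction
        bounds.append(nd.inv_cdf(cumulative))
    return bounds

bounds = bin_upper_bounds([0.40, 0.30, 0.29999])
for name, bound in zip(("fast", "medium", "slow"), bounds):
    print(f"{name}: dG < {bound:.3f} sigma")
```

Under these assumptions the fast and medium limits sit well within one sigma of nominal, while the slow bin extends past four sigma before chips are discarded.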

Conclusions and Discussions
- Conclusions
  - Quantified the effects of guard-banding and speed-binning with variations
  - Developed a novel stochastic placer
  - Evaluated with MCNC and QUIP designs, reduced yield loss by
    - 3.4X with guard-banding
    - 25X with speed-binning
- Ongoing and future work
  - Extend timing models with spatially correlated variations
  - Develop stochastic physical synthesis algorithms, e.g., clustering, routing, re-timing