Stochastic Physical Synthesis for FPGAs with Pre-routing Interconnect Uncertainty and Process Variation Yan Lin and Lei He EE Department, UCLA


Stochastic Physical Synthesis for FPGAs with Pre-routing Interconnect Uncertainty and Process Variation Yan Lin and Lei He EE Department, UCLA Partially supported by NSF and UC Micro sponsored by Actel

Motivation
- Variations
  - Pre-routing interconnect uncertainty
  - Process variation
- Impact
  - Any near-critical path can become statistically timing critical
  - STA ignores near-criticality
- Related work for FPGAs
  - Chipwise placement [Cheng, FPL’06]
  - Stochastic placement [Lin, FPL’06]
  - Stochastic routing [Sivaswamy, FPGA’07]
  - Stochastic physical synthesis and the interaction between clustering, placement and routing have not been studied for FPGAs

Outline
- Preliminaries
- Stochastic Clustering
- Stochastic Placement
- Stochastic Routing
- Interaction between Clustering, Placement and Routing
- Conclusions


Model of Variations
- Pre-routing interconnect uncertainty
  - Modeled as an independent Gaussian distribution
  - Standard deviation estimated from the post-routing delay distribution
- Process variation, also modeled as Gaussian
  - Threshold voltage (Vth)
  - Effective channel length (Leff)
  - These variation sources are modeled as independent Gaussians
- Delay with variations
  - First-order canonical form: one random term models process variation and one models interconnect uncertainty, with standard deviations as their coefficients
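The symbols of the canonical form did not survive in this transcript, so the following is only a minimal sketch of what a first-order canonical delay could look like. It assumes a simplified split into one shared unit Gaussian (standing in for process variation common to all elements) and one independent unit Gaussian per element (standing in for interconnect uncertainty). The class name `CanonicalDelay` and the fields `a_shared` and `a_indep` are hypothetical and not taken from the slides.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class CanonicalDelay:
    """d = mean + a_shared * X + a_indep * R, with X and R unit Gaussians.

    X is one variable shared by every delay (assumed stand-in for process
    variation); R is private to each delay (assumed stand-in for
    interconnect uncertainty). Both names are illustrative.
    """
    mean: float
    a_shared: float
    a_indep: float

    def __add__(self, other):
        # Series connection: means add, shared sensitivities add linearly,
        # independent terms combine as a root sum of squares.
        return CanonicalDelay(
            self.mean + other.mean,
            self.a_shared + other.a_shared,
            math.hypot(self.a_indep, other.a_indep),
        )

    @property
    def sigma(self):
        # X and R are independent, so their variances add.
        return math.hypot(self.a_shared, self.a_indep)


# A LUT followed by a not-yet-routed wire (made-up numbers).
lut = CanonicalDelay(mean=1.0, a_shared=0.03, a_indep=0.0)
wire = CanonicalDelay(mean=2.0, a_shared=0.04, a_indep=0.30)
path = lut + wire
```

Keeping the shared and independent terms separate is what lets a path delay retain its correlation with other paths, which a single lumped sigma would lose.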

Synthesis Flow

Experimental Settings
- Variation and device settings
  - 10%/10%/6% as 3 sigma for global/spatial/local variation in Vth and Leff
  - ITRS 65nm technology node
- Island-style FPGA architecture
  - Cluster size 10 and LUT size 4
  - 60% length-4 and 40% length-8 wires in the interconnect
- Yield loss measured in failed parts per 10K parts (pp10K)
  - 2.5 sigma guard-banded delay as the cut-off delay
  - Evaluated using MCNC designs
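The yield-loss metric can be illustrated with a short sketch. It assumes the chip delay is exactly Gaussian and that the cut-off is the mean plus 2.5 sigma of a reference flow; both are simplifying assumptions, and the function names are hypothetical. The numbers reported in the slides come from the authors' own evaluation, not from this formula.

```python
import math


def guard_banded_cutoff(mean, sigma, k=2.5):
    """Cut-off delay: mean plus k sigma of the reference delay distribution."""
    return mean + k * sigma


def gaussian_yield_loss_pp10k(mean, sigma, cutoff):
    """Failed parts per 10K parts if delay ~ N(mean, sigma^2).

    A part fails when its delay exceeds the cut-off, so the loss is the
    upper tail probability of the Gaussian scaled to 10,000 parts.
    """
    z = (cutoff - mean) / sigma
    tail = 0.5 * math.erfc(z / math.sqrt(2.0))
    return 1e4 * tail


# Made-up delays in ns: a reference flow and a flow with a tighter distribution,
# both judged against the cut-off derived from the reference flow.
cutoff = guard_banded_cutoff(mean=10.0, sigma=0.5)
reference_loss = gaussian_yield_loss_pp10k(10.0, 0.5, cutoff)
improved_loss = gaussian_yield_loss_pp10k(9.5, 0.47, cutoff)
```

Because the cut-off is fixed by the reference flow, a flow that lowers either the mean or the sigma moves probability mass below the cut-off and lowers the pp10K figure.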


Stochastic Clustering ST-VPack
- Based on T-VPack [Betz, FPGA book]
  - An iterative approach: select a seed BLE for a new cluster, then pack BLEs into the current cluster
  - STA with a constant delay model to calculate slack
- ST-VPack performs SSTA
  - Statistical criticality of an edge/node is the probability of this edge/node being timing critical with variations
  - Statistical timing cost of BLE B
- With statistical criticality
  - Better seed BLE selection
  - Better candidate BLE selection for the current cluster
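The definition of statistical criticality can be made concrete with a small Monte Carlo sketch. This is not the SSTA used by ST-VPack; sampling is swapped in here only because it is easy to follow. The graph, the delay numbers and the function name `statistical_criticality` are all hypothetical.

```python
import random


def statistical_criticality(edges, order, sink, trials=2000, seed=0):
    """Estimate, per edge, the probability of lying on the longest path.

    edges: {(u, v): (mean, sigma)} with independent Gaussian edge delays.
    order: all nodes in topological order; sink: the timing end point.
    """
    rng = random.Random(seed)
    hits = {e: 0 for e in edges}
    for _ in range(trials):
        delay = {e: rng.gauss(m, s) for e, (m, s) in edges.items()}
        arrival = {n: 0.0 for n in order}
        pred = {}
        for v in order:
            # Predecessors are final already because of the topological order.
            for (u, w), d in delay.items():
                if w == v and (v not in pred or arrival[u] + d > arrival[v]):
                    arrival[v] = arrival[u] + d
                    pred[v] = u
        # Walk the critical path of this sample back from the sink.
        v = sink
        while v in pred:
            hits[(pred[v], v)] += 1
            v = pred[v]
    return {e: h / trials for e, h in hits.items()}


# Two paths converge on c: a->c is critical under STA, b->c is near-critical.
edges = {("a", "c"): (10.0, 1.0), ("b", "c"): (9.5, 1.0), ("c", "d"): (2.0, 0.1)}
crit = statistical_criticality(edges, order=["a", "b", "c", "d"], sink="d")
```

In this example a deterministic STA would mark only `a->c` as critical, while the sampled criticality of `b->c` is well above zero, which is the near-criticality effect named in the motivation.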

The Impact of the Combination of Two Uncertainty Sources
- Timing gain is mainly due to modeling interconnect uncertainty
  - Modeling interconnect uncertainty leads to a better delay distribution than modeling process variation
  - Considering both does not yield much further gain
[Chart: Tmean and Tsigma on a percentage axis for three cases: process variation, interconnect uncertainty, both]

Interconnect Uncertainty vs. Process Variation in Clustering
- Interconnect uncertainty leads to a more significant delay variance in clustering
[Figure panels: with process variation; with interconnect uncertainty]

Comparison between T-VPack and ST-VPack
- ST-VPack on average reduces
  - mean delay by 5.0% (up to 13.0%)
  - standard deviation by 6.4% (up to 31.8%)
  - yield loss from 50pp10K to 9pp10K
- ST-VPack has virtually no wire length, area or runtime overhead


Pre-routing Interconnect Uncertainty vs. Process Variation in Placement
- Process variation leads to a more significant delay variance in placement
  - Considering only process variation is sufficient
[Figure panels: with process variation; with interconnect uncertainty]

Stochastic Placement ST-VPlace
- Stochastic placement developed in [Lin, FPL’06]
  - Based on T-VPlace [Marquardt, ISFPGA’00]
  - Replace STA with SSTA
  - Replace static criticality with statistical criticality
- Main improvement
  - Consider spatially correlated variation with PCA
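As a rough sketch of what swapping the criticality means, the timing cost of T-VPlace weights each connection delay by its criticality raised to an exponent, where static criticality is one minus slack over the critical path delay. The sketch below assumes a stochastic variant simply feeds a probability-based criticality into the same cost; that substitution is an assumption here, and the function names are hypothetical.

```python
def static_criticality(slack, d_max):
    """T-VPlace style criticality: 1 for zero slack, falling as slack grows."""
    return 1.0 - slack / d_max


def timing_cost(connections, crit_exponent=8.0):
    """Sum of delay * criticality**exponent over (delay, criticality) pairs.

    The criticality can be the static value above or, under the assumption
    made in this sketch, a statistical criticality between 0 and 1.
    """
    return sum(delay * crit ** crit_exponent for delay, crit in connections)


# Made-up connections: (delay in ns, criticality).
connections = [(2.0, static_criticality(0.0, 10.0)),
               (4.0, static_criticality(5.0, 10.0))]
cost = timing_cost(connections, crit_exponent=2.0)
```

A large exponent makes the cost concentrate on the most critical connections, so the choice of criticality measure directly decides which connections the placer works hardest to shorten.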

Comparison between T-VPlace and ST-VPlace
- ST-VPlace on average reduces
  - mean delay by 4.0% (up to 14.2%)
  - standard deviation by 6.1% (up to 22.7%)
  - yield loss from 50pp10K to 12pp10K
  - with virtually no wire overhead
- On the other hand, ST-VPlace takes 3.1X runtime


Stochastic Routing ST-PathFinder
- Based on PathFinder [Betz, FPGA book]
  - An iterative maze router, with congestion allowed
  - Considers both timing and wiring costs
- Interconnect estimation in routing
  - Occurs when predicting the delay to the target sink
  - Has the highest accuracy
- ST-PathFinder performs SSTA
  - The new statistical cost function for node n gives a better tradeoff between timing and wiring costs
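The statistical cost function itself is not preserved in this transcript. For orientation only, the sketch below shows the usual timing-driven PathFinder node cost, which blends delay and congestion by the criticality of the connection being routed, and assumes a stochastic router would pass a statistical criticality in place of the static one. The function and parameter names are hypothetical.

```python
def node_cost(crit, delay, base_cost, hist_cost, present_cost):
    """Timing-driven PathFinder style cost of using routing node n.

    crit: criticality of the connection in [0, 1]; in a stochastic variant
          this is assumed to be a statistical criticality.
    delay: delay contributed by node n.
    base_cost, hist_cost, present_cost: congestion terms of node n.
    """
    if not 0.0 <= crit <= 1.0:
        raise ValueError("criticality must lie in [0, 1]")
    congestion = base_cost * hist_cost * present_cost
    # Critical connections pay mostly for delay, others mostly for congestion.
    return crit * delay + (1.0 - crit) * congestion


# Made-up node: 3 ns of delay and a congestion cost of 1 * 2 * 4 = 8.
critical = node_cost(1.0, 3.0, 1.0, 2.0, 4.0)
non_critical = node_cost(0.0, 3.0, 1.0, 2.0, 4.0)
near_critical = node_cost(0.5, 3.0, 1.0, 2.0, 4.0)
```

Under this form, a criticality that reflects variation changes which connections are allowed to take fast but congested resources, which is one way to read the tradeoff between timing and wiring costs named on the slide.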

Comparison between PathFinder and ST-PathFinder
- ST-PathFinder on average reduces
  - mean delay by 1.4% (up to 7.8%)
  - standard deviation by 0.7% (up to 5.2%)
  - yield loss from 50pp10K to 35pp10K
  - with no runtime overhead
- ST-PathFinder also reduces wire length by 4.5% on average


Interaction between Clustering, Placement and Routing
- The stochastic flow reduces yield loss from 50 to 5pp10K, but takes 3.0X runtime
- Timing gain is mainly due to clustering and placement, but with overlap
- Stochastic clustering + deterministic P&R is a good flow
  - Significant timing gains and slightly less runtime
- Deterministic clusterer and placer + stochastic router is a good flow
  - Significant wiring gains and less runtime

D = deterministic, S = stochastic

Clusterer            D      S      D      D      S      S      D      S
Placer               D      D      S      D      S      D      S      S
Router               D      D      D      S      D      S      S      S
Tnorm                …      …      -3.3%  -1.4%  -6.4%  -4.1%  -3.6%  -6.3%
Tmean                …      -5.0%  -4.0%  -1.4%  -5.9%  -4.7%  -4.0%  -6.2%
Tsigma               …      -6.4%  -6.1%  -0.7%  -8.8%  -6.1%  -6.3%  -7.5%
Yield loss (pp10K)   50     9      12     35     …      …      …      5
Runtime              1X     0.99X  3.1X   0.96X  3.0X   0.97X  3.1X   3.0X
Wire                 1X     0.8%   1.3%   -4.5%  3.2%   -3.4%  …      -1.6%

Conclusions
- The timing gain is mainly due to the clusterer and placer
  - modeling interconnect uncertainty for clustering
  - considering process variation for placement
- The stochastic flow reduces
  - yield loss from 50 to 5pp10K
  - mean delay by 6.2% and standard deviation by 7.5%
  - but takes 3X runtime
- Deterministic clusterer and placer + stochastic router
  - reduces wire length by 4.5%
  - also runs slightly faster than the deterministic flow
- Stochastic clusterer + deterministic P&R reduces
  - yield loss from 50 to 9pp10K
  - mean delay by 5.0% and standard deviation by 6.4%
  - also runs slightly faster than the deterministic flow