Stochastic Physical Synthesis for FPGAs with Pre-routing Interconnect Uncertainty and Process Variation. Yan Lin and Lei He, EE Department, UCLA. Partially supported by NSF and UC Micro sponsored by Actel.
Motivation: Variations include pre-routing interconnect uncertainty and process variation. Impact: any near-critical path may be statistically timing critical, yet STA ignores near-criticality. Related work for FPGAs: chipwise placement [Cheng, FPL’06]; stochastic placement [Lin, FPL’06]; stochastic routing [Sivaswamy, FPGA’07]. Stochastic physical synthesis and the interaction between its stages have not been studied for FPGAs.
Outline: Preliminaries; Stochastic Clustering; Stochastic Placement; Stochastic Routing; Interaction between Clustering, Placement and Routing; Conclusions.
Model of Variations: Pre-routing interconnect uncertainty is modeled as an independent Gaussian distribution, with its standard deviation estimated from the post-routing delay distribution. Process variations in threshold voltage (V_th) and effective channel length (L_eff) are likewise modeled as independent Gaussians. Delay with variations is expressed in first-order canonical form. [Equation not preserved in the source; its callouts mark one term as modeling process variation, another as modeling interconnect uncertainty, and the coefficients as standard deviations.]
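As a reading aid, one common way to write a first-order canonical delay is sketched below. The symbols d_0, a_i, X_i, b and R are assumptions chosen for illustration and are not taken from the slide: X_i are independent standard Gaussians for the process-variation sources (such as V_th and L_eff), and R is an independent standard Gaussian for the pre-routing interconnect uncertainty, so that a_i and b act as standard deviations.

```latex
d = d_0 + \sum_{i} a_i X_i + b R,
\qquad X_i,\; R \sim \mathcal{N}(0, 1) \text{ independent}

\mathrm{E}[d] = d_0,
\qquad \mathrm{Var}[d] = \sum_{i} a_i^2 + b^2
```

Under this assumed form, the sum of two delays is again in canonical form, which is what makes block-based SSTA practical.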
Synthesis Flow
Experimental Settings: Variation and device settings: 10%/10%/6% as 3 sigma for global/spatial/local variation in V_th and L_eff; ITRS 65nm technology node. Island-style FPGA architecture: cluster size 10 and LUT size 4; 60% length-4 and 40% length-8 wires in the interconnect. Yield loss is measured in failed parts per 10K parts (pp10K), with the 2.5-sigma guard-banded delay as the cut-off delay. Evaluated using MCNC designs.
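The yield-loss metric can be illustrated with a minimal Python sketch. It assumes the cut-off is the mean plus 2.5 standard deviations of a reference delay distribution, which the slide does not spell out, and that yield loss is estimated from delay samples (for example, Monte Carlo samples). The function names `cutoff_delay` and `yield_loss_pp10k` are hypothetical.

```python
import random
import statistics


def cutoff_delay(samples, k=2.5):
    """Guard-banded cut-off (assumed form): mean + k * standard deviation."""
    return statistics.fmean(samples) + k * statistics.pstdev(samples)


def yield_loss_pp10k(samples, cutoff):
    """Failed parts per 10K parts: share of samples slower than the cut-off."""
    failures = sum(1 for d in samples if d > cutoff)
    return 10_000 * failures / len(samples)


if __name__ == "__main__":
    rng = random.Random(1)
    # Hypothetical reference and improved delay distributions (arbitrary units).
    reference = [rng.gauss(10.0, 1.0) for _ in range(100_000)]
    improved = [rng.gauss(9.5, 0.94) for _ in range(100_000)]
    cut = cutoff_delay(reference)
    print(yield_loss_pp10k(reference, cut), yield_loss_pp10k(improved, cut))
```

Both flows are scored against the same cut-off, so a flow that lowers the mean or the spread of the delay distribution shows up directly as fewer failed parts.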
Stochastic Clustering: ST-VPack is based on T-VPack [Betz, FPGA book], an iterative approach that selects a seed BLE for a new cluster and then packs BLEs into the current cluster, using STA with a constant delay model to calculate slack. ST-VPack performs SSTA instead. The statistical criticality of an edge/node is the probability of this edge/node being timing critical under variations. With statistical criticality: better seed BLE selection and better candidate BLE selection for the current cluster. Statistical timing cost of BLE B: [equation not preserved in the source].
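The definition of statistical criticality can be illustrated with a small Monte Carlo sketch in Python. This is not ST-VPack's SSTA engine; it swaps in plain sampling to estimate, for each edge of a toy timing graph, the probability that the edge lies on the critical path. The graph, the delay numbers, and the function name `statistical_criticality` are assumptions made for illustration.

```python
import random


def statistical_criticality(edges, order, sink, trials=2000, seed=0):
    """Estimate P(edge is on the critical path) by sampling edge delays.

    edges: {(u, v): (mean, sigma)} with independent Gaussian delays.
    order: all nodes in topological order.
    """
    rng = random.Random(seed)
    hits = {e: 0 for e in edges}
    for _ in range(trials):
        delay = {e: rng.gauss(m, s) for e, (m, s) in edges.items()}
        arrival = {n: 0.0 for n in order}
        best_in = {}  # latest-arriving incoming edge of each node
        for v in order:
            for (u, w), d in delay.items():
                if w == v and (v not in best_in or arrival[u] + d > arrival[v]):
                    arrival[v] = arrival[u] + d
                    best_in[v] = (u, w)
        # Walk back from the sink along the latest-arriving edges.
        node = sink
        while node in best_in:
            hits[best_in[node]] += 1
            node = best_in[node][0]
    return {e: h / trials for e, h in hits.items()}


# Toy graph: two near-equal paths a->c and b->c feeding c->d.
toy_edges = {("a", "c"): (1.00, 0.10),
             ("b", "c"): (0.95, 0.10),
             ("c", "d"): (0.50, 0.05)}
crit = statistical_criticality(toy_edges, ["a", "b", "c", "d"], "d")
```

In this toy case a deterministic STA on mean delays would mark only a->c as critical, whereas the sampled criticality of the near-critical edge b->c is well above zero, which is the effect the slide attributes to statistical criticality.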
The Impact of the Combination of Two Uncertainty Sources: Timing gain is mainly due to modeling interconnect uncertainty. Modeling interconnect uncertainty leads to a better delay distribution than modeling process variation. Considering both does not yield much further gain. [Figure: Tmean and Tsigma (percent axis) when modeling process variation, interconnect uncertainty, or both.]
Interconnect Uncertainty vs. Process Variation in Clustering: Clearly, interconnect uncertainty leads to a more significant delay variance in clustering. [Figure panels: with process variation; with interconnect uncertainty.]
Comparison between T-VPack and ST-VPack: ST-VPack on average reduces mean delay by 5.0% (up to 13.0%), standard deviation by 6.4% (up to 31.8%), and yield loss from 50pp10K to 9pp10K. In addition, ST-VPack has virtually no wire length, area, or runtime overhead.
Pre-routing Interconnect Uncertainty vs. Process Variation in Placement: Clearly, process variation leads to a more significant delay variance in placement. Only considering process variation is sufficient. [Figure panels: with process variation; with interconnect uncertainty.]
Stochastic Placement: ST-VPlace is the stochastic placement developed in [Lin, FPL’06], based on T-VPlace [Marquardt, ISFPGA’00]. It replaces STA with SSTA and static criticality with statistical criticality. Main improvement: spatially correlated variation is considered with PCA.
Comparison between T-VPlace and ST-VPlace: ST-VPlace on average reduces mean delay by 4.0% (up to 14.2%), standard deviation by 6.1% (up to 22.7%), and yield loss from 50pp10K to 12pp10K, with virtually no wire overhead. On the other hand, ST-VPlace takes 3.1X runtime.
Stochastic Routing: ST-PathFinder is based on PathFinder [Betz, FPGA book], an iterative maze router that allows congestion and considers both timing and wiring costs. Interconnect estimation in routing occurs when predicting the delay to the target sink, and has the highest accuracy. ST-PathFinder performs SSTA. The new statistical cost function for node n [equation not preserved in the source] gives a better tradeoff between timing and wiring costs.
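For orientation, the timing-driven PathFinder cost blends a delay term and a congestion term, weighted by the criticality of the connection being routed. The Python sketch below shows that blend and assumes, as an illustration only, that a statistical variant feeds in a statistical criticality (a probability in [0, 1]) where the deterministic router would use a static one. The exact ST-PathFinder cost function is not preserved in the source, and the name `node_cost` and its parameters are hypothetical.

```python
def node_cost(criticality, delay, base_cost, history_cost, present_cost):
    """PathFinder-style cost of using routing node n for one connection.

    criticality: weight in [0, 1]; assumed here to be a statistical
        criticality, i.e. a probability of being timing critical.
    delay: delay contributed by node n.
    base_cost, history_cost, present_cost: congestion factors of node n.
    """
    if not 0.0 <= criticality <= 1.0:
        raise ValueError("criticality must lie in [0, 1]")
    congestion = base_cost * history_cost * present_cost
    # Critical connections pay mostly for delay, others mostly for congestion.
    return criticality * delay + (1.0 - criticality) * congestion


# A near-critical connection (0.5) weighs delay and congestion equally.
example = node_cost(0.5, delay=2.0, base_cost=1.0,
                    history_cost=2.0, present_cost=3.0)
```

With a probability as the weight, a connection that is critical in only some variation scenarios gives up part of its claim on fast routing resources, which is one way to read the slide's point about the tradeoff between timing and wiring costs.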
Comparison between PathFinder and ST-PathFinder: ST-PathFinder on average reduces mean delay by 1.4% (up to 7.8%), standard deviation by 0.7% (up to 5.2%), and yield loss from 50pp10K to 35pp10K, with no runtime overhead. ST-PathFinder also reduces wire length by 4.5% on average.
Interaction between Clustering, Placement and Routing: The fully stochastic flow reduces yield loss from 50 to 5pp10K, but takes 3.0X runtime. Timing gain is mainly due to clustering and placement, but with overlap. Stochastic clustering + deterministic P&R is a good flow: significant timing gains and slightly less runtime. Deterministic clusterer and placer + stochastic router is a good flow: significant wiring gains and less runtime.
Table (D = deterministic, S = stochastic; -- = value not preserved in the source):
Clusterer            D     S      D      D      S      S      D      S
Placer               D     D      S      D      S      D      S      S
Router               D     D      D      S      D      S      S      S
Tnorm %              --    --     -3.3%  -1.4%  -6.4%  -4.1%  -3.6%  -6.3%
Tmean %              --    -5.0%  -4.0%  -1.4%  -5.9%  -4.7%  -4.0%  -6.2%
Tsigma %             --    -6.4%  -6.1%  -0.7%  -8.8%  -6.1%  -6.3%  -7.5%
Yield loss (pp10K)   50    9      12     35     --     --     --     5
Runtime              1X    0.99X  3.1X   0.96X  3.0X   0.97X  3.1X   3.0X
Wire                 1X    0.8%   1.3%   -4.5%  3.2%   -3.4%  --     -1.6%
Conclusions: The timing gain is mainly due to the clusterer and placer: modeling interconnect uncertainty for clustering and considering process variation for placement. The stochastic flow reduces yield loss from 50 to 5pp10K, mean delay by 6.2%, and standard deviation by 7.5%, but takes 3X runtime. Deterministic clusterer and placer + stochastic router reduces wire length by 4.5% and also runs slightly faster than the deterministic flow. Stochastic clusterer + deterministic P&R reduces yield loss from 50 to 9pp10K, mean delay by 5.0%, and standard deviation by 6.4%, and also runs slightly faster than the deterministic flow.