Improved Performance of 3DIC Implementations Through Inherent Awareness of Mix-and-Match Die Stacking Kwangsoo Han, Andrew B. Kahng and Jiajia Li University.

Slides:



Advertisements
Similar presentations
Porosity Aware Buffered Steiner Tree Construction C. Alpert G. Gandham S. Quay IBM Corp M. Hrkic Univ Illinois Chicago J. Hu Texas A&M Univ.
Advertisements

Design Rule Generation for Interconnect Matching Andrew B. Kahng and Rasit Onur Topaloglu {abk | rtopalog University of California, San Diego.
OCV-Aware Top-Level Clock Tree Optimization
Timing Margin Recovery With Flexible Flip-Flop Timing Model
Minimum Implant Area-Aware Gate Sizing and Placement
1 Advancing Supercomputer Performance Through Interconnection Topology Synthesis Yi Zhu, Michael Taylor, Scott B. Baden and Chung-Kuan Cheng Department.
Multi-Project Reticle Floorplanning and Wafer Dicing Andrew B. Kahng 1 Ion I. Mandoiu 2 Qinke Wang 1 Xu Xu 1 Alex Zelikovsky 3 (1) CSE Department, University.
Krit Athikulwongse, Dae Hyun Kim, Moongon Jung, and Sung Kyu Lim
UC San Diego / VLSI CAD Laboratory NOLO: A No-Loop, Predictive Useful Skew Methodology for Improved Timing in IC Implementation Tuck-Boon Chan, Andrew.
An Optimal Algorithm of Adjustable Delay Buffer Insertion for Solving Clock Skew Variation Problem Juyeon Kim, Deokjin Joo, Taehan Kim DAC’13.
Rasit Onur Topaloglu University of California San Diego Computer Science and Engineering Department Ph.D. candidate “Location.
Background: Scan-Based Delay Fault Testing Sequentially apply initialization, launch test vector pairs that differ by 1-bit shift A vector pair induces.
Multi-Project Reticle Design & Wafer Dicing under Uncertain Demand Andrew B Kahng, UC San Diego Ion Mandoiu, University of Connecticut Xu Xu, UC San Diego.
Power-Aware Placement
Local Unidirectional Bias for Smooth Cutsize-delay Tradeoff in Performance-driven Partitioning Andrew B. Kahng and Xu Xu UCSD CSE and ECE Depts. Work supported.
Placement Feedback: A Concept and Method for Better Min-Cut Placements Andrew B. KahngSherief Reda CSE & ECE Departments University of CA, San Diego La.
Yield- and Cost-Driven Fracturing for Variable Shaped-Beam Mask Writing Andrew B. Kahng CSE and ECE Departments, UCSD Xu Xu CSE Department, UCSD Alex Zelikovsky.
1 UCSD VLSI CAD Laboratory ISQED-2009 Revisiting the Linear Programming Framework for Leakage Power vs. Performance Optimization Kwangok Jeong, Andrew.
Toward Performance-Driven Reduction of the Cost of RET-Based Lithography Control Dennis Sylvester Jie Yang (Univ. of Michigan,
An Algebraic Multigrid Solver for Analytical Placement With Layout Based Clustering Hongyu Chen, Chung-Kuan Cheng, Andrew B. Kahng, Bo Yao, Zhengyong Zhu.
Jieyi Long and Seda Ogrenci Memik Dept. of EECS, Northwestern Univ. Jieyi Long and Seda Ogrenci Memik Dept. of EECS, Northwestern Univ. Automated Design.
A Proposal for Routing-Based Timing-Driven Scan Chain Ordering Puneet Gupta 1 Andrew B. Kahng 1 Stefanus Mantik 2
Detailed Placement for Leakage Reduction Using Systematic Through-Pitch Variation Andrew B. Kahng †‡ Swamy Muddu ‡ Puneet Sharma ‡ CSE † and ECE ‡ Departments,
UC San Diego Computer Engineering. VLSI CAD Laboratory.. UC San Diego Computer EngineeringVLSI CAD Laboratory.. UC San Diego Computer EngineeringVLSI CAD.
Timing Analysis and Optimization Implications of Bimodal CD Distribution in Double Patterning Lithography Kwangok Jeong and Andrew B. Kahng VLSI CAD LABORATORY.
UC San Diego / VLSI CAD Laboratory Reliability-Constrained Die Stacking Order in 3DICs Under Manufacturing Variability Tuck-Boon Chan, Andrew B. Kahng,
Dose Map and Placement Co-Optimization for Timing Yield Enhancement and Leakage Power Reduction Kwangok Jeong, Andrew B. Kahng, Chul-Hong Park, Hailong.
-1- UC San Diego / VLSI CAD Laboratory A Global-Local Optimization Framework for Simultaneous Multi-Mode Multi-Corner Clock Skew Variation Reduction Kwangsoo.
A New Methodology for Reduced Cost of Resilience Andrew B. Kahng, Seokhyeong Kang and Jiajia Li UC San Diego VLSI CAD Laboratory.
UC San Diego / VLSI CAD Laboratory Toward Quantifying the IC Design Value of Interconnect Technology Improvement Tuck-Boon Chan, Andrew B. Kahng, Jiajia.
Horizontal Benchmark Extension for Improved Assessment of Physical CAD Research Andrew B. Kahng, Hyein Lee and Jiajia Li UC San Diego VLSI CAD Laboratory.
UC San Diego / VLSI CAD Laboratory Incremental Multiple-Scan Chain Ordering for ECO Flip-Flop Insertion Andrew B. Kahng, Ilgweon Kang and Siddhartha Nath.
-1- UC San Diego / VLSI CAD Laboratory Construction of Realistic Gate Sizing Benchmarks With Known Optimal Solutions Andrew B. Kahng, Seokhyeong Kang VLSI.
Massachusetts Institute of Technology 1 L14 – Physical Design Spring 2007 Ajay Joshi.
The Fast Optimal Voltage Partitioning Algorithm For Peak Power Density Minimization Jia Wang, Shiyan Hu Department of Electrical and Computer Engineering.
Clock Clustering and IO Optimization for 3D Integration Samyoung Bang*, Kwangsoo Han ‡, Andrew B. Kahng ‡† and Vaishnav Srinivas ‡ ‡ ECE and † CSE Departments,
Kwangsoo Han, Andrew B. Kahng, Hyein Lee and Lutong Wang
Kwangsoo Han‡, Andrew B. Kahng‡† and Hyein Lee‡
ISSS 2001, Montréal1 ISSS’01 S.Derrien, S.Rajopadhye, S.Sur-Kolay* IRISA France *ISI calcutta Combined Instruction and Loop Level Parallelism for Regular.
Register Placement for High- Performance Circuits M. Chiang, T. Okamoto and T. Yoshimura Waseda University, Japan DATE 2009.
Fast Algorithms for Slew Constrained Minimum Cost Buffering S. Hu*, C. Alpert**, J. Hu*, S. Karandikar**, Z. Li*, W. Shi* and C. Sze** *Dept of ECE, Texas.
UC San Diego / VLSI CAD Laboratory Learning-Based Approximation of Interconnect Delay and Slew Modeling in Signoff Timing Tools Andrew B. Kahng, Seokhyeong.
Timing Model Reduction for Hierarchical Timing Analysis Shuo Zhou Synopsys November 7, 2006.
Mixed Cell-Height Implementation for Improved Design Quality in Advanced Nodes Sorin Dobre +, Andrew B. Kahng * and Jiajia Li * * UC San Diego VLSI CAD.
0 Optimizing Stochastic Circuits for Accuracy-Energy Tradeoffs Armin Alaghi 3, Wei-Ting J. Chan 1, John P. Hayes 3, Andrew B. Kahng 1,2 and Jiajia Li 1.
Outline Motivation and Contributions Related Works ILP Formulation
Improved Path Clustering for Adaptive Path-Delay Testing Tuck-Boon Chan* and Prof. Andrew B. Kahng*# UC San Diego ECE* & CSE # Departments.
-1- UC San Diego / VLSI CAD Laboratory Optimization of Overdrive Signoff Tuck-Boon Chan, Andrew B. Kahng, Jiajia Li and Siddhartha Nath Tuck-Boon Chan,
-1- UC San Diego / VLSI CAD Laboratory Optimal Reliability-Constrained Overdrive Frequency Selection in Multicore Systems Andrew B. Kahng and Siddhartha.
Interconnect Characteristics of 2.5-D System Integration Scheme Yangdong (Steven) Deng & Wojciech P. Maly
Unified Adaptivity Optimization of Clock and Logic Signals Shiyan Hu and Jiang Hu Dept of Electrical and Computer Engineering Texas A&M University.
Improved Flop Tray-Based Design Implementation for Power Reduction
Kun Young Chung*, Andrew B. Kahng+ and Jiajia Li+
Time-borrowing platform in the Xilinx UltraScale+ family of FPGAs and MPSoCs Ilya Ganusov, Benjamin Devlin.
Chapter 7 – Specialized Routing
On the Relevance of Wire Load Models
Kristof Blutman† , Hamed Fatemi† , Andrew B
Andrew B. Kahng and Xu Xu UCSD CSE and ECE Depts.
Revisiting and Bounding the Benefit From 3D Integration
Two-phase Latch based design
Fault-Tolerant Architecture Design for Flow-Based Biochips
Xiaoyang Zhang1, Yuchong Hu1, Patrick P. C. Lee2, Pan Zhou1
Buffered tree construction for timing optimization, slew rate, and reliability control Abstract: With the rapid scaling of IC technology, buffer insertion.
EE5780 Advanced VLSI Computer-Aided Design
Post-Silicon Calibration for Large-Volume Products
Automated Layout and Phase Assignment for Dark Field PSM
Measuring the Gap between FPGAs and ASICs
Reinventing The Wheel: Developing a New Standard-Cell Synthesis Flow
Fast Min-Register Retiming Through Binary Max-Flow
Chapter 3b Leakage Efficient Chip-Level Dual-Vdd Assignment with Time Slack Allocation for FPGA Power Reduction Prof. Lei He Electrical Engineering Department.
Presentation transcript:

Improved Performance of 3DIC Implementations Through Inherent Awareness of Mix-and-Match Die Stacking Kwangsoo Han, Andrew B. Kahng and Jiajia Li University of California at San Diego http://vlsicad.ucsd.edu/

Mix-and-Match Integration Interesting idea proposed by previous works Integrate slow copies of tiers with fast copies of tiers to improve 3DIC parametric yield == “Mix-and-match” integration 3D integration SS Tier 1 wafer/die FF Tier 0 wafer Fast Tier 1 Slow Tier 0 Wafer-to-wafer (die-to-wafer) bonding: integrate SS wafer/die with FF wafer/die (SS Tier 0 wafer/die + FF Tier 1 wafer or FF Tier 0 wafer/die + SS Tier 1 wafer) Monolithic 3D: adapt process so that Tier 1 is fast if Tier 0 is known to be slow

Agenda Motivation Related Works Our Methodology Experimental Results Conclusions

Mix-and-Match Needs Design-Stage Optimization Mix-and-match integration offers performance benefits over the conventional worst-case analysis No holistic “design for eventual integration” of 3DICs With mix-and-match integration, blue cut has maximum slack, red cut has minimum slack Our goal: Study design-stage optimization for mix-and-match 75ps XX-YY = XX Tier 0 + YY Tier 1 Technology: 28FDSOI 3D netlist is bipartitioned with minimum net cut SS (FF) FF (SS) Benefit from mix-and-match SS (FF) FF (SS) Bad setup and hold slacks

Challenges in Mix-and-Match-Aware Partitioning Examples with different optimal solutions for different objectives Assumption: (i) Uniform stage delay (dSS = 30ps, dFF = 10ps) (ii) PathAC: 26 stages, PathBC: 30 stages (iii) Balance criteria: 50% for each die Objective: (i) Minimize delay of PathAC (ii) Minimize delay of PathBC (iii) Minimize worst-case delay over the two paths (iv) Minimize worst-case delay over the two paths in regime of large VI delay impact (dVI) 14 3 6 9 1 15 3 5 6 dVI = 60ps dVI = 10ps A DAC = 680ps DBC = 720ps DAC = 530ps DBC = 680ps DAC = 640ps DBC = 610ps DAC = 620ps DBC = 620ps Tier 1 Tier 0 C 18 2 20 13 7 B 10 5

Challenges in Mix-and-Match-Aware Partitioning Optimal cut location on one path conflicts with those on other paths Example has different optimal solutions for different objectives Vertical interconnect (VI) delay impact Asymmetric path delay distribution due to process variation

Agenda Motivation Related Works Our Methodology Experimental Results Conclusions

Related Works Mix-and-match die stacking [Ferri08] integrates fast (slow) CPU die with slow (fast) L2 cache die to improve parametric yield [Garg13] formulates mathematical programming to optimize mix-and-match stacking [Chan13] uses mix-and-match stacking to optimize reliability of 3DICs [Juan14] performs thermal-aware matching and stacking But no design-stage optimization has been studied Netlist partitioning for 3DICs [Li06] uses simulated annealing engine to partition block into tiers [Thorolfsson10] uses hMetis to partition netlist with minimized #VIs [Hu10] proposes a multi-level framework for 3D partitioning [Cong07][Panth15] assign cells to tiers through transformations of a 2D placement These do not comprehend mix-and-match die stacking

Agenda Motivation Related Works Our Methodology Experimental Results Conclusions

ILP-Based Partitioning Method ILP formulation Minimize Dmax // Minimize max path delay in regime of mix-and-match Such that βi, i’ ≥ xi - xi’ // indicators of VI insertions βi, i’ ≥ xi’ - xi for all adjacent cells Delay bound ∑(dij∙(1-xi) + dij’∙xi) + ∑ βi, i’∙dVI ≤ Dmax for all paths Area-balancing criterion ∑ ai∙xi - ∑ai∙(1-xi) ≤ α∙∑ai ∑ ai∙(1-xi) - ∑ai∙xi ≤ α∙∑ai Notations Dmax maximum path delay xi binary indicator of cell on Tier 0 (xi = 0) or Tier 1 (xi = 1) βi, i’ binary indicator of whether a VI (cut) exists dij cell delay at jth process corner dVI delay impact of VI insertion ai cell area α area-balancing criterion (e.g., α = 5%)

Heuristic Partitioning Method (1) Maximum-cut on timing-critical sequential graph Classify paths according to their slacks and VI delay impact Type-I Timing non-critical paths Type-II Timing-critical paths without tolerance of VI insertion  Impact of VI insertion ≥ timing benefit from mix-and-match Type-III Timing-critical paths with tolerance of VI insertion  Impact of VI insertion < timing benefit from mix-and-match Extract restricted sequential graph containing only Type-II/Type-III paths Collapse vertices connected with Type-II paths into one vertex Perform maximum cut on updated graph

Heuristic Partitioning Method (2) Timing-aware multi-phase FM partitioning Issue: Hard to “foresee” slack benefits with existence of VI delay impact Our solution: Cluster cells with given range of cluster size Moving one cell degrades slack, but following moves compensate VI delay impact Clustering with cluster size [L2, U2] Clustering with cluster size [Lk, Uk] Timing-aware FM … Partitioning solution Partitioning solution with maximum slack One phase

Agenda Motivation Related Works Our Methodology Experimental Results Conclusions

Design of Experiments Testcases: ARM Cortex M0 and DMA, AES, VGA from OpenCores website Technology: 28FDSOI, dual-VT Tools ILP solver: CPLEX v12.5 Synthesis: Synopsys Design Compiler H-2013.03-SP3 P&R: Cadence EDI System v12.0 Signoff timer: Synopsys PrimeTime H-2013.06-SP2 Two sets of experiments Validation of our heuristic partitioning method Extend existing 3DIC implementation flows with our optimization

Calibration of Heuristic Partitioning ILP-based optimization leads to near-optimal solution Vary delay impact of VI insertion, process corners Heuristic consistently achieves < 30ps slack difference compared to ILP-based partitioning solution 3σ SS + 3σ FF 2σ SS + 3σ FF 3σ SS + 2σ FF dVI (ps) 10 30 50 10 30 50 10 30 50 Design Clk period DMA 0.6ns

Validation of Our Method Extend two existing 3DIC implementation flows (i) GT2012, and (ii) Shrunk2D to include our partitioning method for mix-and-match Up to 16% performance improvement GT2012 (orig) GT2012 (opt) Design Clk period M0 1.2ns AES 1.1ns VGA 1.0ns

Agenda Motivation Related Works Our Methodology Experimental Results Conclusions

Futures and Conclusions Design-stage optimization for mix-and-match die stacking ILP-based and heuristic partitioning methodologies directly maximize design’s slack in the regime of mix-and-match Up to 16% timing improvement compared to conventional 3D partitioning solution Future works Integration of design-stage and die- and/or wafer-level optimization Clock tree synthesis for mix-and-match stacking

THANK YOU !