Interconnect Modeling for Improved System-Level Design Optimization Luca Carloni  § Andrew B. Kahng ¶ Swamy Muddu ¶ Alessandro Pinto ‡ Kambiz Samadi ¶


Interconnect Modeling for Improved System-Level Design Optimization Luca Carloni  § Andrew B. Kahng ¶ Swamy Muddu ¶ Alessandro Pinto ‡ Kambiz Samadi ¶ Puneet Sharma ¶ § Columbia University ¶ University of California, San Diego ‡ University of California, Berkeley January 22, 2008

Outline  Motivation  System-Level Communication Synthesis  Buffered Interconnect Model  Interconnect Optimization  Validation and Significance Assessment  Conclusions

Motivation
- Focus of the design process is shifting from “computation” to “communication”
- Device and interconnect performance scaling mismatches cause breakdown of traditional across-chip communication
- System-level designers require accurate, yet simple models to bridge planning and implementation stages
- Today’s system-level performance and power modeling suffers from:
  - Ad hoc selection of models
  - Poor balance between accuracy and simplicity
  - Poor definition of inputs
  - Lack of model extensibility across future technology nodes
  - Inability to explore different implementation styles
- Our goal: develop accurate models that are easily usable in system-level design early in the design cycle

Previous Interconnect Delay Models
- Missing required aspects of accurate delay estimation at 90nm
  - Do not consider input slew change, which impacts effective drive resistance and consequently cell delay
  - Do not consider scattering, which impacts metal resistivity and consequently metal resistance
- Bakoglu90
  - No crosstalk impact; assumes driver on-resistance $R_d$ and gate input capacitance $C_g$ vary linearly with device size; uses Elmore delay model
- Pamunuwa03
  - Similar to Bakoglu90 but adds crosstalk impact
- CongPan99 (IPEM)
  - Multiple delay models under certain optimization schemes
- Use of second-order RC model for gate delay (e.g., Shao03)
  - Does not address gate loading during model construction

Other Limitations of Previous Work
- Design style and buffering schemes
  - Design-level degrees of freedom: wire width, spacing, shielding
- Practical buffer sizing
  - Considering only delay as the optimization objective is wrong
  - Analytic solutions yield large buffer sizes (100X-400X) that are not in any realistic cell library
- Model inputs and technology capture
  - Do not have well-defined pathways to capture necessary technology and device parameters
  - Collect inputs from ad hoc sources, which often leads to misleading conclusions

Outline Motivation  System-Level Communication Synthesis  Buffered Interconnect Model  Interconnect Optimization  Validation and Significance Assessment  Conclusions

Communication Synthesis for Network-on-Chip
- Given
  - An input specification as a set of communication constraints
  - A library of communication components
  - An objective function (e.g., power, area, delay)
- Find
  - A network-on-chip implementation as a composition of library components that
    - Satisfies the specification
    - Minimizes the cost function
- Communication Synthesis Infrastructure (COSI)
  - Based on the Platform-Based Design methodology
  - Takes specification and library descriptions in XML format
  - Produces a variety of outputs, including a cycle-accurate SystemC implementation of the optimal network-on-chip

Constraint-Driven Communication Synthesis
[Figure: flow diagram with blocks labeled Application, Implementation, Constraints Propagation, Point-to-Point Specification, On-Chip Communication Library, Perf. / Cost Abstractions, Synthesis, Synthesis Result]

Communication Synthesis Key Elements
- Specification of input constraints
  - Set of IP cores: area and interface
  - End-to-end communication requirements between pairs of IP cores: latency and throughput
- Characterization of library of components
  - Interface types, max number of ports
  - Max capacities: bandwidth, latency, max distance
  - Performance and cost model
- Component instantiation and parallel composition
  - Rename, set parameters of library components
  - Composition based on algebra on quantities (including type compatibility)

Communication Synthesis Example
- Synthesis of optimal network-on-chip
  - Return a valid composition that meets input constraints and minimizes the objective function (e.g., power dissipation)
[Figure: (Original Specification), Platform Instance 1, Platform Instance 2]

COSI: Communication Synthesis Infrastructure
- COSI is a public-domain software package for NoC synthesis

Outline Motivation System-Level Communication Synthesis  Buffered Interconnect Model  Interconnect Optimization  Validation and Significance Assessment  Conclusions

Proposed Model Features
- Model inputs
  - Tech. characteristics: # metal layers; min. width, spacing, thickness; dielectric thickness, constant; device drive res., cap., leakage
  - Design style: width/spacing configs; buffering scheme; shielding; signaling scheme
  - Bus attributes: length, # bits, layer, switching
- Model outputs
  - Delay; leakage; dynamic; max. unclocked length, # pipelines, latency, throughput; area
- Improved accuracy with respect to well-known models
- Modeling of nanoscale-era effects: crosstalk, scattering, barrier thickness, dependence of delay on slews, etc.
- Single-digit percentage accuracy relative to gate-level analyses

Model Technology Inputs
- Inputs for repeater delay calculation
  - Delay and slew values for a set of input slew and load capacitance values (obtained from Liberty / Timing Library Format (TLF) / SPICE)
  - Input capacitance for different repeater sizes (Liberty, Predictive Technology Models (PTM))
- Inputs for wire delay calculation
  - Wire dimensions (ITRS/PTM, LEF, ITF)
  - Inter-wire spacings for global and intermediate layers (ITRS/PTM, LEF, ITF)
- Inputs for power calculation
  - Input capacitance (Liberty, PTM)
  - Wire parasitics (computed in wire delay calculation)
- Inputs for area calculation
  - Wire dimensions used above
  - Repeater area is available from Liberty; for future technologies, ITRS A-factors or the proposed area models can be used

Buffered Interconnect Model
- Buffered interconnect model for delay, power, and area
  - Constructed from buffer (repeater) and wire delay models
  - Accounts for coupling capacitances, slew dependence and UDSM effects (e.g., scattering-dependent wire resistance changes)
  - Calibrated against SPICE
- Components:
  - Repeater delay model
    - Separate models for intrinsic delay, output slew, input capacitance
  - Wire delay model
    - Accounts for coupling capacitance impact on wire delay
  - Repeater power model
    - Accounts for sub-threshold and gate leakages
  - Repeater area model
    - Derived from existing cell layouts (can be extrapolated)
  - Wire area model
    - Derived from wire width and spacing (can be extrapolated)

Repeater Delay Model
- Repeater delay can be decomposed into load-independent ($i$) and load-dependent ($r_d \cdot c_l$) components:
  $d = i + r_d \cdot c_l$
  $i(s_i) = \alpha_0 + \alpha_1 \cdot s_i + \alpha_2 \cdot s_i^2$
  - $s_i$ denotes input slew; $\alpha_0$, $\alpha_1$ and $\alpha_2$ are coefficients obtained by quadratic regression
- Drive resistance is nearly linear in input slew; both the intercept and the slope vary with repeater size:
  $r_d = r_{d0} + r_{d1} \cdot s_i$
- Output slew depends on load capacitance; the slope is independent of input slew, while the intercept depends linearly on it:
  $s_o(c_l, s_i) = s_{o0} + s_{o1} \cdot s_i + s_{o2} \cdot c_l$
  - $s_o$ is the output slew, and $s_{o0}$, $s_{o1}$ and $s_{o2}$ are fitting coefficients from linear regression
- Input capacitance:
  $c_i = \eta \times (w_p + w_n)$
  - $c_i$ is the input capacitance, $w_p$ and $w_n$ are the PMOS and NMOS widths respectively, and $\eta$ is a coefficient derived using linear regression with zero intercept
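The repeater equations above can be evaluated directly once the regression coefficients are known. The following is a minimal sketch, assuming the coefficients have already been fitted for one repeater size; the `RepeaterCoeffs` container and all function names are hypothetical, not taken from the presentation or from COSI.

```python
from dataclasses import dataclass


@dataclass
class RepeaterCoeffs:
    """Fitted coefficients for ONE repeater size (hypothetical container)."""
    a0: float   # alpha_0: intrinsic-delay constant term
    a1: float   # alpha_1: intrinsic-delay linear term in input slew
    a2: float   # alpha_2: intrinsic-delay quadratic term in input slew
    rd0: float  # drive-resistance intercept
    rd1: float  # drive-resistance slope versus input slew
    so0: float  # output-slew intercept
    so1: float  # output-slew sensitivity to input slew
    so2: float  # output-slew sensitivity to load capacitance


def intrinsic_delay(k: RepeaterCoeffs, s_i: float) -> float:
    """i(s_i) = alpha_0 + alpha_1*s_i + alpha_2*s_i^2"""
    return k.a0 + k.a1 * s_i + k.a2 * s_i ** 2


def drive_resistance(k: RepeaterCoeffs, s_i: float) -> float:
    """r_d = r_d0 + r_d1*s_i"""
    return k.rd0 + k.rd1 * s_i


def repeater_delay(k: RepeaterCoeffs, s_i: float, c_l: float) -> float:
    """d = i + r_d*c_l"""
    return intrinsic_delay(k, s_i) + drive_resistance(k, s_i) * c_l


def output_slew(k: RepeaterCoeffs, s_i: float, c_l: float) -> float:
    """s_o = s_o0 + s_o1*s_i + s_o2*c_l"""
    return k.so0 + k.so1 * s_i + k.so2 * c_l


def input_capacitance(eta: float, w_p: float, w_n: float) -> float:
    """c_i = eta*(w_p + w_n)"""
    return eta * (w_p + w_n)
```

Because the output slew of one stage is the input slew of the next, a buffered line can be evaluated by calling `output_slew` and `repeater_delay` stage by stage; units only need to be mutually consistent (for example ns, kΩ and pF).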

Wire Delay Model
- For wire delay we use the model proposed by Pamunuwa et al. (cf. TVLSI03), which accounts for crosstalk:
  $d_w = r_w \cdot (0.4\,c_g + (\lambda_i \cdot c_c)/c_i)$
  - $d_w$, $r_w$, $c_g$, $c_c$ and $c_i$ respectively denote wire delay, wire resistance, ground capacitance, coupling capacitance and input capacitance of the next-stage repeater
  - $\lambda_i$ is a coefficient (based on SPICE simulation) that captures the switching patterns of the neighboring wires
- We enhance the quality of the wire delay model by considering two other important factors that change wire resistance:
  - Scattering-aware resistivity (cf. Shi et al. ASPDAC06):
    $\rho(w_w) = \rho_B + K_\rho / w_w$
    - $w_w$ is the wire width, $\rho_B = 2.202$ µΩ·cm, and $K_\rho = 1.030 \times$ Ω·m$^2$
  - Interconnect barrier (cf. Mai et al. IEEE01):
    $r_w = \dfrac{\rho \cdot l_w}{(t_m - t_b) \cdot (w_w - 2 t_b)}$
    - $t_m$ and $t_b$ respectively are the metal and barrier thicknesses, $l_w$ is the length of the wire, and $\rho$ is computed using the resistivity equation above
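The two resistance corrections (scattering-aware resistivity and barrier thickness) can be sketched as follows. This covers only the $r_w$ part of the wire model. SI units are assumed throughout, the function names are hypothetical, and because the exponent of $K_\rho$ is missing from the text above, `k_rho` is left as a caller-supplied parameter rather than hard-coded.

```python
# Bulk resistivity from the slide: 2.202 micro-ohm*cm, converted to ohm*m.
RHO_BULK = 2.202e-8


def scattering_resistivity(width: float, k_rho: float,
                           rho_bulk: float = RHO_BULK) -> float:
    """rho(w_w) = rho_B + K_rho / w_w, with width in metres.

    k_rho (ohm*m^2) must be supplied by the caller; its value is an
    assumption of whoever uses this sketch.
    """
    return rho_bulk + k_rho / width


def wire_resistance(length: float, width: float, t_metal: float,
                    t_barrier: float, k_rho: float,
                    rho_bulk: float = RHO_BULK) -> float:
    """r_w = rho * l_w / ((t_m - t_b) * (w_w - 2*t_b)), all lengths in metres.

    The barrier reduces the conducting cross-section: once in thickness
    and twice (both sidewalls) in width.
    """
    eff_thickness = t_metal - t_barrier
    eff_width = width - 2.0 * t_barrier
    if eff_thickness <= 0.0 or eff_width <= 0.0:
        raise ValueError("barrier consumes the whole conductor cross-section")
    rho = scattering_resistivity(width, k_rho, rho_bulk)
    return rho * length / (eff_thickness * eff_width)
```

Setting `t_barrier=0.0` and `k_rho=0.0` reduces the function to the plain bulk-resistivity formula, which makes it easy to see how much each correction adds for a given geometry.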

Repeater and Wire Delay Models
- Model coefficients fit from data extracted from Liberty/LEF/technology files and other extrapolatable sources (e.g., PTM and ITRS)
- $\text{delay} = i(\text{slew}_{in}) + r(\text{slew}_{in}) \cdot C_L$
- $r(s) = f(\text{size}, \text{slew}_{in})$
- $\text{slew}_{out} = f(\text{slew}_{in}, C_L)$
- wire delay = Elmore
[Figure panels: Drive Resistance Model: r(slew_in); Intrinsic Delay Model: i(slew_in); Output Slew Model: o(slew_in, C_L)]

Repeater and Wire Power Models
- Power is an important design objective and must be accounted for early in the design flow
- Today, leakage and dynamic power are the primary forms of power dissipation
- Leakage has two main components: (1) sub-threshold leakage, and (2) gate-tunneling current
  - Both components depend linearly on device size:
    $p_s = (p_s^n + p_s^p) / 2$
    $p_s^n = k_0^n + k_1^n \cdot w_n$
    $p_s^p = k_0^p + k_1^p \cdot w_p$
- Dynamic power can be calculated as:
  $p_d = a \cdot c_l \cdot v_{dd}^2 \cdot f$
  $c_l = c_i + c_g + c_c$
  - $p_d$, $a$, $c_l$, $v_{dd}$ and $f$ are dynamic power, activity factor, load capacitance, supply voltage and frequency, respectively
  - Load capacitance is composed of the input capacitance of the next repeater ($c_i$) and the ground ($c_g$) and coupling ($c_c$) capacitances of the wire driven
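A minimal sketch of the two power equations, assuming the leakage coefficients $k_0$, $k_1$ have already been fitted for NMOS and PMOS; the function and parameter names are hypothetical.

```python
def leakage_power(k0_n: float, k1_n: float, k0_p: float, k1_p: float,
                  w_n: float, w_p: float) -> float:
    """p_s = (p_s^n + p_s^p) / 2, each term linear in device width."""
    p_n = k0_n + k1_n * w_n   # NMOS leakage contribution
    p_p = k0_p + k1_p * w_p   # PMOS leakage contribution
    return (p_n + p_p) / 2.0


def dynamic_power(activity: float, c_i: float, c_g: float, c_c: float,
                  v_dd: float, freq: float) -> float:
    """p_d = a * c_l * v_dd^2 * f with c_l = c_i + c_g + c_c."""
    c_load = c_i + c_g + c_c  # next-stage input cap + wire ground and coupling caps
    return activity * c_load * v_dd ** 2 * freq
```

With capacitance in farads, voltage in volts and frequency in hertz, `dynamic_power` returns watts; the leakage units are whatever the fitted coefficients carry.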

Repeater and Wire Area Models
- For existing technologies, the area of a repeater can be calculated as:
  $a_r = \tau_0 + \tau_1 \cdot w_n$
  - $a_r$ denotes repeater area; $\tau_0$ and $\tau_1$ are coefficients obtained by linear regression; $w_n$ and $w_p$ are the widths of NMOS and PMOS, respectively
- For future technologies, feature size ($F$), contacted pitch ($CP$), row height ($RH$), and row width ($RW$) can be used to estimate the area:
  $NF = (w_p + w_n + 2 \cdot F) / RH$
  $RW = NF \times (F + CP) + CP$
  $a_r = RH \times RW$
- Wiring area can be calculated as:
  $a_w = n \times (w_w + s_w) + s_w$
  - $a_w$ denotes wire area, $n$ is the bit width of the bus, and $w_w$ and $s_w$ are wire width and spacing
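The area formulas translate almost one-to-one into code. The sketch below assumes consistent length units and treats $NF$ as a continuous quantity exactly as the formula is written (whether it should be rounded up is not stated, so no rounding is applied); all names are hypothetical.

```python
def repeater_area_existing(tau0: float, tau1: float, w_n: float) -> float:
    """a_r = tau_0 + tau_1 * w_n, with regression-fitted tau_0 and tau_1."""
    return tau0 + tau1 * w_n


def repeater_area_future(w_p: float, w_n: float, feature: float,
                         contacted_pitch: float, row_height: float) -> float:
    """Area estimate from technology parameters only.

    NF = (w_p + w_n + 2F) / RH
    RW = NF * (F + CP) + CP
    a_r = RH * RW
    """
    nf = (w_p + w_n + 2.0 * feature) / row_height  # left unrounded (assumption)
    row_width = nf * (feature + contacted_pitch) + contacted_pitch
    return row_height * row_width


def wire_area(n_bits: int, wire_width: float, wire_spacing: float) -> float:
    """a_w = n * (w_w + s_w) + s_w: n wire pitches plus one extra spacing."""
    return n_bits * (wire_width + wire_spacing) + wire_spacing
```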

Repeater Power and Area Models
- Repeater area and power models fit from simulation data points
- Area and leakage power are linear over the range of implementable repeater sizes (larger repeater sizes → higher leakage power)

Outline Motivation System-Level Communication Synthesis Buffered Interconnect Model  Interconnect Optimization  Validation and Significance Assessment  Conclusions

Interconnect Optimization: Buffering
- Conventional delay-optimal buffering → unrealistic buffer sizes → high dynamic / leakage power → suboptimal
- Our approach: iterative optimization of a hybrid objective (power + delay)
  - Search for optimal number and size of repeaters
  - Can be extended to other interconnect optimizations (e.g., wire sizing and driver sizing)
[Figure: Pareto-optimal frontier of the power-delay tradeoff of a 5mm interconnect in 90nm / 65nm]
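The search over repeater count and size under a hybrid power + delay objective can be illustrated with a small example. This is not the authors' iterative procedure: an exhaustive grid search stands in for it, the delay function is a generic Elmore-style toy model rather than the calibrated model described earlier, and every constant and name below is hypothetical.

```python
from itertools import product

# Hypothetical lumped parameters in arbitrary but consistent units.
R_WIRE, C_WIRE = 1000.0, 1.0     # total wire resistance and capacitance
R_UNIT, C_UNIT = 5000.0, 0.002   # unit-size repeater drive resistance / input cap


def toy_delay(n, size):
    """Elmore-style delay of a wire cut into n equal repeated segments."""
    r_seg, c_seg = R_WIRE / n, C_WIRE / n
    r_drv, c_in = R_UNIT / size, C_UNIT * size
    return n * (r_drv * (c_seg + c_in) + r_seg * (0.5 * c_seg + c_in))


def toy_power(n, size):
    """Power proxy: total switched capacitance of wire plus repeaters."""
    return C_WIRE + n * size * C_UNIT


def optimize_buffering(counts, sizes, weight,
                       delay_fn=toy_delay, power_fn=toy_power):
    """Return (n, size, delay, power) minimizing a weighted hybrid cost.

    weight = 1.0 is delay-only, weight = 0.0 is power-only. Both terms are
    normalized by their best value over the grid so they are comparable.
    """
    candidates = [(n, s, delay_fn(n, s), power_fn(n, s))
                  for n, s in product(counts, sizes)]
    d_min = min(c[2] for c in candidates)
    p_min = min(c[3] for c in candidates)

    def cost(c):
        return weight * c[2] / d_min + (1.0 - weight) * c[3] / p_min

    return min(candidates, key=cost)


if __name__ == "__main__":
    # Sweeping the weight traces points along the power-delay tradeoff.
    for w in (0.0, 0.25, 0.5, 0.75, 1.0):
        n, size, d, p = optimize_buffering(range(1, 11), range(1, 101), w)
        print(f"weight={w:.2f}: n={n}, size={size}, delay={d:.1f}, power={p:.3f}")
```

Restricting `sizes` to the drive strengths actually present in a cell library is one way to keep the search away from the unrealistically large buffers that a delay-only analytic optimum tends to produce.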

Outline Motivation Communication Synthesis Buffered Interconnect Model Interconnect Optimization  Validation and Significance Assessment  Conclusions

Model Validation
- Model comparison with results from physical implementation
  - {5mm wire} × {90nm, 65nm} × {wiring layers} × {design styles}
  - Model-predicted delays compared with delays from PrimeTime
- Deviation of proposed model from PrimeTime delays: < 15%

Impact on System-Level Design
- Testcases
  - VPROC: video processor with 42 cores and 128-bit datawidth
  - dVOPD: dual video object plane decoder with 26 cores and 128-bit datawidth
- The original model (Orig.) underestimates power compared to the proposed model (Prop.)
- The original model is very optimistic in delay (i.e., the synthesis result may actually be infeasible). This could become more critical as technology scales and the chip size becomes larger than the critical sequential length.

Outline Motivation System-Level Communication Synthesis Buffered Interconnect Model Interconnect Optimization Validation and Significance Assessment  Conclusions

Conclusions and Future Directions
- Accurate models can drive effective system-level exploration
  - Inaccurate models can lead to misleading design targets
- Reproducible methodology for extracting inputs to models from reliable sources
- More realistic buffering scheme, where power and area are considered in addition to delay
- Modeling of NoC components besides wires
  - Across future nanometer technologies (45nm and beyond)
  - At different levels of abstraction
    - protocol encapsulation (e.g., hand-shaking for AMBA bus allocation)
    - buses, pipelined rings (e.g., EIB in IBM Cell)
    - routers, network interfaces
    - FIFOs, queues, crossbar switches (where ORION left off)
    - from high-level analytical models to low-level executable models
- Extending to other metrics
  - Reliability estimation (i.e., error probability of transmission over wires)