The Interconnect Delay Bottleneck
Interconnect delay
Wire delay does not scale with technology: the relative delay is growing even for optimized interconnects. At least, this is what the literature says. Is it true?
Experimental Setup
The link performance was explored with an experimental setup consisting of a 2-switch test architecture (Switch 0 connected to Switch 1 by a link of variable length), tuning the following physical synthesis parameters:
- Link length spanning from 1.5mm to 10mm
- Up to 9 pipeline stages inserted on the link
- Channel width ranging from 250um down to 10um, by means of non-routable obstructions
- Target frequency for synthesis: from 250MHz to 1GHz
- Two technology libraries: Low-Power Low-Vth 65nm and Low-Power Std-Vth 45nm (chosen so that buffers have almost the same delay and the net impact of wire parasitics stands out)
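As a purely illustrative sketch of this design space (not the authors' actual scripts; the identifiers and the intermediate sweep values are hypothetical), the explored parameter grid could be enumerated as follows:

```python
from itertools import product

# Sweep endpoints taken from the slide; the intermediate channel-width and
# frequency points are illustrative placeholders, not values from the experiments.
link_lengths_mm   = [1.5, 3, 8, 10]          # inter-switch spacing
pipeline_stages   = range(0, 10)             # up to 9 relay stages on the link
channel_widths_um = [250, 100, 50, 10]       # constrained via non-routable obstructions
target_freqs_mhz  = [250, 500, 750, 1000]    # synthesis timing targets
libraries         = ["LP_LVT_65nm", "LP_SVT_45nm"]

# Enumerate every configuration that would be pushed through synthesis and place&route.
for length, stages, width, freq, lib in product(
        link_lengths_mm, pipeline_stages, channel_widths_um, target_freqs_mhz, libraries):
    run_id = f"{lib}_L{length}mm_S{stages}_W{width}um_F{freq}MHz"
    print(run_id)   # e.g. hand this tag to the flow that generates the constraints
```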
Link Performance (1)
[Figure: achieved link frequency versus link length]
The performance of the link degrades as the inter-switch spacing increases. In 65nm, even a loose target of 250MHz is not achieved for 8mm links, while 1GHz is hardly affordable at 1.5mm. In 45nm, the synthesis tool does not even reach the 65nm performance for the shorter links.
Link Buffer Distribution
Place&route in 45nm required a much higher number of buffer cells with high driving strength. The physical properties of on-chip interconnects in 45nm are responsible for the performance degradation!
Link Performance (2)
Let us now use a more modern synthesis flow based on placement-aware logic synthesis (hereafter named "topographical synthesis").
[Figure: achieved link frequency versus link length with topographical synthesis]
A relevant performance speedup is achievable with the topographical approach: the 45nm library outperforms the 65nm library for long links and aggressive speeds. Awareness of back-end information becomes a must in 45nm and beyond.
Link Pipelining
The number of pipeline stages required to meet the 1GHz target speed on the link was determined with incremental place&route steps (a sketch of the stage placement follows below):
- 45nm library: 1, 2, 7, 9 stages for 1.5mm, 3mm, 8mm and 10mm links respectively
- 65nm library: 8
Pipeline stages are inserted manually so as to break the link into segments of equal length. Interestingly, the trend for the 45nm and the 65nm library is the same!
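As a purely illustrative sketch (not the actual tool flow), the manual pipelining step amounts to cutting the link into equal-length segments and placing a relay register at each cut point; the function below and its interface are hypothetical:

```python
def pipeline_cut_points(link_length_mm: float, n_stages: int) -> list[float]:
    """Positions (in mm from Switch 0) of the pipeline registers, chosen so that
    n_stages registers split the link into n_stages + 1 equal-length segments."""
    segment = link_length_mm / (n_stages + 1)
    return [round(segment * (i + 1), 3) for i in range(n_stages)]

# Example: the 45nm, 10mm, 1GHz case from the table above (9 stages).
print(pipeline_cut_points(10.0, 9))   # -> [1.0, 2.0, ..., 9.0], i.e. 1mm segments
```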
Gate delay
Let S be the scaling factor (S = 0.7). The device delay is t_d = (C_L · V) / I, where C_L is the load capacitance, V the voltage swing of interest, and I the drive current of the device.
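A worked version of the ideal (constant-field) scaling argument implied here; the per-term scaling factors are the textbook constant-field assumptions, not values stated explicitly on the slide:

```latex
% Ideal (constant-field) scaling by S (S = 0.7): dimensions, voltage and current shrink by S.
\begin{align*}
t_d &= \frac{C_L \cdot V}{I}                        && \text{device delay} \\
C_L \to S\,C_L,\quad V &\to S\,V,\quad I \to S\,I   && \text{per-generation scaling} \\
t_d &\to \frac{(S\,C_L)(S\,V)}{S\,I} = S\,t_d       && \text{gate delay improves by } S
\end{align*}
```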
Gate delay
Shrinking of geometries yields power and delay reduction at constant power density.
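The power and power-density claims follow from the same constant-field assumptions; again a textbook derivation rather than content shown on the slide:

```latex
% Power and power density under ideal scaling by S:
\begin{align*}
P &= V \cdot I \;\to\; (S\,V)(S\,I) = S^{2}\,P            && \text{power per gate drops as } S^{2} \\
A &\to S^{2}\,A                                           && \text{gate area drops as } S^{2} \\
\frac{P}{A} &\to \frac{S^{2}\,P}{S^{2}\,A} = \frac{P}{A}  && \text{power density stays constant}
\end{align*}
```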
Ideal scaling of MOS transistors
Smaller interconnects yield larger delays due to the decreasing cross-sectional area.
[Figure: wire of width W, thickness H and length L over a dielectric of thickness t_di with permittivity ε, on top of the substrate]
This is a very high-level model which neglects sidewall coupling and fringing capacitances; the relevant material parameters are the dielectric thickness t_di, the permittivity ε and the wire resistivity ρ (the first-order formulas are worked out below).
There are two interconnect scaling scenarios:
- Local interconnects (10-500 um at 0.18 um): length scale set by the size of a gate
- Global interconnects: length scale set by functional unit size and chip edge
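A worked version of the first-order parallel-plate wire model sketched above (the standard model; the slide itself only names the quantities):

```latex
% First-order wire model (no fringing, no sidewall coupling):
\begin{align*}
R &= \frac{\rho\,L}{W\,H}                          && \text{resistance: resistivity over cross-section} \\
C &= \frac{\varepsilon\,W\,L}{t_{di}}              && \text{parallel-plate capacitance to the layer below} \\
R\,C &= \frac{\rho\,\varepsilon\,L^{2}}{H\,t_{di}} && \text{wire delay metric, } \propto L^{2}
\end{align*}
```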
Interconnect scaling
Let S be the scaling factor (S = 0.7). Ideal scaling: horizontal and vertical dimensions are equally scaled:
- W and wire spacing scale by S, to preserve packing density
- H and t_di scale by S, for process integration
- Local wire length L scales by S, driven by gate shrinking
The resistance per unit length suffers a bad degradation, and the higher current density causes reliability problems; the RC delay of a local wire, however, stays constant in spite of the scaling trend, which is tolerable (see the derivation below).
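Plugging the ideal scaling factors into the wire model above (a standard derivation; the intermediate steps are not on the slide):

```latex
% Ideal scaling of a local wire: W, H, t_di and L all scale by S.
\begin{align*}
R &= \frac{\rho L}{W H} \;\to\; \frac{\rho (S L)}{(S W)(S H)} = \frac{R}{S}
  && \text{total resistance grows} \\
C &= \frac{\varepsilon W L}{t_{di}} \;\to\; \frac{\varepsilon (S W)(S L)}{S\,t_{di}} = S\,C
  && \text{total capacitance shrinks} \\
R\,C &\to \frac{R}{S}\cdot S\,C = R\,C
  && \text{local RC delay stays constant}
\end{align*}
```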
Interconnect scaling
Improvement by means of quasi-ideal scaling: wires are scaled more in the horizontal than in the vertical direction, so that the RC delay tracks S closer and packing density is preserved:
- W and spacing still scale by S, to preserve packing density
- H is scaled less, to reduce resistance
- t_di is scaled less, to keep capacitance limited
Resistance degrades less than with ideal scaling; capacitance should scale only slightly once sidewall capacitance is accounted for; overall, the RC delay tracks S closer.
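A quantitative sketch under the common textbook assumption that vertical dimensions (H, t_di) scale as sqrt(S) while horizontal dimensions and local wire length scale as S; these exponents are an assumption, not figures from the slide:

```latex
% Quasi-ideal scaling of a local wire (assumed: W, spacing, L -> S; H, t_di -> sqrt(S)).
\begin{align*}
R &= \frac{\rho L}{W H} \;\to\; \frac{\rho (S L)}{(S W)(\sqrt{S}\,H)} = \frac{R}{\sqrt{S}}
  && \text{resistance grows more slowly than the ideal-scaling } R/S \\
C_{plate} &= \frac{\varepsilon W L}{t_{di}} \;\to\; S^{3/2}\,C_{plate},
\qquad C_{side} \propto \frac{H L}{\text{spacing}} \;\to\; \sqrt{S}\,C_{side}
  && \text{plate term shrinks fast, sidewall term less} \\
R\,C &\to \text{between } S \text{ and } 1
  && \text{closer to } S \text{ than the constant RC of ideal scaling}
\end{align*}
```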
Interconnect scaling
Ideal scaling, global wires: horizontal and vertical dimensions are equally scaled to preserve packing density, but the length of a global wire does not shrink; it rather increases with die size. The resulting RC degradation is not tolerable.
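The same wire model applied to a global wire whose length L_g does not scale (and may even grow with the die edge); a standard derivation, with intermediate steps not on the slide:

```latex
% Ideal scaling applied to a global wire: W, H, t_di -> S, while L_g stays fixed (or grows).
\begin{align*}
R &= \frac{\rho L_g}{W H} \;\to\; \frac{\rho L_g}{(S W)(S H)} = \frac{R}{S^{2}}
  && \text{resistance degrades quadratically} \\
C &= \frac{\varepsilon W L_g}{t_{di}} \;\to\; \frac{\varepsilon (S W) L_g}{S\,t_{di}} = C
  && \text{capacitance unchanged} \\
R\,C &\to \frac{R\,C}{S^{2}}
  && \text{global RC delay grows as } 1/S^{2}\text{, worse if } L_g \text{ grows with die size}
\end{align*}
```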
Interconnect scaling
Improvement by means of constant-dimension scaling: by maintaining wide and thick wires at the higher metal levels, the RC delay can be controlled, at the cost of a penalized routing density:
- Interconnect size unaffected (W, H, t_di do not scale)
- Resistance per unit length constant, thanks to the constant cross-section area
- Capacitance per unit length constant, thanks to the constant width and ILD thickness
- Only the impact of the increased wirelength remains
Much better!! But this is still a reverse scaling!!
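Worked out with the same model, keeping W, H and t_di fixed at the top metal levels (a standard derivation, not shown explicitly on the slide):

```latex
% Constant-dimension (reverse) scaling of a global wire: W, H, t_di unchanged.
\begin{align*}
r &= \frac{\rho}{W H} \;\to\; r,
\qquad c = \frac{\varepsilon W}{t_{di}} \;\to\; c
  && \text{per-unit-length R and C stay constant} \\
R\,C &= r\,c\,L_g^{2}
  && \text{delay now grows only with the (increasing) wirelength } L_g
\end{align*}
```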
0.13 um Cu interconnect stack
A cross-layer concern
The physical-layer tricks documented above are complemented by other techniques to tackle the interconnect delay bottleneck:
- Migration to new bus architectures
- Link pipelining
- Placement-aware logic synthesis