Die Stacking (3D) Microarchitecture -- from Intel Corporation EEL6935 Paper Presentation Wang, Dexiang
3D Stacking Structure:
3D Stacking Recent Application: Embedded processor systems. Goals: low power, small die area. 3D Architecture Forum: “3D Architectures for Semiconductor Integration and Packaging”, dedicated to transforming 3D design research ideas into products.
3D stacking advantages and challenges: 1. Wire length ↓ → latency ↓ 2. Higher inter-die bandwidth 3. Power consumption ↓ (wires consumed 30% of power in the earlier design) Challenge: thermal density may ↑ Trade-off: performance, power, die area and hotspot temperature
Configuration 1: Memory + Logic Original Baseline
Configuration 1: Memory + Logic (cont.) 3D Variations
Configuration 1: Memory + Logic (cont.) Simulation Infrastructure (heavy data access): Multi-threaded application → full-system multi-processor simulator / trace generator → trace record (CPU id, memory access address, instruction pointer address, unique identification number of an earlier trace) → trace-driven multi-processor memory hierarchy simulator → CPMA (cycles per memory access). Benchmarks: RMS (Recognition, Mining and Synthesis), two-threaded. Two categories: applications and kernels. Each benchmark: 1 billion total memory references (2.5 billion executed instructions).
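The last step of the flow above, turning a trace into a CPMA figure, can be sketched as follows. This is a minimal illustration, not the simulator used in the paper: the `TraceRecord` class, the `toy_latency` function and its 4-cycle / 200-cycle costs are all hypothetical, and reading the fourth field as a dependency on an earlier record is an assumption.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional


@dataclass(frozen=True)
class TraceRecord:
    """One memory reference, holding the four fields listed on the slide."""
    cpu_id: int
    address: int                      # memory access address
    instruction_pointer: int          # address of the issuing instruction
    earlier_trace_id: Optional[int]   # id of an earlier record (assumed: a dependency)


def cpma(records: Iterable[TraceRecord],
         latency_of: Callable[[TraceRecord], int]) -> float:
    """Cycles per memory access: total cycles divided by the number of accesses.

    `latency_of` is a hypothetical stand-in for the memory hierarchy
    simulator; it returns the cycle cost of a single access.
    """
    total_cycles = 0
    accesses = 0
    for record in records:
        total_cycles += latency_of(record)
        accesses += 1
    if accesses == 0:
        raise ValueError("trace is empty")
    return total_cycles / accesses


def toy_latency(record: TraceRecord) -> int:
    """Illustrative only: low addresses 'hit' (4 cycles), the rest 'miss' (200)."""
    return 4 if record.address < 0x1000 else 200


trace = [
    TraceRecord(0, 0x0040, 0x400000, None),
    TraceRecord(1, 0x2000, 0x400010, None),
    TraceRecord(0, 0x0080, 0x400004, 0),
    TraceRecord(1, 0x0100, 0x400014, 1),
]
print(cpma(trace, toy_latency))
```

A real memory hierarchy model would replace `toy_latency` with cache and DRAM timing; the averaging step stays the same.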
Configuration 1: Memory + Logic (cont.) Applications Kernels
Configuration 2: Logic + Logic 2D to 3D transform: With a 3D design, the number of pipeline stages can shrink, and power consumption can be reduced by cutting the number of latches, repeaters and similar structures.
Configuration 2: Logic + Logic (Cont.) Simulation Model: Tool: single-threaded microarchitecture performance simulator (developed by the Pentium 4 design team), which models the wire delays due to block interconnections. Benchmarks: over 650 single-threaded benchmark traces (SPECINT, SPECFP, hand-written kernels, multimedia, internet, productivity and workstation applications)
Thermal Model: Chip-generated heat → IHS (integrated heat spreader) / Heat Sink → Forced Convection / Natural Convection
Thermal Formula: Energy equation (based on Fourier’s law): T is a function of space (x, y, z) and time (t). Boundary condition:
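As a reference point, the textbook form of the transient heat-conduction energy equation and a convective boundary condition can be written as below. This is a standard form assumed here, not a transcription of the slide's formulas; the symbols $\rho$ (density), $c_p$ (specific heat), $k$ (thermal conductivity), $Q$ (volumetric heat generation), $h$ (heat transfer coefficient), $n$ (outward surface normal) and $T_{\mathrm{amb}}$ (ambient temperature) are all assumptions.

```latex
\rho\, c_p\, \frac{\partial T}{\partial t}
  = \nabla \cdot \left( k\, \nabla T \right) + Q,
\qquad T = T(x, y, z, t)

-k\, \frac{\partial T}{\partial n}
  = h \left( T - T_{\mathrm{amb}} \right)
\quad \text{on the cooled surface}
```

The first equation combines energy conservation with Fourier's law $\mathbf{q} = -k \nabla T$; the second states that heat conducted to the surface equals heat carried away by convection.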
Thermal simulation method: FEM. The Finite Element Method (FEM) is a numerical method for obtaining approximate solutions to partial differential equations. It partitions the spatial domain into small elements and transforms the differential equations into a large system of linear algebraic equations, assembled from the equations attached to each element. The boundary and initial conditions supply enough spatial and time information to make the solution unique. The smaller the elements, the more accurate the results.
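The FEM steps described above (partition into elements, assemble linear equations, apply boundary conditions, solve) can be sketched on a deliberately small problem. This is a toy 1D steady-state example, not the 3D thermal tool behind the presented results: the equation -k T'' = q, the fixed end temperatures and every numeric value below are assumptions chosen for illustration.

```python
def solve_tridiagonal(lower, diag, upper, rhs):
    """Thomas algorithm. lower[i] multiplies x[i-1], upper[i] multiplies x[i+1]."""
    n = len(diag)
    c = [0.0] * n
    d = [0.0] * n
    c[0] = upper[0] / diag[0]
    d[0] = rhs[0] / diag[0]
    for i in range(1, n):
        denom = diag[i] - lower[i] * c[i - 1]
        c[i] = upper[i] / denom
        d[i] = (rhs[i] - lower[i] * d[i - 1]) / denom
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x


def fem_1d_steady_heat(length, k, q, t_left, t_right, n_elements):
    """Nodal temperatures for -k T'' = q on [0, length] with fixed end temperatures."""
    h = length / n_elements
    n_nodes = n_elements + 1
    lower = [0.0] * n_nodes
    diag = [0.0] * n_nodes
    upper = [0.0] * n_nodes
    rhs = [0.0] * n_nodes
    # Assembly: each linear element contributes the stiffness block
    # (k/h) * [[1, -1], [-1, 1]] and the load vector (q*h/2) * [1, 1].
    for e in range(n_elements):
        a, b = e, e + 1
        diag[a] += k / h
        diag[b] += k / h
        upper[a] -= k / h
        lower[b] -= k / h
        rhs[a] += q * h / 2
        rhs[b] += q * h / 2
    # Boundary conditions: overwrite the first and last rows with T = fixed value.
    diag[0], upper[0], rhs[0] = 1.0, 0.0, t_left
    diag[-1], lower[-1], rhs[-1] = 1.0, 0.0, t_right
    return solve_tridiagonal(lower, diag, upper, rhs)


def exact_temperature(x, length, k, q, t_left, t_right):
    """Closed-form solution of the same problem, used only for comparison."""
    return t_left + (t_right - t_left) * x / length + q / (2 * k) * x * (length - x)


# Hypothetical values in arbitrary consistent units.
temps = fem_1d_steady_heat(length=1.0, k=2.0, q=100.0,
                           t_left=300.0, t_right=350.0, n_elements=4)
for i, t in enumerate(temps):
    x = i * 0.25
    print(x, t, exact_temperature(x, 1.0, 2.0, 100.0, 300.0, 350.0))
```

For this particular problem (constant q, linear elements) the nodal values coincide with the exact solution; in general, refining the mesh reduces the error, which is the point the slide makes about element size.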
Heat Dissipation Sensitivity: Because heat dissipation is more sensitive to the Cu metal layers than to the bonding layer itself, 3D stacking is not a fundamental thermal limitation.
Simulation Parameters for Memory + Logic: Microarchitecture Parameters for Intel Core 2 Duo
Simulation results for Memory + Logic Stacking: On average, increasing the L2 cache from 4MB to 32MB reduces the off-die bandwidth requirement by 3x and CPMA by 13%, with a peak CPMA reduction of 50%.
Simulation results for Memory + Logic Stacking: (cont.) Thermal Analysis. Total power: 92 W. The FP units, RS and load/store units consume more power than the other blocks.
Simulation results for Memory + Logic Stacking: (cont.) 3D-stacked DRAM consumes less power than DDR3 because the die-to-die interconnect needs much less power than traditional off-die I/O.
Simulation results for Memory + Logic Stacking: (cont.)
Simulation results for Logic + Logic Stacking: On average, 25% of all pipe stages are eliminated, resulting in a 15% performance improvement.
Simulation results for Logic + Logic Stacking: (Cont.) The risk of 3D stacking: Careful block placement and iterative optimization lead to 1.3x power density (14 °C temperature increase); the worst case leads to 2.0x power density (26 °C temperature increase).
Trade-off between performance and temperature: Performance can be traded for better thermal conditions: 15% performance gain / 15% power reduction → 14 °C temperature increase; 8% performance gain / 34% power reduction → no peak temperature change; 0% performance gain / 54% power reduction → 22 °C temperature reduction.
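The three operating points above can be treated as a small lookup: given a limit on peak temperature change, pick the point with the highest performance gain that stays within it. The sketch below uses only the three points from the slide; the `OperatingPoint` type and the `best_point` selection rule are hypothetical illustrations, not part of the presented study.

```python
from typing import NamedTuple, Optional


class OperatingPoint(NamedTuple):
    perf_gain_pct: float        # performance gain over the 2D baseline
    power_reduction_pct: float  # power reduction over the 2D baseline
    peak_temp_delta_c: float    # peak temperature change in degrees C


# The three points quoted on the slide.
POINTS = [
    OperatingPoint(15.0, 15.0, 14.0),
    OperatingPoint(8.0, 34.0, 0.0),
    OperatingPoint(0.0, 54.0, -22.0),
]


def best_point(max_temp_delta_c: float) -> Optional[OperatingPoint]:
    """Highest-performance point whose temperature change is within the limit."""
    feasible = [p for p in POINTS if p.peak_temp_delta_c <= max_temp_delta_c]
    if not feasible:
        return None
    return max(feasible, key=lambda p: p.perf_gain_pct)


for limit in (14.0, 0.0, -22.0, -30.0):
    print(limit, best_point(limit))
```

Only these three points are known, so the function does not interpolate between them.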
Question?