Published by Bennett Terry. Modified over 9 years ago.
1
Fast Thermal Analysis on GPU for 3D-ICs with Integrated Microchannel Cooling
Zhuo Feng and Peng Li, Department of Electrical and Computer Engineering, {Michigan Technological, Texas A&M} University. ICCAD 2010
2
Outline
Introduction
Background
GPU-based full-chip thermal analysis with microchannels
Preconditioned iterative method on GPU
Experimental results and conclusions
3
Introduction
Effective thermal management for 3D-ICs is becoming increasingly challenging because of increasing power density and chip design complexity.
Traditional heat sinks are expected to quickly reach their limits in meeting the cooling needs of 3D-ICs.
4
Introduction (cont.)
Integrated on-chip microchannel cooling (i.e. liquid cooling) is considered a very promising solution.
An experiment on a liquid-cooled 2D-IC reduced the peak on-chip temperature from 85℃ to 57℃ and the maximum temperature variation from 25℃ to 6℃.
6
Introduction (cont.)
Existing design and optimization procedures for integrated microchannels are performed without considering full-chip thermal profiles, so they may not provide the most "economic" solution.
Drawbacks of integrated microchannels include design complexity, packaging cost, etc.
Hence, a comprehensive design and optimization flow should be closely coupled with full-chip thermal analysis.
7
Why GPUs?
The finite difference (FD) method is more suitable for general 3D full-chip thermal simulations.
Accurate 3D thermal analysis at full-chip scale using the FD method can be very expensive, because it requires solving a huge linear system of equations with millions of unknowns.
8
Why GPUs? (cont.)
GPU-based parallel computing has been employed in various electronic design automation areas.
Advantages: high computing power for large-scale homogeneous computations (e.g. matrix multiplications) and significantly higher memory bandwidth.
9
Contributions
Proposes novel GPU-based full-chip thermal simulation methods for 3D-ICs with integrated microchannel cooling, built on GPU-friendly data structures and algorithm flows.
Proposes a GPU-friendly two-step block relaxation scheme that integrates block-based vertical-line relaxations and liquid-flow-direction relaxations.
Achieves good speedup: more than 35x faster than the CPU-based solver and more than 360x faster than the direct solver.
10
Background – liquid cooling in 3D ICs
The liquid-cooled microchannels are typically integrated inside a wafer-level package, where the microchannels are connected to the liquid inlets and outlets using fluidic through-silicon vias (TSVs). Heat flux can be removed more effectively than before, since the thermal resistance of such integrated liquid-cooled heat sinks can be much lower than that of traditional fan-cooled heat sinks.
12
Background – finite difference (FD) method
The FD method approximates the solutions of differential equations by replacing derivative expressions with approximately equivalent difference quotients, e.g. f'(x) ≈ (f(x + h) - f(x)) / h for some small h.
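A minimal sketch of the idea, assuming the standard forward difference for a first derivative and the standard central difference for a second derivative (the function names are illustrative and do not come from the slides):

```python
import math

def forward_difference(f, x, h):
    """Approximate f'(x) with the forward difference quotient (f(x+h) - f(x)) / h."""
    return (f(x + h) - f(x)) / h

def central_second_difference(f, x, h):
    """Approximate f''(x) with (f(x+h) - 2 f(x) + f(x-h)) / h^2."""
    return (f(x + h) - 2.0 * f(x) + f(x - h)) / (h * h)

if __name__ == "__main__":
    # The error of the forward difference shrinks roughly in proportion to h.
    for h in (1e-1, 1e-2, 1e-3):
        err = abs(forward_difference(math.sin, 1.0, h) - math.cos(1.0))
        print(f"h={h:g}  error in d/dx sin(x) at x=1: {err:.2e}")
```

The second-difference form is the one that typically appears when a heat-conduction PDE is discretized on a grid.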
13
Background – full-chip thermal simulation
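Discretize the PDE of the original thermal circuit analysis problem with the FD method, then solve GT = b, where G is the thermal conductance matrix, T is the vector of unknown node temperatures, and b carries the information about the environment (heat sources and boundary conditions).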
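To make the structure of GT = b concrete, here is a small sketch under strong simplifying assumptions: four thermal nodes in a line, equal conductances between neighbours, and one conductance from node 0 to an ambient at a fixed temperature. The helper names (`build_chain`, `solve_dense`) and all numeric values are hypothetical; a real full-chip problem is 3D, sparse, and far too large for the dense elimination used here.

```python
def build_chain(n, g_link, g_amb, power, t_amb):
    """Assemble G and b for n nodes in a line; node 0 is tied to ambient through g_amb."""
    G = [[0.0] * n for _ in range(n)]
    b = list(power)                      # heat injected at each node
    for i in range(n - 1):               # conductance between neighbours i and i+1
        G[i][i] += g_link
        G[i + 1][i + 1] += g_link
        G[i][i + 1] -= g_link
        G[i + 1][i] -= g_link
    G[0][0] += g_amb                     # boundary condition: path to ambient
    b[0] += g_amb * t_amb
    return G, b

def solve_dense(G, b):
    """Gaussian elimination with partial pivoting (only sensible for tiny systems)."""
    n = len(b)
    A = [row[:] + [rhs] for row, rhs in zip(G, b)]
    for k in range(n):
        p = max(range(k, n), key=lambda r: abs(A[r][k]))
        A[k], A[p] = A[p], A[k]
        for r in range(k + 1, n):
            m = A[r][k] / A[k][k]
            for c in range(k, n + 1):
                A[r][c] -= m * A[k][c]
    T = [0.0] * n
    for k in reversed(range(n)):
        s = sum(A[k][c] * T[c] for c in range(k + 1, n))
        T[k] = (A[k][n] - s) / A[k][k]
    return T

if __name__ == "__main__":
    G, b = build_chain(4, 2.0, 1.0, [0.0, 0.0, 0.0, 3.0], 50.0)
    print(solve_dense(G, b))
```

With 3 units of heat injected at the far node, the temperatures rise step by step away from the ambient connection (53, 54.5, 56 and 57.5 for these illustrative values), which is the expected behaviour of a series thermal path.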
14
Background – GPU programming
15
Architecture of NVIDIA GTX 280
A collection of 30 multiprocessors, with 8 streaming processors each.
The 30 multiprocessors share one off-chip global memory. Access time: about 300 clock cycles.
Each multiprocessor has an on-chip memory shared by its 8 streaming processors. Access time: 2 clock cycles.
16
Some differences between GPU and CPU
Metric | GPU (NVIDIA GeForce 8800 GTX) | CPU (Intel Pentium 4)
FLOPS | 345.6G | ~12G
Memory bandwidth | 86.4 GB/s (900 MHz memory clock, 384-bit interface, 2 issues) | 6.4 GB/s (800 MHz memory clock, 32-bit interface, 2 issues)
Access time of global memory | Slow (about 500 memory clock cycles) | Fast (about 5 memory clock cycles)
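The bandwidth figures follow from the clock, interface width, and issue count. The sketch below reproduces that arithmetic, assuming "2 issues" means two transfers per memory clock; the function name is illustrative.

```python
def peak_bandwidth_gb_s(clock_mhz, bus_bits, transfers_per_clock):
    """Peak bandwidth in GB/s = clock * (bus width in bytes) * transfers per clock."""
    return clock_mhz * 1e6 * (bus_bits / 8) * transfers_per_clock / 1e9

if __name__ == "__main__":
    gpu = peak_bandwidth_gb_s(900, 384, 2)   # GeForce 8800 GTX figures from the table
    cpu = peak_bandwidth_gb_s(800, 32, 2)    # Pentium 4 figures from the table
    print(f"GPU: {gpu:.1f} GB/s, CPU: {cpu:.1f} GB/s, ratio: {gpu / cpu:.1f}x")
```

Under that assumption the two results match the 86.4 GB/s and 6.4 GB/s quoted above, a ratio of 13.5x.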
17
GPU-based full-chip thermal analysis with microchannels
Many factors must be considered to obtain the most "economic" microchannel designs: pumping power, placement, sizing, and so on.
Fine-grained thermal modeling and analysis that includes microchannel cooling is non-trivial because of the high modeling complexity and simulation costs (model extraction cost and thermal simulation cost).
The characteristics of this workload are a good match for the GPU.
18
The proposed two-step block relaxation scheme
Considers heat dissipation in two directions (Z and Y).
19
Details
In the first step, the nodes that belong to a block of vertical lines are selected for relaxation (lines L1 to L3 shown in Fig. 4). Such relaxations allow fast solution updates along the vertical heat dissipation paths within the block.
In the second step, a few relaxations in the microchannel routing direction (liquid-flow direction) are performed to allow solution updates in the liquid-flow direction.
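The two steps can be illustrated with a simplified alternating line relaxation on a small 2D grid, where z is the vertical direction and y stands in for the liquid-flow direction. This is only a sketch under stated assumptions, not the authors' GPU scheme: it uses a pure-conduction model with uniform conductances `gz` and `gy`, zero temperature outside the grid, sequential (not parallel) updates, and hypothetical function names.

```python
def thomas(lower, diag, upper, rhs):
    """Solve a tridiagonal system; lower[0] and upper[-1] are ignored."""
    n = len(rhs)
    c, d = [0.0] * n, [0.0] * n
    c[0], d[0] = upper[0] / diag[0], rhs[0] / diag[0]
    for i in range(1, n):
        m = diag[i] - lower[i] * c[i - 1]
        c[i] = upper[i] / m
        d[i] = (rhs[i] - lower[i] * d[i - 1]) / m
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x

def get(T, z, y):
    """Temperature at (z, y); nodes outside the grid are held at zero."""
    return T[z][y] if 0 <= z < len(T) and 0 <= y < len(T[0]) else 0.0

def relax_vertical_lines(T, b, gz, gy, block):
    """Step 1: solve each vertical (z) line of the block exactly, neighbours held fixed."""
    nz = len(T)
    for y in block:
        rhs = [b[z][y] + gy * (get(T, z, y - 1) + get(T, z, y + 1)) for z in range(nz)]
        col = thomas([-gz] * nz, [2 * gz + 2 * gy] * nz, [-gz] * nz, rhs)
        for z in range(nz):
            T[z][y] = col[z]

def relax_flow_lines(T, b, gz, gy):
    """Step 2: solve each line along the assumed flow direction y."""
    nz, ny = len(T), len(T[0])
    for z in range(nz):
        rhs = [b[z][y] + gz * (get(T, z - 1, y) + get(T, z + 1, y)) for y in range(ny)]
        T[z] = thomas([-gy] * ny, [2 * gz + 2 * gy] * ny, [-gy] * ny, rhs)

def residual_norm(T, b, gz, gy):
    """Euclidean norm of b - G*T for the 5-point conduction stencil."""
    total = 0.0
    for z in range(len(T)):
        for y in range(len(T[0])):
            gt = ((2 * gz + 2 * gy) * T[z][y]
                  - gz * (get(T, z - 1, y) + get(T, z + 1, y))
                  - gy * (get(T, z, y - 1) + get(T, z, y + 1)))
            total += (b[z][y] - gt) ** 2
    return total ** 0.5

def two_step_sweep(T, b, gz, gy, block_width=3):
    """One sweep: vertical-line relaxation block by block, then flow-direction lines."""
    ny = len(T[0])
    for start in range(0, ny, block_width):
        relax_vertical_lines(T, b, gz, gy, range(start, min(start + block_width, ny)))
    relax_flow_lines(T, b, gz, gy)

if __name__ == "__main__":
    T = [[0.0] * 8 for _ in range(4)]
    b = [[1.0] * 8 for _ in range(4)]
    for sweep in range(5):
        two_step_sweep(T, b, 4.0, 1.0)
        print(sweep, residual_norm(T, b, 4.0, 1.0))
```

Solving whole lines at once propagates information along a full line in a single step, which is why line relaxations are attractive when coupling is strong in one direction.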
20
But why?
The efficiency of typical iterative methods usually depends on the efficiency of the sparse matrix-vector operations and on the effectiveness of the relaxation (iteration) scheme.
Existing iterative algorithms focus only on vertical heat dissipation.
Horizontal (in-plane) heat dissipation is negligible in traditional 2D ICs because of the relatively small thermal conductance in that direction, but this is not the case in 3D-ICs.
21
Preconditioned iterative method on GPU
Two critical issues affect run time: the matrix representation format and the convergence rate of the iterative method.
The approach uses an ELL-like format and a preconditioning technique.
22
Matrix representation format
GPU-based computations should guarantee that most global memory accesses are coalesced, so the data structure and its memory access pattern must be designed carefully.
Three 1D vectors fully represent the sparse matrix and fit memory coalescing: the diagonal elements, the off-diagonal elements, and the indices of the off-diagonal elements.
This gives a 2x to 3x speedup compared with the CSR format.
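The three-vector layout can be sketched as follows. This is an illustration of a generic ELL-like format, assuming the off-diagonal entries are padded to a fixed width per row and stored slot by slot, so that consecutive rows (GPU threads) read consecutive addresses; the function names are hypothetical and the exact layout in the original work may differ.

```python
def to_ell_like(dense):
    """Split a square matrix into diagonal, off-diagonal values, and their column indices."""
    n = len(dense)
    width = max(sum(1 for j, v in enumerate(row) if j != i and v != 0.0)
                for i, row in enumerate(dense))
    diag = [dense[i][i] for i in range(n)]
    off_val = [0.0] * (n * width)        # padding entries stay 0.0
    off_idx = [0] * (n * width)          # padding points at column 0 and contributes nothing
    for i, row in enumerate(dense):
        k = 0
        for j, v in enumerate(row):
            if j != i and v != 0.0:
                off_val[k * n + i] = v   # slot-major: slot k of all rows is contiguous
                off_idx[k * n + i] = j
                k += 1
    return diag, off_val, off_idx, width

def spmv(diag, off_val, off_idx, width, x):
    """Compute y = A x; on a GPU each row i would be handled by one thread."""
    n = len(diag)
    y = [diag[i] * x[i] for i in range(n)]
    for k in range(width):
        for i in range(n):
            y[i] += off_val[k * n + i] * x[off_idx[k * n + i]]
    return y

if __name__ == "__main__":
    A = [[4.0, -1.0, 0.0],
         [-1.0, 4.0, -1.0],
         [0.0, -1.0, 4.0]]
    print(spmv(*to_ell_like(A), [1.0, 2.0, 3.0]))
```

FD thermal matrices have nearly the same number of off-diagonal entries in every row, so the padding overhead of such a fixed-width format is small.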
23
Example
24
Conjugate gradient (CG) method
The CG method is an algorithm for the numerical solution of particular systems of linear equations, namely those whose matrix is symmetric and positive-definite. CG is an iterative method, so it can be applied to sparse systems that are too large to be handled by direct methods such as the Cholesky decomposition. Such systems often arise when numerically solving partial differential equations.
Solving Ax = b is equivalent to minimizing the quadratic function f(x) = (1/2) x^T A x - x^T b.
Assuming exact arithmetic, CG converges in at most n steps, where n is the size of the matrix of the system (n = 2 in the slide's example).
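A minimal sketch of textbook CG on a 2x2 symmetric positive-definite system, to show the "at most n steps" property. The matrix and right-hand side are illustrative values, not data from the slides.

```python
def matvec(A, x):
    return [sum(a * v for a, v in zip(row, x)) for row in A]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def conjugate_gradient(A, b, tol=1e-10, max_iter=None):
    """Textbook CG for a symmetric positive-definite A, starting from x = 0."""
    n = len(b)
    x = [0.0] * n
    r = b[:]                    # residual b - A x for x = 0
    p = r[:]                    # first search direction
    rs = dot(r, r)
    iters = 0
    for _ in range(max_iter or n):
        if rs ** 0.5 < tol:
            break
        Ap = matvec(A, p)
        alpha = rs / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = dot(r, r)
        p = [ri + (rs_new / rs) * pi for ri, pi in zip(r, p)]
        rs = rs_new
        iters += 1
    return x, iters

if __name__ == "__main__":
    A = [[4.0, 1.0], [1.0, 3.0]]
    b = [1.0, 2.0]
    x, iters = conjugate_gradient(A, b)
    print(x, iters)             # exact solution is (1/11, 7/11)
```

For this system the exact solution is (1/11, 7/11), and the loop is capped at n = 2 iterations.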
25
Preconditioning
The conjugate gradient (CG) method takes too many iterations because the matrix is usually ill-conditioned, i.e. it has a large condition number.
Moreover, the total runtime of a preconditioned solver can be even greater than that of plain CG if the preconditioner is ineffective or expensive to apply, even though the number of iterations is smaller.
Three methods are compared: CG, diagonal preconditioned CG (DPCG), and multi-grid preconditioned CG (MGPCG).
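The difference between CG and diagonal preconditioned CG can be tried on a small, deliberately badly scaled system. The sketch below assumes a Jacobi (diagonal) preconditioner M = diag(A) and a hypothetical test matrix (a 1D Laplacian with graded row and column scaling); the iteration counts it prints depend on that choice and say nothing about the thermal matrices in the original work.

```python
def matvec(A, x):
    return [sum(a * v for a, v in zip(row, x)) for row in A]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def pcg(A, b, inv_diag=None, tol=1e-10, max_iter=500):
    """Plain CG when inv_diag is None, else CG preconditioned with M^-1 = diag(inv_diag)."""
    if inv_diag is None:
        apply_m = lambda r: r[:]
    else:
        apply_m = lambda r: [d * ri for d, ri in zip(inv_diag, r)]
    x = [0.0] * len(b)
    r = b[:]
    z = apply_m(r)
    p = z[:]
    rz = dot(r, z)
    b_norm = dot(b, b) ** 0.5
    for it in range(max_iter):
        if dot(r, r) ** 0.5 <= tol * b_norm:   # relative residual stopping test
            return x, it
        Ap = matvec(A, p)
        alpha = rz / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        z = apply_m(r)
        rz_new = dot(r, z)
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x, max_iter

def scaled_laplacian(n):
    """Hypothetical ill-conditioned SPD test matrix: S * tridiag(-1, 2, -1) * S."""
    s = [10.0 ** (2.0 * i / (n - 1)) for i in range(n)]
    A = [[0.0] * n for _ in range(n)]
    for i in range(n):
        A[i][i] = 2.0 * s[i] * s[i]
        if i + 1 < n:
            A[i][i + 1] = A[i + 1][i] = -s[i] * s[i + 1]
    return A

if __name__ == "__main__":
    n = 20
    A, b = scaled_laplacian(n), [1.0] * n
    _, it_cg = pcg(A, b)
    _, it_dp = pcg(A, b, inv_diag=[1.0 / A[i][i] for i in range(n)])
    print("CG iterations:", it_cg, " DPCG iterations:", it_dp)
```

A diagonal preconditioner costs only one multiplication per unknown per iteration, which is why it is a natural baseline against a more expensive multi-grid preconditioner.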
26
Preconditioning (cont.)
Preconditioning is the application of a transformation, called the preconditioner, that conditions a given problem into a form that is more suitable for numerical solution.
Topics: the preconditioned system, preconditioned iterative methods, and practical preconditioners.
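For reference, the usual textbook form of a (left-)preconditioned system, written here for GT = b with a preconditioner M; this is the standard formulation and is assumed, not confirmed, to be what the slide showed:

```latex
M^{-1} G \, T = M^{-1} b, \qquad M \approx G, \qquad \kappa\!\left(M^{-1} G\right) \ll \kappa(G)
```

A practical preconditioner balances two goals: M should be close enough to G to reduce the condition number, and M^{-1} should be cheap to apply. The diagonal preconditioner takes M = diag(G).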
27
Multi-grid preconditioner
The details are not entirely clear, but the idea is to coarsen the grid to reduce complexity.
28
Experimental results
Environment: Intel Core 2 Quad 2.66 GHz with one NVIDIA GeForce GTX 285; DRAM: 6 GB for the CPU, 2 GB for the GPU; C++ and CUDA on Linux.
Inlet water temperature: 50℃.
Benchmarks: a set of 3D designs, each stacking six 2D dies.
Convergence criterion of the iterative solver: residual norm < 10^-6. The resulting error is negligible.
29
Experimental results (cont.)
The traditional smoother is vertical-line smoothing. The proposed method achieves a significant speedup of at least 35x.
30
Conclusions
Proposes GPU-based thermal simulation methods for 3D-ICs with integrated liquid-cooled microchannels.
Introduces a GPU-friendly two-step block-based relaxation scheme.
Delivers highly accurate results with significant speedup.