Published by Kristin Perry. Modified over 9 years ago.
GPU Cluster for Scientific Computing and Large-Scale Simulation
Zhe Fan, Feng Qiu, Arie Kaufman, Suzanne Yoakum-Stover
Center for Visual Computing and Department of Computer Science, Stony Brook University
http://www.cs.sunysb.edu/~vislab/projects/gpgpu/GPU_Cluster/GPU_Cluster.html

Stony Brook Visual Computing Cluster
GPU Cluster
    35 nodes with nVIDIA GeForce FX 5800 Ultra
    Gigabit Ethernet
    70 Pentium Xeon 2.4GHz CPUs
    35 VolumePro 1000
    9 HP Sepia-2A with ServerNet II

LBM on the GPU
Application: large-scale CFD simulations using Lattice Boltzmann Model (LBM)
LBM Computation:
    Particles stream along lattice links
    Particles collide when they meet at a site
Map to GPU:
    Pack 3D lattice states into a series of 2D textures
    Update the lattice with fragment programs

Scale up LBM to the GPU Cluster
Each GPU computes a sub-lattice
Particles stream out of the sub-lattice
    1. Gather particle distributions in a texture
    2. Read out from GPU in a single operation
    3. Transfer through GigaE (MPI)
    4. Write into neighboring GPU nodes
Network performance optimization:
    1. Conduct network transfer while computing
    2. Schedule to reduce the likelihood of interruption
    3. Simplify the connection pattern

Times Square Area of NYC
Flow Streamlines
    0.31 second / step on 30 GPUs
    4.6 times faster than software version on 30 CPUs

Acknowledgements
    NSF CCR0306438
    Department of Homeland Security, Environment Measurement Lab
    HP
    Terarecon

GPU Cluster / CPU Cluster Speedup
    Each node computes an 80 x 80 x 80 sub-lattice
    GeForce FX 5800 Ultra / Pentium Xeon 2.4GHz

Dispersion Plume
    1.66 km x 1.13 km
    91 blocks
    851 buildings
    480 x 400 x 80 lattice
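
The "LBM Computation" slide names two operations, streaming along lattice links and collision at a site. The sketch below illustrates both in plain Python on a tiny periodic lattice. It is not the authors' fragment-program code: the 2D nine-link (D2Q9) lattice, the BGK relaxation rule, the relaxation time tau and all function names are assumptions chosen to keep the example short, whereas the poster describes a 3D simulation.

```python
# D2Q9 link directions (ex, ey) and their standard weights (assumed model).
E = [(0, 0), (1, 0), (0, 1), (-1, 0), (0, -1),
     (1, 1), (-1, 1), (-1, -1), (1, -1)]
W = [4 / 9] + [1 / 9] * 4 + [1 / 36] * 4


def equilibrium(rho, ux, uy):
    """Second-order equilibrium distribution for one site."""
    usq = ux * ux + uy * uy
    feq = []
    for (ex, ey), w in zip(E, W):
        eu = ex * ux + ey * uy
        feq.append(w * rho * (1 + 3 * eu + 4.5 * eu * eu - 1.5 * usq))
    return feq


def collide(f, tau):
    """Relax every site toward its local equilibrium (BGK), in place."""
    for row in f:
        for site in row:
            rho = sum(site)
            ux = sum(fi * ex for fi, (ex, _) in zip(site, E)) / rho
            uy = sum(fi * ey for fi, (_, ey) in zip(site, E)) / rho
            feq = equilibrium(rho, ux, uy)
            for i in range(9):
                site[i] += (feq[i] - site[i]) / tau


def stream(f):
    """Move each distribution one site along its link (periodic edges)."""
    ny, nx = len(f), len(f[0])
    out = [[[0.0] * 9 for _ in range(nx)] for _ in range(ny)]
    for y in range(ny):
        for x in range(nx):
            for i, (ex, ey) in enumerate(E):
                out[(y + ey) % ny][(x + ex) % nx][i] = f[y][x][i]
    return out


def total_mass(f):
    """Sum of all particle distributions over the whole lattice."""
    return sum(sum(site) for row in f for site in row)


# Tiny 8 x 6 lattice: uniform flow with one denser site as a perturbation.
f = [[equilibrium(1.0, 0.05, 0.0) for _ in range(8)] for _ in range(6)]
f[3][4] = equilibrium(1.2, 0.0, 0.0)
mass_before = total_mass(f)
for _ in range(10):
    collide(f, tau=0.8)
    f = stream(f)
mass_after = total_mass(f)
```

Both operations touch only a site and its immediate neighbors, which is the kind of locality that lets a per-site update be written as a fragment program. Comparing mass_before with mass_after is a quick sanity check, since neither step should create or destroy mass.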
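
The "Map to GPU" slide says the 3D lattice states are packed into a series of 2D textures but does not give the layout. One common way to do this is to tile the z-slices side by side in a 2D grid, sketched below in plain Python. The tiling scheme, the tiles_per_row parameter and the names pack_slices and atlas_coords are hypothetical and may differ from what the authors used.

```python
def pack_slices(volume, tiles_per_row):
    """Tile the z-slices of volume[z][y][x] into one 2D grid (list of rows)."""
    nz, ny, nx = len(volume), len(volume[0]), len(volume[0][0])
    tile_rows = -(-nz // tiles_per_row)  # ceiling division
    atlas = [[0] * (tiles_per_row * nx) for _ in range(tile_rows * ny)]
    for z in range(nz):
        r, c = divmod(z, tiles_per_row)  # tile row and column for slice z
        for y in range(ny):
            for x in range(nx):
                atlas[r * ny + y][c * nx + x] = volume[z][y][x]
    return atlas


def atlas_coords(x, y, z, nx, ny, tiles_per_row):
    """Return the 2D position (u, v) that holds lattice site (x, y, z)."""
    r, c = divmod(z, tiles_per_row)
    return c * nx + x, r * ny + y


# Small 2 x 3 x 4 lattice whose values encode their own coordinates as zyx.
nx, ny, nz = 2, 3, 4
volume = [[[100 * z + 10 * y + x for x in range(nx)]
           for y in range(ny)] for z in range(nz)]
atlas = pack_slices(volume, tiles_per_row=2)
u, v = atlas_coords(1, 2, 3, nx, ny, tiles_per_row=2)
print(atlas[v][u])  # 321, the value stored for site x=1, y=2, z=3
```

With a layout like this, a neighbor in x or y is an adjacent texel, while a neighbor in z is found by repeating the address computation for z - 1 or z + 1.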
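
The figures on the slides can be cross-checked with a little arithmetic. The sketch below assumes, without confirmation from the poster, that the 480 x 400 x 80 lattice is the one split into 80 x 80 x 80 sub-lattices and that the 4.6x figure applies to the same 0.31 second step. The helper name decompose is hypothetical.

```python
def decompose(lattice, block):
    """Return the number of blocks along each axis and the total block count."""
    if any(n % b for n, b in zip(lattice, block)):
        raise ValueError("block does not evenly divide the lattice")
    counts = tuple(n // b for n, b in zip(lattice, block))
    total = 1
    for c in counts:
        total *= c
    return counts, total


counts, total = decompose((480, 400, 80), (80, 80, 80))
print(counts, total)  # (6, 5, 1) 30

# Implied CPU-cluster time per step if the GPU cluster is 4.6 times faster.
cpu_step = round(0.31 * 4.6, 2)
print(cpu_step)  # 1.43

# Lattice sites on one 80 x 80 face shared between neighboring sub-lattices.
face_sites = 80 * 80
print(face_sites)  # 6400
```

Under these assumptions the lattice divides into exactly 30 sub-lattices, which matches the 30 GPUs quoted for the Times Square run, and each shared face holds 6400 sites whose outgoing distributions would need to cross the network every step.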