
1 Does HPC really fit GPUs? Davide Rossetti, INFN – Roma (davide.rossetti@roma1.infn.it), APE group. Incontro di lavoro della CCR, Napoli, 26-27 January 2010

2 A case study: Lattice QCD. From V. Lubicz – CSN4 talk, September 2009

3 A case study: Lattice QCD
● The most valuable product is the gauge configuration
● Different types: N_f, schemes
● Different sizes
● A grid-enabled community (www.ildg.org):
  ● Storage sites
  ● Production sites
  ● Analysis sites
● Gauge configuration production is really expensive!

4 HPC in INFN
● Focus on compute-intensive physics (excluding LHC computing): LQCD, Astro, Nuclear, Medical
● Needs for 2010-2015:
  ● ~0.01-1 Pflops for a single research group
  ● ~0.1-10 Pflops nationwide
● This translates to:
  ● Big infrastructure (cooling, power, …)
  ● High procurement costs (€/Gflops)
  ● High maintenance costs (W/Gflops)

5 LQCD on GPU?
● The story begins with video games (Egri, Fodor et al., 2006)
● Wilson-Dirac operator at 120 Gflops (K. Ogawa, 2009)
● Domain-wall fermions (Tsukuba/Taiwan, 2009)
● Definitive work: the QUDA library (M. A. Clark et al., 2009):
  o double, single, and half precision
  o half-precision solver with reliable updates at > 100 Gflops (the idea is sketched below)
  o MIT/X11 Open Source License
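
To make the "reliable updates" bullet concrete, here is a minimal CPU-only sketch of the general idea, not QUDA's implementation: run the conjugate-gradient recursion in low precision and periodically recompute the true residual in higher precision so that accumulated rounding error does not derail the solve. The dense test matrix, the RELIABLE_FREQ parameter, and all names are illustrative assumptions.

```c
/* Toy mixed-precision CG with "reliable updates": the inner matvec is
 * accumulated in float (standing in for the half-precision GPU solve),
 * and every RELIABLE_FREQ iterations the true residual r = b - A*x is
 * recomputed in double and swapped in for the recursive one. */
#include <stdio.h>

#define N 64               /* toy problem size */
#define RELIABLE_FREQ 10   /* recompute the true residual every 10 iterations */

static double A[N][N], b[N];

/* y = A*x accumulated in float: the cheap, low-precision operator */
static void matvec_lo(const double *x, double *y)
{
    for (int i = 0; i < N; i++) {
        float acc = 0.0f;
        for (int j = 0; j < N; j++)
            acc += (float)A[i][j] * (float)x[j];
        y[i] = acc;
    }
}

/* y = A*x in full double precision, used only at the reliable updates */
static void matvec_hi(const double *x, double *y)
{
    for (int i = 0; i < N; i++) {
        double acc = 0.0;
        for (int j = 0; j < N; j++)
            acc += A[i][j] * x[j];
        y[i] = acc;
    }
}

static double dot(const double *x, const double *y)
{
    double s = 0.0;
    for (int i = 0; i < N; i++)
        s += x[i] * y[i];
    return s;
}

int main(void)
{
    double x[N] = { 0 }, r[N], p[N], Ap[N];

    /* simple symmetric positive-definite test system A*x = b */
    for (int i = 0; i < N; i++) {
        b[i] = 1.0;
        for (int j = 0; j < N; j++)
            A[i][j] = (i == j) ? (double)N : 1.0;
    }
    for (int i = 0; i < N; i++)
        r[i] = p[i] = b[i];

    double rr = dot(r, r), tol2 = 1e-10 * dot(b, b);   /* stop at |r|/|b| < 1e-5 */

    for (int k = 1; k <= 1000 && rr > tol2; k++) {
        matvec_lo(p, Ap);                       /* cheap low-precision step      */
        double alpha = rr / dot(p, Ap);
        for (int i = 0; i < N; i++) x[i] += alpha * p[i];
        for (int i = 0; i < N; i++) r[i] -= alpha * Ap[i];

        if (k % RELIABLE_FREQ == 0) {           /* reliable update:              */
            matvec_hi(x, Ap);                   /* true residual r = b - A*x     */
            for (int i = 0; i < N; i++) r[i] = b[i] - Ap[i];
        }

        double rr_new = dot(r, r);
        for (int i = 0; i < N; i++) p[i] = r[i] + (rr_new / rr) * p[i];
        rr = rr_new;
    }
    printf("final |r|^2 = %g\n", rr);
    return 0;
}
```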

6 INFN on GPUs
● 2D spin models (Di Renzo et al., 2008)
● LQCD staggered fermions on Chroma (Cossu, D'Elia et al., Ge+Pi, 2009)
● Bio-computing on GPU (Salina, Rossi et al., ToV, 2010?)
● Gravitational wave analysis (Bosi, Pg, 2010?)
● Geant4 on GPU (Caccia, Rm, 2010?)

7 How many GPUs?
Raw estimate of the memory footprint (a back-of-the-envelope sketch follows after the table):
● Full solver in GPU memory
● Gauge field + 15 fermion fields
● No symmetry tricks
● No half-precision tricks

Lattice size | Single-precision memory (GiB) | Double-precision memory (GiB) | # GTX280 | # Tesla C1060 | # Tesla C2070
24³×48       | 1                             | 2.1                           | 3        | 1             | 1
32³×64       | 3.3                           | 6.7                           | 4-8      | 2             | 1-2
48³×96       | 17                            | 34                            | 17-35    | 5-9           | 3-6
64³×128      | 54                            | 108                           | 55-110   | 14-28         | 9-18
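
The footprint can be reproduced with a short calculation. The per-site counts below (4 SU(3) links of 18 reals each, 24 reals per fermion-field site, one gauge field plus 15 fermion fields) are assumptions consistent with the table, not numbers stated on the slide; the per-card counts are plain ceilings over 1 GiB (GTX280), 4 GiB (Tesla C1060) and 6 GiB (Tesla C2070), and come out close to, though not always identical to, the slide's ranges, which presumably include some working-space overhead.

```c
/* Back-of-the-envelope version of the table above. */
#include <stdio.h>
#include <math.h>

int main(void)
{
    const long   dims[][2]      = { {24, 48}, {32, 64}, {48, 96}, {64, 128} };
    const double reals_per_site = 4 * 18 + 15 * 24;   /* gauge + 15 fermion fields */
    const double GiB            = 1024.0 * 1024.0 * 1024.0;
    const double card_mem[]     = { 1.0, 4.0, 6.0 };  /* GiB per card */
    const char  *card_name[]    = { "GTX280", "C1060", "C2070" };

    for (int i = 0; i < 4; i++) {
        double sites = (double)dims[i][0] * dims[i][0] * dims[i][0] * dims[i][1];
        double sp = sites * reals_per_site * sizeof(float)  / GiB;
        double dp = sites * reals_per_site * sizeof(double) / GiB;

        printf("%2ld^3 x %-3ld  SP %6.1f GiB  DP %6.1f GiB ",
               dims[i][0], dims[i][1], sp, dp);
        for (int c = 0; c < 3; c++)                    /* # GPUs, SP-DP range */
            printf("  %s: %.0f-%.0f", card_name[c],
                   ceil(sp / card_mem[c]), ceil(dp / card_mem[c]));
        printf("\n");
    }
    return 0;
}
```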

8 If one GPU is not enough
Multi-GPU, the Fastra II* approach:
● Stick 13 GPUs together
● 12 Tflops @ 2 kW
● CPU threads feed the GPU kernels (see the sketch below)
● Embarrassingly parallel → great!
● The full problem fits → good!
● Enjoy the warm weather
* University of Antwerp, Belgium
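
A minimal sketch of the "CPU threads feed GPU kernels" pattern for an embarrassingly parallel workload: one POSIX thread per device, each binding to its GPU with cudaSetDevice and running an independent kernel on its own slice of host data. This is an illustration, not the Fastra II software; the kernel, buffer sizes, and names are placeholders, and error checking is omitted.

```c
#include <stdio.h>
#include <stdlib.h>
#include <pthread.h>
#include <cuda_runtime.h>

#define N_PER_GPU (1 << 20)

__global__ void work(float *x, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] = x[i] * x[i] + 1.0f;      /* stand-in for the real kernel */
}

struct job { int device; float *host_chunk; };

static void *run_on_gpu(void *arg)
{
    struct job *j = (struct job *)arg;
    float *d;
    cudaSetDevice(j->device);                  /* bind this thread to one GPU */
    cudaMalloc((void **)&d, N_PER_GPU * sizeof(float));
    cudaMemcpy(d, j->host_chunk, N_PER_GPU * sizeof(float), cudaMemcpyHostToDevice);
    work<<<(N_PER_GPU + 255) / 256, 256>>>(d, N_PER_GPU);
    cudaMemcpy(j->host_chunk, d, N_PER_GPU * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d);
    return NULL;
}

int main(void)
{
    int ngpu = 0, g;
    cudaGetDeviceCount(&ngpu);
    float *data = (float *)calloc((size_t)ngpu * N_PER_GPU, sizeof(float));
    pthread_t *tid = (pthread_t *)malloc(ngpu * sizeof(pthread_t));
    struct job *jobs = (struct job *)malloc(ngpu * sizeof(struct job));

    for (g = 0; g < ngpu; g++) {               /* one feeder thread per GPU */
        jobs[g].device = g;
        jobs[g].host_chunk = data + (size_t)g * N_PER_GPU;
        pthread_create(&tid[g], NULL, run_on_gpu, &jobs[g]);
    }
    for (g = 0; g < ngpu; g++)
        pthread_join(tid[g], NULL);
    printf("done on %d GPU(s)\n", ngpu);
    free(data); free(tid); free(jobs);
    return 0;
}
```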

9 Multi-GPU needs scaling!
Seems easy:
1. Put 1-2-4 GPUs in a 1-2U system (or buy Tesla M1060 modules)
2. Stack many of them
3. Add an interconnect (InfiniBand, Myrinet 10G, custom) & plug everything in accurately :)
4. Simply write your program in C+MPI+CUDA/OpenCL(+threads), covering multi-node parallelism, the single-GPU kernel, and multi-GPU management (a skeleton follows below)
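
A skeleton of the C+MPI+CUDA layering in step 4, assuming a 1-D domain decomposition with one MPI rank per GPU and a host-staged halo exchange. The kernel, buffer sizes, and exchange pattern are placeholders rather than a real LQCD code, and error checking is omitted.

```c
#include <stdio.h>
#include <mpi.h>
#include <cuda_runtime.h>

#define LOCAL_N (1 << 20)    /* interior sites owned by this rank */
#define HALO     1024        /* boundary sites exchanged per face */

__global__ void update(float *field, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) field[i] += 1.0f;              /* placeholder for the real stencil */
}

int main(int argc, char **argv)
{
    int rank, size, ndev;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    cudaGetDeviceCount(&ndev);
    cudaSetDevice(rank % ndev);               /* multi-GPU mgmt: one device per rank */

    float *d_field, *h_send, *h_recv;
    cudaMalloc((void **)&d_field, (LOCAL_N + 2 * HALO) * sizeof(float));
    cudaMallocHost((void **)&h_send, HALO * sizeof(float));
    cudaMallocHost((void **)&h_recv, HALO * sizeof(float));
    cudaMemset(d_field, 0, (LOCAL_N + 2 * HALO) * sizeof(float));

    int up = (rank + 1) % size, down = (rank - 1 + size) % size;

    for (int step = 0; step < 10; step++) {
        /* single-GPU kernel on the local sub-lattice */
        update<<<(LOCAL_N + 255) / 256, 256>>>(d_field + HALO, LOCAL_N);

        /* multi-node parallelism: halo exchange staged through host memory */
        cudaMemcpy(h_send, d_field + LOCAL_N, HALO * sizeof(float),
                   cudaMemcpyDeviceToHost);
        MPI_Sendrecv(h_send, HALO, MPI_FLOAT, up,   0,
                     h_recv, HALO, MPI_FLOAT, down, 0,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        cudaMemcpy(d_field, h_recv, HALO * sizeof(float),
                   cudaMemcpyHostToDevice);
    }

    if (rank == 0) printf("ran %d ranks\n", size);
    cudaFree(d_field); cudaFreeHost(h_send); cudaFreeHost(h_recv);
    MPI_Finalize();
    return 0;
}
```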

10 Some near-term solutions for LQCD
Two INFN-approved projects:
● QUonG: a cluster of GPUs with the custom APEnet+ 3D torus network (talk by R. Ammendola)
● Aurora: a custom dual Xeon 5500 blade with InfiniBand & a 3D first-neighbor network

11 INFN assets
● 20 years of experience in high-speed 3D torus interconnects (APE100, APEmille, apeNEXT, APEnet)
● 20 years of writing parallel codes
● Control over the HW architecture vs. the algorithms

12 Wish list for multi-GPU computing
Open the GPU to the world:
● Provide APIs to hook inside your drivers
● Allow PCIe-to-PCIe DMAs, or better …
● … add a high-speed data I/O port toward an external device (FPGA, custom ASIC)
● Promote the GPU from simple accelerator to main computing engine!
[Diagram: GPU DRAM and host main memory connected by PCI Express links]
(a sketch of the host-staged copy this would eliminate follows below)
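
To illustrate what the PCIe-to-PCIe DMA item would buy, here is a sketch of the path data takes at the time of the talk: a device-to-host copy into a pinned bounce buffer, followed by a second transfer from host memory toward the NIC/FPGA. `send_to_nic()` is a hypothetical placeholder for the external device's driver call, not a real API; with a PCIe-to-PCIe DMA the bounce buffer and the first copy would disappear.

```c
#include <stdio.h>
#include <cuda_runtime.h>

#define NBYTES (4 << 20)

/* hypothetical placeholder: hand a host buffer to the network/FPGA driver */
static void send_to_nic(const void *buf, size_t len)
{
    printf("would DMA %zu bytes from host buffer %p to the NIC\n", len, (void *)buf);
}

int main(void)
{
    float *d_buf, *h_bounce;

    cudaMalloc((void **)&d_buf, NBYTES);
    cudaHostAlloc((void **)&h_bounce, NBYTES, cudaHostAllocDefault);  /* pinned */
    cudaMemset(d_buf, 0, NBYTES);

    /* hop 1: GPU DRAM -> host main memory over PCIe */
    cudaMemcpy(h_bounce, d_buf, NBYTES, cudaMemcpyDeviceToHost);

    /* hop 2: host main memory -> NIC/FPGA over PCIe; with a PCIe-to-PCIe   */
    /* DMA the data would instead go from d_buf to the NIC directly         */
    send_to_nic(h_bounce, NBYTES);

    cudaFreeHost(h_bounce);
    cudaFree(d_buf);
    return 0;
}
```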

13 In conclusion
● GPUs are good at small scales
● Scaling from single GPU to multi-GPU to multi-node, the hierarchy deepens
● Programming complexity increases
● Watch the GPU → network latency
● Please, help us link your GPU to our 3D network!
Game over :)

