Presentation is loading. Please wait.

Presentation is loading. Please wait.

APE group Many-core platforms and HEP experiments computing XVII SuperB Workshop and Kick-off Meeting Elba, May 29-June 1,

Similar presentations


Presentation on theme: "APE group Many-core platforms and HEP experiments computing XVII SuperB Workshop and Kick-off Meeting Elba, May 29-June 1,"— Presentation transcript:

1 APE group Many-core platforms and HEP experiments computing davide.rossetti@roma1.infn.it XVII SuperB Workshop and Kick-off Meeting Elba, May 29-June 1, 2011

2 APE group APE in a few words Our mission is providing solutions for theoretical numerical computing in INFN Our focus is in HPC architectures Lattice QCD is our main application, but historically other topics actively pursued (Glasses, CFD, Weather) 31/5/2011D.Rossetti for SuperB @ Elba2

3 APE group Multi- vs Many- core processor architectures Multi-coreMany-core 31/5/2011D.Rossetti for SuperB @ Elba3 It’s your laptop processor! It’s your laptop GPU!

4 APE group So what ? What are the differences ? Why should we bother ? 31/5/2011D.Rossetti for SuperB @ Elba4

5 APE group Intel Westmere-EX 31/5/2011D.Rossetti for SuperB @ Elba5 Lot’s of caches!!! Few processing: 4 FP units are probably 1 pixel wide !!!

6 APE group NVidia GPGPU 31/5/2011D.Rossetti for SuperB @ Elba6 Lot’s of computing units !!!

7 APE group So what ? What are the differences ? Why should we bother ? 31/5/2011D.Rossetti for SuperB @ Elba7 They show different trade-offs !! And the theory is…..

8 APE group Where the power is spent 31/5/2011D.Rossetti for SuperB @ Elba8 “chips are power limited and most power is spent moving data around”* 4 cm 2 chip 4000 64bit FPU fit Moving 64bits on chip == 10FMAs Moving 64bits off chip == 20FMAs *Bill Dally, Nvidia Corp. talk at SC09

9 APE group So what ? What are the differences? Why should we bother? 31/5/2011D.Rossetti for SuperB @ Elba9 Today: at least a factor 2 in perf/price ratio Tomorrow: CPU & GPU converging, see current ATI Fusion

10 APE group Cost Effective processing solution… GPU is an accelerator, it needs a GPU!!! –A cluster –A dual-socket node –Accelerated with 1-8 GPUs per node –Network 31/5/2011D.Rossetti for SuperB @ Elba10 Flexible: 1 GPU can be shared by multiple processes 1 process can use multiple GPUs

11 APE group With latest top GPUs… 31/5/2011D.Rossetti for SuperB @ Elba11 Dell PowerEdge C410x

12 APE group What is it for? Previous platform is good for throughput computing (capacity) –SuperB design ? –SuperB offline ? –Some Lattice QCD What else ? –SuperB online ? ROM ? –Grand challenge LQCD 31/5/2011D.Rossetti for SuperB @ Elba12 Strong scaling to multiple nodes on a single problem Capability

13 APE group The traditional flow 31/5/2011D.Rossetti for SuperB @ Elba13 NetworkCPUGPU Director kernel calc CPU memoryGPU memory transfer

14 APE group Improved network 31/5/2011D.Rossetti for SuperB @ Elba14 Network Processor CPUGPU Director kernel CPU memoryGPU memory transfer

15 APE group APEnet+ interconnect 31/5/2011D.Rossetti for SuperB @ Elba15 router 7x7 ports switch tor us link tor us link tor us link tor us link tor us link tor us link tor us link tor us link tor us link tor us link tor us link tor us link TX/RX FIFOs & Logic routin g logic arbite r X+X+ X-X- Y+Y+ Y-Y- Z+Z+ Z-Z- PCIe X8 Gen2 core NIOS II processor collective communicati on block memory controller DDR3 Module DDR3 Module 128@250MHz bus PCIe X8 Gen2 8@5 Gbps 100/1000 Eth port Altera Stratix IV FPGA blocks 3 link test board 3D Torus, scaling up to thousands of nodes packet auto-routing 6 x 30+30Gbps links PCIe X8 gen2 (peak BW 4+4 GB/s) A Network Processor Powerful zero-copy RDMA CPU interface On-board processing Experimental direct GPU interface SW: MPI (high-level), RDMA API (low-level)

16 APE group Some eye candies … 31/5/2011D.Rossetti for SuperB @ Elba16 APEnet+ final board, 4+2 links Cable options: copper or fibre

17 APE group Our reference platform 31/5/2011D.Rossetti for SuperB @ Elba17 Today: 7 GPU nodes with Infiniband for applications development: 2 C1060 + 3 M2050 + S2050 2 nodes HW devel: C2050 + 3 links card APEnet+ Next steps, green and cost effective system within 2011 Elementary unit: multi-core Xeon (packed in 2 1U rackable system) S2070 FERMI GPU system (4 TFlops) 2 APEnet+ board 42U rack system: 60 TFlops/rack peak 25 kW/rack (i.e. 0.4 kW/TFlops) 300 k€/rack (i.e. 5 K€/Tflops)

18 APE group As of SuperB A GPU-accelerated APEnet+ cluster: Highly compact MC simulation engine APEnet+ FPGA has lots of free space… with optical cables, usable for Readout Module test-bed ? Could it be the prototype of the SuperB computing platform ? 31/5/2011D.Rossetti for SuperB @ Elba18

19 APE group Game over… Thank you for your patience! 31/5/2011D.Rossetti for SuperB @ Elba19


Download ppt "APE group Many-core platforms and HEP experiments computing XVII SuperB Workshop and Kick-off Meeting Elba, May 29-June 1,"

Similar presentations


Ads by Google