
Slide 1: Kyle Spafford, Jeremy S. Meredith, Jeffrey S. Vetter. http://ft.ornl.gov

Slide 2: Early Work with S3D and DCA++

Slide 3: "An experimental high performance computing system of innovative design." "Outside the mainstream of what is routinely available from computer vendors." (National Science Foundation, Track 2D call, Fall 2008)

Slide 4: Keeneland Initial Delivery (ID) @ GT/ORNL

Slide 5: Inside a Node
- 4 hot-plug SFF (2.5") HDDs
- 1 GPU module in the rear, lower 1U
- 2 GPU modules in the upper 1U
- Dual 1 GbE
- Dedicated management iLO3 LAN and 2 USB ports
- VGA
- UID LED and button
- Health LED
- Serial (RJ45)
- Power button
- QSFP (QDR InfiniBand)
- 2 non-hot-plug SFF (2.5") HDDs

Slide 6: Node Block Diagram (figure: two CPUs, each with DDR3 RAM, linked by QPI to each other and to the I/O hubs; GPUs (6 GB each) attach over PCIe x16, with InfiniBand integrated at an I/O hub)

Slide 7: Why a dual I/O hub? (figure: in a Tesla 1U, GPU #0 and GPU #1 sit behind a PCIe switch that shares a single link into one IOH; links labeled 8.0 GB/s)

Slide 8: Why a dual I/O hub? (same figure: the single 8.0 GB/s link from the PCIe switch into the IOH is the bottleneck!)

Slide 9: Why a dual I/O hub? (figure: the bottlenecked single-IOH Tesla 1U next to the dual-IOH node, where CPU #0 and CPU #1 each reach their own IOH at 12.8 GB/s and the GPUs are split across the two IOHs at 8.0 GB/s each)

Slide 10: Introduction of NUMA (figure: a copy between CPU #0 and the GPU on its local IOH takes the short path; a copy to a GPU behind the other IOH takes the long path across an extra 12.8 GB/s QPI hop)
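
Aside (not from the slides): before pinning anything, the node's NUMA layout can be inspected from the shell with standard tools, numactl and hwloc's lstopo:

    # List the NUMA nodes, their CPUs and memory, and inter-node distances
    numactl --hardware
    # With hwloc installed, lstopo also shows which IOH/PCIe slot each
    # GPU and the InfiniBand HCA hang off of
    lstopo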

Slide 11: Bandwidth Penalty (chart: host-to-device copy bandwidth from CPU #0, short path vs. long path)

Slide 12: Bandwidth Penalty (chart: device-to-host copy bandwidth from CPU #0; the long path costs roughly 2 GB/s)
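
The penalty is straightforward to reproduce by running any host-device bandwidth microbenchmark under both bindings. A sketch, not from the slides; ./bandwidth_bench and its --device flag are hypothetical stand-ins (SHOC's bus speed tests would serve):

    # Short path: bind to the socket whose IOH hosts GPU 0 (assumed to be node 0)
    numactl --cpunodebind=0 --membind=0 ./bandwidth_bench --device 0
    # Long path: same GPU, far socket; expect D->H bandwidth to drop by ~2 GB/s
    numactl --cpunodebind=1 --membind=1 ./bandwidth_bench --device 0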

Slide 13: Other Benchmark Results
- MPI latency: a 26% penalty for large messages, 12% for small messages
- SHOC benchmarks: the mismap penalty shown below gives this effect context

Slide 14: Given a multi-GPU app, how should processes be pinned?

Slide 15: Given a multi-GPU app, how should processes be pinned? (figure: MPI ranks 0, 1, and 2)

Slide 16: Maximize GPU Bandwidth (figure: node topology with CPU #0, CPU #1, the IOHs, InfiniBand, and GPUs #0, #1, #2)

Slide 17: Maximize GPU Bandwidth (figure: ranks 0, 1, and 2 each pinned to the socket nearest its GPU)

Slide 18: Maximize MPI Bandwidth (figure: ranks 0, 1, and 2 placed for the shortest path to the InfiniBand adapter)

Slide 19: Maximize MPI Bandwidth (same figure). Pretty easy, right?

Slide 20: Pinning with numactl

    numactl --cpunodebind=0 --membind=0 ./program
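
To confirm a binding actually took effect, numactl can report the policy it is running under. A quick check, not from the slides:

    # Run numactl --show under the binding; it prints the effective
    # cpubind and membind node lists for the child process
    numactl --cpunodebind=0 --membind=0 numactl --show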

Slide 21: 0-1-1 Pinning with numactl

    if [[ $OMPI_COMM_WORLD_LOCAL_RANK == "2" ]]; then
        numactl --cpunodebind=1 --membind=1 ./prog
    elif [[ $OMPI_COMM_WORLD_LOCAL_RANK == "1" ]]; then
        numactl --cpunodebind=1 --membind=1 ./prog
    else  # rank = 0
        numactl --cpunodebind=0 --membind=0 ./prog
    fi
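
The same 0-1-1 mapping fits in a small wrapper script, keeping the rank logic out of the job script. A sketch under assumptions: Open MPI's OMPI_COMM_WORLD_LOCAL_RANK is set, and ranks 1 and 2 belong on node 1:

    #!/bin/bash
    # pin.sh -- launch as: mpirun -np 3 ./pin.sh ./prog
    rank=${OMPI_COMM_WORLD_LOCAL_RANK:-0}
    # 0-1-1 mapping: local rank 0 -> NUMA node 0, ranks 1 and 2 -> node 1
    if [[ $rank -eq 0 ]]; then node=0; else node=1; fi
    exec numactl --cpunodebind=$node --membind=$node "$@"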

Slide 22: HPL Scaling
- Sustains both MPI and GPU operations
- Uses the remaining CPU cores via Intel MKL

Slide 23: What happened with 0-1-1? (figure: MPI tasks 0, 1, and 2 alongside CPU #0 and CPU #1)

Slide 24: What happened with 0-1-1? (figure: task 0 lands on CPU #0; tasks 1 and 2 land on CPU #1)

Slide 25: What happened with 0-1-1? (figure: each MPI task spawns MKL threads)

Slide 26: What happened with 0-1-1? Threads inherit pinning! (figure: each task's MKL threads are confined to the task's socket)
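
The inheritance is easy to see from the shell, since every thread or child of a bound process gets the parent's CPU mask. A quick demonstration, not from the slides:

    # Any child of a numactl-bound process reports the restricted mask,
    # so MKL threads spawned by a bound rank stay on that socket
    numactl --cpunodebind=1 bash -c 'grep Cpus_allowed_list /proc/self/status'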

Slide 27: What happened with 0-1-1? (figure: the MKL threads alone)

Slide 28: What happened with 0-1-1? Two idle cores and one oversubscribed socket! (figure: ranks 1 and 2 stack their MKL threads on CPU #1 while cores on CPU #0 sit idle)

Slide 29: NUMA Impact on Apps (chart)

Slide 30: Well… (chart; axis label: time)

Slide 31: Can we improve utilization by sharing a Fermi among multiple tasks?

Slide 32: Bandwidth of Most Bottlenecked Task (chart)

Slide 33: Is the second IO hub worth it?

Slide 34: Is the second IO hub worth it? Aggregate bandwidth to the GPUs is 16.9 GB/s. What about real app behavior?
- Scenario A ("HPL"): 1 MPI task and 1 GPU task per GPU
- Scenario B: Scenario A plus 1 MPI task for each remaining core
(A launch sketch for both scenarios follows.)
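
How the two scenarios might be launched, as a hedged sketch: ./app is a hypothetical binary, pin.sh is the wrapper sketched at slide 21, and the rank counts assume a node with 3 GPUs and two six-core sockets:

    # Scenario A: one MPI rank driving each of the 3 GPUs
    mpirun -np 3 ./pin.sh ./app
    # Scenario B: scenario A plus one CPU-only rank per remaining core
    # (12 cores total: 3 GPU-driving ranks + 9 CPU-only ranks)
    mpirun -np 12 ./pin.sh ./app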

Slide 35: Contention Penalty (chart)

Slide 36: Puzzler – Pinning Redux. Do ranks 1 and 2 always have a long path?

Slide 37: Puzzler – Pinning Redux. Do ranks 1 and 2 always have a long path? (figure: CPU #0, GPU #1, and an IOH)

Slide 38: Puzzler – Pinning Redux. Do ranks 1 and 2 always have a long path? (figure: the full picture adds CPU #1 and the IOH with InfiniBand; which path is long depends on whether the rank is talking to its GPU or to the network)

Slide 39: Split MPI and GPU – MPI Latency (chart)

Slide 40: Split MPI and GPU – PCIe Bandwidth (chart)

Slide 41: Takeaways
- Dual IO hubs deliver, but add complexity
- Ignoring the complexity will sink some apps: wrong pinning sank HPL, and bandwidth-bound kernels and "function offload" apps are the most exposed
- Threads and libnuma can help, but can be tedious to use (a combined pinning sketch follows)
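
One way to take some of the tedium out of per-rank setup is to fold GPU selection into the same wrapper. A sketch extending the pin.sh example from slide 21; the 0-1-1 node mapping and the one-GPU-per-rank assignment via CUDA_VISIBLE_DEVICES are assumptions for illustration:

    #!/bin/bash
    # pin_gpu.sh -- launch as: mpirun -np 3 ./pin_gpu.sh ./prog
    rank=${OMPI_COMM_WORLD_LOCAL_RANK:-0}
    # Give local rank i physical GPU i; inside the app, device 0
    # then refers to that GPU
    export CUDA_VISIBLE_DEVICES=$rank
    # 0-1-1 mapping: rank 0 -> NUMA node 0, ranks 1 and 2 -> node 1
    if [[ $rank -eq 0 ]]; then node=0; else node=1; fi
    exec numactl --cpunodebind=$node --membind=$node "$@"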

Slide 42: Thanks! kys@ornl.gov http://kylespafford.com/

