Slide 1: Parallel Scaling of parsparsecircuit3.c
Tim Warburton
Slide 2: One Process Per Node
In these tests we use only one of the two processors on each node.
Slide 3: blackbear, 16 processors, 16 nodes
Slide 4
Apart from the MPI_Allreduce calls, this is an almost perfect picture of parallelism.
Slide 5: Two Processes Per Node
We use both processors on each node.
Slide 6: blackbear, 8 nodes, 16 processes
Notice the prevalence of MPI_Waitany. Clearly the code does not work as well here as it does with one process per node.
Slide 7: blackbear, 8 nodes, 16 processes (zoom in)
I suspect that the threaded MPI communicators servicing the non-blocking MPI_Isend and MPI_Irecv calls are competing for CPU time with the user code. There could also be competition between the two processors for the memory bus and the network interface.
Slide 8: Timings for M=1024 (N=1024^2), blackbear, -O3

Two processes per node:
nodes  Nprocs  wallclock time
1      2       19.4909
2      4       9.85369
4      8       5.01486
8      16      3.19801
16     32      3.77791

One process per node:
nodes  Nprocs  wallclock time
1      1       19.2675
2      2       10.2486
4      4       5.43999
8      8       2.79451
16     16      1.43782
Slide 9: Timings for Two Processes Per Node on Los Lobos

nodes  Nprocs  wallclock time
1      2       8.9453
2      4       4.47474
4      8       2.17246
8      16      1.15644

Timings courtesy of Zhaoxian Zhou.