Download presentation
Presentation is loading. Please wait.
Published byMark Howard Modified over 8 years ago
1
Case Study 5: Molecular Dynamics (MD) Simulation of a set of bodies under the influence of physical laws. Atoms, molecules, forces acting on them... Have same basic structure other n-body problems.
2
Molecular Dynamics (Skeleton) for some number of timesteps { for all molecules i for all other molecules j force[i] += f( loc[i], loc[j] ); for all molecules i loc[i] = g( loc[i], force[i] ); }
3
Molecular Dynamics (continued) To reduce amount of computation, account for interaction only with nearby molecules.
4
Molecular Dynamics (continued) for some number of timesteps { for all molecules i for all nearby molecules j force[i] += f( loc[i], loc[j] ); for all molecules i loc[i] = g( loc[i], force[i] ); }
5
Molecular Dynamics (continued) for each molecule i number of nearby molecules count[i] array of indices of nearby molecules index[j] ( 0 <= j < count[i])
6
Molecular Dynamics (continued) for some number of timesteps { for( i=0; i<num_mol; i++ ) for( j=0; j<count[i]; j++ ) force[i] += f(loc[i],loc[index[j]]); for( i=0; i<num_mol; i++ ) loc[i] = g( loc[i], force[i] ); }
7
Molecular Dynamics (simple) for some number of timesteps { #pragma omp parallel for for( i=0; i<num_mol; i++ ) for( j=0; j<count[i]; j++ ) force[i] += f(loc[i],loc[index[j]]); #pragma omp parallel for for( i=0; i<num_mol; i++ ) loc[i] = g( loc[i], force[i] ); }
8
Molecular Dynamics (simple) Simple to program. Possibly poor load balance –block distribution of i iterations (molecules) –could lead to uneven neighbor distribution –cyclic does not help Possible poor temporal locality/cache reuse
9
Better Load Balance Assign iterations such that each processor has ~ the same number of neighbors.
10
Molecular Dynamics (advanced) Array of assign records –size: number of processors –two elements: beginning i value (molecule) ending i value (molecule) Compute assign values such that each processor has ~ same number of neighbors.
11
Molecular Dynamics (continued) for some number of timesteps { #pragma omp parallel pr = omp_get_thread_num(); for( i=assign[pr]->b; i e; i++ ) for( j=0; j<count[i]; j++ ) force[i] += f(loc[i],loc[index[j]]); #pragma omp parallel for for( i=0; i<num_mol; i++ ) loc[i] = g( loc[i], force[i] ); }
12
Frequency of Balancing Every time neighbor list is recomputed. –once during initialization. –every iteration. –every n iterations. Extra overhead vs. better approximation and better load balance.
13
Locality Optimizations Organize assignment array so that each processor has iterations that work on the atoms/molecules that are close together. Tradeoff between locality and load balancing. Process the second parallel for loop in the same way.
14
Guided Self-Scheduling (GSS) Chooses granularity trying to take into account both –load balance, and –overhead. Does repeated block distribution over threads. Available in OpenMP (but again note the lack of user control).
15
Whats Next? Dealing with Sparse Matrices.
Similar presentations
© 2024 SlidePlayer.com. Inc.
All rights reserved.