Monte Carlo Integration Using MPI
Barry L. Kurtz
Chris Mitchell
Appalachian State University
Monitoring MPI Performance
- Goals
  - We will use MPI
  - We will parallelize the algorithm to increase accuracy
  - We will parallelize the algorithm to increase speed
  - We will vary the number of processors from 1 to 8 under these conditions
- Node performance monitoring
  - Graphical plot of CPU usage on each node
  - Separates out types of CPU tasks
Integration Using Monte Carlo
- Main idea
  - Similar to the PI program demonstrated with MATLAB: place random points in a rectangular area and find the percentage of points that satisfy the given criteria
  - Our functions will be in the first quadrant only
- Variables
  - Number of processors used
  - The function being integrated
  - The number of histories in the sample space
  - The low and high range for the interval
Example: f(x) = 2x^2
- Given the range 0 to 5
- The analytic solution is (2/3)x^3 evaluated from 0 to 5, giving 83 1/3
- Sample calculation:
  - # Hits = 3
  - Total pts = 10
  - Area of rectangle = 250 (width 5 times height f(5) = 50)
  - Estimate of integral = 250 * 3/10 = 75
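The sample calculation above can be sketched as a short serial routine. This is a minimal sketch, not the authors' program: the names `area_from_counts`, `uniform01`, and `mc_integrate` are hypothetical, and it uses the standard C `rand()` generator for portability where the slides use `random()`.

```c
#include <assert.h>
#include <stdlib.h>

/* Example integrand from the slide: f(x) = 2x^2. */
double f(double x) { return 2.0 * x * x; }

/* Estimate = rectangle area * hits / total points. */
double area_from_counts(long hits, long total, double rectArea)
{
    assert(total > 0);
    return rectArea * (double)hits / (double)total;
}

/* Uniform double in [0, 1). Assumption: rand() stands in for random(). */
static double uniform01(void)
{
    return (double)rand() / ((double)RAND_MAX + 1.0);
}

/* Hit-or-miss estimate of the integral of fp over [low, high].
 * max must be at least the largest value of fp on the interval. */
double mc_integrate(double (*fp)(double), double low, double high,
                    double max, long numHist, unsigned seed)
{
    long i, hits = 0;

    assert(numHist > 0 && high > low && max > 0.0);
    srand(seed);
    for (i = 0; i < numHist; i++) {
        double x = low + uniform01() * (high - low);
        double y = uniform01() * max;
        if (y < fp(x))          /* point lies under the curve */
            hits++;
    }
    return area_from_counts(hits, numHist, max * (high - low));
}
```

Calling `area_from_counts(3, 10, 250.0)` reproduces the slide's estimate of 75, and `mc_integrate(f, 0.0, 5.0, 50.0, 1000000, 42u)` should land close to the analytic value of 83 1/3.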
Parallelization Techniques
- Increase the number of points by giving each processor the specified number of points
  - As the number of processors increases, we expect accuracy to increase due to the larger number of total points
  - Computation time should not change dramatically
- Divide a specified number of points "equally" between the processors
  - As the number of processors increases, we expect accuracy to stay the same
  - Total computation time should decrease
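The two techniques differ only in how many histories each process runs. The sketch below shows one way to compute those counts; the function names are hypothetical, and handing the remainder to the lowest ranks is an assumption added here (the slides simply use integer division, which drops any remainder).

```c
#include <assert.h>

/* Technique 1 (accuracy): every process runs numHist points,
 * so the total grows with the number of processes. */
long total_histories_replicated(long numHist, int size)
{
    assert(numHist >= 0 && size > 0);
    return numHist * (long)size;
}

/* Technique 2 (speed): split numHist points across processes.
 * Assumption: the first (numHist % size) ranks take one extra point
 * so that the per-rank counts sum exactly to numHist. */
long histories_split(long numHist, int size, int rank)
{
    long base, extra;

    assert(numHist >= 0 && size > 0 && rank >= 0 && rank < size);
    base = numHist / size;
    extra = (rank < numHist % size) ? 1 : 0;
    return base + extra;
}
```

With 10 points on 3 processes, the ranks receive 4, 3, and 3 points. With 10,000,000 points on 8 processes, each rank receives 1,250,000.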
Three Test Functions
- f(x) = 2x^2: strictly increasing function
- g(x) = e^(-x): strictly decreasing function
- h(x) = 2 + sin(x): oscillating function
- How will we find the area of the enclosing rectangle?
  - Issues arise with finding the maximum value of the given function on the given interval
  - Think of a solution that could apply to all three functions given above
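For these three functions the rectangle height can also be bounded by hand, which is useful as a check on any general-purpose search. The symbols a, b (interval endpoints, with 0 <= a < b), M (rectangle height), and A (rectangle area) are notation introduced here, not taken from the slides.

```latex
\[
\max_{x \in [a,b]} f(x) = f(b) = 2b^2, \qquad
\max_{x \in [a,b]} g(x) = g(a) = e^{-a}, \qquad
\max_{x \in [a,b]} h(x) \le 3 ,
\]
\[
A = (b - a)\, M , \qquad M \ge \max_{x \in [a,b]} \text{(function value)} .
\]
```

For f on the interval from 0 to 5 this gives M = 50 and A = 250, matching the earlier example.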
Finding the Maximum
MPI Code for Finding Max

double findMax(double low, double high, double (*fp)(double))
{
    double i,
           interval,  /* size of steps between tests */
           result,    /* function return */
           max = 0;   /* holds max value thus far found;
                         starting at 0 assumes a non-negative function */

    /* sample the function at 100 evenly spaced steps across the interval */
    interval = (high - low) / 100;
    for (i = low; i < high; i += interval) {
        result = fp(i);
        if (result > max)
            max = result;
    }
    return max;
}
MPI Initialization for Accuracy

MPI_Init(&argc, &argv);
MPI_Comm_size(MPI_COMM_WORLD, &size);
MPI_Comm_rank(MPI_COMM_WORLD, &rank);
:::
MPI_Bcast(&numHist, 1, MPI_INT, MASTER, MPI_COMM_WORLD);
MPI_Bcast(&low, 1, MPI_DOUBLE, MASTER, MPI_COMM_WORLD);
MPI_Bcast(&high, 1, MPI_DOUBLE, MASTER, MPI_COMM_WORLD);
MPI Code for Accuracy

/* history calculation loop */
for (i = 0; i < numHist; i++) {
    /* pick a random x in [low, high) */
    x = ((double)random() / ((double)(RAND_MAX) + (double)(1)));
    x *= (high - low);
    x += low;
    /* pick a random y in [0, max) */
    y = ((double)random() / ((double)(RAND_MAX) + (double)(1))) * max;
    /* if point is below the function value, it's a hit */
    if (y < fp(x))   /* fp is the function to be integrated */
        hits++;
    total++;         /* count every history, hit or miss */
}

/* calculate this process' estimate of function's area */
subArea = ((double)(hits) / (double)(total)) * (max * (high - low));
Gather the Data and Calculate the Result

/* calculate total hits and histories generated by all processes */
MPI_Reduce(&hits, &allHits, 1, MPI_INT, MPI_SUM, MASTER, MPI_COMM_WORLD);
MPI_Reduce(&total, &allTotal, 1, MPI_INT, MPI_SUM, MASTER, MPI_COMM_WORLD);

/* only the master holds the reduced sums, so only it reports */
if (rank == MASTER) {
    area = ((double)(allHits) / (double)(allTotal)) * (max * (high - low));
    printf("\nArea of function between %5.3f and %5.3f is: %f\n", low, high, area);
}
MPI Initialization for Speed

MPI_Init(&argc, &argv);
MPI_Comm_size(MPI_COMM_WORLD, &size);
MPI_Comm_rank(MPI_COMM_WORLD, &rank);
:::
/* each process handles an equal share of the requested histories */
numHist = numHist / size;
MPI_Bcast(&numHist, 1, MPI_INT, MASTER, MPI_COMM_WORLD);
MPI_Bcast(&low, 1, MPI_DOUBLE, MASTER, MPI_COMM_WORLD);
MPI_Bcast(&high, 1, MPI_DOUBLE, MASTER, MPI_COMM_WORLD);
What Are Your Predictions?
- Will accuracy increase linearly with the number of processors?
- Will the execution time decrease linearly with the number of processors?
- How important is the random number generation?
- Would you expect occasional anomalies?
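One reason random number generation matters: if every process starts its generator from the same seed, every process draws the same points, and the extra processors add no new information. The sketch below illustrates this under stated assumptions. The slides do not show how the generator is seeded, so `seed_for_rank` and `count_hits` are hypothetical helpers, the integrand and bounds are hard-coded to the f(x) = 2x^2 example, and the standard C `rand()` is used where the slides use `random()`.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: give each rank a different seed.
 * Offsetting by rank is the simplest choice; it does not guarantee
 * statistically independent streams. */
unsigned seed_for_rank(unsigned baseSeed, int rank)
{
    assert(rank >= 0);
    return baseSeed + (unsigned)rank;
}

/* Count points under f(x) = 2x^2 on [0, 5] inside a rectangle of
 * height 50, drawing numHist points from the given seed. */
long count_hits(unsigned seed, long numHist)
{
    long i, hits = 0;

    assert(numHist >= 0);
    srand(seed);
    for (i = 0; i < numHist; i++) {
        double x = 5.0 * (rand() / ((double)RAND_MAX + 1.0));
        double y = 50.0 * (rand() / ((double)RAND_MAX + 1.0));
        if (y < 2.0 * x * x)
            hits++;
    }
    return hits;
}
```

Two calls to `count_hits` with the same seed always return the same count, which is what two identically seeded processes would contribute. Passing `seed_for_rank(baseSeed, rank)` instead gives each rank its own sequence.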
The Performance Monitor
- Monitors performance on a local cluster
- Separates the following types of CPU usage
  - User %
  - System %
  - Easy %
  - Total %
- Provides a quick, intuitive view of the load balancing for the algorithm distribution
- Developed at Appalachian State by Keith Woodie and Michael Economy
Results for Increasing Accuracy
Number of histories per processor = 10,000,000

# Processors   Time    Result      ABS Err
1              2.469   83.4247     0.091366667
2              2.604   83.344062   0.010728667
3              2.470   83.3935     0.060166667
4              2.612   83.3531     0.019766667
5              2.637   83.344005   0.010671667
6              2.616   83.318346   0.014987333
7              2.602   83.344739   0.011405667
8              2.618   83.334184   0.000850667
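As a rough yardstick for these errors, the standard binomial error estimate for a hit-or-miss calculation can be worked out for the f(x) = 2x^2 example. The symbols N (total points), A (rectangle area), I (true integral), and p (hit probability) are notation introduced here, and the formula assumes independent, uniformly distributed points.

```latex
\[
p = \frac{I}{A} = \frac{250/3}{250} = \frac{1}{3}, \qquad
\sigma(N) = A \sqrt{\frac{p\,(1 - p)}{N}} = 250 \sqrt{\frac{2}{9N}}
\]
\[
\sigma(10^{7}) \approx 0.037, \qquad
\sigma(8 \times 10^{7}) \approx 0.013
\]
```

The expected error shrinks with the square root of the total number of points, not linearly, and any single run can land well above or below this value.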
Results for Increasing Speed
Total number of histories = 10,000,000

# Processors   Time    Result      ABS Err
1              2.524   83.335025   0.001691667
2              1.261   83.316375   0.016958333
3              0.838   83.296083   0.037250333
4              0.629   83.313425   0.019908333
5              0.515   83.263475   0.069858333
6              0.421   83.323333   0.010000333
7              0.361   83.26295    0.070383333
8              0.317   83.368875   0.035541667
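The timings above can be summarized as speedup (one-processor time divided by p-processor time) and efficiency (speedup divided by p). The helpers below are a small sketch with hypothetical names; the slides do not define these metrics themselves.

```c
#include <assert.h>

/* Speedup: how many times faster the p-processor run is
 * than the one-processor run. */
double speedup(double t1, double tp)
{
    assert(t1 > 0.0 && tp > 0.0);
    return t1 / tp;
}

/* Efficiency: speedup per processor; 1.0 means ideal linear scaling. */
double efficiency(double t1, double tp, int p)
{
    assert(p > 0);
    return speedup(t1, tp) / (double)p;
}
```

Using the table's values, `speedup(2.524, 0.317)` is about 7.96 on 8 processors, which corresponds to an efficiency just under 1.0, so the speed-oriented version scales almost linearly over this range.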