Parallel Random Number Generation

Presentation transcript:

1 Parallel Random Number Generation
Ashok Srinivasan, Florida State University. If random numbers were really random, then parallelization would not make any difference … and this talk would be unnecessary. But we use pseudo-random numbers, which only pretend to be random, and this causes problems. These problems can usually be solved if you use SPRNG! Thanks for the introduction. PPRNG: the use of random numbers on a parallel machine. Parallelization matters because the numbers are only pseudo-random. Important for massively parallel machines. Goal: acquaint the audience with what can go wrong, and how to avoid it by using the SPRNG library that we developed some time back.

2 Outline Introduction Random Numbers in Parallel Monte Carlo
Parallel Random Number Generation SPRNG Libraries Conclusions Intro – PRNG concepts and terminology Common use of PRNGs in parallel applications PRNG parallelization techniques and potential problems SPRNG Libraries – make things simple Conclusions – rules of thumb

3 Introduction Applications of Random Numbers Terminology
Desired Features Common Generators Errors Due to Correlations Mention common applications Introduce the idea of a PRNG and terminology Describe desired features Some common generators Show how errors are possible

4 Applications of Random Numbers
Multi-dimensional integration using Monte Carlo An important focus of this talk Based on relating the expected value to an integral Modeling random processes Cryptography Not addressed in this talk Games MC integration, based on relating the expected value to an integral – the focus of the talk, important in supercomputing, includes most random walk computations Modeling random processes Cryptography – calls for different algorithms Games – quality not as important

5 Terminology T: Transition function Period: Length of the cycle States
Mapping from states to random numbers. A fixed state size implies the sequence eventually cycles. Period: a long period is desired, but not sufficient (for many applications). Seed (not necessarily the starting random number). Random number stream. How new distributions can be generated from a uniform one (e.g., acceptance-rejection, inverse transform). Efficient sampling from discrete distributions is possible using the alias method.

6 Desired Features Sequential Pseudo-Random Number Generators
Randomness Uniform distribution in high dimensions Reproducibility Helps in debugging Speed Large period Portability Parallel Pseudo-Random Number Generators Sequences on different processors should be uncorrelated Dynamic creation of new random number streams Absence of inter-processor communication Sequential: uniformity in high dimensions; reproducibility; speed (not as important); large period (but correlations occur much before the period is exhausted); portability (to help with debugging). Parallel: absence of inter-stream correlations; dynamic creation of new random streams in some applications; no inter-processor communication, despite the above.

7 Common Generators Linear Congruential Generator (LCG)
x_n = a x_(n-1) + p (mod m) Additive Lagged Fibonacci Generator (LFG): x_n = x_(n-r) + x_(n-s) (mod m) Multiple Recursive Generator (MRG), example: x_n = a x_(n-1) + b x_(n-5) (mod m); Combined Multiple Recursive Generators (CMRG) combine multiple such generators Multiplicative Lagged Fibonacci Generator (MLFG): x_n = x_(n-r) × x_(n-s) (mod m) Mersenne Twister, etc. Many of these have known defects, but using generators with known defects is better than using generators with unknown defects, and we have modified these generators to fix problems. Also, some common errors are easily avoided. For example, LCGs with a power-of-two modulus have bits with power-of-two periods; the least significant bits are highly correlated. So, if you want to perform an integration in a high power-of-two dimension (such as a 1024-dimensional integration), you should discard a few random numbers after every 1024 numbers, to avoid these correlations affecting the coverage of the 1024-dimensional space. LCGs also have points that fall along planes. However, many generators that attempt to improve on these perform worse in practical applications. Furthermore, the use of multiple generators will identify errors. The Mersenne Twister has high-dimensional equidistribution and looks like a good sequential generator, but I would like to see larger empirical tests on practical applications. This generator can also be parallelized effectively.

8 Error Due to Correlations
Decide on flipping a spin using a random number. An Ising model simulation maintains a lattice of points, where each point has a spin that is up or down. From the relative spins of neighboring particles, properties such as the specific heat can be computed. Spins are changed depending on values from a random number generator; the way this is done varies between algorithms. A variety of states are generated in this manner and averaged to yield the property of interest. The exact solution can be computed. Long-range correlations can affect the result, so this is a good test of random number quality. Relate the error to the standard deviation. Figure: Ising model results with the Metropolis algorithm on a 16 x 16 lattice using the LFG random number generator. The error is usually estimated from the standard deviation (x-axis), which should decrease as (sample size)^(-1/2).

9 Random Numbers in Parallel Monte Carlo
Monte Carlo Example: Estimating π Monte Carlo Parallelization Low Discrepancy Sequences So far: explained PRNG theory, and shown that errors can arise Introduce the idea of Monte Carlo with an example Explain how MC is parallelized Explain potential problems due to inter-stream correlations Discuss the possibility of LDS

10 Monte Carlo Example: Estimating π
Uniform in 1-D but not in 2-D Generate pairs of random numbers (x, y) in the square Estimate π as: 4 (number in circle)/(total number of pairs) This is a simple example of Monte Carlo integration Monte Carlo integration can be performed based on the observation that E f(x) = ∫ f(y) ρ(y) dy, where x is sampled from the distribution ρ With N samples, error ∝ N^(-1/2) Example: ρ = ¼, f(x) = 1 in the circle and 0 outside, to estimate π/4 Explain the estimation of π Explain how this is an example of finite-dimensional Monte Carlo integration Mention the error with the number of samples Explain how correlations can affect the sampling and lead to errors

11 Monte Carlo Parallelization
Process 1: RNG stream 1. Process 2: RNG stream 2. Process 3: RNG stream 3. Results: 3.1, 3.6, 2.7. Combined result: 3.13. Explain traditional parallelization. The error with N*P samples should be similar to a sequential simulation with N*P samples. Correlations can make the error higher; for example, if all RNGs are identical, then all the parallelization is wasted. In fact, the answer can even be wrong, because we are generating from the wrong distribution. Distinguish inter-stream and intra-stream correlations; explain that intra-stream correlation is worse in this embarrassingly parallel simulation. Conventionally, Monte Carlo is "embarrassingly parallel": the same algorithm is run on each processor, but with different random number sequences. For example, run the same algorithm for computing π. Results on the different processors can be combined together.

12 Low Discrepancy Sequences
Figure: a random point set vs. a low discrepancy sequence. For integration, uniformity is often more important than randomness. LDS avoid clustering of points and are often useful for moderate-dimensional integration. Their parallelization is more complicated, and I will not discuss it further here. However, LDS have been found useful in many financial applications, and you may want to consider them for your problems. Uniformity is often more important than randomness. Low discrepancy sequences attempt to fill a space uniformly. The integration error can be bounded: error ∝ (log^d N)/N, with N samples in d dimensions. Low discrepancy point sets can be used when the number of samples is known.

13 Parallel Random Number Generation
Parallelization through Random Seeds Leap-Frog Parallelization Parallelization through Blocking Parameterization Test Results In the previous section: how applications commonly need PRNGs in parallel. Next: how we can provide PRNGs with the desired properties. Mention sequence splitting and its various versions. Mention parameterization and its scalability, and show that it overcomes the above errors.

14 Parallelization through Random Seeds
Consider a single random number stream Each processor chooses a start state randomly Hope that each start state is sufficiently far apart in the original stream Overlap of sequences is possible, if the start states are not sufficiently far apart Correlations between sequences are possible, even if the start states are far apart Mention how long-range correlations in the original sequence can become short-range inter-stream correlations.

15 Leap-Frog Parallelization
Consider a single random number stream. On P processors, split the stream by having each processor get every P-th number from the original stream. Long-range correlations in the original sequence can become short-range intra-stream correlations, which are dangerous. Original sequence: 1 2 3 4 5 6 7 8 9 10 11 12. Processor 1: 1 4 7 10. Processor 2: 2 5 8 11. Processor 3: 3 6 9 12.

16 Parallelization through Blocking
Each processor gets a different block of numbers from an original random number stream. Long-range correlations in the original sequence can become short-range inter-stream correlations, which may be harmful. Example: the 48-bit LCG ranf fails the blocking test (add many numbers and see if the sum is normally distributed) with 10^10 random numbers. Sequences on different processors may overlap. Original sequence: 1 2 3 4 5 6 7 8 9 10 11 12. Processor 1: 1 2 3 4. Processor 2: 5 6 7 8. Processor 3: 9 10 11 12.

17 Parameterization Each processor gets an inherently different stream
Parameterized iterations: create a collection of iteration functions; stream i is associated with iteration function i. LCG example: x_n = a x_(n-1) + p_i (mod m) on processor i, where p_i is the i-th prime. Cycle parameterization: some random number generators inherently have a large number of distinct cycles; ensure that each processor gets a start state from a different cycle. Example: LFG. The existence of inherently different streams does not imply that the streams are uncorrelated.

18 Test Results 1 Identical start states were used with different iteration functions. This slide demonstrates that (i) different sequences do not imply the absence of correlations, and (ii) it is easy to get errors in real applications using what look like reasonable parallel random number generators. Figure: Ising model results with the Metropolis algorithm on a 16 x 16 lattice using a parallel LCG with (i) identical start states (dashed line) and (ii) different start states (solid line) at each site. Around 95% of the points should be below the dotted line.

19 Test Results 2 Shows good results with a sequential MLFG. Figure: Ising model results with the Metropolis algorithm on a 16 x 16 lattice using a sequential MLFG.

20 Test Results 3 Shows good results with a parallel MLFG. Figure: Ising model results with the Metropolis algorithm on a 16 x 16 lattice using a parallel MLFG.

21 SPRNG Libraries SPRNG Features Simple Interface General Interface
Spawning New Streams Test Suite Test Results Summary SPRNG Versions In the previous section: discussed different parallelization strategies and their potential pitfalls. SPRNG features. Using SPRNG: ease of parallelization, changing code to use SPRNG. Also mention that SPRNG permits multiple RNG streams on each process, but the simple interface assumes one distinct stream per process. Test suite and results; limits on the number of streams tested. Different SPRNG versions, including SPRNG Cell.

22 SPRNG Features Libraries for parallel random number generation
Three LCGs, a modified LFG, an MLFG, and a CMRG Parallelization is based on parameterization Periods up to 2^1310, and up to 2^39648 distinct streams Applications can dynamically spawn new random number streams No communication is required The PRNG state can be checkpointed and restarted in a machine-independent manner A test suite is included, to enable testing the quality of parallel random number generators An extensibility template enables porting new generators into the SPRNG format Usable in C/C++ and Fortran programs Mention that the MLFG is the only non-linear generator. Mention that we preferred simpler, well-studied generators to newer ones whose defects were not adequately known. We think that using a few different well-known RNGs is better than trusting a single RNG with good theoretical properties. Explain the variants for each generator (parameter argument). Mention that a large period does not imply that most of the period can be used effectively, due to long-range correlations. We planned to include the Mersenne Twister too, but never got around to doing it.

23 Simple Interface

    #include <stdio.h>
    #include <mpi.h>
    #define SIMPLE_SPRNG
    #define USE_MPI
    #include "sprng.h"

    main(int argc, char *argv[])
    {
      double rn;
      int i, myid;
      MPI_Init(&argc, &argv);
      MPI_Comm_rank(MPI_COMM_WORLD, &myid);
      for (i = 0; i < 3; i++) {
        rn = sprng();
        printf("Process %d, random number %d: %.14f\n", myid, i+1, rn);
      }
      MPI_Finalize();
    }

Sequential version:

    #include <stdio.h>
    #define SIMPLE_SPRNG
    #include "sprng.h"

    main()
    {
      double rn;
      int i;
      printf(" Printing 3 random numbers in [0,1):\n");
      for (i = 0; i < 3; i++) {
        rn = sprng(); /* double precision */
        printf("%f\n", rn);
      }
    }

MPI is not really needed for SPRNG; its sole use is in ensuring consistent seeding across processes. Mention that the SPRNG seed is actually an encoded seed, and you are expected to use the same encoded seed on all processes. In the first example, USE_MPI ensures that the process rank is used to create a different stream on each process. The second example is for sequential use (note that the SPRNG generators are faster and better than the standard ones available).

24 General Interface

    #include <stdio.h>
    #include <mpi.h>
    #define USE_MPI
    #include "sprng.h"

    main(int argc, char *argv[])
    {
      int streamnum, nstreams, seed, *stream, i, myid, nprocs;
      double rn;
      MPI_Init(&argc, &argv);
      MPI_Comm_rank(MPI_COMM_WORLD, &myid);
      MPI_Comm_size(MPI_COMM_WORLD, &nprocs);
      streamnum = myid;
      nstreams = nprocs;
      seed = make_sprng_seed();
      stream = init_sprng(streamnum, nstreams, seed, SPRNG_DEFAULT);
      for (i = 0; i < 3; i++) {
        rn = sprng(stream);
        printf("process %d, random number %d: %f\n", myid, i+1, rn);
      }
      free_sprng(stream);
      MPI_Finalize();
    }

Mention the arguments to init_sprng and its return type, which is a handle to an RNG stream. Mention make_sprng_seed, and how USE_MPI ensures that the seed is replicated on all processes. It is better not to use a random seed, to help debugging by ensuring reproducibility. Mention the need for the nstreams argument, due to the need for spawning in some applications. Mention the parameter argument. Mention free_sprng.

25 Spawning New Streams Can be useful in ensuring reproducibility
Each new entity is given a new random number stream.

    #include <stdio.h>
    #include "sprng.h"
    #define SEED 985456376  /* any fixed integer seed */

    main()
    {
      int streamnum, nstreams, *stream, **new;
      double rn;
      int i, nspawned;
      streamnum = 0;
      nstreams = 1;
      stream = init_sprng(streamnum, nstreams, SEED, SPRNG_DEFAULT);
      for (i = 0; i < 20; i++)
        rn = sprng(stream);
      nspawned = spawn_sprng(stream, 2, &new);
      printf(" Printing 2 random numbers from second spawned stream:\n");
      for (i = 0; i < 2; i++) {
        rn = sprng(new[1]);
        printf("%f\n", rn);
      }
      free_sprng(stream);
      free_sprng(new[0]);
      free_sprng(new[1]);
      free(new);
    }

Mention how spawning can ensure reproducibility with the dynamic creation of entities and load balancing; otherwise, spawning is not needed to ensure reproducibility.

26 Converting Code to Use SPRNG
    #include <stdio.h>
    #include <mpi.h>
    #define SIMPLE_SPRNG
    #define USE_MPI
    #include "sprng.h"
    #define myrandom sprng  /* replace the old PRNG with SPRNG */

    double myrandom(); /* Old PRNG */

    main(int argc, char *argv[])
    {
      int i, myid;
      double rn;
      MPI_Init(&argc, &argv);
      MPI_Comm_rank(MPI_COMM_WORLD, &myid);
      for (i = 0; i < 3; i++) {
        rn = myrandom();
        printf("Process %d, random number %d: %.14f\n", myid, i+1, rn);
      }
      MPI_Finalize();
    }

Just define your old RNG to sprng, and use SIMPLE_SPRNG. To parallelize an (embarrassingly parallel) application, also use USE_MPI. The final results on each process should also be averaged.

27 Test Suite Sequential and parallel tests to check for the absence of correlations. Tests can be run on sequential or parallel machines. Parallel tests interleave different streams to create a new stream, and the new streams are tested with sequential tests. Interleaving changes inter-stream correlations into intra-stream correlations, which a conventional sequential test can detect. The Ising model tests used a different stream on each lattice site to expose correlations between streams.

28 Test Results Summary Sequential and parallel versions of DIEHARD and Knuth's tests, plus application-based tests: the Ising model using the Wolff and Metropolis algorithms, and a random walk test. Sequential tests: 1024 streams were typically tested for each PRNG variant, with a total of around 10^11 – 10^12 random numbers used per test per PRNG variant. Parallel tests: a typical test creates four new streams by combining 256 streams for each new stream; a total of around 10^11 – 10^12 random numbers were used for each test for each PRNG variant. All SPRNG generators pass all the tests. These are some of the largest PRNG tests conducted; one test (the gap test) used 10^13 random numbers. Note that in the parallel tests, only the first 1024 streams have been tested. This may not be adequate on 10K-process machines, and the total number of random numbers may not be adequate either. So, users of such machines should certainly use multiple types of RNGs and check whether the results match, before combining the results.

29 SPRNG Versions All the SPRNG versions use the same generators, with the same code used in SPRNG 1.0; the interfaces alone differ. SPRNG 1.0: an application can use only one type of generator (multiple streams can be used, of course); ideal for the typical C/Fortran application developer, and usable from C++ too. SPRNG 2.0: an application can use multiple types of generators, with some loss in speed; useful for those developing new generators by combining existing ones. SPRNG 4.0: C++ wrappers for SPRNG 2.0. SPRNG Cell: SPRNG for the SPUs of the Cell processor; available from Sri Sathya Sai University, India. There is no difference in the underlying RNGs between the different versions. In this section: told them how to use SPRNG.

30 Conclusions The quality of sequential and parallel random number generators is important in applications that use a large number of random numbers, or that use several processors. Speed is less important, to a certain extent. It is difficult to prove quality, theoretically or empirically. Use different types of generators, verify that their results are similar using the individual solutions and the estimated standard deviations, and then combine the results if they agree. It is important to ensure reproducibility, to ease debugging. Use SPRNG! sprng.scs.fsu.edu. It is easy to get errors due to parallel random numbers if you are not careful, and easy to overcome them if you use SPRNG. Ensure reproducibility, to help with debugging. Note the limitations of current tests on massively parallel processors.

