Parallel Random Number Generation

Presentation transcript:

Parallel Random Number Generation
Ashok Srinivasan, Florida State University, asriniva@cs.fsu.edu

If random numbers were really random, then parallelization would not make any difference, and this talk would be unnecessary. But we use pseudo-random numbers, which only pretend to be random, and this causes problems. These problems can usually be solved if you use SPRNG! Parallel PRNG concerns the use of random numbers on a parallel machine; parallelization matters because the numbers are only pseudo-random, and this is especially important on massively parallel machines. The goal of this talk is to acquaint the audience with what can go wrong, and how to avoid it by using the SPRNG library that we developed.

Outline
- Introduction: PRNG concepts and terminology
- Random Numbers in Parallel Monte Carlo: common uses of PRNGs in parallel applications
- Parallel Random Number Generation: parallelization techniques and potential problems
- SPRNG Libraries: how SPRNG makes things simple
- Conclusions: rules of thumb

Introduction
- Applications of Random Numbers
- Terminology
- Desired Features
- Common Generators
- Errors Due to Correlations

This section mentions common applications, introduces the idea of a PRNG and its terminology, describes the desired features and some common generators, and shows how errors are possible.

Applications of Random Numbers
- Multi-dimensional integration using Monte Carlo: an important focus of this talk, based on relating the expected value to an integral; important in supercomputing, and includes most random walk computations
- Modeling random processes
- Cryptography: not addressed in this talk, as it calls for different algorithms
- Games: quality is not as important here

Terminology
- State: the internal data of the generator, with a mapping from the state to a random number
- T: transition function, which takes the generator from one state to the next
- Period: length of the cycle; a fixed state size implies that the sequence must eventually cycle
- A long period is desired, but not sufficient for many applications
- Seed: initializes the state (not necessarily the starting random number)
- Random number stream: the sequence produced from a given starting state

New distributions can be generated from a uniform one (for example, by acceptance-rejection or the inverse transform method), and efficient sampling from discrete distributions is possible using the alias method. A sketch of the inverse transform method follows.
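The following minimal C sketch (not from the original slides) illustrates the inverse transform method: if U is uniform on [0,1), then -log(1-U)/lambda follows an exponential distribution with rate lambda, since the exponential CDF F(x) = 1 - exp(-lambda*x) inverts to F^{-1}(u) = -log(1-u)/lambda. The helper uniform01() is a placeholder for any U(0,1) generator, such as sprng().

#include <stdio.h>
#include <stdlib.h>
#include <math.h>

/* Placeholder for any U(0,1) generator, e.g. sprng(). */
static double uniform01(void)
{
  return rand() / (RAND_MAX + 1.0);
}

/* Inverse transform sampling: map a uniform variate through the
   inverse CDF of the exponential distribution with rate lambda. */
double exponential(double lambda)
{
  return -log(1.0 - uniform01()) / lambda;
}

int main(void)
{
  int i;
  for (i = 0; i < 3; i++)
    printf("%f\n", exponential(2.0));
  return 0;
}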

Desired Features

Sequential pseudo-random number generators:
- Randomness: a uniform distribution in high dimensions (illustrated on the slide by uniformity in 2-D)
- Reproducibility, which helps in debugging
- Speed (though this is not as important)
- A large period (but correlations occur well before the period is exhausted)
- Portability (which also helps with debugging)

Parallel pseudo-random number generators additionally require:
- Absence of inter-stream correlations: sequences on different processors should be uncorrelated
- Dynamic creation of new random number streams, needed in some applications
- Absence of inter-processor communication, despite the above

Common Generators
- Linear Congruential Generator (LCG): x_n = a x_{n-1} + p (mod m)
- Additive Lagged Fibonacci Generator (LFG): x_n = x_{n-r} + x_{n-s} (mod m)
- Multiple Recursive Generator (MRG), for example x_n = a x_{n-1} + b x_{n-5} (mod m); Combined Multiple Recursive Generators (CMRGs) combine several such generators
- Multiplicative Lagged Fibonacci Generator (MLFG): x_n = x_{n-r} * x_{n-s} (mod m)
- Mersenne Twister, etc.

Many of these have known defects, but that is better than using generators with unknown defects, and we have modified these generators to fix problems. Some common errors are also easily avoided. For example, in an LCG with a power-of-two modulus, each bit has a power-of-two period, and the least significant bits are highly correlated; so if you perform an integration in a high power-of-two dimension (such as a 1024-dimensional integration), you should discard a few random numbers after every 1024, to keep these correlations from affecting the coverage of the 1024-dimensional space. LCGs also produce points that fall along planes. However, many generators that attempt to improve on these perform worse in practical applications, and the use of multiple generators will identify errors. The Mersenne Twister has high-dimensional equidistribution and looks like a good sequential generator, though I would like to see larger empirical tests on practical applications; it can also be parallelized effectively. A minimal LCG sketch follows.
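A minimal C sketch of the LCG recurrence (illustrative only; this is not one of the SPRNG generators), using Knuth's MMIX multiplier and increment with modulus 2^64. It also shows the power-of-two-modulus caveat in action: only the high bits are returned, since the low bits have short periods.

#include <stdint.h>
#include <stdio.h>

static uint64_t state = 12345;  /* the seed */

/* One LCG step: x_n = a*x_{n-1} + c, reduced mod 2^64 by the
   wrap-around of unsigned 64-bit arithmetic. */
double lcg_uniform01(void)
{
  state = 6364136223846793005ULL * state + 1442695040888963407ULL;
  /* Keep the high 53 bits: bit k of an LCG mod 2^64 has period
     2^(k+1), so the low bits are highly correlated. */
  return (state >> 11) * (1.0 / 9007199254740992.0);  /* divide by 2^53 */
}

int main(void)
{
  int i;
  for (i = 0; i < 3; i++)
    printf("%f\n", lcg_uniform01());
  return 0;
}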

Error Due to Correlations

An Ising model simulation maintains a lattice of points, where each point has a spin that is up or down. Properties such as the specific heat can be computed from the relative spins of neighboring particles. The decision to flip a spin's state is made using a random number; the way this is done varies between algorithms. A variety of states are generated in this manner and averaged to yield the property of interest. The exact solution can be computed, and long-range correlations can affect the result, so this is a good test of random number quality.

(Figure: Ising model results with the Metropolis algorithm on a 16 x 16 lattice, using the LFG random number generator.) The error is usually estimated from the standard deviation (x-axis), which should decrease as (sample size)^{-1/2}.

Random Numbers in Parallel Monte Carlo
- Monte Carlo Example: Estimating π
- Monte Carlo Parallelization
- Low Discrepancy Sequences

So far, we have explained PRNG theory and shown that errors can arise. This section introduces the idea of Monte Carlo with an example, explains how Monte Carlo is parallelized and the potential problems due to inter-stream correlations, and discusses the possibility of low discrepancy sequences.

Monte Carlo Example: Estimating π

Generate pairs of random numbers (x, y) in the square, and estimate π as 4 × (number in the circle) / (total number of pairs). This is a simple example of Monte Carlo integration, which can be performed based on the observation that E f(x) = ∫ f(y) ρ(y) dy, where x is sampled from the distribution ρ. With N samples, the error decreases as N^{-1/2}. In this example, ρ is uniform on the square, and f(x) = 1 inside the circle and 0 outside, giving an estimate of π/4. Note that a sequence can be uniform in 1-D but not in 2-D; correlations can affect the sampling in this way and lead to errors. A sketch of the estimator follows.
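A minimal C sketch of the π estimator (illustrative; not the talk's code): sample points uniformly in the unit square and count how many fall inside the quarter circle of radius 1, so that the fraction approaches π/4. rand() is used only for brevity; the talk recommends a tested library such as SPRNG.

#include <stdio.h>
#include <stdlib.h>

int main(void)
{
  long i, n = 1000000, inside = 0;
  srand(42);  /* fixed seed, for reproducibility */
  for (i = 0; i < n; i++) {
    double x = rand() / (RAND_MAX + 1.0);
    double y = rand() / (RAND_MAX + 1.0);
    if (x * x + y * y < 1.0)  /* inside the quarter circle */
      inside++;
  }
  printf("pi is approximately %f\n", 4.0 * inside / n);
  return 0;
}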

Monte Carlo Parallelization

Conventionally, Monte Carlo is "embarrassingly parallel": the same algorithm is run on each processor, but with a different random number sequence. For example, each process runs the same algorithm for computing π with its own RNG stream, and the results on the different processors are combined; on the slide, process 1 (stream 1) yields 3.1, process 2 (stream 2) yields 3.6, and process 3 (stream 3) yields 2.7, for a combined result of 3.13. The error with N samples on each of P processors should be similar to that of a sequential simulation with N×P samples. Correlations can make the error higher; in the extreme case where all the RNG streams are identical, all the parallelization is wasted, and the answer can even be wrong, because we are generating from the wrong distribution. One should distinguish inter-stream from intra-stream correlations; intra-stream correlations are the worse of the two in this embarrassingly parallel setting. A sketch of the parallelization follows.
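A minimal MPI sketch of this parallelization (illustrative; the per-process estimator, seeding, and variable names are my own, not the talk's). Each process computes its own estimate of π, and MPI_Reduce averages the estimates on rank 0. Note that seeding rand() by rank gives no guarantee that the streams are uncorrelated; a real code would use SPRNG instead.

#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>

/* Per-process Monte Carlo estimate of pi (same idea as the sketch
   above), with a given seed for this process's stream. */
static double estimate_pi(unsigned seed)
{
  long i, n = 1000000, inside = 0;
  srand(seed);
  for (i = 0; i < n; i++) {
    double x = rand() / (RAND_MAX + 1.0);
    double y = rand() / (RAND_MAX + 1.0);
    if (x * x + y * y < 1.0)
      inside++;
  }
  return 4.0 * inside / n;
}

int main(int argc, char *argv[])
{
  int myid, nprocs;
  double mine, sum;

  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &myid);
  MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

  mine = estimate_pi(myid + 1);  /* naive per-rank seeding */
  MPI_Reduce(&mine, &sum, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
  if (myid == 0)
    printf("combined estimate: %f\n", sum / nprocs);

  MPI_Finalize();
  return 0;
}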

Low Discrepancy Sequences

For integration, uniformity is often more important than randomness. Low discrepancy sequences attempt to fill a space uniformly, avoiding the clustering of points that random sequences produce (the slide contrasts a random point set with a low discrepancy one). The integration error can be bounded in proportion to (log N)^d / N, with N samples in d dimensions; low discrepancy point sets can be used when the number of samples is known in advance. They are often useful for moderate-dimensional integration and have been found useful in many financial applications, so you may want to consider them for your problems. Their parallelization is more complicated, however, and they are not discussed further in this talk. A sketch of one such sequence follows.
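A minimal C sketch of a classic low discrepancy construction (not from the talk): the van der Corput sequence in a base b, obtained by reflecting the base-b digits of n across the radix point. Pairing bases 2 and 3 gives a 2-D Halton sequence, whose points fill the unit square far more evenly than random pairs.

#include <stdio.h>

/* n-th van der Corput number in the given base: reverse the base-b
   digits of n about the radix point. */
double van_der_corput(unsigned n, unsigned base)
{
  double result = 0.0, denom = 1.0;
  while (n > 0) {
    denom *= base;
    result += (n % base) / denom;
    n /= base;
  }
  return result;
}

int main(void)
{
  unsigned i;
  for (i = 1; i <= 8; i++)  /* first few 2-D Halton points */
    printf("(%f, %f)\n", van_der_corput(i, 2), van_der_corput(i, 3));
  return 0;
}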

Parallel Random Number Generation
- Parallelization through Random Seeds
- Leap-Frog Parallelization
- Parallelization through Blocking
- Parameterization
- Test Results

The previous section showed how applications commonly need PRNGs in parallel. Next: how can we provide PRNGs with the desired properties? We cover sequence splitting and its various versions, and then parameterization, its scalability, and how it overcomes the errors above.

Parallelization through Random Seeds

Consider a single random number stream. Each processor chooses a start state randomly, in the hope that the start states are sufficiently far apart in the original stream. Overlap of the sequences is possible if the start states are not sufficiently far apart, and correlations between the sequences are possible even if they are far apart: long-range correlations in the original sequence can become short-range inter-stream correlations.

Leap-Frog Parallelization

Consider a single random number stream. On P processors, split the stream by having each processor take every P-th number from it. For example, with the original sequence 1 2 3 4 5 6 7 8 9 10 11 12 on three processors:

Processor 1: 1 4 7 10
Processor 2: 2 5 8 11
Processor 3: 3 6 9 12

Long-range correlations in the original sequence can become short-range intra-stream correlations, which are dangerous. A sketch of leap-frogging an LCG follows.
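For an LCG, leap-frogging can be done efficiently, because composing the map x -> a*x + c with itself P times yields another LCG, x -> A*x + C. The following minimal C sketch (illustrative; this is not how SPRNG parallelizes its generators) computes the composed constants and lets each of P notional processors jump P steps at a time, starting one position apart.

#include <stdint.h>
#include <stdio.h>

static const uint64_t a = 6364136223846793005ULL;  /* Knuth's MMIX LCG */
static const uint64_t c = 1442695040888963407ULL;  /* constants, mod 2^64 */

int main(void)
{
  const int P = 3;        /* number of processors */
  uint64_t A = 1, C = 0;
  uint64_t x = 12345;     /* common seed */
  int i, p;

  /* Compose x -> a*x + c with itself P times: A = a^P and
     C = (a^{P-1} + ... + a + 1)*c, all mod 2^64 via unsigned wrap. */
  for (i = 0; i < P; i++) {
    C = a * C + c;
    A = a * A;
  }

  for (p = 0; p < P; p++) {
    uint64_t y = x;       /* processor p's first number */
    printf("Processor %d:", p + 1);
    for (i = 0; i < 4; i++) {
      printf(" %5u", (unsigned)(y >> 48));  /* show the top 16 bits */
      y = A * y + C;      /* leap P steps in one multiply-add */
    }
    printf("\n");
    x = a * x + c;        /* next processor starts one step later */
  }
  return 0;
}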

Parallelization through Blocking

Each processor gets a different block of numbers from an original random number stream. For example, with the original sequence 1 2 3 4 5 6 7 8 9 10 11 12 on three processors:

Processor 1: 1 2 3 4
Processor 2: 5 6 7 8
Processor 3: 9 10 11 12

Long-range correlations in the original sequence can become short-range inter-stream correlations, which may be harmful. Example: the 48-bit LCG ranf fails the blocking test (add many numbers and see if the sum is normally distributed) with 10^10 random numbers. The sequences on different processors may also overlap. The idea behind the blocking test is sketched below.
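A minimal C sketch of the idea behind the blocking test mentioned above (illustrative; the actual test in the SPRNG suite is more thorough): sum n uniform numbers and standardize the sum using its mean n/2 and variance n/12. By the central limit theorem, the standardized sums across blocks should look like standard normal draws; correlations within the stream distort this distribution.

#include <stdio.h>
#include <stdlib.h>
#include <math.h>

int main(void)
{
  long i, n = 1000000;  /* numbers per block */
  int b, blocks = 10;
  srand(1);
  for (b = 0; b < blocks; b++) {
    double s = 0.0;
    for (i = 0; i < n; i++)
      s += rand() / (RAND_MAX + 1.0);
    /* A sum of n U(0,1) variates has mean n/2 and variance n/12. */
    double z = (s - n / 2.0) / sqrt(n / 12.0);
    printf("block %d: z = %+.3f\n", b, z);
  }
  return 0;
}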

Parameterization

Each processor gets an inherently different stream.
- Parameterized iterations: create a collection of iteration functions, and associate stream i with iteration function i. LCG example: x_n = a x_{n-1} + p_i (mod m) on processor i, where p_i is the i-th prime.
- Cycle parameterization: some random number generators inherently have a large number of distinct cycles; ensure that each processor gets a start state from a different cycle. Example: the LFG.

Note that the existence of inherently different streams does not imply that the streams are uncorrelated. A sketch of parameterized iterations follows.
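A minimal C sketch of parameterized iterations (illustrative; SPRNG's actual parameterization is more careful about which constants it uses): stream i steps with x -> a*x + p_i (mod 2^64), where p_i is drawn from a small table of primes. The iteration functions are inherently different, but as noted above, that alone does not guarantee the absence of correlations.

#include <stdint.h>
#include <stdio.h>

#define NSTREAMS 4

static const uint64_t a = 6364136223846793005ULL;  /* Knuth's MMIX multiplier */
static const uint64_t prime[NSTREAMS] = {3, 5, 7, 11};  /* per-stream constants */
static uint64_t x[NSTREAMS] = {1, 1, 1, 1};  /* same seed, different streams */

/* Advance stream i with its own iteration function and return a
   double in [0,1) from the high 53 bits. */
static double stream_next(int i)
{
  x[i] = a * x[i] + prime[i];
  return (x[i] >> 11) * (1.0 / 9007199254740992.0);  /* divide by 2^53 */
}

int main(void)
{
  int step, i;
  for (step = 0; step < 3; step++) {
    for (i = 0; i < NSTREAMS; i++)
      printf("%.6f ", stream_next(i));
    printf("\n");
  }
  return 0;
}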

Test Results 1

Identical start states were used with different iteration functions. This slide demonstrates that (i) having different sequences does not imply the absence of correlations, and (ii) it is easy to get errors in real applications using what look like reasonable parallel random number generators.

(Figure: Ising model results with the Metropolis algorithm on a 16 x 16 lattice, using a parallel LCG at each site, with (i) identical start states (dashed line) and (ii) different start states (solid line). Around 95% of the points should be below the dotted line.)

Test Results 2

Shows good results with a sequential MLFG: Ising model results with the Metropolis algorithm on a 16 x 16 lattice.

Test Results 3

Shows good results with a parallel MLFG: Ising model results with the Metropolis algorithm on a 16 x 16 lattice.

SPRNG Libraries
- SPRNG Features
- Simple Interface
- General Interface
- Spawning New Streams
- Test Suite
- Test Results Summary
- SPRNG Versions

The previous section discussed different parallelization strategies and their potential pitfalls. This section covers SPRNG's features and how to use it: the ease of parallelization and of changing code to use SPRNG. SPRNG permits multiple RNG streams on each process, but the simple interface assumes one distinct stream per process. We also describe the test suite and results, the limits on the number of streams tested, and the different SPRNG versions, including SPRNG Cell.

SPRNG Features
- Libraries for parallel random number generation: three LCGs, a modified LFG, an MLFG, and a CMRG
- Parallelization is based on parameterization
- Periods up to 2^1310, and up to 2^39618 distinct streams
- Applications can dynamically spawn new random number streams, with no communication required
- PRNG state can be checkpointed and restarted in a machine-independent manner
- A test suite is included, to enable testing the quality of parallel random number generators
- An extensibility template enables porting new generators into the SPRNG format
- Usable in C/C++ and Fortran programs

The MLFG is the only non-linear generator. We preferred simpler generators that were well studied to newer ones whose defects were not adequately known; we think that using a few different well-known RNGs is better than trusting a single RNG with good theoretical properties. Each generator has variants, selected through a parameter argument. Note that a large period does not imply that most of the period can be used effectively, due to long-range correlations. We planned to include the Mersenne Twister too, but never got around to doing it.

Simple Interface

Sequential use (note that the SPRNG generators are faster and better than the standard ones available):

#include <stdio.h>
#define SIMPLE_SPRNG
#include "sprng.h"

main()
{
  double rn;
  int i;
  printf(" Printing 3 random numbers in [0,1):\n");
  for (i=0;i<3;i++) {
    rn = sprng();  /* returns a double precision number in [0,1) */
    printf("%f\n", rn);
  }
}

Parallel use, where defining USE_MPI ensures that the process rank is used to create a different stream on each process:

#include <stdio.h>
#include <mpi.h>
#define SIMPLE_SPRNG
#define USE_MPI
#include "sprng.h"

main(int argc, char *argv[])
{
  double rn;
  int i, myid;
  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &myid);
  for (i=0;i<3;i++) {
    rn = sprng();
    printf("Process %d, random number %d: %.14f\n", myid, i+1, rn);
  }
  MPI_Finalize();
}

MPI is not really needed for SPRNG; its sole use is in ensuring consistent seeding across processes. The SPRNG seed is actually an encoded seed, and you are expected to use the same encoded seed on all processes.

General Interface

#include <stdio.h>
#include <mpi.h>
#define USE_MPI
#include "sprng.h"

main(int argc, char *argv[])
{
  int streamnum, nstreams, seed, *stream, i, myid, nprocs;
  double rn;
  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &myid);
  MPI_Comm_size(MPI_COMM_WORLD, &nprocs);
  streamnum = myid;
  nstreams = nprocs;
  seed = make_sprng_seed();
  stream = init_sprng(streamnum, nstreams, seed, SPRNG_DEFAULT);
  for (i=0;i<3;i++) {
    rn = sprng(stream);
    printf("process %d, random number %d: %f\n", myid, i+1, rn);
  }
  free_sprng(stream);
  MPI_Finalize();
}

init_sprng returns a handle to an RNG stream. make_sprng_seed generates a seed, and USE_MPI ensures that it is replicated on all processes; it is often better not to use a random seed, to help debugging by ensuring reproducibility. The nstreams argument is needed because some applications spawn new streams. The final argument selects the generator variant (the parameter). Call free_sprng when a stream is no longer needed.

Spawning New Streams

Spawning can be useful in ensuring reproducibility when entities are created dynamically or under load balancing; each new entity is given a new random number stream. (Otherwise, spawning is not needed to ensure reproducibility.)

#include <stdio.h>
#include "sprng.h"
#define SEED 985456376

main()
{
  int streamnum, nstreams, *stream, **new;
  double rn;
  int i, nspawned;
  streamnum = 0;
  nstreams = 1;
  stream = init_sprng(streamnum, nstreams, SEED, SPRNG_DEFAULT);
  for (i=0;i<20;i++)
    rn = sprng(stream);
  nspawned = spawn_sprng(stream, 2, &new);
  printf(" Printing 2 random numbers from second spawned stream:\n");
  for (i=0;i<2;i++) {
    rn = sprng(new[1]);
    printf("%f\n", rn);
  }
  free_sprng(stream);
  free_sprng(new[0]);
  free_sprng(new[1]);
  free(new);
}

Converting Code to Use SPRNG

Just define your old RNG to be sprng, and use SIMPLE_SPRNG. For parallelizing an (embarrassingly parallel) application, also use USE_MPI. The final results on the processes should then be combined, for example by averaging.

#include <stdio.h>
#include <mpi.h>
#define SIMPLE_SPRNG
#define USE_MPI
#include "sprng.h"
#define myrandom sprng

double myrandom(); /* Old PRNG */

main(int argc, char *argv[])
{
  int i, myid;
  double rn;
  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &myid);
  for (i=0;i<3;i++) {
    rn = myrandom();
    printf("Process %d, random number %d: %.14f\n", myid, i+1, rn);
  }
  MPI_Finalize();
}

Test Suite

Sequential and parallel tests check for the absence of correlations, and can be run on sequential or parallel machines. The parallel tests interleave different streams to create a new stream, and the new streams are then tested with the sequential tests: interleaving converts inter-stream correlations into intra-stream correlations, which a conventional sequential test can detect. The Ising model tests used a different stream at each lattice site to expose correlations between streams. The interleaving idea is sketched below.
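A minimal C sketch of the interleaving idea (illustrative; the actual SPRNG test suite is more elaborate): merge K streams round-robin into one, so that correlations between the original streams become correlations within the merged stream. Here each "stream" is a toy LCG differing only in its seed, standing in for whatever parallel streams are under test.

#include <stdint.h>
#include <stdio.h>

#define K 4  /* number of streams to interleave */

static uint64_t state[K] = {1, 2, 3, 4};  /* toy streams: same LCG, different seeds */

/* Next number from stream s (a stand-in for a parallel generator). */
static double stream_next(int s)
{
  state[s] = 6364136223846793005ULL * state[s] + 1442695040888963407ULL;
  return (state[s] >> 11) * (1.0 / 9007199254740992.0);
}

/* Round-robin interleaving: inter-stream correlations become
   intra-stream correlations of this merged sequence, which can then
   be fed to a conventional sequential test. */
static double interleaved_next(void)
{
  static int turn = 0;
  double rn = stream_next(turn);
  turn = (turn + 1) % K;
  return rn;
}

int main(void)
{
  int i;
  for (i = 0; i < 8; i++)
    printf("%f\n", interleaved_next());
  return 0;
}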

Test Results Summary
- Sequential and parallel versions of DIEHARD and Knuth's tests, plus application-based tests: the Ising model with the Wolff and Metropolis algorithms, and a random walk test
- Sequential tests: 1024 streams were typically tested for each PRNG variant, with a total of around 10^11 to 10^12 random numbers used per test per PRNG variant
- Parallel tests: a typical test creates four new streams by combining 256 streams for each new stream, with a total of around 10^11 to 10^12 random numbers used per test per PRNG variant
- All SPRNG generators pass all the tests
- These are some of the largest PRNG tests conducted; one test (the gap test) used 10^13 random numbers

Note that in the parallel tests, only the first 1024 streams were tested, which may not be adequate on 10K-process machines; the total number of random numbers may not be adequate either. Users of such machines should certainly use multiple types of RNGs and check whether the results match before combining them.

SPRNG Versions

All the SPRNG versions use the same generators, with the same code as in SPRNG 1.0; only the interfaces differ, with no difference in the underlying RNGs.
- SPRNG 1.0: an application can use only one type of generator (multiple streams can be used, of course). Ideal for the typical C/Fortran application developer, and usable from C++ too.
- SPRNG 2.0: an application can use multiple types of generators, at some loss in speed. Useful for those developing new generators by combining existing ones.
- SPRNG 4.0: C++ wrappers for SPRNG 2.0.
- SPRNG Cell: SPRNG for the SPUs of the Cell processor, available from Sri Sathya Sai University, India.

Conclusions
- The quality of sequential and parallel random number generators is important in applications that use a large number of random numbers, or that run on several processors; speed is probably less important, to a certain extent
- It is difficult to prove quality, theoretically or empirically: use different types of generators, check whether their results are similar using the individual solutions and the estimated standard deviations, and combine the results only if they are similar
- It is important to ensure reproducibility, to ease debugging
- It is easy to get errors from parallel random numbers if you are not careful, and easy to overcome them if you use SPRNG: sprng.scs.fsu.edu
- Be aware of the limitations of the current tests on massively parallel processors