
High Performance Computing 1 Random Numbers

What is a random number generator?
Most random number generators generate a sequence of integers by a recurrence (a linear congruential generator):
$$x_0 = \text{given}, \qquad x_{n+1} = (P_1 x_n + P_2) \bmod N, \qquad n = 0, 1, 2, \ldots$$
Divide $x_n$ by $N$ to get a number in [0,1).

A sample sequence
Take $x_0 = 79$, $N = 100$, $P_1 = 263$, and $P_2 = 71$:
    x1 = (79*263 + 71) mod 100 = 20848 mod 100 = 48
    x2 = (48*263 + 71) mod 100 = 12695 mod 100 = 95
    x3 = (95*263 + 71) mod 100 = 25056 mod 100 = 56
    x4 = (56*263 + 71) mod 100 = 14799 mod 100 = 99
Subsequent numbers are: 8, 75, 96, 19, 68, 55, 36, 39, 28, 35, 76, 59, 88, 15, 16, 79, 48. The sequence then repeats, with a period of 20.
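The sample sequence can be reproduced with a few lines of Python. This is a minimal sketch; the function names `lcg` and `cycle_from_seed` are made up for illustration and are not part of the slides.

```python
def lcg(seed, p1, p2, n):
    """Yield x_1, x_2, ... from x_{k+1} = (p1 * x_k + p2) mod n."""
    x = seed
    while True:
        x = (p1 * x + p2) % n
        yield x


def cycle_from_seed(seed, p1, p2, n):
    """Return [seed, x_1, ...], stopping just before the first repeated value."""
    seen = [seed]
    for x in lcg(seed, p1, p2, n):
        if x in seen:
            break
        seen.append(x)
    return seen


# Parameters of the sample sequence on the slide.
cycle = cycle_from_seed(79, 263, 71, 100)
print(cycle)
print("values before the sequence repeats:", len(cycle))
```

With these parameters the generator visits only 20 of the 100 possible values before returning to the seed.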

Sequences
$P_1$, $P_2$, and $N$ determine the characteristics of the random number generator. The choice of $x_0$ (the seed) determines the particular sequence of random numbers that is generated.

What makes a good random number generator?
A sequence is good if it passes several well-established statistical tests. Alternatively, it is good if it gives good results in a particular application, where the meaning of "good results" depends heavily on the context.

One test: plot pairs
Plot 100 points $(x_i, x_{i+1})$. It is clear that there are only 20 distinct points; 100 were drawn, so five lie on top of each other at every location. The dots appear to lie along six slanted lines, which is not very "random".
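The point count in the pair plot can be checked without drawing anything. The sketch below assumes the plot uses the sample generator with x0 = 79, N = 100, P1 = 263, P2 = 71; the helper name `lcg_values` is hypothetical.

```python
from collections import Counter


def lcg_values(seed, p1, p2, n, count):
    """Return the first `count` values x_0, x_1, ... of the LCG."""
    values, x = [], seed
    for _ in range(count):
        values.append(x)
        x = (p1 * x + p2) % n
    return values


xs = lcg_values(79, 263, 71, 100, 101)
pairs = list(zip(xs[:-1], xs[1:]))   # the 100 points (x_i, x_{i+1})
multiplicity = Counter(pairs)        # how often each point is drawn

print("points drawn:   ", len(pairs))
print("distinct points:", len(multiplicity))
```

Because the sequence has period 20, every distinct point is drawn the same number of times.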

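Second plot
$P_1 = 16807$, $P_2 = 0$, and a much larger modulus $N$ (its value is missing from the transcript; 16807 is the multiplier conventionally paired with $N = 2^{31} - 1 = 2147483647$). Distinctly better.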

Linear generators
All sequences generated by a linear congruential formula will eventually enter a cycle which repeats itself endlessly; a good generator will produce a long sequence of numbers before repeating. The maximum length is, of course, $N$. A linear congruential formula will generate a sequence of maximum length if and only if the following conditions are met (see Knuth):
i) $P_2$ is relatively prime to $N$;
ii) $B = P_1 - 1$ is a multiple of $p$, for every prime $p$ dividing $N$;
iii) $B = P_1 - 1$ is a multiple of 4, if $N$ is a multiple of 4.
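The three conditions are easy to test mechanically. The following is a sketch using only the Python standard library; the names `prime_factors` and `has_full_period` are illustrative, and trial division is adequate only because the moduli of interest here have small prime factors.

```python
from math import gcd


def prime_factors(n):
    """Return the set of primes dividing n (simple trial division)."""
    factors, p = set(), 2
    while p * p <= n:
        while n % p == 0:
            factors.add(p)
            n //= p
        p += 1
    if n > 1:
        factors.add(n)
    return factors


def has_full_period(p1, p2, n):
    """Check conditions i) to iii) for x_{k+1} = (p1 * x_k + p2) mod n."""
    b = p1 - 1
    if gcd(p2, n) != 1:                                # condition i)
        return False
    if any(b % p != 0 for p in prime_factors(n)):      # condition ii)
        return False
    if n % 4 == 0 and b % 4 != 0:                      # condition iii)
        return False
    return True


print(has_full_period(263, 71, 100))      # the sample generator
print(has_full_period(69069, 1, 2**32))   # multiplier 69069, increment 1
```

For the sample generator, condition ii) fails because 262 is not a multiple of 5, which is consistent with its short period.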

THE MTH$RANDOM ALGORITHM
The VMS Run-Time Library provides a random number generator routine called MTH$RANDOM:
    SEED = (69069*SEED + 1) MOD 2**32
    X = SEED/2**32
Note that MTH$RANDOM satisfies the conditions above, with B = 69069 - 1 = 69068:
i) 1 is relatively prime to 2**32, since 1 is relatively prime to all numbers.
ii) 69068 is a multiple of 2, which is the only prime dividing 2**32.
iii) 69068 is a multiple of 4.

THE MTH$RANDOM ALGORITHM
Note that for the MTH$RANDOM function, if SEED is initially an odd value, the new value of SEED will always be even; if SEED is an even value, the new value of SEED will be odd. Thus, if the algorithm is called repeatedly, the value of SEED alternates between even and odd values.
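The alternation is simple to observe. This sketch implements only the recurrence quoted on the slides, not the actual VMS routine; `mth_random_step` is a made-up name and the starting seed is arbitrary.

```python
def mth_random_step(seed):
    """One step of SEED = (69069*SEED + 1) MOD 2**32."""
    return (69069 * seed + 1) % 2**32


seed = 12345            # arbitrary odd starting value
parities = []
for _ in range(8):
    seed = mth_random_step(seed)
    parities.append(seed % 2)   # 0 = even, 1 = odd

print(parities)
```

Since the lowest bit simply alternates, the low-order bits of SEED are far less random than the high-order bits, which is one reason the output is formed as SEED/2**32 rather than from the low bits.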

THE MTH$RANDOM ALGORITHM
More important than starting the MTH$RANDOM generator to get one random sequence is the problem of restarting the generator to get several different sequences. You may wish to run a simulation several times and use a different random sequence each time.

THE MTH$RANDOM ALGORITHM
To come up with a good "random" initial SEED value, use the generator itself. Select an initial nonrandom SEED value, then run the random number generator for a few cycles to generate a random SEED value. Then restart the random number generator with the new random SEED value; the output is a properly initialized random sequence.

THE RANDU ALGORITHM
The VMS FORTRAN Run-Time Library contains a random number generator, RANDU, first introduced by IBM. It turned out to be a poor random number generator, but it has nonetheless spread widely.
    INTEGER*4 SEED
    INTEGER*2 W(2)
    EQUIVALENCE( SEED, W(1) )
    R = FOR$IRAN( W(1), W(2) )
R is the return value, in the range [0,1). W(1) and W(2) together form the seed value for the generator. This goes back to the PDP-11 days of 16-bit integers: SEED is really a 32-bit integer, but it was represented as two 16-bit integers.

THE RANDU ALGORITHM
    SEED = (65539*SEED) MOD 2**31
    X = SEED/2**31
Note that if SEED is initially an odd value, the new SEED generated will also be odd. Similarly, if SEED is initially an even value, the new SEED will also be even. Thus there are at least two disjoint cycles for the RANDU generator.

THE RANDU ALGORITHM
Actually, the situation is even worse than that. For odd SEED values there are two separate disjoint cycles, one generated by the SEED value 1 and one generated by the SEED value 5. Each of these cycles contains 536,870,912 values. Together they account for all of the (2**31)/2 = 1,073,741,824 possible odd SEED values.

THE RANDU ALGORITHM
There are 30 different disjoint cycles using even SEED values.
TABLE: RANDU WITH EVEN VALUES OF SEED (columns: CYCLE, LENGTH OF CYCLE; the table entries are not preserved in this transcript)

THE RANDU ALGORITHM
There are a total of (2**31)/2 = 1,073,741,824 possible even SEED values; the cycles above account for 1,073,709,056 of them. The remaining 32,768 SEED values are those whose 31-bit binary representation has the lower 16 bits set to 0. RANDU treats these SEED values as if the SEED value were 1, so they result in the cycle generated by the SEED value 1.

Tests
The 1-D TEST is a frequency test. Imagine a number line stretching from 0 to 1. Use the random number generator to plot random points on this line. First divide the line into a number of "bins":
    |      |      |      |      |
Then see how randomly the random number generator fills the bins. If the bins are filled too unevenly, the chi-square test will give a high value, indicating that the points do not appear random.

Tests
In 2-D, divide the plane into squares. Similar tests can be constructed in higher dimensions. Define:
    $N$ = number of trials
    $k$ = number of possible outcomes of the chance experiment
    $f(\zeta_i)$ = number of occurrences of $\zeta_i$ in $N$ trials
    $E(\zeta_i)$ = expected number of occurrences of $\zeta_i$ in $N$ trials, $E(\zeta_i) = N \Pr(\zeta_i)$
The chi-square statistic is then
$$\mathrm{CHISQ} = \sum_{i=1}^{k} \frac{\left[ f(\zeta_i) - E(\zeta_i) \right]^2}{E(\zeta_i)}$$
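As an illustration of the 1-D frequency test, the sketch below bins samples from [0,1) and evaluates the chi-square sum with equal expected counts per bin. The function name `chi_square_1d` is hypothetical, and Python's built-in generator stands in for the generator under test.

```python
import random


def chi_square_1d(samples, bins):
    """Chi-square statistic for samples in [0,1) sorted into equal-width bins."""
    counts = [0] * bins
    for u in samples:
        # min() guards against u * bins rounding up to the last edge.
        counts[min(int(u * bins), bins - 1)] += 1
    expected = len(samples) / bins          # E = N * Pr, with Pr = 1/bins
    return sum((f - expected) ** 2 / expected for f in counts)


rng = random.Random(1)                      # generator under test (stand-in)
samples = [rng.random() for _ in range(10_000)]
print("CHISQ with 10 bins:", chi_square_1d(samples, bins=10))
```

The resulting value would then be compared against the chi-square distribution with bins - 1 degrees of freedom; that comparison is omitted here.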

Tests: MTH$RANDOM
    SEED = (69069*SEED + 1) mod 2**32
    X = SEED/2**32
Returns a real in the range [0,1).
RATING:
    Fails 1-D above 350,000 bpd (bins per dimension)
    Fails 2-D above 600 bpd
    Fails 3-D above 100 bpd
    Fails 4-D above 27 bpd
Comments: This generator is also used by the VAX FORTRAN intrinsic function RAN and by the VAX BASIC function RND.

Tests: RANDU
    SEED = (65539*SEED) mod 2**31
    X = SEED/2**31
Returns a real in the range [0,1).
RATING:
    Fails 1-D above 200,000 bpd
    Fails 2-D above 400 bpd
    Fails 3-D above 3 bpd
    Fails 4-D above 6 bpd
Comments: Note the extremely poor performance for dimensions 3 and above. This generator is obsolete.
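A commonly cited explanation for RANDU's poor behavior in three dimensions, not stated on the slides, is that 65539 = 2**16 + 3, so consecutive values satisfy x_{k+2} = (6*x_{k+1} - 9*x_k) mod 2**31 and triples of outputs fall on relatively few planes. The sketch below checks that relation numerically; the function names are illustrative.

```python
M = 2**31


def randu_step(seed):
    """One step of SEED = (65539*SEED) mod 2**31."""
    return (65539 * seed) % M


def satisfies_relation(seed, steps=1000):
    """Check x_{k+2} == (6*x_{k+1} - 9*x_k) mod 2**31 along the sequence."""
    a, b = seed, randu_step(seed)
    for _ in range(steps):
        c = randu_step(b)
        if c != (6 * b - 9 * a) % M:
            return False
        a, b = b, c
    return True


x0 = 1
x1 = randu_step(x0)
x2 = randu_step(x1)
print(x1, x2, satisfies_relation(1), satisfies_relation(12345))
```

Any generator whose third value is a fixed linear combination of the previous two cannot fill three-dimensional space evenly.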

Tests: ANSI C rand()
    SEED = (a*SEED + c) mod 2**31 (the numeric constants are missing from the transcript)
    X = SEED
Returns an integer in the range [0, 2**31).
RATING:
    Fails 1-D above 500,000 bpd
    Fails 2-D above 600 bpd
    Fails 3-D above 80 bpd
    Fails 4-D above 21 bpd

Shuffling
A simple way to greatly improve any random number generator is to shuffle the output:
1. Start with an array of dimension around 100 (the exact size is not important).
2. Initialize the array by filling it with random numbers from your generator.
3. When the program wants a random number, randomly choose one from the array and output it to the program.
4. Replace the number chosen in the array with a new random number from the random number generator.
Note that this shuffling method uses two numbers from the random number generator for each random number output to the calling program.
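The shuffling procedure can be sketched as a small wrapper class. The class name `ShuffledGenerator` is invented for this example, and the base generator is assumed to be any callable returning floats in [0,1); here Python's built-in generator is used as a stand-in.

```python
import random


class ShuffledGenerator:
    """Shuffle the output of `base`, a callable returning floats in [0,1)."""

    def __init__(self, base, table_size=100):
        self.base = base
        # Fill the table with numbers from the base generator.
        self.table = [base() for _ in range(table_size)]

    def __call__(self):
        # First draw: choose which table entry to hand out.
        index = int(self.base() * len(self.table))
        value = self.table[index]
        # Second draw: refill the slot that was just used.
        self.table[index] = self.base()
        return value


base_rng = random.Random(0)
shuffled = ShuffledGenerator(base_rng.random, table_size=100)
print([round(shuffled(), 4) for _ in range(5)])
```

Each call consumes two base numbers, one to pick the slot and one to refill it, as the slide notes.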

Tests with shuffling: ANSI C rand()
        WITHOUT SHUFFLING          WITH SHUFFLING
1-D     Fails above 500,000 bpd    Fails above 400,000 bpd
2-D     Fails above 600 bpd        Fails above 3100 bpd
3-D     Fails above 80 bpd         Fails above 210 bpd
4-D     Fails above 21 bpd         Fails above 55 bpd

Lagged Fibonacci Generators
The name lagged Fibonacci generator (LFG) comes from the Fibonacci sequence 1, 1, 2, 3, 5, 8, ..., defined by $X_n = X_{n-1} + X_{n-2}$. LFGs generate random numbers from the following iterative scheme:
$$X_n = (X_{n-i} + X_{n-k}) \bmod m$$
The lags $i$ and $k$ satisfy $i > k > 0$, and $i$ initial values $X_0, X_1, \ldots, X_{i-1}$ are needed. For most applications $m$ is a power of 2, $m = 2^M$, and with a proper choice of $i$, $k$, and the first $i$ values of $X$, the period is $(2^i - 1)\,2^{M-1}$. One problem with LFGs is that $i$ words of memory must be kept current, whereas an LCG requires only that the last value of $X$ be saved.
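A minimal additive lagged Fibonacci generator might look like the sketch below. The function name is illustrative, the lag pair (17, 5) is a commonly cited choice rather than one taken from the slides, and seeding the lag table from Python's built-in generator is an assumption made for the example.

```python
import random
from collections import deque
from itertools import islice


def lagged_fibonacci(initial, k, m):
    """Yield X_n = (X_{n-i} + X_{n-k}) mod m, where i = len(initial) > k > 0."""
    i = len(initial)
    if not 0 < k < i:
        raise ValueError("lags must satisfy i > k > 0")
    state = deque(initial, maxlen=i)     # the last i values, oldest first
    while True:
        x = (state[0] + state[-k]) % m   # state[0] is X_{n-i}, state[-k] is X_{n-k}
        state.append(x)                  # the oldest value drops out automatically
        yield x


seed_rng = random.Random(2024)
table = [seed_rng.getrandbits(32) for _ in range(17)]
table[0] |= 1        # make sure at least one initial value is odd
gen = lagged_fibonacci(table, k=5, m=2**32)
print(list(islice(gen, 3)))
```

The deque holds exactly the i words of state that the slide mentions as the memory cost of an LFG. With lags (2, 1) and initial values 1, 1 the same function reproduces the ordinary Fibonacci sequence.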

Parallel Random Number Generators
A parallel random number generator should satisfy the following requirements:
- There should be no inter-processor correlation.
- The sequence generated on each processor should satisfy the quality requirements of serial random number generators.
- It should generate the same sequence for different numbers of processors.
- It should work for any number of processors.
- There should be no data movement between processors.

Sequence Splitting
A serial random number sequence is partitioned into non-overlapping contiguous sections. If there are N processors and the period of the serial sequence is P, then the first processor gets the first P/N random numbers, the second processor gets the second P/N random numbers, and so on. If the user happens to consume more random numbers than expected, the sequences could overlap. Another possible problem is that long-range correlations that exist in serial generators could become short-range inter-stream or inter-processor correlations in parallel generators.

Leapfrog
In this approach, the sequence of a serial generator is partitioned in turn among multiple processors, like a deck of cards dealt to card players. If there are N processors, each processor leapfrogs by N in the sequence. For example, processor i gets $X_i, X_{i+N}, X_{i+2N}$, etc. This again has the problem that long-range correlations in the original sequence can become short-range inter-stream correlations in the parallel generator.
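For a linear congruential generator, leapfrogging needs no communication: advancing N steps at once is itself an LCG with a different multiplier and increment. The sketch below illustrates this under the assumption that the serial generator is the LCG with multiplier 69069 and increment 1 mod 2**32; the function names and the `rank`/`nprocs` parameters are made up for the example.

```python
def lcg_stream(x0, a, c, m, count):
    """Return the first `count` values x_0, x_1, ... of x_{k+1} = (a*x_k + c) mod m."""
    out, x = [], x0
    for _ in range(count):
        out.append(x)
        x = (a * x + c) % m
    return out


def leapfrog_params(a, c, m, stride):
    """Multiplier and increment of the LCG that advances `stride` steps at once."""
    big_a, big_c = 1, 0
    for _ in range(stride):
        # Compose one more step: a*(A*x + C) + c = (a*A)*x + (a*C + c).
        big_a, big_c = (a * big_a) % m, (a * big_c + c) % m
    return big_a, big_c


def leapfrog_stream(rank, nprocs, x0, a, c, m, count):
    """Values X_rank, X_{rank+nprocs}, ... computed locally by one processor."""
    start = lcg_stream(x0, a, c, m, rank + 1)[rank]
    big_a, big_c = leapfrog_params(a, c, m, nprocs)
    return lcg_stream(start, big_a, big_c, m, count)


serial = lcg_stream(79, 69069, 1, 2**32, 12)
for rank in range(4):
    print(rank, leapfrog_stream(rank, 4, 79, 69069, 1, 2**32, 3) == serial[rank::4])
```

Each processor reproduces exactly its share of the serial sequence, which is why the result does not depend on how the work is scheduled.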

Splitting and Leapfrog
Both approaches result in non-scalable parallel random number generators: the number of different random numbers that can be used on each processor decreases as the number of processors is increased.

Parallel Random Number Generators: Parameterization
The parameterization method is one of the latest methods of generating parallel random numbers. The exact meaning of parameterization depends on the type of parallel random number generator. This method identifies a parameter in the underlying recursion of a serial random number generator that can be varied. Each valid value of this parameter leads to a recursion that produces a unique, full-period stream of random numbers.

Parallel LFG
Parallelize a lagged Fibonacci generator by running the same sequential generator on each processor, but with different initial lag tables. The initialization of the lag tables on each processor is a critical part of this algorithm: any correlations within a seed table or between different seed tables could have dire consequences. Since the initial seeds are chosen at random, there is no guarantee that the sequences generated on different processors will not overlap. However, using a large lag eliminates this problem for all practical purposes, since the period of these generators is so long.

Parallel Random Number Generators: SPRNG
The Scalable Parallel Random Number Generator (SPRNG) is a library containing several random number generators for serial and parallel computation, developed jointly by the University of Southern Mississippi and NCSA. It is callable from Fortran, C, and C++ programs and has been subjected to some of the largest random number tests (both statistical and physical). Its speed is also very competitive with the faster generators.

Parallel Random Number Generators: SPRNG
SPRNG contains the following random number generators:
- Modified Additive Lagged Fibonacci Generator (lfg)
- Multiplicative Lagged Fibonacci Generator (mlfg)
- 48-bit Linear Congruential Generator (lcg)
- 64-bit Linear Congruential Generator (lcg64)
- Combined Multiple Recursive Generator (cmrg)
- Prime Modulus Linear Congruential Generator (pmlcg) (this one is not automatically installed)

Hardware generators
A hardware (true) random number generator is a piece of electronics that plugs into a computer and produces genuine random numbers, as opposed to the pseudo-random numbers that are produced by a computer program. A typical method is to amplify noise generated by a resistor or a semiconductor diode and feed this to a comparator or Schmitt trigger. If you sample the output (not too quickly), you (hope to) get a series of bits which are statistically independent. These can be assembled into bytes, integers, or floating point numbers and then, if necessary, converted into random numbers from other distributions using suitable transformation methods.

The Marsaglia CD-ROM
George Marsaglia produced a CD-ROM containing 600 megabytes of random numbers. These were produced using the best pseudo-random number generators, but were then combined with bytes from a variety of random or semi-random sources (such as rap music). Suppose X and Y are independent random bytes (integer values 0 to 255), and at least one of them is uniformly distributed over the values 0 to 255. Then both the bitwise exclusive-or of X and Y, and X+Y mod 256, are uniformly distributed over 0 to 255. In addition, if both X and Y are approximately uniformly distributed, then the combination will be more closely uniformly distributed. In the Marsaglia CD-ROM the idea is to get the excellent properties of the pseudo-random number generator but to break up any remaining patterns with the random or semi-random sources.
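The claim about combining bytes can be verified exactly with rational arithmetic. In the sketch below the badly biased distribution for X is an arbitrary example chosen for illustration, and the function name `combine` is hypothetical.

```python
from fractions import Fraction


def combine(px, py, op):
    """Exact distribution of op(X, Y) for independent bytes X ~ px and Y ~ py."""
    out = [Fraction(0)] * 256
    for x, p in enumerate(px):
        if p == 0:
            continue
        for y, q in enumerate(py):
            out[op(x, y)] += p * q
    return out


uniform = [Fraction(1, 256)] * 256

# A strongly non-uniform byte: 0 with probability 3/4, 255 with probability 1/4.
biased = [Fraction(0)] * 256
biased[0] = Fraction(3, 4)
biased[255] = Fraction(1, 4)

xor_dist = combine(biased, uniform, lambda x, y: x ^ y)
add_dist = combine(biased, uniform, lambda x, y: (x + y) % 256)
print(xor_dist == uniform, add_dist == uniform)
```

Both combinations come out exactly uniform because, for any fixed x, the maps y -> x ^ y and y -> (x + y) mod 256 are permutations of 0..255.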

Transformations
We now have an idea of how to generate a uniform probability distribution, so that the probability of generating a number between $x$ and $x + dx$, denoted $p(x)\,dx$, is given by
$$p(x)\,dx = \begin{cases} dx & 0 < x < 1 \\ 0 & \text{otherwise} \end{cases}$$
Now suppose that we generate a uniform deviate $x$ and then take some prescribed function of it, $y(x)$. The probability distribution of $y$, denoted $p(y)\,dy$, is determined by the fundamental transformation law of probabilities, which is simply
$$|p(y)\,dy| = |p(x)\,dx| \qquad \text{or} \qquad p(y) = p(x) \left| \frac{dx}{dy} \right|$$

Exponential
As an example, suppose that $y(x) \equiv -\ln(x)$, and that $p(x)$ is as given for a uniform deviate. Then
$$p(y)\,dy = \left| \frac{dx}{dy} \right| dy = e^{-y}\,dy$$
which is distributed exponentially. This exponential distribution occurs frequently in real problems, usually as the distribution of waiting times between independent Poisson-random events, for example the radioactive decay of nuclei.
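A short numerical check of this transformation is sketched below. Python's built-in generator stands in for the uniform generator, and `exponential_deviate` is a made-up name.

```python
import math
import random


def exponential_deviate(rng):
    """Return y = -ln(x) for a uniform deviate x."""
    x = 1.0 - rng.random()      # lies in (0, 1], so log(0) cannot occur
    return -math.log(x)


rng = random.Random(42)
samples = [exponential_deviate(rng) for _ in range(100_000)]
mean = sum(samples) / len(samples)
print("sample mean:", mean)     # the density e^{-y} has mean 1
```

Using 1 - x instead of x does not change the distribution, since both are uniform on the unit interval.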

Gaussian
Another example is the Box-Muller method for generating random deviates with a normal (Gaussian) distribution,
$$p(y)\,dy = \frac{1}{\sqrt{2\pi}}\, e^{-y^2/2}\,dy$$

Gaussian
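The equations for this step are not preserved in the transcript. For reference, the Box-Muller transformation as it is usually stated (an assumption about what the slide showed) takes two uniform deviates $x_1, x_2$ on (0,1) and forms

$$y_1 = \sqrt{-2 \ln x_1}\, \cos 2\pi x_2, \qquad y_2 = \sqrt{-2 \ln x_1}\, \sin 2\pi x_2$$

Inverting these relations gives

$$x_1 = \exp\!\left[ -\tfrac{1}{2}\left( y_1^2 + y_2^2 \right) \right], \qquad x_2 = \frac{1}{2\pi} \arctan \frac{y_2}{y_1}$$

and the Jacobian determinant of the transformation is

$$\frac{\partial(x_1, x_2)}{\partial(y_1, y_2)} = -\left[ \frac{1}{\sqrt{2\pi}}\, e^{-y_1^2/2} \right] \left[ \frac{1}{\sqrt{2\pi}}\, e^{-y_2^2/2} \right]$$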

Gaussian
Since this is the product of a function of $y_2$ alone and a function of $y_1$ alone, each $y$ is independently distributed according to the normal distribution.

Gaussian
One further trick is useful: suppose that, instead of picking uniform deviates $x_1$ and $x_2$ in the unit square, we instead pick $v_1$ and $v_2$ as the ordinate and abscissa of a random point inside the unit circle around the origin. Then the sum of their squares, $R^2 \equiv v_1^2 + v_2^2$, is a uniform deviate, which can be used for $x_1$, while the angle that $(v_1, v_2)$ defines with respect to the $v_1$ axis can serve as the random angle $2\pi x_2$.
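The trick described above can be sketched as follows. Points are drawn in the square [-1,1] x [-1,1] and rejected until one falls inside the unit circle, so no sine or cosine has to be evaluated. The function name `gaussian_pair` is illustrative and Python's built-in generator stands in for the uniform source.

```python
import math
import random


def gaussian_pair(rng):
    """Return two independent normal deviates from a point in the unit circle."""
    while True:
        v1 = 2.0 * rng.random() - 1.0
        v2 = 2.0 * rng.random() - 1.0
        r2 = v1 * v1 + v2 * v2              # R^2 = v1^2 + v2^2
        if 0.0 < r2 < 1.0:                  # keep only points inside the circle
            break
    # r2 plays the role of x1; v1/sqrt(r2) and v2/sqrt(r2) replace cos and sin.
    factor = math.sqrt(-2.0 * math.log(r2) / r2)
    return v1 * factor, v2 * factor


rng = random.Random(7)
values = [y for _ in range(50_000) for y in gaussian_pair(rng)]
mean = sum(values) / len(values)
variance = sum((y - mean) ** 2 for y in values) / len(values)
print("mean:", mean, "variance:", variance)
```

The sample mean and variance should come out close to 0 and 1 respectively.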

Finding Pi
Choose a random point in the unit square by finding two random numbers $(x_1, x_2)$. If this point lies inside the unit circle, consider the choice a "hit"; if not, a "miss". The ratio of hits to the total number of points chosen, $N$, estimates the area of the part of the unit circle that lies in the unit square, which is $\pi/4$. Write an OpenMP code to do this calculation.

Finding Pi
Does it matter (e.g. for timing) whether each thread calculates its own random sequence or a master thread dishes out the random points? How does the error scale with $N$?
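As a starting point for the exercise, here is a serial Python sketch of the hit-or-miss calculation. It is not the OpenMP code the exercise asks for; in an OpenMP version the loop would typically become a parallel loop with a reduction on the hit counter and one generator per thread. The function name `estimate_pi` is made up.

```python
import random


def estimate_pi(n, rng):
    """Estimate pi from n random points in the unit square."""
    hits = 0
    for _ in range(n):
        x1, x2 = rng.random(), rng.random()
        if x1 * x1 + x2 * x2 < 1.0:     # inside the unit circle: a hit
            hits += 1
    return 4.0 * hits / n               # hits/n estimates pi/4


for n in (1_000, 10_000, 100_000):
    print(n, estimate_pi(n, random.Random(n)))
```

Because each point is an independent hit-or-miss trial, the statistical error of the estimate is expected to shrink roughly like 1/sqrt(N).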