Random-Number Generation
Andy Wang
CIS 5930-03 Computer Systems Performance Analysis

Generate Random Values
- Two steps
  - Random-number generation: get a sequence of random numbers distributed uniformly between 0 and 1
  - Random-variate generation: transform the sequence to produce random values satisfying the desired distribution

Background
- The most common method: use a recursive function $x_n = f(x_{n-1}, x_{n-2}, \ldots)$

Example
- $x_n = (5x_{n-1} + 1) \bmod 16$
- Suppose $x_0 = 5$
- The first 32 numbers are integers between 0 and 15
- Divide $x_n$ by 16 to get numbers in $[0, 1)$

Basic Terms
- $x_0$ = seed
- Given the function, the entire sequence can be regenerated from $x_0$
- Generated numbers are pseudo-random
  - Deterministic
  - Can pass statistical tests for randomness
  - Preferred to fully random numbers so that simulation results can be repeated

Cycle Length
- Note that starting with the 17th number, the sequence repeats
- Cycle length of 16
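
A minimal Python sketch of this example (the helper names `lcg` and `cycle_length` are our own, not from the source). It regenerates the sequence from the seed and finds the cycle length by remembering the step at which each value first appeared, so the same helper also separates a tail from the cycle for generators that have one.

```python
def lcg(a: int, b: int, m: int, seed: int, count: int) -> list[int]:
    """Return x_1, ..., x_count of x_n = (a * x_{n-1} + b) mod m."""
    values = []
    x = seed
    for _ in range(count):
        x = (a * x + b) % m
        values.append(x)
    return values


def cycle_length(a: int, b: int, m: int, seed: int) -> int:
    """Length of the cycle the sequence eventually enters (any tail excluded)."""
    first_seen = {}          # value -> step at which it first appeared
    x, step = seed, 0
    while x not in first_seen:
        first_seen[x] = step
        x = (a * x + b) % m
        step += 1
    return step - first_seen[x]


# The slides' example: x_n = (5 x_{n-1} + 1) mod 16 with seed x_0 = 5.
print(lcg(5, 1, 16, seed=5, count=17))
# [10, 3, 0, 1, 6, 15, 12, 13, 2, 11, 8, 9, 14, 7, 4, 5, 10]
print(cycle_length(5, 1, 16, seed=5))  # 16
```

The 17th value equals the 1st, which is the repetition the slide describes.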

More Terms
- Some generators do not repeat the initial part (tail) of the sequence
- Period of a generator = tail length + cycle length

Question
- How should we choose seeds and random-number generation functions?
  - Efficiently computable, since they are heavily used in simulations
  - The period should be large
  - Successive values should be independent and uniformly distributed

Types of Random-Number Generators
- Linear-congruential generators
- Tausworthe generators
- Extended Fibonacci generators
- Combined generators
- Others

Linear-Congruential Generators
- In 1951, Lehmer found that residues of successive powers of a number have good randomness properties:
  $x_n = a^n \bmod m = (a \cdot a^{n-1}) \bmod m = a x_{n-1} \bmod m$
- Lehmer's choices of $a$ and $m$:
  - $a = 23$ (multiplier)
  - $m = 10^8 + 1$ (modulus)
- Implemented on ENIAC
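
As a quick check of the identity $a^n \bmod m = a x_{n-1} \bmod m$, this sketch computes Lehmer's sequence both step by step and with Python's built-in three-argument `pow`. It assumes the seed $x_0 = 1$, which the closed form needs; the function names are hypothetical.

```python
A = 23            # Lehmer's multiplier
M = 10**8 + 1     # Lehmer's modulus


def lehmer_iterative(seed: int, count: int) -> list[int]:
    """x_n = a * x_{n-1} mod m, one step at a time."""
    xs, x = [], seed
    for _ in range(count):
        x = (A * x) % M
        xs.append(x)
    return xs


def lehmer_closed_form(n: int) -> int:
    """x_n = a^n mod m, valid when the seed x_0 is 1."""
    return pow(A, n, M)


iterative = lehmer_iterative(seed=1, count=10)
closed = [lehmer_closed_form(n) for n in range(1, 11)]
print(iterative == closed)  # True
```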

(Mixed) Linear-Congruential Generators (LCG)
- $x_n = (a x_{n-1} + b) \bmod m$
- $x_n$ is between 0 and $m - 1$
- $a$ and $b$ are non-negative integers
- "Mixed": uses both multiplication by $a$ and addition of $b$

The Choice of a, b, and m
- $m$ should be large
  - The period is never longer than $m$
- To compute $\bmod m$ efficiently
  - Make $m = 2^k$
  - Just keep the lowest $k$ bits of the result

The Choice of a, b, and m
- If $b > 0$, the maximum period $m$ is obtained when
  - $m = 2^k$
  - $a = 4c + 1$
  - $b$ is odd
  - $c$, $b$, and $k$ are positive integers
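
The following brute-force sketch measures the period from seed 0 and compares it with the conditions above. The helper names are hypothetical, and $m = 2^4$ is chosen only to keep the search small.

```python
def period(a: int, b: int, m: int, seed: int = 0) -> int:
    """Cycle length of x_n = (a * x_{n-1} + b) mod m, found by brute force."""
    first_seen = {}
    x, step = seed, 0
    while x not in first_seen:
        first_seen[x] = step
        x = (a * x + b) % m
        step += 1
    return step - first_seen[x]


def meets_slide_conditions(a: int, b: int) -> bool:
    """a = 4c + 1 with c a positive integer, and b odd (so b > 0)."""
    return a >= 5 and a % 4 == 1 and b % 2 == 1 and b > 0


m = 2 ** 4
for a, b in [(5, 3), (13, 7), (3, 1), (5, 2)]:
    print(a, b, meets_slide_conditions(a, b), period(a, b, m))
# 5 3 True 16
# 13 7 True 16
# 3 1 False 8
# 5 2 False 8
```

Both generators that meet the conditions reach the full period of 16; breaking either condition ($a \not\equiv 1 \pmod 4$ or even $b$) halves it here.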

Full-Period Generators
- Generators with the maximum possible period
- Not equally good
  - Look for low autocorrelation between successive numbers
  - $x_n = ((2^{34} + 1) x_{n-1} + 1) \bmod 2^{35}$ has an autocorrelation of 0.25
  - $x_n = ((2^{18} + 1) x_{n-1} + 1) \bmod 2^{35}$ has an autocorrelation of $2^{-18}$

Multiplicative LCG
- $x_n = a x_{n-1} \bmod m$, i.e., $b = 0$
- Can be computed more efficiently when $m = 2^k$
- However, the maximum period is only $2^{k-2}$
- Problem: cyclic patterns in the lower bits

Multiplicative LCG with $m = 2^k$
- When $a = 8i \pm 3$ (and the seed is odd)
  - E.g., $x_n = 5x_{n-1} \bmod 2^5$
  - Period is only 8, which is 1/4 of $2^5$
- When $a \neq 8i \pm 3$
  - E.g., $x_n = 7x_{n-1} \bmod 2^5$
  - Period is only 4
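
A brute-force sketch that reproduces both periods (the function name is our own). The third call uses an even seed to show that the $2^{k-2}$ maximum also depends on starting from an odd seed.

```python
def multiplicative_period(a: int, m: int, seed: int) -> int:
    """Cycle length of x_n = a * x_{n-1} mod m starting from `seed`."""
    first_seen = {}
    x, step = seed, 0
    while x not in first_seen:
        first_seen[x] = step
        x = (a * x) % m
        step += 1
    return step - first_seen[x]


m = 2 ** 5
print(multiplicative_period(5, m, seed=1))  # 8  (5 = 8*1 - 3)
print(multiplicative_period(7, m, seed=1))  # 4  (7 is not of the form 8i +/- 3)
print(multiplicative_period(5, m, seed=2))  # 4  (even seed: 2, 10, 18, 26, 2, ...)
```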

Multiplicative LCG with $m \neq 2^k$
- To get a longer period, use a prime modulus $m$
- With a proper choice of $a$, a period of $m - 1$ is possible
  - $a$ needs to be a primitive root of $m$
  - $a$ is a primitive root of $m$ if and only if $a^n \bmod m \neq 1$ for $n = 1, \ldots, m - 2$

Multiplicative LCG with $m \neq 2^k$
- $x_n = 3x_{n-1} \bmod 31$, $x_0 = 1$
- Period is 30
- 3 is a primitive root of 31
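
A direct sketch of the primitive-root test as defined above, assuming $m$ is prime (function names are hypothetical). For contrast, 2 is not a primitive root of 31, because $2^5 = 32 \equiv 1 \pmod{31}$.

```python
def is_primitive_root(a: int, m: int) -> bool:
    """True if a^n mod m != 1 for n = 1, ..., m - 2 (m assumed prime)."""
    x = 1
    for _ in range(1, m - 1):
        x = (x * a) % m
        if x == 1:
            return False
    return True


def multiplicative_period(a: int, m: int, seed: int) -> int:
    """Steps until x_n = a * x_{n-1} mod m returns to a nonzero seed."""
    x, n = (a * seed) % m, 1
    while x != seed:
        x = (x * a) % m
        n += 1
    return n


print(is_primitive_root(3, 31), multiplicative_period(3, 31, seed=1))  # True 30
print(is_primitive_root(2, 31), multiplicative_period(2, 31, seed=1))  # False 5
```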

Multiplicative LCG with $m \neq 2^k$
- $x_n = 7^5 x_{n-1} \bmod (2^{31} - 1)$
  - $7^5$ is a primitive root of $2^{31} - 1$
- But watch out for computational errors
  - Multiplication overflow: apply the tricks mentioned on p. 442
  - Truncation due to the number of digits available
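
One well-known way to avoid the overflow is Schrage's method; we have not confirmed that it is the exact trick described on p. 442. It writes $m = aq + r$ and, when $r < q$, computes $a x \bmod m$ as $a(x \bmod q) - r\lfloor x/q \rfloor$, adding $m$ if the result is negative, so no intermediate value exceeds $m$ in magnitude. Python integers never overflow, so this sketch only mirrors the arithmetic a 32-bit implementation would perform and checks it against exact big-integer results.

```python
A = 16807            # 7**5
M = 2**31 - 1        # 2147483647, a prime
Q, R = divmod(M, A)  # M = A*Q + R with Q = 127773, R = 2836 (R < Q)


def schrage_step(x: int) -> int:
    """Compute A*x mod M for 0 < x < M without any intermediate exceeding M."""
    hi, lo = divmod(x, Q)      # x = hi*Q + lo
    t = A * lo - R * hi        # each product is below M
    return t if t > 0 else t + M


x = 1
for _ in range(10_000):
    x = schrage_step(x)
print(x == pow(A, 10_000, M))  # True: matches exact big-integer arithmetic
```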

Tausworthe Generators
- How can we generate large random numbers?
- The Tausworthe generator produces a random sequence of binary digits
- The generator then divides the sequence into strings of the desired length
- Based on a characteristic polynomial

Tausworthe Example
- Suppose we use the characteristic polynomial $x^7 + x^3 + 1$
- The corresponding generation function is $b_{n+7} \oplus b_{n+3} \oplus b_n = 0$, or equivalently $b_n = b_{n-4} \oplus b_{n-7}$
- Needs a 7-bit seed

Tausworthe Example
- The bit stream: 1111111000011101111001011001…
- Convert to random numbers between 0 and 1, using 8-bit numbers:
  - $x_0 = 0.11111110_2 = 0.99219_{10}$
  - $x_1 = 0.00011101_2 = 0.11328_{10}$
  - $x_2 = 0.11100101_2 = 0.89453_{10}$
  - …
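
A sketch that reproduces this bit stream and the three fractions, assuming the seed is seven 1 bits (what the first seven bits of the stream show). Bits are kept in a Python list for clarity rather than speed, and the function names are hypothetical.

```python
def tausworthe_bits(seed_bits: list[int], count: int) -> list[int]:
    """Bits from b_n = b_{n-4} XOR b_{n-7} (polynomial x^7 + x^3 + 1)."""
    bits = list(seed_bits)                    # b_1 .. b_7
    while len(bits) < count:
        bits.append(bits[-4] ^ bits[-7])      # b_n = b_{n-4} xor b_{n-7}
    return bits[:count]


def to_fractions(bits: list[int], width: int) -> list[float]:
    """Split the stream into width-bit words and read each as 0.b1 b2 ... (base 2)."""
    words = [bits[i:i + width] for i in range(0, len(bits) - width + 1, width)]
    return [int("".join(map(str, w)), 2) / 2 ** width for w in words]


stream = tausworthe_bits([1] * 7, 28)
print("".join(map(str, stream)))  # 1111111000011101111001011001
print(to_fractions(stream, 8))    # [0.9921875, 0.11328125, 0.89453125]
```

The exact values 254/256, 29/256, and 229/256 round to the slide's 0.99219, 0.11328, and 0.89453.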

Tausworthe Generator Characteristics
+ For the L-bit numbers generated, $E[x_n] = 1/2$ and $V[x_n] = 1/12$
+ The serial correlation is zero
+ Good results over the complete cycle
- Poor local behavior within a sequence

Tausworthe Example
- If a characteristic polynomial of order $q$ has a period of $2^q - 1$, it is a primitive polynomial
- For $x^7 + x^3 + 1$:
  - $q = 7$
  - The sequence repeats after $127 = 2^7 - 1$ bits
  - A primitive polynomial
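
A sketch that counts how many steps the $q$-bit state of a trinomial $x^q + x^r + 1$ takes to return to its seed, using the same convention as the example ($b_{n+q} \oplus b_{n+r} \oplus b_n = 0$). The function name is hypothetical. As a contrast we assume $x^4 + x^2 + 1$, which factors as $(x^2 + x + 1)^2$ over GF(2) and so is not primitive.

```python
def tausworthe_period(q: int, r: int, seed: tuple[int, ...]) -> int:
    """Steps until the q-bit state repeats for the polynomial x^q + x^r + 1.

    b_{n+q} xor b_{n+r} xor b_n = 0 gives b_n = b_{n-(q-r)} xor b_{n-q}.
    """
    if len(seed) != q or not any(seed):
        raise ValueError("seed must be q bits, not all zero")
    start = tuple(seed)
    state = start                             # (b_{n-q}, ..., b_{n-1})
    steps = 0
    while True:
        new_bit = state[-(q - r)] ^ state[0]  # b_{n-(q-r)} xor b_{n-q}
        state = state[1:] + (new_bit,)
        steps += 1
        if state == start:
            return steps


print(tausworthe_period(7, 3, (1,) * 7))  # 127 = 2**7 - 1: primitive
print(tausworthe_period(4, 2, (1,) * 4))  # 6 < 2**4 - 1: not primitive
```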

Tausworthe Implementation
- Can easily be generated with linear-feedback shift registers
- For $x^5 + x^3 + 1$, i.e., $b_n = b_{n-2} \oplus b_{n-5}$
[Figure: shift register with cells $b_n, b_{n-1}, \ldots, b_{n-5}$]
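
A software model of such a shift register, sketched under two assumptions of our own: the register is a 5-bit Python integer whose bit 0 holds the newest bit $b_{n-1}$, and the taps follow $b_n = b_{n-2} \oplus b_{n-5}$, derived from $x^5 + x^3 + 1$ with the same convention as the $x^7 + x^3 + 1$ example. Each step is one XOR, one shift, and one mask.

```python
def lfsr_x5_x3_1(state: int, count: int) -> list[int]:
    """Output bits of b_n = b_{n-2} XOR b_{n-5} from a 5-bit integer register.

    Bit i of `state` (i = 0..4) holds b_{n-1-i}; bit 0 is the newest bit.
    """
    out = []
    for _ in range(count):
        b_n = ((state >> 1) ^ (state >> 4)) & 1    # b_{n-2} xor b_{n-5}
        state = ((state << 1) | b_n) & 0b11111     # shift in b_n, drop b_{n-5}
        out.append(b_n)
    return out


bits = lfsr_x5_x3_1(0b11111, 62)
print(bits[:31] == bits[31:])  # True: period 31 = 2**5 - 1
print(sum(bits[:31]))          # 16 ones per period
```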

Extended Fibonacci Generators
- $x_n = (x_{n-1} + x_{n-2}) \bmod m$
  - Does not have good randomness properties
  - High serial correlation
- An extension: $x_n = (x_{n-5} + x_{n-17}) \bmod 2^k$
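
A sketch of the extension $x_n = (x_{n-5} + x_{n-17}) \bmod 2^k$. It needs 17 starting values; filling them from a small LCG with multiplier 69069 is an arbitrary, hypothetical choice made only for this illustration. A `deque` with `maxlen=17` holds exactly the lags the recurrence reads.

```python
from collections import deque


def lagged_fibonacci(seeds: list[int], k: int, count: int) -> list[int]:
    """x_n = (x_{n-5} + x_{n-17}) mod 2^k, given x_{n-17}, ..., x_{n-1} as seeds."""
    if len(seeds) != 17:
        raise ValueError("need exactly 17 seed values")
    mask = (1 << k) - 1                  # mod 2^k keeps the low k bits
    history = deque(seeds, maxlen=17)    # history[0] = x_{n-17}, history[-1] = x_{n-1}
    out = []
    for _ in range(count):
        x = (history[-5] + history[0]) & mask
        history.append(x)                # maxlen drops the oldest value
        out.append(x)
    return out


# Hypothetical seeding: 17 values from a small LCG (an arbitrary choice).
seeds, s = [], 1
for _ in range(17):
    s = (69069 * s + 1) % 2**32
    seeds.append(s)
print(lagged_fibonacci(seeds, k=32, count=5))
```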

Combined Generators
- Add (or subtract) the random numbers produced by two or more generators
- Can considerably increase the period and randomness
- $x_n = 40014 x_{n-1} \bmod 2147483563$
- $y_n = 40692 y_{n-1} \bmod 2147483399$
- $w_n = (x_n - y_n) \bmod 2147483562$
- This generator has a period of $2.3 \times 10^{18}$
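
A direct sketch of this combined generator (the function name and the seeds $x_0 = y_0 = 1$ are our own choices). Python's `%` always returns a non-negative result for a positive modulus, so $x_n - y_n < 0$ needs no special case; in C the remainder of a negative number is negative and would need an explicit correction.

```python
def combined_two(seed_x: int, seed_y: int, count: int) -> list[int]:
    """w_n = (x_n - y_n) mod 2147483562, combining two multiplicative LCGs."""
    x, y = seed_x, seed_y
    out = []
    for _ in range(count):
        x = (40014 * x) % 2147483563
        y = (40692 * y) % 2147483399
        out.append((x - y) % 2147483562)   # non-negative even when x < y
    return out


print(combined_two(1, 1, 2))  # [2147482884, 2092764894]
```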

Combined Generators
- $w_n = 157 w_{n-1} \bmod 32363$
- $x_n = 146 x_{n-1} \bmod 31727$
- $y_n = 142 y_{n-1} \bmod 31657$
- $v_n = (w_n - x_n + y_n) \bmod 32362$
- This generator has a period of $8.1 \times 10^{12}$
- Can avoid the multiplication overflow problem
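
A sketch of the three-generator version (function name and seeds are our own). The overflow claim can be checked directly: each component multiplies a value smaller than its modulus by a small multiplier, so the largest possible product is about $5 \times 10^6$, far below $2^{31}$.

```python
def combined_three(w: int, x: int, y: int, count: int) -> list[int]:
    """v_n = (w_n - x_n + y_n) mod 32362 from three small multiplicative LCGs."""
    out = []
    for _ in range(count):
        w = (157 * w) % 32363
        x = (146 * x) % 31727
        y = (142 * y) % 31657
        out.append((w - x + y) % 32362)
    return out


# Largest product any component can form: well within a signed 32-bit int.
largest = max(157 * 32362, 146 * 31726, 142 * 31656)
print(largest, largest < 2**31)   # 5080834 True
print(combined_three(1, 1, 1, 2))  # [153, 23497]
```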

Combined Generators
- XOR the random numbers produced by two or more generators
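
A sketch of XOR combination using two 16-bit mixed LCGs. Both parameter sets are arbitrary illustrations, not recommended generators, and the names are hypothetical. One caveat: XOR does not repair weak low-order bits. The lowest bit of each component has period at most 2, so the lowest bit of their XOR does too.

```python
from typing import Iterator


def lcg16(a: int, b: int, seed: int) -> Iterator[int]:
    """Endless stream from x_n = (a * x_{n-1} + b) mod 2^16."""
    x = seed
    while True:
        x = (a * x + b) & 0xFFFF      # & 0xFFFF is mod 2^16
        yield x


def xor_combined(stream1: Iterator[int], stream2: Iterator[int], count: int) -> list[int]:
    """Combine two streams by XOR-ing corresponding outputs bit by bit."""
    return [next(stream1) ^ next(stream2) for _ in range(count)]


print(xor_combined(lcg16(25173, 13849, seed=1), lcg16(3533, 1, seed=7), 5))
```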

Combined Generators
- Shuffle
  - Use one sequence as an index into an array filled with random numbers generated by the second sequence
  - The chosen number is replaced by a new random number from the second sequence
- Problem: cannot skip directly to the nth random number
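
A sketch of the shuffle, with hypothetical names and an arbitrary pairing of multipliers 16807 (index stream) and 48271 (value stream) modulo $2^{31} - 1$. Taking the index stream modulo the table size is a simplifying assumption that introduces a slight bias. The sketch also shows why skipping ahead is impossible: each output depends on the whole history of table refills.

```python
from typing import Iterator


def mult_lcg(a: int, m: int, seed: int) -> Iterator[int]:
    """Endless stream from x_n = a * x_{n-1} mod m."""
    x = seed
    while True:
        x = (a * x) % m
        yield x


def shuffled(index_stream: Iterator[int], value_stream: Iterator[int],
             table_size: int, count: int) -> list[int]:
    """Emit value_stream's numbers in an order chosen by index_stream."""
    table = [next(value_stream) for _ in range(table_size)]  # fill from sequence 2
    out = []
    for _ in range(count):
        slot = next(index_stream) % table_size   # sequence 1 picks a slot
        out.append(table[slot])                  # emit the chosen number...
        table[slot] = next(value_stream)         # ...and refill it from sequence 2
    return out


M = 2**31 - 1
print(shuffled(mult_lcg(16807, M, 1), mult_lcg(48271, M, 1), table_size=8, count=5))
```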

A Survey of Random-Number Generators
- Some published generator functions:
  - $x_n = 7^5 x_{n-1} \bmod (2^{31} - 1)$
    - Full period of $2^{31} - 2$
    - Low-order bits are randomly distributed
  - Many others (see textbook)
- All have problems
- General lessons: use established generators; do not invent your own

Seed Selection
- If the generator has a full period and only one random variable is required, any seed value is good
- With more than one random variable, the story is different for multistream simulations
  - E.g., random arrival and service times
  - Should use two streams of random numbers

Seed Selection Guidelines
- Do not use zero
  - Not good for multiplicative LCGs and Tausworthe generators, which stay at zero forever
- Avoid even values
  - Not good if a generator does not have a full period
- Do not use one stream for all variables
  - May yield strong correlations among variables
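
Seed Selection Guidelines
- Use nonoverlapping streams
  - Each stream requires a separate seed
  - Otherwise, a long interarrival time may correlate with a long service time
- Suppose we need 10,000 random numbers for interarrival times and 10,000 for service times
  - Seed the interarrival stream with $x_0 = 1$ and the service stream with $x_{10000}$, the 10,001st number of the same sequence
  - $x_n$ can be computed directly: $x_n = \left[a^n x_0 + b\,\frac{a^n - 1}{a - 1}\right] \bmod m$
  - For multiplicative LCGs, $b = 0$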
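
A sketch of the skip-ahead formula (function names are hypothetical). The division $(a^n - 1)/(a - 1)$ is exact over the integers but not directly computable modulo $m$, so the sketch computes $a^n$ modulo $(a - 1)m$ first; the exact quotient of $t - 1$ by $a - 1$ is then congruent to $(a^n - 1)/(a - 1)$ modulo $m$. The $a = 1$ case, where the geometric sum is just $n$, is handled separately.

```python
def skip_ahead(x0: int, n: int, a: int, b: int, m: int) -> int:
    """x_n = [a^n x_0 + b (a^n - 1)/(a - 1)] mod m without generating x_1..x_{n-1}."""
    if a == 1:
        return (x0 + b * n) % m
    t = pow(a, n, (a - 1) * m)       # a^n reduced modulo (a - 1) * m
    geometric = (t - 1) // (a - 1)   # congruent to (a^n - 1)/(a - 1) mod m
    return (pow(a, n, m) * x0 + b * geometric) % m


def iterate(x0: int, n: int, a: int, b: int, m: int) -> int:
    """Reference: apply x <- (a*x + b) mod m n times."""
    x = x0
    for _ in range(n):
        x = (a * x + b) % m
    return x


M = 2**31 - 1
service_seed = skip_ahead(1, 10_000, 16807, 0, M)    # multiplicative: b = 0
print(service_seed == iterate(1, 10_000, 16807, 0, M))  # True
print(skip_ahead(5, 3, 5, 1, 16))  # 0, i.e., x_3 of (5x + 1) mod 16 from x_0 = 5
```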

Seed Selection Guidelines
- Do not reuse seeds in successive simulation runs
  - There is no point in running a simulation again with the same seed
  - Just continue with the last random number of one run as the seed for the next run

Seed Selection Guidelines
- Do not use random seeds for random-number generators
  - E.g., do not use the time of day or /dev/random to seed simulations
  - Simulations should be repeatable
  - Cannot guarantee that multiple streams will not overlap
- Do not use numbers generated by random-number generators as seeds

Myths About Random-Number Generation
- Myth: a complex set of operations leads to random results
  - Hard to guess does not mean random
- Myth: random numbers are not predictable
  - Given a few successive numbers from an LCG, one can solve for $a$, $b$, and $m$
  - LCGs are not suitable for cryptographic applications

Myths About Random-Number Generation
- Myth: some seeds are better than others
  - This may be true for some generators
  - Avoid generators whose period and randomness depend on the seed
- Myth: accurate implementation is not important
  - Watch out for overflows and truncations

Myths About Random-Number Generation
- Myth: the bits of successive words generated by a random-number generator are equally randomly distributed
  - Nope

Myths About Random-Number Generation
- $x_n = 25173 x_{n-1} \bmod 2^{16}$, $x_0 = 1$
  - Least significant bit is always 1
  - Bit 2 is always 0
  - Bit 3 has a cycle of 2
  - Bit 4 has a cycle of 4
  - Bit 5 has a cycle of 8

  n   Decimal   Binary
  1   25173     01100010 01010101
  2   12345     00110000 00111001
  3   54509     11010100 11101101
  4   27825     01101100 10110001
  5   55493     11011000 11000101
  6   25449     01100011 01101001
  7   13277     00110011 11011101

Myths About Random-Number Generation
- For LCGs with modulus $m = 2^k$, the $L$th bit has a period of at most $2^L$
- For multiplicative LCGs of the form $x_n = a x_{n-1} \bmod 2^k$
  - The least significant bit is either always 0 or always 1
  - High-order bits are more random
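
A sketch that regenerates the bit table and measures the period of each low-order bit (function names are hypothetical; bit 1 is the least significant bit). The table values are reproduced by the multiplicative generator $x_n = 25173 x_{n-1} \bmod 2^{16}$ with $x_0 = 1$; 64 samples are plenty to expose periods of at most 8.

```python
def mult_lcg_values(a: int, k: int, seed: int, count: int) -> list[int]:
    """x_1, ..., x_count of x_n = a * x_{n-1} mod 2^k."""
    xs, x = [], seed
    for _ in range(count):
        x = (a * x) % (1 << k)
        xs.append(x)
    return xs


def bit_period(column: list[int]) -> int:
    """Smallest p such that the 0/1 sequence agrees with itself shifted by p."""
    n = len(column)
    for p in range(1, n):
        if all(column[j] == column[j + p] for j in range(n - p)):
            return p
    return n


xs = mult_lcg_values(25173, 16, seed=1, count=64)
print(xs[:7])  # [25173, 12345, 54509, 27825, 55493, 25449, 13277]
print([bit_period([(x >> i) & 1 for x in xs]) for i in range(5)])  # [1, 1, 2, 4, 8]
```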

More on Random-Number Generation
- Mersenne Twister
  - Period of $2^{19937} - 1$
- /dev/random
  - Extracts randomness from physical devices
  - Truly random