Validating a Random Number Generator


Based on: "A Test of Randomness Based on the Consecutive Distance Between Random Number Pairs"
By: Matthew J. Duggan, John H. Drew, Lawrence M. Leemis
Presented by: Sarah Daugherty, MSIM 852, Fall 2007

Introduction
Random numbers are critical to Monte Carlo simulation, discrete event simulation, and bootstrapping.
There is a need for random number generators (RNGs) with good statistical properties.
One of the most popular methods for generating random numbers in a computer program is a Lehmer RNG.

Lehmer Random Number Generators
Lehmer’s algorithm: an iterative equation, x_{i+1} = a x_i \bmod m, produces a stream of random numbers.
Requires 3 inputs: m, a, and x0.
- m = modulus, a large fixed prime number
- a = multiplier, a fixed positive integer < m
- x0 = initial seed, a positive integer < m
Produces integers in the range 1 to m-1, inclusive.
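A minimal sketch of the iteration described on this slide, using the standard multiplicative form x_{i+1} = a x_i mod m. The function name `lehmer` and the seed value 1 are illustrative assumptions; m = 401 and a = 23 are the values used later in the presentation.

```python
from itertools import islice

def lehmer(m, a, seed):
    """Yield x_1, x_2, ... of the Lehmer sequence x_{i+1} = (a * x_i) mod m.

    `lehmer` is a hypothetical helper name, not taken from the paper.
    """
    x = seed
    while True:
        x = (a * x) % m
        yield x

# Small example with the modulus and multiplier used in the slides;
# the seed x0 = 1 is an assumption made for illustration.
first_five = list(islice(lehmer(401, 23, 1), 5))
print(first_five)                        # integers between 1 and m-1
print([x / 401 for x in first_five])     # scaled to the interval (0, 1)
```

Dividing each integer by m is the usual way to map the stream onto (0, 1) before using it as a uniform variate.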

Problem
Lehmer RNGs are not truly random.
With carefully chosen m and a, it is possible to generate output that is “random enough” from a statistical point of view.
They are still considered good generators because their output can be replicated and they are portable, efficient, and thoroughly documented.
However, Marsaglia (1968) discovered too much regularity in Lehmer RNGs.

Marsaglia’s Discovery
He observed a lattice structure when consecutive random numbers were plotted as overlapping ordered pairs: (x0, x1), (x1, x2), …, (xn, xn+1).
The lattice shown was created using m = 401, a = 23.
It does not appear to be random at all, but a degree of randomness may be hidden in it.
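The lattice can be seen numerically without a plot. In this sketch (the helper name `lehmer_stream` and the seed 1 are assumptions), every overlapping pair (x, y) satisfies y = a*x - k*m for an integer k between 0 and a-1, so all points fall on at most a parallel lines.

```python
def lehmer_stream(m, a, seed, n):
    """Return [x0, x1, ..., xn] for x_{i+1} = (a * x_i) mod m (hypothetical helper)."""
    xs = [seed]
    for _ in range(n):
        xs.append((a * xs[-1]) % m)
    return xs

m, a = 401, 23                       # values from the slide
xs = lehmer_stream(m, a, seed=1, n=400)

# Overlapping ordered pairs (x_i, x_{i+1})
pairs = list(zip(xs, xs[1:]))

# Each point lies on the line y = a*x - k*m; collect the distinct k values.
line_index = {(a * x - y) // m for x, y in pairs}
print(len(pairs), "points fall on", len(line_index), "parallel lines")
```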

Solution
Find the hidden randomness in the order in which the points are generated.
The observed distribution of the distance between consecutive random number pairs should be close to the theoretical distribution of that distance.
Develop a test based on these distances.
The hope is to observe that points generally are not generated in order along a plane or in a regular pattern between planes.
The underlying assumption is that generators whose distances are distributed like those of purely random points are better random number generators.

Overlapping vs. Non-overlapping Pairs
When considering the distance between consecutive pairs of random numbers, the pairs can be overlapping or non-overlapping.
- Overlapping: (xi, xi+1), (xi+1, xi+2)
- Non-overlapping: (xi, xi+1), (xi+2, xi+3)
Both approaches are valid. The non-overlapping case is mathematically easier: the 4 numbers involved are independent, so the 2 points they form are also independent.
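The two pairing schemes can be sketched as follows. The function name `consecutive_distances` is hypothetical, and measuring the distance between each point and the next one in the sequence is an assumption about how the distances are collected; the slides do not spell this out.

```python
import math

def consecutive_distances(u, overlapping):
    """Euclidean distances between successive points built from the stream u.

    overlapping=True  -> points (u0, u1), (u1, u2), (u2, u3), ...
    overlapping=False -> points (u0, u1), (u2, u3), (u4, u5), ...
    """
    step = 1 if overlapping else 2
    points = [(u[i], u[i + 1]) for i in range(0, len(u) - 1, step)]
    return [math.dist(p, q) for p, q in zip(points, points[1:])]

u = [0.0, 0.0, 1.0, 1.0, 0.0, 0.0]
print(consecutive_distances(u, overlapping=True))
print(consecutive_distances(u, overlapping=False))
```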

Non-overlapping Theoretical Distribution
If we assume X1, X2, X3, X4 are IID U(0,1) random variables, the distance D between (X1, X2) and (X3, X4) is the Euclidean distance:
D = \sqrt{(X_1 - X_3)^2 + (X_2 - X_4)^2}

Non-overlapping Theoretical Distribution
The cumulative distribution function of D is F(x) = P(D ≤ x), defined for 0 ≤ x ≤ √2.
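As a rough check on F(x), the distribution of D can be approximated by simulation. This is a Monte Carlo stand-in, not the closed-form cdf derived in the paper; the helper names, sample size, and seed are assumptions, and Python's built-in generator plays the role of the "purely random" source.

```python
import math
import random

def simulate_distances(n, seed=12345):
    """Draw n distances between two independent uniform points in the unit square."""
    rng = random.Random(seed)
    return [
        math.dist((rng.random(), rng.random()), (rng.random(), rng.random()))
        for _ in range(n)
    ]

def mc_cdf(distances, x):
    """Monte Carlo estimate of F(x) = P(D <= x)."""
    return sum(d <= x for d in distances) / len(distances)

dists = simulate_distances(20000)
for x in (0.25, 0.5, 1.0, math.sqrt(2)):
    print(f"F({x:.3f}) is approximately {mc_cdf(dists, x):.3f}")
```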

Goodness-of-Fit Test
Now we can compare our theoretical distribution against the Lehmer generator.
Convert the distances between points into an empirical distribution, \hat{F}(x), which will allow us to perform a hypothesis test:
\hat{F}(x) = N(x) / n
where N(x) = # of distances that do not exceed x and n = # of distances collected.
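The empirical distribution N(x)/n can be sketched in a few lines. The function name `empirical_cdf` is an assumption made for illustration.

```python
from bisect import bisect_right

def empirical_cdf(distances):
    """Return a function F_hat with F_hat(x) = N(x) / n.

    N(x) is the number of collected distances that do not exceed x.
    """
    ordered = sorted(distances)
    n = len(ordered)

    def F_hat(x):
        # bisect_right counts the values <= x in the sorted list
        return bisect_right(ordered, x) / n

    return F_hat

F_hat = empirical_cdf([0.2, 0.5, 0.5, 0.9])
print(F_hat(0.1), F_hat(0.5), F_hat(1.0))
```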

Hypothesis Testing
The null hypothesis H0 is that the observed distances follow the theoretical distribution F(x).
Kolmogorov-Smirnov (KS) test
- Compares the closeness of F(x) and \hat{F}(x): the test statistic is the largest distance between them over all values of x.
- A large value indicates a poor fit, so H0 is rejected.
Cramer-von Mises (CVM) test
- Based on the integral of the squared differences between the theoretical and empirical distributions.
- If this value is larger than the tabulated value, reject H0.
Anderson-Darling (AD) test
- Addresses a drawback of the KS test, which gives the same weight to the difference at every value of x.
- Designed to detect discrepancies in the tails of distributions by weighting the differences.
- Weights are largest for F(x) close to 1 (right tail) and F(x) close to 0 (left tail).
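The KS and CVM statistics can be sketched with their standard computational formulas. The function names are hypothetical, the theoretical cdf is passed in as a function (the paper's closed form is not reproduced here), and the uniform cdf below is only a stand-in to show the calls. Critical values still have to come from tables.

```python
def ks_statistic(sample, cdf):
    """Largest vertical distance between the empirical cdf and `cdf`."""
    xs = sorted(sample)
    n = len(xs)
    # The empirical cdf jumps from i/n to (i+1)/n at the i-th order statistic,
    # so both sides of each jump are checked.
    return max(
        max((i + 1) / n - cdf(x), cdf(x) - i / n)
        for i, x in enumerate(xs)
    )

def cvm_statistic(sample, cdf):
    """Cramer-von Mises statistic: n times the integrated squared difference."""
    xs = sorted(sample)
    n = len(xs)
    return 1 / (12 * n) + sum(
        (cdf(x) - (2 * i + 1) / (2 * n)) ** 2 for i, x in enumerate(xs)
    )

def uniform_cdf(x):
    """Stand-in theoretical cdf, U(0,1), used only to illustrate the calls."""
    return min(max(x, 0.0), 1.0)

sample = [0.25, 0.75]
print(ks_statistic(sample, uniform_cdf), cvm_statistic(sample, uniform_cdf))
```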

Classification of Results
Based on the results of the 3 hypothesis tests (KS, CVM, and AD), each RNG can be classified as:
- Good: the null hypothesis was not rejected in any test.
- Suspect: the null hypothesis was rejected in 1 or 2 of the tests.
- Bad: the null hypothesis was rejected in all 3 tests.
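The classification rule above can be written as a small function. The name `classify` and its boolean arguments are assumptions; each argument is True when the corresponding test rejected H0.

```python
def classify(rejected_ks, rejected_cvm, rejected_ad):
    """Map the three test outcomes to Good / Suspect / Bad."""
    rejections = sum([bool(rejected_ks), bool(rejected_cvm), bool(rejected_ad)])
    if rejections == 0:
        return "Good"
    if rejections == 3:
        return "Bad"
    return "Suspect"

print(classify(False, False, False))
print(classify(True, False, False))
print(classify(True, True, True))
```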

Results
The interesting cases are those in which a multiplier is rejected by only 1 or 2 of the 3 tests. See a = 3 in the table.

[Figure: for a Good, a Suspect, and a Bad multiplier, panels show the random number pairs, the distances connecting pairs, and F(x) plotted against \hat{F}(x) (one solid, one dotted).]

Summary
A test of randomness was developed for Lehmer RNGs based on the distance between consecutive pairs of random numbers.
Since some multipliers are rejected by only one or two of the 3 hypothesis tests, the distance between parallel hyperplanes should not be used as the only basis for a test of randomness.
The order in which pairs are generated is a second factor to consider.

Critique
- Potential: limited. Many other tests exist for validating RNGs.
- Impact: minimal. Frequently used RNGs use a modulus much larger than the m = 401 used here.
- Overall: the paper is well written; in its current state, this test is a justified addition to the collection of tests for RNGs.
- Future: use a larger modulus; improve the theoretical distribution by improving the numerical calculation of the integral for the cdf; test non-Lehmer generators such as additive linear, composite, or quadratic.