Statistical Methods for Data Analysis Random number generators Luca Lista INFN Napoli.



Pseudo-random generators
Requirement:
– Simulate random processes with a computer
– E.g.: radiation interaction with matter, cosmic rays, particle interaction generators, …
– But also: finance, videogames, 3D graphics, ...
Problem:
– Generate random (or almost random…) variables with a computer
– … but computers are deterministic!

Pseudo-random numbers
Definition:
– Deterministic numeric sequences whose behavior is not easily predictable with simple analytic expressions
– (Re-)producible with an algorithm based on mathematical formulae
– Statistical behavior similar to real random sequences

Example from chaos transition
Let's fix an initial value $x_0$ and define by recursion the sequence:
$x_{n+1} = \lambda x_n (1 - x_n)$
Depending on $\lambda$, the sequence will have different possible behaviors.
If the sequence converges, then for $n \to \infty$ the limit $x$ solves the equation:
$x = \lambda x (1 - x) \Rightarrow x = (\lambda - 1)/\lambda$ or $x = 0$
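
The recursion is easy to try out numerically. The following is a minimal C++ sketch, not code from the slides: the function name logistic_iterate and the parameter name lambda (for the control parameter of the map) are assumptions made for illustration.

```cpp
#include <cassert>

// Iterate the logistic map x -> lambda * x * (1 - x) for n_steps steps,
// starting from x0, and return the final value.
// The name logistic_iterate is hypothetical, chosen for this illustration.
double logistic_iterate(double lambda, double x0, int n_steps) {
  assert(n_steps >= 0);
  double x = x0;
  for (int i = 0; i < n_steps; ++i) {
    x = lambda * x * (1.0 - x);
  }
  return x;
}
```

With lambda = 2.5 and x0 = 0.5 the iterates settle at (lambda - 1)/lambda = 0.6, while with lambda = 3.2 consecutive iterates alternate between two distinct values.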

Stable behavior
For sufficiently small $\lambda$, starting from $x_0 = 0.5$, the sequence converges.
[Figure: $x_n$ for $n > 200$]

Bifurcation
For $\lambda > 3$ the sequence does not converge, but oscillates between two values:
$x_a = \lambda x_b (1 - x_b)$
$x_b = \lambda x_a (1 - x_a)$
[Figure: $x_n$ for $n > 200$]

Bifurcation II, III, …
The bifurcation repeats as $\lambda$ grows, giving sequences of 4, 8, 16, … repeating values.
[Figure: $x_n$ for $n > 200$]

Chaotic behavior
For even larger $\lambda$ the sequence is unpredictable. For instance, for $\lambda = 4$ the values densely fill the interval [0, 1].
[Figure: $x_n$ for $n > 200$]

Transition to chaos

Another complete view

Properties of Random Numbers
A good random sequence $\{x_1, x_2, \ldots, x_n, \ldots\}$ should be made of elements that are independent and identically distributed (i.i.d.):
– $P(x_i) = P(x_j), \ \forall\, i, j$
– $P(x_n \mid x_{n-1}) = P(x_n), \ \forall\, n$

(Pseudo-)random generators
The POSIX C library function drand48 is based on a sequence of 48-bit integer numbers, defined as:
$x_{n+1} = (a x_n + c) \bmod m$
where:
– $m = 2^{48}$
– a = 25214903917 = 5DEECE66D (hex)
– c = 11 = B (hex)
These numbers give a uniform distribution. See man drand48 for further information!

Pseudo-random generators
To convert into a floating-point number, just divide the integer by $2^{48}$. The result is uniformly distributed from 0 to 1 (with precision $1/2^{48}$).
drand48, mrand48 and lrand48 return random numbers with different precision, using a sufficiently large number of bits from the main integer sequence.
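
The recursion and the conversion to floating point can be sketched in a few lines of C++. This is an illustration, not the library's source code: the struct name Lcg48 is hypothetical, and the seeding convention (seed in the upper 32 bits, 0x330E in the lower 16 bits) is an assumption taken from the POSIX description of srand48.

```cpp
#include <cassert>
#include <cstdint>

// Minimal re-implementation of the 48-bit linear congruential sequence
// x_{n+1} = (a * x_n + c) mod 2^48. The struct name Lcg48 is hypothetical.
struct Lcg48 {
  std::uint64_t state;  // current x_n, always below 2^48

  // Seeding convention assumed from the POSIX description of srand48():
  // the seed fills the upper 32 bits, the lower 16 bits are set to 0x330E.
  explicit Lcg48(std::uint32_t seed)
      : state((static_cast<std::uint64_t>(seed) << 16) | 0x330EULL) {}

  // Advance the sequence and return x_{n+1} / 2^48, a double in [0, 1).
  double next() {
    const std::uint64_t a = 0x5DEECE66DULL;
    const std::uint64_t c = 0xBULL;
    const std::uint64_t mask = (1ULL << 48) - 1;  // "mod 2^48" as a bit mask
    // 64-bit wrap-around in the product is harmless: 2^48 divides 2^64.
    state = (a * state + c) & mask;
    double r = static_cast<double>(state) / 281474976710656.0;  // 2^48
    assert(r >= 0.0 && r < 1.0);
    return r;
  }
};
```

On a POSIX system, Lcg48 g(seed) followed by repeated g.next() calls is expected to follow the same sequence as srand48(seed) followed by drand48(), under the seeding assumption stated above.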

Random generators in ROOT
– TRandom (low period: $10^9$)
– TRandom1 (Ranlux, F. James)
– TRandom2 (period: about $10^{26}$)
– TRandom3 (period: $2^{19937} - 1$)
– ROOT::Math generators: GSL based, relatively new
See dedicated slides

Probability distribution
Within precision, the distribution is uniform (flat).
[Figure: distribution of r = drand48(), n per interval of r]

Non uniform sequences
In order to obtain a Gaussian distribution: average many numbers with any limited distribution
– Central limit theorem
    r = 0;
    for (int i = 0; i < n; i++) r += drand48();
    r /= n;
– Works, but inefficient!
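
The average of n uniform numbers has mean 1/2 and variance 1/(12 n), so it can be shifted and rescaled to approximate a standard Gaussian. The sketch below makes that explicit; the function name gaussian_clt is hypothetical, and the standardization step is an addition for illustration, not part of the slide.

```cpp
#include <cassert>
#include <cmath>
#include <cstdlib>

// Approximate a standard Gaussian number (mean 0, variance 1) by averaging
// n uniform numbers from drand48() and standardizing the result.
// The name gaussian_clt is hypothetical.
double gaussian_clt(int n) {
  assert(n > 0);
  double sum = 0.0;
  for (int i = 0; i < n; ++i) sum += drand48();
  double mean = sum / n;
  // A uniform number in [0, 1) has mean 1/2 and variance 1/12,
  // so the average has variance 1/(12 n).
  return (mean - 0.5) * std::sqrt(12.0 * n);
}
```

Each Gaussian number costs n calls to drand48(), and the tails are cut off at a finite distance from the mean, which is why this approach is only an approximation.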

Distribution of $\frac{1}{n} \sum_{i=1}^{n} r_i$

Comparison with true Gaussians

Generate a known PDF
Given a PDF $f(x)$, its cumulative distribution is defined as:
$F(x) = \int_{-\infty}^{x} f(x')\, dx'$

Inverting the cumulative
If the inverse of the cumulative distribution is known (or easily computable numerically), a variable $x$ defined as:
$x = F^{-1}(r)$
is distributed according to the PDF $f(x)$ if $r$ is uniformly distributed between 0 and 1.

Demonstration
As $r = F(x)$, then:
$\frac{dr}{dx} = F'(x) = f(x)$
hence:
$\frac{dP}{dx} = \frac{dP}{dr} \frac{dr}{dx} = \frac{dP}{dr} f(x)$
If $r$ has a uniform distribution, then $dP/dr = 1$, hence $dP/dx = f(x)$.

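Example
Exponential distribution (here $\lambda$ is the rate parameter):
$f(x) = \lambda e^{-\lambda x}, \quad x \ge 0$
Normalization:
$\int_0^{\infty} \lambda e^{-\lambda x}\, dx = 1$
Cumulative distribution and its inverse:
$r = F(x) = 1 - e^{-\lambda x} \Rightarrow x = -\frac{1}{\lambda} \ln(1 - r)$
$1 - r$ and $r$ both have a uniform distribution between 0 and 1, so $x = -\frac{1}{\lambda} \ln r$ can be used as well.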
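
The inversion for the exponential case can be written as a short C++ function. This is a sketch under the assumption that the PDF is f(x) = lambda * exp(-lambda * x) for x >= 0; the function name sample_exponential and the parameter name lambda are chosen here for illustration.

```cpp
#include <cassert>
#include <cmath>
#include <cstdlib>

// Draw one number from the exponential PDF lambda * exp(-lambda * x), x >= 0,
// by inverting the cumulative distribution F(x) = 1 - exp(-lambda * x).
// The name sample_exponential is hypothetical.
double sample_exponential(double lambda) {
  assert(lambda > 0.0);
  double r = drand48();  // uniform in [0, 1)
  // 1 - r lies in (0, 1], so the logarithm is always finite.
  return -std::log(1.0 - r) / lambda;
}
```

Using 1 - r rather than r avoids taking the logarithm of zero, since drand48() can return exactly 0 but never 1.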

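Generate uniformly over a sphere
Generate $\theta$ and $\phi$. Factorize the PDF:
$dP = \frac{d\Omega}{4\pi} = \frac{\sin\theta\, d\theta\, d\phi}{4\pi} = \left(\frac{\sin\theta}{2}\, d\theta\right) \left(\frac{d\phi}{2\pi}\right)$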
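
Since the PDF factorizes, the two angles can be generated independently by inverting each cumulative: the polar-angle factor sin(theta)/2 has cumulative (1 - cos(theta))/2, so cos(theta) is uniform between -1 and 1, and the azimuthal angle is uniform between 0 and 2 pi. The following sketch assumes this standard parametrization; the names Vec3 and sample_sphere are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstdlib>

struct Vec3 { double x, y, z; };  // hypothetical helper type

// Draw a point uniformly distributed over the surface of a sphere.
// The name sample_sphere is hypothetical.
Vec3 sample_sphere(double radius = 1.0) {
  assert(radius > 0.0);
  const double pi = 3.14159265358979323846;
  // Inverting (1 - cos(theta))/2 = r1 gives cos(theta) = 1 - 2 r1.
  double cos_theta = 1.0 - 2.0 * drand48();
  double sin_theta = std::sqrt(1.0 - cos_theta * cos_theta);
  // The azimuthal angle is uniform in [0, 2 pi).
  double phi = 2.0 * pi * drand48();
  return {radius * sin_theta * std::cos(phi),
          radius * sin_theta * std::sin(phi),
          radius * cos_theta};
}
```

Generating theta itself uniformly would instead cluster the points around the poles.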

Generating Gaussian numbers
The Gaussian cumulative is not easily invertible (erf).
Solution:
– Generate simultaneously two independent Gaussian numbers
– From the inversion of the 2D radial cumulative function, the Box-Muller transformation:
    const double pi = 3.14159265358979323846;
    double r = sqrt(-2 * log(1 - drand48()));  // 1 - drand48() is in (0, 1], so the log is finite
    double phi = 2 * pi * drand48();
    double y1 = r * cos(phi), y2 = r * sin(phi);
Other, faster alternatives are available (e.g.: Ziggurat).

Hit or miss Monte Carlo
Reproduce a generic distribution:
1. Extract $x$ flat from $a$ to $b$
2. Compute $f = f(x)$
3. Extract $r$ flat from 0 to $m$, where $m \ge \max_x f(x)$
4. If $r > f$ repeat the extraction, if $r < f$ accept
In this way, the density is proportional to $f(x)$.
– May be inefficient if the function is very peaked!
– Finding the maximum of $f$ may be slow in many dimensions
[Figure: $f(x)$ between $a$ and $b$, below the level $m$, with hit and miss regions]
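
The four steps translate almost literally into code. The following C++ sketch is an illustration only; the function name sample_hit_or_miss and its signature are assumptions, and the caller is responsible for passing a value of m that is not smaller than the maximum of f on [a, b].

```cpp
#include <cassert>
#include <cstdlib>
#include <functional>

// Draw one number in [a, b) with density proportional to f(x), using
// hit-or-miss. m must satisfy m >= max f(x) on [a, b] (not checked here).
// The name sample_hit_or_miss is hypothetical.
double sample_hit_or_miss(const std::function<double(double)>& f,
                          double a, double b, double m) {
  assert(b > a && m > 0.0);
  while (true) {
    double x = a + (b - a) * drand48();  // step 1: x flat from a to b
    double fx = f(x);                    // step 2: compute f(x)
    double r = m * drand48();            // step 3: r flat from 0 to m
    if (r < fx) return x;                // step 4: hit, accept x
    // miss: repeat the extraction
  }
}
```

The average number of extractions per accepted value is m (b - a) divided by the integral of f over [a, b], which is why a sharply peaked f makes the method inefficient.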

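Example: compute an integral
    #include <cmath>
    #include <cstdio>
    #include <cstdlib>

    // Integrand; its limit for x -> 0 is 1.
    double f(double x) { return x == 0 ? 1.0 : std::pow(std::sin(x) / x, 2); }

    int main() {
      // b: the upper limit is lost in the transcript, pi is assumed here.
      // m must not be smaller than the maximum of f.
      const double a = 0, b = 3.14159265358979323846, m = 1;
      const int hit = 10000;  // number of accepted points to collect
      int tot = 0;            // total number of extractions
      for (int i = 0; i < hit; ++i) {
        double r, ff;  // declared outside the do block so the while condition can see them
        do {
          double x = a + (b - a) * drand48();
          ff = f(x);
          ++tot;
          r = drand48() * m;
        } while (r > ff);  // miss: repeat the extraction
      }
      double ratio = double(hit) / double(tot);
      double error = std::sqrt(ratio * (1 - ratio) / tot);
      double area = (b - a) * m * ratio;
      std::printf("area = %g +/- %g\n", area, (b - a) * m * error);
      return 0;
    }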

Importance sampling
The same method can be repeated in different regions:
1. Extract $x$ in one of the regions (1), (2), or (3) with probability proportional to the areas
2. Apply hit-or-miss in the randomly chosen region
The density is still proportional to $f(x)$, but a smaller number of extractions is sufficient (and the program runs faster!)
Variation: use hit or miss within an envelope PDF whose cumulative is easily invertible…
[Figure: $f(x)$ with the range split at $a_0$, $a_1$, $a_2$, $a_3$ into three regions; maximum $m$]
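
A possible implementation of the region-based variant is sketched below. It assumes that "areas" means the areas of the rectangles (region width times local maximum), and that after a miss the region is chosen again from scratch, which keeps the accepted density proportional to f(x). The function name sample_piecewise and its arguments are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <functional>
#include <vector>

// Hit-or-miss under a step-shaped envelope.
// edges:  region boundaries a_0 < a_1 < ... < a_k
// maxima: maxima[i] >= max f(x) on [a_i, a_{i+1}] (not checked here)
// The name sample_piecewise is hypothetical.
double sample_piecewise(const std::function<double(double)>& f,
                        const std::vector<double>& edges,
                        const std::vector<double>& maxima) {
  assert(!maxima.empty() && edges.size() == maxima.size() + 1);
  // Area of the rectangle covering each region.
  std::vector<double> area(maxima.size());
  double total = 0.0;
  for (std::size_t i = 0; i < maxima.size(); ++i) {
    area[i] = (edges[i + 1] - edges[i]) * maxima[i];
    total += area[i];
  }
  while (true) {
    // Choose a region with probability proportional to its rectangle area.
    double u = total * drand48();
    std::size_t k = 0;
    while (k + 1 < area.size() && u >= area[k]) {
      u -= area[k];
      ++k;
    }
    // Plain hit-or-miss inside the chosen region.
    double x = edges[k] + (edges[k + 1] - edges[k]) * drand48();
    double r = maxima[k] * drand48();
    if (r < f(x)) return x;
    // Miss: go back and choose the region again.
  }
}
```

For f(x) = 2x on [0, 1], the regions [0, 0.5] and [0.5, 1] with local maxima 1 and 2 give a total rectangle area of 1.5, compared with 2 for a single rectangle, so fewer extractions are rejected.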

Exercise
Generate according to the following distribution ($0 \le x < \ldots$):
[Formula missing]

Estimate the error on MC integral
MC can also be a means to estimate integrals.
Accepting $n$ over $N$ extractions, the binomial distribution can be applied:
$\sigma_n^2 = N \varepsilon (1 - \varepsilon)$
where $\hat\varepsilon = n/N$ is the best estimate of $\varepsilon$.
The error on the estimate of $\varepsilon$ is:
$\sigma_{\hat\varepsilon}^2 = \sigma_n^2 / N^2 = \varepsilon (1 - \varepsilon)/N$
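
As a purely illustrative numerical example (the numbers are invented for this note, not taken from the slides), accepting n = 2500 out of N = 10000 extractions gives:

```latex
\hat\varepsilon = \frac{n}{N} = \frac{2500}{10000} = 0.25, \qquad
\sigma_{\hat\varepsilon} = \sqrt{\frac{\hat\varepsilon\,(1 - \hat\varepsilon)}{N}}
  = \sqrt{\frac{0.25 \times 0.75}{10000}} \approx 0.0043
```

Quadrupling N at fixed acceptance fraction halves the error, as expected from the 1/sqrt(N) behavior.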

Multi-dimensional integral estimates
The same Monte Carlo technique can be applied to multi-dimensional integral estimates, extracting independently the $n$ coordinates $(x_1, \ldots, x_n)$.
The error is always proportional to $1/\sqrt{N}$, where $N$ is the number of extractions, regardless of the dimension $n$
– This is an advantage w.r.t. standard numerical integration
Difficulties:
– Finding the maximum of $f$ numerically may be slow in many dimensions
– Partitioning the integration range (importance sampling) may be non-trivial to do automatically
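
As an illustration of a multi-dimensional estimate with its binomial error, the sketch below computes the volume of the unit ball in an arbitrary number of dimensions by counting the points of the cube [-1, 1]^dim that fall inside it. The choice of example and the names McResult and ball_volume are assumptions made here, not taken from the slides.

```cpp
#include <cassert>
#include <cmath>
#include <cstdlib>

struct McResult { double value, error; };  // hypothetical helper type

// Estimate the volume of the unit ball in `dim` dimensions by hit-or-miss
// inside the cube [-1, 1]^dim, with the binomial error on the estimate.
// The name ball_volume is hypothetical.
McResult ball_volume(int dim, long n_points) {
  assert(dim > 0 && n_points > 0);
  long hits = 0;
  for (long i = 0; i < n_points; ++i) {
    double r2 = 0.0;
    for (int j = 0; j < dim; ++j) {
      double x = 2.0 * drand48() - 1.0;  // each coordinate flat in [-1, 1)
      r2 += x * x;
    }
    if (r2 <= 1.0) ++hits;  // the point falls inside the ball
  }
  double eps = double(hits) / double(n_points);             // accepted fraction
  double sigma = std::sqrt(eps * (1.0 - eps) / n_points);   // binomial error
  double cube = std::pow(2.0, dim);                         // volume of the cube
  return {cube * eps, cube * sigma};
}
```

For dim = 3 the result can be compared with the exact value 4 pi / 3. The error formula is the same in any dimension, but the accepted fraction drops quickly as dim grows, which is one form of the inefficiency mentioned for peaked functions.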

References
– Logistic map, bifurcation and chaos
– PDG: review of random numbers and Monte Carlo
– GENBOD: phase space generator
– F. James, Monte Carlo Phase Space, CERN (1968)