SUPA Advanced Data Analysis Course, Jan 6th – 7th 2009 Advanced Data Analysis for the Physical Sciences Dr Martin Hendry Dept of Physics and Astronomy University of Glasgow

6. Advanced Numerical Methods
Part 1: Monte Carlo Methods
Part 2: Fourier Methods

6.1. Uniform random numbers
Generating uniform random numbers, drawn from the pdf U[0,1], is fairly easy: any scientific calculator will have a RAN function. Better examples of U[0,1] random number generators can be found in Numerical Recipes. In what sense are they better? Algorithms only generate pseudo-random numbers: very long (deterministic) sequences of numbers which are approximately random (i.e. with no discernible pattern). The better the RNG, the better it approximates U[0,1].
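For illustration (not from the original slides), here is a minimal sketch of the kind of algorithm behind a calculator's RAN function: a linear congruential generator. The constants are a well-known published "quick and dirty" choice; real applications should use a library generator.

```python
# Minimal linear congruential generator (LCG): a sketch of how simple
# pseudo-random U[0,1] generators work.  For serious work prefer a library
# generator such as numpy.random.default_rng().
class LCG:
    def __init__(self, seed=12345):
        self.state = seed
        self.a, self.c, self.m = 1664525, 1013904223, 2**32

    def uniform(self):
        """Return the next pseudo-random number in [0, 1)."""
        self.state = (self.a * self.state + self.c) % self.m
        return self.state / self.m

rng = LCG(seed=42)
print([rng.uniform() for _ in range(5)])
```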

We can test pseudo-random numbers for randomness in several ways:
(a) Histogram of sampled values. We can use hypothesis tests to see if the sample is consistent with the pdf we are trying to model, e.g. a chi-squared test applied to the numbers in each histogram bin. Assume the bin number counts $n_i$ are subject to Poisson fluctuations, so that
$$ \chi^2 = \sum_{i=1}^{n_{\rm bin}} \frac{(n_i - E_i)^2}{E_i}, $$
where $E_i$ is the expected count in bin $i$. Note: the number of degrees of freedom is $n_{\rm bin} - 1$, since we know the total sample size.
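As a concrete illustration (not from the slides), a minimal sketch of this test applied to a pseudo-random sample, assuming equal-width bins on [0,1]:

```python
import numpy as np
from scipy import stats

# Chi-squared test of a pseudo-random sample against U[0,1].
rng = np.random.default_rng(0)
x = rng.uniform(size=10000)        # sample to be tested
n_bin = 20
counts, _ = np.histogram(x, bins=n_bin, range=(0.0, 1.0))
expected = len(x) / n_bin          # equal expected counts for U[0,1]

# Poisson fluctuations: variance of each bin count ~ expected count.
chi2 = np.sum((counts - expected) ** 2 / expected)
dof = n_bin - 1                    # total sample size is fixed
p_value = stats.chi2.sf(chi2, dof)
print(f"chi2 = {chi2:.1f}, dof = {dof}, p-value = {p_value:.3f}")
```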

(b) Correlations between neighbouring pseudo-random numbers. Sequential patterns in the sampled values would show up as structure in the phase portraits: scatterplots of the $i$th value against the $(i+1)$th value etc. (see Gregory, Chapter 5). We can also compute the auto-correlation function
$$ \rho(j) = \frac{\sum_i (x_i - \bar{x})(x_{i+j} - \bar{x})}{\sum_i (x_i - \bar{x})^2}, $$
where $j$ is known as the lag. If the sequence is uniformly random, we expect $\rho(j) = 1$ for $j = 0$ and $\rho(j) = 0$ otherwise.
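A minimal sketch of this lag-$j$ auto-correlation estimate (illustrative only; the normalisation follows the standard sample ACF written above):

```python
import numpy as np

def autocorrelation(x, max_lag=10):
    """Sample auto-correlation rho(j) for lags j = 0..max_lag."""
    x = np.asarray(x, dtype=float)
    xc = x - x.mean()
    denom = np.sum(xc ** 2)
    return np.array([np.sum(xc[:len(x) - j] * xc[j:]) / denom
                     for j in range(max_lag + 1)])

rng = np.random.default_rng(1)
x = rng.uniform(size=100000)
rho = autocorrelation(x, max_lag=5)
print(np.round(rho, 4))   # expect ~[1, 0, 0, 0, 0, 0] for a good generator
```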

6.2. Variable transformations
Generating random numbers from other pdfs can be done by transforming random numbers drawn from simpler pdfs. Suppose, e.g., that $y = f(x)$. The procedure is similar to changing variables in integration. Let $y = f(x)$ be monotonic. Then the probability of a number falling between $y$ and $y + dy$ equals the probability of a number falling between $x$ and $x + dx$:
$$ p(y)\,|dy| = p(x)\,|dx| \quad\Longrightarrow\quad p(y) = p(x)\left|\frac{dx}{dy}\right|, $$
where the modulus is taken because probability must be positive.

We can extend the expression given previously to the case where $y = f(x)$ is not monotonic, by calculating the contribution from every solution $x_k$ of $f(x_k) = y$, so that
$$ p(y) = \sum_k p(x_k)\left|\frac{dx}{dy}\right|_{x = x_k}. $$

Example 1. Suppose we have $x \sim U[0,1]$, i.e. $p(x) = 1$ for $0 \le x \le 1$. Then define $y = a + (b - a)\,x$. So $p(y) = p(x)\,|dx/dy| = 1/(b - a)$ for $a \le y \le b$, i.e. $y \sim U[a,b]$.
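A two-line sketch of this transformation (illustrative, with arbitrary values of a and b):

```python
import numpy as np

rng = np.random.default_rng(2)
a, b = 3.0, 7.0                      # arbitrary illustrative interval
x = rng.uniform(size=100000)         # x ~ U[0,1]
y = a + (b - a) * x                  # y ~ U[a,b]
print(y.min(), y.max(), y.mean())    # mean should be close to (a+b)/2 = 5
```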

Example 2. Numerical Recipes provides a program to turn $x \sim U[0,1]$ into $x \sim N(0,1)$, the normal pdf with mean zero and standard deviation unity. Suppose we want $y \sim N(\mu, \sigma^2)$. We define $y = \mu + \sigma x$, so that $x = (y - \mu)/\sigma$. Now $|dx/dy| = 1/\sigma$, so
$$ p(y) = \frac{1}{\sigma\sqrt{2\pi}}\exp\!\left[-\frac{(y - \mu)^2}{2\sigma^2}\right]. $$
The variable transformation formula is also the basis for the 'error propagation' formulae we use in data analysis (see also the SUPAIDA course).
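A minimal sketch of this shift-and-scale transformation, using a library standard-normal generator in place of the Numerical Recipes routine (the values of mu and sigma are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(3)
mu, sigma = 10.0, 2.5                 # illustrative target mean and std dev
x = rng.standard_normal(size=100000)  # x ~ N(0, 1)
y = mu + sigma * x                    # y ~ N(mu, sigma^2)
print(y.mean(), y.std())              # should be close to 10.0 and 2.5
```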

Question 13: a clicker question on the pdf of a transformed variable (the equations and the multiple-choice options A to D are not recoverable from this transcript).

6.3. Probability integral transform
One particular variable transformation merits special attention. Suppose we can compute the CDF of some desired random variable $y$:
$$ F(y) = \int_{-\infty}^{y} p(y')\,dy'. $$

1) Sample a random variable $x \sim U[0,1]$.
2) Compute $y$ such that $F(y) = x$, i.e. $y = F^{-1}(x)$.
3) Then $y$ is drawn from the pdf $p(y)$ corresponding to the CDF $F(y)$.
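A minimal sketch of the probability integral transform for an exponential pdf, $p(y) = \lambda e^{-\lambda y}$, whose CDF $F(y) = 1 - e^{-\lambda y}$ can be inverted analytically (the choice of pdf and of $\lambda$ is purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(4)
lam = 1.5                              # illustrative rate parameter
x = rng.uniform(size=100000)           # step 1: x ~ U[0,1]
y = -np.log(1.0 - x) / lam             # step 2: y = F^{-1}(x) for the exponential
print(y.mean())                        # step 3: y ~ Exp(lam); mean should be ~1/lam
```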

Example (from Gregory).

6.4. Rejection sampling
Suppose we want to sample from some pdf $p(x)$, and we know that $p(x) \le c\,q(x)$ for some pdf $q(x)$ and constant $c$. (Suppose we have an 'easy' way to sample from $q(x)$.)
1) Sample $x$ from $q(x)$.
2) Sample $u \sim U[0,\, c\,q(x)]$.
3) If $u \le p(x)$, ACCEPT; otherwise REJECT.
The set of accepted values $\{x\}$ are a sample from $p(x)$ (following Mackay).
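A minimal sketch of rejection sampling for an illustrative target pdf (the target, the uniform proposal, and the bounding constant c are all assumptions for the example):

```python
import numpy as np

rng = np.random.default_rng(5)

def p(x):
    """Target pdf: p(x) = 0.75 * (1 - x^2) on [-1, 1] (illustrative choice)."""
    return 0.75 * (1.0 - x ** 2)

# Proposal q(x) = U[-1, 1], with density 0.5; choose c so that c*q(x) >= p(x).
q_density = 0.5
c = 1.5                                  # c * q_density = 0.75 >= max p(x)

samples = []
while len(samples) < 10000:
    x = rng.uniform(-1.0, 1.0)           # 1) sample x from q(x)
    u = rng.uniform(0.0, c * q_density)  # 2) sample u ~ U[0, c q(x)]
    if u <= p(x):                        # 3) accept if u lies under p(x)
        samples.append(x)

samples = np.array(samples)
print(samples.mean(), samples.var())     # mean ~0, variance ~0.2 for this pdf
```

The fraction of proposals accepted is 1/c, which is why a proposal close to the target (small c) is desirable.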

6.4. Rejection sampling
The method can be very slow if the shaded region between $c\,q(x)$ and $p(x)$ is too large. Ideally we want to find a proposal pdf $q(x)$ that is: (a) easy to sample from, and (b) close to $p(x)$.

6.5. Genetic Algorithms (Charbonneau 1995)

6.6. Markov Chain Monte Carlo
This is a very powerful method, relatively new (at least in astronomy!), for sampling from pdfs, which can be complicated and/or of high dimension. MCMC is widely used, e.g. in cosmology, to determine the 'maximum likelihood' model fit to CMBR data. [Figure: angular power spectrum of CMBR temperature fluctuations, with the ML cosmological model depending on 7 different parameters (Hinshaw et al. 2006).]

Consider a 2-D example (e.g. a bivariate normal distribution), where the likelihood function $L(a,b)$ depends on parameters $a$ and $b$. Suppose we are trying to find the maximum of $L(a,b)$:
1) Start off at some randomly chosen value $(a_1, b_1)$.
2) Compute $L(a_1, b_1)$ and the gradient $\nabla L$.
3) Move in the direction of steepest positive gradient, i.e. the direction in which $L$ is increasing fastest.
4) Repeat from step 2 until the sequence $(a_n, b_n)$ converges on the maximum of the likelihood.
This is OK for finding the maximum, but not for generating a sample from $L(a,b)$, or for determining errors on the ML parameter estimates.
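A minimal sketch of this steepest-ascent search on an illustrative 2-D Gaussian likelihood (the likelihood, step size, and stopping rule are all assumptions for the example):

```python
import numpy as np

def log_L(theta):
    """Illustrative log-likelihood: a 2-D Gaussian centred on (1.0, -2.0)."""
    a, b = theta
    return -0.5 * ((a - 1.0) ** 2 / 0.5 ** 2 + (b + 2.0) ** 2 / 1.0 ** 2)

def grad(f, theta, eps=1e-6):
    """Numerical gradient of f by central differences."""
    g = np.zeros_like(theta)
    for i in range(len(theta)):
        dt = np.zeros_like(theta)
        dt[i] = eps
        g[i] = (f(theta + dt) - f(theta - dt)) / (2 * eps)
    return g

theta = np.array([4.0, 3.0])            # 1) randomly chosen starting point
step = 0.1
for _ in range(1000):                    # 4) repeat until converged
    g = grad(log_L, theta)               # 2) compute the gradient
    theta = theta + step * g             # 3) move uphill
    if np.linalg.norm(g) < 1e-6:
        break
print(theta)                             # converges near (1.0, -2.0)
```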

MCMC provides a simple Metropolis algorithm for generating random samples of points from $L(a,b)$ (the accompanying figure shows a slice through $L(a,b)$):
1) Sample a random initial point $P_1 = (a_1, b_1)$.
2) Centre a new pdf, $Q$, called the proposal density, on $P_1$.
3) Sample a tentative new point $P' = (a', b')$ from $Q$.
4) Compute the ratio $R = L(a', b') \,/\, L(a_1, b_1)$.

5) If $R > 1$, this means $P'$ is uphill from $P_1$. We accept $P'$ as the next point in our chain, i.e. $P_2 = P'$.
6) If $R < 1$, this means $P'$ is downhill from $P_1$. In this case we may reject $P'$ as our next point; in fact, we accept $P'$ with probability $R$. How do we do this?
(a) Generate a random number $x \sim U[0,1]$.
(b) If $x < R$ then accept $P'$ and set $P_2 = P'$.
(c) If $x > R$ then reject $P'$ and set $P_2 = P_1$.
The acceptance probability depends only on the previous point: this is a Markov Chain.
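A minimal sketch of the Metropolis algorithm for an illustrative 2-D Gaussian likelihood, with a Gaussian proposal density (the likelihood, proposal width, and chain length are assumptions for the example):

```python
import numpy as np

rng = np.random.default_rng(6)

def log_L(theta):
    """Illustrative log-likelihood: a 2-D Gaussian centred on (1.0, -2.0)."""
    a, b = theta
    return -0.5 * ((a - 1.0) ** 2 / 0.5 ** 2 + (b + 2.0) ** 2 / 1.0 ** 2)

n_steps = 20000
proposal_width = 0.5                       # width of the proposal density Q
chain = np.empty((n_steps, 2))
chain[0] = rng.uniform(-5, 5, size=2)      # 1) random initial point P1

for i in range(1, n_steps):
    current = chain[i - 1]
    proposed = current + proposal_width * rng.standard_normal(2)  # 2)-3) sample P' from Q
    log_R = log_L(proposed) - log_L(current)                      # 4) R = L(P') / L(P1), in logs
    # 5)-6) accept uphill moves always, downhill moves with probability R
    if np.log(rng.uniform()) < log_R:
        chain[i] = proposed
    else:
        chain[i] = current

print(chain.mean(axis=0), chain.std(axis=0))   # should approach (1, -2) and (0.5, 1.0)
```

In practice one would also discard an initial 'burn-in' portion of the chain before computing summary statistics.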

So the Metropolis algorithm generally (but not always) moves uphill, towards the peak of the likelihood function. Remarkable facts:
The sequence of points $\{P_1, P_2, P_3, P_4, P_5, \dots\}$ represents a sample from the likelihood function $L(a,b)$.
The sequence for each coordinate, e.g. $\{a_1, a_2, a_3, a_4, a_5, \dots\}$, samples the marginalised likelihood of $a$.
We can make a histogram of $\{a_1, a_2, \dots, a_n\}$ and use it to compute the mean and variance of $a$ (i.e. to attach an error bar to $a$).
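A minimal sketch of this marginalisation step. Here `chain` stands in for the output of a Metropolis sampler such as the one above; to keep the example self-contained it is filled with correlated Gaussian draws rather than a real MCMC run:

```python
import numpy as np

rng = np.random.default_rng(7)

# Stand-in for MCMC output: an (n, 2) array of (a, b) samples.
chain = rng.multivariate_normal(mean=[1.0, -2.0],
                                cov=[[0.25, 0.1], [0.1, 1.0]],
                                size=20000)

a_samples = chain[:, 0]                      # marginal samples of parameter a
hist, edges = np.histogram(a_samples, bins=50)

a_mean = a_samples.mean()                    # marginalised mean of a
a_err = a_samples.std(ddof=1)                # 1-sigma error bar on a
print(f"a = {a_mean:.3f} +/- {a_err:.3f}")
```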

Why is this so useful? Suppose our likelihood function was a 1-D Gaussian. We could estimate the mean and variance quite well from a histogram of, e.g., 1000 samples. But what if our problem is, e.g., 7-dimensional? 'Exhaustive' sampling could require $(1000)^7$ samples! MCMC provides a short-cut: to compute a new point in our Markov Chain we need to compute the likelihood function, but the computational cost does not grow so dramatically as we increase the number of dimensions of our problem. This lets us tackle problems that would be impossible by 'normal' sampling.

Example: CMBR constraints from WMAP 3-year data (+ 1-year data). [Figure: angular power spectrum of CMBR temperature fluctuations, with the ML cosmological model depending on 7 different parameters (Hinshaw et al. 2006).]

Question 14: When applying the Metropolis algorithm, if the width of the proposal density is very small:
A. the Markov Chain will move around the parameter space very slowly
B. the Markov Chain will converge very quickly to the true pdf
C. the acceptance rate of proposed steps in the Markov Chain will be very small
D. most steps in the Markov Chain will explore regions of very low probability

A number of factors can improve the performance of the Metropolis algorithm, including:
using parameters in the likelihood function which are (close to) independent (i.e. their Fisher matrix is approximately diagonal);
adopting a judicious choice of proposal density, well matched to the shape of the likelihood function;
using a simulated annealing approach – i.e. sampling from a modified posterior likelihood function of the form $L_T(\theta) = [L(\theta)]^{1/T}$, where $T$ is a temperature parameter; for large $T$ the modified likelihood is a flatter version of the true likelihood.

The temperature parameter $T$ starts out large, so that the acceptance rate for 'downhill' steps is high and the search is essentially random. (This helps to avoid getting stuck in local maxima.) $T$ is gradually reduced as the chain evolves, so that 'downhill' steps become increasingly disfavoured. In some versions, the evolution of $T$ is carried out automatically: this is known as adaptive simulated annealing. See, for example, Numerical Recipes Section 10.9, or Gregory Chapter 11, for more details.
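A minimal sketch of a Metropolis sampler with a simple geometric cooling schedule, showing how the temperature $T$ flattens the likelihood early on (the bimodal likelihood, the schedule, and the step size are assumptions for the example):

```python
import numpy as np

rng = np.random.default_rng(8)

def log_L(theta):
    """Illustrative bimodal log-likelihood in 1-D (global maximum at theta = 3)."""
    return np.logaddexp(-0.5 * (theta - 3.0) ** 2,
                        np.log(0.3) - 0.5 * (theta + 3.0) ** 2 / 0.5 ** 2)

T = 50.0                     # start hot: acceptance of downhill steps is high
cooling = 0.999              # geometric cooling schedule (assumed)
theta = rng.uniform(-10, 10)

for _ in range(20000):
    proposed = theta + 1.0 * rng.standard_normal()
    # Sample from the flattened likelihood L^(1/T): scale log L by 1/T.
    log_R = (log_L(proposed) - log_L(theta)) / T
    if np.log(rng.uniform()) < log_R:
        theta = proposed
    T = max(1.0, T * cooling)   # gradually reduce T towards 1

print(theta)    # typically ends up near the global maximum at theta = 3
```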

A related idea is parallel tempering (see e.g. Gregory, Chap 12): a series of MCMC chains, with different temperatures $T$, are set off in parallel, with a certain probability of swapping parameter states between chains. High temperature chains are effective at mapping out the global structure of the likelihood surface; low temperature chains are effective at mapping out the shape of local likelihood maxima.

Example: spectral line fitting, from Section 3. [Figure: comparison of conventional MCMC with MCMC using parallel tempering.]

Approximating the Evidence
The evidence is the average likelihood, weighted by the prior:
$$ Z = \int L(\theta)\,p(\theta)\,d\theta. $$
Calculating the evidence can be computationally very costly (e.g. the CMBR spectrum in cosmology). How to proceed?
1. Information criteria (Liddle 2004, 2007)
2. Laplace and Savage-Dickey approximations (Trotta 2005)
3. Nested sampling (Skilling 2004, 2006; Mukherjee et al. 2005, 2007; Sivia 2006)

Nested Sampling (Skilling 2004, 2006; Mukherjee et al. 2005, 2007)
Key idea: we can rewrite the evidence as
$$ Z = \int_0^1 L(X)\,dX, $$
where $X$ is a 1-D variable known as the prior mass, uniformly distributed on [0,1].

Example: 2-D likelihood function. [Figure from Skilling (2006).]

Example: 2-D likelihood function (from Mackay 2005). Contours of constant likelihood, $L$. Each contour encloses a different fraction, $X$, of the area of the square, so each point in the plane has an associated value of $L$ and of $X$. However, mapping the relationship between $L$ and $X$ systematically, everywhere, may be computationally very costly.

Approximation procedure. [Figure from Skilling (2006).]
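For orientation (this is not from the slides), a minimal sketch of the standard nested-sampling loop on an illustrative 2-D Gaussian likelihood with a uniform prior: at each iteration the lowest-likelihood 'live point' is replaced, and the enclosed prior mass X shrinks by roughly a factor N/(N+1).

```python
import numpy as np

rng = np.random.default_rng(9)

def log_L(theta):
    """Illustrative 2-D Gaussian likelihood; the prior is uniform on [-4, 4]^2."""
    return -0.5 * np.sum(theta ** 2) - np.log(2.0 * np.pi)

N = 100                                        # number of live points
live = rng.uniform(-4, 4, size=(N, 2))         # live points drawn from the prior
live_logL = np.array([log_L(t) for t in live])

log_Z = -np.inf                                # running evidence estimate
X_prev = 1.0                                   # prior mass not yet accounted for

i = 0
while True:
    i += 1
    worst = np.argmin(live_logL)               # lowest-likelihood live point
    X_i = np.exp(-i / N)                       # expected remaining prior mass
    log_Z = np.logaddexp(log_Z, live_logL[worst] + np.log(X_prev - X_i))
    X_prev = X_i

    # Stop once the remaining live points cannot change log Z appreciably.
    if live_logL.max() + np.log(X_i) < log_Z - 4.0:
        break

    # Replace the worst point by a new prior draw with L > L_worst
    # (naive rejection from the prior; real codes use cleverer schemes).
    while True:
        candidate = rng.uniform(-4, 4, size=2)
        if log_L(candidate) > live_logL[worst]:
            break
    live[worst], live_logL[worst] = candidate, log_L(candidate)

# Add the contribution of the surviving live points over the remaining mass.
log_mean_L = np.logaddexp.reduce(live_logL) - np.log(N)
log_Z = np.logaddexp(log_Z, log_mean_L + np.log(X_prev))
print(f"log Z ~ {log_Z:.2f} (analytic value for this toy problem: {np.log(1/64):.2f})")
```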

[Figure from Skilling (2006).]