STAT 534: Statistical Computing


STAT 534: Statistical Computing
Hari Narayanan, harin@uw.edu

Course objectives
- Write programs in R and C tailored to the specifics of the statistics problems you want to solve
- Familiarize yourself with: optimization techniques and Markov chain Monte Carlo (MCMC)

Logistics
Class: Tuesdays and Thursdays, 12:00pm – 1:20pm
Office hours: Thursday 2:30pm – 4:00pm (Padelford B-301), or by appointment
Textbooks: Robert & Casella, Introducing Monte Carlo Methods with R; Kernighan & Ritchie, The C Programming Language
Evaluation: 4 assignments, 2 quizzes, final project

Note to students
These slides contain more material than was covered in class. The main source, http://pages.uoregon.edu/dlevin/MARKOV/markovmixing.pdf, is referenced at http://faculty.washington.edu/harin/stat534-lectures.html

Review of MCMC done so far: natural questions
1. What is the Monte Carlo method? An elementary example:
(a) finding the area of an irregular region in the plane
(b) integrating a nonnegative function over a region

(a) Area of an irregular region
Bound the region by a rectangle and pick points uniformly within the rectangle. Then
(area of region)/(area of rectangle) ≈ (number of points inside)/(total number of points).
Example: (area of a circle of radius 1)/(area of a square of side 2) = π/4

Approximating π
Draw a circle with radius 1 inside the 2x2 square [-1,1] x [-1,1]. Pick n points (x,y) taking x <- runif(n, min=-1, max=1); y <- runif(n, min=-1, max=1) (note: runif, not rnorm, since the points must be uniform). Let int be the number of points that fell inside the circle. Then approx_pi = 4*int/n, since int/n estimates π/4.
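
Putting the slide together, a minimal runnable R sketch of this estimator (variable names are illustrative):

    n <- 100000
    x <- runif(n, min = -1, max = 1)
    y <- runif(n, min = -1, max = 1)
    int <- sum(x^2 + y^2 <= 1)   # points landing inside the unit circle
    approx_pi <- 4 * int / n     # int/n estimates pi/4, so scale by 4
    approx_pi                    # close to 3.1416 for large n

As for any Monte Carlo average, the error of this estimator shrinks like 1/sqrt(n).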

(b) Integrating a nonnegative function over a region
Pick points according to a suitable distribution. This raises the question: how do we pick points according to a given distribution?

Picking points randomly
For the area of an irregular region: pick points uniformly at random within a larger regular region and count how many fall within the region in question.
For the integral of a nonnegative function over an irregular region: pick points according to a suitable distribution.
How do we pick points according to a given distribution?

The Markov chain method
Pick a Markov chain whose steady-state distribution is the desired distribution. Start from any point and let the Markov chain run for some time, then take the point you reach. You will have picked that point with approximately its steady-state probability.
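
An illustrative R sketch of this idea (the 2-state chain below is made up for the demo; its stationary distribution works out to (0.6, 0.4)):

    # Hypothetical 2-state chain; run it for a while from a fixed start,
    # many times over, and look at where it ends up.
    P <- matrix(c(0.8, 0.2,
                  0.3, 0.7), nrow = 2, byrow = TRUE)
    endpoint <- replicate(5000, {
      x <- 1                                        # arbitrary starting state
      for (t in 1:100) x <- sample(1:2, 1, prob = P[x, ])
      x
    })
    table(endpoint) / length(endpoint)              # close to 0.6, 0.4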

Markov chain + filter = new MC
Here you begin with a Markov chain that has a symmetric transition matrix, but you make your move to the next point subject to an additional condition of acceptance or rejection.

Details of the Metropolis filter
Let T be a symmetric transition matrix. (T is reversible with respect to the uniform probability distribution on the sample space Ω.) We will modify the transitions made according to T to obtain a chain with stationary distribution π.

Details of the Metropolis filter (continued)
Evolution of the new chain: when at state x, a candidate move is generated from T(x, ·). If the proposed new state is y, the move is censored with probability 1 - a(x,y); equivalently, with probability a(x,y) the new state is the proposed state y, and with probability 1 - a(x,y) the chain remains at x.
Note: this is a new Markov chain, with transition matrix say P.
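
A minimal R sketch of one step of this accept/reject evolution on a finite state space (the helper name is illustrative, and it uses the acceptance probability a(x,y) = min(π(y)/π(x), 1) derived two slides below; note pi_target need only be known up to a constant):

    # One Metropolis step: propose from the symmetric matrix Tmat,
    # then accept with probability min(pi(y)/pi(x), 1).
    metropolis_step <- function(x, Tmat, pi_target) {
      y <- sample(seq_along(pi_target), 1, prob = Tmat[x, ])  # candidate from T(x, .)
      a <- min(pi_target[y] / pi_target[x], 1)                # acceptance probability
      if (runif(1) < a) y else x                              # move to y, or stay at x
    }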

What is the new transition matrix?
P(x,y) = T(x,y) a(x,y), if y ≠ x
P(x,x) = 1 - ∑_{z ≠ x} T(x,z) a(x,z)
We know the transition matrix P has stationary distribution π if
π(x) T(x,y) a(x,y) = π(y) T(y,x) a(y,x) for all x ≠ y.

The transition matrix P has stationary distribution π if π(x) T(x,y) a(x,y) = π(y) T(y,x) a(y,x) for all x ≠ y.
Proof: For π to be a stationary distribution of P, we need πP = π, i.e., ∑_x π(x) P(x,y) = π(y). For x ≠ y we have P(x,y) = T(x,y) a(x,y), so
π(x) P(x,y) = π(x) T(x,y) a(x,y) = π(y) T(y,x) a(y,x) = π(y) P(y,x),
and the identity π(x) P(x,y) = π(y) P(y,x) holds trivially when x = y. Therefore
∑_x π(x) P(x,y) = ∑_x π(y) P(y,x) = π(y) ∑_x P(y,x) = π(y),
since the rows of P sum to 1.

Back to the filter
The acceptance probability of a move is a(x,y). Rejecting moves slows the chain, so we want a(x,y) as large as possible. How large can it be? We will force the `nice' condition π(x) T(x,y) a(x,y) = π(y) T(y,x) a(y,x); since T is symmetric, this reduces to π(x) a(x,y) = π(y) a(y,x). Because a(y,x) is a probability, a(y,x) ≤ 1, so a(x,y) = (π(y)/π(x)) a(y,x) ≤ π(y)/π(x); and a(x,y) ≤ 1 since it is itself a probability. Hence a(x,y) ≤ min(π(y)/π(x), 1).

Metropolis chain of a Markov chain
Let T(x,y) be the transition matrix of the original Markov chain. The transition matrix P(x,y) of the Metropolis chain is therefore defined as follows:
P(x,y) = T(x,y) min(π(y)/π(x), 1), if y ≠ x
P(x,x) = 1 - ∑_{z ≠ x} P(x,z)
For this P(x,y), π is a stationary distribution.
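
A short R sketch that builds this P from a symmetric proposal and checks numerically that π is stationary (the helper name and the 3-state example are illustrative):

    # Build the Metropolis transition matrix P from a symmetric proposal Tmat.
    metropolis_matrix <- function(Tmat, pi_target) {
      n <- nrow(Tmat)
      P <- matrix(0, n, n)
      for (x in 1:n) for (y in 1:n) {
        if (y != x) P[x, y] <- Tmat[x, y] * min(pi_target[y] / pi_target[x], 1)
      }
      diag(P) <- 1 - rowSums(P)   # P(x,x) = 1 - sum over z != x of P(x,z)
      P
    }

    Tmat <- matrix(1/3, 3, 3)            # symmetric proposal on 3 states
    pi_target <- c(0.5, 0.3, 0.2)
    P <- metropolis_matrix(Tmat, pi_target)
    as.vector(pi_target %*% P)           # returns 0.5 0.3 0.2, i.e. pi P = pi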

Why all this complication?
Note that the Metropolis chain is `just another Markov chain'. So why not start with it right away, instead of starting with some other Markov chain and then doing this acceptance/rejection?

Reasons for the indirect procedure
1. It may be difficult to build a Markov chain directly with stationary distribution π.
2. The Metropolis chain depends only on the ratio π(y)/π(x). Suppose we know π only up to some normalizing constant M that is difficult to compute; the Metropolis chain does not care what M is.

Application: numbers assigned to the vertices of a graph; find a max-value vertex
If you `greedily' move from any point to a higher neighbour, you can get trapped at a local maximum. Consider a regular graph: the random walk on it has a symmetric transition matrix. For k ≥ 1, define π_k(x) = k^(f(x)) / Z(k), where f(x) is the number assigned to vertex x and Z(k) = ∑_x k^(f(x)), making π_k a probability measure.

Suppose the number of vertices is large. The Markov chain corresponding to π_k is difficult to construct directly, because Z(k) is difficult to compute. But the Metropolis chain of the random-walk Markov chain accepts a transition from x to y with probability min(k^(f(y)-f(x)), 1); in particular, a downhill move with f(y) < f(x) is accepted with probability k^(f(y)-f(x)). We will show that for large k this randomized algorithm ultimately gets to a maximum-value vertex.
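
A runnable R sketch of this search (the cycle graph, the values f, and the choice k = 10 are all made up for illustration):

    # Metropolis search for a max-value vertex on a 6-cycle.
    f <- c(1, 3, 2, 5, 4, 2)   # values on the vertices; the max is at vertex 4
    n <- length(f)
    k <- 10                    # larger k concentrates pi_k on the maximum
    x <- 1
    for (t in 1:1000) {
      # symmetric random-walk proposal: a uniformly chosen neighbour on the cycle
      y <- if (runif(1) < 0.5) (x %% n) + 1 else ((x - 2) %% n) + 1
      # Metropolis acceptance: uphill moves always accepted, downhill moves
      # accepted with probability k^(f(y) - f(x)) < 1
      if (runif(1) < min(k^(f[y] - f[x]), 1)) x <- y
    }
    x                          # ends at vertex 4 with high probability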

Proof that the Metropolis method hits a max-value vertex
Since π_k(x) = k^(f(x))/Z(k), this distribution favours vertices with large values of f(x). Let f* be the maximum value attained at any vertex, and define Ω* = the collection of all vertices with the maximum value. Now π_k(x) can be rewritten as (k^(f(x))/k^(f*)) / (Z(k)/k^(f*)), and Z(k)/k^(f*) can be written as |Ω*| (each vertex of Ω* contributing k^(f*)/k^(f*) = 1) plus a sum over the remaining vertices of k^(f(x)-f*). Since f(x) - f* < 0 for those remaining vertices, each such term tends to 0 as k grows. For large k, therefore, π_k concentrates on Ω*: the total probability of the vertices in Ω* converges to 1 (each getting 1/|Ω*|), i.e., with probability tending to 1 we hit one of the vertices with the maximum value.
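
This concentration is easy to see numerically; a small R sketch (the values f are illustrative):

    f <- c(1, 3, 2, 5, 4, 2)
    for (k in c(1, 2, 10, 100)) {
      pi_k <- k^f / sum(k^f)   # pi_k(x) = k^f(x) / Z(k)
      cat("k =", k, ": ", round(pi_k, 4), "\n")
    }
    # As k grows, pi_k piles up on vertex 4, where f attains its maximum.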