Introduction to Monte Carlo Methods (after D.J.C. MacKay). Presented by Rasit Onur Topaloglu, Ph.D. candidate.


Outline Problems Monte Carlo (MC) deals with; Uniform Sampling; Importance Sampling; Rejection Sampling; Metropolis; Gibbs Sampling; Speeding up MC: Hybrid MC and Over-relaxation

Definition of Problems Problem 1: Generate samples from a distribution. Problem 2: Estimate the expectation of a function under a given probability distribution for a variable.

Monte Carlo for High Dimensions The accuracy is independent of the dimensionality of the space sampled. Solving the first problem solves the second one: just evaluate the function at the samples and average the results.
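A minimal sketch of this estimator in Python (the standard-Gaussian target and the test function φ(x) = x² are illustrative choices, not from the slides):

```python
import random

def mc_expectation(phi, sampler, n_samples=100_000):
    """Estimate E[phi(x)] under P by averaging phi over samples drawn from P."""
    return sum(phi(sampler()) for _ in range(n_samples)) / n_samples

random.seed(0)
# E[x^2] under a standard Gaussian equals its variance, 1.
est = mc_expectation(lambda x: x * x, lambda: random.gauss(0.0, 1.0))
```

The error of the average shrinks like 1/sqrt(n_samples) regardless of dimension, which is the point the slide makes.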

Sampling from P(x) Assume we can evaluate P(x) within a multiplicative constant, i.e. we can compute P*(x) where P(x) = P*(x)/Z. Sampling is still a hard problem because Z is not known, and it is not easy to draw samples at high dimensions (except for special cases such as the Gaussian). Ex: a sample from a univariate Gaussian can be calculated from two variables u1 and u2 that are uniform in [0,1].
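The construction the slide alludes to is presumably the Box-Muller transform; a sketch under that assumption:

```python
import math, random

def gauss_box_muller(u1, u2):
    """Map two independent Uniform(0,1] draws to one standard-normal draw
    via the Box-Muller transform."""
    return math.sqrt(-2.0 * math.log(u1)) * math.cos(2.0 * math.pi * u2)

random.seed(1)
# 1 - random.random() lies in (0, 1], avoiding log(0).
samples = [gauss_box_muller(1.0 - random.random(), random.random())
           for _ in range(50_000)]
mean = sum(samples) / len(samples)
var = sum(x * x for x in samples) / len(samples) - mean * mean
```

The sample mean and variance should come out near 0 and 1, confirming the draws are standard normal.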

Difficulty of Sampling at High Dimensions One can discretize the function (figure on right) and sample from the discrete approximation. This is costly at high dimensions: B^N evaluations for B bins and N dimensions.

Uniform Sampling Tries to solve the 2nd problem. Draw samples uniformly from the state space and estimate by Φ̂ = Σ_r φ(x^(r)) P*(x^(r)) / Z_R, where Z_R = Σ_r P*(x^(r)) is the normalizing constant. For distributions that have peaks in a small region, lots of points must be sampled before φ(x) is evaluated where the probability mass lies => requires lots of samples. Thus, uniform sampling is seldom useful.
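A sketch of this estimator (the sharply peaked Gaussian target is an illustrative choice, showing how wasteful uniform draws are):

```python
import math, random

def uniform_sampling_estimate(phi, p_star, lo, hi, n=200_000, seed=2):
    """Estimate E_P[phi] by drawing x uniformly over [lo, hi], weighting each
    draw by the unnormalised density P*(x); Z_R is the sum of the weights."""
    rng = random.Random(seed)
    zr = acc = 0.0
    for _ in range(n):
        x = rng.uniform(lo, hi)
        w = p_star(x)          # unnormalised probability of this state
        zr += w
        acc += phi(x) * w
    return acc / zr

# Narrow unnormalised Gaussian peaked at 3: most uniform draws carry
# negligible weight, so the effective sample count is a small fraction of n.
p_star = lambda x: math.exp(-0.5 * ((x - 3.0) / 0.1) ** 2)
est = uniform_sampling_estimate(lambda x: x, p_star, -10.0, 10.0)
```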

Importance Sampling Tries to solve the 2nd problem. Introduce a simpler density Q from which samples can be drawn. Values of x where Q(x) > P(x) are over-represented; values of x where Q(x) < P(x) are under-represented. To compensate, introduce weights w_r = P*(x^(r)) / Q*(x^(r)).
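A self-normalised importance-sampling sketch (the particular target and proposal below are illustrative assumptions; the broad proposal covers the target's tails):

```python
import math, random

def importance_sampling(phi, p_star, q_sampler, q_star, n=100_000, seed=3):
    """Draw from Q, weight each sample by w_r = P*(x_r)/Q*(x_r), and
    average phi under those weights (self-normalised estimator)."""
    rng = random.Random(seed)
    num = den = 0.0
    for _ in range(n):
        x = q_sampler(rng)
        w = p_star(x) / q_star(x)
        num += w * phi(x)
        den += w
    return num / den

# Target: unnormalised Gaussian, mean 2, sd 1. Proposal Q: Gaussian(0, 3),
# deliberately broader than P so its tails dominate the target's.
p_star = lambda x: math.exp(-0.5 * (x - 2.0) ** 2)
q_star = lambda x: math.exp(-0.5 * (x / 3.0) ** 2)
est = importance_sampling(lambda x: x, p_star,
                          lambda rng: rng.gauss(0.0, 3.0), q_star)
```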

Reliability of Importance Sampling An importance sampler should have heavy tails in problems where infrequent samples might be influential. (Figure: the estimate Φ̂ vs. number of samples, for a Gaussian sampler and a Cauchy sampler.)

Rejection Sampling Again, a proposal density Q is assumed. Also assume we know a constant c such that c·Q*(x) >= P*(x) for all x. Generate x using Q*(x), then draw a r.v. u uniformly from the interval [0, c·Q*(x)]. If u <= P*(x), accept and add x to the list of samples; otherwise reject.
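A sketch of the procedure (the Gaussian target/proposal pair and c = 1 are illustrative assumptions, chosen so that c·Q*(x) >= P*(x) everywhere):

```python
import math, random

def rejection_sample(p_star, q_sampler, q_star, c, n_accept, seed=4):
    """Rejection sampling: draw x from Q, then u uniform in [0, c*Q*(x)];
    keep x whenever u <= P*(x). Requires c*Q*(x) >= P*(x) for all x."""
    rng = random.Random(seed)
    out = []
    while len(out) < n_accept:
        x = q_sampler(rng)
        u = rng.uniform(0.0, c * q_star(x))
        if u <= p_star(x):
            out.append(x)
    return out

# Target: unnormalised N(1, 0.5). Proposal: N(1, 1); since
# P*(x)/Q*(x) = exp(-1.5*(x-1)^2) <= 1, the constant c = 1 suffices.
p_star = lambda x: math.exp(-0.5 * ((x - 1.0) / 0.5) ** 2)
q_star = lambda x: math.exp(-0.5 * (x - 1.0) ** 2)
samples = rejection_sample(p_star, lambda rng: rng.gauss(1.0, 1.0),
                           q_star, c=1.0, n_accept=20_000)
mean = sum(samples) / len(samples)
```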

Transition to Markov Chain MC Importance and rejection sampling work well if Q is similar to P In complex problems, it is difficult to find a single such Q over the state space

Metropolis Method The proposal density Q depends on the current state x^(t): Q(x'; x^(t)). Accept the new state if the acceptance ratio a = [P*(x') Q(x^(t); x')] / [P*(x^(t)) Q(x'; x^(t))] satisfies a >= 1; else accept with probability a. In comparison to rejection sampling, rejected points are not discarded and hence influence subsequent samples => the samples are correlated => Metropolis may have to be run longer to generate independent samples from P(x)!
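A sketch with a symmetric Gaussian proposal, for which the acceptance ratio reduces to a = P*(x')/P*(x^(t)) (the 1-D Gaussian target and the step size are illustrative assumptions):

```python
import math, random

def metropolis(log_p_star, x0, step, n_steps, seed=5):
    """Metropolis with symmetric proposal Q(x'; x) = N(x, step^2).
    Accept with probability min(1, P*(x')/P*(x)); on rejection the
    current state is *repeated* in the chain, so samples are correlated."""
    rng = random.Random(seed)
    x, chain = x0, []
    for _ in range(n_steps):
        x_new = x + rng.gauss(0.0, step)
        log_a = log_p_star(x_new) - log_p_star(x)
        if log_a >= 0 or rng.random() < math.exp(log_a):
            x = x_new                      # accept
        chain.append(x)                    # a rejected move repeats x
    return chain

# Target: standard Gaussian, known only up to the constant Z.
chain = metropolis(lambda x: -0.5 * x * x, x0=0.0, step=1.0, n_steps=50_000)
mean = sum(chain) / len(chain)
```

Working with log P* avoids underflow and makes the unknown constant Z cancel explicitly.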

Disadvantages of Large Step Size In high dimensions, the proposal must use a length scale ε much smaller than the size L of the state space, because large steps are highly unlikely to be accepted => limited or no movement in space => biased estimates if the run is too short.

A Lower Bound for Independent Samples Metropolis explores the space by a random walk, and random walks take a long time: after T steps of size ε, the state has only moved a distance ~sqrt(T)·ε. As Monte Carlo is trying to achieve independent samples, roughly (L/ε)^2 steps are required to obtain a sample independent of the initial condition. This rule of thumb can be used as a lower bound.
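The sqrt(T) scaling can be checked empirically; this small experiment (all parameters illustrative) quadruples the number of steps and observes that the r.m.s. distance only doubles:

```python
import random

def rms_displacement(step, n_steps, n_walks=2000, seed=6):
    """Root-mean-square displacement of an unbiased 1-D random walk:
    after T steps of size eps it is ~ sqrt(T)*eps, which is why
    ~(L/eps)^2 steps are needed to cross a length scale L."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_walks):
        x = 0.0
        for _ in range(n_steps):
            x += step if rng.random() < 0.5 else -step
        total += x * x
    return (total / n_walks) ** 0.5

d100 = rms_displacement(step=1.0, n_steps=100)   # ~ sqrt(100) = 10
d400 = rms_displacement(step=1.0, n_steps=400)   # ~ sqrt(400) = 20
```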

An Example using this Bound (Figure: state vs. time at the 100th, 400th and 1200th iterations.) Metropolis still provides biased estimates even after a large number of iterations: it takes ~10^2 = 100 steps to reach an end state.

Gibbs Sampling As opposed to the previous methods, an at least 2-dimensional distribution is required. Q is defined in terms of the conditional distributions of the joint distribution P(x). The assumption is that P(x) is too complex to evaluate directly, but the conditionals P(x_i | {x_j}_{j≠i}) are tractable.

Gibbs Sampling on an Example Start with x = (x_1, x_2). Fix x_2^(t) and sample x_1 from P(x_1 | x_2); then fix x_1 and sample x_2 from P(x_2 | x_1).
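A sketch of exactly this two-variable scheme for a bivariate Gaussian with correlation ρ (the value ρ = 0.8 is an illustrative assumption):

```python
import math, random

def gibbs_bivariate_gaussian(rho, n_steps, seed=7):
    """Gibbs sampling for a zero-mean, unit-variance bivariate Gaussian
    with correlation rho: each conditional P(x1|x2) and P(x2|x1) is
    N(rho*other, 1-rho^2), and every proposal is accepted by construction."""
    rng = random.Random(seed)
    x1 = x2 = 0.0
    sd = math.sqrt(1.0 - rho * rho)
    chain = []
    for _ in range(n_steps):
        x1 = rng.gauss(rho * x2, sd)   # sample x1 from P(x1 | x2)
        x2 = rng.gauss(rho * x1, sd)   # sample x2 from P(x2 | x1)
        chain.append((x1, x2))
    return chain

chain = gibbs_bivariate_gaussian(rho=0.8, n_steps=50_000)
# E[x1*x2] under the stationary distribution equals rho.
corr = sum(a * b for a, b in chain) / len(chain)
```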

Gibbs Sampling with K Variables In comparison to Metropolis, every proposal is always accepted. In bigger models, groups of variables are jointly sampled.

Comparison of MC Methods in High Dimensions Importance and rejection sampling suffer from high weights and large constants c respectively, resulting in inaccurate or lengthy simulations => not practical. Metropolis requires at least ~(σ_max/σ_min)^2 samples to acquire independent samples => may be lengthy. Gibbs sampling has similar properties to Metropolis, but has no adjustable parameters => the most practical.

Practical Questions About Monte Carlo Can we predict how long it takes to reach equilibrium? Use the simple bound proposed above. Can we detect convergence in a running simulation? Yet another difficult problem. Can we speed up the convergence time and the time between independent samples?

Reducing Random Walk in Metropolis: Hybrid MC Most probability distributions can be written in the form P(x) = e^{-E(x)} / Z. Introduce a momentum variable p with kinetic energy K(p) = p^T p / 2, and create asymptotically correct samples from the joint distribution P_H(x, p) ∝ e^{-E(x) - K(p)}: pick p randomly from its Gaussian distribution, then update x and p by simulating Hamiltonian dynamics. This replaces the random walk and results in roughly linear-time convergence.
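A sketch of the scheme with leapfrog integration (the 1-D Gaussian energy, step size, and trajectory length are illustrative assumptions):

```python
import math, random

def hybrid_mc(grad_e, energy, x0, eps, n_leapfrog, n_samples, seed=8):
    """Hybrid (Hamiltonian) Monte Carlo sketch for a 1-D target
    P(x) ∝ exp(-E(x)). Momentum p ~ N(0,1) is resampled each iteration,
    (x, p) is evolved by leapfrog steps, and the move is accepted with a
    Metropolis test on the total energy H = E(x) + p^2/2."""
    rng = random.Random(seed)
    x, chain = x0, []
    for _ in range(n_samples):
        p = rng.gauss(0.0, 1.0)
        x_new, p_new = x, p
        p_new -= 0.5 * eps * grad_e(x_new)       # initial half step in p
        for _ in range(n_leapfrog):
            x_new += eps * p_new                 # full step in position
            p_new -= eps * grad_e(x_new)         # full step in momentum
        p_new += 0.5 * eps * grad_e(x_new)       # undo the surplus half step
        dh = (energy(x_new) + 0.5 * p_new ** 2) - (energy(x) + 0.5 * p ** 2)
        if dh <= 0 or rng.random() < math.exp(-dh):
            x = x_new
        chain.append(x)
    return chain

# Standard Gaussian: E(x) = x^2 / 2, dE/dx = x.
chain = hybrid_mc(grad_e=lambda x: x, energy=lambda x: 0.5 * x * x,
                  x0=0.0, eps=0.2, n_leapfrog=10, n_samples=20_000)
mean = sum(chain) / len(chain)
```

Because the trajectory moves n_leapfrog steps coherently before the accept test, successive samples decorrelate far faster than under random-walk Metropolis.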

An Illustrative Example for Hybrid MC (Figure: hybrid vs. random-walk trajectories.) Over a number of iterations, the hybrid trajectories indicate less correlated samples.

Reducing Random Walk in Gibbs: Over-relaxation Use the former value x^(t) as well when computing x^(t+1). Useful for Gaussian conditionals; not straightforward for other types of conditional distributions. Suitable for speeding up sampling when the variables are highly correlated.
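A sketch of Adler-style over-relaxation applied to the bivariate-Gaussian Gibbs sampler (the values ρ = 0.9 and α = -0.9 are illustrative assumptions):

```python
import math, random

def overrelaxed_gibbs(rho, alpha, n_steps, seed=9):
    """Over-relaxation for the bivariate-Gaussian Gibbs sampler: instead
    of an independent draw from the conditional N(mu, s^2), use
        x' = mu + alpha*(x - mu) + s*sqrt(1 - alpha^2)*nu,  nu ~ N(0,1),
    with alpha in (-1, 0); alpha = 0 recovers ordinary Gibbs. The update
    leaves N(mu, s^2) invariant while pushing x to the opposite side of mu."""
    rng = random.Random(seed)
    x1 = x2 = 0.0
    s = math.sqrt(1.0 - rho * rho)
    coef = math.sqrt(1.0 - alpha * alpha)
    chain = []
    for _ in range(n_steps):
        mu = rho * x2
        x1 = mu + alpha * (x1 - mu) + s * coef * rng.gauss(0.0, 1.0)
        mu = rho * x1
        x2 = mu + alpha * (x2 - mu) + s * coef * rng.gauss(0.0, 1.0)
        chain.append((x1, x2))
    return chain

chain = overrelaxed_gibbs(rho=0.9, alpha=-0.9, n_steps=50_000)
# The stationary marginal of x1 remains N(0, 1).
var1 = sum(a * a for a, _ in chain) / len(chain)
```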

An Illustrative Example for Over-Relaxation (Figure: state space of a bivariate Gaussian over 40 iterations.) The over-relaxed samples cover the state space better.

Reducing Random Walks: Simulated Annealing Introduce a temperature parameter T and gradually reduce it to 1. A high T makes transitions between states easier. As opposed to its use in optimization, T is not reduced to 0 but to 1, so the target distribution is eventually sampled.

Applications of Monte Carlo: Differential Equations Ex: the steady-state temperature distribution of an annulus. The finite-difference approximation on a grid of spacing h expresses each interior value as the average of its four neighbours, so a random walker selects one of the four neighbouring states with probability 1/4 each. The boundary value reached by the walk is recorded, and the mean over many such runs gives the solution at the starting point.

Applications of Monte Carlo: Integration To evaluate an integral, bound the function with a box. The ratio of the points falling under the function to all points within the box equals the ratio of the area under the function to the area of the box.
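A hit-or-miss sketch of this idea (the integrand x² on [0, 1] is an illustrative choice):

```python
import random

def hit_or_miss_area(f, lo, hi, f_max, n=200_000, seed=10):
    """Hit-or-miss integration: bound f by the box [lo, hi] x [0, f_max];
    the fraction of uniform points landing under the curve, times the
    box area, estimates the integral of f over [lo, hi]."""
    rng = random.Random(seed)
    hits = sum(1 for _ in range(n)
               if rng.uniform(0.0, f_max) <= f(rng.uniform(lo, hi)))
    return (hits / n) * (hi - lo) * f_max

# Integral of x^2 over [0, 1] is 1/3.
est = hit_or_miss_area(lambda x: x * x, 0.0, 1.0, f_max=1.0)
```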

Applications of Monte Carlo: Image Processing Automatic eye-glass removal: MCMC is used instead of gradient-based methods to solve the MAP criterion that locates the points of the eye-glasses. [C. Wu, C. Liu, H.-Y. Shum, Y.-Q. Xu and Z. Zhang, "Automatic Eyeglasses Removal from Face Images", IEEE Trans. on Pattern Analysis and Machine Intelligence, Vol. 26, No. 3, Mar. 2004]

Applications of Monte Carlo: Image Segmentation Data-driven techniques such as edge detection, tracing, and clustering are combined with MCMC to speed up the search. [Z. Tu and S.-C. Zhu, "Image Segmentation by Data-Driven Markov Chain Monte Carlo", IEEE Trans. on Pattern Analysis and Machine Intelligence, Vol. 24, No. 5, May 2002]

Forward Discrete Probability Propagation Rasit Onur Topaloglu Ph.D. candidate

The Problem The tree relates physical parameters to circuit parameters, structured according to the SPICE formula hierarchy. Given pdf's for the lowest-level parameters, find the pdf's at the highest level. Ex: gm = sqrt(2·k·Id).

Motivation for Probability Propagation Find a novel propagation method: estimation of the distributions of high-level parameters is needed to examine the effects of process variations, and the Gaussian assumption attributed to these parameters is no longer accurate in current technologies. GOALS: Determinism: a stochastic output using known formulas. Algebraic tractability: enabling manual applicability. Speed and accuracy: comparable to or outperforming Monte Carlo and other parametric methods.

Parametric Belief Propagation Each node receives messages from and sends messages to its parents and children until equilibrium. Parent to child (π): causal information. Child to parent (λ): diagnostic information. Calculations are handled at each node.

Parametric Belief Propagation When the arrows in the hierarchy tree indicate linear addition operations on Gaussians, analytic formulations are possible. This is not straightforward for other distributions or non-standard distributions.

Shortcomings of Monte Carlo Non-determinism: not manually applicable. Limited to certain distributions: random number generators in most CAD packages only provide certain distributions. Accuracy: may miss points that are less likely to occur due to random sampling unless a very large number of samples is used; limited by the performance of the random number generator.

Monte Carlo - FDPP Comparison (Figure: nodes P1-P4 with one-to-many relationships and custom pdf's.) Non-standard pdf's are not possible without a custom random number generator, and Monte Carlo overestimates in one-to-many relationships, as the same sample is reused.

Operations Necessary to Implement FDPP Analytic operation on continuous distributions is difficult; instead, operations on discrete distributions are implemented: F (Forward): given a function, estimates the distribution of the next node in the formula hierarchy. Q (Quantize): discretizes a pdf to operate on its samples. B (Band-pass): eliminates unlikely samples for computational efficiency. R (Re-bin): reduces the number of samples for computational efficiency.

Necessary Operators (Q, F, B, R) on a Hierarchical Tree (Figure: Q applied to T and NSUB, F producing PHIf, followed by B and R.) Repeated until we acquire the high-level distribution (ex. G).

Probability Discretization Theory: the Q_N Operator; p- and r-domains Q_N band-pass filters pdf(X) (p-domain) and divides it into bins, producing the discrete spdf(X) (r-domain); the N in Q_N indicates the number of bins. Certain operators are easier to apply in the r-domain.

Characterizing an spdf In the r-domain we can write spdf(X) as a train of impulses: spdf(X) = Σ_i p_i δ(x - w_i), where p_i is the probability of the i'th impulse and w_i is the value of the i'th impulse.

F Operator The F operator implements a function over spdf's. X_i, Y: random variables; S_X: the set of all samples s belonging to X. The function is applied to the individual impulse values, and the individual probabilities are multiplied.

Band-pass Operator B_e Margin-based definition: eliminate samples whose values fall out of range. Error-based definition: eliminate samples whose probabilities are least likely to occur.

Re-bin Operator R_N Samples falling into the same bin are congregated into one impulse at the bin center, with probability equal to the sum of the impulse probabilities within the bin. (Figure: impulses after F, united into one impulse per bin, yielding the resulting spdf(X).)
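A toy sketch of the F, B, and R operators on impulse lists (the (probability, value) representation and all names here are hypothetical illustrations, not the authors' implementation):

```python
from itertools import product

# An spdf is a list of (probability, value) impulses summing to 1.

def f_op(func, *spdfs):
    """F: apply func to every combination of input impulses; each output
    impulse sits at func(values) with the product of the probabilities."""
    out = {}
    for combo in product(*spdfs):
        p = 1.0
        for pi, _ in combo:
            p *= pi
        w = func(*(wi for _, wi in combo))
        out[w] = out.get(w, 0.0) + p
    return [(p, w) for w, p in out.items()]

def b_op(spdf, error_rate):
    """B (error-based): drop impulses below error_rate * max probability,
    then renormalise the survivors."""
    p_max = max(p for p, _ in spdf)
    kept = [(p, w) for p, w in spdf if p >= error_rate * p_max]
    z = sum(p for p, _ in kept)
    return [(p / z, w) for p, w in kept]

def r_op(spdf, n_bins):
    """R: congregate impulses falling into the same bin into one impulse
    at the bin centre carrying the summed probability."""
    lo = min(w for _, w in spdf)
    hi = max(w for _, w in spdf)
    width = (hi - lo) / n_bins or 1.0
    bins = {}
    for p, w in spdf:
        i = min(int((w - lo) / width), n_bins - 1)
        bins[i] = bins.get(i, 0.0) + p
    return [(p, lo + (i + 0.5) * width) for i, p in sorted(bins.items())]

# Propagate y = x1 + x2 through two 3-impulse spdf's, then compress.
x = [(0.25, -1.0), (0.5, 0.0), (0.25, 1.0)]
y = r_op(b_op(f_op(lambda a, b: a + b, x, x), error_rate=0.01), n_bins=5)
total = sum(p for p, _ in y)   # probabilities still sum to 1
```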

Error Analysis If the quantizer is uniform and the bin width Δ is small, the quantization error Q is a uniformly distributed random variable with variance σ_Q^2 = Δ^2/12. The distortion caused by representing the samples in a bin by a single sample is measured against m_i, the center of the i'th bin; the total distortion sums these contributions over all bins.

Algorithm Implementing the F Operator
While each random variable has its spdf computed:
  For each r.v. which has all ancestor spdf's computed:
    For each sample in X_1:
      ...
      For each sample in X_r:
        Place an impulse with height p_1 · ... · p_r at x = f(v_1, ..., v_r)
    Apply the B and R algorithms to this r.v.

Algorithm for the B and R Operators
B (band-pass):
  Find the maximum and minimum values w_i within the impulses
  Divide this range into M bins
  For each bin:
    Place a quantizing impulse at the center of the bin with a height p_i equal to the sum of all impulses within the bin
  Find the maximum probability, p_i-max, of the quantized impulses within the bins
  Eliminate impulses within bins which have a quantized impulse with smaller probability than error-rate * p_i-max
R (re-bin):
  Find the new maximum and minimum values w_i within the remaining impulses
  Divide this range into N bins
  For each bin:
    Place an impulse at the center of the bin with height equal to the sum of all impulses within the bin

Monte Carlo - FDPP Comparison (Figure: pdf of V_th and pdf of I_D; solid: FDPP, dotted: Monte Carlo.) A close match is observed after interpolation.

Monte Carlo - FDPP Comparison with a Low Sample Number (Figure: pdf of Φ_F; solid: FDPP, 100 samples; noisy: Monte Carlo, 1000 and more samples respectively.) Monte Carlo is inaccurate for a moderate number of samples, which indicates FDPP can be applied manually without major accuracy degradation.

Monte Carlo - FDPP Comparison (Figure: benchmark example and pdf of n7; solid: FDPP, dotted: Monte Carlo, triangles: belief propagation.) Edges define a linear sum, ex: n5 = n2 + n3.

Faulty Application of Monte Carlo The distributions at the internal nodes n4, n5, n6 would have to be re-sampled using Monte Carlo, which is not optimal for internal nodes with non-standard distributions. (Figure: benchmark example and pdf of n7; solid: FDPP, dotted: Monte Carlo, triangles: belief propagation.)

Conclusions Forward Discrete Probability Propagation is introduced as an alternative to Monte Carlo based methods. FDPP should be preferred when low-probability samples are important, algebraic intuition is needed, non-standard pdf's are present, or one-to-many relationships are present.