Improved Cross Entropy Method For Estimation Presented by: Alex & Yanna.

Presentation transcript:

Improved Cross Entropy Method For Estimation Presented by: Alex & Yanna

This presentation is based on the paper "Improved Cross-Entropy Method for Estimation" by Dirk P. Kroese & Joshua C. Chan.

Rare Events Estimation

We wish to estimate $\ell = \mathbb{P}(S(\mathbf{X}) \ge \gamma)$, where $\mathbf{X}$ is a random vector taking values in some set $\mathcal{X}$ and $S$ is a function on $\mathcal{X}$.

Rare Events Estimation We can rewrite it as $\ell = \mathbb{E}_f\left[I\{S(\mathbf{X}) \ge \gamma\}\right]$ and estimate it with the crude Monte Carlo estimator $\hat{\ell}_{\mathrm{CMC}} = \frac{1}{N}\sum_{i=1}^{N} I\{S(\mathbf{X}_i) \ge \gamma\}$, where $\mathbf{X}_1,\dots,\mathbf{X}_N \sim f$.
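To make the crude Monte Carlo idea concrete, here is a minimal Python sketch. The setup (10 i.i.d. N(0,1) components with S(x) equal to their sum) and the names crude_mc, sample_x, gamma, and N are illustrative assumptions, not the example from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)

def crude_mc(S, sample_x, gamma, N=100_000):
    """Crude Monte Carlo estimate of P(S(X) >= gamma)."""
    X = sample_x(N)                 # N draws of X from the nominal density f
    return np.mean(S(X) >= gamma)   # fraction of draws that hit the rare event

# Illustrative setup: X has 10 i.i.d. N(0,1) components, S(x) = sum of the components.
n = 10
sample_x = lambda N: rng.standard_normal((N, n))
S = lambda X: X.sum(axis=1)
print(crude_mc(S, sample_x, gamma=20.0))  # almost surely prints 0.0: the event is far too rare
```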

Rare Events Estimation Let's say, for example, that the threshold $\gamma$ is large, so that $\ell$ is very small; comparing a direct calculation of $\ell$ with the crude Monte Carlo simulation shows that the simulated estimate is typically zero, or has an enormous relative error, at any practical sample size.

Rare Events Estimation

Importance Sampling

Let $g$ be an importance density with $g(\mathbf{x}) > 0$ whenever $f(\mathbf{x})\, I\{S(\mathbf{x}) \ge \gamma\} > 0$. Then $\ell = \mathbb{E}_g\left[I\{S(\mathbf{X}) \ge \gamma\}\, \frac{f(\mathbf{X})}{g(\mathbf{X})}\right]$, and the importance sampling estimator will be $\hat{\ell}_{\mathrm{IS}} = \frac{1}{N}\sum_{i=1}^{N} I\{S(\mathbf{X}_i) \ge \gamma\}\, \frac{f(\mathbf{X}_i)}{g(\mathbf{X}_i)}$, where $\mathbf{X}_1,\dots,\mathbf{X}_N \sim g$.
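Continuing the same illustrative sum-of-normals setup (an assumption, not the slides' example), here is a sketch of the importance sampling estimator where the importance density g shifts each component's mean by a hypothetical amount mu_shift:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

def is_estimate(gamma, n=10, mu_shift=2.0, N=100_000):
    """Importance sampling estimate of P(sum(X) >= gamma) for X_i i.i.d. N(0,1),
    using g = product of N(mu_shift, 1) densities as the importance density."""
    X = rng.normal(loc=mu_shift, scale=1.0, size=(N, n))
    # log likelihood ratio log f(X) - log g(X), summed over the n components
    log_w = (stats.norm.logpdf(X) - stats.norm.logpdf(X, loc=mu_shift)).sum(axis=1)
    hits = X.sum(axis=1) >= gamma
    return float(np.mean(hits * np.exp(log_w)))

print(is_estimate(gamma=20.0))  # a small but nonzero estimate, unlike the crude estimator
```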

Importance Sampling What would be a good choice for the importance density $g$?

Importance Sampling We take a look at the Kullback–Leibler divergence $\mathcal{D}(g^*, f_{\mathbf{v}}) = \mathbb{E}_{g^*}\left[\ln \frac{g^*(\mathbf{X})}{f_{\mathbf{v}}(\mathbf{X})}\right]$, where $g^*(\mathbf{x}) = f(\mathbf{x})\, I\{S(\mathbf{x}) \ge \gamma\}/\ell$ is the zero-variance density and $f_{\mathbf{v}}$ is the density from the parametric family $\{f_{\mathbf{v}}\}$ with parameter $\mathbf{v}$. The CE method chooses $\mathbf{v}$ to minimize this divergence.

CE Algorithm In the article, two problematic issues are mentioned regarding the multilevel CE: (1) the parametric family within which the optimal importance density $g^*$ is obtained might not be large enough; (2) when the dimension of the problem is large, the likelihood ratio involved in obtaining the reference parameter becomes unstable.

Solution: sample directly from $g^*$.

Importance Sampling Our goal is to find $\mathbf{v}^* = \operatorname{argmax}_{\mathbf{v}} \mathbb{E}_{g^*}\left[\ln f_{\mathbf{v}}(\mathbf{X})\right]$ (deterministic version), or its stochastic version $\hat{\mathbf{v}}^* = \operatorname{argmax}_{\mathbf{v}} \frac{1}{N}\sum_{i=1}^{N} \ln f_{\mathbf{v}}(\mathbf{X}_i)$ with $\mathbf{X}_1,\dots,\mathbf{X}_N \sim g^*$.

Importance Sampling But how are we supposed to sample from $g^*$?

Importance Sampling Since $g^*(\mathbf{x}) \propto f(\mathbf{x})\, I\{S(\mathbf{x}) \ge \gamma\}$ is known up to its normalizing constant $\ell$, this observation grants us the opportunity to apply the useful tool of Gibbs sampling.

Gibbs Sampler In Brief

The Gibbs sampler is an algorithm to generate a sequence of samples from a joint probability distribution. Gibbs sampling is a special case of the Metropolis–Hastings algorithm, and thus an example of a Markov chain Monte Carlo algorithm. It is applicable when the joint distribution is not known explicitly, but the conditional distribution of each variable is known. It can be shown that the sequence of samples constitutes a Markov chain, and the stationary distribution of that Markov chain is just the sought-after joint distribution.

Gibbs Sampler In Brief The Gibbs sampler algorithm: given the current state $\mathbf{X}^{(t)} = (X_1^{(t)},\dots,X_n^{(t)})$, generate $X_1^{(t+1)} \sim g(x_1 \mid X_2^{(t)},\dots,X_n^{(t)})$, then $X_2^{(t+1)} \sim g(x_2 \mid X_1^{(t+1)}, X_3^{(t)},\dots,X_n^{(t)})$, and so on up to $X_n^{(t+1)} \sim g(x_n \mid X_1^{(t+1)},\dots,X_{n-1}^{(t+1)})$; return $\mathbf{X}^{(t+1)}$.
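A minimal Python sketch of a systematic-scan Gibbs sampler, assuming a user-supplied callback sample_conditional(i, x) that draws component i from its full conditional; the function names and the burn-in length are illustrative choices, not part of the slides.

```python
import numpy as np

def gibbs_chain(x0, sample_conditional, n_iter=1_000, burn_in=200):
    """Systematic-scan Gibbs sampler.

    At each iteration, every component is resampled in turn from its full
    conditional, given the most recent values of the other components."""
    x = np.array(x0, dtype=float)
    draws = []
    for t in range(n_iter):
        for i in range(len(x)):
            x[i] = sample_conditional(i, x)   # uses the freshest values of x[j], j != i
        if t >= burn_in:
            draws.append(x.copy())
    return np.array(draws)
```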

Improved Cross Entropy

Improved Cross Entropy The improved CE consists of 3 steps: 1. Generate, via the Gibbs sampler, $N$ random vectors $\mathbf{X}_1,\dots,\mathbf{X}_N$ approximately distributed according to $g^*$. 2. Solve $\hat{\mathbf{v}}^* = \operatorname{argmax}_{\mathbf{v}} \sum_{i=1}^{N} \ln f_{\mathbf{v}}(\mathbf{X}_i)$. 3. Estimate $\ell$ via importance sampling with importance density $f_{\hat{\mathbf{v}}^*}$.

Improved Cross Entropy Consider a random vector $\mathbf{X} = (X_1,\dots,X_n)$ with density $f$ and a performance function $S$ for which the event $\{S(\mathbf{X}) \ge \gamma\}$ is rare, and suppose we would like to estimate $\ell$ under the improved cross entropy scheme.

Improved Cross Entropy Let us set the model parameters and apply the newly proposed algorithm.

Improved Cross Entropy Step 1 – generate random vectors from $g^*$. First we need to find the conditional densities $g^*(x_i \mid \mathbf{x}_{-i})$ required by the Gibbs sampler.

Improved Cross Entropy Step 1 – generate random vectors from $g^*$ (cont.). Set an initial state inside the rare-event region; for $t = 1,\dots,T$ and $i = 1,\dots,n$, generate $X_i^{(t)}$ from the conditional density $g^*(x_i \mid X_1^{(t)},\dots,X_{i-1}^{(t)}, X_{i+1}^{(t-1)},\dots,X_n^{(t-1)})$, and set $\mathbf{X}^{(t)} = (X_1^{(t)},\dots,X_n^{(t)})$.
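As a hedged illustration of this step, for the assumed sum-of-independent-normals example used in the sketches above (not necessarily the model on the slide), the Gibbs sampler for g* reduces to drawing each component from a one-sided truncated standard normal via the inverse transform:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)

def sample_gstar_gibbs(n=10, gamma=20.0, n_iter=1_000, burn_in=200):
    """Gibbs sampler for g*(x) proportional to prod_i phi(x_i) * I{x_1+...+x_n >= gamma},
    the zero-variance density of the illustrative sum-of-normals example.

    The full conditional of x_i given the rest is N(0,1) truncated to
    [gamma - sum_{j != i} x_j, inf); it is drawn by inverse transform on the upper tail."""
    x = np.full(n, gamma / n)               # start inside the rare-event region
    draws = []
    for t in range(n_iter):
        for i in range(n):
            lower = gamma - (x.sum() - x[i])             # truncation point for component i
            tail = stats.norm.sf(lower)                  # upper-tail mass beyond it
            x[i] = stats.norm.isf(rng.uniform() * tail)  # truncated-normal draw
        if t >= burn_in:
            draws.append(x.copy())
    return np.array(draws)
```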

Improved Cross Entropy Step 2 – Solve the optimization problem
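Under the assumption (illustrative only) that the parametric family is the mean-shifted normal family N(v, I), Step 2 has a closed-form solution: the componentwise sample mean of the Gibbs draws.

```python
import numpy as np

def ce_update_normal_means(gstar_draws):
    """CE / maximum likelihood update when the parametric family is N(v, I):
    the maximizer of sum_i log f(X_i; v) over v is the componentwise sample mean."""
    return gstar_draws.mean(axis=0)

# Usage with the Gibbs sketch above (illustrative setup only):
# v_hat = ce_update_normal_means(sample_gstar_gibbs())
```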

Improved Cross Entropy Step 3 – Estimate via importance sampling
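A sketch of Step 3 under the same illustrative normal family, reusing the fitted reference parameter v_hat from Step 2; all names and values are assumptions for the example.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)

def improved_ce_estimate(v_hat, gamma, N=100_000):
    """Step 3: importance sampling with the fitted density f(.; v_hat) = N(v_hat, I)
    for the illustrative sum-of-normals example."""
    n = len(v_hat)
    X = rng.normal(loc=v_hat, scale=1.0, size=(N, n))
    log_w = (stats.norm.logpdf(X) - stats.norm.logpdf(X, loc=v_hat)).sum(axis=1)
    hits = X.sum(axis=1) >= gamma
    return float(np.mean(hits * np.exp(log_w)))

# ell_hat = improved_ce_estimate(v_hat, gamma=20.0)
```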

Improved Cross Entropy Multilevel CE Vs. Improved CE

Improved Cross Entropy

Multilevel CE: $N$ samples per iteration, repeated over a number of iterations, for a total budget of $N$ times the number of iterations. Improved CE (Gibbs sampler): 10 parallel chains, each of length 1000, for a total budget of 10,000 draws.

The model has $n$ obligors. For each obligor we have the probability that it defaults for a given threshold, and the monetary loss incurred if it defaults.
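As a small, hedged illustration of the loss variable implied by these quantities (the names defaults and c are hypothetical and the numbers are made up), the portfolio loss is just the sum of the losses of the defaulting obligors:

```python
import numpy as np

def portfolio_loss(default_indicators, loss_given_default):
    """Total monetary loss L = sum_i c_i * 1{obligor i defaults}."""
    return float(np.dot(default_indicators, loss_given_default))

# Illustrative values (made up): 5 obligors, two of which default.
defaults = np.array([1, 0, 0, 1, 0])
c = np.array([3.0, 1.5, 2.0, 4.0, 0.5])
print(portfolio_loss(defaults, c))   # 7.0
```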

t-Copula Model

Known methods for the rare-event estimation in this model:
Exponential Change of Measure (ECM): bounded relative error; needs to generate RVs from a non-standard distribution.
Hazard Rate Twisting: logarithmically efficient; about 10 times more variance reduction than ECM.

The Improved CE for Estimating the Prob. of a Rare Loss

Step I – Sampling from g*

Sampling From g* Now we show how to find the conditional densities of $g^*$, so that the Gibbs sampler can be applied to generate random vectors from $g^*$.

Sampling From g* Define the relevant quantities and arrange them in ascending order. Let the ordered values be denoted accordingly, together with the corresponding losses. Then the event $\{L \ge \gamma\}$ occurs iff the corresponding condition on the ordered values holds, and the required variable is generated via the inverse transform.

Sampling From g* The remaining components follow a multivariate truncated normal distribution: we sequentially draw each one from its conditional distribution, using the truncated conditional if the constraint is binding for that component and the untruncated conditional otherwise.
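As a hedged sketch of this "if truncated, else untruncated" branch (the exact conditional means and variances of the model are not spelled out here, and all argument names are hypothetical), here is an inverse-transform helper for drawing one normal component:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)

def draw_normal_component(mu, sigma, lower=None):
    """Draw one component from N(mu, sigma^2); if `lower` is given, the draw is
    truncated to [lower, inf) using the inverse transform on the upper tail."""
    if lower is None:
        return rng.normal(mu, sigma)                      # constraint not active
    tail = stats.norm.sf(lower, loc=mu, scale=sigma)      # mass beyond the truncation point
    return float(stats.norm.isf(rng.uniform() * tail, loc=mu, scale=sigma))
```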

After we have obtained the draws from $g^*$, we are ready to move to the next step…

Step II – Solving the Opt. Problem

Solving the Optimization Problem In our model, the parametric family consists of product-form densities, with one factor per component.

Solving the Optimization Problem Since any member of the family is a product of densities, standard techniques of maximum likelihood estimation can be applied to find the optimal $\mathbf{v}^*$.
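For example, if (purely as an assumption for illustration) the factors for the default indicators are Bernoulli, the componentwise MLE is just the per-component empirical frequency computed from the g* draws:

```python
import numpy as np

def fit_product_bernoulli(gstar_draws):
    """Componentwise MLE for a product-of-Bernoulli family: the maximizer of
    sum_i log f(X_i; v) is the per-component empirical frequency of ones."""
    return gstar_draws.mean(axis=0)

# Example: rows are (approximate) draws of binary indicators from g*.
draws = np.array([[1, 1, 0],
                  [1, 0, 1],
                  [1, 1, 1]])
print(fit_product_bernoulli(draws))   # [1.  0.667  0.667] (approximately)
```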

Solving the Optimization Problem Once we obtain the optimal importance density $f_{\hat{\mathbf{v}}^*}$, we move on to Step 3.

Step III – Importance Sampling

Importance Sampling

Some Results

Pros and Cons of the Improved CE. Pros: handles rare events; only 3 basic steps; appropriate in multi-dimensional settings; requires less simulation effort than the multilevel CE. Cons: problematic for a general performance function, where finding the conditional densities of $g^*$ is not trivial; the Gibbs sampler requires warm-up (burn-in) time.

Further research: a Gibbs sampler for the general performance function; applying Sequential Monte Carlo methods for sampling from $g^*$.