Statistical modelling and latent variables (2): Mixing latent variables and parameters in statistical inference
Trond Reitan (Division of Statistics and Insurance Mathematics, Department of Mathematics, University of Oslo)

State spaces

We typically have a parametric model for the latent variables, representing the true state of a system. The distribution of the observations may depend on parameters as well as on the latent variables; observations may often be seen as noisy versions of the actual state of the system.

[Figure: graphical model linking parameters θ, latent variables L and data D.]

Examples of states could be:
1. The physical state of a rocket (position, orientation, velocity, fuel state).
2. Real water temperature (as opposed to measured temperature).
3. Occupancy in an area.
4. Carrying capacity in an area.

Green arrows denote a one-way parametric dependency (for which you do not provide a probability distribution in frequentist statistics).
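To make the parameters → latent state → data structure concrete, here is a minimal simulation sketch for example 2 (not from the slides; all values are made up): the true water temperature is the latent state L, and the measurements are the data D.

```python
import numpy as np

rng = np.random.default_rng(seed=1)

# Parameters (theta) -- illustrative values only.
n_days = 100
sigma_state = 0.3   # day-to-day variation in the true temperature (latent state L)
sigma_obs = 0.5     # measurement noise linking L to the observations D

# Latent state L: true water temperature, here modelled as a random walk.
true_temp = np.empty(n_days)
true_temp[0] = 10.0
for t in range(1, n_days):
    true_temp[t] = true_temp[t - 1] + rng.normal(0.0, sigma_state)

# Data D: noisy measurements of the true temperature.
measured_temp = true_temp + rng.normal(0.0, sigma_obs, size=n_days)
```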

Observations, latent variables and parameters - inference

Sometimes we are interested in the parameters, sometimes in the state of the latent variables, and sometimes in both. It is impossible to do inference on the latent variables without also dealing with the parameters, and vice versa. Often, the parameters that affect the latent variables are different from those that affect the observations.

[Figure: graphical models with parameters θ, latent variables L and data D; in the second diagram, separate parameter sets act on L and on D.]

Observations, latent variables and parameters - ML estimation

A latent variable model specifies the distribution of the latent variables given the parameters, and the distribution of the observations given both the parameters and the latent variables. Together these give the joint distribution of data *and* latent variables:

f(D, L | θ) = f(L | θ) f(D | L, θ)

But in an ML analysis, we want the likelihood, f(D | θ)! Theory (law of total probability again):

f(D | θ) = ∫ f(D, L | θ) dL = ∫ f(L | θ) f(D | L, θ) dL

[Figure: graphical model θ → L → D.]

Observations, latent variables and parameters - ML estimation

Likelihood: L(θ) = f(D | θ) = ∫ f(L | θ) f(D | L, θ) dL (a sum when the latent variables are discrete).

The integral can often not be obtained analytically. In the occupancy model, the sum is easy (only two possible states per area). Kalman filter: when the latent variables form a linear normal Markov chain and the observations are normal and depend linearly on them, the integral can be done analytically. Alternatives when analytical methods fail: numerical integration, particle filters, Bayesian statistics using MCMC.
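As an illustration of the "numerical integration" alternative (a sketch, not from the slides; the model and values are made up): a single normal latent variable L with normal observations, where f(D | θ) = ∫ f(L | θ) f(D | L, θ) dL is approximated on a grid.

```python
import numpy as np
from scipy.stats import norm

# Illustrative parameter values (theta): mean and spread of the latent state,
# plus the observation noise. These are assumptions, not values from the slides.
mu_L, sigma_L, sigma_obs = 5.0, 1.0, 0.5
data = np.array([5.3, 6.1, 5.8])   # made-up observations of the same latent state

# Grid approximation of f(D|theta) = integral of f(L|theta) * f(D|L,theta) dL.
L_grid = np.linspace(mu_L - 6 * sigma_L, mu_L + 6 * sigma_L, 2001)
dL = L_grid[1] - L_grid[0]

prior_L = norm.pdf(L_grid, mu_L, sigma_L)                                   # f(L|theta)
lik_given_L = np.prod(norm.pdf(data[:, None], L_grid, sigma_obs), axis=0)   # f(D|L,theta)

likelihood = np.sum(prior_L * lik_given_L) * dL                             # f(D|theta)
print(likelihood)
```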

Occupancy as a state-space model – the model in words Assume a set areas, i  (1,…,A). Each area has a set of n i transects. Each transect has an independent detection probability, p, given the occupancy. Occupancy is a latent variable for each area,  i. Assume independency between the occupancy state in different areas. The probability of occupancy is labelled . So, the parameters are  =(p,  ). Pr(  i =1|  )= . Start with distribution of observations given the latent variable: Pr(x i,j =1 |  i =1,  )=p. Pr(x i,j =0 |  i =1,  )=1-p, Pr(x i,j =1 |  i =0,  )=0. Pr(x i,j =0 |  i =0,  )=1. So, for 5 transects with outcome 00101, we will get Pr(00101 |  i =1,  )=(1-p)(1-p)p(1-p)p=p 2 (1-p) 3. Pr(00101 |  i =0,  )=1  1  0  1  0=0

Occupancy as a state-space model – graphic model

[Figure: graphical model with one latent variable per area (area occupancy), ψ_1, ψ_2, ψ_3, …, ψ_A, each generating its transect detections (x_{1,1}, x_{1,2}, …, x_{1,n_1} shown for area 1); the parameters ψ and p point to the occupancy states and the detections, respectively.]

Parameters (θ): ψ = occupancy rate, p = detection rate given occupancy.
Pr(ψ_i = 1 | θ) = ψ,  Pr(ψ_i = 0 | θ) = 1 - ψ.
Pr(x_{i,j} = 1 | ψ_i = 1, θ) = p,  Pr(x_{i,j} = 0 | ψ_i = 1, θ) = 1 - p,
Pr(x_{i,j} = 1 | ψ_i = 0, θ) = 0,  Pr(x_{i,j} = 0 | ψ_i = 0, θ) = 1.

Data: detections in single transects. The area occupancies are independent, and the detections are independent *conditioned* on the occupancy. It is important to keep such things in mind when modelling! PS: What we have done so far is enough to start analysing using WinBUGS.
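The same generative structure can be simulated directly, which is a useful sanity check before fitting (a sketch, not from the slides; the number of areas, transects and the parameter values are made up):

```python
import numpy as np

rng = np.random.default_rng(seed=2)

# Illustrative parameter values and dimensions.
psi, p = 0.6, 0.6
A, n_transects = 10, 5

# Latent occupancy state psi_i per area, then detections x_ij given that state.
occupancy = rng.binomial(1, psi, size=A)
detections = rng.binomial(1, p * occupancy[:, None],   # x_ij = 0 whenever psi_i = 0
                          size=(A, n_transects))
print(detections)
```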

Occupancy as a state-space model – probability distribution for a set of transects

The probability for the transects of area i to give k_i > 0 detections in a given order is
Pr(data_i | ψ_i = 1, θ) = p^{k_i} (1-p)^{n_i - k_i},  Pr(data_i | ψ_i = 0, θ) = 0,
while with no detections (k_i = 0),
Pr(data_i | ψ_i = 1, θ) = (1-p)^{n_i},  Pr(data_i | ψ_i = 0, θ) = 1.

We can represent this more compactly if we introduce the indicator function: I(A) = 1 if A is true, I(A) = 0 if A is false. Then
Pr(data_i | ψ_i, θ) = ψ_i p^{k_i} (1-p)^{n_i - k_i} + (1 - ψ_i) I(k_i = 0).

With no given order on the k_i detections, we pick up the binomial coefficient:
Pr(k_i | ψ_i, θ) = C(n_i, k_i) [ψ_i p^{k_i} (1-p)^{n_i - k_i} + (1 - ψ_i) I(k_i = 0)].
(This coefficient is not relevant for inference; for a given dataset, the constant is just "sitting" there.)

Occupancy as a state-space model – area-specific marginal detection probability (likelihood)

For a given area with an unknown occupancy state, the detection probability is then (law of total probability):
Pr(data_i | θ) = Pr(data_i | ψ_i = 1, θ) Pr(ψ_i = 1 | θ) + Pr(data_i | ψ_i = 0, θ) Pr(ψ_i = 0 | θ)
             = ψ p^{k_i} (1-p)^{n_i - k_i} + (1 - ψ) I(k_i = 0).

The occupancy model is thus a zero-inflated binomial model.

[Figure: distribution of the number of detections under the occupancy model (p = 0.6, ψ = 0.6) compared with a plain binomial model (p = 0.6).]
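A minimal sketch of this area-specific marginal likelihood as a function of the unordered detection count k_i (the zero-inflated binomial; the values used in the calls are illustrative):

```python
from math import comb

def area_likelihood(k_i, n_i, p, psi):
    """Pr(k_i detections in n_i transects | theta), marginalizing the occupancy state."""
    occupied = comb(n_i, k_i) * p**k_i * (1 - p)**(n_i - k_i)   # Pr(k_i | psi_i = 1, theta)
    empty = 1.0 if k_i == 0 else 0.0                            # Pr(k_i | psi_i = 0, theta)
    return psi * occupied + (1 - psi) * empty

print(area_likelihood(k_i=2, n_i=5, p=0.6, psi=0.6))
print(area_likelihood(k_i=0, n_i=5, p=0.6, psi=0.6))
```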

Occupancy as a state-space model – full likelihood

Each area is independent, so the full likelihood is
L(θ) = Pr(data | θ) = ∏_{i=1}^{A} [ψ p^{k_i} (1-p)^{n_i - k_i} + (1 - ψ) I(k_i = 0)].

We can now do inference on the parameters, θ = (p, ψ), using ML estimation (or using Bayesian statistics).
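A sketch of the ML step (not from the slides): maximize the log-likelihood above over θ = (p, ψ) for some made-up detection counts, with a logit transform keeping both parameters in (0, 1). The binomial coefficients are dropped, since for a given dataset they are constants.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit  # inverse logit

# Made-up data: detection counts k_i out of n_i transects for A = 6 areas.
k = np.array([0, 2, 0, 3, 1, 0])
n = np.array([5, 5, 5, 5, 5, 5])

def neg_log_likelihood(params):
    p, psi = expit(params)                  # map from the real line to (0, 1)
    occupied = p**k * (1 - p)**(n - k)      # ordered detection histories given psi_i = 1
    empty = (k == 0).astype(float)          # psi_i = 0 only allows zero detections
    return -np.sum(np.log(psi * occupied + (1 - psi) * empty))

fit = minimize(neg_log_likelihood, x0=np.zeros(2))
p_hat, psi_hat = expit(fit.x)
print(p_hat, psi_hat)
```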

Occupancy as a state-space model – occupancy inference

Inference on ψ_i, given the parameters θ (Bayes' theorem):
Pr(ψ_i = 1 | data_i, θ) = Pr(data_i | ψ_i = 1, θ) Pr(ψ_i = 1 | θ) / Pr(data_i | θ)
                        = ψ p^{k_i} (1-p)^{n_i - k_i} / [ψ p^{k_i} (1-p)^{n_i - k_i} + (1 - ψ) I(k_i = 0)].

With at least one detection (k_i > 0) this probability is 1; for an area with no detections it is ψ(1-p)^{n_i} / (ψ(1-p)^{n_i} + 1 - ψ).

PS: We pretend that θ is known here. However, θ is estimated from the data and is not certain at all. We are using the data twice: once to estimate θ and once to do inference on the latent variables. This is avoided in a Bayesian setting.
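A minimal sketch of this posterior computation in code, plugging in p and ψ as if they were known (illustrative values):

```python
def posterior_occupancy(k_i, n_i, p, psi):
    """Pr(psi_i = 1 | data_i, theta) by Bayes' theorem, for k_i detections in n_i transects."""
    occupied = psi * p**k_i * (1 - p)**(n_i - k_i)   # joint: area occupied and data observed
    empty = (1 - psi) * (1.0 if k_i == 0 else 0.0)   # joint: area empty and data observed
    return occupied / (occupied + empty)

print(posterior_occupancy(k_i=0, n_i=5, p=0.6, psi=0.6))  # below psi: no detections lower the probability
print(posterior_occupancy(k_i=2, n_i=5, p=0.6, psi=0.6))  # equals 1: any detection implies occupancy
```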