EM for Inference in MV Data

EM for Inference in MV Data BMTRY 726 6/7/2018

Last Note on Hypothesis Testing and Confidence Regions/Intervals Recall we started with some data on n individuals where we assume X1, X2, …, Xn ~ NID(μ, Σ). We then said we could develop hypothesis tests and confidence regions based on the statistic T² = n(x̄ − μ)′S⁻¹(x̄ − μ), provided we understand the distribution of T².

For Large n When n is large, [(n − 1)p/(n − p)] Fp,n−p(α) ≈ χ²p(α). This implies P( n(X̄ − μ)′S⁻¹(X̄ − μ) ≤ χ²p(α) ) ≈ 1 − α. Therefore {μ : n(x̄ − μ)′S⁻¹(x̄ − μ) ≤ χ²p(α)} is an approximate 100(1 − α)% confidence region for μ.

Large n When n is large, we worry less about the normality assumption, and the distribution of T² is approximately χ²p. So if we want to test H0: μ = μ0 vs. H1: μ ≠ μ0, we reject H0 at level α if n(x̄ − μ0)′S⁻¹(x̄ − μ0) > χ²p(α). We can use this to get an approximate confidence region for μ. The only assumptions are that the Xi are iid and that μ and Σ exist.
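
To make the rejection rule concrete, here is a minimal Python sketch of the large-sample T² test, assuming NumPy and SciPy are available; the function name and interface are illustrative, not from the lecture.

# Large-sample Hotelling T^2 test of H0: mu = mu0 (minimal sketch).
import numpy as np
from scipy import stats

def t2_test_large_n(X, mu0, alpha=0.05):
    """X: (n, p) data matrix. Reject H0 if T^2 exceeds the chi-squared cutoff."""
    n, p = X.shape
    xbar = X.mean(axis=0)
    S = np.cov(X, rowvar=False)          # sample covariance matrix
    d = xbar - mu0
    t2 = n * d @ np.linalg.solve(S, d)   # T^2 = n (xbar - mu0)' S^{-1} (xbar - mu0)
    cutoff = stats.chi2.ppf(1 - alpha, df=p)
    return t2, cutoff, t2 > cutoff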

Missing Observations Missing data are common in studies collecting multivariate data on individuals. So what happens if some entries of our n × p data matrix are missing?

Missing Observations Types of missing data:
- MCAR: missing completely at random. Missingness is independent of the actual measurements, observed or not.
- MAR: missing at random. Missingness depends on observed values but not on the missing values themselves.
- NMAR: not missing at random. Missingness is informative; we can't ignore why the data are missing.

Missing Observations There is no single uniform method for dealing with missing data:
- complete case analysis
- replace missing values with the sample mean
- multiple imputation
The first two are sketched below. It is easiest to work with the MCAR assumption. Imputation: sometimes we do impute missing values. This can be "dangerous" as the variance will often be underestimated. Imputed values are also often treated as if they had been observed, which can bias the results towards rejecting the null.
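
For illustration only, a quick NumPy sketch of the first two approaches, with missing values coded as NaN; the toy array is hypothetical.

# Two naive approaches from the list above, shown with NaN-coded missing values.
import numpy as np

X = np.array([[1.0, 2.0],
              [np.nan, 3.0],
              [4.0, np.nan],
              [5.0, 6.0]])

# Complete-case analysis: keep only rows with no missing entries.
complete = X[~np.isnan(X).any(axis=1)]

# Mean imputation: replace each NaN with its column's observed mean.
# Note this shrinks the sample variance and ignores correlation among columns.
col_means = np.nanmean(X, axis=0)
imputed = np.where(np.isnan(X), col_means, X)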

Basic EM Framework EM is useful when maximum likelihood calculations are easy BUT there are some things we don't know…
- We have a model for the complete data X with associated pdf f(x; θ) and unknown parameter θ
- BUT we don't observe all of X
- We want to maximize the observed-data likelihood with respect to θ
*Note: we are assuming data are missing at random!

Key Ideas of EM
1. Compute the expectation of the complete-data log-likelihood conditional on the current estimate of the parameters
2. Maximize the resulting log-likelihood to obtain the next estimate of the parameters
3. Iterate to some level of convergence

Key Ideas of EM EM for exponential families:
- Compute the expectation of the sufficient statistics conditional on the current estimate of the parameters
- Use the resulting estimates of the sufficient statistics to re-estimate the parameters
- Iterate to some level of convergence (a skeleton of this loop is sketched below)
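
As a structural sketch only, the loop might look like this in Python, with e_step and m_step as hypothetical placeholders for the model-specific pieces; concrete versions for the normal cases appear later.

# Skeleton of the EM loop for an exponential-family model.
import numpy as np

def em(x_obs, theta0, e_step, m_step, tol=1e-8, max_iter=500):
    theta = np.asarray(theta0, dtype=float)
    for _ in range(max_iter):
        suff_stats = e_step(x_obs, theta)                 # expected sufficient statistics
        theta_new = np.asarray(m_step(suff_stats), dtype=float)
        if np.max(np.abs(theta_new - theta)) < tol:
            return theta_new                              # approximate convergence
        theta = theta_new
    return theta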

EM and Univariate Normal Suppose X1, X2, …, Xn ~ N(μ, σ²), where some of the Xi are unobserved. Begin by making an initial guess θ̂(0) = (μ̂(0), σ̂²(0)), e.g., the estimates computed from the observed data alone.

EM and Univariate Normal E-step k: compute the expectations of the sufficient statistics T1 = ΣXi and T2 = ΣXi², conditional on Xobs and the current parameter estimate θ̂(k); an observed Xi contributes its value, while a missing Xi contributes E[Xi] = μ̂(k) and E[Xi²] = (μ̂(k))² + σ̂²(k). M-step k: maximize the resulting log-likelihood, which gives μ̂(k+1) = T1/n and σ̂²(k+1) = T2/n − (μ̂(k+1))². A sketch of this scheme follows.
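
A minimal Python sketch of this univariate scheme, assuming MCAR missingness coded as NaN; the function and variable names are illustrative.

# EM for N(mu, sigma^2) with some observations missing, coded as NaN.
import numpy as np

def em_univariate_normal(x, n_iter=100):
    obs = x[~np.isnan(x)]
    n, n_obs = len(x), len(obs)
    n_mis = n - n_obs
    mu, s2 = obs.mean(), obs.var()        # initial guess from the observed data
    for _ in range(n_iter):
        # E-step: expected sufficient statistics T1 = sum x_i, T2 = sum x_i^2.
        # A missing x_i contributes E[x_i] = mu and E[x_i^2] = mu^2 + sigma^2.
        T1 = obs.sum() + n_mis * mu
        T2 = (obs ** 2).sum() + n_mis * (mu ** 2 + s2)
        # M-step: MLEs from the completed sufficient statistics.
        mu = T1 / n
        s2 = T2 / n - mu ** 2
    return mu, s2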

Multivariate Normal Missing Data Say we have multivariate data, e.g., BP and LDL measured on each subject, with some values missing. If we impute the missing values of BP and LDL separately, we lose the correlation structure. Imputation using the EM algorithm allows us to include the correlation among the X's. We make an initial guess about θ, apply our guess to the missing data, and repeat until convergence.

EM Algorithm for MVN For MVN data, the EM algorithm is based on the sufficient statistics T1 = Σⱼ Xⱼ (the vector of variable sums) and T2 = Σⱼ XⱼXⱼ′ (the matrix of sums of squares and cross-products).

EM Algorithm for MVN First estimate θ̃ = (μ̃, Σ̃) based on the data presented. E-Step: For each vector xⱼ with missing components, use the mean of the conditional distribution of the missing components given the observed components (and the current estimates μ̃ and Σ̃) to estimate the missing values. These estimates can be calculated based on properties of a MVN distribution…

EM Algorithm for MVN E-Step: Estimates of missing values. Partition each xⱼ into its missing part xⱼ(1) and observed part xⱼ(2), with μ̃ and Σ̃ partitioned to match. Then x̃ⱼ(1) = E[xⱼ(1) | xⱼ(2); μ̃, Σ̃] = μ̃(1) + Σ̃₁₂ Σ̃₂₂⁻¹ (xⱼ(2) − μ̃(2)).

EM Algorithm for MVN E-Step: Estimates of missing values. Because T2 involves second moments, we also need E[xⱼ(1) xⱼ(1)′ | xⱼ(2); μ̃, Σ̃] = Σ̃₁₁ − Σ̃₁₂ Σ̃₂₂⁻¹ Σ̃₂₁ + x̃ⱼ(1) x̃ⱼ(1)′. These conditional moments are used to find the sufficient statistics T1 and T2.

EM Algorithm for MVN M-Step: Compute the revised maximum likelihood estimates based on the sufficient statistics from the E-step: μ̃ = T1/n and Σ̃ = T2/n − μ̃μ̃′.

Example Consider an observed sample of 4 subjects and 3 X's, with some entries missing.

Example Update missing values in row 1

Example Update missing values in row 3

Example So what is our updated X after our first iteration?

Example Now estimate the sufficient statistics. Let's start with T1.

Example Now find T2

Example Revise the MLEs using the sufficient statistics. Use these estimates and go through the algorithm again.

Programming an EM algorithm Start with pseudo-code… what information do you need to capture and what output do you expect?
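
One possible answer, as a minimal Python sketch of the MVN E- and M-steps above, with missing entries coded as NaN. It assumes every row has at least one observed value, and all names are illustrative.

# MVN EM sketch: condition each row's missing block on its observed block,
# accumulate T1 and T2, then re-estimate mu and Sigma.
import numpy as np

def em_mvn(X, n_iter=50):
    n, p = X.shape
    miss = np.isnan(X)
    X0 = np.where(miss, np.nanmean(X, axis=0), X)   # start from mean imputation
    mu, Sigma = X0.mean(axis=0), np.cov(X0, rowvar=False)
    for _ in range(n_iter):
        T1 = np.zeros(p)
        T2 = np.zeros((p, p))
        for j in range(n):
            m, o = miss[j], ~miss[j]                 # missing / observed indices
            xj = X[j].copy()
            C = np.zeros((p, p))                     # conditional covariance filler
            if m.any():
                S_oo = Sigma[np.ix_(o, o)]
                S_mo = Sigma[np.ix_(m, o)]
                # E[x_mis | x_obs] = mu_mis + S_mo S_oo^{-1} (x_obs - mu_obs)
                xj[m] = mu[m] + S_mo @ np.linalg.solve(S_oo, X[j, o] - mu[o])
                # E[x_mis x_mis'] also needs the conditional covariance:
                C[np.ix_(m, m)] = Sigma[np.ix_(m, m)] - S_mo @ np.linalg.solve(S_oo, S_mo.T)
            T1 += xj
            T2 += np.outer(xj, xj) + C
        mu = T1 / n                                  # M-step
        Sigma = T2 / n - np.outer(mu, mu)
    return mu, Sigma

Starting from mean imputation just makes the first E-step well defined; any reasonable initial guess for μ and Σ works, and in practice the loop would stop on a convergence check rather than a fixed iteration count.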