Maximum Likelihood Estimation of Mixture Densities for Binned and Truncated Multivariate Data: Igor V. Cadez, Padhraic Smyth, Geoff J. McLachlan, Christine E. McLaren


Maximum Likelihood Estimation of Mixture Densities for Binned and Truncated Multivariate Data
Igor V. Cadez, Padhraic Smyth, Geoff J. McLachlan, Christine E. McLaren, Machine Learning 2001 (to appear)
Presented by Jangmin O, 2001/06/01

Introduction (1)
- Fitting mixture models to binned and truncated data by maximum likelihood via EM.
- Binning: measurement with finite resolution quantizes real-valued variables.
- Truncation: part of the measurement range is not recorded.
- Motivation: diagnostic evaluation of anemia. Volume of red blood cells (RBC) and amount of hemoglobin, measured by a cytometric blood cell counter (Bayer Corp.).

Figure 1

Introduction (2)
- Data arrive in the form of a histogram (binning and truncation): computer vision, massive data sets, …
- Binning: limited measurement precision.
- Truncation: limitation of the range of measurement, sometimes intentional, …
- EM framework. Missing data: the original data points.

Binned and Truncated Data
- Sample space: v mutually exclusive regions H_r (r = 1, …, v).
- Observation: only the number n_r of the Y_j that fall in H_r (r = 1, …, v_0) is recorded (v_0 ≤ v).
- Observed data vector a = (n_1, …, n_{v_0})^T; it follows a multinomial distribution.

Observed log likelihood
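The equation itself did not survive the transcript; the following is a hedged reconstruction from the definitions on the previous slide, writing Ψ for the full parameter vector, f(y; Ψ) for the mixture density, P_r(Ψ) = ∫_{H_r} f(y; Ψ) dy, and n = n_1 + … + n_{v_0}:

$$
\mathbf{a} \sim \mathrm{Multinomial}\!\left(n;\ \tfrac{P_1(\Psi)}{P(\Psi)}, \ldots, \tfrac{P_{v_0}(\Psi)}{P(\Psi)}\right), \qquad P(\Psi) = \sum_{r=1}^{v_0} P_r(\Psi),
$$
$$
\log L(\Psi) \;=\; \sum_{r=1}^{v_0} n_r \log P_r(\Psi) \;-\; n \log P(\Psi) \;+\; \mathrm{const}.
$$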

Application of EM Algorithm: Missing Data
- Unobservable frequencies in the case of truncation.
- The n_r unobservable individual observations Y_r within the r-th region.
- Complete-data vector.

p(u|a;) can be specified… (negative binomial ?) p(a;) is specified p(u|a;) can be specified… (negative binomial ?) p(y1+,…, yv+|u, a; ) is specified Conditioning on u and a, yj+ is composed by independent nj sampling from the density

Application of EM Algorithm: Missing Data
- Then, the complete-data log-likelihood is:
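The formula was lost in the transcript; writing y_{rs}, s = 1, …, n_r, for the individual observations assigned to region H_r (with the truncated-region counts taken from u), a plausible reconstruction is:

$$
\log L_c(\Psi) \;=\; \sum_{r=1}^{v} \sum_{s=1}^{n_r} \log f(y_{rs};\,\Psi).
$$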

Application of EM Algorithm: Mixture Model
- Extension to a mixture model (g components).
- Conditional probability that Y_{rs} belongs to the i-th component, given y_{rs}.
- Final complete-data log-likelihood, with zero-one indicator variables:
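A hedged reconstruction, with z_{rsi} the zero-one indicator that y_{rs} was generated by component i, π_i the mixing weights, and f_i(·; θ_i) the component densities:

$$
\log L_c(\Psi) \;=\; \sum_{r=1}^{v} \sum_{s=1}^{n_r} \sum_{i=1}^{g} z_{rsi} \left[\log \pi_i + \log f_i(y_{rs};\,\theta_i)\right].
$$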

E-Step Calculation of Q(; (k)) expection over y1+,…,yv+ expection over u . Expectation of u given a …

M-Step i(k+1) update  = (1,…, g) : other parameters are adjusted to be…

M-Step for Normal Components
- Parameter update equations.
- Practical implementation is more complex due to the multivariate integrals over the bin regions.
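The update equations were lost; a sketch of the mean update for component i, using bin-conditional first moments m_{ri}^{(k)} (the covariance update is analogous, with second moments over each bin), is:

$$
m_{ri}^{(k)} \;=\; \frac{\int_{H_r} y\, f_i(y;\,\theta_i^{(k)})\,dy}{\int_{H_r} f_i(y;\,\theta_i^{(k)})\,dy},
\qquad
\mu_i^{(k+1)} \;=\; \frac{\sum_{r=1}^{v} n_r^{(k)}\, \tau_{ri}(\Psi^{(k)})\, m_{ri}^{(k)}}{\sum_{r=1}^{v} n_r^{(k)}\, \tau_{ri}(\Psi^{(k)})}.
$$

These integrals over the bin regions are exactly what makes the implementation expensive, which motivates the next slide.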

Computational and Numerical Issues
- The integrals cannot be evaluated analytically.
- m bins per dimension in the univariate case, O(m^d) bins in d dimensions.
- O(I) function evaluations per univariate integration, O(I^d) in d dimensions.
- Complex bin geometry.
- For a fixed sample size, the multivariate histogram becomes sparser as the dimension grows.
- Integration methods: numerical quadrature, Monte Carlo (sketched below), Romberg (idea: repeated 1-dimensional integration).
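A minimal Python sketch (not from the paper) of estimating one bin probability for a single Gaussian component by Monte Carlo, assuming axis-aligned rectangular bins and using SciPy's multivariate normal; `mc_bin_probability` and its arguments are illustrative names:

```python
import numpy as np
from scipy.stats import multivariate_normal

def mc_bin_probability(mean, cov, lower, upper, n_samples=10_000, rng=None):
    """Monte Carlo estimate of P(Y in bin) for Y ~ N(mean, cov),
    where the bin is the axis-aligned box [lower, upper]."""
    rng = np.random.default_rng() if rng is None else rng
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    # Uniform samples over the box; the integral is box volume times mean density.
    u = rng.uniform(low=lower, high=upper, size=(n_samples, lower.size))
    volume = np.prod(upper - lower)
    return volume * multivariate_normal(mean, cov).pdf(u).mean()

# Example: mass of a standard bivariate normal in the unit square [0, 1] x [0, 1].
p_bin = mc_bin_probability(mean=[0.0, 0.0], cov=np.eye(2),
                           lower=[0.0, 0.0], upper=[1.0, 1.0])
```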

Handling Truncated Regions
- The truncated portion can be treated as a single bin, so no extra integration is needed.
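A reconstruction of the argument (not a formula from the slide): if the whole truncated portion of the sample space is treated as one region, its probability follows from the observed bins without any new integrals,

$$
P_{\mathrm{trunc}}(\Psi) \;=\; 1 - \sum_{r=1}^{v_0} P_r(\Psi).
$$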

3.3 The Complete EM Algorithm
1. Treat the histogram as a PDF and draw a small number of data points from it.
2. Fit the mixture model using the standard EM algorithm (non-binned, non-truncated); steps 1-2 are sketched below.
3. Using the parameter estimates from above, refine the estimate with the full EM algorithm applied to the binned and truncated data.
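A minimal Python sketch of the initialization stage (steps 1-2), assuming the histogram is given as bin centers and counts and using scikit-learn's GaussianMixture as the standard EM; names such as `init_from_histogram` are illustrative, and step 3 would replace the returned parameters with the full binned/truncated EM refinement:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def init_from_histogram(bin_centers, counts, n_components=2, n_draws=500, rng=None):
    """Treat the histogram as a PDF, draw a small sample from it (here, bin centers
    chosen with probability proportional to their counts), and fit a standard
    (unbinned, untruncated) EM to obtain starting values."""
    rng = np.random.default_rng() if rng is None else rng
    probs = counts / counts.sum()
    idx = rng.choice(len(counts), size=n_draws, p=probs)  # sample bins by their mass
    sample = bin_centers[idx]                             # one point per draw, at the bin center
    gm = GaussianMixture(n_components=n_components).fit(sample)
    return gm  # gm.weights_, gm.means_, gm.covariances_ seed the full binned/truncated EM
```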

4. Experimental Results with Simulated Data
- 3 experiments.
- Generate data from a known PDF and then bin them (bivariate).
- Number of bins per dimension: 5 to 100 (step 5).
- 10 different samples for smoothing the results.
- Standard EM on unbinned samples vs. full EM on binned samples.
- Evaluation metric: KL distance between the true density and each of the two EM estimates (a Monte Carlo sketch of this metric follows below).
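The KL distance used for evaluation is not written out on the slide; a hedged Monte Carlo sketch, assuming the true density can be sampled and both densities evaluated pointwise (`kl_divergence_mc` is an illustrative name):

```python
import numpy as np

def kl_divergence_mc(samples_from_p, logpdf_p, logpdf_q):
    """Monte Carlo estimate of KL(p || q) = E_p[log p(Y) - log q(Y)],
    using draws from the true density p and pointwise log-density evaluations."""
    samples_from_p = np.asarray(samples_from_p)
    return float(np.mean(logpdf_p(samples_from_p) - logpdf_q(samples_from_p)))
```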

Experiment Setup
- To test the quality of the solution for different numbers of data points from Figure 4: data points N = 100 to 1000 (step 10); (20 bins, 100 data points, 10 samples).
- To test the performance of the algorithm when the component densities are not well separated: components 3 apart (20 bins, 20 separations, 10 samples).
- To test the performance of the algorithm when significant truncation occurs (20 bins, 100 positions, 10 samples).

4.2 Estimation from Random Samples Generated from the Binned Data
- Baseline approach: estimate the PDF from a random sample drawn from the binned data (the uniform-sampling estimation method).
- Figure 6: comparison.
- The uniform-sampling method overestimates the variance (variance inflation).

Figure 6: Estimated PDFs obtained from the original data and PDFs fitted by the binned algorithm and the uniform random-sample algorithm, for (a) 5 bins per dimension and (b) 10 bins per dimension. 3-sigma covariance ellipses shown.

4.3 Experiments with Different Sample Sizes
- Figure 7: as a function of the number of bins and the number of data points. Bins > 20 and data points > 500 give small KL distance.
- Figure 8: as a function of the number of bins. Bins 5 to 20: rapid decay; bins > 20: flat.
- Figure 9: as a function of the number of data points. Exponential decay.

Figure 7: (a) average KL distance between the estimated density and the true density; (b) standard deviation of the KL distance from 10 repeated samples.

4.4 Experiments with Different Separations of Mixture Components
- Figure 10: as a function of the number of bins and the separation of the means. Insensitive to the separation of components.
- Figure 11: as a function of the separation of the means; ratio of the KL distances of the standard and binned algorithms. Small number of bins: standard EM is better. Small separation: binned EM is better.
- Figure 12.

4.5 Experiments with Truncation
- Figure 13: as a function of the fraction of truncated points. Standard EM ignores the information about truncation; the binned EM is relatively insensitive to truncation.
- Figure 14.

Real Example: Red Blood Cell Data
- Medical diagnosis based on two-dimensional histograms characterizing RBC volume and hemoglobin measurements.
- Mixture densities were fitted to histograms from 90 control subjects and 82 subjects with iron-deficient anemia.
- B = 100^2 bins, N = 40,000 cells per histogram.
- The fitted models are used to build a discriminant rule.
- Baseline features: 4-dimensional feature vector (mean and variance along the RBC and hemoglobin axes).
- 11-dimensional features: two-component lognormal mixture models (means, covariances, mixing weight).
- 9-dimensional features: (means, log-odds of the eigenvalues of the covariances, mixing weight).

Figure 15: Contour plots of the estimated densities for three control patients and three iron-deficient anemia patients.

Conclusion
- Fitting mixture densities to multivariate binned and truncated data.
- Computational and numerical implementation issues.
- In the 2-D simulations, once the number of bins exceeds 10, the loss of information from quantization is minimal.