Gaussian Mixture Models
David Sears
Music Information Retrieval
October 8, 2009


Outline
Classifying (Musical) Data: The Audio Mess
Statistical Principles
Gaussian Mixture Models
Maximum Likelihood Estimation: The EM Algorithm
Applications to Music
Conclusions

Classifying Data: The Audio Mess
Melody
Timbre

Statistical Principles
The Gaussian (normal) distribution is a continuous probability distribution that describes data clustering around a mean. Its probability density function provides a theoretical model for the distribution of a sample of data.
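For reference, the univariate Gaussian density mentioned above has the standard form (added here for clarity; it was not written out on the original slide):

\[
\mathcal{N}(x \mid \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left( -\frac{(x - \mu)^2}{2\sigma^2} \right),
\]

where \(\mu\) is the mean and \(\sigma^2\) the variance.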

The Gaussian Mixture Model
A GMM models a population of data as a weighted combination of several Gaussian components, one per sample cluster, and each cluster can correspond to one of our classes (timbre, melody, etc.). To use the model, the mixture density must be decomposed into its component densities.
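In symbols (a standard formulation, not taken from the slides), a GMM with K components has the density

\[
p(x) = \sum_{k=1}^{K} \pi_k \, \mathcal{N}(x \mid \mu_k, \Sigma_k),
\qquad \sum_{k=1}^{K} \pi_k = 1, \quad \pi_k \ge 0,
\]

where the mixing weights \(\pi_k\), means \(\mu_k\), and covariances \(\Sigma_k\) are the parameters to be estimated.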

Maximum Likelihood Estimation: The Problem
How do you determine the weight of each Gaussian component? Maximum likelihood (ML) estimation is a method for fitting a statistical model to the data; for Gaussian data it roughly corresponds to least squares, i.e. minimizing √∑(x − µ)². However, the ML estimates require a priori information about class membership and weights, information that isn't known in a GMM.
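To make the difficulty concrete (a standard derivation, not from the slides): the log-likelihood of N observations under the mixture density above is

\[
\log L(\pi, \mu, \Sigma) = \sum_{n=1}^{N} \log \sum_{k=1}^{K} \pi_k \, \mathcal{N}(x_n \mid \mu_k, \Sigma_k).
\]

The sum over components sits inside the logarithm, and the component responsible for each \(x_n\) is unobserved, so setting the derivatives to zero gives no closed-form solution. This is the problem the EM algorithm addresses.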

Expectation-Maximization Algorithm
EM is an iterative procedure consisting of two steps:
1. E-step: the missing data are estimated given the observed data and the current estimate of the model parameters.
2. M-step: the likelihood function is maximized under the assumption that the missing data are known (thanks to the E-step).
Each iteration increases the likelihood, so the algorithm converges toward a (locally) maximum-likelihood estimate.
EM Example
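Below is a minimal numpy sketch of these two steps for a one-dimensional GMM. The function name, initialization scheme, and synthetic data are illustrative assumptions and do not come from the original slides.

```python
import numpy as np

def em_gmm(X, K, n_iter=100, seed=0):
    """Minimal EM for a one-dimensional Gaussian mixture (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    N = X.shape[0]
    # Initial guesses: equal weights, K random data points as means, global variance.
    pi = np.full(K, 1.0 / K)
    mu = rng.choice(X, size=K, replace=False)
    var = np.full(K, X.var())

    for _ in range(n_iter):
        # E-step: responsibility r[n, k] = P(component k generated x_n | current params).
        dens = np.exp(-0.5 * (X[:, None] - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)
        weighted = pi * dens                                   # shape (N, K)
        r = weighted / weighted.sum(axis=1, keepdims=True)

        # M-step: re-estimate weights, means, and variances from the responsibilities.
        Nk = r.sum(axis=0)
        pi = Nk / N
        mu = (r * X[:, None]).sum(axis=0) / Nk
        var = (r * (X[:, None] - mu) ** 2).sum(axis=0) / Nk

    return pi, mu, var

# Usage: recover two clusters from synthetic data.
rng = np.random.default_rng(1)
X = np.concatenate([rng.normal(-2.0, 0.5, 300), rng.normal(3.0, 1.0, 700)])
print(em_gmm(X, K=2))
```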

Applications to MIR
Instrument Classification (Marques et al. 1999)
Sound segments 0.2 seconds in length
Three feature sets were compared:
1. Linear prediction features
2. Cepstral features
3. Mel cepstral features
Results: the mel cepstral feature set gave the best results, with an overall error rate of 37%.
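The classification scheme described above can be sketched as one mixture model per instrument class, with a test segment assigned to the class whose model gives it the highest log-likelihood. The sketch below uses scikit-learn and random placeholder feature arrays purely for illustration; it is not the toolkit or data used by Marques et al.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)

# Placeholder training features: one (n_segments, n_features) array per instrument.
train = {
    "violin": rng.normal(0.0, 1.0, size=(200, 13)),
    "flute": rng.normal(1.0, 1.0, size=(200, 13)),
}

# Fit one GMM per instrument class.
models = {name: GaussianMixture(n_components=8, covariance_type="diag",
                                random_state=0).fit(feats)
          for name, feats in train.items()}

def classify(segment_features):
    """Return the class whose GMM assigns the segment the highest total log-likelihood."""
    scores = {name: gmm.score_samples(segment_features).sum()
              for name, gmm in models.items()}
    return max(scores, key=scores.get)

print(classify(rng.normal(1.0, 1.0, size=(20, 13))))  # expected: "flute"
```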

Applications to MIR
Melodic Lines (Marolt 2004)
Marolt employed a GMM to classify and extract melodic lines from an Aretha Franklin recording of "Respect" using only pitch information. The EM algorithm homed in on the dominant pitch in the observed PDF. For lead vocals, the GMM classified with an accuracy of 0.93.

Conclusions
GMMs provide a common method for the classification of data.
The importance of choosing relevant features cannot be overstated.
What do we do with outliers, or with data that call for nonparametric methods?