Quadratic Classifiers (QC)
J.-S. Roger Jang (張智星), CS Dept., National Taiwan Univ.
2010 Scientific Computing


Bayes Classifier
The Bayes classifier is a probabilistic framework for classification problems, built on conditional probability and Bayes' theorem:
  P(c | x) = P(x | c) P(c) / P(x)
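As a small numeric sketch (the function name and numbers are illustrative, not from the slides), Bayes' theorem turns class-conditional likelihoods and priors into posterior probabilities:

```python
import numpy as np

def posterior(likelihoods, priors):
    """Compute P(class | x) from class-conditional likelihoods P(x | class)
    and class priors P(class), via Bayes' theorem."""
    joint = np.asarray(likelihoods) * np.asarray(priors)  # P(x|c) * P(c)
    return joint / joint.sum()                            # normalize by P(x)

# Two classes with equal priors; class 0 explains x better.
post = posterior(likelihoods=[0.30, 0.10], priors=[0.5, 0.5])
print(post)  # → [0.75 0.25]
```

The denominator P(x) never needs to be modeled explicitly; it falls out of the normalization.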

PDF Modeling
Goal: find a PDF (probability density function) that best describes a given dataset.
Steps:
- Select a class of parameterized PDFs.
- Identify the parameters via MLE (maximum likelihood estimation) from a given set of sample data.
Commonly used PDFs:
- Multi-dimensional Gaussian PDF
- Gaussian mixture models (GMM)

PDF Modeling for Classification
Procedure for classification based on PDFs:
- Training stage: model the PDF of each class from the training dataset.
- Test stage: for each entry in the test dataset, pick the class with the maximum PDF value.
Commonly used classifiers:
- Quadratic classifier (d-dim. Gaussian PDF)
- Gaussian-mixture-model classifier (GMM PDF)

1D Gaussian PDF Modeling
1D Gaussian PDF:
  g(x; μ, σ) = (1 / (σ√(2π))) · exp(−(x − μ)² / (2σ²))
MLE of μ and σ:
  μ̂ = (1/n) Σᵢ xᵢ
  σ̂² = (1/n) Σᵢ (xᵢ − μ̂)²
Detailed derivation

1D Gaussian PDF Modeling via MLE
MLE: maximum likelihood estimation. Given a set of observations x₁, …, xₙ, find the parameters of the PDF such that the overall likelihood
  L(μ, σ) = Πᵢ g(xᵢ; μ, σ)
is maximized.
Detailed derivation
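For the 1D Gaussian, the MLE has the closed form shown above: the sample mean and the biased (1/n) sample variance. A minimal sketch (the function name is assumed, not from the slides):

```python
import numpy as np

def gaussian_mle_1d(x):
    """MLE of a 1D Gaussian: sample mean and biased (1/n) sample variance."""
    x = np.asarray(x, dtype=float)
    mu = x.mean()
    var = ((x - mu) ** 2).mean()   # MLE uses 1/n, not the unbiased 1/(n-1)
    return mu, var

mu, var = gaussian_mle_1d([1.0, 2.0, 3.0, 4.0])
print(mu, var)  # → 2.5 1.25
```

Note the 1/n factor: the ML estimate of the variance is biased, which is exactly what the derivation of the slide's formulas yields.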

1D Gaussian PDF Modeling via MLE
(Figures: a normal distribution estimated by a normal PDF; a uniform distribution estimated by a normal PDF.)

d-dim. Gaussian PDF Modeling
d-dim. Gaussian PDF:
  g(x; μ, Σ) = (2π)^(−d/2) |Σ|^(−1/2) · exp(−(x − μ)ᵀ Σ⁻¹ (x − μ) / 2)
MLE of μ and Σ:
  μ̂ = (1/n) Σᵢ xᵢ
  Σ̂ = (1/n) Σᵢ (xᵢ − μ̂)(xᵢ − μ̂)ᵀ
Detailed derivation
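The d-dimensional MLE mirrors the 1D case: mean vector and 1/n outer-product covariance. A hedged sketch (function name assumed) for an (n, d) data matrix:

```python
import numpy as np

def gaussian_mle(X):
    """MLE of a d-dim Gaussian from an (n, d) data matrix:
    mean vector and biased (1/n) sample covariance matrix."""
    X = np.asarray(X, dtype=float)
    mu = X.mean(axis=0)
    D = X - mu                     # deviations from the mean
    sigma = D.T @ D / len(X)       # sum of outer products, divided by n
    return mu, sigma

# Four points at the corners of the unit square: mean (1, 1), covariance I.
X = np.array([[0.0, 0.0], [2.0, 0.0], [0.0, 2.0], [2.0, 2.0]])
mu, sigma = gaussian_mle(X)
print(mu)     # → [1. 1.]
print(sigma)  # → [[1. 0.]
              #    [0. 1.]]
```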

d-dim. Gaussian PDF Modeling
The likelihood of a sample x in class j is governed by that class's Gaussian PDF, g(x; μⱼ, Σⱼ).
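Evaluating that class-conditional likelihood is a direct transcription of the density formula above (function name assumed, not from the slides):

```python
import numpy as np

def gaussian_pdf(x, mu, sigma):
    """Evaluate the d-dim Gaussian density g(x; mu, sigma) at point x."""
    d = len(mu)
    diff = np.asarray(x, float) - np.asarray(mu, float)
    norm = (2 * np.pi) ** (-d / 2) * np.linalg.det(sigma) ** (-0.5)
    return norm * np.exp(-0.5 * diff @ np.linalg.inv(sigma) @ diff)

# At the mean of a 2D standard Gaussian the density is 1/(2*pi) ≈ 0.1592.
p = gaussian_pdf([0.0, 0.0], mu=[0.0, 0.0], sigma=np.eye(2))
print(round(p, 4))  # → 0.1592
```

In practice one compares log densities instead, to avoid underflow when d is large.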

2D Gaussian PDF
Bivariate normal density. (Figures: the density function surface and its contours.)

2D Gaussian PDF Modeling
Example: gaussianMle.m

Steps of QC
Training stage:
- Select a type of Gaussian PDF.
- Identify the PDF of each class.
Test stage:
- Assign each sample to the class with the highest PDF value.
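The two stages above can be sketched end-to-end. This is a minimal illustration, not the course's gaussianMle.m: class and method names are assumed, full covariance matrices are used, and equal class priors are implied by comparing densities directly.

```python
import numpy as np

class QuadraticClassifier:
    """Minimal QC sketch: one full-covariance Gaussian per class;
    a sample is assigned to the class with the highest density."""

    def fit(self, X, y):
        self.params = {}
        for c in np.unique(y):
            Xc = X[y == c]
            mu = Xc.mean(axis=0)
            D = Xc - mu
            sigma = D.T @ D / len(Xc)          # MLE covariance (1/n)
            self.params[c] = (mu, np.linalg.inv(sigma), np.linalg.det(sigma))
        return self

    def _log_pdf(self, x, mu, inv_sigma, det_sigma):
        # Gaussian log density, dropping the constant -d/2 * log(2*pi),
        # which is the same for every class and cannot change the argmax.
        diff = x - mu
        return -0.5 * (diff @ inv_sigma @ diff + np.log(det_sigma))

    def predict(self, X):
        return np.array([max(self.params,
                             key=lambda c: self._log_pdf(x, *self.params[c]))
                         for x in X])

# Two toy classes at opposite corners of the plane.
X = np.array([[0, 0], [1, 0], [0, 1], [1, 1],
              [4, 4], [5, 4], [4, 5], [5, 5]], dtype=float)
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])
clf = QuadraticClassifier().fit(X, y)
preds = clf.predict(np.array([[0.2, 0.3], [4.7, 4.5]]))
print(preds)  # → [0 1]
```

Unequal priors would add a log P(c) term to the score; the argmax structure stays the same.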

Characteristics of QC
If each class is modeled by a Gaussian PDF, the decision boundary between any two classes is a quadratic function of the input; this is why it is called a quadratic classifier. How can this be proved?
Different choices for the covariance matrix:
- A constant times the identity matrix
- A diagonal matrix
- A full matrix (hard to estimate reliably when the input dimension is large)
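A sketch of the proof asked for above: with equal priors, classes i and j share a boundary where their densities (equivalently, log densities) are equal,

```latex
\log g(x;\mu_i,\Sigma_i) = \log g(x;\mu_j,\Sigma_j).
```

Substituting the Gaussian log density $-\tfrac12 (x-\mu)^{T}\Sigma^{-1}(x-\mu) - \tfrac12\log|\Sigma| - \tfrac{d}{2}\log 2\pi$ on both sides and cancelling the shared constant gives

```latex
x^{T}\left(\Sigma_j^{-1}-\Sigma_i^{-1}\right)x
+ 2\left(\mu_i^{T}\Sigma_i^{-1}-\mu_j^{T}\Sigma_j^{-1}\right)x + c = 0,
\qquad
c = \mu_j^{T}\Sigma_j^{-1}\mu_j - \mu_i^{T}\Sigma_i^{-1}\mu_i
  + \log\frac{|\Sigma_j|}{|\Sigma_i|},
```

which is quadratic in $x$. When $\Sigma_i = \Sigma_j$ the quadratic term vanishes and the boundary degenerates to a hyperplane, which is why shared-covariance models yield linear classifiers.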

QC Results on Iris Dataset (I)
Dataset: the IRIS dataset, using only the last two input features.

QC Results on Iris Dataset (II)
PDF for each class.

QC Results on Iris Dataset (III)
Decision boundaries among the classes.