Chapter 20 Classification and Estimation

20.2 Classification – Feature selection
Good features have four characteristics:
–Discrimination. Features should take on significantly different values for objects belonging to different classes.
–Reliability. Features should take on similar values for all objects of the same class.
–Independence. The various features used should be uncorrelated with each other.
–Small numbers. The number of features should be small, because the complexity of a pattern recognition system increases rapidly with the dimensionality of the system.

20.2 Classification
Classifier design
–Classifier design consists of establishing the logical structure of the classifier and the mathematical basis of the classification rule.
Classifier training
–A group of known objects is used to train the classifier, that is, to determine its threshold values.

20.2 Classification
–The training set is a collection of objects from each class that have been previously identified by some accurate method.
–Training rule: minimize an error function or a cost function over the training set.
–Common pitfalls: an unrepresentative training set and a biased training set.

20.2 Classification – Measurement of performance
A classifier's accuracy can be estimated directly by classifying a known test set of objects.
An alternative is to use a test set of known objects to estimate the PDFs of the features for objects belonging to each class.
Using a test set that is different from the training set gives a less biased evaluation of the classifier.

20.3 Feature selection
Feature selection is the process of eliminating some features and combining others that are related, until the feature set becomes manageable while performance remains adequate.
The brute-force approach to feature selection evaluates every possible subset of the candidate features and keeps the subset that performs best.
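As a rough illustration of the brute-force approach, the Python sketch below scores every non-empty subset of a candidate feature list; the helper `evaluate_subset` (for example, validation-set accuracy of a classifier trained on that subset) is a hypothetical placeholder, not something defined in the slides.

```python
from itertools import combinations

def brute_force_select(features, evaluate_subset):
    """Exhaustively score every non-empty subset of the given features.

    `features` is a list of feature names or indices; `evaluate_subset`
    is a user-supplied scoring function (hypothetical here) returning,
    e.g., classification accuracy on a validation set.
    """
    best_subset, best_score = None, float("-inf")
    for k in range(1, len(features) + 1):
        for subset in combinations(features, k):
            score = evaluate_subset(subset)
            if score > best_score:
                best_subset, best_score = subset, score
    return best_subset, best_score
```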

20.3 Feature selection
Suppose a training set contains objects from M different classes. Let $N_j$ be the number of objects from class j, and let $x_{ij}$ and $y_{ij}$ be two features measured on the ith object in class j. The mean value of each feature within class j is
$$\mu_{x,j} = \frac{1}{N_j}\sum_{i=1}^{N_j} x_{ij}, \qquad \mu_{y,j} = \frac{1}{N_j}\sum_{i=1}^{N_j} y_{ij}$$

20.3 Feature selection
Feature variance
–All objects within the same class should take on similar feature values. The variance of feature x within class j is
$$\sigma_{x,j}^{2} = \frac{1}{N_j - 1}\sum_{i=1}^{N_j}\left(x_{ij} - \mu_{x,j}\right)^{2}$$

20.3 Feature selection
Feature correlation
–The correlation of features x and y within class j is
$$\rho_{xy,j} = \frac{\sum_{i=1}^{N_j}\left(x_{ij} - \mu_{x,j}\right)\left(y_{ij} - \mu_{y,j}\right)}{\sqrt{\sum_{i=1}^{N_j}\left(x_{ij} - \mu_{x,j}\right)^{2}\,\sum_{i=1}^{N_j}\left(y_{ij} - \mu_{y,j}\right)^{2}}}$$
–A value of zero indicates that the two features are uncorrelated, while a magnitude near 1 implies a high degree of correlation.

20.3 Feature selection
Class separation distance
–The variance-normalized distance between classes j and k for feature x is
$$D_{jk} = \frac{\left|\mu_{x,j} - \mu_{x,k}\right|}{\sqrt{\sigma_{x,j}^{2} + \sigma_{x,k}^{2}}}$$
–The greater this distance, the better the feature separates the two classes.
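The numpy sketch below gathers these feature-quality statistics (per-class mean, within-class variance, within-class correlation, and the variance-normalized separation distance) for two features stored as 1-D arrays. The function name, argument layout, and the use of the unbiased (N−1) variance are assumptions made for illustration, since the original slide formulas are not shown.

```python
import numpy as np

def feature_quality(x, y, labels, j, k):
    """Class-j mean and variance of feature x, the within-class-j
    correlation of features x and y, and the variance-normalized
    separation distance of feature x between classes j and k."""
    x, y, labels = map(np.asarray, (x, y, labels))
    xj, yj = x[labels == j], y[labels == j]
    xk = x[labels == k]

    mean_xj = xj.mean()                              # class-j mean of feature x
    var_xj, var_xk = xj.var(ddof=1), xk.var(ddof=1)  # within-class variances (unbiased)
    rho_j = np.corrcoef(xj, yj)[0, 1]                # correlation of x and y in class j
    # variance-normalized distance between classes j and k for feature x
    d_jk = abs(mean_xj - xk.mean()) / np.sqrt(var_xj + var_xk)
    return mean_xj, var_xj, rho_j, d_jk
```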

20.3 Feature selection
Dimension reduction
–Many features can be combined to form a smaller number of features.
–Linear combination. Two features x and y can produce a new feature z by
$$z = ax + by$$
With the coefficients normalized as $a = \cos\theta$ and $b = \sin\theta$, this can be reduced to
$$z = x\cos\theta + y\sin\theta$$

20.3 Feature selection
–This is a projection of the (x, y) plane onto the line z.
[Figure: objects from Class 1 and Class 2 in the (x, y) plane, projected onto the line z.]
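A minimal sketch of this linear dimension reduction, assuming the two features are held in numpy arrays and the projection direction is given by the angle theta; the example values are illustrative only:

```python
import numpy as np

def project_features(x, y, theta):
    """Combine two features into one by projecting (x, y) onto the line
    at angle theta: z = x*cos(theta) + y*sin(theta)."""
    return x * np.cos(theta) + y * np.sin(theta)

# Example: project a few samples onto a 30-degree line
x = np.array([1.0, 2.0, 3.0])
y = np.array([0.5, 1.5, 2.5])
z = project_features(x, y, np.deg2rad(30.0))
```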

20.4 Statistical Classification
Statistical decision theory
–An approach that performs classification by statistical methods; the PDFs of the features are assumed to be known.
–The PDF of a feature may be estimated by measuring a large number of objects and plotting a histogram of the feature.

20.4 Statistical Classification – A priori probabilities
The a priori probabilities represent our knowledge about an object before it has been measured.
The conditional probability P(A|B) is the probability of event A, given that event B has occurred.

20.4 Statistical classification
Bayes' theorem
–The a posteriori probability $P(C_i \mid x)$ is the conditional probability that the object belongs to class $C_i$, given that the feature value x has been observed.
–Bayes' theorem (two classes):
$$P(C_i \mid x) = \frac{p(x \mid C_i)\,P(C_i)}{p(x)}, \qquad p(x) = p(x \mid C_1)P(C_1) + p(x \mid C_2)P(C_2), \quad i = 1, 2$$

20.4 Statistical classification
Bayes' theorem may be applied to pattern classification. For example, when there are only two classes, an object is assigned to class 1 if
$$P(C_1 \mid x) > P(C_2 \mid x)$$
This is equivalent to
$$p(x \mid C_1)\,P(C_1) > p(x \mid C_2)\,P(C_2)$$
The classifier defined by this decision rule is called a maximum-likelihood classifier.
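A minimal sketch of this two-class decision rule, assuming (purely for illustration) univariate normal class-conditional PDFs with known parameters; the parameter values in the example are made up:

```python
from scipy.stats import norm

def classify_two_class(x, mu, sigma, prior):
    """Assign x to class 1 if p(x|C1)P(C1) > p(x|C2)P(C2), else to class 2.
    Class-conditional PDFs are assumed here to be univariate normals."""
    g1 = norm.pdf(x, mu[0], sigma[0]) * prior[0]
    g2 = norm.pdf(x, mu[1], sigma[1]) * prior[1]
    return 1 if g1 > g2 else 2

# Illustrative example with assumed class parameters and equal priors
print(classify_two_class(x=1.2, mu=(0.0, 3.0), sigma=(1.0, 1.0), prior=(0.5, 0.5)))
```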

20.4 Statistical classification
–If there is more than one feature, the features form a feature vector $\mathbf{x}$. With m classes, Bayes' theorem becomes
$$P(C_i \mid \mathbf{x}) = \frac{p(\mathbf{x} \mid C_i)\,P(C_i)}{\sum_{j=1}^{m} p(\mathbf{x} \mid C_j)\,P(C_j)}$$
–Bayes' risk. The conditional risk of assigning an object with feature vector $\mathbf{x}$ to class i is
$$R(C_i \mid \mathbf{x}) = \sum_{j=1}^{m} \lambda_{ij}\,P(C_j \mid \mathbf{x})$$
where $\lambda_{ij}$ is the cost (loss) of assigning an object to class i when it really belongs in class j.

20.4 Statistical classification
–Bayes' decision rule. Each object is assigned to the class that produces the minimum conditional risk (as sketched below). The Bayes risk is the overall risk obtained with this rule,
$$R = \int \min_i R(C_i \mid \mathbf{x})\; p(\mathbf{x})\, d\mathbf{x}$$
and it is the lowest overall risk any decision rule can achieve.
–Parametric and nonparametric classifiers. If the functional form of the class-conditional PDFs is known but some parameters are unknown, the classifier is called parametric. If the functional form of some or all of the conditional PDFs is unknown, the classifier is called nonparametric.
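A minimal sketch of the minimum-risk decision, assuming the posteriors P(C_j|x) have already been computed and the loss matrix λ is given; the 0–1 loss used in the example is only illustrative:

```python
import numpy as np

def bayes_decision(posteriors, loss):
    """Pick the class with minimum conditional risk
    R(C_i|x) = sum_j loss[i, j] * P(C_j|x).
    `posteriors` holds P(C_j|x); loss[i, j] is the cost of deciding
    class i when the true class is j."""
    risks = loss @ posteriors          # conditional risk of each possible decision
    return int(np.argmin(risks)), risks

# Illustrative example: 0-1 loss over three classes
loss = np.ones((3, 3)) - np.eye(3)     # zero cost for correct decisions
posteriors = np.array([0.2, 0.5, 0.3])
decision, risks = bayes_decision(posteriors, loss)
```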

Parameter estimation and classifier training
The process of estimating the conditional PDFs or their parameters is referred to as training the classifier.
Supervised and unsupervised training
–Supervised training. The class to which each object in the training set belongs is known.
–Unsupervised training. The conditional PDFs are estimated using samples whose class membership is unknown.

Parameter estimation and classifier training
Maximum-likelihood estimation
–The maximum-likelihood approach assumes that the parameters to be estimated are fixed but unknown.
–The maximum-likelihood estimate of a parameter is the value that makes the occurrence of the observed training set most likely.
–The maximum-likelihood estimates of the mean and standard deviation of a normal distribution are the sample mean and sample standard deviation, respectively.
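A short sketch of these maximum-likelihood estimates for a univariate normal sample (note that the ML variance estimate divides by N rather than N−1):

```python
import numpy as np

def ml_normal_estimates(samples):
    """Maximum-likelihood estimates of the mean and standard deviation
    of a normal distribution: the sample mean and the (1/N) sample
    standard deviation."""
    samples = np.asarray(samples, dtype=float)
    mu_hat = samples.mean()
    sigma_hat = samples.std(ddof=0)    # ML estimate divides by N, not N-1
    return mu_hat, sigma_hat
```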

Bayesian estimation
–Bayesian estimation treats the unknown parameter as a random variable that has a known a priori PDF before any samples are taken.
–After the training set has been measured, Bayes' theorem is used to update the a priori PDF, resulting in an a posteriori PDF of the unknown parameter value.
–What is desired is an a posteriori PDF with a single narrow peak, centered on the true value of the parameter.

Bayesian estimation
An example of Bayesian estimation
–Estimate the unknown mean $\mu$ of a normal distribution with known variance $\sigma^2$. The a priori PDF of the mean, $p(\mu)$, is assumed known.
–The functional form of the PDF of x given the unknown mean is assumed to be $p(x \mid \mu) = N(\mu, \sigma^2)$; this means that, given a value for $\mu$, we know the PDF of x completely.
–Let $X = \{x_1, x_2, \ldots, x_N\}$ represent the set of sample values obtained by measuring the training set.

Bayesian estimation
–Bayes' theorem gives the a posteriori PDF of the mean:
$$p(\mu \mid X) = \frac{p(X \mid \mu)\,p(\mu)}{\int p(X \mid \mu)\,p(\mu)\,d\mu}$$
–What we really want is the PDF of x given the training set:
$$p(x \mid X) = \int p(x \mid \mu)\,p(\mu \mid X)\,d\mu$$
–For example, if $p(\mu \mid X)$ has a single sharp peak at $\mu = \hat{\mu}$, it can be approximated as an impulse, $p(\mu \mid X) \approx \delta(\mu - \hat{\mu})$.

Bayesian estimation
–Then
$$p(x \mid X) \approx \int p(x \mid \mu)\,\delta(\mu - \hat{\mu})\,d\mu = p(x \mid \hat{\mu})$$
This means that $\hat{\mu}$ is the best estimate of the unknown mean.
–If $p(\mu \mid X)$ has a relatively broad peak, then $p(x \mid X)$ becomes a weighted average of many PDFs $p(x \mid \mu)$.
–For a large training set, both maximum-likelihood and Bayesian estimation place the estimate of the unknown mean at the mean of the training samples.

Bayesian estimation
Steps of Bayesian estimation
–1. Assume an a priori PDF for the unknown parameters.
–2. Collect sample values from the population by measuring the training set.
–3. Use Bayes' theorem to refine the a priori PDF into the a posteriori PDF.
–4. Form the joint density of x and the unknown parameter and integrate out the latter to leave the desired estimate of the PDF (see the sketch below).
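A sketch of these steps for the example above (unknown normal mean, known variance), using the standard normal-prior/normal-likelihood conjugate update; the normal prior and its parameters in the example are assumptions, not taken from the slides:

```python
import numpy as np

def posterior_mean_normal(samples, sigma, mu0, sigma0):
    """A posteriori PDF p(mu | samples) for the unknown mean of a normal
    distribution with known variance sigma**2, under an assumed normal
    prior N(mu0, sigma0**2). The posterior is itself normal, so its mean
    and standard deviation describe it completely."""
    samples = np.asarray(samples, dtype=float)
    n = samples.size
    post_var = 1.0 / (1.0 / sigma0**2 + n / sigma**2)
    post_mean = post_var * (mu0 / sigma0**2 + samples.sum() / sigma**2)
    return post_mean, np.sqrt(post_var)

# As the training set grows, the posterior narrows around the sample mean.
samples = np.random.normal(loc=2.0, scale=1.0, size=50)
print(posterior_mean_normal(samples, sigma=1.0, mu0=0.0, sigma0=3.0))
```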

Bayesian estimation
–If we have strong ideas about the probable values of the unknown parameter, we may assume a narrow a priori PDF; otherwise, we should assume a relatively broad one.

20.5 Neural Networks
Layered feedforward neural networks can also serve as classifiers; the activation function is usually a sigmoidal function, $\sigma(x) = 1/(1 + e^{-x})$.
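A minimal sketch of a forward pass through such a network, with sigmoidal activations at every layer; the layer sizes and random weights are illustrative only:

```python
import numpy as np

def sigmoid(x):
    """Sigmoidal activation: maps any real input to (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

def feedforward(x, weights, biases):
    """Forward pass of a layered feedforward network: each layer applies
    a weight matrix, a bias vector, and the sigmoid activation."""
    a = x
    for W, b in zip(weights, biases):
        a = sigmoid(W @ a + b)
    return a

# Tiny example: 2 inputs -> 3 hidden units -> 1 output (random weights)
rng = np.random.default_rng(0)
weights = [rng.normal(size=(3, 2)), rng.normal(size=(1, 3))]
biases = [np.zeros(3), np.zeros(1)]
print(feedforward(np.array([0.5, -1.0]), weights, biases))
```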