Published by Carmella Pearson. Modified over 9 years ago.
1
Chapter 20 Classification and Estimation
2
20.2 Classification
20.2.1 Feature selection
Good features have four characteristics:
– Discrimination. Features should take on significantly different values for objects belonging to different classes.
– Reliability. Features should take on similar values for all objects of the same class.
– Independence. The various features used should be uncorrelated with each other.
– Small number. The number of features should be small, because the complexity of a pattern recognition system increases rapidly with its dimensionality.
3
20.2 Classification
Classifier design
– Classifier design consists of establishing the logical structure of the classifier and the mathematical basis of the classification rule.
Classifier training
– A group of known objects is used to train the classifier, that is, to determine its threshold values.
4
20.2 Classification
– The training set is a collection of objects from each class that have been previously identified by some accurate method.
– Training rule: minimize an error function or a cost function.
– Pitfalls: an unrepresentative training set or a biased training set.
5
20.2 Classification
20.2.4 Measurement of performance
– A classifier's accuracy can be estimated directly by classifying a test set of known objects.
– An alternative is to use a test set of known objects to estimate the PDFs of the features for objects belonging to each class.
– Evaluating a classifier on a test set different from the training set is the better approach.
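The direct estimate described above can be sketched in a few lines of Python. The threshold classifier, its class labels, and the test values below are hypothetical and serve only to illustrate scoring on a held-out test set.

```python
def accuracy(classify, test_set):
    """Fraction of labelled test objects that `classify` labels correctly."""
    correct = sum(1 for feature, label in test_set if classify(feature) == label)
    return correct / len(test_set)


# Hypothetical one-feature classifier: the threshold would come from training.
def threshold_classifier(x, threshold=5.0):
    return "large" if x > threshold else "small"


# Hypothetical test set, kept separate from whatever data set the threshold.
test_set = [(2.0, "small"), (4.5, "small"), (5.5, "large"),
            (7.0, "large"), (4.8, "large")]

test_accuracy = accuracy(threshold_classifier, test_set)
print(test_accuracy)
```

Scoring on the training set itself would tend to give an optimistic figure, which is why the test objects here are assumed to be distinct from the training objects.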
6
20.3 Feature selection
– Feature selection is the process of eliminating some features and combining others that are related, until the feature set becomes manageable and performance is still adequate.
– The brute-force approach to feature selection.
7
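20.3 Feature selection
– Consider a training set containing objects from M different classes. Let $N_j$ be the number of objects from class j, and let $x_{ij}$ and $y_{ij}$ be the two features measured on the ith object in class j.
– The mean value of each feature in class j is $\mu_{xj} = \frac{1}{N_j}\sum_{i=1}^{N_j} x_{ij}$ and $\mu_{yj} = \frac{1}{N_j}\sum_{i=1}^{N_j} y_{ij}$.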
8
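20.3 Feature selection
20.3.1 Feature variance
– All objects within the same class should take on similar feature values.
– The variance of feature x within class j is $\sigma_{xj}^2 = \frac{1}{N_j}\sum_{i=1}^{N_j} (x_{ij} - \mu_{xj})^2$, where $x_{ij}$ is the value of x for the ith of the $N_j$ objects in class j and $\mu_{xj}$ is the mean of x in that class.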
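A minimal Python sketch of the per-class mean and variance, assuming the population form that divides by the number of objects in the class. The class names and feature values are invented for illustration.

```python
def class_mean(values):
    """Mean of one feature over the objects of a single class."""
    return sum(values) / len(values)


def class_variance(values):
    """Variance of one feature within a single class (divides by N_j)."""
    mu = class_mean(values)
    return sum((v - mu) ** 2 for v in values) / len(values)


# Hypothetical training set: one feature measured on objects of two classes.
training = {
    "class_1": [2.0, 4.0, 6.0],
    "class_2": [10.0, 10.5, 9.5],
}

stats = {name: (class_mean(v), class_variance(v)) for name, v in training.items()}
for name, (mu, var) in stats.items():
    print(name, mu, var)
```

In this made-up data the feature is more reliable for class_2, whose values cluster tightly around the class mean.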
9
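20.3 Feature selection
20.3.2 Feature correlation
– The correlation of the features x and y in class j is $\sigma_{xyj} = \frac{\frac{1}{N_j}\sum_{i=1}^{N_j} (x_{ij} - \mu_{xj})(y_{ij} - \mu_{yj})}{\sigma_{xj}\,\sigma_{yj}}$, where $\mu_{xj}$, $\mu_{yj}$ are the class means and $\sigma_{xj}$, $\sigma_{yj}$ the class standard deviations of the two features.
– A value of zero indicates that the two features are uncorrelated, while a magnitude near 1 implies a high degree of correlation.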
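As an illustration, the within-class correlation can be computed as the covariance divided by the product of the standard deviations. The sketch below assumes the population form (division by the number of objects); the feature values are invented.

```python
import math


def correlation(xs, ys):
    """Correlation of two features measured on the same objects of one class."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    std_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
    std_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / n)
    return cov / (std_x * std_y)  # undefined if either feature is constant


# Hypothetical features for the objects of one class.
x = [1.0, 2.0, 3.0, 4.0]
y_redundant = [2.0, 4.0, 6.0, 8.0]      # proportional to x
y_independent = [1.0, -1.0, -1.0, 1.0]  # no linear relation to x

r_redundant = correlation(x, y_redundant)
r_independent = correlation(x, y_independent)
print(r_redundant, r_independent)
```

A feature as strongly correlated with another as `y_redundant` adds little information and is a candidate for elimination or combination.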
10
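20.3 Feature selection
20.3.3 Class separation distance
– The variance-normalized distance between two classes j and k for feature x is $D_{xjk} = \frac{|\mu_{xj} - \mu_{xk}|}{\sqrt{\sigma_{xj}^2 + \sigma_{xk}^2}}$, where $\mu_{xj}$ and $\sigma_{xj}^2$ are the mean and variance of x in class j.
– The greater the distance, the better the feature separates the two classes.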
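A short Python sketch of the variance-normalized distance, assuming it is the absolute difference of the class means divided by the square root of the summed class variances. The feature names (`area`, `perimeter`) and their values are hypothetical.

```python
import math


def mean(values):
    return sum(values) / len(values)


def variance(values):
    mu = mean(values)
    return sum((v - mu) ** 2 for v in values) / len(values)


def separation_distance(class_j, class_k):
    """Variance-normalized distance between two classes for one feature."""
    spread = math.sqrt(variance(class_j) + variance(class_k))
    return abs(mean(class_j) - mean(class_k)) / spread  # needs spread > 0


# Hypothetical feature values for classes j and k.
area_j, area_k = [1.0, 2.0, 3.0], [7.0, 8.0, 9.0]
perimeter_j, perimeter_k = [1.0, 5.0, 9.0], [2.0, 6.0, 10.0]

d_area = separation_distance(area_j, area_k)
d_perimeter = separation_distance(perimeter_j, perimeter_k)
print(d_area, d_perimeter)
```

With these numbers `area` separates the classes far better than `perimeter`, whose class means differ by much less than the within-class spread.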
11
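20.3 Feature selection
20.3.4 Dimension reduction
– Many features can be combined into a smaller number of features.
– Linear combination. Two features x and y can produce a new feature z by $z = ax + by$. Because the overall scale of z does not affect classification, this can be reduced to $z = x\cos\theta + y\sin\theta$.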
12
20.3 Feature selection
– This is a projection of the (x, y) plane onto the line z.
[Figure: Class 1 and Class 2 plotted in the (x, y) plane and projected onto the z axis.]
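The projection can be sketched as follows, assuming the single-angle form $z = x\cos\theta + y\sin\theta$. The two classes and the two angles are invented to show that the choice of projection direction decides whether the classes remain separable in the new feature.

```python
import math


def project(points, theta):
    """Project (x, y) feature pairs onto the line at angle theta: z = x cos + y sin."""
    return [x * math.cos(theta) + y * math.sin(theta) for x, y in points]


# Hypothetical classes that differ only in the sign of (x - y).
class_1 = [(1.0, 2.0), (2.0, 3.0)]
class_2 = [(2.0, 1.0), (3.0, 2.0)]

# Projecting at -45 degrees makes z proportional to (x - y): classes separate.
z1_good = project(class_1, -math.pi / 4)
z2_good = project(class_2, -math.pi / 4)

# Projecting at +45 degrees makes z proportional to (x + y): classes overlap.
z1_poor = project(class_1, math.pi / 4)
z2_poor = project(class_2, math.pi / 4)

print(z1_good, z2_good)
print(z1_poor, z2_poor)
```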
13
20.4 Statistical classification
20.4.1 Statistical decision theory
– An approach that performs classification by statistical methods. The PDFs of the features are assumed to be known.
– The PDF of a feature may be estimated by measuring a large number of objects and plotting a histogram of the feature.
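A minimal sketch of the histogram estimate: counts per bin are divided by the number of samples times the bin width so that the result integrates to 1. The range, bin count, and sample values are assumptions chosen for illustration; a real estimate would need far more samples.

```python
def histogram_pdf(samples, lo, hi, bins):
    """Estimate a PDF on [lo, hi) as a normalized histogram."""
    width = (hi - lo) / bins
    counts = [0] * bins
    for s in samples:
        if lo <= s < hi:
            index = min(int((s - lo) / width), bins - 1)  # guard against rounding
            counts[index] += 1
    n = len(samples)
    # Dividing by n * width turns counts into a density estimate per bin.
    return [c / (n * width) for c in counts]


# Hypothetical measurements of one feature for one class.
samples = [0.1, 0.4, 0.7, 1.2, 1.9, 2.1, 2.5, 2.8]
density = histogram_pdf(samples, lo=0.0, hi=3.0, bins=3)
print(density)
```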
14
20.4 Statistical classification
20.4.1.1 A priori probabilities
– The a priori probabilities $P(C_i)$ represent our knowledge about an object before it has been measured.
– The conditional probability $P(A \mid B)$ is the probability of event A given that event B has occurred.
15
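20.4 Statistical classification
20.4.1.2 Bayes’ theorem
– The a posteriori probability $P(C_i \mid x)$ is a conditional probability: the probability that the object belongs to class $C_i$ given that the feature value $x$ has been observed.
– Bayes’ theorem (two classes): $P(C_i \mid x) = \frac{p(x \mid C_i)\,P(C_i)}{\sum_{j=1}^{2} p(x \mid C_j)\,P(C_j)}$, where $P(C_i)$ is the a priori probability of class $C_i$ and $p(x \mid C_i)$ is the conditional PDF of the feature for that class.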
16
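20.4 Statistical classification
– Bayes’ theorem may be used for pattern classification. For example, when there are only two classes, an object with feature value $x$ is assigned to class 1 if $P(C_1 \mid x) > P(C_2 \mid x)$.
– This is equivalent to $p(x \mid C_1)\,P(C_1) > p(x \mid C_2)\,P(C_2)$, because both a posteriori probabilities share the same denominator.
– The classifier defined by this decision rule is called a maximum-likelihood classifier.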
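The two-class rule can be sketched in Python under the assumption that each class-conditional PDF is normal; the slides do not specify the PDFs, so the means, standard deviations, and priors below are hypothetical.

```python
import math


def normal_pdf(x, mean, std):
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def posteriors(x, priors, params):
    """A posteriori class probabilities from Bayes' theorem."""
    joint = [prior * normal_pdf(x, mean, std)
             for prior, (mean, std) in zip(priors, params)]
    evidence = sum(joint)  # common denominator of Bayes' theorem
    return [j / evidence for j in joint]


def classify(x, priors, params):
    """Assign to class 1 if its a posteriori probability is larger, else class 2."""
    post = posteriors(x, priors, params)
    return 1 if post[0] > post[1] else 2


# Assumed class-conditional PDFs: (mean, std) for class 1 and class 2.
params = [(0.0, 1.0), (4.0, 1.0)]
equal_priors = [0.5, 0.5]
skewed_priors = [0.9, 0.1]

print(classify(2.2, equal_priors, params))   # closer to the class 2 mean
print(classify(2.2, skewed_priors, params))  # the prior shifts the decision
```

The same measurement is assigned to different classes under the two sets of priors, which shows how the a priori probabilities move the decision threshold.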
17
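20.4 Statistical classification
– If there is more than one feature, so that the feature vector is $\mathbf{x} = [x_1, x_2, \ldots, x_n]^T$, and there are m classes, then Bayes’ theorem is $P(C_i \mid \mathbf{x}) = \frac{p(\mathbf{x} \mid C_i)\,P(C_i)}{\sum_{j=1}^{m} p(\mathbf{x} \mid C_j)\,P(C_j)}$.
– Bayes’ risk. The conditional risk is $R(C_i \mid \mathbf{x}) = \sum_{j=1}^{m} l_{ij}\,P(C_j \mid \mathbf{x})$, where $l_{ij}$ is the cost (loss) of assigning an object to class i when it really belongs in class j.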
18
20.4 Statistical classification
– Bayes’ decision rule. Each object should be assigned to the class that produces the minimum conditional risk $R(C_i \mid \mathbf{x})$. The Bayes’ risk is $R = \int \min_i R(C_i \mid \mathbf{x})\, p(\mathbf{x})\, d\mathbf{x}$.
– Parametric and nonparametric classifiers. If the functional form of the conditional PDFs is known but some parameters are unknown, the classifier is called parametric. If the functional form of some or all of the conditional PDFs is unknown, the classifier is called nonparametric.
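A small sketch of the minimum-risk decision, assuming the conditional risk of choosing class i is the sum over true classes j of the loss times the a posteriori probability of class j. The loss matrices and posterior values are invented; classes are indexed from 0 in the code.

```python
def conditional_risks(loss, posteriors):
    """Risk of assigning to each class i: sum over j of loss[i][j] * P(C_j | x)."""
    return [sum(l_ij * p_j for l_ij, p_j in zip(row, posteriors)) for row in loss]


def bayes_decision(loss, posteriors):
    """Index of the class with the minimum conditional risk."""
    risks = conditional_risks(loss, posteriors)
    return min(range(len(risks)), key=risks.__getitem__)


# Hypothetical a posteriori probabilities for one measured object.
posteriors = [0.7, 0.3]

# loss[i][j]: cost of assigning to class i when the object belongs to class j.
zero_one_loss = [[0.0, 1.0],
                 [1.0, 0.0]]
asymmetric_loss = [[0.0, 10.0],   # missing a true class 1 object is expensive
                   [1.0, 0.0]]

print(bayes_decision(zero_one_loss, posteriors))
print(bayes_decision(asymmetric_loss, posteriors))
```

With equal costs the rule picks the most probable class; with the asymmetric costs it picks the less probable class because the alternative mistake is ten times as expensive.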
19
20.4.3 Parameter estimation and classifier training
– The process of estimating the conditional PDFs or their parameters is referred to as training the classifier.
Supervised and unsupervised training
– Supervised training. The classes to which the objects in the training set belong are known.
– Unsupervised training. The conditional PDFs are estimated using samples whose class is unknown.
20
20.4.3 Parameter estimation and classifier training
Maximum-likelihood estimation
– The maximum-likelihood estimation approach assumes that the parameters to be estimated are fixed but unknown.
– The maximum-likelihood estimate of a parameter is the value that makes the occurrence of the observed training set most likely.
– The maximum-likelihood estimates of the mean and standard deviation of a normal distribution are the sample mean and sample standard deviation, respectively.
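For a normal distribution the maximum-likelihood estimates have a closed form, sketched below. Note that the maximum-likelihood standard deviation divides by N rather than N - 1. The sample values are invented, and the log-likelihood helper is included only to check that nearby parameter values make the training set less likely.

```python
import math


def ml_normal(samples):
    """Maximum-likelihood mean and standard deviation of a normal distribution."""
    n = len(samples)
    mean = sum(samples) / n
    std = math.sqrt(sum((s - mean) ** 2 for s in samples) / n)  # divides by n
    return mean, std


def log_likelihood(samples, mean, std):
    """Log of the probability density of the samples under N(mean, std^2)."""
    return sum(-0.5 * math.log(2.0 * math.pi * std ** 2)
               - (s - mean) ** 2 / (2.0 * std ** 2) for s in samples)


# Hypothetical training measurements of one feature for one class.
samples = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
mu_hat, sigma_hat = ml_normal(samples)
print(mu_hat, sigma_hat)
```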
21
20.4.3.3 Bayesian estimation
– Bayesian estimation treats the unknown parameter as a random variable that has a known a priori PDF before any samples are taken.
– After the training set has been measured, Bayes’ theorem is used to update the a priori PDF, which results in an a posteriori PDF of the unknown parameter value.
– The desired outcome is an a posteriori PDF with a single narrow peak centered on the true value of the parameter.
22
20.4.3.3 Bayesian estimation
An example of Bayesian estimation
– Estimate the mean $\mu$ of a normal distribution with known variance $\sigma^2$. The a priori PDF is $p(\mu)$.
– The functional form of the PDF of $x$ given the unknown mean is assumed to be known, $p(x \mid \mu)$; this means that given a value for $\mu$, we know $p(x \mid \mu)$.
– Suppose $X = \{x_1, x_2, \ldots, x_n\}$ represents the set of sample values obtained by measuring the training set.
23
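20.4.3.3 Bayesian estimation
– Bayes’ theorem gives the a posteriori PDF of the unknown mean $\mu$ given the set of training samples $X$: $p(\mu \mid X) = \frac{p(X \mid \mu)\,p(\mu)}{\int p(X \mid \mu)\,p(\mu)\,d\mu}$.
– What we really want is the PDF of the feature given the training samples, $p(x \mid X) = \int p(x \mid \mu)\,p(\mu \mid X)\,d\mu$.
– For example, if $p(\mu \mid X)$ has a single sharp peak at $\hat{\mu}$, it can be approximated as an impulse, $p(\mu \mid X) \approx \delta(\mu - \hat{\mu})$.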
24
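20.4.3.3 Bayesian estimation
– Then, if the a posteriori PDF $p(\mu \mid X)$ is approximated by an impulse at $\hat{\mu}$, $p(x \mid X) \approx p(x \mid \hat{\mu})$. This means that $\hat{\mu}$ is the best estimate of the unknown mean.
– If $p(\mu \mid X)$ has a relatively broad peak, then $p(x \mid X)$ becomes a weighted average of many PDFs.
– For a large training set, both maximum-likelihood and Bayesian estimation place the estimate of the unknown mean at the sample mean of the training set.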
25
20.4.3.3 Bayesian estimation
Steps of Bayesian estimation
– 1. Assume an a priori PDF for the unknown parameters.
– 2. Collect sample values from the population by measuring the training set.
– 3. Use Bayes’ theorem to refine the a priori PDF into the a posteriori PDF.
– 4. Form the joint density of x and the unknown parameter, and integrate out the latter to leave the desired estimate of the PDF.
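The four steps can be sketched for the mean of a normal distribution with known variance. The slides do not state the form of the a priori PDF; this sketch assumes a normal prior, for which the a posteriori PDF and the integrated-out PDF of x are also normal and have closed forms. All numeric values are hypothetical.

```python
def posterior_of_mean(samples, sigma2, mu0, sigma0_2):
    """Step 3: a posteriori mean and variance of the unknown mean.

    sigma2   -- known variance of the feature
    mu0      -- mean of the assumed normal a priori PDF (step 1)
    sigma0_2 -- variance of the assumed normal a priori PDF (step 1)
    """
    n = len(samples)
    sample_mean = sum(samples) / n
    denom = n * sigma0_2 + sigma2
    mu_n = (n * sigma0_2 * sample_mean + sigma2 * mu0) / denom
    var_n = sigma0_2 * sigma2 / denom
    return mu_n, var_n


def predictive(samples, sigma2, mu0, sigma0_2):
    """Step 4: mean and variance of the estimated PDF of x given the samples."""
    mu_n, var_n = posterior_of_mean(samples, sigma2, mu0, sigma0_2)
    return mu_n, sigma2 + var_n


# Step 2: hypothetical training measurements (sample mean 5.0).
samples = [4.0, 5.0, 6.0, 5.5, 4.5]

broad = posterior_of_mean(samples, sigma2=1.0, mu0=0.0, sigma0_2=4.0)
narrow = posterior_of_mean(samples, sigma2=1.0, mu0=0.0, sigma0_2=0.01)
many = posterior_of_mean(samples * 200, sigma2=1.0, mu0=0.0, sigma0_2=0.01)
print(broad, narrow, many)
```

A broad prior lets the samples dominate, a narrow prior holds the estimate near the prior mean, and with many samples the estimate approaches the sample mean even under the narrow prior.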
26
20.4.3.3 Bayesian estimation
– If we have strong ideas about the probable values of the unknown parameter, we may assume a narrow a priori PDF; otherwise, we should assume a relatively broad one.
27
20.5 Neural networks
Layered feedforward neural networks
– Each unit applies an activation function, usually a sigmoidal function, to a weighted sum of its inputs.
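A minimal sketch of a forward pass through a layered feedforward network, assuming each unit computes the logistic sigmoid of a weighted sum of its inputs plus a bias. The layer sizes, weights, and biases are hypothetical; in practice they would be set by training.

```python
import math


def sigmoid(a):
    """Logistic sigmoid activation."""
    return 1.0 / (1.0 + math.exp(-a))


def layer(inputs, weights, biases):
    """One layer: each unit takes a weighted sum of all inputs, then the sigmoid."""
    return [sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]


def feedforward(inputs, layers):
    """Pass the feature vector through the layers in order."""
    for weights, biases in layers:
        inputs = layer(inputs, weights, biases)
    return inputs


# Hypothetical network: 2 features -> 2 hidden units -> 1 output unit.
layers = [
    ([[0.5, -0.5], [1.0, 1.0]], [0.0, -1.0]),
    ([[1.0, -1.0]], [0.0]),
]
output = feedforward([1.0, 1.0], layers)
print(output)
```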