Published by Adele Taylor. Modified over 9 years ago.
1
Classification with several populations
Presented by: Libin Zhou
2
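Classification procedure: Minimum Expected Cost of Misclassification (ECM) method
The ECM for two populations is
$$\mathrm{ECM} = c(2 \mid 1)\,P(2 \mid 1)\,p_1 + c(1 \mid 2)\,P(1 \mid 2)\,p_2,$$
where $P(k \mid i)$ is the conditional probability of assigning an item from population $i$ to population $k$, $p_i$ is the prior probability of population $i$, and $c(k \mid i)$ is the cost of that misclassification.
The ECM for $g$ populations is
$$\mathrm{ECM} = \sum_{i=1}^{g} p_i \Bigl( \sum_{k \ne i} P(k \mid i)\, c(k \mid i) \Bigr).$$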
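The multi-population ECM is a prior-weighted sum of expected costs, which is easy to check numerically. The sketch below is only an illustration: the function name `expected_cost` and all numbers are made up for the example and do not come from the slides.

```python
def expected_cost(priors, misclass_prob, cost):
    """ECM = sum_i p_i * sum_{k != i} P(k|i) * c(k|i).

    misclass_prob[i][k] holds P(k|i): the probability that an item from
    population i is assigned to population k. cost[i][k] holds c(k|i).
    """
    g = len(priors)
    total = 0.0
    for i in range(g):
        # Expected cost of misclassifying items that truly come from population i.
        inner = sum(misclass_prob[i][k] * cost[i][k] for k in range(g) if k != i)
        total += priors[i] * inner
    return total


# Hypothetical two-population illustration (values are invented).
priors = [0.6, 0.4]
misclass_prob = [[0.9, 0.1],   # P(1|1), P(2|1)
                 [0.2, 0.8]]   # P(1|2), P(2|2)
cost = [[0.0, 5.0],            # c(2|1) = 5
        [10.0, 0.0]]           # c(1|2) = 10
ecm = expected_cost(priors, misclass_prob, cost)
```

With two populations the double sum collapses to the two-term formula, here 0.6·0.1·5 + 0.4·0.2·10.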
3
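Minimum ECM classification rule (Result 11.5 on page 614)
Let $f_i(x)$ denote the density of population $i$. Assigning $x$ to the population $\pi_k$ for which
$$\sum_{i \ne k} p_i\, f_i(x)\, c(k \mid i)$$
is smallest minimizes the ECM.
If all misclassification costs are equal, the rule simplifies to: allocate $x$ to $\pi_k$ if
$$p_k f_k(x) > p_i f_i(x) \quad \text{for all } i \ne k,$$
or, equivalently,
$$\ln p_k f_k(x) > \ln p_i f_i(x) \quad \text{for all } i \ne k.$$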
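As a rough sketch of how the rule is applied at a single observation, the code below scores each candidate population k by the sum over i ≠ k of p_i f_i(x) c(k|i) and picks the smallest. The function name `min_ecm_assign`, the priors, the density values, and the cost matrix are all hypothetical; indices are 0-based.

```python
def min_ecm_assign(priors, densities, cost):
    """Return (k, scores), where k minimizes sum_{i != k} p_i f_i(x) c(k|i).

    densities[i] is f_i(x) at the observation being classified.
    cost[i][k] is c(k|i), the cost of assigning an item from i to k.
    """
    g = len(priors)
    scores = [
        sum(priors[i] * densities[i] * cost[i][k] for i in range(g) if i != k)
        for k in range(g)
    ]
    best = min(range(g), key=lambda k: scores[k])
    return best, scores


# Hypothetical inputs for three populations.
priors = [0.05, 0.60, 0.35]
densities = [0.01, 0.85, 2.0]
cost = [[0, 10, 50],
        [500, 0, 200],
        [100, 50, 0]]
chosen, scores = min_ecm_assign(priors, densities, cost)

# With equal costs the rule reduces to picking the largest p_i * f_i(x).
equal_cost = [[0 if i == k else 1 for k in range(3)] for i in range(3)]
chosen_equal, _ = min_ecm_assign(priors, densities, equal_cost)
```

In this made-up case the unequal costs change the decision: the equal-cost rule picks the population with the largest p_i f_i(x), while the cost-weighted rule picks a different one.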
4
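Maximum posterior probability rule
The posterior probability is
$$P(\pi_k \mid x) = \frac{p_k f_k(x)}{\sum_{i=1}^{g} p_i f_i(x)} = P(x \text{ comes from population } k \text{ given that } x \text{ was observed}), \quad k = 1, 2, \dots, g.$$
Allocating $x$ to the population with the largest posterior probability generalizes the largest posterior probability rule for two-population classification (Equation (11-9)).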
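Posterior probabilities are just the products p_k f_k(x) normalized to sum to one. The sketch below works on log densities and subtracts the maximum before exponentiating, a common way to avoid underflow; the function name `posteriors` and the input values are hypothetical.

```python
import math


def posteriors(priors, log_densities):
    """P(pi_k | x) = p_k f_k(x) / sum_i p_i f_i(x), computed on the log scale."""
    logs = [math.log(p) + lf for p, lf in zip(priors, log_densities)]
    shift = max(logs)  # subtracting the max keeps exp() from underflowing
    weights = [math.exp(v - shift) for v in logs]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical priors and density values at one observation.
priors = [0.05, 0.60, 0.35]
log_densities = [math.log(0.01), math.log(0.85), math.log(2.0)]
post = posteriors(priors, log_densities)
assigned = max(range(len(post)), key=lambda k: post[k])  # 0-based index
```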
5
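Classification with normal populations
When the populations are multivariate normal, the term $\ln p_i f_i(x)$ in the minimum ECM classification rule with equal misclassification costs (Equation (11-41)) can be written as
$$\ln p_i f_i(x) = \ln p_i - \frac{p}{2}\ln(2\pi) - \frac{1}{2}\ln\lvert\Sigma_i\rvert - \frac{1}{2}(x - \mu_i)'\,\Sigma_i^{-1}(x - \mu_i),$$
where $p$ in the term $\frac{p}{2}\ln(2\pi)$ is the number of variables, not a prior probability. Dropping that term, which is the same for every population, we get
$$d_i^Q(x) = -\frac{1}{2}\ln\lvert\Sigma_i\rvert - \frac{1}{2}(x - \mu_i)'\,\Sigma_i^{-1}(x - \mu_i) + \ln p_i, \quad i = 1, 2, \dots, g,$$
where $d_i^Q(x)$ is the quadratic discrimination score.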
6
Minimum total probability of misclassification (TPM) rule for normal populations with different $\Sigma_i$
Allocate $x$ to population $\pi_k$ if the quadratic discrimination score
$$d_k^Q(x) = \text{largest of } d_1^Q(x),\, d_2^Q(x),\, \dots,\, d_g^Q(x).$$
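A minimal sketch of the quadratic score with known parameters, assuming NumPy is available. The function names and the two example populations are hypothetical; `np.linalg.slogdet` and `np.linalg.solve` are used instead of an explicit determinant and inverse for numerical stability.

```python
import numpy as np


def quadratic_score(x, mu, sigma, prior):
    """d^Q(x) = -0.5 ln|Sigma| - 0.5 (x - mu)' Sigma^{-1} (x - mu) + ln p."""
    sigma = np.asarray(sigma, dtype=float)
    diff = np.asarray(x, dtype=float) - np.asarray(mu, dtype=float)
    _, logdet = np.linalg.slogdet(sigma)          # log of |Sigma|
    maha = diff @ np.linalg.solve(sigma, diff)    # (x - mu)' Sigma^{-1} (x - mu)
    return float(-0.5 * logdet - 0.5 * maha + np.log(prior))


def classify_quadratic(x, mus, sigmas, priors):
    """Return the 0-based index of the largest quadratic score, plus all scores."""
    scores = [quadratic_score(x, m, s, p) for m, s, p in zip(mus, sigmas, priors)]
    return int(np.argmax(scores)), scores


# Hypothetical populations with different covariance matrices.
mus = [[0.0, 0.0], [3.0, 3.0]]
sigmas = [np.eye(2), 4.0 * np.eye(2)]
priors = [0.5, 0.5]
k, scores = classify_quadratic([1.0, 1.0], mus, sigmas, priors)
```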
7
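Estimated minimum TPM rule for several normal populations with different $\Sigma_i$
In practice, $\mu_i$ and $\Sigma_i$ are usually unknown, but a training set of correctly classified observations is often available for constructing estimates. The relevant sample quantities for population $i$ are the sample mean vector $\bar{x}_i$ and the sample covariance matrix $S_i$.
The estimated quadratic discrimination score can then be written as
$$\hat{d}_i^Q(x) = -\frac{1}{2}\ln\lvert S_i\rvert - \frac{1}{2}(x - \bar{x}_i)'\,S_i^{-1}(x - \bar{x}_i) + \ln p_i, \quad i = 1, 2, \dots, g.$$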
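The plug-in step can be sketched as follows: estimate the mean vector and covariance matrix of each population from its training rows, then evaluate the quadratic score with those estimates. The function names and the tiny training sets are hypothetical; `np.cov(..., rowvar=False)` uses the usual n − 1 divisor.

```python
import numpy as np


def estimate_population(samples):
    """Return (sample mean vector, sample covariance matrix) for one population."""
    samples = np.asarray(samples, dtype=float)
    return samples.mean(axis=0), np.cov(samples, rowvar=False)


def estimated_quadratic_score(x, xbar, s, prior):
    """Quadratic score with the sample mean and covariance plugged in."""
    diff = np.asarray(x, dtype=float) - xbar
    _, logdet = np.linalg.slogdet(s)
    maha = diff @ np.linalg.solve(s, diff)
    return float(-0.5 * logdet - 0.5 * maha + np.log(prior))


def classify(x, training_sets, priors):
    """Assign x to the population (0-based) with the largest estimated score."""
    estimates = [estimate_population(t) for t in training_sets]
    scores = [estimated_quadratic_score(x, xbar, s, p)
              for (xbar, s), p in zip(estimates, priors)]
    return int(np.argmax(scores))


# Hypothetical training data: rows are observations, columns are variables.
train_a = [[0, 0], [1, 0], [0, 1], [1, 1]]
train_b = [[5, 5], [7, 5], [5, 7], [7, 7]]
label_near_a = classify([0.8, 0.2], [train_a, train_b], [0.5, 0.5])
label_near_b = classify([6.0, 6.0], [train_a, train_b], [0.5, 0.5])
```

Each population needs more observations than variables for its sample covariance matrix to be invertible.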
8
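The estimated minimum TPM rule for equal-covariance normal populations
If the covariance matrices of the populations are equal, the quadratic discrimination score simplifies to a linear discriminant score, which is estimated using the pooled estimate of the covariance matrix:
$$S_{\text{pooled}} = \frac{(n_1 - 1)S_1 + (n_2 - 1)S_2 + \cdots + (n_g - 1)S_g}{n_1 + n_2 + \cdots + n_g - g},$$
$$\hat{d}_i(x) = \bar{x}_i'\,S_{\text{pooled}}^{-1}\,x - \frac{1}{2}\,\bar{x}_i'\,S_{\text{pooled}}^{-1}\,\bar{x}_i + \ln p_i, \quad i = 1, 2, \dots, g,$$
where $n_i$ is the size of the training sample from population $i$.
We can also define a new variable, the generalized squared distance:
$$D_i^2(x) = (x - \bar{x}_i)'\,S_{\text{pooled}}^{-1}\,(x - \bar{x}_i).$$
Up to a term that is the same for every population, the sample discriminant score can then be written as
$$\hat{d}_i(x) = -\frac{1}{2}D_i^2(x) + \ln p_i.$$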
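The equal-covariance case can be sketched in a few lines: pool the group covariance matrices, then compute either the generalized squared distance or the linear score. Function names and training data are hypothetical, and the pooled estimate assumed here is the usual degrees-of-freedom weighted average of the group covariance matrices.

```python
import numpy as np


def pooled_covariance(groups):
    """S_pooled = sum_i (n_i - 1) S_i / (sum_i n_i - g)."""
    groups = [np.asarray(grp, dtype=float) for grp in groups]
    weighted = sum((len(grp) - 1) * np.cov(grp, rowvar=False) for grp in groups)
    dof = sum(len(grp) for grp in groups) - len(groups)
    return weighted / dof


def squared_distance(x, xbar, s_pooled):
    """D^2(x) = (x - xbar)' S_pooled^{-1} (x - xbar)."""
    diff = np.asarray(x, dtype=float) - xbar
    return float(diff @ np.linalg.solve(s_pooled, diff))


def linear_score(x, xbar, s_pooled, prior):
    """xbar' S^{-1} x - 0.5 xbar' S^{-1} xbar + ln p."""
    coef = np.linalg.solve(s_pooled, xbar)
    return float(coef @ np.asarray(x, dtype=float) - 0.5 * coef @ xbar + np.log(prior))


# Hypothetical training data for three populations.
groups = [
    [[0, 0], [1, 0], [0, 1], [1, 1]],
    [[5, 5], [7, 5], [5, 7], [7, 7]],
    [[0, 6], [2, 6], [0, 8], [2, 8]],
]
means = [np.mean(grp, axis=0) for grp in groups]
s_pooled = pooled_covariance(groups)
x_new = [1.0, 1.0]
distances = [squared_distance(x_new, m, s_pooled) for m in means]
lin_scores = [linear_score(x_new, m, s_pooled, 1 / 3) for m in means]
closest = int(np.argmin(distances))
```

The linear score and −D²/2 + ln p differ only by the term x'S⁻¹x/2, which does not depend on the population, so both orderings pick the same group.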
9
Example 11.11. Classifying a potential business-school graduate student
Introduction: The admission officer of a business school has used an “index” of undergraduate grade point average (GPA) and graduate management aptitude test (GMAT) scores to help decide which applicants should be admitted to the school’s graduate programs.
Analysis:
Populations: Pop1: admit; Pop2: do not admit; Pop3: borderline
Variables: x1: GPA; x2: GMAT
Question: Allocate a new applicant with (GPA, GMAT) = (3.21, 497) using the sample discriminant scores.
10
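Resolution
1) Calculate the sample mean vector $\bar{x}_i$ for each population.
2) Calculate the pooled covariance matrix $S_{\text{pooled}}$.
3) Calculate the sample squared distances using the sample squared distance function
$$D_i^2(x_0) = (x_0 - \bar{x}_i)'\,S_{\text{pooled}}^{-1}\,(x_0 - \bar{x}_i), \quad i = 1, 2, 3,$$
where $x_0 = (3.21, 497)'$.
4) Results: $D_1^2(x_0) = 2.58$; $D_2^2(x_0) = 17.10$; $D_3^2(x_0) = 2.47$.
By the rule of assigning $x_0$ to the “closest” population, the new applicant should be assigned to population 3, borderline.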
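The final decision step can be reproduced from the three reported distances alone. The sketch below assumes equal prior probabilities, which the slides do not state explicitly but which is implied by assigning to the "closest" population; the estimated posteriors use the equal-covariance normal model, under which p_i f_i(x) is proportional to p_i exp(−D_i²/2).

```python
import math

# Squared distances reported in the slides for the applicant (3.21, 497).
sq_dist = {1: 2.58, 2: 17.10, 3: 2.47}

# Assumption: equal priors for the three populations.
priors = {i: 1 / 3 for i in sq_dist}

# Sample discriminant score: -0.5 * D_i^2 + ln p_i.
scores = {i: -0.5 * d + math.log(priors[i]) for i, d in sq_dist.items()}
assigned = max(scores, key=scores.get)

# Estimated posterior probabilities, normalized over the three populations.
weights = {i: math.exp(s) for i, s in scores.items()}
total = sum(weights.values())
post = {i: w / total for i, w in weights.items()}
```

Under these assumptions the applicant goes to population 3, but the estimated posteriors for populations 1 and 3 are close to each other, which reflects how similar the two distances are.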