Classification with several populations Presented by: Libin Zhou.

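Classification procedure
Minimum Expected Cost of Misclassification (ECM) method

The ECM for two populations is
$$\mathrm{ECM} = c(2|1)\,P(2|1)\,p_1 + c(1|2)\,P(1|2)\,p_2$$
where $P(k|i)$ is the conditional probability of classifying an observation from population $i$ into population $k$, $p_i$ is the prior probability of population $i$, and $c(k|i)$ is the cost of that misclassification.

The ECM for $g$ populations is
$$\mathrm{ECM} = \sum_{i=1}^{g} p_i \left( \sum_{k=1,\, k \ne i}^{g} P(k|i)\, c(k|i) \right)$$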
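
As a rough illustration of the multi-population ECM formula, the sketch below sums the prior-weighted misclassification costs. The function name `expected_cost` and all numbers are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def expected_cost(priors, misclass_prob, cost):
    """ECM = sum_i p_i * sum_{k != i} P(k|i) * c(k|i).

    misclass_prob[i][k] holds P(k|i) and cost[i][k] holds c(k|i),
    with 0-based indices standing in for populations 1..g.
    """
    g = len(priors)
    total = 0.0
    for i in range(g):
        # Only k != i contributes: a correct classification has no cost.
        inner = sum(misclass_prob[i][k] * cost[i][k] for k in range(g) if k != i)
        total += priors[i] * inner
    return total


# Hypothetical three-population setup (not taken from the slides).
priors = [0.5, 0.3, 0.2]
P = [[0.8, 0.1, 0.1],
     [0.2, 0.7, 0.1],
     [0.1, 0.3, 0.6]]
c = [[0, 10, 50],
     [5, 0, 20],
     [30, 15, 0]]
ecm_three = expected_cost(priors, P, c)

# With g = 2 the same function reproduces the two-population formula.
ecm_two = expected_cost([0.6, 0.4], [[0.9, 0.1], [0.2, 0.8]], [[0, 5], [10, 0]])
```

With g = 2 the double sum collapses to the two terms of the two-population ECM, which is why one function can serve both cases.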

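Minimum ECM classification rule
Result 11.5 on page 614: allocate $x$ to the population $\pi_k$, $k = 1, 2, \ldots, g$, for which
$$\sum_{i=1,\, i \ne k}^{g} p_i f_i(x)\, c(k|i)$$
is smallest, where $f_i(x)$ is the density of population $i$. This choice minimizes the ECM. If all misclassification costs are equal, the rule simplifies to: allocate $x$ to $\pi_k$ if
$$p_k f_k(x) > p_i f_i(x) \quad \text{for all } i \ne k$$
or, equivalently,
$$\ln p_k f_k(x) > \ln p_i f_i(x) \quad \text{for all } i \ne k$$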
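
The rule can be sketched in a few lines. The function names and the priors, density values, and costs below are hypothetical. They are picked so that the cost-weighted rule and the equal-cost rule disagree, which shows why the costs matter.

```python
def min_ecm_allocate(priors, densities, cost):
    """Return the 0-based index k minimizing sum_{i != k} p_i f_i(x) c(k|i).

    densities[i] is f_i(x) evaluated at the observed x;
    cost[i][k] is c(k|i), the cost of sending a population-i item to k.
    """
    g = len(priors)

    def score(k):
        return sum(priors[i] * densities[i] * cost[i][k] for i in range(g) if i != k)

    return min(range(g), key=score)


def equal_cost_allocate(priors, densities):
    """Equal-cost special case: pick the k with the largest p_k f_k(x)."""
    return max(range(len(priors)), key=lambda k: priors[k] * densities[k])


# Hypothetical values for one observed x (not taken from the slides).
priors = [0.05, 0.60, 0.35]
densities = [0.01, 0.85, 2.0]
cost = [[0, 10, 50],
        [500, 0, 200],
        [100, 50, 0]]

k_cost = min_ecm_allocate(priors, densities, cost)   # 0-based index
k_equal = equal_cost_allocate(priors, densities)     # 0-based index
```

In this setup the cost-weighted rule picks the second population, while the equal-cost rule picks the third: misclassifying a second-population item is expensive enough to override the larger value of p_3 f_3(x).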

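Maximum posterior probability rule
The posterior probability is
$$P(\pi_k \mid x) = \frac{p_k f_k(x)}{\sum_{i=1}^{g} p_i f_i(x)}$$
= P(x comes from population k given that x was observed), for k = 1, 2, …, g. Allocate x to the population with the largest posterior probability. Because the denominator is the same for every k, this coincides with the minimum ECM rule with equal misclassification costs. The rule generalizes the largest posterior probability rule for two-population classification (Equation (11-9)).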
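
A minimal sketch of the posterior computation, assuming the density values f_i(x) have already been evaluated at the observed x. The function name and the numbers are hypothetical.

```python
def posteriors(priors, densities):
    """Return P(pi_k | x) = p_k f_k(x) / sum_i p_i f_i(x) for each k."""
    weighted = [p * f for p, f in zip(priors, densities)]
    total = sum(weighted)
    return [w / total for w in weighted]


# Hypothetical priors and density values at one observed x.
priors = [0.05, 0.60, 0.35]
densities = [0.01, 0.85, 2.0]

post = posteriors(priors, densities)
best = max(range(len(post)), key=post.__getitem__)  # 0-based index of the winner
```

Normalizing by the common denominator does not change which population wins; it only turns the products p_k f_k(x) into probabilities that sum to 1.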

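Classification with normal populations
When the populations are multivariate normal, with densities
$$f_i(x) = \frac{1}{(2\pi)^{p/2}\,|\Sigma_i|^{1/2}} \exp\left[-\frac{1}{2}(x-\mu_i)'\Sigma_i^{-1}(x-\mu_i)\right], \quad i = 1, 2, \ldots, g$$
the term $\ln p_i f_i(x)$ in the minimum ECM classification rule with equal misclassification costs (Equation (11-41)) can be written as
$$\ln p_i f_i(x) = \ln p_i - \frac{p}{2}\ln(2\pi) - \frac{1}{2}\ln|\Sigma_i| - \frac{1}{2}(x-\mu_i)'\Sigma_i^{-1}(x-\mu_i)$$
The constant $(p/2)\ln(2\pi)$ is the same for every population and can be dropped, which gives
$$d_i^Q(x) = -\frac{1}{2}\ln|\Sigma_i| - \frac{1}{2}(x-\mu_i)'\Sigma_i^{-1}(x-\mu_i) + \ln p_i$$
where $d_i^Q(x)$ is the quadratic discrimination score and i = 1, 2, …, g. Here p without a subscript is the number of variables, while $p_i$ is the prior probability of population i.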

Minimum total probability of misclassification (TPM) rule for normal populations with different $\Sigma_i$
Allocate $x$ to population $\pi_k$ if the quadratic discrimination score
$$d_k^Q(x) = \text{largest of } d_1^Q(x),\, d_2^Q(x),\, \ldots,\, d_g^Q(x)$$
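
The sketch below evaluates the quadratic discrimination score and applies the largest-score rule. To stay within the standard library it is restricted to two variables (p = 2), so the determinant and inverse are written out by hand. The function names and parameter values are hypothetical.

```python
import math


def quadratic_score(x, mu, Sigma, prior):
    """d^Q(x) = -0.5 ln|Sigma| - 0.5 (x-mu)' Sigma^{-1} (x-mu) + ln(prior).

    Restricted to 2x2 covariance matrices (an assumption of this sketch).
    """
    (a, b), (c, e) = Sigma
    det = a * e - b * c
    d0, d1 = x[0] - mu[0], x[1] - mu[1]
    # (x-mu)' Sigma^{-1} (x-mu), using the closed-form 2x2 inverse.
    quad = (e * d0 * d0 - (b + c) * d0 * d1 + a * d1 * d1) / det
    return -0.5 * math.log(det) - 0.5 * quad + math.log(prior)


def allocate(x, mus, Sigmas, priors):
    """Return (0-based index of the largest score, list of all scores)."""
    scores = [quadratic_score(x, m, S, p) for m, S, p in zip(mus, Sigmas, priors)]
    return max(range(len(scores)), key=scores.__getitem__), scores


# Hypothetical two-population setup with unequal covariances.
mus = [[0.0, 0.0], [3.0, 3.0]]
Sigmas = [[[1.0, 0.0], [0.0, 1.0]],
          [[2.0, 0.0], [0.0, 2.0]]]
priors = [0.5, 0.5]

k, scores = allocate([0.5, 0.2], mus, Sigmas, priors)
```

For more than two variables a linear-algebra library would replace the hand-written inverse; the scoring logic stays the same.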

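Estimated minimum TPM rule for several normal populations with different $\Sigma_i$
In practice, $\mu_i$ and $\Sigma_i$ are usually unknown, but a training set of correctly classified observations is often available for constructing estimates. The relevant sample quantities for population i are the sample mean vector $\bar{x}_i$, the sample covariance matrix $S_i$, and the sample size $n_i$. The estimated quadratic discrimination score is then
$$\hat{d}_i^Q(x) = -\frac{1}{2}\ln|S_i| - \frac{1}{2}(x-\bar{x}_i)' S_i^{-1} (x-\bar{x}_i) + \ln p_i, \quad i = 1, 2, \ldots, g$$
and x is allocated to the population with the largest $\hat{d}_i^Q(x)$.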
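
The sample quantities can be computed from a training set as sketched below. The helper names and the tiny data set are hypothetical. The covariance uses the divisor n - 1, the usual unbiased sample covariance.

```python
def sample_mean(rows):
    """Column-wise mean of a list of observation vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def sample_cov(rows):
    """Sample covariance matrix with divisor n - 1."""
    n = len(rows)
    m = sample_mean(rows)
    p = len(m)
    return [[sum((r[a] - m[a]) * (r[b] - m[b]) for r in rows) / (n - 1)
             for b in range(p)]
            for a in range(p)]


# Hypothetical training observations for one population (3 items, 2 variables).
training = [[1.0, 2.0], [3.0, 4.0], [5.0, 9.0]]

xbar = sample_mean(training)   # estimate of mu_i
S = sample_cov(training)       # estimate of Sigma_i
n = len(training)              # sample size n_i
```

Running this once per population yields the mean vector, covariance matrix, and sample size that the estimated score needs.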

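The estimated minimum TPM rule for equal-covariance normal populations
If the covariance matrices of the populations are equal, $\Sigma_i = \Sigma$ for all i, the terms of the quadratic discrimination score that do not depend on i can be dropped. The score then simplifies to an estimated linear discriminant score based on the pooled estimate of the covariance:
$$S_{\text{pooled}} = \frac{1}{n_1 + n_2 + \cdots + n_g - g}\left[(n_1-1)S_1 + (n_2-1)S_2 + \cdots + (n_g-1)S_g\right]$$
$$\hat{d}_i(x) = \bar{x}_i' S_{\text{pooled}}^{-1} x - \frac{1}{2}\bar{x}_i' S_{\text{pooled}}^{-1}\bar{x}_i + \ln p_i, \quad i = 1, 2, \ldots, g$$
We can also define a new variable, the generalized squared distance:
$$D_i^2(x) = (x-\bar{x}_i)' S_{\text{pooled}}^{-1} (x-\bar{x}_i)$$
Up to a term that is the same for every population, the sample discriminant score can then be written as
$$\hat{d}_i(x) = -\frac{1}{2}D_i^2(x) + \ln p_i$$
and x is allocated to the population with the largest score. With equal prior probabilities this amounts to assigning x to the closest population.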
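
The sketch below pools the covariance matrices and checks numerically that the linear score and the distance-based score differ only by a term that is the same for every population, so they rank the populations identically. It is restricted to two variables so the inverse can be written by hand. All names and numbers are hypothetical.

```python
import math


def pooled_cov(covs, sizes):
    """S_pooled = sum_i (n_i - 1) S_i / (n_1 + ... + n_g - g)."""
    g, p = len(covs), len(covs[0])
    denom = sum(sizes) - g
    return [[sum((n - 1) * S[a][b] for S, n in zip(covs, sizes)) / denom
             for b in range(p)]
            for a in range(p)]


def inv_2x2(S):
    """Closed-form inverse of a 2x2 matrix (assumption: p = 2)."""
    (a, b), (c, e) = S
    det = a * e - b * c
    return [[e / det, -b / det], [-c / det, a / det]]


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def matvec(M, v):
    return [dot(row, v) for row in M]


def linear_score(x, xbar, S_inv, prior):
    """xbar' S^{-1} x - 0.5 xbar' S^{-1} xbar + ln(prior)."""
    return dot(xbar, matvec(S_inv, x)) - 0.5 * dot(xbar, matvec(S_inv, xbar)) + math.log(prior)


def distance_score(x, xbar, S_inv, prior):
    """-0.5 D^2(x) + ln(prior), with D^2 the generalized squared distance."""
    d = [xi - mi for xi, mi in zip(x, xbar)]
    return -0.5 * dot(d, matvec(S_inv, d)) + math.log(prior)


# Hypothetical sample quantities for two populations.
covs = [[[1.0, 0.0], [0.0, 1.0]], [[3.0, 1.0], [1.0, 2.0]]]
sizes = [3, 5]
xbars = [[1.0, 2.0], [3.0, 1.0]]
priors = [0.4, 0.6]
x = [2.0, 1.5]

Sp = pooled_cov(covs, sizes)
S_inv = inv_2x2(Sp)
# The gap between the two scores is 0.5 * x' S^{-1} x for every population.
gaps = [linear_score(x, m, S_inv, p) - distance_score(x, m, S_inv, p)
        for m, p in zip(xbars, priors)]
```

Expanding the quadratic form in $D_i^2(x)$ shows where the constant gap comes from: the only term not shared with the linear score is $-\frac{1}{2}x' S_{\text{pooled}}^{-1} x$, which does not depend on i.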

Example 11.11. Classifying a potential business-school graduate student
Introduction: the admission officer of a business school has used an "index" of undergraduate grade point average (GPA) and graduate management aptitude test (GMAT) scores to help decide which applicants should be admitted to the school's graduate programs.
Analysis:
Populations: Pop1: admit; Pop2: do not admit; Pop3: borderline
Variables: x1: GPA; x2: GMAT
Question: allocate a new applicant with x = (3.21, 497) using the sample discriminant scores.

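Resolution
1) Calculate the sample mean vector $\bar{x}_i$ for each population.
2) Calculate the pooled covariance matrix $S_{\text{pooled}}$.
3) Calculate the sample squared distances using the sample squared distance function
$$D_i^2(x) = (x-\bar{x}_i)' S_{\text{pooled}}^{-1} (x-\bar{x}_i), \quad i = 1, 2, 3$$
4) Results: $D_1^2(x) = 2.58$; $D_2^2(x) = 17.10$; $D_3^2(x) = 2.47$.
By the rule of assigning x to the "closest" population, which presumes equal prior probabilities, the new applicant should be assigned to population 3, borderline.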
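
The distance calculation in step 3 can be sketched as follows. The slides do not give the mean vectors or the pooled covariance matrix; the values below are assumed from the textbook version of this example and should be checked against the source before being relied on. Only the new applicant's scores (3.21, 497) and the three reported distances come from the slides.

```python
def squared_distance(x, xbar, S):
    """D^2(x) = (x - xbar)' S^{-1} (x - xbar) for a 2x2 matrix S."""
    (a, b), (c, e) = S
    det = a * e - b * c
    d0, d1 = x[0] - xbar[0], x[1] - xbar[1]
    # Quadratic form with the closed-form 2x2 inverse.
    return (e * d0 * d0 - (b + c) * d0 * d1 + a * d1 * d1) / det


x0 = [3.21, 497.0]  # new applicant: (GPA, GMAT), from the slides

# ASSUMED summary statistics (not stated in the slides).
xbars = [[3.40, 561.23],   # Pop1: admit
         [2.48, 447.07],   # Pop2: do not admit
         [2.99, 446.23]]   # Pop3: borderline
S_pooled = [[0.0361, -2.0188],
            [-2.0188, 3655.9011]]

D2 = [squared_distance(x0, m, S_pooled) for m in xbars]
closest = min(range(3), key=D2.__getitem__) + 1  # 1-based population number
```

Under these assumed inputs the three distances come out within a few hundredths of the reported 2.58, 17.10, and 2.47, and population 3 is the closest. The small differences are consistent with rounding in the summary statistics.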
