Discrimination and Classification
Discrimination & Classification
Discrimination: the goal is to separate individual cases into known population groups based on their measurements on several variables.
- Makes use of graphical and algebraic methods.
- Aims to separate the groups as much as possible based on the numeric values.
- Referred to as “Separation.”
Classification: new cases are observed along with their numeric values and assigned to groups based on those values.
- Makes use of a rule generated from cases of known origin, applied to new cases whose population is unknown.
- Referred to as “Allocation.” (A minimal sketch of both tasks follows below.)
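One way to see the two tasks together in code, as a minimal sketch: a rule is built on cases of known origin (separation) and then applied to new cases of unknown origin (allocation). This uses scikit-learn's LinearDiscriminantAnalysis purely as a stand-in; the slides develop the rule from first principles rather than through this library, and the simulated data below are only illustrative.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(0)

# Known cases from two populations (separation: the rule is built on these).
X1 = rng.multivariate_normal([0.0, 0.0], np.eye(2), size=50)
X2 = rng.multivariate_normal([2.0, 1.0], np.eye(2), size=50)
X = np.vstack([X1, X2])
y = np.array([1] * 50 + [2] * 50)

rule = LinearDiscriminantAnalysis().fit(X, y)   # discrimination / separation

# New cases of unknown origin (allocation: assign them using the fitted rule).
X_new = rng.multivariate_normal([1.0, 0.5], np.eye(2), size=5)
print(rule.predict(X_new))
```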
Notation and Concepts
Notation:
- Populations ≡ π1, π2
- Measured variables: X
Conceptual settings in which the population is unknown:
- Incomplete knowledge of the outcome: the outcome lies in the future and cannot be observed when X is measured.
- Destruction necessary to observe the outcome: a product must be destroyed to determine its quality status.
- Unavailable or expensive assessment of the outcome: authorship is unknown, or assessment by an expensive gold standard would be needed.
Setting up a Discriminant Function
Prior probabilities for the 2 populations – assumes knowledge of the relative population sizes. The rule will tend to classify individual cases into the “larger” population unless there is strong evidence in favor of the “smaller” one.
Misclassification cost – is the cost of misclassification the same for objects from each of the two populations?
Probability density functions – the distributions of the numeric variables for the elements of the two populations. Population 1: f1(x); Population 2: f2(x).
Classification regions – given an observation’s x values, it will be assigned to Population 1 or 2. R1 ≡ {x} such that the observation is classified to Population 1; R2 ≡ Ω – R1 is the set of x values for which it is classified to Population 2.
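These four ingredients combine into the expected cost of misclassification (ECM). As a standard formulation (the usual textbook expression, with priors p1, p2 and misclassification costs c(2|1), c(1|2); not necessarily the exact layout of the original slide):

\[
ECM = c(2\mid 1)\,p_1 \int_{R_2} f_1(\mathbf{x})\,d\mathbf{x} \;+\; c(1\mid 2)\,p_2 \int_{R_1} f_2(\mathbf{x})\,d\mathbf{x},
\]

where c(2|1) is the cost of classifying a Population 1 member into Population 2, and c(1|2) the reverse.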
Mathematical Notation
Regions that Minimize Expected Cost of Misclassification
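Stated in the notation above, the standard result (hedged as the usual textbook form) is that ECM is minimized by choosing

\[
R_1:\; \frac{f_1(\mathbf{x})}{f_2(\mathbf{x})} \ge \left(\frac{c(1\mid 2)}{c(2\mid 1)}\right)\!\left(\frac{p_2}{p_1}\right),
\qquad
R_2:\; \frac{f_1(\mathbf{x})}{f_2(\mathbf{x})} < \left(\frac{c(1\mid 2)}{c(2\mid 1)}\right)\!\left(\frac{p_2}{p_1}\right),
\]

so the density ratio is compared with a single threshold built from the cost ratio and the prior ratio.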
Allocation of New Observation x0 to Population
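A minimal Python sketch of this allocation step, assuming the two densities, priors, and costs are available; the function and variable names are illustrative, and the bivariate normal densities below are only an example.

```python
import numpy as np
from scipy.stats import multivariate_normal

def allocate(x0, f1, f2, p1, p2, c12, c21):
    """Minimum-ECM allocation: assign x0 to Population 1 when the density
    ratio f1(x0)/f2(x0) meets the cost/prior threshold, else Population 2."""
    threshold = (c12 / c21) * (p2 / p1)
    return 1 if f1(x0) / f2(x0) >= threshold else 2

# Illustrative densities: two bivariate normals with a common covariance.
cov = [[1.0, 0.3], [0.3, 1.0]]
f1 = multivariate_normal(mean=[0.0, 0.0], cov=cov).pdf
f2 = multivariate_normal(mean=[1.5, 1.0], cov=cov).pdf

x0 = np.array([0.4, 0.2])
print(allocate(x0, f1, f2, p1=0.6, p2=0.4, c12=1.0, c21=1.0))
```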
Normal Populations with Equal Σ
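For two multivariate normal populations with mean vectors μ1, μ2 and common covariance matrix Σ, the minimum-ECM rule reduces to a rule that is linear in the new observation x0. In the standard form (not necessarily the exact layout of the original slide), allocate x0 to π1 if

\[
(\boldsymbol{\mu}_1 - \boldsymbol{\mu}_2)^{\prime}\boldsymbol{\Sigma}^{-1}\mathbf{x}_0
\;-\; \tfrac{1}{2}(\boldsymbol{\mu}_1 - \boldsymbol{\mu}_2)^{\prime}\boldsymbol{\Sigma}^{-1}(\boldsymbol{\mu}_1 + \boldsymbol{\mu}_2)
\;\ge\; \ln\!\left[\left(\frac{c(1\mid 2)}{c(2\mid 1)}\right)\!\left(\frac{p_2}{p_1}\right)\right],
\]

and to π2 otherwise.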
Sample-Based Discrimination
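In practice μ1, μ2, Σ are unknown and are replaced by estimates from the cases of known origin. In the usual sample version (again the standard textbook form), with sample means x̄1, x̄2, sample covariance matrices S1, S2, and

\[
\mathbf{S}_{pooled} = \frac{(n_1 - 1)\mathbf{S}_1 + (n_2 - 1)\mathbf{S}_2}{n_1 + n_2 - 2},
\]

the estimated rule allocates x0 to π1 if

\[
(\bar{\mathbf{x}}_1 - \bar{\mathbf{x}}_2)^{\prime}\mathbf{S}_{pooled}^{-1}\mathbf{x}_0
\;-\; \tfrac{1}{2}(\bar{\mathbf{x}}_1 - \bar{\mathbf{x}}_2)^{\prime}\mathbf{S}_{pooled}^{-1}(\bar{\mathbf{x}}_1 + \bar{\mathbf{x}}_2)
\;\ge\; \ln\!\left[\left(\frac{c(1\mid 2)}{c(2\mid 1)}\right)\!\left(\frac{p_2}{p_1}\right)\right].
\]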
Fisher’s Method for 2 Populations
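A minimal Python sketch of Fisher's two-population rule under equal priors and equal costs (where it coincides with the sample-based rule above); the helper names fisher_rule and classify are illustrative, not from the original slides.

```python
import numpy as np

def fisher_rule(X1, X2):
    """Fisher's linear discriminant for two samples.

    X1, X2: (n_i, p) arrays of cases from Populations 1 and 2.
    Returns the coefficient vector a and the midpoint cutoff m; classify a
    new x0 to Population 1 if a @ x0 >= m, else to Population 2."""
    xbar1, xbar2 = X1.mean(axis=0), X2.mean(axis=0)
    n1, n2 = len(X1), len(X2)
    S_pooled = ((n1 - 1) * np.cov(X1, rowvar=False) +
                (n2 - 1) * np.cov(X2, rowvar=False)) / (n1 + n2 - 2)
    a = np.linalg.solve(S_pooled, xbar1 - xbar2)   # a = S_pooled^{-1}(xbar1 - xbar2)
    m = 0.5 * (a @ xbar1 + a @ xbar2)              # midpoint of the projected means
    return a, m

def classify(x0, a, m):
    """Allocate x0 using Fisher's midpoint rule."""
    return 1 if a @ x0 >= m else 2
```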
Classification of Multivariate Normal Populations when Σ1 ≠ Σ2
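When the covariance matrices differ, the minimum-ECM rule is quadratic in x0. A standard statement (again the usual textbook form): allocate x0 to π1 if

\[
-\tfrac{1}{2}\,\mathbf{x}_0^{\prime}\!\left(\boldsymbol{\Sigma}_1^{-1} - \boldsymbol{\Sigma}_2^{-1}\right)\mathbf{x}_0
+ \left(\boldsymbol{\mu}_1^{\prime}\boldsymbol{\Sigma}_1^{-1} - \boldsymbol{\mu}_2^{\prime}\boldsymbol{\Sigma}_2^{-1}\right)\mathbf{x}_0
- k \;\ge\; \ln\!\left[\left(\frac{c(1\mid 2)}{c(2\mid 1)}\right)\!\left(\frac{p_2}{p_1}\right)\right],
\]
\[
k = \tfrac{1}{2}\ln\!\left(\frac{|\boldsymbol{\Sigma}_1|}{|\boldsymbol{\Sigma}_2|}\right)
+ \tfrac{1}{2}\left(\boldsymbol{\mu}_1^{\prime}\boldsymbol{\Sigma}_1^{-1}\boldsymbol{\mu}_1
- \boldsymbol{\mu}_2^{\prime}\boldsymbol{\Sigma}_2^{-1}\boldsymbol{\mu}_2\right),
\]

and to π2 otherwise; in practice the parameters are replaced by their sample estimates.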
Evaluation of Classification Functions
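A common summary, stated here as the standard definition (the apparent error rate, APER): if n1M of the n1 Population 1 cases and n2M of the n2 Population 2 cases are misclassified by the fitted rule,

\[
APER = \frac{n_{1M} + n_{2M}}{n_1 + n_2}.
\]

Because the same cases are used both to build and to evaluate the rule, APER tends to be optimistic, which motivates the holdout method described next.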
Jackknife Cross-Validation (Lachenbruch’s Holdout Method)
For Population 1, remove each observation one at a time and fit the classifier on the remaining (n1 − 1) + n2 cases, then classify the held-out case. Repeat for all n1 cases from Population 1; n1M(H) ≡ # misclassified as π2. Do the same for all n2 cases from Population 2, fitting on the remaining n1 + (n2 − 1) cases each time; n2M(H) ≡ # misclassified as π1.
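A minimal Python sketch of the holdout procedure, reusing the illustrative fisher_rule and classify helpers from the sketch after the Fisher's-method slide; the final value is the holdout estimate of the error rate, (n1M(H) + n2M(H)) / (n1 + n2).

```python
import numpy as np

def lachenbruch_holdout(X1, X2):
    """Leave-one-out (holdout) misclassification counts for the two-sample
    linear rule: refit the rule without each case, then classify that case."""
    n1m = sum(classify(X1[i], *fisher_rule(np.delete(X1, i, axis=0), X2)) == 2
              for i in range(len(X1)))   # Population 1 cases allocated to pi_2
    n2m = sum(classify(X2[j], *fisher_rule(X1, np.delete(X2, j, axis=0))) == 1
              for j in range(len(X2)))   # Population 2 cases allocated to pi_1
    aer_hat = (n1m + n2m) / (len(X1) + len(X2))   # estimated actual error rate
    return n1m, n2m, aer_hat
```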