Object Oriented Data Analysis, Last Time
OODA in Image Analysis
– Landmarks, Boundary Representations, Medial Representations
Mildly Non-Euclidean Spaces
– M-rep data on manifolds
– Geodesic Mean
– Principal Geodesic Analysis
– Limitations and Cautions

Return to Big Picture
Main statistical goals of OODA:
Understanding population structure
– Low-dimensional projections, PCA, PGA, …
Classification (i.e. discrimination)
– Understanding 2+ populations
Time Series of Data Objects
– Chemical spectra, mortality data
Classification - Discrimination
Background, two class (binary) version: using "training data" from Class +1 and Class -1, develop a "rule" for assigning new data to a class.
Canonical example: disease diagnosis. New patients are "healthy" or "ill", determined based on measurements.

Classification - Discrimination
Important distinction: classification vs. clustering.
Classification: class labels are known; the goal is to understand the differences between classes.
Clustering: the goal is to find the class labels (groups of similar data).
Both are about clumps of similar data, but with much different goals.
Classification - Discrimination
Important distinction: classification vs. clustering.
Useful terminology:
Classification: supervised learning
Clustering: unsupervised learning

Classification - Discrimination
Terminology: for statisticians, these are synonyms. For biologists, classification means constructing taxonomies and sorting organisms into them (maybe this is why "discrimination" was used, until it became politically incorrect …).
Classification (i.e. discrimination)
There are a number of approaches, philosophies, and schools of thought. Too often cast as: Statistics vs. EE-CS.

Classification (i.e. discrimination)
EE-CS variations: pattern recognition, artificial intelligence, neural networks, data mining, machine learning.

Classification (i.e. discrimination)
Differing viewpoints:
Statistics: model the classes with probability distributions; use these to study class differences and find rules.
EE-CS: data are just sets of numbers; rules distinguish between these.
Current thought: combine these.

Classification (i.e. discrimination)
Important overview reference: Duda, Hart and Stork (2001). Too much about neural nets??? Pizer disagrees … An update of Duda & Hart (1973).
Classification (i.e. discrimination)
For a more classical statistical view: McLachlan (2004). Likelihood theory, etc. Not well tuned to HDLSS data.

Classification Basics
Personal viewpoint: point clouds.
Classification Basics
Simple and natural approach: Mean Difference, a.k.a. the Centroid Method. Find the "skewer through two meatballs".
Classification Basics
For the simple toy example: project onto the mean difference (MD) direction and split at the center.
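A minimal Python sketch of this mean difference (centroid) rule; the data layout (one data object per row) and the name mean_difference_rule are illustrative assumptions, not from the slides.

```python
import numpy as np

def mean_difference_rule(X_plus, X_minus):
    """Centroid ("skewer through two meatballs") classifier.

    X_plus, X_minus: 2-d arrays with one data object per row.
    Returns a rule mapping a new vector to class +1 or -1.
    """
    center_plus = X_plus.mean(axis=0)
    center_minus = X_minus.mean(axis=0)
    direction = center_plus - center_minus          # mean difference direction
    midpoint = (center_plus + center_minus) / 2.0   # split at the center

    def classify(x):
        # Project x - midpoint onto the mean difference direction
        return 1 if np.dot(np.asarray(x) - midpoint, direction) > 0 else -1

    return classify


# Toy usage: two Gaussian "meatballs" in the plane
rng = np.random.default_rng(0)
X_plus = rng.normal(loc=[2.0, 0.0], size=(50, 2))
X_minus = rng.normal(loc=[-2.0, 0.0], size=(50, 2))
rule = mean_difference_rule(X_plus, X_minus)
print(rule([1.5, 0.3]))   # expected: 1
```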
Classification Basics
Why not use PCA? Is the result reasonable? It doesn't use the class labels … Good? Bad?

Classification Basics
Harder example (slanted point clouds).
Classification Basics
PCA for the slanted clouds: PC1 is terrible; PC2 is better? It still misses the right direction, and doesn't use the class labels.

Classification Basics
Mean Difference for the slanted clouds: a little better? Still misses the right direction. Want to account for covariance.
Classification Basics
Mean Difference & covariance, simplest approach: rescale (standardize) the coordinate axes, i.e. replace the (full) data matrix by its coordinate-wise standardized version, then do Mean Difference. Called the "Naive Bayes approach".
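A minimal sketch of this "rescale, then mean difference" idea, reusing the mean_difference_rule sketch above; dividing each coordinate by its standard deviation over the full data matrix is one reasonable reading of "standardize the coordinate axes".

```python
import numpy as np

def naive_bayes_mean_difference(X_plus, X_minus):
    """Rescale each coordinate axis, then apply the mean difference rule.

    Only the variances are adjusted (a diagonal rescaling); the covariances
    are left alone, which is exactly the limitation noted on the next slides.
    """
    X_all = np.vstack([X_plus, X_minus])
    scale = X_all.std(axis=0)            # per-coordinate standard deviation
    inner_rule = mean_difference_rule(X_plus / scale, X_minus / scale)
    return lambda x: inner_rule(np.asarray(x) / scale)
```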
Classification Basics
Naive Bayes reference: Domingos & Pazzani (1997). Most sensible contexts: non-comparable data, e.g. different units.
Classification Basics
Problem with Naive Bayes: it only adjusts variances, not covariances, so it doesn't solve this problem.
Classification Basics
Better solution: Fisher Linear Discrimination. It gets the right direction. How does it work?

Fisher Linear Discrimination
Other common terminology (for FLD): Linear Discriminant Analysis (LDA). Original paper: Fisher (1936).
Fisher Linear Discrimination
Careful development. Useful notation (data vectors of length $d$):
Class +1: $X_{+1,1}, \dots, X_{+1,n_+}$
Class -1: $X_{-1,1}, \dots, X_{-1,n_-}$
Centerpoints: $\bar{X}_{+1} = \frac{1}{n_+}\sum_{i=1}^{n_+} X_{+1,i}$ and $\bar{X}_{-1} = \frac{1}{n_-}\sum_{i=1}^{n_-} X_{-1,i}$
Fisher Linear Discrimination
Class covariances, for $j = +1, -1$ (outer products), based on the centered, normalized data matrices:
$\hat{\Sigma}_{j} = \frac{1}{n_j}\sum_{i=1}^{n_j} \big(X_{j,i} - \bar{X}_{j}\big)\big(X_{j,i} - \bar{X}_{j}\big)^{\top}$
Note: use the "MLE" version (divide by $n_j$) of the estimated covariance matrices, for simpler notation.
Fisher Linear Discrimination
Major assumption: the class covariances are the same (or "similar"), illustrated by an example where this holds ("like this") and one where it does not ("not this").
Fisher Linear Discrimination
Good estimate of the (common) within-class covariance? Use the pooled (weighted average) within-class covariance $\hat{\Sigma}_w$, based on the combined, class-centered data matrix.
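In the MLE (divide-by-$n$) convention above, the pooled within-class covariance takes the standard form

$$
\hat{\Sigma}_w \;=\; \frac{n_{+}\,\hat{\Sigma}_{+1} + n_{-}\,\hat{\Sigma}_{-1}}{n_{+}+n_{-}}
\;=\; \frac{1}{n}\sum_{i=1}^{n} \big(X_i - \bar{X}_{c(i)}\big)\big(X_i - \bar{X}_{c(i)}\big)^{\top},
$$

where $n = n_{+} + n_{-}$ and $\bar{X}_{c(i)}$ is the mean of the class containing $X_i$, so each data vector is centered at its own class mean.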
Fisher Linear Discrimination
Note: $\hat{\Sigma}_w$ is similar to the covariance estimate from before, i.e. the covariance matrix that ignores the class labels. Important difference: class-by-class centering. This will be important later.
Fisher Linear Discrimination
Simple way to find the "correct covariance adjustment": individually transform the subpopulations so that they are "spherical" about their means. For $j = +1, -1$, define the transformed data $\widetilde{X}_{j,i} = \hat{\Sigma}_w^{-1/2} X_{j,i}$.
Fisher Linear Discrimination
Then, in the transformed space, the best separating hyperplane is the perpendicular bisector of the line segment between the (transformed) class means.
Fisher Linear Discrimination
In the transformed space, the separating hyperplane has:
Transformed normal vector: $\widetilde{n} = \hat{\Sigma}_w^{-1/2}\big(\bar{X}_{+1} - \bar{X}_{-1}\big)$
Transformed intercept (point on the plane): $\widetilde{m} = \hat{\Sigma}_w^{-1/2}\,\tfrac{1}{2}\big(\bar{X}_{+1} + \bar{X}_{-1}\big)$
Separating hyperplane equation: $\big(\widetilde{x} - \widetilde{m}\big)^{\top}\widetilde{n} = 0$
Fisher Linear Discrimination
Thus the discrimination rule is: given a new data vector $x$, choose Class +1 when $\big(\widetilde{x} - \widetilde{m}\big)^{\top}\widetilde{n} > 0$, i.e. (transforming back to the original space) when
$\big(x - \tfrac{1}{2}(\bar{X}_{+1} + \bar{X}_{-1})\big)^{\top}\,\hat{\Sigma}_w^{-1}\big(\bar{X}_{+1} - \bar{X}_{-1}\big) > 0$,
where $\widetilde{x} = \hat{\Sigma}_w^{-1/2} x$.
Fisher Linear Discrimination
So in the original space we have a separating hyperplane with:
Normal vector: $\hat{\Sigma}_w^{-1}\big(\bar{X}_{+1} - \bar{X}_{-1}\big)$
Intercept (point on the hyperplane): $\tfrac{1}{2}\big(\bar{X}_{+1} + \bar{X}_{-1}\big)$
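A compact Python sketch of the construction just derived, under the same illustrative conventions as above (rows are data objects; the name fisher_ld is an assumption): class means, MLE class covariances, the pooled within-class covariance, then the normal vector and midpoint intercept.

```python
import numpy as np

def fisher_ld(X_plus, X_minus):
    """Fisher Linear Discrimination for two classes (rows = data objects).

    Returns the normal vector, the intercept point, and a rule that chooses
    Class +1 when (x - intercept) . normal > 0.
    """
    n_plus, n_minus = len(X_plus), len(X_minus)
    xbar_plus = X_plus.mean(axis=0)
    xbar_minus = X_minus.mean(axis=0)

    # MLE (divide-by-n) class covariances, pooled with weights n_plus, n_minus
    cov_plus = np.cov(X_plus, rowvar=False, bias=True)
    cov_minus = np.cov(X_minus, rowvar=False, bias=True)
    cov_within = (n_plus * cov_plus + n_minus * cov_minus) / (n_plus + n_minus)

    # Separating hyperplane in the original space
    normal = np.linalg.solve(cov_within, xbar_plus - xbar_minus)
    intercept = (xbar_plus + xbar_minus) / 2.0

    def classify(x):
        return 1 if np.dot(np.asarray(x) - intercept, normal) > 0 else -1

    return normal, intercept, classify
```

Equivalently one could sphere everything with $\hat{\Sigma}_w^{-1/2}$ and take the perpendicular bisector of the transformed means; solving the linear system above gives the same rule without forming a matrix square root.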
Fisher Linear Discrimination
Relationship to Mahalanobis distance. Idea: for data with covariance $\Sigma$, a natural distance measure is
$d_{\Sigma}(x_1, x_2) = \sqrt{(x_1 - x_2)^{\top}\,\Sigma^{-1}\,(x_1 - x_2)}$,
which is "unit free", i.e. "standardized"; it essentially mods out the covariance structure. This is the Euclidean distance applied to $\Sigma^{-1/2} x_1$ and $\Sigma^{-1/2} x_2$, the same as the key transformation for FLD. I.e. FLD is the mean difference in Mahalanobis space.
Classical Discrimination
The above derivation of FLD was: nonstandard, not in any textbooks(?), and nonparametric (doesn't need Gaussian data), i.e. it used no probability distributions. More machine learning than statistics.
Classical Discrimination
FLD likelihood view. Assume the class distributions are multivariate Gaussian, $N(\mu_{+1}, \Sigma)$ and $N(\mu_{-1}, \Sigma)$: a strong distributional assumption plus a common covariance.
Classical Discrimination
FLD likelihood view (cont.): at a location $x$, the likelihood ratio for choosing between Class +1 and Class -1 is
$LR(x) = \varphi_{\Sigma}(x - \mu_{+1}) \,/\, \varphi_{\Sigma}(x - \mu_{-1})$,
where $\varphi_{\Sigma}$ is the Gaussian density with covariance $\Sigma$.
Classical Discrimination
FLD likelihood view (cont.): simplifying, using the form of the Gaussian density, the log likelihood ratio becomes a difference of two quadratic forms in $x$; critically, the covariance is common to both terms.
Classical Discrimination
FLD likelihood view (cont.): because the covariance is common, the terms quadratic in $x$ cancel and only a linear function of $x$ remains, so "$LR(x) > 1$" is a linear condition on $x$ (the calculation is written out below).
Classical Discrimination
FLD likelihood view (cont.): replacing $\mu_{+1}$, $\mu_{-1}$ and $\Sigma$ by the maximum likelihood estimates $\bar{X}_{+1}$, $\bar{X}_{-1}$ and $\hat{\Sigma}_w$ gives the likelihood ratio discrimination rule: choose Class +1 when $LR(x) > 1$. This is the same rule as above, so FLD can be viewed as a likelihood ratio rule.
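In the notation above, the calculation sketched on the last few slides is the standard one: with the common covariance $\Sigma$, the quadratic terms cancel and

$$
\log LR(x)
= -\tfrac{1}{2}(x-\mu_{+1})^{\top}\Sigma^{-1}(x-\mu_{+1})
+ \tfrac{1}{2}(x-\mu_{-1})^{\top}\Sigma^{-1}(x-\mu_{-1})
= \Big(x - \tfrac{\mu_{+1}+\mu_{-1}}{2}\Big)^{\top}\Sigma^{-1}\big(\mu_{+1}-\mu_{-1}\big),
$$

so $LR(x) > 1$ exactly when $\big(x - \tfrac{\mu_{+1}+\mu_{-1}}{2}\big)^{\top}\Sigma^{-1}(\mu_{+1}-\mu_{-1}) > 0$. Substituting $\bar{X}_{+1}$, $\bar{X}_{-1}$ and $\hat{\Sigma}_w$ recovers the FLD rule: normal vector $\hat{\Sigma}_w^{-1}(\bar{X}_{+1}-\bar{X}_{-1})$ and intercept at the midpoint of the class means.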
Classical Discrimination
FLD Generalization I: Gaussian likelihood ratio discrimination (a.k.a. "nonlinear discriminant analysis"). Idea: assume the class distributions are Gaussian with different covariances! The likelihood ratio rule is then a straightforward numerical calculation (thus easy to implement, and to do discrimination with).
Classical Discrimination
Gaussian likelihood ratio discrimination (cont.): there is no longer a separating hyperplane representation (instead the regions are determined by quadratics), with fairly complicated case-wise calculations. Graphical display: each point is colored yellow if assigned to Class +1, cyan if assigned to Class -1 (intensity shows the strength of the assignment).
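A sketch of this Gaussian likelihood ratio rule with separate class covariances, in the same illustrative Python conventions as the earlier sketches (the name gaussian_lr_rule and the rows-as-objects layout are assumptions); it simply compares the two fitted Gaussian log densities.

```python
import numpy as np

def gaussian_lr_rule(X_plus, X_minus):
    """Gaussian likelihood ratio discrimination with different covariances.

    Fits a Gaussian (MLE mean and covariance) to each class and assigns a new
    point to the class with the larger log density, so the decision regions
    are bounded by quadratics rather than by a single hyperplane.
    """
    def fit(X):
        mean = X.mean(axis=0)
        cov = np.cov(X, rowvar=False, bias=True)
        return mean, np.linalg.inv(cov), np.linalg.slogdet(cov)[1]

    params = {1: fit(X_plus), -1: fit(X_minus)}

    def log_density(x, mean, cov_inv, log_det):
        diff = np.asarray(x) - mean
        # Gaussian log density up to the shared constant -(d/2) * log(2*pi)
        return -0.5 * (diff @ cov_inv @ diff + log_det)

    def classify(x):
        scores = {label: log_density(x, *p) for label, p in params.items()}
        return max(scores, key=scores.get)

    return classify
```

On data like the donut example on the later slides, this rule can pick out the inner class, something no single hyperplane can do.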
Classical Discrimination
FLD for the tilted point clouds: works well.

Classical Discrimination
GLR for the tilted point clouds: works well.

Classical Discrimination
FLD for the donut data: poor; no plane can work.

Classical Discrimination
GLR for the donut data: works well (a good quadratic).

Classical Discrimination
FLD for the X data: poor; no plane can work.

Classical Discrimination
GLR for the X data: better, but not great.
Classical Discrimination
Summary of FLD vs. GLR:
Tilted Point Clouds data
– FLD good
– GLR good
Donut data
– FLD bad
– GLR good
X data
– FLD bad
– GLR OK, not great
Classical conclusion: GLR is generally better (will see a different answer for HDLSS data).