DWD in Face Recognition (cont.). Interesting summary: the jump between means (in the DWD direction) gives clear separation of maleness vs. femaleness.
DWD in Face Recognition (cont.). Fun comparison: the jump between means (in the SVM direction) also distinguishes maleness vs. femaleness, but not as well as DWD.
DWD in Face Recognition (cont.). Analysis of the difference: project onto the normal vectors. SVM has a “small gap” (feels noise artifacts?); DWD is “more informative” (feels real structure?).
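Projecting onto a classifier's normal vector is just a dot product with the unit direction; a minimal numpy sketch (function name hypothetical):

```python
import numpy as np

def project_onto_normal(X, w):
    """Project each row of the (n x d) data matrix X onto the
    unit normal vector w of a linear discrimination rule."""
    w_unit = w / np.linalg.norm(w)   # normalize the direction
    return X @ w_unit                # 1-d projections along the normal
```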
HDLSS Discrimination Simulations. Main idea: comparison of SVM (Support Vector Machine), DWD (Distance Weighted Discrimination), and MD (Mean Difference, a.k.a. Centroid); linear versions, across dimensions.
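For concreteness, the MD (centroid) direction is simply the vector between the class means; a hedged sketch contrasting it with a linear SVM normal (scikit-learn assumed available, function names illustrative):

```python
import numpy as np
from sklearn.svm import LinearSVC

def md_direction(X_pos, X_neg):
    """Mean Difference (centroid) direction: difference of class means."""
    return X_pos.mean(axis=0) - X_neg.mean(axis=0)

def svm_direction(X, y, C=1000.0):
    """Normal vector of a linear SVM fit, for comparison with MD.
    C is the tuning parameter discussed later."""
    clf = LinearSVC(C=C).fit(X, y)
    return clf.coef_.ravel()
```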
HDLSS Discrimination Simulations. Overall approach: study different known phenomena (spherical Gaussians, outliers, polynomial embedding), with common sample sizes but a wide range of dimensions.
HDLSS Discrimination Simulations. Spherical Gaussians:
HDLSS Discrimination Simulations. Spherical Gaussians: same setup as before, with means shifted in dimension 1 only. All methods are pretty good, but the problem gets harder in higher dimensions. SVM is noticeably worse; MD is best (it is the likelihood method here); DWD is very close to MD. Do the methods converge in higher dimensions?
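A minimal sketch of this simulation setup, assuming standard spherical Gaussian classes whose means differ by an illustrative shift delta in the first coordinate only (sample size and shift are assumptions, not the values used in the lecture):

```python
import numpy as np

rng = np.random.default_rng(0)

def spherical_gaussian_classes(n=25, d=1000, delta=2.2):
    """Two spherical Gaussian classes, means shifted in dimension 1 only."""
    X_pos = rng.standard_normal((n, d))
    X_neg = rng.standard_normal((n, d))
    X_pos[:, 0] += delta          # shift class +1 in the first coordinate
    return X_pos, X_neg
```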
HDLSS Discrimination Simulations. Outlier Mixture:
HDLSS Discrimination Simulations. Outlier Mixture: 80% of points in dimension 1 only (other dimensions 0); 20% at dimension 1 ±100 and dimension 2 ±500 (others 0). MD is a disaster, driven by the outliers; SVM and DWD are both very robust. SVM is best, with DWD very close (an insignificant difference). Do the methods converge in higher dimensions? Ignore RLR (a mistake).
HDLSS Discrimination Simulations. Wobble Mixture:
HDLSS Discrimination Simulations. Wobble Mixture: 80% of points in dimension 1 only (other dimensions 0); 20% at dimension 1 ±0.1, with one random dimension at ±100 (others 0). MD is still very bad, driven by the outliers; SVM and DWD are both very robust. SVM loses (affected by the margin push); DWD is slightly better (by weighted influence). Do the methods converge in higher dimensions? Ignore RLR (a mistake).
HDLSS Discrimination Simulations. Nested Spheres:
HDLSS Discrimination Simulations. Nested Spheres: the first d/2 dimensions are Gaussian with variance 1 or C; the second d/2 dimensions are the squares of the first d/2 (as for second-degree polynomial embedding). Each method is best somewhere; MD is best in the highest dimensions (the data are non-Gaussian); the methods are not comparable (realistic). Do the methods converge in higher dimensions? HDLSS space is a strange place. Ignore RLR (a mistake).
HDLSS Discrimination Simulations. Conclusions: everything (sensible) is best sometimes; DWD is often very near the best; MD is weak beyond the Gaussian case. Caution about simulations (and examples): it is very easy to cherry-pick the best ones. Good practice in machine learning: “ignore the method proposed, but read the paper for its useful comparison of the others.”
HDLSS Discrimination Simulations. Caution: there are additional players; e.g., Regularized Logistic Regression also looks very competitive. Interesting phenomenon: all methods come together in very high dimensions???
HDLSS Discrimination Simulations. Can we say more about why all methods come together in very high dimensions? A mathematical-statistical question: what is the mathematics behind this? (Answered later.)
SVM & DWD Tuning Parameter. Main idea: the handling of violators (“slack variables”) is controlled by a tuning parameter C; larger C means trying harder to avoid violations.
SVM Tuning Parameter. Recall the movie for SVM:
SVM & DWD Tuning Parameter. Possible approaches: visually tuned (can be effective, but takes time and requires expertise).
SVM & DWD Tuning Parameter. Possible approaches: visually tuned; simple defaults. For DWD: 100 / median pairwise distance (a surprisingly useful, simple answer). For SVM: 1000 (works well sometimes, not others).
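A minimal sketch of the DWD default just stated, computing 100 divided by the median pairwise distance of the training data (scipy assumed available):

```python
import numpy as np
from scipy.spatial.distance import pdist

def dwd_default_C(X):
    """Default DWD tuning parameter: 100 / median pairwise distance."""
    return 100.0 / np.median(pdist(X))   # pdist gives all pairwise distances
```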
SVM & DWD Tuning Parameter. Possible approaches: visually tuned; simple defaults (work well for DWD, less effective for SVM).
SVM & DWD Tuning Parameter. Possible approaches: visually tuned; simple defaults; cross validation. Measure the classification error rate while leaving some data out (to avoid overfitting), and choose C to minimize that error rate.
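A hedged sketch of choosing C by cross validation, shown for a linear SVM via scikit-learn's grid search (DWD has no standard scikit-learn implementation, so SVM stands in; the grid of C values is illustrative):

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

def tune_C_by_cv(X, y, n_folds=5):
    """Choose the SVM tuning parameter C by cross-validated error rate."""
    grid = {"C": np.logspace(-2, 4, 13)}                # candidate C values
    search = GridSearchCV(SVC(kernel="linear"), grid, cv=n_folds)
    search.fit(X, y)                                    # fits one model per fold per C
    return search.best_params_["C"]
```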
SVM & DWD Tuning Parameter. Possible approaches: visually tuned; simple defaults; cross validation (very popular and useful for SVM, but it comes at a computational cost).
SVM & DWD Tuning Parameter. Possible approaches: visually tuned; simple defaults; cross validation; scale space (work with the full range of choices).
Melanoma Data. Study differences between (malignant) melanoma and (benign) nevi, using image features as before (recall the transformation discussion). Paper: Miedema et al. (2012).
Clinical Diagnosis (image slide).
Image Analysis of Histology Slides. 1 in 75 North Americans will develop a malignant melanoma in their lifetime. Initial goal: automatically segment nuclei. Challenge: dense packing of nuclei. Ultimately: cancer grading and patient survival. (Images: www.melanoma.ca, melanoma.blogsome.com.)
Feature Extraction: Features from Cell Nuclei. Extract various features based on color and morphology. Example “high-level” concepts: stain intensity, nuclear area, density of nuclei, regularity of nuclear shape.
Labeled Nuclei (image slide): Conventional Nevus vs. Superficial Spreading Melanoma.
Nuclear Regions (Conventional Nevus vs. Superficial Spreading Melanoma): generated by growing the nuclei out from their boundaries; used for various color and density features: Region Stain 2, Region Area Ratio, etc.
Delaunay Triangulation (Conventional Nevus vs. Superficial Spreading Melanoma): a triangulation of the nuclear centers, used for various density features: Mean Delaunay, Max. Delaunay, etc.
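A minimal sketch of density features of this flavor, computing mean and maximum Delaunay edge lengths from 2-d nuclear centers with scipy (the function name and exact feature definitions are assumptions, not the paper's):

```python
import numpy as np
from scipy.spatial import Delaunay

def delaunay_edge_features(centers):
    """Mean and max edge length of the Delaunay triangulation
    of an (n x 2) array of nuclear centers."""
    tri = Delaunay(centers)
    edges = set()
    for simplex in tri.simplices:            # each triangle contributes 3 edges
        for i in range(3):
            a, b = sorted((simplex[i], simplex[(i + 1) % 3]))
            edges.add((a, b))                # deduplicate shared edges
    lengths = [np.linalg.norm(centers[a] - centers[b]) for a, b in edges]
    return np.mean(lengths), np.max(lengths)
```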
Melanoma Data. Study differences between (malignant) melanoma and (benign) nevi; explore with the PCA view.
Melanoma Data. PCA view.
Melanoma Data. Rotate to the DWD direction.
Melanoma Data. Rotate to the DWD direction: “good” separation???
Melanoma Data. Rotate to the DWD direction; orthogonal PCs avoid strange projections.
Melanoma Data. Return to the PCA view and focus on subtypes.
Melanoma Data. Focus on subtypes: Melanoma 1 and severely dysplastic nevi; gray out the others.
Melanoma Data. Rotate to pairwise-only PCA.
Melanoma Data. Rotate to DWD & orthogonal PCs.
Melanoma Data. Rotate to DWD & orthogonal PCs: better separation than the full data???
Melanoma Data. Full-data DWD direction: “good” separation???
Melanoma Data. Challenge: measure the “goodness of separation”.
ROC Curve. Approach from signal detection: the Receiver Operating Characteristic (ROC) curve, developed in WWII; for its history see Green and Swets (1966). A good modern treatment: DeLong, DeLong & Clarke-Pearson (1988).
ROC Curve. Idea: for a range of cutoffs, plot the proportion of +1's smaller than the cutoff vs. the proportion of −1's smaller than the cutoff.
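A minimal sketch of this construction, sweeping a cutoff over the pooled scores of the two classes (function and variable names hypothetical):

```python
import numpy as np

def roc_points(scores_pos, scores_neg):
    """Trace the ROC curve: for each cutoff, the proportion of
    -1 scores below it (x) vs. the proportion of +1 scores below it (y)."""
    cutoffs = np.append(np.sort(np.concatenate([scores_pos, scores_neg])),
                        np.inf)                     # end the curve at (1, 1)
    x = np.array([(scores_neg < c).mean() for c in cutoffs])
    y = np.array([(scores_pos < c).mean() for c in cutoffs])
    return x, y
```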
ROC Curve. Aim: quantify the “overlap”. Approach: consider a series of cutoffs.
ROC Curve. The x-coordinate is the proportion of reds smaller than the cutoff; the y-coordinate is the proportion of blues smaller.
ROC Curve. Slide the cutoff to trace out the curve.
ROC Curve. Better separation is “more to the upper left”.
ROC Curve. Summarize and compare using the Area Under the Curve (AUC).
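Under the same setup, the AUC can be computed by numerically integrating the traced curve; a small sketch using the hypothetical roc_points above:

```python
import numpy as np

def auc(scores_pos, scores_neg):
    """Area under the traced ROC curve, by trapezoidal integration."""
    x, y = roc_points(scores_pos, scores_neg)   # from the sketch above
    order = np.argsort(x)                       # integrate left to right
    return np.trapz(y[order], x[order])
```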
ROC Curve. Toy example: perfect separation.
ROC Curve. Toy example: very slight overlap.
ROC Curve. Toy example: a little more overlap.
ROC Curve. Toy example: more overlap.
ROC Curve. Toy example: much more overlap.
ROC Curve. Toy example: complete overlap.
ROC Curve. Toy example: complete overlap; AUC ≈ 0.5 reflects “coin tossing”.
ROC Curve. Toy example: the AUC can also reflect “worse than coin tossing”.
ROC Curve. Interpretation of AUC is very context dependent; in radiology, “> 70% has predictive usefulness”. Bigger is better.
Melanoma Data. Recall the question: which gives better separation of melanoma vs. nevi, DWD on all melanoma vs. all nevi, or DWD on Melanoma 1 vs. severely dysplastic nevi?
Melanoma Data. Subclass DWD direction.
Melanoma Data. Full-data DWD direction.
Melanoma Data. Recall the question: which gives better separation of melanoma vs. nevi, DWD on all melanoma vs. all nevi, or DWD on Melanoma 1 vs. severely dysplastic nevi?
Melanoma Data. Full-data ROC analysis: AUC = 0.93.
Melanoma Data. Subclass ROC analysis: AUC = 0.95. Better, which makes intuitive sense.
Melanoma Data. What about other subclasses? Several were examined; the best separation was Melanoma 2 vs. Conventional Nevi.
Melanoma Data. Full-data PCA.
Melanoma Data. Full-data PCA, graying out all but the subclasses.
Melanoma Data. Rotate to the subclass PCA.
Melanoma Data. Rotate to the subclass DWD.
Melanoma Data. ROC analysis: AUC = 0.99.
Clustering. Idea: given data, assign each object to a class of similar objects, completely data-driven; i.e., assign labels to the data (“unsupervised learning”). Contrast this with classification (discrimination), where the classes are predetermined (“supervised learning”).
Clustering. Important references: MacQueen (1967); Hartigan (1975); Gersho and Gray (1992); Kaufman and Rousseeuw (2005).
K-means Clustering. Main idea: for data $X_1, \dots, X_n$, partition the indices $\{1, \dots, n\}$ among $K$ classes. Given index sets $C_1, \dots, C_K$ that partition $\{1, \dots, n\}$, represent the clusters by their “class means”, i.e. the within-class means $\bar{X}_j = \frac{1}{|C_j|} \sum_{i \in C_j} X_i$.
K-means Clustering. Given index sets $C_1, \dots, C_K$, measure how well clustered the data are using the Within-Class Sum of Squares $\sum_{j=1}^{K} \sum_{i \in C_j} \| X_i - \bar{X}_j \|^2$.
K-means Clustering. Common variation: put this on the scale of proportions (i.e., in [0,1]) by dividing the within-class SS by the overall SS. This gives the Cluster Index: $\mathrm{CI} = \frac{\sum_{j=1}^{K} \sum_{i \in C_j} \| X_i - \bar{X}_j \|^2}{\sum_{i=1}^{n} \| X_i - \bar{X} \|^2}$, where $\bar{X}$ is the overall mean.
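A minimal sketch of the Cluster Index, taking the class assignments from scikit-learn's KMeans (whose inertia_ attribute is exactly the within-class sum of squares):

```python
import numpy as np
from sklearn.cluster import KMeans

def cluster_index(X, k=2, seed=0):
    """Cluster Index: within-class SS divided by overall SS."""
    km = KMeans(n_clusters=k, random_state=seed, n_init=10).fit(X)
    within_ss = km.inertia_                          # within-class SS
    overall_ss = ((X - X.mean(axis=0)) ** 2).sum()   # total SS about the mean
    return within_ss / overall_ss
```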
K-means Clustering. Notes on the Cluster Index: CI = 0 when all data lie at the cluster means; CI is small when the partition gives tight clustering (the within SS contains little of the variation); CI is big when the partition gives poor clustering (the within SS contains most of the variation); CI = 1 when all cluster means coincide.
K-means Clustering. Clustering goal: given data $X_1, \dots, X_n$, choose classes $C_1, \dots, C_K$ to minimize $\mathrm{CI}$.
2-means Clustering. Study CI using simple 1-d examples: varying the standard deviation.
2-means Clustering (figure).
2-means Clustering. Study CI using simple 1-d examples: varying the standard deviation; varying the mean.
2-means Clustering (figure).
2-means Clustering. Study CI using simple 1-d examples: varying the standard deviation; varying the mean; varying the proportion.
2-means Clustering (figure).
2-means Clustering. Study CI using simple 1-d examples, over changing classes (moving the boundary).
2-means Clustering (figure).
2-means Clustering. Study CI using simple 1-d examples, over changing classes (moving the boundary). Multi-modal data give interesting effects: multiple local minima (a large number); possibly disconnected; optimization (over the index sets) can be tricky (even in 1 dimension, with K = 2).
2-means Clustering (figure).
2-means Clustering. Study CI using simple 1-d examples, over changing classes (moving the boundary). Multi-modal data give interesting effects: there can be 4 (or more) local minima (even in 1 dimension, with K = 2).
2-means Clustering (figure).
2-means Clustering. Study CI using simple 1-d examples, over changing classes (moving the boundary). Multi-modal data give interesting effects: local minima can be hard to find, i.e. iterative procedures can “get stuck” (even in 1 dimension, with K = 2).
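These local minima are easy to exhibit by brute force in 1-d: for 2-means with sorted data, the optimal classes are contiguous, so every relevant partition is a split point. A small sketch under those assumptions (function name hypothetical):

```python
import numpy as np

def ci_over_boundaries(x):
    """CI of every 'split at index m' 2-means partition of sorted 1-d data."""
    x = np.sort(x)
    overall_ss = ((x - x.mean()) ** 2).sum()
    cis = []
    for m in range(1, len(x)):                # boundary between x[m-1] and x[m]
        left, right = x[:m], x[m:]
        within = ((left - left.mean()) ** 2).sum() + \
                 ((right - right.mean()) ** 2).sum()
        cis.append(within / overall_ss)
    return np.array(cis)    # local minima of this curve = candidate clusterings
```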
2-means Clustering. Study CI using simple 1-d examples: what is the effect of a single outlier?
2-means Clustering (figure).
2-means Clustering. Study CI using simple 1-d examples: the effect of a single outlier. It can create a local minimum, and can even yield the global minimum; this gives a one-point class and can make CI arbitrarily small (really a “good clustering”???).
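A quick numerical check of this effect, using the hypothetical ci_over_boundaries above: as the outlier moves farther out, the split that isolates it tends to win, with CI approaching 0 (the sample below is illustrative):

```python
import numpy as np

# Bimodal sample plus one extreme outlier (values illustrative)
rng = np.random.default_rng(1)
x = np.concatenate([rng.normal(0, 1, 50), rng.normal(4, 1, 50), [1000.0]])

cis = ci_over_boundaries(x)
print(np.argmin(cis), cis.min())   # global min: the one-point outlier class
```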