DWD in Face Recognition (cont.). Interesting summary: the jump between means (in the DWD direction) gives clear separation of maleness vs. femaleness.
DWD in Face Recognition (cont.). Fun comparison: the jump between means (in the SVM direction) also distinguishes maleness vs. femaleness, but not as well as DWD.
DWD in Face Recognition (cont.). Analysis of the difference: project onto the normal vectors. SVM has a “small gap” (feels noise artifacts?); DWD is “more informative” (feels real structure?).
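Projecting onto a classifier's normal vector is just a dot product with the unit direction; a minimal numpy sketch (function name hypothetical):

```python
import numpy as np

def project_onto_normal(X, w):
    """Project each row of the (n x d) data matrix X onto the
    unit normal vector w of a linear discrimination rule."""
    w_unit = w / np.linalg.norm(w)   # normalize the direction
    return X @ w_unit                # 1-d projections along the normal
```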
HDLSS Discrimination Simulations. Main idea: comparison of SVM (Support Vector Machine), DWD (Distance Weighted Discrimination), and MD (Mean Difference, a.k.a. Centroid); linear versions, across dimensions.
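For concreteness, the MD (centroid) direction is simply the vector between the class means; a hedged sketch contrasting it with a linear SVM normal (scikit-learn assumed available, function names illustrative):

```python
import numpy as np
from sklearn.svm import LinearSVC

def md_direction(X_pos, X_neg):
    """Mean Difference (centroid) direction: difference of class means."""
    return X_pos.mean(axis=0) - X_neg.mean(axis=0)

def svm_direction(X, y, C=1000.0):
    """Normal vector of a linear SVM fit, for comparison with MD.
    C is the tuning parameter discussed later."""
    clf = LinearSVC(C=C).fit(X, y)
    return clf.coef_.ravel()
```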
HDLSS Discrimination Simulations. Overall approach: study different known phenomena (spherical Gaussians, outliers, polynomial embedding), with common sample sizes but a wide range of dimensions.
HDLSS Discrimination Simulations. Spherical Gaussians:
HDLSS Discrimination Simulations. Spherical Gaussians: same setup as before, with means shifted in dimension 1 only. All methods are pretty good, but the problem gets harder in higher dimensions. SVM is noticeably worse; MD is best (it is the likelihood method here); DWD is very close to MD. Do the methods converge in higher dimensions?
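A minimal sketch of this simulation setup, assuming standard spherical Gaussian classes whose means differ by an illustrative shift delta in the first coordinate only (sample size and shift are assumptions, not the values used in the lecture):

```python
import numpy as np

rng = np.random.default_rng(0)

def spherical_gaussian_classes(n=25, d=1000, delta=2.2):
    """Two spherical Gaussian classes, means shifted in dimension 1 only."""
    X_pos = rng.standard_normal((n, d))
    X_neg = rng.standard_normal((n, d))
    X_pos[:, 0] += delta          # shift class +1 in the first coordinate
    return X_pos, X_neg
```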
HDLSS Discrimination Simulations. Outlier Mixture:
HDLSS Discrimination Simulations. Outlier Mixture: 80% of points in dimension 1 only (other dimensions 0); 20% at dimension 1 ±100 and dimension 2 ±500 (others 0). MD is a disaster, driven by the outliers; SVM and DWD are both very robust. SVM is best, with DWD very close (an insignificant difference). Do the methods converge in higher dimensions? Ignore RLR (a mistake).
HDLSS Discrimination Simulations. Wobble Mixture:
HDLSS Discrimination Simulations. Wobble Mixture: 80% of points in dimension 1 only (other dimensions 0); 20% at dimension 1 ±0.1, with one random dimension at ±100 (others 0). MD is still very bad, driven by the outliers; SVM and DWD are both very robust. SVM loses (affected by the margin push); DWD is slightly better (by weighted influence). Do the methods converge in higher dimensions? Ignore RLR (a mistake).
HDLSS Discrimination Simulations. Nested Spheres:
HDLSS Discrimination Simulations. Nested Spheres: the first d/2 dimensions are Gaussian with variance 1 or C; the second d/2 dimensions are the squares of the first d/2 (as for second-degree polynomial embedding). Each method is best somewhere; MD is best in the highest dimensions (the data are non-Gaussian); the methods are not comparable (realistic). Do the methods converge in higher dimensions? HDLSS space is a strange place. Ignore RLR (a mistake).
HDLSS Discrimination Simulations. Conclusions: everything (sensible) is best sometimes; DWD is often very near the best; MD is weak beyond the Gaussian case. Caution about simulations (and examples): it is very easy to cherry-pick the best ones. Good practice in machine learning: “ignore the method proposed, but read the paper for its useful comparison of the others.”
HDLSS Discrimination Simulations. Caution: there are additional players; e.g., Regularized Logistic Regression also looks very competitive. Interesting phenomenon: all methods come together in very high dimensions???
HDLSS Discrimination Simulations. Can we say more about why all methods come together in very high dimensions? A mathematical-statistical question: what is the mathematics behind this? (Answered later.)
SVM & DWD Tuning Parameter. Main idea: the handling of violators (“slack variables”) is controlled by a tuning parameter C; larger C means trying harder to avoid violations.
SVM Tuning Parameter. Recall the movie for SVM:
SVM & DWD Tuning Parameter. Possible approaches: visually tuned (can be effective, but takes time and requires expertise).
SVM & DWD Tuning Parameter. Possible approaches: visually tuned; simple defaults. For DWD: 100 / median pairwise distance (a surprisingly useful, simple answer). For SVM: 1000 (works well sometimes, not others).
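A minimal sketch of the DWD default just stated, computing 100 divided by the median pairwise distance of the training data (scipy assumed available):

```python
import numpy as np
from scipy.spatial.distance import pdist

def dwd_default_C(X):
    """Default DWD tuning parameter: 100 / median pairwise distance."""
    return 100.0 / np.median(pdist(X))   # pdist gives all pairwise distances
```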
SVM & DWD Tuning Parameter. Possible approaches: visually tuned; simple defaults (work well for DWD, less effective for SVM).
SVM & DWD Tuning Parameter. Possible approaches: visually tuned; simple defaults; cross validation. Measure the classification error rate while leaving some data out (to avoid overfitting), and choose C to minimize that error rate.
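A hedged sketch of choosing C by cross validation, shown for a linear SVM via scikit-learn's grid search (DWD has no standard scikit-learn implementation, so SVM stands in; the grid of C values is illustrative):

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

def tune_C_by_cv(X, y, n_folds=5):
    """Choose the SVM tuning parameter C by cross-validated error rate."""
    grid = {"C": np.logspace(-2, 4, 13)}                # candidate C values
    search = GridSearchCV(SVC(kernel="linear"), grid, cv=n_folds)
    search.fit(X, y)                                    # fits one model per fold per C
    return search.best_params_["C"]
```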
SVM & DWD Tuning Parameter. Possible approaches: visually tuned; simple defaults; cross validation (very popular and useful for SVM, but it comes at a computational cost).
SVM & DWD Tuning Parameter. Possible approaches: visually tuned; simple defaults; cross validation; scale space (work with the full range of choices).
Melanoma Data. Study differences between (malignant) melanoma and (benign) nevi, using image features as before (recall the transformation discussion). Paper: Miedema et al. (2012).
Clinical Diagnosis (image slide).
Image Analysis of Histology Slides. 1 in 75 North Americans will develop a malignant melanoma in their lifetime. Initial goal: automatically segment nuclei. Challenge: dense packing of nuclei. Ultimately: cancer grading and patient survival. (Images: www.melanoma.ca, melanoma.blogsome.com.)
Feature Extraction: Features from Cell Nuclei. Extract various features based on color and morphology. Example “high-level” concepts: stain intensity, nuclear area, density of nuclei, regularity of nuclear shape.
Labeled Nuclei (image slide): Conventional Nevus vs. Superficial Spreading Melanoma.
Nuclear Regions (Conventional Nevus vs. Superficial Spreading Melanoma): generated by growing the nuclei out from their boundaries; used for various color and density features: Region Stain 2, Region Area Ratio, etc.
Delaunay Triangulation (Conventional Nevus vs. Superficial Spreading Melanoma): a triangulation of the nuclear centers, used for various density features: Mean Delaunay, Max. Delaunay, etc.
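A minimal sketch of density features of this flavor, computing mean and maximum Delaunay edge lengths from 2-d nuclear centers with scipy (the function name and exact feature definitions are assumptions, not the paper's):

```python
import numpy as np
from scipy.spatial import Delaunay

def delaunay_edge_features(centers):
    """Mean and max edge length of the Delaunay triangulation
    of an (n x 2) array of nuclear centers."""
    tri = Delaunay(centers)
    edges = set()
    for simplex in tri.simplices:            # each triangle contributes 3 edges
        for i in range(3):
            a, b = sorted((simplex[i], simplex[(i + 1) % 3]))
            edges.add((a, b))                # deduplicate shared edges
    lengths = [np.linalg.norm(centers[a] - centers[b]) for a, b in edges]
    return np.mean(lengths), np.max(lengths)
```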
Melanoma Data. Study differences between (malignant) melanoma and (benign) nevi; explore with the PCA view.
Melanoma Data. PCA view.
Melanoma Data. Rotate to the DWD direction.
Melanoma Data. Rotate to the DWD direction: “good” separation???
Melanoma Data. Rotate to the DWD direction; orthogonal PCs avoid strange projections.
Melanoma Data. Return to the PCA view and focus on subtypes.
Melanoma Data. Focus on subtypes: Melanoma 1 and severely dysplastic nevi; gray out the others.
Melanoma Data. Rotate to pairwise-only PCA.
Melanoma Data. Rotate to DWD & orthogonal PCs.
Melanoma Data. Rotate to DWD & orthogonal PCs: better separation than the full data???
Melanoma Data. Full-data DWD direction: “good” separation???
Melanoma Data. Challenge: measure the “goodness of separation”.
ROC Curve. Approach from signal detection: the Receiver Operating Characteristic (ROC) curve, developed in WWII; for its history see Green and Swets (1966). A good modern treatment: DeLong, DeLong & Clarke-Pearson (1988).
ROC Curve. Idea: for a range of cutoffs, plot the proportion of +1's smaller than the cutoff vs. the proportion of −1's smaller than the cutoff.
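A minimal sketch of this construction, sweeping a cutoff over the pooled scores of the two classes (function and variable names hypothetical):

```python
import numpy as np

def roc_points(scores_pos, scores_neg):
    """Trace the ROC curve: for each cutoff, the proportion of
    -1 scores below it (x) vs. the proportion of +1 scores below it (y)."""
    cutoffs = np.append(np.sort(np.concatenate([scores_pos, scores_neg])),
                        np.inf)                     # end the curve at (1, 1)
    x = np.array([(scores_neg < c).mean() for c in cutoffs])
    y = np.array([(scores_pos < c).mean() for c in cutoffs])
    return x, y
```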
ROC Curve. Aim: quantify the “overlap”. Approach: consider a series of cutoffs.
ROC Curve. The x-coordinate is the proportion of reds smaller than the cutoff; the y-coordinate is the proportion of blues smaller.
ROC Curve. Slide the cutoff to trace out the curve.
ROC Curve. Better separation is “more to the upper left”.
ROC Curve. Summarize and compare using the Area Under the Curve (AUC).
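Under the same setup, the AUC can be computed by numerically integrating the traced curve; a small sketch using the hypothetical roc_points above:

```python
import numpy as np

def auc(scores_pos, scores_neg):
    """Area under the traced ROC curve, by trapezoidal integration."""
    x, y = roc_points(scores_pos, scores_neg)   # from the sketch above
    order = np.argsort(x)                       # integrate left to right
    return np.trapz(y[order], x[order])
```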
ROC Curve. Toy example: perfect separation.
ROC Curve. Toy example: very slight overlap.
ROC Curve. Toy example: a little more overlap.
ROC Curve. Toy example: more overlap.
ROC Curve. Toy example: much more overlap.
ROC Curve. Toy example: complete overlap.
ROC Curve. Toy example: complete overlap; AUC ≈ 0.5 reflects “coin tossing”.
ROC Curve. Toy example: the AUC can also reflect “worse than coin tossing”.
ROC Curve. Interpretation of AUC is very context dependent; in radiology, “> 70% has predictive usefulness”. Bigger is better.
Melanoma Data. Recall the question: which gives better separation of melanoma vs. nevi, DWD on all melanoma vs. all nevi, or DWD on Melanoma 1 vs. severely dysplastic nevi?
Melanoma Data. Subclass DWD direction.
Melanoma Data. Full-data DWD direction.
Melanoma Data. Recall the question: which gives better separation of melanoma vs. nevi, DWD on all melanoma vs. all nevi, or DWD on Melanoma 1 vs. severely dysplastic nevi?
Melanoma Data. Full-data ROC analysis: AUC = 0.93.
Melanoma Data. Subclass ROC analysis: AUC = 0.95. Better, which makes intuitive sense.
Melanoma Data. What about other subclasses? Several were examined; the best separation was Melanoma 2 vs. Conventional Nevi.
Melanoma Data. Full-data PCA.
Melanoma Data. Full-data PCA, graying out all but the subclasses.
Melanoma Data. Rotate to the subclass PCA.
Melanoma Data. Rotate to the subclass DWD.
Melanoma Data. ROC analysis: AUC = 0.99.
Clustering. Idea: given data, assign each object to a class of similar objects, completely data-driven; i.e., assign labels to the data (“unsupervised learning”). Contrast this with classification (discrimination), where the classes are predetermined (“supervised learning”).
Clustering. Important references: MacQueen (1967); Hartigan (1975); Gersho and Gray (1992); Kaufman and Rousseeuw (2005).
K-means Clustering. Main idea: for data $X_1, \dots, X_n$, partition the indices $\{1, \dots, n\}$ among $K$ classes. Given index sets $C_1, \dots, C_K$ that partition $\{1, \dots, n\}$, represent the clusters by their “class means”, i.e. the within-class means $\bar{X}_j = \frac{1}{|C_j|} \sum_{i \in C_j} X_i$.
K-means Clustering. Given index sets $C_1, \dots, C_K$, measure how well clustered the data are using the Within-Class Sum of Squares $\sum_{j=1}^{K} \sum_{i \in C_j} \| X_i - \bar{X}_j \|^2$.
K-means Clustering. Common variation: put this on the scale of proportions (i.e., in [0,1]) by dividing the within-class SS by the overall SS. This gives the Cluster Index: $\mathrm{CI} = \frac{\sum_{j=1}^{K} \sum_{i \in C_j} \| X_i - \bar{X}_j \|^2}{\sum_{i=1}^{n} \| X_i - \bar{X} \|^2}$, where $\bar{X}$ is the overall mean.
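A minimal sketch of the Cluster Index, taking the class assignments from scikit-learn's KMeans (whose inertia_ attribute is exactly the within-class sum of squares):

```python
import numpy as np
from sklearn.cluster import KMeans

def cluster_index(X, k=2, seed=0):
    """Cluster Index: within-class SS divided by overall SS."""
    km = KMeans(n_clusters=k, random_state=seed, n_init=10).fit(X)
    within_ss = km.inertia_                          # within-class SS
    overall_ss = ((X - X.mean(axis=0)) ** 2).sum()   # total SS about the mean
    return within_ss / overall_ss
```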
K-means Clustering. Notes on the Cluster Index: CI = 0 when all data lie at the cluster means; CI is small when the partition gives tight clustering (the within SS contains little of the variation); CI is big when the partition gives poor clustering (the within SS contains most of the variation); CI = 1 when all cluster means coincide.
K-means Clustering. Clustering goal: given data $X_1, \dots, X_n$, choose classes $C_1, \dots, C_K$ to minimize $\mathrm{CI}$.
2-means Clustering. Study CI using simple 1-d examples: varying the standard deviation.
2-means Clustering (figure).
2-means Clustering. Study CI using simple 1-d examples: varying the standard deviation; varying the mean.
2-means Clustering (figure).
2-means Clustering. Study CI using simple 1-d examples: varying the standard deviation; varying the mean; varying the proportion.
2-means Clustering (figure).
2-means Clustering. Study CI using simple 1-d examples, over changing classes (moving the boundary).
2-means Clustering (figure).
2-means Clustering. Study CI using simple 1-d examples, over changing classes (moving the boundary). Multi-modal data give interesting effects: multiple local minima (a large number); possibly disconnected; optimization (over the index sets) can be tricky (even in 1 dimension, with K = 2).
2-means Clustering (figure).
2-means Clustering. Study CI using simple 1-d examples, over changing classes (moving the boundary). Multi-modal data give interesting effects: there can be 4 (or more) local minima (even in 1 dimension, with K = 2).
2-means Clustering (figure).
2-means Clustering. Study CI using simple 1-d examples, over changing classes (moving the boundary). Multi-modal data give interesting effects: local minima can be hard to find, i.e. iterative procedures can “get stuck” (even in 1 dimension, with K = 2).
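These local minima are easy to exhibit by brute force in 1-d: for 2-means with sorted data, the optimal classes are contiguous, so every relevant partition is a split point. A small sketch under those assumptions (function name hypothetical):

```python
import numpy as np

def ci_over_boundaries(x):
    """CI of every 'split at index m' 2-means partition of sorted 1-d data."""
    x = np.sort(x)
    overall_ss = ((x - x.mean()) ** 2).sum()
    cis = []
    for m in range(1, len(x)):                # boundary between x[m-1] and x[m]
        left, right = x[:m], x[m:]
        within = ((left - left.mean()) ** 2).sum() + \
                 ((right - right.mean()) ** 2).sum()
        cis.append(within / overall_ss)
    return np.array(cis)    # local minima of this curve = candidate clusterings
```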
2-means Clustering. Study CI using simple 1-d examples: what is the effect of a single outlier?
2-means Clustering (figure).
2-means Clustering. Study CI using simple 1-d examples: the effect of a single outlier. It can create a local minimum, and can even yield the global minimum; this gives a one-point class and can make CI arbitrarily small (really a “good clustering”???).
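A quick numerical check of this effect, using the hypothetical ci_over_boundaries above: as the outlier moves farther out, the split that isolates it tends to win, with CI approaching 0 (the sample below is illustrative):

```python
import numpy as np

# Bimodal sample plus one extreme outlier (values illustrative)
rng = np.random.default_rng(1)
x = np.concatenate([rng.normal(0, 1, 50), rng.normal(4, 1, 50), [1000.0]])

cis = ci_over_boundaries(x)
print(np.argmin(cis), cis.min())   # global min: the one-point outlier class
```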