Object Orie’d Data Analysis, Last Time: Distance Weighted Discrimination. Revisit microarray data, Face Data, Outcomes Data, Simulation Comparison.



Twiddle Ratios of Subtypes (figure)

Why not adjust by means? DWD is robust against non-proportional subtypes… Mathematical Statistical Question: Is there mathematics behind this? (will answer next time…)

Distance Weighted Discrim’n and Maximal Data Piling (figure)

HDLSS Discrim’n Simulations
Main idea: comparison of
•SVM (Support Vector Machine)
•DWD (Distance Weighted Discrimination)
•MD (Mean Difference, a.k.a. Centroid)
Linear versions, across dimensions (a sketch of such a comparison follows)
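A minimal sketch of this kind of comparison, not from the original slides: it uses scikit-learn’s LinearSVC for the SVM and a hand-coded mean-difference rule; DWD is omitted since it needs specialized optimization software. The class offset 2.2 and sample sizes are illustrative assumptions.

```python
# Sketch: error rates of a linear SVM vs. the Mean Difference (centroid)
# rule as dimension grows, for two spherical Gaussian classes.
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
n, shift = 25, 2.2                      # per-class size and mean offset (assumed)

def sample(d):
    mu = np.zeros(d); mu[0] = shift
    X = np.vstack([rng.standard_normal((n, d)) + mu,
                   rng.standard_normal((n, d)) - mu])
    y = np.repeat([1, -1], n)
    return X, y

for d in [10, 100, 1000]:
    Xtr, ytr = sample(d)
    Xte, yte = sample(d)

    # Mean Difference rule: project onto the difference of class means
    m1, m2 = Xtr[ytr == 1].mean(0), Xtr[ytr == -1].mean(0)
    err_md = np.mean(np.sign((Xte - (m1 + m2) / 2) @ (m1 - m2)) != yte)

    # Linear SVM
    err_svm = np.mean(LinearSVC().fit(Xtr, ytr).predict(Xte) != yte)

    print(f"d={d:5d}   MD err={err_md:.3f}   SVM err={err_svm:.3f}")
```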

HDLSS Discrim’n Simulations
Conclusions:
•Everything (sensible) is best sometimes
•DWD often very near best
•MD weak beyond Gaussian
Caution about simulations (and examples): very easy to cherry-pick the best ones. Good practice in Machine Learning: “Ignore method proposed, but read paper for useful comparison of others”

HDLSS Discrim’n Simulations
Can we say more about: all methods come together in very high dimensions???
Mathematical Statistical Question: What mathematics is behind this? (will answer now)

HDLSS Asymptotics
Modern Mathematical Statistics:
•Based on asymptotic analysis, i.e. uses limiting operations, almost always
•Occasional misconceptions:
–Indicates behavior for large samples
–Thus only makes sense for “large” samples
–Models phenomenon of “increasing data”
–So other flavors are useless???

HDLSS Asymptotics
Modern Mathematical Statistics:
•Based on asymptotic analysis
•Real reasons:
–Approximation provides insights
–Can find simple underlying structure in complex situations
•Thus various flavors are fine, even desirable! (find additional insights)

HDLSS Asymptotics: Simple Paradoxes
For a $d$-dim’al Standard Normal dist’n, $Z \sim N_d(0, I_d)$, the Euclidean distance to the origin satisfies (as $d \to \infty$):
$$\|Z\| = \sqrt{d} + O_p(1)$$
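The reasoning, spelled out (standard chi-square facts, added here rather than taken from the slide):
$$\|Z\|^2 = \sum_{j=1}^{d} Z_j^2 \sim \chi^2_d, \qquad \mathbb{E}\|Z\|^2 = d, \quad \operatorname{Var}\|Z\|^2 = 2d,$$
$$\text{so}\quad \|Z\| = \sqrt{d + O_p(\sqrt{d})} = \sqrt{d}\,\big(1 + O_p(d^{-1/2})\big)^{1/2} = \sqrt{d} + O_p(1).$$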

HDLSS Asymptotics: Simple Paradoxes
As $d \to \infty$:
–Data lie roughly on the surface of a sphere, with radius $\sqrt{d}$
–Yet the origin is the point of highest density???
–Paradox resolved by: density is w.r.t. Lebesgue Measure

HDLSS Asymptotics: Simple Paradoxes
For $d$-dim’al Standard Normal dist’ns, $Z_1$ indep. of $Z_2$, the Euclidean distance between $Z_1$ and $Z_2$ satisfies (as $d \to \infty$):
$$\|Z_1 - Z_2\| = \sqrt{2d} + O_p(1)$$
The distance tends to a non-random constant.

HDLSS Asymptotics: Simple Paradoxes
Distance tends to a non-random constant: $\|Z_1 - Z_2\| \approx \sqrt{2d}$
Factor of $\sqrt{2}$, since $Z_1 - Z_2 \sim N_d(0, 2 I_d)$
Can extend to any number of indep. points: all pairwise distances $\approx \sqrt{2d}$
Where do they all go??? (we can only perceive 3 dim’ns)
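The factor of $\sqrt{2}$, written out (my addition, using the same chi-square argument as above):
$$Z_1 - Z_2 \sim N_d(0,\, 2 I_d) \;\Longrightarrow\; \|Z_1 - Z_2\|^2 \sim 2\,\chi^2_d \;\Longrightarrow\; \|Z_1 - Z_2\| = \sqrt{2d} + O_p(1).$$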

HDLSS Asymptotics: Simple Paradoxes
For $d$-dim’al Standard Normal dist’ns, $Z_1$ indep. of $Z_2$, high dim’al angles satisfy (as $d \to \infty$):
$$\mathrm{angle}(Z_1, Z_2) = 90° + O_p(d^{-1/2})$$
–Everything is orthogonal???
–Where do they all go??? (again our perceptual limitations)
–Again 1st-order structure is non-random
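Behind the angle statement (my addition, via the CLT for the inner product, which has mean 0 and variance $d$):
$$\cos \mathrm{angle}(Z_1, Z_2) = \frac{\langle Z_1, Z_2 \rangle}{\|Z_1\|\,\|Z_2\|} = \frac{O_p(\sqrt{d})}{\big(\sqrt{d} + O_p(1)\big)^2} = O_p\big(d^{-1/2}\big) \longrightarrow 0.$$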

HDLSS Asy’s: Geometrical Represent’n
Assume $X_1, \dots, X_n \sim N_d(0, I_d)$, let $d \to \infty$ (with $n$ fixed)
Study Subspace Generated by Data: hyperplane through 0, of dimension $n$
Points are “nearly equidistant to 0”, at dist. $\approx \sqrt{d}$
Within plane, can “rotate towards Unit Simplex”
All Gaussian data sets are “near Unit Simplex Vertices”!!!
“Randomness” appears only in rotation of simplex
Hall, Marron & Neeman (2005)

HDLSS Asy’s: Geometrical Represent’n
Assume $X_1, \dots, X_n \sim N_d(0, I_d)$, let $d \to \infty$
Study Hyperplane Generated by Data: $(n-1)$-dimensional hyperplane
Points are pairwise equidistant, dist. $\approx \sqrt{2d}$
Points lie at vertices of a “regular $n$-hedron”
Again “randomness in data” is only in rotation
Surprisingly rigid structure in data?
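A quick numerical check of these three limits (my sketch; $n = 5$ and the dimensions shown are arbitrary choices):

```python
# Check: norms ~ sqrt(d), pairwise distances ~ sqrt(2d), angles ~ 90 degrees.
import numpy as np

rng = np.random.default_rng(1)
n = 5
for d in [100, 10_000, 1_000_000]:
    X = rng.standard_normal((n, d))
    norms = np.linalg.norm(X, axis=1)
    G = X @ X.T                                  # Gram matrix
    i, j = np.triu_indices(n, k=1)               # all point pairs
    dists = np.sqrt(G[i, i] + G[j, j] - 2 * G[i, j])
    coss = G[i, j] / (norms[i] * norms[j])
    print(f"d={d:8d}  mean norm/sqrt(d)={norms.mean()/np.sqrt(d):.3f}  "
          f"mean dist/sqrt(2d)={dists.mean()/np.sqrt(2*d):.3f}  "
          f"max |cos angle|={np.abs(coss).max():.3f}")
```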

HDLSS Asy’s: Geometrical Represen’tion
Simulation view: study “rigidity after rotation”
–Simple 3-point data sets
–In dimensions d = 2, 20, 200, …
–Generate hyperplane of dimension 2
–Rotate that to plane of screen
–Rotate within plane, to make “comparable”
–Repeat 10 times, use different colors
(a sketch of this procedure follows)
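One way to implement that view, as a sketch under assumptions: the largest dimension shown (2000) and the alignment method (orthogonal Procrustes) are my choices, not from the slides.

```python
# Project each 3-point cloud onto the 2-d plane it spans, then align the
# repetitions to a fixed reference so they are visually comparable.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(2)
dims = [2, 20, 200, 2000]                    # last value assumed
fig, axes = plt.subplots(1, len(dims), figsize=(12, 3))

for ax, d in zip(axes, dims):
    ref = None
    for rep in range(10):
        X = rng.standard_normal((3, d))
        Xc = X - X.mean(0)                   # centered: points span a 2-d plane
        U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
        P = U[:, :2] * s[:2]                 # coordinates within that plane
        if ref is None:
            ref = P
        else:                                # orthogonal Procrustes alignment
            A, _, Bt = np.linalg.svd(P.T @ ref)
            P = P @ (A @ Bt)
        ax.plot(*np.vstack([P, P[:1]]).T, marker="o")   # closed triangle
    ax.set_title(f"d = {d}")
    ax.set_aspect("equal")
plt.tight_layout()
plt.show()
```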

HDLSS Asy’s: Geometrical Represen’tion
Simulation view: shows “rigidity after rotation” (figure: the aligned triangles become nearly identical, close to equilateral, as d grows)

HDLSS Asy’s: Geometrical Represen’tion
Explanation of observed (simulation) behavior: “everything similar for very high d”
–2 popn’s are 2 simplices (i.e. regular n-hedrons)
–All are same distance from the other class
–i.e. everything is a support vector
–i.e. all sensible directions show “data piling”
–so “sensible methods are all nearly the same”
–Including 1-NN

HDLSS Asy’s: Geometrical Represen’tion
Straightforward generalizations:
–non-Gaussian data: only need moments
–non-independent: use “mixing conditions”
–Mild eigenvalue condition on theoretical covariance (Ahn, Marron, Muller & Chi, 2007)
All based on simple “Laws of Large Numbers”

2nd Paper on HDLSS Asymptotics
Ahn, Marron, Muller & Chi (2007)
•Assume 2nd moments
•Assume no eigenvalues too large, in the sense: for covariance eigenvalues $\lambda_1 \geq \cdots \geq \lambda_d$, assume
$$\frac{\sum_{j=1}^{d} \lambda_j^2}{\Big(\sum_{j=1}^{d} \lambda_j\Big)^{2}} \longrightarrow 0$$
i.e. the ratio vanishes ($1/d$ is the minimum possible value)
(much weaker than previous mixing conditions…)

2nd Paper on HDLSS Asymptotics
Background: In classical multivariate analysis, the statistic
$$\epsilon = \frac{\Big(\sum_{j=1}^{d} \lambda_j\Big)^{2}}{d \sum_{j=1}^{d} \lambda_j^2}$$
is called the “epsilon statistic”, and is used to test “sphericity” of the dist’n, i.e. “are all cov’nce eigenvalues the same?”

2nd Paper on HDLSS Asymptotics
Can show the epsilon statistic satisfies: $\frac{1}{d} \leq \epsilon \leq 1$
For a spherical Normal, $\epsilon = 1$
A single extreme eigenvalue gives $\epsilon \approx \frac{1}{d}$
So the assumption (which excludes only such near-degenerate spectra) is very mild
Much weaker than mixing conditions
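Rewriting the eigenvalue condition of Ahn et al. in terms of $\epsilon$ (my restatement, not on the slide):
$$\frac{\sum_{j=1}^{d} \lambda_j^2}{\Big(\sum_{j=1}^{d} \lambda_j\Big)^{2}} = \frac{1}{d\,\epsilon} \longrightarrow 0 \quad\Longleftrightarrow\quad d\,\epsilon \longrightarrow \infty,$$
which holds for the spherical Normal ($d\epsilon = d$) and fails for a single dominant eigenvalue ($d\epsilon \approx 1$).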

2nd Paper on HDLSS Asymptotics
Ahn, Marron, Muller & Chi (2007)
•Assume 2nd moments
•Assume no eigenvalues too large: $\frac{\sum_j \lambda_j^2}{(\sum_j \lambda_j)^2} \to 0$
Then the geometrical representation above still holds
Not so strong as before: the convergence is in probability

2nd Paper on HDLSS Asymptotics
Can we improve on this conclusion?
John Kent example: Normal scale mixture
Won’t get the geometric representation there: the scaled lengths converge to a random, not a constant, limit

2nd Paper on HDLSS Asymptotics
Notes on Kent’s Normal Scale Mixture:
–Data vectors are indep’dent of each other
–But entries of each have strong depend’ce
–However, can show entries have cov = 0!
Recall statistical folklore: Covariance = 0 does NOT imply Independence
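A Kent-style illustration; the particular two-point scale mixture here is my assumption, not from the slides.

```python
# X = sqrt(W) * Z with W uniform on {1, 25}: off-diagonal covariances are
# E[W] * 0 = 0, but all entries share the common scale W, so the length
# ||X|| / sqrt(d) settles on a RANDOM limit, sqrt(W).
import numpy as np

rng = np.random.default_rng(5)
d = 100_000
for rep in range(6):
    w = rng.choice([1.0, 25.0])
    x = np.sqrt(w) * rng.standard_normal(d)
    print(f"rep {rep}:  ||X||/sqrt(d) = {np.linalg.norm(x) / np.sqrt(d):.3f}")
# output hovers near 1 or 5 at random: no non-random limiting radius
```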

0 Covariance is not independence
Simple Example: random variables $X$ and $Y$
–Make both Gaussian
–With strong dependence
–Yet 0 covariance
Given $X \sim N(0, 1)$ and a threshold $c > 0$, define
$$Y = \begin{cases} X, & |X| \leq c \\ -X, & |X| > c \end{cases}$$

0 Covariance is not independence
Simple Example (figures): scatterplots of $(X, Y)$, with mass on the diagonal lines $y = x$ and $y = -x$; the threshold $c$ is chosen to make cov(X, Y) = 0

0 Covariance is not independence
Simple Example:
–Distribution is degenerate
–Supported on the diagonal lines $y = x$ and $y = -x$
–Not abs. cont. w.r.t. 2-d Lebesgue meas.
–For small $c$, have $\mathrm{cov}(X, Y) < 0$
–For large $c$, have $\mathrm{cov}(X, Y) > 0$
–By continuity, there exists $c$ with $\mathrm{cov}(X, Y) = 0$
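A numerical check (my addition; the balancing value $c \approx 1.54$ comes from root-finding, not from the slides):

```python
# Y = X on |X| <= c, Y = -X otherwise; find c making cov(X, Y) = 0,
# then verify zero covariance alongside obvious dependence.
import numpy as np
from scipy.optimize import brentq
from scipy.stats import norm

# cov(X,Y) = E[X^2; |X|<=c] - E[X^2; |X|>c], and by integration by parts
# E[X^2; |X|<=c] = 2*Phi(c) - 1 - 2*c*phi(c).
def cov_xy(c):
    inner = 2 * norm.cdf(c) - 1 - 2 * c * norm.pdf(c)
    return 2 * inner - 1

c0 = brentq(cov_xy, 0.1, 3.0)
print(f"balancing c = {c0:.4f}")                        # about 1.54

rng = np.random.default_rng(3)
x = rng.standard_normal(1_000_000)
y = np.where(np.abs(x) <= c0, x, -x)
print(f"sample cov(X, Y)   = {np.cov(x, y)[0, 1]:+.4f}")        # ~ 0
print(f"sample cov(X^2, Y^2) = {np.cov(x**2, y**2)[0, 1]:+.4f}")  # ~ 2: dependent!
```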

0 Covariance is not independence
Result: the joint distribution of $X$ and $Y$:
–Has Gaussian marginals
–Has $\mathrm{cov}(X, Y) = 0$
–Yet has strong dependence of $X$ and $Y$
–Thus is not multivariate Gaussian
Shows that multivariate Gaussian means more than Gaussian marginals

HDLSS Asy’s: Geometrical Represen’tion
Further consequences of geometric represen’tion:
1. Inefficiency of DWD for uneven sample sizes (motivates weighted version, Xingye Qiao)
2. DWD more stable than SVM (based on deeper limiting distributions) (reflects intuitive idea of feeling the sampling variation) (something like mean vs. median)
3. 1-NN rule inefficiency is quantified

HDLSS Math. Stat. of PCA, I
Consistency & strong inconsistency: spike covariance model, Paul (2007)
Eigenvalues: $\lambda_1 = d^{\alpha}$, $\lambda_2 = \cdots = \lambda_d = 1$
1st eigenvector: $u_1$
How good are the empirical versions $\hat{\lambda}_1$, $\hat{u}_1$ as estimates?

HDLSS Math. Stat. of PCA, II
Consistency (big enough spike): for $\alpha > 1$,
$$\mathrm{angle}(\hat{u}_1, u_1) \longrightarrow 0$$
Strong inconsistency (spike not big enough): for $\alpha < 1$,
$$\mathrm{angle}(\hat{u}_1, u_1) \longrightarrow 90°$$
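A simulation of the two regimes (my sketch; $n = 20$ and the particular $\alpha$ values are arbitrary choices):

```python
# Spike model: lambda_1 = d**alpha, all other eigenvalues 1. Compare the
# empirical first eigenvector to the true one (e1) as d grows.
import numpy as np

rng = np.random.default_rng(4)
n = 20
for alpha in [1.5, 0.5]:                 # big spike vs. not big enough
    for d in [100, 1000, 10000]:
        scale = np.ones(d); scale[0] = np.sqrt(d**alpha)
        X = rng.standard_normal((n, d)) * scale   # cov = diag(d^alpha, 1, ..., 1)
        # first empirical eigenvector = first right singular vector of X
        _, _, Vt = np.linalg.svd(X, full_matrices=False)
        angle = np.degrees(np.arccos(abs(Vt[0, 0])))
        print(f"alpha={alpha}  d={d:6d}  angle(u1_hat, u1) = {angle:5.1f} deg")
```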

HDLSS Math. Stat. of PCA, III
Consistency of eigenvalues?
•Eigenvalues are inconsistent
•But have a known limiting distribution
•Unless $n \to \infty$ as well

HDLSS Work in Progress, I
Batch adjustment: Xuxin Liu
Recall intuition from above: the key is the sizes of the biological subtypes; a differing ratio trips up the mean, but DWD is more robust
Mathematics behind this?

Liu: Twiddle Ratios of Subtypes (figure)

HDLSS Data Combo Mathematics
Xuxin Liu dissertation results:
•Simple unbalanced cluster model
•Growing at rate $d^{\alpha}$ as $d \to \infty$
•Answers depend on $\alpha$
Visualization of setting…

HDLSS Data Combo Mathematics (figure: visualization of the unbalanced cluster setting)

HDLSS Data Combo Mathematics
Asymptotic results (as $d \to \infty$):
•For $\alpha$ above a critical value, DWD consistent: Angle(DWD, Truth) $\longrightarrow 0$
•For $\alpha$ below it, DWD strongly inconsistent: Angle(DWD, Truth) $\longrightarrow 90°$

HDLSS Data Combo Mathematics
Asymptotic results (as $d \to \infty$):
•For $\alpha$ above a critical value, PAM inconsistent: Angle(PAM, Truth) $\longrightarrow$ a non-zero constant
•For $\alpha$ below it, PAM strongly inconsistent: Angle(PAM, Truth) $\longrightarrow 90°$

HDLSS Data Combo Mathematics
The value of the limiting angle depends on the sample size ratio of the subtypes: it is 0 only when the subtypes are balanced
•Otherwise PAM is inconsistent
•Verifies the intuitive idea in a strong way

The Future of Geometrical Repres’tion?
•HDLSS version of “optimality” results?
•“Contiguity” approach? Params depend on d?
•Rates of convergence?
•Improvements of DWD? (e.g. other functions of distance than inverse)
It is still early days…

State of HDLSS Research?
(figure: an iceberg labeled with “Development of Methods” and “Mathematical Assessment”)
(thanks to: defiant.corban.edu/gtipton/net-fun/iceberg.html)