Published by Aleesha Reed. Modified over 8 years ago.
Slide 1: UNC, Stat & OR
Hailuoto Workshop
Object Oriented Data Analysis, III
J. S. Marron
Dept. of Statistics and Operations Research, University of North Carolina
March 21, 2016
Slide 2: HDLSS Space is a Weird Place, I
Maximal Data Piling (with J. Y. Ahn)
In HDLSS binary discrimination, there is a direction where:
- Class +1 projections pile at one point
- Class -1 projections pile at another
Slide 3: HDLSS Space is a Weird Place, II
Maximal Data Piling Mathematics:
- Exists with probability 1 when the distribution is absolutely continuous w.r.t. Lebesgue measure
- Unique within the subspace generated by the data
- Formula very similar to FLD (pooled within-class covariance → global covariance)
- Same as FLD when n > d
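A minimal NumPy sketch of the piling, assuming (as the slide indicates) that the MDP direction is the FLD formula with the pooled within-class covariance replaced by the global covariance, inverted with a pseudo-inverse because that covariance is singular when d > n. The function name `mdp_direction`, the `rcond` cutoff, the mean shift, and the toy sizes are illustrative choices, not taken from the talk.

```python
import numpy as np

def mdp_direction(x_pos, x_neg, rcond=1e-10):
    """Sketch of a Maximal Data Piling direction (unit length).

    Assumption: w is proportional to pinv(global covariance) @ (mean_pos - mean_neg).
    Rows of x_pos and x_neg are data points; columns are the d coordinates.
    """
    x_all = np.vstack([x_pos, x_neg])
    cov_global = np.cov(x_all, rowvar=False)   # d x d, generically rank n - 1 when d >= n - 1
    mean_diff = x_pos.mean(axis=0) - x_neg.mean(axis=0)
    # rcond discards the numerically-zero eigenvalues of the rank-deficient covariance
    w = np.linalg.pinv(cov_global, rcond=rcond, hermitian=True) @ mean_diff
    return w / np.linalg.norm(w)

# Hypothetical HDLSS toy data: d = 200 dimensions, 10 + 12 points
rng = np.random.default_rng(0)
x_pos = rng.standard_normal((10, 200)) + 0.5
x_neg = rng.standard_normal((12, 200))

w = mdp_direction(x_pos, x_neg)
proj_pos, proj_neg = x_pos @ w, x_neg @ w
print(np.ptp(proj_pos), np.ptp(proj_neg))   # within-class spread: rounding-error level
print(proj_pos.mean() - proj_neg.mean())    # positive gap between the two piles
```

Here d = 200 exceeds n = 22, so each class collapses to a single projected value while the two piles stay apart. With n > d the same formula produces no piling and gives the FLD direction.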
Slide 4: HDLSS Space is a Weird Place, III
- About 2^n MDP directions (one for each split of the data into two classes)
- Useful for clustering?
- Hard optimization…
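Counting the splits makes the “about 2^n” concrete, assuming each way of dividing the n data points into two nonempty, unlabeled classes yields one MDP direction:

$$\#\{\text{two-class splits of } n \text{ points}\} = \frac{2^n - 2}{2} = 2^{n-1} - 1.$$

For $n = 20$ that is already $2^{19} - 1 = 524{,}287$ candidate splits, so exhaustive search over them for clustering quickly becomes infeasible.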
Slide 5: HDLSS Space is a Weird Place, IV
Parallel Directions (with X. Liu)
Slide 6: Time Series of Curves
- Chemical spectra, evolving over time (with J. Wendelberger & E. Kober)
- Mortality curves changing in time (with Andres Alonzo)
Visualization: similar tools (PCA & directions), but color according to time
Slide 7: Chemical Spectra, I
Slide 8: Chemical Spectra, II
Slide 9: Chemical Spectra, III
Slide 10: Chemical Spectra, IV
Slide 11: Chemical Spectra, V
Slide 12: Chemical Spectra, VI
Slide 13: Chemical Spectra, VII
Slide 14: Chemical Spectra, VIII
Slide 15: Demography Data
- Mortality as a function of age: “chance of dying” for males in each 1-year age group
- Curves are years 1908 - 2002
- PCA of the family of curves
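The “PCA of the family of curves” step can be sketched as an SVD of the mean-centered data matrix, treating each year's curve as one data object. The function name `curve_pca`, the rows-are-years layout, and the synthetic curves below are hypothetical stand-ins, not the demography data.

```python
import numpy as np

def curve_pca(curves, n_components=2):
    """PCA of a family of curves via SVD of the mean-centered data matrix.

    curves: array of shape (n_curves, n_grid); each row is one curve
    (here one year), each column one grid point (here one age).
    Returns the mean curve, PC direction curves, and per-curve PC scores.
    """
    mean_curve = curves.mean(axis=0)
    centered = curves - mean_curve
    # Rows of vt are orthonormal PC directions, in decreasing order of variance
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    directions = vt[:n_components]
    scores = centered @ directions.T      # shape (n_curves, n_components)
    return mean_curve, directions, scores

# Synthetic stand-in for log-mortality curves (NOT real data):
# a fixed age profile, a steady improvement over time, and small noise
rng = np.random.default_rng(0)
years = np.arange(1908, 2003)
ages = np.arange(101)
curves = (
    (-8.0 + 0.08 * ages)[None, :]
    - 0.02 * (years - 1908)[:, None]
    + 0.05 * rng.standard_normal((years.size, ages.size))
)

mean_curve, directions, scores = curve_pca(curves)
```

Plotting `scores[:, 0]` with points colored by year is one way to get the “color according to time” view: in this synthetic example PC1 simply tracks the built-in improvement over time.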
Slide 16: Demography Data
PCA of the family of curves for Males
- Babies & elderly “most mortal” (raw)
- All getting better over time (raw & PC1)
- Except 1918: influenza pandemic (see color scale)
- Middle age most mortal (PC2):
  - 1918
  - Early 1930s: Spanish Civil War
  - 1980 – 1994 (then better): auto wrecks
- Decade rounding (several places)
Slide 17: Demography Data
PCA for Males in Switzerland
- Most aspects similar
- No decade rounding (better records)
- 1918 flu: different color (PC2) (see color scale)
- No war changes
- Steady improvement until the 70s (PC2), when auto accidents kicked in
Slide 18: Demography Data
Dual PCA idea: rows and columns trade places
- Demographic primal view: curves are years, coordinates are ages
- Demographic dual view: curves are ages, coordinates are years
Dual PCA View, Spanish Males
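Since the dual view only swaps the roles of rows and columns, it can be sketched as the same row-wise PCA applied to the transposed table. The helper `pca_scores` and the random 95 × 101 table are hypothetical placeholders for the years × ages mortality data.

```python
import numpy as np

def pca_scores(data, n_components=2):
    """Row-wise PCA scores: rows are data objects (curves), columns are coordinates."""
    centered = data - data.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T

# Hypothetical mortality-style table: rows = years, columns = ages
rng = np.random.default_rng(1)
table = rng.standard_normal((95, 101))

primal_scores = pca_scores(table)    # one score vector per year (curves are years)
dual_scores = pca_scores(table.T)    # one score vector per age (curves are ages)
print(primal_scores.shape, dual_scores.shape)  # (95, 2) (101, 2)
```

Note that transposing also changes the centering: the primal view subtracts the mean over years at each age, while the dual view subtracts the mean over ages in each year.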
Slide 19: Demography Data
Dual PCA View, Spanish Males
- Old people have constant mortality (raw)
- But improvement for the rest (raw)
- Bad for 1918 (flu) & Spanish Civil War, but generally improving (mean)
- Improves for ages 1-6, then worse (PC1)
- Big improvement for the young (PC2)
(see age color key)
Slide 20: Discrimination for m-reps
Classification for Lie Groups & Symmetric Spaces (S. K. Sen & S. Joshi)
What is the “separating plane” (for SVM, DWD)?
Slide 21: Trees as Data Points, I
Brain Blood Vessel Trees (E. Bullitt & H. Wang)
- Statistical understanding of population?
- Mean? PCA?
- Challenge: very non-Euclidean
Slide 22: Trees as Data Points, II
- Mean of tree population: Fréchet approach
- PCA on trees (based on “tree lines”)
- Theory in place; implementation?
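The Fréchet approach defines a mean through distances alone, so it needs only a metric $d$ on the space of trees (which tree metric to use is left open here):

$$\bar{T} = \operatorname*{arg\,min}_{T} \sum_{i=1}^{n} d(T, T_i)^2 .$$

When $d$ is ordinary Euclidean distance this minimizer is the usual sample mean, which is why it serves as a natural generalization to very non-Euclidean spaces.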
Slide 23: HDLSS Asymptotics: Simple Paradoxes, I
For the $d$-dimensional “Standard Normal” distribution, $Z \sim N_d(0, I_d)$:
Euclidean distance to origin (as $d \to \infty$): $\|Z\| = \sqrt{d} + O_p(1)$
- Data lie roughly on surface of sphere of radius $\sqrt{d}$
- Yet origin is point of “highest density”???
- Paradox resolved by: “density w.r.t. Lebesgue measure”
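The $\sqrt{d}$ radius follows from a law of large numbers on the squared coordinates. Writing $Z = (Z^{(1)}, \dots, Z^{(d)})^T$ and using $E(Z^{(j)})^2 = 1$, $\mathrm{Var}\big((Z^{(j)})^2\big) = 2$ for i.i.d. standard normal coordinates:

$$\|Z\|^2 = \sum_{j=1}^{d} \big(Z^{(j)}\big)^2 = d + O_p(\sqrt{d}), \qquad \|Z\| = \sqrt{d}\,\sqrt{1 + O_p(d^{-1/2})} = \sqrt{d} + O_p(1).$$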
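Slide 24: HDLSS Asymptotics: Simple Paradoxes, II
For the $d$-dimensional “Standard Normal” distribution, $Z_1$ independent of $Z_2 \sim N_d(0, I_d)$:
Euclidean distance between $Z_1$ and $Z_2$ (as $d \to \infty$):
- Distance tends to non-random constant: $\|Z_1 - Z_2\| = \sqrt{2d} + O_p(1)$
- Can extend to $Z_1, \dots, Z_n$
- Where do they all go??? (we can only perceive 3 dimensions)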
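The same argument applies to the difference of two independent draws, whose coordinates $Z_1^{(j)} - Z_2^{(j)}$ are i.i.d. $N(0, 2)$, so each squared term has mean 2 and variance 8:

$$\|Z_1 - Z_2\|^2 = \sum_{j=1}^{d} \big(Z_1^{(j)} - Z_2^{(j)}\big)^2 = 2d + O_p(\sqrt{d}), \qquad \|Z_1 - Z_2\| = \sqrt{2d} + O_p(1).$$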
Slide 25: HDLSS Asymptotics: Simple Paradoxes, III
For the $d$-dimensional “Standard Normal” distribution, $Z_1$ independent of $Z_2 \sim N_d(0, I_d)$:
High dimensional angles (as $d \to \infty$):
- $\mathrm{Angle}(Z_1, Z_2) = 90^\circ + O_p(d^{-1/2})$
- “Everything is orthogonal”???
- Where do they all go??? (again our perceptual limitations)
- Again 1st order structure is non-random
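A small simulation shows the angle between two independent standard normal draws settling near 90° as $d$ grows. It assumes nothing beyond i.i.d. $N(0, 1)$ coordinates; the helper `angle_degrees`, the dimensions, and the seed are arbitrary illustrative choices.

```python
import numpy as np

def angle_degrees(u, v):
    """Angle between two vectors, in degrees."""
    cos = u @ v / (np.linalg.norm(u) * np.linalg.norm(v))
    # clip guards against rounding pushing |cos| slightly above 1
    return np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))

rng = np.random.default_rng(2)
for d in (3, 100, 10_000):
    z1, z2 = rng.standard_normal(d), rng.standard_normal(d)
    # the deviation from 90 degrees shrinks roughly like 1 / sqrt(d)
    print(d, round(float(angle_degrees(z1, z2)), 2))
```

At $d = 3$ the angle can be almost anything; by $d = 10{,}000$ it is typically within about a degree of 90°, consistent with the $O_p(d^{-1/2})$ rate.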
Slide 26: HDLSS Asymptotics: Geometrical Representation, I
Assume $Z_1, \dots, Z_n$ i.i.d. $N_d(0, I_d)$, let $d \to \infty$ (with $n$ fixed)
Study the subspace generated by the data:
a. Hyperplane through 0, of dimension $n$
b. Points are “nearly equidistant to 0”, & dist $\approx \sqrt{d}$
c. Within plane, can “rotate towards $\sqrt{d} \times$ Unit Simplex”
d. All Gaussian data sets are “near Unit Simplex Vertices”!!!
“Randomness” appears only in rotation of simplex
With P. Hall & A. Neeman
Slide 27: HDLSS Asymptotics: Geometrical Representation, II
Assume $Z_1, \dots, Z_n$ i.i.d. $N_d(0, I_d)$, let $d \to \infty$ (with $n$ fixed)
Study the hyperplane generated by the data:
a. $(n-1)$-dimensional hyperplane
b. Points are pairwise equidistant, dist $\approx \sqrt{2d}$
c. Points lie at vertices of a “regular $n$-hedron”
d. Again “randomness in data” is only in rotation
e. Surprisingly rigid structure in data?
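A quick simulation makes the rigid-simplex picture concrete: for a handful of points in very high dimension, norms scaled by $\sqrt{d}$ and pairwise distances scaled by $\sqrt{2d}$ all come out close to 1. The choices $n = 5$, $d = 20{,}000$, and the seed are illustrative assumptions only.

```python
import numpy as np

rng = np.random.default_rng(3)
n, d = 5, 20_000                    # few points, very high dimension (illustrative)
z = rng.standard_normal((n, d))     # rows are Z_1, ..., Z_n ~ N_d(0, I_d)

# Norms scaled by sqrt(d): all close to 1 (points nearly equidistant from 0)
scaled_norms = np.linalg.norm(z, axis=1) / np.sqrt(d)

# Pairwise distances scaled by sqrt(2d): off-diagonal entries all close to 1,
# i.e. the points sit near the vertices of a regular simplex
diffs = z[:, None, :] - z[None, :, :]          # shape (n, n, d)
scaled_dists = np.linalg.norm(diffs, axis=2) / np.sqrt(2 * d)

print(np.round(scaled_norms, 3))
print(np.round(scaled_dists, 3))
```

Rerunning with a different seed changes where the points are, but not these scaled distances, which is the sense in which “randomness” sits only in the rotation.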
Slide 28: HDLSS Asymptotics: Geometrical Representation, III
Simulation view: shows “rigidity after rotation”
Slide 29: HDLSS Asymptotics: Geometrical Representation, III
Straightforward generalizations:
- Non-Gaussian data: only need moments
- Non-independent data: use “mixing conditions”
- Mild eigenvalue condition on theoretical covariance (with J. Ahn, K. Muller & Y. Chi)
All based on simple “Laws of Large Numbers”
Slide 30: HDLSS Asymptotics: Geometrical Representation, IV
Explanation of observed (simulation) behavior: “everything similar for very high d”
- 2 populations are 2 simplices (i.e., regular n-hedrons)
- All are the same distance from the other class
- i.e., everything is a support vector
- i.e., all sensible directions show “data piling”
- so “sensible methods are all nearly the same”
- Including 1-NN
Slide 31: HDLSS Asymptotics: Geometrical Representation, V
Further consequences of the geometric representation:
1. Inefficiency of DWD for uneven sample sizes (motivates a “weighted version”, work in progress)
2. DWD more “stable” than SVM (based on “deeper limiting distributions”) (reflects the intuitive idea of “feeling sampling variation”) (something like “mean vs. median”)
3. 1-NN rule inefficiency is quantified.
Slide 32: The Future of Geometrical Representation?
- HDLSS version of “optimality” results?
- “Contiguity” approach? Parameters depend on d?
- Rates of convergence?
- Improvements of DWD? (e.g., other functions of distance than inverse)
It is still early days …
Slide 33: Some Carry-Away Lessons
- Atoms of the analysis: object oriented
- HDLSS contexts deserve further study
- DWD is attractive for HDLSS classification
- “Randomness” in HDLSS data is only in rotations (modulo rotation, have constant simplex shape)
- How to put HDLSS asymptotics to work?