Published by Lawrence Barker. Modified over 9 years ago.
1
Object Oriented Data Analysis, Last Time Organizational Matters http://www.unc.edu/~marron/UNCstat322-2005/HomePage.html What is OODA? Visualization by Projection Object Space & Feature Space Curves as Data Data Representation Issues PCA visualization
2
Data Object Conceptualization Object Space Feature Space Curves Images Manifolds Shapes Tree Space Trees
3
Functional Data Analysis, Toy EG I
4
Easy way to do these analyses Matlab software (user friendly?) available: http://www.stat.unc.edu/postscript/papers/marron/Matlab7Software/ Download & put in Matlab Path: General Smoothing Look first at: curvdatSM.m scatplotSM.m
5
Time Series of Curves Again a “Set of Curves” But now Time Order is Important! An approach: Use color to code for time Start End
6
Time Series Toy E.g. Explore Question of Eli Broadhurst: “Is Horizontal Motion Linear Variation?” Example: Set of time shifted Gaussian densities View: Code time with colors as above
7
T. S. Toy E.g., Raw Data
8
T. S. Toy E.g., PCA View PCA gives “Modes of Variation” But there are Many… Intuitively Useful??? Like “harmonics”? Isn’t there only 1 mode of variation? Answer comes in 2-d scatterplots
9
T. S. Toy E.g., PCA Scatterplot
10
Where is the Point Cloud? Lies along a 1-d curve in Feature Space So actually have 1-d mode of variation But a non-linear mode of variation Poorly captured by PCA (linear method) Will study more later
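A minimal sketch of this toy example, assuming NumPy and a made-up setup (50 unit-variance Gaussian densities whose means shift steadily from -2 to 2; the grid, the number of curves and the shift range are illustrative choices, not values from the slides). It suggests why one underlying "horizontal shift" needs more than one PC when the method is linear:

```python
import numpy as np

# Hypothetical toy setup: 50 Gaussian densities on a common grid,
# with means shifted steadily from -2 to 2 (the "time" ordering).
grid = np.linspace(-6.0, 6.0, 201)
shifts = np.linspace(-2.0, 2.0, 50)
sigma = 1.0
curves = np.exp(-0.5 * ((grid[None, :] - shifts[:, None]) / sigma) ** 2)
curves /= sigma * np.sqrt(2.0 * np.pi)

# PCA via SVD of the mean-centered data matrix (rows = curves).
centered = curves - curves.mean(axis=0)
U, s, Vt = np.linalg.svd(centered, full_matrices=False)
explained = s**2 / np.sum(s**2)   # share of variation per PC
scores = U * s                    # column k holds the scores on PC k+1

print("share of variation, first 4 PCs:", np.round(explained[:4], 3))
```

Plotting `scores[:, 0]` against `scores[:, 1]`, colored by the row index, gives the kind of 2-d scatterplot view discussed here.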
11
Chemo-metric Time Series Mass Spectrometry Measurements On an Aging Substance, called “Estane” Made over Logarithmic Time Grid, n = 60 Each is a Spectrum What about Time Evolution? Approach: PCA & Time Coloring
12
Chemo-metric Time Series Joint Work w/ E. Kober & J. Wendelberger Los Alamos National Lab Four Experimental Conditions: 1. Control 2. Aged 59 days in Dry Air 3. Aged 27 days in Humid Air 4. Aged 59 days in Humid Air
13
Chemo-metric Time Series, HA 27
14
Raw Data: All 60 spectra essentially the same “Scale” of mean is much bigger than variation about mean Hard to see structure of all 1600 freq’s Centered Data: Now can see different spectra Since mean subtracted off Note much smaller vertical axis
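The effect of centering can be sketched as follows, assuming NumPy. The data are a made-up stand-in (a large common mean curve plus small noise), not the Estane spectra; only the sizes 60 and 1600 come from the slide:

```python
import numpy as np

# Hypothetical stand-in for the spectra: 60 curves over 1600 frequencies,
# a large common mean plus small curve-to-curve variation.
rng = np.random.default_rng(0)
n_spectra, n_freq = 60, 1600
mean_spectrum = 100.0 * np.exp(-np.linspace(-3.0, 3.0, n_freq) ** 2)
spectra = mean_spectrum + rng.normal(scale=0.5, size=(n_spectra, n_freq))

# Centering: subtract the mean curve (mean over spectra at each frequency).
centered = spectra - spectra.mean(axis=0)

raw_range = spectra.max() - spectra.min()
centered_range = centered.max() - centered.min()
print(f"vertical range, raw: {raw_range:.1f}, centered: {centered_range:.1f}")
```

The much smaller range of the centered data corresponds to the smaller vertical axis noted on the slide.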
15
Chemo-metric Time Series, HA 27
16
Data zoomed to “important” freq’s: Raw Data: Now see slight differences Smoother “natural looking” spectra Centered Data: Differences in spectra more clear Maybe now have “real structure” Scale is important
17
Chemo-metric Time Series, HA 27
18
Use of Time Order Coloring: Raw Data: Can see a little ordering, not much Centered Data: Clear time ordering Shifting peaks? (compare to Raw) PC1: Almost everything? PC1 Residuals: Data nearly linear (same scale important)
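One way to compute the "PC1" and "PC1 Residuals" panels is sketched below, assuming NumPy. The helper name `pc1_split` and the toy data are hypothetical, introduced only for illustration:

```python
import numpy as np

def pc1_split(data):
    """Return (PC1 projection, PC1 residual) of the mean-centered rows."""
    centered = data - data.mean(axis=0)
    _, _, Vt = np.linalg.svd(centered, full_matrices=False)
    direction = Vt[0]                         # unit-length PC1 direction
    scores = centered @ direction             # one PC1 score per curve
    projection = np.outer(scores, direction)  # part of each curve along PC1
    return projection, centered - projection

# Made-up data: 60 curves that move mostly along one direction, plus noise.
rng = np.random.default_rng(1)
time = np.linspace(0.0, 1.0, 60)
shape = np.sin(np.linspace(0.0, np.pi, 40))
toy = np.outer(time, shape) + rng.normal(scale=0.01, size=(60, 40))

projection, residual = pc1_split(toy)
share = np.sum(projection**2) / np.sum((toy - toy.mean(axis=0)) ** 2)
print(f"PC1 share of variation: {share:.3f}")
```

Plotting `projection` and `residual` on the same vertical scale makes it easy to judge how much PC1 leaves behind.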
19
Chemo-metric Time Series, Control
20
PCA View Clear systematic structure Time ordering very important Reminiscent of Toy Example A clear 1-d curve in Feature Space Physical Explanation?
21
Toy Data Explanations Simple Chemical Reaction Model: Subst. 1 transforms into Subst. 2 Note: linear path in Feature Space
22
Toy Data Explanations Richer Chemical Reaction Model: Subst. 1 → Subst. 2 → Subst. 3 Curved path in Feat. Sp. 2 Reactions Curve lies in 2-dim’al subsp.
23
Toy Data Explanations Another Chemical Reaction Model: Subst. 1 → Subst. 2 & Subst. 5 → Subst. 6 Curved path in Feat. Sp. 2 Reactions Curve lies in 2-dim’al subsp.
24
Toy Data Explanations More Complex Chemical Reaction Model: 1 → 2 → 3 → 4 Curved path in Feat. Sp. (lives in 3-d) 3 Reactions Curve lies in 3-dim’al subsp.
25
Toy Data Explanations Even More Complex Chemical Reaction Model: 1 → 2 → 3 → 4 → 5 Curved path in Feat. Sp. (lives in 4-d) 4 Reactions Curve lies in 4-dim’al subsp.
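The link between the number of reactions and the dimension of the subspace containing the path can be checked numerically. The sketch below assumes NumPy and a made-up model: a chain of first-order reactions with invented rate constants, with each observed spectrum taken to be the concentration-weighted mix of random "pure substance" spectra. The function names are hypothetical and none of the constants come from the slides:

```python
import numpy as np

def chain_spectra(n_substances, n_times=60, n_freq=200, seed=0):
    """Simulate spectra for a chain 1 -> 2 -> ... -> n_substances.

    Hypothetical model: first-order kinetics with made-up rate constants;
    each spectrum is the concentration-weighted mix of made-up pure spectra.
    """
    rng = np.random.default_rng(seed)
    pure = rng.uniform(0.0, 1.0, size=(n_substances, n_freq))
    rates = 2.0 ** -np.arange(n_substances - 1)   # 1, 1/2, 1/4, ...
    A = np.zeros((n_substances, n_substances))
    for k, r in enumerate(rates):
        A[k, k] -= r        # substance k is consumed ...
        A[k + 1, k] += r    # ... and turns into substance k+1
    conc = np.zeros(n_substances)
    conc[0] = 1.0           # start with pure substance 1
    dt, steps_per_obs = 0.01, 50
    rows = []
    for _ in range(n_times):
        rows.append(conc.copy())
        for _ in range(steps_per_obs):            # forward-Euler time stepping
            conc = conc + dt * (A @ conc)
    return np.array(rows) @ pure

def path_dimension(data, tol=1e-8):
    """Dimension of the subspace spanned by the mean-centered curves."""
    s = np.linalg.svd(data - data.mean(axis=0), compute_uv=False)
    return int(np.sum(s > tol * s[0]))

for m in (2, 3, 4, 5):
    print(m - 1, "reaction(s): path lies in a",
          path_dimension(chain_spectra(m)), "-dim'al subspace")
```

With noisy data the trailing singular values are no longer exactly zero, which is where the question of distinguishing a further reaction from noise arises.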
26
Chemo-metric Time Series, Control
27
Suggestions from Toy Examples: Clearly 3 reactions under way Maybe a 4th??? Hard to distinguish from noise? Interesting statistical open problem!
28
Chemo-metric Time Series What about the other experiments? Recall: 1. Control 2. Aged 59 days in Dry Air 3. Aged 27 days in Humid Air 4. Aged 59 days in Humid Air Above results were “cherry picked” to best make the points What about the other cases???
29
Scatterplot Matrix, Control Above E.g., maybe ~4d curve ~4 reactions
30
Scatterplot Matrix, Da59 PC2 is “bleeding of CO2”, discussed below
31
Scatterplot Matrix, Ha27 Only “3-d + noise”? Only 3 reactions
32
Scatterplot Matrix, Ha59 Harder to judge???
33
Object Space View, Control Terrible discretization effect, despite ~4d …
34
Object Space View, Da59 OK, except strange at beginning (CO2 …)
35
Object Space View, Ha27 Strong structure in PC1 Resid (d < 2)
36
Object Space View, Ha59 Lots at beginning, OK since “oldest”
37
Problem with Da59 What about strange behavior for Da59? Recall: PC2 showed “really different behavior at start” Chemists’ comments: Ignore this, should have started measuring later…
38
Problem with Da59 But still fun to look at broader spectra
39
Chemo-metric T. S. Joint View Throw them all together as big population Take Point Cloud View
40
Chemo-metric T. S. Joint View
41
Throw them all together as big population Take Point Cloud View Note 4d space of interest, driven by: 4 clusters (3d) PC1 of chemical reaction (1-d) But these don’t appear as the 4 PCs Chem. PC1 “spread over PC2,3,4” Essentially a “rotation of interesting dir’ns” How to “unrotate”???
42
Chemo-metric T. S. Joint View Interesting Variation: Remove cluster means Allows clear comparison of within curve variation
43
Chemo-metric T. S. Joint View (- mean)
44
Chemo-metric T. S. Joint View Interesting Variation: Remove cluster means Allows clear comparison of within curve variation: PC1 versus others are quite revealing (note different “rotations”) Others don’t show so much
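Removing cluster means can be sketched as below, assuming NumPy. The helper name `remove_cluster_means` and the toy data (4 groups of 15 curves with well-separated means) are hypothetical:

```python
import numpy as np

def remove_cluster_means(data, labels):
    """Subtract from each curve the mean curve of its own cluster."""
    data = np.asarray(data, dtype=float)
    labels = np.asarray(labels)
    out = data.copy()
    for lab in np.unique(labels):
        members = labels == lab
        out[members] -= data[members].mean(axis=0)
    return out

# Made-up data: 4 clusters whose means differ far more than the
# variation within each cluster.
rng = np.random.default_rng(2)
labels = np.repeat([0, 1, 2, 3], 15)
offsets = np.array([0.0, 5.0, 10.0, 15.0])[labels]
data = offsets[:, None] + rng.normal(size=(60, 30))

within = remove_cluster_means(data, labels)
print("overall spread before:", round(float(data.std()), 2),
      "after:", round(float(within.std()), 2))
```

Running PCA on `within` instead of `data` then compares only the variation inside each cluster.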
45
Demography Data Joint Work with: Andres Alonso Univ. Carlos III, Madrid Mortality, as a function of age “Chance of dying”, for Males, in Spain of each 1-year age group Curves are years 1908 - 2002 PCA of the family of curves
46
Demography Data PCA of the family of curves for Males Babies & elderly “most mortal” (Raw) All getting better over time (Raw & PC1) Except 1918 - Influenza Pandemic (see Color Scale) Middle age most mortal (PC2): –1918 –Early 1930s - Spanish Civil War –1980 – 1994 (then better) auto wrecks Decade Rounding (several places)
47
Demography Data PCA for Females in Spain Most aspects similar (see Color Scale) No War Changes –Steady improvement until 70s (PC2) –When auto accidents kicked in
48
Demography Data PCA for Males in Switzerland Most aspects similar No decade rounding (better records) 1918 Flu – Different Color (PC2) (see Color Scale) No War Changes –Steady improvement until 70s (PC2) –When auto accidents kicked in
49
Demography Data Dual PCA Idea: Rows and Columns trade places Demographic Primal View: Curves are Years, Coord’s are Ages Demographic Dual View: Curves are Ages, Coord’s are Years Dual PCA View, Spanish Males
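In matrix terms, the dual view amounts to transposing the data matrix before the PCA. A minimal sketch, assuming NumPy, with random numbers standing in for the mortality table. The 95 rows match the years 1908 to 2002; the 100 age groups and the function name `pca_of_curves` are assumptions:

```python
import numpy as np

def pca_of_curves(curves):
    """PCA where each row of `curves` is one data curve."""
    centered = curves - curves.mean(axis=0)   # subtract the mean curve
    U, s, Vt = np.linalg.svd(centered, full_matrices=False)
    return Vt, U * s                          # directions (rows), scores

rng = np.random.default_rng(0)
mortality = rng.random((95, 100))             # hypothetical: years x ages

# Primal view: curves are years, coordinates are ages.
primal_dirs, primal_scores = pca_of_curves(mortality)
# Dual view: curves are ages, coordinates are years.
dual_dirs, dual_scores = pca_of_curves(mortality.T)

print("primal direction length:", primal_dirs.shape[1])
print("dual direction length:", dual_dirs.shape[1])
```

The two views subtract different means: the primal mean is a function of age, while the dual mean is a function of year.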
50
Demography Data Dual PCA View, Spanish Males Old people have const. mortality (raw) But improvement for rest (raw) Bad for 1918 (flu) & Spanish Civil War, but generally improving (mean) Improves for ages 1-6, then worse (PC1) Big Improvement for young (PC2) (Age Color Key)
51
Yeast Cell Cycle Data “Gene Expression” – Micro-array data Data (after major preprocessing): Expression “level” of: thousands of genes (d ~ 1,000s) but only dozens of “cases” (n ~ 10s) Interesting statistical issue: High Dimension Low Sample Size data (HDLSS)
52
Yeast Cell Cycle Data Data from: Spellman, P. T., Sherlock, G., Zhang, M.Q., Iyer, V.R., Anders, K., Eisen, M.B., Brown, P.O., Botstein, D. and Futcher, B. (1998), “Comprehensive Identification of Cell Cycle-regulated Genes of the Yeast Saccharomyces cerevisiae by Microarray Hybridization”, Molecular Biology of the Cell, 9, 3273-3297.
53
Yeast Cell Cycle Data Analysis here is from: Zhao, X., Marron, J.S. and Wells, M.T. (2004) The Functional Data View of Longitudinal Data, Statistica Sinica, 14, 789-808
54
Yeast Cell Cycle Data Lab experiment: Chemically “synchronize cell cycles” of yeast cells Do cDNA micro-arrays over time Used 18 time points, over “about 2 cell cycles” Studied 4,489 genes (whole genome) Time series view of data: 4,489 time series of length 18 Functional Data View: 4,489 “curves”
55
Yeast Cell Cycle Data, FDA View Central question: Which genes are “periodic” over 2 cell cycles?
56
Yeast Cell Cycle Data, FDA View Periodic genes? Naïve approach: Simple PCA
57
Yeast Cell Cycle Data, FDA View Central question: which genes are “periodic” over 2 cell cycles? Naïve approach: Simple PCA No apparent (2 cycle) periodic structure? Eigenvalues suggest large amount of “variation” PCA finds “directions of maximal variation” Often, but not always, same as “interesting directions” Here need better approach to study periodicities
58
Yeast Cell Cycles, Freq. 2 Proj. PCA on Freq. 2 Periodic Component Of Data
59
Yeast Cell Cycles, Freq. 2 Proj. PCA on periodic component of data Hard to see periodicities in raw data But very clear in PC1 (~sin) and PC2 (~cos) PC1 and PC2 explain 65% of variation (see residuals) Recall linear combos of sin and cos capture “phase” since:
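One standard identity behind this remark (the symbols a, b, R and φ are notation assumed here, not taken from the slides):

```latex
a\sin\theta + b\cos\theta = R\,\sin(\theta + \varphi),
\qquad R = \sqrt{a^{2} + b^{2}},
\qquad a = R\cos\varphi,\quad b = R\sin\varphi .
```

So the pair of coefficients (a, b) on sin and cos encodes an amplitude R and a phase φ.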
60
Frequency 2 Analysis Important features of data appear only at frequency 2, Hence project data onto 2-dim space of sin and cos (freq. 2) Useful view: scatterplot
61
Frequency 2 Analysis
62
Project data onto 2-dim space of sin and cos (freq. 2) Useful view: scatterplot Angle (in polar coordinates) shows phase Colors: Spellman’s cell cycle phase classification Black was labeled “not periodic” Within class phases approx’ly same, but notable differences Later will try to improve “phase classification”
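The projection and the phase angle can be sketched as follows, assuming NumPy. The sketch assumes the 18 time points are equally spaced and cover exactly 2 cycles (the slides say "about 2 cell cycles"), and the function names are hypothetical:

```python
import numpy as np

n_times = 18
t = np.arange(n_times) / n_times    # assumed equally spaced, rescaled to [0, 1)
freq = 2                            # two full cycles over the record

# Unit-length sin and cos directions at frequency 2.
sin_b = np.sin(2 * np.pi * freq * t)
cos_b = np.cos(2 * np.pi * freq * t)
sin_b /= np.linalg.norm(sin_b)
cos_b /= np.linalg.norm(cos_b)

def freq2_coordinates(curves):
    """Project each row (one gene's time series) onto sin and cos, freq. 2."""
    curves = np.asarray(curves, dtype=float)
    return curves @ sin_b, curves @ cos_b

def phase_angle(curves):
    """Polar angle of each gene in the (sin, cos) scatterplot."""
    a, b = freq2_coordinates(curves)
    return np.arctan2(b, a)

# Made-up gene: a frequency-2 wave with phase 0.7 plus a constant level.
gene = 1.5 + np.sin(2 * np.pi * freq * t + 0.7)
print("recovered phase:", phase_angle(gene[None, :])[0])
```

The two outputs of `freq2_coordinates` are the axes of the scatterplot, and `phase_angle` gives the polar angle that the class colors are compared against.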