Published by Mervyn Greene; modified over 9 years ago
1
Data Projections & Visualization
Rajmonda Caceres, MIT Lincoln Laboratory
2
- Reduce complexity: visual, computational
- Identify the intrinsic dimensionality of data
- Identify the most relevant aspects of data given a task
3
[Figure: lower dimension vs. higher dimension]
4
Not all projections are equal
[Figure: panels a) and b)]
5
Desired properties:
- Reduced, compressed representation
- Preserves useful/intrinsic properties of the data
- Amplifies patterns of interest (e.g. outliers)
- Simple, interpretable
- Trade-off between simplicity and preservation of structure
6
- Helps us organize the data
- Helps us discriminate patterns
7
- Manhattan distance (L1 norm, taxicab distance)
- Euclidean distance (L2 norm)
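For reference, these are the standard definitions for points $x, y \in \mathbb{R}^n$; both are special cases of the general L-p distance. The worked example uses illustrative points not taken from the slides.

```latex
d_1(x, y) = \sum_{i=1}^{n} |x_i - y_i|, \qquad
d_2(x, y) = \Big( \sum_{i=1}^{n} (x_i - y_i)^2 \Big)^{1/2}, \qquad
d_p(x, y) = \Big( \sum_{i=1}^{n} |x_i - y_i|^p \Big)^{1/p}

% Example: x = (0, 0), y = (3, 4)
d_1(x, y) = 3 + 4 = 7, \qquad d_2(x, y) = \sqrt{3^2 + 4^2} = 5
```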
8
L-p distance: as p grows, the largest coordinate difference tends to dominate the overall distance.
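A small numeric sketch of this effect uses $d_p(x, y) = (\sum_i |x_i - y_i|^p)^{1/p}$ with a hypothetical pair of points whose coordinate differences are 1, 2, and 5. The helper name `lp_distance` is illustrative. As p grows, the largest difference contributes almost all of $d_p^p$, and $d_p$ approaches $\max_i |x_i - y_i| = 5$, which is the Chebyshev (L-infinity) distance.

```python
import numpy as np

def lp_distance(x: np.ndarray, y: np.ndarray, p: float) -> float:
    """L-p distance: (sum_i |x_i - y_i|^p)^(1/p)."""
    return float(np.sum(np.abs(x - y) ** p) ** (1.0 / p))

# Hypothetical points: coordinate differences are 1, 2 and 5.
x = np.array([0.0, 0.0, 0.0])
y = np.array([1.0, 2.0, 5.0])
diffs = np.abs(x - y)

for p in (1, 2, 4, 16, 64):
    d = lp_distance(x, y, p)
    # Fraction of sum_i |x_i - y_i|^p contributed by the largest coordinate difference.
    share = diffs.max() ** p / np.sum(diffs ** p)
    print(f"p={p:>2}  d_p={d:.4f}  largest-coordinate share={share:.3f}")
```

The printed share rises toward 1 and `d_p` falls toward 5 as p increases. In other words, large p makes the distance sensitive mainly to the single worst-matching coordinate.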
10
- Projective methods (preserve a property of the data): Principal Component Analysis (PCA); many others, e.g. ICA, Factor Analysis
- Manifold learning: Multidimensional Scaling (MDS), LLE, Isomap
11
Goal: find a linear projection that captures most of the variance.
[Figure: data with its 1st and 2nd principal components]
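One way to write this goal as an optimization problem assumes centered data $X$ with covariance matrix $C$ (notation introduced here). The first principal component is the unit direction of maximum projected variance. Setting the gradient of the Lagrangian to zero turns it into an eigenvector problem:

```latex
w_1 = \arg\max_{\lVert w \rVert = 1} \; w^\top C\, w

\nabla_w \big( w^\top C\, w - \lambda (w^\top w - 1) \big) = 2 C w - 2 \lambda w = 0
\;\Longrightarrow\; C\, w_1 = \lambda_1 w_1,
\qquad \operatorname{Var}(X w_1) = w_1^\top C\, w_1 = \lambda_1
```

The maximum is attained at the eigenvector with the largest eigenvalue $\lambda_1$. The 2nd component solves the same problem restricted to directions orthogonal to $w_1$, which gives the eigenvector with the next-largest eigenvalue, and so on.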
12
PCA pseudocode:
1. Center the data by subtracting the mean.
2. Calculate the covariance matrix of the centered data.
3. Calculate the eigenvectors (principal components) of the covariance matrix.
4. Select the top few (2-3) eigenvectors (those with the highest eigenvalues).
5. Project the data using these eigenvectors as axes.
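A minimal NumPy sketch of these steps follows. It assumes rows are samples and columns are features, and it uses the 1/(n-1) covariance normalization. The function name `pca` and the synthetic data are illustrative and not from the slides.

```python
import numpy as np

def pca(X: np.ndarray, k: int = 2):
    """Project the rows of X (n samples x d features) onto the top-k principal components."""
    # 1. Center the data by subtracting the per-feature mean.
    Xc = X - X.mean(axis=0)
    # 2. Covariance matrix of the centered data (d x d), using 1/(n-1).
    C = Xc.T @ Xc / (X.shape[0] - 1)
    # 3. Eigen-decomposition; eigh is meant for symmetric matrices and returns
    #    eigenvalues in ascending order, so reorder them to descending.
    eigvals, eigvecs = np.linalg.eigh(C)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # 4. Keep the k eigenvectors with the largest eigenvalues as the new axes.
    W = eigvecs[:, :k]
    # 5. Project the centered data onto those axes.
    return Xc @ W, eigvals

# Synthetic 3-D data with very different spreads along each axis.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3)) * np.array([5.0, 1.0, 0.1])
Z, eigvals = pca(X, k=2)
print(Z.shape)  # (200, 2)
```

The sample variance of each projected column equals the corresponding eigenvalue. This is why "highest eigenvalues" and "most variance" select the same axes.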
13
[Figures: scree plot; biplot]
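A scree plot shows how much variance each principal component explains, in decreasing order. A common way to choose how many components to keep is to look for the "elbow" where the curve flattens. The sketch below computes those proportions from the singular values of the centered data and draws a text-only bar chart instead of using a plotting library. The function name and the synthetic data are illustrative.

```python
import numpy as np

def explained_variance_ratio(X: np.ndarray) -> np.ndarray:
    """Fraction of total variance explained by each principal component (descending)."""
    Xc = X - X.mean(axis=0)
    # Singular values s_i of the centered data give covariance eigenvalues s_i^2 / (n-1).
    s = np.linalg.svd(Xc, compute_uv=False)
    variances = s ** 2 / (X.shape[0] - 1)
    return variances / variances.sum()

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3)) * np.array([5.0, 1.0, 0.1])
ratios = explained_variance_ratio(X)

# Text "scree plot": one bar per component, proportional to explained variance.
for i, r in enumerate(ratios, start=1):
    print(f"PC{i}: {r:6.1%} " + "#" * int(round(40 * r)))
```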
14
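Goal: find a lower-dimensional embedding of the data that preserves pairwise distances.
Formally: $\min_{y_1, \dots, y_n} \sum_{i<j} \left( \delta_{ij} - d_{ij} \right)^2$, where
- $\delta_{ij}$: input distance values
- $d_{ij} = \lVert y_i - y_j \rVert$: output distance values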
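A hedged sketch follows. It implements classical (Torgerson) MDS, which swaps in a closed-form eigen-decomposition. Strictly speaking, classical MDS optimizes a related criterion ("strain") rather than the raw stress $\sum_{i<j} (\text{input} - \text{output distance})^2$. Stress can be minimized iteratively, e.g. with SMACOF as in scikit-learn's `sklearn.manifold.MDS`. Here raw stress is only computed to check the result. The function names and the test data (a 2-D point cloud rotated into 5 dimensions) are illustrative.

```python
import numpy as np

def classical_mds(D: np.ndarray, k: int = 2) -> np.ndarray:
    """Classical (Torgerson) MDS: embed n points in k dims from an n x n distance matrix D."""
    n = D.shape[0]
    # Double-center the squared distances: B = -1/2 * J D^2 J, with J = I - 11^T / n.
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ (D ** 2) @ J
    # Coordinates come from the top-k eigenpairs of B (tiny negative eigenvalues clipped to 0).
    eigvals, eigvecs = np.linalg.eigh(B)
    order = np.argsort(eigvals)[::-1][:k]
    return eigvecs[:, order] * np.sqrt(np.clip(eigvals[order], 0.0, None))

def pairwise_distances(P: np.ndarray) -> np.ndarray:
    """Euclidean distance matrix between the rows of P."""
    return np.linalg.norm(P[:, None, :] - P[None, :, :], axis=-1)

def raw_stress(D: np.ndarray, Y: np.ndarray) -> float:
    """Sum over pairs i < j of (input distance - output distance)^2."""
    iu = np.triu_indices(D.shape[0], k=1)
    return float(np.sum((D[iu] - pairwise_distances(Y)[iu]) ** 2))

# Data that is intrinsically 2-D but lives in 5-D (random orthonormal embedding).
rng = np.random.default_rng(0)
P2 = rng.normal(size=(30, 2))
Q, _ = np.linalg.qr(rng.normal(size=(5, 2)))   # 5 x 2 with orthonormal columns
X = P2 @ Q.T
D = pairwise_distances(X)

Y = classical_mds(D, k=2)
print(f"stress in 2-D: {raw_stress(D, Y):.2e}")
print(f"stress in 1-D: {raw_stress(D, classical_mds(D, k=1)):.2f}")
```

Because the data is intrinsically 2-D, the 2-D embedding reproduces the distances up to rounding error. Forcing it into 1-D leaves a clearly nonzero stress.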
16
[Figure: Shepard diagram, MDS distances vs. data distances]
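A Shepard diagram plots one point per pair of observations: the original (data) distance against the embedded (MDS) distance. A faithful embedding keeps these points close to a monotone curve, ideally the diagonal. The sketch below only computes the pairs and their correlation, without plotting. As a hypothetical stand-in for an MDS output, it uses a 2-D embedding made by dropping a low-variance coordinate. Such a projection can only shrink distances, so all points fall on or below the diagonal.

```python
import numpy as np

def shepard_pairs(X: np.ndarray, Y: np.ndarray):
    """Return (data distances, embedded distances) for every pair i < j."""
    iu = np.triu_indices(X.shape[0], k=1)
    d_in = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)[iu]
    d_out = np.linalg.norm(Y[:, None, :] - Y[None, :, :], axis=-1)[iu]
    return d_in, d_out

rng = np.random.default_rng(1)
X = rng.normal(size=(50, 3)) * np.array([3.0, 2.0, 0.2])
Y = X[:, :2]  # hypothetical stand-in embedding: drop the low-variance 3rd coordinate

d_in, d_out = shepard_pairs(X, Y)
r = np.corrcoef(d_in, d_out)[0, 1]
print(f"{d_in.size} pairs, Pearson r = {r:.3f}")
```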
17
- More features are not necessarily better
- Understand the assumptions behind different modeling choices
- When choosing distance functions and projection methods:
  - Consider the characteristics of the data
  - Consider the learning objective
- Explore multiple choices simultaneously to gain better insight
19
The effect of concentration of distances
[Figure: lower dimension vs. higher dimension]
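A small simulation can illustrate the concentration effect. It assumes uniformly random points in the unit hypercube and measures the relative contrast $(d_{\max} - d_{\min}) / d_{\min}$ of Euclidean distances from a random query point. As the dimension grows, the nearest and farthest points become nearly equidistant from the query, so "nearest neighbor" carries less information. The data distribution, the sample size, and the function name are illustrative choices.

```python
import numpy as np

def relative_contrast(dim: int, n_points: int = 500, seed: int = 0) -> float:
    """(max - min) / min of Euclidean distances from a random query to random points."""
    rng = np.random.default_rng(seed)
    X = rng.uniform(size=(n_points, dim))   # random data in the unit hypercube
    q = rng.uniform(size=dim)               # random query point
    d = np.linalg.norm(X - q, axis=1)       # distances from the query to every point
    return float((d.max() - d.min()) / d.min())

contrast = {dim: relative_contrast(dim) for dim in (2, 10, 100, 1000)}
for dim, c in contrast.items():
    print(f"dim={dim:>4}  relative contrast={c:.3f}")
```

In 2-D the closest point is typically far closer than the farthest one. By 1000 dimensions, all distances fall within a narrow band around their mean.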