Chapter 2 Dimensionality Reduction: Linear Methods

2.1 Introduction Dimensionality reduction is the process of finding a suitable lower-dimensional space in which to represent the original data. Goals: explore high-dimensional data; visualize the data in 2-D or 3-D; analyze the data using statistical methods, such as clustering and smoothing.

Two possible approaches: select subsets of the original variables for processing, or create new variables that are functions of the original ones. The methods we describe in this book are of the second type.

Example 2.1 A projection takes the form of a matrix that maps the data from the original space to a lower-dimensional one. To project onto a line that is θ radians from the horizontal (x) axis, we apply a projection matrix P.

Example 2.1 (figure: the data projected onto the line at angle θ)
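A minimal sketch of this kind of projection in NumPy. The data points and the angle are made up for illustration; for a unit vector w = (cos θ, sin θ)^T along the line, the projection matrix is P = w w^T.

```python
import numpy as np

# Hypothetical 2-D data: each row is an observation.
X = np.array([[2.0, 1.0],
              [1.0, 3.0],
              [-1.0, 0.5]])

theta = np.pi / 6                             # line 30 degrees from the x axis
w = np.array([np.cos(theta), np.sin(theta)])  # unit vector along the line

P = np.outer(w, w)   # projection matrix; w has unit length, so no extra scaling
Xp = X @ P           # each projected point lies on the line through the origin

# Each projected row equals (x . w) * w, so subtracting that reconstruction
# from Xp leaves zero residual.
print(np.allclose(Xp - np.outer(Xp @ w, w), 0))  # True
```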

2.2 Principal Component Analysis (PCA) The aim of PCA is to reduce the dimensionality from p to d, where d < p, while accounting for as much of the variation in the original data set as possible. PCA produces a new set of coordinates or variables that are linear combinations of the original variables, and the observations in the new principal component space are uncorrelated.

2.2.1 PCA Using the Sample Covariance Matrix We start with the centered data matrix Xc of dimension n x p, obtained by subtracting the column (variable) means from each observation. The sample covariance matrix is then S = Xc^T Xc / (n - 1).

2.2.1 PCA Using the Sample Covariance Matrix The next step is to calculate the eigenvectors and eigenvalues of the matrix S, subject to the condition that the set of eigenvectors is orthonormal: each eigenvector has unit length, and the eigenvectors are mutually orthogonal.

2.2.1 PCA Using the Sample Covariance Matrix A major result in matrix algebra shows that any square, symmetric, nonsingular matrix can be transformed to a diagonal matrix via A^T S A = L, where the columns of A contain the eigenvectors of S and L is a diagonal matrix with the eigenvalues along the diagonal. By convention, the eigenvalues are ordered in descending order.

2.2.1 PCA Using the Sample Covariance Matrix We use the eigenvectors of S to obtain new variables called principal components (PCs). Equation 2.2 shows that the PCs are linear combinations of the original variables. Scaling the eigenvectors as w_j = a_j / sqrt(l_j) and using the w_j in the transformation yields PCs that are uncorrelated with unit variance.

2.2.1 PCA Using the Sample Covariance Matrix We transform the observations to the PC coordinate system via Z = Xc A. The matrix Z contains the principal component scores. To summarize: the transformed variables are the PCs, and the individual transformed data values are the PC scores.

2.2.1 PCA Using the Sample Covariance Matrix A theorem from linear algebra states that the sum of the variances of the original variables equals the sum of the eigenvalues. The idea of dimensionality reduction with PCA is to include in the analysis only those PCs with the highest eigenvalues. We reduce the dimensionality to d via Z_d = Xc A_d, where A_d contains the first d eigenvectors (columns) of A.
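The covariance-based PCA steps above can be sketched in NumPy; the data here are random and purely illustrative, not the book's example:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))          # toy data: n = 100 observations, p = 4
Xc = X - X.mean(axis=0)                # centered data matrix

S = (Xc.T @ Xc) / (Xc.shape[0] - 1)    # sample covariance matrix
l, A = np.linalg.eigh(S)               # eigh returns eigenvalues in ascending order
idx = np.argsort(l)[::-1]              # reorder to descending eigenvalues
l, A = l[idx], A[:, idx]

Z = Xc @ A                             # PC scores; columns are uncorrelated
d = 2
Zd = Xc @ A[:, :d]                     # keep only the first d PCs

# Total variance is preserved: sum of original variances == sum of eigenvalues.
print(np.isclose(np.trace(S), l.sum()))  # True
```

Note that `np.linalg.eigh` is the right routine here because S is symmetric; its ascending eigenvalue order is why the explicit descending sort is needed.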

2.2.2 PCA Using the Sample Correlation Matrix We can first scale the data to have standard units, subtracting each variable's mean and dividing by its standard deviation. The standardized data x* are then treated as the observations in the PCA process; their sample covariance matrix is the sample correlation matrix R. The correlation matrix should be used for PCA when the variances along the original dimensions are very different.

2.2.2 PCA Using the Sample Correlation Matrix Two things should be noted: methods for statistical inference based on the sample PCs from covariance matrices are easier and are available in the literature, and the PCs obtained from the correlation and covariance matrices do not provide equivalent information.
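A sketch of the correlation-matrix variant, using invented data whose variables deliberately have very different scales (the situation where R is preferred over S):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(50, 3)) * np.array([1.0, 10.0, 100.0])  # wildly different scales

# Standardize to z-scores; the sample covariance of the standardized data is R.
Xs = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
R = (Xs.T @ Xs) / (Xs.shape[0] - 1)    # sample correlation matrix

l, A = np.linalg.eigh(R)
l, A = l[::-1], A[:, ::-1]             # descending eigenvalue order

# R has ones on the diagonal, so its eigenvalues sum to p.
print(np.isclose(l.sum(), X.shape[1]))  # True
```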

2.2.3 How Many Dimensions Should We Keep? Possible ways to address this question: the cumulative percentage of variance explained, the scree plot, the broken stick, and the size of variance. Example 2.2 shows how to perform PCA using the yeast cell cycle data set.

2.2.3 How Many Dimensions Should We Keep? Cumulative Percentage of Variance Explained: the idea is to select the d PCs that together contribute a specified cumulative percentage of the total variation in the data.
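One way to code this rule; the helper name, the target percentage, and the eigenvalues are invented for illustration:

```python
import numpy as np

def n_pcs_for_variance(eigenvalues, percent=95.0):
    """Smallest d whose first d PCs explain at least `percent` of total variance."""
    l = np.sort(np.asarray(eigenvalues))[::-1]
    cum = 100.0 * np.cumsum(l) / l.sum()          # cumulative % explained
    return int(np.searchsorted(cum, percent) + 1)  # first index reaching the target

# Hypothetical eigenvalues from a PCA (total variance = 8.0).
l = [4.0, 2.0, 1.0, 0.5, 0.3, 0.2]
print(n_pcs_for_variance(l, 95.0))  # 5 (first five PCs explain 97.5%)
```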

2.2.3 How Many Dimensions Should We Keep? Scree Plot: a graphical way to decide the number of PCs. The original idea is a plot of l_k (the eigenvalue) versus k (the index of the eigenvalue). In some cases, when the first eigenvalues are very large, we might plot the log of the eigenvalues instead. One looks for the elbow in the curve, the place where the curve levels off and becomes almost flat; the value of k at this elbow is an estimate of how many PCs to retain.

2.2.3 How Many Dimensions Should We Keep? The Broken Stick: choose the number of PCs based on the size of each eigenvalue, i.e., the proportion of the variance explained by the individual PC. If we take a line segment and randomly divide it into p segments, the expected proportional length of the k-th longest segment is g_k = (1/p) * sum_{i=k}^{p} 1/i. If the proportion of the variance explained by the k-th PC is greater than g_k, that PC is kept.
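A sketch of the broken-stick rule; the eigenvalues are invented for illustration:

```python
import numpy as np

def broken_stick(p):
    """Expected proportional lengths g_k = (1/p) * sum_{i=k}^{p} 1/i, k = 1..p."""
    inv = 1.0 / np.arange(1, p + 1)
    # Reversed cumulative sum gives sum_{i=k}^{p} 1/i for each k.
    return inv[::-1].cumsum()[::-1] / p

l = np.array([4.0, 2.0, 1.0, 0.5, 0.3, 0.2])   # hypothetical eigenvalues
prop = l / l.sum()                              # proportion of variance per PC
g = broken_stick(len(l))
keep = prop > g                                 # broken-stick criterion per PC

# Count the leading PCs that pass the criterion.
d = int((~keep).argmax()) if not keep.all() else keep.size
print(d)  # 2
```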

2.2.3 How Many Dimensions Should We Keep? Size of Variance: we keep the PCs whose eigenvalues exceed a cutoff based on the average eigenvalue l-bar = (1/p) * sum_k l_k.
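A sketch of this rule with invented eigenvalues; the 0.7 factor is one commonly suggested cutoff (keep l_k >= 0.7 * l-bar), not the only possible choice:

```python
import numpy as np

l = np.array([4.0, 2.0, 1.0, 0.5, 0.3, 0.2])  # hypothetical eigenvalues
cutoff = 0.7 * l.mean()                        # 0.7 times the average eigenvalue
d = int((l >= cutoff).sum())                   # PCs retained under this rule
print(d)  # 3
```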

2.2.3 How Many Dimensions Should We Keep? Example 2.2: the yeast cell cycle data contain 384 genes corresponding to five phases, measured at 17 time points.

2.3 Singular Value Decomposition (SVD) SVD provides a way to find the PCs without explicitly calculating the covariance matrix, and the technique is valid for an arbitrary matrix. We use the noncentered form in the explanation that follows. The SVD of X is given by X = U D V^T, where U is an n x n orthogonal matrix, D is a diagonal matrix with n rows and p columns, and V is a p x p orthogonal matrix.


2.3 Singular Value Decomposition (SVD) The first r columns of U form an orthonormal basis for the column space of X, and the first r columns of V form an orthonormal basis for the row space of X, where r is the rank of X. As with PCA, we order the singular values in decreasing order and impose the same ordering on the columns of U and V. A rank-k approximation to the original matrix X is obtained by retaining only the first k singular values and the corresponding columns of U and V.
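A sketch of the SVD and the rank-k approximation in NumPy (random data for illustration); `np.linalg.svd` already returns the singular values in decreasing order:

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(6, 4))

U, s, Vt = np.linalg.svd(X, full_matrices=False)  # X = U @ diag(s) @ Vt

# Rank-k approximation: keep only the k largest singular values.
k = 2
Xk = U[:, :k] * s[:k] @ Vt[:k, :]

# Link to PCA: if X is first centered, the columns of V (rows of Vt) are the
# eigenvectors of the sample covariance matrix, with l_j = s_j**2 / (n - 1).

# Using all singular values reconstructs X exactly.
print(np.allclose(U * s @ Vt, X))  # True
```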

2.3 Singular Value Decomposition . SVD Example 2.3 applied to information retrieval (IR) start with a data matrix, where each row corresponds to a term, and each column corresponds to a document in the corpus this query is given by a column vector

Example 2.3 (table: the term-document matrix and the query vectors)

Example 2.3 Method to find the most relevant documents: compute the cosine of the angle between the query vector and each column of the term-document matrix, and use a cutoff value of 0.5. With this approach, the second query matches the first book but misses the fourth one.

Example 2.3 The idea is that some of the dimensions represented by the full term-document matrix are noise, and that documents will have closer semantic structure after dimensionality reduction using the SVD. We find the representation of the query vector in the reduced space given by the first k columns of U.

Example 2.3 Why does this work? Consider X = U D V^T, and note that the columns of U and V are orthonormal. Left-multiplying Equation 2.6 by D_k^{-1} U_k^T maps each document (column of X) to its coordinates in the reduced space, and applying the same map to a query q gives its reduced representation q_k = D_k^{-1} U_k^T q.

Example 2.3 Using a cutoff value of 0.5, we now correctly identify documents 1 and 4 as being relevant to our queries on baking bread and baking.
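The whole retrieval pipeline can be sketched as follows. The term-document matrix, the query, and the cutoff are invented for illustration and are not the book's example data; the key step is the mapping q_k = D_k^{-1} U_k^T q from the previous slide.

```python
import numpy as np

# Hypothetical 5-term x 4-document matrix (binary term occurrences) and query.
X = np.array([[1, 0, 1, 1],
              [1, 1, 0, 1],
              [0, 1, 0, 0],
              [0, 0, 1, 1],
              [1, 0, 0, 1]], dtype=float)
q = np.array([1.0, 0.0, 0.0, 1.0, 0.0])  # query on terms 1 and 4

U, s, Vt = np.linalg.svd(X, full_matrices=False)
k = 2

# Fold the query and the documents into the k-dimensional latent space.
qk = np.diag(1.0 / s[:k]) @ U[:, :k].T @ q
docs_k = np.diag(1.0 / s[:k]) @ U[:, :k].T @ X   # columns: reduced documents

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

sims = [cosine(qk, docs_k[:, j]) for j in range(X.shape[1])]
relevant = [j for j, c in enumerate(sims) if c > 0.5]
print(relevant)
```

A useful sanity check on the algebra: folding the documents themselves into the reduced space, D_k^{-1} U_k^T X, reproduces exactly the first k rows of V^T.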

Thanks