Data mining and statistical learning, lecture 4

Outline: Regression on a large number of correlated inputs
- A few comments about shrinkage methods, such as ridge regression
- Methods using derived input directions
  - Principal components regression
  - Partial least squares regression (PLS)

Data mining and statistical learning, lecture 4

Partitioning of the expected squared prediction error:

E[(Y - \hat{f}(X))^2] = \sigma^2 + \mathrm{Bias}^2[\hat{f}(X)] + \mathrm{Var}[\hat{f}(X)]

i.e., irreducible error + squared bias + variance.

- Shrinkage decreases the variance but increases the bias
- Shrinkage methods are more robust to structural changes in the analysed data

Data mining and statistical learning, lecture 4

Advantages of ridge regression over OLS:
- The models are easier to interpret, because strongly correlated inputs tend to receive similar regression coefficients
- Generalization to new data sets is facilitated by greater robustness to structural changes in the analysed data set

Data mining and statistical learning, lecture 4

Ridge regression - a note on standardization
- The principal components and the shrinkage in ridge regression are scale-dependent
- Inputs are normally standardized to mean zero and variance one prior to the regression
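
As an illustration, a minimal SAS sketch of ridge regression on standardized inputs; the data set work.train and the variables y and x1-x10 are hypothetical:

/* standardize the inputs to mean 0 and standard deviation 1 */
proc stdize data=work.train out=work.train_std method=std;
   var x1-x10;
run;

/* ridge regression over a grid of shrinkage values; the
   coefficient estimates are written to work.ridge_est */
proc reg data=work.train_std outest=work.ridge_est ridge=0 to 1 by 0.1;
   model y = x1-x10;
run;
quit;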

Data mining and statistical learning, lecture 4

Regression methods using derived input directions
Extract linear combinations of the inputs as derived features, and then model the target (response) as a linear function of these features.
[Diagram: inputs x1, x2, ..., xp are combined into derived features z1, z2, ..., zM, which are used to predict y]

Data mining and statistical learning, lecture 4

Absorbance records for ten samples of chopped meat
- 1 response variable (fat)
- 100 predictors (absorbance at 100 wavelengths or channels)
- The predictors are strongly correlated with each other

Data mining and statistical learning, lecture 4

Absorbance records for ten samples of chopped meat
[Figure: absorbance curves, with high-fat and low-fat samples indicated]

Data mining and statistical learning, lecture 4 3-D plots of absorbance records for samples of meat - channels 1, 50 and 100

Data mining and statistical learning, lecture 4 3-D plots of absorbance records for samples of meat - channels 40, 50 and 60

Data mining and statistical learning, lecture 4 3-D plot of absorbance records for samples of meat - channels 49, 50 and 51

Data mining and statistical learning, lecture 4 Matrix plot of absorbance records for samples of meat - channels 1, 50 and 100

Data mining and statistical learning, lecture 4

Principal Component Analysis (PCA)
- PCA is a technique for reducing the complexity of high-dimensional data
- It can be used to approximate high-dimensional data with a few dimensions, so that important features can be visually examined

Data mining and statistical learning, lecture 4

Principal Component Analysis - two inputs
[Figure: scatter plot of two inputs with the PC1 and PC2 directions marked]

Data mining and statistical learning, lecture 4

3-D plot of artificially generated data - three inputs
[Figure: 3-D scatter plot with the PC1 and PC2 directions marked]

Data mining and statistical learning, lecture 4

Principal Component Analysis
- The first principal component (PC1) is the direction that maximizes the variance of the projected data
- The second principal component (PC2) is the direction that maximizes the variance of the projected data after the variation along PC1 has been removed
- The third principal component (PC3) is the direction that maximizes the variance of the projected data after the variation along PC1 and PC2 has been removed
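
Formally (a standard formulation, not spelled out on the slide): with sample covariance matrix S, the k-th principal component direction is

v_k = \arg\max_{\|v\|=1,\; v \perp v_1, \dots, v_{k-1}} v^{T} S v,

and v_k is the eigenvector of S associated with the k-th largest eigenvalue; the variance of the data projected onto v_k equals that eigenvalue.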

Data mining and statistical learning, lecture 4

Eigenvector and eigenvalue
In this shear transformation of the Mona Lisa, the picture is deformed in such a way that its central vertical axis (red vector) is not modified, while the diagonal vector (blue) changes direction. Hence the red vector is an eigenvector of the transformation and the blue vector is not. Since the red vector is neither stretched nor compressed, its eigenvalue is 1.

Data mining and statistical learning, lecture 4

Sample covariance matrix

S = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})(x_i - \bar{x})^{T},

where \bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i is the vector of sample means.

Data mining and statistical learning, lecture 4

Eigenvectors of covariance and correlation matrices
- The eigenvectors of a covariance matrix provide information about the major orthogonal directions of the variation in the inputs
- The eigenvalues provide information about the strength of the variation along the different eigenvectors
- The eigenvectors and eigenvalues of the correlation matrix provide scale-independent information about the variation of the inputs
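
A minimal SAS/IML sketch of such an eigenanalysis, reusing the hypothetical standardized data set work.train_std from the ridge sketch above:

proc iml;
   use work.train_std;
   read all var {x1 x2 x3} into X;   /* three of the hypothetical inputs */
   close work.train_std;
   S = cov(X);                       /* sample covariance matrix */
   call eigen(evals, evecs, S);      /* eigenvalues (in descending order) and eigenvectors */
   print evals evecs;
quit;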

Data mining and statistical learning, lecture 4

Principal Component Analysis
[Software output: eigenanalysis of the covariance matrix - eigenvalues with the proportion and cumulative proportion of variance explained, and the loadings of the two variables on PC1 and PC2]

Data mining and statistical learning, lecture 4

Principal Component Analysis
[Figure: coordinates in the coordinate system determined by the principal components]

Data mining and statistical learning, lecture 4

Principal Component Analysis
[Software output: eigenanalysis of the covariance matrix for three variables x, y and z - eigenvalues with proportion and cumulative proportion of variance, and loadings on PC1, PC2 and PC3]

Data mining and statistical learning, lecture 4 Scree plot

Data mining and statistical learning, lecture 4

Principal Component Analysis - absorbance data from samples of chopped meat
[Software output: eigenanalysis of the covariance matrix - eigenvalues with proportion and cumulative proportion of variance explained]

Data mining and statistical learning, lecture 4

Scree plot - absorbance data
One direction is responsible for most of the variation in the inputs

Data mining and statistical learning, lecture 4

Loadings of PC1, PC2 and PC3 - absorbance data
The loadings define derived inputs (linear combinations of the inputs)

Data mining and statistical learning, lecture 4

Software recommendations
- Minitab 15: Stat -> Multivariate -> Principal components
- SAS Enterprise Miner: Princomp/Dmneural
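
In base SAS, the analyses above could also be run with PROC PRINCOMP; a minimal sketch, assuming the absorbance data are stored as channel1-channel100 in a data set work.tecator (variable names follow the proc pls example at the end of the lecture):

/* eigenanalysis of the covariance matrix (COV; the default is the
   correlation matrix); the scores Prin1-Prin3 are added to the output data */
proc princomp data=work.tecator cov n=3 out=work.pcscores;
   var channel1-channel100;
run;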

Data mining and statistical learning, lecture 4

Regression methods using derived input directions - Partial Least Squares Regression
Extract linear combinations of the inputs as derived features, and then model the target (response) as a linear function of these features.
[Diagram: inputs x1, ..., xp are combined into derived features z1, ..., zM, which are used to predict y]
- Select the intermediates so that the covariance with the response variable is maximized
- Normally, the inputs are standardized to mean zero and variance one prior to the PLS analysis

Data mining and statistical learning, lecture 4

Partial least squares regression (PLS)
Step 1: Standardize the inputs to mean zero and variance one.
Step 2: Compute the first derived input by setting

z_1 = \sum_{j=1}^{p} \hat{\varphi}_{1j} x_j,

where the \hat{\varphi}_{1j} are standardized univariate regression coefficients of the response vs each of the inputs.
Repeat:
- Remove the variation in the inputs along the directions determined by the existing z-vectors
- Compute another derived input
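
A minimal SAS/IML sketch of the first PLS step and the deflation, again using the hypothetical standardized data set work.train_std:

proc iml;
   use work.train_std;
   read all var {x1 x2 x3} into X;           /* standardized inputs */
   read all var {y} into y;
   close work.train_std;
   y   = y - mean(y);                        /* centre the response */
   phi = t(X) * y;                           /* univariate coefficients <x_j, y> */
   z1  = X * phi;                            /* first derived input */
   theta1 = (t(z1)*y) / (t(z1)*z1);          /* coefficient of y regressed on z1 */
   X1  = X - z1 * ((t(z1)*X) / (t(z1)*z1));  /* remove the variation along z1 */
quit;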

Data mining and statistical learning, lecture 4

Methods using derived input directions
- Principal components regression (PCR): the derived directions are determined by the X-matrix alone, and are orthogonal
- Partial least squares regression (PLS): the derived directions are determined by the covariance of the output and linear combinations of the inputs, and are orthogonal

Data mining and statistical learning, lecture 4

PLS in SAS
The following statements are available in PROC PLS. Items within the angle brackets are optional.

PROC PLS <options>;
   BY variables;
   CLASS variables;
   MODEL dependent-variables = effects </ options>;
   OUTPUT OUT=SAS-data-set <options>;

To analyze a data set, you must use the PROC PLS and MODEL statements. You can use the other statements as needed.

Data mining and statistical learning, lecture 4

proc PLS in SAS

proc pls data=mining.tecatorscores method=pls nfac=10;
   model fat=channel1-channel100;
   output out=tecatorpls predicted=predpls;
run;

proc pls data=mining.tecatorscores method=pcr nfac=10;
   model fat=channel1-channel100;
   output out=tecatorpcr predicted=predpcr;
run;