Information Management course


Information Management course, Università degli Studi di Milano, Master Degree in Computer Science. Teacher: Alberto Ceselli. Lecture 08: 23/10/2014

Reference: Data Mining: Methods and Models, Chapter 1, Daniel T. Larose, ©2006 John Wiley and Sons.

Data (Dimension) Reduction
In large datasets it is unlikely that all attributes are independent: multicollinearity.
Multicollinearity worsens mining quality:
- instability in multiple regression (the regression can be significant overall while no individual attribute is);
- particular attributes are overemphasized (counted multiple times);
- the principle of parsimony is violated (too many unnecessary predictors in a relation with a response variable).
Curse of dimensionality: the sample size needed to fit a multivariate function grows exponentially with the number of attributes. E.g., in a 1-dimensional distribution 68% of normally distributed values lie between -1 and 1; in a 10-dimensional distribution only about 0.02% lie within the radius-1 hypersphere (a quick check follows).
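A quick way to verify the hypersphere example, assuming independent standard normal coordinates in each dimension: the squared distance from the origin follows a chi-squared distribution, so the fraction of points inside the radius-1 hypersphere is a chi-squared CDF value.

```r
# Fraction of a standard normal sample lying within radius 1 of the origin:
# ||Z||^2 ~ chi-squared with d degrees of freedom, so the fraction is P(chi2_d <= 1).
pchisq(1, df = 1)   # ~0.683: 68% of 1-dimensional values lie in [-1, 1]
pchisq(1, df = 10)  # ~0.00017: only ~0.02% inside the radius-1 hypersphere in 10 dimensions
```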

Principal Component Analysis (PCA)
PCA tries to explain the correlation structure using a small set of linear combinations of the attributes.
Geometrically: look at the attributes as variables forming a coordinate system; the principal components are a new coordinate system, found by rotating the original system along the directions of maximum variability.

PCA – Step 1: preprocess data
Notation (review): dataset with n rows and m columns; attributes (columns) are denoted X_j.
Mean of each attribute: $\mu_j = \frac{1}{n}\sum_{i=1}^{n} X_{ij}$
Variance of each attribute: $\sigma_{jj}^2 = \frac{1}{n}\sum_{i=1}^{n} (X_{ij} - \mu_j)^2$
Covariance between two attributes: $\sigma_{kj}^2 = \frac{1}{n}\sum_{i=1}^{n} (X_{ik} - \mu_k)(X_{ij} - \mu_j)$
Correlation coefficient: $r_{kj} = \dfrac{\sigma_{kj}^2}{\sigma_{kk}\,\sigma_{jj}}$
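A minimal sketch of these quantities in R, assuming X is an n×m numeric data matrix; note that the slide's formulas divide by n, while R's built-in var() and cov() divide by n-1.

```r
n  <- nrow(X)
mu <- colMeans(X)            # mu_j, one mean per attribute
Xc <- sweep(X, 2, mu)        # centered data: X_ij - mu_j
S  <- crossprod(Xc) / n      # covariance matrix with the 1/n convention of the slides
R  <- cov2cor(S)             # correlation coefficients r_kj
```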

PCA – Step 1: preprocess data
Definitions:
Standard deviation matrix: $V^{1/2} = \begin{pmatrix} \sigma_{11} & 0 & \cdots & 0 \\ 0 & \sigma_{22} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \sigma_{mm} \end{pmatrix}$
(Symmetric) covariance matrix: $\mathrm{Cov} = \begin{pmatrix} \sigma_{11}^2 & \sigma_{12}^2 & \cdots & \sigma_{1m}^2 \\ \sigma_{21}^2 & \sigma_{22}^2 & \cdots & \sigma_{2m}^2 \\ \vdots & \vdots & \ddots & \vdots \\ \sigma_{m1}^2 & \sigma_{m2}^2 & \cdots & \sigma_{mm}^2 \end{pmatrix}$
Correlation matrix: $\rho = (r_{kj})$
Standardization in matrix form: $Z = (X - \mu)\,(V^{1/2})^{-1}$, i.e. $Z_{ij} = \dfrac{X_{ij} - \mu_j}{\sigma_{jj}}$
N.B. E(Z) is the zero vector and Cov(Z) = ρ.
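The standardization itself can be done with scale(), which centers each column and divides by its standard deviation (computed with the 1/(n-1) convention); the covariance matrix of the standardized data is then exactly the correlation matrix, as the slide notes. Again X is assumed to be a numeric matrix.

```r
Z   <- scale(X)            # Z_ij = (X_ij - mu_j) / sigma_jj
rho <- cor(X)              # correlation matrix
all.equal(cov(Z), rho)     # Cov(Z) = rho (up to floating point)
round(colMeans(Z), 10)     # E(Z) is the zero vector
```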

PCA – Step 2: compute eigenvalues and eigenvectors
Eigenvalues of the (m×m) matrix ρ are scalars λ_1, ..., λ_m such that det(ρ - λI) = 0.
Given the matrix ρ and one of its eigenvalues λ_j, e_j is a corresponding (m×1) eigenvector if ρ e_j = λ_j e_j.
Spectral theorem / symmetric eigenvalue decomposition (for symmetric ρ): $\rho = \sum_{j=1}^{m} \lambda_j\, e_j e_j^T$
We are interested in the eigenvalues / eigenvectors of the correlation matrix.
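In R, eigen() applied to the (symmetric) correlation matrix returns the eigenvalues in decreasing order and the matching unit-length eigenvectors as columns. This continues from the rho computed above.

```r
eig    <- eigen(rho)
lambda <- eig$values    # lambda_1 >= ... >= lambda_m
E      <- eig$vectors   # column j is the eigenvector e_j
# Spectral theorem check: rho = sum_j lambda_j * e_j %*% t(e_j)
all.equal(rho, E %*% diag(lambda) %*% t(E), check.attributes = FALSE)
```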

PCA – Step 3: compute principal components
Consider the (n×1 column) vectors Y_j = Z e_j, e.g. Y_1 = e_{11} Z_1 + e_{12} Z_2 + … + e_{1m} Z_m.
Sort the Y_j by variance: Var(Y_j) = e_j^T ρ e_j. Then:
1. Start with an empty sequence of principal components.
2. Select the vector e_j that maximizes Var(Y_j) and is uncorrelated with the components already selected.
3. Go to (2).
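Continuing from Z and E above, the component scores can be computed in a single matrix product; because eigen() already sorts the eigenvalues, the columns of Y come out ordered by decreasing variance. prcomp() bundles the same steps (possibly with flipped eigenvector signs).

```r
Y <- Z %*% E                        # column j is Y_j = Z e_j
round(apply(Y, 2, var), 4)          # Var(Y_j) = e_j' rho e_j = lambda_j
round(cor(Y), 4)                    # off-diagonal ~0: components are uncorrelated
pca <- prcomp(X, center = TRUE, scale. = TRUE)   # equivalent built-in routine
```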

PCA – Properties
Property 1: the total variability in the standardized data set equals the sum of the variances of the column vectors Z_j, which equals the sum of the variances of the components, which equals the sum of the eigenvalues, which equals the number of variables:
$\sum_{j=1}^{m} \mathrm{Var}(Y_j) = \sum_{j=1}^{m} \mathrm{Var}(Z_j) = \sum_{j=1}^{m} \lambda_j = m$
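A quick numerical check of Property 1 with the objects defined above:

```r
sum(apply(Z, 2, var))   # total variance of the standardized data = m
sum(lambda)             # sum of the eigenvalues, also = m
ncol(Z)                 # the number of variables m
```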

PCA – Properties
Property 2: the partial correlation between a given component and a given variable is a function of an eigenvector and an eigenvalue; in particular, $\mathrm{Corr}(Y_k, Z_j) = e_{kj}\sqrt{\lambda_k}$.
Property 3: the proportion of the total variability in Z that is explained by the j-th principal component is the ratio of the j-th eigenvalue to the number of variables, that is the ratio $\lambda_j / m$.
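Both properties can be checked directly; the matrix of component-attribute correlations (the "loadings") has entry (j, k) equal to Corr(Y_k, Z_j) = e_kj * sqrt(lambda_k). This continues from E, lambda, Z and Y above.

```r
loadings <- sweep(E, 2, sqrt(lambda), `*`)                # column k of E scaled by sqrt(lambda_k)
all.equal(cor(Z, Y), loadings, check.attributes = FALSE)  # Property 2
round(lambda / length(lambda), 3)                         # Property 3: lambda_j / m per component
cumsum(lambda) / sum(lambda)                              # cumulative proportion explained
```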

PCA – Experiment on real data
Open R and read "cadata.txt". Keep the first attribute (index 0) as the response and the remaining ones as predictors. Then (a sketch of these steps in R follows the list):
1. Know your data: barplot and scatterplot the attributes.
2. Normalize the data.
3. Scatterplot the normalized data.
4. Compute the correlation matrix.
5. Compute eigenvalues and eigenvectors.
6. Compute the component (eigenvector) – attribute correlation matrix.
7. Compute the cumulative variance explained by the principal components.
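A sketch of the whole experiment in R. The file layout of cadata.txt is an assumption (whitespace-separated, no header, response in the first column), and a boxplot is used in place of the slide's barplot; adjust as needed.

```r
cadata <- read.table("cadata.txt")        # assumed layout: response first, then 8 predictors
y <- cadata[, 1]                          # response: median house value
X <- as.matrix(cadata[, -1])              # predictor matrix

boxplot(X, main = "Raw attributes")       # know your data (the scales differ wildly)
Z <- scale(X)                             # normalize
pairs(Z)                                  # scatterplots of the normalized data

rho    <- cor(X)                          # correlation matrix
eig    <- eigen(rho)
lambda <- eig$values
E      <- eig$vectors

Y        <- Z %*% E                            # component scores
loadings <- sweep(E, 2, sqrt(lambda), `*`)     # component-attribute correlations
round(loadings, 2)
cumsum(lambda) / sum(lambda)                   # cumulative variance explained
```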

PCA – Experiment on real data
Details on the dataset: block groups of houses (1990 California census).
Response: median house value.
Predictors: median income, housing median age, total rooms, total bedrooms, population, households, latitude, longitude.

PCA – Step 4: choose components
How many components should we extract? (Possible R implementations of these criteria follow the list.)
- Eigenvalue criterion: keep components having λ > 1 (they "explain" more than one attribute).
- Proportion of variance explained: fix a target proportion r and choose the minimum number of components whose cumulative explained variance exceeds r.
- Scree plot criterion: barplot the eigenvalues and stop just prior to where they "tail off".
- Communality criterion (discussed below).
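These criteria can be applied to the lambda vector computed above; the 0.90 threshold for the proportion-of-variance criterion is only an example value.

```r
which(lambda > 1)                             # eigenvalue criterion
prop <- cumsum(lambda) / sum(lambda)          # cumulative proportion of variance
which(prop >= 0.90)[1]                        # min. components reaching r = 0.90 (example threshold)
barplot(lambda, names.arg = paste0("PC", seq_along(lambda)),
        main = "Scree plot of eigenvalues")   # stop just before the values tail off
```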

PCA – Profiling the components
Look at the principal components:
- Comp. 1 "explains" attributes 3, 4, 5 and 6 → block group size?
- Comp. 2 "explains" attributes 7 and 8 → geography?
- Comp. 3 "explains" attribute 1 → salary?
- Comp. 4: ???
Compare the factor scores of components 3 and 4 with attributes 1 and 2.
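One way to do this profiling, continuing from loadings, Y and Z above: print the rounded loading matrix and plot a few score/attribute pairs.

```r
round(loadings, 2)   # rows = attributes, columns = components; large entries drive the labels above
plot(Y[, 3], Z[, 1], xlab = "Component 3 scores", ylab = "Attribute 1 (standardized)")
plot(Y[, 4], Z[, 2], xlab = "Component 4 scores", ylab = "Attribute 2 (standardized)")
```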

PCA – Communality of attributes
Def: the communality of an (original) attribute j is the sum of squared principal component weights for that attribute. When only the first p principal components are considered:
$k(p,j) = \mathrm{Corr}(Y_1, Z_j)^2 + \mathrm{Corr}(Y_2, Z_j)^2 + \dots + \mathrm{Corr}(Y_p, Z_j)^2$
Interpretation: the communality is the fraction of the variability of an attribute "extracted" by the selected principal components.
Rule of thumb: communality < 0.5 is low!
Experiment: compute the communality of attribute 2 when 3 or 4 components are selected.
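A small helper for the communality, computed from the loading matrix above (rows index attributes, columns index components):

```r
communality <- function(loadings, p, j) sum(loadings[j, 1:p]^2)
communality(loadings, p = 3, j = 2)   # attribute 2, first 3 components
communality(loadings, p = 4, j = 2)   # attribute 2, first 4 components
```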

PCA – Final choice of components
- The eigenvalue criterion did not exclude component 4 (and it tends to underestimate the number of components when the number of attributes is small).
- The proportion-of-variance criterion suggests keeping component 4.
- The scree criterion suggests not exceeding 4 components.
- The minimum-communality criterion suggests keeping component 4, so that attribute 2 remains represented in the analysis.
→ Let's keep 4 components.

An alternative: user-defined composites
Sometimes the correlation structure is known to the data analyst or evident from the data. Then nothing forbids aggregating attributes by hand!
Example: housing median age, total rooms, total bedrooms and population can be expected to be strongly correlated as "block group size" → replace these four attributes with a new attribute that is their average (possibly after normalization):
$X_{m+1,i} = (X_{1i} + X_{2i} + X_{3i} + X_{4i}) / 4$
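A sketch of such a hand-made composite; the column indices 2:5 are an assumption based on the predictor order listed earlier (housing median age, total rooms, total bedrooms, population), so adjust them to the actual layout of X.

```r
size_block <- scale(X[, 2:5])               # normalize the four "block group size" attributes
block_group_size <- rowMeans(size_block)    # their average = the new composite attribute
X_reduced <- cbind(X[, -(2:5)], block_group_size = block_group_size)
```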