Descriptive Statistics vs. Factor Analysis

Presentation transcript:

Descriptive Statistics vs. Factor Analysis Descriptive statistics report the prevalence of a phenomenon in a given population, as captured by specific indicators. This provides valuable insight, but it is not sufficient to capture underlying phenomena or to describe the problem comprehensively.

Factor Analysis Many statistical methods study the relation between independent and dependent variables. Factor analysis instead belongs to multivariate analysis, the domain of statistics that studies the relationships among many variables at once, without designating any one of them as the dependent variable.

Factor Analysis The purpose of factor analysis is to discover simple patterns in the network of relationships among variables. In particular, it seeks to discover if the observed variables can be explained largely or entirely in terms of a much smaller number of variables called factors.

Factor Analysis Multivariate statistical analysis provides several techniques to analyze continuous and categorical variables, to capture the essence of their relationships, and to build new indicators (i.e. factors or principal components) that convey these relationships.

Principal Component Analysis The objectives of a PCA are: (1) to discover or reduce the dimensionality of the data set; (2) to identify new meaningful underlying variables.

Principal Component Analysis Suppose we have two correlated variables on a scatter plot. A regression line can be fitted to represent the linear relationship between the two variables.

Principal Component Analysis If we could define a variable that approximates the regression line on the plot, then the new variable would capture most of the essence of the two variables. Each observation's score on the new factor can then be used to represent the two original values. The new factor is thus a linear combination of the two variables: we have reduced the two variables to one factor. The principal components are calculated from the correlation matrix.
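
The reduction of two variables to one factor can be sketched in a few lines. This is a minimal illustration with NumPy, not the presenter's own procedure: the function name `first_component_scores` and the simulated data are assumptions made for the example. It standardizes the two variables, takes the eigenvector of the correlation matrix with the largest eigenvalue, and uses it as the weights of the linear combination.

```python
import numpy as np

def first_component_scores(x, y):
    """Reduce two variables to one factor (hypothetical helper for illustration)."""
    data = np.column_stack([x, y]).astype(float)
    # Standardize each variable: mean 0, standard deviation 1
    z = (data - data.mean(axis=0)) / data.std(axis=0, ddof=1)
    # Principal components come from the correlation matrix
    corr = np.corrcoef(z, rowvar=False)
    eigenvalues, eigenvectors = np.linalg.eigh(corr)  # ascending eigenvalues
    weights = eigenvectors[:, -1]                     # direction of greatest variation
    share = eigenvalues[-1] / eigenvalues.sum()       # fraction of variance captured
    scores = z @ weights                              # one score per observation
    return scores, weights, share

# Simulated (made-up) correlated variables
rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = 0.8 * x + rng.normal(scale=0.5, size=200)

scores, weights, share = first_component_scores(x, y)
print("weights:", weights)
print("share of variance captured:", share)
```

With two standardized variables the weights always have equal magnitude, and the share of variance captured by the single factor is (1 + |r|) / 2, where r is the correlation between the two variables.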

Principal Component Analysis Graphically, the first principal component lies along the line of greatest variation and it is as close to all of the data as possible (red line).

Principal Component Analysis The second PCA axis must be completely uncorrelated with PCA axis 1, i.e. at right angles or "orthogonal" to it (green line). [Figure: orthogonal axes PC1 and PC2]

Principal Component Analysis In a typical PCA, however, there are more than two variables, i.e. more than two dimensions. The second principal component is both perpendicular to the first and along the line of the next greatest variation. The third principal component lies along the line of the following greatest variation and is perpendicular to the first two principal components. The same applies to all N dimensions under analysis. With several variables the computation is more complicated, but the basic principle of expressing two or more variables by a single factor remains the same.

Principal Component Analysis By multiplying the original data set by the principal components, the data are rotated so that the components form the new perpendicular axes. Objects lying exactly on one of the axes then have only one nonzero coordinate, i.e. they are captured by one variable only. The most common use of PCA is in fact to reduce the dimensionality of the data while retaining as much information as possible.
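
The rotation can be illustrated as a matrix product. The sketch below is an assumption-laden example, not a reference implementation: the helper `pca_rotate`, the mixing matrix, and the simulated data are all invented for illustration. It standardizes the data, multiplies it by the eigenvectors of the correlation matrix, and keeps the first two rotated coordinates as the reduced data set.

```python
import numpy as np

def pca_rotate(data):
    """Rotate standardized data onto its principal axes (hypothetical helper)."""
    data = np.asarray(data, dtype=float)
    z = (data - data.mean(axis=0)) / data.std(axis=0, ddof=1)
    corr = np.cov(z, rowvar=False)            # covariance of z-scores = correlation matrix
    eigenvalues, eigenvectors = np.linalg.eigh(corr)
    order = np.argsort(eigenvalues)[::-1]     # largest variance first
    components = eigenvectors[:, order]       # columns are the new axes
    rotated = z @ components                  # the rotation itself
    return rotated, eigenvalues[order], components

# Simulated (made-up) data: three variables built from mixed random sources
rng = np.random.default_rng(1)
latent = rng.normal(size=(300, 3))
mixing = np.array([[1.0, 0.9, 0.2],
                   [0.0, 0.4, 0.1],
                   [0.0, 0.0, 0.3]])
data = latent @ mixing

rotated, variances, components = pca_rotate(data)
reduced = rotated[:, :2]   # keep only the first two axes
print("variance along each new axis:", variances)
```

After the rotation the new coordinates are uncorrelated with one another, and the variance along each new axis equals the corresponding eigenvalue.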

Principal Component Analysis PCA helps to envision simultaneously all the variables selected to describe a statistical unit (e.g. a district, a household, etc.). The PCA process identifies the directions of maximum data variability and projects the data onto new orthogonal axes. PCA takes the cloud of data points and rotates it so that the maximum variability is visible.

Principal Component Analysis New factors, i.e. principal components, are created by rotating the data plotted on orthogonal axes. In doing so, PCA helps to determine whether there are hidden factors or components along which the data vary. It computes a compact and optimal description of the data set.

Principal Component Analysis Before rotating the data cloud, PCA standardizes the data by subtracting the mean and dividing by the standard deviation. The centroid of the whole data set is thus at the origin. By standardizing, we give all variables the same variation, i.e. a standard deviation of 1. Standardizing is necessary when the variables are measured in different units.
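
The standardization step is short enough to write out directly. This is a minimal sketch in NumPy; the function name `standardize` and the small height and income table are made up for the example.

```python
import numpy as np

def standardize(data):
    """Subtract each column's mean and divide by its standard deviation."""
    data = np.asarray(data, dtype=float)
    return (data - data.mean(axis=0)) / data.std(axis=0, ddof=1)

# Made-up example: two variables in very different units
# (column 0: height in cm, column 1: income in dollars)
data = np.array([[170.0, 30000.0],
                 [160.0, 45000.0],
                 [180.0, 52000.0],
                 [175.0, 38000.0]])

z = standardize(data)
print(z.mean(axis=0))         # approximately 0 for every column
print(z.std(axis=0, ddof=1))  # 1 for every column
```

Without this step the income column would dominate the analysis simply because its numbers are larger. After it, the covariance matrix of `z` is the correlation matrix of the original data.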

Principal Component Analysis Basically, a PCA transforms a set of more or less correlated variables into a set of uncorrelated variables ordered by decreasing variability. The uncorrelated variables are linear combinations of the original variables, and the last of these variables can be removed with minimal loss of information.

Principal Component Analysis If we have more than three initial variables, how do we determine how many axes are worth interpreting? This is left to the analyst; however, the eigenvalues calculated by the PCA give us a strong hint.

Principal Component Analysis Every axis has an eigenvalue associated with it, which is the variance extracted by that factor. The eigenvalues are ranked from highest to lowest, and their size reflects the amount of variation explained by each axis. For standardized variables (a PCA computed from the correlation matrix), the sum of the eigenvalues equals the number of variables. Usually, the eigenvalues are expressed as a percentage of the total.
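
Both properties, the sum of the eigenvalues and their expression as percentages, can be checked numerically. The following sketch assumes NumPy; the helper `explained_variance_percent` and the simulated four-variable data set are invented for illustration.

```python
import numpy as np

def explained_variance_percent(data):
    """Eigenvalues of the correlation matrix, ranked, and as percentages of the total."""
    corr = np.corrcoef(data, rowvar=False)
    eigenvalues = np.linalg.eigvalsh(corr)[::-1]   # highest to lowest
    percent = 100.0 * eigenvalues / eigenvalues.sum()
    return eigenvalues, percent

# Simulated (made-up) data with four partly correlated variables
rng = np.random.default_rng(2)
base = rng.normal(size=(500, 1))
data = np.hstack([
    base + 0.3 * rng.normal(size=(500, 1)),
    base + 0.3 * rng.normal(size=(500, 1)),
    rng.normal(size=(500, 1)),
    rng.normal(size=(500, 1)),
])

eigenvalues, percent = explained_variance_percent(data)
print("eigenvalues:", eigenvalues)
print("sum of eigenvalues:", eigenvalues.sum())   # equals the number of variables, 4
print("percent of total:", percent)
```

The sum equals the number of variables because each standardized variable contributes a variance of exactly 1 to the total.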

Principal Component Analysis Example: PCA Axis 1: eigenvalue 63%; PCA Axis 2: eigenvalue 33%; PCA Axis 3: eigenvalue 4%. In this example the first PCA axis explains 63% (about 2/3) of the variation of the entire data set, and the second axis explains almost all of the remaining variation. Axis 3 explains a trivial amount and can be dropped.
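
The arithmetic of this example can be reproduced with a cumulative sum. The sketch below uses the three percentages from the slide; the 95% cutoff and the function name `axes_to_keep` are assumptions made for illustration, since the choice of how many axes to keep is left to the analyst.

```python
import numpy as np

def axes_to_keep(percent_explained, threshold=95.0):
    """Smallest number of axes whose cumulative share reaches the threshold.

    The threshold is an illustrative assumption, not a rule from the slides.
    """
    cumulative = np.cumsum(percent_explained)
    # Index of the first cumulative value >= threshold, converted to a count
    count = int(np.searchsorted(cumulative, threshold)) + 1
    return count, cumulative

percent_explained = [63.0, 33.0, 4.0]   # values from the example above
count, cumulative = axes_to_keep(percent_explained)
print(cumulative)   # [ 63.  96. 100.]
print(count)        # 2
```

The first two axes together account for 96% of the variation, so under this assumed cutoff the third axis is dropped, which matches the conclusion of the example.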