Descriptive Analysis and PCA. Hervé Abdi, The University of Texas at Dallas; Dominique Valentin, ENSBANA/CESG.


Back to the yogurt example

Texture
- Thickness: consistency of the mass in the mouth
- Rate of melt: amount of product melted after a certain pressure of the tongue
- Graininess: amount of particles in the mass
- Mouth coating: amount of film left on the mouth surfaces

Basic tastes (reference compounds)
- Sweet: sucrose
- Sour: lactic acid
- Bitter: caffeine
- Salty: sodium chloride

Aroma (reference products)
- Water: watered-down taste
- Flour: one spoon of flour mixed in water
- Wood: shavings from pencil sharpening
- Chalk: Smecta
- Milk: whole milk
- Raw pie crust: commercial raw pie crust
- Cream: crème fraîche
- Hazelnut: hazelnut powder
- Earthy: earth
- Mushroom: dried mushrooms soaked in water

Back to the yogurt example

9 panelists evaluated 5 yogurts: 2 cow-milk yogurts and 3 soy yogurts. Each attribute was scored on a line scale anchored from "Not at all" (Pas du tout) to "Very" (Très), e.g. Bitter (Amer), Salty (Salé), Astringent.

Back to the yogurt example

[Bar charts, texture attributes: mean intensity (intensité moyenne, 0-10 scale) of Farineux (flour), Épais (thickness), Gras (mouth coating), and Fondant (melt) for the five yogurts (soja Carrefour, Sojasun, Sojade, Velouté Danone, Leader Price); letters a-d mark significance groups.]

Back to the yogurt example

[Bar charts, taste attributes: mean intensity (0-10) of Sucré (sweet), Acide (sour), Amer (bitter), and Astringent for the five yogurts; letters a-d mark significance groups.]

Back to the yogurt example

[Bar charts, aroma attributes: mean intensity (0-10) of Farine (flour), Craie (chalk), Crème (cream), and Noisette (hazelnut) for the five yogurts; letters a-d mark significance groups.]

A solution: Principal Component Analysis

[PCA map: yogurts (soja bio, soja Champion, soja Leader Price, soja Carrefour, soja bifidus, Sojasun, Sojade, soja délice, Carrefour, Velouté Danone, Danone bifidus, Leader Price) plotted on the first two factors (Facteur 1, Facteur 2), each axis labeled with its percentage of explained variance.]

What is PCA?

A statistical technique that transforms a set of correlated variables into a smaller number of uncorrelated variables called principal components. The first principal component accounts for as much of the variability in the data as possible, and each succeeding component accounts for as much of the remaining variability as possible. The mathematical tool behind PCA is eigenanalysis (eigendecomposition).

When to use PCA?

To analyze two-dimensional data tables describing I observations by J quantitative variables: the rows are the observations (1, ..., i, ..., I), the columns are the variables (1, ..., j, ..., J), and each entry y_ij gives the value of observation i on variable j.

Why use PCA?

1. To evaluate the similarity between the observations (here, the products).
2. To detect structure in the relationships between variables (here, the descriptors).
3. To reduce the number of variables so the data can be represented graphically.

In short, to give a synthetic description of the products.

General principle of PCA

Starting from the I × J table of observations by variables (entries y_ij), a diagonalization (eigenanalysis) produces an I × K table of principal components (scores cp_ik, one per observation and component, PC 1, ..., PC k, ..., PC K). Two plots summarize the result: the circle of correlations, which displays the variables against PC 1 and PC 2, and the projection of the observations onto the same components.

A baby example: wine profile

Twelve wines (v1, ..., v12) are described by nine aroma attributes: Amber, Black currant, Coconut, Leather, Musk, Gooseberry, Woody, Vanilla, Raspberry. [Table: checkmarks indicate which attributes were perceived in each wine.]

A baby example: wine profile

How to find the principal components?

Step 1: Get some data.
Step 2: Subtract the means of the variables (center the data).
Step 3: Find the eigenvectors and eigenvalues of the covariance matrix.
Step 4: Find the principal components by projecting the observations onto the eigenvectors.
Step 5: Compute the loadings as the correlations between the original variables and the principal components.
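The five steps above can be sketched in a few lines of NumPy. This is an illustrative helper (not from the slides); the function name and data are hypothetical.

```python
import numpy as np

def pca_steps(X):
    """Plain PCA following the five steps: center, eigendecompose
    the covariance matrix, project, and compute loadings."""
    # Step 2: subtract the column means
    Xc = X - X.mean(axis=0)
    # Step 3: eigenvectors/eigenvalues of the covariance matrix
    cov = np.cov(Xc, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)    # returned in ascending order
    order = np.argsort(eigvals)[::-1]         # reorder: largest first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Step 4: principal components = projections of the observations
    scores = Xc @ eigvecs
    # Step 5: loadings = correlations between variables and components
    loadings = np.array([[np.corrcoef(Xc[:, j], scores[:, k])[0, 1]
                          for k in range(scores.shape[1])]
                         for j in range(Xc.shape[1])])
    return eigvals, scores, loadings
```

Because the components are eigenvectors of the covariance matrix, the resulting scores are uncorrelated and the eigenvalues give each component's variance.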

A 2D example: step 1, get the data

20 words, described by two variables:
Variable 1 = the number of letters in the word.
Variable 2 = the number of lines used to define the word in the dictionary.

A 2D example: step 1, get the data

A 2D example: step 2, subtract the mean

Y = length of the word; mean M_Y = 6; centered variable y = (Y − M_Y).
W = number of lines of the definition; mean M_W = 8; centered variable w = (W − M_W).
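The centering step can be illustrated with a small sketch. Only the means M_Y = 6 and M_W = 8 come from the slides; the data values below are made up, since the 20-word table is not reproduced here.

```python
import numpy as np

# Hypothetical word data chosen so the means match the slide
# (M_Y = 6 letters, M_W = 8 definition lines).
Y = np.array([4, 6, 8, 6])    # word lengths
W = np.array([10, 8, 6, 8])   # lines in the dictionary definition
y = Y - Y.mean()              # centered lengths, mean 0
w = W - W.mean()              # centered line counts, mean 0
```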

A 2D example: step 2, subtract the mean

A 2D example: step 3, find the eigenvectors

A 2D example: project the observations

A 2D example: compute the loadings

The loading is the Pearson correlation coefficient between a variable and a component: r(W, F1) = 0.97.

A 2D example: compute the loadings

r(W, F2) = 0.23 (Pearson correlation coefficient).

A 2D example: compute the loadings

r(Y, F1) = (Pearson correlation coefficient).

A 2D example: compute the loadings

r(Y, F2) = 0.50 (Pearson correlation coefficient).

A 2D example: draw the circle of correlations

r(W, F1) = 0.97, r(W, F2) = 0.23
r(Y, F1) = , r(Y, F2) = 0.50

Each variable is plotted at the point whose coordinates are its two loadings; because loadings are correlations, every variable falls inside a circle of radius 1.
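That unit-circle property can be checked numerically with W's loadings from the slides (a NumPy sketch, not part of the original material):

```python
import numpy as np

# W's loadings on (F1, F2), as given on the slides
w_loadings = np.array([0.97, 0.23])
# Distance of W from the origin in the circle of correlations
radius = np.hypot(w_loadings[0], w_loadings[1])
inside = radius <= 1.0   # loadings are correlations, so this always holds
```

Here the radius is close to 1, meaning W is almost perfectly represented by the first two components.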

How to compute the explained variance?

For each component, divide its eigenvalue by the sum of all eigenvalues and multiply by 100; the cumulated % of variance is obtained by adding these percentages up. Here the first component explains 88% of the variance.
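The computation is a one-liner. The eigenvalues below are illustrative, chosen so the first component explains 88% as on the slide (the slide's exact table is not reproduced).

```python
import numpy as np

# Illustrative eigenvalues of a two-component solution
eigenvalues = np.array([4.4, 0.6])
# % variance per component: eigenvalue / total, times 100  (≈ [88, 12])
pct = 100 * eigenvalues / eigenvalues.sum()
# Cumulated % of variance  (≈ [88, 100])
cumulated = np.cumsum(pct)
```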

How many components to keep?

The Kaiser criterion: retain only components with eigenvalues greater than 1.
The scree test: plot the eigenvalues in decreasing order and keep the components that come before the elbow of the curve.
Common sense: keep dimensions that are interpretable; examine several solutions and choose the one that makes the best "sense."
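The Kaiser criterion is easy to apply mechanically. The eigenvalues below are illustrative (from a hypothetical correlation-matrix PCA of five variables, where the eigenvalues average 1):

```python
import numpy as np

# Illustrative eigenvalues; they sum to 5 (one per variable)
eigenvalues = np.array([2.9, 1.2, 0.5, 0.3, 0.1])
# Kaiser criterion: keep components whose eigenvalue exceeds 1
n_keep = int((eigenvalues > 1).sum())   # 2 components retained
```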

Should I normalize the data?

Yes, if the variables are not measured on the same scale. Otherwise it depends:
Normalized (PCA on the correlation matrix): same weight for all variables.
Not normalized (PCA on the covariance matrix): each variable's weight grows with its standard deviation.
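The effect of normalization can be seen on synthetic data (illustration only, not from the slides): with covariance PCA a large-scale variable dominates the first axis, while with correlation PCA the two variables contribute equally.

```python
import numpy as np

# Two correlated variables on very different scales
rng = np.random.default_rng(1)
a = rng.normal(scale=100.0, size=200)            # large scale
b = a / 100 + rng.normal(scale=0.5, size=200)    # small scale
X = np.column_stack([a, b])

def top_eigvec(M):
    """Eigenvector of the largest eigenvalue of a symmetric matrix."""
    vals, vecs = np.linalg.eigh(M)
    return vecs[:, np.argmax(vals)]

# Not normalized: PCA on the covariance matrix
v_cov = top_eigvec(np.cov(X, rowvar=False))
# Normalized: PCA on the correlation matrix
v_cor = top_eigvec(np.corrcoef(X, rowvar=False))

dominance_cov = abs(v_cov[0])   # near 1: variable a dominates the axis
dominance_cor = abs(v_cor[0])   # near 1/sqrt(2): equal weights
```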