Descriptive Analysis and PCA. Hervé Abdi, The University of Texas at Dallas; Dominique Valentin, ENSBANA/CESG

1 Descriptive Analysis and PCA. Hervé Abdi, The University of Texas at Dallas (herve@utdallas.edu); Dominique Valentin, ENSBANA/CESG (valentin@u-bourgogne.fr)

2 Back to the yogurt example
Texture
  Thickness: consistency of the mass in the mouth
  Rate of melt: amount of product melted after a given pressure of the tongue
  Graininess: amount of particles in the mass
  Mouth coating: amount of film left on the mouth surfaces
Basic tastes (references)
  Sweet: sucrose
  Sour: lactic acid
  Bitter: caffeine
  Salty: sodium chloride
Aroma (references)
  Water: tastes watered down
  Flour: 1 spoonful of flour mixed in water
  Wood: pencil-sharpening shavings
  Chalk: Smecta
  Milk: whole milk
  Raw pie crust: commercial raw pie crust
  Cream: crème fraîche
  Hazelnut: hazelnut powder
  Earthy: earth
  Mushroom: dry mushrooms soaked in water

3 Back to the yogurt example
9 panelists; 5 yogurts: 2 cow's-milk yogurts and 3 soy yogurts.
[Figure: example rating scales anchored "Not at all" and "Very" for Bitter, Salty and Astringent]

4 Back to the yogurt example: texture
[Figure: bar charts of mean intensity (scale 0 to 10) for Soja Carrefour, Sojasun, Sojade, Velouté Danone and Leader Price on Flour, Thickness, Mouth coating and Melt, with letter groupings for the products]

5 Back to the yogurt example: taste
[Figure: bar charts of mean intensity (scale 0 to 10) for Soja Carrefour, Sojasun, Sojade, Velouté Danone and Leader Price on Astringent, Sweet, Sour and Bitter, with letter groupings for the products]

6 Back to the yogurt example: aroma
[Figure: bar charts of mean intensity (scale 0 to 10) for Soja Carrefour, Sojasun, Sojade, Velouté Danone and Leader Price on the aromas Flour, Chalk, Cream and Hazelnut, with letter groupings for the products]

7 A solution: Principal Component Analysis
[Figure: PCA map of the yogurts on Factor 1 (61.04% of the variance) and Factor 2 (17.84%); products shown: Soja bio, Soja Champion, Soja Leader Price, Soja Carrefour, Soja bifidus, Sojasun, Sojade, Soja délice Carrefour, Velouté Danone, Danone bifidus, Leader Price]

8 What is PCA?
A statistical technique used to transform a number of correlated variables into a smaller number of uncorrelated variables called principal components. The first principal component accounts for as much of the variability in the data as possible, and each succeeding component accounts for as much of the remaining variability as possible. The mathematical technique used in PCA is called eigenanalysis (eigendecomposition).
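
The two properties in this definition can be checked numerically. A minimal NumPy sketch on hypothetical simulated data (the three variables, the sample size and the random seed are arbitrary choices, not from the slides): the components built from the eigenvectors of the covariance matrix are uncorrelated, and their variances, the eigenvalues, decrease from the first component to the last.

```python
import numpy as np

# Hypothetical data: 200 observations of 3 variables driven by one shared
# latent factor, so the variables are correlated.
rng = np.random.default_rng(42)
latent = rng.normal(size=(200, 1))
data = latent @ np.array([[1.0, 0.8, -0.6]]) + 0.5 * rng.normal(size=(200, 3))

# Eigenanalysis of the covariance matrix of the centered data.
centered = data - data.mean(axis=0)
cov = np.cov(centered, rowvar=False)
eigenvalues, eigenvectors = np.linalg.eigh(cov)   # ascending order
order = np.argsort(eigenvalues)[::-1]             # largest variance first
eigenvalues, eigenvectors = eigenvalues[order], eigenvectors[:, order]

# Principal components: coordinates of the observations on the eigenvectors.
components = centered @ eigenvectors

# Their covariance matrix is diagonal (uncorrelated components), and the
# diagonal holds the eigenvalues in decreasing order.
print(np.round(np.cov(components, rowvar=False), 4))
print(np.round(eigenvalues, 4))
```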

9 When to use PCA?
To analyze two-dimensional data tables in which I observations (rows i = 1, …, I) are described by J quantitative variables (columns j = 1, …, J); $y_{ij}$ is the value of observation i on variable j.

10 Why use PCA?
1. To evaluate the similarity between the observations (here, the products).
2. To detect structure in the relationships between the variables (here, the descriptors).
3. To reduce the number of variables so that the data can be represented graphically.
In short: to give a synthetic description of the products.

11 General principle of PCA
The I × J data table (observations × variables, entries $y_{ij}$) is diagonalized (eigenanalysis) to give an I × K table of principal components (PC 1 … PC K, entries $cp_{ik}$). The results are displayed as a projection of the observations onto PC 1 and PC 2, and as a circle of correlations for the variables.

12 A baby example: wine profile
Ratings of v1 to v12 on nine descriptors (? = value lost in extraction):
     Amber  Blackcurrant  Coconut  Leather  Musk  Gooseberry  Woody  Vanilla  Raspberry
v1   7      3             1        6        9     3           1      0        2
v2   0      5             1        ?        0     7           0      1        6
v3   1      9             0        ?        ?     ?           ?      ?        5
v4   1      6             7        0        1     6           4      6        4
v5   6      1             8        5        4     2           5      ?        1
v6   1      6             5        1        0     5           ?      7        6
v7   7      3             1        6        8     2           1      0        2
v8   6      3             0        5        ?     3           1      ?        3
v9   0      4             ?        ?        1     0           7      6        5
v10  4      2             6        5        6     2           5      7        1
v11  5      1             4        6        7     1           6      7        2
v12  1      6             0        1        0     5           0      1        8
(For v3, the values 6 and 1 also survive somewhere in cells 4 to 8, but their positions cannot be recovered.)

13 A baby example: wine profile

15 How to find the principal components?
Step 1: get some data.
Step 2: subtract the mean of each variable.
Step 3: find the eigenvectors and eigenvalues of the covariance matrix.
Step 4: find the principal components by projecting the observations onto the eigenvectors.
Step 5: compute the loadings as the correlations between the original variables and the principal components.
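
Steps 2 to 5 can be sketched in NumPy as follows. This is an illustrative implementation, not the authors' code; the function name `pca` and the random test data are hypothetical, and the sketch assumes more observations than variables with no constant or perfectly collinear variable (otherwise an eigenvalue is zero and the corresponding loadings are undefined).

```python
import numpy as np


def pca(data):
    """PCA following steps 2 to 5: returns eigenvalues, eigenvectors, scores, loadings."""
    data = np.asarray(data, dtype=float)

    # Step 2: subtract the mean of each variable (column).
    centered = data - data.mean(axis=0)

    # Step 3: eigenvalues and eigenvectors of the covariance matrix,
    # sorted from the largest eigenvalue to the smallest.
    cov = np.cov(centered, rowvar=False)
    eigenvalues, eigenvectors = np.linalg.eigh(cov)
    order = np.argsort(eigenvalues)[::-1]
    eigenvalues, eigenvectors = eigenvalues[order], eigenvectors[:, order]

    # Step 4: principal components = projections of the observations
    # onto the eigenvectors (one column per component).
    scores = centered @ eigenvectors

    # Step 5: loadings = Pearson correlation between variable j and component k.
    n_vars = data.shape[1]
    loadings = np.array([[np.corrcoef(centered[:, j], scores[:, k])[0, 1]
                          for k in range(n_vars)]
                         for j in range(n_vars)])
    return eigenvalues, eigenvectors, scores, loadings


# Usage on hypothetical data: 10 observations described by 4 variables.
rng = np.random.default_rng(0)
data = rng.normal(size=(10, 4))
eigenvalues, eigenvectors, scores, loadings = pca(data)
print(np.round(100 * eigenvalues / eigenvalues.sum(), 1))  # % of variance per component
```

Because $\mathrm{cov}(x_j, F_k) = v_{jk}\lambda_k$ and $\mathrm{var}(F_k) = \lambda_k$, the loadings also equal $v_{jk}\sqrt{\lambda_k}/s_j$, where $s_j$ is the standard deviation of variable j; `np.corrcoef` is used here only to follow the slide's definition literally.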

16 A 2D example: step 1 get the data
20 words. Variable 1 = number of letters; Variable 2 = number of lines used to define the word in the dictionary.

17 A 2D example: step 1 get the data

18 A 2D example: step 2 subtract the mean
Y = length of the word (number of letters), $M_Y = 6$, $y = Y - M_Y$
W = number of lines of the definition, $M_W = 8$, $w = W - M_W$

19 A 2D example: step 2 subtract the mean
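
The transcript does not reproduce the table of 20 words. The NumPy sketch below therefore uses ASSUMED values for the 20 (letters, lines) pairs: they are not taken from the slides, but they reproduce the means given here ($M_Y = 6$, $M_W = 8$) and the eigenvalues 392 and 52 reported for this example. It performs step 2 and checks that the centered variables sum to zero.

```python
import numpy as np

# ASSUMED data (the word table is not in this transcript): for each of the
# 20 words, Y = number of letters and W = number of lines of its definition.
Y = np.array([3, 6, 2, 6, 2, 9, 6, 5, 9, 4, 7, 11, 5, 4, 3, 9, 10, 5, 4, 10], dtype=float)
W = np.array([14, 7, 11, 9, 9, 4, 8, 11, 5, 8, 2, 4, 12, 9, 8, 1, 4, 13, 15, 6], dtype=float)

# Step 2: subtract the mean of each variable.
M_Y, M_W = Y.mean(), W.mean()   # 6.0 and 8.0
y = Y - M_Y                     # y = Y - M_Y
w = W - M_W                     # w = W - M_W

# Centered variables sum to zero; their sums of squares are 150 and 294.
print(M_Y, M_W, y.sum(), w.sum())
print(y @ y, w @ w)
```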

20 A 2D example: step 3 find the eigenvectors
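
A sketch of step 3 on the same ASSUMED 20-word values (not given in the transcript). One detail the slides leave implicit: the eigenvalues 392 and 52 are those of the cross-product matrix of the centered data, $X^\top X$; dividing by $N - 1$ gives the covariance matrix, which has the same eigenvectors. `np.linalg.eigh` is NumPy's eigensolver for symmetric matrices; the sign convention below is an assumption chosen to match the signs of the loadings on slides 24 to 27.

```python
import numpy as np

# ASSUMED word data (not reproduced in this transcript): Y = letters, W = definition lines.
Y = np.array([3, 6, 2, 6, 2, 9, 6, 5, 9, 4, 7, 11, 5, 4, 3, 9, 10, 5, 4, 10], dtype=float)
W = np.array([14, 7, 11, 9, 9, 4, 8, 11, 5, 8, 2, 4, 12, 9, 8, 1, 4, 13, 15, 6], dtype=float)
X = np.column_stack([Y - Y.mean(), W - W.mean()])   # centered data, columns (y, w)

# Cross-product matrix X^T X; the covariance matrix X^T X / (N - 1) has the
# same eigenvectors, with every eigenvalue divided by N - 1.
cross_products = X.T @ X                     # [[150, -154], [-154, 294]]
eigenvalues, eigenvectors = np.linalg.eigh(cross_products)

# eigh sorts eigenvalues in ascending order: put the largest first.
order = np.argsort(eigenvalues)[::-1]
eigenvalues, eigenvectors = eigenvalues[order], eigenvectors[:, order]

# An eigenvector's sign is arbitrary; flip each one so that its W weight is positive.
eigenvectors = eigenvectors * np.sign(eigenvectors[1, :])

print(eigenvalues)     # approximately [392, 52]
print(eigenvectors)    # columns: approximately (-0.537, 0.844) and (0.844, 0.537)
```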

22 A 2D example: project the observations
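
Step 4 as a sketch, again on the ASSUMED 20-word values (not in the transcript), with each eigenvector's arbitrary sign fixed so that its W weight is positive: the factor scores are the centered data multiplied by the matrix of eigenvectors, $F = XV$. The sum of squared scores on each component equals its eigenvalue, which is why the eigenvalue measures the variance a component explains.

```python
import numpy as np

# ASSUMED word data (not reproduced in this transcript): Y = letters, W = definition lines.
Y = np.array([3, 6, 2, 6, 2, 9, 6, 5, 9, 4, 7, 11, 5, 4, 3, 9, 10, 5, 4, 10], dtype=float)
W = np.array([14, 7, 11, 9, 9, 4, 8, 11, 5, 8, 2, 4, 12, 9, 8, 1, 4, 13, 15, 6], dtype=float)
X = np.column_stack([Y - Y.mean(), W - W.mean()])   # centered data

# Eigenvectors of X^T X, largest eigenvalue first, W weight made positive.
eigenvalues, eigenvectors = np.linalg.eigh(X.T @ X)
order = np.argsort(eigenvalues)[::-1]
eigenvalues, eigenvectors = eigenvalues[order], eigenvectors[:, order]
eigenvectors = eigenvectors * np.sign(eigenvectors[1, :])

# Step 4: factor scores F = X V; row i holds observation i's coordinates on F1 and F2.
F = X @ eigenvectors

print(np.round(F[0], 2))       # first word (y = -3, w = 6): about [6.67, 0.69]
print((F ** 2).sum(axis=0))    # sums of squares equal the eigenvalues: about [392, 52]
print(F[:, 0] @ F[:, 1])       # about 0: F1 and F2 are uncorrelated
```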

24 A 2D example: compute the loadings
$r(W, F_1) = 0.97$ (Pearson correlation coefficient)

25 A 2D example: compute the loadings
$r(W, F_2) = 0.23$ (Pearson correlation coefficient)

26 A 2D example: compute the loadings
$r(Y, F_1) = -0.87$ (Pearson correlation coefficient)

27 A 2D example: compute the loadings
$r(Y, F_2) = 0.50$ (Pearson correlation coefficient)
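
The four loadings can be computed as Pearson correlations between each variable and each component. This sketch reuses the ASSUMED 20-word values (not given in the transcript) and fixes each eigenvector's arbitrary sign so that its W weight is positive; under those assumptions it reproduces the four values quoted above.

```python
import numpy as np

# ASSUMED word data (not reproduced in this transcript): Y = letters, W = definition lines.
Y = np.array([3, 6, 2, 6, 2, 9, 6, 5, 9, 4, 7, 11, 5, 4, 3, 9, 10, 5, 4, 10], dtype=float)
W = np.array([14, 7, 11, 9, 9, 4, 8, 11, 5, 8, 2, 4, 12, 9, 8, 1, 4, 13, 15, 6], dtype=float)
X = np.column_stack([Y - Y.mean(), W - W.mean()])   # centered data, columns (y, w)

# Eigenvectors (largest eigenvalue first, W weight positive) and factor scores.
eigenvalues, eigenvectors = np.linalg.eigh(X.T @ X)
order = np.argsort(eigenvalues)[::-1]
eigenvectors = eigenvectors[:, order] * np.sign(eigenvectors[1, order])
F = X @ eigenvectors

# Step 5: loadings = Pearson correlation between each variable and each component.
loadings = {
    (name, f"F{k + 1}"): np.corrcoef(X[:, j], F[:, k])[0, 1]
    for j, name in enumerate(["Y", "W"])
    for k in range(2)
}
for (var, comp), r in loadings.items():
    print(f"r({var}, {comp}) = {r:.2f}")
```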

28 A 2D example: draw the circle of correlations
$r(W, F_1) = 0.97$, $r(W, F_2) = 0.23$, $r(Y, F_1) = -0.87$, $r(Y, F_2) = 0.50$
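
Why the points fall on the circle: with two variables and two components, the components capture all of the variance, so for each variable the squared correlations with F1 and F2 sum to 1 and the variable's point lies on the unit circle. A small check using the rounded loadings from the slide:

```python
import math

# Loadings from the slide: (correlation with F1, correlation with F2).
loadings = {"W": (0.97, 0.23), "Y": (-0.87, 0.50)}

for name, (r1, r2) in loadings.items():
    # Distance of the variable's point from the origin of the circle of correlations.
    radius = math.hypot(r1, r2)
    print(f"{name}: ({r1:+.2f}, {r2:+.2f}), distance from origin {radius:.3f}")
```

Both distances round to 1 (0.997 for W, 1.003 for Y); the small departures come from rounding the loadings to two decimals. When there are more variables than plotted components, a variable far inside the circle is poorly represented by the two plotted components.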

29 How to compute the explained variance?
Eigenvalue  % variance  Cumulative % variance
392         88          88
52          12          100
Total: 444
Explained variance of the first component: $\frac{392}{444} \times 100 = 88\%$
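
Each percentage is an eigenvalue divided by the sum of all eigenvalues (444), times 100. A small sketch of that arithmetic with the two eigenvalues from the slide:

```python
# Eigenvalues of the 2D example, as given on the slide.
eigenvalues = [392, 52]
total = sum(eigenvalues)   # 444: total variance (sum of all eigenvalues)

cumulative = 0.0
table = []
for k, lam in enumerate(eigenvalues, start=1):
    percent = 100 * lam / total   # e.g. 392 / 444 * 100 = 88.3
    cumulative += percent
    table.append((k, lam, percent, cumulative))
    print(f"F{k}: eigenvalue {lam}, {percent:.0f}% of variance, {cumulative:.0f}% cumulated")
```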

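30 How many components to keep?
The Kaiser criterion: retain only the components with eigenvalues greater than 1 (this applies to normalized data, where the eigenvalues average 1).
The scree test: plot the eigenvalues in decreasing order and keep the components that come before the point where the curve levels off (the "elbow").
Common sense: keep dimensions that are interpretable; examine several solutions and choose the one that makes the best "sense."
[Figure: scree plot of the eigenvalues (0 to 4) for components 1 to 8]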
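
A minimal sketch of the Kaiser criterion, assuming PCA on normalized data (the correlation matrix), the setting where the "greater than 1" threshold makes sense: the eigenvalues then sum to the number of variables, so 1 is their average. The data are simulated and hypothetical (two latent factors, three variables each), and `kaiser_criterion` is an illustrative helper, not a library function. The same eigenvalues, printed in decreasing order, are what a scree plot displays.

```python
import numpy as np


def kaiser_criterion(data):
    """Return (number of components to keep, eigenvalues in decreasing order).

    The eigenvalues are those of the correlation matrix (PCA on normalized
    data); they sum to the number of variables, so their average is 1.
    """
    R = np.corrcoef(np.asarray(data, dtype=float), rowvar=False)
    eigenvalues = np.linalg.eigvalsh(R)[::-1]    # largest first
    return int(np.sum(eigenvalues > 1)), eigenvalues


# Hypothetical data: two independent latent factors, each measured by three
# noisy variables, so two components should stand out.
rng = np.random.default_rng(0)
factors = rng.normal(size=(200, 2))
design = np.array([[1, 1, 1, 0, 0, 0],
                   [0, 0, 0, 1, 1, 1]], dtype=float)
data = factors @ design + 0.3 * rng.normal(size=(200, 6))

n_keep, eigenvalues = kaiser_criterion(data)
print(n_keep)                     # components with eigenvalue > 1
print(np.round(eigenvalues, 2))   # values for a scree plot (look for the elbow)
```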

31 Should I normalize the data?
Yes, if the variables are not measured on the same scale.
Otherwise, it depends:
Normalized: all variables have the same weight.
Not normalized: each variable's weight is proportional to its standard deviation.
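
The effect of normalizing shows up when one variable is re-expressed in a smaller unit. In this sketch (hypothetical simulated data; `first_axis` is an illustrative helper, not a library function), the unnormalized first component follows whichever variable has the largest variance, while the normalized first component is unchanged by the change of unit.

```python
import numpy as np


def first_axis(data, normalize):
    """Unit eigenvector of the first principal component (sign fixed so entry 0 >= 0)."""
    X = data - data.mean(axis=0)
    if normalize:
        X = X / X.std(axis=0, ddof=1)            # each variable now has variance 1
    _, eigenvectors = np.linalg.eigh(X.T @ X)    # ascending eigenvalues
    v = eigenvectors[:, -1]                      # eigenvector of the largest one
    return v if v[0] >= 0 else -v


# Hypothetical data: two positively correlated variables.
rng = np.random.default_rng(1)
a = rng.normal(size=100)
b = 0.5 * a + rng.normal(size=100)
data = np.column_stack([a, b])

# Same data with variable 1 expressed in a unit 100 times smaller (e.g. m -> cm).
rescaled = data * np.array([100.0, 1.0])

print(first_axis(rescaled, normalize=False))  # dominated by variable 1
print(first_axis(data, normalize=True))       # about [0.707, 0.707]: equal weights
print(first_axis(rescaled, normalize=True))   # unchanged by the change of unit
```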

