Factor and Principal Component Analysis
Combine correlated variables
If X and Y are strongly correlated (e.g., r = 0.94), we don't need both; one of them is redundant.
Combine correlated variables
If X and Y are strongly correlated, should we use X or Y? Instead, we can make a new variable F:
F = a X + b Y + error
Most of the variation (information) in X and Y is now contained in F, so we have one variable instead of two. The stronger the correlation, the more completely F contains the information.
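A minimal sketch of this idea, using simulated data (the variables, the sample size, and the target correlation of 0.94 are illustrative, not from the slides). The first principal component of the standardized pair plays the role of F = aX + bY:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate two strongly correlated variables (illustrative, target r ~ 0.94).
x = rng.normal(size=1000)
y = 0.94 * x + np.sqrt(1 - 0.94**2) * rng.normal(size=1000)

# Standardize, then take the first principal component as F = a*X + b*Y.
Z = np.column_stack([x, y])
Z = (Z - Z.mean(axis=0)) / Z.std(axis=0)
eigvals, eigvecs = np.linalg.eigh(np.corrcoef(Z, rowvar=False))
a, b = eigvecs[:, -1]                 # weights for the largest eigenvalue
F = a * Z[:, 0] + b * Z[:, 1]

share = eigvals[-1] / eigvals.sum()   # fraction of the total variance F carries
print(round(share, 2))
```

With r near 0.94, the single variable F carries roughly (1 + r)/2 of the two variables' total variance, which is most of the information.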
Standardized data
Since we are only interested in the relative differences and correlations among the data, it is easier to work with standardized data. If X is the original variable, we compute
Z = (X - mean(X)) / SD(X)
Z has overall mean 0 and SD = 1. For standardized variables, the correlation and covariance matrices are the same. Each Z has variance SD^2 = 1, so if there are K variables, their total variance is K.
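A quick numeric check of these facts (the small data vectors are made up for illustration):

```python
import numpy as np

x = np.array([2.0, 4.0, 6.0, 8.0])   # illustrative raw data
z = (x - x.mean()) / x.std()         # Z = (X - mean X) / SD(X)

print(z.mean(), z.std())             # mean 0, SD 1

# For standardized variables, covariance equals correlation.
y = np.array([1.0, 3.0, 2.0, 5.0])
zy = (y - y.mean()) / y.std()
cov = np.mean(z * zy)                # covariance of the standardized pair
corr = np.corrcoef(x, y)[0, 1]       # correlation of the raw pair
print(np.isclose(cov, corr))         # True
```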
Consolidating many correlated variables
Correlation matrix, K = 9
[9 x 9 correlation matrix for the variables in the unsorted order A G C B I D E F H: the diagonal is 1, and every off-diagonal entry is either strongly positive (about 0.84 to 0.94) or weakly negative (about -0.13 to -0.25). In this ordering no pattern is visible.]
Sorted correlation matrix
[Same matrix with the variables reordered A B C D E F G H I: A-E correlate strongly with one another (about 0.90 to 0.94), F-I correlate strongly with one another (about 0.84 to 0.88), and the cross-correlations between the two groups are small and negative (about -0.13 to -0.25). Sorting reveals two blocks of correlated variables.]
Heat map
Make K factors – keep most important
Initially, if we have K variables, we make K factors, where each factor is uncorrelated (orthogonal) with the others. The factor with the largest variance (also called the "eigenvalue") is denoted factor 1 and carries the most information. The factor with the next-largest variance is factor 2, and so on. Keep the factors whose variance is larger than 1.0, or examine the scree plot.
Make K factors, K = 9
Factor 1 = a11 X1 + a12 X2 + a13 X3 + ... + a19 X9
...
The aij values (weights) are chosen so the K factors are mutually orthogonal. We can compute the variance (and SD) of each factor; the means are zero by definition. Note that this assumes linearity!
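One way to see this construction (a principal-component sketch under simulated data, not the slides' exact software or numbers): the eigenvectors of the correlation matrix supply the aij weights, the resulting factor scores are mutually orthogonal, and each factor's variance equals its eigenvalue.

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative data: 9 variables driven by two hidden sources,
# mimicking the slides' two-block setup (A-E vs F-I).
n = 2000
s1, s2 = rng.normal(size=(2, n))
X = np.column_stack([s1 + 0.3 * rng.normal(size=n) for _ in range(5)] +
                    [-0.2 * s1 + s2 + 0.3 * rng.normal(size=n) for _ in range(4)])

Z = (X - X.mean(0)) / X.std(0)        # standardize
R = np.corrcoef(Z, rowvar=False)      # 9 x 9 correlation matrix
eigvals, eigvecs = np.linalg.eigh(R)
order = eigvals.argsort()[::-1]       # factor 1 = largest variance
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

F = Z @ eigvecs                       # factor scores: Factor j = sum_i a_ji * X_i

# Factors are mutually orthogonal; their variances are the eigenvalues.
C = np.cov(F, rowvar=False)
print(np.allclose(C - np.diag(np.diag(C)), 0, atol=1e-8))
print(np.allclose(np.diag(C), eigvals, atol=1e-2))
```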
Eigenvalues = factor variances
Eigenvalues (variance accounted for); scree plot

factor   variance   percent   cum percent
1         5.185      57.61      57.61
2         3.071      34.12      91.73
3         0.168       1.87      93.60
4         0.152       1.69      95.29
5         0.119       1.32      96.61
6         0.094       1.04      97.65
7         0.090       1.00      98.65
8         0.070       0.78      99.43
9         0.051       0.57     100.00
total     9.000     100.00        --
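The percent and cumulative-percent columns follow directly from the eigenvalues; the values below are the ones listed on the slide:

```python
# Eigenvalues (factor variances) from the slide.
eigenvalues = [5.185, 3.071, 0.168, 0.152, 0.119, 0.094, 0.090, 0.070, 0.051]

total = sum(eigenvalues)                     # equals K = 9 for standardized data
pct = [100 * v / total for v in eigenvalues]
cum = [sum(pct[:i + 1]) for i in range(len(pct))]

print(round(total, 3))                       # 9.0
print(round(pct[0], 2), round(cum[1], 2))    # 57.61 91.73
```

The first two factors together account for 91.73% of the variance, which is why only two are kept.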
Make two factors: rotated factor loadings

variable   Factor 1   Factor 2
A            0.964     -0.107
B            0.958     -0.053
C            0.944     -0.130
D            0.949     -0.137
E            0.945     -0.123
F           -0.104      0.917
G           -0.128      0.929
H           -0.112      0.902
I           -0.081      0.936
Factor loadings
Factor 1 = 0.964 A + 0.958 B + 0.944 C + 0.949 D + 0.945 E + error
Factor 2 = 0.917 F + 0.929 G + 0.902 H + 0.936 I + error
The coefficients are (approximately) the correlation of each variable with its factor. For example, 0.964 is (approximately) the correlation of A with Factor 1.

Total variance accounted for by the factors:
factor   pct      cum pct
1        50.9%     50.9%
2        38.4%     89.3%
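The claim that a loading equals the variable-factor correlation can be checked numerically (the simulated two-group data below is a stand-in for the slides' A-I, not their actual data):

```python
import numpy as np

rng = np.random.default_rng(2)

# Simulated stand-ins: five variables share one source, four share another.
n = 5000
s1, s2 = rng.normal(size=(2, n))
X = np.column_stack([s1 + 0.25 * rng.normal(size=n) for _ in range(5)] +
                    [s2 + 0.35 * rng.normal(size=n) for _ in range(4)])
Z = (X - X.mean(0)) / X.std(0)

R = np.corrcoef(Z, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(R)
v1, l1 = eigvecs[:, -1], eigvals[-1]  # first (largest-variance) factor

loading = v1 * np.sqrt(l1)            # loading of each variable on factor 1
f1 = Z @ v1                           # factor-1 scores

# Each loading matches the correlation between that variable and the factor.
corrs = np.array([np.corrcoef(Z[:, i], f1)[0, 1] for i in range(9)])
print(np.allclose(np.abs(loading), np.abs(corrs), atol=1e-6))
```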
Factors are uncorrelated (orthogonal) with each other. They represent non-redundant information.
Communalities
How much of the variation in each variable is accounted for by the factor(s); similar to R^2.

variable   communality
A            0.940
B            0.921
C            0.908
D            0.919
E            0.907
F            0.852
G            0.878
H            0.826
I            0.883
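With two retained factors, each communality is the sum of that variable's squared loadings. Using the rotated loadings from the slide's table, the computed values agree with the slide's communalities to within rounding:

```python
# Rotated loadings from the slide's table: (Factor 1, Factor 2) per variable.
loadings = {
    "A": (0.964, -0.107), "B": (0.958, -0.053), "C": (0.944, -0.130),
    "D": (0.949, -0.137), "E": (0.945, -0.123), "F": (-0.104, 0.917),
    "G": (-0.128, 0.929), "H": (-0.112, 0.902), "I": (-0.081, 0.936),
}

# Communality = sum of squared loadings (share of variance explained, like R^2).
communality = {v: round(a**2 + b**2, 3) for v, (a, b) in loadings.items()}
print(communality["A"])   # close to the slide's 0.940
```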
WGCNA: Weighted Gene Co-expression Network Analysis (Horvath, UCLA)
Factors can have factors
Power adjacency function results in a weighted gene network
Often choosing beta = 6 works well, but in general we use the "scale-free topology criterion" described in Zhang and Horvath (2005).
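The power adjacency of Zhang and Horvath (2005) soft-thresholds the correlations: a_ij = |cor(x_i, x_j)|^beta. With beta = 6, strong correlations survive while weak ones are driven toward zero. A minimal sketch (the 3 x 3 correlation matrix is illustrative; zeroing the diagonal is one common convention, not a WGCNA requirement):

```python
import numpy as np

def power_adjacency(corr_matrix, beta=6):
    """Weighted network adjacency: a_ij = |cor(x_i, x_j)| ** beta."""
    a = np.abs(corr_matrix) ** beta
    np.fill_diagonal(a, 0.0)   # drop self-connections (a convention, illustrative)
    return a

# Illustrative correlations: one strong pair, one weak pair.
R = np.array([[1.0, 0.9, 0.2],
              [0.9, 1.0, 0.1],
              [0.2, 0.1, 1.0]])
A = power_adjacency(R, beta=6)
print(round(A[0, 1], 3), round(A[0, 2], 6))   # strong link kept, weak link near 0
```

Raising to the 6th power maps 0.9 to about 0.53 but 0.2 to about 0.00006, so the network is effectively dominated by strongly co-expressed gene pairs without a hard cutoff.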