Xuhua Xia Slide 1
Correlation
Simple correlation
–between two variables
Multiple and partial correlations
–between one variable and a set of other variables
Canonical correlation
–between two sets of variables, each containing more than one variable
Simple and multiple correlations are special cases of canonical correlation.
Multiple: x1 on x2 and x3
Partial: between X and Y, with Z controlled for
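For reference, both quantities named above have standard closed forms in terms of pairwise Pearson coefficients. The subscript notation r_ij is an assumption made here for illustration; it is not taken from the slide.

```latex
% Squared multiple correlation of x_1 on x_2 and x_3
R_{1\cdot 23}^{2} = \frac{r_{12}^{2} + r_{13}^{2} - 2\,r_{12}\,r_{13}\,r_{23}}{1 - r_{23}^{2}}

% Partial correlation between X and Y, controlling for Z
r_{XY\cdot Z} = \frac{r_{XY} - r_{XZ}\,r_{YZ}}{\sqrt{\left(1 - r_{XZ}^{2}\right)\left(1 - r_{YZ}^{2}\right)}}
```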
Slide 2
Review of correlation
Data columns: X, Z, Y
–Compute the Pearson correlation coefficients between X and Z, X and Y, and Z and Y.
–Compute the partial correlation coefficient between X and Y, controlling for Z (i.e., the correlation between X and Y when Z is held constant), using the equation in the previous slide.
–Run R to verify your calculation:
install.packages("ggm")
library(ggm)
md<-read.table("XYZ.txt",header=TRUE)
cor(md)      # pairwise Pearson correlations
s<-var(md)   # covariance matrix
parcor(s)    # partial correlation of each pair, given the remaining variable
install.packages("psych")
library(psych)
smc(s)       # squared multiple correlation of each variable with the others
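The hand calculation can also be cross-checked in base R alone. The following is a minimal sketch on simulated data (the values are a hypothetical stand-in for XYZ.txt, not the slide's data). It computes the partial correlation three ways, which should agree up to rounding: the closed-form formula, the correlation of regression residuals, and a rescaled inverse covariance matrix.

```r
# Simulated stand-in for XYZ.txt (hypothetical values, not the slide's data)
set.seed(1)
Z <- rnorm(50)
X <- 0.8 * Z + rnorm(50)
Y <- 0.5 * Z + rnorm(50)
xyz <- data.frame(X, Z, Y)

r <- cor(xyz)  # pairwise Pearson correlations

# Closed-form partial correlation of X and Y, controlling for Z
r_xy_z <- (r["X", "Y"] - r["X", "Z"] * r["Y", "Z"]) /
  sqrt((1 - r["X", "Z"]^2) * (1 - r["Y", "Z"]^2))

# Cross-check 1: correlate the residuals of X ~ Z and Y ~ Z
r_resid <- cor(resid(lm(X ~ Z, data = xyz)), resid(lm(Y ~ Z, data = xyz)))

# Cross-check 2: rescale the inverse of the covariance matrix
s <- var(xyz)
p <- solve(s)
dimnames(p) <- dimnames(s)
r_inv <- -p["X", "Y"] / sqrt(p["X", "X"] * p["Y", "Y"])

print(c(formula = r_xy_z, residuals = r_resid, inverse = r_inv))
```

The residual version makes the phrase "when Z is held constant" concrete: the linear effect of Z is removed from both X and Y before they are correlated.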
Slide 3
Data for canonical correlation
# First three variables: physical
# Last three variables: exercise
# Middle-aged men
weight waist pulse chins situps jumps
Slide 4
Many Possible Correlations
With multiple DVs (say A, B, C) and IVs (say a, b, c, d, e), there could be many correlation patterns:
–Variable A in the DV set could be correlated with variables a, b, c in the IV set
–Variable B in the DV set could be correlated with variables c, d in the IV set
–Variable C in the DV set could be correlated with variables a, c, e in the IV set
With this plethora of possible correlations, what is the best way to summarize them?
Slide 5
Dealing with Two Sets of Variables
The simple correlation approach:
–For N DVs and M IVs, calculate the simple correlation coefficient between each of the N DVs and each of the M IVs, yielding a total of N*M correlation coefficients
The multiple correlation approach:
–For N DVs and M IVs, calculate multiple or partial correlation coefficients between each of the N DVs and the set of M IVs, yielding a total of N correlation coefficients
The canonical correlation
Note: All of these deal with linear correlations
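The first two approaches can be sketched in base R as follows. The data are simulated and the variable names A to C and a to e simply mirror the labels used on the previous slide; none of this comes from the course data set. The simple approach yields an N x M matrix, the multiple approach a vector of N coefficients.

```r
# Hypothetical data: N = 3 DVs (A, B, C) and M = 5 IVs (a ... e)
set.seed(2)
n <- 40
iv <- matrix(rnorm(n * 5), n, 5,
             dimnames = list(NULL, c("a", "b", "c", "d", "e")))
dv <- cbind(A = iv[, "a"] + rnorm(n),
            B = iv[, "c"] - iv[, "d"] + rnorm(n),
            C = rnorm(n))

# Simple correlation approach: one coefficient per DV-IV pair (N x M)
simple <- cor(dv, iv)

# Multiple correlation approach: one R per DV, from regressing it on all IVs
multipleR <- apply(dv, 2, function(y) sqrt(summary(lm(y ~ iv))$r.squared))

print(round(simple, 2))
print(round(multipleR, 2))
```

Each multiple R is at least as large as the largest absolute simple correlation in its row, because the regression uses all M IVs at once.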
Slide 6
Correlation matrix
md<-read.table("Cancor.txt",header=TRUE)
attach(md)
R<-cor(md)
R
       weight waist pulse chins situps jumps
weight
waist
pulse
chins
situps
jumps
Slide 7
Multiple correlations
fit<-lm(weight~chins+situps+jumps,data=md);summary(fit)
            Estimate Std. Error t value Pr(>|t|)
(Intercept)                            e-11 ***
chins
situps                                 **
jumps
Multiple R-squared: , Adjusted R-squared:
fit<-lm(waist~chins+situps+jumps,data=md);summary(fit)
            Estimate Std. Error t value Pr(>|t|)
(Intercept)                            e-15 ***
chins
situps                                 ***
jumps
Multiple R-squared: , Adjusted R-squared:
fit<-lm(pulse~chins+situps+jumps,data=md);summary(fit)
            Estimate Std. Error t value Pr(>|t|)
(Intercept)                            e-07 ***
chins
situps
jumps
Multiple R-squared: , Adjusted R-squared:
Slide 8
Multiple correlation
fit<-lm(chins~weight+waist+pulse,data=md);summary(fit)
            Estimate Std. Error t value Pr(>|t|)
(Intercept)                            *
weight
waist                                  *
pulse
Multiple R-squared: , Adjusted R-squared:
fit<-lm(situps~weight+waist+pulse,data=md);summary(fit)
            Estimate Std. Error t value Pr(>|t|)
(Intercept)                            e-06 ***
weight
waist                                  **
pulse
Multiple R-squared: , Adjusted R-squared:
fit<-lm(jumps~weight+waist+pulse,data=md);summary(fit)
            Estimate Std. Error t value Pr(>|t|)
(Intercept)
weight
waist
pulse
Multiple R-squared: , Adjusted R-squared:
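Canonical correlation (cc)
install.packages("ggplot2")
install.packages("GGally")
install.packages("CCA")
install.packages("CCP")
require(ggplot2)
require(GGally)
require(CCA)
require(CCP)
phys<-md[,1:3]      # physical variables: weight, waist, pulse
exer<-md[,4:6]      # exercise variables: chins, situps, jumps
matcor(phys,exer)   # within-set and between-set correlation matrices
cc1<-cc(phys,exer)  # canonical correlation analysis
cc1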
cc output
[1]       # canonical correlations
$xcoef    # raw canonical coefficient matrix U
       [,1] [,2] [,3]
weight
waist
pulse
$ycoef    # raw canonical coefficient matrix V
       [,1] [,2] [,3]
chins
situps
jumps
phys*U: raw canonical variates for phys
exer*V: raw canonical variates for exer
$scores$xscores: standardized canonical variates.
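The relationship between the coefficient matrices and the canonical variates can be checked with base R's `cancor`, used here as a stand-in for `cc`. This is a sketch on simulated data: the column names mirror the slide but the values are hypothetical. `cancor` scales its coefficients differently from `cc`, so the raw variates differ by a constant factor; the correlations between paired variates are unaffected.

```r
# Hypothetical data shaped like the slide's two sets (20 cases, 3 + 3 variables)
set.seed(3)
n <- 20
latent <- rnorm(n)
phys <- cbind(weight = latent + rnorm(n),
              waist  = latent + rnorm(n),
              pulse  = rnorm(n))
exer <- cbind(chins  = -latent + rnorm(n),
              situps = -latent + rnorm(n),
              jumps  = rnorm(n))

cca <- cancor(phys, exer)  # stats::cancor, part of base R
U <- cca$xcoef             # coefficients for the phys set
V <- cca$ycoef             # coefficients for the exer set

# Canonical variates: centred data multiplied by the coefficient matrices
cvU <- scale(phys, center = cca$xcenter, scale = FALSE) %*% U
cvV <- scale(exer, center = cca$ycenter, scale = FALSE) %*% V

# Correlation between the i-th pair of variates is the i-th canonical correlation
r_cv <- diag(cor(cvU, cvV))
print(rbind(from_variates = r_cv, from_cancor = cca$cor))
```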
standardized canonical variates
$scores$xscores: rows [1,] to [20,], columns [,1] [,2] [,3]
$scores$yscores: rows [1,] to [20,], columns [,1] [,2] [,3]
Canonical structure: Correlations
$scores$corr.X.xscores    # correlation between phys variables and CVs_U
       [,1] [,2] [,3]
weight
waist
pulse
$scores$corr.Y.xscores    # correlation between exer variables and CVs_U
       [,1] [,2] [,3]
chins
situps
jumps
$scores$corr.X.yscores    # correlation between phys variables and CVs_V
       [,1] [,2] [,3]
weight
waist
pulse
$scores$corr.Y.yscores    # correlation between exer variables and CVs_V
       [,1] [,2] [,3]
chins
situps
jumps
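The four structure matrices are ordinary correlations between the original variables and the canonical variates, so they can be reproduced by hand. The sketch below again uses base R's `cancor` on simulated data (hypothetical values; only the column names follow the slide) rather than the `cc` object.

```r
# Hypothetical data shaped like the slide's two sets
set.seed(4)
n <- 20
latent <- rnorm(n)
phys <- cbind(weight = latent + rnorm(n),
              waist  = latent + rnorm(n),
              pulse  = rnorm(n))
exer <- cbind(chins  = -latent + rnorm(n),
              situps = -latent + rnorm(n),
              jumps  = rnorm(n))

cca <- cancor(phys, exer)
cvU <- scale(phys, center = cca$xcenter, scale = FALSE) %*% cca$xcoef
cvV <- scale(exer, center = cca$ycenter, scale = FALSE) %*% cca$ycoef

corr_X_xscores <- cor(phys, cvU)  # phys variables with CVs_U
corr_Y_xscores <- cor(exer, cvU)  # exer variables with CVs_U
corr_X_yscores <- cor(phys, cvV)  # phys variables with CVs_V
corr_Y_yscores <- cor(exer, cvV)  # exer variables with CVs_V

# Cross-set structure equals own-set structure scaled by the canonical correlations
print(round(corr_X_yscores - corr_X_xscores %*% diag(cca$cor), 10))
```

The last line prints a matrix of zeros: each column of the cross-set structure is the own-set column multiplied by the corresponding canonical correlation.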
Significance: p.asym in CCP
vCancor<-cc1$cor
# p.asym(rho,N,p,q, tstat = "Wilks|Hotelling|Pillai|Roy")
res<-p.asym(vCancor,nrow(md),3,3, tstat = "Wilks")   # keep the result for plt.asym
Wilks' Lambda, using F-approximation (Rao's F):
         stat approx df1 df2 p.value
1 to 3:    # At least one cancor significant?
2 to 3:    # Significant relationship after excluding cancor 1?
3 to 3:    # Significant relationship after excluding cancor 1 and 2?
plt.asym(res,rhostart=1)
plt.asym(res,rhostart=2)
plt.asym(res,rhostart=3)
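The `stat` column for Wilks' Lambda can be reproduced directly from the canonical correlations: the statistic for the test "k to m" is the product of (1 - rho_i^2) over i = k, ..., m. The sketch below uses made-up canonical correlations, not the values from this data set, and does not attempt the F-approximation itself.

```r
# Hypothetical canonical correlations (illustrative only, not the slide's values)
rho <- c(0.80, 0.20, 0.07)

# Wilks' Lambda for "k to m": product of (1 - rho_i^2) for i = k ... m
wilks <- rev(cumprod(rev(1 - rho^2)))
names(wilks) <- paste(seq_along(rho), "to", length(rho))

print(round(wilks, 4))
```

Values near 0 indicate a strong remaining relationship between the two sets; values near 1 indicate that little is left once the earlier canonical correlations are excluded.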
Slide 14
Ecology data: Assignment
# 24 sites; for each site, record coverage of four species and concentration of four chemicals