Correlations & Regression Modelling
Correlation
Correlation is a statistical technique that shows whether, and how strongly, pairs of variables are related (for example, height and weight). Like all statistical techniques, correlation is only appropriate for certain kinds of data. It belongs to the branch of statistics that looks at the relationship between two data sets.
Correlation
The Pearson correlation coefficient specifically addresses linear relationships. It ranges from -1 to 1. The closer r is to 1 or -1, the more closely the two variables are related. If r is close to 0, there is no linear relationship between the variables. If r is positive, then as one variable gets larger the other tends to get larger too. If r is negative, then as one gets larger the other tends to get smaller (an inverse correlation).

$$ r = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i - \bar{x})^2 \, \sum_{i=1}^{n}(y_i - \bar{y})^2}} $$
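As a minimal sketch (using the stump/larvae data from the example later in the deck), the formula can be computed directly in R and checked against the built-in cor():

x <- c(2, 2, 1, 3, 4, 1, 5, 3, 1, 2)
y <- c(10, 30, 12, 24, 40, 11, 56, 40, 8, 14)
num <- sum((x - mean(x)) * (y - mean(y)))                  # numerator of r
den <- sqrt(sum((x - mean(x))^2) * sum((y - mean(y))^2))   # denominator of r
num / den                                                  # about 0.92
cor(x, y)                                                  # built-in check, same value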
Correlation
A correlation report can also show a second result for each test: statistical significance. The significance level (p-value) tells you how likely it is that the reported correlation is due to chance, in the form of random sampling error. The squared coefficient r² (the coefficient of determination) gives the proportion of the variability in one variable that is explained by the other.
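A minimal sketch of how this is obtained in R: cor.test() reports both the correlation coefficient and a p-value (again using the stump/larvae data from the example below).

x <- c(2, 2, 1, 3, 4, 1, 5, 3, 1, 2)
y <- c(10, 30, 12, 24, 40, 11, 56, 40, 8, 14)
ct <- cor.test(x, y, method = "pearson")   # Pearson r plus a significance test
ct$estimate   # the correlation coefficient r
ct$p.value    # how likely a correlation this strong is under pure sampling error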
Correlation
A common error is to assume that a correlation means that a change in one variable causes a change in the other. Correlation doesn't imply causation.
Correlation
"Correlation does not imply causation"
Example. Correlation between Tree Stumps and Beetle Larvae.
Is there a linear relationship between the number of tree stumps left behind by beavers and the number of beetle larvae? Researchers laid out 10 circular plots, each 4 meters in diameter, in an area where beavers were cutting down cottonwood trees. The number of stumps and the number of clusters of beetle larvae were recorded in each plot, with the following results.

Stumps (x)         2   2   1   3   4   1   5   3   1   2
Beetle larvae (y)  10  30  12  24  40  11  56  40  8   14
Example. Correlation between Tree Stumps and Beetle Larvae.
Stumps (x)  Larvae (y)  x²    y²     x·y
2           10          4     100    20
2           30          4     900    60
1           12          1     144    12
3           24          9     576    72
4           40          16    1600   160
1           11          1     121    11
5           56          25    3136   280
3           40          9     1600   120
1           8           1     64     8
2           14          4     196    28
Sums:  Σx = 24,  Σy = 245,  Σx² = 74,  Σy² = 8437,  Σx·y = 771
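The column sums can be checked quickly in R (a minimal sketch with the same data):

x <- c(2, 2, 1, 3, 4, 1, 5, 3, 1, 2)
y <- c(10, 30, 12, 24, 40, 11, 56, 40, 8, 14)
c(sum(x), sum(y), sum(x^2), sum(y^2), sum(x * y))   # 24  245  74  8437  771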
Example. Correlation between Tree Stumps and Beetle Larvae
$$ r = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i - \bar{x})^2 \, \sum_{i=1}^{n}(y_i - \bar{y})^2}} $$

r ≈ 0.92, so r² = (0.92)² ≈ 0.84. The variability in the number of tree stumps explains about 84% of the variability in the number of clusters of beetle larvae.
Example. Correlation between Tree Stumps and Beetle Larvae
Resolution in R

x <- c(2, 2, 1, 3, 4, 1, 5, 3, 1, 2)             # stumps per plot
y <- c(10, 30, 12, 24, 40, 11, 56, 40, 8, 14)    # beetle larva clusters per plot
a <- cor(x, y, method = "pearson")               # Pearson correlation coefficient
a                                                # about 0.92
Correlation Interpretation and Covariance Matrix
1. Calculation of the covariance matrix
2. Calculation of the covariance
3. Calculation of the correlation
Covariance Matrix
Covariance Matrix (source: Jochen Triesch, UC San Diego)
Covariance Matrix (example)
We want to calculate and interpret the covariance and the correlation between the height and the weight of a group of people:

             P1    P2    P3
Height (cm)  180   156   170
Weight (kg)  86    54    70
1. If we treat height and weight as independent samples, we can calculate their variances:
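A minimal sketch of this step in R (sample variances, i.e. dividing by n − 1):

height <- c(180, 156, 170)   # cm
weight <- c(86, 54, 70)      # kg
var(height)                  # about 145.33
var(weight)                  # 256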
2. If we extend the idea of variance to two dimensions, we get a 2x2 matrix that holds the complete second-order information about the random vector: the two variances and their covariance.
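A minimal sketch in R: cov() applied to the two variables as columns returns this 2x2 matrix directly.

height <- c(180, 156, 170)
weight <- c(86, 54, 70)
cov(cbind(height, weight))   # variances 145.33 and 256 on the diagonal, covariance 192 off the diagonal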
Covariance Matrix

$$ \begin{pmatrix} \sigma_1^2 & \sigma_{12} \\ \sigma_{21} & \sigma_2^2 \end{pmatrix} $$

The variances are located on the main diagonal of the matrix. The elements off the main diagonal are called covariances (with σ₁₂ = σ₂₁).
Covariance Matrix The covariance matrix is the basis for all later considerations concerning accuracy. The covariance matrix is always symmetric.
Covariance The covariance measures the linear relationship between two variables.
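As a formula (the sample covariance, consistent with the division by n − 1 = 9 in the R code later in the deck):

$$ \operatorname{cov}(x, y) = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}) $$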
Correlation A correlation coefficient measures the degree to which two variables tend to change at the same time. The coefficient describes both the strength and the direction of the relationship.
Correlation The correlation coefficient depends on the covariance. The correlation coefficient is equal to the covariance divided by the product of the standard deviations of the variables. Therefore, a positive covariance will always produce a positive correlation and a negative covariance will always generate a negative correlation.
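In symbols, with s_x and s_y the standard deviations of x and y:

$$ r = \frac{\operatorname{cov}(x, y)}{s_x \, s_y} $$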
Example
x = [2, 2, 1, 3, 4, 1, 5, 3, 1, 2]
y = [10, 30, 12, 24, 40, 11, 56, 40, 8, 14]
n = 10
1. Calculate the covariance matrix
2. Calculate the sample covariance
3. Calculate the correlation coefficient
R

# Data: row 1 = stumps (x), row 2 = beetle larvae (y)
mdata <- matrix(c(2, 2, 1, 3, 4, 1, 5, 3, 1, 2,
                  10, 30, 12, 24, 40, 11, 56, 40, 8, 14),
                nrow = 2, ncol = 10, byrow = TRUE)
c1 <- mdata[1, ]
c2 <- mdata[2, ]
A <- c1 - mean(c1)            # centred x
B <- c2 - mean(c2)            # centred y
N <- rbind(A, B)              # 2 x 10 matrix of centred data
# 1. Covariance matrix (note: N %*% t(N), not t(N) %*% N, divided by n - 1 = 9)
COV <- (N %*% t(N)) / 9
# 2. Sample covariance of x and y (matches the off-diagonal element of COV)
cov2 <- (t(A) %*% B) / 9
cov(c1, c2)                   # built-in check
# 3. Correlation coefficient
cor(c1, c2)                   # built-in
cor2 <- cov2 / (sqrt(var(c1)) * sqrt(var(c2)))
Exercise 1: Using the built-in R data set "trees".
We will look at whether the volume, height and girth of the trees are correlated. We'll plot the data first. Does the data appear to show a correlation? Which relationship appears to be the strongest? Manually calculate the Pearson correlation coefficient for one of these relationships, then use R to check the correlation between all sets.
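A minimal starting sketch in R (the trees data set has the columns Girth, Height and Volume; completing the exercise is left to the reader):

pairs(trees)                     # scatterplot matrix of Girth, Height and Volume
cor(trees)                       # Pearson correlations between every pair of columns
cor(trees$Girth, trees$Volume)   # check a single pair, e.g. girth vs volume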
Reference material https://en.wikipedia.org/wiki/Linear_model