
Correlation coefficients and simple linear regression Chapter 7.1 ~ 7.6.


1 Correlation coefficients and simple linear regression Chapter 7.1 ~ 7.6

2 Contents
1. Correlation coefficients: Pearson, Spearman
2. Simple linear regression
3. Generation of correlated random numbers; visualization

3 Example

4 1.1. Pearson correlation coefficient
Pearson's correlation coefficient r between two variables is defined as the covariance of the two variables X_i and Y_i divided by the product of their standard deviations:

r = \frac{\sum_{i=1}^{n}(X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum_{i=1}^{n}(X_i - \bar{X})^2}\,\sqrt{\sum_{i=1}^{n}(Y_i - \bar{Y})^2}}

where X_i and Y_i are the i-th values and \bar{X}, \bar{Y} are the sample means of X and Y.

http://en.wikipedia.org/wiki/Pearson_product-moment_correlation_coefficient

5 1.2. Pearson correlation coefficient
my.cor <- function(x, y) {
  mx <- mean(x)
  my <- mean(y)
  Sxx <- sum((x - mx)^2)
  Syy <- sum((y - my)^2)
  Sxy <- sum((x - mx) * (y - my))
  r <- Sxy / sqrt(Sxx) / sqrt(Syy)
  return(r)
}
# compare with the built-in: cor(x, y)

Example:
x <- c(26, 25, 23, 27, 28, 25, 22, 26, 25, 23)
y <- c(54, 62, 51, 58, 63, 65, 59, 63, 65, 60)
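As a quick sanity check (a sketch added here, not part of the original slides), my.cor can be compared with R's built-in cor() on the slide's example data:

```r
# Example data from the slides
x <- c(26, 25, 23, 27, 28, 25, 22, 26, 25, 23)
y <- c(54, 62, 51, 58, 63, 65, 59, 63, 65, 60)

# Same computation as the slide's my.cor, written compactly
my.cor <- function(x, y) {
  mx <- mean(x); my <- mean(y)
  Sxy <- sum((x - mx) * (y - my))
  Sxy / (sqrt(sum((x - mx)^2)) * sqrt(sum((y - my)^2)))
}

r <- my.cor(x, y)
stopifnot(abs(r - cor(x, y)) < 1e-12)  # agrees with the built-in
```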

6 1.3. Spearman correlation coefficient
The Spearman correlation coefficient is defined as the Pearson correlation coefficient between the ranked variables. For a sample of size n, the n raw scores X_i, Y_i are converted to ranks x_i, y_i, and ρ is computed from these:

\rho = \frac{\sum_{i}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i}(x_i - \bar{x})^2}\,\sqrt{\sum_{i}(y_i - \bar{y})^2}}

http://en.wikipedia.org/wiki/Spearman%27s_rank_correlation_coefficient

7 1.4. Spearman correlation coefficient
my.cor2 <- function(x, y, method = c("pearson", "spearman")) {
  method <- match.arg(method)
  if (method == "spearman") {
    x <- rank(x)
    y <- rank(y)
  }
  mx <- mean(x)
  my <- mean(y)
  Sxx <- sum((x - mx)^2)
  Syy <- sum((y - my)^2)
  Sxy <- sum((x - mx) * (y - my))
  r <- Sxy / sqrt(Sxx) / sqrt(Syy)
  return(r)
}
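A check of the Spearman path (a sketch added here, not from the original slides): rank() uses average ranks for ties by default, which matches what cor(..., method = "spearman") does, so the two should agree even on tied data:

```r
my.cor2 <- function(x, y, method = c("pearson", "spearman")) {
  method <- match.arg(method)
  if (method == "spearman") {
    x <- rank(x)  # average ranks for ties, same convention as cor()
    y <- rank(y)
  }
  mx <- mean(x); my <- mean(y)
  sum((x - mx) * (y - my)) /
    (sqrt(sum((x - mx)^2)) * sqrt(sum((y - my)^2)))
}

x <- c(26, 25, 23, 27, 28, 25, 22, 26, 25, 23)  # contains ties
y <- c(54, 62, 51, 58, 63, 65, 59, 63, 65, 60)
rho <- my.cor2(x, y, "spearman")
stopifnot(abs(rho - cor(x, y, method = "spearman")) < 1e-12)
```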

8 2.1. Simple linear regression
Simple linear regression fits a straight line through a set of n points so that the sum of squared residuals of the model (the vertical distances between the data points and the fitted line) is as small as possible.

9 2.2. Simple linear regression
Suppose there are n data points (x_i, y_i), where i = 1, 2, …, n. The goal is to find the equation of the straight line y = a x + b that minimizes the sum of squared residuals of the linear regression model. In other words, the numbers a and b solve the following minimization problem:

\min_{a,\,b} \; Q(a, b), \qquad Q(a, b) = \sum_{i=1}^{n} (y_i - a x_i - b)^2

10 2.3. Simple linear regression
Setting the partial derivatives of the sum of squared residuals Q(a, b) = \sum_{i=1}^{n}(y_i - a x_i - b)^2 with respect to a and b to zero provides the equations to get a and b that minimize the sum of squared residuals:

a \sum_{i=1}^{n} x_i^2 + b \sum_{i=1}^{n} x_i = \sum_{i=1}^{n} x_i y_i, \qquad a \sum_{i=1}^{n} x_i + b\,n = \sum_{i=1}^{n} y_i
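Solving the two normal equations (∂Q/∂a = 0, ∂Q/∂b = 0, with Q(a, b) = \sum_i (y_i - a x_i - b)^2) in closed form gives the familiar least-squares estimates, spelled out here for completeness:

a = \frac{n\sum_i x_i y_i - \sum_i x_i \sum_i y_i}{n\sum_i x_i^2 - \left(\sum_i x_i\right)^2}
  = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sum_i (x_i - \bar{x})^2},
\qquad b = \bar{y} - a\,\bar{x}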

11 2.4. Simple linear regression
my.reg <- function(x, y) {
  sx2 <- sum(x^2)
  sx  <- sum(x)
  sxy <- sum(x * y)
  sy  <- sum(y)
  A <- matrix(c(sx2, sx, sx, length(x)), 2, 2)  # coefficient matrix of the normal equations
  B <- matrix(c(sxy, sy))
  v <- solve(A, B)  # v[1] = slope a, v[2] = intercept b
  return(v)
}
# compare with the built-in: lm(y ~ x)
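The comment on the slide suggests comparing with lm(). A quick check (a sketch added here, reusing the example data from the earlier slides) confirms that solving the normal equations reproduces lm()'s coefficients:

```r
# Solve the 2x2 normal equations directly, as on the slide
my.reg <- function(x, y) {
  A <- matrix(c(sum(x^2), sum(x), sum(x), length(x)), 2, 2)
  B <- matrix(c(sum(x * y), sum(y)))
  solve(A, B)  # row 1 = slope a, row 2 = intercept b
}

x <- c(26, 25, 23, 27, 28, 25, 22, 26, 25, 23)
y <- c(54, 62, 51, 58, 63, 65, 59, 63, 65, 60)
v <- my.reg(x, y)
fit <- lm(y ~ x)
stopifnot(abs(v[1] - coef(fit)[["x"]]) < 1e-8,
          abs(v[2] - coef(fit)[["(Intercept)"]]) < 1e-8)
```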

12 3.1. Generation of correlated random numbers
Generating two sequences of random numbers with a given correlation is done in two simple steps:
1. Generate two sequences X and Y of uncorrelated, normally distributed random numbers.
2. Define a new sequence Z = ρX + √(1 − ρ²) Y.
This new sequence Z will have a correlation of ρ with the sequence X.

13 3.2. Generation of correlated random numbers
r2norm <- function(n = 100, rho = 0.5) {
  x <- rnorm(n)
  y <- rho * x + sqrt(1 - rho^2) * rnorm(n)
  return(data.frame(x = x, y = y))
}

http://cse.naro.affrc.go.jp/takezawa/r-tips/r/60.html
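An empirical check (a sketch added here, not from the original slides): with a large sample, the observed correlation between x and y should land close to the requested rho:

```r
r2norm <- function(n = 100, rho = 0.5) {
  x <- rnorm(n)
  y <- rho * x + sqrt(1 - rho^2) * rnorm(n)
  data.frame(x = x, y = y)
}

set.seed(1)                       # reproducible draw
d <- r2norm(100000, rho = 0.7)
stopifnot(abs(cor(d$x, d$y) - 0.7) < 0.02)  # empirical correlation is close to rho
```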

14 3.3. Scatter plot and histograms
scatterplot2 <- function(x, y, ...) {
  def.par <- par(no.readonly = TRUE)  # save graphics settings
  n <- length(x)
  xhist <- hist(x, sqrt(n), plot = FALSE)
  yhist <- hist(y, sqrt(n), plot = FALSE)
  top <- max(c(xhist$counts, yhist$counts))
  xrange <- c(min(x), max(x))
  yrange <- c(min(y), max(y))
  # 2x2 layout: scatter plot bottom-left, marginal histograms on top and right
  nf <- layout(matrix(c(2, 0, 1, 3), 2, 2, TRUE), c(3, 1), c(1, 3), TRUE)
  par(mar = c(3, 3, 1, 1))
  plot(x, y, xlim = xrange, ylim = yrange, xlab = "x", ylab = "y", ...)
  # abline(lm(y ~ x))  # optionally overlay the fitted regression line
  par(mar = c(0, 3, 1, 1))
  barplot(xhist$counts, axes = FALSE, ylim = c(0, top), space = 0,
          col = gray(0.95))
  par(mar = c(3, 0, 1, 1))
  barplot(yhist$counts, axes = FALSE, xlim = c(0, top), space = 0,
          col = gray(0.95), horiz = TRUE)
  par(def.par)  # restore graphics settings
}

15 3.4. Scatter plot and histograms (example output with r = 0.7)

16 3.5. 3D histogram
res <- r2norm(100000)
library(gregmisc)  # provides hist2d()
h2d <- hist2d(res$x, res$y, show = FALSE, same.scale = TRUE, nbins = c(50, 50))
Z <- h2d$counts / max(h2d$counts)
X <- h2d$x
Y <- h2d$y
library(rgl)
zlim <- c(0, max(h2d$counts))
zlen <- zlim[2] - zlim[1] + 1
colorlut <- cm.colors(zlen)         # height color lookup table
col <- colorlut[(Z - zlim[1] + 1)]  # assign colors to heights for each point
open3d()
surface3d(X, Y, 5 * Z, color = col, alpha = 0.65, back = "lines")
axes3d(c('x', 'y', 'z-'))
grid3d(c("x", "y+", "z"), at = NULL, col = "gray", n = 8, lwd = 1, lty = "solid")
title3d(main = "Correlation", sub = "", xlab = "X", ylab = "Y", zlab = "")
rgl.bringtotop()

