
Correlation coefficients and simple linear regression Chapter 7.1 ~ 7.6

Contents
1. Correlation coefficients: Pearson, Spearman
2. Simple linear regression
3. Generation of correlated random numbers; visualization

Example (figure omitted)

1.1. Pearson correlation coefficient

Pearson's correlation coefficient (r) between two variables is defined as the covariance of the two variables divided by the product of their standard deviations. For paired observations (X_i, Y_i), i = 1, ..., n:

$$r = \frac{\sum_{i=1}^{n}(X_i-\bar{X})(Y_i-\bar{Y})}{\sqrt{\sum_{i=1}^{n}(X_i-\bar{X})^2}\,\sqrt{\sum_{i=1}^{n}(Y_i-\bar{Y})^2}}$$

where X_i and Y_i are the i-th values of X and Y, and $\bar{X}$, $\bar{Y}$ are their sample means.

1.2. Pearson correlation coefficient

# Compute Pearson's r from the sums of squares and cross-products.
my.cor <- function(x, y) {
  mx <- mean(x)
  my <- mean(y)
  Sxx <- sum((x - mx)^2)            # sum of squares of x
  Syy <- sum((y - my)^2)            # sum of squares of y
  Sxy <- sum((x - mx) * (y - my))   # sum of cross-products
  r <- Sxy / sqrt(Sxx) / sqrt(Syy)
  return(r)
}
# built-in equivalent: cor(x, y)

Example:
x <- c(26,25,23,27,28,25,22,26,25,23)
y <- c(54,62,51,58,63,65,59,63,65,60)
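As a quick sanity check, the hand-rolled function should match R's built-in cor(); for the example data above, both return approximately 0.292 (the value is computed by hand here, so treat it as approximate):

my.cor(x, y)  # ~0.292
cor(x, y)     # same value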

1.3. Spearman correlation coefficient

The Spearman correlation coefficient is defined as the Pearson correlation coefficient between the ranked variables. For a sample of size n, the n raw scores X_i, Y_i are converted to ranks x_i, y_i, and ρ is computed from these. When all n ranks are distinct, this reduces to the well-known formula

$$\rho = 1 - \frac{6\sum_{i=1}^{n} d_i^2}{n(n^2-1)}, \qquad d_i = x_i - y_i,$$

where d_i is the difference between the two ranks of observation i; with ties, ρ is obtained by applying the Pearson formula directly to the ranks.

1.4. Spearman correlation coefficient

# Same computation as my.cor(), but optionally on ranks (Spearman).
my.cor2 <- function(x, y, method = c("pearson", "spearman")) {
  method <- match.arg(method)
  if (method == "spearman") {   # Spearman = Pearson on the ranks
    x <- rank(x)
    y <- rank(y)
  }
  mx <- mean(x)
  my <- mean(y)
  Sxx <- sum((x - mx)^2)
  Syy <- sum((y - my)^2)
  Sxy <- sum((x - mx) * (y - my))
  r <- Sxy / sqrt(Sxx) / sqrt(Syy)
  return(r)
}
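A brief usage sketch, reusing the x and y vectors from section 1.2; the result should agree with R's built-in cor() with method = "spearman":

my.cor2(x, y, method = "spearman")
cor(x, y, method = "spearman")   # same value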

2.1. Simple linear regression Simple linear regression fits a straight line through a set of n points so that the sum of squared residuals of the model (the vertical distances between the data points and the fitted line) is as small as possible.

2.2. Simple linear regression

Suppose there are n data points (x_i, y_i), where i = 1, 2, ..., n. The goal is to find the equation of the straight line ŷ = a + b x that minimizes the sum of squared residuals of the linear regression model. In other words, the numbers a and b solve the following minimization problem:

$$\min_{a,b}\; Q(a,b) = \sum_{i=1}^{n} (y_i - a - b x_i)^2$$

2.3. Simple linear regression

Setting the partial derivatives of Q(a, b) to zero provides the equations (the "normal equations") that determine the a and b minimizing the sum of squared residuals:

$$\frac{\partial Q}{\partial a} = -2\sum_{i=1}^{n}(y_i - a - b x_i) = 0, \qquad
\frac{\partial Q}{\partial b} = -2\sum_{i=1}^{n} x_i\,(y_i - a - b x_i) = 0$$

Rearranged, these read

$$b\sum x_i^2 + a\sum x_i = \sum x_i y_i, \qquad b\sum x_i + a\,n = \sum y_i,$$

which is the 2×2 linear system solved in the code below.

2.4. Simple linear regression

# Solve the normal equations as a 2x2 linear system A %*% v = B.
# v[1] is the slope b, v[2] is the intercept a.
my.reg <- function(x, y) {
  sx2 <- sum(x^2)
  sx  <- sum(x)
  sxy <- sum(x * y)
  sy  <- sum(y)
  A <- matrix(c(sx2, sx, sx, length(x)), 2, 2)
  B <- matrix(c(sxy, sy))
  v <- solve(A, B)   # v = (slope, intercept)
  return(v)
}
# built-in equivalent: lm(y ~ x)
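A short usage sketch, again with the x and y vectors from section 1.2; my.reg() should reproduce the coefficients of lm(), only in the opposite order (slope first, intercept second):

my.reg(x, y)      # slope, then intercept
coef(lm(y ~ x))   # intercept, then slope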

3.1. Generation of correlated random numbers

Generating two sequences of random numbers with a given correlation is done in two simple steps:
1. Generate two sequences X and Y of uncorrelated, standard normally distributed random numbers.
2. Define a new sequence $Z = \rho X + \sqrt{1-\rho^2}\,Y$.

This new sequence Z will have a correlation of ρ with the sequence X: since X and Y are uncorrelated with unit variance, Cov(Z, X) = ρ and Var(Z) = ρ² + (1 − ρ²) = 1.

3.2. Generation of correlated random numbers

# Generate n pairs (x, y) with population correlation rho.
r2norm <- function(n = 100, rho = 0.5) {
  x <- rnorm(n)
  y <- rho * x + sqrt(1 - rho^2) * rnorm(n)
  return(data.frame(x = x, y = y))
}
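A quick check of the construction (a sketch; the sample correlation will fluctuate around the target value):

set.seed(1)                     # for reproducibility
d <- r2norm(n = 10000, rho = 0.7)
cor(d$x, d$y)                   # close to 0.7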

3.3. Scatter plot and histograms

# Scatter plot of (x, y) with marginal histograms along the axes.
scatterplot2 <- function(x, y, ...) {
  def.par <- par(no.readonly = TRUE)   # save graphics settings
  n <- length(x)
  xhist <- hist(x, sqrt(n), plot = FALSE)
  yhist <- hist(y, sqrt(n), plot = FALSE)
  top <- max(c(xhist$counts, yhist$counts))
  xrange <- c(min(x), max(x))
  yrange <- c(min(y), max(y))
  # layout: main plot bottom-left, x-histogram on top, y-histogram on the right
  nf <- layout(matrix(c(2, 0, 1, 3), 2, 2, TRUE), c(3, 1), c(1, 3), TRUE)
  par(mar = c(3, 3, 1, 1))
  plot(x, y, xlim = xrange, ylim = yrange, xlab = "x", ylab = "y", ...)
  #abline(lm(y ~ x))
  par(mar = c(0, 3, 1, 1))
  barplot(xhist$counts, axes = FALSE, ylim = c(0, top), space = 0,
          col = gray(0.95))
  par(mar = c(3, 0, 1, 1))
  barplot(yhist$counts, axes = FALSE, xlim = c(0, top), space = 0,
          col = gray(0.95), horiz = TRUE)
  par(def.par)                         # restore graphics settings
}
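A usage sketch combining the two helpers defined above (extra arguments such as main are passed through to plot()):

d <- r2norm(n = 500, rho = 0.5)
scatterplot2(d$x, d$y, main = "rho = 0.5")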

3.4. Scatter plot and histograms (figure omitted: scatterplot2() output for a simulated sample, annotated with its sample correlation r)

3.5. 3D histogram

# Bin (x, y) into a 2D histogram and render the counts as a 3D surface.
res <- r2norm(100000)

library(gregmisc)   # provides hist2d() (also available from the gplots package)
h2d <- hist2d(res$x, res$y, show = FALSE, same.scale = TRUE, nbins = c(50, 50))
Z <- h2d$counts / max(h2d$counts)   # normalized bin counts (surface height)
X <- h2d$x
Y <- h2d$y

library(rgl)
zlim <- c(0, max(h2d$counts))
zlen <- zlim[2] - zlim[1] + 1
colorlut <- cm.colors(zlen)                 # height color lookup table
col <- colorlut[h2d$counts - zlim[1] + 1]   # map each bin's count to a color
# (the original indexed colorlut with the normalized Z, which collapses almost
#  all bins onto the first color; indexing with the raw counts matches zlim)
open3d()
surface3d(X, Y, 5 * Z, color = col, alpha = 0.65, back = "lines")
axes3d(c('x', 'y', 'z-'))
grid3d(c("x", "y+", "z"), at = NULL, col = "gray", n = 8, lwd = 1, lty = "solid")
title3d(main = "Correlation", sub = "", xlab = "X", ylab = "Y", zlab = "")
rgl.bringtotop()