Biostatistics Lecture 17 6/15 & 6/16/2015

Chapter 17 – Correlation & Regression
Correlation (Pearson's correlation coefficient)
Linear Regression
Multiple Regression

Introduction
We want to determine whether there is an association between two variables (one independent and one dependent). If so, what is the association? Can we use it to predict the weight of a male bear given his body length?
Lengths and Weights of Male Bears:
x (Length): 53.0  67.5  72.0  72.0  73.5  68.5  73.0  37.0
y (Weight): 80  344  416  348  262  360  332  34

Correlation
A "correlation" analysis helps determine whether there is a statistically significant association between two variables. A scatter plot helps us visually assess whether the paired data (x, y) might be correlated. Such a correlation could be linear or take other nonlinear forms (such as exponential).

Linear correlation
The linear correlation coefficient r measures the strength of the linear association between the paired quantitative x- and y-values in a sample. It is also called the Pearson product-moment correlation coefficient. r ranges between -1 and 1.
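For reference, the sample formula for r (the same expression evaluated by the MATLAB snippets below) can be written as:

\[ r = \frac{n\sum xy - \left(\sum x\right)\left(\sum y\right)}{\sqrt{n\sum x^2 - \left(\sum x\right)^2}\,\sqrt{n\sum y^2 - \left(\sum y\right)^2}} \]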

Basic requirements before computing r
1. The paired data (x, y) are randomly sampled.
2. The scatter plot should be visually close to a straight line.
3. Outliers should be removed first.

Example 1
Given the following 4 paired data, compute the correlation coefficient r.
x: 1  1  3  5
y: 2  8  6  4
It looks like a low negative correlation. r = -0.1348

>> x=[1 1 3 5]; y=[2 8 6 4]; n=4;
>> r=(n*sum(x.*y)-sum(x)*sum(y))/sqrt((n*sum(x.*x)-sum(x)^2)*(n*sum(y.*y)-sum(y)^2))

r =

   -0.1348

>>

Interpreting r
r is between -1 (perfect negative correlation) and +1 (perfect positive correlation). r = 0 means no correlation. Then what defines a "strong" correlation? Should the absolute value of r be no less than some critical value? Is r a random variable having its own probability density function?

Hypothesis testing for r
H0: no significant linear correlation.
Test statistic: t = r/sqrt((1-r^2)/(n-2))
This is a 2-tailed t test with DF = n-2 and α = 0.05 (usually). The p-value can be computed from the computed t on a t distribution with that DF. Reject H0 if p <= 0.05.

Back to example 1
A critical t value that cuts off 0.025 of the left tail of t with DF = 2 is tinv(0.025, 2) = -4.3027.
The t-statistic computed from the previously computed r = -0.1348 is t = -0.1924, which is not as extreme or more extreme than -4.3027. We thus do not reject the null hypothesis of no significant linear correlation.
P-value = 2*tcdf(-0.1924, 2) = 0.8652, much greater than α = 0.05.

>> x=[1 1 3 5]; y=[2 8 6 4];
>> n=4;
>> r=(n*sum(x.*y)-sum(x)*sum(y))/sqrt((n*sum(x.*x)-sum(x)^2)*(n*sum(y.*y)-sum(y)^2))

r =

   -0.1348

>> t=r/sqrt((1-r^2)/(n-2))

t =

   -0.1924

>> 2*tcdf(t,2)

ans =

    0.8652

Example 2
Find the linear correlation coefficient r for the following data. Determine whether the correlation is significant or not by computing a critical t value and a p-value.
Lengths and Weights of Male Bears:
x (Length): 53.0  67.5  72.0  72.0  73.5  68.5  73.0  37.0
y (Weight): 80  344  416  348  262  360  332  34

The computed r = 0.8974. The computed t-statistic = 4.9807, with DF = 8-2 = 6.
A critical t value cutting off 0.025 of the right tail of t with DF = 6 is tinv(0.975, 6) = 2.4469. This is smaller than 4.9807, so our t-statistic is more extreme than expected under the null hypothesis.
P-value = 2*(1-tcdf(4.9807, 6)) = 0.0025.
We thus reject the null hypothesis, suggesting that a significant linear correlation exists between the length and weight of male bears.

>> x=[53.0 67.5 72.0 72.0 73.5 68.5 73.0 37.0]; y=[80 344 416 348 262 360 332 34]; n=8;
>> r=(n*sum(x.*y)-sum(x)*sum(y))/sqrt((n*sum(x.*x)-sum(x)^2)*(n*sum(y.*y)-sum(y)^2))

r =

    0.8974

>> t=r/sqrt((1-r^2)/(n-2))

t =

    4.9807

>> 2*(1-tcdf(t,6))

ans =

    0.0025

>>

Using MATLAB's "corrcoef" function

>> x=[53.0 67.5 72.0 72.0 73.5 68.5 73.0 37.0];
>> y=[80 344 416 348 262 360 332 34];
>> [R, P] = corrcoef(x,y)

R =

    1.0000    0.8974
    0.8974    1.0000

P =

    1.0000    0.0025
    0.0025    1.0000

R(1,2) is the Pearson coefficient; P(1,2) is the p-value supporting the linear correlation at α = 0.05.

Linear Regression
The goal is to find a graph and an equation of the straight line that represents the association. The straight line is called the "regression line". The equation is called the "regression equation".

It's all about finding the slope m and the y-intercept c of the straight line y = mx + c.
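In standard textbook notation (consistent with the correlation formula above), the least-squares estimates are:

\[ m = \frac{n\sum xy - \left(\sum x\right)\left(\sum y\right)}{n\sum x^2 - \left(\sum x\right)^2}, \qquad c = \bar{y} - m\,\bar{x} \]

The lecture computes these with MATLAB's polyfit below.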

Example 3
Find the regression equation for the following data. Predict the weight of a bear with length x = 71.0.
Lengths and Weights of Male Bears:
x (Length): 53.0  67.5  72.0  72.0  73.5  68.5  73.0  37.0
y (Weight): 80  344  416  348  262  360  332  34

MATLAB's "polyfit" (based on minimizing the sum of squared errors) will serve the purpose.

>> x=[53.0 67.5 72.0 72.0 73.5 68.5 73.0 37.0];
>> y=[80 344 416 348 262 360 332 34];
>> polyfit(x,y,1)

ans =

    9.6598 -351.6599

>>

The equation is y = 9.6598x – 351.6599.

MATLAB's "polyval" can be used to evaluate a polynomial at a given value.

>> x=[53.0 67.5 72.0 72.0 73.5 68.5 73.0 37.0];
>> y=[80 344 416 348 262 360 332 34];
>> polyfit(x,y,1)

ans =

    9.6598 -351.6599

>> polyval(polyfit(x,y,1), 71.0)

ans =

  334.1849

>>
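As a visual check (a minimal sketch reusing the bear data above), the fitted line can be overlaid on the scatter plot:

x = [53.0 67.5 72.0 72.0 73.5 68.5 73.0 37.0];   % lengths
y = [80 344 416 348 262 360 332 34];             % weights
p = polyfit(x, y, 1);              % p(1) = slope, p(2) = intercept
xs = linspace(min(x), max(x));     % evaluation grid for the line
plot(x, y, 'o', xs, polyval(p, xs), '-')
xlabel('Length'), ylabel('Weight')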

Multiple regression
Two or more independent variables.
Data from Male Bears: y (Weight), x2 (Age), x3 (Head Length), x4 (Head Width), x5 (Neck), x6 (Length), x7 (Chest)
Model: y = b1 + b2*x2 + b3*x3 + … + b7*x7

Example 4
Using only head length (x3) and body length (x6) from the male-bear data, find b1, b3 and b6 in the model
y = b1 + b3*x3 + b6*x6

Substituting each bear's data into b1 + x3*b3 + x6*b6 = y gives one equation per bear:

b1 + …*b3 + 53.0*b6 = 80
b1 + …*b3 + 67.5*b6 = 344
b1 + …*b3 + 72.0*b6 = 416
b1 + …*b3 + 72.0*b6 = 348
b1 + …*b3 + 73.5*b6 = 262
b1 + …*b3 + 68.5*b6 = 360
b1 + …*b3 + 73.0*b6 = 332
b1 + …*b3 + 37.0*b6 = 34

(the … slots hold the head-length values x3), or, in matrix form, A X = y.

A X = y
[8 by 3][3 by 1] = [8 by 1]
The problem is that matrix A is not square. We cannot find its inverse and solve the equation as X = A^(-1) y.

X = pinv(A) y
[3 by 1] = [3 by 8][8 by 1]
I need a "3 by 8" matrix that serves like an inverse of A. We call it the pseudo-inverse of matrix A: pinv(A) = (A^T A)^(-1) A^T. See the Appendix for the definition of pinv(A).
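For completeness, the standard least-squares derivation behind this formula: minimizing the squared residual leads to the normal equations, whose solution is exactly pinv(A) applied to y.

\[ \min_X \lVert AX - y \rVert^2 \;\Longrightarrow\; A^{T}A\,X = A^{T}y \;\Longrightarrow\; X = (A^{T}A)^{-1}A^{T}y = \operatorname{pinv}(A)\,y \]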

>> y=[80 344 416 348 262 360 332 34]';
>> x3=[ ]';   % head lengths
>> x6=[53.0 67.5 72.0 72.0 73.5 68.5 73.0 37.0]';
>> A=[ones(size(x3)) x3 x6]

A = (an 8 by 3 matrix whose columns are 1, x3 and x6)

Note that y, x3 and x6 must be column vectors.

>> A'*A

ans = (a 3 by 3 matrix: [3 by 8][8 by 3] = [3 by 3])

>> pinvA=inv(ans)*A'

pinvA = (a 3 by 8 matrix: [3 by 3][3 by 8] = [3 by 8])

>> b=pinvA*y

b = (a 3 by 1 vector: [3 by 8][8 by 1] = [3 by 1])

Here pinvA implements pinv(A) = (A^T A)^(-1) A^T, and the fitted model is y = b(1) + b(2)*x3 + b(3)*x6.
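A side note on standard MATLAB practice (not from the slides): the backslash operator solves the same over-determined system directly and avoids forming the inverse explicitly:

b = A \ y;   % for full-rank A this equals pinv(A)*y, computed more stably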

MATLAB's "regress"

>> y=[80 344 416 348 262 360 332 34]';
>> x3=[ ]';   % head lengths
>> x6=[53.0 67.5 72.0 72.0 73.5 68.5 73.0 37.0]';
>> A=[ones(size(x3)) x3 x6];
>> b=regress(y, A)

b = (the same 3 by 1 coefficient vector as before)

>>

Note that y, x3 and x6 must be column vectors in order to build the 8 by 3 matrix A.

APPENDIX – LEAST-SQUARES METHOD FROM LINEAR ALGEBRA

Least-Squares Curves
We have seen that the system Ax = y of n equations in n variables, where A is invertible, has the unique solution x = A^(-1) y.
However, if the system has n equations and m variables, with n > m, the system does not, in general, have a solution, and it is said to be over-determined. A is not a square matrix, thus A^(-1) does not exist.
We will introduce a matrix called the pseudoinverse of A, denoted pinv(A), that leads to a least-squares solution x = pinv(A) y for an over-determined system. This is not a true solution, but in some sense the closest we can get to a true solution.

DEFINITION: pseudoinverse
Let A be a matrix. Then the matrix (A^T A)^(-1) A^T is called the pseudoinverse of A and is denoted pinv(A).
Example 1: Find the pseudoinverse of A = … Solution: …
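A minimal MATLAB sketch of this definition, using a hypothetical 3-by-2 matrix (not the one from the slide); it checks that the formula agrees with MATLAB's built-in pinv:

A = [1 1; 1 2; 1 3];    % hypothetical full-column-rank (rank 2) matrix
P1 = inv(A'*A)*A';      % pseudoinverse from the definition
P2 = pinv(A);           % MATLAB's built-in (SVD-based) pseudoinverse
norm(P1 - P2)           % essentially zero, up to rounding error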

System of Equations Ax = y
Let Ax = y be a system of n linear equations in m variables with n > m, where A is of rank m.
(1) This system has a least-squares solution x = pinv(A) y.
(2) If the system has a unique solution, the least-squares solution is that unique solution.
(3) If the system is over-determined, the least-squares solution is the closest we can get to a true solution.
(4) The system cannot have many solutions.

Example 2
Find the least-squares solution to the following over-determined system of equations (n = 3 equations in m = 2 variables), and sketch the solution.
Solution: We have …

Then the least-squares solution is x = pinv(A) y = …
[Figure: the least-squares solution shown as a point.]

Least-Squares Curves
The least-squares line or curve can be found by solving an over-determined system. It is found by minimizing the sum d1^2 + d2^2 + … + dn^2, where di is the vertical deviation of the i-th data point from the line or curve. That's where the name "least squares" comes from.
[Figures: Least-Squares Line; Least-Squares Curve]
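Written out for the straight-line case (standard notation): fitting y = a + bx means choosing a and b to minimize

\[ \sum_{i=1}^{n} d_i^2 = \sum_{i=1}^{n} \bigl( y_i - (a + b x_i) \bigr)^2 \]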

Example 3
Find the least-squares line for the following data points: (1, 1), (2, 2.4), (3, 3.6), (4, 4).
Solution: Let the equation of the line be y = a + bx. Substituting these points into the equation gives an over-determined system:
a + b = 1
a + 2b = 2.4
a + 3b = 3.6
a + 4b = 4
To find the least-squares solution, we compute pinv(A) for
A = [1 1; 1 2; 1 3; 1 4], y = [1; 2.4; 3.6; 4].

And the least-squares solution is [a; b] = pinv(A) y = [0.2; 1.02]. Thus a = 0.2, b = 1.02, and the equation is y = 0.2 + 1.02x.
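As a quick cross-check with the polyfit function used earlier in the lecture (polyfit returns the slope first):

x = [1 2 3 4]; y = [1 2.4 3.6 4];
polyfit(x, y, 1)    % returns 1.0200  0.2000, i.e., y = 0.2 + 1.02x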

Example 4
Find the least-squares parabola for the following data points: (1, 7), (2, 2), (3, 1), (4, 3).
Solution: Let the parabola be y = a + bx + cx^2. Substituting the data points gives the over-determined system:
a + b + c = 7
a + 2b + 4c = 2
a + 3b + 9c = 1
a + 4b + 16c = 3
We then solve [a; b; c] = pinv(A) y with
A = [1 1 1; 1 2 4; 1 3 9; 1 4 16], y = [7; 2; 1; 3].

Finally we have the solution [a; b; c] = [15.25; -10.05; 1.75]. Thus a = 15.25, b = -10.05, c = 1.75, or y = 15.25 – 10.05x + 1.75x^2.
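The same cross-check works at degree 2 (polyfit returns coefficients from the highest power down):

x = [1 2 3 4]; y = [7 2 1 3];
polyfit(x, y, 2)    % returns 1.7500  -10.0500  15.2500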