Correlation and Regression


Correlation and Regression

Spearman's rank correlation An alternative to Pearson's correlation that makes fewer assumptions It still measures the strength and direction of the association between two variables It uses the ranks of the data instead of the raw values
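A minimal sketch of the idea on this slide: rank each variable (using midranks for ties), then apply the ordinary Pearson correlation formula to the ranks. The data below are made up purely for illustration.

```python
from math import sqrt

def midranks(values):
    """Rank values from 1..n, averaging the ranks of tied values."""
    ordered = sorted(values)
    # First 0-based position of v plus the average offset of its tie group.
    return [ordered.index(v) + (ordered.count(v) + 1) / 2 for v in values]

def pearson_r(x, y):
    """Pearson correlation coefficient r."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sp = sum((a - mx) * (b - my) for a, b in zip(x, y))
    ssx = sum((a - mx) ** 2 for a in x)
    ssy = sum((b - my) ** 2 for b in y)
    return sp / sqrt(ssx * ssy)

def spearman_rs(x, y):
    """Spearman's r_s: Pearson's r computed on the ranks."""
    return pearson_r(midranks(x), midranks(y))

# A monotonic but nonlinear relationship: r_s = 1 even though r < 1.
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]
```

Because only the ordering of the data matters, r_s is unaffected by any monotonic transformation of either variable.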

Example: Spearman's r_s. Versions of the trick as described:
1. Boy climbs up rope, climbs down again
2. Boy climbs up rope, seems to vanish, re-appears at top, climbs down again
3. Boy climbs up rope, seems to vanish at top
4. Boy climbs up rope, vanishes at top, reappears somewhere the audience was not looking
5. Boy climbs up rope, vanishes at top, reappears in a place which has been in full view

Hypotheses H0: The difficulty of the described trick is not correlated with the time elapsed since it was observed. HA: The difficulty of the described trick is correlated with the time elapsed since it was observed.

East-Indian Rope Trick

[Table: years elapsed, impressiveness score, rank of years elapsed, rank of impressiveness]

East-Indian Rope Trick Using Table H with n = 21 and α = 0.05: the observed r_s exceeds the critical value, so P < 0.05 and we reject H0

Spearman’s Rank Correlation - large n For large n (> 100), you can use the normal correlation coefficient test on the ranks: t = r_s √(n − 2) / √(1 − r_s²). Under H0, t has a t-distribution with n − 2 d.f.

Measurement Error and Correlation Measurement error decreases the apparent correlation between two variables You can correct for this effect - see text

Species are not independent data points

Independent contrasts

Quick Reference Guide - Correlation Coefficient What is it for? Measuring the strength of a linear association between two numerical variables What does it assume? Bivariate normality and random sampling Parameter: ρ Estimate: r Formula: r = Σ(X − X̄)(Y − Ȳ) / √( Σ(X − X̄)² Σ(Y − Ȳ)² )

Quick Reference Guide - t-test for zero linear correlation What is it for? To test the null hypothesis that the population parameter, ρ, is zero What does it assume? Bivariate normality and random sampling Test statistic: t = r √(n − 2) / √(1 − r²) Null distribution: t with n − 2 degrees of freedom
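The test statistic in this guide can be sketched in a few lines; the r = 0.5, n = 27 example below is illustrative only.

```python
from math import sqrt

def t_for_correlation(r, n):
    """t statistic for H0: rho = 0, compared to a t-distribution
    with df = n - 2."""
    return r * sqrt(n - 2) / sqrt(1 - r ** 2)

# e.g. a sample correlation of r = 0.5 from n = 27 points (df = 25):
t = t_for_correlation(0.5, 27)
```

The resulting t would then be compared against the critical value of the t-distribution with n − 2 d.f. from a table.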

T-test for correlation (schematic): take a sample and compute the test statistic t; under the null hypothesis ρ = 0, compare t to its null distribution (t with n − 2 d.f.) to see how unusual it is. If P < 0.05, reject H0; if P > 0.05, fail to reject H0.

Quick Reference Guide - Spearman’s Rank Correlation What is it for? To test zero correlation between the ranks of two variables What does it assume? Linear relationship between ranks and random sampling Test statistic: r s Null distribution: See table; if n>100, use t-distribution Formulae: Same as linear correlation but based on ranks

Spearman’s rank correlation (schematic): take a sample and compute the test statistic r_s; under the null hypothesis ρ_s = 0, compare r_s to its null distribution (Spearman’s rank Table H). If P < 0.05, reject H0; if P > 0.05, fail to reject H0.

Quick Reference Guide - Independent Contrasts What is it for? To test for correlation between two variables when data points come from related species What does it assume? A linear relationship between variables; a correct phylogeny; and that the difference between pairs of species in both X and Y has a normal distribution with zero mean and variance proportional to the time since divergence

Regression A method for predicting the value of one numerical variable from that of another: predict the value of Y from the value of X Example: predict the size of a dinosaur from the length of one tooth

Linear Regression Draw a straight line through a scatter plot Use the line to predict Y from X

Linear Regression Formula Y = α + βX α = intercept –The predicted value of Y when X is zero β = slope – the rate of change in Y per unit of change in X α and β are parameters

Interpretations of α & β [Figure: regression lines illustrating positive β, negative β, β = 0, and higher vs. lower α]

Linear Regression Formula Ŷ = a + bX a = estimated intercept –The predicted value of Y when X is zero b = estimated slope – the rate of change in Y per unit of change in X

How to draw the line? [Figure: scatter plot of Y against X with a fitted line; the residuals (Y_i − Ŷ_i) are the vertical distances from each point to the line]

Least-squares Regression Draw the line that minimizes the sum of the squared residuals from the line The residual is (Y_i − Ŷ_i) Minimize the sum: SS_residual = Σ(Y_i − Ŷ_i)²

Formulae for Least-Squares Regression The slope and intercept that minimize the sum of squared residuals are: b = Σ(X − X̄)(Y − Ȳ) / Σ(X − X̄)² (the sum of products divided by the sum of squares for X), and a = Ȳ − b X̄
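These formulae translate directly into code. A minimal sketch, using made-up toy data that lie exactly on the line Y = 1 + 2X:

```python
def least_squares(x, y):
    """Least-squares intercept a and slope b from the slide formulae."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    # Sum of products and sum of squares for X.
    sum_products = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    ss_x = sum((xi - mx) ** 2 for xi in x)
    b = sum_products / ss_x     # slope
    a = my - b * mx             # intercept
    return a, b

a, b = least_squares([0, 1, 2, 3], [1, 3, 5, 7])
```

With data that fall exactly on a line, the fit recovers the intercept and slope exactly; with noisy data it returns the line minimizing SS_residual.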

Example: How old is that lion? X = proportion black Y = age in years

Example: How old is that lion?

X = proportion black, Y = age in years Σ(X − X̄)² = 1.222 Σ(X − X̄)(Y − Ȳ) = 13.012 (the means X̄ and Ȳ and the value of Σ(Y − Ȳ)² appear on the slide graphic but were not captured in the transcript)

A certain lion has a nose with 0.4 proportion of black. Estimate the age of that lion.

Standard error of the slope SE_b = √( MS_residual / Σ(X − X̄)² ), where MS_residual = Σ(Y_i − Ŷ_i)² / (n − 2) and Σ(X − X̄)² is the sum of squares for X

Lion Example, continued…

Confidence interval for the slope: b ± t SE_b, where t is the two-tailed critical value of the t-distribution with n − 2 d.f.
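A sketch of the slope standard error and confidence interval from the slide formulae, on made-up toy data. The critical value below is the two-tailed 5% t value for df = 1 from a standard t-table; for real data you would look up the value for your own n − 2.

```python
from math import sqrt

def slope_and_se(x, y):
    """Least-squares slope b and its standard error SE_b."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    ss_x = sum((xi - mx) ** 2 for xi in x)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / ss_x
    a = my - b * mx
    # Residual mean square: SS_residual / (n - 2).
    ss_resid = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    ms_resid = ss_resid / (n - 2)
    return b, sqrt(ms_resid / ss_x)

b, se = slope_and_se([0, 1, 2], [0, 1, 3])
t_crit = 12.71                      # t_{0.05(2)} for df = n - 2 = 1
ci = (b - t_crit * se, b + t_crit * se)
```

The tiny n here is only to keep the arithmetic checkable by hand; such a wide interval (df = 1) has little practical power.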

Lion Example, continued…

Predicting Y from X What is our confidence for predicting Y from X? Two types of predictions: What is the mean Y for each value of X? –Confidence bands What is a particular individual Y at each value of X? –Prediction intervals

Predicting Y from X Confidence bands: measure the precision of the predicted mean Y for each value of X Prediction intervals: measure the precision of predicted single Y values for each value of X

Predicting Y from X [Figure: left panel shows confidence bands; right panel shows the prediction interval]

Predicting Y from X Confidence bands: how confident can we be about the regression line? Prediction interval: how confident can we be about the predicted values?

Testing Hypotheses about a Slope t-test for regression slope Ho: There is no linear relationship between X and Y (β = 0) Ha: There is a linear relationship between X and Y (β ≠ 0)

Testing Hypotheses about a Slope Test statistic: t = b / SE_b Null distribution: t with n − 2 d.f.

Lion Example, continued… df = n − 2 = 32 − 2 = 30 Critical value: 2.04; the observed t exceeds 2.04, so we reject the null hypothesis and conclude that β ≠ 0

Testing Hypotheses about a Slope – ANOVA approach

Source of variation   Sum of squares   df      Mean squares    F
Regression            SS_regression    1       MS_regression   MS_regression / MS_residual
Residual              SS_residual      n − 2   MS_residual
Total                 SS_total         n − 1

Lion Example, continued… ANOVA table for the lion regression: the regression row has P < 0.001 (the numeric sums of squares, mean squares, and F were not captured in the transcript)

Testing Hypotheses about a Slope – R² R² measures the fit of a regression line to the data: it gives the proportion of variation in Y that is explained by variation in X R² = SS_regression / SS_total
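A sketch of R², computed here via the equivalent form 1 − SS_residual / SS_total (since SS_total = SS_regression + SS_residual). Toy data, made up for illustration.

```python
def r_squared(x, y):
    """R^2 = SS_regression / SS_total for a least-squares fit."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) \
        / sum((xi - mx) ** 2 for xi in x)
    a = my - b * mx
    ss_total = sum((yi - my) ** 2 for yi in y)
    ss_resid = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    return 1 - ss_resid / ss_total
```

R² is 1 when every point lies on the fitted line and approaches 0 when the line explains none of the variation in Y.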

Lion Example, Continued

Assumptions of Regression At each value of X, there is a population of Y values whose mean lies on the “true” regression line At each value of X, the distribution of Y values is normal The variance of Y values is the same at all values of X At each value of X the Y measurements represent a random sample from the population of Y values

Detecting Linearity Make a scatter plot Does it look like a curved line would fit the data better than a straight one?

Non-linear relationship: Number of fish species vs. Size of desert pool

Taking the log of area:

Detecting non-normality and unequal variance These are best detected with a residual plot Plot the residuals (Y_i − Ŷ_i) against X Look for: –a symmetric cloud of points –little noticeable curvature –equal variance above and below the line
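Computing the residuals that such a plot displays is straightforward; a sketch on made-up toy data. A handy sanity check is that least-squares residuals always sum to zero.

```python
def fit_residuals(x, y):
    """Residuals (Y_i - Yhat_i) from a least-squares fit, in data order."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) \
        / sum((xi - mx) ** 2 for xi in x)
    a = my - b * mx
    return [yi - (a + b * xi) for xi, yi in zip(x, y)]

res = fit_residuals([0, 1, 2, 3], [1, 2, 2, 5])
```

Plotting `res` against X (with any plotting tool) and inspecting it for symmetry, curvature, and constant spread is the diagnostic the slide describes.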

Residual plots help assess assumptions [Figure: original data and its residual plot]

Transformed data [Figure: log-transformed data and its residual plot]

What if the relationship is not a straight line? Transformations Non-linear regression

Transformations Some (but not all) nonlinear relationships can be made linear with a suitable transformation Most common – log transform Y, X, or both
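A sketch of why the log transform helps, in the spirit of the species-area example above: when Y grows linearly in log(X), correlating Y with log10(X) gives a straighter relationship than using X directly. The data below are made up so that Y is exactly linear in log10(area).

```python
from math import log10, sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient r."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sp = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sp / sqrt(sum((a - mx) ** 2 for a in x)
                     * sum((b - my) ** 2 for b in y))

area = [1, 10, 100, 1000]
species = [1, 3, 5, 7]          # exactly linear in log10(area)
r_raw = pearson_r(area, species)
r_log = pearson_r([log10(a) for a in area], species)
```

The raw correlation is positive but well below 1 because the relationship is curved; after the transform it is a perfect line. Not every curve can be straightened this way, which is why the slide says "some (but not all)".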