CORRELATION & REGRESSION


CORRELATION & REGRESSION Chapter 9

Correlation vs. Regression This chapter will speak of both correlations and regressions. Both use similar mathematical procedures to provide a measure of relation: the degree to which two continuous variables vary together, or covary. The term regression is used when 1) one of the variables is a fixed variable, and 2) the end goal is to use the measure of relation to predict values of the random variable based on values of the fixed variable.

Correlation vs. Regression Examples: In this class, height and ratings of physical attractiveness (both random variables) vary across individuals. We could ask, “What is the correlation between height and these ratings in our class?” Essentially, we are asking, “As height increases, is there any systematic increase (positive correlation) or decrease (negative correlation) in one’s rating of their own attractiveness?”

Correlation vs. Regression Examples: Alternatively, we could do an experiment in which the experimenter compliments a subject on their appearance one to eight times prior to obtaining a rating (note that ‘number of compliments’ is a fixed variable). We could now ask, “Can we predict a person’s rating of their attractiveness based on the number of compliments they were given?”

Scatterplots The first way to get some idea about a possible relation between two variables is to do a scatterplot of the variables. Let’s consider the first example discussed previously where we were interested in the possible relation between height and ratings of physical attractiveness.

Scatterplots The following is a sample of the data from our class as it pertains to this issue:

Scatterplots We can create a scatterplot of these data by simply plotting one variable against the other. For these data, the correlation works out to 0.146235, or about +0.15.
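
As a concrete illustration (not part of the original slides), here is a minimal Python sketch of such a scatterplot; the height and rating values are made-up placeholders, since the class data are not reproduced in this transcript, and np.corrcoef is used to report the Pearson correlation.

import numpy as np
import matplotlib.pyplot as plt

# Illustrative values only -- NOT the class data from the slides.
height = np.array([61, 63, 64, 66, 67, 68, 69, 70, 71, 72, 74, 75])
rating = np.array([6, 7, 5, 8, 6, 7, 9, 5, 8, 7, 6, 8])

r = np.corrcoef(height, rating)[0, 1]   # Pearson correlation between the two variables
plt.scatter(height, rating)             # one point per person: height on X, rating on Y
plt.xlabel("Height (inches)")
plt.ylabel("Attractiveness rating")
plt.title(f"r = {r:.2f}")
plt.show()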

Scatterplots Correlations range from -1 (perfect negative relation) through 0 (no relation) to +1 (perfect positive relation). We’ll see exactly how to calculate these in a moment, but the scatterplots would look like...

Scatterplots

Covariance The first step in calculating a correlation coefficient is to quantify the covariance between two variables. For the sake of an example, consider the height and weight variables from our class data set...

Covariance We’ll just focus on the first 12 subjects’ data for now.

Covariance The covariance of these variables is computed as: covXY = Σ(X – X̄)(Y – Ȳ) / (N – 1).

Covariance The covariance formula should look familiar to you: if all the Ys were exchanged for Xs, the covariance formula would be the variance formula. Note what this formula is doing, however: it is capturing the degree to which pairs of points systematically vary around their respective means.

Covariance If paired X and Y values tend to both be above or below their means at the same time, this will lead to a high positive covariance. However, if the paired X and Y values tend to be on opposite sides of their respective means, this will lead to a high negative covariance. If there are no systematic tendencies of the sort mentioned above, the covariance will tend towards zero.

Covariance To make life easier, there is also a computationally more workable version of the covariance formula: covXY = [ΣXY – (ΣX)(ΣY)/N] / (N – 1).

Covariance For our height versus weight example, then: The covariance itself gives us little information about the relation we are interested in, because it is sensitive to the standard deviations of X and Y. It must be transformed (standardized) before it is useful. Hence...
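
For readers following along in software, here is a minimal Python sketch (not from the slides) of both versions of the covariance formula, the definitional one and the computational one; the height/weight arrays are placeholders rather than the 12 subjects from the class data set.

import numpy as np

# Illustrative values only -- NOT the class height/weight data.
x = np.array([61.0, 64, 65, 66, 68, 68, 69, 70, 71, 72, 74, 75])              # heights
y = np.array([115.0, 130, 125, 140, 150, 145, 155, 160, 165, 170, 180, 190])  # weights
n = len(x)

# Definitional formula: covXY = sum((X - Xbar)(Y - Ybar)) / (N - 1)
cov_def = np.sum((x - x.mean()) * (y - y.mean())) / (n - 1)

# Computational formula: covXY = (sum(XY) - sum(X)*sum(Y)/N) / (N - 1)
cov_comp = (np.sum(x * y) - np.sum(x) * np.sum(y) / n) / (n - 1)

print(cov_def, cov_comp)           # identical up to floating-point error
print(np.cov(x, y, ddof=1)[0, 1])  # NumPy's built-in sample covariance agrees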

The Pearson Product-Moment Correlation Coefficient (r) The Pearson Product-Moment Correlation Coefficient, r, is computed simply by standardizing the covariance estimate as follows: r = covXY / (sX sY). This results in r values ranging from -1.0 to +1.0, as discussed earlier.

The Pearson Product-Moment Correlation Coefficient (r) So, if we apply this to the example used earlier...
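
A minimal sketch of the standardization step in Python (again with placeholder data, not the class values), checked against NumPy's built-in correlation:

import numpy as np

# Illustrative values only.
x = np.array([61.0, 64, 65, 66, 68, 68, 69, 70, 71, 72, 74, 75])
y = np.array([115.0, 130, 125, 140, 150, 145, 155, 160, 165, 170, 180, 190])

cov_xy = np.cov(x, y, ddof=1)[0, 1]                    # sample covariance
r = cov_xy / (np.std(x, ddof=1) * np.std(y, ddof=1))   # r = covXY / (sX * sY)
print(r, np.corrcoef(x, y)[0, 1])                      # the two values match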

Adjusted r Unfortunately, the r we measure using our sample is not an unbiased estimator of the population correlation coefficient ρ (rho). We can correct for this using the adjusted correlation coefficient, which is computed as follows: radj = √[1 – (1 – r²)(N – 1)/(N – 2)].

Adjusted r So, for our example:
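
A minimal sketch of the adjustment in Python, assuming the formula above; the helper name adjusted_r and the example values are invented for illustration, and reporting 0 when the term under the square root goes negative is a common convention rather than something shown on the slides.

import numpy as np

def adjusted_r(r, n):
    # Estimate of rho adjusted for sample size (requires n > 2).
    val = 1 - (1 - r**2) * (n - 1) / (n - 2)
    return np.sqrt(val) if val > 0 else 0.0   # convention: report 0 if the term is negative

print(adjusted_r(0.15, 12))   # a weak r from a small sample shrinks all the way to 0
print(adjusted_r(0.60, 30))   # a larger r from a larger sample shrinks only slightly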

The Regression Line Often scatter plots will include a ‘regression line’ that overlays the points in the graph:

The Regression Line The regression line represents the best prediction of the variable on the Y axis (Weight) for each point along the X axis (Height). For example, my (Marty’s) data is not depicted in the graph. But if I tell you that I am about 72 inches tall, you can use the graph to predict my weight.

The Regression Line Going back to your high school days, you perhaps recall that any straight line can be depicted by an equation of the form Ŷ = bX + a, where Ŷ is the predicted value of Y, b is the slope of the line, and a is the intercept.

The Regression Line Since the regression line is supposed to be the line that provides the best prediction of Y, given some value of X, we need to find values of a and b that produce a line that will be the best-fitting linear function (i.e., the predicted values of Y will come as close as possible to the obtained values of Y).

The Regression Line The first thing we need to do when finding this function is to define what we mean by best. Typically, the approach we take is to assume that the best regression line is the one that minimizes errors in prediction, which are mathematically defined as the difference between the obtained and predicted values of Y (Y – Ŷ); this difference is typically termed the residual.

The Regression Line For reasons similar to those involved in computations of variance, we cannot simply minimize Σ(Y – Ŷ), because that sum will equal zero for any line passing through the point (X̄, Ȳ). Instead, we must minimize Σ(Y – Ŷ)².

The Regression Line At this point, the textbook goes through a bunch of mathematical stuff showing you how to solve for a and b by substituting the equation for a line in for Ŷ in the expression Σ(Y – Ŷ)² and then minimizing the result. You don’t have to know any of that, just the result, which is: b = covXY / s²X and a = Ȳ – bX̄.

The Regression Line For our height versus weight example, b = 2.36 and a = -28.26 (you should confirm these for yourself, as a check of your understanding). Thus, the regression line for our data is: Ŷ = 2.36X – 28.26.
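
A minimal Python sketch of the least-squares fit using the formulas above (b = covXY / s²X, a = Ȳ – bX̄), with a prediction at X = 72; the data are placeholders, so the slope and intercept printed will not match the 2.36 and -28.26 obtained from the class data.

import numpy as np

# Illustrative values only -- NOT the class height/weight data.
x = np.array([61.0, 64, 65, 66, 68, 68, 69, 70, 71, 72, 74, 75])
y = np.array([115.0, 130, 125, 140, 150, 145, 155, 160, 165, 170, 180, 190])

b = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)   # slope: covXY / variance of X
a = y.mean() - b * x.mean()                          # intercept: Ybar - b * Xbar

y_hat_72 = b * 72 + a                                # predicted Y for someone 72 inches tall
print(b, a, y_hat_72)
print(np.polyfit(x, y, 1))                           # NumPy's least-squares fit returns the same [b, a]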

Residual (or error) variance Once we have obtained a regression line, the next issue concerns how well the regression line actually fits the data. Analogous to how we calculated the variance around a mean, we can calculate the variance around a regression line, termed the residual variance or error variance and denoted s²Y·X, in the following manner: s²Y·X = Σ(Y – Ŷ)² / (N – 2).

Residual (or error) variance This equation uses N – 2 in the denominator because two degrees of freedom were lost when computing Ŷ (calculating a and b). The square root of this term is called the standard error of the estimate and is denoted sY·X.

Residual (or error) variance 1) The hard (but logical) way:


Residual (or error) variance 2) The easy (but don’t ask me why it works) way. In another feat of mathematical wizardry, the textbook shows how you can go from the formula above to the following (easier to work with) formula:

Residual (or error) variance 2) The easy (but don’t ask me why it works) way. If we use the non-corrected value of r, we should get the same answer as when we used the hard way:

Residual (or error) variance 2) The easy (but don’t ask me why it works) way. The difference is due partially to rounding errors, but mostly to the fact that this “easy” formula is actually an approximation that assumes large N. When N is small, the obtained value over-estimates the actual value by a factor of (N-1)/(N-2).
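
As a numerical check (not taken from the slides), the sketch below computes the residual variance the “hard way” directly from the residuals, then compares it with the standard identity s²Y·X = s²Y(1 – r²)(N – 1)/(N – 2) and with the large-N shortcut s²Y(1 – r²); the data are placeholders.

import numpy as np

# Illustrative values only.
x = np.array([61.0, 64, 65, 66, 68, 68, 69, 70, 71, 72, 74, 75])
y = np.array([115.0, 130, 125, 140, 150, 145, 155, 160, 165, 170, 180, 190])
n = len(x)

b, a = np.polyfit(x, y, 1)                       # least-squares slope and intercept
y_hat = b * x + a                                # predicted values
resid_var = np.sum((y - y_hat) ** 2) / (n - 2)   # hard way: sum of squared residuals / (N - 2)
see = np.sqrt(resid_var)                         # standard error of the estimate

r = np.corrcoef(x, y)[0, 1]
exact = np.var(y, ddof=1) * (1 - r**2) * (n - 1) / (n - 2)   # algebraically identical to resid_var
approx = np.var(y, ddof=1) * (1 - r**2)                      # large-N approximation
print(resid_var, exact, approx, see)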

Hypothesis Testing The text discusses a number of hypothesis testing situations relevant to r and b, and gives the test to be performed in each situation. I only expect you to know how to test 1) whether a computed correlation coefficient is significantly different from zero, and 2) whether two correlations are significantly different.

Hypothesis Testing One of the most common reasons one examines correlations is to see if two variables are related. If the computed correlation coefficient is significantly different from zero, that suggests that there is a relation ... the sign of the correlation describes exactly what that relation is.

Hypothesis Testing To test whether some computed r is significantly different from zero, you first compute the following t-value: t = r√(N – 2) / √(1 – r²). That value is then compared to a critical t with N – 2 degrees of freedom.

Hypothesis Testing If the obtained t is more extreme than the critical t, then you can reject the null hypothesis (that the variables are not related). For our height versus weight example, tcrit(10) = 2.23; therefore we cannot reject H0.
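
A minimal Python sketch of this test, using the t statistic given above; the r of .15 and N of 12 below are illustrative stand-ins rather than the values computed from the class data.

import numpy as np
from scipy import stats

def t_for_r(r, n):
    # t statistic for H0: rho = 0, on N - 2 degrees of freedom.
    return r * np.sqrt(n - 2) / np.sqrt(1 - r**2)

r, n = 0.15, 12
t = t_for_r(r, n)
p = 2 * stats.t.sf(abs(t), df=n - 2)    # two-tailed p-value
t_crit = stats.t.ppf(0.975, df=n - 2)   # critical t for alpha = .05, two-tailed
print(t, p, t_crit)                     # here |t| < t_crit, so H0 is not rejected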

Hypothesis Testing Testing for a difference between two independent rs (i.e., does r₁ – r₂ = 0?) turns out to be a trickier issue, because the sampling distribution of the difference between two r values is not normally distributed. Fisher has shown that this problem can be compensated for by first transforming each of the r values using the following formula: r′ = ½ ln[(1 + r) / (1 – r)].

Hypothesis Testing This leads to a transformed value, r′, that is normally distributed and whose standard error is given by the following formula: sr′ = 1 / √(N – 3).

Hypothesis Testing Given all this, one can test for the difference between two independent rs using the following z-test: z = (r′₁ – r′₂) / √[1/(N₁ – 3) + 1/(N₂ – 3)].

Hypothesis Testing So, the steps we have to go through to test the difference of two independent rs are: 1) compute both r values; 2) transform both r values; 3) get a z value based on the formula above; 4) find the probability associated with that z value; 5) compare the obtained probability to alpha divided by two (or to alpha if doing a one-tailed test).
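
A minimal Python sketch of the whole procedure, assuming the transformation and standard-error formulas above; the two correlations and sample sizes at the bottom are invented for illustration.

import numpy as np
from scipy import stats

def fisher_z_test(r1, n1, r2, n2):
    # z test for H0: rho1 = rho2, using Fisher's r-to-z transformation.
    z1 = 0.5 * np.log((1 + r1) / (1 - r1))      # transform each r
    z2 = 0.5 * np.log((1 + r2) / (1 - r2))
    se = np.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))   # standard error of the difference
    z = (z1 - z2) / se
    p_two_tailed = 2 * stats.norm.sf(abs(z))    # probability beyond |z| in both tails
    return z, p_two_tailed

# Illustrative values only: r = .60 from 40 people versus r = .30 from 50 people.
z, p = fisher_z_test(0.60, 40, 0.30, 50)
print(z, p)   # compare p to alpha (two-tailed) to decide whether the rs differ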
