Chapter 9: Interpretative aspects of correlation and regression.


Fun facts about the regression line. Equation of the regression line: $Y' = r \frac{S_Y}{S_X}(X - \bar{X}) + \bar{Y}$. If we convert our X and Y scores to $z_X$ and $z_Y$, the regression line through the z-scores is $z_{Y'} = r z_X$, because the means of the z-scores are zero and the standard deviations are 1. So if we convert our scores to z-scores, the slope of the regression line is equal to the correlation.
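The claim that the z-score regression line has slope r can be checked numerically. The sketch below uses a small made-up data set (the numbers are illustrative, not from the slides) and plain least-squares formulas; the helper names are hypothetical.

```python
import math

def mean(values):
    return sum(values) / len(values)

def std(values):
    # Population standard deviation (divide by n)
    m = mean(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / len(values))

def z_scores(values):
    m, s = mean(values), std(values)
    return [(v - m) / s for v in values]

def pearson_r(x, y):
    # r is the mean of the products of paired z-scores
    zx, zy = z_scores(x), z_scores(y)
    return sum(a * b for a, b in zip(zx, zy)) / len(x)

def ols_slope_intercept(x, y):
    # Least-squares slope and intercept for predicting y from x
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return slope, my - slope * mx

# Made-up illustrative data
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.0, 1.0, 4.0, 3.0, 7.0, 5.0]

r = pearson_r(x, y)
z_slope, z_intercept = ols_slope_intercept(z_scores(x), z_scores(y))
print(r, z_slope, z_intercept)
```

On z-scores the fitted slope matches r and the intercept is zero, up to floating-point rounding.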

Regression to the mean: When |r| < 1, the more extreme values of X will tend to be paired with less extreme values of Y. The slope of the regression line is flatter for lower correlations. This means that the expected values of Y are closer to the mean of Y for lower correlations. Remember, the slope of the regression line is $r \frac{S_Y}{S_X}$. [Figure: two scatter plots of Y against X with regression lines; one panel is labeled r = 0.26, and the r value for the other panel is not preserved.]

“I had the most satisfying Eureka experience of my career while attempting to teach flight instructors that praise is more effective than punishment for promoting skill-learning. When I had finished my enthusiastic speech, one of the most seasoned instructors in the audience raised his hand and made his own short speech, which began by conceding that positive reinforcement might be good for the birds, but went on to deny that it was optimal for flight cadets. He said, “On many occasions I have praised flight cadets for clean execution of some aerobatic maneuver, and in general when they try it again, they do worse. On the other hand, I have often screamed at cadets for bad execution, and in general they do better the next time. So please don’t tell us that reinforcement works and punishment does not, because the opposite is the case.” This was a joyous moment, in which I understood an important truth about the world: because we tend to reward others when they do well and punish them when they do badly, and because there is regression to the mean, it is part of the human condition that we are statistically punished for rewarding others and rewarded for punishing them. I immediately arranged a demonstration in which each participant tossed two coins at a target behind his back, without any feedback. We measured the distances from the target and could see that those who had done best the first time had mostly deteriorated on their second try, and vice versa. But I knew that this demonstration would not undo the effects of lifelong exposure to a perverse contingency.” -Daniel Kahneman

[Figure: scatter plot of z-scores for Shot 1 and Shot 2.]

[Figure: Regression to the mean. Mean z-score on Shot 1 and on Shot 2 for students in the top half on Shot 1, shown with the scatter plot of Shot 1 and Shot 2 z-scores.]
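The two-shot demonstration can be imitated with a small simulation. This is only a sketch under a stated assumption: each shot is modeled as a stable skill plus independent luck of equal size, which is a hypothetical model and not the class data. Students in the top half on shot 1 should still be above average on shot 2, but less so.

```python
import math
import random

def z_scores(values):
    n = len(values)
    m = sum(values) / n
    s = math.sqrt(sum((v - m) ** 2 for v in values) / n)
    return [(v - m) / s for v in values]

rng = random.Random(0)  # fixed seed so the run is repeatable
n_students = 2000

# Assumed model: score = stable skill + independent luck on each shot
skill = [rng.gauss(0, 1) for _ in range(n_students)]
shot1 = z_scores([s + rng.gauss(0, 1) for s in skill])
shot2 = z_scores([s + rng.gauss(0, 1) for s in skill])

# Select the students in the top half on shot 1
cutoff = sorted(shot1)[n_students // 2]
top = [i for i in range(n_students) if shot1[i] >= cutoff]

top_mean_shot1 = sum(shot1[i] for i in top) / len(top)
top_mean_shot2 = sum(shot2[i] for i in top) / len(top)
print(round(top_mean_shot1, 2), round(top_mean_shot2, 2))
```

The second mean is closer to zero than the first even though nothing about the students changed between shots.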

A classic example of regression to the mean: Correlations between husbands’ and wives’ IQs

Regression to the mean. Example: correlation of the IQs of husbands and wives. The IQs of husbands and wives have been found to correlate with r = 0.5. Both wives and husbands have mean IQs of 100 and standard deviations of 15. Here’s a scatter plot of a typical sample of 200 couples. [Figure: scatter plot of Husband IQ against Wife IQ; n = 200, r = 0.50, regression line Y' = 0.50X + 50.]

Regression to the mean. Example: correlation of the IQs of husbands and wives. According to the regression line, the expected IQ of the husband of a wife with an IQ of 115 is (0.5)(115) + 50 = 107.5. This is above average, but closer to the mean of 100 than the wife’s IQ of 115. [Figure: the same scatter plot with the prediction marked: X = 115, Y' = 0.50(115) + 50 = 107.5.]
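The prediction for a wife with an IQ of 115 can be reproduced from the summary statistics alone. A minimal sketch, assuming the slope is r times the ratio of standard deviations and the line passes through both means; `regression_line` is a hypothetical helper name.

```python
def regression_line(r, mean_x, sd_x, mean_y, sd_y):
    # Slope is r * (S_Y / S_X); the line passes through (mean_x, mean_y)
    slope = r * sd_y / sd_x
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Values from the husband/wife IQ example: r = 0.5, means 100, SDs 15
slope, intercept = regression_line(r=0.5, mean_x=100, sd_x=15, mean_y=100, sd_y=15)
predicted_husband_iq = slope * 115 + intercept
print(slope, intercept, predicted_husband_iq)  # 0.5 50.0 107.5
```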

Today’s word: “Homoscedasticity”

“Homoscedasticity”: variability around the regression line is constant. The opposite case (heteroscedasticity): variability around the regression line varies with x.

Interpretation of $S_{YX}$, the standard error of the estimate. If the points are distributed with ‘homoscedasticity’ and the Y-values are normally distributed above and below the regression line, then $S_{YX}$ is a measure of the standard deviation of this normal distribution. This means that about 68% of the scores should fall within +/- 1 standard deviation of the regression line, and about 95% should fall within +/- 2 standard deviations of the regression line. For example, 68% of husbands’ IQs fall within +/- 12.99 IQ points of the regression line. [Figure: scatter plot of Husband IQ against Wife IQ with the regression line.]

Example: correlation of the IQs of husbands and wives. What percent of women with IQs of 115 are married to men with IQs of 115 or more? Answer: We just calculated that the mean IQ of a man married to a woman with an IQ of 115 is 107.5. If we assume normal distributions and ‘homoscedasticity’, then the standard deviation of the IQs of men married to women with IQs of 115 is $S_{YX} = S_Y\sqrt{1 - r^2} = 15\sqrt{1 - 0.5^2} = 12.99$. To calculate the proportion above 115, we calculate z and use Table A: z = (115 - 107.5)/12.99 = 0.577. The area above z = 0.577 is about 0.28. So about 28% of women with IQs of 115 are married to men with IQs of 115 or more.
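The same calculation can be sketched in code. It assumes normality and homoscedasticity, as the example does, and computes the tail area directly with the complementary error function instead of a table, so the result can differ slightly from a lookup on a rounded z. The helper names are hypothetical.

```python
import math

def standard_error_of_estimate(sd_y, r):
    # S_YX = S_Y * sqrt(1 - r^2)
    return sd_y * math.sqrt(1 - r ** 2)

def upper_tail(z):
    # Area under the standard normal curve above z
    return 0.5 * math.erfc(z / math.sqrt(2))

s_yx = standard_error_of_estimate(sd_y=15, r=0.5)
expected_husband_iq = 107.5  # from the regression line at wife IQ = 115
z = (115 - expected_husband_iq) / s_yx
proportion = upper_tail(z)
print(round(s_yx, 2), round(z, 3), round(proportion, 3))
```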

Example: The correlation between IQs of twins reared apart was found to be [value not preserved in the transcript]. Assume that IQs are distributed normally with a mean of 100 and a standard deviation of 15 points, and also assume homoscedasticity.
1) Find the regression line that predicts the IQ of one twin based on the other’s.
2) Find the standard error of the estimate, $S_{YX}$.
3) What is the expected IQ of a twin whose sibling has an IQ of 130?
4) What proportion of all twin subjects have an IQ over 130?
5) What proportion of twins that have a sibling with an IQ of 130 have an IQ over 130?

Another example: The year-to-year correlation for a typical baseball player’s batting averages is 0.41. Suppose that the batting averages for this year are distributed normally with a mean of 225 and a standard deviation of 35. For players that batted 300 one year, what is the expected distribution of batting averages for next year? [Figure: scatter plot of batting average next year against batting average this year.]

Answer: The batting averages will be distributed normally with a mean determined by the regression line and a standard deviation equal to the standard error of the estimate. For X = 300, Y’ = 0.41(300) + 132.75 = 255.75, and $S_{YX} = 35\sqrt{1 - 0.41^2} = 31.92$. So the expected batting average next year should be distributed normally with a mean of 255.75 and a standard deviation of 31.92. Note that the mean is higher than the overall mean of 225, but lower than the previous year’s 300.

What percent of players that bat 300 this year will bat 300 or higher next year? Answer: We know that the expected batting average next year should be distributed normally with a mean of 255.75 and a standard deviation of 31.92. This is like our old z-transformation problems: z = (300 - 255.75)/31.92 = 1.39, and Pr(z > 1.39) = 0.0823. Only 8.23% of these players will bat 300 or higher next year. For comparison, only 1.61% of all batters will bat 300 or higher.
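The batting-average example can be worked end to end in a few lines. This sketch takes r = 0.41 and a mean of 225 from the text; the standard deviation of 35 is an assumption, chosen because it reproduces the 31.92 and 1.61% figures quoted above. It also assumes the same mean and standard deviation in both years.

```python
import math

def upper_tail(z):
    # Area under the standard normal curve above z
    return 0.5 * math.erfc(z / math.sqrt(2))

r = 0.41
mean_avg = 225.0
sd_avg = 35.0  # assumed; consistent with the 31.92 standard error in the text

# Regression line for next year's average given this year's
slope = r  # equal SDs in both years, so slope = r
intercept = mean_avg * (1 - r)
expected_next = slope * 300 + intercept

# Spread of next year's averages around that prediction
s_yx = sd_avg * math.sqrt(1 - r ** 2)

# Players who batted 300 this year: chance of 300 or higher next year
p_repeat = upper_tail((300 - expected_next) / s_yx)

# All players: chance of batting 300 or higher
p_all = upper_tail((300 - mean_avg) / sd_avg)

print(round(expected_next, 2), round(s_yx, 2))
print(round(p_repeat, 3), round(p_all, 3))
```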

Proportion of variance in Y associated with variance in X. Remember this example? Here are two hypothetical samples showing scatter plots of ages of brides and grooms for a correlation of 1 and a correlation of 0. [Figure: two scatter plots of Age of Bride and Age of Groom. Panel 1: r = 1.00, regression line Y' = 1.20X - 3.32. Panel 2: r = 0.00, regression line Y' = 0.00X + 26.8.] For r = 1, all of the variance in Y can be explained (or predicted) by the variance in X. For r = 0, none of the variance in Y can be explained (or predicted) by the variance in X. So r reflects the amount that variability in Y can be explained by variability in X.

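[Figure: scatter plot of Stress and Eating Difficulties with the regression line Y’, marking the total variance of Y, the variance of Y explained by X, and the variance of Y not explained by X.] The deviation between Y and the mean can be broken down into two components: $(Y - \bar{Y}) = (Y - Y') + (Y' - \bar{Y})$. It turns out that the total variance is the sum of the corresponding component variances: $\frac{\sum (Y - \bar{Y})^2}{n} = \frac{\sum (Y - Y')^2}{n} + \frac{\sum (Y' - \bar{Y})^2}{n}$, or $S_Y^2 = S_{YX}^2 + S_{Y'}^2$. That is, the total variance is the sum of the variances not explained and explained by X.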

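How does this relate to the correlation, r? Let’s look at $\frac{S_{Y'}^2}{S_Y^2}$, which is the proportion of total variance explained by X (here $S_{Y'}^2$ is the variance of the predicted values Y’). This is the same as $\frac{S_Y^2 - S_{YX}^2}{S_Y^2} = 1 - \frac{S_{YX}^2}{S_Y^2}$. Remember, $S_{YX} = S_Y\sqrt{1 - r^2}$. After a little algebra (page 151) we can show that $\frac{S_{Y'}^2}{S_Y^2} = r^2$.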

$r^2$ is the proportion of variance in Y explained by variance in X, and is called the coefficient of determination. The remaining variance, $k^2 = 1 - r^2$, is called the coefficient of nondetermination. If r = 0.7071, then $r^2 = 0.5$, which means that half the variance in Y can be explained by variance in X. The other half cannot be explained by variance in X. [Figure: scatter plot of y against x with r = 0.71.]
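The split of total variance into explained and unexplained parts, and its link to r squared, can be verified on any data set. The sketch below uses made-up numbers (not from the slides) and works with sums of squares; dividing each by n gives the corresponding variances.

```python
import math

# Made-up illustrative data
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.0, 1.0, 4.0, 3.0, 7.0, 5.0]
n = len(x)

mean_x = sum(x) / n
mean_y = sum(y) / n
sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
sxx = sum((a - mean_x) ** 2 for a in x)
syy = sum((b - mean_y) ** 2 for b in y)

# Least-squares line and its predictions Y'
slope = sxy / sxx
intercept = mean_y - slope * mean_x
predicted = [slope * a + intercept for a in x]

ss_total = syy                                                   # sum of (Y - mean)^2
ss_explained = sum((p - mean_y) ** 2 for p in predicted)         # sum of (Y' - mean)^2
ss_unexplained = sum((b - p) ** 2 for b, p in zip(y, predicted)) # sum of (Y - Y')^2

r = sxy / math.sqrt(sxx * syy)
print(ss_total, ss_explained + ss_unexplained)
print(ss_explained / ss_total, r ** 2)
```

Each printed pair should agree up to floating-point rounding: the two components add up to the total, and the explained share equals r squared.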

Factor that influences r: (1) ‘Range of Talent’ (sometimes called ‘restricted range’). Example: The IQs of husbands and wives. [Figure: scatter plot of IQ husband against IQ wife, r = 0.5.]

Now suppose we were to sample only from women with IQs of 115 or higher. This is called ‘restricting the range’. The correlation among these remaining couples’ IQs is much lower. [Figure: scatter plots of IQ husband against IQ wife for the restricted sample; the r value is not preserved.]

Restricting the range to make a discontinuous distribution can often increase the correlation. [Figure: a sequence of scatter plots of IQ husband against IQ wife, starting from the full sample with r = 0.50; the r value for the restricted sample is not preserved.]
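Both range effects can be reproduced with simulated couples. This is a sketch under stated assumptions: IQs are drawn from a bivariate normal with mean 100, standard deviation 15, and r = 0.5, and the 'discontinuous' sample is formed by keeping only wives at or below 85 or at or above 115, which is one possible reading of the slide and not necessarily the cut used there.

```python
import math
import random

def pearson_r(pairs):
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    syy = sum((y - my) ** 2 for _, y in pairs)
    return sxy / math.sqrt(sxx * syy)

rng = random.Random(1)  # fixed seed so the run is repeatable
r_pop = 0.5
couples = []
for _ in range(5000):
    z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
    wife = 100 + 15 * z1
    # Mix z1 and z2 so that husband and wife correlate at r_pop
    husband = 100 + 15 * (r_pop * z1 + math.sqrt(1 - r_pop ** 2) * z2)
    couples.append((wife, husband))

r_full = pearson_r(couples)
# Restricted range: only wives with IQ of 115 or higher
r_restricted = pearson_r([c for c in couples if c[0] >= 115])
# Discontinuous range (assumed cut): only the two extremes of wife IQ
r_extremes = pearson_r([c for c in couples if c[0] <= 85 or c[0] >= 115])

print(round(r_full, 2), round(r_restricted, 2), round(r_extremes, 2))
```

The restricted sample gives a lower correlation than the full sample, and the extremes-only sample gives a higher one, even though all three come from the same population.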

Factor that influences r: (2) ‘Homogeneity of Samples’. Correlation values can be either increased or decreased if we accidentally include two (or more) distinct sub-populations. [Figure: height of all students (in) against average of parents’ height (in), with female and male students marked; n = 96, r = 0.43.] [Figure: male students’ height (in) against average of parents’ height (in); n = 21, r = 0.44.] [Figure: video game playing against height (in) for male students; n = 21, r = 0.10.]

Factor that influences r: (3) Outliers. Just as with the mean, extreme values have a large influence on the calculation of the correlation.
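A single extreme point can change r dramatically. The sketch below uses made-up data (not from the slides): ten points with almost no linear relationship, then the same ten points plus one outlier far from the rest.

```python
import math

def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Made-up data with essentially no linear relationship
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [5, 3, 6, 4, 7, 5, 4, 6, 3, 5]
r_without = pearson_r(x, y)

# Add one extreme point far from the rest of the data
r_with = pearson_r(x + [30], y + [30])

print(round(r_without, 2), round(r_with, 2))
```

The correlation goes from near zero to strongly positive because of one point, which is why scatter plots should be inspected before interpreting r.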