Math 3680 Lecture #19 Correlation and Regression

The Correlation Coefficient: Limitations

Moral: Correlation coefficients only measure linear association.

[Data table: films starring Matthew McConaughey (We Are Marshall, EDtv, Reign of Fire, Sahara, Fool's Gold), with x = minutes shirtless and y = opening weekend gross (millions of dollars), together with the correlation between x and y.]

[Same film data as above.] Moral: Correlation coefficients are most appropriate for football-shaped scatter diagrams and can be very sensitive to outliers.

Regression

The heights and weights from a survey of 988 men are shown in the scatter diagram. Avg height = 70 in, SD height = 3 in; avg weight = 162 lb, SD weight = 30 lb; r = 0.47.

Example. Suppose a man is one SD above average in height (that is, 73 inches). Should you guess his weight to be one SD above average (that is, 192 pounds)?

Solution: No. Notice that only about 10 or 11 of the men 73 inches tall have weights above 192 pounds, while dozens have weights below 192 pounds. (The point (73, 192) is marked in the diagram.)

A better prediction is obtained by moving up not a full SD but r SDs: Prediction = Average + (r)(# of SDs)(SD) = 162 + (0.47)(1)(30) = 176.1 lb. This is our second interpretation of the correlation coefficient.

Prediction = 162 + (0.47)(1)(30) = 176.1 lb. (The point (73, 176.1) is marked in the diagram.)

Starting from the point of averages (70, 162), the line through these predictions rises (0.47)(30) lbs for every 3 inches of run, so Slope = (0.47)(30) / 3 = 4.7 lb per inch.

Example: Predict the height of a man who weighs 176.1 pounds. Does this contradict our previous example?
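A possible worked computation, using the summary statistics above: a weight of 176.1 lb is (176.1 − 162)/30 = 0.47 SDs above average, so the predicted height is 70 + (0.47)(0.47)(3) ≈ 70.7 inches, not 73 inches. There is no contradiction: predicting height from weight uses a different line than predicting weight from height.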

Example: Predict the weight of a man who is 5’6”. Where does this prediction appear in the diagram?
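One way to work this out with the method above: 5'6" = 66 inches is (66 − 70)/3 ≈ −1.33 SDs, so the predicted weight is 162 + (0.47)(−1.33)(30) ≈ 143 lb, and the point (66, 143) appears on the solid line in the diagram.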

Notice that these points are displayed on the solid line in the diagram. This line is called the regression line. To obtain this line, you start at the point of averages and draw a line with slope r (SD of y)/(SD of x). In other words, the equation of the regression line is

predicted y = (average of y) + r (SD of y)/(SD of x) (x − average of x).

Reverse the roles of x and y when predicting in the other direction.

Example: Find the equation of the regression line for the height-weight diagram.
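A possible solution sketch, using the statistics given earlier: slope = r (SD of weight)/(SD of height) = (0.47)(30)/3 = 4.7 lb per inch, and the line passes through the point of averages (70, 162), so predicted weight = 162 + 4.7 (height − 70).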

Example: A university has made a statistical analysis of the relationship between SAT-M scores and first-year GPA. The results are: average SAT-M score = 550, SD = 80; average first-year GPA = 2.6, SD = 0.6; r = 0.40. The scatter diagram is football shaped. Find the equation of the regression line. Then predict the first-year GPA of a randomly chosen student with an SAT-M score of 650.
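A possible solution sketch using the same approach: slope = (0.40)(0.6)/80 = 0.003 GPA points per SAT-M point, so predicted GPA = 2.6 + 0.003 (SAT-M − 550); for a score of 650, the prediction is 2.6 + (0.003)(100) = 2.9.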

Both Excel and TI calculators are capable of computing and visualizing regression lines. (See book p. 426.) In Excel 2007, highlight the x- and y-values and use Insert, Scatter to draw a scatter plot. Click the data points, then right-click and choose Add Trendline to see the regression line.

To get the coefficients of the regression line in Excel 2007, use Data, Data Analysis, Regression.
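The same coefficients can also be computed outside Excel. Here is a minimal sketch (not part of the lecture) using Python's scipy.stats.linregress; the data values are made up purely for illustration.

# Minimal sketch: regression coefficients in Python instead of Excel.
# The height/weight values below are hypothetical, for illustration only.
from scipy import stats

heights = [64, 66, 67, 69, 70, 72, 73, 75]            # x-values (inches)
weights = [140, 150, 148, 165, 160, 178, 185, 190]    # y-values (pounds)

fit = stats.linregress(heights, weights)
print("slope:", fit.slope)          # estimated lb per inch
print("intercept:", fit.intercept)
print("r:", fit.rvalue)             # correlation coefficient
print("SE of slope:", fit.stderr)   # standard error used later for inference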

The Regression Effect

For a study of 1,078 fathers and sons: average fathers' height = 68 in, SD = 2.7 in; average sons' height = 69 in, SD = 2.7 in; r = 0.5. Suppose a father is 72 inches tall. How tall would you predict his son to be? Repeat for a father who is 64 inches tall.
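A possible worked computation with the regression method above: since the two SDs are equal, the slope is just r = 0.5, so for a 72-inch father the predicted son's height is 69 + (0.5)(72 − 68) = 71 inches, and for a 64-inch father it is 69 + (0.5)(64 − 68) = 67 inches.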

Notice that tall fathers tend to have tall sons, though the sons tend not to be as tall. Likewise, short fathers on average have short sons, just not as short. Hence the term "regression." A pioneering but aristocratic statistician (Galton) called this effect "regression toward mediocrity," and the term has stuck. There is no biological cause of this effect; it is strictly statistical. Thinking that the regression effect is due to something important is called the regression fallacy.

Example: A preschool program attempts to boost students' IQs. The children are tested when they enter the program (pretest) and again when they leave the program (post-test). On both occasions, the average IQ score was 100, with an SD of 15. Also, students with below-average IQs on the pretest had scores that went up by an average of 5 points, while students with above-average scores on the pretest had scores that dropped by an average of 5 points. What is going on? Does the program equalize intelligence?

Example. Suppose someone gets a score of 140 on the pretest. Does this mean that the student has an IQ of exactly 140?

Solution: No. There will always be chance error associated with the measurement. For the sake of argument, let's assume that the chance error is equal to 5 points. Then there are two likely explanations: an actual IQ of 135 with a chance error of +5, or an actual IQ of 145 with a chance error of -5. Which of these is the more likely explanation?

This explains the regression effect. If someone scores above average on the first test, we would estimate that the true score is probably a bit lower than the observed score.

Example: An instructor gives a midterm. She asks the students who score 20 points below average to see her regularly during her office hours for special tutoring. They all scored at least average on the final. Can this improvement be attributed to the regression effect?

Regression and Error Estimation

n = 988; avg height = 70 in, SD height = 3 in; avg weight = 162 lb, SD weight = 30 lb; r = 0.47. For a man 73 in tall, we predict a weight of 162 + (0.47)(1)(30) = 176.1 lb. Next question: what is the error for this estimate? Based on the picture, is it 30 lb? Or less?

THEOREM. Assuming the data are normally distributed, the error for this prediction is

prediction error = (SD of y) √(1 − r²) √(n / (n − 2)).

For the current example, that means 30 √(1 − 0.47²) √(988/986) ≈ 26.5 lb. The weight is therefore estimated as 176.1 ± 26.5 lb.

Note. If n is large, the last factor √(n / (n − 2)) may be safely ignored. In other words, if n is large, then prediction error ≈ (SD of y) √(1 − r²).

Example: For a study of 1,078 fathers and sons: fathers' average height = 68 in, SD = 2.7 in; sons' average height = 69 in, SD = 2.7 in; r = 0.5. Suppose a father is 63 inches tall. What percentage of such fathers have sons who are at least 66 inches tall?
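A possible solution sketch, using the error formula above and a normal approximation: the predicted son's height is 69 + (0.5)(63 − 68) = 66.5 in, with prediction error about 2.7 √(1 − 0.5²) ≈ 2.34 in; then z = (66 − 66.5)/2.34 ≈ −0.21, and the area to the right of z = −0.21 under the normal curve is roughly 58%.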

Testing Correlation

Recall the equation of the regression line, predicted y = (average of y) + r (SD of y)/(SD of x) (x − average of x), so that the slope of the regression line is r (SD of y)/(SD of x). The standard error for the slope is given by

SE(slope) = (SD of y)/(SD of x) √((1 − r²)/(n − 2)).

Under the assumptions of (1) normality and (2) homoscedasticity (see below), the t distribution with df = n − 2 may be used to find confidence intervals and perform hypothesis tests. Homoscedasticity means that the variability of the data about the regression line does not depend on the value of x.

n = 988; avg height = 70 in, SD height = 3 in; avg weight = 162 lb, SD weight = 30 lb; r = 0.47.

For df = 986, the Student t distribution is almost normal, so a 95% confidence interval for the slope is (slope) ± (1.96) SE(slope). Dividing the endpoints by (SD of y)/(SD of x) gives the corresponding confidence interval for the correlation coefficient for all men.
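A possible numerical sketch, using the standard-error formula above: the slope is (0.47)(30)/3 = 4.7 lb per inch, with SE(slope) = (30/3) √((1 − 0.47²)/986) ≈ 0.28, so the 95% confidence interval for the slope is roughly 4.7 ± 0.55 lb per inch; dividing by 30/3 = 10 gives roughly 0.47 ± 0.055 for the correlation coefficient.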

Example: For a study of 1,078 fathers and sons: fathers' average height = 68 in, SD = 2.7 in; sons' average height = 69 in, SD = 2.7 in; r = 0.5. Test the hypothesis that the correlation coefficient for all fathers and sons is positive.
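A possible solution sketch, using the standard-error formula above: the estimated slope is (0.5)(2.7)/2.7 = 0.5, with SE(slope) = (2.7/2.7) √((1 − 0.5²)/1076) ≈ 0.026, giving t ≈ 0.5/0.026 ≈ 19 with df = 1076; the p-value is essentially zero, so we reject the null hypothesis of zero correlation and conclude that the correlation is positive.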