Chapters 10 and 11: Using Regression to Predict Math 1680.

Overview: Predicting Values; The Regression Line; The RMS Error; The Regression Effect; A Second Regression Line; Summary

Predicting Values: We have previously seen that a pair of data sets, X and Y, can be characterized by their five-statistic summary: $\mu_X$, the average value in X; $SD_X$, the standard deviation of X; $\mu_Y$, the average value in Y; $SD_Y$, the standard deviation of Y; and r, the correlation coefficient. Often, we want to predict a y-value given a particular x-value, using only the five-statistic summary to make the prediction.
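As a minimal sketch of how the five-statistic summary could be computed from paired data: the function name and the sample heights and weights below are hypothetical, and the SDs divide by n (the population convention), which is an assumption about the course's definition.

```python
import math

def five_statistic_summary(xs, ys):
    """Return (mean_x, sd_x, mean_y, sd_y, r) for paired data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Population SD (divide by n): an assumed convention, not stated in the slides
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / n)
    # r is the average product of the paired values in standard units
    r = sum(((x - mean_x) / sd_x) * ((y - mean_y) / sd_y)
            for x, y in zip(xs, ys)) / n
    return mean_x, sd_x, mean_y, sd_y, r

# Hypothetical heights (inches) and weights (lbs), for illustration only
heights = [66, 68, 70, 72, 74]
weights = [140, 165, 150, 180, 175]
print(five_statistic_summary(heights, weights))
```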

Predicting Values: Suppose we have the following five-statistic summary for the height (X) and weight (Y) of men in the US: $\mu_X = 70$ inches, $SD_X = 3$ inches, $\mu_Y = 162$ lbs, $SD_Y = 30$ lbs, r = 0.47. If you had to guess what the weight of any man would be, what is your best bet?

Predicting Values: Suppose we have the following five-statistic summary for the height (X) and weight (Y) of men in the US: $\mu_X = 70$ inches, $SD_X = 3$ inches, $\mu_Y = 162$ lbs, $SD_Y = 30$ lbs, r = 0.47. Suppose you know the man's height is 1 SD above average. Would your best guess for his weight be 1 SD above average?

The SD line is the dashed line running through the scatter plot. If we guessed 1 SD above average weight, where would we be on the plot? What would a better guess be?

The Regression Line: Suppose we have the following five-statistic summary for the height (X) and weight (Y) of men in the US: $\mu_X = 70$ inches, $SD_X = 3$ inches, $\mu_Y = 162$ lbs, $SD_Y = 30$ lbs, r = 0.47. It turns out that the correlation coefficient determines the best guess: for every SD we move in X, we should move r SDs in Y.

The Regression Line: The regression line from X to Y runs through the point of averages and has a slope of r times the slope of the SD line. The regression line predicts the average value for y within the narrowed-down range specified by a given x.

The Regression Line: The formula for the regression line from X to Y is $z_Y = r\,z_X$ or, alternately, $y = \mu_Y + r\,\frac{SD_Y}{SD_X}\,(x - \mu_X)$. When is the regression line the same as the SD line? When r = 1 or -1.
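A short derivation, assuming the standard-unit form $z_Y = r\,z_X$ used elsewhere in these slides, shows how the two ways of writing the regression line relate and why its slope is r times the slope of the SD line ($SD_Y / SD_X$):

```latex
\begin{align*}
z_Y &= r\, z_X \\
\frac{y - \mu_Y}{SD_Y} &= r\, \frac{x - \mu_X}{SD_X} \\
y &= \mu_Y + r\, \frac{SD_Y}{SD_X}\, (x - \mu_X)
\end{align*}
```

Setting $x = \mu_X$ gives $y = \mu_Y$, so the line runs through the point of averages.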

The regression line is the solid line running through the scatter plot. If we looked at heights 1 SD above the average, the regression line runs through the point 0.47 SDs above average in weight.

The Regression Line: Suppose we have the following five-statistic summary for the height (X) and weight (Y) of men in the US: $\mu_X = 70$ inches, $SD_X = 3$ inches, $\mu_Y = 162$ lbs, $SD_Y = 30$ lbs, r = 0.47. What is the average weight of all the men who are 73 inches tall? For a man 73 inches tall, what weight should we predict? 176.1 lbs.

The Regression Line: Suppose we have the following five-statistic summary for the height (X) and weight (Y) of men in the US: $\mu_X = 70$ inches, $SD_X = 3$ inches, $\mu_Y = 162$ lbs, $SD_Y = 30$ lbs, r = 0.47. What is the average weight of all the men who are 64 inches tall? For a man 64 inches tall, what weight should we predict? 133.8 lbs.

The Regression Line: To use the regression line from X to Y: (1) standardize the given x-value to get $z_X$; (2) use the regression equation to go from X to Y, $z_Y = r\,z_X$; (3) unstandardize $z_Y$ to get y.
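The three steps above can be sketched in a few lines of Python. The function name `predict_y` is hypothetical; the numbers are the height and weight summary used throughout these slides.

```python
def predict_y(x, mean_x, sd_x, mean_y, sd_y, r):
    """Predict y from x using only the five-statistic summary."""
    z_x = (x - mean_x) / sd_x      # step 1: standardize x
    z_y = r * z_x                  # step 2: regression equation z_Y = r * z_X
    return mean_y + z_y * sd_y     # step 3: unstandardize to get y

# Height 73 inches is 1 SD above average, so the predicted weight
# is 0.47 SDs above the average weight
print(round(predict_y(73, 70, 3, 162, 30, 0.47), 1))
```

A height of exactly 70 inches returns the average weight, 162 lbs, since the line runs through the point of averages.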

The Regression Line: Suppose we have the following five-statistic summary for the height (X) and weight (Y) of men in the US: $\mu_X = 70$ inches, $SD_X = 3$ inches, $\mu_Y = 162$ lbs, $SD_Y = 30$ lbs, r = 0.47. Predict the weight of a man who is 6’4”. 190.2 lbs.

The Regression Line: Suppose we have the following five-statistic summary for the height (X) and weight (Y) of men in the US: $\mu_X = 70$ inches, $SD_X = 3$ inches, $\mu_Y = 162$ lbs, $SD_Y = 30$ lbs, r = 0.47. Predict the weight of a man who is 5’6”. 143.2 lbs.

The Regression Line: Important notes about the regression line from X to Y: It predicts the average value for y given an x value. If the scatter plot is football shaped, this prediction will be above about half of the sample and below the other half, because the variables are approximately normal. The slope of the regression line will always be $r \cdot SD_Y / SD_X$.

The RMS Error: Recall that an average alone did not uniquely describe a data set; a spread measure was needed. Since the regression method only gives us an average value as its prediction, we can’t really tell by this alone how good a guess it is.

The prediction given by the regression line for a height of 73 inches is at (73 in, 176 lbs). How much does the heaviest 73” tall man weigh? How much does the lightest 73” tall man weigh?

The RMS Error: If we are given a specific man to predict, we are likely to be a little off with the regression prediction. You can think of the prediction error as being the vertical distance from the point to the regression line. That is, error = actual – predicted. If we want to get a good sense of what the typical error for a given x-value is, we can find the RMS of all the errors for all the points. This value is called the RMS error for the regression line.
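To make the definition concrete, here is a minimal sketch that computes the RMS error directly from the vertical errors. The function name and the five data points are hypothetical, and the SDs divide by n, which is an assumed convention.

```python
import math

def regression_rms_error(xs, ys):
    """RMS of the vertical errors (actual - predicted) around the regression line."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    # Population SDs (divide by n): an assumed convention
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / n)
    r = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n * sd_x * sd_y)
    slope = r * sd_y / sd_x  # r times the slope of the SD line
    # error = actual - predicted, one per point
    errors = [y - (mean_y + slope * (x - mean_x)) for x, y in zip(xs, ys)]
    return math.sqrt(sum(e ** 2 for e in errors) / n)

# Hypothetical heights (inches) and weights (lbs), for illustration only
heights = [66, 68, 70, 72, 74]
weights = [140, 165, 150, 180, 175]
print(regression_rms_error(heights, weights))
```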

The RMS Error: The RMS error is to the regression line what the SD is to the average. The RMS error measures the spread around a prediction from the regression line. Recall we are generally assuming the data sets are approximately normal, so about 68% of the points on a scatter plot will fall within the strip that runs from one RMS error below to one RMS error above the regression line.

The RMS Error [Figure: scatter plot with the regression line; about 68% of the points lie within 1 RMS error of the line and about 95% within 2 RMS errors]

The RMS Error: The RMS error for regression from X to Y (denoted R) can be calculated from the five-statistic summary by $R = \sqrt{1 - r^2} \cdot SD_Y$. What units would R have? What happens when r gets close to 0? What happens when r gets close to 1 or -1?
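A minimal sketch of this calculation, assuming the standard formula $R = \sqrt{1 - r^2} \cdot SD_Y$, which reproduces the 26.5 lbs figure quoted in these slides. The function name is hypothetical.

```python
import math

def rms_error_from_summary(sd_y, r):
    """RMS error for regression from X to Y, in the units of Y."""
    return math.sqrt(1 - r ** 2) * sd_y

# Height/weight summary from the slides: SD_Y = 30 lbs, r = 0.47
print(round(rms_error_from_summary(30, 0.47), 1))  # 26.5

# r near 0: the RMS error is close to SD_Y (regression barely helps)
print(rms_error_from_summary(30, 0.0))
# r near 1 or -1: the RMS error shrinks toward 0 (points hug the line)
print(rms_error_from_summary(30, 1.0))
```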

The RMS Error: The RMS error allows us to give a range around our prediction. If the scatter plot is football-shaped, the RMS error is roughly constant across the entire range of the data set: the vertical spread around one part is about the same as the vertical spread around other parts.

The RMS Error: Suppose we have the following five-statistic summary for the height (X) and weight (Y) of men in the US: $\mu_X = 70$ inches, $SD_X = 3$ inches, $\mu_Y = 162$ lbs, $SD_Y = 30$ lbs, r = 0.47. Predict and give the RMS error for the weight of a man who is 6’2”. 180.8 ± 26.5 lbs.

The RMS Error: Suppose we have the following five-statistic summary for the height (X) and weight (Y) of men in the US: $\mu_X = 70$ inches, $SD_X = 3$ inches, $\mu_Y = 162$ lbs, $SD_Y = 30$ lbs, r = 0.47. Predict and give the RMS error for the weight of a man who is 5’4”. 133.8 ± 26.5 lbs.

The Regression Effect: A preschool program attempts to boost students’ IQ scores. The children are tested when they enter the program (pretest) and retested when they leave the program (post-test).

The Regression Effect: On both occasions, the average IQ score was 100, with an SD of 15. Also, students with below-average IQs on the pretest had scores that went up on average by 5 points, while students with above-average scores on the pretest had their scores drop by an average of 5 points.

The Regression Effect: Does the program equalize intelligence? No. If the program really equalized intelligence, then the SD for the post-test results should be smaller than that of the pretest results. This is an example of the regression effect.

The Regression Effect: The regression effect is a byproduct of the fact that predictions from a regression line are average values. Some of the people who did very well on the pretest may simply have had a good test day, so their scores shouldn’t necessarily be as high on the post-test as they were on the pretest. Similarly, some of the people who did poorly on the pretest may simply have had a bad test day, so their scores shouldn’t necessarily be as low on the post-test as they were on the pretest.
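A small simulation can illustrate the point. This is a sketch under an assumed model that is not from the slides: each score is a stable ability plus independent test-day luck, with SDs of 12 and 9 chosen so that scores have an SD of about 15. No treatment effect is built in, yet the below-average group still rises and the above-average group still falls.

```python
import random
import statistics

random.seed(0)
n = 20000

# Assumed model: score = stable ability + independent test-day luck
ability = [random.gauss(100, 12) for _ in range(n)]
pre = [a + random.gauss(0, 9) for a in ability]
post = [a + random.gauss(0, 9) for a in ability]

def mean_change(keep):
    """Average (post - pre) among children whose pretest score passes `keep`."""
    return statistics.mean(q - p for p, q in zip(pre, post) if keep(p))

low_gain = mean_change(lambda p: p < 100)   # below-average pretest group
high_gain = mean_change(lambda p: p > 100)  # above-average pretest group

print(f"below-average group changed by {low_gain:+.1f} points on average")
print(f"above-average group changed by {high_gain:+.1f} points on average")
print(f"SD of pretest {statistics.pstdev(pre):.1f}, post-test {statistics.pstdev(post):.1f}")
```

The two SDs come out nearly equal, which is the sign that the scores have not actually been pulled together.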

The Regression Effect: Sometimes researchers mistake the regression effect for some important underlying cause in the study (regression fallacy). Tall fathers tend to have tall sons who are slightly shorter than the father. There is no biological cause for this reduction; it is strictly statistical.

The Regression Effect: As part of their training, air force pilots make practice landings with instructors, and are rated on performance. The instructors discuss the ratings with the pilots after each landing. Statistical analysis shows that pilots who make poor landings the first time tend to do better the second time. Conversely, pilots who make good landings the first time tend to do worse the second time.

The Regression Effect: The conclusion drawn was that criticism helps the pilots while praise makes them do worse. As a result, instructors were ordered to criticize all landings, good or bad. Was this warranted by the facts? No. This is an example of the regression fallacy.

The Regression Effect: An instructor gives a midterm. She asks the students who score 20 points below average to see her regularly during her office hours for special tutoring. They all score at class average or above on the final. Can this improvement be attributed to the regression effect? Why or why not? No. If it were only the regression effect, most of the students would still have scored below average. The fact that everyone in the tutoring group scored at or above average indicates that the tutoring had the intended effect.

A Second Regression Line: The focus so far has been on the regression line from X to Y. Note, however, that there is also a regression line from Y to X. What would the difference between the two lines be? The regression line from X to Y is given by $z_Y = r\,z_X$, while the regression line from Y to X is given by $z_X = r\,z_Y$.
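One way to see that the two lines differ is to predict in one direction and then back again. The sketch below uses the height and weight summary from the earlier slides; the helper name `predict` is hypothetical.

```python
def predict(value, mean_from, sd_from, mean_to, sd_to, r):
    """Regression prediction from one variable to the other via z_to = r * z_from."""
    z_from = (value - mean_from) / sd_from
    return mean_to + r * z_from * sd_to

r = 0.47
# Regression line from X (height) to Y (weight)
weight = predict(73, 70, 3, 162, 30, r)
# Regression line from Y (weight) to X (height), applied to that prediction
height_back = predict(weight, 162, 30, 70, 3, r)

print(round(weight, 1))       # predicted weight for a 73-inch man
print(round(height_back, 2))  # predicted height for a man of that weight
```

The round trip does not return to 73 inches. Each direction shrinks the standard-unit value by a factor of r, so the result is only $r^2 = 0.2209$ SDs above the average height.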

A Second Regression Line: A study of 1,000 families gives the following. The husbands’ average height was 68 inches with an SD of 2.7 inches. The wives’ average height was 63 inches with an SD of 2.5 inches. The correlation between them was 0.25. Predict and give the RMS error for the husband’s height when his wife’s height is 68 inches. 69.35 inches, give or take 2.61 inches.

A Second Regression Line A study of 1,000 families gives the following The husbands’ average height was 68 inches with an SD of 2.7 inches The wives’ average height was 63 inches with an SD of 2.5 inches The correlation between them was 0.25 Predict and give the RMS error for the wife’s height when her husband’s height is inches inches, give or take 2.42 inches

A Second Regression Line [Figure: scatter plot showing the regression line from X to Y, the regression line from Y to X, and the SD line]

Summary: When trying to make predictions from a football-shaped plot, a good predictor is the average value for one variable within a restricted range in the other. The regression line runs through all of these averages: for every SD moved in the independent variable, the regression line predicts a move of r SDs in the dependent variable. The prediction from the regression line is likely to be off by the RMS error, which can be calculated as $\sqrt{1 - r^2} \cdot SD_Y$.

Summary: The regression effect is purely statistical. It does not reflect a significant underlying trend in the data. There are two regression lines for a scatter plot. Which one to use depends on which variable you are predicting.