Regression: What is regression to the mean?

Regression: What is regression to the mean? Suppose the mean temperature in November is 5 degrees. What's your best guess for tomorrow's temperature? Exactly 5? Warmer than 5? Colder than 5?

Regression: What is regression to the mean? Suppose the mean temperature in November is 5 degrees and today the temperature is 15. What's your best guess for tomorrow's temperature? Exactly 15 again? Exactly 5? Warmer than 15? Something between 5 and 15?

Regression: What is regression to the mean? Regression to the mean is the fact that scores tend to be closer to the mean than the values they are paired with. E.g., daughters tend to be shorter than their mothers if the mothers are taller than the mean, and taller than their mothers if the mothers are shorter than the mean. E.g., parents with high IQs tend to have kids with lower IQs, and parents with low IQs tend to have kids with higher IQs.

Regression: What is regression to the mean? The strength of the correlation between two variables tells you the degree to which regression to the mean affects scores. A strong correlation means little regression to the mean; a weak correlation means strong regression to the mean; no correlation means that one variable has no influence on values of the other, so the mean is always your best guess.
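
None of this needs code, but a small simulation can make the pattern concrete. The sketch below is a minimal illustration, assuming Python with numpy and hypothetical height numbers (population mean 165 cm, SD 6.5 cm, neither of which is from the slides); it draws mother/daughter pairs at three correlations and shows that the daughters of unusually tall mothers average closer to the overall mean as the correlation weakens.

import numpy as np

rng = np.random.default_rng(0)
mean, sd = 165.0, 6.5                         # hypothetical population mean and SD (cm)
for r in (0.9, 0.5, 0.0):                     # strong, moderate, and zero correlation
    cov = [[sd**2, r * sd**2],
           [r * sd**2, sd**2]]
    pairs = rng.multivariate_normal([mean, mean], cov, size=100_000)
    mothers, daughters = pairs[:, 0], pairs[:, 1]
    tall = mothers > mean + sd                # mothers at least one SD above the mean
    # average daughter height falls back toward 165 as the correlation weakens
    print(r, round(daughters[tall].mean(), 1))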

Regression: Suppose you measured workload and credit hours for 8 students. Could you predict the number of homework hours from credit hours?

Regression: Suppose you measured workload and credit hours for 8 students. Your first guess might be to pick the mean number of homework hours, which is 12.9.

Regression: Sum of Squares. Adding up the squared deviation scores gives you a measure of the total error of your estimate.
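
As a concrete version of the 8-student example: the transcript gives only the mean of 12.9, so the homework hours below are hypothetical, chosen to have that mean. The sketch (Python with numpy assumed) guesses the mean for every student and adds up the squared deviations, which is the total error referred to above.

import numpy as np

# hypothetical homework hours for 8 students (mean = 12.9, matching the slides)
homework = np.array([6, 8, 10, 12, 14, 15, 17, 21.2])

mean_hw = homework.mean()                 # 12.9
deviations = homework - mean_hw           # how far each score is from the mean
ss_around_mean = (deviations ** 2).sum()  # total squared error of guessing the mean
print(mean_hw, ss_around_mean)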

Regression: Sum of Squares. Ideally you would pick an equation that minimizes the sum of the squared deviations. You would need a line that is as close as possible to each point.

Regression: The regression line. That line is called the regression line. The sum of squared deviations from it is called the sum of squared error, or SSE.
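
A minimal sketch of that idea, continuing the hypothetical 8-student data (the credit hours are also invented) and letting numpy's np.polyfit find the least-squares line; the squared error around the fitted line (SSE) comes out no larger than the squared error around the mean.

import numpy as np

# hypothetical credit hours and homework hours for 8 students
credits  = np.array([9, 10, 12, 12, 15, 15, 16, 18])
homework = np.array([6, 8, 10, 12, 14, 15, 17, 21.2])

a, b = np.polyfit(credits, homework, 1)              # least-squares slope and intercept
predicted = a * credits + b
sse = ((homework - predicted) ** 2).sum()            # squared error around the regression line
ss_mean = ((homework - homework.mean()) ** 2).sum()  # squared error around the mean
print(sse, ss_mean)                                  # sse <= ss_mean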

Regression: The regression line. That line is called the regression line. Its equation is: y′ = ax + b.

Regression: Remember that the equation of a line is y = ax + b. The regression line uses the same form to produce a predicted y: y′ = ax + b.

Regression: What happens if you had transformed all the scores to z scores and were trying to predict a z score?

Regression: What happens if you had transformed all the scores to z scores and were trying to predict a z score? The slope and intercept of the best-fitting line still depend on the standard deviations and the means, but for z scores Sy = Sx = 1 and both means are 0. So the prediction reduces to Zy′ = rxy Zx.
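
A quick worked example of what that implies, with numbers chosen purely for illustration: if rxy = .5 and someone scores 2 standard deviations above the mean on x (Zx = 2), the predicted z score on y is Zy′ = .5 × 2 = 1, only one standard deviation above the mean. The weaker the correlation, the further the prediction is pulled back toward the mean, which is regression to the mean again.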

The Regression Line: The regression line is a linear function that generates a predicted y for a given x.

The Regression Line: The regression line is a linear function that generates a predicted y for a given x. What should its slope and y intercept be for it to be the best predictor?

The Regression Line: The regression line is a linear function that generates a predicted y for a given x. What should its slope and y intercept be for it to be the best predictor? What does best predictor mean? It means the least distance between the predicted y and the actual y for a given x.

The Regression Line: The regression line is a linear function that generates a predicted y for a given x. What should its slope and y intercept be for it to be the best predictor? What does best predictor mean? It means the least distance between the predicted y and the actual y for a given x; in other words, the least variability left over (residual) after using the correlation to explain the y scores.

Mean Square Residual: Recall that the variance is the average squared deviation of scores from their mean: Sy² = Σ(y – ȳ)² / N.

Mean Square Residual: The variance of Zy is the average squared distance of each point from the x axis (note that the mean of Zy is 0).

Mean Square Residual: Some of the variance in the Zy scores is due to the correlation with x. Some of the variance in the Zy scores is due to other (probably random) factors.

Mean Square Residual: The variance due to other factors is called "residual" because it is "left over" after fitting a regression line. The best predictor should minimize this residual variance.

Mean Square Residual: MSres is the average squared deviation of the actual scores from the regression line: MSres = Σ(Zy – Zy′)² / N.

Minimizing MSres: The regression line (the best predictor of y) is the line with a slope and y intercept such that MSres is minimized.

Minimizing MSres: What will its y intercept be? If there were no correlation at all, your best guess for y at any x would be the mean of y. If there were a strong correlation between x and y, your best guess for the y that matches the mean x would still be the mean y. The mean of Zx is zero, so the best guess for the Zy that goes with it will be zero (the mean of the Zy's).

Minimizing MSres: In other words, the regression line will predict zero when Zx is zero, so the y intercept of the regression line will be zero (this holds only for z scores!).

Minimizing MSres: The y intercept is zero.

Minimizing MSres: What is the slope?

Minimizing MSres: What is the slope? Consider the extremes. If Zy = Zx, then Zy′ = Zx and the slope is 1. If Zy = –Zx, then Zy′ = –Zx and the slope is –1. If Zy is random with respect to Zx, then Zy′ = mean of Zy = 0 and the slope is 0. Do the slopes look familiar?

Minimizing MSres: A line (the regression of Zy on Zx) that has a slope of rxy and a y intercept of zero minimizes MSres.
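
A small numeric check of that claim, on hypothetical simulated data (Python with numpy assumed): standardize x and y, try a grid of candidate slopes for a line through the origin, and confirm that the slope giving the smallest mean squared residual sits at rxy.

import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=500)                        # hypothetical predictor
y = 0.6 * x + rng.normal(scale=0.8, size=500)   # hypothetical correlated response

zx = (x - x.mean()) / x.std()                   # z scores
zy = (y - y.mean()) / y.std()
r = np.corrcoef(x, y)[0, 1]

slopes = np.linspace(-1, 1, 2001)               # candidate slopes (intercept fixed at 0)
ms_res = [((zy - a * zx) ** 2).mean() for a in slopes]
best = slopes[int(np.argmin(ms_res))]
print(round(r, 3), round(best, 3))              # the minimizing slope matches rxy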

Predicting raw scores: We have a regression line in z scores, Zy′ = rxy Zx. Can we use it to predict a raw-score y from a raw-score x?

Predicting raw scores: Recall that Zx = (x – x̄) / Sx and Zy = (y – ȳ) / Sy.

Predicting raw scores: By substituting we get: (y′ – ȳ) / Sy = rxy (x – x̄) / Sx.

Predicting raw scores: Solving for y′ gives y′ = rxy (Sy / Sx) x + (ȳ – rxy (Sy / Sx) x̄). Note that this is still of the form y = ax + b: the slope a = rxy (Sy / Sx) still depends on r, and the intercept b = ȳ – a x̄ still depends on the mean of y.
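
The same relationship can be checked numerically. The sketch below (hypothetical data, Python with numpy assumed) builds the slope and intercept from r, the standard deviations, and the means, then compares them with a direct least-squares fit from np.polyfit.

import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(50, 10, size=200)                  # hypothetical predictor
y = 3.0 + 0.4 * x + rng.normal(0, 5, size=200)    # hypothetical response

r = np.corrcoef(x, y)[0, 1]
a = r * y.std() / x.std()                         # slope: a = rxy * Sy / Sx
b = y.mean() - a * x.mean()                       # intercept: b = ybar - a * xbar
print(a, b)
print(np.polyfit(x, y, 1))                        # same slope and intercept from least squares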

Interpreting rxy in terms of variance: Recall that rxy is the slope of the regression line that minimizes MSres.

Interpreting rxy in terms of variance: MSres = Σ(y – y′)² / N can be simplified to: MSres = Sy² (1 – rxy²).
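
A quick check of that simplification on hypothetical data (Python with numpy assumed): compute MSres directly from the fitted line and compare it with Sy² (1 – rxy²).

import numpy as np

rng = np.random.default_rng(3)
x = rng.normal(size=1000)
y = 1.5 * x + rng.normal(size=1000)         # hypothetical data

r = np.corrcoef(x, y)[0, 1]
a = r * y.std() / x.std()                   # slope of the raw-score regression line
b = y.mean() - a * x.mean()                 # intercept
ms_res = ((y - (a * x + b)) ** 2).mean()    # average squared deviation from the line
print(ms_res, y.var() * (1 - r ** 2))       # the two quantities agree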

Interpreting rxy in terms of variance: Thus the variance accounted for by the regression is Sy² – MSres = rxy² Sy².

Interpreting rxy in terms of variance: Thus rxy² = (Sy² – MSres) / Sy², so rxy² can be thought of as the proportion of original variance accounted for by the regression line.

Interpreting rxy in terms of variance: [Figure: for an observed y, the distance from the mean of y is split at the regression line into the part from the mean to the predicted y (explained by the regression) and the part from the predicted y to the observed y (the residual, which is subtracted).]

Interpreting rxy in terms of variance: It follows that 1 – rxy² is the proportion of variance not accounted for by the regression line; this is the residual variance.

Interpreting rxy in terms of variance: This can be thought of as a partitioning of variance into the variance accounted for by the regression and the variance unaccounted for.

Interpreting rxy in terms of variance: Sy² = rxy² Sy² + (1 – rxy²) Sy²: the total variance is the variance accounted for by the regression plus the variance unaccounted for.

Interpreting rxy in terms of variance: This is often written in terms of sums of squares: Σ(y – ȳ)² = Σ(y′ – ȳ)² + Σ(y – y′)², or simply SStotal = SSregression + SSresidual.
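
A final check of the partition on hypothetical data (Python with numpy assumed): the regression and residual sums of squares add up to the total, and SSregression / SStotal reproduces rxy².

import numpy as np

rng = np.random.default_rng(4)
x = rng.normal(size=300)
y = 2.0 * x + rng.normal(size=300)    # hypothetical data

a, b = np.polyfit(x, y, 1)
y_hat = a * x + b                     # predicted y for each x

ss_total = ((y - y.mean()) ** 2).sum()
ss_reg   = ((y_hat - y.mean()) ** 2).sum()
ss_res   = ((y - y_hat) ** 2).sum()
r = np.corrcoef(x, y)[0, 1]
print(ss_total, ss_reg + ss_res)      # equal: SStotal = SSregression + SSresidual
print(r ** 2, ss_reg / ss_total)      # r squared = proportion of variance accounted for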