EC339: Lecture 6 Chapter 5: Interpreting OLS Regression.


Regression as Double Compression
 Workbooks: DoubleCompression.xls, EastNorthCentralFTWorkers.xls
 [DoubleCompression]SATData: Math and Verbal scores; note the simple summary statistics and correlation
 Conditional mean E[Y|X]: start with a scatterplot

First Compression
 Examine values of Y over various small ranges of X; recognize the 'variance' of Y within these strips
 Create a conditional mean function: average values of Y given X (VerticalStrips); slide the bar along the X-axis
 Move to Accordian: examine 0, 2, 4, and many intervals

First Compression
 Before compressing: what is the best guess at Math SAT unconditional on X?
 After compressing: what is the best guess at Math SAT conditional on X?
 Individual variation is hidden in the "Graph of Averages"
 Graphical equivalent of a PivotTable (see AccordianPivot)
 500+ observations summarized with 35
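The "graph of averages" idea can be sketched outside Excel as well. A minimal Python sketch of the first compression, using simulated Verbal/Math pairs in place of the SATData sheet (the score distributions and the 0.55 relationship are assumptions for illustration):

```python
import random
import statistics

# Simulated stand-in for [DoubleCompression]SATData (values are made up).
random.seed(1)
verbal = [random.gauss(500, 90) for _ in range(500)]
math_ = [0.55 * (v - 500) + 500 + random.gauss(0, 70) for v in verbal]

def graph_of_averages(x, y, n_bins):
    """First compression: average Y within vertical strips of X."""
    lo, hi = min(x), max(x)
    width = (hi - lo) / n_bins
    strips = {}
    for xi, yi in zip(x, y):
        b = min(int((xi - lo) / width), n_bins - 1)  # which strip xi falls in
        strips.setdefault(b, []).append(yi)
    return {b: statistics.mean(ys) for b, ys in sorted(strips.items())}

averages = graph_of_averages(verbal, math_, 35)
print(len(verbal), "observations compressed into", len(averages), "conditional means")
```

Sliding the bin count from 2 to 4 to many mirrors the Accordian sheet: more strips means a finer conditional mean function, at the cost of fewer observations per strip.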

Second Compression
 Linearize the graph of averages
 Regression line with observations and averages: a smooth linear version of the plot of averages
 Predicted Math SAT as a linear function of Verbal SAT; this equation gives SIMILAR results to the previous compression
 Now summarized with two numbers (intercept and slope)
 Interpret as predicted Y given X

Only an Interpretation, Not a Method
 "Double compression" of the data is an interpretation of regression, not how it is done
 The method is either analytical or numerical (an algorithm)
 In [EastNorthCentralFTWorkers]Regression: given education, estimate annual earnings
 PivotTable: E[earn|edu=12] = 27,933; Regression: E[earn|edu=12] = 28,399
 What else might be going on here?

Regression and SD Line
 [DoubleCompression]SATData
 Examine the two lines: the SD line and the regression line
 Notice the slope of the regression line is shallower
 Remember the equation for the slope in SIMPLE regression
 Notice the poor prediction with the SD line; calculate the residuals

Another Example (SD vs. OLS)
 Go back to Reg.xls and calculate SSR from both lines (SDx = 4.18, SDy = 50.56, xbar = 7.5, ybar = 145.6)
 Calculate the SD line and compare SSR from the SD and OLS lines
 What does the point of averages mean here?

OLS Regression vs. SD
 Simple Linear Regression (SLR): the slope is the SD-line slope × the correlation r
 Since |r| ≤ 1, the regression line must be no steeper than the SD line
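The shrinking of the SD-line slope by r can be checked numerically. A hedged sketch with simulated data (the numbers are made up; only the two slope formulas come from the slides):

```python
import random
import statistics

# Made-up data; only the slope formulas below come from the slides.
random.seed(2)
n = 100
x = [random.gauss(7.5, 4.18) for _ in range(n)]
y = [10 * xi + random.gauss(0, 30) for xi in x]

xbar, ybar = statistics.mean(x), statistics.mean(y)
sx, sy = statistics.pstdev(x), statistics.pstdev(y)
r = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / (n * sx * sy)

sd_slope = sy / sx        # SD line: matches up the standard deviations
ols_slope = r * sy / sx   # regression line: SD-line slope shrunk by r

def ssr(slope):
    # Both lines pass through the point of averages (xbar, ybar).
    return sum((yi - (ybar + slope * (xi - xbar))) ** 2 for xi, yi in zip(x, y))

print(ssr(ols_slope) <= ssr(sd_slope))  # OLS line fits at least as well
```

Among all lines through the point of averages, the OLS slope minimizes SSR, which is why the SD line always predicts at least as poorly.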

Two Regression Lines
 Open TwoRegressionLines.xls
 In the PivotTables, what do you notice? What happens in the TwoLines table?
 Do the equations change when you switch axes? Compare with SDLine
 How do you phrase the different regression lines? Do these lines have different meanings?
 Can you just solve one regression line to find the other?
 "Someone who is 89 points above the mean in Verbal (1 SD) is predicted to be 0.55 × 87 (r × SDmath), or 48 points, above the mean Math score" (thus, regress!)

Two Regression Lines
 Given a verbal score, what is the best guess of a person's math score? If Verbal = 600, Predicted Math SAT = 642
 Given a math score, what is the best guess of a person's verbal score? Solve the first equation for verbal?
 NO!!! This is not correct! You must regress verbal on math (verbal is predicted)
 From the regression of verbal on math: if Math SAT = 642, you would predict Verbal SAT = 538
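The warning not to solve one line for the other can be verified: the slope from regressing math on verbal and the slope from regressing verbal on math multiply to r², not to 1. A sketch with simulated SAT-like data (all values are assumptions):

```python
import random
import statistics

# Simulated SAT-like scores (values are assumptions for illustration).
random.seed(3)
n = 500
verbal = [random.gauss(500, 89) for _ in range(n)]
math_ = [0.55 * (v - 500) + 500 + random.gauss(0, 72) for v in verbal]

def slope(x, y):
    """OLS slope from regressing y on x."""
    xbar, ybar = statistics.mean(x), statistics.mean(y)
    return (sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
            / sum((xi - xbar) ** 2 for xi in x))

b_math_on_verbal = slope(verbal, math_)  # predicts Math from Verbal
b_verbal_on_math = slope(math_, verbal)  # predicts Verbal from Math

# The slopes are NOT reciprocals: their product is r-squared, which is < 1.
print(b_math_on_verbal * b_verbal_on_math)
```

If the two lines were algebraic rearrangements of each other, the product would be exactly 1; the gap from 1 is the regression effect.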

Properties of Sample Average and Regression Line
 Examine OLSFormula.xls, sheet SampleAveIsOLS
 The sample average is a weighted sum; here, the weights sum to 1
 The sample average is also the least squares estimator of central tendency
 Examine Excel's "Auditing Tool"
 The average's SSR is never greater than the median's SSR

Mean Minimizes SSR (the Mean Is the OLS Estimate)
 Run Solver: minimize SSR by changing "Solver Estimate," starting at 100
 Note the sum of the residuals (cell F16)
 Try "Draw another dataset" in the Live sheet; what happens to the sum of residuals?
 Try using the median
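What Solver finds in the workbook can be mimicked directly: among candidate guesses, the mean gives the smallest SSR, and its residuals sum to zero. A small sketch with a made-up dataset:

```python
import statistics

# Tiny made-up dataset standing in for the workbook's column of values.
data = [3, 7, 7, 12, 21]
mean, median = statistics.mean(data), statistics.median(data)

def ssr(c):
    """Sum of squared residuals from guessing the constant c."""
    return sum((v - c) ** 2 for v in data)

# The mean beats every other candidate guess, including the median.
candidates = [median, 0, 100, mean + 0.5]
print(all(ssr(mean) <= ssr(c) for c in candidates))  # True

# At the mean, the residuals sum to zero (cell F16 in the workbook).
print(abs(sum(v - mean for v in data)) < 1e-9)  # True
```

The median would instead minimize the sum of absolute deviations, which is why Solver drifts away from it when the criterion is SSR.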

[OLSFormula.xls]Example
 Recall the "weighted sum" calculation of the slope coefficient
 The regression goes through the point of averages
 The slope is a weighted sum of the Y's
 Weights are bigger in absolute value the farther the x-value is from the average value of x
 The weights sum to zero
 A change in a Y value has a predictable effect on the OLS slope and intercept
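The "weighted sum" formula can be written out explicitly. A sketch with a tiny made-up dataset, assuming the standard weights w_i = (x_i − x̄) / Σ_j (x_j − x̄)²:

```python
import statistics

# Tiny made-up dataset to show the slope as a weighted sum of the y-values.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 6.0]
xbar, ybar = statistics.mean(x), statistics.mean(y)

# w_i = (x_i - xbar) / sum_j (x_j - xbar)^2: larger in absolute value
# the farther x_i sits from xbar, and summing to zero.
denom = sum((xi - xbar) ** 2 for xi in x)
w = [(xi - xbar) / denom for xi in x]

slope = sum(wi * yi for wi, yi in zip(w, y))                           # weighted sum of Y's
slope_cov = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / denom  # usual formula

print(round(sum(w), 10), round(slope, 6), round(slope_cov, 6))
```

Because the weights are fixed by the x-values alone, bumping any single y_i changes the slope by exactly w_i times the bump, which is the "predictable effect" the slide refers to.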

Residuals and Root Mean Squared Error (RMSE) (ResidualPlot.xls)
 Residual plots are "diagnostic tools": there should be no discernible pattern in the plot
 Residual plots can be done in Excel and SPSS
 Try using the LINEST method here to calculate the residuals: first find the equation, then calculate predicted values, then calculate residuals
 Remember: Residual = Actual Y − Predicted Y
 Now square the residuals, find the average, and take the square root: the Root (of the) Mean (of the) Squared Errors
 RMSE measures your average mistake
 Examine a scatterplot and histogram of the residuals

 RMSE is "like" the standard deviation of the residuals (but slightly different; see RMSE.xls for the true difference)
 A general measure of dispersion
 Also known as the "Standard Error of the Regression"

RMSE.xls
 For many data sets, 68% of the observations fall within +/− 1 RMSE and 95% fall within +/− 2 RMSEs; when the RMSE is working as advertised, it should reflect these facts
 Try changing the spread in Computation
 Examine the histograms in SATData and Accordian to see that the RMSE is the spread of the residuals
 The Pictures sheet shows residual plots and regressions
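The slides' recipe (residual, square, average, square root) can be followed step by step. A sketch with simulated data (slope, intercept, and noise level are assumptions); note how the result compares with the sample SD of the residuals, whose divisor is n − 1 rather than n:

```python
import math
import random
import statistics

# Simulated data; slope, intercept, and noise level are assumptions.
random.seed(4)
n = 200
x = [random.uniform(0, 10) for _ in range(n)]
y = [3 + 2 * xi + random.gauss(0, 5) for xi in x]

# Fit the line, then follow the slides' recipe:
# residual = actual - predicted; square, average, take the square root.
xbar, ybar = statistics.mean(x), statistics.mean(y)
b = (sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
     / sum((xi - xbar) ** 2 for xi in x))
a = ybar - b * xbar

residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]
rmse = math.sqrt(sum(e ** 2 for e in residuals) / n)

# "Like" the SD of the residuals, but the sample SD divides by n - 1
# (and the standard error of the regression by n - 2), so they differ slightly.
print(round(rmse, 3), round(statistics.stdev(residuals), 3))
```

With 200 observations the divisor barely matters, which is why the slides say the RMSE is "like" the standard deviation of the residuals.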

RSquared.xls
 Play the game to convey the idea of the improvement in prediction
 R² measures the percentage improvement in prediction over just guessing average Y
 R² ranges from 0 to 1
 R² is a dangerous statistic because it is sometimes mistaken for a measure of the quality of a regression; notice that Excel's Trendline offers R² (and only this statistic) as an option
 No single statistic can decide whether a particular regression equation is good or bad

R² Calculation
 Total Sum of Squares: TSS = Σ(yᵢ − ȳ)²
 Sum of Squared Residuals: SSR = Σ(yᵢ − ŷᵢ)²
 Sum of Squares Explained: SSE = Σ(ŷᵢ − ȳ)²

Ordinary Least Squares Fit
 R² = SSE/TSS = 1 − SSR/TSS
 Note: this is the RATIO of the variation in y that is EXPLAINED by the regression!
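The decomposition behind this ratio can be verified numerically. A sketch with simulated data (slope and noise level are assumptions), confirming TSS = SSE + SSR and that the ratio lands between 0 and 1:

```python
import random
import statistics

# Simulated data; slope and noise level are assumptions for illustration.
random.seed(5)
n = 150
x = [random.uniform(0, 10) for _ in range(n)]
y = [1 + 0.8 * xi + random.gauss(0, 2) for xi in x]

xbar, ybar = statistics.mean(x), statistics.mean(y)
b = (sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
     / sum((xi - xbar) ** 2 for xi in x))
a = ybar - b * xbar
yhat = [a + b * xi for xi in x]

tss = sum((yi - ybar) ** 2 for yi in y)               # total variation in y
ssr = sum((yi - yh) ** 2 for yi, yh in zip(y, yhat))  # unexplained
sse = sum((yh - ybar) ** 2 for yh in yhat)            # explained

r2 = sse / tss
print(abs((sse + ssr) - tss) < 1e-6)  # TSS = SSE + SSR
print(0 <= r2 <= 1)
```

The identity TSS = SSE + SSR holds only because OLS residuals are uncorrelated with the fitted values; for a line fit some other way, the two R² formulas could disagree.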

Infant Mortality (IMRGDPReg.xls)
 Start by moving the data into SPSS
 Plot the data: what is the relationship?
 Save the residuals and plot them against the independent variable
 What does this plot tell us?

SameRegLineDifferentData.xls

Real Data: HourlyEarnings.xls
 Residuals: why are the data shown in "strips"?
 Regress & save the residuals

Regression Interpretation: Summary
 A simplified conditional mean (more on this)
 Intercept and slope coefficient
 Need theory to guide causation; BH call this "Two Regression Lines"
 OLS is a weighted average
 RMSE and R² are helpful