Regression Analysis: Statistical Inference



Simple Linear Regression Model (SLR) Assume the relationship to be linear: y = β₀ + β₁x + ε, where y = dependent variable, x = independent variable, β₀ = y-intercept, β₁ = slope, ε = random error.
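As a minimal sketch (not part of the slides), the least squares estimates b₀ and b₁ can be computed directly from the (x, y) pairs; the simulated data and variable names below are illustrative assumptions.

```python
# Minimal illustrative sketch (not from the slides): fit y = b0 + b1*x by least squares.
import numpy as np

rng = np.random.default_rng(42)
x = rng.uniform(0, 10, size=30)            # hypothetical independent variable
y = 2.0 + 0.8 * x + rng.normal(0, 1, 30)   # hypothetical data with assumed beta0 = 2, beta1 = 0.8

n = len(x)
x_bar, y_bar = x.mean(), y.mean()
ss_xx = np.sum((x - x_bar) ** 2)
ss_xy = np.sum((x - x_bar) * (y - y_bar))

b1 = ss_xy / ss_xx        # least squares estimate of the slope
b0 = y_bar - b1 * x_bar   # least squares estimate of the intercept
print(f"fitted model: y_hat = {b0:.3f} + {b1:.3f} x")
```

The later sketches in this transcript reuse x, y, b0, b1, and ss_xx from this block.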

Random Error Component () Makes this a probabilistic model... Represents uncertainty   random variation not explained by x Deterministic Model = Exact relationship Example: Temperature: oF = 9/5 oC + 32 Assets = Liabilities + Equity Probabilistic Model = Det. Model + Error

Model Parameters 0 and 1 Estimated from the data Data collected as a pair (x,y)

Model Assumptions E() = 0 Var() = 2  is normally distributed I are independent Before performing regression analysis, these assumptions should be validated.

Assumptions for Regression Unknown relationship: Y = β₀ + β₁X. Recall that the linear regression model has the form Y = β₀ + β₁X + ε. When you perform a regression analysis, several assumptions about the distribution of the error terms must be met to provide valid hypothesis tests and confidence intervals. The assumptions are that the error terms ε ~ i.i.d. N(0, σ²): they have a mean of 0 at each value of the predictor variable, are normally distributed at each value of the predictor variable, have the same variance at each value of the predictor variable, and are independent, thus making them i.i.d. A rough sketch of checking these assumptions appears below.
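A hedged sketch (not the slides' procedure) of how these error-term assumptions might be checked on the simulated data from the earlier sketch; the specific checks are illustrative choices.

```python
# Illustrative assumption checks, reusing x, y, b0, b1 from the earlier sketch.
import numpy as np
from scipy import stats

residuals = y - (b0 + b1 * x)

# Mean of residuals is near 0 by construction of least squares.
print("mean residual:", residuals.mean())

# Shapiro-Wilk test for normality of the residuals.
w_stat, p_norm = stats.shapiro(residuals)
print("Shapiro-Wilk p-value:", p_norm)

# Rough constant-variance check: compare residual spread in the lower and upper halves of x.
order = np.argsort(x)
lower = residuals[order[: len(x) // 2]]
upper = residuals[order[len(x) // 2 :]]
print("std (low x):", lower.std(ddof=1), " std (high x):", upper.std(ddof=1))
```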

Scatter Plot of Correct Model Y = 3.0 + 0.5X, R² = 0.67. To illustrate the importance of plotting data, consider the following four examples. In each example, the scatter plot of the data values is different. However, the regression equation and the R-square statistic are the same. In the first plot, a regression line adequately describes the data.

Scatter Plot of Curvilinear Model Y = 3.0 + 0.5X, R² = 0.67. In the second plot, a simple linear regression model is not appropriate because you are fitting a straight line through a curvilinear relationship.

Scatter Plot of Outlier Model Y = 3.0 + 0.5X, R² = 0.67. In the third plot, there seems to be an outlying data value that is affecting the regression line.

Scatter Plot of Influential Model Y = 3.0 + 0.5X, R² = 0.67. In the fourth plot, the outlying data point dramatically changes the fit of the regression line. In fact, the slope would be undefined without the outlier.

Homogeneous Variance

Heterogeneous Variance

Model Assumptions (Cont.) Recall ε ~ N(0, σ²). σ² is unknown and must be estimated. Recall the one-sample case. In regression, we use the Mean Squared Error (MSE) to estimate σ².

Degrees of Freedom (df) In general, the df associated with the estimation of σ² in regression is n − (k + 1), where n = sample size, k = number of independent variables, and "1" represents the intercept.

Degrees of Freedom - Example Model: y = β₀ + β₁x₁ + β₂x₂ + β₃x₃ + ε. The degrees of freedom associated with this model are n − (3 + 1) = n − 4.

What Does MSE Mean? (see Central Company Output) Just as with the sample variance, a more intuitive meaning comes from the standard deviation s = √MSE. Approximately 95% of all observed y values should fall within ±2s of their predicted values.
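A small sketch (not from the slides) of computing MSE, s, and the ±2s check on the simulated data from the earlier sketch.

```python
# Illustrative sketch: estimate sigma^2 with MSE and apply the +/- 2s rule,
# reusing x, y, b0, b1 from the earlier sketch.
import numpy as np

n, k = len(x), 1                      # simple linear regression: one predictor
residuals = y - (b0 + b1 * x)
sse = np.sum(residuals ** 2)          # sum of squared errors
mse = sse / (n - (k + 1))             # MSE estimates sigma^2 with n - (k + 1) df
s = np.sqrt(mse)                      # standard error of the regression

inside = np.mean(np.abs(residuals) <= 2 * s)
print(f"s = {s:.3f}; fraction of points within 2s of the fitted line: {inside:.2%}")
```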

Inferences about 1 Goal is to model the relationship between x and y via y = 0 + 1x +  What does it mean if there is no relationship?

Inferences about 1 (Cont.) Graphically...

Inferences about 1 (Cont.) What hypothesis are we interested in? We want to test whether 1 is significantly different from 0 That is, H0: 1 = 0 H1: 1  0

Inferences about 1 (Cont.) Need sampling dist. of the est. for 1 FACT: For the model y = 0 + 1x + , with N(0, 2), the LS estimator of 1 is normal with a mean of 1 and a variance of 2/SSxx.

Inferences about 1 (Cont.) Test Statistic Has (n-2) degrees of freedom for SLR 1 normally will be 0 because we just want to determine if there is a relationship between x and y

Hypothesis Test for 1 Null Hypothesis: H0: 1 = 0 Alternative Hypothesis H1: 1 < 0 H1: 1 > 0 H1: 1  0 Test Statistic Rejection Region - Rej. H0 if tobs < -t,df - Rej. H0 if tobs > t,df - Rej. H0 if tobs < -t/2,df or if tobs > t/2,df Decision and Conclusion in terms of problem

Confidence Interval for 1 A 100(1-)% confidence interval (CI) for 1 is given by Interpretation: We are 100(1-)% confident that the true mean change in response per unit change in x is within the LCL and the UCL for 1. What affects CI? confidence level sample size

Inferences about Slope - Example 1 The director of admissions of a small college administered a newly designed entrance test to 20 students selected at random from the new freshman class in a study to determine whether a student’s grade point average (GPA) at the end of the freshman year (y) can be predicted from the entrance test score (x).

Inferences about Slope - Example 1 (Cont.) Obtain the least squares estimates of β₀ and β₁, and state the estimated regression equation.

Inferences about Slope - Example 1 (Cont.) Obtain a 99% confidence interval for β₁. Interpret your confidence interval. α = 0.01, α/2 = 0.005, df = 20 − 2 = 18, t(0.005, 18) = 2.878. The 99% confidence interval is 0.8399 ± 2.878 · (0.4350 / 3.0199) = 0.8399 ± 0.4146, i.e., (0.4253, 1.2545). Interpretation: We are 99% confident that the true value of β₁ is contained in this interval. Meaning: ?
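The interval arithmetic on this slide can be reproduced numerically; treating 0.4350 as s and 3.0199 as √SSxx is an interpretation, since the slide shows only the ratio.

```python
# Reproducing the slide's 99% CI for beta1 from its summary numbers.
from scipy import stats

b1_example = 0.8399
se_b1_example = 0.4350 / 3.0199           # interpreted as s / sqrt(SSxx)
t_half = stats.t.ppf(1 - 0.005, 18)       # t(0.005, 18) ≈ 2.878
margin = t_half * se_b1_example
print(f"99% CI for beta1: ({b1_example - margin:.4f}, {b1_example + margin:.4f})")
# Expected to match the slide: approximately (0.4253, 1.2545).
```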

F Test for Linear Regression Model To test H0: β₁ = β₂ = … = βk = 0 versus Ha: at least one of β₁, β₂, …, βk is not equal to 0. Test statistic: F(model) = MSR / MSE. Reject H0 in favor of Ha if F(model) > Fα or p-value < α. Fα is based on k numerator and n − (k + 1) denominator degrees of freedom.
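An illustrative sketch (not from the slides) of the overall F test using statsmodels on simulated data with hypothetical predictors.

```python
# Illustrative overall F test for a multiple regression model.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n_obs = 40
X = rng.normal(size=(n_obs, 3))                                # three hypothetical predictors
y_mr = 1.0 + 0.5 * X[:, 0] - 0.3 * X[:, 1] + rng.normal(0, 1, n_obs)

model = sm.OLS(y_mr, sm.add_constant(X)).fit()
print("F(model) =", model.fvalue, " p-value =", model.f_pvalue)
# Reject H0: beta1 = beta2 = beta3 = 0 if the p-value is below the chosen alpha.
```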

The Partial F Test: Testing the Significance of a Portion of a Regression Model To test H0: βg+1 = βg+2 = … = βk = 0 versus Ha: at least one of βg+1, βg+2, …, βk is not equal to 0. Partial F statistic: F = [(SSE(reduced) − SSE(full)) / (k − g)] / [SSE(full) / (n − (k + 1))]. Reject H0 in favor of Ha if F > Fα or p-value < α. Fα is based on k − g numerator and n − (k + 1) denominator degrees of freedom.
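A follow-on sketch of the partial F test, comparing the full model from the previous sketch with a reduced model that drops the last predictor; which predictors to drop is an illustrative choice.

```python
# Illustrative partial F test: does the last predictor add to the first two?
# Reuses X and y_mr from the overall F test sketch above.
import statsmodels.api as sm

full = sm.OLS(y_mr, sm.add_constant(X)).fit()             # all k = 3 predictors
reduced = sm.OLS(y_mr, sm.add_constant(X[:, :2])).fit()   # first g = 2 predictors only

f_stat, p_value, df_diff = full.compare_f_test(reduced)
print(f"partial F = {f_stat:.3f}, numerator df = {df_diff:.0f}, p-value = {p_value:.4g}")
```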

Multiple Regression Salsberry Realty

Estimation & Prediction The fitted SLR model is ŷ = b₀ + b₁x. Estimating y at a given value of x, say xp, yields the same value as predicting y at xp. The difference is in the precision of the estimate... the sampling errors.

Estimation & Prediction (Cont.) Sampling error for the estimate of the mean of y at xp: s·√(1/n + (xp − x̄)²/SSxx). Sampling error for the prediction of y at xp: s·√(1 + 1/n + (xp − x̄)²/SSxx).

Estimation & Prediction (Cont.) A 100(1-)% Confidence Interval for y at x=xp is given by

Estimation & Prediction (Cont.) A 100(1-)% Prediction Interval for y at x=xp is given by