Analisa Regresi (Regression Analysis), Week 7: The Multiple Linear Regression Model


Analisa Regresi (Regression Analysis), Week 7: The Multiple Linear Regression Model. Outline: key ideas from the case study; model description and assumptions; the general linear model and the least squares procedure; inference for multiple regression.

Key ideas from case study (1) First, look at graphical and numerical summaries for one variable at a time. Then, look at relationships between pairs of variables with graphical and numerical summaries: use plots and correlations.

Key ideas from case study (2) The relationship between a response variable and an explanatory variable depends on what other explanatory variables are in the model. A variable can be a significant (P < .05) predictor alone and not significant (P > .05) when other X's are in the model.

Key ideas from case study (3) Regression coefficients, standard errors and the results of significance tests depend on what other explanatory variables are in the model

Key ideas from case study (4) Significance tests (P values) do not tell the whole story. The squared multiple correlation (R2), which gives the proportion of variation in the response variable explained by the explanatory variables, can give a different view. We often express R2 as a percent.

Key ideas from case study (5) You can fully understand the theory in terms of Y = Xβ + ε. To effectively use this methodology in practice, you need to understand how the data were collected, the nature of the variables, and how they relate to each other.

Model Description and Assumptions Consider an experiment in which data of the following type are generated: for each case i = 1, … , n we observe a response Yi together with k regressor values Xi1, … , Xik.

Model Description and Assumptions (2) If the experimenter is willing to assume that, in the region of the X's defined by the data, Yi is related approximately linearly to the regressor variables, then the model formulation is:
Yi = β0 + β1Xi1 + β2Xi2 + … + βkXik + εi (1)
where
Yi is the response variable for the ith case
Xi1, Xi2, … , Xik are k explanatory variables for cases i = 1 to n, with n ≥ k + 1
εi is a model error
β0 is the intercept
β1, β2, … , βk are the regression coefficients for the explanatory variables

Model Description and Assumptions (3) What is a linear model? A linear model is defined as a model that is linear in the parameters, i.e., linear in the coefficients, the β's in eq. (1). For example, both of these are linear models:
A model quadratic in X: Yi = β0 + β1Xi + β2Xi^2 + εi
A linear model with interaction: Yi = β0 + β1Xi1 + β2Xi2 + β3Xi1Xi2 + εi

Model Description and Assumptions (4) What is the meaning of the regression coefficients? β0 is the Y intercept of the regression plane. If the scope of the model includes Xi1 = 0, … , Xik = 0, it gives the mean response at Xi1 = 0, … , Xik = 0. The parameter β1 indicates the change in the mean response E(Y) per unit change in X1 when the other X's are held constant. The β's are often called partial regression coefficients because they reflect the partial effect of one independent variable when the other variables are included in the model and held constant.

Model Description and Assumptions (5) The εi are independent, normally distributed random errors with mean 0 and variance σ2. The Xij are not random and are measured with negligible error.

The GLM and the Least Squares Procedure In matrix terms the model is Y = Xβ + ε, where: Y is an n×1 vector of responses; β is a p×1 vector of parameters (p = k + 1); X is an n×p matrix of constants; ε is a vector of model errors with ε ~ N(0, σ2I). Consequently, Y ~ N(Xβ, σ2I).

The GLM and the Least Squares Procedure (2) The least squares criterion is to minimize e'e = (Y – Xb)'(Y – Xb). The least squares normal equations are (X'X)b = X'Y. Assuming X is of full column rank, b = (X'X)-1X'Y.

The GLM and the Least Squares Procedure (3) The fitted (predicted) values are Ŷ = Xb, and the residuals are e = Y – Ŷ, or e = (I – H)Y, where H = X(X'X)-1X' is the hat matrix. The variance-covariance matrix of the residuals is σ2{e} = σ2(I – H), which is estimated by s2{e} = MSE(I – H).
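To make the matrix formulas above concrete, here is a minimal numpy sketch on made-up data (the numbers and variable names are illustrative, not from the lecture); it also computes s2{b} = MSE (X'X)-1, which the next two slides discuss:

```python
import numpy as np

# Illustrative data: n = 6 cases, k = 2 explanatory variables;
# the first column of 1s corresponds to the intercept beta0
X = np.array([[1, 2.0, 50.0],
              [1, 3.0, 40.0],
              [1, 5.0, 60.0],
              [1, 7.0, 55.0],
              [1, 8.0, 45.0],
              [1, 9.0, 70.0]])
Y = np.array([10.0, 12.0, 15.0, 19.0, 20.0, 24.0])

XtX = X.T @ X
b = np.linalg.solve(XtX, X.T @ Y)   # solves the normal equations (X'X)b = X'Y

Y_hat = X @ b                        # fitted values Y-hat = Xb
H = X @ np.linalg.inv(XtX) @ X.T     # hat matrix, so Y_hat = H Y
e = Y - Y_hat                        # residuals e = (I - H) Y

n, p = X.shape
MSE = (e @ e) / (n - p)              # estimates sigma^2
s2_b = MSE * np.linalg.inv(XtX)      # estimated var-cov matrix of b
print(b, MSE)
```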

Distribution of b Under the model assumptions, b ~ N(β, σ2(X'X)-1): b is an unbiased estimator of β with variance-covariance matrix σ2{b} = σ2(X'X)-1.

Estimation of variance of b Replacing σ2 by its estimator MSE gives the estimated variance-covariance matrix s2{b} = MSE (X'X)-1; s2(bi) is the ith diagonal element of s2{b}, and s(bi) is its square root.

ANOVA Table To organize the arithmetic. Sources of variation are: Model (SAS) or Regression (NKNW); Error (SAS, NKNW) or Residual; Total. SS and df add up: SSM + SSE = SST and dfM + dfE = dfT.

SS In scalar terms: SST = Σ(Yi – Ȳ)2, SSE = Σ(Yi – Ŷi)2, SSM = Σ(Ŷi – Ȳ)2.

SS (2) The sums of squares for the ANOVA in matrix terms are: SST = Y'Y – (1/n)Y'JY, SSE = e'e = Y'Y – b'X'Y, and SSM = SST – SSE = b'X'Y – (1/n)Y'JY, where J is the n×n matrix of 1's.

df dfT = n – 1, dfM = p – 1, dfE = n – p, where p = k + 1 is the number of regression parameters.

Mean Squares MSM = SSM/dfM, MSE = SSE/dfE, and MST = SST/dfT.

Mean Squares (2) MSE is an unbiased estimator of σ2; under H0 (all βk = 0 for k ≥ 1), MSM also estimates σ2, which is why the ratio F = MSM/MSE is near 1 when H0 holds and large otherwise.

ANOVA Table
Source  SS    df    MS
Model   SSM   dfM   MSM
Error   SSE   dfE   MSE
Total   SST   dfT   (MST)
F = MSM/MSE

ANOVA F test H0: β1 = β2 = … = βp-1 = 0; Ha: βk ≠ 0 for at least one k = 1, … , p-1. Another form of the null hypothesis is H0: β1 = 0, and β2 = 0, … , and βp-1 = 0. Under H0, F ~ F(p-1, n-p). Reject H0 if F is large; use the P value.

Example NKNW p 249 The Zartan company sells a special skin cream through fashion stores exclusively in 15 districts. Y is sales; X1 is target population; X2 is per capita discretionary income; n = 15 districts.

Check the data
obs   yi    x1i   x2i
1     162   274   2450
2     120   180   3254
3     223   375   3802
4     131   205   2838
5      67    86   2347
6     169   265   3782
7      81    98   3008
8     192   330     —
9     116   195   2137
10     55    53   2560
11    252   430   4020
12    232   372   4427
13    144   236   2660
14    103   157   2088
15    212   370   2605

Hypothesis Tested by F H0: β1 = β2 = … = βp-1 = 0. F = MSM/MSE. Reject H0 if the P value is ≤ .05.

ANOVA Table What do we conclude?
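As a sketch of how such ANOVA output could be reproduced, here is a statsmodels fit of the Zartan data; since obs 8's x2i value did not survive in the table above, the sketch uses only the 14 complete cases, so its numbers are illustrative and will differ slightly from the 15-district output on the slide:

```python
import numpy as np
import statsmodels.api as sm

# Zartan data from the slide; obs 8 is omitted because its x2i value
# is missing above, so these results are illustrative only
y  = np.array([162, 120, 223, 131,  67, 169,  81,
               116,  55, 252, 232, 144, 103, 212])
x1 = np.array([274, 180, 375, 205,  86, 265,  98,
               195,  53, 430, 372, 236, 157, 370])
x2 = np.array([2450, 3254, 3802, 2838, 2347, 3782, 3008,
               2137, 2560, 4020, 4427, 2660, 2088, 2605])

X = sm.add_constant(np.column_stack([x1, x2]))  # intercept + X1 + X2
fit = sm.OLS(y, X).fit()

print(fit.params)                 # b0, b1, b2
print(fit.fvalue, fit.f_pvalue)   # overall F test of H0: beta1 = beta2 = 0
print(fit.rsquared)               # R2 = SSM/SST
```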

R2 The squared multiple regression correlation (R2) gives the proportion of variation in the response variable explained by the explanatory variables included in the model It is usually expressed as a percent It is sometimes called the coefficient of multiple determination (NKNW p 230)

R2 (2) R2 = SSM/SST, the proportion of variation explained. R2 = 1 – (SSE/SST), i.e., 1 minus the proportion of variation not explained. H0: β1 = β2 = … = βp-1 = 0 is equivalent to H0: the population R2 is zero. F = [R2/(p-1)] / [(1 – R2)/(n-p)].

What and Why At this point we have examined the distribution of the explanatory variables (and the response variable if that is appropriate) and we have taken remedial measures where needed We have looked at plots and numerical summaries

What and Why (2) The P value for the F significance test tells us one of the following: there is no evidence to conclude that any of our explanatory variables can help us model the response variable using this kind of model (P > .05), or one or more of the explanatory variables in our model is potentially useful for predicting the response variable in a linear model (P ≤ .05).

R2 output R-Sq = 0.999 Adj R-Sq = 0.999 Coeff Var = 6.0

Inference for individual regression coefficients b ~ N(β, σ2{b}), where σ2{b} = σ2(X'X)-1 is estimated by s2{b} = MSE (X'X)-1; the diagonal element s(bi, bi) = s2(bi). CI: bi ± t*s(bi). The significance test for H0i: βi = 0 uses the test statistic t = bi/s(bi), df = dfE = n-p, and the P value computed from the t(n-p) distribution.
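A minimal numpy/scipy sketch of these per-coefficient tests and intervals; X is assumed to carry the intercept column of 1s, and the function name is our own:

```python
import numpy as np
from scipy import stats

def coef_inference(X, y, alpha=0.05):
    """t statistics, P values, and 1-alpha CIs for each coefficient."""
    n, p = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    b = XtX_inv @ X.T @ y
    e = y - X @ b
    mse = (e @ e) / (n - p)
    s_b = np.sqrt(mse * np.diag(XtX_inv))         # standard errors s(bi)
    t = b / s_b                                    # test statistic for H0: beta_i = 0
    pval = 2 * stats.t.sf(np.abs(t), df=n - p)     # two-sided P value from t(n-p)
    tstar = stats.t.ppf(1 - alpha / 2, df=n - p)   # critical value t*
    ci = np.column_stack([b - tstar * s_b, b + tstar * s_b])
    return b, s_b, t, pval, ci
```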

Regression coefficients
Variable  Par Est  St Err   t        P
Int       3.453    2.431    1.420    0.181
Pop       0.496    0.006    81.924   <.0001
Income    0.009    0.001    9.502    <.0001

Estimation by Doolittle General format: [X'X | X'Y | I]. By using row transformations we obtain [I | b | (X'X)-1]. Example: see worksheet 1.
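The same row-reduction idea in a short numpy sketch; this is plain Gauss-Jordan on the augmented matrix rather than the abbreviated Doolittle worksheet layout, and it assumes X has full column rank so every pivot of X'X is nonzero:

```python
import numpy as np

def doolittle_style_solve(X, y):
    """Row-reduce [X'X | X'y | I] to [I | b | (X'X)^-1]."""
    XtX = X.T @ X
    p = XtX.shape[0]
    aug = np.hstack([XtX, (X.T @ y).reshape(-1, 1), np.eye(p)])
    for j in range(p):
        aug[j] /= aug[j, j]                   # scale pivot row to make pivot 1
        for i in range(p):
            if i != j:
                aug[i] -= aug[i, j] * aug[j]  # zero out column j in the other rows
    b = aug[:, p]             # solution of the normal equations
    XtX_inv = aug[:, p + 1:]  # inverse of X'X, as on the slide
    return b, XtX_inv
```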

Estimation of E(Yh) Xh is now a vector: Xh = (1, Xh1, Xh2, … , Xhk)'. We want a point estimate and a confidence interval for the subpopulation mean corresponding to Xh.

Estimation of E(Yh) (2) The mean response to be estimated is E(Yh) = Xh'β. The estimated mean response corresponding to Xh is Ŷh = Xh'b. This estimator is unbiased and its variance is σ2(Ŷh) = σ2Xh'(X'X)-1Xh.

Estimation of E(Yh) (3) The estimated variance s2(Ŷh) is given by s2(Ŷh) = MSE·Xh'(X'X)-1Xh. The 1 – α confidence limits for E(Yh) are Ŷh ± t(1 – α/2; n-p)s(Ŷh). Example (in class).
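A minimal sketch of this interval, under the same assumptions as the earlier code (intercept column included in X, and x_h given as the vector (1, Xh1, … , Xhk)):

```python
import numpy as np
from scipy import stats

def mean_response_ci(X, y, x_h, alpha=0.05):
    """1-alpha confidence interval for E(Yh) at x_h = (1, Xh1, ..., Xhk)."""
    n, p = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    b = XtX_inv @ X.T @ y
    e = y - X @ b
    mse = (e @ e) / (n - p)
    y_h = x_h @ b                              # point estimate Yh-hat = Xh'b
    s_yh = np.sqrt(mse * x_h @ XtX_inv @ x_h)  # s(Yh-hat)
    tstar = stats.t.ppf(1 - alpha / 2, df=n - p)
    return y_h - tstar * s_yh, y_h + tstar * s_yh
```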

F Test for Lack of Fit The test requires a data set with repeat observations. Repeat observations in multiple regression are replicate observations on Y corresponding to levels of each of the X variables that are constant from trial to trial. With two independent variables, repeat observations require that X1 and X2 each remain at given levels from trial to trial.

F Test for Lack of Fit (2) SSE is decomposed into pure error and lack of fit components: SSE = SSPE + SSLF. The pure error sum of squares SSPE is obtained by first calculating, for each replicate group, the sum of squared deviations of the Y observations around the group mean, where a replicate group has the same values for each of the X variables.

F Test for Lack of Fit (3) If the linear regression function is appropriate, then the group means Ȳj will be near the fitted values Ŷij calculated from the estimated linear regression function, and SSLF will be small. See the illustration in NKNW page 138. df(SSPE) = n – c and df(SSLF) = (n-p) – (n-c) = c – p, where c is the number of replicate groups.

F Test for Lack of Fit (4) The hypotheses are H0: E(Y) = β0 + β1X1 + … + βkXk (the regression function is appropriate) versus Ha: E(Y) ≠ β0 + β1X1 + … + βkXk. The appropriate test statistic is F* = MSLF/MSPE = [SSLF/(c-p)] / [SSPE/(n-c)], and the decision rule is: conclude H0 if F* ≤ F(1 – α; c-p, n-c), otherwise conclude Ha.
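A sketch of the whole lack-of-fit computation; replicate groups are found by grouping identical rows of X, and the design is assumed to have at least one replicated row and c > p groups:

```python
import numpy as np
from collections import defaultdict
from scipy import stats

def lack_of_fit_test(X, y):
    """F test for lack of fit; X rows include the intercept column."""
    n, p = X.shape
    b = np.linalg.lstsq(X, y, rcond=None)[0]
    sse = np.sum((y - X @ b) ** 2)

    groups = defaultdict(list)            # replicate groups: identical X rows
    for row, yi in zip(map(tuple, X), y):
        groups[row].append(yi)
    c = len(groups)
    sspe = sum(np.sum((np.array(ys) - np.mean(ys)) ** 2)
               for ys in groups.values())  # pure error SS around group means
    sslf = sse - sspe                      # lack of fit SS

    f_star = (sslf / (c - p)) / (sspe / (n - c))
    p_value = stats.f.sf(f_star, c - p, n - c)
    return f_star, p_value
```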

Prediction of new observation Yh(new) Xh is now a vector: Xh = (1, Xh1, Xh2, … , Xhk)'. We want a prediction for Yh with an interval that expresses the uncertainty in our prediction.

Prediction of new observation Yh(new) (2) The prediction of a new observation Yh(new) corresponding to Xh has 1 – α prediction limits Ŷh ± t(1 – α/2; n-p)s(pred), where s2(pred) = MSE(1 + Xh'(X'X)-1Xh). Example (in class).

Prediction of new observation Yh(new) (3) When m new observations are to be taken at Xh and their mean Ȳh(new) is to be predicted, the 1 – α prediction limits are Ŷh ± t(1 – α/2; n-p)s(predmean), where s2(predmean) = MSE(1/m + Xh'(X'X)-1Xh). Example (in class).
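A sketch covering both prediction cases: m = 1 gives the interval for a single new observation Yh(new), and larger m gives the interval for the mean of m new observations:

```python
import numpy as np
from scipy import stats

def prediction_interval(X, y, x_h, m=1, alpha=0.05):
    """1-alpha PI for the mean of m new observations at x_h (m=1: single Yh)."""
    n, p = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    b = XtX_inv @ X.T @ y
    mse = np.sum((y - X @ b) ** 2) / (n - p)
    y_h = x_h @ b
    s_pred = np.sqrt(mse * (1.0 / m + x_h @ XtX_inv @ x_h))  # s(predmean)
    tstar = stats.t.ppf(1 - alpha / 2, df=n - p)
    return y_h - tstar * s_pred, y_h + tstar * s_pred
```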

Last slide Reading: NKNW 7.1 to 7.8. Exercises: NKNW page 264, no. 7.8–7.11. Homework: NKNW pages 264–267, no. 7.12–7.19.