Multiple Regression, Petter Mostad, 2005.10.17


Multiple Regression Petter Mostad

Review: Simple linear regression We define a model y_i = β_0 + β_1 x_i + ε_i, where the error terms ε_i are independent (normally distributed) with equal variance σ² We can then use data to estimate the model parameters, and to make statements about their uncertainty

Multiple regression model y_i = β_0 + β_1 x_1i + β_2 x_2i + … + β_K x_Ki + ε_i The errors ε_i are independent random (normal) variables with expectation zero and variance σ² The parameters are estimated by minimizing the sum of squares of errors, as before

Choice of independent variables for the model The set of measured variables that the response variable might depend on Do we expect the relationship to the response variable to be linear? The explanatory (independent) variables x_1i, x_2i, …, x_Ki cannot be linearly related to each other

Questions asked in connection with regression What would be a prediction of the dependent variable given new values of the independent variables? HOW do various independent variables influence the dependent variable? –Difficult question, as it depends on the WHOLE model!

Least squares estimation The least squares estimates of β_0, β_1, …, β_K are the values b_0, b_1, …, b_K minimizing SSE = Σ_i (y_i − b_0 − b_1 x_1i − … − b_K x_Ki)² They can be computed with formulas similar to, but more complex than, those for simple regression
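As a concrete illustration, a minimal numpy sketch of the least squares computation; all data below are invented for the example:

```python
import numpy as np

# Hypothetical data: n = 6 observations, K = 2 explanatory variables.
X = np.array([[1.0, 2.0],
              [2.0, 1.0],
              [3.0, 4.0],
              [4.0, 3.0],
              [5.0, 6.0],
              [6.0, 5.0]])
y = np.array([3.1, 2.9, 7.2, 6.8, 11.1, 10.9])

# Add a column of ones so the intercept b_0 is estimated too.
A = np.column_stack([np.ones(len(y)), X])

# Least squares: minimize the sum of squared errors ||y - A b||^2.
b, *_ = np.linalg.lstsq(A, y, rcond=None)
print(b)  # [b_0, b_1, b_2]
```

In practice one would use a statistics package (such as SPSS, as in Part 2) rather than computing the estimates by hand.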

Explanatory power Defining SST = Σ_i (y_i − ȳ)², SSR = Σ_i (ŷ_i − ȳ)², and SSE = Σ_i (y_i − ŷ_i)² We get as before SST = SSR + SSE We define R² = SSR / SST We also get that R² = 1 − SSE / SST R² is called the coefficient of determination

Adjusted coefficient of determination Adding more independent variables will generally increase SSR and decrease SSE Thus the coefficient of determination will tend to indicate that models with many variables always fit better To avoid this effect, the adjusted coefficient of determination may be used: R²_adj = 1 − (SSE/(n−K−1)) / (SST/(n−1))
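Both quantities are easy to compute from the residuals; a small sketch with invented fitted values:

```python
import numpy as np

def r_squared(y, y_hat, K):
    """Return (R^2, adjusted R^2) for a fit with K explanatory variables."""
    n = len(y)
    sse = np.sum((y - y_hat) ** 2)        # error sum of squares
    sst = np.sum((y - np.mean(y)) ** 2)   # total sum of squares
    r2 = 1.0 - sse / sst
    adj_r2 = 1.0 - (sse / (n - K - 1)) / (sst / (n - 1))
    return r2, adj_r2

# Toy illustration with made-up observations and fitted values:
y = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y_hat = np.array([1.1, 1.9, 3.2, 3.8, 5.0])
print(r_squared(y, y_hat, K=2))
```

Note that the adjusted value is always at most R², and the penalty grows with K.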

Drawing inference about the model parameters Similar to simple regression, we get that the following statistic has a t distribution with n−K−1 degrees of freedom: t = (b_j − β_j) / s_bj where b_j is the least squares estimate for β_j and s_bj is its estimated standard deviation s_bj is computed from SSE and the correlations between the independent variables

Confidence intervals and hypothesis tests A confidence interval for β_j becomes b_j ± t_{n−K−1, α/2} · s_bj Testing the hypothesis H0: β_j = 0 vs H1: β_j ≠ 0 –Reject H0 if b_j / s_bj > t_{n−K−1, α/2} or b_j / s_bj < −t_{n−K−1, α/2}
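A sketch of the interval and test for a single coefficient, assuming scipy is available; the numbers b_j = 2.5, s_bj = 0.8, n = 30, K = 3 are invented:

```python
from scipy import stats

def coef_inference(b_j, s_bj, n, K, alpha=0.05):
    """CI and two-sided test of H0: beta_j = 0 for one coefficient."""
    df = n - K - 1
    t_crit = stats.t.ppf(1 - alpha / 2, df)
    ci = (b_j - t_crit * s_bj, b_j + t_crit * s_bj)
    t_stat = b_j / s_bj
    p_value = 2 * stats.t.sf(abs(t_stat), df)  # two-sided p-value
    return ci, t_stat, p_value

# Hypothetical estimate and standard deviation:
print(coef_inference(2.5, 0.8, 30, 3))
```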

Testing sets of parameters We can also test the null hypothesis that a specific set of the betas are simultaneously zero. The alternative hypothesis is that at least one beta in the set is nonzero. The test statistic has an F distribution, and is computed by comparing the SSE in the full model with the SSE when the parameters in the set are fixed to zero: with r betas set to zero, F = ((SSE_restricted − SSE_full)/r) / (SSE_full/(n−K−1))
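A sketch of this F-test with made-up sums of squares, assuming scipy is available:

```python
from scipy import stats

def f_test(sse_full, sse_restricted, r, n, K):
    """F-test of H0: the r betas dropped in the restricted model are all zero.

    sse_full: SSE of the full model with K explanatory variables.
    sse_restricted: SSE when the r betas in the set are fixed to zero.
    """
    df1, df2 = r, n - K - 1
    f_stat = ((sse_restricted - sse_full) / df1) / (sse_full / df2)
    p_value = stats.f.sf(f_stat, df1, df2)
    return f_stat, p_value

# Hypothetical sums of squares: dropping 2 of 4 variables raises SSE a lot.
print(f_test(sse_full=40.0, sse_restricted=70.0, r=2, n=30, K=4))
```

A large jump in SSE when the set is removed gives a large F and a small p-value, i.e. evidence against the null hypothesis.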

Making predictions from the model As in simple regression, we can use the estimated coefficients to make predictions As in simple regression, the uncertainty in the predictions has two sources: –The variance around the regression estimate –The variance of the estimated regression model
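The two variance sources can be sketched numerically; the code below uses the standard formula for the variance of a prediction at a new point x0, namely s²·x0ᵀ(XᵀX)⁻¹x0 (model uncertainty) plus s² (variance around the regression line). All data are invented:

```python
import numpy as np

# Toy fitted model: simple regression written in design-matrix form.
A = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0], [1.0, 4.0]])
y = np.array([2.1, 3.9, 6.2, 7.8])
b, *_ = np.linalg.lstsq(A, y, rcond=None)
n, p = A.shape                          # p = K + 1 estimated parameters
s2 = np.sum((y - A @ b) ** 2) / (n - p) # residual variance estimate

x0 = np.array([1.0, 2.5])               # new observation (with leading 1)
var_model = s2 * x0 @ np.linalg.solve(A.T @ A, x0)  # estimated-model variance
var_pred = var_model + s2               # plus variance around the regression
print(x0 @ b, np.sqrt(var_pred))        # prediction and its standard error
```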

Nonlinear transformations and models Sometimes, a linear model does not fit Some alternatives: –Make a transformation of the y values –Make a transformation of the x values –Predict y from a combination of transformations of the x values

Example 1 When y = a·e^(bx), then log(y) = log(a) + b·x Use standard formulas on the pairs (x_1, log(y_1)), (x_2, log(y_2)),..., (x_n, log(y_n)) We get estimates for log(a) and b, and thus a and b
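A sketch of this transformation in numpy, with noise-free data generated from a = 2, b = 0.5 so the recovered values are easy to check:

```python
import numpy as np

# Hypothetical data generated from y = a * exp(b x) with a = 2, b = 0.5.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 2.0 * np.exp(0.5 * x)

# Regress log(y) on x: log(y) = log(a) + b x.
A = np.column_stack([np.ones_like(x), x])
coef, *_ = np.linalg.lstsq(A, np.log(y), rcond=None)
a_hat, b_hat = np.exp(coef[0]), coef[1]  # back-transform log(a)
print(a_hat, b_hat)  # close to (2.0, 0.5)
```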

Example 2 Another natural model may be y = a·x^b We get that log(y) = log(a) + b·log(x) Use standard formulas on the pairs (log(x_1), log(y_1)), (log(x_2), log(y_2)),...,(log(x_n),log(y_n)) Note: In this model, the curve goes through (0,0)
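The corresponding log-log sketch, again with noise-free invented data (a = 3, b = 1.5):

```python
import numpy as np

# Hypothetical data generated from y = a * x^b with a = 3, b = 1.5.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = 3.0 * x ** 1.5

# Regress log(y) on log(x): log(y) = log(a) + b log(x).
A = np.column_stack([np.ones_like(x), np.log(x)])
coef, *_ = np.linalg.lstsq(A, np.log(y), rcond=None)
a_hat, b_hat = np.exp(coef[0]), coef[1]
print(a_hat, b_hat)  # close to (3.0, 1.5)
```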

Example 3 Assume data (x_1,y_1),..., (x_n,y_n) seem to follow a third degree polynomial y = a + b·x + c·x² + d·x³ We use multivariate regression on (x_1, x_1², x_1³, y_1), (x_2, x_2², x_2³, y_2),... We get estimates a, b, c, d for the third degree polynomial curve
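A sketch of the polynomial case: the model is nonlinear in x but linear in the parameters, so ordinary multiple regression applies (data invented from a = 1, b = 2, c = −0.5, d = 0.1):

```python
import numpy as np

# Hypothetical data from y = 1 + 2x - 0.5x^2 + 0.1x^3 (no noise).
x = np.linspace(-2.0, 2.0, 9)
y = 1 + 2 * x - 0.5 * x**2 + 0.1 * x**3

# Multiple regression on (x, x^2, x^3): linear in a, b, c, d.
A = np.column_stack([np.ones_like(x), x, x**2, x**3])
a, b, c, d = np.linalg.lstsq(A, y, rcond=None)[0]
print(a, b, c, d)  # close to 1.0, 2.0, -0.5, 0.1
```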

Indicator variables Binary variables (yes/no, male/female, …) can be represented as 1/0, and used as independent variables. Also called dummy variables in the book. When used directly, they influence only the constant term of the regression It is also possible to use a binary variable so that it changes both constant term and slope for the regression
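This can be sketched numerically: including an indicator d shifts only the constant term, while including the product d·x also shifts the slope for the group with d = 1 (all data below are invented):

```python
import numpy as np

# Hypothetical data: two groups with different intercepts and slopes.
x = np.array([1.0, 2.0, 3.0, 4.0, 1.0, 2.0, 3.0, 4.0])
d = np.array([0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0])  # indicator variable
y = 1.0 + 2.0 * x + d * (3.0 + 1.5 * x)  # group d=1: intercept +3, slope +1.5

# Regression with the indicator and the interaction term d*x.
A = np.column_stack([np.ones_like(x), x, d, d * x])
b0, b1, b2, b3 = np.linalg.lstsq(A, y, rcond=None)[0]
print(b0, b1, b2, b3)  # close to 1.0, 2.0, 3.0, 1.5
```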

Part 2: Problem solving with SPSS 1. Going from practical problem to statistically formulated problem 2. Clicking the right places in SPSS 3. Interpreting the results produced by SPSS Often, several rounds of using SPSS and interpreting the results will be needed

From practical problem to statistical problem In practice, this is the hardest part To do it, you need an understanding of the statistical models available, and the kind of questions they can answer Practice is key

Example You want to investigate the cost and effect of a medical procedure X, and compare it to a traditional procedure Y. Your data are, for 40 patients: –The type of procedure performed (X or Y) –The cost –The effect, measured by some number eff How would you analyze this?

Key starting point: what questions to ask Is there a difference in cost between X and Y? Is there a difference in effect between X and Y? What is the relationship between cost and effect? Is this relationship different for X and Y?

Example (cont.) You find out that the procedures have been performed by 20 doctors, each doctor performing one X and one Y procedure. Can this help you in your analysis? If we had only data for X patients, what kind of questions could we answer?

Analysis in SPSS: Data input and transformation The format is always: a number of variables (columns) observed for a number of cases (rows) Manual data input, or import from tables (e.g., Excel). The rows must always correspond to "cases" (so that in the example with 20 doctors, the data must be reorganized into 20 rows) Transformation of variables

Analysis in SPSS: Data exploration ALWAYS start with exploring data with –descriptive statistics –graphs From this, you can find out: –Strange observations ("outliers") –Unexpected relationships and effects –Whether your intended analysis model is appropriate

Analysis in SPSS: Fitting models and doing tests, checking the fit Use "Analyze" and one or more appropriate models to answer the questions you have raised Sometimes the results indicate that the model needs to be changed Several types of plots should be used to investigate the fit of the model –Plotting residuals against independent and dependent variables is useful