Shonda Kuiper Grinnell College

Statistical techniques taught in introductory statistics courses typically have one response variable and one explanatory variable. The response variable measures the outcome of a study. The explanatory variable explains changes in the response variable.

Each variable can be classified as either categorical or quantitative.

                             Response: Categorical           Response: Quantitative
Explanatory: Categorical     Chi-Square test,                Two-sample t-test,
                             Two-proportion test             ANOVA
Explanatory: Quantitative    Logistic Regression             Regression

Categorical data place individuals into one of several groups (such as red/blue/white, male/female, or yes/no). Quantitative data consist of numerical values for which most arithmetic operations make sense.

The theoretical model used in the two-sample t-test is designed to account for these two group means (µ1 and µ2) and random error:

observed response = mean value + random error

Means model:       y_ij = µ_i + ε_ij           where i = 1, 2 and j = 1, 2, 3, 4
Effects model:     y_ij = µ + α_i + ε_ij       where i = 1, 2 and j = 1, 2, 3, 4
Regression model:  y_i = β0 + β1 x_i + ε_i     where i = 1, 2, …, 8 and x_i indicates group membership

When there are only two groups (and we have the same assumptions), all three models are algebraically equivalent.
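A quick numerical check of this equivalence, using hypothetical data (two groups of four observations, matching the indices above): fitting the regression model with a 0/1 group indicator recovers the two group means.

```python
import numpy as np

# Hypothetical data: two groups of four observations each (i = 1, 2; j = 1, ..., 4)
group1 = np.array([10.0, 12.0, 11.0, 13.0])
group2 = np.array([15.0, 14.0, 16.0, 17.0])

# Regression form: y_i = b0 + b1*x_i, where x_i = 0 for group 1 and 1 for group 2
y = np.concatenate([group1, group2])          # i = 1, ..., 8
x = np.array([0, 0, 0, 0, 1, 1, 1, 1], dtype=float)
X = np.column_stack([np.ones_like(x), x])     # design matrix [1, x]
b0, b1 = np.linalg.lstsq(X, y, rcond=None)[0]

# The fitted coefficients reproduce the group means:
# b0 = mean of group 1, and b0 + b1 = mean of group 2
print(b0, b1)
```

So the two-sample comparison and the regression on an indicator variable are the same fit, just parameterized differently.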

Shonda Kuiper Grinnell College

Multiple regression analysis can be used to serve different goals. The goals will influence the type of analysis that is conducted. The most common goals of multiple regression are to:

Describe: A model may be developed to describe the relationship between multiple explanatory variables and the response variable.

Predict: A regression model may be used to generalize to observations outside the sample.

Confirm: Theories are often developed about which variables or combination of variables should be included in a model. Hypothesis tests can be used to evaluate the relationship between the explanatory variables and the response.

Build a multiple regression model to predict the retail price of cars.

Price = – 0.22 Mileage     R-Sq: 4.1%
Slope coefficient (b1): t-test p-value = 0.004

Questions:
- What happens to Price as Mileage increases?
- Since b1 = –0.22 is small, can we conclude it is unimportant?
- Does mileage help you predict price? What does the p-value tell you?
- Does mileage help you predict price? What does the R-Sq value tell you?
- Are there outliers or influential observations?
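As a sketch of how the slope and R-Sq are computed, using made-up (mileage, price) values rather than the deck's actual car data:

```python
import numpy as np

# Hypothetical (mileage, price) data -- not the deck's car dataset
mileage = np.array([5.0, 10.0, 15.0, 20.0, 25.0, 30.0])   # thousands of miles
price   = np.array([24.0, 23.5, 22.0, 22.5, 21.0, 20.5])  # $1000s

# Least-squares slope and intercept
b1, b0 = np.polyfit(mileage, price, 1)

# R^2 = 1 - SSE/SST: the fraction of variation in price explained by mileage
fitted = b0 + b1 * mileage
sse = np.sum((price - fitted) ** 2)
sst = np.sum((price - price.mean()) ** 2)
r_sq = 1 - sse / sst
print(round(b1, 3), round(r_sq, 3))
```

A negative b1 answers the first question (Price falls as Mileage rises); R² measures how much of the price variation that relationship accounts for.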

What happens when all the points fall on the regression line? Then SSE = 0, so R² = 1.

What happens when the regression line does not help us estimate Y? Then SSE ≈ SST, so R² ≈ 0.

R²adj includes a penalty when more terms are included in the model:

R²adj = 1 – (1 – R²)(n – 1)/(n – p)

where n is the sample size and p is the number of coefficients (including the constant term): β0, β1, β2, β3, …, βp-1.

When many terms are in the model, p is larger, so (n – 1)/(n – p) is larger and R²adj is smaller.
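A minimal helper implementing this penalty; the sample sizes and R² values below are hypothetical, chosen only to show how a larger p shrinks the adjusted value.

```python
def adjusted_r_sq(r_sq, n, p):
    """Adjusted R^2 for n observations and p coefficients (including the constant)."""
    return 1 - (1 - r_sq) * (n - 1) / (n - p)

# Same raw R^2, but more coefficients => larger penalty => smaller adjusted R^2
print(adjusted_r_sq(0.5, 50, 3))    # few terms
print(adjusted_r_sq(0.5, 50, 10))   # many terms: noticeably smaller
```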

Price = – 0.22 Mileage     R-Sq: 4.1%
Slope coefficient (b1): t-test p-value = 0.004

Shonda Kuiper Grinnell College

Build a multiple regression model to predict the retail price of cars. Using Mileage alone, R² = 2%.

Available explanatory variables: Mileage, Cylinder, Liter, Leather, Cruise, Doors, Sound.

Using all of them:

Price = Cruise Cyl – 1543 Doors Leather – 787 Liter – 0.17 Mileage Sound     R² = 44.6%

Step Forward Regression (Forward Selection): Which single explanatory variable best predicts Price?

Price = Cruise            R² = 18.56%
Price = Cyl               R² = 32.39%
Price = – 0.17 Mileage    R² = 2.04%
Price = Liter             R² = 31.15%
Price = – Sound           R² = 1.55%
Price = Leather           R² = 2.47%
Price = Doors             R² = 1.93%

Step Forward Regression: Which combination of two terms best predicts Price? Starting from the best single predictor, Price = Cyl (R² = 32.39%), values in parentheses are adjusted R²:

Price = Cyl Cruise             R² = 38.4% (38.2%)
Price = Cyl – 0.152 Mileage    R² = 34% (33.8%)
Price = Cyl Liter              R² = 32.6% (32.4%)

Continuing to add one term at a time:

Price = Cyl Cruise                                                      R² = 38.4% (38.2%)
Price = Cyl + 6362 Cruise Leather                                       R² = 40.4% (40.2%)
Price = Cyl + 6492 Cruise Leather – 0.17 Mileage                        R² = 42.3% (42%)
Price = Cyl + 6320 Cruise Leather – 0.17 Mileage – 1402 Doors           R² = 43.7% (43.3%)
Price = Cyl Cruise Leather – 0.17 Mileage – 1463 Doors – 2024 Sound     R² = 44.6% (44.15%)
Price = Cyl Cruise Leather – 787 Liter – 0.17 Mileage – 1543 Doors Sound    R² = 44.6% (44.14%)
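The greedy forward-selection loop can be sketched as follows, on simulated stand-ins for the car data (the variable names echo the deck's but the values are random):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100

# Hypothetical predictors -- random data, not the actual car dataset
predictors = {
    "Cyl":     rng.normal(size=n),
    "Cruise":  rng.integers(0, 2, size=n).astype(float),
    "Mileage": rng.normal(size=n),
}
y = 3.0 * predictors["Cyl"] + 1.5 * predictors["Cruise"] + rng.normal(size=n)

def design(names):
    """Design matrix with a constant column plus the named predictors."""
    return np.column_stack([np.ones(n)] + [predictors[m] for m in names])

def r_sq(X, y):
    """R^2 of a least-squares fit of y on X."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    resid = y - X @ beta
    return 1 - resid @ resid / ((y - y.mean()) @ (y - y.mean()))

# Forward selection: at each step, add the term that most improves R^2
chosen, remaining = [], list(predictors)
while remaining:
    best = max(remaining, key=lambda m: r_sq(design(chosen + [m]), y))
    chosen.append(best)
    remaining.remove(best)
    print(chosen, round(r_sq(design(chosen), y), 3))
```

Just as in the slides, R² can only grow as terms are added, which is why the adjusted R² (or a criterion such as AIC) is needed to decide when to stop.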


Step Backward Regression (Backward Elimination): start with all the terms and remove the least useful one at each step. Here, dropping Liter leaves R² essentially unchanged:

Price = Cyl Cruise Leather – 0.17 Mileage – 1463 Doors – 2024 Sound     R² = 44.6% (44.15%)
Price = Cyl Cruise Leather – 787 Liter – 0.17 Mileage – 1543 Doors Sound    R² = 44.6% (44.14%)

Other techniques, such as the Akaike information criterion, the Bayesian information criterion, Mallows' Cp, and bidirectional stepwise procedures, are often used to find the best model.

Best Subsets Regression: Here we see that Liter is the second best single predictor of price.
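Best subsets can be sketched on simulated data (again, the names mimic the car data's variables but the values are random): every combination of predictors is scored, and the winner of each model size is reported.

```python
import itertools
import numpy as np

rng = np.random.default_rng(1)
n = 100

# Hypothetical predictors -- random stand-ins, not the actual car dataset
predictors = {
    "Cyl":   rng.normal(size=n),
    "Liter": rng.normal(size=n),
    "Doors": rng.normal(size=n),
}
y = 2.0 * predictors["Cyl"] + 1.0 * predictors["Liter"] + rng.normal(size=n)

def r_sq(names):
    """R^2 of a least-squares fit of y on the named predictors (plus a constant)."""
    X = np.column_stack([np.ones(n)] + [predictors[m] for m in names])
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    resid = y - X @ beta
    return 1 - resid @ resid / ((y - y.mean()) @ (y - y.mean()))

# Best subsets: for each model size, evaluate every combination of predictors
for size in range(1, len(predictors) + 1):
    best = max(itertools.combinations(predictors, size), key=r_sq)
    print(size, best, round(r_sq(best), 3))
```

Unlike stepwise selection, this exhaustive search cannot lock in an early bad choice, which is why the slide's output can rank Liter as the second-best single predictor even if stepwise never pairs it with Cyl.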

Important Cautions:

- Stepwise regression techniques can often ignore very important explanatory variables; best subsets is often preferable.
- Both best subsets and stepwise regression methods consider only linear relationships between the response and explanatory variables.
- Residual graphs are still essential in validating whether the model is appropriate.
- Transformations, interactions, and quadratic terms can often improve the model.
- Whenever these iterative variable selection techniques are used, the p-values corresponding to the significance of each individual coefficient are not reliable.