Stat 324 – Day 28: Model Validation (Ch. 11)

Announcements
- Submit lab assignment tonight
- Be working on Project 2
- Feedback on Project 1
- Exam Thursday
  - Posting review handout and problems
  - Submit questions Tuesday night

Previously: Variable selection
- Forward selection
- Backward selection
- Stepwise (mixed) selection
- Best subsets
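As a sketch of how one of these procedures works, here is a minimal forward-selection loop on simulated data. This is an illustration, not the course's Minitab/JMP workflow: the adjusted-R² stopping rule and the toy data are assumptions made for the example.

```python
import numpy as np

def adj_r2(y, yhat, p):
    """Adjusted R^2 for a model with p predictors plus an intercept."""
    n = len(y)
    ss_res = np.sum((y - yhat) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    r2 = 1 - ss_res / ss_tot
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

def fit(X, y):
    """Least-squares fit with an intercept; returns fitted values."""
    Xd = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(Xd, y, rcond=None)
    return Xd @ beta

def forward_select(X, y):
    """Greedy forward selection: repeatedly add the predictor that most
    improves adjusted R^2; stop when no candidate improves it."""
    remaining, chosen, best = list(range(X.shape[1])), [], -np.inf
    while remaining:
        scores = [(adj_r2(y, fit(X[:, chosen + [j]], y), len(chosen) + 1), j)
                  for j in remaining]
        score, j = max(scores)
        if score <= best:
            break
        best, chosen = score, chosen + [j]
        remaining.remove(j)
    return chosen, best

# Toy data: only columns 0 and 2 actually drive y
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = 2 * X[:, 0] - 3 * X[:, 2] + rng.normal(scale=0.5, size=100)
chosen, score = forward_select(X, y)
print(sorted(chosen))  # expect the true signal columns (0 and 2) to be chosen
```

Backward and stepwise selection follow the same pattern, with removals (or both additions and removals) at each step.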

Practice Problem – Participation

After running best subsets in Minitab, the best model appears to be the two-variable model with debt and part time: it has the smallest S, the smallest Mallows' Cp, the smallest PRESS, and the largest adjusted R², all indicators of a well-fitting model. The principal components analysis points the same way, since the first component correlates most strongly with debt and the second most strongly with part time.

After running the best subsets procedure in JMP, I found that debt alone explains roughly 96% of the variability in participation. Adding part-time work raises R² to 99%, but I simply do not think that jump in variability explained is impressive enough to justify another variable. Since debt alone predicts participation so well, I would recommend sticking with the single-variable model for ease of interpretation.
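The PRESS statistic cited above can be computed without literally refitting the model n times, via the leave-one-out leverage shortcut. A sketch on simulated data (the names debt/parttime only echo the problem; the data and coefficients here are made up):

```python
import numpy as np

def press(X, y):
    """PRESS = sum of squared leave-one-out residuals, using the shortcut
    e_(i) = e_i / (1 - h_ii), where h_ii are the hat-matrix leverages."""
    Xd = np.column_stack([np.ones(len(y)), X])
    H = Xd @ np.linalg.solve(Xd.T @ Xd, Xd.T)
    resid = y - H @ y
    return float(np.sum((resid / (1 - np.diag(H))) ** 2))

# Simulated stand-in for the participation data
rng = np.random.default_rng(1)
debt = rng.normal(size=60)
parttime = rng.normal(size=60)
participation = 5 + 4 * debt + 2 * parttime + rng.normal(scale=1.0, size=60)

press_debt = press(debt.reshape(-1, 1), participation)
press_both = press(np.column_stack([debt, parttime]), participation)
print(press_debt, press_both)  # smaller PRESS suggests better out-of-sample fit
```

Because each leave-one-out residual is the ordinary residual inflated by 1/(1 − h_ii), PRESS is always at least the ordinary SSE, which is why it penalizes overfit models.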

Previously – Penalized regression: the bias-variance trade-off

Previously – Penalized regression
Original goal: more "robust" estimates of the slope coefficients
More recently: can also be used for variable selection

                         Variable selection?
                         Yes                  No
Shrinkage?   Yes         Lasso, Elastic Net   Ridge
             No          Forward selection    Ordinary least squares

Which method?
- Shrinkage methods are especially helpful when p is close to, or even larger than, n.
- Also helpful when there is a lot of multicollinearity.
- Prefer ridge if you want to keep all the variables in the model (retain some information from every predictor) rather than eliminate variables (discarding all information from some).
- Use the lasso if you believe only a few of the predictors should matter (selection).
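A small illustration of the selection-vs-shrinkage distinction, assuming scikit-learn is available (the alpha values are arbitrary choices for the sketch, not recommendations):

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge  # assumes scikit-learn is installed

rng = np.random.default_rng(2)
X = rng.normal(size=(80, 10))
# Only the first two predictors actually matter
y = 3 * X[:, 0] + 2 * X[:, 1] + rng.normal(scale=0.5, size=80)

lasso = Lasso(alpha=0.1).fit(X, y)   # L1 penalty: shrinks AND zeroes coefficients
ridge = Ridge(alpha=1.0).fit(X, y)   # L2 penalty: shrinks but keeps all coefficients

n_lasso = int(np.count_nonzero(lasso.coef_))
n_ridge = int(np.count_nonzero(ridge.coef_))
print("lasso keeps", n_lasso, "of 10; ridge keeps", n_ridge, "of 10")
```

The lasso's corner-shaped L1 constraint is what sets some coefficients exactly to zero; ridge's circular L2 constraint shrinks them toward zero without ever reaching it.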

Previously: Piecewise Linear

One knot at C:
E(Y) = b0 + b1 x1 + b2 (x1 - C)+
  For x1 < C: E(Y) = b0 + b1 x1
  For x1 > C: E(Y) = (b0 - C b2) + (b1 + b2) x1

Two knots at C1 < C2:
E(Y) = b0 + b1 x1 + b2 (x1 - C1)+ + b3 (x1 - C2)+
  For C1 < x1 < C2: E(Y) = (b0 - C1 b2) + (b1 + b2) x1
  For x1 > C2: E(Y) = b0 + b1 x1 + b2 (x1 - C1) + b3 (x1 - C2)
             = (b0 - b2 C1 - b3 C2) + (b1 + b2 + b3) x1
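Once the knot C is fixed, these truncated-linear ("hinge") terms are just extra columns, so the model fits by ordinary least squares. A sketch with one knot on simulated data (the knot location and true slopes are invented for the example):

```python
import numpy as np

def hinge(x, c):
    """Truncated linear basis (x - c)+."""
    return np.clip(x - c, 0, None)

# Simulate data with a true kink at C = 5: slope 1 before, slope 3 after
rng = np.random.default_rng(3)
x = rng.uniform(0, 10, 200)
C = 5.0
y = 2 + 1.0 * x + 2.0 * hinge(x, C) + rng.normal(scale=0.3, size=200)

# Fit E(Y) = b0 + b1*x + b2*(x - C)+ by least squares
X = np.column_stack([np.ones_like(x), x, hinge(x, C)])
b0, b1, b2 = np.linalg.lstsq(X, y, rcond=None)[0]
slope_below, slope_above = b1, b1 + b2   # matches the two-piece form above
print(slope_below, slope_above)
```

Note that b2 is the *change* in slope at the knot, which is why the slope above C is b1 + b2.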

Previously: Cubic Splines
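The same truncated-power idea extends to cubic splines: a basis of 1, x, x², x³ plus a term (x − C_k)³+ for each knot. A NumPy sketch (the knot locations and test function are arbitrary choices for illustration):

```python
import numpy as np

def cubic_spline_basis(x, knots):
    """Truncated-power basis for a cubic spline: 1, x, x^2, x^3, (x - C_k)^3+."""
    cols = [np.ones_like(x), x, x**2, x**3]
    cols += [np.clip(x - c, 0, None) ** 3 for c in knots]
    return np.column_stack(cols)

rng = np.random.default_rng(4)
x = np.sort(rng.uniform(0, 10, 300))
y = np.sin(x) + rng.normal(scale=0.1, size=300)

B = cubic_spline_basis(x, knots=[2.5, 5.0, 7.5])
beta = np.linalg.lstsq(B, y, rcond=None)[0]
yhat = B @ beta
rmse = float(np.sqrt(np.mean((y - yhat) ** 2)))
print(rmse)  # residual RMSE should be close to the noise level
```

Using the cubed hinge instead of the linear one makes the fitted curve continuous with continuous first and second derivatives at each knot, so the pieces join smoothly.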

Previously: More smoothing…

Example: predicting diabetes

Validation Measures
- (Root) mean squared prediction error
- R² prediction
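Both measures are computed on a held-out test set. A sketch, using the convention that R²-prediction compares test-set squared error against the training-set mean (texts vary on whether the training or test mean is used; that choice is an assumption here):

```python
import numpy as np

def validation_measures(y_test, y_pred, y_train):
    """Hold-out validation: RMSPE and R^2_prediction.
    R^2_pred = 1 - SSE_test / SST_test, with SST_test measured around the
    TRAINING-set mean (one common convention)."""
    rmspe = float(np.sqrt(np.mean((y_test - y_pred) ** 2)))
    r2_pred = float(1 - np.sum((y_test - y_pred) ** 2)
                      / np.sum((y_test - y_train.mean()) ** 2))
    return rmspe, r2_pred

# Simulated train/test split with a simple linear model
rng = np.random.default_rng(5)
x = rng.normal(size=150)
y = 1 + 2 * x + rng.normal(scale=0.5, size=150)
xtr, xte, ytr, yte = x[:100], x[100:], y[:100], y[100:]

b = np.polyfit(xtr, ytr, 1)          # fit on training data only
yhat = np.polyval(b, xte)            # predict the held-out cases
rmspe, r2p = validation_measures(yte, yhat, ytr)
print(rmspe, r2p)
```

A model that overfits the training data will show an RMSPE well above its training residual standard error, and an R²-prediction well below its training R².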

Recap
Although the lasso will typically have a lower R² on the training data than ordinary least squares, its regression coefficients and predictions are much more robust and hold up nearly as well on the test data: improved prediction and improved interpretability.