Regression Model Building

Regression Model Building: Predicting the Number of Crew Members of Cruise Ships

Data Description
n = 158 cruise ships
Dependent variable: Crew Size (100s)
Potential predictor variables:
- Age (2013 - Year Built)
- Tonnage (1000s of tons)
- Passengers (100s)
- Length (100s of feet)
- Cabins (100s)
- Passenger Density (Passengers/Space)

Data – First 20 Cases

Full Model (6 Predictors, 7 Parameters, n=158)
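The slides show the fitted full model output; a minimal sketch of that fit in Python with statsmodels is given below. The file name cruise_ships.csv and the lower-case column names (crew, age, tonnage, passengers, length, cabins, passenger_density) are assumptions, not taken from the slides.

```python
# Minimal sketch of fitting the full 6-predictor, 7-parameter model (n = 158).
# The file name and column names are assumptions.
import pandas as pd
import statsmodels.formula.api as smf

ships = pd.read_csv("cruise_ships.csv")   # assumed file: one row per ship

full_model = smf.ols(
    "crew ~ age + tonnage + passengers + length + cabins + passenger_density",
    data=ships,
).fit()

print(full_model.summary())   # 7 parameters: intercept + 6 slopes
print("AIC:", full_model.aic)
```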

Backward Elimination – Model-Based AIC (Minimize)
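Below is a sketch of AIC-based backward elimination, reusing the ships data frame and column names assumed in the previous block. The AIC values printed on the slides appear to follow R's step() convention, which differs from the statsmodels AIC by an additive constant, so absolute values will not match even though the ordering of models (and hence the model selected) is the same.

```python
# Sketch of backward elimination by AIC: at each step drop the predictor whose
# removal lowers AIC the most; stop when no removal lowers AIC.
import statsmodels.formula.api as smf

def backward_aic(data, response, predictors):
    current = list(predictors)
    best_aic = smf.ols(f"{response} ~ {' + '.join(current)}", data=data).fit().aic
    improved = True
    while improved and current:
        improved = False
        for p in current:
            trial = [q for q in current if q != p]
            rhs = " + ".join(trial) if trial else "1"
            aic = smf.ols(f"{response} ~ {rhs}", data=data).fit().aic
            if aic < best_aic:
                best_aic, best_drop, improved = aic, p, True
        if improved:
            current.remove(best_drop)
    return current, best_aic

kept, aic = backward_aic(
    ships, "crew",
    ["age", "tonnage", "passengers", "length", "cabins", "passenger_density"],
)
print("Retained predictors:", kept, "AIC:", aic)
```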

Forward Selection (AIC Based)

Stepwise Regression (AIC Based)
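Forward selection works in the opposite direction, starting from the intercept-only model and adding the predictor whose inclusion lowers AIC the most; stepwise selection also reconsiders dropping terms after each addition and, as the summary below shows, ends at the same model here. A sketch under the same assumptions as the earlier blocks:

```python
# Sketch of AIC-based forward selection, reusing the "ships" data frame.
import statsmodels.formula.api as smf

def forward_aic(data, response, candidates):
    selected = []
    remaining = list(candidates)
    best_aic = smf.ols(f"{response} ~ 1", data=data).fit().aic
    improved = True
    while improved and remaining:
        improved = False
        for p in remaining:
            rhs = " + ".join(selected + [p])
            aic = smf.ols(f"{response} ~ {rhs}", data=data).fit().aic
            if aic < best_aic:
                best_aic, best_add, improved = aic, p, True
        if improved:
            selected.append(best_add)
            remaining.remove(best_add)
    return selected, best_aic

chosen, aic = forward_aic(
    ships, "crew",
    ["age", "tonnage", "passengers", "length", "cabins", "passenger_density"],
)
print("Selected predictors (in order added):", chosen, "AIC:", aic)
```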

Summary of Automated Models
Backward Elimination:
- Drop Passenger Density (AIC drops from 1.055 to -0.943)
- Drop Age (AIC drops from -0.943 to -2.062)
- Stop: keep Tonnage, Passengers, Length, Cabins
Forward Selection:
- Add Cabins (AIC drops from 397.18 to 28.82)
- Add Length (AIC drops from 28.82 to 9.8661)
- Add Passengers (AIC drops from 9.8661 to -0.0565)
- Add Tonnage (AIC drops from -0.0565 to -2.06)
Stepwise Regression: same model as Forward Selection

All Possible (Subset) Regressions

All Possible (Subset) Regressions (Best 4 per Group) – Criteria: BIC, Adjusted R², Mallows' Cp
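A sketch of the all-possible-subsets search, reusing ships and full_model from the earlier blocks. Every subset of the six predictors is fit, and BIC, adjusted R², and Mallows' Cp (computed against the full-model mean squared error) are recorded; the best four models of each size are then printed, mirroring the "best 4 per group" display on the slide.

```python
# All possible subsets of the 6 predictors, with BIC, adjusted R^2, and Cp.
from itertools import combinations
import statsmodels.formula.api as smf

predictors = ["age", "tonnage", "passengers", "length", "cabins", "passenger_density"]
n = len(ships)
mse_full = full_model.mse_resid          # sigma^2 estimate from the full model

results = []
for k in range(1, len(predictors) + 1):
    for subset in combinations(predictors, k):
        fit = smf.ols("crew ~ " + " + ".join(subset), data=ships).fit()
        cp = fit.ssr / mse_full - (n - 2 * (k + 1))   # Mallows' Cp, p = k + 1 parameters
        results.append((subset, fit.bic, fit.rsquared_adj, cp))

# Best four models of each size by BIC
for k in range(1, len(predictors) + 1):
    best = sorted((r for r in results if len(r[0]) == k), key=lambda r: r[1])[:4]
    for subset, bic, adj_r2, cp in best:
        print(k, subset, round(bic, 2), round(adj_r2, 4), round(cp, 2))
```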

Cross-Validation
Hold-out sample (training sample = 100, validation sample = 58):
- Fit the model on the training sample and obtain the regression estimates
- Apply the training-sample estimates to the validation-sample X levels to obtain predictions; MSEP = Σ(obs - pred)² / n, where n is the validation sample size
- Fit the model on the validation sample and compare its regression coefficients with those from the training sample
- (A code sketch of this hold-out step follows the list)
PRESS statistic (delete observations one at a time):
- Fit the model with each observation deleted in turn
- Obtain the residual for each observation from the fit that excluded it
- PRESS = Σ(obs - pred(deleted))²
K-fold cross-validation:
- Extension of PRESS in which K groups of cases are deleted in turn
- Useful for computationally intensive models
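A sketch of the hold-out step, under the same column-name assumptions as before. The 100/58 split matches the slides; the random seed and the use of the four-predictor model selected above are assumptions.

```python
# Hold-out validation: fit on 100 training ships, predict the other 58.
import numpy as np
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)                 # arbitrary seed (assumption)
idx = rng.permutation(len(ships))
train, valid = ships.iloc[idx[:100]], ships.iloc[idx[100:]]

formula = "crew ~ tonnage + passengers + length + cabins"
fit_train = smf.ols(formula, data=train).fit()

# Apply training-sample estimates to the validation X levels
pred = fit_train.predict(valid)
msep = np.mean((valid["crew"] - pred) ** 2)
print("MSEP on validation sample:", msep)

# Refit on the validation sample and compare coefficients
fit_valid = smf.ols(formula, data=valid).fit()
print(fit_train.params)
print(fit_valid.params)
```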

Hold-Out Sample – n_in = 100, n_out = 58. Coefficients keep their signs, but significance levels change substantially; see Tonnage and Length.

Testing Bias = 0 When Applying the Training-Sample Model to the Validation Sample

PRESS Statistic: the model appears to be valid; there is very little difference between PRESS/n and MS(Residual).
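A sketch of the PRESS computation for the selected model, under the same assumptions. For ordinary least squares the deleted residuals can be obtained without refitting, since the residual with observation i deleted equals e_i / (1 - h_ii), where h_ii is that observation's leverage.

```python
# PRESS statistic via the leverage identity (no delete-one refitting loop needed).
import numpy as np
import statsmodels.formula.api as smf

fit = smf.ols("crew ~ tonnage + passengers + length + cabins", data=ships).fit()

h = fit.get_influence().hat_matrix_diag      # leverages h_ii
press_resid = fit.resid / (1 - h)            # deleted (PRESS) residuals
press = np.sum(press_resid ** 2)

print("PRESS        :", press)
print("PRESS / n    :", press / len(ships))
print("MS(Residual) :", fit.mse_resid)       # compare, as on the slide
```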

K-fold Cross-Validation Results (k=10)
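A sketch of 10-fold cross-validation for the selected model, under the same assumptions as the earlier blocks: cases are split into K groups, each group is held out in turn, the model is fit to the rest, and the prediction errors are pooled.

```python
# 10-fold cross-validation estimate of prediction error for the chosen model.
import numpy as np
import statsmodels.formula.api as smf

K = 10
rng = np.random.default_rng(0)                       # arbitrary seed (assumption)
folds = np.array_split(rng.permutation(len(ships)), K)

formula = "crew ~ tonnage + passengers + length + cabins"
sq_errors = []
for fold in folds:
    test = ships.iloc[fold]
    train = ships.drop(ships.index[fold])
    fit = smf.ols(formula, data=train).fit()
    sq_errors.append((test["crew"] - fit.predict(test)) ** 2)

cv_mse = np.mean(np.concatenate(sq_errors))
print("10-fold CV estimate of prediction MSE:", cv_mse)
```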