Regression Model Building

Slides:



Advertisements
Similar presentations
All Possible Regressions and Statistics for Comparing Models
Advertisements

Chapter 5 Multiple Linear Regression
Automated Regression Modeling Descriptive vs. Predictive Regression Models Four common automated modeling procedures Forward Modeling Backward Modeling.
Ridge Regression Population Characteristics and Carbon Emissions in China ( ) Q. Zhu and X. Peng (2012). “The Impacts of Population Change on Carbon.
6-1 Introduction To Empirical Models 6-1 Introduction To Empirical Models.
Multiple Linear Regression
Statistics for Managers Using Microsoft® Excel 5th Edition
Part I – MULTIVARIATE ANALYSIS C3 Multiple Linear Regression II © Angel A. Juan & Carles Serrat - UPC 2007/2008.
Multivariate Data Analysis Chapter 4 – Multiple Regression.
1 Chapter 9 Variable Selection and Model building Ray-Bing Chen Institute of Statistics National University of Kaohsiung.
Lecture 11 Multivariate Regression A Case Study. Other topics: Multicollinearity  Assuming that all the regression assumptions hold how good are our.
Lasso regression. The Goals of Model Selection Model selection: Choosing the approximate best model by estimating the performance of various models Goals.
Quantitative Business Analysis for Decision Making Simple Linear Regression.
Basic Business Statistics, 11e © 2009 Prentice-Hall, Inc. Chap 15-1 Chapter 15 Multiple Regression Model Building Basic Business Statistics 11 th Edition.
Regression Model Building Setting: Possibly a large set of predictor variables (including interactions). Goal: Fit a parsimonious model that explains variation.
Nonstationary Time Series Examples Consider the Chemical Process Viscosity data set Is this time series stationary? Why or why not? Time effect present?
Regression Model Building
Correlation & Regression
Multiple Linear Regression Response Variable: Y Explanatory Variables: X 1,...,X k Model (Extension of Simple Regression): E(Y) =  +  1 X 1 +  +  k.
Marketing Research Aaker, Kumar, Day and Leone Tenth Edition
2015 AprilUNIVERSITY OF HAIFA, DEPARTMENT OF STATISTICS, SEMINAR FOR M.A 1 Hastie, Tibshirani and Friedman.The Elements of Statistical Learning (2nd edition,
Simple Linear Regression
© 2004 Prentice-Hall, Inc.Chap 15-1 Basic Business Statistics (9 th Edition) Chapter 15 Multiple Regression Model Building.
Basics of Regression Analysis. Determination of three performance measures Estimation of the effect of each factor Explanation of the variability Forecasting.
Statistical Methods Statistical Methods Descriptive Inferential
Regression. Population Covariance and Correlation.
Regression Model Building LPGA Golf Performance
Lesson Multiple Regression Models. Objectives Obtain the correlation matrix Use technology to find a multiple regression equation Interpret the.
Stat 112 Notes 9 Today: –Multicollinearity (Chapter 4.6) –Multiple regression and causal inference.
Simple Linear Regression (OLS). Types of Correlation Positive correlationNegative correlationNo correlation.
Chapter 22: Building Multiple Regression Models Generalization of univariate linear regression models. One unit of data with a value of dependent variable.
Subjects Review Introduction to Statistical Learning Midterm: Thursday, October 15th :00-16:00 ADV2.
Stat 112 Notes 6 Today: –Chapter 4.1 (Introduction to Multiple Regression)
How Good is a Model? How much information does AIC give us? –Model 1: 3124 –Model 2: 2932 –Model 3: 2968 –Model 4: 3204 –Model 5: 5436.
Roger B. Hammer Assistant Professor Department of Sociology Oregon State University Conducting Social Research Specification: Choosing the Independent.
Basic Business Statistics, 10e © 2006 Prentice-Hall, Inc. Chap 15-1 Chapter 15 Multiple Regression Model Building Basic Business Statistics 10 th Edition.
1 Building the Regression Model –I Selection and Validation KNN Ch. 9 (pp )
© Galit Shmueli and Peter Bruce 2010 Chapter 6: Multiple Linear Regression Data Mining for Business Analytics Shmueli, Patel & Bruce.
Using SPSS Note: The use of another statistical package such as Minitab is similar to using SPSS.
Lab 4 Multiple Linear Regression. Meaning  An extension of simple linear regression  It models the mean of a response variable as a linear function.
DATA ANALYSIS AND MODEL BUILDING LECTURE 9 Prof. Roland Craigwell Department of Economics University of the West Indies Cave Hill Campus and Rebecca Gookool.
2011 Data Mining Industrial & Information Systems Engineering Pilsung Kang Industrial & Information Systems Engineering Seoul National University of Science.
Canadian Bioinformatics Workshops
Stats Methods at IC Lecture 3: Regression.
Chapter 15 Multiple Regression Model Building
Generalized linear models
Correlation, Bivariate Regression, and Multiple Regression
Chapter 9 Multiple Linear Regression
How Good is a Model? How much information does AIC give us?
Regression Diagnostics
Chapter Six Normal Curves and Sampling Probability Distributions
Business Statistics, 4e by Ken Black
بحث في التحليل الاحصائي SPSS بعنوان :
Regression Model Building - Diagnostics
Solutions of Tutorial 10 SSE df RMS Cp Radjsq SSE1 F Xs c).
Linear Regression.
Lasso/LARS summary Nasimeh Asgarian.
Model Selection In multiple regression we often have many explanatory variables. How do we find the “best” model?
Hypothesis testing and Estimation
Regression Model Building
Stat 324 – Day 28 Model Validation (Ch. 11).
Linear Model Selection and regularization
Bootstrapping Jackknifing
Regression Model Building - Diagnostics
Combined predictor Selection for Multiple Clinical Outcomes Using PHREG Grisell Diaz-Ramirez.
Multiple Linear Regression
Example on the Concept of Regression . observation
Solutions of Tutorial 9 SSE df RMS Cp Radjsq SSE1 F Xs c).
Ch 4.1 & 4.2 Two dimensions concept
Business Statistics, 4e by Ken Black
Presentation transcript:

Regression Model Building Predicting Number of Crew Members of Cruise Ships

Data Description n=158 Cruise Ships Dependent Variable – Crew Size (100s) Potential Predictor Variables Age (2013 – Year Built) Tonnage (1000s of Tons) Passengers (100s) Length (100s of feet) Cabins (100s) Passenger Density (Passengers/Space)

Correlation Plot / Matrix High correlations among the predictors. (Passengers and Cabins is very problematic) Passengers/Cabins: r=.976 Tonnage / Cabins: r=.949 Tonnage / Passengers: r=.945 Consider model with Predictors: Age, Tonnage, Passdens, Cabins, Length

Data – First 20 Cases

Full Model (5 Predictors, 6 Parameters, n=158)

Backward Elimination – Model Based AIC (minimize)

Forward Selection (AIC Based)

Stepwise Regression (AIC Based)

Summary of Automated Models Backward Elimination Drop Age (AIC drops from 9.09 to 7.24) Drop Tonnage (AIC drops from 7.24 to 5.52) Stop: Keep Passdens, Length, Cabins Forward Selection Add Cabins (AIC drops from 397.18 to 28.82) Add Length (AIC drops from 28.82 to 9.87) Add Passdens (AIC drops from 9.87 to 5.52) Stepwise – Same as Forward Selection

All Possible (Subset) Regressions

All Possible (Subset) Regressions (Best 4 per Grp)

Cross-Validation Hold-out Sample (Training Sample = 100, Validation = 58) Fit Model on Training Sample, and obtain Regression Estimates Apply Regression Estimates from Training Sample to Validation Sample X levels for Predicted  MSEP = sum(obs-pred)2/n Fit Model on Validation Sample and Compare regression coefficients with model for Training Sample PRESS Statistic (Delete observations 1-at-a-time) Fit model with each observation deleted 1-at-a-time Obtain Residual for each observation when it was deleted PRESS = sum(obs-pred(deleted))2 K-fold Cross-validation Extension of PRESS to where K groups of cases are deleted Useful for computationally intensive models

Hold-Out Sample – nin = 100 nout = 58 Very similar coefficients for the 2 data sets

Testing Bias = 0 from Training data to Validation

PRESS Statistic Model appears to be valid, very little difference between PRESS/n and MS(Resid)

K-fold Cross-Validation (k=10) – Full Model