Lecture 12 Model Building BMTRY 701 Biostatistical Methods II.

Slides:



Advertisements
Similar presentations
Lecture 10 F-tests in MLR (continued) Coefficients of Determination BMTRY 701 Biostatistical Methods II.
Advertisements

1 1 Chapter 5: Multiple Regression 5.1 Fitting a Multiple Regression Model 5.2 Fitting a Multiple Regression Model with Interactions 5.3 Generating and.
© Department of Statistics 2012 STATS 330 Lecture 32: Slide 1 Stats 330: Lecture 32.
Irwin/McGraw-Hill © Andrew F. Siegel, 1997 and l Chapter 12 l Multiple Regression: Predicting One Factor from Several Others.
/k 2DS00 Statistics 1 for Chemical Engineering lecture 4.
6-1 Introduction To Empirical Models 6-1 Introduction To Empirical Models.
Class 16: Thursday, Nov. 4 Note: I will you some info on the final project this weekend and will discuss in class on Tuesday.
Statistics for Managers Using Microsoft® Excel 5th Edition
Statistics for Managers Using Microsoft® Excel 5th Edition
Lecture 25 Regression diagnostics for the multiple linear regression model Dealing with influential observations for multiple linear regression Interaction.
Multivariate Data Analysis Chapter 4 – Multiple Regression.
January 6, morning session 1 Statistics Micro Mini Multiple Regression January 5-9, 2008 Beth Ayers.
Lecture 19: Tues., Nov. 11th R-squared (8.6.1) Review
Lecture 6: Multiple Regression
Lecture 24: Thurs., April 8th
Lecture 11 Multivariate Regression A Case Study. Other topics: Multicollinearity  Assuming that all the regression assumptions hold how good are our.
Basic Business Statistics, 11e © 2009 Prentice-Hall, Inc. Chap 15-1 Chapter 15 Multiple Regression Model Building Basic Business Statistics 11 th Edition.
Chapter 15: Model Building
Stat 112: Lecture 16 Notes Finish Chapter 6: –Influential Points for Multiple Regression (Section 6.7) –Assessing the Independence Assumptions and Remedies.
Regression Model Building Setting: Possibly a large set of predictor variables (including interactions). Goal: Fit a parsimonious model that explains variation.
Multiple Regression Dr. Andy Field.
Review for Final Exam Some important themes from Chapters 9-11 Final exam covers these chapters, but implicitly tests the entire course, because we use.
Copyright ©2011 Pearson Education 15-1 Chapter 15 Multiple Regression Model Building Statistics for Managers using Microsoft Excel 6 th Global Edition.
Forecasting Revenue: An Example of Regression Model Building Setting: Possibly a large set of predictor variables used to predict future quarterly revenues.
Correlation & Regression
Quantitative Business Analysis for Decision Making Multiple Linear RegressionAnalysis.
Objectives of Multiple Regression
Marketing Research Aaker, Kumar, Day and Leone Tenth Edition
Simple Covariation Focus is still on ‘Understanding the Variability” With Group Difference approaches, issue has been: Can group membership (based on ‘levels.
Biostatistics Case Studies 2015 Youngju Pak, PhD. Biostatistician Session 4: Regression Models and Multivariate Analyses.
Forecasting Revenue: An Example of Regression Model Building Setting: Possibly a large set of predictor variables used to predict future quarterly revenues.
Section Copyright © 2014, 2012, 2010 Pearson Education, Inc. Lecture Slides Elementary Statistics Twelfth Edition and the Triola Statistics Series.
Lecture 3: Inference in Simple Linear Regression BMTRY 701 Biostatistical Methods II.
MULTIPLE REGRESSION Using more than one variable to predict another.
LOGO Chapter 4 Multiple Regression Analysis Devilia Sari - Natalia.
1 Chapter 3 Multiple Linear Regression Multiple Regression Models Suppose that the yield in pounds of conversion in a chemical process depends.
Research Methods I Lecture 10: Regression Analysis on SPSS.
Lecture 13 Diagnostics in MLR Variance Inflation Factors Added variable plots Identifying outliers BMTRY 701 Biostatistical Methods II.
Multiple Regression and Model Building Chapter 15 Copyright © 2014 by The McGraw-Hill Companies, Inc. All rights reserved.McGraw-Hill/Irwin.
Review of Building Multiple Regression Models Generalization of univariate linear regression models. One unit of data with a value of dependent variable.
Lesson Multiple Regression Models. Objectives Obtain the correlation matrix Use technology to find a multiple regression equation Interpret the.
Lecture 11 Multicollinearity BMTRY 701 Biostatistical Methods II.
Model Selection and Validation. Model-Building Process 1. Data collection and preparation 2. Reduction of explanatory or predictor variables (for exploratory.
Multiple Regression INCM 9102 Quantitative Methods.
Copyright © 2011 by The McGraw-Hill Companies, Inc. All rights reserved. McGraw-Hill/Irwin Model Building and Model Diagnostics Chapter 15.
Lecture 6: Multiple Linear Regression Adjusted Variable Plots BMTRY 701 Biostatistical Methods II.
Multiple Regression I 1 Copyright © 2005 Brooks/Cole, a division of Thomson Learning, Inc. Chapter 4 Multiple Regression Analysis (Part 1) Terry Dielman.
Lecture 6: Multiple Linear Regression Adjusted Variable Plots BMTRY 701 Biostatistical Methods II.
Statistics for Managers Using Microsoft Excel, 4e © 2004 Prentice-Hall, Inc. Chap 14-1 Chapter 14 Multiple Regression Model Building Statistics for Managers.
1 Doing Statistics for Business Doing Statistics for Business Data, Inference, and Decision Making Marilyn K. Pelosi Theresa M. Sandifer Chapter 12 Multiple.
ANOVA, Regression and Multiple Regression March
Lecture 13 Diagnostics in MLR Added variable plots Identifying outliers Variance Inflation Factor BMTRY 701 Biostatistical Methods II.
B AD 6243: Applied Univariate Statistics Multiple Regression Professor Laku Chidambaram Price College of Business University of Oklahoma.
Ch15: Multiple Regression 3 Nov 2011 BUSI275 Dr. Sean Ho HW7 due Tues Please download: 17-Hawlins.xls 17-Hawlins.xls.
Lesson 14 - R Chapter 14 Review. Objectives Summarize the chapter Define the vocabulary used Complete all objectives Successfully answer any of the review.
Basic Business Statistics, 10e © 2006 Prentice-Hall, Inc. Chap 15-1 Chapter 15 Multiple Regression Model Building Basic Business Statistics 10 th Edition.
Statistics for Managers Using Microsoft Excel, 4e © 2004 Prentice-Hall, Inc. Chap 14-1 Chapter 14 Multiple Regression Model Building Statistics for Managers.
Regression Analysis1. 2 INTRODUCTION TO EMPIRICAL MODELS LEAST SQUARES ESTIMATION OF THE PARAMETERS PROPERTIES OF THE LEAST SQUARES ESTIMATORS AND ESTIMATION.
Biostatistics Regression and Correlation Methods Class #10 April 4, 2000.
DSCI 346 Yamasaki Lecture 6 Multiple Regression and Model Building.
Regression. Why Regression? Everything we’ve done in this class has been regression: When you have categorical IVs and continuous DVs, the ANOVA framework.
1 BUSI 6220 By Dr. Nick Evangelopoulos, © 2012 Brief overview of Linear Regression Models (Pre-MBA level)
Yandell – Econ 216 Chap 15-1 Chapter 15 Multiple Regression Model Building.
Predicting Energy Consumption in Buildings using Multiple Linear Regression Introduction Linear regression is used to model energy consumption in buildings.
Chapter 15 Multiple Regression Model Building
Multiple Regression Prof. Andy Field.
Chapter 9 Multiple Linear Regression
CHAPTER 29: Multiple Regression*
Lecture 12 Model Building
Regression Forecasting and Model Building
Presentation transcript:

Lecture 12 Model Building BMTRY 701 Biostatistical Methods II

The real world regression  datasets will have a large number of covariates!  There will be a number of covariates to consider for inclusion in the model  The inclusion/exclusion of covariates will not always be obvious will be affected by multicollinearity will depend on the questions of interest will depend on the scientific ‘precedents’ in that area  The model building process is important for determining a “final model”

The “final model”  At the end of the analytic process, there is generally one model from which you make inferences  it usually is a multiple regression model  it is not logical to make inferences based on more than one model  Recall the ‘principle of parsimony’

Principle of Parsimony  Also known as Occam’s Razor  The principle states that the explanation of any phenomenon should make as few assumptions as possible, eliminating those that make no difference in the observable predictions of the explanatory hypothesis or theory.  The principle recommends selecting the hypothesis that introduces the fewest assumptions and postulates the fewest entities  Translation for regression: the fewest possible covariates that explains the greatest variance is best! The addition of each covariate should be weighed against the increase in complexity of the model.

General Process of Model Building 1.Exploratory Data analysis 2.Choose initial model 3.Fit model 4.Check model assumptions 5.Repeat 2 – 4 as needed 6.Interpret findings

Exploratory Data Analysis  Consider the covariates and the outcome variables look at each covariate and outcome  what forms do they take?  might transformations need to be made? look at relationships between Y and each X  are the relationships linear?  what form should a covariate take to enter the model (e.g. categorical? spline? quadratic?) look at the relatioships between the X’s  is there strong correlation between some covariates?

Exploratory Data Analysis  Individual variable analysis histograms boxplots dotplots (by categories?)  Two-way associations scatterplots color-coded by third variable? SIMPLE LINEAR REGRESSIONS  For categorical variables tables color code other graphical displays

SENIC

SENIC example  We need a scientific question/hypothesis!!  Examples: What factors are predictive of length of stay? Is the number of beds strongly related to length of stay? Is there a difference in length of stay by region? how do infection risk and number of cultures relate to length of stay? is it possible to reduce the length of stay by reducing infection risk and number of cultures?

Work through the exploration…

Next step: Pick an initial model  Use the information that you learned in the exploratory step  Some guidelines covariates not associated in SLR models will probably not be associated in MLR models Choose threshold: alpha < 0.10 or 0.20 in SLR to be included in initial MLR  Recall multicollinearity might want to spend some extra time learning about the interrelationships between two variables and the outcome.

Next step: Pick an initial model  Many approaches to the initial model  My approach: start big, and then pare down initial model includes all of the covariates and potentially their interactions fit model with all of the covariates of interest remove ONE AT A TIME based on insignificant p- values and model coefficients  find the most insignificant covariate  refit the model without it  look at model: what happened to other coefficients? what happened to R 2 not hard-fast rules!

SENIC  What is an appropriate initial model?  Are there any interactions to consider?  Work through the model…

Check model assumptions  Based on a reasonable model (in terms of ‘significance’ of covariates), check the assumptions  Residual plots  Other diagnostics  Recall your assumptions: independence of errors homoscedasticity/constant variance normality of errors

Does it fit?  If so….go to next step  If not, deal with misspecifications transform Y? another type of regression?! transform X? consider more exploration (e.g., smoothers to inform about relationships) outlier problems? Then, refit all over again…

Last step  Interpret results  Oddly, this step often leads you back to refitting  Sometimes trying to summarize results causes you to think of additional modeling considerations adding another variable using a different parameterization using a different reference level for a categorical variable

SENIC  What is the final model?  How to present it?

Other model building issues: Stepwise approaches  “Stepwise” approaches are computer driven  you give the computer a set of covariates and it finds an ‘optimal’ model  “forward” and “backward”  Problems: models are only ‘stepwise’ optimal ignore magnitude of β and simply focus on p-value! you need to set criteria for optimality which are not always obvious gives you no ability to give different variables different priorities can have problematic interpretations: e.g. a main effect is removed, but the interaction is included. stepwise forward and backward give different models.

Is stepwise ever a good idea?  If you have a very large set of predictors that are somewhat ‘interchangeable’  Example: gene expression microarrays you may have >10000 genes to select from automated procedures can find optimal set that describe a large amount of variation in the outcome of interest (e.g. cancer vs. no cancer) it would be physically impossible to use manual model-fitting Specialized software for this (standard ‘lm’ type approach will not work).

Stepwise Approaches  I don’t condone it but,  In R: step(reg)

Other model building issues: R 2  Some people use increase in as a criteria of inclusion/exclusion of a covariate  Not that common in biomedical research, but not totally absent either  Look at either coefficient of multiple determination coefficient of partial determination  Tells the fraction of variance explained.

Other model building issues: Information Criteria  Information Criteria (IC) help with choosing between two models compare ‘parsimony’ adjusted statistics. choose the model with the smallest IC AIC = Akaike information criteria BIC = Bayesain information criteria  More with logistic regression…

A step further  Model validation  Addresses issue of ‘overfitting’ etc.  Is this model specific to this dataset, or does it actually work in the general setting?  Need to collect more data split data into ‘training set’ and ‘test set’ other cross-validation approaches such a ‘leave-k out’ approach

Wednesday: Diagnostics in MLR  Added variable plots  Identifying outliers hat matrix: shows leverage and influence studentized or standardized residuals  variance inflation factor