Advanced Quantitative Techniques Lab 7

Low Birth Weight Example
The goal of this study was to identify risk factors associated with giving birth to a low birth weight baby (weighing less than 2500 grams). Data were collected on 189 women, 59 of whom had low birth weight babies and 130 of whom had normal birth weight babies. Four variables thought to be of importance were age, weight of the subject at her last menstrual period, race, and the number of physician visits during the first trimester of pregnancy. (This dataset comes from a well-known study that led to important clinical recommendations.)

LIST OF VARIABLES:
ID      Identification Code
BWT     Birth Weight in Grams
LOW     Low Birth Weight (0 = Birth Weight >= 2500g, 1 = Birth Weight < 2500g)
AGE     Age of the Mother in Years
LWT     Weight in Pounds at the Last Menstrual Period
RACE    Race (1 = White, 2 = Black, 3 = Other)
SMOKE   Smoking Status During Pregnancy (1 = Yes, 0 = No)
PTL     History of Premature Labor (0 = None, 1 = One, etc.)
HT      History of Hypertension (1 = Yes, 0 = No)
UI      Presence of Uterine Irritability (1 = Yes, 0 = No)
FTV     Number of Physician Visits During the First Trimester (0 = None, 1 = One, 2 = Two, etc.)
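This appears to be the classic Hosmer-Lemeshow low birth weight dataset; if the lab uses the copy that ships with Stata's web examples (an assumption, since your instructor may distribute a separate file), it can be loaded as follows:

* Load the example dataset (assuming it is Stata's built-in lbw data)
webuse lbw, clear
describe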

Model Building
Step 1: Without looking at the data, record expectations: what factors are likely to explain birth weight (make a 'wish list' of independent variables)?
Step 2: Reconcile the wish list with the available data. Take note of variables that you can't measure because they aren't available (to gauge omitted variable bias). List those variables here.
Step 3: Create a list of the variables in your wish list that are available in the data (or have close proxies). Add any other variables that might reasonably be predictors of birth weight (you should test most variables), but eliminate variables that have no possible predictive power or that are circular. The variables that you keep are your candidate independent variables.
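One way to carry out Steps 2 and 3 in practice is to lay the wish list next to what the dataset actually contains; a minimal sketch, assuming the data are already loaded:

* List every variable with its label, range, and number of unique values, to reconcile with the wish list
codebook, compact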

Step 4: Perform basic checks of the candidate variables. Any missing values or out-of-range data problems?
Create a dummy variable for race. In light of theory, I made black = 1 and other races = 0. Be sure to check that you coded this correctly. Race cannot be included "as is" because it is a nominal variable; you need the dummy variable transformation.
gen black=.
replace black=1 if race==2
replace black=0 if race==1|race==3
sum bwt age lwt smoke ht ui ftv black
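Because Step 4 asks you to confirm that the dummy was coded correctly, a quick cross-check (not part of the original handout) is:

* Cross-tabulate the original race variable against the new dummy, including any missing values
tab race black, missing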

Step 5: Build a correlation matrix which includes your dependent variable and candidate independent variables. What did your check of the correlation matrix find? Which variables seem most highly correlated with birth weight? Does it look like you need to worry about multicollinearity? Don't include variables that you eliminated in Step 3 in the correlation matrix.
corr bwt age lwt smoke ht ui ftv black

pwcorr bwt age lwt smoke ht ui ftv black, obs sig
The most important difference between correlate and pwcorr is the way missing data are handled. With correlate, an observation (or case) is dropped if any variable has a missing value; in other words, correlate uses listwise (also called casewise) deletion. pwcorr uses pairwise deletion, meaning that an observation is dropped only if it has a missing value for the pair of variables being correlated.
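Whether correlate and pwcorr actually differ here depends on how much missing data there is; a supplementary check (not in the original handout):

* Summarize missingness in the analysis variables; if nothing is missing, corr and pwcorr give identical results
misstable summarize bwt age lwt smoke ht ui ftv black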

Step 6: Rank your independent variables based on logic/reasoning or theory. Write down the order of entry based on your best guess given your knowledge of the field (protection against specification error). If you are not sure, you can use the correlation results as a guide, but try to let reasoning and logic drive the order of entry.
Step 7: Add your first independent variable to the regression model. Show your bivariate model. Did it accord with your expectations?
Step 8: Check for regression violations for this bivariate model. Did you find any major violations?
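A minimal sketch of Steps 7 and 8, assuming the mother's weight at last menstrual period (lwt) happens to enter first (the order of entry is your own call from Step 6):

* Step 7: bivariate model
regress bwt lwt
* Step 8: basic checks for violations in the bivariate model
* residuals versus fitted values (look for non-linearity or fanning out)
rvfplot, yline(0)
* Breusch-Pagan test for heteroskedasticity
estat hettest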

Step 9: Sequentially build up the model, adding variables in the order you specified (don't check regression assumptions at each stage). Add variables one by one. As you add variables:
- Drop variables that are insignificant unless there is a strong theoretical reason to keep them.
- If an insignificant new variable makes an existing variable insignificant, just drop the new one.
- If the new variable is significant but adding it makes an old variable insignificant, keep both: theory led you to think the old variable was important.
- Keep track of variables that are not significant; this is important to document.
Briefly document what you kept and what you dropped.
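A sketch of what the sequential build-up might look like, with an entry order assumed purely for illustration (your Step 6 ranking should drive the real order):

* Enter candidate variables one at a time, reviewing significance at each stage
regress bwt lwt
regress bwt lwt smoke
regress bwt lwt smoke ht
regress bwt lwt smoke ht ui
regress bwt lwt smoke ht ui black
regress bwt lwt smoke ht ui black age ftv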

regress bwt age lwt smoke ht ui ftv black, beta

Step 10: Recheck model assumptions for your final model. (You do NOT need to check assumptions for each variable you add; only do this for the bivariate model and your final model.) Discuss your final model: review the coefficient table in detail along with the other key statistics. Also, briefly discuss whether the final model satisfied the regression assumptions overall. If not, what are some options for improving the model fit?
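A sketch of the kinds of rechecks Step 10 calls for, assuming the full model shown above ends up being the final model:

* Re-fit the final model, then run standard diagnostics
regress bwt age lwt smoke ht ui ftv black
* variance inflation factors (a check on multicollinearity)
estat vif
* Breusch-Pagan test for heteroskedasticity
estat hettest
* rough check of residual normality
predict rfinal, residual
histogram rfinal, normal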

predict pr
list pr bwt in 1/10
predict res, residual
list res in 1/10

Residual plots
regress bwt age lwt smoke ht ui ftv black, beta
rvpplot age
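The same residual-versus-predictor check can be repeated for the other quantitative predictors in the model (a follow-up sketch, not in the original handout):

rvpplot lwt
rvpplot ftv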