Advanced Quantitative Techniques
Lab 7
Low Birth Weight Example
The goal of this study was to identify risk factors associated with giving birth to a low birth weight baby (weighing less than 2500 grams). Data were collected on 189 women, 59 of whom had low birth weight babies and 130 of whom had normal birth weight babies. Four variables thought to be important were age, weight of the subject at her last menstrual period, race, and the number of physician visits during the first trimester of pregnancy. (This dataset is from a famous study that led to important clinical recommendations.)
LIST OF VARIABLES:
Abbreviation   Variable
ID             Identification Code
BWT            Birth Weight in Grams
LOW            Low Birth Weight (0 = Birth Weight >= 2500g, 1 = Birth Weight < 2500g)
AGE            Age of the Mother in Years
LWT            Weight in Pounds at the Last Menstrual Period
RACE           Race (1 = White, 2 = Black, 3 = Other)
SMOKE          Smoking Status During Pregnancy (1 = Yes, 0 = No)
PTL            History of Premature Labor (0 = None, 1 = One, etc.)
HT             History of Hypertension (1 = Yes, 0 = No)
UI             Presence of Uterine Irritability (1 = Yes, 0 = No)
FTV            Number of Physician Visits During the First Trimester (0 = None, 1 = One, 2 = Two, etc.)
Model Building
Step 1: Without looking at the data, record expectations: what factors are likely to explain birth weight (make a “wish list” of independent variables)?
Step 2: Reconcile the “wish list” with the available data. Take note of variables that you can’t measure because they aren’t available (to gauge omitted variable bias). List those variables here.
Step 3: Create a list of the variables in your wish list that are available in the data (or have close proxies). Add any other variables that might reasonably be predictors of birth weight (you should test most variables), but eliminate variables that have no possible predictive power or that are circular. The variables that you keep are your candidate independent variables.
Step 4: Perform basic checks of the candidate variables. Are there any missing-value or out-of-range data problems? Create a dummy variable for race. In light of theory, I made black = 1 and other races = 0. Be sure to check that you coded this correctly. Race cannot be included “as is” because it is a nominal variable; you need the dummy variable transformation. Create black before summarizing it:
gen black=.
replace black=1 if race==2
replace black=0 if race==1|race==3
sum bwt age lwt smoke ht ui ftv black
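The checks in Step 4 could be carried out along the following lines. This is only a sketch: it assumes the lab dataset is already in memory and that black was created with the gen and replace commands from this step.

```stata
* Sketch only: assumes the lab data are loaded and black already exists.

* Count missing values in each candidate variable
misstable summarize bwt age lwt race smoke ht ui ftv

* Inspect minimums, maximums, and percentiles for out-of-range values
summarize bwt age lwt smoke ht ui ftv, detail

* Cross-tabulate the original and recoded variables to verify the dummy
tabulate race black, missing

* Stop with an error if any nonmissing race value was coded incorrectly
assert black == (race == 2) if !missing(race)
```

If the assert line runs silently, every woman with race==2 has black equal to 1 and every other woman with a recorded race has black equal to 0.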
Step 5: Build a correlation matrix that includes your dependent variable and candidate independent variables. What did your check of the correlation matrix find? Which variables seem most highly correlated with birth weight? Does it look like you need to worry about multicollinearity? Don’t include variables that you eliminated in Step 3 in the correlation matrix.
corr bwt age lwt smoke ht ui ftv black
pwcorr bwt age lwt smoke ht ui ftv black, obs sig
The most important difference between correlate and pwcorr is the way in which missing data are handled. With correlate, an observation (case) is dropped from the entire matrix if any variable has a missing value; in other words, correlate uses listwise (also called casewise) deletion. pwcorr uses pairwise deletion, meaning that an observation is dropped only from the correlations for pairs in which one of the two variables is missing.
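To see how much the two deletion rules could differ for these variables, one option is to count the observations with at least one missing value. This is a minimal sketch; nmiss is a hypothetical helper variable, not part of the lab.

```stata
* Sketch only: nmiss is a hypothetical helper variable.

* Count how many of the analysis variables are missing for each woman
egen nmiss = rowmiss(bwt age lwt smoke ht ui ftv black)

* Observations with nmiss == 0 are the ones correlate (listwise) would use
count if nmiss == 0

* Observations with nmiss > 0 are dropped by correlate but may still
* contribute to some pairs under pwcorr
count if nmiss > 0

* Remove the helper variable
drop nmiss
```

If the second count is zero, correlate and pwcorr report the same coefficients, because no observation is dropped under either rule.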
Step 6: Rank your independent variables based on logic, reasoning, or theory. Write down the order of entry based on your best guess given your knowledge of the field (this protects against specification error). If you are not sure, you can use the correlation results as a guide, but try to let reasoning and logic drive the order of entry.
Step 7: Add your first independent variable to the regression model. Show your bivariate model. Did it accord with your expectations?
Step 8: Check for regression violations for this bivariate model. Did you find any major violations?
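Steps 7 and 8 might look like the sketch below. The choice of lwt as the first variable is an assumption made purely for illustration; your own ranking from Step 6 should decide which variable enters first.

```stata
* Sketch only: lwt is used as the first variable purely for illustration.
regress bwt lwt

* Scatterplot with the fitted line, to judge whether a straight line is reasonable
twoway (scatter bwt lwt) (lfit bwt lwt)

* Residuals versus fitted values: look for curvature or a funnel shape
rvfplot, yline(0)
```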
Step 9: Sequentially build up the model, adding variables one by one in the order you specified (don’t check regression assumptions at each stage). As you add variables: Drop variables that are insignificant unless there is a strong theoretical reason to keep them. If a new, insignificant variable makes an existing variable insignificant, just drop the new one. If the new variable is significant but adding it makes an old variable insignificant, keep both: theory led you to think the old variable was important, so keep it. Keep track of variables that are not significant; this is important to document. Briefly document what you kept and what you dropped.
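One way to keep track of the sequential build in Step 9 is to store each model and compare them side by side. This is a sketch under stated assumptions: the entry order lwt, smoke, black is a hypothetical ranking, and m1 to m3 are arbitrary names for the stored results.

```stata
* Sketch only: the order lwt, smoke, black is a hypothetical ranking,
* and m1-m3 are arbitrary names for the stored results.
regress bwt lwt
estimates store m1

regress bwt lwt smoke
estimates store m2

regress bwt lwt smoke black
estimates store m3

* Side-by-side coefficients with significance stars, N, and adjusted R-squared
estimates table m1 m2 m3, b(%9.2f) star stats(N r2_a)
```

The combined table makes it easier to see whether adding a variable changes the significance of one already in the model, which is the information the keep-or-drop rules rely on.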
regress bwt age lwt smoke ht ui ftv black, beta
Step 10: Recheck the model assumptions for your final model. (You do NOT need to check assumptions for each variable you add; do this only for the bivariate model and your final model.) Discuss your final model, reviewing the coefficient table in detail along with the other key statistics. Also, briefly discuss whether the final model satisfied the regression assumptions overall. If not, what are some options for improving the model fit?
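The final-model checks in Step 10 could be run with Stata's standard regress postestimation commands, as sketched below. The sketch assumes that the full model from the lab is the final model, and res_final is a hypothetical variable name; substitute whatever model you ended up with in Step 9.

```stata
* Sketch only: assumes the full lab model is the final model;
* res_final is a hypothetical variable name.
regress bwt age lwt smoke ht ui ftv black

* Constant variance: Breusch-Pagan / Cook-Weisberg test
estat hettest

* Multicollinearity: variance inflation factors
estat vif

* Normality of residuals: quantile-normal plot
predict res_final, residuals
qnorm res_final

* Specification: Ramsey RESET test for omitted nonlinear terms
estat ovtest
```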
predict pr
list pr bwt in 1/10
predict res, residual
list res in 1/10
Residual
regress bwt age lwt smoke ht ui ftv black, beta
rvpplot age