EHS 655 Lecture 14: Linear and logistic regression, task-based assessment
What we’ll cover today
- Robust linear regression with R²
- Prediction with linear regression
- Logistic regression modeling
Linear regression: robust regression with R²
- Download rregfit: http://stats.idre.ucla.edu/stata/faq/how-can-i-get-an-r2-with-robust-regression-rreg/
- Stata (run rregfit immediately after rreg):
    rreg depvar indepvar
    rregfit
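Regression: predict
- Runs a regression model and saves the predicted (fitted) value for each individual
- E.g., can use regression to predict TWA (time-weighted average) exposure based on job title, gender, perceived noise, etc.
- Linear regression, Stata (rreg does not accept a cluster() option, so use regress for clustered standard errors):
    regress depvar indepvar1, vce(cluster idvar)
    predict fittedvar
- Logistic regression, Stata (predict returns the predicted probability by default):
    logistic depvar indepvar1, cluster(idvar)
    predict fittedvar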
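To make the fit-then-predict workflow concrete, here is a minimal Python sketch with one predictor. It uses plain ordinary least squares rather than Stata's robust or clustered estimators, and the data and the names `fit_line`, `perceived_noise`, and `twa_dba` are made up for illustration.

```python
def fit_line(x, y):
    """Ordinary least squares for one predictor; returns (intercept, slope)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Hypothetical toy data: perceived noise score and measured TWA (dBA).
perceived_noise = [1, 2, 3, 4, 5]
twa_dba = [80.0, 82.0, 84.0, 86.0, 88.0]

intercept, slope = fit_line(perceived_noise, twa_dba)

# Rough equivalent of "predict fittedvar": one fitted value per individual.
fitted = [intercept + slope * x for x in perceived_noise]
```

The fitted list has the same length as the input data, which is the point of predict: every individual gets a model-based estimate, including individuals whose outcome was never measured, as long as their predictors are known.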
EXPOSURE MODELING
- Logistic regression: will use this in our analyses
- Task-based estimation: may use this in our analyses
Binary logistic regression
- Consider a binary categorical (not continuous) outcome
- Want to know the probability of the outcome
- Described by odds: the odds of the event occurring are p / (1 - p), where p is the probability
- The odds ratio (OR) compares the odds between two groups
- Note that Stata will give you log odds (logit) or odds ratios (logistic)
- Some assumptions are shared with linear regression, e.g., independent observations; unlike linear regression, equal variance of the errors is not assumed
- Example: odds of TWA exposure >85 dBA
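The relationship between probability, odds, and log odds on this slide can be sketched in a few lines of Python. The function names and the probability value are hypothetical, chosen only for illustration.

```python
import math


def odds_from_probability(p):
    """Odds = p / (1 - p), defined for 0 <= p < 1."""
    return p / (1.0 - p)


def probability_from_odds(odds):
    """Inverse of the above: p = odds / (1 + odds)."""
    return odds / (1.0 + odds)


def log_odds(p):
    """The quantity a logistic regression models linearly."""
    return math.log(odds_from_probability(p))


# Hypothetical: 20% of workers have a TWA above 85 dBA.
p = 0.2
odds = odds_from_probability(p)
```

A probability of 0.5 corresponds to odds of 1 and log odds of 0, which is why negative log odds mean the outcome is less likely than not.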
Binary logistic regression
- Assumes the outcome is categorical and binary (i.e., 0 or 1): 0 = negative outcome, 1 = positive outcome
- Predictor variables
  - Nominal variables must be entered with the “i.” factor-variable prefix (e.g., i.indepvar)
  - Continuous variables okay
  - Ordinal variables okay
- “vce(robust)” accounts for mild assumption violations
- Stata: logistic depvar indepvar, vce(robust)
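Conceptually, entering a nominal variable with “i.” expands it into 0/1 indicator variables, one per category, with one category left out as the reference. The Python sketch below illustrates that idea only; it is not how Stata implements factor variables, and the helper name and job titles are hypothetical.

```python
def indicator_columns(values, base=None):
    """Expand a nominal variable into 0/1 indicators, omitting a base level."""
    levels = sorted(set(values))
    if base is None:
        base = levels[0]  # assumption: lowest level serves as the reference
    return {
        level: [1 if v == level else 0 for v in values]
        for level in levels
        if level != base
    }


# Hypothetical job titles for four workers.
job_title = ["carpenter", "laborer", "carpenter", "electrician"]
columns = indicator_columns(job_title)
```

Each remaining indicator's coefficient is then interpreted relative to the omitted reference category.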
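Binary logistic regression
- We model the log odds: $\log\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1 X_i$, where $X_i$ is a continuous variable
- $X_i = 0$: log odds $= \beta_0$
- $X_i = x$: log odds $= \beta_0 + \beta_1 x$
- $X_i = x + 1$: log odds $= \beta_0 + \beta_1 x + \beta_1$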
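Binary logistic regression: interpreting odds
- $X_i = 0$: odds $= e^{\beta_0}$
- $X_i = x$: odds $= e^{\beta_0 + \beta_1 x} = e^{\beta_0} e^{\beta_1 x}$
- $X_i = x + 1$: odds $= e^{\beta_0 + \beta_1 x + \beta_1} = e^{\beta_0} e^{\beta_1 x} e^{\beta_1}$
- For groups differing by 1 unit, odds ratio $= e^{\beta_1}$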
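A quick numeric check shows why the odds ratio for a one-unit difference equals the exponentiated slope no matter where on the predictor scale the comparison is made. The coefficient values below are hypothetical.

```python
import math


def odds(beta0, beta1, x):
    """Odds implied by a logistic model: exp(beta0 + beta1 * x)."""
    return math.exp(beta0 + beta1 * x)


# Hypothetical coefficients on the log-odds scale.
beta0, beta1 = -2.0, 0.5

# Odds ratio for a one-unit difference at two different starting points.
ratio_low = odds(beta0, beta1, 1) / odds(beta0, beta1, 0)
ratio_high = odds(beta0, beta1, 11) / odds(beta0, beta1, 10)
```

Both ratios equal exp(beta1) because the intercept and the shared part of the slope term cancel in the division.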
Interpreting logistic regression results
- Odds when predictor = 0
  - Exponentiate the intercept if using log odds: $e^{\beta_0}$
  - (Otherwise have Stata output odds ratios)
- Odds ratio between observations differing by one unit in the predictor
  - Exponentiate the slope from the regression: $e^{\beta_1}$
- Compare nested models with the log likelihood
  - E.g., can compare a model with variables 1 and 2 vs. a model with variable 1 alone; cannot compare a model with variable 1 vs. a model with variable 2, because neither is nested in the other
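The nested-model comparison is commonly done with a likelihood ratio test. The sketch below assumes the larger model adds exactly one variable (1 degree of freedom), which lets the chi-square p-value be computed from the standard library. The function name and the log likelihood values are hypothetical.

```python
import math


def likelihood_ratio_test_1df(ll_reduced, ll_full):
    """Likelihood ratio test for nested models differing by one parameter.

    Returns (LR statistic, p-value). For 1 degree of freedom the chi-square
    upper tail probability is erfc(sqrt(LR / 2)).
    """
    lr = 2.0 * (ll_full - ll_reduced)
    if lr < 0:
        raise ValueError("full model should not fit worse than the nested model")
    return lr, math.erfc(math.sqrt(lr / 2.0))


# Hypothetical log likelihoods: variable 1 alone vs. variables 1 and 2.
lr_stat, p_value = likelihood_ratio_test_1df(ll_reduced=-120.5, ll_full=-117.0)
```

A small p-value suggests the added variable improves fit beyond what chance would explain. Comparing models that differ by more than one parameter needs the chi-square distribution with the matching degrees of freedom.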
Binary logistic regression: interpreting model output
- http://stats.idre.ucla.edu/stata/seminars/stata-logistic/
- p-value for chi-square (aka likelihood ratio)
  - Tests the null hypothesis that all estimated coefficients (other than the intercept) equal zero
  - Test of the whole model
- Pseudo R² (aka likelihood ratio index)
  - 1 - (log likelihood of full model / log likelihood of intercept-only model)
- Log pseudolikelihood
  - Value closer to zero (i.e., smaller in absolute value) = better model fit
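The pseudo R² that Stata reports for logistic models is McFadden's, which is one minus the ratio of the two log likelihoods. A minimal sketch with hypothetical log likelihood values:

```python
def mcfadden_pseudo_r2(ll_full, ll_null):
    """McFadden's pseudo R-squared: 1 - (LL full model / LL intercept-only model)."""
    return 1.0 - ll_full / ll_null


# Hypothetical values; log likelihoods are negative, and closer to zero is better.
pseudo_r2 = mcfadden_pseudo_r2(ll_full=-80.0, ll_null=-100.0)
```

A model that fits no better than the intercept-only model gives a value of 0, and the value rises toward 1 as the full model's log likelihood approaches zero. It is not the proportion of variance explained, so it should not be read like R² from linear regression.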
Binary logistic regression: logistic regression with repeated measures
- Stata: logistic depvar indepvar, cluster(idvar) or
Multiple logistic regression
- Like simple logistic regression, but with 2 or more predictors
- Stata: logistic depvar indepvar1 indepvar2, vce(robust)
Repeated measures logistic regression
- Use if you have
  - Binary outcome
  - Predictor measured repeatedly for each subject
- And want to run a logistic regression that accounts for multiple (i.e., correlated) measures from single subjects
- Stata: logistic depvar indepvar, cluster(idvar)
Multinomial logistic regression
- Use if the dependent variable has more than 2 categories but is not continuous
  - E.g., low, medium, high exposure
  - E.g., no, low, medium, severe health impact
- Not expecting this level of sophistication in your analyses…but http://stats.idre.ucla.edu/stata/dae/multinomiallogistic-regression/
- Example: create 3-level ordinal exposure variable from TWA
- Stata: mlogit depvar indepvar, rrr vce(robust)
  - With mlogit, the rrr option reports exponentiated coefficients as relative-risk ratios
Task-based exposure estimation
- Develop an exposure model based not on personal characteristics or a grouping strategy, but on tasks
- Assume the task directly or indirectly determines exposure
- Create estimates of personal exposure
- Example: industry where tasks vary widely within groups
- In the estimation formula, T = total duration, t = duration of task n, f = frequency of task n, C = intensity of task n
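The slide's own equation is not reproduced in this text. One form consistent with the symbols it defines is a time-weighted average, $E = \frac{1}{T}\sum_n C_n t_n f_n$, and the Python sketch below assumes that form. The function name and task values are hypothetical, and the intensities are assumed to be on a linear scale; noise levels in dB are logarithmic and would need converting before averaging this way.

```python
def task_based_estimate(tasks, total_duration=None):
    """Time-weighted average exposure across tasks.

    tasks: list of (C, t, f) tuples, where C is the task intensity, t is the
    duration of one occurrence, and f is the number of occurrences.
    total_duration: T; if omitted, assumed to equal the sum of t * f.
    """
    weighted_sum = sum(c * t * f for c, t, f in tasks)
    if total_duration is None:
        total_duration = sum(t * f for _, t, f in tasks)
    return weighted_sum / total_duration


# Hypothetical worker: task A (intensity 2.0, 1 hour, done twice) and
# task B (intensity 4.0, 2 hours, done once).
tasks = [(2.0, 1.0, 2), (4.0, 2.0, 1)]
estimate = task_based_estimate(tasks)
```

Passing an explicit total_duration longer than the summed task time treats the remaining time as unexposed, which lowers the estimate.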
Illustration of task-based approach
Task-based assessment Can work better than other approaches Neitzel et al, 2011
Task-based assessment for noise: example of implementation Ignacio and Bullock, AIHA, 2006
Task-based assessment
- Complicated
  - First, evaluate task-specific exposure levels
  - Then apply exposure levels to individual task durations and frequencies
  - Combination yields a personal exposure estimate
- We don’t always have information about all tasks a person performs; may instead focus on
  - Longest task
  - Most frequent task
  - Task with highest exposure
(Pseudo) task-based estimation
- What about our dataset?
  - To do task-based estimation right, we’d need info on all tasks performed, levels, and durations
  - We just have the primary task performed
- Stata:
    rreg depvar indepvar1 indepvar2
    predict fitted