Stata 3, Regression. Hein Stigum, Jul-15. Presentation, data and programs at:

Presentation transcript:

Slide 2 (2 July 2015, H.S.): Agenda
– Linear regression
– GLM
– Logistic regression
– Binary regression
– (Conditional logistic)

Slide 3: Linear regression
Birth weight by gestational age

Slide 4: Regression idea

Slide 5: Model and assumptions
Model:
Assumptions:
– Independent errors
– Linear effects
– Constant error variance

Slide 6: Association measure: RD
Model:
Start with:
Hence:
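The equations for this slide are missing from the transcript. A sketch of the usual derivation, assuming a linear model with a binary exposure x_1 and one further covariate x_2 (the symbols are assumptions, not taken from the slide):

```latex
\begin{align*}
\text{Model:} \quad & E(y \mid x_1, x_2) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 \\
\text{Start with:} \quad & E(y \mid x_1 = 1, x_2) = \beta_0 + \beta_1 + \beta_2 x_2, \qquad
                           E(y \mid x_1 = 0, x_2) = \beta_0 + \beta_2 x_2 \\
\text{Hence:} \quad & RD = E(y \mid x_1 = 1, x_2) - E(y \mid x_1 = 0, x_2) = \beta_1
\end{align*}
```

Under these assumptions the coefficient of the exposure is the difference in expected outcome between exposed and unexposed, at any fixed value of x_2.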

Slide 7: Purpose of regression
Estimation
– Estimate the association between outcome and exposure, adjusted for other covariates
Prediction
– Use an estimated model to predict the outcome given covariates in a new dataset

Slide 8: Adjusting for confounders
Do not adjust
– Cofactor is a collider
– Cofactor is in the causal path
May or may not adjust
– Cofactor has missing values
– Cofactor has measurement error

Slide 9: Workflow
– Scatterplots
– Bivariate analysis
– Regression
  – Model fitting
    – Cofactors in/out
    – Interactions
  – Test of assumptions
    – Independent errors
    – Linear effects
    – Constant error variance
  – Influence (robustness)

Slide 10: Scatterplot

Slide 11: Syntax
Estimation
  regress y x1 x2            linear regression
  xi: regress y x1 i.c1      categorical c1
Post estimation
  predict yf, xb             predict
Manage models
  estimates store m1         save model
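A minimal sketch of the commands above, run on Stata's bundled auto dataset rather than the course data. The variable choices (price, mpg, weight, rep78) are illustrative assumptions only.

```stata
* Minimal sketch on Stata's bundled auto data (not the course data)
sysuse auto, clear

* Linear regression with two continuous covariates
regress price mpg weight
* Save the model under the name m1
estimates store m1

* Categorical covariate, using the xi: prefix as on the slide
xi: regress price mpg i.rep78
estimates store m2

* Linear prediction from the most recent model
predict yf, xb

* Side-by-side table of the two stored models
estimates table m1 m2, b(%9.2f) se(%9.2f)
```

Storing each model under its own name makes it possible to compare or restore models later without refitting them.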

Slide 12: Model 1: outcome + exposure

Slide 13: Model 2: Add confounders
Estimate association: m1 = m2
Prediction: m2 is best

Slide 14: "Dummies"
Assume educ is coded 1, 2, 3 for low, medium and high education.
Choose low educ as the reference.
Make dummies for the two other categories:
  generate medium = (educ==2) if educ<.
  generate high   = (educ==3) if educ<.

Slide 15: Interaction
Model:
Start with:
Hence:
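The equations for this slide are also missing. One plausible reconstruction, assuming a linear model with a binary exposure x_1, a binary covariate x_2 and their product term (symbols and coding are assumptions):

```latex
\begin{align*}
\text{Model:} \quad & E(y \mid x_1, x_2) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2 \\
\text{Start with:} \quad & E(y \mid x_1 = 1, x_2) - E(y \mid x_1 = 0, x_2) = \beta_1 + \beta_3 x_2 \\
\text{Hence:} \quad & RD_{x_1 \mid x_2 = 0} = \beta_1, \qquad RD_{x_1 \mid x_2 = 1} = \beta_1 + \beta_3
\end{align*}
```

With an interaction term the effect of the exposure is no longer a single number: it differs by beta_3 between the two levels of x_2.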

Slide 16: Model 3: with interaction

Slide 17: Test of assumptions
Predict y and residuals
  predict y, xb
  predict res, resid
Plot resid vs y
– independent?
– linear?
– const. var?
  twoway (scatter res y) (qfitci res y)
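The residual check can be sketched on Stata's bundled auto dataset. The model and the name yhat are illustrative assumptions; the slide uses y for the predicted value.

```stata
* Minimal sketch on Stata's bundled auto data (not the course data)
sysuse auto, clear
regress price mpg weight

* Linear prediction and residuals from the fitted model
predict yhat, xb
predict res, resid

* Residuals against predicted values, with a quadratic fit and its CI.
* A flat fit around zero with even spread is what the assumptions imply.
twoway (scatter res yhat) (qfitci res yhat)
```

A curved fitted line suggests a non-linear effect, and a fan-shaped scatter suggests non-constant error variance.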

Slide 18: Violations of assumptions
Dependent residuals
  Mixed models: xtmixed
Non-linear effects
  gen gest2 = gest^2
  regress weight gest gest2 sex
Non-constant variance
  regress weight gest sex, robust

Slide 19: Measures of influence
Measure change in:
– Outcome (y)
– Deviance
– Coefficients (beta)
Delta beta, Cook's distance
Remove obs 1, see change; remove obs 2, see change

Slide 20: Points with high influence
  lvr2plot, mlabel(id)

Slide 21: Added variable plot: gestational age
  avplot gest, mlabel(id)

Slide 22: Removing outlier

Slide 23: Influence

Slide 24: Final model
Give meaning to constant term:
  sum gest                       /* find smallest value */
  generate gest2 = gest-204      /* smallest gest=204 */
  generate sex2 = sex-1          /* boys=0, girls=1 */
  regress weight gest2 sex2      /* final model */
  estimates store m4
After this recoding the constant is the expected weight for a boy (sex2=0) at the smallest gestational age (gest2=0).

Slide 25: Logistic regression
Being bullied

Slide 26: Model and assumptions
Model:
Assumptions:
– Independent residuals
– Linear effects

Slide 27: Association measure, Odds ratio
Model:
Start with:
Hence:
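The equations for this slide are missing from the transcript. A sketch of the standard derivation, assuming a logistic model for the probability mu with a binary exposure x_1 and one covariate x_2 (the symbols are assumptions):

```latex
\begin{align*}
\text{Model:} \quad & \ln\!\left(\frac{\mu}{1-\mu}\right) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 \\
\text{Start with:} \quad & \text{odds}(x_1, x_2) = \frac{\mu}{1-\mu} = e^{\beta_0 + \beta_1 x_1 + \beta_2 x_2} \\
\text{Hence:} \quad & OR = \frac{\text{odds}(x_1 = 1, x_2)}{\text{odds}(x_1 = 0, x_2)}
                      = \frac{e^{\beta_0 + \beta_1 + \beta_2 x_2}}{e^{\beta_0 + \beta_2 x_2}} = e^{\beta_1}
\end{align*}
```

This is why exponentiated logistic coefficients are read as odds ratios.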

Slide 28: Syntax
Estimation
  logistic y x1 x2           logistic regression
  xi: logistic y x1 i.c1     categorical c1
Post estimation
  predict yf, pr             predict probability
Manage models
  estimates store m1         save model
  est table m1, eform        show OR

Slide 29: Workflow
– Bivariate analysis
– Regression
  – Model fitting
    – Cofactors in/out
    – Interactions
  – Test of assumptions
    – Independent errors
    – Linear effects
  – Influence (robustness)

Slide 30: Bivariate
Generate dummies
  gen Island  = (country==2) if country<.
  gen Norway  = (country==3) if country<.
  gen Finland = (country==4) if country<.
  gen Denmark = (country==5) if country<.

Slide 31: Model 1: outcome and exposure
Alternative commands:
  xi:logistic bullied i.country                         use xi: i.var for categorical variables
  xi:logistic bullied i.country, coef                   coefs instead of ORs
  xi:logistic bullied i.country if sex!=. & age!=.      do if sex and age not missing

Slide 32: Model 2: Add confounders
Estimate associations: m1 = m2
Predict: m2 best

Slide 33: Interaction
Model:
Start with:
Hence:

Slide 34: Model 3: interaction

Slide 35: Test of assumptions
Linear effects (of age)
  findit lincheck                                    search and install
  lincheck xi:logistic bullied age i.country sex

Slide 36: Points with high influence
  estimates restore m2     restore best model
  predict p, p             probability (mu in our notation)
  predict db, db           delta-beta (one value, not one per estimate)
  scatter db p             delta-beta plot

Slide 37: Removing 2 observations
Conclusion: Robust results

Slide 38: Generalized Linear Models
Being bullied

Slide 39: Designs and measures
  Models      Measures
  GLM         RR, RD, OR
  Survival    Rate Ratio

Slide 40: Generalized Linear Models, GLM
– Linear regression
– Logistic regression
– Poisson regression

Slide 41: GLM: Distribution and link
Distribution family (Normal, Binomial, Poisson)
– Given by data
– Influences p-value, CI
Link function
– May choose
– Shape (= link^-1)
– Scale
– Association measure
  Link       Identity    Logit             Log
  Scale      Additive    Multiplicative    Multiplicative
  Measure    RD          OR                RR

Slide 42: Distribution and link examples
Link: Identity → linear model → additive scale
Note: not for traditional case-control data

Slide 43: Being bullied, 3 models
  glm bullied Island Norway Finland Denmark sex age, family(binomial) link(logit)
  glm bullied Island Norway Finland Denmark sex age, family(binomial) link(log)
  glm bullied Island Norway Finland Denmark sex age, family(binomial) link(identity)

Slide 44: Convergence problems
If glm does not converge, use:
  poisson y x1 x2, irr robust     RR
  regress y x1 x2, robust         RD
Stop

Slide 45: Association measure, RR
Model:
Start with:
Hence:
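The equations for this slide are missing from the transcript. A sketch of the standard derivation, assuming a log-link model for the probability mu with a binary exposure x_1 and one covariate x_2 (the symbols are assumptions):

```latex
\begin{align*}
\text{Model:} \quad & \ln(\mu) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 \\
\text{Start with:} \quad & \mu(x_1, x_2) = e^{\beta_0 + \beta_1 x_1 + \beta_2 x_2} \\
\text{Hence:} \quad & RR = \frac{\mu(x_1 = 1, x_2)}{\mu(x_1 = 0, x_2)}
                      = \frac{e^{\beta_0 + \beta_1 + \beta_2 x_2}}{e^{\beta_0 + \beta_2 x_2}} = e^{\beta_1}
\end{align*}
```

The structure mirrors the odds-ratio case: the log link acts on the risk itself rather than on the odds, so the exponentiated coefficient is a risk ratio.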

Slide 46: Association measure: RD
Model:
Start with:
Hence:

Slide 47: The importance of scale
Additive scale (RD): absolute increase
– Females: 30-20=10
– Males: 20-10=10
– Conclusion: same increase for males and females
Multiplicative scale (RR): relative increase
– Females: 30/20=1.5
– Males: 20/10=2.0
– Conclusion: larger increase for males

Slide 48: Conditional logistic regression
For matched case-control data

Slide 49: Truths and Misconceptions
Cohort studies
– Exposed and unexposed should be as similar as possible, except for exposure
– Matching removes confounding
Case-control studies
– Cases and controls should be as similar as possible, except for disease
– Matching removes confounding
[Figure labels: Exposed, Unexposed; Diseased/Cases, Healthy/Controls]

Slide 50: Matching and analysis
Unmatched (age)
– Ordinary model
– May adjust for age
– May interpret age effect
Frequency matched (age)
– Ordinary model
– Must adjust for age
– Cannot interpret age effect
One-one matched (age)
– Conditional model
– No effect measure for age

Slide 51: Data preparation
Save as tab-delimited in Excel
Read and fix in Stata
  insheet using "file.txt", clear
  mvdecode m*, mv(9)
  gsort id -cc

Slide 52: Syntax
Estimation
  clogit y x1 x2, group(id)        conditional logistic
  clogit y x1 x2, group(id) or     OR instead of coef
Post estimation
  predict yf, pc1                  predict probability
Manage models
  estimates store m1               save model

Slide 53: Bivariate analysis
Loop through all variables
  foreach var of varlist m* {
      quietly: clogit cc `var', group(id) or
      est store `var'
  }
Show results

Slide 54: Multivariable analysis
Stepwise
  stepwise, pe(0.25): clogit cc m2 m4 m5 m12 m13 m18, group(id) or
Final model:

Slide 55: Stata regression commands

Slide 56
Regression with simple error structure
  regress     linear regression (also heteroskedastic errors)
  nl          non-linear least squares
GLM
  logistic    logistic regression
  poisson     Poisson regression
  binreg      binary outcome, OR, RR, or RD effect measures
Conditional logistic
  clogit      for matched case-control data
Multiple outcome
  mlogit      multinomial logit (not ordered)
  ologit      ordered logit
Regression with complex error structure
  xtmixed     linear mixed models
  xtlogit     random effect logistic

Slide 57: Syntax
Estimation
  regress y x1 x2               linear regression
  logistic y x1 x2              logistic regression
  xi:regress y x1 i.x2          categorical x2
Manage results
  estimates store m1            store results
  estimates table m1 m2         table of results
  estimates stats m1 m2         statistics of results
Post estimation
  predict y, xb                 linear prediction
  predict res, resid            residuals
  lincom b0+2*b3                linear combination
Help
  help logistic postestimation