Standard Statistical analysis Linear-, logistic- and Cox-regression

Slides:



Advertisements
Similar presentations
Apr-15H.S.1 Stata: Linear Regression Stata 3, linear regression Hein Stigum Presentation, data and programs at: courses.
Advertisements

Qualitative predictor variables
© Department of Statistics 2012 STATS 330 Lecture 32: Slide 1 Stats 330: Lecture 32.
Logistic Regression Psy 524 Ainsworth.
Logistic Regression I Outline Introduction to maximum likelihood estimation (MLE) Introduction to Generalized Linear Models The simplest logistic regression.
/k 2DS00 Statistics 1 for Chemical Engineering lecture 4.
Section VIII Simple Linear Regression & Correlation.
LINEAR REGRESSION: Evaluating Regression Models. Overview Assumptions for Linear Regression Evaluating a Regression Model.
Introduction to Logistic Regression. Simple linear regression Table 1 Age and systolic blood pressure (SBP) among 33 adult women.
Log-linear and logistic models
Linear Regression MARE 250 Dr. Jason Turner.
Jul-15H.S.1 Stata 3, Regression Hein Stigum Presentation, data and programs at:
Jul-15H.S.1 Short overview of statistical methods Hein Stigum Presentation, data and programs at: courses.
Jul-15H.S.1 Linear Regression Hein Stigum Presentation, data and programs at:
MARE 250 Dr. Jason Turner Correlation & Linear Regression.
Basic Business Statistics, 11e © 2009 Prentice-Hall, Inc. Chap 15-1 Chapter 15 Multiple Regression Model Building Basic Business Statistics 11 th Edition.
A Longitudinal Study of Maternal Smoking During Pregnancy and Child Height Author 1 Author 2 Author 3.
Regression Model Building Setting: Possibly a large set of predictor variables (including interactions). Goal: Fit a parsimonious model that explains variation.
Generalized Linear Models
Regression and Correlation
Advantages of Multivariate Analysis Close resemblance to how the researcher thinks. Close resemblance to how the researcher thinks. Easy visualisation.
1 MULTI VARIATE VARIABLE n-th OBJECT m-th VARIABLE.
Simple Linear Regression
Biostatistics Case Studies 2015 Youngju Pak, PhD. Biostatistician Session 4: Regression Models and Multivariate Analyses.
ALISON BOWLING THE GENERAL LINEAR MODEL. ALTERNATIVE EXPRESSION OF THE MODEL.
Basics of Regression Analysis. Determination of three performance measures Estimation of the effect of each factor Explanation of the variability Forecasting.
Week 6: Model selection Overview Questions from last week Model selection in multivariable analysis -bivariate significance -interaction and confounding.
MBP1010H – Lecture 4: March 26, Multiple regression 2.Survival analysis Reading: Introduction to the Practice of Statistics: Chapters 2, 10 and 11.
Linear correlation and linear regression + summary of tests
 Muhamad Jantan & T. Ramayah School of Management, Universiti Sains Malaysia Data Analysis Using SPSS.
Overview of our study of the multiple linear regression model Regression models with more than one slope parameter.
1 STA 617 – Chp10 Models for matched pairs Summary  Describing categorical random variable – chapter 1  Poisson for count data  Binomial for binary.
A first order model with one binary and one quantitative predictor variable.
Introduction to Multiple Regression Lecture 11. The Multiple Regression Model Idea: Examine the linear relationship between 1 dependent (Y) & 2 or more.
1 Introduction to Modeling Beyond the Basics (Chapter 7)
Biostatistics Regression and Correlation Methods Class #10 April 4, 2000.
Bivariate analysis. * Bivariate analysis studies the relation between 2 variables while assuming that other factors (other associated variables) would.
Yandell – Econ 216 Chap 15-1 Chapter 15 Multiple Regression Model Building.
A radical view on plots in analysis
Chapter 15 Multiple Regression Model Building
Advanced Quantitative Techniques
Advanced Quantitative Techniques
REGRESSION G&W p
Statistical Data Analysis - Lecture /04/03
From t-test to multilevel analyses Del-2
CHAPTER 7 Linear Correlation & Regression Methods
Advanced Quantitative Techniques
B&A ; and REGRESSION - ANCOVA B&A ; and
Generalized Linear Models
Checking Regression Model Assumptions
A statistical package for epidemiologists
بحث في التحليل الاحصائي SPSS بعنوان :
I271B Quantitative Methods
Jeffrey E. Korte, PhD BMTRY 747: Foundations of Epidemiology II
Stats Club Marnie Brennan
Introduction to analysis DAGitty
Quantitative Methods What lies beyond?.
CHAPTER 29: Multiple Regression*
Checking Regression Model Assumptions
Constantly forgotten Hein Stigum Presentation, data and programs at:
Presentation, data and programs at:
What is Regression Analysis?
Stata 9, Summing up.
Presentation, data and programs at:
Regression diagnostics
Quantitative Methods What lies beyond?.
Scale, Causal Pies and Interaction 1h
Linear models in Epidemiology
Learning outcomes By the end of this session you should know about:
Nazmus Saquib, PhD Head of Research Sulaiman AlRajhi Colleges
Presentation transcript:

Standard Statistical analysis Linear-, logistic- and Cox-regression Hein Stigum Presentation, data and programs at: http://folk.uio.no/heins/ courses 2 h talk: Part 1: start->robustness Part 2: recap, robustness -> end Jan-19 H.S.

Goal of analysis Adjusting removes confounding and some selection bias Outcome How much: Mean, risk, rate Bi-variate analysis (crude or unadjusted) Exposure More among exposed: Diff. in mean, RR or OR Multivariable analysis (adjusted, regression) Adjusting removes confounding and some selection bias Jan-19 H.S.

Workflow DAG Scatter- and density plots Bivariate analysis Regression Model estimation Test of assumptions Independent errors Linear effects No interactions (Constant error variance) Influence E gest age D birth weight C2 education C1 sex Model fitting Adjust if cofactor is confounder, problems cofacor has missing, cofactor has error, cofactor is in causal path Test of assumptions Stata: Estimation remains in memory. Post estimation commands Normally do assumtions first, then influence. Note leverage value of outlier Jan-19 H.S.

Purpose of regression Estimation Prediction Estimate association between exposure and outcome adjusted for other covariates Prediction Use an estimated model to predict the outcome given covariates in a new dataset DAGs, bias, precision Predictive power, model fit, R2 Counfounding matter in the first Fit of the models matters in the last Jan-19 H.S.

Outcome and regression-types Numerical data Continuous (weight) Linear regression Count (partners) Poisson regression Categorical data Binary (0,1) Risk Linear-risk-, log-risk model Rate Cox, Poisson Odds logistic Ordinal (small,med,large) ologit Multinomial (m. status) mlogit 1/1/2019 H.S.

ANALYSIS 1, CONTINUOUS OUTCOME 01.01.2019 H.S.

Data: m:\pc\dokumenter\a_Courses\Master UiO\birth1.sav SPSS from Kiosk Data: m:\pc\dokumenter\a_Courses\Master UiO\birth1.sav 01.01.2019 H.S.

Table 1 Outcome: Birth weight Exposure: Gestational age Covariates: Jan-19 H.S.

Table 2 Outcome: Birth weight Exposure: Gestational age Adjusted for: Education and sex Change to crude and adjusted models Jan-19 H.S.

DAG: Gestational age and Birthweight C2 education C1 sex E gest age D birth weight Birth weight analysis Continuous outcome Plots by gestational age Compare means Linear regression Jan-19 H.S.

Regression idea Jan-19 H.S. X=gest, Y=weigth straigth line: y=constant +slope times x beta0=constant or intercept, beta1=reg. coef. Or slope, y=dep. continous, x=.. Jan-19 H.S.

Model, measure and assumptions Association measure b1 = change in y for one unit increase in x1 Assumptions Independent errors Linear effects Constant error variance Influence In GLM b-s and betas are coefficients In some fields (and in SPSS) b-s are coefs and betas are standardizes coefs Jan-19 H.S.

Outcome distributions by exposure Linear regression cutoff, logistic regression Linear regression or Log-transform, linear regression Jan-19 H.S.

Scatter and density plots Distribution of birth weight for low/high gestational age Scatter of birth weight by gestational age Look for deviations from linearity and outliers Look for shift in shape Jan-19 H.S.

Bi-variate Weight by sex (continuous by binary) ttest bw, by(sex) t-test Weight by education (continuous by categorical-3) anova bw, by(educ) one way anova Weight by gest. age (continuous by continuous) regress bw gest regression ttest bw, by(gest2) cut in 2, t-test Jan-19 H.S.

Bi-variate result Jan-19 H.S.

Model 1: outcome+exposure regress bw gest crude model estimates store m1 store model results Jan-19 H.S.

Model 2 and 3: Add covariates regress bw gest i.educ sex add covariates estimates table m1 m2 m3 compare coefs Estimate association: m1 is biased, m2=m3 m3 more precise? m2: se(gest)=0.934 m3: se(gest)=0.926 m2: se(gest)=0.934 m3: se(gest)=0.926 estimates stats m1 m2 m3 compare fit Prediction: m3 is best Jan-19 H.S.

Results so far Westreich & Greenland, The table 2 fallacy, 2013 Jan-19 H.S.

Assumptions Jan-19 H.S.

Test of assumptions Assumptions Plot (Test) Independent residuals: Linear effects: Constant variance: No interactions Plot (Test) discuss plot residuals versus predicted y predict res, residuals predict pred, xb scatter res pred Dependent residuals if many children from same mother estat hettest p=0.2 no heteroskedasticity Jan-19 H.S.

Violations of assumptions Dependent residuals robust(cluster) or mixed model Non linear effects Add square term or spline Non-constant variance Use robust variance estimation regress y x, robust Interactions Add product (interaction) terms Robust variance estimation with cluster variable: regress y x1 x2, vce(cluster village) Linear mixed models: xtmixed Robust variance: regress y x1 x2, robust Jan-19 H.S.

Influence Measures of influence Jan-19 H.S. Easier to understand influence when effects are linear and without interaction Influence Jan-19 H.S.

Influence idea (different data) delta beta*se=-6.8 Have taken N down to 1000 and moved outlier out to gest=400 Beta changes from 10 to 17 when removing influential outlier Est change=Delta beta*se(gest)=-3.5*2.1=-6.8 Influence=“leverage*residual” Jan-19 H.S.

Delta-beta for gestational age dfbeta(gest) create delta-beta scatter _dfbeta_1 id plot vs id-variable OBS, variable specific If obs nr 370 is removed, beta will change -0.7 se’s -0.7*0.9 = -0.6 gr 17.9-(-0.6)=18.5 gr Jan-19 H.S.

Removing outlier regress bw gest i.educ sex if id!=370 est store m4 est table m3 m4, b(%8.1f) Est change=delta-beta*se=-0.7*0.9=-0.6 Jan-19 H.S.

Removing outlier cont. Full model N=5000 Outlier removed N=4999 One outlier affected several estimates Final model Report only the effect of exposure Jan-19 H.S.

bw2 Non-linear effects Jan-19 H.S.

bw2: Non-linear effects scatter bw2 gest or scatter res pred Handle: add polynomial or spline Weak non-linearity in exposure: would care Weak non-linearity in confounder: may not care Jan-19 H.S.

Non-linear effects: polynomial regress bw2 c.gest##c.gest i.educ sex 2. order polynomial in gest margins, at(gest=(250(10)310)) predicted bw2 by gest marginsplot plot Could also ask for the effect of gest on bw2: Margins, dydx(gest) at(gest=(250(10)310)) Jan-19 H.S.

Non-linear effects: cubic spline Plot List of knots display mkspline g=gest, cubic nknots(4) make spline with 4 knots (g1,g2,g3) regress bw2 g1 g2 g3 i.educ sex regression with spline gen igest=5*round(gest/5) 5-year integer values of gest margins, over(igest) predicted bw by gest * marginsplot plot Package: findit postrcspline ssc hot author(Buis) Statistical Software Components * findit postrcspline Jan-19 H.S.

Non-linear effects: linear spline Plot (as previous) mkspline g1 280 g2=gest make linear spline with knot at 280 regress bw2 g1 g2 i.educ sex regression with spline Jan-19 H.S.

Interaction only linear effects bw3 Interaction only linear effects Jan-19 H.S.

Interaction definitions Interaction: combined effect of two variables Scale Linear models additive y=b0+b1x1+b2x2 both x1 and x2 = b1+b2 Logistic, Poisson, Cox multiplicative both x1 and x2 = OR1*OR2 Interaction deviation from additivity (multiplicativity)  effect of x1 depends on x2 Jan-19 H.S.

bw3: Interaction (only linear effects) Add interaction terms Show results regress bw3 c.gest##i.sex i.educ main + gest-sex interaction margins, dydx(gest) at(sex=0) effect of gest for boys margins, dydx(gest) at(sex=1) effect of gest for girls Jan-19 H.S.

Summing up DAG Scatter- and density plots Bivariate analysis Regression Model estimation Test of assumptions Independent errors Linear effects No interactions Constant error variance Influence E gest age D birth weight C2 education C1 sex Model fitting Adjust if cofactor is confounder, problems cofacor has missing, cofactor has error, cofactor is in causal path Test of assumptions Stata: Estimation remains in memory. Post estimation commands Normally do assumtions first, then influence. Note leverage value of outlier Jan-19 H.S.

Summing up 1 Build model Interaction Assumptions regress bw gest crude model est store m1 store regress bw gest i.educ sex full model est store m2 est table m1 m2 compare coefficients Interaction regress bw3 c.gest##i.sex i.educ test interaction margins, dydx(gest) at(sex=0) gest for boys Assumptions predict res, residuals residuals predict pred, xb predicted scatter res pred plot Jan-19 H.S.

Summing up 2 Non-linearity (linear spline) Robustness mkspline g1 280 g2=gest spline with knot at 280 regress bw2 g1 g2 i.educ sex regression with spline Robustness dfbeta(gest) delta-beta scatter _dfbeta_1 id plot versus id Jan-19 H.S.

ANALYSIS 2, BINARY OUTCOME SPSS from Kiosk Data: m:\pc\dokumenter\a_Courses\Master UiO\Smoking and Alzheimer.sav ANALYSIS 2, BINARY OUTCOME 01.01.2019 H.S.

Table 1 Outcome: Alzheimer Exposure: Smoking Covariates: Jan-19 H.S.

Table 2 Outcome: Alzheimer Exposure: Smoking Adjusted for: Age, Physical activity and Education Change to crude and adjusted models Jan-19 H.S.

Logistic model and assumptions Independent residuals Linear effects (on the log-odds scale) No interactions (on the multiplicative scale) Inverse link in the plots Mu=proportion with disease in each group Influence: use db Logistic.pdf: Natalia Sarkisian 1 January 2019 H.S.

Association measure, Odds ratio Model: Start with: OR: variable and two values OR for x1 (2 vs 1) Linear effect on log odds level Hence: 1 January 2019 H.S.

Short: need to know Assume Association measure Scale Linear effects on the log-odds scale Association measure OR=eb Scale Multiplicative exposed to both x1 and x2 : OR1*OR2 1 January 2019 H.S.

Purpose of regression Estimation Prediction Estimate association between exposure and outcome adjusted for other covariates Prediction Use an estimated model to predict the outcome given covariates in a new dataset DAGs, bias, precision Predictive power, model fit, R2 Counfounding matter in the first Fit of the models matters in the last Jan-19 H.S.

Workflow DAG Bivariate analysis Regression Model fitting Exposure + Confounders Test of assumptions Independent errors Linear effects (on the log odds scale) No interactions (on the multiplicative scale) Influence Model fitting Adjust if cofactor is confounder, problems cofacor has missing, cofactor has error, cofactor is in causal path Test of assumptions Stata: Estimation remains in memory. Post estimation commands 1 January 2019 H.S.

Go to syntax 1 January 2019 H.S.

The end 1 January 2019 H.S.