Jul-15 H.S.
Short overview of statistical methods
Hein Stigum
Presentation, data and programs at: courses
Agenda
  Concepts
  Bivariate analysis
    – Continuous symmetrical
    – Continuous skewed
    – Categorical
  Multivariable analysis
    – Linear regression
    – Logistic regression
  Outcome variable decides analysis
CONCEPTS
Precision and bias
  Measures of populations
    – Precision: random error (statistics)
    – Bias: systematic error (epidemiology)
  [Figure: true value and estimate, with precision and bias marked]
Precision: Estimation
  Population → Sample
  [Figure: estimate with confidence interval]
  95% confidence interval: 95% of repeated intervals will contain the true value
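The "repeated intervals" reading of a 95% confidence interval can be checked by simulation. The sketch below is in Python for illustration only (the slides use Stata). The true mean, standard deviation and sample size are hypothetical, and the interval uses the normal approximation (mean ± 1.96 standard errors).

```python
import random
import statistics

def ci_coverage(true_mean=3500.0, sd=500.0, n=50, repeats=2000, seed=1):
    """Fraction of repeated 95% intervals that contain the true mean."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(repeats):
        # Draw a new sample from the same (hypothetical) population
        sample = [rng.gauss(true_mean, sd) for _ in range(n)]
        mean = statistics.mean(sample)
        se = statistics.stdev(sample) / n ** 0.5
        # Normal-approximation interval: mean +/- 1.96 standard errors
        if mean - 1.96 * se <= true_mean <= mean + 1.96 * se:
            hits += 1
    return hits / repeats

coverage = ci_coverage()
print(f"share of intervals containing the true value: {coverage:.3f}")
```

The share should land close to 0.95; it tends to fall slightly below because 1.96 is used instead of the t quantile.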
Precision: Testing
  Population → Sample
  [Figure: estimates for group 1 and group 2]
  p-value = P(observing this difference or more, when the true difference is zero)
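One way to make the p-value definition concrete is a permutation test: shuffle the group labels (which forces the true difference to zero) and count how often the shuffled difference is at least as large as the observed one. This is a Python sketch for illustration, not the method used in the slides, and the function name and data are hypothetical.

```python
import random

def permutation_p_value(group1, group2, n_perm=2000, seed=1):
    """Two-sided permutation p-value for a difference in means."""
    rng = random.Random(seed)

    def abs_mean_diff(a, b):
        return abs(sum(a) / len(a) - sum(b) / len(b))

    observed = abs_mean_diff(group1, group2)
    pooled = list(group1) + list(group2)
    n1 = len(group1)
    extreme = 0
    for _ in range(n_perm):
        # Shuffling the pooled data mimics "true difference is zero"
        rng.shuffle(pooled)
        if abs_mean_diff(pooled[:n1], pooled[n1:]) >= observed - 1e-12:
            extreme += 1
    return extreme / n_perm

p = permutation_p_value([10, 11, 12, 13, 14, 15], [1, 2, 3, 4, 5, 6])
print(f"p-value: {p:.4f}")
```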
Precision: Significance level
  Birth weight, 500 newborns, observed difference
  H0: boys = girls
  Ha: boys ≠ girls
  [Figure: observed differences with their p-values; most values were lost in extraction, surviving values are 10 and 0.02]
  Significance level: p < 0.05
Precision: Test situations
  1 sample test:           Weight = 10
  2 independent samples:   Weight by sex
  K independent samples:   Weight by age groups
  2 dependent samples:     Weight last year = Weight today
Bias: DAGs
  [DAG: E = gest age, D = birth weight, C1 = sex, C2 = parity]
  Associations: Bivariate (unadjusted)
  Causal effects: Multivariable (adjusted)
  Draw your assumptions before your conclusions
WHY USE GRAPHS?
Problem example
  Lunch meals per week
    – Table of means (around 5 per week)
    – Linear regression
Problem example 2
  Iron level by sex
    – Both linear and logistic regression
    – Opposite results
  [Figure: Iron level in blood]
Data types
  Categorical data
    – Nominal: married / single / divorced
    – Ordinal: small / medium / large
  Numerical data
    – Discrete: number of children
    – Continuous: weight
Outcome data type dictates type of analysis
BIVARIATE ANALYSIS 1
Continuous symmetric outcome: Birth weight
Distribution
  * Density plot of birth weight
  kdensity weight
  * Drop observations with weight below 2000, then replot
  drop if weight<2000
  kdensity weight
Central tendency and dispersion
  Mean and standard deviation:
  Mean with confidence interval:
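For a symmetric outcome, the summary measures named above can be computed directly. The Python sketch below is for illustration only. The birth weights are hypothetical, and the interval uses the large-sample value 1.96 (with few observations a t quantile would be more appropriate).

```python
import statistics

def mean_with_ci(values, z=1.96):
    """Return mean, standard deviation and an approximate 95% CI for the mean."""
    n = len(values)
    mean = statistics.mean(values)
    sd = statistics.stdev(values)   # sample standard deviation (n - 1)
    se = sd / n ** 0.5              # standard error of the mean
    return mean, sd, (mean - z * se, mean + z * se)

weights = [3000, 3400, 3600, 3800, 4200]   # hypothetical birth weights (grams)
mean, sd, (low, high) = mean_with_ci(weights)
print(f"mean {mean:.0f}, sd {sd:.1f}, 95% CI ({low:.0f}, {high:.0f})")
```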
Compare groups, equal variance?
  [Figure panels: Equal, Not equal]
2 independent samples
  Are birth weights the same for boys and girls?
  [Figures: Scatterplot, Density plot]
2 independent samples test
  ttest weight, by(sex)              equal variances assumed
  ttest weight, by(sex) unequal      unequal variances
  ttest var1==var2                   paired test
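To show what the unequal-variance version of the two-sample t-test computes, here is a Python sketch of the Welch statistic and its approximate degrees of freedom. It is an illustration with hypothetical data, not a replacement for the Stata output. The p-value helper uses the normal approximation, which is an assumption that is reasonable only for large samples.

```python
import statistics
from statistics import NormalDist

def welch_t(a, b):
    """Welch t statistic and Welch-Satterthwaite degrees of freedom."""
    va = statistics.variance(a) / len(a)   # squared standard error, group a
    vb = statistics.variance(b) / len(b)   # squared standard error, group b
    t = (statistics.mean(a) - statistics.mean(b)) / (va + vb) ** 0.5
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df

def large_sample_p(t):
    """Two-sided p-value from the normal approximation (large samples only)."""
    return 2 * (1 - NormalDist().cdf(abs(t)))

t, df = welch_t([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])   # hypothetical data
print(f"t = {t:.3f}, df = {df:.2f}")
```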
K independent samples
  Is birth weight the same over parity?
  [Figures: Scatterplot, Density plot]
K independent samples test
  Equal means?
  Equal variances?
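The "equal means?" question for K groups is usually answered with a one-way ANOVA F statistic, the ratio of between-group to within-group variation. The Python sketch below illustrates the arithmetic on hypothetical groups. It does not compute a p-value, which requires the F distribution.

```python
def anova_f(groups):
    """One-way ANOVA: F statistic with between and within degrees of freedom."""
    all_values = [x for g in groups for x in g]
    grand_mean = sum(all_values) / len(all_values)
    means = [sum(g) / len(g) for g in groups]
    # Variation of group means around the grand mean
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of observations around their own group mean
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within

f, df1, df2 = anova_f([[1, 2, 3], [2, 3, 4], [3, 4, 5]])   # hypothetical groups
print(f"F({df1}, {df2}) = {f:.2f}")
```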
Continuous by continuous
  Does birth weight depend on gestational age?
  [Figures: Scatterplot; Scatterplot, outlier dropped]
Continuous by continuous tests
  Cut gestational age into groups, then use t-test or ANOVA
  or
  Use linear regression with 1 covariate
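The second option, linear regression with one covariate, reduces to a least-squares line. A minimal Python sketch with hypothetical gestational ages (days) and birth weights (grams) follows. The slope is read as the expected change in weight per extra day.

```python
def simple_ols(x, y):
    """Least-squares intercept and slope for a single covariate."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

gest = [260, 270, 280, 290]        # hypothetical gestational age in days
weight = [2950, 3250, 3350, 3650]  # hypothetical birth weight in grams
intercept, slope = simple_ols(gest, weight)
print(f"weight = {intercept:.0f} + {slope:.0f} * gest")
```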
Test situations
  1 sample test:                    ttest weight == 10
  2 independent samples:            ttest weight, by(sex)
  K independent samples:            oneway weight parity
  2 dependent samples (paired):     ttest weight_last_year == weight_today
BIVARIATE ANALYSIS 2
Continuous skewed outcome: Number of sexual partners
Distribution
  kdensity partners if partners<=50
Central tendency and dispersion
  Median and percentiles:
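For a skewed outcome the median and percentiles describe the data better than the mean, which is pulled toward the long tail. The Python sketch below uses hypothetical partner counts. Note that `statistics.quantiles` uses one of several common percentile definitions, so other software may give slightly different quartiles.

```python
import statistics

partners = [0, 1, 1, 2, 2, 3, 4, 5, 10, 40]   # hypothetical, right-skewed

median = statistics.median(partners)
q1, q2, q3 = statistics.quantiles(partners, n=4)   # 25th, 50th, 75th percentiles
mean = statistics.mean(partners)

print(f"median {median}, quartiles ({q1}, {q3}), mean {mean}")
```

The single large value moves the mean well above the median, while the median and quartiles stay with the bulk of the data.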
2 independent samples
  Do males and females have the same number of partners?
  [Figures: Scatterplot, Density plot]
2 independent samples test
  Equal medians?
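A common nonparametric choice for comparing two independent groups is the Wilcoxon rank-sum (Mann-Whitney) test. The slide does not say which test was run, so treating it as this one is an assumption. The Python sketch below counts, over all pairs, how often a value from the first group exceeds one from the second, and standardises the count with a normal approximation that ignores the tie correction.

```python
def mann_whitney_u(a, b):
    """U statistic for sample a: pairs where a exceeds b; ties count one half."""
    u = 0.0
    for x in a:
        for y in b:
            if x > y:
                u += 1.0
            elif x == y:
                u += 0.5
    return u

def normal_approx_z(u, n1, n2):
    """Standardised U under the null hypothesis (no tie correction)."""
    mean_u = n1 * n2 / 2
    sd_u = (n1 * n2 * (n1 + n2 + 1) / 12) ** 0.5
    return (u - mean_u) / sd_u

females = [1, 2, 3]   # hypothetical partner counts
males = [4, 5, 6]     # hypothetical partner counts
u = mann_whitney_u(females, males)
z = normal_approx_z(u, len(females), len(males))
print(f"U = {u}, z = {z:.3f}")
```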
K independent samples
  Do partners vary with age?
  [Figures: Scatterplot, Density plot]
K independent samples test
  Equal medians?
Table of descriptives
Table of tests
  Categorical ordered: use nonparametric tests
  If N is large: may use parametric tests
  Remarks: If unequal variance in ANOVA, use linear regression with robust variance estimation
BIVARIATE ANALYSIS 3
Categorical outcome: Being bullied
Frequency and proportion
  Frequency:
  Proportion with CI:
Proportion, confidence interval
  x = "disease", n = total number
  proportion: $\hat{p} = x/n$
  standard error: $se(\hat{p}) = \sqrt{\hat{p}(1-\hat{p})/n}$
  confidence interval: $\hat{p} \pm 1.96 \cdot se(\hat{p})$
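A worked example of a proportion with its standard error and 95% confidence interval, as a Python sketch. It assumes the usual normal-approximation (Wald) interval, and the counts are hypothetical.

```python
def proportion_ci(x, n, z=1.96):
    """Proportion, standard error and normal-approximation 95% CI."""
    p = x / n
    se = (p * (1 - p) / n) ** 0.5
    return p, se, (p - z * se, p + z * se)

# Hypothetical counts: 17 of 100 children report being bullied
p, se, (low, high) = proportion_ci(17, 100)
print(f"p = {p:.2f}, se = {se:.4f}, 95% CI ({low:.3f}, {high:.3f})")
```

The approximation works poorly when x or n - x is small; exact or score-based intervals are the usual alternative in that case.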
Crosstables
  Are boys bullied as much as girls?
  Equal proportions?
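The "equal proportions?" question in a crosstable is typically tested with the Pearson chi-square statistic, which compares observed counts with the counts expected if rows and columns were independent. The Python sketch below uses a hypothetical 2x2 table. The p-value helper is valid for 1 degree of freedom only, where the chi-square is the square of a standard normal.

```python
from statistics import NormalDist

def chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return chi2, df

def p_value_df1(chi2):
    """p-value for a chi-square with exactly 1 degree of freedom."""
    return 2 * (1 - NormalDist().cdf(chi2 ** 0.5))

# Hypothetical counts: rows = boys, girls; columns = bullied, not bullied
chi2, df = chi_square([[10, 20], [20, 10]])
p = p_value_df1(chi2)
print(f"chi2 = {chi2:.3f}, df = {df}, p = {p:.4f}")
```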
Ordered categories, trend
  Trend?
  Equal proportions?
Table of tests
  Categorical ordered: use nonparametric tests
  If N is large: may use parametric tests
  Remarks: If unequal variance in ANOVA, use linear regression with robust variance estimation
MULTIVARIABLE ANALYSIS 1
Continuous outcome: Linear regression, Birth weight
Regression idea
Model and assumptions
  Model: $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \varepsilon$
  Association measure: $\beta_1$ = increase in y for one unit increase in $x_1$
  Assumptions
    – Independent errors
    – Linear effects
    – Constant error variance
  Robustness
    – Influence
Workflow
  DAG
    [DAG: E = gest age, D = birth weight, C1 = sex, C2 = parity]
  Scatterplots
  Bivariate analysis
  Regression
    – Model estimation
    – Test of assumptions: independent errors, linear effects, constant error variance
    – Robustness: influence
Categorical covariates
  2 categories
    – OK
  3+ categories
    – Use “dummies”
  “Dummies” are 0/1 variables used to create contrasts
  Want 3 categories for parity: 0, 1 and 2-7 children
    – Choose 0 as reference
    – Make dummies for the two other categories
  generate Parity1 =(parity==1) if parity<.
  generate Parity2_7 =(parity>=2) if parity<.
Create meaningful constant
  Expected birth weight at:
    – gest=0, sex=0, parity=0: not meaningful
    – gest=280, sex=1, parity=0: meaningful
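Centering a covariate is one way to get a meaningful constant: subtracting 280 from gestational age moves the intercept from "expected weight at 0 days" to "expected weight at 280 days" without changing the slope. The Python sketch below shows this with one covariate and hypothetical data. The slides' model has more covariates, so this is a simplified illustration.

```python
def fit_line(x, y):
    """Least-squares intercept and slope for a single covariate."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

gest = [260, 270, 280, 290, 300]         # hypothetical gestational age (days)
weight = [3000, 3200, 3400, 3600, 3800]  # hypothetical birth weight (grams)

raw_intercept, raw_slope = fit_line(gest, weight)
centered_intercept, centered_slope = fit_line([g - 280 for g in gest], weight)

print(f"constant at gest=0:   {raw_intercept:.0f}")       # negative weight, not meaningful
print(f"constant at gest=280: {centered_intercept:.0f}")  # expected weight at 280 days
```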
Model estimation
Test of assumptions
  Plot residuals versus predicted y
    – Independent residuals?
    – Linear effects?
    – Constant variance?
Violations of assumptions
  Dependent residuals: use mixed models or GEE
  Non-linear effects: add square term
  Non-constant variance: use robust variance estimation
Influence
Measures of influence
  Measure change in:
    – Predicted outcome
    – Deviance
    – Coefficients (beta)
  Delta beta
    – Remove obs 1, see change
    – Remove obs 2, see change
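The delta-beta idea (remove one observation at a time and see how the coefficient changes) can be sketched for a one-covariate regression. This Python example is an illustration with hypothetical data. It reports the raw change in slope, whereas statistical packages often report a standardised version.

```python
def slope(x, y):
    """Least-squares slope for a single covariate."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return sxy / sxx

def delta_betas(x, y):
    """Change in slope when each observation is removed in turn."""
    full = slope(x, y)
    changes = []
    for i in range(len(x)):
        x_without = x[:i] + x[i + 1:]
        y_without = y[:i] + y[i + 1:]
        changes.append(full - slope(x_without, y_without))
    return changes

x = [1, 2, 3, 4, 10]   # hypothetical covariate, last point is extreme
y = [1, 2, 3, 4, 30]   # hypothetical outcome, last point is off the line
changes = delta_betas(x, y)
most_influential = max(range(len(changes)), key=lambda i: abs(changes[i]))
print(f"most influential observation: index {most_influential}")
```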
Delta beta for gestational age
  If observation 539 is removed, beta will change from 6 to 16
Removing outlier
  [Output panels: Full model; Outlier removed; Final model]
  One outlier affected two estimates
MULTIVARIABLE ANALYSIS 2
Binary outcome: Logistic regression, Being bullied
Ordered categories and model
  Categories    Regression model
  2             Logistic
  3-7           Ordinal logistic
  >7            Linear (treat as interval)
  Interval versus ordered scale:
    – Interval scale: 1, 2, 3
    – Ordered scale: low, medium, high
Logistic model and assumptions
  Association measure: odds ratio for y per 1 unit increase in $x_1$
  Assumptions
    – Independent errors
    – Linear effects on the log odds scale
  Robustness
    – Influence
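The model equation did not survive on this slide. As a reminder, the standard textbook form of the logistic model is sketched below; this is assumed, not confirmed, to match what the slide showed. Here $\pi = P(y = 1)$, and the odds ratio for a one-unit increase in $x_1$ is the exponentiated coefficient.

```latex
\ln\left(\frac{\pi}{1-\pi}\right) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots
\qquad
OR_1 = e^{\beta_1}
```

"Linear effects on the log odds scale" refers to the right-hand side of this equation being linear in the covariates.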
Being bullied
  We want the total effect of country on being bullied.
    – The risk of being bullied depends on age and sex.
    – The age and sex distribution may differ between countries.
  Should we adjust for age and sex?
  [DAG: E = country, D = bullied, C1 = age, C2 = sex]
  No, age and sex are mediating variables
Logistic: being bullied
  OR ≈ RR if the outcome is rare
  OR > RR (further from 1) if the outcome is common
  Prevalence of being bullied = 17%
  Roughly:
    – Same risk of being bullied in Iceland as in Sweden.
    – 2 times the risk in Norway as in Sweden.
    – 3 times the risk in Finland as in Sweden.
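The relation between odds ratio and risk ratio can be checked numerically. The Python sketch below uses hypothetical risks (not the estimates from the slides): with a rare outcome the two measures nearly coincide, and with a common outcome the odds ratio lies further from 1.

```python
def odds(p):
    """Convert a risk (probability) to odds."""
    return p / (1 - p)

def rr_and_or(risk_exposed, risk_reference):
    """Risk ratio and odds ratio for two groups."""
    rr = risk_exposed / risk_reference
    odds_ratio = odds(risk_exposed) / odds(risk_reference)
    return rr, odds_ratio

# Hypothetical risks; in both cases the exposed group has twice the risk
rr_rare, or_rare = rr_and_or(0.02, 0.01)       # rare outcome
rr_common, or_common = rr_and_or(0.34, 0.17)   # common outcome

print(f"rare:   RR = {rr_rare:.2f}, OR = {or_rare:.2f}")
print(f"common: RR = {rr_common:.2f}, OR = {or_common:.2f}")
```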
Summing up
  DAGs
    – State prior knowledge. Guide analysis
  Plots
    – Linearity, variance, outliers
  Bivariate analysis
    – Continuous symmetrical: Mean, t-test, ANOVA
    – Continuous skewed: Median, nonparametric
    – Categorical: Freq, cross, chi-square
  Multivariable analysis
    – Continuous: Linear regression
    – Binary: Logistic regression