Published by Aubrie Doyle, modified over 9 years ago
Short overview of statistical methods
Hein Stigum (H.S.), Jul-15
Presentation, data and programs at: http://folk.uio.no/heins/ (courses)
Agenda
– Concepts
– Bivariate analysis
  – Continuous symmetrical
  – Continuous skewed
  – Categorical
– Multivariable analysis
  – Linear regression
  – Logistic regression
The outcome variable decides the analysis.
CONCEPTS
Precision and bias
When we estimate measures of populations:
– Precision: random error, the domain of statistics
– Bias: systematic error, the domain of epidemiology
[Figure: estimates around the true value, illustrating precision and bias]
Precision: Estimation
[Figure: a sample drawn from the population; the estimate shown with its confidence interval]
Estimate with a confidence interval.
95% confidence interval: if the study were repeated many times, 95% of the intervals would contain the true value.
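The "95% of repeated intervals" reading can be checked by simulation. The following is a minimal Python sketch (standard library only) and not part of the original course. The true mean, SD, sample size, seed, and the normal-approximation multiplier z = 1.96 are all assumptions chosen for illustration.

```python
import math
import random
import statistics


def ci_coverage(true_mean=3500.0, sd=500.0, n=30, reps=1000, z=1.96, seed=1):
    """Share of repeated 95% confidence intervals that contain the true mean.

    Each interval is mean +/- z * SE (normal approximation), computed from a
    fresh simulated sample of size n drawn from a normal population.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        sample = [rng.gauss(true_mean, sd) for _ in range(n)]
        m = statistics.mean(sample)
        se = statistics.stdev(sample) / math.sqrt(n)
        if m - z * se <= true_mean <= m + z * se:
            hits += 1
    return hits / reps


print(ci_coverage())  # close to 0.95; on average a little lower, since z is used instead of t
```

With small samples, a t multiplier instead of 1.96 brings the long-run coverage closer to exactly 95%.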
Precision: Testing
[Figure: a sample drawn from the population, comparing group 1 and group 2]
p-value = P(observing this difference or a larger one, when the true difference is zero)
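The p-value definition above can be written out for the simplest case. This minimal Python sketch assumes the test statistic (difference divided by its standard error) is approximately standard normal. That is a large-sample assumption; the t-test used later in the slides uses the t distribution instead.

```python
from statistics import NormalDist


def two_sided_p(difference, se):
    """P(|Z| >= |difference / se|) when the true difference is zero.

    Assumes the standardized difference is approximately standard normal.
    """
    z = abs(difference) / se
    return 2 * (1 - NormalDist().cdf(z))


# A difference of 1.96 standard errors gives p close to 0.05
print(round(two_sided_p(1.96, 1.0), 3))
```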
Precision: Significance level
Birth weight, 500 newborns. H0: boys = girls; Ha: boys ≠ girls.
Observed difference    p-value
10 g                   0.90
50 g                   0.40
100 g                  0.10
130 g                  0.04
150 g                  0.02
Significance level: reject H0 if p < 0.05.
Precision: Test situations
– 1 sample test: weight = 10
– 2 independent samples: weight by sex
– K independent samples: weight by age groups
– 2 dependent samples: weight last year = weight today
Bias: DAGs
[Figure: DAG with exposure E = gestational age, outcome D = birth weight, C1 = sex, C2 = parity]
– Associations: bivariate (unadjusted) analysis
– Causal effects: multivariable (adjusted) analysis
Draw your assumptions before your conclusions.
WHY USE GRAPHS?
Problem example: lunch meals per week
– Table of means (around 5 per week)
– Linear regression
Problem example 2: iron level by sex
– Analysed with both linear and logistic regression
– The two gave opposite results
[Figure: distribution of iron level in blood]
Datatypes
– Categorical data
  – Nominal: married / single / divorced
  – Ordinal: small / medium / large
– Numerical data
  – Discrete: number of children
  – Continuous: weight
The outcome data type dictates the type of analysis.
BIVARIATE ANALYSIS 1
Continuous symmetric outcome: birth weight
Distribution (Stata)
– All births: kdensity weight
– After dropping births under 2000 g: drop if weight<2000, then kdensity weight
Central tendency and dispersion
– Mean and standard deviation
– Mean with confidence interval
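The three numbers on this slide can be sketched in Python (standard library only). The birth weights below are made up, not the course data. The interval uses the large-sample multiplier 1.96, an assumption of this sketch; Stata's confidence interval for a mean uses the t distribution and is slightly wider for small samples.

```python
import math
import statistics


def mean_sd_ci(values, z=1.96):
    """Mean, standard deviation and a large-sample 95% CI for the mean."""
    n = len(values)
    mean = statistics.mean(values)
    sd = statistics.stdev(values)   # sample SD, divisor n - 1
    se = sd / math.sqrt(n)          # standard error of the mean
    return mean, sd, (mean - z * se, mean + z * se)


# Hypothetical birth weights in grams, not the course data
weights = [3100, 3500, 3400, 2900, 3700, 3300, 3600, 3200]
mean, sd, (lo, hi) = mean_sd_ci(weights)
print(f"mean={mean:.1f} sd={sd:.1f} 95% CI=({lo:.1f}, {hi:.1f})")
```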
Compare groups: equal variance?
[Figure: two panels, equal variance and not equal variance]
2 independent samples
Are birth weights the same for boys and girls?
[Figure: scatterplot and density plot of birth weight by sex]
2 independent samples test (Stata)
– Equal variances: ttest weight, by(sex)
– Unequal variances: ttest weight, by(sex) unequal
– Paired test: ttest var1 == var2
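This minimal Python sketch computes the t statistic behind the two independent-sample variants: the pooled-variance t-test and the Welch (unequal variance) t-test with Satterthwaite degrees of freedom. It is an illustration, not Stata's implementation. P-values would need the t distribution (for example from SciPy) and are left out. The data are hypothetical.

```python
import math
import statistics


def two_sample_t(x, y, equal_var=True):
    """Two-sample t statistic and degrees of freedom.

    equal_var=True  -> pooled-variance (Student) t-test
    equal_var=False -> Welch t-test with Satterthwaite degrees of freedom
    """
    nx, ny = len(x), len(y)
    mx, my = statistics.mean(x), statistics.mean(y)
    vx, vy = statistics.variance(x), statistics.variance(y)  # sample variances (n - 1)
    if equal_var:
        df = nx + ny - 2
        pooled = ((nx - 1) * vx + (ny - 1) * vy) / df
        se = math.sqrt(pooled * (1 / nx + 1 / ny))
    else:
        ax, ay = vx / nx, vy / ny
        se = math.sqrt(ax + ay)
        df = (ax + ay) ** 2 / (ax ** 2 / (nx - 1) + ay ** 2 / (ny - 1))
    return (mx - my) / se, df


boys = [3600, 3450, 3800, 3300, 3700]    # hypothetical birth weights (g)
girls = [3400, 3350, 3500, 3200, 3600]
print(two_sample_t(boys, girls))                   # pooled, like the equal-variance test
print(two_sample_t(boys, girls, equal_var=False))  # Welch, like the unequal option
```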
K independent samples
Is birth weight the same over parity?
[Figure: scatterplot and density plot of birth weight by parity]
K independent samples test
– Equal means? (one-way ANOVA)
– Equal variances?
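The equal-means question for K groups is answered by the one-way ANOVA F statistic, which compares between-group and within-group variation. This minimal Python sketch shows only the F statistic and its degrees of freedom, not a p-value. The parity groups are hypothetical.

```python
import statistics


def oneway_anova_f(groups):
    """One-way ANOVA F statistic and degrees of freedom for K independent groups."""
    all_values = [v for g in groups for v in g]
    grand_mean = statistics.mean(all_values)
    k, n = len(groups), len(all_values)
    # Between-group sum of squares: group means around the grand mean
    ss_between = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: observations around their own group mean
    ss_within = sum((v - statistics.mean(g)) ** 2 for g in groups for v in g)
    df_between, df_within = k - 1, n - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within


# Hypothetical birth weights (g) for parity 0, 1 and 2-7
print(oneway_anova_f([[3300, 3400, 3250], [3500, 3450, 3600], [3550, 3700, 3500]]))
```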
Continuous by continuous
Does birth weight depend on gestational age?
[Figure: scatterplot; scatterplot with outlier dropped]
Continuous by continuous tests
– Cut gestational age into groups, then use a t-test or ANOVA, or
– Use linear regression with one covariate
Test situations (Stata)
– 1 sample test: ttest weight == 10
– 2 independent samples: ttest weight, by(sex)
– K independent samples: oneway weight parity
– 2 dependent samples (paired): ttest weight_last_year == weight_today
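The paired test reduces to a one-sample t-test on the within-person differences. This minimal Python sketch shows that equivalence (t statistic and degrees of freedom only). The weights are hypothetical.

```python
import math
import statistics


def one_sample_t(values, mu0=0.0):
    """t statistic and degrees of freedom for H0: mean(values) == mu0."""
    n = len(values)
    se = statistics.stdev(values) / math.sqrt(n)
    return (statistics.mean(values) - mu0) / se, n - 1


def paired_t(var1, var2):
    """Paired t-test: a one-sample t-test of the differences var1 - var2 against 0."""
    return one_sample_t([a - b for a, b in zip(var1, var2)], 0.0)


# Hypothetical weights (kg) of the same four people, last year and today
weight_last_year = [70.0, 82.5, 64.0, 91.0]
weight_today = [71.5, 83.0, 66.0, 90.5]
print(paired_t(weight_last_year, weight_today))
```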
BIVARIATE ANALYSIS 2
Continuous skewed outcome: number of sexual partners
Distribution (Stata)
– kdensity partners if partners<=50
Central tendency and dispersion
– Median and percentiles
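For a skewed outcome, the median and percentiles describe the data better than the mean and SD. This Python sketch uses made-up partner counts. Python's `statistics.quantiles` uses its own default ("exclusive") interpolation, which may not match the percentile definition Stata uses, so small differences are expected.

```python
import statistics

# Hypothetical numbers of sexual partners, not the course data; note the long right tail
partners = [0, 1, 1, 2, 2, 3, 3, 4, 5, 8, 12, 40]

median = statistics.median(partners)
q1, q2, q3 = statistics.quantiles(partners, n=4)   # quartiles, default "exclusive" method
p10 = statistics.quantiles(partners, n=10)[0]      # 10th percentile
print(f"median={median} IQR=({q1}, {q3}) p10={p10} mean={statistics.mean(partners)}")
```

The single extreme value (40) pulls the mean well above the median, which is why the median is preferred here.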
2 independent samples
Do males and females have the same number of partners?
[Figure: scatterplot and density plot of partners by sex]
2 independent samples test
– Equal medians?
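One common nonparametric choice for two independent samples is the Wilcoxon rank-sum (Mann-Whitney) test; in Stata this is `ranksum`. The slide does not show which test was used, so this is an assumption. The minimal Python sketch below computes the U statistic from average ranks, handling ties, but omits the p-value. The partner counts are hypothetical.

```python
def average_ranks(values):
    """1-based ranks; tied values share the average of the ranks they occupy."""
    first, count = {}, {}
    for pos, v in enumerate(sorted(values), start=1):
        first.setdefault(v, pos)
        count[v] = count.get(v, 0) + 1
    return [first[v] + (count[v] - 1) / 2 for v in values]


def rank_sum_u(x, y):
    """Mann-Whitney U for sample x: rank sum of x minus its smallest possible value."""
    ranks = average_ranks(list(x) + list(y))
    rank_sum_x = sum(ranks[: len(x)])
    return rank_sum_x - len(x) * (len(x) + 1) / 2


# Hypothetical partner counts; U lies between 0 and len(x) * len(y),
# and is near len(x) * len(y) / 2 when the groups are similar
males = [2, 5, 8, 12, 30]
females = [1, 2, 3, 4, 6]
print(rank_sum_u(males, females))
```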
K independent samples
Do partners vary with age?
[Figure: scatterplot and density plot of partners by age group]
K independent samples test
– Equal medians?
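For K groups, the usual rank-based counterpart of one-way ANOVA is the Kruskal-Wallis test; in Stata this is `kwallis`. The slide does not name the test, so this is an assumption. The minimal Python sketch below computes the H statistic without the tie correction that statistical packages normally apply.

```python
def average_ranks(values):
    """1-based ranks; tied values share the average of the ranks they occupy."""
    first, count = {}, {}
    for pos, v in enumerate(sorted(values), start=1):
        first.setdefault(v, pos)
        count[v] = count.get(v, 0) + 1
    return [first[v] + (count[v] - 1) / 2 for v in values]


def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic for K independent groups (no tie correction)."""
    pooled = [v for g in groups for v in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    total, start = 0.0, 0
    for g in groups:
        group_ranks = ranks[start:start + len(g)]
        total += sum(group_ranks) ** 2 / len(g)   # R_j^2 / n_j
        start += len(g)
    return 12 / (n * (n + 1)) * total - 3 * (n + 1)


# Hypothetical partner counts in three age groups
print(kruskal_wallis_h([[1, 2, 2, 3], [2, 4, 5, 6], [3, 6, 9, 15]]))
```

Under H0, H is approximately chi-square distributed with K - 1 degrees of freedom.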
Table of descriptives
Table of tests
– Ordered categorical outcome: use nonparametric tests.
– If N is large: parametric tests may be used.
Remark: if variances are unequal in ANOVA, use linear regression with robust variance estimation.
BIVARIATE ANALYSIS 3
Categorical outcome: being bullied
Frequency and proportion
– Frequency
– Proportion with CI
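Proportion, confidence interval
– Proportion: $\hat{p} = x/n$
– Standard error: $SE = \sqrt{\hat{p}(1-\hat{p})/n}$
– 95% confidence interval: $\hat{p} \pm 1.96 \cdot SE$
where x = number with "disease" and n = total number.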
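The proportion, standard error and interval can be computed directly. This Python sketch implements the normal-approximation (Wald) interval with made-up counts chosen to give a 17% prevalence. Other interval methods, such as exact or logit-based intervals, give somewhat different limits. Which one Stata reports depends on the command and options.

```python
import math


def proportion_ci(x, n, z=1.96):
    """Proportion x/n, its standard error and a Wald (normal-approximation) 95% CI."""
    p = x / n
    se = math.sqrt(p * (1 - p) / n)
    return p, se, (p - z * se, p + z * se)


# Hypothetical counts: 170 of 1000 pupils report being bullied
p, se, (lo, hi) = proportion_ci(170, 1000)
print(f"p={p:.3f} se={se:.4f} 95% CI=({lo:.3f}, {hi:.3f})")
```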
Crosstables
Are boys bullied as much as girls? Equal proportions?
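Equal proportions in a crosstable are usually tested with Pearson's chi-square; in Stata, via `tabulate` with the `chi2` option. This minimal Python sketch computes the statistic and degrees of freedom for any r x c table of counts. The table is hypothetical.

```python
def pearson_chi2(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n  # expected count if proportions are equal
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(col_totals) - 1)
    return chi2, df


# Hypothetical counts: rows = boys, girls; columns = bullied, not bullied
print(pearson_chi2([[90, 410], [80, 420]]))
```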
Ordered categories, trend
– Equal proportions?
– Trend?
MULTIVARIABLE ANALYSIS 1
Continuous outcome: linear regression, birth weight
Regression idea
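Model and assumptions
Model: $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \varepsilon$
Association measure: $\beta_1$ = increase in y for a one-unit increase in $x_1$, with the other covariates held fixed
Assumptions
– Independent errors
– Linear effects
– Constant error variance
Robustness
– Influence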
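To make the model concrete, this minimal Python sketch estimates the coefficients by solving the normal equations with Gauss-Jordan elimination. Statistical packages use numerically safer methods, so treat this as an illustration only. The data are hypothetical and generated exactly from y = 1 + 2·x1 + 3·x2.

```python
def ols(X, y):
    """Least-squares coefficients b for y = X b.

    X is a list of rows; include a leading 1 in each row for the constant.
    Solves the normal equations (X'X) b = X'y by Gauss-Jordan elimination
    with partial pivoting: fine for a few covariates, not a production solver.
    """
    p = len(X[0])
    # Augmented matrix [X'X | X'y], one row per coefficient
    a = [
        [sum(row[i] * row[j] for row in X) for j in range(p)]
        + [sum(row[i] * yi for row, yi in zip(X, y))]
        for i in range(p)
    ]
    for col in range(p):
        # Move the row with the largest pivot into place
        pivot = max(range(col, p), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        # Eliminate this column from every other row
        for r in range(p):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [v - factor * w for v, w in zip(a[r], a[col])]
    return [a[i][p] / a[i][i] for i in range(p)]


# Hypothetical data generated exactly as y = 1 + 2*x1 + 3*x2
X = [[1, 0, 0], [1, 1, 0], [1, 0, 1], [1, 1, 1], [1, 2, 1]]
y = [1, 3, 4, 6, 8]
b0, b1, b2 = ols(X, y)
print(b0, b1, b2)   # approximately 1, 2, 3
```

Here b1 = 2 means the predicted y rises by 2 for each one-unit increase in x1 when x2 is held fixed.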
Workflow
– DAG
– Scatterplots
– Bivariate analysis
– Regression
  – Model estimation
  – Test of assumptions: independent errors, linear effects, constant error variance
  – Robustness: influence
[Figure: DAG with E = gestational age, D = birth weight, C1 = sex, C2 = parity]
Categorical covariates
– 2 categories: OK
– 3+ categories: use "dummies"
"Dummies" are 0/1 variables used to create contrasts.
We want 3 categories for parity: 0, 1 and 2-7 children.
– Choose 0 as the reference
– Make dummies for the two other categories (Stata):
    generate Parity1 = (parity==1) if parity<.
    generate Parity2_7 = (parity>=2) if parity<.
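The two Stata `generate` lines can be mirrored in Python to show which dummy values each parity gets. `parity_dummies` is a hypothetical helper, and missing parity is represented by `None` as an assumption. In Stata, `if parity<.` leaves both dummies missing for such rows.

```python
def parity_dummies(parity):
    """Hypothetical helper mirroring the two Stata generate lines.

    Reference category is parity 0. Returns (Parity1, Parity2_7);
    both are None when parity is missing (None).
    """
    if parity is None:
        return None, None
    return int(parity == 1), int(parity >= 2)


for p in [0, 1, 2, 7, None]:
    print(p, parity_dummies(p))
```

Parity 0 gets (0, 0). That is why it acts as the reference: each dummy coefficient is a contrast against it.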
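Create a meaningful constant
The constant is the expected birth weight when all covariates are 0.
– At gest = 0, sex = 0, parity = 0: not meaningful
– At gest = 280, sex = 1, parity = 0: meaningful (recode so that these values become 0, e.g. use gest − 280)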
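Centering a covariate changes only the constant, not the slope. This minimal Python sketch shows that with one covariate and hypothetical data: subtracting 280 from gestational age moves the constant from the (meaningless) weight at day 0 to the expected weight at day 280.

```python
import statistics


def simple_ols(x, y):
    """Intercept and slope of a least-squares line y = b0 + b1 * x."""
    mx, my = statistics.mean(x), statistics.mean(y)
    b1 = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return my - b1 * mx, b1


gest = [266, 273, 280, 287, 294]          # hypothetical gestational ages (days)
weight = [3000, 3200, 3450, 3600, 3800]   # hypothetical birth weights (g)

b0_raw, b1_raw = simple_ols(gest, weight)                     # constant = weight at gest 0
b0_cen, b1_cen = simple_ols([g - 280 for g in gest], weight)  # constant = weight at gest 280
print(round(b0_raw), round(b0_cen), round(b1_raw, 2), round(b1_cen, 2))
```

The new constant equals the old one plus 280 times the slope.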
Model estimation
Test of assumptions
Plot residuals versus predicted y:
– Independent residuals?
– Linear effects?
– Constant variance?
Violations of assumptions
– Dependent residuals: use mixed models or GEE
– Non-linear effects: add a square term
– Non-constant variance: use robust variance estimation
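Robust variance estimation replaces the constant-variance formula with a "sandwich" that weights each squared residual by its own leverage. This minimal Python sketch handles one covariate. It includes the n/(n − k) small-sample factor (HC1), which as far as we know matches the default scaling of Stata's `vce(robust)` for `regress`. Treat that match as an assumption. The data are hypothetical.

```python
import math
import statistics


def slope_with_robust_se(x, y):
    """Slope of y on x with its classical and HC1 robust variances.

    HC1 = HC0 * n / (n - k), with k = 2 estimated coefficients (constant and slope).
    """
    n = len(x)
    mx, my = statistics.mean(x), statistics.mean(y)
    dx = [a - mx for a in x]
    sxx = sum(d * d for d in dx)
    b1 = sum(d * (b - my) for d, b in zip(dx, y)) / sxx
    b0 = my - b1 * mx
    resid = [b - (b0 + b1 * a) for a, b in zip(x, y)]
    var_classic = sum(e * e for e in resid) / (n - 2) / sxx         # assumes constant variance
    var_hc0 = sum(d * d * e * e for d, e in zip(dx, resid)) / sxx ** 2
    var_hc1 = var_hc0 * n / (n - 2)
    return b1, var_classic, var_hc1


# Hypothetical data where the spread of y grows with x
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2.1, 2.9, 4.2, 4.8, 7.5, 5.1, 9.8, 6.0]
b1, v_classic, v_hc1 = slope_with_robust_se(x, y)
print(b1, math.sqrt(v_classic), math.sqrt(v_hc1))
```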
Influence
Measures of influence
Measure the change in:
– Predicted outcome
– Deviance
– Coefficients (beta): delta beta
Delta beta: remove observation 1 and see the change in beta, remove observation 2 and see the change, and so on.
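The leave-one-out idea behind delta beta can be sketched in a few lines of Python. To stay short, this sketch uses a single covariate rather than the slides' multivariable model and reports the raw, unscaled change in the slope. Stata's `dfbeta` reports a scaled version, so its numbers differ. The data are hypothetical.

```python
import statistics


def slope(x, y):
    """Least-squares slope of y on x."""
    mx, my = statistics.mean(x), statistics.mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)


def delta_betas(x, y):
    """Raw change in the slope when each observation is left out in turn."""
    full = slope(x, y)
    return [slope(x[:i] + x[i + 1:], y[:i] + y[i + 1:]) - full for i in range(len(x))]


# Hypothetical data: the last point lies far from the others
x = [1, 2, 3, 4, 10]
y = [1, 2, 3, 4, 0]
changes = delta_betas(x, y)
most_influential = max(range(len(x)), key=lambda i: abs(changes[i]))
print(most_influential, round(changes[most_influential], 2))
```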
Delta beta for gestational age
If observation no. 539 is removed, the coefficient (beta) changes from 6 to 16.
Removing the outlier
[Table: full model, outlier removed, final model]
One outlier affected two estimates.
MULTIVARIABLE ANALYSIS 2
Binary outcome: logistic regression, being bullied
Ordered categories and model
Categories    Regression model
2             Logistic
3-7           Ordinal logistic
>7            Linear (treat as interval)
Interval versus ordered scale:
– Interval scale: 1, 2, 3 (equal distances between values)
– Ordered scale: low, medium, high (order only)
Logistic model and assumptions
Model: $\ln\frac{p}{1-p} = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots$
Association measure: $e^{\beta_1}$ = odds ratio for y for a one-unit increase in $x_1$
Assumptions
– Independent errors
– Linear effects on the log odds scale
Robustness
– Influence
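The link between a logistic coefficient and the odds ratio can be checked numerically. This minimal Python sketch uses hypothetical coefficients: the ratio of the odds at x + 1 and at x equals exp(b1), whatever the value of x.

```python
import math


def logistic_prob(b0, b1, x):
    """P(y = 1 | x) under a logistic model with one covariate."""
    return 1 / (1 + math.exp(-(b0 + b1 * x)))


def odds(p):
    return p / (1 - p)


b0, b1 = -2.0, 0.7   # hypothetical coefficients, not estimates from the course data
odds_ratio = odds(logistic_prob(b0, b1, 3)) / odds(logistic_prob(b0, b1, 2))
print(odds_ratio, math.exp(b1))   # equal up to rounding: OR = exp(b1)
```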
Being bullied
We want the total effect of country on being bullied.
– The risk of being bullied depends on age and sex.
– The age and sex distribution may differ between countries.
Should we adjust for age and sex?
[Figure: DAG with exposure E = country, outcome D = bullied, C1 = age, C2 = sex]
No: age and sex are mediating variables here (country → age, sex → bullied), and adjusting for them would remove part of the total effect.
Logistic: being bullied
– OR ≈ RR if the outcome is rare
– OR > RR (further from 1) if the outcome is common
Prevalence of being bullied = 17%
Roughly:
– The same risk of being bullied in Iceland as in Sweden
– 2 times the risk in Norway as in Sweden
– 3 times the risk in Finland as in Sweden
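The gap between the odds ratio and the risk ratio can be shown with two made-up 2x2 comparisons, one with a common outcome and one with a rare outcome. This Python sketch only illustrates the arithmetic; the numbers are not from the bullying data.

```python
def rr_and_or(cases_exposed, n_exposed, cases_ref, n_ref):
    """Risk ratio and odds ratio comparing an exposed group with a reference group."""
    r1, r0 = cases_exposed / n_exposed, cases_ref / n_ref
    risk_ratio = r1 / r0
    odds_ratio = (r1 / (1 - r1)) / (r0 / (1 - r0))
    return risk_ratio, odds_ratio


rr_common, or_common = rr_and_or(30, 100, 15, 100)  # common outcome: 30% vs 15%
rr_rare, or_rare = rr_and_or(2, 100, 1, 100)        # rare outcome: 2% vs 1%
print(rr_common, or_common)   # RR = 2, OR noticeably larger
print(rr_rare, or_rare)       # RR = 2, OR only slightly larger
```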
Summing up
– DAGs: state prior knowledge; guide the analysis
– Plots: linearity, variance, outliers
– Bivariate analysis
  – Continuous symmetrical: mean, t-test, ANOVA
  – Continuous skewed: median, nonparametric tests
  – Categorical: frequencies, crosstables, chi-square
– Multivariable analysis
  – Continuous: linear regression
  – Binary: logistic regression