Published by Lisa Summers. Modified over 9 years ago.
Basic Biostatistics
Prof Paul Rheeder, Division of Clinical Epidemiology
Overview
- Bias vs chance
- Types of data
- Descriptive statistics
- Histograms and boxplots
- Inferential statistics
- Hypothesis testing: P values and confidence intervals (CIs)
- Comparing groups
- Correlation and regression
Research questions
- Does the CK level predict in-hospital mortality after myocardial infarction (MI)?
- Is there an association between troponin I and renal function?
- What is the incidence of amputation in diabetics with renal failure?
How are the variables in each question measured?
Research question
- Does aspirin reduce cardiovascular (CV) mortality in diabetics when used for primary prevention?
- Is cell phone use associated with an increased risk of brain cancer?
- Does socioeconomic status (SES) correlate with depression?
Research question
Phrase your research question so that it can be answered with yes or no, or with some quantity (for example, an incidence or the size of an effect).
Data analysis
Aim: to describe the study sample and to answer the research question.
Problems
Problems
- Bias and confounding, also called systematic error: typically dealt with in the planning and execution of the study, but they can also be controlled for in the data analysis (e.g., multivariable analysis).
- Chance, also called random error: classically, P values (and CIs) are used to judge the role of chance.
First important issues
- What type of data are you collecting? Typically there is one outcome variable and one or more exposure variables.
- How, and with what instrument, is each variable measured?
Outcome and exposure?
- Does the CK level predict in-hospital mortality after MI?
- Is there an association between troponin I and renal function?
- What is the incidence of amputation in diabetics with renal failure?
What is the outcome and what is the exposure in each question, and how is each measured?
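Types of data
- Categorical: e.g., hypertension (HT) yes/no, sex, smoking status; usually summarized as a percentage. Categorical data can be ordinal (ordered categories) or nominal (unordered categories).
- Continuous data: described by a central value and by the spread of the data.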
Data analysis: descriptive statistics
- Central value: mean or median
- Spread: SD or range
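A minimal sketch of these descriptive statistics in Python, assuming the standard-library `statistics` module is acceptable; the glucose values are hypothetical and only illustrate the calls. `statistics.stdev` returns the sample SD (n - 1 denominator).

```python
import statistics

def describe(values):
    """Return the descriptive statistics listed on the slide for one variable."""
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "sd": statistics.stdev(values),        # sample SD (n - 1 denominator)
        "median": statistics.median(values),
        "range": (min(values), max(values)),
    }

# Hypothetical fasting glucose values (mmol/l), for illustration only
glucose = [4, 6, 6, 6, 7, 7, 9, 11]
summary = describe(glucose)
print(summary)
```

The mean and SD suit roughly symmetric data; the median and range are less affected by skew and outliers.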
Hypothesis testing
Differences between groups, for example:
- t-test / Mann-Whitney test (2 groups)
- ANOVA / Kruskal-Wallis test (more than 2 groups)
- Chi-square test when the outcome is a percentage
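Associations between variables
- Does coffee cause cancer? (odds ratio, OR; relative risk, RR)
- Efficacy of treatment (Rx): relative risk reduction (RRR), absolute risk reduction (ARR), number needed to treat (NNT)
- Is BMI associated with blood pressure (BP)? (correlation and regression)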
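The treatment-efficacy measures can be sketched from two event risks. This is a minimal Python illustration; the function name and the 20% vs 15% risks are hypothetical, not from the slides. In practice the NNT is usually rounded up to a whole patient.

```python
import math

def treatment_effects(risk_control, risk_treated):
    """Return (ARR, RRR, NNT) from the event risk in the control and treated groups."""
    arr = risk_control - risk_treated                 # absolute risk reduction
    rrr = arr / risk_control                          # relative risk reduction (= 1 - RR)
    nnt = 1 / arr if arr != 0 else math.inf           # number needed to treat
    return arr, rrr, nnt

# Hypothetical trial: 20% CV deaths on placebo, 15% on aspirin
arr, rrr, nnt = treatment_effects(0.20, 0.15)
print(f"ARR = {arr:.2f}, RRR = {rrr:.0%}, NNT = {nnt:.0f}")
```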
2 x 2 table

                Cancer   No cancer
  Smoker          a          b
  Non-smoker      c          d

$$\mathrm{RR} = \frac{a/(a+b)}{c/(c+d)}, \qquad \mathrm{OR} = \frac{a/b}{c/d} = \frac{ad}{bc}$$
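The two formulas can be checked with a short Python sketch that uses the same a, b, c, d layout as the table; the counts below are hypothetical.

```python
def risk_ratio(a, b, c, d):
    """RR = [a/(a+b)] / [c/(c+d)]: risk in smokers divided by risk in non-smokers."""
    return (a / (a + b)) / (c / (c + d))

def odds_ratio(a, b, c, d):
    """OR = (a/b) / (c/d), which simplifies to ad / bc."""
    return (a * d) / (b * c)

# Hypothetical counts: smokers 30 with cancer, 70 without; non-smokers 10 and 90
a, b, c, d = 30, 70, 10, 90
rr = risk_ratio(a, b, c, d)
or_ = odds_ratio(a, b, c, d)
print(f"RR = {rr:.2f}, OR = {or_:.2f}")
```

When the outcome is rare, b and d are close to a + b and c + d, so the OR approximates the RR; here the outcome is common and the OR (about 3.86) is noticeably larger than the RR (3.0).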
TYPES OF DATA
DESCRIPTIVE STATS
Graphics
Using the SD and the Normal Curve
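Mean ± 1.96 SD: the range expected to contain about 95% of the individual observations (if the data are approximately normally distributed).
Mean ± 1.96 SEM: the 95% confidence interval for the population mean, where SEM = SD/√n.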
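A minimal Python sketch of the difference between the two intervals, using a hypothetical sample of 9 values. It keeps the slide's 1.96 multiplier; for small samples a t-distribution multiplier would give a slightly wider, more accurate CI.

```python
import math
import statistics

def normal_range_and_ci(values, z=1.96):
    """Return (mean +/- z*SD, mean +/- z*SEM) for one sample."""
    n = len(values)
    mean = statistics.mean(values)
    sd = statistics.stdev(values)                      # spread of individual observations
    sem = sd / math.sqrt(n)                            # standard error of the mean
    reference_range = (mean - z * sd, mean + z * sd)   # ~95% of observations if data are normal
    ci = (mean - z * sem, mean + z * sem)              # 95% CI for the population mean
    return reference_range, ci

# Hypothetical sample of 9 measurements
sample = [1, 2, 3, 4, 5, 6, 7, 8, 9]
rng, ci = normal_range_and_ci(sample)
print(rng, ci)
```

Because SEM = SD/√n, the CI here is √9 = 3 times narrower than the 95% range, and it keeps shrinking as n grows while the 95% range does not.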
One of many samples
95% Confidence Intervals
Hypothesis Testing
Type I and II errors have an inverse relationship
If you reduce the probability of one error, the probability of the other increases, provided everything else (e.g., the sample size) is unchanged.
Factors affecting type II error (β)
- True value of the population parameter: β increases as the difference between the hypothesized parameter and its true value decreases.
- Significance level: β increases as α decreases.
- Population standard deviation: β increases as σ increases.
- Sample size: β increases as n decreases.
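Each factor can be seen in a small calculation. This sketch assumes a two-sided one-sample z-test with known σ, a simplification chosen only for illustration; the function name and the numbers (difference 5, σ = 10, n = 25) are hypothetical.

```python
from statistics import NormalDist

def type_ii_error(delta, sigma, n, alpha=0.05):
    """beta for a two-sided z-test of H0: mu = mu0 when the true mean is mu0 + delta."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    shift = delta * n ** 0.5 / sigma              # true difference in standard-error units
    power = z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)
    return 1 - power

base = type_ii_error(delta=5, sigma=10, n=25)
print(f"baseline beta = {base:.3f}")
print("smaller difference:", type_ii_error(2, 10, 25) > base)
print("smaller alpha:     ", type_ii_error(5, 10, 25, alpha=0.01) > base)
print("larger sigma:      ", type_ii_error(5, 20, 25) > base)
print("smaller n:         ", type_ii_error(5, 10, 10) > base)
```

The "smaller alpha" line is the inverse relationship between type I and type II errors: with everything else fixed, lowering α raises β.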
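Examples
- Difference in glucose between survivors and non-survivors = 5 mmol/l (95% CI −5 to 10 mmol/l)
- RR for cancer = 1.4 (95% CI 0.7 to 1.3)
Both intervals include the null value (0 for a difference, 1 for a ratio), so neither result is statistically significant at the 5% level.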
P value
The null hypothesis (H0) is that there is no difference, but a difference can still be found by chance. For example: what is the probability of finding a difference between groups of 5 mmol/l (or larger) when in truth the difference is zero? P = 0.10.
+-------------------+
| Key               |
|-------------------|
|     frequency     |
| column percentage |
+-------------------+

           |        0=L E=1
      Y/NR |         0          1 |     Total
-----------+----------------------+----------
         N |        28         20 |        48
           |     53.85      44.44 |     49.48
-----------+----------------------+----------
         Y |        24         25 |        49
           |     46.15      55.56 |     50.52
-----------+----------------------+----------
     Total |        52         45 |        97
           |    100.00     100.00 |    100.00

          Pearson chi2(1) =   0.8530   Pr = 0.356
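The chi-square statistic in this output can be reproduced from the four frequencies with a short Python sketch. It assumes the plain Pearson statistic without continuity correction, which is what `Pearson chi2` reports; the function name is hypothetical.

```python
import math

def pearson_chi2_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for a 2 x 2 table, with its P value."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # With 1 degree of freedom, P(chi2 > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2))
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p

# Frequencies from the table above: rows N/Y, columns 0/1
chi2, p = pearson_chi2_2x2(28, 20, 24, 25)
print(f"chi2(1) = {chi2:.4f}, P = {p:.3f}")
```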
Differences between groups
Parametric comparisons
T-test
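A minimal sketch of the two-sample t statistic, assuming Student's equal-variance (pooled) version; Welch's unequal-variance version would differ. The data are hypothetical. The P value needs the t distribution, which the Python standard library lacks (e.g. `scipy.stats.ttest_ind` provides it; not used here).

```python
import math
import statistics

def pooled_t(x, y):
    """Student's two-sample t statistic assuming equal variances; returns (t, df)."""
    n1, n2 = len(x), len(y)
    v1, v2 = statistics.variance(x), statistics.variance(y)    # sample variances
    pooled_var = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))             # SE of the difference in means
    t = (statistics.mean(x) - statistics.mean(y)) / se
    return t, n1 + n2 - 2

# Hypothetical values in two groups
t, df = pooled_t([1, 2, 3, 4, 5], [3, 4, 5, 6, 7])
print(t, df)
```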
What about 3 groups? One-way ANOVA:

anova age ethngr, cat(ethngr)

                           Number of obs =      37     R-squared     =  0.0621
                           Root MSE      =  7.7883     Adj R-squared =  0.0069

                  Source |  Partial SS    df       MS           F     Prob > F
              -----------+----------------------------------------------------
                   Model |  136.560095     2  68.2800477       1.13     0.3362
                         |
                  ethngr |  136.560095     2  68.2800477       1.13     0.3362
                         |
                Residual |  2062.35882    34  60.6576125
              -----------+----------------------------------------------------
                   Total |  2198.91892    36  61.0810811
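A minimal Python sketch of how the one-way ANOVA quantities are built from raw data; the three small groups are hypothetical, and the function name is not from the slides.

```python
import statistics

def one_way_anova(groups):
    """One-way ANOVA: sums of squares, df, F and R-squared for a list of samples."""
    all_values = [v for g in groups for v in g]
    grand_mean = statistics.mean(all_values)
    ss_between = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((v - statistics.mean(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f = (ss_between / df_between) / (ss_within / df_within)    # MS model / MS residual
    r2 = ss_between / (ss_between + ss_within)
    return {"ss_between": ss_between, "ss_within": ss_within,
            "df": (df_between, df_within), "F": f, "R2": r2}

result = one_way_anova([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(result)
```

Applied to the table above, F = MS(Model)/MS(Residual) = 68.28/60.66 ≈ 1.13 and R-squared = 136.56/2198.92 ≈ 0.0621, matching the output.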
Differences between the 3 groups: the same model displayed as a regression. Group 3 of ethngr is dropped as the reference category, so _cons is the mean age in group 3 and each ethngr coefficient is the difference in mean age from group 3.

regress

      Source |       SS       df       MS              Number of obs =      37
-------------+------------------------------           F(  2,    34) =    1.13
       Model |  136.560095     2  68.2800477           Prob > F      =  0.3362
    Residual |  2062.35882    34  60.6576125           R-squared     =  0.0621
-------------+------------------------------           Adj R-squared =  0.0069
       Total |  2198.91892    36  61.0810811           Root MSE      =  7.7883

------------------------------------------------------------------------------
         age |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       _cons |       56.6   2.462877    22.98   0.000     51.59483    61.60517
      ethngr |
          1  |   4.635294   3.103845     1.49   0.145    -1.672479    10.94307
          2  |        2.5   3.483034     0.72   0.478    -4.578376    9.578376
          3  |  (dropped)
------------------------------------------------------------------------------
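A short Python sketch of reading this output: group means are recovered from the reference-coded coefficients, and a coefficient's 95% CI is coef ± t × SE. The critical value t(0.975, 34 df) ≈ 2.032245 is an assumption taken from standard t tables (it also matches the _cons interval above); the names are hypothetical.

```python
def coef_ci(coef, se, t_crit):
    """95% CI for a regression coefficient: coef +/- t_crit * SE."""
    return coef - t_crit * se, coef + t_crit * se

T_CRIT_34 = 2.032245                   # assumed t(0.975) for 34 residual df
intercept = 56.6                       # mean age in the reference group (ethngr 3)
diffs = {1: 4.635294, 2: 2.5}          # differences in mean age from group 3

group_means = {3: intercept, **{g: intercept + d for g, d in diffs.items()}}
lo, hi = coef_ci(4.635294, 3.103845, T_CRIT_34)
print(group_means)
print(f"ethngr 1 vs 3: 95% CI {lo:.4f} to {hi:.4f}")
```

The interval for ethngr 1 includes 0, consistent with P = 0.145.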
Repeated measures
- One group of schoolchildren: muscle strength measured in January and again in March. Did strength change significantly over time? Paired t-test.
- Two or more groups: repeated-measures (RM) ANOVA.
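The paired t-test works on the within-child differences. A minimal Python sketch with hypothetical scores for four children:

```python
import math
import statistics

def paired_t(before, after):
    """Paired t statistic: mean within-subject difference divided by its standard error."""
    diffs = [a - b for b, a in zip(before, after)]
    n = len(diffs)
    se = statistics.stdev(diffs) / math.sqrt(n)
    return statistics.mean(diffs) / se, n - 1

# Hypothetical muscle-strength scores for four children
january = [10, 12, 14, 16]
march = [11, 14, 15, 20]
t, df = paired_t(january, march)
print(t, df)
```

Using the differences removes the between-child variation, which is why a paired test is usually more powerful than treating January and March as two independent groups.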
Non-parametric comparisons: two groups

ranksum age, by(menopaus)

Two-sample Wilcoxon rank-sum (Mann-Whitney) test

    menopaus |      obs    rank sum    expected
-------------+---------------------------------
           0 |       19         210       826.5
           1 |       67        3531      2914.5
-------------+---------------------------------
    combined |       86        3741        3741

unadjusted variance     9229.25
adjustment for ties      -28.04
                     ----------
adjusted variance       9201.21

Ho: age(menopaus==0) = age(menopaus==1)
             z =  -6.427
    Prob > |z| =   0.0000
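A sketch of the normal approximation behind this output, in Python. `ranksum_z` reproduces z from the summary figures above; `ranksum_test` is a hypothetical raw-data version that assumes the usual tie correction, -n1·n2·Σ(t³ - t) / (12N(N - 1)).

```python
import math
from statistics import NormalDist

def average_ranks(values):
    """Ranks 1..N; tied values share the average of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1      # average of ranks i+1 .. j+1
        i = j + 1
    return ranks

def ranksum_z(rank_sum_1, n1, n2, tie_adjustment=0.0):
    """Normal approximation for the Wilcoxon rank-sum (Mann-Whitney) test."""
    n = n1 + n2
    expected = n1 * (n + 1) / 2
    variance = n1 * n2 * (n + 1) / 12 + tie_adjustment
    z = (rank_sum_1 - expected) / math.sqrt(variance)
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p

def ranksum_test(x, y):
    """Rank-sum test from raw data, including the correction for ties."""
    pooled = list(x) + list(y)
    ranks = average_ranks(pooled)
    n1, n2 = len(x), len(y)
    n = n1 + n2
    counts = {}
    for v in pooled:
        counts[v] = counts.get(v, 0) + 1
    tie_term = sum(t ** 3 - t for t in counts.values())
    adjustment = -n1 * n2 * tie_term / (12 * n * (n - 1))
    return ranksum_z(sum(ranks[:n1]), n1, n2, adjustment)

# Summary figures from the output above (group menopaus == 0)
z, p = ranksum_z(210, 19, 67, tie_adjustment=-28.04)
print(f"z = {z:.3f}")
```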
Non-parametric comparisons: three groups

kwallis s_tg, by(ethngr)

Test: Equality of populations (Kruskal-Wallis test)

  +-------------------------+
  | ethngr | Obs | Rank Sum |
  |--------+-----+----------|
  |      1 |  17 |   381.00 |
  |      2 |  10 |   149.50 |
  |      3 |  10 |   172.50 |
  +-------------------------+

chi-squared =     3.350 with 2 d.f.
probability =     0.1873

chi-squared with ties =     3.352 with 2 d.f.
probability =     0.1871
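The uncorrected H statistic can be recomputed from the rank sums in a few lines of Python. This sketch omits the tie correction, so it matches the first chi-squared line; the function name is hypothetical.

```python
import math

def kruskal_wallis_h(rank_sums, sizes):
    """Kruskal-Wallis H (no tie correction) from each group's rank sum and size."""
    n = sum(sizes)
    h = 12 / (n * (n + 1)) * sum(r ** 2 / k for r, k in zip(rank_sums, sizes)) - 3 * (n + 1)
    return h, len(sizes) - 1

# Rank sums and group sizes from the output above
h, df = kruskal_wallis_h([381.0, 149.5, 172.5], [17, 10, 10])
assert df == 2  # the closed form below only holds for 2 degrees of freedom
p = math.exp(-h / 2)  # chi-square upper tail with 2 df is exactly exp(-x / 2)
print(f"H = {h:.3f}, df = {df}, P = {p:.4f}")
```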
Summary
Continuous, non-normal data:
- 2 groups: Mann-Whitney test
- 3 or more groups: Kruskal-Wallis test
Continuous, normal data:
- 2 groups: t-test
- 3 or more groups: ANOVA
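The summary is a small decision rule. Written as a Python function, it might look like this; it assumes independent groups, since paired or repeated measurements need the paired tests described earlier. The function name is hypothetical.

```python
def choose_test(n_groups, normally_distributed):
    """Pick a comparison test for a continuous outcome in independent groups."""
    if n_groups < 2:
        raise ValueError("need at least two groups to compare")
    if normally_distributed:
        return "t-test" if n_groups == 2 else "ANOVA"
    return "Mann-Whitney" if n_groups == 2 else "Kruskal-Wallis"

print(choose_test(3, normally_distributed=False))
```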
Categorical data
Relationships
Linear Regression
Here both the dependent variable (logTG, log-transformed triglycerides) and the independent variable (waist circumference) are continuous. The beta coefficient is how much logTG increases when waist circumference increases by 1 cm.
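A minimal least-squares sketch in Python showing where the beta coefficient comes from (slope = Sxy / Sxx); the waist and logTG values are hypothetical. Python 3.10+ also provides `statistics.linear_regression`, which returns the same slope and intercept.

```python
import statistics

def simple_ols(x, y):
    """Least-squares slope (beta) and intercept for y = intercept + beta * x."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    beta = sxy / sxx
    return beta, my - beta * mx

# Hypothetical data: waist circumference (cm) and logTG
waist = [80, 85, 90, 95, 100]
log_tg = [0.20, 0.28, 0.31, 0.42, 0.49]
beta, intercept = simple_ols(waist, log_tg)
print(f"beta = {beta:.4f} logTG units per cm of waist")
```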
What if the independent variable is categorical?

regress age menop

      Source |       SS       df       MS              Number of obs =      86
-------------+------------------------------           F(  1,    84) =  135.01
       Model |  3499.71205     1  3499.71205           Prob > F      =  0.0000
    Residual |  2177.49725    84  25.9225863           R-squared     =  0.6164
-------------+------------------------------           Adj R-squared =  0.6119
       Total |   5677.2093    85  66.7906977           Root MSE      =  5.0914

------------------------------------------------------------------------------
         age |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    menopaus |   15.37628   1.323348    11.62   0.000     12.74465     18.0079
       _cons |   46.57895   1.168053    39.88   0.000     44.25615    48.90175
------------------------------------------------------------------------------

Interpretation: menopaus is coded 0 or 1, so _cons (46.58 years) is the mean age of participants with menopaus = 0, and the menopaus coefficient (15.38 years) is how much higher the mean age is in participants with menopaus = 1 (mean 46.58 + 15.38 ≈ 61.96 years).
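The summary statistics in this output follow from the sums of squares. A Python sketch that recomputes them from the printed values; with a single predictor, F equals the square of that predictor's t statistic.

```python
import math

# Values copied from the regress output above
ss_model, df_model = 3499.71205, 1
ss_resid, df_resid = 2177.49725, 84
coef, se, cons = 15.37628, 1.323348, 46.57895

ms_model = ss_model / df_model
ms_resid = ss_resid / df_resid
f_stat = ms_model / ms_resid                                        # F(1, 84)
r2 = ss_model / (ss_model + ss_resid)                               # R-squared
adj_r2 = 1 - ms_resid / ((ss_model + ss_resid) / (df_model + df_resid))
root_mse = math.sqrt(ms_resid)                                      # Root MSE
t_stat = coef / se                                                  # t for menopaus
mean_age_menop1 = cons + coef                                       # mean age when menopaus = 1
print(f_stat, t_stat ** 2, r2, adj_r2, root_mse, mean_age_menop1)
```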
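Logistic regression
Outcome: heart disease (yes/no, a binary variable). Independent variable: age.

logistic CVD age

Logistic regression                               Number of obs   =         48
                                                  LR chi2(1)      =       2.51
                                                  Prob > chi2     =     0.1133
Log likelihood = -29.945379                       Pseudo R2       =     0.0402

------------------------------------------------------------------------------
        died | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |   1.093467    .064069     1.52   0.127     .9748363    1.226535
------------------------------------------------------------------------------

Interpretation: each additional year of age multiplies the odds of the outcome by about 1.09 (about 9% higher odds per year), but the 95% CI (0.97 to 1.23) includes 1 and P = 0.127, so the association is not statistically significant.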
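The z statistic and CI for the odds ratio are computed on the log scale and then exponentiated. This Python sketch assumes the reported Std. Err. is the delta-method SE of the odds ratio (SE(log OR) = SE(OR)/OR), which reproduces the output; the function name is hypothetical.

```python
import math
from statistics import NormalDist

def odds_ratio_ci(odds_ratio, se_or, level=0.95):
    """Wald z and CI for an odds ratio reported with a delta-method standard error."""
    log_or = math.log(odds_ratio)
    se_log = se_or / odds_ratio                  # delta method: SE(log OR) = SE(OR) / OR
    z_crit = NormalDist().inv_cdf(0.5 + level / 2)
    z = log_or / se_log
    lo = math.exp(log_or - z_crit * se_log)
    hi = math.exp(log_or + z_crit * se_log)
    return z, lo, hi

# Odds ratio and standard error for age from the output above
z, lo, hi = odds_ratio_ci(1.093467, 0.064069)
print(f"z = {z:.3f}, 95% CI {lo:.4f} to {hi:.4f}")
```

Because the interval is symmetric on the log scale, it is asymmetric around the odds ratio itself.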