Download presentation
Presentation is loading. Please wait.
Published byEdwin Bailey Modified over 9 years ago
1
Statistical inference Statistical inference Its application for health science research Bandit Thinkhamrop, Ph.D.(Statistics) Department of Biostatistics and Demography Faculty of Public Health Khon Kaen University
2
Begin at the conclusion
3
Type of the study outcome: Key for selecting appropriate statistical methods Study outcome –Dependent variable or response variable –Focus on primary study outcome if there are more Type of the study outcome –Continuous –Categorical (dichotomous, polytomous, ordinal) –Numerical (Poisson) count –Even-free duration
4
The outcome determine statistics Continuous Mean Median Categorical Proportion (Prevalence Or Risk) Count Rate per “space” Survival Median survival Risk of events at T (t) Linear Reg. Logistic Reg. Poisson Reg. Cox Reg.
5
Statistics quantify errors for judgments Parameter estimation [95%CI] Hypothesis testing [P-value] Parameter estimation [95%CI] Hypothesis testing [P-value]
6
Common types of the statistical goals Single measurements (no comparison) Difference (compared by subtraction) Ratio (compared by division) Prediction (diagnostic test or predictive model) Correlation (examine a joint distribution) Agreement (examine concordance or similarity between pairs of observations)
9
Dependency of the study outcome required special statistical methods to handle it Continuous CategoricalCountSurvival Mean Median Proportion (Prevalence Or Risk) Rate per “space” Median survival Risk of events at T (t) Linear Reg.Logistic Reg.Poisson Reg.Cox Reg. Mixed model, multilevel model, GEE
10
Answer the research question based on lower or upper limit of the CI Back to the conclusion Continuous CategoricalCountSurvival Magnitude of effect 95% CI P-value Magnitude of effect 95% CI P-value Mean Median Proportion (Prevalence or Risk) Rate per “space” Median survival Risk of events at T (t) Appropriate statistical methods
11
Always report the magnitude of effect and its confidence interval Absolute effects: –Mean, Mean difference –Proportion or prevalence, Rate or risk, Rate or Risk difference –Median survival time Relative effects: –Relative risk, Rate ratio, Hazard ratio –Odds ratio Other magnitude of effects: –Correlation coefficient (r), Intra-class correlation (ICC) –Kappa –Diagnostic performance –Etc.
12
Touch the variability (uncertainty) to understand statistical inference idA(x- )(x- ) 2 12-24 22-24 30-416 42-24 51410100 Sum ( ) 200128 Mean( ) 4032.0 SD5.66 Median2 X X X 2+2+0+2+14 = 20 2+2+0+2+14 = 20 = 4 5 5 2+2+0+2+14 = 20 = 4 5 5 0 2 2 2 14 Variance = SD 2 Standard deviation = SD
13
Touch the variability (uncertainty) to understand statistical inference idA(x- )(x- ) 2 12-24 22-24 30-416 42-24 51410100 Sum ( ) 200128 Mean( ) 4032.0 SD5.66 Median2 X X X Measure of variation Measure of central tendency Measure of central tendency
14
Degree of freedom Standard deviation (SD) = The average distant between each data item to their mean
15
Same mean BUT different variation idA 12 22 30 42 514 Sum ( ) 20 Mean4 SD5.66 Median2 idC14 23 35 44 54 Sum ( ) 20 Mean4 SD0.71 Median4 Heterogeneous data Skew distribution Heterogeneous data Symmetry distribution idB10 23 34 45 58 Sum ( ) 20 Mean4 SD2.91 Median4 Homogeneous data Symmetry distribution
16
Facts about Variation Because of variability, repeated samples will NOT obtain the same statistic such as mean or proportion: –Statistics varies from study to study because of the role of chance –Hard to believe that the statistic is the parameter –Thus we need statistical inference to estimate the parameter based on the statistics obtained from a study Data varied widely = heterogeneous data Heterogeneous data requires large sample size to achieve a conclusive finding
17
The Histogram idA 12 22 30 42 514 B14 23 35 44 54 01234567891011121314 01234567891011121314
18
The Frequency Curve idA 12 22 30 42 514 B14 23 35 44 54 01234567891011121314 01234567891011121314
19
Area Under The Frequency Curve idA 12 22 30 42 514 B14 23 35 44 54 01234567891011121314 01234567891011121314
20
Central Limit Theorem Right Skew X1 Symmetry X2 Left Skew X3 Normally distributed X1X Xn
21
Distribution of the sampling mean Distribution of the raw data Central Limit Theorem X1 X2 X3 X1X Xn
22
Central Limit Theorem Distribution of the raw data X1X Xn Distribution of the sampling mean (Theoretical) Normal Distribution Large sample
23
Central Limit Theorem Many X,, SD Standardized for whatever n, Mean = 0, Standard deviation = 1 Large sample X1X Xn Many,, SE X X X Standard deviation of the sampling mean Standard error (SE) Estimated by SE = SD n
24
(Theoretical) Normal Distribution
26
Mean ± 3SD 99.73% of AUC
27
Mean ± 2SD 95.45% of AUC
28
Mean ± 1SD 68.26% of AUC
29
n = 25 X = 52 SD = 5 Sample Population Parameter estimation [95%CI] Hypothesis testing [P-value] Parameter estimation [95%CI] Hypothesis testing [P-value]
30
5 = 1 5 Z = 2.58 Z = 1.96 Z = 1.64
31
n = 25 X = 52 SD = 5 SE = 1 Sample Population Parameter estimation [95%CI] : 52-1.96(1) to 52+1.96(1) 50.04 to 53.96 We are 95% confidence that the population mean would lie between 50.04 and 53.96 [95%CI] : 52-1.96(1) to 52+1.96(1) 50.04 to 53.96 We are 95% confidence that the population mean would lie between 50.04 and 53.96 Z = 2.58 Z = 1.96 Z = 1.64
32
n = 25 X = 52 SD = 5 SE = 1 Sample Hypothesis testing Hypothesis testing Population Z = 55 – 52 1 3 H 0 : = 55 H A : 55
33
Hypothesis testing H 0 : = 55 H A : 55 If the true mean in the population is 55, chance to obtain a sample mean of 52 or more extreme is 0.0027. Hypothesis testing H 0 : = 55 H A : 55 If the true mean in the population is 55, chance to obtain a sample mean of 52 or more extreme is 0.0027. Z = 55 – 52 1 3 P-value = 1-0.9973 = 0.0027 5552 -3SE +3SE
34
P-value vs. 95%CI (1) A study compared cure rate between Drug A and Drug B Setting: Drug A = Alternative treatment Drug B = Conventional treatment Results: Drug A: n1 = 50, Pa = 80% Drug B: n2 = 50, Pb = 50% Pa-Pb = 30% (95%CI: 26% to 34%; P=0.001) An example of a study with dichotomous outcome
35
P-value vs. 95%CI (2) Pa-Pb = 30% (95%CI: 26% to 34%; P< 0.05) Pa > Pb Pb > Pa
36
P-value vs. 95%CI (3) Adapted from: Armitage, P. and Berry, G. Statistical methods in medical research. 3rd edition. Blackwell Scientific Publications, Oxford. 1994. page 99
37
Tips #6 (b) P-value vs. 95%CI (4) Adapted from: Armitage, P. and Berry, G. Statistical methods in medical research. 3rd edition. Blackwell Scientific Publications, Oxford. 1994. page 99 There were statistically significant different between the two groups.
38
Tips #6 (b) P-value vs. 95%CI (5) Adapted from: Armitage, P. and Berry, G. Statistical methods in medical research. 3rd edition. Blackwell Scientific Publications, Oxford. 1994. page 99 There were no statistically significant different between the two groups.
39
P-value vs. 95%CI (4) Save tips: –Always report 95%CI with p-value, NOT report solely p-value –Always interpret based on the lower or upper limit of the confidence interval, p-value can be an optional –Never interpret p-value > 0.05 as an indication of no difference or no association, only the CI can provide this message.
40
Additional Notes
41
Alpha ( ) and Beta ( ) Alpha ( ) –Type I error –The probability that a statistical test will reject the null hypothesis when the null hypothesis is true –Making a false positive decision Beta ( ) –Type II error –The probability that a statistical test will NOT reject the null hypothesis when the null hypothesis is false –Making a false negative decision Power (1- ) –The probability that a statistical test will reject the null hypothesis when the null hypothesis is false –That is, the probability of NOT committing a Type II error or a false negative decision –The higher the power, the lower Type II error –Also known as the specificity
42
Alpha ( ) and Beta ( ) cont. Significance level –A statement of how unlikely a result must be, if the null hypothesis is true, to be considered significant. –Need to be declare in advance, before looking at the data, preferably before data collection –Three most commonly used criteria of probabilities: 0.05 (5%, 1 in 20), 0.01 (1%, 1 in 100), and 0.001 (0.1%, 1 in 1000) P-value –The probability of having a results of as extreme as being obtained, given that the null hypothesis is true –Quantify it based on the data and the hypothesis –This is the evidence for making the decision whether to reject or not reject the null hypothesis –Reject the null hypothesis if the p-value less than the predefined level of significant and not reject otherwise
43
Alpha ( ) and Beta ( ) cont. Decision based on statistical test Hypothesis TrueFalse Not Reject Reject
44
Q & A Thank you
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.