Presentation is loading. Please wait.

Presentation is loading. Please wait.

Statistical inference Statistical inference Its application for health science research Bandit Thinkhamrop, Ph.D.(Statistics) Department of Biostatistics.

Similar presentations


Presentation on theme: "Statistical inference Statistical inference Its application for health science research Bandit Thinkhamrop, Ph.D.(Statistics) Department of Biostatistics."— Presentation transcript:

1 Statistical inference Statistical inference Its application for health science research Bandit Thinkhamrop, Ph.D.(Statistics) Department of Biostatistics and Demography Faculty of Public Health Khon Kaen University

2 Begin at the conclusion

3 Type of the study outcome: Key for selecting appropriate statistical methods Study outcome –Dependent variable or response variable –Focus on primary study outcome if there are more Type of the study outcome –Continuous –Categorical (dichotomous, polytomous, ordinal) –Numerical (Poisson) count –Even-free duration

4 The outcome determine statistics Continuous Mean Median Categorical Proportion (Prevalence Or Risk) Count Rate per “space” Survival Median survival Risk of events at T (t) Linear Reg. Logistic Reg. Poisson Reg. Cox Reg.

5 Statistics quantify errors for judgments Parameter estimation [95%CI] Hypothesis testing [P-value] Parameter estimation [95%CI] Hypothesis testing [P-value]

6 Common types of the statistical goals Single measurements (no comparison) Difference (compared by subtraction) Ratio (compared by division) Prediction (diagnostic test or predictive model) Correlation (examine a joint distribution) Agreement (examine concordance or similarity between pairs of observations)

7

8

9 Dependency of the study outcome required special statistical methods to handle it Continuous CategoricalCountSurvival Mean Median Proportion (Prevalence Or Risk) Rate per “space” Median survival Risk of events at T (t) Linear Reg.Logistic Reg.Poisson Reg.Cox Reg. Mixed model, multilevel model, GEE

10 Answer the research question based on lower or upper limit of the CI Back to the conclusion Continuous CategoricalCountSurvival Magnitude of effect 95% CI P-value Magnitude of effect 95% CI P-value Mean Median Proportion (Prevalence or Risk) Rate per “space” Median survival Risk of events at T (t) Appropriate statistical methods

11 Always report the magnitude of effect and its confidence interval Absolute effects: –Mean, Mean difference –Proportion or prevalence, Rate or risk, Rate or Risk difference –Median survival time Relative effects: –Relative risk, Rate ratio, Hazard ratio –Odds ratio Other magnitude of effects: –Correlation coefficient (r), Intra-class correlation (ICC) –Kappa –Diagnostic performance –Etc.

12 Touch the variability (uncertainty) to understand statistical inference idA(x- )(x- ) 2 12-24 22-24 30-416 42-24 51410100 Sum (  ) 200128 Mean( ) 4032.0 SD5.66 Median2 X X X 2+2+0+2+14 = 20 2+2+0+2+14 = 20 = 4 5 5 2+2+0+2+14 = 20 = 4 5 5 0 2 2 2 14 Variance = SD 2 Standard deviation = SD

13 Touch the variability (uncertainty) to understand statistical inference idA(x- )(x- ) 2 12-24 22-24 30-416 42-24 51410100 Sum (  ) 200128 Mean( ) 4032.0 SD5.66 Median2 X X X Measure of variation Measure of central tendency Measure of central tendency

14 Degree of freedom Standard deviation (SD) = The average distant between each data item to their mean

15 Same mean BUT different variation idA 12 22 30 42 514 Sum (  ) 20 Mean4 SD5.66 Median2 idC14 23 35 44 54 Sum (  ) 20 Mean4 SD0.71 Median4 Heterogeneous data Skew distribution Heterogeneous data Symmetry distribution idB10 23 34 45 58 Sum (  ) 20 Mean4 SD2.91 Median4 Homogeneous data Symmetry distribution

16 Facts about Variation Because of variability, repeated samples will NOT obtain the same statistic such as mean or proportion: –Statistics varies from study to study because of the role of chance –Hard to believe that the statistic is the parameter –Thus we need statistical inference to estimate the parameter based on the statistics obtained from a study Data varied widely = heterogeneous data Heterogeneous data requires large sample size to achieve a conclusive finding

17 The Histogram idA 12 22 30 42 514 B14 23 35 44 54 01234567891011121314 01234567891011121314

18 The Frequency Curve idA 12 22 30 42 514 B14 23 35 44 54 01234567891011121314 01234567891011121314

19 Area Under The Frequency Curve idA 12 22 30 42 514 B14 23 35 44 54 01234567891011121314 01234567891011121314

20 Central Limit Theorem Right Skew X1 Symmetry X2 Left Skew X3 Normally distributed X1X Xn 

21 Distribution of the sampling mean Distribution of the raw data Central Limit Theorem X1 X2 X3 X1X Xn 

22 Central Limit Theorem Distribution of the raw data X1X Xn  Distribution of the sampling mean (Theoretical) Normal Distribution Large sample

23 Central Limit Theorem Many X,, SD Standardized for whatever n, Mean = 0, Standard deviation = 1 Large sample X1X Xn  Many,, SE X X X Standard deviation of the sampling mean Standard error (SE) Estimated by SE = SD n 

24 (Theoretical) Normal Distribution

25

26 Mean ± 3SD 99.73% of AUC

27 Mean ± 2SD 95.45% of AUC

28 Mean ± 1SD 68.26% of AUC

29 n = 25 X = 52 SD = 5 Sample Population Parameter estimation [95%CI] Hypothesis testing [P-value] Parameter estimation [95%CI] Hypothesis testing [P-value]

30 5 = 1 5 Z = 2.58 Z = 1.96 Z = 1.64

31 n = 25 X = 52 SD = 5 SE = 1 Sample Population Parameter estimation [95%CI] : 52-1.96(1) to 52+1.96(1) 50.04 to 53.96 We are 95% confidence that the population mean would lie between 50.04 and 53.96 [95%CI] : 52-1.96(1) to 52+1.96(1) 50.04 to 53.96 We are 95% confidence that the population mean would lie between 50.04 and 53.96 Z = 2.58 Z = 1.96 Z = 1.64

32 n = 25 X = 52 SD = 5 SE = 1 Sample Hypothesis testing Hypothesis testing Population Z = 55 – 52 1 3 H 0 :  = 55 H A :   55

33 Hypothesis testing H 0 :  = 55 H A :   55 If the true mean in the population is 55, chance to obtain a sample mean of 52 or more extreme is 0.0027. Hypothesis testing H 0 :  = 55 H A :   55 If the true mean in the population is 55, chance to obtain a sample mean of 52 or more extreme is 0.0027. Z = 55 – 52 1 3 P-value = 1-0.9973 = 0.0027 5552 -3SE +3SE

34 P-value vs. 95%CI (1) A study compared cure rate between Drug A and Drug B Setting: Drug A = Alternative treatment Drug B = Conventional treatment Results: Drug A: n1 = 50, Pa = 80% Drug B: n2 = 50, Pb = 50% Pa-Pb = 30% (95%CI: 26% to 34%; P=0.001) An example of a study with dichotomous outcome

35 P-value vs. 95%CI (2) Pa-Pb = 30% (95%CI: 26% to 34%; P< 0.05) Pa > Pb Pb > Pa

36 P-value vs. 95%CI (3) Adapted from: Armitage, P. and Berry, G. Statistical methods in medical research. 3rd edition. Blackwell Scientific Publications, Oxford. 1994. page 99

37 Tips #6 (b) P-value vs. 95%CI (4) Adapted from: Armitage, P. and Berry, G. Statistical methods in medical research. 3rd edition. Blackwell Scientific Publications, Oxford. 1994. page 99 There were statistically significant different between the two groups.

38 Tips #6 (b) P-value vs. 95%CI (5) Adapted from: Armitage, P. and Berry, G. Statistical methods in medical research. 3rd edition. Blackwell Scientific Publications, Oxford. 1994. page 99 There were no statistically significant different between the two groups.

39 P-value vs. 95%CI (4) Save tips: –Always report 95%CI with p-value, NOT report solely p-value –Always interpret based on the lower or upper limit of the confidence interval, p-value can be an optional –Never interpret p-value > 0.05 as an indication of no difference or no association, only the CI can provide this message.

40 Additional Notes

41 Alpha (  ) and Beta (  ) Alpha (  ) –Type I error –The probability that a statistical test will reject the null hypothesis when the null hypothesis is true –Making a false positive decision Beta (  ) –Type II error –The probability that a statistical test will NOT reject the null hypothesis when the null hypothesis is false –Making a false negative decision Power (1-  ) –The probability that a statistical test will reject the null hypothesis when the null hypothesis is false –That is, the probability of NOT committing a Type II error or a false negative decision –The higher the power, the lower Type II error –Also known as the specificity

42 Alpha (  ) and Beta (  ) cont. Significance level –A statement of how unlikely a result must be, if the null hypothesis is true, to be considered significant. –Need to be declare in advance, before looking at the data, preferably before data collection –Three most commonly used criteria of probabilities: 0.05 (5%, 1 in 20), 0.01 (1%, 1 in 100), and 0.001 (0.1%, 1 in 1000) P-value –The probability of having a results of as extreme as being obtained, given that the null hypothesis is true –Quantify it based on the data and the hypothesis –This is the evidence for making the decision whether to reject or not reject the null hypothesis –Reject the null hypothesis if the p-value less than the predefined level of significant and not reject otherwise

43 Alpha (  ) and Beta (  ) cont. Decision based on statistical test Hypothesis TrueFalse Not Reject Reject

44 Q & A Thank you


Download ppt "Statistical inference Statistical inference Its application for health science research Bandit Thinkhamrop, Ph.D.(Statistics) Department of Biostatistics."

Similar presentations


Ads by Google