Published by Catherine Dean. Modified over 8 years ago.
1
Bandit Thinkhamrop, PhD. (Statistics)
Department of Biostatistics and Demography
Faculty of Public Health
Khon Kaen University, THAILAND
2
Statistical inference revisited
Statistical inference uses data from samples to make inferences about a population.
1. Estimate the population parameter: characterized by a confidence interval for the magnitude of the effect of interest.
2. Test a hypothesis formulated before looking at the data: characterized by a p-value.
3
Sample: n = 25, mean X = 52, SD = 5
[Diagram: a sample drawn from the population, with inference back to the population by parameter estimation (95% CI) and hypothesis testing (p-value).]
4
Parameter estimation
Sample: n = 25, mean X = 52, SD = 5, SE = SD/√n = 5/√25 = 1
95% CI: 52 − 1.96(1) to 52 + 1.96(1), i.e. 50.04 to 53.96
We are 95% confident that the population mean lies between 50.04 and 53.96.
Z critical values: 2.58 (99% confidence), 1.96 (95%), 1.64 (90%)
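The interval on this slide can be reproduced from the summary statistics alone. The sketch below is one way to do it in Python, assuming the normal (Z) approximation used on the slide; the function name `z_confidence_interval` is illustrative and not part of the original material.

```python
from math import sqrt
from statistics import NormalDist


def z_confidence_interval(mean, sd, n, level=0.95):
    """Normal-approximation confidence interval for a mean from summary statistics."""
    se = sd / sqrt(n)  # standard error of the mean
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)  # about 1.96 for a 95% interval
    return mean - z * se, mean + z * se


# Slide values: n = 25, mean = 52, SD = 5
lower, upper = z_confidence_interval(52, 5, 25)
print(round(lower, 2), round(upper, 2))
```

Changing `level` to 0.99 or 0.90 uses the other two critical values listed on the slide.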
5
Hypothesis testing
Sample: n = 25, mean X = 52, SD = 5, SE = 1
H0: μ = 55
HA: μ ≠ 55
Z = (55 − 52)/1 = 3
6
Hypothesis testing
H0: μ = 55
HA: μ ≠ 55
Z = (55 − 52)/1 = 3
P-value = 1 − 0.9973 = 0.0027
If the true mean in the population is 55, the chance of obtaining a sample mean of 52, or one more extreme, is 0.0027.
[Figure: normal curve centered at 55, with the sample mean 52 and the points −3SE and +3SE marked.]
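As a check on the arithmetic, the two-sided p-value can be computed directly from the standard normal distribution. This is a minimal sketch using Python's standard library; `two_sided_z_p_value` is a hypothetical helper name, and the sign of Z is taken as (sample mean − null mean), so it comes out as −3 rather than the slide's +3.

```python
from statistics import NormalDist


def two_sided_z_p_value(sample_mean, null_mean, se):
    """Z statistic and two-sided p-value under the normal approximation."""
    z = (sample_mean - null_mean) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))  # area in both tails beyond |z|
    return z, p_value


# Slide values: sample mean 52, H0 mean 55, SE = 1
z, p_value = two_sided_z_p_value(52, 55, 1)
print(z, round(p_value, 4))
```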
7
Calculation of the previous example based on the t-distribution
Stata command to find the probability:
. di (ttail(24, 3))*2
.00620574
Stata command to find the t value for the 95% confidence level:
. di (invttail(24, 0.025))
2.0638986
Web-based statistical tables: http://vassarstats.net/tabs.html or www.stattrek.com
8
Revisit the example based on the t-distribution (Stata output)

1. Estimate the population parameter

    Variable |   Obs     Mean   Std. Err.   [95% Conf. Interval]
-------------+---------------------------------------------------
             |    25       52          1     49.9361    54.0639

2. Test the hypothesis formulated before looking at the data

One-sample t test
------------------------------------------------------------------------------
         |   Obs     Mean   Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
       x |    25       52          1           5     49.9361    54.0639
------------------------------------------------------------------------------
    mean = mean(x)                                   t = -3.0000
Ho: mean = 55                       degrees of freedom =      24

Ha: mean != 55:  Pr(|T| > |t|) = 0.0062
Ha: mean > 55:   Pr(T > t) = 0.9969
9
Mean one group: T-test
1. Hypothesis
   H0: μ = 0
   Ha: μ ≠ 0
2. Data
   a: 1, 2, 2, 5, 5
3. Calculate the t-statistic
4. Obtain the p-value from the t-distribution
5. Make a decision
   P-value = 0.023
   Reject the null hypothesis at the 0.05 level of significance.
   The mean of y is statistically significantly different from zero.
Stata command:
. di (ttail(4, 3.59))*2
.02296182
10
Mean one group: T-test (cont.)
One-sample t test
------------------------------------------------------------------------------
Variable |   Obs     Mean   Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
       y |     5        3     .83666    1.870829    .6770594    5.322941
------------------------------------------------------------------------------
    mean = mean(y)                                   t =  3.5857
Ho: mean = 0                        degrees of freedom =       4

Ha: mean != 0:  Pr(|T| > |t|) = 0.0231
Ha: mean > 0:   Pr(T > t) = 0.0115
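The t-statistic in this output follows from the five data values. The sketch below recomputes the standard error, t and degrees of freedom with Python's standard library; the p-value is left to the Stata `ttail` call or a t table, since the standard library has no t-distribution. The function name `one_sample_t` is an assumption made for illustration.

```python
from math import sqrt
from statistics import mean, stdev


def one_sample_t(values, null_mean=0.0):
    """t statistic, degrees of freedom and standard error for a one-sample t-test."""
    n = len(values)
    se = stdev(values) / sqrt(n)  # sample SD (n - 1 denominator) over sqrt(n)
    t = (mean(values) - null_mean) / se
    return t, n - 1, se


# Data from the slide: 1, 2, 2, 5, 5 tested against H0: mean = 0
t, df, se = one_sample_t([1, 2, 2, 5, 5])
print(round(t, 4), df, round(se, 5))
```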
11
Comparing 2 means: T-test
1. Hypothesis
   H0: μA = μB
   Ha: μA ≠ μB
2. Data
   a   b
   1   5
   2   9
   2   9
   5   8
   5   9
3. Calculate the t-statistic
4. Obtain the p-value from the t-distribution
5. Make a decision
   P-value = 0.002 (http://vassarstats.net/tabs.html)
   Reject the null hypothesis at the 0.05 level of significance.
   The mean of Group A is statistically significantly different from that of Group B.
12
T-test
Two-sample t test with equal variances
------------------------------------------------------------------------------
   Group |   Obs     Mean   Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
       a |     5        3     .83666    1.870829    .6770594    5.322941
       b |     5        8   .7745967    1.732051    5.849375    10.15063
---------+--------------------------------------------------------------------
combined |    10      5.5   .9916317    3.135815    3.256773    7.743227
---------+--------------------------------------------------------------------
    diff |             -5   1.140175               -7.629249   -2.370751
------------------------------------------------------------------------------
    diff = mean(1) - mean(2)                         t = -4.3853
Ho: diff = 0                        degrees of freedom =       8

Ha: diff != 0:  Pr(|T| > |t|) = 0.0023
Ha: diff > 0:   Pr(T > t) = 0.9988
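The `diff` row can be recomputed by hand with the pooled (equal-variance) formula. This is a small Python sketch under that equal-variance assumption; `pooled_t` is an illustrative name, and only the statistic is computed, not the p-value.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t(group_a, group_b):
    """Difference in means, its SE, t and df for a two-sample t-test with equal variances."""
    n_a, n_b = len(group_a), len(group_b)
    df = n_a + n_b - 2
    # Pooled variance: weighted average of the two sample variances
    pooled_var = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / df
    se = sqrt(pooled_var * (1 / n_a + 1 / n_b))
    diff = mean(group_a) - mean(group_b)
    return diff, se, diff / se, df


# Data from the slides
a = [1, 2, 2, 5, 5]
b = [5, 9, 9, 8, 9]
diff, se, t, df = pooled_t(a, b)
print(diff, round(se, 6), round(t, 4), df)
```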
13
Mann-Whitney U test (Wilcoxon rank-sum test)
Two-sample Wilcoxon rank-sum (Mann-Whitney) test

       group |   obs   rank sum   expected
-------------+-----------------------------
           1 |     5         16       27.5
           2 |     5         39       27.5
-------------+-----------------------------
    combined |    10         55         55

unadjusted variance      22.92
adjustment for ties      -1.25
                      ----------
adjusted variance        21.67

Ho: y(group==1) = y(group==2)
             z =  -2.471
    Prob > |z| =   0.0135
14
Comparing 2 means: ANOVA
Mathematical model of ANOVA
   X = Grand mean + Treatment effect + Error
   X = M + T + E
3. Calculate the F-statistic
4. Obtain the p-value from the F-distribution
5. Make a decision
   P-value = 0.002 (http://vassarstats.net/tabs.html)
   Reject the null hypothesis at the 0.05 level of significance.
   The mean of Group A is statistically significantly different from that of Group B.
[Diagram: decomposition X = M + T + E. Group means: 3 and 8. Treatment effects: [3 − 5.5] and [8 − 5.5]. Sums of squares: SST (between groups) and SSE (within groups). Degrees of freedom: M = 1, T = 1 (between groups), E = 8 (within groups).]
15
ANOVA 2 groups
                        Analysis of Variance
    Source              SS         df      MS            F     Prob > F
------------------------------------------------------------------------
Between groups         62.5         1         62.5     19.23     0.0023
 Within groups           26         8         3.25
------------------------------------------------------------------------
    Total              88.5         9   9.83333333

Bartlett's test for equal variances:  chi2(1) = 0.0211  Prob>chi2 = 0.885
16
Comparing 3 means: ANOVA
1. Hypothesis
   H0: μA = μB = μC
   Ha: At least one mean is different
2. Data
   a   b   c
   1   5   4
   2   9   4
   2   9   6
   5   8   8
   5   9   4
17
ANOVA 3 groups (cont.)
Mathematical model of ANOVA
   X = Grand mean + Treatment effect + Error
   X = M + T + E
3. Calculate the F-statistic
4. Obtain the p-value from the F-distribution
5. Make a decision
   P-value = 0.003 (http://vassarstats.net/tabs.html)
   Reject the null hypothesis at the 0.05 level of significance.
   At least one mean of the three groups is statistically significantly different from the others.
[Diagram: decomposition X = M + T + E. Group means: 3, 8 and 5.2. Treatment effects: [3 − 5.4], [8 − 5.4] and [5.2 − 5.4]. Sums of squares: SST (between groups) and SSE (within groups). Degrees of freedom: X = 15, M = 1, T = 2 (between groups), E = 12 (within groups).]
18
ANOVA 3 groups
                        Analysis of Variance
    Source              SS         df      MS            F     Prob > F
------------------------------------------------------------------------
Between groups         62.8         2         31.4      9.71     0.0031
 Within groups         38.8        12   3.23333333
------------------------------------------------------------------------
    Total             101.6        14   7.25714286

Bartlett's test for equal variances:  chi2(2) = 0.0217  Prob>chi2 = 0.989
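The sums of squares and F in the ANOVA tables can be rebuilt from the raw data using the decomposition X = M + T + E. The following Python sketch is one way to do that; `one_way_anova` is an illustrative name, and the p-value is left to an F table or Stata since the standard library has no F-distribution.

```python
from statistics import mean


def one_way_anova(*groups):
    """Between/within sums of squares, their df, and the F statistic."""
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    # Treatment (between-group) part: group mean minus grand mean, once per observation
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Error (within-group) part: observation minus its own group mean
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f = (ss_between / df_between) / (ss_within / df_within)
    return ss_between, ss_within, df_between, df_within, f


# Data from the slides
a = [1, 2, 2, 5, 5]
b = [5, 9, 9, 8, 9]
c = [4, 4, 6, 8, 4]
ss_b, ss_w, df_b, df_w, f3 = one_way_anova(a, b, c)
print(round(ss_b, 1), round(ss_w, 1), df_b, df_w, round(f3, 2))
```

Calling `one_way_anova(a, b)` with only the first two groups gives the two-group table, where F equals the square of the two-sample t-statistic.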
19
Kruskal-Wallis test
Kruskal-Wallis equality-of-populations rank test

  +------------------------+
  | group | Obs | Rank Sum |
  |-------+-----+----------|
  |     1 |   5 |    22.00 |
  |     2 |   5 |    61.50 |
  |     3 |   5 |    36.50 |
  +------------------------+

chi-squared =     7.985 with 2 d.f.
probability =     0.0185

chi-squared with ties =     8.190 with 2 d.f.
probability =     0.0167
20
Comparing 2 means: Regression
Original data
   a   b
   1   5
   2   9
   2   9
   5   8
   5   9
1. Data (x = 1 for group a, x = 2 for group b)
   x     y    (x−x̄)   (x−x̄)²   (y−ȳ)   (x−x̄)(y−ȳ)
   1     1    -0.5     0.25     -4.5      2.25
   1     2    -0.5     0.25     -3.5      1.75
   1     2    -0.5     0.25     -3.5      1.75
   1     5    -0.5     0.25     -0.5      0.25
   1     5    -0.5     0.25     -0.5      0.25
   2     5     0.5     0.25     -0.5     -0.25
   2     9     0.5     0.25      3.5      1.75
   2     9     0.5     0.25      3.5      1.75
   2     8     0.5     0.25      2.5      1.25
   2     9     0.5     0.25      3.5      1.75
   Mean  1.5   5.5
   Sum                 2.5               12.5
y = a + bx, where b = 12.5/2.5 = 5; then 5.5 = a + 5(1.5).
Thus a = 5.5 − 7.5 = −2.
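The hand calculation of a and b is ordinary least squares with a single predictor. A minimal Python sketch of the same steps follows; the function name `simple_linear_regression` is an assumption for illustration, and the group coding x = 1, 2 is taken from the slide.

```python
from statistics import mean


def simple_linear_regression(x, y):
    """Intercept a and slope b of the least-squares line y = a + b*x."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)  # sum of squared x deviations
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))  # sum of cross products
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar  # the line passes through (x_bar, y_bar)
    return intercept, slope


# Group a coded x = 1, group b coded x = 2
x = [1, 1, 1, 1, 1, 2, 2, 2, 2, 2]
y = [1, 2, 2, 5, 5, 5, 9, 9, 8, 9]
a, b = simple_linear_regression(x, y)
print(a, b)
```

With a two-level predictor the slope equals the difference between the two group means (8 − 3 = 5).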
21
Comparing 2 means: Regression
[Figure: scatter plot of Y (axis from −2 to 10) against group (a, b) for the data a: 1, 2, 2, 5, 5 and b: 5, 9, 9, 8, 9.]
22
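Comparing 2 means: Regression (cont.)
[Figure: the same scatter plot of Y (axis from −2 to 10) with the groups coded as x = 1 and x = 2.]
Data (x, y): (1, 1), (1, 2), (1, 2), (1, 5), (1, 5), (2, 5), (2, 9), (2, 9), (2, 8), (2, 9)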
23
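Comparing 2 means: Regression (cont.)
[Figure: scatter plot of Y (axis from −2 to 10) against x (1, 2) with the fitted line.]
y = a + bx
b: difference in y between x = 1 and x = 2
a: value of y when x = 0
Fitted line: y = −2 + 5x
   y = 3 if x = 1
   y = 8 if x = 2
   y = −2 if x = 0
The line passes through the means: y = 5.5 at x = 1.5.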
24
Regression model (2 means)
      Source |       SS       df       MS              Number of obs =      10
-------------+------------------------------           F(  1,     8) =   19.23
       Model |       62.5      1        62.5           Prob > F      =  0.0023
    Residual |         26      8        3.25           R-squared     =  0.7062
-------------+------------------------------           Adj R-squared =  0.6695
       Total |       88.5      9  9.83333333           Root MSE      =  1.8028

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       group |          5   1.140175     4.39   0.002     2.370751    7.629249
       _cons |         -2   1.802776    -1.11   0.299    -6.157208    2.157208
------------------------------------------------------------------------------
25
Regression model (3 means)
i.group           _Igroup_1-3         (naturally coded; _Igroup_1 omitted)

      Source |       SS       df       MS              Number of obs =      15
-------------+------------------------------           F(  2,    12) =    9.71
       Model |       62.8      2        31.4           Prob > F      =  0.0031
    Residual |       38.8     12  3.23333333           R-squared     =  0.6181
-------------+------------------------------           Adj R-squared =  0.5545
       Total |      101.6     14  7.25714286           Root MSE      =  1.7981

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
   _Igroup_2 |          5   1.137248     4.40   0.001     2.522149    7.477851
   _Igroup_3 |        2.2   1.137248     1.93   0.077    -.2778508    4.677851
       _cons |          3   .8041559     3.73   0.003     1.247895    4.752105
------------------------------------------------------------------------------
26
Correlation coefficient
Pearson product moment correlation
– Denoted by r (for the sample) or ρ (for the population)
– Requires the bivariate normal distribution assumption
– Requires a linear relationship
Spearman rank correlation
– For small samples; does not require the bivariate normal distribution assumption
27
Pearson product moment correlation
[Two alternative formulas for r, joined by "or", appeared as images on the original slide.]
Indeed, it is the mean of the product of the standard scores.
28
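Scatter plot
[Figure: scatter plot of b (vertical axis, 0 to 10) against a (horizontal axis, 1 to 5) for the pairs (a, b): (1, 5), (2, 9), (2, 9), (5, 8), (5, 9).]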
29
Calculation for correlation coefficient (r)
        [1] x   [2] y   [3] (x−x̄)/SD   [4] (y−ȳ)/SD   [3] × [4]
          1       5        -1.07           -1.73          1.85
          2       9        -0.53            0.58         -0.31
          2       9        -0.53            0.58         -0.31
          5       8         1.07            0.00          0.00
          5       9         1.07            0.58          0.62
Sum                                                       1.85
Mean      3       8
SD      1.87    1.73
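The table's columns translate directly into code: standardize x and y, multiply the standard scores, and average the products. In the sketch below the average uses n − 1, which is an assumption consistent with the sample SDs (1.87 and 1.73) shown in the table; `pearson_r` is an illustrative name.

```python
from statistics import mean, stdev


def pearson_r(x, y):
    """Pearson r as the (n - 1)-averaged product of standard scores."""
    x_bar, y_bar = mean(x), mean(y)
    sd_x, sd_y = stdev(x), stdev(y)  # sample SDs, n - 1 denominator
    products = [((xi - x_bar) / sd_x) * ((yi - y_bar) / sd_y) for xi, yi in zip(x, y)]
    return sum(products) / (len(x) - 1)


# Columns [1] and [2] of the table
x = [1, 2, 2, 5, 5]
y = [5, 9, 9, 8, 9]
r = pearson_r(x, y)
print(round(r, 4))
```

The sum of the products is the 1.85 shown in the table; dividing it by n − 1 = 4 gives r of about 0.46.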
30
Interpretation of correlation coefficient
Correlation   Negative          Positive
None          −0.09 to 0.00     0.00 to 0.09
Small         −0.30 to −0.10    0.10 to 0.30
Medium        −0.50 to −0.30    0.30 to 0.50
Strong        −1.00 to −0.50    0.50 to 1.00
These serve as a guide, not a strict rule. In fact, the interpretation of a correlation coefficient depends on the context and purposes.
From Wikipedia, the free encyclopedia
31
The correlation coefficient reflects the strength (noisiness) and direction of a linear relationship (top row), but not the slope of that relationship (middle row), nor many aspects of nonlinear relationships (bottom row). The figure in the center has a slope of 0, but in that case the correlation coefficient is undefined because the variance of Y is zero.
This is a file from the Wikimedia Commons.
32
Inference on correlation coefficient
Stata commands:
. di tanh(-0.885)
-.70891534
. di tanh(1.887)
.95511058
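The two `tanh` arguments appear to be the limits of a 95% confidence interval on the Fisher z scale. The Python sketch below shows how such limits could arise, assuming the standard Fisher transformation with SE = 1/√(n − 3) and the five (x, y) pairs from the correlation example; the function names are illustrative and the formulas themselves did not survive on the slide.

```python
from math import atanh, sqrt, tanh
from statistics import NormalDist, mean


def pearson_r(x, y):
    """Pearson correlation from sums of squares and cross products."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    return sxy / sqrt(sxx * syy)


def fisher_ci(r, n, level=0.95):
    """Confidence interval for a correlation via Fisher's z transformation."""
    z = atanh(r)  # Fisher z
    se = 1 / sqrt(n - 3)
    crit = NormalDist().inv_cdf(1 - (1 - level) / 2)
    z_low, z_high = z - crit * se, z + crit * se
    return z_low, z_high, tanh(z_low), tanh(z_high)  # back-transform to the r scale


r = pearson_r([1, 2, 2, 5, 5], [5, 9, 9, 8, 9])
z_low, z_high, r_low, r_high = fisher_ci(r, 5)
print(round(z_low, 3), round(z_high, 3), round(r_low, 3), round(r_high, 3))
```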
33
Stata command
. ci2 x y, corr spearman
Confidence interval for Spearman's rank correlation of x and y, based on Fisher's transformation.
Correlation = 0.354 on 5 observations (95% CI: -0.768 to 0.942)
Warning: This method may not give valid results with small samples (n <= 10) for rank correlations.
34
Inference on correlation coefficient
Or use the Stata command:
. di (ttail(3, 0.9))*2
.43445103
35
Inference on proportion
– One proportion
– Two proportions
– Three or more proportions
36
One proportion: Z-test
1. Hypothesis
   H0: π1 = 0
   Ha: π1 ≠ 0
2. Data
   y: 1, 0, 1, ..., 0
3. Calculate the z-statistic
   n_y = 50, p_y = 0.1
4. Obtain the p-value from the Z-distribution
5. Make a decision
   P-value = 0.018 (http://vassarstats.net/tabs.html)
   Reject the null hypothesis at the 0.05 level of significance.
   The proportion of Y is statistically significantly different from zero.
Stata command to get the p-value:
. di (1-normal(2.357))*2
.01842325
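The z value passed to `normal()` can be reproduced from n = 50 and p = 0.1. The sketch below assumes the standard error is estimated from the sample proportion, which is what the slide's numbers imply (0.1/0.0424 ≈ 2.357); `one_proportion_z` is a hypothetical helper name.

```python
from math import sqrt
from statistics import NormalDist


def one_proportion_z(p_hat, n, null_value=0.0):
    """z statistic and two-sided p-value, using the estimated standard error of p_hat."""
    se = sqrt(p_hat * (1 - p_hat) / n)
    z = (p_hat - null_value) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return se, z, p_value


# Slide values: n = 50, p = 0.1
se, z, p_value = one_proportion_z(0.1, 50)
print(round(se, 7), round(z, 3), round(p_value, 4))
```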
37
Comparing 2 proportions: Z-test
1. Hypothesis
   H0: π1 = π0
   Ha: π1 ≠ π0
2. Data
   x   y
   1   1
   1   0
   0   1
   ...
   1   0

            y
   x        0     1   Total
   0       45     5      50
   1       30    20      50
   Total   75    25     100
3. Calculate the z-statistic
   n0 = 50, p0 = 0.1
   n1 = 50, p1 = 0.4
4. Obtain the p-value from the Z-distribution
5. Make a decision
   P-value = 0.0005 (http://vassarstats.net/tabs.html)
   Reject the null hypothesis at the 0.05 level of significance.
   The proportions of Y in the two groups of x are statistically significantly different from each other.
38
Z-test for two proportions
Two-sample test of proportions                     0: Number of obs =       50
                                                   1: Number of obs =       50
------------------------------------------------------------------------------
    Variable |       Mean   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           0 |         .1   .0424264                      .0168458    .1831542
           1 |         .4    .069282                      .2642097    .5357903
-------------+----------------------------------------------------------------
        diff |        -.3   .0812404                     -.4592282   -.1407718
             |  under Ho:   .0866025    -3.46   0.001
------------------------------------------------------------------------------
        diff = prop(0) - prop(1)                                  z =  -3.4641
    Ho: diff = 0

Ha: diff > 0:  Pr(Z > z) = 0.9997
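The "under Ho" standard error and the z value come from pooling the two groups. Here is a short Python sketch of that calculation, assuming the pooled-proportion formula; `two_proportion_z` is an illustrative name.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z(events_0, n_0, events_1, n_1):
    """z test for two proportions using the pooled standard error under H0."""
    p_0, p_1 = events_0 / n_0, events_1 / n_1
    pooled = (events_0 + events_1) / (n_0 + n_1)  # common proportion under H0
    se_null = sqrt(pooled * (1 - pooled) * (1 / n_0 + 1 / n_1))
    z = (p_0 - p_1) / se_null
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))  # two-sided
    return se_null, z, p_value


# Slide values: 5 of 50 with y = 1 in group 0, 20 of 50 in group 1
se_null, z, p_value = two_proportion_z(5, 50, 20, 50)
print(round(se_null, 7), round(z, 4), round(p_value, 4))
```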
39
Comparing 2 proportions: Chi-square test
1. Hypothesis
   H0: π_ij = π_i+ × π_+j, where i = 0, 1; j = 0, 1
   Ha: π_ij ≠ π_i+ × π_+j
2. Data
            y
   x        0     1   Total
   0       45     5      50
   1       30    20      50
   Total   75    25     100
3. Calculate the χ²-statistic
    O    E                       (O−E)    (O−E)²   (O−E)²/E
   45    (75/100) × 50 = 37.50    7.50     56.25     1.50
    5    (25/100) × 50 = 12.50   -7.50     56.25     4.50
   30    (75/100) × 50 = 37.50   -7.50     56.25     1.50
   20    (25/100) × 50 = 12.50    7.50     56.25     4.50
   Chi-square (df = 1)                              12.00
4. Obtain the p-value from the chi-square distribution
5. Make a decision
   P-value = 0.001 (http://vassarstats.net/tabs.html)
   Reject the null hypothesis at the 0.05 level of significance.
   There is a statistically significant association between x and y.
40
Comparing 2 proportions: Chi-square test
           |           y
         x |         0          1 |     Total
-----------+----------------------+----------
         0 |        45          5 |        50
         1 |        30         20 |        50
-----------+----------------------+----------
     Total |        75         25 |       100

          Pearson chi2(1) =  12.0000   Pr = 0.001
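The Pearson chi-square can be recomputed from the 2×2 table by forming expected counts from the margins. The sketch below does this in Python; the p-value line relies on the fact that a chi-square variable with 1 df is the square of a standard normal, so it is valid for df = 1 only. The name `chi_square_2x2` is an assumption for illustration.

```python
from math import erfc, sqrt


def chi_square_2x2(table):
    """Pearson chi-square for a 2x2 table and its p-value (df = 1)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total  # E = row total x column total / N
            chi2 += (observed - expected) ** 2 / expected
    p_value = erfc(sqrt(chi2 / 2))  # upper tail of chi-square with 1 df only
    return chi2, p_value


# Rows are x = 0, 1; columns are y = 0, 1
chi2, p_value = chi_square_2x2([[45, 5], [30, 20]])
print(round(chi2, 4), round(p_value, 4))
```

For a 2×2 table this chi-square equals the square of the two-proportion z statistic (3.4641² = 12).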
41
. csi 20 5 30 45, or exact

                 |   Exposed   Unexposed  |      Total
-----------------+------------------------+------------
           Cases |        20           5  |         25
        Noncases |        30          45  |         75
-----------------+------------------------+------------
           Total |        50          50  |        100
                 |                        |
            Risk |        .4          .1  |        .25
                 |                        |
                 |      Point estimate    |    [95% Conf. Interval]
                 |------------------------+------------------------
 Risk difference |               .3       |    .1407718    .4592282
      Risk ratio |                4       |     1.62926    9.820408
 Attr. frac. ex. |              .75       |    .3862245    .8981712
 Attr. frac. pop |               .6       |
      Odds ratio |                6       |    2.086602    17.09265 (Cornfield)
                 +-------------------------------------------------
                                  1-sided Fisher's exact P = 0.0005
                                  2-sided Fisher's exact P = 0.0010
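The point estimates in this output are simple functions of the four cell counts. The Python sketch below recomputes the risk difference, risk ratio and odds ratio, plus a risk-ratio interval on the log scale; that log-scale formula is an assumption that happens to reproduce the interval shown, and `risk_measures` is an illustrative name. The Cornfield interval for the odds ratio is not reproduced here.

```python
from math import exp, log, sqrt
from statistics import NormalDist


def risk_measures(cases_exp, cases_unexp, noncases_exp, noncases_unexp, level=0.95):
    """Risk difference, risk ratio, odds ratio, and a log-scale CI for the risk ratio."""
    n_exp = cases_exp + noncases_exp
    n_unexp = cases_unexp + noncases_unexp
    risk_exp, risk_unexp = cases_exp / n_exp, cases_unexp / n_unexp
    rd = risk_exp - risk_unexp
    rr = risk_exp / risk_unexp
    odds_ratio = (cases_exp * noncases_unexp) / (cases_unexp * noncases_exp)
    crit = NormalDist().inv_cdf(1 - (1 - level) / 2)
    se_log_rr = sqrt(1 / cases_exp - 1 / n_exp + 1 / cases_unexp - 1 / n_unexp)
    rr_ci = (exp(log(rr) - crit * se_log_rr), exp(log(rr) + crit * se_log_rr))
    return rd, rr, odds_ratio, rr_ci


# Same argument order as the csi command: 20 5 30 45
rd, rr, odds_ratio, rr_ci = risk_measures(20, 5, 30, 45)
print(round(rd, 2), round(rr, 2), odds_ratio, [round(v, 3) for v in rr_ci])
```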
42
Binomial regression
. binreg y x, rr

Generalized linear models                          No. of obs      =       100
Optimization     : MQL Fisher scoring              Residual df     =        98
                   (IRLS EIM)                      Scale parameter =         1
Deviance         =  99.80946404                    (1/df) Deviance =  1.018464
Pearson          =  99.99966753                    (1/df) Pearson  =  1.020405

Variance function: V(u) = u*(1-u)                  [Bernoulli]
Link function    : g(u) = ln(u)                    [Log]

                                                   BIC             = -351.4972

------------------------------------------------------------------------------
             |                 EIM
           y | Risk Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |          4   1.833024     3.03   0.002     1.629265    9.820377
       _cons |         .1   .0424262    -5.43   0.000     .0435379    .2296851
------------------------------------------------------------------------------
43
Logistic regression
. logistic y x

Logistic regression                               Number of obs   =        100
                                                  LR chi2(1)      =      12.66
                                                  Prob > chi2     =     0.0004
Log likelihood = -49.904732                       Pseudo R2       =     0.1125

------------------------------------------------------------------------------
           y | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |          6   3.316625     3.24   0.001     2.030635    17.72844
       _cons |   .1111111   .0523783    -4.66   0.000      .044106    .2799096
------------------------------------------------------------------------------
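With a single binary predictor, the odds ratio and its interval from this output can be recovered from the 2×2 table. The sketch below assumes the usual Wald interval on the log-odds scale (SE of ln OR = √(1/a + 1/b + 1/c + 1/d)); `odds_ratio_wald_ci` is a hypothetical helper name.

```python
from math import exp, log, sqrt
from statistics import NormalDist


def odds_ratio_wald_ci(a, b, c, d, level=0.95):
    """Odds ratio (a*d)/(b*c), SE of its log, and a Wald CI on the log scale."""
    odds_ratio = (a * d) / (b * c)
    se_log_or = sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    crit = NormalDist().inv_cdf(1 - (1 - level) / 2)
    lower = exp(log(odds_ratio) - crit * se_log_or)
    upper = exp(log(odds_ratio) + crit * se_log_or)
    return odds_ratio, se_log_or, lower, upper


# Cells: x=1,y=1 (20); x=1,y=0 (30); x=0,y=1 (5); x=0,y=0 (45)
odds_ratio, se_log_or, lower, upper = odds_ratio_wald_ci(20, 30, 5, 45)
print(odds_ratio, round(odds_ratio * se_log_or, 4), round(lower, 4), round(upper, 3))
```

The standard error reported on the odds-ratio scale corresponds to the odds ratio multiplied by the SE of its log.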