Bandit Thinkhamrop, PhD. (Statistics) Department of Biostatistics and Demography Faculty of Public Health Khon Kaen University, THAILAND
Statistical inference revisited
Statistical inference uses data from samples to make inferences about a population:
1. Estimate the population parameter, characterized by a confidence interval for the magnitude of the effect of interest
2. Test a hypothesis formulated before looking at the data, characterized by a p-value
Sample: n = 25, X̄ = 52, SD = 5 → Population
Parameter estimation [95% CI]
Hypothesis testing [P-value]
Sample: n = 25, X̄ = 52, SD = 5, SE = SD/√n = 1 → Population
Parameter estimation, 95% CI: 52 − 1.96(1) to 52 + 1.96(1) = 50.04 to 53.96
We are 95% confident that the population mean lies between 50.04 and 53.96.
Z = 1.64 (90% CI), Z = 1.96 (95% CI), Z = 2.58 (99% CI)
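The confidence-interval arithmetic on this slide can be checked in Python (scipy assumed available):

```python
from scipy import stats

n, xbar, sd = 25, 52, 5          # summary statistics from the slide
se = sd / n**0.5                 # standard error = 5/sqrt(25) = 1
z = stats.norm.ppf(0.975)        # ~1.96 for a 95% CI
lo, hi = xbar - z * se, xbar + z * se
print(f"95% CI: {lo:.2f} to {hi:.2f}")   # 95% CI: 50.04 to 53.96
```

Swapping 0.975 for 0.95 or 0.995 reproduces the Z values 1.64 and 2.58 quoted for 90% and 99% intervals.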
Sample: n = 25, X̄ = 52, SD = 5, SE = 1 → Population
Hypothesis testing
H0: μ = 55   HA: μ ≠ 55
Z = (52 − 55) / SE
Hypothesis testing
H0: μ = 55   HA: μ ≠ 55
If the true mean in the population is 55, the chance of obtaining a sample mean of 52 or more extreme is:
Z = (52 − 55) / 1 = −3
P-value = 2 × P(Z > 3) ≈ 0.0027
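The two-sided p-value above can be verified in Python (scipy assumed available):

```python
from scipy import stats

xbar, mu0, se = 52, 55, 1        # values from the slide
z = (xbar - mu0) / se            # (52 - 55)/1 = -3
p = 2 * stats.norm.sf(abs(z))    # two-sided p-value from the standard normal
print(round(z, 2), round(p, 4))  # -3.0 0.0027
```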
Calculation of the previous example based on the t-distribution
Stata command to find the probability: . di (ttail(24, 3))*2
Stata command to find the t value for the 95% CI: . di invttail(24, 0.025)
Or use a web-based statistical table.
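The two Stata functions have direct scipy counterparts; a minimal sketch:

```python
from scipy import stats

# two-sided p for t = 3 with 24 df  (Stata: di (ttail(24, 3))*2)
p = 2 * stats.t.sf(3, df=24)
# critical t for a 95% CI with 24 df (Stata: di invttail(24, 0.025))
tcrit = stats.t.ppf(0.975, df=24)
print(round(p, 4), round(tcrit, 3))   # tcrit ≈ 2.064
```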
Revisit the example based on the t-distribution (Stata output)
1. Estimate the population parameter
2. Test the hypothesis formulated before looking at the data
One-sample t test
Variable | Obs Mean Std. Err. Std. Dev. [95% Conf. Interval]
x |
mean = mean(x)   t =
Ho: mean = 55   degrees of freedom = 24
Ha: mean ≠ 55   Pr(|T| > |t|) =   Pr(T > t) =
Mean, one group: t-test
1. Hypothesis: H0: μ = 0   Ha: μ ≠ 0
2. Data
3. Calculate the t-statistic
4. Obtain the p-value based on the t-distribution
5. Make a decision
P-value = ; reject the null hypothesis at the 0.05 significance level.
The mean of y is statistically significantly different from zero.
Stata command: . di (ttail(4, 3.59))*2
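The Stata command on this slide corresponds to the following scipy call, using the t statistic (3.59) and degrees of freedom (4) quoted above:

```python
from scipy import stats

t, df = 3.59, 4                  # t statistic and df from the slide
p = 2 * stats.t.sf(t, df)        # Stata: di (ttail(4, 3.59))*2
print(round(p, 3))               # below 0.05, so H0 is rejected
```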
Mean, one group: t-test (cont.)
One-sample t test
Variable | Obs Mean Std. Err. Std. Dev. [95% Conf. Interval]
y |
mean = mean(y)   t =
Ho: mean = 0   degrees of freedom = 4
Ha: mean ≠ 0   Pr(|T| > |t|) =   Pr(T > t) =
One sample t-test using SPSS Please do the data analysis using SPSS and paste the results here.
Comparing 2 means: t-test
1. Hypothesis: H0: μA = μB   Ha: μA ≠ μB
2. Data
3. Calculate the t-statistic
4. Obtain the p-value based on the t-distribution
5. Make a decision
P-value = ; reject the null hypothesis at the 0.05 significance level.
The mean of group A is statistically significantly different from that of group B.
t-test
Two-sample t test with equal variances
Group | Obs Mean Std. Err. Std. Dev. [95% Conf. Interval]
a |
b |
combined |
diff |
diff = mean(1) - mean(2)   t =
Ho: diff = 0   degrees of freedom = 8
Ha: diff ≠ 0   Pr(|T| > |t|) =   Pr(T > t) =
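A pooled (equal-variance) two-sample t-test like the one in this output can be sketched in Python; the data below are hypothetical (two groups of 5, so df = 5 + 5 − 2 = 8 as in the output):

```python
from scipy import stats

a = [1, 2, 3, 4, 5]              # hypothetical group A, mean 3
b = [6, 7, 8, 9, 10]             # hypothetical group B, mean 8
t, p = stats.ttest_ind(a, b)     # equal-variance (pooled) t-test by default
print(round(t, 2), round(p, 4))  # t = -5.0 for these made-up data
```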
Two independent sample t-test using SPSS Please do the data analysis using SPSS and paste the results here.
Mann-Whitney U test (Wilcoxon rank-sum test)
Two-sample Wilcoxon rank-sum (Mann-Whitney) test
group | obs rank sum expected
|
|
combined |
unadjusted variance
adjustment for ties
adjusted variance
Ho: y(group==1) = y(group==2)
z =
Prob > |z| =
Mann-Whitney U test Wilcoxon rank-sum test using SPSS Please do the data analysis using SPSS and paste the results here.
Comparing 2 means: ANOVA
Mathematical model of ANOVA: X = Grand mean + Treatment effect + Error, i.e., X = M + T + E
3. Calculate the F-statistic
4. Obtain the p-value based on the F-distribution
5. Make a decision
P-value = ; reject the null hypothesis at the 0.05 significance level.
The mean of group A is statistically significantly different from that of group B.
Group means: 3 and 8; grand mean M = 5.5; treatment effects T: [3 − 5.5] and [8 − 5.5]
SST (between groups) and SSE (within groups), each with its degrees of freedom
ANOVA 2 groups Analysis of Variance Source SS df MS F Prob > F Between groups Within groups Total Bartlett's test for equal variances: chi2(1) = Prob>chi2 = 0.885
Comparing 3 means: ANOVA
1. Hypothesis: H0: μA = μB = μC   Ha: at least one mean is different
2. Data (groups a, b, c)
ANOVA, 3 groups (cont.)
Mathematical model of ANOVA: X = Grand mean + Treatment effect + Error, i.e., X = M + T + E
3. Calculate the F-statistic
4. Obtain the p-value based on the F-distribution
5. Make a decision
P-value = ; reject the null hypothesis at the 0.05 significance level.
At least one mean of the three groups is statistically significantly different from the others.
Group means: 3, 8, and the third group mean; grand mean M = 5.4; treatment effects T: [3 − 5.4], [8 − 5.4], [ ]
SST (between groups) and SSE (within groups); df: between groups, within groups
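A one-way ANOVA of this shape can be sketched with scipy; the three groups of 5 below are hypothetical (giving df = 2 between and 12 within, as in the Stata output that follows):

```python
from scipy import stats

a = [1, 2, 3, 4, 5]              # hypothetical group, mean 3
b = [6, 7, 8, 9, 10]             # hypothetical group, mean 8
c = [4, 5, 6, 7, 8]              # hypothetical group, mean 6
F, p = stats.f_oneway(a, b, c)   # one-way ANOVA F test
print(round(F, 3), round(p, 4))
```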
ANOVA 3 groups Analysis of Variance Source SS df MS F Prob > F Between groups Within groups Total Bartlett's test for equal variances: chi2(2) = Prob>chi2 = 0.989
ANOVA 3 groups using SPSS Please do the data analysis using SPSS and paste the results here.
Kruskal-Wallis test Kruskal-Wallis equality-of-populations rank test | group | Obs | Rank Sum | | | | 1 | 5 | | | 2 | 5 | | | 3 | 5 | | chi-squared = with 2 d.f. probability = chi-squared with ties = with 2 d.f. probability =
Kruskal-Wallis test using SPSS Please do the data analysis using SPSS and paste the results here.
Comparing 2 means: Regression
Data columns: x, y, (x − x̄), (x − x̄)², (y − ȳ), (x − x̄)(y − ȳ), with their means and sums
y = a + bx, where b = 12.5/2.5 = 5; then 5.5 = a + 5(1.5), thus a = 5.5 − 7.5 = −2
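The slope and intercept formulas on this slide can be reproduced in Python; the y values below are hypothetical, chosen only so that the group means are 3 and 8 as on the slide (with binary-coded x, the slope depends only on the group means):

```python
import numpy as np

x = np.array([1]*5 + [2]*5)      # group A coded 1, group B coded 2
y = np.array([1, 2, 3, 4, 5,     # hypothetical group A, mean 3
              6, 7, 8, 9, 10])   # hypothetical group B, mean 8
b = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean())**2)
a = y.mean() - b * x.mean()      # 5.5 - 5*1.5 = -2
print(b, a)                      # 5.0 -2.0
```

The slope b = 5 is exactly the difference between the two group means (8 − 3), which is the point of the slide.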
Comparing 2 means: Regression
[Figure: regression line of y on x, showing intercept a and slope b]
Comparing 2 means: Regression (cont.)
[Figure: scatter of y against x]
Comparing 2 means: Regression (cont.)
y = a + bx
b = the difference in y between x = 1 and x = 2
a = the value of y when x = 0
y = 3 if x = 1; y = 8 if x = 2; y = −2 if x = 0
ȳ = 5.5 at x̄ = 1.5
Regression model (2 means) Source | SS df MS Number of obs = F( 1, 8) = Model | Prob > F = Residual | R-squared = Adj R-squared = Total | Root MSE = y | Coef. Std. Err. t P>|t| [95% Conf. Interval] group | _cons |
Regression model (3 means)
i.group _Igroup_1-3 (naturally coded; _Igroup_1 omitted)
Source | SS df MS   Number of obs =
F( 2, 12) = 9.71
Model |   Prob > F =
Residual |   R-squared =
Adj R-squared =
Total |   Root MSE =
y | Coef. Std. Err. t P>|t| [95% Conf. Interval]
_Igroup_2 |
_Igroup_3 |
_cons |
Correlation coefficient
Pearson product-moment correlation
– Denoted by r (for the sample) or ρ (for the population)
– Requires the bivariate normal distribution assumption
– Requires a linear relationship
Spearman rank correlation
– For small samples; does not require the bivariate normal distribution assumption
Regression model using SPSS Please do the data analysis using SPSS and paste the results here.
Pearson product-moment correlation
Indeed, it is the mean of the products of the standard scores.
Scatter plot
[Figure: scatter plots of y against x]
Calculation of the correlation coefficient (r)
Columns: [1] x, [2] y, [3] (x − x̄)/SDx, [4] (y − ȳ)/SDy, [3] × [4]
Sum of [3] × [4] = 1.85; means: x̄ = 3, ȳ = 8; SD
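The "mean of the products of standard scores" computation can be checked against numpy's built-in correlation; the paired data below are hypothetical, for illustration only:

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5])    # hypothetical paired data
y = np.array([2, 4, 5, 4, 5])
zx = (x - x.mean()) / x.std(ddof=1)   # standard scores using the sample SD
zy = (y - y.mean()) / y.std(ddof=1)
r_products = np.sum(zx * zy) / (len(x) - 1)   # mean of the products
r_builtin = np.corrcoef(x, y)[0, 1]
print(round(r_products, 4), round(r_builtin, 4))   # the two agree
```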
Interpretation of the correlation coefficient
Correlation | Negative | Positive
None | −0.09 to 0.00 | 0.00 to 0.09
Small | −0.30 to −0.10 | 0.10 to 0.30
Medium | −0.50 to −0.30 | 0.30 to 0.50
Strong | −1.00 to −0.50 | 0.50 to 1.00
These serve as a guide, not a strict rule. In fact, the interpretation of a correlation coefficient depends on the context and purposes.
From Wikipedia, the free encyclopedia
The correlation coefficient reflects the noisiness and direction of a linear relationship (top row), but not the slope of that relationship (middle), nor many aspects of nonlinear relationships (bottom). The figure in the center has a slope of 0, but in that case the correlation coefficient is undefined because the variance of Y is zero. This is a file from the Wikimedia Commons.
Inference on the correlation coefficient
Stata commands: . di tanh(-0.885)   . di tanh(1.887)
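These commands back-transform Fisher-z confidence limits to the correlation scale; the same computation in Python:

```python
import numpy as np

# back-transform Fisher-z CI limits to the correlation scale
# (Stata: di tanh(-0.885) and di tanh(1.887))
lo = np.tanh(-0.885)
hi = np.tanh(1.887)
print(round(lo, 3), round(hi, 3))   # -0.709 0.955
```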
Stata command ci2 x y, corr spearman Confidence interval for Spearman's rank correlation of x and y, based on Fisher's transformation. Correlation = on 5 observations (95% CI: to 0.942) Warning: This method may not give valid results with small samples (n<= 10) for rank correlations.
Inference on the correlation coefficient
Or use the Stata command: . di (ttail(3, 0.9))*2
Inference on correlation coefficient using SPSS Please do the data analysis using SPSS and paste the results here.
Inference on proportion One proportion Two proportions Three or more proportions
One proportion: Z-test
1. Hypothesis: H0: π = π0   Ha: π ≠ π0
2. Data
3. Calculate the z-statistic: n = 50, p̂ =
4. Obtain the p-value based on the Z-distribution
5. Make a decision
P-value = ; reject the null hypothesis at the 0.05 significance level.
The proportion of Y is statistically significantly different from the hypothesized value.
Stata command to get the p-value: . di (1-normal(2.357))*2
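The Stata command on this slide corresponds to the following scipy call, using the z statistic (2.357) quoted in it:

```python
from scipy import stats

z = 2.357                        # z statistic from the slide
p = 2 * stats.norm.sf(z)         # Stata: di (1-normal(2.357))*2
print(round(p, 4))               # below 0.05, so H0 is rejected
```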
Comparing 2 proportions: Z-test
1. Hypothesis: H0: π1 = π0   Ha: π1 ≠ π0
2. Data:
x \ y | 0 | 1 | Total
0 | 45 | 5 | 50
1 | 30 | 20 | 50
Total | 75 | 25 | 100
3. Calculate the z-statistic: n0 = 50, p0 = 0.1; n1 = 50, p1 = 0.4
4. Obtain the p-value based on the Z-distribution
5. Make a decision
P-value = ; reject the null hypothesis at the 0.05 significance level.
The proportion of Y is statistically significantly different between the two groups of X.
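The pooled two-proportion z statistic for these data can be sketched as:

```python
from math import sqrt

n0, p0 = 50, 0.1                 # group x = 0
n1, p1 = 50, 0.4                 # group x = 1
pbar = (n0*p0 + n1*p1) / (n0 + n1)            # pooled proportion = 0.25
se = sqrt(pbar * (1 - pbar) * (1/n0 + 1/n1))  # SE under H0
z = (p1 - p0) / se
print(round(z, 2))               # 3.46
```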
Z-test for two proportions
Two-sample test of proportions
0: Number of obs = 50
1: Number of obs = 50
Variable | Mean Std. Err. z P>|z| [95% Conf. Interval]
0 |
1 |
diff |
| under Ho:
diff = prop(0) - prop(1)   z =
Ho: diff = 0
Ha: diff ≠ 0   Pr(|Z| > |z|) =
Comparing 2 proportions: chi-square test
1. Hypothesis: H0: πij = πi+ π+j, where i = 0, 1; j = 0, 1   Ha: πij ≠ πi+ π+j
2. Data:
x \ y | 0 | 1 | Total
0 | 45 | 5 | 50
1 | 30 | 20 | 50
Total | 75 | 25 | 100
3. Calculate the χ²-statistic:
O | E | (O−E) | (O−E)² | (O−E)²/E
45 | (75/100) × 50 = 37.5 | 7.5 | 56.25 | 1.50
5 | (25/100) × 50 = 12.5 | −7.5 | 56.25 | 4.50
30 | (75/100) × 50 = 37.5 | −7.5 | 56.25 | 1.50
20 | (25/100) × 50 = 12.5 | 7.5 | 56.25 | 4.50
Chi-square (df = 1) = 12.00
4. Obtain the p-value based on the chi-square distribution
5. Make a decision
P-value = ; reject the null hypothesis at the 0.05 significance level.
There is a statistically significant association between x and y.
Comparing 2 proportions: chi-square test
| y
x | 0 1 | Total
0 | 45 5 | 50
1 | 30 20 | 50
Total | 75 25 | 100
Pearson chi2(1) = 12.0000   Pr = 0.001
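The Pearson chi-square for this 2×2 table can be reproduced in Python (`correction=False` matches the uncorrected Pearson statistic reported by Stata):

```python
from scipy.stats import chi2_contingency

table = [[45, 5],                # x = 0: y = 0, y = 1
         [30, 20]]               # x = 1: y = 0, y = 1
chi2, p, dof, expected = chi2_contingency(table, correction=False)
print(round(chi2, 2), dof, round(p, 4))   # chi2 = 12.0 with 1 df
```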
. csi 20 5 30 45, or exact
| Exposed Unexposed | Total
Cases | 20 5 | 25
Noncases | 30 45 | 75
Total | 50 50 | 100
Risk | .4 .1 | .25
| Point estimate | [95% Conf. Interval]
Risk difference | .3 |
Risk ratio | 4 |
Attr. frac. ex. | .75 |
Attr. frac. pop | .6 |
Odds ratio | 6 | (Cornfield)
1-sided Fisher's exact P =
2-sided Fisher's exact P =
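The risk ratio and odds ratio in this output follow directly from the 2×2 cell counts; a minimal check in Python:

```python
# 2x2 table from the csi output: exposed/unexposed by cases/noncases
a, b = 20, 5                     # cases: exposed, unexposed
c, d = 30, 45                    # noncases: exposed, unexposed

risk_exp, risk_unexp = a/(a+c), b/(b+d)    # 0.4 and 0.1, as in the Risk row
rr = risk_exp / risk_unexp                 # risk ratio = 4
odds_ratio = (a*d) / (b*c)                 # cross-product odds ratio = 6
print(rr, odds_ratio)
```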
Binomial regression. binreg y x, rr Generalized linear models No. of obs = 100 Optimization : MQL Fisher scoring Residual df = 98 (IRLS EIM) Scale parameter = 1 Deviance = (1/df) Deviance = Pearson = (1/df) Pearson = Variance function: V(u) = u*(1-u) [Bernoulli] Link function : g(u) = ln(u) [Log] BIC = | EIM y | Risk Ratio Std. Err. z P>|z| [95% Conf. Interval] x | _cons |
Logistic regression. logistic y x Logistic regression Number of obs = 100 LR chi2(1) = Prob > chi2 = Log likelihood = Pseudo R2 = y | Odds Ratio Std. Err. z P>|z| [95% Conf. Interval] x | _cons |