Published by Erik Warren. Modified over 8 years ago.
1
2014.3.25 Medical Statistics 9, Tao Yuchun, http://cc.jlu.edu.cn/ms.html
2
Statistical Analysis of Enumeration Data. 2. Statistical inference for enumeration data
3
9.1 Sampling error of frequency
Example: Suppose the death rate is 0.2 when rats are fed a kind of poison. What will happen when we do the experiment on n = 1, 2, 3 or 4 rat(s)?
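The question above can be explored numerically. The following is a minimal sketch, assuming each rat dies independently with probability 0.2, so that the number of deaths follows a binomial distribution; the function name `death_count_distribution` is illustrative, not from the slides.

```python
from math import comb

def death_count_distribution(n, rate=0.2):
    """Return [P(X = 0), ..., P(X = n)] for the number of deaths X among n rats."""
    return [comb(n, k) * rate**k * (1 - rate)**(n - k) for k in range(n + 1)]

for n in (1, 2, 3, 4):
    probs = death_count_distribution(n)
    # The observed frequency k/n can only take the values 0, 1/n, ..., 1,
    # so it rarely equals the true rate 0.2 exactly: this is sampling error.
    print(n, [(f"{k}/{n}", round(p, 4)) for k, p in enumerate(probs)])
```

With so few animals the observed frequency jumps between a handful of possible values, which is why the sample frequency has to be treated as a random variable.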
5
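In general, suppose the population proportion is $\pi$ and the sample size is $n$. The sample frequency $p$ is a random variable with standard error $\sigma_p = \sqrt{\pi(1-\pi)/n}$. When $\pi$ is unknown and $n$ is big enough, $\sigma_p$ is approximately equal to $S_p = \sqrt{p(1-p)/n}$.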
6
Example 9-1: HBV surface antigen. 200 people were tested; 7 were positive.
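As a worked illustration of this example, here is a short sketch, assuming the large-sample estimate of the standard error $S_p = \sqrt{p(1-p)/n}$; the helper name `proportion_standard_error` is hypothetical.

```python
from math import sqrt

def proportion_standard_error(positives, n):
    """Return the observed frequency p and its estimated standard error S_p."""
    p = positives / n
    return p, sqrt(p * (1 - p) / n)

# Example 9-1: 7 positives among 200 people tested
p, s_p = proportion_standard_error(7, 200)
print(f"p = {p:.3f}, S_p = {s_p:.4f}")
```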
7
In theory: if the sample size $n$ is big enough and the observed frequency is $p$, then we have approximately $(p - \pi)/S_p \sim N(0, 1)$.
8
9.2 Confidence Interval of Probability
If the sample size $n$ is big enough and the observed frequency is $p$, then
95% confidence interval: $p \pm 1.96\,S_p$
99% confidence interval: $p \pm 2.58\,S_p$
9
Example 9-2: HBV surface antigen. 200 people were tested; 7 were positive. Calculate the confidence interval for $\pi$.
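The calculation asked for here can be sketched as follows, assuming the normal-approximation interval $p \pm z\,S_p$ with $z = 1.96$ for 95% and $z = 2.58$ for 99%; `normal_approx_ci` is an illustrative name.

```python
from math import sqrt

def normal_approx_ci(positives, n, z=1.96):
    """Normal-approximation confidence interval for a population proportion."""
    p = positives / n
    s_p = sqrt(p * (1 - p) / n)  # estimated standard error of p
    return p - z * s_p, p + z * s_p

# Example 9-2: 7 positives among 200 people tested
low95, high95 = normal_approx_ci(7, 200, z=1.96)
low99, high99 = normal_approx_ci(7, 200, z=2.58)
print(f"95% CI: ({low95:.4f}, {high95:.4f})")
print(f"99% CI: ({low99:.4f}, {high99:.4f})")
```

Here $np = 7 > 5$, so the large-sample condition is only just met; the 99% interval is wider than the 95% interval, as expected.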
10
Distinguish between $\mu$ and $\pi$ for sampling error and confidence interval.
11
9.3 The hypothesis testing of proportion (Z test)
(1) Comparison of sample proportion and population proportion (one-sample Z test)
Example 9-3: Cerebral infarction

Method        Cases   Cure rate
New method    98      50%
Routine               30%

50% is the sample proportion, $p = 50\%$. 30% is the population proportion, $\pi_0 = 30\%$.
12
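Hypotheses and $\alpha$: $H_0: \pi = \pi_0$, $H_1: \pi \neq \pi_0$, $\alpha = 0.05$.
Statistic Z: $Z = \dfrac{p - \pi_0}{\sqrt{\pi_0(1-\pi_0)/n}}$
Decision rule: If $|Z| \ge Z_\alpha$, then reject $H_0$; otherwise, no reason to reject $H_0$ (accept $H_0$).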
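The one-sample test can be sketched in a few lines, assuming the statistic $Z = (p - \pi_0)/\sqrt{\pi_0(1-\pi_0)/n}$ and the two-sided critical value 1.96; the function name `one_sample_z` is illustrative.

```python
from math import sqrt

def one_sample_z(p, pi0, n):
    """Z statistic comparing a sample proportion p with a given population proportion pi0."""
    return (p - pi0) / sqrt(pi0 * (1 - pi0) / n)

# Example 9-3: new method, 98 cases, cure rate 50%; routine cure rate 30%
z = one_sample_z(p=0.50, pi0=0.30, n=98)
reject_h0 = abs(z) >= 1.96  # two-sided test at alpha = 0.05
print(f"Z = {z:.2f}, reject H0: {reject_h0}")
```

Note that the standard error uses $\pi_0$, not $p$, because the statistic is evaluated under $H_0$.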
13
$Z_\alpha$ at $\alpha = 0.05$: two sides, $Z_{0.05} = 1.96$; one side, $Z_{0.05} = 1.645$.
Since $|Z| = 4.32 > Z_{0.05} = 1.96$, reject $H_0$. The new method is better than the routine.
(2) Comparison of two sample proportions (two-sample Z test)
Example 9-4: Carrier rate of hepatitis. In B City: 522 people were tested, 24 carriers, $p_1 = 4.60\%$ (population carrier rate: $\pi_1$); in Countryside: 478 people were tested, 33 carriers, $p_2 = 6.90\%$ (population carrier rate: $\pi_2$).
14
$H_0: \pi_1 = \pi_2$, $H_1: \pi_1 \neq \pi_2$, $\alpha = 0.05$
15
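Statistic Z: $Z = \dfrac{p_1 - p_2}{S_{p_1-p_2}}$, with $S_{p_1-p_2} = \sqrt{p_c(1-p_c)\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}$ and $p_c = \dfrac{X_1 + X_2}{n_1 + n_2}$.
Here $X_1$ and $X_2$ are the numbers of carriers in the two samples, $p_c$ is the pooled estimate of the two sample proportions, and $S_{p_1-p_2}$ is the standard error of $p_1 - p_2$.
Decision rule: If $|Z| \ge Z_\alpha$, then reject $H_0$; otherwise, no reason to reject $H_0$ (accept $H_0$).
Since $|Z| = 1.565 < Z_{0.05} = 1.96$, do not reject $H_0$. There is no evidence that the population carrier rate in B City differs from that in Countryside ($\pi_1 = \pi_2$).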
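A minimal sketch of the two-sample test follows, assuming the pooled-proportion standard error $\sqrt{p_c(1-p_c)(1/n_1 + 1/n_2)}$; the function name `two_sample_z` and its argument names are illustrative.

```python
from math import sqrt

def two_sample_z(x1, n1, x2, n2):
    """Z statistic comparing two sample proportions x1/n1 and x2/n2."""
    p1, p2 = x1 / n1, x2 / n2
    p_c = (x1 + x2) / (n1 + n2)  # pooled estimate under H0
    se = sqrt(p_c * (1 - p_c) * (1 / n1 + 1 / n2))  # standard error of p1 - p2
    return (p1 - p2) / se

# Example 9-4: B City 24 carriers of 522, Countryside 33 carriers of 478
z = two_sample_z(24, 522, 33, 478)
reject_h0 = abs(z) >= 1.96
print(f"|Z| = {abs(z):.3f}, reject H0: {reject_h0}")
```

Computed without intermediate rounding, $|Z|$ comes out near 1.57; the slide's 1.565 presumably reflects rounded intermediate values, and the conclusion is the same.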
16
Summary: The parameter estimation and hypothesis testing of proportions are based on the normal approximation (when the sample size is big enough). How big is enough? By experience, $n\pi > 5$ and $n(1-\pi) > 5$. For a sample: $np > 5$ and $n(1-p) > 5$. If the sample size is not big, the Z test can't be used, and there is no t-test for proportions (see a more detailed textbook).
17
9.4 Chi-square test
The Z test can only be used for comparing $\pi$ with a given $\pi_0$ (one sample) or comparing $\pi_1$ with $\pi_2$ (two samples). If we need to compare more than two samples, the chi-square test is widely used.
18
(1) Basic idea of the $\chi^2$ test
Given a set of actual frequencies $A_1, A_2, A_3, \dots$, test whether the data follow a certain theory. If the theory is true, then we will have a set of theoretical frequencies $T_1, T_2, T_3, \dots$ Compare $A_1, A_2, A_3, \dots$ with $T_1, T_2, T_3, \dots$: if they are quite different, the theory might not be true; otherwise, the theory is acceptable.
19
(2) Chi-square test for 2×2 table
Example 9-5: Acute lower respiratory infection (theoretical frequencies in parentheses)

Treatment   Effect           Non-effect      Total      Effect rate
Drug A      68 (64.82) a     6 (9.18) b      74 (a+b)   91.89%
Drug B      52 (55.18) c     11 (7.82) d     63 (c+d)   82.54%
Total       120 (a+c)        17 (b+d)        137        87.59%

$H_0: \pi_1 = \pi_2$, $H_1: \pi_1 \neq \pi_2$, $\alpha = 0.05$. Here $\pi_1$ is the population effect rate for drug A, and $\pi_2$ is the population effect rate for drug B.
20
To calculate the theoretical frequencies: if $H_0$ is true, $\pi_1 = \pi_2 \approx 120/137$, so
$T_{11} = 74 \times 120/137 = 64.82$, $T_{21} = 63 \times 120/137 = 55.18$,
$T_{12} = 74 \times 17/137 = 9.18$, $T_{22} = 63 \times 17/137 = 7.82$.
To compare A and T by a statistic $\chi^2$: $\chi^2 = \sum \dfrac{(A - T)^2}{T}$
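The two steps on this slide can be sketched in code, assuming the usual Pearson form $\chi^2 = \sum (A - T)^2 / T$ with each theoretical frequency equal to row total times column total divided by $n$; the function names are illustrative.

```python
def expected_frequencies(table):
    """Theoretical frequencies T = row total * column total / n for each cell."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

def pearson_chi_square(table):
    """Sum of (A - T)^2 / T over all cells of the table."""
    expected = expected_frequencies(table)
    return sum((a - t) ** 2 / t
               for row_a, row_t in zip(table, expected)
               for a, t in zip(row_a, row_t))

# Example 9-5: rows are Drug A and Drug B, columns are effect and non-effect
observed = [[68, 6], [52, 11]]
T = expected_frequencies(observed)
chi2 = pearson_chi_square(observed)
print([[round(t, 2) for t in row] for row in T], round(chi2, 3))
```

Using unrounded theoretical frequencies gives a statistic near 2.74; the value 2.734 quoted for this example results from rounding each T to two decimals first.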
21
The chi-square test was invented by Karl Pearson (1857 - 1936) and is also called Pearson's chi-square test. If $H_0$ is true, $\chi^2$ follows a chi-square distribution with $\nu = (\text{row} - 1)(\text{column} - 1)$ degrees of freedom. If the $\chi^2$ value is big enough, we doubt $H_0$ and then reject $H_0$!
22
For Example 9-5: $\nu = (\text{row} - 1)(\text{column} - 1) = (2-1)(2-1) = 1$, $\chi^2_{\alpha(\nu)} = \chi^2_{0.05(1)} = 3.84$. Now $\chi^2 = 2.734 < 3.84$, so $P > 0.05$ and $H_0$ is not rejected. We have no reason to say the effects of the two treatments are different.
Question: What is $\chi^2_{\alpha(\nu)}$? Why does $\chi^2 < \chi^2_{0.05}$ mean $P > 0.05$?
23
[Figure: the $\chi^2$ curves for different $\nu$ ($\nu = 3$, $\nu = 5$, $\nu = 10$, $\nu = 30$)]
The chi-square distribution is a distribution for a continuous variable. It has one parameter, $\nu$ (degrees of freedom), which determines the shape of the $\chi^2$ curve. The area under the $\chi^2$ curve represents probability.
24
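The table for the $\chi^2$ distribution: the $\chi^2$ critical value is denoted $\chi^2_{\alpha(\nu)}$, where $\alpha$ is a probability and $\nu$ is the degrees of freedom. The area under the $\chi^2$ curve means [for $\chi^2_{0.05(1)}$]: the area to the right of $\chi^2_{0.05(1)} = 3.84$ is 0.05.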
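The meaning of the tabulated critical value can be checked numerically. This is a sketch for $\nu = 1$ only, relying on the fact that a chi-square variable with one degree of freedom is the square of a standard normal variable, so its upper-tail area can be written with the complementary error function; the function name is illustrative.

```python
from math import erfc, sqrt

def chi2_upper_tail_df1(x):
    """P(chi-square with 1 degree of freedom > x), i.e. P(|Z| > sqrt(x)) for standard normal Z."""
    return erfc(sqrt(x / 2))

# Area to the right of the critical value 3.84 should be close to alpha = 0.05
print(round(chi2_upper_tail_df1(3.84), 4))
# Area to the right of the observed statistic 2.734 from Example 9-5
print(round(chi2_upper_tail_df1(2.734), 4))
```

A statistic smaller than the critical value leaves a larger area to its right, which is why $\chi^2 < 3.84$ corresponds to $P > 0.05$.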
25
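For a 2×2 table, there is a specific formula for the chi-square calculation: $\chi^2 = \dfrac{(ad - bc)^2\, n}{(a+b)(c+d)(a+c)(b+d)}$
For Example 9-5: $\chi^2 = \dfrac{(68 \times 11 - 6 \times 52)^2 \times 137}{74 \times 63 \times 120 \times 17} \approx 2.74$ (2.734 when computed from the rounded theoretical frequencies).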
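A short sketch of the specific 2×2 calculation, assuming the standard shortcut $\chi^2 = (ad - bc)^2 n / [(a+b)(c+d)(a+c)(b+d)]$ with cells labelled a, b, c, d as in the Example 9-5 table; `chi_square_2x2` is an illustrative name.

```python
def chi_square_2x2(a, b, c, d):
    """Pearson chi-square for a 2x2 table via the shortcut formula (no correction)."""
    n = a + b + c + d
    return (a * d - b * c) ** 2 * n / ((a + b) * (c + d) * (a + c) * (b + d))

# Example 9-5: a=68, b=6 (Drug A); c=52, d=11 (Drug B)
chi2 = chi_square_2x2(68, 6, 52, 11)
print(round(chi2, 3), chi2 < 3.84)
```

The shortcut avoids computing the theoretical frequencies at all, which also avoids the small rounding differences that arise when each T is rounded before summing.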
26
The chi-square test requires a large sample, because Pearson's chi-square test statistic follows the chi-square distribution only approximately.
For a 2×2 table:
(1) If $n \ge 40$ and every $T_i \ge 5$, the $\chi^2$ test is applicable;
(2) If $n < 40$ or any $T_i < 1$, the $\chi^2$ test is not applicable, and you should use Fisher's Exact Test;
(3) If $n \ge 40$ and at least one cell has $1 \le T_i < 5$, the $\chi^2$ test needs adjustment.
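These three rules can be written as a small decision helper. This is only a sketch: it reads rule (3) as "at least one theoretical frequency between 1 and 5", which is an assumption consistent with Example 9-6, and the function name and return strings are hypothetical.

```python
def choose_2x2_method(n, expected):
    """Pick a test for a 2x2 table from n and the list of theoretical frequencies."""
    smallest = min(expected)
    if n < 40 or smallest < 1:
        return "Fisher's exact test"
    if smallest < 5:
        return "chi-square test with correction"
    return "chi-square test"

# Example 9-5: n = 137, all theoretical frequencies at least 5
print(choose_2x2_method(137, [64.82, 9.18, 55.18, 7.82]))
# Example 9-6: n = 46, two theoretical frequencies below 5
print(choose_2x2_method(46, [26.09, 3.91, 13.91, 2.09]))
```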
27
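The correction formula of the $\chi^2$ test for a 2×2 table: $\chi^2 = \sum \dfrac{(|A - T| - 0.5)^2}{T}$, or equivalently $\chi^2 = \dfrac{(|ad - bc| - n/2)^2\, n}{(a+b)(c+d)(a+c)(b+d)}$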
28
Example 9-6: Hematosepsis (theoretical frequencies in parentheses)

Treatment   Effective     No effect    Total   Effective rate (%)
Drug A      28 (26.09)    2 (3.91)     30      93.33
Drug B      12 (13.91)    4 (2.09)     16      75.00
Total       40            6            46      86.96

Here $n = 46 > 40$, but $T_{12} = 30 \times 6/46 = 3.91 < 5$ and $T_{22} = 16 \times 6/46 = 2.09 < 5$. You should use the correction formula of the $\chi^2$ test for a 2×2 table.
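The corrected calculation for this table can be sketched as below, assuming the correction meant is the usual continuity (Yates) correction, $\chi^2 = (|ad - bc| - n/2)^2 n / [(a+b)(c+d)(a+c)(b+d)]$; the function name is illustrative.

```python
def corrected_chi_square_2x2(a, b, c, d):
    """Continuity-corrected chi-square for a 2x2 table (shortcut form)."""
    n = a + b + c + d
    numerator = (abs(a * d - b * c) - n / 2) ** 2 * n
    return numerator / ((a + b) * (c + d) * (a + c) * (b + d))

# Example 9-6: a=28, b=2 (Drug A); c=12, d=4 (Drug B)
chi2_c = corrected_chi_square_2x2(28, 2, 12, 4)
print(round(chi2_c, 3), chi2_c < 3.84)
```

Under this assumption the corrected statistic falls below the critical value 3.84 for $\nu = 1$, so $H_0$ would not be rejected at $\alpha = 0.05$.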
30
(3) Chi-square test for R×C table
Example 9-7: Leukaemia
$H_0$: The distributions of blood types in the two populations are the same.
$H_1$: The distributions are not the same.
31
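The formula of the $\chi^2$ test statistic for an R×C table: $\chi^2 = n\left(\sum \dfrac{A^2}{n_R\, n_C} - 1\right)$, where $n_R$ and $n_C$ are the row total and column total of each cell.
For Example 9-7: $\nu = (R - 1)(C - 1) = (2-1)(4-1) = 3$. From the table, $\chi^2_{0.05(3)} = 7.81$; now $\chi^2 = 1.84 < 7.81$, so $P > 0.05$ and $H_0$ is not rejected. There is no evidence that the distributions of blood types differ between the two populations.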
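A general R×C calculation can be sketched as follows, assuming the textbook form $\chi^2 = n(\sum A^2/(n_R n_C) - 1)$ with $n_R$ and $n_C$ the row and column totals. Because the counts of Example 9-7 are not available in this text, the sketch is run on the 2×2 table of Example 9-5, which is a special case of an R×C table; `chi_square_rxc` is an illustrative name.

```python
def chi_square_rxc(table):
    """Return (chi-square, degrees of freedom) for an R x C table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    # Sum of A^2 / (row total * column total) over every cell
    total = sum(a * a / (row_totals[i] * col_totals[j])
                for i, row in enumerate(table)
                for j, a in enumerate(row))
    chi2 = n * (total - 1)
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df

# Example 9-5 data, treated as an R x C table with R = C = 2
chi2, df = chi_square_rxc([[68, 6], [52, 11]])
print(round(chi2, 3), df)
```

The result agrees with the 2×2 shortcut formula, and the same function accepts a 2×4 table such as the one in Example 9-7.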
32
Question: Why does $\chi^2 = 1.84 < \chi^2_{0.05(3)} = 7.81$ mean $P > 0.05$? The answer is in this figure!
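The answer can also be seen numerically. The sketch below uses a closed form for the upper-tail area of the chi-square distribution that holds for $\nu = 3$ only, $P(\chi^2 > x) = \operatorname{erfc}(\sqrt{x/2}) + \sqrt{2x/\pi}\, e^{-x/2}$; the function name is illustrative.

```python
from math import erfc, exp, pi, sqrt

def chi2_upper_tail_df3(x):
    """P(chi-square with 3 degrees of freedom > x), closed form valid for df = 3."""
    return erfc(sqrt(x / 2)) + sqrt(2 * x / pi) * exp(-x / 2)

p_observed = chi2_upper_tail_df3(1.84)  # area to the right of the observed statistic
p_critical = chi2_upper_tail_df3(7.81)  # area to the right of the critical value
print(round(p_observed, 3), round(p_critical, 3))
```

The tail area shrinks as the statistic grows, so any statistic to the left of the critical value 7.81 has more than 0.05 of the area to its right.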
33
(4) Cautions for the chi-square test
(1) Both the 2×2 table and the R×C table are called contingency tables. The 2×2 table is a special case of the R×C table.
(2) When R > 2, "$H_0$ is rejected" only means there is a difference among some groups. It does not necessarily mean that all the groups are different.
(3) The $\chi^2$ test requires a large sample. By experience, the theoretical frequencies should be greater than 5 in more than 4/5 of the cells;
34
The theoretical frequency in any cell should be greater than 1. Otherwise, we cannot use the chi-square test directly.
If the above requirements are violated, what should we do?
(1) Increase the sample size.
(2) Re-organize the categories: pool some categories, or cancel some categories.
35
You should know: the chi-square test is a very important method of statistical inference for enumeration data!