2014.3.25 1 Medical Statistics Medical Statistics Tao Yuchun Tao Yuchun 9


1 2014.3.25 Medical Statistics (9), Tao Yuchun, http://cc.jlu.edu.cn/ms.html

2 Statistical Analysis of Enumeration Data: 2. Statistical Inference for Enumeration Data

3 9.1 Sampling error of frequency. Example: suppose the death rate is 0.2 when rats are fed a certain poison. What will happen when we run the experiment on n = 1, 2, 3 or 4 rat(s)?
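The question above can be explored numerically. The sketch below assumes the rats die independently of each other, so the number of deaths follows a binomial distribution; the function name `death_count_distribution` is only illustrative and does not come from the slides.

```python
from math import comb

def death_count_distribution(n, pi=0.2):
    """Return P(X = k) for k = 0..n deaths among n rats (binomial model, an assumption)."""
    return [comb(n, k) * pi**k * (1 - pi)**(n - k) for k in range(n + 1)]

for n in (1, 2, 3, 4):
    probs = death_count_distribution(n)
    # k deaths out of n rats gives the observed frequency p = k / n.
    line = ", ".join(f"p={k}/{n}: {prob:.4f}" for k, prob in enumerate(probs))
    print(f"n={n}: {line}")
```

With so few rats the observed frequency can only take a handful of values, and it rarely equals 0.2; that scatter around the population proportion is the sampling error.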


5 In general, suppose the population proportion is π and the sample size is n. The sample frequency p is a random variable with standard error $\sigma_p = \sqrt{\pi(1-\pi)/n}$. When π is unknown and n is big enough, $\sigma_p$ is approximately equal to $S_p = \sqrt{p(1-p)/n}$.

6 Example 9-1 HBV surface antigen: 200 people were tested, 7 were positive.

7 In theory, if the sample size n is big enough and the observed frequency is p, then we have approximately $\frac{p-\pi}{S_p} \sim N(0,1)$, where $S_p$ is the estimated standard error of p.

8 9.2 Confidence Interval of Probability. If the sample size n is big enough and the observed frequency is p, then, with $S_p = \sqrt{p(1-p)/n}$: 95% confidence interval: $p \pm 1.96\,S_p$; 99% confidence interval: $p \pm 2.58\,S_p$.

9 Example 9-2 HBV surface antigen: 200 people were tested, 7 were positive. Calculate the confidence interval for π.
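A minimal sketch of this calculation, assuming the normal-approximation interval p ± z·S_p with S_p = sqrt(p(1-p)/n). The helper name `proportion_ci` is hypothetical.

```python
from math import sqrt

def proportion_ci(x, n, z=1.96):
    """Normal-approximation confidence interval for a proportion.

    x: number of positives, n: sample size, z: normal quantile
    (1.96 for 95%, 2.58 for 99%).
    """
    p = x / n                      # observed frequency
    s_p = sqrt(p * (1 - p) / n)    # estimated standard error of p
    return p, s_p, (p - z * s_p, p + z * s_p)

p, s_p, ci95 = proportion_ci(7, 200, z=1.96)
_, _, ci99 = proportion_ci(7, 200, z=2.58)
print(f"p = {p:.3f}, S_p = {s_p:.4f}")
print(f"95% CI: ({ci95[0]:.4f}, {ci95[1]:.4f})")
print(f"99% CI: ({ci99[0]:.4f}, {ci99[1]:.4f})")
```

Here np = 7 is only just above 5, so the approximation is usable but not comfortable.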

10 Distinguish between μ and π for sampling error and confidence interval.

11 9.3 The hypothesis testing of proportion (Z test)
(1) Comparison of sample proportion and population proportion (one-sample Z test)
Example 9-3 Cerebral infarction
Method        Cases   Cure rate
New method    98      50%
Routine       -       30%
50% is the sample proportion, p = 50%. 30% is the population proportion, π0 = 30%.

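12 Hypotheses and α: H0: π = π0, H1: π ≠ π0, α = 0.05. Statistic Z: $Z = \frac{p-\pi_0}{\sqrt{\pi_0(1-\pi_0)/n}} = \frac{0.50-0.30}{\sqrt{0.30 \times 0.70/98}} = 4.32$. Decision rule: if |Z| ≥ Zα, then reject H0; otherwise, there is no reason to reject H0 (accept H0).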
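The statistic for Example 9-3 can be checked with a few lines, assuming the usual one-sample form Z = (p - π0) / sqrt(π0(1-π0)/n), in which the standard error is computed under H0. The function name `one_sample_z` is illustrative.

```python
from math import sqrt

def one_sample_z(p, pi0, n):
    """One-sample Z statistic for a proportion; standard error uses pi0 (H0 value)."""
    return (p - pi0) / sqrt(pi0 * (1 - pi0) / n)

z = one_sample_z(p=0.50, pi0=0.30, n=98)
print(f"Z = {z:.2f}")
# Two-sided decision at alpha = 0.05.
print("reject H0" if abs(z) >= 1.96 else "no reason to reject H0")
```

This reproduces the |Z| = 4.32 used in the decision for this example.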

13 Zα is: two sides, Z0.05 = 1.96; one side, Z0.05 = 1.645. Since |Z| = 4.32 > Z0.05 = 1.96, reject H0. The new method is better than the routine one.
(2) Comparison of two sample proportions (two-sample Z test)
Example 9-4 Carrier rate of hepatitis. In B City: 522 people were tested, 24 carriers, p1 = 4.60% (population carrier rate: π1); in Countryside: 478 people were tested, 33 carriers, p2 = 6.90% (population carrier rate: π2).

14 H0: π1 = π2, H1: π1 ≠ π2, α = 0.05

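15 Statistic Z: $Z = \frac{p_1 - p_2}{S_{p_1-p_2}}$, with $S_{p_1-p_2} = \sqrt{p_c(1-p_c)\left(\frac{1}{n_1}+\frac{1}{n_2}\right)}$ and $p_c = \frac{X_1+X_2}{n_1+n_2}$; here $p_c$ is the pooled estimate of the two sample proportions, $X_1$ and $X_2$ are the numbers of positives, and $S_{p_1-p_2}$ is the standard error of $p_1 - p_2$. Decision rule: if |Z| ≥ Zα, then reject H0; otherwise, there is no reason to reject H0 (accept H0). Since |Z| = 1.565 < Z0.05 = 1.96, H0 is not rejected. There is no reason to say that the population carrier rates of B City and the Countryside are different (π1 = π2 is not contradicted).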
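A sketch of the two-sample calculation for Example 9-4, assuming the pooled standard error sqrt(p_c(1-p_c)(1/n1 + 1/n2)) with p_c = (x1 + x2)/(n1 + n2). The function name `two_sample_z` is hypothetical.

```python
from math import sqrt

def two_sample_z(x1, n1, x2, n2):
    """Two-sample Z statistic for proportions with a pooled standard error."""
    p1, p2 = x1 / n1, x2 / n2
    p_c = (x1 + x2) / (n1 + n2)                       # pooled proportion
    s_diff = sqrt(p_c * (1 - p_c) * (1 / n1 + 1 / n2))  # standard error of p1 - p2
    return (p1 - p2) / s_diff

z = two_sample_z(x1=24, n1=522, x2=33, n2=478)
print(f"|Z| = {abs(z):.3f}")
print("reject H0" if abs(z) >= 1.96 else "no reason to reject H0")
```

Computed without intermediate rounding, |Z| comes out near 1.57; the slide's 1.565 is consistent with rounding the proportions and the standard error before dividing. The decision is the same either way.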

16 Summary. The parameter estimation and hypothesis testing of proportion are based on the normal approximation (when the sample size is big enough). How big is enough? By experience, nπ > 5 and n(1-π) > 5. For a sample: np > 5 and n(1-p) > 5. If the sample size is not big enough, the Z test can't be used, and there is no t-test for proportion (see a more detailed textbook).

17 9.4 Chi-square test. The Z test can only be used for comparing π with a given π0 (one sample) or comparing π1 with π2 (two samples). If we need to compare more than two samples, the chi-square test is widely used.

18 (1) Basic idea of χ² test. Given a set of actual frequencies A1, A2, A3, …, we want to test whether the data follow a certain theory. If the theory is true, we will have a set of theoretical frequencies T1, T2, T3, … Compare A1, A2, A3, … with T1, T2, T3, …: if they are quite different, the theory might not be true; otherwise, the theory is acceptable.

19 (2) Chi-square test for 2×2 table
Example 9-5 Acute lower respiratory infection (theoretical frequencies in parentheses)
Treatment   Effect          Non-effect     Total       Effect rate
Drug A      68 (64.82) a    6 (9.18) b     74 (a+b)    91.89%
Drug B      52 (55.18) c    11 (7.82) d    63 (c+d)    82.54%
Total       120 (a+c)       17 (b+d)       137         87.59%
H0: π1 = π2, H1: π1 ≠ π2, α = 0.05; here π1 is the population effect rate for drug A and π2 is the population effect rate for drug B.

20 To calculate the theoretical frequencies: if H0 is true, π1 = π2 ≈ 120/137, so
T11 = 74×120/137 = 64.82, T21 = 63×120/137 = 55.18
T12 = 74×17/137 = 9.18, T22 = 63×17/137 = 7.82
To compare A and T by a statistic χ²: $\chi^2 = \sum \frac{(A-T)^2}{T}$
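The theoretical frequencies and the resulting statistic can be reproduced as follows. This is a sketch assuming the Pearson form, the sum of (A - T)²/T over all cells; the function names are illustrative.

```python
def expected_frequencies(table):
    """Theoretical frequencies T = row total * column total / n for each cell."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

def pearson_chi_square(table):
    """Sum of (A - T)^2 / T over all cells of the table."""
    expected = expected_frequencies(table)
    return sum(
        (a - t) ** 2 / t
        for row_a, row_t in zip(table, expected)
        for a, t in zip(row_a, row_t)
    )

observed = [[68, 6], [52, 11]]   # Example 9-5: Drug A, Drug B
expected = expected_frequencies(observed)
chi2 = pearson_chi_square(observed)
print([[round(t, 2) for t in row] for row in expected])
print(f"chi-square = {chi2:.3f}")
```

With unrounded theoretical frequencies the statistic is about 2.74; working from the two-decimal values of T gives the 2.734 quoted for this example.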

21 The chi-square test was invented by Karl Pearson (1857-1936) and is also called Pearson's chi-square test. If H0 is true, χ² follows a chi-square distribution with ν = (row-1)(column-1) degrees of freedom. If the χ² value is big enough, we doubt H0 and then reject H0!

22 For Example 9-5: ν = (row-1)(column-1) = (2-1)(2-1) = 1, χ²α(ν) = χ²0.05(1) = 3.84. Now χ² = 2.734 < 3.84, so P > 0.05 and H0 is not rejected. We have no reason to say the effects of the two treatments are different. Question: What is χ²α(ν)? Why does χ² < χ²0.05 imply P > 0.05?

23 [Figure: the χ² curves for different ν (ν = 3, 5, 10, 30).] The chi-square distribution is a distribution for a continuous variable. It has one parameter, ν (degrees of freedom), which determines the shape of the χ² curve. The area under the χ² curve gives the probability distribution of χ².

24 The table for the χ² distribution. The χ² critical value is denoted χ²α(ν), where α is the probability and ν is the degrees of freedom. The area under the χ² curve means [for χ²0.05(1)]: the area to the right of χ²0.05(1) = 3.84 is 0.05.
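For ν = 1 the tail area can be computed without a table, because a chi-square variable with one degree of freedom is the square of a standard normal variable. The sketch below uses that identity; `chi2_sf_df1` is a hypothetical helper name and is only valid for ν = 1.

```python
from math import erfc, sqrt

def chi2_sf_df1(x):
    """P(chi-square with 1 d.f. >= x), using chi2(1) = Z^2 for standard normal Z."""
    return erfc(sqrt(x / 2))

print(f"area to the right of 3.84:  {chi2_sf_df1(3.84):.4f}")
print(f"area to the right of 2.734: {chi2_sf_df1(2.734):.4f}")
```

A statistic smaller than the critical value lies to its left, so the area to its right (the P value) is larger than α.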

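25 For a 2×2 table, there is a specific formula for the chi-square calculation: $\chi^2 = \frac{(ad-bc)^2\,n}{(a+b)(c+d)(a+c)(b+d)}$. For Example 9-5: $\chi^2 = \frac{(68 \times 11 - 6 \times 52)^2 \times 137}{74 \times 63 \times 120 \times 17} \approx 2.74$, consistent with the 2.734 obtained from the theoretical frequencies (the small difference comes from rounding T).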
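A sketch of the 2×2 shortcut, assuming the standard four-cell form (ad - bc)²·n / ((a+b)(c+d)(a+c)(b+d)) with cells labelled a, b, c, d as in the Example 9-5 table. The function name is illustrative.

```python
def chi_square_2x2(a, b, c, d):
    """Pearson chi-square for a 2x2 table from its four cell counts."""
    n = a + b + c + d
    return (a * d - b * c) ** 2 * n / ((a + b) * (c + d) * (a + c) * (b + d))

chi2_shortcut = chi_square_2x2(68, 6, 52, 11)   # Example 9-5
print(f"chi-square = {chi2_shortcut:.3f}")
```

The shortcut needs only the observed counts, so it avoids the rounding that creeps in when the theoretical frequencies are written to two decimals.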

26 The chi-square test requires a large sample: Pearson's chi-square test statistic follows the chi-square distribution only approximately. For a 2×2 table:
(1) If n ≥ 40 and every Ti ≥ 5, the χ² test is applicable;
(2) If n < 40 or any Ti < 1, the χ² test is not applicable and you should use Fisher's Exact Test;
(3) If n ≥ 40 and at least one cell has 1 ≤ Ti < 5, the χ² test needs adjustment.

27 The correction formula of the χ² test for a 2×2 table: $\chi^2 = \frac{(|ad-bc| - n/2)^2\,n}{(a+b)(c+d)(a+c)(b+d)}$

28 Example 9-6 Hematosepsis (theoretical frequencies in parentheses)
Treatment   Effective     No effect    Total   Effective rate (%)
Drug A      28 (26.09)    2 (3.91)     30      93.33
Drug B      12 (13.91)    4 (2.09)     16      75.00
Total       40            6            46      86.96
Here n = 46 > 40, but T12 = 30×6/46 = 3.91 < 5 and T22 = 16×6/46 = 2.09 < 5. You should use the correction formula of the χ² test for a 2×2 table:
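A sketch of the corrected calculation for Example 9-6, assuming the correction meant here is the usual continuity (Yates) correction, (|ad - bc| - n/2)²·n / ((a+b)(c+d)(a+c)(b+d)). The function name is hypothetical.

```python
def corrected_chi_square_2x2(a, b, c, d):
    """Continuity-corrected chi-square for a 2x2 table (Yates form, assumed)."""
    n = a + b + c + d
    numerator = (abs(a * d - b * c) - n / 2) ** 2 * n
    return numerator / ((a + b) * (c + d) * (a + c) * (b + d))

chi2_corrected = corrected_chi_square_2x2(28, 2, 12, 4)   # Example 9-6
print(f"corrected chi-square = {chi2_corrected:.3f}")
print("reject H0" if chi2_corrected >= 3.84 else "no reason to reject H0")
```

The correction shrinks the statistic, which makes the test more conservative when some theoretical frequencies are small.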


30 (3) Chi-square test for R×C table. Example 9-7 Leukaemia. H0: the distributions of blood types in the two populations are the same. H1: the distributions are not the same.

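31 The formula of the χ² test statistic for an R×C table: $\chi^2 = n\left(\sum \frac{A^2}{n_R\,n_C} - 1\right)$, where $n_R$ and $n_C$ are the row total and column total of each cell. For Example 9-7: ν = (R-1)(C-1) = (2-1)(4-1) = 3. From the table, χ²0.05(3) = 7.81; now χ² = 1.84 < 7.81, so P > 0.05 and H0 is not rejected. There is no reason to say the distributions of blood types in the two populations are different.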
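A sketch of an R×C calculation, assuming the common textbook form n·(ΣA²/(n_R·n_C) - 1). The counts for Example 9-7 are not in this transcript, so the function is exercised on the 2×2 table of Example 9-5 instead, which is a special case of an R×C table. Function names are illustrative.

```python
def chi_square_rxc(table):
    """Chi-square for an R x C table: n * (sum of A^2 / (row total * col total) - 1)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    total = sum(
        a * a / (row_totals[i] * col_totals[j])
        for i, row in enumerate(table)
        for j, a in enumerate(row)
    )
    return n * (total - 1)

def degrees_of_freedom(table):
    """nu = (R - 1)(C - 1)."""
    return (len(table) - 1) * (len(table[0]) - 1)

example_9_5 = [[68, 6], [52, 11]]
print(f"chi-square = {chi_square_rxc(example_9_5):.3f}, nu = {degrees_of_freedom(example_9_5)}")
```

This form is algebraically the same as summing (A - T)²/T, so no theoretical frequencies have to be computed first.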

32 Question: Why does χ² = 1.84 < χ²0.05(3) = 7.81 imply P > 0.05? The answer is in this figure!
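The same point can be checked numerically. The sketch below uses a closed form for the upper tail of the chi-square distribution that holds only for ν = 3; `chi2_sf_df3` is a hypothetical helper name.

```python
from math import erfc, exp, pi, sqrt

def chi2_sf_df3(x):
    """P(chi-square with 3 d.f. >= x); closed form valid only for 3 degrees of freedom."""
    return erfc(sqrt(x / 2)) + sqrt(2 * x / pi) * exp(-x / 2)

print(f"area to the right of 7.81: {chi2_sf_df3(7.81):.4f}")
print(f"area to the right of 1.84: {chi2_sf_df3(1.84):.4f}")
```

The tail area shrinks as the statistic grows, so a statistic below the critical value 7.81 must have a tail area above 0.05.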

33 (4) Caution for Chi-square test
(1) Both 2×2 tables and R×C tables are called contingency tables. The 2×2 table is a special case of the R×C table.
(2) When R > 2, "H0 is rejected" only means that there is a difference among some groups. It does not necessarily mean that all the groups are different.
(3) The χ² test requires a large sample. By experience:
- The theoretical frequencies should be greater than 5 in more than 4/5 of the cells;

34 - The theoretical frequency in any cell should be greater than 1.
Otherwise, we cannot use the chi-square test directly.
If the above requirements are violated, what should we do?
(1) Increase the sample size.
(2) Re-organize the categories: pool some categories, or cancel some categories.
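The two rules of thumb above can be turned into a quick check before running the test. This is only a sketch of the rule as stated here (more than 4/5 of the cells with T > 5, and every cell with T > 1); the function name `chi_square_applicable` is hypothetical.

```python
def chi_square_applicable(table):
    """Check the large-sample rule of thumb on the theoretical frequencies."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    cells = [r * c / n for r in row_totals for c in col_totals]
    above_five = sum(t > 5 for t in cells)
    # "More than 4/5 of the cells" written with integers to avoid float comparison.
    enough_large_cells = 5 * above_five > 4 * len(cells)
    return enough_large_cells and min(cells) > 1

print(chi_square_applicable([[68, 6], [52, 11]]))   # Example 9-5
print(chi_square_applicable([[28, 2], [12, 4]]))    # Example 9-6
```

For a 2×2 table the stricter rules given earlier (n ≥ 40, correction or Fisher's Exact Test for small T) still apply; this check covers only the general R×C guidance.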

35 You should know: the chi-square test is a very important method of statistical inference for enumeration data!

