Université d’Ottawa / University of Ottawa 2001, Bio 4118 Applied Biostatistics. Lecture 4: Fitting distributions: goodness of fit


Presentation transcript:

1 Université d’Ottawa / University of Ottawa 2001, Bio 4118 Applied Biostatistics. Lecture 4: Fitting distributions: goodness of fit
• Goodness of fit
• Testing goodness of fit
• Testing normality
• An important note on testing normality!

2 Goodness of fit
• Measures the extent to which some empirical distribution “fits” the distribution expected under the null hypothesis.
[Figure: observed and expected frequency (0 to 30) against fork length (20 to 60).]

3 Goodness of fit: the underlying principle
• If the match between observed and expected is poorer than would be expected on the basis of measurement precision, then we should reject the null hypothesis.
[Figure: two panels of observed and expected frequency against fork length, one labelled “Reject H0” and the other “Accept H0”.]

4 Testing goodness of fit: the Chi-square statistic (χ²)
• Used for frequency data, i.e. the number of observations/results in each of n categories compared to the number expected under the null hypothesis.
[Figure: observed and expected frequency by category/class.]
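The slide's own formula did not survive in the transcript. As a minimal sketch, assuming the usual Pearson form (the sum over categories of (O - E)² / E), the statistic is a single pass over the categories. The function name and the counts are hypothetical and chosen only for illustration.

```python
def pearson_chi_square(observed, expected):
    """Pearson chi-square statistic: sum over categories of (O - E)^2 / E."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical counts in 4 categories with equal expected frequencies.
observed = [18, 22, 31, 29]
expected = [25, 25, 25, 25]

x2 = pearson_chi_square(observed, expected)
df = len(observed) - 1  # number of categories minus one
print(f"chi-square = {x2:.2f} on {df} degrees of freedom")
```

The expected frequencies must sum to the same total as the observed ones, otherwise the statistic is not comparable to the χ² distribution.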

5 How to translate χ² into p?
• Compare to the χ² distribution with n - 1 degrees of freedom, where n is the number of categories.
• If p is less than the desired α level, reject the null hypothesis.
[Figure: probability density of χ² (df = 5); example annotation χ² = 8.5, p = 0.13, accept H0; rejection region marked where p = α = 0.05.]
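The conversion from statistic to p can be sketched with the standard library alone, using the closed-form upper tail of the χ² distribution for integer degrees of freedom. In routine work a library routine such as `scipy.stats.chi2.sf` would normally be used instead. The function name `chi2_upper_tail` is hypothetical. For a statistic of 8.5 on 5 degrees of freedom this evaluates to roughly 0.13, which is above α = 0.05.

```python
from math import erfc, exp, factorial, pi, sqrt


def chi2_upper_tail(x, df):
    """P(chi-square variate with integer df >= 1 exceeds x)."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) times a finite sum of (x/2)^k / k!.
        return exp(-half) * sum(half ** k / factorial(k) for k in range(df // 2))
    # Odd df: normal tail term plus a finite series.
    tail = erfc(sqrt(half))
    term = sqrt(2.0 * x / pi) * exp(-half)
    for k in range((df - 1) // 2):
        tail += term
        term *= x / (2 * k + 3)
    return tail


alpha = 0.05
p = chi2_upper_tail(8.5, df=5)
decision = "reject H0" if p < alpha else "accept H0"
print(f"p = {p:.3f}: {decision}")
```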

6 Testing goodness of fit: the log likelihood-ratio Chi-square statistic (G)
• Similar to χ², and usually gives similar results.
• In some cases, G is more conservative (i.e. will give higher p values).
[Figure: observed and expected frequency by category/class.]
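A minimal sketch of G, assuming the standard definition G = 2 Σ O ln(O / E). The slide's formula is not preserved, so this form is an assumption. The counts are hypothetical, and the Pearson statistic is computed alongside to illustrate that the two usually land close together.

```python
from math import log


def g_statistic(observed, expected):
    """Log likelihood-ratio statistic G = 2 * sum(O * ln(O / E)).

    Categories with O = 0 contribute nothing to the sum.
    """
    return 2.0 * sum(o * log(o / e) for o, e in zip(observed, expected) if o > 0)


# Hypothetical counts in 4 categories with equal expected frequencies.
observed = [18, 22, 31, 29]
expected = [25, 25, 25, 25]

g = g_statistic(observed, expected)
x2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
print(f"G = {g:.3f}, Pearson chi-square = {x2:.3f}")
```

Both values are referred to the same χ² distribution with the same degrees of freedom.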

7 χ² versus the distribution of X² or G
• For both X² and G, p values are calculated assuming a χ² distribution...
• ...but as the sample size n decreases, both deviate more and more from χ².
[Figure: probability against χ²/X²/G (df = 5), with curves for χ², for X²/G at small n, and for X²/G at very small n.]

8 Assumptions (χ² and G)
• n (the total sample size) is larger than 30.
• Expected frequencies are all larger than 5.
• Test is quite robust except when there are only 2 categories (df = 1).
• For 2 categories, both X² and G overestimate χ², leading to rejection of the null hypothesis with probability greater than α, i.e. the test is liberal.

9 What if n is too small, there are only 2 categories, etc.?
• Collect more data, thereby increasing n.
• If there are more than 2 categories, combine categories.
• Use a correction factor.
• Use another test.
[Figure: panels labelled “More data” and “Classes combined”.]

10 Corrections for 2 categories
• For 2 categories, both X² and G overestimate χ², leading to rejection of the null hypothesis with probability greater than α, i.e. the test is liberal.
• Continuity correction: adjust each observed frequency by 0.5 toward its expected frequency.
• Williams’ correction: divide test statistic (G or χ²) by: [formula not preserved in the transcript]
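The two corrections can be sketched as follows. The Williams divisor on the slide is missing from the transcript, so the code assumes the common textbook form q = 1 + (a² - 1) / (6 n (a - 1)), with a categories and total sample size n. Whether this is exactly the formula the slide showed is an assumption. The continuity correction is implemented as shrinking each |O - E| by 0.5. Function names and counts are hypothetical.

```python
from math import log


def continuity_corrected_x2(observed, expected):
    """Pearson statistic with each |O - E| reduced by 0.5 (never below 0)."""
    return sum(max(abs(o - e) - 0.5, 0.0) ** 2 / e for o, e in zip(observed, expected))


def williams_q(n_total, n_categories):
    """Assumed Williams divisor: q = 1 + (a^2 - 1) / (6 * n * (a - 1))."""
    a = n_categories
    return 1.0 + (a * a - 1) / (6.0 * n_total * (a - 1))


# Hypothetical 2-category sample of 48 individuals tested against 1:1.
observed = [30, 18]
expected = [24, 24]

x2_raw = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
x2_adj = continuity_corrected_x2(observed, expected)
g_raw = 2.0 * sum(o * log(o / e) for o, e in zip(observed, expected))
g_adj = g_raw / williams_q(sum(observed), len(observed))
print(f"X2: {x2_raw:.3f} -> {x2_adj:.3f}; G: {g_raw:.3f} -> {g_adj:.3f}")
```

Both adjustments make the statistic smaller, which counteracts the liberal behaviour described on the slide.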

11 The binomial test
• Used when there are 2 categories.
• No assumptions
• Calculate exact probability of obtaining N - k individuals in category 1 and k individuals in category 2, with k = 0, 1, 2,... N.
[Figure: binomial distribution, p = 0.5, N = 10; probability against number of observations (0 to 10).]

12 An example: sex ratio of beavers
• H0: sex-ratio is 1:1, so p = 0.5 = q
• p(0 males or 0 females) = .00195
• p(1 male and 9 females, or 1 female and 9 males) = .0195
• p(9 or more individuals of same sex) = .0215, or 2.15%.
• Therefore, reject H0
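The probabilities on this slide can be reproduced from the binomial formula. The sketch assumes a sample of N = 10 beavers, which the slide does not state explicitly but which is implied by the 9:1 split and the quoted values. The helper name is hypothetical.

```python
from math import comb


def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


n, p = 10, 0.5  # assumed sample size; H0 sex ratio of 1:1

# All ten animals of one sex (0 males or 0 females).
p_all_one_sex = binomial_pmf(0, n, p) + binomial_pmf(10, n, p)
# Nine of one sex and one of the other.
p_nine_one = binomial_pmf(1, n, p) + binomial_pmf(9, n, p)
# Two-tailed probability of 9 or more individuals of the same sex.
p_two_tailed = p_all_one_sex + p_nine_one

print(f"{p_all_one_sex:.5f} + {p_nine_one:.5f} = {p_two_tailed:.5f}")
```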

13 Multinomial test
• Simple extension of binomial test for more than 2 categories
• With 3 categories, must specify 2 probabilities, p and q, for the null hypothesis; the third follows from p + q + r = 1.0.
• No assumptions...
• ...but so tedious that in practice χ² is used.

14 Multinomial test: segregation ratios
• Hypothesis: both parents Aa, therefore segregation ratio is 1 AA: 2 Aa: 1 aa.
• So under H0, p = .25, q = .50, r = .25
• For N = 60, p < .001
• Therefore, reject H0.
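The slide does not give the observed genotype counts, so the sketch below uses hypothetical counts (30 AA, 20 Aa, 10 aa) purely to show how an exact multinomial test for 3 categories can be carried out. It sums the probabilities of every possible outcome that is no more likely than the observed one. That p-value definition is one common convention and is an assumption here.

```python
from math import factorial


def multinomial_pmf(counts, probs):
    """Exact probability of one particular set of category counts."""
    coef = factorial(sum(counts))
    for c in counts:
        coef //= factorial(c)
    prob = float(coef)
    for c, p in zip(counts, probs):
        prob *= p ** c
    return prob


def exact_multinomial_test_3(observed, probs):
    """Sum the probabilities of all 3-category outcomes no likelier than observed."""
    n = sum(observed)
    p_obs = multinomial_pmf(observed, probs)
    total = 0.0
    for a in range(n + 1):
        for b in range(n - a + 1):
            pr = multinomial_pmf((a, b, n - a - b), probs)
            if pr <= p_obs * (1 + 1e-9):  # small tolerance for float ties
                total += pr
    return total


h0 = (0.25, 0.50, 0.25)           # 1 AA : 2 Aa : 1 aa
hypothetical_counts = (30, 20, 10)  # NOT from the slide; N = 60
p_value = exact_multinomial_test_3(hypothetical_counts, h0)
print(f"exact p = {p_value:.2e}")
```

Even for N = 60 and 3 categories the loop visits 1891 outcomes, which illustrates why the slide calls the test tedious without a computer.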

15 Goodness of fit: testing normality
• Since normality is an assumption of many parametric statistical tests, testing for normality is often required.
• Tests for normality include χ² or G, Kolmogorov-Smirnov, Shapiro-Wilk & Lilliefors.
[Figure: observed frequency and frequency expected under the hypothesis of a normal distribution, by category/class.]

16 Cumulative distributions
• Areas under the normal probability density function and the cumulative normal distribution function
[Figure: normal probability density function and cumulative normal distribution function F (0 to 1.0), with marked areas of 2.28%, 50.00% and 68.27%.]

17 χ² or G test for normality
• Put data in classes (histogram) and compute expected frequencies based on discrete normal distribution. Calculate χ².
• Requires large samples (k min = 10) and is not powerful because of loss of information.
[Figure: observed frequency and frequency expected under the hypothesis of a normal distribution, by category/class.]
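One way to obtain the expected class frequencies is to difference the normal cumulative distribution function at the class boundaries. The sketch below assumes the first and last classes are open-ended so that the expected counts sum to the sample size. The function name, class boundaries, mean and standard deviation are all hypothetical.

```python
from statistics import NormalDist


def expected_normal_frequencies(edges, n, mu, sigma):
    """Expected counts per class under a normal distribution.

    `edges` are the interior class boundaries; the lowest and highest
    classes extend to minus and plus infinity.
    """
    dist = NormalDist(mu, sigma)
    cumulative = [0.0] + [dist.cdf(e) for e in edges] + [1.0]
    return [n * (hi - lo) for lo, hi in zip(cumulative, cumulative[1:])]


# Hypothetical example: 100 measurements, fitted mean 40 and sd 8.
edges = [30, 35, 40, 45, 50]
expected = expected_normal_frequencies(edges, n=100, mu=40, sigma=8)
for i, e in enumerate(expected, start=1):
    print(f"class {i}: expected {e:.1f}")
```

When the mean and standard deviation are estimated from the same sample, the degrees of freedom are usually reduced by two more, to the number of classes minus three.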

18 “Non-statistical” assessments of normality
• Do normal probability plot of normal equivalent deviates (NEDs) versus X.
• If line appears more or less straight, then data are approximately normally distributed.
[Figure: NEDs against X for normal and non-normal data.]
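The NEDs for such a plot can be sketched as the standard normal quantiles of each observation's cumulative relative frequency. The plotting position (i - 0.5) / n used below is one of several conventions and is an assumption, as are the function name and the data.

```python
from statistics import NormalDist


def normal_equivalent_deviates(data):
    """Pair each sorted value with the standard normal quantile of its rank."""
    xs = sorted(data)
    n = len(xs)
    std_normal = NormalDist()
    return [(x, std_normal.inv_cdf((i - 0.5) / n)) for i, x in enumerate(xs, start=1)]


# Hypothetical measurements.
pairs = normal_equivalent_deviates([41.2, 38.5, 44.0, 39.9, 42.7])
for x, ned in pairs:
    print(f"X = {x:5.1f}  NED = {ned:+.3f}")
```

Plotting the NED column against X gives the probability plot. Points falling close to a straight line suggest approximate normality.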

19 Kolmogorov-Smirnov goodness of fit
• Compares observed cumulative distribution to expected cumulative distribution under the null hypothesis.
• p is based on D max, the maximum absolute difference between observed and expected cumulative relative frequencies.
[Figure: cumulative frequency (0 to 1.0) against X, with D max marked.]

20 An example: wing length in flies
• 10 flies with wing lengths: 4, 4.5, 4.9, 5.0, 5.1, 5.3, 5.5, 5.6, 5.7, 5.8, 5.9, 6.0
• cumulative relative frequencies: .1, .2, .3, .4, .5, .6, .7, .8, .9, 1.0
[Figure: cumulative frequency (0 to 1.0) against wing length (4.0 to 6.0), with D max marked.]
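D max can be sketched as the largest gap between the empirical step function and the expected cumulative distribution, checked on both sides of each step. The slide does not state which expected distribution was used, so the code assumes a normal distribution with the sample mean and standard deviation. It uses all twelve values as they are listed on the slide, even though the slide speaks of 10 flies.

```python
from statistics import NormalDist, mean, stdev


def ks_dmax(data, cdf):
    """Largest absolute gap between the empirical and the expected CDF."""
    xs = sorted(data)
    n = len(xs)
    d_max = 0.0
    for i, x in enumerate(xs, start=1):
        f = cdf(x)
        # The empirical step function jumps from (i - 1)/n to i/n at x.
        d_max = max(d_max, i / n - f, f - (i - 1) / n)
    return d_max


wing_lengths = [4, 4.5, 4.9, 5.0, 5.1, 5.3, 5.5, 5.6, 5.7, 5.8, 5.9, 6.0]
fitted = NormalDist(mean(wing_lengths), stdev(wing_lengths))  # assumed H0
d = ks_dmax(wing_lengths, fitted.cdf)
print(f"D max = {d:.3f}")
```

Because the expected distribution here is fitted to the sample itself, the resulting D max should be judged against Lilliefors critical values rather than the standard Kolmogorov-Smirnov ones.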

21 Lilliefors test
• KS test is conservative for tests in which the expected distribution is based on sample statistics.
• Lilliefors corrects for this to produce a more reliable test.
• Should be used when the null hypothesis is intrinsic (parameters estimated from the sample) rather than extrinsic.

22 An important note on testing normality!
• When N is small, most tests have low power.
• Hence, very large deviations are required in order to reject the null.
• When N is large, power is high.
• Hence, very small deviations from normality will be sufficient to reject the null.
• So, exercise common sense!

