SADC Course in Statistics Goodness-of-fit tests (and further issues) (Session 16)

Learning Objectives
By the end of this session, you will be able to:
- conduct and interpret the results of a chi-square test of the goodness-of-fit of data to a particular distribution
- understand how a two-way contingency table can be examined further by looking at its residuals
- present results from a standard chi-square test, paying attention to the table's summary features

Goodness-of-fit tests
In previous sessions, we have seen that many tests are based on the assumption of normality. On some occasions it is also important to ascertain whether the data follow other distributions, e.g. the binomial or Poisson distributions. We shall now look at how the chi-square test can be applied to examine the extent to which assumptions concerning the distribution of a given variable hold.

Goodness-of-fit tests
The basic idea is first to calculate the probability of each possible value occurring under the assumed distribution.
e.g. the number of cows getting a disease on a farm which has 6 cows may be assumed to follow a binomial distribution.
e.g. the number of visits made by a pregnant woman in a region to the region's single antenatal clinic may be assumed to follow a Poisson distribution.
Can we check these assumptions before subjecting the data to tests based on them?

Goodness-of-fit test: Normal distribution
Because the normal distribution applies to a continuous random variable, it is necessary to group the data and obtain the observed frequency in each group. The next step is to determine the probability of an observation falling in each group, and hence the expected frequency (that probability multiplied by the total number of observations). The chi-square test can then be applied in the usual way, with d.f. = number of groups - 1 - number of parameters estimated in computing the expected values.
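The steps above can be sketched in Python using only the standard library. The group limits, observed frequencies and parameter estimates below are hypothetical values chosen for illustration; they are not the session's data.

```python
from statistics import NormalDist

# Hypothetical grouped data (NOT from the session): upper limits of every
# group except the last, which is open-ended, and the observed frequencies.
upper_limits = [40, 60, 80, 100, 120]   # groups: <=40, 40-60, ..., >120
observed = [4, 11, 18, 15, 8, 4]        # six groups, 60 observations

# Hypothetical estimates of the mean and standard deviation from the raw data.
mean_hat, sd_hat = 82.0, 28.0

n = sum(observed)
dist = NormalDist(mean_hat, sd_hat)

# Probability of each group = difference between successive cumulative probabilities.
cumulative = [0.0] + [dist.cdf(u) for u in upper_limits] + [1.0]
probabilities = [hi - lo for lo, hi in zip(cumulative, cumulative[1:])]

# Expected frequency = group probability x total number of observations.
expected = [n * p for p in probabilities]

# Chi-square statistic: sum of (O - E)^2 / E over the groups.
chi_square = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# d.f. = groups - 1 - parameters estimated (here the mean and the s.d.).
df = len(observed) - 1 - 2

print(f"chi-square = {chi_square:.2f} on {df} d.f.")
```

Because the first and last groups are open-ended, the group probabilities sum to 1 and the expected frequencies sum to the number of observations.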

An example: Normal distribution
Consider the total rainfall in June at a particular site from 1928 to [year missing]. Suppose we wish to test the assumption that these data follow a normal distribution.
[Figure: histogram of total June rainfall]

An example: Normal distribution
Expected values are now calculated for each group, assuming a normal distribution. The table shows observed and expected frequencies.
[Table: RainTotal group (one "<=" group, six "to" ranges, one ">" group), Observed, Expected; Totals 56]
The chi-square value is 3.6 with d.f. = 5 (8 groups - 1 - 2 estimated parameters, the mean and standard deviation). P-value = [value missing]. Conclusions?
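The P-value for the quoted statistic (3.6 on 5 d.f.) can be computed directly. The following is a minimal sketch using only the Python standard library; `chi2_sf` is a hypothetical helper name, and it evaluates the upper tail of the chi-square distribution through the power series for the regularised lower incomplete gamma function.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable with df d.f."""
    if x <= 0:
        return 1.0
    a = df / 2.0
    z = x / 2.0
    # Series: P(a, z) = z^a e^(-z) / Gamma(a) * sum_n z^n / (a (a+1) ... (a+n))
    term = 1.0 / a
    total = term
    n = 1
    while term > 1e-15 * total:
        term *= z / (a + n)
        total += term
        n += 1
    lower = total * math.exp(-z + a * math.log(z) - math.lgamma(a))
    return 1.0 - lower

# Values quoted on the slide: chi-square = 3.6 with 5 degrees of freedom.
p_value = chi2_sf(3.6, 5)
print(f"P-value = {p_value:.3f}")
```

For these inputs the P-value comes out at roughly 0.6, so the data give no evidence against the assumption of normality.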

An example: Binomial distribution
First recall (from Module H1) the form of the probability function for the binomial random variable with parameters n and p, where p is the probability of a success in each of a sequence of n independent trials, each trial having just 2 possible outcomes. The number of successes (X) in n trials has a binomial distribution:
$P(X = k) = \binom{n}{k} p^k (1-p)^{n-k}$, for $k = 0, 1, \ldots, n$.
This formula gives the binomial probabilities, which can also be obtained from Excel's function BINOMDIST(k, n, p, FALSE).

An example: Binomial distribution
Suppose we have a binomial variable with observed values as shown (n = 7, p = 0.222). Expected values can be derived using $P(X = k) \times 404$.
[Table: k, Observed, Expected; Totals 404]
The chi-square value is [value missing] with d.f. = 4, since p has been estimated from the data. P-value = [value missing]. What are your conclusions?
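The expected frequencies for this example can be reproduced from the quoted values n = 7, p = 0.222 and a total of 404 observations. The sketch below makes one assumption that is not confirmed by the slide: that the values k = 5, 6 and 7 are pooled into a single group, which gives the 6 groups implied by d.f. = 4. The name `binom_pmf` is a hypothetical helper.

```python
from math import comb

n_trials, p, total = 7, 0.222, 404   # values quoted on the slide

def binom_pmf(k, n, p):
    """Binomial probability P(X = k); same quantity as BINOMDIST(k, n, p, FALSE)."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

# Expected frequency for each k = 0, ..., 7 is P(X = k) * 404.
expected = [total * binom_pmf(k, n_trials, p) for k in range(n_trials + 1)]

# ASSUMPTION: pool k = 5, 6, 7 into one group because their expected
# frequencies are small.
pooled = expected[:5] + [sum(expected[5:])]

# d.f. = groups - 1 - 1, since p was estimated from the data.
df = len(pooled) - 1 - 1

for label, e in zip(["0", "1", "2", "3", "4", "5-7"], pooled):
    print(f"k = {label:>3}: expected = {e:.1f}")
print("d.f. =", df)
```

The observed frequencies would then be compared with `pooled` using the usual sum of (O - E)^2 / E.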

Other issues
There are two more issues to discuss concerning chi-square tests of association between two categorical variables. These relate to:
- further examination of the table of frequencies when a significant result is found; and
- how to present the results.

Example of Session 15
For the data below, we found a significant chi-square value, with p = 0.0024, i.e. evidence that the proportion of diseased animals is not the same for all vaccines.
[Table: Vaccine (A to E, Total), diseased, healthy, Total]
Question: But which cells contribute most to the chi-square statistic, i.e. which depart most from Pr(diseased) = 0.167?

Cell contributions to chi-square
[Table: Vaccine (A to E), diseased, healthy]
The table gives the contribution of each cell to the chi-square statistic, i.e. the values $(O-E)^2/E$. Rule of thumb: focus on cells with values > 4 and, in larger tables, on those > 9.

Standardised residuals
[Table: Vaccine (A to E), diseased, healthy]
Better still, use standardised residuals so that signs are also included, i.e. use $SR = (O-E)/\sqrt{E}$, whose square is the cell's chi-square contribution. Rule of thumb: focus on cells with $|SR| > 2$ or, in larger tables, on those with $|SR| > 3$. Conclusion: Vaccine C gives the greatest discrepancy from $H_0$.
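Cell contributions and standardised residuals can be computed together, assuming the usual definition SR = (O - E)/sqrt(E), so that each contribution (O - E)^2/E is the square of its residual. The counts below are hypothetical and are not the session's vaccine data; only the layout (five vaccines, diseased versus healthy) follows the example.

```python
from math import sqrt

# HYPOTHETICAL counts (diseased, healthy) for each vaccine; illustration only.
table = {"A": (12, 68), "B": (20, 60), "C": (5, 75), "D": (15, 65), "E": (18, 62)}
labels = ("diseased", "healthy")

col_totals = [sum(row[j] for row in table.values()) for j in range(2)]
grand_total = sum(col_totals)

residuals = {}
for vaccine, row in table.items():
    row_total = sum(row)
    for j, observed in enumerate(row):
        # Expected count under H0: row total x column total / grand total.
        expected = row_total * col_totals[j] / grand_total
        residuals[(vaccine, labels[j])] = (observed - expected) / sqrt(expected)

# Each cell's chi-square contribution is the square of its standardised residual.
contributions = {cell: sr ** 2 for cell, sr in residuals.items()}
chi_square = sum(contributions.values())

# Rule of thumb from the slide: look at cells with |SR| > 2.
flagged = [cell for cell, sr in residuals.items() if abs(sr) > 2]
print(f"chi-square = {chi_square:.2f}; cells with |SR| > 2: {flagged}")
```

The sign of a flagged residual shows the direction of the departure: a negative value in a "diseased" cell means fewer diseased animals than expected under the null hypothesis.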

Presentation of results
In this example, it would be appropriate to present a table of the percentage of animals diseased under each vaccine. Sorting the table from the most to the least effective vaccine makes the results easier to see.
Vaccine   % diseased
C          9.3%
A         15.4%
D         18.5%
E         19.7%
B         20.8%
Note that there are more advanced methods, e.g. modelling, for making specific comparisons between the above percentages.

Presentation: Example from Session 14
Recall the results below from before. The test of association gave p = 0.000 (to three decimal places).
[Table: "Usually sleep under a mosquito net?" (Yes, No, Total) by "Suffered malaria?" (Yes, No, Total), counts with percentages; overall total 7943 (100%)]

Presentation and conclusions
Test results indicate that there is an association between use of a mosquito net and incidence of malaria. However, the resulting incidences are unexpected: malaria incidence is 62.5% for those using a net and 55.8% for those not using a net. This emphasises the danger of ignoring other factors that may affect malaria incidence, e.g. altitude, housing conditions, etc. Further, could it be that those who had malaria then started using mosquito nets?

Some final remarks
Performing a chi-square analysis is simple, but it does not take account of other factors that may affect the results. More advanced procedures (e.g. log-linear modelling) do exist for exploring factors affecting a categorical response, here use of a bednet. Recall that the chi-square test is an approximation. This approximation is poor if the expected frequencies are very small (e.g. < 5). Try collapsing some rows or columns if this happens.
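Collapsing can be illustrated for the simpler one-way case of a goodness-of-fit table, where sparse categories in the upper tail are merged until the last expected frequency reaches 5. This is a sketch under that assumption; `pool_small_tail` is a hypothetical helper and the frequencies are made up for illustration.

```python
def pool_small_tail(observed, expected, minimum=5.0):
    """Merge the last category into its neighbour while its expected
    frequency is below `minimum`. Returns new (observed, expected) lists."""
    obs, exp = list(observed), list(expected)
    while len(exp) > 1 and exp[-1] < minimum:
        last_expected = exp.pop()
        last_observed = obs.pop()
        exp[-1] += last_expected
        obs[-1] += last_observed
    return obs, exp

# HYPOTHETICAL frequencies: the last two categories have expected values below 5.
observed = [30, 20, 6, 3, 1]
expected = [28.0, 21.0, 7.0, 3.0, 1.0]

pooled_observed, pooled_expected = pool_small_tail(observed, expected)
print(pooled_observed, pooled_expected)
```

Each merge removes one category, so the degrees of freedom must be recalculated from the number of groups that remain.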

Some practical work follows…