Mobile Computing Group: A quick-and-dirty tutorial on the chi2 test for goodness-of-fit testing

Outline
– Background concepts
– Goodness-of-fit (GoF)
– Chi2 tests for GoF
The presentation follows the pyramid schema

Background
Descriptive vs. inferential statistics
– Descriptive: data are used only for descriptive purposes (tables, graphs, measures of variability, etc.)
– Inferential: data are used for drawing inferences, making predictions, etc.
Sample vs. population
– A sample is drawn from a population, which is assumed to have some characteristics
– The sample is often used to make inferences about the population (inferential statistics): hypothesis testing, estimation of population parameters

Background
Statistic vs. parameter
– A statistic is related to (estimated from) a sample. It can be used for both descriptive and inferential purposes
– A parameter refers to the whole population. A sample statistic is often used to infer a population parameter
– Example: the sample mean may be used to infer the population mean (expected value)
Hypothesis testing
– A procedure where sample data are used to evaluate a hypothesis regarding the population
– A hypothesis may refer to several things: properties of a single population, the relation between two populations, etc.
– Two statistical hypotheses are defined: a null hypothesis $H_0$ and an alternative hypothesis $H_1$
– $H_0$ is often a statement of no effect or no difference. It is the hypothesis the researcher seeks to reject

Background
Inferential statistical test
– Hypothesis testing is carried out via an inferential statistical test:
– Sample data are manipulated to yield a test statistic
– The obtained value of the test statistic is evaluated with respect to a sampling distribution, i.e., a theoretical probability distribution for the possible values of the test statistic
– The theoretical values of the statistic are usually tabulated, so that one can assess the statistical significance of the test result
Goodness-of-fit testing is a type of hypothesis testing
– Devise inferential statistical tests, apply them to the sample, and infer how well a theoretical distribution matches the population distribution

GoF as hypothesis testing
Hypothesis $H_0$:
– The sample is derived from a theoretical distribution F()
The sample data are manipulated to derive a test statistic
– In the case of the chi2 statistic, this includes aggregation of the data into bins and some computations
The statistic, as computed from the data, is checked against the sampling distribution
– For the chi2 test, the sampling distribution is the chi2 distribution, hence the name

Goodness-of-fit statistical tests and statistics: the big picture
– Chi2-type tests
  – Classical chi2 statistics: Pearson chi2 statistic, modified chi2 statistic, log-likelihood ratio statistic
  – Generalized chi2 statistics
– EDF-based tests, e.g., KS test, Anderson-Darling test
– Specialized tests, e.g., Shapiro-Wilk test for normality

Pearson chi2 statistic
– $M$: number of bins
– $O_i$ ($N_i$): observed frequency in bin $i$
– $n$: sample size
– $E_i$ ($np_i$): expected frequency in bin $i$ according to the theoretical distribution F()
If $X_1, X_2, X_3, \ldots, X_n$ is the random sample and F() the theoretical distribution under test, the Pearson chi2 statistic is computed as:
$$X^2 = \sum_{i=1}^{M} \frac{(O_i - E_i)^2}{E_i} = \sum_{i=1}^{M} \frac{(N_i - np_i)^2}{np_i}$$
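The definition above can be sketched in a few lines of Python. The function names `pearson_chi2` and `expected_counts`, and the three-bin counts used at the end, are illustrative assumptions and are not taken from the slides.

```python
def expected_counts(n, probabilities):
    """Expected frequency n * p_i of each bin under the theoretical distribution."""
    return [n * p for p in probabilities]


def pearson_chi2(observed, expected):
    """Pearson statistic: sum over bins of (O_i - E_i)^2 / E_i."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same number of bins")
    if any(e <= 0 for e in expected):
        raise ValueError("every expected frequency must be positive")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical three-bin illustration (made-up counts, n = 100).
observed = [18, 55, 27]
expected = expected_counts(100, [0.25, 0.5, 0.25])  # [25.0, 50.0, 25.0]
statistic = pearson_chi2(observed, expected)
print(round(statistic, 2))
```

The statistic grows with the squared deviation in each bin, scaled by the expected frequency of that bin, which is why bins with very small expected frequencies can dominate the sum.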

Interpretation of the chi2 statistic
Theory says that the Pearson chi2 statistic follows (approximately, for large samples) a chi2 distribution, whose degrees of freedom (df) are
– M-1, when the parameters of the fitted distribution are given a priori (case 0 test)
– Somewhere between M-1 and M-1-q, when the q parameters of the distribution are estimated from the sample data
– Usually, the df for this case are taken to be M-1-q
Having computed the value of the chi2 statistic $X^2$, I check the chi2 distribution with M-1 (or M-1-q) df to find
– The probability of getting a value equal to or greater than the computed value $X^2$, called the p-value
– If p < a, where a is the significance level of my test, the hypothesis $H_0$ is rejected; otherwise it is retained
– Standard values for a are 0.1, 0.05, 0.01; the lower a is, the more conservative I am in rejecting the hypothesis $H_0$
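As a rough sketch of this decision rule, the snippet below computes the upper-tail probability of the chi2 distribution with the standard library only, using the series expansion of the regularized incomplete gamma function. The names `chi2_sf` and `gof_decision` are hypothetical helpers introduced here; a statistics library would normally supply the tail probability.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) for a chi2 variable with df degrees of freedom."""
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    # Series for the regularized lower incomplete gamma function P(a, z):
    # P(a, z) = z^a e^(-z) / Gamma(a) * sum_{k>=0} z^k / (a (a+1) ... (a+k))
    term = 1.0 / a
    total = term
    k = 1
    while abs(term) > 1e-15 * abs(total):
        term *= z / (a + k)
        total += term
        k += 1
    lower = total * math.exp(-z + a * math.log(z) - math.lgamma(a))
    return max(0.0, 1.0 - lower)


def gof_decision(statistic, n_bins, n_estimated_params=0, alpha=0.05):
    """Return (p_value, reject_h0) using df = M - 1 - q."""
    df = n_bins - 1 - n_estimated_params
    if df < 1:
        raise ValueError("need at least one degree of freedom")
    p_value = chi2_sf(statistic, df)
    return p_value, p_value < alpha


# Hypothetical usage: a statistic of 6.7 computed over 6 bins, no estimated parameters.
p_value, reject = gof_decision(6.7, n_bins=6)
print(f"p-value = {p_value:.3f}, reject H0: {reject}")
```

The series converges for any positive argument but slowly for very large statistics, so this is meant for illustration rather than for production use.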

Example
A die is rolled 120 times
– 1 comes up 20 times, 2 comes up 14 times, 3 comes up 18 times, 4 comes up 17 times, 5 comes up 22 times and 6 comes up 29 times
The question is: "Is the die biased?"
– Or better: "Do these data suggest that the die is biased?"
Hypothesis $H_0$: the die is not biased
– Therefore, according to the null hypothesis, these numbers should be distributed uniformly
– F(): the discrete uniform distribution

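Example – cont.
Computations: under $H_0$ each face has expected frequency $E_i = 120/6 = 20$, so
$$X^2 = \frac{(20-20)^2 + (14-20)^2 + (18-20)^2 + (17-20)^2 + (22-20)^2 + (29-20)^2}{20} = \frac{134}{20} = 6.7$$
Interpretation
– The distribution of the test statistic has 5 df
– The probability of getting a value greater than or equal to 6.7 under a chi2 distribution with 5 df (the p-value) is about 0.25, which is > a for all a in {0.01..0.1}. Equivalently, the probability of a value smaller than or equal to 6.7 is about 0.75, which is < 1-a
– Therefore the hypothesis that the die is not biased cannot be rejected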
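The die example can be reproduced with a short script. The observed counts come from the example; the critical values are the standard tabulated ones for a chi2 distribution with 5 df, and the variable names are illustrative.

```python
# Observed counts for faces 1..6 over 120 rolls (from the example).
observed = [20, 14, 18, 17, 22, 29]
n = sum(observed)

# Under H0 (unbiased die) every face is expected n / 6 times.
expected = [n / 6] * 6

statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1  # no parameters estimated from the data

# Tabulated critical values of the chi2 distribution with 5 df,
# keyed by significance level a.
critical_values = {0.10: 9.24, 0.05: 11.07, 0.01: 15.09}

# H0 is rejected at level a only if the statistic exceeds the critical value.
rejected = {a: statistic > c for a, c in critical_values.items()}

print(f"X^2 = {statistic:.2f} with {df} df")
for a, is_rejected in rejected.items():
    print(f"a = {a}: reject H0 -> {is_rejected}")
```

Comparing the statistic with a critical value is equivalent to comparing the p-value with a, so no tail probability needs to be computed here.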

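Interpretation of Pearson chi2: graphical illustration
[Figure: density of the chi2 distribution with 5 df. Marked values on the horizontal axis (z) and their p-values: 6.7 (0.25), 9.24 (0.1), 11.07 (0.05), 15.09 (0.01). 10% of the area under the curve lies to the right of 9.24.]
At the 10% significance level, I would reject the hypothesis if the computed $X^2$ > 9.24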

Properties of the Pearson chi2 statistic
It can be computed for both discrete and continuous variables
– This holds for all chi2 statistics. It gives maximum flexibility, but fails to make use of all available information for continuous variables
It is perhaps the simplest statistic from a computational point of view
As with all chi2 statistics, one needs to define the number and borders of the bins
– These are generally a function of the sample size and the theoretical distribution under test

Bin selection
How many and which?
– Different opinions in the literature, no rigorous proof of optimality
There seems to be convergence on the following aspects
– Probability of bins: the bins should be chosen equiprobable with respect to the theoretical distribution under test
– Minimum expected frequencies $np_i$:
  – (Cramer, 46): $np_i > 10$ for all bins
  – (Cochran, 54): $np_i > 1$ for all bins, $np_i \geq 5$ for 80% of bins
  – (Roscoe and Byars, 71)
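The minimum-expected-frequency rules of thumb can be checked mechanically. The sketch below assumes the attribution given above (Cramer: every bin above 10; Cochran: every bin above 1 and at least 80% of bins at 5 or more); the function name `check_expected_frequencies` and the example probabilities are hypothetical.

```python
def check_expected_frequencies(n, probabilities):
    """Check the expected frequencies n * p_i against two rules of thumb."""
    expected = [n * p for p in probabilities]
    share_at_least_5 = sum(e >= 5 for e in expected) / len(expected)
    return {
        # Cramer: n * p_i > 10 for all bins
        "cramer": all(e > 10 for e in expected),
        # Cochran: n * p_i > 1 for all bins and n * p_i >= 5 for 80% of bins
        "cochran": all(e > 1 for e in expected) and share_at_least_5 >= 0.8,
    }


# Hypothetical three-bin distribution with two sample sizes.
print(check_expected_frequencies(16, [0.5, 0.25, 0.25]))  # expected 8, 4, 4
print(check_expected_frequencies(24, [0.5, 0.25, 0.25]))  # expected 12, 6, 6
```

When a check fails, the usual remedies are to merge adjacent bins or to collect a larger sample.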

Bin selection
Relation of the number of bins M to the sample size n
– (Mann and Wald, 42), (Schorr, 74): for large sample sizes, $1.88\,n^{2/5} < M < 3.76\,n^{2/5}$
– (Koehler and Larntz, 80): for small sample sizes, $M \geq 3$, $n \geq 10$ and $n^2/M \geq 10$
– (Roscoe and Byars, 71)
  – Equi-probable bins: $n > M$ when a = 0.01 and a = 0.05
  – Non-equiprobable bins: $n > 2M$ (a = 0.05) and $n > 4M$ (a = 0.01)
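A minimal sketch of how these guidelines translate into candidate bin counts follows. The function names are hypothetical, and the formulas are the Mann and Wald / Schorr range and the Koehler and Larntz conditions quoted above.

```python
import math


def mann_wald_range(n):
    """Bounds 1.88 * n^(2/5) and 3.76 * n^(2/5) on the number of bins M."""
    return 1.88 * n ** 0.4, 3.76 * n ** 0.4


def candidate_bin_counts(n):
    """All integers M with 1.88 * n^(2/5) < M < 3.76 * n^(2/5)."""
    lower, upper = mann_wald_range(n)
    return list(range(math.floor(lower) + 1, math.ceil(upper)))


def koehler_larntz_ok(n, m):
    """Small-sample conditions: M >= 3, n >= 10 and n^2 / M >= 10."""
    return m >= 3 and n >= 10 and n ** 2 / m >= 10


counts = candidate_bin_counts(1000)
print(counts[0], counts[-1])      # smallest and largest admissible M for n = 1000
print(koehler_larntz_ok(20, 5))   # 20^2 / 5 = 80 >= 10
```

The range is wide (the upper bound is twice the lower one), so it narrows the choice of M without fixing it.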

Bin selection
Bins vs. sample size according to Mann and Wald

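Bin selection: continuous vs. discrete
[Figure, continuous case: cumulative probability axis from 0.1 to 1.0 divided into bins i. Equi-probable bins are easy to select.]
[Figure, discrete case: values 1 to 7 against cumulative probability up to 1.0. It is less straightforward to define equi-probable bins.]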
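For a continuous theoretical distribution, equi-probable bins can be obtained by evaluating the inverse CDF at equally spaced probabilities. The sketch below assumes an exponential distribution purely for illustration; the helper names and the sample values are hypothetical.

```python
import math
from bisect import bisect_right


def exponential_inverse_cdf(p, rate=1.0):
    """Quantile function of the exponential distribution (illustrative choice of F)."""
    return -math.log(1.0 - p) / rate


def equiprobable_edges(inverse_cdf, n_bins):
    """Interior bin borders so that each bin has probability 1 / n_bins under F."""
    return [inverse_cdf(i / n_bins) for i in range(1, n_bins)]


def bin_counts(sample, edges):
    """Observed frequency of each bin; edges must be sorted in increasing order."""
    counts = [0] * (len(edges) + 1)
    for x in sample:
        counts[bisect_right(edges, x)] += 1
    return counts


edges = equiprobable_edges(exponential_inverse_cdf, 4)
print([round(e, 3) for e in edges])

# Made-up sample values, one falling in each of the four bins.
print(bin_counts([0.1, 0.5, 1.0, 2.0], edges))
```

With equi-probable bins every expected frequency equals n / M, so the minimum-frequency rules reduce to a single condition on n and M. For a discrete distribution the quantiles generally fall between the possible values, which is why exact equi-probable bins are harder to construct there.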

References: textbooks
D.J. Sheskin, Handbook of Parametric and Nonparametric Statistical Procedures
– Introduction (descriptive vs. inferential statistics, hypothesis testing, concepts and terminology)
– Test 8 (chap. 8): The Chi-Square Goodness-of-Fit Test (high-level description with examples and discussion of several aspects)
R. D'Agostino, M. Stephens, Goodness-of-Fit Techniques
– Chapter 3: Tests of Chi-square type. Reviews the theoretical background and looks more generally at chi2 tests, not only the Pearson test

References: papers
S. Horn, Goodness-of-Fit Tests for Discrete Data: A Review and an Application to a Health Impairment Scale
– Good discussion of the properties and pros/cons of most goodness-of-fit tests for discrete data; accessible, tutorial-like
