Goodness-of-Fit Tests


Goodness-of-Fit Tests Copyright (c) 2008 by The McGraw-Hill Companies. This material is intended solely for educational purposes by licensed users of LearningStats. It may not be copied or resold for profit.

Hypotheses The null hypothesis in a GOF test is that the sample came from a specific population (e.g., normal, uniform, Poisson, binomial). Does the sample contradict the hypothesis?

Parameters The parameters of the proposed distribution (e.g., its mean) might be specified a priori (e.g., from a non-sample benchmark) but more commonly they must be estimated from the sample.

Normality Test The most common test is for normality. MINITAB and other computer packages offer various normality tests.

Checking Normality We can: Inspect the histogram. Check the Empirical Rule. Compare the sample mean and median (symmetry). Check the skewness and kurtosis statistics. Do a chi-square test for normality. Inspect the probability plot. Inspect the ECDF plot. Do ECDF-based tests (Anderson-Darling, Kolmogorov-Smirnov, etc.).
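As a rough illustration of the Empirical Rule check in the list above, the sketch below computes the share of observations within 1, 2, and 3 sample standard deviations of the mean. For normal data these shares should be near 68%, 95%, and 99.7%. The function name `empirical_rule_shares` and the sample values are hypothetical; they are not the Derby data.

```python
from statistics import fmean, stdev

def empirical_rule_shares(data):
    """Share of observations within k sample standard deviations of the mean, k = 1, 2, 3."""
    mean, sd = fmean(data), stdev(data)
    n = len(data)
    return {k: sum(1 for x in data if abs(x - mean) <= k * sd) / n for k in (1, 2, 3)}

# Hypothetical sample (illustrative values only)
sample = [119.4, 120.2, 121.0, 121.4, 121.8, 122.0, 122.2, 122.6, 123.0, 123.8, 124.6]
for k, share in empirical_rule_shares(sample).items():
    print(f"within {k} s.d.: {share:.1%}")
```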

Histogram We have a sample of Kentucky Derby winning times (in seconds) from 1950 through 1999. Are the data normally distributed? The histogram appears symmetric, though perhaps with too much concentration in the middle. Source: David P. Doane, Kieran Mathieson, and Ronald L. Tracy, Visual Statistics 2.0 (Irwin/McGraw-Hill, 2001).

Descriptive Statistics For the Kentucky Derby winning times, the mean and median (Quartile 2) are identical. For the fitted data, we calculate the likely range, and find that the sample range is slightly less than expected. The data are symmetric (skewness near 0) but somewhat leptokurtic (kurtosis above 3). Source: David P. Doane, Kieran Mathieson, and Ronald L. Tracy, Visual Statistics 2.0 (Irwin/McGraw-Hill, 2001).
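The skewness and kurtosis comparison can be sketched with simple moment formulas. This is a minimal sketch that assumes the population-moment definitions, under which a normal distribution has skewness 0 and kurtosis 3; statistical packages often apply small-sample adjustments or report excess kurtosis (kurtosis minus 3), so their output may differ. The function name and the sample values are hypothetical.

```python
from statistics import fmean, median

def skewness_kurtosis(data):
    """Moment-based skewness and (non-excess) kurtosis; a normal has 0 and 3."""
    n = len(data)
    mean = fmean(data)
    m2 = sum((x - mean) ** 2 for x in data) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in data) / n  # third central moment
    m4 = sum((x - mean) ** 4 for x in data) / n  # fourth central moment
    return m3 / m2 ** 1.5, m4 / m2 ** 2

# Hypothetical sample (illustrative values only)
sample = [119.4, 120.2, 121.0, 121.4, 121.8, 122.0, 122.2, 122.6, 123.0, 123.8, 124.6]
skew, kurt = skewness_kurtosis(sample)
print(f"mean={fmean(sample):.2f} median={median(sample):.2f} "
      f"skewness={skew:.3f} kurtosis={kurt:.3f}")
```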

Chi-Square Test Hypotheses: H0: Winning Derby times are normal. H1: Winning Derby times are not normal. Test statistic: $\chi^2 = \sum_{j=1}^{c} (f_j - e_j)^2 / e_j$, where $f_j$ = the observed frequency in group j and $e_j$ = the expected frequency in group j if H0 is true. For c classes we use d.f. = c - 1 - m, where m is the number of parameters estimated to fit the distribution (example: m = 2 for a normal if μ and σ are estimated).
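The statistic and degrees of freedom described above can be sketched as follows. This is a minimal illustration, assuming the mean and standard deviation are estimated from the sample (so m = 2) and that the two end classes are open-ended. The function name `chi_square_normal`, the cut points, and the data are hypothetical. Obtaining a p-value additionally requires a chi-square table or a statistics package.

```python
from statistics import NormalDist, fmean, stdev

def chi_square_normal(data, cut_points):
    """Chi-square GOF statistic for a fitted normal.

    cut_points: ascending interior class boundaries; the first and last
    classes are open-ended. Returns (statistic, degrees of freedom).
    """
    n = len(data)
    fitted = NormalDist(fmean(data), stdev(data))  # m = 2 estimated parameters
    bounds = [float("-inf")] + list(cut_points) + [float("inf")]
    stat = 0.0
    for lo, hi in zip(bounds, bounds[1:]):
        observed = sum(1 for x in data if lo <= x < hi)           # f_j
        p_lo = 0.0 if lo == float("-inf") else fitted.cdf(lo)
        p_hi = 1.0 if hi == float("inf") else fitted.cdf(hi)
        expected = n * (p_hi - p_lo)                              # e_j under H0
        stat += (observed - expected) ** 2 / expected
    c = len(bounds) - 1                                           # number of classes
    return stat, c - 1 - 2

# Hypothetical data and class boundaries (illustrative only)
data = [float(x) for x in range(1, 21)]
stat, df = chi_square_normal(data, [5.5, 10.5, 15.5])
print(f"chi-square = {stat:.3f}, d.f. = {df}")
```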

Chi-Square Test Hypotheses: H0: Winning Derby times are normal. H1: Winning Derby times are not normal. Since the p-value is 0.009, the chi-square test statistic is significant at α = 0.01, leading to rejection of the hypothesis of normality. Most of the problem is in the middle category, as we noted in the histogram. The end categories of the fitted normal are open-ended. We prefer that all expected frequencies be at least 5. Source: David P. Doane, Kieran Mathieson, and Ronald L. Tracy, Visual Statistics 2.0 (Irwin/McGraw-Hill, 2001).

ECDF Plot An empirical cumulative distribution function (ECDF) plot shows the cumulative relative frequency against the sorted sample values. Sometimes a fitted normal distribution is displayed as well. This plot is always done by a computer, not by hand. This ECDF plot approximately follows the S-shape that characterizes a normal distribution. The resemblance is enhanced by superimposing a fitted normal cumulative distribution. Source: David P. Doane, Kieran Mathieson, and Ronald L. Tracy, Visual Statistics 2.0 (Irwin/McGraw-Hill, 2001).
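The points of an ECDF plot can be computed in a few lines. The sketch below returns (value, cumulative relative frequency) pairs; the function name `ecdf` and the sample values are hypothetical, and the plotting itself is left to whatever package is at hand.

```python
def ecdf(data):
    """Return (x, cumulative relative frequency) pairs for the sorted sample."""
    xs = sorted(data)
    n = len(xs)
    return [(x, (i + 1) / n) for i, x in enumerate(xs)]

# Hypothetical sample (illustrative values only)
for x, share in ecdf([122.0, 119.4, 121.4, 123.8, 120.2]):
    print(f"{x:7.1f}  {share:.2f}")
```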

Kolmogorov-Smirnov Test Hypotheses: H0: Winning Derby times are normal. H1: Winning Derby times are not normal. The K-S test is based on the largest vertical distance between the ECDF and the fitted normal cumulative distribution. Special tables are required if you want a p-value. This test is always done by a computer, not by hand. It is less common than the chi-square or Anderson-Darling test. The K-S statistic (D = 0.138) isn't significant at α = 0.20, so there isn't much evidence against H0. However, the K-S is not the most powerful test. Source: David P. Doane, Kieran Mathieson, and Ronald L. Tracy, Visual Statistics 2.0 (Irwin/McGraw-Hill, 2001).
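A minimal sketch of the K-S statistic D for a fitted normal follows, assuming the mean and standard deviation are estimated from the sample. Because the ECDF is a step function, the gap is checked just before and just after each step. The function name and the data are hypothetical. No p-value is computed here: when the parameters are estimated from the same sample, the standard K-S tables do not apply directly, which is one reason special tables are needed.

```python
from statistics import NormalDist, fmean, stdev

def ks_statistic_normal(data):
    """Largest vertical distance between the ECDF and the fitted normal CDF."""
    xs = sorted(data)
    n = len(xs)
    fitted = NormalDist(fmean(xs), stdev(xs))
    d = 0.0
    for i, x in enumerate(xs):
        f = fitted.cdf(x)
        # ECDF jumps from i/n to (i+1)/n at x; check both sides of the step
        d = max(d, (i + 1) / n - f, f - i / n)
    return d

# Hypothetical sample (illustrative values only)
sample = [119.4, 120.2, 121.0, 121.4, 121.8, 122.0, 122.2, 122.6, 123.0, 123.8, 124.6]
print(f"D = {ks_statistic_normal(sample):.3f}")
```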

Probability Plot If the data are normal, the normal probability plot should be a straight line. This plot is always done by a computer, not by hand. Overall, the probability plot is fairly linear, although there are a few points at the ends and in the middle that are somewhat off the 45° line. Source: David P. Doane, Kieran Mathieson, and Ronald L. Tracy, Visual Statistics 2.0 (Irwin/McGraw-Hill, 2001).
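The coordinates behind a normal probability plot can be sketched as below: each sorted observation is paired with the standard normal quantile at its plotting position. The plotting position (i - 0.5)/n is an assumption made for this sketch; packages differ in the formula they use. The function name and the sample values are hypothetical.

```python
from statistics import NormalDist

def normal_plot_points(data):
    """Pair each sorted observation with a standard normal quantile.

    Uses the plotting position (i - 0.5) / n, one of several conventions.
    """
    xs = sorted(data)
    n = len(xs)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    return [(std_normal.inv_cdf((i + 0.5) / n), x) for i, x in enumerate(xs)]

# Hypothetical sample; roughly linear points suggest approximate normality
for z, x in normal_plot_points([122.0, 119.4, 121.4, 123.8, 120.2]):
    print(f"{z:6.3f}  {x:7.1f}")
```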

Anderson-Darling Test Hypotheses: H0: Winning Derby times are normal. H1: Winning Derby times are not normal. This result is from Minitab. The A-D test is always done by a computer, not by hand. The A-D test is powerful, but lacks the intuitive interpretation of a chi-square test. The A-D test statistic (0.708) yields a p-value of 0.061, so we could reject H0 at α = 0.10 but not at α = 0.05. Copyright Notice: Portions of MINITAB Statistical Software input and output contained in this document are printed with permission of Minitab, Inc. MINITAB™ is a trademark of Minitab Inc. in the United States and other countries and is used herein with the owner's permission.
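For readers curious about what the software computes, here is a minimal sketch of the usual A-D statistic for a fitted normal, $A^2 = -n - \frac{1}{n}\sum_{i=1}^{n}(2i-1)\left[\ln F(x_{(i)}) + \ln\left(1 - F(x_{(n+1-i)})\right)\right]$, where F is the fitted normal CDF and the $x_{(i)}$ are the sorted observations. It assumes the mean and standard deviation are estimated from the sample and does not compute a p-value; whether a given package applies further small-sample adjustments is not covered here. The function name and data are hypothetical.

```python
from math import log
from statistics import NormalDist, fmean, stdev

def anderson_darling_normal(data):
    """A-D statistic A^2 comparing the sample with a fitted normal."""
    xs = sorted(data)
    n = len(xs)
    fitted = NormalDist(fmean(xs), stdev(xs))
    cdf = [fitted.cdf(x) for x in xs]
    total = 0.0
    for i in range(1, n + 1):
        # weight (2i - 1) puts extra emphasis on the tails
        total += (2 * i - 1) * (log(cdf[i - 1]) + log(1.0 - cdf[n - i]))
    return -n - total / n

# Hypothetical sample (illustrative values only)
sample = [119.4, 120.2, 121.0, 121.4, 121.8, 122.0, 122.2, 122.6, 123.0, 123.8, 124.6]
print(f"A^2 = {anderson_darling_normal(sample):.3f}")
```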

Other Tests You can test for non-normal distributions (uniform, Poisson, etc.), but those tests require a computer package designed for that purpose (e.g., Visual Statistics or Palisade's BestFit™).

Example: Uniform Test Hypotheses: H0: 3-digit Lottery winning numbers are uniform. H1: 3-digit Lottery winning numbers are not uniform. Conclusion: Data appear uniform. Source: David P. Doane, Kieran Mathieson, and Ronald L. Tracy, Visual Statistics 2.0 (Irwin/McGraw-Hill, 2001).
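A chi-square test of uniformity can be sketched in the same way as the normality test. Under H0 every class has the same expected frequency, and no parameters are estimated, so m = 0 and d.f. = c - 1. The class counts below are hypothetical, not the lottery data from the slide, and the function name is an assumption of this sketch.

```python
def chi_square_uniform(counts):
    """Chi-square GOF statistic for equally likely classes; returns (statistic, d.f.)."""
    n = sum(counts)
    c = len(counts)
    expected = n / c  # same expected frequency in every class under H0
    stat = sum((f - expected) ** 2 / expected for f in counts)
    return stat, c - 1  # m = 0 parameters estimated

# Hypothetical counts of winning numbers falling in 10 equal-width classes
counts = [11, 9, 10, 12, 8, 10, 9, 11, 10, 10]
stat, df = chi_square_uniform(counts)
print(f"chi-square = {stat:.2f}, d.f. = {df}")
```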

Summary

Test | Method | Comments
Eyeball method | Inspect the histogram and see if it looks like the hypothesized distribution. | Imprecise but easy. May suffice to rule out the proposed distribution.
Chi-square test | Form k categories, count the X values in each category, and use the chi-square test to compare the actual count with the expected count if X follows the hypothesized distribution. | Common. Must avoid small expected frequencies. Need special software (Excel, Minitab, Visual Statistics, BestFit).
Kolmogorov-Smirnov test | Find the greatest difference between the empirical cumulative distribution and the hypothesized distribution. | Easy to visualize. Need specialized software (Visual Statistics, BestFit).
Anderson-Darling test | Compare the empirical cumulative data distribution with the hypothesized distribution. | Widely used by researchers. Not intuitive. Need special software (Minitab, Visual Statistics, BestFit).