Ch07 Goodness-of-Fit Tests Applications
CHAPTER CONTENTS
7.1 Introduction
7.2 The Chi-Square Tests for Count Data
7.3 Goodness-of-Fit Tests to Identify the Probability Distribution
7.4 Applications: Parametric Analysis
7.5 Exercises
7.6 Chapter Summary
7.7 Computer Examples
Projects for Chapter 7
Objective of this chapter: To determine if a given set of data follows a particular probability distribution.
Karl Pearson (1857-1936) In 1893, Pearson coined the term “standard deviation.”
7.1 Introduction
Phenomenon of interest:
- the amount of carbon dioxide, CO2, in the atmosphere on a daily basis
- the sizes of cancerous breast tumors
- the monthly average rainfall in the State of Florida
- the average monthly unemployment rate in the United States
- the hourly wind forces of a hurricane
- etc.
What are the probabilistic behaviors of these phenomena? That is, what are the probability density functions (pdfs) that characterize these phenomena of interest?
Statistical tests (methods) for determining how well the data fit a particular probability distribution:
- Pearson’s chi-square test
- Kolmogorov-Smirnov test
- Anderson-Darling test
- Shapiro-Wilk test
- P-P plots
- Q-Q plots
- Nonparametric (probability distribution-free) analysis (Ch. 12)
7.2 The Chi-Square Tests for Count Data
How likely is it that an observed probability distribution is due to chance?
χ² Test
7.2.1 TESTING THE PARAMETERS OF A MULTINOMIAL DISTRIBUTION: GOODNESS-OF-FIT TEST
The χ² goodness-of-fit test.
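As an illustration of the χ² goodness-of-fit test for multinomial counts, the sketch below computes the usual statistic, the sum over categories of (O_i - E_i)²/E_i with E_i = n·p_i, and compares it with a tabulated critical value. The die-roll counts, the function name `chi_square_gof`, and the choice of a 5% level are assumptions made for this example, not material from the slides.

```python
def chi_square_gof(observed, probs):
    """Return (statistic, degrees of freedom) for a multinomial
    goodness-of-fit test with fully specified cell probabilities."""
    if len(observed) != len(probs):
        raise ValueError("observed and probs must have the same length")
    n = sum(observed)
    expected = [n * p for p in probs]  # E_i = n * p_i under H0
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return stat, len(observed) - 1


# Hypothetical data: 120 rolls of a die; H0 says every face has probability 1/6.
observed = [18, 23, 16, 21, 18, 24]
stat, df = chi_square_gof(observed, [1 / 6] * 6)

# Upper 5% point of the chi-square distribution with 5 degrees of freedom
# (standard table value).
CRITICAL_5PCT_DF5 = 11.070
reject = stat > CRITICAL_5PCT_DF5
print(f"chi-square = {stat:.3f}, df = {df}, reject H0: {reject}")
```

With these counts every expected frequency is 20, so the statistic is small and H0 is not rejected at the 5% level.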
7.2.2 CONTINGENCY TABLE: TEST FOR INDEPENDENCE Objective: To test for dependencies between the rows and columns in a contingency table.
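A minimal sketch of the test for independence follows. Under the hypothesis of independence, the expected count in cell (i, j) is (row total i × column total j) / grand total, and the χ² statistic has (r - 1)(c - 1) degrees of freedom. The 2×2 table and the function name `chi_square_independence` are hypothetical, chosen only to keep the arithmetic easy to follow.

```python
def chi_square_independence(table):
    """Return (statistic, degrees of freedom) for an r x c contingency table
    given as a list of rows of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            # Expected count if rows and columns are independent.
            exp = row_totals[i] * col_totals[j] / total
            stat += (obs - exp) ** 2 / exp
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df


# Hypothetical 2 x 2 table of counts.
table = [[30, 20],
         [20, 30]]
stat, df = chi_square_independence(table)

# Upper 5% point of the chi-square distribution with 1 degree of freedom.
CRITICAL_5PCT_DF1 = 3.841
reject = stat > CRITICAL_5PCT_DF1
print(f"chi-square = {stat:.3f}, df = {df}, reject independence: {reject}")
```

Here every expected count is 25, so each cell contributes 1 to the statistic and independence is rejected at the 5% level.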
7.3 Goodness-of-Fit Tests to Identify the Probability Distribution
Null hypothesis: X ~ pdf ⇒ expected values E_i
Observations: (X_i) ⇒ observed values O_i
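7.3.1 PEARSON’S CHI-SQUARE TEST
[Figure: the range of the data is split into intervals I1, ..., I6 (in general Ik); for each interval the observed frequency Ok is compared with the expected frequency ek obtained from the interval's probability under the hypothesized distribution.]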
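The interval-based comparison can be sketched in code: bin the data, compute each interval's probability from the hypothesized cdf, multiply by n to get the expected frequency, and form the χ² statistic. The sample values, the cut points (quartiles of the standard normal), and the function name `binned_chi_square` are assumptions made for illustration only.

```python
from statistics import NormalDist


def binned_chi_square(data, cuts, cdf):
    """Pearson chi-square for continuous data.

    `cuts` are increasing interior cut points c_1 < ... < c_{k-1}, defining
    the intervals (-inf, c_1], (c_1, c_2], ..., (c_{k-1}, +inf).
    Returns (observed counts, expected counts, statistic).
    """
    n = len(data)
    k = len(cuts) + 1
    observed = [0] * k
    for x in data:
        # Interval index = number of cut points strictly below x.
        observed[sum(1 for c in cuts if x > c)] += 1
    cum = [0.0] + [cdf(c) for c in cuts] + [1.0]
    expected = [n * (cum[i + 1] - cum[i]) for i in range(k)]
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return observed, expected, stat


# Hypothetical sample of size 20; H0: the data are standard normal.
data = [-1.8, -1.2, -0.9, -0.7, -0.5, -0.4, -0.3, -0.1, 0.0, 0.1,
        0.2, 0.3, 0.5, 0.6, 0.8, 0.9, 1.1, 1.3, 1.6, 2.1]
h0 = NormalDist(0.0, 1.0)
cuts = [h0.inv_cdf(0.25), 0.0, h0.inv_cdf(0.75)]  # four equal-probability intervals
observed, expected, stat = binned_chi_square(data, cuts, h0.cdf)

# With k = 4 intervals and no estimated parameters, df = k - 1 = 3;
# the tabulated upper 5% point is 7.815.
reject = stat > 7.815
print(observed, [round(e, 2) for e in expected], round(stat, 3), reject)
```

If parameters of the hypothesized distribution are estimated from the data, the degrees of freedom are reduced by the number of estimated parameters.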
7.3.2 THE KOLMOGOROV-SMIRNOV TEST (ONE POPULATION)
H0: The true cumulative distribution of the given data, F(x), is the assumed distribution F0(x).
Ha: The actual cumulative distribution F(x) is not F0(x).
Test statistic: $D = \max_x \left| F_0(x) - F_n(x) \right|$
where $F_0(x)$ is the hypothesized cdf and $F_n(x)$ is the empirical distribution function, $F_n(x) = \#\{X_i \le x\}/n$.
Critical value: $D_\alpha$ (from the Kolmogorov-Smirnov tables)
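A short sketch of how the Kolmogorov-Smirnov statistic can be computed: because the empirical distribution function is a step function that jumps by 1/n at each ordered observation, the largest gap from F0 occurs at a data point, either just before or just after the jump. The sample and the function name `ks_statistic` are hypothetical; the resulting D would still be compared with the tabulated critical value for the sample size.

```python
from statistics import NormalDist


def ks_statistic(data, cdf):
    """Return D = max |F0(x) - Fn(x)| for a fully specified cdf F0."""
    xs = sorted(data)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f0 = cdf(x)
        # Fn jumps from (i-1)/n to i/n at the i-th ordered value,
        # so check the gap on both sides of the jump.
        d = max(d, i / n - f0, f0 - (i - 1) / n)
    return d


# Hypothetical sample; H0: the data follow a standard normal distribution.
sample = [-1.8, -1.2, -0.9, -0.7, -0.5, -0.4, -0.3, -0.1, 0.0, 0.1,
          0.2, 0.3, 0.5, 0.6, 0.8, 0.9, 1.1, 1.3, 1.6, 2.1]
d = ks_statistic(sample, NormalDist(0.0, 1.0).cdf)
print(f"D = {d:.4f} for n = {len(sample)}")
```

H0 is rejected when D exceeds the tabulated critical value at the chosen significance level.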
7.3.3 THE ANDERSON-DARLING TEST
H0: The given data follow a specific probability distribution.
Ha: The given data do not follow the specified probability distribution.
Test statistic: $A^2 = -n - S$, where $S$ is a weighted sum of log-cdf terms evaluated at the ordered observations.
Critical value: $A_\alpha$ (from the Anderson-Darling tables)
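For reference, the commonly used computing form of the Anderson-Darling statistic is A² = -n - S with S = Σ_{i=1}^{n} ((2i - 1)/n) [ln F(Y_i) + ln(1 - F(Y_{n+1-i}))], where Y_1 ≤ ... ≤ Y_n are the ordered data and F is the hypothesized cdf. The sketch below implements that form; the function name `anderson_darling` and the sample are hypothetical, and the code assumes 0 < F(Y_i) < 1 for every observation so that the logarithms are defined.

```python
import math
from statistics import NormalDist


def anderson_darling(data, cdf):
    """Return A^2 = -n - S for a fully specified cdf.

    Assumes 0 < cdf(y) < 1 at every data point (otherwise log fails).
    """
    ys = sorted(data)
    n = len(ys)
    s = 0.0
    for i in range(1, n + 1):
        lower = math.log(cdf(ys[i - 1]))        # ln F(Y_i)
        upper = math.log(1.0 - cdf(ys[n - i]))  # ln(1 - F(Y_{n+1-i}))
        s += (2 * i - 1) / n * (lower + upper)
    return -n - s


# Hypothetical sample; H0: the data follow a standard normal distribution.
sample = [-1.8, -1.2, -0.9, -0.7, -0.5, -0.4, -0.3, -0.1, 0.0, 0.1,
          0.2, 0.3, 0.5, 0.6, 0.8, 0.9, 1.1, 1.3, 1.6, 2.1]
a2 = anderson_darling(sample, NormalDist(0.0, 1.0).cdf)
print(f"A^2 = {a2:.4f}")
```

Compared with the Kolmogorov-Smirnov statistic, the logarithmic terms give more weight to discrepancies in the tails. The critical value depends on the hypothesized distribution, so it is read from the table for that distribution.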
7.3.4 SHAPIRO-WILK NORMALITY TEST
7.3.5 THE P-P PLOTS AND Q-Q PLOTS To determine if a given random sample of data follows or is drawn from a well-known probability distribution
The P-P plot compares the empirical cdf (of the given data) with the assumed true cdf.
The Q-Q plot compares the quantiles of the empirical distribution of the given data with the quantiles of the assumed true distribution that we are testing.
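The coordinates behind both plots can be sketched without any plotting library. The plotting positions (i - 0.5)/n used below are one common convention and are an assumption here, as are the sample and the function name `pp_qq_points`; if the hypothesized distribution fits, both sets of points lie close to a straight line (the 45-degree line for the P-P plot).

```python
from statistics import NormalDist


def pp_qq_points(data, dist):
    """Return (pp, qq) coordinate lists for a hypothesized distribution.

    pp: (theoretical cdf at x_(i), empirical probability p_i)
    qq: (theoretical quantile at p_i, ordered observation x_(i))
    """
    xs = sorted(data)
    n = len(xs)
    # Plotting positions: an assumed convention, other choices exist.
    probs = [(i - 0.5) / n for i in range(1, n + 1)]
    pp = [(dist.cdf(x), p) for x, p in zip(xs, probs)]
    qq = [(dist.inv_cdf(p), x) for x, p in zip(xs, probs)]
    return pp, qq


# Hypothetical sample compared against a standard normal distribution.
sample = [-1.2, -0.4, 0.1, 0.3, 0.9, 1.5]
pp, qq = pp_qq_points(sample, NormalDist(0.0, 1.0))
for (t_cdf, e_cdf), (t_q, obs) in zip(pp, qq):
    print(f"P-P: ({t_cdf:.3f}, {e_cdf:.3f})   Q-Q: ({t_q:.3f}, {obs:.3f})")
```

Passing either list to a scatter plot routine gives the corresponding diagnostic plot.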
7.4 Applications: Parametric Analysis
7.5 Exercises
7.6 Chapter Summary
7.7 Computer Examples
Projects for Chapter 7