Professor Ke-Sheng Cheng

1 STATISTICAL HYPOTHESIS TESTS (III): Nonparametric Goodness-of-fit (GOF) Tests
Professor Ke-Sheng Cheng Department of Bioenvironmental Systems Engineering National Taiwan University

2 Description of Nonparametric Problems
Until now, in the estimation and hypothesis-testing problems, we have assumed that the available observations come from distributions whose exact form is known, even though the values of some parameters are unknown. In other words, we have assumed that the observations come from a certain parametric family of distributions, and that a statistical inference must be made about the values of the parameters defining that family. 12/7/2019 Dept of Bioenvironmental Systems Engineering National Taiwan University

3 In many situations, we do not assume that the available observations come from a particular family of distributions. Instead, we want to study inferences that can be made about the distribution from which the observations come, without making special assumptions about the form of that distribution.

4 For example, we might simply assume that the observations form a random sample from a continuous distribution, without specifying the form of this distribution any further, and then investigate the possibility that this distribution is a normal distribution.

5 Problems in which the possible distributions of the observations are not restricted to a specific parametric family are called nonparametric problems, and the statistical methods that are applicable in such problems are called nonparametric methods.

6 Goodness-of-fit test
A very common statistical problem in hydrological frequency analysis and water resources planning is whether the available observations (a random sample available to us) come from a particular type of distribution. For example, before we can estimate the 24-hour rainfall depth with a 100-year return period, we must decide (identify) the type of probability distribution of the rainfall data (the annual maximum series) through statistical tests.

7 Let’s consider statistical problems based on data in which each observation can be classified as belonging to one of a finite number of possible categories. Suppose a large population consists of data of k different categories, and let $p_i$ denote the probability that an observation will belong to category i (i = 1, 2, …, k). Of course, $p_i \ge 0$ for i = 1, 2, …, k and $\sum_{i=1}^{k} p_i = 1$.

11 Therefore, it seems reasonable to base a test on the differences $N_i - n p_i^0$ between the observed count $N_i$ in category i and the count $n p_i^0$ expected under $H_0$ in a sample of size n, for i = 1, 2, …, k, and to reject $H_0$ when the magnitudes of these differences are relatively large.
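A minimal R sketch of these differences, assuming hypothetical counts for k = 4 categories and a hypothesized probability vector p0 (the variable names and the data are illustrative, not taken from the slides). Here N holds the observed counts $N_i$, n is the sample size, and n * p0 gives the expected counts $n p_i^0$ under $H_0$.

```r
# Hypothetical observed counts N_i for k = 4 categories (illustrative data only)
N  <- c(18, 32, 27, 23)
# Hypothesized category probabilities p_i^0 under H0 (must sum to 1)
p0 <- c(0.25, 0.25, 0.25, 0.25)
n  <- sum(N)              # sample size

expected <- n * p0        # expected counts n * p_i^0 under H0
diffs    <- N - expected  # differences N_i - n * p_i^0

print(data.frame(category = seq_along(N), observed = N,
                 expected = expected, difference = diffs))
```

Because $\sum_i N_i = n$ and $\sum_i n p_i^0 = n$, the differences always sum to zero, so only k − 1 of them can vary freely, which is consistent with the k − 1 degrees of freedom used by the chi-square test.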

12 Chi-square GOF test
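A sketch of the chi-square GOF test in R, assuming the standard Pearson statistic $Q = \sum_{i=1}^{k} (N_i - n p_i^0)^2 / (n p_i^0)$, where $N_i$ is the observed count in category i, n the sample size, and $p_i^0$ the hypothesized probability. For large n, Q is approximately chi-square with k − 1 degrees of freedom under $H_0$, and $H_0$ is rejected when Q is large. The counts and probabilities below are hypothetical.

```r
# Hypothetical observed counts and hypothesized probabilities (illustrative only)
N  <- c(18, 32, 27, 23)
p0 <- c(0.25, 0.25, 0.25, 0.25)
n  <- sum(N)
k  <- length(N)

expected <- n * p0
# Pearson chi-square statistic: squared differences scaled by the expected counts
Q <- sum((N - expected)^2 / expected)

# Under H0, Q is approximately chi-square with k - 1 degrees of freedom (large n)
p_value  <- pchisq(Q, df = k - 1, lower.tail = FALSE)
critical <- qchisq(0.95, df = k - 1)   # critical value at the 5% level
reject   <- Q > critical

print(c(Q = Q, p_value = p_value, critical = critical))

# Cross-check with R's built-in test, which computes the same statistic
builtin <- chisq.test(N, p = p0)
print(builtin)
```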

18 Number of categories vs. sample size

20 Kolmogorov-Smirnov GOF test
The chi-square test compares the empirical histogram against the theoretical histogram. In contrast, the K-S test compares the empirical cumulative distribution function (ECDF) against the theoretical CDF.
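The comparison between the ECDF and the theoretical CDF can be sketched in R with the base function ecdf(). This assumes a hypothetical sample of 50 values and a hypothesized N(30, 10²) distribution; both are illustrative choices, not from the slides.

```r
set.seed(1)
# Hypothetical sample (illustrative only): 50 draws from N(30, 10^2)
x <- rnorm(50, mean = 30, sd = 10)

# ecdf() returns the empirical CDF F_n as a step function:
# F_n(t) = (number of observations <= t) / n
Fn <- ecdf(x)

# Evaluate both CDFs on a grid of points and compare them side by side
grid <- seq(0, 60, by = 10)
F0   <- pnorm(grid, mean = 30, sd = 10)   # hypothesized F(t)
comparison <- data.frame(
  t           = grid,
  empirical   = Fn(grid),                 # F_n(t)
  theoretical = F0,
  gap         = abs(Fn(grid) - F0)        # vertical distance |F_n(t) - F(t)|
)
print(comparison)
```

In an interactive session, plot(Fn) followed by curve(pnorm(x, 30, 10), add = TRUE) overlays the two curves.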

24 To measure the difference between $F_n(x)$ and $F(x)$, ECDF statistics based on the vertical distances between $F_n(x)$ and $F(x)$ have been proposed; the Kolmogorov-Smirnov statistic uses the largest such distance, $D_n = \sup_x |F_n(x) - F(x)|$.
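For a continuous hypothesized F, the supremum is attained at (or just before) one of the jumps of $F_n$, so $D_n$ can be computed from the order statistics $x_{(1)} \le \dots \le x_{(n)}$ as $D_n = \max(D^+, D^-)$ with $D^+ = \max_i \{ i/n - F(x_{(i)}) \}$ and $D^- = \max_i \{ F(x_{(i)}) - (i-1)/n \}$. The helper ks_distance below is a hypothetical name for this sketch, and the sample is illustrative; the result is cross-checked against ks.test.

```r
# Hypothetical helper: K-S distance D_n = sup_x |F_n(x) - F(x)| for a continuous CDF
ks_distance <- function(x, cdf, ...) {
  n <- length(x)
  u <- cdf(sort(x), ...)           # F evaluated at the order statistics
  i <- seq_len(n)
  d_plus  <- max(i / n - u)        # F_n above F (value of F_n right at each jump)
  d_minus <- max(u - (i - 1) / n)  # F above F_n (value of F_n just before each jump)
  max(d_plus, d_minus)
}

set.seed(1)
x <- rnorm(50, mean = 30, sd = 10)  # hypothetical sample (illustrative only)
D <- ks_distance(x, pnorm, mean = 30, sd = 10)

# Should agree with the statistic reported by R's ks.test
D_builtin <- unname(ks.test(x, "pnorm", 30, 10)$statistic)
print(c(D = D, D_builtin = D_builtin))
```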

32 Tabulated values for the Kolmogorov-Smirnov test
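For large n, critical values of $D_n$ can be approximated from the limiting distribution of $\sqrt{n} D_n$ under $H_0$, $H(t) = 1 - 2\sum_{j=1}^{\infty} (-1)^{j-1} e^{-2 j^2 t^2}$: reject $H_0$ at level α when $D_n > t_\alpha / \sqrt{n}$, where $H(t_\alpha) = 1 - \alpha$. This is a sketch of the asymptotic approximation only (the function names are hypothetical); for small n the exact tabulated values should be preferred.

```r
# Hypothetical helper: limiting CDF H(t) of sqrt(n) * D_n under H0 (truncated series)
kolmogorov_cdf <- function(t, terms = 100) {
  if (t <= 0) return(0)
  j <- seq_len(terms)
  1 - 2 * sum((-1)^(j - 1) * exp(-2 * j^2 * t^2))
}

# Hypothetical helper: approximate large-sample critical value of D_n at level alpha
ks_critical_approx <- function(n, alpha = 0.05) {
  # Solve H(t) = 1 - alpha for t, then rescale by sqrt(n)
  t_alpha <- uniroot(function(t) kolmogorov_cdf(t) - (1 - alpha),
                     lower = 0.3, upper = 3, tol = 1e-10)$root
  t_alpha / sqrt(n)
}

print(ks_critical_approx(1))              # t_alpha for alpha = 0.05
print(ks_critical_approx(50, alpha = 0.05))
```

With α = 0.05 this gives $t_\alpha \approx 1.358$, so for n = 50 the approximate critical value is about 1.358/√50 ≈ 0.192.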

33 Goodness-of-fit tests using R
χ² GOF test: chisq.test. This function does not account for any parameters estimated from the data when determining the expected values; the degrees of freedom of the test statistic are always k − 1.
Kolmogorov-Smirnov GOF test: ks.test (one-sample test)
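When m parameters of the hypothesized distribution were estimated from the same data, a common adjustment is to keep the statistic from chisq.test but recompute the p-value with k − 1 − m degrees of freedom. A sketch with hypothetical bin counts and hypothetical fitted probabilities (m = 2 is an assumption for illustration). Strictly, the k − 1 − m rule holds when the parameters are estimated from the grouped counts; with estimates from the raw data, the true null distribution lies between the chi-square distributions with k − 1 − m and k − 1 degrees of freedom.

```r
# Hypothetical observed counts in k = 5 bins (illustrative data only)
observed <- c(8, 22, 40, 21, 9)
# Hypothetical bin probabilities from a fitted distribution whose
# m = 2 parameters were estimated from the same data (assumption)
p_fit <- c(0.07, 0.24, 0.38, 0.24, 0.07)
m <- 2

res <- chisq.test(observed, p = p_fit)   # always uses df = k - 1
k <- length(observed)

# Recompute the p-value with k - 1 - m degrees of freedom
p_adjusted <- pchisq(unname(res$statistic), df = k - 1 - m, lower.tail = FALSE)
print(c(statistic = unname(res$statistic), p_default = res$p.value,
        p_adjusted = p_adjusted))
```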

34 ks.test(x, y, parameters, alternative = "…")
where x is the data vector to be tested, y is a character string naming the hypothesized CDF (e.g., "pnorm"), parameters are the values of the distribution parameters, passed on to the function named by y, and alternative is a character string ("two.sided", "less", or "greater") selecting a two-tailed or one-tailed test. Examples:
ks.test(x, "pnorm", 30, 10, alternative = "two.sided")
ks.test(x, "pexp", 0.2, alternative = "greater")
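A runnable version of these calls, assuming hypothetical simulated data in place of x. The extra arguments 30, 10 are passed to pnorm as mean and sd, and 0.2 to pexp as rate. With alternative = "greater", the alternative is that the CDF of the data lies above the hypothesized CDF, and the reported statistic is $D^+ = \max_x \{F_n(x) - F(x)\}$.

```r
set.seed(42)
# Hypothetical data (illustrative only)
x_norm <- rnorm(40, mean = 30, sd = 10)   # sample for the normal example
x_exp  <- rexp(40, rate = 0.2)            # sample for the exponential example

# Two-sided test of H0: the data come from N(mean = 30, sd = 10)
res_norm <- ks.test(x_norm, "pnorm", 30, 10, alternative = "two.sided")

# One-sided test against Exp(rate = 0.2)
res_exp <- ks.test(x_exp, "pexp", 0.2, alternative = "greater")

print(res_norm)
print(res_exp)
```

The parameter values must be specified in advance. If they are instead estimated from the same sample (for example, mean(x_norm) and sd(x_norm)), the standard K-S p-values are no longer valid and tend to be conservative; a corrected procedure such as the Lilliefors test is then more appropriate.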

