Nonparametric test Nonparametric tests are decoupled from the distribution so the tested attribute may also be used in the case of arbitrary distribution,

Fundamentals of Data Analysis Lecture 5 Testing of statistical hypotheses pt.2

Nonparametric test
Nonparametric tests do not depend on the distribution of the tested attribute, so they can also be used when that distribution is arbitrary and not necessarily close to normal. Nonparametric tests can be divided into two groups: goodness-of-fit tests, which test the hypothesis that the population has a certain type of distribution, and tests of the hypothesis that two samples come from one population (i.e., that the two populations have the same distribution).

Chi-square test of goodness of fit
This is one of the oldest statistical tests. It verifies the hypothesis that the population has a certain type of distribution (described by a cumulative distribution function), which may be either continuous or discrete. The only limitation is that the sample must be large, containing at least several dozen results, because the results have to be grouped into classes. The classes should not be too small: at least 8 results should fall into each of them.

The algorithm is as follows:
1. The results are divided into r disjoint classes of sizes $n_i$, so that the sample size is $n = \sum_{i=1}^{r} n_i$. This gives the empirical distribution.
2. We formulate the null hypothesis that the tested population has a distribution function belonging to a certain set of distributions of a specific type.
3. From the hypothetical distribution we calculate, for each of the r classes of values of the investigated quantity, the probability $p_i$ that the random variable takes a value belonging to class i (i = 1, 2, ..., r).
4. We calculate the theoretical sizes $n p_i$ that class i should have if the population has the assumed distribution.
5. From all the empirical sizes $n_i$ and theoretical sizes $n p_i$ we determine the value of the chi-square statistic
$\chi^2 = \sum_{i=1}^{r} \frac{(n_i - n p_i)^2}{n p_i}$,
which, assuming the null hypothesis is true, has (asymptotically) the chi-square distribution with r - 1 degrees of freedom, or r - k - 1 degrees of freedom, where k is the number of parameters estimated from the sample.
6. From the tables of the chi-square distribution, for the selected confidence level 1 - α, we read the critical value $\chi^2_{cr}$ satisfying $P(\chi^2 < \chi^2_{cr}) = 1 - \alpha$.
7. We compare the two values: if $\chi^2 \ge \chi^2_{cr}$, the null hypothesis should be rejected. Otherwise there is no reason to reject the null hypothesis, which does not mean, however, that we can accept it.
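The steps of the chi-square algorithm can be sketched in a few lines of Python. This is only an illustration: the function names `chi_square_statistic` and `reject_null` and the class counts are hypothetical, and the critical value is passed in as a number read from tables rather than computed.

```python
def chi_square_statistic(observed, probabilities):
    """Return (chi2, n) for class sizes n_i and hypothetical class probabilities p_i."""
    if len(observed) != len(probabilities):
        raise ValueError("one probability per class is required")
    n = sum(observed)                                # step 1: n is the sum of n_i
    chi2 = 0.0
    for n_i, p_i in zip(observed, probabilities):
        expected = n * p_i                           # step 4: theoretical size n * p_i
        chi2 += (n_i - expected) ** 2 / expected     # step 5: one term of the statistic
    return chi2, n


def reject_null(chi2, critical_value):
    """Step 7: reject H0 when the statistic reaches the tabulated critical value."""
    return chi2 >= critical_value


# Hypothetical data: 4 equally likely classes, 120 results in total.
observed = [25, 35, 28, 32]
probabilities = [0.25, 0.25, 0.25, 0.25]
chi2, n = chi_square_statistic(observed, probabilities)

# No parameters are estimated here, so there are r - 1 = 3 degrees of freedom;
# 7.815 is the tabulated critical value for the 0.95 confidence level.
print(round(chi2, 3), reject_null(chi2, 7.815))
```

With these made-up counts the statistic stays well below the critical value, so there is no reason to reject the hypothesis of equal class probabilities.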

Example. In a physical experiment the scintillation time is measured. The number of measurements is n = 1000, and the grouped results are given in the table. At the 99% confidence level, test the hypothesis that the time of occurrence of the light effect studied in these experiments is normally distributed.

The task does not specify the parameters of the hypothetical distribution. Our null hypothesis is therefore that F(x) belongs to the class of all normal distribution functions. The two parameters of the distribution, the mean m and the standard deviation σ, are estimated from the sample: m = 0.67 and s = 0.30. Further results are in the table, where $F(u_i)$ is the value of the distribution function of N(0, 1) at the point $u_i = (x_i - m)/s$, the standardized right end of class i. The number of degrees of freedom is r - k - 1 = r - 2 - 1 = 4, since two parameters (the mean and the standard deviation) were calculated from the random sample. From the tables of the chi-square distribution, at the significance level 0.01, we find the critical value $\chi^2_{cr} = 13.277$. The critical value is less than the calculated statistic, equal to 73.52, so the hypothesis of normality should be rejected.
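The two table-based steps of this example, reading $F(u_i)$ and reading the critical value, can be reproduced numerically. The sketch below rests on several assumptions: the class right ends are hypothetical (the original table is not reproduced here), the first and last classes are treated as open-ended, and the helper names are made up. The tail formula is the closed form that holds for exactly 4 degrees of freedom.

```python
import math


def normal_cdf(u):
    """Distribution function F(u) of N(0, 1)."""
    return 0.5 * (1.0 + math.erf(u / math.sqrt(2.0)))


def class_probabilities(right_ends, m, s):
    """p_i for classes given by their right ends (first and last class open-ended)."""
    # F(u_i) at the standardized right ends u_i = (x_i - m) / s; the last class ends at +infinity.
    cdf = [normal_cdf((x - m) / s) for x in right_ends[:-1]] + [1.0]
    return [cdf[0]] + [cdf[i] - cdf[i - 1] for i in range(1, len(cdf))]


def chi2_tail_4df(x):
    """P(chi2 >= x) for exactly 4 degrees of freedom (closed form)."""
    return math.exp(-x / 2.0) * (1.0 + x / 2.0)


# m and s are the estimates from the example; the class ends are hypothetical.
right_ends = [0.2, 0.4, 0.6, 0.8, 1.0, 1.2, 1.4]
p = class_probabilities(right_ends, m=0.67, s=0.30)
print([round(p_i, 4) for p_i in p])

# Tail probability at the tabulated critical value, and the decision for the example.
print(round(chi2_tail_4df(13.277), 4))
print(chi2_tail_4df(73.52) < 0.01)
```

The tail probability at 13.277 comes out at about 0.01, which agrees with the significance level used in the example, while the statistic 73.52 lies far beyond that point.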

Exercise. During an experiment n = 200 measurements were made, and the grouped results are given in the table. At the 95% confidence level, test the hypothesis that the results of the measurements are uniformly distributed.

Mean of the class    $n_i$
45.25                23
45.75                19
46.25                25
46.75                18
47.25                17
47.75                24
48.25                16
48.75                22
49.25                20
49.75

Kolmogorov test of goodness of fit

The Kolmogorov λ test of goodness of fit is suitable for verifying the hypothesis that the population has a specified distribution. In this test the empirical distribution function is compared with the hypothetical one, unlike in the chi-square test, where the sizes of the empirical classes were calculated and compared with the sizes of the hypothetical ones. When the population distribution is consistent with the hypothesis, the values of the empirical and hypothetical distribution functions should be similar at all examined points.

The test starts with the analysis of the differences between the two distribution functions. The largest of these differences is then used to construct the λ statistic, whose distribution does not depend on the form of the hypothetical distribution. This distribution determines the critical value for the test. If the maximum difference at some point in the range of variability of the attribute is too large, the hypothesis that the population has the suspected cumulative distribution function should be rejected.

The usefulness of this test is limited, because the hypothetical distribution must be continuous. In principle we should also know the parameters of this distribution, although for large samples they can be estimated from the sample.

The procedure for the Kolmogorov test is as follows:
1. We sort the results in ascending order or group them into relatively narrow classes with right ends $x_i$ and corresponding sizes $n_i$.
2. For each $x_i$ we calculate the empirical distribution function $F_n(x_i) = \frac{1}{n} \sum_{j=1}^{i} n_j$.
3. From the hypothetical distribution we determine, for each $x_i$, the theoretical value of the distribution function $F(x_i)$.
4. For each $x_i$ we calculate the absolute value of the difference $|F_n(x_i) - F(x_i)|$.
5. We calculate $D = \sup_x |F_n(x) - F(x)|$ and then the value of the statistic $\lambda = \sqrt{n}\, D$, which, when the null hypothesis is true, should have the Kolmogorov distribution.
6. For the chosen confidence level 1 - α we read from the tables of the Kolmogorov distribution the critical value $\lambda_{cr}$ satisfying $P\{\lambda < \lambda_{cr}\} = 1 - \alpha$. When $\lambda \ge \lambda_{cr}$ the null hypothesis should be rejected; otherwise there is no reason to reject it.
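For grouped data, steps 1 to 6 of the Kolmogorov procedure amount to a single pass over the classes. The sketch below is illustrative only: the function names, the uniform null hypothesis and the class sizes are all hypothetical, and the critical value is taken as a tabulated input (here the value quoted in the lecture's example for the 0.95 confidence level).

```python
import math


def kolmogorov_lambda(right_ends, counts, hypothetical_cdf):
    """Return (D, lambda) for grouped data: right ends x_i and class sizes n_i."""
    n = sum(counts)
    cumulative = 0
    d_max = 0.0
    for x_i, n_i in zip(right_ends, counts):
        cumulative += n_i
        f_emp = cumulative / n                     # step 2: F_n(x_i)
        f_hyp = hypothetical_cdf(x_i)              # step 3: F(x_i)
        d_max = max(d_max, abs(f_emp - f_hyp))     # steps 4-5: largest difference D
    return d_max, math.sqrt(n) * d_max             # step 5: lambda = sqrt(n) * D


def uniform_cdf(x, a=0.0, b=10.0):
    """Hypothetical H0: uniform distribution on [a, b]."""
    return min(1.0, max(0.0, (x - a) / (b - a)))


# Hypothetical grouped sample, n = 100.
right_ends = [2.0, 4.0, 6.0, 8.0, 10.0]
counts = [18, 24, 19, 22, 17]
d, lam = kolmogorov_lambda(right_ends, counts, uniform_cdf)

critical = 1.354   # tabulated critical value as quoted in the lecture for confidence 0.95
print(round(d, 3), round(lam, 3), lam >= critical)
```

Evaluating the difference only at the right ends of the classes follows the procedure above; for ungrouped data the supremum would also have to be checked just before each jump of the empirical distribution function.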

Example. A sample of size n = 1000 was tested; the results were grouped into 10 narrow classes and are given in the table (columns: class, size). Our task is to put forward a reasonable null hypothesis about the distribution and verify it at the 95% confidence level.

The distribution of sizes is close to symmetric, with the maximum in one of the middle classes, which suggests the hypothesis that the tested attribute has a normal distribution N(m, s). If we take m = 65, then the range <63.0; 67.0> contains 1000 - (25 + 19) = 956 results, which is 95.6% of all results. From the properties of the normal distribution we know that the probability of taking a value in the range from m - 1.96s to m + 1.96s is 95% (that is, for a sample of 1000 about 950 results should fall inside that range).

Since the range <63.0; 67.0> is m ± 2, which is close to m ± 1.96s for s = 1, a reasonable hypothesis seems to be s = 1. Our null hypothesis is H0: N(65, 1).
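The reasoning that leads to s = 1 can be checked with a short calculation. In this sketch the function name `coverage` is made up; the counts 25 and 19 and the range <63.0; 67.0> come from the example.

```python
import math


def coverage(z):
    """P(m - z*s < X < m + z*s) for a normal variable X with mean m and standard deviation s."""
    return math.erf(z / math.sqrt(2.0))


n = 1000
observed_inside = n - (25 + 19)        # results inside <63.0; 67.0>, from the example
half_width = 67.0 - 65.0               # the range is m +/- 2 for m = 65

print(round(coverage(1.96), 4))        # the 95% property quoted in the text

s = 1.0                                # hypothesised standard deviation
expected_inside = n * coverage(half_width / s)
print(observed_inside, round(expected_inside, 1))
```

For s = 1 the expected number of results inside the range is about 954.5, close to the 956 actually observed, which is why N(65, 1) is a plausible null hypothesis.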

In the third column we place the values of the empirical distribution function, calculated as $F_n(x_i) = \frac{1}{n} \sum_{j=1}^{i} n_j$. In the fourth column we place the standardized right ends of the classes, $(x_i - m)/s$; in the fifth column, the values of the distribution function read from the tables of the N(0, 1) distribution; and in the sixth column, the absolute values of the differences between the two distribution functions, the largest of which is $d_4 = 0.0280$.

Then we calculate $\lambda = \sqrt{n}\, d_n = \sqrt{1000} \cdot 0.0280 \approx 0.885$. For the confidence level 0.95 we read from the tables of the Kolmogorov distribution the critical value $\lambda_{cr} = 1.354$.

The critical value is greater than the calculated value, so the test results do not contradict the null hypothesis that the distribution of the general population is normal N(65, 1).
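The table lookup in this example can be cross-checked against the limiting Kolmogorov distribution, $K(\lambda) = 1 - 2 \sum_{k=1}^{\infty} (-1)^{k-1} e^{-2 k^2 \lambda^2}$. The sketch below is an assumption-laden illustration: the function names are made up, the series is truncated, and the critical value is found by simple bisection rather than taken from the lecture's tables.

```python
import math


def kolmogorov_cdf(lam, terms=100):
    """Limiting Kolmogorov distribution K(lam), using a truncated alternating series."""
    if lam <= 0.0:
        return 0.0
    total = 0.0
    for k in range(1, terms + 1):
        total += (-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
    return 1.0 - 2.0 * total


def critical_value(confidence, lo=0.3, hi=3.0):
    """Solve K(lam) = confidence by bisection (K is increasing in lam)."""
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if kolmogorov_cdf(mid) < confidence:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


lam = math.sqrt(1000) * 0.0280          # n and the largest difference from the example
print(round(lam, 3))                    # value of the statistic
print(round(kolmogorov_cdf(lam), 3))    # probability of a smaller value under H0
print(round(critical_value(0.95), 3))   # asymptotic critical value for confidence 0.95
```

Under this series the 0.95 point comes out near 1.358, marginally above the 1.354 quoted from the tables; with either value the statistic of about 0.885 is far below the critical value, so the conclusion is unchanged.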

Exercise. The capacitance of 40 capacitors was measured (in pF): 55.1; 67.3; 54.6; 52.2; 58.4; 50.4; 70.1; 55.3; 57.6; 62.5; 68.4; 54.5; 56.7; 53.5; 61.6; 59.6; 49.0; 63.7; 58.1; 56.7; 57.8; 63.6; 69.2; 60.8; 62.9; 54.3; 61.0; 58.2; 64.3; 57.4; 39.3; 59.0; 60.1; 60.7; 59.9; 70.5; 57.2; 61.8; 46.0; 55.6. Using the Kolmogorov test at the 95% confidence level, find a distribution that the capacitance of the capacitors follows.

Wilcoxon test
For two populations in which the feature of interest is a random variable of continuous type, we can test the hypothesis that this feature has the same cumulative distribution function in both populations. For this purpose we draw a random sample from each population: a sample of size k, {x1, x2, ..., xk}, from the first population and a sample of size l, {y1, y2, ..., yl}, from the second.

To perform the Wilcoxon test, each sample must have at least four elements, and both samples together must have at least 20 elements. The algorithm is as follows:
1. The results in both samples are sorted in ascending order.
2. We calculate the number of inversions U in the ordered samples. A pair of elements $(x_i, y_j)$ forms an inversion when $y_j < x_i$, for any i, j.
3. When the null hypothesis is true, the statistic U has approximately the normal distribution $N\left(\frac{kl}{2}, \sqrt{\frac{kl(k+l+1)}{12}}\right)$, so we calculate the corresponding value of the standard normal variable, $u = \frac{U - kl/2}{\sqrt{kl(k+l+1)/12}}$.
4. From the tables of the standard normal distribution, for the selected confidence level 1 - α, we read the critical value $u_\alpha$ satisfying $P(|X| < u_\alpha) = 1 - \alpha$.
5. We compare the two values: if $|u| \ge u_\alpha$, the null hypothesis should be rejected. Otherwise there is no reason to reject the null hypothesis, which does not mean, however, that we can accept it.
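The inversion count and its standardization can be sketched as follows. The function names and both samples are hypothetical (they are not the data of any exercise in this lecture), and the mean $kl/2$ and standard deviation $\sqrt{kl(k+l+1)/12}$ are the usual normal-approximation parameters for the number of inversions under the null hypothesis.

```python
import math


def count_inversions(xs, ys):
    """U = number of pairs (x_i, y_j) with y_j < x_i."""
    return sum(1 for x in xs for y in ys if y < x)


def standardized_u(xs, ys):
    """Return (U, u): the inversion count and its standardized value under H0."""
    k = len(xs)
    l = len(ys)
    u_stat = count_inversions(xs, ys)
    mean = k * l / 2.0
    std = math.sqrt(k * l * (k + l + 1) / 12.0)
    return u_stat, (u_stat - mean) / std


# Hypothetical samples with k = 10 and l = 10 elements (20 in total).
xs = [12.1, 13.4, 14.0, 15.2, 16.8, 17.5, 18.3, 19.9, 21.0, 22.6]
ys = [11.5, 12.8, 13.9, 14.7, 15.9, 16.1, 17.0, 18.8, 20.2, 21.7]
u_stat, u = standardized_u(xs, ys)

critical = 1.96   # two-sided critical value of N(0, 1) for confidence level 0.95
print(u_stat, round(u, 3), abs(u) >= critical)
```

Counting inversions directly over all pairs does not require the samples to be sorted first; sorting, as in step 1, only makes the count easier to do by hand.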

Exercise. Measurements of active power were taken for the transformers supplying two towns. The following results were obtained: C1 – 82.4, 90.7, 95.6, 98.9, 103.8, 105.5, 115.4, 123.8, 129.4, 130.8, 132.2, 137.9; C2 – 70.1, 103.6, 107.7, 110.1, 113.4, 115.0, 115.8, At the 95% confidence level, test the hypothesis that the measured quantity has the same distribution in both towns.

To be continued … !
