Fundamentals of Data Analysis Lecture 5 Testing of statistical hypotheses pt.2

Nonparametric test
Nonparametric tests do not depend on the distribution of the tested attribute, so they may also be used when that distribution is arbitrary, not necessarily close to normal. Nonparametric tests can be divided into two groups: tests of goodness of fit, which allow testing the hypothesis that the population has a certain type of distribution, and tests of the hypothesis that two samples come from one population (i.e., that the two populations have the same distribution).

Chi-square test of goodness of fit
This is one of the oldest statistical tests; it serves to verify the hypothesis that the population has a certain type of distribution (described by a cumulative distribution function), which may be either continuous or discrete. The only limitation is that the sample must be large, containing at least several tens of observations, because the results have to be grouped into classes; the number of classes should not be too large, so that at least 8 results fall into each of them.

Chi-square test of goodness of fit
The algorithm is as follows:
1. The results are divided into r disjoint classes of sizes ni, so the size of the sample equals n = n1 + n2 + ... + nr. Thus we obtain the empirical distribution.
2. We formulate the null hypothesis that the tested population has a distribution whose cumulative distribution function belongs to some set of distributions of a specified type;

Chi-square test of goodness of fit
3. From the hypothetical distribution we calculate, for each of the r classes of the investigated quantity, the probability pi that the random variable takes a value belonging to class number i (i = 1, 2, ..., r);
4. We calculate the theoretical sizes npi for each class i, i.e. the expected class sizes if the population has the assumed distribution;

Chi-square test of goodness of fit
5. From the empirical sizes ni and the theoretical sizes npi we determine the value of the chi-square statistic:
χ² = Σi (ni - npi)² / (npi)
which, assuming that the null hypothesis is true, has the chi-square distribution with r - 1 degrees of freedom, or r - k - 1 degrees of freedom, where k is the number of parameters estimated from the sample.

Chi-square test of goodness of fit
6. From the tables of the chi-square distribution, for the selected confidence level we read the critical value χ²cr satisfying the relation P(χ² < χ²cr) = 1 - α.
7. We compare the two values, and if the inequality χ² ≥ χ²cr holds, the null hypothesis should be rejected. In the opposite case there is no reason to reject the null hypothesis, which, however, does not mean that we can accept it.
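
As an illustration of steps 1-7, here is a minimal sketch in Python (using numpy and scipy); the class sizes and hypothetical probabilities are made-up values, not data from the lecture.

```python
# Minimal sketch of the chi-square goodness-of-fit procedure described above.
# The class counts and hypothetical probabilities below are illustrative only.
import numpy as np
from scipy.stats import chi2

observed = np.array([12, 25, 38, 41, 30, 14])        # empirical class sizes n_i (hypothetical)
p = np.array([0.05, 0.15, 0.30, 0.30, 0.15, 0.05])   # hypothetical probabilities p_i
n = observed.sum()
expected = n * p                                      # theoretical sizes n*p_i

# chi-square statistic: sum over classes of (n_i - n*p_i)^2 / (n*p_i)
stat = ((observed - expected) ** 2 / expected).sum()

k = 0                        # number of parameters estimated from the sample
df = len(observed) - k - 1   # degrees of freedom: r - k - 1
alpha = 0.01
critical = chi2.ppf(1 - alpha, df)   # value satisfying P(chi2 < critical) = 1 - alpha

print(f"chi2 = {stat:.2f}, critical = {critical:.3f}")
print("Reject H0" if stat >= critical else "No reason to reject H0")
```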

Chi-square test of goodness of fit
Example
In a physical experiment the scintillation time is measured. The number of measurements is n = 1000 and the grouped set of results is given in the Table. At the 99% confidence level we test the hypothesis that the time of occurrence of the light effect observed during these experiments is normally distributed. The task does not specify the parameters of the hypothetical distribution, so our null hypothesis is F(x) ∈ Φ, where Φ is the class of all normal distribution functions. The two parameters of the distribution, the mean value m and the standard deviation σ, are estimated from the sample using the estimators m = 0.67 and s = 0.30. Further results are given in the Table, where F(ui) is the value of the normal distribution function N(0,1) at the point ui = (xi - m)/s, i.e. the standardized value of the right end of each class. The number of degrees of freedom is k = 7 - 2 - 1 = 4, since two parameters (the mean and the standard deviation) were calculated from the random sample. From the tables of the chi-square distribution, at the significance level 0.01, we find the critical value χ² = 13.277. The critical value is less than the calculated statistic, equal to 73.52; thus the hypothesis of normality should be rejected.
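
The critical value quoted in the example can be cross-checked numerically, for instance with scipy:

```python
# Critical value of the chi-square distribution with 4 degrees of freedom
# at significance level 0.01 (i.e. the 0.99 quantile).
from scipy.stats import chi2
print(chi2.ppf(0.99, 4))   # about 13.277, as read from the tables
```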

Chi-square test of goodness of fit
Exercise
During an experiment n = 200 measurements were made and the grouped results are given in the Table below. At the 95% confidence level test the hypothesis that the measurement results follow a uniform distribution.

Mean of the class   ni
45.25               23
45.75               19
46.25               25
46.75               18
47.25               17
47.75               24
48.25               16
48.75               22
49.25               20
49.75

Kolmogorov test of goodness of fit
The Kolmogorov λ test of goodness of fit serves to verify the hypothesis that the population has a specified distribution. In this test the empirical distribution function is compared with the hypothetical one, rather than comparing empirical class sizes with hypothetical class sizes as in the chi-square test. Indeed, when the population distribution is consistent with the hypothesis, the values of the empirical and hypothetical distribution functions should be similar at all examined points.

Kolmogorov test of goodness of fit
The test starts with the analysis of the differences between the two distribution functions; the largest of them is then used to construct the lambda statistic, whose distribution does not depend on the form of the hypothetical distribution. This distribution determines the critical value for the test. If the maximum difference at some point of the variable's range is too large, the hypothesis that the population has the suspected cumulative distribution function should be rejected.

Kolmogorov test of goodness of fit
The usefulness of this test is limited, however, because the hypothetical distribution must be continuous and, in principle, its parameters should be known; in the case of large samples, though, they can be estimated from the sample.

Kolmogorov test of goodness of fit
The procedure for the Kolmogorov test is as follows:
1. We sort the results in ascending order or group them into relatively narrow classes with right ends xi and corresponding sizes ni;
2. For each xi we calculate the empirical cumulative distribution function Fn(x) from the equation:
Fn(xi) = (n1 + n2 + ... + ni) / n

Kolmogorov test of goodness of fit
3. From the hypothetical distribution, for each xi we determine the theoretical value of the distribution function F(xi);
4. For each xi we calculate the absolute value of the difference |Fn(xi) - F(xi)|;
5. We calculate the parameter D = sup|Fn(x) - F(x)| and then the value of the statistic:
λ = √n · D
which, when the null hypothesis is true, should have the Kolmogorov distribution.

Kolmogorov test of goodness of fit
6. For the chosen confidence level we read from the Kolmogorov distribution tables the critical value λcr satisfying the condition P{λ ≤ λcr} = 1 - α. When λ > λcr the null hypothesis should be rejected; otherwise there is no reason to reject the null hypothesis.
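
A minimal sketch of this procedure in Python, assuming individual (ungrouped) observations and a fully specified hypothetical distribution; the data and the N(5.0, 0.25) hypothesis are made up purely for illustration:

```python
# Minimal sketch of the Kolmogorov procedure above (hypothetical data).
import numpy as np
from scipy.stats import norm

x = np.sort(np.array([4.8, 5.1, 4.6, 5.4, 5.0, 4.9, 5.2, 5.3, 4.7, 5.0]))
n = len(x)

F_emp = np.arange(1, n + 1) / n            # empirical distribution function F_n(x_i)
F_hyp = norm.cdf(x, loc=5.0, scale=0.25)   # hypothetical distribution function F(x_i)

D = np.max(np.abs(F_emp - F_hyp))          # D = sup |F_n(x) - F(x)|
lam = np.sqrt(n) * D                       # lambda statistic

print(f"D = {D:.4f}, lambda = {lam:.3f}")
# Compare lambda with the critical value read from Kolmogorov distribution tables
# (about 1.36 for the 95% confidence level); reject H0 if lambda exceeds it.
```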

Kolmogorov test of goodness of fit
Example
A sample of size n = 1000 was tested; the results are grouped into 10 narrow classes and listed in the Table. Our task is to state a reasonable null hypothesis about the distribution and to verify it at the 95% confidence level.

Kolmogorov test of goodness of fit
Example
The size distribution is close to symmetric, with the maximum at one of the middle classes, which suggests the hypothesis that the distribution of the tested attribute is a normal distribution N(m, σ). If we take m = 65, then inside the range <63.0; 67.0> there are 1000 - (25 + 19) = 956 results, which gives 95.6% of all results. From the properties of the normal distribution we know that the probability of taking a value within the range from m - 1.96σ to m + 1.96σ equals 95% (which means that out of 1000 samples about 950 should fall inside that range).

Kolmogorov test of goodness of fit
Example
Thus a reasonable hypothesis seems to be σ = 1, and our null hypothesis is H0: N(65, 1).

Kolmogorov test of goodness of fit
Example
In the third column the values of the empirical distribution function are placed, calculated as:
Fn(xi) = (n1 + n2 + ... + ni) / n

Kolmogorov test of goodness of fit
Example
In the fourth column we place the right ends of the standardized classes (x - m)/σ, in the fifth column the values of the distribution function read from the tables of the N(0, 1) distribution, and in the sixth column the absolute values of the differences between the distribution functions, the largest of which is d4 = 0.0280.

Kolmogorov test of goodness of fit
Example
Then we calculate λ = √n · dn = √1000 · 0.0280 = 0.886. For the confidence level 0.95 we read from the Kolmogorov distribution tables the critical value λcr = 1.354.

Kolmogorov test of goodness of fit
Example
The critical value is greater than the calculated value, so the test results do not contradict the null hypothesis that the distribution of the general population is normal N(65, 1).
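
For reference, the asymptotic Kolmogorov distribution is also available in scipy; its 0.95 quantile is close to the tabulated critical value used above:

```python
# 0.95 quantile of the asymptotic Kolmogorov (lambda) distribution.
from scipy.stats import kstwobign
print(kstwobign.ppf(0.95))   # roughly 1.36
```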

Kolmogorov test of goodness of fit
Exercise
The capacitance of 40 capacitors was measured (in pF): 55.1; 67.3; 54.6; 52.2; 58.4; 50.4; 70.1; 55.3; 57.6; 62.5; 68.4; 54.5; 56.7; 53.5; 61.6; 59.6; 49.0; 63.7; 58.1; 56.7; 57.8; 63.6; 69.2; 60.8; 62.9; 54.3; 61.0; 58.2; 64.3; 57.4; 39.3; 59.0; 60.1; 60.7; 59.9; 70.5; 57.2; 61.8; 46.0; 55.6. Using the Kolmogorov test (at the 95% confidence level), find the distribution that the capacitance of the capacitors follows.
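
One possible starting point for this exercise, assuming a normal hypothesis with parameters estimated from the sample (which, strictly speaking, makes the test only approximate, as noted earlier for the case of estimated parameters):

```python
# Sketch: fit a normal distribution to the capacitance data and apply the
# Kolmogorov test against the fitted cumulative distribution function.
import numpy as np
from scipy.stats import kstest, norm

c = np.array([55.1, 67.3, 54.6, 52.2, 58.4, 50.4, 70.1, 55.3, 57.6, 62.5,
              68.4, 54.5, 56.7, 53.5, 61.6, 59.6, 49.0, 63.7, 58.1, 56.7,
              57.8, 63.6, 69.2, 60.8, 62.9, 54.3, 61.0, 58.2, 64.3, 57.4,
              39.3, 59.0, 60.1, 60.7, 59.9, 70.5, 57.2, 61.8, 46.0, 55.6])

m, s = c.mean(), c.std(ddof=1)               # parameters estimated from the sample
result = kstest(c, norm(loc=m, scale=s).cdf) # D statistic and p-value
print(f"m = {m:.2f}, s = {s:.2f}, D = {result.statistic:.4f}, p = {result.pvalue:.3f}")
```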

Wilcoxon test
For two populations in which the feature of interest is a continuous random variable, we can test the hypothesis that this feature has the same cumulative distribution function in both populations. For this purpose we draw a random sample from each population: from the first population a sample of size k, {x1, x2, ..., xk}, and from the second population a sample of size l, {y1, y2, ..., yl}.

Wilcoxon test
To perform the Wilcoxon test, each sample must have at least four elements and both samples together must contain at least 20 items. The algorithm is as follows:
1. The results in both samples are sorted in ascending order;
2. We calculate the number of inversions U in the ordered samples; a pair of elements (xi, yj) forms an inversion if yj < xi.

Wilcoxon test
3. For sufficiently large samples the statistic U is approximately normally distributed with mean kl/2 and variance kl(k + l + 1)/12, so we calculate the corresponding value of the standard normal distribution:
Z = (U - kl/2) / √(kl(k + l + 1)/12)

Wilcoxon test
4. From the tables of the standard normal distribution, for the selected confidence level we read the critical value uα satisfying the relation P(|X| ≤ uα) = 1 - α.
5. We compare the two values, and if |Z| > uα the null hypothesis should be rejected. In the opposite case there is no reason to reject the null hypothesis, which, however, does not mean that we can accept it.
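
A minimal sketch of the procedure above; the two samples are made up and smaller than the rule of thumb given earlier, so they only illustrate the computation:

```python
# Count the inversions U and standardize it with the normal approximation
# (mean k*l/2, variance k*l*(k+l+1)/12), then compare |Z| with the critical value.
import numpy as np
from scipy.stats import norm

x = np.array([1.2, 2.3, 3.1, 4.0, 5.5])   # sample from the first population (hypothetical)
y = np.array([0.9, 2.8, 3.5, 4.2, 4.9])   # sample from the second population (hypothetical)
k, l = len(x), len(y)

# number of inversions: pairs (x_i, y_j) with y_j < x_i
U = sum(1 for xi in x for yj in y if yj < xi)

mean_U = k * l / 2
var_U = k * l * (k + l + 1) / 12
z = (U - mean_U) / np.sqrt(var_U)

alpha = 0.05
u_crit = norm.ppf(1 - alpha / 2)   # two-sided critical value of N(0, 1)
print(f"U = {U}, z = {z:.3f}, critical = {u_crit:.3f}")
print("Reject H0" if abs(z) > u_crit else "No reason to reject H0")
```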

Wilcoxon test
Exercise
Measurements of active power were taken for the transformers supplying two towns. The following results were obtained:
C1 – 82.4, 90.7, 95.6, 98.9, 103.8, 105.5, 115.4, 123.8, 129.4, 130.8, 132.2, 137.9;
C2 – 70.1, 103.6, 107.7, 110.1, 113.4, 115.0, 115.8, 122.4.
At the 95% confidence level test the hypothesis that the tested indicator has the same distribution in both towns.
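
To check the result of this exercise, scipy's Mann-Whitney U test performs the same rank-based comparison of two independent samples as the Wilcoxon test described above:

```python
# Two-sided Mann-Whitney U test on the transformer measurements from the exercise.
import numpy as np
from scipy.stats import mannwhitneyu

c1 = np.array([82.4, 90.7, 95.6, 98.9, 103.8, 105.5, 115.4, 123.8, 129.4, 130.8, 132.2, 137.9])
c2 = np.array([70.1, 103.6, 107.7, 110.1, 113.4, 115.0, 115.8, 122.4])

res = mannwhitneyu(c1, c2, alternative="two-sided")
print(f"U = {res.statistic}, p-value = {res.pvalue:.3f}")
```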

To be continued … !