Non Parametric Methods. Dr. Mohammed Alahmed.

Presentation transcript:

Non Parametric Methods, Dr. Mohammed Alahmed

Learning Objectives
1. Distinguish parametric and nonparametric test procedures.
2. Explain commonly used nonparametric test procedures.
3. Perform hypothesis tests using nonparametric procedures.

Introduction
In the previous sections we learned about one-sample, two-sample, and paired t-tests, ANOVA, and regression. All of these tests rest on some basic assumptions:
1. The individual samples are approximately normally distributed.
2. The individual samples come from populations with approximately equal variances.
3. Preferably, each sample contains more than 30 observations.
Methods of estimation and hypothesis testing have been based on these assumptions.

These procedures are usually called parametric statistical methods because the parametric form of the distribution is assumed to be known. If these assumptions about the shape of the distribution are not made, and/or if the central limit theorem also seems inapplicable because of small sample size, then non-parametric statistical methods, which make fewer assumptions about the distributional shape, must be used. Non-parametric tests typically focus on the median (rather than the mean) and involve fairly straightforward procedures such as ordering and counting. Most nonparametric methods are based on ranks instead of the original data.

Statistical Testing
Data setting | Parametric test | Nonparametric test
One quantitative response variable | One-sample t-test | Sign test
One quantitative response variable, two values from paired samples | Paired-sample t-test | Wilcoxon signed-rank test
One quantitative response variable, one qualitative independent variable with two groups | Two-independent-sample t-test | Wilcoxon rank-sum (Mann-Whitney) test
One quantitative response variable, one qualitative independent variable with three or more groups | ANOVA | Kruskal-Wallis test

The Sign Test
The sign test is used to test hypotheses about the median, rather than about the mean as in the parametric test. It tests one population median. Assume the null hypothesis is that the median of the distribution is zero. Let S+ = the number of values greater than the hypothesized median. If the null hypothesis is true, S+ (the number of positive values) should follow a binomial distribution with success probability 0.5. When the sample is large, this binomial distribution can be approximated by a normal distribution.

Conducting a sign test
State the hypotheses:
– H0: median = m0 and H1: median ≠ m0 (two-tailed)
– H1: median > m0 (right-tailed)
– H1: median < m0 (left-tailed)
Convert the data to plus (+) and minus (-) signs:
– Change each value to + (above m0) or - (below m0).
– Change any value equal to m0 to 0.

Compare the number of + and - signs (ignore the 0s).
– If the number of + signs and the number of - signs are approximately equal, the null hypothesis is not likely to be rejected.
– If they are not approximately equal, it is likely that the null hypothesis will be rejected.

Test Statistic:
When n ≤ 20, the test statistic is the smaller number (x) of + or - signs.
When n > 20, the test statistic is
$z = \frac{(X + 0.5) - n/2}{\sqrt{n}/2}$
where X is the smaller number of + or - signs and n is the sample size, i.e., the total number of + and - signs (zeros excluded).
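
The steps of the sign test can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `sign_test` and the sample values are hypothetical (they are not the data from the slides), the exact p-value shown is the two-sided binomial one, and the large-sample statistic uses the usual continuity-corrected form z = (X + 0.5 - n/2) / (sqrt(n)/2).

```python
from math import comb, sqrt

def sign_test(values, m0):
    """Sign test of H0: median = m0. Values equal to m0 are ignored."""
    n_plus = sum(1 for v in values if v > m0)    # number of + signs
    n_minus = sum(1 for v in values if v < m0)   # number of - signs
    n = n_plus + n_minus                         # sample size with zeros excluded
    if n == 0:
        raise ValueError("all values equal m0; the test is undefined")
    x = min(n_plus, n_minus)                     # small-sample test statistic
    # Exact two-sided p-value: double P(X <= x) under Binomial(n, 0.5), capped at 1.
    lower_tail = sum(comb(n, k) for k in range(x + 1)) / 2 ** n
    p_exact = min(1.0, 2 * lower_tail)
    # Continuity-corrected normal approximation, intended for larger n.
    z = ((x + 0.5) - n / 2) / (sqrt(n) / 2)
    return n_plus, n_minus, x, p_exact, z

# Hypothetical visit lengths in minutes (NOT the data from the slides).
sample = [18, 25, 19, 21, 17, 20, 24, 16, 22, 15, 19, 23]
result = sign_test(sample, m0=22)
print(result)
```

For a one-tailed alternative, the one-sided binomial tail (without doubling) would be used instead; which tail depends on the direction of H1.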

Example
Recent studies of the private practices of physicians suggested that the median length of each patient visit was 22 minutes. It is believed that the median visit length in practices is shorter than 22 minutes. A random sample of 20 visits in practices yielded a set of visit lengths, listed in order. [The data values are not reproduced in this transcript.] Based on these data, is there sufficient evidence to conclude that the median visit length in practices is shorter than 22 minutes?

Solution: We are interested in testing H0: m = 22 vs. H1: m < 22.


Exact test (binomial):

The Wilcoxon Signed-Rank Test
The Wilcoxon signed-rank test is another non-parametric test used for paired data; it is the nonparametric counterpart of the paired t-test. We wish to test the hypothesis that the median of the first sample equals the median of the second. The test is nonparametric because it is based on the ranks of the observations rather than on their actual values, which the paired t-test uses. Use the Wilcoxon signed-rank test if the normality assumption of the paired t-test is violated.

Procedure
The first step in this test is to compute ranks for each observation, as follows:
1. Obtain the difference scores, d_i = x_1i - x_2i, and arrange the differences d_i in order of absolute value.
2. Count the number of differences with the same absolute value.
3. Ignore the observations where d_i = 0, and rank the remaining observations from 1 for the observation with the lowest absolute value up to n for the observation with the highest absolute value.
4. If any differences are equal in absolute value, average their ranks.
5. Compute the rank sum R1 of the positive differences and the rank sum R2 of the negative differences.
6. Compare the smaller of the two rank sums with the T value obtained from the Appendix of Wilcoxon T values (Table 11).
7. If n ≥ 16, use the normal approximation.
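
Steps 1 to 5 of the procedure above can be sketched as follows. The helper names (`average_ranks`, `signed_rank_sums`) and the before/after values are hypothetical and chosen only for illustration; the comparison with the tabulated T value (step 6) is left to the reader, since the table is not part of this transcript.

```python
def average_ranks(values):
    """Rank values from 1 to n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1            # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def signed_rank_sums(x1, x2):
    """Return (R1, R2, n): rank sums of positive and negative differences."""
    diffs = [a - b for a, b in zip(x1, x2) if a != b]   # drop zero differences
    ranks = average_ranks([abs(d) for d in diffs])      # rank by absolute value
    r1 = sum(r for d, r in zip(diffs, ranks) if d > 0)
    r2 = sum(r for d, r in zip(diffs, ranks) if d < 0)
    return r1, r2, len(diffs)

# Hypothetical paired measurements (NOT the data from the slides).
before = [85, 90, 78, 92, 88, 76, 95, 80]
after = [80, 91, 75, 85, 88, 70, 90, 78]
r1, r2, n = signed_rank_sums(before, after)
print(r1, r2, n, min(r1, r2))
```

A quick sanity check on any such computation is that R1 + R2 equals n(n + 1)/2, the sum of the ranks 1 through n.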

Example
Table columns: Patient; Hours of sleep (Drug, Placebo); Difference; Rank (ignoring sign). [The table values are not reproduced in this transcript.]
* The 3rd and 4th ranks are tied and hence averaged.
R = smaller of R1 (50.5) and R2 (4.5). Here R = 4.5, which is significant at the 2% level (see Table 11), indicating that the drug (hypnotic) is more effective than placebo.


Example
Twelve adult males were put on a diet as part of a weight-reducing plan. Weights were recorded before and after the diet, in a table with rows Before and After. [The table values are not reproduced in this transcript.] Use the Wilcoxon signed-rank test to determine whether the plan was successful. Use α = 0.05.


The Wilcoxon Rank-Sum Test
The Wilcoxon rank-sum test is a nonparametric analog of the t-test for two independent samples. Here we do NOT have paired data, but rather n1 values from group 1 and n2 values from group 2. We want to test whether the values in the two groups are samples from different distributions; that is, the test is used to determine whether two independent samples came from the same population.

Procedure
1. Rank the data of both groups together in ascending order.
2. If any values are equal, average their ranks.
3. Compute the rank sum R1 in the first sample (the choice of sample is arbitrary).
4. Compare this sum with the critical ranges given in Table 12.
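
The pooled ranking and rank sums can be sketched as below. The function name `pooled_rank_sums` and the two small groups are hypothetical (they are not the birth-weight data from the slides), and the comparison with the tabulated critical range is not shown because the table is not part of this transcript.

```python
def pooled_rank_sums(group1, group2):
    """Rank both groups together (ties get average ranks) and return (R1, R2)."""
    pooled = sorted(list(group1) + list(group2))
    # Collect the 1-based positions occupied by each distinct value.
    positions = {}
    for pos, value in enumerate(pooled, start=1):
        positions.setdefault(value, []).append(pos)
    # A tied value receives the average of the positions it occupies.
    avg_rank = {value: sum(p) / len(p) for value, p in positions.items()}
    r1 = sum(avg_rank[v] for v in group1)
    r2 = sum(avg_rank[v] for v in group2)
    return r1, r2

# Hypothetical measurements (NOT the data from the slides).
group1 = [3.1, 3.4, 2.9, 3.6, 3.8]
group2 = [2.7, 3.1, 2.5, 3.0]
r1, r2 = pooled_rank_sums(group1, group2)
print(r1, r2)
```

As a check, R1 + R2 should equal N(N + 1)/2 with N = n1 + n2; the rank sums quoted in the slide example (272 and 163 for N = 29) satisfy this, since 29 × 30 / 2 = 435.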

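Example
Table columns: Non-smokers (n = 15): birth weight (kg), rank; Heavy smokers (n = 14): birth weight (kg), rank. [The table values are not reproduced in this transcript.]
Rank sums: non-smokers = 272, heavy smokers = 163.
* Ranks 17, 18, and 19 are tied, hence the ranks are averaged.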

H0: the observations come from the same population.

H0: m1 = m2
H1: m1 ≠ m2

Kruskal-Wallis One-Way Analysis of Variance
In some instances we want to compare more than two samples, but either the underlying distribution is far from normal or we have ordinal data. In these situations, a non-parametric alternative to the one-way ANOVA is the Kruskal-Wallis test. Like all non-parametric tests, its focus is on ranks, counting, and medians. The hypotheses are written as:
H0: All k populations have the same median.
H1: Not all of the k population medians are the same.

The Kruskal-Wallis test
To compare the medians of k samples (k > 2) using nonparametric methods, use the following procedure:
1. Pool the observations over all samples, thus constructing a combined sample of size n = Σ n_i.
2. Assign ranks to the individual observations, using the average rank in the case of tied observations.
3. Compute the rank sum R_i for each of the k samples.
4. If there are no ties, compute the test statistic
$W = \frac{12}{n(n+1)} \sum_{i=1}^{k} \frac{R_i^2}{n_i} - 3(n+1)$

Under the null hypothesis, this statistic has an approximate $\chi^2$ distribution with k - 1 degrees of freedom. The approximation is adequate when each group contains at least 5 observations. For a level α test: reject H0 if $W > \chi^2_{k-1,\,1-\alpha}$; otherwise do not reject H0.
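
A minimal sketch of the Kruskal-Wallis computation follows. It assumes the standard no-ties form of the statistic, W = 12 / (n(n+1)) × Σ R_i² / n_i − 3(n+1), and hypothetical ratings for three groups (not the data from the slides). The p-value line is an extra convenience that holds only for k = 3 groups, where the chi-square distribution with 2 degrees of freedom has the closed-form upper tail exp(−W/2).

```python
from math import exp

def kruskal_wallis(groups):
    """Return (rank sums per group, W) using the no-ties formula."""
    pooled = sorted(v for g in groups for v in g)
    n = len(pooled)
    # Average 1-based position for each distinct value (handles ties in ranking).
    positions = {}
    for pos, value in enumerate(pooled, start=1):
        positions.setdefault(value, []).append(pos)
    avg_rank = {value: sum(p) / len(p) for value, p in positions.items()}
    rank_sums = [sum(avg_rank[v] for v in g) for g in groups]
    w = 12 / (n * (n + 1)) * sum(r * r / len(g) for r, g in zip(rank_sums, groups))
    w -= 3 * (n + 1)
    return rank_sums, w

# Hypothetical ratings for three groups of five (NOT the data from the slides).
no_exercise = [23, 26, 51, 49, 58]
jog_20 = [22, 27, 39, 29, 46]
jog_60 = [59, 66, 38, 62, 64]
rank_sums, w = kruskal_wallis([no_exercise, jog_20, jog_60])
p_value = exp(-w / 2)   # valid only for k = 3 groups (df = 2)
print(rank_sums, w, p_value)
```

With more than three groups, the p-value would instead come from a chi-square table or a statistics package, and data with many ties would call for a tie-corrected statistic, which this sketch does not implement.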

Example: Depression
Does physical exercise alleviate depression? We find some depressed people and check that they are all equivalently depressed to begin with. Then we allocate each person randomly to one of three groups: no exercise; 20 minutes of jogging per day; or 60 minutes of jogging per day. At the end of a month, we ask each participant to rate how depressed they now feel, on a Likert scale that runs from 1 ("totally miserable") through to 100 ("ecstatically happy"). The appropriate test here is the Kruskal-Wallis test. We have three separate groups of participants, each of whom gives us a single score on a rating scale. Ratings are examples of an ordinal scale of measurement, and so the data are not suitable for a parametric test. The Kruskal-Wallis test will tell us whether the differences between the groups are so large that they are unlikely to have occurred by chance.

Data: rating on depression scale.
Table columns: No exercise; Jogging for 20 minutes; Jogging for 60 minutes. [The table values are not reproduced in this transcript.]


H0: All populations have the same median.
H1: Not all of the population medians are the same.
Conclusion: Since the p-value < α, reject H0.

Key Concepts
These methods can be used when:
– the data cannot be measured on a quantitative scale, or
– the numerical scale of measurement is arbitrarily set by the researcher, or
– the parametric assumptions, such as normality or constant variance, are seriously violated.

Hypothesis Testing: Categorical Data

Introduction
In Chapters 7 and 8, the basic methods of hypothesis testing for continuous data were presented. If the variable under study is not continuous but is instead classified into categories, which may or may not be ordered, then different methods of inference should be used.

Categorical data analysis deals with discrete data that can be organized into categories. The data are organized into a contingency table. The χ² distribution is used in categorical data analysis.

The independent (explanatory) variable is categorical (nominal or ordinal).
The dependent (response) variable is categorical (nominal or ordinal).
Special cases:
– 2x2 (each variable has 2 levels)
– Nominal/Nominal
– Nominal/Ordinal
– Ordinal/Ordinal

Contingency Tables
– Tables representing all combinations of levels of the explanatory and response variables.
– Numbers in the table represent counts of the number of cases in each cell.
– Row and column totals are called marginal counts.
The contingency table is also known as a crosstabulation, because it counts the cases that fall into each pairing of the table.

Chi-Square (χ²) and Frequency Data
– For chi-square, the data are frequencies rather than numerical scores.
– Chi-square is used when both variables are measured on a nominal or ordinal scale.
– It can be applied to interval or ratio data that have been categorized into a small number of groups.
– It assumes that the observations are randomly sampled from the population.
– All observations are independent (an individual can appear only once in a table and there are no overlapping categories).
– It does not make any assumptions about the shape of the distribution or about the homogeneity of variances.
– Chi-square is based upon the differences between observed and expected frequencies.

Chi-Square Statistic
The statistic measures how far the observed values (O) are from the expected values (E), summed over all cells in the table:
$\chi^2 = \sum \frac{(O - E)^2}{E}$
When $\chi^2$ is large, there is evidence that H0 is false.

Two non-parametric hypothesis tests use the chi-square statistic:
1. the chi-square test for goodness of fit
2. the chi-square test for independence.
Assumptions:
– Independent observations.
– A sample size of at least 10.
– Random sampling.
– All observations must be used.
– For the test to be accurate, the expected frequency in each cell should be at least 5.

Goodness-of-Fit Test
A goodness-of-fit test is an inferential procedure used to determine whether a frequency distribution follows a claimed distribution. The chi-square test for goodness of fit is a nonparametric test used when we have nominal or ordinal data. It uses frequency data from a sample to test hypotheses about the shape or proportions of a population.

Each individual in the sample is classified into one category on the scale of measurement. The data, called observed frequencies, simply count how many individuals from the sample are in each category. The hypotheses for these tests are written a little differently than we have seen in the past, because they are usually stated in words.

Example (Example page 401 in the book)
Diastolic blood-pressure measurements were collected at home in a community-wide screening program of 14,736 adults ages 30-69 in East Boston, as part of a nationwide study to detect and treat hypertensive people. The people in the study were each screened in the home, with two measurements taken during one visit. A frequency distribution of the mean diastolic blood pressure is given in the table in 10-mm Hg intervals.
Table layout: Group (mm Hg), in 10-mm Hg intervals from < 50 up to 110 and above, plus Total; row: Observed frequency. [The table values are not reproduced in this transcript.]

We would like to assume these measurements came from an underlying normal distribution, so that we can use parametric methods. We want to test:
– H0: the random variable follows a normal distribution
– H1: the random variable does not follow a normal distribution
How can the above hypothesis be tested? To test this hypothesis:
– Estimate the parameters from the data.
– Compute the expected counts.
– Compute the test statistic used for contingency tables.
– This statistic has a chi-squared distribution under the null hypothesis.
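
The three computational steps can be sketched as below, assuming the mean and standard deviation have already been estimated from the raw data. Everything specific here is hypothetical: the function names, the interval edges, the observed counts, and the values of `mu` and `sigma` are illustrative and are not the blood-pressure data from the slides. The degrees of freedom follow the usual rule g − 1 − (number of estimated parameters) for g groups.

```python
from math import erf, sqrt

def normal_cdf(x, mu, sigma):
    """Cumulative distribution function of a normal(mu, sigma) variable."""
    return 0.5 * (1 + erf((x - mu) / (sigma * sqrt(2))))

def normal_goodness_of_fit(edges, observed, mu, sigma, n_estimated=2):
    """Chi-square goodness-of-fit statistic for grouped data against a normal model.

    edges has one more entry than observed; interval i is (edges[i], edges[i+1]).
    """
    total = sum(observed)
    cdf = [normal_cdf(e, mu, sigma) for e in edges]
    # Expected count = total * probability of the interval under the normal model.
    expected = [total * (hi - lo) for lo, hi in zip(cdf, cdf[1:])]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1 - n_estimated
    return expected, chi2, df

# Hypothetical grouped data (NOT the data from the slides).
inf = float("inf")
edges = [-inf, 60, 70, 80, 90, 100, inf]
observed = [10, 40, 100, 90, 45, 15]
expected, chi2, df = normal_goodness_of_fit(edges, observed, mu=80.0, sigma=12.0)
print([round(e, 1) for e in expected], round(chi2, 3), df)
```

The statistic would then be compared with the upper-tail critical value of the chi-square distribution with `df` degrees of freedom, taken from a table or a statistics package.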


Enter the expected frequency from Table

Conclusion: We reject the null hypothesis. Thus the normal model does not provide an adequate fit to the data.

Test of Independence
The chi-square test of independence is probably the most frequently used hypothesis test in the social sciences. It is used to determine whether there is an association between a row variable and a column variable in a contingency table constructed from sample data.

Hypothesis:
H0: The row variable is independent of the column variable.
H1: The row variable is dependent on (related to) the column variable.
Test statistic:
$\chi^2 = \sum \frac{(O - E)^2}{E}$, where O is the observed count and E is the expected count in each cell.
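
As an illustration, the test of independence can be sketched for a small table. The counts below are hypothetical (not the smoking and lung cancer data from the slides), and the function name is an assumption. Expected counts use the standard rule (row total × column total) / grand total, and the p-value line holds only for a 2x2 table (df = 1), where the chi-square upper tail equals erfc(sqrt(χ²/2)).

```python
from math import erfc, sqrt

def chi_square_independence(table):
    """Return (expected counts, chi-square statistic, degrees of freedom)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    # Expected count under independence: row total * column total / grand total.
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    chi2 = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return expected, chi2, df

# Hypothetical 2x2 counts: rows = exposed / not exposed, columns = positive / negative.
observed = [[20, 30],
            [10, 40]]
expected, chi2, df = chi_square_independence(observed)
p_value = erfc(sqrt(chi2 / 2))   # valid only when df == 1
print(expected, round(chi2, 4), df, round(p_value, 4))
```

This sketch applies no continuity correction; for larger tables the p-value would come from a chi-square table or a statistics package rather than the df = 1 shortcut.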

Example
To determine whether there is an association between smoking and lung cancer.
Table layout: rows Smoker, Non-smoker, Total; columns Lung cancer positive (Obs., Exp.), Lung cancer negative (Obs., Exp.), Total. [The table values are not reproduced in this transcript.]

Hypothesis:
H0: There is no relationship between smoking and lung cancer.
H1: The two variables are associated.
Test statistic: $\chi^2 = \sum \frac{(O - E)^2}{E}$ [The computed value is not reproduced in this transcript.]

α = 0.05
df = (2 - 1)(2 - 1) = 1
Critical value: taken from the χ² table with df = 1 at α = 0.05.
Decision: Reject H0 at α = 0.05.
Conclusion: There is evidence of a relationship between smoking and lung cancer.

Using SPSS

Conclusion: Since the p-value < α, reject H0.