Biostatistics course Part 17 Non-parametric methods


Biostatistics course Part 17 Non-parametric methods
Dr. C. Nicolas Padilla Raygoza
Department of Nursing and Obstetrics, Division of Health Sciences and Engineering, Campus Celaya-Salvatierra, University of Guanajuato

Biosketch
Physician and surgeon, Universidad Autónoma de Guadalajara. Pediatrician certified by the Consejo Mexicano de Certificación en Pediatría. Diploma in Epidemiology, London School of Hygiene and Tropical Medicine, University of London. Master of Science with a focus on Epidemiology, Atlantic International University. Doctor of Science with a focus on Epidemiology, Atlantic International University. Associate Professor C, Department of Nursing and Obstetrics, Division of Health Sciences and Engineering, Campus Celaya-Salvatierra, University of Guanajuato. padillawarm@gmail.com

Competencies
The reader will know the non-parametric methods and when they can be used.
He (she) will apply non-parametric methods appropriately.
He (she) will be able to obtain a confidence interval in a non-parametric analysis.
He (she) will apply the Wilcoxon rank sum test.
He (she) will apply the Wilcoxon signed rank test.
He (she) will apply Spearman's r.

Introduction
Parametric methods are based on means, standard deviations, or probabilities. The Normal distribution is not always appropriate, for example when studying variables with few observations, non-symmetrical distributions, or variables that can take more than two values.
Until now, we have used methods that assume the variable has a distribution with certain characteristics: a) for quantitative data, the distribution is Normal, and b) for binary data, the distribution is binomial with a probability p. Means, standard deviations, and probabilities are called parameters, and the methods used to make inferences about these parameters are called parametric methods.
There are situations in which we cannot assume a Normal or binomial distribution: when the sample size is small, when quantitative data are not Normally distributed, or when categorical data have more than two values.

Introduction (contd.)
When this happens, we use other methods of analysis: non-parametric methods. They are not based on the same assumptions as parametric methods, but they still make some assumptions of their own.

Categories (ranking), means, medians
Some non-parametric methods use rankings instead of the real values. Rankings (categories) are used to compare data by their order rather than by their size.
Patient | Glucose in blood (mg/dl)
1 | 135
2 | 225
3 | 70
4 | 100
5 | 110
6 | 150
7 | 90
8 | 100
9 | 170
10 | 60
11 | 80
From these data, we can see that patient 1 has a lower blood glucose value than patient 2. When we study the order, we can summarize the data and apply statistical tests without requiring the data to follow a particular distribution. The median is the center of the ordinal classification (ranking).

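Categories (ranking), means, median
Ranked in ascending order
Patient | Glucose in blood (mg/dl) | Ranking
10 | 60 | 1
3 | 70 | 2
11 | 80 | 3
7 | 90 | 4
4 | 100 | 5
8 | 100 | 6
5 | 110 | 7
1 | 135 | 8
6 | 150 | 9
9 | 170 | 10
2 | 225 | 11
The median is the center of the ranked distribution: here, the 6th of the 11 ordered values, 100 mg/dl.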
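The ranking and the median can be reproduced with a few lines of Python. This is only a sketch: it takes patient 8's value as 100 mg/dl, which is what the reported mean of 117.27 and the two zero differences later in the deck imply, and it breaks the tie at 100 by patient number. The variable names are illustrative.

```python
from statistics import median

# Glucose in blood (mg/dl) keyed by patient number.
# Assumption: patient 8 is taken as 100 mg/dl.
glucose = {1: 135, 2: 225, 3: 70, 4: 100, 5: 110, 6: 150,
           7: 90, 8: 100, 9: 170, 10: 60, 11: 80}

# Order patients by value; ties are broken by patient number.
ordered = sorted(glucose.items(), key=lambda item: (item[1], item[0]))
ranking = [(rank, patient, value)
           for rank, (patient, value) in enumerate(ordered, start=1)]

for rank, patient, value in ranking:
    print(f"ranking {rank:2d}: patient {patient:2d}, {value} mg/dl")

# With 11 observations the median is the 6th ordered value.
sample_median = median(glucose.values())
print("median:", sample_median)
```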

Are the mean and the median equal?
For the mean and its confidence interval to be adequate, the distribution of values should be symmetric. For the median and its confidence interval, no such assumption is needed.
To calculate the confidence interval of a median, we only need to calculate the probability that the sample data are grouped symmetrically around the true median. There are published tables for the confidence interval of a median. Once the median has been located, the table lists the rankings of the lower and upper limits of the confidence interval.
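One common way such tables are built is from the Binomial(n, 0.5) distribution of the number of observations that fall below the true median. The sketch below assumes that construction (the published table used in the course may follow a slightly different rule); the function name `median_ci_ranks` is hypothetical.

```python
from math import comb

def median_ci_ranks(n, level=0.95):
    """Return (lower_rank, upper_rank, coverage) for a CI of the median.

    Ranks are 1-based positions in the sorted sample. The interval from
    the r-th smallest to the r-th largest value covers the true median
    with probability 1 - 2 * P(X <= r - 1), where X ~ Binomial(n, 0.5).
    """
    best = None
    for r in range(1, n // 2 + 1):
        # Probability that fewer than r observations lie below the median.
        tail = sum(comb(n, k) for k in range(r)) / 2 ** n
        coverage = 1 - 2 * tail
        if coverage >= level:
            best = (r, n + 1 - r, coverage)  # narrowest interval so far
        else:
            break
    return best

lower_rank, upper_rank, coverage = median_ci_ranks(11)
print(lower_rank, upper_rank, round(coverage, 4))

# Applied to the 11 ordered glucose values (patient 8 assumed to be 100):
values = sorted([135, 225, 70, 100, 110, 150, 90, 100, 170, 60, 80])
print(values[lower_rank - 1], "to", values[upper_rank - 1], "mg/dl")
```

For n = 11 this picks the 2nd and 10th ordered values, because the next narrower choice (3rd and 9th) would cover the median with probability below 95%.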

Are the mean and the median equal?
Using the order (ranking) instead of the original values reduces the need for assumptions about the distribution, and the calculations are simpler and faster. The disadvantage is that the original values are lost. Thus, non-parametric methods are used mainly to test hypotheses rather than for estimation.

Non-parametric methods
Situation | Non-parametric method | Parametric method
One sample | Wilcoxon signed rank test | Z statistic (t test)
Two independent samples | Wilcoxon rank sum test | Z statistic for two independent samples (t test)
Two paired samples | Wilcoxon signed rank test | Paired Z statistic (paired t test)
One sample, two quantitative variables | Spearman correlation coefficient | Pearson correlation coefficient

Data of one sample
The table shows blood glucose levels from 11 patients. We want to know whether the mean is 100 mg/dl.
Patient | Glucose in blood (mg/dl) | Ranking
10 | 60 | 1
3 | 70 | 2
11 | 80 | 3
7 | 90 | 4
4 | 100 | 5
8 | 100 | 6
5 | 110 | 7
1 | 135 | 8
6 | 150 | 9
9 | 170 | 10
2 | 225 | 11
If we use a parametric method to answer this question, we must assume that the distribution of blood glucose levels is approximately Normal and then test:
Null hypothesis H0: μ = 100
Alternative hypothesis H1: μ ≠ 100
We then calculate a t statistic (not Z) because the sample size is small.
Mean X̄ = 117.27, s = 49.06, SE = s/√n = 14.79, t = 1.17, P > 0.05
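The one-sample t statistic can be checked directly. A minimal sketch, assuming patient 8's value is 100 mg/dl and using the usual definitions SE = s/√n and t = (X̄ − μ0)/SE; the critical value 2.228 is the standard two-sided 5% point of the t distribution with 10 degrees of freedom.

```python
from math import sqrt
from statistics import mean, stdev

# Glucose in blood (mg/dl), patients 1 to 11.
# Assumption: patient 8 is taken as 100 mg/dl.
glucose = [135, 225, 70, 100, 110, 150, 90, 100, 170, 60, 80]
mu0 = 100                      # hypothesized mean under H0

n = len(glucose)
x_bar = mean(glucose)          # sample mean
s = stdev(glucose)             # sample SD (n - 1 in the denominator)
se = s / sqrt(n)               # standard error of the mean
t = (x_bar - mu0) / se         # t statistic with n - 1 degrees of freedom

T_CRIT_10DF = 2.228            # two-sided 5% critical value, 10 df
reject_h0 = abs(t) > T_CRIT_10DF

print(f"mean={x_bar:.2f} s={s:.2f} SE={se:.2f} t={t:.2f} reject={reject_h0}")
```

With these definitions the sketch gives t of about 1.17, well below 2.228, so P > 0.05 and the null hypothesis is not rejected.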

Data of one sample
The non-parametric alternative is the Wilcoxon signed rank test. It can be used to evaluate whether the values in the sample are centered on 100 mg/dl. This test does not require the data to be Normally distributed, but it does require the distribution to be symmetrical, although not necessarily bell-shaped like the Normal.

Data of one sample
The Wilcoxon signed rank test is calculated in six steps:
1. Calculate the difference between each observation and the value of interest, 100 mg/dl.
2. Exclude any difference equal to 0.
3. Rank the differences by magnitude, ignoring the sign.
4. Sum the rankings of the positive differences.
5. Sum the rankings of the negative differences.
6. Select the smaller of the two sums and call it T.
The T value is looked up in the table of critical values for the Wilcoxon signed rank test, in the row corresponding to the number of non-zero differences. Each row gives ranges of values corresponding to different p-values. If T falls outside the range for a column, or is exactly equal to one of its limits, the p-value is less than the one for that column. If T lies within the range, the p-value is greater than the one for that column.

Data of one sample
Patient | Glucose in blood (mg/dl) | Difference from 100 mg/dl | Ranking
10 | 60 | -40 | 6
3 | 70 | -30 | 4
11 | 80 | -20 | 3
7 | 90 | -10 | 2
4 | 100 | 0 | -
8 | 100 | 0 | -
5 | 110 | 10 | 1
1 | 135 | 35 | 5
6 | 150 | 50 | 7
9 | 170 | 70 | 8
2 | 225 | 125 | 9
Sum of rankings of positive differences: 1 + 5 + 7 + 8 + 9 = 30
Sum of rankings of negative differences: 6 + 4 + 3 + 2 = 15
Two differences = 0
T = 15
In the table of critical values for the one-sample Wilcoxon signed rank test, we use the row for n = 9 (the sample size of 11 minus the 2 zero differences). The first column shows the range 10 to 35; T = 15 lies within this range, so p > 0.2.
To obtain the confidence interval of the median we use the full sample size, n = 11. In the table, the 95% confidence interval for a sample of 11 lies between rankings 2 and 10, that is, from 70 to 170 mg/dl.
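The six steps can be sketched in Python as follows. This is an illustration, not the course's own procedure: it takes patient 8's value as 100 mg/dl, and it gives tied magnitudes the average of the rankings they occupy (the usual convention), so the tied differences of -10 and +10 each receive 1.5 instead of 2 and 1. The function name `signed_rank_sums` is hypothetical.

```python
def signed_rank_sums(values, reference):
    """Return (positive rank sum, negative rank sum, number of non-zero differences)."""
    # Steps 1 and 2: differences from the reference value, dropping zeros.
    diffs = [v - reference for v in values if v != reference]
    # Step 3: order by magnitude, ignoring the sign.
    by_size = sorted(diffs, key=abs)
    rank_of = {}
    i = 0
    while i < len(by_size):
        j = i
        while j < len(by_size) and abs(by_size[j]) == abs(by_size[i]):
            j += 1
        # Positions i+1 .. j share the same magnitude: assign their average.
        rank_of[abs(by_size[i])] = (i + 1 + j) / 2
        i = j
    # Steps 4 and 5: sum the rankings by sign.
    positive = sum(rank_of[abs(d)] for d in diffs if d > 0)
    negative = sum(rank_of[abs(d)] for d in diffs if d < 0)
    return positive, negative, len(diffs)

# Assumption: patient 8 is taken as 100 mg/dl.
glucose = [135, 225, 70, 100, 110, 150, 90, 100, 170, 60, 80]
positive, negative, n_nonzero = signed_rank_sums(glucose, 100)
T = min(positive, negative)    # Step 6: the smaller sum
print(positive, negative, n_nonzero, T)
```

With average rankings for the tie, T comes out as 14.5 rather than 15. It still lies inside the 10 to 35 range for n = 9, so the reading of the table is unchanged.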

Two independent groups
Thirty teenagers with acute appendicitis were allocated to two groups: 15 underwent traditional appendectomy and 15 underwent laparoscopic appendectomy. Post-surgical pain was evaluated in both groups.
Post-surgical pain Traditional Laparoscopy None 1 3 Slight 5 7 Moderate 4 Severe Total 15
The median in the traditional appendectomy group is moderate pain; in the laparoscopic group it is slight pain.

Two independent groups
To compare post-surgical pain in the two groups, we can use the Wilcoxon rank sum test.
We define the null hypothesis H0: the two distributions overlap.
We define the alternative hypothesis H1: the two distributions do not overlap.

Two independent groups
The Wilcoxon rank sum test has three steps:
1. Rank the values of both groups together in ascending order.
2. Calculate T as the sum of the rankings of the smaller sample, or of either sample if the sample sizes are equal.
3. Compare the T value with the table of critical values for the Wilcoxon rank sum test.
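The first two steps can be sketched as below. The pain scores are hypothetical illustration data (coded 0 = none, 1 = slight, 2 = moderate, 3 = severe), not the appendectomy results, and tied values receive the average of the rankings they occupy, which is an assumption about how ties are handled. The function names are illustrative.

```python
def average_ranks(pooled):
    """Map each distinct value to its average ranking in the pooled sample."""
    ordered = sorted(pooled)
    rank_of = {}
    i = 0
    while i < len(ordered):
        j = i
        while j < len(ordered) and ordered[j] == ordered[i]:
            j += 1
        rank_of[ordered[i]] = (i + 1 + j) / 2   # average of positions i+1 .. j
        i = j
    return rank_of

def rank_sum_T(sample_1, sample_2):
    """Sum of rankings of the smaller sample (the first one if sizes are equal)."""
    rank_of = average_ranks(list(sample_1) + list(sample_2))    # step 1
    smaller = sample_1 if len(sample_1) <= len(sample_2) else sample_2
    return sum(rank_of[v] for v in smaller)                     # step 2

# Hypothetical ordinal pain scores for two groups of five patients each.
group_a = [1, 2, 2, 3, 3]
group_b = [0, 0, 1, 1, 2]
T = rank_sum_T(group_a, group_b)
print(T)
```

Step 3, the comparison with the critical values, still requires the published table for the two sample sizes.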

Two independent groups
Post-surgical pain Traditional Laparoscopy Rankings None 1 1+ 3 Slight 5 9+ 7 15 Moderate 21+ 4 25 Severe 29+ 30 Total
T = 1 + 9 + 21 + 29 = 60
In the table of critical values for the Wilcoxon rank sum test, for n1, n2 = (15, 15), we look up the value 60, which corresponds to p < 0.05.

Two paired groups
The table shows the hours of improvement given by two analgesics in 12 patients with rheumatoid arthritis. To test whether the improvement is the same with both analgesics, we can use the paired t test or the Wilcoxon signed rank test. With both methods, we calculate the difference in hours of improvement for each patient.
Patient | Analgesic A | Analgesic B
1 | 3.5 | 3.5
2 | 3.6 | 5.7
3 | 2.6 | 2.9
4 | 2.6 | 2.4
5 | 7.3 | 9.9
6 | 3.4 | 3.3
7 | 14.9 | 16.7
8 | 6.6 | 6.0
9 | 2.3 | 3.8
10 | 2.0 | 4.0
11 | 6.8 | 9.1
12 | 8.5 | 26.9

Two paired groups
The Wilcoxon signed rank test does not require Normality, but the differences should be symmetrical on both sides of the median.
H0: difference in medians = 0
H1: difference in medians ≠ 0
Patient | Analgesic A | Analgesic B | Difference | Ranking
1 | 3.5 | 3.5 | 0 | -
2 | 3.6 | 5.7 | -2.1 | 8
3 | 2.6 | 2.9 | -0.3 | 3
4 | 2.6 | 2.4 | 0.2 | 2
5 | 7.3 | 9.9 | -2.6 | 10
6 | 3.4 | 3.3 | 0.1 | 1
7 | 14.9 | 16.7 | -1.8 | 6
8 | 6.6 | 6.0 | 0.6 | 4
9 | 2.3 | 3.8 | -1.5 | 5
10 | 2.0 | 4.0 | -2.0 | 7
11 | 6.8 | 9.1 | -2.3 | 9
12 | 8.5 | 26.9 | -18.4 | 11
If we use the paired t test, we assume that the distribution of the differences is Normal and test
H0: δ = 0
H1: δ ≠ 0
where δ is the mean of the differences.
δ = -2.51, SE = 1.48, t = -1.69, p > 0.10
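The paired t statistic is simply a one-sample t statistic on the differences. The sketch below works from the listed differences, taking patient 1's difference as 0 (an assumption, implied by there being 11 non-zero differences). The mean of the listed differences comes out at about -2.51, which is the value consistent with SE = 1.48 and t = -1.69.

```python
from math import sqrt
from statistics import mean, stdev

# Differences A - B in hours of improvement, patients 1 to 12.
# Assumption: patient 1's difference is 0.
differences = [0.0, -2.1, -0.3, 0.2, -2.6, 0.1,
               -1.8, 0.6, -1.5, -2.0, -2.3, -18.4]

n = len(differences)
d_bar = mean(differences)             # mean of the differences
se = stdev(differences) / sqrt(n)     # standard error of the mean difference
t = d_bar / se                        # t statistic with n - 1 = 11 degrees of freedom

print(f"mean difference={d_bar:.2f} SE={se:.2f} t={t:.2f}")
```

Note how strongly the single difference of -18.4 influences both the mean and the standard error, which is one reason to consider a rank-based test here.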

Two paired groups
We calculate the Wilcoxon signed rank test for the differences as follows:
1.- Count the number of non-zero differences.
2.- Rank the differences by magnitude, ignoring the sign.
3.- Sum the rankings of the positive differences.
4.- Sum the rankings of the negative differences.
5.- Select the smaller of the two sums and call it T. (Sum for negative differences = 59, sum for positive differences = 7, T = 7.)
6.- Compare the T value with the table of critical values for the Wilcoxon signed rank test. T = 7, p < 0.05.
With this result, we reject the null hypothesis: the difference in medians is not 0.
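These steps can be checked against the listed differences with a short sketch. It takes patient 1's difference as 0 (an assumption, implied by the 11 non-zero differences) and works in tenths of an hour so that magnitudes compare exactly; variable names are illustrative.

```python
# Differences A - B in hours of improvement, patients 1 to 12.
# Assumption: patient 1's difference is 0.
differences = [0.0, -2.1, -0.3, 0.2, -2.6, 0.1,
               -1.8, 0.6, -1.5, -2.0, -2.3, -18.4]

# Convert to whole tenths of an hour to avoid floating-point noise.
tenths = [round(d * 10) for d in differences]

nonzero = [d for d in tenths if d != 0]            # step 1
ordered = sorted(nonzero, key=abs)                 # step 2
rank_of = {abs(d): r for r, d in enumerate(ordered, start=1)}
# This simple ranking is only valid because no two magnitudes are tied.
assert len(rank_of) == len(nonzero)

positive = sum(rank_of[abs(d)] for d in nonzero if d > 0)   # step 3
negative = sum(rank_of[abs(d)] for d in nonzero if d < 0)   # step 4
T = min(positive, negative)                                  # step 5

print(len(nonzero), positive, negative, T)
```

Step 6, the comparison with the critical values for n = 11, is done with the published table.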

Spearman's correlation of ranks
The table and graph show the incidence of colon cancer and the average meat intake per capita in 13 countries.
Country Incidence colon ca Mean of intake of meat 1 10 2 8 9 3 11 5 4 12 22 33 6 67 37 7 73 32 48 41 31 21 29 17 13
We can measure the correlation between two quantitative variables using Pearson's correlation coefficient r. For the relationship between meat intake and incidence of colon cancer, r = 0.65; however, Pearson's r requires both variables to have a Normal distribution. When they are not Normal, we can apply one of two strategies:
1.- Transform the variables (for example, logarithmic or squared) to make them more Normal, or
2.- Use an equivalent non-parametric method.

Spearman ranks correlation
It is appropriate for monotonic relationships, including non-linear ones. It is calculated in the same way as Pearson's r, but using the rankings. It takes three steps:
1. Rank the values of the first variable.
2. Rank the values of the second variable.
3. Apply the formula for Pearson's r, using the rankings instead of the original values.
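The three steps can be sketched as below. The two variables are hypothetical illustration data, not the colon cancer and meat intake figures, and tied values (none here) would receive average rankings, which is an assumption. The function names are illustrative.

```python
from math import sqrt

def average_ranks(values):
    """Return the ranking of each value, in the original order (ties averaged)."""
    ordered = sorted(values)
    rank_of = {}
    i = 0
    while i < len(ordered):
        j = i
        while j < len(ordered) and ordered[j] == ordered[i]:
            j += 1
        rank_of[ordered[i]] = (i + 1 + j) / 2
        i = j
    return [rank_of[v] for v in values]

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def spearman_r(x, y):
    """Spearman's r: Pearson's r applied to the rankings (steps 1 to 3)."""
    return pearson_r(average_ranks(x), average_ranks(y))

# Hypothetical data for five units.
x = [10, 20, 30, 40, 50]
y = [1, 3, 2, 5, 4]
r_s = spearman_r(x, y)
print(round(r_s, 2))
```

Because only the rankings enter the calculation, any monotonic transformation of x or y (for example, taking logarithms) leaves Spearman's r unchanged.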

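Spearman ranks correlation
Country Incidence colon ca Mean of meat intake Ranking of cancer Ranking of meat intake 1 10 3 2 8 9 7 11 5 4 12 22 33 6 67 37 73 32 13 48 41 31 21 29 17
$$ r_s = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^2 \, \sum (y - \bar{y})^2}} = 0.74 $$
where x and y are the rankings of the two variables and \bar{x} and \bar{y} are the means of those rankings.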

Comparison of methods
Example | Parametric method | Non-parametric method
Glucose in blood | t test for one sample, p > 0.05 | Wilcoxon signed rank test, p > 0.2
Intensity of post-surgical pain | t test for two independent samples, p < 0.05 | Wilcoxon rank sum test, p < 0.05
Improvement of pain | Paired t test, p > 0.1 | Wilcoxon signed rank test, p < 0.05
Correlation between colon cancer and meat intake | Pearson's r = 0.65 | Spearman's r = 0.74
