Copyright (c) Bani Mallick1 STAT 651 Lecture 8. Copyright (c) Bani Mallick2 Topics in Lecture #8 Sign test for paired comparisons Wilcoxon signed rank.

Presentation transcript:

Copyright (c) Bani Mallick1 STAT 651 Lecture 8

Copyright (c) Bani Mallick2 Topics in Lecture #8 Sign test for paired comparisons Wilcoxon signed rank test for paired comparisons Comparing two population means: first pass

Copyright (c) Bani Mallick3 Book Sections Covered in Lecture #8 My own material (sign test) Chapter 6.5 (Wilcoxon signed rank test, although I think my explanation is better) Chapters (comparing two population means: this lecture will not have any formulae)

Copyright (c) Bani Mallick4 Lecture 7 Review: Sample Size Calculations You want to test at level (Type I error) $\alpha$ the null hypothesis that the mean $\mu = 0$. You want power $1 - \beta$ to detect a change from the hypothesized mean by the amount $\Delta$ or more, i.e., the mean is greater than $\Delta$ or less than $-\Delta$. There is a formula for this, which I showed you in class.
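
The slide does not reproduce the formula, so the following is only a sketch of the standard normal-approximation version, which may differ from the one shown in class. It assumes a known population standard deviation `sigma` and a two-sided test; the function name and the default values are illustrative, not from the lecture.

```python
from math import ceil
from statistics import NormalDist


def sample_size_two_sided(sigma, delta, alpha=0.05, power=0.80):
    """Approximate n for a two-sided level-alpha test of mean = 0.

    Assumes sigma is known and uses the normal approximation
    n = sigma^2 * (z_{alpha/2} + z_{beta})^2 / delta^2, with power = 1 - beta.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # upper alpha/2 point of the normal
    z_beta = z.inv_cdf(power)           # upper beta point, since power = 1 - beta
    return ceil((sigma * (z_alpha + z_beta) / delta) ** 2)


# Hypothetical inputs: sigma = 1, detect a shift of 0.5 with 80% power.
n_needed = sample_size_two_sided(sigma=1.0, delta=0.5)
print(n_needed)
```

Halving `delta` roughly quadruples the required sample size, because `delta` enters the formula squared.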

Copyright (c) Bani Mallick5 Lecture 7 Review: Never Accept a Null Hypothesis Suppose we compute a 95% confidence interval and it includes zero. Why do I say: with 95% confidence, I cannot reject that the population mean is zero? I never, ever say: I can therefore conclude that the population mean is zero.

Copyright (c) Bani Mallick6 Lecture 7 Review: Never Accept a Null Hypothesis If you pick a tiny sample size, there is no statistical power to reject the null hypothesis. In particular, p-values are not the probability that the null hypothesis is true.

Copyright (c) Bani Mallick7 Lecture 7 Review: P-Values The p-value is NOT the probability that the null hypothesis is true. p-values are simply a mechanical way to understand what will happen to hypothesis tests when you go out and compute them. For example if you take n=2, you will have no power, hence you will have high p-values. Does this mean that the null hypothesis has a high probability of being correct? No! It means you have a rotten study.
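
A small simulation can illustrate the n = 2 remark. This is a sketch under stated assumptions: the data are normal with standard deviation 1, the true mean is 1 (so the null hypothesis mean = 0 is false), and the function names are made up for illustration. With n = 2 the t-statistic has 1 degree of freedom, which is the Cauchy distribution, so the two-sided p-value has a closed form.

```python
import math
import random
import statistics


def t_test_p_value_n2(x1, x2, mu0=0.0):
    """Two-sided one-sample t-test p-value for a sample of size 2.

    With n = 2 the reference distribution is t with 1 df (Cauchy),
    so P(|T| > t) = 1 - (2 / pi) * atan(t).
    """
    mean = (x1 + x2) / 2
    sd = statistics.stdev([x1, x2])
    t = (mean - mu0) / (sd / math.sqrt(2))
    return 1 - 2 * math.atan(abs(t)) / math.pi


def rejection_rate(true_mean, n_sims=20000, alpha=0.05, seed=651):
    """Fraction of simulated n = 2 studies that reject mean = 0."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_sims):
        x1 = rng.gauss(true_mean, 1.0)
        x2 = rng.gauss(true_mean, 1.0)
        if t_test_p_value_n2(x1, x2) < alpha:
            rejections += 1
    return rejections / n_sims


# The null is false here (true mean = 1), yet rejections are rare.
print(rejection_rate(true_mean=1.0))
```

Most simulated studies fail to reject even though the null hypothesis is false, which is the sense in which a large p-value from a tiny study says little about the null.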

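Copyright (c) Bani Mallick8 Lecture 7 Review: Student’s t-Distribution The $(1-\alpha)100\%$ CI when $\sigma$ was known was $\bar{X} \pm z_{\alpha/2}\,\sigma/\sqrt{n}$. The $(1-\alpha)100\%$ CI when $\sigma$ is unknown is $\bar{X} \pm t_{\alpha/2}(n-1)\,s/\sqrt{n}$. You replace $\sigma$ by $s$ and $z_{\alpha/2}$ by $t_{\alpha/2}(n-1)$.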

Copyright (c) Bani Mallick9 Lecture 7 Review: Student’s t-Distribution Take 95% confidence, $\alpha = 0.05$, $z_{\alpha/2} = 1.96$.
n = 3, n-1 = 2, $t_{\alpha/2}(n-1)$ = 4.30
n = 10, n-1 = 9, $t_{\alpha/2}(n-1)$ = 2.26
n = 30, n-1 = 29, $t_{\alpha/2}(n-1)$ = 2.05
n = 121, n-1 = 120, $t_{\alpha/2}(n-1)$ = 1.98
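
The critical values in this table can be checked numerically. The sketch below uses only the standard library: it integrates the t density with Simpson's rule and inverts the CDF by bisection. The helper names are illustrative; in practice a statistics package (for example `scipy.stats.t.ppf`) would be the usual route.

```python
import math


def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)


def t_cdf(x, df, steps=2000):
    """CDF for x >= 0, via Simpson's rule on [0, x] plus the 0.5 below zero."""
    if x == 0:
        return 0.5
    h = x / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3


def t_quantile(p, df, upper=50.0):
    """Quantile for 0.5 <= p < 1 by bisection on [0, upper]."""
    lo, hi = 0.0, upper
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


# 95% confidence: alpha/2 = 0.025 in the upper tail.
for df in (2, 9, 29, 120):
    print(df, round(t_quantile(0.975, df), 2))
```

As the degrees of freedom grow, the t critical value shrinks toward the normal value 1.96, which is why the t-based interval is noticeably wider only for small samples.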

Copyright (c) Bani Mallick10 Paired Comparisons We have shown how to test $H_0$: population mean difference $= 0$ versus $H_a$: population mean difference $\neq 0$ using t-statistics (confidence intervals and tests).

Copyright (c) Bani Mallick11 Paired Comparisons Unfortunately, it often arises (as it does for the hormone assay) that the differences between two variables can have many outliers. We know that outliers affect the sample mean and especially the sample standard deviation, making the latter larger. Larger standard deviations mean larger confidence intervals and hence less power.
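
The effect of an outlier on the paired t-statistic can be seen with a few numbers. The differences below are invented for illustration and are not the hormone assay data; the function name is likewise hypothetical.

```python
import math
import statistics


def t_statistic(diffs, mu0=0.0):
    """One-sample t-statistic for paired differences against mean mu0."""
    n = len(diffs)
    return (statistics.mean(diffs) - mu0) / (statistics.stdev(diffs) / math.sqrt(n))


# Hypothetical paired differences, all close to 1.
clean = [1.2, 0.8, 1.1, 0.9, 1.0, 1.3, 0.7, 1.0]
# Same data with the last difference replaced by a gross outlier.
with_outlier = clean[:-1] + [15.0]

sd_clean = statistics.stdev(clean)
sd_outlier = statistics.stdev(with_outlier)
t_clean = t_statistic(clean)
t_outlier = t_statistic(with_outlier)

print(sd_clean, sd_outlier)
print(t_clean, t_outlier)
```

The outlier raises the sample mean, but it inflates the standard deviation far more, so the t-statistic drops sharply even though every difference is positive.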

Copyright (c) Bani Mallick12 Paired Comparisons There are two alternative methods that are not so affected by outliers. These are the Wilcoxon signed rank test and the sign test. Both are available in SPSS: “Analyze”, “Nonparametric Tests”, “2 Related Samples”, and also check the “Sign” test box.

Copyright (c) Bani Mallick13 Paired Comparisons The sign test is simple: recode the data as +1 = positive difference, 0 = no difference, -1 = negative difference. Then run a t-test on the recoded values and compute the p-value. Problem: no confidence intervals. Serves as a check on t-inferences.
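
The recoding step can be sketched as follows. The differences are invented for illustration. The t-statistic on the recoded signs follows the slide's recipe; the exact binomial p-value is a different, commonly used version of the sign test (it drops zero differences) and is included only for comparison, not as the lecture's method.

```python
import math
import statistics


def recode_signs(diffs):
    """Recode each difference as +1, 0, or -1."""
    return [(d > 0) - (d < 0) for d in diffs]


def t_statistic(values, mu0=0.0):
    """One-sample t-statistic, here applied to the recoded signs."""
    n = len(values)
    return (statistics.mean(values) - mu0) / (statistics.stdev(values) / math.sqrt(n))


def exact_sign_test_p(diffs):
    """Two-sided exact binomial sign test, ignoring zero differences."""
    pos = sum(d > 0 for d in diffs)
    neg = sum(d < 0 for d in diffs)
    n = pos + neg
    k = max(pos, neg)
    tail = sum(math.comb(n, i) for i in range(k, n + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Hypothetical paired differences.
diffs = [2.1, -0.4, 1.3, 0.0, 3.2, 0.7, -1.1, 2.5, 1.8, 0.9]
signs = recode_signs(diffs)
print(signs)
print(t_statistic(signs))
print(exact_sign_test_p(diffs))
```

Because only the signs are used, replacing 3.2 by 3200 in this example would leave both results unchanged.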

Copyright (c) Bani Mallick14 HAND EXAMPLE Data Signs

Copyright (c) Bani Mallick15 Paired Comparisons The Wilcoxon signed rank test is simple: recode the data! Take the absolute values of the differences. Order the absolute values from smallest to largest. To the smallest absolute value, assign the number –1 if the actual difference is negative, 0 if there is no difference, +1 if the difference is positive.

Copyright (c) Bani Mallick16 Paired Comparisons The Wilcoxon signed rank test is simple: to the jth absolute value in order, assign the number –j if the actual difference is negative, 0 if there is no difference, +j if the difference is positive. Then run a t-test on the signed ranks and compute the p-value. Problem: no confidence intervals. Serves as a check on t-inferences.
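
The signed-rank recoding described on these two slides can be sketched as below. The data are invented, and the sketch assumes no ties among the absolute differences (tie handling is not discussed on the slides); the function names are illustrative.

```python
import math
import statistics


def signed_ranks(diffs):
    """Give the jth smallest |difference| the value +j, -j, or 0 by its sign.

    Assumes no ties among the absolute differences.
    """
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0] * len(diffs)
    for j, i in enumerate(order, start=1):
        if diffs[i] > 0:
            ranks[i] = j
        elif diffs[i] < 0:
            ranks[i] = -j
    return ranks


def t_statistic(values, mu0=0.0):
    """One-sample t-statistic, here applied to the signed ranks."""
    n = len(values)
    return (statistics.mean(values) - mu0) / (statistics.stdev(values) / math.sqrt(n))


# Hypothetical paired differences.
diffs = [1.5, -0.3, 2.2, 0.8, -1.1, 3.0]
ranks = signed_ranks(diffs)
print(ranks)
print(t_statistic(ranks))
```

Unlike the sign test, the signed ranks keep some information about the size of each difference, but an extreme outlier can never contribute more than the largest rank.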

Copyright (c) Bani Mallick17 HAND EXAMPLE Data Absolute Rank Signed Rank (Run t-test on these guys)

Copyright (c) Bani Mallick18 Armspan Data Sign test p-value = Wilcoxon signed rank test p-value = t-test p-value = All consistent!

Copyright (c) Bani Mallick19 Hormone Assay Data Remember that in the hormone assay data, we seemed to get different inferences based on whether we used the raw data or their logarithms. The sign test is not affected by monotone transformations such as the logarithm. The Wilcoxon test may be slightly affected by transformations when studying paired comparisons.
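
A tiny example shows why the sign test is unchanged by taking logs while the signed ranks can change. The pairs below are invented positive measurements, not the hormone assay data, and the helper names are illustrative.

```python
import math


def signs(diffs):
    """Recode each difference as +1, 0, or -1."""
    return [(d > 0) - (d < 0) for d in diffs]


def signed_ranks(diffs):
    """Signed ranks of the differences, assuming no ties in |difference|."""
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0] * len(diffs)
    for j, i in enumerate(order, start=1):
        ranks[i] = j * ((diffs[i] > 0) - (diffs[i] < 0))
    return ranks


# Hypothetical pairs (first measurement, second measurement), all positive.
pairs = [(10.0, 8.0), (100.0, 97.0), (2.0, 1.0), (5.0, 9.0)]

raw_diffs = [x - y for x, y in pairs]
log_diffs = [math.log(x) - math.log(y) for x, y in pairs]

print(signs(raw_diffs), signs(log_diffs))
print(signed_ranks(raw_diffs), signed_ranks(log_diffs))
```

Since the log is increasing, x > y exactly when log x > log y, so the signs agree on both scales. The ordering of the absolute differences is not preserved, however, so the signed ranks can differ.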

Copyright (c) Bani Mallick20 Hormone Assay Data t-test on raw data, p = t-test of log data, p = Sign test, logged or raw data, p = Wilcoxon signed rank test, raw data, p = 0.016, logged data p = Remember, I claimed that the log data scale was most nearly bell-shaped, and hence thought there was a difference!

Copyright (c) Bani Mallick21 Comparing Two Population Means A great deal of our effort will go into comparing population means. Bluebonnet Heights on red petals: does environment matter? Are true building costs different in Bryan and College Station, after accounting for land valuation?

Copyright (c) Bani Mallick22 Comparing Two Population Means We’ll use all our methods: histograms, boxplots, q-q plots, confidence intervals, nonparametric tests.

Copyright (c) Bani Mallick23 Comparing Two Populations There are two populations. Take a sample from each population. The sample sizes need not be the same. Population 1: $n_1$. Population 2: $n_2$.

Copyright (c) Bani Mallick24 Comparing Two Populations Each sample will have a sample standard deviation. Population 1: $s_1$. Population 2: $s_2$.

Copyright (c) Bani Mallick25 Comparing Two Populations Each sample will have a sample mean. Population 1: $\bar{X}_1$. Population 2: $\bar{X}_2$. Those are the statistics. What are the parameters?

Copyright (c) Bani Mallick26 Comparing Two Populations Each population will have a population standard deviation. Population 1: $\sigma_1$. Population 2: $\sigma_2$.

Copyright (c) Bani Mallick27 Comparing Two Populations Each population will have a population mean. Population 1: $\mu_1$. Population 2: $\mu_2$.

Copyright (c) Bani Mallick28 Comparing Two Populations How do we compare the population means $\mu_1$ and $\mu_2$? The usual way is to take their difference: $\mu_1 - \mu_2$. If the population means are equal, what is their difference?

Copyright (c) Bani Mallick29 Comparing Two Populations The usual way is to take their difference: $\mu_1 - \mu_2$. If the population means are equal, their difference = 0. Suppose we form a confidence interval for the difference. What do we learn? Say a 95% CI is from 1 to 3?

Copyright (c) Bani Mallick30 Comparing Two Populations The usual way is to take their difference: $\mu_1 - \mu_2$. Suppose we form a confidence interval for the difference. What do we learn? Say a 95% CI is from 1 to 3? Population 1 has a mean that is between 1 and 3 units larger than that of population 2, with 95% confidence.

Copyright (c) Bani Mallick31 Comparing Two Populations Before learning how this confidence interval is computed, let’s look at an example.

Copyright (c) Bani Mallick32 NHANES Comparison “Analyze”, “Compare Means”, “Independent Samples” will get you the analysis in SPSS. You will get lots and lots of things, so we have to be a little careful. First do the plots, then the analysis! You will get means and standard errors.

Copyright (c) Bani Mallick33 NHANES Comparison

Copyright (c) Bani Mallick34 NHANES Comparison (Cancer Cases)

Copyright (c) Bani Mallick35 NHANES Comparison (Healthy Cases)

Copyright (c) Bani Mallick36 NHANES Comparison Healthy: Mean = , s =.6173, se = Cancer: Mean = , s =.6423, se = Note: The sample standard deviations are nearly numerically equal. This agrees with the box plots, where the IQRs are nearly equal. Note how small the standard errors are.

Copyright (c) Bani Mallick37 NHANES Comparison The next thing is that there will be two rows, one for “Equal Variances Assumed”, the other for “Equal Variances Not Assumed”. Because we have been careful, we know the variability looks to be in the same ballpark. Thus I would assume equal variances.

Copyright (c) Bani Mallick38 NHANES Comparison What happens if the variances do not look equal? Generally the results are not very different unless the sample sizes are quite small. Generally, people quote the “Variances assumed equal” p-values and CI. You have a backup, nonparametric rank tests, that we will discuss later. It’s pretty hard to make a huge blunder.
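
The lecture deliberately gives no formulae, so the following is only a sketch of the textbook standard errors that usually sit behind the two rows: a pooled standard error for “equal variances assumed” and an unpooled (Welch-type) one otherwise. The standard deviations are the ones quoted on the earlier slide; the sample sizes are hypothetical, since the transcript does not give them, and the function names are illustrative.

```python
import math


def pooled_se(s1, n1, s2, n2):
    """Standard error of the mean difference, assuming equal variances."""
    pooled_var = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2)
    return math.sqrt(pooled_var * (1 / n1 + 1 / n2))


def unpooled_se(s1, n1, s2, n2):
    """Standard error of the mean difference, not assuming equal variances."""
    return math.sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)


# Standard deviations from the slide; sample sizes are ASSUMED for illustration.
s_healthy, n_healthy = 0.6173, 3000
s_cancer, n_cancer = 0.6423, 60

se_equal = pooled_se(s_healthy, n_healthy, s_cancer, n_cancer)
se_unequal = unpooled_se(s_healthy, n_healthy, s_cancer, n_cancer)
print(se_equal, se_unequal)
```

With standard deviations this close, the two standard errors differ by only a few percent even for very unequal sample sizes, which is consistent with the slide's remark that the two rows rarely disagree much.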

Copyright (c) Bani Mallick39 NHANES Comparison The “Mean Difference” is Since the healthy cases had a higher mean, this is Mean(Healthy) – Mean(Cancer) The 95% CI is from to What is this a CI for?

Copyright (c) Bani Mallick40 NHANES Comparison The “Mean Difference” is Since the cancer cases had a higher mean, this is Mean(Healthy) – Mean(Cancer) The 95% CI is from to What is this a CI for? In the log scale, healthy people eat between and of saturated fat than women who developed breast cancer, with 95% confidence.