Summer School in Statistics for Astronomers V June 1-6, 2009 Robustness, Nonparametrics and some Inconvenient Truths Tom Hettmansperger Dept. of Statistics Penn State University

Least squares, t-tests and F-tests ↔ robust methods, rank tests, nonparametrics

Some ideas we will explore:
Robustness
Nonparametric bootstrap
Nonparametric density estimation
Nonparametric rank tests
Tests for (non-)normality
The goal: to make you worry, or at least think critically, about statistical analyses.

Abstract: population distribution, model
Real world: data
Probability and expectation connect the model to the data; statistical inference goes from the data back to the model.

Research hypothesis or question in English
Statistical model, population distribution; measurement, experimental design, data collection
Translate the research hypothesis or question into a statement in terms of the model parameters
Select a relevant statistic
Carry out statistical inference: sampling distributions, p-values, significance levels, confidence coefficients
Graphical displays, model criticism
State conclusions and recommendations in English

Parameters in a population or model:
Typical values: mean, median, mode
Spread: variance (standard deviation), interquartile range (IQR)
Outliers
Shape: probability density function (pdf), cumulative distribution function (cdf)

NGC 4382 (n = 59)
Research question: How large are the luminosities in NGC 4382?
Measurement: luminosity (data below).
Traditional model: a normal distribution of luminosity.
Translated research question: What is the mean luminosity of the population? (Here we use the mean to represent the typical value.)
The relevant statistic is the sample mean.
Statistical inference: a 95% confidence interval for the mean, using a normal approximation to the sampling distribution of the mean.

[Minitab descriptive statistics for the NGC 4382 luminosities, original and with the outlier removed or altered: Variable, N, N*, Mean, SE Mean, StDev, Minimum, Q1, Median, Q3.]

First Inconvenient Truth: Outliers can have arbitrarily large impact on the sample mean, sample standard deviation, and sample variance. A single outlier can increase the width of the t-confidence interval and inflate the margin of error for the sample mean. Inference can be adversely affected.

Second Inconvenient Truth: It is bad for a small portion of the data to dictate the results of a statistical analysis.
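As a quick illustration of both points, here is a small Python sketch with a made-up sample (not the NGC 4382 data): one wild value drags the mean and standard deviation but barely moves the median.

```python
import numpy as np

rng = np.random.default_rng(0)
clean = rng.normal(loc=10.0, scale=1.0, size=59)   # hypothetical sample, not the NGC 4382 data
contaminated = np.append(clean, 1000.0)            # a single wild outlier added

for label, data in [("clean", clean), ("with outlier", contaminated)]:
    print(f"{label:>13}: mean = {np.mean(data):8.2f}, "
          f"median = {np.median(data):6.2f}, sd = {np.std(data, ddof=1):8.2f}")
# The mean and sd are dragged far away by the single outlier; the median barely moves.
```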

Third Very Inconvenient Truth: The construction of a 95% confidence interval for the population variance is very sensitive to the shape of the underlying model distribution. The standard interval computed in most statistical packages assumes the model distribution is normal. If this assumption is wrong, the true confidence coefficient can differ substantially from 95%. I am not aware of a stable 95% confidence interval for the population variance.

The ever hopeful statisticians

Robustness: structural and distributional.
Structural: We would like an estimator and a test statistic that are not overly sensitive to small portions of the data.
Influence or sensitivity curve: the rate of change in a statistic as an outlier is varied.
Breakdown: the smallest fraction of the data that must be altered to carry the statistic beyond any preset bound.
We want bounded influence and high breakdown.

Distributional robustness: We want a sampling distribution for the test statistic that is not sensitive to changes or misspecifications in the model or population distribution. This type of robustness provides stable p-values for testing and stable confidence coefficients for confidence intervals.

Message: The sample mean is not structurally robust, whereas the median is structurally robust. It takes only one observation to move the sample mean anywhere; it takes roughly 50% of the data to move the median (breakdown).

Sensitivity curves, for a sample of size n = 2r with order statistics x_(1) < … < x_(n):
SC_mean(x) = x
SC_median(x) = (n+1) x_(r) if x < x_(r); (n+1) x if x_(r) < x < x_(r+1); (n+1) x_(r+1) if x_(r+1) < x.
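The sensitivity curve can also be computed numerically from its general definition, SC_n(x) = (n+1)[T(x_1, …, x_n, x) − T(x_1, …, x_n)]. A minimal Python sketch with a made-up sample (numpy assumed available):

```python
import numpy as np

def sensitivity_curve(stat, sample, grid):
    """SC_n(x) = (n + 1) * [stat(sample with x appended) - stat(sample)], over a grid of x."""
    n = len(sample)
    base = stat(sample)
    return np.array([(n + 1) * (stat(np.append(sample, x)) - base) for x in grid])

rng = np.random.default_rng(1)
sample = rng.normal(size=20)              # hypothetical sample with n = 2r = 20
grid = np.linspace(-10.0, 10.0, 201)

sc_mean = sensitivity_curve(np.mean, sample, grid)      # grows linearly, unbounded
sc_median = sensitivity_curve(np.median, sample, grid)  # flat outside (x_(r), x_(r+1))

print("SC_mean range  :", sc_mean.min(), sc_mean.max())
print("SC_median range:", sc_median.min(), sc_median.max())
```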

[Plot: influence (sensitivity curve) versus x for the mean and the median.] The mean has linear, unbounded influence; the median has bounded influence.

Some good news: The sampling distribution of the sample mean depends only mildly on the population or model distribution (a Central Limit Theorem effect). Provided our data come from a model with finite variance, for large sample sizes √n (x̄ − μ)/s has an approximate standard normal distribution (mean 0 and variance 1). This means that the sample mean enjoys distributional robustness, at least approximately. We say that the sample mean is asymptotically nonparametric.

More inconvenient truth: the sample variance is not structurally robust (unbounded sensitivity and breakdown tending to 0), and it also lacks distributional robustness. Again, from the Central Limit Theorem: provided our data come from a model with finite fourth moment, for large sample sizes √n (s² − σ²)/σ² has an approximate normal distribution with mean 0 and variance κ − 1, where κ = E(X − μ)⁴/σ⁴ is called the kurtosis.

The kurtosis κ is a measure of the tail weight of a model distribution. It is independent of location and scale and has value 3 for any normal model. [Table: κ versus the approximate true confidence coefficient, assuming a nominal 95% confidence interval for the variance.]

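The effect behind this table is easy to reproduce by simulation. The following Python sketch (my own illustration of the idea, not the table's actual entries; numpy and scipy assumed available) estimates the true coverage of the usual chi-square interval for the variance under a normal model and under a heavier-tailed t model:

```python
import numpy as np
from scipy import stats

def normal_theory_var_ci(x, level=0.95):
    """Chi-square interval for the population variance; valid only under normality."""
    n = len(x)
    s2 = np.var(x, ddof=1)
    a = (1 - level) / 2
    return ((n - 1) * s2 / stats.chi2.ppf(1 - a, n - 1),
            (n - 1) * s2 / stats.chi2.ppf(a, n - 1))

def coverage(sampler, true_var, n=100, reps=5000, seed=0):
    """Fraction of simulated samples whose interval covers the true variance."""
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(reps):
        lo, hi = normal_theory_var_ci(sampler(rng, n))
        hits += (lo <= true_var <= hi)
    return hits / reps

# Normal model (kurtosis 3): coverage is close to the nominal 0.95.
print(coverage(lambda rng, n: rng.normal(size=n), true_var=1.0))
# t with 5 df (kurtosis 9): coverage falls well below 0.95.
print(coverage(lambda rng, n: rng.standard_t(5, size=n), true_var=5 / 3))
```
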
A very inconvenient truth: A test for normality will also mislead you!!

Some questions:
1. If statistical methodology based on sample means and sample variances is not robust, what can we do? Are you concerned about the last least squares analysis you carried out (t-tests and F-tests)? If not, you should be!
2. What if we simply want to replace the mean by the median as the typical value? The sample median is robust, at least structurally. What about its distribution?
3. The mean and the t-test go together. What test goes with the median?

We know that the sample median is approximately normally distributed. How do we find SE(median) and estimate it? Two ways:
1. Nonparametric bootstrap (computational).
2. Estimate the standard deviation of the approximating normal distribution (theoretical).

Result for NGC 4382: SE(median) = 0.028 (0.027 without the outlier, 0.028 with the outlier = 24).

Nonparametric bootstrap:
1. Draw a sample of size 59 from the original NGC 4382 data, sampling with replacement.
2. Compute and store the sample median.
3. Repeat B times. (I generally take B = 4999.)
4. The histogram of the B medians is an estimate of the sampling distribution of the sample median.
5. Compute the standard deviation of the B medians. This is the approximate SE of the sample median.
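A minimal Python sketch of steps 1-5 (the array name ngc4382 below is a placeholder for the actual luminosity data):

```python
import numpy as np

def bootstrap_se_median(data, B=4999, seed=0):
    """Steps 1-5 above: resample with replacement, collect B medians, report their SD."""
    rng = np.random.default_rng(seed)
    data = np.asarray(data)
    medians = np.empty(B)
    for b in range(B):
        resample = rng.choice(data, size=len(data), replace=True)  # step 1
        medians[b] = np.median(resample)                           # step 2
    return medians.std(ddof=1), medians   # SE estimate, plus the B medians for a histogram

# Usage (with the NGC 4382 luminosities loaded into an array, e.g. ngc4382):
# se, boot_medians = bootstrap_se_median(ngc4382)
```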

Theoretical (mathematical statistics), moderately difficult: Let M denote the sample median, and let θ denote the model median with density f(x). Provided the density (pdf) of the model distribution is not 0 at the model median, √n (M − θ) has an approximate normal distribution with mean 0 and variance 1/[4 f²(θ)]. In other words, SE(median) ≈ 1/[2 √n f(θ)], and we must estimate the value of the density at the population median.

Nonparametric density estimation: Let f(x) denote a pdf. Based on a sample of size n we wish to estimate f(x_0), where x_0 is given. Define
f̂(x_0) = (1/(n h)) Σ_i K((x_0 − X_i)/h),
where K(t) is called the kernel and h > 0 is the bandwidth.

Then a bit of calculation yields the approximate bias:
E f̂(x_0) − f(x_0) ≈ (h²/2) f''(x_0) ∫ t² K(t) dt.
And a bit more gives the approximate variance:
Var f̂(x_0) ≈ f(x_0) ∫ K²(t) dt / (n h).
And so we want to choose h to balance the two, minimizing the mean squared error.

The density estimate does not depend much on K(t), the kernel, but it depends strongly on h, the bandwidth. We often choose a Gaussian (normal) kernel, K(t) = (1/√(2π)) exp(−t²/2). Next we differentiate the integrated mean squared error and set it equal to 0 to find the optimal bandwidth (independent of x_0). If we choose the Gaussian kernel and if f is normal, then h_opt = 1.06 σ n^(−1/5).
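A short Python sketch of the estimator with a Gaussian kernel and the normal-reference bandwidth h = 1.06 σ̂ n^(−1/5), evaluated at a single point x_0 (made-up data):

```python
import numpy as np

def kde(x0, data, h=None):
    """Gaussian-kernel density estimate f_hat(x0).
    If h is not given, use the normal-reference bandwidth h = 1.06 * sigma_hat * n**(-1/5)."""
    data = np.asarray(data)
    n = len(data)
    if h is None:
        h = 1.06 * data.std(ddof=1) * n ** (-1 / 5)
    t = (x0 - data) / h
    K = np.exp(-0.5 * t ** 2) / np.sqrt(2.0 * np.pi)   # Gaussian kernel K(t)
    return K.sum() / (n * h)

# Example: estimate the density at the sample median of a made-up sample of size 59.
rng = np.random.default_rng(2)
x = rng.normal(size=59)
print(kde(np.median(x), x))
```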

Recall, SE(median) ≈ 1/[2 √n f̂(M)], with the density estimate evaluated at the sample median M. For NGC 4382: n = 59. For comparison, the bootstrap result for NGC 4382 was SE(median) = 0.028 (a finite-sample approximation). Final note: both the bootstrap and the density estimate are robust.

The median and the sign test (for testing H0: θ = 0) are related through the L1 norm. To test H0: θ = 0 we use S⁺ = #{X_i > 0}, which has a null binomial sampling distribution with parameters n and 0.5. This test is nonparametric and very robust.
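A minimal Python sketch of the two-sided sign test based on S⁺ and its Binomial(n, 0.5) null distribution (made-up data; scipy assumed available for the binomial tail probabilities):

```python
import numpy as np
from scipy import stats

def sign_test(x, theta0=0.0):
    """Two-sided sign test of H0: theta = theta0, based on S+ = #{X_i > theta0}."""
    x = np.asarray(x)
    x = x[x != theta0]                  # drop observations exactly equal to theta0
    n = len(x)
    s_plus = int(np.sum(x > theta0))    # Binomial(n, 0.5) under H0
    p = 2 * min(stats.binom.cdf(s_plus, n, 0.5),
                stats.binom.sf(s_plus - 1, n, 0.5))
    return s_plus, min(p, 1.0)

# Made-up example: a sample shifted slightly away from 0.
rng = np.random.default_rng(3)
print(sign_test(rng.normal(loc=0.3, size=40)))
```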

Research Hypothesis: NGC 4494 and NGC 4382 differ in luminosity.
Luminosity measurements (data): NGC 4494 (n = 101) …, NGC 4382 (n = 59) …
Statistical Model: two normal populations with possibly different means but with the same variance.
Translation: H0: μ_4494 = μ_4382 vs. HA: μ_4494 ≠ μ_4382.

Select a statistic: the two-sample t statistic. The two-sided t-test with significance level .05 rejects the null hypothesis when |t| > 2. Recall that means and variances are not robust. VERY STRANGE!

[Table: true values of the significance level for the two-sample t-test when the nominal level is .05, for ratios of variances 1/4, 1/1, 4/1 and ratios of sample sizes 1/4, 1/1, 3/1.]
Another inconvenient truth: the true significance level can differ from .05 when some model assumptions fail.

An even more inconvenient truth: these problems extend to analysis of variance and regression. Seek alternative tests and estimates. We already have alternatives to the mean and the t-test: the median and the sign test, which are robust.

We next consider nonparametric rank tests and estimates for comparing two samples. (These compete with the two-sample t-test and the difference in sample means.) Generally, suppose X_1, …, X_m is a sample from F(x) and Y_1, …, Y_n is a sample from F(x − Δ), a shift in location. To test H0: Δ = 0 or to estimate Δ we introduce the rank-sum statistic W = Σ_j R(Y_j), where R(Y_j) is the rank of Y_j in the combined data.

The robust estimate of Δ is the median of the pairwise differences, Δ̂ = med{ Y_j − X_i : all pairs (i, j) }. The median provides the robustness; the pairwise differences Y_j − X_i provide the comparison. This is as opposed to Ȳ − X̄, which is not robust.
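A short Python sketch computing both quantities, the rank-sum statistic W and the median of the pairwise differences, on made-up samples of the same sizes as the two galaxies (numpy and scipy assumed available):

```python
import numpy as np
from scipy import stats

def rank_sum_and_hl(x, y):
    """W = sum of the ranks of the y's in the combined sample, plus the
    estimate of Delta: the median of all pairwise differences y_j - x_i."""
    combined = np.concatenate([x, y])
    ranks = stats.rankdata(combined)            # mid-ranks in case of ties
    W = ranks[len(x):].sum()
    delta_hat = np.median(np.subtract.outer(y, x))
    return W, delta_hat

# Made-up samples of the same sizes as NGC 4494 (m = 101) and NGC 4382 (n = 59).
rng = np.random.default_rng(4)
x, y = rng.normal(size=101), rng.normal(loc=-0.25, size=59)
print(rank_sum_and_hl(x, y))
```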

Research Hypothesis: NGC 4494 and NGC 4382 differ in luminosity.
Luminosity measurements (data): NGC 4494 (n = 101) …, NGC 4382 (n = 59) …
Statistical Model: two normal populations with possibly different medians but with the same scale.
Translation: H0: Δ = 0 vs. HA: Δ ≠ 0.

Mann-Whitney Test and CI: NGC 4494, NGC 4382 (Minitab output)
[N and median for each sample; point estimate for Delta]
Percent CI for Delta: (-0.328, -0.182)
W = sum of ranks of NGC 4494
Test of Delta = 0 vs. Delta not equal 0 is significant (P-value)

Recall the two sample t-test is sensitive to the assumption of equal variances. The Mann-Whitney test is less sensitive to the assumption of equal scale parameters. The null distribution is nonparametric. It does not depend on the common underlying model distribution. It depends on the permutation principle: Under the null hypothesis, all (m+n)! permutations of the data are equally likely. This can be used to estimate the p-value of the test: sample the permutations, compute and store the MW statistics, then find the proportion greater than the observed MW.
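A Python sketch of this permutation estimate of the p-value. As a design choice on my part it uses a two-sided version (distance of W from its null mean) rather than the one-sided proportion described above:

```python
import numpy as np
from scipy import stats

def permutation_pvalue(x, y, n_perm=4999, seed=0):
    """Estimate the p-value of the rank-sum statistic by sampling permutations:
    the proportion of permuted statistics at least as extreme as the observed one."""
    rng = np.random.default_rng(seed)
    combined = np.concatenate([x, y])
    m = len(x)
    null_mean = len(y) * (len(combined) + 1) / 2.0     # E(W) under the null hypothesis

    def w_stat(data):
        return stats.rankdata(data)[m:].sum()          # sum of ranks of the second sample

    obs = abs(w_stat(combined) - null_mean)            # two-sided: distance from E(W)
    count = sum(abs(w_stat(rng.permutation(combined)) - null_mean) >= obs
                for _ in range(n_perm))
    return (count + 1) / (n_perm + 1)
```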

Here’s a bad idea: Test the data for normality using, perhaps, the Kolmogorov-Smirnov test. If the test accepts normality then use a t-test, and if it rejects normality then use a rank test. You can use the K-S test to reject normality. The inconvenient truth is that it may accept many possible models, some of which can be very disruptive to the t-test and sample means.

Absolute magnitudes of planetary nebulae in the Milky Way
Abs Mag (n = 81): …

But don’t be too quick to “accept” normality:

Null Hyp: the population distribution F(x) is normal.
The Kolmogorov-Smirnov statistic: D = sup_x |F̂_n(x) − F_0(x)|, where F̂_n is the empirical cdf and F_0 is the hypothesized normal cdf.
The Anderson-Darling statistic: A² = n ∫ [F̂_n(x) − F_0(x)]² / {F_0(x) [1 − F_0(x)]} dF_0(x), which gives more weight to the tails.
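A Python sketch of the Kolmogorov-Smirnov statistic against a normal cdf whose mean and standard deviation are estimated from the data (made-up data; scipy assumed available):

```python
import numpy as np
from scipy import stats

def ks_stat_normal(x):
    """D = sup_x |F_n(x) - F0(x)|, with F0 the normal cdf fitted by the sample mean and sd.
    Note: when the parameters are estimated, the standard KS critical values are conservative;
    scipy.stats.anderson(x, dist='norm') gives the Anderson-Darling version instead."""
    x = np.sort(np.asarray(x))
    n = len(x)
    f0 = stats.norm.cdf(x, loc=x.mean(), scale=x.std(ddof=1))
    d_plus = np.max(np.arange(1, n + 1) / n - f0)
    d_minus = np.max(f0 - np.arange(0, n) / n)
    return max(d_plus, d_minus)

rng = np.random.default_rng(5)
print(ks_stat_normal(rng.normal(size=81)))
```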

A Strategy:
Use robust statistical methods whenever possible.
If you must use traditional methods (sample means, t and F tests), then carry out a parallel analysis using robust methods and compare the results. Start to worry if they differ substantially.
Always explore your data with graphical displays.
Attach probability error statements whenever possible.

What more can we do robustly?
1. Multiple regression
2. Analysis of designed experiments (AOV)
3. Analysis of covariance
4. Multivariate analysis
These analyses can be carried out using the website:

There’s more: The rank based methods are 95% efficient relative to the least squares methods when the underlying model is normal. They may be much more efficient when the underlying model has heavier tails than a normal distribution. But time is up.

References:
1. Higgins (2004), Introduction to Modern Nonparametric Statistics.
2. Hollander and Wolfe (1999), Nonparametric Statistical Methods.
3. Arnold notes; Bendre notes.
4. Johnson, Morrell, and Schick (1992), Two-Sample Nonparametric Estimation and Confidence Intervals Under Truncation, Biometrics, 48.
5. Staudte and Sheather (1990), Robust Estimation and Testing.
6. Efron and Tibshirani (1993), An Introduction to the Bootstrap.
7. Wasserman (2006), All of Nonparametric Statistics.
8. Website:

Thank you for listening!