
Non-parametric Approaches The Bootstrap

Non-parametric?
Non-parametric or distribution-free tests have more relaxed and/or different assumptions. Properties:
- No assumption that the underlying distribution is normal
- More sensitive to medians than means (which is good if you're interested in the median)
- Some are not much affected by outliers
- Many are rank-based tests

Parametric vs. Non-parametric
- Parametric tests are typically used when their assumptions are met, as they will usually have more power
- Though non-parametric tests can match that power under certain conditions
- Non-parametric tests are often used with small samples or when assumptions are violated

Some commonly used nonparametric analyses: Chi-square
- Chi-square analysis involves categorical variables/frequency data only
- Example: Party (Republican/Democrat) by Vote (Yes/No)
- In this case we cannot meet the assumptions common to our typical tests, but the goal is still to understand the relationship between the variables involved
- Chi-square analysis examines such relationships in terms of the frequencies in the cells of the table, and we can still get measures of the strength of the association (see the effect size handout); a sketch follows
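As an illustration, here is a minimal sketch of a chi-square test of independence in R; the Party-by-Vote counts below are made up for the example:

# Made-up 2x2 table of counts: Party by Vote
votes <- matrix(c(28, 12,    # Republican: Yes, No
                  15, 25),   # Democrat:   Yes, No
                nrow = 2, byrow = TRUE,
                dimnames = list(Party = c("Republican", "Democrat"),
                                Vote  = c("Yes", "No")))
result <- chisq.test(votes, correct = FALSE)  # test of independence
result
sqrt(result$statistic / sum(votes))           # phi coefficient as an effect size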

Common rank tests
- Wilcoxon tests for independent (rank-sum) and dependent (signed-rank) samples; Mann-Whitney U
- Kruskal-Wallis and Friedman tests for more than two groups
Basic procedure:
- Rank the DV and get the sums of the ranks for the groups
- Construct a test statistic based on the ranked data
Advantages:
- Normality not necessary
- Insensitive to outliers
Disadvantages:
- Ranked data are not in the original units and so may be less interpretable
- May lack power, particularly when parametric assumptions hold
(see the sketch below)
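A minimal sketch of these rank tests using R's built-in functions, with made-up data:

set.seed(1)
x <- rexp(20)        # made-up skewed sample 1
y <- rexp(20) + 0.5  # made-up skewed sample 2

wilcox.test(x, y)                 # rank-sum / Mann-Whitney U (independent samples)
wilcox.test(x, y, paired = TRUE)  # signed-rank test (dependent samples)

z <- rexp(20)                     # a third made-up group
g <- factor(rep(1:3, each = 20))  # group labels
kruskal.test(c(x, y, z) ~ g)      # Kruskal-Wallis for more than two groups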

Transformation of data
- So if I don't like my data I just change it?
- Think about what you're studying: Is depression a function of Likert-scale questions? Is reaction time inherently related to learning?
- Tukey called transformations "re-expressions": our original numbers are already useful fictions, and if we think of them as such, transforming them into something else may not seem so far-fetched

Some common transformations
- Logarithmic: positively skewed data
- Square root: count data
- Reciprocal (1/x): when there are very extreme outliers
- Arcsine: proportion data
- Other measures of location (e.g., the trimmed mean): heavy-tailed data
(a sketch follows)
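A minimal sketch of applying these transformations in R, with made-up data:

set.seed(1)
skewed <- rexp(100, rate = 0.5)  # made-up positively skewed data

log_x   <- log(skewed)           # logarithmic: pulls in a long right tail
sqrt_x  <- sqrt(skewed)          # square root: often used for counts
recip_x <- 1 / skewed            # reciprocal: tames very extreme outliers

p <- runif(100)                  # made-up proportions
asin_p <- asin(sqrt(p))          # arcsine transformation for proportions

mean(skewed, trim = 0.2)         # 20% trimmed mean: a robust measure of location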

When to transform?
- Not something to do straight away at the first sign of trouble
- Even if your groups are skewed, if they are skewed in a similar manner parametric tests may still hold
- "Shop around": try different transformations to see whether one works better for your problem regarding the distribution of values (but not just to get a significant p-value)

Note
- Transformations will not necessarily solve problems with outliers
- Also, if inferences are based on, e.g., the mean of the transformed data, we cannot simply transform the values back to the original scale and act as though the inferences still hold (e.g., for μ)
- In the end we'd rather keep our data in the original units, so transformations should be a last resort

More recent developments: The Bootstrap
- The basic idea involves sampling with replacement from the sample data to produce random samples of size n
- Each of these samples provides an estimate of the parameter of interest
- Repeating the sampling a large number of times provides information on the variability of the estimate, i.e. its standard error, which is necessary for any inferential test

TV Example
How many hours of TV did you watch yesterday?

Bootstrap: 1000 samples
[Histogram: distribution of the means of the bootstrap samples; mean = 3.951]

How? Two examples

# Basic R: bootstrap the mean by hand
# TVdata is the vector of hours of TV watched from the example above
boot <- numeric(1000)                             # storage for 1000 bootstrap means
for (i in 1:1000)
  boot[i] <- mean(sample(TVdata, replace = TRUE)) # resample n values with replacement
mean(boot)                                        # mean of the bootstrap distribution
hist(boot, col = "blue", border = "red")          # visualize the bootstrap distribution

# Using the 'bootstrap' package
library(bootstrap)
bootmean <- bootstrap(TVdata, nboot = 1000, theta = mean)
mean(bootmean$thetastar)                          # thetastar holds the bootstrap means
hist(bootmean$thetastar)

Bootstrap
- Hypothetical situation: if we cannot assume normality, how would we go about getting a confidence interval for a particular statistic? How would we get a confidence interval for robust measures and other statistics?
- Solution: resample (with replacement) from our own data based on its distribution; treat our sample distribution as a population distribution and take random samples from it
- So instead of assuming a sampling distribution of a particular shape and size, we have created it ourselves and derived our interval estimate from it
- From this we can create confidence intervals and perform other inferential procedures (a sketch follows)
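For instance, a simple percentile interval just reads the desired quantiles off the bootstrap distribution. A minimal sketch, reusing the boot vector of bootstrap means from the code above:

# 95% percentile bootstrap confidence interval for the mean:
# the 2.5th and 97.5th percentiles of the bootstrap means
quantile(boot, probs = c(0.025, 0.975))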

Hypothesis testing: comparing independent groups
- Step 1: compute the bootstrap mean and bootstrap sd as before, but for each group
- Each time you do so, calculate T*
- This creates your own t distribution

Hypothesis testing
- Use the quantile points corresponding to your confidence level from that distribution when computing your confidence interval for the difference between means, rather than the critical t value from the usual tables
- Note, however, that your T* will not be the same for the upper and lower bounds, unless your bootstrap distribution is perfectly symmetric, which is not likely to happen
(see the sketch below)
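A minimal sketch of this bootstrap-t procedure for two independent groups; group1 and group2 are made-up data, and T* is taken to be the usual studentized difference, centered on the observed difference:

set.seed(1)
group1 <- rexp(25, rate = 0.25)  # made-up skewed data
group2 <- rexp(30, rate = 0.30)

diff_obs <- mean(group1) - mean(group2)
se <- function(x) sd(x) / sqrt(length(x))

Tstar <- replicate(2000, {
  b1 <- sample(group1, replace = TRUE)  # resample each group separately
  b2 <- sample(group2, replace = TRUE)
  ((mean(b1) - mean(b2)) - diff_obs) / sqrt(se(b1)^2 + se(b2)^2)
})

# Asymmetric quantiles of T* replace the tabled critical t value
q <- quantile(Tstar, probs = c(0.025, 0.975))
se_obs <- sqrt(se(group1)^2 + se(group2)^2)
c(diff_obs - q[2] * se_obs, diff_obs - q[1] * se_obs)  # bootstrap-t 95% CI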

So why use it?
- Accuracy and control of the Type I error rate: most of the problems associated with both accuracy and maintenance of the Type I error rate are reduced using bootstrap methods compared with Student's t
- Wilcox goes further, suggesting that there may in fact be very few situations, if any, in which the traditional approach offers any advantage over the bootstrap approach
- The problems of outliers and of the basic statistical properties of means and variances remain, however