Published by Godwin Dorsey. Modified over 8 years ago.
1
Non-parametric Approaches: The Bootstrap
2
Non-parametric?
Non-parametric or distribution-free tests have more relaxed and/or different assumptions.
Properties:
- No assumption that the underlying distribution is normal
- More sensitive to medians than to means (which is good if you're interested in the median)
- Some are not much affected by outliers
- Many are rank tests
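To make the outlier point concrete, here is a small R sketch on made-up numbers (not from the slides): two samples that differ only in their largest value. The mean is pulled by the outlier, while the median and the ranks, which rank tests work with, stay the same.

```r
# Two hypothetical samples that differ only in their largest value
x_clean   <- c(2, 4, 5, 7, 9)
x_outlier <- c(2, 4, 5, 7, 90)   # last value replaced by an extreme outlier

mean(x_clean); mean(x_outlier)       # 5.4 vs 21.6: the mean is pulled by the outlier
median(x_clean); median(x_outlier)   # 5 vs 5: the median does not move
rank(x_clean); rank(x_outlier)       # identical ranks: 1 2 3 4 5
```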
3
Parametric vs. Non-parametric
- Parametric tests are typically used when their assumptions are met, as they will usually have more power, though non-parametric tests may be able to match that power under certain conditions
- Non-parametric tests are often used with small samples or when assumptions are violated
4
Some commonly used nonparametric analyses: Chi-Square
- Chi-square analysis involves categorical variables (frequency data) only
- Example: Party (Republican, Democrat) by Vote (Yes, No)
- In this case we cannot meet the assumptions common to our typical tests, but the goal is still to understand the relationship between the variables involved
- Chi-square analysis examines such relationships through the frequencies in each cell, and we can still get measures of the strength of the association (see effect size handout)
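A minimal R sketch of the Party by Vote example, using hypothetical counts (the slides give no data). It computes the Pearson chi-square statistic by hand from observed and expected cell frequencies, checks it against `chisq.test`, and reports phi as one measure of the strength of association for a 2 x 2 table.

```r
# Hypothetical 2 x 2 table of counts (not data from the slides)
votes <- matrix(c(30, 10,
                  15, 25),
                nrow = 2, byrow = TRUE,
                dimnames = list(Party = c("Republican", "Democrat"),
                                Vote  = c("Yes", "No")))

# Expected counts under independence: row total * column total / grand total
expected <- outer(rowSums(votes), colSums(votes)) / sum(votes)

# Pearson chi-square statistic, computed by hand from the cell frequencies
chi_sq <- sum((votes - expected)^2 / expected)

# Same statistic from chisq.test (continuity correction switched off to match)
res <- chisq.test(votes, correct = FALSE)

# Strength of association for a 2 x 2 table: phi (equal to Cramer's V here)
phi <- sqrt(chi_sq / sum(votes))
```

With these counts every expected frequency is 22.5 or 17.5, giving a chi-square of 80/7 (about 11.43) and phi of about 0.38.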
5
Common Rank tests
- Wilcoxon tests (rank-sum for independent samples, signed-rank T for dependent samples); Mann-Whitney U
- Kruskal-Wallis (independent groups) and Friedman (repeated measures) for more than 2 groups
Basic procedure:
- Rank the DV and get the sum of the ranks for each group
- Construct a test statistic based on the ranked data
Advantages:
- Normality not necessary
- Insensitive to outliers
Disadvantages:
- Ranked data are not in the original units and therefore may be less interpretable
- May lack power, particularly when parametric assumptions hold
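The basic procedure can be sketched in R for two independent groups. The scores are hypothetical and chosen without ties; the sketch ranks all scores together, sums the ranks for group 1, and converts that sum to the Mann-Whitney U form, which is the statistic `wilcox.test` reports as W.

```r
# Hypothetical scores for two independent groups (no ties)
g1 <- c(12, 15, 9, 20, 17)
g2 <- c(8, 11, 6, 14, 10, 7)

# Step 1: rank all scores together, ignoring group membership
r <- rank(c(g1, g2))

# Step 2: sum the ranks belonging to group 1
R1 <- sum(r[seq_along(g1)])

# Step 3: test statistic: rank sum minus its smallest possible value (Mann-Whitney U)
n1 <- length(g1)
U <- R1 - n1 * (n1 + 1) / 2

# wilcox.test reports the same quantity, labelled W
res <- wilcox.test(g1, g2)
```

Here R1 is 41 and U is 26: of the 30 possible (g1, g2) pairs, 26 have the group 1 score higher.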
6
Transformation of data
- So if I don't like my data, I just change it?
- Think about what you're studying: Is depression a function of Likert scale questions? Is reaction time inherently related to learning?
- Tukey: "reexpressions"
- Our original numbers are already useful fictions, and if we think of them as such, transforming them into something else may not seem so far-fetched
7
Some common transformations
- Logarithmic: positively skewed data
- Square root: e.g. count data
- Reciprocal (1/x): when there are very extreme outliers
- Arcsine (of the square root): e.g. proportional data
- Other measures of location: e.g. the trimmed mean for heavy-tailed data
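Each transformation in the list is a one-liner in R. This sketch applies them to hypothetical values; note that the reciprocal reverses the order of the data, and that R's `trim` argument drops `floor(trim * n)` observations from each end before averaging.

```r
# Hypothetical positively skewed values (all > 0, so log and 1/x are defined)
x <- c(1, 2, 3, 4, 100)

log_x   <- log(x)     # logarithmic: positively skewed data
sqrt_x  <- sqrt(x)    # square root: e.g. count data
recip_x <- 1 / x      # reciprocal: very extreme outliers (reverses the order)

# Arcsine (square-root) transformation for proportions in [0, 1]
p <- c(0.10, 0.50, 0.90)
asin_p <- asin(sqrt(p))

# Other measure of location: 20% trimmed mean drops floor(0.2 * 5) = 1 value from each end
tm <- mean(x, trim = 0.2)   # mean of 2, 3, 4 = 3
```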
8
When to transform?
- Not something to do straight away at any little sign of trouble
- Even if your groups are skewed in a similar manner, parametric tests may still hold up
- "Shop around": try different transformations to see which one works best for your problem in terms of the distribution of values (but not just to get a significant p-value)
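One way to "shop around" is to compare how symmetric each candidate makes the distribution, without looking at any test result. This R sketch uses simulated lognormal data and a hypothetical `skewness` helper (the third standardized moment); both are assumptions for illustration.

```r
# Hypothetical helper: sample skewness (third standardized moment)
skewness <- function(v) {
  m <- mean(v)
  mean((v - m)^3) / mean((v - m)^2)^1.5
}

set.seed(42)
x <- rlnorm(500, meanlog = 1, sdlog = 0.8)   # simulated positively skewed data

# Compare the distribution of values under each candidate transformation
candidates <- list(raw = x, sqrt = sqrt(x), log = log(x), reciprocal = 1 / x)
s <- sapply(candidates, skewness)
s
```

For lognormal data the log transformation should bring the skewness close to zero; with real data the best choice is an empirical question.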
9
Note
- Transformations will not necessarily solve problems with outliers
- Also, if inferences are based on, e.g., the mean of the transformed data, we cannot simply transform the values back to the original scale and act as though the inferences still hold (e.g. for μ)
- In the end, we'd rather keep our data in the original units, so transformations should be a last resort
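A small R illustration of why back-transforming is not innocent, on hypothetical values: the mean of the logged data, transformed back with `exp`, is the geometric mean, not the arithmetic mean, so an inference about it is not an inference about μ in the original units.

```r
x <- c(1, 10, 100)   # hypothetical values

# Mean on the log scale, transformed back to the original units
back_transformed <- exp(mean(log(x)))   # 10: this is the geometric mean

# Arithmetic mean in the original units
arith <- mean(x)                        # 37
```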
10
More recent developments: The Bootstrap
- The basic idea involves sampling with replacement from the sample data to produce random samples of size n
- Each of these samples provides an estimate of the parameter of interest
- Repeating the sampling a large number of times provides information on the variability of the estimate, i.e. its standard error, which is necessary for any inferential test
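The steps above reduce to a few lines of R. This is a minimal sketch: `boot_se` is a hypothetical helper name, and the data are simulated rather than the TV data. It resamples n values with replacement B times, recomputes the statistic each time, and takes the standard deviation of those estimates as the standard error.

```r
# Hypothetical helper: bootstrap standard error of any statistic
boot_se <- function(x, statistic, B = 2000) {
  # each replicate: resample n values with replacement, recompute the statistic
  estimates <- replicate(B, statistic(sample(x, size = length(x), replace = TRUE)))
  sd(estimates)   # variability of the estimates = standard error
}

set.seed(1)
x <- rexp(50, rate = 1 / 4)   # simulated data, not the TV data from the slides

boot_se(x, mean)      # close to the textbook sd(x) / sqrt(n) for the mean
boot_se(x, median)    # works equally well where no simple formula is at hand
```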
11
TV Example
How many hours of TV were watched yesterday?
12
[Figure: Bootstrap with 1000 samples; distribution of the means of each sample. Mean = 3.951]
13
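How? Two Examples

# basic R: a for loop over 1000 bootstrap samples
boot <- numeric(1000)
for (i in seq_along(boot)) boot[i] <- mean(sample(TVdata, replace = TRUE))
mean(boot)                                # mean of the 1000 bootstrap means
sd(boot)                                  # bootstrap standard error of the mean
hist(boot, col = "blue", border = "red")

# using the bootstrap package
library(bootstrap)
bootmean <- bootstrap(TVdata, nboot = 1000, theta = mean)
mean(bootmean$thetastar)                  # thetastar holds the 1000 bootstrap estimates
hist(bootmean$thetastar)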
14
Bootstrap
Hypothetical situation: If we cannot assume normality, how would we go about getting a confidence interval for a particular statistic? How would you get a confidence interval for robust measures and other statistics?
Solution: Resample (with replacement) from our own data, based on its distribution
- Treat our sample distribution as a population distribution and take random samples from it
- So, instead of assuming a sampling distribution of a particular shape and size, we have created it ourselves and derived our interval estimate from it
- From this we can create confidence intervals and perform other inferential procedures
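For a robust measure such as the 20% trimmed mean, the idea above gives a confidence interval with no normality assumption. This R sketch uses the simple percentile method (one of several bootstrap interval methods) on simulated data; the sample size, B and trimming level are assumptions.

```r
set.seed(2)
x <- rexp(40, rate = 1 / 4)   # simulated skewed data (hypothetical)

B <- 2000
# Treat the sample as the population: resample it B times, recompute the 20% trimmed mean
boot_tm <- replicate(B, mean(sample(x, replace = TRUE), trim = 0.2))

# Percentile interval: the 2.5th and 97.5th percentiles of the bootstrap distribution
ci <- quantile(boot_tm, probs = c(0.025, 0.975))
ci
```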
15
Hypothesis Testing
Comparing independent groups
- Step 1: compute the bootstrap mean and bootstrap SD as before, but for each group
- Each time you do so, calculate T*
- This creates your own t distribution
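The slides do not give the formula for T*. One common form (an assumption here: the usual bootstrap-t statistic with a Welch-type standard error) for bootstrap sample b, where starred quantities are the means and SDs of the resampled groups and unstarred ones come from the original data, is:

```latex
T^*_b = \frac{(\bar{X}^*_{1b} - \bar{X}^*_{2b}) - (\bar{X}_1 - \bar{X}_2)}
             {\sqrt{s^{*2}_{1b}/n_1 + s^{*2}_{2b}/n_2}},
\qquad b = 1, \dots, B
```

Subtracting the observed difference centres T* at zero, because the original sample plays the role of the population.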
16
Hypothesis Testing
- Use the quantiles of this distribution corresponding to your confidence level when computing your confidence interval on the difference between means, rather than the critical t value from the usual t distribution
- Note, however, that the T* values will not be the same for the upper and lower bounds, unless your bootstrap distribution is perfectly symmetrical, which is not likely to happen
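A minimal R sketch of the bootstrap-t interval for a difference between two independent means, assuming a Welch-type standard error (the slides do not specify one). `boot_t_ci` is a hypothetical helper and the groups are simulated. Each group is resampled separately, T* is computed for every bootstrap sample, and the two T* quantiles replace the single critical t value, which is why the interval need not be symmetric around the observed difference.

```r
# Hypothetical helper: bootstrap-t (percentile-t) interval for mu1 - mu2
boot_t_ci <- function(x1, x2, B = 2000, conf = 0.95) {
  n1 <- length(x1); n2 <- length(x2)
  d_hat  <- mean(x1) - mean(x2)                  # observed difference between means
  se_hat <- sqrt(var(x1) / n1 + var(x2) / n2)    # its Welch-type standard error

  t_star <- replicate(B, {
    b1 <- sample(x1, n1, replace = TRUE)         # resample each group separately
    b2 <- sample(x2, n2, replace = TRUE)
    se_b <- sqrt(var(b1) / n1 + var(b2) / n2)
    ((mean(b1) - mean(b2)) - d_hat) / se_b       # T* for this bootstrap sample
  })

  alpha <- 1 - conf
  q <- quantile(t_star, probs = c(alpha / 2, 1 - alpha / 2), names = FALSE)
  # Upper T* quantile sets the lower bound and vice versa; they differ unless T* is symmetric
  c(lower = d_hat - q[2] * se_hat, upper = d_hat - q[1] * se_hat)
}

set.seed(3)
g1 <- rexp(25, rate = 1 / 5)   # simulated skewed groups (hypothetical data)
g2 <- rexp(30, rate = 1 / 3)
boot_t_ci(g1, g2)
```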
17
So why use it?
Accuracy and control of the type I error rate
- Most of the problems associated with both accuracy and maintenance of the type I error rate are reduced by bootstrap methods compared to Student's t
- Wilcox goes further, suggesting that there may in fact be very few situations, if any, in which the traditional approach offers any advantage over the bootstrap approach
- The problem of outliers, and the basic statistical properties of means and variances, remain, however