Is cross-fertilization good or bad? An analysis of Darwin's Zea Mays data, by Jamie Chatman and Charlotte Hsieh

Outline
- Short biography of Charles Darwin and Ronald Fisher
- Description of the Zea Mays data
- Analysis of the data
  - Parametric tests (t-test, confidence intervals)
  - Nonparametric test (Wilcoxon signed rank)
  - Bootstrap tests
- Conclusion

Short Biography of Charles Darwin
- Darwin was born in 1809 in Shrewsbury, England.
- At 16 he went to Edinburgh University to study medicine, but did not finish.
- He then went to Cambridge University, where he received his degree while studying to become a clergyman.
- Darwin worked as an unpaid naturalist on a five-year scientific expedition to South America.
- Darwin's research led to his book, On the Origin of Species by Means of Natural Selection, published in 1859.

Short Biography of Ronald Fisher
- Fisher was born in East Finchley, London, in 1890.
- Fisher went to Cambridge University and received a degree in mathematics.
- Fisher made many discoveries in statistics, including maximum likelihood, analysis of variance, and sufficiency, and was a pioneer of the design of experiments.

Darwin’s Zea Mays Data

Hypothesis
- Null hypothesis H0: there is no difference in stalk height between the cross-fertilized and self-fertilized plants.
- Alternative hypotheses:
  - HA: cross-fertilized stalk heights are not equal to self-fertilized stalk heights (two-sided).
  - HA: cross-fertilization leads to increased stalk height (one-sided).

Galton's Approach to the Data
[Tables: the original data (crossed and self-fertilized heights, Pots I-IV) and Galton's rearrangement into paired differences; the numerical entries were not preserved in this transcript.]

Parametric Test
Fisher assumed that the stalk heights were normally distributed:
- Crossed: X ~ N(μ_X, σ_X²)
- Self-fertilized: Y ~ N(μ_Y, σ_Y²)
- Difference: d = X − Y ~ N(μ_d, σ_d²)
A paired t-test on the 15 differences (d.f. = 14) gives a p-value below 0.05, so the null hypothesis is rejected at the 0.05 level. A sketch of this test appears below.
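Below is a minimal sketch of this paired t-test in Python (not from the original slides). The 15 paired differences, in inches, are the values commonly quoted from Fisher (1935); since the transcript's data table did not survive, treat them as an assumption to verify against the original.

```python
import numpy as np
from scipy import stats

# Paired differences (crossed minus self-fertilized), in inches, as commonly
# quoted from Fisher (1935); verify against the original data table.
d = np.array([6.125, -8.375, 1.0, 2.0, 0.75, 2.875, 3.5, 5.125,
              1.75, 3.625, 7.0, 3.0, 9.375, 7.5, -6.0])

n = len(d)                                          # 15 pairs -> 14 d.f.
t_stat = d.mean() / (d.std(ddof=1) / np.sqrt(n))    # one-sample t on the differences
p_value = 2 * stats.t.sf(abs(t_stat), df=n - 1)     # two-sided p-value
print(f"t = {t_stat:.3f}, d.f. = {n - 1}, p = {p_value:.4f}")

# 95% confidence interval for the mean difference
half_width = stats.t.ppf(0.975, df=n - 1) * d.std(ddof=1) / np.sqrt(n)
print(f"95% CI: ({d.mean() - half_width:.4f}, {d.mean() + half_width:.4f})")
```

With these values the computed interval agrees, up to rounding, with the t-interval (0.0036, 5.230) quoted later in the slides, which is a useful consistency check.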

Parametric Test
The 95% confidence interval for the mean difference does not contain zero, so the null hypothesis that the mean difference is zero (that is, that the two means are equal) is rejected.

Fisher's Nonparametric Approach
If H0 is true and the heights of the crossed and self-fertilized plants are equal, then within each pair there is an equal chance that either plant was the crossed or the self-fertilized one.
- Considering all possible swaps within each pair gives 2^15 = 32,768 possibilities.
- The observed sum of the differences is 39.25.
- Only 863 of the 32,768 cases have a sum of differences at least as great as 39.25 (a proportion of about 0.026), so the null hypothesis is rejected at the 0.05 level. A sketch of this enumeration follows.
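A sketch of Fisher's exact enumeration (my code, not the authors'), again using the commonly quoted differences: flip the sign of each difference in every possible way and count how often the resulting sum is at least as large as the observed 39.25.

```python
import itertools
import numpy as np

# Paired differences in inches, as commonly quoted from Fisher (1935)
d = np.array([6.125, -8.375, 1.0, 2.0, 0.75, 2.875, 3.5, 5.125,
              1.75, 3.625, 7.0, 3.0, 9.375, 7.5, -6.0])
observed = d.sum()                       # 39.25 inches

# All 2**15 = 32,768 ways of flipping the sign of each pairwise difference
count = 0
for signs in itertools.product((1, -1), repeat=len(d)):
    if float(np.dot(signs, d)) >= observed:
        count += 1

print(count, count / 2 ** len(d))        # cases with sum >= 39.25, one-sided proportion
```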

Fisher's Nonparametric Approach
- The results of the nonparametric test agreed with the results of the t-test, and Fisher was happy with this.
- However, Fisher believed that dropping the assumption of normality would make the nonparametric test less powerful than the t-test: "[Nonparametric tests] assume less knowledge, or more ignorance, of the experimental material than does the standard test…"
- We disagree.

Nonparametric Test: Wilcoxon Signed Rank Test
[Table: the 15 paired differences and their signed ranks; the numerical entries were not preserved in this transcript.]

Nonparametric Test: Wilcoxon Signed Rank Test
When n is large, the signed-rank statistic W is approximately N(0, Var(W)) under H0. The resulting p-value is below 0.05, so we reject the null hypothesis. A sketch using SciPy follows.
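A sketch of the Wilcoxon signed-rank test via SciPy (not from the slides), using the same commonly quoted differences as above.

```python
import numpy as np
from scipy import stats

# Paired differences in inches, as commonly quoted from Fisher (1935)
d = np.array([6.125, -8.375, 1.0, 2.0, 0.75, 2.875, 3.5, 5.125,
              1.75, 3.625, 7.0, 3.0, 9.375, 7.5, -6.0])

# Two-sided Wilcoxon signed-rank test on the paired differences; SciPy picks
# an exact or normal-approximation p-value depending on the sample size.
stat, p = stats.wilcoxon(d)
print(f"W = {stat}, p = {p:.4f}")
```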

Bootstrap Methods
- Introduced by Bradley Efron (1979), 44 years after Fisher's analysis.
- "If statistics had evolved at a time when computers existed, it wouldn't be what it is today." (Efron)
- Uses repeated resamples of the data.
- Allows computer sampling approaches that are asymptotically equivalent to tests whose exact significance levels require complicated manipulations.
- A sampling (simulation) approximation to Fisher's nonparametric approach.
- The data "pull themselves up by their own bootstraps" by generating new data sets through which their reliability can be determined.

Bootstrap: Random Sign Change
If H0 is true, there is an equal chance that the plants in each pair are cross-fertilized or self-fertilized.
Method (see the sketch below):
1. Randomly swap the cross- and self-fertilized labels within each pair (equivalently, randomly flip the sign of each difference).
2. Compute the sum of the differences.
3. Repeat 5,000 times.
4. Plot a histogram of the summed differences.
5. Count how many summed differences exceed 39.25.
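A sketch of the random sign-change resampling described above (my own code, not the authors'); the differences carry the same caveat as before.

```python
import numpy as np

rng = np.random.default_rng(2024)        # arbitrary seed for reproducibility

# Paired differences in inches, as commonly quoted from Fisher (1935)
d = np.array([6.125, -8.375, 1.0, 2.0, 0.75, 2.875, 3.5, 5.125,
              1.75, 3.625, 7.0, 3.0, 9.375, 7.5, -6.0])
observed = d.sum()                       # 39.25

B = 5000
signs = rng.choice([1, -1], size=(B, len(d)))   # random swap within each pair
sums = (signs * d).sum(axis=1)                  # 5,000 resampled summed differences

count = int((sums > observed).sum())
print(count, 2 * count / B)              # count above 39.25 and two-sided p-value
```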

Bootstrap: Random Sign Change Results
- 124 of the 5,000 summed differences exceed 39.25.
- The two-sided p-value is 2 × (124/5000) = 0.0496.
- Compare this to the exact combinatorial p-value of 2 × (863/32,768) ≈ 0.0527.

Bootstrap: Resample Within Pots
Experimenters tend to present data in a way that yields significant results. In order to be sure that the pairings in each pot are random, we can resample within pots, assuming equality of heights within each pot.
Method (see the sketch below):
1. Sample 3 crossed plants in pot 1 with replacement.
2. Sample 3 self-fertilized plants in pot 1 with replacement.
3. Repeat for pots 2-4.
4. Compute the sum of the differences.
5. Repeat 5,000 times.
6. Plot a histogram of the summed differences.
7. Count how many summed differences are less than 0.
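A sketch of the within-pot resampling (mine, not the authors'). The heights below, in inches and grouped into Pots I-IV, are the values commonly quoted from Fisher (1935); the transcript's own table is missing, so treat both the values and the pot groupings as assumptions to verify. The code resamples as many plants as each pot actually holds (3, 3, 5, and 4 pairs) rather than a fixed 3.

```python
import numpy as np

rng = np.random.default_rng(2024)

# Plant heights in inches by pot (Pots I-IV), as commonly quoted from
# Fisher (1935); the transcript's data table is missing, so verify these.
crossed = [np.array([23.500, 12.000, 21.000]),
           np.array([22.000, 19.125, 21.500]),
           np.array([22.125, 20.375, 18.250, 21.625, 23.250]),
           np.array([21.000, 22.125, 23.000, 12.000])]
selfed  = [np.array([17.375, 20.375, 20.000]),
           np.array([20.000, 18.375, 18.625]),
           np.array([18.625, 15.250, 16.500, 18.000, 16.250]),
           np.array([18.000, 12.750, 15.500, 18.000])]

B = 5000
sums = np.empty(B)
for b in range(B):
    total = 0.0
    for c, s in zip(crossed, selfed):
        # resample crossed and self-fertilized heights within the pot, with replacement
        total += (rng.choice(c, size=len(c), replace=True)
                  - rng.choice(s, size=len(s), replace=True)).sum()
    sums[b] = total

count = int((sums < 0).sum())
print(count, 2 * count / B)     # resampled sums below zero and two-sided p-value
```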

Bootstrap: Resample Within Pots Results
- 27 of the 5,000 summed differences are less than 0.
- The two-sided p-value is 2 × (27/5000) = 0.0108.

Resampling-Based Sign Test
Disregard the size of each difference and look only at its sign. If H0 is true, the probability that any difference is positive (or negative) is 0.5, so a binomial approach applies: of the 15 pairs we would expect about half to have a positive difference and half a negative difference. We can count the number of positive differences in resampled sets of 15 pairs.
Method (see the sketch below):
1. Sample 3 crossed plants in pot 1 with replacement.
2. Sample 3 self-fertilized plants in pot 1 with replacement.
3. Repeat for pots 2-4.
4. Count the number of positive differences.
5. Repeat 5,000 times.
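The binomial logic above can also be checked directly with an exact binomial (sign) test on the observed signs. The sketch below is mine, not the slides' procedure; the slides' own version instead counts positive differences across within-pot resamples, which can be obtained by adapting the previous sketch.

```python
import numpy as np
from scipy import stats

# Paired differences in inches, as commonly quoted from Fisher (1935)
d = np.array([6.125, -8.375, 1.0, 2.0, 0.75, 2.875, 3.5, 5.125,
              1.75, 3.625, 7.0, 3.0, 9.375, 7.5, -6.0])

n_pos = int((d > 0).sum())           # 13 of the 15 differences are positive
# Under H0 each difference is positive with probability 0.5
result = stats.binomtest(n_pos, n=len(d), p=0.5, alternative="greater")
print(n_pos, result.pvalue)
```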

Resampling-Based Sign Test Results
- Almost every one of the 5,000 resamples gives more than 8 positive differences out of 15.
- Number of positive differences < 6: 0/5000.
- Number of positive differences < 8: 2/5000.
- The p-value is essentially 0.

Randomization Within Pots
Disregard the information about which plants were cross- or self-fertilized, and find the distribution of the summed differences by resampling from the pooled data.
Method (see the sketch below):
1. Pool the plants in pot 1.
2. Sample 3 plants from the pool with replacement and treat them as crossed.
3. Sample 3 plants from the pool with replacement and treat them as self-fertilized.
4. Repeat for pots 2-4.
5. Compute the sum of the differences.
6. Repeat 5,000 times.
7. Plot a histogram of the summed differences (this is the distribution under the null hypothesis).
8. Count how many summed differences exceed 39.25.
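A sketch of the pooled randomization (mine, not the authors'). It differs from the earlier within-pot resample only in that both the pseudo-crossed and pseudo-self-fertilized samples are drawn from the pooled plants of each pot; the same caveat about the assumed heights and pot groupings applies.

```python
import numpy as np

rng = np.random.default_rng(2024)

# Heights in inches by pot, as commonly quoted from Fisher (1935); verify.
crossed = [np.array([23.500, 12.000, 21.000]),
           np.array([22.000, 19.125, 21.500]),
           np.array([22.125, 20.375, 18.250, 21.625, 23.250]),
           np.array([21.000, 22.125, 23.000, 12.000])]
selfed  = [np.array([17.375, 20.375, 20.000]),
           np.array([20.000, 18.375, 18.625]),
           np.array([18.625, 15.250, 16.500, 18.000, 16.250]),
           np.array([18.000, 12.750, 15.500, 18.000])]

pools = [np.concatenate([c, s]) for c, s in zip(crossed, selfed)]  # ignore labels
observed = sum((c - s).sum() for c, s in zip(crossed, selfed))     # 39.25

B = 5000
sums = np.empty(B)
for b in range(B):
    total = 0.0
    for pool, c in zip(pools, crossed):
        # draw pseudo-"crossed" and pseudo-"self-fertilized" samples from the pool
        total += (rng.choice(pool, size=len(c), replace=True)
                  - rng.choice(pool, size=len(c), replace=True)).sum()
    sums[b] = total

count = int((sums > observed).sum())
print(count, 2 * count / B)     # resampled sums above 39.25 and two-sided p-value
```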

Randomization Within Pots Results
- 38 of the 5,000 summed differences exceed 39.25.
- The two-sided p-value is 2 × (38/5000) = 0.0152.

Resampling Approach to Confidence Intervals
Using Darwin's original differences (see the sketch below):
1. Sample 15 differences with replacement.
2. Compute the sum of the differences.
3. Repeat 5,000 times.
4. Plot a histogram of the summed differences.
5. Take the 125th and 4,875th ordered summed differences and divide by the sample size of 15.
This gives a 95% CI of (0.1749, 4.817), which is shorter than the t-interval (0.0036, 5.230).
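A sketch of this percentile interval (mine, not the authors'). With 5,000 resamples, the 125th and 4,875th ordered values correspond to the 2.5th and 97.5th percentiles.

```python
import numpy as np

rng = np.random.default_rng(2024)

# Paired differences in inches, as commonly quoted from Fisher (1935)
d = np.array([6.125, -8.375, 1.0, 2.0, 0.75, 2.875, 3.5, 5.125,
              1.75, 3.625, 7.0, 3.0, 9.375, 7.5, -6.0])

B = 5000
# Resample the 15 differences with replacement and keep the average difference
means = np.array([rng.choice(d, size=len(d), replace=True).mean()
                  for _ in range(B)])

lo, hi = np.percentile(means, [2.5, 97.5])   # ~ the 125th and 4,875th ordered values
print(f"bootstrap 95% CI: ({lo:.4f}, {hi:.4f})")
```

The exact endpoints depend on the random resamples, so they will only roughly match the slides' (0.1749, 4.817).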

Resampling Approach to Confidence Intervals
In the resampling approach, the interpretation is that "95% of the resampled average differences were between 0.1749 and 4.817." This is not equivalent to the t-procedure, whose interpretation is that "with probability 95%, the true value of the mean difference lies between 0.0036 and 5.230."

Conclusion
- We can conclude from our tests that cross-fertilization leads to increased stalk heights.
- Despite Fisher's concern that dropping the normality assumption weakens the analysis relative to the t-test, nonparametric resampling-based methods are powerful and efficient.

Is there anything else to consider?
The experiment did not use randomization, which might give some plants environmental advantages or disadvantages:
- Soil conditions or fertility
- Lighting
- Air currents
- Irrigation/evaporation

References
Fisher, R. A. (1935). The Design of Experiments. Edinburgh: Oliver & Boyd.
Thompson, J. R. (2000). Simulation: A Modeler's Approach. New York: Wiley.