
Fewer permutations, more accurate P-values
Theo A. Knijnenburg 1,*, Lodewyk F. A. Wessels 2, Marcel J. T. Reinders 3 and Ilya Shmulevich 1
1 Institute for Systems Biology, Seattle, WA, USA; 2 Bioinformatics and Statistics, The Netherlands Cancer Institute, Amsterdam; 3 Information and Communication Theory Group, Delft University of Technology, Delft, The Netherlands
Bioinformatics (12):i161-i168

How to obtain accurate P-values with fewer permutations? The P-value of a permutation test is the probability of obtaining a result at least as extreme as the test statistic, given that the null hypothesis is true. Null hypothesis: the labels assigning samples to classes are interchangeable.
[Figure: expression matrix of genes (rows) by Arrays 1–8 (columns), split into normal and stress conditions; a test statistic T is computed for each gene.]

[Figure: the same matrix with the array labels permuted (Arrays 8, 4, 6, 3, 5, 1, 7, 2); recomputing the statistic on the permuted labels gives a permutation value T_perm for each gene.]
For each test (gene), the P-value is assessed by performing all possible permutations and computing the fraction of permutation values that are at least as extreme as the test statistic obtained from the unpermuted data. In practice, because performing all N_all possible permutations may be infeasible, only a subset of N permutations is computed.
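A minimal sketch of the empirical (ECDF) permutation P-value for one gene, assuming a Welch two-sample t-statistic and treating "at least as extreme" as ≥ (a one-sided test). The helper names and the expression values are hypothetical; only a random subset of label permutations is drawn, as described above.

```python
import random
import statistics


def t_statistic(x, y):
    """Welch two-sample t-statistic for mean(x) - mean(y)."""
    se = (statistics.variance(x) / len(x) + statistics.variance(y) / len(y)) ** 0.5
    return (statistics.mean(x) - statistics.mean(y)) / se


def permutation_values(x, y, n_perm, rng):
    """Recompute the statistic after randomly reassigning the class labels."""
    pooled = list(x) + list(y)
    values = []
    for _ in range(n_perm):
        rng.shuffle(pooled)  # one random relabelling of the samples
        values.append(t_statistic(pooled[:len(x)], pooled[len(x):]))
    return values


def ecdf_pvalue(t0, perm_values):
    """Fraction of permutation values at least as extreme as (>=) t0."""
    m = sum(1 for t in perm_values if t >= t0)
    return m / len(perm_values)


# Hypothetical expression values of one gene under the two conditions.
stress = [5.1, 5.8, 6.2, 6.0]
normal = [4.0, 4.3, 3.9, 4.5]
rng = random.Random(0)
t0 = t_statistic(stress, normal)
p = ecdf_pvalue(t0, permutation_values(stress, normal, 2000, rng))
print(f"t0 = {t0:.2f}, P_ecdf = {p:.4f}")
```

With 8 samples split 4/4 there are only 70 distinct label assignments, so even this "significant" gene cannot get a P-value much below 1/70.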

Problems:
- a P-value obtained from N permutations has a resolution of only 1/N, so N permutations are needed to reach 1/N accuracy;
- the smallest achievable P-value is 1/N; e.g. 6 samples in 2 conditions (3 per condition) allow only N = 20 distinct permutations, so p_min = 1/20 = 0.05;
- adjusting the P-values for multiple tests makes them even bigger (and less meaningful): in the most conservative adjustment, p_adj = p × N_tests;
- a large number of permutations may be computationally intensive, infeasible or impossible.
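The arithmetic behind the first two problems can be checked directly. This sketch assumes a balanced split of the 6 samples and a hypothetical 10,000 genes for the multiple-testing adjustment; capping the adjusted value at 1 is standard practice rather than something stated on the slide.

```python
from math import comb


def n_label_assignments(n1, n2):
    """Distinct ways to assign n1 + n2 samples to two classes of sizes n1 and n2."""
    return comb(n1 + n2, n1)


def min_pvalue(n_perm):
    """Smallest non-zero empirical P-value attainable with n_perm permutations."""
    return 1.0 / n_perm


def bonferroni(p, n_tests):
    """Most conservative adjustment, p * n_tests, capped at 1."""
    return min(1.0, p * n_tests)


n = n_label_assignments(3, 3)               # 6 samples, 2 conditions
p_min = min_pvalue(n)
print(n, p_min, bonferroni(p_min, 10_000))  # hypothetical 10,000 genes
```

After adjustment the best possible P-value of every gene is 1, i.e. no test can ever be called significant.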

The authors propose to estimate small P-values from a permutation test using extreme value theory (Gumbel, 1958). The set of extreme (very large or very small) permutation values that forms the tail of the distribution of permutation values is modeled with a generalized Pareto distribution (GPD):
$F(z) = 1 - (1 - kz/a)^{1/k}$ for $k \neq 0$, and $F(z) = 1 - \exp(-z/a)$ for $k = 0$,
where z are the exceedances, $z_i = t_i - t_0$ with $t_0$ the threshold that defines the tail, a is the scale parameter and k is the shape parameter. Maximum likelihood (ML) estimation is employed to estimate a and k given the exceedances Z; for k > 1, ML estimation is impossible.
[Figure: original distribution of the statistic, and the generalized Pareto distribution fitted to the extreme values.]
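A sketch of the GPD CDF in the parameterization above and an approximate ML fit in plain Python. Assumption: instead of a general-purpose optimizer, it uses the substitution θ = k/a, for which the likelihood-maximizing k at fixed θ is k = −mean(log(1 − θz)), and then grid-searches θ; the grid range and resolution are arbitrary choices. Candidates with k ≥ 1 are skipped, matching the remark that ML estimation fails for k > 1.

```python
import math
import random


def gpd_cdf(z, a, k):
    """GPD CDF: 1 - (1 - k*z/a)**(1/k) for k != 0, 1 - exp(-z/a) for k == 0."""
    if z <= 0:
        return 0.0
    if k == 0:
        return 1.0 - math.exp(-z / a)
    base = 1.0 - k * z / a
    if base <= 0:  # beyond the upper end point a/k of a bounded (k > 0) tail
        return 1.0
    return 1.0 - base ** (1.0 / k)


def fit_gpd(z, n_grid=2000):
    """Approximate ML estimates (a, k) for exceedances z > 0.

    For a fixed theta = k / a the likelihood is maximised by
    k = -mean(log(1 - theta * z)), so only a 1-D grid search over theta
    is needed (a crude stand-in for a proper optimiser).
    """
    n, z_max = len(z), max(z)
    a_exp = sum(z) / n                      # ML scale of the k = 0 (exponential) case
    best_ll, best_a, best_k = -n * (math.log(a_exp) + 1.0), a_exp, 0.0
    for i in range(n_grid):
        t = -20.0 + 20.999 * i / (n_grid - 1)   # t = theta * z_max in [-20, 0.999]
        if t == 0.0:
            continue
        theta = t / z_max
        k = -sum(math.log1p(-theta * zi) for zi in z) / n
        if k == 0.0 or k >= 1.0:            # ML estimates do not exist for k > 1
            continue
        a = k / theta                       # same sign as k, so a > 0
        ll = -n * (math.log(a) + 1.0 - k)   # profile log-likelihood at this theta
        if ll > best_ll:
            best_ll, best_a, best_k = ll, a, k
    return best_a, best_k


rng = random.Random(1)
exceedances = [rng.expovariate(0.5) for _ in range(500)]  # exponential tail: k = 0, a = 2
a_hat, k_hat = fit_gpd(exceedances)
print(f"a = {a_hat:.3f}, k = {k_hat:.3f}")
```

SciPy's `scipy.stats.genpareto` also provides a `fit` method, but its shape parameter uses the opposite sign convention (c = −k).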

GPD tail approximation of an F distribution. (a) 5000 samples are drawn from the F distribution (PDF shown). Samples that exceed 5 are defined as the exceedances and are modeled using a GPD. The GPD approximation of the tail (scaled to the interval [1 − N_exc/N, 1]) is depicted alongside the theoretical CDF. (b) The theoretical P-value derived from the CDF of the F distribution (P_f) is compared with the ECDF approximation (P_ecdf) and the GPD approximation (P_gpd) for values of x_0 > 5. PDF: probability density function; GPD: generalized Pareto distribution; CDF: cumulative distribution function; ECDF: empirical cumulative distribution function.
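The scaling in panel (a) means the tail P-value is estimated as P_gpd = (N_exc/N)(1 − F(x_0 − t)), where t is the exceedance threshold. The sketch below swaps in an Exp(1) distribution for the F distribution (its theoretical P-value, exp(−x_0), needs no special functions) and, as a simplification, fixes the GPD shape at k = 0 so that the ML scale is just the mean exceedance; the actual method fits k as well. Sample size, threshold and x_0 are arbitrary.

```python
import math
import random


def tail_pvalues(x0, samples, threshold):
    """Return (P_ecdf, P_gpd) for a test statistic x0 beyond `threshold`.

    Simplification: the GPD shape is fixed at k = 0 (exponential tail), for
    which the ML scale is the mean exceedance; the full method also fits k.
    """
    n = len(samples)
    z = [s - threshold for s in samples if s > threshold]  # exceedances
    n_exc = len(z)
    a = sum(z) / n_exc
    p_ecdf = sum(1 for s in samples if s >= x0) / n
    # Tail rescaled to [1 - n_exc/n, 1]: P = (n_exc/n) * (1 - F(x0 - threshold)).
    p_gpd = (n_exc / n) * math.exp(-(x0 - threshold) / a)
    return p_ecdf, p_gpd


rng = random.Random(2)
samples = [rng.expovariate(1.0) for _ in range(5000)]  # Exp(1) instead of an F distribution
x0 = 12.0
p_ecdf, p_gpd = tail_pvalues(x0, samples, threshold=2.0)
p_true = math.exp(-x0)                                 # theoretical P-value under Exp(1)
print(p_true, p_ecdf, p_gpd)
```

With 5000 samples the ECDF can only report multiples of 1/5000 (typically 0 here), while the tail model extrapolates to a P-value of the right order of magnitude.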

Selection of the threshold. If the threshold is too low, the values above it are not extreme values and cannot be modeled by a GPD; if it is too high, only a few samples are available, which leads to large errors in the estimates. Procedure:
- perform a certain number of permutations;
- treat the 250 most extreme permutation values as exceedances;
- perform a goodness-of-fit test to assess whether these 250 values follow a GPD;
- if not, iteratively decrease the number of exceedances by 10 until a good fit to the GPD is reached.
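The iterative threshold search can be sketched as follows. The goodness-of-fit test is left as a caller-supplied predicate because the slide does not name one, and placing the threshold halfway between the last exceedance and the next permutation value is an assumption of this sketch.

```python
def select_exceedances(perm_values, fits_gpd, start=250, step=10):
    """Shrink the tail by `step` values until `fits_gpd` accepts the exceedances.

    `fits_gpd` is a caller-supplied goodness-of-fit test (hypothetical here).
    The threshold lies halfway between the n_exc-th and (n_exc+1)-th largest
    permutation values, which is an assumption of this sketch.
    """
    ordered = sorted(perm_values, reverse=True)
    n_exc = min(start, len(ordered) - 1)
    while n_exc >= step:
        t = 0.5 * (ordered[n_exc - 1] + ordered[n_exc])
        z = [v - t for v in ordered[:n_exc]]  # exceedances over the threshold t
        if fits_gpd(z):
            return n_exc, t, z
        n_exc -= step
    return None  # no acceptable fit found


values = [float(v) for v in range(1000)]
# Hypothetical stand-in for a real goodness-of-fit test: accept at most 200 values.
print(select_exceedances(values, lambda z: len(z) <= 200)[:2])
```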

When to use GPD. The GPD approximation can only be used when the test statistic is extreme, i.e. in the tail of the distribution. If, say, 50 of the 100 permutation values exceed the test statistic, the GPD tail approximation is useless and the standard empirical method is adequate. Criterion: if a value ≥ the test statistic appears at least 10 times among the permutation values (M ≥ 10), compute the standard empirical P-value; otherwise use the GPD. Rationale: the number M of permutation values at least as extreme as the test statistic follows a binomial distribution (N Bernoulli trials with success probability P_perm), with M = P_ecdf · N; by the central limit theorem, for M ≥ 10 this binomial distribution can be approximated by a normal distribution.
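A sketch of this decision rule, assuming the same ≥ convention as for the empirical P-value. It also reports the relative standard error of M/N under the binomial model, sqrt((1 − p)/(Np)), which is roughly 1/sqrt(M) for small p and illustrates how quickly the empirical estimate degrades as M shrinks. The function name is hypothetical.

```python
import math


def ecdf_or_gpd(t0, perm_values, m_min=10):
    """Decide between the empirical P-value and the GPD tail approximation.

    M, the number of permutation values >= t0, is Binomial(N, P_perm); once
    M >= m_min the normal approximation holds and M/N is used directly.
    """
    n = len(perm_values)
    m = sum(1 for v in perm_values if v >= t0)
    if m >= m_min:
        p = m / n
        # Relative standard error of M/N; roughly 1/sqrt(M) for small p.
        rel_se = math.sqrt((1.0 - p) / (n * p))
        return "ecdf", p, rel_se
    return "gpd", None, None  # too few values >= t0: use the GPD tail fit


values = [float(v) for v in range(100)]
print(ecdf_or_gpd(50.0, values))  # 50 of the 100 values are >= 50
print(ecdf_or_gpd(95.0, values))  # only 5 values are >= 95
```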

Minimum number of permutations (N_c) required for convergence to the correct P-value: results on 7 theoretical distributions.
[Figure: N_c for the ECDF and GPD approximations on light-tailed and heavy-tailed distributions; the theoretical permutation-test P-value is obtained by evaluating the CDF at the value of the test statistic.]
- GPD always required fewer permutations;
- the difference between the methods is bigger for smaller P-values and for distributions with heavier tails.

P_ecdf and P_gpd for an F distribution. The ECDF approximation converges to the correct P-value linearly with the number of permutations, N, whereas the GPD approximation converges with far fewer permutations: a decent estimate of P_perm is obtained with 10^4 permutation values. However, when N ≪ 1/P_perm, there is a lot of variability in P_gpd.

Application to differential gene expression analysis. 132 relevant genes were chosen, and a t-statistic for differential expression was computed for each. Permutations were performed until M > 25 (or N = 10^9), and the resulting permutation estimate was taken as the "true" P_perm. P_ecdf and P_gpd were then computed for increasing N up to N = 10^5, and this was repeated 200 times.
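The reference P_perm procedure (permute until M exceeds 25, with a hard cap on N) can be sketched generically. Drawing permutation values through a caller-supplied function is an assumption that keeps the example independent of any particular statistic; checking the stopping rule only after each batch is a shortcut that may overshoot M slightly.

```python
import random


def reference_pvalue(t0, draw_perm_value, m_target=25, n_max=10**9, batch=1000):
    """Draw permutation values until M > m_target or N reaches n_max; return (M/N, N)."""
    m = n = 0
    while m <= m_target and n < n_max:
        step = min(batch, n_max - n)
        m += sum(1 for _ in range(step) if draw_perm_value() >= t0)
        n += step
    return m / n, n


rng = random.Random(3)
# Hypothetical Uniform(0, 1) null, so the true P_perm of t0 = 0.99 is 0.01.
p_ref, n_used = reference_pvalue(0.99, rng.random)
print(p_ref, n_used)
```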

Application to differential gene expression analysis. The test statistic and its permutation values were transformed such that k < 0, i.e. the tail becomes heavier: all test statistics and the corresponding permutation values were raised to the power three. This gave better estimates with much less variance; a reasonable estimate of P-values < 10^-7 can now be made using only 10^5 permutations.
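Why such a transformation is safe: cubing is strictly increasing, so the ECDF-based P-value, which depends only on the ordering of the values, does not change; only the shape of the tail handed to the GPD fit does. A small check with hypothetical standard-normal permutation values:

```python
import random


def ecdf_pvalue(t0, perm_values):
    """Fraction of permutation values >= t0."""
    return sum(1 for v in perm_values if v >= t0) / len(perm_values)


def cube(values):
    """Odd power: strictly increasing, so the ordering of all values is kept."""
    return [v ** 3 for v in values]


rng = random.Random(4)
perm_values = [rng.gauss(0.0, 1.0) for _ in range(10_000)]
t0 = 2.5
p_raw = ecdf_pvalue(t0, perm_values)
p_cubed = ecdf_pvalue(t0 ** 3, cube(perm_values))
print(p_raw, p_cubed)  # same value: only the GPD fit of the tail is affected
```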

Application to GSEA. Comparison of C_ecdf and C_gpd for different values of N, using t-statistics for differentially expressed gene sets. P_perm: permutation values were generated until M > 25 (at most 10^6 permutations). 89 gene sets were chosen for which M > 25 within the 10^6 permutations and P_perm < 0.01. The list of these 89 gene sets correctly ordered by P_perm was compared with the lists ordered by P_gpd and by P_ecdf (Spearman rank correlation).
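A plain-Python sketch of the Spearman rank correlation used for this comparison, with tied values given average ranks. The P-values below are made up; they illustrate that empirical P-values, which collapse to 0 when no permutation value reaches the statistic, create ties that lower the correlation.

```python
def average_ranks(values):
    """Ranks 1..n, with tied values sharing the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for idx in order[i:j + 1]:
            ranks[idx] = (i + j) / 2 + 1  # average of positions i..j, 1-based
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5


# Hypothetical P-values for five gene sets.
p_perm = [1e-3, 2e-4, 5e-3, 8e-5, 3e-3]
p_ecdf = [1e-3, 0.0, 5e-3, 0.0, 3e-3]       # two sets collapse to 0 (tied)
p_gpd = [1.1e-3, 1.9e-4, 4.6e-3, 9e-5, 3.2e-3]
print(spearman(p_perm, p_ecdf), spearman(p_perm, p_gpd))
```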

The convergence criteria were developed to suit practical applications:
1. Convergence: P_est varies little as N increases from N_c/10 to N_c.
2. Accuracy: the 25th and 75th percentile confidence bounds of the P-value estimate deviate by less than 10% from P_est (this criterion applies only to GPD).
Here P_est(N) is the estimated P-value (either P_ecdf or P_gpd) after N permutations, the confidence bounds are percentile bounds on this estimate, and N_c is the minimum number of permutations at which these criteria are met.
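The two criteria can be sketched as simple checks over a record of estimates P_est(N). The 10% tolerance for criterion 1 is an assumption (the slide only asks for "little variation"), as is evaluating it only at the recorded values of N; the history below is hypothetical.

```python
def converged(history, n_c, rel_tol=0.1):
    """Criterion 1 (sketch): every P_est(N) with n_c/10 <= N <= n_c lies within
    rel_tol of P_est(n_c). `history` maps N to P_est(N); rel_tol is assumed."""
    if min(history) > n_c / 10:  # no estimate recorded as far back as n_c/10
        return False
    p_c = history[n_c]
    window = [p for n, p in history.items() if n_c / 10 <= n <= n_c]
    return p_c > 0 and all(abs(p - p_c) <= rel_tol * p_c for p in window)


def accurate(p_est, lower, upper, rel_tol=0.10):
    """Criterion 2: the 25th and 75th percentile bounds deviate < 10% from P_est."""
    return abs(lower - p_est) < rel_tol * p_est and abs(upper - p_est) < rel_tol * p_est


def first_converged_n(history, rel_tol=0.1):
    """Smallest recorded N at which criterion 1 holds, or None."""
    for n_c in sorted(history):
        if converged(history, n_c, rel_tol):
            return n_c
    return None


# Hypothetical P_est values after N = 10, 100, ..., 10^5 permutations.
history = {10: 0.0, 100: 0.02, 1_000: 0.011, 10_000: 0.0102, 100_000: 0.0101}
print(first_converged_n(history))
```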

Application to differential gene expression analysis. The convergence criteria were applied to 5 exemplary genes from the differential gene expression analysis, chosen so that they cover a wide range of P-values. The number of permutations was increased until the convergence criteria were met (or until 10^6 was reached), and this was repeated 25 times. The effect of order-retaining transformations of the test statistic and its permutation values was also examined (Z: the power to which the statistic is raised).
[Table: the 25th and 75th percentile values of N_c and the corresponding P-value estimates (P_ecdf and P_gpd) for five different genes.]

Summary. Usually, the same number of permutations is performed for each test statistic, yet different test statistics require different numbers of permutations. In most applications, the large majority of test statistics need only a small number of permutations to reliably compute their large (and hence insignificant) P-values, while only a small fraction of the test statistics are significant and require many permutations to reliably estimate their small P-values. Simple convergence criteria and confidence bounds on the estimate can indicate when enough permutations have been performed to have a given statistical confidence in the P-value estimate. Such an approach can decrease the total number of permutations, and thus the computational time, while producing more accurate P-value estimates. A web interface for the proposed method is under development.
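A sketch of the per-test allocation described in the summary, combined with the M ≥ 10 rule from the "When to use GPD" slide: each test receives permutations in batches only until its empirical P-value is reliable, and the tests that reach the cap are flagged for the GPD approximation. The batch size, cap, Uniform(0, 1) null and test statistics are all hypothetical.

```python
import random


def adaptive_permutations(t0_list, draw_perm_value, m_min=10, batch=1000, n_max=100_000):
    """Per test: add permutations in batches until M >= m_min or N reaches n_max.

    Tests that stop early get the empirical P-value M/N; tests that hit n_max
    still have M < m_min and are flagged for the GPD tail approximation.
    """
    results = []
    for t0 in t0_list:
        m = n = 0
        while m < m_min and n < n_max:
            step = min(batch, n_max - n)
            m += sum(1 for _ in range(step) if draw_perm_value() >= t0)
            n += step
        method = "ecdf" if m >= m_min else "gpd"
        results.append((method, m / n, n))
    return results


rng = random.Random(5)
# Hypothetical statistics under a Uniform(0, 1) null; 2.0 is never reached.
stats = [0.3, 0.9, 0.99, 2.0]
for t0, (method, p, n) in zip(stats, adaptive_permutations(stats, rng.random)):
    print(t0, method, p, n)
```

The unremarkable statistics stop after a single batch, so almost all of the permutation budget goes to the one extreme statistic.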
