Q-Vals (and False Discovery Rates) Made Easy. Dennis Shasha. Based on the paper "Statistical significance for genomewide studies" by John Storey and Robert Tibshirani.

Presentation transcript:

Q-Vals (and False Discovery Rates) Made Easy Dennis Shasha Based on the paper "Statistical significance for genomewide studies" by John Storey and Robert Tibshirani PNAS August 5,

Challenge. You test plants, patients, or other subjects in two settings (or from different populations). You want to know which genes are differentially expressed (the alternates). You don't want to make too many mistakes, that is, declare a gene to be alternate when in fact it is null (not differentially expressed).

First Idea. You take the p-vals of the differences in expression. P-val(g) is the probability that gene g, if it is null, would show a difference at least this large. You choose a cutoff, say 0.05. You say all genes that differ with p-val <= 0.05 are truly different. What's the problem?

Thought Experiment. Suppose that no genes are truly differentially expressed. You will still call about 5% of the genes significant, and none of them really is. Your false discovery rate (number of nulls among those predicted to be alternate / number predicted to be alternate) = 100%. Bad.
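The thought experiment can be checked with a small simulation. This is only a sketch: the gene count, the seed, and the function name are illustrative assumptions, and it relies on null p-values being uniform on [0, 1], which the next slide explains.

```python
import random

def simulate_all_null(num_genes=10000, cutoff=0.05, seed=0):
    """Simulate an experiment in which every gene is null.

    Assumption: a null gene's p-value is uniform on [0, 1].
    Returns (fraction of genes called significant, false discovery rate).
    """
    rng = random.Random(seed)
    p_values = [rng.random() for _ in range(num_genes)]
    called = [p for p in p_values if p <= cutoff]
    # No gene is truly different, so every call is a false discovery.
    false_discoveries = len(called)
    fdr = false_discoveries / len(called) if called else 0.0
    return len(called) / num_genes, fdr

fraction_called, fdr = simulate_all_null()
print(f"fraction called significant: {fraction_called:.3f}")
print(f"false discovery rate: {fdr:.0%}")
```

Roughly 5% of the genes pass the cutoff even though none is differentially expressed, so every one of those calls is wrong.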

A Fundamental Insight. All truly null genes (i.e. not truly differentially expressed) are equally likely to have any p-val. That follows from the construction of the p-val: under the null hypothesis, 1% of the genes will fall in the top percentile, 1% will fall between the 89th and 90th percentiles, and so on. A p-val is just a way of stating the percentile under the null condition.
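A quick way to see this uniformity is to compute p-values for simulated null genes and check that each tenth of the [0, 1] range receives about a tenth of them. The sketch below assumes a two-sided z-test on normally distributed measurements with known unit variance; the test choice, replicate count, and names are assumptions made for illustration, not the procedure used in the paper.

```python
import math
import random

def null_p_value(rng, replicates=5):
    """Two-sided z-test p-value for one null gene (both conditions share a mean)."""
    a = [rng.gauss(0.0, 1.0) for _ in range(replicates)]
    b = [rng.gauss(0.0, 1.0) for _ in range(replicates)]
    diff = sum(a) / replicates - sum(b) / replicates
    # The difference of two means of `replicates` unit-variance values
    # has standard deviation sqrt(2 / replicates).
    z = diff / math.sqrt(2.0 / replicates)
    # P(|Z| >= |z|) for a standard normal Z.
    return math.erfc(abs(z) / math.sqrt(2.0))

rng = random.Random(42)
num_genes = 20000
counts = [0] * 10
for _ in range(num_genes):
    p = null_p_value(rng)
    counts[min(int(p * 10), 9)] += 1

fractions = [c / num_genes for c in counts]
for i, f in enumerate(fractions):
    print(f"p in [{i / 10:.1f}, {(i + 1) / 10:.1f}): {f:.3f}")
```

Every bin holds close to 10% of the null genes, which is the flat profile the later slides rely on.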

What Do We Do With That? Mixture model: imagine null genes as light blue marbles and truly different genes as red ones. If the assay is decent, red marbles should be concentrated at the low p-values.

[Figure: p-value axis running from 0 to 1.]

Method We Can Use. Of course, we don't know the colors of the marbles, i.e. we don't know which genes are true alternates. However, we know that null marbles are equally likely to have any p-value. So, from the p-value where the height of the marbles levels off onward, we have primarily light blue marbles (null genes). Why?

[Figure: p-value axis from 0 to 1, marking where the flat region starts and the level of the flat region.]

Answer. Because if all genes/marbles were null, the heights would be about uniform. Provided the reds are concentrated near the low p-vals, the flat region will be primarily light blue.
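The level of the flat region can be estimated from the data alone. The sketch below simulates a mixture of null and alternate genes, averages the histogram height to the right of an assumed starting point of 0.5, and extends that level across the whole axis to estimate the number of nulls. The starting point, the bin count, the mixture proportions, and the function names are all illustrative assumptions; the paper chooses the flat region more carefully than this fixed cutoff does.

```python
import math
import random

def mixture_p_values(num_null=8000, num_alt=2000, shift=3.0, seed=1):
    """Simulated two-sided z-test p-values: nulls have mean 0, alternates mean `shift`."""
    rng = random.Random(seed)
    p_values = []
    for gene in range(num_null + num_alt):
        mean = shift if gene >= num_null else 0.0
        z = rng.gauss(mean, 1.0)
        p_values.append(math.erfc(abs(z) / math.sqrt(2.0)))
    return p_values

def flat_level(p_values, start=0.5, num_bins=20):
    """Average number of genes per histogram bin to the right of `start`."""
    in_flat_region = sum(1 for p in p_values if p > start)
    flat_bins = num_bins * (1.0 - start)
    return in_flat_region / flat_bins

num_bins = 20
p_values = mixture_p_values()
level = flat_level(p_values, start=0.5, num_bins=num_bins)
# Extending the flat line over all bins estimates the total number of nulls.
estimated_nulls = level * num_bins
pi0 = estimated_nulls / len(p_values)
print(f"flat level: {level:.1f} genes per bin")
print(f"estimated fraction of null genes: {pi0:.3f}")
```

In this simulation 80% of the genes are null by construction, so the estimate can be compared against a known answer, which is not possible with real data.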

Example: all null. Consider the all-null case. All marbles are light blue. The false discovery rate in the region to the left of the flat region is (estimated number of light blue marbles there, based on the level of the flat region) / (number of marbles to the left of the flat region). This will be close to 100%.

[Figure: p-value axis from 0 to 1, marking where the flat region starts and the level of the flat region.]

Example: all non-null. Consider the all-non-null case. All marbles are red and they are highly skewed toward low p-values. The level of the flat region is essentially zero. The false discovery rate in the region to the left of the flat region is (estimated number of light blue marbles there, based on the level of the flat region) / (number of marbles to the left of the flat region). This will be close to 0.

[Figure: p-value axis from 0 to 1, marking where the flat region starts.]

Example: mixed case. Get a distribution of p-values. Find the flat region. Estimate the number of nulls in the left-of-flat region by extending the flat line to the left. Dividing that estimate by the number of genes actually observed in that region gives the false discovery rate.

[Figure: histogram of the number of genes having each p-val, p-value axis from 0 to 1, showing the flat line (base level of nulls) and a possible p-value threshold.]
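The mixed-case recipe can be sketched in a few lines. The estimated number of nulls below a threshold t is (estimated null fraction) x (number of genes) x t, because nulls are spread evenly over [0, 1]; dividing by the number of genes actually observed below t gives the estimated false discovery rate. The simulated data, the flat-region start of 0.5, and all names are assumptions for illustration. Because the simulation knows which genes are null, the estimate can be compared with the true rate.

```python
import math
import random

def simulate_genes(num_null=8000, num_alt=2000, shift=3.0, seed=2):
    """Return (p_value, is_null) pairs from simulated two-sided z-tests."""
    rng = random.Random(seed)
    genes = []
    for index in range(num_null + num_alt):
        is_null = index < num_null
        z = rng.gauss(0.0 if is_null else shift, 1.0)
        genes.append((math.erfc(abs(z) / math.sqrt(2.0)), is_null))
    return genes

def estimated_fdr(p_values, threshold, flat_start=0.5):
    """Extend the flat line to the left of `threshold` and divide by the observed count."""
    m = len(p_values)
    # Fraction of nulls, estimated from the flat region to the right of flat_start.
    pi0 = min(1.0, sum(1 for p in p_values if p > flat_start) / (m * (1.0 - flat_start)))
    called = sum(1 for p in p_values if p <= threshold)
    if called == 0:
        return 0.0
    return min(1.0, pi0 * m * threshold / called)

genes = simulate_genes()
threshold = 0.05
est = estimated_fdr([p for p, _ in genes], threshold)

called = [(p, is_null) for p, is_null in genes if p <= threshold]
true = sum(1 for _, is_null in called if is_null) / len(called)
print(f"estimated FDR at p <= {threshold}: {est:.3f}")
print(f"true FDR in this simulation:      {true:.3f}")
```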

Example: mixed case. What would you estimate the false discovery rate to be if we declare the entire area to the left of the possible p-value threshold to be significant? 10%, 25%, 50%?

[Figure: histogram of the number of genes having each p-val, p-value axis from 0 to 1, showing the flat line (base level of nulls) and a possible p-value threshold.]

Obtaining q-values from the False Discovery Rate. Suppose we order genes from least p-value to greatest. That ordering corresponds to moving left to right along one of these Cartesian graphs. The q-value of a gene having p-value p is the False Discovery Rate we would get if the declared significance region had a threshold of p.

[Figure: histogram of the number of genes having each p-val, p-value axis from 0 to 1, showing the flat line (base level of nulls). The q-value of a gene having a given p-val is the FDR if that p-val is the significance threshold.]
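A minimal sketch of turning p-values into q-values follows. It treats the null fraction `pi0` as already estimated (the value 0.5 below is made up for the example), and the function name is hypothetical. One detail goes slightly beyond the slide: in Storey and Tibshirani's definition the q-value is the smallest estimated FDR over all thresholds at or above the gene's p-value, which keeps q-values from decreasing as p-values increase. The running minimum in the loop implements that.

```python
def q_values(p_values, pi0):
    """Return one q-value per gene, in the same order as `p_values`.

    pi0 is the estimated fraction of null genes (an input here, by assumption).
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest.
    for rank in range(m, 0, -1):
        gene = order[rank - 1]
        # Estimated FDR if the threshold is this gene's p-value:
        # expected nulls below it (pi0 * m * p) over genes at or below it (rank).
        fdr = pi0 * m * p_values[gene] / rank
        running_min = min(running_min, fdr)
        q[gene] = running_min
    return q

# Hypothetical p-values for eight genes, in arbitrary order.
p_values = [0.3, 0.011, 0.9, 0.010, 0.04, 0.5, 0.02, 0.7]
q = q_values(p_values, pi0=0.5)
for p, qv in sorted(zip(p_values, q)):
    print(f"p = {p:.3f}  q = {qv:.4f}")
```

The two smallest p-values end up with the same q-value: setting the threshold at the second gene gives a lower estimated FDR than setting it at the first, and the running minimum passes that lower value down.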

Lessons for Research. Mushy p-values (large error bars / few replicates) may force us to the far left in order to get a low False Discovery Rate. This may eliminate genes of interest. If testing out a gene is not too expensive, then we can accept a higher False Discovery Rate; there is nothing magical about 0.01.

[Figure: histogram of the number of genes having each p-val, p-value axis from 0 to 1, showing the flat line (base level of nulls). Better p-values avoid loss of genes for a small False Discovery Rate.]
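The effect of replicates can be illustrated with a simulation that counts how many genes survive a 5% False Discovery Rate when the same true effect is measured with few or many replicates. Everything here is an assumption made for the sketch: the two-sided z-test, the effect size of one standard deviation, the replicate counts, the flat-region start of 0.5, and the function names.

```python
import math
import random

def simulate_p_values(replicates, rng, num_null=8000, num_alt=2000, effect=1.0):
    """Two-sided z-test p-values; `effect` is the true mean difference in
    units of the per-measurement standard deviation (illustrative value)."""
    # A mean difference d measured with n replicates per condition has z-score d * sqrt(n / 2).
    scale = math.sqrt(replicates / 2.0)
    p_values = []
    for gene in range(num_null + num_alt):
        shift = effect * scale if gene >= num_null else 0.0
        z = rng.gauss(shift, 1.0)
        p_values.append(math.erfc(abs(z) / math.sqrt(2.0)))
    return p_values

def count_discoveries(p_values, max_fdr=0.05, flat_start=0.5):
    """Largest number of top-ranked genes whose estimated FDR stays within max_fdr."""
    m = len(p_values)
    pi0 = min(1.0, sum(1 for p in p_values if p > flat_start) / (m * (1.0 - flat_start)))
    best = 0
    for rank, p in enumerate(sorted(p_values), start=1):
        if pi0 * m * p / rank <= max_fdr:
            best = rank
    return best

rng = random.Random(7)
few = count_discoveries(simulate_p_values(replicates=3, rng=rng))
many = count_discoveries(simulate_p_values(replicates=12, rng=rng))
print(f"genes kept at 5% FDR with 3 replicates:  {few}")
print(f"genes kept at 5% FDR with 12 replicates: {many}")
```

With few replicates the alternates' p-values are spread out, so the threshold has to move far to the left and most true alternates are lost; with more replicates the same FDR is reached while keeping many more genes.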