Q-Vals (and False Discovery Rates) Made Easy


Dennis Shasha, based on the paper "Statistical significance for genomewide studies" by John Storey and Robert Tibshirani, PNAS, August 5, 2003, pp. 9440-9445

Challenge
You test plants/patients/… in two settings (or from two different populations). You want to know which genes are differentially expressed (alternate). You don't want to make too many mistakes, that is, declaring a gene to be alternate when in fact it is null (not differentially expressed).

First Idea
You compute a p-value for each gene's difference in expression. P-val(g) is the probability that, if g were unaffected by the treatment, it would show a difference at least this large. You choose a cutoff, say 0.05, and declare every gene with p-val <= 0.05 to be truly different. What's the problem?
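The definition of P-val(g) can be made concrete with an exact permutation test: if the gene is unaffected, the treatment labels are arbitrary, so the observed difference is compared with the difference under every possible relabeling of the measurements. This is a minimal sketch of one possible test, not necessarily the one used in the paper, and the expression values are hypothetical.

```python
from itertools import combinations
from statistics import mean


def permutation_pvalue(group_a, group_b):
    """Exact two-sided permutation p-value for a difference in means.

    Under the null hypothesis (gene unaffected by the treatment) the group
    labels are exchangeable, so we count how many relabelings of the pooled
    values give a difference at least as large as the observed one.
    """
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    observed = abs(mean(group_b) - mean(group_a))
    extreme = total = 0
    for idx in combinations(range(len(pooled)), n_a):
        a = [pooled[i] for i in idx]
        b = [pooled[i] for i in range(len(pooled)) if i not in idx]
        # Small tolerance so floating-point rounding does not drop ties.
        if abs(mean(b) - mean(a)) >= observed - 1e-12:
            extreme += 1
        total += 1
    return extreme / total


# Hypothetical expression values for one gene, three replicates per setting.
control = [5.1, 5.3, 5.0]
treated = [6.2, 6.4, 6.1]
p = permutation_pvalue(control, treated)
print(p, p <= 0.05)  # 0.1 False
```

With only three replicates per setting there are just 20 relabelings, so even a perfect separation gives p = 0.1 and misses the 0.05 cutoff: few replicates limit how small a p-value can get.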

Thought Experiment
Suppose we apply a treatment that has no true effect. A p-value of 5% means that an unaffected gene has a 5% chance of changing by at least that amount, so approximately 5% of the genes will change that much and pass the cutoff. Every one of them is null, so your false discovery rate (number of null genes among those predicted to be alternate / number predicted to be alternate) is 100%. Bad.
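A quick simulation of this thought experiment, assuming 10,000 genes and a continuous test statistic, so that the p-values of null genes are uniform on [0, 1]:

```python
import random

rng = random.Random(0)
n_genes = 10_000
cutoff = 0.05

# The treatment has no true effect, so every gene is null and (assuming a
# continuous test statistic) its p-value is uniform on [0, 1].
pvals = [rng.random() for _ in range(n_genes)]

declared = [p for p in pvals if p <= cutoff]
false_discoveries = len(declared)  # every declared gene is null
fdr = false_discoveries / len(declared)

print(f"declared alternate: {len(declared)} of {n_genes}")
print(f"false discovery rate: {fdr:.0%}")
```

About 500 genes pass the cutoff, and all of them are false discoveries.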

Second Example
You take 10,000 fair coins and give each one a different color. You flip each one 17 times. Approximately 10 of them get 15 heads and another 10 get 15 tails. The p-value of each such outcome is about 0.001. Declaring that those colors have a significant effect on the coin would be folly: with 10,000 coins, results this extreme are expected by chance alone.
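The coin numbers can be checked with exact binomial arithmetic. In this sketch the "p-value" is taken to be the one-sided probability of 15 or more heads, which is one reasonable reading of the slide's figure of 0.001.

```python
from math import comb

FLIPS = 17
COINS = 10_000
outcomes = 2 ** FLIPS  # equally likely head/tail sequences for a fair coin

p_exactly_15_heads = comb(FLIPS, 15) / outcomes
p_at_least_15_heads = sum(comb(FLIPS, k) for k in range(15, FLIPS + 1)) / outcomes

# Expected number of coins (out of 10,000) with each extreme outcome.
expected_15_heads = COINS * p_exactly_15_heads
expected_15_tails = COINS * comb(FLIPS, 2) / outcomes  # 15 tails = 2 heads

print(f"P(exactly 15 heads)  = {p_exactly_15_heads:.5f}")   # 0.00104
print(f"P(at least 15 heads) = {p_at_least_15_heads:.5f}")  # 0.00117
print(f"expected coins with 15 heads: {expected_15_heads:.1f}")  # 10.4
print(f"expected coins with 15 tails: {expected_15_tails:.1f}")  # 10.4
```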

What Do We Do With That?
Mixture model: imagine null (unchanging) genes as light blue marbles and truly different genes as red ones, stacked according to their p-values. If the assay is decent, the red marbles should be concentrated at the low p-values.

[Figure: marbles stacked by p-value, from 0 to 1.]
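The marble picture can be simulated. In this minimal sketch the mixture proportion (80% null) and the Beta(0.1, 1) distribution used for the red marbles' p-values are assumptions chosen for illustration, not values from the slides. The text histogram shows red marbles piling up near 0 above a roughly flat floor of light blue ones.

```python
import random


def simulate_marbles(n_genes=10_000, frac_null=0.8, alt_shape=0.1, seed=1):
    """Return (p-value, is_null) pairs for a hypothetical two-color mixture.

    Light blue (null) marbles get uniform p-values; red (truly different)
    marbles get Beta(alt_shape, 1) p-values, which pile up near 0 when
    alt_shape < 1. All parameter values are illustrative assumptions.
    """
    rng = random.Random(seed)
    marbles = []
    for _ in range(n_genes):
        if rng.random() < frac_null:
            marbles.append((rng.random(), True))
        else:
            marbles.append((rng.betavariate(alt_shape, 1.0), False))
    return marbles


def histogram(pvals, bins=10):
    """Count p-values in equal-width bins over [0, 1]."""
    counts = [0] * bins
    for p in pvals:
        counts[min(int(p * bins), bins - 1)] += 1
    return counts


marbles = simulate_marbles()
counts = histogram([p for p, _ in marbles])
for i, c in enumerate(counts):
    # One '#' per 50 marbles in the bin.
    print(f"{i / 10:.1f}-{(i + 1) / 10:.1f} {c:5d} {'#' * (c // 50)}")
```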

Method We Can Use
Of course, we don't know the colors of the marbles, that is, we don't know which genes are true alternates. However, we do know that a null marble is equally likely to have any p-value (null p-values are uniformly distributed between 0 and 1). So, beyond the p-value where the height of the stacks levels off, the marbles are primarily light blue (null genes). Why?

[Figure: p-value histogram from 0 to 1, marking where the flat region starts and the level of the flat region.]

Answer
Because if all genes/marbles were null, the heights would be roughly uniform across all p-values, whereas red marbles pile up near 0. So, the flat region consists primarily of light blue marbles (genes that are not differentially expressed).
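The level of the flat region can be turned into an estimate of the fraction of null genes, π0. Because null p-values are uniform, about π0 · m · (1 − λ) of the m genes should have p > λ if everything above λ is flat; solving for π0 gives the estimate below. This is a simplified sketch with a single fixed λ = 0.5 (an assumption; the paper chooses λ more carefully), and the simulated mixture is hypothetical.

```python
import random


def estimate_pi0(pvals, lam=0.5):
    """Estimate the fraction of null genes from the flat part of the histogram.

    p-values above `lam` are assumed to be (mostly) nulls, which should number
    about pi0 * m * (1 - lam). The fixed lam = 0.5 is an assumption.
    """
    m = len(pvals)
    above = sum(1 for p in pvals if p > lam)
    return min(1.0, above / (m * (1.0 - lam)))


rng = random.Random(2)
# Hypothetical mixture: 8,000 null genes (uniform p-values) and
# 2,000 truly different genes (Beta(0.1, 1) p-values, piled up near 0).
pvals = [rng.random() for _ in range(8_000)]
pvals += [rng.betavariate(0.1, 1.0) for _ in range(2_000)]

pi0 = estimate_pi0(pvals)
print(f"estimated fraction null: {pi0:.3f} (true value 0.800)")
print(f"estimated number of nulls: {pi0 * len(pvals):.0f}")
```

The estimate tends to land slightly above the true fraction, because a few red marbles also fall in the flat region; that errs on the side of overstating the false discovery rate.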

Example: all null
Consider the all-null case: every marble is light blue. The false discovery rate in the region to the left of the flat region is the estimated number of light blue marbles there (based on the level of the flat region) divided by the number of marbles to the left of the flat region. This will be close to 100%.

[Figure: p-value histogram from 0 to 1, marking where the flat region starts and the level of the flat region.]
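A minimal sketch of the all-null case. The estimated false discovery rate for declaring every gene with p <= threshold significant is the flat level extended to the left (about π0 · m · threshold nulls) divided by the number of genes declared; the fixed λ = 0.5 for the flat region and the threshold of 0.05 are assumptions.

```python
import random


def estimated_fdr(pvals, threshold, lam=0.5):
    """Estimated FDR if every gene with p <= threshold is declared alternate."""
    m = len(pvals)
    # Level of the flat region (p > lam), expressed as a fraction of nulls.
    pi0 = min(1.0, sum(p > lam for p in pvals) / (m * (1.0 - lam)))
    declared = sum(p <= threshold for p in pvals)
    # Extend the flat line to the left: about pi0 * m * threshold nulls.
    return min(1.0, pi0 * m * threshold / declared) if declared else 0.0


rng = random.Random(3)
all_null = [rng.random() for _ in range(10_000)]  # every marble is light blue
fdr = estimated_fdr(all_null, threshold=0.05)
print(f"estimated FDR, all-null case: {fdr:.2f}")
```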

Example: all non-null
Consider the all non-null case: every marble is red, and the p-values are highly skewed toward 0. The flat region is at essentially zero height. The false discovery rate in the region to the left of the flat region is the estimated number of light blue marbles there (based on the flat region) divided by the number of marbles to the left of the flat region. This will be close to 0.

[Figure: p-value histogram from 0 to 1, marking where the flat region starts.]
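The same estimate applied to the all non-null case. "Highly skewed" is modeled here, as an assumption, by drawing every p-value from Beta(0.02, 1), which puts almost all of the mass very close to 0; λ = 0.5 and the threshold of 0.05 are also assumptions.

```python
import random


def estimated_fdr(pvals, threshold, lam=0.5):
    """Estimated FDR if every gene with p <= threshold is declared alternate."""
    m = len(pvals)
    # Level of the flat region (p > lam), expressed as a fraction of nulls.
    pi0 = min(1.0, sum(p > lam for p in pvals) / (m * (1.0 - lam)))
    declared = sum(p <= threshold for p in pvals)
    # Extend the flat line to the left: about pi0 * m * threshold nulls.
    return min(1.0, pi0 * m * threshold / declared) if declared else 0.0


rng = random.Random(4)
# Hypothetical all-red case: every gene is truly different.
all_alt = [rng.betavariate(0.02, 1.0) for _ in range(10_000)]
fdr = estimated_fdr(all_alt, threshold=0.05)
print(f"estimated FDR, all non-null case: {fdr:.4f}")
```

With less skewed alternatives, more red marbles would land in the flat region and the estimate would sit further above 0.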

Example: mixed case
Get the distribution of p-values and find the flat region. Estimate the number of nulls in the region to the left of the flat region by extending the flat line leftward. Dividing that estimate by the number of genes in that region gives the false discovery rate.

[Figure: number of genes having each p-value (0 to 1), with a possible p-value threshold and the flat line marking the base level of nulls.]
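In a simulation the recipe can be checked against the truth, because each marble's color is known. A minimal sketch, assuming a hypothetical mixture of 8,000 null genes (uniform p-values) and 2,000 truly different genes (Beta(0.1, 1) p-values), a flat region starting at λ = 0.5, and a threshold of 0.01:

```python
import random


def estimated_fdr(pvals, threshold, lam=0.5):
    """Estimated FDR if every gene with p <= threshold is declared alternate.

    The flat level above `lam` gives pi0 (fraction of nulls); extending that
    flat line to the left predicts pi0 * m * threshold nulls at or below the
    threshold. The fixed lam = 0.5 is an assumption for this sketch.
    """
    m = len(pvals)
    pi0 = min(1.0, sum(p > lam for p in pvals) / (m * (1.0 - lam)))
    declared = sum(p <= threshold for p in pvals)
    return min(1.0, pi0 * m * threshold / declared) if declared else 0.0


rng = random.Random(5)
# Hypothetical mixed case: (p-value, is_null) pairs.
genes = [(rng.random(), True) for _ in range(8_000)]
genes += [(rng.betavariate(0.1, 1.0), False) for _ in range(2_000)]
pvals = [p for p, _ in genes]

threshold = 0.01
declared_null_flags = [is_null for p, is_null in genes if p <= threshold]
true_fdr = sum(declared_null_flags) / len(declared_null_flags)
est = estimated_fdr(pvals, threshold)
print(f"threshold {threshold}: estimated FDR {est:.3f}, true FDR {true_fdr:.3f}")
```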

Example: mixed case
What would you estimate the false discovery rate to be if we declared everything to the left of the possible p-value threshold to be significant? 10%, 25%, 50%?

[Figure: number of genes having each p-value (0 to 1), with a possible p-value threshold and the flat line marking the base level of nulls.]

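Obtaining q-values from the False Discovery Rate
Suppose we order genes from least p-value to greatest, which is the order along the x-axis of these graphs. The q-value of a gene having p-value p is the false discovery rate we would get if the declared significance region had a threshold of p, that is, if every gene with p-value <= p were declared significant. (Strictly, Storey and Tibshirani define the q-value as the minimum false discovery rate over all thresholds >= p, which keeps q-values from decreasing as p-values increase.)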

[Figure: number of genes having each p-value (0 to 1), with the flat line marking the base level of nulls; the q-value of a gene with the marked p-value is the FDR if that p-value is the significance threshold.]
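The q-value recipe can be sketched in a few lines. It follows the spirit of Storey and Tibshirani (estimate the null fraction π0 from the flat region, then take the minimum estimated FDR over all thresholds at or above each p-value), but uses a single fixed λ = 0.5 for the flat region, which is a simplifying assumption. The eight p-values are hypothetical and far too few for a real π0 estimate; they only keep the arithmetic easy to follow.

```python
def q_values(pvals, lam=0.5):
    """Sketch of q-values: q(p_i) = min over t >= p_i of pi0 * m * t / #{p <= t}.

    pi0 comes from the flat region above `lam` (fixed at 0.5 here, an
    assumption). Returns q-values in the same order as `pvals`.
    """
    m = len(pvals)
    pi0 = min(1.0, sum(p > lam for p in pvals) / (m * (1.0 - lam)))
    order = sorted(range(m), key=lambda i: pvals[i])
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down; rank = number of genes with p <= p_i,
    # so each q is the smallest estimated FDR among thresholds >= p_i.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        fdr = pi0 * m * pvals[i] / rank
        running_min = min(running_min, fdr)
        q[i] = running_min
    return q


pvals = [0.001, 0.004, 0.005, 0.02, 0.3, 0.45, 0.7, 0.9]
q = q_values(pvals)
print([round(x, 4) for x in q])
# → [0.004, 0.0067, 0.0067, 0.02, 0.24, 0.3, 0.4, 0.45]
```

The gene with p = 0.004 gets q ≈ 0.0067 rather than 0.008: declaring everything up to p = 0.005 gives a lower estimated FDR, so the minimum step pulls its q-value down. With π0 fixed at 1, the same computation gives Benjamini-Hochberg adjusted p-values.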

Lessons for Research
Mushy p-values (large error bars, few replicates) may force us to choose a threshold at the far left (a very small p-value) in order to get a low false discovery rate. This may eliminate genes of interest. If testing out a gene is not too expensive, then we can accept a higher false discovery rate: there is nothing magical about 0.01.

[Figure: number of genes having each p-value (0 to 1), with the flat line marking the base level of nulls. Caption: better p-values avoid losing genes at a small false discovery rate.]
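The lesson can be illustrated by simulation. In this sketch, "sharp" p-values (Beta(0.05, 1) for the truly different genes) stand in for a well-replicated experiment and "mushy" ones (Beta(0.5, 1)) for a noisy one; both shapes, the 80/20 mixture, and the fixed λ = 0.5 are assumptions. The sketch counts how many genes can be declared at several target false discovery rates.

```python
import random


def q_values(pvals, lam=0.5):
    """q-values with pi0 from a fixed flat-region cutoff lam (an assumption)."""
    m = len(pvals)
    pi0 = min(1.0, sum(p > lam for p in pvals) / (m * (1.0 - lam)))
    order = sorted(range(m), key=pvals.__getitem__)
    q, running_min = [0.0] * m, 1.0
    for rank in range(m, 0, -1):  # largest p-value first
        i = order[rank - 1]
        running_min = min(running_min, pi0 * m * pvals[i] / rank)
        q[i] = running_min
    return q


def discoveries(alt_shape, target_fdr, seed=6):
    """Number of genes with q <= target_fdr in a hypothetical mixture of
    8,000 nulls and 2,000 alternates with Beta(alt_shape, 1) p-values."""
    rng = random.Random(seed)
    pvals = [rng.random() for _ in range(8_000)]
    pvals += [rng.betavariate(alt_shape, 1.0) for _ in range(2_000)]
    return sum(q <= target_fdr for q in q_values(pvals))


# "sharp" stands in for well-replicated data, "mushy" for noisy data.
for label, shape in [("sharp", 0.05), ("mushy", 0.5)]:
    counts = {t: discoveries(shape, t) for t in (0.01, 0.05, 0.2)}
    print(label, counts)
```

Under these assumptions the mushy experiment yields far fewer genes at a strict target, and relaxing the target recovers more of them.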