1
Q-Vals (and False Discovery Rates) Made Easy
Dennis Shasha
Based on the paper "Statistical significance for genomewide studies" by John Storey and Robert Tibshirani, PNAS, August 5,
2
Challenge
You test plants/patients/… in two settings (or from different populations). You want to know which genes are differentially expressed (alternate). You don't want to make too many mistakes, i.e., declaring a gene to be alternate when in fact it is null (not differentially expressed).
3
First Idea
You take p-vals of the differences in expression.
P-val(g) is the probability that, if g is unaffected by the treatment, it would show a difference at least this large. You choose a cutoff, say 0.05, and declare every gene with p-val <= 0.05 to be truly different. What's the problem?
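The slides don't say which statistical test produces the p-values. One illustrative assumption is an exact permutation test, which implements the definition above directly. If g is unaffected, the setting labels are interchangeable. The p-value is then the fraction of relabelings whose difference in means is at least as large as the observed one. The function name `permutation_pval` and the expression values are hypothetical.

```python
from itertools import combinations
from statistics import mean


def permutation_pval(control, treated):
    """Two-sided exact permutation p-value for a difference in mean expression.

    If the gene is unaffected by the treatment, the setting labels are
    interchangeable, so every split of the pooled values into groups of the
    original sizes is equally likely.
    """
    pooled = list(control) + list(treated)
    observed = abs(mean(treated) - mean(control))
    extreme = total = 0
    for idx in combinations(range(len(pooled)), len(treated)):
        chosen = set(idx)
        t = [pooled[i] for i in idx]
        c = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        # Count relabelings with a difference at least as large as observed
        # (small tolerance so floating-point ties still count).
        if abs(mean(t) - mean(c)) >= observed - 1e-12:
            extreme += 1
        total += 1
    return extreme / total


# Hypothetical expression values for one gene, three replicates per setting.
p = permutation_pval([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])
print(p)  # 0.1: only 2 of the 20 possible splits are this extreme
```

With three replicates per setting, the smallest achievable two-sided p-value under this test is 2/20 = 0.1. Few replicates limit how small p-values can get.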
4
Thought Experiment
Suppose we apply a treatment that has no true effect.
A p-val of 5% means a null gene has a 5% probability of changing by at least that amount, so approximately 5% of the genes will change that much by chance alone. Every one of them is null, so your false discovery rate (number of null genes among those predicted to be alternate / number predicted to be alternate) = 100%. Bad.
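A quick simulation of this thought experiment follows. It assumes, as is standard for a continuous test statistic, that a null gene's p-value is uniform between 0 and 1. The gene count of 10,000 and the random seed are arbitrary choices, not from the slides.

```python
import random


def simulate_no_effect(n_genes=10_000, cutoff=0.05, seed=0):
    """Treatment with no true effect: every gene is null.

    Assumes each null gene's p-value is uniform on [0, 1].
    Returns (number declared alternate, false discovery rate).
    """
    rng = random.Random(seed)
    pvals = [rng.random() for _ in range(n_genes)]
    declared = sum(p <= cutoff for p in pvals)
    false_discoveries = declared  # every gene is null, so every call is false
    fdr = false_discoveries / declared if declared else 0.0
    return declared, fdr


declared, fdr = simulate_no_effect()
print(declared, fdr)  # about 500 genes declared (5% of 10,000), FDR 1.0
```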
5
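Second Example
You take 10,000 fair coins and give each one a different color. You flip each one 17 times. Approximately 10 of them get exactly 15 heads and another 10 get exactly 15 tails. The p-value of each is very small (well below 0.01). Declaring those colors to have a significant reaction would nevertheless be folly: with 10,000 coins, a handful of such lopsided runs is expected by chance alone.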
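The numbers on this slide can be checked with the binomial distribution, using only Python's standard library. The slide does not say whether its p-value is one-sided (at least 15 heads) or two-sided (at least 15 heads or at least 15 tails). The sketch computes both, and that choice is an assumption.

```python
from math import comb

N_COINS, FLIPS, K = 10_000, 17, 15
total = 2 ** FLIPS  # 131072 equally likely head/tail sequences per coin

# Probability that a fair coin shows exactly K heads in FLIPS flips.
p_exact = comb(FLIPS, K) / total
expected_exact = N_COINS * p_exact  # about 10.4 coins: "approximately 10"

# One-sided p-value: probability of K or more heads from a fair coin.
p_one_sided = sum(comb(FLIPS, k) for k in range(K, FLIPS + 1)) / total
# Two-sided: K or more heads, or K or more tails (symmetric for a fair coin).
p_two_sided = 2 * p_one_sided

print(round(expected_exact, 2), round(p_one_sided, 5), round(p_two_sided, 5))
# 10.38 0.00117 0.00235
```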
6
What Do We Do With That?
Mixture model: imagine null (unchanging) genes as light blue marbles and truly different genes as red ones. If the assay is decent, the red marbles should be concentrated at low p-values.
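The marble picture can be simulated as a sketch under stated assumptions. Null ("light blue") p-values are uniform. Alternate ("red") p-values are drawn from a Beta(0.1, 1) distribution, an arbitrary illustrative choice that piles up near 0. The counts of 9,000 null and 1,000 alternate genes are hypothetical.

```python
import random


def simulate_mixture(n_null=9_000, n_alt=1_000, alt_shape=0.1, seed=1):
    """Return (p-value, is_null) pairs from a two-colour marble mixture.

    Null genes ("light blue"): p-values uniform on [0, 1].
    Alternate genes ("red"): p-values from Beta(alt_shape, 1), which is
    concentrated near 0 when alt_shape < 1 (an illustrative assumption).
    """
    rng = random.Random(seed)
    marbles = [(rng.random(), True) for _ in range(n_null)]
    marbles += [(rng.betavariate(alt_shape, 1.0), False) for _ in range(n_alt)]
    return marbles


def histogram(pvals, n_bins=20):
    """Count p-values in n_bins equal-width bins over [0, 1]."""
    counts = [0] * n_bins
    for p in pvals:
        # min() keeps p == 1.0 in the last bin.
        counts[min(int(p * n_bins), n_bins - 1)] += 1
    return counts


marbles = simulate_mixture()
counts = histogram([p for p, _ in marbles])
print(counts)  # first bin is tall; the rest level off near 9,000 / 20 = 450
```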
7
[Figure: marble heights by p-value; x-axis: Pval, from 0 to 1]
8
Method We Can Use
Of course, we don't know the colors of the marbles; that is, we don't know which genes are true alternates. However, we do know that null marbles are equally likely to have any p-value (null p-values are uniformly distributed between 0 and 1). So, in the region where the height of the marbles levels off, we have primarily light blue marbles (null genes). Why?
9
[Figure: marble heights by p-value, annotated "Flat region starts here" and "Level of flat region"; x-axis: Pval, from 0 to 1]
10
Answer
Because if all genes/marbles were null, the heights would be roughly uniform across p-values, while the red marbles pile up at low p-values. So the flat region will be primarily light blues (not differentially expressed).
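The "level of the flat region" can be turned into a number. This is a minimal sketch that assumes the flat region is everything to the right of a cutoff `lam`. The value 0.5 is a hypothetical choice, mirroring the λ tuning parameter Storey and Tibshirani use when estimating the proportion of null genes. Extending that level across the whole axis from 0 to 1 estimates the total number of nulls.

```python
def estimate_null_count(pvals, lam=0.5):
    """Estimate how many genes are null from the flat right-hand region.

    Assumes p-values above `lam` come almost entirely from null genes, whose
    p-values are uniform. If n_flat genes fall in (lam, 1], extending that
    flat level across all of [0, 1] implies about n_flat / (1 - lam) nulls.
    """
    if not 0.0 <= lam < 1.0:
        raise ValueError("lam must be in [0, 1)")
    n_flat = sum(p > lam for p in pvals)
    return n_flat / (1.0 - lam)


# Evenly spread p-values: a perfectly flat histogram, i.e. all null.
uniform_pvals = [(i + 0.5) / 1000 for i in range(1000)]
print(estimate_null_count(uniform_pvals))  # 1000.0
```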
11
Example: all null
Consider the all-null case.
All marbles are light blue. The false discovery rate in the region to the left of the flat region is the estimated number of light blue marbles there (based on the level of the flat region) / the number of marbles to the left of the flat region. This will be close to 100%.
12
[Figure: marble heights by p-value, annotated "Flat region starts here" and "Level of flat region"; x-axis: Pval, from 0 to 1]
13
Example: all non-null
Consider the all non-null case.
All marbles are red, and they are highly skewed toward low p-values. The flat region is essentially at zero height. The false discovery rate in the region to the left of the flat region is the estimated number of light blue marbles there (based on the flat region) / the number of marbles to the left of the flat region. This will be close to 0.
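Both extreme cases can be checked with one estimator. The estimated number of nulls at or below a threshold t is the flat level extended leftward, i.e., (estimated total nulls) × t. That figure is divided by the number of genes actually at or below t, matching the estimate used by Storey and Tibshirani. The function name, `lam = 0.5`, and the two synthetic p-value lists are assumptions for illustration.

```python
def estimated_fdr(pvals, threshold, lam=0.5):
    """Estimated FDR if every gene with p <= threshold is declared alternate.

    Numerator: nulls expected at or below `threshold`, i.e. the flat level
    (estimated from p-values above `lam`) extended to the left.
    Denominator: number of genes actually at or below `threshold`.
    """
    n_null = sum(p > lam for p in pvals) / (1.0 - lam)
    declared = sum(p <= threshold for p in pvals)
    if declared == 0:
        return 0.0  # no discoveries, so no false ones (a convention)
    return min(1.0, n_null * threshold / declared)  # an FDR is a proportion


all_null = [(i + 0.5) / 1000 for i in range(1000)]          # flat histogram
all_alternate = [(i + 0.5) / 100_000 for i in range(1000)]  # all p < 0.01

print(estimated_fdr(all_null, 0.05))       # 1.0
print(estimated_fdr(all_alternate, 0.05))  # 0.0
```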
14
[Figure: marble heights by p-value, annotated "Flat region starts here"; x-axis: Pval, from 0 to 1]
15
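Example: mixed case
Get the distribution of p-values and find the flat region.
Estimate the number of nulls in the region to the left of the flat region by extending the flat line leftward to p = 0. That estimate, divided by the number of genes in the region, gives the false discovery rate.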
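In a simulation we know each marble's true color, so we can compare the flat-line estimate with the actual false discovery rate. This sketch uses hypothetical settings: 9,000 uniform nulls, 1,000 alternates drawn from Beta(0.1, 1), `lam = 0.5`, and a threshold of 0.01. The estimate tends to run slightly high, because the few red marbles in the flat region are counted as nulls.

```python
import random


def estimated_fdr(pvals, threshold, lam=0.5):
    """Flat level (from p > lam) extended left, over genes with p <= threshold."""
    n_null = sum(p > lam for p in pvals) / (1.0 - lam)
    declared = sum(p <= threshold for p in pvals)
    return min(1.0, n_null * threshold / declared) if declared else 0.0


# Hypothetical mixture: 9,000 null genes (uniform p-values) and 1,000
# alternates whose p-values pile up near 0 (Beta(0.1, 1), an assumption).
rng = random.Random(2)
genes = [(rng.random(), True) for _ in range(9_000)]
genes += [(rng.betavariate(0.1, 1.0), False) for _ in range(1_000)]
pvals = [p for p, _ in genes]

threshold = 0.01
declared = [is_null for p, is_null in genes if p <= threshold]
true_fdr = sum(declared) / len(declared)  # True counts as 1: nulls among calls
est_fdr = estimated_fdr(pvals, threshold)
print(f"declared={len(declared)} true FDR={true_fdr:.3f} estimated FDR={est_fdr:.3f}")
```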
16
[Figure: histogram with y-axis "Number of genes having pval" and x-axis Pval from 0 to 1; annotated "Possible p-value threshold" and "Flat line; base level of nulls"]
17
Example: mixed case
What would you estimate the false discovery rate to be if we declare the entire area to the left of the possible p-value threshold to be significant? 10%, 25%, 50%?
18
[Figure: histogram with y-axis "Number of genes having pval" and x-axis Pval from 0 to 1; annotated "Possible p-value threshold" and "Flat line; base level of nulls"]
19
Obtaining q-values from False Discovery Rate
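Suppose we order genes from least p-value to greatest. Each position in that order corresponds to a possible threshold on one of these Cartesian graphs. The q-value of a gene having p-value p is the False Discovery Rate if the declared significance region had a threshold of p. (Strictly, Storey and Tibshirani take the minimum of this estimate over all thresholds at least p, so q-values never decrease as p increases.)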
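The ordering idea can be sketched directly. For the gene ranked i-th smallest, the estimated FDR at threshold p_(i) is (estimated nulls) × p_(i) / i. Storey and Tibshirani then take a running minimum from the largest p-value downward, so a gene's q-value is the smallest estimated FDR of any threshold that still includes it. The function name `q_values`, `lam = 0.5`, and the five example p-values are hypothetical.

```python
def q_values(pvals, lam=0.5):
    """q-value for each p-value, returned in the input order.

    1. Estimate the total number of nulls from the flat region (p > lam),
       capped at the number of genes.
    2. For the gene ranked i-th smallest, FDR at threshold p_(i) is
       n_null * p_(i) / i.
    3. Take the running minimum from the largest p-value downward.
    """
    m = len(pvals)
    n_null = min(m, sum(p > lam for p in pvals) / (1.0 - lam))
    order = sorted(range(m), key=lambda i: pvals[i])
    q = [0.0] * m
    running_min = 1.0  # q-values are capped at 1
    for rank in range(m, 0, -1):  # rank m (largest p) down to rank 1
        i = order[rank - 1]
        running_min = min(running_min, n_null * pvals[i] / rank)
        q[i] = running_min
    return q


q = q_values([0.01, 0.04, 0.03, 0.5, 0.9])
print([round(x, 4) for x in q])  # [0.02, 0.0267, 0.0267, 0.25, 0.36]
```

In the example, the gene with p = 0.03 has a raw FDR of 0.03 at its own threshold. The wider threshold p = 0.04 gives a lower estimate (0.0267) and still includes it, so its q-value is 0.0267.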
20
[Figure: histogram with y-axis "Number of genes having pval" and x-axis Pval from 0 to 1, with the flat line marking the base level of nulls. Annotation: "Q-value of a gene having this p-val is the FDR if this is the significance threshold."]
21
Lessons for Research
Mushy p-values (large error bars, few replicates) may force us far to the left to get a low False Discovery Rate, and this may eliminate genes of interest. If testing out a gene is not too expensive, then we can accept a higher False Discovery Rate; there is nothing magical about 0.01.
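Both lessons can be seen in a hypothetical simulation. "Sharp" and "mushy" assays are modeled as alternates with p-values from Beta(0.05, 1) and Beta(0.5, 1) respectively; these distributions are our assumption, not the slides'. For each assay, the sketch counts how many genes can be called at several target FDRs. Counting the largest rank whose estimated FDR meets the target is equivalent to counting genes with q-value at or below the target.

```python
import random


def n_called(pvals, target_fdr, lam=0.5):
    """Number of genes declared when the estimated FDR must be <= target_fdr.

    Tries each observed p-value as the threshold (smallest first) and keeps
    the largest rank whose estimated FDR meets the target; this equals the
    number of genes whose q-value is <= target_fdr.
    """
    m = len(pvals)
    n_null = min(m, sum(p > lam for p in pvals) / (1.0 - lam))
    best = 0
    for rank, p in enumerate(sorted(pvals), start=1):
        if n_null * p / rank <= target_fdr:
            best = rank
    return best


def simulate(alt_shape, seed, n_null=9_000, n_alt=1_000):
    """Hypothetical assay: uniform nulls plus Beta(alt_shape, 1) alternates."""
    rng = random.Random(seed)
    pvals = [rng.random() for _ in range(n_null)]
    # Smaller alt_shape -> alternate p-values hug 0 more tightly.
    pvals += [rng.betavariate(alt_shape, 1.0) for _ in range(n_alt)]
    return pvals


sharp = simulate(alt_shape=0.05, seed=3)  # e.g. many replicates
mushy = simulate(alt_shape=0.5, seed=3)   # e.g. few replicates, large error bars
for target in (0.01, 0.05, 0.10):
    print(target, n_called(sharp, target), n_called(mushy, target))
```

Under these assumptions, the sharp assay recovers hundreds of alternates even at a target FDR of 0.01. The mushy assay recovers only a handful, and relaxing the target helps it somewhat.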
22
[Figure: histogram with y-axis "Number of genes having pval", x-axis Pval from 0 to 1, and the flat line marking the base level of nulls. Caption: Better p-values avoid losing genes at a small False Discovery Rate.]