Statistics. Achim Tresch, UoC / MPIPZ Cologne. treschgroup.de/OmicsModule1415.html


Multiple Testing: Challenge. You test plants/patients/… in two settings (or from different populations). You want to know which, and how many, genes are differentially expressed (alternate). You don't want to make too many mistakes (declaring a gene to be alternate, i.e. differentially expressed, when in fact it is null, i.e. not differentially expressed).

The Multiple Testing Problem. You choose a significance level, say 0.05. You calculate p-values of the differences in expression. The p-value of a gene g is the probability that, if g is null (not differentially expressed), its test statistic (e.g., t-statistic) would be at least as large as the one observed. You declare all genes with p-value ≤ 0.05 to be truly different. What's the problem?

The Multiple Testing Problem: you are testing many genes at the same time. Suppose that you test 10,000 genes, but no genes are truly differentially expressed. At a significance level of 0.05, about 5% of them will still have a p-value ≤ 0.05, so you will find about 500 "significant" genes, all of them false positives. Bad.
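The arithmetic on this slide can be checked with a small simulation. This is only a sketch: it assumes that null p-values are uniform on [0, 1] (the point made on a later slide), and the function name, seed and defaults are illustrative, not part of the lecture.

```python
import random

def count_false_positives(n_genes=10_000, alpha=0.05, seed=0):
    """Simulate n_genes null tests and count p-values at or below alpha."""
    rng = random.Random(seed)
    # Assumption: under the null hypothesis a p-value is uniform on [0, 1].
    p_values = [rng.random() for _ in range(n_genes)]
    return sum(p <= alpha for p in p_values)

n_significant = count_false_positives()
print(f"{n_significant} of 10000 null genes pass p <= 0.05")
```

The count varies with the seed but stays close to 0.05 × 10,000 = 500.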

The Multiple Testing Problem

Bonferroni Correction

Bonferroni Correction (FWER control). Bonferroni controls the probability that our list of differentially expressed genes contains at least one mistake, Pr(at least one null gene found diff. expr.). This probability is the family-wise error rate (FWER). This is very strict.
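As a rough illustration of how strict this is, the sketch below uses the standard Bonferroni rule of testing each of N genes at level α/N. The slide's own formula is not preserved in the transcript, so the function names are hypothetical. The FWER formula in the code additionally assumes independent tests; the Bonferroni bound itself does not need that assumption.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-gene p-value cutoff under the Bonferroni correction."""
    return alpha / n_tests

def fwer_independent(per_test_alpha, n_tests):
    """Pr(at least one false positive) among n_tests independent null tests."""
    return 1.0 - (1.0 - per_test_alpha) ** n_tests

alpha, n_genes = 0.05, 10_000
cutoff = bonferroni_threshold(alpha, n_genes)

print("per-gene cutoff:", cutoff)
print("FWER without correction:", fwer_independent(alpha, n_genes))
print("FWER with Bonferroni:", fwer_independent(cutoff, n_genes))
```

With 10,000 genes, a single gene must reach a p-value of about 5 in a million to be reported, which is why many truly different genes are lost.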

False Discovery Rate (FDR) estimation: a fundamental insight. All truly null genes (i.e. not truly differentially expressed) are equally likely to have any p-value. That is by construction of the p-value: under the null hypothesis, 1% of the genes will be in the top percentile, 1% will be between the 89th and 90th percentiles, and so on. The p-value is just a way of stating the percentile under the null condition. (Figure: p-value axis from 0 to 1.)
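The uniformity of null p-values can be checked numerically. The sketch below assumes, purely for illustration, that each null gene has a standard-normal test statistic and a two-sided test; the lecture itself mentions t-statistics, and all names here are hypothetical.

```python
import math
import random

def two_sided_p(z):
    """Two-sided p-value of a standard-normal test statistic z."""
    return math.erfc(abs(z) / math.sqrt(2.0))

def null_p_histogram(n_genes=20_000, n_bins=10, seed=1):
    """Fraction of null p-values falling into each of n_bins equal bins."""
    rng = random.Random(seed)
    counts = [0] * n_bins
    for _ in range(n_genes):
        z = rng.gauss(0.0, 1.0)  # test statistic of a null gene
        p = two_sided_p(z)
        counts[min(int(p * n_bins), n_bins - 1)] += 1
    return [c / n_genes for c in counts]

fractions = null_p_histogram()
for i, f in enumerate(fractions):
    print(f"p in [{i / 10:.1f}, {(i + 1) / 10:.1f}): {f:.3f}")
```

Each bin holds roughly 10% of the genes, i.e. the histogram of null p-values is flat.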

False Discovery Rate (FDR) estimation. Idea: The observed p-value distribution is a mixture of null genes (light blue marbles) and truly different genes (red marbles). If the chosen test is appropriate, red marbles should be concentrated at the low p-values. (Figure: marbles along the p-value axis, 0 to 1; legend: differential gene, non-differential gene.)

False Discovery Rate (FDR) estimation. Of course, we don't know the colors of the marbles, i.e. we don't know which genes are true alternates. However, we know that null marbles are equally likely to have any p-value. So, in the p-value range where the height of the marbles levels off, we have primarily light blue marbles (null genes).

We estimate the baseline of null marbles: if all genes/marbles were null, the heights would be about uniform. Provided the reds are concentrated near the low p-values, the flat regions will be primarily light blues (≈ non-differential genes). (Figure: absolute frequency against p-value, 0 to 1.)

False Discovery Rate (FDR) estimation. Subtracting the "baseline" of true null hypotheses (≈ non-differential genes), the remaining balls are primarily red, i.e., they are true alternative hypotheses (≈ differential genes). (Figure: absolute frequency against p-value, 0 to 1.)
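One way to turn the baseline idea into a number is to count the p-values in the flat region and extrapolate to the whole interval. The sketch below does this with a fixed boundary `lam = 0.5`, which resembles the null-fraction estimate used in Storey's q-value approach; the boundary, the simulated data (9,000 null genes, 1,000 shifted genes) and all names are assumptions made for illustration.

```python
import math
import random

def simulate_p_values(n_null, n_alt, shift=3.0, seed=2):
    """Two-sided z-test p-values: null genes have mean 0, alternates mean `shift`."""
    rng = random.Random(seed)
    stats = [rng.gauss(0.0, 1.0) for _ in range(n_null)]
    stats += [rng.gauss(shift, 1.0) for _ in range(n_alt)]
    return [math.erfc(abs(z) / math.sqrt(2.0)) for z in stats]

def estimate_null_fraction(p_values, lam=0.5):
    """Estimate the fraction of null genes from the flat region p > lam."""
    m = len(p_values)
    n_flat = sum(p > lam for p in p_values)
    # Null p-values are uniform, so a share (1 - lam) of them lies above lam.
    return min(1.0, n_flat / (m * (1.0 - lam)))

p_values = simulate_p_values(n_null=9_000, n_alt=1_000)
pi0 = estimate_null_fraction(p_values)
print(f"estimated null fraction: {pi0:.3f} (true value 0.9)")
print(f"baseline height per 0.1-wide bin: {pi0 * len(p_values) * 0.1:.0f} genes")
```

The estimate is slightly conservative whenever a few alternates have large p-values, because they are counted as part of the baseline.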

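False Discovery Rate (FDR) estimation. Given a p-value cutoff, we can estimate the rate of false discoveries (FDR) among the genes that pass this threshold. Counting only genes with p-value below the cutoff:

$$\mathrm{FDR}(p_{\text{cut}}) = \frac{\#\,\text{non-differential genes}}{\#\,\text{non-differential genes} + \#\,\text{differential genes}}$$

(Figure: absolute frequency against p-value, 0 to 1, with the p-value cutoff marked.)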
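The ratio can be sketched in code as follows. The numerator is the expected number of null genes below the cutoff (the baseline), the denominator is the number of genes actually observed below the cutoff. The null fraction `pi0` is taken as given, and the toy data (900 evenly spread null p-values plus 100 small ones) are invented for illustration.

```python
def estimated_fdr(p_values, p_cut, pi0):
    """Estimated share of false discoveries among genes with p <= p_cut."""
    m = len(p_values)
    discoveries = sum(p <= p_cut for p in p_values)
    if discoveries == 0:
        return 0.0  # nothing called significant, so no false discoveries
    # Null genes are uniform, so about pi0 * m * p_cut of them fall below p_cut.
    expected_false = pi0 * m * p_cut
    return min(1.0, expected_false / discoveries)

# Toy data (assumption): 900 null genes spread evenly, 100 alternates near zero.
nulls = [(i + 0.5) / 900 for i in range(900)]
alternates = [(i + 0.5) / 10_000 for i in range(100)]
p_values = nulls + alternates

fdr_at_0_1 = estimated_fdr(p_values, p_cut=0.1, pi0=0.9)
print(f"estimated FDR at p-cut 0.1: {fdr_at_0_1:.2f}")
```

In this toy data set 90 null and 100 alternate genes lie below 0.1, so the estimate is 90 / 190.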

FDR-based p-value cutoff. Given a desired FDR (e.g., 20%), we can find the largest p-value cutoff for which this FDR is achieved. In the example: p-cut 1 = 0.1 gives FDR(p-cut 1) = 9%; p-cut 2 = 0.2 gives FDR(p-cut 2) = 20%; p-cut 3 = 0.7 gives FDR(p-cut 3) = 52%. (Figure: absolute frequency against p-value, 0 to 1, with the baseline of nulls and the three cutoffs marked.)
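The search for the largest admissible cutoff can be sketched by trying every observed p-value as a candidate. The function name, the fixed null fraction `pi0` and the toy data are assumptions; with `pi0 = 1` the rule coincides with the Benjamini-Hochberg step-up procedure, which may or may not be the variant used in the lecture.

```python
def largest_cutoff(p_values, target_fdr, pi0=1.0):
    """Largest observed p-value whose estimated FDR is at most target_fdr.

    Returns None if no cutoff reaches the target.
    """
    m = len(p_values)
    best = None
    for k, p in enumerate(sorted(p_values), start=1):
        # k genes have p-values <= p; about pi0 * m * p of them are null.
        if pi0 * m * p / k <= target_fdr:
            best = p
    return best

# Toy data (assumption): 900 evenly spread null genes, 100 alternates near zero.
nulls = [(i + 0.5) / 900 for i in range(900)]
alternates = [(i + 0.5) / 10_000 for i in range(100)]

cutoff = largest_cutoff(nulls + alternates, target_fdr=0.2, pi0=0.9)
all_null = largest_cutoff([(i + 0.5) / 1000 for i in range(1000)], target_fdr=0.2)
print("cutoff for the mixture:", cutoff)
print("cutoff when every gene is null:", all_null)
```

For the mixture a cutoff exists; when every gene is null, no cutoff reaches a 20% FDR and the function returns `None`.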

Example: All null. Consider the all-null case (all marbles are blue). For any p-value cutoff, the estimated FDR will be close to 100%. For any sensible FDR (substantially below 100%), there will be no suitable p-value cutoff, and the method will not return any gene. Good.

Example: All alternate. Consider the all-alternate case (all marbles are red). For a large range of p-value cutoffs, the estimated FDR will be close to 0. For sensible FDR cutoffs (e.g. 20%), the corresponding p-value cutoff will be high. The method will return many genes. Good.

Conclusions. A flat p-value distribution may force us to the far left in order to get a low False Discovery Rate. This may eliminate genes of interest. If subsequent validation experiments are not too expensive, we can accept a higher False Discovery Rate (e.g., 20%). FDR and significance level are entirely different things!

Gene Set Enrichment

Fisher's exact test, once more

Gene Ontology Example 559

Gene Ontology Example (immune response) (macromolecule biosynthesis)

Kolmogorov-Smirnov Test. Move 1/K up when you see a gene from group a. Move 1/(N-K) down when you see a gene not in group a.
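The two moves describe a running sum over a ranked gene list. The sketch below assumes that N is the total number of ranked genes and K the number of them in group a (the transcript does not define the symbols), and it reports the largest deviation of the running sum from zero as the enrichment score. Gene names are made up.

```python
def running_sum_score(ranked_genes, group_a):
    """Walk down the ranked list and return the largest deviation from zero."""
    n = len(ranked_genes)
    k = sum(g in group_a for g in ranked_genes)
    if k == 0 or k == n:
        raise ValueError("group a must contain some, but not all, ranked genes")
    up, down = 1.0 / k, 1.0 / (n - k)
    running, best = 0.0, 0.0
    for gene in ranked_genes:
        running += up if gene in group_a else -down
        if abs(running) > abs(best):
            best = running
    return best

ranked = ["g1", "g2", "g3", "g4", "g5", "g6", "g7", "g8"]
top_heavy = running_sum_score(ranked, {"g1", "g2", "g3"})
spread_out = running_sum_score(ranked, {"g2", "g5", "g8"})
print("group at the top of the list:", top_heavy)
print("group spread over the list:", spread_out)
```

Because the up-steps sum to 1 and the down-steps sum to 1, the walk ends at zero; a group concentrated at the top of the list produces a large positive excursion, while a group scattered over the list stays close to zero.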

GO scoring: general problem

GO Independence Assumption light yellow GO sets

GO Independence Assumption light yellow

The elim method

Top 10 significant nodes (boxes) obtained with the elim method

Algorithms Summary

Evaluation: Top scoring GO term Significant GO terms in the ALL dataset

Advantages & Disadvantages for ALL

Simulation Study Introduce noise

Simulation Study

Quality of GO scoring methods 10% noise level 40% noise level

Summary