Multiple testing etc. Benjamin Neale Leuven 2008.



Technology advances – Affy 500K

How the Affy Chip works

Illumina’s Chip

Multiple testing approaches
  Bonferroni correction
    Conservative
    Probability of observing at least 1 hit at that level
  False discovery methodology
    Capitalizes on a distribution of positive results
    See work by Benjamini, Hochberg, Storey
  Bayes factor
    Like a P-value, but for Bayesians
    Marginal likelihood of the two sets of parameters
    Correlation approaches 1 under the alternative
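To make the contrast between the first two approaches concrete, here is a minimal sketch of a Bonferroni correction next to the Benjamini-Hochberg step-up procedure. The function names and the p-values are invented for illustration and are not from the slides.

```python
def bonferroni(pvalues, alpha=0.05):
    """Reject test i when p_i <= alpha / m, where m is the number of tests."""
    m = len(pvalues)
    return [p <= alpha / m for p in pvalues]


def benjamini_hochberg(pvalues, q=0.05):
    """Benjamini-Hochberg step-up procedure at false discovery rate q."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    # Find the largest 1-based rank k with p_(k) <= k * q / m.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank * q / m:
            cutoff_rank = rank
    # Reject every hypothesis ranked at or below that cutoff.
    rejected = [False] * m
    for idx in order[:cutoff_rank]:
        rejected[idx] = True
    return rejected


# Hypothetical p-values, purely for illustration.
example_p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
bonf = bonferroni(example_p)
bh = benjamini_hochberg(example_p)
print("Bonferroni rejections:", sum(bonf))
print("Benjamini-Hochberg rejections:", sum(bh))
```

On these made-up values the false discovery procedure rejects one more hypothesis than Bonferroni, which illustrates why Bonferroni is described as conservative.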

Genome-wide significance
  Multiple testing theory requires an estimate of the number of ‘independent tests’
  Risch and Merikangas 1996 estimated a threshold of 10^-6 = 0.05/(5*10000)
  HapMap 2005 estimated 10^-8 based on deep sequencing in ENCODE regions
  Dudbridge and Gusnanto, and Pe’er et al., 2008 Genetic Epidemiology: estimate based on ‘infinite density’, like Lander and Kruglyak 1995, gives 5x10^-8
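The arithmetic behind these thresholds can be checked in a few lines. This is a sketch only: the figure of one million independent tests is an assumption used here to reproduce 5x10^-8 as a Bonferroni threshold, and is not stated on the slide.

```python
import math

alpha = 0.05

# Risch and Merikangas 1996, as written on the slide: 0.05 / (5 * 10000).
risch_merikangas = alpha / (5 * 10000)

# Assumption (not on the slide): about one million independent tests.
assumed_independent_tests = 1_000_000
genome_wide = alpha / assumed_independent_tests


def familywise_error(per_test_alpha, n_tests):
    """P(at least one false positive) over n independent tests of true nulls."""
    # 1 - (1 - a)^n, computed with log1p/expm1 to stay accurate for tiny a.
    return -math.expm1(n_tests * math.log1p(-per_test_alpha))


print("Risch and Merikangas threshold:", risch_merikangas)
print("Threshold for 1e6 independent tests:", genome_wide)
print("Family-wise error at that threshold:",
      familywise_error(genome_wide, assumed_independent_tests))
```

Under the independence assumption, the family-wise error at the per-test threshold comes out just under 0.05, which is the sense in which Bonferroni is slightly conservative.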

Guessing game
  I’ll buy a beer for the closest guess
  We’re each going to simulate a single chi square
  There are about 25 groups

Distribution of chi squares
  [Figure: 1 df chi square expectation; distribution of maximum 1 df chi square from 25 trials]
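The comparison on this slide can be reproduced with a small simulation. The sketch below uses only the Python standard library. The seed, the number of replicates, and the helper names are arbitrary choices for illustration. A 1 df chi square is simulated as the square of a standard normal draw, and the median of the maximum of 25 independent draws is compared with its closed-form value.

```python
import random
from statistics import NormalDist, median


def simulate_chisq_1df(rng):
    """One 1-df chi-square draw: the square of a standard normal deviate."""
    z = rng.gauss(0.0, 1.0)
    return z * z


def chisq_1df_quantile(p):
    """Quantile of a 1-df chi-square, using P(Z^2 <= x) = 2*Phi(sqrt(x)) - 1."""
    return NormalDist().inv_cdf((1.0 + p) / 2.0) ** 2


n_groups = 25          # "about 25 groups" on the previous slide
n_replicates = 20000   # arbitrary number of repeated experiments
rng = random.Random(2008)

# Each replicate: 25 groups simulate one chi-square each; keep the largest.
maxima = [
    max(simulate_chisq_1df(rng) for _ in range(n_groups))
    for _ in range(n_replicates)
]

# P(max <= x) = F(x)^25, so the median of the maximum solves F(x) = 0.5^(1/25).
theoretical_median_max = chisq_1df_quantile(0.5 ** (1.0 / n_groups))

print("Median of a single 1-df chi-square:", chisq_1df_quantile(0.5))
print("Simulated median of the maximum of 25:", median(maxima))
print("Theoretical median of the maximum of 25:", theoretical_median_max)
```

The largest of 25 statistics is typically far above what a single test would produce, which is the point of the guessing game.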

QQ plotting
  [Figure: QQ plot built up step by step; axes: Observation, Expectation]

QQ plotting
  Above the line indicates inflation of the test statistic
  [Figure axes: Observation, Expectation]

QQ plotting
  Below the line indicates deflation of the test statistic
  [Figure axes: Observation, Expectation]
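The points of such a QQ plot can be computed without any plotting library. This is a rough sketch under stated assumptions: the plotting positions (i - 0.5)/n, the simulated data, and the 1.2 inflation factor are all illustrative choices, and the median ratio (commonly called the genomic inflation factor) is a standard summary that does not appear on the slides.

```python
import random
from statistics import NormalDist, median


def chisq_1df_quantile(p):
    """Quantile of a 1-df chi-square, using P(Z^2 <= x) = 2*Phi(sqrt(x)) - 1."""
    return NormalDist().inv_cdf((1.0 + p) / 2.0) ** 2


def qq_points(observed):
    """Pair each expected 1-df chi-square quantile with a sorted observation."""
    n = len(observed)
    expected = [chisq_1df_quantile((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(expected, sorted(observed)))


def median_ratio(observed):
    """Observed median over the expected median; above 1 suggests inflation."""
    return median(observed) / chisq_1df_quantile(0.5)


rng = random.Random(1)
# Simulated null statistics, and a hypothetical copy inflated by 20 percent.
null_stats = [rng.gauss(0.0, 1.0) ** 2 for _ in range(20000)]
inflated_stats = [1.2 * x for x in null_stats]

points = qq_points(inflated_stats)
above_line = sum(1 for expected, observed in points if observed > expected)

print("Median ratio, null:", median_ratio(null_stats))
print("Median ratio, inflated:", median_ratio(inflated_stats))
print("Points above the line, inflated:", above_line, "of", len(points))
```

Plotting the second element of each pair against the first gives the QQ plot. With inflated statistics most points sit above the line of equality, and with deflated statistics most sit below it.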

Plotting the results in R
  CHANGE DIR… This is the menu item to use to change the directory where the PLINK results file is found
  Note: you must have the R console highlighted

Picture of the dialog box
  Either type the path name or browse to where you saved plink.cmh

Screenshot of source code selection
  This is the file qqplot.R for the source code