Statistical Methods for Rare Variant Association Test Using Summarized Data
Qunyuan Zhang, Ingrid Borecki, Michael A. Province
Division of Statistical Genomics

Presentation transcript:


Motivation
Next generation sequencing => rare variants
Two types of data:
Individual level: one row per subject, with the genotype at each variant (V1, V2, V3, ...) and the trait (case or control).
Summarized level (pooled DNA sequencing; public data used as control):
  Variant                    V1   V2   V3
  Variant No. in cases       10    8    3
  Variant No. in controls     2    0    1
  No. of cases: 300
  No. of controls: 500

Existing Methods
Method: description; bi-directional effects; reference
EFT (Exclusive Frequency Test): testing mutually exclusive allele/carrier freq.; ×; commonly used in publications, such as Cohen et al., 2004
TFT (Total Frequency Test): testing total allele/carrier freq.; ×
CAST (Cohort Allele Sum Test): testing total allele/carrier number; ×; Morgenthaler & Thilly, 2006
C-alpha: testing variance; √; Neale et al., 2011

QQ Plots of Existing Methods (under the null)
Panels: CAST, C-alpha, EFT, TFT
EFT and C-alpha: inflated with false positives.
TFT and CAST: no inflation, but they assume a single effect direction.
Objective: more general, powerful methods.

Structure of Summarized Data
One table per variant: variant 1, variant 2, variant 3, ..., variant i, ..., variant k.
Strategy: instead of testing total freq./number, we test the randomness of all tables.

Exact Probability Test (EPT)
1. Calculating the probability of each table based on the hypergeometric distribution
2. Calculating the log-transformed joint probability (L) for all k tables
3. Enumerating all possible tables and L scores
4. Calculating p-value P= Prob.( )
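The four EPT steps can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation. The slide's p-value formula did not survive in the transcript, so the sketch assumes P is the total null probability of all table sets whose joint log-probability L is no larger than the observed L. It also assumes counts are carrier counts with fixed margins. The function names are hypothetical.

```python
from itertools import product
from math import comb, exp, log


def table_logprob(a, m, n_cases, n_controls):
    """Step 1: hypergeometric log-probability that a of m carriers are cases."""
    return (log(comb(n_cases, a)) + log(comb(n_controls, m - a))
            - log(comb(n_cases + n_controls, m)))


def ept_pvalue(case_counts, control_counts, n_cases, n_controls):
    """Exact p-value over k per-variant tables (assumed definition, see lead-in)."""
    totals = [a + b for a, b in zip(case_counts, control_counts)]

    # Step 2: observed joint log-probability L, summed over the k tables.
    l_obs = sum(table_logprob(a, m, n_cases, n_controls)
                for a, m in zip(case_counts, totals))

    # Step 3: log-probabilities of every possible table for each variant.
    per_variant = []
    for m in totals:
        lo, hi = max(0, m - n_controls), min(m, n_cases)
        per_variant.append([table_logprob(a, m, n_cases, n_controls)
                            for a in range(lo, hi + 1)])

    # Step 4: add up the probability of all table sets with L <= observed L.
    p_value = 0.0
    for combo in product(*per_variant):
        score = sum(combo)
        if score <= l_obs + 1e-9:  # small tolerance for floating-point ties
            p_value += exp(score)
    return min(p_value, 1.0)


if __name__ == "__main__":
    # Illustrative counts only: three variants, 300 cases, 500 controls.
    print(ept_pvalue([10, 8, 3], [2, 0, 1], 300, 500))
```

Full enumeration grows exponentially with the number of variants k, which is consistent with the extra computer time the conclusions attribute to EPT.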

Likelihood Ratio Test (LRT) Binomial distribution
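The LRT formulas did not survive in the transcript, so the following Python sketch shows only one plausible binomial formulation and should not be read as the authors' exact test. It assumes each variant's count is binomial with one trial per subject, compares a shared frequency (null) against separate case and control frequencies (alternative), sums the statistic over the k variants, and refers it to a chi-square distribution with k degrees of freedom. All function names are hypothetical.

```python
from math import erfc, exp, lgamma, log, sqrt


def binom_loglik(x, n, p):
    """Binomial log-likelihood without the constant term; 0*log(0) counts as 0."""
    ll = 0.0
    if x > 0:
        ll += x * log(p)
    if n - x > 0:
        ll += (n - x) * log(1.0 - p)
    return ll


def chi2_sf(x, df):
    """Upper tail of the chi-square distribution for integer df."""
    if x <= 0:
        return 1.0
    h = x / 2.0
    # Closed-form starting point, then the incomplete-gamma recurrence.
    a, q = (1.0, exp(-h)) if df % 2 == 0 else (0.5, erfc(sqrt(h)))
    while a < df / 2.0:
        q += exp(a * log(h) - h - lgamma(a + 1.0))
        a += 1.0
    return min(q, 1.0)


def lrt_pvalue(case_counts, control_counts, n_cases, n_controls):
    """Assumed LRT: per-variant binomial likelihood ratio, summed over variants."""
    stat = 0.0
    for a, b in zip(case_counts, control_counts):
        p0 = (a + b) / (n_cases + n_controls)  # shared frequency (null)
        p1, p2 = a / n_cases, b / n_controls   # separate frequencies (alternative)
        l0 = binom_loglik(a, n_cases, p0) + binom_loglik(b, n_controls, p0)
        l1 = binom_loglik(a, n_cases, p1) + binom_loglik(b, n_controls, p2)
        stat += 2.0 * (l1 - l0)
    return chi2_sf(max(stat, 0.0), len(case_counts))


if __name__ == "__main__":
    # Illustrative counts only: three variants, 300 cases, 500 controls.
    print(lrt_pvalue([10, 8, 3], [2, 0, 1], 300, 500))
```

Because the statistic has a closed form and needs no enumeration, a test of this kind runs in time linear in k, at the cost of relying on a large-sample approximation.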

Q-Q Plots of EPT and LRT (under the null)
Panels: EPT N=500, EPT N=3000, LRT N=500, LRT N=3000

Power Comparison significance level=
Variant proportion: positive causal 80%, neutral 20%, negative causal 0%
Plot axes: power, sample size

Power Comparison significance level=
Variant proportion: positive causal 60%, neutral 20%, negative causal 20%
Plot axes: power, sample size

Power Comparison significance level=
Variant proportion: positive causal 40%, neutral 20%, negative causal 40%
Plot axes: power, sample size

Power Comparison: individual-level data vs. summarized data
N=1000, significance level=
Plot axes: power, variant proportion positive : neutral : negative (%)
Individual-level methods shown: CMC (Li & Leal, 2008), SKAT (Wu et al., 2011)

Application
-LOG10 p-values of 933 cancer-related genes
Cases: 460 ovarian cancer cases, germline exome data, from TCGA
Controls: ~3500 individuals, exome data, from NHLBI

Conclusions  EFT and C-alpha produce inflated p-value.  TFT and CAST produce correct p-value, but lose power in detecting bi-directional effects.  EPT produces correct p-value and maintains power regardless of effect directions, more computer time.  LRT produces slightly biased p-value for small N, can be improved by larger N, similar power of EPT, less computer time, a good alternative for large datasets.  If no confounders need to be modeled, there is no significant loss of power in the use of summarized data

Acknowledgements
Dr. Li Ding, Charles Lu, Krishna-Latha Kanchi (for providing the TCGA and NHLBI exome data)