Sequence Kernel Association Tests for the Combined Effect of Rare and Common Variants. Journal club (Nov/13), SH Lee.



Introduction
- Sequence data
  - Rare and unidentified variants
- Groupwise association tests
  - Omnibus tests
  - Burden test, CMC test, SKAT test
- Up-weighting for rare, down-weighting for common
- Rare/common variants tested separately

Introduction
- This study develops a joint test of rare and common variants
  - Combining burden/SKAT tests for rare/common variants
- Can be applied to
  - Whole-exome sequencing + GWAS
  - Deep resequencing of GWAS loci
- Can analyse all variants, including rare, low-frequency and common variants
- Simulation (type I error, power)
- Real data: Crohn's disease (CD) and autism

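Materials and Methods
- Definition of rare/common by fixed MAF cut-offs
  - < 0.01: rare
  - 0.01 to 0.05: low frequency
  - > 0.05: common
- Or by a cut-off that depends on the sample size n
  - < 1/sqrt(2*n): rare
  - > 1/sqrt(2*n): common
  - n = 500: rare MAF < 0.032
  - n = 10000: rare MAF < 0.007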
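The sample-size-dependent cut-off above is simple arithmetic, and a minimal sketch makes the two quoted cases easy to check. The function names `rare_maf_threshold` and `classify` are illustrative only and do not come from the slides or the paper.

```python
import math

def rare_maf_threshold(n):
    """Cut-off 1/sqrt(2n) separating rare from common variants for n individuals."""
    return 1.0 / math.sqrt(2 * n)

def classify(maf, n):
    """Label a variant 'rare' or 'common' using the 1/sqrt(2n) cut-off."""
    return "rare" if maf < rare_maf_threshold(n) else "common"

for n in (500, 10000):
    print(n, round(rare_maf_threshold(n), 4))
# 500 0.0316
# 10000 0.0071
```

The same variant can change class with sample size: a MAF of 0.02 counts as rare when n = 500 and as common when n = 10000.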

Materials and Methods
- Testing for the overall effect of rare and common variants
  - Rare: burden test
  - Common: SKAT test
- Weighted-sum statistics
- Fisher's method of combining the p-values

Weighted-sum statistics
- Within a region (e.g. a gene) having m variants
  - g(*) is a linear or logistic link function
  - Alpha is the coefficient vector for the covariates
  - X is the n x m genotype matrix
  - Beta is the vector of regression coefficients, treated as a random variable

Weighted sum score test (variance component score test)
- Take the first derivative of the log-likelihood with respect to the variance component τ
- P-value from a scaled chi-square distribution, κχ²_ν
- κ is the scale parameter, ν is the degrees of freedom
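The slide does not say how κ and ν are obtained. One common choice, assumed here, is to match the first two moments of the null distribution of the statistic when that distribution is a mixture Σ λ_j χ²_1 with known eigenvalues λ_j. The sketch below shows only this moment-matching step; the function name `scaled_chi2_params` is hypothetical.

```python
def scaled_chi2_params(eigenvalues):
    """Moment-match sum_j lambda_j * chi2_1 to kappa * chi2_nu.

    Assumption: the null distribution of the statistic is a chi-square
    mixture with the given eigenvalues (not stated on the slide).
    Mean:     kappa * nu        = sum(lambda)
    Variance: 2 * kappa^2 * nu  = 2 * sum(lambda^2)
    """
    s1 = sum(eigenvalues)
    s2 = sum(lam * lam for lam in eigenvalues)
    kappa = s2 / s1          # scale parameter
    nu = s1 * s1 / s2        # degrees of freedom (need not be an integer)
    return kappa, nu

kappa, nu = scaled_chi2_params([2.0, 1.0])
print(kappa, nu)
```

When all eigenvalues equal 1 the approximation is exact: κ = 1 and ν equals the number of eigenvalues.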

Weighted sum score test (variance component score test)
- Wu et al. (2010) AJHG 86: 929
- Liu et al. (2008) BMC Bioinformatics 8: 292
- Lin (1997) Biometrika 84: 309
- White (1982) Econometrica 50: 1

Weighted sum score test (variance component score test)
- ρ: the correlation between regression coefficients
- If they are perfectly correlated (ρ = 1), the coefficients are all the same after weighting, so one should collapse the variants before running the regression, i.e. the burden test
- If the regression coefficients are unrelated to each other (ρ = 0), one should use SKAT
- Lee et al. (2012) AJHG 91: 224
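The role of ρ can be illustrated with the family of statistics Q_ρ = (1 - ρ) Q_SKAT + ρ Q_burden, which reduces to SKAT at ρ = 0 and to the burden test at ρ = 1. The sketch below uses a made-up toy data set, flat weights, and residuals from an intercept-only model; all of these are assumptions for illustration, and no p-value is computed.

```python
def score_vector(genotypes, residuals):
    """Per-variant scores S_j = sum_i G_ij * (y_i - mu_i)."""
    m = len(genotypes[0])
    return [sum(row[j] * r for row, r in zip(genotypes, residuals))
            for j in range(m)]

def q_rho(genotypes, residuals, weights, rho):
    """Q_rho = (1 - rho) * Q_SKAT + rho * Q_burden."""
    s = score_vector(genotypes, residuals)
    ws = [w * sj for w, sj in zip(weights, s)]
    q_skat = sum(x * x for x in ws)   # sum of squared weighted scores
    q_burden = sum(ws) ** 2           # square of the summed weighted scores
    return (1 - rho) * q_skat + rho * q_burden

# Toy data (hypothetical): 4 individuals x 3 variants, binary phenotype.
G = [[0, 1, 0], [1, 0, 0], [0, 0, 2], [1, 1, 0]]
y = [1, 0, 1, 0]
mu = sum(y) / len(y)                  # intercept-only null model
resid = [yi - mu for yi in y]
w = [1.0, 1.0, 1.0]                   # flat weights

print(q_rho(G, resid, w, 0.0), q_rho(G, resid, w, 1.0))
```

In this toy example the first and third variants have scores of opposite sign, so they cancel in the burden statistic (ρ = 1) but still contribute to SKAT (ρ = 0).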

Burden-C, SKAT-C
- Combined test statistic for rare and common variants
  - Weighting beta(p,1,25) for rare
  - Weighting beta(p,0.5,0.5) for common
- Partitioning rare and common variants
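As a rough illustration of the partition-then-weight step, the sketch below evaluates the two Beta densities named on the slide at each variant's MAF p. It assumes the 1/sqrt(2n) cut-off for the partition and takes the weight to be the density value itself; whether the weight or its square enters the statistic is not specified here. The helper names are hypothetical.

```python
import math

def beta_pdf(p, a, b):
    """Density of the Beta(a, b) distribution at p, for 0 < p < 1."""
    norm = math.gamma(a + b) / (math.gamma(a) * math.gamma(b))
    return norm * p ** (a - 1) * (1 - p) ** (b - 1)

def partition_and_weight(mafs, n):
    """Split variants at 1/sqrt(2n) and attach a Beta-density weight to each.

    Assumption: rare variants get beta(p, 1, 25), common variants get
    beta(p, 0.5, 0.5), as listed on the slide.
    """
    cutoff = 1.0 / math.sqrt(2 * n)
    rare = [(p, beta_pdf(p, 1, 25)) for p in mafs if p < cutoff]
    common = [(p, beta_pdf(p, 0.5, 0.5)) for p in mafs if p >= cutoff]
    return rare, common

rare, common = partition_and_weight([0.001, 0.01, 0.05, 0.3], n=500)
for p, weight in rare + common:
    print(p, round(weight, 3))
```

beta(p,1,25) equals 25(1-p)^24, so it falls steeply as MAF grows and favours the rarest variants, while beta(p,0.5,0.5) is much flatter across the common range.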

Other methods
- Burden-A, SKAT-A
  - Adaptively combining rare/common
  - Searching φ for the minimum p-value
- Burden-F, SKAT-F
  - Fisher's combination method
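Fisher's combination method, used by the -F tests, can be sketched in a few lines. The statistic is X = -2 Σ ln p_i, referred to a chi-square distribution with 2k degrees of freedom for k p-values. Because the degrees of freedom are even, the tail probability has a closed form and only the standard library is needed. The function name `fisher_combine` is illustrative.

```python
import math

def fisher_combine(p_values):
    """Combine p-values with Fisher's method.

    Returns (X, p) where X = -2 * sum(ln p_i) and p is the upper tail of a
    chi-square with 2k degrees of freedom:
    P(chi2_{2k} > x) = exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    """
    k = len(p_values)
    x = -2.0 * sum(math.log(p) for p in p_values)
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half / i      # builds (x/2)^i / i! incrementally
        total += term
    return x, math.exp(-half) * total

# Hypothetical p-values for the rare and common parts of one region.
stat, p_combined = fisher_combine([0.05, 0.05])
print(stat, p_combined)
```

The chi-square reference distribution assumes the combined p-values are independent, which is worth keeping in mind when rare and common variants in the same region are in LD.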

Simulation
- Sequence data on 10,000 haplotypes over a 1 Mb region
- Calibrated model for the European population
- Random sample of a region of 5 or 25 kb and simulated data with individuals
- Proportion of cases in the sample is 0.5

Disease model

Methods

Type I error
- The proposed methods agree with the expectation

Power (separation cut-off)
- Using the Burden-C test
- Power with different separation cut-offs
- 1/sqrt(2n) is used from here on

Power (proposed methods)
- Power for 8 different tests
- The proposed combination tests outperform the others

Power
- Rare/common causal variants (models 1, 2, 3, 6)
  - The combination methods perform better

Power
- Common causal variants (model 5)
  - The combination methods perform better
- Rare causal variants (model 4)
  - The combination methods perform similarly

Power (proposed methods)
- The proposed combination methods outperform CMC for all 6 disease models
- The proposed combination methods outperform the original SKAT for all 6 disease models

Power
- For models 1-4, which include only risk variants
  - SKAT is better than Burden when the proportion of risk variants is small (10%)
  - Burden is better than SKAT when the proportion of risk variants is large (30%)

Power
- Models 1-3, which include both rare and common risk variants
  - SKAT-F is better than Burden-F regardless of the proportion of risk variants
- Model 5, which includes only common risk variants
  - SKAT is better than Burden regardless of the proportion of risk variants

Power
- Adaptive tests (SKAT-A, Burden-A)
  - Perform worse than SKAT-C and Burden-C
- Results for a region of size 5 kb were similar

Real data
- CD NOD2 sequence data
  - 453 cases, 103 controls
  - 60 single nucleotide variations (9 of them have MAF > 0.05)
  - Because only pooled frequency counts were available for each variant, sequencing data were simulated
- Autism LRP2 sequencing data
  - 430 cases, 379 controls

Real data
- The combination methods are more powerful than the others

Discussion
- The proposed combination methods
  - Partitioning rare/common
  - Powerful approach
  - Better than CMC (rare/common partitioning)
  - Better than the original Burden and SKAT tests
  - Can be extended to family-based designs

Discussion
- T1D HLA region
  - SKAT (2.7e-43)
  - Wald test (6.7e-49)
  - Likelihood ratio test (8.9e-221)
- LD between regions
- Multiple different components within a region

Thanks

Linear SKAT vs individual variant test statistics
- Linear SKAT (lower) and the individual variant test (upper) are equivalent

Three disease models for power comparison