Robust and powerful sibpair test for rare variant association

Presentation transcript:

Robust and powerful sibpair test for rare variant association Sebastian Zöllner University of Michigan

Acknowledgements Keng-Han Lin Matthew Zawistowski Mark Reppell

Rare Variants – Why Do We Care? GWAS have been successful, but only some heritability is explained by common variants. Uncommon coding variants (MAF 5%-0.5%) explain less. Rare variants could explain some ‘missing’ heritability. Better risk prediction. Rare variants may identify new genes. Rare exonic variants may be easier to annotate functionally and interpret.

Burden/Dispersion Tests Testing individual variants is infeasible: limited power due to the small number of observations, plus multiple testing correction. Alternative: joint tests. Burden tests (CMAT, collapsing, WSS). Dispersion tests (SKAT, C-alpha).

Challenges of Rare Variant Analysis Gene-based tests have low power. Nelson et al. (2010) estimated that 10,000 cases and 10,000 controls are required for 80% power in half of the genes. Large sample sizes are required. More heterogeneous samples => danger of stratification. Stratification may differ from that of common variants in magnitude and pattern.

Stratification in European Populations The test is symmetric. Genomic control lambda is not enough to correct, because the effect depends on the total number of variants per gene, so per-gene correction factors are needed. Genomic control is likely to overcorrect genes that have little variation. (202 genes, n=900/900, MAF < 1%, nonsense/nonsynonymous variants)

Variant Abundance across Populations [Figure: expected number of variants per kb against sample size, by population: African-American, Southern Asia, South-Eastern Europe, Finland, South-Western Europe, Northern Europe, Central Europe, Western Europe, Eastern Europe, North-Western Europe.] A gradient in diversity from Southern to Northern Europe.

Allele Sharing Measure of rare variant diversity: the probability that two carriers of the minor allele are from different populations (normalized). To-do: sharing numbers in the middle panel are for (1, 2.5]% but the plot is for 1-5%. Median EU-EU sharing in the three panels: 0.71, 0.86, 0.98.

General Evaluation of Stratification Select 2 populations. Select mixing parameter r. Sample 30 variants from the 202 genes. Calculate inflation based on observed frequency differences.

Inflation by Mixture Proportion Zawistowski et al. 2014

Inflation across Comparisons

Family-based Test against Stratification If multiple affected family members are collected, it may be more powerful to sequence all family members. Family-based tests can be robust against stratification. TDT-type tests are potentially inefficient. How to leverage low frequency? Low-frequency risk variants should be more common in cases, and even more common on chromosomes shared among many cases.

Family Test Consider affected sibpairs. Estimate IBD sharing. Compare the number of rare variants on shared (solid) and non-shared (blank) chromosomes. Any aggregate test can be applied. [Figure: sibpair chromosomes for S=1 and S=2.]

Basic Properties On average, there are twice as many non-shared as shared chromosomes. The null hypothesis determines the test. (1) Shared alleles : non-shared alleles = 1:2 gives a test for linkage or association. (2) Shared alleles : non-shared alleles = shared chromosomes : non-shared chromosomes gives a test for association only.
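The two null ratios above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes each IBD-shared chromosome is counted once, so a sibpair with sharing status S contributes S shared and 4 - 2S non-shared chromosomes. The function names are hypothetical.

```python
def chromosome_counts(ibd_states):
    """Count distinct shared and non-shared chromosomes over sibpairs.

    Assumption: a sibpair with IBD state S contributes S shared
    chromosomes (each counted once) and 4 - 2*S non-shared ones.
    """
    shared = sum(ibd_states)
    non_shared = sum(4 - 2 * s for s in ibd_states)
    return shared, non_shared


def null_shared_fraction(ibd_states, association_only=True):
    """Expected fraction of rare alleles on shared chromosomes under the null.

    association_only=False uses the fixed 1:2 ratio (linkage or association);
    association_only=True uses the observed chromosome ratio.
    """
    if not association_only:
        return 1.0 / 3.0
    shared, non_shared = chromosome_counts(ibd_states)
    return shared / (shared + non_shared)


# Hypothetical sample with the 1:2:1 IBD distribution expected without linkage.
ibd = [0] * 250 + [1] * 500 + [2] * 250
print(chromosome_counts(ibd))  # (1000, 2000)
print(null_shared_fraction(ibd))
```

With IBD states in the 1:2:1 proportions, both nulls coincide at one third. They differ once sharing in the sample departs from that distribution, which is what separates the linkage-or-association test from the association-only test.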

Haplotypes not required IBD sharing is known. Phase is not needed to identify shared variants, except in one configuration: IBD 1 with both sibs heterozygous. Under the null, the probability of configuration 2 is the allele frequency. Under the alternative, we need to use multiple imputation. Configuration 1: +1 shared. Configuration 2: +2 non-shared.
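The statement that the ambiguous configuration resolves to "two non-shared alleles" with probability equal to the allele frequency can be checked directly. The sketch below assumes the three distinct chromosomes of an IBD-1 sibpair (one shared, two non-shared) carry the minor allele independently with frequency p. The symbol p and the independence assumption are introduced here and do not appear on the slide.

```latex
\begin{align*}
P(\text{allele on the shared chromosome only}) &= p\,(1-p)^2
  && \text{(+1 shared)}\\
P(\text{allele on both non-shared chromosomes only}) &= (1-p)\,p^2
  && \text{(+2 non-shared)}\\
P(\text{+2 non-shared} \mid \text{both sibs heterozygous})
  &= \frac{(1-p)\,p^2}{p\,(1-p)^2 + (1-p)\,p^2}
   = \frac{p}{(1-p) + p} = p
\end{align*}
```

For rare variants p is small, so under the null a double-heterozygous pair almost always reflects a single allele on the shared chromosome.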

Evaluation of Internal Control Assume chromosome sharing status is known for each sibpair. Count rare variants; impute sharing status for double heterozygotes. Compare the number of rare variants between shared and non-shared chromosomes with a chi-squared test (burden style). [Figure: sibpair chromosomes for S=0, S=1 and S=2.]
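The comparison step can be sketched as a one-degree-of-freedom goodness-of-fit test on the rare-allele counts. This is a simplified illustration rather than the test used in the talk: it treats allele counts as the only data, which is reasonable when variants are rare, and takes the null fraction of alleles on shared chromosomes as an input. The function name and the example counts are hypothetical.

```python
import math


def shared_vs_nonshared_test(alleles_shared, alleles_non_shared, null_fraction=1.0 / 3.0):
    """Chi-squared goodness-of-fit test (1 df) for rare-allele counts.

    null_fraction is the expected share of rare alleles on shared
    chromosomes under the null (1/3 for the 1:2 ratio).
    """
    total = alleles_shared + alleles_non_shared
    expected_shared = total * null_fraction
    expected_non_shared = total * (1.0 - null_fraction)
    chi2 = ((alleles_shared - expected_shared) ** 2 / expected_shared
            + (alleles_non_shared - expected_non_shared) ** 2 / expected_non_shared)
    # Upper tail of a chi-squared distribution with 1 df: P(|Z| > sqrt(chi2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical counts: 60 rare alleles on shared, 80 on non-shared chromosomes.
chi2, p_value = shared_vs_nonshared_test(60, 80)
print(f"chi2 = {chi2:.3f}, p = {p_value:.4f}")
```

An excess of rare alleles on shared chromosomes relative to the null fraction is the signal of association. Any other aggregate statistic could replace the chi-squared step.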

Enriching Based on Familial Risk [Figure: three designs compared: classic case-control, internal control, and selected cases; panel labels S=0, S=2, S=1.]

Stratification Consider 2 populations, with p=0.01 in pop1 and p=0.05 in pop2. 1000 sibpairs for the internal control design. 1000 cases and 1000 controls for selected cases. 1000 cases and 1000 controls for case-control. Sample cases from pop1 with a given mixing proportion. Test for association with α=0.05.
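A deterministic sketch shows why this setup distorts the case-control comparison but not the within-family one. It uses the two allele frequencies from the slide and no genetic effect at all. The control composition (half from each population by default), the function names, and the use of expected chromosome counts (1 shared and 2 non-shared per pair) are assumptions made for illustration.

```python
P_POP1, P_POP2 = 0.01, 0.05  # allele frequencies given on the slide


def case_control_expected_freqs(case_frac_pop1, control_frac_pop1=0.5):
    """Expected allele frequency in cases and controls with no genetic effect.

    control_frac_pop1 is an assumed control composition, not from the slide.
    """
    case = case_frac_pop1 * P_POP1 + (1.0 - case_frac_pop1) * P_POP2
    control = control_frac_pop1 * P_POP1 + (1.0 - control_frac_pop1) * P_POP2
    return case, control


def sibpair_expected_shared_fraction(pair_frac_pop1, n_pairs=1000):
    """Expected fraction of rare alleles on shared chromosomes, no effect.

    Uses expected chromosome counts per pair: 1 shared, 2 non-shared.
    """
    shared = non_shared = 0.0
    for frac, p in ((pair_frac_pop1, P_POP1), (1.0 - pair_frac_pop1, P_POP2)):
        pairs = n_pairs * frac
        shared += pairs * 1 * p
        non_shared += pairs * 2 * p
    return shared / (shared + non_shared)


for frac in (0.2, 0.5, 0.8):
    case, control = case_control_expected_freqs(frac)
    print(frac, round(case - control, 4), round(sibpair_expected_shared_fraction(frac), 4))
```

The case-control frequency difference is nonzero whenever the case and control mixtures differ. The shared fraction stays at one third for every mixture because both chromosome classes come from the same families.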

Robust to Population Stratification

Evaluating Study Designs Realistic rare variant models are unknown: typical allele frequency, number of risk variants per gene, typical effect size, distribution of effect sizes, identifiability of risk variants. Goal: create a model that summarizes these unknowns into summed allele frequency, mean effect size, and variance of effect size.

Basic Genetic Model Assume many loci carrying risk variants. Risk alleles at multiple loci each increase the risk by a factor, independently. Frequency of the risk variant: in independent cases; on a shared chromosome [equations not preserved in transcript]. Notation: A = affected; AA = affected relative pair; R = risk locus genotype.

Effect Size Model Notation: A = affected; r1, r2 = carrier status of chromosomes 1, 2; m1, m2 = relative risk of risk variants on chromosomes 1, 2; μ = mean effect size; σ² = variance of effect size. Relative risk is sampled from a distribution f with mean μ and variance σ². Simplifications: each risk variant occurs only once in the population; each risk variant is on its own haplotype. Then the risk in a random case is [equation not preserved in transcript].

Effect in Sib-pairs Notation: AA = affected relative pair; ri = carrier status of chromosome i; mi = relative risk of the variant on chromosome i; f = distribution of relative risk (RR); μ = mean RR; σ² = variance of RR; S = sharing status. To calculate the probability of having an affected sib-pair we condition on sharing S. For S>0, the probability depends on σ². E.g. (S=2): [equation not preserved in transcript].
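One way to see why the variance of the relative risk enters once chromosomes are shared is the following sketch. It is not the slide's formula. It assumes a baseline risk b (a symbol introduced here), disease status that is independent between sibs given their genotypes, and relative risks drawn from f with mean μ and variance σ².

```latex
\begin{align*}
\text{Shared variant (same } m \text{ in both sibs):}\quad
  P(AA \mid \text{both carry}) &= E\!\left[(b\,m)^2\right]
    = b^2\,E[m^2] = b^2\left(\mu^2 + \sigma^2\right)\\
\text{Non-shared variants (independent } m_1, m_2\text{):}\quad
  P(AA \mid \text{each carries one}) &= E[b\,m_1]\,E[b\,m_2] = b^2\,\mu^2
\end{align*}
```

Under these assumptions a shared variant contributes the second moment of the relative risk, so greater heterogeneity in effect sizes raises the chance of ascertaining carriers of shared variants in affected pairs.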

Analytic Power Analysis Select μ, σ² and cumulative frequency f. Calculate the allele frequency in cases/controls, P(R|A). Calculate the allele frequency on shared/non-shared chromosomes. => Non-centrality parameter of the χ² distribution.
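The last step, going from two allele frequencies to a non-centrality parameter and then to power, can be sketched as follows. The derivation of the frequencies from μ, σ² and f is not reproduced here. The sketch uses the standard pooled-variance approximation for comparing two proportions and a one-degree-of-freedom χ² test. The function names and the example frequencies are hypothetical.

```python
from statistics import NormalDist


def ncp_two_groups(freq_a, freq_b, n_chrom_a, n_chrom_b):
    """Approximate non-centrality parameter for comparing two allele frequencies.

    Uses the pooled-variance approximation for a two-proportion comparison.
    """
    pooled = (freq_a * n_chrom_a + freq_b * n_chrom_b) / (n_chrom_a + n_chrom_b)
    variance = pooled * (1.0 - pooled) * (1.0 / n_chrom_a + 1.0 / n_chrom_b)
    return (freq_a - freq_b) ** 2 / variance


def power_chi2_1df(ncp, alpha=0.05):
    """Power of a 1-df chi-squared test with the given non-centrality.

    A non-central chi-squared with 1 df is the square of a normal variable
    with mean sqrt(ncp), so the power follows from the normal CDF.
    """
    normal = NormalDist()
    critical = normal.inv_cdf(1.0 - alpha / 2.0)
    shift = ncp ** 0.5
    return (1.0 - normal.cdf(critical - shift)) + normal.cdf(-critical - shift)


# Hypothetical frequencies: 2% in one group of chromosomes vs 1% in the other.
ncp = ncp_two_groups(0.02, 0.01, 2000, 2000)
print(f"ncp = {ncp:.2f}, power = {power_chi2_1df(ncp):.2f}")
```

The same two functions apply to any of the designs being compared. Only the pair of frequencies and the chromosome counts change between case-control, selected cases, and shared versus non-shared chromosomes.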

Minor Allele Frequency Conventional Case-Control Internal Control Selected Cases

Power Comparison by Mean Effect Size

Power Comparison by Variance

Gene-Gene Interaction Gene-gene interaction affects power in families. For a broad range of interaction models, consider a two-locus model. G now has alleles g1, g2. The joint effect is [equation not preserved in transcript]. We compare the effect of the interaction parameter [symbol not preserved in transcript] while adjusting L and G to maintain marginal risk.

Power for Antagonistic Interaction

Power for Positive Interaction

Conclusions Stratification is a strong confounder for rare variant tests. Family-based association methods are robust to stratification. Comparing rare variants between shared and non-shared chromosomes is substantially more powerful than case-control designs. All family-based methods/samples depend on the model of gene-gene interaction. Under antagonistic interaction, power can be lower than in a population sample.

Thank you for your attention. Questions?