C2BAT: Using the same data set for screening and testing. A testing strategy for genome-wide association studies in case/control design Matt McQueen, Jessica.

Presentation transcript:

C2BAT: Using the same data set for screening and testing. A testing strategy for genome-wide association studies in case/control design Matt McQueen, Jessica Su, Nan Laird and Christoph Lange Harvard School of Public Health

Genome-wide association studies
Limitations of linkage analysis and the potential of association analysis => genome-wide association studies (Risch & Merikangas 1997)
More than 100,000 SNPs and phenotypes are tested for association.
Statistical roadblock: severe multiple-testing problem!

“Using the same data set for screening and testing”
Testing strategy:
– Screening Step: assess the evidence for association for all SNPs based on the screening statistic S
– Select a small subset of N markers (10-200)
– Testing Step: compute the association test statistic T conditional upon S and adjust for N comparisons
– If the screening step and the testing step are statistically independent, we can look at the data in the screening step without paying a “statistical price” for it.
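To illustrate why testing only N selected markers can help, here is a minimal sketch that compares Bonferroni thresholds for a full 100,000-SNP scan with thresholds for N = 10 and N = 200 selected markers. The significance level of 0.05 and the helper name `bonferroni_threshold` are assumptions made for illustration; the slides do not state which level was used.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold under a Bonferroni correction."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests


ALPHA = 0.05  # assumed family-wise significance level, not taken from the slides

# Threshold when every SNP in a 100,000-SNP scan is tested.
genome_wide = bonferroni_threshold(ALPHA, 100_000)
print(f"all 100,000 SNPs: {genome_wide:.1e}")

# Thresholds when only N screened markers are tested (N range from the slide).
for n_selected in (10, 200):
    selected = bonferroni_threshold(ALPHA, n_selected)
    print(f"N = {n_selected:>3}: {selected:.1e} "
          f"({selected / genome_wide:.0f} times less stringent)")
```

The relaxed threshold is only justified if the screening step is independent of the testing step, which is the requirement stated in the last bullet above.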

“Using the same data set for screening and testing”
General concept proposed by Laird and Lange (2006, Nat Rev Genet)
Decomposition of the joint likelihood:
P( {phenotype, genotype} ) = P( {phenotype, genotype} | S({phenotype, genotype}) ) * P( S({phenotype, genotype}) )
– P( {phenotype, genotype} | S({phenotype, genotype}) ) = Testing Step
– P( S({phenotype, genotype}) ) = Screening Step
S = “Summary test statistic to assess evidence for association”
Requirements for S:
– The association test has to condition on S
– S has to contain information about the potential association as well
Testing strategy:
– Assess the evidence for association for all SNPs based on S (Screening Step)
– Select a small subset of N markers (10-200)
– Compute the association test conditional upon S and adjust for N comparisons (Testing Step)
– The screening step and the testing step are statistically independent!

“Using the same data set for screening and testing”
Application to family-based association tests (Van Steen et al. (2005))
Decomposition of the joint likelihood:
P( {phenotype, genotype, parental genotype} ) = P( {phenotype, genotype} | {phenotype, parental genotype} ) * P( {phenotype, parental genotype} )
S = “phenotype and parental genotype/sufficient statistic”
– P( {phenotype, parental genotype} ) = Screening Step, based on the conditional mean model (Lange et al. (2003)) = between-family component (Fulker et al. (1999))
– P( {phenotype, genotype} | {phenotype, parental genotype} ) = Testing Step, based on FBAT (Laird et al. (2000)) = within-family component (Fulker et al. (1999))
Properties of the testing strategy:
– Outperforms standard adjustments for multiple comparisons by factors of up to 40
– Additional power boost from the use of complex phenotypes such as longitudinal data: discovery of INSIG2 in a 100K scan in the Framingham Heart Study, the first replicable association for BMI / obesity (Herbert et al. (2006, Science))
Alternative approach:
– Instead of using the between-family component (Screening Step) and the within-family component (Testing Step) in a two-stage testing strategy, one could include both components in the test statistic, e.g. QTDT (Abecasis et al. (2000))
Disadvantages:
– Only marginal power gains (5%) over the FBAT statistic when a single SNP is tested (Abecasis et al. (2001))
– Lack of robustness against population admixture (Yu et al. (2006))

“Using the same data set for screening and testing”
Can we translate this concept to association studies in unrelated cases and controls?
χ² tests and Armitage trend tests are conditional tests that condition upon the margins
=> The data-partitioning statistic S is given by the margins of the table
[Table: COMPLETE SET, Cases and Controls by Number of Alleles (0, 1, 2); cell counts are not preserved in the transcript.]
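As a concrete instance of a test that conditions on the margins, here is a minimal sketch of the standard Cochran-Armitage trend test for a 2 x 3 genotype table. The additive weights (0, 1, 2), the function name, and the example counts are assumptions made for illustration; this is the textbook statistic, not necessarily the exact variant used in C2BAT.

```python
import math


def armitage_trend_test(cases, controls, weights=(0, 1, 2)):
    """Cochran-Armitage trend test for a 2 x 3 genotype table.

    cases, controls: counts of individuals with 0, 1 and 2 copies of the allele.
    Returns (chi-square statistic with 1 df, p-value).
    """
    col_totals = [r + s for r, s in zip(cases, controls)]  # column margins
    n_cases, n_controls = sum(cases), sum(controls)        # row margins
    n = n_cases + n_controls

    observed = sum(w * r for w, r in zip(weights, cases))
    sum_wn = sum(w * c for w, c in zip(weights, col_totals))
    sum_w2n = sum(w * w * c for w, c in zip(weights, col_totals))

    # The null expectation of `observed` is n_cases * sum_wn / n, which
    # depends on the margins only; the cell counts enter via `observed`.
    trend = n * observed - n_cases * sum_wn
    denom = n_cases * n_controls * (n * sum_w2n - sum_wn ** 2)
    if denom == 0:  # empty group or monomorphic marker: no information
        return 0.0, 1.0

    stat = n * trend ** 2 / denom
    p_value = math.erfc(math.sqrt(stat / 2.0))  # upper tail of chi-square, 1 df
    return stat, p_value


# Made-up counts for illustration only.
stat, p = armitage_trend_test(cases=(10, 20, 30), controls=(30, 20, 10))
print(f"trend statistic = {stat:.2f}, p = {p:.2e}")
```

Both the null expectation and the variance of the weighted case count are functions of the row and column totals alone, which is the sense in which the margins can serve as the statistic S.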

[Tables: the COMPLETE SET is split into an ESTIMATION SET (= Screening Step) and a TESTING SET (= Testing Step), each tabulating Cases and Controls by Number of Alleles (0, 1, 2); the cell counts and percentages are not recoverable from the transcript.]
Testing strategy:
1.) Divide the table into a “screening table” and a “testing table”
2.) For each SNP, use the “screening table” and the margins of the “testing table” to assess evidence for association in the screening step
3.) Select the most promising N SNPs and test them for association based on the data of the testing table.
How can we obtain information about an association from the margins?
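Step 1.) and the margins used in step 2.) can be sketched as follows. This is a minimal sketch that assumes individuals are assigned to the two tables uniformly at random; the split fraction of 0.5, the function names, and the example counts are all hypothetical, since the slides do not preserve how the split was actually made.

```python
import random


def split_table(table, test_fraction, rng):
    """Randomly split a genotype table into a screening and a testing table.

    table: dict mapping "cases" / "controls" to counts for 0, 1, 2 alleles.
    Returns (screening_table, testing_table) in the same layout.
    """
    # Expand the table into one (status, genotype) record per individual.
    individuals = [
        (status, genotype)
        for status, counts in table.items()
        for genotype, count in enumerate(counts)
        for _ in range(count)
    ]
    rng.shuffle(individuals)
    n_test = round(test_fraction * len(individuals))

    screening = {status: [0] * len(counts) for status, counts in table.items()}
    testing = {status: [0] * len(counts) for status, counts in table.items()}
    for index, (status, genotype) in enumerate(individuals):
        target = testing if index < n_test else screening
        target[status][genotype] += 1
    return screening, testing


def margins(table):
    """Row totals (per status) and column totals (per allele count)."""
    row_totals = {status: sum(counts) for status, counts in table.items()}
    n_columns = len(next(iter(table.values())))
    col_totals = [sum(counts[g] for counts in table.values())
                  for g in range(n_columns)]
    return row_totals, col_totals


# Made-up complete table for illustration only.
complete = {"cases": [300, 500, 200], "controls": [400, 450, 150]}
screening, testing = split_table(complete, test_fraction=0.5,
                                 rng=random.Random(1))
print("screening table:", screening)
print("testing margins:", margins(testing))
```

Only the margins of the testing table are passed to the screening step here; its individual cell counts stay untouched until step 3.).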

[Tables, each tabulating Cases and Controls by Number of Alleles (0, 1, 2): COMPLETE SET, NON-INFORMATIVE SET, TESTING SET, IMPUTED SET, MARGINAL SET (with the label N(T)), SCREENING SET; cell counts are not preserved in the transcript.]
Results will depend on the actual random split of the tables!
Solution:
1.) Re-sampling of the tables
2.) p-value for the testing set based on p(data) = p(data | S(data)) * p(S(data)) and Monte Carlo simulations

Simulation Study
[Table with columns Cases/Controls, OR, SNPs, Method, Allele Frequencies, comparing C2BAT with the standard analysis in four scenarios; the numeric entries are not preserved in the transcript.]

Can C2BAT find INSIG2 in the 100K scan of the Framingham Heart Study again?
1400 probands in about 300 families:
=> Randomly select 150 unrelated cases/controls (BMI > 28 = “affected”)
=> Apply the standard analysis (p-value adjusted by Bonferroni correction) and C2BAT to see whether INSIG2 reaches genome-wide significance
For 1000 replicates:
Power of the standard analysis to detect INSIG2: 5%
Power of C2BAT to detect INSIG2: 17%

Future work:
1.) Extension to quantitative traits => Expression analysis
2.) Gene-gene interactions
Software: