Genome-wide Association Studies John S. Witte. Association Studies Hirschhorn & Daly, Nat Rev Genet 2005 Candidate Gene or GWAS.

Presentation transcript:

Genome-wide Association Studies John S. Witte

Association Studies Hirschhorn & Daly, Nat Rev Genet 2005 Candidate Gene or GWAS

Affymetrix Array Genome-wide Association Studies Altshuler & Clark, Science 2005

Genome-wide Association Studies (GWAS)

GWAS+ Strategy
– Discovery: Multi-stage GWAS+
– Confirmation / Characterization: Follow-up Genotyping+
– Clarification: Sequencing+
[Figure axes: # Markers, # Samples, Time]

One- and Two-Stage GWA Designs
– One-Stage Design: samples 1, 2, 3, ..., N are all genotyped on SNPs 1, 2, 3, ..., M.
– Two-Stage Design: Stage 1 covers a fraction of the samples; Stage 2 covers a fraction of the markers.
[Figure: SNPs-by-samples grids for the one-stage and two-stage designs]

Analysis options for the Two-Stage Design: replication-based analysis or joint analysis of Stage 1 and Stage 2.
[Figure: SNPs-by-samples grids showing Stage 1 and Stage 2 for the one-stage design, replication-based analysis, and joint analysis]

Multistage Designs
– Joint analysis has more power than replication.
– The p-value threshold in Stage 1 must be liberal.
– Lower cost, but no gain in power.
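As a rough illustration of what "joint analysis" can mean, the sketch below combines the Stage 1 and Stage 2 test statistics for one SNP into a single z-score, weighting each stage by its share of the samples. The weighting scheme, the function names, and the example numbers are assumptions made for illustration; the slides do not specify the statistic used.

```python
import math


def joint_z(z_stage1, z_stage2, frac_stage1):
    """Combine stage-wise z-scores, weighting by the fraction of samples in Stage 1.

    Assumed form: z_joint = sqrt(pi) * z1 + sqrt(1 - pi) * z2,
    where pi is the proportion of all samples genotyped in Stage 1.
    """
    if not 0.0 < frac_stage1 < 1.0:
        raise ValueError("frac_stage1 must be strictly between 0 and 1")
    return (math.sqrt(frac_stage1) * z_stage1
            + math.sqrt(1.0 - frac_stage1) * z_stage2)


def two_sided_p(z):
    """Two-sided p-value for a standard normal z-score."""
    return math.erfc(abs(z) / math.sqrt(2.0))


if __name__ == "__main__":
    # Hypothetical SNP: modest evidence in each stage, half the samples per stage.
    z = joint_z(2.0, 2.0, 0.5)
    print(z, two_sided_p(z))
```

In a replication-based analysis the Stage 2 statistic would instead be tested on its own, which is why the combined statistic tends to be more powerful for the same data.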

QC Steps
– Filter SNPs and individuals: MAF, low call rates.
– Test for HWE among controls and within ethnic groups. Use a conservative alpha level.
– Check for relatedness: identity-by-state calculations.
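The HWE check in the QC steps can be sketched with a one-degree-of-freedom chi-square goodness-of-fit test on the genotype counts of a single SNP in controls. This is a minimal sketch: the function name and example counts are hypothetical, and in practice an exact test is often preferred for rare alleles.

```python
import math


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square test (1 df) of Hardy-Weinberg equilibrium from genotype counts.

    Returns (chi-square statistic, p-value).
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no genotypes supplied")
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    observed = (n_aa, n_ab, n_bb)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    # Skip classes with zero expected count (monomorphic SNP).
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of a chi-square with 1 df: erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


if __name__ == "__main__":
    # Hypothetical control counts for one SNP.
    print(hwe_chi_square(25, 50, 25))  # counts exactly at HWE proportions
    print(hwe_chi_square(50, 0, 50))   # no heterozygotes at all
```

A SNP would be flagged when the p-value falls below the conservative alpha level chosen for the study.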

Analysis of GWAS
– Most common approach: look at each SNP one at a time.
– Possibly add in multi-marker information.
– Further investigate / report top SNPs only.
– Or backwards replication…
[Figure label: P-values]

GWAS Analysis
– Most commonly a trend test.
– Log-additive model, logistic regression.
– Adjust for potential population stratification.
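The trend test mentioned above can be sketched as a Cochran-Armitage test on a 2 x 3 table of genotype counts, with additive weights 0, 1, 2 for the number of risk alleles. The function name and the example counts are hypothetical, and the sketch makes no adjustment for population stratification.

```python
import math


def armitage_trend_test(cases, controls, weights=(0, 1, 2)):
    """Cochran-Armitage trend test (1 df) for case/control genotype counts.

    cases, controls: counts per genotype (0, 1, 2 copies of the tested allele).
    Returns (chi-square statistic, p-value).
    """
    if not (len(cases) == len(controls) == len(weights)):
        raise ValueError("cases, controls and weights must have equal length")
    totals = [r + s for r, s in zip(cases, controls)]
    n_cases, n_controls = sum(cases), sum(controls)
    n = n_cases + n_controls
    sum_tr = sum(t * r for t, r in zip(weights, cases))
    sum_tn = sum(t * m for t, m in zip(weights, totals))
    sum_t2n = sum(t * t * m for t, m in zip(weights, totals))
    numerator = n * (n * sum_tr - n_cases * sum_tn) ** 2
    denominator = n_cases * n_controls * (n * sum_t2n - sum_tn ** 2)
    if denominator == 0:
        raise ValueError("test undefined: no cases, no controls, or no variation")
    chi2 = numerator / denominator
    p_value = math.erfc(math.sqrt(chi2 / 2.0))  # chi-square survival, 1 df
    return chi2, p_value


if __name__ == "__main__":
    # Hypothetical counts: risk allele enriched in cases.
    print(armitage_trend_test((20, 30, 50), (50, 30, 20)))
```

Logistic regression with the allele count as a covariate tests the same log-additive model and makes it straightforward to add covariates such as ancestry components.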

Quantile-Quantile (QQ) Plot
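A QQ plot compares the observed p-values with those expected if no SNP were associated. The sketch below computes the plotted points on the -log10 scale; the plotting position (rank - 0.5) / n and the function name are assumptions for illustration.

```python
import math


def qq_points(p_values):
    """Return (expected, observed) -log10 p-value pairs, smallest p-value first."""
    n = len(p_values)
    if n == 0:
        raise ValueError("no p-values supplied")
    points = []
    for rank, p in enumerate(sorted(p_values), start=1):
        expected = -math.log10((rank - 0.5) / n)  # uniform null quantile
        observed = -math.log10(p)
        points.append((expected, observed))
    return points


if __name__ == "__main__":
    # Hypothetical p-values from three SNPs.
    for expected, observed in qq_points([0.5, 0.005, 0.05]):
        print(round(expected, 3), round(observed, 3))
```

Points that rise above the diagonal only in the upper tail suggest true associations, whereas a departure along the whole line suggests a systematic problem such as population stratification.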

Example: GWAS of Prostate Cancer (Witte, Nat Genet 2007). Multiple prostate cancer loci on 8q24.
[Figure x-axis: chromosome]

Prostate Cancer Replications (Witte, Nat Rev Genet 2009). Modest ORs.
Table columns: Locus (Chr region, SNP), Alleles, Freq (Cntrl, Case), Association (OR, p value), Nearby Genes / Fcn.
Chr region | SNP | Alleles | p value | Nearby genes / function
2p15 | rs721048 | G/A | …x10^-9 | EHBP1: endocytic trafficking
3p12 | rs… | C/T | …x10^-8 | Intergenic
6q25 | rs… | C/T | … | SLC22A3: drugs and toxins
7q21 | rs… | T/C | …x10^-9 | LMTK2: endosomal trafficking
8q24 (2) | rs… | C/A | … | Intergenic
8q24 (3) | rs… | T/G | … | Intergenic
8q24 (1) | rs… | C/A | … | Intergenic
10q11 | rs… | C/T | … | MSMB: suppressor prop.
10q26 | rs… | T/C | …x10^-8 | CTBP2: antiapoptotic activity
11q13 | rs… | T/G | … | Intergenic
17q12 | rs… | G/A | … | HNF1B: suppressor properties
17q24 | rs… | T/G | … | Intergenic
19q13 | rs… | A/G | … | KLK2/KLK3: PSA
Xp11 | rs… | T/C | …x10^-9 | NUDT10, NUDT11: apoptosis

SNPs Missed in Replication? Witte, Nat Rev Genet, ,223 smallest P-value!
[Same table of prostate cancer loci as on the previous slide]

Prostate Cancer (Manolio et al. Clin Invest 2008; www.genome.gov/gwastudies)

Population Attributable Risks for GWAS (Jorgenson & Witte, 2009)
[Figure reference points: Smoking & lung cancer; BRCA1 & Breast cancer]
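To make the idea of a population attributable risk concrete, the sketch below uses Levin's formula, PAR = p(RR - 1) / (1 + p(RR - 1)), where p is the frequency of the risk factor in the population and RR its relative risk. The choice of this formula, the function name, and the example values are assumptions; the slide does not state how its figures were computed.

```python
def population_attributable_risk(exposure_freq, relative_risk):
    """Levin's population attributable risk for one risk factor.

    exposure_freq: proportion of the population carrying the risk factor (0 to 1).
    relative_risk: risk in carriers divided by risk in non-carriers (> 0).
    """
    if not 0.0 <= exposure_freq <= 1.0:
        raise ValueError("exposure_freq must be between 0 and 1")
    if relative_risk <= 0.0:
        raise ValueError("relative_risk must be positive")
    excess = exposure_freq * (relative_risk - 1.0)
    return excess / (1.0 + excess)


if __name__ == "__main__":
    # Hypothetical values: a common variant with a modest effect
    # versus a rare variant with a strong effect.
    print(population_attributable_risk(0.5, 1.2))
    print(population_attributable_risk(0.001, 10.0))
```

The two hypothetical calls show why a common variant with a modest relative risk can account for more cases in the population than a rare, highly penetrant one.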

Limitations of GWAS: not very predictive (Witte, Nat Rev Genet 2009).
Example: AUC for breast cancer risk (Wacholder et al. NEJM 2010): Gail = 58%, SNPs = 58.9%, Gail + SNPs = 61.8%.

Limitations of GWAS
– Not very predictive
– Explain little heritability
– Focus on common variation
– Many associated variants are not causal

Where’s the Heritability? (McCarthy et al., 2008) Many more of these? See: NEJM, April 30, 2009.
Common disease rare variant (CDRV) hypothesis: diseases are due to multiple rare variants with intermediate penetrances (allelic heterogeneity).

Will GWAS results explain more heritability? Possibly, if…
1. Causal SNPs not yet detected due to power / practical issues (e.g., not yet included in replication studies).
2. Stronger effects for causal SNPs: an associated SNP may only serve as a marker for multiple different causal SNPs.

Imputation of SNP Genotypes
– Estimate unmeasured or missing genotypes.
– Based on measured SNPs and external info (e.g., haplotype structure of HapMap).
– Increase GWAS power.
– Allow for combining data across different platforms (e.g., Affy & Illumina) for replication / meta-analysis.

Imputation Example: study sample and HapMap / 1K Genomes reference (Gonçalo Abecasis)

Identify Match with Reference Gonçalo Abecasis

Phase chromosomes, impute missing genotypes Gonçalo Abecasis
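The two steps named on these slides (match the study sample to the reference, then fill in missing genotypes) can be illustrated with a deliberately simplified toy: pick the reference haplotype that best matches the observed alleles and copy its alleles into the gaps. This is an assumption-laden sketch, not the method used by real imputation software, which models each chromosome as a mosaic of reference haplotypes and reports probabilities; all names and data below are hypothetical.

```python
def impute_from_reference(study_hap, reference_haps):
    """Fill missing alleles (None) in a phased study haplotype.

    study_hap: list of alleles (0 or 1), with None where the SNP was not typed.
    reference_haps: list of fully typed reference haplotypes of the same length.
    Returns a new list with gaps copied from the best-matching reference.
    """
    if not reference_haps:
        raise ValueError("reference panel is empty")
    if any(len(ref) != len(study_hap) for ref in reference_haps):
        raise ValueError("all haplotypes must cover the same SNPs")

    def mismatches(ref):
        # Count disagreements at the SNPs that were actually typed.
        return sum(1 for s, r in zip(study_hap, ref) if s is not None and s != r)

    best = min(reference_haps, key=mismatches)
    return [r if s is None else s for s, r in zip(study_hap, best)]


if __name__ == "__main__":
    reference = [
        [0, 0, 0, 0, 0],
        [0, 1, 1, 0, 1],
        [1, 1, 0, 1, 0],
    ]
    study = [0, None, 1, None, 1]  # SNPs 2 and 4 were not on the chip
    print(impute_from_reference(study, reference))
```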

Imputation Application (Marchini, Nature Genetics): TCF7L2 gene region & T2D from the WTCCC data. Observed genotypes in black, imputed genotypes in red.
[Figure x-axis: Chromosomal Position]

Genome-wide Sequence Studies (Gonçalo Abecasis)
Trade off between number of samples, depth, and genomic coverage.
Sample Size | Depth | MAF 0.5-1% | MAF 2-5%
1,000 | 20x | perfect |
2,000 | 10x | r^2 = 0.98 | r^2 = …
…,000 | 5x | r^2 = 0.90 | r^2 = 0.98

Near-term Design Choices
For example, between:
1. Sequencing few subjects with extreme phenotypes: e.g., 200 cases, 200 controls, 4x coverage. Then follow up in a larger population.
2. 10M SNP chip based on 1,000 genomes. 5K cases, 5K controls.
Which design will work best…?

Polygenic Models (ISC / Purcell et al. Nature 2009)
Many weak associations combine to risk? Score model:
$$x_j = \sum_{i=1}^{m} \ln(OR_i) \times SNP_{ij}$$
where
– $\ln(OR_i)$ = ‘score’ for SNP $i$ from ‘discovery’ sample
– $SNP_{ij}$ = # of alleles (0,1,2) for SNP $i$, person $j$ in ‘validation’ sample.
– Large number of SNPs ($m$)
Is $x_j$ associated with disease?
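The score model on this slide amounts to a weighted sum of allele counts for each person in the validation sample, with the log odds ratios from the discovery sample as weights. A minimal sketch follows; the function name and the example odds ratios and genotypes are hypothetical.

```python
import math


def polygenic_score(odds_ratios, allele_counts):
    """Score for one person: sum over SNPs of ln(OR_i) * allele count.

    odds_ratios: per-SNP odds ratios estimated in the 'discovery' sample.
    allele_counts: the person's allele counts (0, 1, 2) in the 'validation' sample.
    """
    if len(odds_ratios) != len(allele_counts):
        raise ValueError("need one allele count per SNP")
    return sum(math.log(or_i) * count
               for or_i, count in zip(odds_ratios, allele_counts))


if __name__ == "__main__":
    discovery_ors = [1.10, 0.95, 1.20]  # hypothetical discovery estimates
    validation_genotypes = {
        "person_1": [2, 0, 1],
        "person_2": [0, 2, 0],
    }
    for person, counts in validation_genotypes.items():
        print(person, polygenic_score(discovery_ors, counts))
```

The resulting scores would then be tested for association with disease status in the validation sample, for example with logistic regression.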

Purcell / ISC et al. Nature 2009 Application of Model

Application to CGEMs PCa GWAS (Witte & Hoffman 2010)
– 1,172 cases, 1,157 controls from PLCO Trial. Oversampled more aggressive cases.
– Illumina 550K array.
– PCa & stratified by disease aggressiveness.
– Split into halves, resampling: one as ‘discovery’ sample; other as ‘validation’.
– LD filter: r^2 = 0.5.

Results for Prostate Cancer

Common Polygenic Model for Prostate and Breast Cancer? (Nat Rev Cancer 2010;10:)
– CGEMs GWAS data on prostate and breast cancer.
– Use one cancer as ‘discovery’ sample, the other as ‘validation’.

Results for PCa & BrCa

Complex diseases: Many causes = many causal pathways!
[Figure nodes: Diabetes, Obesity, Diet, Physical activity, Hypertension, Hyperlipidemia, Vulnerable plaques, Atherosclerosis, MI, Genetic susceptibility]

Pathways
– Many websites / companies provide ‘dynamic’ graphic models of molecular and biochemical pathways. Example: BioCarta.
– May be interested in potential joint and/or interaction effects of multiple genes in one pathway.

Moving Beyond Genome
– Transcriptome: all messenger RNA molecules (‘transcripts’).
– Proteome: all proteins in a cell or organism.
– Metabolome: all metabolites in a biological organism (end products of its gene expression).
– Systems Biology