

Jianfeng Xu, M.D., Dr.PH
Professor of Public Health and Cancer Biology
Director, Program for Genetic and Molecular Epidemiology of Cancer
Associate Director, Center for Human Genomics
Wake Forest University School of Medicine
GWA: promising but challenging

Outline
- The need for genome-wide association studies
- The reality of genome-wide association studies
- Important issues in genome-wide association studies
  - Genome coverage
  - Strategies for pre-association analysis
  - Strategies for association analysis
  - Sample size and false positives (Type I and II errors)
  - Confirmation in independent study populations
  - Increase the magnitude of effects of a specific gene

The need for GWA
- Current understanding of disease etiology is limited
  - Therefore, candidate genes or pathways are insufficient
- Current understanding of functional variants is limited
  - Therefore, focusing on nonsynonymous changes is not sufficient
- Results from linkage studies are often inconsistent and broad
  - Therefore, the utility of identified linkage regions is limited
- GWA studies offer an effective and objective approach
  - Better chance to identify disease-associated variants
  - Improve understanding of disease etiology
  - Improve ability to test gene-gene interaction and predict disease risk

GWA is promising
- Many diseases and traits are influenced by genetic factors
  - i.e., they are caused by sequence variants in the genome
- Over 6 million SNPs are known in the genome
  - i.e., some SNPs will be directly or indirectly associated with causal variants
- The cost of SNP genotyping has been reduced
  - i.e., it is affordable to genotype a large number of SNPs in the genome
- Large numbers of cases and controls are available
  - i.e., there is statistical power to detect variants with modest effect
- When the above conditions are met, associated SNPs will have different frequencies between cases and controls

GWA is challenging
- Many diseases and traits are influenced by genetic factors
  - But probably due to multiple modest-risk variants
  - They confer a stronger risk when they interact
  - Truly associated SNPs are not necessarily highly significant
- Too many SNPs are evaluated
  - False positives due to multiple tests
- Single studies tend to be underpowered
  - False negatives
- Considerable heterogeneity among studies
  - Phenotypic and genetic heterogeneity
  - False positives due to population stratification

Reality of GWA
- AMD, IBD, T1D, etc.
- Parkinson’s, nicotine dependence, T2D, etc.
- Prostate cancer, breast cancer, and other ongoing studies
- Heart diseases, lung diseases, psychiatric diseases, inflammatory diseases, cancers, and many other studies that are in planning stages

Important issues in genome-wide association studies
- Genome coverage
- Strategies for pre-association analysis
- Strategies for association analysis
- Sample size and false positives (Type I and II errors)
- Confirmation in independent study populations
- Increase the magnitude of effects of a specific gene

Genome coverage
- Two major platforms for GWA
  - Illumina: HumanHap300, HumanHap550, and HumanHap1M
  - Affymetrix: GeneChip 100K, 500K, and 1M
- Genome-wide coverage
  - The percentage of known SNPs in the genome that are in LD with the genotyped SNPs
  - Calculated based on HapMap
  - Calculated based on ENCODE

Genome coverage (Pe’er, 2006)
- Genome-wide coverage
  - Genome coverage of common SNPs (MAF ≥ 0.05)
  - Genome coverage of rare SNPs, and of common and rare SNPs together
  - Genome coverage using multi-markers

Strategies for pre-association analysis
- Quality control
  - Filter SNPs by genotype call rates
  - Filter SNPs by minor allele frequencies
  - Filter SNPs by testing for Hardy-Weinberg equilibrium
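The three quality-control filters above can be sketched as follows. This is a minimal illustration, not the pipeline used in the talk: the genotype coding (0, 1, 2 copies of one allele, None for a missing call), the function names, and the default thresholds are all assumptions chosen for the example, and the Hardy-Weinberg test shown is the simple 1-df chi-square version.

```python
import math

def call_rate(genotypes):
    """Fraction of non-missing genotype calls (None marks a missing call)."""
    return sum(g is not None for g in genotypes) / len(genotypes)

def minor_allele_frequency(genotypes):
    """MAF from genotypes coded as 0, 1, or 2 copies of one allele."""
    called = [g for g in genotypes if g is not None]
    freq = sum(called) / (2 * len(called))
    return min(freq, 1 - freq)

def hwe_chi2_pvalue(genotypes):
    """P-value of a 1-df chi-square test of Hardy-Weinberg proportions."""
    called = [g for g in genotypes if g is not None]
    n = len(called)
    observed = [called.count(k) for k in (0, 1, 2)]
    p = (observed[1] + 2 * observed[2]) / (2 * n)  # frequency of the counted allele
    q = 1 - p
    expected = [n * q * q, 2 * n * p * q, n * p * p]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    return math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square, 1 df

def passes_qc(genotypes, min_call_rate=0.95, min_maf=0.01, min_hwe_p=1e-4):
    """Apply the three filters in turn; thresholds are illustrative only."""
    if call_rate(genotypes) < min_call_rate:
        return False
    if minor_allele_frequency(genotypes) < min_maf:
        return False
    return hwe_chi2_pvalue(genotypes) >= min_hwe_p
```

A SNP with genotype counts 25/50/25 sits exactly at Hardy-Weinberg proportions and passes; a SNP where every sample is heterozygous fails the third filter.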

Strategies for pre-association analysis
- Quality control
- Quantile-quantile plot (Q-Q plot)
  - Evaluate whether there is an upward bias in association tests

Q-Q plot (Clayton, 2006)
Figure panels: Adjust for stratification; Filter by call rate; All SNPs
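As a rough sketch of what a Q-Q plot compares, the function below pairs each observed p-value with the p-value expected at the same rank under the null hypothesis, both on the -log10 scale. The function name and the (rank - 0.5) / n plotting position are assumptions made for this example; other plotting positions are also in use.

```python
import math

def qq_points(p_values):
    """Return (expected, observed) -log10(p) pairs, ordered from smallest p."""
    observed = sorted(p_values)
    n = len(observed)
    points = []
    for rank, p in enumerate(observed, start=1):
        expected_p = (rank - 0.5) / n  # uniform quantile expected under the null
        points.append((-math.log10(expected_p), -math.log10(p)))
    return points
```

Points that lie on the diagonal indicate no inflation; points that rise above it across most of the range suggest an upward bias such as stratification or genotyping artifacts.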

Strategies for pre-association analysis
- Quality control
- Quantile-quantile plot (Q-Q plot)
- Population stratification
  - Genomic control
    - Corrects for stratification by adjusting the association statistic at each SNP by a uniform overall inflation factor
    - Is susceptible to over- or under-adjustment
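The uniform inflation factor can be illustrated with a short sketch. It assumes 1-df chi-square statistics and the median-based estimator of the inflation factor; flooring the factor at 1 is a common convention rather than something stated on the slide, and the function name is hypothetical.

```python
import statistics

CHI2_1DF_MEDIAN = 0.4549364  # median of a chi-square distribution with 1 df

def genomic_control(chi2_stats):
    """Estimate the inflation factor and rescale every statistic by it."""
    lam = statistics.median(chi2_stats) / CHI2_1DF_MEDIAN
    lam = max(lam, 1.0)  # assumed convention: never inflate the statistics
    return lam, [x / lam for x in chi2_stats]
```

Because every SNP is divided by the same factor, SNPs that differ more than average between ancestral populations are under-corrected and the rest are over-corrected, which is the weakness noted above.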

Strategies for pre-association analysis
- Quality control
- Quantile-quantile plot (Q-Q plot)
- Population stratification
  - Genomic control
  - Structured association (STRUCTURE)
    - Assigns the samples to discrete subpopulation clusters, then aggregates evidence of association within each cluster
    - Estimates each individual's proportion of ancestry and treats it as a covariate
    - Computationally intensive when there are a large number of ancestry-informative markers (AIMs)

Strategies for pre-association analysis
- Quality control
- Quantile-quantile plot (Q-Q plot)
- Population stratification
  - Genomic control
  - Structure (STRUCTURE)
  - Principal component analysis (EIGENSTRAT)
    - Identify several eigenvectors (ancestries or geographic regions)
    - Adjust genotypes and phenotypes along each eigenvector
    - Compute association statistics using adjusted genotypes and phenotypes
    - No need for AIMs
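The first two principal-component steps can be sketched in plain Python for a toy data set. This is only an illustration of the idea, not the EIGENSTRAT program: it extracts a single axis by power iteration, skips the per-SNP variance normalization, and uses hypothetical function names. A real analysis would use a linear algebra library and several axes.

```python
import math
import random

def top_axis(genotypes, iterations=200, seed=0):
    """Leading axis of variation across individuals.

    genotypes[i][j] is the allele count (0/1/2) of individual i at SNP j.
    Returns a unit-length vector with one coordinate per individual.
    """
    n, m = len(genotypes), len(genotypes[0])
    # Centre each SNP column on its mean.
    cols = []
    for j in range(m):
        col = [genotypes[i][j] for i in range(n)]
        mean = sum(col) / n
        cols.append([g - mean for g in col])
    # n x n similarity matrix between individuals.
    cov = [[sum(c[i] * c[k] for c in cols) for k in range(n)] for i in range(n)]
    # Power iteration converges to the eigenvector with the largest eigenvalue.
    rng = random.Random(seed)
    v = [rng.gauss(0, 1) for _ in range(n)]
    for _ in range(iterations):
        w = [sum(cov[i][k] * v[k] for k in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

def residualize(values, axis):
    """Centre `values`, then remove their component along a unit-length axis."""
    mean = sum(values) / len(values)
    centred = [x - mean for x in values]
    coef = sum(c * a for c, a in zip(centred, axis))
    return [c - coef * a for c, a in zip(centred, axis)]
```

Applying residualize to both the genotype vector of a SNP and the phenotype vector, and then testing the association between the two residual vectors, corresponds to the "adjust, then compute" steps listed above.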

Strategies for association analysis
- Single SNP analysis using pre-specified genetic models
  - 2 x 3 table (2-df)
  - Additive model (1-df), and test for additivity
  - All possible genetic models
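The first two single-SNP tests can be sketched as below: a Pearson chi-square on the 2 x 3 genotype table (2 df), and a Cochran-Armitage trend test with weights 0, 1, 2 as one common way to test an additive model (1 df). The choice of the trend test for the additive model and the function names are assumptions for this example.

```python
import math

def genotype_chi2_2df(cases, controls):
    """Pearson chi-square on a 2 x 3 table of genotype counts (0, 1, 2 alleles)."""
    n_case, n_ctrl = sum(cases), sum(controls)
    total = n_case + n_ctrl
    chi2 = 0.0
    for a, b in zip(cases, controls):
        col = a + b
        if col == 0:
            continue  # an empty genotype column contributes nothing
        for obs, row in ((a, n_case), (b, n_ctrl)):
            exp = row * col / total
            chi2 += (obs - exp) ** 2 / exp
    return chi2, math.exp(-chi2 / 2)  # upper tail of chi-square, 2 df

def armitage_trend_1df(cases, controls, weights=(0, 1, 2)):
    """Cochran-Armitage trend test; weights (0, 1, 2) encode an additive model."""
    n_case, n_ctrl = sum(cases), sum(controls)
    total = n_case + n_ctrl
    cols = [a + b for a, b in zip(cases, controls)]
    t = sum(w * (a * n_ctrl - b * n_case)
            for w, a, b in zip(weights, cases, controls))
    spread = sum(w * w * c * (total - c) for w, c in zip(weights, cols))
    cross = sum(weights[i] * weights[j] * cols[i] * cols[j]
                for i in range(3) for j in range(i + 1, 3))
    var = (n_case * n_ctrl / total) * (spread - 2 * cross)
    chi2 = t * t / var
    return chi2, math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square, 1 df
```

When the genotype effect is close to linear in the number of risk alleles, the two statistics are similar and the 1-df test gives the smaller p-value; a large gap between them is one informal sign of departure from additivity.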

Strategies for association analysis
- Single SNP analysis using pre-specified genetic models
- Haplotype analysis
  - Two-marker and three-marker sliding windows
  - Multi-marker
    - Within haplotype block
    - Between two recombination hot spots

Strategies for association analysis
- Single SNP analysis using pre-specified genetic models
- Haplotype analysis
- Gene-gene and gene-environment interactions
  - Interaction with main effect
    - Logistic regression
  - Interaction without main effect: data mining
    - Classification and regression trees (CART)
    - Multifactor Dimensionality Reduction (MDR)

Sample size and false positives
- Estimate sample size
  - Sample size, OR, MAF, Type I error, power
  - Quanto
  - Effective sample size
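To show how sample size, OR, MAF, Type I error, and power relate to one another, here is a rough normal-approximation sketch for a two-sided allelic test. It is not the method implemented in Quanto; the function name, the per-allele odds ratio interpretation, and the unpooled standard error are simplifying assumptions.

```python
import math
from statistics import NormalDist

def allelic_power(maf_controls, odds_ratio, n_cases, n_controls, alpha):
    """Approximate power of a two-sided test comparing allele frequencies."""
    p0 = maf_controls
    # Case allele frequency implied by a per-allele odds ratio.
    p1 = odds_ratio * p0 / (1 + p0 * (odds_ratio - 1))
    # Each person contributes two alleles.
    se = math.sqrt(p1 * (1 - p1) / (2 * n_cases) + p0 * (1 - p0) / (2 * n_controls))
    nd = NormalDist()
    z_crit = nd.inv_cdf(1 - alpha / 2)
    z_effect = abs(p1 - p0) / se
    return nd.cdf(z_effect - z_crit) + nd.cdf(-z_effect - z_crit)
```

Calling the function over a grid of sample sizes, for example at a genome-wide alpha, shows how quickly the required number of cases and controls grows as the OR approaches 1 or the MAF falls.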

Sample size and false positives
- Estimate sample size
- False positives: too many dependent tests
  - Adjust for number of tests
    - Bonferroni correction
      - Nominal significance level = study-wide significance / number of tests
      - Nominal significance level = 0.05/500,000 = 1 × 10^-7
    - Effective number of tests
      - Take LD into account
    - Permutation procedure
      - Permute case-control status
      - Mimic the actual analyses
      - Obtain empirical distribution of maximum test statistic under null hypothesis
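The Bonferroni threshold and the permutation procedure can be sketched as follows. The test statistic here is a deliberately simple placeholder (absolute difference in mean allele count between cases and controls); in practice the permutation loop would rerun whatever association test the study actually uses. All function names are hypothetical.

```python
import random

def bonferroni_threshold(study_wide_alpha, n_tests):
    """Nominal per-test significance level."""
    return study_wide_alpha / n_tests

def mean_diff_stat(genotypes, status):
    """Toy statistic: |mean allele count in cases - mean allele count in controls|."""
    cases = [g for g, s in zip(genotypes, status) if s == 1]
    controls = [g for g, s in zip(genotypes, status) if s == 0]
    return abs(sum(cases) / len(cases) - sum(controls) / len(controls))

def max_stat_permutation_p(snps, status, n_perm=1000, seed=0):
    """Study-wide p-value for the best SNP, from the permutation null.

    snps: one genotype list per SNP; status: 1 = case, 0 = control.
    """
    rng = random.Random(seed)
    observed = max(mean_diff_stat(g, status) for g in snps)
    shuffled = list(status)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # permute case-control status, keep genotypes fixed
        if max(mean_diff_stat(g, shuffled) for g in snps) >= observed:
            exceed += 1
    return (exceed + 1) / (n_perm + 1)
```

Because genotypes are left untouched and only the labels are shuffled, the LD between SNPs is preserved in every permutation, which is why this approach handles dependent tests without an explicit count of effective tests.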

Sample size and false positives
- Estimate sample size
- False positives: too many dependent tests
  - Adjust for number of tests
  - False discovery rate (FDR)
    - Expected proportion of false discoveries among all discoveries
    - Offers more power than Bonferroni
    - Holds under weak dependence of the tests
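One widely used way to control the FDR is the Benjamini-Hochberg step-up procedure. The slide does not name a specific procedure, so treating it as the intended one is an assumption; the function name is hypothetical.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return the indices of tests declared significant at FDR level q."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        # Keep the largest rank whose p-value is under its own threshold.
        if p_values[idx] <= rank * q / m:
            cutoff_rank = rank
    return sorted(order[:cutoff_rank])
```

With the ten p-values 0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216 and q = 0.05, the procedure selects the first two tests, whereas a Bonferroni threshold of 0.05/10 = 0.005 would keep only the first.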

Sample size and false positives
- Estimate sample size
- False positives: too many dependent tests
  - Adjust for number of tests
  - False discovery rate (FDR)
  - Bayesian approach
    - Takes the prior probability of association into account: False-Positive Report Probability (FPRP)
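The FPRP combines the significance level (or observed p-value), the power, and the prior probability that the association is real. A small sketch of the standard formula, with a hypothetical function name and illustrative inputs:

```python
def fprp(alpha, power, prior):
    """False-positive report probability.

    alpha: significance level (or observed p-value)
    power: probability of detecting a true association
    prior: prior probability that the tested association is real
    """
    false_pos = alpha * (1 - prior)  # null SNPs that reach significance
    true_pos = power * prior         # real associations that reach significance
    return false_pos / (false_pos + true_pos)
```

With alpha = 0.05 and 80% power, a prior of 0.5 gives an FPRP of about 0.06, while a prior of 0.001 (closer to a randomly chosen SNP in a genome-wide scan) gives an FPRP above 0.98, which is why nominal significance alone is weak evidence in GWA.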

Confirmation in independent study populations
- The above approaches may limit the number of false positives
- Confirmation is needed to distinguish true from false positives
  - Replication: examine the results from the 2nd stage only
  - Joint analysis: combine data from the 1st stage with the 2nd stage
  - Multiple stages

Replication vs. joint analysis (Skol, 2006)

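Multiple stages
Table columns: # of true sig. SNPs (80% power) | # of total sig. SNPs (α = 0.01) | # of risk SNPs | # of SNPs tested | % of true sig. SNPs
Table rows: 1st stage | 2nd stage | 3rd stage
Most cell values did not survive extraction; the 2nd-stage row retains only "16" and "5,".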
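The arithmetic behind a multi-stage design can be sketched as below. The starting values (20 risk SNPs among 500,000 tested) are illustrative assumptions, not values recovered from the slide; only the 80% power and α = 0.01 come from the column headers. Each stage passes its significant SNPs on to the next.

```python
def stage_summary(n_risk, n_tested, power=0.8, alpha=0.01):
    """Expected counts after one stage of testing.

    Returns (true significant, total significant, fraction of significant that are true).
    """
    true_sig = power * n_risk
    false_sig = alpha * (n_tested - n_risk)
    total_sig = true_sig + false_sig
    return true_sig, total_sig, true_sig / total_sig

def run_stages(n_risk, n_tested, n_stages=3, power=0.8, alpha=0.01):
    """Chain stages: SNPs significant in one stage are the ones tested in the next."""
    results = []
    for _ in range(n_stages):
        true_sig, total_sig, frac = stage_summary(n_risk, n_tested, power, alpha)
        results.append((n_risk, n_tested, true_sig, total_sig, frac))
        n_risk, n_tested = true_sig, total_sig
    return results

# Hypothetical starting point: 20 risk SNPs among 500,000 tested.
for row in run_stages(20, 500_000):
    print(row)
```

Under these assumed inputs the fraction of significant SNPs that are true rises sharply from stage to stage, while the number of true SNPs retained shrinks by the power at each step.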

Increase the magnitude of effects of a specific gene
- Increase their effects by focusing on a subset of study subjects
  - Cases with a uniform phenotype, e.g. aggressive or early onset

Study aggressive cases

Increase the magnitude of effects of a specific gene
- Increase their effects by focusing on a subset of study subjects
  - Cases with a uniform phenotype, e.g. aggressive or early onset
  - Cases with family history

Study cases with family history (Antoniou and Easton, 2003)

Increase the magnitude of effects of a specific gene
- Increase their effects by focusing on a subset of study subjects
  - Cases with a uniform phenotype, e.g. aggressive or early onset
  - Cases with family history
  - Controls that are disease free

Disease free controls

Increase the magnitude of effects of a specific gene
- Increase their effects by focusing on a subset of study subjects
  - Cases with a uniform phenotype, e.g. aggressive or early onset
  - Cases with family history
  - Controls that are disease free
- Increase their effects by studying a homogeneous population
  - Lower levels of genetic heterogeneity

Summary
- GWA studies are promising but difficult
- There are many important issues in GWA
- The impact of these issues can be minimized by a well-designed study