Lecture 7 From GWAS to EWAS & Interpretation of epigenetic data


Andrea Baccarelli, MD, PhD, MPH
Laboratory of Environmental Epigenetics, Harvard School of Public Health
abaccare@hsph.harvard.edu

Genetics
- Candidate gene approach: a priori knowledge → candidate genes → test for association with disease/phenotype
- Genome-wide approach (GWAS): agnostic approach → entire genome

Graphical representation of GWAS findings: Manhattan plot
(Figure: Manhattan plot for systemic sclerosis, an autoimmune disease; Radstake et al., Nature Genetics 2010)
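A Manhattan plot places each tested SNP (or CpG site) along the genome on the x-axis, chromosomes laid end to end, with -log10(p) on the y-axis, so the strongest associations stand out as tall "skyscrapers". The minimal Python sketch below computes those coordinates. The input format (chromosome, position, p-value tuples plus a dictionary of chromosome lengths) and the function names are hypothetical; the 5 × 10^-8 line is the genome-wide threshold used for the published GWA catalog. Drawing the points is left to any plotting library.

```python
import math

# Genome-wide significance threshold used for the published GWA catalog
GENOME_WIDE_P = 5e-8


def manhattan_coordinates(results, chrom_lengths):
    """Map (chromosome, position, p_value) tuples to Manhattan-plot (x, y) points.

    x is the cumulative genomic position (chromosomes laid end to end),
    y is -log10(p_value). Assumes chromosome keys sort in genomic order
    (e.g., integers).
    """
    # Starting offset of each chromosome on the concatenated x-axis
    offsets = {}
    running = 0
    for chrom in sorted(chrom_lengths):
        offsets[chrom] = running
        running += chrom_lengths[chrom]
    return [(offsets[chrom] + pos, -math.log10(p)) for chrom, pos, p in results]


def genome_wide_hits(results, threshold=GENOME_WIDE_P):
    """Return the results whose p-value reaches the significance threshold."""
    return [r for r in results if r[2] <= threshold]


# Toy input: two short hypothetical chromosomes
lengths = {1: 1_000, 2: 500}
toy = [(1, 100, 0.01), (1, 900, 3e-9), (2, 50, 0.2)]
points = manhattan_coordinates(toy, lengths)
hits = genome_wide_hits(toy)
```

Only the SNP at p = 3 × 10^-9 clears the threshold here; on a real plot it would appear above the horizontal line at -log10(5 × 10^-8) ≈ 7.3.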

Published genome-wide associations through 12/2013
Published GWA at p ≤ 5 × 10^-8 for 17 trait categories (NHGRI GWA Catalog: www.genome.gov/GWAStudies, www.ebi.ac.uk/fgpt/gwas/)

Epigenetics
- Candidate gene (gene-specific) approach: a priori knowledge → candidate genes → test for association with exposure/risk factor and test for association with disease/phenotype
- Global (average) level of methylation (5mC content): average methylation of all CpG sites across the genome
- Epigenome-wide approach (EWAS): agnostic approach → entire genome

Examples for DNA methylation
- Candidate gene approach: AAB’s blood has 26% methylation in the IL6 promoter (N.B.: any other region of interest can be targeted, e.g., a CpG island shore, shelf, etc.)
- Global methylation approach: AAB’s blood has 4.5% methylation (i.e., 4.5% of all cytosines found in blood are methylated; no information on where the methylated cytosines are located)
- Genome-wide approach: methylation in AAB’s blood is measured at a high number of CpG sites (e.g., if we use the Illumina Infinium 450K BeadChip, we get ≈486,000 numbers [one for each CpG site] for AAB’s blood)

GWAS/EWAS: screen for 100Ks to millions of loci
- GWAS: single nucleotide polymorphisms (SNPs)
- EWAS: CpG sites
The EWAS field is relatively new; several tools and methods are borrowed from GWAS.

Features covered in the 450K Infinium BeadChip
The 450K BeadChip covers a total of 77,537 CpG islands and CpG shores (N + S) and a total of 20,617 genes.

Region type          Regions   CpG sites covered on 450K BeadChip   Average # of CpG sites per region
CpG Island            26,153   139,265                              5.08
N Shore               25,770    73,508                              2.74
S Shore               25,614    71,119                              2.66
N Shelf               23,896    49,093                              1.97
S Shelf               23,968    48,524                              1.94
Remote/Unassigned          -   104,926                              -
Total                          485,553

(Figure: schematic of N Shelf, N Shore, CpG Island, S Shore and S Shelf, and of the gene regions TSS1500, TSS200, 5’ UTR and 3’ UTR.)

GWAS vs. EWAS
Type of data
- GWAS: a SNP can assume only 3 values: 0 (wt/wt), 1 (wt/var), 2 (var/var)
- EWAS: measures are quantitative, e.g., the Illumina Infinium β value, between 0 and 1
Changes over time
- GWAS: SNPs (almost) never change
- EWAS: epigenetic marks change over time
Tissue specificity
- GWAS: SNPs are not tissue specific
- EWAS: epigenetic marks are tissue specific
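To make the "type of data" contrast concrete, the sketch below codes a SNP genotype as 0/1/2 copies of the variant allele, and turns the methylated (M) and unmethylated (U) probe intensities of one CpG site into a β value, plus the log2-ratio M-value that is often used for statistical testing. The offsets (100 for β, following Illumina's convention, and 1 for the M-value), the function names and the intensity values are assumptions for illustration.

```python
import math


def genotype_code(allele1: str, allele2: str, variant: str) -> int:
    """GWAS data: number of variant alleles, i.e. 0 (wt/wt), 1 (wt/var) or 2 (var/var)."""
    return int(allele1 == variant) + int(allele2 == variant)


def beta_value(meth: float, unmeth: float, offset: float = 100.0) -> float:
    """EWAS data: methylated fraction M / (M + U + offset), between 0 and 1."""
    m, u = max(meth, 0.0), max(unmeth, 0.0)  # negative intensities clipped to zero
    return m / (m + u + offset)


def m_value(meth: float, unmeth: float, alpha: float = 1.0) -> float:
    """log2((M + alpha) / (U + alpha)); unbounded, 0 means equal signals."""
    m, u = max(meth, 0.0), max(unmeth, 0.0)
    return math.log2((m + alpha) / (u + alpha))


print(genotype_code("A", "G", variant="G"))  # heterozygote -> 1
print(beta_value(9_000, 900))                # 9000 / 10000 -> 0.9
print(m_value(2_047, 1_023))                 # log2(2048 / 1024) -> 1.0
```

The genotype takes one of only three values, while β is a continuous proportion; the M-value spreads values near 0 and 1 out, which is why it is sometimes preferred for modeling.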

Volcano plot
(Figure: differences between liver cancer cases and controls; Shen et al., Hepatology 2012)

Multiple comparisons: Infinium 450K methylation BeadChip
- Methylation is measured at 485,553 CpG sites, so we will run 485,553 statistical tests. Any problem with that?
- If you conduct 20 tests at α = 0.05, one is expected to be significant (positive) by chance at p < 0.05
- If you conduct 485,553 tests, about 24,277 are expected to be significant (positive) by chance at p < 0.05
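The arithmetic behind this slide is the expected number of false positives, m × α, for m tests of true null hypotheses. The sketch also computes the chance of at least one false positive, 1 - (1 - α)^m, which assumes independent tests; real CpG sites are correlated, so treat that figure as an approximation. Function names are illustrative.

```python
def expected_false_positives(n_tests: int, alpha: float = 0.05) -> float:
    """Expected number of p < alpha results when every null hypothesis is true."""
    return n_tests * alpha


def prob_any_false_positive(n_tests: int, alpha: float = 0.05) -> float:
    """P(at least one false positive), assuming independent tests."""
    return 1 - (1 - alpha) ** n_tests


for m in (20, 485_553):
    print(m, expected_false_positives(m), round(prob_any_false_positive(m), 3))
```

For 485,553 tests the expectation is 24,277.65, which the slide rounds down to 24,277; even with only 20 tests, the chance of at least one false positive is already about 64%.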

Statistical corrections for multiple comparisons
Bonferroni correction
- Multiple tests inflate the cumulative (family-wise) α
- Dividing α by the number of tests (485,553) controls this inflation
- The threshold for significance is commonly set at p = 0.05/485,553 ≈ 1.0 × 10^-7
False discovery rate (FDR)
- Focuses on positive (significant) findings at a "nominal", uncorrected p-value
- FDR is the proportion of false positives among all positive findings
- FDR-controlling procedures have been developed to control the expected proportion of false positives (e.g., Benjamini-Hochberg)
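Both corrections can be written in a few lines. This Python sketch (function names are illustrative) computes the Bonferroni per-test threshold and implements the standard Benjamini-Hochberg step-up procedure: sort the p-values, find the largest rank k with p(k) ≤ (k/m) × q, and reject the k smallest. In practice a tested library routine, such as `multipletests` in statsmodels, would normally be used instead.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test p-value cut-off that keeps the family-wise error rate at alpha."""
    return alpha / n_tests


def benjamini_hochberg(p_values, q: float = 0.05):
    """Benjamini-Hochberg step-up procedure.

    Returns a list of booleans (True = reject / call significant) that controls
    the expected false discovery rate at level q.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p
    # Find the largest rank k with p_(k) <= (k / m) * q
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * q:
            k_max = rank
    # Reject every hypothesis ranked at or below k_max
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        rejected[idx] = rank <= k_max
    return rejected


print(bonferroni_threshold(0.05, 485_553))  # about 1.03e-07

pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205]
print(benjamini_hochberg(pvals))  # only the two smallest p-values are rejected
```

Bonferroni on the same eight p-values (threshold 0.05/8 = 0.00625) would keep only the smallest one, illustrating why FDR control is less conservative.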

Proportion of false positives

                         True association: YES    True association: NO
Positive (significant)   True Positive (TP)       False Positive (FP)
Negative                 False Negative (FN)      True Negative (TN)

- P-value threshold (α) = FP / (TN + FP): probability of a false-positive finding under the null hypothesis (i.e., no true association)
- FDR = FP / (TP + FP): if I have a number X of significant p-values, how many are false positives? (proportion of false positives)
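A worked example with hypothetical counts shows why the false-positive rate and the FDR can be very different. Assume 10,000 tests, of which 100 are true associations detected with 80% power, and a p < 0.05 threshold applied to the 9,900 null tests; all numbers are invented for illustration.

```python
def false_positive_rate(fp: int, tn: int) -> float:
    """FP / (TN + FP): share of true nulls that come out significant."""
    return fp / (tn + fp)


def false_discovery_rate(fp: int, tp: int) -> float:
    """FP / (TP + FP): share of significant results that are false."""
    return fp / (tp + fp)


# Hypothetical 2x2 table counts
tp, fn = 80, 20        # 100 true associations, 80% detected
fp, tn = 495, 9_405    # 9,900 null tests, 5% significant by chance

print(false_positive_rate(fp, tn))   # 0.05
print(false_discovery_rate(fp, tp))  # about 0.86
```

Even though only 5% of the null tests are significant, about 86% of the significant results are false positives, because null tests vastly outnumber true associations.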

Learning from past experience (in genetics): relative odds of alcohol dependency associated with the Taq1A polymorphism
(Figure: odds ratio as a function of publication year, 1990 to 2004; original OR = 8.7, final OR = 1.4. Smith et al. (2008) American Journal of Epidemiology, 167(2): 125-138.)

The winner’s curse
- On eBay: given the lack of information on the true value of the item being auctioned, there is high variance in the estimated (dollar) values, with many over- and many under-estimates (bids). The “winner” is likely to have made the largest overestimate of value, i.e., he or she is paying (way) too much.
- In genetics: the winner’s curse has been common; the first report of an association of genetic variation with disease is likely to overestimate the effect size.
- In epigenetics: does the same apply?
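A small simulation illustrates the winner's curse in the genetic setting: when only studies (or loci) that pass a significance threshold get reported, their average estimated effect overstates the true effect. The normal sampling model, the true effect of 0.1, the standard error of 0.05 and the function name are arbitrary assumptions; whether the same holds for epigenetic findings is the open question on the slide.

```python
import random
import statistics


def winners_curse(true_effect=0.1, se=0.05, n_studies=20_000, z_crit=1.96, seed=1):
    """Simulate many equally sized studies of the same true effect.

    Each study's estimate is drawn from Normal(true_effect, se). Returns the mean
    estimate over all studies and over the "winners" that reach |z| > z_crit.
    """
    rng = random.Random(seed)
    estimates = [rng.gauss(true_effect, se) for _ in range(n_studies)]
    winners = [b for b in estimates if abs(b / se) > z_crit]
    return statistics.mean(estimates), statistics.mean(winners)


all_mean, winner_mean = winners_curse()
print(f"mean of all estimates: {all_mean:.3f}")             # close to the true 0.1
print(f"mean of significant estimates: {winner_mean:.3f}")  # noticeably larger
```

Selecting on significance is what creates the bias: the unselected estimates are unbiased, but the ones that "win" tend to be those that happened to land high.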

Replication is needed (Hirschhorn & Daly, Nat. Rev. Genet. 6: 95, 2005; NCI-NHGRI Working Group on Replication, Nature 447: 655, 2007)

Strategies for discovery and replication
- We will review different approaches for discovery and replication
- Examples from published studies, and from EWAS when available
- The same concepts apply to both EWAS and GWAS

EWAS validation: study design
- Discovery only (single study): prone to false-positive findings (and false negatives too)

- 66 cases of hepatocellular carcinoma (HCC) assessed using the 450K BeadChip
- Differences in methylation in cancer tissues vs. adjacent non-cancer tissues
- Bonferroni-corrected p value ≤ 0.05, corresponding to a raw p value ≤ 1.06 × 10^-7
- After Bonferroni adjustment, a total of 130,512 CpG sites differed significantly in methylation level in tumor compared with non-tumor tissues, with 28,017 CpG sites hypermethylated and 102,495 hypomethylated in tumor tissues

Additional filtering
Hypermethylated and hypomethylated sites: mean difference in methylation, tumor vs. normal, > 20%
Hypermethylated sites:
- > 70% of the tumor tissues have methylation > 2 SDs above the mean methylation level of all 66 adjacent tissues
- mean methylation for adjacent tissues < 25%
Hypomethylated sites:
- > 70% of the tumor tissues have methylation > 2 SDs below the mean methylation level of all 66 adjacent tissues
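The filtering rules on this slide can be expressed as a per-site check. The sketch below is one reading of those criteria, assuming β values on a 0-1 scale (so "20%" becomes 0.20), the sample standard deviation of the adjacent tissues, and the 25% adjacent-tissue ceiling applied to hypermethylated sites only. The function name and toy values are hypothetical, and the original study's exact implementation may differ.

```python
import statistics


def classify_site(tumor, adjacent, min_diff=0.20, min_fraction=0.70, max_adjacent_mean=0.25):
    """Apply the slide's extra filters to one CpG site.

    tumor, adjacent: beta values (0-1) for the tumor and adjacent non-tumor samples.
    Returns "hyper", "hypo" or None.
    """
    mu_adj = statistics.mean(adjacent)
    sd_adj = statistics.stdev(adjacent)     # sample SD of adjacent tissues
    diff = statistics.mean(tumor) - mu_adj  # mean tumor-minus-adjacent difference
    n = len(tumor)
    frac_above = sum(b > mu_adj + 2 * sd_adj for b in tumor) / n
    frac_below = sum(b < mu_adj - 2 * sd_adj for b in tumor) / n
    if diff > min_diff and frac_above > min_fraction and mu_adj < max_adjacent_mean:
        return "hyper"
    if -diff > min_diff and frac_below > min_fraction:
        return "hypo"
    return None


adjacent_low = [0.10, 0.12, 0.11, 0.09, 0.13]
print(classify_site([0.60, 0.70, 0.50, 0.65, 0.12], adjacent_low))  # hyper
```

A site whose adjacent tissues already average above 25% is not called hypermethylated under this reading, even if the tumors are much higher.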

EWAS validation: study design (continued)
- Internal replication: sample two or more groups from the same population; group 1: EWAS; other groups: candidate gene analysis. Overall power is lower than that of a discovery-only study of the same size (Skol AD, Nat Genet 2006).

Internal replication: all subjects from the ESTHER cohort in Germany
- Discovery: 177 participants from ESTHER (27K Infinium methylation BeadChip analysis)
- Replication: 316 participants from ESTHER (Sequenom MassARRAY)

Discovery and replication groups

Discovery

Discovery → validation → replication (top gene)

EWAS validation: study design (continued)
- Discovery → external (independent) replication: two (or more) independent studies; ensures validation and generalizability
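One common way to decide whether a discovery hit replicates in an independent study is to require the same direction of effect and a replication p-value below a Bonferroni threshold over the number of hits taken forward. The slides do not specify the criterion used in the studies discussed here, so the rule, the function names and the toy values below are assumptions for illustration.

```python
def replicates(discovery_effect, replication_effect, replication_p, n_hits, alpha=0.05):
    """True if the effect direction agrees and replication_p < alpha / n_hits."""
    same_direction = (discovery_effect > 0) == (replication_effect > 0)
    return same_direction and replication_p < alpha / n_hits


def replication_summary(hits, alpha=0.05):
    """hits: list of (discovery_effect, replication_effect, replication_p).

    Returns (number replicated, number tested).
    """
    n = len(hits)
    n_rep = sum(replicates(d, r, p, n, alpha) for d, r, p in hits)
    return n_rep, n


# Hypothetical hits carried forward to an independent cohort
toy_hits = [(-0.02, -0.015, 1e-5), (-0.01, 0.004, 1e-6), (0.03, 0.02, 0.01)]
print(replication_summary(toy_hits))  # the second hit fails on direction
```

Requiring a concordant direction guards against counting a significant but opposite effect as a replication.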

- Discovery: cord blood and peripheral blood samples from 1018 ALSPAC child-mother pairs (450K Infinium methylation BeadChip analysis)
- External replication:
  - the WMHP and CANDLE cohorts (27K Infinium methylation BeadChip analysis)
  - the NB and MoBa cohorts (450K Infinium methylation BeadChip analysis)
  - a case-control study (450K Infinium methylation BeadChip analysis)

Discovery → replication
Gestational age (GA): 224 top hits
- GA had a negative association with methylation at 188 probes and a positive association at 36 probes
- 129 replicated in the NB cohort and 5 in the WMHP and CANDLE cohorts
- 72 had previously been reported in the case-control study
Birth weight: 23 associations observed between birth weight and cord blood methylation in the discovery study
- 2 of the 23 replicated in the MoBa cohort

EWAS validation: study design (continued)
- Meta-analysis: uses estimates from multiple populations; needed to achieve a large sample size; allows for evaluating generalizability

- 44,494 participants of European ancestry from nine large studies participating in the Cohorts for Heart and Aging Research in Genomic Epidemiology (CHARGE) Consortium
- Seven additional studies
- Each study computes association statistics (e.g., ORs and p-values), then the results are meta-analyzed
- Only results (not data) are shared
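Because only summary results are shared, each study contributes an effect estimate (for example a log odds ratio or regression β) and its standard error. Below is a minimal sketch of fixed-effect, inverse-variance-weighted meta-analysis, one common choice, together with Cochran's Q and I² to gauge heterogeneity across studies (the spread a forest plot displays visually). The slides do not state which method CHARGE used, and the function name and input values are hypothetical.

```python
import math


def fixed_effect_meta(estimates, std_errors):
    """Inverse-variance fixed-effect meta-analysis of per-study estimates.

    Returns (pooled estimate, pooled SE, two-sided p-value, I^2).
    """
    weights = [1.0 / se ** 2 for se in std_errors]
    total_w = sum(weights)
    pooled = sum(w * b for w, b in zip(weights, estimates)) / total_w
    pooled_se = math.sqrt(1.0 / total_w)
    z = pooled / pooled_se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # = 2 * (1 - Phi(|z|))
    # Cochran's Q and I^2 quantify between-study heterogeneity
    q = sum(w * (b - pooled) ** 2 for w, b in zip(weights, estimates))
    df = len(estimates) - 1
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    return pooled, pooled_se, p_value, i2


# Hypothetical log odds ratios and standard errors from three studies
log_ors = [math.log(1.20), math.log(1.10), math.log(1.30)]
ses = [0.05, 0.08, 0.10]
pooled, se, p, i2 = fixed_effect_meta(log_ors, ses)
print(f"pooled OR = {math.exp(pooled):.2f}, p = {p:.2g}, I2 = {i2:.0%}")
```

Larger studies (smaller standard errors) get more weight; a random-effects model would be the usual alternative when I² is high.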

Results for intima-media thickness

Forest plot for ZHX2 – rs11781551 (zinc fingers and homeoboxes 2)

Potential biases in GWAS/EWAS

Population stratification*
Each population has a unique genetic and social history; ancestral patterns of migration, mating, expansions/bottlenecks and stochastic variation all yield differences in allele frequencies between populations.
Population stratification: cases and controls have different allele frequencies because of diversity in their populations of origin, unrelated to the outcome. This requires differences between the subpopulations in both:
1) disease prevalence
2) allele frequencies
*Cardon LR, Palmer LJ, Lancet 2003

What is population stratification? Balding, Nature Reviews Genetics 2010

Unlinked genetic markers in population stratification
- Population stratification (or any non-random mating) allows marker-allele frequencies to vary among population segments.
- A disease more prevalent in one subpopulation will be associated with any allele at high frequency in that subpopulation.
- If population stratification exists, it can often be detected by analysis of unlinked marker loci [Pritchard JK, Rosenberg NA; AJHG 1999; 65:220-228].
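The two requirements (different disease prevalence and different allele frequencies across subpopulations) can be checked with a deterministic toy example. Two hypothetical subpopulations of 1,000 people each show no association at all within themselves (cases and controls share the same allele frequency), yet pooling them produces a spurious allelic odds ratio of about 1.75. All sizes, prevalences, frequencies and function names are invented for illustration.

```python
def allele_odds_ratio(case_freq: float, control_freq: float) -> float:
    """Allelic odds ratio comparing variant-allele frequency in cases vs controls."""
    return (case_freq / (1 - case_freq)) / (control_freq / (1 - control_freq))


def pooled_freq(groups):
    """groups: list of (number_of_people, allele_frequency); returns the weighted frequency."""
    total = sum(n for n, _ in groups)
    return sum(n * f for n, f in groups) / total


# Hypothetical subpopulations: (size, disease prevalence, variant allele frequency)
populations = [(1_000, 0.20, 0.6), (1_000, 0.05, 0.2)]

# Within each subpopulation, cases and controls have the same allele frequency (OR = 1)
within_ors = [allele_odds_ratio(f, f) for _, _, f in populations]

# Pooling ignores ancestry: cases come disproportionately from the high-prevalence group
cases = [(n * prev, f) for n, prev, f in populations]
controls = [(n * (1 - prev), f) for n, prev, f in populations]
pooled_or = allele_odds_ratio(pooled_freq(cases), pooled_freq(controls))

print(within_ors)           # [1.0, 1.0]
print(round(pooled_or, 2))  # 1.75
```

If either ingredient is removed (equal prevalence, or equal allele frequency in the two groups), the pooled odds ratio returns to 1.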

Adjusting for population stratification in a GWAS of T2DM*
- Case-control study of 661 cases of T2DM and 614 controls from France; genotyping assayed 392,935 SNPs
- SNP 200 kb from the lactase gene on 2q21: strong association with T2DM
- Strong north-south prevalence gradient in France
- 20,323 SNPs not related to T2DM were used as a measure of population stratification
- After adjustment for stratification, most of the association was removed
*Sladek R et al. Nature 2007; 445: 881-885.
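The study above used many SNPs unrelated to T2DM to measure stratification. A related and widely used diagnostic, swapped in here rather than taken from that study, is genomic control: the inflation factor λ is the median of the observed 1-df χ² association statistics divided by the median expected under the null (about 0.455), and the statistics are divided by λ when λ > 1. The simulated statistics and function names below are hypothetical.

```python
import random
import statistics

CHI2_1DF_MEDIAN = 0.4549364  # median of a chi-square distribution with 1 df


def genomic_control_lambda(chi2_stats):
    """Inflation factor: observed median chi-square / expected null median."""
    return statistics.median(chi2_stats) / CHI2_1DF_MEDIAN


def genomic_control_adjust(chi2_stats):
    """Deflate the statistics by lambda (never inflate them, so lambda is floored at 1)."""
    lam = max(1.0, genomic_control_lambda(chi2_stats))
    return [x / lam for x in chi2_stats], lam


rng = random.Random(0)
null_stats = [rng.gauss(0, 1) ** 2 for _ in range(50_000)]  # squared N(0,1) ~ chi-square(1)
inflated = [1.5 * x for x in null_stats]                     # mimic stratification-driven inflation

print(round(genomic_control_lambda(null_stats), 2))               # close to 1
adjusted, lam = genomic_control_adjust(inflated)
print(round(lam, 2), round(genomic_control_lambda(adjusted), 2))  # about 1.5, then 1.0
```

Genomic control applies one uniform correction to every marker; methods that model ancestry explicitly (e.g., with principal components of unrelated markers) can instead correct markers differently according to how strongly their frequencies differ between subpopulations.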

Sources of analytical variability for methylation EWAS
Several factors can affect results:
- DNA/sample quality
- Plate effects
- Batch effects
- Row/column effects
How to handle them:
- Best laboratory practice
- Randomize/balance samples
- Universal DNA/replicates
- Bioinformatics/statistical analysis
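"Randomize/balance samples" can be as simple as shuffling cases and controls separately and dealing them across plates in turn, so that plate (batch) is not confounded with case status, then shuffling the order within each plate to spread samples over rows and columns. The function below is a hypothetical sketch; real designs also balance other variables (e.g., sex, age) and respect the actual capacity of each plate or chip.

```python
import random


def balanced_plate_layout(sample_ids, is_case, n_plates, seed=42):
    """Assign samples to plates so each plate gets a similar case/control mix.

    sample_ids: list of sample identifiers; is_case: dict id -> bool.
    Returns dict plate_index -> list of sample ids in (randomized) processing order.
    """
    rng = random.Random(seed)
    cases = [s for s in sample_ids if is_case[s]]
    controls = [s for s in sample_ids if not is_case[s]]
    rng.shuffle(cases)
    rng.shuffle(controls)
    plates = {p: [] for p in range(n_plates)}
    # Deal cases, then controls, round-robin across plates
    for i, s in enumerate(cases + controls):
        plates[i % n_plates].append(s)
    # Randomize position within each plate (row/column effects)
    for p in plates:
        rng.shuffle(plates[p])
    return plates


ids = [f"S{i:02d}" for i in range(96)]
status = {s: i < 48 for i, s in enumerate(ids)}  # hypothetical: first 48 are cases
layout = balanced_plate_layout(ids, status, n_plates=2)
for plate, samples in layout.items():
    print(plate, len(samples), sum(status[s] for s in samples), "cases")
```

With 48 cases and 48 controls on two plates, each plate receives 24 of each, so a plate effect cannot masquerade as a case/control difference.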

Is DNA collected and handled identically in cases and controls?
- T1DM gene association study: cases from the GRID Study, controls from the 1958 British Birth Cohort; 6322 SNPs examined
- Samples from lymphoblastoid cell lines, extracted using the same protocol in two different laboratories
- Case and control DNAs randomly ordered, with teams masked to case/control status
- Some extreme associations could not be replicated by a second genotyping method
Clayton DG et al., Nat Genet 2005; 37: 1243-46.

Interpretation of epigenetic data

In-class readings: papers
- Lee et al. Quantitative promoter hypermethylation analysis of RASSF1A in lung cancer: Comparison with methylation-specific PCR technique and clinical significance. Mol Med Report 2011.
- Joubert et al. 450K Epigenome-Wide Scan Identifies Differential DNA Methylation in Newborns Related to Maternal Smoking during Pregnancy. Environ Health Perspect 2012.

In-class readings: questions
- DNA methylation analysis: Which technique was used? How much DNA was used? Did it involve bisulfite treatment?
- Aim of the study: What was measured? Why?
- Results: How were DNA methylation results reported? Which statistical analysis was used?

Next lecture
Guest lectures: Reproductive Epigenetics and Prenatal Influences on the Epigenome
- Karin Michels, PhD, ScD, Co-Director, Ob/Gyn Epidemiology Center, BWH
- Heather Herson Burris, MD, MPH, Neonatology, BIDMC