Lecture 7 From GWAS to EWAS & Interpretation of epigenetic data

1 Lecture 7 From GWAS to EWAS & Interpretation of epigenetic data
Andrea Baccarelli, MD, PhD, MPH
Laboratory of Environmental Epigenetics, Harvard School of Public Health

2 Candidate gene approach
Genetics
Candidate gene approach: a priori knowledge → candidate genes; test for association with disease/phenotype
Genome-wide approach (GWAS): agnostic approach → entire genome

3

4 Graphical representation of GWAS findings Manhattan plot
Systemic Sclerosis (auto-immune disease) Radstake et al., Nature Genetics 2010

5 Published Genome-Wide Associations through 12/2013
Published GWA at p ≤ 5 × 10⁻⁸ for 17 trait categories (NHGRI GWA Catalog)

6 Candidate gene (gene-specific) approach
Epigenetics
Candidate gene (gene-specific) approach: a priori knowledge → candidate genes; test for association with exposure/risk factor; test for association with disease/phenotype
Global (average) level of methylation (5mC content): average methylation of all CpG sites across the genome
Epigenome-wide approach (EWAS): agnostic approach → entire genome

7 Examples for DNA methylation
Candidate gene approach: AAB’s blood has 26% methylation in the IL6 promoter (N.B.: any other region of interest can be targeted, e.g., CpG island shore, shelf, etc.)
Global methylation approach: AAB’s blood has 4.5% methylation (i.e., 4.5% of all cytosines found in blood are methylated; no information on where the methylated cytosines are located)
Genome-wide approach: methylation in AAB’s blood is measured at a large number of CpG sites (e.g., with the Illumina Infinium 450K BeadChip we get ≈486,000 values, one for each CpG site, for AAB’s blood)

8 Screen for 100Ks to millions of loci:
GWAS/EWAS
Screen hundreds of thousands to millions of loci:
  GWAS: single nucleotide polymorphisms (SNPs)
  EWAS: CpG sites
The EWAS field is relatively new
Several tools and methods are adapted from GWAS

9 Features covered in the 450k Infinium BeadChip
The 450K BeadChip covers a total of 77,537 CpG Islands and CpG Shores (N+S)
Region type          Regions    CpG sites covered on 450K array    Average # of CpG sites per region
CpG Island           26,153     139,265                            5.08
N Shore              25,770     73,508                             2.74
S Shore              25,614     71,119                             2.66
N Shelf              23,896     49,093                             1.97
S Shelf              23,968     48,524                             1.94
Remote/Unassigned    -          104,926                            -
Total                           485,553
[Diagram: N Shelf, N Shore, CpG Island, S Shore, S Shelf; gene features 5’ UTR, 3’ UTR, TSS1500, TSS200]
The 450K BeadChip covers a total of 20,617 genes

10 GWAS vs. EWAS
Type of data
  GWAS: a SNP can assume only 3 values: 0 (wt/wt); 1 (wt/var); 2 (var/var)
  EWAS: measures are quantitative, e.g., Illumina Infinium β value between 0 and 1
Changes over time
  GWAS: SNPs (almost) never change
  EWAS: epigenetic marks change over time
Tissue specificity
  GWAS: SNPs are not tissue specific
  EWAS: epigenetic marks are tissue specific

11 Volcano plot: differences between liver cancer cases and controls
Shen Hepatology 2012

12 Multiple comparisons
Infinium 450K methylation BeadChip
Methylation measured at 485,553 CpG sites
We will do 485,553 statistical tests
Any problem with that?
If you conduct 20 tests at α = 0.05 and no true association exists: expect one significant (positive) result by chance at p < 0.05
If you conduct 485,553 tests: expect about 24,278 significant (positive) results by chance at p < 0.05
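The arithmetic on this slide can be checked with a short sketch. The function names are illustrative, and the second function assumes the tests are independent, which is a simplification for neighbouring CpG sites.

```python
def expected_false_positives(n_tests: int, alpha: float = 0.05) -> float:
    """Expected number of tests with p < alpha when every null hypothesis is true."""
    return n_tests * alpha


def prob_at_least_one(n_tests: int, alpha: float = 0.05) -> float:
    """Chance of one or more false positives, assuming independent tests."""
    return 1.0 - (1.0 - alpha) ** n_tests


for n in (20, 485_553):
    print(n, expected_false_positives(n), prob_at_least_one(n))
```

With 20 tests the expected count is one false positive; with 485,553 tests it is roughly 24,278, and at least one false positive is practically certain.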

13 Statistical corrections for multiple comparisons
Bonferroni correction
  Multiple tests inflate the cumulative α
  Dividing α by the number of tests (α/485,553) keeps the cumulative α at or below the nominal level
  Threshold for significance commonly set at p = 0.05/485,553 ≈ 1.0 × 10⁻⁷
False discovery rate (FDR)
  Focuses on positive (significant) findings at a ‘nominal’ uncorrected p-value
  FDR is the proportion of false positives among all positive findings
  FDR-controlling procedures have been developed to control the expected proportion of false positives (e.g., Benjamini-Hochberg)
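A minimal sketch of the two corrections named on this slide, written with NumPy. The function names and the ten example p-values are made up for illustration; in practice an established statistics package would normally be used.

```python
import numpy as np


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold under the Bonferroni correction."""
    return alpha / n_tests


def benjamini_hochberg(p_values, q: float = 0.05) -> np.ndarray:
    """Boolean mask of tests rejected by the Benjamini-Hochberg step-up procedure at FDR level q."""
    p = np.asarray(p_values, dtype=float)
    m = p.size
    order = np.argsort(p)
    ranked = p[order]
    # Find the largest rank k with p_(k) <= (k / m) * q, then reject ranks 1..k.
    below = ranked <= (np.arange(1, m + 1) / m) * q
    rejected = np.zeros(m, dtype=bool)
    if below.any():
        k = np.max(np.nonzero(below)[0])
        rejected[order[: k + 1]] = True
    return rejected


# Hypothetical p-values, for illustration only.
p_example = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
print(bonferroni_threshold(0.05, 485_553))
print(benjamini_hochberg(p_example).sum())
```

On the example p-values, Bonferroni at 0.05/10 keeps only the smallest one, while Benjamini-Hochberg at q = 0.05 keeps the two smallest, which shows why FDR control is the less conservative of the two.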

14 (Proportion of false positives)
                    True association
                    YES               NO
Test positive       True Positive     False Positive
Test negative       False Negative    True Negative
P-value = FP / (TN + FP): probability of a false positive finding under the null hypothesis (i.e., no true association)
FDR = FP / (TP + FP): if I have a number X of significant p-values, how many are false positives? (Proportion of false positives)
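The two ratios on this slide use different denominators, which a small worked example makes concrete. The counts below are hypothetical: they assume 1,000 truly associated sites out of 485,553, 80% power, and testing at an uncorrected p < 0.05.

```python
def false_positive_rate(fp: int, tn: int) -> float:
    """FP / (TN + FP): share of truly null sites that test positive."""
    return fp / (tn + fp)


def false_discovery_rate(fp: int, tp: int) -> float:
    """FP / (TP + FP): share of positive findings that are false."""
    return fp / (tp + fp)


# Hypothetical counts, for illustration only.
tp, fn = 800, 200            # 1,000 truly associated sites, 80% detected
fp, tn = 24_228, 460_325     # 484,553 null sites, about 5% positive by chance

print(false_positive_rate(fp, tn))
print(false_discovery_rate(fp, tp))
```

Under these assumptions the false positive rate is about 5%, yet about 97% of the significant findings are false, which is why an uncorrected threshold is not usable at this scale.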

15 Learning from past experience (in genetics)
Relative odds of alcohol dependency associated with Taq1A polymorphism
[Figure: odds ratio as a function of publication year, 1990 to 2004. Original OR = 8.7 (1990); final OR = 1.4]
Smith et al. (2008) American Journal of Epidemiology, 167(2):

16

17 The winner’s curse
On eBay
  Given the lack of information on the true value of the item being auctioned, there is high variance in the estimated (dollar) values: many over- and many under-estimates (bids)
  The “winner” is likely to have made the largest overestimate of value, i.e., he or she is paying (way) too much
In genetics
  The winner’s curse has been common: the first report of an association of genetic variation with disease is likely to overestimate the effect size
In epigenetics
  Does the same apply?
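The winner’s curse can be illustrated with a small simulation. This is a sketch under stated assumptions, not an analysis from the lecture: the true effect, standard error, number of studies, and the one-sided significance rule are all invented for illustration.

```python
import numpy as np


def simulate_winners_curse(true_effect=0.2, se=0.1, n_studies=10_000, z_crit=1.96, seed=0):
    """Return (mean of all estimates, mean of 'significant' estimates only)."""
    rng = np.random.default_rng(seed)
    # Each hypothetical study estimates the same true effect with sampling noise.
    estimates = rng.normal(true_effect, se, size=n_studies)
    # Keep only studies whose positive estimate passes the significance cut-off.
    significant = estimates / se > z_crit
    return estimates.mean(), estimates[significant].mean()


mean_all, mean_significant = simulate_winners_curse()
print(mean_all, mean_significant)
```

Averaged over all studies the estimate is close to the true effect, but the studies that reached significance overestimate it on average, because selection on significance favours estimates that happened to land high.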

18 Replication is needed
Hirschhorn & Daly, Nat. Rev. Genet. 6: 95, 2005
NCI-NHGRI Working Group on Replication, Nature 447: 655, 2007

19 Strategies for discovery and Replication
We will review different approaches for discovery and replication
Examples from published studies
Examples from EWAS when available
Same concepts apply to both EWAS and GWAS

20 EWAS validation – Study design
Discovery only (single study)
  Prone to false positive findings (and false negatives too)

21 66 cases of hepatocellular carcinoma (HCC) assessed using 450K BeadChip
- Differences in methylation in cancer tissues vs. adjacent non-cancer tissues
- Bonferroni-corrected p value ≤ 0.05; corresponds to a raw p value of ≤ 1.06 × 10⁻⁷
- After Bonferroni adjustment, a total of 130,512 CpG sites significantly differed in methylation level in tumor compared with non-tumor tissues, with 28,017 CpG sites hypermethylated and 102,495 hypomethylated in tumor tissues.

22 Additional filtering
Hypermethylated sites:
  mean difference in methylation, tumor vs. normal, > 20%
  > 70% of the tumor tissues with methylation > 2 SDs above the mean methylation level of all 66 adjacent tissues
  mean methylation for adjacent tissues < 25%
Hypomethylated sites:
  > 70% of the tumor tissues with methylation > 2 SDs below the mean methylation level of all 66 adjacent tissues
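The hypermethylation criteria listed above could be applied to a matrix of β values roughly as follows. This is a sketch, not the authors’ code: the function name and array layout are hypothetical, and it assumes the SD is the per-site sample SD across the adjacent tissues.

```python
import numpy as np


def hypermethylated_mask(tumor: np.ndarray, adjacent: np.ndarray) -> np.ndarray:
    """tumor, adjacent: arrays of shape (n_sites, n_samples) with beta values in [0, 1]."""
    adj_mean = adjacent.mean(axis=1)
    adj_sd = adjacent.std(axis=1, ddof=1)  # assumption: sample SD per site
    mean_diff = tumor.mean(axis=1) - adj_mean
    # Fraction of tumor samples more than 2 SDs above the adjacent-tissue mean.
    frac_above = (tumor > (adj_mean + 2 * adj_sd)[:, None]).mean(axis=1)
    return (mean_diff > 0.20) & (frac_above > 0.70) & (adj_mean < 0.25)


# Two hypothetical CpG sites, four samples each.
adjacent = np.array([[0.10, 0.12, 0.08, 0.10],
                     [0.50, 0.52, 0.48, 0.50]])
tumor = np.array([[0.50, 0.60, 0.55, 0.40],
                  [0.51, 0.50, 0.49, 0.52]])
print(hypermethylated_mask(tumor, adjacent))
```

Only the first site passes all three criteria. The hypomethylation rule on the slide would mirror the second criterion with the sign reversed.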

23 EWAS validation – Study design
Discovery only (single study)
  Prone to false positive findings (and false negatives too)
Internal replication
  Sample two or more groups from the same population
  Group 1: EWAS; other groups: candidate gene analysis
  Overall power lower than same-size discovery only (Skol AD, Nat Genet 2006).

24 Internal Replication
All subjects from the ESTHER cohort in Germany
Discovery on 177 participants from ESTHER (27K Infinium methylation BeadChip analysis)
Replication on 316 participants from ESTHER (Sequenom MassARRAY)

25 Discovery and replication groups

26 Discovery

27 Discovery → validation → replication (top gene)

28 EWAS validation – Study design
Discovery only (single study)
  Prone to false positive findings (and false negatives too)
Internal replication
  Sample two or more groups from the same population
  Group 1: EWAS; other groups: candidate gene analysis
  Overall power lower than same-size discovery only (Skol AD, Nat Genet 2006).
Discovery → external (independent) replication
  Two (or more) independent studies
  Ensure validation + generalizability

29
Discovery: cord blood and peripheral blood samples from 1,018 ALSPAC child-mother pairs (450K Infinium methylation BeadChip analysis)
External replication:
  The WMHP and CANDLE cohorts (27K Infinium methylation BeadChip analysis)
  The NB and MoBa cohorts (450K Infinium methylation BeadChip analysis)
  A case-control study (450K Infinium methylation BeadChip analysis)

30 Discovery → Replication
Gestational age (GA):
  224 top hits: GA had a negative association with methylation at 188 probes and a positive association at 36 probes
  129 replicated in the NB cohort and 5 replicated in WMHP and CANDLE
  72 previously reported in the case-control study
Birth weight:
  23 associations observed between birth weight and cord blood methylation in the discovery study
  2 out of 23 replicated in the MoBa cohort

31 EWAS validation – Study design
Discovery only (single study)
  Prone to false positive findings (and false negatives too)
Internal replication
  Sample two or more groups from the same population
  Group 1: EWAS; other groups: candidate gene analysis
  Overall power lower than same-size discovery only (Skol AD, Nat Genet 2006).
Discovery → replication
  Two (or more) independent studies
  Ensure validation + generalizability
Meta-analysis
  Uses estimates from multiple populations
  Needed to achieve large sample size
  Allows for evaluating generalizability

32 44,494 participants of European ancestry
From nine large studies participating in the Cohorts for Heart and Aging Research in Genomic Epidemiology (CHARGE) Consortium
Seven additional studies
Each study computes association statistics (e.g., ORs and p-values), then results are meta-analyzed
Only results (not data) are shared
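Combining per-study results without sharing individual-level data can be sketched with a fixed-effect, inverse-variance meta-analysis of log odds ratios. This is one common choice; the slide does not say which model the consortium used, and the three study estimates below are invented for illustration.

```python
import math


def fixed_effect_meta(log_ors, ses):
    """Inverse-variance pooled log OR, its standard error, and a two-sided p-value."""
    weights = [1.0 / se ** 2 for se in ses]
    pooled = sum(w * b for w, b in zip(weights, log_ors)) / sum(weights)
    pooled_se = math.sqrt(1.0 / sum(weights))
    z = pooled / pooled_se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value from the normal distribution
    return pooled, pooled_se, p


# Hypothetical per-study log odds ratios and standard errors.
pooled, pooled_se, p = fixed_effect_meta([0.1, 0.2, 0.3], [0.1, 0.1, 0.1])
print(math.exp(pooled), pooled_se, p)
```

Each study contributes only an estimate and its standard error, and more precise studies receive more weight. The pooled standard error is smaller than that of any single study, which is the gain in sample size the slide refers to.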

33 Results for intima media thickness

34 Forest plot for ZHX2 – rs11781551 (zinc fingers and homeoboxes 2)

35 Potential biases in GWAS/EWAS

36 Population Stratification*
Each population has a unique genetic and social history; ancestral patterns of migration, mating, expansions/bottlenecks, and stochastic variation all yield differences in allele frequencies between populations.
Population stratification: cases and controls have different allele frequencies due to diversity in populations of origin and unrelated to outcome, requiring:
  1) differences in disease prevalence
  2) differences in allele frequencies
*Cardon LR, Palmer LJ, Lancet 2003

37 What is population stratification?
Balding, Nature Reviews Genetics 2010

38 Unlinked Genetic Markers in Population Stratification
Population stratification (or any non-random mating) allows marker-allele frequencies to vary among population segments.
A disease that is more prevalent in one subpopulation will be associated with any alleles in high frequency in that subpopulation.
If population stratification exists, it can often be detected by analysis of unlinked marker loci.
[Pritchard JK, Rosenberg NA; AJHG 1999; 65: ]
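A worked example with expected counts shows how stratification alone can create an association. The two subpopulations, their carrier frequencies, and their disease prevalences are hypothetical, and within each subpopulation the allele is constructed to be unrelated to disease.

```python
def odds_ratio(a, b, c, d):
    """OR for a 2x2 table: a = carrier cases, b = non-carrier cases,
    c = carrier controls, d = non-carrier controls."""
    return (a * d) / (b * c)


def stratum_table(n, carrier_freq, prevalence):
    """Expected 2x2 counts when carrier status and disease are independent."""
    cases = n * prevalence
    controls = n - cases
    return (cases * carrier_freq, cases * (1 - carrier_freq),
            controls * carrier_freq, controls * (1 - carrier_freq))


# Hypothetical subpopulations differing in both allele frequency and prevalence.
pop_a = stratum_table(10_000, carrier_freq=0.8, prevalence=0.10)
pop_b = stratum_table(10_000, carrier_freq=0.2, prevalence=0.02)
pooled = tuple(x + y for x, y in zip(pop_a, pop_b))

print(odds_ratio(*pop_a), odds_ratio(*pop_b), odds_ratio(*pooled))
```

The odds ratio is 1 within each subpopulation but about 2.5 when the two are pooled, even though the allele has no effect on disease in either one. Both conditions (different prevalence and different allele frequency) are needed for this to happen.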

39 Adjusting for Population Stratification in a GWAS of T2DM*
Case-control study of 661 cases of T2DM and 614 controls from France.
Genotyping assayed 392,935 SNPs.
SNP 200 kb from the lactase gene on 2q21:
  Strong association with T2DM
  Strong north-south prevalence gradient in France
Used 20,323 SNPs not related to T2DM as a measure of population stratification.
After adjustment for stratification, most of the association was removed.
*Sladek R et al. Nature 2007; 445:

40 Sources of analytical variability for methylation EWAS
Several factors can affect results
  DNA/sample quality
  Plate effects
  Batch effect
  Row/column effect
How to handle this
  Best laboratory practice
  Randomize/balance samples
  Universal DNA/replicates
  Bioinformatics/statistical analysis
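One way to randomize and balance samples across plates is to shuffle cases and controls separately and deal each group out round-robin. This is a minimal sketch; the function name, sample IDs, and plate count are hypothetical, and real layouts usually also balance other covariates and reserve wells for control DNA and replicates.

```python
import random


def assign_plates(sample_ids, status, n_plates, seed=0):
    """Shuffle each status group separately, then deal samples round-robin across plates."""
    rng = random.Random(seed)
    plates = [[] for _ in range(n_plates)]
    for group in sorted(set(status.values())):
        members = [s for s in sample_ids if status[s] == group]
        rng.shuffle(members)
        for i, sample in enumerate(members):
            plates[i % n_plates].append(sample)
    for plate in plates:
        rng.shuffle(plate)  # randomize the order of samples within each plate
    return plates


# Hypothetical study: 12 cases and 12 controls on 3 plates.
sample_ids = [f"S{i}" for i in range(24)]
status = {s: ("case" if i < 12 else "control") for i, s in enumerate(sample_ids)}
plates = assign_plates(sample_ids, status, n_plates=3)
for plate in plates:
    print(sum(status[s] == "case" for s in plate), "cases of", len(plate))
```

Every plate ends up with the same case/control mix, so a plate or batch effect cannot masquerade as a case-control difference.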

41 Is DNA Collected and Handled Identically in Cases and Controls?
T1DM gene association study: cases from the GRID Study, controls from the 1958 British Birth Cohort Study, examining 6,322 SNPs.
Samples from lymphoblastoid cell lines extracted using the same protocol in two different laboratories.
Case and control DNAs randomly ordered, with teams masked to case/control status.
Some extreme associations could not be replicated by a second genotyping method.
Clayton DG et al., Nat Genet 2005; 37:

42 Interpretation of epigenetic data

43 In-class Readings Papers
Lee et al. Quantitative promoter hypermethylation analysis of RASSF1A in lung cancer: Comparison with methylation-specific PCR technique and clinical significance. Mol Med Report 2011.
Joubert et al. 450K Epigenome-Wide Scan Identifies Differential DNA Methylation in Newborns Related to Maternal Smoking during Pregnancy. Environ Health Perspect 2012.

44 In-class Readings Questions
DNA methylation analysis: Which technique was used? How much DNA was used? Did it involve bisulfite treatment?
Aim of the study: What was measured? Why?
Results: How were DNA methylation results reported? Which statistical analysis was used?

45 Next lecture
Guest lectures: Reproductive Epigenetics and Prenatal Influences on the Epigenome
Karin Michels, PhD, ScD, Co-Director, Ob/Gyn Epidemiology Center, BWH
Heather Herson Burris, MD, MPH, Neonatology, BIDMC

