Epigenomic views of human disease reveal 1000s of regulatory variants

1 Epigenomic views of human disease reveal 1000s of regulatory variants
Manolis Kellis
Broad Institute of MIT and Harvard
MIT Computer Science & Artificial Intelligence Laboratory

2 Interpreting complex disease: from regions to models
Disease-associated variant (SNP/CNV/…), e.g. CATGACTG vs. CATGCCTG, interpreted through:
- Gene annotation (coding, 5’/3’ UTR, RNAs) → evolutionary signatures
- Non-coding annotation → chromatin signatures
- Roles in gene/chromatin regulation → activator/repressor signatures
- Other evidence of function → signatures of selection (sp/pop)
Challenge: from loci to mechanism, pathways, drug targets
Need: a systems-level understanding of genomes and gene regulation
- The regulators: transcription factors, microRNAs, sequence specificities
- The regions: enhancers, promoters, and their tissue-specificity
- The targets: TFs → targets, regulators → enhancers, enhancers → genes
- The grammars: interplay of multiple TFs → prediction of gene expression
→ The parts list = building blocks of genome/disease regulatory networks

3 Systems-level views of disease epigenomics
Chromatin states help interpret disease associations
- Annotate dynamic regulatory elements in multiple cell types
- Activity-based linking of regulators → enhancers → targets
- Mechanistic predictions, T1D-associated enhancers
Global methylation changes in Alzheimer’s disease
- Little variability between individuals, genotype-driven
- Most variable regions: promoter-flanking, brain enhancers
- Global inhibition of probes. Predictive power for AD
- Enhancers, not promoters. Targets of NRSF, ELK1, CTCF
Conclusions: power of regulatory annotation for interpreting disease
- 1000s of regions functionally associated with disease
- Weak associations, concentrated in regulatory pathways

4 Interpreting disease-association signals
(1) Interpret variants using ENCODE
- Chromatin states: enhancers, promoters, motifs
- Enrichment in individual loci, across 1000s of SNPs in T1D
(2) Epigenome changes in disease
- Molecular phenotypic changes in patients vs. controls
- Small variation in brain methylomes, mostly genotype-driven
- 1000s of brain-specific enhancers increase methylation in Alzheimer’s
Diagram labels: CATGACTG / CATGCCTG, GWAS, Genotype, Disease, mQTLs, MWAS

5 Chromatin states dynamics across nine cell types
Single annotation track for each cell type
Summarize cell-type activity at a glance
Can study 9-cell activity pattern across
Figure labels: Predicted linking, Correlated activity
Key points to make: Chromatin states enable us to study the dynamic nature of chromatin across many cell types. By distinguishing 15 chromatin states, we could summarize all significant combinations of 81 chromatin tracks and 2.4 billion reads in just nine chromatin annotation tracks, one for each cell type. For example, the same gene (WLS) is ‘poised’ in embryonic stem (ES) cells, repressed in three other cell types (K562, blood, and liver), and active in the other five cell types. This lets us define ‘vectors’ of activity for each region of the genome, based on the chromatin annotation in the nine cell types.
Ernst et al, Nature 2011

6 Enhancer-gene links supported by eQTL-gene links
Validation rationale: Expression Quantitative Trait Loci (eQTLs) provide independent SNP-to-gene links. Do they agree with activity-based links?
Example: Lymphoblastoid (GM) cells study
- Expression/genotype across 60 individuals (Montgomery et al, Nature 2010)
- 120 eQTLs are eligible for enhancer-gene linking based on our datasets
- 51 actually linked (43%) using predictions → 4-fold enrichment (10% exp. by chance)
Independent validation of links. Relevance to disease datasets.
eQTL study illustration (variant 15kb from the gene):
Individual   Expression level of gene   Sequence variant at distal position
Indiv. 1     -0.5                       A
Indiv. 2     -1.5                       A
Indiv. 3     -1.8                       A
Indiv. 4      3.1                       C
Indiv. 5      1.1                       A
Indiv. 6     -1.8                       A
Indiv. 7     -1.4                       A
Indiv. 8      3.2                       C
Indiv. 9      4.4                       C
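The arithmetic behind the validation can be checked with a short sketch. The counts (120 eligible, 51 linked, 10% expected by chance) and the nine expression/allele values are taken from the slide; the variable names and the `mean_by_allele` helper are illustrative assumptions, not part of the original analysis.

```python
from statistics import mean

# Counts quoted on the slide.
eligible_eqtls = 120     # eQTLs eligible for enhancer-gene linking
linked_eqtls = 51        # eQTLs also recovered by activity-based links
chance_fraction = 0.10   # fraction expected by chance, as stated on the slide

observed_fraction = linked_eqtls / eligible_eqtls
fold_enrichment = observed_fraction / chance_fraction

# Toy values from the slide's illustration (Indiv. 1 to Indiv. 9).
expression = [-0.5, -1.5, -1.8, 3.1, 1.1, -1.8, -1.4, 3.2, 4.4]
allele = ["A", "A", "A", "C", "A", "A", "A", "C", "C"]


def mean_by_allele(values, alleles):
    """Group expression values by allele and return the mean of each group."""
    groups = {}
    for value, a in zip(values, alleles):
        groups.setdefault(a, []).append(value)
    return {a: mean(vs) for a, vs in groups.items()}


means = mean_by_allele(expression, allele)
print(f"observed fraction: {observed_fraction:.3f}")
print(f"fold enrichment:   {fold_enrichment:.2f}")
print({a: round(m, 2) for a, m in means.items()})
```

In the toy data the C allele carriers show higher expression than the A allele carriers, which is the kind of SNP-to-gene association an eQTL captures.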

7 Introducing multi-cell activity profiles
Link enhancers to target genes
Evidence tracks: gene expression, chromatin states, active TF motif enrichment, TF regulator expression, dip-aligned motif biases
Cell types: HUVEC, NHEK, GM12878, K562, HepG2, NHLF, HMEC, HSMM, H1
Figure legend: TF on / TF off; motif aligned / flat profile; ON / OFF; active enhancer / repressed; motif enrichment / motif depletion

8 Introducing multi-cell activity profiles
Link TFs to target enhancers
Predict activators vs. repressors
Evidence tracks: gene expression, chromatin states, active TF motif enrichment, TF regulator expression, dip-aligned motif biases
Cell types: HUVEC, NHEK, GM12878, K562, HepG2, NHLF, HMEC, HSMM, H1
Figure legend: TF on / TF off; motif aligned / flat profile; ON / OFF; active enhancer / repressed; motif enrichment / motif depletion

9 Coordinated activity reveals activators/repressors
Activity signatures for each TF
Enhancer activity
Ex1: Oct4 predicted activator of embryonic stem (ES) cells
Ex2: Gfi1 repressor of K562/GM cells
Enhancer networks: Regulator → enhancer → target gene
Key points to make: Using these correlations in activity, we can start piecing together enhancer regulatory networks, which were previously inaccessible, linking regulators to enhancers and enhancers to target genes. Putting it all together, we can (a) define 20 distinct profiles of activity (labeled A through T) across the nine cell types, (b) observe the expression patterns of associated genes, showing upward of 0.9 correlation with enhancer activity, (c) discover enriched regulatory motifs revealing candidate regulators, and (d) distinguish activators and repressors based on positive or negative correlations between motif enrichment in active regions and expression of the corresponding regulator.
[click-animate] For example, Oct4 is a predicted activator of enhancers active in embryonic stem (ES) cells. The motif is enriched in ES-specific enhancers (cluster A), and the Oct4 TF is expressed specifically in the same cell type.
[click-animate] Similarly, Ets is a predicted activator of cluster G, which is associated with activity in both GM and HUVEC, but not in either one alone. This is important for the next slide, as we predict that a disruption of the Ets1 motif in patients with lupus erythematosus is responsible for disruption of the corresponding enhancer and dysregulation of the immunity gene HLA-DRB1 (Human Leukocyte Antigen) in the major histocompatibility locus.
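Point (d) above can be sketched as a sign test on a correlation: for each TF, compare its motif enrichment in active enhancers with its own expression across the nine cell types. The numbers below are hypothetical, the 0.5 threshold is an arbitrary assumption, and the `classify_regulator` helper is illustrative only; the slides do not specify the exact statistic used.

```python
from math import sqrt

CELL_TYPES = ["HUVEC", "NHEK", "GM12878", "K562", "HepG2",
              "NHLF", "HMEC", "HSMM", "H1"]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def classify_regulator(motif_enrichment, tf_expression, threshold=0.5):
    """Label a TF by the sign of the enrichment/expression correlation.

    The threshold is an assumption made for this sketch.
    """
    r = pearson(motif_enrichment, tf_expression)
    if r >= threshold:
        return "activator"
    if r <= -threshold:
        return "repressor"
    return "unclear"


# Hypothetical values, one per cell type in CELL_TYPES order.
# Activator-like: motif enriched and TF expressed in H1 (ES cells).
act_enrichment = [0.1, -0.1, 0.0, 0.1, -0.2, 0.0, 0.1, 0.0, 2.0]
act_expression = [0.2, 0.1, 0.3, 0.2, 0.1, 0.2, 0.1, 0.2, 3.0]
# Repressor-like: motif depleted where the TF is expressed (GM12878, K562).
rep_enrichment = [0.1, 0.2, -1.5, -1.8, 0.0, 0.1, 0.2, 0.1, 0.0]
rep_expression = [0.1, 0.2, 2.5, 3.0, 0.1, 0.2, 0.1, 0.2, 0.1]

print(classify_regulator(act_enrichment, act_expression))
print(classify_regulator(rep_enrichment, rep_expression))
```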

11 Revisiting disease-associated variants
Disease-associated SNPs enriched for enhancers in relevant cell types
E.g. lupus SNP in GM enhancer disrupts Ets1 predicted activator

12 Mechanistic predictions for top disease-associated SNPs
Lupus erythematosus in GM lymphoblastoid cells: Disrupt activator Ets-1 motif → Loss of GM-specific activation → Loss of enhancer function → Loss of HLA-DRB1 expression
Erythrocyte phenotypes in K562 leukemia cells: Creation of repressor Gfi1 motif → Gain of K562-specific repression → Loss of enhancer function → Loss of CCDC162 expression

13 Detect SNPs that disrupt conserved regulatory motifs
Functionally-associated SNPs enriched in states, constraint
Prioritize candidates, increase resolution, disrupted motifs

14 Automating prediction of likely causal variants in LD → HaploReg (compbio.mit.edu/HaploReg)
Start with any list of SNPs or select a GWA study
Mine publicly available ENCODE data for significant hits
Hundreds of assays, dozens of cells, conservation, motifs
Report significant overlaps and link to info/browser
Ward and Kellis, NAR 2011

15 Functional enrichment for 1000s of SNPs
Beyond top few SNPs → entire rank list
Abhishek Sarkar, Luke Ward

16 Studying functional enrichments down the rank list
Rank all SNPs by disease-association P-value
Find annotations and cell types enriched in high ranks
Estimate number of SNPs that show functional roles
[Figure: enrichment down the rank list. X-axis: disease association P-value (rank all SNPs), from top ranks to bottom (least significant). Curve labels: increase vs. expectation (enriched in high ranks), expected at random, depletion vs. expectation.]
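One way to picture the rank-list analysis is as a running count of annotated SNPs minus the count expected at random. The sketch below is a simplified illustration on made-up data, not the authors' procedure; `cumulative_excess` and the toy P-values are assumptions.

```python
def cumulative_excess(pvalues, annotated):
    """Excess of annotated SNPs over expectation at each rank.

    SNPs are ranked by ascending P-value. At rank k the expectation is
    k times the overall fraction of annotated SNPs.
    """
    order = sorted(range(len(pvalues)), key=lambda i: pvalues[i])
    background = sum(annotated) / len(annotated)
    excess = []
    observed = 0
    for rank, i in enumerate(order, start=1):
        observed += annotated[i]
        excess.append(observed - rank * background)
    return excess


# Hypothetical SNPs: association P-value and whether the SNP overlaps
# an annotation of interest (for example, an enhancer state).
pvalues = [1e-8, 1e-6, 1e-5, 1e-3, 0.01, 0.05, 0.2, 0.4, 0.6, 0.9]
annotated = [True, True, True, False, True,
             False, False, False, False, False]

excess = cumulative_excess(pvalues, annotated)
peak = max(excess)
peak_rank = excess.index(peak) + 1
print(f"peak excess {peak:.1f} at rank {peak_rank}")
```

The peak of the excess curve gives a rough estimate of how many annotated SNPs sit in the high ranks beyond what chance would place there.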

17 1000s of GM/K562 enhancers contain Type1-Diabetes SNPs
Type 1 diabetes: rank all SNPs by association P-value
Specific states in specific cell types enrich in high ranks → weak contributions from 1000s of regulatory regions
Chromatin states in GM12878:
- Enhancers: 2049 (excess 392), 1940 distinct loci (R^2<.8)
- Promoters: 462 (excess 81)
- Transcribed: 4740 (excess 522)
- Repressed: (excess 76)
- Insulator: 240 (excess 23)
- Other: 21k (deplete 1093)
Figure labels: Lymphoblastoid, Leukemia, Enhancers across cell types

19 Interpreting disease-association signals
(1) Interpret variants using ENCODE
- Chromatin states: enhancers, promoters, motifs
- Enrichment in individual loci, across 1000s of SNPs in T1D
(2) Epigenome changes in disease
- Molecular phenotypic changes in patients vs. controls
- Small variation in brain methylomes, mostly genotype-driven
- 1000s of brain-specific enhancers increase methylation in Alzheimer’s
Diagram labels: CATGACTG / CATGCCTG, GWAS, Genotype, Disease, mQTLs, MWAS, Epigenome

20 Methylation in 750 Alzheimer patients/controls
486,000 methylation probes
750 individuals (~50% w/AD)
Memory and Aging Project, Religious Order Study
Brad Bernstein: REMC mapping
Philip De Jager: Epigenomics Roadmap
Patients followed for 10+ years with cognitive evaluations
Brain samples donated post-mortem → methylation/genotype
Seek predictive features: SNPs, QTLs, mQTLs, regulation
Diagram labels: Genome, Epigenome, Phenotype, meQTL (1), MWAS (2), Classification

21 Little variability, focused on regulatory regions
Probe intensity distribution; inter-individual variability
Hemi-methylated probes are also the most variable
Tiny fraction (0.6%) of all probes
Promoters: stable low (active)
Gene bodies: stable high (active)
Enhancers/poised: most variable

22 Most epigenomic variability is genotype-driven
Overlay Manhattan plots of 450,000 methylation probes
Cutoff of 10^-2 (after Benjamini-Hochberg correction)
150,000 mQTLs at P<0.01 after FDR correction
[Figure: Manhattan plots. Y-axis: P-value (-log10 P). X-axes: distance from CpG (MB), from -1 to 1; chromosome and genomic position.]
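The Benjamini-Hochberg correction mentioned above can be sketched in a few lines. This is a generic implementation of the standard step-up procedure on made-up P-values, not the pipeline used for the slide; the function name is an assumption.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted P-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down so that adjusted values
    # stay monotone in the rank.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical P-values for five probe/SNP pairs.
pvalues = [0.001, 0.008, 0.039, 0.041, 0.9]
adjusted = benjamini_hochberg(pvalues)
significant = [q < 0.01 for q in adjusted]
print([round(q, 5) for q in adjusted])
print(significant)
```

With the 0.01 cutoff quoted on the slide, only the first toy pair would pass after correction.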

23 Multimodal → SNP-associated → Promoter-depleted
Multimodal probes (~3K); SNP-associated probes (29% of all)
Venn diagram counts: 138,731 SNP-associated only; 2,647 both; 184 multimodal only
93.5% of multimodal probes are SNP-associated
Importance of distinguishing contribution of genotype to disease associations
SNP-associated probes depleted in promoters (driven epigenetically > genetically, open chrom)
[Figure: % of CpG probes per chromatin state, for all probes vs. SNP-associated probes. States: 1 Active promoter, 2 Promoter flanking, 3 Active enhancer, 4 Weak enhancer, 5 Gene bodies, 6 Active gene bodies, 7 Repetitive, 8 Heterochromatin, 9 Low signal.]
Remember the multi-modal probes that didn’t seem to fall into a functional group? Almost all of them are strongly SNP-associated, implying that their multi-modality is driven by genotype.

24 >80% variance explained for 50,000+ probes
[Figure: two panels plotted against distance to CpG (axis ticks at 2^5, 2^10, 2^15, 2^20; 8k, 32k, and 1M marked). Panel 1: significance (q-value). Panel 2: variance explained (adjusted R^2).]

25 Systems-level views of disease epigenomics
Chromatin states help interpret disease associations
- Annotate dynamic regulatory elements in multiple cell types
- Activity-based linking of regulators → enhancers → targets
- Mechanistic predictions, T1D-associated enhancers
Global methylation changes in Alzheimer’s disease
- Little variability between individuals, genotype-driven
- Most variable regions: promoter-flanking, brain enhancers
- Predictive power for AD: global inhibition of 7000 probes
- Enhancers, not promoters. Targets of NRSF, ELK1, CTCF
Conclusions: power of regulatory annotation for interpreting disease
- 1000s of regions functionally associated with disease
- Weak associations, concentrated in regulatory pathways

26 Interpreting disease-association signals
(1) Interpret variants
(2) Epigenome changes in disease
Diagram labels: CATGACTG / CATGCCTG, GWAS, Genotype, Disease, mQTLs, MWAS, Epigenome

27 Phil de Jager: Methylation in 750 Alzheimer patients
486,000 methylation probes
750 individuals (~50% w/AD)
Memory and Aging Project, Religious Order Study
Brad Bernstein: REMC mapping
Phil de Jager: Roadmap disease epigenomics
Patients followed for 10+ years with cognitive evaluations
Brain samples donated post-mortem → methylation/genotype
Seek predictive features: SNPs, QTLs, mQTLs, regulation
Diagram labels: Genome, Epigenome, Phenotype, meQTL (1), MWAS (2), Classification

28 Global hyper-methylation in 1000s of AD-associated loci
QQ plot: Many loci with weak effects?
- Axes: expected (-logP) vs. observed (-logP); top 7000 probes marked
480,000 probes, ranked by Alzheimer’s association P-value
Alzheimer’s-associated probes are hypermethylated (Alzheimer’s vs. normal)
Global effect across 1000s of probes
- Rank all probes by Alzheimer’s association
- Observe functional changes down rank list
- 7000 probes show shift in methylation
Complex disease: genome-wide effects
Hypermethylated probes (repressed)
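The QQ plot pairs each observed -log10 P-value with the value expected under the null at the same rank. The sketch below builds those points for a handful of made-up P-values; the `rank / (n + 1)` plotting position is one common convention and is an assumption here, as is the function name.

```python
from math import log10


def qq_points(pvalues):
    """Return (expected, observed) -log10 P pairs, most significant first."""
    n = len(pvalues)
    observed = sorted((-log10(p) for p in pvalues), reverse=True)
    # Under the null, the k-th smallest of n P-values is near k / (n + 1).
    expected = [-log10(rank / (n + 1)) for rank in range(1, n + 1)]
    return list(zip(expected, observed))


# Hypothetical P-values with an excess of small values.
pvalues = [1e-6, 1e-4, 0.01, 0.2, 0.5, 0.7, 0.9]
points = qq_points(pvalues)
for expected, observed in points:
    print(f"expected {expected:.2f}  observed {observed:.2f}")
```

Points lying above the diagonal at the significant end are what the slide reads as many loci with weak effects.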

29 Chromatin state breakdown reveals  activity
Red: more methylated in Alzheimer’s
Blue: less methylated in Alzheimer’s
Significant probes are in enhancers, not promoters
[Figure: % probes per chromatin state. States: 1 Active promoter, 2 Promoter flanking, 3 Active enhancer, 4 Weak enhancer, 5 Gene bodies, 6 Active gene bodies, 7 Repetitive, 8 Heterochromatin, 9 Low signal.]
* = Fisher exact test, P-value <= 0.001
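The starred states come from a Fisher exact test on a 2x2 table (significant vs. other probes, in a given state vs. not). A minimal one-sided version can be written with the standard library alone; the probe counts below are hypothetical, and whether the slide used a one-sided or two-sided test is not stated.

```python
from math import comb


def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher exact P-value for the table [[a, b], [c, d]].

    Returns the probability of seeing a or more in the top-left cell
    with all margins fixed (hypergeometric upper tail).
    """
    row1 = a + b
    col1 = a + c
    n = a + b + c + d
    total = comb(n, col1)
    upper = min(row1, col1)
    tail = sum(comb(row1, k) * comb(n - row1, col1 - k)
               for k in range(a, upper + 1))
    return tail / total


# Hypothetical counts: rows are significant / other probes,
# columns are in active enhancers / elsewhere.
p_enhancer = fisher_exact_greater(30, 70, 100, 800)
print(f"P = {p_enhancer:.2e}")
print("starred" if p_enhancer <= 0.001 else "not starred")
```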

30 Estimating number of functionally-associated probes
Functional enrichments found for 10,000 probes
[Figure: enrichment curves per chromatin state against the expected level, with 10,000 marked. States: Active TSS flanking, Active enhancer, Poised promoter, Polycomb repressed, Weak enhancer, Promoter, Strong transcription, Weak transcription.]

31 Predictive power of hyper-methylation signal
Sum of methylation signal in 1,026 regulatory regions
- Sum total methylation levels across 1,026 probes
- Individuals in top quintile show 2.5-fold higher risk
- By comparison, the APOE4 allele confers 1.5-fold
The idea here is the same as in the previous plot, but restricted to probes that are both in the top 6000 and in either strong enhancers or TSS-flanking regions.
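The aggregate score can be sketched as a per-individual sum over the selected probes, followed by a comparison of disease frequency in the top quintile against everyone else. The data below are made up, and the choice of the remaining individuals as the reference group is an assumption; the slide does not say what the 2.5-fold figure is measured against.

```python
def top_quintile_relative_risk(scores, has_ad):
    """Risk of AD in the top score quintile divided by risk in the rest."""
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i], reverse=True)
    k = n // 5
    top, rest = order[:k], order[k:]
    risk_top = sum(has_ad[i] for i in top) / len(top)
    risk_rest = sum(has_ad[i] for i in rest) / len(rest)
    return risk_top / risk_rest


# Hypothetical methylation levels: one row per individual,
# one column per selected probe (three probes instead of 1,026).
methylation = [
    [0.9, 0.8, 0.9], [0.8, 0.9, 0.8], [0.5, 0.6, 0.5], [0.4, 0.5, 0.6],
    [0.5, 0.4, 0.5], [0.4, 0.4, 0.5], [0.3, 0.4, 0.5], [0.3, 0.4, 0.4],
    [0.3, 0.3, 0.4], [0.2, 0.3, 0.4],
]
has_ad = [True, True, True, False, True, False, True, False, True, False]

scores = [sum(row) for row in methylation]
relative_risk = top_quintile_relative_risk(scores, has_ad)
print(f"relative risk in top quintile: {relative_risk:.1f}")
```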

32 AD-associated probes enriched in ELK1/NRSF targets
All probes, ranked by AD assoc. P-value
Regulatory motifs enriched in top-scoring probes (figure label: CTCF)
Genomic basis for association, potential cis or trans effect
Reveals biological pathways involved and potential targets

34 Interpreting disease-association signals
Diagram labels: Epigenomic changes, CATGACTG / CATGCCTG, GWAS, Genotype, Disease, Regulatory annotation

35 Collaborators and Acknowledgements
Chromatin state dynamics: Brad Bernstein, ENCODE consortium
Methylation in Alzheimer’s disease: Philip De Jager, Brad Bernstein, David Bennett; Religious Order Study, Memory and Aging Project
Large-scale epigenomic datasets: Epigenomics Roadmap, ENCODE project, NHGRI
Funding: NHGRI, NIH, NSF, Sloan Foundation

36 MIT Computational Biology group (compbio.mit.edu)
Mike Lin, Ben Holmes, Soheil Feizi, Angela Yen, #331: Luke Ward, #19: Bob Altshuler, Mukul Bansal, Chris Bristow, Stefan Washietl, Pouya Kheradpour (#187), Matt Eaton, Manolis Kellis, Jason Ernst, Irwin Jungreis, Rachel Sealfon, Jessica Wu, Daniel Marbach, Louisa DiStefano, Dave Hendrix, Loyal Goff, Sushmita Roy

