Epigenomic views of human disease reveal 1000s of regulatory variants


Epigenomic views of human disease reveal 1000s of regulatory variants Manolis Kellis Broad Institute of MIT and Harvard MIT Computer Science & Artificial Intelligence Laboratory

Interpreting complex disease: from regions to models
Disease-associated variant (SNP/CNV/…), e.g. CATGACTG vs. CATGCCTG
- Gene annotation (Coding, 5’/3’UTR, RNAs) → Evolutionary signatures
- Non-coding annotation → Chromatin signatures
- Roles in gene/chromatin regulation → Activator/repressor signatures
- Other evidence of function → Signatures of selection (sp/pop)
Challenge: from loci to mechanism, pathways, drug targets
Need: A systems-level understanding of genomes and gene regulation
- The regulators: Transcription factors, microRNAs, sequence specificities
- The regions: enhancers, promoters, and their tissue-specificity
- The targets: TFs → targets, regulators → enhancers, enhancers → genes
- The grammars: Interplay of multiple TFs → prediction of gene expression
→ The parts list = Building blocks of genome/disease regulatory networks

Systems-level views of disease epigenomics
Chromatin states help interpret disease associations
- Annotate dynamic regulatory elements in multiple cell types
- Activity-based linking of regulators → enhancers → targets
- Mechanistic predictions, 2000+ T1D-associated enhancers
Global methylation changes in Alzheimer’s Disease
- Little variability between individuals, genotype-driven
- Most variable regions: promoter-flanking, brain enhancers
- Global inhibition of 7000+ probes. Predictive power for AD
- Enhancers, not promoters. Targets of NRSF, ELK1, CTCF
Conclusions: Power of regulatory annotation for interpreting disease
- 1000s of regions functionally associated with disease
- Weak associations, concentrated in regulatory pathways

Interpreting disease-association signals
(1) Interpret variants using ENCODE
- Chromatin states: Enhancers, promoters, motifs
- Enrichment in individual loci, across 1000s of SNPs in T1D
(2) Epigenome changes in disease
- Molecular phenotypic changes in patients vs. controls
- Small variation in brain methylomes, mostly genotype-driven
- 1000s of brain-specific enhancers increase methylation in Alzheimer’s
[Diagram labels: CATGACTG / CATGCCTG; Genotype; Disease; GWAS; mQTLs; MWAS]

Chromatin state dynamics across nine cell types
- Single annotation track for each cell type
- Summarize cell-type activity at a glance
- Can study 9-cell activity pattern across
[Figure labels: Predicted linking; Correlated activity]
Key points to make: Chromatin states enabled us to study the dynamic nature of chromatin across many cell types. By distinguishing 15 different types of chromatin states, we could summarize all significant combinations of 81 different chromatin tracks and 2.4 billion reads in just nine chromatin annotation tracks, one for each cell type. For example, the same gene (WLS) is ‘poised’ in embryonic stem (ES) cells, repressed in three other cell types (K562, blood, and liver), and active in the other five cell types. This allows us to define ‘vectors’ of activity for each region of the genome, based on the chromatin annotation in the nine cell types.
Ernst et al, Nature 2011

Enhancer-gene links supported by eQTL-gene links
Validation rationale:
- Expression Quantitative Trait Loci (eQTLs) provide independent SNP-to-gene links
- Do they agree with activity-based links?
Example: Lymphoblastoid (GM) cells study
- Expression/genotype across 60 individuals (Montgomery et al, Nature 2010)
- 120 eQTLs are eligible for enhancer-gene linking based on our datasets
- 51 actually linked (43%) using predictions → 4-fold enrichment (10% exp. by chance)
eQTL study [figure label: 15kb]
Individuals   Expression level of gene   Sequence variant at distal position
Indiv. 1      -0.5                       A
Indiv. 2      -1.5                       A
Indiv. 3      -1.8                       A
Indiv. 4       3.1                       C
Indiv. 5       1.1                       A
Indiv. 6      -1.8                       A
Indiv. 7      -1.4                       A
Indiv. 8       3.2                       C
Indiv. 9       4.4                       C
…             …                          …
Independent validation of links. Relevance to disease datasets.
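As a worked check of the numbers on this slide, the sketch below recomputes the linked fraction and fold enrichment (51 of 120 eligible eQTLs, against the 10% expected by chance) and compares mean expression by allele for the nine individuals shown. The variable and function names are illustrative and not from the talk, and comparing group means is only a simple stand-in for a real eQTL association test.

```python
# Counts and the chance expectation are taken from the slide; names are illustrative.
eligible_eqtls = 120
linked_eqtls = 51
expected_fraction = 0.10  # fraction expected by chance, as stated on the slide

observed_fraction = linked_eqtls / eligible_eqtls
fold_enrichment = observed_fraction / expected_fraction

# (expression level, allele at the distal variant) for the nine individuals shown
individuals = [
    (-0.5, "A"), (-1.5, "A"), (-1.8, "A"),
    (3.1, "C"), (1.1, "A"), (-1.8, "A"),
    (-1.4, "A"), (3.2, "C"), (4.4, "C"),
]


def mean_expression_by_allele(rows):
    """Group expression values by allele and return the mean of each group."""
    groups = {}
    for expression, allele in rows:
        groups.setdefault(allele, []).append(expression)
    return {allele: sum(values) / len(values) for allele, values in groups.items()}


allele_means = mean_expression_by_allele(individuals)
print(f"linked fraction: {observed_fraction:.3f}")
print(f"fold enrichment: {fold_enrichment:.2f}")
print({allele: round(mean, 2) for allele, mean in allele_means.items()})
```

The linked fraction is 42.5%, which the slide rounds to 43%, and the ratio to the chance expectation is a little over 4.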

Introducing multi-cell activity profiles
Link enhancers to target genes
[Figure panels: Gene expression; Chromatin States; Active TF motif enrichment; TF regulator expression; Dip-aligned motif biases]
[Cell types: HUVEC, NHEK, GM12878, K562, HepG2, NHLF, HMEC, HSMM, H1]
[Legend: ON / OFF; Active enhancer / Repressed; Motif enrichment / Motif depletion; TF On / TF Off; Motif aligned / Flat profile]

Introducing multi-cell activity profiles
Link TFs to target enhancers
Predict activators vs. repressors
[Figure panels: Gene expression; Chromatin States; Active TF motif enrichment; TF regulator expression; Dip-aligned motif biases]
[Cell types: HUVEC, NHEK, GM12878, K562, HepG2, NHLF, HMEC, HSMM, H1]
[Legend: ON / OFF; Active enhancer / Repressed; Motif enrichment / Motif depletion; TF On / TF Off; Motif aligned / Flat profile]

Coordinated activity reveals activators/repressors
- Activity signatures for each TF
- Ex1: Oct4 predicted activator of embryonic stem (ES) cells
- Ex2: Gfi1 repressor of K562/GM cells
- Enhancer networks: Regulator → enhancer → target gene
[Figure label: Enhancer activity]
Key points to make: Using these correlations in activity enabled us to start piecing together enhancer regulatory networks, which were previously inaccessible, linking regulators to enhancers and enhancers to target genes. Putting it all together, we can (a) define 20 distinct profiles of activity (labeled A through T) across the nine cell types, (b) observe the expression patterns of associated genes, showing upward of 0.9 correlation with enhancer activity, (c) discover enriched regulatory motifs revealing candidate regulators, and (d) distinguish activators and repressors based on positive or negative correlations between motif enrichment in active regions and expression of the corresponding regulator. For example, Oct4 is a predicted activator of enhancers active in embryonic stem (ES) cells: the motif is enriched in ES-specific enhancers (cluster A), and the Oct4 TF is expressed specifically in the same cell type. Similarly, Ets is a predicted activator of cluster G, associated with GM and HUVEC activity but not either one alone. This is important for the next slide, as we predict that a disruption of the Ets1 motif in patients with lupus erythematosus is responsible for disruption of the corresponding enhancer and dysregulation of the immunity gene HLA-DRB1 (Human Leukocyte Antigen) in the major histocompatibility locus.
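Point (d) in the notes, separating activators from repressors by the sign of the correlation between motif enrichment and regulator expression, can be sketched as follows. This is a minimal illustration, not the method used in the talk: the Pearson correlation, the 0.5 threshold, the function names, and all numbers are assumptions made up for the example. Only the nine cell-type names come from the slides.

```python
import math

CELL_TYPES = ["HUVEC", "NHEK", "GM12878", "K562", "HepG2", "NHLF", "HMEC", "HSMM", "H1"]


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (inputs must not be constant)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


def classify_regulator(motif_enrichment, tf_expression, threshold=0.5):
    """Label a TF by the sign of the motif-enrichment / TF-expression correlation.

    The threshold is an arbitrary illustrative cutoff, not a value from the talk.
    """
    r = pearson(motif_enrichment, tf_expression)
    if r >= threshold:
        return "activator", r
    if r <= -threshold:
        return "repressor", r
    return "unclear", r


# Hypothetical profiles, one value per entry of CELL_TYPES.
# Activator-like: motif enriched and TF expressed in the same cell type (H1).
activator_motif = [0.1, 0.0, -0.1, 0.0, 0.1, 0.0, -0.1, 0.0, 2.0]
activator_expr = [0.2, 0.1, 0.3, 0.2, 0.1, 0.2, 0.1, 0.3, 5.0]
# Repressor-like: motif depleted from active enhancers where the TF is expressed.
repressor_motif = [0.3, 0.2, -1.5, -1.8, 0.1, 0.2, 0.3, 0.1, 0.2]
repressor_expr = [0.1, 0.2, 4.0, 4.5, 0.3, 0.1, 0.2, 0.2, 0.1]

activator_call = classify_regulator(activator_motif, activator_expr)
repressor_call = classify_regulator(repressor_motif, repressor_expr)
print(activator_call[0], round(activator_call[1], 2))
print(repressor_call[0], round(repressor_call[1], 2))
```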

Revisiting disease-associated variants
- Disease-associated SNPs enriched for enhancers in relevant cell types
- E.g. lupus SNP in GM enhancer disrupts Ets1 predicted activator

Mechanistic predictions for top disease-associated SNPs
Lupus erythematosus in GM lymphoblastoid:
Disrupt activator Ets-1 motif → Loss of GM-specific activation → Loss of enhancer function → Loss of HLA-DRB1 expression
Erythrocyte phenotypes in K562 leukemia cells:
Creation of repressor Gfi1 motif → Gain K562-specific repression → Loss of enhancer function → Loss of CCDC162 expression

Detect SNPs that disrupt conserved regulatory motifs
- Functionally-associated SNPs enriched in states, constraint
- Prioritize candidates, increase resolution, disrupted motifs

Automating prediction of likely causal variants in LD
HaploReg (compbio.mit.edu/HaploReg)
- Start with any list of SNPs or select a GWA study
- Mine publicly available ENCODE data for significant hits
- Hundreds of assays, dozens of cells, conservation, motifs
- Report significant overlaps and link to info/browser
Ward and Kellis, NAR 2011

Functional enrichment for 1000s of SNPs
Beyond top few SNPs → entire rank list
Abhishek Sarkar, Luke Ward

Studying functional enrichments down the rank list
- Rank all SNPs by disease-association P-value
- Find annotations and cell types enriched in high ranks
- Estimate number of SNPs that show functional roles
[Figure: x-axis: disease association P-value (rank all SNPs), from top ranks to bottom (least significant); y-axis: increase vs. expectation (enriched in high ranks) or depletion vs. expectation, relative to the level expected at random]
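One simple way to follow an annotation down the rank list is to track, at each rank, how many annotated SNPs have been seen beyond what the genome-wide annotated fraction would predict. The sketch below does that on a toy list. It is an assumption about how such an excess could be computed, not the procedure behind the slide, and the function name and data are hypothetical.

```python
def excess_down_rank_list(in_annotation):
    """Cumulative observed-minus-expected count of annotated SNPs at each rank.

    in_annotation: 0/1 flags ordered from the most to the least significant SNP.
    The expectation at rank k is k times the overall annotated fraction.
    """
    n = len(in_annotation)
    background = sum(in_annotation) / n
    excess = []
    observed = 0
    for rank, flag in enumerate(in_annotation, start=1):
        observed += flag
        excess.append(observed - rank * background)
    return excess


# Toy data: annotated SNPs are concentrated near the top of the ranking.
ranked_flags = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
curve = excess_down_rank_list(ranked_flags)

# The rank with the largest excess, and the excess itself, give a rough
# estimate of how many annotated SNPs sit above the random expectation.
peak_rank = max(range(len(curve)), key=curve.__getitem__) + 1
peak_excess = curve[peak_rank - 1]
print(peak_rank, round(peak_excess, 2))
```

By construction the curve returns to zero at the bottom of the list, so a positive bulge near the top corresponds to enrichment in high ranks and a negative one to depletion.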

1000s of GM/K562 enhancers contain Type1-Diabetes SNPs
Type 1 diabetes: Rank all SNPs by association P-value
Chromatin states in GM12878:
- Enhancers: 2049 (excess 392)
- 1940 distinct loci (R^2<.8)
- Promoters: 462 (excess 81)
- Transcribed: 4740 (excess 522)
- Repressed: 1351 (excess 76)
- Insulator: 240 (excess 23)
- Other: 21k (deplete 1093)
Specific states in specific cell types enrich in high rank → Weak contributions from 1000s of regulatory regions
[Figure labels: Enhancers across cell types; Lymphoblastoid; Leukemia]

Interpreting disease-association signals
(1) Interpret variants using ENCODE
- Chromatin states: Enhancers, promoters, motifs
- Enrichment in individual loci, across 1000s of SNPs in T1D
(2) Epigenome changes in disease
- Molecular phenotypic changes in patients vs. controls
- Small variation in brain methylomes, mostly genotype-driven
- 1000s of brain-specific enhancers increase methylation in Alzheimer’s
[Diagram labels: CATGACTG / CATGCCTG; Genotype; Epigenome; Disease; GWAS; mQTLs; MWAS]

Methylation in 750 Alzheimer patients/controls
- 486,000 methylation probes
- 750 individuals (~50% w/AD)
- Memory and Aging Project, Religious Order Study
- Patients followed for 10+ years with cognitive evaluations
- Brain samples donated post-mortem → methylation/genotype
- Seek predictive features: SNPs, QTLs, mQTLs, regulation
Philip deJager, Epigenomics Roadmap; Brad Bernstein, REMC mapping
[Diagram labels: Genome; Epigenome; Phenotype; meQTL; MWAS; Classification; steps 1, 2]

Little variability, focused on regulatory regions
- Promoters: Stable low (active)
- Gene bodies: Stable high (active)
- Enhancers/poised: Most variable
- Hemi-methylated probes are also the most variable
- Tiny fraction (0.6%) of all probes
[Figure panels: Probe intensity distribution; Inter-individual variability]

Most epigenomic variability is genotype-driven
- Overlay Manhattan plots of 450,000 methylation probes
- Cutoff of 10^-14 (10^-2 after Benjamini-Hochberg correction)
- 150,000 mQTLs at P<0.01 after FDR correction
[Figure axes: P-value (-log10 P) vs. chromosome and genomic position; inset: distance from CpG (MB), -1 to 1]
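The slide applies a Benjamini-Hochberg correction to the mQTL P-values. As a reminder of what that adjustment does, here is a small sketch of the standard step-up procedure. It is not necessarily the exact pipeline used for these data, and the example P-values are made up.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P-values in the original order.

    For sorted P-values p_(1) <= ... <= p_(m), the adjusted value at rank i is
    min over j >= i of p_(j) * m / j, capped at 1.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down so the running minimum enforces monotonicity.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


example_p = [0.01, 0.04, 0.03, 0.005]  # hypothetical P-values
q_values = benjamini_hochberg(example_p)
significant = [q <= 0.05 for q in q_values]
print([round(q, 3) for q in q_values], significant)
```

With hundreds of thousands of probes tested against many SNPs, the raw cutoff that corresponds to a given adjusted level becomes very small, which is consistent with the gap between the two thresholds quoted on the slide.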

MultimodalSNP-associatedPromoter-depleted All probes 1 Active promoter SNP-associated 2 Promoter flanking Multimodal probes (~3Κ) SNP-associated probes (29% of all) 138,731 184 2,647 3 Active enhancer 4 Weak enhancer 5 Gene bodies 6 Active gene bodies 93.5% of multimodal probes are SNP-associated Importance of distinguishing contribution of genotype to disease associations 7 Repetitive 8 Heterochromatin 9 Low signal % of CpG probes Remember the multi-modal probes that didn’t seem to fall into a functional group? Almost all of them are strongly SNP-associated, implying that their multi-modality is driven by genotype. SNP-associated probes depleted in promoters (driven epigenetically>genetically, open chrom)

>80% variance explained for 50,000+ probes
[Figure panels: Significance (q-value) vs. distance to CpG (MB); Variance explained (adjusted R2) vs. distance to CpG (MB)]
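For reference, "variance explained" for a single probe can be illustrated by regressing its methylation level on the genotype at one SNP and reporting R2 and adjusted R2. The sketch below assumes an additive 0/1/2 allele-dosage coding and a single predictor. Both are assumptions for illustration, as are the function name and the data.

```python
def variance_explained(genotype, methylation, n_predictors=1):
    """Least-squares fit of methylation on genotype dosage; returns (R2, adjusted R2)."""
    n = len(genotype)
    mean_x = sum(genotype) / n
    mean_y = sum(methylation) / n
    sxx = sum((x - mean_x) ** 2 for x in genotype)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(genotype, methylation))
    slope = sxy / sxx
    ss_total = sum((y - mean_y) ** 2 for y in methylation)
    ss_residual = sum(
        (y - (mean_y + slope * (x - mean_x))) ** 2
        for x, y in zip(genotype, methylation)
    )
    r2 = 1 - ss_residual / ss_total
    # Adjusted R2 penalizes the fit for the number of predictors used.
    adjusted_r2 = 1 - (1 - r2) * (n - 1) / (n - n_predictors - 1)
    return r2, adjusted_r2


# Hypothetical probe: methylation rises with the number of alternate alleles.
dosage = [0, 1, 2, 0, 1, 2]
probe_methylation = [0.1, 0.5, 0.9, 0.2, 0.4, 1.0]
r2, adjusted_r2 = variance_explained(dosage, probe_methylation)
print(round(r2, 3), round(adjusted_r2, 3))
```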

Systems-level views of disease epigenomics
Chromatin states help interpret disease associations
- Annotate dynamic regulatory elements in multiple cell types
- Activity-based linking of regulators → enhancers → targets
- Mechanistic predictions, 2000+ T1D-associated enhancers
Global methylation changes in Alzheimer’s Disease
- Little variability between individuals, genotype-driven
- Most variable regions: promoter-flanking, brain enhancers
- Predictive power for AD: Global inhibition of 7000 probes
- Enhancers, not promoters. Targets of NRSF, ELK1, CTCF
Conclusions: Power of regulatory annotation for interpreting disease
- 1000s of regions functionally associated with disease
- Weak associations, concentrated in regulatory pathways

Interpreting disease-association signals
(1) Interpret variants
(2) Epigenome changes in disease
[Diagram labels: CATGACTG / CATGCCTG; Genotype; Epigenome; Disease; GWAS; mQTLs; MWAS]

Phil de Jager: Methylation in 750 Alzheimer patients
- 486,000 methylation probes
- 750 individuals (~50% w/AD)
- Memory and Aging Project, Religious Order Study
- Patients followed for 10+ years with cognitive evaluations
- Brain samples donated post-mortem → methylation/genotype
- Seek predictive features: SNPs, QTLs, mQTLs, regulation
Phil de Jager, Roadmap disease epigenomics; Brad Bernstein, REMC mapping
[Diagram labels: Genome; Epigenome; Phenotype; meQTL; MWAS; Classification; steps 1, 2]

Global hyper-methylation in 1000s of AD-associated loci
- Rank all probes by Alzheimer’s association
- Observe functional changes down ranklist
- 7000 probes show shift in methylation
- Complex disease: genome-wide effects
- Alzheimer’s-associated probes are hypermethylated
- Global effect across 1000s of probes
[Figure: QQ plot ("Many loci with weak effects?"): expected (-logP) vs. observed (-logP); top 7000 probes marked]
[Figure: methylation of 480,000 probes, ranked by Alzheimer’s association P-value; Alzheimer’s vs. normal; hypermethylated probes (repressed)]
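The QQ plot on this slide compares observed association P-values with what a uniform null would give. A minimal sketch of how the plotted points can be computed is shown below. The plotting position rank/(n+1) is one common convention and is an assumption here, as are the function name and the example P-values.

```python
import math


def qq_points(p_values):
    """Return (expected, observed) -log10 P pairs, most significant first.

    Under the null, the i-th smallest of n P-values is expected near i / (n + 1).
    """
    n = len(p_values)
    observed = sorted((-math.log10(p) for p in p_values), reverse=True)
    expected = [-math.log10(rank / (n + 1)) for rank in range(1, n + 1)]
    return list(zip(expected, observed))


example_p_values = [0.5, 0.001, 0.2, 0.04]  # hypothetical
points = qq_points(example_p_values)
for expected_value, observed_value in points:
    print(round(expected_value, 3), round(observed_value, 3))
```

Points lying above the diagonal across a long stretch of the list, rather than only at the extreme tail, are what the slide reads as many loci with weak effects.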

Chromatin state breakdown reveals  activity
- Significant probes are in enhancers, not promoters
- Red: More methylated in Alzheimer’s
- Blue: Less methylated in Alzheimer’s
- * => Fisher exact test, p-value <= 0.001
[Figure: % probes by chromatin state: 1 Active promoter; 2 Promoter flanking; 3 Active enhancer; 4 Weak enhancer; 5 Gene bodies; 6 Active gene bodies; 7 Repetitive; 8 Heterochromatin; 9 Low signal]
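The asterisks on this slide mark chromatin states that pass a Fisher exact test. The sketch below shows a two-sided Fisher exact test on a 2x2 table, such as significant vs. other probes against in-state vs. not-in-state. The table layout, function name, and counts are hypothetical and are not the data behind the slide.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact P-value for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same margins
    that are no more likely than the observed one.
    """
    row1 = a + b
    col1 = a + c
    n = a + b + c + d

    def weight(x):
        # Unnormalized hypergeometric probability of x in the top-left cell.
        return comb(col1, x) * comb(n - col1, row1 - x)

    lowest = max(0, row1 - (n - col1))
    highest = min(row1, col1)
    observed = weight(a)
    extreme = sum(
        weight(x) for x in range(lowest, highest + 1) if weight(x) <= observed
    )
    return extreme / comb(n, row1)


# Hypothetical toy table: rows = significant / other probes,
# columns = in enhancer state / not in enhancer state.
p_value = fisher_exact_two_sided(8, 2, 1, 5)
print(round(p_value, 4))
```

Working with integer weights keeps the comparison against the observed table exact, which avoids the floating-point ties that can affect two-sided tests.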

Estimating number of functionally-associated probes
Functional enrichments found for 10,000 probes
[Figure: curves by chromatin state (Active TSS flanking; Active enhancer; Poised promoter; Polycomb repressed; Weak enhancer; Promoter; Strong transcription; Weak transcription) against Expected; marker at 10,000]

Predictive power of hyper-methylation signal
- Sum total methylation levels across 1026 probes
- Individuals in top quintile show 2.5-fold higher risk
- By comparison, the APOE4 allele confers 1.5-fold
[Figure: Sum of methylation signal in 1,026 regulatory regions]
The idea here is the same as in the previous plot, but I’ve required that it contain only those probes that are both in the top 6000 and in either strong enhancers or TSS flanking regions.
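The score described here, a per-individual sum of methylation over a fixed probe set followed by a comparison of the top quintile, can be sketched as below. The slide does not say what the top quintile is compared against, so comparing to all remaining individuals is an assumption, as are the function names and the toy data.

```python
def methylation_scores(matrix):
    """Sum methylation levels across the selected probes for each individual (row)."""
    return [sum(row) for row in matrix]


def top_quintile_relative_risk(scores, has_ad):
    """Risk of AD in the top 20% of scores divided by the risk in everyone else.

    Assumes at least one case outside the top quintile (otherwise the ratio is undefined).
    """
    n = len(scores)
    top_size = max(1, n // 5)
    order = sorted(range(n), key=lambda i: scores[i], reverse=True)
    top, rest = order[:top_size], order[top_size:]
    risk_top = sum(has_ad[i] for i in top) / len(top)
    risk_rest = sum(has_ad[i] for i in rest) / len(rest)
    return risk_top / risk_rest


# Hypothetical data: 10 individuals x 3 probes, with AD status (1 = AD).
methylation = [
    [0.9, 0.8, 0.9], [0.8, 0.9, 0.8], [0.5, 0.4, 0.5], [0.4, 0.5, 0.4],
    [0.3, 0.4, 0.5], [0.5, 0.3, 0.3], [0.2, 0.4, 0.4], [0.3, 0.3, 0.3],
    [0.2, 0.3, 0.3], [0.2, 0.2, 0.3],
]
ad_status = [1, 1, 1, 0, 1, 0, 0, 1, 0, 1]

scores = methylation_scores(methylation)
relative_risk = top_quintile_relative_risk(scores, ad_status)
print(round(relative_risk, 2))
```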

AD-associated probes enriched in ELK1/NRSF targets
- Regulatory motifs enriched in top-scoring probes
- Genomic basis for association, potential cis or trans effect
- Reveals biological pathways involved and potential targets
[Figure: all probes, ranked by AD assoc. P-value; additional label: CTCF]

Interpreting disease-association signals
[Diagram labels: CATGACTG / CATGCCTG; Genotype; Disease; GWAS; Epigenomic changes; Regulatory Annotation]

Collaborators and Acknowledgements
- Chromatin state dynamics: Brad Bernstein, ENCODE consortium
- Methylation in Alzheimer’s disease: Philip deJager, Brad Bernstein, David Bennett; Religious Order Study, Memory and Aging Project
- Large-scale epigenomic datasets: Epigenomics Roadmap, ENCODE project, NHGRI
- Funding: NHGRI, NIH, NSF, Sloan Foundation

MIT Computational Biology group (compbio.mit.edu)
Mike Lin, Ben Holmes, Soheil Feizi, Angela Yen, Luke Ward (#331), Bob Altshuler (#19), Mukul Bansal, Chris Bristow, Stefan Washietl, Pouya Kheradpour (#187), Matt Eaton, Manolis Kellis, Jason Ernst, Irwin Jungreis, Rachel Sealfon, Jessica Wu, Daniel Marbach, Louisa DiStefano, Dave Hendrix, Loyal Goff, Sushmita Roy
[Photo labels: Stata3; Stata4]