Chromatin state and DNA sequence in TF binding dynamics and disease
Manolis Kellis
Broad Institute of MIT and Harvard; MIT Computer Science & Artificial Intelligence Laboratory
DNA vs. epigenome in dynamics & disease. [Overview diagram. Labels: Sequence specificity, Motifs, TF binding, Interplay (?), Chromatin state (ENCODE; Ernst, Bernstein); Genotype (CATGACTG vs. CATGCCTG), GWAS, Disease, QTLs, Epigenotype (Roadmap; Eaton, De Jager).]
States combine histone marks, FAIRE, Pol2, DNase. [Figure: transition matrix. ENCODE datasets: Bernstein, Stam, Lieb, Crawford.] Several classes of DNase-hypersensitive regions: do they have different TF-binding properties?
TFs show characteristic chromatin state preferences. Confirms the TF-DNase relationship. However, different TFs bind different chromatin states. Dynamic binding across cell types?
Patterns hold across 300+ TF binding experiments. What about dynamics?
Dynamic enhancers vs. constitutive CTCF/promoters
Dynamic TF binding and dynamic enhancer activity. Dynamic enhancers vs. static promoters. TF binding correlates with TF expression.
TF co-occurrence patterns driven by chromatin state. Raw enrichments vs. conditional enrichments (if state preference is known).
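One way to read the raw vs. conditional comparison is sketched below in Python. This is an illustrative sketch, not the analysis behind the slide: the function name `cooccurrence_enrichments`, the input layout (one `(state, bound_a, bound_b)` tuple per region), and the independence-within-state null are all assumptions. The raw enrichment compares observed co-binding with what independence across all regions would predict; the conditional enrichment computes that expectation separately inside each chromatin state and sums over states.

```python
from collections import defaultdict


def cooccurrence_enrichments(regions):
    """Return (raw, conditional) co-binding enrichment for two TFs.

    regions: iterable of (state, bound_a, bound_b), one tuple per genomic
    region, where bound_a / bound_b are truthy if TF A / TF B binds there.
    Hypothetical input layout, chosen for illustration only.
    """
    n = n_a = n_b = n_ab = 0
    per_state = defaultdict(lambda: [0, 0, 0])  # [regions, A-bound, B-bound]
    for state, bound_a, bound_b in regions:
        a, b = int(bool(bound_a)), int(bool(bound_b))
        n += 1
        n_a += a
        n_b += b
        n_ab += a * b
        counts = per_state[state]
        counts[0] += 1
        counts[1] += a
        counts[2] += b

    # Raw: expected co-binding if A and B were independent genome-wide.
    raw_expected = n_a * n_b / n
    # Conditional: expected co-binding if A and B were independent
    # within each chromatin state (state preference treated as known).
    cond_expected = sum(sa * sb / s for s, sa, sb in per_state.values())
    return n_ab / raw_expected, n_ab / cond_expected


# Toy data: both TFs bind only in "enhancer" regions, independently there.
toy = [("enhancer", 1, 1), ("enhancer", 1, 0), ("enhancer", 0, 1),
       ("enhancer", 0, 0)] + [("quiescent", 0, 0)] * 4
raw, conditional = cooccurrence_enrichments(toy)
print(raw, conditional)
```

On the toy data the raw enrichment is 2.0 while the conditional enrichment is 1.0: the apparent co-binding is fully accounted for by the shared state preference. The sketch does not guard against a TF that never binds (the expectation would be zero).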
Chromatin state preferences are motif-encoded. States bound by TFs are enriched in the corresponding motifs. Enrichment is also found in states of specific repression.
Bound regions in preferred states are depleted in motifs. Permissive binding in promoters/enhancers/insulators. DNase/FAIRE regions lacking marks: not permissive.
Summary
(1) Chromatin states, TF dynamics, and motifs
- TFs bind DNase-accessible regions, with distinct chromatin state preferences
- Chromatin state preferences are partly motif-encoded
- States predict most previously observed co-binding
- Motifs guide states, states enable permissive binding
(2) Methylation vs. genotype in Alzheimer’s disease
- Variability between individuals is mostly genotype-driven
- Most variable: promoter-flanking regions, brain enhancers
- Predictive for AD: global inhibition of 7000 probes
- Enhancers, not promoters. NRSF, ELK1, CTCF targets
Conclusions
- Power of regulatory annotation for interpreting disease
- Interplay of DNA sequence & epigenome in TFs/disease
Interpreting disease-association signals
(1) Interpret variants using ENCODE [diagram: CATGACTG vs. CATGCCTG; Genotype, GWAS, Disease]
- Chromatin states: enhancers, promoters, motifs
- Enrichment in individual loci, across 1000s of SNPs in T1D
(2) Epigenome changes in disease [diagram: mQTLs, MWAS, Epigenome]
- Molecular phenotypic changes in patients vs. controls
- Small variation in brain methylomes, mostly genotype-driven
- 1000s of brain-specific enhancers increase methylation in Alzheimer’s
Methylation in 750 Alzheimer patients/controls. 486,000 methylation probes; 750 individuals (~50% w/AD); Memory and Aging Project; Religious Orders Study. Brad Bernstein: REMC mapping. Philip De Jager: Epigenomics Roadmap. [Diagram: Genome, Epigenome, Phenotype; (1) meQTL, (2) MWAS; Classification.] Patients followed for 10+ years with cognitive evaluations. Brain samples donated post-mortem: methylation/genotype. Seek predictive features: SNPs, QTLs, mQTLs, regulation.
Global variability in DLPFC and CD4+ methylation. [Heatmap of sample similarity, from most similar to least similar: CD4+ T-cells vs. dorsolateral prefrontal cortex (DLPFC).] Colors along the top represent gender (M/F); colors along the left indicate batch (CD4+ batch vs. DLPFC batch). The four red bars in the black section are DLPFC samples run in the CD4+ batch, to make sure that the batch effect wasn’t stronger than the cell-type effect.
Little variability, focused on regulatory regions. [Figure panels: probe intensity distribution; inter-individual variability.] Hemi-methylated probes are also the most variable: a tiny fraction (0.6%) of all probes. Promoters: stable low (active). Gene bodies: stable high (active). Enhancers/poised: most variable.
Most epigenomic variability is genotype-driven. [Figure: overlaid Manhattan plots of 450,000 methylation probes. Axes: P-value (-log10 P) vs. chromosome and genomic position, and vs. distance from CpG (-1 to 1 Mb).] Cutoff of 10^-14 (10^-2 after Benjamini-Hochberg correction). 150,000 mQTLs at P<0.01 after FDR correction.
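For readers unfamiliar with the correction named on this slide, the standard Benjamini-Hochberg step-up adjustment can be sketched as follows. This is a generic textbook version in plain Python, not the pipeline used for the mQTL scan (the slide does not say how the correction was computed); the function name `benjamini_hochberg` and the toy P-values are assumptions.

```python
def benjamini_hochberg(pvalues):
    """Return BH-adjusted P-values (q-values) in the input order."""
    m = len(pvalues)
    # Indices of the P-values from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down, enforcing monotonicity:
    # q_(rank) = min over ranks k >= rank of p_(k) * m / k.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Toy example with four tests (hypothetical values).
qvalues = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
significant = [q <= 0.05 for q in qvalues]
print(qvalues, significant)
```

A raw cutoff far below the corrected one, as on the slide, is what this adjustment produces when the number of tested SNP-probe pairs is very large.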
Multimodal, SNP-associated, promoter-depleted. [Figure: % of CpG probes per chromatin state (1 Active promoter, 2 Promoter flanking, 3 Active enhancer, 4 Weak enhancer, 5 Gene bodies, 6 Active gene bodies, 7 Repetitive, 8 Heterochromatin, 9 Low signal), for all probes vs. SNP-associated probes. Venn diagram of multimodal probes (~3K) and SNP-associated probes (29% of all), with counts 138,731, 184, and 2,647.] 93.5% of multimodal probes are SNP-associated (2,647 of 2,647 + 184). Importance of distinguishing the contribution of genotype to disease associations. Remember the multimodal probes that didn’t seem to fall into a functional group? Almost all of them are strongly SNP-associated, implying that their multimodality is driven by genotype. SNP-associated probes are depleted in promoters (driven epigenetically > genetically, open chromatin).
>80% variance explained for 50,000+ probes. [Figure panels: significance (q-value) and variance explained (adjusted R^2), each plotted against distance to CpG; axis ticks 2^5, 2^10, 2^15, 2^20, with labels 8k, 32k, 1M.]
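As a reminder of what "variance explained (adjusted R^2)" measures for a single SNP-probe pair, here is a minimal sketch. It assumes a simple linear regression of methylation level on allele dosage (0, 1, or 2 copies), which is a common choice but is not stated on the slide; the function name `adjusted_r2` and the toy values are hypothetical.

```python
def adjusted_r2(dosage, methylation):
    """Adjusted R^2 of a one-predictor least-squares fit.

    dosage: allele counts per individual (assumed coding: 0, 1, 2).
    methylation: methylation level of one probe in the same individuals.
    Requires more than two individuals and non-constant inputs.
    """
    n = len(dosage)
    mean_x = sum(dosage) / n
    mean_y = sum(methylation) / n
    sxx = sum((x - mean_x) ** 2 for x in dosage)
    syy = sum((y - mean_y) ** 2 for y in methylation)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(dosage, methylation))
    # With one predictor, R^2 is the squared Pearson correlation.
    r2 = sxy * sxy / (sxx * syy)
    # Penalize for the one fitted slope: p = 1 predictor.
    return 1.0 - (1.0 - r2) * (n - 1) / (n - 2)


# Toy probe whose methylation tracks genotype exactly (hypothetical).
print(adjusted_r2([0, 1, 2, 0, 1, 2], [0.1, 0.5, 0.9, 0.1, 0.5, 0.9]))
```

A probe counts toward the ">80%" group under this reading when the returned value exceeds 0.8 for its best-associated SNP.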
Global hyper-methylation in 1000s of AD-associated loci. [QQ plot: expected vs. observed -log P. Many loci with weak effects?] [Figure: methylation across 480,000 probes ranked by Alzheimer’s association P-value, with the top 7000 probes marked; Alzheimer’s vs. normal; hypermethylated probes (repressed).] Alzheimer’s-associated probes are hypermethylated: a global effect across 1000s of probes. Rank all probes by Alzheimer’s association and observe functional changes down the ranked list: 7000 probes show a shift in methylation. Complex disease: genome-wide effects.
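The QQ plot on this slide compares observed association P-values with what a uniform null would give. A minimal sketch of how such points are usually computed is shown below; the expected quantile rank/(m + 1) is one common convention and is an assumption here, as is the function name `qq_points`.

```python
from math import log10


def qq_points(pvalues):
    """Return (expected, observed) -log10 P pairs, most significant first.

    Under the null, the rank-th smallest of m P-values is expected to be
    about rank / (m + 1); points above the diagonal indicate an excess of
    small P-values, i.e. more association signal than chance predicts.
    """
    m = len(pvalues)
    observed = sorted(pvalues)
    return [(-log10(rank / (m + 1)), -log10(p))
            for rank, p in enumerate(observed, start=1)]


# Toy P-values that match the null exactly (hypothetical): all points
# fall on the diagonal.
for expected, obs in qq_points([0.5, 0.25, 0.75]):
    print(round(expected, 3), round(obs, 3))
```

Many loci with weak effects would show up as a long stretch of points lifted modestly above the diagonal, rather than a few extreme outliers.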
Chromatin state breakdown reveals activity. Red: more methylated in Alzheimer’s. Blue: less methylated in Alzheimer’s. Significant probes are in enhancers, not promoters. [Figure: % probes per chromatin state: 1 Active promoter, 2 Promoter flanking, 3 Active enhancer, 4 Weak enhancer, 5 Gene bodies, 6 Active gene bodies, 7 Repetitive, 8 Heterochromatin, 9 Low signal. * = Fisher exact test, p-value <= 0.001.]
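The starred states on this slide come from a Fisher exact test. The sketch below shows the test on a 2x2 table of probes (significant or not, inside a given chromatin state or not). It is a plain-Python illustration using the usual two-sided definition (sum the probabilities of all tables no more likely than the observed one); whether the slide used a one- or two-sided test is not stated, and the function name and counts are hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact P-value for the 2x2 table [[a, b], [c, d]].

    Assumed layout:
      a = significant probes in the state, b = significant probes elsewhere,
      c = other probes in the state,       d = other probes elsewhere.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x significant probes in the state.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lo, hi = max(0, col1 - row2), min(row1, col1)
    p_obs = prob(a)
    # Small relative tolerance so ties with p_obs are counted despite rounding.
    return sum(p for p in (prob(x) for x in range(lo, hi + 1))
               if p <= p_obs * (1 + 1e-9))


# Toy table (hypothetical counts).
print(fisher_exact_two_sided(3, 1, 1, 3))
```

With nine chromatin states tested, one such table is built per state, and a state is starred when its P-value falls at or below the slide's 0.001 threshold.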
Estimating number of functionally-associated probes. [Figure: enrichment per chromatin state (Active TSS flanking, Active enhancer, Poised promoter, Polycomb repressed, Weak enhancer, Promoter, Strong transcription, Weak transcription) compared with expected; marker at 10,000.] Functional enrichments found for 10,000 probes.
Predictive power of hyper-methylation signal. Sum of methylation signal in 1,026 regulatory regions. The idea here is the same as in the previous plot, but I’ve required that it contain only those probes that were both in the top 6000 and in either strong enhancers or TSS-flanking regions. Sum total methylation levels across 1026 probes. Individuals in the top quintile show 2.5-fold higher risk. By comparison, the APOE4 allele confers 1.5-fold.
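The top-quintile comparison can be sketched as follows. The slide does not say which group the top quintile is compared against; this sketch assumes the remaining four quintiles, and the function name `top_quintile_relative_risk`, the per-individual score (sum of methylation over the selected probes), and the toy data are all hypothetical.

```python
def top_quintile_relative_risk(scores, has_ad):
    """Relative AD risk of the top score quintile vs. everyone else.

    scores: one summed-methylation score per individual.
    has_ad: 1 if the individual has Alzheimer's disease, else 0.
    Assumption: the reference group is the other four quintiles.
    """
    n = len(scores)
    # Individuals ordered from highest to lowest score.
    order = sorted(range(n), key=lambda i: scores[i], reverse=True)
    k = n // 5  # size of the top quintile
    top, rest = order[:k], order[k:]
    risk_top = sum(has_ad[i] for i in top) / len(top)
    risk_rest = sum(has_ad[i] for i in rest) / len(rest)
    return risk_top / risk_rest


# Toy cohort of ten individuals (hypothetical scores and diagnoses).
scores = [10, 9, 8, 7, 6, 5, 4, 3, 2, 1]
has_ad = [1, 1, 1, 0, 1, 0, 1, 0, 1, 0]
print(top_quintile_relative_risk(scores, has_ad))
```

In the toy cohort both top-quintile individuals have AD against half of the rest, giving a two-fold relative risk; the sketch needs at least five individuals and at least one case outside the top quintile.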
AD-associated probes enriched in ELK1/NRSF targets. [Figure: CTCF; all probes, ranked by AD association P-value.] Regulatory motifs enriched in top-scoring probes. Genomic basis for association, potential cis or trans effect. Reveals biological pathways involved and potential targets.
Collaborators and Acknowledgements
- Chromatin state dynamics, ENCODE: Brad Bernstein, John Stam, Jason Lieb, Crawford
- Methylation in Alzheimer’s disease: Philip De Jager & Gyan Srivastava, Brad Bernstein; Religious Orders Study, Memory and Aging Project
- Large-scale epigenomic datasets: Epigenomics Roadmap, ENCODE project, NHGRI
- Funding: NHGRI, NIH, NSF, Sloan Foundation
MIT Computational Biology group (Compbio.mit.edu): Mike Lin, Ben Holmes, Soheil Feizi, Angela Yen, Luke Ward, Bob Altshuler, Mukul Bansal, Chris Bristow, Stefan Washietl, Pouya Kheradpour, Matt Eaton, Manolis Kellis, Jason Ernst, Irwin Jungreis, Rachel Sealfon, Jessica Wu, Daniel Marbach, Louisa DiStefano, Dave Hendrix, Loyal Goff, Sushmita Roy