Jason Ernst Broad Institute of MIT and Harvard

Presentation transcript:

Disease epigenomics: Interpreting non-coding variants using chromatin and activity signatures
Jason Ernst
Broad Institute of MIT and Harvard
MIT Computer Science & Artificial Intelligence Laboratory

Challenge: interpreting disease-associated variants
GWAS, case-control, … reveal disease-associated variants → Molecular mechanism, cell-type specificity, drug targets
Disease-associated variant (SNP/CNV/…), e.g. CATGACTG vs. CATGCCTG
  Gene annotation (Coding, 5’/3’UTR, RNAs) → Evolutionary signatures
  Non-coding annotation → Chromatin signatures
  Roles in gene/chromatin regulation → Activator/repressor signatures
  Other evidence of function → Signatures of selection (sp/pop)
Challenges towards interpreting disease variants
  Find ‘true’ causative SNP among many candidates in LD
  Use ‘causal’ variant: predict function, pathway, drug targets
  Non-coding variant: type of function, cell type of activity
  Regulatory variant: upstream regulators, downstream targets
This talk: genomics tools for addressing these challenges

The good news: ever-expanding dimensions
Each point represents a genome-wide dataset
Now: Cell-type and chromatin-mark dimensions
Additional dimensions: Environment, Genotype, Disease, Gender, Stage, Age
Next: References for each background
All clearly needed, and increasingly available

Difficulty of interpreting increasing # tracks
Challenge: simplify
  Learn combinations
  Interpret function
  Prioritize marks
  Study dynamics

Challenge of data integration in many marks/cells
DNA packaged into chromatin around histone proteins
Epigenetic modifications (DNA/histone/nucleosome)
  Two types: DNA methyl., histone marks
  Hundreds of histone tail modifications already known
  Genome-wide modification maps
Encode epigenetic state
  Epigenomic information retains genome ‘state’ in differentiation and development
Histone code hypothesis
  Distinct function for distinct combinations of marks?
  Hundreds of histone marks
  Astronomical number of histone mark combinations
How do we find biologically relevant ones?
  Unsupervised approach
  Probabilistic model
  Explicit combinatorics

Genomic tools for disease SNP interpretation
Chromatin states → regulatory region annotation
  Combinatorial patterns of marks → chromatin states
  Distinct classes of prom/enh/transcr/repres’d/repetitive
  Reveal new genes, lincRNAs, enhancers, GWAS/SNP
Activity signatures → linking enhancer networks
  Correlated changes in expression, chromatin, motifs
  Link TFs to enhancers and enhancers to targets
  Predict causal cell-type specific activators/repressors
Interpreting disease variants
  Predicting SNP chromatin states and cell-type specificity
  Specific mechanistic predictions for disease SNPs
  Measuring selective pressures within human populations

ChromHMM: learning ‘hidden’ chromatin states
Observed chromatin marks (K4me3, K36me3, K4me1, K27ac) in 200bp intervals, called based on a Poisson distribution
Most likely hidden state along the DNA (e.g. transcription start site, enhancer, transcribed region)
Each state: vector of emissions, vector of transitions
[Figure: high-probability chromatin marks in states 1 to 6, with example emission probabilities 0.7, 0.8, 0.9]
All probabilities are learned de novo from chromatin data alone (Baum-Welch, a.k.a. EM)
Ernst and Kellis, Nature Biotech 2010
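
The "called based on a Poisson distribution" step can be illustrated with a small sketch. It is not the ChromHMM code itself: the background rate (the track-wide mean count), the 1e-4 threshold, and the function names are assumptions made for illustration.

```python
import math

def poisson_tail(k, lam):
    """Return P(X >= k) for X ~ Poisson(lam)."""
    if k <= 0:
        return 1.0
    # Accumulate P(X = 0), ..., P(X = k - 1) iteratively.
    term = math.exp(-lam)
    cdf = term
    for i in range(1, k):
        term *= lam / i
        cdf += term
    return max(0.0, 1.0 - cdf)

def binarize_mark(counts, p_threshold=1e-4):
    """Call a mark present (1) in each 200bp bin whose read count is
    unlikely under a Poisson background. The background rate is assumed
    to be the mean count over the whole track."""
    lam = sum(counts) / len(counts)
    return [1 if poisson_tail(c, lam) < p_threshold else 0 for c in counts]

# Hypothetical read counts for one mark over eight consecutive bins.
counts = [1, 0, 2, 1, 30, 1, 0, 1]
calls = binarize_mark(counts)
print(calls)
```

Only the bin with 30 reads stands out against a mean of 4.5, so it is the only bin called as marked. The resulting 0/1 vectors, one per mark, are the observations the hidden states emit.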

Chromatin states for genome annotation
Learn de novo significant combinations of chromatin marks
Reveal functional elements, even without looking at sequence
Use for genome annotation
Use for studying regulation dynamics in different cell types
[Figure: state groups: promoter states, transcribed states, active intergenic, repressed]
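
Annotating the genome with a learned model means assigning a state to every bin. The sketch below uses Viterbi decoding with independent Bernoulli emissions per mark, which is one standard way to do this and is not necessarily the decoder ChromHMM uses. The two-state model and all its probabilities are invented for illustration.

```python
import math

def emission_logp(obs, probs):
    """Log-probability of a binary mark vector, assuming each mark is an
    independent Bernoulli with the state's emission probability."""
    return sum(math.log(p if o else 1.0 - p) for o, p in zip(obs, probs))

def viterbi(observations, init, trans, emit):
    """Most likely state path. All probabilities must lie strictly
    between 0 and 1 so that the logarithms are defined."""
    n_states = len(init)
    score = [math.log(init[s]) + emission_logp(observations[0], emit[s])
             for s in range(n_states)]
    back = []
    for obs in observations[1:]:
        new_score, pointers = [], []
        for s in range(n_states):
            # Best predecessor state for landing in state s.
            best = max(range(n_states),
                       key=lambda r: score[r] + math.log(trans[r][s]))
            pointers.append(best)
            new_score.append(score[best] + math.log(trans[best][s])
                             + emission_logp(obs, emit[s]))
        score = new_score
        back.append(pointers)
    # Trace back from the best final state.
    state = max(range(n_states), key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return path[::-1]

# Hypothetical model over two marks (K4me3, K36me3):
# state 0 is promoter-like, state 1 is transcribed-like.
init = [0.5, 0.5]
trans = [[0.8, 0.2], [0.2, 0.8]]
emit = [[0.9, 0.1], [0.1, 0.8]]
bins = [(1, 0), (1, 0), (0, 1), (0, 1), (0, 1)]
path = viterbi(bins, init, trans, emit)
print(path)
```

In practice the emission and transition values would come from training (Baum-Welch) rather than being set by hand.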

Emerging large-scale genomic/epigenomic datasets
Multiple cell types, diverse experiments, developmental time-course
Reference Epigenome Mapping Centers
Used to study many disease epigenomes
ENCODE Chromatin Group (PI: Bernstein)
15-state model learned jointly: 9 chromatin marks + WCE, 9 human cell types
State classes: Promoter, Enhancer, Insulator, Transcribed, Repressed, Repetitive
Cell types:
  HUVEC: Umbilical vein endothelial
  NHEK: Keratinocytes
  GM12878: Lymphoblastoid
  K562: Myelogenous leukemia
  HepG2: Liver carcinoma
  NHLF: Normal human lung fibroblast
  HMEC: Mammary epithelial cell
  HSMM: Skeletal muscle myoblasts
  H1: Embryonic
Marks: H3K4me1, H3K4me2, H3K4me3, H3K27ac, H3K9ac, H3K27me3, H4K20me1, H3K36me3, CTCF (+WCE, +RNA)
Cell type concatenation approach (NHEK, HUVEC, H1, …)
  Ensures common emission parameters
  Verified with independent learning

Chromatin states capture coordinated mark changes
State definitions are cell-type invariant
  Same combinations consistently found
State locations are cell-type specific
  Can study pair-wise or multi-way changes
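
Because state definitions are shared across cell types, pair-wise changes can be tabulated by comparing two segmentations bin by bin. A minimal sketch follows. The state labels and the two short segmentations are made up, and `state_changes` is a hypothetical helper name.

```python
from collections import Counter

def state_changes(states_a, states_b):
    """Count (state in cell type A, state in cell type B) pairs over
    aligned bins of two segmentations."""
    if len(states_a) != len(states_b):
        raise ValueError("segmentations must cover the same bins")
    return Counter(zip(states_a, states_b))

# Hypothetical state calls for the same five bins in two cell types.
gm = ["Enh", "Enh", "Prom", "Repr", "Tx"]
k562 = ["Repr", "Enh", "Prom", "Repr", "Enh"]

changes = state_changes(gm, k562)
# Keep only the bins whose state differs between the two cell types.
switched = {pair: n for pair, n in changes.items() if pair[0] != pair[1]}
print(switched)
```

The same table extended to more than two cell types gives the multi-way changes mentioned on the slide.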

Chromatin states correlation with gene expression
[Figure: chromatin states from -50kb to +50kb around the TSS; labels: lower expression, higher expression]

Pair-wise changes reveal cell-type specific functions
Gene functional enrichments match cell function
Distinguish On, Off, and Poised promoter states

Introducing multi-cell activity profiles
Profiles across cell types (HUVEC, NHEK, GM12878, K562, HepG2, NHLF, HMEC, HSMM, H1):
  Gene expression: ON / OFF
  Chromatin states: active enhancer / repressed
  Active TF motif enrichment: motif enrichment / motif depletion
  TF regulator expression: TF on / TF off
  Dip-aligned motif biases: motif aligned / flat profile

Enhancer vs. promoter dynamics
Promoters typically active in many cells
Enhancers exquisitely cell-type specific

Linking candidate enhancers to correlated target genes
1. Search for coherent changes between:
   gene expression
   chromatin marks at distant loci (10kb)
2. Combine two vectors:
   Expression vector for each gene
   Vector of mark intensities at distal locus (combine marks based on enhancer emissions)
3. High correlation → enhancer/target link
[Figure: candidate enhancer 10kb from TM4SF1]
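
The three steps above can be sketched as follows. This is an illustration under stated assumptions, not the method as implemented in the talk: Pearson correlation is assumed as the similarity measure, the mark weights stand in for enhancer-state emission probabilities, the 0.8 cutoff is arbitrary, and all data values and names are invented.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length vectors (0.0 if one is flat)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)

def enhancer_signal(marks, weights):
    """Combine mark intensities at one locus in one cell type into a
    single value, weighting each mark (assumed: by enhancer emissions)."""
    return sum(weights[m] * v for m, v in marks.items())

def link_enhancers(expression, enhancers, weights, min_r=0.8):
    """Return (enhancer, r) pairs whose combined mark signal across cell
    types correlates with the gene's expression vector."""
    links = []
    for name, per_cell in enhancers.items():
        signal = [enhancer_signal(marks, weights) for marks in per_cell]
        r = pearson(expression, signal)
        if r >= min_r:
            links.append((name, round(r, 3)))
    return links

# Hypothetical data for one gene over four cell types.
expression = [1.0, 5.0, 2.0, 8.0]
weights = {"H3K4me1": 0.8, "H3K27ac": 0.6}
enhancers = {
    "enhA": [{"H3K4me1": 1, "H3K27ac": 0}, {"H3K4me1": 4, "H3K27ac": 3},
             {"H3K4me1": 2, "H3K27ac": 1}, {"H3K4me1": 7, "H3K27ac": 6}],
    "enhB": [{"H3K4me1": 5, "H3K27ac": 4}, {"H3K4me1": 1, "H3K27ac": 1},
             {"H3K4me1": 4, "H3K27ac": 3}, {"H3K4me1": 0, "H3K27ac": 1}],
}
links = link_enhancers(expression, enhancers, weights)
print(links)
```

Only `enhA` tracks the expression vector, so it is the only candidate linked to the gene. With nine cell types, as in the talk, the vectors would simply be longer.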

Predictive power of distal enhancer regions
[Figure: mark intensity correlation with expression for individual regions, sorted by rank; series: 10kb upstream, 100kb upstream, 10kb/100kb controls]
At least 100 regions with >80% correlation

Coordinated activity reveals enhancer links
[Figure panels: enhancer activity, gene activity, predicted regulators, activity signatures for each TF]
Distal enhancers hard to integrate in regulatory models
Linked to target genes based on coordinated activity
Linked to upstream regulators using TF expr & motifs
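
One simple way to picture the regulator step is to compare, across cell types, a TF's expression with the enrichment of its motif in the enhancers active in each cell type. In the sketch below, positive correlation suggests an activator and negative correlation a repressor. The correlation measure, the 0.5 cutoff, the TF names, and every number are assumptions for illustration only.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length vectors (0.0 if one is flat)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)

def classify_regulator(tf_expression, motif_log_enrichment, min_abs_r=0.5):
    """Label a TF by how its expression tracks its motif's enrichment
    (positive values) or depletion (negative values) in active enhancers."""
    r = pearson(tf_expression, motif_log_enrichment)
    if r >= min_abs_r:
        return "activator"
    if r <= -min_abs_r:
        return "repressor"
    return "unclear"

# Hypothetical profiles over four cell types.
profiles = {
    "TF_A": ([9.0, 1.0, 8.0, 1.0], [1.2, -0.1, 0.9, 0.0]),
    "TF_B": ([0.5, 6.0, 1.0, 7.0], [0.1, -1.0, 0.0, -1.3]),
    "TF_C": ([3.0, 3.0, 3.0, 3.0], [0.4, -0.2, 0.1, 0.0]),
}
labels = {tf: classify_regulator(expr, enr) for tf, (expr, enr) in profiles.items()}
print(labels)
```

A TF expressed where its motif is depleted from active enhancers, like the hypothetical `TF_B`, fits the repressor pattern. A flat expression profile carries no signal either way.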

Nucleosome Positioning Footprints Support Transcription Factor Cell Type Predictions
[Figure: tag enrichment for H3K27ac]

Enhancer annotation revisits disease SNPs
→ Previously unlinked phenotypes enriched for cell-type specific enhancers

Application 1: Pinpoint disease SNPs in enhancers
Much smaller fraction of genome considered
Strong enhancers 1.9%, weak 2.8%, promoter 1.4%
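
The genome fractions on this slide give a natural null expectation for how many SNPs should fall in a state by chance. The sketch below computes a fold enrichment and a binomial tail probability. The SNP counts are hypothetical, and the binomial test is an assumed choice rather than the test used in the talk.

```python
import math

def binomial_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p ** i * (1 - p) ** (n - i)
               for i in range(k, n + 1))

def fold_enrichment(hits, total, genome_fraction):
    """Observed fraction of SNPs in a state over the fraction expected
    if SNPs fell uniformly across the genome."""
    return (hits / total) / genome_fraction

# Strong enhancers cover 1.9% of the genome (from the slide).
genome_fraction = 0.019
# Hypothetical counts: 8 of 100 disease SNPs fall in strong enhancers.
total_snps, snps_in_state = 100, 8

fold = fold_enrichment(snps_in_state, total_snps, genome_fraction)
p_value = binomial_tail(snps_in_state, total_snps, genome_fraction)
print(round(fold, 2), p_value)
```

With these made-up counts the SNPs are roughly four-fold over-represented in strong enhancers. A real analysis would also need to account for LD between SNPs and for array design biases.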

Application 2: Make much more precise predictions
Use:
* Cell-type specificity of chromatin states
* Predicted activators/repressors of these states
* Predicted motif instances across the genome

Ex1: Systemic lupus erythematosus intergenic SNP
SNP in lymphoblastoid GM-specific enhancer state
Disrupts Ets1 motif instance, predicted GM regulator
→ Model: Disease SNP abolishes GM-specific enhancer
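
Whether a SNP disrupts a motif instance can be checked by scoring both alleles against a position weight matrix. The sketch below uses a toy matrix built from a consensus string. It is NOT the real Ets1 motif, and the 0.85 match probability, the uniform background, and the helper names are assumptions. The two sequences are the example alleles from the challenge slide earlier in the talk.

```python
import math

BACKGROUND = 0.25  # assumed uniform base frequencies

def make_toy_pwm(consensus, match=0.85):
    """Build a hypothetical PWM that favors the consensus base at each
    position and splits the remaining probability over the other bases."""
    other = (1.0 - match) / 3
    return [{b: (match if b == c else other) for b in "ACGT"}
            for c in consensus]

def log_odds_score(seq, pwm):
    """Sum of per-position log2(motif probability / background)."""
    return sum(math.log2(col[base] / BACKGROUND)
               for base, col in zip(seq, pwm))

pwm = make_toy_pwm("CATGACTG")          # toy motif, for illustration only
ref = log_odds_score("CATGACTG", pwm)   # reference allele
alt = log_odds_score("CATGCCTG", pwm)   # variant allele (A -> C at position 5)
delta = alt - ref
print(round(ref, 2), round(alt, 2), round(delta, 2))
```

A negative `delta` means the variant allele matches the motif less well, which is the pattern expected for a disrupted motif instance. A positive `delta` would correspond to a SNP that creates a motif instance.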

Ets-1 is a predicted activator of GM/HUVEC enhancers
[Figure panels: enhancer activity, gene activity, predicted regulators, activity signatures for each TF]
Enhancer class specific to GM and HUVEC cell types
Ets expression → Ets-1 motif enrichment in enhancers
→ Model: Ets-1 disruption would abolish enhancer state

Ex2: Erythrocyte phenotype study intronic SNP
K562: erythroleukaemia cell type
Disease SNP creates motif instance for Gfi-1 repressor
Gfi-1 predicted repressor for K562-specific enhancers
→ Creation of repressive motif abolishes K562 enhancer

Gfi-1 is a predicted repressor of non-K562 enhancers
[Figure panels: enhancer activity, gene activity, predicted regulators, activity signatures for each TF]
Gfi expression → Gfi-1 motif depletion in enhancers
Prediction: Gfi-1 large-scale repression of non-K562
→ Motif created → Gfi-1 recruited → enhancer repressed

More generally: eQTLs in specific chromatin states
Dixon 2007: All eQTLs, lymphoblasts, 400 ind.
Schadt 2008: Trans eQTLs, liver cells, 427 ind.
Nucleotide-resolution genome-wide expr. predictors
Strong enrichment for promoter and enhancer states
Trans-eQTLs select for cell-type specific enhancers
