Published by Garey Woods. Modified over 9 years ago.
1
Fly modENCODE data integration update
Manolis Kellis, MIT
MIT Computer Science & Artificial Intelligence Laboratory
Broad Institute of MIT and Harvard
2
modENCODE integration goals
Annotate all functional elements
– Enhancers, promoters, insulators, silencers
– Protein-coding genes, RNA genes, alternative splice forms
Understand their dynamics
– Tissue- and stage-specific activity of each type of element
Mechanisms
– Relative roles of histones, chromatin, specific/general TFs
– Sequence specificity, regulatory motifs and grammars
Community involvement will be key
– Seeking both computational and experimental partners
– Large-scale: Complementary datasets / computation
– Small-scale: Directed follow-up studies / genes, pathways
Drosophila 2009 modENCODE workshop discussion
3
Each dataset is supported by all others
Each type of element requires multiple data types
– Protein genes
– RNA genes
– Promoters
– Enhancers
– Transcripts
– Heterochromatin
– Initiation sites
Figure: data types (Replication, Chromatin, Nucleosomes, Small RNAs, Transcripts, TFs/Chromatin) and groups (Karpen, Henikoff, Celniker, White, Lai, MacAlpine)
Legend: already presented; underway; data integration efforts
4
modENCODE is not alone
Community data types
– Boundaries
– DNase HS sites, low buoyant density (protein binding)
– Evolutionary properties (correlations with conserved/non-conserved properties)
– Dam mapping
– Small RNAs
Techniques and functional genomics
– Gene disruption projects
– RNAi collection
– Recombineering
– Computational analyses
Figure: modENCODE data types (Replication, Chromatin, Nucleosomes, Small RNAs, Transcripts, TFs/Chromatin) and groups (Karpen, Henikoff, Celniker, White, Lai, MacAlpine), alongside community resources: boundaries, DNase HS, 12 flies (+8 flies), Dam mapping, etc.
5
Comparative resources for Drosophila genomes
Identify functional elements by their evolutionary signatures: complement experimental studies
Legend: done; priority 1; priority 2
New species       Dist
D. ficusphila     0.80
D. biarmipes      0.70
D. elegans        0.72
D. kikkawai       0.89
D. eugracilis     0.76
D. takahashii     0.65
D. rhopaloa       0.66
D. bipectinata    0.99
6
Evolutionary signatures for diverse functions
Protein-coding genes
– Codon Substitution Frequencies
– Reading Frame Conservation
RNA structures
– Compensatory changes
– Silent G-U substitutions
microRNAs
– Shape of conservation profile
– Structural features: loops, pairs
– Relationship with 3’UTR motifs
Regulatory motifs
– Mutations preserve consensus
– Increased Branch Length Score
– Genome-wide conservation
Stark et al, Nature 2007; Clark et al, Nature 2007
7
Functional annotation of novel transcripts using evolutionary signatures
Figure: distributions of CSF score (best 30 aa window; x-axis from -20 to 60; y-axes: fraction, frequency)
73 putative protein-coding
57 putative non-coding
CSF = heuristic metric for codon substitution frequency
Mike Lin, Jane Landolin, Sue Celniker
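The slide scores each transcript by its best 30 aa window but does not say how that window is computed. A minimal sketch of the "best window" step, assuming a list of per-codon scores is already available and that a window's score is the sum of its codon scores (both are assumptions; `best_window_score` is an illustrative name, not part of the CSF method itself):

```python
def best_window_score(codon_scores, window=30):
    """Return the highest total score over any run of `window` consecutive codons.

    codon_scores: hypothetical per-codon scores (positive = coding-like).
    If the transcript is shorter than `window`, the whole transcript is used.
    """
    if not codon_scores:
        return 0.0
    window = min(window, len(codon_scores))
    current = sum(codon_scores[:window])   # score of the first window
    best = current
    for i in range(window, len(codon_scores)):
        # Slide the window by one codon: add the new codon, drop the oldest.
        current += codon_scores[i] - codon_scores[i - window]
        best = max(best, current)
    return best
```

Taking the best window rather than the whole-transcript total lets a short coding region stand out inside a long, mostly non-coding transcript.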
8
Ability to discover full dictionary of regulatory motifs de novo
Discover motifs associated with binding
Stark et al, Nature, 2007
Columns: rank, consensus, MCS, match to known factor, tissue-specific target expression (promoters and enhancers values, fused in the source)
 #  Consensus     MCS   Matches to known             Promoters/Enhancers (fused)
 1  CTAATTAAA     65.6  engrailed (en)               25.42
 2  TTKCAATTAA    57.3  reversed-polarity (repo)     5.84.2
 3  WATTRATTK     54.9  araucan (ara)                11.72.6
 4  AAATTTATGCK   54.4  paired (prd)                 4.516.5
 5  GCAATAAA      51    ventral veins lacking (vvl)  13.20.3
 6  DTAATTTRYNR   46.7  Ultrabithorax (Ubx)          163.3
 7  TGATTAAT      45.7  apterous (ap)                7.11.7
 8  YMATTAAAA     43.1  abdominal A (abd-A)          72.2
 9  AAACNNGTT     41.2  -                            20.14.3
10  RATTKAATT     40    -                            3.90.7
11  GCACGTGT      39.5  fushi tarazu (ftz)           17.9
12  AACASCTG      38.8  broad-Z3 (br-Z3)             10.7
13  AATTRMATTA    38.2  -                            19.51.2
14  TATGCWAAT     37.8  -                            5.82
15  TAATTATG      37.5  Antennapedia (Antp)          14.15.4
16  CATNAATCA     36.9  -                            1.81.7
17  TTACATAA      36.9  -                            5.4
18  RTAAATCAA     36.3  -                            3.22.8
19  AATKNMATTT    36    -                            3.60
20  ATGTCAAHT     35.6  -                            2.44.6
21  ATAAAYAAA     35.5  -                            57.2-0.5
22  YYAATCAAA     33.9  -                            5.30.6
23  WTTTTATG      33.8  Abdominal B (Abd-B)          6.36
24  TTTYMATTA     33.6  extradenticle (exd)          6.71.7
25  TGTMAATA      33.2  -                            8.91.6
26  TAAYGAG       33.1  -                            4.72.7
27  AAAKTGA       32.9  -                            7.60.3
28  AAANNAAA      32.9  -                            449.70.8
29  RTAAWTTAT     32.9  gooseberry-neuro (gsb-n)     110.8
30  TTATTTAYR     32.9  Deformed (Dfd)               30.7
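The consensus sequences on this slide use IUPAC degenerate nucleotide codes (for example K = G or T, W = A or T, N = any base). A minimal sketch of how such a consensus could be scanned against a sequence, assuming a plain regular-expression translation and forward-strand matching only; the function names and the example sequence are illustrative and not taken from the presentation:

```python
import re

# Standard IUPAC nucleotide codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}

def consensus_to_regex(consensus):
    """Translate an IUPAC consensus string into a regex pattern string."""
    return "".join(IUPAC[base] for base in consensus.upper())

def find_instances(consensus, sequence):
    """Return 0-based start positions of all matches on the forward strand.

    A lookahead is used so that overlapping instances are also reported.
    """
    pattern = re.compile("(?=(" + consensus_to_regex(consensus) + "))")
    return [m.start() for m in pattern.finditer(sequence.upper())]

# Made-up sequence containing two instances of motif 2 (TTKCAATTAA).
hits = find_instances("TTKCAATTAA", "ACGTTGCAATTAAGGTTTCAATTAACC")
```

A real scan would also search the reverse complement and, as the slide title suggests, weigh each instance by its conservation across species; neither is shown here.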
9
Initial regulatory network for an animal genome
ChIP-grade quality
– Similar functional enrichment
– High sensitivity, high specificity
Systems-level
– 81% of transcription factors
– 86% of microRNAs
– 8k + 2k targets
– 46k connections
Lessons learned
– Pre- and post-transcriptional regulation are correlated (hihi/lolo)
– Regulators are heavily targeted, feedback loop
Kheradpour et al, Genome Research, 2007
Sushmita Roy
10
Temporal latencies in regulatory networks
TF-specific latencies, coherent with TF function
Latencies associated with network motifs
Extensions to tissue-specific networks
Rogerio Candeias
11
Incorporating ENCODE functional datasets
Pouya Kheradpour, Jason Ernst, Chris Bristow, Rachel Sealfon
12
modENCODE and gene regulation
Goal: Understand the DNA elements responsible for gene regulation
Building blocks of gene regulation
– The regulators: TFs, GFs, miRNAs, their specificities
– The regions: enhancers, promoters, insulators
– The targets: individual regulatory motif instances
– The grammars: combinations predictive of tissue-specific activity
Our tools: Comparative genomics & large-scale experimental datasets
Integrate diverse datasets
– Evolutionary signatures for promoter/enhancer/3’UTR motif annotation
– Chromatin signatures for integrating histone modification datasets
– TFs, GFs, motifs, instances associated with tissue-specific activity
– Infer regulatory networks, their temporal and spatial dynamics
13
Sequence motifs predictive of insulators
Understand specificity of each factor
How predictive are these motifs of binding
Motif combinations and grammars
Motifs specific to each insulator
– GAF, check
– CTCF, check
– Su(Hw), check
– BEAF-32, variant
– Mod(mdg4), novel
– CP190, novel
Pouya Kheradpour
14
Motif instances correlate with ChIP peaks
CTCF motif instances correlate strongly with narrow peak calls from multiple peak callers, even at a 40bp window
Correlation extends down the rank list (to all 50,000 peaks)
Implications for peak calling and for motif discovery
Figure: fraction of peaks overlapping CTCF motif instances, by narrow-peak interval rank (x10^4; SPP, 40bp window)
Figure: performance (higher is better) by peak size, measured as recovery of CTCF instances at 90% confidence
Pouya Kheradpour, Ben Brown
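The rank-versus-overlap curve described on this slide can be sketched as follows. This is a minimal illustration, assuming peaks are given as center coordinates on a single chromosome sorted from strongest to weakest, and that a peak "overlaps" a motif when an instance lies within half the window of its center; the function name and these conventions are assumptions, not the analysis actually used:

```python
from bisect import bisect_left

def fraction_overlapping(peak_centers, motif_positions, window=40):
    """Cumulative fraction of the top-k peaks that contain a motif instance.

    peak_centers: peak centers ordered by rank (best first), one chromosome.
    motif_positions: coordinates of motif instances on the same chromosome.
    window: total window size in bp, centered on each peak.
    """
    motifs = sorted(motif_positions)
    half = window / 2
    hits = 0
    fractions = []
    for k, center in enumerate(peak_centers, start=1):
        # First motif at or to the right of the window's left edge.
        i = bisect_left(motifs, center - half)
        if i < len(motifs) and motifs[i] <= center + half:
            hits += 1
        fractions.append(hits / k)
    return fractions
```

If the returned fractions stay high as the rank increases, lower-ranked peaks still tend to sit on motif instances, which is the pattern the slide reports for CTCF.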
15
Motifs and tissue-specific chromatin marks
The NF-κB motif is enriched in H3K4me2 regions found uniquely in GM12878 cells
It is likewise enriched in the uniquely bound regions for other active marks
Conversely, it is enriched in the uniquely unbound regions for the repressive mark H3K27me3
We find that NF-κB is also overexpressed in GM12878, suggesting a causative explanation
Figure: fold enrichment or overexpression for the NF-κB motif, across active marks and the repressive mark
Pouya Kheradpour
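The slide does not define "fold enrichment". One common definition, used here purely as an assumption, is the motif density in the regions of interest divided by the motif density in a background set. A minimal sketch with hypothetical argument names:

```python
def fold_enrichment(fg_hits, fg_total, bg_hits, bg_total):
    """Ratio of motif density in foreground regions to density in background.

    fg_hits / bg_hits: number of motif instances in each set.
    fg_total / bg_total: size of each set (for example, total bp covered).
    Values above 1 indicate enrichment, values below 1 indicate depletion.
    """
    if fg_total <= 0 or bg_total <= 0:
        raise ValueError("region totals must be positive")
    bg_rate = bg_hits / bg_total
    if bg_rate == 0:
        raise ValueError("background has no motif instances; ratio undefined")
    return (fg_hits / fg_total) / bg_rate
```

For example, 30 instances in 1,000 bp of cell-type-specific regions against 100 instances in 10,000 bp of background gives roughly a threefold enrichment.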
16
Motifs and stage-specific chromatin marks
The abd-A motif is enriched in new H3K27me3 regions at L2
– Coincides with a drop in the expression of abd-A
– Model: sites gain H3K27me3 as abd-A binding is lost
Additional intriguing stories found, to be explored
Figure: fold enrichment or overexpression (H3K27me3)
17
What about combinations of chromatin marks?
Jason Ernst
18
A hidden Markov model for chromatin state
Figure: a DNA track with the observed histone modifications at each location and the most likely hidden state (numbered 1 to 6) beneath it, spanning an enhancer, a transcription start site, and a transcribed region
Legend: highly likely modifications in each state (1 to 6), with probabilities 0.8, 0.9, 0.8, 0.7, 0.9, 0.8
Even when a modification is not observed at a location, the correct state can still be inferred, because a location is likely to be in the same state as its neighbors
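The point about inferring a state from its neighbors can be illustrated with Viterbi decoding. The sketch below assumes each location is a tuple of present/absent marks and that marks are independent within a state (one Bernoulli probability per mark). The two states, two marks, and all probabilities are invented toy values, not parameters from the presentation:

```python
import math

def viterbi(observations, states, start_p, trans_p, emit_p):
    """Most likely hidden state path for a sequence of binary mark vectors.

    observations: list of tuples of 0/1, one tuple per genomic location.
    emit_p[s][m]: probability that mark m is present in state s.
    All probabilities must lie strictly between 0 and 1 (logs are taken).
    """
    def log_emit(state, obs):
        return sum(math.log(p if o else 1.0 - p)
                   for p, o in zip(emit_p[state], obs))

    scores = {s: math.log(start_p[s]) + log_emit(s, observations[0])
              for s in states}
    back = []  # back[t][s] = best predecessor of state s at step t + 1
    for obs in observations[1:]:
        new_scores, pointers = {}, {}
        for s in states:
            prev = max(states,
                       key=lambda r: scores[r] + math.log(trans_p[r][s]))
            new_scores[s] = (scores[prev] + math.log(trans_p[prev][s])
                             + log_emit(s, obs))
            pointers[s] = prev
        scores = new_scores
        back.append(pointers)

    # Trace back from the best final state.
    path = [max(states, key=lambda s: scores[s])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]

# Toy model (hypothetical values): two states, two marks.
states = ["enhancer", "transcribed"]
start_p = {"enhancer": 0.5, "transcribed": 0.5}
trans_p = {"enhancer": {"enhancer": 0.9, "transcribed": 0.1},
           "transcribed": {"enhancer": 0.1, "transcribed": 0.9}}
emit_p = {"enhancer": [0.9, 0.1], "transcribed": [0.1, 0.8]}

# The second location has no marks observed at all.
observations = [(1, 0), (0, 0), (1, 0), (0, 1), (0, 1)]
path = viterbi(observations, states, start_p, trans_p, emit_p)
```

In this toy run the second location, where neither mark is observed, is still decoded as "enhancer": switching state twice costs more than the missing mark does, which is the neighbor effect the slide describes.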
19
20 distinct chromatin states, combinations of marks
Combinations of chromatin marks
– More informative than individual marks (A&B ≠ A&C)
– Small number of states (20 instead of all 2^21 ≈ 2 million)
– Allow study of co-occurrence patterns, independence…
20
Each chromatin state associated w/ distinct function
Reveals active/repressed promoters & enhancers
Distinct enrichments for 5’UTR/3’UTR/transcripts
Distinct chromatin properties of exons / introns
Tentative annotations
21
Transcriptional unit enrichment
22
Transcription start site (TSS) enrichment
23
Transcription termination site (TTS) enrichment
24
Transcriptional unit enrichment
25
Chromatin signatures as context for TF analysis
TF role in establishing chromatin states
Chromatin role in modulating TF function
26
Specific enrichment for DV and AP factors
27
Functions of 20 distinct chromatin states in fly
Figure panels: DV enhancers, AP enhancers, general TFs, insulators, replication, motifs, chromatin marks
28
The grand challenge ahead
Anterior-Posterior, Dorsal-Ventral
Annotations & images for all expression patterns
Expression domain primitives reveal underlying logic
Binding sites of every developmental regulator
Sequence motifs for every regulator
– GAF, check
– Su(Hw), check
– BEAF-32, variant
– Mod(mdg4), novel
– CP190, novel
– CTCF, check
Understand regulatory logic specifying development
29
Summary of our lab’s experience in (mod)ENCODE
Protein-coding genes (Mike Lin)
– Hubbard: Predict new genes, evaluate novel genes
– Celniker: Distinguish coding/non-coding transcripts
Chromatin domains (Jason Ernst)
– Karpen: Chromatin states in Drosophila
– Bernstein: Chromatin states in Human
Motif and grammar discovery (Pouya Kheradpour)
– White: Motifs associated with insulator proteins
– Bernstein: Tissue-specific chromatin states
– White: Expression and Binding Time-course
Tissue-specific gene expression (Chris Bristow)
– Celniker: Embryo expression domains
– All: Predictive models of gene expression
30
Acknowledgements
Alex Stark
TFs/Insul.: Kevin White, Bing Ren, Nicolas Negre, Par Shah, Jim Posakony
12+8 flies: Andy Clark, Mike Eisen, Bill Gelbart, Doug Smith, Peter Cherbas
Chromatin: Gary Karpen, Aki Minoda, Nicole Riddle, Peter Park + Kharchenko
Prot. genes: BDGP: Sue Celniker, Jane Landolin; FlyBase: Bill Gelbart
Pouya Kheradpour, Mike Lin, Jason Ernst, Chris Bristow
Funding: ENCODE, modENCODE, NHGRI, NSF, Sloan Foundation