Comparative genomics in flies and mammals


1 Comparative genomics in flies and mammals
Manolis Kellis
Broad Institute of MIT and Harvard; MIT Computer Science & Artificial Intelligence Laboratory

2 Resolving power in mammals, flies, fungi
[Figure: species trees; fungal panel labels: 8 Candida, 9 yeasts, post-duplication, pre-duplication, diploid, haploid]
Many species lead to high resolving power at close evolutionary distances.

3 Comparative genomics and evolutionary signatures
Comparative genomics can reveal functional elements.
For example, exons are deeply conserved to mouse, chicken, and fish.
Many other elements are also strongly conserved: exons / regulatory?
Can we also pinpoint the specific function of each region? Yes: patterns of change distinguish different types of functional elements.
Specific function → selective pressures → patterns of mutation/insertion/deletion.
Develop evolutionary signatures characteristic of each function.

4 1. Evolutionary signature of protein-coding genes
Revise protein-coding gene catalogue

5 Protein-coding evolution vs. nucleotide conservation
High protein-coding signal, low conservation → evolutionary signatures are highly sensitive.
High conservation, but not protein-coding → evolutionary signatures are highly specific.
[Figure labels: annotated FlyBase gene; existing cDNA data; new predicted exon; cDNA validation (iPCR)]

6 2. Evolutionary signatures of RNA genes
Typical substitutions; compensatory changes: G:C → G:U … G:U → A:U
Prediction methodology: Jakob Pedersen's EvoFold with very stringent parameters
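The substitution path on this slide (G:C → G:U → A:U) keeps the two positions of a stem able to pair at every step. A minimal sketch of that idea follows; the function names and the three labels are illustrative assumptions, and this is not EvoFold's actual model, which is a probabilistic method.

```python
# Base pairs an RNA stem tolerates: Watson-Crick pairs plus the G:U wobble pair.
PAIRS = {("G", "C"), ("C", "G"), ("A", "U"), ("U", "A"), ("G", "U"), ("U", "G")}


def can_pair(pair):
    """Return True if the two bases can form a Watson-Crick or wobble pair."""
    return pair in PAIRS


def classify_change(before, after):
    """Classify a change at two stem positions (hypothetical labels)."""
    if before == after:
        return "unchanged"
    if not can_pair(after):
        return "disruptive"
    changed = sum(x != y for x, y in zip(before, after))
    # Both positions changed and pairing survived: a double (compensatory) change.
    return "compensatory" if changed == 2 else "pairing-preserving"


if __name__ == "__main__":
    path = [("G", "C"), ("G", "U"), ("A", "U")]  # the path shown on the slide
    for before, after in zip(path, path[1:]):
        print(":".join(before), "->", ":".join(after), classify_change(before, after))
    print("G:C -> A:C", classify_change(("G", "C"), ("A", "C")))
```

Each single step along the path preserves pairing because the G:U wobble acts as an intermediate, while a direct G:C → A:C change would break the stem.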

7 Reveal novel RNA genes and structures
Intronic: enriched in A-to-I editing, also novel ncRNAs
Coding: A-to-I editing, also translational regulation
3’UTRs: enriched in regulators of mRNA localization
5’UTRs: translational regulation, ribosomal proteins
3’ & 5’UTR structures mostly on the coding strand (75% & 80%)

8 3. Structural and evolutionary signatures of miRNAs
Discover novel miRNAs
Recognize miRNA hairpin: length of hairpin & length of arms; fold stability, symmetric/asymmetric bulges; conservation profile: high|low|high
Pinpoint mature miRNA 5’ end: perfect 8mer conservation at start; predominance of 5’U (78%); number of paired bases is bounded; complementary to 3’UTR motifs
Revise existing miRNAs

9 4. Evolutionary signatures for regulatory motifs
Known engrailed site (footprint)
D.mel CAGCT--AGCC-AACTCTCTAATTAGCGACTAAGTC-CAAGTC
D.sim CAGCT--AGCC-AACTCTCTAATTAGCGACTAAGTC-CAAGTC
D.sec CAGCT--AGCC-AACTCTCTAATTAGCGACTAAGTC-CAAGTC
D.yak CAGC--TAGCC-AACTCTCTAATTAGCGACTAAGTC-CAAGTC
D.ere CAGCGGTCGCCAAACTCTCTAATTAGCGACCAAGTC-CAAGTC
D.ana CACTAGTTCCTAGGCACTCTAATTAGCAAGTTAGTCTCTAGAG
      **       *    * *********** *   **** * **
[Figure labels: D. mel, D. ere, D. ana, D. pse]
Motifs discovered: recover known regulators; many novel motifs
Evidence for novel motifs: tissue-specific enrichment; functional enrichment; in promoters & enhancers
Surprises: core promoter elements; miRNA motifs in coding exons
To address this question, we first studied the conservation properties of known regulatory motifs. Here, you can see the alignment of the promoter of the Gabpa gene across the four mammalian species used. The alignment of this region reveals a small conservation island that seems to be under selection. This island in fact corresponds to a functional binding site for Erra, which has been experimentally validated. The Erra motif appears perfectly conserved in the four species and stands out from the largely diverged neighboring sequences. However, is this enough to discover regulatory motifs? The answer is clearly no, since many such small islands exist, and the vast majority of them are likely to be due solely to chance.
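The row of asterisks under the engrailed alignment marks columns that are identical in all six species. A minimal sketch of how such a conservation row can be derived is shown below, using the six sequences from the slide; the function names are illustrative assumptions, and the sketch deliberately ignores phylogeny and treats every species equally.

```python
import re

# The six aligned sequences from the slide ('-' marks an alignment gap).
ALIGNMENT = {
    "D.mel": "CAGCT--AGCC-AACTCTCTAATTAGCGACTAAGTC-CAAGTC",
    "D.sim": "CAGCT--AGCC-AACTCTCTAATTAGCGACTAAGTC-CAAGTC",
    "D.sec": "CAGCT--AGCC-AACTCTCTAATTAGCGACTAAGTC-CAAGTC",
    "D.yak": "CAGC--TAGCC-AACTCTCTAATTAGCGACTAAGTC-CAAGTC",
    "D.ere": "CAGCGGTCGCCAAACTCTCTAATTAGCGACCAAGTC-CAAGTC",
    "D.ana": "CACTAGTTCCTAGGCACTCTAATTAGCAAGTTAGTCTCTAGAG",
}


def conserved_mask(seqs):
    """Return '*' for columns identical in every sequence (gaps excluded), else ' '."""
    return "".join(
        "*" if len(set(col)) == 1 and col[0] != "-" else " "
        for col in zip(*seqs)
    )


def longest_island(mask):
    """Return (start, end) of the longest run of conserved columns, end exclusive."""
    runs = [(m.start(), m.end()) for m in re.finditer(r"\*+", mask)]
    return max(runs, key=lambda r: r[1] - r[0])


if __name__ == "__main__":
    mask = conserved_mask(list(ALIGNMENT.values()))
    start, end = longest_island(mask)
    for name, seq in ALIGNMENT.items():
        print(name, seq)
    print("     ", mask)
    print("longest island:", ALIGNMENT["D.mel"][start:end])
```

The longest island covers the TAATTA core of the footprint, but as the notes on this slide stress, a short island like this is not sufficient evidence on its own, since many arise by chance.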

10 Functions of discovered motifs
Positional biases; tissue-specific enrichment and clustering; miRNA targeting in coding regions

11 5. Evolutionary signatures of motif instances
Allow for motif movements: sequencing/alignment errors; loss, movement, divergence
Measure branch-length score (BLS): sum evidence along branches; close species contribute little
Example motif Mef2: YTAWWWWTAR, with instances at BLS 25% and BLS 83%
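One way to read the branch-length score is as the share of the tree's total branch length covered by the smallest subtree connecting the species in which the motif is found. The sketch below works under that assumption. The toy tree, its branch lengths, and the sequences are made up for illustration, and the motif may match anywhere in each sequence, which is a crude stand-in for allowing motif movement.

```python
import re

# IUPAC codes needed for degenerate motifs such as Mef2: YTAWWWWTAR.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "R": "[AG]", "Y": "[CT]",
         "W": "[AT]", "S": "[CG]", "K": "[GT]", "M": "[AC]", "N": "[ACGT]"}

# Hypothetical tree: (name, branch length to parent, children).
TREE = ("root", 0.0, [
    ("anc", 0.2, [("D.mel", 0.1, []), ("D.yak", 0.1, [])]),
    ("D.ana", 0.6, []),
])


def motif_regex(motif):
    """Compile an IUPAC motif into a regular expression."""
    return re.compile("".join(IUPAC[c] for c in motif))


def leaves(node):
    name, _, children = node
    return {name} if not children else set().union(*(leaves(c) for c in children))


def total_length(node):
    _, length, children = node
    return length + sum(total_length(c) for c in children)


def spanning_length(node, present):
    """Branch length of the minimal subtree connecting the `present` leaves."""
    _, length, children = node
    below = leaves(node) & present
    # A branch is used only if it separates some present leaves from others.
    own = length if below and below != present else 0.0
    return own + sum(spanning_length(c, present) for c in children)


def branch_length_score(tree, seqs, motif):
    pattern = motif_regex(motif)
    present = {name for name, seq in seqs.items() if pattern.search(seq)}
    return spanning_length(tree, present) / total_length(tree)


if __name__ == "__main__":
    seqs = {"D.mel": "GGCTAAAAATAGCC", "D.yak": "GCTATTTATAGGCC", "D.ana": "GGCCGGCCGGCCGG"}
    print(branch_length_score(TREE, seqs, "YTAWWWWTAR"))
```

In this toy case the motif is found only in the two close species, so the score stays low even though two of three species agree, which is the behaviour the slide describes as close species contributing little.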

12 Motif confidence selects functional instances
Transcription factor motifs: increasing BLS → increasing confidence; confidence selects functional regions; confidence selects in vivo bound sites; high sensitivity
microRNA motifs: increasing BLS → increasing confidence; confidence selects functional regions; confidence selects positive strand

13 6. Initial regulatory network for an animal genome
ChIP-grade quality: similar functional enrichment; high sensitivity, high specificity
Systems-level: 81% of transcription factors; 86% of microRNAs; 8k + 2k targets; 46k connections
Lessons learned: pre- and post-transcriptional regulation are correlated (high/high, low/low); regulators are heavily targeted, forming feedback loops

14 Network captures literature-supported connections

15 Network captures co-expression supported edges
Red = co-expressed; grey = not co-expressed; named = literature-supported; bold = literature-supported

16 7. ChIP vs. conservation: similar power / complementary
Together: best → complementary
Bound but not conserved: reduced enrichment → conservation selects functional sites
All-ChIP vs. all-conserved: similar enrichment → similar power
Conserved-only vs. ChIP-all: similar → additional sites

17 Recovery of regulatory motif instances in mammals
[Figure: miRNA motif instances recovered at 80% confidence (y-axis, 2k to 10k, reaching 11,000 instances) against total branch length of informant species (x-axis): HMRD (0.74), pl-mam (3.36), mamm. (4.33), H+non-mamm. (6.36), HMRD+non-mam (6.96), all vertebrates (9.66); annotated ~6X]
Performance increases with branch length (requires closely related species)
Measure the number of recovered motif instances at a fixed confidence (80%) / FDR (20%)
Discovery power: 6-fold higher than HMRD (branch length also ~6-fold higher)
With 20 currently aligned mammals:
Transcription factor motifs: 47 TFs | 16,000 instances | 340 targets on avg
microRNA motifs: 21 miRNAs | 11,000 instances | 523 targets on avg
An initial regulatory network for mammalian genomes
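The "~6-fold" branch-length claim can be checked by dividing each branch length listed on the slide by the HMRD baseline. The sketch below is plain arithmetic on those numbers; the dictionary and function names are illustrative, and which species set corresponds to the 20 aligned mammals is not stated on the slide, so no mapping is assumed here.

```python
# Total branch lengths of informant species, as listed on the slide.
BRANCH_LENGTH = {
    "HMRD": 0.74,
    "pl-mam": 3.36,
    "mamm.": 4.33,
    "H+non-mamm.": 6.36,
    "HMRD+non-mam": 6.96,
    "All vertebr.": 9.66,
}


def fold_over(baseline, lengths):
    """Return each set's branch length as a multiple of the baseline set's."""
    base = lengths[baseline]
    return {name: value / base for name, value in lengths.items()}


if __name__ == "__main__":
    for name, fold in fold_over("HMRD", BRANCH_LENGTH).items():
        print(f"{name:14s} {fold:5.2f}x")
```

Of the listed sets, "mamm." comes closest to the quoted figure, at roughly 5.9 times the HMRD branch length.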

18 New insights into animal biology

19 1.Large-scale evidence of translational read-through
[Figure labels: protein-coding conservation; stop codon read-through; continued protein-coding conservation; 2nd stop codon; no more conservation]
New mechanism of post-transcriptional control. Hundreds of fly genes, a handful of human genes. Enriched in brain proteins and ion channels. Experiments show ADAR is necessary & sufficient (Reenan Lab).
Many questions remain: A-to-I editing of the stop codon (TAG|TGA|TAA → TGG)? Cryptic splice sites? RNA secondary structure?
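Two ideas on this slide are easy to make concrete: the stretch between the annotated stop codon and the next in-frame stop is the candidate read-through region, and A-to-I editing (inosine is read as G) can turn each stop codon into TGG. The sketch below illustrates both; the function names and the example sequence are made up, and this is not the detection method used in the talk, which relies on protein-coding conservation across species.

```python
from itertools import product

STOPS = {"TAA", "TAG", "TGA"}


def a_to_i_variants(codon):
    """All codons reachable by editing any subset of A's to I (read as G)."""
    options = [("A", "G") if base == "A" else (base,) for base in codon]
    return {"".join(p) for p in product(*options)} - {codon}


def sense_after_editing(stop):
    """Edited versions of a stop codon that are no longer stop codons."""
    return a_to_i_variants(stop) - STOPS


def readthrough_extension(seq, first_stop):
    """Return (extension, index of next in-frame stop), or None if no stop follows.

    `first_stop` is the 0-based index of the annotated stop codon in `seq`.
    """
    for i in range(first_stop + 3, len(seq) - 2, 3):
        if seq[i:i + 3] in STOPS:
            return seq[first_stop + 3:i], i
    return None


if __name__ == "__main__":
    for stop in sorted(STOPS):
        print(stop, "->", sorted(sense_after_editing(stop)))
    # Toy transcript: ATG GCC TAG GGA CTT TGA TT
    print(readthrough_extension("ATGGCCTAGGGACTTTGATT", 6))
```

For all three stop codons the only edited sense codon is TGG, matching the TAG|TGA|TAA → TGG note on the slide; TAA needs both of its A's edited.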

20 2. Stop codon read-through in mammals
Four candidates found: GPX2, OPRK1, OPRL1, GRIA2, mostly neuronal A look at FOXP2 – Possible 3’UTR function (not in fish, yes in frog)

21 3. New insights into miRNA regulation: miRNA* function
Both miRNA arms can be functional: high scores, abundant processing, conserved targets
Hox miRNAs miR-10 and miR-iab-4 as master Hox regulators

22 4. New insights into miRNA regulation: miR-AS function
A single miRNA locus is transcribed from both strands
Both transcripts are processed to mature miRNAs: miR-iab-4 and miR-iab-4AS (anti-sense)
The two miRNAs show distinct, mutually exclusive expression domains
The two show distinct Hox targets: another Hox master regulator

23 5. New insights into miRNA regulation: miR-AS function
[Figure labels: wing with bristles; sensory bristles; haltere; wing; haltere; WT; sense; antisense. Note: C, D, E same magnification]
Mis-expression of miR-iab-4S & AS: halteres → wings homeotic transformation
Stronger phenotype for the AS miRNA
Sense/anti-sense pairs as general building blocks for miRNA regulation
9 new anti-sense miRNAs in mouse

24 Summary of Contributions
Evolutionary signatures specific to each function
Protein-coding genes: revised catalogue affects 10% of genes
RNA: hundreds of new high-confidence structures discovered
miRNAs: ~double the number of genes, families, targeting density
Motifs: ~double the number of motifs, tissue & positional enrichment
Targets: ChIP-grade quality, global scale, experimental support
New insights on animal biology
Genes: abundant stop codon read-through in neuronal proteins
RNA: abundant structures in RNA editing, translational regulation
Motifs: coding regions show miRNA targeting
miRNAs: miR/miR* and sense/anti-sense pairs as building blocks
Networks: TF vs. miRNA targets, redundancy and integration
Methods are general, applicable in any species

25 Next steps: Drosophila and Human ENCODE
modENCODE: White / Ren / Kellis / Posakony
Hundreds of sequence-specific factors
Dozens of chromatin / histone modifications
Dozens of tissues / stages / conditions
humENCODE: Bernstein / Lander / Kellis / Broad
ChIP-seq for dozens of chromatin modifications
Follow differentiation lineages: activation / inactivation
Discover tissue-specific regulatory motifs
Many open questions remain
Dynamics of tissue-specific regulatory networks
Sequence determinants of chromatin establishment & maintenance
Global views of pre- & post-transcriptional regulation
Many open positions remain (postdoc/grad/ugrad)

26 Acknowledgements
Alex Stark, Mike Lin, Pouya Kheradpour, Matt Rasmussen
Genes: FlyBase, BDGP, Bill Gelbart, Sue Celniker, Lynn Crosby
miRNAs: Leo Parts, Julius Brennecke, Greg Hannon, David Bartel
iab-4AS: Natascha Bushati, Steve Cohen, Julius, Greg Hannon
12 flies: Andy Clark, Mike Eisen, Bill Gelbart, Doug Smith
24 mammals: Sante Gnerre, Michele Clamp, Manuel Garber, Eric Lander

