Comparative genomics in flies and mammals

Presentation transcript:

Comparative genomics in flies and mammals Manolis Kellis Broad Institute of MIT and Harvard MIT Computer Science & Artificial Intelligence Laboratory

Resolving power in mammals, flies, fungi
[Figure: species phylogenies; fungal panel labels: 8 Candida, 9 Yeasts, post-duplication, pre-duplication, diploid, haploid]
Many species lead to high resolving power at close distances

Comparative genomics and evolutionary signatures
- Comparative genomics can reveal functional elements
- For example: exons are deeply conserved to mouse, chicken, fish
- Many other elements are also strongly conserved: exons / regulatory?
- Can we also pinpoint specific functions of each region? Yes!
- Patterns of change distinguish different types of functional elements
- Specific function → Selective pressures → Patterns of mutation/insertion/deletion
- Develop evolutionary signatures characteristic of each function

1. Evolutionary signature of protein-coding genes Revise protein-coding gene catalogue

Protein-coding evolution vs. nucleotide conservation
- High protein-coding signal, low conservation → Evolutionary signatures highly sensitive
- High conservation, but not protein-coding → Evolutionary signatures highly specific
[Figure legend: annotated FlyBase gene; existing cDNA data; new predicted exon; cDNA validation (iPCR)]

2. Evolutionary signatures of RNA genes
- Typical substitutions
- Compensatory changes
- G:C → G:U … G:U → A:U
- Prediction methodology: EvoFold (Jakob Pedersen) with very stringent parameters
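The substitution pattern on this slide can be illustrated with a small sketch. It assumes the usual RNA pairing rules (Watson-Crick pairs plus the G:U wobble pair) and classifies how a paired position changes between two species. The function names and category labels are hypothetical and chosen for illustration; this is not EvoFold's model, which the slide does not describe.

```python
# Pairs that can form in an RNA helix: Watson-Crick pairs plus the G:U wobble pair.
VALID_PAIRS = {("G", "C"), ("C", "G"), ("A", "U"), ("U", "A"), ("G", "U"), ("U", "G")}


def is_paired(pair):
    """True if the two bases can pair in an RNA helix."""
    return pair in VALID_PAIRS


def classify_change(before, after):
    """Classify the change at one paired position between two species.

    `before` and `after` are (5' base, 3' base) tuples; labels are illustrative.
    """
    if before == after:
        return "unchanged"
    if not is_paired(after):
        return "disruptive"
    changed = sum(x != y for x, y in zip(before, after))
    return "single substitution, pairing kept" if changed == 1 else "compensatory double substitution"


if __name__ == "__main__":
    for before, after in [(("G", "C"), ("G", "U")),
                          (("G", "U"), ("A", "U")),
                          (("G", "C"), ("A", "U")),
                          (("G", "C"), ("G", "A"))]:
        print(":".join(before), "->", ":".join(after), "=", classify_change(before, after))
```

The path G:C → G:U → A:U keeps the helix paired at every step, which is why an excess of such changes is taken as evidence for a conserved structure rather than a conserved sequence.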

Reveal novel RNA genes and structures
- Intronic: enriched in A-to-I editing, also novel ncRNAs
- Coding: A-to-I editing, also translational regulation
- 3’UTRs: enriched in regulators of mRNA localization
- 5’UTRs: translational regulation, ribosomal proteins
- 3’ & 5’UTR structures mostly on coding strand (75% & 80%)

3. Structural and evolutionary signatures of miRNAs
Discover novel miRNAs
- Recognize miRNA hairpin
  - Length of hairpin & length of arms
  - Fold stability, symmetric/asymmetric bulges
  - Conservation profile: high|low|high
- Pinpoint mature miRNA 5’ end
  - Perfect 8mer conservation at start
  - Predominance of 5’U (78%)
  - Number of paired bases is bound
  - Complementary to 3’UTR motifs
Revise existing miRNAs

4. Evolutionary signatures for regulatory motifs
Known engrailed site (footprint)
D.mel CAGCT--AGCC-AACTCTCTAATTAGCGACTAAGTC-CAAGTC
D.sim CAGCT--AGCC-AACTCTCTAATTAGCGACTAAGTC-CAAGTC
D.sec CAGCT--AGCC-AACTCTCTAATTAGCGACTAAGTC-CAAGTC
D.yak CAGC--TAGCC-AACTCTCTAATTAGCGACTAAGTC-CAAGTC
D.ere CAGCGGTCGCCAAACTCTCTAATTAGCGACCAAGTC-CAAGTC
D.ana CACTAGTTCCTAGGCACTCTAATTAGCAAGTTAGTCTCTAGAG
      **       *    * *********** *   **** * **
[Figure: phylogeny with D. mel, D. ere, D. ana, D. pse]
Motifs discovered
- Recover known regulators
- Many novel motifs
Evidence for novel motifs
- Tissue-specific enrichment
- Functional enrichment
- In promoters & enhancers
Surprises
- Core promoter elements
- miRNA motifs in coding exons
To address this question, we first studied the conservation properties of known regulatory motifs. Here, you can see the alignment of the promoter of the Gabpa gene across the four mammalian species used. The alignment of this region reveals a small conservation island which seems to be under selection. This island in fact corresponds to a functional binding site for ERRa, which has been experimentally validated. The ERRa motif appears perfectly conserved in the four species and stands out from the largely diverged neighboring sequences. However, is this enough to discover regulatory motifs? The answer is clearly no, since many such small islands exist, and the vast majority of them are likely to be due solely to chance.
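As a minimal sketch of how a "conservation island" can be read off an alignment, the code below takes the six-species engrailed alignment from this slide, marks every column that is identical in all species, and reports the longest fully conserved block. The function names are hypothetical, and counting only perfectly identical, gap-free columns is a simplifying assumption; it is not the motif-discovery method of the talk, which the notes say needs more than a single conserved island.

```python
import re

# Alignment rows copied from the slide (43 columns each).
ALIGNMENT = {
    "D.mel": "CAGCT--AGCC-AACTCTCTAATTAGCGACTAAGTC-CAAGTC",
    "D.sim": "CAGCT--AGCC-AACTCTCTAATTAGCGACTAAGTC-CAAGTC",
    "D.sec": "CAGCT--AGCC-AACTCTCTAATTAGCGACTAAGTC-CAAGTC",
    "D.yak": "CAGC--TAGCC-AACTCTCTAATTAGCGACTAAGTC-CAAGTC",
    "D.ere": "CAGCGGTCGCCAAACTCTCTAATTAGCGACCAAGTC-CAAGTC",
    "D.ana": "CACTAGTTCCTAGGCACTCTAATTAGCAAGTTAGTCTCTAGAG",
}


def conservation_track(rows):
    """Return '*' for columns identical in every row (gap columns never count), else ' '."""
    marks = []
    for column in zip(*rows):
        identical = len(set(column)) == 1 and column[0] != "-"
        marks.append("*" if identical else " ")
    return "".join(marks)


def longest_conserved_block(alignment):
    """Return the longest run of fully conserved columns, read from the first row."""
    rows = list(alignment.values())
    track = conservation_track(rows)
    best = max(re.finditer(r"\*+", track), key=lambda m: len(m.group()), default=None)
    return "" if best is None else rows[0][best.start():best.end()]


if __name__ == "__main__":
    for name, seq in ALIGNMENT.items():
        print(name, seq)
    print(" " * 5, conservation_track(list(ALIGNMENT.values())))
    print("longest conserved block:", longest_conserved_block(ALIGNMENT))
```

On this alignment the longest block is the 11-column stretch CTCTAATTAGC, which is the island of stars visible in the middle of the footprint.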

Functions of discovered motifs Positional biases Tissue-specific enrichment and clustering miRNA targeting in coding regions

5. Evolutionary signatures of motif instances
- Allow for motif movements
  - Sequencing/alignment errors
  - Loss, movement, divergence
- Measure branch-length score (BLS)
  - Sum evidence along branches
  - Closely related species contribute little
[Figure: Mef2 (YTAWWWWTAR) instances with BLS 25% and BLS 83%]
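The branch-length score can be sketched as follows, under stated assumptions. BLS is taken here to be the branch length of the smallest subtree connecting all species that contain a motif instance, divided by the total branch length of the tree. The tree, its branch lengths, and the sequences are hypothetical toy values, only the forward strand is searched, and the motif may occur anywhere in each species' region (one simple way to "allow for motif movements"). Only the Mef2 consensus YTAWWWWTAR comes from the slide.

```python
import re

# IUPAC nucleotide codes needed for consensus motifs such as YTAWWWWTAR.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "R": "[AG]", "Y": "[CT]",
         "W": "[AT]", "S": "[CG]", "K": "[GT]", "M": "[AC]", "N": "[ACGT]"}

# Hypothetical toy tree: child -> (parent, branch length). Lengths are made up.
TREE = {
    "D.mel": ("n1", 0.1),
    "D.sim": ("n1", 0.1),
    "n1": ("n2", 0.2),
    "D.yak": ("n2", 0.3),
    "n2": ("root", 0.3),
    "D.ana": ("root", 1.0),
}

# Hypothetical orthologous regions (not real Drosophila sequence).
REGIONS = {
    "D.mel": "GGCTAAAAATAGCC",
    "D.sim": "GGCTAAAAATAGCC",
    "D.yak": "GGCTAGAAATAGCC",   # G inside the W stretch: motif lost
    "D.ana": "CTATTTATAAGGCC",   # motif present but shifted
}


def motif_regex(consensus):
    """Compile an IUPAC consensus into a regular expression."""
    return re.compile("".join(IUPAC[base] for base in consensus))


def species_with_motif(regions, consensus):
    """Species whose region contains the motif anywhere (forward strand only)."""
    pattern = motif_regex(consensus)
    return {name for name, seq in regions.items() if pattern.search(seq)}


def leaves_below(node):
    """All leaf species in the subtree hanging below `node`."""
    children = [child for child, (parent, _) in TREE.items() if parent == node]
    if not children:
        return {node}
    return set().union(*(leaves_below(child) for child in children))


def branch_length_score(present):
    """Fraction of total branch length spanned by the species in `present`."""
    present = set(present)
    total = sum(length for _, length in TREE.values())
    spanned = 0.0
    for child, (_, length) in TREE.items():
        below = leaves_below(child)
        # A branch is on the connecting subtree if motif-bearing species lie on both sides of it.
        if below & present and present - below:
            spanned += length
    return spanned / total


if __name__ == "__main__":
    found = species_with_motif(REGIONS, "YTAWWWWTAR")
    print(sorted(found), f"BLS = {branch_length_score(found):.0%}")
    print("mel+sim only:", f"{branch_length_score({'D.mel', 'D.sim'}):.0%}")
```

In this toy tree, conservation in the two closest species alone spans only a tenth of the tree, while adding the distant species raises the score sharply, which mirrors the slide's point that close species contribute little.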

Motif confidence selects functional instances
Transcription factor motifs
- Increasing BLS → Increasing confidence
- Confidence selects functional regions
- Confidence selects in vivo bound sites
- High sensitivity
microRNA motifs
- Increasing BLS → Increasing confidence
- Confidence selects functional regions
- Confidence selects positive strand

6. Initial regulatory network for an animal genome
ChIP-grade quality
- Similar functional enrichment
- High sensitivity, high specificity
Systems-level
- 81% of transcription factors
- 86% of microRNAs
- 8k + 2k targets
- 46k connections
Lessons learned
- Pre- and post-transcriptional regulation are correlated (high/high, low/low)
- Regulators are heavily targeted, feedback loop

Network captures literature-supported connections

Network captures co-expression supported edges
[Figure legend: red = co-expressed; grey = not co-expressed; named = literature-supported; bold = literature-supported]

7. ChIP vs. conservation: similar power / complementary
- Together: best → complementary
- Bound but not conserved: reduced enrichment → Selects functional
- All-ChIP vs. All-cons: similar enrichment → Similar power
- Cons-only vs. ChIP-all: similar → Additional sites

Recovery of regulatory motif instances in mammals (80% confidence)
[Figure: miRNA motif instances recovered at 80% confidence (up to ~11,000) vs. total branch length of inf. species: HMRD (0.74), pl-mam (3.36), mamm. (4.33), H+non-mamm. (6.36), HMRD+non-mam (6.96), all vertebr. (9.66); ~6X increase]
- Performance increases with branch length (requires closely-related species)
- Measure number of recovered motif instances at a fixed confidence (80%) / FDR (20%)
- Discovery power: 6-fold higher than HMRD (branch length also ~6-fold higher)
With 20 currently-aligned mammals:
- Transcription factor motifs: 47 TFs | 16,000 instances | 340 targets on avg
- microRNA motifs: 21 miRNAs | 11,000 instances | 523 targets on avg
An initial regulatory network for mammalian genomes
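The "~6X" figures on this slide can be checked with a few lines of arithmetic. The sketch below uses the total branch lengths printed on the slide and assumes that the "branch length also ~6-fold higher" remark compares the set labelled "mamm." with HMRD; the slide does not say explicitly which set is meant. It also restates the slide's pairing of 80% confidence with a 20% FDR as FDR = 1 - confidence. Function names are hypothetical.

```python
# Total branch lengths of each species set, as printed on the slide.
BRANCH_LENGTH = {
    "HMRD": 0.74,
    "pl-mam": 3.36,
    "mamm.": 4.33,
    "H+non-mamm.": 6.36,
    "HMRD+non-mam": 6.96,
    "All vertebr.": 9.66,
}


def fold_over(reference, name):
    """Branch length of `name` relative to the `reference` species set."""
    return BRANCH_LENGTH[name] / BRANCH_LENGTH[reference]


def fdr_from_confidence(confidence):
    """False discovery rate implied by a confidence cutoff (FDR = 1 - confidence)."""
    return 1.0 - confidence


if __name__ == "__main__":
    for name in BRANCH_LENGTH:
        print(f"{name:13s} {fold_over('HMRD', name):5.2f}x the HMRD branch length")
    print(f"FDR at 80% confidence: {fdr_from_confidence(0.80):.0%}")
```

Under that assumption the ratio is 4.33 / 0.74, or about 5.9, consistent with the slide's rounding to ~6-fold.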

New insights into animal biology

1. Large-scale evidence of translational read-through
[Figure labels: protein-coding conservation; continued protein-coding conservation; no more conservation; stop codon read-through; 2nd stop codon]
- New mechanism of post-transcriptional control.
- Hundreds of fly genes, handful of human genes.
- Enriched in brain proteins, ion channels.
- Experiments show ADAR necessary & sufficient (Reenan Lab).
Many questions remain
- A-to-I editing of stop codon: TAG|TGA|TAA → TGG
- Cryptic splice sites?
- RNA secondary structure?
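The stop-codon editing hypothesis listed under the open questions can be made concrete with a short sketch. It relies on two well-established facts: inosine produced by A-to-I editing is read as G during translation, and TGG encodes tryptophan. The code enumerates every codon reachable from each stop codon by editing one or more of its A's and reports how many edits are needed to reach TGG. The function names are hypothetical; this illustrates the hypothesis, not evidence that it explains the observed read-through.

```python
from itertools import combinations

STOP_CODONS = ("TAG", "TGA", "TAA")


def edited_variants(codon):
    """Map each codon reachable by A-to-I editing (A read as G) to the number of edits used."""
    a_positions = [i for i, base in enumerate(codon) if base == "A"]
    variants = {}
    for n_edits in range(1, len(a_positions) + 1):
        for subset in combinations(a_positions, n_edits):
            bases = list(codon)
            for i in subset:
                bases[i] = "G"
            variants["".join(bases)] = n_edits
    return variants


def edits_to_tryptophan(codon):
    """Number of A-to-I edits that turn `codon` into TGG, or None if unreachable."""
    return edited_variants(codon).get("TGG")


if __name__ == "__main__":
    for stop in STOP_CODONS:
        print(stop, edited_variants(stop), "edits to TGG:", edits_to_tryptophan(stop))
```

TAG and TGA each need a single edit, whereas TAA needs both of its A's edited, and its single-edit products (TGA, TAG) are still stop codons.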

2. Stop codon read-through in mammals Four candidates found: GPX2, OPRK1, OPRL1, GRIA2, mostly neuronal A look at FOXP2 – Possible 3’UTR function (not in fish, yes in frog)

3. New insights into miRNA regulation: miRNA* function Both miRNA arms can be functional High scores, abundant processing, conserved targets Hox miRNAs miR-10 and miR-iab-4 as master Hox regulators

4. New insights into miRNA regulation: miR-AS function
- A single miRNA locus transcribed from both strands
- Both processed to mature miRNAs: miR-iab-4, miR-iab-4AS (anti-sense)
- The two miRNAs show distinct expression domains (mutually exclusive)
- The two show distinct Hox targets: another Hox master regulator

5. New insights into miRNA regulation: miR-AS function
[Figure panels: wing with bristles, sensory bristles, haltere, wing; WT, sense, antisense. Note: C, D, E same magnification]
- Mis-expression of miR-iab-4S & AS: halteres → wings homeotic transformation
- Stronger phenotype for AS miRNA
- Sense/anti-sense pairs as general building blocks for miRNA regulation
- 9 new anti-sense miRNAs in mouse

Summary of Contributions
Evolutionary signatures specific to each function
- Protein-coding genes: Revised catalogue affects 10% of genes
- RNA: hundreds of new high-confidence structures discovered
- miRNAs: ~double number of genes, families, targeting density
- Motifs: ~double number of motifs, tissue & positional enrichment
- Targets: ChIP-grade quality, global scale, experimental support
New insights on animal biology
- Genes: Abundant stop codon read-through in neuronal proteins
- RNA: Abundant structures in RNA editing, translational regulation
- Motifs: Coding regions show miRNA targeting
- miRNAs: miR/miR* and sense/anti-sense pairs: building blocks
- Networks: TF vs. miRNA targets redundancy and integration
Methods are general, applicable in any species

Next steps: Drosophila and Human ENCODE
modENCODE: White / Ren / Kellis / Posakony
- Hundreds of sequence-specific factors
- Dozens of chromatin / histone modifications
- Dozens of tissues / stages / conditions
humENCODE: Bernstein / Lander / Kellis / Broad
- ChIP-seq for dozens of chromatin modifications
- Follow differentiation lineages: activation / inactivation
- Discover tissue-specific regulatory motifs
Many open questions remain
- Dynamics of tissue-specific regulatory networks
- Sequence determinants of chromatin establishment & maintenance
- Global views of pre- & post-transcriptional regulation
Many open positions remain (postdoc/grad/ugrad)

Acknowledgements
- Alex Stark, Mike Lin, Pouya Kheradpour, Matt Rasmussen
- Genes: FlyBase, BDGP, Bill Gelbart, Sue Celniker, Lynn Crosby
- miRNAs: Leo Parts, Julius Brennecke, Greg Hannon, David Bartel
- iab-4AS: Natascha Bushati, Steve Cohen, Julius, Greg Hannon
- 12-flies: Andy Clark, Mike Eisen, Bill Gelbart, Doug Smith
- 24 mammals: Sante Gnerre, Michele Clamp, Manuel Garber, Eric Lander