Alternative splicing: A playground of evolution Mikhail Gelfand Research and Training Center for Bioinformatics Institute for Information Transmission.

Alternative splicing: A playground of evolution Mikhail Gelfand Research and Training Center for Bioinformatics Institute for Information Transmission Problems

Alternative splicing of human (and mouse) genes

Outline
– Evolution of alternative exon-intron structure
  – human-mouse
  – Drosophila and Anopheles
– Evolution of alternative splicing sites: the MAGE-A family of cancer/testis (CT) antigens
– Evolutionary rate in constitutive and alternative regions
  – human-mouse
  – human SNPs
– Alternative splicing and protein structure

Data and methods (routine)
Known alternative splicing:
– HASDB (human, ESTs + mRNAs)
– ASMamDB (mouse, mRNAs + genes)
Additional variants:
– UniGene (human and mouse EST clusters)
Complete genes and genomic DNA:
– GenBank (full-length mouse genes)
– human genome
Tools:
– TBLASTN (initial identification of orthologs: mRNAs against genomic DNA)
– BLASTN (human mRNAs against the genome)
– Pro-EST (spliced alignment of ESTs and mRNAs against genomic DNA)

Pro-Frame (spliced alignment of proteins against genomic DNA)
– confirmation of orthology: the same exon-intron structure for at least one isoform, and >70% identity over the entire protein length
– analysis of conservation of human alternative splicing in the mouse genome: align the human protein to mouse genomic DNA; the isoform is conserved if all exons or parts of exons are conserved and all sites are conserved
– the same procedure for mouse proteins and human DNA
We do not require that the isoform is actually observed as mRNA or ESTs.
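The two decision rules above can be written down as a minimal sketch, assuming the per-exon results of the Pro-Frame alignment have already been reduced to two booleans per exon. `ExonMapping`, `isoform_conserved` and `confirmed_orthologs` are hypothetical names for illustration, not part of Pro-Frame.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ExonMapping:
    """Hypothetical summary of aligning one exon of an isoform to the other genome."""
    conserved: bool        # the exon (or a part of it) aligns in the other genome
    sites_conserved: bool  # its splicing sites are preserved there


def isoform_conserved(exons: List[ExonMapping]) -> bool:
    """An isoform counts as conserved only if every exon and every site is conserved."""
    return all(e.conserved and e.sites_conserved for e in exons)


def confirmed_orthologs(same_structure_per_isoform: List[bool], protein_identity: float) -> bool:
    """Orthology: same exon-intron structure for at least one isoform and >70% identity."""
    return any(same_structure_per_isoform) and protein_identity > 0.70
```

Note that the identity threshold is strict (">70%"), so a pair at exactly 70% identity is rejected.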

Known alternative splicing: 166 human-mouse gene pairs

Elementary alternatives:
– cassette exon
– alternative donor site
– alternative acceptor site
– retained intron
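To make the four elementary alternatives concrete, here is a simplified pairwise classifier, assuming both isoforms are given as sorted lists of 1-based, inclusive exon coordinates on the plus strand. The function names and the rule that a site shift must not swallow a whole exon are illustrative assumptions; this is not the classification procedure used in the study, and complex (mixed) events are not resolved.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # (start, end), 1-based inclusive, plus strand only


def introns(exons: List[Interval]) -> List[Interval]:
    """Introns are the gaps between consecutive exons of a sorted isoform."""
    return [(left[1] + 1, right[0] - 1) for left, right in zip(exons, exons[1:])]


def contains(outer: Interval, inner: Interval) -> bool:
    return outer[0] <= inner[0] and inner[1] <= outer[1]


def elementary_alternatives(iso_a: List[Interval], iso_b: List[Interval]) -> List[Tuple[str, Interval]]:
    events = []
    introns_a, introns_b = introns(iso_a), introns(iso_b)
    for exons_x, introns_y in ((iso_a, introns_b), (iso_b, introns_a)):
        for exon in exons_x:
            # Cassette exon: present in one isoform, lying wholly inside an intron of the other.
            if any(contains(intron, exon) for intron in introns_y):
                events.append(("cassette exon", exon))
        for intron in introns_y:
            # Retained intron: spliced out in one isoform, lying wholly inside an exon of the other.
            if any(contains(exon, intron) for exon in exons_x):
                events.append(("retained intron", intron))
    only_a = [i for i in introns_a if i not in introns_b]
    only_b = [i for i in introns_b if i not in introns_a]
    for ia in only_a:
        for ib in only_b:
            longer, other = (ia, iso_b) if ia[1] - ia[0] > ib[1] - ib[0] else (ib, iso_a)
            if any(contains(longer, exon) for exon in other):
                continue  # the longer intron skips a whole exon: a cassette event, not a site shift
            if ia[1] == ib[1] and ia[0] != ib[0]:
                # Same acceptor, different donor: the extra segment lies between the two donors.
                events.append(("alternative donor site", (min(ia[0], ib[0]), max(ia[0], ib[0]) - 1)))
            elif ia[0] == ib[0] and ia[1] != ib[1]:
                # Same donor, different acceptor: the extra segment lies between the two acceptors.
                events.append(("alternative acceptor site", (min(ia[1], ib[1]) + 1, max(ia[1], ib[1]))))
    return events
```

For example, `elementary_alternatives([(1, 100), (200, 300)], [(1, 150), (200, 300)])` reports an alternative donor site with the extra exonic segment `(101, 150)`. On the minus strand the roles of donor and acceptor coordinates swap.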

Human genes (table: conserved and non-conserved cassette exons, alternative donors, alternative acceptors and retained introns, from mRNA and from EST data)
Conserved elementary alternatives: 69% (EST) to 76% (mRNA)
Genes with all isoforms conserved: 57 (45%)

Mouse genes (table: conserved and non-conserved cassette exons, alternative donors, alternative acceptors and retained introns, from mRNA and from EST data)
Conserved elementary alternatives: 75% (EST) to 83% (mRNA)
Genes with all isoforms conserved: 79 (64%)

Real or aberrant non-conserved AS?
– 24-31% of human vs 17-25% of mouse elementary alternatives are not conserved
– 55% of human vs 36% of mouse genes have at least one non-conserved variant
– denser coverage of human genes by ESTs may:
  – pick up rare (tissue- and stage-specific), and therefore younger, variants
  – pick up aberrant (non-functional) variants
– 17-24% of mRNA-derived elementary alternatives are non-conserved, compared to 25-32% of EST-derived ones

Comparison to other studies: Modrek and Lee, 2003 (skipped exons)
– inclusion level is a good predictor of conservation:
  – 98% of constitutive exons are conserved
  – 98% of major-form exons are conserved
  – 28% of minor-form exons are conserved
– inclusion levels of conserved exons in human and mouse are highly correlated
Are non-conserved minor-form exons errors? No:
– minor-form exons are supported by multiple ESTs
– 28% of minor-form exons are upregulated in one specific tissue
– 70% of tissue-specific exons are not conserved
– splicing signals of conserved and non-conserved exons are similar
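Inclusion level, as used in the comparison above, can be sketched as the fraction of transcripts covering an exon's region that include the exon. The 0.5 cut-off separating major-form from minor-form exons is a hypothetical choice, not taken from Modrek and Lee, and the function names are illustrative.

```python
def inclusion_level(n_including: int, n_skipping: int) -> float:
    """Fraction of transcripts (e.g. ESTs) covering the exon region that include the exon."""
    total = n_including + n_skipping
    if total == 0:
        raise ValueError("no transcripts cover this exon")
    return n_including / total


def exon_class(n_including: int, n_skipping: int, major_threshold: float = 0.5) -> str:
    """Hypothetical three-way split: constitutive, major-form or minor-form exon."""
    level = inclusion_level(n_including, n_skipping)
    if n_skipping == 0:
        return "constitutive"  # no transcript skips the exon
    return "major form" if level >= major_threshold else "minor form"
```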

Fruit fly and mosquito
Technically more difficult than human-mouse:
– incomplete genomes
– difficulties in alignment, especially at gene termini
– changes in exon-intron structure irrespective of alternative splicing (~4.7 introns per gene in Drosophila vs ~3.5 introns per gene in Anopheles)

Methods (Dme: D. melanogaster; Dps: D. pseudoobscura; Aga: Anopheles gambiae)
– Pro-Frame: align Dme protein isoforms to Dps and Aga genes
– coding segments: regions in Dme genes between Dme intron shadows; we follow the fate of Dme exons and coding segments in the Dps and Aga genomes
– slices: regions between all exon-exon junctions (intron shadows) from all three genomes (Dme, Dps, Aga) mapped to Dme isoforms; a slice is conserved if it aligns with ≥35% identity
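The slice-based conservation test can be sketched as follows, assuming junction positions have already been projected onto 0-based coordinates of the Dme protein isoform and the aligned Dps or Aga residues are listed per Dme position (None where nothing aligns). Identity is computed here simply as matches over slice length; this scoring, and all names, are assumptions rather than the study's exact procedure.

```python
from typing import Dict, List, Optional, Tuple

Slice = Tuple[int, int]  # half-open [start, end) in 0-based Dme protein coordinates


def make_slices(length: int, junctions: Dict[str, List[int]]) -> List[Slice]:
    """Cut the Dme isoform at every exon-exon junction (intron shadow) from any genome."""
    cuts = {0, length}
    for positions in junctions.values():
        cuts.update(p for p in positions if 0 < p < length)
    ordered = sorted(cuts)
    return list(zip(ordered, ordered[1:]))


def slice_identity(reference: str, aligned: List[Optional[str]], piece: Slice) -> float:
    """Fraction of Dme residues in the slice matched by an identical aligned residue."""
    start, end = piece
    matches = sum(1 for i in range(start, end) if aligned[i] == reference[i])
    return matches / (end - start)


def conserved_slices(reference: str, aligned: List[Optional[str]],
                     junctions: Dict[str, List[int]], threshold: float = 0.35):
    """Pair each slice with a flag: is its identity at least the threshold?"""
    return [(piece, slice_identity(reference, aligned, piece) >= threshold)
            for piece in make_slices(len(reference), junctions)]
```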

Conservation of coding segments (constitutive segments / alternative segments):
– D. melanogaster vs D. pseudoobscura: 97% / 75-80%
– D. melanogaster vs Anopheles gambiae: 77% / ~45%

Conservation of D. melanogaster elementary alternatives in D. pseudoobscura genes (figure; blue: exact, green: divided exons, yellow: joined exons, orange: mixed, red: non-conserved)
– retained introns are the least conserved
– mutually exclusive exons are as conserved as constitutive exons

Conservation of D. melanogaster elementary alternatives in Anopheles gambiae genes (figure; blue: exact, green: divided exons, yellow: joined exons, orange: mixed, red: non-conserved)
– ~30% joined and ~10% divided exons (fewer introns in Aga)
– mutually exclusive exons are conserved exactly
– cassette exons are the least conserved
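The color categories (exact, divided, joined, mixed, non-conserved) can be approximated by looking at connected groups of overlapping exons between the two species. This sketch assumes overlap pairs between Dme exons and exons of the other genome are already known, and it ignores boundary precision, so its "exact" is looser than an exact match of exon boundaries; the function name and the categorization rule are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def classify_dme_exons(dme_exons: List[str], overlaps: Iterable[Tuple[str, str]]) -> Dict[str, str]:
    """Label each Dme exon by the shape of its group of overlapping exons in the other genome."""
    graph = defaultdict(set)
    for dme, target in overlaps:
        graph[("dme", dme)].add(("target", target))
        graph[("target", target)].add(("dme", dme))
    labels: Dict[str, str] = {}
    for exon in dme_exons:
        node = ("dme", exon)
        if exon in labels:
            continue  # already labelled as part of an earlier group
        if node not in graph:
            labels[exon] = "non-conserved"
            continue
        # Collect the connected group of exons (depth-first search).
        component, stack = set(), [node]
        while stack:
            current = stack.pop()
            if current not in component:
                component.add(current)
                stack.extend(graph[current] - component)
        n_dme = sum(1 for side, _ in component if side == "dme")
        n_target = len(component) - n_dme
        if n_dme == 1 and n_target == 1:
            label = "exact"    # one exon maps to one exon
        elif n_dme == 1:
            label = "divided"  # one Dme exon split into several exons
        elif n_target == 1:
            label = "joined"   # several Dme exons fused into one exon
        else:
            label = "mixed"
        for side, name in component:
            if side == "dme":
                labels[name] = label
    return labels
```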

CG1517: cassette exon in Drosophila, alternative acceptor site in Anopheles (figure: Dme, Dps and Aga gene structures)

CG31536: cassette exon in Drosophila, shorter cassette exon and alternative donor site in Anopheles (figure: Dme, Dps and Aga gene structures)

CG1587: alternative acceptor site in Drosophila, candidate retained intron in the intronless gene of Anopheles (figure: Dme, Aga and Dps gene structures)

Alternative splicing in a multigene family: the MAGE-A family of cancer/testis-specific antigens
– a locus on the X chromosome containing eleven recently duplicated genes: two subfamilies of four genes each, and three single genes
– retrogene: one protein-coding exon, multiple different 5'-UTR exons
– mutations create new splicing sites or disrupt existing sites

Birth of donor sites (new GT in alternative initial exon 5)

Birth of an acceptor site (new AG and polyY tract in MAGEA8-specific cassette exon 3)

Birth of an alternative donor site (enhanced match to the consensus (AG) in cassette exon 2)

Birth of an alternative acceptor site (enhanced polyY tract in cassette exon 4)
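The site-birth events above can be screened for with a crude sketch that compares two paralogous sequences, assuming they are aligned without gaps. A donor candidate is any GT dinucleotide; an acceptor candidate is an AG preceded by a pyrimidine-rich (polyY) window. The window size and pyrimidine threshold are hypothetical parameters, and real splice-site prediction uses position weight matrices or similar models.

```python
from typing import Callable, List

PYRIMIDINES = set("CT")


def donor_candidates(seq: str) -> List[int]:
    """0-based positions of GT dinucleotides (canonical intron start)."""
    return [i for i in range(len(seq) - 1) if seq[i:i + 2] == "GT"]


def acceptor_candidates(seq: str, window: int = 10, min_pyrimidines: int = 7) -> List[int]:
    """0-based positions of AG dinucleotides preceded by a pyrimidine-rich tract."""
    hits = []
    for i in range(window, len(seq) - 1):
        if seq[i:i + 2] == "AG":
            tract = seq[i - window:i]
            if sum(base in PYRIMIDINES for base in tract) >= min_pyrimidines:
                hits.append(i)
    return hits


def born_sites(old: str, new: str, finder: Callable[..., List[int]], **kwargs) -> List[int]:
    """Candidate sites present in the new paralog but absent at the same aligned position in the old one."""
    if len(old) != len(new):
        raise ValueError("sequences must be aligned without gaps")
    before = set(finder(old, **kwargs))
    return [i for i in finder(new, **kwargs) if i not in before]
```

With the acceptor finder, an AG that already existed but gains a stronger polyY tract is also reported, which corresponds to the "enhanced polyY tract" case.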

Concatenated constitutive and alternative regions of all genes: different evolutionary rates
– columns (left to right): (1) constitutive regions; (2-4) alternative regions: N-terminal, internal, C-terminal
– relatively more non-synonymous substitutions in alternative regions (higher dN/dS ratio)
– lower amino acid identity in alternative regions

Individual genes: the ratio of non-synonymous to synonymous substitution rates, dN/dS, tends to be larger in alternative regions (vertical axis) than in constitutive regions (horizontal axis)

Figure: dN/dS (con) − dN/dS (alt) for complete genes and for N-terminal, internal and C-terminal regions
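dN and dS for a pair of aligned coding sequences can be estimated with a Nei-Gojobori-style sketch: count synonymous and non-synonymous sites and differences (averaging over mutational pathways that avoid stop codons) and apply the Jukes-Cantor correction. The slides do not state which estimator was used, so this is a swapped-in textbook method, assuming gap-free, in-frame sequences and the standard genetic code; the function names are illustrative.

```python
import itertools
import math
from typing import Tuple

BASES = "TCAG"
# Standard genetic code with codons enumerated in TCAG order; '*' marks stop codons.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
        for i, a in enumerate(BASES) for j, b in enumerate(BASES) for k, c in enumerate(BASES)}


def synonymous_sites(codon: str) -> float:
    """Each of the nine single-base changes contributes 1/3 of a site if it is synonymous."""
    sites = 0.0
    for pos in range(3):
        for base in BASES:
            if base != codon[pos]:
                mutant = codon[:pos] + base + codon[pos + 1:]
                if CODE[mutant] == CODE[codon]:
                    sites += 1 / 3
    return sites


def codon_differences(c1: str, c2: str) -> Tuple[float, float]:
    """(synonymous, non-synonymous) differences averaged over stop-free mutational pathways."""
    positions = [i for i in range(3) if c1[i] != c2[i]]
    paths = []
    for order in itertools.permutations(positions):
        current, s, n, through_stop = c1, 0, 0, False
        for pos in order:
            step = current[:pos] + c2[pos] + current[pos + 1:]
            through_stop |= CODE[step] == "*"
            if CODE[step] == CODE[current]:
                s += 1
            else:
                n += 1
            current = step
        paths.append((s, n, through_stop))
    usable = [p for p in paths if not p[2]] or paths  # fall back to all pathways if every one hits a stop
    return sum(p[0] for p in usable) / len(usable), sum(p[1] for p in usable) / len(usable)


def jukes_cantor(p: float) -> float:
    if p >= 0.75:
        raise ValueError("too divergent for the Jukes-Cantor correction")
    return 0.0 if p == 0 else -0.75 * math.log(1 - 4 * p / 3)


def dn_ds(seq1: str, seq2: str) -> Tuple[float, float]:
    """Return (dN, dS) for two aligned, gap-free, in-frame coding sequences."""
    S = N = Sd = Nd = 0.0
    for i in range(0, min(len(seq1), len(seq2)) - 2, 3):
        c1, c2 = seq1[i:i + 3].upper(), seq2[i:i + 3].upper()
        if c1 not in CODE or c2 not in CODE or "*" in (CODE[c1], CODE[c2]):
            continue  # skip ambiguous bases and stop codons
        s = (synonymous_sites(c1) + synonymous_sites(c2)) / 2
        S += s
        N += 3 - s
        sd, nd = codon_differences(c1, c2)
        Sd += sd
        Nd += nd
    if S == 0 or N == 0:
        raise ValueError("no usable codons")
    return jukes_cantor(Nd / N), jukes_cantor(Sd / S)
```

Running `dn_ds` separately on concatenated constitutive and on concatenated alternative regions, then comparing dN/dS, mirrors the comparison described above.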

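Human SNPs: the ratio of non-synonymous to synonymous SNPs is higher in alternative regions than in constitutive regions, Na/Ns (alternative) > Na/Ns (constitutive), for all evidence levels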
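Na/Ns for SNPs can be sketched by classifying each coding SNP as amino-acid-changing or synonymous, assuming 0-based positions in an in-frame CDS and the standard genetic code; nonsense changes are counted as non-synonymous here. The function names and the half-open region convention are assumptions.

```python
from typing import List, Tuple

BASES = "TCAG"
# Standard genetic code with codons enumerated in TCAG order; '*' marks stop codons.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
        for i, a in enumerate(BASES) for j, b in enumerate(BASES) for k, c in enumerate(BASES)}


def is_nonsynonymous(cds: str, pos: int, alt: str) -> bool:
    """True if replacing cds[pos] by alt changes the encoded amino acid."""
    start, offset = pos - pos % 3, pos % 3
    codon = cds[start:start + 3]
    mutant = codon[:offset] + alt + codon[offset + 1:]
    return CODE[codon] != CODE[mutant]


def na_ns(cds: str, snps: List[Tuple[int, str]], region: Tuple[int, int]) -> Tuple[int, int]:
    """Counts (Na, Ns) of non-synonymous and synonymous SNPs inside a half-open CDS region."""
    na = ns = 0
    for pos, alt in snps:
        if region[0] <= pos < region[1]:
            if is_nonsynonymous(cds, pos, alt):
                na += 1
            else:
                ns += 1
    return na, ns
```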

Alternative splicing avoids disrupting domains (and non-domain units)
Control: keep the domain structure fixed and place alternative regions at random
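The randomization control can be sketched as a permutation-style test, assuming half-open protein coordinates and defining a region as disrupting a domain when it overlaps the domain without covering it entirely. Each trial keeps the domains fixed and re-places every alternative region, with its length unchanged, uniformly at random; the returned value is an empirical one-sided p-value for seeing as few disruptions as observed. The definitions and names are illustrative assumptions.

```python
import random
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) in protein coordinates


def disrupts(region: Interval, domain: Interval) -> bool:
    """A region disrupts a domain if it overlaps it without covering it completely."""
    overlaps = region[0] < domain[1] and domain[0] < region[1]
    covers = region[0] <= domain[0] and domain[1] <= region[1]
    return overlaps and not covers


def count_disrupting(regions: List[Interval], domains: List[Interval]) -> int:
    return sum(any(disrupts(r, d) for d in domains) for r in regions)


def placement_test(protein_length: int, regions: List[Interval], domains: List[Interval],
                   trials: int = 1000, seed: int = 0) -> Tuple[int, float]:
    rng = random.Random(seed)
    observed = count_disrupting(regions, domains)
    at_most_observed = 0
    for _ in range(trials):
        shuffled = []
        for start, end in regions:
            size = end - start
            new_start = rng.randint(0, protein_length - size)  # keep the region inside the protein
            shuffled.append((new_start, new_start + size))
        if count_disrupting(shuffled, domains) <= observed:
            at_most_observed += 1
    return observed, (at_most_observed + 1) / (trials + 1)
```

Counting a region that covers a whole domain as non-disrupting keeps this control compatible with domain shuffling.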

… and this is not simply a consequence of the (disputed) exon-domain correlation

Positive selection towards domain shuffling (not simply avoidance of disrupting domains)

Short (<50 aa) alternative splicing events within domains target protein functional sites (figure: expected vs observed numbers of events with PROSITE patterns affected or unaffected, and with FT positions affected or unaffected)
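Whether a short alternative region affects a PROSITE pattern can be checked by translating the pattern into a regular expression and testing for overlap with its matches. This sketch handles only basic PROSITE syntax (x, single residues, [..] alternatives, {..} exclusions, repeat counts, and < or > terminal anchors); the function names are hypothetical.

```python
import re
from typing import List, Tuple

ELEMENT = re.compile(r"(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern: str) -> str:
    """Translate a simple PROSITE pattern (e.g. 'N-{P}-[ST]-{P}.') into a Python regex."""
    regex = ""
    for element in pattern.rstrip(".").split("-"):
        prefix = "^" if element.startswith("<") else ""  # N-terminal anchor
        suffix = "$" if element.endswith(">") else ""    # C-terminal anchor
        match = ELEMENT.fullmatch(element.strip("<>"))
        if match is None:
            raise ValueError(f"unsupported element: {element}")
        core, low, high = match.groups()
        if core == "x":
            core = "."
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"
        if low:
            core += "{" + low + ("," + high if high else "") + "}"
        regex += prefix + core + suffix
    return regex


def pattern_hits(sequence: str, regex: str) -> List[Tuple[int, int]]:
    """All (possibly overlapping) matches as half-open intervals, via a zero-width lookahead."""
    return [(m.start(1), m.end(1)) for m in re.finditer(f"(?=({regex}))", sequence)]


def affects(region: Tuple[int, int], hits: List[Tuple[int, int]]) -> bool:
    """True if the half-open alternative region overlaps any pattern match."""
    return any(start < region[1] and region[0] < end for start, end in hits)
```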

An attempt at integration
– AS is often young (as opposed to degenerating)
– young AS isoforms are often minor and tissue-specific
  … but still functional
  – although unique isoforms may be the result of aberrant splicing
– AS often arises from duplication of exons
  … or point mutations creating splicing sites
  … or intron insertions
– AS regions show evidence of positive selection:
  – an excess of non-synonymous and damaging SNPs
  – an excess of non-synonymous codon substitutions
– AS tends to shuffle exons and target functional sites in proteins
Thus AS may serve as a testing ground for new functions without sacrificing old ones.

Acknowledgements
Discussions:
– Vsevolod Makeev (GosNIIGenetika)
– Eugene Koonin (NCBI)
– Igor Rogozin (NCBI)
– Dmitry Petrov (Stanford)
– Dmitry Frishman (GSF, TUM)
Data:
– King Jordan (NCBI)
Support:
– Ludwig Institute of Cancer Research
– Howard Hughes Medical Institute
– Russian Academy of Sciences (program “Molecular and Cellular Biology”)
– Russian Fund of Basic Research

Authors
– Andrei Mironov (Moscow State University): spliced alignment
– Ramil Nurtdinov (Moscow State University): human/mouse, data
– Irena Artamonova (GSF/MIPS): human/mouse, MAGE-A
– Dmitry Malko (GosNIIGenetika, Moscow): mosquito/Drosophila
– Ekaterina Ermakova (Moscow State University): evolution of alternative/constitutive regions
– Vasily Ramensky (Institute of Molecular Biology, Moscow): SNPs
– Shamil Sunyaev (EMBL, now Harvard University Medical School): protein structure
– Eugenia Kriventseva (EBI, now EMBL): protein structure