Alternative splicing: A playground of evolution. Mikhail Gelfand, Institute for Information Transmission Problems, RAS. May 2004.

Alternative splicing of human (and mouse) genes

– Alternative splicing of orthologous human and mouse genes
– Sequence divergence in alternative and constitutive regions
– Evolution of splicing sites
– Alternative splicing and protein structure

Data
– known alternative splicing:
  – HASDB (human; ESTs and mRNAs)
  – ASMamDB (mouse; mRNAs and genes)
– additional variants:
  – UniGene (human and mouse EST clusters)
– complete genes and genomic DNA:
  – GenBank (full-length mouse genes)
  – human genome

Methods
Direct comparison of EST-derived alternatives is difficult because of uneven EST coverage. Instead, alternative isoforms from one species are aligned to the genomic DNA of the other species. An isoform is considered conserved if it is alignable: the complete exon or part of the exon aligns with no significant loss of similarity, no in-frame stop codons, and conserved splicing sites. This is an upper estimate of conservation, since an isoform may be non-functional for other reasons (e.g. disruption of regulatory sites). Skipped exons cannot be analyzed this way.
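The three conservation criteria can be sketched as a single predicate over an isoform mapped to the other genome. This is a hypothetical illustration, not the Pro-EST/Pro-Frame implementation: the `MappedExon` record, the "at most 10 points below the gene's constitutive-exon identity" reading of "no significant loss of similarity", and the restriction to canonical GT-AG introns (GC-AG and AT-AC introns would be rejected) are all assumptions.

```python
from dataclasses import dataclass
from typing import List

STOP_CODONS = {"TAA", "TAG", "TGA"}


@dataclass
class MappedExon:
    """An exon (or exon part) of one species' isoform aligned to the other genome (hypothetical)."""
    identity: float  # percent identity of the aligned exon
    donor: str       # first two intron nucleotides after the exon ("" for the last exon)
    acceptor: str    # last two intron nucleotides before the exon ("" for the first exon)


def has_premature_stop(cds: str) -> bool:
    """True if an in-frame stop codon occurs before the final codon of the spliced CDS."""
    cds = cds.upper()
    codons = [cds[i:i + 3] for i in range(0, len(cds) - 2, 3)]
    return any(c in STOP_CODONS for c in codons[:-1])


def is_conserved(exons: List[MappedExon], cds: str,
                 baseline_identity: float, max_drop: float = 10.0) -> bool:
    """Apply the conservation criteria to one mapped isoform.

    baseline_identity: identity of the gene's constitutive exons (assumed input);
    max_drop: tolerated loss of identity, in percentage points (assumed threshold).
    """
    for ex in exons:
        # no significant loss of similarity
        if ex.identity < baseline_identity - max_drop:
            return False
        # conserved (canonical) splicing-site dinucleotides
        if ex.donor and ex.donor.upper() != "GT":
            return False
        if ex.acceptor and ex.acceptor.upper() != "AG":
            return False
    # no in-frame stops in the spliced product of the other species
    return not has_premature_stop(cds)
```

For example, two exons at 88% and 85% identity against a 90% baseline, joined by a GT...AG intron and translating as `ATGGCCTGA`, pass; inserting an in-frame `TAA` before the last codon makes the isoform non-conserved.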

Tools
– TBLASTN: initial identification of orthologs (mRNAs against genomic DNA)
– BLASTN: human mRNAs against the genome
– Pro-EST: spliced alignment of ESTs and mRNAs against genomic DNA
– Pro-Frame: spliced alignment of proteins against genomic DNA
  – confirmation of orthology: same exon-intron structure, >70% identity over the entire protein length
  – analysis of conservation of alternative splicing: conservation of exons or parts of exons, conservation of sites
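The orthology criterion (same exon-intron structure, >70% identity over the entire protein length) can be written as a small check. A sketch only: it assumes a precomputed pairwise protein alignment given as two equal-length gapped strings and exon-exon junction positions already projected onto alignment columns. The function names are hypothetical and this is not Pro-Frame's interface.

```python
from typing import Sequence


def percent_identity(aln_a: str, aln_b: str) -> float:
    """Identical residues divided by the longer ungapped protein length.

    Dividing by the full protein length (not the aligned part) approximates
    "identity over the entire protein length".
    """
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned strings must have equal length")
    matches = sum(1 for x, y in zip(aln_a, aln_b) if x == y and x != "-")
    full_len = max(len(aln_a.replace("-", "")), len(aln_b.replace("-", "")))
    return 100.0 * matches / full_len


def same_exon_intron_structure(junctions_a: Sequence[int],
                               junctions_b: Sequence[int],
                               tolerance: int = 0) -> bool:
    """Same number of exons and junctions at the same alignment columns (within tolerance)."""
    if len(junctions_a) != len(junctions_b):
        return False
    return all(abs(a - b) <= tolerance for a, b in zip(junctions_a, junctions_b))


def confirm_orthology(aln_a: str, aln_b: str,
                      junctions_a: Sequence[int], junctions_b: Sequence[int],
                      min_identity: float = 70.0) -> bool:
    return (same_exon_intron_structure(junctions_a, junctions_b)
            and percent_identity(aln_a, aln_b) > min_identity)
```

For the toy alignment `MKV-LA` / `MKVQLA`, five of six residues match (about 83% identity), so the pair is accepted when the single junction sits at the same column in both proteins and rejected otherwise.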

166 human/mouse gene pairs. Figure: known alternative splicing in human and in mouse.

Elementary alternatives
– cassette exon
– alternative donor site
– alternative acceptor site
– retained intron
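The four elementary alternatives can be told apart by comparing the introns implied by two isoforms. A minimal sketch, assuming plus-strand genes, 1-based inclusive exon coordinates, and hypothetical function names; it reports how `iso2` differs from `iso1`, so it should be called with the arguments swapped to see the reverse direction, and it does not handle mutually exclusive exons or other compound events.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # 1-based inclusive coordinates on the plus strand


def introns(exons: List[Interval]) -> List[Interval]:
    """Introns implied by a sorted list of exons."""
    return [(exons[i][1] + 1, exons[i + 1][0] - 1) for i in range(len(exons) - 1)]


def elementary_alternatives(iso1: List[Interval],
                            iso2: List[Interval]) -> List[Tuple[str, Interval]]:
    """Classify introns of iso2 absent from iso1, and introns of iso1 retained in iso2."""
    in1, in2 = set(introns(iso1)), set(introns(iso2))
    events = []
    for start, end in sorted(in2 - in1):
        same_donor = any(s == start for s, _ in in1)     # intron start = donor site
        same_acceptor = any(e == end for _, e in in1)    # intron end = acceptor site
        if same_donor and same_acceptor:
            # iso2 joins a donor and an acceptor of different iso1 introns: exon skipped
            events.append(("cassette exon", (start, end)))
        elif same_acceptor:
            events.append(("alternative donor site", (start, end)))
        elif same_donor:
            events.append(("alternative acceptor site", (start, end)))
    for start, end in sorted(in1 - in2):
        # an iso1 intron lying entirely inside an iso2 exon is retained
        if any(x0 <= start and end <= x1 for x0, x1 in iso2):
            events.append(("retained intron", (start, end)))
    return events
```

For instance, `elementary_alternatives([(100, 200), (300, 400), (500, 600)], [(100, 200), (500, 600)])` reports a cassette exon via the skipping intron (201, 499), and `elementary_alternatives([(100, 200), (300, 400)], [(100, 400)])` reports the retained intron (201, 299).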

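Human genes
Table: conserved (cons.) and non-conserved (non-cons.) elementary alternatives derived from mRNAs and from ESTs, by type (cassette exons, alternative donors, alternative acceptors, retained introns), with totals and total genes.
Conserved elementary alternatives: 69% (EST-derived) to 76% (mRNA-derived).
Genes with all isoforms conserved: 57 (45%).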

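Mouse genes
Table: conserved (cons.) and non-conserved (non-cons.) elementary alternatives derived from mRNAs and from ESTs, by type (cassette exons, alternative donors, alternative acceptors, retained introns), with totals and total genes.
Conserved elementary alternatives: 75% (EST-derived) to 83% (mRNA-derived).
Genes with all isoforms conserved: 79 (64%).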

Real or aberrant non-conserved AS?
– 24-31% of human vs. 17-25% of mouse elementary alternatives are not conserved
– 55% of human vs. 36% of mouse genes have at least one non-conserved variant
– denser coverage of human genes by ESTs:
  – picks up rare (tissue- and stage-specific), and hence younger, variants
  – picks up aberrant (non-functional) variants
– 17-24% of mRNA-derived elementary alternatives are non-conserved (compared to 25-32% of EST-derived ones)

Smoothelin. Figure panels: human, common, mouse; human-specific alternative donor site, mouse-specific cassette exon.

Autoimmune regulator. Figure panels: human, common, mouse; retained intron; downstream exons are read in two frames.

Na/K-ATPase gamma subunit (Fxyd2). Figure panels: human, mouse, common; (deleted) intron; alternative acceptor site within an (inserted) intron.

MutS homolog (DNA mismatch repair). Figure panels: human, common; dual donor/acceptor site.

Modrek and Lee, 2003:
– exon conservation:
  – constitutive exons: 98% conserved
  – major-form exons: 98%
  – minor-form exons: 28%
– inclusion level is highly correlated with conservation and is a good predictor of it
– minor-form non-conserved exons are not aberrant:
  – minor-form exons are supported by multiple ESTs
  – 28% of minor-form exons are upregulated in one specific tissue
  – 70% of tissue-specific exons are not conserved
Thanaraj et al., 2003: 61% (47-86%) of alternative splice junctions are conserved.
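The link between inclusion level and conservation can be tabulated from EST counts. The sketch below defines inclusion level as the fraction of covering ESTs that include an exon and splits exons into major and minor forms at a 0.5 cutoff; the cutoff, the input tuple format and the function names are assumptions, not necessarily the definitions used by Modrek and Lee.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def inclusion_level(inclusion_ests: int, skipping_ests: int) -> float:
    """Fraction of covering ESTs that include the exon."""
    total = inclusion_ests + skipping_ests
    if total == 0:
        raise ValueError("exon is not covered by any EST")
    return inclusion_ests / total


def conservation_by_form(exons: Iterable[Tuple[int, int, bool]],
                         cutoff: float = 0.5) -> Dict[str, float]:
    """Fraction of conserved exons among major-form and minor-form exons.

    Each item is (inclusion ESTs, skipping ESTs, conserved in the other genome).
    """
    counts = defaultdict(lambda: [0, 0])  # form -> [conserved, total]
    for inc, skip, conserved in exons:
        form = "major" if inclusion_level(inc, skip) > cutoff else "minor"
        counts[form][0] += int(conserved)
        counts[form][1] += 1
    return {form: c / n for form, (c, n) in counts.items()}
```

With toy input `[(9, 1, True), (8, 2, True), (1, 9, False), (2, 8, True)]`, both major-form exons are conserved (1.0) and one of two minor-form exons is (0.5); real data would replace these invented counts.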

Sequence divergence in alternative and constitutive regions

Our preliminary observations (human/mouse): less synonymous and more non-synonymous divergence in alternative exons, suggesting positive selection towards variability.
Iida and Akashi, 2000: “Contrary to our prediction, synonymous divergence between humans and non-human mammals was significantly higher in constitutive exons … Intriguingly, non-synonymous divergence was marginally significantly higher in alternative exons”
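Synonymous and non-synonymous divergence can be compared by classifying codon differences between aligned human and mouse exons. The sketch below counts only codons that differ at exactly one position and uses the standard genetic code; it is a simplification, not the method used in the talk or by Iida and Akashi. Proper rate estimates normalise by the numbers of synonymous and non-synonymous sites (e.g. the Nei-Gojobori method) and handle multiple-hit codons. Inputs are assumed to be gap-free, in-frame and limited to A, C, G, T.

```python
from typing import Tuple

# Standard genetic code, codons enumerated in TCAG order (NCBI translation table 1)
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}


def count_differences(seq1: str, seq2: str) -> Tuple[int, int]:
    """(synonymous, non-synonymous) single-position codon differences between two CDSs."""
    syn = nonsyn = 0
    for i in range(0, min(len(seq1), len(seq2)) - 2, 3):
        c1, c2 = seq1[i:i + 3].upper(), seq2[i:i + 3].upper()
        if sum(x != y for x, y in zip(c1, c2)) != 1:
            continue  # identical codons and multiple-hit codons are skipped
        if CODON_TABLE[c1] == CODON_TABLE[c2]:
            syn += 1
        else:
            nonsyn += 1
    return syn, nonsyn
```

For example, `count_differences("CTGAAA", "CTAAGA")` returns `(1, 1)`: CTG to CTA keeps leucine, while AAA to AGA changes lysine to arginine. Running it separately on alternative and constitutive exons gives the two sets of counts to compare.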

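279 proteins from SwissProt+TrEMBL with “varsplic” features. SNP counts in constitutive and alternative regions (in parentheses: percentage of all SNPs in that region type):

SNP class     constitutive   alternative   % alternative of all
synonymous    576 (51%)      167 (45%)     22%
benign        401 (36%)      141 (38%)     26%
damaging      149 (13%)       60 (16%)     29%

Again, there is some evidence of positive selection towards diversity: the share of SNPs falling in alternative regions rises from synonymous to benign to damaging SNPs. This is not due to aberrant ESTs, since only protein data are considered.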
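The percentages in the SNP table follow directly from the raw counts, which makes a useful consistency check. The sketch recomputes the share of each SNP class within constitutive and within alternative regions, and the share of each class that falls in alternative regions; the dictionary layout and function name are illustrative. A formal test of the trend (e.g. a chi-square test on the 3x2 table) is not attempted here.

```python
from typing import Dict, Tuple

# SNP counts per class: (constitutive regions, alternative regions), from the table
COUNTS: Dict[str, Tuple[int, int]] = {
    "synonymous": (576, 167),
    "benign": (401, 141),
    "damaging": (149, 60),
}


def snp_table(counts: Dict[str, Tuple[int, int]]) -> Dict[str, Tuple[int, int, int]]:
    """Per class: % of constitutive SNPs, % of alternative SNPs, % of the class in alternative regions."""
    total_const = sum(c for c, _ in counts.values())  # 1126 SNPs in constitutive regions
    total_alt = sum(a for _, a in counts.values())    # 368 SNPs in alternative regions
    return {
        cls: (round(100 * c / total_const),
              round(100 * a / total_alt),
              round(100 * a / (c + a)))
        for cls, (c, a) in counts.items()
    }


for cls, row in snp_table(COUNTS).items():
    print(cls, row)
# synonymous (51, 45, 22)
# benign (36, 38, 26)
# damaging (13, 16, 29)
```

The recomputed values match the table, including the rise from 22% to 29% in the last column.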

Evolution of splicing sites

Alternative splicing in a multigene family: the MAGEA family of cancer/testis-specific antigens
– a locus on the X chromosome containing eleven recently duplicated genes: two subfamilies of four genes each and three single genes
– one protein-coding exon, multiple different 5'-UTR exons
– originates from a retroposed spliced mRNA
– mutations create new splicing sites or disrupt existing ones

Phylogenetic trees (protein-coding and upstream regions)

Expression data are pooled by organ/tissue, and the maximum recorded expression level is retained. No data for MAGEA10; MAGEA3 and MAGEA6 are likely indistinguishable. Green: normal tissue; brown: cancer.

Simple genes with alternatives in exon 1 (MAGEA1, MAGEA5, MAGEA3/6). Figure panels: MAGEA1 (exons 1 and 1b); MAGEA5 (normal placenta); MAGEA3 and MAGEA6 (testis, brain/medulla, cancer), with initial exons 1 and 1a.

Two more genes of subfamily B: multiple isoforms of MAGEA2 and a deletion in MAGEA12 (figure).

Isoforms of subfamily A (figure panels: MAGEA8; MAGEA9 (testis, no cancers); MAGEA10; MAGEA11).

Multiple duplications of the initial exon in MAGEA4 (testis and cancers; brain/medulla; also common 3' ESTs in placenta).

Chimaeric mRNAs (splicing of readthrough transcripts). Figure: (1) initial exon of MAGEA10, exons of MAGEA5, exon in intergenic space; (2) initial exon of MAGEA12, exons of BC, exon in intergenic space.

Other examples:
– galactose-1-phosphate uridylyltransferase + interleukin-11 receptor alpha chain (Magrangeas et al., 1998)
– P2Y11 [receptor] + SSF1 [nuclear protein] (Communi et al., 2001)
– PrP [prion protein] + Dpl [prion-like protein Doppel] (Moore et al., 1999)
– cytochrome P450 3A: CYP3A7 + two exons of a downstream pseudogene read in a different frame (Finta & Zaphiropoulos, 2000)
– HHLA1 + OC90 [otoconin-90] (Kowalski et al., 1999)
– TRAX [translin-associated factor X] + DISC1 [candidate schizophrenia gene] (Millar et al., 2000)
– Kua + UEV1 [polyubiquitination co-effector] (Thomson et al., 2000)
– FR + GAP [Rho GTPase activating protein] (Romani et al., 2003) (?)
– methionyl-tRNA synthetase + advillin (Romani et al., 2003) (?)

Birth of donor sites (new GT in alternative initial exon 5)
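Gains and losses of splice-site dinucleotides can be located by comparing an ancestral sequence with a derived one. A minimal sketch, assuming the two sequences are already aligned without gaps (the ancestral one might be a paralog or a reconstructed ancestor); function names are hypothetical. A new GT is necessary but far from sufficient for a functional donor site.

```python
from typing import List, Set, Tuple


def motif_positions(seq: str, motif: str) -> Set[int]:
    """0-based start positions of a dinucleotide motif."""
    seq = seq.upper()
    return {i for i in range(len(seq) - 1) if seq[i:i + 2] == motif}


def site_changes(ancestral: str, derived: str,
                 motif: str = "GT") -> Tuple[List[int], List[int]]:
    """(gained, lost) motif positions between two gap-free, equal-length aligned sequences."""
    if len(ancestral) != len(derived):
        raise ValueError("sequences must be aligned without gaps")
    before = motif_positions(ancestral, motif)
    after = motif_positions(derived, motif)
    return sorted(after - before), sorted(before - after)
```

`site_changes("AAGATA", "AAGGTA")` reports a GT gained at position 3; the "lost" list covers inactivation of an existing site, and `motif="AG"` gives the acceptor-side analogue.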

Birth of an acceptor site (new AG and polyY tract in MAGEA8-specific cassette exon 3)

Birth of an alternative donor site (enhanced match to the consensus (AG) in cassette exon 2)
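An "enhanced match to the consensus" can be quantified with a simple match count against the donor consensus MAG|GTRAGT (three exonic and six intronic positions, IUPAC M = A/C, R = A/G). This is a deliberately crude sketch with a hypothetical function name; real analyses would use a position weight matrix rather than counting matches.

```python
# IUPAC codes needed for the donor consensus
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "M": "AC", "R": "AG"}
DONOR_CONSENSUS = "MAGGTRAGT"  # exon/intron boundary lies between positions 3 and 4


def donor_score(site: str) -> int:
    """Number of positions of a 9-nt site matching the MAG|GTRAGT consensus."""
    if len(site) != len(DONOR_CONSENSUS):
        raise ValueError("expected a 9-nt site (3 exonic + 6 intronic)")
    return sum(base in IUPAC[c] for base, c in zip(site.upper(), DONOR_CONSENSUS))
```

A T-to-A substitution in the exonic part, turning `CTGGTAAGT` (8 matches) into `CAGGTAAGT` (9 matches), is the kind of change that strengthens an existing weak site.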

Birth of an alternative acceptor site (enhanced polyY tract in cassette exon 4)
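Acceptor-site birth depends on both an AG and an upstream pyrimidine (polyY) tract. The sketch flags AG dinucleotides whose preceding 15 nt are at least 70% C/T; both thresholds and the function names are assumptions chosen for illustration, not values from the analysis.

```python
from typing import List


def polypyrimidine_fraction(seq: str, ag_pos: int, window: int = 15) -> float:
    """Fraction of C/T in the window immediately upstream of the AG starting at ag_pos."""
    region = seq[max(0, ag_pos - window):ag_pos].upper()
    if not region:
        return 0.0
    return sum(b in "CT" for b in region) / len(region)


def candidate_acceptors(seq: str, window: int = 15, min_py: float = 0.7) -> List[int]:
    """Start positions of AG dinucleotides preceded by a pyrimidine-rich tract."""
    seq = seq.upper()
    return [i for i in range(window, len(seq) - 1)
            if seq[i:i + 2] == "AG" and polypyrimidine_fraction(seq, i, window) >= min_py]
```

With these thresholds, `AAAAATTCTTTCCTTAGGA` has no candidate (10 of 15 upstream bases are pyrimidines), while the single substitution giving `AAAATTTCTTTCCTTAGGA` creates one at position 15, mimicking an enhanced polyY tract.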

Inactivation of a donor site and birth of a new site (non-consensus G and new GT in major-isoform cassette exon 4)

Series of mutations sequentially activating downstream acceptor sites (mutated AG in exon 4)

Alternative splicing and protein structure

Data
– alternatively spliced genes (proteins) from SwissProt:
  – human
  – mouse
– protein structures from PDB
– domains from InterPro:
  – SMART
  – Pfam
  – Prosite
  – etc.

Alternative splicing avoids disrupting domains (and non-domain units). Control: the domain structure is kept fixed and alternative regions are placed randomly.
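The control is a randomization test: keep the domains where they are and move an alternative region of the same length to random positions. A minimal sketch, assuming 0-based half-open protein coordinates, uniform placement, and "disruption" defined as partial overlap with a domain; the function names are hypothetical.

```python
import random
from typing import List, Tuple

Interval = Tuple[int, int]  # 0-based half-open [start, end) in protein coordinates


def relation(region: Interval, domain: Interval) -> str:
    r_start, r_end = region
    d_start, d_end = domain
    if r_end <= d_start or d_end <= r_start:
        return "outside"
    if r_start <= d_start and d_end <= r_end:
        return "whole domain"  # the domain is included or removed intact
    return "disrupted"         # partial overlap cuts the domain


def control_disruption_rate(protein_len: int, domains: List[Interval], region_len: int,
                            trials: int = 10_000, seed: int = 0) -> float:
    """Fraction of randomly placed regions of the given length that disrupt some domain."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        start = rng.randint(0, protein_len - region_len)  # inclusive upper bound
        region = (start, start + region_len)
        if any(relation(region, d) == "disrupted" for d in domains):
            hits += 1
    return hits / trials
```

An observed disruption fraction for real alternative regions that falls below this control rate indicates avoidance; counting the "whole domain" outcome instead addresses domain shuffling.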

… and this is not simply a consequence of the (disputed) exon-domain correlation

Positive selection towards domain shuffling (not simply avoidance of disrupting domains)

Short (<50 aa) alternative splicing events within domains target protein functional sites. Figure (panel c): expected vs. observed numbers of Prosite patterns and SwissProt feature-table (FT) positions affected or unaffected by alternative splicing.
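Whether an alternative region affects a Prosite pattern can be checked by matching the pattern and testing for overlap with the region. The sketch translates a subset of PROSITE pattern syntax (x, [..], {..}, (n) and (n,m) repeats, < and > anchors) into a Python regular expression; it is not a complete PROSITE parser, and the overlap definition and function names are assumptions. The example pattern is the PROSITE N-glycosylation site PS00001, N-{P}-[ST]-{P}.

```python
import re
from typing import List, Tuple

_ELEMENT = re.compile(r"(<?)(x|\[[A-Z]+\]|\{[A-Z]+\}|[A-Z])(?:\((\d+)(?:,(\d+))?\))?(>?)")


def prosite_to_regex(pattern: str) -> str:
    parts = []
    for element in pattern.rstrip(".").split("-"):
        m = _ELEMENT.fullmatch(element)
        if m is None:
            raise ValueError("unsupported PROSITE element: " + element)
        n_term, core, lo, hi, c_term = m.groups()
        if core == "x":
            core = "."                      # any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"  # any residue except those listed
        rx = ("^" if n_term else "") + core
        if lo:
            rx += "{" + lo + ("," + hi if hi else "") + "}"  # repeat (n) or (n,m)
        if c_term:
            rx += "$"
        parts.append(rx)
    return "".join(parts)


def sites_in_region(pattern: str, protein: str, region: Tuple[int, int]) -> List[int]:
    """Start positions of (possibly overlapping) matches that overlap a [start, end) region."""
    rx = re.compile(prosite_to_regex(pattern))
    hits = []
    for i in range(len(protein)):
        m = rx.match(protein, i)
        if m and m.start() < region[1] and region[0] < m.end():
            hits.append(i)
    return hits
```

`prosite_to_regex("N-{P}-[ST]-{P}.")` gives `N[^P][ST][^P]`. Counting affected patterns for real alternative regions and for randomly placed ones yields the observed and expected numbers.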

An attempt at integration
– AS is often young (as opposed to degenerating)
– young AS isoforms are often minor and tissue-specific
  – … but still functional
  – although unique isoforms may be the result of aberrant splicing
– AS regions show evidence of positive selection:
  – an excess of damaging SNPs
  – an excess of non-synonymous codon substitutions
– MAGEA isoforms are not aberrant, because they can be explained by the effects of specific mutations

What to do
Each isoform (alternative region) can be characterized:
– by conservation (between genomes)
– if conserved, by selection (positive vs. negative): human-mouse, also add rat
– by the pattern of SNPs (synonymous, benign, damaging)
– by tissue specificity, in particular whether it is cancer-specific
– by degree of inclusion (major/minor)
– by functionality (for isoforms):
  – whether it generates a frameshift
  – how bad it is (the distance between the stop codon and the last exon-exon junction)
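The two functionality checks can be sketched directly. Frameshift: an alternative region whose length is not a multiple of 3 shifts the downstream frame. Severity: the distance from the first in-frame stop codon to the last exon-exon junction, compared here against the commonly cited rule that stops lying more than about 50-55 nt upstream of the last junction trigger nonsense-mediated decay. The 50-nt threshold, measuring from the end of the stop codon, and all function names are assumptions for illustration.

```python
from typing import List

STOPS = {"TAA", "TAG", "TGA"}


def causes_frameshift(alt_region_len: int) -> bool:
    return alt_region_len % 3 != 0


def first_stop_end(mrna: str, cds_start: int) -> int:
    """0-based position just after the first in-frame stop codon, or -1 if none."""
    mrna = mrna.upper()
    for i in range(cds_start, len(mrna) - 2, 3):
        if mrna[i:i + 3] in STOPS:
            return i + 3
    return -1


def stop_to_last_junction(mrna: str, exon_lengths: List[int], cds_start: int) -> int:
    """Nucleotides from the stop codon to the last exon-exon junction (positive = upstream)."""
    stop_end = first_stop_end(mrna, cds_start)
    if stop_end < 0:
        raise ValueError("no in-frame stop codon")
    last_junction = sum(exon_lengths[:-1])  # transcript coordinate of the last junction
    return last_junction - stop_end


def likely_nmd_target(mrna: str, exon_lengths: List[int], cds_start: int,
                      threshold: int = 50) -> bool:
    return stop_to_last_junction(mrna, exon_lengths, cds_start) > threshold
```

For a toy transcript `"ATGTAA" + "C" * 100` split into exons of 60 and 46 nt, the stop ends at position 6, 54 nt before the junction, so it is flagged as a likely NMD target.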

What to expect (hypotheses)
– Cancer-specific isoforms will be less functional and more often non-conserved.
– Non-conserved isoforms will contain a larger fraction of non-functional isoforms, and this may influence evolutionary conclusions.
– Still, after removal of non-functional isoforms, one should see positive selection in alternative regions (more non-synonymous substitutions compared to constitutive regions, etc.), especially in tissue-specific ones.

Plans
– careful and detailed analysis of human-mouse-(rat)-((dog)) AS isoforms (human and mouse ESTs)
– conservation of AS regulatory sites
– mosquito-drosophila comparison
– more families of paralogs; add mouse data
– AS of transcription factors and receptors

Acknowledgements
Discussions:
– Vsevolod Makeev (GosNIIGenetika)
– Eugene Koonin (NCBI)
– Igor Rogozin (NCBI)
– Dmitry Petrov (Stanford)
Support:
– Ludwig Institute for Cancer Research
– Howard Hughes Medical Institute

Authors
– Andrei Mironov (GosNIIGenetika): spliced alignment
– Shamil Sunyaev (EMBL, now Harvard University Medical School): protein structure
– Vasily Ramensky (Institute of Molecular Biology): SNPs
– Irena Artamonova (Institute of Bioorganic Chemistry): human/mouse comparison, MAGEA family
– Dmitry Malko (GosNIIGenetika): mosquito/drosophila comparison
– Eugenia Kriventseva (EBI, now BASF): protein structure
– Ramil Nurtdinov (Moscow State University): human/mouse comparison
– Ekaterina Ermakova (Moscow State University): evolution of alternative/constitutive regions

References
– Nurtdinov RN, Artamonova II, Mironov AA, Gelfand MS (2003) Low conservation of alternative splicing patterns in the human and mouse genomes. Human Molecular Genetics 12:
– Kriventseva EV, Koch I, Apweiler R, Vingron M, Bork P, Gelfand MS, Sunyaev S (2003) Increase of functional diversity by alternative splicing. Trends in Genetics 19:
– Brudno M, Gelfand MS, Spengler S, Zorn M, Dubchak I, Conboy JG (2001) Computational analysis of candidate intron regulatory elements for tissue-specific alternative pre-mRNA splicing. Nucleic Acids Research 29:
– Dralyuk I, Brudno M, Gelfand MS, Zorn M, Dubchak I (2000) ASDB: database of alternatively spliced genes. Nucleic Acids Research 28:
– Mironov AA, Fickett JW, Gelfand MS (1999) Frequent alternative splicing of human genes. Genome Research 9: