Alternative splicing: A playground of evolution Mikhail Gelfand Research and Training Center for Bioinformatics Institute for Information Transmission.

Presentation transcript:

Alternative splicing: A playground of evolution Mikhail Gelfand Research and Training Center for Bioinformatics Institute for Information Transmission Problems RAS, Moscow, Russia RECOMB, 20 May 2008

% of alternatively spliced human and mouse genes, by year of publication
[Plot. Series: human (genome / random sample); human (individual chromosomes); mouse (genome / random sample). Gene sets: all genes; only multiexon genes; genes with high EST coverage. Latest point: 2008, C. Burge, 100%]

Roles of alternative splicing
Functional:
  – creating protein diversity
      human: ~ genes, > proteins
  – maintaining protein identity
      e.g. membrane (receptor) and secreted isoforms
      dominant negative isoforms
      combinatorial (transcription factors, signaling domains)
  – regulatory
      e.g. via channelling to NMD (nonsense-mediated decay)
Evolutionary

Plan
Evolution of alternative exon-intron structure
Origin of new (alternative) exons and sites
Evolutionary rates in constitutive and alternative regions

Elementary alternatives
Cassette exon
Alternative donor site
Alternative acceptor site
Retained intron
Mutually exclusive exons

Sources of data
ESTs: 1999 global, comparative
  – mapping exon-intron structure to genome
  – global alignment of genomes
  – identifying non-conserved exons and splice sites
Oligonucleotide arrays (chips): 2001 global, 2004 comparative
  – qualitative analysis (inclusion values)
  – genome-specific constitutive / alternative exons
mRNA-seq (new generation high-throughput): 2008 global, expected comparative

Alternative exons are often genome-specific (Modrek & Lee, 2003)

~25% of AS events in ~50% of genes are not conserved
Examples shown: Na/K-ATPase Fxyd2/FXYD2, p53
Nurtdinov…Gelfand, 2003

Alternative exon-intron structure in fruit flies and malarial mosquito
Same procedure (AS data from FlyBase)
  – cassette exons, splicing sites
  – also mutually exclusive exons, retained introns
Follow the fate of D. melanogaster exons in the D. pseudoobscura and Anopheles genomes
Technically more challenging:
  – incomplete genomes
  – the quality of alignment with the Anopheles genome is lower, especially for terminal exons
  – frequent intron insertion/loss (~4.7 introns per gene in Drosophila vs. ~3.5 introns per gene in Anopheles)
Malko…Gelfand, 2006

Conservation of coding segments
                                       constitutive segments   alternative segments
D. melanogaster – D. pseudoobscura     97%                     75-80%
D. melanogaster – Anopheles gambiae    77%                     ~45%

Conservation of D. melanogaster elementary alternatives in D. pseudoobscura genes
Legend: blue – exact; green – divided exons; yellow – joined exons; orange – mixed; red – non-conserved
Retained introns are the least conserved (are all of them really functional?)
Mutually exclusive exons are as conserved as constitutive exons

Conservation of D. melanogaster elementary alternatives in Anopheles gambiae genes
Legend: blue – exact; green – divided exons; yellow – joined exons; orange – mixed; red – non-conserved
~30% joined, ~10% divided exons (fewer introns in Aga)
Mutually exclusive exons are conserved exactly
Cassette exons are the least conserved

Genome-specific AS: real or noise? young or deteriorating?
Minor isoforms, small inclusion rate
Often frameshifting and/or stop-containing => NMD
  – regulatory role?
Sorek, Shamir & Ast, 2004
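The "frameshifting and/or stop-containing" criterion can be illustrated with a minimal sketch. It assumes the exon sequence and its reading phase are already known; the function name `classify_cassette_exon` and the `phase` parameter are hypothetical, and real NMD prediction also depends on where the premature stop lies relative to downstream exon junctions, which this sketch ignores.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def classify_cassette_exon(exon_seq, phase=0):
    """Classify a cassette exon by its expected effect on the reading frame.

    phase (assumption): number of leading exon bases (0, 1 or 2) that
    complete a codon started in the upstream exon.
    """
    seq = exon_seq.upper()
    # Scan complete in-frame codons inside the exon for a stop codon.
    for i in range(phase, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            return "stop-containing"
    # An exon whose length is not a multiple of 3 shifts the downstream frame.
    if len(seq) % 3 != 0:
        return "frameshifting"
    return "frame-preserving"

print(classify_cassette_exon("ATGGCCAAA"))   # length 9, no in-frame stop
print(classify_cassette_exon("ATGGCCAA"))    # length 8
print(classify_cassette_exon("ATGTAAGCC"))   # in-frame TAA
```

Exons labelled "frameshifting" or "stop-containing" are the NMD candidates in the sense of the slide; "frame-preserving" exons are candidates for translated isoforms.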

Alternative exon-intron structure in the human, mouse and dog genomes
Human-mouse-dog triples of orthologous genes
We follow the fate of human alternative sites and exons in the mouse and dog genomes
Each human AS isoform is splice-aligned to the mouse and dog genomes
Definition of conservation:
  – conservation of the corresponding region (a homologous exon is actually present in the considered genome);
  – conservation of splicing sites (GT and AG)
Nurtdinov…Gelfand, 2007
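The second part of the conservation definition (canonical GT and AG dinucleotides at the aligned exon boundaries) can be sketched as follows. This is only an illustration under stated assumptions: the exon coordinates in the target genome are taken as given by a spliced alignment, the exon is internal and on the forward strand, and `exon_sites_conserved` is a hypothetical helper, not the authors' code.

```python
def exon_sites_conserved(genome, exon_start, exon_end):
    """Check canonical splice sites flanking an internal exon.

    genome: forward-strand sequence of the target genome region.
    exon_start, exon_end: 0-based, end-exclusive exon coordinates
    (assumed to come from a spliced alignment of the human isoform).
    """
    if exon_start < 2 or exon_end + 2 > len(genome):
        return False  # not enough flanking sequence to inspect
    acceptor = genome[exon_start - 2:exon_start].upper()  # last 2 nt of upstream intron
    donor = genome[exon_end:exon_end + 2].upper()          # first 2 nt of downstream intron
    return acceptor == "AG" and donor == "GT"

region = "ccccAGATGGCCGTaaaa"   # toy region: exon ATGGCC at positions 6..12
print(exon_sites_conserved(region, 6, 12))
```

A region that aligns well but has lost either dinucleotide would count as non-conserved under this definition.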

Caveats
We consider only the possibility of AS in mouse and dog: we do not require that the corresponding isoforms actually exist in known transcriptomes
We do not account for situations in which an alternative human exon (or site) is constitutive in mouse or dog
Functionality assignments (translated / NMD-inducing) are not very reliable

Gains/losses: loss in mouse Common ancestor

Gains/losses: gain in human (or noise) Common ancestor

Gains/losses: loss in dog (or possible gain in human+mouse) Common ancestor

Triple comparison
[Diagram labels: human-specific alternatives (noise?); conserved alternatives; lost in dog; lost in mouse]

Translated and NMD-inducing cassette exons
Mainly included exons are highly conserved irrespective of function
Mainly skipped translated exons are more conserved than NMD-inducing ones
Numerous lineage-specific losses
  – more in mouse than in dog
  – more of NMD-inducing than of translated exons
~40% of almost always skipped (<1% inclusion) human exons are conserved in at least one lineage (mouse or dog)

Mouse+rat vs human and dog: a possibility to distinguish between exon gain and noise Nurtdinov…Gelfand, 2009

The rate of exon gain:
  – decreases with the exon inclusion rate;
  – increases with the sequence evolutionary rate
Caveat: spurious exons may still seem to be conserved in the rodent lineage because of the short divergence time

Conserved rodent-specific exons and pseudoexons
Estimation of “FDR” by analysis of conservation of pseudoexons:
  – take intronic fragments (pseudoexons) with the same characteristics (length distribution etc.)
  – apply standard rules to estimate “conservation”
  – obtain the number (fraction) of rodent-specific exons that could be pseudoexons conserved by chance (brown)
  – obtain the number (fraction) of real rodent-specific exons (dark green): ~50%, that is, ~15% of mouse-specific exons (the rest is likely noise)
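The pseudoexon-based “FDR” estimate can be sketched as simple arithmetic. The function name and all counts below are hypothetical, chosen only so that the result reproduces the ~50% and ~15% fractions quoted on the slide; the sketch also assumes, conservatively, that every candidate exon could be noise.

```python
def real_exon_fractions(n_exons, n_exons_conserved, n_pseudo, n_pseudo_conserved):
    """Estimate how many conserved lineage-specific exons are likely real.

    n_exons: candidate mouse-specific exons
    n_exons_conserved: those that also look conserved in rat
    n_pseudo, n_pseudo_conserved: same counts for matched intronic pseudoexons
    Returns (fraction real among conserved exons, fraction real among all exons).
    """
    chance_rate = n_pseudo_conserved / n_pseudo           # conservation by chance
    expected_false = min(chance_rate * n_exons, n_exons_conserved)
    real = n_exons_conserved - expected_false
    return real / n_exons_conserved, real / n_exons

# Hypothetical counts, for illustration only.
among_conserved, among_all = real_exon_fractions(1000, 300, 2000, 300)
print(round(among_conserved, 2), round(among_all, 2))
```

Matching the pseudoexons to real exons on length and similar characteristics, as the slide describes, is what makes the chance rate a usable background.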

Alternative donor and acceptor sites: same trends
Higher conservation of ~uniformly used sites
Internal sites are more conserved than external ones (as expected)

Evolution of (alternative) exon-intron structure in 11 Drosophila spp.
Tree labels: Dmel – D. melanogaster; Dsec – D. sechellia; Dyak – D. yakuba; Dere – D. erecta; Dana – D. ananassae; Dpse – D. pseudoobscura; Dmoj – D. mojavensis; Dvir – D. virilis; Dgri – D. grimshawi
Also: D. willistoni, D. persimilis
Tree credit: D. Pollard

Gain and loss of alternative segments and constitutive exons
Unique events per 1000 substitutions
Sample size 397 /
[Tree figure: gain and loss rates on the branches leading to Dmel, Dsec, Dyak, Dere, Dana, Dpse, Dper, Dwil, Dmoj, Dvir, Dgri]
Caveat: we cannot observe exon gain outside and exon loss within the D. melanogaster lineage

Gain and loss of alternative segments and constitutive exons
Non-unique events per 1000 substitutions (Dollo parsimony)
Sample size 452 /
[Tree figure: gain and loss rates on the branches leading to Dmel, Dsec, Dyak, Dere, Dana, Dpse, Dper, Dwil, Dmoj, Dvir, Dgri]
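Dollo parsimony, named on the slide above, allows a character to be gained only once and lost any number of times. A minimal sketch of counting events for one exon is given below, assuming a rooted tree written as nested tuples and a set of species in which the exon is present. The toy topology and the function names are illustrative assumptions, not the tree or code used in the study.

```python
def count_present(node, present):
    """Number of leaves under node in which the exon is present."""
    if isinstance(node, str):
        return int(node in present)
    return sum(count_present(child, present) for child in node)

def dollo_events(tree, present):
    """Return (gains, losses) for one exon under Dollo parsimony."""
    total = count_present(tree, present)
    if total == 0:
        return (0, 0)
    # The single gain is placed at the last common ancestor of all carriers.
    node = tree
    while not isinstance(node, str):
        holders = [c for c in node if count_present(c, present) == total]
        if not holders:
            break
        node = holders[0]

    def losses(n):
        # A maximal subtree with no carriers below the gain counts as one loss.
        if count_present(n, present) == 0:
            return 1
        if isinstance(n, str):
            return 0
        return sum(losses(c) for c in n)

    return (1, losses(node))

toy_tree = ((("Dmel", "Dsec"), "Dyak"), "Dpse")   # hypothetical topology
print(dollo_events(toy_tree, {"Dmel", "Dyak"}))
```

Dividing such event counts by branch lengths in substitutions would give rates in the units used on the slide.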

Conserved alternative splicing in nematodes
92% of cassette exons from Caenorhabditis elegans are conserved in Caenorhabditis briggsae and/or Caenorhabditis remanei (EST-genome comparisons)
  – in minor isoforms as well
  – especially for complex events
There is less difference between levels of AS (exon inclusion) in natural C. elegans isolates than in mutation accumulation lines (microarray analysis) => positive selection on the level of AS
Irimia…Roy, 2007; Barberan-Soler & Zahler, 2008

Plants: little conservation of alternative splicing
Arabidopsis thaliana – Oryza sativa (rice)
Oryza sativa (rice) – Zea mays (maize)
Few AS events are conserved (5% of genes compared to ~50% of genes with AS)
The level of conservation is the same for translated and NMD isoforms
Severing…van Ham, 2009

Constitutive exons becoming alternative
Human-mouse comparison, EST data => 612 exons constitutively spliced in one species and alternatively in the other
All are major isoforms (predominantly included)
Analysis of other species (selected cases): ancestral exons have been constitutive
Characteristics of such exons (molecular evolution: Kn/Ks, conservation of intron flanks etc.) are similar to those of constitutive exons
Lev-Maor…Ast, 2007

Changes in inclusion rate
Orthologous alternatively spliced (cassette) exons of human and chimpanzee
Quantitative microarray profiling
Estimate the inclusion rate by comparison of exon and exon-junction probes => 6-8% of alternative exons have significantly different inclusion levels
Calarco…Blencowe, 2007
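A rough sketch of an inclusion-rate estimate from probe signals follows. It assumes already normalized, background-corrected intensities and uses the simple ratio inclusion / (inclusion + skipping); the function and argument names are hypothetical, and the actual profiling pipeline is not described in the slides.

```python
def inclusion_rate(inclusion_signals, skipping_signals):
    """Estimate the exon inclusion rate from probe intensities.

    inclusion_signals: exon-body and inclusion-junction probe intensities
    skipping_signals: intensities of probes spanning the skipping junction
    Returns a value between 0 (always skipped) and 1 (always included).
    """
    inc = sum(inclusion_signals) / len(inclusion_signals)
    skip = sum(skipping_signals) / len(skipping_signals)
    if inc + skip == 0:
        raise ValueError("no signal for this exon")
    return inc / (inc + skip)

human = inclusion_rate([30.0, 30.0], [10.0])
chimp = inclusion_rate([10.0, 14.0], [12.0])
print(human, chimp, human - chimp)
```

Comparing the two species then reduces to testing whether the difference between such rates for orthologous exons exceeds the measurement noise.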

Sources of new exons
Exon shuffling and duplications
  – mutually exclusive exons
Exonisation: new exons, new sites
  – in repeats
Constitutive exons becoming alternative

Alternative splice sites: model of random site fixation
Plots: fraction of exon-extending alternative sites as dependent on exon length
  – main site defined as the one in protein or in more ESTs
  – same trends for the acceptor (top) and donor (bottom) sites
The distribution of alt. region lengths is consistent with fixation of random sites
  – extend short exons
  – shorten long exons

A natural model: genetic diseases
Mutations in splice sites yield exon skips or activation of cryptic sites
Exon skip or activation of a cryptic site depends on:
  – density of exonic splicing enhancers (lower in skipped exons)
  – presence of a strong cryptic site nearby
[Table: average distance to a stronger site for skipped exons, cryptic site exons and non-mutated exons; donor sites and acceptor sites]
Kurmangaliev & Gelfand, 2008

Creation of sites
Acceptor sites (columns: in exon, in intron)
  cryptic sites (mutations in the main site): 8829
  new sites: 3278
Donor sites (columns: in exon, in intron)
  cryptic sites (mutations in the main site):
  new sites: 46
Vorechovsky, 2006; Buratti…Vorechovsky, 2007

MAGE-A family of human CT-antigens Retroposition of a spliced mRNA, then duplication Numerous new (alternative) exons in individual copies arising from point mutations Creation of donor sites

Improvement of an acceptor site

Exonisation of repeats
Early studies:
  – 61 alternatively spliced translated exons with hits to Alu (no constitutive exons)
  – 84% frame-shifting or stop-containing
  – exonisation by point mutations in cryptic sites in the Alu consensus (studied in experiment)
  – both donor and acceptor sites
Recent study: 1824 human exons, 506 mouse exons
  – Alu, L1, LTR may generate completely new exons
                 human          mouse
unique           1060 (Alu)     285 (B1, B2, B4, ID)
MIR              181            27
L1
L2               103            9
CR1              12             0
LTR              155            72
DNA              93             11
Sorek, Ast, Graur, 2002; Lev-Maor…Ast, 2003; Sorek…Ast, 2004; Sela…Ast, 2007

Evolutionary rate in constitutive and alternative regions
Human and mouse orthologous genes
D. melanogaster and D. pseudoobscura
Estimation of the dN/dS ratio: higher fraction of non-synonymous substitutions (changing amino acid) => weaker stabilizing (or stronger positive) selection
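To make the dN/dS idea concrete, here is a simplified sketch in the spirit of the Nei-Gojobori counting method. It is an assumption-laden illustration, not the estimator used in the study: codons that differ at more than one position are skipped, no multiple-hit correction is applied, and the function names are hypothetical. It returns the per-site proportions of non-synonymous and synonymous differences.

```python
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, codons enumerated in TCAG order.
CODON_TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}

def synonymous_sites(codon):
    """Expected number of synonymous sites in one codon (0 to 3)."""
    s = 0.0
    for pos in range(3):
        for base in BASES:
            if base != codon[pos]:
                mutant = codon[:pos] + base + codon[pos + 1:]
                if CODON_TABLE[mutant] == CODON_TABLE[codon]:
                    s += 1 / 3
    return s

def pn_ps(seq_a, seq_b):
    """Return (pN, pS) for two aligned, in-frame coding sequences."""
    syn_sites = nonsyn_sites = 0.0
    syn_diffs = nonsyn_diffs = 0
    for i in range(0, min(len(seq_a), len(seq_b)) - 2, 3):
        ca, cb = seq_a[i:i + 3].upper(), seq_b[i:i + 3].upper()
        if ca not in CODON_TABLE or cb not in CODON_TABLE:
            continue  # gaps or ambiguous bases
        if "*" in (CODON_TABLE[ca], CODON_TABLE[cb]):
            continue  # stop codons are ignored
        n_diff = sum(x != y for x, y in zip(ca, cb))
        if n_diff > 1:
            continue  # simplification: skip multi-hit codons
        s = (synonymous_sites(ca) + synonymous_sites(cb)) / 2
        syn_sites += s
        nonsyn_sites += 3 - s
        if n_diff == 1:
            if CODON_TABLE[ca] == CODON_TABLE[cb]:
                syn_diffs += 1
            else:
                nonsyn_diffs += 1
    p_n = nonsyn_diffs / nonsyn_sites if nonsyn_sites else 0.0
    p_s = syn_diffs / syn_sites if syn_sites else 0.0
    return p_n, p_s

p_n, p_s = pn_ps("TTTGGGAAA", "TTCGGGAGA")
print(p_n, p_s, p_n / p_s)
```

Running the same function separately on concatenated constitutive and alternative regions of a gene gives the two ratios being compared in the slides that follow.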

Human/mouse genes: non-symmetrical histogram of dN/dS(const. regions) – dN/dS(alt. regions)
Black: shadow of the left half
In a larger fraction of genes dN/dS(alt) > dN/dS(const), especially for larger values

Concatenated regions: alternative regions evolve faster than constitutive ones (*)
(*) in some other studies dS(alt) < dS(const): fewer synonymous substitutions in alternative regions
[Bar charts: dN, dS and dN/dS; axis from 0 to 1]

Weaker stabilizing selection (or positive selection) in alternative regions (insignificant in Drosophila)
[Bar charts: dN, dS and dN/dS; axis from 0 to 1]

Different behavior of terminal alternatives
Mammals: density of substitutions increases in the N-to-C direction
Drosophila: synonymous substitutions prevalent in terminal alternative regions; non-synonymous substitutions, in internal alternative regions
[Bar charts: dN, dS and dN/dS; axis from 0 to 1.5]

Many drosophilas, different alternatives
dN in mutually exclusive exons is the same as in constitutive exons
dS is lower in almost all alternatives: regulation?

Relaxed (positive?) selection in alternative regions

The McDonald-Kreitman test: evidence for positive selection in (minor isoform) alternative regions
Human and chimpanzee genome substitutions vs human SNPs
Exons conserved in mouse and/or dog
Genes with at least 60 ESTs (median number)
Fisher’s exact test for significance
[Table: Pn/Ps (SNPs), Kn/Ks (genomes), difference and significance for constitutive, major isoform and minor isoform regions]
Minor isoform alternative regions:
  – more non-synonymous SNPs: Pn(alt_minor) = .12% >> Pn(const) = .06%
  – more non-synonymous substitutions: Kn(alt_minor) = .91% >> Kn(const) = .37%
  – positive selection (as opposed to weaker stabilizing selection): α = 1 – (Pn/Ps) / (Kn/Ks) ~ 25% of positions
Similar results for all highly covered genes or all conserved exons
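The two ingredients of the test, α and Fisher's exact test on the 2×2 table of polymorphism and divergence counts, can be sketched with the standard library alone. The counts in the usage lines are hypothetical (the table values did not survive in the transcript), and the helper names are illustrative.

```python
from math import comb

def mk_alpha(pn, ps, kn, ks):
    """alpha = 1 - (Pn/Ps) / (Kn/Ks): estimated fraction of non-synonymous
    substitutions fixed by positive selection."""
    return 1 - (pn / ps) / (kn / ks)

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(x):
        # Hypergeometric probability of x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables at least as extreme as the observed one.
    return sum(p for p in (prob(x) for x in range(lo, hi + 1))
               if p <= p_obs * (1 + 1e-9))

# Hypothetical counts: SNPs (Pn, Ps) and fixed differences (Kn, Ks).
pn, ps, kn, ks = 30, 40, 100, 100
print(mk_alpha(pn, ps, kn, ks))
print(fisher_exact_two_sided(pn, ps, kn, ks))
```

A positive α together with a significant test is what the slide interprets as positive selection rather than merely relaxed stabilizing selection.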

An attempt at integration
AS is often species-specific
Young AS isoforms are often minor and tissue-specific
  – … but still functional
  – although species-specific isoforms may result from aberrant splicing
AS regions show evidence for decreased negative selection
  – excess non-synonymous codon substitutions
AS regions show evidence for positive selection
  – excess fixation of non-synonymous substitutions (compared to SNPs)
AS tends to shuffle domains and target functional sites in proteins
Thus AS may serve as a testing ground for new functions without sacrificing old ones

What next?
Changes in inclusion rates (mRNA-seq)
  – revisit constitutive-becoming-alternative exons
Other taxonomical groups
Evolution of regulation
  – donor and acceptor splicing sites
  – splicing enhancers and silencers
  – cellular context (SR-proteins etc.)
Control for:
  – functionality: translated / NMD-inducing (frameshifts, stop codons)
  – exon inclusion (or site choice) level: major / minor isoform
  – tissue specificity pattern (?)
  – type of alternative – 1: N-terminal / internal / C-terminal
  – type of alternative – 2: cassette and mutually exclusive exons, alternative sites, etc.

Acknowledgements
Discussions
  – Eugene Koonin (NCBI)
  – Igor Rogozin (NCBI)
  – Vsevolod Makeev (GosNIIGenetika)
  – Dmitry Petrov (Stanford)
  – Dmitry Frishman (GSF, TUM)
  – Sergei Nuzhdin (USC)
Support
  – Howard Hughes Medical Institute
  – Russian Academy of Sciences (program “Molecular and Cellular Biology”)
  – Russian Foundation of Basic Research

Authors
Andrei Mironov (Moscow State University)
Ramil Nurtdinov (Moscow State University) – human/mouse+rat/dog
Dmitry Malko (GosNIIGenetika, Moscow) – drosophila/mosquito
Ekaterina Ermakova (IITP) – Kn/Ks
Vasily Ramensky (Institute of Molecular Biology, Moscow) – SNPs, McDonald-Kreitman test
Irena Artamonova (Inst. of General Genetics and IITP, Moscow) – human/mouse, plots, MAGE-A

Bonus track: conserved secondary structures regulating (alternative) splicing in the Drosophila spp.
~ introns
17% alternative, 2% with alt. polyA signals
>95% of D. melanogaster introns mapped to at least 7 of 12 other Drosophila genomes
Search for conserved complementary words at intron termini (within 150 nt of intron boundaries), then align
Restrictive search => 200 candidates
6 tested in experiment (3 const., 3 alt.). All 3 alt. ones confirmed

CG33298 (phospholipid translocating ATPase): alternative donor sites

Atrophin (histone deacetylase): alternative acceptor sites

Nmnat (nicotinamide mononucleotide adenylyltransferase): alternative splicing and polyadenylation

Less restrictive search => many more candidates

Properties of regulated introns
Often alternative
Longer than usual
Overrepresented in genes linked to development

Authors
Andrei Mironov (idea)
Dmitry Pervouchine (bioinformatics)
Veronica Raker, Center for Genome Regulation, Barcelona (experiment)
Juan Valcarcel, Center for Genome Regulation, Barcelona (advice)
Mikhail Gelfand (general pessimism)