COMPARATIVE ANALYSIS of U12 INTRONS
Nikolai V. Ivanov, Zemin Ning and Richard Durbin
The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK

Introduction

In higher eukaryotes, splicing of pre-mRNA is carried out by at least two different spliceosomes, the major (U2) and the minor (U12). Introns spliced by the U12 spliceosome are rare (<0.5%) and are therefore commonly ignored by most gene prediction and annotation pipelines. However, some well-known disease-related genes, such as huntingtin and PTEN, contain one or more U12 introns, which makes determining their precise gene structure challenging. The slower rate of U12 spliceosome processing is thought to contribute to the regulation of gene expression.

The U12 spliceosome, composed of the U11, U12, U4atac, U5 and U6atac small nuclear ribonucleoproteins (snRNPs), surprisingly resembles the U2 spliceosome in structure and function; however, the two seem to evolve independently of each other. The U12 spliceosome was initially discovered to operate on AT-AC introns [1,2]. Later, it was shown that GT-AG introns are in fact its major substrate. Sequencing of U11 and U12 snRNAs confirmed that the U12 donor (5'-[AG]TATCCTT) and U12 branch point (TCCTTAAC) consensus sequences are remarkably distinct from the relatively variable U2 splice sites.

The evolution of U12 and U2 introns represents an interesting case study with implications for all gene structures. Burge et al. [3] suggested that comparison of orthologous genes from different species could produce the following outcomes: intron conservation, GT-AG and AT-AC subtype conversion, U12/U2 intron conversion, and loss of an intron. We focused our attention mostly on U12-type introns and also introduced the analysis of U12/U2 introns in paralogous genes. We mapped all available human, mouse, chicken and zebrafish ESTs/cDNAs with high accuracy to the corresponding genomes using our new fast algorithm implemented in ssahaEST, which allows refined splice site analysis of the genome structure. In this work we focused on the detection and evolution of U12 introns in the four eukaryotic genomes.

References

[1] Jackson IJ (1991) Nucleic Acids Res. 19:
[2] Hall SL & Padgett RA (1994) J. Mol. Biol. 239:
[3] Burge C, Padgett RA & Sharp PA (1998) Molecular Cell 2:
[4] Ning Z, Cox AJ & Mullikin JC (2001) Genome Research 11:
[5] Levine A & Durbin R (2001) Nucleic Acids Res. 29:
[6] Zhu W & Brendel V (2003) Nucleic Acids Res. 31:
[7] Abril JF, Castelo R & Guigo R (2005) Genome Res. 15:

Discussion

I. Analysis of the overall splice site variation in four eukaryotic genomes

We used ssahaEST to map ~11.6 million ESTs/cDNAs from four organisms (human, mouse, chicken and zebrafish) to their corresponding genomes using U2 and U12 splice site models. Table 1 shows the outcome of this experiment. All intron counts represent non-redundant introns uniquely mapped to the genome, where only one occurrence of each intron start and end is taken into account. Not surprisingly, the majority of the introns (>99%) are U2-type. We found no significant differences between the splice site matrices from one species to another. In all four sets, GT-AG introns were the dominant (~70%) U12 subtype and AT-AC introns were the minor (~30%) U12 subtype. Out of 404 human U12-type introns reported previously [5], we were able to identify 368 U12-type introns (260 GT-AG and all 108 AT-AC subtypes).

II. Cross genome comparison of U12-type introns in four eukaryotic genomes

Mapping of 6883 homologous transcripts to four eukaryotic genomes resulted in the identification of 90 human, 115 mouse, chicken and zebrafish U12-type introns for which we can look at homologues. A comparison of the human and mouse genes containing these introns is shown in Table 2. Approximately half (53) of the introns were conserved in intron position and remained U12-type. In this set we found no examples of GT-AG and AT-AC subtype conversion.
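The outcome categories used in this comparison can be sketched as a small decision function. This is only an illustration, not code from ssahaEST: the `Intron` record, its field names and the `compare_homologous` function are hypothetical, and the sketch assumes that each intron already carries a spliceosome type, a terminal-dinucleotide subtype and a position in the aligned transcript. An intron found only at a different position is reported separately rather than counted as a conversion.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Intron:
    """Hypothetical record for one intron in an aligned transcript."""
    spliceosome: str  # "U12" or "U2"
    subtype: str      # terminal dinucleotides, e.g. "GT-AG" or "AT-AC"
    position: int     # assumed coordinate in the aligned coding sequence


def compare_homologous(ref: Intron, other: Optional[Intron]) -> str:
    """Name the outcome of comparing a reference intron with its homologue."""
    if other is None:
        return "intron loss"
    if other.position != ref.position:
        # Not a conversion in place: the intron sits somewhere else.
        return "loss/gain at a different position"
    if other.spliceosome != ref.spliceosome:
        return "U12/U2 conversion"
    if other.subtype != ref.subtype:
        return "subtype conversion"
    return "conserved"


human = Intron("U12", "GT-AG", 97)
mouse = Intron("U12", "GT-AG", 97)
print(compare_homologous(human, mouse))
```

Checking the position before the spliceosome type is a deliberate choice in this sketch: it keeps a U2 intron at a nearby but different position from being counted as a true U12/U2 conversion.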
Surprisingly, most of the examples listed as U12/U2-type conversions are not conserved at the position of the intron and could therefore be considered a loss of a U12-type intron and a gain of a U2-type intron at a different position. We found some interesting examples of true U12/U2-type conversion in paralogous genes (Table 3). However, these cases are hard to quantify owing to the lack of an appropriate database of paralogous genes.

Materials and Methods

Expressed sequence tags (ESTs) were downloaded from the NCBI dbEST database (July 8th 2005 release, ftp://ftp.ncbi.nih.gov/repository/dbEST/) for Homo sapiens (~6.1 x 10^6), Mus musculus (~4.3 x 10^6), Danio rerio (~0.63 x 10^6) and Gallus gallus (~0.55 x 10^6). Files containing large numbers of FASTA-formatted sequences were split into files of manageable size (~0.6 x 10^6). Alignment of the ESTs to the corresponding genomes of H. sapiens (NCBI35), M. musculus (NCBI_m34), D. rerio (WTSI Zv5) and G. gallus (WashU ver. 1) was performed using the newly developed ssahaEST program on an SGI Altix machine equipped with 16 IA Ghz processors.

ssahaEST combines the fast k-mer positioning algorithm implemented in the SSAHA program [4] and an implementation of the banded Smith-Waterman-Gotoh algorithm from the phrap/cross_match package [Phil Green] with high-scoring pair (HSP) clustering and accurately trained splice site models for U2-type and U12-type introns.

An intron was classified as U12-type based on thresholds for the individual donor, branch point and acceptor scores, as well as the branch-to-acceptor distance (<50), derived from training set 1. This set included introns that were experimentally confirmed to be spliced by the U12 spliceosome, together with orthologous genes from closely related genomes. Matrices for U2 and U12 splice sites were generated using the maximum-likelihood (ML) method with pseudocounts.
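A minimal sketch of how such matrices and thresholds might be applied is given below. It is not the ssahaEST implementation: the pseudocount of 1, the uniform background, the log-odds scoring, the function names and the threshold values are all assumptions made for illustration, and the three training sites are toy sequences matching the U12 donor consensus rather than real data. Only the branch-to-acceptor limit of 50 comes from the text.

```python
import math

BASES = "ACGT"


def build_matrix(sites, pseudocount=1.0):
    """Per-position base frequencies from aligned sites (A/C/G/T only).

    The pseudocount (assumed value) keeps unseen bases from getting
    a probability of zero.
    """
    length = len(sites[0])
    if any(len(site) != length for site in sites):
        raise ValueError("all sites must have the same length")
    matrix = []
    for i in range(length):
        counts = {base: pseudocount for base in BASES}
        for site in sites:
            counts[site[i].upper()] += 1
        total = sum(counts.values())
        matrix.append({base: counts[base] / total for base in BASES})
    return matrix


def log_odds_score(matrix, seq, background=0.25):
    """Sum of log2(frequency / background) over the positions of seq."""
    return sum(math.log2(column[base.upper()] / background)
               for column, base in zip(matrix, seq))


def is_u12_type(scores, branch_to_acceptor, thresholds, max_distance=50):
    """Apply per-site score thresholds plus the distance limit (<50)."""
    if branch_to_acceptor >= max_distance:
        return False
    return all(scores[site] >= thresholds[site]
               for site in ("donor", "branch_point", "acceptor"))


# Toy donor sites consistent with the consensus 5'-[AG]TATCCTT.
donor_matrix = build_matrix(["GTATCCTT", "ATATCCTT", "GTATCCTT"])
print(round(log_odds_score(donor_matrix, "GTATCCTT"), 2))
```

With real training sets, the thresholds passed to `is_u12_type` would be chosen from the score distributions of known U12-type and U2-type introns; the values used in any test of this sketch are placeholders.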
The score and length thresholds were derived from training set 2, compiled from the 368 human U12 introns described by Levine & Durbin [5]; 36 U12-type introns were removed because they did not fit our splice site model for the U12-type intron and had patterns different from those in training set 1 (Figure 1), so we cannot be confident that they are true U12 introns. Similar U12-type intron definitions were described previously [3, 5, 6].

For comparative studies of U12-type introns between the four eukaryotic genomes, we remapped 6883 EnsEMBL (ver. 32) homologous genes to the four corresponding genomes and analysed the introns in one genome that are homologous to U12-type introns in another genome. We considered only those introns that were adjacent to conserved exons.

Availability:

Results

1. We have developed a fast and accurate method for mapping ESTs/cDNAs to finished eukaryotic genomes and for studying gene structure.
2. We have found ~800 U12-type introns in the human and mouse genomes and ~400 U12-type introns in the chicken and zebrafish genomes. U12 introns seem to constitute ~0.3% of all introns.
3. Our study shows that U12/U2-type conversion between homologous introns of the four eukaryotic genomes most likely occurs by a loss/gain mechanism with a change in the position of the intron. True conversion was observed only in paralogous genes.

Our approach to splice site analysis differs from that of the previous work [5] in that we are now able to map all ESTs/cDNAs to the best unique location on the genome, avoiding potential ambiguity in splice site confirmation. It should be noted that, because U12-type introns occur at very low frequency, we had to make highly specific matrices for the different U12 subtypes, which potentially lowers the sensitivity of the method and leads to underestimation.
Despite this, the number of U12-type introns in the human genome has doubled compared to the previous work [5], mainly because of the increase in the number of human ESTs and the improvement in the quality of the human genome assembly over the last four years.

Table 1 shows two major trends found in the first part of the analysis. One is that the total number of non-redundant introns correlates with the length of the sequenced portion of the genome. The other is that the fraction of U12-type introns is ~0.3% in all four species, although it is significantly larger in chicken and zebrafish than in the mammalian genomes, indicating that there is some intron type turnover.

Comparison of homologous genes containing U12-type introns between human and mouse showed that ~50% of U12 introns are being converted to a different type. Although this proportion is significantly higher than the one described by Abril et al. [7], the conversion results in introns at close but different positions, indicating a potential loss/gain mechanism as opposed to replacement. True conversion was observed in a few cases of paralogues (Table 3); however, the study is hampered by the lack of a reliable database of paralogous genes.

Figure 1. Selection of thresholds for U2-type and U12-type donor site definitions. [Panels: U2 intron matrices (donor, acceptor); U12 intron matrices (donor, branch point, acceptor).]
Conclusion

HUMAN    ENSG     (47)  ACGgtaagaaagtgccctggacttggtg  ctgatgggaccctctttgctggcagGTG  8427   (110)  U2
MOUSE    ENSMUSG  (51)  AAGGTTAgtatccttggtgcgatatgct  ctgattggactctttttgctgtcagGTG  8637   (110)  U12
CHICK    ENSGALG  (47)  CAGgtaagtatagcctgatctgcttctc  atgcttggatttttctttcactcagGTT  13544  (110)  U2
ZFISH    ENSDARG  (47)  CAGgtagtgggaccaatctcgcactacg  tccattgttttggtgtatttggcagACA  14319  (110)  U2
========================================================================================================
HUMAN    ENST     (97)  CGTgtatcctttgcctgctggctgacca  gaatgaccttaatctggggttctagCCA  17194  (109)  U12
MOUSE    ENSMUST  (97)  CGTgtatcctttgcctgctgcctggtgc  aaatggccttaatctgtggttctagTCA  11804  (106)  U12
CHICK    ENSGALT  (97)  CCTgtatcctttgcagtctgaacccttc  aaatgaccttaatctatcattttagCCA  3144   (109)  U12
ZFISH    ENSDART  (97)  TATgtatctttttacattttcagctttt  ttatatccttgattctctcttgaagTCA  29524  (109)  U12
HUMAN    ENST     (97)  AAGgtaccgtgcagcaaagtccagatat  tgtgcttttcttttgcattctgaagGCA  20284  (109)  U2
MOUSE    ENSMUST  (97)  CAGgtaggtgcagccaagtccagttagg  tgtgcttctcttttgcattctgaagGCA  34964  (109)  U2
         ENSMUST  (97)  CAGgtacctgccctcaccagcaggcttg  ttacttcaccacatgaactttgaagTCA  7864   (109)  U2
         ENSMUST  (97)  CCAgtatcctttgcactgcctggctatg  ttctctttacataagcccatcacagCCA  24574  (109)  ???
CHICK    ENSGALT  (97)  ACGgtatccttaacaagaagtctggaaa  ctttattcaccttaatgttccaaagACA  11994  (109)  U12?
ZFISH    ENSDART  (97)  CAGgtattacagtcttcattttactcca  agaagcattttttttctctgtttagTCA  3014   (109)  U2
FUGU     SINFRUT  (97)  CACgtatccttgcaacagctggtggcct  agcaccgaccttcactcagcattagACA  1404   (109)  U12

Table 3. An example of U12/U2-type loss/gain and of true U12/U2-type conversion in paralogues