Progress report. Yiming Zhang, 02/10/2012.


All AS events in ASIP: intron retention (IntronR); exon skipping (ExonS); alternative acceptor site (AltA), including NAGNAG; alternative donor site (AltD), including GYNGYN; alternative both sites (AltP).

NAGNAG alternative splicing. Figure 1. NAGNAG alternative splicing with E and I sites and isoforms. NAGNAG alternative splicing can result in one of three possibilities (Figure 1): constitutive use of the first acceptor (the so-called exonic, or “E” variant), constitutive use of the second acceptor (the so-called intronic, or “I” variant), or use of both acceptors, that is, alternative splicing (the “EI” variant). Sinha et al. 2010
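To make the E and I acceptors concrete, here is a minimal sketch that locates NAGNAG hexamers in a sequence and reports where the exon would start under each choice. The function names and the 0-based offset convention are assumptions for illustration and are not taken from ASIP or from the cited papers.

```python
import re

# Lookahead so that overlapping motifs (e.g. in CAGCAGCAG) are all reported.
NAGNAG = re.compile(r"(?=([ACGT]AG[ACGT]AG))")


def find_nagnag(seq):
    """Return (offset, motif) for every NAGNAG hexamer in seq (0-based)."""
    return [(m.start(), m.group(1)) for m in NAGNAG.finditer(seq.upper())]


def exon_start_offsets(offset):
    """Exon start when the first (E) or second (I) acceptor is used.

    Hypothetical helper: the two starts are 3 nt apart, which is why the
    two isoforms differ by a single codon and stay in frame.
    """
    return {"E": offset + 3, "I": offset + 6}
```

For example, `find_nagnag("tttcagcaggt")` reports one motif, `CAGCAG`, at offset 3.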

GYNGYN alternative splicing. Figure 2. GYNGYN alternative splicing with e and i sites and isoforms. Hiller et al. 2006

All introns
  Constitutive: NAGNAG-E, NAGNAG-I, GYNGYN-e, GYNGYN-i, ...
  Alternative: IntronR; ExonS; AltA (NAGNAG-EI, ...); AltD (GYNGYN-ei, ...); Multiple AS; ...
  Unclear

Intron statistics from ASIP. Table 1. Intron statistics from ASIP. Columns: AT, BD, GM, LJ, MT, OS, PP, PT, SB, SL, VV, Total. Rows: constitutive introns (Cons.) at EST>=4 (all, NAG-E, NAG-I, GYN-e, GYN-i) and at EST>=10 (all, NAG-E, NAG-I, GYN-e, GYN-i); alternative introns (Alt.) at EST>=2 (IntronR, AltA_all, AltA_NAG, AltD_all, AltD_GYN, AltP, ExonS). Four species with only a small amount of data are not listed. All statistics are intron-based rather than event-based, which means redundancy has been removed. The most common type of alternative intron is IntronR, followed by ExonS. NAGNAG AS is much more frequent among AltA introns than GYNGYN AS is among AltD introns.

Background: NAGNAG alternative splicing, which can insert or delete a single amino acid in the protein, is very common and well studied in animals. The NAGNAG motif is present in 30% of human genes and is functional in at least 5% of them (Hiller et al.). Because NAGNAG AS is frame-preserving, the vast majority of cases should lead to different proteins. Studies so far have found both cases where such proteins vary in function and cases with no noticeable difference (Akerman et al.; Iida et al.). GO analyses in some studies show genes with the GO term DNA binding to be statistically significant, and more than half of all AS-NAGNAG events affect polar amino acid residues (Iida et al.; Sinha et al. 2010).

Background: Studies of NAGNAG AS in plants are still few (only three species: Arabidopsis, rice and Physcomitrella). One study found 321 and 372 AS-NAGNAG events in Arabidopsis and rice, respectively. Another study found that 6% of all introns and 21% of all annotated genes in Arabidopsis harbor a genomic NAGNAG acceptor motif (Iida et al.; Schindler et al.). In addition, the GO analysis agrees with the previous study in human: the GO term DNA binding is statistically significant. One study indicates that NAGNAG acceptors occur frequently in the Arabidopsis genome and are particularly prevalent in SR and SR-related protein-coding genes (Sinha et al. 2010; Schindler et al. 2008).

Background: The state-of-the-art in silico studies on predicting NAGNAG splice sites were done by Sinha's group, for both human and plant species. They achieved high, balanced specificity and sensitivity in both. The most informative features they found are the nucleotides in the NAGNAG and in its immediate vicinity, along with the splice site scores. The model they trained on human data also achieves a high AUC on plant data, which indicates that NAGNAG splicing in plants is similar to that in animals. Sinha et al. 2009, 2010

NAGNAG dataset: I tried to predict NAGNAG events (that is, to predict the EI, I or E isoforms) with Random Forest, based on a dataset I generated from ASIP. Strict criteria were used to identify NAGNAG events in the ASIP database: E and I events must be supported by at least 10 ESTs or cDNAs, and for EI events each isoform must be supported by at least 2 ESTs or cDNAs. After removing redundancy, I obtained 458 EI-form alternative NAGNAG introns, 1988 E-form constitutive introns and 685 I-form constitutive introns in 15 plant species.
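The support thresholds above can be written as a small labeling rule. This is a sketch under one added assumption that the slide does not state: a constitutive (E or I) intron has no supporting EST or cDNA at all for the other acceptor, and anything else is left unlabeled. The function and parameter names are hypothetical.

```python
def label_nagnag(e_support, i_support, min_const=10, min_alt=2):
    """Label a NAGNAG intron from EST/cDNA support counts.

    e_support / i_support: number of ESTs or cDNAs supporting the E and
    I acceptor. Returns "EI", "E", "I", or None when the evidence does
    not meet the thresholds.
    """
    if e_support >= min_alt and i_support >= min_alt:
        return "EI"  # both isoforms observed often enough
    # Assumption: constitutive means zero evidence for the other acceptor.
    if e_support >= min_const and i_support == 0:
        return "E"
    if i_support >= min_const and e_support == 0:
        return "I"
    return None  # too little or ambiguous evidence; excluded from the set
```

Under this rule an intron with 12 ESTs for E and a single EST for I stays unlabeled, which keeps borderline cases out of both the constitutive and the alternative class.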

Features: Figure 3. A total of 28 features, each representing a nucleotide and thus taking one of four possible values (A, C, G, T). U1, U2, U3 are the first three nucleotides in the upstream exon. D1, D2, D3 are the first three nucleotides in the downstream exon. A weak polypyrimidine tract (PPT) can contribute to AS, so P1-P20 cover the PPT upstream of the NAGNAG. Finally, I also use intron length as an additional feature.
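As an illustration of the feature layout, the sketch below builds one feature record from sequence fragments that are assumed to be already extracted. The function name and argument layout are hypothetical, and only the 26 nucleotide positions named on the slide (U1-U3, P1-P20, D1-D3) plus intron length are covered. P1 is taken to be the PPT position closest to the NAGNAG, following the order P20-P1 used in the sequence logos.

```python
def encode_features(upstream3, ppt20, downstream3, intron_length):
    """Build a nominal feature record for one NAGNAG intron.

    upstream3, downstream3: 3-nt exon fragments (U1-U3, D1-D3).
    ppt20: the 20 nt upstream of the NAGNAG, written 5' to 3'.
    """
    upstream3, ppt20, downstream3 = (
        s.upper() for s in (upstream3, ppt20, downstream3)
    )
    if (len(upstream3), len(ppt20), len(downstream3)) != (3, 20, 3):
        raise ValueError("expected fragments of length 3, 20 and 3")
    if set(upstream3 + ppt20 + downstream3) - set("ACGT"):
        raise ValueError("sequences must contain only A, C, G, T")

    features = {}
    for i, nt in enumerate(upstream3, start=1):
        features[f"U{i}"] = nt
    for i, nt in enumerate(ppt20):
        features[f"P{20 - i}"] = nt  # leftmost base is P20, rightmost is P1
    for i, nt in enumerate(downstream3, start=1):
        features[f"D{i}"] = nt
    features["intron_length"] = intron_length
    return features
```

Keeping each position as a nominal value (rather than a number) matches the slide's description of features with four possible values.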

Classifier evaluation: A Random Forest with 200 trees was used, with 5-fold cross-validation. Table columns: TP rate, FP rate, Precision, Recall, F-measure, ROC area, Class; rows: E, I, EI. The evaluation results agree strongly with Sinha’s paper (for Physcomitrella), in which AUC = 0.96, 0.99 and 0.98 for the EI, E and I forms, respectively.
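For reference, the per-class columns of such a table can be computed one class at a time by treating that class as positive and the other two as negative. The sketch below is a plain-Python illustration with a hypothetical function name; it is not the tool that produced the reported numbers.

```python
def per_class_metrics(y_true, y_pred, positive):
    """One-vs-rest TP rate, FP rate, precision, recall and F-measure."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)

    recall = tp / (tp + fn) if tp + fn else 0.0
    fp_rate = fp / (fp + tn) if fp + tn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom else 0.0
    return {
        "tp_rate": recall,  # TP rate and recall are the same quantity
        "fp_rate": fp_rate,
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
    }
```

Calling it once each for "E", "I" and "EI" on the pooled cross-validation predictions gives one table row per class.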

Figure 4. The EI class, or AS, is harder to predict (AUC = 0.967) than the two constitutive variants, E and I (AUC = for both).
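The AUC for one class can be read as the probability that a randomly chosen member of that class receives a higher score than a randomly chosen non-member. A minimal sketch of that pairwise definition follows; the function name is hypothetical, ties count as one half, and the quadratic loop is only meant for small illustrative inputs.

```python
def auc(scores, is_positive):
    """Area under the ROC curve via pairwise comparison of scores."""
    pos = [s for s, y in zip(scores, is_positive) if y]
    neg = [s for s, y in zip(scores, is_positive) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    # A positive scored above a negative counts 1, a tie counts 0.5.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

For a three-class problem such as E, I and EI, this would be applied once per class, using the classifier's score for that class and marking the other two classes as negatives.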

Most informative features. Figure 5. Most informative features according to information gain.
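Information gain for a nominal feature is the class entropy minus the expected class entropy after splitting on that feature. The sketch below shows the calculation for a single feature column; the function names are hypothetical and the toy inputs in the usage note are not taken from the dataset.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(values, labels):
    """Entropy of labels minus the weighted entropy within each feature value."""
    n = len(labels)
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder
```

A feature whose values separate two balanced classes perfectly has a gain of 1 bit, while a feature that is independent of the class has a gain of 0. Ranking all feature columns by this value gives an ordering like the one plotted in the figure.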

Sequence logos. Figures 6a-6d. Sequence logos of NAGNAG splice sites. 6a: E sites; 6b: I sites; 6c: EI sites; 6d: all splice sites. Positions 1-3 are U1-U3. Positions 4-24 are P20-P1. Positions are D1-D3.

Conclusion: NAGNAG AS can be predicted with high accuracy. Using carefully constructed training and test datasets, an in silico performance of AUC = 0.967, and was achieved for the EI, E and I forms, respectively. The most informative features are the nucleotides in the NAGNAG and in its immediate vicinity. NAGNAG AS in plants is similar to that in animals and is largely dependent on the splice site and its immediate neighborhood.