Published by Diane Simpson. Modified over 9 years ago.
1
Coding Domain Sequence Prediction and Alternative Splicing Detection in Human Malaria Gambiae
Jun Li 1, Bing-Bing Wang 2, Jose M. Ribeiro 3, Kenneth D. Vernick 1,4
1. Dept of Microbiology, University of Minnesota, St. Paul, MN.
2. Pioneer Hi-Bred International, Johnston, IA.
3. LMVR/NIAID, NIH, MD.
4. UGGIV, Institut Pasteur, Paris, France.
2
Introduction
Nearly 2/3 of the world's population is at risk for malaria.
1.5 to 2.5 million children die annually.
A. gambiae is the major malaria vector.
Genome-wide research needs good CDS structure prediction and alternative splicing (AS) information.
The currently used A. gambiae CDS structures were predicted with comparative algorithms that are too conservative, so many genes are missing.
Comparative gene prediction algorithms also have trouble predicting terminal exons; as a result, >40% of the CDS they predict lack start and/or stop codons.
The purpose of this work is to create an A. gambiae-specific gene model, complete the incomplete CDS, and provide AS information.
3
Combinational Gene Prediction Algorithm
Components: Open-Reading-Frame-Selection Algorithm; gold gene set to train GlimmerHMM; Exon-Gene-Union Algorithm.
Exon-Gene-Union Algorithm [equation not recovered from the slide], where x is the base pair, A is the ab initio predicted CDS, P is the comparative predicted CDS, and C is the combinational CDS.
[Flowchart. Recoverable node labels: Union CDS; Alternative Splicing; Any internal stop?; A frame spanning the whole region of Union CDS?; Multiple CDS found by comparative algorithm; Multiple CDS found by ab initio algorithm; The longest transcript; CDS set. Branch labels: Yes / No.]
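The union and frame-selection steps named on this slide can be illustrated with a minimal Python sketch. The slide's own formula and the exact flowchart branching did not survive, so this assumes that the combinational CDS C covers every base pair x that lies in A or in P, and that frame selection means finding a reading frame over the merged region with no internal stop codon. The function names `union_intervals` and `open_frame` are hypothetical and are not taken from the authors' package.

```python
from typing import List, Optional, Tuple

Interval = Tuple[int, int]  # half-open [start, end) genomic coordinates

STOP_CODONS = {"TAA", "TAG", "TGA"}


def union_intervals(a: List[Interval], p: List[Interval]) -> List[Interval]:
    """Merge ab initio (a) and comparative (p) exons into one covered set.

    Assumption: C = {x : x in A or x in P}; the slide's formula is missing.
    """
    merged: List[Interval] = []
    for start, end in sorted(a + p):
        if merged and start <= merged[-1][1]:
            # Overlapping or touching: extend the previous interval.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def open_frame(seq: str) -> Optional[int]:
    """Return the first frame (0, 1 or 2) with no internal stop codon.

    The last complete codon in each frame is ignored because it may be
    the legitimate terminal stop. Returns None if every frame is broken.
    """
    seq = seq.upper()
    for frame in range(3):
        codons = [seq[i:i + 3] for i in range(frame, len(seq) - 2, 3)]
        if not any(codon in STOP_CODONS for codon in codons[:-1]):
            return frame
    return None


if __name__ == "__main__":
    ab_initio = [(100, 200), (300, 400)]
    comparative = [(150, 250), (500, 600)]
    print(union_intervals(ab_initio, comparative))
    print(open_frame("ATGAAATAA"))
```

When no frame spans the merged region, the flowchart appears to fall back to the individual predictions; that branch is not modelled here because its details are not recoverable from the slide.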
4
Combinational algorithm improves single algorithm prediction

Algorithm                  Sensitivity   Specificity   Complete rate
GlimmerHMM                 95%           90%           100%
Ensembl                    92%           99%           60%
Combinational algorithm    96%           99%           95%

[Figure: Comparison of CDS structure from combinational algorithm and Ensembl.]
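The slide does not say how these three figures were computed. As a rough sketch, the Python below assumes the usual gene-prediction convention: nucleotide-level sensitivity is TP / (TP + FN) and specificity is TP / (TP + FP). It also assumes that "complete rate" means the fraction of CDS that begin with ATG and end with a stop codon. Both function names are hypothetical.

```python
from typing import Iterable, Set, Tuple

STOP_CODONS = ("TAA", "TAG", "TGA")


def nucleotide_accuracy(predicted: Set[int], annotated: Set[int]) -> Tuple[float, float]:
    """Return (sensitivity, specificity) over sets of coding positions.

    Assumption: specificity follows the gene-prediction convention
    TP / (TP + FP), not the TN-based definition used in diagnostics.
    """
    tp = len(predicted & annotated)   # coding in both
    fn = len(annotated - predicted)   # annotated coding, missed
    fp = len(predicted - annotated)   # predicted coding, not annotated
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tp / (tp + fp) if tp + fp else 0.0
    return sensitivity, specificity


def complete_rate(cds_sequences: Iterable[str]) -> float:
    """Fraction of CDS with both a start codon and a stop codon (assumed definition)."""
    seqs = [s.upper() for s in cds_sequences]
    if not seqs:
        return 0.0
    complete = sum(1 for s in seqs if s.startswith("ATG") and s.endswith(STOP_CODONS))
    return complete / len(seqs)


if __name__ == "__main__":
    sens, spec = nucleotide_accuracy(set(range(0, 80)), set(range(20, 120)))
    print(f"sensitivity={sens:.2f} specificity={spec:.2f}")
    print(complete_rate(["ATGAAATAA", "AAATAA", "ATGAAA", "ATGTGA"]))
```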
5
Alternative splicing detection in A. gambiae
EST-aided AS detection algorithm: (1) align ESTs to the genome, process the alignments, and extract exon/intron information; (2) upload to a MySQL DB; (3) run quality control, build EST clusters, and merge introns and exons from individual alignments; (4) compare intron/intron and intron/exon pairs, find overlapping events, and classify each AS event.
[Figure: AS distribution in A. gambiae.]
Conclusion: 1512 CDS have alternative splicing. Most AS events fall in the CDS region, which will enrich protein structure and function. Manual curation shows that the false-positive rate (due to EST contamination) is low (10%). The distribution of AS types indicates that the mosquito is closer to plants than to mammals.
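The final comparison step can be sketched as follows. This is not the authors' implementation: the slide gives no classification rules, so the event categories below are the standard ones, coordinates are assumed to be half-open intervals on the forward strand (intron start = donor site, intron end = acceptor site), and the MySQL storage and clustering steps are left out. The name `classify_events` is hypothetical.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end), forward strand assumed
Event = Tuple[str, Interval, Interval]


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two half-open intervals share at least one position."""
    return a[0] < b[1] and b[0] < a[1]


def classify_events(introns: List[Interval], exons: List[Interval]) -> List[Event]:
    """Compare intron/intron and intron/exon pairs within one EST cluster."""
    events: List[Event] = []
    unique = sorted(set(introns))
    for i, x in enumerate(unique):
        # Intron/intron comparison.
        for y in unique[i + 1:]:
            if not overlaps(x, y):
                continue
            if x[0] == y[0]:
                kind = "alt_acceptor"   # same donor, different 3' splice site
            elif x[1] == y[1]:
                kind = "alt_donor"      # same acceptor, different 5' splice site
            else:
                kind = "other_overlap"
            events.append((kind, x, y))
        # Intron/exon comparison.
        for e in exons:
            if e[0] <= x[0] and x[1] <= e[1]:
                # The intron lies inside an exon seen in another alignment.
                events.append(("intron_retention", x, e))
            elif x[0] < e[0] and e[1] < x[1]:
                # An exon lies strictly inside an intron (cassette exon).
                events.append(("exon_skipping", x, e))
    return events


if __name__ == "__main__":
    introns = [(100, 200), (100, 250), (300, 400), (350, 400), (500, 800), (900, 950)]
    exons = [(600, 700), (850, 1000)]
    for event in classify_events(introns, exons):
        print(event)
```

For minus-strand genes the donor and acceptor labels would swap, so a real pipeline would need strand information from the EST alignments.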
6
Software package and web presentation
The combinational CDS prediction and alternative splicing detection pipelines have been integrated into our open-source package (collaboration is welcome). Results are also accessible through the web.