
Coding Domain Sequence Prediction and Alternative Splicing Detection in the Human Malaria Vector Anopheles gambiae
Jun Li 1, Bing-Bing Wang 2, Jose M. Ribeiro 3, Kenneth D. Vernick 1,4
1. Dept of Microbiology, University of Minnesota, St. Paul, MN. 2. Pioneer Hi-Bred International, Johnston, IA. 3. LMVR/NIAID, NIH, MD. 4. UGGIV, Institut Pasteur, Paris, France

Introduction
Nearly 2/3 of the world's population is at risk for malaria, and 1.5 to 2.5 million children die annually. A. gambiae is the major malaria vector.
Genome-wide research needs good CDS structure prediction and alternative splicing (AS) information. The currently used A. gambiae CDS structures were predicted with comparative algorithms that are too conservative, so many genes are missing. Comparative gene prediction algorithms also have trouble predicting terminal exons; as a result, more than 40% of the CDS predicted this way lack a start codon, a stop codon, or both.
The purpose of this work is to create an A. gambiae-specific gene model, complete the incomplete CDS, and provide AS information.

Combinational Gene Prediction Algorithm
Gold gene set to train GlimmerHMM.
Exon-Gene-Union Algorithm: [equation not recovered from the slide], where x is the base pair, A is the ab initio predicted CDS, P is the comparative predicted CDS, and C is the combinational CDS.
Open-Reading-Frame-Selection Algorithm: [flowchart not recovered from the slide. Boxes: Union CDS; Any internal stop?; A frame spanning the whole region of the Union CDS?; Multiple CDS found by comparative algorithm; Multiple CDS found by ab initio algorithm; The longest transcript; Alternative splicing; CDS set. Branch labels: Yes, No.]
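The slide's formula and flowchart did not survive extraction, so the following is only a minimal sketch of the two ideas named above, under stated assumptions. It assumes the combinational CDS C covers every base pair x that lies in either the ab initio CDS A or the comparative CDS P, and that an "internal stop" means a stop codon anywhere before the final codon of the chosen frame. The function names `union_exons` and `has_internal_stop` are hypothetical and are not taken from the authors' package.

```python
from typing import List, Tuple

# Exon as (start, end), 1-based and inclusive. This coordinate convention
# is an assumption made for the sketch.
Interval = Tuple[int, int]

STOP_CODONS = {"TAA", "TAG", "TGA"}


def union_exons(a: List[Interval], p: List[Interval]) -> List[Interval]:
    """Merge exons from two predictions into one set of union exons.

    A base pair is kept if it is covered by A or by P (assumed reading
    of the Exon-Gene-Union step).
    """
    merged: List[Interval] = []
    for start, end in sorted(a + p):
        if merged and start <= merged[-1][1] + 1:
            # Overlapping or directly adjacent: extend the previous exon.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def has_internal_stop(cds: str, frame: int = 0) -> bool:
    """Return True if a stop codon occurs before the last codon of the frame."""
    seq = cds.upper()
    codons = [seq[i:i + 3] for i in range(frame, len(seq) - 2, 3)]
    # The final codon may legitimately be the terminal stop, so skip it.
    return any(codon in STOP_CODONS for codon in codons[:-1])


ab_initio = [(100, 200), (300, 400)]
comparative = [(150, 250), (500, 600)]
print(union_exons(ab_initio, comparative))
print(has_internal_stop("ATGTAAGGGTGA"))
```

In this reading, a union CDS with no internal stop in a frame spanning the whole region would be accepted as is, and the remaining flowchart branches would handle the cases where no such frame exists.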

Combinational algorithm improves single-algorithm prediction

Method                     Sensitivity   Specificity   Complete rate
GlimmerHMM                 95%           90%           100%
Ensembl                    92%           99%           60%
Combinational algorithm    96%           99%           95%

Comparison of CDS structure from the combinational algorithm and Ensembl.
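The slide does not say how sensitivity and specificity were computed. As an illustration only, the sketch below uses the nucleotide-level convention common in gene-finding evaluations, where sensitivity is the fraction of reference coding bases that were predicted and specificity is the fraction of predicted coding bases that are correct. Both the evaluation level and the helper names are assumptions.

```python
from typing import List, Set, Tuple

Interval = Tuple[int, int]  # (start, end), 1-based inclusive (assumed)


def coding_positions(exons: List[Interval]) -> Set[int]:
    """Expand exon intervals into the set of covered base positions."""
    return {pos for start, end in exons for pos in range(start, end + 1)}


def nucleotide_accuracy(predicted: List[Interval],
                        reference: List[Interval]) -> Tuple[float, float]:
    """Return (sensitivity, specificity) at the nucleotide level.

    sensitivity = TP / (TP + FN), specificity = TP / (TP + FP),
    following the gene-prediction convention (an assumption here).
    """
    pred = coding_positions(predicted)
    ref = coding_positions(reference)
    true_positives = len(pred & ref)
    sensitivity = true_positives / len(ref) if ref else 0.0
    specificity = true_positives / len(pred) if pred else 0.0
    return sensitivity, specificity


# Toy example: the prediction covers 8 of the 10 reference bases
# and contains no extra bases.
print(nucleotide_accuracy([(1, 8)], [(1, 10)]))
```

Expanding to per-base sets keeps the sketch short; for whole-genome comparisons an interval-based overlap count would be more economical.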

Alternative splicing detection in A. gambiae
EST-aided AS detection algorithm:
1. Align ESTs to the genome.
2. Process the alignments and extract exon/intron information.
3. Upload to a MySQL database.
4. Run quality control, build EST clusters, and merge introns and exons from individual alignments.
5. Compare intron/intron and intron/exon pairs, find overlapping events, and classify each AS event.
AS distribution in A. gambiae: [figure not recovered from the slide]
Conclusion: 1512 CDS show alternative splicing. Most AS events occur in the CDS region, which will enrich protein structure and function. Manual curation shows that the false-positive rate (due to EST contamination) is low (10%). The AS type distribution indicates that the mosquito is closer to plants than to mammals.
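The final comparison step can be illustrated with a small sketch. The slide does not give the classification rules, so the ones below are a commonly used simplification and an assumption: two overlapping introns that share one boundary indicate an alternative splice site, an intron that fully spans an exon from another transcript indicates exon skipping, and an exon that fully spans an intron indicates intron retention. Coordinates are taken to be on the plus strand, and the function names are hypothetical.

```python
from typing import Tuple

Interval = Tuple[int, int]  # (start, end), 1-based inclusive, plus strand (assumed)


def classify_intron_pair(first: Interval, second: Interval) -> str:
    """Compare two introns from different transcripts of one EST cluster."""
    if first == second:
        return "identical"
    if first[1] < second[0] or second[1] < first[0]:
        return "no_overlap"
    if first[0] == second[0]:
        # Same donor site, different acceptor site.
        return "alt_3prime_site"
    if first[1] == second[1]:
        # Same acceptor site, different donor site.
        return "alt_5prime_site"
    return "complex_overlap"


def classify_intron_exon(intron: Interval, exon: Interval) -> str:
    """Compare an intron from one transcript with an exon from another."""
    if intron[0] < exon[0] and exon[1] < intron[1]:
        # The exon lies entirely inside the intron: it is spliced out.
        return "exon_skipping"
    if exon[0] < intron[0] and intron[1] < exon[1]:
        # The intron lies entirely inside the exon: it is kept.
        return "intron_retention"
    return "none"


print(classify_intron_pair((201, 299), (201, 349)))
print(classify_intron_exon((201, 499), (300, 400)))
```

On the minus strand the donor and acceptor labels would swap, so a real pipeline would need the alignment strand before naming the event.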

Software package and web presentation
The combinational CDS prediction and alternative splicing detection pipelines have been integrated into our open-source package (collaboration is welcome). The results are also accessible through the web.