Published by Mercy Ellis. Modified over 9 years ago.
1
Importing Community annotations into VectorBase
2
Aims
- Provide the VectorBase community with tools for improving genome annotation.
- Tools must have low entry requirements, be scalable and (relatively) simple to use.
3
Genome annotation
First-pass genome annotation is almost always based on “automatic” computational approaches:
- ab initio
- Similarity based
  - Transcript (ESTs, RNAseq)
  - Protein (nr protein database)
4
Genome annotation - building a pipeline
- Genome assembly
- Map Repeats
- Genefinding
- Protein-coding genes
- Map Transcripts
- Map Peptides
- nc-RNAs
- Functional annotation
- Submission to archival databases (Release)
5
Current VectorBase annotation pipeline
- MAKER-based automatic annotation includes:
  - SNAP training and ab initio
  - RNAseq-based transcript similarity prediction
  - Taxonomically constrained peptide similarity prediction
  - 2 rounds of prediction refinement & final round includes all peptide similarity
- Community annotation phase
  - Capture gene structure changes
  - Metadata associated with locus (symbol, description, citation)
- Submission to INSDC, propagation to UniProt
- Presentation through VectorBase
Start; 1.0 set (automatic); 1.1 set (published)
6
Processing submissions
4 phases: Capture, Moderation, Storage, Integration
7
Capture: Community annotation decision tree
8
Community annotation decision tree
9
Tool of choice: WebApollo
- Web-based
- Eliminates main drawback of deprecated CAP system: GFF3 format validation
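To illustrate what GFF3 format validation involves, here is a minimal sketch that checks a few structural rules from the GFF3 specification (nine tab-separated columns, integer coordinates with start <= end, valid strand and phase, tag=value attributes). It is not the validator used by CAP or WebApollo; the function names `check_gff3_line` and `check_gff3` are hypothetical, and the rule set is deliberately a small subset.

```python
from typing import Dict, List

VALID_STRANDS = {"+", "-", ".", "?"}
VALID_PHASES = {"0", "1", "2", "."}


def _is_positive_int(value: str) -> bool:
    """True if value is a plain ASCII integer >= 1."""
    return value.isascii() and value.isdigit() and int(value) >= 1


def check_gff3_line(line: str) -> List[str]:
    """Return the problems found in one GFF3 feature line (empty list if none).

    Hypothetical helper: covers only a subset of the GFF3 specification.
    """
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 9:
        return [f"expected 9 tab-separated columns, found {len(cols)}"]
    _seqid, _source, ftype, start, end, score, strand, phase, attrs = cols

    problems = []
    if not (_is_positive_int(start) and _is_positive_int(end)):
        problems.append("start and end must be integers >= 1")
    elif int(start) > int(end):
        problems.append("start must be <= end")
    if score != ".":
        try:
            float(score)
        except ValueError:
            problems.append("score must be a number or '.'")
    if strand not in VALID_STRANDS:
        problems.append(f"invalid strand '{strand}'")
    if phase not in VALID_PHASES:
        problems.append(f"invalid phase '{phase}'")
    elif ftype == "CDS" and phase == ".":
        problems.append("CDS features require a phase of 0, 1 or 2")
    if attrs != ".":
        for pair in attrs.split(";"):
            if pair and "=" not in pair:
                problems.append(f"attribute '{pair}' is not in tag=value form")
    return problems


def check_gff3(text: str) -> Dict[int, List[str]]:
    """Map 1-based line numbers to problems, skipping comments and blank lines."""
    report = {}
    for number, line in enumerate(text.splitlines(), start=1):
        if line.startswith("##FASTA"):
            break  # everything after this directive is sequence, not features
        if not line.strip() or line.startswith("#"):
            continue
        problems = check_gff3_line(line)
        if problems:
            report[number] = problems
    return report
```

Calling `check_gff3` on the text of a submitted file returns an empty dictionary when none of these checks fail, which is the kind of feedback a submitter would otherwise only get after a rejected upload.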
10
WebApollo example
11
Community annotation decision tree
13
Tool of choice: Web forms
14
Moderation & Storage
- Gene metadata captured through forms to spreadsheets
- Batch submissions use similar spreadsheet format
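A minimal sketch of how rows from such a metadata spreadsheet might be screened before storage. The slides do not show the actual spreadsheet layout or moderation rules, so the column names (`gene_id`, `symbol`, `description`, `citation`), the function `moderate_rows` and both rejection rules are assumptions made for illustration only.

```python
import csv
import io
from typing import Dict, List, Tuple

# Hypothetical column layout; the real spreadsheet format is not given in the slides.
METADATA_FIELDS = ("symbol", "description", "citation")


def moderate_rows(tsv_text: str) -> Tuple[List[Dict[str, str]], List[Tuple[Dict[str, str], str]]]:
    """Split tab-separated metadata rows into accepted rows and (row, reason) rejections."""
    accepted, rejected = [], []
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    for raw in reader:
        # Normalise: drop overflow columns (key None) and treat missing cells as empty.
        row = {key: (value or "").strip() for key, value in raw.items() if key is not None}
        if not row.get("gene_id", ""):
            rejected.append((row, "missing gene_id"))
        elif not any(row.get(field, "") for field in METADATA_FIELDS):
            rejected.append((row, "no metadata supplied"))
        else:
            accepted.append(row)
    return accepted, rejected
```

Because form responses and batch submissions share a similar spreadsheet format, a single screening step like this could in principle serve both routes.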
15
Integration: Dataflow for ‘patch’ build
[Diagram. Labels: CAP, GFF3, WebApollo, Reference core, Updated geneset, TXT, Patch, Users, Stable IDs, Reports, Updated core IDs, Release core, Google Fusion Table, Xrefs, Release Xrefs, Google Form, Metadata, Commit]
16
Presentation of community annotation
17
Usage (as of 2015-03-30)
- 31 WebApollo instances (Organisms)
  - 3,407 gene models
- Gene metadata (protein-coding loci)
  - 4,987 gene symbols
  - 512 gene synonyms
  - 57,878 gene descriptions
  - 910 loci citations from 208 publications
18
Supplementing annotations
- Community jamborees
  - ‘Standard’ improvement (e.g. Sandfly, snail communities)
  - Glossina community (e.g. March 2015, Kenya)
- VectorBase
  - Default Xref run includes symbol/description assignment via UniProt
  - Projection of gene description via orthology from key marker species (e.g. An. gambiae). Due to be deployed for June (VB-2015-06) release.
  - Supplemental data from genome papers (e.g. 16 Anopheles spp, Musca)
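The idea of projecting gene descriptions via orthology can be sketched as follows. The slides do not describe the projection rules, so this example assumes two conservative ones: only one-to-one orthologs are used, and an existing description is never overwritten. The function `project_descriptions` and all gene identifiers in the usage example are hypothetical.

```python
from typing import Dict, List


def project_descriptions(
    target_desc: Dict[str, str],
    orthologs: Dict[str, List[str]],
    reference_desc: Dict[str, str],
) -> Dict[str, str]:
    """Return new descriptions for target genes, copied from reference orthologs.

    target_desc:    existing descriptions in the target species (may be empty strings)
    orthologs:      target gene ID -> list of reference-species ortholog IDs
    reference_desc: descriptions in the reference (marker) species
    """
    projected = {}
    for target_gene, ref_genes in orthologs.items():
        if target_desc.get(target_gene):
            continue  # assumed rule: never overwrite an existing description
        if len(ref_genes) != 1:
            continue  # assumed rule: skip one-to-many and many-to-many cases
        description = reference_desc.get(ref_genes[0])
        if description:
            projected[target_gene] = description
    return projected


# Usage with made-up identifiers:
new_descriptions = project_descriptions(
    target_desc={"TGT001": "", "TGT002": "curated description"},
    orthologs={"TGT001": ["REF001"], "TGT002": ["REF002"], "TGT003": ["REF003", "REF004"]},
    reference_desc={"REF001": "example kinase", "REF002": "other", "REF003": "x", "REF004": "y"},
)
```

Here only `TGT001` would receive a projected description: `TGT002` already has one, and `TGT003` has two candidate orthologs.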
20
Deprecated CAP system example