Web Apollo and the VectorBase user community Gloria I. Giraldo-Calderón March 31, 2015
Outline
●Gene annotation
  o Gene automatic annotations
  o Gene manual annotation and metadata
  o Basics: A good vs. a bad gene model
  o Why do we need gene manual annotations and gene metadata?
●Why did we replace the Community Manual Annotation (CAP) with Web Apollo (WA)?
  o Offline vs. online
  o Advantages vs. disadvantages
●How do we interact with WA developers and outreach representatives?
●How do we get the community to submit data?
Gene annotation
VectorBase gene “automatic” annotations
[Figure labels: gap, 100 Ns]
-Scaffolds or supercontigs mapping (optional; not possible with bioinformatics, must be experimental)
-Gene prediction: evidence-based (BLAST), ab initio (SNAP), experimental evidence (ESTs, RNAseq, protein or peptide sequencing)
Gene “manual” annotation and metadata
Gene manual annotation and metadata
Metadata
-VectorBase gene ID (e.g., AGAP000002)
-Organism (species) (e.g., Anopheles gambiae)
-Symbol (e.g., para)
-Synonym (e.g., kdr, VSC)
-Description (e.g., voltage-gated sodium channel)
-Comments/notes (e.g., truncated gene, other part on scaffold xxx)
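The metadata fields listed above can be pictured as one record per gene. The following is a minimal sketch, not VectorBase's actual schema: the class name `GeneMetadata`, its field names and the `missing_fields` helper are assumptions made for illustration. The example values are simply the slide's own "e.g." values placed side by side, and are not claimed to describe one real gene.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class GeneMetadata:
    """Hypothetical container for the metadata fields named on the slide."""
    gene_id: str                      # VectorBase gene ID
    organism: str                     # species name
    symbol: str = ""                  # short gene symbol
    synonyms: List[str] = field(default_factory=list)
    description: str = ""             # free-text description
    comments: str = ""                # notes, e.g. about truncated models

    def missing_fields(self) -> List[str]:
        """Return the names of the curated fields that are still empty."""
        return [name for name in ("symbol", "description")
                if not getattr(self, name)]


# Illustrative values only: the slide's separate examples combined.
curated = GeneMetadata(
    gene_id="AGAP000002",
    organism="Anopheles gambiae",
    symbol="para",
    synonyms=["kdr", "VSC"],
    description="voltage-gated sodium channel",
    comments="truncated gene, other part on scaffold xxx",
)

# A gene with only an automatic ID has nothing for a user to search by.
uncurated = GeneMetadata(gene_id="AGAP000002", organism="Anopheles gambiae")

print(curated.missing_fields())
print(uncurated.missing_fields())
```

A record like `uncurated` is what an automatic annotation typically starts as; the symbol, synonyms and description are the parts that manual curation supplies.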
Why do we need gene manual annotations and gene metadata?
Genome Browser: Gene Page
Why do we need gene manual annotations and gene metadata?
For downstream analyses of gene(s), gene families or genomes, such as:
-Homologs and Phylogenetics
-Ontology
-Variation (e.g., Single Nucleotide Polymorphisms, SNPs)
Homologs and Phylogenetics
-wrong assignment of orthologs and paralogs
-gene alignment ---> tree
-wrong inference of evolutionary relationships between genes or species
-branches with a wrong length, which could suggest misleading lineage changes over time (the longer the branch, the larger the amount of change)
-wrong estimates of the ancestral and derived states of genes or species
-wrong taxonomic interpretations
Ontology
-GO: biological process (ion transport, sodium ion transport, transmembrane transport)
-GO: molecular function (ion channel activity, voltage-gated sodium channel activity, calcium ion binding)
-GO: cellular component (voltage-gated sodium channel, membrane)
Variation (e.g., Single Nucleotide Polymorphisms, SNPs)
... T T A ---> T T T ...
SNP L1014F: Leucine ---> Phenylalanine
Hypothetical example:
-A user is interested in gene “x”
-They download this gene from VB
-They start their analyses
-They find and report the presence/absence of the SNP
-If the gene of interest is not correctly annotated, e.g., missing an exon or part of an exon, the results are going to be wrong
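The hypothetical example above can be made concrete with a toy sequence. This is only a sketch: the exon sequences, the four-codon table and the helper names `translate` and `residue_at` are invented for illustration, and the codon of interest sits at protein position 3 here rather than 1014. It shows that TTA codes for leucine and TTT for phenylalanine, and that a gene model missing part of an exon shifts the residue numbering so the SNP is looked for in the wrong place.

```python
# Toy codon table: only the codons used in this illustration.
CODON_TABLE = {"ATG": "M", "GGA": "G", "TTA": "L", "TTT": "F"}


def translate(cds):
    """Translate a coding sequence codon by codon using the toy table."""
    usable = len(cds) - len(cds) % 3
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, usable, 3))


def residue_at(cds, position):
    """Return the amino acid at a 1-based protein position."""
    return translate(cds)[position - 1]


# Hypothetical two-exon gene; the codon of interest is the third codon.
exon1 = "ATGGGA"          # M G
exon2_ref = "TTAGGA"      # L G (reference allele)
exon2_alt = "TTTGGA"      # F G (variant allele)

full_ref = exon1 + exon2_ref
full_alt = exon1 + exon2_alt

# With a correct gene model the substitution is found at position 3.
print(residue_at(full_ref, 3), "->", residue_at(full_alt, 3))

# A bad gene model that is missing the first codon of exon 1.
truncated_alt = exon1[3:] + exon2_alt

# Position 3 now points at a different residue, so the SNP is missed.
print(translate(truncated_alt), residue_at(truncated_alt, 3))
```

In the truncated model the phenylalanine is still present, but at position 2, so a check at the expected coordinate would wrongly report the SNP as absent.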
-The size of the genomes
-The phylogenetic distance among genomes
Number of genomes (genome size):
-VB: 37 (110 Mbp – 3,000 Mbp)
-EuPathDB: 186 (2 Mbp – 193 Mbp)
-PATRIC: 3,481 Bacteria & 186 Archaea (10 kbp – 14 Mbp)
-ViPR: 546,381 & IRD: 365,618 (few kbp – 250 kbp)
Why did we replace the Community Manual Annotation (CAP) with Web Apollo?
Offline vs. online curation
[Figure: Community Manual Annotation (CAP) vs. Web Apollo; track labels: gene models, RNAseq, User-created Annotations]
Advantages & Disadvantages
Community Manual Annotation (CAP)
-People had to use Artemis or (Desktop) Apollo, which requires downloading scaffolds or supercontigs from VB
-VB gene updates can take 2 months or more, so more than one person may end up working on the same gene
-Most of the time our internal GFF3 validator found issues with the submitted data files
Web Apollo
-Is web-based, which allows easier collaboration
-There is, however, no clear way to indicate or know whether a user is “still working” on or “done” with an annotation
-New annotations, though, are instantly visible to all users of WA
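To give a sense of the kind of problem a GFF3 validator can flag in a submitted file, here is a minimal sketch of a per-line check. It is not VectorBase's internal validator, whose rules are not described here. The function name `check_gff3_line` and the sample lines are assumptions, and only a few basic rules of the GFF3 format are covered: nine tab-separated columns, integer coordinates with start <= end, a valid strand, and a phase for CDS features.

```python
VALID_STRANDS = {"+", "-", ".", "?"}
VALID_PHASES = {"0", "1", "2", "."}


def check_gff3_line(line):
    """Return a list of problems found in one GFF3 feature line."""
    columns = line.rstrip("\n").split("\t")
    if len(columns) != 9:
        return ["expected 9 tab-separated columns, found %d" % len(columns)]

    seqid, source, ftype, start, end, score, strand, phase, attrs = columns
    problems = []

    # Coordinates are 1-based integers with start <= end.
    if not (start.isdigit() and end.isdigit()):
        problems.append("start and end must be positive integers")
    elif int(start) < 1 or int(start) > int(end):
        problems.append("start must be >= 1 and <= end")

    if strand not in VALID_STRANDS:
        problems.append("invalid strand: %r" % strand)

    if phase not in VALID_PHASES:
        problems.append("invalid phase: %r" % phase)
    elif ftype == "CDS" and phase == ".":
        problems.append("CDS features require a phase of 0, 1 or 2")

    return problems


# Hypothetical sample lines (names and coordinates are made up).
good = "scaffold_1\tmanual\tCDS\t100\t250\t.\t+\t0\tID=cds1;Parent=mRNA1"
bad = "scaffold_1\tmanual\tCDS\t300\t250\t.\t+\t.\tID=cds2;Parent=mRNA1"

print(check_gff3_line(good))
print(check_gff3_line(bad))
```

Files exported from desktop editors can fail checks like these; working in a single web-based tool avoids that file-exchange step.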
How do we interact with Web Apollo developers and outreach representatives?
-Developers:
  ○Monthly WA developers open conference call
-Outreach:
  ○Meetings, workshops and conferences
  ○E-mail or phone
We are also subscribed to their user list (help desk).
How do we get the community to submit data?
-The first invitation comes directly from the genome leaders (genome paper)
-Users send e-mails to our help desk
-During outreach events, such as workshops, meetings and conferences
-Social media posts (Facebook and Twitter)
-Help content: Tutorial page
Genome group manual annotation efforts
-Workshops
-Annotation jamborees
-Webinars
-Independent work
Help content: Tutorial page
-Decision tree
-FAQs
-Web Apollo resources: user guide, slides with speaker notes, sample exercises
-Documentation about available tracks
-Video tutorial (Intro, ~50 min) and a video clip (Intron/exon boundaries, ~2:45 min)
User submission stats and import of data to VectorBase
To be continued by Daniel Lawson...