Web Apollo and the VectorBase user community Gloria I. Giraldo-Calderón March 31, 2015.


Outline
● Gene annotation
  o Gene automatic annotations
  o Gene manual annotation and metadata
  o Basics: a good vs. a bad gene model
  o Why do we need gene manual annotations and gene metadata?
● Why did we replace the Community Manual Annotation (CAP) with Web Apollo (WA)?
  o Offline vs. online
  o Advantages vs. disadvantages
● How do we interact with WA developers and outreach representatives?
● How do we get the community to submit data?

Gene annotation

VectorBase gene “automatic” annotations
- Gap: 100 Ns
- Scaffolds or supercontigs
- Mapping (optional; not possible with bioinformatics, must be experimental)
- Gene prediction: evidence based (BLAST), ab initio (SNAP), experimental evidence (ESTs, RNAseq, protein or peptide sequencing)
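The gap label on this slide (a run of 100 Ns inside a scaffold) can be illustrated with a short Python sketch. The function name `find_n_gaps`, the `min_length` default and the toy sequence are assumptions made for this example; the slide does not say whether every gap in a VectorBase assembly is exactly 100 Ns long.

```python
import re


def find_n_gaps(sequence, min_length=100):
    """Return (start, end) pairs, 0-based and end-exclusive, for runs of N.

    min_length=100 mirrors the "gap 100 Ns" label on the slide; it is an
    assumption of this sketch, not a documented VectorBase rule.
    """
    pattern = re.compile(r"[Nn]{%d,}" % min_length)
    return [(m.start(), m.end()) for m in pattern.finditer(sequence)]


# Toy scaffold: two 20 bp contigs joined by a 100 N gap (hypothetical data).
toy_scaffold = "ACGT" * 5 + "N" * 100 + "TTGA" * 5
print(find_n_gaps(toy_scaffold))  # [(20, 120)]
```

Gene models that span such a gap are natural candidates for a closer look during manual annotation, since the sequence inside the gap is unknown.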

Gene “manual” annotation and metadata

Gene manual annotation and metadata

Metadata
- VectorBase gene ID (e.g., AGAP000002)
- Organism (species) (e.g., Anopheles gambiae)
- Symbol (e.g., para)
- Synonym (e.g., kdr, VSC)
- Description (e.g., voltage-gated sodium channel)
- Comments/notes (e.g., truncated gene, other part on scaffold xxx)
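As a rough illustration, the metadata fields listed on this slide could be held in a small Python record. The class name `GeneMetadata` and its field names are hypothetical and are not VectorBase's schema. The example values are copied field by field from the slide and do not necessarily describe one and the same gene.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class GeneMetadata:
    """Hypothetical container for the metadata fields named on the slide."""
    gene_id: str                      # VectorBase gene ID
    organism: str                     # species name
    symbol: str = ""                  # short gene symbol
    synonyms: List[str] = field(default_factory=list)
    description: str = ""             # free-text gene description
    comments: str = ""                # curator comments/notes


# Per-field example values taken from the slide (illustrative only).
example = GeneMetadata(
    gene_id="AGAP000002",
    organism="Anopheles gambiae",
    symbol="para",
    synonyms=["kdr", "VSC"],
    description="voltage-gated sodium channel",
    comments="truncated gene, other part on scaffold xxx",
)
print(example.symbol, example.synonyms)
```

Only the gene ID and organism are required here; that split is a design choice of the sketch, reflecting that a symbol, synonyms or comments may not exist yet for a newly predicted gene.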

Why do we need gene manual annotations and gene metadata?

Genome Browser: Gene Page

Why do we need gene manual annotations and gene metadata?
For downstream analyses of gene(s), gene families or genomes, such as:
- Homologs and Phylogenetics
- Ontology
- Variation (e.g., Single Nucleotide Polymorphisms, SNPs)

Homologs and Phylogenetics
- Wrong assignment of orthologs and paralogs
- Gene alignment ---> tree
- Wrong inference of evolutionary relationships between genes or species
- Branches with a wrong length, which could suggest misleading lineage changes over time (the longer the branch, the larger the amount of change)
- Wrong estimates of the ancestral and derived states of genes or species
- Wrong taxonomic interpretations

Ontology
- GO: biological process (ion transport, sodium ion transport, transmembrane transport)
- GO: molecular function (ion channel activity, voltage-gated sodium channel activity, calcium ion binding)
- GO: cellular component (voltage-gated sodium channel, membrane)

Variation (e.g., Single Nucleotide Polymorphisms, SNPs)
... TTA ... ---> ... TTT ...
SNP L1014F: Leucine ---> Phenylalanine
Hypothetical example:
- A user is interested in gene “x”
- They download this gene from VB
- They start their analyses
- They find/report the presence/absence of the SNP
- If the gene of interest is not correctly annotated, e.g., missing an exon or part of an exon, the results are going to be wrong
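The L1014F example can be worked through in a few lines of Python. This is a minimal sketch using the standard genetic code; the helper names `substitution_name` and `codon_at` and the 12 bp toy coding sequence are assumptions made for illustration and are not real VectorBase data.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def substitution_name(ref_codon, alt_codon, aa_position):
    """Name an amino acid substitution, e.g. TTA -> TTT at 1014 is L1014F."""
    ref_aa = CODON_TABLE[ref_codon.upper()]
    alt_aa = CODON_TABLE[alt_codon.upper()]
    return f"{ref_aa}{aa_position}{alt_aa}"


def codon_at(cds, aa_position):
    """Return the codon at a 1-based amino acid position of a coding sequence."""
    start = 3 * (aa_position - 1)
    return cds[start:start + 3]


print(substitution_name("TTA", "TTT", 1014))  # L1014F

# Toy coding sequence (hypothetical): codon 3 is TTA.
toy_cds = "ATGGCTTTAGGA"
# The same gene with its first codon missing from the gene model.
truncated_cds = toy_cds[3:]
print(codon_at(toy_cds, 3), codon_at(truncated_cds, 3))  # TTA GGA
```

The last line shows the problem described on the slide in miniature: once part of the coding sequence is missing from the model, position 3 no longer points at the TTA codon, so a presence/absence call made at that position would be wrong.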


- The size of the genomes
- The phylogenetic distance among genomes

Number of genomes (genome size):
- VB: 37 (110 Mbp – 3,000 Mbp)
- EuPathDB: 186 (2 Mbp – 193 Mbp)
- PATRIC: 3,481 Bacteria & 186 Archaea (10 kbp – 14 Mbp)
- ViPR: 546,381 & IRD: 365,618 (few kbp – 250 kbp)

Why did we replace the Community Manual Annotation (CAP) with Web Apollo?

Offline vs. online curation
- Offline: Community Manual Annotation (CAP)
- Online: Web Apollo (tracks shown: gene models, RNAseq, User-created Annotations)

Advantages & Disadvantages
Community Manual Annotation (CAP)
- People had to use Artemis or (Desktop) Apollo, which requires downloading scaffolds or supercontigs from VB
- VB gene updates can take 2 months or more → more than one person may end up working on the same gene
- Most of the time, our internal GFF3 validator found issues with submitted data files
Web Apollo
- It is web-based, which allows easier collaboration
- There is not, however, a clear way to indicate/know when a user is “still working” or “done” with an annotation
- New annotations, though, are instantaneously visible to all users of WA
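To give a flavour of the kind of issue a GFF3 check can catch in a submitted file, here is a minimal Python sketch. It is not VectorBase's internal validator, whose rules are not described in these slides. The function name `check_gff3_line` and the example feature lines are hypothetical, and only a few basic GFF3 rules are covered: nine tab-separated columns, integer coordinates with start ≤ end, a valid strand, and a phase on CDS features.

```python
import re


def check_gff3_line(line):
    """Return a list of problems found in one GFF3 feature line."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 9:
        return [f"expected 9 tab-separated columns, found {len(cols)}"]

    seqid, source, ftype, start, end, score, strand, phase, attrs = cols
    problems = []

    # Coordinates are 1-based integers with start <= end.
    if not (re.fullmatch(r"[0-9]+", start) and re.fullmatch(r"[0-9]+", end)):
        problems.append("start and end must be integers")
    elif int(start) < 1 or int(start) > int(end):
        problems.append("start must be >= 1 and <= end")

    if strand not in {"+", "-", ".", "?"}:
        problems.append("strand must be one of + - . ?")

    # CDS features require a phase of 0, 1 or 2; other features may use '.'.
    if ftype == "CDS" and phase not in {"0", "1", "2"}:
        problems.append("CDS features need a phase of 0, 1 or 2")
    elif phase not in {".", "0", "1", "2"}:
        problems.append("phase must be 0, 1, 2 or '.'")

    return problems


# Hypothetical feature lines, for illustration only.
good = "scaffold_1\tcommunity\tgene\t100\t900\t.\t+\t.\tID=gene0001"
bad = "scaffold_1\tcommunity\tCDS\t900\t100\t.\t+\t.\tID=cds0001"
print(check_gff3_line(good))  # []
print(check_gff3_line(bad))
```

A check like this runs after a file has been submitted, which is part of why offline submissions needed rounds of correction; in a web-based editor the annotation is created inside the tool instead of being hand-edited as a file.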

How do we interact with Web Apollo developers and outreach representatives?
- Developers:
  ○ Monthly WA developers open conference call
  ○
- Outreach:
  ○ Meetings, workshops and conferences
  ○ E-mail or phone
We are also subscribed to their user list (help desk).

How do we get the community to submit data?
- First invitation comes from genome leaders directly (genome paper)
- Users send e-mails to our help desk
- During outreach events, such as workshops, meetings and conferences
- Social media posts (Facebook and Twitter)
- Help content: Tutorial page

Genome group manual annotation efforts
- Workshops
- Annotation jamborees
- Webinars
- Independent work

Help content: Tutorial page
- Decision tree
- FAQs
- Web Apollo resources: user guide, slides with speaker notes, sample exercises
- Documentation about available tracks
- Video tutorial (Intro, ~50 min) and a video clip (Intron/exon boundaries, ~2:45 min)

Users’ submission stats and importation of data to VectorBase
To be continued by Daniel Lawson...