
The new VectorBase: our improved resource for invertebrate vectors
Scott Emrich, on behalf of VectorBase
“Bigger, better, faster” or “consolidate, improve and rationalise” (UK)

VectorBase has been mostly a collator of genomes. [Figure legend: Full release, Pre-released*, Organism pages, Raw GenBank data from sequencing centers, (our) Annotation.]

Rapid growth, however, in the past 5 years.

VectorBase is also:
- a service providing tools for browsing and mining vector “-omics” data;
- a content generator, mostly of genome annotation (covered in later talks);
- committed to regular releases (5-6 per year);
- a help desk that supports our community with genome informatics and facilitates data submission.

In the end, VectorBase is a team. And YOU!

Left side: welcome message, available data, tools and resources. Right side: past jobs, organisms (2), latest news.

Left side: community. Right side: rotating tips, newsletters, upcoming meetings.

This is the new organism page: it collects strains, data, and relevant tools.

~ jobs per month, mostly Anopheles but also other species.

Web development goals (2015)
- Patching/upgrading Web Apollo instances
  - multiple genomes in one instance
  - reworked framework to improve performance
- Integrating subcontractor work with the Drupal CMS
  - easier releases and better cross-site development
- Sitewide authentication for single user accounts across Drupal, Web Apollo and Galaxy

Modifying Web Apollo: an example

Advanced Search
Antelmo (ND) is making Advanced Search more stable and intuitive via Drupal and Solr.
- It also allows looking at saved searches, for advanced analysis of BRC usage.
- We are now running Solr 4.x to further support PopBio.
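As a rough illustration of the requests a Drupal front end might send to Solr for Advanced Search, the sketch below builds a standard Solr /select URL with the Python standard library. The host, core name (vectorbase) and field names (description, species) are hypothetical placeholders rather than VectorBase's real schema; q, fq, start, rows and wt are standard Solr query parameters.

```python
from urllib.parse import urlencode


def build_solr_select_url(base_url, core, query, filters=None, start=0, rows=20):
    """Build a Solr /select URL (host, core and field names are hypothetical).

    q, fq, start, rows and wt are standard Solr query parameters.
    """
    params = [("q", query), ("start", start), ("rows", rows), ("wt", "json")]
    # Each filter query becomes its own fq parameter; Solr intersects them.
    for fq in filters or []:
        params.append(("fq", fq))
    return f"{base_url.rstrip('/')}/solr/{core}/select?{urlencode(params)}"


url = build_solr_select_url(
    "http://localhost:8983",  # hypothetical Solr host
    "vectorbase",             # hypothetical core name
    "description:kinase",     # hypothetical field:value query
    filters=['species:"Anopheles gambiae"'],
)
print(url)
```

Keeping filters in separate fq parameters, rather than folding them into q, lets Solr cache each filter independently of the free-text query.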

Current VectorBase variation + PopBio dataflows. [Diagram: VCF and ISA-TAB inputs, linked by sample + variation set IDs; Ensembl variation database (display of variant data in genomic context); PopBio (display of detailed sample metadata, e.g. geodata).]
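On the VCF side, each variant line carries genotype calls in columns named by sample ID, and sample identifiers are one of the keys the diagram shows linking variation data to ISA-TAB sample metadata. A minimal standard-library sketch of reading one VCF body line into per-sample genotypes follows; the sample IDs S1/S2 and the example record are made up, INFO parsing is skipped, and this is not VectorBase's actual loader.

```python
VCF_FIXED_COLUMNS = ["CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO", "FORMAT"]


def parse_vcf_line(header_line, data_line):
    """Parse one tab-delimited VCF body line into a dict with per-sample genotypes.

    header_line is the '#CHROM ...' line; sample IDs are the columns after FORMAT.
    Simplified: no INFO parsing and no handling of malformed records.
    """
    sample_ids = header_line.lstrip("#").rstrip("\n").split("\t")[9:]
    fields = data_line.rstrip("\n").split("\t")
    record = dict(zip(VCF_FIXED_COLUMNS, fields[:9]))
    record["POS"] = int(record["POS"])
    format_keys = record["FORMAT"].split(":")
    # Map each sample ID to its GT value ("." if GT is absent from FORMAT).
    record["genotypes"] = {
        sample_id: dict(zip(format_keys, value.split(":"))).get("GT", ".")
        for sample_id, value in zip(sample_ids, fields[9:])
    }
    return record


header = "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2"
line = "2L\t1000\t.\tA\tG\t50\tPASS\t.\tGT:DP\t0/1:12\t1/1:8"
rec = parse_vcf_line(header, line)
print(rec["CHROM"], rec["POS"], rec["genotypes"])
```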

Use of Apache Solr to provide unified search (and thus integration) across the BRC. [Diagram elements: VCF, ISA-TAB, Ensembl variation database, PopBio, display of variant data in genomic context, display of detailed sample metadata (e.g. geodata).]
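“Unified search” across the BRC can be pictured as indexing variant records and sample metadata into one Solr index, so that a single query can match either kind of record. The sketch below flattens both into Solr-style JSON documents; the field names (doc_type, sample_id, collection_site), the denormalisation strategy and the toy data are assumptions for illustration, not the BRC's actual schema.

```python
import json


def is_carrier(gt):
    """True if a GT call (e.g. '0/1', '1|1') contains any non-reference, non-missing allele."""
    alleles = gt.replace("|", "/").split("/")
    return any(a not in ("0", ".") for a in alleles)


def build_search_docs(variants, samples_by_id):
    """Flatten variants plus sample metadata into Solr-style JSON documents.

    variants: dicts with hypothetical keys 'id', 'chrom', 'pos', 'genotypes'.
    samples_by_id: sample ID -> metadata dict with a hypothetical 'site' key.
    """
    docs = []
    for v in variants:
        carriers = sorted(s for s, gt in v["genotypes"].items() if is_carrier(gt))
        docs.append({
            "id": f"variant:{v['id']}",
            "doc_type": "variant",
            "chrom": v["chrom"],
            "pos": v["pos"],
            # Copy sample-level fields onto the variant so one query can filter on both.
            "sample_id": carriers,
            "collection_site": sorted({samples_by_id[s]["site"] for s in carriers if s in samples_by_id}),
        })
    for sample_id, meta in samples_by_id.items():
        docs.append({"id": f"sample:{sample_id}", "doc_type": "sample", "collection_site": meta["site"]})
    return docs


variants = [{"id": "v1", "chrom": "2L", "pos": 1000, "genotypes": {"S1": "0/1", "S2": "0/0"}}]
samples = {"S1": {"site": "site_A"}, "S2": {"site": "site_B"}}
print(json.dumps(build_search_docs(variants, samples), indent=2))
```

Denormalising sample fields onto variant documents trades index size for query simplicity; at millions of variants a real schema would likely need a different balance.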

PopBio import
- Current size: 121 projects, samples, 172,636 assays (of which 4,387 are IR, insecticide resistance)
- At present, loading can be done overnight, but this may change.
- The web interface is fast only because of “preloading,” which definitely isn't scalable.

PopBio plans
- Map interface: delivery for the June release and the Kolymbari and ICEMR meetings.
- Spreadsheet submission wizard: development scheduled for the fall.
- Year 2: sample × genotype browser development, including e! (Ensembl) REST and variation Solr work.
- Year 2: refactor project pages with scalable (but still flexible) data transfer, probably also Solr-driven, and update graphics.
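The concern about “preloading” and the plan for scalable, probably Solr-driven data transfer suggest fetching results page by page on demand rather than loading everything up front. One standard Solr mechanism for this is cursor-based deep paging (cursorMark). The sketch below assumes a hypothetical fetch_page callable and uses an in-memory stand-in for Solr so it runs offline; it illustrates the technique, not PopBio's implementation.

```python
from typing import Callable, Dict, Iterator


def iter_solr_docs(fetch_page: Callable[[str, int], Dict], rows: int = 500) -> Iterator[Dict]:
    """Stream documents page by page with Solr cursorMark deep paging.

    fetch_page(cursor_mark, rows) is a hypothetical callable that sends a /select
    request with cursorMark and rows set (plus a sort that includes the uniqueKey
    field, which Solr requires for cursors) and returns the parsed JSON response.
    """
    cursor = "*"  # Solr's initial cursorMark value
    while True:
        resp = fetch_page(cursor, rows)
        yield from resp["response"]["docs"]
        next_cursor = resp["nextCursorMark"]
        # Solr signals the end of results when nextCursorMark stops changing.
        if next_cursor == cursor:
            return
        cursor = next_cursor


# In-memory stand-in for Solr (hypothetical), so the sketch runs without a server.
DATA = [{"id": f"sample-{i}"} for i in range(7)]


def fake_fetch(cursor: str, rows: int) -> Dict:
    start = 0 if cursor == "*" else int(cursor)
    page = DATA[start:start + rows]
    return {"response": {"docs": page}, "nextCursorMark": str(start + len(page))}


for doc in iter_solr_docs(fake_fetch, rows=3):
    print(doc["id"])
```

Because the generator only requests the next page when the consumer asks for more, memory use stays bounded by the page size rather than by the project size.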

Scaling up to millions of SNPs and thousands of samples
We plan to develop, or modify, something similar to MalariaGEN's Panoptes, with richer and more flexible metadata capabilities.

Upcoming genome updates
- June 2015: sandflies x2; Anopheles assembly updates x4
- Summer: QC of Glossina workshop data, 16G data
- August 2015: release of MalariaGEN 1000G data (pending publication plans); we expect ~50 million new malaria mosquito variants by the end of summer
- October: Glossina species x6

Updating genes and assemblies
We recently supported the Glossina gene annotation workshop held in Kenya (3/2015). The workshop data will be integrated into the existing Glossina databases for release in late … A new database for the final species (Glossina palpalis) will also be created for release in late …
Assembly updates for An. farauti, An. melas, An. merus and An. sinensis have been examined to assess whether we can project gene information onto the new assemblies. Over 90% of transcripts could be projected, and we intend to schedule the assembly updates for Q…
New databases have been proposed for Sarcoptes scabiei var. canis and Aedes albopictus. Emrich, Hahn, Lawniczak and Besansky will submit a new reference genome of An. gambiae (S) for summer 2015.
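Projecting gene information onto a new assembly can be pictured, in its simplest form, as translating annotation coordinates through an alignment between the old and new sequences, then counting the transcripts whose exons survive intact; a statistic like the “over 90% of transcripts” above could be computed this way. This liftOver-style sketch assumes ungapped, same-strand alignment blocks in a made-up (old_start, new_start, length) format and invented coordinates; the projection pipeline actually used at EBI is not described here and is likely more sophisticated.

```python
from bisect import bisect_right


def make_projector(blocks):
    """Return project(pos) mapping old-assembly positions onto a new assembly.

    blocks: ungapped, same-strand alignment blocks (old_start, new_start, length),
    0-based and non-overlapping on the old assembly (a hypothetical input format).
    Positions outside every block return None.
    """
    blocks = sorted(blocks)
    starts = [b[0] for b in blocks]

    def project(pos):
        i = bisect_right(starts, pos) - 1  # last block starting at or before pos
        if i < 0:
            return None
        old_start, new_start, length = blocks[i]
        if pos < old_start + length:
            return new_start + (pos - old_start)
        return None

    return project


def exon_projects(start, end, project):
    """Simplified check: both ends of a half-open exon [start, end) map, length unchanged."""
    new_start, new_last = project(start), project(end - 1)
    return new_start is not None and new_last is not None and new_last - new_start == end - 1 - start


def fraction_projected(transcripts, project):
    """Fraction of transcripts (lists of exons) whose exons all project."""
    if not transcripts:
        return 0.0
    ok = sum(all(exon_projects(s, e, project) for s, e in exons) for exons in transcripts)
    return ok / len(transcripts)


# Old 0-499 -> new 100-599; old 500-599 is unaligned; old 600-999 -> new 650-1049.
project = make_projector([(0, 100, 500), (600, 650, 400)])
transcripts = [[(10, 50), (200, 300)], [(650, 700)], [(450, 620)]]
print(fraction_projected(transcripts, project))  # third transcript spans the unaligned gap
```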

Improved EBI production
Data management systems: Web Apollo databases have been set up for 32 organisms and are being actively used by the community for Biomphalaria glabrata (snail), Phlebotomus papatasi and Lutzomyia longipalpis (sandflies), Musca domestica (house fly) and the five current Glossina (tsetse) species.
IT infrastructure: VectorBase production pipelines are being migrated to the EBI eHive system (…). This encourages standardization of our code base and also allows us to use EBI parallel computing resources.
Analysis tools: new pipelines for xrefs, search, protein alignment and Exonerate-based sequence alignment have been developed using the eHive system. This has allowed us to speed up run times, in addition to the advantages above.

Future production work at EBI
Search: we had previously experienced scaling problems with the generation of Solr indices for the VectorBase search, and have now rewritten the core gene Solr index generation for eHive.
Updating genome data: projection of gene descriptions between closely related orthologs will be introduced in an attempt to improve basal gene annotation in some of the new species. First deployment of this code is scheduled for June. Transcript, genomic sequence and GTF/GFF dumping have been included in the eHive pipeline, but data files are still updated manually on the VectorBase Drupal site.
Adding the UCSC track hub system to facilitate metadata and additional “-omics” data.
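A UCSC track hub is a set of plain-text files served over HTTP: hub.txt names the hub and points to genomes.txt, which maps each genome to a trackDb.txt listing tracks backed by indexed binary files such as bigWig or bigBed. As a minimal sketch of what adding hub support involves, the Python below writes those three files. The hub name, genome name, e-mail and track details are placeholders, and a hub for a genome the browser does not already host (an assembly hub) needs further genomes.txt settings such as twoBitPath, which are omitted here.

```python
import tempfile
from pathlib import Path


def write_track_hub(out_dir, hub_name, genome, email, tracks):
    """Write a minimal UCSC-style track hub: hub.txt, genomes.txt and <genome>/trackDb.txt.

    tracks: dicts with keys 'name', 'url', 'short', 'long', 'type' (e.g. 'bigWig', 'bigBed').
    Every name, label and URL here is a placeholder supplied by the caller.
    """
    out = Path(out_dir)
    (out / genome).mkdir(parents=True, exist_ok=True)
    (out / "hub.txt").write_text(
        f"hub {hub_name}\n"
        f"shortLabel {hub_name}\n"
        f"longLabel {hub_name} track hub\n"
        "genomesFile genomes.txt\n"
        f"email {email}\n"
    )
    (out / "genomes.txt").write_text(f"genome {genome}\ntrackDb {genome}/trackDb.txt\n")
    # One stanza per track; stanzas are separated by a blank line.
    stanzas = [
        f"track {t['name']}\n"
        f"bigDataUrl {t['url']}\n"
        f"shortLabel {t['short']}\n"
        f"longLabel {t['long']}\n"
        f"type {t['type']}\n"
        for t in tracks
    ]
    (out / genome / "trackDb.txt").write_text("\n".join(stanzas))
    return out


with tempfile.TemporaryDirectory() as tmp:
    hub = write_track_hub(
        tmp, "exampleHub", "exampleAssembly", "helpdesk@example.org",
        [{"name": "rnaseq_cov", "url": "rnaseq.bw", "short": "RNA-seq",
          "long": "RNA-seq coverage (example)", "type": "bigWig"}],
    )
    print((hub / "exampleAssembly" / "trackDb.txt").read_text())
```

Because hubs are static files, new “-omics” tracks and their metadata can be published by adding stanzas rather than by loading data into a database.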