The GMOD Project: Creating Reusable Software Components for Genome Data Scott Cain GMOD Project Coordinator Cold Spring Harbor Laboratory.


Model Organism Databases
• Community-driven compilations of knowledge about one or more model organisms
• Genotype/phenotype correlations
• Evolutionary relationships
• Shared resources
• Genome annotation, stocks
• Other key datasets

Three Views of a Gene WormBase SGD TIGR

The GMOD Project
• Standardized solutions for model organism databases
• Multiple MODs involved
• Original participants: worm, fly, yeast, mouse, Arabidopsis, rat, rice, E. coli
• Funded by NIH, USDA/ARS, NSF
• Programmers, coordinator, help desk, workshops

The Components of GMOD
• Standard web site
• Standard file formats
• Standard browsers & editors
• Standard ontologies
• Standard schema

Sequence Ontology Karen Eilbeck (U. Utah) Slide from Karen Eilbeck

GMOD Schema: Chado
David Emmert (FlyBase), Chris Mungall (Berkeley)
• Modular and ontology-driven for flexibility and extensibility.
[Diagram labels: gene, mRNA, protein, transcript, translation_product, genomic location]
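
To make "ontology-driven" concrete, here is a minimal sketch in Python using an in-memory SQLite database. The table and column names (feature, cvterm, feature_relationship, type_id, subject_id, object_id) follow Chado's naming, but the tables are heavily simplified, and the gene, mRNA, and part_of rows are invented for illustration. Real Chado runs on PostgreSQL and has many more columns and constraints.

```python
import sqlite3

def build_demo_db():
    """Create a tiny, simplified Chado-like database in memory (illustrative only)."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE cvterm (cvterm_id INTEGER PRIMARY KEY, name TEXT UNIQUE);
        CREATE TABLE feature (
            feature_id INTEGER PRIMARY KEY,
            uniquename TEXT,
            type_id    INTEGER REFERENCES cvterm(cvterm_id)
        );
        CREATE TABLE feature_relationship (
            subject_id INTEGER REFERENCES feature(feature_id),
            object_id  INTEGER REFERENCES feature(feature_id),
            type_id    INTEGER REFERENCES cvterm(cvterm_id)
        );
    """)
    # Feature types and relationship types are rows, not columns or tables.
    db.executemany("INSERT INTO cvterm VALUES (?, ?)",
                   [(1, "gene"), (2, "mRNA"), (3, "part_of")])
    # Hypothetical features: one gene with two transcripts.
    db.executemany("INSERT INTO feature VALUES (?, ?, ?)",
                   [(10, "geneA", 1), (11, "geneA-RA", 2), (12, "geneA-RB", 2)])
    db.executemany("INSERT INTO feature_relationship VALUES (?, ?, ?)",
                   [(11, 10, 3), (12, 10, 3)])
    return db

def children_of(db, parent_name, child_type):
    """Return uniquenames of child_type features that are part_of parent_name."""
    rows = db.execute("""
        SELECT child.uniquename
        FROM feature AS parent
        JOIN feature_relationship AS fr ON fr.object_id = parent.feature_id
        JOIN feature AS child ON child.feature_id = fr.subject_id
        JOIN cvterm AS ctype ON ctype.cvterm_id = child.type_id
        JOIN cvterm AS rtype ON rtype.cvterm_id = fr.type_id
        WHERE parent.uniquename = ? AND ctype.name = ? AND rtype.name = 'part_of'
        ORDER BY child.uniquename
    """, (parent_name, child_type)).fetchall()
    return [name for (name,) in rows]
```

In this sketch, supporting a new kind of feature means inserting a cvterm row rather than altering the schema, which is the flexibility the slide refers to.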

Central Dogma Slide from Stan Letovsky

Chado – GMOD Schema David Emmert, Chris Mungall Slide from Stan Letovsky

Chado Schema Diagram created by SQL::Translator

What do you need for Chado?
• PostgreSQL (a powerful open-source RDBMS)
• BioPerl
• go-perl (the Gene Ontology Consortium's Perl tools)
• Optional:
  • XORT, a Perl tool for loading and dumping XML files to/from a database
  • ModWare, a BioPerl-compatible API built on Class::DBI

Do you need Chado? It depends…
• It is the medium of interoperation for many GMOD applications
• Chado is very good at capturing complex biological data, but…
• It is a data warehouse, and so can be a little slow to query, so…
• If you have only features on sequences, you probably want something else (but I've got that too)

Standard Browsers & Editors
• GBrowse: Web-based genome annotation viewing (Lincoln Stein, Scott Cain, CSHL)
• Apollo: Desktop-based genome annotation editing (Nomi Harris, Berkeley; Michelle Clamp, Broad)
• CMap: Web-based comparative map viewing (Ken Clark, Ben Faga, CSHL)
• GMODWeb: "Skin-able" Chado-based web site (Allen Day, Brian O'Connor, UCLA)
• Textpresso: An ontology-driven literature search tool (Hans-Michael Mueller, Caltech)

GBrowse: the Generic Genome Browser (L. Stein, S. Cain)
• Cross-platform, CGI-based sequence feature browser.
• Supports multiple database backends (flat files; Bio::DB::GFF, SeqFeature; Chado; BioSQL)
• Highly configurable.
• User annotations and features.
• Plugin architecture for importers, dumpers and drawers.

Lots of glyphs to choose from… Or create your own!

GBrowse moving to web 2.0 From jimwatsonsequence.cshl.edu

A synteny browser in GBrowse From now distributed with GBrowse in the ‘contrib’ directory.

What do you need for GBrowse?
• Apache
• libgd
• BioPerl
• Some place to put your data
• Data: GFF2 or GFF3, or GenBank records, or something loaded into Chado or BioSQL.
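
For readers who have not seen GFF3, the sketch below shows what one feature line carries. It is written in Python purely for illustration (GBrowse itself reads GFF through BioPerl) and relies only on the GFF3 layout of nine tab-separated columns with semicolon-separated tag=value attributes. The function name parse_gff3_line and the sample record are hypothetical.

```python
from urllib.parse import unquote

# The nine GFF3 columns, in file order.
GFF3_COLUMNS = ("seqid", "source", "type", "start", "end",
                "score", "strand", "phase", "attributes")

def parse_gff3_line(line):
    """Parse one GFF3 feature line into a dict; return None for comments and blanks."""
    line = line.rstrip("\n")
    if not line or line.startswith("#"):
        return None
    fields = line.split("\t")
    if len(fields) != 9:
        raise ValueError(f"expected 9 tab-separated columns, got {len(fields)}")
    record = dict(zip(GFF3_COLUMNS, fields))
    # Coordinates are 1-based and inclusive.
    record["start"] = int(record["start"])
    record["end"] = int(record["end"])
    # Attributes look like "ID=gene1;Name=abc"; values may be comma-separated
    # lists and use URL-style percent escapes.
    attrs = {}
    for pair in record["attributes"].split(";"):
        if "=" in pair:
            tag, value = pair.split("=", 1)
            attrs[unquote(tag)] = [unquote(v) for v in value.split(",")]
    record["attributes"] = attrs
    return record
```

A line such as `ctg123<TAB>example<TAB>gene<TAB>1000<TAB>9000<TAB>.<TAB>+<TAB>.<TAB>ID=gene00001;Name=EDEN` would yield a record whose type is "gene" and whose attributes include an ID and a Name.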

Installing GBrowse is easy (no, really!)
• Get Apache
• Get Perl (only if on Windows)
• Get libgd (only if on a Unix-like system)
• Get gbrowse-netinstall.pl from
• Run (sudo) perl gbrowse-netinstall.pl
• See

Getting started with GBrowse is not too hard
• Sample data installed so browsing can start right away.
• A tutorial is included to cover many aspects of track configuration, including writing Perl callbacks to do very sophisticated stuff.
• A very active user mailing list.

Apollo (Nomi Harris, Michelle Clamp, Mark Gibson)
• Downloadable Java application for editing genome annotations
• Works with GAME-XML, Chado, Chado-XML, GFF, GenBank
• for a double-click installer.

Apollo

CMap (Ken Clark, Ben Faga)
• Comparative map viewer for physical, genetic and sequence maps
• Web-based
• Developing an application to use as an assembly editor (CMAE)
• Requires Apache, an RDBMS, and many Perl modules (Bundle::CMap)

CMap

GMODWeb: a mod_perl, template-driven window into Chado (Allen Day, Brian O'Connor)
• Built on Turnkey (an autogenerated MVC website for any "reasonable" DB).
• Uses SQL::Translator to create a Perl Class::DBI API for a database.
• Creates user-customizable templates for tables in the database.

GMODWeb: Basic Skin Slide from Brian O’Connor

GMODWeb: EnsEMBL Skin Slide from Brian O’Connor

ParameciumDB—a ‘Pure’ GMOD DB

ParameciumDB Gene Page

Textpresso
• Facilitates full-text searches of research papers (search scope from single sentence to full document)
• Facilitates keyword and category searches (adds meaning)
• Ontology
  • has a set of 50 categories containing 1.1 million terms
  • consists of a scientific part (such as GO) as well as a "colloquial" one
• C. elegans corpus has 7,800 papers, 22,000 abstracts, updated weekly
Slide from Hans-Michael Mueller

Text markup
Mark up the whole corpus of papers with terms of categories, and index the mark-ups for searching.
Slide from Hans-Michael Mueller

Textpresso searching
• Case-sensitive searches (will include bracketing in the near future)
• Boolean operations for keywords
• Phrase searches
Lets you query like: "I want to learn about all genes that interact with gene x in cell B"
Slide from Hans-Michael Mueller
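
The markup-then-search idea can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not Textpresso's implementation: the two-category LEXICON, the tokenizer, and the function names mark_up and search are all hypothetical, and the search scope is fixed at a single sentence.

```python
import re

# Hypothetical two-category lexicon; Textpresso's real ontology is far larger.
LEXICON = {
    "gene": {"lin-12", "glp-1", "daf-16"},
    "interaction": {"interacts", "binds", "suppresses"},
}

def tokenize(text):
    """Split text into word tokens, keeping hyphenated gene names intact."""
    return re.findall(r"[A-Za-z0-9\-]+", text)

def mark_up(sentences):
    """Tag each sentence with the categories whose terms it contains."""
    index = []
    for sentence in sentences:
        tokens = set(tokenize(sentence))
        categories = {cat for cat, terms in LEXICON.items() if tokens & terms}
        index.append((sentence, tokens, categories))
    return index

def search(index, keyword, categories):
    """Return sentences containing keyword (case-sensitive) and every given category."""
    wanted = set(categories)
    return [sentence for sentence, tokens, cats in index
            if keyword in tokens and wanted <= cats]
```

With this index, a question like "which sentences mention glp-1 together with an interaction term?" becomes `search(index, "glp-1", ["interaction"])`, which is the kind of category-qualified query the slide describes.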

Getting started with Textpresso
• Linux
• Apache
• Lots of disk space (~3 GB per 1,000 full-text papers)
• Full-text papers in PDF format

Other Components
• Pathway Tools: metabolic pathways
• BioMart: data mining
• Ergatis: genome analysis workflow
• PubSearch/PubFetch: literature management
• Lucegene: keyword search of genome annotations
• Sybil: synteny viewer for Chado

Packaging
• RPM-based installs: biopackages.net (Fedora and CentOS)
• Virtual machines with software (new)
• Source-based "make install"
• Examples & tutorials
• Help desk
• Mailing lists

Tangible Benefits
• A community-supported platform on which to build genome-scale databases.
• New generation of semantically interoperable MODs (DAS2).
• ParameciumDB, BeetleBase, BeeBase, VectorBase, BovineBase, GallusDB, AphidBase, Xanthusbase, ToxoDB, GiardiaDB, LIS, KISS, T1Db, T2Db, CNV Browser, SwissRegulon...

More Information
for: downloads, documentation, mailing lists
Credits:
• Lincoln Stein
• Ken Clark
• Allen Day
• Karen Eilbeck
• David Emmert
• Ben Faga
• Linda Sperling
• Olivier Arnaiz
• Nomi Harris
• Mark Gibson
• Sima Mishra
• Chris Mungall
• Brian O'Connor
• Eric Just
• Don Gilbert
• Peter Karp
…and many more