GUS: A Functional Genomics Data Management System Chris Stoeckert, Ph.D. Center for Bioinformatics and Dept. of Genetics University of Pennsylvania ASM.

Presentation transcript:

GUS: A Functional Genomics Data Management System Chris Stoeckert, Ph.D. Center for Bioinformatics and Dept. of Genetics University of Pennsylvania ASM Conference on Functional Genomics and Bioinformatics Approaches to Infectious Disease Research October 8, 2004 Portland, Oregon

Database Options for Integrated Functional Genomics
Requirements
  Covers genomics and functional genomics
  Active and open developer community
Options
  GUS: Genomics Unified Schema
  Chado: generic model organism database (GMOD)

A Few GUS Web Sites
GUS: Core, SRES, TESS, RAD, DoTS
Oracle RDBMS; Object Layer for Data Loading; Java Servlets
Sanger Institute; U. Georgia; Flora Centromere Database; U. Chicago; U. Penn; U. Toronto; Phytophthora sojae genome; Virginia Bioinformatics Institute

GUS (Genomics Unified Schema)
Namespace   Domain                    Features
RAD         Gene Expression           MIAME/MAGE-OM
DoTS        Sequence and annotation   EST clusters; Gene models
Core        Data Provenance           Documentation
Sres        Shared Resources          Ontologies
TESS        Gene Regulation           TFBS organization

RAD; EST clustering and assembly; DoTS; Genomic alignment and comparative sequence analysis; Identify shared TF binding sites; TESS; BioMaterial annotation; SRES

Examples of GUS users
Large sequencing center
  GeneDB: Pathogen Sequencing Unit at the Sanger Institute
Lightly staffed genomics project
  CryptoDB: Kissinger Lab, University of Georgia
Data mining project
  Multiple plant species: Brett Tyler, Virginia Bioinformatics Institute and collaborators
Expression-based project
  dbDirt: Allen Okey, University of Toronto
Bioinformatics Core Facility
  University of Pennsylvania Bioinformatics Core Facility

GUS Project Goals
Provide:
  A platform for broad genomics data integration
  An infrastructure system for functional genomics
Support:
  Websites with advanced query capabilities
  Research-driven queries and mining

GUS components
Warehouse (Oracle or PostgreSQL)
Perl Object Layer
Web Development Kit
Queries and analysis
Data sources: Your data, GenBank, NRDB, dbEST, SNPs, Genetraps, MicroArrays, Phenotypes, Pathways, Orthologs, Taxonomy, GO, SO, EC, More…
Data Load API
Pipeline API
Plugins (data loaders)

Functional genomics with GUS
Sequence & Features; Functional Annotation of the Genome; Central Dogma; Regulation (TESS); Interaction
Expression (RAD): Study, Sample, Image Analysis, Statistical Processing (MIAME)
Proteomics: Study, Sample, Image Analysis, Statistical Processing (MIAPE, psidev.sf.net)
In Situ Hybridization, ImmunoHistChem: Study, Sample, Image Analysis, Statistical Processing (MISFISHIE)

GUS versus Chado
GUS represents biology in the database tables
  Forces applications to load and retrieve data consistently
Chado represents biology in the applications
  Allows flexibility in what can be stored, but applications may not be consistent
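The contrast drawn on this slide can be illustrated with a small sketch. It is not the real GUS or Chado schema: the table names (`na_sequence`, `gene_feature`, `generic_feature`), their columns, and the toy rows are all hypothetical, and SQLite stands in for Oracle or PostgreSQL. The typed table rejects inconsistent rows at load time, while the generic table accepts whatever the application sends.

```python
import sqlite3

# Hypothetical toy schema for illustration only, NOT the real GUS or Chado tables.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
conn.executescript("""
-- Style 1: biology is modeled in the tables, so the database enforces it.
CREATE TABLE na_sequence (
    na_sequence_id INTEGER PRIMARY KEY,
    source_id      TEXT NOT NULL
);
CREATE TABLE gene_feature (
    gene_feature_id INTEGER PRIMARY KEY,
    na_sequence_id  INTEGER NOT NULL REFERENCES na_sequence(na_sequence_id),
    start_pos       INTEGER NOT NULL,
    end_pos         INTEGER NOT NULL CHECK (end_pos >= start_pos)
);
-- Style 2: a generic table; the meaning of each row is left to the application.
CREATE TABLE generic_feature (
    feature_id INTEGER PRIMARY KEY,
    type       TEXT NOT NULL,
    properties TEXT
);
""")
conn.execute("INSERT INTO na_sequence (source_id) VALUES ('toy_chr1')")
conn.commit()


def typed_insert_ok(na_sequence_id, start, end):
    """Try to load a gene feature into the typed table; report success."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO gene_feature (na_sequence_id, start_pos, end_pos) "
                "VALUES (?, ?, ?)",
                (na_sequence_id, start, end),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def generic_insert_ok(feature_type, properties):
    """Load a row into the generic table; nothing checks its content."""
    with conn:
        conn.execute(
            "INSERT INTO generic_feature (type, properties) VALUES (?, ?)",
            (feature_type, properties),
        )
    return True


results = {
    "typed, valid": typed_insert_ok(1, 100, 500),
    "typed, unknown sequence": typed_insert_ok(99, 100, 500),
    "typed, end before start": typed_insert_ok(1, 500, 100),
    "generic, end before start": generic_insert_ok("gene", '{"start": 500, "end": 100}'),
}
print(results)
```

In this sketch the consistency rules live in the table definitions, so every loader is held to them; in the generic style the same rules would have to be repeated in each application.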

Central dogma and sequences
NA Sequence; Gene Feature; RNA Feature; Protein Feature; AA Sequence

Central dogma and sequences
Gene; RNA; Protein
NA Sequence; AA Sequence
Gene Feature; RNA Feature; Protein Feature

Central dogma and sequences
Gene; RNA; Protein
NA Sequence; AA Sequence
genome; Multiple sequences (experimental variety)
Gene 1; Gene 2; RNA; Multiple genes

Central dogma and sequences
Gene; RNA; Protein
NA Sequence; AA Sequence
Gene Instance; RNA Instance; Protein Instance
Gene Feature; RNA Feature; Protein Feature
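One possible reading of these diagrams is that a Gene is an abstract entity, a Gene Feature is a located annotation on a particular NA Sequence, and a Gene Instance ties the two together so that one gene can be observed on several sequences. The sketch below models only that reading for genes; the class and field names are illustrative assumptions, not the GUS object layer, and the identifiers are made up.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class NASequence:
    """A nucleic acid sequence, e.g. a chromosome or a sequenced clone."""
    source_id: str


@dataclass
class GeneFeature:
    """A gene annotation located on one specific NA sequence."""
    na_sequence: NASequence
    start: int
    end: int


@dataclass
class Gene:
    """The abstract gene; each linked feature plays the role of a 'gene instance'."""
    name: str
    instances: List[GeneFeature] = field(default_factory=list)


def sequences_for(gene: Gene) -> List[str]:
    """Return the sorted ids of all sequences on which the gene has been annotated."""
    return sorted({feature.na_sequence.source_id for feature in gene.instances})


# Toy data: the same gene seen on a genome sequence and on an experimental clone.
genome = NASequence("toy_chr5")
clone = NASequence("toy_clone_A")
gene = Gene("toy_gene_1")
gene.instances.append(GeneFeature(genome, 1000, 2500))
gene.instances.append(GeneFeature(clone, 10, 1510))

print(sequences_for(gene))
```

The same pattern would repeat for RNA and Protein, with protein features located on AA sequences rather than NA sequences.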

Obtaining and Using GUS
More info at
Active gusdev mailing list
Relatively straightforward to install
Loading data a struggle for new users
Growing number of tools available
Addressing how to use and write tools with visits
Web Development Kit (WDK) to generate web sites on GUS

Current GUS Developers
At Penn
  Steve Fischer: Project manager, WDK,
  Elisabetta Manduchi: RAD project manager, RAD study annotator
  Angel Pizarro: Schema development, proteomics, MAGE export
  Mike Saffitz: DBA, web services, Postgres
  Dave Barkan: WDK, GO pipeline, Apollo interface
  Thomas Gan: WDK, genomic alignments pipeline
  John Iodice: ApiDoTS pipeline, data loading
  Li Li: OrthoMCL pipeline
  Junmin Liu: RAD websites, expression displays
  Debbie Pinney: Data loaders, Hum and MusDoTS pipeline
  Jonathan Schug: TESS, architecture and schema development
  Trish Whetzel: Data loading, RAD, schema development
  Plus rest of group contributes through various GUS-based projects
Pathogen Sequencing Unit, Sanger Institute
Kissinger Group, U. of Georgia
Terry Clark, U. of Chicago

WDKTestSite
Developed in collaboration with Adrian Tivey & Marie-Adele Rajandream (PSU, Sanger Institute)

The PlasmoDB Team
Shailesh Date, Kobby Essien, Martin Fraunholz, Bindu Gajria, Greg Grant, John Iodice, Jessie Kissinger, Philip Labo, Li, Jules Milgram, David Roos, Chris Stoeckert, Trish Whetzel
NIAID grant: R01 AI058515

GUS supports a wide variety of queries

Suppose you want to find all kinases in P. falciparum
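As a rough idea of what such a query might look like underneath a web form, here is a minimal sketch that searches gene product descriptions by keyword within one organism. Everything in it is an assumption made for illustration: the `toy_gene` table, its columns, the identifiers, and the rows are invented, SQLite stands in for the production database, and a real site may well use other evidence (for example protein domains or GO terms) rather than text matching.

```python
import sqlite3


def find_genes(conn, organism, keyword):
    """Return ids of genes in `organism` whose product description mentions `keyword`."""
    rows = conn.execute(
        "SELECT source_id FROM toy_gene "
        "WHERE organism = ? AND product LIKE ? "
        "ORDER BY source_id",
        (organism, f"%{keyword}%"),
    )
    return [row[0] for row in rows]


# Hypothetical table and made-up rows; not real PlasmoDB or GUS data.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE toy_gene ("
    "source_id TEXT PRIMARY KEY, organism TEXT NOT NULL, product TEXT NOT NULL)"
)
conn.executemany(
    "INSERT INTO toy_gene VALUES (?, ?, ?)",
    [
        ("TOY_0001", "Plasmodium falciparum", "putative protein kinase"),
        ("TOY_0002", "Plasmodium falciparum", "hypothetical protein"),
        ("TOY_0003", "Plasmodium yoelii", "serine/threonine kinase"),
        ("TOY_0004", "Plasmodium falciparum", "calcium-dependent protein kinase"),
    ],
)

kinases = find_genes(conn, "Plasmodium falciparum", "kinase")
print(kinases)
```

Passing the organism and keyword as bound parameters keeps the query safe to drive from user input on a web page.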

Gene Report Pages Integrate Genomics and Functional Genomics

RAD Study-Annotator
Covers the MIAME checklist and exploits the MGED Ontology
Allows entering of very specific details of an experiment
Web-based forms:
  Modular structure
  Written in PHP
  Front-end data integrity checks using JavaScript
Manages data privacy based on Project/Group selections present in GUS schema
Manduchi et al Bioinformatics 20:

Vision for GUS
Installable for every lab
  Improve install scripts, documentation
  Postgres version
Extendable to all areas of functional genomics
  Sequence, array-based expression experiments
  Array CGH, 2-D gel electrophoresis, mass spectrometry, yeast 2-hybrids
  In situ hybridizations, metabolites
Interoperable with other GUS installations and with common tools
  Exchange files and scripts, MAGE-ML (use community standards)
  Web services (exchange objects)
  Interface with open source tools such as Gbrowse, Artemis, Apollo

Standards and Ontologies for Functional Genomics 2
October 23-26, 2004, held at the University of Pennsylvania Medical School
Funded in part by NHGRI, NCRR, NERC, GSK
Co-hosted by The Jackson Laboratory, University of Pennsylvania, European Bioinformatics Institute
Student scholarships available
Photo by R. Kennedy, B Trist, R. Tarver, for GPTMC