GUS: A Functional Genomics Data Management System Chris Stoeckert, Ph.D. Center for Bioinformatics and Dept. of Genetics University of Pennsylvania ASM.


1 GUS: A Functional Genomics Data Management System. Chris Stoeckert, Ph.D., Center for Bioinformatics and Dept. of Genetics, University of Pennsylvania. ASM Conference on Functional Genomics and Bioinformatics Approaches to Infectious Disease Research, October 8, 2004, Portland, Oregon.

2 Database Options for Integrated Functional Genomics. Requirements: covers genomics and functional genomics; active and open developer community. Options: GUS (Genomics Unified Schema); Chado, the generic model organism database (GMOD, http://www.gmod.org).

3 GUS. Diagram labels: Core, SRES, TESS, RAD, DoTS; Oracle RDBMS; Object Layer for Data Loading; Java Servlets. A Few GUS Web Sites: Sanger Institute; U. Georgia; Flora Centromere Database; U. Chicago; U. Penn; U. Toronto; Phytophthora sojae genome; Virginia Bioinformatics Institute.

4 GUS (Genomics Unified Schema) http://www.gusdb.org
  Namespace   Domain                    Features
  RAD         Gene Expression           MIAME/MAGE-OM
  DoTS        Sequence and annotation   EST clusters, Gene models
  Core        Data Provenance           Documentation
  Sres        Shared Resources          Ontologies
  TESS        Gene Regulation           TFBS organization

5 Diagram labels: RAD; EST clustering and assembly; DoTS; Genomic alignment and comparative sequence analysis; Identify shared TF binding sites; TESS; BioMaterial annotation; SRES.

6 Examples of GUS users. Large sequencing center: GeneDB (Pathogen Sequencing Unit at the Sanger Institute). Lightly staffed genomics project: CryptoDB (Kissinger Lab, University of Georgia). Data mining project: multiple plant species (Brett Tyler, Virginia Bioinformatics Institute, and collaborators). Expression-based project: dbDirt (Allen Okey, University of Toronto). Bioinformatics core facility: University of Pennsylvania Bioinformatics Core Facility.

7 GUS Project Goals. Provide: a platform for broad genomics data integration; an infrastructure system for functional genomics. Support: websites with advanced query capabilities; research-driven queries and mining.

8 GUS components. Diagram labels: Warehouse (Oracle or PostgreSQL); Perl Object Layer; Web Development Kit; Queries and analysis; Data Load API; Pipeline API; Plugins (data loaders). Data sources: your data, GenBank, NRDB, dbEST, SNPs, Genetraps, MicroArrays, Phenotypes, Pathways, Orthologs, Taxonomy, GO, SO, EC, more…

9 Functional genomics with GUS. Diagram labels: Sequence & Features; Study; Functional Annotation of the Genome; Central Dogma; Regulation (TESS); Expression (RAD); Sample; Image Analysis; Statistical Processing; Interaction Study; Proteomics; Sample; Image Analysis; Statistical Processing; Study; In Situ Hybridization; ImmunoHistChem; Sample; Image Analysis; Statistical Processing. Standards: MIAME (www.mged.org); MIAPE (psidev.sf.net); MISFISHIE (www.scgap.org).

10 GUS versus Chado. GUS represents biology in the database tables: this forces applications to load and retrieve data consistently. Chado represents biology in the applications: this allows flexibility in what can be stored, but applications may not be consistent.
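The contrast on this slide can be illustrated with a toy schema. The sketch below is not the actual GUS or Chado DDL; the table and column names (gene_feature, rna_feature, feature, term) are assumptions made up for illustration, and SQLite stands in for Oracle or PostgreSQL. It shows a typed-table design rejecting an inconsistent row in the database, while a generic feature-plus-type design accepts it and leaves the check to the application.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")

conn.executescript("""
-- Style A (biology in the tables): one table per biological concept.
-- Hypothetical names, not the real GUS schema.
CREATE TABLE gene_feature (
    gene_feature_id INTEGER PRIMARY KEY,
    name            TEXT NOT NULL
);
CREATE TABLE rna_feature (
    rna_feature_id  INTEGER PRIMARY KEY,
    gene_feature_id INTEGER NOT NULL REFERENCES gene_feature(gene_feature_id),
    name            TEXT NOT NULL
);

-- Style B (biology in the applications): one generic table plus a type term.
-- Hypothetical names, loosely in the spirit of a generic feature table.
CREATE TABLE term (
    term_id INTEGER PRIMARY KEY,
    name    TEXT NOT NULL UNIQUE
);
CREATE TABLE feature (
    feature_id INTEGER PRIMARY KEY,
    type_id    INTEGER NOT NULL REFERENCES term(term_id),
    parent_id  INTEGER REFERENCES feature(feature_id),
    name       TEXT NOT NULL
);
""")

# Style A: the schema itself refuses an RNA that points at a missing gene.
conn.execute("INSERT INTO gene_feature VALUES (1, 'geneA')")
conn.execute("INSERT INTO rna_feature VALUES (1, 1, 'geneA-mRNA')")
try:
    conn.execute("INSERT INTO rna_feature VALUES (2, 99, 'orphan-mRNA')")
    typed_rejects_orphan = False
except sqlite3.IntegrityError:
    typed_rejects_orphan = True

# Style B: the database accepts an mRNA with no parent gene;
# any rule about parents has to live in application code.
conn.executemany("INSERT INTO term VALUES (?, ?)", [(1, "gene"), (2, "mRNA")])
conn.execute("INSERT INTO feature VALUES (1, 1, NULL, 'geneA')")
conn.execute("INSERT INTO feature VALUES (2, 2, 1, 'geneA-mRNA')")
conn.execute("INSERT INTO feature VALUES (3, 2, NULL, 'orphan-mRNA')")

generic_orphans = conn.execute(
    """
    SELECT COUNT(*)
    FROM feature f JOIN term t ON t.term_id = f.type_id
    WHERE t.name = 'mRNA' AND f.parent_id IS NULL
    """
).fetchone()[0]
```

The trade-off is the one the slide names: the typed design gives every loader the same guarantees, while the generic design can store new kinds of features without a schema change.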

11 Central dogma and sequences. Diagram labels: NA Sequence; Gene Feature; RNA Feature; Protein Feature; AA Sequence.

12 Central dogma and sequences. Diagram labels: Gene, RNA, Protein; NA Sequence, AA Sequence; Gene Feature, RNA Feature, Protein Feature.

13 Central dogma and sequences. Diagram labels: Gene, RNA, Protein; NA Sequence, AA Sequence; genome; Multiple sequences (experimental variety); Gene 1, Gene 2; RNA; Multiple genes.

14 Central dogma and sequences. Diagram labels: Gene, RNA, Protein; NA Sequence, AA Sequence; Gene Instance, RNA Instance, Protein Instance; Gene Feature, RNA Feature, Protein Feature.
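One possible reading of these four slides is that a Gene is an abstract entity, a Gene Feature is an annotation located on a particular NA Sequence, and a Gene Instance links the two so that one gene can be observed on several sequences. The sketch below models only that reading. The class and field names are assumptions for illustration, not the GUS object layer, and the RNA and protein levels would follow the same pattern.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class NASequence:
    source_id: str  # e.g. a chromosome or an assembled EST contig


@dataclass
class GeneFeature:
    na_sequence: NASequence  # the sequence this annotation sits on
    start: int
    end: int


@dataclass
class Gene:
    symbol: str  # the abstract gene, independent of any one sequence


@dataclass
class GeneInstance:
    gene: Gene            # the abstract entity
    feature: GeneFeature  # one located observation of that entity


def sequences_for(gene: Gene, instances: List[GeneInstance]) -> List[str]:
    """Return the sorted source ids of all sequences carrying this gene."""
    return sorted(
        {inst.feature.na_sequence.source_id
         for inst in instances
         if inst.gene is gene}
    )


# Hypothetical data: the same gene seen on a genomic and an EST-derived sequence.
genome = NASequence("chromosome_demo")
est = NASequence("est_contig_demo")
gene1 = Gene("gene1")
instances = [
    GeneInstance(gene1, GeneFeature(genome, 100, 900)),
    GeneInstance(gene1, GeneFeature(est, 1, 750)),
]
gene1_sequences = sequences_for(gene1, instances)
```

Keeping the link in its own record means a second sequence for the same gene is a new row, not a change to the gene itself.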

15 Obtaining and Using GUS. Available at www.gusdb.org; more info at www.gusdb.org/documentation. Active gusdev mailing list. Relatively straightforward to install. Loading data is a struggle for new users: a growing number of tools is available, and visits are being used to address how to use and write tools. Web Development Kit (WDK) to generate web sites on GUS.

16 Current GUS Developers. At Penn: Steve Fischer (project manager, WDK); Elisabetta Manduchi (RAD project manager, RAD study annotator); Angel Pizarro (schema development, proteomics, MAGE export); Mike Saffitz (DBA, web services, Postgres); Dave Barkan (WDK, GO pipeline, Apollo interface); Thomas Gan (WDK, genomic alignments pipeline); John Iodice (ApiDoTS pipeline, data loading); Li Li (OrthoMCL pipeline); Junmin Liu (RAD websites, expression displays); Debbie Pinney (data loaders, Hum and MusDoTS pipeline); Jonathan Schug (TESS, architecture and schema development); Trish Whetzel (data loading, RAD, schema development). The rest of the group contributes through various GUS-based projects. Elsewhere: Pathogen Sequencing Unit, Sanger Institute; Kissinger Group, U. of Georgia; Terry Clark, U. of Chicago.

17 WDKTestSite. Developed in collaboration with Adrian Tivey & Marie-Adele Rajandream (PSU, Sanger Institute).

18 The PlasmoDB Team: Shailesh Date, Kobby Essien, Martin Fraunholz, Bindu Gajria, Greg Grant, John Iodice, Jessie Kissinger, Philip Labo, Li, Jules Milgram, David Roos, Chris Stoeckert, Trish Whetzel. NIAID grant: R01 AI058515.

19 GUS supports a wide variety of queries

20 Suppose you want to find all kinases in P. falciparum
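Behind a web form, a question like this usually comes down to a keyword search on gene product descriptions restricted to one organism. The sketch below shows that idea only. The gene table, its columns, and the sample rows are all made up for illustration and do not reflect the GUS schema or the WDK, and SQLite stands in for the production database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, flattened table; a real warehouse would spread this over several tables.
conn.execute(
    "CREATE TABLE gene ("
    " source_id TEXT PRIMARY KEY,"
    " product   TEXT NOT NULL,"
    " organism  TEXT NOT NULL)"
)
conn.executemany(
    "INSERT INTO gene VALUES (?, ?, ?)",
    [
        # Made-up identifiers and descriptions, for illustration only.
        ("demo_gene_1", "protein kinase, putative", "Plasmodium falciparum"),
        ("demo_gene_2", "hypothetical protein", "Plasmodium falciparum"),
        ("demo_gene_3", "Serine/threonine Kinase", "Plasmodium falciparum"),
        ("demo_gene_4", "casein kinase", "Toxoplasma gondii"),
    ],
)


def find_genes(connection, organism, keyword):
    """Return ids of genes in one organism whose product mentions the keyword."""
    cursor = connection.execute(
        "SELECT source_id FROM gene"
        " WHERE organism = ? AND product LIKE ?"
        " ORDER BY source_id",
        (organism, f"%{keyword}%"),  # bound parameters, not string concatenation
    )
    return [row[0] for row in cursor]


kinases = find_genes(conn, "Plasmodium falciparum", "kinase")
```

A text match on the product description is only the simplest strategy; searching by GO term, EC number, or protein domain would catch kinases whose description does not contain the word.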

21 Gene Report Pages Integrate Genomics and Functional Genomics

22 RAD Study-Annotator. Covers the MIAME checklist and exploits the MGED Ontology. Allows entering of very specific details of an experiment. Web-based forms: modular structure; written in PHP; front-end data integrity checks using JavaScript. Manages data privacy based on Project/Group selections present in the GUS schema. Manduchi et al. 2004 Bioinformatics 20:452-459.

23 Vision for GUS. Installable for every lab: improve install scripts and documentation; Postgres version. Extendable to all areas of functional genomics: sequence, array-based expression experiments; array CGH, 2-D gel electrophoresis, mass spectrometry, yeast 2-hybrids; in situ hybridizations, metabolites. Interoperable with other GUS installations and with common tools: exchange files and scripts, MAGE-ML (use community standards); web services (exchange objects); interface with open source tools such as Gbrowse, Artemis, Apollo.

24 Standards and Ontologies for Functional Genomics 2, October 23-26, 2004, held at the University of Pennsylvania Medical School. www.jax.org/courses/events. Funded in part by NHGRI, NCRR, NERC, GSK. Co-hosted by The Jackson Laboratory, University of Pennsylvania, European Bioinformatics Institute. Student scholarships available. Photo by R. Kennedy, B. Trist, R. Tarver, for GPTMC.

