First GUS Workshop July 6-8, 2005 Penn Center for Bioinformatics Philadelphia, PA.


Workshop Goals
Work through issues
  –Installing GUS
  –Loading data into GUS
  –Analyzing and viewing data in GUS
Coordinate future development
  –Changes to schema and application framework
  –New plug-ins
  –New application adapters

A Brief History of GUS
Genomics Unified Schema
  –V1.0 in 2000
  –Previously had separate databases for:
      Genome annotation
      EST assemblies (DoTS)
      Microarrays and SAGE (RAD)
      Transcription element search software (TESS)
  –Strengthen each effort by providing deep annotation
      e.g., cDNAs on microarray in RAD get annotation from assemblies in DoTS
  –Learn and store relationships between genes, RNAs, and proteins
      Strong typing: meaningful relationships

[Diagram of how the GUS components support one another. Components: RAD, DoTS, TESS, SRes. Labels: EST clustering and assembly; Genomic alignment and comparative sequence analysis; Identify shared TF binding sites; BioMaterial annotation.]

GUS versus Chado
GUS represents biology in the database tables
  –Forces applications to load and retrieve data consistently
Chado represents biology in the applications
  –Allows flexibility in what can be stored, but applications may not be consistent
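The trade-off above can be illustrated with a minimal sketch. It uses Python's built-in sqlite3 module and two toy designs invented for this example. Neither is the actual GUS or Chado schema, and all table and column names are assumptions. In the first design the database enforces that every RNA belongs to a known gene. In the second, a generic relationship table accepts any pair of identifiers, so consistency is left to the application.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
-- Design A (hypothetical): biology expressed in typed tables
CREATE TABLE gene (gene_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE rna (
    rna_id  INTEGER PRIMARY KEY,
    gene_id INTEGER NOT NULL REFERENCES gene(gene_id),
    name    TEXT NOT NULL
);
-- Design B (hypothetical): generic tables, meaning assigned by the application
CREATE TABLE feature (feature_id INTEGER PRIMARY KEY, type TEXT NOT NULL, name TEXT NOT NULL);
CREATE TABLE feature_relationship (subject_id INTEGER, object_id INTEGER, type TEXT);
""")


def typed_insert_rejected():
    """Try to store an RNA that points at a gene that does not exist."""
    try:
        conn.execute("INSERT INTO rna (gene_id, name) VALUES (999, 'orphan_rna')")
    except sqlite3.IntegrityError:
        return True  # the schema itself refused the inconsistent row
    return False


# Design B stores the same dangling link without complaint.
conn.execute("INSERT INTO feature (type, name) VALUES ('rna', 'orphan_rna')")
conn.execute("INSERT INTO feature_relationship VALUES (1, 999, 'transcribed_from')")
generic_rows = conn.execute("SELECT COUNT(*) FROM feature_relationship").fetchone()[0]
```

In the typed design every loader gets the same guarantee for free. In the generic design each application has to implement, and agree on, its own checks.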

GUS Project Goals
Provide:
  –A platform for broad genomics data integration
  –An infrastructure system for functional genomics
Support:
  –Websites with advanced query capabilities
  –Research-driven queries and mining

GUS 3.5 Schemas

Schema   Domain                     Features
DoTS     Sequence and annotation    EST clusters, Gene models
RAD      Gene expression            MIAME
Prot     Protein expression         Mass spec mzdata
Study    Experiments                FuGE
TESS     Gene Regulation            TFBS organization
SRes     Shared resources           Ontologies
Core     Administration             Documentation, Data Provenance

DoTS: Central dogma and relating biological sequences
[Diagram labels: NA Sequence; Gene Feature, RNA Feature, Protein Feature; AA Sequence]
Load GenBank, NRDB, sequencing center files, dbEST entries

DoTS: Central dogma and relating biological sequences
[Diagram labels: Gene, RNA, Protein; NA Sequence, AA Sequence; Gene Feature, RNA Feature, Protein Feature]
Concepts that are independent of any individual sequence because sequences may be incomplete, a variant, or not well annotated.

DoTS: Central dogma and relating biological sequences
[Diagram labels: Gene, RNA, Protein; NA Sequence, AA Sequence; genome; Multiple sequences (experimental variety); Gene 1, Gene 2; RNA; Multiple genes]
Concepts may be related to multiple sequences due to biology, experiments, or computational predictions.

DoTS: Central dogma and relating biological sequences
[Diagram labels: Gene, RNA, Protein; NA Sequence, AA Sequence; Gene Instance, RNA Instance, Protein Instance; Gene Feature, RNA Feature, Protein Feature]
Instances reflect our understanding of sequence associations.
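The four DoTS slides above describe three layers: sequences, features located on sequences, and sequence-independent concepts (Gene, RNA, Protein) tied to features through instances. A minimal Python sketch of that layering follows. The class and field names are hypothetical and do not reproduce the DoTS tables. The `evidence` field is likewise an assumption, standing in for whatever justifies an association.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Sequence:
    source_id: str
    kind: str  # "NA" or "AA"


@dataclass(frozen=True)
class Feature:
    kind: str  # "gene", "rna", or "protein"
    sequence: Sequence
    start: int
    end: int


@dataclass(frozen=True)
class Instance:
    """Links a concept to one feature; records why we believe the link."""
    feature: Feature
    evidence: str


@dataclass
class Concept:
    """A gene, RNA, or protein, independent of any single sequence."""
    kind: str
    name: str
    instances: list = field(default_factory=list)

    def add_instance(self, feature, evidence):
        # Strong typing: a gene concept only accepts gene features, and so on.
        if feature.kind != self.kind:
            raise TypeError(f"{self.kind} concept cannot take a {feature.kind} feature")
        self.instances.append(Instance(feature, evidence))

    def sequences(self):
        return {inst.feature.sequence.source_id for inst in self.instances}


genomic = Sequence("genomic_contig_1", "NA")
clone = Sequence("cdna_clone_42", "NA")
gene = Concept("gene", "example_gene")
gene.add_instance(Feature("gene", genomic, 100, 900), "computational prediction")
gene.add_instance(Feature("gene", clone, 1, 400), "experimental alignment")
print(sorted(gene.sequences()))
```

One concept ends up related to two sequences, and a mismatched feature type is rejected before it can be stored.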

RAD: Loading/Annotation
[Flowchart. Step labels: Load Array Info; Create new study (web); Annotate experimental design and biomaterials (web); Create assays, acquisitions and quantifications; Load quantification data; Load processed data or analysis results; End.
Tool labels: GUS::Supported::LoadArrayDesign; RAD::StudyAnnotator::Study Form; RAD::StudyAnnotator::Module II; RAD::StudyAnnotator::Module III; RAD::StudyAnnotator::Module I (all software) or (some software) GUS::Community::Plugin::InsertMAS5Assay2Quantification or GUS::Community::Plugin::InsertGenePixAssay2Quantification; GUS::Supported::Plugin::LoadArrayResults or GUS::Community::Plugin::LoadBatchArrayResults; GUS::Supported::Plugin::InsertRadAnalysis.]

Prot and Study: Generalization of RAD to other technologies
RAPAD prototype made a copy of RAD and dropped/inserted tables for 2-D gels and mass spec.
  –Jones et al. Bioinformatics
In GUS 3.5, Study contains descriptions of samples (BioMaterials), sample protocols, and experimental design.
  –Technology-specific protocols are in RAD, Prot.
In GUS 3.5, Prot is now based on standard mzdata output of mass spectrometers
  –To add soon: peptide identification from programs like Sequest and MASCOT (held in DoTS currently)

TESS: TF to binding site relationships in the context of computational models

Functional Annotation of the Genome
[Diagram labels: Sequence & Features; Central Dogma (DoTS); Regulation (TESS); Expression (RAD): Image Analysis, Statistical Processing; Interaction; Proteomics (Prot): Image Analysis, Statistical Processing; MIAME, MIAPE; Experimental Design and Samples (Study); New schemas for additional domains]

Future Schemas
Population genetics
  –Relate polymorphisms, genotypes, phenotypes
  –Currently in DoTS
Comparative genomics
  –Syntenies, phylogenies
  –Currently in DoTS
Metabolomics
  –Small molecules
  –Use Study and adapt Prot
In situs / Immunohistochemistry
  –Use Study and adapt RAD

GUS Components
Schema
Application Framework
  –Object/Relational Layer
  –Plugin API
  –Pipeline API
Plug-ins
Web Development Kit (WDK)

GUS Application Framework
Motivation: Consistent and reusable access and manipulation of data
Object Relational: 1:1 Mapping between tables and language objects
Provides
  –Relationship Management
  –Cascading Operations
  –Cache Management
  –Basic Access Control
Automation of Data Provenance and Evidence
With APIs, foundation for advanced tools and applications.
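To make the object/relational idea concrete, here is a minimal sketch in Python with sqlite3. It is not the GUS object layer, which is not shown in these slides. The `Row` base class, its `submit` method, and the `gene` and `gene_feature` tables are all hypothetical names chosen for illustration. The sketch shows a 1:1 mapping between a table and a class, a cascading submit that writes child rows after their parent, and a simple cache keyed by table and primary key.

```python
import sqlite3


class Row:
    """Base class: one subclass per table, one instance per row (hypothetical API)."""
    table = None
    columns = ()
    _cache = {}  # (table, primary key) -> object, shared by all subclasses

    def __init__(self, **values):
        self.values = dict.fromkeys(self.columns)
        self.values.update(values)
        self.row_id = None
        self.children = []  # (child object, name of its foreign-key column)

    def add_child(self, child, fk_column):
        """Relationship management: remember which rows depend on this one."""
        self.children.append((child, fk_column))

    def submit(self, conn):
        """Insert this row, then cascade to children with the new key filled in."""
        cols = ", ".join(self.columns)
        marks = ", ".join("?" for _ in self.columns)
        cur = conn.execute(
            f"INSERT INTO {self.table} ({cols}) VALUES ({marks})",
            [self.values[c] for c in self.columns],
        )
        self.row_id = cur.lastrowid
        Row._cache[(self.table, self.row_id)] = self
        for child, fk_column in self.children:
            child.values[fk_column] = self.row_id
            child.submit(conn)


class Gene(Row):
    table = "gene"
    columns = ("name",)


class GeneFeature(Row):
    table = "gene_feature"
    columns = ("gene_id", "label")


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE gene (gene_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute(
    "CREATE TABLE gene_feature (gene_feature_id INTEGER PRIMARY KEY, "
    "gene_id INTEGER REFERENCES gene(gene_id), label TEXT)"
)

gene = Gene(name="example_gene")
gene.add_child(GeneFeature(label="exon 1"), "gene_id")
gene.submit(conn)  # one call writes the gene and its feature
```

Because every loader goes through the same `submit` path, this is also the natural place to hang access checks or provenance records, which is the kind of automation the slide lists.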

Web Development Kit (WDK)
Database Independent
Facilitates development of data mining oriented websites:
  –Multiple parameterized canned queries
  –Sophisticated records
  –Graphical views
  –Boolean query facility
  –Query history
  –Session management, process pooling, flow control
Model, View, Controller (MVC) Design
  –Separates application logic (Model) from website layout (View) and application flow (Controller)
  –Model: XML-based queries and records
  –View: JSP
  –Controller: Struts
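Three of the features listed above fit together naturally: parameterized canned queries, a query history, and a boolean facility that combines earlier results. The sketch below shows one way this could work, written in Python with sqlite3 for brevity. The WDK itself defines its model in XML and runs on JSP and Struts, so nothing here is WDK code. The query names, the `QuerySession` class, and the `gene` table are assumptions made for the example.

```python
import sqlite3

# Canned queries: fixed SQL, with only the named parameters supplied by the user.
CANNED_QUERIES = {
    "genes_by_chromosome": "SELECT gene_id FROM gene WHERE chromosome = :chromosome",
    "genes_by_min_length": "SELECT gene_id FROM gene WHERE length >= :min_length",
}


class QuerySession:
    """Keeps a per-user history of results so they can be combined later."""

    def __init__(self, conn):
        self.conn = conn
        self.history = []  # list of (description, frozenset of ids)

    def run(self, name, **params):
        rows = self.conn.execute(CANNED_QUERIES[name], params)
        result = frozenset(row[0] for row in rows)
        self.history.append((f"{name} {params}", result))
        return len(self.history) - 1  # history index of this result

    def combine(self, left, op, right):
        a, b = self.history[left][1], self.history[right][1]
        if op == "AND":
            result = a & b
        elif op == "OR":
            result = a | b
        elif op == "NOT":
            result = a - b
        else:
            raise ValueError(f"unknown operator: {op}")
        self.history.append((f"#{left} {op} #{right}", result))
        return len(self.history) - 1


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE gene (gene_id INTEGER PRIMARY KEY, chromosome TEXT, length INTEGER)")
conn.executemany(
    "INSERT INTO gene VALUES (?, ?, ?)",
    [(1, "chr1", 500), (2, "chr1", 2000), (3, "chr2", 3000)],
)

session = QuerySession(conn)
on_chr1 = session.run("genes_by_chromosome", chromosome="chr1")
long_genes = session.run("genes_by_min_length", min_length=1000)
both = session.combine(on_chr1, "AND", long_genes)
```

Keeping the SQL in a named table separate from the session logic mirrors the Model side of the MVC split described above: page layout and flow can change without touching the queries.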

GUS Version Caveat
GUS 3.0 ~ 12/02
GUS 3.1 ~ 12/03
GUS 3.2 ~ 02/04
  –Concrete Schema Versions
  –Application Code in Flux
GUS /05
  –First concrete release with distributable
Proposal: Separate versioning for Schema and Application Framework

GUS 3.5
Improved Distribution
  –Installer, DBAdmin Tools
  –Bootstrap Data -- Algorithm Parameters, Core.TableInfo
  –Plugin Quality -- “New” API, Tested
  –Documentation -- Install, User’s, and Developer’s Guides
  –Requisite jars Included -- Oracle, PostgreSQL
Extended Support
  –PostgreSQL Compatible
  –Java Object Model -- Consistently Compiles
Schema Improvements
  –Proteomics Support
  –Standard Study Support
  –Schema Cleanup
      Requested schema fixes primarily to DoTS
      Removal of deprecated tables -- Workflow

GUS 3.? -> 3.5 Migration
Not Trivial
  –Many potential starting points
  –Not all data has a migration path
Upgrade Possibilities
  –In Place Upgrade
  –Data load and transform
  –Start New
Possible Routes
  –GUS DBAdmin Tools
  –Third party (OEM) Tools
  –Everyone for themselves

GUS
Small Schema Changes
  –TESS, Attribute Changes
Improved Developer’s and User’s Guides
Additional Supported Plug-ins
DBAdmin Code Cleanup
Upgrade Scripts
Expected early August

GUS 4.0 and beyond
Object Layer Improvements
  –Class::DBI -- Perl O/R Layer
  –Hibernate -- Java O/R Layer
Improved Subclassing
  –Multiple Layers
  –Eliminate Performance Issues
Refactor DoTS
Redistribute tables between RAD, Prot, and Study
Additional Biological Domains

GUS Project Resources
Website
  –News, Documentation, Distributable, GUS-based Projects

GUS Project Resources
Mailing List
  –~ 90 Subscribers
  –1700 Messages over 3 years
GUS Wiki
  –User Notes and Documentation
      Central Dogma
      Schema Design
      Subclassing System
      Data Provenance
      Development Tracking: 3.5 Roadmap, 4.0 Schema Ideas
      WDK Documentation

GUS Project Resources
Subversion Source Control System
  –Anonymous Read Access for “Bleeding Edge” releases
  –Web-based Code Review
  –“Commits” Mailing List
Schema Browser
  –Online Schema and Relationships Review
GUS Issue Tracker
  –Bugzilla Based

GUS Project Coordination - Areas of Focus
Administration
  –Installer, Data Bootstrapping, dba Utilities
Schema
  –Data model, Subclassing Techniques, Data Provenance
Framework
  –Object/Relational Technologies, Plugin & Pipeline APIs
Plug-in
  –Data loading mechanisms

GUS Project Coordination - Areas of Focus
Documentation
  –Installation, User’s, and Developer’s Guides
  –Wiki
Web Development Kit
  –Well established working group
Tool adapters
  –GBrowse, Apollo, etc. Integration
Later: Development Priorities Discussion
  –Where should we focus our efforts?