Rahul Raman, Ram Sasisekharan
Bioinformatics Core
Massachusetts Institute of Technology
Glue Grants Bioinformatics Meeting, April 22-23, 2004, San Diego, CA
Consortium for Functional Glycomics
Program Goal/Organization
Goal: Define Paradigms by which Protein-Glycan Interactions Mediate Cell-Cell Communications
Bioinformatics Core Goals and Approach
The overarching goal for Core B is to provide the ‘face’ of the Consortium to the outside world.
Accomplished using:
– Web site that provides updated information on the Consortium’s progress
– Complex object-based relational databases that facilitate integration of diverse data sets in a structured and meaningful fashion
– User interface for data entry, data dissemination, and data analysis
Core B: Organization
Core Operations & Management
– Ram Sasisekharan, Rahul Raman
– Ada Ziolkowski [staff]
Bioinformatics: scientific liaisons, user specifications, bioinformatics applications
– Nishla Keiser
– Chipong Kwan
– Ishan Capila
Information Technology: database, web, software applications, user interface development
– Maha Venkataraman
– Wei Lang
– Subu Ramakrishnan
– Eric Berry [and other IT members]
Demo
Web Site
– Updates on the Consortium
Data from Cores
– Data uploaded by the Cores
Molecule Pages
– Presentation interface
Core Information Organization
Glycan Binding Proteins, Glycoenzymes, Glycans
Experiments
– MALDI-MS Glycoprofiling (Core C)
– Gene Microarray (Core E)
– Mouse Phenotyping (Core G)
– Screening GBP-Glycan Interactions (Core H)
Samples
– Tissues from WT and KO mice
– Cell lines (mouse/human)
– Glycan Binding Proteins
Data files/formats
– Gene Microarray readout/raw data (formatted ASCII/DAT)
– MALDI-MS raw/annotated data (formatted ASCII)
– Mouse Phenotyping (formatted Excel worksheets, high-res images)
– Glycan Screening (formatted ASCII)
Protocols
– Sample preparation
– Experiment methods
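As one way to picture how the samples, experiments, data files, and protocols listed above could relate to one another, here is a minimal relational sketch in Python. It is an illustration only, not the Consortium's schema: every table name, column name, and row is a hypothetical placeholder, and the standard-library `sqlite3` module stands in for the production database so that the example is self-contained.

```python
import sqlite3

# Hypothetical schema: all table and column names are illustrative assumptions.
SCHEMA = """
CREATE TABLE protocol (
    id    INTEGER PRIMARY KEY,
    kind  TEXT NOT NULL CHECK (kind IN ('sample_preparation', 'experiment_method')),
    title TEXT NOT NULL
);
CREATE TABLE sample (
    id          INTEGER PRIMARY KEY,
    kind        TEXT NOT NULL
                CHECK (kind IN ('tissue', 'cell_line', 'glycan_binding_protein')),
    description TEXT NOT NULL,
    protocol_id INTEGER REFERENCES protocol(id)
);
CREATE TABLE experiment (
    id        INTEGER PRIMARY KEY,
    core      TEXT NOT NULL CHECK (core IN ('C', 'E', 'G', 'H')),
    kind      TEXT NOT NULL,
    sample_id INTEGER NOT NULL REFERENCES sample(id)
);
CREATE TABLE data_file (
    id            INTEGER PRIMARY KEY,
    experiment_id INTEGER NOT NULL REFERENCES experiment(id),
    file_format   TEXT NOT NULL,
    path          TEXT NOT NULL
);
"""


def build_demo_db():
    """Create an in-memory database and load a few placeholder rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # enforce the REFERENCES clauses
    conn.executescript(SCHEMA)
    conn.execute(
        "INSERT INTO protocol VALUES (1, 'sample_preparation', 'Placeholder prep')"
    )
    conn.executemany(
        "INSERT INTO sample VALUES (?, ?, ?, ?)",
        [
            (1, "tissue", "Placeholder tissue from a WT mouse", 1),
            (2, "cell_line", "Placeholder mouse cell line", 1),
        ],
    )
    conn.executemany(
        "INSERT INTO experiment VALUES (?, ?, ?, ?)",
        [
            (1, "C", "MALDI-MS Glycoprofiling", 1),
            (2, "E", "Gene Microarray", 1),
        ],
    )
    conn.executemany(
        "INSERT INTO data_file VALUES (?, ?, ?, ?)",
        [
            (1, 1, "ASCII", "placeholder/maldi.txt"),
            (2, 2, "DAT", "placeholder/array.dat"),
        ],
    )
    conn.commit()
    return conn


def files_for_sample(conn, sample_id):
    """Return (core, experiment kind, file format) for every file tied to a sample."""
    rows = conn.execute(
        """
        SELECT e.core, e.kind, f.file_format
        FROM experiment AS e
        JOIN data_file AS f ON f.experiment_id = e.id
        WHERE e.sample_id = ?
        ORDER BY e.id, f.id
        """,
        (sample_id,),
    )
    return rows.fetchall()
```

Calling `files_for_sample(build_demo_db(), 1)` walks from one sample to the data files produced by different Cores, which is the kind of cross-Core link a relational layout makes straightforward.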
Molecule Page Components
Automated Acquisition
– Data from public databases, links to public resources
Data from Cores
– Links to resource and data IDs for viewing Consortium resources
Contribution from Experts
– Filling out fields as experts on the molecule
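The three components above can be sketched as a simple assembly step that gathers each source into one page object. This is a hedged illustration, not the Consortium's implementation: the `MoleculePage` class, the `assemble_page` function, and the record fields (`molecule`, `source`, `url`, `resource_id`, `field`, `text`) are all hypothetical names chosen for the example.

```python
from dataclasses import dataclass, field


@dataclass
class MoleculePage:
    """Hypothetical container for the three molecule-page components."""
    name: str
    public_data: dict = field(default_factory=dict)    # automated acquisition
    core_links: list = field(default_factory=list)     # resource/data IDs from Cores
    expert_fields: dict = field(default_factory=dict)  # contributions from experts


def assemble_page(name, public_records, core_records, expert_entries):
    """Collect the records that mention `name` from each of the three sources."""
    page = MoleculePage(name=name)
    for rec in public_records:
        if rec["molecule"] == name:
            # Keep one link per public source.
            page.public_data[rec["source"]] = rec["url"]
    for rec in core_records:
        if rec["molecule"] == name:
            page.core_links.append(rec["resource_id"])
    for rec in expert_entries:
        if rec["molecule"] == name:
            page.expert_fields[rec["field"]] = rec["text"]
    return page
```

Keeping the three sources in separate attributes preserves provenance: a reader of the page can tell whether a value was pulled automatically, linked from a Core, or written by an expert.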
Consortium Database Development: Key Components
Consortium DB
– Relational DB, Oracle 9.2
– DB maintenance, backup/security
Software Application
– 3-tier architecture, object-oriented Java code, Servlets, JSPs
User Interface: Data Acquisition
– Web-based forms
– Annotation tools for databasing
User Interface: Dissemination
– Molecule Pages
– Querying and visualization tools
Public Databases
– CBP, Carbohydrates, Glycoenzymes
Security/Access Control
– Authentication of data entry
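To make the 3-tier split and the access-control component concrete, the sketch below separates a data tier, a logic tier that checks who may enter data, and a presentation tier that renders a page. The slide describes the real application as Java with Servlets and JSPs on Oracle; this example is written in Python purely for brevity, and every class, function, and user name in it is a hypothetical stand-in rather than the Consortium's code.

```python
class DataTier:
    """Stand-in for the relational database; here just an in-memory list."""

    def __init__(self):
        self._rows = []

    def insert(self, row):
        self._rows.append(dict(row))

    def select(self, molecule):
        return [r for r in self._rows if r["molecule"] == molecule]


class LogicTier:
    """Business rules: access control and basic validation on data entry."""

    def __init__(self, data, authorized_users):
        self._data = data
        self._authorized = set(authorized_users)

    def submit(self, user, molecule, value):
        # Only authenticated, authorized users may enter data.
        if user not in self._authorized:
            raise PermissionError(f"{user} is not authorized to enter data")
        if not molecule or not value:
            raise ValueError("molecule and value are required")
        self._data.insert({"molecule": molecule, "value": value, "entered_by": user})

    def lookup(self, molecule):
        return self._data.select(molecule)


def render_page(logic, molecule):
    """Presentation tier: format stored rows as plain text."""
    lines = [f"Molecule: {molecule}"]
    for row in logic.lookup(molecule):
        lines.append(f"- {row['value']} (entered by {row['entered_by']})")
    return "\n".join(lines)
```

The presentation function never touches `DataTier` directly; it goes through `LogicTier`, so rules such as authorization live in one place and the storage layer can be swapped without changing how pages are rendered.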
User Interactions
Administrative (A)
– Web site administration and tracking progress
Bioinformatics Core (B)
Resource Generating Cores
– Glycan Synthesis/Protein Expression (D)
– Mouse Transgenics (F)
– Resource tracking, protocols for generating resources
Data Generating Cores
– Glycan Analysis (C)
– Gene Microarray (E)
– Hematology (G)
– Histology (G)
– Immunology (G)
– Metabolism (G)
– GBP-Glycan Interaction (H)
Participating Investigators (over 90)
– Data definitions and requirements are more open-ended compared to Cores
Software Application Development
Implementation
– Application Development: code development, source code control system
– Testing: bug tracking, acceptance testing
User Requirements
– Specify data
– Specify operations
– Specify features
Database Requirements
– Data model, schemas, ontologies
– User interactions
Software Requirements
– Functionalities, error handling, scenarios
Application Release
– Training Phase: user familiarity
– Pre-Production Phase: real user interaction
– Production Phase: feature enhancements
Discussion
Data Acquisition
– Better interfaces such as LIMS to minimize human error during data capture
Annotation/Databasing
– Strategies for annotating data from Participating Investigators, where data definition is more open-ended
Integrated Dissemination of Data
– Strategies for optimally maintaining relational links to the most current information in public databases and integrating it with data generated by the Consortium
Discussion
Data Analysis
– Tools developed by other Glue Grants for analyzing common data sets such as Gene Microarray, MALDI-MS, etc.
Data Mining
– Strategies for finding meaningful patterns across diverse data sets
– Building pathways and networks of interacting components based on the data that best explain a biologically driven hypothesis