NMRbox Data-as-a-Service Overview

Presentation transcript:

NMRbox Data-as-a-Service Overview
Synergy between BMRB & CONNJUR: BMRB handles data archival and retrieval (among other things). CONNJUR's goal is software integration. They have in common the task of data management and interchange.
Diagram labels: data archival and retrieval; software integration; data interchange.
Projects: Analysis-as-a-service

Objectives
1. CONNJUR: capture metadata to save the state of an NMR study.
2. CONNJUR as a deposition engine to BMRB.
3. M2M (machine-to-machine) communication services between NMRbox and BMRB.
Notes: The four aims of TRD2 and how they (a) are related and (b) unify the missions of CONNJUR & BMRB.

Approach: CONNJUR
Workflow Builder: graphical software integration platform for spectral reconstruction.
Spectrum Translator: command-line tool for translating time- and frequency-domain data. Integral component of Workflow Builder.
Sparky "R" Extension: annotation for reproducibility.
NMR-STAR Parser: translation tool.
CONNJUR Database: MySQL database managing the datasets used by Workflow Builder.
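
To make the NMR-STAR Parser item concrete, here is a minimal sketch of reading simple tag/value pairs from NMR-STAR-like text. It is not the CONNJUR parser: the function name, the sample text, and the choice to ignore loops, multi-line values, and save-frame nesting are all assumptions made for illustration.

```python
import shlex


def parse_star_tags(text):
    """Collect simple `_Category.tag value` pairs from NMR-STAR-like text.

    Hypothetical helper: loops, semicolon-delimited multi-line values and
    save-frame nesting are deliberately ignored in this sketch.
    """
    tags = {}
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("_"):
            continue  # skip data_/save_/loop_ keywords and blank lines
        parts = shlex.split(line)  # keeps quoted values such as 'pH 6.5' whole
        if len(parts) == 2:  # a tag followed by exactly one value
            tags[parts[0]] = parts[1]
    return tags


# Illustrative input only; not taken from a real BMRB entry.
example = """
data_demo
save_sample_conditions_1
   _Sample_condition_list.Sf_category  sample_conditions
   _Sample_condition_list.ID           1
   _Sample_condition_list.Details      'pH 6.5, 298 K'
save_
"""

tags = parse_star_tags(example)
```

A production translator would also need to handle loops and multi-line values, which carry most of the spectral and derived data.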

Approach: BMRB
Application Program Interface (API): allows software access to the BMRB database, for both data retrieval and deposition.
Data Format Translators: CONNJUR, NMR-STAR, XML, JSON, NEX.
Data Analysis & Visualization: DEVise visualization tool, libraries in the R language, validation tools.
Deposition Engine: CONNJUR integration; automatic gathering and deposition of data and important metadata, including workflow specs.
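
As a rough illustration of what a data format translator does, the sketch below serializes one flat metadata record to both JSON and XML using only the Python standard library. The record fields and function names are hypothetical and do not reflect the BMRB or CONNJUR schemas.

```python
import json
import xml.etree.ElementTree as ET


def to_json(record):
    """Serialize a flat metadata record as JSON text."""
    return json.dumps(record, indent=2, sort_keys=True)


def to_xml(record, root_name="entry"):
    """Serialize the same record as XML, one child element per field."""
    root = ET.Element(root_name)
    for key, value in record.items():
        child = ET.SubElement(root, key)
        child.text = str(value)
    return ET.tostring(root, encoding="unicode")


# Hypothetical record; field names are assumptions for illustration.
record = {"entry_id": "demo1", "field_strength_mhz": 600, "software": "NMRPipe"}

json_text = to_json(record)
xml_text = to_xml(record)
```

Note that the XML form stores every value as text, so type information (for example, that the field strength is a number) has to be carried by a schema or restored by the reader.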

Workflow Builder

Approach: NMRbox M2M data exchange
[Diagram] Components: NMRbox user, CONNJUR data harvester, CONNJUR database, auto-query generator, API (query and response), BMRB servers.
Data stages shown: NMR spectrometer, time-domain and other files, spectral processing, peak lists, auto assignments, restraints, structure models.
Software shown: NMRPipe, Sparky, ABACUS, TALOS+, CNS.

Content Harvesting for Deposition
[Diagram] Components: NMRbox user, CONNJUR workflow manager, CONNJUR data harvester, deposition constructor, API, BMRB, wwPDB, DRCC.
Data stages shown: NMR spectrometer, time-domain and other files, spectral processing, peak lists, auto assignments, restraints, structure models.
Software shown: NMRPipe, Sparky, ABACUS, TALOS+, CNS.

NMRbox/CONNJUR Deposition Service
[Diagram] Labels: NMR-STAR; CONNJUR; data annotation; raw data; spectral data; derived data; structure & related data; dynamics; chemistry; interactions; metabolomics results.

Approach: NMRbox Data Mining – BMRB Archive Content
Biological NMR & supplemental data
Metadata: chemical structure, natural source, sample, experimental detail.
Imported data: coordinates, restraints, phi-psi angles.
Validation results: LACS, AVS, PANAV, SPARTA+, CING, MolProbity.
Derived data: back-calculated chemical shifts, BLAST alignments.
Data interpretation: citations.
External data links: PDB, UniProt, KEGG, PubChem.

Approach: NMRbox BMRB Data Mining
Exploring the BMRB archive for new knowledge:
Expose the BMRB relational database and additional value-added data for query and analysis from within the NMRbox platform.
Develop information search and analysis tools that encompass the breadth of the BMRB archive.
Brief general examples:
1. Prediction and analysis of intrinsically disordered protein conformational space from NMR spectral parameters and derived data.
2. Search for links between NMR parameters, low-population biopolymer conformers, and biopolymer interactions with other biopolymers and ligands.
3. Extract RNA chemical shifts and statistics for improving automated chemical shift assignment methods and structure analysis.
4. Integration of molecular dynamics simulations with NMR experimental results to understand biopolymer conformational sampling.

Data mining and visualization on BMRB – R libraries: CA-CB chemical shift distribution in BMRB per residue
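
The slide presents this analysis through R libraries; the same idea can be sketched in Python as a per-residue summary of CA and CB chemical shifts. The record layout, the function name, and the shift values below are invented for illustration and are not BMRB data.

```python
from collections import defaultdict
from statistics import mean, stdev


def shift_stats(records):
    """Summarize shifts per (residue, atom) as (mean, std dev, count).

    `records` is an iterable of (residue_type, atom_name, shift_ppm) tuples;
    this layout is an assumption, not the BMRB table structure.
    """
    grouped = defaultdict(list)
    for residue, atom, shift in records:
        grouped[(residue, atom)].append(shift)
    return {
        key: (mean(vals), stdev(vals) if len(vals) > 1 else 0.0, len(vals))
        for key, vals in grouped.items()
    }


# Made-up values for demonstration only.
records = [
    ("ALA", "CA", 52.5), ("ALA", "CA", 53.5),
    ("ALA", "CB", 19.0), ("ALA", "CB", 18.0),
    ("SER", "CA", 58.0),
]

stats = shift_stats(records)
```

With real archive data, the grouped values would feed a histogram or density plot per residue type rather than only a mean and standard deviation.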

Data mining and visualization on BMRB – R libraries Comparing HSQC spectra for homologous entries

Data mining and visualization on BMRB – DEVise Comparing HSQC spectra for homologous entries

Impacts (CONNJUR)
1. Additional metadata is critical to foster reproducibility. It serves a dual purpose, because it also allows us to populate new instances of NMRbox.
2. CONNJUR eases the burden on the NMR community of submitting data to the BMRB. Because CONNJUR can track larger amounts of intricate data than a spectroscopist is likely to be willing to provide, BMRB depositions will be fuller.

Impacts (BMRB)
1. BMRB content relevant to NMRbox users, and possibly unknown to them, will be exposed and presented without requiring user training or knowledge of the BMRB archive architecture or content.
2. New, possibly unexpected correlations between NMRbox user data and the full BMRB archive (experimental, derived and/or predicted, validation, and other kinds of data) will be advanced.
3. Workflow and preservation metadata will be archived for reproducibility.

Thank you! Any questions?

Data mining and visualization on BMRB – R libraries TOCSY EXAMPLE

Personnel
[Organization chart] Sites: UConn Health, Wisconsin.
Areas: Admin, Infra, Train, Dissem, CS, DBPs, TRD1.
People: Hoch, Maciejewski, Schuyler, Gryk, Ulrich, Eghbalnia, Gilman, Gorbatyuk, Moraru, Livny, Maziuk, TBN, TBN1, TBN2, TBN3, TBN4, TBN5.

Metadata Examples for M2M and Data Mining Applications
Categories shown: Mining, Selection, Best practices, Descriptive.
Examples:
Biopolymer sequence, natural source including location
Intermediate data (restraints, chemical shifts, peak lists)
Value-added data (secondary structure elements, physical properties, etc.)
Sample conditions (pH, temperature, pressure, ionic strength)
Validation report content
User process annotations
Software application parameter files
Pulse programs
Spectrometer field strength
Sample contents (buffers, salts, stabilizing agents, others)
Author names
Keywords
User text annotations
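
One way to picture how such metadata could support selection is a small record type plus a filter over sample conditions. This is a sketch under assumptions: the class, its fields, and the selection rule are hypothetical and are not drawn from the CONNJUR or NMR-STAR data models.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class EntryMetadata:
    """Hypothetical subset of the metadata listed on the slide."""
    entry_id: str
    sequence: str
    ph: Optional[float] = None
    temperature_k: Optional[float] = None
    field_strength_mhz: Optional[float] = None
    keywords: List[str] = field(default_factory=list)


def select_entries(entries, ph_range, keyword=None):
    """Keep entries whose pH lies in ph_range and, optionally, carry keyword."""
    low, high = ph_range
    selected = []
    for entry in entries:
        if entry.ph is None or not (low <= entry.ph <= high):
            continue  # entries without a recorded pH cannot be matched
        if keyword is not None and keyword not in entry.keywords:
            continue
        selected.append(entry)
    return selected


# Invented entries for demonstration only.
entries = [
    EntryMetadata("demo-1", "MKV", ph=6.5, keywords=["kinase"]),
    EntryMetadata("demo-2", "GSH", ph=4.0),
    EntryMetadata("demo-3", "AAA"),
]

hits = select_entries(entries, ph_range=(6.0, 7.5))
```

Entries that lack a pH value are excluded rather than guessed at, which is one reason automatically harvested metadata makes this kind of selection more complete.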

Personnel
Personnel   Effort   Role
Gryk        2.4      Co-leader of TRD2; extend CONNJUR data model
Ulrich      0.84
Livny       0.24     Collaborator, systems design
TBN1        9.6      Application architect; CONNJUR software components; Query Engine design
Maziuk      1.2      Systems administration
TBN3        8.4      Researcher/programmer; BMRB software components
TBN5        6        Programmer

CONNJUR Schema Expansion (Aim 2.1)
Current CONNJUR strengths: Spectrometers; Pulse programs; Parameters; Output data; Processing software.
Current NMR-STAR strengths: Citation; Molecular system; Sample; Conditions; Spectral data; Derived data.
Current NEF strengths: Structure software; Input restraints data parameters.
Target: Fully extended CONNJUR schema.

NMR Computational Pipeline
Four broad phases of computation:
1. Spectrometer Acquisition
2. Spectral Reconstruction
3. Spectral Analysis
4. Biophysical Characterization
Notes: The first phase runs on the spectrometer; we don't touch that. The second is handled by CWB (CONNJUR Workflow Builder). The third and fourth are the realm of peak lists, resonances, and spin systems: semi-automated peak picking, assignment, NOE assignment, and structure determination.
[Figure labels: L10, A5, < 5 Å]