1
NMRbox Data-as-a-Service Overview
Diagram labels: data archival and retrieval; software integration; data interchange.
Synergy between BMRB & CONNJUR. BMRB handles data archival & retrieval (among other things). CONNJUR’s goal is software integration. They have in common the task of data management & interchange.
Projects: Analysis-as-a-service
2
Objectives
1. CONNJUR: capture metadata to save the state of an NMR study.
2. CONNJUR as a deposition engine to BMRB.
3. M2M communication services between NMRbox and BMRB.
The four aims of TRD2 & how they (a) are related and (b) unify the missions of CONNJUR & BMRB.
3
Approach: CONNJUR
Workflow Builder: Graphical software integration platform for spectral reconstruction.
Spectrum Translator: Command-line tool for translating time and frequency domain data. Integral component of Workflow Builder.
Sparky “R” Extension: Annotation for reproducibility.
NMR-STAR Parser: Translation tool.
CONNJUR Database: MySQL database managing datasets used by Workflow Builder.
4
Approach: BMRB
Application Program Interface (API): Allows for software access to the BMRB database, both for data retrieval and deposition.
Data Format Translators: CONNJUR, NMR-STAR, XML, JSON, NEX.
Data Analysis & Visualization: DEVise visualization tool, libraries in R language, validation tools.
Deposition Engine: CONNJUR integration, automatic gathering and deposition of data and important metadata, including workflow specs.
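To make the idea of a data format translator concrete, here is a minimal sketch that turns simple NMR-STAR tag/value pairs into JSON. It is not the BMRB or CONNJUR translator: the function name `star_tags_to_dict` and the sample text are hypothetical, loops and multi-line values are deliberately skipped, and quoting is approximated with shell-style rules, which differ from the full NMR-STAR grammar.

```python
import json
import shlex

# Illustrative input only; not taken from a real BMRB entry.
SAMPLE = """
data_example
save_entry_information
   _Entry.Sf_category   entry_information
   _Entry.ID            example
   _Entry.Title         'Example protein'
   loop_
      _Entry_author.Family_name
      Doe
   stop_
save_
"""


def star_tags_to_dict(text):
    """Collect single-line `_Category.tag value` pairs per saveframe.

    Simplified subset: loop_ ... stop_ blocks and multi-line values are ignored.
    """
    frames = {}
    current = None      # tag dictionary of the saveframe being read, if any
    in_loop = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith("save_"):
            name = line[len("save_"):]
            # `save_name` opens a saveframe; a bare `save_` closes it.
            current = frames.setdefault(name, {}) if name else None
            continue
        if line == "loop_":
            in_loop = True
            continue
        if line == "stop_":
            in_loop = False
            continue
        if in_loop or current is None:
            continue
        if line.startswith("_"):
            parts = shlex.split(line)   # shell-style quoting, an approximation
            if len(parts) == 2:
                current[parts[0]] = parts[1]
    return frames


frames = star_tags_to_dict(SAMPLE)
print(json.dumps(frames, indent=2))
```

A production translator would also need to handle loops, semicolon-delimited text blocks, and the dictionary-defined data types.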
5
Workflow Builder
6
Approach: NMRbox M2M data exchange
Diagram components: API; query; response; BMRB servers; auto-query generator; NMRbox user; CONNJUR database; CONNJUR data harvester.
Data products: time-domain and other files; spectral processing; peak lists; auto assignments; restraints; structure models.
Sources and software: NMR spectrometer; NMRPipe; Sparky; ABACUS; TALOS+; CNS.
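The auto-query generator in this slide can be pictured as code that turns harvested metadata into a machine-to-machine request. The sketch below is only an illustration under stated assumptions: the endpoint URL, the parameter names, and the `build_query` helper are all hypothetical and do not describe the actual BMRB API.

```python
from urllib.parse import urlencode

# Hypothetical placeholder endpoint; the real BMRB API URL and parameters differ.
BASE_URL = "https://example.org/bmrb-query"


def build_query(harvested, fields=("sequence", "ph", "temperature_k")):
    """Build a query URL from the subset of harvested metadata that is present."""
    params = {k: harvested[k] for k in fields if harvested.get(k) is not None}
    # Sort the parameters so the same metadata always yields the same URL.
    return f"{BASE_URL}?{urlencode(sorted(params.items()))}"


# Illustrative metadata as a data harvester might hand it over (assumed field names).
harvested = {
    "sequence": "MKTAYIAK",
    "ph": 6.5,
    "temperature_k": 298,
    "operator": None,   # missing values are simply left out of the query
}

url = build_query(harvested)
print(url)
```

Keeping the query builder separate from the transport layer means the same harvested record could be sent over whichever protocol the service ends up exposing.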
7
Content Harvesting for Deposition
Diagram components: BMRB; deposition constructor; API; NMRbox user; wwPDB; CONNJUR data harvester; DRCC; CONNJUR workflow manager.
Data products: time-domain and other files; spectral processing; peak lists; auto assignments; restraints; structure models.
Sources and software: NMR spectrometer; NMRPipe; Sparky; ABACUS; TALOS+; CNS.
8
NMRbox/CONNJUR Deposition Service
Diagram labels: Dynamics; Chemistry; Interactions; NMR-STAR; Raw data; Spectral data; Derived data; Data annotation; CONNJUR; Structure & related data; Metabolomics results.
9
Approach: NMRbox Data Mining – BMRB Archive Content
Biological NMR & supplemental data
Metadata: chemical structure, natural source, sample, experimental detail
Imported data: coordinates, restraints, phi-psi angles
Validation results: LACS, AVS, PANAV, SPARTA+, CING, MolProbity
Derived data: back calculated chemical shifts, BLAST alignments
Data interpretation: citations
External data links: PDB, UniProt, KEGG, PubChem
10
Approach: NMRbox BMRB Data Mining
Exploring the BMRB archive for new knowledge
Expose the BMRB relational database and additional value-added data for query and analysis from within the NMRbox platform.
Develop information search and analysis tools that encompass the breadth of the BMRB archive.
Brief general examples:
Prediction and analysis of intrinsically disordered protein conformational space from NMR spectral parameters and derived data.
Search for links between NMR parameters, low-population biopolymer conformers, and biopolymer interactions with other biopolymers and ligands.
Extract RNA chemical shifts and statistics for improving automated chemical shift assignment methods and structure analysis.
Integration of molecular dynamics simulations with NMR experimental results to understand biopolymer conformational sampling.
11
Data mining and visualization on BMRB – R libraries
CA-CB chemical shift distribution in BMRB per residue
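The slide uses R libraries for this plot; purely as an illustration of the underlying aggregation, here is a Python sketch. It assumes the quantity of interest is the CA minus CB shift difference per residue type (the slide may instead show a joint CA/CB distribution), and the records are toy values, not BMRB data. The function name and the record layout are hypothetical.

```python
from collections import defaultdict
from statistics import mean, pstdev

# Toy records: (entry_id, sequence position, residue type, atom, shift in ppm).
# Values are illustrative only and are not taken from the BMRB archive.
SHIFTS = [
    ("toy1", 1, "ALA", "CA", 53.0),
    ("toy1", 1, "ALA", "CB", 19.0),
    ("toy1", 2, "SER", "CA", 58.5),
    ("toy1", 2, "SER", "CB", 64.0),
    ("toy2", 1, "ALA", "CA", 52.0),
    ("toy2", 1, "ALA", "CB", 18.0),
    ("toy2", 3, "GLY", "CA", 45.0),   # no CB, so it drops out of the statistics
]


def ca_cb_difference_by_residue(shifts):
    """Return {residue type: (mean, population std dev, count)} of CA - CB."""
    per_site = defaultdict(dict)   # (entry, position, residue) -> {atom: ppm}
    for entry_id, seq_id, residue, atom, ppm in shifts:
        if atom in ("CA", "CB"):
            per_site[(entry_id, seq_id, residue)][atom] = ppm

    diffs = defaultdict(list)
    for (_, _, residue), atoms in per_site.items():
        if "CA" in atoms and "CB" in atoms:   # need both atoms at the same site
            diffs[residue].append(atoms["CA"] - atoms["CB"])

    return {res: (mean(v), pstdev(v), len(v)) for res, v in diffs.items()}


stats = ca_cb_difference_by_residue(SHIFTS)
for residue, (avg, sd, n) in sorted(stats.items()):
    print(f"{residue}: mean={avg:.2f} ppm, sd={sd:.2f} ppm, n={n}")
```

Grouping by site before differencing avoids pairing a CA from one residue with a CB from another.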
12
Data mining and visualization on BMRB – R libraries
Comparing HSQC spectra for homologous entries
13
Data mining and visualization on BMRB – DEVise
Comparing HSQC spectra for homologous entries
14
Impacts (CONNJUR)
1. Additional metadata is critical to foster reproducibility. It serves the dual purpose of also allowing us to populate new instances of NMRbox.
2. Eases the burden on the NMR community of submitting data to the BMRB. Because CONNJUR can track larger amounts of intricate data than the spectroscopist is likely to be willing to provide, BMRB depositions will be fuller.
15
Impacts (BMRB)
1. BMRB content relevant to NMRbox users, and possibly unknown to them, will be exposed and presented without requiring user training or knowledge of the BMRB archive architecture or content.
2. New, possibly unexpected correlations between NMRbox user data and the full BMRB archive (experimental, derived and/or predicted, validation, and other kinds of data) will be brought forward.
3. Workflow and preservation metadata will be archived for reproducibility.
16
Thank you! Any questions?
17
Data mining and visualization on BMRB – R libraries
TOCSY EXAMPLE
18
Personnel (organization chart)
Sites: UConn Health; Wisconsin
Units: Admin; Infra; Train; Dissem; CS; DBPs; TRD1
Names: Hoch; Maciejewski; Schuyler; Gryk; Ulrich; Eghbalnia; Gilman; Gorbatyuk; Moraru; Livny; Maziuk; TBN; TBN1; TBN2; TBN3; TBN4; TBN5
19
Metadata Examples for M2M and Data Mining
Applications: Mining; Selection; Best practices; Descriptive
Metadata examples:
Biopolymer sequence, natural source including location
Intermediate data (restraints, chemical shifts, peak lists)
Value added data (secondary structure elements, physical properties, etc.)
Sample conditions (pH, temperature, pressure, ionic strength)
Validation report content
User process annotations
Software application parameter files
Pulse programs
Spectrometer field strength
Sample contents (buffers, salts, stabilizing agents, others)
Author names
Keywords
User text annotations
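Sample conditions (pH, temperature, pressure, ionic strength) appear among the metadata examples, and selection is one of the listed applications. As a hedged illustration of how such metadata might drive selection, the sketch below filters entries whose conditions are close to a user's sample. The `SampleConditions` type, the field names, the tolerances, and the entries themselves are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SampleConditions:
    """Hypothetical record of sample conditions attached to an entry."""
    entry_id: str
    ph: float
    temperature_k: float


def select_similar(entries, target, ph_tol=0.5, temp_tol=5.0):
    """Return entries whose pH and temperature lie within the given tolerances."""
    return [
        e for e in entries
        if e.entry_id != target.entry_id
        and abs(e.ph - target.ph) <= ph_tol
        and abs(e.temperature_k - target.temperature_k) <= temp_tol
    ]


# Toy entries for illustration; not real archive records.
archive = [
    SampleConditions("a", ph=6.4, temperature_k=298.0),
    SampleConditions("b", ph=7.5, temperature_k=298.0),   # pH too far off
    SampleConditions("c", ph=6.8, temperature_k=310.0),   # temperature too far off
]
user_sample = SampleConditions("user", ph=6.5, temperature_k=298.0)

matches = [e.entry_id for e in select_similar(archive, user_sample)]
print(matches)
```

The tolerance defaults are arbitrary; in practice they would depend on how sensitive the observable of interest is to each condition.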
20
Personnel
Personnel   Effort   Role
Gryk        2.4      Co-leader of TRD2; extend CONNJUR data model
Ulrich      0.84
Livny       0.24     Collaborator (systems design)
TBN1        9.6      Application architect; CONNJUR software components; Query Engine design
Maziuk      1.2      Systems administration
TBN3        8.4      Researcher/programmer; BMRB software components
TBN5        6        Programmer
21
CONNJUR Schema Expansion (Aim 2.1)
Current CONNJUR strengths: Spectrometers; Pulse programs; Parameters; Output data; Processing software
Current NMR-STAR strengths: Citation; Molecular system; Sample; Conditions; Spectral data; Derived data
Current NEF strengths: Structure software; Input restraints data parameters
Fully extended CONNJUR schema
22
NMR Computational Pipeline
Phases: 1. Spectrometer Acquisition; 2. Spectral Reconstruction; 3. Spectral Analysis; 4. Biophysical Characterization
Figure annotation: L10, A5 < 5 Å
Four broad phases of computation. The 1st is on the spectrometer; we don’t touch that. The 2nd is handled by CWB. The 3rd & 4th are the realm of peak lists, resonances, and spin systems: semi-automated peak pickers, assignment, NOE assignment & structure determination.