PepcDB Reporting at CESG: More Trials and Fewer Tribulations PPCW Bottlenecks Meeting 20 March 2007 Craig A. Bingman (U54 GM074901-01.

Presentation transcript:

PepcDB Reporting at CESG: More Trials and Fewer Tribulations PPCW Bottlenecks Meeting 20 March 2007 Craig A. Bingman (U54 GM P50 GM JLM, P.I.)

CESG Bioinformatics
George N. Phillips Jr.: Faculty Executive
Craig Bingman: Section leader
Xiaokang Pan: PepcDB, domains
Gary Wesenberg: Scoring, RT, PDB
Bryan Ramirez: System administrator
Tony Kamenick: Assistant sysadmin
Sesame
John L. Markley: CESG P.I.
Zsolt Zolnai: Sesame Project Management
John Primm: Project Manager
David Aceti: QA, Sesame “Lab Master”
All CESG Team Members

TargetDB vs. PepcDB
TargetDB was conceived early/pre-PSI-1 as a mechanism for avoiding duplication of effort between structural genomics centers.
–Asynchronous communication between centers and NIH.
TargetDB communicates project status of target only.
TargetDB is single-threaded.
TargetDB was not meant to communicate information to the outside scientific community.
PepcDB was conceived as a mechanism for communication of scientific details between centers and the outside world.
–Asynchronous communication with the outside world.
PepcDB communicates target status, protocols and timeline of efforts.
PepcDB is multi-threaded.
PepcDB is a contractual obligation for all PSI-2 centers.
Along with structures deposited in PDB, and the materials repository, PepcDB will be one of the enduring legacies of PSI.

CESG PepcDB, Past and Present
Year 2-3 data
–Successful implementation of Sesame (hierarchical relationships between db items).
–TargetDB-centric, single-threaded view.
–Targets were constrained to exist in one workgroup from selection to structure solution.
–Protocols were primitive.
Year 4-5 data
–Protocols became more descriptive.
–Protocols described multiple pipeline stages.
–Targets moved through multiple workgroups.
–Pipeline was assumed to move unidirectionally from Selection -> Deposition.
PSI-2 data
–Atomic protocols describing single pipeline stage.
–Pipeline is multipass, multithreaded, characterized by extensive salvage.
–Targets move back to vector selection, from initial selection, PCR and entry vector.
–Pipeline is non-deterministic, adaptive, dynamic to maximize success.

Failure of CESG PepcDB, Mark 1
Codebase had grown by accretion, not design.
Code assumed linear, forward progression through pipeline stages.
More than half of the code was devoted to data entry error trapping/handling.
Global reset was required to handle new pipeline practices, dominated by multipath cloning strategy, multipath expression strategy, salvage-intensive operation.
New conceptualization of our PepcDB reporting was required.
Core concept: Well-formed PepcDB = finite, directed, acyclic graph.
–Database items = nodes
–Directed links = edges
Data in Sesame needed to be corrected.
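
The core concept on this slide, a well-formed report as a finite directed acyclic graph, can be checked mechanically. What follows is a minimal sketch and not CESG's code: the function name `is_acyclic`, the item labels, and the representation of links as (parent, child) pairs are all assumptions made for illustration. It uses Kahn's topological-sort algorithm: if every node can be removed in dependency order, the graph has no cycle.

```python
from collections import defaultdict, deque


def is_acyclic(edges):
    """Return True if the directed graph given as (parent, child) pairs has no cycle."""
    children = defaultdict(list)   # node -> list of direct children
    indegree = defaultdict(int)    # node -> number of incoming links
    nodes = set()
    for parent, child in edges:
        children[parent].append(child)
        indegree[child] += 1
        nodes.update((parent, child))

    # Start from nodes with no incoming links (hypothetically, selected targets).
    queue = deque(n for n in nodes if indegree[n] == 0)
    removed = 0
    while queue:
        node = queue.popleft()
        removed += 1
        for child in children[node]:
            indegree[child] -= 1
            if indegree[child] == 0:
                queue.append(child)

    # Any node left over sits on, or downstream of, a cycle.
    return removed == len(nodes)


# Hypothetical item labels: a branching pipeline is fine, a loop is not.
branching = [("target", "clone_a"), ("target", "clone_b"), ("clone_a", "expression")]
looping = [("target", "clone_a"), ("clone_a", "expression"), ("expression", "target")]
print(is_acyclic(branching), is_acyclic(looping))
```

A check of this kind would flag links that were entered backwards or that close a loop, which is one way data needing correction could be found.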

Visualization Tool for Graphs
dot, a language for describing graphs
dot has a very simple syntax:
digraph G { A -> B -> D; A -> C; }
dot has powerful layout minimizers to display hierarchical graphs.
Implementations are available for Perl, Python, Java, others.
CESG has used the Perl variant of dot/Graphviz to produce plots of linkages between database items.
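
Because the dot syntax is so small, a graph description can be produced with plain string formatting. The sketch below is an illustration in Python rather than the Perl tooling the slide mentions; the function name `to_dot` and the edge-list input are assumptions. It quotes every node identifier, which dot accepts, so that labels containing spaces or punctuation remain valid.

```python
def to_dot(edges, name="G"):
    """Render (parent, child) pairs as a dot digraph description."""
    def quote(label):
        # dot allows double-quoted identifiers; embedded quotes are escaped.
        return '"' + str(label).replace('"', '\\"') + '"'

    lines = [f"digraph {name} {{"]
    for parent, child in edges:
        lines.append(f"    {quote(parent)} -> {quote(child)};")
    lines.append("}")
    return "\n".join(lines)


# The example graph from the slide: A -> B -> D and A -> C.
print(to_dot([("A", "B"), ("B", "D"), ("A", "C")]))
```

Saved to a file, the output can be rendered with the Graphviz command line, for example `dot -Tpng graph.dot -o graph.png`.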

digraph G { A -> B; A -> C -> D; }
digraph G { A -> B -> D; A -> C -> D; }
digraph G { A -> B -> D; D -> A; }

CESG PepcDB Stats
Protocols: 68
Targets: 7553
Trials: 14044
Protocol Instances: 57195
Each target has on average two trials.
Each trial has on average about four protocols.
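
The two averages follow directly from the counts on this slide. As a quick worked check (variable names are illustrative only):

```python
# Counts as reported on the slide.
targets = 7553
trials = 14044
protocol_instances = 57195

trials_per_target = trials / targets                # roughly two trials per target
protocols_per_trial = protocol_instances / trials   # roughly four protocol instances per trial

print(f"{trials_per_target:.2f} trials per target")
print(f"{protocols_per_trial:.2f} protocol instances per trial")
```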

PSI PepcDB Toolkit
Project database capable of establishing hierarchical relationships between units of work.
Establish master database that manages unique keys for work units.
Implement barcodes (e.g. ZPL) that extend database to physical items.
Implement atomic protocols and associated actions.
Develop tool set for visualizing data.
Develop code capable of assembling lists of parent-child units of work, protocols, actions.
Rehearse data entry prior to pipeline implementation of new techniques.
Reach project-wide agreement on definition of actions and how to link units of work.
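
One toolkit item, assembling lists of parent-child units of work, amounts to a graph traversal from a starting item. The following is a hedged sketch only: the function name `lineage`, the identifiers such as `target-1`, and the (parent, child) link format are hypothetical and do not reflect the Sesame schema. It walks the links depth-first and records each unit of work once, with its depth below the starting item.

```python
from collections import defaultdict


def lineage(links, root):
    """List (depth, item) pairs for root and everything descended from it."""
    children = defaultdict(list)
    for parent, child in links:
        children[parent].append(child)

    ordered, seen = [], set()
    stack = [(root, 0)]
    while stack:
        item, depth = stack.pop()
        if item in seen:
            continue  # an item reachable by two paths is listed once
        seen.add(item)
        ordered.append((depth, item))
        # Push in reverse so children are visited in their original order.
        for child in reversed(children[item]):
            stack.append((child, depth + 1))
    return ordered


# Hypothetical units of work: one target, two clones, one expression trial.
links = [("target-1", "clone-1"), ("target-1", "clone-2"), ("clone-1", "expr-1")]
for depth, item in lineage(links, "target-1"):
    print("  " * depth + item)
```

The `seen` set also keeps the loop from running forever if the links accidentally contain a cycle.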

Future
Push towards zero errors in PSI-2 PepcDB.
Continue correcting PSI-1 data.
Implement data visualization tools in Sesame.
Expand the scope of data reported to PepcDB.
–Report all crystallization trials (year 5 -> now).
–Consolidate and report data for new tags (elemental analysis, mass, etc.).
Switch over to Sesame for PepcDB report generation.