Worldwide Protein Data Bank www.wwpdb.org wwPDB Common D&A Project January 28, 2010 Steering Committee Project Update.



Worldwide Protein Data Bank Common D&A Project January 2010 Update

Update report
• Status of D&A initial production deliverable:
  – Sequence Editor tool development
  – Integration within existing pipelines
• Status of WF infrastructure initial implementation:
  – Sequence Processing components (external search, internal analysis, etc.) integrated by WF engine and manager into the "new" Sequence Processing Module
  – Integration of Sequence Processing Module into existing pipeline
RECONSIDER Timeline Estimate and Strategy
• Next Phase
  – Ligand Processing: Planning

Overview of deliverable status for: Sequence Editor tool
Deliverable timelines have been extended to enable a full response to user testing input (expanded requirements) and to ensure development to the agreed-upon design.
• Completion of Interface with additional prioritized requirements - projected Feb 15
• Integration within current production pipelines
  – Initial implementation of Master Format and format conversion support
• In use by annotators by Feb 25

Sequence Editor Tool Technologies and Standards
• Model View Controller (MVC) design
  – Separates data/application from presentation as much as possible
• Client/Server protocol
  – AJAX using JSON protocol
  – REST style service definitions
• Server
  – Apache with embedded WSGI (mod_wsgi)
• Application
  – Python with C++ extensions (Boost/Python)
All the good acronyms!
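The stack listed above (a Python application behind mod_wsgi, serving JSON to an AJAX client through REST-style URLs) can be illustrated with a minimal sketch. This is not the project's code: the `/sequences/<entry_id>` route, the `SEQUENCES` store, and the `call` helper are hypothetical names chosen for the example. Only the WSGI calling convention and the `application` callable name, which mod_wsgi looks for by default, are standard.

```python
import json

# Hypothetical in-memory "model": sequence records keyed by entry ID.
SEQUENCES = {"1ABC": {"chain": "A", "sequence": "MKTAYIAKQR"}}


def application(environ, start_response):
    """WSGI entry point that returns JSON for GET /sequences/<entry_id>."""
    method = environ.get("REQUEST_METHOD", "GET")
    parts = [p for p in environ.get("PATH_INFO", "").split("/") if p]

    if method != "GET":
        status, payload = "405 Method Not Allowed", {"error": "GET only"}
    elif len(parts) == 2 and parts[0] == "sequences" and parts[1] in SEQUENCES:
        status, payload = "200 OK", {"id": parts[1], **SEQUENCES[parts[1]]}
    else:
        status, payload = "404 Not Found", {"error": "unknown resource"}

    body = json.dumps(payload).encode("utf-8")
    headers = [
        ("Content-Type", "application/json"),
        ("Content-Length", str(len(body))),
    ]
    start_response(status, headers)
    return [body]


def call(path, method="GET"):
    """Invoke the app in-process, the way a WSGI server would (test helper)."""
    captured = {}

    def start_response(status, headers):
        captured["status"] = status

    environ = {"REQUEST_METHOD": method, "PATH_INFO": path}
    body = b"".join(application(environ, start_response))
    return captured["status"], json.loads(body)
```

Keeping the data in a separate structure from the request handler mirrors the MVC separation named on the slide: the browser-side view only ever sees the JSON payload.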

Sequence Editor Tool Architecture for Current and Future Deployment
[Diagram. Components: Sequence Data Store; Current DP Pipeline; Future Workflow DP Pipeline; WFE/WFM; Sequence Editor Tool; Annotated Sequence Data. Format labels: PDB/FASTA, PDBx/PreBlast, PDB/PDBx.]

Accomplishments
• Annotator graphical interface for Sequence Editing
  – Prototype evaluation and prioritization of additional requirements by Annotators at all sites completed Jan 12
  – Expanded functionality development expected to be completed and available for user testing Feb 15, including:
    • Capability to incrementally undo a process step (UNDO)
    • Summarization of sequence conflicts
    • Global editing features
• Integration of this Sequence Editor tool (interface) into the existing data processing pipelines (Feb 26)
  – Input accepts existing sequence data files at PDBe and RCSB (e.g. PDBx + Blast report or PDB + FASTA)
  – Output via an intermediate file, to be integrated through Maxit
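The incremental UNDO capability listed above can be sketched as a history stack of earlier states. The slides do not describe how the Sequence Editor implements it, so the class and method names below (`SequenceEditSession`, `replace_residue`, `undo`) are hypothetical, and a snapshot-per-edit strategy is assumed purely for illustration.

```python
class SequenceEditSession:
    """Holds a sequence and a stack of earlier states for step-by-step undo."""

    def __init__(self, sequence):
        self._sequence = sequence
        self._history = []  # earlier states, most recent last

    @property
    def sequence(self):
        return self._sequence

    def replace_residue(self, index, residue):
        """Replace one residue, saving the previous state first."""
        if not 0 <= index < len(self._sequence):
            raise IndexError("residue index out of range")
        self._history.append(self._sequence)
        self._sequence = (
            self._sequence[:index] + residue + self._sequence[index + 1:]
        )

    def undo(self):
        """Revert the most recent edit; return False if nothing is left to undo."""
        if not self._history:
            return False
        self._sequence = self._history.pop()
        return True
```

Each call to `undo` steps back exactly one edit, so repeated calls walk the session back to the sequence as originally loaded.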

Accomplishments
• Master Format implementation (for current data model)
  – PDB to Master Format translation working with MAXIT
    • Final test at PDBe
  – Validation and testing at all sites
  – PDBj creation of new tool for Master Format validation with extended diagnostics
  – Issues with Master Format will be ongoing, with evolution of the PDB format, hybrid methods, etc.

Sequence Editor Tool Development: Lessons Learned
• Iterative development and active Annotator involvement are essential, and take time.
• Addressing integration issues with existing systems in terms of modularity, process ordering and data availability poses significant challenges.
• An agile process of development and planning supports adaptation to evolving requirements.
• In future planning, we will need to further consider the most efficient level of granularity for deploying new functionality in existing systems.

Design Convergence Accomplishments
Master Format, API, WFM, WFE, UI
Distributed development on a complex project is challenging
Tag-team development of WFE and APIs
• Straw-man articulation: flesh out WFE/API requirements for representative Use Cases
• WFE pseudocode developed against the straw men
• API integration layer will be developed against this pseudocode
• WFE will then be implemented against the API

Accomplishments: WF infrastructure - Integration of Sequence Processing
• Tracking and Status DB developed and installed at RCSB and PDBe for development purposes
• Work Flow Manager (WFM)
  – Prototype user testing ongoing
  – Requirements refined and prototype updated
  – Infrastructure complete; to be deployed for testing this week
• Work Flow Manager User Interface (WFM UI)
  – User prototype created, input received and prototype enhanced
  – Initial Level 1 annotator interface signed off by annotators
  – Level 2/3/4 interfaces prototyped and under review
  – Level 3/4 under further development

PDBe Resources
• Workflow XML
  – Luana/Tom: 1 day total to complete annotator requirements
• WFE component supporting Sequence Processing
  – Tom: 1-2 days per week ongoing, estimating 5-6 days (3 actual weeks) to complete after all APIs are in place
• WFM
  – Luana: currently full time; work is being prioritised to define the subset of requirements to be delivered in March
• Web resources: interfaces and WFM
  – External services: technology requirements have been defined. Timeline TBD. Critical path.
• Other resources
  – Wim: Python expertise
  – Swanand: Python expertise (after 13th Feb), fall-back

RCSB Resources
• Web Tools
  – Currently supporting development and alpha-testing sites
  – Will add production site for Feb deployment
• Database Support
  – MySQL database server for status and tracking database
• Application Support
  – Project SVN code repository
  – JIRA issue tracking system
  – Project documentation and information site (Drupal)
  – Automated build system for API and application tools
• People
  – Vladimir: API and build system (Python/C++)
  – Li: DB system and status and tracking API (Python/SQL)
  – Rahip: Sequence Editor Tool (JavaScript/CSS)
  – Zukang/Raul/John: DP applications (C++/Python)

Updated Timeline Summary: Sequence Processing
1. Sequence Editor Tool
  – Completion of Interface with prioritized additional requirements and beginning of final user testing - projected Feb 15
  – Integration with current pipelines using Master Format: in test by annotators by Feb 25
  – In production: best estimate early March
2. Integration of Sequence Processing components with new architecture (WFE/API and WFM)
  – User testing: April
3. Integration of module into Pipeline
  – Plan by end of March

Competing/Complementary Priorities
• Address ongoing data quality issues and remediation
• Three Validation task forces
  – Implementation of recommendations
• New PDB Format: within the next 6 months?
• De-programming Kim
  – For Ligand Processing: timeline end of March to early April
Other strategic considerations
• Stakeholders
  – Stress testing of new solutions against expectations and existing solutions must be managed and will take some time.

Next Phase - Timeline: Ligand Processing
• Requirements
  – Plans in place for Annotator exchange
  – March: requirements consolidation, initial design plan
  – March: create overview plan and initial timeline
• Kick off development
• Deployment
  – Strategy to be defined based on current and ongoing lessons learned

Things that have kept us up at night
• These are cornerstone deliverables requiring intense study and design consideration, beyond the proof of concept.
  – Organization of data, communication protocols, etc.
  – Clear consensus on design features has required an evolution of understanding, which required getting our hands wet
• Ramp-up of skill sets: Python, mmCIF (PDBe)
• EBI external services: web-service setup
• Site-specific integration challenges
• Resource issues

BACK UP SLIDES

Data and Application API Design
• Unified Python language implementation
• Provides all access to data and applications for the workflow manager and workflow engine
• Subcomponents of the API provide access to:
  – Data objects and data values
  – Applications and tools
  – Tracking and status information
  – Site-level configuration information
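One way to picture the four subcomponents above is a single facade object that the workflow manager and workflow engine both go through. The sketch below is an assumption-laden illustration, not the project's API: every class, method, and identifier (`WorkflowAPI`, `DataAccess`, `D_0001`, and so on) is hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class DataAccess:
    """Data objects and data values, keyed by (entry ID, item name)."""
    values: dict = field(default_factory=dict)

    def set_value(self, entry_id, name, value):
        self.values[(entry_id, name)] = value

    def get_value(self, entry_id, name):
        return self.values[(entry_id, name)]


@dataclass
class ApplicationAccess:
    """Applications and tools, registered by name and run on demand."""
    tools: dict = field(default_factory=dict)

    def register(self, name, func):
        self.tools[name] = func

    def run(self, name, *args):
        return self.tools[name](*args)


@dataclass
class StatusAccess:
    """Tracking and status information per entry."""
    status: dict = field(default_factory=dict)

    def set_status(self, entry_id, state):
        self.status[entry_id] = state

    def get_status(self, entry_id):
        return self.status.get(entry_id, "unknown")


@dataclass
class SiteConfig:
    """Site-level configuration information."""
    settings: dict = field(default_factory=dict)


@dataclass
class WorkflowAPI:
    """Single entry point bundling the four subcomponents."""
    data: DataAccess = field(default_factory=DataAccess)
    applications: ApplicationAccess = field(default_factory=ApplicationAccess)
    status: StatusAccess = field(default_factory=StatusAccess)
    site: SiteConfig = field(default_factory=SiteConfig)
```

A workflow step would then read a value through `api.data`, run a tool through `api.applications`, and record the outcome through `api.status`, without touching storage or site details directly.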

Deliverable update: WFM Design
Functional architectural design
• Will present progress and tracking information
• Will start, stop and restart the workflow engine in executing data processing tasks
• Will work in a fully distributed web-based mode
• Will provide a launch point for tasks requiring interactive or graphical interactions
Two modes defined:
  – Immediate mode: all processing occurs in a single session (simple case)
  – Deferred mode: requests for input are registered with the workflow manager for later processing by an annotator
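The difference between the two modes can be sketched with a small queue. This is only an illustration under stated assumptions: the `WorkflowManager` class, its methods, and the idea of passing a `responder` callable to signal an interactive session are all hypothetical, not taken from the WFM design.

```python
from collections import deque


class WorkflowManager:
    """Routes requests for annotator input in immediate or deferred mode."""

    def __init__(self):
        self._pending = deque()  # deferred requests, oldest first

    def request_input(self, task_id, prompt, responder=None):
        """Answer now if a responder is present, otherwise register for later.

        Immediate mode: a responder (the annotator's live session) is
        supplied, so the answer is returned within the same call.
        Deferred mode: no responder, so the request is queued and None
        is returned.
        """
        if responder is not None:
            return responder(prompt)
        self._pending.append((task_id, prompt))
        return None

    def pending(self):
        """Return the registered requests that still await an annotator."""
        return list(self._pending)

    def process_next(self, responder):
        """Let an annotator answer the oldest deferred request."""
        task_id, prompt = self._pending.popleft()
        return task_id, responder(prompt)
```

In this sketch a deferred request survives until `process_next` is called, which is the property that lets an annotator pick work up in a later session.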

Process Overview With GO BACK functionality