INTERPROSCAN 5: Analyses, Architecture and JMS


Introduction to InterProScan: automatic annotation of protein sequence
Diagram labels: Protein Sequence; Predictive Models; Analysis algorithm; Reported Matches.

Introduction to InterProScan: automatic annotation of protein sequence
Diagram labels: Protein Sequence; Predictive Models; Analysis algorithm; “Raw” Matches; Filtering algorithm; Reported Matches.

Scale problem: computational load
>25 million protein sequences in UniParc.
Single set of models, e.g. TIGRFAM.
Run the analysis using HMMER 2 on a single desktop PC? No chance: it would take several years to run to completion.

Scale problem: complexity (this is just a sub-set!)
Diagram labels:
  Flow: sequence; Raw matches; Filtered matches.
  Algorithms: HMMER 2; HMMER 3.
  Member databases: Pfam; Gene3D; SMART; SUPERFAMILY; TIGRFAM; PIRSF; PANTHER.
  Filtering: GA cut-off; TC cut-off; E-value cut-off; clan; nested; threshold (kinase); domainFinder; pirsf; pantherScore; assignment.
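To make the "raw matches, then filtered matches" idea concrete, here is a minimal sketch of one of the simpler filters named on the slide, an E-value cut-off. The `RawMatch` class, its fields and the cut-off value are assumptions made for illustration; they are not InterProScan's actual classes or thresholds.

```java
import java.util.ArrayList;
import java.util.List;

class MatchFilterSketch {
    // Hypothetical raw match: one model hit on one protein, with its E-value.
    static final class RawMatch {
        final String proteinId;
        final String modelId;
        final double eValue;

        RawMatch(String proteinId, String modelId, double eValue) {
            this.proteinId = proteinId;
            this.modelId = modelId;
            this.eValue = eValue;
        }
    }

    // Keep only the raw matches whose E-value is at or below the cut-off.
    static List<RawMatch> filter(List<RawMatch> raw, double eValueCutOff) {
        List<RawMatch> reported = new ArrayList<>();
        for (RawMatch match : raw) {
            if (match.eValue <= eValueCutOff) {
                reported.add(match);
            }
        }
        return reported;
    }

    public static void main(String[] args) {
        List<RawMatch> raw = List.of(
                new RawMatch("P1", "MODEL_A", 1e-30),
                new RawMatch("P1", "MODEL_B", 0.5),
                new RawMatch("P2", "MODEL_A", 1e-5));
        List<RawMatch> reported = filter(raw, 1e-3); // illustrative cut-off only
        System.out.println(reported.size() + " of " + raw.size() + " raw matches reported");
    }
}
```

The other filters on the slide (GA and TC cut-offs, clan handling, and so on) are member-database specific and would each need their own rule in place of the single comparison used here.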

InterProScan 5: Why build another one?

InterPro internal analysis pipeline (Onion):
  Java
  Not portable
  Legacy architecture / code
  Matches stored: UniParc, all member DBs

InterProScan 4.0:
  Perl
  Portable
  Some problems with local configuration
  Not modular
  Lack of resource for maintenance

80% overlap in functionality between the two.

InterProScan 5.0:
  Maintainable
  Easy to add new model sets
  Modular architecture
  Back-end for new InterPro web site
  Consistent results
  Release developer time
  Reliable / auditable
  No redundant calculations
  Incorporate new data model / XML exchange format
  Easy to port on to different architectures: single machine, simple LAN, LSF, PBS, Sun Grid Engine... cloud? GRID?
  Supports: Onion & InterProScan 4.0 functionality; metagenomic data analysis; genomic sequence analysis (ORF prediction etc.)

Design for modularity: ease of maintenance
Layers in the diagram:
  Job Management Layer: scheduling analyses
  “Business Logic” Layer: performing analyses
  Data Access Layer: database I/O
  Input / Output Layer: file I/O
Other diagram labels: JMS (Java Messaging Service) Layer; Queues & monitors analysis steps; Cluster Platform; XML Reading / Writing; Data Model; XML; Oracle, MySQL, PostgreSQL, HSQLDB; Web Services; Java API; InterPro website.
Dependencies (drawn as arrows in the diagram) are all one-way, resulting in low coupling between the layers. Each layer can be replaced relatively easily (especially layers at the top of the stack), improving maintainability.
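One common way to get one-way dependencies and replaceable layers is to have the upper layer depend only on an interface that the lower layer implements. The sketch below shows that idea under stated assumptions: `MatchDao`, `InMemoryMatchDao` and `AnalysisService` are hypothetical names, not classes from InterProScan.

```java
import java.util.ArrayList;
import java.util.List;

class LayeringSketch {
    // Data access contract (hypothetical). The business logic sees only this interface.
    interface MatchDao {
        void save(String match);
        List<String> findAll();
    }

    // One replaceable implementation; a database-backed class could be swapped in
    // without touching AnalysisService.
    static final class InMemoryMatchDao implements MatchDao {
        private final List<String> rows = new ArrayList<>();

        public void save(String match) {
            rows.add(match);
        }

        public List<String> findAll() {
            return new ArrayList<>(rows);
        }
    }

    // "Business logic" layer: depends downward on MatchDao, never the reverse.
    static final class AnalysisService {
        private final MatchDao dao;

        AnalysisService(MatchDao dao) {
            this.dao = dao;
        }

        void analyse(String proteinId) {
            dao.save(proteinId + ":example-match");
        }
    }

    public static void main(String[] args) {
        MatchDao dao = new InMemoryMatchDao();
        new AnalysisService(dao).analyse("P1");
        System.out.println(dao.findAll());
    }
}
```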

Java Messaging Service: ease of development and platform flexibility
Simple and robust programming model: quite easy to code against!
JMS is mature and stable: current version released in 2002.
Guaranteed message delivery to a single worker.
Easy to monitor.
Flexible: easy to implement on multiple platforms.
Diagram:
  “Master”: schedules tasks / sub-tasks and places them on a JMS queue.
  JMS Broker: manages JMS queues / topics.
  “Worker” (many of these): performs task / sub-task and reports back to Broker.
  Monitoring / Management Application: web application or stand-alone application to monitor and manage InterProScan.
  Broker starts workers on demand.
  Workers take tasks off queues.

Why JMS?
Community standard → many implementations.
Mature and stable: version 1.1.
Can write pure JMS, or use vendor extensions (tie-in). We are not using any of these…

What are messages?
Have a header and a body.
Can be filtered by the recipient.
Body may consist of:
  TextMessage (just a String)
  BytesMessage (for legacy messaging system interoperability)
  MapMessage
  StreamMessage
  ObjectMessage (anything Serializable)
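The header/body split and recipient-side filtering can be illustrated without a JMS provider. The sketch below is a plain-Java stand-in, not the JMS API: `Message`, `select` and the `analysis` header field are made up for illustration. In JMS itself, filtering is expressed as a message selector string evaluated against header fields and properties.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class MessageSketch {
    // Hypothetical message: header fields plus a text body (compare TextMessage).
    static final class Message {
        final Map<String, String> header;
        final String body;

        Message(Map<String, String> header, String body) {
            this.header = header;
            this.body = body;
        }
    }

    // Recipient-side filter: keep messages whose header field equals the wanted value.
    static List<Message> select(List<Message> inbox, String field, String wanted) {
        List<Message> accepted = new ArrayList<>();
        for (Message message : inbox) {
            if (wanted.equals(message.header.get(field))) {
                accepted.add(message);
            }
        }
        return accepted;
    }

    public static void main(String[] args) {
        List<Message> inbox = List.of(
                new Message(Map.of("analysis", "Pfam"), "run step 1"),
                new Message(Map.of("analysis", "TIGRFAM"), "run step 2"));
        for (Message message : select(inbox, "analysis", "Pfam")) {
            System.out.println(message.body);
        }
    }
}
```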

Message Modes
Point-to-point. Guarantees delivery to:
  zero or one client (non-persistent message)
  exactly one client (persistent message)
Publish / Subscribe (pub/sub): 'multicast' messages.

Message Transport Options
In-JVM, TCP/IP, HTTP, HTTPS, RMI...

Point-to-Point Messages
Use destinations called queues.
Acknowledgement: AUTO_ACKNOWLEDGE, CLIENT_ACKNOWLEDGE, DUPS_OK_ACKNOWLEDGE.

Pub/Sub
Uses destinations called topics.
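The difference between the two destination types comes down to who receives each message. The following is a toy in-memory model, assuming nothing about any real broker: `SimpleQueue` and `SimpleTopic` are hypothetical classes, and persistence and acknowledgement are left out entirely.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

class DeliveryModesSketch {
    // Point-to-point: a message leaves the queue when one consumer receives it.
    static final class SimpleQueue {
        private final Deque<String> messages = new ArrayDeque<>();

        void send(String message) {
            messages.addLast(message);
        }

        // Returns null when no message is waiting.
        String receive() {
            return messages.pollFirst();
        }
    }

    // Pub/sub: every current subscriber gets its own copy of each message.
    static final class SimpleTopic {
        private final List<List<String>> inboxes = new ArrayList<>();

        List<String> subscribe() {
            List<String> inbox = new ArrayList<>();
            inboxes.add(inbox);
            return inbox;
        }

        void publish(String message) {
            for (List<String> inbox : inboxes) {
                inbox.add(message);
            }
        }
    }

    public static void main(String[] args) {
        SimpleQueue queue = new SimpleQueue();
        queue.send("job-1");
        System.out.println("consumer A got: " + queue.receive());
        System.out.println("consumer B got: " + queue.receive());

        SimpleTopic topic = new SimpleTopic();
        List<String> first = topic.subscribe();
        List<String> second = topic.subscribe();
        topic.publish("status update");
        System.out.println("subscribers got: " + first + " and " + second);
    }
}
```

With the queue, the second consumer finds nothing left to receive; with the topic, both subscribers see the same message.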

JMS Objects

Reliability
Configurable: for some systems (e.g. news broadcast) reliability is not so important.
Persistent messages (p2p): guaranteed delivery.
Re-delivery:
  Message header includes redelivery information.
  Configurable: 'try 3 times'.
  'Dead letter' queue: manage failure.
Time-to-live.
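The 'try 3 times' and 'dead letter' queue behaviour can be sketched as a retry loop. This is only an illustration of the policy: in practice the broker applies it through configuration, and the `deliverAll` helper and the consumer predicate below are assumptions, not part of JMS.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

class RedeliverySketch {
    static final int MAX_ATTEMPTS = 3; // the slide's 'try 3 times'

    // Offers each message to the consumer up to MAX_ATTEMPTS times.
    // The consumer returns true when it has processed the message successfully.
    // Messages that never succeed are returned as the dead letter queue.
    static List<String> deliverAll(List<String> messages, Predicate<String> consumer) {
        List<String> deadLetters = new ArrayList<>();
        for (String message : messages) {
            boolean delivered = false;
            for (int attempt = 1; attempt <= MAX_ATTEMPTS && !delivered; attempt++) {
                delivered = consumer.test(message);
            }
            if (!delivered) {
                deadLetters.add(message);
            }
        }
        return deadLetters;
    }

    public static void main(String[] args) {
        // Hypothetical consumer that always fails on one particular message.
        Predicate<String> consumer = message -> !message.equals("bad-job");
        List<String> deadLetters = deliverAll(List.of("job-1", "bad-job", "job-2"), consumer);
        System.out.println("Dead letters: " + deadLetters);
    }
}
```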

JMS Architecture in I5
Diagram labels:
  Master: Work Scheduler; Response Monitor (runs in own thread).
  JMS Broker: workerJobRequestQueue; jobResponseQueue.
  Worker (n of these): WorkerRunner.
  Messages: Job request; Job result.
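The request/response flow in the diagram can be approximated inside one JVM, with `java.util.concurrent` queues standing in for the two JMS queues. This is a sketch under that assumption only: it is not the I5 implementation, and apart from the two queue names taken from the slide, every identifier here is invented.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

class RequestResponseSketch {
    static List<String> run(List<String> jobs, int workerCount) {
        // The master has already scheduled every job on the request queue.
        BlockingQueue<String> workerJobRequestQueue = new LinkedBlockingQueue<>(jobs);
        BlockingQueue<String> jobResponseQueue = new LinkedBlockingQueue<>();
        List<String> results = Collections.synchronizedList(new ArrayList<>());

        // Response monitor: runs in its own thread and collects one result per job.
        Thread monitor = new Thread(() -> {
            try {
                for (int i = 0; i < jobs.size(); i++) {
                    results.add(jobResponseQueue.take());
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        monitor.start();

        // Workers: take job requests off the queue until none are left.
        List<Thread> workers = new ArrayList<>();
        for (int w = 0; w < workerCount; w++) {
            Thread worker = new Thread(() -> {
                String job;
                while ((job = workerJobRequestQueue.poll()) != null) {
                    jobResponseQueue.add("result of " + job);
                }
            });
            workers.add(worker);
            worker.start();
        }

        try {
            for (Thread worker : workers) {
                worker.join();
            }
            monitor.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for jobs", e);
        }
        return results;
    }

    public static void main(String[] args) {
        List<String> results = run(List.of("job-1", "job-2", "job-3", "job-4"), 2);
        System.out.println(results.size() + " job results collected");
    }
}
```

Each job is taken by exactly one worker because `poll()` removes it from the queue; the order in which results arrive depends on thread scheduling.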

Jobs and Steps
Jobs: holder for all Job instances; the full set of workflows defined by the system.
Job: binds together Steps; a single workflow (e.g. an analysis).
Step: defines how to perform a Step, e.g. how to “run HMMER3” (concrete Steps implement an execute() method).
StepInstance: defines what to perform the Step upon, i.e. the intent to run a Step for a particular set of proteins or models, e.g. “Run HMMER3 for proteins 101 – 200”.
StepExecution: captures an actual attempt to run a StepInstance, e.g. “First attempt to run HMMER3 for proteins 101 – 200”.
Dependencies: defined at the Step level. As StepInstances are created, these dependencies cascade down to the StepInstance level:
  Step dependency: “Pfam run HMMER3” depends upon “write fasta file”.
  StepInstance dependency: “Pfam run HMMER3 for proteins 101 – 200” depends upon “write fasta file for proteins 101 – 200”.
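The cascade from Step dependencies to StepInstance dependencies can be sketched as follows. The class shapes and the `instantiate` helper are assumptions made for illustration, not the actual I5 classes; StepExecution is left out, and every dependency is assumed to belong to the same Job.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class StepModelSketch {
    // A Step defines how to do something and which Steps it depends upon.
    static final class Step {
        final String name;
        final List<Step> dependsUpon = new ArrayList<>();

        Step(String name) {
            this.name = name;
        }
    }

    // A StepInstance is the intent to run a Step for a particular protein range.
    static final class StepInstance {
        final Step step;
        final int firstProtein;
        final int lastProtein;
        final List<StepInstance> dependsUpon = new ArrayList<>();

        StepInstance(Step step, int firstProtein, int lastProtein) {
            this.step = step;
            this.firstProtein = firstProtein;
            this.lastProtein = lastProtein;
        }

        String describe() {
            return step.name + " for proteins " + firstProtein + " - " + lastProtein;
        }
    }

    // Creates one StepInstance per Step, then mirrors the Step-level
    // dependencies at the StepInstance level.
    static Map<Step, StepInstance> instantiate(List<Step> job, int first, int last) {
        Map<Step, StepInstance> instances = new LinkedHashMap<>();
        for (Step step : job) {
            instances.put(step, new StepInstance(step, first, last));
        }
        for (Step step : job) {
            for (Step dependency : step.dependsUpon) {
                instances.get(step).dependsUpon.add(instances.get(dependency));
            }
        }
        return instances;
    }

    public static void main(String[] args) {
        Step writeFasta = new Step("write fasta file");
        Step runHmmer = new Step("Pfam run HMMER3");
        runHmmer.dependsUpon.add(writeFasta);

        Map<Step, StepInstance> instances = instantiate(List.of(writeFasta, runHmmer), 101, 200);
        StepInstance hmmerInstance = instances.get(runHmmer);
        System.out.println(hmmerInstance.describe() + " depends upon "
                + hmmerInstance.dependsUpon.get(0).describe());
    }
}
```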

Dependencies in a Workflow
Steps in the diagram: Write FASTA File; Run HMMER3 Binary; Delete FASTA file; Parse / store HMMER3 Output; Delete HMMER3 Output; Perform Pfam Post Processing.
The arrows represent the “depends upon” relationship: each arrow points to the Steps that must complete before the Step in question is considered for execution. (This may seem counter-intuitive, but it is the way in which it is implemented.)
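A "depends upon" map is enough to work out which Steps may run and in what order. In the sketch below the step names come from the slide, but the individual arrows did not survive in the transcript, so the edges in `pfamWorkflow()` are a guess and should be read as an assumption. `executionOrder` is likewise a hypothetical helper, not I5 code.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class WorkflowOrderSketch {
    // Repeatedly picks every Step whose dependencies have all completed.
    static List<String> executionOrder(Map<String, List<String>> dependsUpon) {
        List<String> completed = new ArrayList<>();
        while (completed.size() < dependsUpon.size()) {
            boolean progressed = false;
            for (Map.Entry<String, List<String>> entry : dependsUpon.entrySet()) {
                String step = entry.getKey();
                if (!completed.contains(step) && completed.containsAll(entry.getValue())) {
                    completed.add(step);
                    progressed = true;
                }
            }
            if (!progressed) {
                throw new IllegalStateException("cyclic or missing dependency");
            }
        }
        return completed;
    }

    // Assumed edges: each Step maps to the Steps it depends upon.
    static Map<String, List<String>> pfamWorkflow() {
        Map<String, List<String>> graph = new LinkedHashMap<>();
        graph.put("Write FASTA File", List.of());
        graph.put("Run HMMER3 Binary", List.of("Write FASTA File"));
        graph.put("Delete FASTA file", List.of("Run HMMER3 Binary"));
        graph.put("Parse / store HMMER3 Output", List.of("Run HMMER3 Binary"));
        graph.put("Delete HMMER3 Output", List.of("Parse / store HMMER3 Output"));
        graph.put("Perform Pfam Post Processing", List.of("Parse / store HMMER3 Output"));
        return graph;
    }

    public static void main(String[] args) {
        for (String step : executionOrder(pfamWorkflow())) {
            System.out.println(step);
        }
    }
}
```

Storing the edges in the "depends upon" direction means the scheduler only has to ask one question per Step: have all of the Steps it points to completed?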

Data Model (Simplified)
Diagram labels: ProteinMatch; Protein.