July 27, 2005, High Performance Distributed Computing 05: Recording and Using Provenance in a Protein Compressibility Experiment. Paul Groth, Simon Miles,


July 27, 2005, High Performance Distributed Computing 05. Recording and Using Provenance in a Protein Compressibility Experiment. Paul Groth, Simon Miles, Weijian Fang, Sylvia C. Wong, Klaus-Peter Zauner and Luc Moreau, University of Southampton.

Outline: Biology; The Workflow; Use Cases; Provenance; Implementation; Evaluation; Conclusion.

Biology: How do protein sequences (chains of amino acids) fold into a 3D structure? Which part of DNA translates into one protein sequence? The structure of protein sequences may help to answer these questions. Structure can be quantified by textual compressibility. Which amino acid groupings maximize compressibility?

The Workflow: Get sequences; make a sample; recode the sample; compress and measure; shuffle the sample; compress and measure each permutation; collate all measures; produce the average compressibility.
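The workflow steps can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the amino acid grouping in `GROUPS`, the use of `zlib` as the compressor, and the ratio "shuffled size / original size" as the collated measure are all hypothetical choices, since the slide does not specify them.

```python
import random
import zlib

# Hypothetical grouping of the 20 amino acids into 4 classes; the groupings
# studied in the experiment are not given on the slide.
GROUPS = {
    "h": "AVLIMFWC",  # hydrophobic (assumed)
    "p": "STNQYGP",   # polar (assumed)
    "+": "KRH",       # positively charged (assumed)
    "-": "DE",        # negatively charged (assumed)
}
CODE = {aa: symbol for symbol, members in GROUPS.items() for aa in members}


def recode(sample: str) -> str:
    """Recode step: replace each amino acid letter by its group symbol."""
    return "".join(CODE[aa] for aa in sample)


def compressed_size(text: str) -> int:
    """Compress-and-measure step: size in bytes of the zlib-compressed text."""
    return len(zlib.compress(text.encode("ascii"), 9))


def average_compressibility(sample: str, permutations: int = 20, seed: int = 0) -> float:
    """Shuffle, measure each permutation, collate, and average.

    The measure used here (shuffled size divided by original size) is an
    assumption made for illustration.
    """
    rng = random.Random(seed)
    recoded = recode(sample)
    original = compressed_size(recoded)
    ratios = []
    for _ in range(permutations):
        letters = list(recoded)
        rng.shuffle(letters)  # one permutation of the sample
        ratios.append(compressed_size("".join(letters)) / original)
    return sum(ratios) / len(ratios)
```

A value above 1 means the original ordering compresses better than its shuffled versions, i.e. the ordering carries structure that the compressor can exploit.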

Use Case (1): A bioinformatician, A, downloads sequence data of microbial proteins from the database RefSeq and runs the compressibility experiment. A later performs the same experiment on the same sequence data, again downloaded from RefSeq. A compares the two experiment results and notices a difference. A determines whether the difference was caused by the algorithms changing.

Use Case (2): A bioinformatician performs an experiment on a FASTA sequence encoding a protein. A reviewer later determines whether or not the sequence was in fact processed by a service that meaningfully processes protein sequences only.
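A reviewer's check for this use case might look like the following sketch. The `Interaction` record layout and the `accepts` field are hypothetical; they stand in for whatever process documentation and service descriptions are actually available.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List


@dataclass(frozen=True)
class Interaction:
    """One documented service invocation (hypothetical record layout)."""
    service: str
    accepts: FrozenSet[str]  # kinds of sequence the service meaningfully processes
    input_id: str            # identifier of the data item passed to the service


def services_that_processed(records: Iterable[Interaction], data_id: str) -> List[Interaction]:
    """Return every documented invocation that took data_id as input."""
    return [r for r in records if r.input_id == data_id]


def processed_by_protein_only_services(records: Iterable[Interaction], data_id: str) -> bool:
    """True if data_id was processed, and only by protein-only services."""
    used = services_that_processed(records, data_id)
    return bool(used) and all(r.accepts == frozenset({"protein"}) for r in used)
```

The function returns False when no invocation is documented for the sequence, since the reviewer then has no evidence either way.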

Provenance: The use cases relate to process. Provenance definition: the provenance of a result is the process that led to that result. This is a conceptual definition.

Documentation of Process: Conceive a computer-based representation of provenance. We represent the provenance of some data by documenting the process that led to the data. Documentation can be complete or partial; it can be accurate or inaccurate; it can present conflicting or consensual views of the actors involved; it can provide operational details of execution or it can be abstract.

Heterogeneity: This is a heterogeneous application: it has shell scripts, Java programs and web services. Heterogeneity is common in Grid-based applications (LCG ATLAS: Athena and VDT coexist). Support for plugging in different execution environments.

Provenance "Lifecycle" (diagram): The application produces results and records documentation of process into the provenance store; the store is queried to retrieve the provenance of a result.
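The record-then-query lifecycle can be illustrated with a small in-memory store. This is a sketch under assumptions: the class name, the `record` and `provenance` methods, and the dictionary layout of an entry are hypothetical and are not the interface of any real provenance store.

```python
from collections import defaultdict


class ProvenanceStore:
    """Toy store: records process steps and answers provenance queries."""

    def __init__(self):
        # Maps each data item to the steps that produced it.
        self._by_output = defaultdict(list)

    def record(self, service, inputs, outputs):
        """Record one step: a service turned inputs into outputs."""
        entry = {"service": service, "inputs": tuple(inputs), "outputs": tuple(outputs)}
        for out in outputs:
            self._by_output[out].append(entry)

    def provenance(self, result):
        """Return the recorded steps that led to result, nearest step first."""
        steps, seen, frontier = [], set(), [result]
        while frontier:
            item = frontier.pop(0)
            if item in seen:
                continue
            seen.add(item)
            for entry in self._by_output.get(item, []):
                if entry not in steps:
                    steps.append(entry)
                # Follow the inputs back towards the start of the process.
                frontier.extend(entry["inputs"])
        return steps
```

Querying walks backwards from the result through the inputs of each recorded step, so steps that did not contribute to the result are left out.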

Use Case 1: Do services differ between experiments? (diagram): Retrieve the documentation of the experiments from the provenance store, then highlight differences in services between experiments.
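Once the documentation of two experiments has been retrieved, highlighting the differences is a plain comparison. The sketch below assumes each experiment has already been reduced to a mapping from service name to some identifier of the service (for example a version string); that reduction, and the function name, are hypothetical.

```python
def service_differences(run_a, run_b):
    """Compare two runs, each given as {service name: service identifier}."""
    shared = run_a.keys() & run_b.keys()
    return {
        # Services present in both runs but with a different identifier.
        "changed": {
            name: (run_a[name], run_b[name])
            for name in shared
            if run_a[name] != run_b[name]
        },
        "only_in_first": sorted(run_a.keys() - run_b.keys()),
        "only_in_second": sorted(run_b.keys() - run_a.keys()),
    }
```

An empty result in all three fields suggests that a difference in experiment results did not come from a change in the services themselves.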

Implementation: Implemented as a VDT workflow. Scheduled by Condor. Each service, script and command records process documentation into a provenance store. Uses PReServ, a web services implementation of a provenance store.
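The idea that every step records its own process documentation can be pictured with a Python decorator. This is an illustration only: the decorator `documented`, the record fields, and the use of a plain list as the store are hypothetical, and the actual workflow components are scripts, Java programs and web services rather than Python functions.

```python
import functools


def documented(store, actor):
    """Wrap a function so that each call appends a record to store."""
    def decorate(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            result = func(*args, **kwargs)
            # Record who did what, with which inputs, and what came out.
            store.append({
                "actor": actor,
                "operation": func.__name__,
                "inputs": {"args": args, "kwargs": kwargs},
                "output": result,
            })
            return result
        return wrapper
    return decorate
```

Wrapping keeps the recording logic out of the step itself, which matters when the steps are written in different languages and run in different environments.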

PReServ Implementation Diagram. Components: Axis Handler, Provenance Service, Backend Store Interface, backend stores (Database Store, In-Memory Store, …), PS Client Side Library, Web Service, WS Client, Query Actor. Call types: WS calls, Java calls.
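The separation between the provenance service and its backend stores can be sketched as an abstract interface with interchangeable implementations. This is not PReServ's API (PReServ is a Java web service); the class and method names are hypothetical, and SQLite is used here only as a convenient stand-in for a database store.

```python
import sqlite3
from abc import ABC, abstractmethod
from typing import List


class BackendStore(ABC):
    """Hypothetical backend store interface."""

    @abstractmethod
    def put(self, subject: str, body: str) -> None:
        """Store one piece of documentation about subject."""

    @abstractmethod
    def get(self, subject: str) -> List[str]:
        """Return all documentation about subject, oldest first."""


class InMemoryStore(BackendStore):
    def __init__(self):
        self._docs = {}

    def put(self, subject, body):
        self._docs.setdefault(subject, []).append(body)

    def get(self, subject):
        return list(self._docs.get(subject, []))


class SqliteStore(BackendStore):
    def __init__(self, path=":memory:"):
        self._db = sqlite3.connect(path)
        self._db.execute("CREATE TABLE IF NOT EXISTS docs (subject TEXT, body TEXT)")

    def put(self, subject, body):
        with self._db:  # commits on success
            self._db.execute("INSERT INTO docs VALUES (?, ?)", (subject, body))

    def get(self, subject):
        rows = self._db.execute(
            "SELECT body FROM docs WHERE subject = ? ORDER BY rowid", (subject,)
        )
        return [row[0] for row in rows]


class ProvenanceService:
    """Depends only on the interface, so any backend can be plugged in."""

    def __init__(self, backend: BackendStore):
        self._backend = backend

    def record(self, subject, body):
        self._backend.put(subject, body)

    def query(self, subject):
        return self._backend.get(subject)
```

Because the service only sees `BackendStore`, swapping the in-memory store for a persistent one requires no change to recording or query code.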

Evaluation Deployment: Runs on VMware, for deployment consistency and ease of development. The workflow is executed on one machine; PReServ runs on another machine.

Recording Performance (figure)

Query Performance (figure)

Conclusion: Both recording and query times are linear. 10% overhead for asynchronous recording. Our provenance concept and system are grounded in a number of use cases. The experiment is ready to be moved to a cluster or a grid (Southampton cluster, a Grid), which will allow us to test scalability.
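Asynchronous recording, as mentioned in the conclusion, generally means that the application hands documentation to a background worker instead of waiting for the store to accept it. The following is a minimal sketch of that general technique using a queue and a thread; the class `AsyncRecorder` and its `submit` callback are hypothetical and do not describe how the recording was actually implemented.

```python
import queue
import threading


class AsyncRecorder:
    """Queue documentation and submit it from a background thread."""

    def __init__(self, submit):
        self._submit = submit  # callable that sends one document to the store
        self._queue = queue.Queue()
        self._worker = threading.Thread(target=self._drain, daemon=True)
        self._worker.start()

    def record(self, document):
        """Return immediately; the worker submits the document later."""
        self._queue.put(document)

    def _drain(self):
        while True:
            document = self._queue.get()
            try:
                if document is None:  # sentinel: stop the worker
                    return
                self._submit(document)
            finally:
                self._queue.task_done()

    def close(self):
        """Submit everything still queued, then stop the worker."""
        self._queue.put(None)
        self._worker.join()
```

The application pays only for the enqueue on its critical path, which is why the overhead of asynchronous recording can stay small compared with waiting for each submission to complete.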

Contact Info: Paul Groth. Use case descriptions, papers, PReServ software.

Configuration: Red Hat Linux 9.1 on VMware on Windows XP; Pentium P4 2.8 GHz; 1.5 GB RAM; PReServ on another machine; database backend Berkeley JDB; 100 Mb local Ethernet.