Agent-Oriented Data Curation in Bioinformatics
Simon Miles, University of Southampton
PASOA project: www.pasoa.org



Main Hypothesis
Agent-oriented software development is well suited to the design of manageable, reusable software in the bioinformatics domain.
To be generally true, this must be due to the broad characteristics of the bioinformatics domain.
We illustrate our argument for this hypothesis with the design of a software tool for data curation.

Structure
Bioinformatics Domain Characteristics
Example Scenario
Data Curation Tool
  – Simple service-oriented design
  – Agent-oriented design
MAS Properties to Domain Characteristics

Bioinformatics Characteristics
Openness in Sharing Data and Tools
Rapid Increase in Field Size
Variation in Expertise
Desire for Automation
Heterogeneous Data Formats

Example Scenario
Protein Compressibility Experiment with Provenance

Experiment and Provenance
While an experiment runs, the actors involved record documentation about the process being executed.
The documentation is recorded in provenance stores.
Documentation of process includes the data exchanged between actors.
[Diagram labels: Download Protein Sequences; Find Compressed Size; Average Over Permutations; Find Compressed Size; Encode & Permute]

Desire for Automation: Provenance
The provenance of data item X is the process that led to X.
We support documenting process to help answer questions about past experiments, e.g.
  – Given that the results of two runs of an experiment were different, was this caused by a difference in the input data or because different versions of analysis tools were used?
  – What input data contributed to the production of this result?
  – Was any data source used in this experiment licensed such that the result cannot be patented?
  – Did the experimental process follow the plan as originally conceived?
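One of the questions above, "What input data contributed to the production of this result?", can be sketched as a backward walk over recorded process documentation. This is only an illustration, not the PASOA implementation: the `StepRecord` type, the function name, and the data item names (`sequence`, `encoding_table`, and so on) are assumptions made for the example, and the way the steps are wired together is a guess.

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple


@dataclass(frozen=True)
class StepRecord:
    """Hypothetical documentation of one step: who ran it, what went in, what came out."""
    actor: str
    inputs: Tuple[str, ...]
    outputs: Tuple[str, ...]


def contributing_inputs(records: List[StepRecord], result: str) -> Set[str]:
    """Return the data items that led to `result` and were not produced by any recorded step."""
    # Index each data item by the step that produced it.
    producer: Dict[str, StepRecord] = {}
    for rec in records:
        for out in rec.outputs:
            producer[out] = rec

    sources: Set[str] = set()
    seen: Set[str] = set()
    stack = [result]
    while stack:
        item = stack.pop()
        if item in seen:
            continue
        seen.add(item)
        rec = producer.get(item)
        if rec is None:
            # No recorded step produced this item, so treat it as original input data.
            sources.add(item)
        else:
            stack.extend(rec.inputs)
    return sources


# Assumed wiring of the steps named in the example scenario.
records = [
    StepRecord("Encode & Permute", ("sequence", "encoding_table"), ("permutations",)),
    StepRecord("Find Compressed Size", ("permutations",), ("sizes",)),
    StepRecord("Average Over Permutations", ("sizes",), ("result",)),
]
inputs_of_result = contributing_inputs(records, "result")
```

Under these assumptions, `inputs_of_result` contains the two items that no recorded step produced. The other questions (tool versions, licensing, plan conformance) would need richer records than this sketch holds.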

Variation in Expertise
[Diagram labels: Organisation; Novice; Expert]
Organisations contain people with varying expertise.
Experts are researchers with plenty of bioinformatics experience.
They want ‘full’ control over their work environment.
Novices are researchers without this experience.
Novices become expert over time.

Problems of Access

Heterogeneous Data Formats
In order to use the contents of provenance (or other data) stores to answer questions, the experiment data needs to be parsed.
A wide variety of formats exist for the same data, with new tools often using new formats.
Given that experiments may have been run a long time before the questions are asked, the data formats may be obsolete and so unparseable.
This is a problem of curation.

Desire for Automation: Data Curation Tool
Given that data in provenance stores may become unparseable, we require a tool to search for data in obsolete formats and translate it to new formats for the same type of data.

Simple Service-Oriented Design
Current tool implementations are scripted or built as service workflows.
A service design for converting obsolete formats:
  – The process is recorded in provenance stores, so the provenance of converted data is available.
[Diagram labels: Conversion List (C to G, …); Curation Tool (Get data in format C, Convert C to G, Save data in format G); C to G Converter]
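The service design above can be sketched in a few lines: a curation tool walks a conversion list, fetches data in an obsolete format, calls the matching converter, saves the result, and records what it did. This is a minimal sketch under stated assumptions; `DataItem`, `CurationTool`, the in-memory store, and the toy `c_to_g` converter are all hypothetical stand-ins for real services and provenance stores.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class DataItem:
    name: str
    fmt: str       # format label, e.g. "C" or "G" as on the slide
    content: str


@dataclass
class CurationTool:
    # Conversion list: obsolete format -> (target format, converter function).
    conversion_list: Dict[str, Tuple[str, Callable[[str], str]]]
    # Stand-in for a provenance store: one entry per conversion performed.
    provenance_log: List[dict] = field(default_factory=list)

    def curate(self, store: List[DataItem]) -> List[DataItem]:
        """Convert every item whose format appears in the conversion list."""
        converted = []
        for item in list(store):  # iterate over a copy, since we append to the store
            entry = self.conversion_list.get(item.fmt)
            if entry is None:
                continue  # format is not listed as obsolete
            target_fmt, convert = entry
            new_item = DataItem(item.name, target_fmt, convert(item.content))
            store.append(new_item)  # "Save data in format G"
            self.provenance_log.append(
                {"source": item.name, "from": item.fmt, "to": target_fmt}
            )
            converted.append(new_item)
        return converted


def c_to_g(content: str) -> str:
    """Toy converter: pretend format C uses ';' separators and format G uses ','."""
    return content.replace(";", ",")


store = [DataItem("seq1", "C", "a;b;c"), DataItem("seq2", "G", "x,y")]
tool = CurationTool(conversion_list={"C": ("G", c_to_g)})
converted = tool.curate(store)
```

Note that the conversion list here is supplied by hand, which is exactly the assumption the next slide questions.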

Limitations
The simple design does not take account of Desire for Automation or Variation in Expertise: everyone is assumed to be expert enough to manually construct their own conversion list and apply the tool in the best circumstances.
The domain characteristics call for
  – Sharing distributed experts’ opinions about the obsolescence of data formats with novices,
  – Applying that knowledge in curating data automatically in the best circumstances,
  – Retaining full control for experts over how data is converted.

Agent-Oriented Design
Administration Role (Standard Administration Agent)
  Responsibilities: Ensure that conversion suggestions are distributed
  Behaviour:
  – On suggestion, propagate to Curators
Curation Role (Expert’s Curator, Novice’s Curator)
  Responsibilities: Ensure that data is not solely in obsolete formats
  Expert’s Curator behaviour:
  – On suggestion from Administration, ignore
  – On suggestion from Scientist, send to Administration and include in list
  Novice’s Curator behaviour:
  – On suggestion from Administration, include in list
  – On suggestion from Scientist, include in list
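The roles and behaviours on this slide can be sketched as plain classes exchanging suggestions. This is an illustration only: the class and method names are invented for the example, a suggestion is assumed to be a simple (obsolete format, target format) pair, and the assignment of behaviours to the expert's and novice's curators follows one reading of the slide's diagram.

```python
from typing import List, Optional, Tuple

Suggestion = Tuple[str, str]  # assumed shape: (obsolete format, target format)


class AdministrationAgent:
    """Administration role: ensure conversion suggestions are distributed."""

    def __init__(self) -> None:
        self.curators: List["Curator"] = []

    def register(self, curator: "Curator") -> None:
        self.curators.append(curator)

    def on_suggestion(self, s: Suggestion, sender: Optional["Curator"] = None) -> None:
        # Propagate to every curator except the one that sent the suggestion.
        for curator in self.curators:
            if curator is not sender:
                curator.on_admin_suggestion(s)


class Curator:
    """Curation role: ensure data is not solely in obsolete formats."""

    def __init__(self, admin: AdministrationAgent) -> None:
        self.admin = admin
        self.conversion_list: List[Suggestion] = []
        admin.register(self)

    def include(self, s: Suggestion) -> None:
        if s not in self.conversion_list:
            self.conversion_list.append(s)

    def on_admin_suggestion(self, s: Suggestion) -> None:
        raise NotImplementedError

    def on_scientist_suggestion(self, s: Suggestion) -> None:
        raise NotImplementedError


class ExpertCurator(Curator):
    def on_admin_suggestion(self, s: Suggestion) -> None:
        pass  # experts keep full control: suggestions from Administration are ignored

    def on_scientist_suggestion(self, s: Suggestion) -> None:
        self.include(s)
        self.admin.on_suggestion(s, sender=self)  # share the expert's opinion


class NoviceCurator(Curator):
    def on_admin_suggestion(self, s: Suggestion) -> None:
        self.include(s)

    def on_scientist_suggestion(self, s: Suggestion) -> None:
        self.include(s)


admin = AdministrationAgent()
expert = ExpertCurator(admin)
novice = NoviceCurator(admin)
expert.on_scientist_suggestion(("C", "G"))  # expert's scientist flags C as obsolete
novice.on_scientist_suggestion(("D", "H"))  # stays local to the novice
```

In this sketch the expert's suggestion reaches the novice's list automatically, while the novice's own suggestion stays local. Changing a novice into an expert amounts to swapping which `Curator` subclass plays the role for that scientist.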

Limitations
For every agent behaviour, we need an explicit functional design to implement it, so more is required to completely specify the system, with a greater possibility of mistakes.
There is less support than for traditional design methods.
The benefit of agent-oriented design must be clear to convince developers to use it.

Characteristics to Properties 1
Variation in Expertise:
  – Localised Control: Give full control to experts while allowing more automation for novices
  – Social Ability: Communication between scientists and organisations allows experts’ knowledge to influence novices’ work
  – Role-Based Design: As novices become more expert, the behaviour of their agents can be changed

Characteristics to Properties 2
Openness in Sharing Data and Tools:
  – Role-Based Design: Swap in new and better agent-based tools
  – Social Ability: Allows automatic exchange of information about new sources
Desire for Automation:
  – Pro-activity: Agent-oriented design assumes tools fulfil goals where the context is correct

Conclusions
The requirements placed on tools because of the characteristics of the bioinformatics domain are exactly those that are met by the properties of an agent-based system.
This is not just an interesting fact: it means that bioinformatics tools developed using an agent-oriented approach will assume the desirable properties from the start.