Agent-Oriented Data Curation in Bioinformatics Simon Miles University of Southampton PASOA project: www.pasoa.org.

1 Agent-Oriented Data Curation in Bioinformatics Simon Miles University of Southampton PASOA project: www.pasoa.org

2 Main Hypothesis Agent-oriented software development is well suited to the design of manageable, reusable software in the bioinformatics domain. To be generally true, this must be due to the broad characteristics of the bioinformatics domain. We illustrate our argument for this hypothesis with the design of a software tool for data curation.

3 Structure: Bioinformatics Domain Characteristics; Example Scenario; Data Curation Tool (simple service-oriented design, agent-oriented design); MAS Properties to Domain Characteristics

4 Bioinformatics Characteristics: Openness in Sharing Data and Tools; Rapid Increase in Field Size; Variation in Expertise; Desire for Automation; Heterogeneous Data Formats

5 Example Scenario Protein Compressibility Experiment with Provenance

6 Experiment and Provenance While an experiment runs, the actors involved record documentation about the process being executed. The documentation is recorded in provenance stores. Documentation of process includes the data exchanged between actors. [Diagram: experiment workflow with steps labelled Download Protein Sequences, Encode & Permute, Find Compressed Size (shown twice) and Average Over Permutations]

7 Desire for Automation: Provenance The provenance of data item X is the process that led to X. We support documenting process to help answer questions about past experiments, e.g. –Given that the results of two runs of an experiment were different, was this caused by a difference in the input data or because different versions of analysis tools were used? –What input data contributed to the production of this result? –Was any data source used in this experiment licensed such that the result cannot be patented? –Did the experimental process follow the plan as originally conceived?
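The second question above (what input data contributed to a result?) can be sketched as a walk backwards over recorded process documentation. This is only an illustration under stated assumptions: the ProvenanceStore class, its method names, the linear ordering of the steps and the version strings are all hypothetical and are not the PASOA interface.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Interaction:
    """One documented step: an actor consumed inputs and produced outputs."""
    actor: str
    tool_version: str
    inputs: tuple
    outputs: tuple


@dataclass
class ProvenanceStore:
    """Hypothetical in-memory store, for illustration only."""
    records: list = field(default_factory=list)

    def record(self, actor, tool_version, inputs, outputs):
        self.records.append(
            Interaction(actor, tool_version, tuple(inputs), tuple(outputs))
        )

    def contributing_inputs(self, item):
        """Return the items that led to `item` and were not produced by any recorded step."""
        produced_by = {out: rec for rec in self.records for out in rec.outputs}
        raw, seen, stack = set(), set(), [item]
        while stack:
            current = stack.pop()
            if current in seen:
                continue
            seen.add(current)
            rec = produced_by.get(current)
            if rec is None:
                raw.add(current)          # nothing documented produced it: an original input
            else:
                stack.extend(rec.inputs)  # keep walking back through the process
        return raw


# Assumed, simplified ordering of the steps named in the example scenario.
store = ProvenanceStore()
store.record("Download Protein Sequences", "1.0", ["sequence-db"], ["sequences"])
store.record("Encode & Permute", "1.0", ["sequences"], ["permutations"])
store.record("Find Compressed Size", "2.1", ["permutations"], ["sizes"])
store.record("Average Over Permutations", "1.0", ["sizes"], ["result"])

inputs_for_result = store.contributing_inputs("result")
```

Because each record also carries a tool version, the same records could be compared across two runs to see whether the inputs or the tool versions differed.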

8 Variation in Expertise Organisations contain people with varying expertise. Experts are researchers with plenty of bioinformatics experience. They want ‘full’ control over their work environment. Novices are researchers without this experience. Novices become expert over time. [Diagram: an Organisation containing a Novice and an Expert]

9 Problems of Access

10 Heterogeneous Data Formats In order to use the contents of provenance (or other data) stores to answer questions, the experiment data needs to be parsed. A wide variety of formats exist for the same data, with new tools often using new formats. Given that experiments may have been run a long time before the questions are asked, the data formats may be obsolete and so unparseable. This is a problem of curation.

11 Desire for Automation: Data Curation Tool Given that data in provenance stores may become unparseable, we require a tool to search for data in obsolete formats and translate it to new formats for the same type of data

12 Simple Service-Oriented Design Current tool implementations are scripted or built as service workflows. A service design for converting obsolete formats is shown. The process is recorded in provenance stores, so the provenance of converted data is available. [Diagram: a Curation Tool holding a Conversion List (C to G, …) performs three steps: Get data in format C; Convert C to G, using a C to G Converter; Save data in format G]
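A minimal sketch of the service-oriented tool described above: for each entry on a conversion list, get the data in the obsolete format, convert it, and save it in the new format. The store layout, the curate function and the stand-in converter are assumptions made for illustration; the slides do not specify the real formats or services.

```python
def convert_c_to_g(data: str) -> str:
    """Stand-in for a 'C to G Converter'; the real formats are not given in the slides."""
    return data.replace("C:", "G:", 1)


def curate(store: dict, conversion_list: dict, converters: dict) -> list:
    """Convert every stored item whose format appears on the conversion list.

    store:           item name -> (format, data)
    conversion_list: obsolete format -> replacement format
    converters:      (old format, new format) -> conversion function
    Returns a log of (item, old format, new format), which could itself be
    recorded as process documentation.
    """
    log = []
    for name, (fmt, data) in list(store.items()):
        target = conversion_list.get(fmt)
        if target is None:
            continue                              # format is not considered obsolete
        convert = converters[(fmt, target)]       # look up the converter service
        store[name] = (target, convert(data))     # save data in the new format
        log.append((name, fmt, target))
    return log


# Hypothetical store contents for illustration.
data_store = {"run1": ("C", "C:MKV"), "run2": ("G", "G:AAA")}
conversion_log = curate(
    data_store,
    conversion_list={"C": "G"},
    converters={("C", "G"): convert_c_to_g},
)
```

Note that the conversion list is supplied by hand here, which is exactly the step the next slide identifies as a limitation.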

13 Limitations The service-oriented design does not take account of Desire for Automation or Variation in Expertise: everyone is assumed to be expert enough to manually construct their own conversion list and apply the tool in the best circumstances. The domain characteristics call for –Sharing distributed experts’ opinions about the obsolescence of data formats with novices, –Applying that knowledge in curating data automatically in the best circumstances, –Retaining full control for experts over how data is converted

14 Agent-Oriented Design Administration Role (Standard Administration Agent): Responsibilities: Ensure that conversion suggestions are distributed. Behaviour: On suggestion, propagate to Curators. Curation Role (Expert’s Curator, Novice’s Curator): Responsibilities: Ensure that data is not solely in obsolete formats. Expert’s Curator Behaviour: On suggestion from Administration, ignore; on suggestion from Scientist, send to Administration and include in list. Novice’s Curator Behaviour: On suggestion from Administration, include in list; on suggestion from Scientist, include in list.
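The roles above can be sketched as small classes. This is an illustrative reading, not the authors' implementation: all class and method names are hypothetical, and the assignment of the two behaviour sets to the expert's and the novice's curator follows the order in which the slide lists them.

```python
class AdministrationAgent:
    """Administration role: ensure that conversion suggestions are distributed."""

    def __init__(self):
        self.curators = []

    def register(self, curator):
        self.curators.append(curator)

    def suggest(self, old_format, new_format, source=None):
        # On suggestion, propagate to every curator except the one that sent it.
        for curator in self.curators:
            if curator is not source:
                curator.on_administration_suggestion(old_format, new_format)


class Curator:
    """Curation role: ensure that data is not solely in obsolete formats."""

    def __init__(self, administration):
        self.administration = administration
        self.conversion_list = {}
        administration.register(self)

    def on_scientist_suggestion(self, old_format, new_format):
        self.conversion_list[old_format] = new_format

    def on_administration_suggestion(self, old_format, new_format):
        raise NotImplementedError


class ExpertCurator(Curator):
    def on_scientist_suggestion(self, old_format, new_format):
        # Include in own list and share the expert's opinion via Administration.
        super().on_scientist_suggestion(old_format, new_format)
        self.administration.suggest(old_format, new_format, source=self)

    def on_administration_suggestion(self, old_format, new_format):
        pass  # Experts keep full control: outside suggestions are ignored.


class NoviceCurator(Curator):
    def on_administration_suggestion(self, old_format, new_format):
        self.conversion_list[old_format] = new_format  # Accept shared knowledge.


admin = AdministrationAgent()
expert = ExpertCurator(admin)
other_expert = ExpertCurator(admin)
novice = NoviceCurator(admin)

expert.on_scientist_suggestion("C", "G")
```

After the expert's scientist suggests converting C to G, the novice's list picks up the suggestion automatically while the other expert's list is left untouched. Swapping a NoviceCurator for an ExpertCurator is one way to model a novice becoming expert over time.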

15 Limitations For every agent behaviour, we need an explicit functional design to implement it, so more work is required to completely specify the system, with a greater possibility of mistakes. There is less support than for traditional design methods. The benefit of agent-oriented design must be clear to convince developers to use it.

16 Characteristics to Properties 1 Variation in Expertise: –Localised Control: Give full control to experts while allowing more automation for novices –Social Ability: Communication between scientists and organisations allows experts’ knowledge to influence novices’ work –Role-Based Design: As novices become more expert, the behaviour of their agents can be changed

17 Characteristics to Properties 2 Openness in Sharing Data and Tools: –Role-Based Design: Swap in new and better agent-based tools –Social Ability: Allows automatic exchange of information about new sources Desire for Automation: –Pro-activity: Agent-oriented design assumes tools fulfil goals where the context is correct

18 Conclusions The requirements placed on tools because of the characteristics of the bioinformatics domain are exactly those that are met by the properties of an agent-based system. This is not just an interesting fact: it means that bioinformatics tools developed using an agent-oriented approach will assume the desirable properties from the start.

