Selected Workflow and Semantic Experiences from myGrid
Professor Carole Goble, The University of Manchester, UK

2. A UK e-Science project to build middleware for in silico experiments by individual life scientists, stuck in under-resourced labs, who use other people's applications: sequence analysis, microarray analysis, proteomics, chemoinformatics, image processing, rendering Dilbert cartoons.

Sequence excerpt shown on the slide:
12181 acatttctac caacagtgga tgaggttgtt ggtctatgtt ctcaccaaat ttggtgttgt
12241 cagtctttta aattttaacc tttagagaag agtcatacag tcaatagcct tttttagctt
12301 gaccatccta atagatacac agtggtgtct cactgtgatt ttaatttgca ttttcctgct
12361 gactaattat gttgagcttg ttaccattta gacaacttca ttagagaagt gtctaatatt
12421 taggtgactt gcctgttttt ttttaattgg gatcttaatt tttttaaatt attgatttgt
12481 aggagctatt tatatattct ggatacaagt tctttatcag atacacagtt tgtgactatt

3. Two tiers of services
- myGrid services for workflow, data management, provenance management, browser clients, service discovery, etc. An open, extensible service-oriented architecture: Web services, APIs, e-Science events, messages, a plug-in framework, an information model. Neat and controlled.
- Domain services: BioMart, BioMOBY, NCBI, EMBL-EBI, R packages, SeqHound, EMBOSS, PubMed, caBIG, etc. 3000+ of these. None of them ours. Scruffy and independent. And not much WSDL.

4. Open World Burden
- Independent third-party service providers
- Independent, unknown users
- No compatibility or compliance between domain services is expected
- No one application (data-pipeline focused)
- No common domain data model
- Lightweight + jam today

5. Workflows
An explicit, exposed description for the scientist of how to do stuff… and what you did… and the provenance of what you got.
- Easier to explain, share, relocate, reuse and repurpose.
- User viewpoint.
- Pattern books and workflow catalogues.
- A market of workflows.
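
To make "an explicit, exposed description of how to do stuff and what you did" concrete, here is a minimal sketch, assuming a workflow is just a set of named steps plus data links saying which step feeds which. It is not the myGrid workflow language; the step names and the local lambdas standing in for remote services are hypothetical. Python's standard `graphlib.TopologicalSorter` orders the steps, and the returned trace is a crude record of what was run on what.

```python
from graphlib import TopologicalSorter
from typing import Any, Callable, Dict, List, Tuple


def run_workflow(
    steps: Dict[str, Callable[..., Any]],
    links: Dict[str, List[str]],
) -> Tuple[Dict[str, Any], List[Tuple[str, List[str]]]]:
    """Run each step once all of its upstream steps have finished.

    `steps` maps a step name to a callable; `links` maps a step name to the
    names of the steps whose outputs it consumes, in argument order.
    Returns the outputs plus a trace of (step, upstream steps used).
    """
    graph = {name: links.get(name, []) for name in steps}  # node -> predecessors
    results: Dict[str, Any] = {}
    trace: List[Tuple[str, List[str]]] = []
    for name in TopologicalSorter(graph).static_order():
        upstream = links.get(name, [])
        results[name] = steps[name](*(results[u] for u in upstream))
        trace.append((name, list(upstream)))
    return results, trace


# Hypothetical three-step workflow; each lambda stands in for a service call.
COMPLEMENT = str.maketrans("acgt", "tgca")
steps = {
    "fetch": lambda: "acatttctac",                       # e.g. retrieve a sequence
    "revcomp": lambda s: s[::-1].translate(COMPLEMENT),  # reverse complement
    "length": lambda s: len(s),
}
links = {"revcomp": ["fetch"], "length": ["revcomp"]}
outputs, trace = run_workflow(steps, links)
print(outputs["revcomp"], outputs["length"])  # gtagaaatgt 10
```

Because the description (steps plus links) is plain data, it can be stored, shared and re-run, which is the property the slide is after.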


7. How do we hide the complexity of interoperating these domain services? Bury it.
Diagram: the Freefluo workflow enactor surrounded by processors for a plain Web service, Soaplab, a local Java app, BioMOBY, WSRF, BioMart, a Styx client and an R package.
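
One way to read the diagram: the enactor talks to every service through the same processor interface, and each service kind gets its own plug-in. The sketch below assumes that reading; it is not the Freefluo API. The registry, the `invoke` signature and both processor classes are hypothetical, and the "web service" processor returns a canned response rather than calling anything remote.

```python
from abc import ABC, abstractmethod
from typing import Any, Callable, Dict, Type


class Processor(ABC):
    """Uniform interface the enactor sees, whatever the underlying service kind."""

    @abstractmethod
    def invoke(self, inputs: Dict[str, Any]) -> Dict[str, Any]: ...


PROCESSOR_TYPES: Dict[str, Type[Processor]] = {}


def register(kind: str) -> Callable[[Type[Processor]], Type[Processor]]:
    """Class decorator: plug a processor in under a service-kind name."""
    def wrap(cls: Type[Processor]) -> Type[Processor]:
        PROCESSOR_TYPES[kind] = cls
        return cls
    return wrap


@register("local")
class LocalProcessor(Processor):
    """Hypothetical analogue of a local-application processor: wraps a callable."""

    def __init__(self, fn: Callable[..., Any]):
        self.fn = fn

    def invoke(self, inputs: Dict[str, Any]) -> Dict[str, Any]:
        return {"output": self.fn(**inputs)}


@register("stub_web_service")
class StubWebServiceProcessor(Processor):
    """Placeholder for a remote-service processor; returns a canned response."""

    def __init__(self, canned: Any):
        self.canned = canned

    def invoke(self, inputs: Dict[str, Any]) -> Dict[str, Any]:
        return {"output": self.canned, "request": dict(inputs)}


def make_processor(kind: str, *args: Any) -> Processor:
    """The enactor looks up a plug-in by kind and never branches on service details."""
    cls = PROCESSOR_TYPES.get(kind)
    if cls is None:
        raise ValueError(f"no processor registered for service kind {kind!r}")
    return cls(*args)


gc = make_processor("local", lambda seq: sum(c in "gc" for c in seq) / len(seq))
print(gc.invoke({"seq": "acatttctac"}))  # {'output': 0.3}
```

Adding a new kind of service then means registering one more class, without touching the enactor.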

8. How do we cope with data incompatibility between services?
- Fix up the services to be compatible
- Shims: libraries of adapters
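
A shim is a small adapter placed between two services whose data formats do not line up. This is a minimal sketch of a shim library, assuming formats are identified by hypothetical string names and each shim is a plain text-to-text function; the FASTA handling is deliberately simple (first record only) and is not myGrid's shim code.

```python
from typing import Callable, Dict, List, Tuple

Shim = Callable[[str], str]
SHIMS: Dict[Tuple[str, str], Shim] = {}


def shim(src: str, dst: str) -> Callable[[Shim], Shim]:
    """Register an adapter that turns `src`-formatted text into `dst`-formatted text."""
    def wrap(fn: Shim) -> Shim:
        SHIMS[(src, dst)] = fn
        return fn
    return wrap


@shim("fasta", "raw_sequence")
def fasta_to_raw(text: str) -> str:
    """Keep only the sequence lines of the first FASTA record."""
    seq: List[str] = []
    in_record = False
    for line in text.splitlines():
        if line.startswith(">"):
            if in_record:
                break  # stop at the second record
            in_record = True
        elif in_record:
            seq.append(line.strip())
    return "".join(seq)


@shim("raw_sequence", "fasta")
def raw_to_fasta(seq: str, header: str = "unnamed", width: int = 60) -> str:
    """Wrap a bare sequence as a single FASTA record."""
    lines = [seq[i:i + width] for i in range(0, len(seq), width)]
    return "\n".join([">" + header] + lines)


def adapt(value: str, src: str, dst: str) -> str:
    """Apply the registered shim between two services whose formats differ."""
    if src == dst:
        return value
    fn = SHIMS.get((src, dst))
    if fn is None:
        raise LookupError(f"no shim from {src!r} to {dst!r}")
    return fn(value)


print(adapt(">q1\nacat\nttct\n", "fasta", "raw_sequence"))  # acatttct
```

Keeping shims in a registry like this is one way to "hide 'em": a workflow tool could insert the right adapter automatically when the output format of one step differs from the input format of the next.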

9. Experience Report
- Workflows and bits of workflows are popular and get exchanged.
- Buy-in depends on MY service's availability.
- A user-oriented workflow language hides a multitude of sins.
- Shims are OK. And we should hide 'em.
- Results management is a killer.
- We need workflow patterns and best practice.
- Did not use BPEL.

10. 3000+ services? 100s of workflows? How do I find anything? How do I know what works with what, and what it does?
Diagram labels: Service Model, Ontology.
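
The idea behind matching services against an ontology can be shown with a toy stand-in for OWL classification: a hand-written is-a hierarchy and a subsumption test. Everything here is an assumption for illustration. The concept names, the service descriptions and the matching rule (a service fits if it accepts something at least as general as what you have and produces something at least as specific as what you want) are hypothetical, and real OWL reasoning handles far more than named-class hierarchies.

```python
from typing import Dict, List, NamedTuple, Optional

# Toy is-a hierarchy (child -> parent); concept names are hypothetical.
IS_A: Dict[str, Optional[str]] = {
    "data": None,
    "sequence": "data",
    "dna_sequence": "sequence",
    "protein_sequence": "sequence",
    "report": "data",
    "blast_report": "report",
}


def subsumes(general: str, specific: str) -> bool:
    """True if `specific` equals `general` or is a descendant of it in IS_A."""
    node: Optional[str] = specific
    while node is not None:
        if node == general:
            return True
        node = IS_A[node]
    return False


class ServiceDescription(NamedTuple):
    name: str
    input_concept: str
    output_concept: str


# Hypothetical service catalogue annotated with ontology concepts.
CATALOGUE = [
    ServiceDescription("blastn_like", "dna_sequence", "blast_report"),
    ServiceDescription("blastp_like", "protein_sequence", "blast_report"),
    ServiceDescription("seq_stats", "sequence", "report"),
]


def find_services(catalogue: List[ServiceDescription], have: str, want: str) -> List[str]:
    """Services that accept what we have and produce something that is a `want`."""
    return [s.name for s in catalogue
            if subsumes(s.input_concept, have) and subsumes(want, s.output_concept)]


print(find_services(CATALOGUE, "dna_sequence", "report"))  # ['blastn_like', 'seq_stats']
```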

11. Experience Report
- OWL reasoning to classify and match services.
- Capturing and curating content is the bottleneck.
- People vs. machine descriptions. For people, a little semantics goes a long way. Don't be too clever.
- Semantic Web Service models (OWL-S, WSMO, WSDL-S) are immature.

12. Workflow outcomes
A record of outcome data and its provenance. Store data outcomes with a unique ID (a Life Science Identifier, LSID) and link them together in a typed graph. In fact, store all provenance as a graph!
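
Giving every outcome a unique identifier is the first step. A minimal sketch, assuming the LSID URN form `urn:lsid:<authority>:<namespace>:<object>[:<revision>]`; the parser treats the `urn:lsid:` prefix case-insensitively, and the minter, its counter-based object IDs and the `example.org` authority are hypothetical, not how myGrid assigned LSIDs.

```python
import itertools
import re
from typing import NamedTuple, Optional


class LSID(NamedTuple):
    authority: str
    namespace: str
    object_id: str
    revision: Optional[str] = None

    def __str__(self) -> str:
        base = f"urn:lsid:{self.authority}:{self.namespace}:{self.object_id}"
        return base if self.revision is None else f"{base}:{self.revision}"


_LSID_RE = re.compile(r"^urn:lsid:([^:]+):([^:]+):([^:]+)(?::([^:]+))?$", re.IGNORECASE)


def parse_lsid(text: str) -> LSID:
    """Split an LSID string into its parts; the revision is optional."""
    m = _LSID_RE.match(text.strip())
    if m is None:
        raise ValueError(f"not an LSID: {text!r}")
    return LSID(*m.groups())


class LSIDMinter:
    """Hands out fresh LSIDs for workflow outputs (hypothetical scheme)."""

    def __init__(self, authority: str, namespace: str):
        self.authority = authority
        self.namespace = namespace
        self._counter = itertools.count(1)

    def mint(self) -> LSID:
        return LSID(self.authority, self.namespace, str(next(self._counter)))


lsid = parse_lsid("urn:lsid:example.org:data:42:1")
print(lsid.namespace, lsid.object_id, lsid.revision)  # data 42 1
```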

13. Figure: provenance stored as a typed graph. Data generated by services/workflows (urn:data1, urn:data2, urn:data12, urn:data:f1, urn:data:f2, urn:data:3, and sequence hits such as urn:hit1…, urn:hit2…, urn:hit50…), services and their invocations (urn:BlastNInvocation3, urn:compareinvocation3, urn:invocation5) and concepts (Blast_report, SwissProt_seq, Sequence_hit, DatumCollection, LSDatum, "Find similar sequence") are linked by properties such as [input], [output], [instanceOf], [type], [hasHits], [contains], [similar_sequence_to], [performsTask], [hasName], [directlyDerivedFrom] and [distantlyDerivedFrom]. Literals include "New sequence" and "Missed sequence".
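
Storing provenance as a typed graph makes questions such as "what was this result derived from?" a graph walk. The sketch below assumes provenance is kept as (subject, predicate, object) triples and that derivation is read off `input`/`output` links through invocations. The node names are loosely borrowed from the figure, but the edges are hypothetical, and the in-memory index is a stand-in for an RDF store.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

Triple = Tuple[str, str, str]


class ProvenanceGraph:
    """In-memory triple index: subject -> (predicate, object) and the reverse."""

    def __init__(self, triples: Iterable[Triple] = ()):
        self.out: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
        self.inv: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
        for s, p, o in triples:
            self.add(s, p, o)

    def add(self, s: str, p: str, o: str) -> None:
        self.out[s].append((p, o))
        self.inv[o].append((p, s))

    def objects(self, s: str, p: str) -> List[str]:
        return [o for q, o in self.out.get(s, []) if q == p]

    def subjects(self, p: str, o: str) -> List[str]:
        return [s for q, s in self.inv.get(o, []) if q == p]

    def derived_from(self, item: str) -> Set[str]:
        """All data upstream of `item`: hop from an output to its invocation's inputs."""
        seen: Set[str] = set()
        stack = [item]
        while stack:
            node = stack.pop()
            for invocation in self.subjects("output", node):
                for source in self.objects(invocation, "input"):
                    if source not in seen:
                        seen.add(source)
                        stack.append(source)
        return seen


# Hypothetical two-invocation history.
g = ProvenanceGraph([
    ("urn:invocation1", "input", "urn:data1"),
    ("urn:invocation1", "output", "urn:data2"),
    ("urn:invocation2", "input", "urn:data2"),
    ("urn:invocation2", "output", "urn:data3"),
    ("urn:data3", "instanceOf", "Blast_report"),
])
print(sorted(g.derived_from("urn:data3")))  # ['urn:data1', 'urn:data2']
```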

14. Fusion between different data models using shared concepts or data
Figure: a second graph, with GenBank entries (urn:genbank1…, urn:genbank2…, urn:genbank50…), Blast_report, DNA_sequence, Blast_service, urn:BlastNInvocation3, urn:data:3, urn:data2, runs urn:run5 and urn:run7, urn:williamsA and urn:williamsB, and GenBank, UniProt and LSID labels, linked by outputOf, inputOf, createdFrom, createdBy, runOf, instanceOf and contains_similar_seq_to, is joined to the provenance graph of slide 13 through shared identifiers and concepts.
Add assertions. Add rules. Reason over assertions.
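
"Add assertions, add rules, reason over assertions" can be illustrated with plain triples. In this sketch, fusing two graphs is set union, so nodes with the same identifier join up, and a naive forward-chaining loop applies rules until nothing new is inferred. This is a stand-in for RDF/OWL rule reasoning, not the myGrid implementation. The `derivedFrom` and `producedByService` predicates and both rules are hypothetical; `createdBy` and `runOf` echo labels in the figure.

```python
from typing import Callable, Iterable, List, Set, Tuple

Triple = Tuple[str, str, str]
Rule = Callable[[Set[Triple]], Set[Triple]]


def fuse(*graphs: Iterable[Triple]) -> Set[Triple]:
    """Merge graphs; shared identifiers make the pieces connect."""
    merged: Set[Triple] = set()
    for g in graphs:
        merged |= set(g)
    return merged


def derived_is_transitive(kb: Set[Triple]) -> Set[Triple]:
    """If a derivedFrom b and b derivedFrom c, then a derivedFrom c."""
    return {(a, "derivedFrom", c)
            for (a, p, b) in kb if p == "derivedFrom"
            for (b2, q, c) in kb if q == "derivedFrom" and b2 == b}


def created_by_service(kb: Set[Triple]) -> Set[Triple]:
    """If x createdBy run r and r runOf service s, then x producedByService s."""
    return {(x, "producedByService", s)
            for (x, p, r) in kb if p == "createdBy"
            for (r2, q, s) in kb if q == "runOf" and r2 == r}


def saturate(kb: Set[Triple], rules: List[Rule]) -> Set[Triple]:
    """Naive forward chaining: apply every rule until no new triple appears."""
    kb = set(kb)
    while True:
        new = set().union(*(rule(kb) for rule in rules)) - kb
        if not new:
            return kb
        kb |= new


# Hypothetical assertions from two sources that share urn:data1.
lab_a = [("urn:data1", "createdBy", "urn:run5"), ("urn:run5", "runOf", "Blast_service")]
lab_b = [("urn:data2", "derivedFrom", "urn:data1"), ("urn:data3", "derivedFrom", "urn:data2")]
kb = saturate(fuse(lab_a, lab_b), [derived_is_transitive, created_by_service])
print(sorted(kb - fuse(lab_a, lab_b)))
# [('urn:data1', 'producedByService', 'Blast_service'), ('urn:data3', 'derivedFrom', 'urn:data1')]
```

Neither source alone can answer "which service produced an ancestor of urn:data3?"; after fusion and rule application the answer is a single lookup.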

15. Experience Report
- Classification and reasoning over results. Graph matching.
- User provenance + machine provenance.
- An extensible, non-prescriptive model.
- Maturity of standards: LSID.
- Scalability and maturity of tools.
- RDF graphs are not for humans. Customised presentation tools.

16. Take home
Workflows and Semantic Web technologies are powerful tools, especially for scruffies. Both are about description. Both help us be flexible.
