Suggestions for Galaxy Workflow Design Using Semantically Annotated Services Alok Dhamanaskar, Michael E. Cotterell, Jessica C. Kissinger, and John Miller. University of Georgia Jie Zheng and Christian J. Stoeckert, Jr. University of Pennsylvania Presented by : Jie Zheng FOIS 2012: 7th International Conference on Formal Ontology in Information Systems
Outline
1. Web Service Composition Issues
2. Semantic Annotation of Web Services
3. Semi-Automatic Workflow Composition: Service Suggestion Engine (SSE)
   Input-Output Matching Algorithms
   Objective Specification Compliance
   Calculation of Concept Similarity
5. SSE Evaluation
   Interfacing SSE with the Galaxy workflow editor
6. Summary and Future Work
Web Service Workflow Composition Issues
Web services are developed by different contributors and are not designed to work with one another.
Web services are described using either a WSDL (Web Services Description Language) document or a WADL (Web Application Description Language) document.
There are no standard naming conventions, e.g., for operations and inputs.
Text descriptions are inherently ambiguous.
It is hard to identify whether inputs and outputs are compatible.
Semantic Web Service frameworks
OWL-S
   Upper level ontology for Web services
   Top-down approach
   Difficult to model the large number of existing Web services
SAWSDL
   Bottom-up approach
   Provides extension attributes for adding semantics to Web services: sawsdl:modelReference
   Easy to semantically model existing Web services
   No specific semantic model or ontology language required to use
   W3C recommended web services semantic annotation mechanism
Bioinformatics Web Service Ontology (OBI-WS)
OBI WS v 1.0 released
Bioportal:
Supports annotation of ~100 operations from 19 different web services, including sequence analysis and utility web services:
   Sequence Similarity Web services: WU-BLAST, NCBI-BLAST
   Multiple Sequence Alignment Web services: Clustal W, T-coffee
   Protein Functional Analysis Web services: SignalP
   Phylogenetic Analysis Web services: Phylip
Service Suggestion Engine
The Service Suggestion Engine (SSE) is a semi-automatic workflow composition system.
What it does: facilitates the construction and extension of workflows by suggesting the next step to the user.
It supports forward suggestions, backward suggestions, and bi-directional suggestions.
SSE returns a ranked list of web service operations (from a candidate list of available operations) that could be used for the next, previous, or intermediate step.
SSE ranking is based on a compatibility score.
Compatibility Score Calculation
The SSE scores the candidate operations on two criteria:
Input-Output Compatibility Score (S_io): how well the inputs of the candidate operation can be fed using the outputs of the operation(s) in the workflow.
Objective Specification Compliance Score (S_obj): how well the desired functionality, if supplied by the user, aligns with the “objective specification” of the operation.
[Figure labels: Ontology, Workflow, Candidate Web services, Web Service Operations, Service Suggestion Engine]
SSE Sub-scores
When calculating these sub-scores, the system calls a “concept similarity” module to find out how similar two concepts are.
Weights: the weight of the objective specification and the weight of input-output compatibility sum to 1. Likewise, the weight of semantic similarity and the weight of syntactic similarity sum to 1.
R. Wang, et al. Ranking-Based Suggestion Algorithms for Semantic Web Service Composition, ICWS 2010, pp
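The complementary weights described on this slide suggest a weighted sum of the two sub-scores. The sketch below is only an illustration of that idea, not the SSE implementation: the names `combined_score`, `rank_candidates`, and `w_obj`, the operation names, and all numeric values are assumptions made for the example.

```python
def combined_score(s_obj, s_io, w_obj):
    """Weighted sum of the two sub-scores; the two weights sum to 1."""
    if not 0.0 <= w_obj <= 1.0:
        raise ValueError("w_obj must lie in [0, 1]")
    return w_obj * s_obj + (1.0 - w_obj) * s_io


def rank_candidates(candidates, w_obj):
    """Sort (name, s_obj, s_io) tuples by combined score, best first."""
    scored = [
        (name, combined_score(s_obj, s_io, w_obj))
        for name, s_obj, s_io in candidates
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical candidate operations with made-up sub-scores.
candidates = [("opA", 0.9, 0.4), ("opB", 0.5, 0.8), ("opC", 0.2, 0.3)]
ranking = rank_candidates(candidates, w_obj=0.25)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

With a low objective weight, as in this made-up call, the ranking is driven mostly by input-output compatibility.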
Input-Output Compatibility
Model inputs/outputs using a Directed Acyclic Graph (DAG).
This transforms the I-O matching problem into a graph pattern matching problem.
SSE supports two algorithms to calculate the S_io sub-score:
   path-based mapping algorithm
   p-homomorphism matching algorithm
Path Based Mapping Algorithm
[Figure: two DAGs. The output of Web service 1 has nodes A, B, C, D, E; the input to Web service 2 has nodes P, Q, R, S.]
Identify paths.
Output paths of Web service 1: 1. A – B – D; 2. A – B – E; 3. A – C
Input paths of Web service 2: 1’. P – Q; 2’. P – R – S
The goal is to match the input paths of Web service 2 to the output paths of Web service 1, so in this case we need to find the best matching path for 1’ (P – Q) and for 2’ (P – R – S).
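The path identification step can be sketched as a root-to-leaf traversal of each DAG. This is a minimal illustration, not the SSE code: the function name `root_to_leaf_paths` is hypothetical, and the edge lists are reconstructed from the paths listed on the slide rather than taken from the original figure.

```python
def root_to_leaf_paths(edges):
    """Enumerate every root-to-leaf path of a DAG given as (parent, child) pairs."""
    children = {}
    nodes, has_parent = set(), set()
    for parent, child in edges:
        children.setdefault(parent, []).append(child)
        nodes.update((parent, child))
        has_parent.add(child)

    paths = []

    def walk(node, prefix):
        prefix = prefix + [node]
        if node not in children:  # leaf reached: record the full path
            paths.append(prefix)
            return
        for child in children[node]:
            walk(child, prefix)

    for root in sorted(nodes - has_parent):  # roots are nodes without a parent
        walk(root, [])
    return paths


# Edge lists assumed from the paths shown on the slide.
output_ws1 = [("A", "B"), ("A", "C"), ("B", "D"), ("B", "E")]
input_ws2 = [("P", "Q"), ("P", "R"), ("R", "S")]

print(root_to_leaf_paths(output_ws1))  # [['A', 'B', 'D'], ['A', 'B', 'E'], ['A', 'C']]
print(root_to_leaf_paths(input_ws2))   # [['P', 'Q'], ['P', 'R', 'S']]
```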
Path Based Mapping Algorithm
[Figure: the same two DAGs, output of Web service 1 and input to Web service 2. Each node is a WSDL element plus an ontology concept.]
Finding the best matching path for 2’ (P – R – S) by comparing it with each output path:
2’ – 1: P – R – S with A – B – D
2’ – 2: P – R – S with A – B – E
2’ – 3: P – R – S with A – C
Score for MatchPath (2’ – 1): matching P – R – S against A – B – D.
[Figure: node-by-node alignment of P – R – S with A – B – D, with the leaf nodes marked.]
P-Homomorphism Input-Output Matching
Path-based matching decomposes the input-output DAGs into individual paths, thus losing some structural information.
A homomorphism is a structure-preserving map between two algebraic structures (such as groups, rings, or vector spaces).
Finding an exact homomorphism between the input and output DAGs is unlikely, so it is important to consider the similarity between vertices when building the mapping.
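P-Homomorphism Input-Output Matching
[Figure: input DAG and output DAG with nodes A, B, C, D, E and P, Q, R, S, and a similarity matrix over A–E and P–S; the matrix values are not recoverable.]
Threshold: 0.7
Good matches: A -> P, B -> R, C -> (none), D -> Q, E -> S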
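The thresholding step on this slide can be illustrated as follows. This sketch covers only the vertex-similarity filter, not the structure-preserving part of p-homomorphism matching. The similarity values are invented for the example, since the slide's matrix did not survive, and `good_matches` is a hypothetical name.

```python
THRESHOLD = 0.7

# Made-up similarity values, chosen only so that the result mirrors the
# matches listed on the slide. They are NOT the values from the original.
similarity = {
    "A": {"P": 0.90, "Q": 0.20, "R": 0.30, "S": 0.10},
    "B": {"P": 0.30, "Q": 0.40, "R": 0.80, "S": 0.20},
    "C": {"P": 0.10, "Q": 0.50, "R": 0.40, "S": 0.60},
    "D": {"P": 0.20, "Q": 0.75, "R": 0.30, "S": 0.40},
    "E": {"P": 0.10, "Q": 0.30, "R": 0.20, "S": 0.85},
}


def good_matches(similarity, threshold):
    """Map each vertex to its most similar counterpart, or None below the threshold."""
    matches = {}
    for source, row in similarity.items():
        target, score = max(row.items(), key=lambda item: item[1])
        matches[source] = target if score >= threshold else None
    return matches


matches = good_matches(similarity, THRESHOLD)
print(matches)
```

A full p-homomorphism matcher would additionally check that connected vertices in one DAG map to connected vertices in the other; that check is omitted here.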
Objective Specification Compliance
The user can provide the desired functionality as keywords or as a concept in the ontology that they feel closely describes the desired functionality.
If semantics are available from both the user and the annotation: ConceptSimilarity(ObjectiveSpecification, DesiredFunctionality)
In the absence of semantics from the user: SyntacticSimilarity(ConceptDefinition, Keywords)
In the absence of semantics from both the user and the annotation: SyntacticSimilarity(OperationName, Keywords)
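The three cases above amount to a fallback chain, which might look like the sketch below. Everything here is an assumption for illustration: the dictionary keys, the function names, and the use of Python's `difflib.SequenceMatcher` as a stand-in for the syntactic similarity measure. The concept similarity module is passed in as a callable because its internals are not described on the slide.

```python
from difflib import SequenceMatcher


def syntactic_similarity(text_a, text_b):
    """Stand-in string similarity in [0, 1]; the measure SSE uses is not shown."""
    return SequenceMatcher(None, text_a.lower(), text_b.lower()).ratio()


def objective_compliance(operation, desired_concept=None, keywords=None,
                         concept_similarity=None):
    """Pick the comparison according to which semantics are available."""
    annotation = operation.get("objective_concept")
    # Case 1: semantics from both the user and the annotation.
    if desired_concept and annotation and concept_similarity:
        return concept_similarity(annotation, desired_concept)
    # Case 2: keywords only, but the operation is annotated.
    if keywords and operation.get("concept_definition"):
        return syntactic_similarity(operation["concept_definition"], keywords)
    # Case 3: keywords only and no annotation; fall back to the operation name.
    if keywords:
        return syntactic_similarity(operation["name"], keywords)
    return 0.0


# Hypothetical operation without annotation, matched on its name.
score = objective_compliance({"name": "align"}, keywords="ALIGN")
print(score)
```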
Workflow Management System
Two popular tools that provide a GUI for creating workflows:
Galaxy
   Easy to use, open-source, Web-based platform that provides multiple tools for bioinformatics data analysis.
   Provides an easy way to construct workflows from existing tools using a Yahoo Pipes-based graphical designer.
   Previous work allows adding Web services as tools to Galaxy.
Taverna
   Open source; integrated with BioCatalogue; supports the invocation of web services and their use in workflows.
Evaluation Scenario
Find out more information about a protein sequence and its evolutionary relationships to other protein sequences.
The user might know that they want to first search a database for similar sequences, then perform multiple sequence alignment, and finally perform phylogenetic analysis to construct phylogenetic trees.
Web services already exist for each of these steps. We use semantically annotated versions of their descriptions for our example.
Evaluation Setup
We evaluated the suggestions that SSE provided for each step against a ranking by a human expert, when suggesting from 101 Web service operations.
Metrics: precision, recall, and F-measure.
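The formulas on this slide did not survive extraction. The sketch below uses the standard set-based definitions of precision, recall, and F-measure; how the original evaluation derived the relevant set from the expert ranking is not stated, so the sets and operation names here are purely hypothetical.

```python
def precision_recall_f(suggested, relevant):
    """Standard set-based precision, recall, and balanced F-measure."""
    suggested, relevant = set(suggested), set(relevant)
    hits = len(suggested & relevant)
    precision = hits / len(suggested) if suggested else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Hypothetical example: four suggested operations, three judged relevant.
p, r, f = precision_recall_f(["op1", "op2", "op3", "op4"], ["op1", "op2", "op5"])
print(f"precision={p:.2f} recall={r:.2f} F={f:.2f}")
```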
Evaluation: Adding Web Services to Galaxy Screen shot Output
Evaluation: Workflow Construction Step 1
Evaluation: Extend Workflow
Evaluation: Forward Suggestions (Path Based) = 0.65 = 0.67 = 0.69 = 0.1
Conclusions & Future Work
The demonstrated use case, choosing from 101 operations for a 9-step workflow, is a challenging task for a user without tool support.
Semantics from an ontology can help in different aspects of Web service composition.
SSE can help the user considerably narrow down the choices for the next or previous step.
The availability of SSE as a Web service will make integration with existing workflow composition tools easier.
Future work:
   Other kinds of web services, such as secondary structure analysis and sequence annotation web services.
   Plug-ins for other workflow editors, e.g., Taverna.
Code and files available at:
Acknowledgements
University of Georgia: Alok Dhamanaskar, Michael E. Cotterell, Jessica C. Kissinger, John Miller
University of Pennsylvania: Jie Zheng, Christian J. Stoeckert, Jr.
Funding: NIH R01 GM093132
Adding ontology annotation into WSDL / WADL files (expectation value) sawsdl:modelReference= " OBIws_ ">
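As a rough illustration of what attaching a sawsdl:modelReference to a schema element involves, the sketch below uses Python's standard `xml.etree.ElementTree`. The SAWSDL namespace is the one defined by the W3C recommendation. The element name `expectationValue`, the function name `annotate`, and the concept IRI are hypothetical placeholders; the actual OBI-WS identifier is elided on the slide and is not reproduced here.

```python
import xml.etree.ElementTree as ET

XSD_NS = "http://www.w3.org/2001/XMLSchema"
SAWSDL_NS = "http://www.w3.org/ns/sawsdl"
ET.register_namespace("xsd", XSD_NS)
ET.register_namespace("sawsdl", SAWSDL_NS)

# Hypothetical schema fragment standing in for part of a WSDL types section.
schema_fragment = (
    f'<xsd:schema xmlns:xsd="{XSD_NS}">'
    '<xsd:element name="expectationValue" type="xsd:double"/>'
    "</xsd:schema>"
)


def annotate(schema_xml, element_name, concept_iri):
    """Add a sawsdl:modelReference attribute to the named schema element."""
    root = ET.fromstring(schema_xml)
    for element in root.iter(f"{{{XSD_NS}}}element"):
        if element.get("name") == element_name:
            element.set(f"{{{SAWSDL_NS}}}modelReference", concept_iri)
    return ET.tostring(root, encoding="unicode")


# Placeholder IRI; a real annotation would point at an ontology term.
annotated = annotate(
    schema_fragment,
    "expectationValue",
    "http://example.org/ontology#ExpectationValue",
)
print(annotated)
```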
RadiantWeb Annotation Tool
Manual annotation is labor-intensive and error-prone.
RadiantWeb suggests ontology terms and generates the SAWSDL file.