Using Provenance to Improve Workflow Design Frederico Tosta Leonardo Murta Claudia Werner Marta Mattoso {ftoliveira, murta, werner, COPPE – Federal University of Rio de Janeiro - Brazil UFRJ
2 Summary: Motivation; Introduction & Background; Goal; Approach & Implementation; Conclusion COPPE/UFRJ
3 Motivation Pieces of workflows that occurred in the past may occur again in the future.
4 Motivation The number of services and bioinformatics operations is growing: Taverna has over 3500 (2007). VisTrails has over 1200 modules (2008). [Figure: workflows and workflow services]
5 Motivation How can we find the pieces or services that are useful during the design of a new workflow in an automatic and systematic way?
6 Software Reuse Software reuse is the process of creating software systems from existing software [Krueger, 1992]. Benefits of software reuse: quality, reliability, reduced cost, productivity.
7 Recommendation Systems
E-commerce: apply data mining techniques to the problem of helping users find the items they would like to purchase.
E-commerce concepts mapped into scientific experiment concepts:
Domain | Concepts
E-commerce | Customer | Product* | Cart | Preference
Scientific Experiment | Scientist | Component / Actor | Workflow (Goble, 2007) | Context
* what is recommended by e-commerce sites
8 Goal Propose a proactive recommendation service that suggests frequent combinations of scientific programs for reuse.
9 Approach: design for reuse and recommendation. [Diagram elements: Workflow specification, Design, Provenance, DB]
10 Approach: design with reuse and recommendation. [Diagram elements: Workflow specification, Design, Proactive Recommendation, Provenance, DB]
11 Implementation
Populating the database:
VisTrails workflows:
-Parse provenance XML files to extract the relations.
MySQL database:
-The relations are mapped into a database.
-Each relation contains the modules and how they are connected.
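The two steps above (parse provenance XML, then map the relations into a database) can be sketched roughly as follows. This is only an illustration under stated assumptions: the XML layout in `SAMPLE_XML` is a simplified, hypothetical one and not the real VisTrails provenance schema, the table and column names are invented for the example, and the standard-library `sqlite3` module stands in for MySQL so that the sketch runs without a database server.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical, simplified provenance layout (NOT the real VisTrails schema).
SAMPLE_XML = """
<workflow>
  <module id="1" name="HmmBuild"/>
  <module id="2" name="HmmCalibrate"/>
  <connection>
    <source module="1" port="StdOut"/>
    <destination module="2" port="HmmPath"/>
  </connection>
</workflow>
"""


def extract_relations(xml_text):
    """Return (source, destination, source_port, dest_port) tuples."""
    root = ET.fromstring(xml_text.strip())
    # Map module ids to module names so relations are stored by name.
    names = {m.get("id"): m.get("name") for m in root.iter("module")}
    relations = []
    for conn in root.iter("connection"):
        src = conn.find("source")
        dst = conn.find("destination")
        relations.append((
            names[src.get("module")],
            names[dst.get("module")],
            src.get("port"),
            dst.get("port"),
        ))
    return relations


def store_relations(db, relations):
    """Map the extracted relations into a single (assumed) table."""
    db.execute(
        "CREATE TABLE IF NOT EXISTS relation ("
        "source TEXT, destination TEXT, source_port TEXT, dest_port TEXT)"
    )
    db.executemany("INSERT INTO relation VALUES (?, ?, ?, ?)", relations)
    db.commit()


db = sqlite3.connect(":memory:")  # sqlite3 used here in place of MySQL
relations = extract_relations(SAMPLE_XML)
store_relations(db, relations)
rows = db.execute("SELECT * FROM relation").fetchall()
print(rows)
```

Storing one row per connection keeps both the modules and the ports they were connected through, which is the information a relation is described as holding.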
12 Implementation
VisTrails workflow design with recommendation
Source | Destination | Source Port | Dest Port
HmmBuild | HmmCalibrate | DestinationDir | SourceDir
HmmBuild | Cat | DestinationDir | Dir
HmmBuild | HmmCalibrate | DestinationDir | HmmPath
HmmBuild | HmmCalibrate | StdOut | HmmPath
HmmBuild | HmmCalibrate | StdOut | HmmPath
Ports 1 and 2 are the output ports DestinationDir and StdOut, respectively. Ports 3, 4 and 5 are the input ports SourceDir, HmmPath and Dir, respectively.
Recommendation metric: from the example, we can infer that port StdOut of HmmBuild has been connected to port HmmPath of HmmCalibrate in 40% of previously designed workflows (2 of the 5 recorded relations).
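A minimal sketch of how such a recommendation metric could be computed from the example relations. The function name `recommend` and the choice of denominator (all recorded connections leaving the source module) are assumptions made for illustration; the slides do not specify how the prototype computes the percentage.

```python
from collections import Counter

# The five example relations:
# (source, destination, source port, destination port).
RELATIONS = [
    ("HmmBuild", "HmmCalibrate", "DestinationDir", "SourceDir"),
    ("HmmBuild", "Cat", "DestinationDir", "Dir"),
    ("HmmBuild", "HmmCalibrate", "DestinationDir", "HmmPath"),
    ("HmmBuild", "HmmCalibrate", "StdOut", "HmmPath"),
    ("HmmBuild", "HmmCalibrate", "StdOut", "HmmPath"),
]


def recommend(relations, source):
    """Rank past connections leaving `source` by relative frequency.

    Assumption: the frequency is the share of all recorded connections
    whose source module is `source`.
    """
    outgoing = [r[1:] for r in relations if r[0] == source]
    if not outgoing:
        return []
    counts = Counter(outgoing)
    total = len(outgoing)
    return [
        (dest, src_port, dst_port, n / total)
        for (dest, src_port, dst_port), n in counts.most_common()
    ]


ranking = recommend(RELATIONS, "HmmBuild")
for dest, src_port, dst_port, freq in ranking:
    print(f"HmmBuild.{src_port} -> {dest}.{dst_port}: {freq:.0%}")
```

With the five relations above, the StdOut to HmmPath connection into HmmCalibrate ranks first at 40%, and each of the three other connections scores 20%.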
13 Implementation VisTrails workflow design with recommendation
14 Conclusion We expect that this approach may help to propagate the benefits of software reuse to the context of scientific workflows: -Reduce the time to design workflows. -Increase the quality of the designed workflows.
15 Conclusion
Limitations: The current version of our prototype recommends only the next component, based on previously used connections.
Future work: Improve the approach by recommending a component based on the whole path. Specify a context for each workflow. Apply a weight to each relation based on workflow usage.