WDO-It! 102 Workshop: Using an abstraction of a process to capture provenance UTEP’s Trust Laboratory NDR HP MP
GRAVITY MAP: A SCIENTIFIC SYSTEM
Understanding the Gravity Map System (1/3)
[Figure labels: Longitude, Latitude, OBS; ncols 5, nrows 5, cellsize=]
set gravity-dataset-path=gravityDataset.txt
set gridded-dataset-path=esriGrid.txt
set contoured-dataset-path=gravityContourMap.ps
call GetGravityData %gravity-dataset-path%
call GridGravityData %gravity-dataset-path% %gridded-dataset-path%
call ContourGriddedData %gridded-dataset-path% %contoured-dataset-path%
- Shell scripting
- Web services
- GMT
Understanding the Gravity Map System (2/3)
[Figure labels: Longitude, Latitude, OBS; ncols 5, nrows 5, cellsize=]
Installing and Running the Gravity Map System (1/3)
Download scripts to generate a contour map here:
Uncompress download file
Requirements:
– OS: Windows or Unix/Linux/MacOS
– Web access
– Java 6
– Recommendation: Clear your Java Cache
  Go to Control Panel > Java
  Under “Temporary Internet Files”, click on Settings
  Click on Delete Files
Installing and Running the Gravity Map System (2/3)
Open command prompt or terminal window
Navigate to “scripts” directory
Run ContourMapWorkflow script
– Windows
  C:\homedir\scripts\ContourMap
– Unix/Linux/MacOS
  chmod 775 /homedir/scripts/*.sh
  ./ContourMapWorkflow.sh
Installing and Running the Gravity Map System (3/3)
The master script “ContourMapWorkflow” calls three other scripts, each of which invokes a Java 6 client that consumes a Web service.
All the core functionality in this example, whether gridding a dataset or accessing the gravity datastore, is provided through a Web service.
Notes about the outputs
This small workflow, or pipeline, is built on three activities: “getGravityData”, “gridGravityData”, and “contourGriddedData”. Each writes its output to the file system for the next activity to consume.
The outputs are “gravityDataset.txt”, “esriGrid.txt”, and “gravityContourMap.ps” (the final output). Their names can be changed in the script “ContourMapWorkflow”.
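A quick way to confirm that all three activities ran is to check that each output file exists and is non-empty. The following is a minimal sketch for Unix/Linux/MacOS, assuming the default file names from the workflow script; the function name check_outputs is illustrative and not part of the example bundle.

```shell
#!/bin/sh
# Hypothetical helper: report which expected workflow outputs exist and
# are non-empty. The exit status is the number of missing files.
#   $1 = directory holding the outputs (defaults to the current directory)
check_outputs() {
    dir=${1:-.}
    missing=0
    # Default output names used by the ContourMapWorkflow script.
    for f in gravityDataset.txt esriGrid.txt gravityContourMap.ps; do
        if [ -s "$dir/$f" ]; then
            echo "ok: $f"
        else
            echo "missing or empty: $f"
            missing=$((missing + 1))
        fi
    done
    return "$missing"
}

# Usage, from the "scripts" directory after running the workflow:
#   check_outputs . || echo "workflow outputs incomplete"
```

Because each activity consumes the previous activity's file, the first file reported missing usually points at the step that failed.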
GENERATING CODE TO CAPTURE PROVENANCE: THE DATA ANNOTATOR
Running WDO-It! The WDO-It! tool can be run at:
WDO-It! interface
Loading a SAW
WDO-It! can open a SAW (semantic abstract workflow) from a local file or from a URL. A bookmarks option is also available.
To start, load the Gravity Contour Map Workflow from the bookmarks: from the top menu, open bookmarks > CreateGravityContourMapSAW
WDO-It! Gravity Contour Map Workflow
Data Annotators
Specialized modules used to capture provenance at a specific time in the execution of a scientific process
Data Annotators (DAs) do not take part in the information processing of the original system
Data Annotators are created using the WDO-It! tool, based on a SAW that represents the scientific process
DA implementation depends on:
– PML encoder agent
– Scientific system’s execution platform
Data Annotators
[Figure: the System (instruction 1, instruction 2, ..., instruction n) calls the DA; the DA reads its configuration from an XML Config file (based on process knowledge from SAW) and uses the PML Encoder Agent; the result is Data + Provenance Links]
Data Annotator Wizard In order to start exporting data annotators, click File > Export Workflow > Generate Data Annotators from the top menu.
Select SAW
A prompt will ask the user which SAW to use to generate the data annotators. Select the SAW for which you would like to capture provenance.
NOTE: It is important that a master script is available for the selected SAW.
For this example, select CreateGravityContourMapSAW.owl
After you click OK, WDO-It! opens the Data Annotator Wizard, where you configure the scripts and configuration files to be generated.
Data Annotator Wizard
The first screen of the DA wizard asks the user to set:
- Output directories
- Annotator Agents
- Targeted System
How to enter each will be covered in the following slides.
DA Configuration: Setting Output paths
Select any directory in your file system where you wish to save all generated files (DAs).
– Annotator Directory: holds the scripts and configuration files.
– Annotator Output Directory: receives all dumped scientific process data.
– PML Output directory: receives the generated PML files.
NOTE: Do not set the PML dump path with the Browse button. We are currently configuring the CI-Server widget, and it may produce exceptions.
DA Configuration: Targeted System
For the targeted system, the user must select the type of scripts required:
– Shell scripts (.sh files) for UNIX, Linux, or Mac systems
– Batch scripts (.bat files) for Microsoft Windows systems
Choose the correct option for your system, then click Next.
DA Configuration: Bindings The second screen of the DA wizard will ask the user to specify details regarding the semantic abstract workflow. The interface looks like this.
DA Configuration: Sources
Select an engine from the drop-down menu for each of the listed sources.
For this example choose “UTEPGISGravityDatabase”
DA Configuration: Data
The data section specifies what data will be created, transformed, and dumped during the scientific process.
A user must specify the format of each data item by selecting the format type from the drop-down menu.
In addition, the user must specify how the data annotators will refer to the data itself:
– Pass by Reference: link to the data file
– Pass by Value: include the value in the PML file
DA Configuration: Data
For this example, select the following values.
Data Formats:
– For ContourMap use “ps3”
– For GravityDataset use “tab-delimited-dataset”
– For GriddedDataset use “esriGrid”
Data Filenames*:
– For ContourMap use “gravityContourMap.ps”
– For GravityDataset use “gravityDataset.txt”
– For GriddedDataset use “esriGrid.txt”
* These files are given with the example bundle. They are generated once the main process has been run.
DA Configuration: Methods
The final section of the bindings tab requires the user to set the engine for each method listed. For this example, choose the following:
Method Engines:
– For Contouring use “contour”
– For Gridding use “gridder”
Once these are selected, the Data Annotator wizard is fully set up. This is a good time to check that everything was set up correctly.
Click on the “Generate” button to create the scripts and the configuration files.
The Generated Data Annotator
Go to the Annotator Output directory you selected in the DA Wizard.
You should see the following directories:
– Pml
– Data
– mappings
You should also see the following files:
– Contouring*
– Contouring.xml
– Gridding*
– Gridding.xml
– WebServicePACES*
– WebServicePACES.xml
– environmentVariables*
* .bat or .sh, depending on which system you selected.
INSTRUMENT GRAVITY MAP SYSTEM FOR PROVENANCE CAPTURE
Instrument Gravity Map scripts We need to modify the gravity map client script to invoke the appropriate data annotator at the appropriate time. Each DA script is named after a SAW activity or source of information, but a system analyst must find the appropriate location in the system that “maps” to it. A call to the corresponding DA script must be inserted in the system directly after the execution of the workflow step it is annotating.
Instrument Gravity Map Scripts
For example, in order to capture the “gridding” step in our gravity map example, we would add a call to the “Gridding.sh” in the gravityMapClient workflow:
export gravitydatasetpath=gravityDataset.txt
export griddeddatasetpath=esriGrid.txt
export contoureddatasetpath=gravityContourMap.ps
./GetGravityData.sh $gravitydatasetpath
./GridGravityData.sh $gravitydatasetpath $griddeddatasetpath
./Gridding.sh
./ContourGriddedData.sh $griddeddatasetpath $contoureddatasetpath
Instrument Gravity Map Scripts
A call to an initialization script is also needed.
./environmentVariables.sh
export gravitydatasetpath=gravityDataset.txt
export griddeddatasetpath=esriGrid.txt
export contoureddatasetpath=gravityContourMap.ps
./GetGravityData.sh $gravitydatasetpath
./WebServicePACES.sh
./GridGravityData.sh $gravitydatasetpath $griddeddatasetpath
./Gridding.sh
./ContourGriddedData.sh $griddeddatasetpath $contoureddatasetpath
./Contouring.sh
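Since each DA call must directly follow the step it annotates, one option is to pair the two in a small wrapper so the annotator runs only when its step succeeds. The sketch below assumes a POSIX shell; run_annotated is a hypothetical helper and is not generated by WDO-It!.

```shell
#!/bin/sh
# Hypothetical helper: run one workflow step and, only if it succeeds,
# run the data annotator that documents it.
#   $1     = annotator script (e.g. ./Gridding.sh)
#   $2...  = the workflow step and its arguments
run_annotated() {
    annotator=$1
    shift
    if "$@"; then
        # The step succeeded, so its output exists for the annotator to describe.
        "$annotator"
    else
        status=$?
        echo "step failed (exit $status), skipping $annotator" >&2
        return "$status"
    fi
}

# Usage with the script names from the gravity map example:
#   run_annotated ./Gridding.sh ./GridGravityData.sh "$gravitydatasetpath" "$griddeddatasetpath"
```

This keeps the step-to-annotator mapping visible on a single line and avoids recording provenance for an output that was never produced.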
CAPTURING AND USING GRAVITY MAP PROVENANCE
Rerunning Gravity Map Script
To generate PML documenting the derivation of the PostScript gravity map, rerun the workflow script. PML will be dumped in the directories that you specified in the DA wizard.
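One way to confirm that the rerun actually produced provenance is to list the files written to the PML output directory since a marker file was created. This is a minimal sketch assuming a POSIX shell; the function name list_new_files, the marker file, and the paths in the usage comment are illustrative placeholders, not part of WDO-It!.

```shell
#!/bin/sh
# Hypothetical helper: list files under a directory that were modified
# more recently than a marker file.
#   $1 = directory to search (e.g. the PML Output directory from the DA wizard)
#   $2 = marker file created just before rerunning the workflow
list_new_files() {
    find "$1" -type f -newer "$2" | sort
}

# Usage (paths are placeholders):
#   touch before-run.marker
#   ./ContourMapWorkflow.sh
#   list_new_files /path/to/pml-output before-run.marker
```

An empty listing suggests that the annotator calls were not reached or that the output path in the DA wizard differs from the directory being inspected.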
Browsing Gravity Map Provenance Running Probe-It!
Thank you!
For more information please contact: Leonardo Salayandia, Paulo Pinheiro da Silva, Nick Del Rio, Antonio Garza, Aida Gandara