Published by Barnaby Washington. Modified over 9 years ago.
1
Connecting RDSI, NeCTAR and ANDS via Provenance - VHIRL
Ryan Fraser, Nicholas Carr
2
Outline
● What are VLs?
● What is provenance?
● How do we represent VLs using standardised provenance?
3
What are VLs?
From https://nectar.org.au/virtual-laboratories-1, they are:
● data repositories and computational tools and streamlining research workflows
4
Connecting the commons with VHIRL and Provenance
5
What is provenance?
From http://en.wikipedia.org/wiki/Provenance#Computer_Science:
“Computer science uses the term provenance to mean the lineage of data or processes, as per data provenance. However there is a field of informatics research within computer science called provenance that studies how provenance of data and processes should be characterised, stored and used. Semantic web standards bodies, such as the World Wide Web Consortium, ratified a standard for provenance representation in 2014, known as PROV.”
6
Do you make decisions? Yes. Should someone remember how you made those decisions? Yes = PROV
7
Data Services
Data layers discovered. Layers consist of numerous remote data services.
Subset selected for processing.
PROV:
a) Service captures data service information (hosted on RDS)
b) Captures subset details of data selected
8
Compute/Storage Services
Flexibility in what compute provider to utilise
PROV: Captures job details, login info, where/what/when/how computed etc.
Includes all relevant NeCTAR details for cloud processing
9
Available Toolboxes
● TCRM – estimate wind speed from cyclone and severe wind
● ANUGA – estimate inundation from riverine floods, tsunami, dam break and storm surge
PROV: Captures code utilised along with “how” it is used (template/input files)
10
Example for tsunami inundation
PROV: Captures location (PID) of where input files/scripts are persisted
11
Processing Services
The steps so far have been building an environment to run a processing script.
Either write your own script... or build from existing templates... when you’re done, it will be submitted for processing on the Cloud!
PROV: Captures location (PID) of where input files/scripts are persisted
12
PROV: Finalised outputs are persisted with PIDs on RDS and captured in the PROV information
13
PROV: After the job is completed, the finalised PROV record is published to the provenance store.
PROV record endpoints could be registered in ANDS RDA alongside output data!
15
Components of the Virtual Hazard Impact & Risk Laboratory (VHIRL)
Data Services Processing Services Compute Services Enablers Virtual Laboratories/Apps
Data Analytics Magnetics Gravity DEM eScript ANUGA NCI Petascale NCI Cloud NeCTAR Cloud Amazon Cloud Desktop Service Orchestration Provenance Metadata Auth. Coastal Inundation Tsunami Inundation Scenario Cyclone Wind Path Calculation Landsat Bathymetry Cyclone Wind Model Surface Wave Propagation (earthquake) TCRM
16
Basic scientific data processing model - 1
Input Data → Process → Output Data
17
Background: How do we represent VLs using standardised provenance?
18
Basic scientific data processing model - 2
Code, Config, Input Data (input item roles) → Process → Output Data
19
Basic scientific data processing model - 3, PROV
PROV classes: Entity, Activity, Agent
Entities: Code, Config, Input Data, Output Data
Activity: Process
Agent: Who / which system
Relations: used, wasGeneratedBy, wasAssociatedWith, wasAttributedTo
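As a rough illustration of the PROV model on this slide, the sketch below writes the Code/Config/Input Data → Process → Output Data pattern as PROV-N statements using only the Python standard library. The `ProvDocument` class, the `ex` prefix, the `http://example.org/` namespace and the identifier names are all assumptions made for the example; they are not taken from VHIRL.

```python
from dataclasses import dataclass, field


@dataclass
class ProvDocument:
    """Hypothetical helper that collects PROV-N statements for one run."""
    prefix: str = "ex"                      # assumed namespace prefix
    namespace: str = "http://example.org/"  # assumed namespace URI
    statements: list = field(default_factory=list)

    def add(self, relation, *names):
        # Emit e.g. "used(ex:process, ex:config)" in PROV-N notation.
        args = ", ".join(f"{self.prefix}:{n}" for n in names)
        self.statements.append(f"{relation}({args})")

    def serialize(self):
        body = "\n".join(f"  {s}" for s in self.statements)
        return (f"document\n  prefix {self.prefix} <{self.namespace}>\n"
                f"{body}\nendDocument")


def describe_run():
    """Describe one processing run with the three PROV classes."""
    doc = ProvDocument()
    inputs = ("code", "config", "input_data")
    for name in inputs + ("output_data",):
        doc.add("entity", name)
    doc.add("activity", "process")
    doc.add("agent", "system")
    for name in inputs:
        doc.add("used", "process", name)             # activity used entity
    doc.add("wasGeneratedBy", "output_data", "process")
    doc.add("wasAssociatedWith", "process", "system")
    doc.add("wasAttributedTo", "output_data", "system")
    return doc


prov_n = describe_run().serialize()
print(prov_n)
```

A real deployment would more likely use an existing PROV library and persistent identifiers rather than hand-built strings; the point here is only how the four relations connect the three classes.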
20
Basic scientific data processing model - 4, PROMS
PROV classes: Entity, Activity, Agent
PROMS classes: Reporting System (R.S.), Report
Report N: hadStartingActivity / hadEndingActivity
Report N: reportingSystem → Reporting System X
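To make the Report / Reporting System relationship concrete, here is a minimal sketch of a report record as a plain Python dictionary. The property names `reportingSystem`, `hadStartingActivity` and `hadEndingActivity` come from the slide; the `@id`/`@type` keys, the `make_report` helper and every `http://example.org/...` identifier are hypothetical and should not be read as the actual PROMS-O serialisation.

```python
import json


def make_report(report_id, reporting_system, starting_activity, ending_activity):
    """Build a report record linking a Reporting System to the first and
    last activities it is reporting on (hypothetical layout)."""
    return {
        "@id": report_id,
        "@type": "Report",
        "reportingSystem": reporting_system,
        "hadStartingActivity": starting_activity,
        "hadEndingActivity": ending_activity,
    }


# Placeholder identifiers, for illustration only.
report = make_report(
    report_id="http://example.org/report/N",
    reporting_system="http://example.org/reportingsystem/X",
    starting_activity="http://example.org/activity/process",
    ending_activity="http://example.org/activity/process",
)
report_json = json.dumps(report, indent=2)
print(report_json)
```

For a single-step job like the basic model, the starting and ending activity can be the same activity, which is what the placeholder values above assume.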
21
Basic scientific data processing model - 5, Storage
PROV classes: Entity, Activity, Agent
PROMS classes: Reporting System (R.S.), Report
Reporting System X and Reporting System Y send Reports (Report N, Report M) to the Organisational Provenance Store, where they are reported and stored.
22
Data Management
Categories: managed data; web service data; user supplied data; managed code; user supplied code
Annotations:
● VL ID’d and persisted
● output data
● cited using PROMS-O format
● soon to be VL ID’d and persisted, with minimal metadata recorded too
● SSSC ID’d and persisted
● perhaps SSSC ID’d and persisted, perhaps VL managed
● soon to be VL ID’d and persisted, if required, perhaps with time limits
23
Data Management
Virtual Labs Service Citation Example
[{ref}] {service title} {service endpoint URI} {query} {time queried} {cached copy ID}
[1] “Subset of elevation” http://pid.csiro.au/service/anuga-thredds “bussleton.nc?var=elevation&spatial=bb&north=-33.06495205829679&south=-33.551573283840156&west=114.84967874597227&east=115.70661233971667&temporal=all&time_start=&time_end=&horizStride” “2014-12-15T13:15:11” http://pid.csiro.au/dataset/abcd1234
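The citation template above can be sketched as a small formatter. The field order follows the slide's template and the values are copied from its example; the `ServiceCitation` class, its attribute names and the choice of straight quotes around the title, query and time are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServiceCitation:
    """Hypothetical container for the six fields of the citation template."""
    ref: int
    service_title: str
    service_endpoint_uri: str
    query: str
    time_queried: str
    cached_copy_id: str

    def format(self):
        # [{ref}] {service title} {service endpoint URI} {query}
        # {time queried} {cached copy ID}
        return (
            f'[{self.ref}] "{self.service_title}" {self.service_endpoint_uri} '
            f'"{self.query}" "{self.time_queried}" {self.cached_copy_id}'
        )


# Values taken from the example on the slide; the query string is passed
# through unchanged rather than rebuilt from parameters.
citation = ServiceCitation(
    ref=1,
    service_title="Subset of elevation",
    service_endpoint_uri="http://pid.csiro.au/service/anuga-thredds",
    query=(
        "bussleton.nc?var=elevation&spatial=bb"
        "&north=-33.06495205829679&south=-33.551573283840156"
        "&west=114.84967874597227&east=115.70661233971667"
        "&temporal=all&time_start=&time_end=&horizStride"
    ),
    time_queried="2014-12-15T13:15:11",
    cached_copy_id="http://pid.csiro.au/dataset/abcd1234",
)
print(citation.format())
```

Keeping the query as an opaque string avoids re-encoding it, so the cited request stays exactly what was sent to the service.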