EarthCube Layered Architecture Concept Award Interoperability Mechanisms
Layered Architecture Concept Award
– Reagan Moore (UNC-CH/DICE): Collaboration environments
– Ilkay Altintas (UCSD): Workflows
– David Arctur (OGC): Web services
– Lawrence Band (UNC-CH/IE): Eco-hydrology modeling
– Liping Di (GMU): Geospatial knowledge building
– Janet Fredericks (WHOI): Data quality
– Jeff Horsburgh (Utah State University): CUAHSI / DataONE
– Yong Liu (UIUC / NCSA): Workflows / Cyberintegrator
– Chris MacDermaid (Colorado State Univ.): Physics model frameworks
– Brian Miles (UNC-CH/IE): Eco-hydrology workflows
– Michael Schoffner (RENCI): Web service integration
– Antoine de Torcy (UNC-CH/DICE): Workflow integration
– Weiguo Han (GMU): GeoBrain
Loosely Coupled – Layered Architecture (EarthCube Infrastructure)
[Diagram: stacked layers connected by protocols and web services]
– Research Environment: applications, workflows
– Collaboration Environment: data grids, portals
– Brokers / messaging / structured object manipulation
– Community resources, policies
Interoperability Mechanisms
For interactions with a collaboration environment:
– Register a remote file into the collaboration space: collection of links (soft link)
– Operations on a remote file: THREDDS, OPeNDAP, NetCDF, HDF5, FITS, … (POSIX I/O extensions)
– Access (get, put) a remote file: web service invoking remote protocol (micro-services)
– Asynchronously post and read messages: message queue forwarding, AMQP (queuing)
– Operations on aggregations of files: operations associated with a collection (POSIX I/O extensions)
– Operations on aggregations of procedures: workflows and structured information exchange (rule exchange)
– Policy enforcement: policy-encoded objects (policy exchange)
Use Case Collaborations
– Register a remote file into the collaboration space: DataNet Federation Consortium data grid (DFC)
– Operations on a remote file: THREDDS, OPeNDAP, NetCDF, HDF5, storage drivers for OOI
– Access (get, put) a remote file: DataONE, CUAHSI, (OGC, Data Conservancy)
– Asynchronously post and read messages: SEAD - VIVO
– Operations on aggregations of files: OOI time series archive
– Operations on aggregations of procedures: Kepler (Gulf of Mexico hypoxia), NCSA Cyberintegrator (Texas drought)
– Policy enforcement: Research Data Alliance policy sharing
Use Cases
– Demonstrate reproducible science. A use case could include the registration, storage, sharing, and re-execution of a workflow. The hypoxia use case from the Cross-Domain and Brokering Concept groups could be used as an example.
– Automate data retrieval. A use case could demonstrate remote access to a data collection, retrieval of desired data sets, transformation, and use in an analysis workflow. An eco-hydrology example that automates access to digital elevation maps and land use coverage is being built.
– Integrate community resources with collaboration environments. An example would be use of the DAB protocol to identify and cache local copies of relevant data sets for local analysis.
– Integrate multiple community resources. A use case could demonstrate the invocation of multiple workflow systems within the same analysis. An example is the integration of the Cyberintegrator workflow with collaboration environments to support drought prediction.
Eco-Hydrology
[Figure: RHESSys workflow to develop a nested watershed parameter file (worldfile) containing a nested ecogeomorphic object framework and the full initial system state.]
Diagram labels:
– Starting point: choose gauge or outlet (HIS); extract drainage area (NHDPlus)
– Terrain and networks: Digital Elevation Model (DEM), slope, aspect, streams (NHD), roads (DOT)
– Nested watershed structure: basin, hillslope, patch, strata, stream network
– Land and vegetation inputs: land use, leaf area index, phenology, soil data
– Data sources: NLCD (EPA), Landsat TM, MODIS, USDA
– Products passed to RHESSys: worldfile, flowtable, soil and vegetation parameter files
iRODS Rule for RHESSys
Modular workflow composed by chaining basic transformations:
– Define input variables
– Call functions to apply each transformation step
– Store results in shared collection

main {
    getExtentForGageReachcode(*gageReachcode, *extentInNHD_Vect_Coords);
    convertExtentToNHD_DEM(*extentInNHD_Vect_Coords, *extentInNHD_DEM_Coords);
    extractTileFromNHD_DEM(trimr(*extentInNHD_DEM_Coords, "\n"));
    importDEMTileIntoNewGRASSLocationAsUTM(*extentInNHD_Vect_Coords, *newLocPhysPath, *newLocObjPath);
    delineateWatershedForNHDGage(*nhdStreamGageID, *newLocPhysPath, *newLocObjPath);
}
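The chaining pattern in the rule above can be illustrated outside iRODS. The following is a minimal Python sketch, not the RHESSys implementation: `run_workflow`, `get_extent`, and `extract_tile` are hypothetical names, and the two steps return made-up values purely to show how each transformation reads earlier results and adds its own to a shared state.

```python
from typing import Any, Callable, Dict, List

# A step reads the shared state and returns the new entries it produced.
Step = Callable[[Dict[str, Any]], Dict[str, Any]]


def run_workflow(inputs: Dict[str, Any], steps: List[Step]) -> Dict[str, Any]:
    """Apply each transformation step in order, accumulating results."""
    state = dict(inputs)  # copy so the caller's inputs are not mutated
    for step in steps:
        state.update(step(state))
    return state


# Hypothetical stand-ins for two of the rule's transformation steps.
def get_extent(state: Dict[str, Any]) -> Dict[str, Any]:
    # A real step would look up the extent for state["gage_reachcode"].
    return {"extent": (0.0, 10.0, 10.0, 0.0)}


def extract_tile(state: Dict[str, Any]) -> Dict[str, Any]:
    ulx, uly, lrx, lry = state["extent"]
    return {"tile": f"SUBSET-{ulx}-{uly}-{lrx}-{lry}.img"}


result = run_workflow({"gage_reachcode": "hypothetical-code"},
                      [get_extent, extract_tile])
print(result["tile"])
```

Keeping every step behind the same signature is what makes the workflow modular: steps can be reordered, replaced, or shared without changing the driver.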
extractTileFromNHD_DEM(*extentCoords) {
    # Split path to object into collection and name
    msiSplitPath(*nhdDEMObjPath, *nhdDEMObjColl, *nhdDEMObjName);
    writeLine("serverLog", *nhdDEMObjColl);
    writeLine("serverLog", *nhdDEMObjName);

    # Build query to discover physical path
    msiAddSelectFieldToGenQuery("DATA_PATH", "null", *genQInp);
    msiAddConditionToGenQuery("DATA_NAME", "=", *nhdDEMObjName, *genQInp);
    msiAddConditionToGenQuery("COLL_NAME", "=", *nhdDEMObjColl, *genQInp);
    msiAddConditionToGenQuery("DATA_RESC_NAME", "=", *rescName, *genQInp);

    # Run query
    msiExecGenQuery(*genQInp, *genQOut);

    # Extract path from query result
    foreach (*genQOut) {
        msiGetValByKey(*genQOut, "DATA_PATH", *filePath);
    }
    writeLine("serverLog", *filePath);

    # Determine physical path of input directory
    msiSplitPath(*filePath, *inFileDir, *headerFileIgnore);

    # Generate physical path of output file
    msiSplitPath(*inFileDir, *inFileParentDir, *rasterDatasetName);
    *tileFileName = "SUBSET-"++*rasterDatasetName++".img";
    *tileFilePath = *inFileParentDir++"/"++*tileFileName;

    # Generate iRODS path of output
    msiSplitPath(*nhdDEMObjColl, *nhdDEMObjCollParent, *junk);
    *tileObjPath = *nhdDEMObjCollParent++"/"++*tileFileName;

    # Extract the tile with gdal_translate (HFA output, window given by *extentCoords)
    *args = "-of HFA -projwin "++*extentCoords++" "++"'*inFileDir'"++" "++"'*tileFilePath'";
    writeLine("serverLog", *args);
    msiExecCmd("gdal_translate", *args, "", "null", "null", *cmd_out);
    writeLine("serverLog", *cmd_out);

    # Register tile file with iRODS
    msiPhyPathReg(*tileObjPath, *rescName, *tileFilePath, "null", *status);
}
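The path handling and `gdal_translate` argument construction in the rule above can be sketched in Python for readers who want to check the naming logic on its own. This is an illustration under assumptions: `build_gdal_translate_args` is a hypothetical helper, the example directory is made up, and the command is only assembled, not executed.

```python
from pathlib import PurePosixPath
from typing import List, Sequence


def build_gdal_translate_args(extent: Sequence[float],
                              in_file_dir: str) -> List[str]:
    """Assemble a gdal_translate command that subsets a raster to `extent`.

    `extent` is (ulx, uly, lrx, lry), the order -projwin expects.
    The tile is written next to the input directory as SUBSET-<name>.img,
    mirroring the naming used in the rule.
    """
    src = PurePosixPath(in_file_dir)
    tile_path = src.parent / f"SUBSET-{src.name}.img"
    ulx, uly, lrx, lry = extent
    return [
        "gdal_translate",
        "-of", "HFA",            # Erdas Imagine output format (.img)
        "-projwin", str(ulx), str(uly), str(lrx), str(lry),
        str(src),
        str(tile_path),
    ]


# Hypothetical input directory, used only to show the resulting arguments.
args = build_gdal_translate_args((1.0, 4.0, 3.0, 2.0), "/data/nhd/dem_grid")
print(" ".join(args))
```

Building the command as a list rather than a single string means it can be handed to `subprocess.run(args)` without the manual quoting the rule needs for paths.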
Event-Driven Real-Time Drought Analysis/Prediction Workflow
[Diagram labels: Data Grid (Collaboration Environment); NCSA Cyberintegrator; RAPID (river routing model); NASA NLDAS-2; other data sources; steps: invoke, monitor, output, store, visualization]
Management of Workflows
Workflow components:
– File containing input parameters, input file names, output file names
– Input files
– File containing workflow language
– Output files
Each invocation of the workflow generates a versioned instance:
– Compare results across input file versions
– Share workflows
– Re-execute workflows
Input parameters are automatically associated with each workflow invocation and with the resulting output files.
Workflow Management
– eCWkflow.mss: workflow file
– /earthCube/eCWkflow: directory holding all input and output files associated with the workflow file (a mounted collection that is linked to the workflow file)
– eCWkflow.mpf: input parameter file; lists parameters and input and output file names
– Run file: automatically generated for executing each input file
– /earthCube/eCWkflow/eCWkflow.runDir0: directory holding all output files generated for an invocation; the version number is incremented with each new invocation
– Outfile: output file created for eCWkflow.mpf
– eCWkflow2.mpf: input parameter file with its own run directory, /earthCube/eCWkflow/eCWkflow2.runDir0
– Newfile: output file created for eCWkflow2.mpf
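The run-directory naming described above (one `<name>.runDirN` per invocation of a parameter file, with N incremented each time) can be sketched on a local file system. This is an illustrative Python stand-in, not how iRODS implements mounted workflow collections; `next_run_dir` is a hypothetical helper and a temporary directory stands in for /earthCube/eCWkflow.

```python
import re
import tempfile
from pathlib import Path


def next_run_dir(workflow_dir: Path, param_file: str) -> Path:
    """Create and return the next versioned run directory for a parameter file."""
    stem = Path(param_file).stem  # "eCWkflow.mpf" -> "eCWkflow"
    pattern = re.compile(re.escape(stem) + r"\.runDir(\d+)$")
    versions = []
    for entry in workflow_dir.iterdir():
        match = pattern.match(entry.name)
        if entry.is_dir() and match:
            versions.append(int(match.group(1)))
    # First invocation gets runDir0; later ones increment the highest version.
    version = max(versions) + 1 if versions else 0
    run_dir = workflow_dir / f"{stem}.runDir{version}"
    run_dir.mkdir()
    return run_dir


with tempfile.TemporaryDirectory() as tmp:
    workflow_dir = Path(tmp) / "eCWkflow"
    workflow_dir.mkdir()
    first = next_run_dir(workflow_dir, "eCWkflow.mpf")
    second = next_run_dir(workflow_dir, "eCWkflow.mpf")   # re-execution
    other = next_run_dir(workflow_dir, "eCWkflow2.mpf")   # second parameter file
    names = [first.name, second.name, other.name]

print(names)
```

Because each parameter file has its own version sequence, outputs from different inputs never overwrite one another and earlier runs stay available for comparison.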
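Workflow Re-execution & Sharing
[Diagram: an original workflow and a shared copy, each shown with its run directories]
– Original workflow: eCWkflow.mss, /earthCube/eCWkflow, eCWkflow.mpf, /earthCube/eCWkflow/eCWkflow.runDir0 (Outfile)
– Shared workflow: /hydrology/myWkflow, myWkflow.mpf, /hydrology/myWkflow/myWkflow.runDir0 (Outfile)
– ….
– imcoll
– Re-executed runs: /earthCube/eCWkflow/eCWkflow.runDir1 (Outfile), /hydrology/myWkflow/myWkflow.runDir1 (Outfile)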
DFC + DataONE Interoperability
Goal: support interoperability between a DFC data grid and DataONE
Task: retrieve a file from DataONE, load it into a DFC collaboration environment, and add metadata
How It Works
1. Query DataONE Coordinating Nodes with a SOLR query
2. Create an iRODS collection with the same name as the query
3. Get the list of identifiers for metadata files from the search
4. Download the metadata file for each identifier
5. Store the metadata file in the DFC data grid
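The five steps above can be sketched in Python. This is a rough illustration under stated assumptions, not the demo's code: the base URL and the `/query/solr/` and `/object/{id}` paths are assumed Coordinating Node REST endpoints, `harvest` and `solr_query_url` are hypothetical helpers, a plain dictionary stands in for the iRODS collection, and the network call is injected so the example runs offline with a fake response.

```python
import json
from typing import Callable, Dict, List
from urllib.parse import quote, urlencode

CN_BASE = "https://cn.dataone.org/cn/v2"  # assumed Coordinating Node base URL


def solr_query_url(term: str, base: str = CN_BASE, rows: int = 10) -> str:
    """Step 1: build the SOLR query URL (endpoint path is an assumption)."""
    params = {"q": term, "fl": "identifier", "rows": rows, "wt": "json"}
    return f"{base}/query/solr/?{urlencode(params)}"


def harvest(term: str,
            fetch: Callable[[str], str],
            grid: Dict[str, Dict[str, str]]) -> List[str]:
    """Run steps 1-5 using `fetch` for HTTP GETs and `grid` as the data grid."""
    response = json.loads(fetch(solr_query_url(term)))                 # step 1
    collection = grid.setdefault(term, {})                             # step 2
    identifiers = [doc["identifier"]
                   for doc in response["response"]["docs"]]            # step 3
    for pid in identifiers:
        metadata = fetch(f"{CN_BASE}/object/{quote(pid, safe='')}")    # step 4
        collection[pid] = metadata                                     # step 5
    return identifiers


def fake_fetch(url: str) -> str:
    """Offline stand-in for an HTTP GET; identifiers are made up."""
    if "/query/solr/" in url:
        docs = [{"identifier": "hypothetical-id-1"},
                {"identifier": "hypothetical-id-2"}]
        return json.dumps({"response": {"docs": docs}})
    return f"<metadata source='{url}'/>"


grid: Dict[str, Dict[str, str]] = {}
ids = harvest("rain", fake_fetch, grid)
print(ids, sorted(grid["rain"]))
```

In a real deployment `fetch` would wrap an HTTP client and the dictionary writes would become iRODS put and metadata operations; injecting both keeps the five-step logic testable on its own.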
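What the Demo Shows
[Diagram: numbered steps 1 to 4 between the data grid and DataONE, connected through REST APIs]
Diagram labels, in order of appearance:
– Collection “rain”
– Query: “rain”
– Matching identifier list
– Get metadata file for each identifier
– File goes into collection
– Mercury Web portal file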
– EarthCube Layered Architecture: NSF EAGER
– DataNet Federation Consortium: NSF OCI
– iRODS policy-based data management: NSF SDCI