DataNet Federation Consortium
  Reagan W. Moore (UNC-CH, PI)
  Arcot Rajasekar (UNC-CH, co-PI)
  Jonathan Goodall (USC, co-PI)
  William Regli (Drexel, co-PI)
  John Orcutt (UCSD, co-PI)
  Stan Ahalt (RENCI)
  Mary Whitton (UNC-CH, Project Manager)
  Mike Wan (UCSD)
  Wayne Schroeder (UCSD)
  Sheau-Yen Chen (UCSD)
  Lisa Stillwell (RENCI)
  Helen Tibbo (UNC-CH)
  Cal Lee (UNC-CH)
  Jewel Ward (UNC-CH)
  Ken Galluppi (ASU)
  Isaac Simons (Drexel University)
  Mirza Billah (University of South Carolina)
DataNet Federation Consortium: Data Driven Science
  Implement national data infrastructure
  Federate existing discipline-specific data management systems to enable national research collaborations
  Enable collaborative research on shared data collections
  Manage collection life cycle as the user community broadens
  Integrate “live” research data into education initiatives
  Enable student research participation through control policies
  Collection Life Cycle: Project, Shared Collection, Processing Pipeline, Digital Library, Reference Collection, Federation
  Cyber-infrastructure Partners:
    Univ. of North Carolina, Chapel Hill
    Univ. of California, San Diego
    University of South Carolina
    Drexel University
    Arizona State University
    Duke University
    University of Arizona
  Science and Engineering Initiatives:
    Ocean Observatories Initiative
    Hydrology - CUAHSI, EarthCube
    Engineering - CIBER-U digital library
    the iPlant Collaborative
    Odum Social Science Research Institute
    Temporal Dynamics of Learning Center
  Policy-based data management
  National Science Foundation Cooperative Agreement: OCI-0940841
DFC Organizational Structure
  Vice Chancellor of Research, UNC-CH: Barbara Entwisle
  PI, Reagan Moore, and Executive Committee
  External Advisory Board
  Community of Practice Expertise Boards
  Project Manager: Mary Whitton
  Steering Committee
    Facilities & Operations: Stan Ahalt, Lisa Stillwell, Sheau-Yen Chen
    Institutions and Sustainability: Richard Marciano
    Science and Engineering: William Regli
      OOI: John Orcutt
      CIBER-U: William Regli
      Hydrology: Ken Galluppi
      TDLC: Andrea Chiba
      iPlant: Sudha Ram
      Odum: Jonathan Crabtree
    Technology and Research: Arcot Rajasekar, Wayne Schroeder, Mike Wan
    Outreach & Education: Marilyn Lombardi, Julian Lombardi
    Policies and Standards: Helen Tibbo, Cal Lee, Jewel Ward
Build National Infrastructure Through Federation
  Ocean Observatories Initiative, National Climatic Data Center
    Data grid for oceanography, sensor control, real-time data streams, archive
  CUAHSI, UNC Institute for the Environment, National Climatic Data Center
    Data grid for hydrology, watershed modeling workflow integration
  CIBER-U (Engineering design, undergraduate education)
    Digital Library, OOI sensor documents
  Years 3-5
    the iPlant Collaborative
      Data grid for plant biology, federation with existing biology resources
    Odum Social Science Research Institute
      DataVerse federation, data archive
    Temporal Dynamics of Learning Center
      Data grid for cognitive science
Enabling Tools
  Data grid
    Build shared name spaces for users, files, resources, metadata, rules, procedures
  Soft links
    Register data from external data management system, accessed through its protocol
  Federated data grids
    Cross-register users between data management systems
  Workflow integration
    Register workflows into data grid for storage side procedures
    Integrate data management workflows with external workflows
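The distinction between data held in the grid and data registered through a soft link can be illustrated with a small sketch. This is not the iRODS API: the class names (LogicalNamespace, LocalReplica, SoftLink), the example paths, and the example URL are all hypothetical, chosen only to show one logical name space covering both kinds of entry.

```python
from dataclasses import dataclass
from typing import Dict, Tuple, Union


@dataclass(frozen=True)
class LocalReplica:
    """Data stored on a resource managed by the data grid (hypothetical model)."""
    resource: str
    physical_path: str


@dataclass(frozen=True)
class SoftLink:
    """Data left in an external system and reached through that system's protocol."""
    protocol: str
    location: str


Target = Union[LocalReplica, SoftLink]


class LogicalNamespace:
    """Maps logical names to where the data actually lives."""

    def __init__(self) -> None:
        self._entries: Dict[str, Target] = {}

    def register(self, logical_path: str, target: Target) -> None:
        if logical_path in self._entries:
            raise ValueError(f"{logical_path} is already registered")
        self._entries[logical_path] = target

    def resolve(self, logical_path: str) -> Tuple[str, str]:
        """Return (access method, location) for a logical name."""
        target = self._entries[logical_path]
        if isinstance(target, SoftLink):
            return (target.protocol, target.location)
        return ("local", f"{target.resource}:{target.physical_path}")


namespace = LogicalNamespace()
namespace.register("/demo/home/alice/run1.csv",
                   LocalReplica("demoResc", "/vault/alice/run1.csv"))
namespace.register("/demo/external/gauge.xml",
                   SoftLink("https", "https://example.org/data/gauge.xml"))
```

Users see only the logical names; whether a name resolves to a grid resource or to an external system is a property of the catalog entry.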
Policy-based Data Management
  [Figure labels: Researchers - Client; Data Grid with iRODS controlled workflows (shown twice); Shared Collection; Storage (shown four times)]
  Consensus on Policies and Procedures controls the shared data within the federation
Extensibility Operations on Name Spaces
Community-Based Collection Life Cycle
  Each life cycle stage re-purposes the original collection

  Stage                      Collection   Policy
  Project Collection         Private      Local Policy
  Data Grid                  Shared       Distribution Policy
  Data Processing Pipeline   Analyzed     Service Policy
  Digital Library            Published    Description Policy
  Reference Collection       Preserved    Representation Policy
  Federation                 Sustained    Re-purposing Policy

  Stages correspond to addition of new policies to support a broader community
  The evolution of policies quantifies how impact is broadened
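The idea that each stage adds a policy on top of the earlier ones can be sketched in a few lines. The stage, status, and policy names come from the slide; the Stage class and the policies_in_force helper are hypothetical, and the sketch assumes policies simply accumulate in slide order.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    name: str      # life cycle stage
    status: str    # state of the collection at that stage
    policy: str    # policy added when the stage is reached


# Ordered as on the slide, from the narrowest to the broadest community.
LIFE_CYCLE = (
    Stage("Project Collection", "Private", "Local Policy"),
    Stage("Data Grid", "Shared", "Distribution Policy"),
    Stage("Data Processing Pipeline", "Analyzed", "Service Policy"),
    Stage("Digital Library", "Published", "Description Policy"),
    Stage("Reference Collection", "Preserved", "Representation Policy"),
    Stage("Federation", "Sustained", "Re-purposing Policy"),
)


def policies_in_force(stage_name: str) -> list[str]:
    """Return the policies accumulated up to and including the named stage."""
    collected: list[str] = []
    for stage in LIFE_CYCLE:
        collected.append(stage.policy)
        if stage.name == stage_name:
            return collected
    raise ValueError(f"unknown life cycle stage: {stage_name}")
```

Under this assumption, the length of the returned list is one crude measure of how far a collection has been broadened.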
Accomplishments
  Installed three data grids: OOI, Drexel engineering, USC Hydrology
  Installed Federation hub at RENCI
    Based on version 3.1 of iRODS data grid
    Federated with EUDAT, NCDC
  Created engineering digital library
    Integration of MediaWiki with iRODS
  Automated hydrology workflows
  Established collaborations with NCDC, NCCS, EarthCube

DataNet Federation Communication Ports
  [Figure: zones, their ports, and their storage resources]
  dfcmain Zone, port 1237: federates with 4 zones, 2 resources
  ooi Zone, port 1247: 4 resources
  hydrology Zone, port 2823: 2 resources
  renci Zone, port 1247: > 10 resources
  engineering Zone (edge.cs.drexel), port 1247: 1 resource
  Resource labels in the figure: ooi-ucsdResc1, ooi-osuResc1, ooi-icatResc1, ooi-cgResc1, usc-resource, res-dfcmain, renci-vault19, renci-vault2, renci-vault1, res-bk15, loadingResc, europa-vault1, resource group edge
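The zone and port layout in the figure can be captured as a small table, for example to work out which ports the hub must be able to reach. This is a minimal sketch: zone names, ports, and resource counts are taken from the slide, while the helper names and the assumption that the four other zones are exactly the hub's federation peers are illustrative.

```python
# Zone name -> port and resource count, as read from the slide.
ZONES = {
    "dfcmain": {"port": 1237, "resources": 2},
    "ooi": {"port": 1247, "resources": 4},
    "hydrology": {"port": 2823, "resources": 2},
    "renci": {"port": 1247, "resources": None},  # slide says "> 10"
    "engineering": {"port": 1247, "resources": 1},
}

HUB = "dfcmain"  # federation hub at RENCI


def peers(hub: str, zones: dict) -> list:
    """Zones other than the hub (assumed here to be its federation peers)."""
    return sorted(name for name in zones if name != hub)


def peer_ports(hub: str, zones: dict) -> list:
    """Distinct ports the hub needs to reach on its peer zones."""
    return sorted({info["port"] for name, info in zones.items() if name != hub})
```

With the values above, the hub has four peers, consistent with the "federates with 4 zones" label, and needs outbound access to two distinct ports.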
iRODS Integration in MediaWiki Date: July 10th, 2012
New features – iRODS wikipage
  Any MediaWiki page that is added or edited from now on is synchronized with iRODS (a copy of the page is stored on the iRODS server)
  To check whether a page is synchronized with iRODS, look at the bottom of the page, under “Irods Report”:
iRODS File Details
Hydrology Use Cases
  VIC model automation (USC)
  RHESSys model automation (UNC-CH, EarthCube)
  Sharing of workflows
  NCDC archiving of data from OOI
  SigClimate sustainability group (NCDC, NCCS)
Eco-Hydrology
  RHESSys workflow to develop a nested watershed parameter file (worldfile) containing a nested ecogeomorphic object framework and the full initial system state.
  [Figure: workflow diagram]
    Steps: Choose gauge or outlet (HIS); Extract drainage area (NHDPlus)
    Data layers: Digital Elevation Model (DEM), Slope, Aspect; Streams (NHD); Roads (DOT); Land Use (NLCD, EPA); Leaf Area Index (Landsat TM); Phenology (MODIS); Soil Data (USDA); Soil and vegetation parameter files
    Nested watershed structure: Basin, Hillslope, Patch, Strata
    Products: Stream network, Flowtable, Worldfile
    Model: RHESSys
Workflow Management
  Workflow file
    eCWkflow.mss
  Directory holding all input and output files associated with the workflow file (a mounted collection that is linked to the workflow file)
    /earthCube/eCWkflow
  Input parameter files, each listing parameters and input and output file names
    eCWkflow.mpf
    eCWkflow2.mpf
  Automatically generated run file for executing each input file
  Directories holding all output files generated for an invocation; the version number is incremented
    /earthCube/eCWkflow/eCWkflow.runDir0
      Outfile (output file created for eCWkflow.mpf)
    /earthCube/eCWkflow/eCWkflow2.runDir0
      Newfile (output file created for eCWkflow2.mpf)
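The run-directory naming shown on the slide (parameter file eCWkflow.mpf producing eCWkflow.runDir0, with the number incremented for later runs) can be sketched as plain path logic. This is not iRODS code: next_run_dir is a hypothetical helper, and it assumes the next version is one more than the highest existing version for the same parameter file.

```python
import re
from typing import Iterable


def next_run_dir(workflow_dir: str, param_file: str, existing: Iterable[str]) -> str:
    """Return the path of the next run directory for a parameter file.

    workflow_dir: mounted collection, e.g. "/earthCube/eCWkflow"
    param_file:   parameter file name, e.g. "eCWkflow.mpf"
    existing:     paths of run directories already present
    """
    base = param_file.rsplit(".", 1)[0]  # "eCWkflow.mpf" -> "eCWkflow"
    pattern = re.compile(re.escape(base) + r"\.runDir(\d+)")

    versions = []
    for path in existing:
        # Match on the last path component only, so "eCWkflow2.runDir0"
        # is never mistaken for a run of "eCWkflow".
        match = pattern.fullmatch(path.rstrip("/").rsplit("/", 1)[-1])
        if match:
            versions.append(int(match.group(1)))

    version = max(versions) + 1 if versions else 0
    return f"{workflow_dir}/{base}.runDir{version}"
```

For example, with eCWkflow.runDir0 and eCWkflow2.runDir0 already present, a new run of eCWkflow.mpf would be placed in eCWkflow.runDir1.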
Workflow Re-execution & Sharing
  [Figure: eCWkflow.mss, with imcoll (shown twice, followed by “….”) pointing to the collections below]

  /earthCube/eCWkflow
    eCWkflow.mpf
    /earthCube/eCWkflow/eCWkflow.runDir0
      Outfile
    /earthCube/eCWkflow/eCWkflow.runDir1
      Outfile

  /hydrology/myWkflow
    myWkflow.mpf
    /hydrology/myWkflow/myWkflow.runDir0
      Outfile
    /hydrology/myWkflow/myWkflow.runDir1
      Outfile
Re-use Architecture Components
  After generating results within a collaboration environment
    Apply appropriate policies and procedures to publish the results as a digital library
    Register a Community Resource that can be used by subsequent research initiatives
  [Figure labels: Research Environment {9} Portals, Applications {5}, Workflows {2}; Collaboration Environment – Data Grid {9}; Protocols {0}; Web Services {6}; Brokers {7}; Community Resource]
Education
  Develop policies and procedures to make “live” collections accessible by students
  Support classification, categorization, feature detection algorithms
  Integrate with student digital libraries
    UNC-CH School of Information and Library Science LifeTime Library
    Students build their own personal reference collection
LifeTime Library (SILS)
  Student digital libraries
  Enable students to build collections of
    Photographs
    MP3 audio files
    Video
    Class documents
    Web site archive
  Resources provided by School of Information and Library Science
  Student collections range from 2 GBytes to 150 GBytes
  Number of files ranges from 2,000 to 12,000
LifeTime Library Policies
  Integrity
    Replication
    Checksums
    Versioning
  Management
    Strict access controls
    Quotas
    Metadata catalog replication
    Installation environment archiving
  Ingestion
    Automated synchronization of student directory with LifeTime Library
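The integrity policies (replication plus checksums) can be illustrated with a local-filesystem sketch. It does not use iRODS and is not how the LifeTime Library enforces these policies: the function names are hypothetical, SHA-256 is an assumed choice of checksum, and two ordinary directories stand in for a collection and its replica.

```python
import hashlib
from pathlib import Path
from typing import Dict


def sha256_of(path: Path) -> str:
    """Checksum a file in 1 MiB chunks so large files are not read into memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_replica(source_dir: Path, replica_dir: Path) -> Dict[str, str]:
    """Compare every file under source_dir with its copy under replica_dir.

    Returns a mapping from relative path to the problem found;
    an empty mapping means the replica matches the source.
    """
    problems: Dict[str, str] = {}
    for src in sorted(p for p in source_dir.rglob("*") if p.is_file()):
        relative = src.relative_to(source_dir)
        copy = replica_dir / relative
        if not copy.is_file():
            problems[relative.as_posix()] = "missing"
        elif sha256_of(copy) != sha256_of(src):
            problems[relative.as_posix()] = "checksum mismatch"
    return problems
```

A periodic job could run a check of this kind and re-replicate any file that is reported, which is the usual way replication and checksums are combined.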