DataNet Federation Consortium

Presentation transcript:

DataNet Federation Consortium
Reagan W. Moore (UNC-CH, PI)
Arcot Rajasekar (UNC-CH, co-PI)
Jonathan Goodall (USC, co-PI)
William Regli (Drexel, co-PI)
John Orcutt (UCSD, co-PI)
Stan Ahalt (RENCI)
Mary Whitton (UNC-CH, Project Manager)
Mike Wan (UCSD)
Wayne Schroeder (UCSD)
Sheau-Yen Chen (UCSD)
Lisa Stillwell (RENCI)
Helen Tibbo (UNC-CH)
Cal Lee (UNC-CH)
Jewel Ward (UNC-CH)
Ken Galluppi (ASU)
Isaac Simons (Drexel University)
Mirza Billah (University of South Carolina)

DataNet Federation Consortium
Data Driven Science
  Implement national data infrastructure
  Federate existing discipline-specific data management systems to enable national research collaborations
  Enable collaborative research on shared data collections
  Manage collection life cycle as the user community broadens
  Integrate “live” research data into education initiatives
  Enable student research participation through control policies
Collection Life Cycle
  Project, Shared Collection, Processing Pipeline, Digital Library, Reference Collection, Federation
Cyber-infrastructure Partners:
  Univ. of North Carolina, Chapel Hill
  Univ. of California, San Diego
  University of South Carolina
  Drexel University
  Arizona State University
  Duke University
  University of Arizona
Science and Engineering Initiatives:
  Ocean Observatories Initiative
  Hydrology - CUAHSI, EarthCube
  Engineering - CIBER-U digital library
  the iPlant Collaborative
  Odum Social Science Research Institute
  Temporal Dynamics of Learning Center
Policy-based data management
National Science Foundation Cooperative Agreement: OCI-0940841

DFC Organizational Structure
Vice Chancellor of Research, UNC-CH: Barbara Entwisle
PI, Reagan Moore, and Executive Committee
External Advisory Board
Community of Practice Expertise Boards
Project Manager: Mary Whitton
Steering Committee
Facilities & Operations: Stan Ahalt, Lisa Stillwell, Sheau-Yen Chen
Institutions and Sustainability: Richard Marciano
Science and Engineering: William Regli
  OOI: John Orcutt
  CIBER-U: William Regli
  Hydrology: Ken Galluppi
Technology and Research: Arcot Rajasekar, Wayne Schroeder, Mike Wan
Outreach & Education: Marilyn Lombardi, Julian Lombardi
  TDLC: Andrea Chiba
  iPlant: Sudha Ram
  Odum: Jonathan Crabtree
Policies and Standards: Helen Tibbo, Cal Lee, Jewel Ward

Build National Infrastructure Through Federation
Ocean Observatories Initiative, National Climatic Data Center
  Data grid for oceanography, sensor control, real-time data streams, archive
CUAHSI, UNC Institute for the Environment, National Climatic Data Center
  Data grid for hydrology, watershed modeling workflow integration
CIBER-U (Engineering design, undergraduate education)
  Digital Library, OOI sensor documents
Years 3-5
the iPlant Collaborative
  Data grid for plant biology, federation with existing biology resources
Odum Social Science Research Institute
  DataVerse federation, data archive
Temporal Dynamics of Learning Center
  Data grid for cognitive science

Enabling Tools
Data grid
  Build shared name spaces for users, files, resources, metadata, rules, procedures
Soft links
  Register data from an external data management system, accessed through its protocol
Federated data grids
  Cross-register users between data management systems
Workflow integration
  Register workflows into the data grid for storage-side procedures
  Integrate data management workflows with external workflows

Policy-based Data Management
Diagram labels: Researchers (Client); Data Grid, iRODS controlled workflows (two instances); Shared Collection; Storage (four instances)
Consensus on Policies and Procedures controls the shared data within the federation

Extensibility Operations on Name Spaces

Community-Based Collection Life Cycle
Each life cycle stage re-purposes the original collection

Stage                      Collection   Policy
Project Collection         Private      Local Policy
Data Grid                  Shared       Distribution Policy
Data Processing Pipeline   Analyzed     Service Policy
Digital Library            Published    Description Policy
Reference Collection       Preserved    Representation Policy
Federation                 Sustained    Re-purposing Policy

Stages correspond to addition of new policies to support a broader community
The evolution of policies quantifies how impact is broadened
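The idea that each stage adds a policy on top of the earlier ones can be sketched in a few lines of Python. This is only an illustration: the stage, status, and policy names come from the slide, while the list layout, the function name `policies_in_force`, and the assumption that policies strictly accumulate from one stage to the next are not from the presentation.

```python
# Hypothetical model of the collection life cycle; the structure is illustrative only.
# Each entry is (stage, collection status, policy added at that stage).
LIFE_CYCLE = [
    ("Project Collection", "Private", "Local Policy"),
    ("Data Grid", "Shared", "Distribution Policy"),
    ("Data Processing Pipeline", "Analyzed", "Service Policy"),
    ("Digital Library", "Published", "Description Policy"),
    ("Reference Collection", "Preserved", "Representation Policy"),
    ("Federation", "Sustained", "Re-purposing Policy"),
]


def policies_in_force(stage):
    """Return the policies accumulated up to and including the given stage.

    Assumes (not stated on the slide in these terms) that a later stage
    keeps every policy introduced by the earlier stages.
    """
    names = [name for name, _, _ in LIFE_CYCLE]
    if stage not in names:
        raise ValueError(f"unknown stage: {stage}")
    idx = names.index(stage)
    return [policy for _, _, policy in LIFE_CYCLE[: idx + 1]]


print(policies_in_force("Data Processing Pipeline"))
# → ['Local Policy', 'Distribution Policy', 'Service Policy']
```

Counting the policies in force at a stage gives one rough reading of the slide's remark that the evolution of policies quantifies how impact is broadened.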

Accomplishments
Installed three data grids: OOI, Drexel engineering, USC hydrology
Installed Federation hub at RENCI
  Based on version 3.1 of iRODS data grid
  Federated with EUDAT, NCDC
Created engineering digital library
  Integration of MediaWiki with iRODS
Automated hydrology workflows
Established collaborations with NCDC, NCCS, EarthCube

DataNet Federation Communication Ports
ooi Zone: icat.oceanobservatories.org, port 1247
  4 resources at ucsd_irods.oceanobservatories.org, icat.oceanobservatories.org, cg_east_whoi.oceanobservatories.org, ooi.coas.oregonstate.edu
hydrology Zone: iren.renci.org, port 2823
  2 resources at ce-broad.ce.sc.edu, iren.renci.org
dfcmain Zone: iren.renci.org, port 1237
  Federates with 4 zones
  2 resources at iren.renci.org, srbbrick15.ucsd.edu
renci Zone: iren.renci.org, port 1247
  > 10 resources
engineering Zone: edge.cs.drexel, port 1247
  1 resource, edge.cs.drexel
Resource labels in the diagram: ooi-ucsdResc1, ooi-osuResc1, ooi-icatResc1, ooi-cgResc1, res-dfcmain, usc-resource, renci-vault19, renci-vault2, renci-vault1, res-bk15, loadingResc, europa-vault1, resource group edge
Connections in the diagram are labeled with ports 1237, 1247 and 2823
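Because one host (iren.renci.org) serves several zones on different ports, it can help to group the ports by host, for example when checking which ports must be reachable between sites. The sketch below is a minimal illustration: the zone names, hosts, and ports are transcribed from the slide, while the dictionary layout and the function name `ports_by_host` are assumptions made for this example and are not iRODS configuration syntax.

```python
# Zone data transcribed from the slide; the dictionary layout is an assumption
# made for illustration and does not mirror any iRODS configuration file.
ZONES = {
    "ooi": {"host": "icat.oceanobservatories.org", "port": 1247},
    "hydrology": {"host": "iren.renci.org", "port": 2823},
    "dfcmain": {"host": "iren.renci.org", "port": 1237},
    "renci": {"host": "iren.renci.org", "port": 1247},
    "engineering": {"host": "edge.cs.drexel", "port": 1247},
}


def ports_by_host(zones):
    """Group the zone ports by host name, with each port list sorted."""
    grouped = {}
    for zone in zones.values():
        grouped.setdefault(zone["host"], set()).add(zone["port"])
    return {host: sorted(ports) for host, ports in grouped.items()}


for host, ports in ports_by_host(ZONES).items():
    print(host, ports)
```

Under these assumptions, iren.renci.org maps to ports 1237, 1247 and 2823, one per zone hosted there.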

iRODS Integration in MediaWiki Date: July 10th, 2012

New features – iRODS wikipage
Any MediaWiki page that is added or edited from now on is synchronized with iRODS (a copy of the page is stored on the iRODS server).
You can tell whether a page is synchronized with iRODS by looking at the bottom of the page, under “Irods Report”.

iRODS File Details

Hydrology Use Cases
VIC model automation (USC)
RHESSys model automation (UNC-CH, EarthCube)
Sharing of workflows
NCDC archiving of data from OOI
SigClimate sustainability group (NCDC, NCCS)

Eco-Hydrology
RHESSys workflow to develop a nested watershed parameter file (worldfile) containing a nested ecogeomorphic object framework, and full, initial system state.
Workflow diagram, input steps and sources: Choose gauge or outlet (HIS); Extract drainage area (NHDPlus); Digital Elevation Model (DEM); Streams (NHD); Roads (DOT); Land Use NLCD (EPA); Leaf Area Index Landsat TM; Phenology MODIS; Soil Data USDA
Workflow diagram, derived products: Slope; Aspect; Nested watershed structure; Soil and vegetation parameter files; Strata; Patch; Hillslope; Basin; Stream network; Flowtable; Worldfile; RHESSys

Workflow Management
eCWkflow.mss
  Workflow file
/earthCube/eCWkflow
  Directory holding all input and output files associated with the workflow file (mounted collection that is linked to the workflow file)
eCWkflow.run, eCWkflow2.run
  Automatically generated run file for executing each input file
eCWkflow.mpf, eCWkflow2.mpf
  Input parameter file, lists parameters and input and output file names
/earthCube/eCWkflow/eCWkflow.runDir0
  Directory holding all output files generated for an invocation of eCWkflow.run; the version number is incremented
  Outfile: output file created for eCWkflow.mpf
/earthCube/eCWkflow/eCWkflow2.runDir0
  Newfile: output file created for eCWkflow2.mpf
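The output directory naming on this slide (a run file path followed by `Dir` and a version number that is incremented per invocation) can be illustrated with a small Python function. This is a sketch of the naming convention only: the function `next_run_dir` is hypothetical, it does not call iRODS, and the assumption that numbering starts at 0 and increases by one is inferred from the `runDir0` example.

```python
import re


def next_run_dir(run_file_path, existing_collections):
    """Return the next versioned output directory name for a run file.

    Assumes directories are named <run file path>Dir<N>, as in
    /earthCube/eCWkflow/eCWkflow.runDir0, and that N starts at 0.
    This only illustrates the naming; it does not talk to iRODS.
    """
    pattern = re.compile(re.escape(run_file_path) + r"Dir(\d+)$")
    versions = []
    for collection in existing_collections:
        match = pattern.match(collection)
        if match:
            versions.append(int(match.group(1)))
    next_version = max(versions) + 1 if versions else 0
    return f"{run_file_path}Dir{next_version}"


existing = [
    "/earthCube/eCWkflow/eCWkflow.runDir0",
    "/earthCube/eCWkflow/eCWkflow2.runDir0",
]
print(next_run_dir("/earthCube/eCWkflow/eCWkflow.run", existing))
# → /earthCube/eCWkflow/eCWkflow.runDir1
```

Directories belonging to a different run file (here `eCWkflow2.run`) are ignored, so each run file keeps its own version sequence.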

Workflow Re-execution & Sharing
eCWkflow.mss
  imcoll …. /earthCube/eCWkflow
  imcoll …. /hydrology/myWkflow
/earthCube/eCWkflow
  eCWkflow.run
  eCWkflow.mpf
  /earthCube/eCWkflow/eCWkflow.runDir0: Outfile
  /earthCube/eCWkflow/eCWkflow.runDir1: Outfile
/hydrology/myWkflow
  myWkflow.run
  myWkflow.mpf
  /hydrology/myWkflow/myWkflow.runDir0: Outfile
  /hydrology/myWkflow/myWkflow.runDir1: Outfile

Re-use Architecture Components
After generating results within a collaboration environment:
  Apply appropriate policies and procedures to publish the results as a digital library
  Register a Community Resource that can be used by subsequent research initiatives
Diagram labels: Research Environment {9} (Portals, Applications {5}, Workflows {2}); Collaboration Environment – Data Grid {9}; Protocols {0}; Web Services {6}; Brokers {7}; Community Resource

Education
Develop policies and procedures to make “live” collections accessible by students
  Support classification, categorization, feature detection algorithms
  Integrate with student digital libraries
UNC-CH School of Information and Library Science LifeTime Library
  Students build their own personal reference collection

Life-Time Library (SILS)
Student digital libraries
Enable students to build collections of:
  Photographs
  MP3 audio files
  Video
  Class documents
  Web site archive
Resources provided by the School of Information and Library Science
Student collections range from 2 GBytes to 150 GBytes
Number of files ranges from 2,000 to 12,000

LifeTime Library Policies
Integrity
  Replication
  Checksums
  Versioning
Management
  Strict access controls
  Quotas
  Metadata catalog replication
  Installation environment archiving
Ingestion
  Automated synchronization of student directory with LifeTime Library
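The integrity policies pair replication with checksums: a stored checksum lets a system detect when a replica no longer matches the original. The sketch below illustrates that idea with Python's standard `hashlib`. It is not the LifeTime Library or iRODS implementation; the function names, the replica names, and the choice of SHA-256 are assumptions for this example.

```python
import hashlib


def checksum(data: bytes) -> str:
    """Return a hex digest of the data (SHA-256 is an assumed choice)."""
    return hashlib.sha256(data).hexdigest()


def corrupted_replicas(replicas, expected):
    """Return the names of replicas whose checksum differs from the expected one."""
    return sorted(
        name for name, data in replicas.items() if checksum(data) != expected
    )


# Hypothetical file content and replicas; replica-b has one altered character.
original = b"class notes, week 1"
expected = checksum(original)
replicas = {
    "replica-a": original,
    "replica-b": b"class notes, week l",
}
print(corrupted_replicas(replicas, expected))
# → ['replica-b']
```

A replica flagged this way could then be replaced from a good copy, which is where replication and checksums work together.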