National Science Foundation Cooperative Agreement: OCI-0940841.

Presentation transcript:

DFC Vision
Build collaboration environment
– Sharing of data, information, and knowledge
Form national data cyberinfrastructure
– Federation of existing data management systems
Support reproducible data-driven research
– Encapsulate knowledge within shared workflows
Enable student participation in research
– Policy-controlled analysis of “live” data
Diagram labels:
Compute Resources – HPC centers, institutional clusters
DFC Collaboration Environment – Data Grid
Community Resources – Repository, Catalog

Data Driven Science and Engineering
Collaboration Environments
– Oceanography: Ocean Observatories Initiative
  Archiving climatic data records from real-time sensor data streams
– Engineering: CIBER-U
  Engineering Digital Library: Curating civil engineering data, materials data, archaeology data, student training materials
– Hydrology: EarthCube
  Automating hydrology research workflows (data retrieval, transformation, analysis)
– Plant biology: the iPlant Collaborative
  Enable collaborative research across existing data repositories
– Cognitive science: the Temporal Dynamics of Learning Center
  Manage research data, apply IRB policies
– Social Science: the Odum Institute
  Integrate policy-based data management with the existing Dataverse repository

Challenges
Federated national data cyberinfrastructure
– Existing projects have web services, data repositories, digital libraries, archives, processing pipelines, science portals
– What are the interoperability mechanisms needed to enable federation of existing resources?

DFC Builds on the iRODS data grid (integrated Rule Oriented Data System)
1. Astrophysics: Auger supernova search
2. Atmospheric science: NASA Langley Atmospheric Sciences Center
3. Biology: Phylogenetics at CC IN2P3
4. Climate: NOAA National Climatic Data Center
5. Cognitive Science: Temporal Dynamics of Learning Center
6. Computer Science: GENI experimental network
7. Cosmic Ray: AMS experiment on the International Space Station
8. Dark Matter Physics: Edelweiss II
9. Earth Science: NASA Center for Climate Simulations
10. Ecology: CEED Caveat Emptor Ecological Data
11. Engineering: CIBER-U
12. High Energy Physics: BaBar / Stanford Linear Accelerator
13. Hydrology: Institute for the Environment, UNC-CH; Hydroshare
14. Genomics: Broad Institute, Wellcome Trust Sanger Institute, NGS
15. Medicine: Sick Kids Hospital
16. Neuroscience: International Neuroinformatics Coordinating Facility
17. Neutrino Physics: T2K and dChooz neutrino experiments
18. Oceanography: Ocean Observatories Initiative
19. Optical Astronomy: National Optical Astronomy Observatory
20. Particle Physics: Indra multi-detector collaboration at IN2P3
21. Plant genetics: the iPlant Collaborative
22. Quantum Chromodynamics: IN2P3
23. Radio Astronomy: Cyber Square Kilometer Array, TREND, BAOradio
24. Seismology: Southern California Earthquake Center
25. Social Science: Odum, TerraPop

Policy Concept Graph
[Diagram. Labels recovered from the slide:]
Concepts: Collection, Digital Object, Attribute, Purpose, Property, Policy, Procedure, Workflow, Function, Operation, Persistent State Information, Client Action, Periodic Assessment Criteria, Policy Enforcement Point
Relations: Has, Defines, Controls, Updates, Invokes, Chains, Isa, HasFeature, SubType
Other labels: Completeness, Correctness, Consensus, Consistency; Integrity, Authenticity, Access control
Policies: Replication Policy, Checksum Policy, Quota Policy, Data Type Policy
Operations: GetUserACL, SetDataType, SetQuota, DataObjRepl, SysChksumDataObj
Persistent state: DATA_ID, DATA_REPL_NUM, DATA_CHECKSUM

Policy-based Data Management – Implementation in iRODS
[Diagram. Counts and labels recovered from the slide:]
Purpose (5 main types)
Policy (11 default)
Property (7 default)
Procedure (11 default)
Clients (50)
Policy Enforcement Points (70)
Micro-service (317)
Persistent State Information (338)
Other concepts: Collection, Digital Object, Attribute, Workflow, Operation, Periodic Assessment Criteria
Relations: Has, Defines, Controls, Updates, Invokes, Chains, Isa, HasFeature, SubType
Other labels: Completeness, Correctness, Consensus, Consistency; Integrity, Authenticity, Access control
Policies: Replication Policy, Checksum Policy, Quota Policy, Data Type Policy
Micro-services: msiGetUserACL, msiSetDataType, msiSetQuota, msiDataObjRepl, msiSysChksumDataObj
Persistent state: DATA_ID, DATA_REPL_NUM, DATA_CHECKSUM
SubType labels: Archive, Data grid, Digital Library, Processing Pipeline
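The relations named on the slide (a policy enforcement point invokes a policy, a policy's procedure chains micro-services, and micro-services update persistent state information) can be illustrated with a minimal Python sketch. This is not the iRODS API or rule language: the classes `Policy` and `PolicyEnforcementPoint`, the functions `checksum_object` and `replicate_object`, and the in-memory `State` dictionary are all hypothetical stand-ins. Only the attribute names DATA_CHECKSUM and DATA_REPL_NUM come from the slide.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List

# Hypothetical in-memory stand-in for persistent state information.
State = Dict[str, Any]
# A micro-service is modeled here as a function that updates the state in place.
MicroService = Callable[[State], None]


def checksum_object(state: State) -> None:
    """Toy micro-service: record a SHA-256 checksum of the object's bytes."""
    state["DATA_CHECKSUM"] = hashlib.sha256(state["content"]).hexdigest()


def replicate_object(state: State) -> None:
    """Toy micro-service: pretend to make a replica by bumping a counter."""
    state["DATA_REPL_NUM"] = state.get("DATA_REPL_NUM", 0) + 1


@dataclass
class Policy:
    """A policy whose procedure is a chain of micro-services."""
    name: str
    procedure: List[MicroService]

    def apply(self, state: State) -> None:
        for micro_service in self.procedure:
            micro_service(state)


@dataclass
class PolicyEnforcementPoint:
    """A named hook that invokes its policies when a client action reaches it."""
    name: str
    policies: List[Policy] = field(default_factory=list)

    def invoke(self, state: State) -> State:
        for policy in self.policies:
            policy.apply(state)
        return state


def build_post_ingest_pep() -> PolicyEnforcementPoint:
    """Assemble an illustrative enforcement point (the name is an assumption)."""
    return PolicyEnforcementPoint(
        name="post_ingest",
        policies=[
            Policy("Checksum Policy", [checksum_object]),
            Policy("Replication Policy", [replicate_object]),
        ],
    )
```

A client action would call `build_post_ingest_pep().invoke(state)` with a dictionary holding the object's `content` bytes. A periodic assessment could then compare the stored DATA_CHECKSUM against a freshly computed one.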

Federation Approach
Use middleware to implement unifying name spaces for:
1. Users: Single sign-on
2. Collections: Directories, workflow, time series
3. Objects: Files, soft links, workflows
4. Storage systems: Cloud, tape, file systems, objects
5. Metadata: Provenance, description, state
6. Policies: Management, assessment
7. Micro-services: Procedures, interactions
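To make the idea of a unifying name space concrete, here is a minimal Python sketch of one of the seven: a logical object name space that maps a zone-qualified logical path to replicas on different storage systems. It is an illustration under stated assumptions, not how iRODS implements its catalog. The classes `Replica` and `LogicalNamespace` are hypothetical, and the physical paths used in the usage note are invented.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class Replica:
    """One physical copy of an object on a named storage resource."""
    storage_resource: str
    physical_path: str


@dataclass
class LogicalNamespace:
    """Maps logical paths of the form /<zone>/... to their physical replicas."""
    zone: str
    _objects: Dict[str, List[Replica]] = field(default_factory=dict)

    def register(self, logical_path: str, replica: Replica) -> None:
        # Reject names that do not belong to this zone's name space.
        if not logical_path.startswith(f"/{self.zone}/"):
            raise ValueError(f"{logical_path!r} is outside zone {self.zone!r}")
        self._objects.setdefault(logical_path, []).append(replica)

    def resolve(self, logical_path: str) -> List[Replica]:
        # Return a copy so callers cannot mutate the catalog by accident.
        return list(self._objects.get(logical_path, []))
```

For example, `ns = LogicalNamespace("dfcmain")` followed by two `ns.register("/dfcmain/home/demo/a.dat", Replica(...))` calls, one per storage resource, lets a client ask for `/dfcmain/home/demo/a.dat` without knowing where the bytes live. The same pattern (one logical name, several backing entries) would apply to users, collections, and the other name spaces listed above.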

DFC Federation Hub
Port: 1237, Zone: dfcmain
Hub servers and storage resources:
iCAT: iren2.renci.org
hydroResc: hydro.renci.org
res-bk15: srbbrick15.ucsd.edu
res-dfcmain: iren2.renci.org
demoResc: iren2.renci.org
Federated zones (zone, host: port):
renci, Iren2.renci.org: 1247
ooi, icat.oceanobservatories.org: 1247
TDLC, tdlc-01.sdsc.edu: 6688
odumMain, iodum1.irss.unc.edu: 1247
dfctest, dfctest.renci.org: 1248
engineering, irods.ischool.drexel.edu: 1247
hydrology, iren2.renci.org: 2823

National Infrastructure
[Diagram. Labels recovered from the slide:]
Research Environment – Portals, Applications, Workflows
DFC Collaboration Environment – Data Grid
Community Resource Repository
Community Resource Catalog
Community Resource Services
Existing infrastructure: XSEDE, Kepler, OOI, TDLC, iPlant, CUAHSI, NCDC, Dataverse, GeoBrain, DataONE, NCSA Polyglot

The Challenge: Support reproducible data-driven research
Deliver the capability to manage, mine, and publish knowledge through collaboration environments.
Diagram labels: Experiments, Archives, Sensors, Literature, Simulation
The Future: Reproducible Research

National Infrastructure Approach
1. Build national data cyberinfrastructure prototype
– Support multiple science and engineering domains by loosely coupling their existing infrastructure with a collaboration environment
2. Develop generic interoperability framework
– Define the generic infrastructure needed for the national infrastructure to manage knowledge as well as data and information
3. Define interoperability mechanisms
– Support access across the disparate types of infrastructure in common use
4. Define domain specific extensions
– Support three levels: technical interoperability, project level policy, and end user usage requirements

Interoperability Mechanisms
Policies control execution of each interoperability mechanism
[Diagram. Labels recovered from the slide, grouped by level:]
Data: Data Access, Data Manipulation, Micro-services, Storage Driver
Information: Collection Registration, Information Exchange, Soft Links, Message Queue, Information Manipulation, Database Query
Knowledge: Knowledge Creation, Analysis Workflows, Knowledge Management, Procedures: Micro-services

DataNet Interoperability
[Diagram. Labels recovered from the slide:]
Research Environment – Portals, Applications, Workflows
DFC Collaboration Environment
Connections: Message Queue, Web Service
Nodes: DataONE Member Node, DataONE Coordinating Node, TerraPop Server, SEAD Portal (VIVO), SEAD Engagement Center, SEAD Data, DFC Data Grid

DFC Interoperability Layers
Layer (mechanism): systems
Authentication (PAM / GSSAPI): InCommon, GSI, Kerberos, Shibboleth, LDAP
Workflows (Micro-Services): Kepler, NCSA Cyberintegrator, Taverna, NCSA Polyglot
Data Manipulation (Format Drivers): NetCDF, HDF5, THREDDS, ERDDAP
Networks (Network Drivers): HTTPS, TCP/IP, Parallel TCP/IP, RBUDP
Data Access (Micro-Services): DataONE, Data Conservancy, CUAHSI, NCDC
Clients (OpenSocial): Web browsers, Web Services, Workflows, FUSE, Synchronization, MediaWiki
Vocabulary (Micro-Services): HIVE, (Cheshire)
Messaging (Micro-Services): AMQP, iRODS Xmsg
Management (Policies): (RDA Policies), (ISO Criteria)
Storage Systems (Storage Drivers): File Systems, Tape Archives, Object Stores, Cloud Storage

Interoperability Mechanisms
Drivers
– Encapsulate knowledge to support your operations at the remote repository: partial I/O, parsing of formats, manipulation of data structures
– Authentication, format, storage
Micro-services
– Encapsulate knowledge needed to interact with an external system or with a data set using the remote protocol
– Data access, external workflows, semantics, messaging
Policies
– Encapsulate knowledge needed for management functions
– Federation control, administrative tasks, validation checks
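The division of labor among the three mechanisms can be sketched in Python as follows. This is an illustration only, with every name hypothetical (`StorageDriver`, `InMemoryDriver`, `fetch_header`, `validate_format_policy`); none of it is taken from iRODS. The driver performs partial I/O against a repository, the micro-service uses the driver to carry out one interaction, and the policy applies a validation check. The four bytes `CDF\x01` are the prefix of a netCDF classic file and are used here just as a sample check.

```python
from typing import Dict, Protocol


class StorageDriver(Protocol):
    """Driver: knows how to do partial I/O against one kind of repository."""

    def read(self, path: str, offset: int, length: int) -> bytes:
        ...


class InMemoryDriver:
    """Toy driver backed by a dictionary instead of a real storage system."""

    def __init__(self, files: Dict[str, bytes]) -> None:
        self._files = files

    def read(self, path: str, offset: int, length: int) -> bytes:
        return self._files[path][offset:offset + length]


def fetch_header(driver: StorageDriver, path: str, length: int) -> bytes:
    """Micro-service: one interaction with a data set, done through a driver."""
    return driver.read(path, 0, length)


def validate_format_policy(driver: StorageDriver, path: str, magic: bytes) -> bool:
    """Policy: a validation check that decides whether the object passes."""
    return fetch_header(driver, path, len(magic)) == magic
```

Because the policy only calls the micro-service and the micro-service only calls the driver interface, swapping `InMemoryDriver` for a driver that talks to tape or cloud storage would leave the policy unchanged.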

Assertion
Three basic types of interoperability mechanisms are sufficient for assembling national data cyberinfrastructure
Example: Linked software defined networks to data grids
– From an iRODS data grid, controlled the selection of three disjoint network paths for optimizing data transport by adding appropriate policy enforcement points and micro-services
Expect functionality currently in data grid middleware to migrate into network middleware

Future Architecture
[Diagram. Labels recovered from the slide:]
Clients, Resources, Data Grid Middleware, Network Middleware
DFC Federation, GEMI - GENI
Virtual collection, Virtual network

Contacts
Reagan W. Moore