C2CAMP (A Working Title)


Managing Digital Objects in an Expanding Science Ecosystem
Jan 2018
Larry Lannom, Corporation for National Research Initiatives

C2CAMP (Cross-Continental Collection & Management Pilot)

- A loose association of research groups with some agreed-upon goals.
- Proposed: a multi-party distributed test bed, based on open specifications across a minimal set of (mostly) existing components and interfaces, that lets users deal with Digital Objects efficiently.
- Data producers and managers are invited to prototype their workflows and other processes in the distributed test bed.
- Solicit the creation of additional components and interfaces as needed to meet the requirements that evolve from prototype use of the test bed.
- Demonstrate complex scientific workflows for data processing, harmonized and automated across communities, by using interchangeable infrastructure components and a structured resource market approach.

What Problem Are We Trying to Solve?

- On the processing side: the vast amounts of data now being collected and generated require new levels of automation.
  - Point solutions are terribly inefficient.
  - Core data processing is the same in physics as it is in medicine.
  - We need common tools and processes for the levels at which all data is the same.
- On the access side: simply making data available (DOI to repository), challenging as that may sometimes be, is not sufficient for widespread sharing and re-use of scientific data.
  - How can users interpret the data? What do they know other than format? Provenance? Documentation?
  - Can they combine it with other data sets? What tools can they use? Where is the detailed metadata?
- We see the need for a common middle layer between clients of all kinds and storage of all kinds, allowing focus on the difficult semantics/knowledge challenges.

What Exactly Are We Proposing to Do?

- Implement a prototype distributed environment based on the digital object model.
  - Everything in the environment is a digital object.
  - For basic information management tasks every object can be treated the same, regardless of information content.
  - Every object has a globally unique and actionable identifier.
  - Every object is typed.
  - Every object has tightly associated metadata.
  - Every object has a query-able set of operations that can be performed on it.
- Start with the minimal set of components and services that enable the DO model.
  - Identifiers + Resolution System
  - Types + Type Registries
  - DO Repositories, including repositories of metadata, aka registries
  - Mapping/brokering software and services to map existing data storage and management systems to DOs
  - Digital Object Interface Protocol, implemented by DO Repositories
- Open the environment to as many use cases as possible to hone the core infrastructural pieces.
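The four per-object properties listed on this slide (identifier, type, metadata, query-able operations) can be made concrete with a small sketch. This is only an illustration under stated assumptions, not the C2CAMP design or the Digital Object Interface Protocol: the class names `DigitalObject` and `OperationRegistry`, the `"*"` wildcard, and the identifiers and type names beginning with `example` are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class DigitalObject:
    """Illustrative digital object: identifier, type, metadata, content."""
    identifier: str                      # globally unique, actionable ID
    type_id: str                         # every object is typed
    metadata: Dict[str, str] = field(default_factory=dict)
    content: bytes = b""


Operation = Callable[[DigitalObject], object]


class OperationRegistry:
    """Hypothetical registry mapping type IDs to named operations."""

    ANY_TYPE = "*"  # operations that apply to every object, whatever its type

    def __init__(self) -> None:
        self._ops: Dict[str, Dict[str, Operation]] = {}

    def register(self, type_id: str, name: str, op: Operation) -> None:
        self._ops.setdefault(type_id, {})[name] = op

    def list_operations(self, obj: DigitalObject) -> List[str]:
        """Answer the query 'what can be done with this object?'"""
        generic = self._ops.get(self.ANY_TYPE, {})
        specific = self._ops.get(obj.type_id, {})
        return sorted(set(generic) | set(specific))

    def invoke(self, obj: DigitalObject, name: str) -> object:
        # Type-specific operations take precedence over generic ones.
        specific = self._ops.get(obj.type_id, {})
        generic = self._ops.get(self.ANY_TYPE, {})
        op = specific.get(name) or generic.get(name)
        if op is None:
            raise KeyError(f"{name!r} is not defined for type {obj.type_id!r}")
        return op(obj)


registry = OperationRegistry()
registry.register("*", "get_metadata", lambda o: dict(o.metadata))
registry.register("*", "size", lambda o: len(o.content))
registry.register("example/csv-table", "row_count",
                  lambda o: len(o.content.decode().splitlines()))

dataset = DigitalObject(
    identifier="example-prefix/123",     # hypothetical identifier
    type_id="example/csv-table",         # hypothetical type name
    metadata={"title": "Sample table"},
    content=b"a,b\n1,2\n3,4\n",
)
publication = DigitalObject("example-prefix/843", "example/publication")

print(registry.list_operations(dataset))
print(registry.list_operations(publication))
print(registry.invoke(dataset, "row_count"))
```

The generic operations registered under `"*"` stand in for the basic information management tasks that treat every object the same; differentiation by type happens only when a caller asks for a type-specific operation such as `row_count`.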

Global Digital Object Cloud (GDOC)

[Diagram: digital objects with identifiers (ID: 843…, ID: 987/…, ID: 123…, ID: 876…, ID: XZY…, ID: HGY…), typed as object:publication, object:dataset, and object:collection, layered over Identifier Services and Repos/Registries.]

End users, developers, and automated processes deal with persistently identified, virtually aggregated digital objects, including collections, which are overlays on multiple network services, which in turn are overlays on existing or future information storage systems.


Global Digital Object Cloud (GDOC)

The identifier services and repositories/registries can be orchestrated to provide an object view of underlying storage, e.g., file systems, or basic data management systems, e.g., databases.
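As a rough illustration of an object view over a file system, the sketch below walks a directory and presents each file as an identified, typed record. Everything here is an assumption made for illustration: the function name `object_view`, the `SUFFIX_TYPES` table, and the `example-prefix` identifiers are hypothetical, and a real deployment would mint identifiers through an identifier service rather than hash the path.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Dict

# Hypothetical mapping from file suffix to a type identifier.
SUFFIX_TYPES = {".csv": "example/csv-table", ".pdf": "example/publication"}


def object_view(root: Path, prefix: str = "example-prefix") -> Dict[str, dict]:
    """Return {identifier: record} for every regular file under root."""
    view: Dict[str, dict] = {}
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        relative = path.relative_to(root).as_posix()
        # Assumption: a short hash of the relative path stands in for an
        # identifier that would normally be minted by an identifier service.
        digest = hashlib.sha256(relative.encode("utf-8")).hexdigest()[:12]
        view[f"{prefix}/{digest}"] = {
            "type": SUFFIX_TYPES.get(path.suffix.lower(), "example/unknown"),
            "metadata": {"name": path.name, "size": path.stat().st_size},
            "location": path,
        }
    return view


# Small demonstration on a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    demo_root = Path(tmp)
    (demo_root / "obs.csv").write_text("a,b\n1,2\n")
    (demo_root / "notes.txt").write_text("hello")
    for identifier, record in object_view(demo_root).items():
        print(identifier, record["type"], record["metadata"])
```

Clients of such a view would work only with the identifier, type, and metadata; the `location` field stays an internal detail of the mapping layer.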


Global Digital Object Cloud (GDOC)

The resulting set of identified and well-structured objects provides a common, and constant, view and ‘remote control’ management of data distributed in various locations and systems, which can change without changing the virtualized object.
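The claim that locations can change without changing the virtualized object rests on indirection through an identifier service. A minimal in-memory sketch of that indirection follows; the `Resolver` class, its method names, and the `example.org` URLs are hypothetical and do not represent any particular resolution system's API.

```python
from typing import Dict


class Resolver:
    """Hypothetical in-memory identifier service (identifier -> location)."""

    def __init__(self) -> None:
        self._locations: Dict[str, str] = {}

    def register(self, identifier: str, location: str) -> None:
        if identifier in self._locations:
            raise ValueError(f"{identifier} is already registered")
        self._locations[identifier] = location

    def relocate(self, identifier: str, new_location: str) -> None:
        """Move the data; the identifier held by clients stays the same."""
        if identifier not in self._locations:
            raise KeyError(identifier)
        self._locations[identifier] = new_location

    def resolve(self, identifier: str) -> str:
        return self._locations[identifier]


resolver = Resolver()
resolver.register("example-prefix/123", "https://repo-a.example.org/objects/123")
before = resolver.resolve("example-prefix/123")

# The data moves to another repository; clients keep the same identifier.
resolver.relocate("example-prefix/123", "https://repo-b.example.org/data/123")
after = resolver.resolve("example-prefix/123")

print(before)
print(after)
```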

Global Digital Object Cloud (GDOC)

All of these services exist today in one form or another, but some are not yet widely used and few are tightly coordinated and orchestrated in the way that is needed.

Why Is This a Good Idea?

- The Digital Object Model simplifies the solution space.
  - Treat every information object the same until you have to differentiate among them to accomplish your purpose.
  - Push the current cacophony of information management and storage systems down a level of abstraction.
  - Objects are self-describing in that they carry their type information independent of their current system location.
- The prototype will let us test the above assertions.
- The prototype will be based on open standards and proven technology.
- The proposed project has already gathered significant support and is coming out of the Research Data Alliance, broadly representing the international research data community.

Who Is We?

- Digital Object Model (CNRI)
- RDA Data Fabric Interest Group
  - Reusable components, automated workflows, type-based operations
  - Supporting Output: “Recommendations for Implementing a Virtual Layer for Management of the Complete Life Cycle of Scientific Data”
- Brainstorming meeting, Nov 16, 2017
- Growing list of participants
  - German Climate Center
  - UK Natural History Museum (biodiversity)
  - Swiss National Computing Center
  - CNRI
  - BRDI
  - NCAR
  - MPI
  - Indiana U.
  - Clarin
  - FZ-Juelich
  - CSIR