Data Fabric Interest Group Plenary 9 Core Session Barcelona


Agenda
- Welcome, Introduction (5 minutes)
- Election of new Co-Chair (5 minutes)
- Review of Activities (30 minutes)
- Global Digital Object Cloud Update (15 minutes)
- Discussion/Questions (20 minutes)
- Gaps/Opportunities/Next Steps (15 minutes)

Co-Chair Election

DFIG Current Activities
- Ecosystems and Core Components
  - Recommendations (https://hdl.handle.net/11304/a3d012ca-4e23-425e-9e2a-1e6a195b966f)
  - Aggregate slide deck (DF IG Documents RDA wiki page)
- Common Governance and Operating Procedures
  - GEDE group
- Metadata and PIDs
  - PID Kernel Subgroup -> PID Kernel WG proposal
- Brokering Services
  - DFIG/Brokering Workflows
- Training and Education
  - DFIG/ETHRD workshop planning session

Types of Data Fabrics
We can differentiate between:
- user data fabrics, which support discovery of and access to published data
- collaboration data fabrics, which support processing of shared collections
- repository data fabrics, which focus on preserving data
Virtualized entities supported in these DFs are:
- data collections that include the context of DOs
- workflows encapsulating analyses
- data flows managing data transport
Essential capabilities are interoperability, federation, and interaction control.
* Source: Reagan Moore

Nature of a Data Fabric
- Data Fabrics in the above sense are blueprints for creating generic infrastructures that support virtualisation of collections, workflows and data flows.
- Instantiations of Data Fabrics will offer a set of services, some of which are core and others optional.
- Data Fabrics are NOT instantiations of a specific collection, workflow or data flow.

Defining Core Components
Task to solve:
- Identify and specify Common Components (CoCo)
- Recommend CoCo
- Put CoCo in place
Not ONE architecture: identify CoCos that could cooperate in specific configurations to solve a function (infra, VRE, etc.).
[Diagram labels: configuration A; configuration B; Common Components & Services; Specific Components & Services]

Identifying Core Components
- Core Data Type Definitions, Metadata Standards and Vocabularies
- Trustworthy Data Repositories
- Trustworthy, Machine-Actionable Registries of Repositories, Data Types, Metadata Standards, Vocabularies, Authorization Records, Licenses
- PID Services
- Collection Services
- Brokering Services
- Common Governance and Operating Procedures
- Training and Education

From Core Components to Data Fabrics
- Configurations must be driven by workflows and use cases.
- Increasing scale requires moving away from Human Controlled Processing to Type-Triggered Automatic Processing.
- Component configurations should enable an ecosystem of tools and services.

Human Controlled Processing (HCP)
- Data sources: observations, experiments, simulations, etc.
- The cycle can be controlled manually, or semi-automatically via pre-set pipelines.
- Even with semi-automatic pipelines, humans remain close-in "designers".

Type-Triggered Automatic Processing (T-TAP)
- New feature: cycles run highly autonomously; the precise steps depend on the types of data entering the workflow.
- Some kind of profile matching takes place.
- Researchers are not in direct control.
[Diagram labels: Data Events; exposing new DOs; Structured Data Markets; adding new data; Data Federation; Agents; Data Type Registry; Processing services; Brokering & Mediation services; result scripts]
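The type-triggered idea can be illustrated with a small dispatch sketch. This is not taken from the slides: the `TypeRegistry` class, the `on_data_event` function, the type name `example/text` and the PIDs are all hypothetical, and a real Data Type Registry would be a network service rather than an in-memory dictionary. The sketch only shows the principle that the processing steps are selected by the type of the incoming object, not by a researcher.

```python
from dataclasses import dataclass, replace
from typing import Callable, Dict, List


@dataclass(frozen=True)
class DigitalObject:
    pid: str        # persistent identifier (made-up values in this demo)
    data_type: str  # type name that a data type registry would resolve
    payload: str


Step = Callable[[DigitalObject], DigitalObject]


class TypeRegistry:
    """Hypothetical in-memory stand-in for a Data Type Registry."""

    def __init__(self) -> None:
        self._steps: Dict[str, List[Step]] = {}

    def register(self, data_type: str, step: Step) -> None:
        # Steps run in the order in which they were registered.
        self._steps.setdefault(data_type, []).append(step)

    def steps_for(self, data_type: str) -> List[Step]:
        return list(self._steps.get(data_type, []))


def on_data_event(obj: DigitalObject, registry: TypeRegistry) -> DigitalObject:
    """Run every step registered for the object's type; unknown types pass through."""
    for step in registry.steps_for(obj.data_type):
        obj = step(obj)
    return obj


def normalise(obj: DigitalObject) -> DigitalObject:
    return replace(obj, payload=obj.payload.strip().lower())


def tag_checked(obj: DigitalObject) -> DigitalObject:
    return replace(obj, payload=obj.payload + " [checked]")


registry = TypeRegistry()
registry.register("example/text", normalise)
registry.register("example/text", tag_checked)

result = on_data_event(DigitalObject("demo/1", "example/text", "  Hello  "), registry)
untouched = on_data_event(DigitalObject("demo/2", "example/unknown", "  Raw  "), registry)
```

Here `result` has passed through both registered steps, while `untouched` is returned unchanged because no steps are registered for its type.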

Use Cases
- A neurologist wants to research the causal relation between Alzheimer phenomena and specific genes, proteins, neural activity, etc., using machine learning algorithms on confidential data from a federation of hospitals and labs.
- A linguist researches theories about "economy of languages", finding objective patterns that make languages more or less easy to process and learn, by applying machine learning algorithms to open data from a variety of sources filtered by language and feature.
- The data manager of a large data centre must continuously and asynchronously check the quality of new data of specific types, transform it according to certain rules, and create n replications in a federation.
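The data manager use case (check, transform, replicate n times) can be sketched as a small pipeline. Everything here is an assumption made for illustration: the quality rule, the transformation rule, the use of SHA-256, and the modelling of federation sites as plain dictionaries. The sketch is also synchronous, whereas the use case calls for continuous, asynchronous operation.

```python
import hashlib
from typing import Dict, List, Optional


def passes_quality_check(data: bytes) -> bool:
    """Placeholder rule: accept only non-empty, valid UTF-8 content."""
    if not data:
        return False
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def transform(data: bytes) -> bytes:
    """Placeholder rule: normalise Windows line endings."""
    return data.replace(b"\r\n", b"\n")


def ingest(data: bytes, sites: List[Dict[str, bytes]], n: int) -> Optional[str]:
    """Check, transform, and store n replicas; return the checksum used as key.

    Returns None when the data fails the quality check.
    """
    if n > len(sites):
        raise ValueError("not enough sites for the requested number of replicas")
    if not passes_quality_check(data):
        return None
    cleaned = transform(data)
    digest = hashlib.sha256(cleaned).hexdigest()
    for site in sites[:n]:
        site[digest] = cleaned
    return digest


sites: List[Dict[str, bytes]] = [{}, {}, {}]
key = ingest(b"a\r\nb", sites, n=2)
rejected = ingest(b"", sites, n=1)
```

In a type-triggered setting, the quality and transformation rules would be looked up from the data type rather than hard-coded as they are here.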

Recommendations Update
- PID Focus Area work is progressing
- GEDE Europe (https://www.rd-alliance.org/groups/gede-group-european-data-experts-rda) was highly active with f2f and virtual meetings
- Result is a new report: Grouped List of Assertions (also uploaded to DFIG pages)
  - consultation of 25 reports and papers in total, suggested by participants
  - extraction of <60 assertions from all documents
  - classification of these assertions into sections (1. nature of PIDs and PID systems, 2. their relevance, 3. assigning PIDs, 4. using PIDs, 5. Handles and DOIs, 6. others)
  - much agreement on core assertions
  - some variety in the ways of assigning and using PIDs

Areas of discussion
- PID in binding role: which types of attributes to add to the PID record or to the landing page
- types of attributes need to be machine readable and specified
- how to indicate versions
- time of assignment of PIDs
- granularity of PID assignment
- role of (trustworthy) repositories in assigning PIDs
- use of fragment indicators
- how to add life cycle statements (deletion, splitting, merging, etc.)
- when Handles and when DOIs

Next Steps
- broad commenting on summary assertions by RDA/DFIG and GEDE people within April 17 via web pages and P9 sessions
- virtual meeting in May (DFIG and GEDE groups)
- f2f meeting in June/July to finish the main summary assertions
- afterwards a final report on agreements, identifying areas of disagreement
- start interacting about the next topic area
  - primary areas of interest could be "repositories" (tasks, interfaces, data organisation, etc.) and "data processing" (workflows, type triggered, etc.)

PIDs remain central
[Diagram labels: PID; PID Record; CKSM; paths; data copies; Metadata; Rights; Relations; Provenance]
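One way to read the diagram labels is as the fields of a PID record. The sketch below is an interpretation, not the PID Kernel profile: the field names, the choice of SHA-256 for the checksum, and the example identifiers and paths are all hypothetical.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class PIDRecord:
    """Hypothetical PID record; field names are illustrative only."""
    pid: str
    checksum: str                 # "CKSM" in the diagram
    paths: List[str]              # locations of the data copies
    metadata_pid: str = ""        # PID of the metadata object
    rights_pid: str = ""          # PID of the rights statement
    provenance_pid: str = ""      # PID of the provenance object
    relations: Dict[str, str] = field(default_factory=dict)  # relation name -> PID


def verify_copy(record: PIDRecord, data: bytes) -> bool:
    """Check a retrieved copy against the checksum stored in the record."""
    return hashlib.sha256(data).hexdigest() == record.checksum


content = b"example bit sequence"
record = PIDRecord(
    pid="demo/abc",
    checksum=hashlib.sha256(content).hexdigest(),
    paths=["https://repo-a.example/abc", "https://repo-b.example/abc"],
    metadata_pid="demo/abc-md",
    relations={"isPartOf": "demo/collection-1"},
)
```

A client that resolves the PID could fetch any of the listed copies and call `verify_copy` to confirm that the bits match the record.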

PID Kernel Update
- Work started in Denver at P8
- Working groups met over the last 6 months
- Draft profile created
- PID Kernel Working Group Case Statement submitted
- Work completes at P11

Global Digital Object Cloud (GDOC)
End users, developers, and automated processes deal with persistently identified, virtually aggregated digital objects, including collections which are overlays on multiple network services which in turn are overlays on existing or future information storage systems.
[Diagram: Identifier Services and Repo/Registry services connecting identified objects (object:publication, object:dataset, object:collection)]

GDOC – Is it Real?
- Storage: not our problem, but
  - Latency is an issue
  - Changing interfaces can be a problem
- Services
  - Identifier: common resolution systems; PID Kernel, Profiles
  - Repo/Registry: common APIs
    - Confusion: Repository not equal to Storage
    - Confusion: Registry is a Repository of metadata objects
- Object Level
  - Common Object Interface must be provided by Repo/Registry
  - Collections ARE Objects
- Clients
  - Good News / Bad News: web browser remains universal client
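The two object-level points (a common object interface offered by every repo/registry, and collections being objects themselves) can be sketched as follows. The class and method names are hypothetical and are not drawn from DOIP or from the RDA Collections API.

```python
from abc import ABC, abstractmethod
from typing import Dict, List


class DigitalObject(ABC):
    """Hypothetical common interface that every stored object exposes."""

    def __init__(self, pid: str) -> None:
        self.pid = pid

    @abstractmethod
    def object_type(self) -> str: ...

    @abstractmethod
    def describe(self) -> Dict[str, object]: ...


class Dataset(DigitalObject):
    def __init__(self, pid: str, size_bytes: int) -> None:
        super().__init__(pid)
        self.size_bytes = size_bytes

    def object_type(self) -> str:
        return "dataset"

    def describe(self) -> Dict[str, object]:
        return {"pid": self.pid, "type": self.object_type(), "size": self.size_bytes}


class Collection(DigitalObject):
    """A collection is itself an object; its content is a list of member PIDs."""

    def __init__(self, pid: str, members: List[str]) -> None:
        super().__init__(pid)
        self.members = list(members)

    def object_type(self) -> str:
        return "collection"

    def describe(self) -> Dict[str, object]:
        return {"pid": self.pid, "type": self.object_type(), "members": self.members}


class Repository:
    """Stores any DigitalObject by PID; clients never see the storage layer."""

    def __init__(self) -> None:
        self._objects: Dict[str, DigitalObject] = {}

    def put(self, obj: DigitalObject) -> None:
        self._objects[obj.pid] = obj

    def get(self, pid: str) -> DigitalObject:
        return self._objects[pid]


repo = Repository()
repo.put(Dataset("demo/ds-1", size_bytes=1024))
repo.put(Collection("demo/col-1", members=["demo/ds-1"]))
summaries = [repo.get(pid).describe() for pid in ("demo/ds-1", "demo/col-1")]
```

Because `Collection` implements the same interface as `Dataset`, a client can handle both through `repo.get(...)` without special cases.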

GDOC – Is it Real?
CONCLUSION: Evolution needed & inevitable; RDA can help drive it (DFIG, Brokering, PID Kernel, Collections, DTR, ….)

Gaps/Opportunities
- Further progress on Machine-Actionable Registries
  - DFT for vocabulary: needs population and use
  - Have DTR for data types: needs testing and iteration
  - re3data for Repositories: need a machine-actionable equivalent
  - Metadata Catalog: machine-actionable catalog is a pending RDA WG
  - Not sure if anyone is working on Authorization and License registries
- Governance and Operating Procedures
  - Need for this will become critical as soon as test beds and functional ecosystems are available
- PIDs
  - Linked Open Data community needs
  - Recommendations for workflow vs publication