Published by Kory Maxwell. Modified over 6 years ago.
1
Data Fabric Interest Group
Plenary 9 Core Session
Barcelona
2
Agenda
Welcome, Introduction (5 minutes)
Election of new Co-Chair (5 minutes)
Review of Activities (30 minutes)
Global Digital Object Cloud Update (15 minutes)
Discussion/Questions (20 minutes)
Gaps/Opportunities/Next Steps (15 minutes)
3
Co-Chair Election
4
DFIG Current Activities
Ecosystems and Core Components
  Recommendations
  Aggregate slide deck (DFIG Documents, RDA wiki page)
Common Governance and Operating Procedures
  GEDE group
Metadata and PIDs
  PID Kernel Subgroup -> PID Kernel WG proposal
Brokering Services
  DFIG/Brokering Workflows
Training and Education
  DFIG/ETHRD workshop planning session
5
Types of Data Fabrics
We can differentiate between:
  user data fabrics that support discovery of and access to published data
  collaboration data fabrics that support processing of shared collections
  repository data fabrics that focus on preserving data
Supported virtualized entities in these DFs are:
  data collections that include the context of DOs
  workflows encapsulating analyses
  data flows managing data transport
Essential capabilities are interoperability, federation, and interaction control.
* Source: Reagan Moore
6
Nature of a Data Fabric
Data Fabrics in the above sense are blueprints for creating generic infrastructures that support virtualisation of collections, workflows and data flows.
Instantiations of Data Fabrics will offer a set of services, some of which are core and others optional.
Data Fabrics are NOT instantiations of a specific collection, workflow or data flow.
7
Defining Core Components
Task to solve:
  Identify and specify Common Components (CoCo)
  Recommend CoCo
  Put CoCo in place
Not ONE architecture: identify CoCos that could cooperate in specific configurations to solve a function (infra, VRE, etc.).
[Figure: configuration A and configuration B, each combining Common Components & Services with Specific Components & Services]
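The idea that common components are whatever several configurations share can be sketched in a few lines. This is only an illustration: the component names below are hypothetical and are not taken from the slides.

```python
from typing import Dict, Set

# Hypothetical component inventories for two configurations (names are illustrative).
CONFIGURATIONS: Dict[str, Set[str]] = {
    "configuration A": {"PID service", "data type registry", "repository", "workflow engine"},
    "configuration B": {"PID service", "data type registry", "repository", "visualisation portal"},
}


def common_components(configs: Dict[str, Set[str]]) -> Set[str]:
    """Components that appear in every configuration."""
    if not configs:
        return set()
    return set.intersection(*configs.values())


def specific_components(configs: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Per configuration, the components that are not shared by all."""
    common = common_components(configs)
    return {name: comps - common for name, comps in configs.items()}


common = common_components(CONFIGURATIONS)
specific = specific_components(CONFIGURATIONS)
```

Here `common` holds the shared components and `specific` holds what remains in each configuration.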
8
Identifying Core Components
Core Data Type Definitions, Metadata Standards and Vocabularies
Trustworthy Data Repositories
Trustworthy, Machine-Actionable Registries of Repositories, Data Types, Metadata Standards, Vocabularies, Authorization Records, Licenses
PID Services
Collection Services
Brokering Services
Common Governance and Operating Procedures
Training and Education
9
From Core Components to Data Fabrics
Configurations must be driven by workflows and use cases
Increasing scale requires moving away from Human Controlled Processing to Type-Triggered Automatic Processing
Component configurations should enable an ecosystem of tools and services
10
Human Controlled Processing (HCP)
[Figure: processing cycle fed by observations, experiments, simulations, etc.]
The cycle can be controlled manually, or semi-automatically via pre-set pipelines.
Even with semi-automatic pipelines, humans remain closely involved as "designers".
11
Type-Triggered Automatic Processing (T-TAP)
New feature: cycles run highly autonomously; the precise steps depend on the types of data entering the workflow.
Researchers are not in direct control.
[Figure: Data Events (exposing new DOs, adding new data) go through some kind of profile matching; labelled elements are Structured Data Markets, Data Federation Agents, Data Type Registry, Processing services, Brokering & Mediation services, result scripts]
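A minimal sketch of type-triggered processing, assuming a data type registry can be modelled as a lookup from a type name to the processing steps it triggers. The type names, step functions and the `on_data_event` helper are all hypothetical and serve only to illustrate the idea that the data type, not a person, selects the steps.

```python
from typing import Callable, Dict, List

# Hypothetical in-memory stand-in for a Data Type Registry:
# maps a data type name to the processing steps it triggers.
TypeRegistry = Dict[str, List[Callable[[dict], dict]]]


def normalise_units(obj: dict) -> dict:
    """Illustrative processing service: mark the object as normalised."""
    return {**obj, "normalised": True}


def extract_features(obj: dict) -> dict:
    """Illustrative processing service: record the payload length as a feature."""
    return {**obj, "features": len(obj.get("payload", ""))}


REGISTRY: TypeRegistry = {
    "sensor/timeseries": [normalise_units, extract_features],
    "text/corpus": [extract_features],
}


def on_data_event(obj: dict, registry: TypeRegistry = REGISTRY) -> dict:
    """Run the steps registered for the object's type; unknown types pass through."""
    for step in registry.get(obj["type"], []):
        obj = step(obj)
    return obj


result = on_data_event({"type": "sensor/timeseries", "payload": "abc"})
```

Adding support for a new data type then means registering steps for it, without changing `on_data_event`.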
12
Use Cases
A neurologist wants to research the causal relation between Alzheimer phenomena and specific genes, proteins, neural activity, etc., using machine learning algorithms on confidential data from a federation of hospitals and labs.
A linguist researches theories about the "economy of languages", finding objective patterns that make languages more or less easy to process and learn, by applying machine learning algorithms to open data from a variety of sources filtered by language and feature.
The data manager of a large data centre must continuously and asynchronously check the quality of new data of specific types, transform it according to certain rules, and create n replications in a federation.
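The data manager use case (check quality, transform, create n replications) can be sketched with an asynchronous queue. Everything here is an assumption made for illustration: the type names, the quality rule, the transformation, and the in-memory dictionaries standing in for federation sites.

```python
import asyncio
from typing import Dict, List, Optional

WATCHED_TYPES = {"sensor/timeseries"}  # hypothetical type names of interest


def passes_quality_check(obj: dict) -> bool:
    """Illustrative rule: an object needs a non-empty payload."""
    return bool(obj.get("payload"))


def transform(obj: dict) -> dict:
    """Illustrative rule: trim and lower-case the payload."""
    return {**obj, "payload": obj["payload"].strip().lower()}


async def replicate(obj: dict, site: str, stores: Dict[str, List[dict]]) -> None:
    await asyncio.sleep(0)  # stands in for a network transfer
    stores[site].append(obj)


async def handle_new_data(queue: "asyncio.Queue[Optional[dict]]",
                          stores: Dict[str, List[dict]], n: int) -> None:
    """Consume new objects until a None sentinel arrives."""
    while True:
        obj = await queue.get()
        if obj is None:
            break
        if obj["type"] in WATCHED_TYPES and passes_quality_check(obj):
            clean = transform(obj)
            sites = list(stores)[:n]  # first n sites of the federation
            await asyncio.gather(*(replicate(clean, s, stores) for s in sites))


async def main() -> Dict[str, List[dict]]:
    stores: Dict[str, List[dict]] = {"site-a": [], "site-b": [], "site-c": []}
    queue: "asyncio.Queue[Optional[dict]]" = asyncio.Queue()
    for obj in (
        {"type": "sensor/timeseries", "payload": " ABC "},
        {"type": "sensor/timeseries", "payload": ""},  # fails the quality check
        {"type": "image/raw", "payload": "x"},         # type is not watched
        None,                                           # sentinel: stop
    ):
        queue.put_nowait(obj)
    await handle_new_data(queue, stores, n=2)
    return stores


stores = asyncio.run(main())
```

A real deployment would replace the sentinel with a long-running consumer and the dictionaries with calls to repository services.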
13
Recommendations Update
PID Focus Area work is progressing
GEDE Europe was highly active with f2f and virtual meetings
Result is a new report: Grouped List of Assertions (also uploaded to DFIG pages)
  consultation of 25 reports and papers in total, suggested by participants
  extraction of <60 assertions from all documents
  classification of these assertions into sections (1. nature of PIDs and PID systems, 2. their relevance, 3. assigning PIDs, 4. using PIDs, 5. Handles and DOIs, 6. others)
  much agreement in core assertions
  some variety in the way of assigning and using PIDs
14
Areas of discussion
PID in binding role: which types of attributes to add to the PID record or to the landing page
types of attributes need to be machine readable and specified
how to indicate versions
time of assignment of PIDs
granularity of PID assignment
role of (trustworthy) repositories in assigning
use of fragment indicators
how to add life cycle statements (deletion, splitting, merging, etc.)
when Handles and when DOIs
15
Next Steps
broad commenting on summary assertions by RDA/DFIG and GEDE people within April 17 via web pages and P9 sessions
virtual meeting in May (DFIG and GEDE groups)
f2f meeting in June/July to finish the main summary assertions
afterwards a final report on agreements, identifying areas of disagreement
start interacting about the next topic area
primary areas of interest could be "repositories" (tasks, interfaces, data organisation, etc.) and "data processing" (workflows, type triggered, etc.)
16
PIDs remain central
[Figure: a PID Record with labelled elements PID, CKSM, paths, data copies, Metadata, Rights, Relations, Provenance]
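To make the PID record concrete, here is a hedged sketch of such a record as a data structure. The field names follow the labels on the slide, but the exact layout, the use of SHA-256 for the checksum, and the example identifiers are assumptions, not the PID Kernel profile.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class PidRecord:
    """Hypothetical PID record; fields mirror the slide's labels."""
    pid: str
    checksum: str                 # CKSM; assumed here to be a hex SHA-256 digest
    paths: List[str]              # locations of the data copies
    metadata_pid: str             # PID of the metadata object
    rights: str = ""
    relations: Dict[str, str] = field(default_factory=dict)  # relation name -> PID
    provenance_pid: str = ""


def verify(record: PidRecord, data: bytes) -> bool:
    """Check retrieved bytes against the checksum stored in the record."""
    return hashlib.sha256(data).hexdigest() == record.checksum


payload = b"example bitstream"
record = PidRecord(
    pid="example/0001",                      # illustrative identifiers only
    checksum=hashlib.sha256(payload).hexdigest(),
    paths=["https://repo-a.example/obj/0001", "https://repo-b.example/obj/0001"],
    metadata_pid="example/0001-md",
    relations={"isPartOf": "example/collection-1"},
)
```

A client that resolves the PID can fetch the data from any listed path and call `verify` to confirm it received an intact copy.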
17
PID Kernel Update
Work started in Denver at P8
Working groups met over the last 6 months
Draft profile created
PID Kernel Working Group Case Statement submitted
Work completes at P11
18
Global Digital Object Cloud (GDOC)
[Figure: Identifier Services and Repo/Registry services beneath persistently identified objects (object:publication, object:dataset, object:collection), with IDs such as 843…, 987/…, 123…, 876…, XZY…, HGY…]
End users, developers, and automated processes deal with persistently identified, virtually aggregated digital objects, including collections, which are overlays on multiple network services, which in turn are overlays on existing or future information storage systems.
19
GDOC – Is it Real?
Storage – not our problem, but
  Latency is an issue
  Changing interfaces can be a problem
Services
  Identifier
    Common resolution systems
    PID Kernel, Profiles
  Repo/Registry
    Common APIs
    Confusion: Repository not equal to Storage
    Confusion: Registry is a Repository of metadata objects
Object Level
  Common Object Interface must be provided by Repo/Registry
  Collections ARE Objects
Clients
  Good News / Bad News – web browser remains universal client
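The point that collections are objects served through a common object interface can be sketched as follows. The interface, its `describe` method and the example identifiers are hypothetical; they only illustrate a client treating datasets and collections uniformly.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Protocol


class DigitalObject(Protocol):
    """Hypothetical common object interface offered by a repo/registry."""
    pid: str
    object_type: str

    def describe(self) -> Dict[str, str]: ...


@dataclass
class Dataset:
    pid: str
    title: str
    object_type: str = "dataset"

    def describe(self) -> Dict[str, str]:
        return {"pid": self.pid, "type": self.object_type, "title": self.title}


@dataclass
class Collection:
    """A collection is itself an object: it has a PID and the same interface."""
    pid: str
    members: List[str] = field(default_factory=list)  # PIDs of member objects
    object_type: str = "collection"

    def describe(self) -> Dict[str, str]:
        return {"pid": self.pid, "type": self.object_type, "size": str(len(self.members))}


def summarise(objects: List[DigitalObject]) -> List[str]:
    """Client code needs no special case for collections."""
    return [f'{o.describe()["type"]}:{o.pid}' for o in objects]


summary = summarise([Dataset("example/1", "Survey data"),
                     Collection("example/2", ["example/1"])])
```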
20
GDOC – Is it Real?
CONCLUSION: Evolution needed & inevitable; RDA can help drive it
DFIG, Brokering, PID Kernel, Collections, DTR, ….
21
Gaps/Opportunities
Further progress on Machine-Actionable Registries
  DFT for vocabulary - needs population and use
  Have DTR for data types - needs testing and iteration
  R3Data for Repositories - need a machine-actionable equivalent
  Metadata Catalog - machine-actionable catalog is a pending RDA WG
  Not sure if anyone is working on Authorization and License registries
Governance and Operating Procedures
  Need for this will become critical as soon as test beds and functional ecosystems are available
PIDs
  Linked Open Data community needs
  Recommendations for workflow vs publication