
Presentation transcript: Data Fabric IG Introduction

1 Data Fabric IG Introduction

2 Observations I from Recent Overview
- about 50 interviews & about 75 community interactions
- Data Management (DM) and Data Processing (DP) are too time-consuming and costly due to heterogeneous data organization.
- Federating data, including logical-layer information (tracing provenance, understanding creation context, checking identity and integrity, etc.), is too costly.
- DM and DP are not ready for Big Data because automated procedures that incorporate proper data organization mechanisms are rarely used.

3 Observations II from Recent Overview
- Because software that supports proper data organization is lacking, we continue to create legacy data.
- Example: a key biologist spends 75% of his time on data management, a waste of money and human capital.
- To a large extent, results are not reproducible.
- Senior domain people agree:
  - data organization and procedures need to change
  - but the path is risky and data professionals are lacking
  - people hesitate because they lack clear perspectives

4 Data Fabric Sketch
A very rough sketch of the data production and processing machinery of data-driven science.
One big question for RDA: how can we maximally support this machinery, unburden researchers, make science reproducible, etc.?

5 “Data Fabric” based on Recent Overview
Diagram labels: data one can work with; organized data; sharable & re-usable data; often all in file system; often a lot of copying in file system

6 One concrete conclusion
- Current practice: a data collection comes with its own data organization, management and access solutions.
- Future: there is no need for this heterogeneity, since data objects (DOs) can, to a certain extent, be treated independently of their content.

7 Until now in RDA
- started with a number of WG activities in P1
  - DTR, PP, PIT, MD, DFT – the old ones
  - some people found these topics urgent and interesting
  - almost at the end, with some open questions: what now, how does it all fit into the landscape, etc.?
  - they started this DF brainstorming
- even more groups started
- are there general themes in the data landscape?
  - the whole issue of data publishing, citation, etc.
  - the whole issue of scientific culture and legal & ethical aspects
  - our daily data work in the departments – the Data Fabric
  - maybe more

8 A few Questions arise
- What is the scope of RDA’s Data Fabric?
- What are the characteristics of RDA’s Data Fabric? (The term is already used in industry: efficient computational machine.)
- What are the components of RDA’s Data Fabric?
- What should the DF IG do within RDA, and what not?

9 Scope and Characteristics of RDA’s DF
- DF is about
  - making departments’ data science reproducible
  - creating the conditions for trust in the anonymous data domain
  - identifying mechanisms, components and interfaces that make data science efficient and cost-effective
  - discussing cross-disciplinary approaches
  - defining a framework that allows new components or component variants to be included in a flexible way
- Example:
  - DF will state the necessity of worldwide available machinery to register & resolve DOs; we will say something about registered attribute types and specify an API
  - but we will not say how to implement and use such a system
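To make the "register & resolve DOs" example concrete, here is a minimal sketch of what such an interface could look like. It is purely illustrative and not part of the presentation: the class names, the identifier prefix, and the three attribute types are all assumptions, and the in-memory dictionary stands in for whatever worldwide machinery would actually be used.

```python
import hashlib
import uuid
from dataclasses import dataclass, field


@dataclass
class DataObjectRecord:
    """Hypothetical record kept for each registered data object (DO)."""
    pid: str                 # persistent identifier assigned at registration
    location: str            # where the DO's bit sequence can be fetched
    checksum: str            # SHA-256 of the content, for integrity checks
    attributes: dict = field(default_factory=dict)


class InMemoryRegistry:
    """Toy stand-in for a register/resolve service (assumption, not RDA's API)."""

    # Assumed set of registered attribute types; a real system would look
    # these up in a type registry rather than hard-code them.
    REGISTERED_ATTRIBUTE_TYPES = {"creator", "created", "media_type"}

    def __init__(self):
        self._records = {}

    def register(self, content: bytes, location: str, **attributes) -> str:
        """Register a DO and return its newly minted identifier."""
        unknown = set(attributes) - self.REGISTERED_ATTRIBUTE_TYPES
        if unknown:
            raise ValueError(f"unregistered attribute types: {sorted(unknown)}")
        pid = f"hypothetical-prefix/{uuid.uuid4()}"
        checksum = hashlib.sha256(content).hexdigest()
        self._records[pid] = DataObjectRecord(pid, location, checksum, dict(attributes))
        return pid

    def resolve(self, pid: str) -> DataObjectRecord:
        """Return the record for an identifier, or raise KeyError if unknown."""
        return self._records[pid]

    def verify(self, pid: str, content: bytes) -> bool:
        """Check that the given content matches the registered checksum."""
        return self.resolve(pid).checksum == hashlib.sha256(content).hexdigest()


if __name__ == "__main__":
    registry = InMemoryRegistry()
    pid = registry.register(b"sample data", "file:///data/sample.csv",
                            media_type="text/csv")
    print(registry.resolve(pid).location)
    print(registry.verify(pid, b"sample data"))
```

The sketch only fixes the interface (register, resolve, typed attributes) and leaves the implementation open, which matches the slide's point that the DF would specify an API but not how to build or use the system behind it.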

10 Scope and Characteristics of RDA’s DF
- DF is NOT about
  - prescribing an overarching architecture we need to follow
  - specifying an implementation of such an architecture
  - discussing specific technologies and tools
  - more than discussing the processing machinery (not publication, citation, legal & ethical aspects, etc.)
- DF is about highly automated procedures, or at least guidance to follow such procedures.

11 Components of RDA’s DF (just first ideas)
- domain of registered data objects (DOs), incl. basic organization principles (data, code, knowledge)
- domain of registered actors (ORCID, etc.)
- domain of trusted repositories for DOs
- accepted policy principles (proper organizing mechanisms, self-documenting, certified, etc.)
- set of trusted registries (types & concepts, metadata and provenance schemas, metadata instances, repositories, PIDs, policies, etc.)
- what about semantics? So important!
- much is already out there; we need to see how it can all fit together and how we can foster software development

12 DF IG way of acting
- DF IG must be an inclusive open platform for interaction
- DF IG needs to place the various WGs/IGs on the landscape
- DF IG needs to identify barriers across groups
- DF IG can work as an umbrella to maintain WG results
- open position papers will summarize the state of discussions and provoke convergence debates
- it will NOT take the Council’s or TAB’s role

13 Task of today: DF IG BoF
1. What are DF’s scope and characteristics?
2. What are DF’s components, interfaces, mechanisms?
3. How should DF act?
4. Who will chair the DF IG?

