
Data Fabric IG Introduction

Slide 2: Observations I from Recent Overview
- About 50 interviews and about 75 community interactions.
- Data management (DM) and data processing (DP) are too time-consuming and costly due to organizational heterogeneity.
- Federating data, including logical-layer information (tracing provenance, understanding creation context, checking identity and integrity, etc.), is too costly.
- DM and DP are not ready for Big Data, because automated procedures incorporating proper data organization mechanisms are rarely used.

Slide 3: Observations II from Recent Overview
- Due to a lack of software supporting proper data organization, we continue to create legacy data.
- Example: a key biologist spends 75% of his time on data management, a waste of money and human capital.
- To a large extent, results are not reproducible.
- Senior domain people agree:
  - data organization and procedures need to change
  - but the path is risky and data professionals are lacking
  - people hesitate since they lack a clear perspective

Slide 4: Data Fabric Sketch
A very rough sketch of the data production and processing machinery of data-driven science.
One big question for RDA: how can we maximally support this machinery, unload researchers, make science reproducible, etc.?

Slide 5: “Data Fabric” based on Recent Overview
[Diagram labels: data one can work with; organized data; sharable & re-usable data; often all in file system; often a lot of copying in file system]

Slide 6: One concrete conclusion
- Current practice: a data collection comes with its own data organization, management and access solutions.
- Future: there is no need for this heterogeneity, since DOs can be treated independently of their content to a certain extent.
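As a rough illustration of what content-independent treatment could mean in practice, the sketch below applies one integrity check to data objects from unrelated disciplines. The `DataObject` class, its fields, the `example/...` identifiers and the helper names are all hypothetical; nothing here is taken from an RDA specification.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class DataObject:
    """Hypothetical DO: an opaque bitstream plus a recorded checksum."""
    pid: str        # persistent identifier (made-up format, for illustration)
    content: bytes  # payload; management code never interprets it
    sha256: str     # checksum recorded when the object was created


def make_do(pid: str, content: bytes) -> DataObject:
    """Create a DO and record the SHA-256 checksum of its payload."""
    return DataObject(pid, content, hashlib.sha256(content).hexdigest())


def verify_integrity(do: DataObject) -> bool:
    """Same check for every DO, whatever discipline the content comes from."""
    return hashlib.sha256(do.content).hexdigest() == do.sha256


# Two objects with unrelated content, handled by identical code.
sequence = make_do("example/0001", b"ACGTACGT")
recording = make_do("example/0002", bytes([0, 1, 2, 3]))

# A copy whose payload changed after the checksum was recorded.
corrupted = DataObject(sequence.pid, b"ACGTACGA", sequence.sha256)
```

The point of the sketch is only that management operations such as integrity checking need the bitstream and its recorded metadata, not knowledge of the scientific content.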

Slide 7: Until now in RDA
- Started with a number of WG activities in P1.
- DTR, PP, PIT, MD, DFT: the old ones.
- Some people found these urgent and interesting topics.
- Almost at the end, and some questions: what now, how does it all fit into the landscape, etc.?
- They started this DF brainstorming.
- Even more groups started.
- Are there general themes in the data landscape?
  - the whole issue of data publishing/citation/etc.
  - the whole issue of scientific culture/legal & ethical aspects
  - our daily data work in the departments: the Data Fabric
  - maybe more

Slide 8: A few Questions arise
- What is the scope of RDA’s Data Fabric?
- What are the characteristics of RDA’s Data Fabric? (The term is already used in industry: efficient computational machine.)
- What are the components of RDA’s Data Fabric?
- What should the DF IG do within RDA, and what not?

Slide 9: Scope and Characteristics of RDA’s DF
- DF is about:
  - making departments’ data science reproducible
  - creating the conditions for trust in the anonymous data domain
  - identifying mechanisms, components and interfaces that make data science efficient and cost-effective
  - discussing cross-disciplinary approaches
  - defining a framework that allows new components or component variants to be included in a flexible way
- Example:
  - DF will state the necessity of a worldwide available machinery to register & resolve DOs; we will say something about registered attribute types and specify an API
  - but we will not say how to implement and use such a system
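The register-and-resolve example above can be pictured with a small in-memory sketch. Everything in it is an assumption made for illustration: the `DORegistry` class, its `register` and `resolve` methods, the attribute names and the `example/` identifier prefix are hypothetical, and a real service would be distributed and persistent. In line with the slide, it shows only the shape of an interface, not how such a system should be implemented.

```python
import uuid
from typing import Any, Dict


class DORegistry:
    """Hypothetical in-memory stand-in for a DO registration/resolution service."""

    def __init__(self, attribute_types: Dict[str, type]) -> None:
        # Registered attribute types: attribute name -> expected Python type.
        self._attribute_types = dict(attribute_types)
        self._records: Dict[str, Dict[str, Any]] = {}

    def register(self, attributes: Dict[str, Any]) -> str:
        """Validate attributes against the registered types and mint an identifier."""
        for name, value in attributes.items():
            expected = self._attribute_types.get(name)
            if expected is None:
                raise KeyError(f"unregistered attribute type: {name}")
            if not isinstance(value, expected):
                raise TypeError(f"attribute {name!r} must be {expected.__name__}")
        pid = f"example/{uuid.uuid4()}"  # made-up identifier scheme
        self._records[pid] = dict(attributes)
        return pid

    def resolve(self, pid: str) -> Dict[str, Any]:
        """Return a copy of the attribute record for an identifier."""
        return dict(self._records[pid])


# Usage: declare attribute types once, then register and resolve a DO.
registry = DORegistry({"location": str, "checksum": str, "size_bytes": int})
pid = registry.register(
    {"location": "https://repo.example.org/objects/1", "size_bytes": 8}
)
record = registry.resolve(pid)
```

Validating against a set of registered attribute types is what lets clients interpret a resolved record without knowing which repository created it.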

Slide 10: Scope and Characteristics of RDA’s DF
- DF is NOT about:
  - prescribing an overarching architecture we need to follow
  - specifying an implementation of such an architecture
  - discussing specific technologies and tools
  - more than discussing the processing machinery (not publication, citation, legal & ethical aspects, etc.)
- DF is about highly automated procedures, or at least guidance to follow such procedures.

Slide 11: Components of RDA’s DF (just first ideas)
- domain of registered data objects (DO), incl. basic organization principles (data, code, knowledge)
- domain of registered actors (ORCID, etc.)
- domain of trusted repositories for DOs
- accepted policy principles (proper organizing mechanisms, self-documenting, certified, etc.)
- set of trusted registries (types & concepts, metadata and provenance schemas, metadata instances, repositories, PIDs, policies, etc.)
- what about semantics? So important!
- much is already out there; we need to see how this can all fit together and how we can foster software development

Slide 12: DF IG way of acting
- DF IG must be an inclusive, open platform for interaction.
- DF IG needs to place the various WGs/IGs on the landscape.
- DF IG needs to identify barriers across groups.
- DF IG can work as an umbrella to maintain WG results.
- Open position papers will summarize the state of discussions and provoke convergence debates.
- It will NOT take Council’s or TAB’s role.

Slide 13: Task of today: DF IG BoF
1. What are DF’s scope and characteristics?
2. What are DF’s components, interfaces, mechanisms?
3. How should DF act?
4. Who will chair DF IG?