RDA Terminology: Data Management and Data Fabric. Prepared for RDA 6th Plenary, Paris, Sept. 23, 2015. Gary Berg-Cross, Co-Chair DFT IG, Co-organizing Chair for DF IG.

RDA Terminology: Data Management and Data Fabric
Prepared for RDA 6th Plenary, Paris, Sept. 23, 2015
Gary Berg-Cross, Co-Chair DFT IG, Co-organizing Chair for DF IG
DFT Goal: Describe a basic, abstract (but clear) data organization model that systemizes the already large body of definition work on data management terms, especially as involved in RDA’s efforts.
Terminology Issue: What do we expect from RDA?
Adopt an existing language or build our own?
Spend years on terminology debates?
Build our own language stepwise?
Other, such as cooperating with other efforts?

Topics: RDA DFT is about clarifying and labeling concepts, and about a terminology strategy
Franco Zoppi: “The document seems to suffer from a problem in the used terminology. Terms are sometimes unclear (in many cases definitions would help) or even wrong or misused. I guess that most of these problems could be avoided with a correct use of Computer Science/ICT well established and consolidated terminology. This is particularly evident in Sections 2.2, 2.3 and 2.6.”
Broadening the discussion beyond a core to wider data management
Including suggested concepts with candidate terminology
Current strategy is to:
Clarify and update existing terms (e.g., Digital Objects need IDs, but which IDs, and how do they fit into data management? etc.)
Improve supporting models with conceptual relations (a big job)
Provide practical guidance (technical and policy views)

Broadening the Discussion (Stepwise or Scope-wise)
Digital Object Management (registered digital data)
Digital Data Management, including unregistered data (a broader concept)
Data Management (and use), broader still
Where do datasets fit?

Integrating Concepts: Policy-based Digital Data Management Concept Graph (Reagan Moore)
Based on practical principles, a policy defines when in a workflow a PID is created, as well as when other curation activities take place. These definitions are linked.
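
To make the idea of a policy deciding when a PID is created more concrete, here is a minimal Python sketch. It assumes a hypothetical policy table that maps workflow events to an ordered list of curation actions, and it borrows the clinical-trial case from the practical-guidance discussion: routine updates are only written to an audit trail, while a "freeze" event triggers a checksum and PID minting. The `PolicyEngine` class, the event names, and the `example/NNNN` PID format are illustrative assumptions, not part of any RDA recommendation or PID-service API.

```python
import hashlib
import itertools
from typing import Callable, Dict, List

# Each curation action receives the workflow event name and the object record.
Action = Callable[[str, dict], None]


class PolicyEngine:
    """Hypothetical policy engine: workflow events trigger curation actions."""

    def __init__(self, prefix: str) -> None:
        self.prefix = prefix                  # illustrative PID prefix, not a registered one
        self._serial = itertools.count(1)     # local counter used to mint PID suffixes
        self.audit_trail: List[str] = []
        # The policy itself: which actions run, in order, for each workflow event.
        self.policies: Dict[str, List[Action]] = {
            "update": [self.log_action],
            "freeze": [self.log_action, self.add_checksum, self.mint_pid],
        }

    def log_action(self, event: str, obj: dict) -> None:
        self.audit_trail.append(f"{event}:{obj['name']}")

    def add_checksum(self, event: str, obj: dict) -> None:
        obj["checksum"] = hashlib.sha256(obj["content"]).hexdigest()

    def mint_pid(self, event: str, obj: dict) -> None:
        obj["pid"] = f"{self.prefix}/{next(self._serial):04d}"

    def handle(self, event: str, obj: dict) -> dict:
        """Run every action the policy attaches to `event`; unknown events do nothing."""
        for action in self.policies.get(event, []):
            action(event, obj)
        return obj


engine = PolicyEngine(prefix="example")
trial_db = {"name": "trial-db", "content": b"visit data v1"}
engine.handle("update", trial_db)   # live database: audit trail only, no PID yet
engine.handle("freeze", trial_db)   # cleaned and frozen: checksum, then PID
print(trial_db["pid"])              # example/0001
```

Keeping the policy as data (a dict of event to actions) rather than as code paths means a community could change when PIDs are minted without touching the actions themselves.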

Including suggested concepts with candidate terminology: Examples
1. Data practice is the actual application/use of ideas & methods (as opposed to theories) about how data are collected, created, stored (maintained), curated, used, shared and released (disseminated).
2. Data principles are rules that provide guidance across data management and use for such things as: data acquisition, data lifecycle control, data policy & ownership, metadata practices, data quality, etc.
3. Common data solutions are agreed-upon, easily available, tested & approved approaches to widely occurring problems in data management and use.
4. Data discovery is a process of query and/or search to find (research) data of interest.
5. Database cracking features incremental partial indexing and/or sorting of the data. It combines features of automatic index selection and partial indexes. It reorganizes data within the query operators, integrating the reorganization effort (occasionally invoking creation or removal of indexes on tables and views based on use) into query execution. It shifts the cost of index maintenance from updates to query processing.
6. Adaptive indexing is characterized by the partial creation and refinement of preliminary or fixed DB indexes as side effects to support efficient query execution. (after
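
Items 5 and 6 describe an index that is built as a side effect of querying. A minimal Python sketch of database cracking, assuming a single in-memory integer column and half-open range queries `[low, high)`: each query partitions only the piece of the column that contains its bounds, and the resulting pivot positions are kept in a small cracker index, so later queries touch less data. The class and method names are hypothetical, and production systems also handle updates, concurrency and many columns, which this toy ignores.

```python
import bisect
from typing import Iterable, List


class CrackedColumn:
    """Toy database cracking on one in-memory column (hypothetical names)."""

    def __init__(self, values: Iterable[int]) -> None:
        self.data: List[int] = list(values)   # physically reorganized by queries
        # Cracker index: for each i, every value in data[:positions[i]] is
        # < pivots[i], and every value in data[positions[i]:] is >= pivots[i].
        self.pivots: List[int] = []
        self.positions: List[int] = []

    def _crack(self, pivot: int) -> int:
        """Partition only the piece containing `pivot`; return the split position."""
        i = bisect.bisect_left(self.pivots, pivot)
        if i < len(self.pivots) and self.pivots[i] == pivot:
            return self.positions[i]          # already cracked on this value
        lo = self.positions[i - 1] if i > 0 else 0
        hi = self.positions[i] if i < len(self.positions) else len(self.data)
        split = lo
        for k in range(lo, hi):               # in-place two-way partition of data[lo:hi]
            if self.data[k] < pivot:
                self.data[split], self.data[k] = self.data[k], self.data[split]
                split += 1
        self.pivots.insert(i, pivot)
        self.positions.insert(i, split)
        return split

    def range_query(self, low: int, high: int) -> List[int]:
        """Values v with low <= v < high; cracking the column is a side effect."""
        start = self._crack(low)
        end = self._crack(high)
        return self.data[start:end]


col = CrackedColumn([13, 16, 4, 9, 2, 12, 7, 1, 19, 3, 14, 11, 8, 6])
print(sorted(col.range_query(5, 12)))  # [6, 7, 8, 9, 11]
print(col.pivots)                      # [5, 12]
```

No full sort is ever performed: each query pays only for partitioning the pieces it touches, which is how the cost of index maintenance moves into query processing.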

Clarifying Concepts: we discussed other organizing model ideas
Digital Object (aka Digital Entity): A digital object is composed of a structured sequence of bits/bytes. As an object, it is named. This bit sequence can be identified and accessed by a unique and persistent identifier or by use of referencing attributes describing its properties.
Note the Digital Entity definition from the ITU X.1255 standard: “machine-independent data structure consisting of one or more elements in digital form that can be parsed by different information systems; the structure helps to enable interoperability among diverse information systems in the Internet.”
Link data management principles to the actual workflow of generating data.
Data Management Workflow Structured Object: includes provenance, versioning, and output metadata (MD) (from PP)
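
The Digital Object notion above can be sketched in Python, assuming a simple in-memory registry. Each object pairs a bitstream with a PID and descriptive attributes, and it can be reached either by resolving the PID or by matching attributes. `DigitalObject`, `Registry`, and the `example/...` PID strings are hypothetical. The SHA-256 checksum is added as an assumption: it is one common way to tie an identifier to exact bit content.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class DigitalObject:
    pid: str                                    # unique, persistent identifier (illustrative format)
    bitstream: bytes                            # the structured sequence of bits/bytes
    attributes: Dict[str, Any] = field(default_factory=dict)  # descriptive properties

    @property
    def checksum(self) -> str:
        """Fingerprint of the exact bit sequence the PID refers to."""
        return hashlib.sha256(self.bitstream).hexdigest()


class Registry:
    """Hypothetical registry: access by identifier or by referencing attributes."""

    def __init__(self) -> None:
        self._by_pid: Dict[str, DigitalObject] = {}

    def register(self, obj: DigitalObject) -> None:
        if obj.pid in self._by_pid:
            raise ValueError(f"PID already registered: {obj.pid}")
        self._by_pid[obj.pid] = obj

    def resolve(self, pid: str) -> DigitalObject:
        return self._by_pid[pid]

    def find(self, **props: Any) -> List[DigitalObject]:
        """All objects whose attributes match every given property."""
        return [
            obj for obj in self._by_pid.values()
            if all(obj.attributes.get(k) == v for k, v in props.items())
        ]


obj = DigitalObject("example/do-42", b"col_a,col_b\n1,2\n",
                    {"format": "text/csv", "creator": "lab-1"})
registry = Registry()
registry.register(obj)
print([o.pid for o in registry.find(format="text/csv")])  # ['example/do-42']
```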

Clarifying and updating existing terms: adding practicality
Comments on the DF White Paper include challenges to the idea that the internal/external property distinction is useful for DOs. Internal properties are those that make up an internal structure and allow one to interpret the content of a DO.
The statement “we need to distinguish the external characteristics from the internal characteristics to ensure that we really can separate common data management tasks from discipline-specific heterogeneity.” seems inappropriate: many properties considered external for data management vary by discipline too (e.g., search by sample type or Dx).
I think that assigning PIDs to single data items is unfeasible. Therefore you need search and query capabilities to find the required data contained in datasets/databases identified by PIDs.
Example properties: ID, creation date, ... Sample type, UoM, Obs. precision, Patient age, Symptom, Dx (diagnosis), ...
Common management for these external properties? Part is for identification, but part is for discoverability.
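
The commenter's point, that PIDs identify whole datasets while individual data are located by query, can be sketched in Python as follows. The dataset PIDs and the property names (`sample_type`, `dx`, `patient_age`) are hypothetical stand-ins for the discipline-specific properties listed above.

```python
from typing import Any, Dict, List, Tuple

# Hypothetical datasets keyed by PID; the records inside get no PID of their own.
Dataset = List[Dict[str, Any]]


def query_within(datasets: Dict[str, Dataset], **criteria: Any) -> List[Tuple[str, int]]:
    """Return (dataset PID, record position) for each record matching all criteria."""
    hits: List[Tuple[str, int]] = []
    for pid in sorted(datasets):
        for position, record in enumerate(datasets[pid]):
            if all(record.get(key) == value for key, value in criteria.items()):
                hits.append((pid, position))
    return hits


datasets = {
    "example/ds-1": [
        {"sample_type": "blood", "dx": "asthma", "patient_age": 34},
        {"sample_type": "serum", "dx": "influenza", "patient_age": 51},
    ],
    "example/ds-2": [
        {"sample_type": "serum", "dx": "asthma", "patient_age": 12},
    ],
}

print(query_within(datasets, sample_type="serum"))
# [('example/ds-1', 1), ('example/ds-2', 0)]
```

A (PID, position) pair is only a stable reference while the dataset does not change, which is exactly the citation problem raised for dynamic data.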

Improving conceptual relations: Concept map overview of Core Terms
How is some part of a database or dataset to be identified/cited?
How should data stored in a repository that has complex internal structure and is subject to change be identified/cited?
We will need smarter resolvers that offer additional services beyond getting from an identifier to an object location.
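
One way to picture a "smarter resolver" is the minimal Python sketch below. It assumes the resolver keeps timestamped snapshots of a changing dataset. Besides mapping a PID to a location, it can return a row range of the version that was current at a given time, which is one possible answer to citing part of a changing dataset. The `Resolver` class, the integer timestamps, and the URL are illustrative assumptions; the sketch does not describe any existing resolver service.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Row = str


@dataclass
class Resolver:
    """Hypothetical resolver that adds versioned subset access to plain PID lookup."""
    locations: Dict[str, str] = field(default_factory=dict)
    versions: Dict[str, List[Tuple[int, List[Row]]]] = field(default_factory=dict)

    def register(self, pid: str, location: str) -> None:
        self.locations[pid] = location
        self.versions.setdefault(pid, [])

    def add_version(self, pid: str, timestamp: int, rows: List[Row]) -> None:
        # Store a snapshot copy so later edits to the live data do not alter it.
        self.versions[pid].append((timestamp, list(rows)))

    def resolve(self, pid: str) -> str:
        """Classic resolution: identifier -> location."""
        return self.locations[pid]

    def resolve_subset(self, pid: str, as_of: int, start: int, stop: int) -> List[Row]:
        """Added service: rows[start:stop] of the latest version at or before `as_of`."""
        earlier = [(ts, rows) for ts, rows in self.versions[pid] if ts <= as_of]
        if not earlier:
            raise LookupError(f"no version of {pid} at or before {as_of}")
        _, rows = max(earlier, key=lambda version: version[0])
        return rows[start:stop]


resolver = Resolver()
resolver.register("example/db-7", "https://repo.example.org/db-7")
resolver.add_version("example/db-7", 1, ["r1", "r2"])
resolver.add_version("example/db-7", 5, ["r1", "r2", "r3", "r4"])
print(resolver.resolve_subset("example/db-7", as_of=3, start=0, stop=10))  # ['r1', 'r2']
```

Storing full snapshots keeps the sketch short. A real service would more likely store changes plus timestamps, but the citation triple (PID, time, selection) stays the same.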

Providing Practical Guidance (Tech, Policy & Strategy)
When should a PID be assigned to be useful with dynamic data?
If you build up a clinical trial database you will continuously add and change data. No PID is necessary there, because the audit trail stores all actions. A PID should be assigned, for example, when the database is cleaned and frozen, which is a definite working step in the workflow of clinical trials. (Christian Ohmann, Wolfgang Kuchinke, Steve Canham)
PIDs should be assigned at the level of granularity (data sets) appropriate for the functional use that is envisaged. (Costantino Thanos)
Responses:
Scalability is an issue, so the management of objects & identifiers should work through the same mechanisms as much as possible. To enable management of objects beyond a view focusing on single items, adequate mechanisms should, for example, be able to select objects by their most important characteristics or aggregate them at multiple levels of granularity, and provide basic CRUD operations on such object collections. (Tobias Weigel, Michael Lautenschlager)
For added-value services, registries at the resolvers’ level are also needed and should be maintained by recognized international organizations.
Publishers will rely on the DOI system because there has been major investment in it.
What highly available and scalable PID system is feasible? We should develop a strategy that builds upon what exists and upon what can be done for cases where currently no PID is used. Etc.
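
The collection-level management asked for in the responses (select objects by their most important characteristics, aggregate them at a chosen level of granularity, and apply CRUD operations to whole collections) might look like this minimal Python sketch. The `ObjectCollection` class, its method names, and the property keys are hypothetical; a scalable system would sit on a database or PID service rather than a Python dict.

```python
from collections import defaultdict
from typing import Any, Dict, List


class ObjectCollection:
    """Hypothetical in-memory collection of object records keyed by PID."""

    def __init__(self) -> None:
        self._objects: Dict[str, Dict[str, Any]] = {}

    # Basic CRUD on single objects.
    def create(self, pid: str, **props: Any) -> None:
        if pid in self._objects:
            raise KeyError(f"duplicate PID: {pid}")
        self._objects[pid] = dict(props)

    def read(self, pid: str) -> Dict[str, Any]:
        return dict(self._objects[pid])        # copy, so callers cannot mutate the store

    def update(self, pid: str, **changes: Any) -> None:
        self._objects[pid].update(changes)

    def delete(self, pid: str) -> None:
        del self._objects[pid]

    # Collection-level operations built on the same mechanisms.
    def select(self, **criteria: Any) -> List[str]:
        """PIDs of objects whose properties match all criteria."""
        return sorted(
            pid for pid, props in self._objects.items()
            if all(props.get(k) == v for k, v in criteria.items())
        )

    def aggregate(self, key: str) -> Dict[Any, List[str]]:
        """Group PIDs by the value of one property (one level of granularity)."""
        groups: Dict[Any, List[str]] = defaultdict(list)
        for pid in sorted(self._objects):
            groups[self._objects[pid].get(key)].append(pid)
        return dict(groups)

    def update_where(self, criteria: Dict[str, Any], **changes: Any) -> int:
        """Apply the same update to every selected object; return how many changed."""
        selected = self.select(**criteria)
        for pid in selected:
            self.update(pid, **changes)
        return len(selected)

    def delete_where(self, criteria: Dict[str, Any]) -> int:
        """Delete every selected object; return how many were removed."""
        selected = self.select(**criteria)
        for pid in selected:
            self.delete(pid)
        return len(selected)


objects = ObjectCollection()
objects.create("example/1", project="A", kind="raw")
objects.create("example/2", project="A", kind="derived")
objects.create("example/3", project="B", kind="raw")
print(objects.aggregate("project"))  # {'A': ['example/1', 'example/2'], 'B': ['example/3']}
print(objects.update_where({"project": "A"}, status="frozen"))  # 2
```

The bulk operations reuse `select` and the single-object CRUD methods, in the spirit of managing objects and collections "through the same mechanisms as much as possible".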