Datatypes Characterizing data

Presentation transcript:

Datatypes: Characterizing data
Lannom (largely based on P10 DTR Breakout)
January 2018

DTR v1 & v2
Q1: Standards / best practices for precise typing of data, plus a distributed registry system for the same
Q2: v1 output approved and accepted as an EC ICT Tech Spec; multiple prototypes in operation; v2 approved to work on 'recipes'; ISO/IEC takes this up as a standards activity
Q3: Ontology rabbit hole looms; need some standards-like org to establish reliable base types
Q4: TBD, but a handover to ISO/IEC seems likely
Q5: Hope to finish at P11: final document, pointers to running prototypes, pointer to ISO/IEC standards activity
Q6: Closely connected to PID groups, Data Fabric, Collections
Q7: Data sharing and re-use requires deep and precise understanding of data, which is enabled by typing.
Corporation for National Research Initiatives

Datatype
[Diagram: a Dataset (for example, the bit sequence 1 0 1 0 0 0 1 0 1 0 1 1 0 1 0 1) references, with the help of metadata, a Datatype identifier (ID: 123). The ID resolves to a Datatype Record. The Datatype Record characterizes the Dataset and is used (for example) to process it.]
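The relationships on this slide (a dataset references a datatype ID, the ID resolves to a datatype record, and the record is used to process the data) can be sketched roughly as follows. This is an illustration only, not the DTR API: the in-memory `REGISTRY`, the `bits_per_value` field, and the reading of the 16 bits as two unsigned 8-bit integers are all assumptions made for the example.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class DatatypeRecord:
    """Hypothetical, heavily simplified datatype record."""
    identifier: str
    name: str
    bits_per_value: int  # assumed field: width of one unsigned integer


# Hypothetical stand-in for a registry lookup service.
REGISTRY = {
    "123": DatatypeRecord("123", "unsigned-8-bit-sequence", bits_per_value=8),
}


def resolve(datatype_id: str) -> DatatypeRecord:
    """Resolve a datatype ID to its record (here: a dictionary lookup)."""
    return REGISTRY[datatype_id]


def process(bits: str, record: DatatypeRecord) -> list[int]:
    """Decode a bit string into integers, as described by the record."""
    width = record.bits_per_value
    if len(bits) % width != 0:
        raise ValueError("payload length does not match the datatype")
    return [int(bits[i:i + width], 2) for i in range(0, len(bits), width)]


# The dataset's metadata carries the datatype ID; the payload is the raw bits.
dataset = {"datatype_id": "123", "payload": "1010001010110101"}
record = resolve(dataset["datatype_id"])
values = process(dataset["payload"], record)
```

The point of the sketch is that the consumer needs no prior knowledge of the payload layout; everything it needs is reached through the ID.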

Datatype Record
If the goal is primarily to help humans, then the Datatype Record could contain self-contained (less precise) documentation. That is, the record could consist mostly of words, with a few references to other datatypes where applicable.

Datatype Record
If the goal is also to enable software to process data, then the Datatype Record should contain a (more) precise definition. That is, it should be a detailed record with references to other datatypes for composition, normalization, and reuse.
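One way to picture a software-oriented record that references other datatypes for composition is the sketch below. All identifiers (`ex/reading`, `ex/celsius`, and so on) and the `components` field are hypothetical; an actual registry would define its own schema.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class DatatypeRecord:
    """Hypothetical record that may be composed of other datatypes."""
    identifier: str
    description: str
    components: tuple[str, ...] = ()  # IDs of the datatypes it builds on


# Hypothetical registry contents: two base types and two composed types.
REGISTRY = {
    r.identifier: r
    for r in [
        DatatypeRecord("ex/float64", "64-bit floating point number"),
        DatatypeRecord("ex/iso8601", "ISO 8601 timestamp string"),
        DatatypeRecord("ex/celsius", "temperature in degrees Celsius",
                       ("ex/float64",)),
        DatatypeRecord("ex/reading", "timestamped temperature reading",
                       ("ex/iso8601", "ex/celsius")),
    ]
}


def base_types(identifier: str, seen: frozenset[str] = frozenset()) -> list[str]:
    """Expand a datatype into the base types it is ultimately built from."""
    if identifier in seen:
        raise ValueError(f"cyclic datatype definition at {identifier}")
    record = REGISTRY[identifier]
    if not record.components:
        return [identifier]  # a base type has no further references
    expanded: list[str] = []
    for component in record.components:
        expanded.extend(base_types(component, seen | {identifier}))
    return expanded


reading_layout = base_types("ex/reading")
```

Because `ex/celsius` is defined once and referenced, any other composed type can reuse it, which is the reuse and normalization benefit the slide describes.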

Differing scopes: Human use vs. software use

Records for primarily human use and for perhaps internal software use
- Definitions would mostly be at the collection level, e.g., table level, not cell level.
- Required upfront effort is low for registering datatypes.
- No real leverage from pre-existing definitions. That is, there would be little to no reuse.
- Mainly useful for humans, although "wired" software can take advantage of these definitions.

Records for software and external use
- Definitions would be built on top of each other.
- Required upfront effort is high.
- Reuse possibilities are high.
- Software can leverage the definitions for automated processing and actions.
- External agents can potentially process data without additional help.

Finding a sweet spot between the two is a challenge. Previous attempts have quickly put us in the ontology space or in the ISO 11179 space.

Standards Activity
- Output of DTRv1 (2016): one of four RDA outputs approved as EC ICT Tech Spec
  - Flippant summary: we think this is a good idea and lots of people seem to agree
  - Formal summary: confirmation that detailed and precise data typing is a key consideration in data sharing and reuse, and that a federated registry system for such types is highly desirable and needs to accommodate each community's own requirements
- ISO/IEC JTC1 SC32 WG2 (Joint Technical Committee for Info Tech, Metadata group, home of ISO 11179)
  - NIST/ITL played matchmaker
  - We put forward a strawman, a building-blocks model, but for just tabular data. In that proposal, a datatype record can build on other records to compose, extend, or otherwise depend; this proposal is already into the ontology space.
  - The proposal generated a lot of discussion in the ISO WG.
- New WG2 plan
  - Leave ISO 11179 as is
  - Move forward with new metadata groups, with DTR as one focus
- DTRv2: will try to wrap up at P11


PID Kernel Information (KI) Guiding Principles FAQ

Q: What is PID Kernel Information?
A: PID Kernel Information is information stored in the form of attributes within the PID record. PID Kernel Information supports smart programmatic decisions that can be accomplished through inspection of the PID record alone.

Q: How broadly applicable are these guiding principles?
A: These guiding principles are generally applicable to PID resolving systems that satisfy the following: first, a PID resolving service must be able to store and retrieve a small amount of user-defined metadata, and second, there is a globally discoverable service available through which typing information about the extended PID record can be retrieved. The Handle service and Data Type Registry meet these requirements; we expect other systems do as well and seek community input on others.

Q: What is the purpose of the guiding principles?
A: The guiding principles are a guide through which determination of the fitness of information for inclusion in the PID Kernel Information record can be made. The principles apply to PIDs that reference (point to) data objects where a data object has a digital manifestation. The object itself can be a physical object, data, metadata, etc.

Q: To whom is this document important?
A: The PID Kernel Information WG envisions global convergence around a small number of PID Kernel Information profiles. For instance, there could be a profile for IoT, for physical devices that are part of an IoT, and for research data. The principles will guide the definition of these three profiles. The profiles in the example illustrate that more than one profile can be useful for a data object.

PID Kernel Information (KI) Guiding Principles: Work In Progress
- PID Kernel Information cannot be an authoritative source for metadata. Thus PID Kernel Information is always a duplicate of metadata whose authoritative version is elsewhere.
- PID Kernel Information is information stored in the form of attributes within the PID record. It supports smart programmatic decisions that can be accomplished through inspection of the PID record alone.
- PID Kernel Information is stored directly at the local resolving service and not referenced.
- A benefit of PID KI is that it enables a middleware ecosystem of smart, machine-actionable services. E.g., PID KI can be used to make coarse-grained routing/filtering decisions on large lists of PIDs (>1,000,000).
- The contents of a PID KI record are the property of the data object owner or the owner's delegate.
- PID Kernel Information attributes have a slow rate of change.
- Attributes (items) in the profile are expressed as key-value pairs where the values are simple (indivisible).
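As a rough illustration of two of these principles (simple key-value attributes, and coarse-grained filtering by inspecting PID records alone), consider the sketch below. The attribute names (`objectType`, `checksum`) and the PIDs are invented for the example and are not taken from any PID Kernel Information profile.

```python
from __future__ import annotations

# Types treated as "simple (indivisible)" values in this sketch.
SIMPLE_TYPES = (str, int, float, bool)


def is_kernel_info(record: dict) -> bool:
    """Check that a record is a flat mapping of string keys to simple values."""
    return all(
        isinstance(key, str) and isinstance(value, SIMPLE_TYPES)
        for key, value in record.items()
    )


def filter_pids(pid_records: dict[str, dict], key: str, wanted: str) -> list[str]:
    """Select PIDs by inspecting the kernel information only.

    No data object is fetched and no external metadata store is consulted;
    the decision is made from the PID record alone.
    """
    return [
        pid
        for pid, record in pid_records.items()
        if record.get(key) == wanted
    ]


# Hypothetical PID records, as they might be returned by a resolver.
pid_records = {
    "example/0001": {"objectType": "research-data", "checksum": "ab12"},
    "example/0002": {"objectType": "iot-device", "checksum": "cd34"},
    "example/0003": {"objectType": "research-data", "checksum": "ef56"},
}

selected = filter_pids(pid_records, "objectType", "research-data")
```

Keeping the values flat is what makes this kind of pass cheap enough to run over very large lists of PIDs.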


Road Map(s)
Motivation
- What infrastructure is needed?
- How do newcomers figure out where to plug in? Soln: Profiles of groups?
- Where are overlaps in groups?
- What are our general directions for approving groups?
- How do we prioritize resources for outputs?
- What are the current activities?
- Where is RDA going?
- How do we do what RDA does (limited resources, etc.), or what does RDA need to do for/with volunteers? Disagreement on this one… fiscally available is more helpful than 'how'? Clarification: making the most effective use of resources/people at hand.
- What level of support does each of these volunteers need to apply (cat herding, automation, etc.)?
- RDA is quickly growing, diversifying. Our volunteer workforce is not static; constant evolution.
- Have one way of looking at the organization in RDA. At P7 in Tokyo, the cultural bias became very stark. Having an onboarding mechanism by IDW/P12 may be a very good idea.
- Ingrid: Very important to distinguish the business of RDA (organization, running of the governance bodies, etc.) vs. the RDA community and group work.

Road Map(s)
Motivation
- How do newcomers figure out where to plug in?
  - Short, readable profiles?
- Where are overlaps in groups?
  - Help TAB in approval / encouragement
- Where to put resources for output adoption
  - Ambassadors
  - Standards bodies
Related comments
- What infrastructure is needed (separate topic)
- Where is RDA going?
- How do we do what RDA does (limited resources, etc.), or what does RDA need to do for/with volunteers? Disagreement on this one… fiscally available is more helpful than 'how'? Clarification: making the most effective use of resources/people at hand.
- What level of support does each of these volunteers need to apply (cat herding, automation, etc.)?
- RDA is quickly growing, diversifying. Our volunteer workforce is not static; constant evolution.
- Have one way of looking at the organization in RDA. At P7 in Tokyo, the cultural bias became very stark. Having an onboarding mechanism by IDW/P12 may be a very good idea.
- Ingrid: Very important to distinguish the business of RDA (organization, running of the governance bodies, etc.) vs. the RDA community and group work.