RDA Plenary 9 Breakout Session

Slides:



Advertisements
Similar presentations
Contextual Linking Architecture Christophe Blanchi June Corporation for National Research Initiatives Approved for.
Advertisements

Meta Data Larry, Stirling md on data access – data types, domain meta-data discovery Scott, Ohio State – caBIG md driven architecture semantic md Alexander.
OASIS OData Technical Committee. AGENDA Introduction OASIS OData Technical Committee OData Overview Work of the Technical Committee Q&A.
Visual Scripting of XML
Database Systems: Design, Implementation, and Management Tenth Edition
Information Types and Registries Giridhar Manepalli Corporation for National Research Initiatives Strategies for Discovering Online Data BRDI Symposium.
The Language Archive – Max Planck Institute for Psycholinguistics Nijmegen, The Netherlands Metadata Component Framework Possible Standardization Work.
The RDF meta model: a closer look Basic ideas of the RDF Resource instance descriptions in the RDF format Application-specific RDF schemas Limitations.
System Design/Implementation and Support for Build 2 PDS Management Council Face-to-Face Mountain View, CA Nov 30 - Dec 1, 2011 Sean Hardman.
The Data Attribution Abdul Saboor PhD Research Student Model Base Development and Software Quality Assurance Research Group Freie.
DATA FOUNDATION TERMINOLOGY WG 4 th Plenary Update THE PLUM GOALS This model together with the derived terminology can be used Across communities and stakeholders.
WP.5 - DDI-SDMX Integration E.S.S. cross-cutting project on Information Models and Standards Marco Pellegrino, Denis Grofils Eurostat METIS Work Session6-8.
SOFTWARE ENGINEERING BIT-8 APRIL, 16,2008 Introduction to UML.
Metadata and Geographical Information Systems Adrian Moss KINDS project, Manchester Metropolitan University, UK
February 17, 1999Open Forum on Metadata Registries 1 Census Corporate Statistical Metadata Registry By Martin V. Appel Daniel W. Gillman Samuel N. Highsmith,
Corporation for National Research Initiatives DOI API IDF Members Meeting 22 June 2004 Larry Lannom CNRI.
AUKEGGS Architecturally Significant Issues (that we need to solve)
Adoption of RDA-DFT Terminology and Data Model to the Description and Structuring of Atmospheric Data Aaron Addison, Rudolf Husar, Cynthia Hudson-Vitale.
Rupa Tiwari, CSci5980 Fall  Course Material Classification  GIS Encyclopedia Articles  Classification Diagram  Course – Encyclopedia Mapping.
Health eDecisions Use Case 2: CDS Guidance Service Strawman of Core Concepts Use Case 2 1.
The RDF meta model Basic ideas of the RDF Resource instance descriptions in the RDF format Application-specific RDF schemas Limitations of XML compared.
Data Type Registries (DTR) RDA 4th WG/IG Collab Meeting NIST: Dec 2015 Larry Lannom CNRI.
Adoption of RDA-DFT Terminology and Data Model to the Description and Structuring of Atmospheric Data Aaron Addison, Rudolf Husar, Cynthia Hudson-Vitale.
EUDAT receives funding from the European Union's Horizon 2020 programme - DG CONNECT e-Infrastructures. Contract No The Data Type.
Data Type Registries (DTR) WG RDA P3 Breakout 28 March 2014 Larry Lannom Corporation for National Research Initiatives
Data Typing BoF RDA Plenary 7 Tokyo: March 2016 Larry Lannom CNRI.
International Planetary Data Alliance Registry Project Update September 16, 2011.
IPDA Registry Definitions Project Dan Crichton Pedro Osuna Alain Sarkissian.
Data Grids, Digital Libraries and Persistent Archives: An Integrated Approach to Publishing, Sharing and Archiving Data. Written By: R. Moore, A. Rajasekar,
1 This slide indicated the continuous cycle of creating raw data or derived data based on collections of existing data. Identify components that could.
Data Type Registries #2 Co-Chairs: RDA Chairs’ Mtg Gothenburg
Workshop on Brokering in Data Fabrics - community perspectives -
RDA WG on Dynamic Data Citation
RDA 9th Plenary Breakout 3, 5 April :00-17:30
Semblog Project Personal Information Distribution with Social Network
User Characterization in Search Personalization
Data Type Registry Data set descriptions for automation
RDA Data Fabric (DF) Interest Group Peter Wittenburg & Gary Berg-Cross
WG Research Data Collections RDA P10 Montréal – September 2017
Data Type Registries #2 12 Month Status Larry Lannom, Tobias Weigel Date Location TBD? CC BY-SA 4.0.
Data Type Registries Breakout
Information Delivery Manuals: Functional Parts
Xiaogang Ma, John Erickson, Patrick West, Stephan Zednik, Peter Fox,
PID centric fabric constructed piece by piece
Data Type Registries (DTR)
T-TAP for climate data RDA P10 Montréal – September 2017
C2CAMP (A Working Title)
Geospatial Knowledge Base (GKB) Training Platform
Software Measurement Process ISO/IEC
The JISC IE Metadata Schema Registry
CIM Model Manager Report
Health Ingenuity Exchange - HingX
Metadata in the modernization of statistical production at Statistics Canada Carmen Greenough June 2, 2014.
Brief WG/IG reporting Tobias Weigel on behalf of co-chairs
2. An overview of SDMX (What is SDMX? Part I)
WG Research Data Collections Draft outputs of a RDA bottom-up effort P9 - April 2017 Co-chairs: Bridget Almas, Frederik Baumgardt, Tobias Weigel, Thomas.
2. An overview of SDMX (What is SDMX? Part I)
WG Research Data Collections An overview of the recommendation
Using the RDA Collections API to Shape Humanities Data
Datatypes Characterizing data
Research Data Alliance (RDA) 9th WG/IG Collaboration Meeting: Repository Platforms for Research Data (RPRD) Interest Group 13nd June 2018 Co-Chairs:
SDMX Information Model: An Introduction
Agenda (AM) 9:30-10:15 Introduction to RDA
CMIP6 use case and adoption of RDA outputs
SISAI STATISTICAL INFORMATION SYSTEMS ARCHITECTURE AND INTEGRATION
School of Information Studies, Syracuse University, Syracuse, NY, USA
WG PID Kernel Information RDA P11 Berlin – March 2018
Leveraging PIDs for object management in data infrastructures RDA UK Node Workshop, July Tobias Weigel (DKRZ)
QoS Metadata Status 106th OGC Technical Committee Orléans, France
Palestinian Central Bureau of Statistics
Presentation transcript:

RDA Plenary 9 Breakout Session Data Type Registries #2 RDA Plenary 9 Breakout Session April 2017 Co-Chairs: Larry Lannom - CNRI Tobias Weigel – DKRZ

Agenda 14:00 - 14:05 Introductions, Agenda Bashing 14:05 - 14:10 Larry Lannom, State of the WG 14:10 - 14:15 Steve Richard, EarthCube (remote) 14:15 - 14:20 Mike Finnegan, Vermont Monitoring Cooperative (remote) 14:20 - 14:30 Tobias Weigel, Climate Data Processing 14:30 - 14:40 Ulrich Schwardmann, ePIC-DTR 14:40 - 14:50 Andres Ferreyra, AgGateway 14:50 - 15:00 Larry Lannom, DTR in ISO 15:00 - 15:30 Discussion: What next? Type catalog, base types, federation, education and training.

Brief Review

Problem: Implicit Assumptions in Data Data sharing requires that data can be parsed, understood, and reused by people and applications other than those that created the data How do we do this now? For documents – formats are enough, e.g., PDF, and then the document explains itself to humans This doesn’t work well with data – numbers are not self-explanatory What does the number 7 mean in cell B27? Data producers may not have explicitly specified certain details in the data: measurement units, coordinate systems, variable names, etc. Need a way to precisely characterize those assumptions such that they can be identified by humans and machines that were not closely involved in its creation

Goal of the DTR Effort: Explicate and Share Assumptions using Types and Type Registries Evaluate and identify a few assumptions in data that can be codified and shared in order to… Produce a functioning Registry system that can easily be evaluated by organizations before adoption Highly configurable for changing scope of captured and shared assumptions depending on the domain or organization Supports several Type record dissemination variations Design for allowing federation between multiple Registry instances The emphasis is not on Identifying every possible assumption and data characteristic applicable for all domains Technology

What is a Data Type? A unique and resolvable identifier Which resolves to characterization of structures, conventions, semantics, and representations of data Serves as a shortcut for humans and machines to understand and process data File formats and mime types have solved the ‘representation’ problem at a ‘unit’ level Examples of problems we aim to solve with data types: It is a number in cell A3, but is it temperature? If so, in Celsius? It is a dataset consisting of location, temperature, and time, but what variable names should I look for? Is it all packaged as CSV or NetCDF? And as a single unit or a collection of units? Type record structure will continue to evolve – not finished, but functioning

What is a Data Type Registry? A low-level infrastructure with wide applicability to record and disseminate type records Not an immediate ROI application Assigns unique and resolvable identifiers to type records Enforces and validates common data model & expression for interoperation between multiple instances of Registries API for machine consumption UI for human use

Federated Set of Type Registries Process Use Case 3 Users 2 4 Federated Set of Type Registries 1 Typed Data ID Type Payload 10100 11010 101…. Visualization I Agree Terms:… Rights Services Data Processing Data Set Dissemination Client (process or people) encounters unknown data type. 1 Resolved to Type Registry. 2 Response includes type definitions, relationships, properties, and possibly service pointers. Response can be used locally for processing, or, optionally 3 typed data or reference to typed data can be sent to service provider. 4

Discovery Use Case 2 Users 1 3 4 Federated Set of Type Registries 2 Users 1 3 4 Repositories and Metadata Registries ID Type Payload Clients (process or people) look for types that match their criteria for data. For example, clients may look for types that match certain criteria, e.g., combine location, temperature, and date-time stamp. 1 Type Registry returns matching types. 2 Clients look up in repositories and metadata registries for data sets matching those types. 3 Appropriate typed data is returned. 4

Type Registry History Handle Types – 0.Type/SomeType Good idea, limited applicability Profiles – ‘type’ the whole set of handle/type/value triples. No traction Sloan Grant: 2012 -2014 Generic Registry system using Type Registry as a use case NSF Grant: 2013 -2014 Included support for Type Registry Research Data Alliance (RDA) Data Type Registries Working Group One of first two WGs approved at Plenary 1 – March 2013 International representation, > 50 members Co-chairs Lannom (CNRI), Broeder (Max Planck Psycholinguistics Institute) Approved as RDA Recommendation (2015) DTR Phase 2: Follow-on Group (P8 – so 6 months in) Focus on data type records Help data producers create useful record types Co-chairs Lannom (CNRI), Weigel (DKRZ)

Current State A prototype is at: http://typeregistry.org/ Multiple other implementations/projects Implementation supports notions of primitives and derived types Primitives are fundamental types that we expect humans and software to parse and understand Integer, floating point, boolean value, string, date, timestamp, etc. Derived types depend on primitives to describe something complex Stream gauge, Lidar, Spatial bounding box, etc. Registered types are assigned unique identifiers

Data Type Example

Data Type Registry Data set descriptions for automation Stephen M Richard, IEDA, EarthCube

Use cases Document the meaning of entities and attributes in data. Re-use of data type and attribute definitions Machine-assisted data integration: matching attribute content. Validation of data instances against a type definition. Tools that spin up a UI for a particular data type. Link software to data sources that it can use Support file introspection to assist with deep data registration

Progress JSON schema implemented for data type model https://github.com/usgin/digital-crust-LDR DataTypeJSON.json Initial testing with Cordra Next steps– how to get model compilations from spreadsheet or rdf to Cordra. Where to deploy

Corporation for National Research Initiatives enrich.cordra.org Enrich DTR Service enhances dataset metadata Syntactic nature (Decimal values between 0.0 – 24.0) Semantic information (Concept time in unit hours) Currently applying the DTR at an attribute level (Concept and Unit) Expand to the dataset level Dataset Metadata (row/column count, maintainer, description, etc.) Dataset Schema (columns information – constraints, formats, semantic information, etc.) Utilizing DTR mapping, in addition to other metadata, to drive a dataset recommendation system. Continuing to collaborate with CNRI on this.

https://rd-alliance.org/ - https://twitter.com/resdatall Concept as it was at P8 netCDF-Files Agent Collection script Processing service (WPS) <Metadata> (xml) ? (third-party input) well-defined ways to publish it (automatically) possible repacking into a new collection output multiple types, e.g. netcdf, xml, linked data, text reports, PROV record described in DTR https://rd-alliance.org/ - https://twitter.com/resdatall

Pathway for climate data processing services Multiple upcoming angles for attaching a DT solution: CMIP6 data Handles, Handle records, potential PID Kernel Information profile Copernicus Data Services re-using existing WPS-based services Handles? – Depends on success and acceptance of CMIP6 solution and available effort Climate Analytics Service (DKRZ/CMCC) server-side data processing with PID support data sources: CMIP6, possibly Copernicus, ... https://rd-alliance.org/ - https://twitter.com/resdatall

https://rd-alliance.org/ - https://twitter.com/resdatall Typing angles a) Service typing automated discovery or at least verification b) Data input and output typing (coarse) to distinguish the data sources and help with management c) Typing of data internals (fine) traditional DTR: enable machines to understand meaning of data https://rd-alliance.org/ - https://twitter.com/resdatall

ISO Study Group

ISO Study Group Activity on Data Type Records: Background ISO-IEC/JTC1/SC32/WG2 is currently working on a meta model for dataset description. That meta model covers data elements "about" datasets versus "internal details" of datasets. We refer to the record that captures internal details a "data type record”. WG2 was receptive to the idea of exploring the "data type record" space. A study group was authorized Nov 2016, with CNRI as lead ISO-IEC/JTC1/SC32/WG2/SG Present use case(s) from existing RDA members and see what fields are needed to describe the internals of the datasets pertaining the use case.(s) Evaluate what existing ISO standards cover and what they do not. If applicable, recommend a technical report or technical specification or standard. The study group terminates June 2017

ISO Study Group Activity on Data Type Records: Work and Possible Outcomes ISO members referenced portions of existing ISO standards that are applicable to the proposed data type record structure (11179-3, 11179-7,  19763-12, and 11404) CNRI will create a UML diagram that, wherever applicable, references the pieces from each of those standards. Feb 10th – ‘Data Type Record – Elements for Characterizing Data’ submitted to WG. Has been posted to DTR site Based in part on CMIP6 and Vermont Monitoring Coop examples For discussion purposes, CNRI introduced the notion of a simple data type and a complex data type. Simple data type describes characteristics of a single value (e.g., a cell in a spreadsheet) Complex data type is an aggregate of simple data types (e.g., a row in a spreadsheet can be described using a complex data type) June ’17 Decision Technical Report: documentation on how to use existing standards for the data type use case Technical Specification: intermediate spec, possible future standard, still under development Technical Standard: something new and useful, fully standardized Good news: we get an ISO number, no matter what happens