Data Type Registries (DTR) WG
RDA P3 Breakout
28 March 2014
Larry Lannom
Corporation for National Research Initiatives


Agenda
- First third: Review and current state
- Second third: Next steps and closure
- Last third: Exercise the prototype

Discussion Items
- How do we now get from prototype to useful?
  - Adoption strategies
  - Evolution of the data model
- What happens when the WG ends?
  - Governance/certification issues
  - Multiple registries assumed
  - Who do you trust?
  - CNRI (later DONA) will set up at least one for Handle Types as needed
  - EUDAT2: separate registry or use one from CNRI
- Interaction with PIT WG
  - Properties/Types/Profiles?
- How to close the group
  - Future within RDA
  - Documentation
  - Coordination outside RDA, e.g., EUDAT2
- What else?

What are Data Types?
- Characterize data structures at multiple levels of granularity
  - Serve as macro or shortcut for understanding and processing data
- File formats and MIME types are examples of solved problems at the container level but don't solve finer-grained interpretation
  - It's a number in cell A3, but what does it mean?
- Other structures with more limited use, e.g., many scientific data sets, may need multiple levels of typing
- Data types enable humans and machines to discover, process, and reason about data

RDA Data Type Registries WG
- Goal: Interoperable set of Type Registries
- Each type registered with unique identifier
- Common data model and expression
- Associate with services, tools, format registries, etc.
- Common API for machine consumption
- Schedule
  - 3/2013 to 9/2013
    o Gathering use cases
    o Investigating other work in the area
    o First drafts of data model and functional specs for a type registry
  - 10/2013 to 12/2013
    o Refine data model and functional specs
    o Deploy initial prototype
  - 1/2014 to 5/2014
    o Finalize data model and functional specs
    o Deploy functional type registry for PID types
    o Release turnkey registry conforming to functional specs

Discovery Use Case
[Diagram: Users; Federated Set of Type Registries; Repositories and Metadata Registries holding typed records (ID, Type, Payload)]
1. Clients (processes or people) look for types that match their criteria for data, e.g., types that combine location, temperature, and date-time stamp.
2. Type Registry returns matching types.
3. Clients look up data sets matching those types in repositories and metadata registries.
4. Appropriate typed data is returned.
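The four discovery steps can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the in-memory lists stand in for a federated registry and a repository, and the names `TypeRecord`, `DataSet`, `find_types`, `find_data`, and the `example/...` identifiers are hypothetical, not part of the WG's data model or API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TypeRecord:
    type_id: str            # hypothetical identifier, not a real PID
    properties: frozenset   # names of the properties the type combines


@dataclass(frozen=True)
class DataSet:
    dataset_id: str
    type_id: str
    payload: str


def find_types(registry, required):
    """Steps 1-2: return the type records that carry every required property."""
    wanted = frozenset(required)
    return [t for t in registry if wanted <= t.properties]


def find_data(repository, types):
    """Steps 3-4: return the data sets typed with one of the matching types."""
    ids = {t.type_id for t in types}
    return [d for d in repository if d.type_id in ids]


# Stand-ins for a type registry and a repository (assumed content).
registry = [
    TypeRecord("example/weather-obs", frozenset({"location", "temperature", "datetime"})),
    TypeRecord("example/checksum", frozenset({"checksum_type", "checksum_value"})),
]
repository = [
    DataSet("ds-1", "example/weather-obs", "..."),
    DataSet("ds-2", "example/checksum", "..."),
]

matches = find_types(registry, ["location", "temperature", "datetime"])
results = find_data(repository, matches)
```

Here the client's criteria are expressed as a set of property names; a real registry query would likely be richer than a subset test.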

Process Use Case
[Diagram: Users; Typed Data records (ID, Type, Payload); Federated Set of Type Registries; service providers for Visualization, Rights ("I Agree" terms), Services, Data Processing, and Data Set Dissemination]
1. Client (process or people) encounters unknown type.
2. Resolved to Type Registry.
3. Response includes type definitions, relationships, properties, and possibly service pointers. Response can be used locally for processing, or, optionally,
4. typed data or reference to typed data can be sent to service provider.
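A minimal sketch of the process flow, assuming each registry is a plain dictionary keyed by type identifier. The function names, the record fields `description` and `services`, and the `example.org` URL are all hypothetical placeholders chosen for illustration.

```python
def resolve_type(registries, type_id):
    """Step 2: ask each federated registry in turn; return the first record found."""
    for registry in registries:
        record = registry.get(type_id)
        if record is not None:
            return record
    return None


def process(registries, local_handlers, type_id, payload):
    """Steps 3-4: use the response locally, or fall back to a service pointer."""
    record = resolve_type(registries, type_id)
    if record is None:
        return ("unresolved", None)
    handler = local_handlers.get(type_id)
    if handler is not None:
        return ("local", handler(payload))
    services = record.get("services", [])
    if services:
        # A real client would send the data, or a reference to it, to this service.
        return ("remote", services[0])
    return ("described", record["description"])


# Two federated registries; the first one does not know either type.
registries = [
    {},
    {
        "example/csv-table": {
            "description": "Comma-separated table",
            "services": ["https://example.org/visualize"],
        },
        "example/count": {"description": "A single integer count", "services": []},
    },
]
local_handlers = {"example/count": int}

local_result = process(registries, local_handlers, "example/count", "42")
remote_result = process(registries, local_handlers, "example/csv-table", "a,b")
```

The order of preference (local handler first, service pointer second) is a design choice of this sketch, not something the slide prescribes.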

What do Data Type Records contain?
- Data type records contain
  - textual description for human understanding
  - provenance information (who created when and what)
- Records could contain
  - structured metadata about types for machines to process
  - encoding information (think file formats)
  - service information (think APIs to systems or applications that can process typed data)
  - semantic information (think description or predicate logic, useful for reasoning)
- Records do not enforce or define new ways to describe or represent data structures, but rely on existing frameworks and technologies
  - File formats (MIME types), etc., may be used for describing encoding information
  - WSDL, REST APIs, etc., may be used for describing service information
  - OWL, KIF, etc., may be used for representing semantics and knowledge
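One way to picture the split between what a record contains and what it could contain is a record class with required and optional fields. The field names below are assumptions made for this sketch; the slide does not define a schema.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class DataTypeRecord:
    # Required: human-readable description and provenance.
    identifier: str
    description: str
    creator: str
    created: str                      # date string, format assumed to be ISO 8601
    # Optional: machine-oriented information, expressed in existing frameworks.
    metadata: dict = field(default_factory=dict)
    encoding: Optional[str] = None    # e.g. a MIME type
    services: list = field(default_factory=list)   # e.g. pointers to WSDL or REST APIs
    semantics: Optional[str] = None   # e.g. a pointer to an OWL or KIF description

    def missing_required(self):
        """Return the names of required text fields that are empty."""
        required = {
            "description": self.description,
            "creator": self.creator,
            "created": self.created,
        }
        return [name for name, value in required.items() if not value.strip()]


record = DataTypeRecord(
    identifier="example/format",      # hypothetical identifier
    description="MIME type of an object",
    creator="example-user",
    created="2014-03-28",
    encoding="text/plain",
)
incomplete = DataTypeRecord("example/empty", "", "example-user", "2014-03-28")
```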

Prototyped Data Type Data Model

Proposed Use of Data Types
- Multiple type registries will be deployed; perhaps one per community
- Type registries federate across each other; local policies may restrict (the scope of) such federation
- Users register data structures within a type registry and acquire a unique, persistent identifier (data type)
- Data type identifiers are then associated with corresponding data
- Registered type records are additionally disseminated by type registries as Linked Data compatible outputs
- General Guidelines
  - Users decide what data structures to register or not. If a data structure is expected to play a global role, then users are encouraged to register that data structure
  - Users are encouraged to search first for an existing registration of the data structure before registering it, to avoid duplicates
  - Users decide the encoding, service, and semantic technology or framework that best suits them
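The "search before registering" guideline can be sketched as a registry that returns the existing identifier when an identical structure is already present. Everything here is hypothetical: the `TypeRegistry` class, the `example` prefix, and the use of a random UUID as a stand-in for minting a persistent identifier.

```python
import uuid


class TypeRegistry:
    def __init__(self, prefix):
        self.prefix = prefix
        self._records = {}   # identifier -> registered data structure

    def search(self, structure):
        """Return identifiers of records whose structure equals the given one."""
        return [tid for tid, s in self._records.items() if s == structure]

    def register(self, structure):
        """Search first; mint a new identifier only if nothing matches."""
        existing = self.search(structure)
        if existing:
            return existing[0]
        type_id = f"{self.prefix}/{uuid.uuid4().hex}"   # stand-in for a persistent ID
        self._records[type_id] = structure
        return type_id


registry = TypeRegistry("example")
first = registry.register({"checksum_type": "string", "checksum_value": "hex"})
again = registry.register({"checksum_type": "string", "checksum_value": "hex"})
other = registry.register({"mime_type": "string"})
```

Exact equality is a deliberately naive duplicate test; deciding when two structures count as the same type is left open on the slide.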

Example DTR Use Cases
- Broad Functional Classification
  - Repos hold widely varying levels of data and metadata
  - High-level functional classification of the identified object needed to make sense of what is available, e.g., data object, metadata, repo description, contact info, etc.
- Simple License Information via PID Resolution
  - Data set access conditions cannot be predicted based on ID
  - For DataCite DOIs, a handle/type/value triple could be used to provide access information, probably through a level of indirection, resulting in a pop-up or intervening page or open linked data
- Object Types as a Short-cut for Dependent Services to Match Processing Requirements to Data Objects
  - Using data acquisition as an example
    o Determine object type you are trying to build
    o Consult registry to index into an ontology to dynamically define required and optional properties
    o Does the input data have what is needed?
- Registration of PID Types (in ID/Type/Value triples) for Data Processing and Interpretation
  - Distinguish pointers to objects from pointers to metadata from pointers to services
  - Enable complex client interactions as opposed to simple one-to-one re-direction
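As a rough illustration of the last use case, the sketch below stores ID/Type/Value triples and lets a client pick values by type instead of following a single redirect. The type names and `example.org` URLs are invented for the example; only the ID `11479/1234` is borrowed from the sample values elsewhere in these slides.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    pid: str     # the identifier being resolved
    type: str    # a registered type (hypothetical names below)
    value: str   # the typed value, here a pointer


def select(triples, pid, wanted_type):
    """Return the values of the given type held under one identifier."""
    return [t.value for t in triples if t.pid == pid and t.type == wanted_type]


record = [
    Triple("11479/1234", "example/data", "https://example.org/data/1234"),
    Triple("11479/1234", "example/metadata", "https://example.org/meta/1234"),
    Triple("11479/1234", "example/license", "https://example.org/license/1234"),
]

metadata_pointers = select(record, "11479/1234", "example/metadata")
license_pointers = select(record, "11479/1234", "example/license")
```

A client that understands the registered types can therefore tell a pointer to the object from a pointer to its metadata or its access terms.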

EUDAT Project
- E-Infrastructure project aimed at providing data-management services to research communities
- Services:
  - B2FIND: interdisciplinary metadata catalog
  - B2SHARE: easy archiving for researchers
  - B2SAFE: replicating data for safety and easy access
  - …

PID use by EUDAT
- EUDAT uses Handle System type PIDs for its central DO administration
- It needs to attach extra information to the PID
  - Keep track of "repository of origin" (RoR)
  - Keep track of original DO PID
  - Some other stuff ….
- Proposed to make use of the future DTR
- For now, request a specific handle prefix to register EUDAT data types and use that to type the handle records

Discussion Items
- How do we now get from prototype to useful?
  - Adoption strategies
  - Evolution of the data model
- What happens when the WG ends?
  - Governance/certification issues
  - Multiple registries assumed
  - Who do you trust?
  - CNRI (later DONA) will set up at least one for Handle Types
- Interaction with PIT WG
  - Properties/Types/Profiles?
- How to close the group
  - Future within RDA
  - Documentation
  - Coordination outside RDA, e.g., EUDAT2
- What else?

Initial Prototype:

Primitive Types

Primitive types are basic vehicles for specifying semantically relevant characters. We expect to use primitive types in registered data types as illustrated in Table 3.

Type: Expected values
- boolean: Yes or No
- int: An integer
- string: A free-form string
- hex: A restricted alpha-decimal string
- ID: A string with inherent value
- TCP-EndPoint: (IP, Port, Protocol); IP:port. Ex 1: ( , 9900, DOP)
- datetime: A UTC string
- URL: A string with inherent formatting
- Lat-long
- And more…

Proposed Data Types

Registered data types have one or more properties. Those properties could be either primitive or registered data types in turn. The table below lists the various 'types' to be registered. The 'type' of each property is specified right next to the property name with a ':' delimiter.

Registered Data Types

/checksum
  Minimum expected properties: (checksum_type: string, checksum_value: hex)
  Example: (MD5, 1234abcd5678cdef1234)

/format
  Minimum expected properties: (mime_type: string)
  Example: application/pdf

/reference
  Minimum expected properties: (type_of_reference: string, ID: ID, reference_value: URL/TCP-EndPoint)
  Examples: Ex 1: (data, 11479/1234, ( , 9900, DOP)) Ex 2: (metadata, metadata1234,
  Notes: Data reference, metadata reference, landing page to objects, source references, version reference, presentation reference, etc., can all be covered here using the "type_of_reference" property.

/repository
  Minimum expected properties: (ID: ID, repository_address: URL/TCP-EndPoint)
  Examples: Ex 1: (data.gov, Ex 2: (11479/repository, pository)

/data_mutability
  Minimum expected properties: (mutability: boolean)
  Example: Ex 1: (yes)

/access_rights
  Minimum expected properties: (reference: URL)

And more…

Note that the list of properties only highlights the 'minimum' set of properties. A given implementation may choose to include other properties. Another point to consider is whether the types will define the serialization of the type or not.
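The "name: type" convention and the idea that a property may itself be a registered type can be sketched as a small recursive validator. This is an assumption-laden illustration: only three primitives are modeled, boolean is taken to be the literal strings "yes" or "no" as in the example above, and `/checked_file` is a hypothetical composite type invented to show nesting.

```python
import re

# Validators for a few primitive types (simplified assumptions).
PRIMITIVES = {
    "string": lambda v: isinstance(v, str),
    "hex": lambda v: isinstance(v, str) and re.fullmatch(r"[0-9a-fA-F]+", v) is not None,
    "boolean": lambda v: v in ("yes", "no"),
}


def parse_property(spec):
    """Split a 'name: type' spec on the first ':' delimiter."""
    name, _, type_name = spec.partition(":")
    return name.strip(), type_name.strip()


# Registered types written as lists of 'name: type' property specs.
REGISTERED = {
    "/checksum": ["checksum_type: string", "checksum_value: hex"],
    "/format": ["mime_type: string"],
    "/data_mutability": ["mutability: boolean"],
    # Hypothetical type whose properties are registered types in turn.
    "/checked_file": ["checksum: /checksum", "format: /format"],
}


def validate(type_name, value):
    """Check a value against a primitive or, recursively, a registered type."""
    if type_name in PRIMITIVES:
        return PRIMITIVES[type_name](value)
    specs = REGISTERED.get(type_name)
    if specs is None or not isinstance(value, tuple) or len(value) != len(specs):
        return False
    types = [parse_property(spec)[1] for spec in specs]
    return all(validate(t, v) for t, v in zip(types, value))


ok = validate("/checksum", ("MD5", "1234abcd5678cdef1234"))
bad = validate("/checksum", ("MD5", "not-hex"))
nested = validate("/checked_file", (("MD5", "1234abcd"), ("application/pdf",)))
```

Values are modeled as tuples in property order, mirroring the parenthesized examples; whether a type also fixes its serialization is, as the note says, still an open question.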