Data Type Registries (DTR) WG RDA P3 Breakout 28 March 2014 Larry Lannom Corporation for National Research Initiatives


1 Data Type Registries (DTR) WG RDA P3 Breakout 28 March 2014 Larry Lannom Corporation for National Research Initiatives http://www.cnri.reston.va.us/

2 Agenda
First Third: Review and Current State
Second Third: Next Steps and Closure
Last Third: Exercise the Prototype

3 Discussion Items
How do we now get from prototype to useful?
Adoption strategies
Evolution of the data model
What happens when the WG ends?
Governance/Certification issues
Multiple registries assumed
Who do you trust?
CNRI (later DONA) will set up at least one for Handle Types as needed
EUDAT2 – separate registry or use one from CNRI
Interaction with PIT WG
Properties/Types/Profiles?
How to close the group
Future within RDA
Documentation
Coordination outside RDA, e.g., EUDAT2
What else?

4 What are Data Types?
Characterize data structures at multiple levels of granularity
  – Serve as macro or shortcut for understanding and processing data
File formats & mime types are examples of solved problems at the container level but don’t solve finer grained interpretation
  – It’s a number in cell A3 but what does it mean
Other structures with more limited use, e.g., many sci. data sets, may need multiple levels of typing
Data types enable humans and machines to discover, process, and reason about data

5 RDA Data Type Registries WG
Goal: Interoperable set of Type Registries
Each type registered with unique identifier
Common data model and expression
Associate with services, tools, format registries, etc.
Common API for machine consumption
Schedule
  – 3/2013 – 9/2013
    o Gathering use cases
    o Investigating other work in the area
    o First drafts of data model and functional specs for a type registry
  – 10/2013 – 12/2013
    o Refine data model and functional specs
    o Deploy initial prototype
  – 1/2014 – 5/2014
    o Finalize data model and functional specs
    o Deploy functional type registry for PID types
    o Release turnkey registry conforming to functional specs

6 Discovery Use Case
[Diagram: Users; Federated Set of Type Registries; Repositories and Metadata Registries holding ID / Type / Payload records]
1. Clients (process or people) look for types that match their criteria for data, e.g., types that combine location, temperature, and date-time stamp.
2. Type Registry returns matching types.
3. Clients look up in repositories and metadata registries for data sets matching those types.
4. Appropriate typed data is returned.
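The four discovery steps can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the slide does not define an API, so the class names, the function names (find_types, find_data), the type identifiers, and the in-memory lists standing in for a registry and a repository are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TypeRecord:
    type_id: str           # identifier of the registered type (made up here)
    properties: frozenset  # names of the properties the type carries


@dataclass(frozen=True)
class DataSet:
    dataset_id: str
    type_id: str  # the type this data set was registered with
    payload: str


def find_types(registry, wanted):
    """Steps 1-2: return ids of types that carry every wanted property."""
    wanted = frozenset(wanted)
    return [t.type_id for t in registry if wanted <= t.properties]


def find_data(repository, type_ids):
    """Steps 3-4: return the data sets typed with any of the given ids."""
    ids = set(type_ids)
    return [d for d in repository if d.type_id in ids]


# Hypothetical registry and repository contents.
REGISTRY = [
    TypeRecord("example/weather-obs",
               frozenset({"location", "temperature", "datetime"})),
    TypeRecord("example/gps-fix", frozenset({"location", "datetime"})),
]
REPOSITORY = [
    DataSet("ds-1", "example/weather-obs", "..."),
    DataSet("ds-2", "example/gps-fix", "..."),
]

matches = find_types(REGISTRY, ["location", "temperature", "datetime"])
hits = find_data(REPOSITORY, matches)
print(matches, [d.dataset_id for d in hits])
```

In a federated deployment the two lookups would go to different systems; here both are plain list scans so the flow stays visible.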

7 Process Use Case
[Diagram: Users; Typed Data (ID / Type / Payload records); Federated Set of Type Registries; Services, with the labels Visualization, Rights ("Terms:… I Agree"), Data Processing, Data Set Dissemination]
1. Client (process or people) encounters unknown type.
2. Resolved to Type Registry.
3. Response includes type definitions, relationships, properties, and possibly service pointers. Response can be used locally for processing, or, optionally,
4. typed data or reference to typed data can be sent to service provider.
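A minimal Python sketch of the process use case follows. Everything concrete in it is an assumption made for illustration: the registry is a dictionary, the type identifiers and the service URL are placeholders, and "sending to a service provider" is represented by a returned string rather than a network call.

```python
# Hypothetical registry: type id -> record with definition and optional service pointer.
REGISTRY = {
    "example/celsius-reading": {
        "description": "Temperature reading in degrees Celsius",
        "properties": {"value": "int"},
        "service": None,
    },
    "example/image-tile": {
        "description": "One tile of a larger image",
        "properties": {"tile": "string"},
        "service": "https://service.example.org/render",  # placeholder URL
    },
}

# Types this client already knows how to process locally (also hypothetical).
LOCAL_HANDLERS = {
    "example/celsius-reading": lambda payload: f"{payload['value']} C",
}


def resolve(type_id):
    """Step 2: resolve the type id to its registry record, or None."""
    return REGISTRY.get(type_id)


def process(type_id, payload):
    """Steps 1-4: handle a payload whose type the client has just encountered."""
    record = resolve(type_id)
    if record is None:
        return f"unresolved type {type_id}"
    handler = LOCAL_HANDLERS.get(type_id)
    if handler is not None:
        # Step 3: use the response locally.
        return handler(payload)
    if record["service"] is not None:
        # Step 4: hand the typed data to the service the record points to.
        return f"forward to {record['service']}"
    # No handler and no service: fall back to the human-readable description.
    return record["description"]


print(process("example/celsius-reading", {"value": 21}))
print(process("example/image-tile", {"tile": "a"}))
```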

8 What do Data Type Records contain?
Data type records contain
  – textual description for human understanding
  – provenance information (who created when and what)
Records could contain
  – structured metadata about types for machines to process
  – encoding information (think file formats)
  – service information (think APIs to systems or applications that can process typed data)
  – semantic information (think description or predicate logic, useful for reasoning)
Records do not enforce or define new ways to describe or represent data structures, but rely on existing frameworks and technologies
  – File formats (mime types), etc., may be used for describing encoding information
  – WSDL, REST APIs, etc., may be used for describing service information
  – OWL, KIF, etc., may be used for representing semantics and knowledge
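One way to picture such a record is as a small data structure with two required parts and several optional ones. The Python sketch below is not the WG's data model; the field names, the identifier, and the example values are assumptions chosen to mirror the bullets on this slide.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Provenance:
    creator: str  # who created the record
    created: str  # when, as a date string


@dataclass
class DataTypeRecord:
    identifier: str
    description: str        # required: text for human understanding
    provenance: Provenance  # required: who created it and when
    structured_metadata: dict = field(default_factory=dict)  # optional
    encoding: Optional[str] = None   # optional, e.g. a mime type
    service: Optional[str] = None    # optional, e.g. pointer to a WSDL or REST API
    semantics: Optional[str] = None  # optional, e.g. pointer to an OWL ontology

    def optional_sections(self):
        """Return the names of the optional sections that are filled in."""
        present = []
        if self.structured_metadata:
            present.append("structured_metadata")
        for name in ("encoding", "service", "semantics"):
            if getattr(self, name) is not None:
                present.append(name)
        return present


record = DataTypeRecord(
    identifier="example/pdf-report",  # hypothetical identifier
    description="A report delivered as a PDF file",
    provenance=Provenance(creator="example user", created="2014-03-28"),
    encoding="application/pdf",
)
print(record.optional_sections())
```

The optional fields hold pointers into existing frameworks (a mime type, a service description, an ontology) rather than new definitions, which matches the last group of bullets.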

9 Prototyped Data Type Data Model

10 Proposed Use of Data Types
Multiple type registries will be deployed; perhaps one per community
Type registries federate across each other; local policies may restrict (the scope of) such federation
Users register data structures within a type registry and acquire a unique, persistent identifier (data type)
Data type identifiers are then associated with corresponding data
Registered type records are additionally disseminated by type registries as Linked Data compatible outputs
General Guidelines
  – Users decide what data structures to register or not. If a data structure is expected to play a global role, then users are encouraged to register that data structure
  – Users are encouraged to first search if the data structure is registered prior to registering to avoid duplicates
  – Users decide the encoding, service, and semantic technology or framework that best suits them
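The "search first, then register" guideline can be sketched as follows. This is a toy in-memory registry, not the prototype's behavior: the class and method names are hypothetical, a data structure is reduced to a property-name-to-type dictionary, and identifiers are minted with uuid4 only as a stand-in for a real persistent identifier service.

```python
import uuid


class TypeRegistry:
    """Toy registry: a data structure is a dict of property name -> type name."""

    def __init__(self, prefix):
        self.prefix = prefix  # hypothetical identifier prefix
        self._records = {}    # type id -> registered structure

    @staticmethod
    def _key(structure):
        # Two structures count as duplicates if they have the same properties
        # with the same types, regardless of the order they were written in.
        return tuple(sorted(structure.items()))

    def search(self, structure):
        """Return the id of an already registered identical structure, or None."""
        key = self._key(structure)
        for type_id, stored in self._records.items():
            if self._key(stored) == key:
                return type_id
        return None

    def register(self, structure):
        """Search first; mint a new identifier only if nothing matches."""
        existing = self.search(structure)
        if existing is not None:
            return existing
        type_id = f"{self.prefix}/{uuid.uuid4().hex}"  # stand-in for a real PID
        self._records[type_id] = dict(structure)
        return type_id


registry = TypeRegistry("example-prefix")
first = registry.register({"checksum_type": "string", "checksum_value": "hex"})
again = registry.register({"checksum_value": "hex", "checksum_type": "string"})
other = registry.register({"mime_type": "string"})
print(first == again, first != other)
```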

11 Example DTR Use Cases
Broad Functional Classification
  – Repos hold widely varying levels of data & metadata
  – High-level functional classification of the identified object needed to make sense of what is available, e.g., data object, metadata, repo description, contact info, etc.
Simple License Information via PID Resolution
  – Data set access conditions cannot be predicted based on ID
  – For DataCite DOIs, a handle/type/value triple could be used to provide access information, probably through a level of indirection, resulting in a pop-up or intervening page or open linked data
Object Types as a Short-cut for Dependent Services to Match Processing Requirements to Data Objects
  – Using data acquisition as an example
    o Determine object type you are trying to build
    o Consult registry to index into an ontology to dynamically define required and optional properties
    o Does the input data have what is needed?
Registration of PID Types (in ID/Type/Value triples) for Data Processing and Interpretation
  – Distinguish pointers to objects from pointers to metadata from pointers to services
  – Enable complex client interactions as opposed to simple one-to-one re-direction
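The last use case, ID/Type/Value triples, lends itself to a short illustration. In the Python sketch below the type names ("data", "metadata", "service") and the repo.example.org URLs are invented for the example; only the idea of several typed values under one identifier comes from the slide.

```python
from typing import NamedTuple


class HandleValue(NamedTuple):
    handle: str  # the persistent identifier
    type: str    # registered PID type (names here are illustrative)
    value: str   # what the identifier points to for that type


# One identifier carrying several typed values instead of a single redirect.
RECORD = [
    HandleValue("11479/1234", "data", "http://repo.example.org/objects/1234"),
    HandleValue("11479/1234", "metadata", "http://repo.example.org/metadata/1234"),
    HandleValue("11479/1234", "service", "http://repo.example.org/api/render"),
]


def values_of_type(record, wanted_type):
    """Return every value in the record registered under the wanted type."""
    return [hv.value for hv in record if hv.type == wanted_type]


print(values_of_type(RECORD, "metadata"))
```

A client that understands the registered types can pick the pointer it needs, which is the difference from a simple one-to-one redirection.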

12 EUDAT Project
E-Infrastructure project aimed at providing data-management services to research communities
Services:
  – B2FIND interdisciplinary metadata catalog
  – B2SHARE easy archiving for researchers
  – B2SAFE replicating data for safety and easy access
  – …

13 PID use by EUDAT
EUDAT uses Handle System type PIDs for its central DO administration
It needs to attach extra information to the PID
  – Keep track of “repository of origin” (RoR)
  – Keep track of original DO PID
  – Some other stuff ….
Proposed to make use of the future DTR
For now, request a specific handle prefix to register EUDAT data types and use that to type the handle records

14 Discussion Items
How do we now get from prototype to useful?
Adoption strategies
Evolution of the data model
What happens when the WG ends?
Governance/Certification issues
Multiple registries assumed
Who do you trust?
CNRI (later DONA) will set up at least one for Handle Types
Interaction with PIT WG
Properties/Types/Profiles?
How to close the group
Future within RDA
Documentation
Coordination outside RDA, e.g., EUDAT2
What else?

15 Initial Prototype: http://typeregistry.org/registrar/

16 Primitive Types
Primitive types are basic vehicles for specifying semantically relevant characters. We expect to use primitive types in registered data types as illustrated in Table 3.

Type          Expected Values                    Example                                  Notes
boolean       Yes or No
int           An Integer
string        A free-form string
hex           A restricted alpha-decimal string
ID            A string with inherent value
TCP-EndPoint  (IP, Port, Protocol)               IP:port. Ex 1: (10.27.4.102, 9900, DOP)
datetime      A UTC string
URL           A string with inherent formatting
Lat-long
And more…
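The slide names the primitive types but does not pin down their syntax, so any validator has to guess. The Python sketch below makes those guesses explicit: "hex" is taken to mean the characters 0-9 and a-f, "datetime" is taken to be an ISO 8601 string ending in Z, and "URL" is limited to http and https. All function names are hypothetical.

```python
import re
from datetime import datetime
from urllib.parse import urlparse


def is_boolean(value):
    """Assumption: the literal strings Yes or No, in any letter case."""
    return value.lower() in ("yes", "no")


def is_int(value):
    """An optionally signed run of decimal digits."""
    return re.fullmatch(r"[+-]?\d+", value) is not None


def is_hex(value):
    """Assumption: 'restricted alpha-decimal' means 0-9 and a-f only."""
    return re.fullmatch(r"[0-9a-fA-F]+", value) is not None


def is_datetime(value):
    """Assumption: a UTC string looks like 2014-03-28T09:00:00Z."""
    try:
        datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ")
    except ValueError:
        return False
    return True


def is_url(value):
    """Assumption: only http and https URLs with a host part are accepted."""
    parts = urlparse(value)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


print(is_hex("1234abcd5678cdef1234"), is_url("http://data.gov"))
```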

17 Proposed Data Types
Registered data types have one or more properties. Those properties could be either primitive or registered data types in turn. The table below lists the various ‘types’ to be registered. The ‘type’ of each property is specified right next to the property name with a ‘:’ delimiter.

Registered Data Types (columns: Type; Minimum Expected Properties for the Type; Example; Notes)

/checksum
  Properties: (checksum_type: string, checksum_value: hex)
  Example: (MD5, 1234abcd5678cdef1234)
/format
  Properties: (mime_type: string)
  Example: application/pdf
/reference
  Properties: (type_of_reference: string, ID: ID, reference_value: URL/TCP-EndPoint)
  Example: Ex 1: (data, 11479/1234, (10.27.4.102, 9900, DOP)); Ex 2: (metadata, metadata1234, http://data.gov/metadata1234)
  Notes: Data reference, metadata reference, landing page to objects, source references, version reference, presentation reference, etc., can all be covered here using the “type_of_reference” property.
/repository
  Properties: (ID: ID, repository_address: URL/TCP-EndPoint)
  Example: Ex 1: (data.gov, http://data.gov); Ex 2: (11479/repository, http://hdl.handle.net/11479/repository)
/data_mutability
  Properties: (mutability: boolean)
  Example: Ex 1: (yes)
/access_rights
  Properties: (reference: URL)
And more…

Note that the list of properties only highlights the ‘minimum’ set of properties. A given implementation may choose to include other properties. Another point to consider is whether the types will define the serialization of the type or not.
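A registered type built from typed properties can be checked mechanically. The Python sketch below encodes three of the types from this slide as (property name, property type) lists and validates example tuples against them. The primitive checks are simplified assumptions, and /example_file is a hypothetical nested type added only to show a property whose type is itself a registered type.

```python
import re

# Simplified primitive checks; the exact syntax of each primitive is an assumption.
PRIMITIVES = {
    "string": lambda v: isinstance(v, str),
    "hex": lambda v: isinstance(v, str)
    and re.fullmatch(r"[0-9a-fA-F]+", v) is not None,
    "boolean": lambda v: isinstance(v, str) and v.lower() in ("yes", "no"),
}

# Property lists taken from the slide, plus one hypothetical nested type.
REGISTERED = {
    "/checksum": [("checksum_type", "string"), ("checksum_value", "hex")],
    "/format": [("mime_type", "string")],
    "/data_mutability": [("mutability", "boolean")],
    "/example_file": [("format", "/format"), ("checksum", "/checksum")],
}


def conforms(type_name, values):
    """Check a tuple of values, in property order, against a registered type."""
    schema = REGISTERED.get(type_name)
    if schema is None or len(schema) != len(values):
        return False
    for (_, prop_type), value in zip(schema, values):
        if prop_type in PRIMITIVES:
            if not PRIMITIVES[prop_type](value):
                return False
        elif not conforms(prop_type, value):
            # The property is itself a registered type: check it recursively.
            return False
    return True


print(conforms("/checksum", ("MD5", "1234abcd5678cdef1234")))
print(conforms("/example_file", (("application/pdf",), ("MD5", "1234abcd"))))
```

Only the minimum properties are checked here; an implementation that allows extra properties, as the note on this slide permits, would need a looser length test.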

