A Lightweight Structured Data Implementation Using JSON-LD and Schema.org for Digital Repository
  Lucas Mak, Lisa Lorenzo, Nicole Smeltekop
  Michigan State University Libraries
  ALCTS CaMMS Cataloging Norms Interest Group (ALA Midwinter 2017, Atlanta GA, January 21, 2017)
Background
  Digital repository @ MSU
    Islandora repository
    Formats
      Text, audio, image, compound object
    Metadata
      MarcXML, MODS, DC, ETD-MS
      Stored as datastreams along with digital objects
    Fedora backend with Drupal front end
Structured Data Markup
  “Describes things on the Web with their properties”*
  Typically uses schema.org vocabulary
  Commonly in JSON-LD, RDFa, or microdata format
What others have done …
What we want …
Mapping
  First, map MODS elements to Schema.org elements
  Not all MODS elements are mapped
Choosing Markup Format
  JSON-LD
    JavaScript Object Notation for Linked Data
    Embedded in the HTML <head> as a single block of code, rather than as attributes added to individual HTML tags
    -> less work
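To make the "single block of code" point concrete, here is a minimal Python sketch of wrapping a JSON-LD document in a <script> element for the page <head>. The helper name jsonld_script_tag and the sample record (title, URL, CreativeWork type) are invented for illustration and are not taken from the MSU repository.

```python
import json


def jsonld_script_tag(data: dict) -> str:
    """Serialize a dict as JSON-LD and wrap it in a <script> element."""
    body = json.dumps(data, indent=2, ensure_ascii=False)
    # Escape "</" so a string value cannot close the <script> element early.
    # "\/" is a valid JSON escape for "/", so the data still parses the same.
    body = body.replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{body}\n</script>'


# Hypothetical record: every value here is an assumption for illustration.
record = {
    "@context": "http://schema.org",
    "@type": "CreativeWork",
    "name": "Sample item",
    "url": "https://example.org/item/1",
}

if __name__ == "__main__":
    print(jsonld_script_tag(record))
```

The page body needs no changes: everything a crawler reads sits in this one element, which is the main practical difference from RDFa or microdata.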
Validation
  Google Structured Data Testing Tool
    Created a sample record based on the mapping and validated it using Google's online validation tool
    Validation tool does not allow mixing and matching of vocabularies
      -> can’t mix dcterms with schema
      -> can’t mix properties from different schema types
  Google Structured Data Testing Tool @ https://search.google.com/structured-data/testing-tool?url
Creating Transformation
  XSLT
  PHP can run XSLT 1.0 only
Implementation
  MODS -> XSLT (run by PHP) -> JSON-LD
  Decided not to store the JSON-LD data as a datastream, to minimize maintenance
  When the item page loads, a PHP script grabs the MODS record, applies the XSLT to it, and embeds the output JSON-LD (wrapped in a <script> tag) into the HTML header
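The on-the-fly workflow above can be sketched as follows. This is not the PHP/XSLT 1.0 code used in the repository: it swaps in Python's xml.etree.ElementTree to show the same idea (read MODS, map a few elements, emit JSON-LD at page load). The element-to-property mapping, the function name mods_to_jsonld, and the sample record are all assumptions; the slides do not list the actual mapping.

```python
import json
import xml.etree.ElementTree as ET

MODS_NS = {"mods": "http://www.loc.gov/mods/v3"}

# Hypothetical MODS record, trimmed to the fields used below.
SAMPLE_MODS = """<mods xmlns="http://www.loc.gov/mods/v3">
  <titleInfo><title>Sample item</title></titleInfo>
  <name><namePart>Doe, Jane</namePart></name>
  <subject>
    <topic valueURI="http://id.worldcat.org/fast/958834">Holocaust memorials</topic>
  </subject>
</mods>"""


def mods_to_jsonld(mods_xml: str) -> dict:
    """Map a few MODS elements to schema.org properties (assumed mapping)."""
    root = ET.fromstring(mods_xml)
    doc = {"@context": "http://schema.org", "@type": "CreativeWork"}

    title = root.findtext("mods:titleInfo/mods:title", namespaces=MODS_NS)
    if title:
        doc["name"] = title

    creators = [e.text for e in root.findall("mods:name/mods:namePart", MODS_NS) if e.text]
    if creators:
        doc["creator"] = [{"@type": "Person", "name": c} for c in creators]

    about = []
    for topic in root.findall("mods:subject/mods:topic", MODS_NS):
        entry = {"@type": "Thing", "name": topic.text}
        uri = topic.get("valueURI")  # URI already present in MODS is carried over
        if uri:
            entry["sameAs"] = uri
        about.append(entry)
    if about:
        doc["about"] = about
    return doc


if __name__ == "__main__":
    # Generated per request, so nothing extra is stored alongside the object.
    print(json.dumps(mods_to_jsonld(SAMPLE_MODS), indent=2))
```

Because the JSON-LD is derived from MODS on every request, a correction to the MODS record shows up in the markup immediately, with no second copy to keep in sync.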
Getting URIs into JSON-LD
  Getting URIs into source data
    URIs inserted by authority vendor during authority processing (for records originating from the catalog)
    Insert URIs into $0 in MarcXML or @valueURI in MODS, manually or programmatically using a conversion table
      Possibly using MarcNext in MarcEdit or querying APIs of various linked data services in the future
    Build URIs for certain elements during transformation from MarcXML to MODS
      e.g. Language code: eng -> http://id.loc.gov/vocabulary/iso639-2/eng
  URIs in MODS get carried over to JSON-LD during XSLT transformation
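The language-code example can be sketched in a few lines of Python. The base URI and the eng example come from the slide; the function name language_uri and the three-letter validation rule are assumptions added for illustration.

```python
import re

LANG_BASE = "http://id.loc.gov/vocabulary/iso639-2/"


def language_uri(code: str) -> str:
    """Build an id.loc.gov URI from a three-letter ISO 639-2 language code."""
    code = code.strip().lower()
    # Assumed sanity check: only accept exactly three ASCII letters.
    if not re.fullmatch(r"[a-z]{3}", code):
        raise ValueError(f"not a three-letter language code: {code!r}")
    return LANG_BASE + code


if __name__ == "__main__":
    print(language_uri("eng"))
```

This only checks the shape of the code, not whether the code actually exists in the ISO 639-2 vocabulary.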
Getting URIs into JSON-LD
  LCSH – the sticking point
    Not all possible LCSH strings have corresponding URIs
      Holocaust memorials -> http://id.loc.gov/authorities/subjects/sh88005153
      Poland -> http://id.loc.gov/authorities/names/n79131071
      Holocaust memorials -- Poland -> ??
    Pattern headings
      Personal narratives, American, [French, etc.] -> http://id.loc.gov/authorities/subjects/sh99001715
      Can we really use this URI?
  FAST – the “solution”
    Holocaust memorials -> http://id.worldcat.org/fast/958834
    Poland -> http://id.worldcat.org/fast/1206891
    Personal narratives--American -> http://id.worldcat.org/fast/1424071
Getting URIs into JSON-LD
  FAST – the “solution”
    Get FAST headings and IDs from LCSH using the OCLC FAST Converter
    Build URIs from the FAST ID inserted in $0 during transformation into MODS
      e.g. World War (1939-1945) (OCoLC)fst01180924 -> http://id.worldcat.org/fast/1180924
    Problem: lag in updates to the converter database
  OCLC FAST Converter @ https://fast.oclc.org/lcsh2fast/
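The FAST ID to URI step can be sketched as below. The input and output values match the slide's example; the function name fast_uri and the exact pattern accepted for $0 values are assumptions, since the slides show only one example of the identifier form.

```python
import re

FAST_BASE = "http://id.worldcat.org/fast/"


def fast_uri(control_number: str) -> str:
    """Turn a $0 value such as '(OCoLC)fst01180924' into a FAST URI."""
    # Assumed input shape: '(OCoLC)fst' followed by digits; leading zeros are dropped.
    match = re.fullmatch(r"\(OCoLC\)fst0*(\d+)", control_number.strip())
    if match is None:
        raise ValueError(f"not a FAST control number: {control_number!r}")
    return FAST_BASE + match.group(1)


if __name__ == "__main__":
    print(fast_uri("(OCoLC)fst01180924"))
```

Dropping the leading zeros matters because the FAST URIs shown on these slides use the bare number, not the zero-padded form found in $0.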
Next Step
  Script to convert LCSH in MODS to FAST using OCLC FAST API
    MarcXML not available for all digital collections
    Moving away from creating MarcXML
  Should we just use FAST??
  Markup in vocabularies other than schema.org, e.g. dcterms
Thank You!
  Lucas Mak makw@mail.lib.msu.edu
  Lisa Lorenzo lorenzo7@mail.lib.msu.edu
  Nicole Smeltekop nicole@mail.lib.msu.edu