Winds of Change James Madison University Libraries’ Participation in the VIVA/Zepheira/Atlas Linked Data Pilot Steven W. Holloway Director of Metadata.


1 Winds of Change James Madison University Libraries’ Participation in the VIVA/Zepheira/Atlas Linked Data Pilot Steven W. Holloway Director of Metadata Strategies, James Madison University Rebecca B. French Metadata Analyst Librarian, James Madison University

2 VIVA/Zepheira/ATLAS Pilot Project
Virtual Library of Virginia Consortium (VIVA) pilot project
Contracted with Zepheira and Atlas Systems
Seven Virginia academic libraries participated
Contributed over 48K Special Collections/Archival Records
JMU contributed 900 Special Collections records
[SLIDE 2.0] The pilot project: The Virtual Library of Virginia Consortium, known by the acronym VIVA, [SLIDE 2.1] contracted in 2016 with Zepheira and Atlas Systems to launch a pilot project using archives/special collections records from member schools in a Linked Data experiment. The rationale for the pilot includes the inability of the major search engines to parse MARC records, and the steep technical hurdle of converting our MARC records into Linked Data and then exposing them to search engine indexing in a sustainable fashion. Adding subfield “0” URIs to MARC records is the easiest part of the conundrum. The pilot project sought to exploit materials that are either rare or unique to the VIVA schools, a desideratum that translated into limiting the data pool to Special Collections and archival collections metadata. [SLIDE 2.2] The seven schools – The Library of Virginia, College of William and Mary, University of Virginia, George Mason University, Virginia Commonwealth University, Virginia Tech, and James Madison University – [SLIDE 2.3] provided over 48,000 MARC XML records. [SLIDE 2.4] JMU’s share was 900 Special Collections records.

3 VIVA/Zepheira/ATLAS Pilot Project
James Madison University used four methods to add over 4.5K subfield “0” URIs to its 900 Special Collections bibliographic records:
Affordances within Sierra ILS
Link Identifier tool in MarcEdit
OpenRefine reconciliation services
XQuery scripting using eXist-db
[SLIDE 3.0] Zepheira’s initial transformation of the MARC records uses pybibframe, a GitHub-accessible tool developed for the Library of Congress. The records are converted into Zepheira’s flavor of BIBFRAME 1.0, known as BIBFRAME Lite. While a step in the Zepheira conversion process entails automated reconciliation with LC authority links, we at JMU preferred to enrich the records ourselves as a means of guaranteeing accurate linkages to authority sources. We used four methods to add over 4,500 subfield “0” URIs to the 900 Special Collections MARC records: [SLIDE 3.1] Affordances within our Sierra ILS, [SLIDE 3.2] Terry Reese’s MarcEdit, [SLIDE 3.3] OpenRefine, [SLIDE 3.4] and XQuery scripting. Thousands of vendor-supplied bib records in our ILS already had OCLC FAST record numbers; it was elementary, if tedious, to convert those into actionable URIs within the ILS itself. Using similar methods, we converted 043 MARC geographic area codes into LC Linked Data URIs. The redoubtable MarcEdit Link Identifier tool, part of the MarcNext suite of tools, did an adequate job of reconciling our 1XX, 6XX and 7XX headings against LCSH and LCNAF. Additional reconciliations were achieved using OpenRefine. None of these methods was effective at capturing 655 genre/form headings, so we loaded the enriched MARCXML data into eXist-db and added about a thousand LCGFT, FAST and AAT URIs using XQuery database scripting.
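As a rough illustration of the identifier-to-URI step described for slide 3, here is a minimal Python sketch. It is not the procedure JMU ran inside Sierra; the function names and the input shapes (a FAST number written like `(OCoLC)fst01234567`, and an 043 code padded with trailing hyphens) are assumptions made for the example. The two base URIs follow the public FAST and id.loc.gov URI patterns.

```python
import re

# Base URIs for the two vocabularies mentioned in the talk.
FAST_BASE = "http://id.worldcat.org/fast/"
GAC_BASE = "http://id.loc.gov/vocabulary/geographicAreas/"


def fast_uri(identifier: str) -> str:
    """Build a FAST URI from a record number such as '(OCoLC)fst01234567'.

    Assumption: the number carries an 'fst' prefix and zero padding,
    both of which are dropped in the URI form.
    """
    match = re.search(r"fst0*(\d+)", identifier)
    if match is None:
        raise ValueError(f"not a FAST record number: {identifier!r}")
    return FAST_BASE + match.group(1)


def gac_uri(code: str) -> str:
    """Build an LC URI from an 043 geographic area code such as 'n-us-va'.

    Assumption: trailing hyphens used as padding (e.g. 'n-us---')
    are not part of the URI.
    """
    trimmed = code.strip().rstrip("-")
    if not trimmed:
        raise ValueError("empty geographic area code")
    return GAC_BASE + trimmed


if __name__ == "__main__":
    print(fast_uri("(OCoLC)fst01234567"))
    print(gac_uri("n-us-va"))
```

The resulting strings are what would be written into subfield “0” of the matching heading or 043 field.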

4 Base URI: http://link.lib.jmu.edu
[SLIDE 4] What does Zepheira Library Link look like? To human beings, the HTML pages index the BIBFRAME 1.0 records into faceted classes, called “Resources,” that are both familiar and unfamiliar to MARC mavens like you.

5 [SLIDE 5] Opening a Form facet heading, “Minute Books”

6 VIVA/Zepheira/ATLAS Pilot Project
[SLIDE 6] brings up a “Resource” page, with the Form heading and the records that use it, in this case a single item.

7 [SLIDE 7] Selecting the Item link brings up a short-form BIBFRAME Item record that can be expanded into its constituent

8 [SLIDE 8] Work and

9 [SLIDE 9] Instance records.

10 VIVA/Zepheira/ATLAS Pilot Project
[SLIDE 10] If a heading with an authority URI is opened using the “focus of” link on the right,

11 VIVA/Zepheira/ATLAS Pilot Project
[SLIDE 11] a hyperlink to the remote authority record appears, together with the vocabulary name.

12 [SLIDE 12] The “Borrow it” link directs the would-be patron to the main Sierra catalog webpage.

13 VIVA/Zepheira/ATLAS Pilot Project
JMU Library Link structured data
Format: RDFa, meta tags with no namespaces
Vocabularies (distinct tag count): RDF (1), RDFS (1), RDFa (2), OGP (2), VOID (2), Creative Commons (3), DC (4), DC Terms (6), schema.org (37), BIBFRA.ME/vocab/ (45)
[SLIDE 13] What do the machines see when they parry and thrust with Library Link websites? For our records, Zepheira Library Link injected ten different vocabularies into the HTML code using RDFa and meta tags. The structured data is mostly schema.org and BIBFRAME Lite, using 37 and 45 distinct URIs, respectively, in the sample we analyzed. Only a few values from the MARC Leader and control fields were translated into BIBFRAME; the 043 MARC Geographic Area URIs we had added as subfield “0”s were among the casualties, unfortunately. Catering to the bean-counting propensities of the search engines, the markup is massively redundant, with multiple predicates used for the same literal object values: DC Description, DC Terms Description, schema.org Description, BIBFRAME Lite Description, etc.
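A tally like the per-vocabulary counts on slide 13 could be produced along the following lines. This is a sketch under stated assumptions, not the method used for the talk: it assumes the RDFa attributes carry full URIs (consistent with “no namespaces”), it only checks an illustrative subset of four vocabularies, and the `SAMPLE` markup is invented for the example rather than copied from a Library Link page.

```python
from collections import defaultdict
from html.parser import HTMLParser

# Illustrative subset of namespaces; the slide lists ten vocabularies.
VOCABULARIES = {
    "schema.org": "http://schema.org/",
    "BIBFRAME Lite": "http://bibfra.me/vocab/",
    "DC Terms": "http://purl.org/dc/terms/",
    "DC": "http://purl.org/dc/elements/1.1/",
}


class PropertyCollector(HTMLParser):
    """Collect the distinct URIs found in RDFa-style attributes."""

    def __init__(self):
        super().__init__()
        self.uris = set()

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            # These attributes may hold several space-separated URIs.
            if name in ("property", "typeof", "rel") and value:
                self.uris.update(value.split())


def count_by_vocabulary(html: str) -> dict:
    """Return the number of distinct URIs seen per known vocabulary."""
    collector = PropertyCollector()
    collector.feed(html)
    counts = defaultdict(int)
    for uri in collector.uris:
        for label, namespace in VOCABULARIES.items():
            if uri.startswith(namespace):
                counts[label] += 1
                break
    return dict(counts)


# Hypothetical markup showing the redundancy the talk describes:
# one literal value carried by several predicates.
SAMPLE = """
<div typeof="http://schema.org/CreativeWork">
  <span property="http://schema.org/name http://bibfra.me/vocab/lite/name">Minute books</span>
  <meta property="http://schema.org/description" content="Example description"/>
  <meta property="http://purl.org/dc/terms/description" content="Example description"/>
</div>
"""

if __name__ == "__main__":
    print(count_by_vocabulary(SAMPLE))
```

Counting distinct URIs rather than occurrences matches the slide's "distinct tag count" framing; repeated use of the same predicate on a page is counted once.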

14 [SLIDE 14.0] Looking at a subset of the triples in the JMU Library Link entry page, [SLIDE 14.1] you will observe a specification of the data presentation as RDFa, [SLIDE 14.2] the list of Resource types, [SLIDE 14.3] with the title and [SLIDE 14.4] description of the dataset.

15 VIVA/Zepheira/ATLAS Pilot Project
[SLIDE 15] As you can see in the raw source code, the resources themselves, like the form heading “Minute Books,” are each assigned an internal URI by the system.

16 VIVA/Zepheira/ATLAS Pilot Project
[SLIDE 16] The literal value “Minute books” is the object of redundant schema.org and BIBFRAME Lite predicates.

17 VIVA/Zepheira/ATLAS Pilot Project
[SLIDE 17.0] The webpage with the authority record links identifies the literal value of the heading with a BIBFRAME Lite “name” predicate, [SLIDE 17.1] uses the authority record link itself as the object of a schema.org “sameAs” predicate, [SLIDE 17.2] and specifies the authority vocabulary itself with a BIBFRAME MARC “source” predicate.

18 Page 1 Location of Library Link Records in Search Results (Partial)
[SLIDE 18] What do the search engines make of our hyper-structured BIBFRAME Lite records? We tested queries for four Instance resource records and one Person resource record using both quoted and unquoted search strings, and added “jmu” to the unquoted search strings. We ran the queries using four search engines: Google, Bing, Yahoo, and Duck Duck Go. The Google, Bing and Yahoo searches ran on the current Firefox browser for Mac OS, whereas Duck Duck Go was tested using the anonymity network browser Tor Browser. Of these 14 discrete searches, the Library Link record appeared as a result on Google’s first page for 4 of them. Bing returned the Library Link record in 9 out of 14 first page search results, Yahoo returned none, and Duck Duck Go, using Tor Browser, returned 8 out of 14 first page search results.
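For comparison across engines, the first-page counts reported for slide 18 can be expressed as percentages of the 14 searches. The short Python sketch below only restates the figures given above; the variable and function names are made up for the example.

```python
# First-page appearances of the Library Link record, out of 14 searches,
# as reported in the talk.
TOTAL_SEARCHES = 14
first_page_hits = {"Google": 4, "Bing": 9, "Yahoo": 0, "Duck Duck Go": 8}


def hit_rate(hits: int, total: int = TOTAL_SEARCHES) -> float:
    """Percentage of searches with a first-page Library Link result."""
    return round(100 * hits / total, 1)


rates = {engine: hit_rate(hits) for engine, hits in first_page_hits.items()}

if __name__ == "__main__":
    for engine, rate in rates.items():
        print(f"{engine}: {rate}%")
```

This works out to roughly 29% for Google, 64% for Bing, 0% for Yahoo, and 57% for Duck Duck Go, on a sample that the presenters themselves describe as small.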

19 VIVA/Zepheira/ATLAS Pilot Project
Google, Bing, and Duck Duck Go are using the structured data; Yahoo is not
Library Link records are usually ranked highly when they appear in search results
Google shows Library Link results for items that are also in JMU Scholarly Commons
Bing typically shows Library Link results for items with online finding aids
[SLIDE 19.0] A few things we’ve noticed from this (admittedly small) sample: [SLIDE 19.1] It’s clear that Google, Bing and Duck Duck Go are all using the structured data. Yahoo is not. [SLIDE 19.2] When our Library Link records are present in the search results, they are ranked highly, usually appearing in the top few hits on the first page. [SLIDE 19.3] None of Google, Bing, or Duck Duck Go found the Library Link record for every item we searched, and there were some interesting differences in which items were found by which search engine. With one exception, Google only found the Library Link record for items that have been digitized and are available in our institutional repository, JMU Scholarly Commons. [SLIDE 19.4] With one exception, Bing returned Library Link results when the item also had an online finding aid, either on the JMU Special Collections website or through the Virginia Heritage database.

20 VIVA/Zepheira/ATLAS Pilot Project
Adding “jmu,” the school initials, to the search strings only generated one Library Link first page result
“Simms, Lucy,” the only person subject heading we searched for, garnered Library Link first page results using only Bing and Duck Duck Go
[SLIDE 20.1] Adding “jmu,” the initials of James Madison University, to the search strings changed the rankings of the first page results, but unambiguously generated only one Library Link first page result. [SLIDE 20.2] “Simms, Lucy,” the only person subject heading we searched for, garnered Library Link first page results using Bing and Duck Duck Go, despite the brevity of the common name. Lucy F. Simms was a prominent African American educator with several recently mounted JMU-sponsored websites and related LC subject headings, and she has the distinction of having both a Harrisonburg, Virginia public school and a JMU research center named after her. Most of the other first page hits in all the search engines we tested pointed at the correct entity, to be sure, but it is still an impressive feat for Library Link’s SEO to corral a search string as unpromising as “Simms, Lucy,” with or without the “jmu” descriptor, with or without quotation marks.

21 VIVA/Zepheira/ATLAS Pilot Project
Conclusions:
The MARCXML => BIBFRAME Lite conversion captured institution-supplied subfield “0” URIs in all 1XX, 6XX, 7XX variable fields we tested, but did not preserve the 043 MARC Geographic Area URIs.
The extreme structured data approach often, but not invariably, yields first-page SEO results for the search engines that support it, with or without adding an institution’s initials to the search string.
Library Link’s success in bagging Bing and Duck Duck Go person subject heading searches on “Simms, Lucy” is impressive.
Maybe there is a future for subject analysis cataloging…
[SLIDE 21.1] Conclusions: [SLIDE 21.2] The MARCXML => BIBFRAME Lite conversion captured institution-supplied subfield “0” URIs in all 1XX, 6XX, 7XX variable fields we tested, but did not preserve the 043 MARC Geographic Area URIs. [SLIDE 21.3] Zepheira’s extreme structured data approach often, but not invariably, yields first-page SEO results for the search engines that support it, with or without adding an institution’s initials to the search string. [SLIDE 21.4] Library Link’s success in bagging Bing and Duck Duck Go person subject heading searches on “Simms, Lucy” is impressive. [SLIDE 21.5] Maybe there is a future for subject analysis cataloging after all…

22 Questions? Illustration adapted from Please include rights metadata next time you create a WordPress site.

