Linked Data for Libraries (LD4L) CUL Metadata Working Group May 15, 2015.

Linked Data for Libraries (LD4L) CUL Metadata Working Group May 15, 2015

Linked Data for Libraries (LD4L) Andrew W. Mellon Foundation made a two- year $999K grant to Cornell, Harvard, and Stanford starting Jan 2014 Partners will work together to develop an ontology and linked data sources that provide relationships, metadata, and broad context for Scholarly Information Resources Leverages existing work by both the VIVO project and the Hydra Partnership

Specific LD4L Goals Free information from existing library system silos to provide context and enhance discovery of scholarly information resources Leverage usage information about resources Link bibliographic data about resources with academic profile systems and other external linked data sources Assemble (and where needed create) a flexible, extensible LD ontology to capture all this information about our library resources Demonstrate combining and reconciling the assembled LD across our three institutions

Bibliographic Data MARC MODS EAD Bibliographic Data MARC MODS EAD Person Data VIVO ORCID ISNI VIAF Person Data VIVO ORCID ISNI VIAF Usage Data Circulation Citation Curation Exhibits Research Guides Syllabi Tags Usage Data Circulation Citation Curation Exhibits Research Guides Syllabi Tags LD4L Data Sources

Agenda Simeon – Intro (done) and use cases Jon – Ontologies Steven – Modified Bibframe for LD4L Rebecca – UC2, mass conversion, post processing Lynette – UC1.1, ORE, scraping, lookups Naun – Preview of LD4P

LD4L Use Case Clusters 1.Bibliographic + curation data 2.Bibliographic + person data 3.Leveraging external data including authorities 4.Leveraging the deeper graph (via queries or patterns) 5.Leveraging usage data 6.Three-site services, e.g. cross-site search 42 Raw Use Cases 12 Refined Use Cases in 6 clusters…

Cluster 1. Bibliographic + curation data UC1.1: Build a virtual collection “As a faculty member or librarian, I want to create a virtual collection or exhibit containing information resources from multiple collections across multiple universities either by direct selection or by a set of resource characteristics, so that I can share a focused collection with a.”

UC1.1: Build a virtual collection Individual selection or resources Collection may be exhibit, reading list, etc. Includes description, ordering Shared or private Enable across institutions Well developed, uses annotations Demo => Lynette Rayle

UC1.2: Tag scholarly information resources to support reuse “As a librarian, I would like to be able to tag scholarly information resources from one or multiple institutions into curated lists, so that I can feed these these lists into subject guides, course reserves, or reference collections. I'd like these lists to be portable (into Drupal, into LibGuides, into Spotlight! or Omeka, into Sakai, e.g.) and durable. I'd like these lists/tags to selectively feed back into the discovery environment without having to modify the catalog records.”

UC1.2: Tag scholarly information resources to support reuse Scale O(100,000) items Use for e.g. virtual subject library Selection by rules and patterns Collaborative curation Feed data into discovery environment Same model as 1.1, partially developed CULLR v2

CULLR -> provide virtual library “views/collections” as part of main Blacklight discovery system

Cluster 2. Bibliographic + person data UC2.1: See and search on works by people to discover more works, and better understand people “As a researcher, I'd like to see / search on works by University faculty, to discover works of interest based on connection to people, and to understand people based on their relation to works.”

UC2.1: See and search on works by people to discover more works, and better understand people Profile + catalog data Perhaps between institutions: multiple catalogs + multiple profile systems Poor journal data suggests likely can do more in non-journal disciplines Demo => Rebecca Younes

UC3.1: Search with Geographic Data for Record Enrichment and Pivoting “As a researcher, I'd like to see the geographic context of my search results, and be able to pivot, extend or refine a search with a single click, in order to better assess found resources, find related resources, and filter or expand search results to broaden or narrow a search on the fly.”

UC3.x: Search with XYZ for Record Enrichment and Pivoting XYZ = geographic / subject /person data External data and entity linking Things not strings Different data types have different challenges and data sources

Cluster 4. Leveraging the deeper graph (via queries or Patterns) UC4.1: Identifying related works “As a scholar, I would like to find all the images associated with various instances of a work sorted by time, so that I can see how the depictions of or illustrations in a work have changed over time.” “(GloPAD specific): As a scholar, I would like to find all costume photographs and scene illustrations for various stagings and performances of the plays of a particular author or the operas of a particular composer, so that I can see how the visual look of performances of the plays or operas have changed over time.”

UC4.1: Identifying related works use complex graph relationships via queries or patterns (rather than direct connections allow discovery that would not be possible without the semantics of different relationship restricted by available data Steven Folsom work on complex and linked descriptions for hip-hop flyers

UC4.2: Leverage the deeper graph to surface more relevant works “As a researcher, I would like to see resources in response to a search where the relevance ranking of the results reflects the "importance" of the works, based on how they have been used or selected by others, so that I can find important resources that might otherwise be "hidden" in a large set of results.”

Cluster 5. Leveraging usage data UC5.1: Research guided by community usage “As a researcher, I want to find what is being used (read, annotated, bought by libraries, etc.) by the scholarly communities not only at my institution but at others, and to find sources used elsewhere but not by my community”

UC5.1: Research guided by community usage need to understand community of user – direct: auth and from identity? – inferred as part of discovery? cross-institutional – need usage data that can be shared and merged

UC5.2: Be guided in collection building by usage “As a librarian, I would like help building my collection by seeing what is being used by students and faculty.” “As a subject librarian, I would like to see what resources in my subject area are heavily used at peer institutions but are not in my institution’s collection.”

UC5.2: Be guided in collection building by usage business analytics help libraries make best use of collection building activities and funds useful at both institutional or cross- institutional levels cross-institutional – need usage data that can be shared and merged

Usage data Most work at Harvard

Cluster 6. Three site services UC6.1: Cross-site search As a scholar I don’t want my discovery process to be constrained by the collection boundaries of my university yet I want to retain the detailed coverage of special collections that are important in my field. Bonus points: I want results ranked by the scholarly value, not simply popularity in the public eye.

UC6.1: Cross-site search demonstrate the power of sharing library data as linked open data – opportunities for third parties – benefits for partners deal with dupes & near dupes, group in works Will have simple demo by end of project using all the LOD from our three institutions merged in LD4L ontology / Bibframe

LD4L Ontology Team Overview Metadata Working Group May 15, 2015 Jon Corson-Rikert

The team

Goals for an LD4L ontology framework (from the 2013 proposal) reuse appropriate parts of currently available ontologies while introducing extensions and additions where necessary be sufficiently expressive to encompass traditional catalog metadata from the 3 partners maintain compatibility with VIVO and research networking ontologies include usage and other contextual elements

Early LD4L discussions What library information is local vs. global? How do we add links to external identifiers, authorities, and real world objects (RWOs)? How do we link across our libraries? What existing ontologies should we use? What services and workflows do we need? Which services are (or aren’t) ready for prime time?

Other starting points LD4L is not about original cataloging (that’s LD4P) LD4L is not trying to reconcile the different approaches of Schema.org and BIBFRAME Linking to data beyond library catalogs is a focus – Predisposed to re-use ontologies appropriate to the domain involved Ontology work has focused on our use cases

MWG April 18, 2003

Ontology design Source: MK Bergmann, http://www.mkbergman.com

Ontologies need to evolve http://www.mkbergman.com/908/a-new-methodology-for-building-lightweight-domain-ontologies/

Ontology orthogonality http://www.w3.org/2007/Talks/0223-Bangalore-IH/Slides.html

BIBFRAME and Schema.org in LD4L We considered the needs of our project and our libraries Both will likely need the greater expressiveness that BIBFRAME offers – Technical services units are actively participating in BIBFRAME training and trials Not an either/or proposition -- libraries also want broader discoverability – Value in exposing bibliographic metadata on the web with Schema.org tags

Conversations continue on the BF list … “Much of Bibframe’s problems seem to come from trying to keep everything from the past (MARC) while moving to something fit for the future, an ambition that has I think also afflicted RDA” (Thomas Meehan) “Why do the 'powers that be' think that we even want our local catalogs to be semantically connected to the web or have all of our data linked?!” (Michael Ayres) “The whole linking idea is great, but really, after 40 years using MARC21, some yahoo wants to unravel everything and bill me for it? I don’t think so.” (Jeffrey Trimble)

Simple core framework

Becomes more complex …

Gets downright messy …

Simple framework, ironing out details

More work in progress …

Local vs. global identifiers Establishing local identifiers allows libraries to make their own assertions about resources and authorities – Assigning stable linked data URIs are accessible anywhere Shared references to global identifiers enable both direct and indirect linkages OCLC, VIAF, ISNI, ORCID and others are addressing global identifiers at scale

Local linked data identifiers For library resources but also local or unique information on people, organizations, events, collections – There is value and credibility in the institutional namespace Supplement with locally-sourced and/or locally-targeted annotations – Not restricted to local generation or visibility Does not require exposing all operational data

From strings to things People Organizations Places Subjects Events Works Datasets Image: designyoutrust.com

Entity resolution Can potentially happen before, during, or after MARC to RDF conversion Can draw on existing authorities directly and indirectly – Local sources may involve custom workflows and services – Remote sources are likely shared and can more likely benefit from standardized services Assess potential to loop back – From non-MARC sources to other catalog resources – From external sources to local

Working with non-MARC metadata Faculty research profiles (CAP, Faculty Finder, VIVO) Library guides and other library-sourced web content Digital collections that vary widely in size and complexity, and that encompass diverse subject domains Pilot projects – Cornell’s Hip Hop flyers – Harvard’s Visual Image Access metadata

Usage data Inspiration – Harvard’s Stacklife Goals – To supplement library discovery interfaces – To inform collection review and additions Challenges – Data availability – Concerns for patron privacy Potential for a normalized stack score across institutions

Stacklife screenshot

LD4L and BIBFRAME: Then and Now Image credit: http://www.chicagobearboards.com/node/4372http://www.chicagobearboards.com/node/4372 Steven Folsom Cornell University Library Metadata Working Group Forum May 15 th, 2015

Why is LD4L attempting to Use BIBFRAME? An ontology built by the Library of Congress as a replacement for MARC –If/when it solidifies, we will likely see data provided by libraries using this model Open Source tools available for converting MARC to BIBFRAME LD4L perspective is that BIBFRAME will likely be a starting point, with post-processing needed to meet some of our use cases Didn’t want to develop an ontology from scratch within the timeframe of the grant…

BIBFRAME Vocabulary Status (As of May 15, 2015, 11 am EST) Paused in its current/first version since January 2014 to allow the community to respond with feedback and so that tools could be built to test it in pilot projects Expecting a new version this spring based on feedback, more on this in a moment Structure Classes: Work, Instance, Authority and Annotation –bf:Work loosely conflates FRBR concepts of Works and Expressions –bf:Instance loosely aligns with FRBR Manifestion –No equivalent to FRBR Item yet in the vocabulary

What BIBFRAME Is. Image credits: http://bibframe.org/vocab-model/ and http://en.wikipedia.org/wiki/George_Clooneyhttp://bibframe.org/vocab-model/http://en.wikipedia.org/wiki/George_Clooney

Image credits: http://bibframe.org/vocab-model/ and http://en.wikipedia.org/wiki/George_Clooneyhttp://bibframe.org/vocab-model/http://en.wikipedia.org/wiki/George_Clooney What BIBFRAME Isn’t.

LD4L’s Role in Grounding BIBFRAME in Linked Data Best Practices Image credit: http://www.moviefanatic.com/2014/01/gravity-george-clooney-reacts-to-oscar-nominations/

BIBFRAME: Areas for Improvement Including, but not limited to…

Improvements: Reuse Existing Classes and Properties from Proven Ontologies bf:Person foaf:Person bf:Event schema:Event bf:subject dcterms:subject

Improvements: Provide Inverse Relationships For example: bf:hasPart, add bf:isPartOf bf:hasReproduction, add bf:reproduces) dcterms:creator, add bf:creatorOf

Improvements: Lessen the number of sub- properties that do not add semantics For Example: bf:titleValue doesn’t add more meaning between a title entity and its value. The more generalizable rdf:value works fine.

Improvements: Focus on Things not Strings

BIBFRAME (Post-Makeover)

Questions I want to answer going forward… How will descriptive practice change in with this more modular approach to the structure? –What changes to content standards will be needed? (Especially for related entities, e.g. People, Publishers…) What new data sources will the move to RDF allow for? (VIVO, MusicBrainz, Getty Vocabularies, ISNI) What types of entities can we describe better in RDF? (e.g. Awards, Events) How can we take advantage of URIs right now in MARC, while ensuring a smoother transition to linked data?

Linked Data for Libraries (LD4L) CUL Metadata Working Group May 15, 2015.

Similar presentations

Presentation on theme: "Linked Data for Libraries (LD4L) CUL Metadata Working Group May 15, 2015."— Presentation transcript:

Similar presentations

About project

Feedback

Log in

Auth with social network:

Linked Data for Libraries (LD4L) CUL Metadata Working Group May 15, 2015.

Similar presentations

Presentation on theme: "Linked Data for Libraries (LD4L) CUL Metadata Working Group May 15, 2015."— Presentation transcript:

Similar presentations

About project

Feedback