Linked Data for Libraries (LD4L) CUL Metadata Working Group May 15, 2015.

Linked Data for Libraries (LD4L)
The Andrew W. Mellon Foundation made a two-year $999K grant to Cornell, Harvard, and Stanford starting Jan 2014
Partners will work together to develop an ontology and linked data sources that provide relationships, metadata, and broad context for Scholarly Information Resources
Leverages existing work by both the VIVO project and the Hydra Partnership

Specific LD4L Goals
Free information from existing library system silos to provide context and enhance discovery of scholarly information resources
Leverage usage information about resources
Link bibliographic data about resources with academic profile systems and other external linked data sources
Assemble (and where needed create) a flexible, extensible LD ontology to capture all this information about our library resources
Demonstrate combining and reconciling the assembled LD across our three institutions

LD4L Data Sources
Bibliographic Data: MARC, MODS, EAD
Person Data: VIVO, ORCID, ISNI, VIAF
Usage Data: Circulation, Citation, Curation, Exhibits, Research Guides, Syllabi, Tags

Agenda
Simeon – Intro (done) and use cases
Jon – Ontologies
Steven – Modified Bibframe for LD4L
Rebecca – UC2, mass conversion, post processing
Lynette – UC1.1, ORE, scraping, lookups
Naun – Preview of LD4P

LD4L Use Case Clusters
1. Bibliographic + curation data
2. Bibliographic + person data
3. Leveraging external data including authorities
4. Leveraging the deeper graph (via queries or patterns)
5. Leveraging usage data
6. Three-site services, e.g. cross-site search
42 raw use cases; 12 refined use cases in 6 clusters…

Cluster 1. Bibliographic + curation data UC1.1: Build a virtual collection “As a faculty member or librarian, I want to create a virtual collection or exhibit containing information resources from multiple collections across multiple universities either by direct selection or by a set of resource characteristics, so that I can share a focused collection with a.”

UC1.1: Build a virtual collection
Individual selection or resources
Collection may be exhibit, reading list, etc.
Includes description, ordering
Shared or private
Enable across institutions
Well developed, uses annotations
Demo => Lynette Rayle

UC1.2: Tag scholarly information resources to support reuse “As a librarian, I would like to be able to tag scholarly information resources from one or multiple institutions into curated lists, so that I can feed these lists into subject guides, course reserves, or reference collections. I'd like these lists to be portable (into Drupal, into LibGuides, into Spotlight! or Omeka, into Sakai, e.g.) and durable. I'd like these lists/tags to selectively feed back into the discovery environment without having to modify the catalog records.”

UC1.2: Tag scholarly information resources to support reuse
Scale O(100,000) items
Use for e.g. virtual subject library
Selection by rules and patterns
Collaborative curation
Feed data into discovery environment
Same model as 1.1, partially developed
CULLR v2

CULLR -> provide virtual library “views/collections” as part of main Blacklight discovery system

Cluster 2. Bibliographic + person data UC2.1: See and search on works by people to discover more works, and better understand people “As a researcher, I'd like to see / search on works by University faculty, to discover works of interest based on connection to people, and to understand people based on their relation to works.”

UC2.1: See and search on works by people to discover more works, and better understand people
Profile + catalog data
Perhaps between institutions: multiple catalogs + multiple profile systems
Poor journal data suggests likely can do more in non-journal disciplines
Demo => Rebecca Younes

UC3.1: Search with Geographic Data for Record Enrichment and Pivoting “As a researcher, I'd like to see the geographic context of my search results, and be able to pivot, extend or refine a search with a single click, in order to better assess found resources, find related resources, and filter or expand search results to broaden or narrow a search on the fly.”

UC3.x: Search with XYZ for Record Enrichment and Pivoting
XYZ = geographic / subject / person data
External data and entity linking
Things not strings
Different data types have different challenges and data sources

Cluster 4. Leveraging the deeper graph (via queries or patterns) UC4.1: Identifying related works “As a scholar, I would like to find all the images associated with various instances of a work sorted by time, so that I can see how the depictions of or illustrations in a work have changed over time.” “(GloPAD specific): As a scholar, I would like to find all costume photographs and scene illustrations for various stagings and performances of the plays of a particular author or the operas of a particular composer, so that I can see how the visual look of performances of the plays or operas has changed over time.”

UC4.1: Identifying related works
use complex graph relationships via queries or patterns (rather than direct connections)
allow discovery that would not be possible without the semantics of different relationships
restricted by available data
Steven Folsom's work on complex and linked descriptions for hip-hop flyers
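The multi-hop pattern behind UC4.1 (work, then its instances, then the images attached to each instance, sorted by time) can be sketched with plain Python tuples standing in for triples. This is only an illustration: the `ex:` property names, the identifiers, and the dates are all hypothetical and are not LD4L or BIBFRAME terms.

```python
# Toy triple store: (subject, predicate, object).
# Every identifier and property name here is hypothetical.
TRIPLES = [
    ("work:hamlet", "ex:hasInstance", "inst:1905-staging"),
    ("work:hamlet", "ex:hasInstance", "inst:1964-staging"),
    ("inst:1905-staging", "ex:hasImage", "img:costume-a"),
    ("inst:1964-staging", "ex:hasImage", "img:scene-b"),
    ("inst:1964-staging", "ex:hasImage", "img:costume-c"),
    ("img:costume-a", "ex:date", "1905"),
    ("img:scene-b", "ex:date", "1964"),
    ("img:costume-c", "ex:date", "1963"),
]


def objects(subject, predicate, triples=TRIPLES):
    """Return every object of the given subject/predicate pair."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def images_for_work(work, triples=TRIPLES):
    """Follow work -> instance -> image, then sort the images by date."""
    found = []
    for instance in objects(work, "ex:hasInstance", triples):
        for image in objects(instance, "ex:hasImage", triples):
            dates = objects(image, "ex:date", triples)
            # Undated images sort first because "" precedes any year string.
            found.append((dates[0] if dates else "", image))
    return [image for _, image in sorted(found)]
```

None of the images is linked directly to the work; they are only reachable through the intermediate instances, which is the point of querying by pattern rather than by direct connection.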

UC4.2: Leverage the deeper graph to surface more relevant works “As a researcher, I would like to see resources in response to a search where the relevance ranking of the results reflects the "importance" of the works, based on how they have been used or selected by others, so that I can find important resources that might otherwise be "hidden" in a large set of results.”

Cluster 5. Leveraging usage data UC5.1: Research guided by community usage “As a researcher, I want to find what is being used (read, annotated, bought by libraries, etc.) by the scholarly communities not only at my institution but at others, and to find sources used elsewhere but not by my community”

UC5.1: Research guided by community usage
need to understand community of user
– direct: auth and from identity?
– inferred as part of discovery?
cross-institutional
– need usage data that can be shared and merged

UC5.2: Be guided in collection building by usage “As a librarian, I would like help building my collection by seeing what is being used by students and faculty.” “As a subject librarian, I would like to see what resources in my subject area are heavily used at peer institutions but are not in my institution’s collection.”

UC5.2: Be guided in collection building by usage
business analytics help libraries make best use of collection building activities and funds
useful at both institutional and cross-institutional levels
cross-institutional – need usage data that can be shared and merged
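The subject-librarian question in UC5.2 (what is heavily used at peer institutions but missing locally) reduces to merging usage counts and subtracting local holdings. A minimal sketch, assuming usage has already been aggregated to per-item counts that are safe to share; the function name, the `min_uses` threshold, and the identifier format are illustrative assumptions.

```python
from collections import Counter


def heavily_used_elsewhere(local_holdings, peer_usage, min_uses=10):
    """List items used at least `min_uses` times across peers but not held locally.

    local_holdings: set of item identifiers held by this library.
    peer_usage: {institution name: {item identifier: usage count}}.
    Results are ordered by total usage, highest first.
    """
    totals = Counter()
    for usage in peer_usage.values():
        totals.update(usage)  # adds each institution's counts to the totals
    candidates = [
        (count, item)
        for item, count in totals.items()
        if count >= min_uses and item not in local_holdings
    ]
    return [item for count, item in sorted(candidates, key=lambda c: (-c[0], c[1]))]
```

Working from aggregated counts rather than individual transactions is one way to keep the shared data clear of patron-level detail.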

Usage data Most work at Harvard

Cluster 6. Three site services UC6.1: Cross-site search As a scholar I don’t want my discovery process to be constrained by the collection boundaries of my university yet I want to retain the detailed coverage of special collections that are important in my field. Bonus points: I want results ranked by the scholarly value, not simply popularity in the public eye.

UC6.1: Cross-site search
demonstrate the power of sharing library data as linked open data
– opportunities for third parties
– benefits for partners
deal with dupes & near dupes, group in works
Will have simple demo by end of project using all the LOD from our three institutions merged in LD4L ontology / Bibframe
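One crude way to "deal with dupes & near dupes" and group records into works is to build a normalized title/author key and bucket records from all three sites by that key. The sketch below is an assumption about how such grouping could start, not the project's method; the record fields (`id`, `title`, `author`) and the sample identifiers are hypothetical.

```python
import re
import unicodedata
from collections import defaultdict


def match_key(title, author):
    """Build a crude grouping key: lowercase, strip accents and punctuation."""
    def norm(text):
        text = unicodedata.normalize("NFKD", text)
        text = "".join(ch for ch in text if not unicodedata.combining(ch))
        text = re.sub(r"[^a-z0-9 ]+", " ", text.lower())
        return " ".join(text.split())
    return (norm(title), norm(author))


def group_into_works(records):
    """Group record ids that share the same normalized title/author key."""
    groups = defaultdict(list)
    for record in records:
        key = match_key(record["title"], record["author"])
        groups[key].append(record["id"])
    return dict(groups)
```

Exact-key matching only catches trivial variation (case, accents, trailing punctuation); real near-duplicate detection would need fuzzier comparison and more fields.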

LD4L Ontology Team Overview Metadata Working Group May 15, 2015 Jon Corson-Rikert

The team

Goals for an LD4L ontology framework (from the 2013 proposal)
reuse appropriate parts of currently available ontologies while introducing extensions and additions where necessary
be sufficiently expressive to encompass traditional catalog metadata from the 3 partners
maintain compatibility with VIVO and research networking ontologies
include usage and other contextual elements

Early LD4L discussions
What library information is local vs. global?
How do we add links to external identifiers, authorities, and real world objects (RWOs)?
How do we link across our libraries?
What existing ontologies should we use?
What services and workflows do we need?
Which services are (or aren’t) ready for prime time?

Other starting points
LD4L is not about original cataloging (that’s LD4P)
LD4L is not trying to reconcile the different approaches of Schema.org and BIBFRAME
Linking to data beyond library catalogs is a focus
– Predisposed to re-use ontologies appropriate to the domain involved
Ontology work has focused on our use cases

MWG April 18, 2003

Ontology design Source: MK Bergmann,

Ontologies need to evolve

Ontology orthogonality

BIBFRAME and Schema.org in LD4L
We considered the needs of our project and our libraries
Both will likely need the greater expressiveness that BIBFRAME offers
– Technical services units are actively participating in BIBFRAME training and trials
Not an either/or proposition -- libraries also want broader discoverability
– Value in exposing bibliographic metadata on the web with Schema.org tags

Conversations continue on the BF list … “Much of Bibframe’s problems seem to come from trying to keep everything from the past (MARC) while moving to something fit for the future, an ambition that has I think also afflicted RDA” (Thomas Meehan) “Why do the 'powers that be' think that we even want our local catalogs to be semantically connected to the web or have all of our data linked?!” (Michael Ayres) “The whole linking idea is great, but really, after 40 years using MARC21, some yahoo wants to unravel everything and bill me for it? I don’t think so.” (Jeffrey Trimble)

Simple core framework

Becomes more complex …

Gets downright messy …

Simple framework, ironing out details

More work in progress …

Local vs. global identifiers
Establishing local identifiers allows libraries to make their own assertions about resources and authorities
– Assigning stable linked data URIs that are accessible anywhere
Shared references to global identifiers enable both direct and indirect linkages
OCLC, VIAF, ISNI, ORCID and others are addressing global identifiers at scale
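The "indirect linkages" idea can be shown in a few lines: when two institutions each assert that their own local URI corresponds to the same global identifier, the two local URIs become linked without either library referring to the other. In this sketch the `.example` URIs and the identifier values are made-up placeholders, and the pairing logic is an illustration rather than a description of any LD4L service.

```python
from collections import defaultdict

# (local URI, global identifier) assertions, as each institution might publish them.
# All URIs and identifier values are hypothetical placeholders.
SAME_AS = [
    ("https://cornell.example/person/42", "viaf:0001"),
    ("https://harvard.example/agent/abc", "viaf:0001"),
    ("https://stanford.example/people/7", "orcid:0000-0000-0000-0000"),
]


def indirect_links(same_as):
    """Pair up local URIs that point at the same global identifier."""
    by_global = defaultdict(list)
    for local_uri, global_id in same_as:
        by_global[global_id].append(local_uri)

    links = []
    for global_id, local_uris in by_global.items():
        for i, first in enumerate(local_uris):
            for second in local_uris[i + 1:]:
                links.append((first, second, global_id))
    return links
```

Each library keeps its own namespace and its own assertions; the shared global identifier is the only thing they need to agree on.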

Local linked data identifiers
For library resources but also local or unique information on people, organizations, events, collections
– There is value and credibility in the institutional namespace
Supplement with locally-sourced and/or locally-targeted annotations
– Not restricted to local generation or visibility
Does not require exposing all operational data

From strings to things
People, Organizations, Places, Subjects, Events, Works, Datasets
Image: designyoutrust.com

Entity resolution
Can potentially happen before, during, or after MARC to RDF conversion
Can draw on existing authorities directly and indirectly
– Local sources may involve custom workflows and services
– Remote sources are likely shared and can more likely benefit from standardized services
Assess potential to loop back
– From non-MARC sources to other catalog resources
– From external sources to local
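At its simplest, moving from strings to things during conversion is a lookup: normalize a heading string and see whether an authority source already has a URI for it. The sketch below assumes an in-memory index; the index contents, the `.example` URI, and the normalization rule are hypothetical, and a real workflow would call local or remote authority services instead.

```python
def normalize(heading):
    """Lowercase a heading and drop commas, periods, and extra whitespace."""
    return " ".join(heading.lower().replace(",", " ").replace(".", " ").split())


# Hypothetical authority index: normalized heading -> URI.
AUTHORITY = {
    normalize("Twain, Mark, 1835-1910"): "https://authority.example/names/n1",
}


def resolve(headings, authority=AUTHORITY):
    """Split headings into those matched to a URI and those left as strings."""
    resolved, unresolved = {}, []
    for heading in headings:
        uri = authority.get(normalize(heading))
        if uri is not None:
            resolved[heading] = uri
        else:
            unresolved.append(heading)
    return resolved, unresolved
```

Keeping the unresolved headings around, rather than dropping them, leaves room for the "loop back" step, where matches found later from other sources can be applied to catalog records.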

Working with non-MARC metadata
Faculty research profiles (CAP, Faculty Finder, VIVO)
Library guides and other library-sourced web content
Digital collections that vary widely in size and complexity, and that encompass diverse subject domains
Pilot projects
– Cornell’s Hip Hop flyers
– Harvard’s Visual Information Access metadata

Usage data
Inspiration
– Harvard’s Stacklife
Goals
– To supplement library discovery interfaces
– To inform collection review and additions
Challenges
– Data availability
– Concerns for patron privacy
Potential for a normalized stack score across institutions
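One possible reading of a "normalized stack score across institutions" is to convert each library's raw usage counts to a within-institution percentile, so that a small library and a large one are on the same 0-100 scale, and then average per item. This is purely an assumed normalization for illustration; it is not the formula used by Harvard's tool, and the function names are hypothetical.

```python
def percentile_scores(raw_counts):
    """Map each item's raw usage count to a 0-100 percentile within one institution."""
    total = len(raw_counts)
    values = list(raw_counts.values())
    scores = {}
    for item, count in raw_counts.items():
        at_or_below = sum(1 for value in values if value <= count)
        scores[item] = round(100 * at_or_below / total)
    return scores


def combined_score(per_institution):
    """Average each item's percentile over the institutions that report it.

    per_institution: {institution name: {item identifier: raw usage count}}.
    """
    totals, seen = {}, {}
    for raw_counts in per_institution.values():
        for item, score in percentile_scores(raw_counts).items():
            totals[item] = totals.get(item, 0) + score
            seen[item] = seen.get(item, 0) + 1
    return {item: totals[item] / seen[item] for item in totals}
```

Because only per-item aggregates are exchanged, a scheme like this can be computed without sharing any patron-level records, which speaks to the privacy concern listed above.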

Stacklife screenshot

LD4L and BIBFRAME: Then and Now Image credit: Steven Folsom Cornell University Library Metadata Working Group Forum May 15th, 2015

Why is LD4L attempting to use BIBFRAME?
An ontology built by the Library of Congress as a replacement for MARC
– If/when it solidifies, we will likely see data provided by libraries using this model
Open Source tools available for converting MARC to BIBFRAME
LD4L perspective is that BIBFRAME will likely be a starting point, with post-processing needed to meet some of our use cases
Didn’t want to develop an ontology from scratch within the timeframe of the grant…

BIBFRAME Vocabulary Status (As of May 15, 2015, 11 am EST)
Paused in its current/first version since January 2014 to allow the community to respond with feedback and so that tools could be built to test it in pilot projects
Expecting a new version this spring based on feedback, more on this in a moment
Structure
Classes: Work, Instance, Authority and Annotation
– bf:Work loosely conflates FRBR concepts of Works and Expressions
– bf:Instance loosely aligns with FRBR Manifestation
– No equivalent to FRBR Item yet in the vocabulary

What BIBFRAME Is.

What BIBFRAME Isn’t.

LD4L’s Role in Grounding BIBFRAME in Linked Data Best Practices

BIBFRAME: Areas for Improvement Including, but not limited to…

Improvements: Reuse Existing Classes and Properties from Proven Ontologies
bf:Person -> foaf:Person
bf:Event -> schema:Event
bf:subject -> dcterms:subject
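Read as a post-processing step, the term pairs on this slide amount to a substitution table applied to converter output. The sketch below assumes that reading (each `bf:` term replaced by the established term next to it) and uses prefixed names in plain tuples; the sample triples and the function name are hypothetical.

```python
# Term pairs taken from the slide; treating them as replacements is an assumption.
TERM_MAP = {
    "bf:Person": "foaf:Person",
    "bf:Event": "schema:Event",
    "bf:subject": "dcterms:subject",
}


def reuse_terms(triples, term_map=TERM_MAP):
    """Swap mapped terms in predicate and object positions; subjects are left alone."""
    return [
        (subject, term_map.get(predicate, predicate), term_map.get(obj, obj))
        for subject, predicate, obj in triples
    ]
```

Terms with no entry in the table pass through unchanged, so the step can be run over a whole dataset and extended one mapping at a time.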

Improvements: Provide Inverse Relationships
For example:
bf:hasPart, add bf:isPartOf
bf:hasReproduction, add bf:reproduces
dcterms:creator, add bf:creatorOf
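If inverse properties like these were defined, the reverse statements could be generated mechanically so that a graph can be walked in either direction. A minimal sketch under that assumption: the inverse names are the ones proposed on the slide (not necessarily part of any published vocabulary), and the sample triples are hypothetical.

```python
# Property -> proposed inverse, as listed on the slide.
INVERSES = {
    "bf:hasPart": "bf:isPartOf",
    "bf:hasReproduction": "bf:reproduces",
    "dcterms:creator": "bf:creatorOf",
}


def add_inverses(triples, inverses=INVERSES):
    """Return the triples plus one inverse triple for each mapped property."""
    seen = set(triples)
    result = list(triples)
    for subject, predicate, obj in triples:
        inverse = inverses.get(predicate)
        if inverse is None:
            continue
        candidate = (obj, inverse, subject)
        if candidate not in seen:  # avoid adding an inverse that is already stated
            seen.add(candidate)
            result.append(candidate)
    return result
```

With the inverse in place, a query starting from a person can reach their works directly instead of scanning every work for a matching creator.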

Improvements: Lessen the number of sub-properties that do not add semantics
For example: bf:titleValue doesn’t add more meaning between a title entity and its value. The more generalizable rdf:value works fine.

Improvements: Focus on Things not Strings

BIBFRAME (Post-Makeover)

Questions I want to answer going forward…
How will descriptive practice change with this more modular approach to the structure?
– What changes to content standards will be needed? (Especially for related entities, e.g. People, Publishers…)
What new data sources will the move to RDF allow for? (VIVO, MusicBrainz, Getty Vocabularies, ISNI)
What types of entities can we describe better in RDF? (e.g. Awards, Events)
How can we take advantage of URIs right now in MARC, while ensuring a smoother transition to linked data?