Practical Applications of Linked Data, Music Library Association Annual Meeting 2016
Today’s Speakers: Kimmy, Karen, Steven, James, Soe
Practical Applications of Linked Data: Are We There Yet? Kimmy Szeto. Music Library Association Annual Meeting, March 4, 2016
Metadata Building Blocks: Data Model, relating resources and metadata (key-value, RDF, etc.); Content Rules, extracting information (AACR2, RDA, CCO, DACS, etc.); Schema, organizing the information (MARC, ONIX, DC, EAD, etc.); Exchange, query, retrieval, and transmission (Z39.50, SRU, SQL, SPARQL, etc.); Serialization, notating the structured information (ISO 2709, XML, JSON, Turtle, etc.)
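To illustrate the serialization layer, here is a minimal Python sketch that writes one statement in two notations: N-Triples (a line-based subset of Turtle) and a simple JSON object. The example.org URIs and the JSON shape are hypothetical illustrations. They are not a standard JSON-LD mapping or any library's API.

```python
import json

# One hypothetical statement; all URIs are illustrative placeholders under example.org.
SUBJECT = "http://example.org/person/britten"
PREDICATE = "http://example.org/vocab/dateOfBirth"
VALUE = "1913"


def to_ntriples(s: str, p: str, literal: str) -> str:
    """Serialize one triple with a plain literal object as an N-Triples line."""
    # Escape backslashes first, then double quotes, inside the literal.
    escaped = literal.replace("\\", "\\\\").replace('"', '\\"')
    return f'<{s}> <{p}> "{escaped}" .'


def to_json(s: str, p: str, literal: str) -> str:
    """Serialize the same triple as a simple, non-standard JSON object."""
    return json.dumps({"subject": s, "predicate": p, "object": literal})


if __name__ == "__main__":
    print(to_ntriples(SUBJECT, PREDICATE, VALUE))
    print(to_json(SUBJECT, PREDICATE, VALUE))
```

The data model and the statement stay the same. Only the notation changes, which is the point of treating serialization as its own building block.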
Tim Berners-Lee, Linked Open Data: Use URIs as identifiers; use HTTP URIs so they can be looked up; use standards (RDF/SPARQL); link to other URIs. Link statements to form trees and networks. [Diagram: statements about Benjamin Britten, each made of a subject (resource), a predicate (property), and an object (value): name of the person “Britten, Benjamin”; date of birth “1913”; and an authorWork link to a work whose preferred title is “The holy sonnets of John Donne”.]
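The idea of linking statements into trees and networks can be sketched in a few lines: when the object of one triple is a resource, it can be the subject of further triples, so following objects walks the graph. This is a minimal Python sketch under stated assumptions. The "ex:" names are hypothetical stand-ins for the truncated URIs in the Britten diagram, and quoted strings mark literals.

```python
from collections import defaultdict

# Hypothetical triples echoing the Britten diagram; "ex:" is a placeholder prefix,
# and values wrapped in double quotes are literals rather than resources.
TRIPLES = [
    ("ex:britten", "ex:nameOfThePerson", '"Britten, Benjamin"'),
    ("ex:britten", "ex:dateOfBirth", '"1913"'),
    ("ex:britten", "ex:authorWork", "ex:holySonnets"),
    ("ex:holySonnets", "ex:preferredTitleForTheWork", '"The holy sonnets of John Donne"'),
]


def index_by_subject(triples):
    """Group (predicate, object) pairs under their subject."""
    index = defaultdict(list)
    for s, p, o in triples:
        index[s].append((p, o))
    return index


def reachable(start, triples):
    """Collect every resource reachable from `start` by following resource-valued objects."""
    index = index_by_subject(triples)
    seen, stack = {start}, [start]
    while stack:
        node = stack.pop()
        for _, obj in index.get(node, []):
            # Literals are end points (values); only resources are followed further.
            if not obj.startswith('"') and obj not in seen:
                seen.add(obj)
                stack.append(obj)
    return seen


if __name__ == "__main__":
    print(sorted(reachable("ex:britten", TRIPLES)))
```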
Linked Open Data Building Blocks: Data Model and Schema, from records to RDF statements (BIBFRAME); Content Rules, from free text to data + URIs; Exchange, open standards; Serialization, open standards
Mix and Match: schema.org, BIBFRAME, FOAF, LD4L, Music Ontology, Music Event, DBpedia
Linked Jazz: The Data Sessions
Background: Art Kane, A Great Day in Harlem, 1958
Background: Emmett Berry, Lawrence Brown, Marian McPartland, Sonny Rollins, Thelonious Monk, Mary Lou Williams (Art Kane, A Great Day in Harlem, 1958)
Background: The constellation of people represented in the Linked Jazz network
Background: Representing relationships in jazz. Analyzing extracted relationships; using RDF to express them; sharing our data in new ways; leveraging linked data to combine and extend relationships.
Process & Tools What resources do we use to define relationships? How is our linked data generated?
Process & Tools: Transcripts of oral histories from jazz collections around the country (e.g., Interview with Vi Redd by Monk Rowe, 1999, Hamilton College Jazz Archive)
Process & Tools: Transcript Analyzer (TA) for machine-assisted identification and reconciliation of name entities (shown: Interview with Mary Lou Williams by John S. Wilson, 1973, Institute of Jazz Studies, Rutgers)
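Machine-assisted identification and reconciliation can be pictured as two stages: the machine proposes candidate names and matches the ones it can against a name authority, and a person reviews the rest. The Python sketch below shows that split. It is not the Transcript Analyzer's actual method. The capitalized-words heuristic, the `AUTHORITY` table, and its example.org URIs are all hypothetical.

```python
import re

# Hypothetical authority table: normalized name -> URI. In practice the URIs would
# come from LOD name resources; these are placeholders.
AUTHORITY = {
    "mary lou williams": "http://example.org/name/mary-lou-williams",
    "thelonious monk": "http://example.org/name/thelonious-monk",
}

# Candidate names: runs of two or more capitalized words (a deliberately crude heuristic).
CANDIDATE = re.compile(r"\b[A-Z][a-z]+(?:\s+[A-Z][a-z]+)+\b")


def reconcile(text):
    """Split candidate names into auto-matched (name, URI) pairs and names for human review."""
    matched, review = [], []
    for m in CANDIDATE.finditer(text):
        name = m.group(0)
        uri = AUTHORITY.get(name.lower())
        if uri:
            matched.append((name, uri))
        else:
            review.append(name)
    return matched, review


if __name__ == "__main__":
    print(reconcile("I played with Thelonious Monk and later heard Betty Carter."))
```

The review list is what makes this "machine-assisted": unmatched candidates are surfaced rather than silently dropped or guessed.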
Process & Tools Name control: LOD resources
Process & Tools: Name mapping allows the relationship to be established automatically in RDF. [Diagram: Mary Lou Williams “talks about” Buck; each “talks about” link becomes an RDF statement.]
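Once mentions are mapped to URIs, emitting the RDF is mechanical. This Python sketch turns an interviewee URI plus reconciled mention URIs into de-duplicated N-Triples lines. The predicate is an assumption: it borrows rel:knowsOf, which appears on the dereferencing pages, and expands it with the RELATIONSHIP vocabulary namespace. The example.org URIs are hypothetical.

```python
# Assumed predicate: rel:knowsOf expanded with the RELATIONSHIP vocabulary namespace.
KNOWS_OF = "http://purl.org/vocab/relationship/knowsOf"


def mentions_to_ntriples(interviewee_uri, mentioned_uris):
    """Turn reconciled mentions into sorted, de-duplicated N-Triples lines."""
    lines = {
        f"<{interviewee_uri}> <{KNOWS_OF}> <{uri}> ."
        for uri in mentioned_uris
        if uri != interviewee_uri  # skip self-mentions
    }
    return sorted(lines)


if __name__ == "__main__":
    for line in mentions_to_ntriples(
        "http://example.org/name/mary-lou-williams",
        ["http://example.org/name/thelonious-monk", "http://example.org/name/thelonious-monk"],
    ):
        print(line)
```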
Process & Tools: Relationships derived from all transcripts, visualized in an interactive network graph: a new way to explore resources…
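Interactive network graphs are commonly driven by a node-link structure, for example the nodes and links (with source and target) that D3's force layout consumes. The Python sketch below converts relationship triples into that shape. It is an assumption that the Linked Jazz graph uses something similar; the triples are hypothetical.

```python
import json


def triples_to_graph(triples):
    """Convert (subject, predicate, object) triples into a nodes/links dictionary."""
    # dict.fromkeys keeps first-appearance order while removing duplicate node IDs.
    node_ids = dict.fromkeys(uri for s, _, o in triples for uri in (s, o))
    return {
        "nodes": [{"id": uri} for uri in node_ids],
        "links": [{"source": s, "target": o, "type": p} for s, p, o in triples],
    }


if __name__ == "__main__":
    sample = [("lj:a", "rel:knowsOf", "lj:b"), ("lj:b", "rel:knowsOf", "lj:c")]
    print(json.dumps(triples_to_graph(sample), indent=2))
```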
Process & Tools Linked Jazz triple data: Under the Hood
Process & Tools Dynamic Ego Networks
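An ego network is one person's direct ties plus the ties among the people they are connected to. "Dynamic" is read here, as an assumption, to mean the view is recomputed whenever a different musician is selected. This is a minimal Python sketch over hypothetical directed edges.

```python
def ego_network(ego, edges):
    """Return edges among `ego` and its direct neighbors (ties in either direction)."""
    neighbors = {t for s, t in edges if s == ego} | {s for s, t in edges if t == ego}
    members = neighbors | {ego}
    # Keep every edge whose endpoints are both inside the ego's neighborhood.
    return [(s, t) for s, t in edges if s in members and t in members]


if __name__ == "__main__":
    edges = [("A", "B"), ("C", "A"), ("B", "C"), ("C", "D"), ("D", "E")]
    print(ego_network("A", edges))
```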
Process & Tools Interview with Buster Williams by Monk Rowe, 2002 Hamilton College Jazz Archive
Process & Tools “…Betty Carter who always sang out of tune but you still better play in tune… It was sort of like an art form for her.” “…she was the consummate canned heat. She was like a can of Sterno. You open it up and you put a flame to it and you get this beautiful blue flame.” Interview with Buster Williams by Monk Rowe, 2002 Hamilton College Jazz Archive
Services: Ways we provide access to our data: API, network graphs, SPARQL
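For SPARQL access, a client typically sends the query text in a `query` parameter over HTTP GET, following the SPARQL 1.1 Protocol, and asks for JSON results. This Python sketch builds such a request but does not send it. The endpoint URL is a hypothetical placeholder, not the actual Linked Jazz service address.

```python
from urllib.parse import parse_qs, urlencode, urlsplit
from urllib.request import Request

# Hypothetical endpoint; the real service address is not given here.
ENDPOINT = "http://example.org/sparql"


def sparql_get_request(endpoint, query):
    """Build (but do not send) a SPARQL Protocol query-via-GET request asking for JSON results."""
    url = f"{endpoint}?{urlencode({'query': query})}"
    return Request(url, headers={"Accept": "application/sparql-results+json"})


if __name__ == "__main__":
    req = sparql_get_request(ENDPOINT, "SELECT ?s WHERE { ?s ?p ?o } LIMIT 10")
    print(req.full_url)
    # Round-trip check: the query parameter decodes back to the original text.
    print(parse_qs(urlsplit(req.full_url).query)["query"][0])
```

Against a live endpoint, the request would be sent with `urllib.request.urlopen(req)`.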
Services: Dereferencing Pages for Name Entities
Services: Dereferencing Pages for Name Entities: each name is assigned a URI in the Linked Jazz namespace; here, two musicians mentioned him (“is rel:knowsOf of 2 resources”)
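A line like “is rel:knowsOf of 2 resources” is an inverse view: it counts the distinct subjects that point at the page's URI. This Python sketch shows how such a line could be derived from the triples. The function name and the sample identifiers are hypothetical.

```python
def knows_of_summary(uri, triples, predicate="rel:knowsOf"):
    """Return an inverse-relationship summary line and the distinct subjects behind it."""
    subjects = sorted({s for s, p, o in triples if p == predicate and o == uri})
    noun = "resource" if len(subjects) == 1 else "resources"
    return f"is {predicate} of {len(subjects)} {noun}", subjects


if __name__ == "__main__":
    data = [
        ("lj:a", "rel:knowsOf", "lj:x"),
        ("lj:b", "rel:knowsOf", "lj:x"),
        ("lj:b", "rel:knowsOf", "lj:x"),  # duplicate mention counts once
    ]
    print(knows_of_summary("lj:x", data))
```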
LOD Experiments: What does our Linked Open Data enable us and other users to do?
LOD Experiments Interlink our data with other LOD datasets to build custom datasets
LOD Experiments Experimental interlinking examples
LOD Experiments: Experimental interlinking examples. Exploratory network visualization of musicians in Linked Jazz transcript data and Carnegie Hall performance data, by Molly Reese-Lerner and Hannah Sistrunk. For more information:
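Interlinking two datasets comes down to aligning identifiers and then joining on them. This Python sketch merges per-musician attributes from two sources, using owl:sameAs-style alignments to map one dataset's identifiers onto the other's. All identifiers, attribute names, and values are hypothetical. It does not describe how the Carnegie Hall experiment was actually built.

```python
def interlink(jazz, carnegie, same_as):
    """Join two per-musician tables on identity.

    jazz:     {linked_jazz_id: attributes}
    carnegie: {carnegie_id: attributes}
    same_as:  {carnegie_id: linked_jazz_id}, sameAs-style alignments
    """
    combined = {}
    for c_id, c_attrs in carnegie.items():
        j_id = same_as.get(c_id, c_id)  # fall back to identical identifiers
        if j_id in jazz:
            combined[j_id] = {**jazz[j_id], **c_attrs}
    return combined


if __name__ == "__main__":
    jazz = {"lj:musician-1": {"mentions": 4}, "lj:musician-2": {"mentions": 1}}
    carnegie = {"ch:performer-9": {"performances": 2}, "ch:performer-8": {"performances": 1}}
    print(interlink(jazz, carnegie, {"ch:performer-9": "lj:musician-1"}))
```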
LOD Experiments Enrich our entities with attributes from other LOD resources to create new ways to understand the data
LOD Experiments: Using linked data to write loops that query gender data from other LOD resources. [Diagram: starting from Toshiko Akiyoshi, the loop follows owl:sameAs (‘zitgist’), dcterms:subject, and dbpedia-owl:viafId links, each yielding F; in VIAF, code “a” = F.] For more information:
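The gender loop can be sketched as a priority fallback. For each entity, the loop tries each sameAs target against each source in turn and keeps the first recognized value, along with where it came from. In this Python sketch, in-memory tables stand in for remote queries. The identifiers are hypothetical. The VIAF code "a" = F comes from the slide, while "b" = M is an added assumption.

```python
# Hypothetical tables standing in for remote LOD sources (e.g. DBpedia, VIAF);
# a real loop would query each source over HTTP instead.
VIAF_GENDER_CODES = {"a": "F", "b": "M"}  # "a" = F per the slide; "b" = M is an assumption


def normalize(source, raw):
    """Map a source-specific gender value to F/M, or None if it is not recognized."""
    if source == "viaf":
        return VIAF_GENDER_CODES.get(raw)
    return raw if raw in ("F", "M") else None


def lookup_gender(entity, same_as, sources):
    """Try each sameAs target against each source in priority order; return (value, source)."""
    for target in same_as.get(entity, [entity]):
        for source, table in sources:
            if target in table:
                value = normalize(source, table[target])
                if value is not None:
                    return value, source
    return None, None


def query_genders(entities, same_as, sources):
    """The loop: one lookup per entity; misses stay (None, None) for manual review."""
    return {entity: lookup_gender(entity, same_as, sources) for entity in entities}
```

Keeping the source name alongside each value supports the evaluation step that follows, when the queried data is stored and checked.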
LOD Experiments Storing the queried data for evaluation and use
LOD Experiments New view of network through a gender lens
LOD Experiments Roy Haynes’ transcript visualized with gender encoding
LOD Experiments Mary Lou Williams’ transcript visualized with gender encoding
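Visualizing a transcript with gender encoding requires each mentioned person to carry a gender attribute that the graph can map to a visual channel, such as color. This Python sketch attaches that attribute and a color to one interviewee's mentions and tallies the breakdown. The palette, identifiers, and gender table are hypothetical.

```python
from collections import Counter

COLORS = {"F": "#d62728", "M": "#1f77b4"}  # hypothetical palette
UNKNOWN_COLOR = "#7f7f7f"


def encode_transcript(interviewee, edges, gender):
    """Return color-encoded nodes for everyone the interviewee mentions, plus a gender tally."""
    mentioned = sorted({t for s, t in edges if s == interviewee})
    nodes = [
        {"id": p, "gender": gender.get(p), "color": COLORS.get(gender.get(p), UNKNOWN_COLOR)}
        for p in mentioned
    ]
    summary = Counter(n["gender"] or "unknown" for n in nodes)
    return nodes, summary
```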
Directions: Ongoing and Future Projects: crowdsourcing semantic refinement of relationships on 52nd Street; publishing the Linked Jazz ontology; interlinking with other resource types (Tulane University jazz photo collections; William P. Gottlieb collection, Library of Congress); adding new attributes to our dataset, e.g. “instrument”, “date of birth”, “place of death”
Find us at: linkedjazz.org
LINKING HIP HOP PARTY AND EVENT FLYERS TO THE SEMANTIC WEB. STEVEN FOLSOM, HARVARD LIBRARY (CORNELL UNIVERSITY LIBRARY). MUSIC LIBRARY ASSOCIATION, 2016 ANNUAL MEETING
DISCLAIMER Image credit:
ABOUT THE HIP HOP FLYERS All images of Hip Hop Flyers in this presentation are courtesy of the Cornell Hip Hop Collection
LD4L
USE CASE 4 The essence of this use case is making use of complex graph relationships via queries or patterns (rather than direct connections) to allow discovery that would not be possible without the semantics of different relationships between items and types of items included in the graph. User stories and demonstrations will be somewhat tied to available data because detailed information and relationships will not be available for all resources.
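A pattern query finds items that share no direct link but are connected through a chain of typed relationships. In this Python sketch, no flyer is directly linked to a group. The pattern flyer, then advertised event, then performer, then group membership still surfaces one. The triples and relationship names are hypothetical. The nested comprehension plays the role that a SPARQL basic graph pattern would in practice.

```python
# Hypothetical graph: flyers advertise events, events have performers, and (from an
# external source such as LinkedBrainz) performers belong to groups.
TRIPLES = {
    ("flyer:1", "advertises", "event:1"),
    ("event:1", "performer", "artist:a"),
    ("artist:a", "memberOf", "group:x"),
    ("flyer:2", "advertises", "event:2"),
    ("event:2", "performer", "artist:b"),
}


def flyers_for_group(group, triples):
    """Match ?flyer advertises ?event . ?event performer ?artist . ?artist memberOf group."""
    return sorted({
        flyer
        for flyer, p1, event in triples if p1 == "advertises"
        for event2, p2, artist in triples if p2 == "performer" and event2 == event
        for artist2, p3, g in triples if p3 == "memberOf" and artist2 == artist and g == group
    })


if __name__ == "__main__":
    print(flyers_for_group("group:x", TRIPLES))
```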
PILOT: LINKING HIP HOP FLYER METADATA TO MUSICBRAINZ/LINKEDBRAINZ DATA: Model non-MARC metadata from the Cornell Hip Hop Flyer Collection in RDF; test BIBFRAME for describing the flyers; test other ontologies for describing other entities, e.g. events and venues (more on this in a moment); use LinkedBrainz URIs for performers to discover relationships to other entities to discover relationships to other entities… (On and on to da break of dawn)
HIP HOP FLYER METADATA
ONTOLOGY DECISIONS: Describe the flyer in BIBFRAME, extending it where needed; used Getty AAT worktypes to create bf:Work subclasses; describe events and related entities using the Music Ontology, the Event Ontology, and Schema.org; use foaf:Person to represent real-world-object (RWO) persons, with bf:Person as an associated authority; follow the same pattern for other bf:Authority subclasses
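The person pattern above separates the real-world person (foaf:Person) from the BIBFRAME authority (bf:Person) and links the two. This Python sketch emits that shape, together with a local flyer class declared as a bf:Work subclass. The BIBFRAME 1.0, FOAF, RDF, and RDFS namespaces are real. The example.org namespace, the `Flyer` class (a stand-in for an AAT-derived class), and the `hasAuthority` linking property are hypothetical.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
SUBCLASS_OF = "http://www.w3.org/2000/01/rdf-schema#subClassOf"
BF = "http://bibframe.org/vocab/"  # BIBFRAME 1.0 namespace
FOAF = "http://xmlns.com/foaf/0.1/"
EX = "http://example.org/"  # hypothetical local namespace


def flyer_pattern(flyer_id, person_id):
    """Triples for one flyer and one person following the RWO/authority split."""
    flyer, person = f"{EX}flyer/{flyer_id}", f"{EX}person/{person_id}"
    authority = f"{EX}authority/{person_id}"
    return [
        (f"{EX}Flyer", SUBCLASS_OF, f"{BF}Work"),  # stand-in for an AAT-derived class
        (flyer, RDF_TYPE, f"{EX}Flyer"),
        (person, RDF_TYPE, f"{FOAF}Person"),  # the real-world person
        (authority, RDF_TYPE, f"{BF}Person"),  # the associated authority
        (person, f"{EX}hasAuthority", authority),  # hypothetical linking property
    ]


def as_ntriples(triples):
    """Render URI-only triples as N-Triples lines."""
    return [" ".join(f"<{term}>" for term in t) + " ." for t in triples]
```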
ONTOLOGY DECISIONS: BIBFRAME FOR FLYERS
ONTOLOGY DECISIONS: FOAF FOR PERSONS
ONTOLOGY DECISIONS: EVENTS AND PERFORMERS
MUSICBRAINZ AND LINKEDBRAINZ
TYING THIS TO EXTERNAL GRAPHS When we have a MusicBrainz URI for instances of mo:MusicArtist we can query for relationships to other entities and properties of these new entities.
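With an artist URI in hand, one generic way to reach related entities and their properties is a two-hop SELECT. This Python sketch builds one. It rejects characters that SPARQL does not allow inside an IRI reference. The LinkedBrainz data model is not assumed, so the query uses variables for every predicate. The sample artist URI is hypothetical.

```python
import re

# Characters SPARQL forbids inside <...> IRI references (control chars, space, and <>"{}|^`\).
_BAD_IRI_CHARS = re.compile(r'[\x00-\x20<>"{}|^`\\]')


def neighbors_query(artist_uri, limit=100):
    """Build a SELECT for entities linked from an artist and the properties of those entities."""
    if _BAD_IRI_CHARS.search(artist_uri):
        raise ValueError(f"not a usable IRI: {artist_uri!r}")
    return (
        "SELECT ?p1 ?entity ?p2 ?value WHERE {\n"
        f"  <{artist_uri}> ?p1 ?entity .\n"
        "  ?entity ?p2 ?value .\n"
        "  FILTER(isIRI(?entity))\n"
        f"}} LIMIT {int(limit)}"
    )


if __name__ == "__main__":
    print(neighbors_query("http://example.org/artist/1"))
```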
BRUTE FORCE RECONCILIATION: Normalized labels using OpenRefine; manually searched MusicBrainz for entries for a subset of literals (many of these were variants for the same performer); found roughly 250 URLs for entries in MusicBrainz, ultimately surfacing 115 unique corresponding LinkedBrainz URIs for the proof of concept
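Label normalization can group variant forms of a performer's name before any lookup. This Python sketch approximates OpenRefine's "fingerprint" keying method: it trims, lowercases, folds to ASCII, strips punctuation, and sorts unique tokens. Labels that share a key are then clustered. It approximates the method rather than reproducing OpenRefine's code, and the sample labels are hypothetical.

```python
import string
import unicodedata
from collections import defaultdict


def fingerprint(label):
    """Approximate OpenRefine-style fingerprint key for a label."""
    s = label.strip().lower()
    # Fold accented characters to plain ASCII; characters with no ASCII form are dropped.
    s = unicodedata.normalize("NFKD", s).encode("ascii", "ignore").decode("ascii")
    s = s.translate(str.maketrans("", "", string.punctuation))
    return " ".join(sorted(set(s.split())))


def cluster(labels):
    """Group labels by fingerprint; keep only keys with more than one variant."""
    groups = defaultdict(list)
    for label in labels:
        groups[fingerprint(label)].append(label)
    return {key: members for key, members in groups.items() if len(members) > 1}


if __name__ == "__main__":
    print(cluster(["Grandmaster Flash", "flash, grandmaster", "GRANDMASTER FLASH.", "Kool Herc"]))
```

Clustering variants first means each performer needs only one manual MusicBrainz search. This is also why many more URLs than unique URIs can turn up.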
PULLING DATA FROM LINKEDBRAINZ.ORG
CONSTRUCT {
  ?s ?p1 ?o1 .
  ?o1 ?p2 ?o2 .
}
WHERE {
  ?s ?p1 ?o1 .
  ?o1 ?p2 ?o2 .
  FILTER ( ?s = <…> )
  # Eliminate guid property
  FILTER ( ?p1 != <…> )
  FILTER ( ?p2 != <…> )
  # Eliminate Tracks
  FILTER ( NOT EXISTS { ?o1 a <…> } )
  FILTER ( NOT EXISTS { ?o2 a <…> } )
}
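The CONSTRUCT query keeps two-hop paths from one fixed subject, drops a guid predicate at either hop, and drops any node typed as a track. This Python sketch mirrors that filtering over an in-memory set of triples. The URIs removed from the query are not reconstructed here. The excluded predicates and the excluded class are passed in as parameters, and the sample names are hypothetical.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def construct_two_hops(subject, triples, skip_predicates, skip_class):
    """Return ?s ?p1 ?o1 and ?o1 ?p2 ?o2 triples for a fixed ?s, mirroring the query's filters."""
    # Nodes typed with the excluded class (the "NOT EXISTS { ?x a <...> }" filters).
    typed_skip = {s for s, p, o in triples if p == RDF_TYPE and o == skip_class}
    result = set()
    for s, p1, o1 in triples:
        if s != subject or p1 in skip_predicates or o1 in typed_skip:
            continue
        for s2, p2, o2 in triples:
            if s2 == o1 and p2 not in skip_predicates and o2 not in typed_skip:
                # As in the WHERE clause, a first hop is kept only if a second hop exists.
                result.add((s, p1, o1))
                result.add((s2, p2, o2))
    return result
```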
LINKEDBRAINZ.ORG CONTINUED [JSON query results; recoverable values include “United States” (twice) and “Def Jam / Cold Chillin’ in the Spot”.]
MAPPING METADATA TO RDF USING ISI’S KARMA
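Karma is an interactive tool, and the sketch below is not its API. It only illustrates the underlying idea of a source-to-ontology mapping: each spreadsheet column is assigned an RDF property, and each row becomes a subject. The column names, property URIs, and base URI are all hypothetical. Empty cells produce no triples.

```python
import csv
import io

# Hypothetical mapping from spreadsheet columns to RDF properties.
COLUMN_MAP = {
    "title": "http://example.org/vocab/title",
    "performer": "http://example.org/vocab/performer",
    "venue": "http://example.org/vocab/venue",
}


def rows_to_triples(csv_text, base="http://example.org/flyer/"):
    """Map each CSV row to a subject URI and each non-empty mapped cell to a triple."""
    triples = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        subject = base + row["id"]
        for column, prop in COLUMN_MAP.items():
            value = (row.get(column) or "").strip()
            if value:  # skip empty cells
                triples.append((subject, prop, value))
    return triples
```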
RECONCILING MO:RELEASE WITH BF:AUDIO, ETC.
REMAINING WORK IN FEBRUARY 2015: Continue metadata cleanup and RDF conversion; post-processing; more reconciliation; add data to a visualization/discovery layer
2015 TAKEAWAYS FROM FLYERS PILOT: We were able to map large parts of our metadata to RDF, using multiple ontologies, to discover more relationships to more entities. The work was largely predicated on manual workflows for preprocessing and URI lookups, and on unstable software for RDF creation. We need more URIs, both for linking to and for linking from, in order to take advantage of queries and patterns. Yes, it is possible to describe flyers and related entities using BIBFRAME 1.0, but do we want to…
ONE YEAR LATER... Image credit:
A YEAR LATER: STATUS UPDATE: Largely dormant because focus turned to: LD4L Ontology/BIBFRAME 2.0; BIBFRAME to LD4L Ontology post-processing. LD4L has made some decisions on local URIs and infrastructure. **New Metadata Librarian** focusing on batch remediation, interoperability, and reconciliation at Cornell
A YEAR LATER: REVISITING THE PROCESS: LinkedBrainz: efforts are being made to improve performance, but… A possible Side B: LOD Laundromat (Laundry Basket, Wardrobe, Analytics, LOTUS)
A YEAR LATER: KARMA: Karma is still great, if you can get it installed. ISI has implemented a VirtualBox option for working with Karma. Side B: considering how to make a business case for “best of breed” converters to RDF and for infrastructures that go beyond the pilot
PARAPHRASING JIM HENDLER (WHO MIGHT HAVE BEEN PARAPHRASING SOMEONE ELSE): Saying that we can do the same things, only now it’s more difficult… isn’t much of a sales pitch. With core ontology decisions made, we can now build or adapt tools that make it easier: better RDF converters; better RDF reconciliation; better RDF-native cataloging tools. *Actually meet use cases previously unmet!*
Image credit: Discogs