Traditional Linked DATA, or Connect Your Data Your Way. Library Technology Conference, Macalester College, St. Paul, MN, March 20-21, 2013
Authority Control Cougar = Puma = Mountain Lion; Dostoevsky = Dostoyevskii = Достоевский. The foundation of classic linked data is the concept of authority control. Authority control, as you may remember from library school, is the notion that we can bring together everything about a topic or by an author, regardless of the words used to describe that topic or how the author’s name is represented.
Authority Control in the Catalog In the library world, authority control is done through authority records in the catalog.
Authority Control on the Web Traditional Linked Data makes use of “URIs” (Uniform Resource Identifiers) to unambiguously identify people and concepts. URIs are: stable (meant to last ‘forever’); dynamic (no “headings” to change as wording changes); language neutral; and designed to provide links to other URIs.
Examples of URIs http://viaf.org/viaf/104023256/ http://id.loc.gov/authorities/subjects/sh85109054.html Linking is accomplished by using additional URIs to make statements.
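As a rough illustration of how a URI can do the work of an authority record, the sketch below maps several forms of one author's name to the single VIAF URI shown above. The `VARIANTS` table and the `resolve` helper are hypothetical names invented for this example; a real system would consult an authority file rather than a hard-coded dictionary.

```python
# Hypothetical lookup table: variant forms of a name -> one authority URI.
# The URI is the VIAF example from the slides; the table itself is illustrative.
DOSTOEVSKY = "http://viaf.org/viaf/104023256/"

VARIANTS = {
    "dostoevsky": DOSTOEVSKY,
    "dostoyevskii": DOSTOEVSKY,
    "достоевский": DOSTOEVSKY,
}


def resolve(name):
    """Return the authority URI for a name variant, or None if it is unknown."""
    return VARIANTS.get(name.strip().casefold())


# Every variant resolves to the same identifier, whatever the spelling or script.
for form in ("Dostoevsky", "Dostoyevskii", "Достоевский"):
    print(form, "->", resolve(form))
```

Because the identifier never changes, the preferred spelling can be revised later without breaking anything that points at the URI.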
Information at its most basic level
In words we might say: “The Brothers Karamazov” was written by Dostoevsky.
In MARC we would say:
100 1  Dostoyevsky, Fyodor,|d1821-1881.
245 14 The brothers Karamazov
With linked data we would say: URI (for The Brothers Karamazov), URI (for RDA hasCreator), URI (for Dostoevsky).
RDF triple example
Subject: The Brothers Karamazov (http://en.wikipedia.org/wiki/The_Brothers_Karamazov)
Predicate: hasCreator (http://metadataregistry.org/schemaprop/show/id/1483.html)
Object: Dostoyevsky (http://viaf.org/viaf/104023256/)
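To make the subject, predicate, object pattern concrete, here is a minimal sketch that writes the slide's three URIs out as one line of N-Triples, a plain-text RDF serialization. The helper name `to_ntriples` is an assumption made for this example, and using the Wikipedia and registry page addresses as identifiers follows the slide's simplification.

```python
# The three URIs are the ones listed on the slide.
SUBJECT = "http://en.wikipedia.org/wiki/The_Brothers_Karamazov"
PREDICATE = "http://metadataregistry.org/schemaprop/show/id/1483.html"
OBJECT = "http://viaf.org/viaf/104023256/"


def to_ntriples(subject, predicate, obj):
    """Serialize one triple of URIs as a single N-Triples line."""
    return "<{}> <{}> <{}> .".format(subject, predicate, obj)


# A triple is just an ordered (subject, predicate, object) statement.
triple = (SUBJECT, PREDICATE, OBJECT)
line = to_ntriples(*triple)
print(line)
```

Each part of the statement is itself a link, so a program can follow any of the three URIs to find more statements.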
Future of Information Organization Information can be updated once, in one place, for everyone. Information can be dynamically linked, meaning web pages can pull “live” information from other sources on the web. Human knowledge can grow exponentially as we make meaningful connections between our information silos.
DBpedia DBpedia is Wikipedia represented as linked open data. By covering such a wide variety of topics and disciplines, it provides a rich and exciting opportunity for data harvesting and reuse. Using a query language called “SPARQL” you can pull specific information based on fields and hierarchies. So, for example, you could generate a list of people who were born on a specific date, or of towns in Minnesota with more than 10,000 residents. You could also use SPARQL to enrich information you already have. For example, if you have a large list of famous people, you could query DBpedia to find birthdates for all of them at once. In either case you could pull the information once or generate a “live” table of information for your web page that would stay up to date as Wikipedia changes. Freebase is a similarly large repository of open data.
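The "people born on a specific date" idea might look roughly like the sketch below, which only builds the SPARQL text and a request URL and does not contact any server. The endpoint address and the `dbo:Person` and `dbo:birthDate` terms are assumptions not taken from the slides and should be checked against current DBpedia documentation; the function names and the sample date are invented for illustration.

```python
from urllib.parse import urlencode

# Assumption: DBpedia's public SPARQL endpoint address (not given in the slides).
ENDPOINT = "https://dbpedia.org/sparql"


def birthday_query(iso_date, limit=20):
    """Build a SPARQL query for people born on the given YYYY-MM-DD date."""
    lines = [
        "PREFIX dbo: <http://dbpedia.org/ontology/>",
        "PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>",
        "SELECT ?person WHERE {",
        "  ?person a dbo:Person ;",
        f'          dbo:birthDate "{iso_date}"^^xsd:date .',
        "}",
        f"LIMIT {int(limit)}",
    ]
    return "\n".join(lines)


def request_url(query):
    """Encode the query as the 'query' parameter of a GET request URL."""
    return ENDPOINT + "?" + urlencode({"query": query})


query = birthday_query("1900-01-01")  # arbitrary sample date
url = request_url(query)
print(query)
```

Fetching `url` on a schedule, rather than once, is what would turn the result into the "live" table described above.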
BBC Nature The BBC hosts multiple sites that combine the power of linked data with traditional web pages. Their Wildlife Finder pages are each identified by an HTTP URI. Outgoing links connect each species, behavior and habitat to the corresponding resources in the DBpedia data set, and to the BBC programmes that depict them. This page is bringing in live information from Wikipedia, the World Wildlife Fund, and elsewhere on the BBC sites. Notice that because of the use of URIs, the BBC is able to pull data onto this page from DBpedia, even though it is filed under “cougar”.
BBC Here is another example from the BBC, this time from their music site. Again, they are bringing in information from Wikipedia, but you can also see just how rich they’ve made their site by providing URIs and meaningful links for all of their TV and radio programming, reviews, etc. They are also bringing in live information about band members from another linked data source called MusicBrainz. The NY Times has published its internal subject headings as linked data, interlinking these topics with DBpedia, Freebase, and GeoNames.
MusicBrainz http://musicbrainz.org MusicBrainz.org: Open Music Encyclopedia. They have their own metadata schema for describing music, but what is of note here is that they are providing their data as open data on the web. Others can make use of that data, as this developer did when he created (move to next slide) Six Degrees of Black Sabbath.
Six Degrees of Black Sabbath Application built to query MusicBrainz. To make the connections between the artists, the developer relied on the relation data from MusicBrainz (e.g. member of band, is person, personal relationship, parent, sibling). This is only possible because MusicBrainz is using open, structured data. Take suggestions from the audience for a search. If no takers, use the following: create a path from Prince to Black Sabbath. Six degrees refers to the idea that everyone is at most six steps away from any other person on Earth, so that a chain of "a friend of a friend" statements can be made to connect any two people in six steps or fewer. Inspired by the Oracle of Bacon, which connects actors to Kevin Bacon. http://oracleofbacon.org/cgi-bin/movielinks http://labs.echonest.com/SixDegrees
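Finding a chain between two artists is a shortest-path search over relation data. The sketch below uses a breadth-first search on a tiny made-up graph; the names in `LINKS` are placeholders, not MusicBrainz data, and breadth-first search is a standard technique chosen for illustration rather than a description of how the application itself is implemented.

```python
from collections import deque

# Made-up relation data: each key is linked to the listed neighbours
# (for example by a "member of band" relation).
LINKS = {
    "Artist A": ["Band X"],
    "Band X": ["Artist A", "Artist B"],
    "Artist B": ["Band X", "Band Y"],
    "Band Y": ["Artist B"],
    "Artist C": [],
}


def shortest_path(graph, start, goal):
    """Return the shortest list of nodes from start to goal, or None."""
    if start == goal:
        return [start]
    queue = deque([[start]])  # paths waiting to be extended, shortest first
    seen = {start}
    while queue:
        path = queue.popleft()
        for neighbour in graph.get(path[-1], []):
            if neighbour in seen:
                continue
            if neighbour == goal:
                return path + [neighbour]
            seen.add(neighbour)
            queue.append(path + [neighbour])
    return None  # no chain of relations connects the two


print(shortest_path(LINKS, "Artist A", "Band Y"))
```

The search only works because every relation is an explicit, machine-readable link, which is the point the slide makes about open, structured data.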
Europeana http://www.europeana.eu/portal Linked Open Data has also begun to be used more heavily in the cultural heritage community. Europeana contains 20 million items from 2,200 libraries, museums, and archives in over 34 countries, including content from the British Library, the Rijksmuseum in Amsterdam, and the Louvre in Paris. They are aggregating cultural heritage objects from throughout Europe, but also creating linked data for these institutions. They recently announced that their data set of 20 million items is freely available under the Creative Commons CC0 Public Domain Dedication, meaning that anyone can use the data for any purpose (creative, educational, commercial) with no restrictions. The Digital Public Library of America is looking at this same model to connect America’s cultural heritage content.
Bio2RDF http://bio2rdf.org/ Finally, linked data is proving to be exceptionally useful in the scientific community, where enormous data sets have historically been stored and used separately by different disciplines and specialties. Bio2RDF is one example of an attempt to link some of these silos. There are more than 15 million “statements” currently in the database, linking species with genes with proteins with drugs and much more.