Introduction to Linked Data

Linked Data: what cataloguers need to know #cigld
CILIP Cataloguing and Indexing Group (CIG), 25 November 2013
Thomas Meehan @orangeaurochs

WHY LEARN ABOUT LINKED DATA
Likely to be the replacement for MARC (Bibframe).
Even if not, it is being used to openly publish bibliographic data on the web.
It is being used by, e.g., search engines for semantic results.
Because cataloguers can take some part in the discussion!

Linked Open Data
The term is often taken to mean linked open data:
Linked: not just text strings; as with hypertext, you can find out more by following links.
Open: freely available, licensed, re-usable, re-purposable.
Data: not just text, like HTML, which is a marked-up document.

linked open DATA
245 00 $a Models for decision : $b a conference under the auspices of the United Kingdom Automation Council organised by the British Computer Society and the Operational Research Society / $c edited by C.M. Berners-Lee.
260 __ London : English Universities Press, 1965.
300 x, 149 p. : ill. ; 23 cm.
504 Includes bibliographical references.
700 1_ Berners-Lee, C. M.
Structured, labelled, in a recognised format. But:
No links: all the data is in text strings. If you want to find out anything more about these things, you have to get out of the system and search Google or the LC authorities site. Arguably, the 700 is a link if you follow a recognised authority scheme. However, it is not an actionable link like an 856 field: you cannot follow its contents to find out more. It would, in fact, be hard to construct a URL from it that would go to anything meaningful.
Not (necessarily) open, or at least not easy to get at. Record sharing is common in library cataloguing, but licences are rare and access is through a Z39.50 gateway or reconstructed web pages.
Not data as such but actually a record: none of the bits make sense in isolation.
I'll talk a little more about MARC in particular this afternoon.

LINKED open data
This is linked in that it is hypertext: you can find out more by clicking on links, although only internally in this case. These links are still aimed at people: it is difficult for a computer, e.g. a search engine, to assess their value. If we look at the source for this…

…LINKED open data
<table border="0" cellpadding="0" cellspacing="0">
<tbody>
<tr id="bib-author-row">
<th>Author:</th>
<td id="bib-author-cell">
<a href="/search?q=au%3ABerners-Lee%2C+C.+M.&qt=hot_author" title="Search for more by this author">C M Berners-Lee</a>;
<a href="/search?q=au%3ABritish+Computer+Society.&qt=hot_author" title="Search for more by this author">British Computer Society.</a>;
<a href="/search?q=au%3AInstitution+of+Electrical+Engineers.&qt=hot_author" title="Search for more by this author">Institution of Electrical Engineers.</a>;
<a href="/search?q=au%3AOperational+Research+Society.&qt=hot_author" title="Search for more by this author">Operational Research Society.</a>
</td>
</tr>
<tr id="bib-publisher-row">
<th>Publisher:</th>
<td id="bib-publisher-cell">London : English Universities Press, [1965]</td>
…
</table>
This is a snippet of the HTML from the previous page, specifically the part listing the authors and the publisher. It is all document-based. The table is a means of display only. The th for Author is merely for human readability. The links go to other searches. The publisher information is wholly textual and there is no attempt even to split the elements. Someone like Mr Google could attempt to extract meaning from a page like this, but it is unreliable at best. This is the battle that Google has been fighting since it started: how to extract meaning from web pages. This is one reason why Google is keen on linked data! Furthermore, the links that are there merely perform another search. Looking at the webpage from the previous slide, you'll also note the copyright notice at the bottom!

linked OPEN data
Here is an example of some beautiful linked data, but I can't let you see it, search it, or use it. We can discuss terms later.
Open means several things:
Freely available.
Licensed to minimise restriction (see the whole open access question). A lot of the Cambridge linked data work revolved around this.
Re-usable. If you can't reuse other people's data, the whole idea of linked data falls down, even if you can search it.
Re-distributable.
Re-purposable: you can use it for purposes beyond its original intention.
"Open data is information that is available for anyone to use, for any purpose, at no cost. Open data has to have a licence that says it is open data. Without a licence, the data can’t be reused. The licence might also say: that people who use the data must credit whoever is publishing it (this is called attribution) that people who mix the data with other data have to also release the results as open data (this is called share-alike)" (The Open Data Institute)

The Web of Data
Use URIs as names for things.
Use HTTP URIs so that people can look up those names.
When someone looks up a URI, provide useful information, using the standards (RDF, SPARQL).
Include links to other URIs so that they can discover more things.
Tim Berners-Lee (2006)
When people use the phrase Linked Data, they are actually referring to a Web of Data, as opposed to a web of documents, built on specific principles, i.e. open data in RDF.
URI: URIs can be URLs or URNs. URLs can be http, ftp, etc. URNs are not web-actionable.
HTTP: i.e. over the web. If you don't have HTTP, you cannot easily go and look up more information.
Useful info: basically a description, something about it. Where on a web page you would provide information in HTML, in linked data you provide information in RDF (of which more in a second). You can search it using SPARQL (of which more from Owen after lunch).
Links: crucial. You can find out more from other URIs, much as links on a web page allow references, explanations and further information to be explored.
Note: all this is independent of libraries and comes rather from the W3C. Linked data is not a formal W3C standard, but RDF is, like HTML. The Web of Data is the basis of a semantic web, in which machine-readable meaning, not just text, lets computers make sense of data and act on it. Understanding RDF is important to understanding linked data, and that's what I'm going to concentrate on for the rest of this session.
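The difference between a name you can look up and one you cannot is easy to check mechanically. A minimal sketch in Python (standard library only; the function name `is_http_uri` and the sample URN and FTP address are hypothetical): it tests only whether a URI uses the http or https scheme and has a host, not whether the URI actually resolves to anything.

```python
from urllib.parse import urlparse


def is_http_uri(uri: str) -> bool:
    """Rough check for the 'use HTTP URIs' principle.

    True only for http/https URIs with a host part. URNs such as
    'urn:isbn:...' give False because they cannot be looked up over HTTP.
    This does not check that the URI actually resolves.
    """
    parts = urlparse(uri)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


if __name__ == "__main__":
    for uri in (
        "http://id.loc.gov/authorities/names/no97080492",  # URI from the slides
        "urn:isbn:9780000000000",                          # made-up ISBN URN
        "ftp://example.org/record.mrc",                    # hypothetical FTP URL
    ):
        print(uri, is_http_uri(uri))
```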

Brideshead Revisited was written by Evelyn Waugh.
English sentence
Brideshead Revisited was written by Evelyn Waugh.
This is a simple English sentence. It has a subject, "Brideshead Revisited" (the book); an object, "Evelyn Waugh"; and a predicate, "was written by". This is text. How can we turn this into data?

Entities and Relationships
Brideshead revisited created by Evelyn Waugh
We'll start by dividing it into entities and relationships, similar to the modelling behind FRBR: in this case, a Work, a Person and a relationship. These are still English text: ambiguous, unidentified, and not linked.

Adding URIs: Brideshead revisited
created by Evelyn Waugh
Next, we can start to replace the textual names of things with URIs. First, we'll give the book a URI. The book Brideshead Revisited is a resource: not in the RDA sense, but in the RDF sense of anything that can be given an identity. We identify it in RDF using a URI.

Adding URIs: Waugh
created by
Second, the author, Evelyn Waugh: the object.

Adding URIs: creator http://id.loc.gov/authorities/names/no97080492
Lastly, we can add a URI for the predicate, the relationship. In linked data, even the relationships are established or authorised, not just names, works and subjects. Everything.

RDF Statement
<http://id.loc.gov/authorities/names/no97080492>
< < .
Third, the creator relationship. This is now what is called an RDF triple and, with some punctuation, a valid piece of RDF! However, it is split over three lines. To make this easier to read, especially when there are lots of triples, we can write it out in a different way: Turtle.
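Because a triple is just three parts, it can be held as a plain tuple and written back out with the punctuation that makes it a valid N-Triples line (angle brackets round each URI, a full stop at the end). A minimal sketch, assuming the DCMI Terms creator URI for the predicate and a hypothetical example.org URI standing in for Waugh, since neither is legible on the slide:

```python
# The book's URI is from the slide; the other two are assumptions.
BOOK = "http://id.loc.gov/authorities/names/no97080492"
CREATOR = "http://purl.org/dc/terms/creator"               # assumed: DCMI Terms creator
WAUGH = "http://example.org/hypothetical/evelyn-waugh"     # hypothetical placeholder


def to_ntriples(subject: str, predicate: str, obj: str) -> str:
    """Write a triple whose three parts are all URIs as one N-Triples line."""
    return f"<{subject}> <{predicate}> <{obj}> ."


if __name__ == "__main__":
    print(to_ntriples(BOOK, CREATOR, WAUGH))
```

The same three strings, in the same order, are all the data there is: there is no record wrapped round them.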

RDF (Turtle)
@prefix lc_names: < dc: < .
lc_names:no dc:creator lc_names:n
The @prefix declarations at the top are an attempt to make this easier to read. After @prefix, each declaration gives a prefix (which could be anything: it is just shorthand) and, lastly, the base of a URI that the prefix stands for. The triple itself now fits more easily on one line.
This is a triple, which is the foundation of RDF. It is an assertion (not necessarily a fact): Subject, Predicate, Object.
It is significant that this lone triple stands alone as a piece of data. It doesn't need to be part of a record as such. We can follow the URIs to find out more.
Triples can get quite a lot more complicated: there are ways of saying more nuanced things. In fact, one problem some have with efforts like Bibframe is that the abstraction goes too far.
There are also some obvious problems with this:
Provenance. Without a record, how can we show who said this, or how reliable it is? There are ways round this, and initiatives to address it.
Complexity.
Redundancy. What if DC and LC shut up shop or change their minds?
I promise not to put a lot of XML in this presentation, but I will mention triples a lot in this format, as they really are fundamental to RDF.
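A prefix is pure shorthand: a program reading Turtle expands `dc:creator` back into the full URI before doing anything else, and a program writing Turtle can compact URIs the other way. A minimal sketch; the two namespace URIs in `PREFIXES` are assumptions (the originals are not legible on the slide), and `compact` does not check that the result is a legal Turtle local name:

```python
# Assumed namespace URIs for the two prefixes used on the slide.
PREFIXES = {
    "lc_names": "http://id.loc.gov/authorities/names/",
    "dc": "http://purl.org/dc/terms/",
}


def expand(name: str, prefixes: dict[str, str] = PREFIXES) -> str:
    """Turn a prefixed name such as 'dc:creator' into a full URI."""
    prefix, colon, local = name.partition(":")
    if not colon or prefix not in prefixes:
        raise ValueError(f"no declared prefix for {name!r}")
    return prefixes[prefix] + local


def compact(uri: str, prefixes: dict[str, str] = PREFIXES) -> str:
    """Turn a full URI into a prefixed name, or <uri> if no prefix fits."""
    # Prefer the longest matching namespace, in case one namespace nests in another.
    matches = [(len(base), p) for p, base in prefixes.items() if uri.startswith(base)]
    if not matches:
        return f"<{uri}>"
    _, prefix = max(matches)
    return f"{prefix}:{uri[len(prefixes[prefix]):]}"


if __name__ == "__main__":
    print(expand("lc_names:no97080492"))
    print(compact("http://purl.org/dc/terms/creator"))
```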

Brideshead Revisited
@prefix lc_names: < lc_languages: < dc: < .
lc_names:no dc:creator lc_names:n ;
    dc:created "1945" ;
    dc:extent "1 volume" ;
    dc:language lc_languages:eng ;
    dc:title "Brideshead revisited" ;
    dc:type < .
A fuller description using DCMI terms. Keep thinking of it as being in three columns. The subject column is not repeated here because the subject is the same for all the triples: the semicolon at the end of each line means "same subject, next predicate", and the full stop ends the description. The predicates and objects change. This says… {run through triples}
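The semicolon shorthand amounts to grouping triples by subject. A minimal sketch of that grouping; it assumes each term is already written in Turtle form (prefixed names, quoted literals), does no escaping and writes no @prefix lines. The full URI for the book comes from the earlier slide; the sample predicates and objects come from this one:

```python
def to_turtle(triples: list[tuple[str, str, str]]) -> str:
    """Group (subject, predicate, object) strings by subject, Turtle-style.

    Terms must already be in Turtle form; this sketch does no escaping.
    """
    by_subject: dict[str, list[tuple[str, str]]] = {}
    for s, p, o in triples:
        by_subject.setdefault(s, []).append((p, o))  # dicts keep insertion order
    blocks = []
    for s, pairs in by_subject.items():
        # ' ;' means "same subject, next predicate"; ' .' ends the subject's block
        body = " ;\n    ".join(f"{p} {o}" for p, o in pairs)
        blocks.append(f"{s} {body} .")
    return "\n".join(blocks)


if __name__ == "__main__":
    book = "lc_names:no97080492"
    print(to_turtle([
        (book, "dc:title", '"Brideshead revisited"'),
        (book, "dc:created", '"1945"'),
        (book, "dc:language", "lc_languages:eng"),
    ]))
```

Run as a script, this prints the three triples as one block with the subject written once, as on the slide.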

Brideshead Revisited
Now, each of those URIs can of course be followed, and I've picked one out to follow. If we follow the link (or screenshot), we can see what information the URI gives us. As we are using a browser, the URI resolves to an HTML page. If we were a computer program, we could request this page as data. We can do so through the web page too, by clicking on one of the links at the bottom. We'll get a page in what is called N-Triples.
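A program usually asks for data rather than HTML through HTTP content negotiation: the same URI is requested with an `Accept` header naming an RDF media type. A minimal sketch with Python's `urllib.request` that only prepares the request (fetching needs network access). `application/n-triples` and `text/turtle` are the registered media types for N-Triples and Turtle; whether id.loc.gov honours them for this URI is assumed here, not checked:

```python
import urllib.request

# Registered media types for two RDF serialisations.
NTRIPLES = "application/n-triples"
TURTLE = "text/turtle"


def build_rdf_request(uri: str, media_type: str = NTRIPLES) -> urllib.request.Request:
    """Prepare (but do not send) a GET request asking for RDF rather than HTML.

    The Accept header states which representation we would like; whether a
    given server honours it is up to that server.
    """
    return urllib.request.Request(uri, headers={"Accept": media_type})


if __name__ == "__main__":
    req = build_rdf_request("http://id.loc.gov/authorities/names/no97080492")
    print(req.get_method(), req.full_url, req.get_header("Accept"))
    # To actually fetch it (needs network access):
    # with urllib.request.urlopen(req) as response:
    #     print(response.headers.get("Content-Type"))
```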

LC Authorities Linked Data in HTML (screenshot)

LC Authorities Linked Data in N-Triples (screenshot)

LC Name Authority for Waugh (excerpt)
@prefix lc_names: < rdf: < mads: < viaf: < .
lc_names:n rdf:type mads:PersonalName ;
    rdf:type mads:Authority ;
    mads:authoritativeLabel "Waugh, Evelyn, ;
    mads:hasExactExternalAuthority viaf:
This is an excerpt of the LC Name Authority linked data, converted to Turtle.

RDF Serialisation: N-Triples
< < < < < < < < "Waugh, Evelyn, < < <
N-Triples: simple. It is easy to see the triples and how many there are, but hard to read each element, and hard to fit on a page!
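Because N-Triples puts exactly one triple on each line, counting or splitting them takes very little code. A minimal, deliberately lax sketch using a regular expression: it checks only the overall shape of each line (three terms and a full stop), does not unescape literals or validate IRIs, and is not a conforming N-Triples parser. The sample data uses hypothetical example.org URIs:

```python
import re

# One term: <IRI>, _:blankNode, or "literal" with optional @lang or ^^<datatype>.
TERM = r'(<[^>]*>|_:\S+|"(?:[^"\\]|\\.)*"(?:@[A-Za-z0-9-]+|\^\^<[^>]*>)?)'
# A statement is three terms separated by whitespace and ended by a full stop.
STATEMENT = re.compile(r"^\s*" + TERM + r"\s+" + TERM + r"\s+" + TERM + r"\s*\.\s*$")


def parse_ntriples(text: str) -> list[tuple[str, str, str]]:
    """Split N-Triples text into (subject, predicate, object) strings."""
    triples = []
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # blank lines and comments carry no triples
        match = STATEMENT.match(stripped)
        if match is None:
            raise ValueError(f"not a simple N-Triples statement: {line!r}")
        triples.append(match.groups())
    return triples


if __name__ == "__main__":
    sample = (
        '<http://example.org/w> <http://example.org/label> "Waugh, Evelyn"@en .\n'
        '<http://example.org/w> <http://example.org/same> <http://example.org/v> .\n'
    )
    print(len(parse_ntriples(sample)), "triples")
```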

RDF Serialisation: Turtle and N3
@prefix lc_names: < rdf: < mads: < viaf: < .
lc_names:n rdf:type mads:PersonalName ;
    rdf:type mads:Authority ;
    mads:authoritativeLabel "Waugh, Evelyn, ;
    mads:hasExactExternalAuthority viaf:
Easy to read and fits on a screen, although longer for short snippets because of the prefix declarations. The syntax can also get more complicated because of the abbreviations it allows.

RDF Serialisation: RDF/XML
<?xml version="1.0"?> <rdf:RDF xmlns:lc_names=" xmlns:mads=" xmlns:rdf=" xmlns:viaf=" <mads:PersonalName rdf:about=" <rdf:type rdf:resource=" /> <mads:authoritativeLabel xml:lang="en">Waugh, Evelyn, </mads:authoritativeLabel> <mads:hasExactExternalAuthority rdf:resource=" /> </mads:PersonalName> </rdf:RDF>
Originally the only RDF format.
Often confused with RDF itself.
Easy for computers to read.
Very hard for people to read!
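RDF/XML is ordinary XML, so general XML tools can write it. A minimal sketch with Python's `xml.etree.ElementTree` that rebuilds the shape of the slide's example (one `mads:PersonalName` with an `rdf:about` and an `xml:lang`-tagged label). The RDF namespace is the W3C's; the MADS/RDF namespace URI and the subject URI passed in are assumptions, since they are not legible on the slide:

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
MADS_NS = "http://www.loc.gov/mads/rdf/v1#"          # assumed MADS/RDF namespace
XML_NS = "http://www.w3.org/XML/1998/namespace"      # for xml:lang

# Ask ElementTree to use the familiar prefixes when serialising.
ET.register_namespace("rdf", RDF_NS)
ET.register_namespace("mads", MADS_NS)


def personal_name_rdfxml(about: str, label: str, lang: str = "en") -> str:
    """Build <rdf:RDF><mads:PersonalName rdf:about=...> with one label."""
    root = ET.Element(f"{{{RDF_NS}}}RDF")
    person = ET.SubElement(root, f"{{{MADS_NS}}}PersonalName",
                           {f"{{{RDF_NS}}}about": about})
    label_el = ET.SubElement(person, f"{{{MADS_NS}}}authoritativeLabel",
                             {f"{{{XML_NS}}}lang": lang})
    label_el.text = label
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical URI: Waugh's LC identifier is not legible on the slide.
    print(personal_name_rdfxml("http://example.org/hypothetical/waugh", "Waugh, Evelyn"))
```

Even this small example shows why RDF/XML is hard to read: the triples are implied by nesting rather than written out.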

RDF Serialisation: JSON
{ " { " [ "type": "uri", "value": " } ], " [ "value": " }, "value": " " [ "lang": "en", "type": "literal", "value": "Waugh, Evelyn, " ]
The example shown uses the RDF/JSON layout, with "type", "value" and "lang" keys. JSON-LD, the W3C's JSON-based format for linked data, instead uses keys such as "@context", "@id" and "@type".
JSON (JavaScript Object Notation) is increasingly favoured by programmers. It uses the same data structures as JavaScript, so it can be dropped easily into a program. It is also easy for other programming languages to use, and is not limited to RDF, or even to JavaScript. Curly brackets {} for objects; square brackets [] for arrays.
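The nesting on the slide is subject, then predicate, then a list of object descriptions, each with a "type" of "uri" or "literal", a "value" and, for tagged literals, a "lang". A minimal sketch that builds that nesting with Python's `json` module; blank nodes and datatypes are left out, and the MADS predicate URI and the subject URI in the usage example are assumptions:

```python
import json


def to_rdf_json(triples) -> str:
    """Nest triples as subject -> predicate -> list of object descriptions.

    Each object is given as ("uri", value) or ("literal", value, lang),
    where lang may be None.
    """
    graph: dict = {}
    for subject, predicate, obj in triples:
        kind, value, *rest = obj
        description = {"type": kind, "value": value}
        if kind == "literal" and rest and rest[0]:
            description["lang"] = rest[0]
        graph.setdefault(subject, {}).setdefault(predicate, []).append(description)
    return json.dumps(graph, indent=2, ensure_ascii=False)


if __name__ == "__main__":
    print(to_rdf_json([
        ("http://example.org/hypothetical/waugh",                # hypothetical subject
         "http://www.loc.gov/mads/rdf/v1#authoritativeLabel",    # assumed MADS URI
         ("literal", "Waugh, Evelyn", "en")),
    ]))
```

Because the output is plain JSON, any language with a JSON library can read it back without knowing anything about RDF.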

RDF Serialisation: RDFa
<div xmlns=" prefix=" rdf: mads: rdfs: > <div typeof="mads:PersonalName" about=" <div rel="rdf:type" resource=" <div property="mads:authoritativeLabel" xml:lang="en" content="Waugh, Evelyn, "></div> <div rel="mads:hasExactExternalAuthority" resource=" </div>
RDFa is for embedding RDF into HTML pages.
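Attributes such as `about`, `typeof`, `property`, `rel`, `content` and `resource` are what let a program pull triples out of an ordinary web page. A deliberately tiny sketch with Python's `html.parser`: the class name is hypothetical, and it ignores prefix expansion, `vocab`, element text as literals, chaining and most of what a real RDFa processor does, so treat it as an illustration of the idea only. The sample markup follows the slide's shape with example.org URIs:

```python
from html.parser import HTMLParser


class TinyRDFaParser(HTMLParser):
    """Collect (subject, predicate, object) triples from a few RDFa attributes.

    Handles only: about (sets the subject for the element and its children),
    typeof (when about is on the same element), property/rel (predicate)
    and content/resource (object).
    """

    def __init__(self) -> None:
        super().__init__()
        self._subjects: list = []  # subject in force for each open element
        self.triples: list[tuple[str, str, str]] = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        subject = a.get("about") or (self._subjects[-1] if self._subjects else None)
        if a.get("about") and a.get("typeof"):
            self.triples.append((a["about"], "rdf:type", a["typeof"]))
        predicate = a.get("property") or a.get("rel")
        obj = a.get("content") or a.get("resource")
        if subject and predicate and obj:
            self.triples.append((subject, predicate, obj))
        self._subjects.append(subject)

    def handle_endtag(self, tag):
        if self._subjects:
            self._subjects.pop()


if __name__ == "__main__":
    parser = TinyRDFaParser()
    parser.feed('<div about="http://example.org/w" typeof="mads:PersonalName">'
                '<div property="mads:authoritativeLabel" content="Waugh, Evelyn"></div>'
                '</div>')
    parser.close()
    print(parser.triples)
```

Void elements written without a closing tag (such as a bare `<br>`) would upset the simple subject stack; a real processor tracks the document properly.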

Microdata, RDFa, Schema.org
OCLC Worldcat uses embedded Schema.org:

Schema.org RDFa on Worldcat
Linked, i.e. hypertext: you can find out more by clicking on links, although only internally. These links are still aimed at people: it is difficult for a computer, e.g. a search engine, to assess their value. Copyright at the bottom!

[Source of the Worldcat page (excerpt; URIs lost in extraction): RDFa attributes embedded in the HTML table mark up library:oclcnum; library:placeOfPublication (London); schema:about (Electronic data processing; Public administration--Data processing); schema:contributor (British Computer Society; Operational Research Society; Institution of Electrical Engineers; Berners-Lee, C. M.); schema:datePublished (1965); schema:inLanguage (en); schema:name (Models for decision : a conference under the auspices of the United Kingdom Automation Council organised by the British Computer Society and the Operational Research Society); schema:numberOfPages (149); and schema:publisher (English Universities Press).]
Here are the triples…

Worldcat Schema.org data for a book
@prefix rdf: < schema: < worldcat: < library: < viaf: < lc_authorities: < madsrdf: < .
worldcat: rdf:type schema:Book;
    library:oclcnum " ";
    schema:name "Models for decision : a conference under the auspices of the United Kingdom Automation Council organised by the British Computer Society and the Operational Research Society";
    library:placeOfPublication _:1;
    schema:publisher _:4;
    schema:datePublished "[1965]";
    schema:numberOfPages "149";
    schema:contributor viaf: ;
    schema:contributor viaf: ;
    schema:contributor viaf: ;
    schema:contributor viaf: .
_:1 rdf:type schema:Place;
    schema:name "London :" .
_:4 rdf:type schema:Organization;
    schema:name "English Universities Press" .
viaf: madsrdf:isIdentifiedByAuthority lc_authorities:n ;
    schema:name "British Computer Society." .
viaf: madsrdf:isIdentifiedByAuthority lc_authorities:n ;
    schema:name "Operational Research Society." .
viaf: madsrdf:isIdentifiedByAuthority lc_authorities:n ;
    schema:name "Institution of Electrical Engineers." .
viaf: rdf:type schema:Person;
    schema:name "Berners-Lee, C. M." .
(Click Get Sample Data (OCLC).)
The VIAF ID for Berners-Lee can be followed: click on it and you get the VIAF record. Choose the RDF view to see the underlying RDF.

Lots of Ways To Do It
@prefix schema: < .
@prefix dc: < .
@prefix viaf: < .
@prefix rda_roles: < .
@prefix cam: < .
@prefix bnb_person: < .
@prefix foaf: < .
example:book0001 dc:creator cam:cambrdgedb_eeacef63d900c2acffc3daa400f3d4e4 .
example:book0001 dc:creator bnb_person:WaughEvelyn .
example:book0001 schema:creator viaf: .
example:book0001 rda_roles:creator viaf: .
example:book0001 dc:creator lc_names:n .
example:book0001 dc:creator _:bnode001 .
_:bnode001 foaf:name "Waugh, Evelyn, " .
example:book0001 example:author example:author0001 .
These triples, or pairs of triples, all make the same assertion but use different vocabularies and different URIs to do so. The sixth one asserts the creator's name as a string. This can't be done directly, as dc:creator needs a resource (a URI), not a literal string, so the name is attached to a blank node instead.
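Anyone consuming data like this has to decide which predicates to treat as meaning the same thing. In RDF that equivalence can itself be stated, for example with owl:equivalentProperty; the sketch below instead hard-codes a hypothetical mapping table, to show what a consumer ends up doing. `viaf:X` is a hypothetical local name, since the real VIAF IDs are not legible on the slide:

```python
# Hypothetical choice: predicates this application treats as meaning "creator".
CREATOR_PREDICATES = {
    "dc:creator",
    "schema:creator",
    "rda_roles:creator",
    "example:author",
}


def normalise_creators(triples, canonical="dc:creator"):
    """Rewrite any predicate in CREATOR_PREDICATES to one canonical predicate,
    dropping the exact duplicates that this produces (first occurrence kept)."""
    seen = set()
    out = []
    for s, p, o in triples:
        if p in CREATOR_PREDICATES:
            p = canonical
        if (s, p, o) not in seen:
            seen.add((s, p, o))
            out.append((s, p, o))
    return out


if __name__ == "__main__":
    for triple in normalise_creators([
        ("example:book0001", "schema:creator", "viaf:X"),
        ("example:book0001", "rda_roles:creator", "viaf:X"),
        ("example:book0001", "example:author", "example:author0001"),
    ]):
        print(triple)
```

Note that this only unifies the predicates: recognising that viaf:X and example:author0001 are the same person is a separate problem.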

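Blank Nodes
@prefix lc_names: < dc: < foaf: < .
lc_names:no dc:creator _:bnode01 .
_:bnode01 a foaf:Person .
_:bnode01 foaf:name "Evelyn Waugh" .
lc_names:no dc:creator [ a foaf:Person ; foaf:name "Evelyn Waugh" ] .
The two forms say the same thing. A blank node is a resource with no URI of its own; the square brackets are Turtle shorthand for one, and "a" is Turtle shorthand for rdf:type.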
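Expanding the square-bracket form means inventing a local label for each anonymous node. A minimal sketch that turns a nested Python dict into flat triples, minting `_:b0`, `_:b1`, … for every nested dict; the function name and label scheme are hypothetical, and the book's full URI is taken from the earlier slide:

```python
import itertools
from typing import Iterator, Optional


def flatten(subject: str, description: dict,
            counter: Optional[Iterator[int]] = None) -> list[tuple[str, str, str]]:
    """Turn a nested description into triples.

    Every nested dict becomes a blank node with a freshly minted label,
    much as Turtle's [ ... ] syntax does.
    """
    if counter is None:
        counter = itertools.count()
    triples = []
    for predicate, value in description.items():
        if isinstance(value, dict):
            node = f"_:b{next(counter)}"  # fresh label, only meaningful in this output
            triples.append((subject, predicate, node))
            triples.extend(flatten(node, value, counter))
        else:
            triples.append((subject, predicate, value))
    return triples


if __name__ == "__main__":
    for t in flatten("lc_names:no97080492",
                     {"dc:creator": {"a": "foaf:Person", "foaf:name": '"Evelyn Waugh"'}}):
        print(*t, ".")
```

The label `_:b0` has no meaning outside this one output; another program reading the same data may assign a different label.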

Introduction to Linked Data
@prefix foaf: < .
@prefix dc: < .
@prefix bibo: < .
cigld_intro dc:creator _:bnode001 ;
    dc:created "2013" ;
    dc:title "Introduction to Linked Data" ;
    bibo:Series "Linked data: what cataloguers need to know" .
_:bnode001 a foaf:Person ;
    foaf:name "Thomas Meehan" ;
    foaf:mbox ;
    foaf:account _:bnode002 .
_:bnode002 a foaf:OnlineAccount ;
    foaf:accountServiceHomepage " ;
    foaf:accountName .

References
Worldcat record for Models for decision / C. Berners-Lee.
What is open data? / The Open Data Institute.
Linked Data: design issues / Tim Berners-Lee.
