Linked data and the implications for library cataloguing: metadata models and structures in the Semantic Web Gordon Dunsire Presented at the Canadian Library.


1 Linked data and the implications for library cataloguing: metadata models and structures in the Semantic Web Gordon Dunsire Presented at the Canadian Library Association Annual Conference, 26-29 May 2011, Halifax, Nova Scotia

2 Outline  Context: evolution of the catalogue record  RDF 101  Library metadata models/schemas in RDF  FRBR, RDA, ISBD, DCT, BiBO,...  From record to triples: worked example

3 A short history of the evolution of the library catalogue record

4 In the beginning... the catalogue card
Lee, T. B. Cataloguing has a future. - Audio disc (Spoken word). - Donated by the author. 1. Metadata

5 From flat-file record... to relational record
Bibliographic description: Author: Lee, T. B.; Title: Cataloguing has a future; Content type: Spoken word; Carrier type: Audio disc; Subject: Metadata; Provenance: Donated by the author
Name authority: Name; Biography; ...
Subject authority: Term; Definition; ...

6 From flat-file description... to FRBR record
Bibliographic description: Author: Lee, T. B.; Title: Cataloguing has a future; Content type: Spoken word; Carrier type: Audio disc; Subject: Metadata; Provenance: Donated by the author
FRBR entities: Work; Expression; Manifestation; Item
Name authority: Name; Biography; ...
Subject authority: Term; Definition; ...

7 From FRBR record... to extinction!
FRBR entities: Work; Expression; Manifestation; Item
Statements: Author: Lee, T. B.; Title: Cataloguing has a future; Content type: Spoken word; Carrier type: Audio disc; Subject: Metadata; Provenance: Donated by the author
Linked sources: Name authority (Name); Subject authority (Term); RDA content type (Term); RDA carrier type (Term); Donor; Amazon/Publisher (Title)

8 Where is the record?  Implicit, not explicit  Everywhere and nowhere  A semantic Web will allow machines to create the record just-in-time  We will not have to maintain records just-in-case  The user will have control over the presentation  I want to see an archive or library or museum or Amazon or Google or Flickr or ? display  And by avoiding duplication, we can all get on with describing new stuff...

9 The hyperdimensional (Tardis) card
Lee, T. B. Cataloguing has a future. - Audio disc (Spoken word). - Donated by the author. 1. Metadata
Displays: Audio shop; Lee Museum; Spoken word archive; W3C Library
“TARDIS four port USB hub, for office-bound Time Lords: Open a time vortex on your desk” (Pocket-lint)

10 RDF 101

11 Semantic Web  “machine-readable metadata”  Faster! 24/7/365! Global!  Metadata expressed as “atomic” statements  A simple, single, irreducible statement  The title of this book is “Treasure island”  In a standard machine-processable format  Resource Description Framework (RDF)

12 Resource Description Framework  Metadata statement constructed in 3 parts  “Triple”  The title of this book is “Treasure island”  Subject of the statement = Subject: This book  Nature of the statement = Predicate: has title  Value of the statement = Object: “Treasure island”  This book – has title – “Treasure island”  subject – predicate - object
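As a minimal sketch of the three-part statement above, a triple can be modelled as a small named tuple. The class name `Triple` and the field name `obj` are illustrative choices, not part of RDF; the human-readable labels are used here only because URIs have not yet been introduced.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One atomic metadata statement: subject - predicate - object."""
    subject: str
    predicate: str
    obj: str  # "object" in RDF terms; renamed to avoid shadowing the builtin


# The example statement from the slide, still using human labels.
statement = Triple("This book", "has title", "Treasure island")

# Render it in the slide's "subject - predicate - object" form.
rendered = " - ".join(statement)
print(rendered)
```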

13 Identifiers  Need unambiguous way of identifying each part of the triple for efficient machine-processing  Human labels (“This book”, “has title”) no good  Same thing, different labels; different things, same label  Exploit the utility of the URL  Machine-readable, regular syntax, unambiguous  Uniform Resource Identifier (URI)

14 Uniform Resource Identifier  Can be any unique combination of numbers and letters  No intrinsic meaning; it’s just an identifying label  Can look like a URL  http://iflastandards.info/ns/isbd/elements/P1001  But does not lead to a Web page (in principle...)  RDF requires the subject and predicate of triple to be URIs  Object can be a URI, or a literal string (“Treasure island”)

15 Namespaces  URI can be constructed from a base plus a unique, identifying suffix  http://iflastandards.info/ns/isbd/elements/  + P1001  Base is known as a namespace  Can be abbreviated by human programmer  “isbd” = http://iflastandards.info/ns/isbd/elements/  isbd:P1001  Machine expands abbreviation for processing
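The expansion step can be sketched as a simple lookup, assuming the only registered prefix is the `isbd` namespace quoted on the slide. The function name `expand` and the `PREFIXES` table are hypothetical helpers for illustration.

```python
# Prefix table; only the namespace quoted on the slide is registered here.
PREFIXES = {
    "isbd": "http://iflastandards.info/ns/isbd/elements/",
}


def expand(qname: str, prefixes: dict = PREFIXES) -> str:
    """Expand an abbreviated name such as 'isbd:P1001' into a full URI."""
    prefix, _, local_part = qname.partition(":")
    if prefix not in prefixes:
        raise KeyError(f"unknown namespace prefix: {prefix!r}")
    return prefixes[prefix] + local_part


print(expand("isbd:P1001"))
```

An unknown prefix raises `KeyError` rather than passing through silently, so a typo in an abbreviation is caught before any triples are produced.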

16 Everything as triples in RDF  Every aspect of the metadata must be expressed in RDF to be machine-processable  Metadata about real-world objects (books, people, etc.)  Metadata about the predicates (definition, label, scope, etc.)  Common predicates apply to many types of thing (human-readable label, etc.)  High-level RDF namespaces (rdfs, owl)  RDF is expressed in RDF (“bootstrap”)

17 Library namespaces

18 Creating namespaces and URIs  FRBR/FRAD/FRSAD, ISBD, and RDA are using the Open Metadata Registry  Can assign a running “number” to the base to create a new URI  Set of properties for creating basic triples  Properties = predicates  rdfs:label for assigning a human-readable label to the subject  isbd:P1001 - rdfs:label - “has content form”


23 Subject | Predicate | Object
isbd:P1001 | rdfs:label | “has content form”


28 Subject | Predicate | Object
isbdcf:T1008 | skos:prefLabel | “spoken word”


31 Application profile  Need a way to specify how a useful “record” can be constructed from RDF triples  Which triples are involved, and from which namespaces?  Sequence? Repeatable? Mandatory?  Sub-component aggregations  Publication statement = place + name + date  Content rules?
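One way to picture the “Repeatable? Mandatory?” questions is a small rule table checked against a cluster of triples. This is only a sketch: the rules in `PROFILE` are invented for illustration and do not come from any published application profile, and `validate` is a hypothetical helper.

```python
# Hypothetical rules: which properties a "record" must have, and how often.
PROFILE = {
    "isbd:P1014": {"mandatory": True, "repeatable": False},   # title
    "dct:subject": {"mandatory": False, "repeatable": True},  # subject
}


def validate(record: dict, profile: dict = PROFILE) -> list:
    """Return a list of problems; an empty list means the record conforms.

    `record` maps a property to the list of values found for it.
    """
    problems = []
    for prop, rule in profile.items():
        values = record.get(prop, [])
        if rule["mandatory"] and not values:
            problems.append(f"missing {prop}")
        if not rule["repeatable"] and len(values) > 1:
            problems.append(f"repeated {prop}")
    return problems


ok = validate({"isbd:P1014": ["Museum archives: an introduction"]})
bad = validate({"dct:subject": ["Museum archives", "Archives"]})
print(ok, bad)
```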

32 Mandatory; Not repeatable; Aggregation of simpler elements; Syntax of aggregation (punctuation)

33 Getting triples from records

34 Linking Open Data cloud (LOD) Diagram by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/

35 LOD: “Library” corner

36 Why get involved?  To share our data  We work for “society”  To share our expertise and experience  150 + years  To promote the power of libraries (and archives and museums)  To survive

37 From record to triples (in 9 stages)  Very large numbers of records  Catalogue records, finding aids, etc.  300 million; 1 billion?  High quality metadata  In comparison with other communities  Each record may generate many triples  200 “raw” triples (no inferences) per MARC record?  Very, very large numbers of triples  Billions? Trillions?

38 1. Take a record
Field/attribute | Value
Record ID | 54321
Title | Museum archives: an introduction
Author | Wythe, Deborah
Date | 2004
LCSH | Museum archives
Media/GMD | Electronic
Content form | Text

39 2. Disaggregate to single statements
Record | Attribute | Value
54321 | (has) title | Museum archives: an introduction
54321 | (has) author | Wythe, Deborah
54321 | (has) date | 2004
54321 | (has) LCSH | Museum archives
54321 | (has) media type | Electronic
54321 | (has) content form | Text
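Stages 1 and 2 can be sketched as follows, assuming the record is held as a plain dictionary keyed by field name. The function name `disaggregate` is hypothetical, and the field names are kept as they appear in the record (so “Media/GMD” is not yet renamed to “media type”).

```python
# Stage 1: the record from the slide, as a field -> value mapping.
record = {
    "Record ID": "54321",
    "Title": "Museum archives: an introduction",
    "Author": "Wythe, Deborah",
    "Date": "2004",
    "LCSH": "Museum archives",
    "Media/GMD": "Electronic",
    "Content form": "Text",
}


def disaggregate(rec: dict, id_field: str = "Record ID") -> list:
    """Stage 2: turn one record into (record id, attribute, value) statements."""
    record_id = rec[id_field]
    return [
        (record_id, attribute, value)
        for attribute, value in rec.items()
        if attribute != id_field  # the ID identifies the record; it is not a statement
    ]


statements = disaggregate(record)
for statement in statements:
    print(statement)
```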

40 3. Create URI for record  Must be unique, so 54321 no good on its own  http URIs are a good thing (W3C)  So add record ID to a unique http domain  E.g. http://MyLibraryX.com (unique to the library)  + 54321  http://MyLibraryX.com/54321  (or http://MyLibraryX.com#54321)  This is not a URL!
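A minimal sketch of stage 3, assuming the library controls the example domain on the slide. `mint_uri` is a hypothetical helper; it supports both the slash form and the hash form mentioned above.

```python
def mint_uri(base: str, record_id: str, use_fragment: bool = False) -> str:
    """Combine a library-specific http base with a local record ID."""
    separator = "#" if use_fragment else "/"
    # Strip any trailing slash so the separator is not doubled.
    return base.rstrip("/") + separator + record_id


slash_uri = mint_uri("http://MyLibraryX.com", "54321")
hash_uri = mint_uri("http://MyLibraryX.com", "54321", use_fragment=True)
print(slash_uri)
print(hash_uri)
```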

41 4. Replace record ID with URI
URI | Attribute | Value
mlx:54321 | (has) title | Museum archives: an introduction
mlx:54321 | (has) author | Wythe, Deborah
mlx:54321 | (has) date | 2004
mlx:54321 | (has) LCSH | Museum archives
mlx:54321 | (has) media type | Electronic
mlx:54321 | (has) content form | Text
“mlx” = qname (xmlns) = shorthand for “http://MyLibraryX.com/”

42 5. Find URIs for attributes  Attributes are modelled as RDF properties (predicates) in “element set” namespaces  E.g. Dublin Core terms (dct); ISBD (isbd); FRBR (frbrer); RDA (rdaxxx); Bibliographic Ontology (bibo); etc.  Choose a namespace, find property with same (or closest) “meaning” (e.g. definition) as attribute  Nearest property minimises loss of information  Get URI for property  If no suitable property, choose another namespace  Properties do not have to come from single namespace  Match and mix!

43 5 (cont). Find URI for title  http://purl.org/dc/terms/title (dct:title)  http://iflastandards.info/ns/isbd/elements/P1014 (isbd:P1014)  hasTitleProper  http://RDVocab.info/Elements/titleProper (rdaGR1:titleProper)

44 5 (cont). Find URI for author  dct:creator  rdarole:author  (isbd does not cover “headings”)

45 5 (cont). Find URI for date  dct:date  isbd:P1018  hasDateOfPublicationProductionDistribution  rdaGr1:dateOfPublication

46 5 (cont). Find URI for LCSH  LCSH is a subject vocabulary  Controlled terms  So attribute is really “subject”  And the term itself is the value  dct:subject

47 5 (cont). Find URI for media type  Assuming record uses new ISBD Area 0...  isbd:P1003  hasMediaType

48 5 (cont). Find URI for content form  Assuming record uses new ISBD Area 0...  isbd:P1001  hasContentForm

49 6. Replace attributes with URIs
URI | Attribute (URI) | Value
mlx:54321 | isbd:P1014 | Museum archives: an introduction
mlx:54321 | rdarole:author | Wythe, Deborah
mlx:54321 | isbd:P1018 | 2004
mlx:54321 | dct:subject | Museum archives
mlx:54321 | isbd:P1003 | Electronic
mlx:54321 | isbd:P1001 | Text
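Stage 6 amounts to a lookup from local attribute names to the properties chosen in stage 5. The sketch below uses the abbreviated property names from the slides; the mapping table `PROPERTY_FOR` and the function `replace_attributes` are illustrative names only.

```python
# Choices made in stage 5: local attribute -> property (abbreviated URI).
PROPERTY_FOR = {
    "title": "isbd:P1014",
    "author": "rdarole:author",
    "date": "isbd:P1018",
    "LCSH": "dct:subject",
    "media type": "isbd:P1003",
    "content form": "isbd:P1001",
}


def replace_attributes(statements: list) -> list:
    """Swap each attribute label for its property; a KeyError flags an unmapped attribute."""
    return [
        (subject, PROPERTY_FOR[attribute], value)
        for subject, attribute, value in statements
    ]


stage_4 = [
    ("mlx:54321", "title", "Museum archives: an introduction"),
    ("mlx:54321", "author", "Wythe, Deborah"),
    ("mlx:54321", "date", "2004"),
    ("mlx:54321", "LCSH", "Museum archives"),
    ("mlx:54321", "media type", "Electronic"),
    ("mlx:54321", "content form", "Text"),
]
stage_6 = replace_attributes(stage_4)
for row in stage_6:
    print(row)
```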

50 7. Find URIs for values  If object of a triple is a URI, it can link to the subject of another triple with the same URI  Linked data!  Values from controlled vocabularies may have URIs  Possible vocabularies: author, subject, ISBD Area 0  NOT: title, date  For author: Virtual International Authority File (VIAF)  For LCSH: Library of Congress Authorities & Vocabularies  For ISBD Area 0: Open Metadata Registry

51 7 (cont). Find URI for author  Author: Wythe, Deborah  VIAF: http://www.viaf.org/  viaf:31899419/#Wythe,+Deborah

52 7 (cont). Find URI for subject (LCSH)  LCSH: Museum archives  LoC: http://id.loc.gov/authorities/  lcsh:/sh85088707#concept

53 7 (cont). Find URIs for ISBD Area 0  Media type: Electronic  ISBD media type  isbdmt:T1002  Content form: Text  ISBD Content form  isbdcf:T1009

54 8. Replace values with URIs
subject | predicate | object
mlx:54321 | isbd:P1014 | “Museum archives: an introduction”
mlx:54321 | rdarole:author | viaf:31899419/#Wythe,+Deborah
mlx:54321 | isbd:P1018 | “2004”
mlx:54321 | dct:subject | lcsh:/sh85088707#concept
mlx:54321 | isbd:P1003 | isbdmt:T1002
mlx:54321 | isbd:P1001 | isbdcf:T1009
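Stage 8 can be sketched as a second lookup, keyed on predicate and value together so that the same string under a different property is left alone. The identifiers are copied from the slides as written; `VALUE_URIS` and `replace_values` are hypothetical names, and in practice the lookups would go to VIAF, id.loc.gov, and the Open Metadata Registry rather than a hard-coded table.

```python
# Controlled-vocabulary matches found in stage 7, keyed by (predicate, value).
VALUE_URIS = {
    ("rdarole:author", "Wythe, Deborah"): "viaf:31899419/#Wythe,+Deborah",
    ("dct:subject", "Museum archives"): "lcsh:/sh85088707#concept",
    ("isbd:P1003", "Electronic"): "isbdmt:T1002",
    ("isbd:P1001", "Text"): "isbdcf:T1009",
}


def replace_values(statements: list) -> list:
    """Replace a value with its URI when one is known; otherwise keep the literal."""
    return [
        (subject, predicate, VALUE_URIS.get((predicate, value), value))
        for subject, predicate, value in statements
    ]


stage_6 = [
    ("mlx:54321", "isbd:P1014", "Museum archives: an introduction"),
    ("mlx:54321", "rdarole:author", "Wythe, Deborah"),
    ("mlx:54321", "isbd:P1018", "2004"),
    ("mlx:54321", "dct:subject", "Museum archives"),
    ("mlx:54321", "isbd:P1003", "Electronic"),
    ("mlx:54321", "isbd:P1001", "Text"),
]
stage_8 = replace_values(stage_6)
for row in stage_8:
    print(row)
```

Title and date have no entry in the table, so they stay as literal strings, matching the “NOT: title, date” note in stage 7.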

55 9. Publish triples (linked data)
mlx:54321 | isbd:P1014 | “Museum archives: an introduction”
mlx:54321 | rdarole:author | viaf:31899419/#Wythe,+Deborah
mlx:54321 | isbd:P1018 | “2004”
mlx:54321 | dct:subject | lcsh:/sh85088707#concept
mlx:54321 | isbd:P1003 | isbdmt:T1002
mlx:54321 | isbd:P1001 | isbdcf:T1009
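The slides do not say which serialisation is used for publishing. As one possibility, the sketch below writes statements in the line-based N-Triples style (full URIs in angle brackets, literals in quotes, a full stop at the end). The `mlx` and `isbd` namespaces are the ones given earlier; the content-form URI under `example.org` is a made-up placeholder, not the real ISBD vocabulary address.

```python
def literal(value: str) -> str:
    """Quote a plain string literal, escaping the characters N-Triples reserves."""
    escaped = (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return f'"{escaped}"'


def to_ntriples(subject: str, predicate: str, obj: str, obj_is_uri: bool) -> str:
    """Format one statement as a single N-Triples line."""
    object_term = f"<{obj}>" if obj_is_uri else literal(obj)
    return f"<{subject}> <{predicate}> {object_term} ."


MLX = "http://MyLibraryX.com/"
ISBD = "http://iflastandards.info/ns/isbd/elements/"

lines = [
    to_ntriples(MLX + "54321", ISBD + "P1014", "Museum archives: an introduction", False),
    to_ntriples(MLX + "54321", ISBD + "P1018", "2004", False),
    # Placeholder object URI, for illustration only.
    to_ntriples(MLX + "54321", ISBD + "P1001", "http://example.org/contentform/T1009", True),
]
print("\n".join(lines))
```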

56 Linked data chains
mlx:54321 | dct:subject | lcsh:/sh85088707#concept
lcsh:/sh85088707#concept | skos:related | rameau:XXX
rameau:XXX | frbrer:isSubjectOf | mly:98765
rameau:XXX | skos:prefLabel | “archives du musée”
mly:98765 | rda:titleOfTheWork | “Managing archives in museums”
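The chain can be walked mechanically: the object of one triple is matched against the subject of the next. The sketch below holds the slide's five triples in a list (including the `rameau:XXX` placeholder exactly as given) and follows a path of predicates; `follow` is a hypothetical helper.

```python
TRIPLES = [
    ("mlx:54321", "dct:subject", "lcsh:/sh85088707#concept"),
    ("lcsh:/sh85088707#concept", "skos:related", "rameau:XXX"),
    ("rameau:XXX", "frbrer:isSubjectOf", "mly:98765"),
    ("rameau:XXX", "skos:prefLabel", "archives du musée"),
    ("mly:98765", "rda:titleOfTheWork", "Managing archives in museums"),
]


def follow(start: str, path: list, triples: list = TRIPLES) -> set:
    """From `start`, hop along each predicate in `path`; return the nodes reached."""
    current = {start}
    for predicate in path:
        current = {
            obj
            for subject, pred, obj in triples
            if subject in current and pred == predicate
        }
    return current


# From one library's record to the title of a related work held elsewhere.
related_titles = follow(
    "mlx:54321",
    ["dct:subject", "skos:related", "frbrer:isSubjectOf", "rda:titleOfTheWork"],
)
print(related_titles)
```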

57 Linked data cluster = “record”
mlx:54321 | isbd:P1014 | “Museum archives: an introduction”
mlx:54321 | rdarole:author | viaf:31899419/#Wythe,+Deborah
mlx:54321 | isbd:P1018 | “2004”
mlx:54321 | dct:subject | lcsh:/sh85088707#concept
mlx:54321 | isbd:P1003 | isbdmt:T1002
mlx:54321 | isbd:P1001 | isbdcf:T1009

58 Metadata focus  Shift of focus of metadata creation, maintenance, storage, preservation (by professionals, amateurs, machines)  From: Record  To: Statement(s) = triple(s)  But metadata display... aggregates triples (from multiple sources) to create records on the fly
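Building a record “on the fly” can be sketched as grouping every triple that shares a subject, whatever source it came from. The two source lists and the function `build_record` are illustrative; the triples themselves are taken from the earlier slides.

```python
from collections import defaultdict

# Triples as they might arrive from two different publishers.
library_x = [
    ("mlx:54321", "isbd:P1014", "Museum archives: an introduction"),
    ("mlx:54321", "isbd:P1018", "2004"),
]
elsewhere = [
    ("mlx:54321", "dct:subject", "lcsh:/sh85088707#concept"),
    ("mly:98765", "rda:titleOfTheWork", "Managing archives in museums"),
]


def build_record(subject: str, *sources: list) -> dict:
    """Aggregate all triples about `subject` into a property -> values mapping."""
    record = defaultdict(list)
    for source in sources:
        for s, predicate, obj in source:
            if s == subject:
                record[predicate].append(obj)
    return dict(record)


record = build_record("mlx:54321", library_x, elsewhere)
for predicate, values in record.items():
    print(predicate, values)
```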

59 Thank you  gordon@gordondunsire.com  Open Metadata Registry  http://metadataregistry.org/

