Bibliographic data in the Semantic Web – what issues do we face in getting it there? Gordon Dunsire Presented to the ALCTS Cataloging and Classification Section Executive Committee Forum, ALA Annual, 24 June 2011

Overview  Introduction to linked data and the Semantic Web  From record to statement: a paradigm shift  Some issues

Linked data and RDF  Resource Description Framework (RDF)  Designed for machine-processing of metadata at global scale (Semantic Web)  24/7/365  Trillions of operations per second  Everything must be disambiguated  Machines are dumb  Simplicity helps!  Machine-readable identifiers

RDF triple  Metadata expressed as “atomic” statements  A simple, single, irreducible statement  The title of this book is “Cataloguing is fun!”  Constructed in 3 parts  “Triple”  The title of this book is “Cataloguing is fun!”  Subject of the statement = Subject: This book  Nature of the statement = Predicate: has title  Value of the statement = Object: “Cataloguing is fun!”  This book – has title – “Cataloguing is fun!”  subject – predicate – object
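The three-part structure on this slide can be sketched in a few lines of Python. This is only an illustration: the `Triple` class is a hypothetical name, and it uses the slide's human-readable labels rather than the machine-readable identifiers that RDF actually requires.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One atomic statement: subject, predicate, object (hypothetical helper)."""
    subject: str
    predicate: str
    obj: str


# The example statement from the slide, split into its three parts.
statement = Triple(
    subject="This book",
    predicate="has title",
    obj="Cataloguing is fun!",
)

# Print the statement in subject, predicate, object order.
print(" | ".join(statement))
```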

Machine-readable identifiers  Uniform Resource Identifier (URI)  Can be any unique combination of numbers and letters  No intrinsic meaning; it’s just an identifier  Can look like a URL  “Cool” URI: exploits existing processes developed for the World-Wide Web  http://iflastandards.info/ns/isbd/elements/P1001  But does not lead to a Web page (in principle...)  RDF requires the subject and predicate of a triple to be URIs  Object can be a URI, or a literal string (“Cataloguing is fun!”)
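The rule that subject and predicate must be URIs, while the object may be a URI or a literal, can be expressed as a small check. This is a minimal sketch, not an RDF library: `URI`, `Literal` and `make_triple` are hypothetical names, the subject URI under `example.org` is invented for illustration, and the test for "looks like a URI" (a non-empty scheme) is a deliberate simplification.

```python
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass(frozen=True)
class URI:
    """An identifier; validated only by requiring a scheme such as http."""
    value: str

    def __post_init__(self):
        if not urlparse(self.value).scheme:
            raise ValueError(f"not a URI: {self.value!r}")


@dataclass(frozen=True)
class Literal:
    """A plain string value, allowed only in the object position."""
    value: str


def make_triple(subject, predicate, obj):
    """Build a triple, enforcing which positions must hold URIs."""
    if not isinstance(subject, URI) or not isinstance(predicate, URI):
        raise TypeError("subject and predicate must be URIs")
    if not isinstance(obj, (URI, Literal)):
        raise TypeError("object must be a URI or a literal")
    return (subject, predicate, obj)


triple = make_triple(
    URI("http://example.org/resource/b12345"),  # hypothetical subject URI
    URI("http://iflastandards.info/ns/isbd/elements/P1004"),  # ISBD "has key title"
    Literal("Cataloguing is fun!"),
)
```

Passing a bare string such as `"This book"` as the subject raises `TypeError`, which is the point: a label is not an identifier.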

Bibliographic record: 12345
Title: Cataloguing is fun!  Author: Mary MacDonald  Content type: text  Media type: microform  LCSH: Cataloging
subject  predicate  object
b12345  Author  “Mary MacDonald”
b12345  Title  “Cataloguing is fun!”
b12345  Content type  “text”
b12345  Media type  “microform”
b12345  LCSH  “Cataloging”
Name authority record: 8765
Heading: MacDonald, Mary
n8765  Heading  “MacDonald, Mary”
t1234  Preferred label  “microform”
t9876  Preferred label  “text”
lc1234  Heading  “Cataloging”
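The step from record to statements shown on this slide can be sketched as a simple transformation: each field of the record becomes one triple whose subject is the record's identifier. The function name `record_to_triples` and the dictionary layout are assumptions made for illustration; the field names and values are the ones on the slide.

```python
# The bibliographic record from the slide, as field/value pairs.
record = {
    "Title": "Cataloguing is fun!",
    "Author": "Mary MacDonald",
    "Content type": "text",
    "Media type": "microform",
    "LCSH": "Cataloging",
}


def record_to_triples(record_id, fields):
    """Turn one record into a list of (subject, predicate, object) statements."""
    return [(record_id, name, value) for name, value in fields.items()]


triples = record_to_triples("b12345", record)

for subject, predicate, obj in triples:
    print(subject, predicate, repr(obj))
```

Each resulting statement stands on its own and no longer depends on the record as a container, which is the shift the slide illustrates.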

Identifiers for properties  Predicates are known as properties in RDF  http://iflastandards.info/ns/isbd/elements/P1004  “has key title”  Properties can be mixed’n’matched  Chosen from different sources (element sets)  Different element sets contain similar properties  http://RDVocab.info/Elements/keyTitleManifestation  “Key title (Manifestation)”  Some element sets are not available in RDF  E.g. MARC21
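As a rough illustration of mixing properties from different element sets, the sketch below attaches the two key-title properties named on this slide to the same subject and then reports which element sets were used. The subject URI under `example.org`, the literal value, and the helper `element_set` are assumptions; deriving the element set by cutting the URI at its last slash is a convenience for this example, not a rule of RDF.

```python
ISBD_KEY_TITLE = "http://iflastandards.info/ns/isbd/elements/P1004"
RDA_KEY_TITLE = "http://RDVocab.info/Elements/keyTitleManifestation"

SUBJECT = "http://example.org/resource/b12345"  # hypothetical resource URI

# Two statements about one resource, using similar properties
# drawn from two different element sets.
triples = [
    (SUBJECT, ISBD_KEY_TITLE, "Cataloguing is fun!"),
    (SUBJECT, RDA_KEY_TITLE, "Cataloguing is fun!"),
]


def element_set(property_uri):
    """Return the namespace part of a property URI (text up to the last slash)."""
    return property_uri.rsplit("/", 1)[0] + "/"


sources = {element_set(predicate) for _, predicate, _ in triples}
print(sorted(sources))
```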

Choosing properties/URIs for legacy records  Closest inclusive meaning  Minimises information loss  Check the definition  ISBD’s “has title proper” better than Dublin Core’s “title” (“a name given to the resource”)  Check other semantic constraints  RDA’s “titleManifestation” implies a triple’s subject URI is a Manifestation  No good for non-FRBRized records

Metadata rights  Potential legal minefield  Multiple agencies contributing to one record  Anxiety that “others” may use open triples to build rival, competitive services  Main rights associated with the record?  i.e. As an aggregation of triples  Can a triple be copyrighted if component URIs are openly published?

“Minting” URIs for resources  Specific subject of a triple  Mainly bibliographic resources  URIs for Persons, Places, etc. taken from RDF “authorities”  FRBRized records need separate URI for the Work, Expression, Manifestation, (Item)  “Standard” identifiers only a partial solution  ISBN, ISSN, national bibliography numbers, etc.  Risk of different agencies creating different URIs for the same resource  Inefficient, and costly to maintain namespaces
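One way to picture minting is as joining an agency's namespace to a local identifier. The sketch below is an assumption-laden illustration: the `mint_uri` helper, the URI pattern, the two `.example` namespaces and the local identifier `000987` are all hypothetical. It also shows the risk named on this slide, where two agencies describing the same resource end up with different URIs.

```python
from urllib.parse import quote


def mint_uri(namespace, entity, local_id):
    """Build a URI from a namespace, an entity type and a local identifier."""
    return f"{namespace.rstrip('/')}/{entity}/{quote(local_id, safe='')}"


# A FRBRized record needs a separate URI for each entity it describes.
frbr_uris = {
    entity: mint_uri("http://library-a.example/id", entity, "b12345")
    for entity in ("work", "expression", "manifestation")
}

# A second agency mints its own URI for the same manifestation.
uri_a = frbr_uris["manifestation"]
uri_b = mint_uri("http://library-b.example/id", "manifestation", "000987")

print(uri_a)
print(uri_b)
print("Same URI?", uri_a == uri_b)
```

Nothing in either URI tells a machine that both identify the same resource; that link has to be asserted separately.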

Other costs  Providing access to triples  Data-dump, triple store, data query (SPARQL)  URIs should last forever  Preservation and archive regime required  De-referencing services  Providing human- and machine-readable information about a URI  Cost of re-engineering systems, re-designing interfaces, re-training cataloguers...  But long-term benefits will justify the investment
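Querying a set of triples comes down to matching patterns in which some positions are fixed and others are left open. The sketch below shows that idea over an in-memory list; it is not SPARQL and not a triple store, the `match` helper is a hypothetical name, and the sample data reuses the labels from the earlier record example rather than real URIs.

```python
triples = [
    ("b12345", "Title", "Cataloguing is fun!"),
    ("b12345", "Author", "Mary MacDonald"),
    ("b12345", "Media type", "microform"),
    ("n8765", "Heading", "MacDonald, Mary"),
]


def match(data, subject=None, predicate=None, obj=None):
    """Return the triples that agree with every position that is not None."""
    return [
        t for t in data
        if (subject is None or t[0] == subject)
        and (predicate is None or t[1] == predicate)
        and (obj is None or t[2] == obj)
    ]


# Everything stated about b12345.
about_record = match(triples, subject="b12345")

# Every title statement, whatever the subject.
titles = match(triples, predicate="Title")

print(len(about_record), titles)
```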

The Semantic Web ecosystem  Not just professionally-generated triples  Machines generate triples by parsing content and semantic inferencing  RDA anticipates...  User-generated tags  The madness (or wisdom) of crowds  Other communities generate relevant triples  Memory institutions, publishers, reference services  Everybody uses triples  In ways beyond our dreams...

Thank you  gordon@gordondunsire.com  Sponsors  ALA  Cataloging & Classification Quarterly  MARCIVE, Inc.
