Linked Library Data
Modeling Metadata for the [Semantic] Web
Presented at the Columbia University Digital Library Seminar Series
Corey A Harper

Harper - Linked Library Data - Columbia University
Topical Overview
Semantic Web Intro
Linked Open Data
–Graphs: Entity – Attribute – Value
–A Few Examples
Library Data

Topical Overview (cont)
Linked Library Data
–SKOS and Authority Control
–FRBR and Bibliographic Data
–National Libraries
Resource Description and Access (RDA)
Dublin Core Metadata Initiative

Semantic Web
TBL’s original vision
–“Weaving the Web” – 1999
Then: Focus on Machine Reasoning
–Scientific American Article
Now: Focus on things & links
–Reasoning becoming lower level

Semantic Web
Originally:
–Metadata standard built on XML
–Metadata about “Web” things
Eventually:
–Metadata about all things
–Metadata about relationships between things

Semantic Web Terminology
Resource: Any thing
Class: Abstraction of a type of thing
Individual: An instance of a class
Property: An attribute of an individual
Ontology: A domain specific collection of classes and properties
Statement/Triple:
–A Resource (subject) - Nodes
–A Property (predicate) - Arcs
–A Value (object) - Nodes

Semantic Web Terminology
Graphs: Representations of statements about resources
Nodes: The Subjects and Objects in a Graph
Arcs: The Predicates in a Graph
Literals: “Objects” represented as strings (constant values) rather than things (URI References)
Domains and Ranges: Constraints on Nodes
For Example…

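The terminology above can be sketched in a few lines of Python. This is only an illustration, not part of the original talk: the `example.org` URIs, the class names and the helper functions are all assumptions made for the sketch, while the Dublin Core terms and FOAF namespaces are the published ones.

```python
from typing import NamedTuple, Union


class URIRef(str):
    """A thing (node or arc) identified by a URI reference."""


class Literal(str):
    """An object given as a constant string value rather than a thing."""


class Triple(NamedTuple):
    subject: URIRef                 # a node
    predicate: URIRef               # an arc
    obj: Union[URIRef, Literal]     # a node, either a thing or a literal


EX = "http://example.org/"               # hypothetical namespace for this sketch
DCT = "http://purl.org/dc/terms/"        # Dublin Core terms
FOAF = "http://xmlns.com/foaf/0.1/"      # FOAF

# A graph is simply a set of statements about resources.
graph = {
    Triple(URIRef(EX + "book/1"), URIRef(DCT + "title"), Literal("Weaving the Web")),
    Triple(URIRef(EX + "book/1"), URIRef(DCT + "creator"), URIRef(EX + "person/tbl")),
    Triple(URIRef(EX + "person/tbl"), URIRef(FOAF + "name"), Literal("Tim Berners-Lee")),
}


def nodes(g):
    """Subjects and objects of every statement in the graph."""
    return {t.subject for t in g} | {t.obj for t in g}


def arcs(g):
    """Predicates used in the graph."""
    return {t.predicate for t in g}


def literals(g):
    """Objects that are constant values rather than URI references."""
    return {t.obj for t in g if isinstance(t.obj, Literal)}
```

Note that the creator is modeled as a URI rather than a string, which is what lets the third statement say something further about that person.
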
RDF
Resource Description Framework
Formally Begun in 1999
Ideas from 1995
Finalized in 2004
Frighteningly complex at times…
–“Directed Labeled Graphs”

SemWeb Value Proposition
Formally Modeled (Meta) Data
Formal Semantics Declaration
Increased Granularity compared to record-based Metadata
Improved Interoperability

“The vast bulk of data to be on the Semantic Web is already sitting in databases … all that is needed [is] to write an adapter to convert a particular format into RDF and all the content in that format is available.”
- Tim Berners-Lee in an interview with the Consortium Standards Bulletin

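To make the "adapter" idea in the quote concrete, here is a minimal sketch that turns a flat record into N-Triples lines. The field names, the field-to-property mapping and the `example.org` subject URI are assumptions for illustration; a real adapter would also need to handle URIs as values, datatypes and language tags.

```python
DCT = "http://purl.org/dc/terms/"

# Hypothetical mapping from local field names to Dublin Core properties.
FIELD_TO_PROPERTY = {
    "title": DCT + "title",
    "creator": DCT + "creator",
    "date": DCT + "date",
}


def escape_literal(value: str) -> str:
    """Escape backslashes, quotes and line breaks for an N-Triples literal."""
    return (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def record_to_ntriples(subject_uri: str, record: dict) -> list:
    """Emit one N-Triples statement per mapped field; unmapped fields are skipped."""
    lines = []
    for field_name, value in record.items():
        prop = FIELD_TO_PROPERTY.get(field_name)
        if prop is None:
            continue
        literal = escape_literal(str(value))
        lines.append(f'<{subject_uri}> <{prop}> "{literal}" .')
    return lines
```

For example, `record_to_ntriples("http://example.org/book/1", {"title": "Weaving the Web", "date": "1999"})` yields two statements, one per mapped field.
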
Linked Open Data
Use URIs as names for things
Use HTTP URIs so that people can look up those names.
When someone looks up a URI, provide useful information.
Include links to other URIs, so that they can discover more things.

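"Looking up" an HTTP URI usually means an ordinary HTTP request that asks for a machine-readable representation. The sketch below uses only the Python standard library; the choice of media types in the `Accept` header and the function names are assumptions, and whether a given server honors content negotiation depends on that server.

```python
import urllib.request

# Assumed preference order: Turtle first, then RDF/XML.
RDF_ACCEPT = "text/turtle, application/rdf+xml;q=0.9"


def build_lookup(uri: str) -> urllib.request.Request:
    """Prepare a request that asks for RDF about the thing the URI names."""
    if not uri.startswith(("http://", "https://")):
        # Only HTTP URIs can be looked up this way.
        raise ValueError(f"not an HTTP URI: {uri}")
    return urllib.request.Request(uri, headers={"Accept": RDF_ACCEPT})


def fetch(uri: str, timeout: float = 10.0) -> bytes:
    """Dereference the URI and return whatever representation the server sends."""
    with urllib.request.urlopen(build_lookup(uri), timeout=timeout) as response:
        return response.read()
```

Calling `fetch("http://dbpedia.org/resource/Columbia_University")` would perform the actual network request; `build_lookup` alone does not touch the network.
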
Linked Data Cloud
Automated generation
–Comprehensive Knowledge Archive Network (CKAN)
–Vocabulary of Interlinked Datasets (voiD)
–Basically, catalog your metadata!
Recent criticism: data quality

Data in the Cloud
Hubs in the May 2008 Version:
–FOAF
–DBpedia
Myriad Sources coming online:
–Thomson Reuters
–New York Times
–British Broadcasting Corporation
–Google and Facebook
–More and More Library Data
–Geonames
–MusicBrainz

DBpedia
Structured Wikipedia Data
Genres, Influences, External Links
Multi-lingual / Multi-script labels
Rich Semantics
Many linkages to other datasets

DBpedia
3.4 Million “things” described
Ontology based on “infoboxes”
–1.5 million things classified
Approx. 50,000 “Properties”
–Approx. 1,200 defined in ontology
Brief Example

Domain Modeling
Starting from application / goal / function
“To guide and evaluate our designs, we need objective criteria that are founded on the purpose of the resulting artifact, rather than based on a priori notions of naturalness or Truth.” – Gruber, 1993
Does this apply to Libraries? FRBRer?

DBpedia Model
Partial basis in data entry conventions
Infoboxes and Infobox Templates
Metadata Entry Format
Partial source of Ontology
–Class Structure
–Vocabulary Design

DBpedia
3.4 Million “things” described
Ontology based on “infoboxes”
–1.5 million things classified
Approx. 50,000 “Properties”
–Approx. 1,200 defined in ontology

More Examples
British Broadcasting Corporation
–Programmes, Music, Wildlife
Google Refine and NY Times

What *things* are in our data???

…Library data is extremely complicated

Bibliographic Data
Rich stores of MARC, MODS, &c.
Robust Controlled Vocabularies
–Subject Heading lists
–Code lists
–Thesauri
Emerging data model in FR*

Bibliographic Vocabs
Bibliographic Ontology
–Zotero, Omeka, EPrints and Others
FRBR – unofficial
–And now Official (Thank you IFLA!)
ISBD

Library Authority Data
“Include links to other URIs, so that they can discover more things.”
Short of providing and linking to URIs, this *is* authority data.
This is what our authority files are for.

Library Controlled Vocabularies: Benefits
Reputation - Trusted Tradition
Mature - Time tested and carefully developed
General & Comprehensive - Cover large knowledge spaces

SKOS
Simple Knowledge Organization System
Properties and Classes for describing Controlled Vocabulary
RDF Page skos:primaryTopic skos:person

LCSH in Dublin Core
Encoding Scheme for DC Subject
No easy way to draw on equivalent terms and cross-references
Abstract Model, RDF and SKOS could enable applications to make use of the whole vocabulary

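A small sketch of what "making use of the whole vocabulary" could look like once headings are modeled as SKOS-style concepts. The field names mirror `skos:prefLabel`, `skos:altLabel` and `skos:broader`; the URIs and the two sample headings are illustrative assumptions and are not copied from LCSH.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    uri: str
    pref_label: str                                   # cf. skos:prefLabel
    alt_labels: list = field(default_factory=list)    # cf. skos:altLabel
    broader: list = field(default_factory=list)       # URIs, cf. skos:broader


def build_label_index(concepts):
    """Map every preferred or alternative label to its concept."""
    index = {}
    for concept in concepts:
        for label in [concept.pref_label, *concept.alt_labels]:
            index[label.casefold()] = concept
    return index


def broader_chain(concept, by_uri):
    """Collect preferred labels of all broader concepts, nearest first."""
    chain, seen, frontier = [], {concept.uri}, list(concept.broader)
    while frontier:
        uri = frontier.pop(0)
        if uri in seen or uri not in by_uri:
            continue  # skip cycles and links that leave this vocabulary
        seen.add(uri)
        chain.append(by_uri[uri].pref_label)
        frontier.extend(by_uri[uri].broader)
    return chain


# Illustrative data only; URIs are hypothetical.
home_economics = Concept("http://example.org/subjects/2", "Home economics")
cooking = Concept(
    "http://example.org/subjects/1",
    "Cooking",
    alt_labels=["Cookery"],
    broader=[home_economics.uri],
)
concepts = [cooking, home_economics]
by_uri = {c.uri: c for c in concepts}
index = build_label_index(concepts)
```

With this in place, a search on the cross-reference "Cookery" resolves to the same concept as "Cooking", and an application can walk up to broader headings without parsing strings.
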
LCSH as a Web Service!
Uses principles of linked data -> People noticed when taken down
Links to French Subject Headings
URIs for Literal String lookup
Wide Web

Other Vocabularies
Thesaurus for Economics
French Subject Headings
Swedish Subject Headings
IconClass (not on web yet)
OCLC Terminology Services
Dewey Decimal Classification
Virtual International Authority File

Linked Library Data
VIAF, LCSH, MARC Codes
Open Library, XC, Kuali OLE
Library of Congress, OCLC
Hungarian, German, British, Swedish National Libraries
Formalized Efforts: W3C, IFLA & RDA

Kungliga Biblioteket
Image courtesy of Martin Malmsten

National Széchényi Library
“Our RDFDC, FOAF and SKOS statements are linked together. Our name authority is matched with the DBPedia name files and URI aliases are handled as owl:sameAs statements.” - Adam Horvath

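One way a consumer might handle URI aliases expressed as owl:sameAs statements is to group them into clusters that all name the same thing. The sketch below uses a union-find structure; it is an assumption about how a client could process such links, not a description of the library's own system, and the URIs in the usage example are hypothetical.

```python
def same_as_clusters(pairs):
    """Group URIs into alias clusters from (a, b) pairs meaning a owl:sameAs b."""
    parent = {}

    def find(uri):
        # Register unseen URIs as their own root, then walk up with path halving.
        parent.setdefault(uri, uri)
        while parent[uri] != uri:
            parent[uri] = parent[parent[uri]]
            uri = parent[uri]
        return uri

    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two clusters

    clusters = {}
    for uri in list(parent):
        clusters.setdefault(find(uri), set()).add(uri)
    return list(clusters.values())


# Hypothetical aliases for two different people.
pairs = [
    ("http://example.org/authority/1", "http://dbpedia.example/resource/Person_A"),
    ("http://dbpedia.example/resource/Person_A", "http://viaf.example/id/1"),
    ("http://example.org/authority/2", "http://dbpedia.example/resource/Person_B"),
]
clusters = same_as_clusters(pairs)
```

Because owl:sameAs is symmetric and transitive, the first two pairs collapse into a single cluster of three URIs, while the third pair forms its own cluster.
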
W3C LLD XG
“Incubator Group”
Membership:
–Researchers, Consultants, Librarians
–National Libraries: Germany, France, LoC, Sweden
–OCLC & IFLA

W3C LLD XG Goals
Collecting, Curating and Clustering over 50 Use Cases
Mining use cases for functional requirements and design patterns
Recommendations to W3C
–Should lead to Working Groups

RDA Development
RDA elements, roles and vocabularies have been provisionally registered
IFLA FRBRer and ISBD elements and vocabularies have been officially registered
Discussions about long term maintenance of both RDA and the vocabularies
Effort to create multi-language RDA Vocabularies
RDA Slides Adapted from Diane Hillmann

RDA Elements Listing
334!

RDA Elements Listing
334!
Base material

Detail: Base Material

Detail: Base Material URI

RDA Base Material Vocabulary

RDA WEMI Relationships

Detail: RDA WEMI Relationship

Metadata Registries
Formerly NSDL Registry
–Now “Open Metadata Registry”
–Managing Vocabularies
–Providing Vocabulary Services
DCMI Registry Community
DCMI Architecture Forum

DCMI and the Semantic Web
Collaboration from the start
Libraries (esp. OCLC) were at the table
Perception of DCMI as DCMES
–DCMI = Metadata Vocab / Framework
–DCMES = Metadata Record Format

DCMI and the Semantic Web
Every example above had dcterms
DCMI as Research Institute and Metadata Think Tank
–Modeling Work
–Metadata Registries
–Application Profiles
–Description Set Profiles
–Singapore Framework

Changing Role of DCMI
Mike Bergman at DC2010:
–Reference Metadata
–Reference Concepts
–Mapping Predicates
“Mappings should be approximate”
–Usage Guidelines
Complement to W3C Standards

Why Does This Matter?
Our descriptions no longer stand alone!
Connect our data with the rest of the WEB
Allow others to reuse more easily
–FOAF
–DBpedia
–Geonames
–MusicBrainz
–New York Times
–Thomson Reuters
–Government Data
–British Broadcasting Corporation

Conclusions
Distributed bibliographic control environment
–Linking Data
–Focus on identification over description
“In short, by treating values as non-literal resources and assigning URIs to them we give ourselves (and others) the hooks on which to hang further descriptions.” - Andy Powell

Endless possibilities
This barely scratches the surface
The Giant Global Graph!!
With more soundly modeled bibliographic and authority data…
–Terminology Services
–Context sensitive interfaces
–Customized Exhibits
–Mashups
–Web Services
–User Profiling
–Collaboration tools

Continuing Challenges
Emerging Technology
Design Patterns
Complexity (httpRange-14)
Existing Technical Infrastructure
Bootstrapping
Business Cases

More Information
W3C LLD XG:
ALA LLD Interest Group:
IFLA Semantic Web SIG

Thanks! Questions?