Linked Data Best Practices and BibFrame December 15th, 2015 Rob Sanderson (google doc) CNI 2015 Fall Forum.


Linked Data Best Practices and BibFrame December 15th, 2015 Rob Sanderson (google doc) CNI 2015 Fall Forum, Washington DC

Overview
Quick Linked Data Refresher
Best Practices Assessment
  Define Your Domain
  Use URIs for Identity
  Provide Useful Information
  Reuse Existing Work
700s vs 500s
Practical Concerns

Linked Data Refresher: RDF
RDF encodes a Graph Data Structure
W3C standard(s)
Nodes are identified by URIs
Data is included as literal values
Anyone can assert anything about anything
Blank nodes don't have URIs
Order is hard
No, honestly, it is really hard!
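To make the "order is hard" point concrete, here is a minimal sketch in plain Python (no RDF library) of a graph held as an unordered set of triples. The `http://example.org/` namespace, the `title` and `contributors` properties, and the helper names are assumptions made for illustration; only the `rdf:first` / `rdf:rest` / `rdf:nil` terms come from the RDF vocabulary itself.

```python
from itertools import count

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
EX = "http://example.org/"  # hypothetical namespace, for illustration only

_ids = count(1)


def bnode():
    """Return a fresh blank node label. It is local to this graph, not a URI."""
    return f"_:b{next(_ids)}"


def rdf_list(items):
    """Encode an ordered sequence as an rdf:first / rdf:rest chain.

    Returns (head_node, triples). Each item costs one blank node
    and two triples, which is why ordering gets expensive quickly.
    """
    nil = RDF + "nil"
    if not items:
        return nil, set()
    nodes = [bnode() for _ in items]
    triples = set()
    for i, item in enumerate(items):
        rest = nodes[i + 1] if i + 1 < len(nodes) else nil
        triples.add((nodes[i], RDF + "first", item))
        triples.add((nodes[i], RDF + "rest", rest))
    return nodes[0], triples


work = EX + "work/1"
# A graph is a set of (subject, predicate, object) triples: it has no order.
# Literals are shown here as quoted strings; everything else is a URI or blank node.
graph = {(work, EX + "title", '"Moby Dick"')}

head, list_triples = rdf_list([EX + "person/a", EX + "person/b", EX + "person/c"])
graph |= list_triples
graph.add((work, EX + "contributors", head))

blank_nodes = {s for s, _, _ in graph if s.startswith("_:")}
```

Keeping three contributors in order takes six extra triples and three blank nodes, none of which can be referenced from outside the graph.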

Linked Data Refresher: LOD
Linked Open Data provides best practices for RDF
1. Use URIs as the names of things
2. Use HTTP URIs so people can look up those names
3. When someone does, provide useful information, using standards
4. Include links to other URIs, so they can discover more things
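Principles 2 and 3 amount to an ordinary HTTP lookup with content negotiation. The sketch below only builds the request, using the standard library, and does not touch the network. The `build_lookup` helper, the example URI, and the choice of `text/turtle` as the requested format are assumptions for illustration.

```python
from urllib.request import Request


def build_lookup(uri, media_type="text/turtle"):
    """Build a GET request that asks the server for an RDF serialization.

    Only HTTP(S) URIs can be looked up this way, which is the point
    of principle 2.
    """
    if not uri.startswith(("http://", "https://")):
        raise ValueError(f"not an HTTP URI, cannot be looked up: {uri}")
    return Request(uri, headers={"Accept": media_type})


# Hypothetical resource URI.
req = build_lookup("http://example.org/work/1")
```

Passing `req` to `urllib.request.urlopen` would perform the lookup; a server following principle 3 would answer with a description of the resource in the requested format.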

Linked Data Best Practices
Linked Open Data provides a more consistent framework
Constraints on RDF derived from successful usage
Adapted and extended
Demonstrable improvement for adoption and usability
Does BibFrame follow them?
Do recent "2.0" updates help?

Define Your Domain
A Domain Model keeps you honest by constraining scope
Define appropriate terms from your domain model
Define only terms from your domain model
Define only one pattern for each feature
Consider dynamic resources carefully

Define Your Domain
Define appropriate terms from your domain model
  ✔ Work, Instance, Item, Title, Identifier...
Define only terms from your domain model
  ✗ Person, Place, Annotation, Relator, Resource,...
Define only one pattern for each feature
  ✗ title vs titleStatement, parts, notes vs relationships,...
Consider dynamic resources carefully
  - Unclear. Circulation?
Score: 1 / 4

Define Your Domain: 2.0
Define appropriate terms from your domain model
  ✔ Work, Instance, Item, Title, Identifier...
Define only terms from your domain model
  ½ Person, Place, Annotation, Relator, Resource,...
Define only one pattern for each feature
  ½ title vs titleStatement, parts, notes vs relationships,...
Consider dynamic resources carefully
  - Unclear. Circulation?
Score: 2 / 4

Use URIs for Identity
URIs are globally unique identifiers, fundamental to Linked Open Data
Use URIs, not strings
URIs must identify one thing
Use HTTP URIs
Use Natural Keys in URIs
Clients treat URIs as opaque
Avoid Dates, Hash URIs

Use URIs for Identity
Use URIs, not strings
  ✗ MANY uses of strings for identity, esp authorities
URIs must identify one thing
  ✗ Resource vs Metadata, Part vs Whole
Use HTTP URIs
  ✔ HTTP used in general. But blank nodes overused?
Use Natural Keys in URIs
  ✔ Ontology & examples are good; compare RDA's P10001
Clients treat URIs as opaque
  ✔ No URI construction or inferencing required
Avoid Dates, Hash URIs
  ✔ No dates, versions in URIs, no recommendation for hashes
Score: 4 / 6 (charitable)

Use URIs for Identity: 2.0
Use URIs, not strings
  ½ Fewer uses of strings for identity, esp NOT authorities
URIs must identify one thing
  ✔ Resource vs Metadata, Part vs Whole
Use HTTP URIs
  ✔ HTTP used in general. But blank nodes overused?
Use Natural Keys in URIs
  ✔ Ontology & examples are good; compare RDA's P10001
Clients treat URIs as opaque
  ✔ No URI construction or inferencing required
Avoid Dates, Hash URIs
  ✔ No dates, versions in URIs, no recommendation for hashes
Score: 5½ / 6 (somewhat charitable)
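Several of these criteria can be checked mechanically. The following is a rough sketch, not a validator: the `uri_concerns` helper, its wording, and the "four digits starting 19 or 20" heuristic for spotting dates are all assumptions, and the URIs are made up.

```python
import re
from urllib.parse import urlsplit

# Heuristic only: a standalone four-digit number starting with 19 or 20.
DATE_LIKE = re.compile(r"(?<!\d)(19|20)\d{2}(?!\d)")


def uri_concerns(uri):
    """Return a list of best-practice concerns for one URI (empty if none)."""
    parts = urlsplit(uri)
    concerns = []
    if parts.scheme not in ("http", "https"):
        concerns.append("not an HTTP URI")
    if parts.fragment:
        concerns.append("hash URI")
    if DATE_LIKE.search(parts.path):
        concerns.append("date-like segment in path")
    return concerns


# Hypothetical URIs for illustration.
clean = uri_concerns("http://example.org/vocab/Work")
dated = uri_concerns("http://example.org/2015/vocab#Work")
```

A check like this can only flag the shape of a URI. Whether a URI identifies exactly one thing, or whether a string is standing in for identity, still needs a human reading the model.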

Provide Useful Information
Someone else will do the most interesting thing with our data
Provide useful information when URI is requested
Describe your own resources... individually
Include links to other resources
Avoid reification, lists, and blank nodes

Provide Useful Information
Provide useful information when URI is requested
  ✔ Promoted for main classes, Identifier needs attention
Describe your own resources individually
  - Not discussed but Annotations need attention
Include links to other resources
  ✗ Only internal references, not external (e.g. language)
Avoid reification, lists, and blank nodes
  ✗ Reification (e.g. related)
  ✔ lists
  ✗ Blank nodes (everywhere)
Score: 2 / 6

Provide Useful Information: 2.0
Provide useful information when URI is requested
  ✔ Promoted for main classes, Identifier got needed attention
Describe your own resources individually
  ✔ Annotations got attention
Include links to other resources
  ✗ Only internal references, not external (e.g. language)
Avoid reification, lists, and blank nodes
  ✗ Reification (e.g. contribution)
  ✔ lists
  ✗ Blank nodes (still everywhere)
Score: 3 / 6

Reuse Existing Work
Don't ignore giant shoulders!
Reuse existing vocabularies
Define terms in your own namespace
Relate new terms to appropriate existing ones
Name terms consistently
Only define what matters
Inverse relationships matter

Reuse Existing Work
Reuse existing vocabularies
  ✗ Fundamental and ignored
Define terms in your own namespace
  ✔ Somewhat faint praise?
Relate new terms to appropriate existing ones
  ✗ Minimally related outside of BibFrame
Name terms consistently
  ✗ Terms don't follow best practices, or internal convention
Only define what matters
  ✗ Over engineered, doesn't reuse own properties
Inverse relationships matter
  ✗ Very inconsistently done
Score: 1 / 6 (charitable)

Reuse Existing Work: 2.0
Reuse existing vocabularies
  ✗ Fundamental and still ignored
Define terms in your own namespace
  ✔ Still faint praise
Relate new terms to appropriate existing ones
  ✗ Minimally related outside of BibFrame
Name terms consistently
  ½ Terms don't follow best practices, or internal convention
Only define what matters
  ½ Over engineered, doesn't reuse own properties
Inverse relationships matter
  ✗ Very inconsistently done
Score: 2 / 6 (still charitable)

Conclusion: Improvement in 2.0!
Charitable Scores:
  BibFrame total score: 8 / 22 = 36%... fail
  BibFrame 2.0 total score: 12.5 / 22 = 57%
  57% is still a C grade, but now passing
But... Perfection is the enemy of the good enough
Work Still Needed:
  Reuse existing ontologies and vocabularies
  More consistency in design
  More linking = more part of the semantic web
  Drop remaining strings that provide identity
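The totals follow directly from the four per-section scores on the assessment slides. A short check of the arithmetic (the variable names are ours):

```python
from fractions import Fraction

HALF = Fraction(1, 2)

# section: (BibFrame score, BibFrame 2.0 score, maximum), as given on the slides
scores = {
    "Define Your Domain": (1, 2, 4),
    "Use URIs for Identity": (4, 5 + HALF, 6),
    "Provide Useful Information": (2, 3, 6),
    "Reuse Existing Work": (1, 2, 6),
}

total_v1 = sum(v1 for v1, _, _ in scores.values())
total_v2 = sum(v2 for _, v2, _ in scores.values())
maximum = sum(m for _, _, m in scores.values())

pct_v1 = round(100 * total_v1 / maximum)  # 8 / 22
pct_v2 = round(100 * total_v2 / maximum)  # 12.5 / 22
```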

700s vs 500s

Okay, Who needs a hint?

700s vs 500s
Hints:
Not: Added Entries vs Notes
sh vs sh
dewey:700 vs dewey:500
Ahem... natural keys???

Art vs Science
They're more like... guidelines
Any assessment is subjective
There are no unbreakable rules
Context and expected use are important to consider

Practical Concerns
Documentation is currently insufficient for third parties to develop implementations
  Needs to be updated and maintained
MARC to BibFrame converter from LC
  Wrapped by Stanford for local data improvements
  Wrapped by Cornell for ontology improvements
  Wrapped by...
BibFrame "Lite" converter from Zepheira...
  Doesn't really implement BibFrame

Conversion Now
Written in XQuery
  Suited for XML, not Graphs
  Very limited community
  Limited functionality
Lack of Tests / Documentation
  Difficult to determine if current behavior is correct
  Difficult to know if code changes break existing behavior
Inflexible
  Difficult to customize for local requirements
  Difficult to keep up to date with changing ontology
  Difficult to use external enrichment or transformation tools

Insufficient for Production
We will need to:
  Re-run repeatedly as data, ontology and code change
  Handle enhancements to the MARC data
  Handle enrichments to the resulting graphs
  Customize it for local practices
  Know when it doesn't work
  And what needs to be fixed
  Share the development, configuration and understanding

Desirable Conversion Features
Supported and developed by the community!
Documented
Testable and auditable
Efficient
Configurable
Robust
Integrated with local systems

How to Get There?
Thoroughly document the ontology
  If it's too hard to document, it's too hard to implement!
  Use proposals process to solicit feedback on updates
Document transformation processing algorithms used
  Per MARC field, and/or per BibFrame feature
  Updating with new proposals
Engage with community to determine requirements
  Make it possible for stakeholders to implement their own patterns
Seek partners for development
  LD4L keen to participate!
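One way to read "per MARC field" documentation, configurability and testability together is a conversion driven by a rule table. The sketch below is a toy under stated assumptions, not the LC converter or any existing tool: the `http://example.org/vocab/` predicates stand in for real BibFrame terms, the flat `(tag, subfield, value)` input is a simplification of a MARC record, and `convert` is a hypothetical name.

```python
EX = "http://example.org/vocab/"  # hypothetical vocabulary, not BibFrame itself

# One documented rule per (MARC tag, subfield code). Sites can pass their own table.
DEFAULT_RULES = {
    ("245", "a"): EX + "title",  # 245 $a: title
    ("500", "a"): EX + "note",   # 500 $a: general note
}


def convert(record_uri, fields, rules=DEFAULT_RULES):
    """Map (tag, subfield, value) tuples to triples using the rule table.

    Returns (triples, unmapped) so that callers know when the
    conversion did not cover something, and what needs to be fixed.
    """
    triples, unmapped = [], []
    for tag, code, value in fields:
        predicate = rules.get((tag, code))
        if predicate is None:
            unmapped.append((tag, code))
        else:
            triples.append((record_uri, predicate, value))
    return triples, unmapped


# Hypothetical record with one locally defined field that has no rule.
triples, unmapped = convert(
    "http://example.org/record/1",
    [("245", "a", "Moby Dick"), ("999", "x", "local data")],
)
```

Because the rules are plain data, each one can carry its own test and documentation entry, and a local practice becomes an extra entry in the table passed to `convert` rather than a fork of the code.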

Thank You! (google doc: final draft of analysis report) (these slides on slideshare) R OB S /