Linked data implementations—

Similar presentations
WDL Technical Architecture Working Group (TAWG) June 2010 Achievements and Recommendations Co-chaired by Noha Adly, Bibliotheca Alexandrina Babak Hamidzadeh,
Linked Library Data Miiya Holmes October 6-7, 2012.
The current state of Metadata - as far as we understand it - Peter Wittenburg The Language Archive - Max Planck Institute CLARIN Research Infrastructure.
Highs and Lows of Library Linked Data Adrian Stevenson UKOLN, University of Bath, UK (until end Dec 2011) Mimas, Libraries and Archives Team, University.
COMP 6703 eScience Project Semantic Web for Museums Student : Lei Junran Client/Technical Supervisor : Tom Worthington Academic Supervisor : Peter Strazdins.
National libraries and identity in the Semantic Web Gordon Dunsire BNE, Madrid, 14 Dec 2011.
PREMIS Tools and Services Rebecca Guenther Network Development & MARC Standards Office, Library of Congress NDIIPP Partners Meeting July 21,
Archival description and linked data: Opportunities and implementation challenges Karen F. Gracy, Ph.D., Kent State University The Metadata Vocabulary.
The world’s libraries. Connected. WorldShare platform & Management Services Integrate all of your collections: print, licensed & digital Chris Thewlis.
Interoperable Digitised Content “Discover, search, extract, link, associate, and view digitised content” Les Carr.
Linked data the next network?. The Web of documents is for people The Web of data is for computers The Web of documents is difficult for computers to.
OCLC Research: Selected projects Eric Childress Larry Olszewski Presentation for Dpto. Biblioteconomía y Documentación Universidad Carlos III de Madrid.
Extending Access To Information Resource Discovery Service William E. Moen, Ph.D. Kathleen R. Murray, Ph.D. School of Library and Information Sciences.
Introduction to the Semantic Web and Linked Data
EuroCRIS strategic membership meeting Barcelona – 9-11 November 2015 Role of ISNI in research information management Titia van der Werf-Davelaar Senior.
ADLUG Roma (Italy) What is known must be shared Building on the insights from OCLC Research.
CNI Spring 2016 Membership Meeting San Antonio TX Linked Data Implementations— Who, What and Why? Karen Smith-Yoshimura OCLC Research.
Build Your Own Identity Hub Ted Lawless Code4Lib 2016 – March 8 th, 2016.
Linked Library (+AM) Data Presented LITA Next-Generation Catalog IG Corey A Harper Publish, Enrich, Relate and Un-Silo.
The world’s libraries. Connected. Linked Data A View of OCLC’s Strategy Ted Fons Executive Director, Data Services,& WorldCat Quality ALA Annual Conference,
MICHAEL Culture Association WP4 Integration of existing data structure into Europeana ATHENA, WP4 Working group technical meeting Konstanz, 7th of May.
MICHAEL and the European Digital Library: promoting teaching, learning and research The MICHAEL Project is funded under the European Commission eTEN Programme.
District Engagement with the WIDA ELP Standards and ACCESS for ELLs®: Survey Findings and Professional Development Implications Naomi Lee, WIDA Research.
Introduction to SHERPA RoMEO and its Significance for Publishers
eContentplus 2008 Work Programme
Building a Data Warehouse
Linked Open Data Approaches within the ARIADNE project
Publishing DDI-Related Topics Advantages and Challenges of Creating Publications Joachim Wackerow EDDI16 - 8th Annual European DDI User Conference Cologne,
Putting Linked Data at the Service of Libraries
Digital Library Development in Australia
DATA COLLECTION METHODS IN NURSING RESEARCH
FAST at the British Library
Multiple approaches to archival description
Linked Data and Libraries
Customer Services Excellence (CSE) workshop
Programme Board 6th Meeting May 2017 Craig Larlee
CREATIVE COMMONS FOR CULTURAL HERITAGE
Linking persistent identifiers at the British Library
Reinventing Cataloging: Models for the Future of Library Operations
ORCID y la comunidad global
Decisions, Decisions: How to Determine the Appropriate Method of Cataloging Special Collections in the 21st Century Presented by Patricia Falk, Music Catalog/Metadata.
Module 6: Preparing for RDA ...
Data Management: Documentation & Metadata
End of Year Performance Review Meetings and objective setting for 2018/19 This briefing pack is designed to be used by line managers to brief their teams.
What’s changed in linked data implementations in the last three years?
Reporting Based on Data in Archivists’ Toolkit
Assessing the Assessment Tool
DRIVER Digital Repository Infrastructure Vision for European Research
PREMIS Tools and Services
2. An overview of SDMX (What is SDMX? Part I)
Name authority control in an evolving landscape
Accommodating local cataloguing traditions in a global context
Why listen to me? Sr. Digital Marketing Specialist for Fastline Media Group Social media is my world Fastline has seen a… 1,044% growth in Facebook audience.
Why do Companies Invest in Multilingual Content Initiatives?
Mendeley Overview VISHAL GUPTA Customer Consultant South Asia
User Views on Quality Reporting
IdRef – Service of reference frames for Higher Education and Research
Foster Carer Retention Project Michelle Galbraith Project Manager
Journal Usage Statistics Portal (JUSP): a simpler way to measure use and impact
DELNET – Developing Library Network
TOOLS & Projects overview
Use Patterns of Print and Electronic Journals
A modest attempt at measuring and communicating about quality
Low-bandwidth Semantic Web
QoS Metadata Status 106th OGC Technical Committee Orléans, France
Australian and New Zealand Metadata Working Group
The Role of Metadata in Census Data Dissemination
AUC’s Role In Facilitating Access To Knowledge In The Arab World
Using FAST (Faceted Application of Subject Headings) in CONTENTdm
Presentation transcript:

Linked data implementations: who, what, why?
Karen Smith-Yoshimura, OCLC Research
Semantic Web in Libraries (SWIB18), Bonn, Germany, 28 November 2018
Image: Linking Open Data cloud diagram 2017, by Andrejs Abele, John P. McCrae, Paul Buitelaar, Anja Jentzsch and Richard Cyganiak, http://lod-cloud.net/, licensed under CC-BY-SA.

International Linked Data Surveys for Implementers
Background: The impetus for an “International Linked Data Survey for Implementers” came from discussions with OCLC Research Library Partner metadata managers who were aware of a number of linked data projects or services but felt there must be more “out there.” In consultation with a number of colleagues, and after beta testing the survey instrument with a group of linked data implementers, we conducted an initial survey in July and August 2014. The target audience was those who had already implemented a linked data project or service, or were in the process of doing so. Questions were asked about both publishing and consuming linked data. I published the results in a series of posts on our HangingTogether blog. One of the first criticisms we received was that the results did not include some leading linked data implementers, such as the national libraries of France and Germany. So we repeated the survey between 1 June and 31 July 2015. <click> The results were published in D-Lib Magazine. We were then curious about what might have changed in linked data implementations since 2015, so we repeated the survey again between 17 April and 25 May 2018.

International Linked Data Surveys for Implementers
A total of 143 institutions reported one or more linked data projects or services (publishing linked data, consuming linked data, or both) across the three surveys. This chart shows the breakdown of the institutions responding to one or more surveys. Twenty-four institutions (17%) responded to all three; 35 (24%) responded only to the 2018 survey. Put another way, of the 81 institutions that responded to the 2018 survey, 35 (43%) had not responded to previous surveys. So all comparisons come with the caveat that the respondent pool differs.

Geographic breakdown of 143 responding institutions
23 countries represented
2018 German respondents: Bavarian State Library; German National Library; North Rhine-Westphalian Library Service Centre (HBZ)
These are the countries represented by the 143 institutions that have implemented or are implementing at least one linked data project or service. US respondents numbered 60, or 42% of the total. Spain and the UK each had 14 respondents (10% each of the total), followed by the Netherlands with 10 (7%). Canada and Germany complete the “top 5” non-US countries represented, with 5 institutions each. In the 2018 survey, respondents from the United States accounted for 34 (42%) of the 81 responding institutions, followed by Spain (12), the United Kingdom (8), and the Netherlands (4), with Canada, Germany, and Norway at 3 each. <click> These are the three German institutions that responded to the 2018 survey.

Responding institutions by type
We categorized the responding institutions by type. Our efforts to solicit responses from more national libraries succeeded in the 2015 survey compared to the 2014 survey, and almost the same number responded in 2018. Because the survey was distributed mainly on networks used by libraries, most respondents in all three surveys came from the library domain, with research libraries and national libraries the top two types. The biggest change was that, for the first time, <click> we received responses from service providers, which provide linked data services to their customers.

How long linked data project or service in production (2018 / 2015)
Not yet in production: 26 / 37
Less than one year: 14 / 19
More than one year, less than two years: 15 / 10
More than two years: 18 / 46
More than four years: 31 (2018 only)
Total: 104 / 112
The 81 institutions responding to the 2018 survey described 104 linked data projects/services, compared to 71 institutions responding to the 2015 survey, which described 112. Institutions that had responded to a previous survey did not always describe the same linked data projects or services: of the 104 described in 2018, only 42 had been described previously. Even when the same project or service was described, the respondent sometimes differed from the one who responded previously. Some respondents did not answer every question, so the totals for each question may vary. 75% of the linked data projects/services described in 2018 are in production (78 of 104), slightly higher than the 67% reported in 2015 (75 of 112). Of the 78 in production in 2018, 31 (40%) have been in production for more than four years.
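The percentages on this slide depend on which denominator is used. A minimal sketch in Python, using only the counts transcribed from the slide, recomputes them; it assumes, as the totals of 104 and 112 imply, that the 2018 categories do not overlap. The 40% figure holds for projects already in production (31 of 78); measured against all 104 projects described, it is about 30%. The dictionary and function names are illustrative.

```python
# Counts transcribed from the "How long ... in production" slide.
# Assumption: categories are mutually exclusive (they sum to the slide totals).
counts_2018 = {
    "Not yet in production": 26,
    "Less than one year": 14,
    "More than one year, less than two years": 15,
    "More than two years": 18,
    "More than four years": 31,
}
counts_2015 = {
    "Not yet in production": 37,
    "Less than one year": 19,
    "More than one year, less than two years": 10,
    "More than two years": 46,
}

def production_summary(counts):
    """Return (total, in_production, share_in_production) for one survey."""
    total = sum(counts.values())
    in_production = total - counts["Not yet in production"]
    return total, in_production, in_production / total

total_18, prod_18, share_18 = production_summary(counts_2018)
total_15, prod_15, share_15 = production_summary(counts_2015)
four_plus = counts_2018["More than four years"]

print(f"2018: {prod_18}/{total_18} in production ({share_18:.0%})")
print(f"2015: {prod_15}/{total_15} in production ({share_15:.0%})")
# Same numerator, two different denominators:
print(f"2018, >4 years, of those in production: {four_plus / prod_18:.0%}")
print(f"2018, >4 years, of all described: {four_plus / total_18:.0%}")
```

The last two lines make the denominator choice explicit, which is the usual source of confusion when survey shares are quoted without their base.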

How linked data is used (2018 / 2015)
Consume linked data only: 34 / 38
Publish linked data only: 5 / 10
Both consume and publish: 65 / 64
In both the 2015 and 2018 surveys, most projects/services both consume and publish linked data. Relatively few only publish linked data, and the number of publish-only linked data projects/services fell from 10 to 5.

Success assessment
More respondents reported that their linked data project or service was successful or “mostly” successful in 2018 than in 2015: 56% compared to 41%. Fewer didn't know yet because their projects were still at an early stage (pre-implementation or early implementation). Comments from respondents whose linked data project or service has been in production for at least four years included the following indicators of success. <click>
Usage: Most respondents noted substantial increases in usage over the years, and more contributors.
Data re-use: Other applications make use of their linked data implementation; some cited the number of bulk downloads.
Interoperability: Their linked data service provides access to their other resources.
User satisfaction: Linked data offers users a richer experience that is much more contextualized and inter-related. One pointed to better support of multilingualism by fetching multilingual labels from linked data vocabularies. One noted that their “happy users” are “probably unaware that the service is based on linked data.”
Influence: The success of a project gains attention and illustrates what’s possible; they’ve influenced other initiatives and moved the discussion on linked data in the library community.
Professional development: Even absent metrics demonstrating linked data’s value to others, linked data projects still provide professional development for staff.

Most accessed
Most respondents with linked data implementations in production either did not know the average number of requests their project or service received daily over the previous six months, or did not keep or have access to usage statistics. Of those that do, these are the eight “most used” linked data implementations in the 2018 survey, as measured by average number of requests per day. All reported over 100,000 requests per day, all have been in production for over four years, and all had responded to the 2015 survey as well.
American Numismatic Society’s nomisma, a thesaurus of numismatic concepts
Bibliothèque nationale de France’s data.bnf.fr, which provides access to the BnF’s collections and is a hub among different resources
Europeana, which aggregates metadata for digital objects from museums, archives, and audiovisual archives across Europe
Library of Congress’ Linked Data Service, which provides access to over 50 vocabularies and receives 500,000 to a million requests a day
National Diet Library’s NDL Search, which provides access to bibliographic data from Japanese libraries, archives, museums, and academic research institutions
North Rhine-Westphalian Library Service Center (HBZ) Linked Open Data service, which provides access to bibliographic resources, libraries and related organizations, and authority data
OCLC’s Virtual International Authority File (VIAF), which aggregates 50 authority files from different countries and regions
OCLC’s WorldCat Linked Data, which provides experimental access to its catalog of over 400 million bibliographic records in linked data form
The two I’ve circled here <click> reported that their usage has more than doubled over the last three years.
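Each of the “requests” counted above is, at bottom, an HTTP request for a resource URI. A common linked data convention (not something the survey reports for any particular service) is HTTP content negotiation: the client names the RDF serialization it wants in the Accept header. The sketch below uses only Python's standard library; the resource URI is a hypothetical placeholder, the request is built but never sent, and whether a given service honours content negotiation should be checked in its own documentation.

```python
from urllib.request import Request

# Hypothetical resource URI (placeholder, not a real authority record).
RESOURCE_URI = "https://example.org/authorities/names/n0000000"

# Registered media types for common RDF serializations.
MEDIA_TYPES = {
    "json-ld": "application/ld+json",
    "rdf-xml": "application/rdf+xml",
    "turtle": "text/turtle",
    "n-triples": "application/n-triples",
}

def build_rdf_request(uri, serialization="json-ld"):
    """Build (but do not send) a GET request asking for one RDF serialization."""
    return Request(uri, headers={"Accept": MEDIA_TYPES[serialization]}, method="GET")

req = build_rdf_request(RESOURCE_URI, "turtle")
print(req.get_method(), req.full_url, req.get_header("Accept"))
# Sending it would be urllib.request.urlopen(req), which needs network access.
```

Building the request separately from sending it keeps the example testable offline; a production client would also set a descriptive User-Agent and handle redirects from the generic URI to a format-specific document.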

Publishing Linked data

Reasons for publishing linked data (n=92 in both 2015 and 2018; 2018 / 2015)
Expose to larger audience on the Web: 74% / 73%
Demonstrate what could be done with datasets as linked data: 65% / 64%
Heard about linked data and wanted to try it out by exposing our data as linked data: 45% / 47%
Needed to publish linked data in order to consume it: 25% / * (not offered as an option in 2015)
The key motivations to publish linked data appear unchanged. Because some 2015 survey respondents had noted the need to publish linked data in order to consume it as an “other” motivation, we offered it as an option in the 2018 survey; it became the fourth most commonly cited reason.

Types of data published as linked data
Given the relatively large representation of libraries among respondents, it is no surprise that descriptive, bibliographic, and authority data are the most common types of data published in both surveys, with slight variations in numbers. “Data about people” was common enough among the “other” responses in the 2015 survey that we added it as an option in 2018; it tied for fourth with ontologies and vocabularies. There were also slight increases in the number of linked data projects/services publishing digital collections and geographic data in 2018.

RDF vocabularies/ontologies used
The RDF vocabularies and ontologies used show more change over the three years between surveys. This chart shows the top eight reported in 2018 (each used by at least 20% of respondents) compared to 2015. We observe substantial increases in Schema.org and BIBFRAME, and decreased usage of SKOS and FOAF in particular.
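Choosing a vocabulary decides which class and property IRIs a publisher's triples use. As a hedged illustration (not drawn from any respondent's data), this sketch states the same two facts about one hypothetical person with FOAF terms and with Schema.org terms, emitting N-Triples. The person URI and name are placeholders; the namespace IRIs are the published FOAF, Schema.org and RDF ones.

```python
# Namespace IRIs for two of the vocabularies named on the slide, plus rdf:type.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
FOAF = "http://xmlns.com/foaf/0.1/"
SCHEMA = "http://schema.org/"

# Hypothetical local identifier and name, for illustration only.
PERSON = "https://example.org/people/42"
NAME = "Ada Example"

# (class IRI, name-property IRI) for the same two facts in each vocabulary.
VOCABULARIES = {
    "FOAF": (FOAF + "Person", FOAF + "name"),
    "Schema.org": (SCHEMA + "Person", SCHEMA + "name"),
}

def escape_literal(text):
    """Escape backslashes and double quotes for an N-Triples string literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"')

def describe(vocab):
    """Return N-Triples lines describing PERSON with the chosen vocabulary."""
    person_class, name_prop = VOCABULARIES[vocab]
    return [
        f"<{PERSON}> <{RDF_TYPE}> <{person_class}> .",
        f'<{PERSON}> <{name_prop}> "{escape_literal(NAME)}" .',
    ]

for vocab in VOCABULARIES:
    print(f"# {vocab}")
    print("\n".join(describe(vocab)))
```

Only the IRIs change between the two descriptions; a consumer that understands one vocabulary but not the other sees entirely different predicates, which is one reason publishers sometimes emit both.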

Serializations used
The most common serialization of linked data continues to be RDF/XML. The most visible differences between the 2015 and 2018 surveys are an uptick in JSON-LD and a decrease in RDFa. NB: The number of respondents to this question was almost the same: 91 in 2018 vs. 94 in 2015.
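A serialization is only a syntax for the same RDF triples. To make the JSON-LD uptick concrete, this hedged sketch takes a small compact JSON-LD document (hypothetical identifier and values, Schema.org terms mapped in its @context) and rewrites it as N-Triples. The converter is deliberately simplified and handles only a flat node with string values; it is not a JSON-LD processor, and real code should use a conforming library.

```python
import json

# A compact JSON-LD document; the @context maps short terms to Schema.org IRIs.
# The identifier and values are hypothetical.
doc = {
    "@context": {
        "name": "http://schema.org/name",
        "birthDate": "http://schema.org/birthDate",
    },
    "@id": "https://example.org/people/42",
    "name": "Ada Example",
    "birthDate": "1900-01-01",
}

def flat_jsonld_to_ntriples(document):
    """Convert a flat JSON-LD node (one @id, string values, term->IRI context)
    to N-Triples lines. Simplified sketch: ignores @type, nested nodes,
    language tags, datatypes and @vocab."""
    context = document["@context"]
    subject = document["@id"]
    lines = []
    for key, value in document.items():
        if key.startswith("@"):
            continue  # skip keywords such as @context and @id
        predicate = context[key]  # expand the short term to a full IRI
        # json.dumps quotes and escapes the string; adequate for plain text values.
        lines.append(f"<{subject}> <{predicate}> {json.dumps(value)} .")
    return lines

serialized = json.dumps(doc, indent=2)  # the JSON-LD text as a publisher might serve it
parsed = json.loads(serialized)         # a consumer parses it back
print("\n".join(flat_jsonld_to_ntriples(parsed)))
```

Because JSON-LD is ordinary JSON, a web developer can read the first form with any JSON parser, which is part of its appeal over RDF/XML; the triples it denotes are identical to the N-Triples output.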

Barriers to publishing linked data (2018)
Steep learning curve for staff: 41
Inconsistency in legacy data: 38
Selecting appropriate ontologies to represent our data: 26
Lack of resources: 23
Little documentation or advice on how to build the systems: 23
Establishing the links: 22
The top six barriers cited by the 2018 survey respondents were almost the same as those cited by the 2015 respondents. “Lack of resources” was added as an option (it was a popular “other” response in 2015) and tied with little documentation. “Establishing the links” was the fourth-ranked barrier in 2015.

Consuming Linked data

Reasons for consuming linked data (number of responses, 2018 / 2015)
Provide our users with a richer experience: 54 / 51
Enhance our own data by consuming linked data from other sources: 49 / 50
Heard about linked data and wanted to try it out by using linked data sources: 23 / 17
More effective internal metadata management: 21 / 32
Experiment with combining different types of data into a single triple store: 20
Greater accuracy and scope in our search results: 19 / 27
See if consuming linked data would improve our Search Engine Optimization (SEO): 7
The top two reasons given for consuming linked data are the same among respondents to the 2015 and 2018 surveys. But the share of respondents motivated by <click> improving SEO dropped by 18 percentage points, by more effective internal metadata management by 17 points, and by achieving greater accuracy and scope in search results by 12 points. NB: The number of respondents to this question was almost the same: 69 in 2018 vs. 68 in 2015.
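The drops quoted on this slide are differences in the share of respondents (percentage points), not percentage changes in the raw counts. A minimal check in Python reproduces two of them from the counts on the slide and the respondent totals (69 and 68); the SEO drop cannot be checked because its 2015 count is not shown. The function name is illustrative.

```python
# Respondents to this question in each survey (from the slide's NB).
N_2018, N_2015 = 69, 68

# (2018 count, 2015 count) for reasons where both counts are shown.
reasons = {
    "More effective internal metadata management": (21, 32),
    "Greater accuracy and scope in our search results": (19, 27),
    "Provide our users with a richer experience": (54, 51),
}

def point_change(count_2018, count_2015):
    """Change in share of respondents, in percentage points (2018 minus 2015)."""
    return 100 * count_2018 / N_2018 - 100 * count_2015 / N_2015

for reason, (c18, c15) in reasons.items():
    print(f"{reason}: {point_change(c18, c15):+.0f} points")
```

For internal metadata management the result is about 17 points; the relative drop in the raw count (21 versus 32) would be about 34%, so stating the unit matters.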

Linked data sources consumed
These are the top 10 linked data sources consumed by the 2018 survey respondents compared to 2015. The biggest change <click> was the surge in consuming Wikidata: more than four times as many linked data implementations consumed it as in 2015. We also see big increases in consuming WorldCat.org and ISNI. The asterisks mark the sources whose publishers also responded to the survey; these could be considered successful publishers of linked data, judged by the degree to which others consume the data they provide. NB: The number of respondents to this question was almost the same: 69 in 2018 vs. 68 in 2015.
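The survey does not say how respondents consume Wikidata. One common route is the public Wikidata Query Service SPARQL endpoint. This hedged sketch builds a GET URL (SPARQL 1.1 Protocol `query` parameter) for items carrying a VIAF ID (Wikidata property P214), and flattens a response in the standard SPARQL 1.1 JSON results format. The sample response is hypothetical, with placeholder values, and nothing is sent over the network.

```python
import json
from urllib.parse import urlencode

# Public Wikidata Query Service SPARQL endpoint.
ENDPOINT = "https://query.wikidata.org/sparql"

# Items that carry a VIAF ID (P214); the wdt: prefix is predefined by the service.
QUERY = """
SELECT ?item ?viaf WHERE {
  ?item wdt:P214 ?viaf .
}
LIMIT 5
"""

def build_query_url(query):
    """Encode a SPARQL query as a GET URL using the protocol's 'query' parameter.
    Send it with Accept: application/sparql-results+json to get JSON results."""
    return ENDPOINT + "?" + urlencode({"query": query})

def bindings_to_rows(results, variables):
    """Flatten a SPARQL JSON results document into tuples (None if unbound)."""
    return [
        tuple(binding[var]["value"] if var in binding else None for var in variables)
        for binding in results["results"]["bindings"]
    ]

# Hypothetical response in the SPARQL JSON results format (placeholder values).
sample = json.loads("""
{"head": {"vars": ["item", "viaf"]},
 "results": {"bindings": [
   {"item": {"type": "uri", "value": "http://www.wikidata.org/entity/Q_EXAMPLE"},
    "viaf": {"type": "literal", "value": "VIAF_EXAMPLE"}}
 ]}}
""")

print(build_query_url(QUERY))
print(bindings_to_rows(sample, sample["head"]["vars"]))
```

Pairs like these are what let a library connect its own authority data to Wikidata through a shared VIAF identifier.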

Some that started consuming Wikidata
I looked at the responses that included Wikidata as a linked data source in 2018 but had not used it in 2015, as they contributed to the surge in the use of Wikidata shown on the previous slide. They include a couple of the “most accessed” implementations cited previously. Some of the ones that added Wikidata as a linked data source:
AECID Digital Library, which aggregates resources about Latin America, Islam, and Arab countries
Digital Public Library of America (DPLA), providing national-scale aggregation of cultural heritage metadata
The Ignacio Larramendi Foundation’s Polymath Virtual Library, which aggregates works of the most important Hispanic polymaths and establishes relationships between them
National Library of Spain’s Datos, providing a semantic view of bibliographic and authority data
OCLC’s Faceted Application of Subject Terminology (FAST), a faceted subject heading schema derived from the Library of Congress Subject Headings to make the schema easier to understand, control, apply, and use
Pratt Institute’s Linked Jazz, a research project to uncover meaningful connections in the personal and professional lives of jazz artists
The National Library of Finland noted, “Wikidata is becoming more and more significant for cultural heritage institutions, including our library.”

Barriers to consuming linked data (share of responses, 2018 / 2015)
Matching, disambiguating and aligning source data and linked data resources: 48% / 39%
What's published as linked data isn’t always reusable/lacks URIs: 31% / 27%
Size of RDF dumps: 28% / 20%
Unstable endpoints: 17%
Service reliability: 26% / 15%
Mapping of vocabulary: 29%
Understanding how data is structured before using it: 24%
This lists the top seven barriers in the 2018 survey responses. The top barrier in both the 2018 and 2015 surveys was matching, disambiguating, and aligning source data and linked data resources. The biggest difference is an uptick in the number of responses pointing to unstable endpoints and service reliability. NB: The number of respondents to this question was almost the same: 58 in 2018 vs. 59 in 2015.
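Matching was the top barrier in both surveys. A deliberately naive sketch shows why it is hard: normalizing name strings (an assumed technique for illustration, not one reported by respondents) finds candidates despite accents, case and inverted name order, but identical labels belonging to different people cannot be told apart without more data. All labels and URIs below are hypothetical.

```python
import re
import unicodedata

def normalize(name):
    """Casefold, strip accents and punctuation, collapse whitespace, and
    reorder 'Surname, Forename' to 'forename surname'."""
    if "," in name:
        surname, _, rest = name.partition(",")
        name = f"{rest} {surname}"
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    cleaned = re.sub(r"[^\w\s]", " ", no_accents.casefold())
    return " ".join(cleaned.split())

def candidate_matches(local_name, authority_labels):
    """Return authority URIs whose normalized label equals the local name's.
    More than one hit means the name is ambiguous and needs human review."""
    key = normalize(local_name)
    return [uri for uri, label in authority_labels.items() if normalize(label) == key]

# Hypothetical authority labels keyed by placeholder URIs.
authority = {
    "https://example.org/auth/1": "García Márquez, Gabriel",
    "https://example.org/auth/2": "Smith, John",
    "https://example.org/auth/3": "Smith, John",  # a different John Smith
}

print(candidate_matches("Gabriel Garcia Marquez", authority))  # one candidate
print(candidate_matches("SMITH, JOHN.", authority))            # two: ambiguous
```

String normalization only narrows the field; disambiguating the two John Smiths needs additional evidence such as dates, works or affiliations, which is exactly the data legacy records often lack.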

Some new examples in production
I noted earlier that 40% of the 2018 linked data projects/services in production had been in production for over four years, several of which appeared on the “sources most consumed” slide. I’ve selected a few examples from the 47 linked data implementations described that have been put into production since the last survey. Choosing them was not easy, as they are meant for machines to read, not a human like me. This is just a sampling.

The National Library of Finland’s service for accessing its bibliographic database has been in production for less than one year. It noted, “Creating this service has been a major milestone and makes it clear what the possibilities and limitations are.” <click> The National Library of Hungary first converted its Hungarian common catalog into linked data, and less than a year ago launched this new service, which provides access to its electronic library through SPARQL queries.
http://data.nationallibrary.fi
http://v.mek.oszk.hu/FlintSparqlEditor/index-mek.html

https://dati.cobis.to.it/
CoBIS is a collaboration among six libraries in Turin that provides linked data access to their shared library collections; it launched in December 2017. It noted that the service has been a “great success” in making their collections visible and is one of the “very few” such projects in Italy.

http://data.carnegiehall.org
http://archesproject.com/
In the United States, Carnegie Hall released its performance history data from 1891 through the end of 2015 as linked data; the service launched less than a year ago. <click> The Getty Conservation Institute collaborated with the World Monuments Fund to develop Arches, a geospatially enabled software platform for cultural heritage inventory and management.

Some Advice from Implementers & Implications

Advice from implementers
Learn from others: Take advantage of the many more exemplars of good practice and information related to linked data; read as widely as possible, including W3C recommendations, and consult with community experts.
Focus on your use cases: Match use cases and user needs with what the data can provide, and be prepared to work with multiple data models. Define the scope and requirements before starting any work.
Collaborate with others rather than doing everything yourself.
Integrate linked data into your daily workflows.
Analyze what legacy data should be converted. Never underestimate the amount of data cleanup that will be required.
Use existing identifiers and ontologies: Use identifiers already created by authoritative linked data sources whenever possible. Create your own ontologies only to fill gaps. Select linked data sources by both their content and the number of links to other linked data sources.
Iterate based on user feedback.
Expect benefits only after reaching scale.

Implications
Emergence of service providers: The emergence of service providers may lead to fewer individual institutions launching their own linked data projects. Among the 2018 survey respondents, 37% relied at least partially on a system vendor, corporation, or external consultants or developers to implement their linked data project or service, and several institutions were clients of providers who also responded to the survey.
Growing diversity of linked data implementations: Forty-two percent of the “new” responses to the survey (not described in previous surveys) were outside the library domain. Besides the new category of service providers noted earlier, we see more linked data initiatives from research institutions and cultural heritage organizations.
Most linked data projects and services remain experimental or educational in nature: Few of the linked data implementations are integrated into daily workflows. The Oslo Public Library observes, “As far as I can see, Oslo public library is still the first and only library with its production catalogue and original cataloguing workflows done directly with linked data.”
“This is the future of data for libraries and the longer we wait the further behind we're going to fall.”

Details of all responses
oc.lc/ld-survey-responses
We’ve added the responses to the 2018 survey to the Excel workbook containing the 2014 and 2015 responses (without the contact information, which we promised to keep confidential). Feel free to make your own comparisons, or focus on the institutions that most resemble yours.

Thank you!
Contact: Karen Smith-Yoshimura, smithyok@oclc.org, @KarenS_Y
Semantic Web in Libraries (SWIB18), Bonn, Germany, 28 November 2018
©2018 OCLC. This work is licensed under a Creative Commons Attribution 4.0 International License. Suggested attribution: “This work uses content from Linked Data Implementations—Who, What, Why? © OCLC, used under a Creative Commons Attribution 4.0 International License: http://creativecommons.org/licenses/by/4.0/.”