Wikidata, a target for Europeana’s semantic strategy Valentine Charles, Hugo Manguinhas, Antoine Isaac: Europeana Vladimir Alexiev: Ontotext Corp GLAM.

Presentation transcript:

Wikidata, a target for Europeana’s semantic strategy
Valentine Charles, Hugo Manguinhas, Antoine Isaac: Europeana
Vladimir Alexiev: Ontotext Corp
GLAM Wiki 2015, Den Haag

Europeana.eu, Europe’s cultural heritage portal 40M objects from 2,200 galleries, museums, archives and libraries

Europeana has many data challenges: diversity
• Aggregates metadata from the cultural heritage sector in Europe
• Large number of references to places, agents, concepts, time

Europeana has many data challenges: diversity
• Metadata in more than 30 languages
• From all EU countries

Europeana’s priority 1: Improve data quality
• Europeana Data Model (EDM), a framework for richer data
  - Re-uses several existing Semantic Web-based models: Dublin Core, OAI-ORE, SKOS, CIDOC-CRM…
  - EDM gives support for contextual resources (semantic layer)
• Rely on vocabularies to solve a problem of data interlinking
  - Encourage data providers to contribute their own vocabularies and benefit from data links made at data providers’ level

Vocabularies currently provided to Europeana

Europeana also manages its own vocabularies

Europeana performs automatic enrichment based on vocabularies
Goal: Contextualization which reaches outside the scope of a particular platform
[Diagram labels: Object; External Dataset and Vocabulary]

Automatic enrichment process in Europeana
1. Analysis
  - Selection of metadata fields in resource descriptions
  - Selection of potential rules to match
2. Linking
  - Matching the values of the metadata fields to values of the contextual resources
  - Adding contextual links
3. Augmentation
  - Selecting the values from the contextual resource
  - Augmentation of the search index with the labels from the vocabulary
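The three stages (analysis, linking, augmentation) can be sketched roughly as follows. This is only an illustration under stated assumptions: the in-memory vocabulary, the placeholder URI, the sample record and the exact, case-insensitive label matching are all made up here and are not Europeana's actual implementation.

```python
from dataclasses import dataclass


@dataclass
class ContextualResource:
    uri: str
    labels: dict  # language tag -> label


# Hypothetical one-entry vocabulary; the URI is a placeholder, not a real entry.
VOCABULARY = [
    ContextualResource(
        uri="http://example.org/place/paris",
        labels={"en": "Paris", "fr": "Paris", "it": "Parigi"},
    ),
]


def analyse(record, fields):
    """Analysis: keep only the metadata fields selected for matching."""
    return {name: record[name] for name in fields if name in record}


def link(selected, vocabulary):
    """Linking: match field values to vocabulary labels (exact, case-insensitive)."""
    index = {}
    for resource in vocabulary:
        for label in resource.labels.values():
            index[label.lower()] = resource
    links = []
    for name, values in selected.items():
        for value in values:
            resource = index.get(value.strip().lower())
            if resource is not None:
                links.append((name, value, resource))
    return links


def augment(links):
    """Augmentation: gather all labels of the linked resources for the search index."""
    terms = set()
    for _, _, resource in links:
        terms.update(resource.labels.values())
    return sorted(terms)


record = {"dcterms:spatial": ["Parigi"], "dc:title": ["Veduta"]}
selected = analyse(record, ["dcterms:spatial", "dc:coverage"])
links = link(selected, VOCABULARY)
search_terms = augment(links)
```

In this sketch a record that only mentions the Italian label also becomes findable under the other labels of the linked resource, which is the point of the augmentation step.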

Enrichment Types and Current Vocabularies

Enrichment Type   Target vocabulary   Source metadata fields
Places            GeoNames            dcterms:spatial, dc:coverage
Concepts          GEMET, DBpedia      dc:subject, dc:type
Agents            DBpedia             dc:creator, dc:contributor
Time              Semium Time         dc:date, dc:coverage, dcterms:temporal, edm:year
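The table can be read as a lookup from metadata field to enrichment type. A minimal sketch, assuming a plain dictionary layout and a helper name (`enrichment_types_for`) that are invented for illustration:

```python
# Enrichment rules transcribed from the table; the dict layout is an assumption.
ENRICHMENT_RULES = {
    "Places": {
        "targets": ["GeoNames"],
        "fields": ["dcterms:spatial", "dc:coverage"],
    },
    "Concepts": {
        "targets": ["GEMET", "DBpedia"],
        "fields": ["dc:subject", "dc:type"],
    },
    "Agents": {
        "targets": ["DBpedia"],
        "fields": ["dc:creator", "dc:contributor"],
    },
    "Time": {
        "targets": ["Semium Time"],
        "fields": ["dc:date", "dc:coverage", "dcterms:temporal", "edm:year"],
    },
}


def enrichment_types_for(field_name):
    """Return every enrichment type whose source fields include field_name."""
    return [
        enrichment_type
        for enrichment_type, rule in ENRICHMENT_RULES.items()
        if field_name in rule["fields"]
    ]


coverage_types = enrichment_types_for("dc:coverage")
creator_types = enrichment_types_for("dc:creator")
```

Note that dc:coverage appears in two rows, so a value in that field is a candidate for both place and time enrichment.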

Europeana enrichment - an example

How does Wikidata fit into Europeana’s semantic strategy?

Wikipedia's Relevance for Cultural Heritage
• Authority Lists and Thesauri have central importance in CH
• Wikipedia being "the sum of all knowledge" has broader reach than any institutional authority list
• Only large-scale aggregations like VIAF (35 institutions) and LCSH (about 10 libraries around LoC) are comparable
• While some facts are inaccurate and disputable, Wikipedia has a great role as a source of stable URLs on all kinds of topics

How Big is Wikidata?
• Name data sources for semantic enrichment (Europeana Creative D2.4) gives DBpedia and Wikidata stats
• Wikidata: 3y old, 14M items, 209M edits
• 2.7M humans, 5k families, 22k literary characters
• 215k organizations
• 66k creative orgs (bands, radio/TV stations, newspapers…)
• 30k educational institutions
• 20k non-profit orgs
• 13k GLAM orgs: 0.5k galleries, 1k libraries, 0.2k archives, 9k museums
• 500k creative works
• 110k heritage sites and monuments
• 40k family names, 20k first names

Is this big enough?
• Wikidata: 2.7M humans, 215k organizations, 800k places, 500k works
• VIAF: 35M personal names, 5.4M orgs/conferences, 410k places, 1.7M works
• GeoNames: 9M places
• Only 1.1M persons are coreferenced, see Authority Addicts: The New Frontier of Authority Control on Wikidata
• VIAF is much bigger, but Wikidata is still very important for GLAM:
• Wikidata is active in Authority Control and Coreferencing
• (VIAF) Moving to Wikidata: will get 1M persons/orgs, and many multilingual names (see next)
• Authority Files have barely more than names & dates; Wikipedia often has a lot more info

Wikidata Multilingual Coverage
• Wikidata/DBpedia has huge multilingual coverage
• Each entity is represented in 2.11 Wikipedias on average (see Europeana food and drink classification scheme, EFD D2.2)
• But popular entities are present in many more (up to 180); and even in one Wikipedia there are many languages
• E.g. Lucas Cranach in Wikidata: 57 lang tags, representing 44 languages and 13 language variants
• Languages are consistently marked
• Important for semantic enrichment (Named Entity Recognition)
• Even though language labels in Europeana are not consistent
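A count like "57 lang tags, representing 44 languages and 13 language variants" can be derived by splitting tags into plain language codes and codes with a subtag. The sketch below assumes that this is how the figures were obtained (the slide does not say), and the sample tags are illustrative, not the actual Lucas Cranach labels.

```python
def summarise_language_tags(tags):
    """Split language tags into plain languages and variants (tags with a subtag)."""
    languages, variants = set(), set()
    for tag in tags:
        tag = tag.lower()
        if "-" in tag:
            variants.add(tag)   # e.g. "en-gb", "zh-hans"
        else:
            languages.add(tag)  # e.g. "en", "zh"
    return {
        "tags": len(languages) + len(variants),
        "languages": len(languages),
        "variants": len(variants),
    }


# Illustrative sample only.
sample_tags = ["de", "en", "en-gb", "fr", "zh", "zh-hans", "zh-hant"]
summary = summarise_language_tags(sample_tags)
```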

Name Variants for Lucas Cranach
• Wikidata and VIAF each have 70 variants and dominate the "Wikipedia tradition" and "Library tradition" datasets respectively (see Name data sources for semantic enrichment)
• Only 5 variants are in common (see Interactive Venn diagram)
• Excellent complementarity. VIAF has more variants, Wikidata more multilingual names
• VIAF's move to sync to Wikidata will narrow the gap
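As a back-of-the-envelope check of the complementarity claim, the slide's figures can be combined by inclusion-exclusion. The sketch considers only the Wikidata and VIAF sets and ignores the other datasets in the Venn diagram, which is a simplifying assumption.

```python
# Figures taken from the slide.
wikidata_variants = 70
viaf_variants = 70
common_variants = 5

# Inclusion-exclusion: |A ∪ B| = |A| + |B| - |A ∩ B|
distinct_variants = wikidata_variants + viaf_variants - common_variants

# Share of the combined set that both sources already have (Jaccard index).
overlap_ratio = common_variants / distinct_variants

print(f"{distinct_variants} distinct variants, overlap {overlap_ratio:.1%}")
```

Under these assumptions, combining the two sources nearly doubles the number of name variants available for matching.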

Wikidata is connected to other vocabularies
• Europeana prefers using pivot vocabularies that are connected to many other vocabularies
  - This is key to avoiding duplication and redundancy
• Wikidata has a lot of coreferences to other vocabularies that can be used to create extra links and extract missing data

VIAF-Wikidata Coreferences for Lucas Cranach
• Can be leveraged to fill the gaps, e.g. bring RKDartists into VIAF
• Table columns: VIAF source, id in VIAF, Wikidata property, id in Wikidata
• Identifier schemes listed: VIAF (viafID), BAV, BNC, BNE, BNF, DNB (GND), ISNI, JPG (ULAN), LC (LCCN), LNB, NDL, NKC, NLA, NLI, NLP, NTA (NTA PPN), NUKAT, SELIBR, SUDOC, WKP (Lucas_Cranach_the_Elder; many Wikipedias), IMAGINE (T7238, T267474), Cantic, Commons Creator (Lucas Cranach (I)), Commons category (Lucas Cranach d. Ä.), Freebase (/m/0kqp0), RKDartists (18978), SIMBAD (CRANACH, Lucas the Elder), Your Paintings (lucas-the-elder-cranach)
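Finding gaps such as "RKDartists is in Wikidata but not in VIAF" amounts to comparing the identifier schemes on each side. A minimal sketch, assuming a small hand-picked subset of schemes and a helper name (`missing_in_viaf`) invented for illustration; the code pairs (DNB/GND, JPG/ULAN, LC/LCCN) follow the table.

```python
# VIAF source code -> name of the matching identifier in Wikidata.
VIAF_TO_WIKIDATA = {"DNB": "GND", "JPG": "ULAN", "LC": "LCCN", "ISNI": "ISNI"}


def missing_in_viaf(viaf_codes, wikidata_names):
    """Return identifier schemes present in Wikidata but not covered by VIAF."""
    covered = {VIAF_TO_WIKIDATA.get(code, code) for code in viaf_codes}
    return sorted(set(wikidata_names) - covered)


# Illustrative subset of the schemes in the table.
viaf_codes = ["DNB", "ISNI", "JPG", "LC"]
wikidata_names = ["GND", "ISNI", "ULAN", "LCCN", "RKDartists", "Freebase"]

gaps = missing_in_viaf(viaf_codes, wikidata_names)
```

For this subset the gaps are Freebase and RKDartists, i.e. candidates for identifiers that could be brought into VIAF through the Wikidata link.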

Wikidata Coreferencing (1)
• Excellent Mix-n-Match tool by Magnus Manske. 54 catalogs loaded!
• Decent auto-matching and excellent crowd-sourcing features

Wikidata Coreferencing (2)
• Excellent Authority Control navbox in Wikipedia
• E.g. matching British Museum person-institution thesaurus (currently not coreferenced to anything: high value to BM)

Europeana Food and Drink
• How do you define such a wide area as Food and Drink, which is so pervasive in everyday life and culture?
• Europeana food and drink classification scheme (EFD D2.2, or presentation) studies ~20 datasets for relevance to FD
• Concludes that Wikipedia is our playing ground, and we should try to use Wikipedia Categories to delineate the topic
  - AGROVOC has 32k concepts, but they focus on production/science
  - Wikipedia/DBpedia has 6.6k proper Foods (with infoboxes and ingredients)
  - But I estimate M things relevant to FD in all Wikipedias
• Background image: 2 levels of Food_and_drink cat hierarchy

Wikidata is Easily Accessible
• It is important for Europeana to have the data:
  - Technically available
    - Data dump, preferably as Linked Data (RDF)
    - SPARQL endpoint or other query mechanism (e.g. WDQ)
  - Properly documented and structured
    - Wikidata has an excellent Property Proposal process
    - Wikidata integrity constraints are excellent
    - In contrast, there is no Class creation process, so the classes are quite a mess (16k, of which 2/3 have fewer than 5 instances)
    - Data templates should be made more visible and be used as references
  - Open access
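To illustrate the SPARQL access route, the sketch below builds a query that counts instances of a class and the URL a client would request. It assumes the public Wikidata Query Service endpoint (not named on the slide) and its predefined `wd:`/`wdt:` prefixes; the helper names are invented, and nothing is sent over the network here.

```python
import re
from urllib.parse import urlencode

# Public Wikidata Query Service endpoint (an assumption; the slide names none).
WDQS_ENDPOINT = "https://query.wikidata.org/sparql"


def count_instances_query(class_qid):
    """Build a SPARQL query counting items that are an instance of (P31) a class."""
    if not re.fullmatch(r"Q[1-9]\d*", class_qid):
        raise ValueError(f"not a Wikidata item id: {class_qid!r}")
    return (
        "SELECT (COUNT(?item) AS ?count) WHERE { "
        f"?item wdt:P31 wd:{class_qid} . "
        "}"
    )


def request_url(query):
    """Build the GET URL for the query, asking for JSON results."""
    return WDQS_ENDPOINT + "?" + urlencode({"query": query, "format": "json"})


humans_query = count_instances_query("Q5")  # Q5 is the Wikidata item for "human"
humans_url = request_url(humans_query)
```

A count over a very large class may be slow or time out on a public endpoint, so for bulk statistics the data dump is likely the more practical route.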

Wikidata Property Integrity Constraints
• E.g. ULAN id constraints help to find records to merge / split
• E.g. Communist Party of the Russian Federation has 5 LCNAF ids, what's up? Is it so popular with the Library of Congress?
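Two checks of this kind can be sketched as follows: an item with several values for one identifier property (as in the LCNAF example) and one identifier value shared by several items. The function names and the sample statements are hypothetical, and reading the first as "split or clean up" and the second as "merge" is an interpretation, not something the slide spells out.

```python
from collections import defaultdict


def single_value_violations(statements):
    """Items that carry more than one distinct value for the property."""
    values_by_item = defaultdict(set)
    for item, value in statements:
        values_by_item[item].add(value)
    return {
        item: sorted(values)
        for item, values in values_by_item.items()
        if len(values) > 1
    }


def unique_value_violations(statements):
    """Identifier values that are attached to more than one item."""
    items_by_value = defaultdict(set)
    for item, value in statements:
        items_by_value[value].add(item)
    return {
        value: sorted(items)
        for value, items in items_by_value.items()
        if len(items) > 1
    }


# Hypothetical (item, identifier) statements for one identifier property.
statements = [
    ("item-A", "id-001"),
    ("item-A", "id-002"),
    ("item-B", "id-003"),
    ("item-C", "id-003"),
    ("item-D", "id-004"),
]

split_candidates = single_value_violations(statements)
merge_candidates = unique_value_violations(statements)
```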

How Wikidata will be used by Europeana
• Semantic Enrichment of Europeana data with additional information
  - With a specific focus on entities such as persons and concepts
• Linking Europeana objects with Wikidata
  - Approach similar to aintings aintings
  - But would be extended to the whole Europeana dataset
  - Links would be added in the Europeana data
• Structure (data template) for CH objects (e.g. paintings) is still not very rich on Wikidata, e.g. Measurements are not there
  - Improvements are made all the time, but see next

Wikidata Items as Linking Hubs
• Still, they're great as stable URLs
• Providing the basic info (who, when, where, what)
• And acting as coreferencing hubs
• I don't expect Wikidata CH objects to ever be described in the full richness & complexity of professional art research. E.g. see British Museum Mapping to CIDOC CRM

Wikidata and DBpedia
• Wikidata and DBpedia are the two structured representations of Wikipedia
• Wikidata: initially populated from Wikipedia, manually curated, will master structured data for Wikipedia. Synchronized through an assortment of bots
  - Data is fairly accurate but data depth is still small
• DBpedia: automatically extracted from Wikipedia, live update, one-way extraction only
  - Data reach is deep, but there are many problems in ontology and individual mappings, especially for non-English. E.g. United Nations is extracted as "Country". See DBpedia Ontology and Mapping Problems
• Should they be together?

GLAMs should add to Wikipedia or Wikidata!
• EFD project. Swiecenie Koszyczek, "blessing of the baskets", a colorful Polish tradition
• There's no article in pl.wikipedia.org, so we can't relate such artifacts to anything
• Content partner's museum staff have no time to make a proper Wikipedia article
• But adding a Wikidata item is quick & easy
• Appropriate categories (Easter Traditions, Easter-related Foods) will put it in context

Thank you Valentine Charles, Vladimir Alexiev, Hugo Manguinhas,