Semantic & Multilingual Interoperability in Cultural Heritage Information Systems: The Case of Europeana
Vivien Petras
Berlin School of Library and Information Science
14 November 2012
Contents
Dimensions of Semantic & Multilingual Interoperability
Europeana: History
Europeana: Portal
Who is the Europeana User?
Multilingual Interoperability in Europeana
Semantic Interoperability in Europeana
Semantic Enrichment in Europeana
Previewing the new Europeana
Cultural Heritage Information Systems
Collect, store, preserve, organize, search and display cultural heritage objects or their (metadata) representations in a digital environment
Answer the questions who, why, where, how, when and what (Bearman & Trant, 2002)
The answers depend on:
  – the users and their cultural context
  – where the content is coming from
  – how the content is represented
Bearman & Trant (2002). Issues in Structuring Knowledge and Services for Universal Access to Online Science and Culture. Nobel Symposium (NS 120) “Virtual Museums and Public Understanding of Science and Culture”. Stockholm, Sweden.
Interoperability
Aggregate information resources from different information systems
Enable seamless information access by mapping / merging:
  – formats
  – vocabulary
  – types of access
  – result representation
  – forms of interaction
  – (meaning? / context?)
Dimensions of Multilingual Interoperability
Interface
Search
  – Query translation
  – Document translation
Result presentation
Browsing
Dimensions of Semantic Interoperability
Data formats
Metadata content
Content terminology
  – Knowledge Organization Systems
  – Names
User terminology
  – Search vocabulary
  – Technical vocabulary
Europeana
“A digital library that is a single, direct and multilingual access point to the European cultural heritage.”
European Parliament, 27 September 2007
Europeana: History
2005 Google Books & the French
2005 EC: creation of a European Digital Library (i2010 Strategy), digitization
2006 Working group on technical & functional interoperability
2007 EDLnet Functional Specification
Nov 2008 Portal launch
Europeana Portal, 2008
Europeana: History (continued)
Spring 2009 Portal re-launch
2010 Rhine Release
Fall 2011 Re-design
Fall 2012 Open Data
2013 Re-launch
Europeana Today
23.5 million objects
  – 14.5 million images
  – 8.4 million textual objects
  – 400,000 sound files
  – 200,000 video files
More than 2,200 institutions
33 countries
Who is the Europeana User?
European citizens
  – EU = 27 member states, ca. half a billion people
  – 23 official EU languages
  – 60 regional / minority languages
Age: 20% children/young people (0-19 yrs), 60% adults, 20% retired
Education: 30% basic school education, 45% high school diploma, 25% college degree
Who is a Cultural Heritage User?
Cultural heritage professionals organizing or providing content
Cultural heritage professionals producing or selling content
Cultural heritage professionals creating content
Educational users studying or teaching objects or cultural heritage in general
Tourist users interested in visiting, or providing guidance to, cultural heritage objects or sights
General users interested in culture (the “informed citizen”)
Challenges for User-centered Design
Who are we designing for?
Representing different cultural, political and societal perspectives (on both the producer and the user side) in a multilingually balanced way
One default language is not an option.
Most Europeana objects are language-independent (images), but their metadata is sparse and needs to be translated.
This holds for spoken and technical languages alike!
Multilingual Interoperability in Europeana
Interface translated into 31 languages
Query translation: prototype (EuropeanaConnect)
Query result filtering by language
Document translation (user-enabled)
Semantic data layer
  – Multilingual alignment of controlled vocabularies
  – Multilingual enrichment of metadata
Multilingual Interoperability in Europeana?
Interface translated into 31 languages
  – static content only
  – does not affect search
  – user awareness?
Multilingual Interoperability in Europeana?
Query translation: prototype (EuropeanaConnect)
  – How many languages?
  – How to deal with ambiguities?
  – How much user influence?
  – Which software?
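The ambiguity question can be made concrete with a minimal sketch of dictionary-based query translation. It assumes a hypothetical German-to-English dictionary (`BILINGUAL`) and hypothetical helpers `translate_query` and `ambiguous_terms`; it is not the EuropeanaConnect prototype, whose software and dictionaries are not described here. Keeping every candidate translation, instead of silently picking one, exposes exactly the terms where a user (or a disambiguation step) would have to choose.

```python
# Hypothetical German -> English dictionary (illustrative entries only,
# not EuropeanaConnect data); every candidate translation is kept.
BILINGUAL = {
    ("de", "en"): {
        "strom": ["electricity", "current", "stream"],
        "druck": ["print", "pressure"],
        "kirche": ["church"],
    },
}


def translate_query(query, src, tgt, dictionary=BILINGUAL):
    """Translate a query word by word, keeping all candidate translations.

    Terms missing from the dictionary (often names) are passed through
    unchanged. Returns a list of (source_term, candidates) pairs.
    """
    table = dictionary.get((src, tgt), {})
    return [(term, table.get(term, [term])) for term in query.lower().split()]


def ambiguous_terms(translation):
    """Source terms with more than one candidate: the points where the user
    (or an automatic disambiguation step) has to choose."""
    return [term for term, candidates in translation if len(candidates) > 1]


translation = translate_query("Druck Kirche", "de", "en")
print(translation)  # [('druck', ['print', 'pressure']), ('kirche', ['church'])]
print(ambiguous_terms(translation))  # ['druck']
```

Searching with all candidates raises recall but also admits unintended senses (print vs. pressure); offering the candidates to the user is one possible answer to "How much user influence?".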
Multilingual Interoperability in Europeana?
Query result filtering by language
  – dependent on information in the metadata record
  – language of the record or language of the content?
  – what is “multilingual”?
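A small sketch shows why "language of record or language of content?" matters for filtering. The records and the `metadata_lang` field are hypothetical; `dc:language` is assumed here to describe the language of the object's content. An image has no content language, so a content-based filter drops it even though its metadata is written in German.

```python
# Hypothetical records: "dc:language" holds the language of the object's
# content, "metadata_lang" (an assumed field) the language of the record text.
RECORDS = [
    {"id": "r1", "type": "text", "dc:language": "fr", "metadata_lang": "en"},
    {"id": "r2", "type": "image", "dc:language": None, "metadata_lang": "de"},
    {"id": "r3", "type": "text", "dc:language": "de", "metadata_lang": "de"},
]


def filter_by_language(records, lang, by="content"):
    """Return the ids of records whose content (or metadata) is in `lang`."""
    if by not in ("content", "record"):
        raise ValueError("by must be 'content' or 'record'")
    field = "dc:language" if by == "content" else "metadata_lang"
    return [r["id"] for r in records if r.get(field) == lang]


print(filter_by_language(RECORDS, "de", by="content"))  # ['r3']
print(filter_by_language(RECORDS, "de", by="record"))   # ['r2', 'r3']
```

Neither reading is "correct" on its own: a user filtering for German may want German-language objects, or records they can read, and the filter can only be as good as the language information in the metadata.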
Multilingual Interoperability in Europeana?
Document translation (user-enabled)
  – only after the record has been found
Multilingual Solutions in Europeana?
Semantic data layer
  – Multilingual alignment of controlled vocabularies
  – Multilingual enrichment of metadata
Europeana Semantic Data Layer
Digression: Europeana Data Model
Semantic Alignment
Semantic Interoperability in Europeana
(Diagram: library, archive, museum)
Europeana Semantic Data Layer: bridging “isles of information” by connecting objects from different domains via cross-vocabulary links.
Doerr, M.; Gradmann, S.; Hennicke, S.; Isaac, A.; Van de Sompel, H. (2010). The Europeana Data Model (EDM). 76th IFLA General Conference and Assembly, August 2010, Gothenburg, Sweden.
Europeana Semantic Data Layer = linked metadata records + linked contextual resources (KOS)
Allows seamless structured search across different collections
Allows faceted browsing across different collections
Allows search across different vocabularies
Who does the linking of metadata records?
Who does the mapping of contextual resources?
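Search across different vocabularies can be sketched as query expansion over the multilingual labels of a linked concept. The concept entry, its labels and the functions `expand_query` and `search` are hypothetical illustrations of the idea, not Europeana's search implementation.

```python
# Hypothetical concept with labels in several languages (a stand-in for a
# multilingual KOS entry in the semantic data layer).
CONCEPTS = {
    "concept:castle": {"en": "castle", "de": "burg", "fr": "château", "no": "slott"},
}


def expand_query(term):
    """Return all labels of the concept whose labels include `term`,
    or just the term itself if no concept matches."""
    t = term.lower()
    for labels in CONCEPTS.values():
        values = {label.lower() for label in labels.values()}
        if t in values:
            return sorted(values)
    return [t]


def search(term, records):
    """Ids of records whose text contains the term or any of its expansions."""
    terms = set(expand_query(term))
    return [r["id"] for r in records if terms & set(r["text"].lower().split())]


records = [
    {"id": "1", "text": "Burg Eltz"},
    {"id": "2", "text": "Château de Chambord"},
    {"id": "3", "text": "Portrait of a woman"},
]
print(expand_query("castle"))     # ['burg', 'castle', 'château', 'slott']
print(search("castle", records))  # ['1', '2']
```

An English query thus reaches German and French records; the open questions on the slide (who links, who maps) determine how many concepts such an expansion can actually cover.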
Semantic Alignment of Contextual Resources
(Diagram: an Irish vocabulary and a Norwegian vocabulary linked by a SKOS mapping, skos:exactMatch. Source: Cousins, Jill (2010). Europeana Overview. Europeana Open Cultures Conference, October 2010, Amsterdam.)
Identify and convert relevant semantic resources
Pivot vocabularies for relevant categories (subjects, persons, places…)
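The pivot approach can be sketched with plain (local concept, pivot concept) pairs instead of an RDF library. The concept identifiers are hypothetical. Because the SKOS Reference defines skos:exactMatch as transitive, two local concepts mapped to the same pivot concept can be treated as exact matches of each other, which is what `align_via_pivot` (a hypothetical helper) derives.

```python
from collections import defaultdict

# Hypothetical skos:exactMatch links from an Irish and a Norwegian
# vocabulary to a shared pivot vocabulary (identifiers are illustrative).
EXACT_MATCH = [
    ("ga:leabhar", "pivot:book"),
    ("no:bok", "pivot:book"),
    ("ga:caisleán", "pivot:castle"),
    ("no:slott", "pivot:castle"),
    ("no:fjord", "pivot:fjord"),
]


def align_via_pivot(links):
    """Derive direct cross-vocabulary pairs from links to a pivot.

    skos:exactMatch is transitive, so two local concepts matched to the same
    pivot concept also match each other. Pairs within one vocabulary
    (same prefix) are skipped.
    """
    by_pivot = defaultdict(set)
    for local, pivot in links:
        by_pivot[pivot].add(local)
    pairs = set()
    for members in by_pivot.values():
        for a in members:
            for b in members:
                if a < b and a.split(":")[0] != b.split(":")[0]:
                    pairs.add((a, b))
    return sorted(pairs)


print(align_via_pivot(EXACT_MATCH))
# [('ga:caisleán', 'no:slott'), ('ga:leabhar', 'no:bok')]
```

The usual motivation for a pivot is effort: with n vocabularies, each is mapped once to the pivot (n mappings) instead of to every other vocabulary (n(n−1)/2 mappings). Concepts without a pivot counterpart, like `no:fjord` here, stay unaligned.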
Semantic Alignments of Vocabularies
(Figure: data cloud as developed in EuropeanaConnect, 2011)
How many metadata records are covered?
How successful is the matching?
Is this useful for search?
Semantic Interoperability in Europeana
Semantic similarity: “similar content” function
  – based on textual similarity of metadata (title, subject, description)
Semantic enrichment: add mapped (multilingual) concepts from selected vocabularies to metadata records
  – concept, agent, period, place
  – increase search vocabulary
  – increase semantic interoperability
  – increase multilingual access
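The slide does not say how textual similarity is computed. A minimal sketch, assuming a simple bag-of-words cosine similarity over title, subject and description (not necessarily Europeana's method), could look like this; the records are invented for illustration.

```python
import math
import re
from collections import Counter

FIELDS = ("title", "subject", "description")


def term_vector(record):
    """Bag of lower-cased word tokens from title, subject and description."""
    text = " ".join(record.get(field, "") for field in FIELDS)
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a, b):
    """Cosine similarity of two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in set(a) & set(b))
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def similar_content(record, candidates, k=3):
    """Ids of up to k other records with non-zero similarity, best first."""
    query = term_vector(record)
    scored = [
        (cosine(query, term_vector(c)), c["id"])
        for c in candidates
        if c["id"] != record["id"]
    ]
    scored.sort(key=lambda s: (-s[0], s[1]))
    return [rid for score, rid in scored[:k] if score > 0]


records = [
    {"id": "a", "title": "Castle at night", "subject": "castle", "description": "painting"},
    {"id": "b", "title": "Old castle", "subject": "castle architecture", "description": ""},
    {"id": "c", "title": "Portrait of a woman", "subject": "portrait", "description": "painting"},
    {"id": "d", "title": "Fjord", "subject": "landscape", "description": ""},
    {"id": "e", "title": "Château la nuit", "subject": "château", "description": "peinture"},
]
print(similar_content(records[0], records))  # ['b', 'c']
```

The French record about a castle at night (`e`) scores zero against the English one: purely textual similarity stays within one language, which is one motivation for adding multilingual concepts through semantic enrichment.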
Semantic Similarity in Europeana
Semantic (& Multilingual) Enrichment in Europeana
Vocabulary | Tag type | Enriched metadata fields
GEMET Thesaurus | Concept | dc:subject, dc:type, dcterms:alternative
DBpedia | Agent | dc:contributor, dc:creator
Semium Time Ontology | Period | dc:date, dc:coverage, dcterms:temporal
GeoNames | Place | dc:coverage, dcterms:spatial
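The vocabulary-to-field routing in the enrichment table can be read as configuration. The sketch below assumes exact, case-insensitive label lookup and uses hypothetical label indexes with example.org placeholder URIs; Europeana's actual matching rules are not given on the slide, and `enrich` is a hypothetical helper.

```python
# Routing rules transcribed from the enrichment table:
# (vocabulary, tag type, metadata fields that are looked up in it).
ENRICHMENT_RULES = [
    ("GEMET", "Concept", ["dc:subject", "dc:type", "dcterms:alternative"]),
    ("DBpedia", "Agent", ["dc:contributor", "dc:creator"]),
    ("Semium Time", "Period", ["dc:date", "dc:coverage", "dcterms:temporal"]),
    ("GeoNames", "Place", ["dc:coverage", "dcterms:spatial"]),
]

# Hypothetical label indexes (lower-cased label -> URI); placeholder URIs only.
LABELS = {
    "GEMET": {"forest": "http://example.org/gemet/forest"},
    "DBpedia": {"rembrandt": "http://example.org/dbpedia/Rembrandt"},
    "Semium Time": {"17th century": "http://example.org/time/17th-century"},
    "GeoNames": {"amsterdam": "http://example.org/geonames/Amsterdam"},
}


def enrich(record):
    """Return (field, tag type, URI) triples found by exact label lookup."""
    found = []
    for vocab, tag, fields in ENRICHMENT_RULES:
        for field in fields:
            for value in record.get(field, []):
                uri = LABELS[vocab].get(value.strip().lower())
                if uri:
                    found.append((field, tag, uri))
    return found


record = {
    "dc:subject": ["Forest"],
    "dc:creator": ["Rembrandt"],
    "dc:coverage": ["Amsterdam", "17th century"],
}
for triple in enrich(record):
    print(triple)
```

Note that dc:coverage is checked against both the time and the place vocabulary, so one field can yield enrichments of different types; such overlaps are one place where a value can be matched against the wrong vocabulary.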
Semantic (& Multilingual) Enrichment in Europeana
Poisonous India… and Other Enrichment Problems
Olensky, M., Stiller, J., Dröge, E. (2012). Poisonous India or the Importance of a Semantic and Multilingual Enrichment Strategy. In: Proc. of MTSR 2012: Metadata and Semantics Research Conference, Nov. 2012, Cádiz, Spain.
Study of the enrichments of 200 Europeana metadata records: common errors and their causes
Query: “poison”
Result: Indian movie posters (in Swiss collections)
Reason: India = Inde (French), and inde = poison (Latvian)
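The “poison” case can be reproduced in a few lines. The sketch assumes a hypothetical label index that holds a French place label and a Latvian concept label for the same string; a language-blind lookup returns both, while restricting matches to the language of the metadata keeps only the intended one. This illustrates one possible element of a multilingual enrichment strategy, not the study's actual proposal; both functions are hypothetical.

```python
# Hypothetical multilingual label index: label -> [(label language, concept)].
LABEL_INDEX = {
    "inde": [("fr", "place:India"), ("lv", "concept:poison")],
    "india": [("en", "place:India")],
    "poison": [("en", "concept:poison")],
}


def enrich_any_language(value):
    """Language-blind lookup: every concept with this label in any language."""
    return [concept for _, concept in LABEL_INDEX.get(value.lower(), [])]


def enrich_language_aware(value, record_lang):
    """Only accept labels in the language of the metadata value."""
    return [
        concept
        for lang, concept in LABEL_INDEX.get(value.lower(), [])
        if lang == record_lang
    ]


print(enrich_any_language("Inde"))          # ['place:India', 'concept:poison']
print(enrich_language_aware("Inde", "fr"))  # ['place:India']
```

The language-aware variant only helps when the language of the metadata value is known; when it is not, skipping the enrichment may be safer than guessing.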
Enrichments – Problem Diagnosis
Incorrect metadata (incorrect fields)
Inconsistent name structures
  – e.g. “Bongiorno, Michelangelo, Fr”; “Michelangelo (Buanarrotti)”
Inconsistent date structures
Inconsistent field structure / refinements
Choice of enrichment fields
  – dc:type (?)
Treatment of named entities
Common terms
  – e.g. “history” and its enrichments
Enrichments – Problem Diagnosis
Syntax correct, semantics wrong (context needed)
  – Córdoba = Spain | Argentina
  – Daniel Richter = French trade unionist | German artist
Non-domain-specific enrichment vocabulary (GEMET)
  – print → Druck (German) → pressure
Cross-lingual ambiguity (false friends)
  – electrical power → Strom (German); strom (Czech) = tree
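Context-based disambiguation can be sketched for the Córdoba case: keep a gazetteer candidate only if its country also appears in the record's other fields, and otherwise leave the value unenriched. The gazetteer entries and function names are hypothetical, and matching on a country name is a deliberately crude stand-in for real context modelling. Diacritics are stripped before lookup, a small instance of normalization before enrichment.

```python
import unicodedata


def normalize(text):
    """Lower-case and strip diacritics, so 'Córdoba' and 'Cordoba' compare equal."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch)).lower()


# Hypothetical gazetteer entries (not actual GeoNames records).
GAZETTEER = {
    "cordoba": [
        {"uri": "place:Cordoba_ES", "country": "Spain"},
        {"uri": "place:Cordoba_AR", "country": "Argentina"},
    ],
}


def disambiguate_place(value, context_terms):
    """Pick the candidate whose country appears in the record's other fields.

    Returns None when no candidate or more than one candidate fits, i.e.
    the enrichment is skipped rather than guessed.
    """
    candidates = GAZETTEER.get(normalize(value), [])
    context = {normalize(term) for term in context_terms}
    matches = [c for c in candidates if normalize(c["country"]) in context]
    return matches[0]["uri"] if len(matches) == 1 else None


print(disambiguate_place("Córdoba", ["Mezquita", "Spain", "photograph"]))  # place:Cordoba_ES
print(disambiguate_place("Córdoba", ["photograph"]))                       # None
```

Returning nothing in the ambiguous case trades recall for precision: fewer records are enriched, but a wrong link (Córdoba, Argentina, on a photo of the Spanish city) is avoided.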
Enrichment – Problem Areas
Records
  – metadata quality / structure
  – data cleaning / normalization before enrichment
Vocabulary
  – domain specificity, appropriateness
  – language ambiguity
  – scope of enrichment
Workflow
  – named entities
  – matching rules
Unsolved problem at this scale!
Previewing the new Europeana
Re-launch with an EDM-based metadata structure
Improved interface
Improved mobile access
More structure in search
Dynamic query suggestions
More structure in result representation
(No automatic enrichments?)
Previewing the new Europeana
Result List
Single View
EDM in Action
Summary
Major efforts have gone into improving information access in Europeana.
Many challenges still remain.
The dynamic growth of Europeana requires dynamic solutions.
  – What are the consequences of opening the data?
  – What are the consequences of moving to EDM?
  – Can collaborative features (collective intelligence) be the answer?
The good news: plenty of work for us!
Questions, comments, suggestions?