Antoine Isaac (Europeana – VU University Amsterdam), Dagstuhl Multilingual Semantic Web seminar
Europeana: 24 M objects (images, text, sound and video) from over … libraries, museums and archives in 33 countries, for everyone. “A digital library that is a single, direct and multilingual access point to the European cultural heritage.” (European Parliament)
Multilingual Access in Europeana
Dimensions of multilingual access: interface; search (query translation or document translation); result presentation; browsing
Europeana's efforts: interface translated into 26 languages; query translation (prototype only); query result filtering by country/language; document translation (user-enabled); semantic contextualization of objects; multilingual enrichment/annotation of metadata
Making metadata work for multilingual access
Current metadata in Europeana: simple, flat object records (text values) without language tags! The only language-related information is at collection level, and it can be "mul" (multiple languages). This needs to change: a new Europeana Data Model (EDM)
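To illustrate why per-value language tags matter, here is a minimal sketch in plain Python (not EDM itself, and not Europeana code). It assumes a hypothetical record layout where each value can carry a language tag, as RDF literals allow; with the flat layout there is no way to pick out the English title.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Literal:
    """A text value with an optional language tag (None means untagged)."""
    value: str
    lang: Optional[str] = None


# Hypothetical flat record: plain strings, their language is unknown.
flat_record = {"title": ["Tafelklavier", "Square pianoforte"]}

# The same hypothetical record with a language tag on every value.
tagged_record = {
    "title": [Literal("Tafelklavier", "de"), Literal("Square pianoforte", "en")]
}


def values_in_language(record, field, lang):
    """Return the values of `field` whose language tag equals `lang`."""
    return [v.value for v in record.get(field, []) if v.lang == lang]


print(values_in_language(tagged_record, "title", "en"))  # ['Square pianoforte']
```

A collection-level tag such as "mul" cannot support this selection, because it applies to every value at once.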
A "semantic layer" of contextual resources (concepts, persons, places, events...) around networked objects, e.g. concepts such as Cultural artefact, Painting, Sculpture, Building. Exploiting semantic relations, e.g. “broader concept”, “place of birth”, “involved person”…
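One way to exploit a "broader concept" relation is query expansion: a search for a general concept also returns objects annotated with narrower ones. The sketch below assumes a hypothetical hierarchy in which Painting, Sculpture and Building have Cultural artefact as broader concept; the data and function names are illustrative only.

```python
from collections import deque

# Hypothetical "broader concept" links (concept -> broader concepts).
BROADER = {
    "Painting": ["Cultural artefact"],
    "Sculpture": ["Cultural artefact"],
    "Building": ["Cultural artefact"],
}


def narrower_closure(concept, broader=BROADER):
    """Return `concept` plus every concept that has it as a (transitive) broader concept."""
    # Invert the concept -> broader map into broader -> narrower.
    narrower = {}
    for child, parents in broader.items():
        for parent in parents:
            narrower.setdefault(parent, []).append(child)
    seen = {concept}
    queue = deque([concept])
    while queue:  # breadth-first walk down the hierarchy
        for child in narrower.get(queue.popleft(), []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


# Hypothetical objects, each annotated with one concept.
objects = {"obj1": "Painting", "obj2": "Building", "obj3": "Musical instrument"}
expanded = narrower_closure("Cultural artefact")
hits = sorted(o for o, c in objects.items() if c in expanded)
print(hits)  # ['obj1', 'obj2']
```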
Multilingual metadata
Fetching already available linked data, e.g., from libraries
Interoperability: encouraging the use of RDF and of common, simple data elements
Interoperability: encouraging the use of common and simple data elements, e.g. one concept with labels in several languages: Piano carré, Pianoforte a tavolino, Square pianoforte, Tafelklavier, Tafelpiano, Taffel Pianofortes
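The square piano example shows the payoff of one shared concept with many labels: a query term in one language reaches the concept, which can then be displayed in another. A minimal sketch, assuming a hypothetical in-memory concept store loosely modelled on SKOS preferred labels; the language tags attached to the labels, the concept identifier and the English fallback are all illustrative assumptions.

```python
# Hypothetical concept with language-tagged labels (tags are assumptions).
CONCEPTS = {
    "concept:square_piano": {
        "fr": "Piano carré",
        "it": "Pianoforte a tavolino",
        "en": "Square pianoforte",
        "de": "Tafelklavier",
        "nl": "Tafelpiano",
    }
}


def find_concepts(term):
    """Return ids of concepts having `term` as a label in any language (case-insensitive)."""
    needle = term.casefold()
    return [cid for cid, labels in CONCEPTS.items()
            if any(label.casefold() == needle for label in labels.values())]


def label(concept_id, lang, fallback="en"):
    """Return the label in `lang`, or the `fallback` language label if it is missing."""
    labels = CONCEPTS[concept_id]
    return labels.get(lang, labels.get(fallback))


# A German query term reaches the concept, which is then shown in French.
cid = find_concepts("tafelklavier")[0]
print(label(cid, "fr"))  # Piano carré
```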
Interoperability: eligible contextual resources are of mixed nature (dictionaries, synonym/translation lists, thesauri, authority lists, gazetteers…), hence an interplay between “semantic” data and multilingual data
Simultaneous approaches: getting richer semantic/multilingual metadata from providers; fetching third-party contextual data and linking it to “uncontextualized” objects; linking contextual data from one institution to another, more general or more commonly used contextual dataset (DBpedia.org, VIAF.org …)
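Linking an institution's contextual data to a more general dataset lets local records inherit labels in languages the institution never provided. The sketch below assumes hypothetical identifiers and data: "hub" stands in for a dataset such as DBpedia or VIAF, and the links play the role of equivalence (owl:sameAs-style) statements. It is not how either service actually exposes its data.

```python
# Hypothetical local authority record (labels in one language only).
LOCAL = {"museumA:person/42": {"fr": "Léonard de Vinci"}}

# Hypothetical record in a more general dataset (identifiers are made up).
HUB = {"hub:person/leonardo": {"en": "Leonardo da Vinci", "it": "Leonardo da Vinci"}}

# Hypothetical equivalence links from local records to hub records.
LINKS = {"museumA:person/42": "hub:person/leonardo"}


def merged_labels(local_id):
    """Combine hub labels with local ones; local labels win on conflict."""
    labels = dict(HUB.get(LINKS.get(local_id), {}))
    labels.update(LOCAL.get(local_id, {}))
    return labels


print(merged_labels("museumA:person/42"))
# {'en': 'Leonardo da Vinci', 'it': 'Leonardo da Vinci', 'fr': 'Léonard de Vinci'}
```

Letting local labels override hub labels is one possible policy; a provider might prefer the opposite for languages it does not curate.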
Status and challenges
Current status: all this is work in progress and will take time. R&D prototypes (EuropeanaConnect) show the challenges of gathering appropriate multilingual tools and data. First tests of simple techniques in the production portal: GeoNames (places) and GEMET (concepts). Encouraging, but they illustrate the issues with overly naïve approaches (no NLP) and incomplete data, e.g. “Cheval”, “Poison”
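The slide does not detail the “Cheval” and “Poison” cases, so the following only shows, with made-up data, how a context-free lookup of metadata tokens against a gazetteer can misfire. The gazetteer, its identifiers and the sample title are hypothetical, not GeoNames content.

```python
import re

# Hypothetical gazetteer: place name -> identifier (not real GeoNames data).
GAZETTEER = {"paris": "place:1", "cheval": "place:2"}


def naive_place_enrichment(text):
    """Tag every token that matches a gazetteer entry, ignoring context (no NLP)."""
    tokens = re.findall(r"\w+", text.casefold())
    return sorted({GAZETTEER[t] for t in tokens if t in GAZETTEER})


# French title meaning "Head of a horse, Paris": "cheval" is a common noun
# here, but the naive matcher still links it to a place.
print(naive_place_enrichment("Tête de cheval, Paris"))  # ['place:1', 'place:2']
```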
Problems & requirements. For providers and Europeana: continue work on metadata; benchmarking (cf. the CHiC lab at CLEF); positioning as consumers and contributors of data (cf. Asun’s slides; data.europeana.eu). For language-intensive tools and resources: availability (open resources); interoperability; simplicity, but not always (e.g., not only “first hit” translations); scale (scalability of tools, number and scope of datasets); many languages, some lesser-resourced (compared with English)
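Why “first hit” translations fall short can be shown on query translation with an ambiguous term. A minimal sketch, assuming a hypothetical English-to-German dictionary and a search engine that accepts Boolean queries; keeping all candidates recovers "Ufer" (river bank), which a first-hit strategy drops.

```python
# Hypothetical English -> German dictionary; candidates in ranked order.
EN_DE = {
    "bank": ["Bank", "Ufer"],  # financial institution / bench vs. river bank
    "painting": ["Gemälde", "Malerei"],
}


def translate_query(terms, dictionary, first_hit_only=False):
    """Translate each term; keep every candidate unless first_hit_only is set.

    Terms missing from the dictionary are kept as-is.
    Returns one list of alternatives per query term.
    """
    result = []
    for term in terms:
        candidates = dictionary.get(term.lower(), [term])
        result.append(candidates[:1] if first_hit_only else list(candidates))
    return result


def to_boolean_query(alternatives):
    """OR the alternatives of each term, AND across terms."""
    return " AND ".join("(" + " OR ".join(alts) + ")" for alts in alternatives)


query = ["river", "bank"]
print(to_boolean_query(translate_query(query, EN_DE)))  # (river) AND (Bank OR Ufer)
print(to_boolean_query(translate_query(query, EN_DE, first_hit_only=True)))  # (river) AND (Bank)
```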
Another illustration: the VOICES project. Something entirely different, but not completely unrelated. VOICES: Voice-based community-centric mobile services for social development. Easing communication on agricultural trade: listing of products/prices via phone/radio; pilot in Mali. Challenges: a data-centric project, but language technology plays a crucial role; objects should be provided with textual and audio labels (text-to-speech system) in different languages, including local languages, e.g., Bambara; lack of resources: need for low-cost, easy-to-adapt solutions. Victor de Boer, VU Amsterdam
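The requirement that every object carry both a textual and an audio label per language can be checked mechanically. A sketch under assumptions: the data classes, the product, the French label and the audio path are hypothetical; "bm" is the ISO 639-1 code for Bambara.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Label:
    text: Optional[str] = None        # written label
    audio_file: Optional[str] = None  # recorded or text-to-speech audio


@dataclass
class Product:
    product_id: str
    labels: Dict[str, Label] = field(default_factory=dict)  # language code -> label


def missing_labels(product, languages):
    """List (language, part) pairs still lacking a text or audio label."""
    gaps = []
    for lang in languages:
        label = product.labels.get(lang, Label())
        if not label.text:
            gaps.append((lang, "text"))
        if not label.audio_file:
            gaps.append((lang, "audio"))
    return gaps


# Hypothetical product with French labels only.
millet = Product("millet", {"fr": Label("mil", "audio/mil_fr.wav")})
print(missing_labels(millet, ["fr", "bm"]))  # [('bm', 'text'), ('bm', 'audio')]
```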
Thank you. Some slides are based on Marlies Olensky and Juliane Stiller, Multilingual Web Workshop, June 11, 2012, Dublin