European Lexicographic Infrastructure

Presentation transcript:

ELEXIS: European Lexicographic Infrastructure
Call: Integrating Activities for Starting Communities (INFRAIA-02-2017)

Call: INFRAIA-02-2017
Types of action: RIA (Research and Innovation action)
Deadline model: two-stage
1st stage deadline: 30 March 2016 17:00:00
2nd stage deadline: 29 March 2017
Work Programme part: European Research Infrastructures (including e-Infrastructures)
Duration: 4 years
Budget: 5M EUR
Results: 3 months (1st stage)

Abstract – access, bridge the gap
The project proposes to integrate, extend and harmonise national and regional efforts in the field of lexicography, both modern and historical. The goal is a sustainable infrastructure that enables efficient access to high-quality lexical data in the digital age and bridges the gap between more advanced and lesser-resourced scholarly communities working on lexicographic resources. The need for such an infrastructure has clearly emerged from the lexicographic community within the European Network of e-Lexicography COST Action, which will end in 2017.

NLP, LOD, SW
Current lexicographic resources, both modern and historical, have different levels of structuring and are not equally suitable for application in other fields, e.g. Natural Language Processing. The project will develop strategies, tools and standards for extracting, structuring and linking lexicographic resources to unlock their full potential for Linked Open Data and the Semantic Web, as well as in the context of digital humanities. The project will help researchers create, access, share, link, analyse and interpret heterogeneous lexicographic data across national borders, paving the way for ambitious, transnational, data-driven advancements in the field, while significantly reducing duplication of effort across disciplinary boundaries.

Consortium, CLARIN, DARIAH
ELEXIS will be carried out by a balanced, geographically distributed consortium. It is composed of content-holding institutions and researchers with complementary backgrounds (lexicography, digital humanities, language technology and standardisation), a crucial feature required to address the multi-disciplinary objectives of the project. In cooperation with CLARIN and DARIAH, it will focus on defining and providing common interoperability standards, workflows, conceptual models and data services, as well as training and education activities focusing on user needs and cross-disciplinary fertilisation.

Impact: Efficient access and user needs
Lexicographic communities working on scholarly and language standardisation dictionaries will gain access to modern and retrodigitised historical lexicographic data, large amounts of linked and integrated semantic data and extracted structured data from text corpora and multimodal resources, online training materials in DARIAH services, and networking events (meetings, conferences, workshops);
computational linguistics and language resources communities will gain access to currently inaccessible data from quality lexicographic resources and interlinked semantic data, as well as extracted data from corpora and multimodal resources;
digital humanities communities will gain simplified and efficient access to modern and historical lexicographic resources as cultural and historical artefacts, supporting research in a wide range of humanities disciplines such as history, religion, gender studies, literature and education.

Impact: Inter-infrastructure synergies and optimisation
Currently isolated European language infrastructures working on lexical description of individual languages in national language institutes and standardisation bodies will be joined in one pan-European infrastructure;
close links and synergies will be established between CLARIN and DARIAH, with ELEXIS working on top of existing services provided by both as a new user community.

Impact: Innovation and industry
Industrial partners in ELEXIS will be able to take the role of intermediaries between research and industry in language technology and language learning, as well as lexicography and lexical content publishing in general. Industry interest is visible from the participating partners and from the letter of interest by an important stakeholder in the field;
information from quality lexicographic resources and interlinked semantic data will be opened up and made available for use in commercial scenarios, based on ELEXIS work on the IPR issues currently hindering the accessibility of the data;
lexicographic data will be evaluated by an industry-supported data seal of approval.

Impact: Research and education
Online training courses on innovative e-lexicography with suggested ECTS, produced by education partners (from universities), will be incorporated into existing curricula;
language teaching and language learning communities will be able to develop and use new, improved training materials, based on the (open) access to lexica interlinked on a large scale.

Impact: Cross-disciplinary fertilisations, academia and industry
Both computational linguistics and lexicography will be able to achieve a higher level of language description and text processing in a virtuous cycle of cross-disciplinary exchange of knowledge and data;
research or study of lexica in linguistic studies and related disciplines will be enabled by massive interlinking of previously isolated lexicographic resources, which can lead to new discoveries, particularly in the semantic domain;
in humanities disciplines, such as history, religion, gender studies, literature and education, new resources and services can be used for cross-lingual studies, based on interlinked and integrated semantic data;
artificial intelligence systems will be able to make use of lexicographic data in repositories, interlinked semantic data and extracted data from multilingual and multimodal resources.

Impact: Integration of knowledge-based resources 1
Previously non-integrated modern and historical lexicographic resources, available as isolated incompatible data, will be linked, integrated and enriched on different levels. A scalable, multilingual and multifunctional language resource will be created by:
➔ linking resources: this means providing links between different elements of dictionary entries (lemmas/headwords, senses, definitions, multi-word expressions, etymologies, etc.), enabling any dictionary (element) to be linked with all other dictionaries (or dictionary elements).
Result: a growing network of existing dictionaries linked across common concepts via a huge (multilingual) index.
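The linking step can be pictured as an index keyed by a shared concept. The following is only a minimal sketch of that idea, not the ELEXIS design: the `Entry` fields, the dictionary names and the concept identifiers are all hypothetical placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    """One dictionary element (hypothetical structure, for illustration only)."""
    dictionary: str  # placeholder dictionary name
    lemma: str       # headword
    concept: str     # placeholder ID of the shared concept the sense points to


def build_index(entries):
    """Group entries from all dictionaries under their shared concept ID."""
    index = {}
    for entry in entries:
        index.setdefault(entry.concept, []).append(entry)
    return index


def linked_entries(index, entry):
    """Return every other entry that shares a concept with the given entry."""
    return [other for other in index.get(entry.concept, []) if other != entry]


# Invented sample data: two dictionaries, one shared concept.
entries = [
    Entry("dict_a", "lemma_1", "concept:001"),
    Entry("dict_b", "lemma_2", "concept:001"),
    Entry("dict_a", "lemma_3", "concept:002"),
]
index = build_index(entries)

for other in linked_entries(index, entries[0]):
    print(other.dictionary, other.lemma)
```

Adding a dictionary only means adding its entries to the index, which is one way the network of linked dictionaries could keep growing without pairwise links between every two resources.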

Impact: Integration of knowledge-based resources 2
➔ integrating resources: this means taking information from individual resources and putting it together in a new resource, or aligning them to create a combined resource.
Result: any combination of existing (linked) resources resulting in a new resource available for immediate use or as a starting point for creating a novel individual lexicographic resource.
Example: http://www.dwds.de/ (cf. Geyken, Vossen): DWDSWB, WDG and EtymWB are completely aligned, and the alignment is actively exploited on the project’s website to achieve a synchronised display of equivalent lexical entries. This makes it possible to use EtymWB as an etymological extension to the synchronous view of the present-day dictionaries.
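As a rough illustration of what aligning resources into a combined resource can mean, the sketch below merges entries that share a headword. It is an assumption-laden toy, not how DWDS or ELEXIS implement alignment: the resource names, headwords and entry texts are placeholders, and real alignment would work at sense level rather than on identical headwords.

```python
def integrate(resources):
    """Merge several resources into one combined resource.

    `resources` maps a resource name to a dict of headword -> entry text.
    The result maps each headword to the entries found for it, keyed by
    the resource they came from.
    """
    combined = {}
    for name, entries in resources.items():
        for headword, text in entries.items():
            combined.setdefault(headword, {})[name] = text
    return combined


# Placeholder data: a present-day dictionary and an etymological one.
resources = {
    "present_day": {"word_1": "definition of word_1"},
    "etymological": {
        "word_1": "etymology of word_1",
        "word_2": "etymology of word_2",
    },
}
combined = integrate(resources)

# A synchronised view of one headword across both resources.
print(combined["word_1"])
```

Keeping the source name on every merged field preserves provenance, so the combined resource can still show which dictionary contributed which piece of information.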

Impact: Integration of knowledge-based resources 3
➔ enriching resources with multimodal data (image, sound, video) and unstructured text (corpora, news feeds, social media, etc.).
Result: a portal with cross-lingual, cross-media information on word usage.
Example: the X-LIKE (http://www.xlike.org/, http://eventregistry.org/) and X-LIME (http://xlime.eu/) projects, which work on extracting knowledge from different media channels and languages and relating it to cross-lingual, cross-media knowledge bases. In the case of ELEXIS, “knowledge” is focused on lexicographic description.

Impact: Integration of knowledge-based resources 4
Ultimate goal: a universal (integrated and enriched) registry/network of semantic relations used as a semantic intermediary language for global knowledge exchange, focused on difficult polysemous vocabulary (single-word and multi-word), modern and historical; the realisation of a universal lexicographic metastructure; a matrix dictionary spanning languages and time.

Virtuous cycle of e-lexicography

Consortium 1
Providers of lexicographic data and expertise:
Institute for Dutch Language, Leiden, Netherlands (Carole Tiberius)
Danish Society for Language and Literature, Copenhagen, Denmark (Lars Trap-Jensen)
K Dictionaries, Tel Aviv, Israel (Ilan Kernerman)
Providers of lexicographic data and computational linguistics expertise:
Institute for Bulgarian Language “Prof. Lyubomir Andreychin”, Sofia, Bulgaria (Svetla Koeva, Diana Blagoeva)
Research Institute for Linguistics, Hungarian Academy of Sciences, Budapest, Hungary (Tamas Varadi)
Institute for Computational Linguistics “A. Zampolli”, Pisa, Italy (Monica Monachini)

Consortium 2
Standardisation partners:
Faculty of Social Sciences and Humanities, Universidade Nova de Lisboa, Lisbon, Portugal (Rute Costa)
Digital humanities, training and outreach partners:
Austrian Academy of Sciences, Austrian Centre for Digital Humanities, Vienna, Austria (Eveline Wandl-Vogt)
Belgrade Center for Digital Humanities, Belgrade, Serbia (Toma Tasovac)

Consortium 3
Technology partners:
Artificial Intelligence Laboratory, Jožef Stefan Institute, Ljubljana, Slovenia (Simon Krek, Marko Grobelnik, Tomaž Erjavec, Iztok Kosem), leading partner
Sapienza University of Rome, Italy (Roberto Navigli)
Insight Centre for Data Analytics, National University of Ireland, Galway, Ireland (John McCrae, Paul Buitelaar)
University of Saarland, Saarbrücken, Germany (Thierry Declerck)
Lexical Computing CZ s.r.o., Brno, Czech Republic (Miloš Jakubiček)

Questions
simon.krek@ijs.si
simon.krek@guest.arnes.si