WG3: Innovative e-dictionaries Simon Krek „Jožef Stefan“ Institute, Ljubljana, Slovenia Carole Tiberius Institute of Dutch Lexicology, Leiden, the Netherlands.



Programme
11:15-11:35 INFO & PRACTICALITIES
11:35-12:15 WORK PLAN & TIME-TABLE
12:15-12:40 TASKS FOR BOLZANO
12:40-12:50 THE LORENTZ CENTER
12:50-13:00 AOB AND CLOSING

PRACTICALITIES
– short introduction and presentation of the chair and vice-chair
– overview of countries (and dictionaries) represented in WG3
– topics - what do we mean by an innovative e-dictionary in WG3?
– sharing tasks
– e-publications

WG3 chair – Simon Krek
employment
– DZS Publishing House, dictionary editor
– Faculty of Arts, Uni-Ljubljana
– Amebis, d.o.o., Kamnik
– 2007- Jožef Stefan Institute
– 2013- Faculty of Social Sciences, Uni-Ljubljana
projects
– The Oxford®-DZS Comprehensive English-Slovenian Dictionary, editor-in-chief
– FIDA Corpus, coordinator
– FidaPLUS Corpus, coordinator
– Communication in Slovene, coordinator

Communication in Slovene project ( )

WG3 vice-chair – Carole Tiberius
– 1992 degree in translation (Russian-French), Antwerp, BE
– 1995 MA in computational linguistics, Nijmegen University, NL
– 2001 PhD in Multilingual Lexical Knowledge Representation, Brighton University, UK
– Research fellow, Surrey Morphology Group, Surrey University, UK
– Computational linguist (ANW, Taalportaal), Instituut voor Nederlandse Lexicologie (INL)

Working group 3 WG3 Innovative e-dictionaries: This WG will coordinate the development of born-digital dictionaries, focusing on the latest developments in e-lexicography and the interface between lexicography and computational linguistics.

General background (c) In the past few years, innovative electronic dictionaries have been created that no longer resemble traditional paper dictionaries but try to fully exploit the new possibilities of the digital medium.

General background ctd. Though serious attempts have already been made at embedding electronic lexicography into a theoretical framework, a new research paradigm and common standards for electronic lexicography are still lacking. Also lacking are common standards and cooperation for interlinking the content of digitized dictionaries and innovative e-dictionaries.

Scientific focus
(b) mapping current and possible future trends for the creation of born-digital dictionaries, focusing on the latest developments in e-lexicography and the interface between lexicography and computational linguistics
(d) exploring the possibilities of extensive linking of dictionary content from different European languages

Other WGs In this WG, requirements from WG1 dealing with linking information between dictionaries and with the user interface will be taken into account. Interaction will also take place with WG4 to be able to take into account the new insights into the lexicographical description of the vocabularies of the different European languages.

WORK PLAN & TIME-TABLE
– topics (from the original proposal)
– meetings (6) – results – outputs
– training school (year 3)

Topics
1. Description of the workflow for corpus-based lexicography
2. Overview of existing software needed in this workflow
3. Dictionary Writing Systems (and Corpus Query Systems)
4. Analysis of the possible impact of automatic acquisition of lexical data (distributional thesauri etc.)
5. Analysis of the interface between dictionaries and computational lexica (cf. wordnets) and syntactically and semantically annotated corpora (FrameNet, SemCor, Senseval)
6. Investigation of possible use of dictionary content for computational linguistic applications

July 2014
Workflow of corpus-based lexicography; software to support the lexicographical workflow (DWS and CQS, also backup, version control etc.)
responsibility:
– Carole Tiberius
result:
– better understanding of the workflow (including an overview of the software necessary for a smooth workflow), which results in better planning of future projects

January 2015
Software to support the lexicographical workflow: DWS and CQS
responsibility:
– Simon Krek
result:
– description of DWSs and in particular the newly developed (web) applications for querying corpora

July 2015
Automatic acquisition of lexical data and its impact (what works, what doesn’t work – example sentences, collocations, neologisms, definitions, word senses)
responsibility:
– Carole Tiberius
result:
– exploring the possibility of automating particular tasks within corpus-based lexicography as support for lexicographers and the lexicographical workflow

January 2016
Between Corpora and Dictionaries – analysis of the interface between dictionaries and computational lexica and corpora
responsibility:
– Simon Krek
result:
– exploring the possibility of collecting lexically and semantically organized data in a completely automated process, where the data could be used for immediate visualization for human users interested in the lexical behaviour of words

July 2016
The use of lexicographical data in computational linguistics – investigation of possible use of dictionary content for computational linguistic applications
responsibility:
– ?
result:
– better understanding of the computational linguistics community's need for lexicographically organized data and vice versa

Other topics
– presentation, layout, design issues of e-dictionaries as well as access routes?
– which other topics are we missing?
– is the proposed order of the topics OK?