
Survey – WG3 ENeL: Automatic Knowledge Acquisition for Lexicography
Carole Tiberius, Institute for Dutch Lexicology, Leiden, the Netherlands
Kris Heylen, University of Leuven, Belgium
Simon Krek, „Jožef Stefan“ Institute, Ljubljana, Slovenia

Purpose of the survey
Create an inventory of the different types of automatic knowledge acquisition which are currently used within the framework of lexicographical projects.

Automatic Acquisition of Knowledge
Knowledge (data) that is automatically obtained from corpora of authentic language use (both synchronic and diachronic). It either forms the input for lexicographers (who further inspect and edit the data) or is included as is in the published dictionary (possibly marked as knowledge that has been automatically derived from corpus data).

Types of Automatically Acquired Knowledge
- (Candidate) lemma list
- Overall lemma frequency information
- Form variation (e.g. irregular morphology, orthographic variants)
- Example sentences (cf. Vienna COST workshop)
- Multiword expressions (i.e. sequences of words with some unpredictable properties such as "to count somebody in" or "to take a haircut", ranging from collocations, phrasal verbs and (pragmatic) frozen expressions (e.g. of course, good morning) to traditional idioms, proverbs etc.)

Types of Automatically Acquired Knowledge …
- Neologisms
- Definitions
- Translation equivalents
- Knowledge rich contexts (i.e. in terminography, a sort of hybrid of a good example and a definition, illustrating the meaning characteristics of a term without being a formal definition)
- Lexical-semantic relations (e.g. synonyms, antonyms, hypernyms)
- Word senses
- Grammatical patterns (e.g. word profiles, valency)
- Linguistic labels (domain / region / dialect / register / style / time / slang and jargon / attitude / offensive terms)

General
Web address:
Questions: 134 (variables: 134)
Pages: 18
Completed: 45
Partially completed: 6
Total valid: 51
All units in database: 196
First entry: , Last entry:

Coverage

Positions
- lexicographer
- researcher
- software developer
- computational linguist
- NLP researcher
- terminologist
- (associate) professor
- project manager/director
- PhD student

Automatic Knowledge Acquisition
Q: Do you or your institution use a form of automatic knowledge acquisition within a lexicographic project(s)?
Answers and frequencies: 1 (YES): 36; 2 (NO): 14; Valid: 50

Other types of AKA
- Prioritizing lemmas (Denmark - Society for Danish Language and Literature)
- Word formation information (France - Université de Franche-Comté, Besançon)
- Semantic relations (France - Université de Franche-Comté, Besançon)
- Termhood probability (France - Université de Franche-Comté, Besançon)
- Selectional preferences (Hungary - Research Institute for Linguistics of the Hungarian Academy of Sciences)
- Discourse markers (Denmark / France - Aarhus University, Business and Social Sciences, Department of Business Communication; Université de Bourgogne, Maison des Sciences de l'Homme)

Meeting Programme: herstmonceux-2015/

Q: Do you or your institution use automatic knowledge acquisition for generating a candidate lemma list?
Answers and frequencies: 1 (YES): 23; 2 (NO): 9; Valid: 32

Q: Do you or your institution use automatic knowledge acquisition to extract frequency information, e.g. overall lemma frequency information?
Answers and frequencies: 1 (YES): 23; 2 (NO): 9; Valid: 32

Q: Do you or your institution use automatic knowledge acquisition to extract information on form variation, e.g. irregular morphology, orthographic variants?
Answers and frequencies: 1 (YES): 11; 2 (NO): 21; Valid: 32

Q: Do you or your institution use automatic knowledge acquisition to extract example sentences (cf. Vienna workshop)?
Answers and frequencies: 1 (YES): 18; 2 (NO): 13; Valid: 31

Q: Do you or your institution use automatic knowledge acquisition to extract multiword expressions (i.e. sequences of words with some unpredictable properties such as "to count somebody in" or "to take a haircut", ranging from collocations, phrasal verbs and (pragmatic) frozen expressions (e.g. of course, good morning) to traditional idioms, proverbs etc.)?
Answers and frequencies: 1 (YES): 14; 2 (NO): 15; Valid: 29

Q: Do you or your institution use automatic knowledge acquisition to extract neologisms?
Answers and frequencies: 1 (YES): 10; 2 (NO): 19; Valid: 29

Q: Do you or your institution use automatic knowledge acquisition to extract definitions?
Answers and frequencies: 1 (YES): 5; 2 (NO): 23; Valid: 28

Q: Do you or your institution use automatic knowledge acquisition to extract translation equivalents?
Answers and frequencies: 1 (YES): 8; 2 (NO): 19; Valid: 27

Q: Do you or your institution use automatic knowledge acquisition to extract knowledge rich contexts (i.e. a sort of hybrid of a good dictionary example and a definition, in the sense that it is extracted from a corpus and illustrates the meaning of a term, but is not a formal definition)?
Answers and frequencies: 1 (YES): 2; 2 (NO): 26; Valid: 28

Q: Do you or your institution use automatic knowledge acquisition to extract lexical-semantic relations (e.g. synonyms, antonyms, hypernyms)?
Answers and frequencies: 1 (YES): 7; 2 (NO): 21; Valid: 28

Q: Do you or your institution use automatic knowledge acquisition to extract word senses?
Answers and frequencies: 1 (YES): 7; 2 (NO): 21; Valid: 28

Q: Do you or your institution use automatic knowledge acquisition to extract grammatical patterns (e.g. word profiles, valency)?
Answers and frequencies: 1 (YES): 16; 2 (NO): 12; Valid: 28

Q: Do you or your institution use automatic knowledge acquisition to extract linguistic labels (e.g. domain / region / dialect / register / style / time / slang and jargon / attitude / offensive terms)?
Answers and frequencies: 1 (YES): 7; 2 (NO): 21; Valid: 28
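
Because the number of valid answers differs per question (27 to 32), the YES counts reported for the individual types are easier to compare as shares of valid answers. The following is a minimal Python sketch of that arithmetic, using only the YES/NO frequencies quoted on the slides; the names `counts`, `yes_share` and `ranked_shares` are illustrative and not part of the survey.

```python
# YES/NO frequencies copied from the survey slides; valid = yes + no.
counts = {
    "Candidate lemma list": (23, 9),
    "Frequency information": (23, 9),
    "Form variation": (11, 21),
    "Example sentences": (18, 13),
    "Multiword expressions": (14, 15),
    "Neologisms": (10, 19),
    "Definitions": (5, 23),
    "Translation equivalents": (8, 19),
    "Knowledge rich contexts": (2, 26),
    "Lexical-semantic relations": (7, 21),
    "Word senses": (7, 21),
    "Grammatical patterns": (16, 12),
    "Linguistic labels": (7, 21),
}


def yes_share(yes, no):
    """Percentage of valid answers that were YES."""
    return 100.0 * yes / (yes + no)


def ranked_shares(table):
    """Return (type, yes, valid, share) rows, highest YES share first."""
    rows = [
        (name, yes, yes + no, yes_share(yes, no))
        for name, (yes, no) in table.items()
    ]
    return sorted(rows, key=lambda row: row[3], reverse=True)


if __name__ == "__main__":
    for name, yes, valid, share in ranked_shares(counts):
        print(f"{name:28s} {yes:2d}/{valid:2d}  {share:5.1f}%")
```

Read this way, only lemma lists, frequency information, example sentences and grammatical patterns are used by more than half of the respondents who answered the respective question.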

Q: Is the automatically acquired knowledge directly integrated in the published dictionary without human intervention?
Integrated without human intervention:
- Lemma lists
- Frequency information
- Example sentences
- Translation equivalents
- Lexical-semantic relations

Q: Is the automatically acquired knowledge directly integrated in the published dictionary without human intervention?
Integrated with human intervention:
- Form variation
- MWEs
- Neologisms
- Knowledge rich contexts
- Word senses
- Grammatical patterns
- Linguistic labels

Q: How do the lexicographers judge the quality of the automatically acquired knowledge?
- Lemma lists: 4
- Frequency information: 4
- Form variation: 3
- Example sentences: 4
- MWEs: 3
- Neologisms: 3
- Definitions: 3
- Translation equivalents: 3-4
- Knowledge rich contexts: 3-4
- Lexical-semantic relations: 3
- Word senses: 3-4
- Grammatical patterns: 4
- Linguistic labels: 3

Wishes / Comments
- Automatic extraction of contrastive data, annotated syntactically and semantically.
- There is a huge need for methods and tools for these tasks if EU languages are to be supported with high-quality dictionaries published by EU institutions or publishers; otherwise US IT giants will dominate the future. For publishers the rights are important, as the model is changing from licensing/royalty models to ownership models. But ownership is also an important issue for public institutions that might publish for free. There will be a certain degree of skepticism about these methods and tools, and it will be hard to convince the community about the quality and ROI.
- We use a LOT of knowledge acquisition, but it is not strictly applied to lexicography yet.

Wishes / Comments
- My work focuses on definitions. I have developed systems for extracting encyclopedic definitions from encyclopedic text, for extracting hypernyms from definitions, and for learning taxonomies from free text using the previous systems. Part of my work also focuses on harvesting semantic relations from the web and disambiguating them where possible. For my PhD work, I would like to have a system that, given a set of documents belonging to a certain domain, is able to identify candidate definitions and score them according to their relevance to the document in which they occur, the corpus to which the document belongs, and finally the domain to which that corpus belongs.
- Several researchers have applied types of automatic knowledge acquisition in their individual research, e.g. to generate candidate lemma lists (Bratanić, Ostroški Anić and Radišić 2010, Aviation English Terms and Collocations (An alphabetical checklist). Zagreb: Sveučilište u Zagrebu, Fakultet prometnih znanosti.), to extract terms and collocations, or to extract frequency information (Stojanov and Vučić, Korpusnojezikoslovna obradba tekstova Sportskih novosti. N-gramsko modeliranje dohvaćanja podataka i vizualizacija. Filologija 59, ).

Wishes / Comments
- We use Sketch Engine functions (Word List, Collocates, Frequency) to analyze concordances, e.g. in order to find form variations (irregular morphology, orthographic variants). We also used the Word Sketch function for the extraction of collocations.
- Domain sensitivity is a crucial lexicographical parameter, and therefore automated processes cannot be developed and exploited in the same way as in lexicography for general purposes. The survey does not seem to include innovative aspects of the analysis and representation of specialised knowledge as such, so full automation still has a long way to go.
- Since we are working on an academic monolingual dictionary, the level of corpus-based AAK is quite satisfactory for us. In the case of such a dictionary it is always important to leave room for a deeper semantic investigation.
- More for word sense disambiguation and definition extraction.

AKA per institution

Basque Country - Elhuyar Foundation
AKA types: Lemma list; Frequency information; Example sentences (experimental level); Multiword expressions; Neologisms; Translation equivalents; Grammatical patterns (experimental level)
Projects:
- Elhuyar Hiztegiak, Basque-Spanish dictionary
- ZTH - Dictionary of Science and Technology (zthiztegia.elhuyar.org)
- Laneki Hiztegia
- Automotive Dictionary (en, es and eu terms)
- Ihobe Hiztegia, environmental dictionary (intranet)
- CAF railway dictionary (intranet)
- Ongoing projects: Osakidetza (Basque Health System); Social work (provincial governments of Araba, Bizkaia and Gipuzkoa)

Belgium - KU Leuven
- Lemma list, Frequency information: corpus support to a third-party lexicographic publication on Belgian Dutch, "Typisch Vlaams Woorden en uitdrukkingen" [Typical Flemish words and expressions]
- Translation equivalents: TermWise: Resources for Specialized Language Use

Bulgaria - Institute for Bulgarian Language
- Lemma list, Frequency information, Neologisms: Dictionary of Bulgarian Language
- Lexical-semantic relations: Bulgarian WordNet; Dictionary of Bulgarian Language

Czech Republic - Masaryk University, Faculty of Arts
- Lemma list: Low-cost ontology development, paper ->
- Word senses: currently in a pilot phase. We are trying to create a new semantic network based on a combination of manually annotated data that are confirmed automatically against a corpus. This testing process can also be used for extending the dictionary.

Czech Republic - NLP Centre, Faculty of Informatics, Masaryk University
- Lemma list: Thesaurus for Geography Domain
- Frequency information: DEB dictionary browser
- Example sentences: Czech Sign Language dictionary
- Grammatical patterns: Verbalex, verb valency lexicon

Czech Republic - Lexical Computing
- Lemma list
- Frequency information
- Form variation
- Example sentences
- Multiword expressions
- Neologisms: DIACRAN
- Definitions: Experimental
- Translation equivalents
- Lexical-semantic relations: Distributional thesaurus in SketchEngine
- Word senses: Clustering of word sketches
- Grammatical patterns
- Linguistic labels
(deliveries to publishers, IT companies)

Denmark - Society for Danish Language and Literature
- Other: We use an experimental mix of many of the methods mentioned above to check existing dictionary entries and to select and prioritize new ones. We do not use these methods as consistently and thoroughly as this survey suggests; that does not fit our dictionary-writing process.

Estonia - Institute of the Estonian Language
- Lemma list, Frequency information, Example sentences: Estonian Collocations Dictionary

France - Université de Franche-Comté, Besançon
- Lemma list, Frequency information, Form variation, Definitions, Lexical-semantic relations, Grammatical patterns: Sensunique project (already finished)

France - Université de Franche-Comté, Besançon
Other: The Sensunique platform extracts or calculates from the corpora information about:
a) the functional category of composed candidate terms (e.g. Noun for stem cells);
b) the Head and Expansion of a composed candidate term (e.g. cells is the Head and stem is the Expansion of stem cells);
c) different associations between candidate terms (e.g. inclusion: cells is totally included in stem cells; partial association: stem cells is partially associated with dendritic cells);
d) termhood probability.
The information extracted from corpora is enriched with information retrieved from selected external resources (e.g. existing terminology databases), such as definitions, variants and semantic classes.

Germany - Institut für Deutsche Sprache, Abteilung Lexik
- Lemma list, Frequency information, Example sentences: elexiko
- Multiword expressions: Usuelle Wortverbindungen

Greece - Institute for Language and Speech Processing, Athena RIC
- Lemma list, Frequency information, Multiword expressions:
  Polytropon Project: Conceptual Dictionary of Modern Greek (under development). Fotopoulou, A. and Giouli, V. From "Ekfrasis" to Polytropon. Towards a dictionary of the Modern Greek Language Conceptually organised. Paper accepted at the International Conference in Greek Linguistics (in Greek).
  The Greek High School Dictionary. Giouli, V., Gavrilidou, M., Lambropoulou, P The Greek High School Dictionary: Description and issues. In Proceedings of the XIII Euralex International Congress (EURALEX 2008). July 2008, Barcelona, Spain.
  eMiLang Project. Vakalopoulou, A., Giouli, V., Giagkou, M., and Efthimiou, E Online Dictionaries for immigrants in Greece: Overcoming the Communication Barriers. In Proceedings of the 2nd Conference “Electronic Lexicography in the 21st century: new Applications for New users” (eLEX2011), Bled, Slovenia, November
- Translation equivalents:
  INTERA Project. Gavrilidou, M., Labropoulou, P., Desipri, E., Giouli, V., Antonopoulos, V. & Piperidis, S. (2004). Building parallel corpora for eContent professionals. In COLING Geneva.

Hungary - Research Institute for Linguistics of the Hungarian Academy of Sciences
- Lemma list, Frequency information, Translation equivalents: EFNILEX. Lexicographers do not yet use the results directly, as they are still at the research stage.
- Multiword expressions, Grammatical patterns: Sass, Bálint and Pajzs, Júlia. FDVC -- Creating a Corpus-driven Frequency Dictionary of Verb Phrase Constructions for Hungarian. In: Sylviane Granger, Magali Paquot (Eds.) eLexicography in the 21st century: New challenges, new applications. Proceedings of eLex 2009, Louvain-la-Neuve, October Cahiers du CENTAL 7. Presses universitaires de Louvain, 2010., p Lexicographers manually added corpus-based examples to the verb phrase constructions.
- Other: Extending Hungarian WordNet With Selectional Preference Relations

Italy - European Academy of Bolzano/Bozen (EURAC)
- Word senses: for the ELDIT project, in an experimental study

Italy - University of Bologna, University of Pisa
- Multiword expressions, Grammatical patterns: CombiNet - Word Combinations in Italian. We use the broad term "word combinations" because we target both MWEs (e.g. phrasal lexemes, idioms, collocations) and more abstract combinatorial information (e.g. argument structure patterns, subcategorization frames, and selectional preferences).

Netherlands - Instituut voor Nederlandse Lexicologie
AKA types: Lemma list; Frequency information; Example sentences (work in progress); Neologisms (work in progress); Grammatical patterns (work in progress); Linguistic labels (work in progress)
Project: Algemeen Nederlands Woordenboek (ANW)
- Schoonheim, Tanneke and Rob Tempelaars (2014), 'Algemeen Nederlands Woordenboek (ANW), A Dictionary of Contemporary Dutch'. In: ANW-2014.pdf
- Schoonheim, Tanneke and Rob Tempelaars (2010), 'Dutch Lexicography in Progress, The Algemeen Nederlands Woordenboek (ANW)'. In: Anne Dykstra and Tanneke Schoonheim (eds.), Proceedings of the XIV Euralex International Congress. Ljouwert, Fryske Akademy/Afûk, 179 (abstract), full text on the accompanying CD-ROM. TEMPELAARS_Dutch Lexicography in Progress_the Algemeen Nederlands Woordenboek_ANW.pdf

Poland - Institute of the Polish Language PAS
AKA types not further specified in survey

Poland - Institute of the Polish Language at the Polish Academy of Sciences (IJP PAN)
- Frequency information, Form variation, Example sentences, Neologisms, Word senses, Grammatical patterns, Linguistic labels: Great Dictionary of Polish
- Multiword expressions: Great Dictionary of Polish (idioms, proverbs, scientific multiword terms, and other discontinuous textual units, so-called functional units)

Portugal - Centro de Linguística da Universidade de Lisboa
- Lemma list: Reference Corpus of Contemporary Portuguese reference-corpus-of-contemporary-portuguese-crpc
- Frequency information: Multifunctional computational lexicon of contemporary Portuguese contemporary-portuguese
- Example sentences: Dicionário da Academia das Ciências de Lisboa
- Multiword expressions: Word combinations in the Portuguese language teams/187-combina-pt-word-combinations-in-portuguese-language
- Lexical-semantic relations

Portugal - Centro de Linguística da Universidade Nova de Lisboa, Faculdade de Ciências Sociais e Humanas
- Lemma list, Frequency information, Form variation, Example sentences, Multiword expressions, Neologisms, Translation equivalents, Knowledge rich contexts, Lexical-semantic relations, Word senses, Grammatical patterns: for LSP

Slovakia - Ľ. Štúr Institute of Linguistics, Slovak Academy of Sciences
- Lemma list, Form variation: Handbook of Slovak Nouns
- Example sentences: Handbook of Slovak Nouns; Parallel Corpora Phrases (en-sk, cs-sk, bg-sk)
- Frequency information: Handbook of Slovak Nouns; Dictionary of Contemporary Slovak; Slovak-Czech Dictionary
- Multiword expressions: Dictionary of Slovak Collocations
- Translation equivalents: phrases from parallel corpora (en-sk, cs-sk, bg-sk)
- Grammatical patterns: Slovak Valency Dictionary (internal database, no URL yet)

Slovenia - University of Ljubljana, Faculty of Arts; Trojina, Institute for Applied Slovene Studies; Jožef Stefan Institute
- Lemma list, Frequency information, Multiword expressions, Grammatical patterns, Linguistic labels:
  Communication in Slovene:
  Slovene Lexical Database:
  Sloleks - morphological lexicon:
  Termis: en.html
- Form variation:
  Communication in Slovene:
  Orthography Guide:

Spain - Universidade da Coruña and Real Academia Galega
- Example sentences, Neologisms, Definitions, Lexical-semantic relations, Word senses, Linguistic labels: Spanish-Galician Dictionary of the Royal Galician Academy
No publications on the automatic acquisition of knowledge

Spain - University Institute for Applied Linguistics (Pompeu Fabra University)
- Lemma list, Frequency information, Example sentences, Grammatical patterns: Terminus 2.0, a web application for corpus and terminology management
- Neologisms: Buscaneo

Sweden - University of Gothenburg, Dpt. of Swedish, Språkbanken
- Lemma lists: 1. Kelly; 2. Academic Wordlist (AO); 3. SVALex (ongoing, target Swedish as a second language lexicon)
- Form variation: 1. Diabase; 2. Mathir
- MWEs: Constructicon
- Example sentences: HitEx
- Definitions: Semantic Interoperability and Data Mining in Biomedicine
- Lexical-semantic relations: SweFN++
- Word senses: Distributional Methods to Represent the Meaning of Frames and Constructions
- Grammatical patterns: Culturomics
- Linguistic labels: 1. A Swedish vocation list; 2. ongoing PhD thesis on automatic readability classification of texts and sentences; 3. Semantics in Storytelling in Swedish Fiction (list of relations, named entity recognition, aliases)

Switzerland - École Polytechnique Fédérale de Lausanne
- Example sentences: Kamusi Global Online Living Dictionary

Denmark / France - Aarhus University, Business and Social Sciences, Department of Business Communication; Université de Bourgogne, Maison des Sciences de l'Homme
- Lemma list, Form variation, Example sentences, Multiword expressions, Neologisms, Knowledge rich contexts: Oenolex, wine dictionary
- Other: Discourse markers acquisition and discourse interaction markers