DuELME: database of multiword expressions (MWE)


LLOD Use Case: MWE lexicon Jan Odijk CLARIAH-CORE LD4LR Workshop Utrecht, 2017-02-06/07

MWE-lexicon
  DuELME: database of multiword expressions (MWE)
  MWE: word combination that has idiosyncratic properties
    E.g. de plaat poetsen (lit. 'the plate polish') = ‘to bolt’
  DuELME =
    Set of MWE descriptions for MWEs
    Set of pattern descriptions

MWE description
  Component list (seq. of lemmas)
    [plaat, poetsen]
  Morphosyntactic properties
    Conjugated with hebben
  Some semantic properties
    Takes [Human] argument
  Example sentence (strongly restricted)
    Hij heeft de plaat gepoetst (lit. 'He has polished the plate', i.e. 'He has bolted')

MWE description
  Pattern id: reference to the description of its global syntactic structure
    ec1
  Parameters (to define its fine syntactic structure)
    plaat: sg def
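The two "MWE description" slides together describe one record. As a minimal sketch of that record in Python (the class name and field names are hypothetical and are not taken from the actual DuELME schema; only the values come from the slides):

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class MWEDescription:
    """Illustrative record; field names are assumptions, not DuELME's own."""
    components: List[str]      # sequence of lemmas
    morphosyntax: str          # morphosyntactic properties
    semantics: str             # some semantic properties
    example: str               # strongly restricted example sentence
    pattern_id: str            # reference to a pattern description
    parameters: Dict[str, List[str]] = field(default_factory=dict)


# The entry shown on the slides, encoded with the hypothetical fields above.
plaat_poetsen = MWEDescription(
    components=["plaat", "poetsen"],
    morphosyntax="conjugated with hebben",
    semantics="takes [Human] argument",
    example="Hij heeft de plaat gepoetst",
    pattern_id="ec1",
    parameters={"plaat": ["sg", "def"]},
)

print(plaat_poetsen.pattern_id, plaat_poetsen.parameters)
```

The point of the split is that the global syntactic structure lives in a shared pattern description (here ec1), while the entry itself only carries the reference and the parameters that refine it.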

Pattern description
  Pattern id
    ec1
  Syntactic structure with open slots
    [.VP [.obj1:NP [.det:D (1) ] [.hd:N (2) ]] [.hd:V (3) ]]
  Description
  Relation between components and example sentence
  …
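To illustrate what "open slots" means for pattern ec1, here is a small Python sketch that reads the numbered slots out of the bracketed structure and fills them with words. The function names and the slot-to-word mapping are assumptions made for this example; how DuELME itself relates components and parameters to slots is not spelled out in the slides.

```python
import re
from typing import Dict

# Pattern ec1 exactly as given on the slide.
PATTERN_EC1 = "[.VP [.obj1:NP [.det:D (1) ] [.hd:N (2) ]] [.hd:V (3) ]]"


def slot_labels(pattern: str) -> Dict[int, str]:
    """Map each slot number to the label of the node that directly holds it."""
    pairs = re.findall(r"\[\.(\S+) \((\d+)\)", pattern)
    return {int(number): label for label, number in pairs}


def fill_slots(pattern: str, fillers: Dict[int, str]) -> str:
    """Replace every numbered slot "(n)" with the filler given for n."""
    return re.sub(r"\((\d+)\)", lambda m: fillers[int(m.group(1))], pattern)


# Hypothetical mapping: "de" is assumed to follow from the parameter "def".
fillers = {1: "de", 2: "plaat", 3: "poetsen"}

print(slot_labels(PATTERN_EC1))
print(fill_slots(PATTERN_EC1, fillers))
```

With these fillers the structure spells out de plaat poetsen, with de as determiner and plaat as head of the object NP, and poetsen as the verbal head.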

Formats
  Originally: Set of CSV files
  Later: LMF-compatible format
    One minor deviation: error
    One deviation that [I think (now)] can be remedied

LOD?
  Any advantages of converting LMF format into some LOD format?
  Can Lemon model handle it?
  Or other LD-based lexicon models (if any)?
  Linking with other lexicons?
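As a starting point for discussing these questions, the sketch below writes one MWE as Turtle using the OntoLex-Lemon core vocabulary (ontolex:MultiwordExpression, ontolex:canonicalForm, ontolex:writtenRep). The base URI, the entry identifier and the function name are hypothetical, and the sketch covers only the written form: the pattern description and parameters are left out, which is exactly the part the slide asks about.

```python
ONTOLEX = "http://www.w3.org/ns/lemon/ontolex#"
BASE = "http://example.org/duelme/"  # hypothetical base URI, for illustration only


def mwe_to_turtle(entry_id: str, written_rep: str, lang: str = "nl") -> str:
    """Serialize one MWE entry as a minimal OntoLex-Lemon Turtle fragment."""
    return "\n".join([
        f"@prefix ontolex: <{ONTOLEX}> .",
        f"@prefix : <{BASE}> .",
        "",
        f":{entry_id} a ontolex:MultiwordExpression ;",
        f"    ontolex:canonicalForm :{entry_id}_form .",
        "",
        f":{entry_id}_form a ontolex:Form ;",
        f'    ontolex:writtenRep "{written_rep}"@{lang} .',
    ])


turtle = mwe_to_turtle("plaat_poetsen", "de plaat poetsen")
print(turtle)
```

Once entries have URIs like this, linking with other lexicons becomes a matter of adding triples that point to their entries.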

Thanks for your attention