1
XMELLT: Cross-lingual Multi-word Expression Lexicons for Language Technology
Multilingual Information Access and Management International Research Co-operation
Nancy Ide
Department of Computer Science, Vassar College
2
Participants
- Department of Computer Science, Vassar College
- International Computer Science Institute, University of California, Berkeley
- Department of Computer Science, New York University
- Computing Research Laboratory, New Mexico State University
3
Framework
- Planning project
  - One-year time frame
- Originally submitted as a joint NSF-EU project with additional European partners:
  - Istituto di Linguistica Computazionale, CNR, Pisa
  - Institut für Maschinelle Sprachverarbeitung, Stuttgart
  - LexiQuest, Paris
4
Overall goal
- Define a core international infrastructure to support the creation of a multilingual multi-word expression lexicon that incorporates both morpho-syntactic and semantic information
5
Specific aims
- Determine the types and dimensions of information needed to serve critical NLP applications
- Specify an overall architecture for a joint software and lingware development project
6
Aims (continued)
- Explore the possibilities for recognizing and acquiring multi-word lexical units from corpora by means of partial parsing, statistics, etc.
- Outline a collaborative project to acquire and represent multi-word lexical entries for multiple languages
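One common statistical approach to acquiring multi-word units from corpora is to rank adjacent word pairs by an association measure. The sketch below uses pointwise mutual information (PMI) purely as an illustration; it is not the project's chosen method, and the function name `pmi_bigrams` and the `min_count` threshold are hypothetical. A real pipeline would typically combine such scores with partial parsing (e.g., keeping only N-N or N-P-N candidates).

```python
import math
from collections import Counter

def pmi_bigrams(tokens, min_count=2):
    """Rank adjacent word pairs by pointwise mutual information (PMI).

    PMI(x, y) = log2(P(x, y) / (P(x) * P(y))), with probabilities estimated
    from corpus counts. Pairs seen fewer than min_count times are skipped,
    because PMI strongly overrates rare events.
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni = len(tokens)
    n_bi = max(len(tokens) - 1, 1)
    scores = {}
    for (x, y), count in bigrams.items():
        if count < min_count:
            continue
        p_xy = count / n_bi
        p_x = unigrams[x] / n_uni
        p_y = unigrams[y] / n_uni
        scores[(x, y)] = math.log2(p_xy / (p_x * p_y))
    # Highest-association pairs first: these are the MWE candidates.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Toy corpus (hypothetical); real acquisition needs millions of tokens.
corpus = ("the steam engine ran . the steam engine stopped . "
          "the dog ran . the dog stopped .")
for pair, score in pmi_bigrams(corpus.split())[:3]:
    print(pair, round(score, 2))
```

On this toy corpus "steam engine" rises to the top because both words occur almost only together; function-word pairs such as ". the" score lower because each word is frequent on its own.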
7
Motivation
- Multi-word constructions are extremely frequent in language
  - ~30% of the lexical stock
- Existing resources do not adequately treat multi-word expressions
8
Limitations
- Constructed for a particular system or application
  - Incorporate tailored information (e.g., primarily syntax with little semantics)
  - Not reusable
- Most are devoted to a single language and/or approach
9
Limitations (continued)
- Not flexible or expandable to multiple languages
  - MT systems' lexicons are typically little more than "translation memories"
  - No interface among single-word entries, multi-word entries, syntax, and semantics
10
XMELLT Approach
- Broad view of multi-word expressions
  - Idioms, compounds, collocations, co-occurrence patterns
- Focus on linking individual language lexicons
  - Individual words and multi-word expressions
  - Different types of multi-word expressions
    - e.g., English noun-noun vs. Romance noun-PP
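To make the linking idea concrete, here is a minimal sketch, assuming a hypothetical schema (`MWEEntry`, `LinkedLexicon`, and the pattern labels are illustrative, not the project's format). It links an English noun-noun compound to its Romance noun-PP equivalents and to a German single-word compound, showing that equivalent entries need not share a construction type.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class MWEEntry:
    """One entry in a single-language lexicon (hypothetical schema)."""
    lang: str      # ISO 639-1 language code, e.g. "en", "fr"
    lemma: str     # citation form of the expression
    pattern: str   # construction type, e.g. "N N" or "N P N"

@dataclass
class LinkedLexicon:
    """Per-language entries plus cross-language equivalence links."""
    entries: dict = field(default_factory=dict)  # (lang, lemma) -> MWEEntry
    links: dict = field(default_factory=dict)    # (lang, lemma) -> set of (lang, lemma)

    def add(self, entry: MWEEntry) -> None:
        self.entries[(entry.lang, entry.lemma)] = entry

    def link(self, a: tuple, b: tuple) -> None:
        # Equivalence is treated as symmetric, so store both directions.
        self.links.setdefault(a, set()).add(b)
        self.links.setdefault(b, set()).add(a)

    def equivalents(self, lang: str, lemma: str, target: str) -> list:
        # Follows direct links only; no transitive closure across languages.
        keys = self.links.get((lang, lemma), set())
        found = (self.entries[k] for k in keys if k[0] == target)
        return sorted(found, key=lambda e: e.lemma)

lex = LinkedLexicon()
source = MWEEntry("en", "steam engine", "N N")
targets = [
    MWEEntry("fr", "machine à vapeur", "N P N"),
    MWEEntry("it", "macchina a vapore", "N P N"),
    MWEEntry("de", "Dampfmaschine", "N (compound)"),
]
lex.add(source)
for t in targets:
    lex.add(t)
    lex.link((source.lang, source.lemma), (t.lang, t.lemma))

print([e.pattern for e in lex.equivalents("en", "steam engine", "fr")])
```

Keeping links outside the entries lets each language lexicon stay self-contained while the multilingual layer is added on top; whether links should be pivot-based (as here, through English) or fully pairwise is a design decision this sketch leaves open.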
11
Considerations
- Internal variation
- Subcategorization properties
- Idiosyncratic constraints on inflection
- Meaning (non-)compositionality
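Idiosyncratic inflection constraints can be illustrated with the idiom "kick the bucket": the verb inflects freely, but the noun cannot be pluralized without losing the idiomatic reading. The sketch below encodes each slot as a hypothetical set of permitted surface forms (`find_mwe` and `KICK_THE_BUCKET` are illustrative names); it handles only contiguous occurrences, so internal variation such as inserted modifiers would need a richer pattern.

```python
def find_mwe(tokens, slots):
    """Return start indices where the MWE occurs contiguously in tokens.

    slots holds one set of permitted surface forms per position, so each
    slot can carry its own inflection constraint.
    """
    n = len(slots)
    hits = []
    for i in range(len(tokens) - n + 1):
        if all(tokens[i + j].lower() in slots[j] for j in range(n)):
            hits.append(i)
    return hits

KICK_THE_BUCKET = [
    {"kick", "kicks", "kicked", "kicking"},  # verb: all inflected forms allowed
    {"the"},                                 # determiner: fixed
    {"bucket"},                              # noun: frozen, no plural
]

print(find_mwe("He kicked the bucket".split(), KICK_THE_BUCKET))
print(find_mwe("They kicked the buckets".split(), KICK_THE_BUCKET))
```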
12
Encoding Model
- Compatible and integrated with existing and de facto standards
  - e.g., EAGLES, PAROLE/SIMPLE, NOMLEX
13
Activities
- Assessment of existing lexical resources for multi-word expressions
  - Delivery of survey
14
Activities (continued)
- Creation of a small set of sample entries
  - Add lexical information on support verb constructions to 50 nouns drawn from NOMLEX, for English, Italian, German, and French
  - Create lexical entries for 50 English N-N constructs from the PAROLE/SIMPLE lexicons, together with the corresponding constructs in Italian, German, and French
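A sample support verb entry might record, per language, which verb a predicative noun selects. The sketch below is hypothetical throughout: the class, the `SVC_LEXICON` table, and the two nouns are illustrative and are not drawn from NOMLEX. It shows why such information must be listed lexically: English says "make a decision" but "take a walk", and the choice of verb differs again in each other language.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SupportVerbConstruction:
    verb: str                 # support ("light") verb lemma
    noun_phrase: str          # object NP headed by the predicative noun
    verb_final: bool = False  # German infinitival citation forms put the verb last

    def citation_form(self) -> str:
        if self.verb_final:
            parts = (self.noun_phrase, self.verb)
        else:
            parts = (self.verb, self.noun_phrase)
        return " ".join(parts)

# Hypothetical entries keyed by English noun lemma, then ISO 639-1 code.
SVC_LEXICON = {
    "decision": {
        "en": SupportVerbConstruction("make", "a decision"),
        "fr": SupportVerbConstruction("prendre", "une décision"),
        "it": SupportVerbConstruction("prendere", "una decisione"),
        "de": SupportVerbConstruction("treffen", "eine Entscheidung", verb_final=True),
    },
    "walk": {
        "en": SupportVerbConstruction("take", "a walk"),
        "fr": SupportVerbConstruction("faire", "une promenade"),
        "it": SupportVerbConstruction("fare", "una passeggiata"),
        "de": SupportVerbConstruction("machen", "einen Spaziergang", verb_final=True),
    },
}

def support_verb_phrase(noun: str, lang: str) -> str:
    """Look up the citation form of a noun's support verb construction."""
    return SVC_LEXICON[noun][lang].citation_form()

for lang in ("en", "fr", "it", "de"):
    print(lang, support_verb_phrase("decision", lang))
```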
15
Activities (continued)
- Develop preliminary specifications for structuring and encoding multilingual, multi-word expression lexicons
  - Required linguistic information
  - Harmonized data architecture and encoding format
16
Activities (continued)
- Exploration of techniques for automatic acquisition
  - Months 1-6: survey of acquisition techniques; typology of MWEs
  - Months 7-12: design of an architecture for MWE acquisition
17
Project information
- Start date: June (?)
- Web site:
- Contact: Nancy Ide (PI)
  Department of Computer Science
  Vassar College
  ide@cs.vassar.edu
  http://www.cs.vassar.edu/~ide/XMELLT.html