Published by Hope Riley. Modified over 9 years ago.
1
LIRICS Mid-term Review
LIRICS WP2 – NLP Lexica
Monica Monachini, monica.monachini@ilc.cnr.it
CNR-ILC - Pisa
23rd May 2006
2
Summary of the presentation
- Overview of WP2
- First-year objectives
- Main results in T2.1 and T2.2
- Work done
- Synergies with other LIRICS WPs, ISO activities, meetings
- Priorities for future activities
3
WP2 overall objective
- Define a “family” of standards for NLP lexicons
- Two-level standards:
  - the high-level specifications provide structural elements, i.e. lexical classes and the relations between them (the meta-model);
  - the low-level specifications provide standardized constants, i.e. data categories used to “adorn” the lexical classes (ISO 12620)
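The two-level split can be pictured with a minimal Python sketch. It is an illustration only: the class name `LexicalClass`, the `adorn` method and the registry contents are assumptions made for this example, not part of LMF or of ISO 12620.

```python
from dataclasses import dataclass, field

# Hypothetical low-level registry: data category name -> permitted values.
# Names and values are illustrative, not taken from ISO 12620.
DATA_CATEGORIES = {
    "partOfSpeech": {"noun", "verb", "adjective"},
    "grammaticalNumber": {"singular", "plural"},
}


@dataclass
class LexicalClass:
    """High-level structural element of the meta-model (illustrative)."""
    name: str
    features: dict = field(default_factory=dict)

    def adorn(self, category: str, value: str) -> None:
        """Attach a data category value after checking it against the registry."""
        allowed = DATA_CATEGORIES.get(category)
        if allowed is None:
            raise KeyError(f"unknown data category: {category}")
        if value not in allowed:
            raise ValueError(f"{value!r} is not a value of {category}")
        self.features[category] = value


entry = LexicalClass("LexicalEntry")
entry.adorn("partOfSpeech", "noun")
```

The structural class knows nothing about linguistics by itself; all linguistic constants come from the registry, which is the point of keeping the two levels separate.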
4
WP2 T2.1 overview and objectives
- From past and ongoing standardization activities, gather the linguistic information considered relevant for lexical description, to be combined with the layers of the lexical model
- Provide coherent input to the revision of the ISO Data Category Registry
5
WP2 T2.1 results
- Proposal for a unified set of lexical information and unified descriptors as a draft set of Data Categories
- Maximum set of candidate lexical data categories, subdivided along the layers of linguistic description: morphosyntax, syntax and semantics
- Data Categories shared between WP2 and WP3 that are relevant to morphosyntactic description have been incorporated in the Syntax Tool: the Morphosyntactic Profile
6
WP2 T2.1 Deliverables
[Timeline chart: months M1 to M30 across the 1st, 2nd and 3rd years, with rows for WP2 tasks T2.1, T2.2 and T2.3]
- D.2.1 Survey and evaluation of existing standard for Lexica
- D.2.1 Survey and evaluation of existing standard for Lexica (revision) (version foreseen in conjunction with Data Cats to be issued together with the data model in T2.2)
- D.2.1 Survey and evaluation of existing standard for Lexica
7
WP2 T2.2 overview and objectives
- Define a lexical framework: a general, abstract meta-model made up of the structural nodes relevant for lexical description, enabling specific implementations on the basis of common Data Categories
- Define the common set of related Data Categories
8
WP2 T2.2 results
- Formulation of a high-level lexical meta-model, the Lexical Markup Framework (LMF), a flexible environment for user-defined mark-up languages
- Proofs of concept: mapping exercises of well-known NLP lexicon practices against the model
9
WP2 T2.2 Deliverables
[Timeline chart: months M1 to M30 across the 1st, 2nd and 3rd years, with rows for WP2 tasks T2.1, T2.2 and T2.3]
- NLP Lexica standard for CD ballot (submitted at the beginning of 2006)
- NLP Lexica standard for ISO DIS ballot
- Internal milestone for internal quality control
10
WP2 Activities, Meetings, Synergies...
LIRICS WPs: bilateral and trilateral working meetings
- CNR-ILC – MPI, 15.2.2005: PAROLE-SIMPLE lexical architecture and LEXUS tool
- WP2 internal meeting, 16.2.2005: basic structure of the meta-model for lexicons (core model + extensions)
- CNR-ILC – DFKI, 5.5.2005: convergences between morpho-syntactic and syntactic data; issues for the submission of the NWI on Syntax (SynAF) to ISO
- Pisa, 23-24.11.2005, WP2 internal meeting: basic structure of the meta-model for the representation of multiword expressions
LIRICS Meetings
- Paris, 16-17.3.2005: progress of work within WP2; presentation of the standard core model for lexicons and the extensions for NLP lexicons
- Barcelona, 21-22.6.2005: LIRICS Industrial Advisory Board Meeting
- Barcelona, 22.6.2005: presentation of the first bulk of information relevant for lexical description
- Nancy, 8-9.12.2005, WP4 TDG3 Workshop: connections between lexico-semantic representation and semantic roles in the lexicon
ISO Meetings
- Berlin, 8-9.4.2005: ISO TC37/SC4 WG4 meetings
- Warsaw, 21-26.8.2005: plenary meeting of ISO TC37/SC4; task force for designating generic data category sets for alignment with the level of the metamodel; task force related to the representation of MWEs
- Rome, 27.10.2005: UNI-DIAM Commission, candidature of Italy as P-member in ISO TC37/SC4 (CNR-ILC reference expert)
11
What is LMF for?
- provide a common model for the creation and use of lexical resources
- manage the exchange of data between and among these resources
- enable the merging of electronic resources to form extensive global resources
Range of topics: monolingual, bilingual and multilingual lexical resources
Scalability: the same specifications are to be used for both small and large lexicons
Coverage:
- linguistic description ranges from morphology, syntax and semantics to multilingual representation
- languages are not restricted to European languages
- the range of targeted NLP applications is not restricted
12
Future activities/Priorities/Plans
Data Categories
- deliver rev. 2 of D2.1: candidate data categories will receive the necessary adjustments after discussion
- extend the ISO Registry to cover further layers of linguistic description: do we need an ISO Syntactic Profile (Beijing)?
LMF model
- refine the NLP multilingual and MWE extensions
- XML representation of LMF linguistic objects, to allow unified access to LMF-conformant lexicons through APIs
- provide an implementation of test-suite lexical entries: PAROLE-SIMPLE lexicons ready to be described according to LMF (LEXUS), to be put on the LMF server and made accessible via the web
13
Structure of LMF
- Core package: structural skeleton, with the basic hierarchy of information in a lexical entry
- Extensions: extend a subset of core-model classes; are conformant to the core model; cannot be used independently of the core model
- LMF specifications comply with UML modeling principles
14
Core package
- Lexical Entry: container for managing the top-level language components. The number of words or MWEs of the lexicon is equal to the number of lexical entries in a given lexicon.
- Form consists of a text string that represents a single word or a multi-word expression
- Sense specifies or disambiguates the meaning and context of a form
- Representation Frame: one to many Representation Frames can be associated with a Form; each contains a form and data categories that specify the orthographic type and name of the word
- A further class acts as a cross-reference pivot that can link to many Lexical Entries within or across Lexicons
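As a rough illustration of the core hierarchy described on this slide, the Python sketch below nests Form and Sense objects under a lexical entry, and entries under a lexicon. The attribute names (`written_form`, `gloss`, `language`) and the `size` helper are assumptions for this example; the slide does not define them.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Form:
    """A text string for a single word or a multi-word expression."""
    written_form: str  # hypothetical attribute name


@dataclass
class Sense:
    """Specifies or disambiguates the meaning and context of a form."""
    gloss: str  # hypothetical attribute name


@dataclass
class LexicalEntry:
    """Container for the top-level components of one word or MWE."""
    forms: List[Form] = field(default_factory=list)
    senses: List[Sense] = field(default_factory=list)


@dataclass
class Lexicon:
    language: str  # hypothetical attribute name
    entries: List[LexicalEntry] = field(default_factory=list)

    def size(self) -> int:
        """Number of words or MWEs, i.e. the number of lexical entries."""
        return len(self.entries)


lexicon = Lexicon(language="en")
lexicon.entries.append(
    LexicalEntry(
        forms=[Form("bank")],
        senses=[Sense("financial institution"), Sense("side of a river")],
    )
)
lexicon.entries.append(
    LexicalEntry(forms=[Form("by and large")], senses=[Sense("on the whole")])
)
```

Here the lexicon holds two entries, one single word with two senses and one multi-word expression, so `lexicon.size()` counts words and MWEs alike.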
15
Package for extensional morphology
- 1st strategy: describe the morphology by explicitly representing all inflections
16
Package for inflectional paradigm
- 2nd strategy: declare an inflectional paradigm; use the inflectional paradigm extension to define it
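The difference between the two morphology strategies can be sketched in Python as follows. This is a toy example under stated assumptions: the `InflectionalParadigm` class, its suffix-based rule format and the sample nouns are invented for illustration and do not reproduce the LMF packages.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Strategy 1 (extensional): every inflected form is listed explicitly.
EXTENSIONAL: Dict[str, List[Tuple[str, str]]] = {
    "table": [("table", "singular"), ("tables", "plural")],
    "chair": [("chair", "singular"), ("chairs", "plural")],
}


# Strategy 2 (paradigm): the inflection pattern is declared once and
# each entry only points to it. Suffix rules are a simplifying assumption.
@dataclass(frozen=True)
class InflectionalParadigm:
    name: str
    suffixes: Tuple[Tuple[str, str], ...]  # (suffix, grammatical number)

    def inflect(self, stem: str) -> List[Tuple[str, str]]:
        """Generate all inflected forms of a stem from the declared rules."""
        return [(stem + suffix, number) for suffix, number in self.suffixes]


REGULAR_NOUN = InflectionalParadigm(
    "regular-noun", (("", "singular"), ("s", "plural"))
)

# Entries reference the shared paradigm instead of listing their forms.
PARADIGM_OF: Dict[str, InflectionalParadigm] = {
    "table": REGULAR_NOUN,
    "chair": REGULAR_NOUN,
}


def forms_of(lemma: str) -> List[Tuple[str, str]]:
    """Expand a lemma into its inflected forms via its paradigm."""
    return PARADIGM_OF[lemma].inflect(lemma)
```

Both strategies yield the same forms for these nouns; the paradigm version stores the pattern once, while the extensional version trades space for the ability to list irregular forms directly.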
17
Package for NLP syntax
- Syntactic Behavior represents one of the behaviors of one (or more) senses
- Construction describes one syntactic construction and can be shared by all words with the same syntactic behavior
- Self refers to the head lexical entry and describes its syntactic properties
- Syntactic Argument describes a syntactic actant
- ConstructionSet groups various syntactic Constructions and factorizes syntactic descriptions, so that the lexicon contains a minimum of syntactic behavior elements
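The factorization idea behind the syntax package can be sketched in Python: constructions are defined once and shared by reference. All attribute names, the sense identifiers and the sample verbs below are assumptions for illustration, not the LMF class definitions.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class SyntacticArgument:
    """One syntactic actant of a construction."""
    function: str   # e.g. "subject"; hypothetical attribute
    category: str   # e.g. "NP"; hypothetical attribute


@dataclass
class Construction:
    """One syntactic construction, shareable across words."""
    name: str
    self_properties: Dict[str, str] = field(default_factory=dict)  # the "Self" node
    arguments: List[SyntacticArgument] = field(default_factory=list)


@dataclass
class ConstructionSet:
    """Groups constructions so behaviors can reference them as a unit."""
    name: str
    constructions: List[Construction]


@dataclass
class SyntacticBehavior:
    """One behavior of one or more senses (referenced by id here)."""
    sense_ids: List[str]
    construction_set: ConstructionSet


TRANSITIVE = Construction(
    "transitive",
    {"category": "V"},
    [SyntacticArgument("subject", "NP"), SyntacticArgument("object", "NP")],
)
INTRANSITIVE = Construction(
    "intransitive", {"category": "V"}, [SyntacticArgument("subject", "NP")]
)
OPTIONAL_OBJECT = ConstructionSet("optional-object", [TRANSITIVE, INTRANSITIVE])

# Two verbs with the same syntactic behavior share one set by reference.
eat = SyntacticBehavior(["eat_1"], OPTIONAL_OBJECT)
read = SyntacticBehavior(["read_1"], OPTIONAL_OBJECT)
```

Because `eat` and `read` point at the same `ConstructionSet` object, adding a third verb with the same behavior costs one small `SyntacticBehavior` record rather than a new copy of the constructions.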
18
XML representation
19
Package for NLP semantics
- Predicative Representation describes the link between Sense and Semantic Predicate
- Semantic Predicate describes an abstract meaning
- Semantic Argument describes a semantic actant and is linked with its syntactic counterpart
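A small Python sketch may help picture how these three classes relate. The attribute names, the role labels and the idea of linking arguments through a syntactic function name are assumptions for this example; the slide only states that the link exists.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SemanticArgument:
    """A semantic actant, linked to its syntactic counterpart."""
    role: str                # e.g. "agent"; hypothetical label
    syntactic_function: str  # assumed link: name of the syntactic argument


@dataclass
class SemanticPredicate:
    """An abstract meaning with its semantic arguments."""
    label: str
    arguments: List[SemanticArgument]


@dataclass
class PredicativeRepresentation:
    """Links one sense (referenced by id here) to a semantic predicate."""
    sense_id: str
    predicate: SemanticPredicate


def role_of(rep: PredicativeRepresentation, function: str) -> Optional[str]:
    """Return the semantic role realized by a given syntactic function."""
    for argument in rep.predicate.arguments:
        if argument.syntactic_function == function:
            return argument.role
    return None


EATING = SemanticPredicate(
    "eating",
    [SemanticArgument("agent", "subject"), SemanticArgument("patient", "object")],
)
eat_sense = PredicativeRepresentation("eat_1", EATING)
```

With this layout, `role_of(eat_sense, "subject")` looks up the agent role, and a function with no semantic counterpart yields `None`.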
20
Package for NLP semantics (cont.)
21
XML representation
22
Package for NLP semantics (cont.)
23
Package for Multilingual representation
- Sense Axis Relation describes the link between two different Sense Axis instances
- Source Test and Target Test make it possible to express conditions on the translation on the source/target language side
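One way to picture sense axes with translation conditions is the Python sketch below. It is a simplified assumption, not the LMF model: tests are stored per language as plain predicates over a context dictionary, and the `translate` helper, the sense identifiers and the `flows_into` key are invented for this example. The French distinction between "fleuve" (flows into the sea) and "rivière" (flows into another river) serves as the test case.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Context = Dict[str, str]


@dataclass
class SenseAxis:
    """Links senses of different languages (language code -> sense id)."""
    senses: Dict[str, str]
    # Assumed encoding of source/target tests: one predicate per language.
    tests: Dict[str, Callable[[Context], bool]] = field(default_factory=dict)


@dataclass
class SenseAxisRelation:
    """Links two different sense axes; the label is hypothetical."""
    source: SenseAxis
    target: SenseAxis
    label: str


def translate(axes: List[SenseAxis], source_lang: str, sense_id: str,
              target_lang: str, context: Context) -> List[str]:
    """Return target senses whose axis matches and whose tests all pass."""
    results = []
    for axis in axes:
        if axis.senses.get(source_lang) != sense_id:
            continue
        if target_lang not in axis.senses:
            continue
        checks = [axis.tests.get(source_lang), axis.tests.get(target_lang)]
        if all(check(context) for check in checks if check is not None):
            results.append(axis.senses[target_lang])
    return results


fleuve_axis = SenseAxis(
    {"en": "river_1", "fr": "fleuve_1"},
    {"fr": lambda ctx: ctx.get("flows_into") == "sea"},
)
riviere_axis = SenseAxis(
    {"en": "river_1", "fr": "riviere_1"},
    {"fr": lambda ctx: ctx.get("flows_into") == "river"},
)
relation = SenseAxisRelation(fleuve_axis, riviere_axis, "related")
AXES = [fleuve_axis, riviere_axis]
```

Calling `translate(AXES, "en", "river_1", "fr", {"flows_into": "sea"})` selects only the axis whose target-side condition holds, which is the role the slide assigns to the source and target tests.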
24
Package for Multiword expressions