Barcelona Meeting 21/06/05 MM 1 LIRICS WP2 LIRICS WP2 NLP LEXICA Task Leader: ILC-CNR (Pisa) presented by: Monica Monachini.

Slides:



Advertisements
Similar presentations
Using OLIF, The Open Lexicon Interchange Format Susan McCormick OLIF2 Consortium October 1, 2004.
Advertisements

CODE/ CODE SWITCHING.
Why study grammar? Knowledge of grammar facilitates language learning
Statistical NLP: Lecture 3
Chapter 4 Basics of English Grammar
Units of specialized knowledge* “A unit of specialized knowledge (SKU) is a unit that represents specialized knowledge at the content level, and communicates.
1 Words and the Lexicon September 10th 2009 Lecture #3.
Resources Primary resources – Lexicons, structured vocabularies – Grammars (in widest sense) – Corpora – Treebanks Secondary resources – Designed for a.
 2003 CSLI Publications Ling 566 Oct 16, 2007 How the Grammar Works.
1.3 Executing Programs. How is Computer Code Transformed into an Executable? Interpreters Compilers Hybrid systems.
W3C XML Query Language Working Group Mark Needleman Data Research Associates ZIG Current Awareness Session July 13, 2000.
English Language and Literature Prelim Lesson: Investigating Language Use in ‘The Handmaid’s Tale’
Language Objectives. Planning Teachers should write both content and language objectives Content objectives are drawn from the subject area standards.
Sharing linguistic multi-media resources Jacquelijn Ringersma Paul Trilsbeek Max Planck Institute for Psycholinguistics Nijmegen, The Netherlands.
Designing and implementing of the NQF Tempus Project N° TEMPUS-2008-SE-SMHES ( )
Computational Investigation of Palestinian Arabic Dialects
LIRICS mid-term review 1 LIRICS WP3: Morpho-syntactic and syntactic annotations Thierry Declerck DFKI-LT - Saarbrücken 23rd May 2006.
LIRICS Mid-term Review 1 LIRICS WP2 – NLP Lexica Monica Monachini CNR-ILC - Pisa 23rd May 2006.
Dr. Monira Al-Mohizea MORPHOLOGY & SYNTAX WEEK 12.
© Copyright 2008 STI INNSBRUCK NLP Interchange Format José M. García.
IV. SYNTAX. 1.1 What is syntax? Syntax is the study of how sentences are structured, or in other words, it tries to state what words can be combined with.
24 Jan 2005 Kick off meeting (Luxembourg) 1 LIRICS Linguistic Infrastructure for Interoperable Resources and Systems ►Kick off meeting presentation ►Proposal.
ISLE: International Standards for Language Engineering A European/US joint project Martha Palmer University of Pennsylvania Tides Kickoff March 22, 2000.
24 Jan 2005 Kick off meeting (Luxembourg) 1 LIRICS Linguistic Infrastructure for Interoperable Resources and Systems ►Kick off meeting presentation ►Proposal.
Linguistics The first week. Chapter 1 Introduction 1.1 Linguistics.
CLARIN work packages. Conference Place yyyy-mm-dd
Terminology and documentation*  Object of the study of terminology:  analysis and description of the units representing specialized knowledge in specialized.
Ideas for 100K Word Data Set for Human and Machine Learning Lori Levin Alon Lavie Jaime Carbonell Language Technologies Institute Carnegie Mellon University.
HYMES (1964) He developed the concept that culture, language and social context are clearly interrelated and strongly rejected the idea of viewing language.
Linguistic Essentials
Computational linguistics A brief overview. Computational Linguistics might be considered as a synonym of automatic processing of natural language, since.
CSA2050 Introduction to Computational Linguistics Lecture 1 What is Computational Linguistics?
Towards a roadmap for standardization in language technology Laurent Romary & Nancy Ide Loria-INRIA — Vassar College.
2.An overview of SDMX (What is SDMX? Part I) 1 Edward Cook Eurostat Unit B5: “Central data and metadata services” SDMX Basics course, October 2015.
Starter: reminder of the AS exam structure Paper 1: 3 questions assessing AOs 1, 3 and 4. – 2 questions on how language is used to create meanings and.
1 STO A Lexical Database of Danish for Language Technology Applications Anna Braasch Center for Sprogteknologi Copenhagen SPINN Seminar, October 27, 2001.
Levels of Linguistic Analysis
NLP. Introduction to NLP (U)nderstanding and (G)eneration Language Computer (U) Language (G)
Developing OLIF, Version 2 Susan M. McCormick Christian Lieske OLIF2 Consortium SAP/Walldorf, Germany.
Project management and communication structure Nataša Urbančíková Faculty of Economics Technical University of Košice.
Slang. Informal verbal communication that is generally unacceptable for formal writing.
Genre and cultural purpose We recognize a genre when a text does something with language that we’re familiar with. Very often we are able state what kind.
Lec. 10.  In this section we explain which constituents of a sentence are minimally required, and why. We first provide an informal discussion and then.
Click to edit Master title style Click to edit Master text styles –Second level Third level –Fourth level »Fifth level Network Events GCE English Language.
Text Linguistics. Definition of linguistics Linguistics can be defined as the scientific or systematic study of language. It is a science in the sense.
1 Representing and Reasoning on XML Documents: A Description Logic Approach D. Calvanese, G. D. Giacomo, M. Lenzerini Presented by Daisy Yutao Guo University.
WP4 Models and Contents Quality Assessment
UNIFIED MEDICAL LANGUAGE SYSTEMS (UMLS)
An Introduction to Linguistics
Exam Practice Paper 1 AO1: Apply appropriate methods of language analysis, using associated terminology and coherent written expression. AO2: Demonstrate.
Statistical NLP: Lecture 3
OVERVIEW OF DISCOURSE ANALYSIS
Lirics mid-term review
Learning About Language Assessment. Albany: Heinle & Heinle
The key to your first draft [Outlines.pptx]
Chapter 2 Database Environment Pearson Education © 2009.
Chapter 2 Database Environment.
Contextual Analysis Context governs our linguistics choice.
A Level English Language
2. An overview of SDMX (What is SDMX? Part I)
2. An overview of SDMX (What is SDMX? Part I)
Levels of Linguistic Analysis
Members of Group Anjar Fatimah Avif Furoh L Mahraodatul Abidah
Linguistic Essentials
Jeopardy Game Grammar Edition
, editor October 8, 2011 DRAFT-D
Ling 566 Oct 14, 2008 How the Grammar Works.
Valence, Transitivity, Voice
Chapter 2 Database Environment Pearson Education © 2009.
Chapter 2 Database Environment Pearson Education © 2009.
Presentation transcript:

Barcelona Meeting 21/06/05 MM 1 LIRICS WP2 LIRICS WP2 NLP LEXICA Task Leader: ILC-CNR (Pisa) presented by: Monica Monachini

Barcelona Meeting 21/06/05 MM 2 Task 1: Survey TO DO:  Gather from past and on-going standardization activities linguistic info as a coherent input to Data Cat Registry  Transversal coherence between lexicon and text annotation DONE  Draft unified inventory of lexical information, unified descriptors, short descriptions as kind of Pre-DataCats as input to Task2 Analysis of emerging standard initiatives for NLP lexica

Barcelona Meeting 21/06/05 MM 3 Task 1: Milestone/Deliverable 1st year2nd year3rd year M1M1 M2M2 M3M3 M4M4 M5M5 M6M6 M7M7 M8M8 M9M9 M10M10 M11M11 M12M12 M13M13 M14M14 M15M15 M16M16 M17M17 M18M18 M19M19 M20M20 M21M21 m22m22 M23M23 M24M24 M25M25 M26M26 M27M27 M28M28 M29M29 M30M30 WP2 T2.1 T2.2 I T2.3 I D.2.1 Survey and evaluation of existing standard for Lexica A Draft D2.1 Deliverable to be circulated before Summer Holidays

Barcelona Meeting 21/06/05 MM 4 Task 1: Bilateral Meeting ILC-DFKI Held in Pisa – 5th May 2005 Objectives:  Explore relationships between WP2 and WP3  Ensure transversal coherence of Data Cats to be produced within the two WPs  Exchange strategies for gathering linguistic information and producing the Deliverables containing the actual compilation of information as input to Data Cats needed for populating the lexical layers of the data model

Barcelona Meeting 21/06/05 MM 5 Task 1: Work done For the morphosyntactic layer …  Combined strategy between ILC-DFKI in order to ensure compatibility between linguistic information  Start from many past standardization activities, Eagles, Multext-East …  Try to make computationally manageable and browsable the bulk of information that are in the form of paper list

Barcelona Meeting 21/06/05 MM 6 Task 1: The ComboMF Tool  ComboMF is being developed by ILC, allowing to  Input morpho-syntactic lexical information for a given language  describe all constrained relations between  PoS and morphological features  features and values in presence of a given feature/value  formulate declarative rules that combine information for a given language  save all admitted combinations in a database  on the basis of a DTD, export in XML  The tool is an addition to WP2 outcomes for the mo-sy layer  It now contains combinations for the It-PAROLE IT-LcStar lexicons plus information coming from Eagles and Multext- East  Evaluate a possible integration of ComboMF in the LORIA tool and/or in the LEXUS tool, in order to support the definition of hierarchies between attributes and values while designing Data Cats for each language

Barcelona Meeting 21/06/05 MM 7 Task 1: Work done For the syntactic and semantic layers …  Lexical information has been gathered starting from PAROLE-SIMPLE lexicons, ISLE, the ELRA proposal for standards (on its turn based on ISLE)  Unified inventory of lexical information with unified descriptors for compiling the Data Cats of the relevant lexical layers

Barcelona Meeting 21/06/05 MM 8 Draft D2.1: morpho-syntax  XML export of  the maximal set of morphosyntactic info  the admitted combinations language by language are shown (to be checked by native speaker partners)  The accompanying DTDs (DTD specialised sections for each language where ALL agreed on morphological info relevant for the language are modelled)

Barcelona Meeting 21/06/05 MM 9 D2.1 Draft: syntax SubjectA subject is a grammatical relation that exhibits certain independent syntactic properties, such as the following: the grammatical characteristics of the agent of typically transitive verbs; the grammatical characteristics of the single argument of intransitive verbs; a particular case marking or clause position; the conditioning of an agreement affix on the verb; the capability of being obligatorily or optionally deleted in certain grammatical constructions, such as the following clauses:adverbial, complement, coordinate; the conditioning of same subject markers and different subject markers in switch-reference systems; the capability of coreference with reflexive pronouns ObjectAn object, traditionally defined, is either a direct object or an indirect object/An object, in some usages, is any grammatical relation other than subject. Direct ObjectA direct object is a grammatical relation that exhibits a combination of certain independent syntactic properties, such as the following: the usual grammatical characteristics of the patient of typically transitive verbs; a particular case marking; a particular clause position; the conditioning of an agreement affix on the verb; the capability of becoming the clause subject in passivization; the capability of reflexivization

Barcelona Meeting 21/06/05 MM 10 D2.1 Draft: semantics Telic Formal node in the hierarchy Telic Indirect_telic, : the Semantic Unit 1 and the Semantic Unit 2 are related through an underspecified indirect telic relation. The Semantic Unit 1 is usually the subject or the instrument-complement of the event in the Semantic Unit 2, which represents a purpose prototypically associated with the Semantic Unit 1. Telic Purpose, : the Semantic Unit 1 is the SemU being defined, and the Semantic Unit 2 is an event corresponding to the intended purpose of the Semantic Unit 1. TelicInstrumental Formal node in the hierarchy TelicInstrumentalUsed_for, : the Semantic Unit 2 is the typical function of the Semantic Unit 1. This relation usually applies to instruments or devices to connect them with the activity in which they are used or to their typical purpose. TelicInstrumentalUsed_by, : the Semantic Unit 1 is typically used by the Semantic Unit 2. TelicInstrumentalUsed_against, : the Semantic Unit 1 is used typically against the Semantic Unit 2. TelicInstrumentalUsed_as, : the Semantic Unit 1 is typically used with the function which is expressed by the Semantic Unit 2. TelicActivity Formal node in the hierarchy TelicActivityIs_the_activity_of, : the Semantic Unit 2 is the characterizing activity of the Semantic Unit 1. TelicActivityIs_the_ability_of painter>, : the Semantic Unit 2 is a typical ability of an individual in the Semantic Unit 1. TelicActivityIs_the_habit_of, : the Semantic Unit 2 is the typical habit of an individual in the Semantic Unit 1. TelicDirect_telic Formal node in the hierarchy TelicDirect_telicObject_of_the_activity, : the Semantic Unit 2 is an event whose direct object is typically the Semantic Unit 1, and expresses an activity which is the typical purpose of the Semantic Unit 1.

Barcelona Meeting 21/06/05 MM 11 Task 1: on-going work  Integrating info coming from speech community  Exploring convergences of lexical information encoded btw. written and spoken (at least at mo-sy level of encoding)  Increasing the coverage  Going in the direction of Data Cats agreed on between written and spoken

Barcelona Meeting 21/06/05 MM 12 CNR-ILC: coordination; integration of info from speech lexicons UFSD: info needed for languages of accessing countries MPI: info needed for non EU languages DFKI: link with parallel work on annotation UTil: link with parallel work on annotation UW: interdependencies with info typical in terminologies UPF: check soundness, effectiveness, completeness … Expected contributions from partners