Nicoletta Calzolari, Istituto di Linguistica Computazionale, CNR, Pisa
Nijmegen, August 2010

After the “Grosseto Workshop” (1985): a turning point
MultiLex, GeneLex, AcquiLex, ..., Xxx-Lex
A. Zampolli: “Let’s be coherent: Xxx-Lex”

Reusability
- Reusability as a key concept, still true today:
  - to avoid duplication of efforts, costs, etc.
  - to allow synergies, integration, exchange of data, ...
  - to provide a model for new data creation & acquisition
- Decide on “feasible” areas & state priorities; these change over time
- The feasibility of formulating consensual standards is a strong sign of maturity in the field: we can’t propose standards if there are not enough results on which to base them
- EAGLES was launched in ’93. Key issue: do the conditions exist for a standardisation effort?

Main Results in the Lexicon & Corpus WGs, First Phase
- Standard for morphosyntactic encoding of lexical entries, in a multi-layered structure, with applications for all the EU languages
- Standard for subcategorisation in the lexicon: a set of standardised basic notions using a frame-based structure
- Proposal for a basic set of notions in lexical semantics, focusing on the requirements of Information Systems and MT
- Corpus Encoding Standard (CES), derived from TEI
- Standard for morphosyntactic annotation of corpora, to ensure compatibility/interchangeability of concrete annotation schemata
- Preliminary recommendations for syntactic annotation of corpora
- Dialogue annotation, for integration of written and spoken annotation

Content vs. Format/Representation
In LMF: focus on the abstract meta-model

Flexibility in the Recommendations, e.g. Morphosyntax

    Level   Information Type                          Recommendation
    L-0     Part-of-Speech                            Obligatory
    L-1     Morphosyntactic agreement features        Recommended
    L-2     Language-specific (or refined) features   Optional
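To make the three levels concrete, here is a minimal sketch of how a lexicon builder might sort an entry's features into L-0, L-1 and L-2 and flag missing obligatory information. The feature names in OBLIGATORY and RECOMMENDED, and the sample entry, are illustrative assumptions, not the actual EAGLES tag set.

```python
# Sketch of the three recommendation levels for morphosyntax.
# ASSUMPTION: these feature names are illustrative, not the EAGLES tag set.
OBLIGATORY = {"pos"}  # L-0: part of speech
RECOMMENDED = {"gender", "number", "person", "case", "tense", "mood"}  # L-1: agreement features


def classify_features(entry):
    """Sort an entry's feature names into the L-0, L-1 and L-2 levels."""
    levels = {"L-0": [], "L-1": [], "L-2": []}
    for name in sorted(entry):
        if name in OBLIGATORY:
            levels["L-0"].append(name)
        elif name in RECOMMENDED:
            levels["L-1"].append(name)
        else:
            # Anything else is treated as a language-specific (optional) feature.
            levels["L-2"].append(name)
    return levels


def missing_obligatory(entry):
    """Return the obligatory (L-0) features absent from the entry."""
    return sorted(OBLIGATORY - set(entry))


# Hypothetical Italian noun entry ("casetta", little house) with one language-specific feature.
casetta = {"pos": "noun", "gender": "feminine", "number": "singular", "diminutive": "yes"}
print(classify_features(casetta))
```

The design point is that only L-0 is checked strictly; L-1 and L-2 features are accepted and merely categorised, which is where the flexibility of the recommendations lies.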

MERITS: Strengths (from EAGLES-ISLE)

Why Standards for Language Resources? (from EAGLES-ISLE)
- important for workflows
- essential for an LR infrastructure
- needed for evaluation campaigns

Applications: requirements for systems & enabling technologies
Machine Translation, Information Extraction, Information Retrieval, Summarisation, Natural Language Generation, Word Clustering, Multiword Recognition + Extraction, Word Sense Disambiguation, Proper Noun Recognition, Parsing, Coreference, …

The Multilingual ISLE Lexical Entry (MILE)

MILE: Modularity
The building-block model (diagram: Lexical Objects such as syntactic frame, phrase, slot, Syn feature and Sem feature, combined into Lexical entry 1, Lexical entry 2, Lexical entry 3)
- Allows different dimensions of lexical entries to be expressed
- Enables modular specification of lexical entries
- Creates ready-to-use packages to be combined in different ways
- Lexical Classes as the main building blocks of the lexical architecture
→ Done in LMF
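The building-block idea can be illustrated with a small sketch: lexical objects (slots, frames, features) are defined once as immutable values and then shared by several entries. The class names below (Feature, Slot, SyntacticFrame, LexicalEntry) and the example frames are hypothetical simplifications, not MILE's actual class inventory.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Feature:
    """A syntactic or semantic feature (hypothetical simplification)."""
    layer: str  # "syn" or "sem" (illustrative labels)
    name: str
    value: str


@dataclass(frozen=True)
class Slot:
    function: str  # e.g. "subject", "object" (illustrative)
    phrase: str    # phrase type filling the slot, e.g. "NP"
    features: tuple[Feature, ...] = ()


@dataclass(frozen=True)
class SyntacticFrame:
    frame_id: str
    slots: tuple[Slot, ...]


@dataclass
class LexicalEntry:
    lemma: str
    frames: list[SyntacticFrame] = field(default_factory=list)
    sem_features: list[Feature] = field(default_factory=list)


# Reusable building blocks, defined once...
SUBJ_NP = Slot("subject", "NP")
OBJ_NP = Slot("object", "NP")
TRANSITIVE = SyntacticFrame("trans_np_np", (SUBJ_NP, OBJ_NP))
INTRANSITIVE = SyntacticFrame("intrans_np", (SUBJ_NP,))
AGENTIVE = Feature("sem", "agentive", "yes")

# ...and combined in different ways into different entries.
eat = LexicalEntry("eat", [TRANSITIVE, INTRANSITIVE], [AGENTIVE])
read = LexicalEntry("read", [TRANSITIVE], [AGENTIVE])
sleep = LexicalEntry("sleep", [INTRANSITIVE])


def entries_using(frame, entries):
    """List the lemmas whose entries include the given frame."""
    return [e.lemma for e in entries if frame in e.frames]
```

Freezing the building blocks makes them safe to share: a frame object referenced by many entries cannot be modified through one of them.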

The MILE Data Categories: user-adaptability and extensibility
(diagram: semantic features such as HUMAN, ARTIFACT, EVENT, ANIMAL, GROUP, AGE, MAMMAL, divided into Core and UserDefined, each linked by instance_of to MLC:SemanticFeature)
→ OK in ISOcat

MILE Lexical Data Category Registry
A library of pre-instantiated objects → DC Selections → To be done … in ISOcat
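A toy model may help show how a registry of pre-instantiated data categories, extended by users and then narrowed into per-lexicon DC selections, fits together. This is a sketch under stated assumptions: DataCategory, DataCategoryRegistry and the "core"/"user-defined" split of the sample categories are our own illustration, not the ISOcat data model or API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DataCategory:
    identifier: str
    origin: str                        # "core" or "user-defined" (illustrative labels)
    instance_of: Optional[str] = None  # e.g. "SemanticFeature"


class DataCategoryRegistry:
    """Toy registry: a library of pre-instantiated data categories."""

    def __init__(self):
        self._categories = {}

    def register(self, category):
        if category.identifier in self._categories:
            raise ValueError(f"duplicate data category: {category.identifier}")
        self._categories[category.identifier] = category

    def by_origin(self, origin):
        return sorted(c.identifier for c in self._categories.values() if c.origin == origin)

    def select(self, *identifiers):
        """Build a DC selection: the subset of registered categories one lexicon uses."""
        unknown = [i for i in identifiers if i not in self._categories]
        if unknown:
            raise KeyError(f"not in registry: {unknown}")
        return {i: self._categories[i] for i in identifiers}


registry = DataCategoryRegistry()
for name in ("HUMAN", "ANIMAL", "ARTIFACT", "EVENT", "GROUP"):
    registry.register(DataCategory(name, "core", "SemanticFeature"))
# A user extends the core set with categories of their own (illustrative split).
for name in ("MAMMAL", "AGE"):
    registry.register(DataCategory(name, "user-defined", "SemanticFeature"))

selection = registry.select("HUMAN", "MAMMAL")
```

Refusing duplicate identifiers and unknown selections is what keeps the registry a single shared reference point rather than a loose collection of local tag sets.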

ISO LMF: Lexical Markup Framework

ISO LMF
- Modular framework: a structural skeleton with the basic hierarchy of information in a lexical entry, plus various extensions
- LMF specifications comply with UML modelling principles
- An XML DTD allows implementation
- Builds on EAGLES/ISLE
The field is mature. New initiatives …: NEDO (Asian languages), NICT Language Grid Service Ontology, ICT KYOTO, LIRICS, LexInfo
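As an illustration of the structural skeleton, the sketch below uses Python's standard xml.etree.ElementTree to emit a minimal core hierarchy (LexicalResource, GlobalInformation, Lexicon, LexicalEntry, Lemma, Sense). It assumes the att/val "feat" convention often used in LMF XML serializations; the specific attribute names (label, language, partOfSpeech, writtenForm), the id scheme and the sample entries are assumptions to be checked against the official DTD.

```python
import xml.etree.ElementTree as ET


def feat(parent, att, val):
    """Attach an attribute/value pair as a <feat att=... val=...> child."""
    return ET.SubElement(parent, "feat", att=att, val=val)


def build_lexical_resource(language, entries):
    """Build a minimal LMF-style core skeleton.

    entries: iterable of (written_form, part_of_speech, number_of_senses).
    """
    resource = ET.Element("LexicalResource")
    global_info = ET.SubElement(resource, "GlobalInformation")
    feat(global_info, "label", "toy lexicon")
    lexicon = ET.SubElement(resource, "Lexicon")
    feat(lexicon, "language", language)
    for i, (form, pos, n_senses) in enumerate(entries, start=1):
        entry = ET.SubElement(lexicon, "LexicalEntry", id=f"le{i}")
        feat(entry, "partOfSpeech", pos)
        lemma = ET.SubElement(entry, "Lemma")
        feat(lemma, "writtenForm", form)
        for j in range(1, n_senses + 1):
            # Hypothetical id scheme: entry id + sense number.
            ET.SubElement(entry, "Sense", id=f"le{i}_s{j}")
    return resource


resource = build_lexical_resource("ita", [("casa", "commonNoun", 2), ("correre", "verb", 1)])
xml_text = ET.tostring(resource, encoding="unicode")
print(xml_text)
```

Keeping linguistic information in generic att/val pairs, rather than in a fixed set of XML attributes, is what lets the same skeleton carry data categories drawn from a registry.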

Putting a PAROLE entry into LMF-compliant XML (slide from Monica Monachini, Barcelona, IEC, 7-8 July 2009)

DCR (slide from Monica Monachini, Barcelona, IEC, 7-8 July 2009)

Mapping experiment on major best practices (from Monica Monachini):
- OLIF
- PAROLE/SIMPLE
- LC-Star (speech lexicon)
- WordNet / EuroWordNet
- FrameNet
- BDef, a formal database of lexicographic definitions derived from the Explanatory Dictionary of Contemporary French

BioLexicon: SIMPLE model & ISO LMF standard (from Monica Monachini)
- A unique large-scale computational lexicon in the biomedical domain, in terms of coverage & typology of information
- Populated with information from available biomedical resources
- Semi-automatically populated from corpora; a population toolkit is available
- Includes both domain-specific & general-language words
- Rich linguistic information spanning different levels of linguistic description
- Conformant to international lexical representation standards
- Designed to meet bio text-mining requirements

The BioLexicon: why

KYOTO (ICT): the lexical resource perspective

KYOTO system (diagram from Piek Vossen). Components shown: source documents; linear MAF/SYNAF and SEMAF annotation layers; term extraction (Tybot) into generic TMF; semantic annotation; fact extraction (Kybot) into generic FACTAF; domain editing (Wikyoto); wordnet and domain wordnet (LMF API); ontology and domain ontology (OWL API); concept users and fact users.

A common representation format for WordNets (Wn IT, Wn EN, Wn EU, Wn NL, Wn JP, Wn CH, Wn ES)
→ endow WordNets with a representation format allowing easy access, integration & interoperability among resources

WordNet-LMF model (diagram from Monica Monachini). Elements shown: LexicalResource, GlobalInformation, Lexicon, LexicalEntry, Lemma, Sense, Synset, Definition, Statement, SynsetRelations/SynsetRelation, MonolingualExternalRefs/MonolingualExternalRef, SenseAxes/SenseAxis, InterlingualExternalRefs/InterlingualExternalRef, optional Meta elements (0..1), and Data Categories.
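The monolingual part of such a model can be sketched with xml.etree.ElementTree: lexical entries carry senses that point to synsets, and synsets carry a definition and typed relations to other synsets. The nesting and the attribute names (writtenForm, partOfSpeech, synset, gloss, example, target, relType) are modelled on WordNet-LMF as we understand it and should be treated as assumptions; the synset ids and data are hypothetical.

```python
import xml.etree.ElementTree as ET


def add_entry(lexicon, entry_id, written_form, pos, synset_ids):
    entry = ET.SubElement(lexicon, "LexicalEntry", id=entry_id)
    ET.SubElement(entry, "Lemma", writtenForm=written_form, partOfSpeech=pos)
    for n, synset_id in enumerate(synset_ids, start=1):
        # Each Sense points to exactly one Synset.
        ET.SubElement(entry, "Sense", id=f"{entry_id}_{n}", synset=synset_id)
    return entry


def add_synset(lexicon, synset_id, gloss, example=None, relations=()):
    synset = ET.SubElement(lexicon, "Synset", id=synset_id)
    definition = ET.SubElement(synset, "Definition", gloss=gloss)
    if example is not None:
        ET.SubElement(definition, "Statement", example=example)
    if relations:
        rels = ET.SubElement(synset, "SynsetRelations")
        for rel_type, target in relations:
            ET.SubElement(rels, "SynsetRelation", target=target, relType=rel_type)
    return synset


def lemmas_by_synset(lexicon):
    """Invert the lexicon: synset id -> lemmas whose senses point to it."""
    index = {}
    for entry in lexicon.iter("LexicalEntry"):
        form = entry.find("Lemma").get("writtenForm")
        for sense in entry.iter("Sense"):
            index.setdefault(sense.get("synset"), []).append(form)
    return index


# Hypothetical Italian fragment.
lexicon = ET.Element("Lexicon", language="it")
add_entry(lexicon, "cane_n", "cane", "n", ["ita-s1"])
add_entry(lexicon, "animale_n", "animale", "n", ["ita-s2"])
add_entry(lexicon, "bestia_n", "bestia", "n", ["ita-s2"])
add_synset(lexicon, "ita-s1", "domestic dog", example="il cane abbaia",  # "the dog barks"
           relations=[("has_hyperonym", "ita-s2")])
add_synset(lexicon, "ita-s2", "a living organism that can move voluntarily")
```

Because senses refer to synsets by id, the synonymy grouping (cane vs. animale/bestia) is recovered by inverting those references rather than being stored twice.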

A list of 85 semantic relations, resulting from a mapping across the KYOTO WordNet grid (intra-WN and inter-WN relations)

WordNet-LMF Multilingual Level: Cross-lingual Relations (from Monica Monachini)
Diagram callouts: SenseAxis groups monolingual synsets (e.g. SWN n, IWN n, WN n) that correspond to each other and share the same relations to English; InterlingualExternalRef links to the ontology/(ies); relType specifies the type of correspondence.
    <!ATTLIST SenseAxis id ID #REQUIRED relType CDATA #REQUIRED>
    <!ATTLIST Target ID CDATA #REQUIRED>
    <!ATTLIST InterlingualExternalRef externalSystem CDATA #REQUIRED externalReference CDATA #REQUIRED relType (at|plus|equal) #IMPLIED>
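A short sketch of how these declarations might be used in practice: it builds a SenseAxis whose Target children point to corresponding synsets in several wordnets, attaches an ontology link, and rejects InterlingualExternalRef relType values outside the declared enumeration (at|plus|equal). Placing InterlingualExternalRefs inside SenseAxis, the SenseAxis relType value "eq_synonym", the synset ids and the ontology name "SampleOntology" are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Mirrors the enumeration (at|plus|equal) declared for InterlingualExternalRef/@relType.
INTERLINGUAL_REL_TYPES = {"at", "plus", "equal"}


def add_sense_axis(sense_axes, axis_id, rel_type, synset_ids, ontology_refs=()):
    """Group corresponding monolingual synsets under one SenseAxis.

    ontology_refs: iterable of (externalSystem, externalReference, relType).
    """
    ontology_refs = list(ontology_refs)
    # Validate before building anything, so a bad call leaves the tree untouched.
    for _, _, ref_rel in ontology_refs:
        if ref_rel not in INTERLINGUAL_REL_TYPES:
            raise ValueError(f"relType must be one of {sorted(INTERLINGUAL_REL_TYPES)}, got {ref_rel!r}")
    axis = ET.SubElement(sense_axes, "SenseAxis", id=axis_id, relType=rel_type)
    for synset_id in synset_ids:
        ET.SubElement(axis, "Target", ID=synset_id)
    if ontology_refs:
        refs = ET.SubElement(axis, "InterlingualExternalRefs")
        for system, reference, ref_rel in ontology_refs:
            ET.SubElement(refs, "InterlingualExternalRef", externalSystem=system,
                          externalReference=reference, relType=ref_rel)
    return axis


def aligned_synsets(sense_axes, synset_id):
    """Return the synsets sharing a SenseAxis with synset_id, in document order."""
    found = []
    for axis in sense_axes.iter("SenseAxis"):
        targets = [t.get("ID") for t in axis.iter("Target")]
        if synset_id in targets:
            found.extend(t for t in targets if t != synset_id)
    return found


sense_axes = ET.Element("SenseAxes")
add_sense_axis(sense_axes, "sa1", "eq_synonym", ["eng-s1", "ita-s1", "spa-s1"],
               [("SampleOntology", "Dog", "at")])
```

With synsets aligned this way, moving from an Italian synset to its English and Spanish counterparts is a lookup over SenseAxis targets, and the ontology link is shared by all of them.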

KYOTO Knowledge Base (diagram): domain wordnets WnIT, WnEN, WnEU, WnNL, WnJP, WnCH and WnES, each with its domain extension, linked to a shared Ontology and Domain Ontology

LMF and Named Entity Lexicon (from Monica Monachini)

Named Entity Lexicon (diagram: Wikipedia, LR, Ontology; from Monica Monachini)

LexInfo & Previous Models (from Paul Buitelaar)

LMF: ILC infrastructure

Desiderata for Semantic Roles (from Martha Palmer)

Some steps towards a “new generation” of LRs
- From huge efforts in building static, large-scale, general-purpose LRs, to dynamic LRs rapidly built on demand and tailored to specific user needs
- From closed, locally developed and centralized resources, to LRs residing in distributed places, accessible on the web and choreographed by agents acting over them
→ From Language Resources to Language Services
BUT tools are needed to make this vision operational & concrete

Lexical Web & Content Interoperability, as a critical step for semantic mark-up in the Semantic Web
(diagram: ComLex, NomLex, SIMPLE, WordNets, FrameNet, BioLexicon, Lex_x, Lex_y interconnected through LMF, with intelligent agents; Global WordNet Grid; SIMPLE-WEB)
Standards for interoperability: enough??

A new paradigm of R&D in LRs & LT: Distributed Language Services

A few issues for discussion: “content”, guidelines, tools, priorities, ...
- For the Semantic Web & “content” interoperability: is the field ‘mature’ enough to converge also at the semantic/conceptual level (e.g. to automatically establish links among different languages)?
- For the standards to have impact, ensure their usability & gain industry support by focusing on the requirements of industrial applications
- Have guidelines that are a “usable product” (to assist in the creation or adaptation of lexicons, …)
- Facilitate acceptance of the standards by providing an open-source reference implementation platform & tools, related web services and test suites
- Relation with the spoken language community
- Define the further steps necessary to converge on common priorities

Limits observed & need for further work

Strengths

Future requirements & planning

FLaReNet Mission: to structure the area of LRs & LT of the future

International Cooperation: some results from the FLaReNet Vienna Forum
