plWordNet as the Cornerstone of a Toolkit of Lexico-semantic Resources. Marek Maziarz, Maciej Piasecki, Ewa Rudnicka, Stanisław Szpakowicz*, G4.19 Research Group

plWordNet as the Cornerstone of a Toolkit of Lexico-semantic Resources
Marek Maziarz, Maciej Piasecki, Ewa Rudnicka, Stanisław Szpakowicz*
G4.19 Research Group, Institute of Informatics, Wrocław University of Technology
* School of Electrical Engineering and Computer Science, University of Ottawa

Wordnet as a Lexical Resource
● Princeton WordNet defines the de facto standard
  – large size and coverage
  – open access
  – thousands of applications
● Applications: dictionary vs. knowledge representation
● Range of description
● Ideal size and natural development limits

plWordNet model: a linguistic resource
● Wordnet (W) vs. ontology (O)
  – O: a strict knowledge representation
  – W: concepts expressed entirely in a natural language
  – W: synonymy is a matter of degree
  – O: certainty and rigorous construction
  – W: shaped by lexico-semantic dependencies
● Alternative to formalisation
  – corpus analysis and substitution tests
  – minimal commitment: lexico-semantic relations are defined without committing to any particular theory of lexical semantics or human cognition
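
Substitution tests replace formal definitions with sentences an editor judges for naturalness. A minimal sketch of how a test pattern might be instantiated for a candidate pair, assuming English templates invented for illustration (plWordNet's actual tests are Polish and are not listed on the slide); `TESTS` and `instantiate` are hypothetical names:

```python
from string import Template

# Hypothetical English templates; plWordNet's real tests are Polish and
# are not listed on the slide.
TESTS = {
    "hyponymy": [
        Template("If it is a $x, then it is a $y."),
        Template("A $x is a kind of $y."),
    ],
    "synonymy": [
        Template("If it is a $x, then it is a $y."),
        Template("If it is a $y, then it is a $x."),
    ],
}


def instantiate(relation: str, x: str, y: str) -> list[str]:
    """Fill every test template for `relation` with the candidate pair (x, y).

    The sentences go to a human editor, who accepts the relation only if
    all of them sound natural.
    """
    return [t.substitute(x=x, y=y) for t in TESTS[relation]]


for sentence in instantiate("hyponymy", "dog", "mammal"):
    print(sentence)
# If it is a dog, then it is a mammal.
# A dog is a kind of mammal.
```

Keeping the templates as data rather than code means new relations or reworded tests need no change to `instantiate`.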

plWordNet model: corpus-based development
● Main source of lexical knowledge: a very large monolingual corpus
  – tools for corpus browsing
  – semi-automatic knowledge extraction
● Additional sources: dictionaries and encyclopedias
● Lexical unit (LU)
  – a lemma-sense pair
  – a linguistically motivated primitive
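
Since the lexical unit is the primitive, it is natural to model it as an immutable value identified by lemma and sense. A sketch under that assumption; the `LexicalUnit` class and its extra `pos` field are hypothetical, not plWordNet's actual schema:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LexicalUnit:
    """A lemma-sense pair; the `pos` field is an assumed addition."""
    lemma: str
    sense: int
    pos: str

    def __str__(self) -> str:
        # The slides write LUs as "lemma sense", e.g. "uczucie 2".
        return f"{self.lemma} {self.sense}"


feeling = LexicalUnit("uczucie", 2, "noun")
other_sense = LexicalUnit("uczucie", 1, "noun")
print(feeling)                 # uczucie 2
print(feeling == other_sense)  # False: same lemma, different LU
```

`frozen=True` makes LUs hashable, so they can be collected into synsets (sets) and used as dictionary keys.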

plWordNet model: synset definition
● Synsets
  – groups of lexical units sharing certain relations
  – e.g. {afekt 1 ‘passion’, uczucie 2 ‘feeling’} is the hypernym of {miłość 1 ‘love’, umiłowanie 1 ‘affection’, kochanie 1 ~‘loving’}
● Constitutive relations are
  – fairly frequent (to describe many LUs)
  – shared among LUs (to define groups)
  – grounded in the linguistic tradition (to facilitate their consistent understanding)
  – used in other wordnets (to improve compatibility)
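
Because LUs in a synset share their constitutive relations, a relation such as hypernymy can be stored once between synsets and inherited by every member. A minimal sketch using the slide's example; the data layout and helper names are assumptions for illustration:

```python
# LUs as (lemma, sense) pairs; synsets as frozensets of LUs.
feeling = frozenset({("afekt", 1), ("uczucie", 2)})
love = frozenset({("miłość", 1), ("umiłowanie", 1), ("kochanie", 1)})
synsets = [feeling, love]

# Hypernymy is stored once, between synsets: hyponym -> its hypernyms.
hypernyms = {love: {feeling}, feeling: set()}


def synset_of(lu):
    """Return the synset that contains the lexical unit `lu`."""
    return next(s for s in synsets if lu in s)


def lu_hypernyms(lu):
    """LUs one hypernymy link above `lu`, inherited from its synset."""
    return set().union(*hypernyms[synset_of(lu)])


print(sorted(lu_hypernyms(("kochanie", 1))))
# [('afekt', 1), ('uczucie', 2)]
```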

plWordNet model: non-relational aspects
● Constitutive features
  – stylistic registers
  – verb aspect
  – semantic verb classes
● Features are referred to in the relation definitions
  – e.g. some relations are limited to verbs of the same aspect and semantic class
● Glosses help wordnet editors
● Usage examples: direct links to the corpus
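
A relation definition that refers to constitutive features can be enforced as a simple precondition. A sketch assuming a hypothetical relation restricted to verbs of the same aspect and semantic class; the class labels ("activity", "state") and the `VerbLU` layout are invented for illustration:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VerbLU:
    lemma: str
    sense: int
    aspect: str          # "perf" or "imperf"
    semantic_class: str  # hypothetical class labels


def may_link(a: VerbLU, b: VerbLU) -> bool:
    """Precondition for a hypothetical relation that may hold only between
    verbs of the same aspect and the same semantic class."""
    return a.aspect == b.aspect and a.semantic_class == b.semantic_class


pisac = VerbLU("pisać", 1, "imperf", "activity")      # 'write'
malowac = VerbLU("malować", 1, "imperf", "activity")  # 'paint'
napisac = VerbLU("napisać", 1, "perf", "activity")    # 'write', perfective

print(may_link(pisac, malowac))  # True
print(may_link(pisac, napisac))  # False: aspects differ
```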

Relation density
Synset relation density in PWN 3.1 and in plWordNet 2.0
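
The slide does not define relation density. One plausible reading, assumed here, is the mean number of outgoing synset-relation links per synset; a sketch over a toy graph:

```python
from collections import Counter


def relation_density(synset_ids, relations):
    """Mean number of outgoing synset relations per synset.

    `relations` holds (source, relation, target) triples; each triple is
    counted once, for its source (an assumed convention).
    """
    outgoing = Counter(src for src, _, _ in relations)
    return sum(outgoing[s] for s in synset_ids) / len(synset_ids)


# Toy graph: s2 is the hypernym of s1, s3 and s4, with inverse links.
synsets = ["s1", "s2", "s3", "s4"]
relations = [
    ("s1", "hypernym", "s2"),
    ("s3", "hypernym", "s2"),
    ("s4", "hypernym", "s2"),
    ("s2", "hyponym", "s1"),
    ("s2", "hyponym", "s3"),
    ("s2", "hyponym", "s4"),
]
print(relation_density(synsets, relations))  # 1.5
```

Whether inverse relations are counted changes the figure by roughly a factor of two, so a comparison of two wordnets needs the same convention for both.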

Size matters: lexical coverage
Coverage of PWN/plWN for lemmas of different frequencies in two similar 1.2G-word corpora (Wikipedia)
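
Coverage by frequency band is the share of corpus lemmas in each band that the wordnet contains. A sketch of that computation; the band boundaries and the frequencies below are toy values, not the corpus counts behind the slide:

```python
def coverage_by_band(corpus_freq, wordnet_lemmas, bands):
    """Share of corpus lemmas in each frequency band that the wordnet contains.

    `bands` holds (low, high) pairs, low inclusive and high exclusive;
    high=None means no upper bound. Empty bands are skipped.
    """
    result = {}
    for low, high in bands:
        in_band = [lemma for lemma, f in corpus_freq.items()
                   if f >= low and (high is None or f < high)]
        if in_band:
            covered = sum(lemma in wordnet_lemmas for lemma in in_band)
            result[(low, high)] = covered / len(in_band)
    return result


# Toy frequencies, not real corpus counts.
freq = {"kot": 500, "pies": 300, "kotek": 20, "psina": 15, "kotka": 12, "kocur": 3}
wordnet = {"kot", "pies", "kotek", "kotka"}
print(coverage_by_band(freq, wordnet, [(100, None), (10, 100), (0, 10)]))
# coverage: 1.0 for 100+, 2/3 for 10-99, 0.0 below 10
```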

Size matters: plWordNet 2.2
Table: number of synsets, lemmas and LUs, and average synset size, by part of speech (nouns, verbs, adjectives, all)
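
The average synset size in such a table is presumably the number of LUs divided by the number of synsets, per part of speech and overall; that reading is an assumption. A sketch on toy synsets:

```python
from collections import defaultdict


def average_synset_size(synsets):
    """Mean number of LUs per synset, per part of speech and for "All".

    `synsets` is a list of (pos, members) pairs.
    """
    lu_total = defaultdict(int)
    synset_total = defaultdict(int)
    for pos, members in synsets:
        for key in (pos, "All"):
            lu_total[key] += len(members)
            synset_total[key] += 1
    return {key: lu_total[key] / synset_total[key] for key in synset_total}


toy = [
    ("Nouns", ["afekt 1", "uczucie 2"]),
    ("Nouns", ["miłość 1", "umiłowanie 1", "kochanie 1"]),
    ("Verbs", ["pisać 1"]),
]
print(average_synset_size(toy))
# {'Nouns': 2.5, 'All': 2.0, 'Verbs': 1.0}
```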

plWordNet: ongoing work

Size matters: comparison of wordnets

How many words are there? Existing dictionaries
● Woordenboek der Nederlandsche Taal: 430k lemmas
● dictionary of the Brothers Grimm: 330k lemmas
● Oxford English Dictionary: 300k lemmas
● ‘Warsaw’ Polish Dictionary: 280k lemmas
● contemporary Polish dictionaries: 130k lemmas
(unabridged dictionaries)

How many words are there? An approximation
● ~174k lemmas occurring 10+ times (COBUILD data)

How many words are there?
● Polish dictionaries: …k entries
● plWordNet corpus, lemmas occurring 10+ times [K]: 174k
● doubled plWordNet corpus, all lemmas (0+ occurrences) [GT]: +200k
● plWordNet: …k lemmas
K: Krishnamurthy’s data (2002); GT: Good & Toulmin approximation (1956)
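
The Good & Toulmin (1956) approximation estimates how many previously unseen types a larger sample would contain, from the frequency-of-frequency counts of the current one. A sketch of the standard estimator, assuming the slide's "doubled corpus" figure corresponds to its t = 1 case; the toy counts are not plWordNet data:

```python
from collections import Counter


def good_toulmin(type_counts, t=1.0):
    """Good-Toulmin estimate of how many new types appear if the sample grows
    by a factor of (1 + t); t = 1 corresponds to doubling the corpus.

    U(t) = sum over k of (-1)**(k + 1) * t**k * n_k, where n_k is the number of
    types seen exactly k times. The series is unreliable for t > 1.
    """
    n = Counter(type_counts.values())  # frequency of frequencies
    return sum((-1) ** (k + 1) * t ** k * n_k for k, n_k in n.items())


counts = Counter("aaabbcde")  # toy "corpus": a x3, b x2, c, d, e once each
new_types = good_toulmin(counts)
print(new_types)                # 3.0  (n1 - n2 + n3 = 3 - 1 + 1)
print(len(counts) + new_types)  # 8.0  estimated types in a doubled corpus
```

For doubling, the estimate reduces to the alternating sum n1 - n2 + n3 - ..., which is dominated by the hapax legomena (n1).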

Toolkit of Lexico-semantic Resources
● Lexicon of lexico-syntactic structures of multi-word expressions
● plWordNet 3.0 (Słowosieć 3.0)
● plWordNet 3.0 to WordNet 3.1 mapping
● Semantic lexicon of proper names
● Mapping to an ontology
● A valency lexicon linked to plWordNet

Lexicon of multi-word expressions
● Non-trivial morphology of Polish MWEs
  – more than 100 nominal structural patterns
● Description of the lexico-syntactic structures of MWEs
● Multi-word LUs as semantic atoms
  – no internal semantic relations
● Dynamic lexicon
  – a tool for automatic MWE extraction
  – MWEs described in the lexicon and in plWordNet
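
A structural pattern lets every inflected form of an MWE be recognised, not just its base form. A sketch of one hypothetical nominal pattern (adjective + noun with fixed lemmas and agreement in case, number and gender) for the MWE "czarna skrzynka" ('black box'); the `Token` layout and the NKJP-style tag values are assumptions:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    base: str    # lemma
    pos: str     # "adj" or "subst" (noun); NKJP-style tags assumed
    case: str
    number: str
    gender: str


def matches_adj_noun(tokens, lemmas=("czarny", "skrzynka")):
    """Hypothetical Adj + Noun pattern: fixed lemmas, and the adjective must
    agree with the noun in case, number and gender, so inflected forms such
    as the genitive "czarnej skrzynki" are recognised."""
    if len(tokens) != 2:
        return False
    adj, noun = tokens
    return (
        (adj.base, noun.base) == lemmas
        and (adj.pos, noun.pos) == ("adj", "subst")
        and (adj.case, adj.number, adj.gender) == (noun.case, noun.number, noun.gender)
    )


genitive = [Token("czarny", "adj", "gen", "sg", "f"), Token("skrzynka", "subst", "gen", "sg", "f")]
broken = [Token("czarny", "adj", "nom", "sg", "m3"), Token("skrzynka", "subst", "gen", "sg", "f")]
print(matches_adj_noun(genitive))  # True
print(matches_adj_noun(broken))    # False: no agreement
```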

Lexicon of Proper Names
● PNs are not a part of the lexicon
● A PN is an instance of a type
  – characterised by its referents
  – not by its semantic properties
● Linking PNs via a wordnet
  – some lexico-syntactic contexts signal the instance-of relation
  – PNs are represented in wordnets
● PNs as derivational bases for common nouns
● Dynamic lexicon with 2.5 million manually verified PNs
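
One kind of context that signals instance-of is a common noun directly followed by a capitalised name, as in "miasto Wrocław" ('the city of Wrocław'). A regex sketch of candidate extraction for manual verification; the trigger list is hypothetical and covers only nominative forms:

```python
import re

# Hypothetical trigger nouns: miasto 'city', rzeka 'river'. In "miasto Wrocław"
# the common noun signals that the capitalised word names an instance of it.
TRIGGERS = ("miasto", "rzeka")
PATTERN = re.compile(
    r"\b(" + "|".join(TRIGGERS) + r")\s+([A-ZĄĆĘŁŃÓŚŹŻ]\w*)"
)


def instance_candidates(text):
    """Return (proper name, type noun) pairs for manual verification.

    Only nominative trigger forms are listed here; a real extractor would
    need all inflected forms.
    """
    return [(m.group(2), m.group(1)) for m in PATTERN.finditer(text)]


print(instance_candidates("Tu leży miasto Wrocław, a obok płynie rzeka Odra."))
# [('Wrocław', 'miasto'), ('Odra', 'rzeka')]
```

Each pair links a PN to a wordnet LU (here the noun sense of "miasto" or "rzeka"), which is how the wordnet can serve as the type system for the PN lexicon.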

plWordNet to WordNet 3.1 mapping
● plWordNet was built independently, to obtain a faithful description
● Manual mapping
  – bottom-up order
  – comparison of the relation structures
  – a cascading list of interlingual relations
● Verification of plWordNet as an important side effect
● Present state: noun and adjective synsets mapped
● Target: complete mapping of plWordNet 3.0
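
A cascading list of interlingual relations means the editor tries the strongest relation first and falls back to weaker ones only if it fails. A sketch of that control flow; the relation names and the yes/no judgements they depend on are illustrative assumptions, as the slides do not list the cascade:

```python
from typing import Callable, Optional

# Hypothetical cascade, strongest relation first. Each test is a yes/no
# judgement the editor records for a (plWordNet synset, WordNet synset) pair.
CASCADE: list[tuple[str, Callable[[dict], bool]]] = [
    ("I-synonymy", lambda j: j["same_meaning"] and j["same_hypernym"]),
    ("I-partial synonymy", lambda j: j["same_meaning"]),
    ("I-hyponymy", lambda j: j["narrower"]),
    ("I-hypernymy", lambda j: j["broader"]),
]


def first_matching(judgements: dict) -> Optional[str]:
    """Return the first (strongest) interlingual relation whose test passes."""
    for name, test in CASCADE:
        if test(judgements):
            return name
    return None  # no link; the pair is left for further inspection


print(first_matching({"same_meaning": True, "same_hypernym": False,
                      "narrower": False, "broader": False}))
# I-partial synonymy
```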

Wordnet editor: WordnetLoom

WordnetLoom: editing the mapping

Mapping to ontology
● Ontology: unambiguous concepts defined formally
● Lexical meanings are
  – imprecisely delimited
  – constrained by usage, stylistic register and sentiment
● Mapping to an ontology provides
  – a precise, formal description of meanings
  – an association between concepts and their lexical embodiments
● SUMO selected
  – existing Princeton WordNet mapping
  – semi-automated mapping of plWordNet
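
One way a semi-automated plWordNet-SUMO mapping could use the existing Princeton WordNet mapping is to propagate SUMO terms across interlingual links and hand the results to an editor. A sketch under that assumption; all identifiers are toy values, and the handling of the "=" / "+" markers is a simplification:

```python
# Toy identifiers only. In the Princeton WordNet-SUMO mapping, "=" marks an
# equivalent SUMO term and "+" a SUMO term that subsumes the synset.
PWN_TO_SUMO = {"pwn:1": ("EmotionalState", "+"), "pwn:2": ("Dog", "=")}

# plWordNet -> WordNet links with their interlingual relation. "I-hyponymy" is
# assumed here to mean the Polish synset is narrower than the English one.
PL_TO_PWN = {
    "pl:a": ("pwn:1", "I-synonymy"),
    "pl:b": ("pwn:2", "I-hyponymy"),
    "pl:c": ("pwn:3", "I-synonymy"),  # no SUMO mapping available
}


def propagate(pl_to_pwn, pwn_to_sumo):
    """Derive candidate plWordNet -> SUMO links for an editor to verify."""
    candidates = {}
    for pl_id, (pwn_id, irel) in pl_to_pwn.items():
        if pwn_id not in pwn_to_sumo:
            continue
        term, marker = pwn_to_sumo[pwn_id]
        if irel == "I-synonymy":
            candidates[pl_id] = (term, marker)  # inherit the mapping as is
        elif irel == "I-hyponymy":
            candidates[pl_id] = (term, "+")  # a narrower synset is subsumed
    return candidates


print(propagate(PL_TO_PWN, PWN_TO_SUMO))
# {'pl:a': ('EmotionalState', '+'), 'pl:b': ('Dog', '+')}
```

Synsets without a usable link (here `pl:c`) are skipped rather than guessed, leaving them for manual mapping.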

Expectations
Diagram components: plWordNet 3.0, valency lexicon, MWE lexicon, WordNet extension, proper names, and an ontology (SUMO + an intermediate level) linked by a “describes” relation

Applications
● Strong universal basis
  – a comprehensive wordnet: > … lemmas, resulting in ~… LUs and ~… synsets
  – one of the largest Polish dictionaries ever
● Modularly constructed toolkit
  – like the layered architecture of large software systems
  – separate but linked layers
  – each layer based on a limited set of notions and principles, and exchangeable
● The core of the CLARIN-PL language technology infrastructure

Thank you!