Published by Quinn Anson. Modified over 10 years ago.
1
plWordNet as the Cornerstone of a Toolkit of Lexico-semantic Resources
Marek Maziarz, Maciej Piasecki, Ewa Rudnicka, Stanisław Szpakowicz*
G4.19 Research Group, Institute of Informatics, Wrocław University of Technology
* School of Electrical Engineering and Computer Science, University of Ottawa
www.plwordnet.pwr.wroc.pl
2
Wordnet as a Lexical Resource
Princeton WordNet defines the de facto standard
– large size and coverage
– open access
– thousands of applications
Applications: dictionary vs knowledge representation
Range of description
Ideal size and natural development limits
3
plWordNet model: linguistic resource
Wordnet (W) vs ontology (O)
– O: a strict knowledge representation
– W: concepts expressed entirely in a natural language
– W: synonymy is a matter of degree
– O: certainty and a rigorous construction
– W: shaped by the lexico-semantic dependencies
Alternative to formalisation
– Corpus analysis and substitution tests
– Minimal commitment: defining lexico-semantic relations without committing to any particular theory of lexical semantics or human cognition
4
plWordNet model: corpus-based development
Main source of lexical knowledge: a very large monolingual corpus
– tools for corpus browsing
– semi-automatic knowledge extraction
Additional sources: dictionaries and encyclopedias
Lexical unit (LU)
– lemma-sense pair
– a linguistically motivated primitive
5
plWordNet model: synset definition
Synsets
– groups of lexical units sharing certain relations
{afekt 1 'passion', uczucie 2 'feeling'} hypernym {miłość 1 'love', umiłowanie 1 'affection', kochanie 1 ~'loving'}
Constitutive relations
– fairly frequent (to describe many LUs)
– shared among LUs (to define groups)
– grounded in the linguistic tradition (to facilitate their consistent understanding)
– used in other wordnets (to improve compatibility)
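The synset definition above can be illustrated with a minimal sketch: lexical units are lemma-sense pairs, and units that share exactly the same constitutive relation links fall into one synset. The names `LexicalUnit` and `group_into_synsets`, the target labels, and the exact-match grouping rule are assumptions made for illustration; the slides do not specify how plWordNet stores or groups its data.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class LexicalUnit:
    """A lemma-sense pair, the basic building block on the slide."""
    lemma: str
    sense: int


def group_into_synsets(constitutive):
    """Group LUs whose sets of constitutive relation links are identical.

    `constitutive` maps each LexicalUnit to a set of (relation, target) pairs.
    This exact-match rule is a simplifying assumption, not the editors' procedure.
    """
    groups = defaultdict(set)
    for unit, links in constitutive.items():
        groups[frozenset(links)].add(unit)
    return [frozenset(units) for units in groups.values()]


# Hypothetical target labels standing in for the two synsets in the slide example.
FEELING = "synset:afekt-1/uczucie-2"
LOVE = "synset:milosc-1"

constitutive = {
    LexicalUnit("afekt", 1): {("hyponym", LOVE)},
    LexicalUnit("uczucie", 2): {("hyponym", LOVE)},
    LexicalUnit("miłość", 1): {("hypernym", FEELING)},
    LexicalUnit("umiłowanie", 1): {("hypernym", FEELING)},
    LexicalUnit("kochanie", 1): {("hypernym", FEELING)},
}

synsets = group_into_synsets(constitutive)
```

In this toy data the five units fall into two groups, matching the two synsets shown in the example.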
6
plWordNet model: non-relational aspects
Constitutive features
– stylistic registers
– verb aspect
– semantic verb classes
Referred to in the relation definitions
– e.g. relations limited to verbs of the same aspect and semantic class
Glosses help wordnet editors
Usage examples: direct links to the corpus
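A relation restricted by constitutive features, as in the example above, could be checked along these lines. This is only a sketch: `VerbUnit`, `may_link`, and the class label "motion" are hypothetical, and the slides do not say which relations carry this restriction.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VerbUnit:
    lemma: str
    sense: int
    aspect: str          # "perfective" or "imperfective"
    semantic_class: str  # hypothetical class label, e.g. "motion"


def may_link(a, b):
    """Allow a relation only between verbs of the same aspect and semantic class."""
    return a.aspect == b.aspect and a.semantic_class == b.semantic_class


biec = VerbUnit("biec", 1, "imperfective", "motion")      # 'to run'
isc = VerbUnit("iść", 1, "imperfective", "motion")        # 'to go'
pobiec = VerbUnit("pobiec", 1, "perfective", "motion")    # 'to run' (perfective)

same_aspect_ok = may_link(biec, isc)
cross_aspect_ok = may_link(pobiec, isc)
```

Here the imperfective pair passes the check, while the perfective/imperfective pair is rejected.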
7
Relation density
[Figure: synset relation density in PWN 3.1 and in plWordNet 2.0]
8
Size matters: lexical coverage
[Figure: coverage of PWN/plWN for lemmas of different frequency in two similar 1.2G-word corpora (Wikipedia)]
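Coverage per frequency band, the quantity this slide reports, can be computed roughly as follows. The function name, the band boundaries, and the toy frequencies are assumptions for illustration; the slide does not give the bands or the underlying counts.

```python
def coverage_by_band(freqs, wordnet_lemmas, bands):
    """Return the share of corpus lemmas in each frequency band found in the wordnet.

    freqs: dict mapping lemma -> corpus frequency
    wordnet_lemmas: set of lemmas present in the wordnet
    bands: list of (low, high) pairs, low inclusive and high exclusive
    """
    result = {}
    for low, high in bands:
        in_band = [lemma for lemma, f in freqs.items() if low <= f < high]
        if in_band:  # skip empty bands to avoid division by zero
            covered = sum(1 for lemma in in_band if lemma in wordnet_lemmas)
            result[(low, high)] = covered / len(in_band)
    return result


# Toy data, not taken from the presentation.
freqs = {"dom": 5000, "kot": 1200, "afekt": 40, "umiłowanie": 12, "xyzzy": 11}
wordnet_lemmas = {"dom", "kot", "afekt"}
coverage = coverage_by_band(freqs, wordnet_lemmas, [(10, 100), (100, 10000)])
```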
9
Size matters: plWordNet 2.2
POS          Synsets   Lemmas    LUs       Average synset size
Nouns        102 613   105 883   140 701   1.37
Verbs         21 897    17 554    32 180   1.47
Adjectives    15 145    11 677    18 787   1.24
All          139 656   135 115   191 669   1.37
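The last column of the table is consistent with the number of LUs divided by the number of synsets. A short check using the figures from the slide (the variable names are illustrative):

```python
# (synsets, lexical units) per part of speech, as given on the slide
rows = {
    "Nouns": (102613, 140701),
    "Verbs": (21897, 32180),
    "Adjectives": (15145, 18787),
    "All": (139656, 191669),
}

# Average synset size = lexical units per synset, rounded to two decimals
average_size = {pos: round(lus / synsets, 2) for pos, (synsets, lus) in rows.items()}
```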
10
plWordNet: ongoing work
11
Size matters: comparison of wordnets
12
How many words are there? - existing dictionaries
● Woordenboek der Nederlandsche Taal 430k lemmas
● dictionary of Grimm brothers 330k lemmas
● Oxford English Dictionary 300k lemmas
● 'Warsaw' Polish Dictionary 280k lemmas
● contemporary Polish dictionaries 130k lemmas
unabridged dictionaries
13
How many words are there? - approximation
~174k (10+ lemmas), COBUILD data
14
How many words are there?
Source                                       # entries
Polish dictionaries                          100-280k
plWordNet corpus (10+ lemmas) [K]            174k
doubled plWordNet corpus (0+ lemmas) [GT]    +200k
K - Krishnamurthy's data (2002), GT - Good & Toulmin approximation (1956)
plWordNet 3.0: 200k lemmas
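The Good & Toulmin approximation cited above estimates how many previously unseen types would appear if the sample were enlarged by a factor t, using the alternating sum over frequency-of-frequency counts: sum over k of (-1)^(k+1) * t^k * n_k, where n_k is the number of types seen exactly k times. A minimal sketch for the doubling case (t = 1) follows; the function name and the toy counts are illustrative, and how the authors actually applied the estimator to their corpus is not stated on the slide.

```python
from collections import Counter


def good_toulmin_new_types(type_counts, t=1.0):
    """Estimate the number of new types in a further sample t times the original size.

    type_counts: dict mapping each observed type (e.g. lemma) -> its frequency
    The estimator is generally considered reliable only for t <= 1.
    """
    freq_of_freq = Counter(type_counts.values())  # n_k: number of types seen k times
    return sum((-1) ** (k + 1) * (t ** k) * n_k for k, n_k in freq_of_freq.items())


# Toy counts: three types seen once, one seen twice, one seen three times.
toy_counts = {"a": 1, "b": 1, "c": 1, "d": 2, "e": 3}
estimate = good_toulmin_new_types(toy_counts, t=1.0)  # 3 - 1 + 1
```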
15
Toolkit of Lexico-semantic Resources
Lexicon of lexico-syntactic structures of multi-word expressions
plWordNet 3.0 (Słowosieć 3.0)
plWordNet 3.0 to WordNet 3.1 mapping
Semantic lexicon of proper names
Mapping to an ontology
And a valency lexicon linked to plWordNet
16
Lexicon of multi-word expressions
Non-trivial morphology of Polish MWEs
– more than 100 nominal structural patterns
Description of the lexico-syntactic structures of MWEs
Multi-word LUs as semantic atoms
– no internal semantic relations
Dynamic lexicon
– a tool for automatic MWE extraction
– 60 000 MWEs described in the lexicon and plWordNet
17
Lexicon of Proper Names
PNs are not a part of the lexicon
A PN is an instance of a type
– characterised by referents
– not by their semantic properties
Linking PNs via a wordnet
– some lexico-syntactic contexts signal an instance-of relation
– PNs are represented in wordnets
PNs as derivational bases for common nouns
Dynamic lexicon with 2.5 million PNs verified manually
18
plWordNet to WordNet 3.1 mapping
plWordNet: built independently to obtain a faithful description
Manual mapping
– bottom-up order
– comparison of the relation structures
– a cascading list of interlingual relations
plWordNet verification as an important side effect
Present state: 72 000 N and Adj synsets mapped
Target: complete plWordNet 3.0 mapped
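A cascading list of interlingual relations can be read as an ordered fallback: the first relation whose test holds is chosen. The sketch below is a toy illustration only. The relation names, their order, and the idea of comparing hypernym paths are all assumptions; in the presentation the mapping is done manually by editors, and the slide does not list the relations.

```python
def first_applicable(cascade, pl_path, en_path):
    """Return the name of the first relation in the cascade whose test holds."""
    for name, test in cascade:
        if test(pl_path, en_path):
            return name
    return None  # no relation applies; the pair is left unmapped


def is_proper_prefix(shorter, longer):
    """True if `shorter` is a strict prefix of `longer`."""
    return len(shorter) < len(longer) and longer[: len(shorter)] == shorter


# Hypothetical cascade: strongest relation first, weaker ones as fallbacks.
CASCADE = [
    ("I-synonymy", lambda pl, en: pl == en),
    ("I-hyponymy", lambda pl, en: is_proper_prefix(en, pl)),   # Polish synset is more specific
    ("I-hypernymy", lambda pl, en: is_proper_prefix(pl, en)),  # Polish synset is more general
]

# Toy hypernym paths from the root down to the synset being mapped.
relation = first_applicable(
    CASCADE,
    ("entity", "feeling", "love"),
    ("entity", "feeling"),
)
```

Ordering the cascade from strongest to weakest relation means a weaker link is only recorded when no stronger one fits.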
19
Wordnet editor: WordnetLoom
20
WordnetLoom: editing the mapping
21
Mapping to ontology
Ontology: unambiguous concepts defined formally
Lexical meanings
– imprecisely delimited
– constrained by usage, stylistic register and sentiment
Mapping to ontology
– precise, formal description for meanings
– association: concepts – their lexical embodiment
SUMO selected
– Princeton WordNet mapping
– Semi-automated mapping of plWordNet
22
Expectations
[Diagram: plWordNet 3.0; Valence lexicon; MWE lexicon; WordNet 3.1 + extension; Proper Names; Ontology: SUMO + intermediate level; link label: describes]
23
Applications
Strong universal basis
– a comprehensive wordnet: >200 000 lemmas resulting in ~285 000 LUs and ~210 000 synsets
– one of the largest Polish dictionaries ever
Modularly constructed toolkit
– a layered architecture of large software systems
– separate but linked layers
– each layer based on a limited set of notions and principles, and exchangeable
The core of the CLARIN-PL language technology infrastructure
24
Thank you!
www.plwordnet.pwr.wroc.pl