TUNING HIERARCHIES IN PRINCETON WORDNET AHTI LOHK | CHRISTIANE D. FELLBAUM | LEO VÕHANDU THE 8TH MEETING OF THE GLOBAL WORDNET CONFERENCE IN BUCHAREST.


1 TUNING HIERARCHIES IN PRINCETON WORDNET AHTI LOHK | CHRISTIANE D. FELLBAUM | LEO VÕHANDU THE 8TH MEETING OF THE GLOBAL WORDNET CONFERENCE IN BUCHAREST JANUARY 27-30, 2016

2 Outline
- A classified overview of the methods for validating wordnet hierarchies
- Graph-based methods
- The advantages of graph-based methods
- Which of them have been applied to Princeton WordNet
- Some new patterns and their examples
- Does it make sense to apply graph-based methods to other wordnets?
- Summary
- Encouragement to wordnet developers to use these methods

3 What kinds of methods have different developers used?

Group of methods       Use of corpus data, lexical resources   Use the contents of a synset   Popularity
Corpus-based methods   +                                       +                              High
Rule-based methods     –                                       +                              Medium
Graph-based methods    –                                       –                              Low (yet!)

4 Corpus-based methods
Different techniques for extracting the relevant information have been applied. Some of the well-known approaches include:
- Lexico-syntactic patterns (Hearst, 1992), (Nadig et al., 2008)
- Similarity measurements (Sagot and Fišer, 2012)
- Mapping and comparing to wordnet (Pedersen et al., others, 2013)
- Applying wordnet in NLP tasks (Saito et al., 2002)
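As an illustration of the lexico-syntactic pattern idea, here is a minimal sketch of a single "X such as Y" pattern. It is not the procedure from any of the cited papers: the regular expression, the function name `extract_hyponym_pairs`, and the one-word-per-noun-phrase simplification are all assumptions made for this example.

```python
import re

# Simplified "NP such as NP, NP and NP" pattern. Real systems match parsed
# noun phrases; here every noun phrase is assumed to be a single word.
SUCH_AS = re.compile(r"(\w+) such as ((?:\w+(?:, | and | or )?)+)")

def extract_hyponym_pairs(sentence):
    """Return (hyponym, hypernym) candidates found in one sentence."""
    pairs = []
    for match in SUCH_AS.finditer(sentence):
        hypernym = match.group(1)
        hyponyms = re.split(r", | and | or ", match.group(2))
        pairs.extend((h, hypernym) for h in hyponyms if h)
    return pairs

pairs = extract_hyponym_pairs(
    "He plays instruments such as violin, cello and flute."
)
```

Candidate pairs extracted this way could then be compared with the hypernym links already recorded in a wordnet; pairs the hierarchy does not support are places for a lexicographer to look.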

5 Rule-based methods
These methods for validating hierarchies rely on lexical relations (word-word), semantic relations (concept-concept) and the rules among them. This includes the rules applied to the construction of WordNet (Fellbaum, 1998), and additional rules, such as the following:
- Metaproperties (rigidity, identity, unity and dependence) described in ontology construction (Guarino and Welty, 2002)
- Top Ontology concepts or “unique beginners” (Atserias et al., 2005; Miller, 1998)
- Specific rules for particular error detections (Gupta, 2002; Nadig et al., 2008). For instance, a rule proposed by (Nadig et al., 2008): “If one term of a synset X is a proper suffix of a term in a synset Y, X is a hypernym of Y”
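The suffix rule quoted above can be sketched as a check over a toy hierarchy. This is only one possible reading: the suffix is tested at a word boundary, "hypernym" is taken to include indirect hypernyms, and the synset identifiers, data layout and function names are hypothetical.

```python
def is_proper_word_suffix(short, long):
    """True if `short` is a proper suffix of `long` at a word boundary."""
    return long != short and long.endswith(" " + short)

def ancestors(node, hypernyms):
    """All direct and indirect hypernyms of `node`."""
    seen = set()
    stack = list(hypernyms.get(node, ()))
    while stack:
        cur = stack.pop()
        if cur not in seen:
            seen.add(cur)
            stack.extend(hypernyms.get(cur, ()))
    return seen

def suffix_rule_candidates(synsets, hypernyms):
    """Pairs (x, y) where the rule expects x above y but the hierarchy disagrees."""
    flagged = []
    for x, x_terms in synsets.items():
        for y, y_terms in synsets.items():
            if x == y:
                continue
            if any(is_proper_word_suffix(s, l) for s in x_terms for l in y_terms):
                if x not in ancestors(y, hypernyms):
                    flagged.append((x, y))
    return flagged

# Toy data (hypothetical): synset id -> terms, synset id -> direct hypernyms.
synsets = {
    "school.n": ["school"],
    "high_school.n": ["high school", "secondary school"],
    "building.n": ["building", "edifice"],
    "school_building.n": ["school building"],
}
hypernyms = {
    "high_school.n": ["school.n"],
    "school_building.n": ["school.n"],
}
flagged = suffix_rule_candidates(synsets, hypernyms)
```

Here "school building" ends in "building", yet `building.n` is not above `school_building.n` in the toy hierarchy, so that pair is reported for review.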

6 The advantages of graph-based methods
- Test patterns are applicable to wordnets in every language
- Test patterns highlight substructures that refer to possible errors and they simplify the work of the expert lexicographer (Lohk et al., 2012a), (Lohk et al., 2012b), (Lohk et al., 2014b)
- Using a test is always quicker than “[doing] a full revision in top-down or alphabetical order” (Čapek, 2012).

7 Graph-based methods
These methods are purely formal and do not take into account the semantics among word forms. Specific substructures of a wordnet’s hierarchies are checked and validated. Target substructures include:
- Cycles (Šmrz, 2004), (Kubis, 2012)
- Shortcuts (Fischer, 1997)
- Rings (Liu et al., 2004; Richens, 2008)
- Dangling uplinks (Koeva et al., 2004; Šmrz, 2004)
- Orphan nodes (null graphs) (Čapek, 2012)
[Figure: Cycle, Shortcut, Ring, Dangling uplink]
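Several of these substructures can be found with plain graph traversal. The sketch below is an illustration under assumed definitions, not the implementation used by the cited authors: a cycle is a synset that can reach itself through hypernym links, a shortcut is a direct hypernym link that duplicates a longer path, a dangling uplink points to a synset missing from the wordnet, and an orphan takes part in no hypernym link. The toy data and function names are hypothetical.

```python
def reachable(start, hypernyms, skip_edge=None):
    """Synsets reachable from `start` by following hypernym links upward."""
    seen = set()
    stack = [start]
    while stack:
        cur = stack.pop()
        for parent in hypernyms.get(cur, ()):
            if (cur, parent) == skip_edge:
                continue  # pretend this one direct link does not exist
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen

def find_cycle_members(hypernyms):
    """Synsets that can reach themselves through hypernym links."""
    return sorted(n for n in hypernyms if n in reachable(n, hypernyms))

def find_shortcuts(hypernyms):
    """Direct links whose target is also reachable by a longer path."""
    return sorted(
        (child, parent)
        for child, parents in hypernyms.items()
        for parent in parents
        if parent in reachable(child, hypernyms, skip_edge=(child, parent))
    )

def find_dangling_uplinks(synsets, hypernyms):
    """Hypernym links that point to a synset not present in the wordnet."""
    return sorted(
        (child, parent)
        for child, parents in hypernyms.items()
        for parent in parents
        if parent not in synsets
    )

def find_orphans(synsets, hypernyms):
    """Synsets that occur in no hypernym link at all."""
    linked = set()
    for child, parents in hypernyms.items():
        if parents:
            linked.add(child)
            linked.update(parents)
    return sorted(synsets - linked)

# Toy hierarchy (hypothetical): synset -> list of direct hypernyms.
synsets = {"entity", "animal", "dog", "puppy", "cat", "a", "b", "lonely"}
hypernyms = {
    "animal": ["entity"],
    "dog": ["animal"],
    "puppy": ["dog", "animal"],  # puppy -> animal duplicates puppy -> dog -> animal
    "cat": ["feline"],           # "feline" is not in `synsets`
    "a": ["b"],
    "b": ["a"],                  # a and b form a cycle
}

cycles = find_cycle_members(hypernyms)
shortcuts = find_shortcuts(hypernyms)
dangling = find_dangling_uplinks(synsets, hypernyms)
orphans = find_orphans(synsets, hypernyms)
```

Each function only reports candidates; whether a reported substructure is a real error remains a decision for the lexicographer.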

8 An artificial hierarchy and specific substructures
1 Short cut
2 Heart-shaped substructure
3 Ring
4 Closed subset
5 Dense component
6 Connected roots
+ 4 substructures
Specific substructures = test patterns
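As one example of turning a substructure into a test pattern, the sketch below looks for rings. The slides do not define the pattern formally, so the reading used here is an assumption: a ring is a synset with two distinct direct hypernyms that share a common ancestor, where neither hypernym lies above the other (that case would be a short cut). The data and function names are hypothetical.

```python
from itertools import combinations

def ancestors(node, hypernyms):
    """All direct and indirect hypernyms of `node`."""
    seen = set()
    stack = list(hypernyms.get(node, ()))
    while stack:
        cur = stack.pop()
        if cur not in seen:
            seen.add(cur)
            stack.extend(hypernyms.get(cur, ()))
    return seen

def find_rings(hypernyms):
    """Return (child, parent_1, parent_2, common_ancestors) tuples."""
    rings = []
    for child, parents in hypernyms.items():
        for p, q in combinations(sorted(set(parents)), 2):
            up_p = ancestors(p, hypernyms)
            up_q = ancestors(q, hypernyms)
            if p in up_q or q in up_p:
                continue  # one parent is above the other: a short cut, not a ring
            common = up_p & up_q
            if common:
                rings.append((child, p, q, sorted(common)))
    return rings

# Toy hierarchy (hypothetical): synset -> list of direct hypernyms.
hypernyms = {
    "artifact": ["entity"],
    "vehicle": ["artifact"],
    "toy": ["artifact"],
    "toy_car": ["vehicle", "toy"],
}
rings = find_rings(hypernyms)
```

The other patterns on this slide would need their own definitions before they could be coded in the same way.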

9 Dense component

10 Heart-shaped substructure

11 „Compound“ pattern

12 Connected roots

13 Wordnets in comparison
Columns: Wordnet; Noun roots; Verb roots; Multiple inheritance cases; Short cuts; Rings; Synset with many roots; Heart-shaped substructure; Dense component; „Compound“ pattern; The largest closed subsets
Princeton WordNet Version 3.0: 123341,453402,99118155115358 1,333×167
Finnish Wordnet Version 2.0: 123341,453402,99118155115394 1,334×167
Cornetto Version 2.0: 222,4383515,309621,226217549 11,032×589
Polish Wordnet Version 2.0: 6374210,94255357,887 205,254 5,037778541 30,794×4,683
Estonian Wordnet Version 70: 11845172170037123x4 13
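Some of the quantities compared on this slide can be computed directly from a hypernym graph. The sketch below assumes simple readings of three of them: a root is a synset with no hypernym, a multiple inheritance case is a synset with more than one direct hypernym, and a synset with many roots is one from which more than one root can be reached. It does not separate nouns from verbs, and the data and names are hypothetical.

```python
def ancestors(node, hypernyms):
    """All direct and indirect hypernyms of `node`."""
    seen = set()
    stack = list(hypernyms.get(node, ()))
    while stack:
        cur = stack.pop()
        if cur not in seen:
            seen.add(cur)
            stack.extend(hypernyms.get(cur, ()))
    return seen

def hierarchy_stats(synsets, hypernyms):
    """Count roots, multiple inheritance cases and synsets with many roots."""
    # Note: under this reading an isolated synset also counts as a root.
    roots = {s for s in synsets if not hypernyms.get(s)}
    multiple = {s for s in synsets if len(set(hypernyms.get(s, ()))) > 1}
    many_roots = {
        s for s in synsets
        if len(ancestors(s, hypernyms) & roots) > 1
    }
    return {
        "roots": len(roots),
        "multiple_inheritance": len(multiple),
        "synsets_with_many_roots": len(many_roots),
    }

# Toy hierarchy (hypothetical): c inherits from both a and b,
# which lead to two different roots.
synsets = {"r1", "r2", "a", "b", "c"}
hypernyms = {"a": ["r1"], "b": ["r2"], "c": ["a", "b"]}
stats = hierarchy_stats(synsets, hypernyms)
```

Running the same counts on several wordnets gives figures that can be compared side by side, which is the purpose of the slide.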

