The Topology of WordNet: some metrics Ann Devitt and Carl Vogel Computational Linguistics Group Trinity College Dublin, Ireland.


1 The Topology of WordNet: some metrics Ann Devitt and Carl Vogel Computational Linguistics Group Trinity College Dublin, Ireland

2 Ann Devitt, TCD Introduction
- Measures
  - WordNet “sub-hierarchies”
  - Multiple inheritance
  - Branching Factor
  - Depth versus Height
  - Cluster coefficients
- Specificity pilot study

3 Terminology
- WordNet as directed acyclic graph
- Node and synset interchangeable

4 Dimensional distribution

5 Overlap between hierarchies
- 2072 synsets: more than 1 top hierarchy
- 35 synsets: more than 2 top hierarchies
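A minimal sketch of how such an overlap count could be computed, assuming the hierarchy is available as a child-to-parents mapping. The toy graph below is hypothetical (only the Entity/Group/weaponry pairing echoes the slides); it is not real WordNet data and not necessarily the authors' procedure.

```python
# Hypothetical toy hypernym graph: node -> list of direct parents.
PARENTS = {
    "entity": [],
    "group": [],
    "object": ["entity"],
    "weaponry": ["object", "group"],
    "club": ["group"],
}


def top_hierarchies(node, parents, cache):
    """Return the set of root nodes (nodes without parents) above `node`."""
    if node in cache:
        return cache[node]
    direct = parents[node]
    if not direct:
        roots = {node}  # a root belongs to its own hierarchy
    else:
        roots = set()
        for parent in direct:
            roots |= top_hierarchies(parent, parents, cache)
    cache[node] = roots
    return roots


cache = {}
# Nodes that sit under more than one top-level hierarchy.
overlap = sorted(n for n in PARENTS if len(top_hierarchies(n, PARENTS, cache)) > 1)
print(overlap)
```

On a graph the size of WordNet an iterative traversal would avoid deep recursion; the recursive form is kept here for brevity.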

6 Some overlap examples
- Abstraction and Event
  - 948 synsets
  - group action
- Entity and Group
  - 250 nodes
  - weaponry

7 Multiple inheritance
- 2.6% of nodes
- Normal distribution throughout depth
- Significantly different in different taxonomies:
  - χ²(8, N = 75180) = 324.27, p ≤ 0.001

8 Specificity examples
- Parents = 1, depth < 3
  - damnation
  - office
- Parents = 1, depth > 8
  - beagle
  - palomino
- Parents > 1, depth < 3
  - person
  - artefact
- Parents > 1, depth > 8
  - sea bass
  - self-condemnation
  - bombardon

9 Branching Factor
- Number of children + 1
- Including leaf nodes
  - Range: 1 – 573
  - Average: 2.023
- Excluding leaf nodes:
  - Average: 5.793
  - 97% less than 20
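The definition above (number of children + 1, so a leaf scores 1) can be sketched as follows. The parent-to-children mapping is a hypothetical toy example, not WordNet data.

```python
# Hypothetical toy hierarchy: node -> list of direct children.
CHILDREN = {
    "animal": ["dog", "horse"],
    "dog": ["beagle"],
    "horse": ["palomino"],
    "beagle": [],
    "palomino": [],
}


def branching_factor(node, children):
    """Branching factor as defined on the slide: number of children + 1."""
    return len(children[node]) + 1


# Including leaf nodes (each leaf contributes a factor of 1).
all_bf = [branching_factor(n, CHILDREN) for n in CHILDREN]
# Excluding leaf nodes (only nodes that have at least one child).
inner_bf = [branching_factor(n, CHILDREN) for n in CHILDREN if CHILDREN[n]]

avg_all = sum(all_bf) / len(all_bf)
avg_inner = sum(inner_bf) / len(inner_bf)
print(min(all_bf), max(all_bf), avg_all, avg_inner)
```

Because leaves always score 1, a hierarchy with many leaves pulls the overall average down, which is why the slide reports the two averages separately.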

10 Branching factor
- Overall low branching factor
- Same distribution in all sub-hierarchies
- Large number of nodes in total
- Greater overall depth in paths
- Not a shallow structure
  - despite 55,000 leaf nodes

11 Depth vs Height
- Depth:
  - Maximum = 18
  - Normal distribution
- Height:
  - Maximum = 5
  - 93.6% 1 or 2 nodes from a leaf node
  - Zipfian distribution
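A rough sketch of the two distance measures on a directed acyclic graph. The slides do not say how multiple paths are resolved, so two assumptions are made here: depth is the longest path up to a root, and height is the shortest path down to a leaf. The graph is a hypothetical toy example.

```python
from functools import lru_cache

# Hypothetical toy hypernym graph: node -> list of direct parents.
PARENTS = {
    "entity": [],
    "animal": ["entity"],
    "dog": ["animal"],
    "pet": ["entity"],
    "beagle": ["dog", "pet"],
}

# Invert the mapping to get node -> list of direct children.
CHILDREN = {node: [] for node in PARENTS}
for child, direct_parents in PARENTS.items():
    for parent in direct_parents:
        CHILDREN[parent].append(child)


@lru_cache(maxsize=None)
def depth(node):
    """Assumption: depth = length of the longest path from `node` up to a root."""
    direct_parents = PARENTS[node]
    if not direct_parents:
        return 0
    return 1 + max(depth(p) for p in direct_parents)


@lru_cache(maxsize=None)
def height(node):
    """Assumption: height = length of the shortest path from `node` down to a leaf."""
    direct_children = CHILDREN[node]
    if not direct_children:
        return 0
    return 1 + min(height(c) for c in direct_children)


for name in PARENTS:
    print(name, depth(name), height(name))
```

Swapping `max` for `min` in `depth` (or the reverse in `height`) gives the other natural reading; which one underlies the reported maxima is not stated in the slides.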

12 Depth vs Height
- Reported distributions
  - the same across the different sub-hierarchies
- Depth is a more informative measure

13 Clustering coefficient
- Measure of graph connectivity
- Ratio:
  - Number of connections between nodes
  - Possible number of connections
- Formula (partly lost in extraction; surviving terms): 2, Σ_i, k_i (k_i – 1)
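Since the slide's formula did not survive intact, the sketch below uses the standard local clustering coefficient of Watts and Strogatz, C_i = 2 E_i / (k_i (k_i - 1)), where k_i is the number of neighbours of node i and E_i the number of links among those neighbours. That this matches the authors' exact formula is an assumption; the edge list is hypothetical and treated as undirected.

```python
from itertools import combinations

# Hypothetical undirected toy graph given as an edge list.
EDGES = [("a", "b"), ("a", "c"), ("b", "c"), ("a", "d")]


def build_neighbours(edges):
    """Build node -> set of neighbours, ignoring edge direction."""
    neighbours = {}
    for u, v in edges:
        neighbours.setdefault(u, set()).add(v)
        neighbours.setdefault(v, set()).add(u)
    return neighbours


def clustering(node, neighbours):
    """Local clustering coefficient: actual links among neighbours / possible links."""
    k = len(neighbours[node])
    if k < 2:
        return 0.0  # fewer than two neighbours: no pair can be linked
    links = sum(
        1 for u, v in combinations(sorted(neighbours[node]), 2) if v in neighbours[u]
    )
    return 2 * links / (k * (k - 1))


NEIGHBOURS = build_neighbours(EDGES)
for name in sorted(NEIGHBOURS):
    print(name, clustering(name, NEIGHBOURS))
```

In a pure tree no two neighbours of a node are ever linked to each other, so every coefficient is 0; only multiple inheritance can create the triangles this measure counts.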

14 Cluster coefficients
- First-order measure
  - Not useful for WordNet
  - Only 62 nodes have a coefficient > 0
  - Does not form clusters readily

15 Cluster coefficients
- Second-order measure
  - Average 0.337
  - Normal distribution
  - May form clusters of wider diameter

16 Pilot Study Aims
1. Do people have a notion of generality/specificity for concepts?
2. Do people agree on what is more/less general/specific?
3. What features of WordNet do these judgments correlate with?

17 Sample ranking task I
- Axis, axis of rotation – (the center around which something rotates)
- River boat – (a boat used on rivers or to ply a river)
- Remains – (any object that is left unused or still extant; “I threw out the remains of my dinner”)

18 Sample ranking task II
- rational motive - (a motive that can be defended by reasoning or logical argument)
- disapproval - (the act of disapproving or condemning)
- harmony, concord, concordance - (agreement of opinions)

19 Do people agree on what is more/less general/specific? YES
- Cochran Q statistic (Cochran 1950)
- H0: that any agreement between respondents is due to chance
- Overall: for 11 respondents
  - Cochran's Q: 165.859
  - 44 degrees of freedom
  - Asymp. Sig.: .000
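For reference, a minimal sketch of Cochran's Q for a matrix of binary outcomes, using the textbook form Q = (k - 1)(k ΣC_j² - N²) / (kN - ΣR_i²), with k columns, column totals C_j, row totals R_i and grand total N; Q is compared against a chi-square distribution with k - 1 degrees of freedom. How the ranking responses were coded as 0/1 outcomes is not stated in the slides, so the matrix below is entirely hypothetical.

```python
def cochrans_q(table):
    """Return (Q, degrees of freedom) for a list of equal-length 0/1 rows.

    Rows are matched blocks; columns are the k conditions being compared.
    """
    k = len(table[0])
    if any(len(row) != k for row in table):
        raise ValueError("all rows must have the same length")
    row_totals = [sum(row) for row in table]
    col_totals = [sum(row[j] for row in table) for j in range(k)]
    grand_total = sum(row_totals)
    denominator = k * grand_total - sum(r * r for r in row_totals)
    if denominator == 0:
        raise ValueError("Q is undefined when every row is all 0s or all 1s")
    numerator = (k - 1) * (k * sum(c * c for c in col_totals) - grand_total ** 2)
    return numerator / denominator, k - 1


# Hypothetical data: 4 matched blocks (rows) by 3 conditions (columns).
DATA = [
    [1, 1, 0],
    [1, 0, 0],
    [1, 1, 0],
    [1, 0, 0],
]
q, df = cochrans_q(DATA)
print(q, df)  # 6.0 2
```

Turning Q into a significance level requires the chi-square survival function, which is left out here to keep the sketch free of third-party dependencies.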

20 What WN features correlate?
- Depth
  - Less deep = more general
- Children
  - Inconclusive
- Sisters
  - Fewer sisters = more general
- Sub-hierarchy
  - Did not seem to affect judgments
  - Did increase the difficulty of the task

21 Conclusion
- WordNet metrics
  - Inheritance: Sub-hierarchy and parentage
  - Branching Factor
  - Distance: depth and height
  - Clustering
- Pilot study
  - Suggests where to go with a larger study

22 Bibliography
- W. G. Cochran: The comparison of percentages in matched samples. Biometrika, 37:256-266, 1950
- David S. Touretzky: The Mathematics of Inheritance Systems, Los Altos, CA: Morgan Kaufmann (1986)
- D. J. Watts and S. H. Strogatz: Collective dynamics of 'small-world' networks, Nature 393, 440-442 (1998)

23 Multiple Inheritance vs Depth

