1 Extended Gloss Overlaps as a Measure of Semantic Relatedness Satanjeev Banerjee Ted Pedersen Carnegie Mellon University University of Minnesota Duluth.

1 Extended Gloss Overlaps as a Measure of Semantic Relatedness Satanjeev Banerjee Ted Pedersen Carnegie Mellon University University of Minnesota Duluth Supported by NSF Grants: #0092784, REC-9979894

2 Semantic Relatedness
► Some pairs of words are closer in meaning than others
  ▪ E.g. car – tire are strongly related; car – tree are not strongly related
► Relatedness between words can consist of
  ▪ Synonymy [e.g. car – automobile]
  ▪ Is-a/has-a relationships [e.g. car – tire]
  ▪ Co-occurrence [e.g. car – insurance]

3 Goal of this Paper
► Create a measure to quantify semantic relatedness
  ▪ Most existing work measures noun-noun only.
    ► Resnik (1995), Lin (1997), Jiang-Conrath (1997), Leacock-Chodorow (1998)
  ▪ We can measure across parts of speech.
  ▪ Based on WordNet definitions and relations.
► Evaluate
  ▪ Using word sense disambiguation.
  ▪ Compare to human relatedness judgments (in paper)

4 Description of WordNet
► Online English lexical database.
► Like dictionaries, contains word senses and their definitions or glosses
  ▪ E.g.: sentence: “the penalty meted out to one adjudged guilty”
► Word senses that mean the same are grouped into synonym sets or synsets
  ▪ E.g.: {sentence, conviction, condemnation}

5 sentence: “the penalty meted out to one adjudged guilty” Synsets are connected to other synsets through “semantic relations” Semantic Relations in WordNet

6 final judgment: “a judgment disposing of the case before the court of law” sentence: “the penalty meted out to one adjudged guilty” a “sentence” is a … Synsets are connected to other synsets through “semantic relations”

7 Semantic Relations in WordNet final judgment: “a judgment disposing of the case before the court of law” sentence: “the penalty meted out to one adjudged guilty” Synsets are connected to other synsets through “semantic relations” [hypernym] a “sentence” is a …

8 Semantic Relations in WordNet final judgment: “a judgment disposing of the case before the court of law” sentence: “the penalty meted out to one adjudged guilty” hard time: “term served in a maximum security prison” death penalty: “punishment by death via execution” … is a “sentence” Synsets are connected to other synsets through “semantic relations” a “sentence” is a … [hypernym]

9 Semantic Relations in WordNet final judgment: “a judgment disposing of the case before the court of law” sentence: “the penalty meted out to one adjudged guilty” hard time: “term served in a maximum security prison” death penalty: “punishment by death via execution” … is a “sentence” Synsets are connected to other synsets through “semantic relations” [hyponym] a “sentence” is a … [hypernym]

10 Gloss Overlaps ≈ Relatedness
► Lesk’s (1986) idea: Related word senses are (often) defined using the same words. E.g.:
  ▪ bank(1): “a financial institution”
  ▪ bank(2): “sloping land beside a body of water”
  ▪ lake: “a body of water surrounded by land”


12 Gloss Overlaps ≈ Relatedness
► Lesk’s (1986) idea: Related word senses are (often) defined using the same words. E.g.:
  ▪ bank(1): “a financial institution”
  ▪ bank(2): “sloping land beside a body of water”
  ▪ lake: “a body of water surrounded by land”
► Gloss overlaps = # content words common to two glosses ≈ relatedness
  ▪ Thus, relatedness (bank(2), lake) = 3
  ▪ And, relatedness (bank(1), lake) = 0
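The counts on this slide can be reproduced with a few lines of Python. This is a minimal sketch, not the authors' implementation: the stop-word list and the names `content_words` and `gloss_overlap` are assumptions made for illustration.

```python
# Toy stop list (an assumption; the slides do not say which words count as non-content).
STOP_WORDS = {"a", "an", "the", "of", "by", "in", "on", "to"}

def content_words(gloss):
    """Lowercase the gloss, split on whitespace, and drop stop words."""
    return {word for word in gloss.lower().split() if word not in STOP_WORDS}

def gloss_overlap(gloss_a, gloss_b):
    """Number of distinct content words shared by the two glosses."""
    return len(content_words(gloss_a) & content_words(gloss_b))

bank_1 = "a financial institution"
bank_2 = "sloping land beside a body of water"
lake = "a body of water surrounded by land"

print(gloss_overlap(bank_2, lake))  # 3 (body, water, land)
print(gloss_overlap(bank_1, lake))  # 0
```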

13 Limitations of (Lesk’s) Gloss Overlaps
► Most glosses are very short.
  ▪ So not enough words to find overlaps with.
► Solution: Extended gloss overlaps
  ▪ Add glosses of synsets connected to the input synsets.

14 sentence: “the penalty meted out to one adjudged guilty” bench: “persons who hear cases in a court of law” # overlapped words = 0 Extending a Gloss

15 sentence: “the penalty meted out to one adjudged guilty” final judgment: “a judgment disposing of the case before the court of law” bench: “persons who hear cases in a court of law” hypernym # overlapped words = 0 Extending a Gloss

16 sentence: “the penalty meted out to one adjudged guilty” final judgment: “a judgment disposing of the case before the court of law” bench: “persons who hear cases in a court of law” hypernym # overlapped words = 2 Extending a Gloss

17 Creating the Extended Gloss Overlap Measure ► How to measure overlaps? ► Which relations to use for gloss extension?

18 How to Score Overlaps?
► Lesk simply summed up overlapped words.
► But matches involving phrases – phrasal matches – are rarer, and more informative
  ▪ E.g. “court of law”
► Aim: Score of n words in a phrase > sum of scores of n words in shorter phrases
► Solution: Give a phrase of n words a score of n²
  ▪ “court of law” gets score of 9.
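One way to realize this scoring is to repeatedly take the longest phrase shared by the two glosses, add the square of its length, and mask it out before searching again. The sketch below follows that idea under stated assumptions: the stop-word list, the rule that a phrase may not begin or end with a stop word, and the function names are illustrative choices, not details given on the slides.

```python
# Toy stop list (an assumption for illustration).
STOP_WORDS = {"a", "an", "the", "of", "in", "on", "to", "who", "before"}

def longest_common_phrase(a, b):
    """Return (length, start_a, start_b) of the longest shared run of tokens
    that neither begins nor ends with a stop word; length is 0 if none exists."""
    best = (0, 0, 0)
    for i in range(len(a)):
        for j in range(len(b)):
            k = 0
            # Extend the match; None marks tokens consumed by an earlier match.
            while (i + k < len(a) and j + k < len(b)
                   and a[i + k] is not None and a[i + k] == b[j + k]):
                k += 1
            # Trim trailing stop words from the matched run.
            while k > 0 and a[i + k - 1] in STOP_WORDS:
                k -= 1
            if k > best[0] and a[i] not in STOP_WORDS:
                best = (k, i, j)
    return best

def phrasal_overlap_score(gloss_a, gloss_b):
    """Sum of n**2 over the shared phrases, found longest first."""
    a, b = gloss_a.lower().split(), gloss_b.lower().split()
    score = 0
    while True:
        n, i, j = longest_common_phrase(a, b)
        if n == 0:
            return score
        score += n * n
        a[i:i + n] = [None] * n  # mask so the phrase is not counted twice
        b[j:j + n] = [None] * n

final_judgment = "a judgment disposing of the case before the court of law"
bench = "persons who hear cases in a court of law"
print(phrasal_overlap_score(final_judgment, bench))          # 9
print(phrasal_overlap_score("court of law", "law of court"))  # 2
```

The second call shows why squaring matters: the same words out of order match only as two single words (1 + 1), while the intact three-word phrase scores 9.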

19 Which Relations to Use?
► Hypernyms [ “car” → “vehicle” ]
► Hyponyms [ “car” → “convertible” ]
► Meronyms [ “car” → “accelerator” ]
► Holonym [ “car” → “train” ]
► Also-see relation [ “enter” → “move in” ]
► Attribute [ “measure” → “standard” ]
► Pertainym [ “centennial” → “century” ]

20 Extended Gloss Overlap Measure
► Input two synsets A and B
► Find phrasal gloss overlaps between A and B
► Next, find phrasal gloss overlaps between every synset connected to A, and every synset connected to B
► Compute phrasal scores for all such overlaps
► Add phrasal scores to get relatedness of A and B
► A and B can be from different parts of speech.
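The steps above can be sketched over a tiny hand-built lexicon. Everything here is an assumption for illustration: `TOY_LEXICON` stands in for WordNet and holds only glosses quoted in these slides, the stop-word list is ad hoc, and comparing every gloss on A's side with every gloss on B's side is one reading of the procedure rather than a confirmed detail of the authors' implementation.

```python
from itertools import product

STOP_WORDS = {"a", "an", "the", "of", "in", "on", "to", "who", "out", "before"}

def phrasal_score(gloss_a, gloss_b):
    """Sum n**2 over shared phrases, longest first, masking each once matched."""
    a, b = gloss_a.lower().split(), gloss_b.lower().split()
    score = 0
    while True:
        best_len, best_i, best_j = 0, 0, 0
        for i, j in product(range(len(a)), range(len(b))):
            if a[i] is None or a[i] in STOP_WORDS:
                continue  # a phrase may not start with a stop word or masked token
            k = 0
            while (i + k < len(a) and j + k < len(b)
                   and a[i + k] is not None and a[i + k] == b[j + k]):
                k += 1
            while k > 0 and a[i + k - 1] in STOP_WORDS:
                k -= 1  # a phrase may not end with a stop word
            if k > best_len:
                best_len, best_i, best_j = k, i, j
        if best_len == 0:
            return score
        score += best_len ** 2
        a[best_i:best_i + best_len] = [None] * best_len
        b[best_j:best_j + best_len] = [None] * best_len

# Hypothetical stand-in for WordNet: gloss plus directly connected synsets.
TOY_LEXICON = {
    "sentence": {"gloss": "the penalty meted out to one adjudged guilty",
                 "related": ["final judgment"]},
    "final judgment": {"gloss": "a judgment disposing of the case before the court of law",
                       "related": []},
    "bench": {"gloss": "persons who hear cases in a court of law",
              "related": []},
}

def extended_glosses(synset, lexicon):
    """The synset's own gloss followed by the glosses of its connected synsets."""
    entry = lexicon[synset]
    return [entry["gloss"]] + [lexicon[name]["gloss"] for name in entry["related"]]

def extended_gloss_overlap(synset_a, synset_b, lexicon):
    """Add up phrasal scores over all gloss pairs from the two extended glosses."""
    pairs = product(extended_glosses(synset_a, lexicon),
                    extended_glosses(synset_b, lexicon))
    return sum(phrasal_score(ga, gb) for ga, gb in pairs)

print(extended_gloss_overlap("sentence", "bench", TOY_LEXICON))  # 9
```

The direct glosses of "sentence" and "bench" share nothing, so the whole score comes from the hypernym gloss of "sentence" matching “court of law”.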

21 Evaluation: On WSD
► Test semantic relatedness measures on Word Sense Disambiguation (WSD) task.
► WSD = determine the intended sense of a multi-sense word in a sentence
  ▪ E.g.: I sat on the bank of the lake.
► Our WSD algorithm: Pick that sense of the target word that is most strongly related to its neighboring words. (based on Lesk ’86)

22 the bench pronounced the sentence Word sense disambiguation using a relatedness measure

23 the bench pronounced the sentence bench: “persons who hear cases in a court of law” bench: “a long seat for more than one person”

24 the bench pronounced the sentence bench: “persons who hear cases in a court of law” bench: “a long seat for more than one person” pronounce: “pronounce judgment on” pronounce: “speak or utter in a certain way”

25 the bench pronounced the sentence sentence: “the penalty meted out to one adjudged guilty” sentence: “a string of words that satisfies grammar rules” bench: “persons who hear cases in a court of law” bench: “a long seat for more than one person” pronounce: “pronounce judgment on” pronounce: “speak or utter in a certain way”
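The sense-selection rule can be sketched on this example sentence. The sketch rests on several assumptions: the sense labels (such as `bench#court`) are hypothetical, each sense is represented by its own gloss plus, for the legal sense of "sentence", the hypernym gloss quoted earlier, relatedness is a simple shared-content-word count rather than the full phrasal measure, and a candidate sense is scored by summing its relatedness to every sense of every neighboring word.

```python
# Toy stop list (an assumption for illustration).
STOP_WORDS = {"a", "an", "the", "of", "in", "on", "to", "or", "for",
              "than", "who", "out", "one", "that", "more"}

def content_words(gloss):
    return {word for word in gloss.lower().split() if word not in STOP_WORDS}

def relatedness(glosses_a, glosses_b):
    """Toy stand-in measure: shared content words over all gloss pairs."""
    return sum(len(content_words(ga) & content_words(gb))
               for ga in glosses_a for gb in glosses_b)

# Each sense maps to a list of glosses (own gloss, then any related glosses).
SENSES = {
    "bench": {
        "bench#court": ["persons who hear cases in a court of law"],
        "bench#seat": ["a long seat for more than one person"],
    },
    "pronounce": {
        "pronounce#judge": ["pronounce judgment on"],
        "pronounce#speak": ["speak or utter in a certain way"],
    },
    "sentence": {
        "sentence#penalty": ["the penalty meted out to one adjudged guilty",
                             "a judgment disposing of the case before the court of law"],
        "sentence#words": ["a string of words that satisfies grammar rules"],
    },
}

def disambiguate(target, neighbors, senses, measure):
    """Return the label of the target sense most related to its neighbors."""
    def score(label):
        return sum(measure(senses[target][label], other)
                   for word in neighbors
                   for other in senses[word].values())
    return max(senses[target], key=score)

print(disambiguate("bench", ["pronounce", "sentence"], SENSES, relatedness))
print(disambiguate("sentence", ["bench", "pronounce"], SENSES, relatedness))
```

With these toy glosses the legal senses win for both words, because “court”, “law”, and “judgment” recur across the legal glosses while the other senses share no content words.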

35 Evaluation Data
► Data from SENSEVAL-2 WSD exercise.
► 4,328 passages, each 2-3 sentences long and containing 1 multi-sense target word.
► Each target word labeled by humans with its most appropriate WordNet sense.
► WSD algorithm’s output senses compared against these human labels.
► Precision, recall, and f-measure reported.

36 Evaluation Results
Algorithm        Precision  Recall  F-measure
Sval-1st         0.402      0.401   0.401
Extended Gloss   0.351      0.342   0.346
Sval-2nd         0.293      0.293   0.293
Sval-3rd         0.247      0.244   0.245
Lesk             0.183      0.183   0.183
Random           0.141      0.141   0.141
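For reference, the three metrics can be computed as below. This sketch assumes the usual WSD definitions (precision over instances the system attempted, recall over all instances, F-measure as their harmonic mean); the slides do not spell these out, and the function names are illustrative.

```python
def precision(correct, attempted):
    """Fraction of attempted instances that were labeled correctly."""
    return correct / attempted if attempted else 0.0

def recall(correct, total):
    """Fraction of all instances that were labeled correctly."""
    return correct / total if total else 0.0

def f_measure(p, r):
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r) if (p + r) else 0.0

# Consistency check against the reported Extended Gloss precision and recall.
print(round(f_measure(0.351, 0.342), 3))  # 0.346
```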

37 Which WN Relations Help?
► Evaluation with a single relation at a time
  ▪ E.g., comparing only hypernyms, only hyponyms, etc.
► Result: No single comparison is a big source of information.
  ▪ No pair exceeded f-measure of 0.136, as compared to overall f-measure of 0.346

38 Which WN Relations Help?
► Most helpful were:
  ▪ Hyponym relation
    ► kinds of “car” → “compact”, “SUV”, “coupe”, etc.
  ▪ Meronym relation
    ► parts of “car” → “accelerator”, “wheel”, “hood”, etc.
► These relations are usually one-many.
  ▪ Thus they give access to many glosses.
► Implies: more glosses → more useful.

39 Conclusions
► We presented a new measure of semantic relatedness
  ▪ Can operate across parts of speech.
► We evaluated on the task of WSD.
  ▪ Performed much better than the Lesk baseline
  ▪ Performance comparable to other systems.
► Future work:
  ▪ Augment using corpus statistics.
  ▪ Evaluate on different task.

40 Resources
► WordNet::Similarity (relatedness measures) (http://search.cpan.org/dist/WordNet-Similarity)
  ▪ Extended gloss overlaps
  ▪ Resnik, Lin, Jiang-Conrath
  ▪ Leacock-Chodorow, Hirst-St. Onge
  ▪ Edge Counting, Random
► SenseRelate (WSD using relatedness) (http://www.d.umn.edu/~tpederse/senserelate.html)
