Extended Gloss Overlaps as a Measure of Semantic Relatedness
Satanjeev Banerjee (Carnegie Mellon University), Ted Pedersen (University of Minnesota Duluth)

1 Extended Gloss Overlaps as a Measure of Semantic Relatedness
Satanjeev Banerjee, Carnegie Mellon University
Ted Pedersen, University of Minnesota Duluth
Supported by NSF Grants: #0092784, REC-9979894

2 Semantic Relatedness
► Some pairs of words are closer in meaning than others
  ▪ E.g. car – tire are strongly related; car – tree are not strongly related
► Relatedness between words can consist of
  ▪ Synonymy [e.g. car – automobile]
  ▪ Is-a/has-a relationships [e.g. car – tire]
  ▪ Co-occurrence [e.g. car – insurance]

3 Goal of this Paper
► Create a measure to quantify semantic relatedness
  ▪ Most existing work measures noun-noun only.
    ► Resnik (1995), Lin (1997), Jiang-Conrath (1997), Leacock-Chodorow (1998)
  ▪ We can measure across parts of speech.
  ▪ Based on WordNet definitions and relations.
► Evaluate
  ▪ Using word sense disambiguation.
  ▪ Compare to human relatedness judgments (in paper)

4 Description of WordNet
► Online English lexical database.
► Like dictionaries, contains word senses and their definitions or glosses
  ▪ E.g.: sentence: “the penalty meted out to one adjudged guilty”
► Word senses that mean the same are grouped into synonym sets or synsets
  ▪ E.g.: {sentence, conviction, condemnation}

5 Semantic Relations in WordNet
Synsets are connected to other synsets through “semantic relations”
sentence: “the penalty meted out to one adjudged guilty”

6 Semantic Relations in WordNet
Synsets are connected to other synsets through “semantic relations”
final judgment: “a judgment disposing of the case before the court of law”
a “sentence” is a …
sentence: “the penalty meted out to one adjudged guilty”

7 Semantic Relations in WordNet
Synsets are connected to other synsets through “semantic relations”
final judgment: “a judgment disposing of the case before the court of law”
[hypernym] a “sentence” is a …
sentence: “the penalty meted out to one adjudged guilty”

8 Semantic Relations in WordNet
Synsets are connected to other synsets through “semantic relations”
final judgment: “a judgment disposing of the case before the court of law”
[hypernym] a “sentence” is a …
sentence: “the penalty meted out to one adjudged guilty”
… is a “sentence”
hard time: “term served in a maximum security prison”
death penalty: “punishment by death via execution”

9 Semantic Relations in WordNet
Synsets are connected to other synsets through “semantic relations”
final judgment: “a judgment disposing of the case before the court of law”
[hypernym] a “sentence” is a …
sentence: “the penalty meted out to one adjudged guilty”
[hyponym] … is a “sentence”
hard time: “term served in a maximum security prison”
death penalty: “punishment by death via execution”

12 Gloss Overlaps ≈ Relatedness
► Lesk’s (1986) idea: Related word senses are (often) defined using the same words. E.g:
  ▪ bank(1): “a financial institution”
  ▪ bank(2): “sloping land beside a body of water”
  ▪ lake: “a body of water surrounded by land”
► Gloss overlaps = # content words common to two glosses ≈ relatedness
  ▪ Thus, relatedness (bank(2), lake) = 3
  ▪ And, relatedness (bank(1), lake) = 0
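
The content-word count on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the stop list `STOP_WORDS` and the helper names are assumptions introduced here, and tokenization is plain whitespace splitting.

```python
# Assumed stop list (not from the slides): words too common to count as content.
STOP_WORDS = {"a", "an", "the", "of", "by", "to", "in", "on", "for"}

def content_words(gloss):
    """Return the set of lowercase words in a gloss that are not stop words."""
    return {w for w in gloss.lower().split() if w not in STOP_WORDS}

def gloss_overlap(gloss_a, gloss_b):
    """Count the content words that the two glosses share."""
    return len(content_words(gloss_a) & content_words(gloss_b))

bank_1 = "a financial institution"
bank_2 = "sloping land beside a body of water"
lake = "a body of water surrounded by land"

print(gloss_overlap(bank_2, lake))  # 3 (land, body, water)
print(gloss_overlap(bank_1, lake))  # 0
```

With a different stop list the counts could change, so the numbers depend on what is treated as a content word.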

13 Limitations of (Lesk’s) Gloss Overlaps
► Most glosses are very short.
  ▪ So not enough words to find overlaps with.
► Solution: Extended gloss overlaps
  ▪ Add glosses of synsets connected to the input synsets.

14 Extending a Gloss
sentence: “the penalty meted out to one adjudged guilty”
bench: “persons who hear cases in a court of law”
# overlapped words = 0

15 Extending a Gloss
sentence: “the penalty meted out to one adjudged guilty”
[hypernym] final judgment: “a judgment disposing of the case before the court of law”
bench: “persons who hear cases in a court of law”
# overlapped words = 0

16 Extending a Gloss
sentence: “the penalty meted out to one adjudged guilty”
[hypernym] final judgment: “a judgment disposing of the case before the court of law”
bench: “persons who hear cases in a court of law”
# overlapped words = 2

17 Creating the Extended Gloss Overlap Measure
► How to measure overlaps?
► Which relations to use for gloss extension?

18 How to Score Overlaps?
► Lesk simply summed up overlapped words.
► But matches involving phrases – phrasal matches – are rarer, and more informative
  ▪ E.g. “court of law”
► Aim: Score of n words in a phrase > sum of scores of n words in shorter phrases
► Solution: Give a phrase of n words a score of n²
  ▪ “court of law” gets score of 9.
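
A rough sketch of such a phrasal scorer follows. It repeatedly takes the longest run of words shared by two glosses, adds the square of its length, and blanks the run out so it is not counted twice. The function names, the stop list, and the choice to give zero credit to overlaps made up only of stop words are assumptions made for this illustration, not details taken from the slides.

```python
# Assumed stop list (not from the slides); overlaps made only of these score 0.
STOP_WORDS = {"a", "an", "the", "of", "in", "on", "to", "by", "for"}

def longest_common_phrase(a, b):
    """Return (length, start_in_a, start_in_b) of the longest shared word run."""
    best = (0, 0, 0)
    prev = [0] * (len(b) + 1)  # run lengths for the previous word of a
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            # None marks words already consumed by an earlier overlap.
            if a[i - 1] is not None and a[i - 1] == b[j - 1]:
                cur[j] = prev[j - 1] + 1
                if cur[j] > best[0]:
                    best = (cur[j], i - cur[j], j - cur[j])
        prev = cur
    return best

def phrasal_overlap_score(gloss_a, gloss_b):
    """Sum n**2 over shared phrases, always taking the longest phrase first."""
    a, b = gloss_a.lower().split(), gloss_b.lower().split()
    score = 0
    while True:
        n, ia, ib = longest_common_phrase(a, b)
        if n == 0:
            return score
        phrase = a[ia:ia + n]
        if not all(w in STOP_WORDS for w in phrase):
            score += n * n
        # Blank out the matched words so they cannot be reused.
        a[ia:ia + n] = [None] * n
        b[ib:ib + n] = [None] * n

final_judgment = "a judgment disposing of the case before the court of law"
bench = "persons who hear cases in a court of law"
print(phrasal_overlap_score(final_judgment, bench))  # 9, from "court of law"
```

The same three words matched one at a time would score only 3, which is the gap between phrasal and single-word matches that the squared score is meant to create.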

19 Which Relations to Use?
► Hypernyms [ “car” → “vehicle” ]
► Hyponyms [ “car” → “convertible” ]
► Meronyms [ “car” → “accelerator” ]
► Holonym [ “car” → “train” ]
► Also-see relation [ “enter” → “move in” ]
► Attribute [ “measure” → “standard” ]
► Pertainym [ “centennial” → “century” ]

20 Extended Gloss Overlap Measure
► Input two synsets A and B
► Find phrasal gloss overlaps between A and B
► Next, find phrasal gloss overlaps between every synset connected to A, and every synset connected to B
► Compute phrasal scores for all such overlaps
► Add phrasal scores to get relatedness of A and B
► A and B can be from different parts of speech.
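
The steps above can be sketched on a toy fragment of WordNet built only from the glosses quoted in these slides. Everything structural here is an assumption for illustration: the `GLOSS` and `RELATED` dictionaries, the function names, the stop list, and the choice to compare each synset's own gloss alongside the glosses of its connected synsets. It is not the WordNet::Similarity implementation.

```python
# Assumed stop list; overlaps consisting only of these words get no credit.
STOP_WORDS = {"a", "an", "the", "of", "in", "on", "to", "by", "for"}

def run_length(a, b, i, j):
    """Length of the shared word run starting at a[i] and b[j]."""
    n = 0
    while (i + n < len(a) and j + n < len(b)
           and a[i + n] is not None and a[i + n] == b[j + n]):
        n += 1
    return n

def phrasal_score(gloss_a, gloss_b):
    """Sum n**2 over shared phrases, longest first (brute-force search)."""
    a, b = gloss_a.lower().split(), gloss_b.lower().split()
    score = 0
    while True:
        n, i, j = max(
            ((run_length(a, b, i, j), i, j)
             for i in range(len(a)) for j in range(len(b))),
            default=(0, 0, 0),
        )
        if n == 0:
            return score
        if not all(w in STOP_WORDS for w in a[i:i + n]):
            score += n * n
        a[i:i + n] = [None] * n  # consumed words cannot match again
        b[j:j + n] = [None] * n

# Hypothetical toy data: glosses and links copied from the slides.
GLOSS = {
    "sentence": "the penalty meted out to one adjudged guilty",
    "final judgment": "a judgment disposing of the case before the court of law",
    "hard time": "term served in a maximum security prison",
    "death penalty": "punishment by death via execution",
    "bench": "persons who hear cases in a court of law",
}
RELATED = {
    "sentence": ["final judgment", "hard time", "death penalty"],
    "bench": [],
}

def extended_gloss_overlap(a, b):
    """Add phrasal scores over all gloss pairs from A's and B's neighborhoods."""
    glosses_a = [GLOSS[a]] + [GLOSS[s] for s in RELATED.get(a, [])]
    glosses_b = [GLOSS[b]] + [GLOSS[s] for s in RELATED.get(b, [])]
    return sum(phrasal_score(ga, gb) for ga in glosses_a for gb in glosses_b)

print(phrasal_score(GLOSS["sentence"], GLOSS["bench"]))  # 0: direct glosses share nothing
print(extended_gloss_overlap("sentence", "bench"))       # 9: via the hypernym's "court of law"
```

On this toy data the only scoring overlap comes from the hypernym gloss, which mirrors the "Extending a Gloss" example.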

21 Evaluation: On WSD
► Test semantic relatedness measures on Word Sense Disambiguation (WSD) task.
► WSD = determine the intended sense of a multi-sense word in a sentence
  ▪ E.g.: I sat on the bank of the lake.
► Our WSD algorithm: Pick that sense of the target word that is most strongly related to its neighboring words. (based on Lesk ’86)
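
The selection rule can be sketched as follows, using the bank/lake glosses from the earlier slide. Plain content-word overlap stands in for the relatedness measure here, and summing over every sense of every neighboring word is an assumption of this sketch. The function names and the stop list are likewise illustrative.

```python
# Assumed stop list (not from the slides).
STOP_WORDS = {"a", "an", "the", "of", "by", "to", "in", "on", "for"}

def overlap(gloss_a, gloss_b):
    """Stand-in relatedness: number of content words shared by two glosses."""
    def words(gloss):
        return {w for w in gloss.lower().split() if w not in STOP_WORDS}
    return len(words(gloss_a) & words(gloss_b))

def disambiguate(target_senses, neighbors, relatedness=overlap):
    """Pick the target sense most related to the senses of the neighbors.

    target_senses: dict mapping a sense label to its gloss
    neighbors: list of such dicts, one per neighboring word
    """
    def total(gloss):
        return sum(relatedness(gloss, g)
                   for senses in neighbors for g in senses.values())
    return max(target_senses, key=lambda label: total(target_senses[label]))

bank = {
    "bank(1)": "a financial institution",
    "bank(2)": "sloping land beside a body of water",
}
lake = {"lake": "a body of water surrounded by land"}

print(disambiguate(bank, [lake]))  # bank(2)
```

Any other relatedness function with the same two-gloss signature can be passed as `relatedness`.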

22 Word sense disambiguation using a relatedness measure
the bench pronounced the sentence

23 the bench pronounced the sentence
bench: “persons who hear cases in a court of law”
bench: “a long seat for more than one person”

24 the bench pronounced the sentence
bench: “persons who hear cases in a court of law”
bench: “a long seat for more than one person”
pronounce: “pronounce judgment on”
pronounce: “speak or utter in a certain way”

25 the bench pronounced the sentence
bench: “persons who hear cases in a court of law”
bench: “a long seat for more than one person”
pronounce: “pronounce judgment on”
pronounce: “speak or utter in a certain way”
sentence: “the penalty meted out to one adjudged guilty”
sentence: “a string of words that satisfies grammar rules”

35 Evaluation Data
► Data from SENSEVAL-2 WSD exercise.
► 4,328 passages, each 2-3 sentences long and containing 1 multi-sense target word.
► Each target word labeled by humans with its most appropriate WordNet sense.
► WSD algorithm’s output senses compared against these human labels.
► Precision, recall, and f-measure reported.

36 Evaluation Results
Algorithm        Precision  Recall  F-measure
Sval-1st         0.402      0.401   0.401
Extended Gloss   0.351      0.342   0.346
Sval-2nd         0.293      0.293   0.293
Sval-3rd         0.247      0.244   0.245
Lesk             0.183      0.183   0.183
Random           0.141      0.141   0.141
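
As a quick check on how the three columns relate, the f-measure can be recomputed from precision and recall. The sketch assumes the usual balanced definition, the harmonic mean of the two, which the slides do not spell out. The function name is illustrative.

```python
def f_measure(precision, recall):
    """Balanced F-measure: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Precision and recall reported for the Extended Gloss row.
print(round(f_measure(0.351, 0.342), 3))  # 0.346
```

Under this definition the recomputed value matches the reported f-measure for that row.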

37 Which WN Relations Help?
► Evaluation with a single relation at a time
  ▪ E.g., comparing only hypernyms, only hyponyms, etc.
► Result: No single comparison is a big source of information.
  ▪ No pair exceeded f-measure of 0.136, as compared to overall f-measure of 0.346

38 Which WN Relations Help?
► Most helpful were:
  ▪ Hyponym relation
    ► kinds of “car” → “compact”, “SUV”, “coupe”, etc.
  ▪ Meronym relation
    ► parts of “car” → “accelerator”, “wheel”, “hood”, etc.
► These relations are usually one-many.
  ▪ Thus they give access to many glosses.
► Implies: more glosses → more useful.

39 Conclusions
► We presented a new measure of semantic relatedness
  ▪ Can operate across parts of speech.
► We evaluated on the task of WSD.
  ▪ Performed much better than the Lesk baseline
  ▪ Performance comparable to other systems.
► Future work:
  ▪ Augment using corpus statistics.
  ▪ Evaluate on different task.

40 Resources
► WordNet::Similarity (relatedness measures) (http://search.cpan.org/dist/WordNet-Similarity)
  ▪ Extended gloss overlaps
  ▪ Resnik, Lin, Jiang-Conrath
  ▪ Leacock-Chodorow, Hirst-St. Onge
  ▪ Edge Counting, Random
► SenseRelate (WSD using relatedness) (http://www.d.umn.edu/~tpederse/senserelate.html)

