Paper Review by Utsav Sinha August, 2015 Part of assignment in CS 671: Natural Language Processing, IIT Kanpur.


1 Paper Review by Utsav Sinha August, 2015 Part of assignment in CS 671: Natural Language Processing, IIT Kanpur

2 Punning
Deliberate use of lexical ambiguity to create humour.
Three types of puns:
- Homographic: same written word. Example: "An elephant's opinion carries a lot of weight"
- Homophonic: same spoken word. Example: "Atheism is a non-prophet institution"
- Imperfect: differs in both spelling and pronunciation. Example: The sign at the nudist camp read, “Clothed until April”

3 Problem Statement
Pun disambiguation: identifying the multiple senses of a term known a priori to be a pun.
Focus on homographic mono-lexeme puns in this paper.
Dataset created from user-submitted puns and private collections by professional humorists.
Corpus pruned by trained human annotators to:
- One pun per instance
- One content word per pun
- Two meanings per pun
- Weak homography

4 Word Sense Disambiguation
Goal: determine the intended sense of a polysemous term in a given communicative act.
WSD systems require:
- Running context
- Sense inventory
Approaches to WSD:
- Knowledge-based, using lexical semantic resources
- Supervised machine learning

5 Lesk Algorithm
Premise: words in a neighbourhood share a common topic.
Simplified Lesk (SL):
- Compare the dictionary definitions of the ambiguous target word with the terms in its neighbouring context
- The sense with the maximum overlap is taken as the intended one
Limitations:
- Dependent on the exact wording of the definitions
- Dictionary glosses are very short (coarse-grained)
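The overlap scoring behind Simplified Lesk can be sketched in a few lines. This is a minimal illustration, not the implementation from the reviewed paper: the glosses in `BANK_GLOSSES`, the sense labels, the stopword list and the function names are all hypothetical, and a real system would draw its glosses from a sense inventory.

```python
import re

# Hypothetical stopword list, kept tiny for the example.
STOPWORDS = {"a", "an", "the", "of", "in", "on", "to", "and", "or", "that", "by"}

def tokenize(text):
    """Lowercase the text and return its set of non-stopword tokens."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}

def simplified_lesk(context, glosses):
    """Rank senses by how many words their gloss shares with the context."""
    context_words = tokenize(context)
    scores = {sense: len(tokenize(gloss) & context_words)
              for sense, gloss in glosses.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Toy glosses invented for illustration (not copied from any dictionary).
BANK_GLOSSES = {
    "bank.financial": "an institution that accepts deposits of money and makes loans",
    "bank.river": "sloping land beside a river or lake",
}

ranking = simplified_lesk(
    "She sat on the bank of the river and watched the water", BANK_GLOSSES
)
print(ranking[0][0])  # sense with the largest overlap
```

The example also shows the first limitation listed on the slide: the only evidence for the river sense is the single shared word "river", so a differently worded gloss would score zero.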

6 Lesk Algorithm
Solution: use a thesaurus that includes synonyms, homonyms and derivations.
- A sense inventory such as WordNet is used
New problem: WordNet is too fine-grained.
- Use clustering or coarsening techniques

7 Improvement in Algorithm
Lemmatize the word and tag its Part of Speech (POS) to narrow the list of candidate senses.
Simplified Extended Lesk (SEL):
- Modifies SL by concatenating each sense’s definition with those of neighbouring senses from WordNet
Simplified Lexically Expanded Lesk (SLEL):
- Extends SL by using 100 entries from a large distributional thesaurus to expand each word’s sense
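The gloss concatenation used by SEL can be illustrated on the homographic pun from slide 2. The sketch below is an assumption-laden toy: `INVENTORY`, its glosses, its "neighbours" links and the stopword list are invented for the example and do not come from WordNet.

```python
import re

# Hypothetical stopword list; "s" removes the residue of "elephant's".
STOPWORDS = {"a", "an", "the", "of", "to", "by", "or", "s"}

def tokens(text):
    """Lowercase the text and return its set of non-stopword tokens."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}

# Toy sense inventory: each sense has a gloss and a list of neighbouring senses.
INVENTORY = {
    "weight.mass": {"gloss": "the vertical force exerted by a mass",
                    "neighbours": ["heaviness"]},
    "weight.importance": {"gloss": "the relative importance granted to something",
                          "neighbours": ["influence"]},
    "heaviness": {"gloss": "the property of being heavy like an elephant or a load",
                  "neighbours": []},
    "influence": {"gloss": "the power to affect the opinion or behaviour of others",
                  "neighbours": []},
}

def extended_gloss(sense):
    """Concatenate a sense's gloss with the glosses of its neighbours."""
    entry = INVENTORY[sense]
    parts = [entry["gloss"]] + [INVENTORY[n]["gloss"] for n in entry["neighbours"]]
    return " ".join(parts)

def overlap(context, sense, extended):
    """Count context words shared with the plain or the extended gloss."""
    gloss = extended_gloss(sense) if extended else INVENTORY[sense]["gloss"]
    return len(tokens(gloss) & tokens(context))

context = "An elephant's opinion carries a lot of weight"
for sense in ("weight.mass", "weight.importance"):
    print(sense, overlap(context, sense, False), overlap(context, sense, True))
```

With the plain glosses neither sense of "weight" shares a word with the context; after concatenation each sense picks up one overlapping word ("elephant" and "opinion" respectively), which is the kind of evidence for two senses that pun disambiguation looks for.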

8 Tie Breaking
The algorithms fail when several senses tie for the highest lexical overlap.
Two tie-breaking approaches:
- POS tie-breaker: preferentially selects the best sense (or pair of senses) whose POS matches the output of the Stanford POS tagger
- Clustering of WordNet senses: aligns WordNet to the coarse-grained OmegaWiki LSR
  Based on the hypothesis that humorous puns are more likely to exploit coarse-grained homonymy than fine-grained systematic polysemy
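A POS tie-breaker of the kind described above might look like the following sketch. It handles a single sense rather than a pair, the candidate tuples and sense labels are hypothetical, and the tagged POS is passed in as a plain string instead of being produced by an actual tagger.

```python
def pick_sense(scored, tagged_pos):
    """Return the top-scoring sense, using the tagged POS to break ties.

    scored: list of (sense, pos, overlap_score) tuples.
    tagged_pos: POS assigned to the target word in context (assumed given).
    """
    best = max(score for _, _, score in scored)
    tied = [(sense, pos) for sense, pos, score in scored if score == best]
    if len(tied) == 1:
        return tied[0][0]
    # Prefer tied senses whose POS agrees with the tagger output.
    matching = [sense for sense, pos in tied if pos == tagged_pos]
    # If no tied sense matches, fall back to the first tied candidate.
    return matching[0] if matching else tied[0][0]

# Hypothetical candidates for "duck" with tied overlap scores.
candidates = [
    ("duck.bird", "NOUN", 2),
    ("duck.dodge", "VERB", 2),
    ("duck.fabric", "NOUN", 1),
]
print(pick_sense(candidates, "VERB"))
```

The fallback branch is a design choice made for the sketch; the slides do not say what happens when no tied sense matches the tagger.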

9 Baselines
Two baselines are used for comparison:
- Random: random selection from the candidate senses
- Most Frequent Sense (MFS): selects the candidate sense with the highest frequency in a manually sense-tagged corpus
MFS baselines are difficult to beat because they are built on expensive sense-tagged data.
They serve as a benchmark for the performance of knowledge-based disambiguators.
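Both baselines are simple enough to sketch directly. The sense labels and frequency counts below are made up, and the parameter `k` (how many senses to return, 2 for a pun with two meanings) is an assumption added for the illustration.

```python
import random

def random_baseline(candidates, k, rng):
    """Pick k distinct candidate senses uniformly at random."""
    return rng.sample(candidates, k)

def mfs_baseline(candidates, sense_counts, k):
    """Pick the k candidates that occur most often in a sense-tagged corpus."""
    ranked = sorted(candidates, key=lambda s: sense_counts.get(s, 0), reverse=True)
    return ranked[:k]

# Hypothetical candidates and corpus frequencies.
candidates = ["weight.mass", "weight.importance", "weight.sports", "weight.statistics"]
sense_counts = {"weight.mass": 40, "weight.importance": 12, "weight.sports": 3}

rng = random.Random(0)  # seeded only to make the example repeatable
print(random_baseline(candidates, 2, rng))
print(mfs_baseline(candidates, sense_counts, 2))
```

The MFS function needs the `sense_counts` table, which is exactly the expensive manually tagged resource the slide refers to; the random baseline needs nothing beyond the candidate list.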

10 Results

11

12 Observations
- Performs well over the MFS baseline
- Accuracy is lower on verbs, which have the highest polysemy
- The dataset is small for machine learning techniques
- Future work: explore additional tie-breaking algorithms
- Future work: add pun detection to remove the assumption that the text is known a priori to contain a pun

13 Thank You!

