Paper Review by Utsav Sinha August, 2015 Part of assignment in CS 671: Natural Language Processing, IIT Kanpur.


Punning
  Deliberate use of lexical ambiguity to create humour
  Three types of puns:
    Homographic: same written word
      Example: An elephant's opinion carries a lot of weight
    Homophonic: same spoken word
      Example: Atheism is a non-prophet institution
    Imperfect: differs in both spelling and pronunciation
      Example: The sign at the nudist camp read, “Clothed until April”

Problem Statement
  Pun disambiguation: identifying the multiple senses of a term known a priori to be a pun
  This paper focuses on homographic, mono-lexeme puns
  Dataset creation: user-submitted puns and private collections by professional humorists
  Corpus pruned by trained human annotators to satisfy:
    One pun per instance
    One content word per pun
    Two meanings per pun
    Weak homography

Word Sense Disambiguation (WSD)
  Goal: determine the intended sense of a polysemous term in a given communicative act
  WSD systems require:
    Running context
    Sense inventory
  Approaches to WSD:
    Knowledge-based, using lexical semantic resources
    Supervised machine learning

Lesk Algorithm
  Assumption: words in a neighbourhood share a common topic
  Simplified Lesk (SL):
    Compare each dictionary definition of the ambiguous target word with the terms in its neighbouring context
    The sense with the maximum overlap is taken as the intended one
  Limitations:
    Dependent on the exact wording of the definitions
    Dictionary glosses are very short and coarse-grained
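The overlap scoring behind Simplified Lesk can be illustrated with a minimal sketch. The sense inventory, glosses, sense identifiers and stopword list below are made up for illustration and are not taken from the paper or from any real dictionary; a real system would draw glosses from a resource such as WordNet.

```python
# Toy sense inventory (hypothetical glosses, for illustration only).
TOY_SENSES = {
    "bank": {
        "bank#finance": "an institution that accepts deposits and lends money",
        "bank#river": "sloping land beside a river or lake",
    }
}

# Small hand-picked stopword list (an assumption, not from the paper).
STOPWORDS = {"a", "an", "the", "of", "or", "and", "that", "to", "in", "on", "by", "beside"}


def tokenize(text):
    """Lowercase, split on whitespace, strip punctuation, drop stopwords."""
    words = (w.strip(".,;:!?\"'") for w in text.lower().split())
    return {w for w in words if w and w not in STOPWORDS}


def simplified_lesk(target, context, inventory=TOY_SENSES):
    """Return (sense_id, overlap) pairs sorted by descending overlap."""
    context_words = tokenize(context) - {target}
    scores = []
    for sense_id, gloss in inventory[target].items():
        overlap = len(tokenize(gloss) & context_words)
        scores.append((sense_id, overlap))
    # sorted() is stable, so tied senses keep their inventory order.
    return sorted(scores, key=lambda pair: -pair[1])


ranking = simplified_lesk(
    "bank", "She sat on the bank of the river and watched the lake"
)
print(ranking)
```

For ordinary WSD only the top-ranked sense would be returned. Since each pun in the corpus carries two meanings, a pun-oriented variant could plausibly keep the two highest-scoring senses from the same ranking.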

Lesk Algorithm (continued)
  Solution: use a thesaurus that includes synonyms, homonyms and derivations
    A sense inventory such as WordNet is used
  New problem: WordNet is too fine-grained
    Use clustering or coarsening techniques

Improvement in Algorithm
  Use the word's lemma and Part of Speech (POS) tag to narrow the list of candidate senses
  Simplified Extended Lesk (SEL):
    Modifies SL by concatenating each sense’s definition with those of neighbouring senses from WordNet
  Simplified Lexically Expanded Lesk (SLEL):
    Extends SL by using 100 entries from a large distributional thesaurus to expand each word’s sense
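A rough sketch of the gloss-extension idea behind SEL follows, applied to the homographic pun from the first slide. The glosses, sense identifiers and neighbour links are invented for this example and do not reflect actual WordNet entries; the point is only to show how concatenating neighbouring glosses can create overlap where the bare gloss has none.

```python
import re

# Hypothetical glosses and neighbour links (not real WordNet data).
TOY_GLOSSES = {
    "weight#mass": "the heaviness of an object",
    "weight#influence": "the importance or influence of something",
    "heaviness#1": "the property of being heavy as an elephant or a stone",
    "importance#1": "the quality of having value in an opinion or decision",
}
TOY_NEIGHBOURS = {
    "weight#mass": ["heaviness#1"],
    "weight#influence": ["importance#1"],
}
STOPWORDS = {"a", "an", "the", "of", "or", "in", "as", "being", "s"}


def tokens(text):
    """Set of lowercase alphabetic tokens, minus stopwords."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}


def extended_gloss(sense_id):
    """Concatenate a sense's gloss with the glosses of its neighbours."""
    parts = [TOY_GLOSSES[sense_id]]
    parts += [TOY_GLOSSES[n] for n in TOY_NEIGHBOURS.get(sense_id, [])]
    return " ".join(parts)


def overlap_scores(target, senses, context, extend):
    """Overlap of each sense's (optionally extended) gloss with the context."""
    ctx = tokens(context) - {target}
    scores = {}
    for sense_id in senses:
        gloss = extended_gloss(sense_id) if extend else TOY_GLOSSES[sense_id]
        scores[sense_id] = len(tokens(gloss) & ctx)
    return scores


pun = "An elephant's opinion carries a lot of weight"
candidates = ["weight#mass", "weight#influence"]
plain = overlap_scores("weight", candidates, pun, extend=False)
extended = overlap_scores("weight", candidates, pun, extend=True)
print(plain, extended)
```

With these toy glosses the bare definitions share no words with the pun, while the extended glosses give each of the two senses one overlapping word ("elephant" and "opinion" respectively).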

Tie Breaking
  The algorithms fail when several senses tie for the highest lexical overlap
  Two tie-breaking approaches:
    POS tie-breaker
      Preferentially selects the best sense (or pair of senses) whose POS matches the output of the Stanford POS tagger
    Clustering of WordNet senses
      Aligns WordNet to the coarse-grained OmegaWiki LSR
      Based on the hypothesis that humorous puns are more likely to exploit coarse-grained homonymy than fine-grained systematic polysemy
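The POS tie-breaker can be sketched as a filter over the tied senses. This is one possible reading of the slide, not the paper's implementation: the candidate tuples, sense identifiers and single-letter POS codes are hypothetical, and the tagger output is passed in as a plain string rather than obtained from the Stanford POS tagger.

```python
def break_ties_by_pos(scored_senses, tagged_pos):
    """Pick the top-scoring senses, preferring those matching the tagged POS.

    scored_senses: list of (sense_id, pos, overlap_score) tuples.
    tagged_pos:    POS assigned to the target word by an external tagger
                   (supplied by the caller in this sketch).
    Returns a list of sense ids; more than one id means the tie remains.
    """
    best = max(score for _, _, score in scored_senses)
    tied = [s for s in scored_senses if s[2] == best]
    if len(tied) == 1:
        return [tied[0][0]]
    matching = [s for s in tied if s[1] == tagged_pos]
    # Assumption: if no tied sense matches the tag, keep all tied senses.
    chosen = matching or tied
    return [sense_id for sense_id, _, _ in chosen]


candidates = [
    ("run#verb-move", "v", 2),
    ("run#noun-score", "n", 2),
    ("run#verb-operate", "v", 1),
]
resolved = break_ties_by_pos(candidates, tagged_pos="n")
unresolved = break_ties_by_pos(candidates, tagged_pos="a")
print(resolved, unresolved)
```

When the tag matches none of the tied senses, the sketch leaves the tie in place so that a second strategy, such as the sense clustering described above, could still be applied.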

Baselines
  Two baselines are used for comparison:
    Random selection from the candidate senses
    Most Frequent Sense (MFS)
      Selects the candidate sense with the highest frequency in a manually sense-tagged corpus
  MFS baselines are difficult to beat because they are built on expensive sense-tagged data
  They serve as a benchmark for the performance of knowledge-based disambiguators
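Both baselines are simple enough to sketch in a few lines. The sense identifiers and frequency counts below are invented, and returning two senses per pun (`k=2`) is an assumption based on the "two meanings per pun" constraint of the corpus rather than something stated on this slide.

```python
import random

# Hypothetical sense frequencies, standing in for a sense-tagged corpus.
TOY_SENSE_COUNTS = {"bank#finance": 25, "bank#river": 7, "bank#tilt": 1}


def random_baseline(candidates, k=2, rng=None):
    """Pick k distinct candidate senses uniformly at random."""
    rng = rng or random.Random()
    return rng.sample(candidates, k)


def mfs_baseline(candidates, counts, k=2):
    """Pick the k candidate senses with the highest corpus frequency."""
    ranked = sorted(candidates, key=lambda s: -counts.get(s, 0))
    return ranked[:k]


senses = ["bank#river", "bank#tilt", "bank#finance"]
random_pick = random_baseline(senses, rng=random.Random(0))
mfs_pick = mfs_baseline(senses, TOY_SENSE_COUNTS)
print(random_pick, mfs_pick)
```

The MFS function needs the frequency table as input, which is the sense-tagged data that makes this baseline expensive to build; the random baseline needs only the candidate list.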

Results

Observations
  Performs well over the MFS baseline
  Accuracy is lower on verbs, which have the highest polysemy
  The dataset is small for machine learning techniques
  Explore additional tie-breaking algorithms
  Remove the assumption that the text is known a priori to contain a pun by adding pun detection

Thank You!
