Semi-Supervised Recognition of Sarcastic Sentences in Twitter and Amazon

Problem
Semi-supervised sarcasm identification using SASI (Semi-supervised Algorithm for Sarcasm Identification).
Sarcasm: the activity of saying or writing the opposite of what you mean, or of speaking in a way intended to make someone else feel stupid or to show them that you are angry.

Datasets
Twitter Dataset:
Tweets are 140 characters or fewer.
Tweets can contain URLs, references to other tweeters (@<user>), or hashtags (#<tag>).
Slang, abbreviations, and emoticons are common.
5.9 million tweets, with 14.2 words per tweet on average.
18.9% include a URL, 35.3% contain an @<user> reference, and 6.9% contain one or more hashtags.

Datasets
Amazon Dataset:
66,000 reviews of 120 products, 953 characters long on average.
Reviews are usually structured and grammatical.
Reviews have fields including writer, date, rating, and summary.
Amazon reviews have a great deal of context compared to tweets.

Classification
The algorithm is semi-supervised: it is seeded with a small set of labeled sentences.
Each seed sentence is annotated with a sarcasm ranking in [1, 5].
Syntactic and pattern-based features are used to build a classifier.

Data Preprocessing
Specific information was replaced with general tags to facilitate pattern matching: '[PRODUCT]', '[COMPANY]', '[TITLE]', '[AUTHOR]', '[USER]', '[LINK]', and '[HASHTAG]'.
All HTML tags were removed.
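As a rough illustration, the substitutions for the Twitter-specific tokens could be sketched with regular expressions. This is an assumption-laden sketch, not the authors' code: the patterns and their ordering are guesses, and the entity tags such as '[PRODUCT]' and '[COMPANY]' would additionally require an entity lexicon, which is omitted here.

```python
import re

def preprocess(text):
    """Illustrative preprocessing: strip HTML tags, then replace URLs,
    @-mentions, and hashtags with the general tags named above."""
    text = re.sub(r"<[^>]+>", " ", text)            # remove HTML tags
    text = re.sub(r"https?://\S+", "[LINK]", text)  # URLs -> [LINK]
    text = re.sub(r"@\w+", "[USER]", text)          # @<user> -> [USER]
    text = re.sub(r"#\w+", "[HASHTAG]", text)       # #<tag> -> [HASHTAG]
    return re.sub(r"\s+", " ", text).strip()        # normalize whitespace

print(preprocess('Great phone! <b>really</b> @bob http://x.co #fail'))
```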

Pattern Extraction and Selection
Words are classified into high-frequency words (HFWs) and content words (CWs).
A pattern is an ordered sequence of HFWs with slots for CWs, e.g. "[COMPANY] CW does not CW much".
Generated patterns were removed if they appeared in two seeds with opposite rankings (1 and 5).
Patterns that appeared only in reference to a single product were also removed.
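The HFW/CW abstraction can be sketched as follows. This is a toy sketch: the threshold `hfw_min` is an arbitrary stand-in for the paper's corpus-frequency thresholds, and HFWs and CWs are treated here as complementary classes.

```python
from collections import Counter

def build_freqs(corpus):
    """Count word occurrences over a corpus of sentences."""
    return Counter(w for sent in corpus for w in sent.split())

def to_pattern(sentence, freqs, hfw_min=2):
    """Keep high-frequency words verbatim; replace rarer (content) words
    with a CW slot, yielding a pattern like '[COMPANY] CW does not CW much'."""
    return " ".join(w if freqs[w] >= hfw_min else "CW"
                    for w in sentence.split())

corpus = ["[COMPANY] does not work much",
          "[COMPANY] phone does not charge much"]
freqs = build_freqs(corpus)
print(to_pattern("[COMPANY] phone does not charge much", freqs))
```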

Pattern Matching

Other Features
(1) Sentence length in words
(2) Number of "!" characters in the sentence
(3) Number of "?" characters in the sentence
(4) Number of quotes in the sentence
(5) Number of capitalized/all-capitals words in the sentence
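These five surface features could be computed as below (a sketch with naive whitespace tokenization; the paper's exact tokenization and any feature normalization are not shown):

```python
def punct_features(sentence):
    """Sketch of the five surface features listed above."""
    words = sentence.split()  # naive whitespace tokenization
    return {
        "length": len(words),                # (1) sentence length in words
        "exclamations": sentence.count("!"), # (2) number of "!"
        "questions": sentence.count("?"),    # (3) number of "?"
        "quotes": sentence.count('"') // 2,  # (4) number of quote pairs
        # (5) words starting with a capital (covers all-caps words too)
        "caps_words": sum(w[:1].isupper() for w in words),
    }
```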

Data Enrichment
Assumption: sentences near a sarcastic sentence are similarly sarcastic.
Using the seed set for the Amazon data, perform a Yahoo search for text snippets containing the seed sentences.
Include the sentences surrounding each returned snippet in the training set, annotated with the same ranking as the seed that retrieved them.

Classification
The classifier is similar to kNN.
The score for a new instance is the weighted average of the scores of the k nearest training-set vectors, with distance measured using Euclidean distance.
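The scoring rule just described can be sketched as a distance-weighted kNN average. The inverse-distance weighting below is an assumption; the slide only states that a weighted average of the k nearest vectors is used.

```python
import math

def knn_score(x, train, k=3):
    """Score a new feature vector x as the distance-weighted average of the
    sarcasm scores of its k nearest neighbors (Euclidean distance).
    train: list of (vector, score in [1, 5]) pairs."""
    nearest = sorted((math.dist(x, v), s) for v, s in train)[:k]
    # Inverse-distance weights (assumed scheme); epsilon avoids division by 0.
    weights = [1.0 / (d + 1e-9) for d, _ in nearest]
    return sum(w * s for w, (_, s) in zip(weights, nearest)) / sum(weights)
```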

Baseline
Assume that sarcasm implies saying the opposite of what you mean.
Identify reviews with few stars and decide that sarcasm is present if strongly positive words appear in the review.
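A toy version of this star-rating baseline (the star threshold and the positive-word list are illustrative assumptions, not taken from the paper):

```python
# Hypothetical list of "strongly positive" words for illustration only.
POSITIVE = {"great", "excellent", "best", "wonderful", "amazing", "perfect"}

def baseline_is_sarcastic(review_text, stars):
    """Flag a low-star review as sarcastic if it contains a strongly
    positive word (assumed threshold: 2 stars or fewer)."""
    words = {w.strip('.,!?"').lower() for w in review_text.split()}
    return stars <= 2 and bool(words & POSITIVE)
```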

Training Sets
Amazon: 80 positive and 505 negative examples (471/5,020 after data enrichment).
Twitter: 1,500 tweets tagged with #sarcasm (noisy labels).
Because the hashtag labels are noisy, the Twitter training set was changed to positive examples from the Amazon dataset plus manually selected negative examples from the Twitter dataset.

Test Sets
90 positive and 90 negative examples each for Amazon and Twitter.
Only sentences containing a named entity or a reference to one were sampled, as these are more likely to contain sentiment and therefore to be relevant.
Non-sarcastic sentences were drawn only from negative reviews, increasing the chance that they contain negative sentiment.
Mechanical Turk was used to create a gold standard for the test set; each sentence was annotated by 3 annotators.

Inter-Annotator Agreement
Amazon: κ = 0.34
Twitter: κ = 0.41
The superior agreement on Twitter is attributed to the lack of context in the medium, which forces tweeters to make sarcasm explicit.

Tables

Baseline Intuitions
The baseline has high precision but low recall: it cannot recognize subtly sarcastic sentences.
These results imply that "saying the opposite of what you mean" is not, by itself, a good indicator of sarcasm.

Reasons for Good Twitter Results
Robustness of sparse and incomplete pattern matching.
SASI learns a model whose feature space spans over 300 dimensions.
Sarcasm may be easier to detect in tweets because, in an environment with no context, tweeters have to go out of their way to make sarcasm explicit.

Notes
#sarcasm tags were unreliable.
Punctuation marks were the weakest predictors, in contrast to the findings of Tepperman et al. (2006).
The exception is ellipses, which were a strong predictor in combination with other features.