1
Identifying Sarcasm in Twitter: A Closer Look
Roberto Gonzalez, Smaranda Muresan, Nina Wacholder
2
Aim of the study
To construct a corpus of sarcastic utterances that were explicitly labeled as such by their own authors (#sarcasm, #sarcastic).
To illustrate how difficult it is to distinguish sarcastic sentences from positive or negative ones.
3
Data
Data for the study is divided into three sets of 900 tweets each: sarcastic, positive, and negative. Each set is collected from Twitter using the corresponding hashtags.
Sarcasm: #sarcasm, #sarcastic
Positive: #happy, #joy, #lucky
Negative: #sadness, #frustrated, #angry
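The hashtag-based labeling can be sketched as follows. This is a minimal illustration, not the authors' collection script: the function name `label_from_hashtags` is hypothetical, and returning `None` for tweets that match more than one group is an assumption made here.

```python
import re

# Hashtag groups as listed on the slide.
HASHTAGS = {
    "sarcastic": {"#sarcasm", "#sarcastic"},
    "positive": {"#happy", "#joy", "#lucky"},
    "negative": {"#sadness", "#frustrated", "#angry"},
}

def label_from_hashtags(text):
    """Return the class whose hashtags appear in the tweet, or None.

    Assumption (not from the slides): tweets matching zero groups or
    more than one group are treated as unusable and get None.
    """
    tags = {t.lower() for t in re.findall(r"#\w+", text)}
    matches = [label for label, group in HASHTAGS.items() if tags & group]
    return matches[0] if len(matches) == 1 else None
```

For example, `label_from_hashtags("Great, more rain #sarcasm")` yields `"sarcastic"`, while a tweet carrying both #happy and #angry yields `None`.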
4
Data Preprocessing
Tweets with #sarcasm or #sarcastic in the middle of the tweet were removed.
Tweets were manually checked to see whether the tag was part of the content of the tweet, e.g. “I really love #sarcasm”.
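The automatic part of this filter might look like the sketch below. The function name `strip_trailing_label` is hypothetical, and the choice to strip the trailing tag from the kept text is an assumption. Note that a tweet such as “I really love #sarcasm” passes this filter even though the tag is part of the content, which is why the manual check is still needed.

```python
import re

# One or more label tags at the very end of the tweet.
_TRAILING = re.compile(r"(?:\s*#(?:sarcasm|sarcastic)\b)+\s*$", re.IGNORECASE)
# A label tag anywhere in the text.
_ANY = re.compile(r"#(?:sarcasm|sarcastic)\b", re.IGNORECASE)

def strip_trailing_label(text):
    """Return the tweet without its trailing label tag, or None.

    None means the tweet is discarded: either it has no trailing label
    tag, or a label tag also appears in the middle of the tweet.
    """
    body = _TRAILING.sub("", text)
    if body == text:          # no label tag at the end
        return None
    if _ANY.search(body):     # a label tag remains mid-tweet
        return None
    return body.strip()
```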
5
Lexical features
Unigrams
Dictionary-based
  Pennebaker et al. (LIWC)
    Linguistic Processes (adverbs, pronouns)
    Psychological Processes (positive, negative emotion)
    Personal Concerns (work, achievement)
    Spoken Categories (assent, non-fluencies)
  WordNet Affect
  List of interjections and punctuation
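A rough sketch of how unigram and dictionary-based features could be extracted is shown below. LIWC and WordNet Affect are not reproduced here: `TOY_LEXICON` is a made-up stand-in with invented categories and words, and the tokenizer is a simplification chosen for illustration.

```python
import re
from collections import Counter

# Hypothetical toy lexicon standing in for LIWC / WordNet Affect /
# interjection lists; the real resources are far larger.
TOY_LEXICON = {
    "posemo": {"love", "great", "happy"},
    "negemo": {"hate", "awful", "sad"},
    "interjection": {"wow", "oh", "yay"},
}

def tokenize(text):
    """Lowercase the text and split it into word and punctuation tokens."""
    return re.findall(r"[a-z']+|[!?.,]+", text.lower())

def lexical_features(text, lexicon=TOY_LEXICON):
    """Return (unigram counts, category presence, category frequency)."""
    tokens = tokenize(text)
    unigrams = Counter(tokens)
    n = len(tokens) or 1  # avoid division by zero on empty input
    freq = {
        cat: sum(unigrams[w] for w in words) / n
        for cat, words in lexicon.items()
    }
    presence = {cat: int(f > 0) for cat, f in freq.items()}
    return unigrams, presence, freq
```

The presence and frequency dictionaries correspond loosely to the two ways a dictionary category can be encoded: as a binary flag or as a proportion of the tweet's tokens.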
6
Pragmatic Features
Positive emoticons: smileys
Negative emoticons: frowning faces
ToUser: @user
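The three pragmatic features could be detected with simple patterns, as in this sketch. The emoticon patterns cover only a handful of common forms and are an assumption; the slides do not list the exact emoticons used.

```python
import re

# Small, illustrative emoticon sets (assumed, not taken from the slides).
POSITIVE = re.compile(r"[:;=]-?[)D]")      # :) ;) =) :-) :D
NEGATIVE = re.compile(r":-?\(|:'\(")       # :( :-( :'(
# @user mention not preceded by a word character (skips e-mail addresses).
TO_USER = re.compile(r"(?<!\w)@\w+")

def pragmatic_features(text):
    """Return binary flags for the three pragmatic features."""
    return {
        "positive_emoticon": bool(POSITIVE.search(text)),
        "negative_emoticon": bool(NEGATIVE.search(text)),
        "to_user": bool(TO_USER.search(text)),
    }
```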
7
Comparisons and χ² rankings
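As an illustration of how features can be ranked by χ², the sketch below computes the statistic for a binary feature against two classes from a 2x2 contingency table and sorts features by it. The function names and the counts in the usage example are invented; the slide's actual rankings are not recoverable from the transcript.

```python
def chi_square_2x2(a, b, c, d):
    """Chi-square statistic for a 2x2 table.

    a: class-1 tweets with the feature,  b: class-1 tweets without it,
    c: class-2 tweets with the feature,  d: class-2 tweets without it.
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return 0.0 if denom == 0 else n * (a * d - b * c) ** 2 / denom

def rank_features(counts, n1, n2):
    """Rank features by chi-square, highest first.

    counts maps feature -> (tweets in class 1 with it, tweets in class 2 with it);
    n1 and n2 are the class sizes.
    """
    scored = [
        (feat, chi_square_2x2(k1, n1 - k1, k2, n2 - k2))
        for feat, (k1, k2) in counts.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

With made-up counts such as `rank_features({"posemo": (30, 60), "negation": (45, 45)}, 90, 90)`, the feature that is distributed unevenly across the two classes is ranked first, and the evenly distributed one scores 0.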
8
Classification
Classifiers: Logistic Regression and Support Vector Machine with SMO (sequential minimal optimization)
Features used:
Unigrams
Dictionary features, presence (LIWC+_P)
Dictionary features, frequency (LIWC+_F)
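To make the classification step concrete, here is a minimal logistic regression trained with plain batch gradient descent on binary presence features. It is a teaching sketch only: it does not reproduce the toolkit or settings used in the study, it does not implement SMO, and the toy feature vectors and labels are invented.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def train_logistic_regression(X, y, lr=0.5, epochs=500):
    """Fit weights and bias by batch gradient descent on the log loss."""
    n_features = len(X[0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b

def predict(w, b, x):
    """Return 1 if the predicted probability is at least 0.5, else 0."""
    return int(sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= 0.5)

# Invented toy data: [posemo present, negemo present, ToUser present].
X_TOY = [[1, 0, 1], [1, 0, 0], [1, 1, 1], [0, 1, 0], [0, 1, 1], [0, 0, 0]]
Y_TOY = [1, 1, 1, 0, 0, 0]
```

Training on the toy data with `w, b = train_logistic_regression(X_TOY, Y_TOY)` and calling `predict(w, b, x)` on each row shows the basic fit-then-predict flow; a real experiment would use held-out data or cross-validation rather than the training set.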
9
Classification Results
10
Comparison against human performance
3 judges were asked to classify tweets as sarcastic, positive, or negative (90 tweets per category).
S-N-P: 50% agreement (k = )
S-NS: 71.67% agreement (k = )
Emoticon-based S-NS: 89% agreement (k = 0.74)
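The slides do not say which agreement statistic k refers to. With three judges, one common choice is Fleiss' kappa, sketched below under that assumption. The ratings in the usage note are invented and are not the study's data.

```python
def fleiss_kappa(ratings):
    """Fleiss' kappa for a fixed number of raters per item.

    ratings: one row per item; each row lists how many raters chose
    each category, and every row sums to the same number of raters.
    """
    n_items = len(ratings)
    n_raters = sum(ratings[0])
    n_cats = len(ratings[0])
    # Overall proportion of assignments falling in each category.
    p_cat = [
        sum(row[j] for row in ratings) / (n_items * n_raters)
        for j in range(n_cats)
    ]
    # Per-item agreement among rater pairs.
    p_item = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in ratings
    ]
    p_bar = sum(p_item) / n_items
    p_exp = sum(p * p for p in p_cat)
    if p_exp == 1.0:  # every rating is in one category; kappa is undefined
        return 1.0    # assumption: report perfect agreement in this case
    return (p_bar - p_exp) / (1.0 - p_exp)
```

For instance, with three raters and three categories, `fleiss_kappa([[3, 0, 0], [0, 3, 0], [0, 0, 3], [1, 1, 1]])` gives 0.625: three items with full agreement and one item on which all three raters disagree.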
11
Human Comparison results