Subjectivity and Sentiment Analysis of Arabic Tweets with Limited Resources Supervisor Dr. Verena Rieser Presented By ESHRAG REFAEE OSACT 27 May 2014.

Presentation transcript:


Outline
1. Introduction
   - The concept of subjectivity and sentiment analysis (SSA)
   - Motivations and challenges of SSA for Arabic
   - Previous work on SSA of Arabic social networks
2. Experimental setup
   - Twitter corpus: collection and annotation
   - Evaluation metrics
   - Machine learners
3. Results and error analysis
4. Summary and future work

Subjectivity and Sentiment Analysis (SSA)
Definition: analysing and understanding people's sentiments, evaluations, opinions, attitudes, and emotions from written text.

Hierarchical Model of Subjectivity and Sentiment Analysis (SSA)
User-generated text
- Subjective
  - Positive
  - Negative
- Objective

Applications
In addition to its significance as a major sub-field of Natural Language Processing (NLP) research, SSA has a range of real-world applications:
- Commercial applications (e.g. measuring the success of a product)
- Social applications
- Political applications
- Economic applications

SSA and Social Networks
The growing importance of sentiment analysis coincides with the growth of social media such as micro-blogs.


SSA and Twitter
Twitter (Statistic Brain, 2014):
- ~60 M tweets/day
- >600 M active users
- 10th most popular site in the world
Since March 2012, Twitter has been available in Arabic (Twitter Blog, 2012).

About Arabic
- Arabic is the language of over 422 million people.
- It is the first language of the 22 member countries of the Arab League.
- It is an official language in three other countries (UNESCO, 2013).

About Arabic
Arabic is the language of over 422 million people. The Arabic language can be classified into three major levels (Habash, 2010):
- Classical Arabic (CA)
- Modern Standard Arabic (MSA)
- Arabic Dialects (AD)
Used side by side in social networks.

Challenges with Respect to Arabic
- Limited availability of NLP resources for dialectal Arabic (DA).
- Noisy features.
- No large-scale Arabic Twitter corpus annotated for SSA is publicly available.
- Sparse labelled data.
- BUT: lots of unlabelled data!

Challenges with Respect to Twitter
- 'Bad language' (Eisenstein, 2013)
- Unclear sentiment indicators
- Dynamic nature / topic-shifting (Go et al., 2009)
Examples:
- "ew", "ugh" instead of "disgusting"
- "bro" instead of "brother"
- المساواة في قمع الحريات الشخصية عدل (Equality in suppressing personal freedom is justice)

Previous Work on SSA of Arabic Tweets

Abdul-Mageed et al. (2012)
  Feature-sets: stem and lemma word tokens, POS, semantic features, user: person/org
  Classification scheme: SVM (two-stage binary classification)
  Results: the best acc. % for sentiment analysis and 79.01% for subjectivity analysis

Mourad and Darwish (2013)
  Feature-sets: stem word tokens, tweet-specific features, stylistic features
  Classification scheme: SVM and NB with 10-fold cross-validation
  Results: the best acc. 64.1% for subjectivity classification and 72.5% for sentiment classification

Mainly supervised learning on manually annotated corpora:
- Costly annotations.
- Not scalable / applicable to unseen topics!

Previous Work on SSA of Arabic Tweets

Abdul-Mageed et al. (2012)
  Feature-sets: stem and lemma word tokens, POS, semantic features, user: person/org
  Classification scheme: SVM (two-stage binary classification)
  Dataset: 3k Arabic tweets
  Results: the best acc. % for sentiment analysis and 79.01% for subjectivity analysis

Mourad and Darwish (2013)
  Feature-sets: stem word tokens, tweet-specific features, stylistic features
  Classification scheme: SVM and NB with 10-fold cross-validation
  Dataset: 2,300 Arabic tweets
  Results: the best acc. 64.1% for subjectivity classification and 72.5% for sentiment classification

Observations:
- Word-based features.
- SVM shown to perform best (large feature sets).
- Evaluation: 10-fold cross-validation, or a held-out test set from the same corpus.
- No test for unseen topics / scalability under topic shift!

Outline
1. Introduction
   - Motivations and challenges of subjectivity and sentiment analysis (SSA) for Arabic
   - Previous work on SSA of Arabic social networks
2. Experimental setup
   - Twitter corpus: collection and annotation
   - Evaluation metrics
   - Machine learners
3. Results and error analysis
4. Summary and future work

Methodology and Approach
1. Unlabelled tweets are labelled by human annotators, giving gold-standard labelled tweets.
2. Arabic NLP tools extract features from the labelled tweets.
3. The features are used to train a machine learning scheme: an SVM classifier.
4. Model evaluation uses a manually annotated held-out test set.

Arabic Twitter SSA Corpora

Arabic Twitter SSA Corpora: Gold Standard Data Set
- Manually annotated for sentiment analysis (total = 3,309)
- 2 native-speaker annotators (weighted Kappa = 0.76)
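The slide reports a weighted Kappa but does not say which weighting was used. As a rough illustration only, the sketch below computes a linear-weighted Cohen's kappa for two annotators; the linear weights, the label names, and the assumed ordering negative < neutral < positive are assumptions, not details from the talk.

```python
from collections import Counter

def weighted_kappa(labels_a, labels_b, categories):
    """Linear-weighted Cohen's kappa for two annotators.

    `categories` lists the labels in their assumed ordinal order
    (an assumption made for this sketch).
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    k = len(categories)
    index = {c: i for i, c in enumerate(categories)}
    n = len(labels_a)

    # Joint and marginal counts over category indices.
    observed = Counter((index[a], index[b]) for a, b in zip(labels_a, labels_b))
    marg_a = Counter(index[a] for a in labels_a)
    marg_b = Counter(index[b] for b in labels_b)

    def weight(i, j):
        # Disagreement weight: 0 on the diagonal, 1 for the most distant pair.
        return abs(i - j) / (k - 1) if k > 1 else 0.0

    obs_dis = sum(weight(i, j) * c / n for (i, j), c in observed.items())
    exp_dis = sum(weight(i, j) * marg_a[i] * marg_b[j] / (n * n)
                  for i in range(k) for j in range(k))
    if exp_dis == 0:
        return 1.0  # both annotators used a single identical label
    return 1.0 - obs_dis / exp_dis
```

Identical annotations give 1.0, and values near 0 indicate agreement no better than chance under the chosen weights.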

Arabic Twitter SSA Corpora: Held-out Test Set
963 tweets were manually annotated for evaluating the trained models.

Arabic Twitter SSA Corpora
Examples of annotated tweets:
- Positive: السياحة في اليمن جمال لا يصدق (Tourism in Yemen, unbelievable beauty)
- Negative: حنا للأسف نستخدم ايفون (Unfortunately, we use the iPhone)
- Neutral: ميركل تدعو اوكرانيا لتشكيل حكومة جديدة (Merkel calls for Ukraine to form a new government)

Feature Extraction

Morphological features
  Linguistic tool/resource: Arabic morphological analyser MADA + TOKAN v3.2 (Habash and Rambow, 2005; Habash and Roth, 2009)
  Feature-set: diacritic, aspect, gender, mood, person, part-of-speech, state, voice, has-morph-analysis

Syntactic features
  Feature-set: n-grams of word tokens

Semantic features
  Linguistic tool/resource: polarity lexicons: 1) ArabSenti (Abdul-Mageed et al., 2011); 2) MPQA-translation (Wilson et al., 2005)
  Feature-set: has-positive-lexicon, has-negative-lexicon, has-neutral-lexicon, has-negator

Stylistic features
  Feature-set: has-positive-emoticon, has-negative-emoticon
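To make the feature types above more concrete, here is a minimal sketch of how word n-grams and the binary lexicon and emoticon flags could be computed for one tokenised tweet. The function name, the emoticon sets, and the toy lexicon entries in the usage note are hypothetical; the real ArabSenti and MPQA lexicons and the MADA morphological features are not reproduced here.

```python
# Hypothetical emoticon inventories, for illustration only.
POSITIVE_EMOTICONS = {":)", ":-)", ":D"}
NEGATIVE_EMOTICONS = {":(", ":-("}

def extract_features(tokens, pos_lexicon, neg_lexicon, negators, max_n=2):
    """Return a sparse feature dict for one tokenised tweet."""
    features = {}
    # Syntactic features: word n-grams up to length max_n.
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            features["ngram=" + " ".join(tokens[i:i + n])] = 1
    present = set(tokens)
    # Semantic features: binary flags for lexicon and negator hits.
    features["has-positive-lexicon"] = int(not present.isdisjoint(pos_lexicon))
    features["has-negative-lexicon"] = int(not present.isdisjoint(neg_lexicon))
    features["has-negator"] = int(not present.isdisjoint(negators))
    # Stylistic features: binary flags for emoticon hits.
    features["has-positive-emoticon"] = int(not present.isdisjoint(POSITIVE_EMOTICONS))
    features["has-negative-emoticon"] = int(not present.isdisjoint(NEGATIVE_EMOTICONS))
    return features
```

For example, calling it on the tokens of the positive example tweet with a toy positive lexicon `{"جمال"}` and a toy negator set `{"لا"}` sets both `has-positive-lexicon` and `has-negator` to 1.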

Subjectivity and Sentiment Classification Experiments

SSA Classification: Problem Formulations
1. Two-stage: Text -> Subjective (Positive / Negative) vs. Objective
2. One-stage: Text -> Positive vs. Negative vs. Neutral
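The two formulations can be related with a small sketch. It assumes that the "objective" class of the two-stage scheme corresponds to the "neutral" label of the one-stage scheme; the function names and the two classifier callables are hypothetical placeholders.

```python
POLAR = ("positive", "negative")

def stage_one_label(label):
    """Map a three-way label to the stage-one task (subjective vs. objective)."""
    return "subjective" if label in POLAR else "objective"

def stage_two_examples(examples):
    """Keep only polar examples for the stage-two task (positive vs. negative)."""
    return [(text, label) for text, label in examples if label in POLAR]

def two_stage_predict(text, is_subjective, is_positive):
    """Chain two binary classifiers into a three-way decision.

    `is_subjective` and `is_positive` are placeholder callables returning bool.
    """
    if not is_subjective(text):
        return "neutral"
    return "positive" if is_positive(text) else "negative"
```

The one-stage scheme would instead train a single three-class model directly on the positive/negative/neutral labels.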

Machine Learning Classifiers
- Support Vector Machines (SVM): Sequential Minimal Optimization (SMO) (Platt, 1999)
- Majority baseline: ZeroR
SVM aims to identify the optimal hyperplane that linearly separates data instances with the maximum margin (Hsu et al., 2003).
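The majority baseline is simple enough to sketch directly: it ignores the features and always predicts the most frequent training label. The class below is an illustrative re-implementation of that idea, not the toolkit code used in the experiments, and the SVM itself is not reproduced here.

```python
from collections import Counter

class ZeroR:
    """Majority-class baseline: always predicts the most frequent training label."""

    def fit(self, labels):
        # Remember the single most common label seen during training.
        self.majority_ = Counter(labels).most_common(1)[0][0]
        return self

    def predict(self, instances):
        # Features are ignored; every instance gets the majority label.
        return [self.majority_ for _ in instances]
```

Any learned model should beat this baseline to be considered useful on a given class distribution.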

Evaluation Metrics
- F-measure
- Accuracy
- Significant differences: t-test with p<
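For reference, the two metrics can be computed as below. This is a minimal sketch using the standard definitions; the slide does not say how F-measure was averaged across classes, so the function here scores a single target class only.

```python
def accuracy(gold, pred):
    """Fraction of instances whose predicted label matches the gold label."""
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)

def f_measure(gold, pred, target):
    """Balanced F-measure (F1) for one target class."""
    tp = sum(g == target and p == target for g, p in zip(gold, pred))
    fp = sum(g != target and p == target for g, p in zip(gold, pred))
    fn = sum(g == target and p != target for g, p in zip(gold, pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    # Harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)
```

With gold labels `["pos", "pos", "neg", "neg"]` and predictions `["pos", "neg", "neg", "neg"]`, accuracy is 0.75 and the F-measure for `"pos"` is 2/3 (precision 1.0, recall 0.5).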

Outline
1. Introduction
   - Motivations and challenges of subjectivity and sentiment analysis (SSA) for Arabic
   - Previous work on SSA of Arabic social networks
2. Experimental setup
   - Twitter corpus: collection and annotation
   - Evaluation metrics
   - Machine learners
3. Results and error analysis
4. Summary and future work

Results and Evaluation
Columns (F-measure and accuracy): Majority baseline; SVM, 10-fold cross-validation; SVM, held-out test set
Data-sets:
- Polar vs. neutral
- Positive vs. negative
- Positive vs. negative vs. neutral

Error Analysis
The most predictive word unigrams in the two datasets, as evaluated by Chi-Squared:

Development set (Spring '13)
- الخير (Well-being)
- الشعب (Nation): 7.114
- اجمل (More beautiful): 6.9927
- ماهر (Skilful): 5.0705
- مبروك (Congratulations): 4.984

Test set (Autumn '13)
- اجمل (More beautiful)
- احسن (Better)
- آه (sigh)
- سعادة (Happiness)
- الخير (Welfare/Well-being): 4.689
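As a rough illustration of how unigrams can be ranked by Chi-Squared, the sketch below scores each word against one target class using the standard 2x2 contingency statistic over document counts. The function names and the toy data in the usage note are hypothetical, and this is not necessarily the exact procedure or tool behind the scores on the slide.

```python
def chi_squared(n11, n10, n01, n00):
    """Chi-squared statistic for a 2x2 table.

    n11: docs with the word and the target label
    n10: docs with the word, other label
    n01: docs without the word, target label
    n00: docs without the word, other label
    """
    n = n11 + n10 + n01 + n00
    den = (n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00)
    if den == 0:
        return 0.0
    return n * (n11 * n00 - n10 * n01) ** 2 / den

def rank_unigrams(docs, labels, target):
    """Rank words by chi-squared association with `target`, highest first."""
    vocab = {w for doc in docs for w in doc}
    scored = []
    for word in vocab:
        n11 = sum(word in d and l == target for d, l in zip(docs, labels))
        n10 = sum(word in d and l != target for d, l in zip(docs, labels))
        n01 = sum(word not in d and l == target for d, l in zip(docs, labels))
        n00 = sum(word not in d and l != target for d, l in zip(docs, labels))
        scored.append((word, chi_squared(n11, n10, n01, n00)))
    # Sort by descending score, breaking ties alphabetically.
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))
```

Running the ranking separately on a development set and a test set, as on the slide, shows whether the most predictive words differ between the two collection periods.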


Current Work
A large-scale Arabic Twitter SSA corpus: distant supervision (DS) data set**
1. Unlabelled tweets receive noisy labels (#hashtags and emoticons), giving automatically labelled tweets.
2. Arabic NLP tools extract features.
3. Train machine learning scheme: learn an SVM classifier.
4. Model evaluation: manually annotated test set.

** Refaee and Rieser (2014). Can We Read Emotions from a Smiley Face? Emoticon-based distant supervision for subjectivity and sentiment analysis of Arabic Twitter feeds. In the 5th International Workshop on Emotion, Social Signals, Sentiment and Linked Open Data.
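The noisy-labelling step of distant supervision can be sketched as follows. This assumes emoticon-based labelling only, as named in the title of the cited paper; the emoticon inventories and function names are hypothetical, and hashtag-based labelling is left out.

```python
# Hypothetical emoticon inventories, for illustration only.
POSITIVE = (":)", ":-)", ":D")
NEGATIVE = (":(", ":-(")

def emoticon_label(tweet):
    """Return a noisy sentiment label, or None if no clear signal is present."""
    has_pos = any(e in tweet for e in POSITIVE)
    has_neg = any(e in tweet for e in NEGATIVE)
    if has_pos == has_neg:
        return None  # no emoticon at all, or conflicting emoticons
    return "positive" if has_pos else "negative"

def build_noisy_dataset(tweets):
    """Label tweets by emoticon and strip the emoticon from the text."""
    dataset = []
    for tweet in tweets:
        label = emoticon_label(tweet)
        if label is None:
            continue
        text = tweet
        for emoticon in POSITIVE + NEGATIVE:
            text = text.replace(emoticon, " ")
        # Collapse the whitespace left behind by the removed emoticons.
        dataset.append((" ".join(text.split()), label))
    return dataset
```

Removing the emoticon from the text is a common design choice in this setting, so that a classifier cannot simply memorise the labelling signal.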

Current Work
Annotate and release a newly collected gold-standard Arabic Twitter corpus*

Extended feature-sets:
- Twitter-specific features: has-hashtag, has-URL, is-favourite, is-retweet
- Social signals: has-consents, has-dazzle, has-laugh, has-regret, has-sigh
- Language style: MSA/DA, is-sarcastic
- Tweet category: tweet-category {politics, sport, social, religious, internet, commercial, etc.}

Corpus statistics:
- Number of instances: 6,894
- Word frequencies: 91,419
- Word tokens: 28,373

* Available via the ELRA repository; details described in [Refaee & Rieser, LREC 2014].

Please come and see my poster on May 29, 11:45-13:25. Session: Social Media Processing, P32, No. 317.

Thanks
Looking forward to hearing your feedback … or contact me.

DS for SSA of Social Networks in Other Languages

English
- Go et al. (2009)
  Auto-sentiment feature: emoticons
  Sentiment labels: positive vs. negative
  Feature-sets: unigrams, bigrams, and POS
  Classification schemes: NB, SVM, ME
  Results: best accuracy = 83%
- Bifet and Frank (2010)
  Auto-sentiment feature: emoticons
  Sentiment labels: positive vs. negative
  Feature-sets: unigrams
  Classification schemes: Multinomial NB, SGD
  Results: best accuracy = 82.45%
- Purver and Battersby (2012)
  Auto-sentiment feature: emoticons
  Sentiment labels: 6 emotion classes
  Feature-sets: unigrams
  Classification schemes: SVM
  Results: best F-score = 77.5% (detecting happiness)
- Suttles and Ide (2013)
  Auto-sentiment feature: emoticons, hashtags, and emoji
  Sentiment labels: 8 emotion classes (binary classification)
  Feature-sets: unigrams
  Classification schemes: NB, ME
  Results: best acc. 90.6% {joy vs. sadness}

Chinese
- Yuan and Purver (2012)
  Auto-sentiment feature: emoticons
  Sentiment labels: 6 emotion classes
  Feature-sets: character-based and word-based n-grams
  Classification schemes: SVM
  Results: best accuracy = 78.2% (detecting happiness)

Example of Annotation Disagreement

1. لنرى قوتكم يا ارهابيه بشار الاسد لنسحقكم ونحن لا نتشرف بلقياكم يا كلاب الناتو
   (Let's see your power, you terrorists of Bashar Al-Assad, to crush you; we do not even want to see you, you NATO dogs)
   Label: Negative

2. يوجد ايفون بين كل اربعة هواتف ذكية
   (There is an iPhone among every 4 smartphones)
   Annotator 1: Neutral (facts)
   Annotator 2: Neutral (no clear positive evaluation)

3. تعتبر السياحة مورد هام للاقتصاد البحريني حيث بلغ عدد السائحين في 2007 الى 4.8 مليون سائح ومتوقع ان يزداد بشكل كبير جدا
   (Tourism is considered an important source of revenue for Bahrain's economy, as the number of tourists in 2007 reached 4.8 M and is expected to increase enormously)
   Annotator 1: Positive (positive evaluation)
   Annotator 2: Neutral (news)

4. علمتنا الثورات العربية ان بشار الاسد عنده حق
   (The political revolution (Arab Spring) has taught us that Bashar Al-Assad is right)
   Annotator 1: Neutral (sarcastic view)
   Annotator 2: Negative (negative stance)

Methodology and Approach
1. Unlabelled tweets receive noisy labels (#hashtags and emoticons), giving automatically labelled tweets.
2. Arabic NLP tools extract features.
3. Train machine learning scheme: learn an SVM classifier.
4. Model evaluation: manually annotated test set.

Approach and Methodology
- Arabic Twitter corpora: build and annotate a Twitter corpus for SSA.
- Machine learning algorithm: apply a machine learning scheme, Support Vector Machines (SVM).
- Build a sentiment classifier: learn a statistical classifier that assigns a given text to:
  - subjective vs. objective
  - subjective positive vs. subjective negative
- Independent test set: evaluate and test how well the models generalise.

Experimental Settings

Pre-processing
- Remove re-tweets
- Normalize Latin characters, digits, URLs, user-names, hashtags
- Replace more than 2 consecutive repetitions of a character with only 2
- Apply a light Arabic stemmer
- Remove stop words

Problem formulations
- Two-stage binary classification: subjective vs. objective; positive vs. negative
- One-stage multi-class classification: positive vs. negative vs. neutral
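Most of the pre-processing steps above can be sketched with the standard library. The version below is illustrative only: it assumes re-tweets start with the token "RT", uses made-up placeholder tokens such as `<URL>`, takes the stop-word list as a parameter, and leaves out the light Arabic stemmer, which needs a dedicated tool.

```python
import re

URL_RE = re.compile(r"https?://\S+")
LATIN_RE = re.compile(r"[A-Za-z]+")
REPEAT_RE = re.compile(r"(.)\1{2,}")  # a character repeated 3 or more times

def normalize_token(token):
    """Map noisy token types to placeholders and squeeze character repeats."""
    if URL_RE.fullmatch(token):
        return "<URL>"
    if token.startswith("@"):
        return "<USER>"
    if token.startswith("#"):
        return "<HASHTAG>"
    if token.isdigit():
        return "<NUM>"
    if LATIN_RE.fullmatch(token):
        return "<LATIN>"
    # Keep at most 2 consecutive occurrences of any character.
    return REPEAT_RE.sub(r"\1\1", token)

def preprocess(tweets, stop_words=frozenset()):
    """Drop re-tweets, remove stop words, and normalise the remaining tokens."""
    cleaned = []
    for tweet in tweets:
        tokens = tweet.split()
        if tokens and tokens[0] == "RT":  # assumed re-tweet marker
            continue
        cleaned.append([normalize_token(t) for t in tokens if t not in stop_words])
    return cleaned
```

Whitespace tokenisation is used here for brevity; a real pipeline would tokenise Arabic text with a morphological tool before stemming.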
