SENTIMENT ANALYSIS OF ARABIC SOCIAL NETWORKS
Presented by Eshrag Refaee
Supervisors: Dr. Verena Rieser & Prof. Rob Pooley

OUTLINE
- The concept of sentiment analysis
- Arabic as a morphologically rich language
- Aims of the research
- Sentiment analysis in English and Arabic literature
- Twitter corpus: collection and annotation
- Empirical work
- Results and evaluation
- Future work

SENTIMENT ANALYSIS
Definition: analysing and understanding people's sentiments, evaluations, opinions, attitudes, and emotions from written text.
- Research on sentiment analysis (SA) emerged in the early 2000s (Liu, 2012).
- SA is one of the most active research areas in NLP.

APPLICATIONS
In addition to its significance as a major sub-field of Natural Language Processing (NLP) research, subjectivity and sentiment analysis (SSA) has several potential applications:
- Commercial applications, e.g. measuring the success of a product
- Social applications
- Political applications
- Economic applications

SENTIMENT ANALYSIS OF SOCIAL NETWORKS
- The growing importance of sentiment analysis coincides with the growth of social media such as reviews, forum discussions, and micro-blogs.
- A social network like Twitter, with more than 500 million active users (ALEXA, 2012), provides a global arena for users to share views, attitudes, and preferences, and to discuss points of agreement and/or conflict.
- In March 2012, Twitter became available in Arabic (Twitter Blog, 2012).

ABOUT ARABIC
Arabic is the language of an aggregate population of over 300 million people. It is the first language of the 22 member countries of the Arab League and an official language in three others (Habash, 2010).

ABOUT ARABIC
The Arabic language can be classified into three major levels:
- Classical Arabic (CA)
- Modern Standard Arabic (MSA)
- Arabic Dialects (AD)
Social networks use AD and MSA side by side (Al-Sabbagh and Girju, 2012).

AIMS
- Address the bottleneck in the availability of NLP resources for studying SA in the Arabic micro-blog genre by constructing a corpus of Arabic tweets, a subset of which is annotated for sentiment analysis.
- Use the corpus to build and test models of sentiment analysis.
- Employ freely available Arabic NLP tools to annotate language-specific features, including part-of-speech tags and morphological analysis.
- Evaluate the quality of these features by measuring their contribution to the SA classification task.

AIMS OF THIS RESEARCH
- Construct a corpus of Arabic tweets for sentiment analysis.
- Build and test classification models for automatic sentiment analysis.
- Explore distant supervision approaches to build efficient models for the changing Twitter stream.

SENTIMENT ANALYSIS OF ENGLISH TEXT
Feature-set columns: Word tokens, Semantic Feat., Stylistic Feat., n-grams, Morph, Unique Domain, POS, User: PER/ORG, Statistical Feat.

Publication | Classification schemes | Results | Targeted language
Yu and Hatzivassiloglou (2003) | NB | Acc. 91 | English (newswire articles, question-answering)
Abbasi et al. (2008) | SVM; 10-fold CV; 2-stage classification | Best acc. | English and Arabic forums, movie reviews
Osherenko (2008) | SVM | Precision 44%, recall 42% | English (759 sentences)
Wilson et al. (2009) | BoosTexter, TiMBL, Ripper, SVM | (1) Perfect neutral classification (manual): BL 78.7, SVM 81.6. (2) Automatic neutral detection: SVM 64; neutral-polar: SVM 75.3 | English (question-answering opinion corpus)
Bifet and Frank (2010) | Multinomial NB, SGD | Best acc.: NB, SGD | English tweets (automatic annotation using emoticons)
Pak and Paroubek (2010) | NB, SVM | 60% F | English tweets
Purver and Battersby (2012) | SVM; 10-fold CV; six-class emotion detection | 77.5% F for happiness on manual test set | English tweets; distant learning (automatic annotation using emoticons), noisy labels

SENTIMENT ANALYSIS OF ARABIC TEXT
Feature-set columns: Word tokens, Semantic Feat., Stylistic Feat., n-grams, Morph, Unique Domain, POS, User: PER/ORG, Statistical Feat.

Publication | Classification schemes | Results | Targeted language
Abbasi et al. (2008) | SVM; 10-fold CV; 2-stage classification | Best acc. | English and Arabic forums, movie reviews
Farra et al. (2010) | SVM, J48; 10-fold CV | Acc.: grammatical 89.3 / semantic 80 | Arabic movie reviews (44)
Abdul-Mageed et al. (2011) | SVM; 2-stage classification (-neutral); manual polarity MSA lexicon; stem+morph+ADJ; 5-fold CV | F (with the best configuration) | Modern Standard Arabic
El-Halees (2011) | Maximum entropy, k-nearest, NB, SVM | Best acc. | Arabic forum posts (1143)
Itani et al. (2012) | Naïve Bayes | Best acc. | Arabic (Facebook posts)
Mourad and Darwish (2013) | NB and SVM; 2-stage (sentiment: only positive vs. negative); 10-fold CV | Best acc. on tweets: SUBJ 64.1, SENTI 72.5 | Arabic tweets (2,300, manual annotation)

APPROACH AND METHODOLOGY
Arabic Twitter corpus
- Build and annotate a Twitter corpus for SSA.
Machine learning algorithm
- Apply a machine learning scheme: Support Vector Machines (SVM), Naïve Bayes (NB), Decision Tree (J48).
Build a sentiment classifier
- Learn a statistical classifier to discriminate a given text as:
  - subjective vs. objective
  - subjective positive vs. subjective negative
Evaluate and test the models' ability to generalise
- 10-fold cross-validation
- Independent test set
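The two-stage scheme on this slide (subjective vs. objective first, then positive vs. negative for subjective text only) can be sketched as a simple cascade. This is a minimal illustration, not the presenter's implementation: the function `two_stage_label`, the cue-word sets, and the two toy classifiers are hypothetical stand-ins for trained SVM, NB, or J48 models.

```python
from typing import Callable

# A stage is any function that maps a text to a yes/no decision.
Classifier = Callable[[str], bool]


def two_stage_label(text: str, is_subjective: Classifier, is_positive: Classifier) -> str:
    """Stage 1 filters out objective text; stage 2 assigns polarity."""
    if not is_subjective(text):
        return "objective"
    return "positive" if is_positive(text) else "negative"


# Hypothetical keyword rules standing in for trained models (assumption).
POSITIVE_CUES = {"great", "beauty"}
NEGATIVE_CUES = {"unfortunately", "bad"}


def toy_is_subjective(text: str) -> bool:
    words = set(text.lower().split())
    return bool(words & (POSITIVE_CUES | NEGATIVE_CUES))


def toy_is_positive(text: str) -> bool:
    words = set(text.lower().split())
    return len(words & POSITIVE_CUES) >= len(words & NEGATIVE_CUES)


if __name__ == "__main__":
    for tweet in ["unbelievable beauty",
                  "unfortunately we use the iPhone",
                  "a new case reported in China"]:
        print(tweet, "->", two_stage_label(tweet, toy_is_subjective, toy_is_positive))
```

Keeping each stage as a separate callable means either one can be swapped or evaluated on its own, which mirrors how the two classification levels are reported separately.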

BUILDING TRAINING SET 1: DEFINING THE ANNOTATION SCHEME

Label: Polar
Definition: Positive or negative emotion, evaluation, or attitude.
Example: السياحة في اليمن جمال لا يصدق ("Tourism in Yemen, unbelievable beauty")

Label: Positive
Definition: Clear positive indicator.
Example: كم انت عظيم يا بشار الاسد ("How great you are, Bashar Al-Asad")

Label: Negative
Definition: Clear negative indicator.
Example: حنا للأسف نستخدم ايفون ("Unfortunately, we use the iPhone")

Label: Neutral
Definition:
- Simple factual statement or news
- Open questions with no emotions indicated
- Undeterminable indicators, neither positive nor negative
Examples:
- وفاة جديدة بإتش 7 إن 9 بالصين ("A new reported death case with H7N9 in China")
- كيف انقطعت الإنترنت عن سوريا؟ ("How was the Internet disconnected from Syria?")
- لمساواة في قمع الحريات الشخصية عدل ("Equality in suppressing personal freedoms is justice")

BUILDING TRAINING SET 2: AGREEMENT STUDY
We conducted an inter-annotator agreement study on a subset of 677 of the annotated tweets. We use Cohen's Kappa (Cohen, 1960), which measures the degree of agreement among the assigned labels, correcting for agreement by chance:

\[ \kappa = \frac{\Pr(a) - \Pr(e)}{1 - \Pr(e)} \]

where Pr(a) is the observed agreement among annotators and Pr(e) is the probability of agreement by chance among annotators. The overall observed agreement is 84.79% and the resulting weighted Kappa reached 0.756, which indicates reliable annotation.
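As a worked illustration of the agreement measure, the sketch below computes plain (unweighted) Cohen's Kappa for two annotators. Note the assumptions: the slide reports a weighted Kappa, which this sketch does not implement, and the function name `cohen_kappa` and the six toy labels are made up for the example, not taken from the corpus.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Unweighted Cohen's Kappa for two annotators over the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both annotators must label the same non-empty item list")
    n = len(labels_a)
    # Pr(a): fraction of items on which the two annotators agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Pr(e): chance agreement from each annotator's label distribution.
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used a single identical label throughout
    return (observed - expected) / (1 - expected)


# Toy labels (assumption): the annotators disagree on one item out of six.
annotator_1 = ["pos", "pos", "neg", "neu", "neg", "pos"]
annotator_2 = ["pos", "neg", "neg", "neu", "neg", "pos"]
kappa = cohen_kappa(annotator_1, annotator_2)
```

Here Pr(a) = 5/6 and Pr(e) = 13/36, so Kappa = 17/23, roughly 0.74, which is noticeably lower than the raw agreement of about 0.83.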

OUR ARABIC TWITTER CORPUS
- Refaee, E. and Rieser, V. (2014). An Arabic Twitter Corpus for Subjectivity and Sentiment Analysis. Proceedings of the 9th International Conference on Language Resources and Evaluation (LREC 2014), Reykjavik, Iceland.
- The corpus is freely available from the LREC repository.

BUILDING TRAINING SET: FEATURE EXTRACTION & FEATURE VECTOR CONSTRUCTION
Pipeline stages shown on the slide, in order:
- Raw tweets
- An Arabic Twitter corpus
- Text cleaning-up
- Sentiment annotation
- Feature extraction
- Pre-processing: build feature vector
- Classifier/learner
- Class of a new document
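The cleaning and feature-vector steps of this pipeline can be sketched with a plain bag-of-words representation. This is a minimal sketch under stated assumptions: the cleaning rules (dropping URLs and @mentions), the whitespace tokenisation, and the names `clean_tweet`, `build_vocabulary`, and `to_vector` are illustrative choices, not the rules or tools used for the corpus. English tokens are used only to keep the example easy to read.

```python
import re

URL_PATTERN = re.compile(r"https?://\S+")
MENTION_PATTERN = re.compile(r"@\w+")


def clean_tweet(text):
    """Drop URLs and @mentions, then split on whitespace (assumed cleaning rules)."""
    text = URL_PATTERN.sub(" ", text)
    text = MENTION_PATTERN.sub(" ", text)
    return text.split()


def build_vocabulary(token_lists):
    """Assign each distinct training token a column index, in first-seen order."""
    vocabulary = {}
    for tokens in token_lists:
        for token in tokens:
            if token not in vocabulary:
                vocabulary[token] = len(vocabulary)
    return vocabulary


def to_vector(tokens, vocabulary):
    """Count occurrences of known tokens; tokens outside the vocabulary are ignored."""
    vector = [0] * len(vocabulary)
    for token in tokens:
        index = vocabulary.get(token)
        if index is not None:
            vector[index] += 1
    return vector


training_tokens = [clean_tweet("great phone"), clean_tweet("bad phone")]
vocabulary = build_vocabulary(training_tokens)
new_tweet = clean_tweet("@user great great phone new http://t.co/abc")
vector = to_vector(new_tweet, vocabulary)
```

The vocabulary is built from training data only, so a new document is always mapped to a vector of the same length that the classifier was trained on.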

EXPERIMENTAL SETTINGS
a. Machine learners
We use the implementations of the following algorithms provided by the WEKA data mining package (Witten and Frank, 2005):
- Naïve Bayes (NB)
- Trees (J48)
NB is a simple probabilistic classifier that assumes the features are independent. J48 is a statistical model that generates a decision tree used for classification.
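To make the independence assumption concrete, the sketch below scores each class as a prior multiplied by one probability per token, with every token treated independently of the others. It is a from-scratch multinomial variant with add-one smoothing, written for illustration; it is not WEKA's implementation, and the function names and toy training data are assumptions.

```python
import math
from collections import Counter, defaultdict


def train_nb(token_lists, labels):
    """Collect class frequencies and per-class token counts."""
    class_counts = Counter(labels)
    token_counts = defaultdict(Counter)
    vocabulary = set()
    for tokens, label in zip(token_lists, labels):
        token_counts[label].update(tokens)
        vocabulary.update(tokens)
    return class_counts, token_counts, vocabulary


def predict_nb(tokens, model):
    """Pick the class with the highest log prior plus summed token log-likelihoods."""
    class_counts, token_counts, vocabulary = model
    total_documents = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(class_counts):
        score = math.log(class_counts[label] / total_documents)
        # Add-one smoothing: every vocabulary token gets one extra count.
        denominator = sum(token_counts[label].values()) + len(vocabulary)
        for token in tokens:
            if token in vocabulary:  # unseen tokens carry no evidence here
                score += math.log((token_counts[label][token] + 1) / denominator)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy training data (assumption), not drawn from the corpus.
model = train_nb(
    [["great", "beauty"], ["love", "great"], ["bad", "sadly"], ["sadly", "broken"]],
    ["positive", "positive", "negative", "negative"],
)
```

Working in log space avoids numerical underflow when many small probabilities are multiplied together.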

EXPERIMENTAL SETTINGS
a. Machine learners (continued)
We use the implementations of the following algorithms provided by the WEKA data mining package (Witten and Frank, 2005):
- Support Vector Machines (SVM), trained with Sequential Minimal Optimization, SMO (Platt, 1999)
- ZeroR (baseline scheme)
SVM aims to identify the optimal hyperplane that linearly separates data instances with the maximum margin.
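A ZeroR-style baseline ignores all features and always predicts the most frequent class in the training data, which gives the floor that any real classifier must beat. The sketch below is an illustrative re-implementation of that idea rather than WEKA's code; `zero_r` and the toy labels are assumed names and data.

```python
from collections import Counter


def zero_r(training_labels):
    """Return a predictor that always outputs the majority training class."""
    if not training_labels:
        raise ValueError("training_labels must not be empty")
    majority_label = Counter(training_labels).most_common(1)[0][0]

    def predict(_instance):
        # The instance is ignored on purpose: no features are used.
        return majority_label

    return predict


# Toy label distribution (assumption): neutral is the most frequent class.
baseline = zero_r(["neutral", "positive", "neutral", "negative", "neutral"])
test_gold = ["neutral", "positive", "neutral", "negative"]
baseline_accuracy = sum(baseline(None) == gold for gold in test_gold) / len(test_gold)
```

With skewed class distributions this baseline can look deceptively strong on accuracy, which is one reason to report F-measure alongside it.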

EXPERIMENTAL SETTINGS
b. Evaluation metrics
The results are evaluated with respect to two statistical measurements.

F-measure (F) is the harmonic mean of precision and recall:

\[ F = \frac{2 \cdot \text{precision} \cdot \text{recall}}{\text{precision} + \text{recall}} \]

where precision is the fraction of retrieved instances that are relevant, and recall is the fraction of relevant instances that are retrieved.

Accuracy is the percentage of correctly classified instances:

\[ \text{accuracy} = \frac{\text{number of correctly classified instances}}{\text{total number of instances}} \times 100\% \]

For all experiments, the machine learners were run 100 times for each data set (10 repetitions × 10-fold cross-validation).
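The two metrics and the 10 × 10 evaluation protocol can be sketched as follows. This is a minimal sketch under assumptions: the function names, the toy labels, and the simple unstratified shuffling are illustrative, and the way WEKA itself forms its folds is not reproduced here.

```python
import random


def precision_recall_f(gold, predicted, target):
    """Precision, recall and F-measure for one target class."""
    pairs = list(zip(gold, predicted))
    true_pos = sum(g == target and p == target for g, p in pairs)
    false_pos = sum(g != target and p == target for g, p in pairs)
    false_neg = sum(g == target and p != target for g, p in pairs)
    precision = true_pos / (true_pos + false_pos) if true_pos + false_pos else 0.0
    recall = true_pos / (true_pos + false_neg) if true_pos + false_neg else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


def accuracy(gold, predicted):
    """Fraction of instances whose predicted label matches the gold label."""
    return sum(g == p for g, p in zip(gold, predicted)) / len(gold)


def repeated_kfold(n_items, k=10, repeats=10, seed=0):
    """Yield (train, test) index lists for `repeats` reshuffled k-fold splits."""
    rng = random.Random(seed)
    for _ in range(repeats):
        indices = list(range(n_items))
        rng.shuffle(indices)
        for fold in range(k):
            test = indices[fold::k]  # every k-th shuffled index forms one fold
            held_out = set(test)
            train = [i for i in indices if i not in held_out]
            yield train, test


gold = ["pos", "pos", "neg", "neg", "pos"]
predicted = ["pos", "neg", "neg", "pos", "pos"]
precision, recall, f_measure = precision_recall_f(gold, predicted, "pos")
splits = list(repeated_kfold(50))
```

For the toy labels, precision and recall for the `pos` class are both 2/3, so F is also 2/3, while accuracy is 3/5. The default arguments of `repeated_kfold` produce the 100 train/test runs described above.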

RESULTS AND EVALUATION
2-level classification: subjective vs. objective
[Chart: baseline vs. SVM across feature sets: tokens, morphological features, semantic features, stylistic features]

RESULTS AND EVALUATION
2-level classification: positive vs. negative
[Chart: baseline vs. SVM across feature sets: tokens, morphological features, semantic features, stylistic features]

RESULTS AND EVALUATION
Single-level classification: positive vs. negative vs. neutral
[Chart: baseline vs. SVM across feature sets: tokens, morphological features, semantic features, stylistic features]

CURRENT DIRECTION OF RESEARCH
- Apply semi-supervised learning to automatically annotate the rest of our Twitter corpus.
- Investigate distant learning approaches to obtain a large training set for optimising the models.
- Build a high-quality polarity lexicon to be employed in automatically identifying the overall sentiment orientation of a given text.
- Explore culture-related features that can detect cultural references in user-generated text.