Sentiment and Textual Analysis of CreateDebate Data
Poorva Potdar
EECS 595 - End Term Project
EUREKA!! - Getting the Idea
- Why sentiment analysis? There is a huge amount of opinionated text on the web.
- Sentiment analysis on the web gauges the popularity of a product, a movie, or a person.
- Idea: CreateDebate is an online debating forum where people argue for/against a topic. Mine it for the salient text features of agreement/disagreement posts.
Creating the Haystack....
- Debates... sentences, posts, users: a labeled dataset (Neutral / Agreement / Disagreement).
- Structural analysis: which features of the language in a post make it a high-scoring agreement/disagreement post?
- Behavioral analysis: which aspects of a user's behavior give them a high rank on the forum?
What's the Gain?
- Influence detection in a community
- Sub-group detection
- Stance identification: are there visible groups with a particular stance?
- Predicting the crowd trend for a particular topic of interest
- Text summarization
Finding the needle - structural features....
- Is there a correlation between the polarity of a post and its score?
- Is there a popular pattern in the dependency parses of agreement/disagreement posts?
- Do emoticons play a role?
- Are posts with formal text up-voted more often?
Experiment 1: Polarity Measure
- Intuition: is the number of positive/negative words indicative of how popular a post is?
- Tools: OpinionFinder / WordNet.
- Sample OpinionFinder output for "I think it's wrong to assume that in order to be a revolutionary thinker you have to be crazy":
  - MPQAPOL: the polarity of a word (e.g., "bad")
  - MPQASRC: the opinion source in the sentence (e.g., "It")
  - MPQASD: a direct subjective expression in the sentence (e.g., "said")
- Result: no evident correlation between the number of polar words and the rank of a post. Authors use a roughly equal distribution of positive and negative words when expressing agreement or disagreement.
[Chart: counts of positive and negative words in agreement vs. disagreement posts]
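The polar-word count behind this experiment can be sketched as follows. The tiny POSITIVE/NEGATIVE word lists are illustrative stand-ins for the OpinionFinder/WordNet polarity lexicons actually used:

```python
from collections import Counter
import re

# Toy lexicons for illustration only; the project derived polarity
# labels from OpinionFinder / WordNet, not from hand-picked sets.
POSITIVE = {"good", "great", "like", "agree", "right"}
NEGATIVE = {"bad", "wrong", "crazy", "hate", "disagree"}

def polarity_counts(post: str) -> tuple[int, int]:
    """Count positive and negative lexicon words in a post."""
    tokens = Counter(re.findall(r"[a-z']+", post.lower()))
    pos = sum(tokens[w] for w in POSITIVE)
    neg = sum(tokens[w] for w in NEGATIVE)
    return pos, neg

pos, neg = polarity_counts(
    "I think it's wrong to assume that in order to be a "
    "revolutionary thinker you have to be crazy"
)
print(pos, neg)  # 0 2 -- "wrong" and "crazy" are the polar words
```

Correlating these counts with post rank (e.g., via Pearson's coefficient) is what the experiment found to be uninformative.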
Experiment 2: Readability Measure
- Intuition: do posts that are more readable/formal gain higher scores?
- Tool: the Flesch toolkit, to compute a Flesch readability score for each post.
- Calculated Pearson's coefficient between the labeled score and the Flesch score of each post.
- Result: high correlation; the more formal the language of a post, the more points are associated with it.
- Example 1: "good times...bring it back ! =-=-=-=-=-=-=-=-=- =-=-=-==-=-=- ))))))))))))" [Flesch: 0, labeled points: 1]
- Example 2: "Vegetables is often seen as more healthy than eating meat." [Flesch: 93.12, labeled points: 29 (max)]
[Chart: Pearson's correlation of Flesch readability with score for agreement vs. disagreement posts]
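The Flesch Reading Ease score itself is simple to reproduce. This sketch uses a crude vowel-group syllable counter, so its numbers will differ from the Flesch toolkit's (it will not reproduce the 93.12 above exactly):

```python
import re

def count_syllables(word: str) -> int:
    # Crude heuristic: one syllable per run of consecutive vowels.
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def flesch_reading_ease(text: str) -> float:
    """Flesch Reading Ease (higher = easier to read):
    206.835 - 1.015*(words/sentences) - 84.6*(syllables/words)."""
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z']+", text)
    if not words:
        return 0.0
    syllables = sum(count_syllables(w) for w in words)
    return (206.835
            - 1.015 * len(words) / sentences
            - 84.6 * syllables / len(words))

score = flesch_reading_ease(
    "Vegetables is often seen as more healthy than eating meat."
)
print(round(score, 2))
```

Note that Flesch rewards short words and sentences, so "formal" here really means conventionally punctuated, dictionary-word prose rather than sophisticated vocabulary.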
Experiment 3: Emoticon Analysis
- Intuition: do emoticons in agreement/disagreement posts have any correlation with the labeled scores?
- Tool: the CMU ARK tagger (the Stanford Parser doesn't scale well to noisy forum text).
- Calculated Pearson's coefficient between the labeled score and the number of positive/negative emoticons in agreement/disagreement posts.
- Result: high correlation between the number of emoticons and the rank of disagreement posts.
- Analysis: authors tend to use expressive emoticons such as smileys to voice a sarcastic opinion about an argument, e.g., "Hey! What's that supposed to mean? ;)" or "Sure, if you say so :P".
[Chart: counts of positive and negative emoticons in agreement vs. disagreement posts]
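Counting emoticons reduces to substring matching once a tokenizer has isolated them. The inventories below are small illustrative lists, not the CMU ARK tagger's full emoticon set:

```python
# Illustrative emoticon inventories; the project relied on the CMU ARK
# tagger to recognize emoticons robustly in noisy forum text.
POSITIVE_EMOTICONS = [":)", ":-)", ":P", ";)", ":D"]
NEGATIVE_EMOTICONS = [":(", ":-(", ":/", ">:("]

def emoticon_counts(post: str) -> tuple[int, int]:
    """Count positive and negative emoticons in a post."""
    pos = sum(post.count(e) for e in POSITIVE_EMOTICONS)
    neg = sum(post.count(e) for e in NEGATIVE_EMOTICONS)
    return pos, neg

print(emoticon_counts("Hey! What's that supposed to mean? ;)"))  # (1, 0)
print(emoticon_counts("Sure, if you say so :P"))                 # (1, 0)
```

Both sarcastic disagreement examples above carry nominally "positive" emoticons, which is exactly why raw emoticon counts, not their polarity, correlated with disagreement rank.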
Experiment 4: Dependency Parse
- Intuition: do highly ranked agreement/disagreement posts exhibit a popular dependency pattern? Agreement posts tend to express agreement early in the post, while disagreement is milder.
- Tool: the Stanford Parser, for syntactic and dependency parses of the posts.
- Pipeline: Stanford Parser + ExtractDependencies, code to traverse PRP to PRP$, and SentiWordNet for word polarity.
- Result: many highly ranked agreement posts begin with the dependency pattern I->nsubj->+ve ("I agree to", "I like your point", "I up-voted your argument"). For example, "I have to agree. Blah blah" parses as I->nsubj->have->xcomp->agree->End, generalizing to I->nsubj->+ve->xcomp->+ve->End.
[Chart: Pearson's coefficient of the I->nsubj->+ve pattern with score for agreement posts]
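A minimal version of the pattern check, assuming the parser's output has already been flattened to (head, relation, dependent) triples; the POSITIVE set stands in for SentiWordNet lookups:

```python
# Stand-in for SentiWordNet: words treated as positive for the demo.
POSITIVE = {"agree", "like", "up-voted", "love"}

def matches_agreement_pattern(triples: list[tuple[str, str, str]]) -> bool:
    """True if the parse contains I ->nsubj-> head where the head, or a
    verb it governs via xcomp, is positive: I->nsubj->+ve[->xcomp->+ve]."""
    for head, rel, dep in triples:
        if rel == "nsubj" and dep.lower() == "i":
            if head.lower() in POSITIVE:
                return True  # direct match, e.g. "I agree"
            # Follow one xcomp hop: "I have to agree" -> xcomp(have, agree)
            for h2, r2, d2 in triples:
                if h2 == head and r2 == "xcomp" and d2.lower() in POSITIVE:
                    return True
    return False

# "I have to agree." -> nsubj(have, I), xcomp(have, agree)
parse = [("have", "nsubj", "I"), ("have", "xcomp", "agree")]
print(matches_agreement_pattern(parse))  # True
```

In the actual pipeline these triples would come from the Stanford Parser's typed-dependency output rather than being hand-written.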
Finding the needle - behavioral features....
- Does the author start with a neutral post?
- What is the author's time of entry into the discussion?
- How many times, on average, does an author participate in a thread?
- Does the author participate in agreement or disagreement discussions?
Which Authors Get the Highest Rank? - 1
- Intuition: does the average number of times an author participates in a thread correlate with their ranking?
- Measure: Pearson's coefficient between author points and the average number of times an author participates in a thread.
- Result: a clear positive correlation between an author's points and the number of discussion posts they contribute per thread.
Which Authors Get the Highest Rank? - 2
- Intuition: do authors who join an existing discussion, or who start a new thread, get a higher rank?
- Measure: Pearson's coefficient for authors who agree, authors who disagree (0.770), and authors who start a new thread.
- Result: rating of authors who agree > rating of authors who disagree > rating of authors who start a new debate. Authors who participate more in discussions are more popular.
Which Authors Get the Highest Rank? - 3
- Intuition: do authors who participate early or late in a discussion earn a higher ranking?
- Measure: Pearson's coefficient for authors who participate early vs. authors who participate late.
- Result: authors who participate late in a discussion are likely to have a higher ranking; intuitively, authors who arrive late already know the opinion bias of the thread. Participating early doesn't help ranking.
Get the Ranking of Authors w.r.t. the Features
- Trained a linear regression model using Weka's LibSVM and obtained a predicted ranking of all authors based on the features.
- Computed a correlation coefficient by comparing these predicted rankings against the gold-standard rankings.
- Result: the feature-vector set shows a decent correlation with the actual rankings.
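As a stand-in for the Weka/LibSVM pipeline, a single-feature ordinary-least-squares fit plus Pearson's r (the statistic used throughout these slides) can be sketched in pure Python; the data here is made up for illustration:

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson's correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def fit_line(xs, ys):
    """Ordinary least squares y = a*x + b on a single feature."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    return a, my - a * mx

# Hypothetical feature/target pairs: posts per thread vs. author points.
posts_per_thread = [1, 2, 3, 5, 8]
author_points = [3, 5, 6, 11, 15]

a, b = fit_line(posts_per_thread, author_points)
predicted = [a * x + b for x in posts_per_thread]
print(round(pearson(predicted, author_points), 3))  # 0.993
```

The real model used several features at once, so it needed a multivariate learner; this one-feature sketch only illustrates the "predict, then correlate with gold rankings" evaluation loop.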
Future Work
- This project looked at some of the structural and behavioral features of CreateDebate posts.
- OpinionFinder also labels sentences as subjective or objective; a future experiment is to test whether the subjectivity/objectivity of sentences correlates with post score.
- Does the length of a post matter?
- Going forward: consolidate all of these features and results in a database and make it available as an open-source dataset.
Thank You!