SOCIAL COMPUTING Homework 3 Presentation

SOCIAL COMPUTING Homework 3 Presentation By Arun Sharma

Data Chosen
The data chosen is Twitter data from people's profiles. It includes their tweet text, number of followers, retweets, etc.
Collection tool: the Tweepy Twitter library, which lets you search for specific Twitter handles or hashtags based on your query.

Data Collected from Established Leaders: Barack Obama, Narendra Modi, Praveen Swami, Amitabh Bachchan, Sachin Tendulkar

Data Collected from Self-Made Leaders:
Used an article from India Today to collect the names of famous Twitter celebrities: http://www.outlookindia.com/magazine/story/look-whos-chasing-the-twitter-god/280458
Gabbar Singh, Ikaveri, Jaihind, Jhunjhunwala, GreatBong, RoflIndian

ESTABLISHING GROUND TRUTH
Three human annotators were used for each group of people. The annotators filled out a form based on a subset of the data provided to them. The majority decision on each annotated tweet, for each parameter, was taken as ground truth. In case of a tie, another human annotator was consulted and the resulting majority decision was used.
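The majority-vote rule described on this slide can be sketched as follows. This is only an illustration under stated assumptions: the function name `majority_label` and the `tie_breaker` argument are hypothetical, and the slides do not show how the votes were actually tallied.

```python
from collections import Counter


def majority_label(votes, tie_breaker=None):
    """Return the majority label among annotator votes.

    votes: list of labels, one per annotator.
    tie_breaker: hypothetical extra vote from another annotator,
                 consulted only when the original votes are tied.
    Returns None when a tie cannot be resolved.
    """
    counts = Counter(votes).most_common()
    # A tie exists when the two most frequent labels have equal counts.
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        if tie_breaker is None:
            return None  # unresolved: another annotator is needed
        return majority_label(votes + [tie_breaker])
    return counts[0][0]
```

With three annotators and three possible labels, a tie occurs only when each annotator picks a different label, so a single extra vote is enough to settle it.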

ESTABLISHING GROUND TRUTH
For every tweet, the annotators had to choose one of the following options: Positive, Negative, Neutral

Features Chosen and How to Extract Them
I worked on the following features:
Sentiment involved: Positive/Negative/Neutral
Unigram/Bigram/Trigram/Line Length
Pronoun Count (Stanford CoreNLP & code from HW1)
An existing open-source tool such as Weka or LightSide was used to predict sentiment from the annotated data collected while establishing ground truth.
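A rough sketch of how the n-gram, line-length and pronoun-count features could be computed is given below. It is not the HW1 code: the small hand-written `PRONOUNS` set is an assumption that stands in for the part-of-speech tagging Stanford CoreNLP would provide, and measuring line length in tokens is likewise an assumption.

```python
import re

# Assumption: a short hand-written pronoun list replaces POS tagging.
PRONOUNS = {
    "i", "me", "my", "mine", "we", "us", "our", "ours",
    "you", "your", "yours", "he", "him", "his", "she", "her", "hers",
    "it", "its", "they", "them", "their", "theirs",
}


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def ngrams(tokens, n):
    """Return all contiguous n-token tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def extract_features(text):
    """Compute the per-tweet features listed on the slide."""
    tokens = tokenize(text)
    return {
        "unigrams": ngrams(tokens, 1),
        "bigrams": ngrams(tokens, 2),
        "trigrams": ngrams(tokens, 3),
        "line_length": len(tokens),  # assumption: length in tokens
        "pronoun_count": sum(1 for t in tokens if t in PRONOUNS),
    }
```

For example, `extract_features("We love our fans")` yields three bigrams, two trigrams, a line length of 4 and a pronoun count of 2.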

Experimental Methodology
1. Targeted data collection using Tweepy.
2. Data cleaning.
3. Establishing ground truth using human annotators on a subset of the chosen data.
4. Extracting the chosen features using the available tools.
5. Analyzing the results and comparing them across the chosen groups.

Ground Truth
Using human annotators, the following sentiment counts were obtained:
Total Tweets: 526
Positive Tweets: 304
Negative Tweets: 165
Neutral Tweets: 87

Classifier for Sentiment Extraction
I used a Support Vector Machine to train on and predict my data. In the annotated data used to train the classifier, approximately 60% of the tweets were positive, the rest being negative or neutral. A 15-fold cross-validation was performed on the dataset.
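To make the 15-fold setup concrete, here is a minimal sketch of how a dataset can be partitioned into k folds. It is an illustration only: tools such as Weka and LightSide handle the splitting internally, the slides do not say how the folds were formed, and the helper name `k_fold_indices` is hypothetical.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for each of k folds."""
    indices = list(range(n_samples))
    # Spread the remainder so fold sizes differ by at most one.
    fold_sizes = [
        n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)
    ]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size
```

With 526 annotated tweets and k = 15, one fold holds 36 tweets and the other fourteen hold 35 each; every tweet is used for testing exactly once. In practice the data would usually be shuffled before splitting.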

Precision and Recall Obtained
The precision and recall values obtained for the Positive, Negative and Neutral classes are as follows. The results show high precision and recall for positive predictions and lower values for the other two classes. This is because the collected data had far fewer negative and neutral examples to train on than positive ones.
Sentiment   Precision (%)   Recall (%)
Positive    74.917          72.99
Negative    47.979          49.686
Neutral     45.977          47.059
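For reference, per-class precision and recall can be computed from the true and predicted labels as sketched below. The function name and the toy labels in the usage note are hypothetical and are not the experiment's data.

```python
def precision_recall(true_labels, predicted_labels, label):
    """Return (precision, recall) for one class label."""
    pairs = list(zip(true_labels, predicted_labels))
    # True positives: predicted as the label and actually the label.
    tp = sum(1 for t, p in pairs if t == label and p == label)
    # False positives: predicted as the label but actually another class.
    fp = sum(1 for t, p in pairs if t != label and p == label)
    # False negatives: actually the label but predicted as another class.
    fn = sum(1 for t, p in pairs if t == label and p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall
```

For instance, with true labels `["Positive", "Positive", "Negative", "Neutral"]` and predictions `["Positive", "Negative", "Negative", "Positive"]`, the Positive class gets a precision of 0.5 and a recall of 0.5.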

Dataset 1: Sentiment
Sentiment extracted after applying the classifier:
Total Positive Identified: 450
Total Negative Identified: 154
Total Neutral Identified: 114
Total Tweets: 717

Dataset 2: Sentiment
Sentiment extracted after applying the classifier:
Total Positive Identified: 361
Total Negative Identified: 306
Total Neutral Identified: 234
Total Tweets: 900

Pronoun Count: Dataset 1 & 2
Dataset 1       Pronoun Count   Dataset 2       Pronoun Count
Obama           142             GreatBong       151
PraveenSwami    147             ikaveri         123
Narendra Modi   184             Jaihind         216
SrBachchan      92              jhunjhunwala    114
Sachin          191             roflindian      121

Pronoun Count: Total and Average
Total pronoun count in Dataset 1 (Established Leaders): 816
Total pronoun count in Dataset 2 (Self-Made Leaders): 756
Average pronoun count in Dataset 1: 163.2
Average pronoun count in Dataset 2: 126

Likes & Retweets
Total:
                   Dataset 1   Dataset 2
Total Likes        2843745     9830
Total Retweets     1105902     28837
Average:
                   Dataset 1   Dataset 2
Average Likes      568,749     1683
Average Retweets   221,180     4806

Likes & Retweets: Reasons
The gap in the number of likes and retweets between the Twitter users in Dataset 1 and Dataset 2 is huge. This can be attributed to the fact that already established leaders cater to a larger segment of the Twitter population, whereas the self-made ones cater to the niche followings they have built in their own domains.

Number of Followers and Following
Total:
                                Dataset 1     Dataset 2
Total No. of Followers          124950509     585744
Total Number of Following       639119        4339
Average:
                                Dataset 1     Dataset 2
Average no. of followers/user   24990101.8    97624
Average no. of following/user   127823        723

Does Not Convey the Whole Picture: Followers and Following
Dataset 1
Name of User       Followers   Following
Obama              73988163    636613
Narendra Modi      19619806    1372
Amitabh Bachchan   20643658    999
Sachin             10642322    15
Praveen Swami      56560       120
Dataset 2
Name of User       Followers   Following
Jaihind            2210        143
Gabbar Singh       277135      1230
GreatBong          23151       140
JhunJhunwala       95667       705
RoflIndian         125883      469
ikaveri            41095       1652

Number of Followers and Following: Reason
The average number of followers may not be an accurate indicator. A few outlier profiles, which had very large followings of their own, raised the average dramatically, while the other profiles showed much smaller numbers.
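The effect of outliers on the average can be illustrated by comparing the mean with the median of the per-user follower counts listed on the previous slide. The median was not part of the original analysis; it appears here only as an assumed, outlier-resistant point of comparison, and the `summarize` helper is hypothetical.

```python
from statistics import mean, median

# Per-user follower counts as listed on the previous slide.
dataset_1 = {
    "Obama": 73988163,
    "Narendra Modi": 19619806,
    "Amitabh Bachchan": 20643658,
    "Sachin": 10642322,
    "Praveen Swami": 56560,
}
dataset_2 = {
    "Jaihind": 2210,
    "Gabbar Singh": 277135,
    "GreatBong": 23151,
    "JhunJhunwala": 95667,
    "RoflIndian": 125883,
    "ikaveri": 41095,
}


def summarize(followers):
    """Return (mean, median) of the follower counts."""
    values = list(followers.values())
    return mean(values), median(values)
```

For Dataset 1 the mean sits well above the median because one profile has far more followers than the rest, which is the distortion the slide describes.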

USER INTERACTION: Calculated the total number of interactions made by each member of the dataset.

USER INTERACTION:
Dataset 1: Total number of references: 397. Average replies: 79.4 tweets per user.
Dataset 2: Total number of references: 781. Average replies: 130.1 tweets per user.

USER INTERACTION: Reasons
Established leaders interact less with the public and tweet only what they think is important. Self-made leaders interact far more with their followers, since they need to interact with people in order to remain influencers in the Twitter world.

Conclusion: The Difference Exists
As we have seen from the profiles analyzed, there is a significant difference between the two groups. The difference can be seen in:
General sentiment of the tweets
Use of pronouns
Number of followers
Number of following
User interactions