1
SOCIAL COMPUTING Homework 3 Presentation
By Arun Sharma
2
Data Chosen
The data chosen is Twitter data from people's profiles. It includes their tweet text, number of followers, retweets, etc.
Used for the collection process: Tweepy Twitter library (lets you search for specific Twitter handles/hashtags based on your query).
3
Data Collected from Established Leaders:
Barack Obama
Narendra Modi
Praveen Swami
Amitabh Bachchan
Sachin Tendulkar
4
Data Collected from Self-Made Leaders:
Used an article from India Today to collect the names of famous Twitter celebrities: the-twitter-god/280458
Gabbar Singh
Ikaveri
Jaihind
Jhunjhunwala
GreatBong
RoflIndian
5
ESTABLISHING GROUND TRUTH
Three human annotators were used for each group of people. Annotators were asked to fill out a form based on a subset of the data provided to them. The majority decision on each annotated tweet, across the various parameters, was taken as ground truth. In case of a tie, another human annotator was consulted and the resulting majority decision was used.
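The majority-vote rule described above can be sketched as follows. This is a minimal illustration, not the procedure actually used for the homework: the function name `majority_label` and the `tie_breaker` callback (standing in for the extra human annotator) are hypothetical.

```python
from collections import Counter


def majority_label(labels, tie_breaker=None):
    """Return the label chosen by most annotators.

    labels      -- list of labels, one per annotator
    tie_breaker -- hypothetical callback returning one extra label,
                   standing in for the additional human annotator
    Returns None if the tie cannot be resolved.
    """
    counts = Counter(labels)

    def unique_leader():
        ranked = counts.most_common()
        top = ranked[0][1]
        leaders = [label for label, n in ranked if n == top]
        return leaders[0] if len(leaders) == 1 else None

    winner = unique_leader()
    if winner is not None or tie_breaker is None:
        return winner

    # Tie: add the extra annotator's vote and take the majority again.
    counts[tie_breaker()] += 1
    return unique_leader()


# Two of three annotators agree, so no tie-breaker is needed.
clear_case = majority_label(["Positive", "Positive", "Negative"])

# All three disagree, so the extra annotator decides.
tie_case = majority_label(
    ["Positive", "Negative", "Neutral"],
    tie_breaker=lambda: "Neutral",
)
```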
6
ESTABLISHING GROUND TRUTH
For every tweet, the annotators had to choose one of the following options:
Positive
Negative
Neutral
7
Features Chosen And How to Extract Them
I worked on the following features:
Sentiment involved: Positive/Negative/Neutral
Unigram/Bigram/Trigram/Line length
Pronoun count (Stanford CoreNLP and code from HW1)
An existing open-source tool such as Weka/LightSide was used to predict sentiment from the annotated data collected while establishing ground truth.
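As a rough illustration of the surface features listed above, the sketch below computes n-grams, line length, and a pronoun count for one tweet. It is an assumption-laden stand-in: the presentation used Stanford CoreNLP and HW1 code for pronouns, whereas this sketch uses a small hand-written pronoun list (`PRONOUNS`) and a simple regex tokenizer, both hypothetical.

```python
import re

# Hypothetical, hand-written pronoun list; the original work relied on
# Stanford CoreNLP part-of-speech tags instead.
PRONOUNS = {
    "i", "me", "my", "mine", "we", "us", "our", "ours",
    "you", "your", "yours", "he", "him", "his", "she", "her", "hers",
    "it", "its", "they", "them", "their", "theirs",
}


def tokenize(text):
    """Lowercase the text and split it into word-like tokens."""
    return re.findall(r"[a-z']+", text.lower())


def ngrams(tokens, n):
    """Return all contiguous n-token tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def extract_features(tweet):
    """Compute the surface features for a single tweet."""
    tokens = tokenize(tweet)
    return {
        "unigrams": ngrams(tokens, 1),
        "bigrams": ngrams(tokens, 2),
        "trigrams": ngrams(tokens, 3),
        "line_length": len(tweet),  # length in characters
        "pronoun_count": sum(1 for t in tokens if t in PRONOUNS),
    }


features = extract_features("We love our fans")
```

A dictionary-based count like this will miss contractions such as "I'm", which is one reason a part-of-speech tagger is the more reliable choice.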
8
Experimental Methodology
Targeted data collection using Tweepy
Data cleaning
Establishing ground truth using human annotators on a subset of the chosen data
Extracting the relevant chosen features using available tools
Analyzing the results and comparing them across the groups chosen
9
Ground Truth
Using human annotators, the following sentiment counts were obtained:
Total Tweets: 526
Positive Tweets: 304
Negative Tweets: 165
Neutral Tweets: 87
10
Classifier For Sentiment Extraction
I used a Support Vector Machine to train on and predict my data. In the annotated data used to train the classifier, approximately 60% of the tweets were positive, the rest being negative and neutral. A 15-fold cross-validation was performed on the dataset.
11
Precision and Recall Obtained
The precision and recall values obtained for the Positive, Negative and Neutral classes are as follows. The results show high precision and recall for positive predictions and lower values for the other two. This is because the collected data had far fewer negative and neutral examples to train on than positive ones.

Sentiment   Precision   Recall
Positive    74.917      72.99
Negative    47.979      49.686
Neutral     45.977      47.059
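For reference, per-class precision and recall like the figures above can be computed from gold and predicted labels as in the sketch below. The function name `precision_recall` and the toy labels are hypothetical and are not the homework's data; the reported numbers came from the Weka/LightSide tooling.

```python
def precision_recall(gold, predicted,
                     labels=("Positive", "Negative", "Neutral")):
    """Return {label: (precision %, recall %)} for each class.

    precision = correct predictions of the label / all predictions of it
    recall    = correct predictions of the label / all gold instances of it
    """
    scores = {}
    for label in labels:
        true_pos = sum(1 for g, p in zip(gold, predicted)
                       if g == label and p == label)
        predicted_as = sum(1 for p in predicted if p == label)
        actual = sum(1 for g in gold if g == label)
        precision = 100.0 * true_pos / predicted_as if predicted_as else 0.0
        recall = 100.0 * true_pos / actual if actual else 0.0
        scores[label] = (precision, recall)
    return scores


# Toy example (not the presentation's data).
gold = ["Positive", "Positive", "Negative", "Neutral"]
pred = ["Positive", "Negative", "Negative", "Neutral"]
scores = precision_recall(gold, pred)
```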
12
Dataset 1: Sentiment
Sentiment extracted after applying the classifier
Total Positive Identified: 450
Total Negative Identified: 154
Total Neutral Identified: 114
Total Tweets: 717
13
Dataset 2: Sentiment
Sentiment extracted after applying the classifier
Total Positive Identified: 361
Total Negative Identified: 306
Total Neutral Identified: 234
Total Tweets: 900
14
Pronoun Count: Dataset 1 & 2

Dataset 1       Pronoun Count   Dataset 2       Pronoun Count
Obama           142             GreatBong       151
PraveenSwami    147             ikaveri         123
Narendra Modi   184             Jaihind         216
SrBachchan      92              jhunjhunwala    114
Sachin          191             roflindian      121
15
Pronoun Count: Total and Average
                                  Total   Average
Dataset 1 (Established Leaders)   816     163.2
Dataset 2 (Self-Made Leaders)     756     126
16
Likes & Retweets: Total and Average

                   Dataset 1   Dataset 2
Total Likes        2843745     9830
Total Retweets     -           28837
Average Likes      568,749     1683
Average Retweets   221,180     4806
17
Likes & Retweets: Reasons
The gap in the number of likes and retweets between the Twitter users in Dataset 1 and Dataset 2 is huge.
This can be attributed to the fact that already established leaders on Twitter cater to a larger segment of the Twitter population,
whereas the self-made ones cater to the niche followers they have built up in their own domain.
18
Number Of Followers And Following:
Total and Average:

                                Dataset 1   Dataset 2
Total no. of followers          -           585744
Total no. of following          639119      4339
Average no. of followers/user   -           97624
Average no. of following/user   127823      723
19
Does Not Convey the Whole Picture: Followers and Following
Dataset 1
Name of User       Followers   Following
Obama              -           636613
Narendra Modi      -           1372
Amitabh Bachchan   -           999
Sachin             -           15
Praveen Swami      56560       120

Dataset 2
Name of User       Followers   Following
Jaihind            2210        143
Gabbar Singh       277135      1230
GreatBong          23151       140
JhunJhunwala       95667       705
RoflIndian         125883      469
ikaveri            41095       1652
20
Number Of Followers And Following: Reason
The average number of followers may not be an accurate indicator. A few outlier profiles with very large follower counts of their own raised the average dramatically, while the other profiles showed far smaller numbers.
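One way to see the outlier effect is to compare the mean with the median, which is less sensitive to a few very large values. The sketch below is an illustration only: the median is not something the presentation reports, the helper name `summarize` is hypothetical, and the inputs are the per-user follower counts listed for Dataset 2.

```python
from statistics import mean, median


def summarize(counts):
    """Return (mean, median) for a collection of per-user counts."""
    values = list(counts)
    return mean(values), median(values)


# Per-user follower counts for Dataset 2, as listed on the
# "Followers and Following" slide.
dataset2_followers = {
    "Jaihind": 2210,
    "Gabbar Singh": 277135,
    "GreatBong": 23151,
    "JhunJhunwala": 95667,
    "RoflIndian": 125883,
    "ikaveri": 41095,
}

avg_followers, median_followers = summarize(dataset2_followers.values())
```

If the mean sits well above the median, a small number of large profiles is pulling the average up.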
21
USER INTERACTION: Calculated the total number of interactions made by each member of the dataset.
22
USER INTERACTION:
Dataset 1
Total number of references: 397
Average replies: 79.4 tweets per user
Dataset 2
Total number of references: 781
Average replies: tweets per user
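A minimal sketch of how such counts could be produced is shown below. It assumes that a "reference" means an @-mention in the tweet text, which the slides do not state explicitly; the names `count_references` and `average_per_user` are hypothetical.

```python
import re

# Assumption: an interaction is any @-mention inside a tweet.
MENTION = re.compile(r"@\w+")


def count_references(tweets):
    """Count @-mentions across a list of tweet texts."""
    return sum(len(MENTION.findall(text)) for text in tweets)


def average_per_user(total_references, n_users):
    """Average number of references per user in a dataset."""
    return total_references / n_users


# Toy tweets (not the collected data).
sample_total = count_references(["@a thanks! cc @b", "no mention here"])

# Dataset 1: 397 references across its five listed users.
dataset1_average = average_per_user(397, 5)
```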
23
USER INTERACTION: Reasons
Established leaders interact less with the public and tweet what they think is important. Self-made leaders interact far more with their followers; they need to interact with people in order to remain influencers in the Twitter world.
24
Conclusion: The Difference Exists
As we have seen from the profiles analyzed, there is a significant difference between the two groups. The difference can be seen in:
General sentiment of the tweets
Use of pronouns
Number of followers
Number of following
User interactions