Sentiment Analysis of Twitter Data
Introduction
According to one survey, the amount of data generated since 2010 is approximately three times what was produced before 2010. Twitter is one of the main sources of this data, with an estimated 500 million tweets per day, which suggests that a great deal can be inferred from it. Sentiment analysis is the process of analyzing Twitter data to extract information such as the latest trends and whether people like or dislike a given person or thing. Sentiment analysis is now very common. One well-known application was predicting the winner of the U.S. presidential election: after each presidential debate between Donald Trump and Hillary Clinton, sentiment analysis was performed on the Twitter data about both candidates. It found that tweets about Trump grew more positive day after day before the election, from which it was concluded that Trump had a strong chance of winning, which he did.
Connecting to Twitter API
To get data from Twitter, we first have to create an application at https://dev.twitter.com/ . Once it is created, we are redirected to the application’s homepage, where we can generate our access tokens by clicking the ‘Generate Access Token’ icon. Twitter offers two types of API:
Search API - searches for whatever text we are looking for, e.g. #NBA or “CristianoRonaldo”.
REST API - writes a new tweet, follows a user, reads a user’s profile information, and much more.
Sentiment analysis mostly uses the Search API to collect tweets.
In this project, I first collected the tweets and stored them in a file, then read the data back from the file and word-tokenized it using Python’s Natural Language Toolkit (nltk) library. Lists of positive and negative words are available on the internet. I stored these words in dataframes, compared the words of my tweets with the words in the positive and negative dataframes, and generated a net positive score and a net negative score for each person. I collected 200 tweets (as Twitter’s maximum limit is 200 tweets at a time) about ‘Narendra Modi’ and ‘Arvind Kejriwal’, analyzed them, and displayed the final result in a pie chart to determine which of the two people speak about more positively.
Extracting Tweets and storing them in a file
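A minimal sketch of the storage step, assuming the tweets have already been fetched and are held as a list of strings. The function names, the file name and the sample tweets are hypothetical and chosen only for illustration. Each tweet is written as one JSON-encoded line, so a tweet that itself contains a line break can still be read back intact.

```python
import json
import os
import tempfile


def save_tweets(tweets, path):
    """Write one JSON-encoded tweet per line (line breaks inside a tweet are escaped)."""
    with open(path, "w", encoding="utf-8") as f:
        for text in tweets:
            f.write(json.dumps(text, ensure_ascii=False) + "\n")


def load_tweets(path):
    """Read the tweets back as a list of strings, skipping blank lines."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


# Hypothetical sample data; real tweets would come from the Search API.
sample_tweets = ["Great speech today!", "Not impressed,\nvery poor answers."]
path = os.path.join(tempfile.gettempdir(), "tweets_sample.jsonl")

save_tweets(sample_tweets, path)
restored = load_tweets(path)
```

Keeping the raw tweets in a file means the analysis can be rerun without calling the API again.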
Tweets in Dataframe
Script for analyzing the tweets and displaying the results (Part 1 of Script)
As the whole script does not fit on a single slide, it is split across several slides.
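The scoring approach described earlier (tokenize each tweet, compare its words with the positive and negative word lists, and add up the matches) can be sketched roughly as follows. This is an illustration under stated assumptions, not the script from the slides: the two word lists are tiny hypothetical stand-ins for the downloaded lexicons, plain Python sets replace the dataframes, and a simple regular expression stands in for nltk’s word tokenizer so that the example runs without extra downloads.

```python
import re

# Tiny illustrative word lists (assumptions); the real lists are far longer.
POSITIVE_WORDS = {"good", "great", "love", "excellent", "happy"}
NEGATIVE_WORDS = {"bad", "poor", "hate", "terrible", "sad"}


def tokenize(text):
    """Lowercase the text and split it into word tokens (stand-in for nltk)."""
    return re.findall(r"[a-z']+", text.lower())


def score_tweet(text):
    """Return (positive_count, negative_count) for a single tweet."""
    tokens = tokenize(text)
    pos = sum(1 for t in tokens if t in POSITIVE_WORDS)
    neg = sum(1 for t in tokens if t in NEGATIVE_WORDS)
    return pos, neg


def score_tweets(tweets):
    """Sum the positive and negative counts over a list of tweets."""
    total_pos = total_neg = 0
    for text in tweets:
        pos, neg = score_tweet(text)
        total_pos += pos
        total_neg += neg
    return total_pos, total_neg


# Hypothetical tweets, used only to exercise the functions.
example_tweets = [
    "Great speech, I love it",
    "Terrible answer and a bad plan",
    "Good but sad ending",
]
totals = score_tweets(example_tweets)
```

Running `score_tweets` once per person gives the positive and negative totals that can then be passed to a plotting call such as matplotlib’s `pyplot.pie` to draw the final chart.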
Results
From this sentiment analysis we can conclude that people expressed more positive opinions about Narendra Modi than about Arvind Kejriwal.
THANK YOU