Analyzing #POTUS Sentiment on Twitter to Predict Public Opinion on Presidential Issues

Austin Karingada, Jacob Handy
Adviser: Dr. Dongchul Kim
Department of Computer Science, College of Engineering and Computer Science

GOALS

Some of the goals of this project are:
- Predict public opinion on a presidential policy by searching for sentiment patterns in past tweets that use #POTUS.
- Implement an SVM and a Naïve Bayes classifier and compare the accuracy of their predictions.

INTRODUCTION

Abstract

For this project we use Twitter data to predict public opinion on a presidential policy by searching for sentiment patterns in past tweets that use #POTUS. We used two machine learning methods: the Naïve Bayes algorithm and Support Vector Machines (SVM). We applied both methods to perform sentiment analysis on the tweets, with the aim of comparing the results to see which method yields the higher accuracy.

Introduction

#POTUS has been a trending hashtag ever since Donald Trump, who is very active on Twitter, took office. From the way other Twitter users interact with it, we can classify their tweets as negative, positive, or neutral. We used a Twitter API to collect the tweets that use #POTUS and then classified them by the words they contain, so that they could be fed into the machine learning algorithms.

METHOD

Collecting the data

We queried the Twitter API from Python with these parameters:
- Search for #POTUS
- Switch between mixed and recent results
- 100 tweets at a time
- Tweets in English

We one-hot encoded the data into a dictionary of id, classes, and sentiment, and wrote it to a CSV file without the label and id for the calculations.

Naïve Bayes Algorithm

- Simple and effective classification algorithm
- Supervised learning
- Popular uses include spam filters, text analysis, and medical diagnosis
- Assumes that the probability of each attribute belonging to a given class value is independent of all other attributes
- Calculates the probability of each instance belonging to each class and selects the class with the highest probability

Process of Naïve Bayes

Before the prediction:
- Preprocess the data into the table format described earlier
- Split the data set into a training set (67%) and a test set (33%)
- Separate the data by class to calculate the statistics for each class
- Calculate the mean
- Calculate the standard deviation
- Collect the values

After the prediction:
- Calculate the probabilities using the equation on the last slide
- Combine the probabilities for each class
- Make a prediction based on the highest probability
- Test the predictions against the actual values
- Get the accuracy as a percentage

RESULTS

We achieved an accuracy of 67.55% with Naïve Bayes. The SVM was not implemented properly at the time this poster was made, so we could not report an accuracy for it.

DISCUSSION

Limitations: One of the biggest limitations was sarcasm. We did not find a library or algorithm that could detect it well. We did try to identify both negative and positive words in a tweet, but that does not really solve the problem of detecting sarcasm. Another limitation was the amount of data we could get: the API we used only gave us 100 tweets at a time, and getting a year's worth of tweets requires payment and is very expensive.

CONCLUSIONS

In conclusion, we were able to predict sentiment with an accuracy of 67.55% using Naïve Bayes, but not with SVMs. In the future, we hope to fix the SVM and compare it with Naïve Bayes to see which model predicts sentiment better. Another direction for future work is to implement bigrams, which are pairs of consecutive written units such as letters, syllables, or words. Bigrams could help pick up sarcasm and improve the accuracy of the program.
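As a rough illustration of the bigram idea, here is a minimal Python sketch of extracting word-level bigrams from a tweet. The function name `word_bigrams` and the sample tweet are hypothetical and not taken from the project; tokenizing on whitespace is a simplifying assumption.

```python
def word_bigrams(text):
    """Return the consecutive word pairs in a text (whitespace tokenization assumed)."""
    tokens = text.lower().split()
    # Pair each token with the token that follows it.
    return list(zip(tokens, tokens[1:]))


# Hypothetical sample tweet, for illustration only.
sample_tweet = "oh great another policy"
sample_bigrams = word_bigrams(sample_tweet)
```

Each pair could then be treated as one more column in the one-hot table, so that a phrase such as "oh great" is seen as a unit rather than as two independent words.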
Another direction for future work is to implement Bernoulli Naïve Bayes, which we expect to work better since we dropped the neutral sentiment.

The math behind Naïve Bayes

The example table on the poster shows which keywords appeared in the #POTUS tweets and what the overall sentiment from other Twitter users was (1 = positive, 0 = negative).

The picture on the poster shows what the one-hot encoded file looks like. The 0 is negative sentiment and the 1 is positive sentiment.
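The equation slide is not reproduced in this transcript. Since the process computes a mean and standard deviation per class, the sketch below assumes the standard Gaussian form of Naïve Bayes: the score of class c for a row x is log P(c) plus the sum over attributes of log N(x_i; mean_ci, std_ci), and the class with the highest score is predicted. The keyword list, the toy rows, all function names, and the small constant added to each standard deviation (to avoid division by zero on constant one-hot columns) are assumptions for illustration, not the project's actual code or data.

```python
import math
import random

# Hypothetical keyword columns for the one-hot table.
KEYWORDS = ["great", "bad", "love"]


def one_hot(tweet, keywords):
    """Encode a tweet as 1/0 flags marking which keywords it contains."""
    tokens = set(tweet.lower().split())
    return [1 if word in tokens else 0 for word in keywords]


def train_test_split(rows, train_fraction=0.67, seed=0):
    """Shuffle the rows and split them into training and test sets."""
    rows = rows[:]
    random.Random(seed).shuffle(rows)
    cut = int(len(rows) * train_fraction)
    return rows[:cut], rows[cut:]


def summarize_by_class(rows):
    """Return {label: (prior, [(mean, std), ...])} computed from (features, label) rows."""
    by_class = {}
    for features, label in rows:
        by_class.setdefault(label, []).append(features)
    model = {}
    for label, feats in by_class.items():
        stats = []
        for column in zip(*feats):
            mean = sum(column) / len(column)
            var = sum((x - mean) ** 2 for x in column) / len(column)
            # Assumption: a small floor keeps constant columns from giving std = 0.
            stats.append((mean, math.sqrt(var) + 1e-3))
        model[label] = (len(feats) / len(rows), stats)
    return model


def log_gaussian(x, mean, std):
    """Log of the Gaussian probability density at x."""
    return -0.5 * math.log(2 * math.pi * std ** 2) - (x - mean) ** 2 / (2 * std ** 2)


def predict(model, features):
    """Pick the class with the highest log prior plus summed log likelihoods."""
    scores = {}
    for label, (prior, stats) in model.items():
        scores[label] = math.log(prior) + sum(
            log_gaussian(x, m, s) for x, (m, s) in zip(features, stats)
        )
    return max(scores, key=scores.get)


def accuracy(model, rows):
    """Percentage of rows whose predicted label matches the actual label."""
    correct = sum(predict(model, features) == label for features, label in rows)
    return 100.0 * correct / len(rows)


# Hypothetical toy table: (one-hot features, sentiment), 1 = positive, 0 = negative.
toy_rows = [
    ([1, 0, 1], 1), ([1, 0, 0], 1), ([0, 0, 1], 1),
    ([0, 1, 0], 0), ([0, 1, 0], 0), ([0, 1, 1], 0),
]
toy_model = summarize_by_class(toy_rows)
```

With real data, the model would be fitted on the training part returned by `train_test_split` and `accuracy` would be evaluated on the test part. Working in log space avoids numerical underflow when many small probabilities are multiplied.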