Queries Over Graph Data: Presidential Election

Queries Over Graph Data: Presidential Election KYLE BROWN, CHUKWUDI OGUEJIOFOR, ANDREW RADOSEVIC, TAPTI SAHA

Motivation: Presidential Election
The 2016 United States Presidential Election was one of the most heated campaigns in the country's history. Most projections pointed towards Hillary Clinton claiming victory over Donald Trump. Why was the polling off? We look into finding sentiment based on the language used in responses to questions. Big Data was still a very new concept during the 2012 election, yet it was already used to predict the outcome. How can big data be used to help win an election?

Proposed Solution
We surveyed more than 100 people of different ages, religions, and other backgrounds to gather a wide variety of data and opinions. We determined the confidence of each response's sentiment using an open-source online tool. We then needed to process the data programmatically.

Software Used
GraphViz: produces graph visualizations quickly for large amounts of data.
Neo4j Community Edition: native property graph database; the database is visible in a browser.
Sentiment Classifier Tool: online tool for classifying the responses to our questionnaire, based on Naïve Bayes classification.
Python: GraphViz and Neo4j are supported in Python 2.7 and Python 3.
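To make the Naïve Bayes idea concrete, here is a minimal sketch of how such a classifier could turn a response into a sentiment label plus a confidence value. It is not the online tool itself, whose internals the slides do not describe. The function names, the whitespace tokenization, and the add-one smoothing are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def train(samples):
    """Count words per label from (text, label) pairs."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    vocab = set()
    for text, label in samples:
        words = text.lower().split()
        word_counts[label].update(words)
        label_counts[label] += 1
        vocab.update(words)
    return word_counts, label_counts, vocab


def classify(text, model):
    """Return (label, confidence); confidence is the normalized posterior."""
    word_counts, label_counts, vocab = model
    total_docs = sum(label_counts.values())
    log_scores = {}
    for label in label_counts:
        total_words = sum(word_counts[label].values())
        # Start from the log prior of the label.
        score = math.log(label_counts[label] / total_docs)
        for word in text.lower().split():
            # Add-one (Laplace) smoothing so unseen words do not zero the score.
            count = word_counts[label][word] + 1
            score += math.log(count / (total_words + len(vocab)))
        log_scores[label] = score
    # Convert log scores to probabilities in a numerically stable way.
    max_log = max(log_scores.values())
    exp_scores = {lab: math.exp(s - max_log) for lab, s in log_scores.items()}
    norm = sum(exp_scores.values())
    best = max(exp_scores, key=exp_scores.get)
    return best, exp_scores[best] / norm


# Hypothetical training data, invented purely for this example.
samples = [
    ("great leader strong plan", "positive"),
    ("terrible dishonest bad plan", "negative"),
    ("great strong economy", "positive"),
    ("bad terrible policy", "negative"),
]
model = train(samples)
label, confidence = classify("great strong", model)
```

A real tool would be trained on far more text and would likely use better tokenization; the point here is only that the confidence can be read off as the posterior probability of the winning label.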

Importing Libraries
We used three libraries: GraphViz, Neo4j, and Pandas.
Library attributes:
Pandas: provides an R-style data frame that supports column headers and handles different data types in one package.
GraphViz: rendering graphs is easy; it is a command-line tool, so it can be run over any amount of data.
Neo4j: the database is interactive and supports queries.

Loading Data
We used Excel to store the data for our database.

Sentiment Type
We divided our sentiments (the data classifier's output) into three types: Positive, Negative, and Neutral. Each type is further divided into two categories: Lean and Strong.
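One way to combine the three types with the Lean/Strong split is to threshold the classifier's confidence. The slides do not say how the split was made, so the cutoff value and the function name below are hypothetical.

```python
# Hypothetical cutoff: the slides do not state where Lean ends and Strong begins.
STRONG_THRESHOLD = 0.75

VALID_SENTIMENTS = ("Positive", "Negative", "Neutral")


def sentiment_type(sentiment, confidence, threshold=STRONG_THRESHOLD):
    """Combine a sentiment label with a Lean/Strong category based on confidence."""
    if sentiment not in VALID_SENTIMENTS:
        raise ValueError(f"unknown sentiment: {sentiment!r}")
    strength = "Strong" if confidence >= threshold else "Lean"
    return f"{strength} {sentiment}"


strong_example = sentiment_type("Positive", 0.9)
lean_example = sentiment_type("Neutral", 0.5)
```

With three sentiments and two strengths, this yields six possible classifications per response.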

Sorting
The program sorts the data according to the sentiment and the confidence value. After sorting, it saves the data to a new file with the updated classifications.
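The sorting step could look roughly like the sketch below. The column names (`text`, `sentiment`, `confidence`), the CSV output format, and the choice to order by sentiment first and then by descending confidence are assumptions; the original program's file format and sort direction are not given in the slides.

```python
import csv

# Assumed column names; the original spreadsheet layout is not shown in the slides.
FIELDS = ["text", "sentiment", "confidence"]


def sort_and_save(rows, out_path):
    """Sort rows by sentiment, then by descending confidence, and write them to a CSV file."""
    ordered = sorted(rows, key=lambda r: (r["sentiment"], -float(r["confidence"])))
    with open(out_path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(ordered)
    return ordered
```

Writing to a new file, rather than overwriting the input, keeps the original survey data intact for later runs.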

Creating the Graph with GraphViz
The program creates the text for the graph. Nodes all align at the top. A graph is created based on the nodes. Build time: 15 s.
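Creating "the text for the graph" can be sketched as generating DOT source, the input language of GraphViz. This sketch assumes that the nodes aligned at the top are the sentiment nodes, which it places on a shared rank; the node naming scheme and the function name are hypothetical.

```python
def build_dot(rows):
    """Build DOT source linking each sentiment node to its statements."""
    sentiments = sorted({r["sentiment"] for r in rows})
    lines = ["digraph sentiments {"]
    # Put all sentiment nodes on the same rank so they line up in one row.
    same_rank = " ".join(f'"{s}";' for s in sentiments)
    lines.append(f"  {{ rank=same; {same_rank} }}")
    for i, r in enumerate(rows):
        # Escape backslashes and double quotes so the label stays valid DOT.
        label = r["text"].replace("\\", "\\\\").replace('"', '\\"')
        lines.append(f'  s{i} [label="{label}"];')
        lines.append(f'  "{r["sentiment"]}" -> s{i};')
    lines.append("}")
    return "\n".join(lines)
```

The returned string can be saved to a `.dot` file and rendered with the GraphViz command-line tools.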

Creating the Graph with Neo4j
Each statement is binned into a list for its sentiment. Then all lists are iterated over, and each statement is assigned to a node and related to its sentiment node.
Current timing:
Time elapsed: 128.229 s
Time to clear DB: 1.080 s
Time to init: 1.038 s
Time loading nodes: 126.111 s
Total node count: 122
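The binning and node-creation steps described above might be sketched as follows. The code only prepares Cypher statements and their parameters; it does not connect to a database. The node labels `Sentiment` and `Statement`, the relationship type `HAS_SENTIMENT`, and the property names are hypothetical, since the slides do not show the actual schema.

```python
from collections import defaultdict

# Hypothetical schema: labels, relationship type, and property names are assumptions.
QUERY = (
    "MERGE (s:Sentiment {name: $sentiment}) "
    "CREATE (st:Statement {text: $text, confidence: $confidence}) "
    "CREATE (st)-[:HAS_SENTIMENT]->(s)"
)


def bin_by_sentiment(rows):
    """Group statements into one list per sentiment."""
    bins = defaultdict(list)
    for r in rows:
        bins[r["sentiment"]].append(r)
    return dict(bins)


def build_statements(bins):
    """Produce one (query, parameters) pair per statement, iterating over every bin."""
    statements = []
    for sentiment, items in bins.items():
        for r in items:
            params = {
                "sentiment": sentiment,
                "text": r["text"],
                "confidence": r["confidence"],
            }
            statements.append((QUERY, params))
    return statements
```

Each pair would then be handed to a Neo4j client for execution. Sending one statement per round trip is simple but may account for part of the loading time reported above; grouping statements into a single transaction is one option worth measuring.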

Future Work
Use a larger dataset.
More surveys, or mining data from social media such as Twitter.
Use Neo4j for statement similarity and machine learning.
Split statements into individual words and create more relationships.
Expand to more than 3 dimensions (text, confidence, sentiment).
Possibly add the target candidate as a dimension.
More visualizations.
More complex queries in Neo4j.

Conclusion
This hyped-up election is also the most modern in history, which allows for the largest data pool. Hillary was not a heavy favorite but was noticeably favored to win. Instead of polling on a simple yes/no basis, it could be beneficial to look into people's sentiment. Trump was able to swing voters previously identified as Democratic in order to obtain victory.

Questions?