Published by Gwenda Townsend. Modified over 6 years ago.
1
Sentiment Analysis of Twitter Data (using Hadoop MapReduce)
Group Name: Cloud Atlas
Group Members: Gautham Atluri, Nikhil Aourpally, Sai Kiran Neelakantam
2
Group Project Description
Task 1: Hadoop Multi-Node Cluster Setup on Ubuntu
Task 2: Preprocessing Data
Task 3: Training a Classifier
Task 4: Classifying Tweets
Task 5: GUI Application Development
3
Task Allocation
Members: Gautham Atluri (Developer), Nikhil Auropally (Project Lead), Sai Kiran Neelakantam

Task 1: Hadoop Setup
Learn how to set up a Hadoop environment to build a multi-node cluster on Ubuntu. Work Load: 33%

Task 2: Preprocessing Data
Collect datasets of classified tweets and build a program to extract the words from the tweets and their respective sentiment type. Work Load: 40%
Test the functionality of the program developed and ensure that the extracted data is efficient and meets the requirements of the corresponding tasks. Work Load: 20%

Task 3: Train Classifier
Write a MapReduce program to calculate the frequency of occurrence of every word in the dataset, in decreasing order.
Test the functionality of the program developed by teammates and ensure that it works according to the requirements.
Train the classifier with the extracted training set using the Naïve Bayes classifier probability algorithm.

Task 4: Classify Tweets
Test the functionality of the program developed by teammates and ensure that it works according to the problem statement.
Write a program to classify the tweets and output their sentiment type. Display the details of the classification in a report.
Write a program that uses the Twitter search API to extract tweets based on the user's search query and display the details in a report.

Task 5: GUI Application Development
Design the front-end view of the application and make sure that it is user-friendly and easy to understand.
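The word-frequency job described under Task 3 can be sketched in plain Python. This is a minimal illustration only, not the group's program: it imitates the map, shuffle, and reduce phases in memory instead of running on Hadoop, and the function names (`map_words`, `reduce_counts`, `word_frequencies`) are hypothetical.

```python
from itertools import groupby
from operator import itemgetter


def map_words(line):
    """Map step: emit (word, 1) for every whitespace-separated word."""
    for word in line.lower().split():
        yield word, 1


def reduce_counts(word, counts):
    """Reduce step: sum the counts emitted for one word."""
    return word, sum(counts)


def word_frequencies(lines):
    """Run map, shuffle (sort + group by key), and reduce in memory."""
    pairs = [pair for line in lines for pair in map_words(line)]
    pairs.sort(key=itemgetter(0))  # stands in for Hadoop's shuffle/sort
    reduced = [
        reduce_counts(word, (count for _, count in group))
        for word, group in groupby(pairs, key=itemgetter(0))
    ]
    # Decreasing frequency; ties are broken alphabetically for a stable order.
    return sorted(reduced, key=lambda wc: (-wc[1], wc[0]))


if __name__ == "__main__":
    sample = ["good movie", "bad movie", "good good day"]
    for word, count in word_frequencies(sample):
        print(f"{word}\t{count}")
```

On a real cluster the final ordering by count would usually need a second job or a post-processing step, since reducers only see keys in sorted key order.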
4
Technical Details
Software
Software packages: Apache Hadoop libraries, JDK
Programming/scripting languages: Java/Python, Shell Script, JSP, HTML, CSS
Hardware
Public cloud infrastructure (vlab)
5
Technical Details Project setup
6
Technical Details Roadmaps of your project (Milestones)
Week 1, Week 2: Project proposal and multi-node Hadoop cluster setup on Ubuntu.
Week 3: Preprocess the data to remove unwanted noise.
Week 3, Week 4: Write a MapReduce algorithm to build the Naïve Bayes classifier and classify the test tweets gathered from the Twitter Streaming API.
Week 4, Week 5: Develop the front-end GUI for the sentiment analysis tool.
Week 6: Test the sentiment analysis application and present a demo of it.
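The Week 3 preprocessing step could look roughly like the sketch below. The slides do not say what counts as noise or how the labelled dataset is laid out, so everything here is an assumption: URLs, @mentions, punctuation, and a tiny illustrative stop-word list are treated as noise, and each input line is assumed to be `label<TAB>tweet text`. The helper names are hypothetical.

```python
import re

URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
NON_LETTER_RE = re.compile(r"[^a-z\s]")
# Tiny illustrative stop-word list (assumption); a real one would be longer.
STOP_WORDS = {"a", "an", "the", "is", "to", "rt"}


def clean_tweet(text):
    """Lowercase a tweet, strip noise, and return the remaining words."""
    text = text.lower()
    text = URL_RE.sub(" ", text)         # drop links
    text = MENTION_RE.sub(" ", text)     # drop @user mentions
    text = NON_LETTER_RE.sub(" ", text)  # drop digits, punctuation, '#'
    return [word for word in text.split() if word not in STOP_WORDS]


def extract_training_pairs(line):
    """Turn one 'label<TAB>text' line (assumed format) into (word, label) pairs."""
    label, text = line.rstrip("\n").split("\t", 1)
    return [(word, label) for word in clean_tweet(text)]


if __name__ == "__main__":
    print(clean_tweet("RT @bob: The movie is GREAT!!! http://t.co/abc #fun"))
    print(extract_training_pairs("positive\tGreat day!"))
```

The (word, label) pairs are the shape a later counting job would need in order to build per-sentiment word frequencies.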
7
Risks and Benefits
Novel aspects of this project:
Test and classify based on real-time Twitter feed data.
Uses a Naïve Bayes classifier model, which promises a good accuracy rate.
Risks/challenges:
Lack of large datasets of classified tweets.
Distinguishing between noisy and useful keywords.
Potential applications & benefits:
Very helpful in gaining insights from the customer's point of view, such as public sentiment about a product before its launch.
Stock sentiment, movie success analysis, etc.
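For readers unfamiliar with the Naïve Bayes model named above, here is a minimal single-machine sketch. It is not the group's MapReduce implementation: it assumes a multinomial model with add-one (Laplace) smoothing, skips words never seen in training, and uses hypothetical function names and toy data.

```python
import math
from collections import Counter, defaultdict


def train(labelled_tweets):
    """Count tweets per label and words per label from (words, label) pairs."""
    doc_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for words, label in labelled_tweets:
        doc_counts[label] += 1
        word_counts[label].update(words)
        vocab.update(words)
    return doc_counts, word_counts, vocab


def classify(words, model):
    """Return the label with the highest log-probability for the given words."""
    doc_counts, word_counts, vocab = model
    total_docs = sum(doc_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(doc_counts):
        score = math.log(doc_counts[label] / total_docs)  # log prior
        total_words = sum(word_counts[label].values())
        for word in words:
            if word not in vocab:
                continue  # ignore words never seen in training
            # Add-one smoothing keeps unseen (word, label) pairs above zero.
            likelihood = (word_counts[label][word] + 1) / (total_words + len(vocab))
            score += math.log(likelihood)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


if __name__ == "__main__":
    model = train([
        (["good", "movie"], "positive"),
        (["great", "day"], "positive"),
        (["bad", "movie"], "negative"),
        (["awful", "day"], "negative"),
    ])
    print(classify(["good", "day"], model))
```

Working in log space avoids numeric underflow when many small word probabilities are multiplied together.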
8
Tasks Accomplished by Now
First draft of the project proposal completed.
Learnt how to set up a multi-node Hadoop cluster on Ubuntu.
9
Conclusion
The result generated is the output of the MapReduce classification job: a text file containing tweets grouped by their sentiment.
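The grouping of classified tweets by sentiment can be illustrated with a short sketch. The slides do not show the actual output layout, so the format below (a sentiment heading followed by tab-indented tweets) is purely an assumption, as are the function names.

```python
from collections import defaultdict


def group_by_sentiment(classified):
    """Group (tweet, sentiment) pairs into {sentiment: [tweets]}."""
    groups = defaultdict(list)
    for tweet, sentiment in classified:
        groups[sentiment].append(tweet)
    return dict(groups)


def format_report(groups):
    """Render the groups as plain text, one sentiment block at a time."""
    lines = []
    for sentiment in sorted(groups):
        lines.append(f"{sentiment}:")
        lines.extend(f"\t{tweet}" for tweet in groups[sentiment])
    return "\n".join(lines)


if __name__ == "__main__":
    results = [
        ("love it", "positive"),
        ("hate it", "negative"),
        ("so good", "positive"),
    ]
    print(format_report(group_by_sentiment(results)))
```

In a MapReduce job the same effect falls out of using the sentiment label as the reduce key, so each reducer call receives all tweets for one sentiment.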
10
Demo (if have)