Published by Daniel Owen. Modified over 8 years ago.
1
Project Deliverable-1
-Prof. Vincent Ng
-Girish Ramachandran
-Chen Chen
-Jitendra Mohanty
2
Agenda
Pre-processing of tweets
Research literature studied and motivation
Plans for the next 2 weeks
3
Pre-processing
Tasks completed:
–Parsed all the files provided by Raytheon and extracted ~18GB of tweets.
–The tweets do not have metadata associated with them for the time being.
–Tweets containing non-ASCII characters or newline characters are discarded, because the POS tagger stopped processing tweets containing these characters.
Tasks to be addressed:
–Approximately 2 weeks are needed to run POS tagging, chunking, and NER on all the tweets currently at our disposal.
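The discard rule above can be sketched in a few lines of Python. This is only an illustration of the filtering step, not the team's actual script: the function names `is_taggable` and `filter_tweets` are hypothetical, and the sketch assumes each tweet is already available as a decoded string.

```python
def is_taggable(tweet: str) -> bool:
    """Return True if the tweet is pure ASCII and contains no line breaks.

    Assumption: these are the only two conditions that caused the
    POS tagger to stop, as stated on the slide.
    """
    return tweet.isascii() and "\n" not in tweet and "\r" not in tweet


def filter_tweets(tweets):
    """Keep only the tweets that the tagger is expected to process."""
    return [t for t in tweets if is_taggable(t)]


if __name__ == "__main__":
    sample = [
        "hello world",
        "caf\u00e9 time",          # non-ASCII character, discarded
        "line one\nline two",      # embedded newline, discarded
        "ok #tag",
    ]
    print(filter_tweets(sample))
```

Discarding whole tweets is the simplest option; stripping or transliterating the offending characters would keep more data but changes the text the tagger sees.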
4
Research Literature Studied
Several research papers have been studied to get an idea of the prior work in this field:
–Sentiment Analysis
–Opinion-Target pairs
–Latent user attributes
–Event Detection
–POS and NER for Twitter data sets
–Domain Adaptation
References to all the papers can be found on the wiki maintained by our team.
5
Motivation behind studying research literature
Sentiment Analysis provides the background for examining a person's sentiment on a topic, an abstract, a discussion, etc.
–Classifying the polarity of a given text at the document, sentence, or feature/aspect level.
–Generally, sentiment means positive, negative, or neutral.
–This could be extended to a person's emotional states, such as angry, sad, or happy.
Latent user attributes
–For our project, we need to construct a user profile.
–A profile is associated with metadata: name, profile ID, tweet ID, location (geo-stationary or profile creation), etc.
–Some attributes are not available as part of the tweets' metadata: gender, age, political orientation, region.
6
Motivation behind studying research literature contd…
Event Detection
–An event is basically an observable phenomenon or occurrence, e.g. an earthquake, war, or flood.
–People have different opinions about an event.
–Zero in on an event and analyze a person's sentiment over a definite period while the event is in effect.
POS and NER for Twitter data sets (continuing…)
–An existing tool (such as Alan Ritter's POS tagger for Twitter) is currently being used for part-of-speech tagging and named-entity recognition.
–Its output will be used as features in our learning algorithm.
Domain Adaptation
–How the model behaves on a different data set.
7
Plans for the next 2 weeks
Complete POS tagging and NER in the next 2-3 weeks using the existing tool.
Annotate tweets.
Identify the domains/issues that we will concentrate on and find the active users in those domains/issues.
–Choose keywords to search for each domain/issue.
–Group the tweets by domain.
–Find the active users in each domain.
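The three sub-steps above (keywords, grouping, active users) could look roughly like the following Python sketch. Everything concrete in it is an assumption made for illustration: the domain names and keywords in `DOMAIN_KEYWORDS`, the `(user, text)` tuple layout, and the function names are hypothetical, and the sketch presumes a user identifier is available for each tweet, which the current data does not yet provide.

```python
from collections import Counter, defaultdict

# Hypothetical domains and keywords, chosen only for illustration.
DOMAIN_KEYWORDS = {
    "politics": {"election", "senate"},
    "sports": {"football", "olympics"},
}


def group_by_domain(tweets, domain_keywords=DOMAIN_KEYWORDS):
    """Group (user, text) tweets under every domain whose keywords they mention.

    Matching is a plain lowercase whitespace split, so punctuation and
    hashtags attached to a word will prevent a match.
    """
    groups = defaultdict(list)
    for user, text in tweets:
        words = set(text.lower().split())
        for domain, keywords in domain_keywords.items():
            if words & keywords:
                groups[domain].append((user, text))
    return dict(groups)


def active_users(groups, top_n=3):
    """Return the top_n users by tweet count in each domain."""
    result = {}
    for domain, items in groups.items():
        counts = Counter(user for user, _ in items)
        result[domain] = [user for user, _ in counts.most_common(top_n)]
    return result


if __name__ == "__main__":
    sample = [
        ("alice", "The election is close"),
        ("bob", "football tonight"),
        ("alice", "senate vote today"),
        ("carol", "election results"),
        ("dave", "nothing relevant"),
    ]
    groups = group_by_domain(sample)
    print(active_users(groups, top_n=1))
```

Counting tweets per user is only one possible definition of "active"; the slides do not say which measure the team intends to use.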
8
Difficulties Faced
Feature selection
POS tagging and NER
Removing non-ASCII characters