Reputation Management System


Mihai Damaschin, Matthijs Dorst, Maria Gerontini, Cihat Imamoglu, Caroline Queva

Introduction

Before: it was difficult to track the reputation of a brand.

Now: the reputation of a brand can be tracked by monitoring social media sources, where people post opinions such as:

"I like this company, it's a really good one. You should try! It's awesome!"
"I like this company, and you?"
"No I don't, I think it's not a good one."

Methods: Search

Search for specific keywords using the search provided by Twitter.

Problem: Twitter's in-memory search solution is limited to the last 8 days.

Solution: store the search results in a MySQL database, and query the database as well as the Twitter search. This increases recall.

No relevance score based on term frequency and document frequency is used, since tweets are limited in length.
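The store-and-merge idea above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it uses Python's built-in sqlite3 module as a stand-in for MySQL, and the table layout, the function names and the shape of a tweet (a dict with "id" and "text") are assumptions made for the example.

```python
import sqlite3

def store_tweets(conn, tweets):
    """Persist search results so they outlive the search API's time window."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS tweets (id INTEGER PRIMARY KEY, text TEXT)"
    )
    # INSERT OR IGNORE skips tweets whose id is already stored.
    conn.executemany(
        "INSERT OR IGNORE INTO tweets (id, text) VALUES (?, ?)",
        [(t["id"], t["text"]) for t in tweets],
    )
    conn.commit()

def search(conn, keyword, live_results):
    """Return stored matches plus fresh API results, de-duplicated by tweet id.

    live_results is assumed to be the output of the live keyword search.
    """
    store_tweets(conn, live_results)
    rows = conn.execute(
        "SELECT id, text FROM tweets WHERE text LIKE ? ORDER BY id",
        (f"%{keyword}%",),
    ).fetchall()
    return [{"id": row[0], "text": row[1]} for row in rows]
```

Because every live result is written to the database before the query runs, one SQL query returns the union of old and new tweets, which is where the gain in recall comes from.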

Methods: Get tweets

Data about each tweet is needed. Twitter offers access to most of the required functionality through a REST API. For a tweet such as "I like this company, it's really good.", the API returns:

Tweet text
Author
Number of followers
Number of retweets
…

Problem encountered: the rate limit imposed by the API (150 calls per hour when not authenticated, 350 otherwise).

Solution found: caching. The limit remains a problem for the calculation of PageRank, because a user graph cannot be created.
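A cache in front of the API is one way to stay under the rate limit described above. The sketch below is an assumption about how such a cache could look, not the project's code: CachedFetcher, its fetch callback and the one-hour lifetime are all hypothetical names and values.

```python
import time

class CachedFetcher:
    """Wrap a rate-limited fetch function with a time-limited cache."""

    def __init__(self, fetch, ttl_seconds=3600, clock=time.monotonic):
        self._fetch = fetch      # hypothetical callable that performs the API call
        self._ttl = ttl_seconds  # how long a cached answer stays valid
        self._clock = clock      # injectable so the cache can be tested
        self._cache = {}         # key -> (time stored, value)
        self.api_calls = 0       # number of real API calls made so far

    def get(self, key):
        now = self._clock()
        hit = self._cache.get(key)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]        # served from cache, no API call spent
        self.api_calls += 1
        value = self._fetch(key)
        self._cache[key] = (now, value)
        return value
```

Repeated lookups of the same author or tweet then cost a single call per lifetime window. This does not help with building a user graph, since that requires many distinct lookups.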

Methods: Preprocessing

Goal: achieve better results with less noise.

Elimination of repeated characters in tweets:
Greeeaaaat => Greeaat
Aweeesome => Aweesome
Look => Look

Removal of annotations (@mentions) and URLs:
"@Someone : it's awesome. Check www.this.com." => ": it's awesome. Check."

Tokenization with StandardTokenizer from Lucene (Apache):
"Send to this@mail.com. This is perfect." => Send | to | this@mail.com | This | is | perfect
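The two cleaning steps above can be approximated with regular expressions. This is a sketch under assumptions: the slides do not give the actual rules, so the patterns below (runs of three or more identical characters cut to two, @mentions, and URLs starting with http or www) are inferred from the examples, and the function names are hypothetical. Tokenization with Lucene is not reproduced here.

```python
import re

# A character followed by two or more copies of itself.
REPEATS = re.compile(r"(.)\1{2,}")
# An @mention that is not part of an e-mail address.
MENTION = re.compile(r"(?<!\w)@\w+")
# A URL plus the whitespace before it, without trailing punctuation.
URL = re.compile(r"\s*(?:https?://|www\.)\S*[^\s.,;:!?]")

def squeeze_repeats(text):
    """Cut every run of 3+ identical characters down to 2."""
    return REPEATS.sub(r"\1\1", text)

def strip_mentions_and_urls(text):
    """Remove @mentions and URLs, then normalize whitespace."""
    text = MENTION.sub("", text)
    text = URL.sub("", text)
    return re.sub(r"\s+", " ", text).strip()
```

Keeping two characters instead of one preserves legitimate double letters such as the "oo" in "Look".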

Methods: Sentiment analysis

Lexicon: compute the score of a tweet to label it as negative, neutral or positive.

Standard lexicon: "It is good" => score = 0 + 0 + 1 = 1

Modifiers and negations: add a coefficient that applies to the following words. "It is not good" => score = 0 + 0 + 0 + (-1)*1 = -1

Distance: the score decreases as the distance between the sentiment word and the name of the company increases. "I like A but I really hate B" => positive score for A, negative score for B
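The three scoring rules above can be combined in a few lines. The sketch below is illustrative only: the tiny lexicon, the coefficient for "really", the Gaussian form of the distance weight and the value of sigma are assumptions chosen for the example, not values taken from the project.

```python
import math

LEXICON = {"good": 1.0, "like": 1.0, "awesome": 1.0, "hate": -1.0, "bad": -1.0}
MODIFIERS = {"not": -1.0, "really": 1.5}  # hypothetical coefficients

def score(tokens, entity=None, sigma=2.0):
    """Lexicon score of a tweet, optionally weighted by distance to an entity.

    If the entity is not given or does not occur, every word has weight 1.
    """
    words = [t.lower() for t in tokens]
    anchors = [i for i, w in enumerate(words) if entity and w == entity.lower()]
    total, coeff = 0.0, 1.0
    for i, word in enumerate(words):
        if word in MODIFIERS:
            coeff *= MODIFIERS[word]  # applies to the next non-modifier word
            continue
        if word in LEXICON:
            weight = 1.0
            if anchors:
                d = min(abs(i - a) for a in anchors)
                # Gaussian decay: far-away sentiment words count less.
                weight = math.exp(-(d * d) / (2 * sigma * sigma))
            total += coeff * weight * LEXICON[word]
        coeff = 1.0                   # a modifier only reaches one word ahead
    return total

def label(value):
    """Map a score to negative, neutral or positive."""
    if value > 0:
        return "positive"
    if value < 0:
        return "negative"
    return "neutral"
```

With these assumptions, "It is good" scores 1 and "It is not good" scores -1, and in "I like A but I really hate B" the score is positive for A and negative for B, because each sentiment word is weighted most heavily for the company name closest to it.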

Methods: Naïve Bayes

Use the algorithm provided by Weka and the hand-classified data provided by Sanders Analytics.

Extract the tweet content.
Transform it into a feature vector.
Train the model with the extracted annotated data.
Serialize the model.
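The four steps above (extract, vectorize, train, serialize) can be illustrated with a small multinomial Naïve Bayes written from scratch. This is a stand-in for the Weka classifier the project used, not a reproduction of it: the bag-of-words features, the add-one smoothing, the JSON serialization and the training sentences in the usage note are all assumptions for the example.

```python
import json
import math
from collections import Counter, defaultdict

def featurize(text):
    """Feature vector: lower-cased bag of words with counts."""
    return Counter(text.lower().split())

def train(examples):
    """Build a model from (text, label) pairs as plain, JSON-friendly dicts."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    for text, label in examples:
        class_counts[label] += 1
        word_counts[label].update(featurize(text))
    vocab = sorted({w for counts in word_counts.values() for w in counts})
    return {
        "class_counts": dict(class_counts),
        "word_counts": {lab: dict(c) for lab, c in word_counts.items()},
        "vocab": vocab,
    }

def predict(model, text):
    """Return the label with the highest log posterior (add-one smoothing)."""
    total = sum(model["class_counts"].values())
    vocab = set(model["vocab"])
    scores = {}
    for lab, n in model["class_counts"].items():
        counts = model["word_counts"][lab]
        denom = sum(counts.values()) + len(vocab)
        logp = math.log(n / total)
        for word, freq in featurize(text).items():
            if word in vocab:  # words never seen in training are skipped
                logp += freq * math.log((counts.get(word, 0) + 1) / denom)
        scores[lab] = logp
    return max(scores, key=scores.get)
```

A model trained on a few labelled sentences, for instance train([("i love this phone", "positive"), ("i hate this phone", "negative")]), can be serialized with json.dumps(model) and restored with json.loads, so training does not have to be repeated for every query.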

Methods: PageRank

Static quality measure.

Use the number of retweets and the author's number of followers, friends and other tweets.
Use information about the maximum of such numbers.
Use the log of the features and a simple linear mixture.
Obtain a single number representing authority.

Problem encountered: the rate limit imposed by the API prevents the creation of a user graph. Consequently, it was not possible to apply the original PageRank algorithm.
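The authority score described above (log of each feature, scaled by the maximum, combined in a linear mixture) could look like the sketch below. The feature names, the weights and the use of log(1 + x) to handle zero counts are assumptions; the slides do not state the actual mixture.

```python
import math

# Hypothetical mixture weights, chosen to sum to 1.
WEIGHTS = {"retweets": 0.4, "followers": 0.3, "friends": 0.1, "statuses": 0.2}

def authority(features, maxima, weights=WEIGHTS):
    """Combine log-scaled, max-normalized features into one number in [0, 1]."""
    total = 0.0
    for name, weight in weights.items():
        cap = math.log1p(maxima[name])  # log of the largest value observed
        if cap > 0:
            normalized = math.log1p(features.get(name, 0)) / cap
        else:
            normalized = 0.0
        total += weight * min(normalized, 1.0)
    return total
```

Taking logs keeps accounts with millions of followers from dominating the score, and dividing by the log of the maximum puts all features on a comparable scale before they are mixed.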

Methods: User Interface

Goal: keep the design as simple and self-explanatory as possible, while maintaining the essential functionality.

Search input: no distractions. The user is presented with a single text input field and a single button.

Search results:
Overall: pie chart with a breakdown of positive, negative and neutral tweets.
Timeline: incremental sentiment over time.
Specific: most influential tweets.


Results for unfiltered data

Experimental setup:
Use an existing Twitter corpus, provided by Sanders Analytics.
Split this data set for training and testing.
Measure precision and recall.
Use the query "Microsoft" for testing.

Four different methods were used for the sentiment estimation:
The Gaussian distance.
The detection of negations and modifiers.
The Bayes classifier and a lexicon with assigned negative and positive scores.
The contribution of smileys to the sentiment score.
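Precision and recall, the two measures named in the setup above, are computed per sentiment class from the gold labels and the predicted labels. The function below is a generic sketch; its name and the label strings in the usage note are hypothetical and no figures from the experiments are reproduced.

```python
def precision_recall(gold, predicted, label):
    """Precision and recall of one class, given parallel label lists."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == label and p == label)  # correct hits
    fp = sum(1 for g, p in pairs if g != label and p == label)  # false alarms
    fn = sum(1 for g, p in pairs if g == label and p != label)  # misses
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall
```

For example, with gold labels ["pos", "pos", "neg", "neu"] and predictions ["pos", "neg", "neg", "pos"], the class "neg" has one correct hit, one false alarm and no misses, so its precision is 0.5 and its recall is 1.0.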

Results

Recall improves significantly with the handling of negations and modifiers, while precision decreases.

Results for data without URLs and annotations

The Bayes classifier shows no difference. The overall score improves in comparison with the unfiltered data.

Results for data without repeated characters