Ranking Tweets Considering Trust and Relevance. Srijith Ravikumar, Raju Balakrishnan, and Subbarao Kambhampati, Arizona State University


Twitter is one of the most prominent micro-blogging services. It has over 140 million active users, generates over 340 million tweets daily, and handles over 1.6 billion search queries per day. Users access tweets by following other users and by using the search function.

Twitter Search
Results are sorted in reverse chronological order. The single most-retweeted tweet is selected as the top tweet. No relevance metrics are applied, and the results contain spam and untrustworthy tweets.
Results for the query: “Britney Spears”

TweetRank
TweetRank acts as a mediator between the user and Twitter: it takes the user's query, obtains the top K results from Twitter, and returns the top N results to the user. K is much higher than N, which lets us eliminate untrustworthy results.

Need for Relevance and Trust
The spread of false facts on Twitter has become an everyday event. Retweets and users can be bought, so relying on them for trustworthiness does not work.

Getting Relevant & Trustworthy Results
Manual curation is out of the question (unless you are the Government of China :-) ).
• How many people would it take to clean up a micro-blog with 140 million active users?
Automated analysis?
• PageRank uses the explicit links between Web pages to evaluate trust and relevance. But what are the links between tweets?

Links in Twitter Space
Retweet: explicit links between tweets.
Agreement: implicit links between tweets that contain the same fact.

Agreement
Agreement between two tweets is defined as the amount of similarity in their content. Retweets are not counted towards agreement because retweets are unverified endorsements.
How does agreement capture relevance and trust?
• A tweet that is agreed upon by a large number of other tweets is likely to be popular, and popular tweets are more likely to be relevant.
• Since agreement does not include retweets, the most agreed-upon tweet has the largest number of independent users agreeing on the same fact, and hence it is more trustworthy.

Agreement Computation
To compute agreement well we need to understand the meaning of each tweet, which requires Natural Language Processing. As a preliminary idea, we compute agreement using Soft TF-IDF with Jaro-Winkler similarity. Soft TF-IDF is similar to TF-IDF, except that when comparing two document vectors it also counts tokens that are similar, in addition to tokens that match exactly.
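The slide does not give the exact formulation, so the following is only a minimal sketch of Soft TF-IDF with Jaro-Winkler as the token-level similarity. It follows one common formulation: each token in the first tweet is paired with its closest token in the second, and the pair contributes only if their similarity reaches a threshold. The function names, the whitespace tokenizer, the smoothed IDF, and the 0.9 threshold are all assumptions made for illustration.

```python
import math
from collections import Counter

def jaro(s, t):
    """Jaro similarity between two strings, in [0, 1]."""
    if s == t:
        return 1.0
    if not s or not t:
        return 0.0
    window = max(max(len(s), len(t)) // 2 - 1, 0)
    s_flags, t_flags = [False] * len(s), [False] * len(t)
    matches = 0
    for i, ch in enumerate(s):
        for j in range(max(0, i - window), min(len(t), i + window + 1)):
            if not t_flags[j] and t[j] == ch:
                s_flags[i] = t_flags[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    s_matched = [c for c, f in zip(s, s_flags) if f]
    t_matched = [c for c, f in zip(t, t_flags) if f]
    transpositions = sum(a != b for a, b in zip(s_matched, t_matched)) // 2
    return (matches / len(s) + matches / len(t)
            + (matches - transpositions) / matches) / 3

def jaro_winkler(s, t, p=0.1):
    """Jaro similarity boosted by the length of the common prefix (max 4)."""
    base = jaro(s, t)
    prefix = 0
    for a, b in zip(s[:4], t[:4]):
        if a != b:
            break
        prefix += 1
    return base + prefix * p * (1 - base)

def document_frequencies(corpus):
    """Number of documents (token lists) in which each token occurs."""
    df = Counter()
    for tokens in corpus:
        df.update(set(tokens))
    return df

def tfidf_vector(tokens, df, n_docs):
    """Unit-length TF-IDF weights; the smoothed IDF is an assumption."""
    counts = Counter(tokens)
    weights = {w: math.log(1 + c) * math.log(1 + n_docs / df[w])
               for w, c in counts.items()}
    norm = math.sqrt(sum(v * v for v in weights.values()))
    return {w: v / norm for w, v in weights.items()} if norm else {}

def soft_tfidf(a_tokens, b_tokens, df, n_docs, threshold=0.9):
    """Soft TF-IDF: like cosine TF-IDF, but near-identical tokens also count."""
    va = tfidf_vector(a_tokens, df, n_docs)
    vb = tfidf_vector(b_tokens, df, n_docs)
    score = 0.0
    for w, weight_a in va.items():
        best_token, best_sim = None, 0.0
        for v in vb:
            sim = jaro_winkler(w, v)
            if sim > best_sim:
                best_token, best_sim = v, sim
        if best_token is not None and best_sim >= threshold:
            score += weight_a * vb[best_token] * best_sim
    return score

# Hypothetical toy corpus, tokenized by lowercasing and splitting on spaces.
corpus = [text.lower().split() for text in [
    "britney spears engaged to jason trawick",
    "britney spears is engaged again",
    "buy cheap followers now",
]]
df = document_frequencies(corpus)
agreement = soft_tfidf(corpus[0], corpus[1], df, len(corpus))
```

Note that this score is not guaranteed to be symmetric, because each token of the first tweet picks its best partner in the second; a symmetric variant could average the two directions.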

Computing Ranked Results
A simple voting technique is used to compute the ranked results. The agreement score of a tweet is the sum of its agreement with all other tweets. The tweets are sorted by this agreement vote, and the top N results are sent to the user.
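The voting step can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: `jaccard` is a stand-in similarity used only to keep the example self-contained (the slides use Soft TF-IDF), and `rank_by_agreement` and the sample tweets are hypothetical.

```python
def jaccard(a, b):
    """Stand-in similarity: token-set overlap (an assumption, not Soft TF-IDF)."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0

def rank_by_agreement(tweets, n, similarity=jaccard):
    """Score each tweet by the sum of its agreement with all other tweets,
    then return the n highest-scoring tweets."""
    scored = []
    for i, tweet in enumerate(tweets):
        total = sum(similarity(tweet, other)
                    for j, other in enumerate(tweets) if j != i)
        scored.append((total, tweet))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [tweet for _, tweet in scored[:n]]

# Hypothetical top-K result list for one query (K = 4 here, N = 2 below).
top_k = [
    "Britney Spears engaged to Jason Trawick",
    "Britney Spears is engaged again",
    "Britney Spears engaged",
    "Win a free phone click here",
]
top_n = rank_by_agreement(top_k, n=2)
```

In this toy input the unrelated tweet agrees with nothing, receives a score of zero, and falls out of the top N, which is the filtering effect that choosing K much larger than N is meant to achieve.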

Results: Britney Spears
Twitter Results | TweetRank Results
(Oops?!) Britney Spears is Engaged... Again! - its britney: In entertainment: Britney Spears engaged to marry her longtime boyfriend and former agent Jason Trawick. Britney Spears Engaged Again #Britney #Spears #engaged to #boyfriend: #report: LOS ANGELES (Reuters) - Pop star Britney Britney Spears engaged: Congratulations to Britney Spears and her beau Jason Trawick for getting engaged via a 3.5 carat ring! We are certainly happy for her!

Evaluation - Relevance
The top N results were manually labelled as follows:
Not related to the topic, or spam: 0
Remotely relevant to the topic: 1/3
Tweets which have some information on the topic: 2/3
Tweets which have a good amount of information: 1

Evaluation - Trust
The top N results were manually labelled as follows:
Untrustworthy tweets, such as spam or wrong facts, and tweets which are opinions: 0
Tweets which contain correct facts: 1

Ranking Cost
The ranking time increases quadratically with the number of tweets. Since agreement is computed pairwise, the computation can be easily parallelized using MapReduce.
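The quadratic cost and the MapReduce decomposition can be illustrated with a single-process sketch. For n tweets there are n(n-1)/2 pairs; each pair is an independent map task, and the reduce step sums the emitted values per tweet. The function names and the `token_overlap` similarity are assumptions for illustration, the code imitates only the shape of a MapReduce job and uses no MapReduce framework, and it assumes a symmetric similarity so that each pair needs to be evaluated only once.

```python
from collections import defaultdict
from itertools import combinations

def token_overlap(a, b):
    """Stand-in symmetric similarity (an assumption, not Soft TF-IDF)."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0

def map_pair(pair, tweets, similarity):
    """Map step: one independent task per pair, emitting (tweet index, vote)."""
    i, j = pair
    vote = similarity(tweets[i], tweets[j])
    return [(i, vote), (j, vote)]

def reduce_votes(emitted):
    """Reduce step: sum the votes emitted for each tweet index."""
    totals = defaultdict(float)
    for index, vote in emitted:
        totals[index] += vote
    return dict(totals)

def agreement_scores(tweets, similarity=token_overlap):
    """Return per-tweet agreement totals and the number of pairs evaluated."""
    pairs = list(combinations(range(len(tweets)), 2))  # n * (n - 1) / 2 pairs
    emitted = [kv for pair in pairs
               for kv in map_pair(pair, tweets, similarity)]
    return reduce_votes(emitted), len(pairs)

# Hypothetical input: doubling the number of tweets roughly quadruples the pairs.
sample = ["a b c", "a b", "a c", "x y"]
scores, n_pairs = agreement_scores(sample)
```

Because no map task depends on any other, the list of pairs can be split across workers in any way; only the final per-tweet summation needs the emitted values grouped by key.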

Twitter Eco-System
[Figure: graph with edge labels Followers, Hyperlinks, Tweeted By, and Tweeted URL]

Summary  We model the tweet space as a tri-layer graph; containing tweet layer, user layer and web-page layer.  Ranking is derived based on users, tweets, and prestige of the referred web pages.  Micro-blog spamming is increasingly becoming lucrative and problematic.  We are working on a ranking sensitive to trustworthiness and relevance of Micro-blogs. 16