Nisha Ranga: TURNING DOWN THE NOISE IN THE BLOGOSPHERE

Based on the paper: Turning Down the Noise in the Blogosphere, by Khalid El-Arini, Gaurav Veda, Dafna Shahaf, and Carlos Guestrin.

OUTLINE
Introduction
Covering the Blogosphere: definition, near-optimal solution
Personalized Coverage: no-regret algorithm
Evaluating the Algorithm

TOP STORIES OF THE BLOGOSPHERE
'Lipstick on a pig': Attack on Palin or common line? (Sept.)
Lady Gaga Meat Dress for Halloween? Butcher's Weigh In (Oct 2010)

CHARACTERIZING POSTS BY FEATURES
Low-level features: USA, Obama, Network, iPad (nouns, named entities)
High-level features: rise of China, Japan after the earthquake (topics)

BLOGOSPHERE: A blogosphere is a triplet (U, Posts, cover_·(·)), where U = {u1, u2, ...} is a finite set of features and Posts is a finite set of posts. The cover function cover_d(f) gives the amount by which post d covers feature f; for a set of posts A, cover_A(f) gives the amount by which A covers f. The objective is the total coverage F(A) = ∑_f cover_A(f).
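The coverage objective on this slide can be sketched in a few lines, assuming the simplest, binary notion of coverage (a post either mentions a feature or it does not). The features and posts below are made up for illustration.

```python
def cover_A(A, f):
    """1 if at least one post in the set A covers feature f, else 0."""
    return 1 if any(f in post for post in A) else 0

def F(A, U):
    """Total coverage of the feature set U by the post set A."""
    return sum(cover_A(A, f) for f in U)

U = {"obama", "ipad", "china"}                      # features
posts = [{"obama", "ipad"}, {"china"}, {"obama"}]   # posts as feature sets

print(F(posts[:1], U))  # prints 2: the first post covers two features
print(F(posts[:2], U))  # prints 3: adding the second covers all three
```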

DRAWBACKS
Feature significance in the corpus: all features in a corpus are treated equally, so we cannot emphasize the importance of certain features. E.g., covering "Cathedral High School" should not be as valuable as covering "Obama."
Feature significance in a post: this objective function does not characterize how relevant a post is to a particular feature. E.g., a post about Obama's speech covers "Obama" just as much as a post that barely mentions him.
Incremental coverage: this coverage notion is too strong, since after seeing one post that covers a certain feature, we never gain anything from another post that covers the same feature. E.g., if the first post covered Obama's speech really well, a second post on it has nothing new to offer.

Some posts are more relevant to a feature than others.
Solution: define probabilistic coverage, cover_d(f) = P(feature f | post d).
With topics as features, cover_d(f) says how strongly post d is about topic f, e.g. the first post is about Obama.
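Probabilistic coverage can be sketched as a lookup into each post's topic distribution, assuming those distributions have already been inferred (e.g. by LDA, which the evaluation slides mention). The posts and probabilities below are made up.

```python
# Hypothetical, pre-computed topic distributions P(topic | post).
post_topics = {
    "post1": {"obama_speech": 0.85, "ipad_launch": 0.05, "other": 0.10},
    "post2": {"obama_speech": 0.10, "ipad_launch": 0.80, "other": 0.10},
}

def cover(d, f):
    """Amount by which post d covers feature (topic) f: P(f | d)."""
    return post_topics[d].get(f, 0.0)

print(cover("post1", "obama_speech"))  # prints 0.85: post1 is about the speech
```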

Want to cover important features.
Solution: associate a weight w_f with each feature f, where w_f is the frequency of feature f in the corpus; an important feature can then be covered by multiple posts. The objective becomes F(A) = ∑_f w_f · cover_A(f).

Avoid every post providing the same coverage.
Solution: define cover_A(f) as the probability that at least one post in the set A covers feature f: cover_A(f) = 1 − ∏_{d ∈ A} (1 − cover_d(f)). This captures feature importance both in individual posts and in the whole corpus, and the objective is F(A) = ∑_f w_f · cover_A(f).
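The three fixes combine into one weighted, probabilistic objective, and because this objective is submodular, greedy selection gives the near-optimal guarantee the outline refers to (within a 1 − 1/e factor of optimal). A minimal sketch, with hypothetical weights and posts represented as {feature: P(f | d)} dictionaries:

```python
def cover_A(A, f):
    """Probability that at least one post in A covers feature f (noisy-OR)."""
    p_miss = 1.0
    for d in A:
        p_miss *= 1.0 - d.get(f, 0.0)
    return 1.0 - p_miss

def F(A, weights):
    """Weighted coverage objective: F(A) = sum_f w_f * cover_A(f)."""
    return sum(w * cover_A(A, f) for f, w in weights.items())

def greedy_select(posts, weights, k):
    """Greedily pick k posts; near-optimal for submodular objectives."""
    A = []
    remaining = list(posts)
    for _ in range(min(k, len(remaining))):
        best = max(remaining, key=lambda d: F(A + [d], weights))
        A.append(best)
        remaining.remove(best)
    return A

weights = {"obama": 3.0, "ipad": 1.0}  # hypothetical corpus-frequency weights
posts = [
    {"obama": 0.9},                  # strongly about Obama
    {"obama": 0.8, "ipad": 0.2},     # mostly Obama, a little iPad
    {"ipad": 0.9},                   # strongly about the iPad
]
picked = greedy_select(posts, weights, k=2)
# Greedy first picks the strongest Obama post, then prefers the iPad post:
# "obama" is already well covered, so a second Obama post adds little.
```

Note how the second pick avoids redundancy even though the second Obama post has a higher standalone score than the iPad post.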

Each user has different interests, so the selected posts may cover stories that are not of interest to a particular user.
Solution: use the user's feedback to learn a personalized notion of coverage. F(A) assigns a fixed weight to each feature, but that weight may vary across users, so a user's coverage function takes the form F(A) = ∑_f π_f · w_f · cover_A(f), where π_f is the user's personalized preference for feature f.
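A hedged sketch of the personalized objective, with preferences updated multiplicatively from like/dislike feedback in the spirit of the no-regret learning the outline mentions. The update rule and learning rate eta are illustrative assumptions, not the paper's exact algorithm.

```python
def cover_A(A, f):
    """P(at least one post in A covers f), posts as {feature: P(f|d)}."""
    p_miss = 1.0
    for d in A:
        p_miss *= 1.0 - d.get(f, 0.0)
    return 1.0 - p_miss

def personalized_F(A, weights, pi):
    """Personalized objective: F(A) = sum_f pi_f * w_f * cover_A(f)."""
    return sum(pi.get(f, 0.0) * w * cover_A(A, f) for f, w in weights.items())

def update_preferences(pi, post, liked, eta=0.5):
    """Boost features of liked posts, damp disliked ones, renormalize."""
    factor = (1.0 + eta) if liked else (1.0 - eta)
    for f in post:
        pi[f] = pi.get(f, 0.0) * factor
    total = sum(pi.values())
    return {f: p / total for f, p in pi.items()}

weights = {"obama": 3.0, "ipad": 1.0}       # hypothetical corpus weights
pi = {"obama": 0.5, "ipad": 0.5}            # start with uniform preferences
pi = update_preferences(pi, {"obama": 0.9}, liked=True)
# After the like, preference mass shifts toward "obama", so an Obama
# post now scores higher for this user.
score = personalized_F([{"obama": 0.9}], weights, pi)
```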

Assume the feedback a user provides on a post is independent of the other posts presented in the same set; this makes it easy to identify what the user likes. In practice, a user may dislike a post either because of its contents or because previous posts have already covered the story, which is why the incremental coverage of a post matters.

EVALUATION
Evaluation is done on data collected over a two-week period; the algorithm was run to select important and prevalent stories.
Measuring topicality
Measuring redundancy

Yahoo! performs well on both the redundancy and topicality measurements, but it uses rich features such as CTR, search trends, and user voting, while TDN + LDA uses only text-based features.

Google has good topicality; the TDN + LDA algorithm is as good as Google.