Web Information Retrieval

Slides:



Advertisements
Similar presentations
Answering Approximate Queries over Autonomous Web Databases Xiangfu Meng, Z. M. Ma, and Li Yan College of Information Science and Engineering, Northeastern.
Advertisements

Topic 3 Top-K and Skyline Algorithms. 2 What is top-k processing? Find k items that best answer a users query –As a set, as a sorted list, or as a sorted.
Probabilistic Skyline Operator over Sliding Windows Wenjie Zhang University of New South Wales & NICTA, Australia Joint work: Xuemin Lin, Ying Zhang, Wei.
Supporting top-k join queries in relational databases Ihab F. Ilyas, Walid G. Aref, Ahmed K. Elmagarmid Presented by Rebecca M. Atchley Thursday, April.
Best-Effort Top-k Query Processing Under Budgetary Constraints
Comparison of parallel and random approach to a candidate list in the multifeature querying Peter Gurský Institute of Computer Science UPJŠ, Košice, Slovakia.
TI: An Efficient Indexing Mechanism for Real-Time Search on Tweets Chun Chen 1, Feng Li 2, Beng Chin Ooi 2, and Sai Wu 2 1 Zhejiang University, 2 National.
Common Voting Rules as Maximum Likelihood Estimators Vincent Conitzer (Joint work with Tuomas Sandholm) Early version of this work appeared in UAI-05.
Voting and social choice Vincent Conitzer
Algorithmic Game Theory Uri Feige Robi Krauthgamer Moni Naor Lecture 9: Social Choice Lecturer: Moni Naor.
Math 1010 ‘Mathematical Thought and Practice’ An active learning approach to a liberal arts mathematics course.
Voting Theory.
Computing Kemeny and Slater Rankings Vincent Conitzer (Joint work with Andrew Davenport and Jayant Kalagnanam at IBM Research.)
Social Choice: The Impossible Dream Michelle Blessing February 23, 2010 Michelle Blessing February 23, 2010.
Using a Modified Borda Count to Predict the Outcome of a Condorcet Tally on a Graphical Model 11/19/05 Galen Pickard, MIT Advisor: Dr. Whitman Richards,
Supporting Ad-Hoc Ranking Aggregates Chengkai Li (UIUC) joint work with Kevin Chang (UIUC) Ihab Ilyas (Waterloo)
Approximating Optimal Social Choice under Metric Preferences Elliot Anshelevich Onkar Bhardwaj John Postl Rensselaer Polytechnic Institute (RPI), Troy,
Advanced Topics in Algorithms and Data Structures Lecture pg 1 Recursion.
More on Rankings.
Rank Aggregation Methods for the Web CS728 Lecture 11.
CPS Voting and social choice
1 Algorithms for Large Data Sets Ziv Bar-Yossef Lecture 7 April 20, 2005
6/15/20151 Top-k algorithms Finding k objects that have the highest overall grades.
Rank Aggregation. Rank Aggregation: Settings Multiple items – Web-pages, cars, apartments,…. Multiple scores for each item – By different reviewers, users,
Top-k and Skyline Computation in Database Systems
Aggregation Algorithms and Instance Optimality
Common Voting Rules as Maximum Likelihood Estimators Vincent Conitzer and Tuomas Sandholm Carnegie Mellon University, Computer Science Department.
Lecture 9: Rank Aggregation in MetaSearch MetaSearch Engine Social Choice Rules Rank Aggregation.
Evaluating Top-k Queries over Web-Accessible Databases Nicolas Bruno Luis Gravano Amélie Marian Columbia University.
Top- K Query Evaluation with Probabilistic Guarantees Martin Theobald, Gerhard Weikum, Ralf Schenkel Presenter: Avinandan Sengupta.
CS246 Ranked Queries. Junghoo "John" Cho (UCLA Computer Science)2 Traditional Database Query (Dept = “CS”) & (GPA > 3.5) Boolean semantics Clear boundary.
Social choice (voting) Vincent Conitzer > > > >
An efficient distributed protocol for collective decision- making in combinatorial domains CMSS Feb , 2012 Minyi Li Intelligent Agent Technology.
Mehdi Kargar Aijun An York University, Toronto, Canada Keyword Search in Graphs: Finding r-cliques.
CPS Voting and social choice Vincent Conitzer
Group Recommendations with Rank Aggregation and Collaborative Filtering Linas Baltrunas, Tadas Makcinskas, Francesco Ricci Free University of Bozen-Bolzano.
1 Evaluating top-k Queries over Web-Accessible Databases Paper By: Amelie Marian, Nicolas Bruno, Luis Gravano Presented By Bhushan Chaudhari University.
Ashwani Roy Understanding Graphical Execution Plans Level 200.
Sahand Negahban Sewoong Oh Devavrat Shah Yale + UIUC + MIT.
Towards Robust Indexing for Ranked Queries Dong Xin, Chen Chen, Jiawei Han Department of Computer Science University of Illinois at Urbana-Champaign VLDB.
Information Networks Rank Aggregation Lecture 10.
Efficient Processing of Top-k Spatial Preference Queries
All right reserved by Xuehua Shen 1 Optimal Aggregation Algorithms for Middleware Ronald Fagin, Amnon Lotem, Moni Naor (PODS01)
Math for Liberal Studies.  We have seen many methods, all of them flawed in some way  Which method should we use?  Maybe we shouldn’t use any of them,
Combining Fuzzy Information: An Overview Ronald Fagin.
Presented by Suresh Barukula 2011csz  Top-k query processing means finding k- objects, that have highest overall grades.  A query in multimedia.
Search Engines WS 2009 / 2010 Prof. Dr. Hannah Bast Chair of Algorithms and Data Structures Department of Computer Science University of Freiburg Lecture.
Feature Selection in k-Median Clustering Olvi Mangasarian and Edward Wild University of Wisconsin - Madison.
Optimal Aggregation Algorithms for Middleware By Ronald Fagin, Amnon Lotem, and Moni Naor.
Top-k Query Processing Optimal aggregation algorithms for middleware Ronald Fagin, Amnon Lotem, and Moni Naor + Sushruth P. + Arjun Dasgupta.
Database Searching and Information Retrieval Presented by: Tushar Kumar.J Ritesh Bagga.
03/02/20061 Evaluating Top-k Queries Over Web-Accessible Databases Amelie Marian Nicolas Bruno Luis Gravano Presented By: Archana and Muhammed.
Arrow’s Impossibility Theorem
Algorithms for Large Data Sets
Supporting Ad-Hoc Ranking Aggregates
Introduction If we assume
Top-k Query Processing
Rank Aggregation.
Popular Ranking Algorithms
Information Organization: Clustering
Models and Algorithms for Complex Networks
Voting systems Chi-Kwong Li.
Distributed Probabilistic Range-Aggregate Query on Uncertain Data
Voting and social choice
Probabilistic Latent Preference Analysis
Introduction to Social Choice
INFORMATION RETRIEVAL TECHNIQUES BY DR. ADNAN ABID
Efficient Processing of Top-k Spatial Preference Queries
CPS Voting and social choice
Outline Rank Aggregation Computing aggregate scores
Presentation transcript:

Web Information Retrieval Rank Aggregation Thanks to Panayiotis Tsaparas

Rank Aggregation Given a set of rankings R1,R2,…,Rm of a set of objects X1,X2,…,Xn produce a single ranking R that is in agreement with the existing rankings

Examples Voting rankings R1,R2,…,Rm are the voters, the objects X1,X2,…,Xn are the candidates.

Examples Combining multiple scoring functions rankings R1,R2,…,Rm are the scoring functions, the objects X1,X2,…,Xn are data items. Combine the PageRank scores with term-weighting scores Combine scores for multimedia items color, shape, texture Combine scores for database tuples find the best hotel according to price and location

Examples Combining multiple sources rankings R1,R2,…,Rm are the sources, the objects X1,X2,…,Xn are data items. meta-search engines for the Web distributed databases P2P sources

Variants of the problem Combining scores we know the scores assigned to objects by each ranking, and we want to compute a single score Combining ordinal rankings the scores are not known, only the ordering is known the scores are known but we do not know how, or do not want to combine them e.g. price and star rating

Combining scores Each object Xi has m scores (ri1,ri2,…,rim) The score of object Xi is computed using an aggregate scoring function f(ri1,ri2,…,rim) R1 R2 R3 X1 1 0.3 0.2 X2 0.8 X3 0.5 0.7 0.6 X4 X5 0.1

Combining scores Each object Xi has m scores (ri1,ri2,…,rim) The score of object Xi is computed using an aggregate scoring function f(ri1,ri2,…,rim) f(ri1,ri2,…,rim) = min{ri1,ri2,…,rim} R1 R2 R3 R X1 1 0.3 0.2 X2 0.8 X3 0.5 0.7 0.6 X4 X5 0.1

Combining scores Each object Xi has m scores (ri1,ri2,…,rim) The score of object Xi is computed using an aggregate scoring function f(ri1,ri2,…,rim) f(ri1,ri2,…,rim) = max{ri1,ri2,…,rim} R1 R2 R3 R X1 1 0.3 0.2 X2 0.8 X3 0.5 0.7 0.6 X4 X5 0.1

Combining scores Each object Xi has m scores (ri1,ri2,…,rim) The score of object Xi is computed using an aggregate scoring function f(ri1,ri2,…,rim) f(ri1,ri2,…,rim) = ri1 + ri2 + …+ rim R1 R2 R3 R X1 1 0.3 0.2 1.5 X2 0.8 1.6 X3 0.5 0.7 0.6 1.8 X4 1.3 X5 0.1

Top-k Given a set of n objects and m scoring lists sorted in decreasing order, find the top-k objects according to a scoring function f top-k: a set T of k objects such that f(rj1,…,rjm) ≤ f(ri1,…,rim) for every object Xi in T and every object Xj not in T Assumption: The function f is monotone f(r1,…,rm) ≤ f(r1’,…,rm’) if ri ≤ ri’ for all i Objective: Compute top-k with the minimum cost

Cost function We want to minimize the number of accesses to the scoring lists Sorted accesses: sequentially access the objects in the order in which they appear in a list cost Cs Random accesses: obtain the cost value for a specific object in a list cost Cr If s sorted accesses and r random accesses minimize s Cs + r Cr

Example Compute top-2 for the sum aggregate function R1 X1 1 X2 0.8 X3 0.5 X4 0.3 X5 0.1 R2 X2 0.8 X3 0.7 X1 0.3 X4 0.2 X5 0.1 R3 X4 0.8 X3 0.6 X1 0.2 X5 0.1 X2

Fagin’s Algorithm Access sequentially all lists in parallel until there are k objects that have been seen in all lists R1 X1 1 X2 0.8 X3 0.5 X4 0.3 X5 0.1 R2 X2 0.8 X3 0.7 X1 0.3 X4 0.2 X5 0.1 R3 X4 0.8 X3 0.6 X1 0.2 X5 0.1 X2

Fagin’s Algorithm Access sequentially all lists in parallel until there are k objects that have been seen in all lists R1 X1 1 X2 0.8 X3 0.5 X4 0.3 X5 0.1 R2 X2 0.8 X3 0.7 X1 0.3 X4 0.2 X5 0.1 R3 X4 0.8 X3 0.6 X1 0.2 X5 0.1 X2

Fagin’s Algorithm Access sequentially all lists in parallel until there are k objects that have been seen in all lists R1 X1 1 X2 0.8 X3 0.5 X4 0.3 X5 0.1 R2 X2 0.8 X3 0.7 X1 0.3 X4 0.2 X5 0.1 R3 X4 0.8 X3 0.6 X1 0.2 X5 0.1 X2

Fagin’s Algorithm Access sequentially all lists in parallel until there are k objects that have been seen in all lists R1 X1 1 X2 0.8 X3 0.5 X4 0.3 X5 0.1 R2 X2 0.8 X3 0.7 X1 0.3 X4 0.2 X5 0.1 R3 X4 0.8 X3 0.6 X1 0.2 X5 0.1 X2

Fagin’s Algorithm Access sequentially all lists in parallel until there are k objects that have been seen in all lists R1 X1 1 X2 0.8 X3 0.5 X4 0.3 X5 0.1 R2 X2 0.8 X3 0.7 X1 0.3 X4 0.2 X5 0.1 R3 X4 0.8 X3 0.6 X1 0.2 X5 0.1 X2

Fagin’s Algorithm Perform random accesses to obtain the scores of all seen objects R1 X1 1 X2 0.8 X3 0.5 X4 0.3 X5 0.1 R2 X2 0.8 X3 0.7 X1 0.3 X4 0.2 X5 0.1 R3 X4 0.8 X3 0.6 X1 0.2 X5 0.1 X2

Fagin’s Algorithm Compute score for all objects and find the top-k R1 X1 1 X2 0.8 X3 0.5 X4 0.3 X5 0.1 R2 X2 0.8 X3 0.7 X1 0.3 X4 0.2 X5 0.1 R3 X4 0.8 X3 0.6 X1 0.2 X5 0.1 X2 R X3 1.8 X2 1.6 X1 1.5 X4 1.3

Fagin’s Algorithm X5 cannot be in the top-2 because of the monotonicity property f(X5) ≤ f(X1) ≤ f(X3) R1 X1 1 X2 0.8 X3 0.5 X4 0.3 X5 0.1 R2 X2 0.8 X3 0.7 X1 0.3 X4 0.2 X5 0.1 R3 X4 0.8 X3 0.6 X1 0.2 X5 0.1 X2 R X3 1.8 X2 1.6 X1 1.5 X4 1.3

Fagin’s Algorithm The algorithm is cost optimal under some probabilistic assumptions for a restricted class of aggregate functions

Threshold algorithm Access the elements sequentially R1 X1 1 X2 0.8 X3 0.5 X4 0.3 X5 0.1 R2 X2 0.8 X3 0.7 X1 0.3 X4 0.2 X5 0.1 R3 X4 0.8 X3 0.6 X1 0.2 X5 0.1 X2

Threshold algorithm At each sequential access Set the threshold t to be the aggregate of the scores seen in this access R1 X1 1 X2 0.8 X3 0.5 X4 0.3 X5 0.1 R2 X2 0.8 X3 0.7 X1 0.3 X4 0.2 X5 0.1 R3 X4 0.8 X3 0.6 X1 0.2 X5 0.1 X2 t = 2.6

Threshold algorithm At each sequential access Do random accesses and compute the score of the objects seen R1 X1 1 X2 0.8 X3 0.5 X4 0.3 X5 0.1 R2 X2 0.8 X3 0.7 X1 0.3 X4 0.2 X5 0.1 R3 X4 0.8 X3 0.6 X1 0.2 X5 0.1 X2 t = 2.6 X1 1.5 X2 1.6 X4 1.3

Threshold algorithm At each sequential access Maintain a list of top-k objects seen so far R1 X1 1 X2 0.8 X3 0.5 X4 0.3 X5 0.1 R2 X2 0.8 X3 0.7 X1 0.3 X4 0.2 X5 0.1 R3 X4 0.8 X3 0.6 X1 0.2 X5 0.1 X2 t = 2.6 X2 1.6 X1 1.5

Threshold algorithm At each sequential access When the scores of the top-k are greater or equal to the threshold, stop R1 X1 1 X2 0.8 X3 0.5 X4 0.3 X5 0.1 R2 X2 0.8 X3 0.7 X1 0.3 X4 0.2 X5 0.1 R3 X4 0.8 X3 0.6 X1 0.2 X5 0.1 X2 t = 2.1 X3 1.8 X2 1.6

Threshold algorithm At each sequential access When the scores of the top-k are greater or equal to the threshold, stop R1 X1 1 X2 0.8 X3 0.5 X4 0.3 X5 0.1 R2 X2 0.8 X3 0.7 X1 0.3 X4 0.2 X5 0.1 R3 X4 0.8 X3 0.6 X1 0.2 X5 0.1 X2 t = 1.0 X3 1.8 X2 1.6

Threshold algorithm Return the top-k seen so far R1 X1 1 X2 0.8 X3 0.5 0.3 X5 0.1 R2 X2 0.8 X3 0.7 X1 0.3 X4 0.2 X5 0.1 R3 X4 0.8 X3 0.6 X1 0.2 X5 0.1 X2 t = 1.0 X3 1.8 X2 1.6

Threshold algorithm From the monotonicity property for any object not seen, the score of the object is less than the threshold f(X5) ≤ t ≤ f(X2) The algorithm is instance cost-optimal within a constant factor of the best algorithm on any database

Combining rankings In many cases the scores are not known e.g. meta-search engines – scores are proprietary information … or we do not know how they were obtained one search engine returns score 10, the other 100. What does this mean? … or the scores are incompatible apples and oranges: does it make sense to combine price with distance? In this cases we can only work with the rankings

The problem Input: a set of rankings R1,R2,…,Rm of the objects X1,X2,…,Xn. Each ranking Ri is a total ordering of the objects for every pair Xi,Xj either Xi is ranked above Xj or Xj is ranked above Xi Output: A total ordering R that aggregates rankings R1,R2,…,Rm

Voting theory A voting system is a rank aggregation mechanism Long history and literature criteria and axioms for good voting systems

What is a good voting system? The Condorcet criterion if object A defeats every other object in a pairwise majority vote, then A should be ranked first Extended Condorcet criterion if the objects in a set X defeat in pairwise comparisons the objects in the set Y then the objects in X should be ranked above those in Y Not all voting systems satisfy the Condorcet criterion!

Pairwise majority comparisons Unfortunately the Condorcet winner does not always exist irrational behavior of groups V1 V2 V3 1 A B C 2 3 A > B B > C C > A

Pairwise majority comparisons Resolve cycles by imposing an agenda V1 V2 V3 1 A D E 2 B 3 C 4 5

Pairwise majority comparisons Resolve cycles by imposing an agenda V1 V2 V3 1 A D E 2 B 3 C 4 5 A B A

Pairwise majority comparisons Resolve cycles by imposing an agenda V1 V2 V3 1 A D E 2 B 3 C 4 5 A B A E E

Pairwise majority comparisons Resolve cycles by imposing an agenda V1 V2 V3 1 A D E 2 B 3 C 4 5 A B A E E D D

Pairwise majority comparisons Resolve cycles by imposing an agenda C is the winner V1 V2 V3 1 A D E 2 B 3 C 4 5 A B A E E D D C C

Pairwise majority comparisons Resolve cycles by imposing an agenda But everybody prefers A or B over C V1 V2 V3 1 A D E 2 B 3 C 4 5 A B A E E D D C C

Pairwise majority comparisons The voting system is not Pareto optimal there exists another ordering that everybody prefers Also, it is sensitive to the order of voting

References Ron Fagin, Amnon Lotem, Moni Naor. Optimal aggregation algorithms for middleware, J. Computer and System Sciences 66 (2003), pp. 614-656. Extended abstract appeared in Proc. 2001 ACM Symposium on Principles of Database Systems (PODS '01), pp. 102-113. Alex Tabbarok Lecture Notes Ron Fagin, Ravi Kumar, D. Sivakumar Efficient similarity search and classification via rank aggregation, Proc. 2003 ACM SIGMOD Conference (SIGMOD '03), pp. 301-312. Cynthia Dwork, Ravi Kumar, Moni Naor, D. Sivakumar. Rank Aggregation Methods for the Web. 10th International World Wide Web Conference, May 2001. C. Dwork, R. Kumar, M. Naor, D. Sivakumar, "Rank Aggregation Revisited," WWW10; selected as Web Search Area highlight, 2001.