Ranking: Compare, Don't Score. Ammar Ammar, Devavrat Shah (LIDS, MIT). Poster (no preprint), WIDS 2011
Introduction
The need to rank items based on user input arises in elections, betting, and recommendation systems. E.g. Netflix, the movie streaming service (which accounts for 30% of U.S. web traffic): the problem of recommending movies to users based on partial historical information about their preferences.
Two main approaches:
Scores: ask users to provide a score/rating for each product, and use the scores to rank the products. (A popular approach.)
Comparisons: ask users to compare two, or more, products at a time, and use the comparisons to rank the products. (A natural alternative.)
Introduction
Scores. Advantage: easy aggregation. Disadvantage: scores are arbitrary/relative (e.g. they depend on each user's scale).
Comparisons. Advantage: absolute (scale-free) information. Disadvantage: aggregation is hard.
Mathematical Model
n products, N = {1, ..., n}. Each customer is associated with a permutation σ of the elements of N; σ(i) < σ(j) means the customer prefers product i to product j.
E.g. N = {1, 2, 3, 4, 5}: a customer with the preference ranking 3 > 1 > 4 > 2 > 5 has the permutation σ = (2, 4, 1, 3, 5), since σ(i) is the rank position of product i.
The model of customer choice is a distribution μ: S_n → [0, 1] over the set of possible permutations S_n.
Observed data is limited to the pairwise comparison marginals of μ: w_ij = P[σ(i) < σ(j)] (the fraction of users who prefer item i to item j).
Goal: find an estimate μ̂ that is consistent with the data.
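The pairwise marginals w_ij can be tabulated directly from a set of observed permutations. A minimal sketch; the customer data here is an assumed toy example, with the first row encoding the slide's ranking 3 > 1 > 4 > 2 > 5:

```python
from itertools import combinations

# Assumed toy data: each tuple is one customer's permutation sigma, where
# sigma[i-1] is the rank position of product i. The first row encodes the
# slide's example preference ranking 3 > 1 > 4 > 2 > 5.
customers = [
    (2, 4, 1, 3, 5),
    (1, 2, 3, 4, 5),
    (2, 1, 4, 3, 5),
]

def pairwise_marginals(perms, n):
    """w[(i, j)] = fraction of customers with sigma(i) < sigma(j),
    i.e. the fraction who prefer product i to product j."""
    w = {}
    for i, j in combinations(range(1, n + 1), 2):
        w[(i, j)] = sum(p[i - 1] < p[j - 1] for p in perms) / len(perms)
        w[(j, i)] = 1.0 - w[(i, j)]
    return w

w = pairwise_marginals(customers, 5)
```

Note that w_ji = 1 - w_ij, so only the n(n-1)/2 values with i < j carry information.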
Maximum Entropy
Multiple distributions are consistent with the data constraints; the principle of maximum entropy guides the choice among them. Subject to the known constraints (the "testable information"), the probability distribution that best represents the current state of knowledge is the one with the largest entropy: maximize H(μ) = −∑_σ μ(σ) log μ(σ) subject to the pairwise constraints. The solution has the parametric form μ_λ(σ) ∝ exp( ∑_{i<j} λ_ij · 1{σ(i) < σ(j)} ).
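The parametric form can be evaluated by brute force for tiny n. A sketch, assuming toy values for the parameters λ_ij (brute-force enumeration of S_n is only feasible for very small n):

```python
from itertools import permutations
from math import exp

# Assumed toy parameters lambda_ij for n = 3; mu_lambda is evaluated by
# brute force over all of S_3, which is only feasible for tiny n.
lam = {(1, 2): 0.5, (1, 3): 0.2, (2, 3): -0.3}

def mu_lambda(lam, n):
    """mu_lambda(sigma) proportional to exp(sum lambda_ij * 1{sigma(i) < sigma(j)})."""
    perms = list(permutations(range(1, n + 1)))  # sigma[i-1] = rank position of item i
    weights = [exp(sum(l for (i, j), l in lam.items() if s[i - 1] < s[j - 1]))
               for s in perms]
    Z = sum(weights)  # normalizing constant (partition function)
    return {s: wt / Z for s, wt in zip(perms, weights)}

mu = mu_lambda(lam, 3)
```

A positive λ_ij shifts mass toward permutations that rank item i above item j, and a negative λ_ij does the opposite.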
Contributions
Developed a consistent algorithm for estimating the parameters of the Maximum Entropy distribution; the algorithm is distributed and iterative.
Provided a randomized 2-approximation scheme for the mode of the distribution.
Developed two ranking schemes that use the Maximum Entropy distribution to obtain a ranking that puts emphasis on the top elements:
Top-k ranking: uses the likelihood of an item appearing in the top k.
θ-ranking: uses a tilted average of an item's possible positions.
Algorithm Sketch
The Maximum Entropy distribution is fully characterized by the parameters λ_ij; the goal is to estimate these parameters from the data w_ij.
Initialize λ_ij = 1. For t = 1, 2, ...:
Set λ_ij^(t+1) = λ_ij^(t) + (1/t) · ( w_ij − E_{λ^(t)}[ 1{σ(i) < σ(j)} ] ).
Exact computation of E_{λ^(t)}[ 1{σ(i) < σ(j)} ] is hard; use MCMC or BP to obtain an approximation.
The parameters can be estimated "separately" in a distributed manner.
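The update above can be sketched end to end for a toy n = 3 instance. Here the expectation is computed exactly by brute force over S_3 rather than by MCMC/BP, and the marginals w are assumed example values:

```python
from itertools import combinations, permutations
from math import exp

# Sketch of the iterative update for a toy n = 3 instance. The expectation
# E_lambda[1{sigma(i) < sigma(j)}] is computed exactly by brute force over
# S_3; the poster's algorithm uses MCMC or BP for this step, since brute
# force is infeasible beyond tiny n.

def fit_lambda(w, n, iters=5000):
    lam = {pair: 1.0 for pair in combinations(range(1, n + 1), 2)}  # lambda_ij = 1
    perms = list(permutations(range(1, n + 1)))
    for t in range(1, iters + 1):
        # mu_lambda(sigma) proportional to exp(sum lambda_ij * 1{sigma(i) < sigma(j)})
        weights = [exp(sum(l for (i, j), l in lam.items() if s[i - 1] < s[j - 1]))
                   for s in perms]
        Z = sum(weights)
        for (i, j) in lam:
            # model marginal E_lambda[1{sigma(i) < sigma(j)}]
            e = sum(wt for s, wt in zip(perms, weights) if s[i - 1] < s[j - 1]) / Z
            lam[(i, j)] += (1.0 / t) * (w[(i, j)] - e)  # the 1/t step from the slide
    return lam

# Assumed example marginals: e.g. 70% of users prefer item 1 to item 2.
w = {(1, 2): 0.7, (1, 3): 0.6, (2, 3): 0.4}
lam = fit_lambda(w, 3)
```

Each pair (i, j) only needs its own w_ij and an estimate of its own marginal, which is what makes the distributed, per-parameter updates possible.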
Mode
Mode: σ* = argmax_σ μ̂(σ). Exact computation of the mode is hard.
A randomized 2-approximation:
1. Generate k permutations σ_1, ..., σ_k uniformly at random.
2. Select the permutation σ with the largest weight.
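The randomized scheme is straightforward to sketch. The parameters λ_ij below are assumed toy values, and the permutation weight is the unnormalized Maximum Entropy weight:

```python
import random
from math import exp

# Assumed toy parameters lambda_ij for n = 3 (all positive, so the mode
# is the identity ranking 1 > 2 > 3).
lam = {(1, 2): 0.9, (1, 3): 0.4, (2, 3): 0.7}

def weight(sigma, lam):
    """Unnormalized weight exp(sum lambda_ij * 1{sigma(i) < sigma(j)})."""
    return exp(sum(l for (i, j), l in lam.items() if sigma[i - 1] < sigma[j - 1]))

def approx_mode(lam, n, k, seed=0):
    """Sample k permutations u.a.r. and keep the one with the largest weight."""
    rng = random.Random(seed)
    best, best_w = None, float("-inf")
    for _ in range(k):
        sigma = list(range(1, n + 1))
        rng.shuffle(sigma)
        w = weight(tuple(sigma), lam)
        if w > best_w:
            best, best_w = tuple(sigma), w
    return best
```

Normalization is unnecessary here: the partition function is the same for every permutation, so comparing unnormalized weights suffices.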
Top-k Ranking
A robust ranking of the top k items using one of the following two schemes:
1. Top-k ranking: compute S_k(i) = P_λ[σ(i) ≤ k]; rank products by S_k and choose the top k.
2. θ-ranking: compute S_θ(i) = ∑_j e^(−θj) · P_λ[σ(i) = j]; rank products by S_θ and choose the top k.
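Both scores are simple functions of the position marginals P_λ[σ(i) = j]. A sketch using a small assumed distribution over S_3 in place of the fitted model (in practice the marginals would be estimated, e.g. from MCMC samples):

```python
from math import exp

# Assumed small distribution over S_3, standing in for the fitted Maximum
# Entropy model; each key is sigma with sigma[i-1] = rank position of item i.
mu = {
    (1, 2, 3): 0.5,
    (2, 1, 3): 0.3,
    (3, 2, 1): 0.2,
}

def position_marginals(mu, n):
    """p[(i, j)] = P[sigma(i) = j] for item i and position j."""
    p = {(i, j): 0.0 for i in range(1, n + 1) for j in range(1, n + 1)}
    for sigma, prob in mu.items():
        for i in range(1, n + 1):
            p[(i, sigma[i - 1])] += prob
    return p

def top_k_score(p, i, k):
    """S_k(i) = P[sigma(i) <= k]: likelihood of item i appearing in the top k."""
    return sum(p[(i, j)] for j in range(1, k + 1))

def theta_score(p, i, theta, n):
    """S_theta(i) = sum_j e^(-theta*j) * P[sigma(i) = j]: tilted position average."""
    return sum(exp(-theta * j) * p[(i, j)] for j in range(1, n + 1))

p = position_marginals(mu, 3)
```

Larger θ concentrates the tilted average on the top positions, which is what puts the emphasis on the top elements.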