Dynamic Covering for Recommendation Systems
Ioannis Antonellis, Anish Das Sarma, Shaddin Dughmi

Outline
Covering & Recommendations
Succinct Dynamic Covering
Results:
  o Upper Bounds
  o Lower Bounds

Max k-cover Problem
Input:
  o integer k
  o items: X = {1, 2, ..., n}
  o sets: I = {S1, ..., Sm}, each Si a subset of X
Output: a subset of I of size at most k that maximizes the number of covered items
Example (sets A, B, C):
  o k=1: solution A (covers 3 items)
  o k=2: solutions {A, C}, {A, B}, {B, C} (each covers 4 items)
[Figure: bipartite graph linking Sets (A, B, C) to Items]

Max k-cover Problem
NP-hard
Greedy algorithm:
  o pick the set that covers the most uncovered items
  o iterate k times
Approximation ratio: 1 - ((k-1)/k)^k >= 1 - 1/e ≈ 0.632

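The greedy rule above can be sketched in Python as follows. The function name `greedy_max_k_cover`, the dictionary-of-sets input format and the toy instance are assumptions for illustration; the instance is not the one in the figure.

```python
from typing import Dict, Hashable, List, Set


def greedy_max_k_cover(sets: Dict[Hashable, Set[int]], k: int) -> List[Hashable]:
    """Pick up to k sets, each time taking the set with the most uncovered items.

    Guarantees at least 1 - ((k-1)/k)^k >= 1 - 1/e of the optimal coverage.
    """
    covered: Set[int] = set()
    chosen: List[Hashable] = []
    remaining = dict(sets)
    for _ in range(k):
        if not remaining:
            break
        # Largest marginal gain; max() keeps the first set on ties.
        best = max(remaining, key=lambda name: len(remaining[name] - covered))
        if not remaining[best] - covered:
            break  # no set adds anything new, so stop early
        chosen.append(best)
        covered |= remaining.pop(best)
    return chosen


# Hypothetical toy instance.
example = {"A": {1, 2, 3}, "B": {3, 4}, "C": {4, 5}}
print(greedy_max_k_cover(example, 2))  # ['A', 'C']
```

After taking A, set C adds two new items ({4, 5}) while B adds only one ({4}), so greedy picks C second.
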
Max k-cover in Recommendations
Alice views and rates movies
Netflix would like to recommend new movies for Alice to watch
Important problem:
  o find users "similar" to Alice
  o find users who cover a large set of Alice's likes and dislikes

Netflix example
Each user is identified by the subset of movies they like or have viewed:
  o Alice likes {A, B, C}
  o Fred likes {A, D}
  o Bob likes {B, E}
  o Ben likes {C, F}
  o Jim likes {A, B, F}
  o James likes {A, B, F}
Ben and Jim together cover all of Alice's likes
Fred, Bob and Ben together cover all of Alice's likes
Jim and James like the same movies, so choosing both adds nothing over choosing one

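The claims in this example can be checked mechanically. This is a small sketch; the helper name `coverage` is hypothetical, and coverage is measured only over Alice's likes (her likes act as the set to be covered).

```python
from itertools import combinations
from typing import Iterable, Set

# The slide's example: each user is represented by the set of movies they like.
alice: Set[str] = {"A", "B", "C"}
users = {
    "Fred": {"A", "D"},
    "Bob": {"B", "E"},
    "Ben": {"C", "F"},
    "Jim": {"A", "B", "F"},
    "James": {"A", "B", "F"},
}


def coverage(group: Iterable[str], query: Set[str] = alice) -> int:
    """How many of the query user's likes the group covers together."""
    liked = set().union(*(users[u] for u in group))
    return len(liked & query)


print(coverage(["Ben", "Jim"]))          # 3
print(coverage(["Fred", "Bob", "Ben"]))  # 3
print(coverage(["Jim"]), coverage(["Jim", "James"]))  # 2 2

# All pairs of users that together cover every one of Alice's likes.
full_pairs = [p for p in combinations(users, 2) if coverage(p) == len(alice)]
print(full_pairs)  # [('Ben', 'Jim'), ('Ben', 'James')]
```

Adding James to Jim leaves the coverage at 2, which is exactly the redundancy a covering objective avoids.
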
k-covering vs nearest neighbor
For k=1, the two are equivalent (under dot-product similarity)
Covering allows for diversifying recommendations: we want to cover all genres a user likes
  o consider a user who likes 100 thriller movies and 10 comedies
  o we want "similar" users to cover as many of these movies as possible
  o k-nearest neighbor finds the k individually most similar users, not users who together cover as many movies as possible

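The thriller/comedy scenario can be made concrete with hypothetical data. In this sketch, the users U1, U2, U3 and their movie sets are invented for illustration; k-NN ranks users by dot product of 0/1 vectors (the overlap size), while the covering approach uses the greedy max k-cover rule restricted to Alice's movies. For k=1 both pick the user with the largest overlap.

```python
from typing import Dict, List, Set

# Hypothetical data mirroring the slide's scenario: 100 thrillers + 10 comedies.
alice: Set[str] = {f"T{i}" for i in range(100)} | {f"C{i}" for i in range(10)}
candidates: Dict[str, Set[str]] = {
    "U1": {f"T{i}" for i in range(60)},      # thrillers T0..T59
    "U2": {f"T{i}" for i in range(10, 65)},  # thrillers T10..T64, mostly overlapping U1
    "U3": {f"C{i}" for i in range(10)},      # the 10 comedies
}


def knn(query: Set[str], cands: Dict[str, Set[str]], k: int) -> List[str]:
    """k nearest neighbors: rank users by dot product of 0/1 vectors (= overlap size)."""
    return sorted(cands, key=lambda u: len(cands[u] & query), reverse=True)[:k]


def greedy_cover(query: Set[str], cands: Dict[str, Set[str]], k: int) -> List[str]:
    """Greedy k-cover of the query: each pick maximizes the NEW query items covered."""
    covered: Set[str] = set()
    chosen: List[str] = []
    for _ in range(min(k, len(cands))):
        best = max((u for u in cands if u not in chosen),
                   key=lambda u: len((cands[u] & query) - covered))
        chosen.append(best)
        covered |= cands[best] & query
    return chosen


def covered_count(query: Set[str], cands: Dict[str, Set[str]], group: List[str]) -> int:
    """Number of query items covered by the union of the group's sets."""
    return len(set().union(*(cands[u] for u in group)) & query)


for pick in (knn(alice, candidates, 2), greedy_cover(alice, candidates, 2)):
    print(pick, covered_count(alice, candidates, pick))
# ['U1', 'U2'] 65
# ['U1', 'U3'] 70
```

k-NN returns the two thriller fans because each is individually closest to Alice, but U2 adds only 5 new movies; covering swaps U2 for the comedy fan and reaches both genres.
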
oDesk example
Online labor marketplace:
  o clients post jobs and/or invite contractors
  o contractors apply to jobs
Contractor recommendations for clients:
  o Bob invites/interviews/hires contractors
  o find clients "similar" to Bob
Job recommendations for contractors:
  o Alice applies to jobs
  o find contractors "similar" to Alice

Succinct Dynamic Covering (SDC)
Input:
  o integer k
  o items: X = {1, 2, ..., n}
  o sets: I = {S1, ..., Sm}, each Si a subset of X
  o query Q, a subset of X
Output: a subset of I of size at most k that maximizes the number of covered items in Q
However, we further constrain the problem:
  o space-constrained: statically preprocess (X, I) and store a small sketch, much smaller than the O(mn) size of the full set system
  o dynamic: Q is not known a priori when the sketch is created

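To make the objective concrete, here is a brute-force reference for tiny instances. It is a sketch only: the name `exact_sdc` and the toy instance are hypothetical, it is exponential in k, and it reads the full set system at query time, which is exactly what the space constraint forbids. It does show why Q matters: the best set depends on the query.

```python
from itertools import combinations
from typing import Dict, Hashable, Set, Tuple


def exact_sdc(sets: Dict[Hashable, Set[int]], query: Set[int],
              k: int) -> Tuple[Tuple[Hashable, ...], int]:
    """Return (best group of at most k set names, number of query items covered).

    Exhaustive search over all groups of size 1..k; only a reference for tiny
    instances, since it needs the whole set system (up to m*n bits).
    """
    best_group: Tuple[Hashable, ...] = ()
    best_value = 0
    for size in range(1, min(k, len(sets)) + 1):
        for group in combinations(sets, size):
            covered = set().union(*(sets[name] for name in group)) & query
            if len(covered) > best_value:  # keep the first best group found
                best_group, best_value = group, len(covered)
    return best_group, best_value


# Hypothetical toy instance: the best single set depends on the query.
sets = {"A": {1, 2, 3}, "B": {3, 4}, "C": {4, 5}}
print(exact_sdc(sets, {1, 2, 3, 4, 5}, 1))  # (('A',), 3)
print(exact_sdc(sets, {3, 4, 5}, 1))        # (('B',), 2)
print(exact_sdc(sets, {3, 4, 5}, 2))        # (('A', 'C'), 3)
```
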
Notice two twists
Dynamic:
  o the set of movies that needs to be covered is different for each user
  o covering is not static
Space-constrained:
  o real-time, interactive recommendations
  o the whole Netflix graph is huge: 10 million users, 100k movies; popular movies have been viewed many times
  o cannot process the entire graph at query time

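A quick back-of-the-envelope check of "huge", assuming the graph is stored as a dense user-by-movie bit matrix (one bit per pair, the O(mn) baseline). The slide does not give the number of ratings, so a sparse edge-list size is not computed here.

```python
users = 10_000_000  # "10 million users" (from the slide)
movies = 100_000    # "100k movies" (from the slide)

bits = users * movies       # dense incidence matrix: one bit per (user, movie) pair
gigabytes = bits / 8 / 1e9  # bits -> bytes -> GB (decimal)
print(f"{bits:.1e} bits ≈ {gigabytes:.0f} GB")  # 1.0e+12 bits ≈ 125 GB
```
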
Ad serving
Online advertisers:
  o bid on webpages matching relevancy criteria
  o target certain user demographics
When a user visits a page, ad servers:
  o have some (imprecise) idea of the user's demographics (e.g., from click logs)
  o try to pick a set of ads that covers many user demographics
  o need to solve the SDC problem

Ad serving
Space constraint:
  o the set system consists of users, webpages and clicks
Dynamic:
  o each view of a page by a user is associated with a different user demographic
[Figure: Ads (A, B, C), Webpages, User visited pages]

Coverage Oracle
Offline stage:
  o Input: integer k; items X = {1, 2, ..., n}; sets I = {S1, ..., Sm}, each Si a subset of X
  o Output: data structure D
Dynamic stage:
  o Input: query Q, a subset of X
  o Output: using D, find a subset of I of size at most k that maximizes the number of covered items in Q

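The two stages map naturally onto an interface with a preprocessing step and a query method. In this sketch, the class names `CoverageOracle` and `FullStorageOracle` are hypothetical. The baseline keeps every set (up to m*n bits) and answers with the greedy rule restricted to Q, so it is correct but not succinct; the space-constrained oracles in this talk would replace `preprocess` with a small sketch.

```python
from abc import ABC, abstractmethod
from typing import Dict, FrozenSet, Hashable, List, Set


class CoverageOracle(ABC):
    """Two-stage interface: preprocess (X, I) once, then answer many queries Q."""

    def __init__(self, sets: Dict[Hashable, Set[int]], k: int) -> None:
        self.k = k
        self.preprocess(sets)  # offline stage: build the data structure D

    @abstractmethod
    def preprocess(self, sets: Dict[Hashable, Set[int]]) -> None:
        """Build D; the original sets must not be consulted afterwards."""

    @abstractmethod
    def query(self, q: Set[int]) -> List[Hashable]:
        """Dynamic stage: return at most k set names, using only D."""


class FullStorageOracle(CoverageOracle):
    """Baseline that keeps every set and runs greedy restricted to the query.
    Correct, but NOT succinct: D is as large as the set system itself."""

    def preprocess(self, sets: Dict[Hashable, Set[int]]) -> None:
        self.sets: Dict[Hashable, FrozenSet[int]] = {
            name: frozenset(items) for name, items in sets.items()}

    def query(self, q: Set[int]) -> List[Hashable]:
        covered: Set[int] = set()
        chosen: List[Hashable] = []
        for _ in range(min(self.k, len(self.sets))):
            best = max((n for n in self.sets if n not in chosen),
                       key=lambda n: len((self.sets[n] & q) - covered))
            gain = (self.sets[best] & q) - covered
            if not gain:
                break  # nothing left in q that any unchosen set can cover
            chosen.append(best)
            covered |= gain
        return chosen


oracle = FullStorageOracle({"A": {1, 2, 3}, "B": {3, 4}, "C": {4, 5}}, k=2)
print(oracle.query({3, 4, 5}))  # ['B', 'C']
print(oracle.query({1, 2}))     # ['A']
```

The same D serves different queries, which is the "dynamic" half of the problem; the "succinct" half is what the baseline ignores.
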
Outline
Covering & Recommendations
Succinct Dynamic Covering
Results:
  o Upper Bounds
  o Lower Bounds

Results
Given space limitations, we are interested in approximate solutions for SDC
Space vs approximation-ratio tradeoffs, with parameters:
  o ε ∈ [0, 1/2]
  o δ1, δ2: non-negative integers, not both zero

Simple Deterministic Algorithm
For every item, "remember" one set that contains it (break ties arbitrarily)
m/k approximation, linear space
Example, k=2: OPT = 16, APPROX = 8, ratio = 16/8 = 2
[Figure: two panels, each showing Sets and Items]

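A minimal sketch of this scheme. The slide specifies only the offline step; the query rule here (return the k sets remembered by the most query items) is an assumption. Under that rule the m/k bound follows directly: the coverable items of Q are split among at most m remembered sets, so the top k sets own at least a k/m fraction of them, and OPT can cover no more than that. The function names and the instance are hypothetical.

```python
from collections import Counter
from typing import Dict, Hashable, List, Set


def build_sketch(sets: Dict[Hashable, Set[int]]) -> Dict[int, Hashable]:
    """Offline: each item remembers ONE set that contains it (first seen wins,
    i.e. arbitrary tie-breaking). Space: one set name per item, O(n)."""
    owner: Dict[int, Hashable] = {}
    for name, items in sets.items():
        for x in items:
            owner.setdefault(x, name)
    return owner


def query_sketch(owner: Dict[int, Hashable], q: Set[int], k: int) -> List[Hashable]:
    """Dynamic (assumed rule): return the k sets remembered by the most query items."""
    counts = Counter(owner[x] for x in q if x in owner)
    return [name for name, _ in counts.most_common(k)]


# Hypothetical instance where arbitrary tie-breaking hurts: items 1-3 remember A,
# although B contains them too.
sets = {"A": {1, 2, 3}, "B": {1, 2, 3, 4}}
owner = build_sketch(sets)
print(query_sketch(owner, {1, 2, 3, 4}, k=1))  # ['A']: covers 3 items; B would cover 4
```
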
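Better Deterministic Algorithm
Find the unchosen set containing the most uncovered items; choose it. Iterate.
Similar to the previous algorithm (each item remembers one set), but the order in which sets are chosen is fixed by this rule
sqrt(n/k) approximation, linear space
Example, k=2: OPT = 16, APPROX = 16, ratio = 16/16 = 1
[Figure: two panels, each showing Sets and Items]
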
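One way to realize the fixed greedy order, as a sketch: each item remembers the set that first covered it in that order. The query rule (return the k sets owning the most query items) is an assumption, since the slide describes only the offline step; function names and the instance are hypothetical. On this instance, where arbitrary tie-breaking could let the smaller set A claim items 1 to 3, the greedy order hands every item to B.

```python
from collections import Counter
from typing import Dict, Hashable, List, Set


def build_greedy_sketch(sets: Dict[Hashable, Set[int]]) -> Dict[int, Hashable]:
    """Offline: repeatedly choose the unchosen set with the most uncovered items
    and let it own those items. Still one set name per item, O(n) space."""
    owner: Dict[int, Hashable] = {}
    covered: Set[int] = set()
    unchosen = dict(sets)
    while unchosen:
        best = max(unchosen, key=lambda name: len(unchosen[name] - covered))
        new_items = unchosen.pop(best) - covered
        if not new_items:
            break  # every remaining set is already fully covered
        for x in new_items:
            owner[x] = best
        covered |= new_items
    return owner


def query_sketch(owner: Dict[int, Hashable], q: Set[int], k: int) -> List[Hashable]:
    """Dynamic (assumed rule): return the k sets owning the most query items."""
    counts = Counter(owner[x] for x in q if x in owner)
    return [name for name, _ in counts.most_common(k)]


sets = {"A": {1, 2, 3}, "B": {1, 2, 3, 4}}
owner = build_greedy_sketch(sets)
print(query_sketch(owner, {1, 2, 3, 4}, k=1))  # ['B']: all 4 items
```
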
Randomized Algorithm
m^ε/sqrt(k) approximation, n·m^(1-2ε) space
Find an unchosen set containing at least n/(m^ε·sqrt(k)) uncovered items. Choose it and iterate.
For every remaining unchosen set, choose n/m^(2ε) items uniformly at random from the uncovered items

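A sketch of the offline stage only, since the slide does not describe how queries are answered. Assumptions: "uncovered items" for a remaining set means its own items not covered by a chosen (heavy) set, heavy sets are picked in input order, and items covered by heavy sets remember their set as in the deterministic schemes. The function name and instances are hypothetical. Space: at most n owner entries plus m samples of n/m^(2ε) items each, i.e. n + n·m^(1-2ε).

```python
import math
import random
from typing import Dict, Hashable, Set, Tuple


def build_randomized_sketch(
    sets: Dict[Hashable, Set[int]], n: int, eps: float, k: int, rng: random.Random
) -> Tuple[Dict[int, Hashable], Dict[Hashable, Set[int]]]:
    """Offline stage of the randomized scheme (query stage not sketched here).

    Returns (owner, samples): owner maps items covered by heavy sets to the heavy
    set that covered them; samples holds, for every remaining set, a uniform
    random sample of its uncovered items.
    """
    m = len(sets)
    threshold = n / (m ** eps * math.sqrt(k))  # n / (m^eps * sqrt(k))
    sample_size = int(n / m ** (2 * eps))       # n / m^(2 eps)

    owner: Dict[int, Hashable] = {}
    covered: Set[int] = set()
    unchosen = dict(sets)
    while True:
        # Any unchosen set that still has at least `threshold` uncovered items.
        heavy = next((name for name, items in unchosen.items()
                      if len(items - covered) >= threshold), None)
        if heavy is None:
            break
        new_items = unchosen.pop(heavy) - covered
        for x in new_items:
            owner[x] = heavy
        covered |= new_items

    samples: Dict[Hashable, Set[int]] = {}
    for name, items in unchosen.items():
        pool = sorted(items - covered)  # random.sample needs a sequence
        samples[name] = set(rng.sample(pool, min(sample_size, len(pool))))
    return owner, samples


rng = random.Random(0)
sets = {"A": set(range(0, 8)), "B": set(range(8, 16)),
        "C": {0, 1, 2, 8}, "D": {3, 9}}
owner, samples = build_randomized_sketch(sets, n=16, eps=0.5, k=4, rng=rng)
print(sorted(set(owner.values())), samples)  # ['A', 'B'] {'C': set(), 'D': set()}
```

With n=16, m=4, ε=1/2 and k=4 the threshold is 16/(2·2) = 4, so A and B are heavy and C, D have nothing left to sample.
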
Lower Bound
Holds for deterministic oracles only
Proof is somewhat involved and uses the probabilistic method
Matches the randomized upper bound
Open problem: a randomized lower bound

Related work
Distance oracles in graphs (Thorup and Zwick)
Set cover in the streaming model (sets are streams or items are streams)
Nearest neighbor (NN) search:
  o for k=1, SDC and NN are equivalent under dot-product similarity
  o there is no locality-sensitive hashing for the dot product (Charikar), so there is no hope for signature schemes for SDC

Summary
Introduced the Succinct Dynamic Covering problem
Applications in many real-world recommendation systems
Approximation ratio vs space tradeoffs:
  o deterministic and randomized upper bounds
  o deterministic lower bound

Thank you!