Dynamic Covering for Recommendation Systems Ioannis Antonellis Anish Das Sarma Shaddin Dughmi.

Presentation transcript:

Dynamic Covering for Recommendation Systems Ioannis Antonellis Anish Das Sarma Shaddin Dughmi

Outline
Covering & Recommendations
Succinct Dynamic Covering
Results:
  o Upper Bounds
  o Lower Bounds

Max k-cover Problem
Input:
  o integer k
  o items: X = {1, 2, ..., n}
  o sets: I = {S1, ..., Sm}, each Si a subset of X
Output: a subset of I of size at most k that maximizes the number of items covered
Example (sets A, B, C over the items shown in the figure):
  o k=1: solution A (size=3)
  o k=2: solutions A,C (size=4); A,B (size=4); B,C (size=4)

Max k-cover Problem
NP-hard (the decision version is NP-complete)
Greedy Algorithm
  o pick the set that covers the most not-yet-covered items
  o iterate
Approximation ratio: 1 - ((k-1)/k)^k >= 1 - 1/e ≈ 0.632
Example (sets A, B, C over the items shown in the figure):
  o k=1: solution A (size=3)
  o k=2: solutions A,C (size=4); A,B (size=4); B,C (size=4)
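
The greedy rule on this slide can be sketched in a few lines. This is a minimal illustration, not the authors' code: the function names and the three-set instance are hypothetical, and the instance is not the one drawn in the slide's figure.

```python
import math
from typing import Dict, List, Set


def greedy_max_k_cover(sets: Dict[str, Set[int]], k: int) -> List[str]:
    """Pick up to k sets, each time taking the one that covers the most new items."""
    covered: Set[int] = set()
    chosen: List[str] = []
    for _ in range(k):
        # Gain of a set = number of items it would newly cover.
        best = max(
            (name for name in sets if name not in chosen),
            key=lambda name: len(sets[name] - covered),
            default=None,
        )
        if best is None or not (sets[best] - covered):
            break  # no remaining set adds anything
        chosen.append(best)
        covered |= sets[best]
    return chosen


def greedy_guarantee(k: int) -> float:
    """Worst-case fraction of the optimum reached by greedy: 1 - (1 - 1/k)^k."""
    return 1.0 - (1.0 - 1.0 / k) ** k


# Hypothetical instance, for illustration only.
example = {"A": {1, 2, 3}, "B": {3, 4}, "C": {4, 5}}
picked = greedy_max_k_cover(example, k=2)
coverage = len(set().union(*(example[name] for name in picked)))
limit = 1.0 - 1.0 / math.e  # about 0.632; the guarantee approaches it from above
```

On this instance greedy first takes A, then C, because C adds two new items while B adds only one.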

Max k-cover in Recommendations
Alice views and rates movies
Netflix would like to recommend new movies for Alice to watch
Important problem:
  o Find users "similar" to Alice
  o Find users who cover a large set of Alice's likes and dislikes

Netflix example
Each user is identified by the subset of movies they like/viewed
  o Alice likes {A, B, C}
  o Fred likes {A, D}
  o Bob likes {B, E}
  o Ben likes {C, F}
  o Jim likes {A, B, F}
  o James likes {A, B, F}
Ben and Jim together cover all of Alice's likes
Fred, Bob and Ben together cover all of Alice's likes
Jim and James add the same value
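
The three claims on this slide can be checked directly with set operations. The data below is taken from the slide; the helper name `covered_likes` is an assumption made for this sketch.

```python
alice = {"A", "B", "C"}
users = {
    "Fred": {"A", "D"},
    "Bob": {"B", "E"},
    "Ben": {"C", "F"},
    "Jim": {"A", "B", "F"},
    "James": {"A", "B", "F"},
}


def covered_likes(query, group):
    """Return the items of `query` liked by at least one user in `group`."""
    liked = set()
    for name in group:
        liked |= users[name]
    return query & liked


two_users = covered_likes(alice, ["Ben", "Jim"])
three_users = covered_likes(alice, ["Fred", "Bob", "Ben"])
jim_only = covered_likes(alice, ["Jim"])
jim_and_james = covered_likes(alice, ["Jim", "James"])
```

Adding James to a group that already contains Jim leaves the covered items unchanged, which is the sense in which the two users "add the same value".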

k-covering vs nearest neighbor
For k=1, the two are equivalent (under dot product similarity)
Covering allows for diversifying recommendations
We want to cover all genres liked by a user
  o consider a user who likes 100 thriller movies and 10 comedies
  o we want "similar" users to cover as many movies as possible
  o k-nearest neighbor attempts to find many similar users, not to cover as many movies as possible
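
The k=1 equivalence follows because the dot product of two 0/1 indicator vectors equals the size of the intersection of the underlying sets. A small sketch, reusing two users from the Netflix example; the helper names are hypothetical.

```python
movies = ["A", "B", "C", "D", "E", "F"]


def indicator(items):
    """0/1 vector over `movies` marking membership in `items`."""
    return [1 if movie in items else 0 for movie in movies]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


query = {"A", "B", "C"}
candidates = {"Fred": {"A", "D"}, "Jim": {"A", "B", "F"}}

# Dot product of indicator vectors versus size of the covered part of the query.
dot_scores = {n: dot(indicator(query), indicator(s)) for n, s in candidates.items()}
cover_scores = {n: len(query & s) for n, s in candidates.items()}

best_by_dot = max(candidates, key=dot_scores.get)
best_by_cover = max(candidates, key=cover_scores.get)
```

For k > 1 the two objectives diverge: the nearest neighbors may all overlap the query in the same items, whereas covering rewards users who contribute new ones.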

oDesk example
Online labor marketplace
clients post jobs and/or invite contractors
contractors apply to jobs
Contractor recommendations for clients
  o Bob invites/interviews/hires contractors
  o find clients "similar" to Bob
Job recommendations for contractors
  o Alice applies to jobs
  o find contractors "similar" to Alice

Succinct Dynamic Covering (SDC)
Input:
  o integer k
  o items: X = {1, 2, ..., n}
  o sets: I = {S1, ..., Sm}, each Si a subset of X
  o query Q, a subset of X
Output: a subset of I of size at most k that maximizes the number of items of Q covered
However, we further constrain the problem:
  o space-constrained: statically preprocess (X, I) and store a small sketch, much smaller than O(mn)
  o dynamic: Q is not known a priori during sketch creation

Notice two twists
dynamic
  o for each user, the set of movies that needs to be covered is different
  o covering is not static
space-constrained
  o real-time, interactive recommendations
  o the whole Netflix graph is huge: 10 million users, 100k movies, and popular movies have been viewed many times
  o cannot process the entire graph at query time

Ad serving
Online advertisers
  o bid on webpages matching relevancy criteria
  o target certain user demographics
When a user visits a page, ad servers:
  o have some (imprecise) idea about the demographic of the user (e.g. from click logs)
  o try to pick a set of ads that cover many user demographics
  o need to solve the SDC problem

Ad serving
space-constrained:
  o set system consists of users, webpages and clicks
dynamic:
  o each user view of each page is associated with a different user demographic
[Figure: ads A, B, C; webpages; pages visited by the user]

Coverage Oracle
Offline stage:
  o Input: integer k; items X = {1, 2, ..., n}; sets I = {S1, ..., Sm}, each Si a subset of X
  o Output: data structure D
Dynamic stage:
  o Input: query Q, a subset of X
  o Output: use D to find a subset of I of size at most k that maximizes the number of items of Q covered

Outline
Covering & Recommendations
Succinct Dynamic Covering
Results:
  o Upper Bounds
  o Lower Bounds

Results
Given space limitations
  o interested in approximate solutions for SDC
Space vs. approximation ratio tradeoffs
ε ∈ [0, 1/2]
δ1, δ2: non-negative integers, not both zero

Simple Deterministic Algorithm
For every item, "remember" one set containing it
Break ties arbitrarily
m/k approximation, linear space
Example (figure), k=2: OPT = 16, APPROX = 8, ratio = 16/8 = 2
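
A minimal sketch of this oracle follows. The slide specifies only the offline rule (one remembered set per item). The query rule used here, returning the k remembered sets named by the most query items, is an assumption made for illustration, as are the function names and the toy instance.

```python
from collections import Counter
from typing import Dict, List, Set


def build_sketch(sets: Dict[str, Set[int]]) -> Dict[int, str]:
    """Offline stage: remember one containing set per item.

    Ties are broken by taking the first set name in sorted order.
    """
    sketch: Dict[int, str] = {}
    for name in sorted(sets):
        for item in sets[name]:
            sketch.setdefault(item, name)
    return sketch


def answer_query(sketch: Dict[int, str], query: Set[int], k: int) -> List[str]:
    """Dynamic stage (assumed rule): the k sets remembered by the most query items."""
    votes = Counter(sketch[item] for item in query if item in sketch)
    return [name for name, _ in votes.most_common(k)]


# Hypothetical instance, for illustration only.
toy_sets = {"S1": {1, 2, 3}, "S2": {3, 4}, "S3": {5}}
sketch = build_sketch(toy_sets)
answer = answer_query(sketch, {1, 2, 4}, k=1)
```

The sketch holds one entry per item, so its size is linear in n regardless of how many sets contain each item.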

Better Deterministic Algorithm
Find the unchosen set containing the most uncovered items. Iterate.
Similar to the previous algorithm, but the order is fixed
sqrt(n/k) approximation, linear space
Example (figure), k=2: OPT = 16, APPROX = 16, ratio = 16/16 = 1
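
One way to read this slide is that the offline stage runs greedy once over all items and each item remembers the set that first covered it in that fixed order. The sketch below follows that reading, which is an assumption rather than something the slide states; the function name and the instance are hypothetical.

```python
from typing import Dict, Set


def greedy_order_sketch(sets: Dict[str, Set[int]]) -> Dict[int, str]:
    """Offline stage: each item remembers the first set that covers it in greedy order."""
    sketch: Dict[int, str] = {}
    covered: Set[int] = set()
    unchosen = set(sets)
    while unchosen:
        # Unchosen set containing the most uncovered items; ties go to the smallest name.
        best = min(unchosen, key=lambda name: (-len(sets[name] - covered), name))
        gain = sets[best] - covered
        if not gain:
            break  # every remaining set is already fully covered
        for item in gain:
            sketch[item] = best
        covered |= gain
        unchosen.remove(best)
    return sketch


# Hypothetical instance, for illustration only.
toy_sets = {"S1": {1, 2}, "S2": {2, 3, 4}, "S3": {4, 5}}
greedy_sketch = greedy_order_sketch(toy_sets)
```

Here item 2 is assigned to S2, the largest set, whereas an arbitrary tie-break by name would have assigned it to S1. The storage is still one entry per item.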

Randomized Algorithm
m^ε/sqrt(k) approximation
n·m^(1-2ε) space
Find an unchosen set containing at least n/(m^ε sqrt(k)) items. Choose it and iterate.
For every remaining unchosen set, choose n/m^(2ε) items uniformly at random from the uncovered items
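
A short worked calculation of the space bound, assuming the garbled exponents on this slide read as $m^{\varepsilon}$, $m^{2\varepsilon}$ and $n\,m^{1-2\varepsilon}$ (a reconstruction, not confirmed by the slide itself). Storing a sample of $n/m^{2\varepsilon}$ items for each of at most $m$ sets accounts for the stated space, and the endpoints of $\varepsilon \in [0, 1/2]$ interpolate between storing everything and linear space:

```latex
\[
  \underbrace{m}_{\text{sets}} \cdot \underbrace{\frac{n}{m^{2\varepsilon}}}_{\text{samples per set}}
  \;=\; n\, m^{1-2\varepsilon},
  \qquad
  \begin{array}{l@{\quad}l}
    \varepsilon = 0:   & n\, m^{1} = nm \\[2pt]
    \varepsilon = 1/4: & n\, m^{1/2} = n\sqrt{m} \\[2pt]
    \varepsilon = 1/2: & n\, m^{0} = n
  \end{array}
\]
```

Under this reading, shrinking the sketch from $nm$ to $n$ costs a factor of up to $\sqrt{m}$ in the approximation ratio.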

Lower Bound
Holds for deterministic oracles only
Proof is somewhat involved and uses the probabilistic method
Matches the randomized upper bound
Open problem: randomized lower bound

Related work
Distance oracles in graphs (Thorup and Zwick)
Set cover in the streaming model (sets are streams or items are streams)
Nearest neighbor (NN) search:
  o for k=1, SDC and NN are equivalent using dot product similarity
  o no locality-sensitive hashing for dot product (Charikar), so no hope for signature schemes for SDC

Summary
Introduced the Succinct Dynamic Covering problem
Applications in many real-world recommendation systems
Approximation ratio and space tradeoffs
Deterministic and randomized upper bounds
Deterministic lower bound

Thank you!