SIGMOD 2006
Context-sensitive ranking
Rakesh Agrawal, Microsoft Search Labs
Ralf Rantzau, IBM Silicon Valley Lab
Evimaria Terzi, University of Helsinki & Microsoft Search Labs
Work done largely while the authors were at IBM Almaden

The curse of abundance: too much data and too many answers
- Query shopping.com for a digital camera
- Query Froogle for a tennis racquet

Ranking query results
- Algorithms for ranking web pages have been quite successful [BP’98, Kleinberg98]
  - Key idea: exploit the graph of hyperlinks between web pages
- Can we take a similar approach for ranking database query results?
  - This needs a graph structure that accurately describes the relationships between tuples in the database
  - Past attempts: graphs derived from schema and key constraints, or from queries [BHP’04, BHNCS’02, GMT’04]
- But are these graphs natural, or do they reflect design optimization decisions?

Using preferences to induce a graph of tuples

      Genre (G)   Actor (A)   Title (T)              Language (L)
t1    Drama       Kidman      Birth                  English
t2    Drama       Cruz        Vanilla Sky            English
t3    Sci-Fi      Reeves      Matrix                 English
t4    Comedy      Cruz        Sin noticias de Dios   Spanish
t5    Comedy      Aniston     Rumor has it…          English

Preferences and the tuple orderings they induce:
- Drama > Sci-Fi: t1 > t3 and t2 > t3
- Kidman > Reeves: t1 > t3
- Matrix > Birth: t3 > t1
[Figure: preference graph over t1, t2, t3]
[Preferences are predicates of the form “X=x1 > X=x2”]

Augment preferences with context

      Genre (G)   Actor (A)   Title (T)              Language (L)
t1    Drama       Kidman      Birth                  English
t2    Drama       Cruz        Vanilla Sky            English
t3    Sci-Fi      Reeves      Matrix                 English
t4    Comedy      Cruz        Sin noticias de Dios   Spanish
t5    Comedy      Aniston     Rumor has it           English

- In general (*): English > Spanish | *
- But in the context of comedies: Spanish > English | Comedies
[Contexts are predicates of the form “Y=a”]

Preferences in the past
- Preferences expressed via a numeric score [AW’00, KI’04, KI’05]
  - Nicole Kidman: 0.9
  - Penelope Cruz: 0.4
  - Dramas: 0.8
  - Comedies: 0.3
- Pairwise preferences in the ML literature [CSS’97]
- Preferences as partial orders [Kießling’02]
- Preferences as first-order formulas [Chomicki’03]

Contextual preferences

      Genre (G)   Actor (A)   Title (T)              Language (L)
t1    Drama       Kidman      Birth                  English
t2    Drama       Cruz        Vanilla Sky            English
t3    Sci-Fi      Reeves      Matrix                 English
t4    Comedy      Cruz        Sin noticias de Dios   Spanish
t5    Comedy      Aniston     Rumor has it…          English

Contextual preferences and the tuple orderings they induce in the context L=English (En):
- P1 = {G=Drama > G=Sci-Fi | L=English}: t1 > t3 | En and t2 > t3 | En
- P2 = {A=Kidman > A=Reeves | L=English}: t1 > t3 | En
- P3 = {T=Matrix > T=Birth | L=English}: t3 > t1 | En
[Figure: the preference graph over t1, t2, t3 and its weighted version, with edge weights 2/3, 1/3, 1 and 1/2]
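The step from contextual preferences to a weighted tuple graph can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes that the weight of an edge u → v is the fraction of applicable preferences between u and v that favor u, which reproduces the 2/3, 1/3 and 1 weights for the pairs covered by P1 to P3. The names `MOVIES`, `PREFS` and `preference_graph` are hypothetical.

```python
from collections import defaultdict
from fractions import Fraction

# Toy movie table from the slide: tuple id -> attribute values.
MOVIES = {
    "t1": {"G": "Drama", "A": "Kidman", "T": "Birth", "L": "English"},
    "t2": {"G": "Drama", "A": "Cruz", "T": "Vanilla Sky", "L": "English"},
    "t3": {"G": "Sci-Fi", "A": "Reeves", "T": "Matrix", "L": "English"},
    "t4": {"G": "Comedy", "A": "Cruz", "T": "Sin noticias de Dios", "L": "Spanish"},
    "t5": {"G": "Comedy", "A": "Aniston", "T": "Rumor has it", "L": "English"},
}

# Each preference: (attribute, preferred value, other value, context attribute, context value).
PREFS = [
    ("G", "Drama", "Sci-Fi", "L", "English"),   # P1
    ("A", "Kidman", "Reeves", "L", "English"),  # P2
    ("T", "Matrix", "Birth", "L", "English"),   # P3
]


def preference_graph(rows, prefs):
    """Return {(u, v): weight}, where u is preferred over v.

    Assumption: weight = votes for u over v / all votes between u and v.
    """
    votes = defaultdict(int)
    for attr, better, worse, ctx_attr, ctx_val in prefs:
        # Only tuples that satisfy the context take part in this preference.
        in_ctx = [t for t, r in rows.items() if r[ctx_attr] == ctx_val]
        for u in in_ctx:
            for v in in_ctx:
                if rows[u][attr] == better and rows[v][attr] == worse:
                    votes[(u, v)] += 1
    graph = {}
    for (u, v), n in votes.items():
        total = n + votes.get((v, u), 0)
        graph[(u, v)] = Fraction(n, total)
    return graph


if __name__ == "__main__":
    for edge, weight in sorted(preference_graph(MOVIES, PREFS).items()):
        print(edge, weight)
```

Normalizing by the number of votes per pair is one simple way to let contradictory preferences (here P1 and P2 against P3) coexist in a single graph.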

Obtaining preferences
- Users provide preferences voluntarily, in the same way users rate products and services
- Preferences can be collected automatically via browser plug-ins or taskbars (with user permission)
- Preferences can be learned from past data
- Preferences can also be learned from the data (e.g., using association-rule mining)
- Preferences are obtained from various sources and can contain cycles and contradictions, which are resolved democratically

Overview
Question: How can users' preferences be incorporated when ranking query results?
Approach:
- Accumulate contextual preferences of the form i1 > i2 | X
- Order the answer tuples such that the preferences are maximally respected, giving higher weight to preferences whose contexts match the query more closely

Issues
- How to define similarity between a query and a context?
  - See the paper for the distance function
- Can we create orders in an offline step and use their information at query time?
- Should we save all orders?
- How to combine the saved orders while answering queries?

Problem decomposition
[Problem 1] Ordering: for every context X, build an order τ_X
[Problem 2] ClusterOrders: given a set of orders T_m = {τ_1, …, τ_m}, find ℓ representative orders T_ℓ
- Assign each input order to its closest representative
- Associate with each representative σ a set of contexts Y_σ
[Problem 3] Querying: provide top-k results for the query Q
- respecting the representative orders, and
- weighting that respect according to the similarity between the query and the contexts

Problem 1: The Ordering problem
For a given context X and a set of preferences P_X over the tuples D = {t_1, …, t_n}, find an ordering τ of D that agrees with the preferences as much as possible
[Figure: weighted preference graph over t1, t2, t3 with edge weights 1/2, 2/3, 1/3 and 1]
Example: for the order t2, t1, t3, Agree = 1 + 1/2 + 2/3 = 13/6
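The agreement value in the example can be checked with a short sketch. It assumes that the agreement of an order is the total weight of the preference edges that point from an earlier tuple to a later one, and that the weight-1/2 edge runs from t2 to t1; both are assumptions that happen to reproduce the slide's 13/6. `EDGES`, `agreement` and `best_order` are hypothetical names, and the brute-force search is only workable for tiny inputs.

```python
from fractions import Fraction
from itertools import permutations

# Weighted preference edges (u, v): u is preferred over v.
# The direction of the 1/2 edge is an assumption.
EDGES = {
    ("t2", "t3"): Fraction(1),
    ("t2", "t1"): Fraction(1, 2),
    ("t1", "t3"): Fraction(2, 3),
    ("t3", "t1"): Fraction(1, 3),
}


def agreement(order, edges):
    """Total weight of edges (u, v) for which u is placed before v."""
    pos = {t: i for i, t in enumerate(order)}
    return sum(w for (u, v), w in edges.items() if pos[u] < pos[v])


def best_order(items, edges):
    """Exhaustive search over all permutations (exponential, toy sizes only)."""
    return max(permutations(items), key=lambda p: agreement(p, edges))


if __name__ == "__main__":
    print(agreement(("t2", "t1", "t3"), EDGES))    # 13/6
    print(best_order(["t1", "t2", "t3"], EDGES))
```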

Problem 2: The ClusterOrders problem
Given m orders T_m = {τ_1, …, τ_m}, each corresponding to a single context X_i, find ℓ representative orders T_ℓ such that cost(T_ℓ) is minimized
[The cost definitions are not preserved in this transcript]
We use the standard Spearman footrule and Kendall tau distances for comparing orderings
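For reference, the two distances named above can be sketched as follows, using their standard definitions for two orderings of the same items: Spearman footrule sums the absolute position differences, and Kendall tau counts the pairs that appear in opposite relative order. The function names are hypothetical.

```python
from itertools import combinations


def footrule(a, b):
    """Spearman footrule: sum over items of |position in a - position in b|."""
    pos_b = {x: i for i, x in enumerate(b)}
    return sum(abs(i - pos_b[x]) for i, x in enumerate(a))


def kendall_tau(a, b):
    """Kendall tau: number of item pairs ordered differently in a and b."""
    pos_b = {x: i for i, x in enumerate(b)}
    # combinations(a, 2) yields (x, y) with x before y in a.
    return sum(1 for x, y in combinations(a, 2) if pos_b[x] > pos_b[y])


if __name__ == "__main__":
    print(footrule("abcdef", "fedcba"), kendall_tau("abcdef", "fedcba"))
    print(footrule("abcdef", "bacdef"), kendall_tau("abcdef", "bacdef"))
```

Both functions accept any sequence of hashable items, so strings such as "abcdef" work as compact orderings.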

The ClusterOrders problem: Example
[Figure: input orders over the items a to f, such as a b c d e f and f e d c b a, grouped around two representative orders τ_1 and τ_2]
Cost(τ_1) = 2
Cost(τ_2) = 1
Cost(τ_1, τ_2) = 2 + 1 = 3

Problem 3: The Querying problem
Provide top-k results for query Q, respecting the representative orders and weighting that respect using the corresponding sets of contexts

Constructing orders from preferences [Problem 1]
- The problem is NP-hard, so heuristics are needed
- PickPerm algorithm: pick a random permutation, reverse it, and keep the better of the two
[Figure: weighted preference graph over t1, t2, t3 with edge weights 1/2, 2/3, 1/3 and 1]
Example: the permutation t2, t3, t1 has A = 11/6 and its reverse t1, t3, t2 has A = 5/6, so PickPerm returns t2, t3, t1
[Inspired by the 2-approximation algorithm for finding the maximum acyclic subgraph of a given graph]
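PickPerm as described above can be sketched in a few lines. The sketch assumes that agreement is the total weight of edges pointing from an earlier to a later tuple, and it assumes a direction for the weight-1/2 edge, so its agreement values need not match every figure on the slide. `EDGES`, `agreement` and `pick_perm` are hypothetical names.

```python
import random
from fractions import Fraction

# Weighted preference edges (u, v): u is preferred over v.
# The direction of the 1/2 edge is an assumption.
EDGES = {
    ("t2", "t3"): Fraction(1),
    ("t2", "t1"): Fraction(1, 2),
    ("t1", "t3"): Fraction(2, 3),
    ("t3", "t1"): Fraction(1, 3),
}


def agreement(order, edges):
    """Total weight of edges (u, v) for which u is placed before v."""
    pos = {t: i for i, t in enumerate(order)}
    return sum(w for (u, v), w in edges.items() if pos[u] < pos[v])


def pick_perm(items, edges, rng=None):
    """Draw a random permutation and return it or its reverse, whichever agrees more."""
    rng = rng or random.Random()
    perm = list(items)
    rng.shuffle(perm)
    rev = perm[::-1]
    return max(perm, rev, key=lambda p: agreement(p, edges))


if __name__ == "__main__":
    order = pick_perm(["t1", "t2", "t3"], EDGES, random.Random(0))
    print(order, agreement(order, EDGES))
```

Every edge agrees with exactly one of the two candidates, so the better one always collects at least half of the total edge weight. That is the reasoning behind the factor of 2.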

Greedy algorithm [CSS’97]
- At the i-th iteration, pick the i-th element of the output permutation
- At each iteration, pick the tuple t with the highest s_val(t) = OutDegree(t) - InDegree(t) in the remaining preference graph
[Figure: worked example on the preference graph over t1, t2, t3; the s_val values shown include 1, -4/3, 1/3 and -1/3, and the resulting order is t2, t1, t3]
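A minimal sketch of this greedy procedure follows. It assumes that OutDegree and InDegree are weighted degrees, and it uses only three edges of the example graph (an assumption; with them the first-round values are 1/3, 1 and -4/3). Ties are broken by tuple name, which is also an assumption. `EDGES` and `greedy_order` are hypothetical names.

```python
from fractions import Fraction

# Weighted preference edges (u, v): u is preferred over v (assumed subset of the example).
EDGES = {
    ("t1", "t3"): Fraction(2, 3),
    ("t3", "t1"): Fraction(1, 3),
    ("t2", "t3"): Fraction(1),
}


def greedy_order(items, edges):
    """Repeatedly output the tuple with the highest s_val in the remaining graph."""
    remaining = set(items)
    order = []
    while remaining:
        def s_val(t):
            # Weighted out-degree minus weighted in-degree, restricted to remaining tuples.
            out_w = sum(w for (u, v), w in edges.items() if u == t and v in remaining)
            in_w = sum(w for (u, v), w in edges.items() if v == t and u in remaining)
            return out_w - in_w

        best = max(sorted(remaining), key=s_val)  # sorted() makes tie-breaking deterministic
        order.append(best)
        remaining.remove(best)
    return order


if __name__ == "__main__":
    print(greedy_order(["t1", "t2", "t3"], EDGES))
```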

MC algorithm
- Reverse the directions of the edges of the preference graph
- Run a random walk (with random restarts) on the reversed graph
- Rank according to the stationary distribution
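One way to realize these three steps is power iteration, sketched below. The restart probability of 0.15, the uniform restart distribution, and the uniform jump from tuples without outgoing edges are all assumptions; the chain used in the paper may differ. The toy graph is a hypothetical acyclic one (a preferred over b and c, b preferred over c), not the slide's example.

```python
def mc_order(items, edges, restart=0.15, iters=200):
    """Rank items by the stationary distribution of a random walk on the reversed graph."""
    items = list(items)
    n = len(items)
    # Reverse each edge (u preferred over v) so the walker moves from v to u,
    # i.e. from a less preferred tuple towards a more preferred one.
    out = {t: {} for t in items}
    for (u, v), w in edges.items():
        out[v][u] = out[v].get(u, 0.0) + float(w)

    p = {t: 1.0 / n for t in items}
    for _ in range(iters):
        nxt = {t: restart / n for t in items}
        for t in items:
            total = sum(out[t].values())
            if total == 0:
                # No outgoing edge: jump uniformly (assumption).
                for s in items:
                    nxt[s] += (1 - restart) * p[t] / n
            else:
                for s, w in out[t].items():
                    nxt[s] += (1 - restart) * p[t] * w / total
        p = nxt
    return sorted(items, key=lambda t: -p[t]), p


if __name__ == "__main__":
    toy_edges = {("a", "b"): 1, ("b", "c"): 1, ("a", "c"): 1}
    ranking, dist = mc_order(["a", "b", "c"], toy_edges)
    print(ranking, dist)
```

Because probability mass flows along reversed edges, tuples that many others are dominated by accumulate the most mass and are ranked first.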

Performance
Data generation
- Fix an order on the tuples
- Generate preferences that respect this order
- p_c: the probability that a preference is generated between a pair of tuples
Observations
- For small p_c values more orders are compatible, and all algorithms perform well
- For large p_c values MC and Greedy find the optimal order
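The data generation described above can be sketched as follows. The sketch assumes that each pair of tuples independently yields one preference with probability p_c, always directed according to the fixed order; preference weights are left out. `generate_preferences` is a hypothetical name.

```python
import random
from itertools import combinations


def generate_preferences(order, p_c, rng=None):
    """Return pairs (u, v), u before v in `order`, each kept with probability p_c."""
    rng = rng or random.Random()
    # combinations() yields (u, v) with u earlier than v, so every pair respects the order.
    return [(u, v) for u, v in combinations(order, 2) if rng.random() < p_c]


if __name__ == "__main__":
    hidden_order = ["t1", "t2", "t3", "t4", "t5"]
    print(generate_preferences(hidden_order, 0.3, random.Random(42)))
```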

Reducing the number of orders [Problem 2]
- Finding ℓ representative orders is NP-hard
- Restricting the ℓ representatives to the input orders gives a good approximation, but the problem is still hard
- Hence the need for heuristics
- Greedy algorithm
  - Always pick the order (from the input) that introduces the minimum cost
- Furthest algorithm
  - Start by picking a random order τ and add it to the output set of orders T_ℓ
  - For ℓ-1 iterations, pick the order that is furthest away from the orders already in T_ℓ
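The Furthest heuristic can be sketched as below. The sketch reads "furthest away from the orders already in T_ℓ" as maximizing the distance to the nearest already-chosen representative, which is an assumption, and it uses Kendall tau as the distance. The sample orders are hypothetical.

```python
import random
from itertools import combinations


def kendall_tau(a, b):
    """Number of item pairs ordered differently in a and b."""
    pos_b = {x: i for i, x in enumerate(b)}
    return sum(1 for x, y in combinations(a, 2) if pos_b[x] > pos_b[y])


def furthest(orders, ell, dist, rng=None):
    """Pick ell representatives: a random first one, then repeatedly the furthest order."""
    rng = rng or random.Random()
    reps = [rng.choice(orders)]
    while len(reps) < ell:
        # Distance of an order to the chosen set = distance to its nearest representative.
        nxt = max(orders, key=lambda o: min(dist(o, r) for r in reps))
        reps.append(nxt)
    return reps


if __name__ == "__main__":
    orders = ["abcdef", "abcdfe", "bacdef", "fedcba", "fedcab"]
    print(furthest(orders, 2, kendall_tau, random.Random(0)))
```

With two well-separated groups of orders, the second pick always comes from the group that the random first pick did not cover.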

Refine the representative orders
Given the set of representative orders T_ℓ, assign each input order τ ∈ T_m to its closest representative in T_ℓ (this partitions T_m into ℓ partitions)*
- Discrete refinement: for each partition, pick the best representative from among the orders in the partition
- Continuous refinement [DKNS’01]: for each partition, find the best representative for the partition
*Notice the resemblance between this problem and the Catalog Segmentation problem of [KPR’04]
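The assignment step and the discrete refinement can be sketched as follows. The sketch assumes that the best representative of a partition is the member with the smallest total distance to the other members, and that the clustering cost is the sum of distances from each input order to its closest representative; both are assumptions, since the cost definition is not given here. All names and sample orders are hypothetical.

```python
from itertools import combinations


def kendall_tau(a, b):
    """Number of item pairs ordered differently in a and b."""
    pos_b = {x: i for i, x in enumerate(b)}
    return sum(1 for x, y in combinations(a, 2) if pos_b[x] > pos_b[y])


def assign(orders, reps, dist):
    """Partition the input orders by their closest representative (reps must be distinct)."""
    parts = {r: [] for r in reps}
    for o in orders:
        parts[min(reps, key=lambda r: dist(o, r))].append(o)
    return parts


def discrete_refine(orders, reps, dist):
    """Replace each representative by the member that is closest to its whole partition."""
    refined = []
    for members in assign(orders, reps, dist).values():
        if members:
            refined.append(min(members, key=lambda c: sum(dist(c, m) for m in members)))
    return refined


def clustering_cost(orders, reps, dist):
    """Assumed cost: sum of distances from each order to its closest representative."""
    return sum(min(dist(o, r) for r in reps) for o in orders)


if __name__ == "__main__":
    orders = ["abcdef", "abcdfe", "bacdef", "fedcba", "fedcab"]
    start = ["bacdef", "fedcab"]
    better = discrete_refine(orders, start, kendall_tau)
    print(better, clustering_cost(orders, start, kendall_tau),
          clustering_cost(orders, better, kendall_tau))
```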

Performance
Data generation
- Fix ℓ underlying orders T
- Generate other orders from T by picking an order in T and adding noise (swaps)
- Compute the cost of the solution with respect to the ground truth
Observations
- Without refinements: Greedy performs consistently better than Furthest
- With refinements: both algorithms are equally good
- The groupings are equivalent

Problem 3: The Querying problem
Use a variation of the TA algorithms [FLN’02, FKS’03]
- Assume k = 2 and a query Q such that sim(Q, Y1) = 0.5, sim(Q, Y2) = 0.3, sim(Q, Y3) = 0.1

Ranked lists (tuple, score), one per representative order, with the similarity weight of each list:

           Y1, T1    Y2, T2    Y3, T3
weight     0.5       0.3       0.1
1st        t1  5     t2  5     t4  5
2nd        t2  4     t3  4     t3  4
3rd        t3  3     t1  3     t1  3
4th        t4  2     t4  2     t5  2
5th        t5  1     t5  1     t2  1

Problem 3: The Querying problem
1. At each sequential access
   a. Set the threshold TH to be the aggregate of the scores seen in this access
TH = 0.5*5 + 0.3*5 + 0.1*5 = 4.5

Problem 3: The Querying problem
1. At each sequential access
   b. Do random accesses and compute the score of the objects seen
TH = 0.5*5 + 0.3*5 + 0.1*5 = 4.5
Scores of the objects seen: t1 = 3.7, t2 = 3.6, t4 = 2.1 (for example, t1 = 0.5*5 + 0.3*3 + 0.1*3 = 3.7)

Problem 3: The Querying problem
1. At each sequential access
   c. Maintain a list of the top-k objects seen so far
TH = 0.5*5 + 0.3*5 + 0.1*5 = 4.5
Top-2 so far: t1 = 3.7, t2 = 3.6

Problem 3: The Querying problem
1. At each sequential access
   d. When the scores of the top-k objects are greater than or equal to the threshold, stop
TH = 0.5*4 + 0.3*4 + 0.1*4 = 3.6
Top-2: t1 = 3.7, t2 = 3.6; both are at least TH, so the algorithm stops after the second sequential access
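The walkthrough in steps a to d can be reproduced with the sketch below, which follows the general threshold-algorithm pattern rather than the exact variation used by the authors. It assumes that the aggregate score is the similarity-weighted sum of a tuple's scores across the lists. `LISTS` and `threshold_topk` are hypothetical names, and exact fractions are used so that the comparison with the threshold is not affected by floating-point rounding.

```python
import heapq
from fractions import Fraction

# (similarity weight, ranked list of (tuple, score)) for each representative order.
LISTS = [
    (Fraction(5, 10), [("t1", 5), ("t2", 4), ("t3", 3), ("t4", 2), ("t5", 1)]),
    (Fraction(3, 10), [("t2", 5), ("t3", 4), ("t1", 3), ("t4", 2), ("t5", 1)]),
    (Fraction(1, 10), [("t4", 5), ("t3", 4), ("t1", 3), ("t5", 2), ("t2", 1)]),
]


def threshold_topk(lists, k):
    """Return (top-k list of (tuple, score), number of sequential accesses used)."""
    weights = [w for w, _ in lists]
    lookup = [dict(entries) for _, entries in lists]  # random-access tables
    depth = len(lists[0][1])
    scores = {}
    for row in range(depth):
        # a. Threshold: aggregate of the scores seen at this depth.
        threshold = sum(w * entries[row][1] for w, entries in lists)
        # b. Random accesses: full score of every newly seen object.
        for _, entries in lists:
            obj = entries[row][0]
            if obj not in scores:
                scores[obj] = sum(w * table[obj] for w, table in zip(weights, lookup))
        # c. Current top-k.
        top = heapq.nlargest(k, scores.items(), key=lambda kv: kv[1])
        # d. Stop once all of the top-k reach the threshold.
        if len(top) == k and all(s >= threshold for _, s in top):
            return top, row + 1
    return heapq.nlargest(k, scores.items(), key=lambda kv: kv[1]), depth


if __name__ == "__main__":
    top, accesses = threshold_topk(LISTS, 2)
    print([(name, float(score)) for name, score in top], accesses)
```

On the lists above, the first pass gives a threshold of 4.5 and the second a threshold of 3.6, at which point t1 (3.7) and t2 (3.6) are returned without reading the remaining rows.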

Accuracy of top-k results
IMDB dataset
- Automatically generate preferences via association-rule mining: ‘A1=a’ > ‘A1=b’ | X if conf(X → a) > conf(X → b)
- Sol_k: top-k results obtained after clustering
- G_k: top-k results without clustering
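The rule for mining preferences can be illustrated with a small sketch. It computes conf(X → a) as the fraction of rows satisfying context X that also have the value a, which is the usual definition of association-rule confidence. The rows below are made-up toy data, not the IMDB dataset, and the function names are hypothetical.

```python
def confidence(rows, context, attr, value):
    """conf(context -> attr=value): share of rows matching the context that have the value."""
    in_ctx = [r for r in rows if all(r[k] == v for k, v in context.items())]
    if not in_ctx:
        return 0.0
    return sum(1 for r in in_ctx if r[attr] == value) / len(in_ctx)


def mined_preference(rows, context, attr, a, b):
    """Return (preferred, other) for the given context, or None when the confidences tie."""
    conf_a = confidence(rows, context, attr, a)
    conf_b = confidence(rows, context, attr, b)
    if conf_a > conf_b:
        return (a, b)
    if conf_b > conf_a:
        return (b, a)
    return None


if __name__ == "__main__":
    # Hypothetical toy rows: genre G and language L.
    rows = (
        [{"G": "Comedy", "L": "Spanish"}] * 3
        + [{"G": "Drama", "L": "Spanish"}]
        + [{"G": "Comedy", "L": "English"}]
        + [{"G": "Drama", "L": "English"}] * 3
    )
    print(mined_preference(rows, {"L": "Spanish"}, "G", "Comedy", "Drama"))
    print(mined_preference(rows, {"L": "English"}, "G", "Comedy", "Drama"))
```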

Accuracy of top-k results

Recap
- Notion of contextual preferences
- Use of contextual preferences to order database results
- Use of association rules to obtain contextual preferences
- Experimental validation of the effectiveness of the proposed techniques using both synthetic and real data

Conclusions and future work
- The framework of contextual preferences is both intuitive and practical
- The framework is easily extended to accommodate top-k lists and bucket orders
- Scalability of the algorithms needs further investigation

Questions?
