Graph Query Reformulation with Diversity – Davide Mottin, Francesco Bonchi, Francesco Gullo 1 Graph Query Reformulation with Diversity Davide Mottin, University of Trento Francesco Bonchi, Yahoo Labs - Francesco Gullo, Yahoo Labs
Graph Query Reformulation with Diversity – Davide Mottin, Francesco Bonchi, Francesco Gullo 2 Issues with Pattern Search O OHO S Query 510 matches Too many matches Results are not grouped
Graph Query Reformulation with Diversity – Davide Mottin, Francesco Bonchi, Francesco Gullo 3 Solution: Discovering Specializations O OHO S 510 matches O OHOH O S CH 3 O OHO S SH O OHO S H3CH3C O O S CH 3 O OHOH O S H S 448 matches46 matches114 matches 382 matches 46 matches Specializations
Graph Query Reformulation with Diversity – Davide Mottin, Francesco Bonchi, Francesco Gullo 4 Applications Finding groups of molecules having a particular reagent Analyze a set of proteins to find diseases Workflow optimization Anomaly detection in a network 3D shape search with similar properties
Graph Query Reformulation with Diversity – Davide Mottin, Francesco Bonchi, Francesco Gullo 5 Dealing with Specializations in Web and Relational Data Faceted Search present aspects of the results [Roy08] Query reformulation Modify some of the query conditions In structured databases [Mishra09] In web search [Dang10] First Study of Problem on GRAPHS
Graph Query Reformulation with Diversity – Davide Mottin, Francesco Bonchi, Francesco Gullo 6 Graph Query Reformulation Results Query Specializations: query supergraphs … Exponential number of specializations
Graph Query Reformulation with Diversity – Davide Mottin, Francesco Bonchi, Francesco Gullo 7 Challenges The number of reformulation is exponential Quantify the interestingness of a reformulation Finding query specializations is NP-complete
Graph Query Reformulation with Diversity – Davide Mottin, Francesco Bonchi, Francesco Gullo 8 A Naïve Approach: k-most frequent super-patterns Query 480 matches 450 matches 100 matches Super Patterns 30 matches 420 matches Until k patterns are found: - Retrieve the most frequent super-pattern Until k patterns are found: - Retrieve the most frequent super-pattern Frequent ≠ Interesting !
Graph Query Reformulation with Diversity – Davide Mottin, Francesco Bonchi, Francesco Gullo 9 Our Approach Graph Query Reformulation with Diversity Finds k meaningful specializations efficiently
Graph Query Reformulation with Diversity – Davide Mottin, Francesco Bonchi, Francesco Gullo 10 Finding Meaningful Specializations Results Query Diversity Find k meaningful specializations: 1.Span all the results 2.Present different aspects of the results ? Find k meaningful specializations: 1.Span all the results 2.Present different aspects of the results ?
Graph Query Reformulation with Diversity – Davide Mottin, Francesco Bonchi, Francesco Gullo 11 Diversity Matters Results Query Objective function f(Q) λ = 1 Non optimal: f({Q 1 ’,Q 2 ’}) = 7 Optimal: f({Q 3 ’,Q 4 ’}) = 8 λ = 1 Non optimal: f({Q 1 ’,Q 2 ’}) = 7 Optimal: f({Q 3 ’,Q 4 ’}) = 8
Graph Query Reformulation with Diversity – Davide Mottin, Francesco Bonchi, Francesco Gullo 12 Problem Graph Query Reformulation with Diversity 12 Theorem (NP-hardness) The problem reduces to MAX-SUM Diversification Problem, so it is NP-hard Theorem (NP-hardness) The problem reduces to MAX-SUM Diversification Problem, so it is NP-hard
Graph Query Reformulation with Diversity – Davide Mottin, Francesco Bonchi, Francesco Gullo 13 Solution: Greedy Algorithm 13 Greedy While k-specializations are not found 1.Find the specialization leading to the maximum increment of the objective function (marginal gain) 2.Add the specialization to the results Theorem The algorithm is a ½-approximation Theorem The algorithm is a ½-approximation Finding the maximum gain is #P-complete [Valiant79] Solution Fast_MMPG: Branch and bound algorithm to efficiently find the specialization with the maximum marginal gain
Graph Query Reformulation with Diversity – Davide Mottin, Francesco Bonchi, Francesco Gullo 14 The multiplicity vector Results Output set of specializations
Graph Query Reformulation with Diversity – Davide Mottin, Francesco Bonchi, Francesco Gullo 15 Upper bound on the Marginal gain Lemma The marginal gain increases if the multiplicity of the considered item is where |Q| is the number of reformulations in the reformulated set constructed so far. Lemma The marginal gain increases if the multiplicity of the considered item is where |Q| is the number of reformulations in the reformulated set constructed so far. Upper bound : is the value of the objective function considering only results with multiplicity Theorem
Graph Query Reformulation with Diversity – Davide Mottin, Francesco Bonchi, Francesco Gullo 16 Upper bound Results Output set of specializations 12111
Graph Query Reformulation with Diversity – Davide Mottin, Francesco Bonchi, Francesco Gullo 17 Until the reformulation with the maximum upper bound and marginal gain is not found 1.Expand the reformulation with the max upper bound 2.Prune Reformulations with marginal gain smaller than the upper bound so far Until the reformulation with the maximum upper bound and marginal gain is not found 1.Expand the reformulation with the max upper bound 2.Prune Reformulations with marginal gain smaller than the upper bound so far The Fast_MMPG Algorithm upper bound marginal gain
Graph Query Reformulation with Diversity – Davide Mottin, Francesco Bonchi, Francesco Gullo 18 Experimental Setup 18 Datasets: AIDS: 10k chemical compounds Financial: 17k transaction workflows Web: 13k interactions with a recommender system Baseline algorithms: k-freq: returns top-k frequent supergraphs of a query LIndex: informative patterns index Experiments: Time and objective function value varying k, query size, λ Anecdotal Scalability
Graph Query Reformulation with Diversity – Davide Mottin, Francesco Bonchi, Francesco Gullo 19 Time Comparison Number of specializations 1.k-freq runs only slighly faster 2.Time increases linearly in k 3.Fast_MMPG has real-time performance Query size 1.Fast_MMPG comparable to k- freq 2.Time decreases with query size (less reformulations) number of reformulations (k) query size
Graph Query Reformulation with Diversity – Davide Mottin, Francesco Bonchi, Francesco Gullo 20 Objective function gain 20 Analysis 1.Lambda correctly moves the objective function towards diversity 2.k-freq only captures coverage
Graph Query Reformulation with Diversity – Davide Mottin, Francesco Bonchi, Francesco Gullo 21 Qualitative evaluation k-freq Fast_MMPG C O O OH C O CH 3 C O Fe C O NH 2 C O CH 3 C O C O C C O CH 2 C C O NH 2 C O CH 2 C NH Query Analysis k-freq finds specialization of the same superquery Fast_MMPG returns reformulations with more diversified structures
Graph Query Reformulation with Diversity – Davide Mottin, Francesco Bonchi, Francesco Gullo 22 Conclusions First study of the problem of query reformulation in graph databases Principled objective function optimizing coverage and diversity Algorithmic solutions with quality guarantees and real time responses
Graph Query Reformulation with Diversity – Davide Mottin, Francesco Bonchi, Francesco Gullo 23 Questions? Thank you!