Published by Adrian Greer. Modified over 9 years ago.
1
Exemplar Queries: Knowledge Exploration using Information Graphs Davide Mottin, University of Trento August 20, 2015 @ RMIT University, Melbourne Department of Information Engineering and Computer Science
2
Short Bio
Education
- April 2015: PhD in computer science from University of Trento (now in the job market!)
  Thesis title: “Advanced Query Paradigms for the Novice User”
  Advisors: Prof. Themis Palpanas, Prof. Yannis Velegrakis
- 2010/08: MSc/BSc in computer science
Working Experience
- 2012: Yahoo! Labs, Barcelona, under Dr. Francesco Bonchi
- 2011: Microsoft Research, Beijing, under Dr. Haixun Wang
3
Traditional Query Answering
owns=Search Engine, based=California, produces=Mobiles
Database
4
Hardly Expressible Queries
Query: ???
The user does not know how to describe other companies
Database
5
The Exemplar Queries perspective
“I think the greatest way to learn is to learn by someone's example.” Tobey Maguire
6
A different need
7
Existing Search Engines
Query “acquisitions like Google YouTube”: Yahoo!-Tumblr or Microsoft-Skype are not present as interesting acquisitions.
Cannot be solved by Related Queries [Boldi11, Bordino13] or Query Relaxation [Mottin13, Mishra09].
8
A new perspective
9
Exemplar Queries
Input: Q_e, an example element of interest
Output: set of elements in the desired result set
Exemplar Query Evaluation
- evaluate Q_e in a database D, finding a sample s
- find the set of elements a similar to s, given a similarity relation
[PVLDB 2014, SIGMOD 2014 (Demo)]
10
Challenges
- Define the similarity between sample and answers
- Determine the best data model for the problem
- Find answers efficiently
11
Our Approach
Exemplar Queries: the user query is an indication of the structure of the answers
12
Problem Solution Overview [SIGMOD Record 2014]
13
General Solution
Input: User Query Q, an example of the expected results
Output: Set of expected results
Procedure:
- Detect the sample for the query Q
- Find the structures similar to the sample
- Rank the results
14
Data Model: Knowledge graph
15
Strict equality: Edge Isomorphism
[Figure: sample S and answers A1, A2]
16
Similarity: Edge Isomorphism
D. Mottin et al. Exemplar queries: Give me an example of what you need. PVLDB, 7(5), 2014.
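To make edge-isomorphic matching concrete, here is a minimal brute-force sketch. It assumes the graph is stored as (source, label, target) triples; the function name, the `acquired` label, and the toy edges are hypothetical, and this is an illustration rather than the authors' implementation.

```python
def edge_isomorphic_matches(sample_edges, graph_edges):
    """Enumerate injective node mappings that preserve every labeled sample edge."""
    graph = set(graph_edges)
    graph_nodes = sorted({n for s, _, t in graph_edges for n in (s, t)})
    sample_nodes = sorted({n for s, _, t in sample_edges for n in (s, t)})

    def extend(mapping):
        if len(mapping) == len(sample_nodes):
            yield dict(mapping)
            return
        node = sample_nodes[len(mapping)]
        for cand in graph_nodes:
            if cand in mapping.values():
                continue  # keep the mapping injective
            mapping[node] = cand
            # Check every sample edge whose two endpoints are already mapped.
            if all((mapping[s], l, mapping[t]) in graph
                   for s, l, t in sample_edges
                   if s in mapping and t in mapping):
                yield from extend(mapping)
            del mapping[node]

    return list(extend({}))


# Toy knowledge graph (hypothetical edge label).
GRAPH = [
    ("Google", "acquired", "YouTube"),
    ("Yahoo!", "acquired", "Tumblr"),
    ("Microsoft", "acquired", "Skype"),
]
SAMPLE = [("Google", "acquired", "YouTube")]

for m in edge_isomorphic_matches(SAMPLE, GRAPH):
    print(m["Google"], "->", m["YouTube"])
```

The sample itself is always among the matches. The search is exponential in the worst case, which is why pruning matters.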
17
Solution
Input: User Query Q, an example of the expected results
Output: Set of expected results
Procedure:
- Detect the sample for the query Q
- Find the structures edge-isomorphic to the sample
- Rank the results
- Prune the non-matching nodes
Note: subgraph isomorphism is NP-complete [Cook71]
Solution
1. IterativePruning: quickly reject non-matching nodes
18
d-neighborhood
                 distance 1    distance 2
                 a  b  c       a  b  c
Graph node 1     2  0  0       1  2  1
Query node q1    1  0  0       0  1  1
Difference       1  0  0       1  1  0
[Theorem statement not captured in the transcript]
19
d-neighborhood
                 distance 1    distance 2
                 a  b  c       a  b  c
Graph node 2     1  1  1       2  1  0
Query node q1    1  0  0       0  1  1
Difference       0  1  1       2  0  -1
[Theorem statement not captured in the transcript]
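The two d-neighborhood examples can be checked with a few lines of code. This sketch assumes the pruning condition is that no entry of the difference (graph node minus query node) is negative, which is how the two difference rows read; the function name is hypothetical.

```python
def may_match(query_counts, graph_counts):
    """True if the graph node has at least as many neighbors per label and distance."""
    return all(g - q >= 0 for q, g in zip(query_counts, graph_counts))


# Each vector lists labels (a, b, c) at distance 1, then (a, b, c) at distance 2.
QUERY_Q1 = [1, 0, 0, 0, 1, 1]
GRAPH_NODE_1 = [2, 0, 0, 1, 2, 1]
GRAPH_NODE_2 = [1, 1, 1, 2, 1, 0]

print(may_match(QUERY_Q1, GRAPH_NODE_1))  # kept as a candidate
print(may_match(QUERY_Q1, GRAPH_NODE_2))  # rejected: too few c-neighbors at distance 2
```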
20
The IterativePruning Algorithm
1. Start from a query node q
2. Match q with the graph nodes
3. For each adjacent node of q
4. Find nodes in the graph from the candidate map of q matching the edge
5. Repeat from step 2 with an adjacent node of q until all nodes have been visited
Theorem (Pruning Completeness): No subgraph-isomorphic solution is discarded by the IterativePruning algorithm.
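The candidate-map idea behind these steps can be sketched as follows. This is a simplified single-pass illustration, not the published algorithm: it assumes edges are (source, label, target) triples, starts with every graph node as a candidate for the start query node, and all names and toy data are hypothetical.

```python
from collections import defaultdict, deque


def iterative_pruning(query_edges, graph_edges, start):
    """Return, per query node, graph nodes that survive edge-by-edge filtering."""
    out_idx, in_idx, nodes = defaultdict(set), defaultdict(set), set()
    for s, l, t in graph_edges:
        out_idx[(s, l)].add(t)   # successors reachable via label l
        in_idx[(t, l)].add(s)    # predecessors reachable via label l
        nodes.update((s, t))

    q_adj = defaultdict(list)    # query node -> (label, neighbor, is_outgoing)
    for s, l, t in query_edges:
        q_adj[s].append((l, t, True))
        q_adj[t].append((l, s, False))

    candidates = {start: set(nodes)}
    visited, queue = set(), deque([start])
    while queue:
        q = queue.popleft()
        if q in visited:
            continue
        visited.add(q)
        for label, q2, outgoing in q_adj[q]:
            idx = out_idx if outgoing else in_idx
            # Drop candidates of q that lack the required labeled edge.
            candidates[q] = {n for n in candidates[q] if idx.get((n, label))}
            reached = set().union(*(idx.get((n, label), set()) for n in candidates[q]))
            candidates[q2] = reached if q2 not in candidates else candidates[q2] & reached
            if q2 not in visited:
                queue.append(q2)
    return candidates


QUERY = [("q1", "acquired", "q2"), ("q1", "located_in", "q3")]
GRAPH = [
    ("g1", "acquired", "g2"), ("g1", "located_in", "c1"),
    ("g3", "acquired", "g4"), ("g3", "located_in", "c2"),
    ("g5", "founded", "g6"),
]
print(iterative_pruning(QUERY, GRAPH, "q1"))
```

Only nodes that cannot take part in any match are removed, so the filter is safe to run before the expensive isomorphism search; it does not by itself prove that the surviving candidates form a match.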
21
Solution
Input: User Query Q, an example of the expected results
Output: Set of expected results
Procedure:
- Detect the sample for the query Q
- Prune the non-matching nodes
- Find the structures edge-isomorphic to the sample
- Rank the results
- Restrict the search space
Note: subgraph isomorphism is NP-complete [Cook71]
Solution
1. IterativePruning: quickly reject non-matching nodes
2. RelevantNeighborhood: restrict the search space to “near” nodes
22
Restricting the search space
[Figure: user query with sample S and answers A1, A2]
Idea
1. Not all the nodes are equally relevant
2. Nodes “far” from the query are less related
23
The Relevant Neighborhood Algorithm
Prune the search space by identifying the valuable portions:
- Based on an approximation of Personalized PageRank
- Transition matrix A with non-uniform edge weights based on inverse frequency
Procedure
1. Assign each node in the sample a fixed number of particles
2. Distribute the particles on neighbor nodes, favoring sample edge labels
3. Repeat step 2 until the number of particles is less than a threshold
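A rough sketch of the particle procedure is given below. The slide only fixes the outline, so several details here are assumptions: the fraction of particles a node keeps as its score (`keep`), the multiplicative `boost` for labels that occur in the sample, treating edges as undirected, and dropping any share smaller than `threshold`. All names are hypothetical.

```python
from collections import defaultdict


def relevant_neighborhood(edges, sample_nodes, sample_labels,
                          particles=100.0, keep=0.15, threshold=1.0, boost=2.0):
    """Approximate a Personalized-PageRank-style score by spreading particles."""
    freq = defaultdict(int)
    for _, label, _ in edges:
        freq[label] += 1

    adj = defaultdict(list)
    for s, label, t in edges:
        weight = 1.0 / freq[label]          # inverse label frequency
        if label in sample_labels:
            weight *= boost                 # favor labels seen in the sample (assumed)
        adj[s].append((t, weight))
        adj[t].append((s, weight))

    score = defaultdict(float)
    current = {n: particles for n in sample_nodes}
    while current:
        nxt = defaultdict(float)
        for node, p in current.items():
            score[node] += keep * p
            rest = (1.0 - keep) * p
            total = sum(w for _, w in adj[node])
            if total == 0:
                continue                    # isolated node: nothing to distribute
            for neighbor, w in adj[node]:
                share = rest * w / total
                if share >= threshold:      # tiny shares are discarded
                    nxt[neighbor] += share
        current = nxt
    return dict(score)


EDGES = [("a", "r", "b"), ("b", "r", "c"), ("y", "r", "z")]
scores = relevant_neighborhood(EDGES, {"a"}, {"r"}, particles=10.0)
print(sorted(scores))
```

Nodes that never receive a share above the threshold get no score and can be excluded from the search space. The loop ends because the total number of particles shrinks by at least the `keep` fraction in every round.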
24
Similarity: Simulation
D. Mottin et al. Exemplar queries: a New Way of Searching. Submitted for publication.
25
Strict equality: Edge Isomorphism
[Figure: sample S and answers A1, A2]
Why are Yahoo! and Tumblr not present?
26
More freedom: Simulation
[Figure: sample S and answers A1, A2]
Tumblr matches both an acquisition and a website
Match edge-label sequences instead of structures
27
Algorithms for Simulation
Use Strong Simulation [Ma14], with:
- bounded matchings
- node-topology preserving
Issue: Strong Simulation preserves node labels
Idea: Apply the Strong Simulation algorithm on a graph where edges become nodes with label equal to the original edge.
Pruning: the d-neighborhood becomes a boolean vector; a node matches a query node if the boolean AND between the two vectors is positive
[Theorem statement not captured in the transcript]
28
Ranking results
[Figure: user query with sample S and answers A1, A2; example nodes Google, Yahoo!, CBS]
Combination of two factors
1. Structural: similarity of two nodes in terms of neighbor relationships
2. Distance-based: the PageRank already computed
29
Experimental Setup
Dataset
- Freebase: 76M nodes, 314M edges (entire!)
- Freebase Internet Domain: 2M nodes, 6M edges
- Synthetic datasets
Testset: 100 queries manually mapped from AOL query logs
Baseline: NeMa [Khan13]: approximate answers on graphs
Measures
- Total time of the algorithms
- User study asking to evaluate the usefulness of our approach
30
Scalability results (10M nodes)
[Figure: time]
- RelevantNeighborhood is stable with respect to the number of answers
- <150 ms to get the answers
31
Usefulness
Quality
- 92% of people say that Exemplar Queries are useful
- 62% already had the need for such a service
Comparison: which method is preferred?
- 64% Exemplar Queries
- 30% other approaches
32
Simulation vs Isomorphism
Analysis
- Simulation finds more answers (up to 48%) but aggregates results
- Isomorphism runs faster than simulation (fewer operations on simple queries)
33
Qualitative Evaluation
Query: Google - YouTube - Menlo Park
[Figure panels: Approximate Graph Query Answering [Khan13], Edge Isomorphism, Simulation]
Annotations: answers are collapsed; more interesting answers
34
Size increment for Simulation
25% to 46% more edges than isomorphism: answers are collapsed
35
Dealing with too many results
“One of the effects of living with electric information is that we live habitually in a state of information overload. There's always more than you can cope with.” Marshall McLuhan
36
Result Refinement
37
Information overload
“I want to know about IT company acquisitions”
38
Too many results to visualize
39
Dealing with Information Overload
Faceted search
- Present aspects of the results [Roy08]
Query reformulation
- Modify some of the query conditions
- In structured databases [Mishra09]
- In web search [Dang10]
First study of the problem on graphs
40
Graph Search
41
Graph Query Reformulation
[Figure: results and query]
Reformulations: query supergraphs …
Exponential number of reformulations
42
Challenges
- The number of reformulations is exponential
- Quantify the interestingness of a reformulation
- Finding query reformulations is NP-complete
43
A Naïve Approach: k-most frequent super-graphs
[Figure: a query with 480 matches and super-graphs with 450, 100, 30, and 420 matches]
Until k reformulations are found:
- Retrieve the most frequent super-pattern
Frequent ≠ Interesting!
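The naive strategy amounts to sorting super-graphs by match count. The sketch below uses the four counts that appear on the slide; the identifiers `S1` to `S4` are hypothetical because the figure layout is not recoverable from the transcript.

```python
def k_most_frequent(match_counts, k):
    """Return the k super-graphs with the largest number of matches."""
    return sorted(match_counts, key=match_counts.get, reverse=True)[:k]


# Match counts as read from the slide; the names are placeholders.
SUPERGRAPHS = {"S1": 450, "S2": 100, "S3": 30, "S4": 420}
print(k_most_frequent(SUPERGRAPHS, 2))
```

With k = 2 this returns the two super-graphs that match almost everything the original query matched, so they barely narrow the results down, which is the point of “Frequent ≠ Interesting”.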
44
Our Approach
Graph Query Reformulation with Diversity
Finds k meaningful reformulations efficiently
D. Mottin, F. Bonchi, F. Gullo. Graph Query Reformulation with Diversity, KDD 2015.
45
Finding meaningful Reformulations
[Figure: results, query, diversity]
Find k meaningful reformulations:
1. Span all the results
2. Present different aspects of the results
46
Diversity Matters
[Figure: results, query, objective function f(Q)]
λ = 1
Non-optimal: f({Q1', Q2'}) = 7
Optimal: f({Q3', Q4'}) = 8
47
Problem: Graph Query Reformulation with Diversity
Theorem (NP-hardness): By reduction from the MAX-SUM Diversification Problem, the problem is NP-hard
[KDD 2015]
48
Solution: Greedy Algorithm
Greedy: while k reformulations have not been found
1. Find the reformulation leading to the maximum increment of the objective function (marginal gain)
2. Add the reformulation to the results
Theorem: The algorithm is a ½-approximation
Finding the maximum gain is #P-complete [Valiant79]
Solution: Fast_MMPG, a branch-and-bound algorithm with quality guarantees
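The greedy loop can be sketched on a toy instance. The transcript does not preserve the definition of f, so the objective below is an assumption made only for illustration: coverage is the total size of the result sets, and diversity is the size of the symmetric difference of each pair, weighted by λ. Candidate reformulations are given explicitly as result sets, which sidesteps the hard part of the real problem (enumerating supergraphs).

```python
from itertools import combinations


def objective(selected, results, lam):
    """Assumed objective: coverage plus lam times pairwise diversity."""
    coverage = sum(len(results[q]) for q in selected)
    diversity = sum(len(results[a] ^ results[b])
                    for a, b in combinations(selected, 2))
    return coverage + lam * diversity


def greedy(results, k, lam=1.0):
    """Repeatedly add the reformulation with the largest marginal gain."""
    selected = []
    while len(selected) < min(k, len(results)):
        best = max((q for q in results if q not in selected),
                   key=lambda q: objective(selected + [q], results, lam))
        selected.append(best)
    return selected


# Hypothetical reformulations mapped to the ids of the results they match.
RESULTS = {"Q1": {1, 2, 3, 4}, "Q2": {1, 2, 3}, "Q3": {5, 6}, "Q4": {1, 2, 5}}
print(greedy(RESULTS, k=2, lam=1.0))
print(greedy(RESULTS, k=2, lam=0.0))
```

With λ = 1 the second pick is the small but disjoint Q3; with λ = 0 the choice falls back to frequency alone and picks Q2, which largely repeats Q1.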
49
The multiplicity vector
[Figure: results, the output set of reformulations, and the multiplicity values 0000011000221102222023331]
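A multiplicity vector can be computed as below, assuming it records, for each result, how many of the reformulations selected so far contain it. That reading and the names are assumptions; the slide shows only the figure.

```python
from collections import Counter


def multiplicity_vector(selected_result_sets, all_results):
    """Count, for each result, the selected reformulations that contain it."""
    counts = Counter(r for result_set in selected_result_sets for r in result_set)
    return [counts[r] for r in all_results]


print(multiplicity_vector([{1, 2}, {2, 3}], [1, 2, 3, 4]))
```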
50
Upper bound on the Marginal gain
Lemma: The marginal gain increases if the multiplicity of the considered item is [condition not captured in the transcript], where |Q| is the number of reformulations in the reformulated set constructed so far.
Upper bound: [symbol not captured in the transcript] is the value of the objective function considering only results with multiplicity [condition not captured in the transcript]
[Theorem statement not captured in the transcript]
51
Upper bound
[Figure: results and the output set of reformulations, with multiplicity values 0000012111 and 12111]
52
The Fast_MMPG Algorithm
Until the reformulation with the maximum upper bound and marginal gain is found:
1. Expand the reformulation with the maximum upper bound
2. Prune reformulations with marginal gain smaller than the upper bound so far
[Figure: upper bound and marginal gain]
53
Experimental Setup
Datasets:
- AIDS: 10k chemical compounds
- Financial: 17k transaction workflows
- Web: 13k interactions with a recommender system
Baseline algorithms:
- k-freq: returns top-k frequent supergraphs of a query
- LIndex: informative patterns index
Experiments:
- Time and objective function value varying k, query size, λ
- Anecdotal
- Scalability
54
Time Comparison
Number of reformulations (k)
1. k-freq runs only slightly faster
2. Time increases linearly in k
3. Fast_MMPG has real-time performance
Query size
1. Fast_MMPG is comparable to k-freq
2. Time decreases with query size (fewer reformulations)
55
Objective function gain
Analysis
1. Lambda correctly moves the objective function towards diversity
2. k-freq only captures coverage
56
Qualitative evaluation
[Figure: query and the chemical-structure reformulations returned by k-freq and Fast_MMPG]
Analysis
- k-freq finds reformulations of the same superquery
- Fast_MMPG returns reformulations with more diversified structures
57
Conclusions
Hardly Expressible Queries
- Exemplar Queries: user query is an example of the desired results
- Efficient algorithmic solution scaling on real knowledge graphs
- Study of 2 similarity measures for query answering
Information Overload
- Study of the problem in graph databases
- Principled objective function optimizing coverage and diversity
- Algorithmic solutions with quality guarantees
58
Other Studied Problems
“There are no right answers to wrong questions.” Ursula K. Le Guin
59
Empty-Answer Problem
COMPANY DB
Company   Based       Revenue   Mobile   Search   Hardware   Cloud
Apple     Cupertino   $62B      0        0        0          1
Google    M.View      $80B      0        1        1          0
HP        Palo Alto   $30B      0        0        1          0
Yahoo!    Sunnyvale   $16B      0        1        0          0
query = Mobile, Search, Hardware
Result: {} (no answer)
60
Dealing with the Empty Answer Problem
Ranking results based on user preferences
- IR [Baeza11] and database solutions [Chaudhuri04]
Query relaxation
- Modify some of the query conditions [Mishra09]
(-) Suggests all the modifications together
(-) Does not take user feedback into account
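The empty answer and the effect of relaxing one condition can be reproduced on the COMPANY DB example. The attribute values below are copied as they appear in the transcript of the previous slide; the function names are hypothetical, and this is plain condition dropping rather than the interactive method presented next.

```python
COMPANIES = {
    "Apple":  {"Mobile": 0, "Search": 0, "Hardware": 0, "Cloud": 1},
    "Google": {"Mobile": 0, "Search": 1, "Hardware": 1, "Cloud": 0},
    "HP":     {"Mobile": 0, "Search": 0, "Hardware": 1, "Cloud": 0},
    "Yahoo!": {"Mobile": 0, "Search": 1, "Hardware": 0, "Cloud": 0},
}


def answer(conditions):
    """Companies for which every requested attribute is set to 1."""
    return [name for name, row in COMPANIES.items()
            if all(row[attr] == 1 for attr in conditions)]


def single_relaxations(conditions):
    """Answers obtained by dropping exactly one condition at a time."""
    return {dropped: answer([c for c in conditions if c != dropped])
            for dropped in conditions}


QUERY = ["Mobile", "Search", "Hardware"]
print(answer(QUERY))
print(single_relaxations(QUERY))
```

On this data only the relaxation that drops Mobile produces an answer, which illustrates why presenting every possible modification at once is of limited help to the user.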
61
Our Solution: Interactive Query Relaxation
- Suggests one relaxation at a time
- Takes user feedback into account
- Models user preferences
Optimization-centric relaxation suggestions
- User-centric (effort, relevance)
- System-centric (profit)
[PVLDB 2013, SIGMOD 2014 (Demo)]
62
Conclusions
We propose
- Exemplar Query Framework on Information Graphs: user query is an example of the desired results
We study
- Exemplar Query Answering: efficient answering and ranking of exemplar queries
- Graph Query Reformulation: provide insights into the exemplar query answers
We show
- Solutions scaling on real-size information graphs
- Principled approaches with quality guarantees
- Practical applicability of the problem
63
Future Directions
Query reformulation in connected graphs
- Current: set of small graphs (simulated in big graphs)
Include user preferences
- In exemplar queries
- In graph query reformulation
Multiple exemplar queries
- Current: single exemplar queries
- With multiple exemplar queries the semantics changes
64
Questions? Thank you!
65
Publications
Hardly Expressible Queries
- D. Mottin, M. Lissandrini, Y. Velegrakis, T. Palpanas. Exemplar queries: Give me an example of what you need. PVLDB, 7(5), 2014.
- D. Mottin, M. Lissandrini, Y. Velegrakis, T. Palpanas. Searching with XQ: the eXemplar Query Search Engine. SIGMOD, 2014.
- M. Lissandrini, D. Mottin, D. Papadimitriou, T. Palpanas, Y. Velegrakis. Unleashing the power of information graphs. SIGMOD Record, 43(4), 2014.
- D. Mottin, M. Lissandrini, Y. Velegrakis, T. Palpanas. Exemplar queries: A New Way of Searching. (under submission)
Information Overload
- D. Mottin, F. Bonchi, F. Gullo. Graph Query Reformulation with Diversity. KDD, 2015.
Empty-Answer
- D. Mottin, A. Marascu, S. B. Roy, G. Das, T. Palpanas, Y. Velegrakis. A probabilistic optimization framework for the empty-answer problem. PVLDB, 6(14), 2013.
- D. Mottin, A. Marascu, S. B. Roy, G. Das, T. Palpanas, Y. Velegrakis. IQR: An interactive query relaxation system for the empty-answer problem. SIGMOD, 2014.
- D. Mottin, A. Marascu, S. B. Roy, G. Das, T. Palpanas, Y. Velegrakis. A holistic and principled approach for the empty-answer problem. (under submission)
66
Bibliography
[Mishra09] C. Mishra and N. Koudas. Interactive query refinement. In EDBT, 2009.
[Roy08] S. Basu Roy, H. Wang, G. Das, U. Nambiar, and M. Mohania. Minimum-effort driven dynamic faceted search in structured databases. In CIKM, 2008.
[Chaudhuri04] S. Chaudhuri, G. Das, V. Hristidis, and G. Weikum. Probabilistic ranking of database query results. In VLDB, 2004.
[Baeza11] R. A. Baeza-Yates and B. A. Ribeiro-Neto. Modern Information Retrieval. 2011.
[Haveliwala02] T. H. Haveliwala. Topic-sensitive PageRank. In WWW, 2002.
[Cook71] S. A. Cook. The complexity of theorem-proving procedures. In Symposium on Theory of Computing, 1971.
[Ma14] S. Ma, Y. Cao, W. Fan, J. Huai, and T. Wo. Strong simulation: Capturing topology in graph pattern matching. TODS, 2014.
67
Bibliography
[Valiant79] L. G. Valiant. The complexity of computing the permanent. Theoretical Computer Science, 1979.
[Dang10] V. Dang and W. B. Croft. Query reformulation using anchor text. In WSDM, 2010.
[Bordino13] I. Bordino, G. De F. Morales, I. Weber, and F. Bonchi. From Machu Picchu to rafting the Urubamba river: anticipating information needs via the entity-query graph. In WSDM, 2013.
[Boldi11] P. Boldi, F. Bonchi, C. Castillo, and S. Vigna. Query reformulation mining: models, patterns, and applications. Information Retrieval, 2011.
[Khan13] A. Khan, Y. Wu, C. C. Aggarwal, and X. Yan. NeMa: Fast graph search with label similarity. PVLDB, 6(3), 2013.
68
Research Topics
Probabilistic databases
- Consider probabilistic knowledge bases to capture noise and uncertainty
- Propose solutions that cope with many-world semantics
- Propose novel similarity measures for exemplar queries
- Define reformulations in a probabilistic fashion
Exemplar Query Answering Framework
- Study the problem of identifying the need for exemplar queries
- Propose solutions for mapping keyword queries to graph samples
- Extend the current solution with incomplete queries or multiple queries
- Include reformulation capabilities
- Study exemplar queries in other contexts (research papers, newspapers, …)
69
Back-up slides
70
RelevantNeighborhood