
1 Discovering Neighborhood Pattern Queries by Sample Answers in Knowledge Base
Jialong Han (1), Kai Zheng (2), Aixin Sun (1), Shuo Shang (3), and Ji-Rong Wen (4)
(1) Nanyang Technological University
(2) The University of Queensland
(3) China University of Petroleum (Beijing)
(4) Renmin University of China
31/12/2018 ICDE 2016, Helsinki, Finland

2 Knowledge Bases and Structural Queries
Knowledge bases: DBpedia, Freebase, YAGO, NELL, etc.
  Viewed as graphs
  Queried by structural queries
    SPARQL for RDF
    MQL for Freebase
    Cypher for neo4j
Which chess player was born and died in the same place?
  SELECT ?uri WHERE {
    ?uri :type :ChessPlayer .
    ?uri :birthPlace ?place .
    ?uri :deathPlace ?place
  }
Complete Answers: M. Botvinnik, P. Morphy
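To make the triple pattern on this slide concrete, here is a minimal Python sketch that evaluates the same shape of query over a toy triple set. The entity and place identifiers (`p1`, `cityA`, and so on) are hypothetical placeholders, not DBpedia data, and the helper names are assumptions made for illustration.

```python
# Toy triples (subject, predicate, object); all identifiers are hypothetical.
TRIPLES = {
    ("p1", "type", "ChessPlayer"),
    ("p1", "birthPlace", "cityA"),
    ("p1", "deathPlace", "cityA"),
    ("p2", "type", "ChessPlayer"),
    ("p2", "birthPlace", "cityA"),
    ("p2", "deathPlace", "cityB"),
    ("p3", "type", "Politician"),
    ("p3", "birthPlace", "cityB"),
    ("p3", "deathPlace", "cityB"),
}


def objects(subject, predicate):
    """All objects o such that (subject, predicate, o) is a triple."""
    return {o for s, p, o in TRIPLES if s == subject and p == predicate}


def same_birth_and_death_place(entity_type):
    """Mirror the slide's pattern: ?uri has the given type and shares one
    ?place between birthPlace and deathPlace."""
    subjects = {s for s, p, o in TRIPLES if p == "type" and o == entity_type}
    return sorted(
        s for s in subjects
        if objects(s, "birthPlace") & objects(s, "deathPlace")
    )


answers = same_birth_and_death_place("ChessPlayer")
```

The shared variable `?place` in the query corresponds to the non-empty set intersection in the sketch.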

3 Structural Query Discovery: Motivation
It is not always easy for a user to write structural queries.
  She needs to follow the syntax;
  She needs to be familiar with the types/relations used in the KB.
Can we automatically find structural queries based on representative partial answers?
Which chess player was born and died in the same place?
  SELECT ?uri WHERE {
    ?uri :type :ChessPlayer .
    ?uri :birthPlace ?place .
    ?uri :deathPlace ?place
  }
Complete Answers: M. Botvinnik, P. Morphy
Representative Partial Answers: M. Botvinnik

4 Motivating Example
We concentrate on Neighborhood Pattern Queries (NPQ).
  One “pivot”.
  Does not involve numeric ops, regular expressions, etc.
Given example entities $I$ from the user, all NPQs can be classified into three kinds.
  Irrelevant: results do not cover $I$;
  Not relevant enough: results cover $I$ but do not rank them high;
  Relevant: $I$ is ranked high in the results.
[Figure: popularity order and the results of Query (a), Query (b), and Query (c). Entities shown: B. Obama, E. Lasker, V. Putin, M. Botvinnik, P. Morphy, G. Kasparov. Rank: +∞ for Query (a), 4 for Query (b), 1 for Query (c).]

5 Problem Statement and Solution Overview
Reverse Top-k Neighborhood Pattern Queries (RkNPQ)
  Given a knowledge base $D$ and a popularity order $\prec$ on $V_D$, for input nodes $I \subseteq V_D$, find all neighborhood pattern queries $q$ s.t.
    (1) $D(q) \supseteq I$, and
    (2) when ranking $D(q)$ according to $\prec$, nodes in $I$ all appear in the top-$k$ results.
Solution: filter and refine.
  Filter: generate all NPQs satisfying condition 1;
  Refine: eliminate all generated NPQs violating condition 2.
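The two conditions of the problem statement can be checked directly for a single candidate query. The sketch below assumes the answer set of the query is already available as a Python set and that the popularity order is given as a list from most to least popular; the function name and the toy node names are hypothetical.

```python
def satisfies_rknpq(answers, inputs, popularity, k):
    """Check one candidate query against the two RkNPQ conditions.

    answers    -- the query's answer set D(q)
    inputs     -- the user's example nodes I
    popularity -- all nodes, most popular first (assumed representation)
    k          -- size of the top-k window
    """
    answers, inputs = set(answers), set(inputs)
    # Condition 1: the answers must cover every input node.
    if not inputs <= answers:
        return False
    # Condition 2: rank the answers by popularity and look at the top k.
    ranked = [v for v in popularity if v in answers]
    return inputs <= set(ranked[:k])


# Toy setting: "x" is the single example node.
popularity = ["a", "b", "c", "x", "d"]
irrelevant = satisfies_rknpq({"a", "b"}, {"x"}, popularity, k=1)
not_relevant_enough = satisfies_rknpq({"a", "b", "c", "x"}, {"x"}, popularity, k=1)
relevant = satisfies_rknpq({"x", "d"}, {"x"}, popularity, k=1)
```

The three toy calls mirror the three kinds of queries from the motivating example: not covering the input, covering it at rank 4, and covering it at rank 1.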

6 The Filtering Stage
Perform level-wise search on the query space.
  Start with the simplest shapes of NPQs (single node or edge).
  Generate complicated ones through Extend and Join on simple ones.
  Completeness guaranteed by [Han, CIKM’13].
  Terminate a branch if condition 1 is violated.
Example input: $I = \{\text{M. Botvinnik}\}$
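The level-wise idea can be illustrated on a much simpler query space than the one in the talk. In the sketch below a query is just a set of (predicate, object) constraints on the pivot, and a larger query is produced by adding one constraint. This is a simplified stand-in, not the Extend and Join operators of [Han, CIKM’13]; the data, the query model, and all names are assumptions.

```python
# node -> set of (predicate, object) facts; hypothetical toy data.
FACTS = {
    "x": {("type", "ChessPlayer"), ("birthPlace", "cityA"), ("deathPlace", "cityA")},
    "y": {("type", "ChessPlayer"), ("birthPlace", "cityB")},
    "z": {("type", "Politician"), ("birthPlace", "cityA")},
}


def matches(query):
    """D(q): nodes whose facts include every constraint of the query."""
    return {v for v, facts in FACTS.items() if query <= facts}


def filter_stage(inputs):
    """Return every non-empty constraint set q whose answers cover the inputs."""
    atoms = sorted(set().union(*FACTS.values()))
    # Level 1: single constraints that already cover the inputs.
    level = [frozenset([a]) for a in atoms if inputs <= matches(frozenset([a]))]
    kept = []
    while level:
        kept.extend(level)
        next_level = set()
        for q in level:
            for a in atoms:
                if a not in q:
                    bigger = q | {a}
                    # A query that loses an input node is dropped and never
                    # extended again (its branch is terminated).
                    if inputs <= matches(bigger):
                        next_level.add(bigger)
        level = sorted(next_level, key=sorted)
    return kept


candidates = filter_stage({"x"})
```

Pruning is safe in this toy model because adding a constraint can only shrink the answer set, so a query that fails to cover the inputs cannot be repaired by extending it.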

7 Trivial Refine
Execute all NPQs generated by the filter stage, and test for condition 2.
  Use SPARQL or graph query engines like neo4j, gStore, and JENA-TDB.
Drawbacks: unnecessary or redundant computations are not removed.
We propose three optimizations on this stage.
Example input: $I = \{\text{M. Botvinnik}\}$

8 Refine Optimization 1: Shared Evaluation
Observation 1: $D(q)$ of different $q$ overlap with each other.
  If $q_1$ is a sub-query of $q_2$, then $D(q_1) \supseteq D(q_2)$.
  Maintain $D(q)$ by (intersecting and) verifying results of sub-queries.
Example input: $I = \{\text{M. Botvinnik}\}$
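A minimal sketch of the containment property and of reusing sub-query results, under the assumption that a query is a set of (predicate, object) constraints on the pivot. The toy data and function names are hypothetical; in this simple model the intersection is already exact, whereas for general graph patterns it would only be a candidate set that still needs verification.

```python
# node -> set of (predicate, object) facts; hypothetical toy data.
FACTS = {
    "x": {("type", "ChessPlayer"), ("birthPlace", "cityA")},
    "y": {("type", "ChessPlayer"), ("birthPlace", "cityB")},
    "z": {("type", "Politician"), ("birthPlace", "cityA")},
}


def evaluate(query, candidates=None):
    """Return D(query), verifying only the given candidates
    (all nodes when no candidate set is supplied)."""
    pool = FACTS if candidates is None else candidates
    return {v for v in pool if query <= FACTS[v]}


q1 = frozenset({("type", "ChessPlayer")})
q2 = frozenset({("birthPlace", "cityA")})
d_q1 = evaluate(q1)
d_q2 = evaluate(q2)

# q1 and q2 are both sub-queries of q12, so D(q12) lies inside both results.
q12 = q1 | q2
d_q12 = evaluate(q12, candidates=d_q1 & d_q2)
```

Only the nodes in the intersection are verified for the larger query, instead of every node in the knowledge base.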

9 Refine Optimization 2: Indicator Answers
Observation 2: To verify $q$, $D(q)$ need not be completely evaluated.
  Define indicator answers $IA(q) = \{ v \mid v \in D(q) \wedge v \prec \inf I \wedge v \notin I \}$.
  Only nodes in $IA(q)$ affect the Top-k condition.
  $q$ meets the Top-k condition iff $|IA(q)| \le k - |I|$.
Indicator answers are compatible with shared evaluation!
  If $q_1$ is a sub-query of $q_2$, then $IA(q_1) \supseteq IA(q_2)$.
[Figure: popularity order and the results of Query (b) and Query (c). Entities shown: B. Obama, V. Putin, M. Botvinnik, G. Kasparov, P. Morphy, E. Lasker. Rank: 4 for Query (b), 1 for Query (c).]
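The indicator-answer test can be sketched as follows. The code reads $\inf I$ as the lowest-ranked input node, so that $IA(q)$ holds exactly the non-input answers ranked ahead of it; this reading is an assumption about the slide's notation, as are the list-based popularity order and the function names. Condition 1 (the answers cover $I$) is assumed to have been checked already.

```python
def indicator_answers(answers, inputs, popularity):
    """Non-input answers ranked ahead of the lowest-ranked input node.

    popularity lists nodes from most to least popular (assumed representation).
    """
    rank = {v: i for i, v in enumerate(popularity)}
    worst_input = max(rank[v] for v in inputs)
    return {v for v in answers if v not in inputs and rank[v] < worst_input}


def meets_top_k(answers, inputs, popularity, k):
    """Top-k condition expressed through |IA(q)| <= k - |I|."""
    return len(indicator_answers(answers, inputs, popularity)) <= k - len(inputs)


popularity = ["a", "b", "c", "x", "d"]
ia_b = indicator_answers({"a", "b", "c", "x"}, {"x"}, popularity)
ia_c = indicator_answers({"x", "d"}, {"x"}, popularity)
```

Answers ranked below every input node, such as `d` here, never enter the indicator set, which is why $D(q)$ does not have to be evaluated in full.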

10 Refine Optimization 3: Partial Evaluation
Observation 3: Even $IA(q)$ need not be completely obtained to reject $q$.
  Only a lower bound of $|IA(q)|$ is needed.
  Instead of one list $IA(q)$, we keep two: nodes confirmed/uncertain to be in $IA(q)$.
  Reject $q$ immediately if the confirmed list is long enough ($> k - |I|$).
  The number of “match” checks can be reduced.
[Figure: popularity order and the results of Query (b) and Query (c). Entities shown: B. Obama, V. Putin, M. Botvinnik, G. Kasparov, P. Morphy, E. Lasker. Rank: 4 for Query (b), 1 for Query (c).]
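The early-rejection idea can be sketched with two lists and a counter of match checks. How the actual algorithm carries the confirmed and uncertain lists across shared evaluation is not given on the slide, so the function below only illustrates the stopping rule; its name, signature, and the `verify` callback are assumptions.

```python
def partial_refine(uncertain, verify, budget):
    """Verify uncertain indicator candidates one by one, stopping early.

    uncertain -- nodes that may belong to IA(q)
    verify    -- callable v -> bool, the (expensive) match check
    budget    -- k - |I|; more than this many confirmed nodes rejects q
    Returns (rejected, confirmed, still_uncertain, checks_used).
    """
    confirmed = []
    remaining = list(uncertain)
    checks = 0
    while remaining:
        v = remaining.pop(0)
        checks += 1
        if verify(v):
            confirmed.append(v)
            if len(confirmed) > budget:
                # Enough evidence: the rest never needs to be checked.
                return True, confirmed, remaining, checks
    return False, confirmed, remaining, checks


in_answers = {"a", "b", "c"}
rejected, confirmed, still_uncertain, checks = partial_refine(
    ["a", "b", "c"], lambda v: v in in_answers, budget=0
)
```

With a budget of 0 (for example $k = |I| = 1$), a single confirmed node is enough to reject the query, so two of the three match checks are skipped.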

11 Experimental Settings
Datasets
  Knowledge base: DBpedia 3.9.
  Popularity ranking: PageRank score.
  Queries: 52 questions from 250 in the QALD-4-Task-1 dataset
    Allocated into 5 groups w.r.t. the shape (size, radius) of their ground truth query.
Compared variants:
  RkNPQ-gStore: Trivial refine using gStore [Zou, PVLDB’11]
  RkNPQ-S: Shared evaluation
  RkNPQ-SI: Shared evaluation of Indicator answers
  RkNPQ-SPI: Shared and Partial evaluation of Indicator answers
Methodology and Metrics:
  Use top-1/2 results to call our algorithms;
  Investigate the effectiveness (# returned queries) and efficiency (running time).

12 Effectiveness
Classify questions into Easy/Moderate/Hard w.r.t. # returned queries.
  Simpler question groups have more Easy/Moderate questions.
  More example answers cause many questions to turn Easy/Moderate.
    The inherent ambiguity of the input is reduced.
    Two examples are generally enough for a browsable output.

13 Efficiency
Compare adjacent pairs of the four variants.
  When $|I| = k = 1$, each of the three optimizations speeds up RkNPQ by one to two orders of magnitude.
  When $|I| = k = 2$, $IA(q) \to D(q)$, making indicator answers less beneficial.

14 Analysis of Parameter k
What happens if we fix $|I| = 1$ and increase $k$?
# Returned queries?
  More queries are returned.
  As $k$ increases, the number tends to converge.
  Larger $k$ prevents unsuccessful search, but makes the returned queries harder to browse.
Running time (RkNPQ-SPI)?
  RkNPQ-SPI is almost always faster than RkNPQ-SI.
  The running time increases slowly.

15 Related Work Reverse Engineering Structured Queries
  SQL queries: [Tran, SIGMOD’09] and [Zhang, SIGMOD’13]
  Interactive setting: [Bonifati, EDBT’14], [Staworko, ICDT’12], and [Bonifati, EDBT’15]
Reverse Query Problems for Vector Data
  Reverse top-k queries: [Vlachou, ICDE’10]
  Reverse KNN queries: [Korn, SIGMOD’00]
  Reverse skyline queries: [Dellis, PVLDB’07]
Query by Example Entities & Tuples
  [Jayaram, TKDE’15], [Lim, EDBT’13], and [Mottin, PVLDB’14]
Natural Language QA over Knowledge Bases
  [Unger, WWW’12], [Yahya, EMNLP’12], [Berant, EMNLP’13], and [Zou, SIGMOD’14].

16 Conclusions
We propose Reverse top-k Neighborhood Pattern Queries to help users issue knowledge base queries using representative partial answers.
The search space is explored under a filter-refine framework.
Three optimizations on the refine stage are investigated: shared evaluation, indicator answers, and partial evaluation.
When given enough examples, the RkNPQ-SPI algorithm can generate a small number of possible queries for the user within reasonable time.

17 Thank you! Questions?

