Jialong Han1, Kai Zheng2, Aixin Sun1, Shuo Shang3, and Ji-Rong Wen4

Presentation transcript:

Discovering Neighborhood Pattern Queries by Sample Answers in Knowledge Base. Jialong Han (1), Kai Zheng (2), Aixin Sun (1), Shuo Shang (3), and Ji-Rong Wen (4). (1) Nanyang Technological University; (2) The University of Queensland; (3) China University of Petroleum (Beijing); (4) Renmin University of China. 31/12/2018. ICDE 2016, Helsinki, Finland.

Knowledge Bases and Structural Queries. Knowledge bases such as DBpedia, Freebase, YAGO, and NELL can be viewed as graphs and are queried by structural queries: SPARQL for RDF, MQL for Freebase, and Cypher for Neo4j. Example: Which chess player was born and died in the same place? SELECT ?uri WHERE { ?uri :type :ChessPlayer . ?uri :birthPlace ?place . ?uri :deathPlace ?place } Complete answers: M. Botvinnik, P. Morphy, …
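
To make the example query concrete, here is a minimal sketch that evaluates the same pattern over a toy triple set. The entity names, the triples, and the helper functions are all made up for illustration; they are not DBpedia data and not part of the presented system.

```python
# Toy knowledge base as (subject, predicate, object) triples.
# Entities and facts are invented for illustration; they are not DBpedia data.
TRIPLES = {
    ("player_a", "type", "ChessPlayer"),
    ("player_a", "birthPlace", "city_x"),
    ("player_a", "deathPlace", "city_x"),
    ("player_b", "type", "ChessPlayer"),
    ("player_b", "birthPlace", "city_x"),
    ("player_b", "deathPlace", "city_y"),
    ("writer_c", "type", "Writer"),
    ("writer_c", "birthPlace", "city_y"),
    ("writer_c", "deathPlace", "city_y"),
}


def objects_of(triples, subject, predicate):
    """Return all objects o such that (subject, predicate, o) is in the KB."""
    return {o for s, p, o in triples if s == subject and p == predicate}


def born_and_died_in_same_place(triples, type_name):
    """Evaluate the example pattern: nodes of the given type sharing one ?place."""
    pivots = {s for s, p, o in triples if p == "type" and o == type_name}
    return {
        v
        for v in pivots
        if objects_of(triples, v, "birthPlace") & objects_of(triples, v, "deathPlace")
    }


answers = born_and_died_in_same_place(TRIPLES, "ChessPlayer")
```

The shared variable ?place in the SPARQL query corresponds to the non-empty intersection of the two object sets.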

Structural Query Discovery: Motivation. It is not always easy for a user to write structural queries: she needs to follow the syntax, and she needs to be familiar with the types and relations used in the KB. Can we automatically find structural queries based on representative partial answers? Example: Which chess player was born and died in the same place? SELECT ?uri WHERE { ?uri :type :ChessPlayer . ?uri :birthPlace ?place . ?uri :deathPlace ?place } Complete answers: M. Botvinnik, P. Morphy, … Representative partial answers: M. Botvinnik ?

Motivating Example. We concentrate on Neighborhood Pattern Queries (NPQs), which have one “pivot” and do not involve numeric operations, regular expressions, etc. Given example entities $I$ from the user, all NPQs can be classified into three kinds. Irrelevant: the results do not cover $I$. Not relevant enough: the results cover $I$ but do not rank them high. Relevant: $I$ is ranked high in the results. [Figure: a popularity order listing B. Obama, E. Lasker, V. Putin, M. Botvinnik, P. Morphy, and G. Kasparov, with Rank: +∞ for Query (a), Rank: 4 for Query (b), and Rank: 1 for Query (c).]

Problem Statement and Solution Overview. Reverse Top-k Neighborhood Pattern Queries (RkNPQ): given a knowledge base $D$ and a popularity order $\prec$ on $V_D$, for input nodes $I \subseteq V_D$, find all neighborhood pattern queries $q$ such that (1) $D(q) \supseteq I$, and (2) when ranking $D(q)$ according to $\prec$, the nodes in $I$ all appear in the top-k results. Solution: filter and refine. Filter: generate all NPQs satisfying condition 1. Refine: eliminate all generated NPQs violating condition 2.
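
The two conditions in the problem statement can be sketched as a direct check on one candidate query. This is a naive illustration only: the function name, the `position` dictionary encoding the popularity order, and the node labels are assumptions, not the authors' implementation.

```python
def satisfies_rknpq(results, examples, position, k):
    """Check both RkNPQ conditions for one candidate query q.

    results  -- D(q), the set of nodes returned by q
    examples -- I, the user's example nodes
    position -- dict: node -> place in the popularity order (1 = most popular)
    """
    # Condition 1: the results must cover every example.
    if not examples <= results:
        return False
    # Condition 2: every example must be among the k most popular results.
    top_k = set(sorted(results, key=lambda v: position[v])[:k])
    return examples <= top_k


# Hypothetical popularity order over four nodes and a single example node.
position = {"n1": 1, "n2": 2, "n3": 3, "n4": 4}
examples = {"n3"}
covers_and_ranks_high = satisfies_rknpq({"n3", "n4"}, examples, position, k=1)
covers_but_ranks_low = satisfies_rknpq({"n1", "n2", "n3"}, examples, position, k=1)
does_not_cover = satisfies_rknpq({"n1", "n2"}, examples, position, k=1)
```

The three calls correspond to a relevant query, a query that is not relevant enough, and an irrelevant query.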

The Filtering Stage. Perform a level-wise search on the query space. Start with the simplest shapes of NPQs (a single node or edge). Generate more complicated ones by applying Extend and Join to simpler ones. Completeness is guaranteed by [Han, CIKM’13]. Terminate a branch if condition 1 ($D(q) \supseteq I$) is violated. Example: $I = \{\text{M. Botvinnik}\}$.

Trivial Refine. Execute all NPQs generated by the filter stage and test each for condition 2 (all nodes in $I$ appear in the top-k results). Use SPARQL or graph query engines such as Neo4j, gStore, and Jena TDB. Drawback: unnecessary or redundant computations are not removed. We propose three optimizations for this stage. Example: $I = \{\text{M. Botvinnik}\}$.

Refine Optimization 1: Shared Evaluation. Observation 1: the results $D(q)$ of different queries $q$ overlap with each other. If $q_1$ is a sub-query of $q_2$, then $D(q_1) \supseteq D(q_2)$. Maintain $D(q)$ by (intersecting and) verifying the results of sub-queries. Example: $I = \{\text{M. Botvinnik}\}$.
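
A minimal sketch of the shared-evaluation idea, assuming the results of the sub-queries are already cached as sets. The `matches` callback stands in for a real pattern-match check and is hypothetical, as are the node labels.

```python
def evaluate_shared(sub_query_results, matches):
    """Compute D(q) from the cached results of q's sub-queries.

    sub_query_results -- non-empty list of sets D(q_i), each q_i a sub-query of q
    matches           -- hypothetical callback: does node v really match q?
    """
    # Each D(q_i) is a superset of D(q), so only their intersection can hold answers.
    candidates = set.intersection(*sub_query_results)
    # The intersection is only a filter, so each survivor is still verified against q.
    return {v for v in candidates if matches(v)}


# Hypothetical cached results of two sub-queries.
d_q1 = {"n1", "n2", "n3"}
d_q2 = {"n2", "n3", "n4"}
checked = []


def matches_q(v):
    checked.append(v)  # record which nodes needed a full match check
    return v == "n3"


d_q = evaluate_shared([d_q1, d_q2], matches_q)
```

Only the two nodes in the intersection are verified, rather than every node in the knowledge base.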

Refine Optimization 2: Indicator Answers. Observation 2: to verify $q$, $D(q)$ need not be completely evaluated. Define the indicator answers $IA(q) = \{ v \mid v \in D(q) \wedge v \prec \inf I \wedge v \notin I \}$. Only nodes in $IA(q)$ affect the top-k condition: $q$ meets the top-k condition iff $|IA(q)| \le k - |I|$. Indicator answers are compatible with shared evaluation: if $q_1$ is a sub-query of $q_2$, then $IA(q_1) \supseteq IA(q_2)$. [Figure: a popularity order listing B. Obama, V. Putin, M. Botvinnik, G. Kasparov, P. Morphy, and E. Lasker, with Rank: 4 for Query (b) and Rank: 1 for Query (c).]
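
The indicator-answer test can be sketched as follows. The sketch assumes that $\inf I$ denotes the least popular example, that condition 1 already holds, and that the popularity order is given as a `position` dictionary; these names and the node labels are illustrative assumptions.

```python
def indicator_answers(results, examples, position):
    """Return IA(q): non-example results more popular than the least popular example.

    position -- dict: node -> place in the popularity order (1 = most popular)
    """
    # Assumption: inf I is the example with the largest position number.
    worst_example = max(position[v] for v in examples)
    return {
        v for v in results
        if position[v] < worst_example and v not in examples
    }


def meets_top_k(results, examples, position, k):
    """Top-k condition, assuming the results already cover all examples."""
    return len(indicator_answers(results, examples, position)) <= k - len(examples)


# Hypothetical popularity order and one example node.
position = {"n1": 1, "n2": 2, "n3": 3, "n4": 4, "n5": 5, "n6": 6}
examples = {"n3"}
broad_results = {"n1", "n2", "n3", "n5"}   # two more popular non-examples
narrow_results = {"n3", "n5", "n6"}        # no more popular non-examples
ia_broad = indicator_answers(broad_results, examples, position)
```

Under these assumptions the least popular example sits at place $|IA(q)| + |I|$ within the ranked results, which is why the test reduces to comparing $|IA(q)|$ with $k - |I|$.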

Refine Optimization 3: Partial Evaluation. Observation 3: even $IA(q)$ need not be completely obtained to reject $q$; only a lower bound on $|IA(q)|$ is needed. Instead of one list $IA(q)$, we keep two: nodes confirmed to be in $IA(q)$ and nodes whose membership is uncertain. Reject $q$ immediately if the confirmed list is long enough ($> k - |I|$). This reduces the number of “match” checks. [Figure: a popularity order listing B. Obama, V. Putin, M. Botvinnik, G. Kasparov, P. Morphy, and E. Lasker, with Rank: 4 for Query (b) and Rank: 1 for Query (c).]
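
A minimal sketch of early rejection with a confirmed list and an uncertain list. The `matches` callback and the way the uncertain list is obtained (for example, from a sub-query's indicator answers) are assumptions for illustration, not the authors' exact procedure.

```python
def reject_by_partial_evaluation(uncertain, matches, k, num_examples):
    """Return (rejected, checks): whether q is rejected and how many checks ran.

    uncertain    -- nodes that might be in IA(q), e.g. inherited from a sub-query
    matches      -- hypothetical callback: does node v really match q?
    num_examples -- |I|
    """
    budget = k - num_examples  # q is rejected once |confirmed| exceeds this
    confirmed = []
    checks = 0
    for v in uncertain:
        checks += 1
        if matches(v):
            confirmed.append(v)
            # The confirmed list is a lower bound on |IA(q)|; stop as soon as it suffices.
            if len(confirmed) > budget:
                return True, checks
    return False, checks


# Hypothetical uncertain list with four nodes, k = 2 and one example.
uncertain = ["n1", "n2", "n4", "n5"]
early = reject_by_partial_evaluation(uncertain, lambda v: True, k=2, num_examples=1)
full = reject_by_partial_evaluation(uncertain, lambda v: v == "n1", k=2, num_examples=1)
```

In the first call the query is rejected after two match checks instead of four; in the second call every uncertain node must be checked before the query can be accepted.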

Experimental Settings
Datasets. Knowledge base: DBpedia 3.9. Popularity ranking: PageRank score. Queries: 52 of the 250 questions in the QALD-4-Task-1 dataset, allocated into 5 groups w.r.t. the shape (size, radius) of their ground-truth query.
Compared variants:
RkNPQ-gStore: trivial refine using gStore [Zou, PVLDB’11]
RkNPQ-S: shared evaluation
RkNPQ-SI: shared evaluation of indicator answers
RkNPQ-SPI: shared and partial evaluation of indicator answers
Methodology and metrics. Use the top-1 or top-2 results to call our algorithms; investigate the effectiveness (number of returned queries) and efficiency (running time).

Effectiveness. Questions are classified as Easy, Moderate, or Hard w.r.t. the number of returned queries. Simpler question groups have more Easy/Moderate questions. More example answers cause many questions to turn Easy/Moderate, because the inherent ambiguity of the input is reduced. Two examples are generally enough for a browsable output.

Efficiency. Compare adjacent pairs of the four variants. When $|I| = k = 1$, each of the three optimizations speeds up RkNPQ by one to two orders of magnitude. When $|I| = k = 2$, $IA(q) \to D(q)$, making indicator answers less beneficial.

Analysis of Parameter k. What happens to the number of returned queries and to the running time (RkNPQ-SPI) if we fix $|I| = 1$ and increase $k$? More queries are returned; as $k$ increases, the number tends to converge. A larger $k$ prevents unsuccessful searches but makes the returned queries harder to browse. RkNPQ-SPI is almost always faster than RkNPQ-SI. The running time increases slowly.

Related Work
Reverse engineering structured queries. SQL queries: [Tran, SIGMOD’09] and [Zhang, SIGMOD’13]. Interactive setting: [Bonifati, EDBT’14], [Staworko, ICDT’12], and [Bonifati, EDBT’15].
Reverse query problems for vector data. Reverse top-k queries: [Vlachou, ICDE’10]. Reverse KNN queries: [Korn, SIGMOD’00]. Reverse skyline queries: [Dellis, PVLDB’07].
Query by example entities and tuples: [Jayaram, TKDE’15], [Lim, EDBT’13], and [Mottin, PVLDB’14].
Natural language QA over knowledge bases: [Unger, WWW’12], [Yahya, EMNLP’12], [Berant, EMNLP’13], and [Zou, SIGMOD’14].

Conclusions. We propose Reverse top-k Neighborhood Pattern Queries to help users issue knowledge base queries using representative partial answers. The search space is explored under a filter-refine framework. Three optimizations of the refine stage are investigated: shared evaluation, indicator answers, and partial evaluation. When given enough examples, the RkNPQ-SPI algorithm can generate a small number of possible queries for the user within reasonable time.

Thank you! Questions?