Download presentation
Presentation is loading. Please wait.
Published byReijo Hänninen Modified over 6 years ago
1
Finding Patterns in a Knowledge Base using Keywords to Compose Table Answers/VLDB2015
报告人:胡信晖 2019/1/18
2
Content Previous work Problem Approach index structure
search algorithm Experiments 2019/1/18
3
Previous work Finding the subtrees of the knowledge graph that contain all the keywords and return them in ranked order The returned subtrees with a heterogeneous mass of shapes might correspond to different interpretations of the query Not adequate when the user’s query is to look for a table of entities 2019/1/18
4
This article’s target 2019/1/18
5
This article’s target 原先的方法: 文章的方法 2019/1/18
6
Two basic definition Path patterns.
The path pattern for w is the concatenation of node/edge types on the path T(w). Tree patterns. A vector with the ith entry as the path pattern of the root-leaf path containing the ith keyword wi. 2019/1/18
7
INDEXING PATH PATTERNS
The problem of counting the number of tree patterns with height at most d for a keyword query q in a knowledge is #P- Complete. 所以文章首先给图的path pattern建立索 引。 2019/1/18
8
INDEXING PATH PATTERNS
2019/1/18
9
INDEXING PATH PATTERNS
索引的结果就是每一条路径长度小于 threshold d的完整的path以及path对应 的pattern、path的root、root的关键词 For IMDB, the knowledge graph contains only paths of length at most three, and the size of the indexes is 0.8 GB 2019/1/18
10
INDEXING PATH PATTERNS
2019/1/18
11
SEARCHING WITH PATH INDEX
Pattern Enumeration-Join Approach Linear-Time Enumeration Approach 2019/1/18
12
Pattern Enumeration-Join Approach
Consider a query “database software company revenue” — “w1,w2,w3,w4” 2019/1/18
13
Pattern Enumeration-Join Approach
缺点在于,如果形成的tree pattern是 empty的话,会存在很多不必要的运算 因此最坏的情况下的运行时间对于 subtree的数量以及index size来说是指数 级的 2019/1/18
14
Linear-Time Enumeration Approach
Based on the simple fact that a node in the knowledge graph is the root of some valid subtree if and only if it can reach every keyword at some node。 首先搜索这些node形成candidate nodes 如此一来,从这些candidate node出发 形成的tree pattern就不会空,从而避免 许多不必要的计算 2019/1/18
15
Linear-Time Enumeration Approach
And to generate each valid subtree, the time it needs is linear in its tree size。 2019/1/18
16
Experiments Datasets. Wiki and IMDB
Queries. We randomly selected 500queries from Bing’s log for experiments on Wiki. The numbers of keywords in the queries vary from 1 to 10, and for each we have 50 queries. For IMDB, we randomly constructed 500 queries from IMDB’s vocabulary. Index size and height threshold d. D = 2, 3, and 4 for the Wiki dataset. For IMDB, the knowledge graph contains only paths of length at most three. 2019/1/18
17
Experiments 2019/1/18
18
Experiments 2019/1/18
19
Experiments For queries with larger numbers of valid subtrees (query 1 and query 2), LETopK becomes much (5x-20x) faster than PETopK, while preserving reasonably high precision (above 80%). For the query with a smaller number of valid subtrees (query 3), when LETopK is faster than PETopK, the precision of LETopK is still consistently stable at round 0.95. 2019/1/18
20
谢谢 2019/1/18
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.