Summarizing Answer Graphs Induced by Keyword Queries
Yinghui Wu (UCSB)


Keyword query over knowledge graph
Q = ‘Jaguar’, ‘America’, ‘history’
Searching big (graph) data with keyword queries: too ambiguous!
[Figure: a knowledge graph in which Q matches unrelated regions: Jaguar cars (Jaguar XJ, Jaguar S type, Jaguar XK 001 and Jaguar XK 007 with Offer 1 … Offer m; companies such as Aspen and Ford; cities such as New York and Chicago; USA, country; history), jaguar animals (Black Jaguar, White Jaguar; habitat; North America and South America, continent; history), and South American Jaguars (history; Argentina; South America, continent).]
Keyword search is ambiguous over schema-less graphs.

Graph queries?
Graph queries: XPath, XQuery, SPARQL, regular path languages, ...
- explicitly define relationships among keywords
- higher expressive power, but much lower usability
- complex syntax and grammar
- writing good queries requires users to understand the data beforehand
Graph queries help, but are too hard for end users to write.

Graph Summarization
Q = ‘Jaguar’, ‘America’, ‘history’
“A summary is worth a thousand words”
Idea: summarize answer graphs to suggest graph queries!
[Figure: the ambiguous answer graphs for Q (Jaguar cars with companies, cities, USA and history; jaguar animals with habitat, continents and history; Jaguar XK offers) shown next to summary graphs with node labels Car, company, city, history, USA (country) and history, habitat, Americas (continent), presented as suggested graph queries.]

Outline
Searching big (graph) data
◦ keyword searching is ambiguous
◦ graph queries are good, but too hard to write for end users!
◦ idea: use summaries of answer graphs to suggest graph queries
◦ traditional (graph) compression and summarization do not work
Answer graph summarization
◦ “query-aware” summaries
◦ conciseness and coverage
◦ 1-summarization, α-summarization, K summarization
◦ experimental results
Conclusion

Keyword queries over graphs
Keyword query: a set of keywords Q = (k1, …, km)
Data graph: G = (V, E, L), a set of labelled nodes and edges
Answering keyword query Q in G
◦ Q induces a set of answer graphs G = {G1, …, Gn} in the data graph
◦ each Gi contains a set of keyword nodes corresponding to keywords in Q, and a set of intermediate nodes on the paths connecting two keyword nodes
◦ paths in Gi: connections/relationships among the keywords
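
The definitions above can be made concrete with a small sketch. It assumes an undirected answer graph stored as an adjacency dictionary; the names `simple_paths`, `keyword_paths`, `adj`, `label` and `keyword_of` are illustrative and do not come from the slides, and the toy graph is hand-made.

```python
from itertools import product


def simple_paths(adj, src, dst, path=None):
    """Yield every simple path (list of node ids) from src to dst."""
    path = (path or []) + [src]
    if src == dst:
        yield path
        return
    for nxt in adj.get(src, ()):
        if nxt not in path:  # never revisit a node on the current path
            yield from simple_paths(adj, nxt, dst, path)


def keyword_paths(adj, keyword_of, k1, k2):
    """Paths connecting a keyword node for k1 with a keyword node for k2."""
    starts = [v for v, k in keyword_of.items() if k == k1]
    ends = [v for v, k in keyword_of.items() if k == k2]
    return [p for s, t in product(starts, ends) for p in simple_paths(adj, s, t)]


# Hypothetical answer graph for Q = (Jaguar, America, history).
label = {1: "Jaguar XJ", 2: "company", 3: "history", 4: "city", 5: "USA, country"}
adj = {1: [2], 2: [1, 3, 4], 3: [2], 4: [2, 5], 5: [4]}
keyword_of = {1: "Jaguar", 3: "history", 5: "America"}  # keyword nodes only

paths = keyword_paths(adj, keyword_of, "Jaguar", "America")
label_paths = [[label[v] for v in p] for p in paths]
```

Nodes 2 and 4 are the intermediate nodes here; the label sequence of each path is what a summary has to preserve.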

Result graphs: examples
Example keyword queries and the work each answer graph is taken from:
◦ “workshop, paper, Ricardo” (XRank, SIGMOD 03)
◦ “Database, Papakonstantinous” (EASE, SIGMOD 08)
◦ “wright london” (“From Keywords to Semantic Queries”, Web Semant. 2009)
◦ “Texas apparel retailer” (“Query Biased Snippet Generation in XML Search”, SIGMOD 2008)
[Figure: the corresponding answer graphs, e.g. a node “Papakonstantinous” linked to “..Keyword search on graphs..”.]
Keyword processing generates answer graphs.

Keyword induced answer graph summarization
Striking a balance in the usability-expressiveness trade-off
[Figure: a spectrum from usability (keyword queries) to expressiveness (graph queries: SPARQL, pattern queries, XQuery, …), with keyword induced query suggestion in between, marked “Our work”. Stages shown: query interpretation, query transformation, query evaluation, result summarization, query refinement.]

Application: query suggestion/expansion
Keyword query: “Jaguar”, “America”, “history”
[Figure: answer graphs (jaguar animals with habitat, continents and history; Jaguar cars with companies, cities, USA and history) are summarized into suggested structured queries, which lead to refined queries.]
Answer graph summarization for keyword query suggestion: suggest structured queries.

Application: result understanding
Q = “protected area, habitat, mammal, fish, bird”
“Show me the summary for bird, habitat and protected area.”
[Figure: a summary with nodes Habitat (South America), Habitat (Burma), bird (grebe), bird (crane) and protected area (Rara national park).]
Answer graph summarization for result understanding.

Answer graph and summaries
An answer graph induced by Q
◦ keyword nodes and intermediate nodes
A summary graph Gs for a set of answer graphs G
◦ an abstraction that preserves pairwise connection relationships of keywords
◦ each node is a group of keyword nodes or intermediate nodes
◦ for any path between two keyword nodes in Gs, there is a path with the same label connecting two keyword nodes in the union of the answer graphs in G
Q: {Jaguar, USA, history}
[Figure: an answer graph (Jaguar XJ, Jaguar S type; companies such as Aspen and Ford; cities such as New York and Chicago; USA, country; history) and a summary graph (company, city, history, USA, country).]
A summary graph never suggests “false” paths!
Summarizing connection relationships among keywords.

A comparison with graph summarization
Prior work compared:
◦ “Graph Summarization with Bounded Error”, SIGMOD 08
◦ “Efficient Aggregation for Graph Summarization”, SIGMOD 08
◦ “Top K exploration of query candidates for efficient keyword search on graph-shaped data”, ICDE 09
Limitations noted on the slide: not “query-aware”! Require schema!
Traditional summarization does not work well for keyword queries. Our summarization is keyword-query-aware, requires no schema, and preserves path information without extra data structures.

Quality of a summary
Conciseness (summary size)
Coverage: α-summary, where α = 2M / (|Q|(|Q| − 1)) and M is the number of “covered” keyword pairs
◦ a keyword pair (k1, k2) in Q is “covered” by Gs if for every answer graph in G and every path between k1 and k2, there is a path of the same label in Gs
Q = {‘Jaguar’, ‘America’, ‘history’}
[Figure: answer graphs for Jaguar cars (Jaguar XJ, Jaguar S type; companies; cities; USA, country; history) and for Jaguar XK offers (Offer 1 … Offer m; cities; USA, country), with a 1-summary Gs0 over Car, company, city, history, USA (country) and offer.]
Quality: conciseness and information coverage.
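
A short sketch of the coverage ratio follows. It assumes the label paths for each keyword pair have already been extracted, and that a pair with no connecting path in any answer graph is not counted as covered; that second point is an assumption of this sketch, not something the slide states. The function name `coverage` and the toy paths are hypothetical.

```python
from itertools import combinations


def coverage(keywords, answer_paths, summary_paths):
    """alpha = 2M / (|Q| (|Q| - 1)), with M the number of covered pairs.

    Both dictionaries map a sorted keyword pair to a set of label tuples.
    """
    covered = 0
    for pair in (tuple(sorted(p)) for p in combinations(keywords, 2)):
        needed = answer_paths.get(pair, set())
        # Assumption: a pair without any path is not counted as covered.
        if needed and needed <= summary_paths.get(pair, set()):
            covered += 1
    n = len(keywords)
    return 2 * covered / (n * (n - 1))


Q = ["a", "c", "e", "f", "g"]  # 5 keywords, hence 10 keyword pairs
answer_paths = {  # hand-made label paths, for illustration only
    ("a", "c"): {("a", "b", "c")},
    ("a", "e"): {("a", "d", "e")},
    ("e", "g"): {("e", "d", "g")},
    ("a", "g"): {("a", "d", "e", "d", "g")},
}
summary_ac = {("a", "c"): answer_paths[("a", "c")]}
summary_aeg = {p: answer_paths[p] for p in [("a", "e"), ("e", "g"), ("a", "g")]}

alpha_ac = coverage(Q, answer_paths, summary_ac)    # 1 covered pair of 10
alpha_aeg = coverage(Q, answer_paths, summary_aeg)  # 3 covered pairs of 10
```

With |Q| = 5, covering one pair gives α = 0.1 and covering the three pairs among a, e and g gives α = 0.3.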

Example
Q = ‘a,c,e,f,g’
[Figure: answer graphs G1 (a1*, a2*, b1, b2, d1, f1*, e1*, c1*), G2 (a3*, e1*, e2*, g1*, d2, d3) and G3 (a4*, e3*, g2*, d4 to d9), where * marks keyword nodes, together with summary graphs: a 0.1-summary Gs1 for (‘a, c’) over {G1, G2}, a summary Gs2 for (‘a, e, g’) over {G1, G2}, and a summary Gs3 for (‘a, e, g’) over {G3}.]
Bisimulation (R. Gentilini et al., 2003) can’t merge b1 and b2!
Error-tolerant and structure-based summaries (R. Gentilini et al., 2003) introduce “false paths”!

Find summary graphs with high quality
Minimum α-summarization: given a keyword query Q and its induced answer graph set G, identify an α-summary graph with minimum size
◦ special case: minimum 1-summarization
K summarization: given Q, G and an integer K, find a summary graph set Gs where (1) each summary graph in Gs is a 1-summary graph for a subset Gi of G, (2) the subsets Gi form a partition of G, and (3) the total size of the summary graphs is minimized.
Problems, complexity, algorithm cost and applications:
◦ Minimum 1-summarization: PTIME; O(|Q|^2 |G| + |G|^2); structured query suggestion, query expansion
◦ Minimum α-summarization: NP-c; O(m |G|^2); structured query suggestion, query expansion, result summarization
◦ K-summarization: NP-c; O(I*K*|Gm|^2 + |Q|^2 |G| + |G|^2); result classification, result diversification, query expansion based on clustered results

Compute 1-summary
Dominance relation R(k,k’)
◦ a binary relation over the nodes in an answer graph
◦ a pair of nodes (v1, v2) is in R(k,k’) iff they have the same label, and for any path between keyword nodes for k and k’ passing v1, there is a path of the same label between keyword nodes for k and k’ passing v2
◦ a node v2 dominates v1 w.r.t. a keyword pair (k,k’) if (v1, v2) is in R(k,k’); they are equivalent if they dominate each other
◦ keyword nodes for the same keyword are always equivalent
[Figure: R(a, c) on the answer graph with nodes a1*, a2*, b1, b2, d1, f1*, e1*, c1*.]
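
The dominance relation can be sketched directly from this definition. The sketch assumes the paths between keyword nodes for (k, k’) are already enumerated and all listed from the k end to the k’ end; it compares the sets of label sequences passing through each node instead of running the fixpoint computation the slides mention. The function name `dominance` and the toy paths are hypothetical.

```python
def dominance(paths, label, keyword_of):
    """Return R as a set of pairs (v1, v2), meaning v2 dominates v1."""
    through = {}  # node -> label sequences of the paths passing through it
    for p in paths:
        seq = tuple(label[v] for v in p)
        for v in p:
            through.setdefault(v, set()).add(seq)
    rel = set()
    for v1 in through:
        for v2 in through:
            if label[v1] != label[v2]:
                continue
            # Keyword nodes for the same keyword are equivalent by definition.
            same_kw = v1 in keyword_of and keyword_of.get(v2) == keyword_of[v1]
            if same_kw or through[v1] <= through[v2]:
                rel.add((v1, v2))
    return rel


# Hand-made paths between keyword nodes for a and c (illustration only).
paths = [["a1", "b1", "c1"], ["a2", "b2", "c1"], ["a2", "b2", "d1", "c1"]]
label = {v: v[0] for p in paths for v in p}  # "b1" carries label "b", etc.
keyword_of = {"a1": "a", "a2": "a", "c1": "c"}

R = dominance(paths, label, keyword_of)
```

In this toy input b2 dominates b1 but not the other way round, because the path labelled a, b, d, c passes b2 only; a1 and a2 are equivalent because they are keyword nodes for the same keyword.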

A sufficient and necessary condition
Given Q and G, a summary graph Gs is a minimum 1-summary graph for G and Q if and only if, for each keyword pair (k,k’) from Q,
- for each intermediate node vs in Gs, there is a set [vs] of answer graph nodes that vs represents;
- for any vi and vj in [vs], (vi, vj) is in R(k,k’);
- for any intermediate nodes vs1 and vs2 in Gs with the same label and any nodes v1 in [vs1], v2 in [vs2], v1 and v2 do not dominate each other.
[Figure: answer graph G3 (a4*, e3*, g2*, d4 to d9) and its summary graph (a*, d, e*, g*).]
The condition is PTIME checkable; minimum 1-summary graphs are essentially unique.

Computing minimum 1-summary
Q = “Jaguar”, “America”, “history”
Steps:
◦ Compute dominance relation: induce the connection graph (the subgraph induced by keyword pairs and the paths connecting them), then run a fixpoint computation (node u is dominated by v for a keyword pair in terms of path labels)
◦ Reduce answer graphs: remove dominated nodes, combine equivalent node sets
◦ Summary graph construction: assign a node for each node set, insert edges between nodes
[Figure: answer graphs for Jaguar XJ and Jaguar S type (companies, cities, USA, country, history, offers) reduced to a summary graph with nodes Jaguar (car), company, city, history, USA (country) and offer.]
Computing summary graphs with minimum size.
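
The last step, summary graph construction, amounts to building a quotient graph once the node sets are known. The sketch below assumes the grouping is given as input (here it is hand-chosen, not computed from a dominance relation) and that dominated nodes are simply missing from the grouping; `quotient_graph`, `group_of` and the toy edges are hypothetical.

```python
def quotient_graph(edges, group_of):
    """Build one summary node per node set and connect adjacent sets.

    edges: iterable of (u, v) answer graph edges, treated as undirected.
    group_of: answer graph node -> name of its node set; removed
    (dominated) nodes do not appear in this mapping.
    """
    summary_edges = set()
    for u, v in edges:
        if u in group_of and v in group_of and group_of[u] != group_of[v]:
            summary_edges.add(frozenset((group_of[u], group_of[v])))
    return set(group_of.values()), summary_edges


# Hand-made answer graph: a4 - {d4, d5} - e3 - d6 - g2 (illustration only).
edges = [("a4", "d4"), ("a4", "d5"), ("d4", "e3"),
         ("d5", "e3"), ("e3", "d6"), ("d6", "g2")]
# d4 and d5 share a node set; d6 keeps its own, since merging it with them
# would create a summary path a-d-g that no path in this toy graph has.
group_of = {"a4": "a", "d4": "d#1", "d5": "d#1",
            "e3": "e", "d6": "d#2", "g2": "g"}

summary_nodes, summary_edges = quotient_graph(edges, group_of)
```

The result is the chain a, d#1, e, d#2, g with four edges, down from six nodes and six edges in the toy input.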

Compute α-summary
Minimum α-summary: a greedy heuristic
◦ compute the connection graph induced by each keyword pair
◦ start with the minimum connection graph; each time, select the keyword pair whose connection graph has the minimum merge cost (an estimate of the size it adds to the summary)
◦ repeat until an α-summary is constructed
[Figure: starting from the connection graph for (a,g) (a3*, d3, g1*), the connection graphs for (e,g) and then (a,e) are merged in, giving a 0.3-summary for (a,e,g).]
Can be used to find a minimum α and summary for specified keywords.
Trade-off between information coverage and summary size.
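
The greedy loop can be sketched as follows. As a simplification, each keyword pair's connection graph is modelled as a set of summary elements (node labels and label pairs for edges), and the merge cost is the number of elements a pair would add; the actual cost estimate used by the authors is not given on the slide. The function name `greedy_alpha_summary` and the toy input are hypothetical.

```python
from math import ceil


def greedy_alpha_summary(num_keywords, pair_graphs, alpha):
    """Add keyword pairs greedily until a fraction alpha of pairs is covered."""
    total_pairs = num_keywords * (num_keywords - 1) // 2
    need = ceil(alpha * total_pairs - 1e-9)  # pairs required; guards rounding
    summary, chosen = set(), []
    remaining = dict(pair_graphs)
    while len(chosen) < need and remaining:
        # Merge cost: elements this pair adds; ties broken by pair name.
        pair = min(remaining, key=lambda p: (len(remaining[p] - summary), p))
        summary |= remaining.pop(pair)
        chosen.append(pair)
    return summary, chosen


# Hand-made connection graphs for Q = (a, c, e, f, g), illustration only.
pair_graphs = {
    ("a", "g"): {"a", "d", "g", ("a", "d"), ("d", "g")},
    ("e", "g"): {"e", "d", "g", ("d", "e"), ("d", "g")},
    ("a", "e"): {"a", "b", "d", "e",
                 ("a", "b"), ("b", "e"), ("a", "d"), ("d", "e")},
    ("a", "c"): {"a", "b", "f", "c", ("a", "b"), ("b", "f"), ("f", "c")},
}

summary, chosen = greedy_alpha_summary(5, pair_graphs, alpha=0.3)
```

On this input the loop picks (a,g), then (e,g), then (a,e), which is enough for α = 0.3 with five keywords. Because the selection is greedy, the result is not guaranteed to be a minimum α-summary.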

Computing K summary
Minimum K-summary: a K-center clustering process
◦ initialize K “center” answer graphs
◦ iteratively refine the K clusters by merging answer graphs with minimum estimated merge cost until convergence
◦ compute K summary graphs, one for each cluster
[Figure: answer graphs G1 (a1*, a2*, b1, b2, d1, f1*, e1*, c1*), G2 (a3*, e1*, e2*, g1*, d2, d3) and G3 (a4*, e3*, g2*, d4 to d9) clustered into a 2-summary.]
Trade-off between information coverage and summary size.
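
A minimal sketch of the clustering loop, under loud assumptions: each answer graph is reduced to its set of node labels, the merge cost between two graphs is the size of their symmetric difference, the first K graphs serve as initial centers, and each cluster's summary is approximated by the union of its label sets. None of these choices is taken from the slides; `k_cluster` is a hypothetical name.

```python
def k_cluster(graphs, k, max_iter=20):
    """Cluster answer graphs (sets of labels) around k center graphs.

    Requires 1 <= k <= len(graphs). Returns sorted lists of graph indices.
    """
    centers = list(range(k))  # assumption: first k graphs as initial centers
    clusters = {}
    for _ in range(max_iter):
        clusters = {c: [] for c in centers}
        for i, g in enumerate(graphs):
            # Assign each graph to the center with the lowest merge cost.
            best = min(centers, key=lambda c: (len(g ^ graphs[c]), c))
            clusters[best].append(i)
        clusters = {c: m for c, m in clusters.items() if m}  # drop empty ones
        # New center of a cluster: the member closest to all other members.
        new_centers = sorted(
            min(m, key=lambda x: (sum(len(graphs[x] ^ graphs[j]) for j in m), x))
            for m in clusters.values()
        )
        if new_centers == sorted(centers):
            break  # converged
        centers = new_centers
    return sorted(clusters.values())


# Label sets loosely modelled on G1, G2 and G3 (illustration only).
graphs = [
    frozenset("abcdef"),  # G1
    frozenset("adeg"),    # G2
    frozenset("adeg"),    # G3
]
clusters = k_cluster(graphs, k=2)
summaries = [set().union(*(graphs[i] for i in c)) for c in clusters]
```

With K = 2 the toy input splits into {G1} and {G2, G3}. A fuller version would compute a 1-summary per cluster instead of a plain union of labels.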

Experimental study
Datasets:
◦ DBLP with 2.47 million nodes and edges, with 24 labels (types)
◦ DBpedia with 1.2 million nodes and 16 million edges, with 122 types
◦ YAGO with 1.6 million nodes and 4.48 million edges, with richer schemas: 2595 types
Answer graph generation: keyword search algorithms from
◦ “Bidirectional expansion for keyword search on graph databases”, VLDB 2005
◦ “EASE: an effective 3-in-1 keyword search method for unstructured, semi-structured and structured data”, SIGMOD 2008

Experimental study: effectiveness
Query: “Jaguar”, “North America”
Suggested queries: “interesting” expansion
Query suggestion with good information coverage (67% path labels, α = 0.3).

Experimental study: effectiveness
Significantly compresses the original graphs with a good coverage ratio.

Experimental study: efficiency
Efficient in general, and scales well with the number of graphs, the coverage requirement and the partition size.

Conclusion
New challenge for keyword searching over knowledge graphs
◦ keyword querying is ambiguous!
◦ graph queries are more specific, but are hard to write!
Idea: (graph) query suggestion and result analysis by summarizing answer graphs induced by keywords
Exact and heuristic algorithms for computing 1-summary, α-summary and K summary
Applications: query interpretation and result understanding, suggesting an interactive keyword searching framework

Future work
◦ consider keywords of different weights or “interestingness”
◦ performance guarantees on summary quality and improved efficiency
◦ enhance keyword search with summary structures

Resources
All projects will be announced at this link:
- Ontology-based subgraph matching
- Ness and Nema
- Sedge:
Acknowledgement: Information Network Science CTA, ARL
Our group: Xifeng Yan, Shengqi, Fangqiu Han…