Graph Indexing Techniques Seoul National University IDB Lab. Kisung Kim 2011. 3. 23.

Outline
Category of graph queries
Querying in collection DB
References

Category of Graph Queries: Matching Type
Exact subgraph matching
  – Find graphs in DB which have all components of the query graph
Similarity subgraph matching
  – Find graphs in DB which have some components of the query graph
  – Similarity measure is needed
Supergraph matching
  – Find graphs in DB which are contained in the query graph
[Figure: a query graph with an exact subgraph match and a similarity subgraph match]

Category of Graph Queries: Target DB
Collection DB: large number of small graphs
  – e.g. chemical compounds
  – Retrieval component: IDs of graphs which contain matching parts
Large graphs: small number of large graphs
  – e.g. social network, RDF graph
  – Retrieval component: all matching subgraphs
[Figure: Querying Collection DB: a query graph over graphs G1 to G7 returns the graph ID list G1, G3, G5. Querying Large Graphs: a query graph returns the matching subgraphs.]

Query Processing in Collection DB
Processing flow: Query -> Filtering -> Candidate graph set -> Verification -> Answer graphs
Verification uses the usual pair-wise subgraph isomorphism algorithm
Most techniques focus on filtering
  – The cost of verification is high
  – Filtering reduces the number of verification executions
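
The processing flow above can be sketched in a few lines of Python. This is a minimal illustration, not any of the indexed techniques covered later: the filter is a simple label-count check and the verifier is a brute-force subgraph isomorphism test, both chosen here only to keep the example short. The names `make_graph`, `label_filter`, `is_subgraph` and `answer`, and the three toy graphs, are assumptions made for this sketch.

```python
from collections import Counter
from itertools import permutations

def make_graph(labels, edges):
    """labels: {node: label}; edges: iterable of undirected (u, v) pairs."""
    return {"labels": dict(labels), "edges": {frozenset(e) for e in edges}}

def label_filter(query, graph):
    """Filtering: a cheap necessary condition (enough nodes of every label)."""
    need = Counter(query["labels"].values())
    have = Counter(graph["labels"].values())
    return all(have[lab] >= n for lab, n in need.items())

def is_subgraph(query, graph):
    """Verification: brute-force subgraph isomorphism (exponential, tiny graphs only)."""
    q_nodes = list(query["labels"])
    for image in permutations(graph["labels"], len(q_nodes)):
        m = dict(zip(q_nodes, image))
        labels_ok = all(query["labels"][u] == graph["labels"][m[u]] for u in q_nodes)
        edges_ok = all(frozenset(m[u] for u in e) in graph["edges"]
                       for e in query["edges"])
        if labels_ok and edges_ok:
            return True
    return False

def answer(query, db):
    """Query -> Filtering -> Candidate graph set -> Verification -> Answer graphs."""
    candidates = [gid for gid, g in db.items() if label_filter(query, g)]
    answers = [gid for gid in candidates if is_subgraph(query, db[gid])]
    return candidates, answers

# Hypothetical toy database (not the graphs from the slides).
db = {
    "g1": make_graph({1: "A", 2: "B", 3: "C"}, [(1, 2), (2, 3)]),  # A-B-C
    "g2": make_graph({1: "A", 2: "B"}, [(1, 2)]),                  # A-B
    "g3": make_graph({1: "A", 2: "B", 3: "C"}, [(1, 2), (1, 3)]),  # B-A-C
}
query = make_graph({"a": "A", "b": "B", "c": "C"}, [("a", "b"), ("a", "c")])
candidates, answers = answer(query, db)
print(candidates, answers)
```

Here g2 is discarded by the filter without any isomorphism test, while g1 passes the filter but fails verification. A stronger filter would have removed g1 as well, which is the motivation for the index structures that follow.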

Query Processing in Large Graphs
Processing flow: Query -> Index search -> Candidate node sets -> Building subgraphs -> Answer subgraphs
Focus on node indexing
  – To reduce search space
  – Use structural information of nodes
Build subgraphs by joining candidate nodes
  – Join methods are relatively under-researched
  – Optimization using join ordering
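
A rough sketch of this flow, under simplifying assumptions: the "structural information" used to build candidate node sets is just label plus degree, and the join is a plain backtracking search that visits the query node with the fewest candidates first. The function names and the toy data graph are hypothetical; real systems such as GraphQL or Spath use richer node signatures.

```python
def candidate_sets(q_labels, q_adj, g_labels, g_adj):
    """Per query node: data nodes with the same label and at least the same degree."""
    return {
        u: {v for v in g_labels
            if g_labels[v] == q_labels[u] and len(g_adj[v]) >= len(q_adj[u])}
        for u in q_labels
    }

def join_matches(q_adj, g_adj, cands):
    """Join candidate nodes into full matches, most selective query node first."""
    order = sorted(cands, key=lambda u: len(cands[u]))  # simple join ordering
    results = []

    def extend(i, mapping):
        if i == len(order):
            results.append(dict(mapping))
            return
        u = order[i]
        for v in cands[u]:
            if v in mapping.values():
                continue  # each data node is used at most once
            # every already-matched query neighbor must map to a data neighbor of v
            if all(mapping[w] in g_adj[v] for w in q_adj[u] if w in mapping):
                mapping[u] = v
                extend(i + 1, mapping)
                del mapping[u]

    extend(0, {})
    return results

# Hypothetical data graph: 1(A)-2(B), 1(A)-3(B), 2(B)-4(C)
g_labels = {1: "A", 2: "B", 3: "B", 4: "C"}
g_adj = {1: {2, 3}, 2: {1, 4}, 3: {1}, 4: {2}}
# Query: x(A)-y(B)-z(C)
q_labels = {"x": "A", "y": "B", "z": "C"}
q_adj = {"x": {"y"}, "y": {"x", "z"}, "z": {"y"}}

cands = candidate_sets(q_labels, q_adj, g_labels, g_adj)
matches = join_matches(q_adj, g_adj, cands)
print(cands, matches)
```

The degree check already removes node 3 from the candidates of y, so the join has only one combination left to try.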

Graph Indexing Techniques

Technique                            Target Database  Query Type          Approach
-----------------------------------  ---------------  ------------------  ---------------------------------------------
GraphGrep [Shasha et al., PODS’02]   Collection DB    Exact               Feature (path) based index
gIndex [Yan et al., SIGMOD’04]       Collection DB    Exact               Feature (graph) based index
Grafil [Yan et al., SIGMOD’05]       Collection DB    Exact & Similarity  Feature based similarity search
C-tree [He and Singh, ICDE’06]       Collection DB    Exact & Similarity  Closure based index
QuickSI [Shang et al., VLDB’08]      Collection DB    Exact               Verification algorithm
Tale [Tian and Patel, ICDE’08]       Collection DB    Exact & Similarity  Similarity search using node index
GraphQL [He and Singh, SIGMOD’08]    Large graphs     Exact               Node indexing
Spath [Zhao and Han, VLDB’10]        Large graphs     Exact               Node indexing using neighborhood information

GraphGrep(1/2) [Shasha et al., PODS’02]
First work to adopt the filtering-and-verification framework
Path-based index
  – Fingerprint of database
  – Enumerate the set of all paths (length <= L) of all graphs in DB
  – For each path, the number of occurrences in each graph is stored in a hash table
[Figure: three graphs g1, g2, g3 and the index built from them]
Index:
  Key       g1  g2  g3
  h(CA)     1   0   1
  …
  h(ABCB)   2   2   0

GraphGrep(2/2): Query Processing
Filtering
  – Make the fingerprint of query q
  – Hash all paths (length <= L) of q
  – Compare the fingerprint of the query with the fingerprint of the database
  – Discard a graph whose value in the fingerprint is less than the value in the query fingerprint
Verification
  – Run subgraph isomorphism tests on the remaining candidates
[Figure: graphs g1, g2, g3, the index, and the query graph B-A-C]
Index:
  Key       g1  g2  g3
  h(AB)     2   2   1
  h(AC)     1   0   1
  h(BAC)    2   0   1
Query fingerprint: AB:1, AC:1, BAC:1
Candidates = {g1, g3}, which are passed to verification
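
The fingerprint construction and the filtering rule can be sketched as follows. This is a simplified reading of the slides, not the GraphGrep implementation: paths are taken to be simple paths with 1 to `max_len` edges, each path is counted once per traversal direction, and the label string itself is used as the dictionary key instead of an explicit hash h(.). The toy graphs are hypothetical and differ from the ones in the figure.

```python
from collections import Counter

def path_fingerprint(labels, adj, max_len):
    """Count label sequences of all simple paths with 1..max_len edges."""
    counts = Counter()

    def walk(path):
        if len(path) > 1:
            counts["".join(labels[n] for n in path)] += 1
        if len(path) - 1 == max_len:
            return
        for nxt in adj[path[-1]]:
            if nxt not in path:
                walk(path + [nxt])

    for start in labels:
        walk([start])
    return counts

def build_index(db, max_len):
    """Fingerprint of the database: one path-count table per graph."""
    return {gid: path_fingerprint(labels, adj, max_len)
            for gid, (labels, adj) in db.items()}

def filter_candidates(index, query_fp):
    """Discard graphs whose count for some query path is below the query's count."""
    return [gid for gid, fp in index.items()
            if all(fp[p] >= c for p, c in query_fp.items())]

# Hypothetical graphs as (labels, adjacency lists).
db = {
    "g1": ({0: "B", 1: "A", 2: "C", 3: "B"}, {0: [1], 1: [0, 2, 3], 2: [1], 3: [1]}),
    "g2": ({0: "A", 1: "B"}, {0: [1], 1: [0]}),
    "g3": ({0: "A", 1: "B", 2: "C"}, {0: [1], 1: [0, 2], 2: [1]}),
}
index = build_index(db, max_len=2)

# Query B-A-C
q_labels, q_adj = {0: "B", 1: "A", 2: "C"}, {0: [1], 1: [0, 2], 2: [1]}
query_fp = path_fingerprint(q_labels, q_adj, max_len=2)
print(filter_candidates(index, query_fp))
```

Because the same counting convention is applied to the query and to the database graphs, a graph that contains the query always has counts at least as large as the query's, so the filter never drops a true answer.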

gIndex(1/6) [Yan et al., SIGMOD’04]
Path-based approach has weak points
  – Path is too simple: structural information is lost
  – There are too many paths: the set of paths in a graph database is usually huge
Solution
  – Use graph structure instead of path as the basic index feature
[Figure: a sample database and a query whose nodes are all labeled c; the paths in the query graph cannot filter any graphs in the database]

gIndex(2/6): Frequent Fragment
The number of graph structures is large -> index only frequent subgraphs
support(g)
  – The number of graphs in D (graph database) of which g is a subgraph
minSup
  – Minimum support threshold
  – Index a fragment g only if support(g) ≥ minSup
Size-increasing support
  – The number of frequent fragments grows as the size of a fragment increases
  – Low minSup for small fragments, high minSup for large fragments
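
A small sketch of size-increasing support, assuming the containing-graph ID set of every fragment is already known (computing it needs subgraph isomorphism tests, which are left out here). The threshold function `min_sup` is a hypothetical linear ramp chosen only for illustration; the slides do not give the actual function, only that it should not decrease with fragment size. Fragment names and ID sets are made up.

```python
import math

def min_sup(size, max_size, theta):
    """Hypothetical size-increasing threshold: grows linearly up to theta at max_size."""
    return max(1, math.ceil(theta * size / max_size))

def frequent_fragments(fragments, max_size, theta):
    """fragments: {name: (size, ids of graphs containing it)}.
    Keep a fragment only if support(g) >= minSup for its size."""
    return {name: len(ids)
            for name, (size, ids) in fragments.items()
            if size <= max_size and len(ids) >= min_sup(size, max_size, theta)}

# Hypothetical fragments: (size in edges, graphs containing the fragment)
fragments = {
    "A-B":     (1, {"g1", "g2", "g3"}),
    "A-B-B":   (2, {"g1"}),
    "A-B-B-A": (3, {"g2"}),
    "ring":    (4, {"g1", "g2"}),
}
selected = frequent_fragments(fragments, max_size=4, theta=2)
print(selected)
```

With these settings fragments of size 1 and 2 need support 1 and fragments of size 3 and 4 need support 2, so the size-3 fragment with support 1 is not indexed while the equally rare size-2 fragment is.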

gIndex(3/6): Frequent Fragment
[Figure: fragments of size 1 to size 4 built from node labels A and B, each annotated with its frequency F (between 1 and 4), and the size-dependent thresholds minSup=1 and minSup=2]

gIndex(4/6): Discriminative Fragment
[Figure: graphs g1, g2, g3, g4 over node labels A and B, and fragments f1, f2 (size 2) and f3 (size 3)]
D_f1 = {g1, g2, g3}
D_f2 = {g2, g3, g4}
D_f3 = {g2, g3} = D_f1 ∩ D_f2
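
The example above shows a fragment f3 whose ID list adds nothing beyond the intersection of its subfragments' ID lists. One way to turn this into a test is a ratio between the size of that intersection and the size of the fragment's own ID list, compared against a threshold. The sketch below assumes such a ratio test; the threshold name `gamma_min` and its value are assumptions, since the slides do not state them.

```python
def is_discriminative(frag_ids, subfragment_id_lists, gamma_min):
    """frag_ids: graphs containing the fragment (non-empty).
    subfragment_id_lists: ID sets of its already indexed subfragments.
    The fragment is kept if intersecting the subfragment lists leaves
    at least gamma_min times as many graphs as the fragment itself matches."""
    if not subfragment_id_lists:
        return True  # nothing to compare against, so the fragment is indexed
    common = set.intersection(*subfragment_id_lists)
    return len(common) / len(frag_ids) >= gamma_min

D_f1 = {"g1", "g2", "g3"}
D_f2 = {"g2", "g3", "g4"}
D_f3 = {"g2", "g3"}

# f3 is redundant: D_f3 equals D_f1 ∩ D_f2, so the ratio is 1.0
print(is_discriminative(D_f3, [D_f1, D_f2], gamma_min=1.5))
# A hypothetical fragment contained only in g2 would be discriminative (ratio 2.0)
print(is_discriminative({"g2"}, [D_f1, D_f2], gamma_min=1.5))
```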

gIndex(5/6): gIndex Tree
Use graph serialization method
  – For fast graph isomorphism checking during index search
  – DFS coding [Yan et al. ICDM’02]
  – Translate a graph into a unique edge sequence
gIndex Tree
  – Prefix tree which consists of the edge sequences of discriminative fragments
  – Record all size-n discriminative fragments in level n
  – Black nodes: discriminative fragments
  – Have ID lists: the IDs of graphs containing f_i
  – White nodes: redundant fragments, kept for Apriori pruning
[Figure: DFS coding of a labeled graph with vertices v0 to v3; gIndex tree with levels 0, 1, 2, … holding edges e1, e2, e3 and fragments f1, f2, f3]

gIndex(6/6): Searching
Searching process
  – Given a query q, enumerate all of q’s fragments (size <= maxSize)
  – Locate the fragments in the gIndex tree
  – Intersect the ID lists associated with the fragments
Apriori pruning
  – Generating every fragment is inefficient
  – If a fragment is not in the gIndex tree, we need not check its supergraphs any more
  – Redundant fragments need to be recorded for Apriori pruning
[Figure: query fragments are looked up level by level in the gIndex tree; enumeration stops at a fragment that is not in the tree]
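
The searching process can be sketched with a flat dictionary standing in for the gIndex tree. Everything here is a simplification made for illustration: fragments are identified by made-up code strings such as "e1e2" instead of DFS codes, redundant fragments are stored with `None` in place of an ID list, and the enumeration of the query's fragments is given as a ready-made parent-to-children map.

```python
def search(index, all_ids, start_codes, query_children):
    """index: {code: id_set, or None for a redundant fragment kept for pruning}.
    query_children: {code: codes of query fragments grown from it by one edge}.
    Returns the candidate graph IDs and the fragment codes that were looked up."""
    candidates = set(all_ids)
    visited = []
    stack = list(start_codes)
    while stack:
        code = stack.pop()
        visited.append(code)
        if code not in index:
            continue  # Apriori pruning: do not grow this fragment any further
        ids = index[code]
        if ids is not None:
            candidates &= ids  # intersect ID lists of discriminative fragments
        stack.extend(query_children.get(code, []))
    return candidates, visited

# Hypothetical index: "e1e2" is redundant, "e1e4" is not indexed at all.
index = {
    "e1": {"g1", "g2", "g3", "g4"},
    "e1e2": None,
    "e1e2e3": {"g2", "g3"},
}
query_children = {
    "e1": ["e1e2", "e1e4"],
    "e1e2": ["e1e2e3"],
    "e1e4": ["e1e4e5"],
}
candidates, visited = search(index, {"g1", "g2", "g3", "g4"}, ["e1"], query_children)
print(sorted(candidates), visited)
```

The fragment "e1e4e5" is never looked up because its parent "e1e4" is missing from the index. The redundant entry "e1e2" contributes no ID list but still lets the search reach "e1e2e3".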

Grafil(1/4) [Yan et al., SIGMOD’05]
[Figure: database graphs G1, G2, …, Gn are represented by feature vectors {u1, u2, …, un} and the query by a feature vector {v1, v2, …, vn}; the vectors are used for subgraph exact search and subgraph similarity search]

Grafil(2/4): Feature Misses
[Figure: a query, its relaxed queries with 1 edge missed, and the occurrence counts of features f_a, f_b, f_c in each. The query contains 7 feature occurrences; the relaxed queries contain 4 and 3.]
Feature misses: 7-4=3 and 7-3=4
Maximum feature misses: m_max = 4

Grafil(3/4): Feature Miss Estimation
Problem
  – Given a query Q and a set of features contained in Q, if the relaxation ratio is given, what is the maximal number of features that can be missed?
Use edge-feature matrix
  – Find the maximum number of columns that can be hit by k rows
  – k: the number of missing edges in Q
Classic maximum coverage problem (set k-cover)
  – Proved NP-complete
[Figure: a query with edges e1, e2, e3, features f_a, f_b, f_c, and the edge-feature matrix with rows e1, e2, e3 and columns f_a, f_b1, f_b2, f_c1, f_c2, f_c3, f_c4]
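
The column-hitting formulation can be made concrete with a small exact solver. This sketch enumerates every choice of k rows, which is only feasible for tiny queries; it is not the estimation procedure used by Grafil. The matrix entries below are hypothetical (the values in the figure are not recoverable) and were chosen so that one missing edge gives a maximum of 4 feature misses.

```python
from itertools import combinations

def max_feature_misses(matrix, k):
    """matrix[e] = set of feature-occurrence columns that contain edge e.
    Returns the maximum number of columns hit by any k rows (exact, exponential in k)."""
    rows = list(matrix)
    best = 0
    for chosen in combinations(rows, min(k, len(rows))):
        hit = set().union(*(matrix[e] for e in chosen))
        best = max(best, len(hit))
    return best

# Hypothetical edge-feature matrix: 3 edges, 7 feature occurrences.
matrix = {
    "e1": {"fb1", "fb2", "fc1"},
    "e2": {"fa", "fb1", "fc1", "fc2"},
    "e3": {"fa", "fb2", "fc3", "fc4"},
}
print(max_feature_misses(matrix, k=1))
print(max_feature_misses(matrix, k=2))
```

A database graph that misses more feature occurrences than this bound cannot match the query within k missing edges, so it can be filtered out without verification.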

Grafil(4/4): Feature Conjugation
Compensate the misses of a feature by occurrences of another feature in G
Using all the features together in one filter would deteriorate the filtering performance
Solution
  – Use multiple filters
  – Feature set selection
[Figure: a query, a graph and features f_a, f_b with relaxation ratio = 1 and m_max = 4; the feature misses are (3-0)+0=3 ≤ m_max]

References
[Shasha et al., PODS’02] Dennis Shasha, Jason T. L. Wang, Rosalba Giugno. Algorithmics and Applications of Tree and Graph Searching. PODS, 2002.
[Yan et al., SIGMOD’04] Xifeng Yan, Philip S. Yu, Jiawei Han. Graph Indexing: A Frequent Structure-based Approach. SIGMOD, 2004.
[Yan et al., SIGMOD’05] Xifeng Yan, Philip S. Yu, Jiawei Han. Substructure Similarity Search in Graph Databases. SIGMOD, 2005.
[Tian and Patel, ICDE’08] Yuanyuan Tian, Jignesh M. Patel. TALE: A Tool for Approximate Large Graph Matching. ICDE, 2008.
[He and Singh, SIGMOD’08] Huahai He, Ambuj K. Singh. Graphs-at-a-time: Query Language and Access Methods for Graph Databases. SIGMOD, 2008.
[Zhao and Han, VLDB’10] Peixiang Zhao, Jiawei Han. On Graph Query Optimization in Large Networks. VLDB, 2010.
[He and Singh, ICDE’06] Huahai He, Ambuj K. Singh. Closure-Tree: An Index Structure for Graph Queries. ICDE, 2006.
[Shang et al., VLDB’08] Haichuan Shang, Ying Zhang, Xuemin Lin, Jeffrey Xu Yu. Taming Verification Hardness: An Efficient Algorithm for Testing Subgraph Isomorphism. VLDB, 2008.