On Efficient Graph Substructure Selection

School of Computer Science and Engineering
Xiang Zhao†, H. Shang§, W. Zhang†, X. Lin†, W. Xiao‡
† The University of New South Wales, Australia
§ The University of Tokyo, Japan
‡ National University of Defense Technology, China

Outline
- Motivation
- Preliminaries
- Proposed Method
- Experiments
- Conclusion

Motivation
- Single-labeled graphs
- Multi-attributed graphs

Preliminaries
A data graph is a multi-attributed graph r = (Vr, Er, lr), where Vr is the vertex set, Er ⊆ Vr×Vr is the edge set, and lr is an attribute function. Each vertex v ∈ Vr has an attribute set Av, and each attribute a ∈ Av is assigned a value lr(a). Note that v may lack a particular attribute a, in which case lr(a) = nil.
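As a minimal sketch of the definition above, a data graph can be held as a per-vertex attribute dictionary plus an edge set. The class name `DataGraph`, its methods, and the sample attributes are hypothetical, and storing each edge in both directions is an assumption (the slides only state Er ⊆ Vr×Vr).

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Set, Tuple


@dataclass
class DataGraph:
    """Illustrative multi-attributed graph r = (Vr, Er, lr)."""
    attrs: Dict[int, Dict[str, Any]] = field(default_factory=dict)  # vertex -> {attribute: value}
    edges: Set[Tuple[int, int]] = field(default_factory=set)

    def add_vertex(self, v: int, **attributes: Any) -> None:
        self.attrs[v] = dict(attributes)

    def add_edge(self, u: int, v: int) -> None:
        # Assumption: edges are undirected, so both directions are stored.
        self.edges.add((u, v))
        self.edges.add((v, u))

    def value(self, v: int, a: str) -> Any:
        """Return lr(a) for vertex v, or None (nil) if v lacks attribute a."""
        return self.attrs[v].get(a)


r = DataGraph()
r.add_vertex(1, name="Alice", age=30)
r.add_vertex(2, name="Bob")  # no "age" attribute
r.add_edge(1, 2)
```

Here `r.value(2, "age")` returns `None`, which plays the role of nil for a missing attribute.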

Example

Preliminaries (Cont.)
A query graph is denoted by s = (Vs, Es, φs), where Vs is the vertex set, Es ⊆ Vs×Vs is the edge set, and φs is an attribute selection function. Es enforces the connection constraints, and φs enforces the attribute constraints. Each vertex v ∈ Vs has an attribute set Av, and each attribute a ∈ Av is assigned a selection condition φs(a).

Example

Preliminaries (Cont.)
Given u ∈ Vr and v ∈ Vs, u satisfies v on attribute a provided that
- lr(a) = φs(a), if φs(a) defines an equality condition; or
- lr(a) ∈ φs(a), if φs(a) defines a range condition; or
- lr(a) ⊆ φs(a), if φs(a) defines a set containment condition; or
- lr(a) is an arbitrary value in the domain, if φs(a) is a wildcard.
u matches v if u's attribute values satisfy all of v's attribute constraints conjunctively.
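The four condition types above can be sketched as a small predicate. The tagged-tuple encoding (`"eq"`, `"range"`, `"subset"`, `"any"`) and the function names are hypothetical. Two further assumptions are made: ranges are closed intervals, and a missing attribute (nil, here `None`) satisfies no condition, including a wildcard.

```python
def satisfies(value, cond):
    """Check one data attribute value against one selection condition."""
    if value is None:          # nil: the data vertex lacks this attribute (assumed to fail)
        return False
    kind = cond[0]
    if kind == "any":          # wildcard: any value in the domain
        return True
    if kind == "eq":           # equality condition
        return value == cond[1]
    if kind == "range":        # range condition, assumed closed [lo, hi]
        return cond[1] <= value <= cond[2]
    if kind == "subset":       # set containment: value is a subset of the allowed set
        return set(value) <= set(cond[1])
    raise ValueError(f"unknown condition kind: {kind!r}")


def matches(data_attrs, query_conds):
    """u matches v if every condition on v holds for u's attribute values."""
    return all(satisfies(data_attrs.get(a), c) for a, c in query_conds.items())


u_attrs = {"age": 30, "langs": {"en"}}
v_conds = {"age": ("range", 18, 40), "langs": ("subset", {"en", "fr"})}
result = matches(u_attrs, v_conds)
```

With these sample values `result` is `True`; adding a condition on an attribute that `u_attrs` lacks makes it `False`.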

Problem Statement
Given a data graph r and a query graph s, a substructure mapping is an injection f : Vs → Vr such that
- ∀v ∈ Vs, f(v) ∈ Vr and f(v) matches v; and
- ∀(u, v) ∈ Es, (f(u), f(v)) ∈ Er.
Given a database R of data graphs and a query graph s, the problem of substructure selection is to find all substructure mappings from s to each data graph r in R.
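The definition above translates directly into a checker for a candidate mapping. This is only a sketch: the function name, the argument layout, and the label-equality matcher used in the example are hypothetical, and `vertex_matches(u, v)` stands in for whatever "u matches v" test is in use.

```python
def is_substructure_mapping(f, s_vertices, s_edges, r_vertices, r_edges, vertex_matches):
    """Return True if f (dict: query vertex -> data vertex) is a substructure mapping."""
    if set(f) != set(s_vertices):                 # f must be defined on all of Vs
        return False
    images = list(f.values())
    if any(u not in r_vertices for u in images):  # f(v) must lie in Vr
        return False
    if len(set(images)) != len(images):           # f must be injective
        return False
    if not all(vertex_matches(f[v], v) for v in s_vertices):
        return False                              # attribute constraints
    return all((f[u], f[v]) in r_edges for u, v in s_edges)  # connection constraints


# Toy data: labels play the role of a single equality-constrained attribute.
r_label = {1: "C", 2: "O", 3: "C"}
r_edges = {(1, 2), (2, 1), (2, 3), (3, 2)}
s_label = {"x": "C", "y": "O"}
s_edges = {("x", "y")}


def same_label(u, v):
    return r_label[u] == s_label[v]


ok = is_substructure_mapping({"x": 3, "y": 2}, s_label, s_edges, r_label, r_edges, same_label)
bad = is_substructure_mapping({"x": 1, "y": 3}, s_label, s_edges, r_label, r_edges, same_label)
```

In this toy case `ok` is `True` and `bad` is `False`, because data vertex 3 carries label "C" while query vertex "y" requires "O".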

Example

Basic Algorithmic Framework
Backtracking algorithm in DFS fashion:
- Order the query vertices into a sequence
- Try to match query vertices iteratively
- Test attribute and connection constraints
  - On success, match the next vertex in the sequence
  - On failure, backtrack and map the previous query vertex to another data vertex
- Terminate when all full mappings are found
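The framework above can be sketched as a short recursive search. This is a plain backtracking baseline under stated assumptions, not the authors' optimized method: the function and parameter names are hypothetical, the query order is supplied by the caller, and `vertex_matches(u, v)` is a placeholder for the attribute test.

```python
def find_mappings(s_order, s_edges, r_vertices, r_edges, vertex_matches):
    """Enumerate all substructure mappings by depth-first backtracking."""
    mapping, used, results = {}, set(), []

    def extend(i):
        if i == len(s_order):              # every query vertex is mapped
            results.append(dict(mapping))
            return
        v = s_order[i]
        for u in r_vertices:
            if u in used or not vertex_matches(u, v):
                continue                   # injectivity or attribute constraint fails
            mapping[v] = u
            # Connection constraints: check query edges touching v whose
            # endpoints are both mapped already.
            if all((mapping[a], mapping[b]) in r_edges
                   for a, b in s_edges
                   if v in (a, b) and a in mapping and b in mapping):
                used.add(u)
                extend(i + 1)              # success: match the next vertex
                used.discard(u)
            del mapping[v]                 # backtrack: try another data vertex

    extend(0)
    return results


# Toy data graph: path 1 - 2 - 3 with labels A, B, A (edges stored both ways).
r_label = {1: "A", 2: "B", 3: "A"}
r_edges = {(1, 2), (2, 1), (2, 3), (3, 2)}
s_label = {"x": "A", "y": "B"}
found = find_mappings(["x", "y"], {("x", "y")}, [1, 2, 3], r_edges,
                      lambda u, v: r_label[u] == s_label[v])
```

On this toy input `found` contains two mappings, `{"x": 1, "y": 2}` and `{"x": 3, "y": 2}`.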

Example

Advanced Techniques
Performance issues:
- Does not leverage any index
- Naïve vertex-at-a-time extension
Proposed optimizations:
- Two-tier index and construction algorithm
- Cost-based query processing algorithm

Substructure Selection Index
[Figure: template graphs, SS-index, mappings]

Primitive Operations
- Index retrieval
- Graph scan
- Attribute validation
- Connection validation
- Mapping extension
- Mapping join
Given a substructure selection problem, finding the query execution plan with minimum cost is NP-hard.
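The slides do not spell out how the mapping join primitive works. One plausible reading, sketched here as an assumption, is that two partial mappings are merged when they agree on every shared query vertex and the merged result is still injective. The function names are hypothetical, and the nested-loop join is chosen for brevity rather than speed.

```python
def join_mappings(m1, m2):
    """Merge two partial mappings, or return None if they are incompatible."""
    for v in m1.keys() & m2.keys():
        if m1[v] != m2[v]:                 # disagree on a shared query vertex
            return None
    merged = {**m1, **m2}
    if len(set(merged.values())) != len(merged):
        return None                        # two query vertices share a data vertex
    return merged


def join_all(left, right):
    """Nested-loop join of two lists of partial mappings."""
    joined = []
    for a in left:
        for b in right:
            m = join_mappings(a, b)
            if m is not None:
                joined.append(m)
    return joined


left = [{"x": 1, "y": 2}, {"x": 3, "y": 2}]
right = [{"y": 2, "z": 3}]
joined = join_all(left, right)
```

Here `joined` keeps only `{"x": 1, "y": 2, "z": 3}`; the second left mapping is dropped because "x" and "z" would both map to data vertex 3.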

Practical Plan Generation
Guidelines:
- A proper execution order reduces the overall processing cost
- Joining overlapped mappings reduces intermediate results
- Eager constraint validation reduces intermediate results
- Early validation of selective constraints reduces intermediate results
A cost-based heuristic algorithm iteratively selects the template graph with minimum cost until all vertices are covered.
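The greedy selection loop described above might look roughly as follows. This is a sketch only: the slides do not give the cost model, so the `cost` numbers, the template names, and the rule of considering only templates that cover at least one uncovered vertex are all assumptions.

```python
def greedy_plan(query_vertices, templates):
    """Pick templates greedily by minimum cost until all query vertices are covered.

    templates: list of (name, covered_query_vertices, cost) tuples (hypothetical layout).
    """
    uncovered = set(query_vertices)
    plan = []
    while uncovered:
        # Only templates that cover something new can make progress.
        useful = [t for t in templates if uncovered & set(t[1])]
        if not useful:
            raise ValueError("templates cannot cover all query vertices")
        best = min(useful, key=lambda t: t[2])   # minimum estimated cost
        plan.append(best[0])
        uncovered -= set(best[1])
    return plan


templates = [
    ("edge_ab", {"a", "b"}, 5.0),
    ("path_bcd", {"b", "c", "d"}, 3.0),
    ("edge_cd", {"c", "d"}, 4.0),
]
plan = greedy_plan({"a", "b", "c", "d"}, templates)
```

For this input the plan is `["path_bcd", "edge_ab"]`. The two chosen templates overlap on vertex "b", which is the situation where a mapping join on the shared vertex would combine their partial results.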

Experimental Results on AIDS
- Tree patterns provide the best balance between index size and runtime performance
- As expected, SS-index is built faster than GADDI but slower than QuickSI

Experimental Results (cont.)
SS-search effectively reduces the query response time as queries become larger, owing to the SS-index.

Experimental Results (cont.)
SS-search performs the best when
- varying the percentage of vertices with attribute constraints
- varying the average range of selection conditions

Experimental Results (cont.)
- SS-search scales well with large databases, better than the alternatives
- SS-index can be as scalable as QuickSI if only a portion of the database is used for mining frequent substructures

Conclusion
- A new type of query, substructure selection, that handles general structure search on multi-attributed graphs
- A novel structure, SS-index, that speeds up online computation by judiciously materializing partial embeddings
- The SS-search algorithm, which dynamically composes an effective query execution plan for efficient mapping discovery

Thank you! Questions?

Related Work
- Many indices for subgraph search, e.g., gIndex [SIGMOD 2004], FG-index [SIGMOD 2007], CT-index [ICDE 2011], CP-index [CIKM 2011]
  - Containment search or all-matching computation
  - Single-labeled graphs
- DELTA [CIKM 2012] is the only work that deals with multi-labeled graphs
  - Transforms the task into a multi-dimensional indexing problem
  - Fixed attribute set