Incremental Graph Pattern Matching (Yinghui Wu, SIGMOD 2011)

Presentation transcript:

Incremental Graph Pattern Matching
Yinghui Wu, SIGMOD 2011
Wenfei Fan, Xin Wang, Yinghui Wu (University of Edinburgh)
Jianzhong Li, Jizhou Luo (Harbin Institute of Technology)
Zijing Tan (Fudan University)

Outline
- Graph pattern matching in real-life scenarios
  - Graph pattern matching is expensive
  - Real-life graphs change over time
- Incremental graph pattern matching
  - Simulation, bounded simulation and subgraph isomorphism
  - Incrementally compute changes to the match results
- Incremental simulation
- Incremental bounded simulation
- Incremental subgraph isomorphism
- Conclusion
Incremental solutions based on (extended) graph pattern matching

Real-life graph pattern matching
Given a pattern graph (a query) Gp and a data graph G, find the set M(Gp, G) of matches in G for Gp. This is a routine process in real-life applications.
How to define a match? Usually in terms of:
- subgraph isomorphism (proximity search, biology and chemistry network querying, object identification)
- graph simulation (social querying, program verification)
- bounded simulation (social matching, semantic network)

Example: querying FriendFeed
Subgraph isomorphism, simulation and bounded simulation
[Figure: data graph with nodes Ann (CTO), Pat (DB), Dan (DB), Bill (Bio), Mat (Bio), Don (CTO), Ross (Med), Tom (Bio). Under subgraph isomorphism (edge-to-edge bijection), pattern P is shown with Ann, Pat and Bill. Under (bounded) simulation (edge-to-path relation), pattern P* is shown with Ann, Pat, Dan, Bill and Mat.]

Batch algorithm vs. incremental algorithm
Graph pattern matching is expensive!
- NP-complete for subgraph isomorphism
- cubic time for bounded simulation
- quadratic time for simulation
Incremental graph pattern matching: compute new matches from old matches!
- Batch: given P and G ⊕ ∆G, recompute the matches from scratch.
- Incremental: given P, G, M(Gp, G) and ∆G, compute ∆M so that the new result is M(Gp, G) ⊕ ∆M.
- ∆G is typically small (5%/week in Web graphs).
How to measure complexity?
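To make the batch side of this comparison concrete, here is a minimal Python sketch of plain graph simulation computed as a fixpoint: start from all label-compatible nodes and repeatedly drop data nodes that lack a matching successor for some pattern edge. The function name, the data layout and the toy graph (loosely modelled on the FriendFeed example, with made-up edges) are assumptions for illustration, not taken from the talk. The sketch applies no further convention for the case where a pattern node ends up with no match.

```python
def graph_simulation(pattern_labels, pattern_edges, graph_labels, graph_edges):
    """Return the maximum simulation as {pattern node: set of data nodes}."""
    # Successor lists of the data graph.
    succ = {}
    for v, w in graph_edges:
        succ.setdefault(v, set()).add(w)

    # Start from every data node whose label equals the pattern node's label.
    sim = {
        u: {v for v, lab in graph_labels.items() if lab == want}
        for u, want in pattern_labels.items()
    }

    # Refine until no pattern edge removes any further node.
    changed = True
    while changed:
        changed = False
        for u, u2 in pattern_edges:
            keep = {v for v in sim[u] if succ.get(v, set()) & sim[u2]}
            if keep != sim[u]:
                sim[u] = keep
                changed = True
    return sim


# Hypothetical toy data: pattern CTO -> DB -> Bio.
pattern_labels = {"cto": "CTO", "db": "DB", "bio": "Bio"}
pattern_edges = [("cto", "db"), ("db", "bio")]
graph_labels = {"Ann": "CTO", "Don": "CTO", "Pat": "DB", "Dan": "DB", "Bill": "Bio"}
graph_edges = [("Ann", "Pat"), ("Pat", "Bill"), ("Don", "Dan")]

matches = graph_simulation(pattern_labels, pattern_edges, graph_labels, graph_edges)
print(matches)
```

In this toy graph Dan has no Bio successor, so Dan is dropped, and Don is then dropped because its only DB successor is gone. A batch algorithm repeats all of this work after every update, which is the cost the incremental approach tries to avoid.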

Complexity of incremental algorithms
Measure the complexity with the size of changes.
Result graphs
- Union of isomorphic subgraphs for subgraph isomorphism
- A graph Gr = (Vr, Er) for (bounded) simulation
  - Vr: the nodes in G matching pattern nodes in Gp
  - Er: the paths in G matching edges in Gp
Affected area (AFF): the difference between Gr and Gr', the result graphs of Gp in G and in G ⊕ ∆G, respectively.
|CHANGED| = |∆G| + |AFF|
Optimal, bounded and unbounded problems: is the cost expressible as f(|CHANGED|)?
[Figure: pattern P with Ann (CTO), Pat (DB), Bill (Bio) under subgraph isomorphism; pattern P* with Ann (CTO), Pat (DB), Dan (DB), Bill (Bio), Mat (Bio) under (bounded) simulation, an edge-to-path relation.]
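As a small worked example of the measure |CHANGED| = |∆G| + |AFF|, the sketch below treats a result graph as a pair (node set, edge set) and takes AFF to be the symmetric difference of the old and new result graphs. Counting |AFF| as nodes plus edges and representing ∆G as a list of edge updates are assumptions made for illustration; the names and data are hypothetical.

```python
def affected_area(old_gr, new_gr):
    """Difference between two result graphs, each given as (nodes, edges)."""
    old_nodes, old_edges = old_gr
    new_nodes, new_edges = new_gr
    # Symmetric difference: everything present in exactly one of the two.
    return old_nodes ^ new_nodes, old_edges ^ new_edges


def changed_size(delta_g, old_gr, new_gr):
    """|CHANGED| = |delta G| + |AFF|, counting nodes and edges of AFF."""
    aff_nodes, aff_edges = affected_area(old_gr, new_gr)
    return len(delta_g) + len(aff_nodes) + len(aff_edges)


# Hypothetical result graphs before and after one edge insertion.
old_gr = ({"Ann", "Pat", "Bill"}, {("Ann", "Pat"), ("Pat", "Bill")})
new_gr = (
    {"Ann", "Pat", "Bill", "Don", "Dan"},
    {("Ann", "Pat"), ("Pat", "Bill"), ("Don", "Dan"), ("Dan", "Bill")},
)
delta_g = [("insert", ("Dan", "Bill"))]

size = changed_size(delta_g, old_gr, new_gr)
print(size)
```

Here one inserted edge brings two new nodes and two new edges into the result graph, so the measure counts 1 + 2 + 2 units of change regardless of how large G is.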

Complexity of incremental algorithms (cont.)
[Figure: pattern P* (CTO, DB, Bio), data graph G over Ann (CTO), Pat (DB), Dan (DB), Bill (Bio), Mat (Bio), Don (CTO), Ross (Med), Tom (Bio), and its result graph Gr. ∆G inserts edges e1 to e5 one at a time; the affected area in Gr is highlighted.]

Incremental simulation matching
Measure the complexity with the size of changes.
Problem statement
- Input: Gp, G, Gr, ∆G
- Output: ∆Gr, the updates to Gr such that Msim(Gp, G ⊕ ∆G) = Msim(Gp, G) ⊕ ∆M
Complexity
- unbounded even for unit updates and general patterns
- bounded for single-edge deletions and general patterns
- bounded for single-edge insertions and DAG patterns, within optimal time O(|AFF|)
- in O(|∆G|(|Gp||AFF| + |AFF|^2)) time for batch updates and general patterns

Incremental simulation matching: optimal results
Unit deletions and general patterns: algorithm IncMatch-, optimal with respect to the size of changes.
1. Identify ss edges (edges between two matches)
2. Find invalid matches
3. Propagate the affected area and refine matches
[Figure: pattern P (CTO, DB, Bio), data graph G and result graph Gr; edge e6 is deleted and the affected area / ∆Gr is highlighted.]
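The three steps for a unit deletion can be sketched as a worklist procedure. This is a simplified illustration under assumed data structures (a simulation relation stored as {pattern node: set of data nodes} plus successor and predecessor maps), not the IncMatch- algorithm from the talk, and it makes no claim about matching its complexity bound. All names and the toy data are hypothetical.

```python
def delete_edge(sim, pattern_edges, succ, pred, edge):
    """Update `sim` in place after deleting `edge`; return the removed matches."""
    v, w = edge
    succ.get(v, set()).discard(w)
    pred.get(w, set()).discard(v)

    # Step 1: the deletion only matters if (v, w) supported a pattern edge,
    # i.e. v and w were both matches of its endpoints.
    work = [(u, v) for (u, u2) in pattern_edges if v in sim[u] and w in sim[u2]]

    removed = []
    while work:
        u, x = work.pop()
        if x not in sim[u]:
            continue
        # Step 2: x stays a match of u only if every pattern edge leaving u
        # is still witnessed by some successor of x.
        still_valid = all(
            succ.get(x, set()) & sim[u2] for (a, u2) in pattern_edges if a == u
        )
        if still_valid:
            continue
        sim[u].discard(x)
        removed.append((u, x))
        # Step 3: predecessors that relied on x must be re-checked.
        for (u0, u1) in pattern_edges:
            if u1 == u:
                for p in pred.get(x, ()):
                    if p in sim[u0]:
                        work.append((u0, p))
    return removed


# Hypothetical state: pattern CTO -> DB -> Bio with an existing match.
pattern_edges = [("cto", "db"), ("db", "bio")]
succ = {"Ann": {"Pat"}, "Pat": {"Bill"}, "Don": {"Dan"}}
pred = {"Pat": {"Ann"}, "Bill": {"Pat"}, "Dan": {"Don"}}
sim = {"cto": {"Ann"}, "db": {"Pat"}, "bio": {"Bill"}}

removed = delete_edge(sim, pattern_edges, succ, pred, ("Pat", "Bill"))
print(removed)
```

Only nodes reachable backwards from the deleted edge through existing matches are ever visited, which is the intuition behind bounding the work by the affected area rather than by the size of G.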

Incremental simulation matching: optimal results
Unit insertion and DAG patterns: algorithm IncMatch+, optimal with respect to the size of changes (linear time w.r.t. the size of changes).
1. Identify cs and cc edges (candidate-to-match and candidate-to-candidate edges)
2. Find new valid matches
3. Propagate the affected area and refine matches
[Figure: pattern P (CTO, DB, Bio), data graph G and result graph Gr with candidate nodes; edge e7 is inserted.]
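A unit insertion can only add matches, so one simple way to illustrate the idea is to collect the candidates (label-compatible non-matches) that can reach the inserted edge backwards, add them optimistically, and then refine only those. The sketch below does this under assumed data structures. It is a deliberately simplified strategy, not the IncMatch+ algorithm from the talk, and it is not claimed to run in O(|AFF|) time. All names and data are hypothetical.

```python
def insert_edge(sim, pattern_labels, pattern_edges, graph_labels, succ, pred, edge):
    """Update `sim` in place after inserting `edge`; return the new matches."""
    v, w = edge
    succ.setdefault(v, set()).add(w)
    pred.setdefault(w, set()).add(v)

    def is_candidate(u, x):
        # Label-compatible with pattern node u, but not currently a match.
        return graph_labels[x] == pattern_labels[u] and x not in sim[u]

    # Backward search from the source of the new edge over candidate pairs.
    frontier = [(u, v) for u in pattern_labels if is_candidate(u, v)]
    trial = set(frontier)
    while frontier:
        u, x = frontier.pop()
        for (u0, u1) in pattern_edges:
            if u1 != u:
                continue
            for p in pred.get(x, ()):
                if is_candidate(u0, p) and (u0, p) not in trial:
                    trial.add((u0, p))
                    frontier.append((u0, p))

    # Add the trial pairs optimistically, then drop those that fail.
    for u, x in trial:
        sim[u].add(x)
    changed = True
    while changed:
        changed = False
        for u, x in list(trial):
            fails = any(
                not (succ.get(x, set()) & sim[u2])
                for (a, u2) in pattern_edges
                if a == u
            )
            if fails:
                sim[u].discard(x)
                trial.discard((u, x))
                changed = True
    return trial


# Hypothetical state: Dan and Don are candidates, not matches.
pattern_labels = {"cto": "CTO", "db": "DB", "bio": "Bio"}
pattern_edges = [("cto", "db"), ("db", "bio")]
graph_labels = {"Ann": "CTO", "Don": "CTO", "Pat": "DB", "Dan": "DB", "Bill": "Bio"}
succ = {"Ann": {"Pat"}, "Pat": {"Bill"}, "Don": {"Dan"}}
pred = {"Pat": {"Ann"}, "Bill": {"Pat"}, "Dan": {"Don"}}
sim = {"cto": {"Ann"}, "db": {"Pat"}, "bio": {"Bill"}}

new_matches = insert_edge(
    sim, pattern_labels, pattern_edges, graph_labels, succ, pred, ("Dan", "Bill")
)
print(sorted(new_matches))
```

Existing matches are never re-examined because inserting an edge cannot invalidate them; only the trial pairs are refined.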

Incremental simulation matching: optimal results
Batch updates: algorithm IncMatch, optimal with respect to the size of changes (linear time w.r.t. the size of changes).
1. Identify cs and cc edges
2. Find new valid matches
3. Propagate the affected area and refine matches
[Figure: pattern P (CTO, DB, Bio), data graph G and result graph Gr with candidate nodes; edge e7 is inserted.]

Incremental bounded graph simulation
Measure the complexity with the size of changes.
Problem statement
- Input: Gp, G, Gr, ∆G
- Output: ∆Gr, the updates to Gr such that Mbsim(Gp, G ⊕ ∆G) = Mbsim(Gp, G) ⊕ ∆M
Complexity
- unbounded even for unit updates and path patterns
- in O(|∆G|(|AFF| log |AFF| + |Gp||AFF| + |AFF|^2)) time for batch updates and general patterns

Incremental bounded graph simulation
Weighted landmark vectors
- Landmarks: a list of nodes L in a graph G such that for each pair (u, v) of nodes in G, there is a node in L on a shortest path from u to v
- Answering a distance query: linear time
- Weights on landmarks: "high quality" means not changed frequently
[Figure: graph G over Don (CTO), Pat (DB), Ann (CTO), Dan (DB), Bill (Bio), Mat (Bio), Tom (Bio), and a landmark vector LM = <lm1, lm2, ..., lmi, ..., lmk> with its distance entries.]
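The distance query over landmarks can be sketched as follows: store, for each landmark, the distances from every node to it and from it to every node, then answer dist(u, v) as the minimum of d(u, lm) + d(lm, v) over all landmarks, which takes time linear in the number of landmarks. This is a minimal sketch assuming unweighted directed edges and BFS distances; the function names and the toy graph are hypothetical, and the landmark weights mentioned on the slide are not modelled.

```python
import math
from collections import deque


def bfs(adj, src):
    """Hop distances from src following the adjacency map `adj`."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        x = queue.popleft()
        for y in adj.get(x, ()):
            if y not in dist:
                dist[y] = dist[x] + 1
                queue.append(y)
    return dist


def build_landmark_vectors(edges, landmarks):
    """Precompute distances to and from every landmark."""
    succ, pred = {}, {}
    for a, b in edges:
        succ.setdefault(a, set()).add(b)
        pred.setdefault(b, set()).add(a)
    to_lm = {lm: bfs(pred, lm) for lm in landmarks}    # node -> landmark
    from_lm = {lm: bfs(succ, lm) for lm in landmarks}  # landmark -> node
    return to_lm, from_lm


def distance(u, v, to_lm, from_lm):
    """Exact if some landmark lies on a shortest path from u to v."""
    if u == v:
        return 0
    best = math.inf
    for lm in to_lm:  # one pass over the landmarks
        d = to_lm[lm].get(u, math.inf) + from_lm[lm].get(v, math.inf)
        best = min(best, d)
    return best


# Hypothetical path graph a -> b -> c -> d; b and c cover all shortest paths.
edges = [("a", "b"), ("b", "c"), ("c", "d")]
to_lm, from_lm = build_landmark_vectors(edges, ["b", "c"])
print(distance("a", "d", to_lm, from_lm))
print(distance("d", "a", to_lm, from_lm))
```

The answer is only guaranteed to be exact when the landmark set has the covering property stated above; otherwise the query returns an upper bound.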

Incremental bounded graph simulation
Unit updates: cc, cs and ss pairs
- Only the cs / cc pairs (resp. ss pairs) whose updated distances satisfy (resp. no longer satisfy) the bound of a pattern edge may affect the matching result
A two-step strategy for incremental bounded simulation
1. Identify all cc, cs (ss) pairs via a landmark vector
2. Find the changes ∆M to the matches, by treating cc, cs (ss) pairs as insertions of edges to Gr (deletions from Gr)
This "reduces" bounded simulation in G to simulation in Gr.
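The first step of this strategy can be illustrated with a short sketch that compares old and new distances against the bound of one pattern edge and turns the affected pairs into edge insertions or deletions on Gr. For simplicity it assumes a single pattern edge with bound `bound`, a flat set `matched` of data nodes that are currently matches, and distance tables that would in practice come from landmark vectors. These names and the data are hypothetical.

```python
import math


def derive_gr_updates(pairs, old_dist, new_dist, bound, matched):
    """Map changed-distance pairs to edge insertions/deletions on Gr."""
    insertions, deletions = [], []
    for v, w in pairs:
        # "s" = current match, "c" = candidate (not currently a match).
        kind = ("s" if v in matched else "c") + ("s" if w in matched else "c")
        was_ok = old_dist[(v, w)] <= bound
        now_ok = new_dist[(v, w)] <= bound
        if kind == "ss" and was_ok and not now_ok:
            # A supporting path between two matches became too long.
            deletions.append((v, w))
        elif kind in ("cs", "cc") and not was_ok and now_ok:
            # A candidate gained a path within the bound.
            insertions.append((v, w))
    return insertions, deletions


# Hypothetical data: Don and Tom are candidates, the others are matches.
matched = {"Ann", "Pat", "Bill"}
pairs = [("Don", "Tom"), ("Ann", "Bill")]
old_dist = {("Don", "Tom"): math.inf, ("Ann", "Bill"): 2}
new_dist = {("Don", "Tom"): 1, ("Ann", "Bill"): 2}

insertions, deletions = derive_gr_updates(pairs, old_dist, new_dist, 2, matched)
print(insertions, deletions)
```

The resulting insertions and deletions would then be fed to an incremental simulation procedure over Gr, which is the second step of the strategy.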

Incremental bounded simulation matching
Unit insertion and general patterns: algorithm IncBMatch+
- Step 1: identify cc and cs pairs
- Step 2: find the changes to the match by inserting edge (Don, Tom) in Gr and propagating changes
Unit deletion is processed similarly to unit insertion.
[Figure: pattern P* (CTO, DB, Bio) and the result graph Gr over Ann (CTO), Pat (DB), Dan (DB), Bill (Bio), Mat (Bio), before and after inserting edge e2, which brings Don (CTO) and Tom (Bio) into Gr.]

Incremental subgraph isomorphism
Incremental subgraph isomorphism matching (IncIsoMatch)
- Input: Gp, G, Gr, ∆G
- Output: ∆Gr, the updates to Gr such that Miso(Gp, G ⊕ ∆G) = Miso(Gp, G) ⊕ ∆M
Incremental subgraph isomorphism (IncIso)
- Input: Gp, G, Gr, ∆G
- Output: true if there is a subgraph in G ⊕ ∆G that is isomorphic to Gp
Complexity
- IncIsoMatch is unbounded even for unit updates over DAG graphs for path patterns
- IncIso is NP-complete even for path patterns and unit updates

Experimental evaluation
Experimental setting
- YouTube network, with 187K nodes and 1M edges. We use snapshots each of 18K nodes and 48K edges.
- Citation network, with 630K nodes and 633K edges. We use snapshots each of 18K nodes and 62K edges.
- Synthetic data, with randomly generated updates.
- Pattern generator, controlled by the number of nodes, edges, predicates and bounds on edges.
Algorithms compared
Problem   | Batch    | Incremental
IncSim    | Match_s  | IncMatch, IncMatch_n, HORNSAT
IncBSim   | Match_bs | IncBMatch, IncBMatch_m
IncIsoMat | VF2      | IncIsoMatch, IsoUMatch
Optimizations: BatchLM, minDelta, InsLM

Experimental results: incremental graph simulation
Incremental simulations improve batch algorithms by over 40%-50%.
[Plots: inserting edges (annotation: 30%-40% changes); removing edges (annotation: 30%-40% changes).]

Experimental results: incremental graph simulation (cont.)
Incremental simulations improve batch algorithms by over 40%-50%.
[Plots: inserting edges over YouTube (annotation: 30%-40% changes); inserting edges over Citation (annotation: more than 50% changes).]

Experimental results: incremental bounded simulation
Incremental bounded matching improves batch algorithms by over 50%-60%.
[Plots: inserting edges over YouTube; inserting edges over Citation (annotation: 20% changes).]

Experimental results: incremental subgraph matching, and optimizations
Effectiveness of reducing redundant updates and maintaining landmarks

Experimental results: incremental subgraph isomorphism
IncIsoMatch outperforms VF2 when the changes are no more than 20%.
[Plot: inserting edges.]

Conclusion
Incremental solutions for graph pattern matching
- Incremental graph pattern matching, with complexity measured by the size of changes
  - Incremental simulation
  - Incremental bounded simulation
  - Incremental subgraph matching
- Algorithms for each of these problems
Problem   | Complexity                                                             | Incremental algorithms
IncSim    | Unbounded; bounded for unit deletion / unit insertion and DAG patterns | IncMatch
IncBSim   | Unbounded                                                              | IncBMatch
IncIsoMat | Unbounded; NP-complete                                                 | IncIsoMatch, IsoUMatch

Future work
- Larger datasets with various applications
- Optimization techniques from exploring real-life user patterns?
- Bounded incremental heuristic algorithms for subgraph isomorphism
- Incremental graph matching over distributed graph data

Thank you!
Incremental graph pattern matching

Subgraph isomorphism and graph simulation
- Node label equivalence
- Edge-to-edge function/relation
Identical label matching and edge-to-edge functions/relations: capable enough?
[Figure: patterns P and data graphs over nodes labelled A, B, D, E and G, with two nodes marked v1 and v2.]