Incremental Graph Pattern Matching. Yinghui Wu, SIGMOD 2011. Wenfei Fan, Xin Wang, Yinghui Wu (University of Edinburgh); Jianzhong Li, Jizhou Luo (Harbin Institute of Technology).


1 Incremental Graph Pattern Matching
Yinghui Wu, SIGMOD 2011
Wenfei Fan, Xin Wang, Yinghui Wu (University of Edinburgh)
Jianzhong Li, Jizhou Luo (Harbin Institute of Technology)
Zijing Tan (Fudan University)

2 Outline
Graph pattern matching in real-life scenarios
  Graph pattern matching is expensive
  Real-life graphs change over time
Incremental graph pattern matching
  Simulation, bounded simulation and subgraph isomorphism
  Incrementally compute changes to the match results
Incremental simulation
Incremental bounded simulation
Incremental subgraph isomorphism
Conclusion
Incremental solutions based on (extended) graph pattern matching

3 Real-life graph pattern matching
Given a pattern graph (a query) Gp and a data graph G, find the set M(Gp, G) of matches in G for Gp.
How to define a match? Usually in terms of:
  subgraph isomorphism (proximity search, biology and chemistry network querying, object identification)
  graph simulation (social querying, program verification)
  bounded simulation (social matching, semantic networks)
A routine process in real-life applications
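To make the graph simulation semantics named on this slide concrete, here is a minimal Python sketch of a batch computation of the maximum simulation. It is a naive fixpoint refinement, not the quadratic-time algorithm the talk refers to. The function name `simulation`, the dictionary-based graph encoding and the toy edges are assumptions made for illustration; only the node names and labels are borrowed from the slides.

```python
def simulation(pattern_labels, pattern_edges, graph_labels, graph_edges):
    """Return the maximum simulation as {pattern_node: set_of_data_nodes}.

    A data node v matches a pattern node u if the labels agree and, for every
    pattern edge (u, u2), v has a successor that matches u2.
    """
    succ = {}
    for v, w in graph_edges:
        succ.setdefault(v, set()).add(w)

    # Start from all label-compatible candidates, then refine to a fixpoint.
    sim = {
        u: {v for v, lab in graph_labels.items() if lab == p_lab}
        for u, p_lab in pattern_labels.items()
    }
    changed = True
    while changed:
        changed = False
        for u, u2 in pattern_edges:
            keep = {v for v in sim[u] if succ.get(v, set()) & sim[u2]}
            if keep != sim[u]:
                sim[u] = keep
                changed = True

    # Assumption: the match is empty if some pattern node has no match.
    if any(not matches for matches in sim.values()):
        return {u: set() for u in sim}
    return sim


# Hypothetical toy data (edges are illustrative, not taken from the figure).
pattern_labels = {"cto": "CTO", "db": "DB", "bio": "Bio"}
pattern_edges = [("cto", "db"), ("db", "bio")]
graph_labels = {"Ann": "CTO", "Don": "CTO", "Pat": "DB", "Dan": "DB", "Bill": "Bio"}
graph_edges = [("Ann", "Pat"), ("Pat", "Bill"), ("Don", "Dan")]
result = simulation(pattern_labels, pattern_edges, graph_labels, graph_edges)
```

In this toy graph Dan has no Bio successor, so Dan is dropped, and Don is then dropped because its only DB successor no longer matches.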

4 Example: querying FriendFeed
Subgraph isomorphism, simulation and bounded simulation
[Figure: data graph G with nodes Ann (CTO), Pat (DB), Dan (DB), Bill (Bio), Mat (Bio), Don (CTO), Ross (Med), Tom (Bio). Left panel: pattern P under subgraph isomorphism (edge-to-edge bijection), result Ann, Pat, Bill. Right panel: pattern P with edge bounds *, 1, 2, 1 under (bounded) simulation (edge-to-path relation), result Ann, Pat, Dan, Bill, Mat.]
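The edge-to-path relation of bounded simulation can be sketched in the same style: a pattern edge with bound k is matched by a nonempty path of length at most k. The following Python sketch is a naive fixpoint under assumptions of my own: the names `reach_within` and `bounded_simulation` are hypothetical, the bound * is encoded as `math.inf`, and the toy edges are invented for illustration.

```python
import math
from collections import deque


def reach_within(succ, source, bound):
    """Nodes reachable from source by a nonempty path of length <= bound."""
    reached = set()
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        if dist >= bound:
            continue
        for nxt in succ.get(node, ()):
            if nxt not in reached:
                reached.add(nxt)
                queue.append((nxt, dist + 1))
    return reached


def bounded_simulation(pattern_labels, pattern_bounds, graph_labels, graph_edges):
    """pattern_bounds maps a pattern edge (u, u2) to its bound (math.inf for *)."""
    succ = {}
    for v, w in graph_edges:
        succ.setdefault(v, set()).add(w)
    sim = {
        u: {v for v, lab in graph_labels.items() if lab == p_lab}
        for u, p_lab in pattern_labels.items()
    }
    changed = True
    while changed:
        changed = False
        for (u, u2), bound in pattern_bounds.items():
            keep = {v for v in sim[u] if reach_within(succ, v, bound) & sim[u2]}
            if keep != sim[u]:
                sim[u] = keep
                changed = True
    return sim


# Hypothetical toy data: Ann reaches a Bio node in 2 hops, Don needs 3.
labels = {"Ann": "CTO", "Pat": "DB", "Bill": "Bio",
          "Don": "CTO", "Dan": "DB", "Ross": "Med", "Tom": "Bio"}
edges = [("Ann", "Pat"), ("Pat", "Bill"),
         ("Don", "Dan"), ("Dan", "Ross"), ("Ross", "Tom")]
p_labels = {"cto": "CTO", "bio": "Bio"}
within_two = bounded_simulation(p_labels, {("cto", "bio"): 2}, labels, edges)
unbounded = bounded_simulation(p_labels, {("cto", "bio"): math.inf}, labels, edges)
```

With bound 2 only Ann matches the CTO pattern node; with the unbounded edge both Ann and Don match.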

5 Batch algorithm vs. incremental algorithm
Graph pattern matching is expensive!
  NP-complete for subgraph isomorphism
  cubic time for bounded simulation
  quadratic time for simulation
Incremental graph pattern matching computes new matches from old matches!
  Given Gp, G, M(Gp, G) and updates ∆G, compute ∆M such that M(Gp, G ⊕ ∆G) = M(Gp, G) ⊕ ∆M
  ∆G is typically small (5%/week in Web graphs)
How to measure complexity?

6 Complexity of incremental algorithms
Result graphs
  Union of isomorphic subgraphs for subgraph isomorphism
  A graph Gr = (Vr, Er) for (bounded) simulation
    Vr: the nodes in G matching pattern nodes in Gp
    Er: the paths in G matching edges in Gp
Affected area (AFF): the difference between Gr and Gr', the result graphs of Gp in G and G ⊕ ∆G, respectively
|CHANGED| = |∆G| + |AFF|
Optimal, bounded and unbounded problems: is the cost expressible as f(|CHANGED|)?
Measure the complexity with the size of changes
[Figure: pattern P under subgraph isomorphism (result Ann, Pat, Bill) and pattern P with edge bounds *, 1, 2, 1 under (bounded) simulation (edge-to-path relation; result Ann, Pat, Dan, Bill, Mat).]
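As a rough illustration of the measure |CHANGED| = |∆G| + |AFF|, the Python sketch below counts the affected area as the symmetric difference between the old and new result graphs. Treating "difference" as a symmetric difference over nodes and edges is my reading of the slide, and the names `affected_area` and `changed_size` as well as the toy result graphs are hypothetical.

```python
def affected_area(gr_old, gr_new):
    """Return (changed_nodes, changed_edges) between two result graphs.

    Each result graph is a pair (set_of_nodes, set_of_edges).
    """
    nodes_old, edges_old = gr_old
    nodes_new, edges_new = gr_new
    return nodes_old ^ nodes_new, edges_old ^ edges_new


def changed_size(delta_g, gr_old, gr_new):
    """|CHANGED| = |delta_g| + |AFF|, with AFF counted as nodes plus edges."""
    aff_nodes, aff_edges = affected_area(gr_old, gr_new)
    return len(delta_g) + len(aff_nodes) + len(aff_edges)


# Hypothetical example: one inserted edge brings Don into the result graph.
gr_old = ({"Ann", "Pat", "Bill"}, {("Ann", "Pat"), ("Pat", "Bill")})
gr_new = ({"Ann", "Pat", "Bill", "Don"},
          {("Ann", "Pat"), ("Pat", "Bill"), ("Don", "Pat")})
delta_g = [("insert", ("Don", "Pat"))]
size = changed_size(delta_g, gr_old, gr_new)
```

An incremental algorithm is bounded in this sense when its running time can be written as a function of this quantity (and the pattern size) alone, independent of the size of G.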

7 Complexity of incremental algorithms (cont.)
[Figure: data graph G, pattern P (CTO, DB, Bio; edge bounds *, 1, 2, 1) and result graph Gr. ∆G inserts the edges e1, e2, e3, e4 and e5 in turn; the affected area in Gr is highlighted.]

8 Incremental simulation matching
Problem statement
  Input: Gp, G, Gr, ∆G
  Output: ∆Gr, the updates to Gr such that Msim(Gp, G ⊕ ∆G) = Msim(Gp, G) ⊕ ∆M
Complexity
  unbounded even for unit updates and general patterns
  bounded for single-edge deletions and general patterns
  bounded for single-edge insertions and DAG patterns, within optimal time O(|AFF|)
  in O(|∆G|(|Gp||AFF| + |AFF|^2)) time for batch updates and general patterns
Measure the complexity with the size of changes

9 Incremental simulation matching: optimal results
Unit deletions and general patterns: algorithm IncMatch-, optimal in the size of changes
  1. identify ss edges
  2. find invalid matches
  3. propagate the affected area and refine matches
[Figure: data graph G, pattern P (CTO, DB, Bio) and result graph Gr; deleting edge e6 from G, with the affected area / ∆Gr highlighted.]
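The three steps for a unit deletion can be sketched as a worklist procedure over an existing simulation relation. This Python sketch is a simplification under my own assumptions, not the IncMatch- algorithm itself: it rechecks successors directly instead of maintaining the auxiliary structures an optimal algorithm would need, and the names `build_adjacency` and `delete_edge` are hypothetical.

```python
def build_adjacency(graph_edges):
    """Build successor and predecessor maps from a list of edges."""
    succ, pred = {}, {}
    for v, w in graph_edges:
        succ.setdefault(v, set()).add(w)
        pred.setdefault(w, set()).add(v)
    return succ, pred


def delete_edge(sim, pattern_edges, succ, pred, edge):
    """Update sim in place after deleting edge; return the removed matches."""
    v, w = edge
    succ[v].discard(w)
    pred[w].discard(v)

    # Step 1: the deletion matters only if it connected two matches.
    worklist = [(u, v) for (u, u2) in pattern_edges
                if v in sim[u] and w in sim[u2]]
    removed = []
    while worklist:
        u, x = worklist.pop()
        if x not in sim[u]:
            continue
        # Step 2: x stays valid if every pattern edge out of u is still witnessed.
        still_valid = all(succ.get(x, set()) & sim[u2]
                          for (a, u2) in pattern_edges if a == u)
        if still_valid:
            continue
        sim[u].discard(x)
        removed.append((u, x))
        # Step 3: propagate to matched predecessors of x.
        for (u0, b) in pattern_edges:
            if b == u:
                for p in pred.get(x, set()):
                    if p in sim[u0]:
                        worklist.append((u0, p))
    return removed


# Hypothetical toy data: a chain Ann -> Pat -> Bill matching CTO -> DB -> Bio.
pattern_edges = [("cto", "db"), ("db", "bio")]
succ, pred = build_adjacency([("Ann", "Pat"), ("Pat", "Bill")])
sim = {"cto": {"Ann"}, "db": {"Pat"}, "bio": {"Bill"}}
removed = delete_edge(sim, pattern_edges, succ, pred, ("Pat", "Bill"))
```

Only nodes that lose a witness are touched, which is the intuition behind measuring the cost in terms of the affected area rather than the whole graph.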

10 Incremental simulation matching: optimal results
Unit insertions and DAG patterns: algorithm IncMatch+, optimal in the size of changes
  1. identify cs and cc edges
  2. find new valid matches
  3. propagate the affected area and refine matches
Linear time w.r.t. the size of changes
[Figure: data graph G, pattern P (CTO, DB, Bio) and result graph Gr with candidate nodes; inserting edge e7 into G.]

11 Incremental simulation matching: optimal results
Batch updates: algorithm IncMatch, optimal in the size of changes
  1. identify cs and cc edges
  2. find new valid matches
  3. propagate the affected area and refine matches
Linear time w.r.t. the size of changes
[Figure: data graph G, pattern P (CTO, DB, Bio) and result graph Gr with candidate nodes; inserting edge e7 into G.]

12 Incremental bounded graph simulation
Problem statement
  Input: Gp, G, Gr, ∆G
  Output: ∆Gr, the updates to Gr such that Mbsim(Gp, G ⊕ ∆G) = Mbsim(Gp, G) ⊕ ∆M
Complexity
  unbounded even for unit updates and path patterns
  in O(|∆G|(|AFF| log|AFF| + |Gp||AFF| + |AFF|^2)) time for batch updates and general patterns
Measure the complexity with the size of changes

13 Incremental bounded graph simulation
Weighted landmark vectors
  A landmark vector is a list of nodes L in a graph G such that for each pair (u, v) of nodes in G, there is a node in L on a shortest path from u to v
  Answering a distance query: linear time
  Weights on landmarks: "high-quality" landmarks are those that do not change frequently
[Figure: a landmark vector LM = (lm1, lm2, …, lmi, …, lmk) with distance entries, over a data graph G with nodes Don (CTO), Pat (DB), Ann (CTO), Dan (DB), Bill (Bio), Mat (Bio), Tom (Bio).]
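A distance query over a landmark list can be sketched as follows: precompute, for each landmark, the distances to it and from it, then take the minimum of d(u, lm) + d(lm, v) over all landmarks, which costs time linear in the number of landmarks. The Python below is a minimal unweighted sketch that ignores the landmark weights mentioned on the slide; the names `bfs_distances`, `build_landmark_index` and `distance` are hypothetical. The answer is exact only when some landmark lies on a shortest path for the queried pair, otherwise it is an upper bound.

```python
import math
from collections import deque


def bfs_distances(adj, source):
    """Unweighted shortest distances from source following adj."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in adj.get(node, ()):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist


def build_landmark_index(graph_edges, landmarks):
    """For each landmark, store distances to it and from it."""
    succ, pred = {}, {}
    for v, w in graph_edges:
        succ.setdefault(v, set()).add(w)
        pred.setdefault(w, set()).add(v)
    to_lm = {lm: bfs_distances(pred, lm) for lm in landmarks}
    from_lm = {lm: bfs_distances(succ, lm) for lm in landmarks}
    return to_lm, from_lm


def distance(u, v, to_lm, from_lm):
    """min over landmarks of d(u, lm) + d(lm, v); math.inf if unreachable."""
    if u == v:
        return 0
    best = math.inf
    for lm in to_lm:
        d_in = to_lm[lm].get(u)
        d_out = from_lm[lm].get(v)
        if d_in is not None and d_out is not None:
            best = min(best, d_in + d_out)
    return best


# Hypothetical toy graph: a path a -> b -> c -> d with landmarks b and c.
to_lm, from_lm = build_landmark_index(
    [("a", "b"), ("b", "c"), ("c", "d")], ["b", "c"])
```

Here b and c together lie on a shortest path for every connected pair of distinct nodes, so the index answers all distance queries on this toy graph exactly.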

14 Incremental bounded graph simulation
Unit updates: cc, cs and ss pairs
  Only the cs / cc pairs (resp. ss pairs) whose updated distances satisfy (resp. no longer satisfy) the bound of a pattern edge may affect the matching result
A two-step strategy for incremental bounded simulation
  1. Identify all cc, cs (ss) pairs via a landmark vector
  2. Find the changes ∆M to the matches by treating cc, cs (ss) pairs as insertions of edges to Gr (deletions from Gr)
This "reduces" bounded simulation in G to simulation in Gr

15 Incremental bounded simulation matching
Unit insertions and general patterns: algorithm IncBMatch+
  Step 1: identify cc and cs pairs
  Step 2: find the changes to the match by inserting the edge (Don, Tom) in Gr and propagating the changes
Unit deletions are processed similarly to unit insertions
[Figure: pattern P (CTO, DB, Bio; edge bounds *, 1, 2, 1), data graph G with inserted edge e2, and the result graph Gr before and after the insertion.]

16 Incremental subgraph isomorphism
Incremental subgraph isomorphism matching (IncIsoMatch)
  Input: Gp, G, Gr, ∆G
  Output: ∆Gr, the updates to Gr such that Miso(Gp, G ⊕ ∆G) = Miso(Gp, G) ⊕ ∆M
Incremental subgraph isomorphism (IncIso)
  Input: Gp, G, Gr, ∆G
  Output: true if there is a subgraph in G ⊕ ∆G that is isomorphic to Gp
Complexity
  IncIsoMatch is unbounded even for unit updates over DAG graphs and path patterns
  IncIso is NP-complete even for path patterns and unit updates
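For contrast with simulation, the batch version of subgraph isomorphism matching can be sketched by brute force: try every injective assignment of pattern nodes to data nodes and keep those that preserve labels and edges. The Python below is exponential and purely illustrative (it is neither VF2 nor IncIsoMatch); the function name `subgraph_isomorphisms` and the toy data are assumptions.

```python
from itertools import permutations


def subgraph_isomorphisms(pattern_labels, pattern_edges, graph_labels, graph_edges):
    """Return all injective, label- and edge-preserving mappings as dicts."""
    pattern_nodes = list(pattern_labels)
    edge_set = set(graph_edges)
    matches = []
    # Each permutation is one injective choice of images for the pattern nodes.
    for image in permutations(graph_labels, len(pattern_nodes)):
        mapping = dict(zip(pattern_nodes, image))
        labels_ok = all(pattern_labels[u] == graph_labels[mapping[u]]
                        for u in pattern_nodes)
        edges_ok = all((mapping[a], mapping[b]) in edge_set
                       for a, b in pattern_edges)
        if labels_ok and edges_ok:
            matches.append(mapping)
    return matches


# Hypothetical toy data (edges are illustrative, not taken from the figure).
pattern_labels = {"cto": "CTO", "db": "DB", "bio": "Bio"}
pattern_edges = [("cto", "db"), ("db", "bio")]
graph_labels = {"Ann": "CTO", "Don": "CTO", "Pat": "DB", "Dan": "DB", "Bill": "Bio"}
graph_edges = [("Ann", "Pat"), ("Pat", "Bill"), ("Don", "Dan")]
iso_matches = subgraph_isomorphisms(
    pattern_labels, pattern_edges, graph_labels, graph_edges)
```

The decision variant on the slide corresponds to asking whether this list is nonempty for the updated graph G ⊕ ∆G.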

17 Experimental evaluation
Experimental setting
  Youtube network, with 187K nodes and 1M edges. We use snapshots each of 18K nodes and 48K edges.
  Citation network, with 630K nodes and 633K edges. We use snapshots each of 18K nodes and 62K edges.
  Synthetic data, with randomly generated updates.
  Pattern generator, controlled by the number of nodes, edges, predicates and bounds on edges.
Algorithms compared
  Problem      Batch      Incremental
  IncSim       Match_s    IncMatch, IncMatch_n, HORNSAT
  IncBSim      Match_bs   IncBMatch, IncBMatch_m
  IncIsoMat    VF2        IncIsoMatch, IsoUMatch
  Optimizations: BatchLM, minDelta, InsLM

18 Experimental results: incremental graph simulation
Incremental simulation improves on the batch algorithms by over 40%-50%
[Figure: two panels, "Inserting edges" (annotated 30%-40% changes) and "Removing edges" (annotated 30%-40% changes).]

19 Experimental results: incremental graph simulation
Incremental simulation improves on the batch algorithms by over 40%-50%
[Figure: two panels, "Inserting edges over Youtube" (annotated 30%-40% changes) and "Inserting edges over Citation" (annotated more than 50% changes).]

20 Experimental results: incremental bounded simulation
Incremental bounded matching improves on the batch algorithms by over 50%-60%
[Figure: two panels, "Inserting edges over Youtube" and "Inserting edges over Citation" (annotated 20% changes).]

21 Experimental results: incremental subgraph matching, and optimizations
Effectiveness of reducing redundant updates and maintaining landmarks

22 Experimental results: incremental subgraph isomorphism
IncIsoMatch outperforms VF2 when the changes are no more than 20%
[Figure: panel "Inserting edges".]

23 Conclusion
Incremental solutions for graph pattern matching
  Incremental simulation
  Incremental bounded simulation
  Incremental subgraph matching
Algorithms for each of these problems
  Problem      Complexity                                                        Incremental
  IncSim       Unbounded; bounded for unit deletions, and for unit               IncMatch
               insertions with DAG patterns
  IncBSim      Unbounded                                                         IncBMatch
  IncIsoMat    Unbounded; NP-complete                                            IncIsoMatch, IsoUMatch
Measure complexity with the size of changes

24 Future work
Larger datasets with various applications
Optimization techniques from exploring real-life user patterns?
Bounded incremental heuristic algorithms for subgraph isomorphism
Incremental graph matching over distributed graph data

25 Thank you!

26 Subgraph isomorphism and graph simulation
Node label equivalence
Edge-to-edge function/relation
Identical label matching and edge-to-edge functions/relations: capable enough?
[Figure: pattern P with nodes labelled A, B, D, E, and data graphs containing nodes v1, v2 and nodes labelled A, B, D, E, G.]

