Adding Regular Expressions to Graph Reachability and Pattern Queries. Wenfei Fan, Shuai Ma, Nan Tang, Yinghui Wu, University of Edinburgh. Yinghui Wu, ICDE 2011.


1 Adding Regular Expressions to Graph Reachability and Pattern Queries
Wenfei Fan, Shuai Ma, Nan Tang, Yinghui Wu (University of Edinburgh); Jianzhong Li (Harbin Institute of Technology)
Yinghui Wu, ICDE 2011

2 Adding Regular Expressions to Graph Reachability and Pattern Queries
Wenfei Fan, Jianzhong Li, Shuai Ma, Nan Tang, Yinghui Wu (University of Edinburgh; Harbin Institute of Technology)
Real-life networks are huge and complex. Is the traditional function-based querying model capable enough?
Reachability queries and graph pattern queries: a novel query model and method for querying large, complex networks.
[Figure: Terrorist Collaboration Network, 1970-2010, with the quote “Those who were trained to fly didn’t know the others. One group of people did not know the other group.” (Bin Laden)]

3 Outline
- Real-life graphs bear multiple edge types; traditional models and methods may not be capable enough
- Reachability queries and graph pattern queries: nodes carrying predicates, edges carrying regular expressions
- Fundamental problems: query containment and equivalence; query minimization
- Query evaluation: join-based and split-based algorithms
- Conclusion
A first step towards revising simulation for graph pattern matching.

4 Graph pattern matching: the problem
Given a pattern graph (a query) P and a data graph G, decide whether G matches P and, if so, find all the matches of P in G.
Applications:
- social queries, social matching
- biology and chemistry network querying
- keyword search, proximity search, …
Graph pattern matching is widely employed in a variety of emerging real-life applications. But how should a match be defined?

5 Subgraph isomorphism and graph simulation
- Node label equivalence
- Edge-to-edge function (subgraph isomorphism) or relation (graph simulation)
Both rely on identical label matching and map edges to edges. Capable enough?
[Figure: a pattern P over nodes labeled A, B, D and E, matched against data graphs G containing nodes v1 and v2.]
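For reference, graph simulation can be computed by a simple fixpoint: start from all label-equal pairs (u, v) and discard a pair whenever some pattern edge (u, u') has no data edge (v, v') with v' still matching u'. The sketch below is a naive version of that textbook procedure, not the algorithm used in the talk; the function name and the dict-of-sets graph encoding are assumptions.

```python
def max_simulation(p_labels, p_succ, g_labels, g_succ):
    """Maximum simulation of pattern P by data graph G (naive fixpoint).

    p_labels, g_labels: node -> label
    p_succ, g_succ:     node -> set of successor nodes
    Returns sim: pattern node -> set of data nodes that simulate it.
    """
    # Start from every label-equal pair.
    sim = {u: {v for v in g_labels if g_labels[v] == p_labels[u]} for u in p_labels}
    changed = True
    while changed:
        changed = False
        for u in p_labels:
            for u2 in p_succ.get(u, set()):
                # v may simulate u only if some successor of v still simulates u2.
                keep = {v for v in sim[u] if g_succ.get(v, set()) & sim[u2]}
                if keep != sim[u]:
                    sim[u] = keep
                    changed = True
    return sim
```

With pattern A -> B and a single data edge a1 -> b1, the node a2 (labeled A but without a B-labeled successor) is dropped, giving {"A": {"a1"}, "B": {"b1"}}.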

6 Considering edge types
Real-life graphs have multiple edge types. Example: Essembly, a social voting network, whose edges have four types: friends-allies (fa), friends-nemeses (fn), strangers-nemeses (sn) and strangers-allies (sa).
[Figure: an Essembly fragment with biologists, a businessman, doctors and Alice the journalist.]

7 Querying the Essembly network: an example
Pattern queries with multiple edge types.
[Figure: the Essembly network and a pattern connecting Alice, biologists supporting cloning and doctors against cloning, with edge labels fa<=2, sn, sa<=2, fn and fa+; legend: friends-allies, friends-nemeses, strangers-nemeses, strangers-allies.]

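8 Graph reachability and pattern queries
Real-life graphs usually bear different edge types.
- Data graph: $G = (V, E, f_A, f_C)$, where $f_A$ assigns attributes to nodes and $f_C$ assigns a color (edge type) to each edge.
- Reachability query (RQ): $Q_r = (u_1, u_2, f_{u_1}, f_{u_2}, f_e)$, where $f_{u_1}$ and $f_{u_2}$ are predicates on nodes and $f_e$ is a regular expression from the subclass $F ::= c \mid c^{\le k} \mid c^{+} \mid FF$.
- $Q_r(G)$: the set of node pairs $(v_1, v_2)$ such that $v_1$ satisfies $f_{u_1}$, $v_2$ satisfies $f_{u_2}$, and there is a nonempty path from $v_1$ to $v_2$ whose edge colors match the pattern specified by $f_e$.
Example RQ: from a node with Job=‘biologist’, sp=‘cloning’ to a node with Job=‘doctors’, with edge expression fa<=2 fn.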
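One way to evaluate a single RQ is a breadth-first search over states made of a data node plus a position in f_e. The sketch below assumes f_e is encoded as a list of atoms (color, kind, k), with kind "one" for c, "le" for c<=k and "plus" for c+; it also assumes that c<=k matches between 1 and k consecutive c-edges and that node predicates are Python callables over attribute dicts. These encodings and function names are illustrative, not taken from the talk.

```python
from collections import deque

def rq_targets(graph, src, atoms):
    """Nodes reachable from src by a nonempty path whose edge colors match atoms.

    graph: node -> list of (successor, color)
    atoms: list of (color, kind, k), kind in {"one", "le", "plus"} (assumed encoding)
    """
    start = (src, 0, 0)  # (data node, current atom index, edges used for that atom)
    seen, queue, hits = {start}, deque([start]), set()
    while queue:
        v, i, n = queue.popleft()
        color, kind, k = atoms[i]
        nxt = []
        if n >= 1:  # current atom is already satisfied
            if i == len(atoms) - 1:
                hits.add(v)
            else:
                nxt.append((v, i + 1, 0))  # move on to the next atom
        if kind == "plus" or (kind == "one" and n == 0) or (kind == "le" and n < k):
            for w, c in graph.get(v, []):
                if c == color:
                    # For c+ only "at least one edge" matters, so the count is capped at 1.
                    nxt.append((w, i, 1 if kind == "plus" else n + 1))
        for s in nxt:
            if s not in seen:
                seen.add(s)
                queue.append(s)
    return hits


def eval_rq(graph, attrs, pred1, pred2, atoms):
    """Q_r(G): pairs (v1, v2) with pred1(v1), pred2(v2) and a matching path.
    attrs must contain every node of the graph."""
    return {(v1, v2)
            for v1 in attrs if pred1(attrs[v1])
            for v2 in rq_targets(graph, v1, atoms) if pred2(attrs[v2])}
```

Under this encoding the slide's example expression fa<=2 fn becomes [("fa", "le", 2), ("fn", "one", None)]; a biologist three fa-edges away from a doctor is not returned.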

9 Graph pattern queries
- A graph pattern query (PQ) is $Q_p = (V_p, E_p, f_v, f_e)$ where, for each edge $e = (u, u')$, $Q_e = (u, u', f_v(u), f_v(u'), f_e(e))$ is an RQ.
- $Q_p(G)$ is the maximum (and unique) set $\{(e, S_e)\}$, with $S_e \subseteq Q_e(G)$ for each $e \in E_p$, such that:
  - for any edges $e_1 = (u_1, u_2)$ and $e_2 = (u_2, u_3)$, if $(v_1, v_2) \in S_{e_1}$, then there is a $v_3$ with $(v_2, v_3) \in S_{e_2}$;
  - for any two edges $e_1 = (u_1, u_2)$ and $e_2 = (u_1, u_3)$, if $(v_1, v_2) \in S_{e_1}$, then there is a $v_3$ with $(v_1, v_3) \in S_{e_2}$.
- PQ vs. simulation: PQs add search conditions on query nodes, map edges to paths, and constrain the edges on each path with a regular expression. RQs and simulation are special cases of PQs.
[Figure: an example PQ with nodes Id=‘Alice’; Job=‘biologist’, sp=‘cloning’; and Job=‘doctors’, dsp=‘cloning’; edge labels include fa<=2, sa<=2, fn, sn and fa+.]
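Because the two conditions can only force pairs out, the maximum set can be obtained by starting from S_e = Q_e(G) for every pattern edge and repeatedly deleting pairs that violate a condition until nothing changes. Every set satisfying the conditions survives each deletion round, so the result is the unique maximum. A minimal sketch, assuming each Q_e(G) has already been computed; the function and parameter names are hypothetical.

```python
def max_pq_answer(pattern_edges, q):
    """Greatest {e: S_e} with S_e a subset of q[e] = Q_e(G) satisfying both conditions.

    pattern_edges: list of (u, u2) query edges
    q: query edge -> set of (v, v2) pairs (precomputed RQ answers)
    """
    S = {e: set(q[e]) for e in pattern_edges}
    out = {}  # query node -> query edges leaving it
    for e in pattern_edges:
        out.setdefault(e[0], []).append(e)

    def covered(v, u):
        # v has a partner on every query edge leaving u (vacuously true if none).
        return all(any(a == v for a, _ in S[e2]) for e2 in out.get(u, []))

    changed = True
    while changed:
        changed = False
        for e1 in pattern_edges:
            u1, u2 = e1
            # Condition 1 checks v2 against edges leaving u2;
            # condition 2 checks v1 against edges leaving u1.
            bad = {(v1, v2) for v1, v2 in S[e1]
                   if not (covered(v2, u2) and covered(v1, u1))}
            if bad:
                S[e1] -= bad
                changed = True
    return S
```

For a chain A -> B -> C, a pair (a2, b2) is removed from S_(A,B) as soon as b2 has no partner on the edge (B, C).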

10 Reachability and graph pattern queries: examples
[Figure: the RQ from Job=‘biologist’, sp=‘cloning’ to Job=‘doctors’ with edge expression fa<=2 fn, and the PQ of slide 9 around Id=‘Alice’, each evaluated over an Essembly graph with fa, fn, sn and sa edges.]

11 Fundamental problems: query containment
- A PQ $Q_1 = (V_1, E_1, f_{v1}, f_{e1})$ is contained in $Q_2 = (V_2, E_2, f_{v2}, f_{e2})$ if there exists a mapping $\lambda$ from $E_1$ to $E_2$ such that, for any data graph $G$ and any $e \in E_1$, $S_e$ in $Q_1(G)$ is a subset of $S_{\lambda(e)}$ in $Q_2(G)$; i.e., $\lambda$ is a renaming function under which $Q_1(G)$ is mapped into $Q_2(G)$.
- The query containment and equivalence problems can both be decided in cubic time.
- Query similarity, based on a revision of graph simulation, can also be determined in cubic time.
Query containment and equivalence for PQs can be solved efficiently.
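Comparing PQs requires comparing the regular expressions on their edges. The slide does not give the procedure, so the following is only a sketch of one ingredient: deciding whether every color string matched by one expression of class F is matched by another. It assumes c<=k matches 1 to k edges (k >= 1) and reuses a hypothetical (color, kind, k) atom encoding. Under that assumption, consecutive atoms of the same color match every run length between their summed bounds, so an expression reduces to a sequence of nonempty runs with distinct adjacent colors, and containment becomes an interval check per run.

```python
import math

def to_blocks(atoms):
    """Merge consecutive same-color atoms into runs (color, min_len, max_len).

    Assumes c matches 1 edge, c<=k matches 1..k edges (k >= 1) and c+ matches
    1 or more, so a run of same-color atoms matches every length between the
    summed bounds.
    """
    runs = []
    for color, kind, k in atoms:
        hi = 1 if kind == "one" else (k if kind == "le" else math.inf)
        if runs and runs[-1][0] == color:
            _, lo0, hi0 = runs[-1]
            runs[-1] = (color, lo0 + 1, hi0 + hi)
        else:
            runs.append((color, 1, hi))
    return runs


def expr_contained(a, b):
    """True iff every color string matched by expression a is matched by b."""
    ra, rb = to_blocks(a), to_blocks(b)
    # Adjacent runs differ in color and are nonempty, so runs must align one-to-one.
    return len(ra) == len(rb) and all(
        ca == cb and lb <= la and ha <= hb
        for (ca, la, ha), (cb, lb, hb) in zip(ra, rb))
```

For instance, h<=1 is contained in h<=3 but not vice versa, and fa<=2 fn is contained in fa+ fn.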

12 Query containment: example
[Figure: Q1 with nodes B1, C1, C2, C3 and edge labels h<=1, h<=2, h<=3; Q2 with nodes B2, C4 and edge label h<=1; Q3 with nodes B3, C5, C6 and edge label h<=3.]
Q2 is contained in both Q1 and Q3; Q1 and Q3 are equivalent.

13 Fundamental problems: query minimization
- Size of a query: $|V_p| + |E_p|$.
- Query minimization problem. Input: a PQ $Q_p$. Output: a minimized PQ $Q_m$ equivalent to $Q_p$.
- The query minimization problem can be solved in time cubic in the size of the query:
  1. compute the maximum node equivalence classes, based on a revision of graph simulation;
  2. determine the number of redundant nodes and edges from the equivalence classes;
  3. remove redundant and isolated nodes and edges.
Query minimization for PQs can be solved efficiently.

14 Query minimization: example
[Figure: three PQs Q1, Q2 and Q3 over nodes labeled R, B and C, with edge labels f, g, h<=2 and g<=3, illustrating the removal of redundant nodes and edges.]

15 Evaluating graph pattern queries
PQs can be answered in cubic time.
- Join-based algorithm JoinMatch:
  - matrix index vs. distance cache;
  - a join operation for each edge in the PQ, repeated until a fixpoint is reached (edges processed in reversed topological order).
- Split-based algorithm SplitMatch:
  - blocks: pattern nodes and data nodes are treated uniformly;
  - partition-relation pairs.
Graph pattern matching with PQs can be solved in polynomial time.
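The slide contrasts a matrix index with a distance cache without giving details. As one plausible reading, labeled as an assumption: precompute, for each color c, the shortest nonempty c-colored path lengths, so that a c<=k condition reduces to checking dist <= k and c+ to checking that some c-path exists. A sketch with hypothetical names, running one BFS per source node:

```python
from collections import deque
import math

def color_distances(graph, color):
    """dist[v][w] = length of the shortest nonempty path v -> w using only `color` edges.

    graph: node -> list of (successor, color). One BFS per source node.
    """
    dist = {}
    for src in graph:
        d, frontier = {}, deque()
        for w, c in graph[src]:
            if c == color and w not in d:
                d[w] = 1
                frontier.append(w)
        while frontier:
            v = frontier.popleft()
            for w, c in graph.get(v, []):
                if c == color and w not in d:
                    d[w] = d[v] + 1
                    frontier.append(w)
        dist[src] = d
    return dist


def within(dist, v, w, k=math.inf):
    """c<=k check (or c+ with the default k = infinity) via the cached distances."""
    d = dist.get(v, {}).get(w)
    return d is not None and d <= k
```

A node can reach itself (for example through an fa-cycle), which is why the BFS starts from the source's successors at distance 1 rather than from the source at distance 0.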

16 Example of JoinMatch
[Figure: the PQ of slide 9 (Id=‘Alice’; Job=‘biologist’, sp=‘cloning’; Job=‘doctors’, dsp=‘cloning’) over an Essembly graph with fa, fn, sn and sa edges.]
Step 1: identify the candidates for each query node.
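Step 1 amounts to evaluating each query node's search condition on every data node. A sketch assuming conditions are conjunctions of attribute equalities such as Job=‘doctors’ (other predicate forms are not covered); the function name is hypothetical.

```python
def initial_candidates(conditions, attrs):
    """Step 1: candidate data nodes for every query node.

    conditions: query node -> {attribute: required value} (equality-only, assumed)
    attrs:      data node  -> {attribute: value}
    """
    return {u: {v for v, a in attrs.items()
                if all(a.get(key) == val for key, val in req.items())}
            for u, req in conditions.items()}
```

With the slide's conditions, a biologist whose sp attribute is not ‘cloning’ is excluded already at this step.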

17-19 Example of JoinMatch
[Figure: the same PQ and Essembly graph, shown on three consecutive slides.]
Step 2: filter the candidate sets for each query edge.
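Step 2 can be pictured as a join per query edge: a data node v stays a candidate for u only if, for the edge e = (u, u'), some candidate v' of u' satisfies the edge's expression, and the joins repeat until no candidate set shrinks. A sketch, assuming a hypothetical callback `reach(e, v, v2)` that tests whether (v, v2) is in Q_e(G); it is not the talk's implementation.

```python
def refine_candidates(cand, pattern_edges, reach):
    """Step 2: one join per query edge, repeated until no candidate set changes.

    cand:          query node -> set of data nodes (a refined copy is returned)
    pattern_edges: list of query edges (u, u2)
    reach(e, v, v2) -> bool: hypothetical test of whether (v, v2) is in Q_e(G)
    """
    cand = {u: set(vs) for u, vs in cand.items()}
    changed = True
    while changed:
        changed = False
        # The slides process edges in reversed topological order; the order should
        # only affect the number of passes, since all orders reach the same fixpoint.
        for e in pattern_edges:
            u, u2 = e
            keep = {v for v in cand[u] if any(reach(e, v, v2) for v2 in cand[u2])}
            if keep != cand[u]:
                cand[u] = keep
                changed = True
    return cand
```

On a cyclic pattern A -> B -> A, removing a candidate of B can in turn remove a candidate of A on the next pass, which is why the loop runs to a fixpoint.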

20 Example of JoinMatch
[Figure: the same PQ and Essembly graph.]
Step 3: return the final result.
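Once the candidate sets are stable, step 3 can read off each S_e as the pairs of surviving candidates that satisfy the edge's expression. A sketch assuming `cand` holds the refined candidate sets and `reach(e, v, v2)` is a hypothetical callback testing membership in Q_e(G); under those assumptions the result should coincide with the maximum answer defined on slide 9.

```python
def assemble_answer(cand, pattern_edges, reach):
    """Step 3: S_e = pairs of surviving candidates that satisfy edge e's expression."""
    return {e: {(v, v2) for v in cand[e[0]] for v2 in cand[e[1]] if reach(e, v, v2)}
            for e in pattern_edges}
```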

21 Experimental results: effectiveness of PQs
Effectiveness of PQs: edge-to-path relations.

22 Experimental results: querying real-life graphs
The evaluation algorithms are sensitive to the number of pattern edges.
[Figure panels: varying |Vp|; varying |Ep|.]
Average query size: (|V|, |E|, |pred|, |c|, |b|) = (8, 15, 3, 4, 5).

23 Experimental results: querying real-life graphs
The algorithms are sensitive to the number of predicates.
[Figure panels: varying |pred|; varying b.]

24 Experimental results: querying synthetic graphs
The algorithms scale well over large synthetic graphs.
[Figure panels: varying |V| (×10^5); varying b.]

25 Experimental results: querying synthetic graphs
The algorithms scale well over large synthetic graphs.
[Figure panels: varying α; varying cr.]
Synthetic graph parameters: $|E| = |V|^{\alpha}$ and $|sim(u)| \le |V| \cdot cr$.
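The edge-count relation |E| = |V|^α is easy to reproduce. Below is a hypothetical generator in that spirit; the talk does not say how endpoints or colors were drawn, so uniform random endpoints, no self-loops or duplicate edges, and uniformly chosen colors are all assumptions.

```python
import random

def synthetic_graph(n, alpha, colors, seed=0):
    """Random directed graph with n nodes and int(n ** alpha) distinct colored edges."""
    rng = random.Random(seed)
    m = min(int(n ** alpha), n * (n - 1))  # cannot exceed the number of ordered pairs
    edges = set()
    while len(edges) < m:
        v, w = rng.randrange(n), rng.randrange(n)
        if v != w:  # skip self-loops
            edges.add((v, w))
    graph = {v: [] for v in range(n)}
    for v, w in edges:
        graph[v].append((w, rng.choice(colors)))
    return graph
```

For n = 100 and α = 1.2 this yields int(100 ** 1.2) = 251 edges over the four Essembly colors.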

26 Conclusion
Simulation revised for graph pattern matching:
- reachability queries and graph pattern queries
- query containment and minimization in cubic time
- query evaluation in cubic time
Future work:
- extending RQs and PQs to support general regular expressions
- incremental evaluation of RQs and PQs

27 Thank you! Q&A
[Figure: Terrorist Collaboration Network (1970-2010), with the quote “Those who were trained to fly didn’t know the others. One group of people did not know the other group.” (Bin Laden)]

