Published by Kayden Goodkind. Modified over 9 years ago.
1
A Distributed Vertex-Centric Approach for Pattern Matching in Massive Graphs
Arash Fard, M. Usman Nisar, Lakshmish Ramaswamy, John A. Miller, Matthew Saltz
Computer Science Department, University of Georgia, Athens, GA
The IEEE International Conference on Big Data 2013
October 7, 2013
2
Outline
  Introduction
  Subgraph Pattern Matching
  Types of Subgraph Pattern Matching
  Strict Simulation
  Distributed Algorithms
  Experiments
3
Introduction
Labeled directed graph G(V, E, l)
  Versatile and expressive
  Social networks, web search engines, genome sequencing, etc.
Data sizes are growing rapidly
  Facebook: 1 billion+ users, average degree of 140
  Twitter: 200 million+ users, 400 million tweets each day
  Bigger datasets mean bigger graphs
Pattern (query graph): the small graph of interest
Data graph: the massive graph that contains the information
Pattern matching: finding the instances of the pattern in the data graph
4
Subgraph Pattern Matching
There are different paradigms for pattern matching:
  Exact topological matching
    Subgraph isomorphism
  Simulation matching (meaning of the relations)
    Graph simulation
    Dual simulation
    Strong simulation
    Strict simulation
[Figure: a pattern graph and a data graph over the people John, Ann, Sara, Mary, Bob, and Alice, connected by "endorsed by" edges]
SD: Software Developer; Bio: Biologist; PM: Product Manager
5
Example: Graph Simulation
[Figure: pattern graph with vertices labeled PM, SA, DB, AI; data graph with labeled vertices numbered 1 to 24]
PM: Product Manager; SA: Software Analyst; DB: Database Designer; AI: AI Specialist; SD: Software Developer
Label and children conditions
Quadratic time complexity
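The label and children conditions can be illustrated with a small sequential fixpoint computation. This is a minimal sketch under assumptions, not the authors' implementation: the function name `graph_simulation` and the dict and edge-list encoding of the graphs are hypothetical choices made for the example.

```python
def graph_simulation(q_labels, q_edges, g_labels, g_edges):
    """Return, for each pattern vertex, the set of data vertices that simulate it.

    q_labels / g_labels map vertex id -> label; q_edges / g_edges are lists of
    (source, target) pairs. This encoding is an assumption made for the sketch.
    """
    q_children = {u: set() for u in q_labels}
    for u, u2 in q_edges:
        q_children[u].add(u2)
    g_children = {v: set() for v in g_labels}
    for v, w in g_edges:
        g_children[v].add(w)

    # Label condition: start from every data vertex carrying the same label.
    sim = {u: {v for v, lab in g_labels.items() if lab == q_labels[u]}
           for u in q_labels}

    # Children condition: repeatedly drop v from sim[u] when some pattern
    # child u2 of u has no counterpart among the children of v.
    changed = True
    while changed:
        changed = False
        for u in q_labels:
            for u2 in q_children[u]:
                keep = {v for v in sim[u] if g_children[v] & sim[u2]}
                if keep != sim[u]:
                    sim[u] = keep
                    changed = True

    # If any pattern vertex is left without a match, the pattern does not match.
    if any(not matches for matches in sim.values()):
        return {u: set() for u in q_labels}
    return sim
```

For a hypothetical pattern PM -> SA -> DB and a data graph with edges 1 -> 2, 2 -> 3, 5 -> 4 (labels 1: PM, 2: SA, 3: DB, 4: SA, 5: PM), vertex 4 is removed because it has no DB child, and vertex 5 is then removed because its only SA child is gone.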
6
Example: Dual Simulation
[Figure: pattern graph with vertices labeled PM, SA, DB, AI; data graph with labeled vertices numbered 1 to 24]
PM: Product Manager; SA: Software Analyst; DB: Database Designer; AI: AI Specialist; SD: Software Developer
Adds parents condition
Cubic time complexity
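The added parents condition can be sketched by checking each pattern edge in both directions. As before, this is an illustrative sequential sketch, not the authors' code; the name `dual_simulation` and the graph encoding are assumptions.

```python
def dual_simulation(q_labels, q_edges, g_labels, g_edges):
    """Graph simulation plus the parents condition (hypothetical encoding:
    labels as dicts, edges as lists of (source, target) pairs)."""
    g_children = {v: set() for v in g_labels}
    g_parents = {v: set() for v in g_labels}
    for v, w in g_edges:
        g_children[v].add(w)
        g_parents[w].add(v)

    # Label condition.
    sim = {u: {v for v, lab in g_labels.items() if lab == q_labels[u]}
           for u in q_labels}

    changed = True
    while changed:
        changed = False
        for u, u2 in q_edges:
            # Children condition: a match of u needs a child that matches u2.
            keep = {v for v in sim[u] if g_children[v] & sim[u2]}
            if keep != sim[u]:
                sim[u] = keep
                changed = True
            # Parents condition: a match of u2 needs a parent that matches u.
            keep = {w for w in sim[u2] if g_parents[w] & sim[u]}
            if keep != sim[u2]:
                sim[u2] = keep
                changed = True

    if any(not matches for matches in sim.values()):
        return {u: set() for u in q_labels}
    return sim
```

In a hypothetical data graph with edges 1 -> 2, 2 -> 3, 4 -> 3 (labels 1: PM, 2: SA, 3: DB, 4: SA) and pattern PM -> SA -> DB, vertex 4 satisfies the children condition but has no PM parent, so dual simulation drops it while plain graph simulation would keep it.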
7
Strong Simulation
Extends dual simulation with a locality condition
A ball G_b[v, r] is a subgraph of a graph G containing:
  all vertices V_b within an undirected distance r of the vertex v
  all the edges between the vertices in V_b
S. Ma, Y. Cao, W. Fan, J. Huai, and T. Wo, “Capturing topology in graph pattern matching,” Proc. VLDB Endow., vol. 5, no. 4, pp. 310–321, Dec. 2011.
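The ball definition maps directly onto a breadth-first search that ignores edge direction. The sketch below is one possible way to extract a ball; the function name `ball` and the edge-list encoding are assumptions for illustration.

```python
from collections import deque


def ball(g_edges, center, radius):
    """Return (vertices, edges) of the ball around `center`.

    g_edges is a list of directed (source, target) pairs; distance is
    measured on the undirected version of the graph.
    """
    neighbors = {}
    for v, w in g_edges:
        neighbors.setdefault(v, set()).add(w)
        neighbors.setdefault(w, set()).add(v)

    # Breadth-first search that stops expanding at distance `radius`.
    dist = {center: 0}
    queue = deque([center])
    while queue:
        v = queue.popleft()
        if dist[v] == radius:
            continue
        for w in neighbors.get(v, ()):
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)

    vertices = set(dist)
    # Keep every original (directed) edge whose endpoints are both in the ball.
    edges = [(v, w) for v, w in g_edges if v in vertices and w in vertices]
    return vertices, edges
```

For the hypothetical edge list `[(1, 2), (2, 3), (3, 4), (5, 1)]`, the ball around vertex 2 with radius 1 contains vertices {1, 2, 3} and the edges (1, 2) and (2, 3).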
8
Example: Strong Simulation
[Figure: pattern graph with vertices labeled PM, SA, DB, AI; data graph with labeled vertices numbered 1 to 24]
PM: Product Manager; SA: Software Analyst; DB: Database Designer; AI: AI Specialist; SD: Software Developer
Adds locality condition
Cubic time complexity
9
Example: Subgraph Isomorphism
[Figure: pattern graph with vertices labeled PM, SA, DB, AI; data graph with labeled vertices numbered 1 to 24]
PM: Product Manager; SA: Software Analyst; DB: Database Designer; AI: AI Specialist; SD: Software Developer
2 subgraph results: {4, 6, 7, 8} and {5, 6, 7, 8}
Exact topological match
NP-complete in the general case
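For contrast with the simulation models, exact matching can be sketched as a brute-force search over injective vertex assignments. This is only an illustration of why the problem is expensive, not a practical algorithm and not the authors' code; the name `subgraph_isomorphisms` and the graph encoding are assumptions, and the check requires every pattern edge to be present without forbidding extra data edges.

```python
from itertools import permutations


def subgraph_isomorphisms(q_labels, q_edges, g_labels, g_edges):
    """Enumerate injective, label- and edge-preserving mappings from the
    pattern into the data graph by exhaustive search."""
    g_edge_set = set(g_edges)
    q_vertices = list(q_labels)
    matches = []
    # Try every ordered choice of distinct data vertices: exponential cost.
    for candidate in permutations(g_labels, len(q_vertices)):
        mapping = dict(zip(q_vertices, candidate))
        labels_ok = all(g_labels[mapping[u]] == q_labels[u] for u in q_vertices)
        edges_ok = all((mapping[u], mapping[u2]) in g_edge_set for u, u2 in q_edges)
        if labels_ok and edges_ok:
            matches.append(mapping)
    return matches
```

With a hypothetical pattern PM -> SA, PM -> DB and data edges 1 -> 2, 1 -> 3, 1 -> 4 (labels 1: PM, 2: SA, 3: DB, 4: SA), the search finds two matches, on the vertex sets {1, 2, 3} and {1, 3, 4}.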
10
Strict Simulation
Strict Simulation is a smart modification of Strong Simulation
  Smaller balls
  Faster computation
  The same quality of results or better
Cubic time complexity
11
Strict vs Strong Simulation
12
Example: Strict vs Strong Simulation
d_Q = 2
Strong Simulation: G_b[2, 2] contains all vertices
Strict Simulation: G_b[2, 2] contains {2, 1, 10, 3, 4}
Blue and white vertices are in the result of Strong Simulation
White vertices are in the result of Strict Simulation
13
Vertex-Centric BSP
Bulk Synchronous Parallel
Vertex-centric
  Each vertex of the graph is a computing unit
  Each vertex initially knows only its id, label, and outgoing edges
  The algorithm terminates when every vertex votes to halt
GPS, a free implementation of Pregel, developed at InfoLab, Stanford University
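The superstep and vote-to-halt mechanics can be mimicked in a few lines of single-process Python. This is a toy model of the vertex-centric BSP idea, not the GPS or Pregel API: `run_bsp`, `propagate_max`, and the `compute` signature are all hypothetical names invented for the sketch. It follows the usual Pregel convention that a halted vertex becomes active again when it receives a message.

```python
def run_bsp(states, out_edges, compute, max_supersteps=100):
    """Run `compute` on every active vertex, superstep by superstep.

    compute(state, messages, superstep, targets) must return
    (new_state, [(target, message), ...], votes_to_halt).
    """
    inbox = {v: [] for v in states}
    active = set(states)  # every vertex is active in the first superstep
    superstep = 0
    while active and superstep < max_supersteps:
        outbox = {v: [] for v in states}
        halted = set()
        for v in active:
            states[v], outgoing, votes_to_halt = compute(
                states[v], inbox[v], superstep, out_edges.get(v, []))
            for target, message in outgoing:
                outbox[target].append(message)
            if votes_to_halt:
                halted.add(v)
        # Synchronization barrier: messages become visible in the next superstep.
        inbox = outbox
        # A halted vertex is reactivated only by an incoming message.
        active = (active - halted) | {v for v, msgs in inbox.items() if msgs}
        superstep += 1
    return states


def propagate_max(value, messages, superstep, targets):
    """Example vertex program: spread the largest value through the graph."""
    new_value = max([value, *messages])
    if superstep == 0 or new_value > value:
        outgoing = [(t, new_value) for t in targets]
    else:
        outgoing = []
    return new_value, outgoing, True  # always vote to halt
```

On a hypothetical three-vertex cycle with initial values 3, 6, and 2, `run_bsp({1: 3, 2: 6, 3: 2}, {1: [2], 2: [3], 3: [1]}, propagate_max)` ends with every vertex holding 6 once no messages remain in flight.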
14
Distributed Algorithm for Graph Simulation
Initialization:
  Pattern is received from the Master
  Set the match flag to true
match flag: indicates whether the vertex is a candidate member of the result match set
matchSet: the set of vertices in Q that the vertex is a candidate match for

Vertex  match flag  matchSet
G1      True        Empty
G2      True        Empty
G3      True        Empty
G4      True        Empty
G5      True        Empty
G6      True        Empty
G7      True        Empty
G8      True        Empty

[Figure: pattern graph with vertices Q1, Q2, Q3 and data graph with vertices G1 to G8; vertex labels drawn from a to f]
15
Distributed Algorithm for Graph Simulation
1st Superstep

Vertex  match flag  matchSet
G1      False       Empty
G2      True        Q2
G3      True        Q1
G4      True        Q3
G5      True        Q2
G6      True        Q3
G7      False       Empty
G8      False       Empty

[Figure: pattern graph with vertices Q1, Q2, Q3 and data graph with vertices G1 to G8; vertex labels drawn from a to f]
16
Distributed Algorithm for Graph Simulation
2nd Superstep

Vertex  match flag  matchSet
G1      False       Empty
G2      True        Q2
G3      True        Q1
G4      True        Q3
G5      True        Q2
G6      True        Q3
G7      False       Empty
G8      False       Empty

[Figure: pattern graph with vertices Q1, Q2, Q3 and data graph with vertices G1 to G8; vertex labels drawn from a to f]
17
Distributed Algorithm for Graph Simulation
3rd and 4th Supersteps

Vertex  match flag  matchSet
G1      False       Empty
G2      True        Q2
G3      True        Q1
G4      True        Q3
G5      False       Empty
G6      True        Q3
G7      False       Empty
G8      False       Empty

[Figure: pattern graph with vertices Q1, Q2, Q3 and data graph with vertices G1 to G8; vertex labels drawn from a to f]
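The superstep tables can be approximated by a single-process simulation of the message exchange. This is a simplified sketch and an assumption about how the steps fit together, not the GPS implementation: superstep 1 applies the label condition and lets matched vertices contact their children, superstep 2 lets children report their matchSet back, and from superstep 3 on each vertex re-evaluates the children condition and notifies its parents of any change. The function name, the message format, and the pattern and data edges used in the usage note are hypothetical, since the transcript does not preserve the figure.

```python
def vertex_centric_graph_simulation(q_labels, q_edges, g_labels, g_edges):
    """Return (matchSet per data vertex, number of supersteps used)."""
    q_children = {u: set() for u in q_labels}
    for u, u2 in q_edges:
        q_children[u].add(u2)
    out_edges = {v: [] for v in g_labels}
    for v, w in g_edges:
        out_edges[v].append(w)

    match_set = {v: set() for v in g_labels}   # local state of each vertex
    parents = {v: set() for v in g_labels}     # learned from superstep-1 messages
    child_sets = {v: {} for v in g_labels}     # child id -> last reported matchSet

    def evaluate(v):
        """Apply the children condition at v; return True if matchSet shrank."""
        keep = {u for u in match_set[v]
                if all(any(u2 in reported for reported in child_sets[v].values())
                       for u2 in q_children[u])}
        changed = keep != match_set[v]
        match_set[v] = keep
        return changed

    inbox = {v: [] for v in g_labels}
    superstep = 0
    while True:
        superstep += 1
        outbox = {v: [] for v in g_labels}
        for v in g_labels:
            messages = inbox[v]
            if superstep == 1:
                # Label condition, then contact children.
                match_set[v] = {u for u in q_labels if q_labels[u] == g_labels[v]}
                if match_set[v]:
                    for w in out_edges[v]:
                        outbox[w].append((v, None))
            elif superstep == 2:
                # Remember who asked and report the current matchSet.
                for sender, _ in messages:
                    parents[v].add(sender)
                    if match_set[v]:
                        outbox[sender].append((v, frozenset(match_set[v])))
            elif superstep == 3 or messages:
                # Record children's reports, re-evaluate, notify parents on change.
                for sender, reported in messages:
                    child_sets[v][sender] = reported
                if evaluate(v):
                    for p in parents[v]:
                        outbox[p].append((v, frozenset(match_set[v])))
        inbox = outbox
        if superstep >= 3 and not any(inbox.values()):
            return match_set, superstep
```

With a hypothetical pattern Q1 -> Q2 -> Q3 labeled a, b, c, data labels G1: d, G2: b, G3: a, G4: c, G5: b, G6: c, G7: e, G8: f, and data edges G3 -> G2, G2 -> G4, G5 -> G7, G6 -> G8, the sketch ends with the same flags as the last table: G5 loses Q2 in superstep 3 because none of its children matches Q3. The global check that every pattern vertex keeps at least one match is left out of the sketch.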
18
Other Distributed Graph Algorithms
Dual simulation
  In addition to storing the match sets of its children, a vertex also stores the match sets of its parents
  During match set evaluation, a vertex takes both of these into account
Strong simulation
  Run dual simulation to obtain match relation R_d
  For each matching vertex v in R_d:
    Create a ball centered at v with radius d_Q containing any vertex in G
    Perform dual simulation on the ball
Strict simulation
  Run dual simulation to obtain match relation R_d
  For each matching vertex v in R_d:
    Create a ball centered at v with radius d_Q containing only vertices in R_d
    Perform dual simulation on the ball
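The only difference between the strong and strict procedures above is which vertices a ball may contain. A minimal sketch of that difference, assuming the dual-simulation matches are already available as a set: `ball_vertices` and its `allowed` parameter are hypothetical names, and restricting the search to the subgraph induced by the matched vertices is a simplification chosen for the example.

```python
from collections import deque


def ball_vertices(g_edges, center, radius, allowed=None):
    """Vertices within undirected distance `radius` of `center`.

    allowed=None  -> ball over the whole graph (strong simulation).
    allowed=set   -> ball restricted to those vertices (strict simulation).
    """
    neighbors = {}
    for v, w in g_edges:
        # In the restricted case, ignore edges that touch an unmatched vertex.
        if allowed is not None and (v not in allowed or w not in allowed):
            continue
        neighbors.setdefault(v, set()).add(w)
        neighbors.setdefault(w, set()).add(v)

    dist = {center: 0}
    queue = deque([center])
    while queue:
        v = queue.popleft()
        if dist[v] == radius:
            continue
        for w in neighbors.get(v, ()):
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return set(dist)
```

On a hypothetical path 1 -> 2 -> 3 -> 4 -> 5 where dual simulation matched {1, 2, 4, 5}, the radius-3 ball around vertex 1 is {1, 2, 3, 4} for strong simulation but only {1, 2} when restricted to matched vertices, which is why strict simulation runs its second dual-simulation pass on smaller balls.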
19
Experiments
System
  Cluster of 12 machines
  128 GB RAM
  Two 2GHz Intel Xeon E-2620
  GPS (developed at InfoLab, Stanford University)
Datasets
  Synthesized graphs: |V| = 100e6, |E| = |V|^α, α = 1.2
  Real-world graphs
    uk-2002: |V| = 18e6, |E| = 298e6
    ljournal: |V| = 5e6, |E| = 79e6
    Amazon-2008: |V| = 7e5, |E| = 5e6
    http://law.di.unimi.it/datasets.php
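To get a feel for the size of the synthesized graphs, the edge-count relation |E| = |V|^α can be evaluated directly. The helper name below is hypothetical; the numbers are the ones listed for the synthesized dataset.

```python
def synthesized_edge_count(num_vertices, alpha):
    """Edge count implied by the relation |E| = |V| ** alpha."""
    return num_vertices ** alpha


num_vertices = 100e6
alpha = 1.2
num_edges = synthesized_edge_count(num_vertices, alpha)
average_out_degree = num_edges / num_vertices
print(f"|E| ~ {num_edges:.3g}, average out-degree ~ {average_out_degree:.3g}")
```

For |V| = 100e6 and α = 1.2 this gives roughly 4 billion edges, or an average out-degree of about 40.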
20
Experiment 1: Comparison of Strong and Strict Simulation
[Figure panels:]
  Synthesized (|V| = 1e6, α = 1.2)
  Amazon-2008 (|V| = 7e5, |E| = 5e6)
  Amazon-2008 (|V| = 7e5, |V_q| = 20)
21
Experiment 2: Running time
[Figure panels, with l = 200 and |V_q| = 20:]
  uk-2002-hc (|V| = 18e6, |E| = 298e6)
  Synthesized (|V| = 1e8, α = 1.2, |E| = |V|^α)
Quality of result: Graph Simulation < Dual Simulation < Strict Simulation
22
Experiment 3: Impact of dataset
[Figure panels, with l = 200 and |V_q| = 20:]
  Synthesized (|V| = 100e6)
  Synthesized (α = 1.2, |E| = |V|^α)
23
Experiment 4: Impact of pattern
[Figure panels:]
  uk-2002-hc (|V| = 18e6, |E| = 298e6)
  Synthesized (|V| = 100e6, α = 1.2, |E| = |V|^α)
24
Related work
Pattern matching models
  P-homomorphism
  Graph Simulation, Dual and Strong
  Bounded simulation
Distributed algorithms for pattern matching
  S. Ma, Y. Cao, J. Huai, and T. Wo, “Distributed graph pattern matching,” WWW 2012.
  Z. Sun, H. Wang, H. Wang, B. Shao, and J. Li, “Efficient subgraph matching on billion node graphs,” VLDB 2012.
25
Conclusion
Introducing a new pattern matching model
Design, implementation, and test of 4 distributed algorithms for pattern matching
  Graph Simulation
  Dual Simulation
  Strong Simulation
  Strict Simulation
26
Questions?
Thanks