Lu Qin, Center of Quantum Computation and Intelligent Systems, University of Technology, Australia
Jeffery Xu Yu, The Chinese University of Hong Kong, China
Lijun Chang, The University of New South Wales, Australia
Hong Cheng, The Chinese University of Hong Kong, China
Xuemin Lin, The University of New South Wales, Australia, and East China Normal University, China
Mahmoud Agbareya, 13 January 2015
Agenda
- Introduction
- The Scalable Graph Processing Class (SGC)
- SGC Algorithms
- Performance Studies
Introduction
- What is MapReduce?
- MapReduce Class (MRC)
- Minimal MapReduce Class (MMC)
What is MapReduce
- A programming model for processing large data sets in distributed systems.
- Data is processed as (key, value) pairs.
- A job may execute in multiple rounds.
- Each round has three phases: map, shuffle, and reduce.
- Each round runs on many machines; each machine is dedicated to one task (map or reduce).
- Introduced by two researchers from Google.
- The most popular implementation is Hadoop.
What is MapReduce (cont.): Example
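The slide's original example did not survive extraction. As a stand-in, the sketch below simulates one MapReduce round (map, shuffle, reduce) in plain Python using word count. The function names are illustrative only and are not part of Hadoop or any other MapReduce API.

```python
from collections import defaultdict


def map_phase(documents):
    """Map: emit a (word, 1) pair for every word in every document."""
    for doc in documents:
        for word in doc.split():
            yield word.lower(), 1


def shuffle_phase(pairs):
    """Shuffle: group all values that share the same key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: combine the values of each key into a single result."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(documents):
    """Run one simulated round: map, then shuffle, then reduce."""
    return reduce_phase(shuffle_phase(map_phase(documents)))
```

In a real deployment the three phases run on different machines; here they run in one process only to show the data flow.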
MapReduce Class (MRC): Definition
MapReduce Class (MRC), graph version: Definition
Minimal MapReduce Class (MMC): Definition
The Scalable Graph Processing Class (SGC)
- Motivation
- Preliminaries
- SGC Definition
- Two graph operators in SGC: NE Join, EN Join
Motivation
We aim to define a MapReduce class in which a graph algorithm has the following three properties:
- Scalability: the algorithm can always be sped up by adding more machines.
- Stability: the algorithm stops in a bounded number of rounds.
- Robustness: the algorithm never fails, regardless of how much memory each machine has.
Preliminaries
Scalable Graph Processing Class (SGC): Definition
Two graph operators in SGC
NE Join
NE Join in MapReduce
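The slide content for this operator was lost. The sketch below assumes that NE Join (node-edge join) propagates a value stored on each node to that node's outgoing edges; this reading, the record layout, and the name `ne_join` are assumptions made for illustration. Both inputs are keyed by the joining node id, grouped in a simulated shuffle, and joined in the reducer.

```python
from collections import defaultdict


def ne_join(nodes, edges):
    """Attach each node's value to its outgoing edges.

    nodes: dict {node_id: value}
    edges: list of (u, v) pairs
    Returns a list of (u, v, value_of_u) triples.
    """
    # Map: key node records and edge records by the joining node id.
    keyed = [(u, ("N", value)) for u, value in nodes.items()]
    keyed += [(u, ("E", v)) for u, v in edges]

    # Shuffle: group records by key.
    groups = defaultdict(list)
    for key, record in keyed:
        groups[key].append(record)

    # Reduce: copy the node value onto every edge in the same group.
    joined = []
    for u, records in groups.items():
        node_values = [x for tag, x in records if tag == "N"]
        if not node_values:
            continue  # edge source without a node record: nothing to join
        for tag, v in records:
            if tag == "E":
                joined.append((u, v, node_values[0]))
    return joined
```

This version holds every group in memory, so it ignores the memory constraints that the robustness property is concerned with.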
EN Join
EN Join in MapReduce
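The slide content for this operator was also lost. The sketch below assumes that EN Join (edge-node join) aggregates values carried on edges into the node each edge points to; this reading, the record layout, and the name `en_join` are assumptions made for illustration.

```python
from collections import defaultdict
from functools import reduce


def en_join(node_ids, edge_records, combine):
    """Aggregate edge values into their destination nodes.

    node_ids: iterable of node ids
    edge_records: list of (u, v, value) triples
    combine: binary function used to fold the values, e.g. min
    Returns {node_id: aggregated value} for nodes with incoming edges.
    """
    # Map + shuffle: key each edge value by its destination node.
    groups = defaultdict(list)
    for _u, v, value in edge_records:
        groups[v].append(value)

    # Reduce: fold the values that arrived at each node.
    return {v: reduce(combine, groups[v]) for v in node_ids if v in groups}
```

For example, passing `min` as `combine` keeps the smallest value that reaches each node, which is the kind of aggregation a label-propagation step would use.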
SGC Algorithms
- Basic graph algorithms: Breadth First Search, PageRank, Graph Keyword Search
- Advanced algorithms: Connected Component, Minimum Spanning Forest
Breadth First Search
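The slide body is missing, so the following is only a generic round-based breadth first search, not necessarily the formulation used in the talk. Each loop iteration stands for one round: distances on the current frontier are pushed along outgoing edges, and unvisited targets form the next frontier. The name `bfs_rounds` and the adjacency-dict input are assumptions.

```python
def bfs_rounds(adj, source):
    """Return {node: hop distance from source} for all reachable nodes.

    adj: dict {node: list of out-neighbors}
    """
    dist = {source: 0}
    frontier = {source}
    while frontier:
        # One round: propagate distances from the frontier to its neighbors.
        candidates = {}
        for u in frontier:
            for v in adj.get(u, ()):
                if v not in dist:
                    candidates[v] = dist[u] + 1
        dist.update(candidates)
        frontier = set(candidates)
    return dist
```

The number of rounds equals the largest hop distance from the source plus one, so the loop always terminates on a finite graph.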
PageRank
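As the slide body is missing, the sketch below shows the textbook iterative PageRank rather than the talk's exact formulation. It assumes a damping factor of 0.85, a fixed number of iterations, and that every node has at least one out-neighbor and appears as a key of `adj`; all three are assumptions. Each iteration sends rank shares along edges and then sums the shares arriving at each node.

```python
def pagerank(adj, damping=0.85, iterations=20):
    """Return {node: rank} after a fixed number of iterations.

    adj: dict {node: non-empty list of out-neighbors}
    """
    nodes = list(adj)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        # Propagate: each node splits its rank evenly over its out-edges.
        incoming = {v: 0.0 for v in nodes}
        for u in nodes:
            share = rank[u] / len(adj[u])
            for v in adj[u]:
                incoming[v] += share
        # Aggregate: combine the received shares with the teleport term.
        rank = {v: (1 - damping) / n + damping * incoming[v] for v in nodes}
    return rank
```

Fixing the number of iterations keeps the number of rounds bounded, in line with the stability property.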
Graph Keyword Search
Connected Component
- Forest Initializing
- Conditional Star Hooking
- Unconditional Star Hooking
- Pointer Jumping
- Star Detection Procedure
Connected Component: Forest Initializing
- Line 1: find the minimum neighbor of each node and set it as the node's parent.
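A minimal sketch of the forest initialization step follows. It departs slightly from the wording above: a node is allowed to pick itself when its own id is smaller than all neighbor ids, an assumption made here so that the parent pointers cannot form a cycle and roots are exactly the nodes that point to themselves. Node ids are assumed to be comparable, and `init_forest` is a hypothetical name.

```python
def init_forest(adj):
    """Return {node: parent}, where parent is the smallest id among
    the node itself and its neighbors.

    adj: dict {node: list of neighbors} for an undirected graph
    """
    return {v: min([v, *neighbors]) for v, neighbors in adj.items()}
```

Because every parent id is at most the child's id, following parent pointers always reaches a root.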
Connected Component: Star Detection
- Rules to detect that a node is not in a star (applied in order)
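The rules themselves did not survive extraction. The sketch below uses the classic star detection procedure from parallel connected-component algorithms, which may differ in detail from the slide's rules. It assumes roots point to themselves; a star is a tree in which every non-root node is a child of the root. `detect_stars` is a hypothetical name.

```python
def detect_stars(parent):
    """Return {node: True if the node belongs to a star, else False}.

    parent: dict {node: parent}, with parent[root] == root
    """
    # Rule 1: assume every node is in a star.
    star = {v: True for v in parent}
    # Rule 2: a node whose parent differs from its grandparent is not in
    # a star, and neither is that grandparent.
    for v in parent:
        grand = parent[parent[v]]
        if parent[v] != grand:
            star[v] = False
            star[grand] = False
    # Rule 3: a node is not in a star if its parent is not in a star.
    return {v: star[v] and star[parent[v]] for v in parent}
```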
Connected Component: Conditional Star Hooking (inside the loop)
- After Conditional Star Hooking, it is guaranteed that there are no edges between two stars.
Connected Component: Unconditional Star Hooking (inside the loop)
Connected Component: Pointer Jumping (inside the loop)
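Pointer jumping, as commonly defined, replaces each node's parent with its grandparent, which roughly halves the depth of every tree per application. The sketch below shows one such step under the assumption that roots point to themselves; `pointer_jump` is a hypothetical name.

```python
def pointer_jump(parent):
    """Return a new parent map in which each node points to its grandparent.

    parent: dict {node: parent}, with parent[root] == root
    """
    # All reads use the old map, mimicking a synchronous parallel update.
    return {v: parent[parent[v]] for v in parent}
```

Applying the step repeatedly until the map stops changing leaves every node pointing directly at the root of its tree.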
Minimum Spanning Forest
Minimum Spanning Forest
- Forest Initializing
- Cycle Breaking
- Edge Hooking
- Pointer Jumping
Minimum Spanning Forest: Forest Initialization
Minimum Spanning Forest: Cycle Breaking
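The details of these two steps were lost in extraction. The sketch below assumes a Borůvka-style reading: forest initialization makes each node point along its minimum-weight incident edge, and cycle breaking turns the smaller-id node of every two-node cycle into a root. It also assumes distinct edge weights, under which only two-node cycles can occur. The name `msf_init` and the input layout are hypothetical.

```python
def msf_init(edges):
    """Return {node: parent} after forest initialization and cycle breaking.

    edges: list of (u, v, weight) for an undirected graph, distinct weights
    """
    # Forest initialization: each node picks its lightest incident edge.
    best = {}
    for u, v, w in edges:
        for a, b in ((u, v), (v, u)):
            if a not in best or w < best[a][0]:
                best[a] = (w, b)
    parent = {v: neighbor for v, (_w, neighbor) in best.items()}

    # Cycle breaking: if two nodes point at each other, the one with the
    # smaller id becomes the root of the merged tree.
    for v in list(parent):
        p = parent[v]
        if parent[p] == v and v < p:
            parent[v] = v
    return parent
```

Every edge selected this way belongs to the minimum spanning forest; edges between the resulting trees are left for the later hooking step.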
Minimum Spanning Forest: Pointer Jumping
Minimum Spanning Forest: Edge Hooking
Performance Studies