Graph Re-partitioning
Large graph processing
Graph Data Management Lab, School of Computer Science
The 12th International Conference of Data Engineering
Motivation of the Problem
In a distributed shared-memory system for processing huge graphs, loading the graph into memory always induces a partition of the vertices, because each vertex is hashed to one machine.
To reduce the cost of cross-machine access, we need to refine this partition.
However, moving a vertex is also costly, and this cost cannot be ignored.
Related Work
Pure partitioning of a graph (splitting it into balanced parts with few cut edges) is NP-hard.
Problem Definition
Input: a graph $G = (V, E)$, a positive integer $m$, and an initial many-to-one mapping $R': V \to M$, where $M = \{1, 2, \dots, m\}$.
Problem: find a many-to-one mapping $R: V \to M$ such that:
(1) the number of cross-volume edges (edges whose endpoints lie on different machines) is minimized, i.e. minimize $|\{(u, v) \in E : R(u) \neq R(v)\}|$;
(2) the number of moved vertices is minimized, i.e. minimize $|\{v \in V : R(v) \neq R'(v)\}|$.
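The two objectives are simple counts, so they are easy to evaluate for any candidate mapping. A minimal sketch, assuming mappings are stored as Python dicts from vertex to machine id; `cross_volume_edges`, `moved_vertices` and `R_init` (standing for R') are hypothetical names chosen for illustration.

```python
from typing import Dict, Hashable, Iterable, Tuple


def cross_volume_edges(edges: Iterable[Tuple[Hashable, Hashable]],
                       R: Dict[Hashable, int]) -> int:
    """Objective (1): count edges (u, v) whose endpoints are mapped to different machines."""
    return sum(1 for u, v in edges if R[u] != R[v])


def moved_vertices(R_init: Dict[Hashable, int], R: Dict[Hashable, int]) -> int:
    """Objective (2): count vertices whose machine under R differs from the initial R'."""
    return sum(1 for v, m in R_init.items() if R[v] != m)


if __name__ == "__main__":
    edges = [(1, 2), (2, 3), (3, 1), (3, 4)]
    R_init = {1: 1, 2: 2, 3: 1, 4: 2}   # R', e.g. as produced by hashing
    R_new = {1: 1, 2: 1, 3: 1, 4: 2}    # candidate R: vertex 2 moved to machine 1
    print(cross_volume_edges(edges, R_init))  # 3
    print(cross_volume_edges(edges, R_new))   # 1
    print(moved_vertices(R_init, R_new))      # 1
```

In this toy instance, moving vertex 2 removes two cross-volume edges at the price of one moved vertex.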
Baseline Solution
A simple greedy algorithm: consider a vertex in machine M1; its neighbors may be distributed over different machines. If most of its neighbors are not in M1, the vertex needs to be moved.
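The greedy test for a single vertex can be sketched as follows. This reads "most of its neighbors" as a strict majority, which is an assumption; the function only decides *whether* to move, not where to. `should_move` and the dict-based adjacency are hypothetical names for illustration.

```python
from typing import Dict, List


def should_move(v: int, adj: Dict[int, List[int]], R: Dict[int, int]) -> bool:
    """True if strictly more than half of v's neighbors are on other machines than R[v]."""
    neighbors = adj.get(v, [])
    local = sum(1 for u in neighbors if R[u] == R[v])
    # remote > len / 2  <=>  local < len / 2  <=>  2 * local < len
    return 2 * local < len(neighbors)


if __name__ == "__main__":
    adj = {1: [2, 3, 4], 2: [1], 3: [1], 4: [1]}
    R = {1: 1, 2: 2, 3: 2, 4: 1}
    print(should_move(1, adj, R))  # True: only neighbor 4 shares machine 1
    print(should_move(4, adj, R))  # False: its only neighbor is local
```

An isolated vertex is never moved, since `2 * 0 < 0` is false.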
Algorithm Framework
1: for each machine M:
2:   for each vertex v in M:
3:     find Mv, the machine holding the largest number of neighbors of v
4:     if Mv != M:
5:       add (Mv, v) to the moving buffer
6:     end if
7:   end for
8: end for
9: sort the (Mv, v) pairs
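Steps 1-9 translate almost line by line into code. A minimal sketch, assuming an undirected graph with integer vertex and machine ids; the tie-breaking (stay put if the current machine is among the best, otherwise pick the smallest machine id) and sorting by (target machine, vertex) are our assumptions, as the slide does not specify them. All decisions read the initial mapping `R`, which is not updated while scanning. `adjacency` and `build_moving_buffer` are hypothetical helpers.

```python
from collections import Counter
from typing import Dict, List, Tuple


def adjacency(edges: List[Tuple[int, int]]) -> Dict[int, List[int]]:
    """Undirected adjacency lists built from an edge list."""
    adj: Dict[int, List[int]] = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
    return adj


def build_moving_buffer(adj: Dict[int, List[int]],
                        R: Dict[int, int]) -> List[Tuple[int, int]]:
    """Steps 1-9: sorted (Mv, v) pairs for vertices whose best machine differs from R[v]."""
    by_machine: Dict[int, List[int]] = {}
    for v, m in R.items():
        by_machine.setdefault(m, []).append(v)
    buffer: List[Tuple[int, int]] = []
    for m in sorted(by_machine):                      # step 1
        for v in by_machine[m]:                       # step 2
            counts = Counter(R[u] for u in adj.get(v, []))
            if not counts:
                continue                              # isolated vertex: nothing to follow
            top = max(counts.values())                # step 3: largest neighbor count
            if counts[m] == top:
                continue                              # tie with current machine: stay (assumption)
            mv = min(k for k, c in counts.items() if c == top)
            buffer.append((mv, v))                    # steps 4-6
    buffer.sort()                                     # step 9: by target machine, then vertex
    return buffer


if __name__ == "__main__":
    edges = [(1, 2), (1, 3), (3, 4), (3, 5), (4, 5), (5, 6), (6, 1), (6, 2)]
    R = {1: 1, 2: 1, 3: 1, 4: 2, 5: 2, 6: 2}
    print(build_moving_buffer(adjacency(edges), R))  # [(1, 6), (2, 3)]
```

Vertex 3 has two of its three neighbors on machine 2, and vertex 6 has two of its three on machine 1, so both end up in the buffer.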
Algorithm Framework (cont'd)
10: construct a graph T: each vertex of T represents a machine, and the weight of arc P1 -> P2 is the number of vertices to be moved from machine P1 to machine P2
11: while a cycle can be found in T, extract it with the minimal weight of its arcs:
12:   remove the extracted cycle from T (subtract that weight from each of its arcs; arcs reaching 0 disappear)
13: end while
Perhaps the whole algorithm could be run several times to obtain a better solution?
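Steps 10-13 can be sketched as below. This reads "extracting the minimal weight" as subtracting the cycle's minimum arc weight from every arc of the cycle and dropping arcs that reach zero; finding cycles by depth-first search is our own choice. `build_T`, `find_cycle` and `extract_cycles` are hypothetical names, and `R` gives each buffered vertex's current machine.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, List, Optional, Tuple

Arc = Tuple[int, int]


def build_T(buffer: List[Tuple[int, int]], R: Dict[int, int]) -> Dict[Arc, int]:
    """Step 10: arc (P1, P2) counts buffered vertices moving from machine P1 to P2."""
    T: DefaultDict[Arc, int] = defaultdict(int)
    for mv, v in buffer:
        T[(R[v], mv)] += 1
    return dict(T)


def find_cycle(T: Dict[Arc, int]) -> Optional[List[int]]:
    """Return one directed cycle [a, b, ..., a] over arcs of positive weight, or None."""
    succ: DefaultDict[int, List[int]] = defaultdict(list)
    for (a, b), w in T.items():
        if w > 0:
            succ[a].append(b)
    state: Dict[int, int] = {}  # 1 = on the DFS stack, 2 = fully explored
    stack: List[int] = []

    def dfs(u: int) -> Optional[List[int]]:
        state[u] = 1
        stack.append(u)
        for x in succ[u]:
            if state.get(x) == 1:  # back arc: x ... u -> x closes a cycle
                return stack[stack.index(x):] + [x]
            if x not in state:
                found = dfs(x)
                if found is not None:
                    return found
        stack.pop()
        state[u] = 2
        return None

    for start in list(succ):
        if start not in state:
            found = dfs(start)
            if found is not None:
                return found
    return None


def extract_cycles(T: Dict[Arc, int]) -> Tuple[List[Tuple[List[int], int]], Dict[Arc, int]]:
    """Steps 11-13: cancel cycles by their minimal arc weight until T has no cycle left."""
    residual = dict(T)
    extracted: List[Tuple[List[int], int]] = []
    while (cycle := find_cycle(residual)) is not None:
        arcs = list(zip(cycle, cycle[1:]))
        delta = min(residual[a] for a in arcs)  # step 11: minimal weight on the cycle
        for a in arcs:
            residual[a] -= delta                # step 12: remove the cycle from T
            if residual[a] == 0:
                del residual[a]
        extracted.append((cycle, delta))
    return extracted, residual


if __name__ == "__main__":
    R = {10: 1, 11: 1, 12: 2, 13: 3, 14: 2}                 # current machine of each vertex
    buffer = [(1, 13), (1, 14), (2, 10), (2, 11), (3, 12)]  # sorted (Mv, v) pairs
    T = build_T(buffer, R)
    print(T)  # {(3, 1): 1, (2, 1): 1, (1, 2): 2, (2, 3): 1}
    print(extract_cycles(T))  # ([([1, 2, 1], 1), ([3, 1, 2, 3], 1)], {})
```

Each loop iteration deletes at least one arc, so the loop terminates. Any weight left in the returned residual corresponds to buffered moves not covered by an extracted cycle; the slide leaves open which concrete vertices from the buffer realize each cycle.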
Elaboration
The cycle-extraction step in the above framework suggests an interesting problem. Given an edge-weighted directed graph $G$ (for now, integer weights only), let $G_1, G_2, \dots, G_k$ be a sequence of subgraphs of $G$ such that:
(1) each $G_i$ ($1 \le i < k$) is a cycle of $G$, and all arcs of $G_i$ have the same weight;
(2) $G = G_1 \cup G_2 \cup \dots \cup G_k$, where the weights an arc receives in the different $G_i$ add up to its weight in $G$;
(3) $G_k$ is either empty or contains no cycle.
For simplicity, we call such a sequence a cycle decomposition of $G$.
Target: how can we construct a cycle decomposition that minimizes the total edge weight of $G_k$?
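To make the target concrete, here is a small worked example showing that the order in which cycles are extracted changes the weight left in $G_k$. It is a sketch: the graph and the hand-enumerated cycle lists are made up for illustration, and `residual_weight` is a hypothetical helper.

```python
from typing import Dict, List, Tuple


def residual_weight(weights: Dict[Tuple[str, str], int], cycles: List[List[str]]) -> int:
    """Extract the given cycles in order and return the total weight left over.

    Each cycle is a vertex list [v1, ..., vr] meaning v1 -> v2 -> ... -> vr -> v1.
    A cycle is extracted with the current minimum weight of its arcs, so all arcs
    of that G_i share one weight; cycles already broken are skipped.
    """
    w = dict(weights)
    for cycle in cycles:
        arcs = list(zip(cycle, cycle[1:] + cycle[:1]))  # close the cycle
        delta = min(w.get(a, 0) for a in arcs)
        if delta == 0:
            continue  # some arc is used up, so this is no longer a cycle of the residual
        for a in arcs:
            w[a] -= delta
    return sum(w.values())


if __name__ == "__main__":
    G = {("a", "b"): 1, ("b", "a"): 1, ("b", "c"): 1, ("c", "a"): 1}
    print(residual_weight(G, [["a", "b"], ["a", "b", "c"]]))  # 2: b->c and c->a remain
    print(residual_weight(G, [["a", "b", "c"], ["a", "b"]]))  # 1: only b->a remains
```

Both residuals are acyclic, so both orders give valid cycle decompositions, yet one leaves twice the weight of the other in $G_k$.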
Solution: Maximum Flow
Consider the network G' = (V', E'), where:
(1) for each vertex v in V, V' contains two corresponding vertices v_in and v_out, and E' contains an arc v_in -> v_out with infinite capacity;
(2) for each arc u -> v of the original graph, E' contains an arc u_out -> v_in whose capacity is the weight of u -> v;
(3) V' contains two new vertices, called source and sink;
(4) for each vertex v_out in V', there is an arc source -> v_out with initial capacity 0;
(5) for each vertex v_in in V', there is an arc v_in -> sink with initial capacity 0.
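The network G' can be written down directly from rules (1)-(5). Below is a minimal sketch that only builds G' as a map from (tail, head) to capacity; the max-flow steps are not included. The helper `build_split_network`, string labels such as `a_in`, `a_out`, `source` and `sink`, taking the vertex set from the arcs, and `float("inf")` for the "+infinite" capacity are all illustrative assumptions.

```python
from typing import Dict, Tuple

INF = float("inf")  # stands in for the "+infinite" capacity on the slide


def build_split_network(weights: Dict[Tuple[str, str], int]) -> Dict[Tuple[str, str], float]:
    """Build G' as a map (tail, head) -> capacity following rules (1)-(5).

    Rule (3) is implicit: "source" and "sink" appear as arc endpoints.
    """
    vertices = sorted({x for arc in weights for x in arc})  # vertices taken from the arcs
    cap: Dict[Tuple[str, str], float] = {}
    for v in vertices:
        cap[(f"{v}_in", f"{v}_out")] = INF  # (1) v_in -> v_out, unbounded
        cap[("source", f"{v}_out")] = 0     # (4) source -> v_out, initially closed
        cap[(f"{v}_in", "sink")] = 0        # (5) v_in -> sink, initially closed
    for (u, v), w in weights.items():
        cap[(f"{u}_out", f"{v}_in")] = w    # (2) copy of the original arc u -> v
    return cap


if __name__ == "__main__":
    G = {("a", "b"): 1, ("b", "a"): 1, ("b", "c"): 1, ("c", "a"): 1}
    net = build_split_network(G)
    print(len(net))  # 13: 3 split arcs + 3 source arcs + 3 sink arcs + 4 copied arcs
```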
Solution (cont'd)
For each vertex v of the original graph (in an arbitrary order):
(1) set the capacities of source -> v_out and v_in -> sink to infinity;
(2) recompute the maximum flow from source to sink.
Finally, the flow on each arc u_out -> v_in is the answer to the problem.
Future Work
Check whether this algorithm really works, i.e. whether it reduces the number of cross-volume edges by 10%, 20%, or less.
Does the algorithm terminate within a small number of iterations, or can it fall into a loop?
Other algorithms?