Graph Data Management Lab, School of Computer Science Add title here: Large graph processing

Presentation on theme: "Graph Data Management Lab, School of Computer Science Add title here: Large graph processing"— Presentation transcript:

1 Graph Data Management Lab, School of Computer Science GDM@FUDAN Add title here: Large graph processing www.gdm.fudan.edu.cn Email: shawyh@fudan.edu.cn Put conference information here: The 12-th International Conference of Date Engineering Graph Re-partitioning

2 Motivation of Problem
• In a distributed shared-memory system for processing huge graphs, loading the graph into memory always induces a partition of the vertices, because each vertex is hashed to one machine.
• To reduce the cost of cross-machine accesses, the partition of the vertices needs to be refined.
• However, moving a vertex is also costly, and this cost cannot be ignored.

3 Related work
• Pure partitioning of a graph is NP-hard.

4 Problem definition
• Input: a graph G = (V, E), a positive integer m, and an initial many-to-one mapping R': V -> M, where M = {1, 2, ..., m}.
• Problem: find a many-to-one mapping R: V -> M such that:
  (1) the number of cross-volume edges is minimized (formula missing from the transcript);
  (2) the number of moved vertices is minimized (formula missing from the transcript).
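The two formulas did not survive in the transcript. A natural reading, stated here as an assumption rather than as the authors' exact definition, is that objective (1) counts the edges (u, v) in E with R(u) != R(v), and objective (2) counts the vertices v with R(v) != R'(v). A minimal Python sketch under that assumption (the function names are illustrative, not from the slides):

```python
from typing import Dict, Hashable, Iterable, Tuple

Vertex = Hashable
Assignment = Dict[Vertex, int]  # vertex -> machine id in {1, ..., m}


def cross_edges(edges: Iterable[Tuple[Vertex, Vertex]], R: Assignment) -> int:
    """Objective (1), as assumed: edges whose endpoints sit on different machines."""
    return sum(1 for u, v in edges if R[u] != R[v])


def moved_vertices(R_init: Assignment, R: Assignment) -> int:
    """Objective (2), as assumed: vertices whose machine differs from R'."""
    return sum(1 for v in R_init if R[v] != R_init[v])


# Tiny example: the path a-b-c-d on m = 2 machines.
edges = [("a", "b"), ("b", "c"), ("c", "d")]
R_init = {"a": 1, "b": 2, "c": 1, "d": 2}   # initial (hashed) placement R'
R_new = {"a": 1, "b": 1, "c": 2, "d": 2}    # one candidate refinement R

print(cross_edges(edges, R_init))     # 3 cross-machine edges before
print(cross_edges(edges, R_new))      # 1 cross-machine edge after
print(moved_vertices(R_init, R_new))  # 2 vertices moved (b and c)
```

The example shows the trade-off between the two objectives: removing two cross-machine edges here costs two vertex moves.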

5 Baseline Solution
• A simple greedy algorithm:
• Consider a vertex on machine M1; its neighbors may be distributed over different machines. If most of its neighbors are not on M1, this vertex needs to be moved.

6 Algorithm Framework
1: For each machine M:
2:   For each vertex v in M:
3:     Find the machine Mv that holds the largest number of neighbors of v
4:     If Mv != M
5:       add (Mv, v) to the moving buffer
6:     end if
7:   end for
8: end for
9: Sort the (Mv, v) pairs
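Steps 1-9 can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation. It assumes an undirected edge list and a vertex-to-machine dictionary, and it makes two choices the slides leave open: on a tie that includes the current machine the vertex stays put, and other ties go to the smallest machine id. The name `move_candidates` is hypothetical.

```python
from collections import Counter, defaultdict


def move_candidates(edges, assignment):
    """Return the sorted moving buffer of (target_machine, vertex) pairs."""
    neighbors = defaultdict(set)
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)

    buffer = []
    # Iterating over all vertices is equivalent to the nested
    # "for each machine, for each vertex on it" loops of steps 1-2.
    for v, home in assignment.items():
        counts = Counter(assignment[n] for n in neighbors[v])
        if not counts:
            continue  # isolated vertex: no neighbors to move towards
        best = max(counts.values())
        # Assumption: if the home machine ties for the maximum, do not move.
        if counts.get(home, 0) == best:
            continue
        # Assumption: break remaining ties by the smallest machine id.
        target = min(m for m, c in counts.items() if c == best)
        buffer.append((target, v))
    return sorted(buffer)  # step 9


edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d")]
assignment = {"a": 1, "b": 1, "c": 2, "d": 2}
print(move_candidates(edges, assignment))  # [(1, 'c')]
```

In the example only `c` qualifies: two of its three neighbors live on machine 1, while it is stored on machine 2.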

7 Algorithm Framework Cont'd
10: Construct a graph T. Each vertex of T represents a machine; a weighted arc from P1 to P2 represents the number of vertices that should be moved from P1 to P2
11: while a cycle can be found in T, extract it with the minimal weight of its edges
12:   remove the edges of the cycle from T
13: end while
• Maybe the whole algorithm can be executed multiple times to get the best solution?
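One possible reading of steps 10-13 is sketched below in Python. It assumes that "extracting the minimal weight" means subtracting the smallest arc weight on the cycle from every arc of that cycle and dropping arcs that reach zero. The slides do not say how cycles are found, so a plain depth-first search is used. All function names are hypothetical.

```python
from collections import Counter


def build_machine_graph(candidates, assignment):
    """Step 10: T[(src, dst)] = number of vertices that want to move src -> dst."""
    return Counter((assignment[v], target) for target, v in candidates)


def find_cycle(T):
    """Return the arcs of some directed cycle in T, or None if T is acyclic."""
    adj = {}
    for (a, b), w in T.items():
        if w > 0:
            adj.setdefault(a, []).append(b)
    state = {}  # node -> 1 while on the DFS stack, 2 once finished

    def dfs(node, path):
        state[node] = 1
        path.append(node)
        for nxt in adj.get(node, []):
            if state.get(nxt) == 1:  # back arc closes a cycle
                cyc = path[path.index(nxt):] + [nxt]
                return list(zip(cyc, cyc[1:]))
            if nxt not in state:
                found = dfs(nxt, path)
                if found:
                    return found
        path.pop()
        state[node] = 2
        return None

    for start in list(adj):
        if start not in state:
            found = dfs(start, [])
            if found:
                return found
    return None


def cancel_cycles(T):
    """Steps 11-13: repeatedly extract a cycle with its minimum arc weight."""
    T = dict(T)
    extracted = []
    while True:
        cycle = find_cycle(T)
        if cycle is None:
            return extracted, T
        w = min(T[arc] for arc in cycle)
        extracted.append((cycle, w))
        for arc in cycle:
            T[arc] -= w
            if T[arc] == 0:
                del T[arc]


T = {(1, 2): 3, (2, 3): 2, (3, 1): 4, (1, 3): 1}
extracted, leftover = cancel_cycles(T)
print(extracted)  # [([(1, 2), (2, 3), (3, 1)], 2), ([(1, 3), (3, 1)], 1)]
print(leftover)   # {(1, 2): 1, (3, 1): 1}
```

Each iteration removes at least one arc, so the loop terminates. Which arcs are left over can depend on the order in which cycles are found.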

8 Elaboration
• In the above framework, steps 11-13 suggest an interesting problem:
• Given an edge-weighted directed graph G (integer weights only, for now), let G1, G2, ..., Gk be a sequence of subgraphs of G such that
  (1) each Gi with i < k is a cycle of G, and all arcs of Gi have the same weight;
  (2) G = G1 ∪ ... ∪ Gk;
  (3) Gk is either empty or contains no cycle.
  For simplicity, we refer to this subgraph sequence as a cycle decomposition of G.
• Target: how can a cycle decomposition be constructed such that the sum of the edge weights in Gk is minimized?

9 Solution - Maximum flow
• Consider the graph G' = (V', E'), where:
  (1) for each vertex v in V, there are two corresponding vertices v_in and v_out in V', and an arc v_in -> v_out in E' whose capacity is +infinity;
  (2) for each arc u -> v in E, there is an arc u_out -> v_in in E' whose capacity is the weight of the original arc u -> v;
  (3) there are two new vertices, called source and sink, in V';
  (4) for each vertex v_out in V', there is an arc source -> v_out whose initial capacity is 0;
  (5) for each vertex v_in in V', there is an arc v_in -> sink whose initial capacity is 0.

10 Solution Cont'd
• For each vertex v in the original graph (in an arbitrary order):
  (1) Set the capacities of source -> v_out and v_in -> sink to infinity.
  (2) Recompute the maximum flow from source to sink.
• Finally, the flow on each edge u_out -> v_in is the answer to the problem.
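The construction can be tried out with the Python sketch below. It is one interpretation, not the authors' code. The slides do not say whether the source and sink arcs of earlier vertices stay open, so the sketch assumes that only the current vertex's two arcs are open during each max-flow run, and that the flow found is subtracted from the arc weights before the next vertex is processed. Under that assumption every unit of flow travels along a closed walk through the current vertex. The max-flow routine is a textbook Edmonds-Karp, and the names `max_flow` and `cycle_flow` are hypothetical.

```python
from collections import defaultdict, deque

INF = float("inf")


def max_flow(cap, s, t):
    """Edmonds-Karp on a dict-of-dicts residual graph; mutates cap in place."""
    total = 0
    while True:
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:  # BFS for a shortest augmenting path
            u = queue.popleft()
            for v, c in cap[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            return total
        path, v = [], t
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        # Every path crosses at least one finite arc, so push is finite.
        push = min(cap[u][v] for u, v in path)
        for u, v in path:
            cap[u][v] -= push
            cap[v][u] += push
        total += push


def cycle_flow(weights):
    """weights: {(u, v): w} for the arcs of T. Returns the flow on each arc."""
    remaining = dict(weights)
    flow = {arc: 0 for arc in weights}
    vertices = sorted({x for arc in weights for x in arc})
    for root in vertices:
        cap = defaultdict(lambda: defaultdict(int))
        for v in vertices:
            cap[(v, "in")][(v, "out")] = INF        # rule (1)
        for (u, v), w in remaining.items():
            cap[(u, "out")][(v, "in")] = w          # rule (2)
        # Assumption: open the source/sink arcs of the current vertex only.
        cap["source"][(root, "out")] = INF
        cap[(root, "in")]["sink"] = INF
        max_flow(cap, "source", "sink")
        for (u, v), w in list(remaining.items()):
            used = w - cap[(u, "out")][(v, "in")]   # flow sent over u -> v
            flow[(u, v)] += used
            remaining[(u, v)] = w - used
    return flow


weights = {(1, 2): 3, (2, 3): 2, (3, 1): 4, (1, 3): 1}
print(cycle_flow(weights))
# {(1, 2): 2, (2, 3): 2, (3, 1): 3, (1, 3): 1}
```

After a vertex has been processed, no cycle through it remains among the leftover arcs, since such a cycle would be an augmenting path. The leftover arcs are therefore acyclic at the end. Whether the leftover weight is minimal is not established by this sketch.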

11 Future work
• Check whether this algorithm really works, i.e., whether it can reduce the number of cross-volume edges by 10%, by 20%, or by less.
• Will the algorithm terminate within a small number of iterations, or will it go into a loop?
• Other algorithms?

