Published by Brice Ray. Modified over 9 years ago.
1
Graph Re-partitioning
Large graph processing
Graph Data Management Lab, School of Computer Science, GDM@FUDAN
www.gdm.fudan.edu.cn
Email: shawyh@fudan.edu.cn
The 12-th International Conference of Date Engineering
2
Motivation of Problem
In a distributed shared-memory system for huge graph processing, loading the graph into memory always induces a partition of the vertices, because each vertex is hashed to one machine.
To reduce the cost of cross-machine access, we need to refine this partition of the vertices.
However, moving a vertex is also costly, and that cost cannot be ignored.
3
Related work
Pure partitioning of a graph is NP-hard.
4
Problem definition
Input: a graph $G=(V,E)$, a positive integer $m$, and an initial many-to-one mapping $R': V \to M$, where $M=\{1,2,\dots,m\}$.
Problem: find a many-to-one mapping $R: V \to M$ such that:
(1) The number of cross-volume edges (edges whose endpoints are mapped to different machines) is minimized: minimizing $\sum_{(u,v)\in E} [R(u)\neq R(v)]$, where $[\cdot]$ is 1 if the condition holds and 0 otherwise.
(2) The number of moved vertices is minimized: minimizing $\sum_{v\in V} [R(v)\neq R'(v)]$, where $[\cdot]$ is defined as in (1).
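The two objectives can be made concrete with a short Python sketch. The function names and the dictionary representation of the mappings R' and R are assumptions made for illustration; the slides do not prescribe any data structure.

```python
from typing import Dict, Hashable, Iterable, Tuple

Vertex = Hashable


def cross_edges(edges: Iterable[Tuple[Vertex, Vertex]],
                assignment: Dict[Vertex, int]) -> int:
    """Objective (1): count edges whose endpoints sit on different machines."""
    return sum(1 for u, v in edges if assignment[u] != assignment[v])


def moved_vertices(initial: Dict[Vertex, int],
                   assignment: Dict[Vertex, int]) -> int:
    """Objective (2): count vertices whose machine differs from the initial one."""
    return sum(1 for v in initial if initial[v] != assignment[v])


# Hypothetical toy instance: four vertices on m = 2 machines.
edges = [("a", "b"), ("b", "c"), ("c", "a"), ("c", "d")]
initial = {"a": 1, "b": 2, "c": 1, "d": 2}   # R'
refined = {"a": 1, "b": 1, "c": 1, "d": 2}   # a candidate R

print(cross_edges(edges, initial))       # 3
print(cross_edges(edges, refined))       # 1
print(moved_vertices(initial, refined))  # 1
```

Moving the single vertex `b` removes two cross-machine edges here, which is the trade-off the two objectives express.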
5
Baseline Solution
A simple greedy algorithm: consider a vertex on machine M1; its neighbors may be distributed over different machines. If most of its neighbors are not on M1, this vertex needs to be moved.
6
Algorithm Framework
1: for each machine M:
2:   for each vertex v in M:
3:     find the machine Mv that holds the largest number of neighbors of v
4:     if Mv != M
5:       add (Mv, v) to the moving buffer
6:     end if
7:   end for
8: end for
9: sort the (Mv, v) pairs
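Steps 1 to 9 might be sketched in Python as follows. The adjacency-dictionary input, the function name `propose_moves`, the tie-breaking rule (prefer the current machine, then the smallest machine id) and the sort order of step 9 (by target machine, then vertex) are all assumptions; the slide leaves them open.

```python
from collections import Counter


def propose_moves(adj, assignment):
    """Return the sorted moving buffer of (target_machine, vertex) pairs.

    adj:        {vertex: list of neighbouring vertices}
    assignment: {vertex: machine id in 1..m}
    """
    buffer = []
    for v, neighbours in adj.items():
        if not neighbours:
            continue  # an isolated vertex has no preferred machine
        # Step 3: count the neighbours of v on each machine.
        counts = Counter(assignment[u] for u in neighbours)
        # Assumed tie-break: stay put if possible, else the smallest machine id.
        best = max(counts, key=lambda m: (counts[m], m == assignment[v], -m))
        # Steps 4-5: record a move only if the best machine differs.
        if best != assignment[v]:
            buffer.append((best, v))
    buffer.sort()  # Step 9 (assumed key: target machine, then vertex)
    return buffer


# Hypothetical toy instance (undirected graph given as adjacency lists).
adj = {"a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b", "d"], "d": ["c"]}
assignment = {"a": 1, "b": 2, "c": 1, "d": 2}
print(propose_moves(adj, assignment))  # [(1, 'b'), (1, 'd'), (2, 'c')]
```

Each vertex is evaluated against the initial assignment only, so the proposed moves are not guaranteed to be mutually consistent; `c` is sent to machine 2 while `b` and `d` are sent to machine 1.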
7
Algorithm Framework Cont'd
10: construct a graph T. Each vertex of T represents a machine; a weighted arc P1 -> P2 records the number of vertices that should be moved from machine P1 to machine P2
11: while a cycle can be found in T, extract the minimal edge weight of that cycle
12:   remove the edges of the cycle from T
13: end while
Maybe the whole algorithm can be executed multiple times to get the best solution?
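A minimal sketch of steps 10 to 13 follows. It assumes one particular reading of steps 11 and 12: the minimal weight on the cycle is subtracted from every arc of the cycle, and an arc is removed from T once its weight reaches zero. The function names and the `{(P1, P2): weight}` representation of T are hypothetical.

```python
from collections import Counter


def build_transfer_graph(moves, assignment):
    """Step 10: T[(P1, P2)] = number of vertices proposed to move from P1 to P2."""
    T = Counter()
    for target, v in moves:
        T[(assignment[v], target)] += 1
    return dict(T)


def find_cycle(T):
    """Return the arcs of one directed cycle in T, or None if T is acyclic."""
    succ = {}
    for a, b in T:
        succ.setdefault(a, []).append(b)
    state, path = {}, []  # state: 1 = on the current DFS path, 2 = finished

    def dfs(a):
        state[a] = 1
        path.append(a)
        for b in succ.get(a, []):
            if state.get(b) == 1:            # back arc closes a cycle
                nodes = path[path.index(b):] + [b]
                return list(zip(nodes, nodes[1:]))
            if b not in state:
                found = dfs(b)
                if found:
                    return found
        state[a] = 2
        path.pop()
        return None

    for a in list(succ):
        if a not in state:
            found = dfs(a)
            if found:
                return found
    return None


def cancel_cycles(T):
    """Steps 11-13: repeatedly extract the minimal weight along a cycle."""
    T = dict(T)
    extracted = []
    cycle = find_cycle(T)
    while cycle is not None:
        w = min(T[arc] for arc in cycle)
        extracted.append((cycle, w))
        for arc in cycle:
            T[arc] -= w
            if T[arc] == 0:
                del T[arc]                   # assumed meaning of "remove edges"
        cycle = find_cycle(T)
    return extracted, T


extracted, leftover = cancel_cycles({(1, 2): 3, (2, 1): 2, (2, 3): 1})
print(extracted)  # [([(1, 2), (2, 1)], 2)]
print(leftover)   # {(1, 2): 1, (2, 3): 1}
```

Each extracted cycle of weight w corresponds to w vertices exchanged along the cycle, which leaves the load of every machine on the cycle unchanged. The leftover arcs are the moves that would change machine loads. The order in which cycles are found affects the leftover weight, and this sketch makes no attempt to minimize it.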
8
Elaboration
In the above framework, steps 11 to 13 suggest an interesting problem:
Given an edge-weighted directed graph $G$ (for now with integer weights only), let $G_1, G_2, \dots, G_k$ be a sequence of subgraphs of $G$ such that
(1) each $G_i$ with $i < k$ is a cycle of $G$, and all arcs of $G_i$ have the same weight;
(2) $G = G_1 \cup \dots \cup G_k$;
(3) $G_k$ is either empty or contains no cycle.
For simplicity, we refer to this subgraph sequence as a cycle decomposition of $G$.
Target: how to construct a cycle decomposition such that the sum of the edge weights in $G_k$ is minimized?
9
Solution - Maximum flow
Consider the graph G’=(V’,E’), where:
(1) for each vertex v in V, there are two corresponding vertices v_in and v_out in V’, and an arc v_in -> v_out in E’ whose capacity is infinite;
(2) for each arc u -> v in E, there is an arc u_out -> v_in in E’ whose capacity is the weight of the original arc u -> v;
(3) there are two new vertices in V’, called source and sink;
(4) for each vertex v_out in V’, there is an arc source -> v_out whose initial capacity is 0;
(5) for each vertex v_in in V’, there is an arc v_in -> sink whose initial capacity is 0.
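The construction of G’ can be written down directly. The sketch below assumes the weighted graph is given as a dictionary `{(u, v): weight}` and encodes v_in and v_out as the tuples `(v, "in")` and `(v, "out")`; both choices, and the function name, are illustrative only.

```python
import math


def build_flow_network(weights):
    """Return the capacities {(x, y): cap} of G' for a graph {(u, v): weight}."""
    vertices = {x for arc in weights for x in arc}
    cap = {}
    for v in vertices:
        cap[((v, "in"), (v, "out"))] = math.inf   # rule (1)
        cap[("source", (v, "out"))] = 0           # rule (4)
        cap[((v, "in"), "sink")] = 0              # rule (5)
    for (u, v), w in weights.items():
        cap[((u, "out"), (v, "in"))] = w          # rule (2)
    return cap


cap = build_flow_network({(1, 2): 3, (2, 1): 2})
print(len(cap))                        # 8
print(cap[((1, "out"), (2, "in"))])    # 3
```

With every source and sink arc at capacity 0, no flow can pass through G’ until some of these capacities are raised.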
10
Solution Cont'd
For each vertex v in the original graph (the order can be chosen arbitrarily):
(1) set the capacity of source -> v_out and of v_in -> sink to infinite;
(2) re-calculate the maximum flow from source to sink.
Finally, the flow on each edge u_out -> v_in is the answer to the problem.
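The slide does not say whether the source and sink arcs of earlier vertices stay open, or how the flows of successive rounds are combined. The sketch below assumes one possible reading: only the arcs of the current vertex v are open, so every unit of flow leaves v_out and returns to v_in and therefore travels on cycles through v, and the flow found is subtracted from the weights before the next vertex is processed. The Edmonds-Karp routine and all names are illustrative, and no claim is made that the leftover weight is minimal.

```python
import math
from collections import deque


def max_flow(cap, s, t):
    """Edmonds-Karp on {(x, y): capacity}; returns {(x, y): flow} for finite arcs."""
    res = dict(cap)                          # residual capacities
    for x, y in cap:
        res.setdefault((y, x), 0)            # reverse arcs
    adj = {}
    for x, y in res:
        adj.setdefault(x, []).append(y)
    while True:
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:     # BFS for a shortest augmenting path
            x = queue.popleft()
            for y in adj.get(x, []):
                if y not in parent and res[(x, y)] > 0:
                    parent[y] = x
                    queue.append(y)
        if t not in parent:
            break
        path, y = [], t
        while parent[y] is not None:
            path.append((parent[y], y))
            y = parent[y]
        push = min(res[a] for a in path)
        for x, y in path:
            res[(x, y)] -= push
            res[(y, x)] += push
    return {a: cap[a] - res[a] for a in cap if cap[a] > res[a]}


def cycle_flow(weights):
    """Return (routed, remaining): weight placed on cycles and weight left over."""
    remaining = dict(weights)
    routed = {a: 0 for a in weights}
    vertices = sorted({x for a in weights for x in a})   # any order is allowed
    for v in vertices:
        cap = {((x, "in"), (x, "out")): math.inf for x in vertices}
        cap.update({((a, "out"), (b, "in")): w for (a, b), w in remaining.items()})
        # Assumption: only the terminals of v are open in this round.
        cap[("source", (v, "out"))] = math.inf
        cap[((v, "in"), "sink")] = math.inf
        flow = max_flow(cap, "source", "sink")
        for a, b in weights:
            f = flow.get(((a, "out"), (b, "in")), 0)
            routed[(a, b)] += f
            remaining[(a, b)] -= f
    return routed, remaining


routed, remaining = cycle_flow({(1, 2): 3, (2, 1): 2, (2, 3): 1})
print(routed)     # {(1, 2): 2, (2, 1): 2, (2, 3): 0}
print(remaining)  # {(1, 2): 1, (2, 1): 0, (2, 3): 1}
```

Under this reading, the arcs with positive remaining weight contain no cycle once every vertex has been processed, because a cycle through v would have provided an augmenting path in the round for v.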
11
Future work
Check whether this algorithm really works, i.e. whether it can reduce the number of cross-volume edges by 10%, by 20%, or by less.
Will this algorithm terminate within a small number of iterations, or can it go into a loop?
Other algorithms?