Large-Scale Network Analysis with the Boost Graph Libraries
Douglas Gregor, Open Systems Lab, Indiana University


1 Large-Scale Network Analysis with the Boost Graph Libraries
Douglas Gregor, Open Systems Lab, Indiana University
dgregor@osl.iu.edu

2 What are the BGLs?
- A collection of libraries for computation on graphs/networks
  - Graph data structures
  - Graph algorithms
  - Graph input/output
- Common design
  - Flexibility/customizability throughout
  - Obsessed with performance
  - Common interfaces throughout the collection
- All open source, freely available online

3 The BGL Family
- The Original (sequential) BGL
- BGL-Python
- The Parallel BGL
- Parallel BGL-Python

4 The Original BGL
- The largest and most mature BGL
  - ~7 years of research and development
  - Many users, contributors outside of the OSL
  - Steadily evolving
- Written in C++
  - Generic
  - Highly customizable
  - Efficient (both storage and execution)

5 BGL: Graph Data Structures
- Graphs:
  - adjacency_list: highly configurable with user-specified containers for vertices and edges
  - adjacency_matrix
  - compressed_sparse_row
- Adaptors:
  - subgraphs, filtered graphs, reverse graphs
  - LEDA and Stanford GraphBase
- Or, use your own...

6 Original BGL: Algorithms
- Searches (breadth-first, depth-first, A*)
- Single-source shortest paths (Dijkstra, Bellman-Ford, DAG)
- All-pairs shortest paths (Johnson, Floyd-Warshall)
- Minimum spanning tree (Kruskal, Prim)
- Components (connected, strongly connected, biconnected)
- Maximum cardinality matching
- Max-flow (Edmonds-Karp, push-relabel)
- Sparse matrix ordering (Cuthill-McKee, King, Sloan, minimum degree)
- Layout (Kamada-Kawai, Fruchterman-Reingold, Gursoy-Atun)
- Betweenness centrality
- PageRank
- Isomorphism
- Vertex coloring
- Transitive closure
- Dominator tree

7 Task: Biconnected Components
[Figure: input graph and output graph]
Articulation points: B, G, A

8 Define a Graph Type
- Determine vertex/edge properties:
    struct Vertex { string name; };
    struct Edge { int bicomponent; };
- Determine the graph type:
    typedef adjacency_list<vecS, vecS, undirectedS, Vertex, Edge> Graph;

9 Read in a GraphViz DOT File
- Build an empty graph:
    Graph g;
- Map vertex properties:
    dynamic_properties dyn;
    dyn.property("node_id", get(&Vertex::name, g));
- Read in the GraphViz graph:
    ifstream in("biconnected_components.dot");
    read_graphviz(in, g, dyn);

10 Run Biconnected Components
- Keep track of the articulation points:
    vector<graph_traits<Graph>::vertex_descriptor> art_points;
- Compute biconnected components:
    biconnected_components(g, get(&Edge::bicomponent, g),
                           back_inserter(art_points));

11 Output results
- Attach bicomponent number to the "label" property of edges:
    dyn.property("label", get(&Edge::bicomponent, g));
- Write results to another GraphViz file:
    ofstream out("bc_out.dot");
    write_graphviz(out, g, dyn);
- Show articulation points:
    cout << "Articulation points: ";
    for (size_t i = 0; i < art_points.size(); ++i) {
      cout << g[art_points[i]].name << ' ';
    }

12 Task: Biconnected Components
[Figure: input graph and output graph]
Articulation points: B, G, A

13 Original BGL Summary
- The original BGL is large, stable, efficient
  - Lots of algorithms, graph types
  - Peer-reviewed code with many users, nightly regression testing, etc.
  - Performance comparable to FORTRAN
- Who should use the BGL?
  - Programmers comfortable with C++
  - Users with graph sizes from tens of vertices to millions of vertices

14 BGL-Python
- Python is ideal for rapid prototyping:
  - It's a scripting language (no compiler)
  - Dynamically typed means less typing for you
  - Easy to use: you already know Python...
- BGL-Python provides access to the BGL from within Python
  - Similar interfaces to C++ BGL
  - Easier to learn than C++
  - Great for scripting, GUI applications
    help(bgl.dijkstra_shortest_paths)

15 Example: Biconnected Components
    import boost.graph as bgl  # Pull in the BGL bindings
    g = bgl.Graph.read_graphviz("biconnected_components.dot")

    # Compute biconnected components and articulation points
    bicomponent = g.edge_property_map('int')
    art_points = bgl.biconnected_components(g, bicomponent)

    # Save results with bicomponent numbers as edge labels
    g.edge_properties['label'] = bicomponent
    g.write_graphviz("biconnected_components_out.dot")

    print "Articulation points: ",
    node_id = g.vertex_properties['node_id']
    for v in art_points:
        print node_id[v], ' ',
    print ""

16 Wrapping the BGL in Python
- BGL-Python is not a...
  - "port"
  - reimplementation
- BGL-Python wraps the C++ BGL
  - Python calls translate to C++ calls
  - C++ can call back into Python
- Most of the speed of C++
- Most of the flexibility of Python

17 Performance: Shortest Paths

18 BGL-Python Summary
- BGL-Python is all about tradeoffs:
  - More gradual learning curve
  - Faster time-to-solution
  - Lower performance
- Our typical approach:
  1. Prototype in Python to get your ideas down
  2. Port to C++ when performance matters

20 The Parallel BGL
- A version of the C++ BGL for computational clusters
  - Distributed memory for huge graphs
  - Parallel processing for improved performance
- An active research project
- Closely related to the original BGL
  - Parallelizing BGL programs should be "easy"

21 Parallel BGL: Distributed Graphs
[Figure: a simple, directed graph... distributed across 3 processors]
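
To make the idea of distributing a directed graph across 3 processors concrete, here is a minimal standard-library sketch. It is not the Parallel BGL API: it assumes a simple block distribution of vertex ids and assumes each edge is stored with the owner of its source vertex. The names owner_of, DirectedEdge, and partition_edges are hypothetical.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Owner of global vertex v when n vertices (n > 0) are split into
// contiguous blocks over p processes (p > 0). Illustrative helper only.
std::size_t owner_of(std::size_t v, std::size_t n, std::size_t p) {
  std::size_t block = (n + p - 1) / p;  // ceil(n / p) vertices per process
  return v / block;
}

typedef std::pair<std::size_t, std::size_t> DirectedEdge;  // (source, target)

// Splits an edge list into one list per process; edge (u, v) is placed
// on the process that owns u, so every process holds the out-edges of
// its own vertices.
std::vector<std::vector<DirectedEdge> > partition_edges(
    const std::vector<DirectedEdge>& edges, std::size_t n, std::size_t p) {
  std::vector<std::vector<DirectedEdge> > parts(p);
  for (std::size_t i = 0; i < edges.size(); ++i)
    parts[owner_of(edges[i].first, n, p)].push_back(edges[i]);
  return parts;
}
```

With 9 vertices on 3 processes, vertices 0 to 2 land on process 0, 3 to 5 on process 1, and 6 to 8 on process 2. An edge such as (4, 8) lives on process 1 even though its target is owned by process 2, which is why distributed algorithms need communication when they follow edges.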

22 Parallel Graph Algorithms
- Breadth-first search
- Eager Dijkstra's single-source shortest paths
- Crauser et al. single-source shortest paths
- Depth-first search
- Minimum spanning tree (Boruvka, Dehne & Götz)
- Connected components
- Strongly connected components
- Biconnected components
- PageRank
- Graph coloring
- Fruchterman-Reingold layout
- Max-flow (Dinic's)

23 Performance: Sparse graphs

24 Scalability (~547k vertices/node)
[Figure: small-world graph, up to 70M vertices and 1B edges]

25 Performance vs. CGMgraph
[Figure: Erdős-Rényi graph, 96k vertices, 10M edges; chart annotations: 17x, 30x]

26 Parallel BGL Summary
- The Parallel BGL is built for huge graphs
  - Millions to hundreds of millions of nodes
  - Distributed-memory parallel processing on clusters
  - Future work will permit larger graphs...
- Parallel programming has a learning curve
  - Parallel graph algorithms much harder to write
  - Distributed graph manipulation can be tricky
- Parallel BGL is an active research library

27 Distributed Graph Layout

28 Parallel BGL in Python
- Preliminary support for the Parallel BGL in Python
  - Just import boost.graph.distributed
  - Similar interface to sequential BGL-Python
- Several options for usage with MPI:
  - Straight MPI: mpirun -np 2 python script.py
  - pyMPI: allows interactive use of the interpreter
- Initially used to prototype our distributed Fruchterman-Reingold implementation

29 Porting for Performance

30 Which BGL is Right for You?
- Is any BGL right for you?
- Depends on how large your networks are:
  - Up to 1/2 million vertices, any BGL will do
  - C++ BGL can push to a couple million vertices
  - For tens of millions or larger, Parallel BGL only
- Other considerations:
  - You can prototype in Python, port to C++
  - Algorithm authors might prefer the original BGL
  - Parallelism is very hard to manage

31 Conclusion
- The Boost Graph Library family is a collection of full-featured graph libraries
  - All are flexible, customizable, efficient
  - Easy to port from Python to C++
  - Can port from sequential to parallel
  - Always growing, improving
- Is one of the BGLs right for you?
  - A typical "build or buy" decision

32 For More Information...
- (Original) Boost Graph Library
  http://www.boost.org/libs/graph/doc
- Parallel Boost Graph Library
  http://www.osl.iu.edu/research/pbgl
- Python Bindings for (Parallel) BGL
  http://www.osl.iu.edu/~dgregor/bgl-python
- Contact us!
  - Douglas Gregor, dgregor@osl.iu.edu
  - Andrew Lumsdaine, lums@osl.iu.edu

33 Other BGL Variants
- QuickGraph (C#)
  http://www.codeproject.com/cs/miscctrl/quickgraph.asp
- Ruby Graph Library
  http://rubyforge.org/projects/rgl/
- Rooster Graph (Scheme)
  http://savannah.nongnu.org/projects/rgraph/
- RBGL (an R interface to the C++ BGL)
  http://www.bioconductor.org/packages/bioc/1.8/html/RBGL.html
- Disclaimer: These are all separate projects. We do not maintain them.

34 Comparative Performance

