Topology-Aware Distributed Graph Processing for Tightly-Coupled Clusters
Mayank Bhatt, Jayasi Mehar
DPRG: http://dprg.cs.uiuc.edu
Our work explores the problem of graph partitioning, focusing on reducing communication cost on tightly-coupled clusters.
Why?
- Experimenting with cloud frameworks on HPC systems
- Interest in supercomputing as a service
- More big data jobs running on supercomputers
Tightly-Coupled Clusters
- Supercomputers: compute nodes are embedded inside the network topology
- Messages are routed via compute nodes
- Communication patterns can influence performance
- “Hop count” is an approximate measure of the cost of communication
Blue Waters Interconnect
- 3D torus topology
- A running job is allocated a subset of the nodes
- Static routing: the number of hops between two nodes remains constant (see the sketch below)
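On a torus, hop count can be estimated as Manhattan distance with wraparound on each axis. A minimal sketch of that estimate (the coordinates and torus dimensions are illustrative, not Blue Waters' actual Gemini routing tables):

```python
def torus_hops(a, b, dims=(24, 24, 24)):
    """Estimated hop count between nodes a and b on a 3D torus.

    a, b: (x, y, z) node coordinates; dims: torus size per axis.
    Each axis contributes the shorter of the direct and the
    wraparound distance (Manhattan distance with wraparound).
    """
    return sum(min(abs(ai - bi), d - abs(ai - bi))
               for ai, bi, d in zip(a, b, dims))

# Two nodes that are close only via wraparound on the x-axis:
print(torus_hops((0, 0, 0), (23, 2, 1)))  # 1 + 2 + 1 = 4
```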
Graph Processing Systems
- A lot of real-world data is expressed in the form of graphs
- Billions of vertices and trillions of edges: the graph must be distributed
- Algorithms: e.g. shortest path, PageRank
- Two stages: ingress (loading and partitioning) and processing
Types of Partitioning
- Edge cuts vs. vertex cuts
- System of choice: PowerGraph, which uses vertex cuts
- Each vertex has a master and zero or more mirrors; masters communicate with all of their mirrors
- Our hypothesis: placing masters and mirrors close together should reduce communication cost
Master-Mirror Placement
- Option 1: place the replicas of a vertex first, then decide where to place the master
- Option 2: place the master of each vertex first (e.g. by hashing), then decide where to place the replicas
[Diagram: masters (M) and mirrors (R) laid out across nodes]
Random Partitioning
- Fast ingress
- Communication cost between master and mirrors can be high
- Replication factor can be high (see the sketch below)
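A minimal sketch of random (hash-based) edge placement and the resulting replication factor; the function names are illustrative, not PowerGraph's API:

```python
import hashlib
from collections import defaultdict

def edge_machine(u, v, num_machines):
    """Hash edge (u, v) to a machine, independently of all other edges."""
    digest = hashlib.md5(f"{u}|{v}".encode()).hexdigest()
    return int(digest, 16) % num_machines

def replica_map(edges, num_machines):
    """A vertex gets a replica on every machine that holds one of its edges."""
    replicas = defaultdict(set)
    for u, v in edges:
        m = edge_machine(u, v, num_machines)
        replicas[u].add(m)
        replicas[v].add(m)
    return replicas

edges = [(1, 2), (2, 3), (1, 3), (3, 4)]
replicas = replica_map(edges, num_machines=4)
print(sum(len(ms) for ms in replicas.values()) / len(replicas))  # replication factor
```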
Oblivious Partitioning
- Slower ingress
- Heuristic-based (greedy) placement, sketched below
- Leads to a smaller replication factor than random
- A starting point for optimizing master-mirror communication
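A simplified sketch of the greedy heuristic behind oblivious placement, assuming the standard PowerGraph-style rules (prefer a machine already holding replicas of both endpoints, then of either endpoint, else any machine; break ties by load):

```python
from collections import defaultdict

def oblivious_place(edges, num_machines):
    """Greedy edge placement that keeps each vertex's replicas few:
    prefer machines already holding both endpoints, then either
    endpoint, else any machine; pick the least-loaded candidate."""
    replicas = defaultdict(set)      # vertex -> machines with a replica
    load = [0] * num_machines        # edges assigned per machine
    assignment = {}
    for u, v in edges:
        candidates = (replicas[u] & replicas[v]
                      or replicas[u] | replicas[v]
                      or set(range(num_machines)))
        m = min(candidates, key=lambda c: load[c])
        assignment[(u, v)] = m
        replicas[u].add(m)
        replicas[v].add(m)
        load[m] += 1
    return assignment

print(oblivious_place([(1, 2), (2, 3), (1, 3), (3, 4)], num_machines=4))
```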
Grid Partitioning
- Intersecting constraint sets: an edge may only be placed on a machine in the intersection of its endpoints' constraint sets (sketch below)
- Leads to a controlled replication factor
- Master-mirror communication is not optimized
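A minimal sketch of grid-style constrained placement, assuming a rows x cols logical grid of machines; each vertex's constraint set is its home row plus home column, so any two constraint sets intersect:

```python
import zlib

def grid_constraint(v, rows, cols):
    """Constraint set of vertex v on a rows x cols machine grid:
    every machine in v's home row and home column."""
    home = zlib.crc32(str(v).encode()) % (rows * cols)
    r, c = divmod(home, cols)
    return ({r * cols + j for j in range(cols)} |
            {i * cols + c for i in range(rows)})

def grid_place(u, v, rows, cols):
    """Place edge (u, v) in the intersection of the endpoints'
    constraint sets; one vertex's row always meets the other's
    column, so the intersection is never empty."""
    shared = grid_constraint(u, rows, cols) & grid_constraint(v, rows, cols)
    return min(shared)   # deterministic tie-break for the sketch

print(grid_place("a", "b", rows=3, cols=3))
```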
Topology-Aware Variants
- Make the partitioning step aware of the underlying network topology
- Place masters and mirrors so that communication cost is minimized
Choosing a Master
- Pick the master so that the total number of hops to the mirrors is minimized: the geometric centroid of the replicas
- Edge degrees of the replicas can differ, so hops can be weighted by each replica's edge count: a weighted centroid (sketch below)
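A sketch of weighted-centroid master selection, reusing torus_hops from the earlier sketch; edge_count and coords are assumed inputs, not names from the implementation:

```python
def choose_master(replica_nodes, edge_count, coords):
    """Pick the replica minimizing the edge-weighted hop sum to all
    other replicas (weighted centroid); with all weights equal to 1
    this reduces to the geometric centroid."""
    def cost(candidate):
        return sum(edge_count[m] * torus_hops(coords[candidate], coords[m])
                   for m in replica_nodes if m != candidate)
    return min(replica_nodes, key=cost)
```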
Grid Centroid
- Edges are placed using the grid partitioning strategy first
- The master is then chosen by scoring each candidate machine on three quantities: load (the number of masters already on the candidate), the number of edges on each mirror, and the number of hops between each mirror and the candidate
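One plausible way to combine the three quantities above into a single score; the exact weighting is not given on the slide, so the form below (and the trade-off weight alpha) is an assumption:

```python
def grid_centroid_score(candidate, mirrors, masters_on, edges_on, coords,
                        alpha=1.0):
    """Assumed score, lower is better: a load term (masters already on
    the candidate) plus the edge-weighted hop sum from every mirror to
    the candidate; alpha is a hypothetical trade-off weight."""
    comm = sum(edges_on[m] * torus_hops(coords[m], coords[candidate])
               for m in mirrors)
    return alpha * masters_on[candidate] + comm
```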
Restricted Oblivious
- Masters are placed first; edges are then placed with an oblivious-style heuristic that also accounts for topology
- Candidate machines are scored on four quantities: the number of edges on the candidate, the maximum and minimum number of edges on any node, and the number of hops between the candidate and the master
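Similarly, a hypothetical score combining the four quantities above; the max/min terms normalize the load-balance component, and beta is an assumed trade-off weight:

```python
def restricted_oblivious_score(candidate, master, edges_on, coords,
                               beta=1.0):
    """Assumed score, lower is better: a load-balance term normalized
    by the max/min edge counts across nodes, plus the hop distance
    from the candidate to the master; beta is a hypothetical weight."""
    hi, lo = max(edges_on.values()), min(edges_on.values())
    balance = (edges_on[candidate] - lo) / (hi - lo + 1)   # in [0, 1)
    return balance + beta * torus_hops(coords[candidate], coords[master])
```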
Experiments
- Cluster size: 36 nodes
- Algorithm: approximate diameter
- Graph: power-law, 20 million vertices
Tradeoff between runtime and ingress
Graph Algorithms
- Data-intensive algorithms benefit more
Graph Type
- Improvements depend on the type of graph
Network Data Transfer
Other System Optimizations
- Controlling the frequency of data injection into the network impacts runtime for certain algorithms
- Smaller network buffers => flushed more frequently
Buffer Sizes
[Plots: runtime vs. buffer size for PageRank and Approximate Diameter]
- Algorithms with small computation and little network data benefit from frequent flushing (sketch below)
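A minimal sketch of the buffering idea: a send buffer that flushes once the buffered bytes reach a threshold, so smaller buffers flush more often. The class and names are illustrative, not PowerGraph's networking API:

```python
class SendBuffer:
    """Buffers outgoing messages and flushes once the buffered bytes
    reach a threshold; a smaller threshold means more frequent flushes
    (lower latency per message, more per-flush overhead)."""
    def __init__(self, flush_bytes, send):
        self.flush_bytes = flush_bytes
        self.send = send              # callable that ships one batch
        self.buf, self.size = [], 0

    def push(self, msg: bytes):
        self.buf.append(msg)
        self.size += len(msg)
        if self.size >= self.flush_bytes:
            self.flush()

    def flush(self):
        if self.buf:
            self.send(b"".join(self.buf))
            self.buf, self.size = [], 0

out = SendBuffer(flush_bytes=64, send=lambda batch: print(len(batch), "bytes"))
for _ in range(10):
    out.push(b"x" * 20)   # flushes every 4th message, at 80 >= 64 bytes
out.flush()               # ship the remainder
```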
Decisions, decisions
Conclusions
- Two new topology-aware algorithms for graph partitioning
- No ‘one size fits all’ approach to graph partitioning
- We propose a decision tree that can help decide which partitioning algorithm is best
- System optimizations complement partitioning performance
DPRG: http://dprg.cs.uiuc.edu
Questions and Feedback? DPRG: http://dprg.cs.uiuc.edu