Published by Lilly Aune. Modified over 6 years ago.
1
Graph Colouring as a Challenge Problem for Dynamic Graph Processing on Distributed Systems
Scott Sallinen, Keita Iwabuchi, Suraj Poudel, Maya Gokhale, Matei Ripeanu, Roger Pearce
Lawrence Livermore National Laboratory; University of British Columbia
This work was partially performed under the auspices of the U.S. Department of Energy by Lawrence Livermore National Laboratory under Contract DE-AC52-07NA (LLNL-PRES ).
2
Graphs are Everywhere
Figure: 19th century shipping routes.
3
Graphs are Constantly Changing
Figure: 21st century transportation network (transportation and flight routes).
The network showcases a key characteristic of real-world graphs: hubs, or high-centrality locations.
4
Graphs Evolve
Networks change over time. How do we analyze this?
We could store static snapshots, but how do we then answer queries such as:
Q: What is the most friends Cop has ever had?
5
How should we query a Graph?

Option 1: Pause, Snapshot
Pros:
+ Classic, known algorithms
+ Queries on static graphs: optimized
Cons:
- Lose info between snapshots
- Granularity depends on # snapshots
- Needs pre-processing
- Might rebuild solutions from scratch

Option 2: Incrementally
Pros:
+ Maintain complete granularity
+ No pre-processing
Cons:
- No longer ‘classic’
- Need infrastructure support
- Need to design new algorithms

That’s not a problem, that’s a research question!
6
Research Directions
Infrastructure: support for large-scale dynamic graph processing.
Algorithm: novel algorithm and design principles.
7
Design Space
Diagram components: Incoming Graph Changes, Algorithm Hook, Modify Graph Storage.
1. How should we model changes?
2. What should an algorithm look like?
3. How do we store a graph like this?
8
1. Modeling Changes
Changes occur in networks as edge-centric events.
9
2. Designing Algorithms
Queries update based on events.
Events are caused by: adds, deletes, updates, other events.
Q: What is the most friends Cop has ever had?
Add edge to Cop: degree++; most = max(most, degree)
Delete edge to Cop: degree--
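The "most friends ever" query above can be sketched as a pair of event handlers. This is only an illustration, not the authors' framework: the class name `MaxDegreeTracker` and its methods are hypothetical, and it assumes undirected edges with no duplicate add or delete events.

```python
from collections import defaultdict


class MaxDegreeTracker:
    """Keeps the current degree and the all-time maximum degree of every vertex."""

    def __init__(self):
        self.degree = defaultdict(int)  # current number of neighbours
        self.most = defaultdict(int)    # largest degree ever observed

    def add_edge(self, u, v):
        # Assumption: the edge is undirected, so both endpoints gain a neighbour.
        for x in (u, v):
            self.degree[x] += 1
            self.most[x] = max(self.most[x], self.degree[x])

    def delete_edge(self, u, v):
        # Deletions lower the current degree but never the historical maximum.
        for x in (u, v):
            self.degree[x] -= 1


tracker = MaxDegreeTracker()
tracker.add_edge("cop", "a")
tracker.add_edge("cop", "b")
tracker.delete_edge("cop", "a")
tracker.add_edge("cop", "c")
print(tracker.most["cop"], tracker.degree["cop"])  # 2 2
```

The answer is available at any point in the event stream without storing or rescanning snapshots.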
10
3. Storing the Graph
‘Vertex as a Process’ design.
Distribute events to the associated vertex-process (consistent hashing).
Improve upon existing frameworks: [1] distributed static graph analysis, [2] efficient dynamic storage.
[1] Roger Pearce, Maya Gokhale, Nancy M. Amato. “Multithreaded asynchronous graph traversal for in-memory and semi-external memory.” Supercomputing (2010).
[2] Keita Iwabuchi, Scott Sallinen, Roger Pearce, Brian Van Essen, Maya Gokhale, Satoshi Matsuoka. “Towards a Distributed Large Scale Dynamic Graph Data Store.” GABB (2016).
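One way to picture how events reach a vertex-process is a small consistent-hashing ring that maps each vertex to an owning rank. The sketch below is an assumption-laden illustration, not the framework's implementation: `HashRing`, `stable_hash`, `route`, the number of ring points per rank, and the choice to deliver an edge event to the owners of both endpoints are all hypothetical.

```python
import bisect
import hashlib


def stable_hash(key):
    """Deterministic 64-bit hash (Python's built-in hash() of strings varies per run)."""
    digest = hashlib.sha256(str(key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


class HashRing:
    """Consistent-hashing ring: each rank owns several points on a 64-bit ring."""

    def __init__(self, num_ranks, points_per_rank=64):
        self._ring = sorted(
            (stable_hash(f"rank-{r}-point-{p}"), r)
            for r in range(num_ranks)
            for p in range(points_per_rank)
        )
        self._keys = [h for h, _ in self._ring]

    def owner(self, vertex):
        # The first ring point clockwise from the vertex hash owns the vertex.
        i = bisect.bisect_right(self._keys, stable_hash(vertex)) % len(self._ring)
        return self._ring[i][1]


def route(ring, event):
    """Assumption: an edge event is delivered to the owners of both endpoints."""
    _, u, v = event
    return {ring.owner(u), ring.owner(v)}


ring = HashRing(num_ranks=4)
targets = route(ring, ("add", "A", "B"))
```

Because the mapping depends only on the vertex identifier, any process can compute where an event belongs without a central lookup table.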
11
Research Contributions
Infrastructure support for large scale dynamic graph processing. Novel algorithm and design principles. Start with a challenge problem.
12
Graph Colouring
Goal: Build ‘independent sets’.
Pre-processing for allocation (similar to sorting).
Independence discovery.
Clustering analysis.
Sample selection.
13
Motivation to use Colouring
No solution exists for true dynamic colouring.
The literature describes vertex addition, not edge changes. (This does not match real networks: a person does not know in advance every friend they will ever have.)
Colouring is not embarrassingly parallel, and has not been explored at large scale (statically OR dynamically).
14
Graph Colouring as Events
Queries update based on events.
Add edge (A, B):
Case 1: A is NOT the same colour as B. The colouring is still valid, so nothing changes.
Case 2: A is the same colour as B. Recolour the vertex with the lower hash: if hash(A) < hash(B), A recolours; otherwise, B recolours.
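The add-edge rule can be sketched in a few lines of sequential Python. This is a hedged illustration rather than the authors' algorithm: the slide does not say which new colour the recolouring vertex picks, so choosing the smallest colour unused by its neighbours is an assumption, as are the helper names `stable_hash`, `smallest_free_colour`, and `add_edge`, and the default colour 0 for new vertices.

```python
import hashlib
from collections import defaultdict


def stable_hash(vertex):
    """Deterministic hash used only to break ties between the two endpoints."""
    digest = hashlib.sha256(str(vertex).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def smallest_free_colour(vertex, neighbours, colour):
    # Assumption: recolour to the smallest colour that no neighbour currently uses.
    used = {colour[n] for n in neighbours[vertex]}
    c = 0
    while c in used:
        c += 1
    return c


def add_edge(a, b, neighbours, colour):
    """Insert edge (a, b); return the vertex that recoloured, or None."""
    colour.setdefault(a, 0)  # assumption: unseen vertices start with colour 0
    colour.setdefault(b, 0)
    neighbours[a].add(b)
    neighbours[b].add(a)
    if colour[a] != colour[b]:
        return None  # Case 1: colouring is still valid
    # Case 2: the endpoint with the lower hash recolours.
    loser = a if stable_hash(a) < stable_hash(b) else b
    colour[loser] = smallest_free_colour(loser, neighbours, colour)
    return loser


neighbours = defaultdict(set)
colour = {}
for a, b in [("A", "B"), ("B", "C"), ("A", "C")]:
    add_edge(a, b, neighbours, colour)
```

After the three events the triangle uses three distinct colours. In a distributed, asynchronous setting a recolouring may itself trigger further events, which this sequential sketch does not model.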
15
Graph Colouring as Events
Queries update based on events.
Delete edge: If a vertex can ‘reduce’ its colour, do so. (This can decrease the colour count.)
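A minimal sketch of the delete-edge rule, assuming that "reducing" a colour means moving to the smallest colour index not used by any remaining neighbour. The function name `delete_edge` and the dictionary-based graph representation are hypothetical, and the sketch assumes the colouring is valid before the deletion.

```python
def delete_edge(a, b, neighbours, colour):
    """Remove edge (a, b); return the endpoints that moved to a lower colour."""
    neighbours[a].discard(b)
    neighbours[b].discard(a)
    changed = []
    for v in (a, b):
        used = {colour[n] for n in neighbours[v]}
        # colour[v] itself is free (the colouring was valid), so a value is always found.
        lowest = next(c for c in range(colour[v] + 1) if c not in used)
        if lowest < colour[v]:
            colour[v] = lowest
            changed.append(v)
    return changed


# A triangle needs three colours; removing one edge frees a colour.
neighbours = {"A": {"B", "C"}, "B": {"A", "C"}, "C": {"A", "B"}}
colour = {"A": 0, "B": 1, "C": 2}
changed = delete_edge("A", "C", neighbours, colour)
print(changed, colour)  # ['C'] {'A': 0, 'B': 1, 'C': 0}
```

Here vertex C drops from colour 2 to colour 0, so the graph goes from three colours to two.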
16
Graph Colouring as Events
Events continue, and we maintain a valid colouring.
Figure panels: Add edge; Delete edge.
17
Research Contributions
Infrastructure support for large scale dynamic graph processing. Novel algorithm and design principles. So what’s the evaluation?
18
Evaluation
Hard to compare Static vs. Dynamic: we are comparing a single solution of the final graph to a ‘continuous’ solution, i.e. ALL solutions during evolution, up to the final graph.
Platform: Catalyst cluster, Lawrence Livermore National Laboratory
324 nodes: 12-core Intel Xeon E5-2695v2 (2.4 GHz)
GB Memory
Intel 910 PCI-attached NAND Flash
19
Analyzing Performance
Performance is on par with, or better than, one solution with GraphLab (static).
At high node count, a continuous solution is only 2x as expensive as one snapshot.
Plot notes: only the blue line is a dynamic solution; static graphs are randomized and ingested.
Dataset: Friendster, ~4 billion edges.
20
Analyzing Solution Quality
With a novel, non-deterministic approach, we must show that the quality of the solution is good.
Compare to a static, greedy algorithm:
Solution quality is similar, varying by at most 16%.
The solution space shows a remarkably similar distribution.
21
Scaling to Massive Graphs
Inspired by the lack of large-scale graphs in the literature, we use a real-world Web hyperlink graph with ~250 billion edges (5 TB).
GraphLab cannot colour this graph, even with 300 nodes (38 TB agg. mem.).
22
Real Dynamic Graphs
To fully show the capabilities of the framework, we curated a real dynamic crawl of Wikipedia.
Vertices are pages, and edges are hyperlinks.
There is no ‘static equivalent’: the entire dataset is a set of dynamic events.
The relationship between edge additions and deletions is captured.
23
Wikipedia Dynamic Graph
We present sampled results of the algorithm, by year. Note that this is not snapshotting.
24
Note on Digesting Events in Parallel
The framework and colouring algorithm support parallel, asynchronous events (details in paper).
Digesting event streams in parallel gives a clear boost in performance.
25
Lessons Learned
Infrastructure support for large-scale dynamic graph processing: an event-driven dynamic framework is not only possible, but highly performant and scalable.
Novel algorithm and design principles: dynamic algorithms are able to close the performance gap to static ones, while capturing true dynamic properties.
We believe the techniques, strategies, and behaviour associated with this will be useful for future dynamic algorithm design.
26
Thank you! Collaborators:
Scott Sallinen Keita Iwabuchi Roger Pearce Matei Ripeanu Suraj Poudel Maya Gokhale
27
Questions?
29
Static method: Baseline Comparison - GraphLab
GraphLab cannot be run on a small node count, despite 128 GB memory per node (the graphs being ~60 GB in edge-pair text format).
GraphLab scales poorly (or even negatively) with increasing colour count and node count.
The plots present runtime for GraphLab and our solution using LDF and Hash vertex priority assignment for 1 to 64 nodes.
Missing data points in the plot indicate that GraphLab was unable to load the data.
30
Static method: Baseline Comparison - Literature
We see long colouring times in the literature: multiple orders of magnitude of difference, not all explained by hardware.
Our solution is faster because it allows asynchronicity (instead of a BSP/map-reduce model), which is especially useful for communication among large compute node counts.
Cluster solutions for the Friendster graph:
Left: ~2 hours. 26 nodes: 32 cores, 64 GB RAM each. (S. Salihoglu et al., VLDB 2014)
Right: ~10 minutes. 15 nodes: two E GHz (12 cores), 48 GB RAM each. (Y. Lu et al., VLDB 2015)
This paper: 35 seconds on 1 node w/ two E GHz (24 cores), 128 GB RAM. 3 seconds on 16 nodes.