GraphX: Graph Processing in a Distributed Dataflow Framework. Joseph Gonzalez, Postdoc, UC Berkeley AMPLab; Co-founder, GraphLab Inc. Joint work with Reynold Xin, Ankur Dave, Daniel Crankshaw, Michael Franklin, and Ion Stoica. OSDI 2014. The goal of this project was to distill the essential advances developed in the context of specialized graph-processing systems and lift them into general-purpose distributed dataflow systems, unifying graphs and tables and enabling a single system to easily and efficiently support both kinds of data across the entire analytics pipeline. To motivate the need for this unified approach, let me begin by illustrating modern analytics pipelines. UC BERKELEY
Modern Analytics. [Pipeline diagram: Raw Wikipedia (XML) → Link Table → Hyperlinks → PageRank → Top 20 Pages (Title, PR); Discussion Table (User, Disc.) → Editor Graph → Community Detection → User Community → Top Communities (Com., PR).]
Tables. [Same pipeline diagram, highlighting the table-oriented stages: Raw Wikipedia (XML) → Link Table; Top 20 Pages (Title, PR); Discussion Table (User, Disc.); User Community; Top Communities (Com., PR).] Modern analytics involves many stages and many views of the data.
Graphs. [Same pipeline diagram, highlighting the graph-oriented stages: Hyperlinks → PageRank; Editor Graph → Community Detection.] Modern analytics involves many stages and many views of the data.
Separate Systems Tables Graphs
Separate Systems Dataflow Systems Graphs Table Result Row
Separate Systems: Dataflow Systems (Table → Result Row) and Graph Systems (Dependency Graph → Pregel).
Difficult to Use Users must Learn, Deploy, and Manage multiple systems Leads to brittle and often complex interfaces
Inefficient: extensive data movement and duplication across the network and file system (XML, HDFS), and limited reuse of internal data structures across stages.
GraphX Unifies Computation on Tables and Graphs Table View Graph View GraphX Spark Dataflow Framework Enabling a single system to easily and efficiently support the entire pipeline
? Separate Systems Dataflow Systems Graph Systems Dependency Graph Pregel Table Row Row Result Row Row
Separate Systems Dataflow Systems Graph Systems Pregel Table Dependency Graph Row Row Result Row Row
PageRank on the Live-Journal Graph: Hadoop is 60x slower than GraphLab, and Spark is 16x slower than GraphLab.
Key Question: How can we naturally express and efficiently execute graph computation in a general-purpose dataflow framework? Distill the lessons learned from specialized graph systems.
Key Question: How can we naturally express and efficiently execute graph computation in a general-purpose dataflow framework? Representation and Optimizations.
[Pipeline diagram recap: Raw Wikipedia (XML) → Link Table → Hyperlinks → PageRank → Top 20 Pages (Title, PR); Discussion Table (Disc.) → Editor Graph → Community Detection (User, Com.) → Top Communities (PR).]
Example Computation: PageRank. Express the computation locally and iterate until convergence:
R(i) = 0.15 + \sum_{j \in \mathrm{Nbrs}(i)} w_{ji} R(j)
where R(i) is the rank of page i, the sum is the weighted sum of the neighbors' ranks, and 0.15 is the random-reset probability.
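To make the local update concrete, here is a minimal, self-contained Scala sketch of the iteration over an in-memory adjacency list. The toy graph, edge weights, and fixed iteration count are illustrative assumptions, not the GraphX implementation.

object LocalPageRank {
  def main(args: Array[String]): Unit = {
    // Toy in-neighbor lists with edge weights: page -> Seq((neighbor, weight)).
    val inNbrs: Map[Int, Seq[(Int, Double)]] = Map(
      1 -> Seq((2, 0.5), (3, 1.0)),
      2 -> Seq((1, 0.5)),
      3 -> Seq((1, 0.5), (2, 0.5)))
    var rank: Map[Int, Double] = Map(1 -> 1.0, 2 -> 1.0, 3 -> 1.0)

    // Iterate the local update R(i) = 0.15 + sum_j w_ji * R(j); a fixed number
    // of rounds stands in for a convergence test, for simplicity.
    for (_ <- 0 until 20) {
      rank = rank.map { case (i, _) =>
        val total = inNbrs.getOrElse(i, Seq.empty).map { case (j, w) => w * rank(j) }.sum
        i -> (0.15 + total)
      }
    }
    rank.foreach { case (i, r) => println(s"page $i -> rank $r") }
  }
}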
“Think like a Vertex.” - Malewicz et al., SIGMOD’10
Graph-Parallel Pattern Gonzalez et al. [OSDI’12] Gather information from neighboring vertices
Graph-Parallel Pattern Gonzalez et al. [OSDI'12] Apply an update to the vertex property
Graph-Parallel Pattern Gonzalez et al. [OSDI’12] Scatter information to neighboring vertices
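A minimal Scala sketch of the Gather-Apply-Scatter pattern as an interface. The trait and method names are illustrative assumptions about the shape of the pattern, not the exact API of GraphLab/PowerGraph.

// Hypothetical vertex-program interface capturing Gather-Apply-Scatter.
trait GatherApplyScatter[V, E, A] {
  // Gather: collect a value from one neighboring (edge, vertex) pair.
  def gather(self: V, edge: E, neighbor: V): A
  // Combine gathered values (should be commutative and associative).
  def merge(a: A, b: A): A
  // Apply: update the vertex property using the combined gather result.
  def apply(self: V, acc: Option[A]): V
  // Scatter: optionally push information back out along each edge,
  // e.g. to activate neighbors for the next round.
  def scatter(self: V, edge: E, neighbor: V): Option[A]
}

PageRank, for example, gathers weighted neighbor ranks, merges by summation, and applies R(i) = 0.15 + total.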
Many Graph-Parallel Algorithms. Machine Learning: Collaborative Filtering (Alternating Least Squares, Stochastic Gradient Descent, Tensor Factorization), Structured Prediction (Loopy Belief Propagation, Max-Product Linear Programs, Gibbs Sampling), Semi-supervised ML (Graph SSL, CoEM). Network Analysis: Community Detection (Triangle Counting, K-core Decomposition, K-Truss), Graph Analytics (PageRank, Personalized PageRank, Shortest Path, Graph Coloring).
Specialized Computational Pattern Specialized Graph Optimizations
Graph System Optimizations: specialized data structures, vertex-cut partitioning, remote caching / mirroring, message combiners, and active-set tracking. The price we paid for these optimizations was a specialized system; the question is how to make them work in a general-purpose dataflow framework.
Representation and Optimizations: the advances in graph-processing systems map onto dataflow concepts. Graphs → horizontally partitioned tables; vertex programs → dataflow operators (joins); graph-system optimizations → distributed join optimization and materialized view maintenance.
Property Graph Data Model. [Example property graph with vertices A-F.] Vertex properties: e.g., user profile, current PageRank value. Edge properties: e.g., weights, timestamps.
Encoding Property Graphs as Tables. [Diagram: the property graph (vertices A-F) is split by a vertex cut across two machines; each machine holds an edge-table partition, while the vertex table holds the properties and a routing table records where each vertex's edges live.] Vertex Table (RDD), Routing Table (RDD), Edge Table (RDD).
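A minimal sketch, in plain Scala, of how such a vertex-cut property graph might be laid out as three tables. The type aliases and field names are assumptions for illustration, not GraphX's internal classes.

object PropertyGraphTables {
  type VertexId = Long
  type PartitionId = Int

  // Vertex table: one row per vertex, holding only the vertex property.
  final case class VertexRow[V](id: VertexId, attr: V)

  // Edge table: partitioned by a vertex cut; the graph structure lives here.
  final case class EdgeRow[E](src: VertexId, dst: VertexId, attr: E, part: PartitionId)

  // Routing table: for each vertex, the edge partitions that reference it,
  // so vertex properties can be shipped to exactly those partitions.
  final case class RouteRow(id: VertexId, parts: Set[PartitionId])
}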
Separate Properties and Structure: reuse structural information across multiple graphs. [Diagram: an input graph and a transformed graph share the same routing table and edge table; only the vertex table changes when vertex properties are transformed.] The normalized representation is crucial for supporting iterative computation as well as multiple graphs and sub-graphs derived from the same base data.
Table Operators Table operators are inherited from Spark: map filter groupBy sort union join leftOuterJoin rightOuterJoin reduce count fold reduceByKey groupByKey cogroup cross zip sample take first partitionBy mapWith pipe save ...
Graph Operators (Scala)
class Graph[V, E] {
  def this(vertices: Table[(Id, V)], edges: Table[(Id, Id, E)])
  // Table Views -----------------
  def vertices: Table[(Id, V)]
  def edges: Table[(Id, Id, E)]
  def triplets: Table[((Id, V), (Id, V), E)]
  // Transformations ------------------------------
  def reverse: Graph[V, E]
  def subgraph(pV: (Id, V) => Boolean,
               pE: Edge[V, E] => Boolean): Graph[V, E]
  def mapV[T](m: (Id, V) => T): Graph[T, E]
  def mapE[T](m: Edge[V, E] => T): Graph[V, T]
  // Joins ----------------------------------------
  def joinV[T](tbl: Table[(Id, T)]): Graph[(V, T), E]
  def joinE[T](tbl: Table[(Id, Id, T)]): Graph[V, (E, T)]
  // Computation ----------------------------------
  def mrTriplets[T](mapF: Edge[V, E] => List[(Id, T)],
                    reduceF: (T, T) => T): Graph[T, E]
}
Graph Operators (Scala), continued: the mrTriplets operator captures the Gather-Scatter pattern from specialized graph-processing systems:
def mrTriplets[T](mapF: Edge[V, E] => List[(Id, T)], reduceF: (T, T) => T): Graph[T, E]
This API deliberately blurs the distinction between graphs and tables.
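As a usage sketch, here is how the transformation operators compose against the released Apache Spark GraphX API, which grew out of this work. The toy vertices and edges are made up, and the released method names (subgraph, mapEdges, triplets) differ slightly from the simplified API on the slide.

import org.apache.spark.SparkContext
import org.apache.spark.graphx.{Edge, Graph}

def subgraphExample(sc: SparkContext): Unit = {
  // Toy property graph: string vertex properties, double edge weights.
  val vertices = sc.parallelize(Seq((1L, "A"), (2L, "B"), (3L, "C")))
  val edges = sc.parallelize(Seq(Edge(1L, 2L, 1.0), Edge(2L, 3L, 0.0)))
  val graph = Graph(vertices, edges)

  val rescaled = graph
    .subgraph(epred = t => t.attr > 0.0)  // keep only positive-weight edges
    .mapEdges(e => e.attr / 2.0)          // rescale weights; structure is reused

  rescaled.triplets.collect().foreach(println)
}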
Triplets Join Vertices and Edges. The triplets operator joins vertices and edges: [diagram: the vertex table (A-D) and edge table are joined to form triplets (srcId, srcAttr, dstId, dstAttr, edgeAttr) for the edges A→B, A→C, B→C, C→D.]
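Logically, the triplets view is a relational join of the edge table with the vertex table on both endpoints. A minimal, self-contained Scala sketch over in-memory tables; the case class names are illustrative.

object TripletsJoin {
  final case class Vertex[V](id: Long, attr: V)
  final case class Edge[E](src: Long, dst: Long, attr: E)
  final case class Triplet[V, E](srcAttr: V, dstAttr: V, attr: E)

  // triplets = edges JOIN vertices ON src JOIN vertices ON dst
  def triplets[V, E](vertices: Seq[Vertex[V]], edges: Seq[Edge[E]]): Seq[Triplet[V, E]] = {
    val byId = vertices.map(v => v.id -> v.attr).toMap
    for {
      e <- edges
      s <- byId.get(e.src).toSeq   // join on the source vertex id
      d <- byId.get(e.dst).toSeq   // join on the destination vertex id
    } yield Triplet(s, d, e.attr)
  }
}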
Map-Reduce Triplets. mrTriplets collects information about the neighborhood of each vertex: a map function is applied to each triplet and emits messages addressed to the source or destination vertex, and a reduce function (a message combiner) merges the messages arriving at each vertex. [Diagram: mapping over the edges A→B, A→C, B→C, C→D produces messages for B, C, C, and D; the two messages for C are combined by the reduce function.]
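A minimal, self-contained Scala sketch of the mrTriplets semantics over in-memory collections; groupBy plus reduce stands in for Spark's shuffle with message combiners, and the signatures are assumptions for illustration.

// mapF emits (vertexId, message) pairs per edge triplet;
// reduceF combines the messages destined for the same vertex.
def mrTriplets[V, E, A](
    vertices: Map[Long, V],
    edges: Seq[(Long, Long, E)],
    mapF: (Long, V, Long, V, E) => Seq[(Long, A)],
    reduceF: (A, A) => A): Map[Long, A] =
  edges
    .flatMap { case (src, dst, attr) =>
      mapF(src, vertices(src), dst, vertices(dst), attr) }
    .groupBy(_._1)
    .map { case (id, msgs) => id -> msgs.map(_._2).reduce(reduceF) }

// Example: compute the in-degree of each vertex.
val names = Map(1L -> "A", 2L -> "B", 3L -> "C")
val links: Seq[(Long, Long, Unit)] = Seq((1L, 3L, ()), (2L, 3L, ()), (3L, 1L, ()))
val inDeg = mrTriplets[String, Unit, Int](names, links,
  (sid, s, did, d, _) => Seq((did, 1)),  // one message per edge, sent to the dst
  _ + _)                                  // combine messages by summation
println(inDeg)                            // e.g. Map(3 -> 2, 1 -> 1)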
Using these basic GraphX operators we implemented Pregel and GraphLab in under 50 lines of code!
The GraphX Stack (Lines of Code): Spark (30,000), Pregel API (34), PageRank (20), Connected Components (20), K-core (60), Triangle Count (50), LDA (220), SVD++ (110). The small set of GraphX operators can express existing graph-processing systems, and some algorithms are more naturally written directly against the GraphX primitive operators.
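To illustrate how a Pregel-style API can be layered on the join and mrTriplets primitives, here is a toy in-memory Scala sketch reusing the mrTriplets sketch shown earlier. It is an assumption-laden illustration of the idea, not the actual 34-line GraphX Pregel implementation.

def pregelSketch[V, E, A](
    init: Map[Long, V],
    edges: Seq[(Long, Long, E)],
    initialMsg: A,
    maxIters: Int)(
    vprog: (V, A) => V,
    sendMsg: (Long, V, Long, V, E) => Seq[(Long, A)],
    mergeMsg: (A, A) => A): Map[Long, V] = {
  // Superstep 0: every vertex receives the initial message.
  var vertices = init.map { case (id, v) => id -> vprog(v, initialMsg) }
  var msgs = mrTriplets(vertices, edges, sendMsg, mergeMsg)
  var i = 0
  while (msgs.nonEmpty && i < maxIters) {
    // "Join" the messages with the vertex table and run the vertex program
    // only where a message actually arrived.
    vertices = vertices.map { case (id, v) =>
      id -> msgs.get(id).map(m => vprog(v, m)).getOrElse(v)
    }
    msgs = mrTriplets(vertices, edges, sendMsg, mergeMsg)
    i += 1
  }
  vertices
}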
Representation and Optimizations (recap): graphs → horizontally partitioned tables; vertex programs → dataflow operators (joins); graph-system optimizations → distributed join optimization and materialized view maintenance.
Join Site Selection using Routing Tables. [Diagram: the routing table (RDD) records, for each vertex A-F, the edge partitions (1, 2) that reference it, so vertex attributes are shipped only to those partitions.] Never shuffle edges! The routing table is one of the essential ideas GraphX introduces to optimize distributed joins; never shuffling edges is a big win because the edge table is often orders of magnitude larger than the vertex table.
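A minimal Scala sketch of join-site selection: the routing table tells us which edge partitions reference each vertex, so only vertex attributes move and edges never do. The data layout and names are illustrative assumptions.

// routing(v) = set of edge partitions that contain an edge touching v.
def shipVertexAttrs[V](
    vertices: Map[Long, V],
    routing: Map[Long, Set[Int]]): Map[Int, Map[Long, V]] =
  vertices.toSeq
    .flatMap { case (id, attr) =>
      routing.getOrElse(id, Set.empty).map(part => (part, id, attr)) }
    .groupBy(_._1)                                   // group by edge partition
    .map { case (part, rows) =>
      part -> rows.map { case (_, id, attr) => id -> attr }.toMap }

// Each edge partition now has a local "mirror" of exactly the vertex
// attributes it needs; the (much larger) edge table never moves.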
Caching for Iterative mrTriplets. [Diagram: each edge-table partition keeps a mirror cache of the vertex attributes it needs; a reusable hash index lets the join with shipped vertex data proceed as a streaming scan of the edge table.] Vertex Table (RDD), Edge Table (RDD), Mirror Cache, Reusable Hash Index.
Incremental Updates for the Triplets View. [Diagram: when only vertices A and E change, only their new attributes are shipped to the mirror caches; the edge table is rescanned, but unchanged vertices are not re-sent.]
Aggregation for Iterative mrTriplets. [Diagram: messages are partially aggregated locally at each edge partition (local aggregate) before being shipped back to the vertex table, and only vertices whose values changed generate new messages.]
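A minimal Scala sketch of the incremental-maintenance idea behind the mirror caches: after each iteration, ship only vertices whose values actually changed and patch the cached mirrors in place. The change test and data layout are assumptions for illustration.

// Compare the old and new vertex tables and ship only the changed entries
// to the partitions that mirror them (using the routing table).
def shipDeltas[V](
    oldV: Map[Long, V],
    newV: Map[Long, V],
    routing: Map[Long, Set[Int]]): Map[Int, Map[Long, V]] = {
  val changed = newV.filter { case (id, v) => !oldV.get(id).contains(v) }
  changed.toSeq
    .flatMap { case (id, v) => routing.getOrElse(id, Set.empty).map(p => (p, id, v)) }
    .groupBy(_._1)
    .map { case (p, rows) => p -> rows.map(r => r._2 -> r._3).toMap }
}

// Each edge partition then merges its delta into its mirror cache,
// e.g. mirror = mirror ++ deltas(partitionId).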
Reduction in Communication Due to Cached Updates. [Chart.] Most vertices are within 8 hops of all vertices in their connected component.
Benefit of Indexing Active Vertices. [Chart: runtime per iteration with and without active-vertex tracking.]
Join Elimination: identify and bypass joins for unused triplet fields via Java bytecode inspection. [Chart: communication with and without join elimination; lower is better.] Factor of 2 reduction in communication.
Additional Optimizations. Indexing and bitmaps: accelerate joins across graphs and efficiently construct sub-graphs. Lineage-based fault tolerance: exploits Spark lineage to recover in parallel, eliminating the need for costly checkpoints. Substantial index and data reuse: reuse routing tables across graphs and sub-graphs, reuse edge adjacency information and indices, and split hash tables to reuse key sets across hash tables.
System Comparison. Goal: demonstrate that GraphX achieves performance parity with specialized graph-processing systems. Setup: 16-node EC2 cluster (m2.4xlarge; 8 GB RAM, 8 virtual cores) + 1GigE. Compared against GraphLab/PowerGraph (C++), Giraph (Java), and Spark (Java/Scala).
PageRank Benchmark. EC2 cluster of 16 x m2.4xlarge (8 cores) + 1GigE. [Chart: runtime in seconds on the Twitter graph (42M vertices, 1.5B edges) and the UK graph (106M vertices, 3.7B edges); 18x and 7x annotations.] GraphX performs comparably to state-of-the-art graph-processing systems.
Connected Components Benchmark. EC2 cluster of 16 x m2.4xlarge (8 cores) + 1GigE. [Chart: runtime in seconds on the Twitter graph (42M vertices, 1.5B edges) and the UK graph (106M vertices, 3.7B edges); 10x and 8x annotations; one system ran out of memory.] GraphX performs comparably to state-of-the-art graph-processing systems.
Graphs are just one stage…. What about a pipeline?
A Small Pipeline in GraphX. Raw Wikipedia (XML) on HDFS → Spark preprocessing → compute (PageRank on the hyperlink graph) → Spark post-processing → Top 20 Pages. [Chart: end-to-end runtimes of 1492, 605, 375, and 342 seconds for the compared configurations.] Timed end-to-end, GraphX is the fastest. It is also faster in human time, but that is hard to measure.
Adoption and Impact. GraphX is now part of Apache Spark and of the Cloudera Hadoop distribution. In production at Alibaba Taobao, with order-of-magnitude gains over Spark. Inspired GraphLab Inc.'s SFrame technology, which unifies tables and graphs on disk.
GraphX Unifies Tables and Graphs. New API: blurs the distinction between tables and graphs. New system: unifies data-parallel and graph-parallel systems, enabling users to easily and efficiently express the entire analytics pipeline.
What did we learn? Specialized Systems → Integrated Frameworks: Graph Systems → GraphX.
Future Work: Specialized Systems → Integrated Frameworks: Graph Systems → GraphX; Parameter Server → ?
Future Work: Specialized Systems → Integrated Frameworks: Graph Systems → GraphX; Parameter Server → ? (asynchrony, non-determinism, shared state).
Thank You http://amplab.cs.berkeley.edu/projects/graphx/ jegonzal@eecs.berkeley.edu Reynold Xin Ankur Dave Daniel Crankshaw Michael Franklin Ion Stoica
Related Work. Specialized graph-processing systems: GraphLab [UAI'10], Pregel [SIGMOD'10], Signal-Collect [ISWC'10], Combinatorial BLAS [IJHPCA'11], GraphChi [OSDI'12], PowerGraph [OSDI'12], Ligra [PPoPP'13], X-Stream [SOSP'13]. Alternative dataflow frameworks: Naiad [SOSP'13]: GraphLINQ; Hyracks: Pregelix [VLDB'15]. Distributed join optimization: Multicast Join [Afrati et al., EDBT'10]; Semi-Join in MapReduce [Blanas et al., SIGMOD'10].
Edge Files Have Locality. UK graph (106M vertices, 3.7B edges). [Chart: runtime in seconds; lower is better.] GraphLab rebalances the edge files (vertex cut) on load; GraphX preserves the on-disk layout through Spark.
Scalability. Twitter graph (42M vertices, 1.5B edges); runtime for 10 iterations; each node has 8 virtual cores. [Chart: runtime vs. number of EC2 nodes, with a linear-scaling reference line.] GraphX scales slightly better than PowerGraph/GraphLab.
Apache Spark Dataflow Platform (Zaharia et al., NSDI'12). Resilient Distributed Datasets (RDDs): HDFS → Load → RDD → Map → RDD → Reduce → HDFS.
Apache Spark Dataflow Platform (Zaharia et al., NSDI'12). Resilient Distributed Datasets (RDDs) can be persisted in memory with .cache(), which optimizes iterative access to data: HDFS → Load → RDD (cached) → Map → RDD → Reduce → HDFS.
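A minimal sketch against the public Spark RDD API showing why .cache() matters for iterative access. The file path, the per-line computation, and the number of iterations are placeholders.

import org.apache.spark.{SparkConf, SparkContext}

object CacheSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("cache-sketch").setMaster("local[*]"))

    // Load once and keep the deserialized partitions in memory.
    val ranks = sc.textFile("hdfs://web.txt")        // placeholder path
      .map(line => (line.hashCode.toLong, 1.0))
      .cache()

    // Each iteration re-reads `ranks` from memory instead of from HDFS.
    var total = 0.0
    for (_ <- 0 until 10) {
      total = ranks.map(_._2).reduce(_ + _)
    }
    println(total)
    sc.stop()
  }
}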
PageRank Benchmark. EC2 cluster of 16 x m2.4xlarge nodes + 1GigE. [Chart: runtime in seconds on the Twitter graph (42M vertices, 1.5B edges) and the UK graph (106M vertices, 3.7B edges).] GraphX performs comparably to state-of-the-art graph-processing systems.
Shared Memory Advantage. Spark uses a shared-nothing model: cores communicate through shuffle files. GraphLab uses shared memory: cores share a single de-serialized in-memory graph.
Shared Memory Advantage. Twitter graph (42M vertices, 1.5B edges). [Chart: runtime in seconds for Spark's shared-nothing model (shuffle files) vs. GraphLab with shared memory disabled (TCP/IP between cores).]
PageRank Benchmark Twitter Graph (42M Vertices,1.5B Edges) UK-Graph (106M Vertices, 3.7B Edges) GraphX performs comparably to state-of-the-art graph processing systems.
Connected Components Benchmark. EC2 cluster of 16 x m2.4xlarge nodes + 1GigE. [Chart: runtime in seconds on the Twitter graph (42M vertices, 1.5B edges) and the UK graph (106M vertices, 3.7B edges); 10x and 8x annotations; out-of-memory failures marked.] GraphX performs comparably to state-of-the-art graph-processing systems.
Fault-Tolerance Leverage Spark Fault-Tolerance Mechanism
Graph-Processing Systems: Pregel (Google), CombBLAS, Kineograph, X-Stream, GraphChi, Ligra, GPS. They expose a specialized API to simplify graph programming.
Vertex-Program Abstraction (Pregel):
Pregel_PageRank(i, messages):
  // Receive all the messages
  total = 0
  foreach (msg in messages):
    total = total + msg
  // Update the rank of this vertex
  R[i] = 0.15 + total
  // Send new messages to neighbors
  foreach (j in out_neighbors[i]):
    Send msg(R[i]) to vertex j
The Vertex-Program Abstraction (GraphLab), Low, Gonzalez, et al. [UAI'10]. [Diagram: vertex 1 gathers R[2]*w21 + R[3]*w31 + R[4]*w41 from its neighbors 2, 3, 4.]
GraphLab_PageRank(i):
  // Compute the sum over neighbors
  total = 0
  foreach (j in neighbors(i)):
    total += R[j] * wji
  // Update the PageRank
  R[i] = 0.15 + total
Example: Oldest Follower. Calculate the number of older followers for each user. [Diagram: follower graph over users A-F with ages 23, 42, 30, 19, 75, 16.]
val olderFollowerCounts = graph
  .mrTriplets(
    e => // Map
      if (e.src.age > e.dst.age) { List((e.srcId, 1)) }
      else { List.empty },
    (a, b) => a + b // Reduce
  )
  .vertices
Enhanced Pregel in GraphX: require message combiners and remove message computation from the vertex program.
pregelPR(i, messageSum):
  // Update the rank of this vertex
  R[i] = 0.15 + messageSum

combineMsg(a, b):
  // Compute the sum of two messages
  return a + b

sendMsg(ij, R[i], R[j], E[i,j]):
  // Compute a single message
  return msg(R[i] / E[i,j])
PageRank in GraphX:
// Load and initialize the graph
val graph = GraphBuilder.text("hdfs://web.txt")
val prGraph = graph.joinVertices(graph.outDegrees)

// Implement and run PageRank
val pageRank = prGraph.pregel(initialMessage = 0.0, iter = 10)(
  (oldV, msgSum) => 0.15 + 0.85 * msgSum,
  triplet => triplet.src.pr / triplet.src.deg,
  (msgA, msgB) => msgA + msgB)
Example Analytics Pipeline:
// Load raw data tables
val articles = sc.textFile("hdfs://wiki.xml").map(xmlParser)
val links = articles.flatMap(article => article.outLinks)

// Build the graph from tables
val graph = new Graph(articles, links)

// Run the PageRank algorithm
val pr = graph.PageRank(tol = 1.0e-5)

// Extract and print the top 20 articles
val topArticles = articles.join(pr).top(20).collect
for ((article, pageRank) <- topArticles) {
  println(article.title + "\t" + pageRank)
}
Apache Spark Dataflow Platform (Zaharia et al., NSDI'12). Resilient Distributed Datasets (RDDs) can be persisted in memory with .cache(). Dataflow: HDFS → Load → RDD → Map → RDD → Reduce → HDFS. Lineage: the Reduce output tracks its lineage back through Map and Load to HDFS, enabling recomputation on failure.