Pregelix: Big(ger) Graph Analytics on A Dataflow Engine

Presentation transcript:

Pregelix: Big(ger) Graph Analytics on A Dataflow Engine. Yingyi Bu (UC Irvine). Joint work with: Vinayak Borkar (UC Irvine), Michael J. Carey (UC Irvine), Tyson Condie (UCLA), Jianfeng Jia (UC Irvine)

Outline: Introduction, Pregel Semantics, The Pregel Logical Plan, The Pregelix System, Experimental Results, Related Work, Conclusions

Introduction. Big Graphs are becoming common: the web graph, social networks, ...

Introduction. How big are Big Graphs? The web: 8.53 billion pages in 2012; Facebook: 1.01 billion active users; a de Bruijn graph: 3 billion nodes; ... Weapons for mining Big Graphs: Pregel (Google), Giraph (Facebook, LinkedIn, Twitter, etc.), Distributed GraphLab (CMU), GraphX (Berkeley).

Programming Model. Think like a vertex: receive messages, update states, send messages.

Programming Model: the Vertex base class.

public abstract class Vertex<I extends WritableComparable, V extends Writable,
                             E extends Writable, M extends Writable> implements Writable {
    public abstract void compute(Iterator<M> incomingMessages);
    ...
}

Helper methods: sendMsg(I vertexId, M msg), voteToHalt(), getSuperstep()
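For concreteness, here is a minimal single-source shortest-paths vertex written against the interface above. Only compute(), sendMsg(), voteToHalt(), and getSuperstep() come from the slide; getVertexId(), getVertexValue(), setVertexValue(), getEdges(), and the Edge type are assumed helper names for this sketch and may not match the actual Pregelix classes.

import java.util.Iterator;

import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.LongWritable;

// Illustrative sketch only: the value/edge accessors and the Edge type are assumed,
// not the verified Pregelix API.
public class ShortestPathsVertex
        extends Vertex<LongWritable, DoubleWritable, DoubleWritable, DoubleWritable> {

    private static final long SOURCE_ID = 0L;   // assume vertex 0 is the single source

    @Override
    public void compute(Iterator<DoubleWritable> incomingMessages) {
        // The source starts at distance 0, everyone else at "infinity"; then take the
        // minimum over all incoming distance messages.
        double minDist = (getVertexId().get() == SOURCE_ID) ? 0.0 : Double.MAX_VALUE;
        while (incomingMessages.hasNext()) {
            minDist = Math.min(minDist, incomingMessages.next().get());
        }
        // Assumes vertex values were initialized to Double.MAX_VALUE when the graph was loaded.
        if (minDist < getVertexValue().get()) {
            setVertexValue(new DoubleWritable(minDist));
            for (Edge<LongWritable, DoubleWritable> edge : getEdges()) {
                sendMsg(edge.getDestVertexId(),
                        new DoubleWritable(minDist + edge.getEdgeValue().get()));
            }
        }
        voteToHalt();   // re-activated only if a shorter distance arrives in a later superstep
    }
}

Each superstep a vertex either improves its distance and propagates it to its neighbors, or votes to halt; the job ends once every vertex has halted and no messages remain.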

More APIs. Message Combiner: combines messages to reduce network traffic. Global Aggregator: aggregates statistics over all live vertices, once per iteration. Graph Mutations: add vertex, delete vertex, with a conflict-resolution function.
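Continuing the shortest-paths example, a message combiner only needs to keep the smallest distance per destination vertex. The sketch below is illustrative; the class shape is not the verified Pregelix combiner interface.

import org.apache.hadoop.io.DoubleWritable;

// Illustrative sketch only: a "min" combiner for shortest-paths messages, so that at most
// one message per destination vertex is shipped. Not the actual Pregelix combiner API.
public class MinDistanceCombiner {
    /** Fold two messages addressed to the same destination vertex into one. */
    public DoubleWritable combine(DoubleWritable left, DoubleWritable right) {
        return left.get() <= right.get() ? left : right;
    }
}

In the physical plans shown later, combine is applied by group-by operators on both sides of the data redistribution, which is where the network-traffic reduction comes from.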

Pregel Semantics. Bulk-synchronous: a global barrier between iterations. Compute invocation: once per active vertex in each superstep; a halted vertex is re-activated when it receives messages. Global halting: every vertex is halted and no messages are in flight. Graph mutations: a partial ordering of operations plus a user-defined resolve function.
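These semantics can be summarized by the conceptual driver loop below. It is only a sketch of bulk-synchronous execution and the global halting rule; the SimpleGraph and SimpleVertex interfaces are stand-ins defined for this sketch, not Pregelix internals.

import java.util.Iterator;
import java.util.List;

// Conceptual sketch of BSP execution with Pregel's halting rule; not Pregelix's runtime.
interface SimpleVertex<M> {
    boolean isHalted();
    boolean hasIncomingMessages();
    Iterator<M> incomingMessages();
    void compute(Iterator<M> messages);     // may send messages and vote to halt
}

interface SimpleGraph<M> {
    List<? extends SimpleVertex<M>> vertices();
    boolean hasActiveVertices();            // any vertex that has not voted to halt?
    boolean hasPendingMessages();           // any message still in flight?
    void deliverMessages();                 // global barrier between supersteps
    void applyGraphMutations();             // conflicts settled by the user-defined resolve
}

final class BspDriver {
    static <M> void run(SimpleGraph<M> graph) {
        long superstep = 0;                 // would be visible to vertices via getSuperstep()
        // Global halting: stop when every vertex has halted and no messages are in flight.
        while (graph.hasActiveVertices() || graph.hasPendingMessages()) {
            for (SimpleVertex<M> v : graph.vertices()) {
                // compute() runs once per active vertex; a halted vertex is re-activated
                // when it has incoming messages.
                if (!v.isHalted() || v.hasIncomingMessages()) {
                    v.compute(v.incomingMessages());
                }
            }
            graph.deliverMessages();
            graph.applyGraphMutations();
            superstep++;
        }
    }
}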

Process-centric runtime. [Diagram: a master (superstep: 3, halt: false) coordinates worker-1 and worker-2 via control signals; each worker holds Vertex records of the form { id, halt, value, edges } and exchanges messages of the form <id, payload> with the other worker.]

Issues and Opportunities: out-of-core support. “I’m trying to run the sample connected components algorithm on a large data set on a cluster, but I get a ‘java.lang.OutOfMemoryError: Java heap space’ error.” There were 26 similar threads on the Giraph-users mailing list during the past year!

Issues and Opportunities: physical flexibility. Algorithms: PageRank, SSSP, CC, Triangle Counting. Datasets: web graph, social network, RDF graph. Clusters: an 8-machine school cluster vs. a 200-machine Facebook data center. Does one size fit all?

Issues and Opportunities: software simplicity. Pregel, GraphLab, Giraph, Hama, ... each build their own vertex/map/msg data structures, task scheduling, memory management, message delivery, and network management.

The Pregelix Approach. Relational schema: Vertex(vid, halt, value, edges), Msg(vid, payload), GS(halt, aggregate, superstep). [Example on the slide: instances of the Msg and Vertex relations and the result of joining them on vid.]

Pregel UDFs. compute: executed at each active vertex in each superstep. combine: aggregation function for messages. aggregate: aggregation function for the global states. resolve: used to resolve graph mutations.
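To make the aggregate UDF concrete (a per-superstep global statistic over all live vertices), here is a minimal summing aggregator; the class name and the step/merge/finish structure are assumptions for illustration, not the verified Pregelix aggregator interface.

import org.apache.hadoop.io.DoubleWritable;

// Illustrative sketch only: a global aggregator that sums the values of all live
// vertices in a superstep. The method names below are assumed, not the real Pregelix API.
public class SumAggregator {
    private double partialSum = 0.0;

    /** Called once per live vertex during the current superstep. */
    public void step(DoubleWritable vertexValue) {
        partialSum += vertexValue.get();
    }

    /** Merge the partial sum computed by another partition. */
    public void merge(SumAggregator other) {
        partialSum += other.partialSum;
    }

    /** The global value, recomputed for each iteration. */
    public double finish() {
        return partialSum;
    }
}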

Logical Plan. Msg_i(M) is joined with Vertex_i(V) on M.vid = V.vid; the predicate (V.halt = false || M.payload != NULL) keeps live vertices and vertices with incoming messages; and the UDF call (compute) runs on each surviving tuple (D1). Its Vertex-tuple output (D2) feeds Vertex_{i+1}, while its Msg-tuple output (D3) is grouped by vid with the combine UDF to produce the combined messages (D7) that form Msg_{i+1}; the remaining outputs (D4, D5, D6) flow to the global-state and mutation branches of the plan.
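Read as relational algebra, the message-producing slice of this plan is roughly the following (informal notation reconstructed from the flow labels above; compute_msg denotes the Msg-tuple output D3 of the compute UDF, and the join is an outer join so that active vertices with no incoming messages still pass the filter):

\[
Msg_{i+1} \;=\; \gamma_{vid,\ \mathrm{combine}}\Big(\mathrm{compute}_{\mathrm{msg}}\big(\sigma_{\,V.halt=\mathrm{false}\ \vee\ M.payload\neq\mathrm{NULL}}\big(Msg_i(M)\ \bowtie_{M.vid=V.vid}\ Vertex_i(V)\big)\big)\Big)
\]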

Logical Plan (continued). The compute UDF also produces D4 (contributions to the global halting state), D5 (values for the global aggregate), and D6 (Vertex tuples for deletions and insertions). D4 feeds Agg(bool-and), yielding D8 (the global halt state); D5 feeds Agg(aggregate), yielding D9 (the global aggregate value); and superstep = G.superstep + 1 over GS_i(G) yields D10 (the increased superstep). D8, D9, and D10 together form GS_{i+1}. D6 is grouped by vid with the resolve UDF to form Vertex_{i+1}.

The Pregelix System. Pregel's building blocks (vertex/map/msg data structures, task scheduling, memory management, message delivery, network management) map onto the facilities of a general-purpose parallel dataflow engine (physical plans, operators, access methods, task scheduling, record/index management, buffer management, data exchange, connection management).

The Runtime. Runtime choice: Hadoop or Hyracks? Pregelix uses Hyracks, a data-parallel execution engine that provides out-of-core operators, connectors, access methods, user-configurable task scheduling, and extensibility.

Parallelism. [Example: the Vertex and Msg relations are partitioned by vid across Worker-1 (Vertex-1, Msg-1) and Worker-2 (Vertex-2, Msg-2); each worker joins its Msg partition with its Vertex partition on vid, and the messages it produces (output-Msg-1, output-Msg-2) are then redistributed by destination vid.]

Physical Choices. Vertex storage: B-tree or LSM B-tree. Group-by: pre-clustered group-by, sort-based group-by, or HashSort group-by. Data redistribution: m-to-n merging partitioning connector or m-to-n partitioning connector. Join: index full outer join or index left outer join.

Data Storage. Vertex: partitioned B-tree or LSM B-tree. Msg: partitioned local files, sorted. GS: stored on HDFS and cached in each worker.

Physical Plan: Message Combination. Four alternative plans apply the vid group-by (with combine) on both sides of the data redistribution: Sort-Groupby-M-to-N-Partitioning, HashSort-Groupby-M-to-N-Partitioning, Sort-Groupby-M-to-N-Merge-Partitioning, and HashSort-Groupby-M-to-N-Merge-Partitioning. The sender side uses a sort-based or HashSort group-by; with the m-to-n partitioning connector the receiver side re-groups the same way, while with the m-to-n partitioning merging connector the input arrives clustered by vid and a pre-clustered group-by suffices.

Physical Plan: Message Delivery. Two join alternatives feed the compute UDF: (1) an index full outer join of Msg_i(M) with Vertex_i(V) on M.vid = V.vid, filtered by (V.halt = false || M.payload != NULL); or (2) an index left outer join, where Msg_i(M) is first merged (via choose()) with Vid_i(I) on M.vid = I.vid and then joined with Vertex_i(V) on M.vid = V.vid, while a Function Call (NullMsg) produces Vid_{i+1}, the ids of vertices with halt = false that must be activated in the next superstep.

Caching. Pregel, Giraph, and GraphLab all maintain caches for this kind of iterative job; what does Pregelix do for caching? Iteration-aware (sticky) scheduling: location constraints keep each partition's tasks on the same machine across supersteps. Caching of invariant data: the B-tree buffer pool uses a customized flushing policy (never flush dirty pages), and the file system cache comes for free.

Experimental Setup. Machines: a UCI cluster of ~32 machines, each with 4 cores, 8GB of memory, and 2 disk drives. Datasets: the Yahoo! webmap (1,413,511,393 vertices, adjacency list, ~70GB) and its samples; the Billion Triple Challenge dataset (172,655,479 vertices, adjacency list, ~17GB), its samples, and its scale-ups. Giraph configuration: latest trunk (revision 770), 4 vertex computation threads, 8GB JVM heap.

Execution Time. [Three slides of charts showing execution-time results for in-memory and out-of-core runs.]

Parallel Speedup

Parallel Scale-up

Throughput

Plan Flexibility. [Charts: in-memory and out-of-core results, with a 15x difference highlighted.]

Software Simplicity. Lines of code: Giraph 32,197 vs. Pregelix 8,514.

More systems

More Systems

Related Work. Parallel data management: Gamma, GRACE, Teradata, Stratosphere (TU Berlin), REX (UPenn), AsterixDB (UCI). Big Graph processing systems: Pregel (Google), Giraph (Facebook, LinkedIn, Twitter, etc.), Distributed GraphLab (CMU), GraphX (Berkeley), Hama (Sogou, etc.) -- too slow!

Conclusions. Pregelix offers transparent out-of-core support, physical flexibility, and software simplicity. We are targeting Pregelix to be an open-source production system, rather than just a research prototype: http://pregelix.ics.uci.edu

Q & A