PAGE: A Partition Aware Graph Computation Engine
Yingxia Shao, Junjie Yao, Bin Cui, Lin Ma
EECS, Peking University, China

Agenda
– Background (current)
– Design of PAGE
– Experimental results
– Conclusion

Background
Large-scale graphs are prevalent:
– Social networks
– Web graphs
– …
Graph computing systems:
– Pregel (Google)
– Giraph (Apache)
– GPS (Stanford)
– GraphLab (CMU)
– …

Background
Graph partitioning:
– Offline approach: METIS (Karypis Lab)
– Online approach: streaming partitioning, e.g. the Linear Deterministic Greedy (LDG) algorithm (I. Stanton)
Problem: existing graph computation systems cannot efficiently integrate high-quality graph partitioning.
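As a rough illustration of the online approach, the sketch below implements the Linear Deterministic Greedy heuristic as it is commonly described for streaming partitioning: each arriving vertex goes to the partition that already holds most of its neighbours, with that count scaled by a penalty (1 − |P_i|/C) for how full the partition is. The capacity C (defaulting to an equal share of vertices), the tie-breaking rule (least-loaded partition, then lowest index) and the function name `ldg_partition` are assumptions for illustration, not details taken from these slides.

```python
from math import ceil


def ldg_partition(stream, num_parts, capacity=None):
    """Assign vertices to partitions in a single pass (Linear Deterministic Greedy).

    stream: iterable of (vertex, neighbours) pairs in arrival order.
    Returns a dict mapping vertex -> partition index.
    """
    stream = list(stream)
    if capacity is None:
        # Assumed default capacity: an equal share of the vertices per partition.
        capacity = ceil(len(stream) / num_parts)
    assignment = {}
    sizes = [0] * num_parts
    for vertex, neighbours in stream:
        # Count the neighbours that have already been placed in each partition.
        placed = [0] * num_parts
        for u in neighbours:
            if u in assignment:
                placed[assignment[u]] += 1
        # Score = placed neighbours * (1 - load / capacity); a full partition scores 0.
        # Ties go to the least-loaded partition, then to the lowest index (assumption).
        best = max(
            range(num_parts),
            key=lambda i: (placed[i] * (1 - sizes[i] / capacity), -sizes[i], -i),
        )
        assignment[vertex] = best
        sizes[best] += 1
    return assignment
```

For two triangles joined by one bridge edge, streamed in vertex order into two partitions, this places each triangle in its own partition, so only the bridge edge is cut. Because each vertex is placed once and never moved, the quality depends on the stream order, which is one reason streaming partitions typically cut more edges than an offline METIS run.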

Inefficient partition integration
– Higher-quality graph partitioning leads to worse overall performance.
– Partition quality improves from left to right.
[Figure: running PageRank on Giraph with six different graph partition qualities.]

Motivation of PAGE
We need a novel graph computation engine that efficiently integrates graph partitions of varying quality.

Agenda
– Background
– Design of PAGE (current)
– Experimental results
– Conclusion

Message Processor

Inefficient partition integration
– The cost of processing local messages dominates the overall cost.
– Existing systems cannot provide enough local message processors.
[Figure: running PageRank on Giraph with six different graph partition qualities.]
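One way to see why local message processing comes to dominate as partition quality improves: assume, for this sketch, that PageRank sends exactly one message along every edge in each superstep. Let E be the edge set and E_cut the edges whose endpoints lie in different partitions. Only cut edges produce remote messages, so the local share of the traffic is fixed by the edge-cut ratio (storing each undirected edge as two directed edges doubles both counts and leaves the ratio unchanged):

```latex
\begin{aligned}
M_{\text{remote}} &= |E_{\text{cut}}|, \qquad M_{\text{local}} = |E| - |E_{\text{cut}}|,\\
\frac{M_{\text{local}}}{M_{\text{local}} + M_{\text{remote}}} &= 1 - \frac{|E_{\text{cut}}|}{|E|}.
\end{aligned}
```

With the edge-cut ratios reported for the experiments (Random 98.52%, METIS 3.48%), the local share per superstep is 1 − 0.9852 = 1.48% under random partitioning but 1 − 0.0348 = 96.52% under METIS, so an engine that provisions message processing mainly for remote traffic ends up bottlenecked on local messages.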

Overview of PAGE
PAGE applies an adaptive tuning mechanism and new cooperation methods.

Newly Designed PAGE Worker

Dual Concurrent Message Processor
– First type of concurrency: a remote message processor (MP) and a local MP are embedded in each worker.
– Second type of concurrency: each message processor contains a set of message process units.
The system determines the degree of concurrency automatically.
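A minimal sketch of the two kinds of concurrency, assuming each message processor is a FIFO queue drained by a pool of threads that stand in for message process units, and that a message counts as local when its sender is the same worker. The class names, `deliver`, the routing counters and the handler are hypothetical and are not PAGE's actual code.

```python
import queue
import threading


class MessageProcessor:
    """One message processor: a queue drained by a pool of message process units."""

    def __init__(self, num_units, handle):
        self.inbox = queue.Queue()
        self.handle = handle  # called once for every message
        self.units = [threading.Thread(target=self._run) for _ in range(num_units)]
        for unit in self.units:
            unit.start()

    def _run(self):
        while True:
            msg = self.inbox.get()
            if msg is None:  # sentinel: this unit stops
                break
            self.handle(msg)

    def submit(self, msg):
        self.inbox.put(msg)

    def shutdown(self):
        # Sentinels queue behind all pending messages, so everything is processed first.
        for _ in self.units:
            self.inbox.put(None)
        for unit in self.units:
            unit.join()


class DualMessageProcessor:
    """Routes each message to the local or the remote MP by its sending worker."""

    def __init__(self, worker_id, n_local, n_remote, handle):
        self.worker_id = worker_id
        self.local = MessageProcessor(n_local, handle)
        self.remote = MessageProcessor(n_remote, handle)
        # Routing counts a monitor could read; assumes deliver() runs on one thread.
        self.counts = {"local": 0, "remote": 0}

    def deliver(self, src_worker, msg):
        kind = "local" if src_worker == self.worker_id else "remote"
        self.counts[kind] += 1
        (self.local if kind == "local" else self.remote).submit(msg)

    def shutdown(self):
        self.local.shutdown()
        self.remote.shutdown()
```

A PageRank-style handler can sum incoming values per target vertex under a lock. In CPython the GIL means these threads show the structure (two MPs, each with its own pool of units) rather than a real parallel speed-up; a JVM implementation would use native threads.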

Dynamic Concurrency Control Model (DCCM)
– The DCCM determines the proper parameters, such as nmp, nmp_l and nmp_r.
– The DCCM is built on two heuristic rules:
  – Ability lower bound
  – Workload balance ratio
– Monitor: tracks the necessary metrics.
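The slides name the two rules but not their formulas. The sketch below is one plausible reading, assuming that nmp, nmp_l and nmp_r are the total, local-MP and remote-MP counts of message process units; that the ability lower bound means the units together must keep pace with the observed message arrival rate; and that the workload balance ratio means splitting units in proportion to local and remote message rates. The rates would come from the Monitor; the function `choose_concurrency` and its parameters are hypothetical.

```python
from math import ceil


def choose_concurrency(local_rate, remote_rate, unit_rate, max_units):
    """Pick (n_mp, n_mp_l, n_mp_r) from message rates reported by a monitor.

    local_rate, remote_rate: observed messages per second from the same
        worker and from other workers.
    unit_rate: messages per second a single message process unit can handle.
    max_units: upper bound on units the worker may run (e.g. its core count).
    """
    if max_units < 2:
        raise ValueError("need at least one unit for each message processor")
    total_rate = local_rate + remote_rate
    # Rule 1, ability lower bound: enough units to keep pace with arrivals,
    # at least one per message processor, capped by the available cores.
    n_mp = min(max(2, ceil(total_rate / unit_rate)), max_units)
    # Rule 2, workload balance ratio: split units in proportion to the
    # local/remote message rates, keeping at least one unit on each side.
    share_local = local_rate / total_rate if total_rate else 0.5
    n_mp_l = min(max(round(n_mp * share_local), 1), n_mp - 1)
    n_mp_r = n_mp - n_mp_l
    return n_mp, n_mp_l, n_mp_r
```

With mostly local traffic, as under a METIS-like partition (say 900,000 local and 100,000 remote messages per second, with units that each handle 125,000), this yields 8 units split 7 local and 1 remote; reversing the traffic mix flips the split. Re-running the function every superstep is what would make the concurrency adaptive.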

Agenda
– Background
– Design of PAGE
– Experimental results (current)
– Conclusion

Environment & Datasets
– Experiment environment: a 24-node cluster
– Dataset: the uk u.
  – Undirected
  – Vertices: 105,153,952
  – Edges: 6,603,753,128
– Benchmark: PageRank

Partition qualities (balance factor < 1%):
Scheme   Edge cut
Random   98.52%
LDG1     82.88%
LDG2     75.69%
LDG3     66.37%
LDG4     56.34%
METIS     3.48%
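The two quality measures in the table can be computed directly from an edge list and a vertex assignment. This sketch assumes edge-cut ratio means cut edges divided by total edges (each undirected edge listed once) and balance factor means largest partition size divided by average size, minus one; the slides do not define either, and `partition_quality` is a hypothetical helper.

```python
from collections import Counter


def partition_quality(edges, assignment, num_parts):
    """Return (edge_cut_ratio, balance_factor) for a vertex partition.

    edges: list of (u, v) pairs, each undirected edge listed once.
    assignment: dict mapping vertex -> partition index in range(num_parts).
    """
    # An edge is cut when its endpoints sit in different partitions.
    cut = sum(1 for u, v in edges if assignment[u] != assignment[v])
    edge_cut_ratio = cut / len(edges)
    sizes = Counter(assignment.values())
    average = len(assignment) / num_parts
    largest = max(sizes.get(i, 0) for i in range(num_parts))
    # Assumed definition: how far the largest partition exceeds the mean size.
    balance_factor = largest / average - 1
    return edge_cut_ratio, balance_factor
```

For a uniformly random assignment to k partitions the expected edge-cut ratio is 1 − 1/k, which is why the Random scheme sits close to 100%, while METIS explicitly minimises the cut under a balance constraint.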

Partition Awareness in PAGE
[Figure: PAGE vs. Giraph]

Comparison with the Naive Solution
* Giraph-GPSop is the naive solution.

Contribution & Conclusion
– We identify the problem of partition-unaware inefficiency.
– We build a new partition-aware graph computation engine, PAGE.
– We design a Dynamic Concurrency Control Model based on several heuristic rules to better profile the characteristics of graph partitions.
– Finally, we demonstrate PAGE's robustness and efficiency across different graph partition qualities.
