A Lightweight Infrastructure for Graph Analytics Donald Nguyen Andrew Lenharth and Keshav Pingali The University of Texas at Austin.

Slides:

Advertisements

Similar presentations

Algorithms (and Datastructures) Lecture 3 MAS 714 part 2 Hartmut Klauck.

Advertisements

SkewReduce YongChul Kwon Magdalena Balazinska, Bill Howe, Jerome Rolia* University of Washington, *HP Labs Skew-Resistant Parallel Processing of Feature-Extracting.

IKI 10100: Data Structures & Algorithms Ruli Manurung (acknowledgments to Denny & Ade Azurat) 1 Fasilkom UI Ruli Manurung (Fasilkom UI)IKI10100: Lecture10.

ParaMeter: A profiling tool for amorphous data-parallel programs Donald Nguyen University of Texas at Austin.

Distributed Graph Analytics Imranul Hoque CS525 Spring 2013.

CSE 101- Winter ‘15 Discussion Section January 26th 2015.

Breadth-First Search Seminar – Networking Algorithms CS and EE Dept. Lulea University of Technology 27 Jan Mohammad Reza Akhavan.

1 Graphs: Traversal Searching/Traversing a graph = visiting the vertices of a graph by following the edges in a systematic way Example: Given a highway.

CSCI 3160 Design and Analysis of Algorithms Tutorial 2 Chengyu Lin.

Distributed Graph Processing Abhishek Verma CS425.

Parallel Inclusion-based Points-to Analysis Mario Méndez-Lojo Augustine Mathew Keshav Pingali The University of Texas at Austin (USA) 1.

APACHE GIRAPH ON YARN Chuan Lei and Mohammad Islam.

The of Parallelism in Algorithms Keshav Pingali The University of Texas at Austin Joint work with D.Nguyen, M.Kulkarni, M.Burtscher, A.Hassaan, R.Kaleem,

Chapter 23 Minimum Spanning Trees

Tirgul 12 Algorithm for Single-Source-Shortest-Paths (s-s-s-p) Problem Application of s-s-s-p for Solving a System of Difference Constraints.

Computer Science Lecture 6, page 1 CS677: Distributed OS Processes and Threads Processes and their scheduling Multiprocessor scheduling Threads Distributed.

Graph Algorithms: Shortest Path We are given a weighted, directed graph G = (V, E), with weight function w: E R mapping.

Shortest Path Algorithms

Scalable Network Distance Browsing in Spatial Database Samet, H., Sankaranarayanan, J., and Alborzi H. Proceedings of the 2008 ACM SIGMOD international.

More Graph Algorithms Weiss ch Exercise: MST idea from yesterday Alternative minimum spanning tree algorithm idea Idea: Look at smallest edge not.

Dijkstra’s Algorithm Slide Courtesy: Uwash, UT 1.

Graph Algorithms. Overview Graphs are very general data structures – data structures such as dense and sparse matrices, sets, multi-sets, etc. can be.

Dijkstra's algorithm.

Graph Algorithms. Overview Graphs are very general data structures – data structures such as dense and sparse matrices, sets, multi- sets, etc. can be.

Pregel: A System for Large-Scale Graph Processing

Keshav Pingali The University of Texas at Austin Parallel Program = Operator + Schedule + Parallel data structure SAMOS XV Keynote.

1 Shortest Path Algorithms Andreas Klappenecker [based on slides by Prof. Welch]

Chapter 9 – Graphs A graph G=(V,E) – vertices and edges

Dijkstras Algorithm Named after its discoverer, Dutch computer scientist Edsger Dijkstra, is an algorithm that solves the single-source shortest path problem.

Elixir : A System for Synthesizing Concurrent Graph Programs

Ex-MATE: Data-Intensive Computing with Large Reduction Objects and Its Application to Graph Mining Wei Jiang and Gagan Agrawal.

Ligra: A Lightweight Graph Processing Framework for Shared Memory

Pregel: A System for Large-Scale Graph Processing Presented by Dylan Davis Authors: Grzegorz Malewicz, Matthew H. Austern, Aart J.C. Bik, James C. Dehnert,

X-Stream: Edge-Centric Graph Processing using Streaming Partitions

GRAPH PROCESSING Hi, I am Mayank and the second presenter for today is Shadi. We will be talking about Graph Processing.

Implementing Parallel Graph Algorithms Spring 2015 Implementing Parallel Graph Algorithms Lecture 3: Galois Primer Roman Manevich Ben-Gurion University.

Graph Algorithms. Definitions and Representation An undirected graph G is a pair (V,E), where V is a finite set of points called vertices and E is a finite.

Graphs. What is a graph? A data structure that consists of a set of nodes (vertices) and a set of edges that relate the nodes to each other The set of.

Porting Irregular Reductions on Heterogeneous CPU-GPU Configurations Xin Huo, Vignesh T. Ravi, Gagan Agrawal Department of Computer Science and Engineering.

Pregel: A System for Large-Scale Graph Processing Grzegorz Malewicz, Matthew H. Austern, Aart J. C. Bik, James C. Dehnert, Ilan Horn, Naty Leiser, and.

1 Keshav Pingali University of Texas, Austin Introduction to parallelism in irregular algorithms.

LATA: A Latency and Throughput- Aware Packet Processing System Author: Jilong Kuang and Laxmi Bhuyan Publisher: DAC 2010 Presenter: Chun-Sheng Hsueh Date:

Parallel graph algorithms Antonio-Gabriel Sturzu, SCPD Adela Diana Almasi, SCPD Adela Diana Almasi, SCPD Iulia Alexandra Floroiu, ISI Iulia Alexandra Floroiu,

Distributed Galois Andrew Lenharth 2/27/2015. Goals An implementation of the operator formulation for distributed memory – Ideally forward-compatible.

Shortest Paths and Dijkstra’s Algorithm CS 105. SSSP Slide 2 Single-source shortest paths Given a weighted graph G and a source vertex v in G, determine.

Most of contents are provided by the website Graph Essentials TJTSD66: Advanced Topics in Social Media.

Is Your Graph Algorithm Eligible for Nondeterministic Execution? Zhiyuan Shao, Lin Hou, Yan Ai, Yu Zhang and Hai Jin Services Computing Technology and.

Data Structures and Algorithms in Parallel Computing Lecture 3.

Data Structures and Algorithms in Parallel Computing Lecture 7.

Pregel: A System for Large-Scale Graph Processing Nov 25 th 2013 Database Lab. Wonseok Choi.

Suppose G = (V, E) is a directed network. Each edge (i,j) in E has an associated ‘length’ c ij (cost, time, distance, …). Determine a path of shortest.

Onlinedeeneislam.blogspot.com1 Design and Analysis of Algorithms Slide # 1 Download From

Dynamic Load Balancing Tree and Structured Computations.

Chenning Xie+, Rong Chen+, Haibing Guan*, Binyu Zang+ and Haibo Chen+

TensorFlow– A system for large-scale machine learning

Pagerank and Betweenness centrality on Big Taxi Trajectory Graph

Parallel Programming By J. H. Wang May 2, 2017.

Roshan Dathathri Gurbinder Gill Loc Hoang

Graph Indexing for Shortest-Path Finding over Dynamic Sub-Graphs

Slide Courtesy: Uwash, UT

Brad Karp UCL Computer Science

Slide Courtesy: Uwash, UT

Graph Algorithms: Shortest Path

TensorFlow: A System for Large-Scale Machine Learning

Parallel Programming in C with MPI and OpenMP

Phoenix: A Substrate for Resilient Distributed Graph Analytics

Relax and Adapt: Computing Top-k Matches to XPath Queries

Loc Hoang Roshan Dathathri Gurbinder Gill Keshav Pingali

DistTC: High Performance Distributed Triangle Counting

Gurbinder Gill Roshan Dathathri Loc Hoang Keshav Pingali

Presentation transcript:

A Lightweight Infrastructure for Graph Analytics Donald Nguyen Andrew Lenharth and Keshav Pingali The University of Texas at Austin

What is Graph Analytics? Algorithms to compute properties of graphs – Connected components, shortest paths, centrality measures, diameter, PageRank, … Many applications – Google, path routing, friend recommendations, network analysis Difficult to implement on a large scale – Data sets are large, data accesses are irregular – Need parallelism and efficient runtimes Bridges of Konigsberg The Social Network

Contributions A lightweight infrastructure for graph analytics based on improvements to the Galois system – Example: scalable priority scheduling Orders of magnitude faster than third-party graph DSLs for many applications and inputs – Galois permits better algorithms to be written – Galois infrastructure more scalable 3

Outline 1.Abstraction for graph analytics applications and implementation in Galois system 2.One piece of infrastructure: priority scheduling 3.Evaluation of Galois system versus graph analytics DSLs 4

Example: SSSP Find the shortest distance from source node to all other nodes in a graph – Label nodes with tentative distance – Assume non-negative edge weights Algorithms – Chaotic relaxation O(2 V ) – Bellman-Ford O(VE) – Dijkstra’s algorithm O(E log V) Uses priority queue – Δ-stepping Uses sequence of bags to prioritize work Δ=1, O(E log V) Δ=∞, O(VE) Different algorithms are different schedules for applying relaxations – SSSP needs priority scheduling for work efficiency ∞ ∞ 0 Active edge Neighborhood Activity Operator Edge relaxation ∞ 2 5

Parallel Program = Operator + Schedule + Parallel Data Structure What is the operator? – Ligra, PowerGraph: only vertex programs – Galois: Unrestricted, may even morph graph by adding/removing nodes and edges Where/When does it execute? – Autonomous scheduling: activities execute transactionally – Coordinated scheduling: activities execute in rounds Read values refer to previous rounds Multiple updates to the same location are resolved with reduction, etc. Algorithm b a c 6

Galois System Parallel Program = Operator + Schedule + Parallel Data Structure OperatorsSchedulesData Structure API Calls SchedulersData Structures AllocatorThread Primitives Multicore User Program Galois System 7

Outline 1.Abstraction for graph analytics applications and implementation in Galois system 2.One piece of infrastructure: priority scheduling 3.Evaluation of Galois system versus DSLs 8

Galois Infrastructure for Scheduling Library of different schedulers – FIFO, LIFO, Priority, … – DSL for specifying scheduling policies Nguyen and Pingali (ASPLOS ’11) 9

Priority Scheduling Concurrent priority queue – Implemented in TBB, java.concurrent – Does not scale: our tasks are very fine-grain Sequence of bags – One bag per priority level – High overhead for finding non-empty, urgent- priority bags 10

Ordered by Metric (OBIM) Sparse sequence of Bags – One Bag per priority level – To reduce synchronization Sequence is replicated among cores Bag is distributed between cores – To reduce overhead of finding work A reduction tree over machine topology to get estimate of current urgent priority 11

Scheduling Bag c1c1 c2c2 Head Package 1 12 Bag c1c1 c2c2 Head Package 2

Scheduling Bag c1c1 c2c2 Head Package 1 13 Bag c1c1 c2c2 Head Package 2

Priority Map 01 Global Map Local Maps Core 1 Core 2 Scheduling Bags

Outline 1.Abstraction for graph analytics applications and implementation in Galois system 2.One piece of infrastructure: priority scheduling 3.Evaluation of Galois system versus graph analytics DSLs 15

Graph Analytics DSLs Easy to implement their APIs on top of Galois system – Galois implementations called PowerGraph-g, Ligra-g, etc. – About lines of code each 16  GraphLabLow et al. (UAI ’10)  PowerGraphGonzalez et al. (OSDI ’12)  GraphChiKyrola et al. (OSDI ’12)  LigraShun and Blelloch (PPoPP ’13)  PregelMalewicz et al. (SIGMOD ‘10)  …

Evaluation Platform – 40-core system 4 socket, Xeon E (Westmere) – 128 GB RAM Applications – Breadth-first search (bfs) – Connected components (cc) – Approximate diameter (dia) – PageRank (pr) – Single-source shortest paths (sssp) Inputs – twitter50 (50 M nodes, 2 B edges, low-diameter) – road (20 M nodes, 60 M edges, high-diameter) Comparison with – Ligra (shared memory) – PowerGraph (distributed) Runtimes with core machines (1024 cores) does not beat one 40-core machine using Galois 17

18

19

The best algorithm may require application- specific scheduling – Priority scheduling for SSSP The best algorithm may not be expressible as a vertex program – Connected components with union-find Autonomous scheduling required for high- diameter graphs – Coordinated scheduling uses many rounds and has too much overhead 20

See Paper For More inputs, applications Detailed discussion of performance results More infrastructure: non-priority scheduling, memory management Results for out-of-core DSLs 21

Summary Galois programming model is general – Permits efficient algorithms to be written Galois infrastructure is lightweight Simpler graph DSLs can be layered on top – Can perform better than existing systems Download at 22