Data Structures and Algorithms in Parallel Computing Lecture 7.


Parallel scientific computing
How to assign work to machines?
– Static (at the start) vs. dynamic (during run time)
– Objective is to minimize total solution time
Traditional approaches
– Graph partitioning
– Geometric partitioning: good for applications lacking graph connectivity (e.g., particle methods)

Traditional approaches
Recursive Spectral Bisection
– Splits vertices into groups based on the eigenvectors of the Laplacian matrix associated with the graph
– Slow but effective
Multilevel partitioning
Diffusive partitioning
– Transfers work from heavily loaded processors to their more lightly loaded neighbors
– Faster than multilevel partitioning but requires more iterations to achieve global balance
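
The diffusive scheme above can be sketched in a few lines: each processor repeatedly shifts a fraction of its load surplus to lighter-loaded neighbors. This is a hypothetical pure-Python illustration; the ring topology, diffusion factor `alpha`, and iteration count are assumptions for the example, not from the slides:

```python
# Diffusive load balancing sketch: each node repeatedly transfers a
# fraction of its load surplus to lighter-loaded neighbors.

def diffuse(load, neighbors, alpha=0.5, iterations=50):
    """load: list of work per processor; neighbors: adjacency list."""
    load = list(load)
    for _ in range(iterations):
        transfer = [0.0] * len(load)
        for u, nbrs in enumerate(neighbors):
            for v in nbrs:
                if load[u] > load[v]:
                    # move a fraction of the difference from u to v
                    delta = alpha * (load[u] - load[v]) / (len(nbrs) + 1)
                    transfer[u] -= delta
                    transfer[v] += delta
        load = [l + t for l, t in zip(load, transfer)]
    return load

# A 4-processor ring with all work initially on processor 0;
# repeated local exchanges drive every processor toward the average.
ring = [[1, 3], [0, 2], [1, 3], [2, 0]]
balanced = diffuse([100.0, 0.0, 0.0, 0.0], ring)
```

Note that every step uses only neighbor-to-neighbor information, which is why the scheme is cheap per iteration but needs many iterations to reach global balance.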

Applications
Multilevel partitioning
– Effective for finite element and finite volume methods, where cells are divided among processors
Diffusive and geometric partitioning
– Used in dynamic computations such as adaptive finite element methods
– The physical locality of geometric partitioning is exploited by particle methods

Beyond traditional approaches
Traditional methods do not work well for graphs with higher connectivity and less homogeneity and symmetry
Clustering
– Used in data mining
– Similarity-based and object-attribute-based graph models
– Direct clustering (look for connected components) and partitioning-based clustering (min-cut)
Other methods: see Devine et al., Partitioning and Load Balancing for Emerging Parallel Applications and Architectures, Parallel Processing for Scientific Computing, 2006
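
The direct clustering mentioned above can be sketched with a plain BFS flood-fill over an adjacency list: each connected component of the similarity graph becomes one cluster. A minimal illustration (the example graph is hypothetical):

```python
from collections import deque

def connected_components(adj):
    """adj: dict mapping vertex -> iterable of neighbors.
    Returns one sorted vertex list per component (= per cluster)."""
    seen, clusters = set(), []
    for start in adj:
        if start in seen:
            continue
        # BFS flood-fill collects one connected component
        comp, queue = [], deque([start])
        seen.add(start)
        while queue:
            u = queue.popleft()
            comp.append(u)
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    queue.append(v)
        clusters.append(sorted(comp))
    return clusters

# Two similarity clusters {0,1} and {2,3}, plus the isolated vertex 4:
adj = {0: [1], 1: [0], 2: [3], 3: [2], 4: []}
```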

Distributed graph processing
Goals:
– Minimize communication costs between nodes
– Load-balance the execution among nodes
Subgraph-centric models reduce communication overhead by increasing local computation
However, efficiency depends on the graph type and the partitioning technique

Graph partitioning
A key component of any distributed graph processing platform
Performed before running the graph algorithms
Goals: reduce communication and balance computation
– The outcome depends on the graph type
– Sparse graphs: better load balance with reduced communication
– Sparse graphs with skewed degree distributions: difficult to load balance with minimum communication
– Dense graphs: difficult to reduce communication overhead
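
The two competing objectives above can be made concrete by measuring a given partition: the edge cut (a proxy for communication volume) and the load imbalance (largest partition relative to the ideal size). A minimal sketch over a hypothetical 6-vertex graph made of two triangles joined by one edge:

```python
def edge_cut(edges, part):
    """Number of edges whose endpoints land in different partitions."""
    return sum(1 for u, v in edges if part[u] != part[v])

def imbalance(part, k):
    """Largest partition size divided by the ideal (average) size."""
    sizes = [0] * k
    for p in part.values():
        sizes[p] += 1
    return max(sizes) / (len(part) / k)

# Two triangles {0,1,2} and {3,4,5} joined by the edge (2, 3):
edges = [(0, 1), (1, 2), (2, 0), (3, 4), (4, 5), (5, 3), (2, 3)]
part = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}
```

Splitting along the single bridging edge gives a cut of 1 and perfect balance (imbalance 1.0); a partitioner trades these two quantities against each other.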

Applications
– Load balancing while minimizing communication
– Structured and unstructured mesh distribution for distributed-memory parallel computing
– Sparse matrix–vector multiplication
– VLSI layout
– Telephone network design
– Sparse Gaussian elimination


1D and 2D data distribution
Represent the graph as a sparse matrix
1D: distribute vertices, with their edges, to processors
– Example: Parallel Boost Graph Library
2D: distribute subgraphs (blocks of the adjacency matrix) to processors
– Reduces communication overhead
– Allows a higher degree of concurrency
– Examples: various solutions for IBM BlueGene/L and Cray machines
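
The difference can be sketched as owner functions over the adjacency matrix: 1D assigns whole rows (a vertex and all its edges) to one processor, while 2D assigns matrix blocks, so each edge (i, j) maps to a processor in a pr × pc grid. The block sizes and grid shape below are illustrative assumptions:

```python
def owner_1d(vertex, n, p):
    """1D block distribution: vertex (and its whole edge list) -> processor."""
    block = -(-n // p)  # ceil(n / p) rows per processor
    return vertex // block

def owner_2d(i, j, n, pr, pc):
    """2D block distribution: edge (i, j) -> processor (row, col) in a pr x pc grid."""
    rb, cb = -(-n // pr), -(-n // pc)  # block height and width
    return (i // rb, j // cb)

# 8 vertices over 4 processors (1D) or a 2 x 2 processor grid (2D).
# Under 1D, all of vertex 5's edges live on one processor; under 2D,
# edges (5, 1) and (5, 7) land on different processors, but any
# frontier exchange only involves one processor row or column.
```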

Balance computation vs. reduce communication
Case study: Breadth-First Search
– A communication-intensive graph computation
– Used as a subroutine in more sophisticated algorithms: connected components, spanning forests, testing for bipartiteness, maximum flows, betweenness centrality
– Chosen as a representative benchmark for ranking supercomputers (Graph 500)
Buluc et al., Graph Partitioning for Scalable Distributed Graph Computations, DIMACS, 2012

BFS algorithms
In the 2D case, communication happens only along one processor dimension at a time
Buluc et al., Parallel Breadth-First Search on Distributed Memory Systems, SC, 2011
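
The kernel being distributed in these papers is level-synchronous BFS: expand the current frontier locally, exchange newly discovered vertices (the communication step in the 1D/2D schemes), and repeat. A single-process sketch of that structure, with comments marking where communication would occur in a distributed run (no actual message passing here):

```python
def bfs_levels(adj, source):
    """Level-synchronous BFS: returns vertex -> distance from source."""
    level = {source: 0}
    frontier = [source]
    depth = 0
    while frontier:
        next_frontier = []
        for u in frontier:           # local expansion of owned vertices
            for v in adj[u]:
                if v not in level:   # distributed case: ownership of v is
                    level[v] = depth + 1  # resolved by exchanging messages here
                    next_frontier.append(v)
        frontier = next_frontier     # distributed case: all-gather the frontier
        depth += 1
    return level

# A small directed graph: 0 -> {1,2} -> 3 -> 4
adj = {0: [1, 2], 1: [3], 2: [3], 3: [4], 4: []}
```

Because every BFS level triggers one exchange, the partitioning directly controls both the message volume and how evenly the expansion work is spread.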

Analysis
METIS (1D partitioning)
– Balances the number of vertices per partition while minimizing the number of cut edges
– K-way multilevel partitioning
PaToH (1D and 2D partitioning)
– Multilevel hypergraph partitioning

Runtime and communication

Overall conclusion
– Reducing work and communication imbalance among partitions is more important than minimizing the total edge cut
– Even well-balanced vertex and edge partitions do not guarantee load-balanced execution for real-world graphs

Dynamic graphs
Real-world graphs are not static
– Edges and vertices are constantly added and removed (e.g., Twitter, traceroute, and other social network graphs)
– Partitions need to be updated constantly
– Repartitioning may be required to rebalance load and reduce communication
Repartitioning is done in parallel with the graph processing algorithm
– Online, without restarting from scratch
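
A simple online heuristic in the spirit of adaptive repartitioning (Vaquero et al. use iterative label propagation; this is a simplified greedy sketch, not their exact algorithm): place or migrate a vertex to the partition holding most of its neighbors, subject to a capacity cap so one partition cannot absorb everything.

```python
from collections import Counter

def assign(vertex, neighbors, part, k, capacity):
    """Greedy online placement: prefer the partition with most neighbors."""
    votes = Counter(part[v] for v in neighbors if v in part)
    sizes = Counter(part.values())
    # candidates in decreasing order of neighbor count
    for p, _ in votes.most_common():
        if sizes[p] < capacity:
            part[vertex] = p     # keeps most adjacent edges local
            return p
    # all preferred partitions are full: fall back to the least loaded one
    p = min(range(k), key=lambda q: sizes[q])
    part[vertex] = p
    return p

part = {0: 0, 1: 0, 2: 1}
# New vertex 3 arrives with edges to 0 and 1 -> joins partition 0.
assign(3, [0, 1], part, k=2, capacity=3)
```

The capacity bound is what trades communication (neighbor locality) against load balance; without it the heuristic would collapse all densely connected vertices into one partition.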

Efficiency
Vaquero et al., Adaptive Partitioning of Large-Scale Dynamic Graphs, SOCC, 2013

What’s next?
– Parallel sorting
– Parallel computational geometry
– Parallel numerical algorithms
– …