1 Introduction to parallelism in irregular algorithms
Keshav Pingali, University of Texas, Austin

2 Examples

Application/domain and algorithms:
–Meshing: generation, refinement, partitioning
–Compilers: iterative and elimination-based dataflow algorithms
–Functional interpreters: graph reduction, static and dynamic dataflow
–Maxflow: preflow-push, augmenting paths
–Minimal spanning trees: Prim, Kruskal, Boruvka
–Event-driven simulation: Chandy-Misra-Bryant, Jefferson Timewarp
–AI: message-passing algorithms
–SAT solvers: survey propagation, Bayesian inference
–Sparse linear solvers: sparse MVM, sparse Cholesky factorization

3 Delaunay Mesh Refinement

Iterative refinement to remove badly shaped triangles:

    while there are bad triangles do {
        Pick a bad triangle;
        Find its cavity;
        Retriangulate cavity;   // may create new bad triangles
    }

Don't-care non-determinism:
–the final mesh depends on the order in which bad triangles are processed
–applications do not care which mesh is produced
Data structure:
–graph in which nodes represent triangles and edges represent triangle adjacencies
Parallelism:
–bad triangles whose cavities do not overlap can be processed in parallel
–parallelism is dependent on runtime values, so compilers cannot find this parallelism
–(Miller et al.) at runtime, repeatedly build an interference graph and find maximal independent sets for parallel execution (a sketch of this strategy appears below)
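The Miller et al. strategy from the last bullet can be made concrete with a small sketch. The Java fragment below is only illustrative: Mesh, Triangle and Cavity are hypothetical interfaces standing in for a real mesh data structure, and the parallel step uses a plain Java parallel stream rather than any particular runtime.

    import java.util.ArrayList;
    import java.util.Collections;
    import java.util.HashMap;
    import java.util.HashSet;
    import java.util.List;
    import java.util.Map;
    import java.util.Set;

    interface Triangle {}
    interface Cavity { Set<Triangle> triangles(); }   // triangles touched by a retriangulation
    interface Mesh {
        Set<Triangle> badTriangles();
        Cavity cavityOf(Triangle t);
        void retriangulate(Cavity c);                 // must tolerate concurrent non-overlapping updates
    }

    class InterferenceGraphScheduler {
        static void refine(Mesh mesh) {
            Set<Triangle> bad = new HashSet<>(mesh.badTriangles());
            while (!bad.isEmpty()) {
                // Compute the cavity (neighborhood) of every current bad triangle.
                Map<Triangle, Cavity> cavities = new HashMap<>();
                for (Triangle t : bad) cavities.put(t, mesh.cavityOf(t));

                // Greedy maximal independent set of the interference graph:
                // a triangle is selected only if its cavity overlaps no cavity
                // already selected in this round.
                List<Triangle> independent = new ArrayList<>();
                Set<Triangle> claimed = new HashSet<>();
                for (Triangle t : bad) {
                    Set<Triangle> c = cavities.get(t).triangles();
                    if (Collections.disjoint(c, claimed)) {
                        independent.add(t);
                        claimed.addAll(c);
                    }
                }

                // Non-overlapping cavities can be retriangulated in parallel.
                independent.parallelStream().forEach(t -> mesh.retriangulate(cavities.get(t)));

                // Retriangulation may create new bad triangles; start the next round.
                bad = new HashSet<>(mesh.badTriangles());
            }
        }
    }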

4 Event-driven simulation

Stations communicate by sending messages with time-stamps on FIFO channels
Stations have internal state that is updated when a message is processed
Messages must be processed in time-order at each station
Data structure:
–messages in an event queue, sorted in time order (a minimal sequential version is sketched below)
Parallelism:
–Jefferson Time Warp: a station can fire when it has an incoming message on any edge; requires roll-back if a speculative conflict is detected
–Chandy-Misra-Bryant: a station fires when it has messages on all incoming edges and processes the earliest one; requires null messages to avoid deadlock

[Figure: stations A, B, C connected by channels carrying time-stamped messages (2, 5, 3, 4, 6)]
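For contrast with the parallel schemes above, here is a minimal sequential event-driven simulation loop in Java. Event, Station and run are names invented for this sketch; the point is that a single global priority queue ordered by timestamp automatically delivers messages to each station in time order.

    import java.util.PriorityQueue;

    class SequentialSimulation {
        static final class Event implements Comparable<Event> {
            final long time;      // timestamp of the message
            final int station;    // destination station id
            Event(long time, int station) { this.time = time; this.station = station; }
            @Override public int compareTo(Event o) { return Long.compare(time, o.time); }
        }

        interface Station {
            // Process one incoming message; may update internal state and
            // produce new (later) messages for other stations.
            Iterable<Event> process(Event e);
        }

        static void run(Station[] stations, PriorityQueue<Event> pending) {
            while (!pending.isEmpty()) {
                Event e = pending.poll();                    // globally earliest message
                for (Event out : stations[e.station].process(e))
                    pending.add(out);                        // schedule produced messages
            }
        }
    }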

5 Remarks on algorithms

Diverse algorithms and data structures
Exploiting parallelism in irregular algorithms is very complex:
–Miller et al. DMR implementation: interference graph + maximal independent sets
–Jefferson Timewarp algorithm for event-driven simulation
Algorithms:
–parallelism can depend on runtime values (DMR, event-driven simulation, …)
–don't-care non-determinism, which has nothing to do with concurrency (DMR)
–activities created in the future may interfere with current activities (event-driven simulation, …)
Data structures:
–graphs, trees, lists, priority queues, …

6 Operator formulation of algorithms

Algorithm = repeated application of an operator to a graph (a generic worklist sketch appears below)
–active element: node or edge where computation is needed
    DMR: nodes representing bad triangles
    Event-driven simulation: station with an incoming message
    Jacobi: interior nodes of the mesh
–neighborhood: set of nodes and edges read/written to perform the computation
    DMR: cavity of the bad triangle
    Event-driven simulation: the station itself
    Jacobi: nodes in the stencil
    (usually distinct from the graph neighbors of the active node)
–ordering: order in which active elements must be executed in a sequential implementation
    any order (Jacobi, DMR, graph reduction)
    some problem-dependent order (event-driven simulation)

[Figure: graph with active nodes i1…i5 highlighted, each surrounded by its neighborhood]
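As a rough illustration (not the Galois API), the operator formulation can be rendered as a generic worklist loop: the operator reads and writes the neighborhood of an active element and may create new active elements. The Operator interface and run method below are invented for this sketch.

    import java.util.ArrayDeque;
    import java.util.Deque;

    class WorklistLoop {
        interface Operator<A> {
            // Apply the operator at one active element: read/write its
            // neighborhood and return any newly activated elements.
            Iterable<A> apply(A active);
        }

        // Unordered case: any order of processing active elements is legal.
        static <A> void run(Iterable<A> initiallyActive, Operator<A> op) {
            Deque<A> worklist = new ArrayDeque<>();
            for (A a : initiallyActive) worklist.add(a);
            while (!worklist.isEmpty()) {
                A active = worklist.poll();
                for (A created : op.apply(active))   // e.g. new bad triangles in DMR
                    worklist.add(created);
            }
        }
    }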

7 Parallelism

Amorphous data-parallelism:
–active nodes can be processed in parallel, subject to neighborhood constraints and ordering constraints
Computations at two active elements are independent if
–their neighborhoods do not overlap
–more generally, neither of them writes to an element in the intersection of the neighborhoods
(a sketch of this test appears below)
Unordered active elements:
–independent active elements can be processed in parallel
–how do we find independent active elements?
Ordered active elements:
–independence is not enough
–how do we determine what is safe to execute without violating the ordering?
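The independence condition above can be stated as a small predicate. In the sketch below an activity is summarized by the sets of graph elements its neighborhood reads and writes; Activity, reads() and writes() are illustrative names, not part of any real system.

    import java.util.Collections;
    import java.util.HashSet;
    import java.util.Set;

    class Independence {
        interface Activity<E> {
            Set<E> reads();    // neighborhood elements that are only read
            Set<E> writes();   // neighborhood elements that are written
        }

        // Two activities are independent if neither writes an element that
        // lies in the other's neighborhood (its reads or its writes).
        static <E> boolean independent(Activity<E> a, Activity<E> b) {
            Set<E> aNeighborhood = new HashSet<>(a.reads());
            aNeighborhood.addAll(a.writes());
            Set<E> bNeighborhood = new HashSet<>(b.reads());
            bNeighborhood.addAll(b.writes());
            return Collections.disjoint(a.writes(), bNeighborhood)
                && Collections.disjoint(b.writes(), aNeighborhood);
        }
    }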

8 Galois programming model (PLDI 2007)

Programs are written in terms of the abstractions in the model
Programming model: sequential, object-oriented
Graph class: provided by the Galois library
–specialized versions to exploit structure (see later)
Galois set iterators, for iterating over unordered and ordered sets of active elements:
–for each e in Set S do B(e)
    evaluates B(e) for each element in set S
    no a priori order on iterations
    set S may get new elements during execution
–for each e in OrderedSet S do B(e)
    evaluates B(e) for each element in set S
    performs iterations in the order specified by the OrderedSet
    set S may get new elements during execution

DMR using Galois iterators:

    Mesh m = /* read in mesh */
    Set ws;
    ws.add(m.badTriangles());      // initialize the worklist
    for each tr in Set ws do {     // unordered Set iterator
        if (tr no longer in mesh) continue;
        Cavity c = new Cavity(tr);
        c.expand();
        c.retriangulate();
        m.update(c);
        ws.add(c.badTriangles());  // newly created bad triangles
    }

9 Algorithm abstractions

Iterative algorithms can be classified by:
–topology: general graph, grid, or tree
–operator:
    morph: modifies the structure of the graph
    local computation: only updates values on nodes/edges
    reader: does not modify the graph in any way
–ordering: unordered or ordered

Examples:
–Jacobi: topology grid, operator local computation, ordering unordered
–DMR: topology graph, operator morph, ordering unordered
–Event-driven simulation: topology graph, operator local computation, ordering ordered