Pattern Parallel Programming


Pattern Parallel Programming B. Wilkinson/Clayton Ferner PatternProgIntro.ppt Modification date: August 14, 2014

Problem Addressed To make parallel programming more usable and scalable. Parallel programming -- writing programs that use multiple computers and processors collectively to solve problems -- has a very long history but is still a challenge.

Traditional approach Explicitly specifying message passing (MPI), and low-level threads APIs (Pthreads, Java threads, OpenMP, …). Both require programmers to use low-level routines. A better structured approach is needed.

Pattern Programming Concept The programmer begins by constructing the program using established computational or algorithmic "patterns" that provide a structure. "Design patterns" have been part of software engineering for many years: reusable solutions to commonly occurring problems.* Patterns provide a guide to "best practices", not a final implementation. They provide a good, scalable design structure, make it easier to reason about programs, and offer the potential for automatic conversion into executable code, avoiding low-level programming -- which is what we do here. They are particularly useful for the complexities of parallel/distributed computing. * http://en.wikipedia.org/wiki/Design_pattern_(computer_science)

In parallel/distributed computing, what patterns are we talking about? Low-level algorithmic patterns that might be embedded into a program, such as fork-join and broadcast/scatter/gather, and higher-level algorithmic patterns for forming a complete program, such as workpool, pipeline, stencil, and map-reduce. We tend to concentrate on the higher-level "computational/algorithmic" patterns rather than the lower-level patterns.

MPI point-to-point Data Transfer (Send-Receive) A low-level MPI message-passing pattern: data is sent from a source process to a destination process.
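As a minimal sketch (added here for illustration, not part of the original slides), the point-to-point pattern maps directly onto MPI_Send and MPI_Recv. Here process 0 (the source) sends one integer to process 1 (the destination); run with at least two MPI processes:

#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[]) {
   int rank, x = 0;
   MPI_Init(&argc, &argv);
   MPI_Comm_rank(MPI_COMM_WORLD, &rank);
   if (rank == 0) {                      /* source process */
      x = 123;
      MPI_Send(&x, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
   } else if (rank == 1) {               /* destination process */
      MPI_Recv(&x, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
      printf("Process 1 received %d\n", x);
   }
   MPI_Finalize();
   return 0;
}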

Collective patterns Broadcast Pattern Sends the same data to each of a group of processes. A common pattern for getting the same data to all processes, especially at the beginning of a computation. Note: the patterns as given do not mean the implementation performs them exactly as shown; only the final result is the same in any parallel implementation. Patterns do not describe the implementation.
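A minimal sketch of the broadcast pattern using MPI_Bcast (an illustration added here, not from the original slides): the root, process 0, holds the value, and every process has a copy after the call.

#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[]) {
   int rank, n = 0;
   MPI_Init(&argc, &argv);
   MPI_Comm_rank(MPI_COMM_WORLD, &rank);
   if (rank == 0) n = 100;                        /* root holds the data to broadcast */
   MPI_Bcast(&n, 1, MPI_INT, 0, MPI_COMM_WORLD);  /* same value delivered to every process */
   printf("Process %d now has n = %d\n", rank, n);
   MPI_Finalize();
   return 0;
}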

Scatter Pattern Distributes a collection of data items to a group of processes. A common pattern for getting data to all processes; usually the data sent are parts of an array, with different data sent to each destination.
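A sketch of the scatter pattern with MPI_Scatter, added for illustration; the number of elements per process (N) is an arbitrary value chosen for the example.

#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>

#define N 4   /* elements per process (arbitrary illustrative size) */

int main(int argc, char *argv[]) {
   int rank, size;
   MPI_Init(&argc, &argv);
   MPI_Comm_rank(MPI_COMM_WORLD, &rank);
   MPI_Comm_size(MPI_COMM_WORLD, &size);
   int *a = NULL, part[N];
   if (rank == 0) {                               /* root builds the whole array */
      a = malloc(size * N * sizeof(int));
      for (int i = 0; i < size * N; i++) a[i] = i;
   }
   /* each process receives a different N-element slice of the root's array */
   MPI_Scatter(a, N, MPI_INT, part, N, MPI_INT, 0, MPI_COMM_WORLD);
   printf("Process %d received elements starting with %d\n", rank, part[0]);
   if (rank == 0) free(a);
   MPI_Finalize();
   return 0;
}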

Gather Pattern Essentially the reverse of the scatter pattern: data items are received from a group of processes and collected at the destination in an array. A common pattern, especially at the end of a computation, to collect results.
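A sketch of the gather pattern with MPI_Gather, added for illustration; each process contributes one locally computed value (a stand-in here) and the root collects them into an array.

#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>

int main(int argc, char *argv[]) {
   int rank, size, myresult;
   MPI_Init(&argc, &argv);
   MPI_Comm_rank(MPI_COMM_WORLD, &rank);
   MPI_Comm_size(MPI_COMM_WORLD, &size);
   myresult = rank * rank;                        /* stand-in for a locally computed result */
   int *results = NULL;
   if (rank == 0) results = malloc(size * sizeof(int));
   /* one item from every process is collected into an array at the root */
   MPI_Gather(&myresult, 1, MPI_INT, results, 1, MPI_INT, 0, MPI_COMM_WORLD);
   if (rank == 0) {
      for (int i = 0; i < size; i++) printf("Result from %d: %d\n", i, results[i]);
      free(results);
   }
   MPI_Finalize();
   return 0;
}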

Reduce Pattern A common pattern for getting data back to the master from all processes and aggregating it by combining the collected data into one answer. The reduction operation must be a binary operation that is commutative (changing the order of the operands does not change the result), which allows the implementation to perform the operations in any order.
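A sketch of the reduce pattern with MPI_Reduce (added for illustration), combining one value per process with the commutative MPI_SUM operation at root 0:

#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[]) {
   int rank, local, sum;
   MPI_Init(&argc, &argv);
   MPI_Comm_rank(MPI_COMM_WORLD, &rank);
   local = rank + 1;                              /* stand-in for a locally computed value */
   /* values are combined with the commutative MPI_SUM operation; root 0 gets the answer */
   MPI_Reduce(&local, &sum, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);
   if (rank == 0) printf("Sum over all processes = %d\n", sum);
   MPI_Finalize();
   return 0;
}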

Collective all-to-all broadcast A common all-to-all pattern, often used within a computation, in which the sources and destinations are the same processes: every process sends data to every other process (one-way). Versions of this can be found in MPI.
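One MPI routine that realizes this pattern is MPI_Allgather (MPI_Alltoall is another variant). The sketch below, added for illustration, leaves every process holding every other process's contribution:

#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>

int main(int argc, char *argv[]) {
   int rank, size;
   MPI_Init(&argc, &argv);
   MPI_Comm_rank(MPI_COMM_WORLD, &rank);
   MPI_Comm_size(MPI_COMM_WORLD, &size);
   int mine = 10 * rank;                          /* stand-in for this process's data */
   int *all = malloc(size * sizeof(int));
   /* after the call, every process holds the contribution of every process */
   MPI_Allgather(&mine, 1, MPI_INT, all, 1, MPI_INT, MPI_COMM_WORLD);
   printf("Process %d sees:", rank);
   for (int i = 0; i < size; i++) printf(" %d", all[i]);
   printf("\n");
   free(all);
   MPI_Finalize();
   return 0;
}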

Some Higher Level Message-Passing Patterns Master/slave The computation is divided into parts, which are passed out to slaves to perform; the slaves return their results to the master over two-way connections. This is the basis of most parallel computing.

Workpool A very widely applicable pattern. The master keeps a task queue; each slave/worker takes a task from the queue, returns its result, and is given another task if the queue is not empty, which gives the pattern a load-balancing quality. The master aggregates the answers. Note the need to differentiate this from the master-slave pattern, which does not imply a task queue.
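The slides do not give code for the workpool, but a minimal MPI sketch of the idea might look as follows: the master primes each slave with a task, hands out another task from the queue each time a result comes back, and finally tells the slaves to stop. NTASKS, the tags, and the squaring "computation" are all made up for the example.

#include <stdio.h>
#include <mpi.h>

#define NTASKS 20      /* hypothetical number of tasks in the task queue */
#define WORKTAG 1
#define STOPTAG 2

int main(int argc, char *argv[]) {
   int rank, size;
   MPI_Init(&argc, &argv);
   MPI_Comm_rank(MPI_COMM_WORLD, &rank);
   MPI_Comm_size(MPI_COMM_WORLD, &size);
   if (rank == 0) {                                  /* master */
      int task = 0, outstanding = 0, result, total = 0;
      MPI_Status status;
      /* prime each slave with one task from the task queue */
      for (int w = 1; w < size && task < NTASKS; w++) {
         MPI_Send(&task, 1, MPI_INT, w, WORKTAG, MPI_COMM_WORLD);
         task++; outstanding++;
      }
      /* collect results; give the returning slave another task if any remain */
      while (outstanding > 0) {
         MPI_Recv(&result, 1, MPI_INT, MPI_ANY_SOURCE, WORKTAG,
                  MPI_COMM_WORLD, &status);
         total += result;                            /* aggregate the answers */
         outstanding--;
         if (task < NTASKS) {
            MPI_Send(&task, 1, MPI_INT, status.MPI_SOURCE, WORKTAG, MPI_COMM_WORLD);
            task++; outstanding++;
         }
      }
      int dummy = 0;                                 /* tell all slaves to stop */
      for (int w = 1; w < size; w++)
         MPI_Send(&dummy, 1, MPI_INT, w, STOPTAG, MPI_COMM_WORLD);
      printf("Aggregated answer: %d\n", total);
   } else {                                          /* slave/worker */
      int t, r;
      MPI_Status status;
      while (1) {
         MPI_Recv(&t, 1, MPI_INT, 0, MPI_ANY_TAG, MPI_COMM_WORLD, &status);
         if (status.MPI_TAG == STOPTAG) break;
         r = t * t;                                  /* stand-in for the real task computation */
         MPI_Send(&r, 1, MPI_INT, 0, WORKTAG, MPI_COMM_WORLD);
      }
   }
   MPI_Finalize();
   return 0;
}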

More Specialized High-level Patterns Pipeline Compute nodes (workers) are arranged in stages (stage 1, stage 2, stage 3, …); data flows one way from stage to stage, with the master acting as source/sink.
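A minimal pipeline sketch in MPI, added for illustration: each process is one stage, receiving from the previous rank, doing a stand-in step of work, and passing the item to the next rank, with the last stage acting as the sink.

#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[]) {
   int rank, size, x;
   MPI_Init(&argc, &argv);
   MPI_Comm_rank(MPI_COMM_WORLD, &rank);
   MPI_Comm_size(MPI_COMM_WORLD, &size);
   for (int item = 0; item < 5; item++) {            /* stream of 5 items through the pipe */
      if (rank == 0)
         x = item;                                   /* first stage generates the item */
      else
         MPI_Recv(&x, 1, MPI_INT, rank - 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
      x = x + 1;                                     /* stand-in for this stage's work */
      if (rank < size - 1)
         MPI_Send(&x, 1, MPI_INT, rank + 1, 0, MPI_COMM_WORLD);
      else
         printf("Item %d leaves the pipeline as %d\n", item, x);
   }
   MPI_Finalize();
   return 0;
}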

Divide and Conquer The problem is divided recursively among compute nodes and the partial results are merged back together, with two-way connections between the nodes and the master acting as source/sink.

All-to-All All compute nodes can communicate with all the other nodes over two-way connections, with the master acting as source/sink.

Iterative synchronous patterns A pattern is repeated until some termination condition occurs, with synchronization at each iteration to establish the termination condition, which is often a global condition. Note that this is really two patterns merged together sequentially, if we call iteration a pattern; we actually have a pattern operator that can do this (apply the pattern, check the termination condition, then either repeat or stop).

Iterative synchronous all-to-all pattern Example: the N-body problem needs an "iterative synchronous all-to-all" pattern, in which on each iteration all processes exchange data with each other, then check the termination condition to decide whether to repeat or stop.
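As an illustrative sketch (not the actual N-body code), the structure of an iterative synchronous all-to-all pattern in MPI is a loop containing an MPI_Allgather exchange followed by a global termination check with MPI_Allreduce; the "computation" here is a toy averaging step chosen only to show the shape of the loop.

#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <mpi.h>

int main(int argc, char *argv[]) {
   int rank, size;
   MPI_Init(&argc, &argv);
   MPI_Comm_rank(MPI_COMM_WORLD, &rank);
   MPI_Comm_size(MPI_COMM_WORLD, &size);
   double mine = (double) rank, change, maxchange;
   double *all = malloc(size * sizeof(double));
   do {
      /* all-to-all exchange: every process gets every process's current value */
      MPI_Allgather(&mine, 1, MPI_DOUBLE, all, 1, MPI_DOUBLE, MPI_COMM_WORLD);
      double avg = 0.0;                             /* toy computation: move toward the average */
      for (int i = 0; i < size; i++) avg += all[i];
      avg /= size;
      change = fabs(avg - mine);
      mine = avg;
      /* synchronous global termination check */
      MPI_Allreduce(&change, &maxchange, 1, MPI_DOUBLE, MPI_MAX, MPI_COMM_WORLD);
   } while (maxchange > 1e-6);
   if (rank == 0) printf("Converged to %f\n", mine);
   free(all);
   MPI_Finalize();
   return 0;
}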

Stencil All compute nodes can communicate only with neighboring nodes. Usually a synchronous computation that performs a number of iterations to converge on a solution, e.g. solving Laplace's/heat equation. On each iteration, each node communicates with its neighbors to get their stored computed values.
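A sketch of a one-dimensional stencil (Jacobi relaxation of the heat equation) in MPI, added for illustration; the strip size, iteration count, and boundary values are arbitrary, and ghost cells are exchanged with neighbors on each iteration using MPI_Sendrecv.

#include <stdio.h>
#include <mpi.h>

#define LOCAL_N 10        /* interior points per process (arbitrary) */
#define ITERATIONS 100    /* fixed iteration count for the sketch */

int main(int argc, char *argv[]) {
   int rank, size;
   MPI_Init(&argc, &argv);
   MPI_Comm_rank(MPI_COMM_WORLD, &rank);
   MPI_Comm_size(MPI_COMM_WORLD, &size);
   int left  = (rank == 0)        ? MPI_PROC_NULL : rank - 1;
   int right = (rank == size - 1) ? MPI_PROC_NULL : rank + 1;
   double u[LOCAL_N + 2], unew[LOCAL_N + 2];       /* strip with ghost cells at each end */
   for (int i = 0; i < LOCAL_N + 2; i++) u[i] = 0.0;
   if (rank == 0) u[0] = 100.0;                    /* fixed boundary value at one end */
   for (int iter = 0; iter < ITERATIONS; iter++) {
      /* exchange ghost cells with the neighboring processes */
      MPI_Sendrecv(&u[LOCAL_N], 1, MPI_DOUBLE, right, 0,
                   &u[0],       1, MPI_DOUBLE, left,  0,
                   MPI_COMM_WORLD, MPI_STATUS_IGNORE);
      MPI_Sendrecv(&u[1],           1, MPI_DOUBLE, left,  1,
                   &u[LOCAL_N + 1], 1, MPI_DOUBLE, right, 1,
                   MPI_COMM_WORLD, MPI_STATUS_IGNORE);
      for (int i = 1; i <= LOCAL_N; i++)           /* Jacobi relaxation of interior points */
         unew[i] = 0.5 * (u[i - 1] + u[i + 1]);
      for (int i = 1; i <= LOCAL_N; i++) u[i] = unew[i];
   }
   printf("Process %d: first interior point = %f\n", rank, u[1]);
   MPI_Finalize();
   return 0;
}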

Parallel Patterns -- Advantages Abstracts/hides the underlying computing environment. Generally avoids deadlocks and race conditions. Reduces source code size (lines of code). Leads to automated conversion into parallel programs without the need to write low-level message-passing routines such as MPI. Allows hierarchical designs with patterns embedded into patterns, and pattern operators to combine patterns. Disadvantages A new approach to learn. Takes away some of the freedom from the programmer. Performance is reduced (cf. using high-level languages instead of assembly language).

Previous/Existing Work Patterns have been explored in several projects. Industrial efforts: Intel Threading Building Blocks (TBB), Cilk Plus, and Array Building Blocks (ArBB), with a focus on very low-level patterns such as fork-join. Universities: the University of Illinois at Urbana-Champaign and the University of California, Berkeley; the University of Torino/Università di Pisa, Italy. See "Structured Parallel Programming: Patterns for Efficient Computation," Michael McCool, James Reinders, Arch Robison, Morgan Kaufmann, 2012, which covers the Intel tools TBB, Cilk, and ArBB.

Note on Terminology: "Skeletons" Sometimes the term "skeleton" is used to describe "patterns", especially directed acyclic graphs with a source, a computation, and a sink. That distinction is made elsewhere, but we do not make it here; we use the term "pattern" whether directed or undirected and whether acyclic or cyclic.

We focus on higher-level tools to avoid using low-level routines. We have developed several tools at different levels of abstraction that avoid low-level MPI and enable students to create working patterns very quickly. Seeds framework – high-level Java-based software that self-deploys and executes on any platform, on local or distributed computers; several patterns are implemented, including workpool, pipeline, synchronous iterative all-to-all, and stencil. Paraguin compiler – a C-based compiler-directive approach that creates MPI code; implemented patterns include scatter-gather for a master-slave pattern, stencil, and others. Suzaku framework – provides pre-written pattern-based routines and macros that hide the MPI code; at an early stage of development.

Acknowledgements The Seeds framework was developed by Jeremy Villalobos in his PhD thesis "Running Parallel Applications on a Heterogeneous Environment with Accessible Development Practices and Automatic Scalability," UNC-Charlotte, 2011. The extension of this work to a teaching environment was supported by the National Science Foundation under grant "Collaborative Research: Teaching Multicore and Many-Core Programming at a Higher Level of Abstraction" #1141005/1141006 (2012-2015). Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation.

Questions