Pattern Programming Barry Wilkinson University of North Carolina Charlotte CCI Friday Seminar Series April 13 th, 2012.

Acknowledgment

This work was initiated by Jeremy Villalobos and described in his PhD thesis, “RUNNING PARALLEL APPLICATIONS ON A HETEROGENEOUS ENVIRONMENT WITH ACCESSIBLE DEVELOPMENT PRACTICES AND AUTOMATIC SCALABILITY,” UNC-Charlotte,

Problem Addressed

To make parallel programming more usable and scalable.

Parallel programming is writing programs that solve problems using multiple computers, processors, and cores, including physically distributed computers.

It has a very long history but is still a challenge. Traditional approaches involve explicitly specifying message passing with low-level tools such as MPI, and thread parallelism with OpenMP and OpenCL.

Parallel programming is still not as mainstream as it should be, given the introduction of multicore processors.

Pattern Programming Concept

The programmer constructs an application using established computational or algorithmic “patterns” that provide a structure. Patterns are widely applicable.

What patterns are we talking about?
– Low-level algorithmic patterns that might be embedded into a program, such as fork-join and broadcast/scatter/gather.
– Higher-level algorithmic patterns for forming a complete program, such as workpool, pipeline, stencil, and map-reduce.

We concentrate on the higher-level computational/algorithmic patterns rather than the lower-level ones.
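To make the distinction concrete, here is a minimal sketch of a low-level pattern, fork-join, embedded in an ordinary program. It uses only the standard java.util.concurrent classes and is not part of Seeds; the class name ForkJoinSum and the threshold value are assumptions chosen for illustration.

```java
import java.util.Arrays;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// Hypothetical example class (not part of Seeds): sums an array with fork-join.
final class ForkJoinSum extends RecursiveTask<Long> {
    private static final long serialVersionUID = 1L;
    private static final int THRESHOLD = 1_000; // assumed cut-off for sequential work
    private final int[] values;
    private final int from; // inclusive
    private final int to;   // exclusive

    ForkJoinSum(int[] values, int from, int to) {
        this.values = values;
        this.from = from;
        this.to = to;
    }

    @Override
    protected Long compute() {
        if (to - from <= THRESHOLD) {
            long sum = 0;
            for (int i = from; i < to; i++) {
                sum += values[i];
            }
            return sum;
        }
        int mid = from + (to - from) / 2;
        ForkJoinSum left = new ForkJoinSum(values, from, mid);
        ForkJoinSum right = new ForkJoinSum(values, mid, to);
        left.fork();                     // fork: run the left half asynchronously
        long rightSum = right.compute(); // compute the right half in this thread
        return left.join() + rightSum;   // join: wait for the left half
    }

    static long sum(int[] values) {
        return ForkJoinPool.commonPool().invoke(new ForkJoinSum(values, 0, values.length));
    }

    public static void main(String[] args) {
        int[] data = new int[10_000];
        Arrays.fill(data, 1);
        System.out.println(sum(data)); // prints 10000
    }
}
```

Even in this small case the programmer decides where to split, when to fork, and when to join. The higher-level patterns are meant to hide those decisions.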

Some Patterns

[Figure: Workpool. A master and workers. Legend: two-way connection, compute node, source/sink. Derived from Jeremy Villalobos’s PhD thesis defense.]

[Figure: Pipeline. A master and workers arranged as Stage 1, Stage 2, Stage 3. Legend: two-way connection, one-way connection, compute node, source/sink.]

[Figure: Divide and Conquer. Divide and merge phases. Legend: two-way connection, compute node, source/sink.]

[Figure: Stencil (synchronous). Legend: two-way connection, compute node, source/sink.]

[Figure: All-to-All. Legend: two-way connection, compute node, source/sink.]

Note on Terminology: “Skeletons”

The term “skeleton” is sometimes used to describe “patterns”, especially directed acyclic graphs with a source, a computation, and a sink. We do not make that distinction and use the term “pattern” whether the graph is directed or undirected, and whether it is acyclic or cyclic. The same usage appears elsewhere.

Skeletons/Patterns

Advantages
– Implicit parallelization
– Avoid deadlocks
– Avoid race conditions
– Reduction in source code size (lines of code)
– Abstracts the Grid/Cloud environment

Disadvantages
– Takes away some of the freedom from the user programmer
– New approach to learn
– Performance reduced (5% on top of MPI)

Derived from Jeremy Villalobos’s PhD thesis defense

More Advantages/Notes

“Design patterns” have been part of software engineering for many years.
– Reusable solutions to commonly occurring problems
– Patterns provide a guide to “best practices”, not a final implementation
– Provide a good, scalable design structure for parallel programs
– Make it easier to reason about programs

Hierarchical designs are possible, with patterns embedded into patterns and pattern operators to combine patterns.

Leads to an automated conversion into parallel programs without the need to write low-level message-passing routines such as MPI.

Previous/Existing Work

Patterns/skeletons explored in several projects.

Universities:
– University of Illinois at Urbana-Champaign and University of California, Berkeley
– University of Torino/Università di Pisa, Italy
– ...

Industrial efforts:
– Intel
– Microsoft
– …

Universal Parallel Computing Research Centers (UPCRC)

University of Illinois at Urbana-Champaign and University of California, Berkeley, with Microsoft and Intel in 2008 (with combined funding of at least $35 million).
– Co-developed OPL (Our Pattern Language), a pattern language for parallel programming
– Promoted patterns in the Workshop on Parallel Programming Patterns (ParaPLoP) 2009, 2010, and 2011, and the Pattern Languages of Programs conference PLoP’

UPCRC Patterns

Group of twelve computational patterns identified in seven general application areas:
– Finite State Machines
– Circuits
– Graph Algorithms
– Structured Grid
– Dense Matrix
– Sparse Matrix
– Spectral (FFT)
– Dynamic Programming
– Particle Methods
– Backtrack
– Graphical Models
– Unstructured Grid

University of Torino, Italy / Università di Pisa

Closest to our work

s.di.unipi.it/dokuwiki/doku.php?id=ffnamespace:about

Intel

Focused on very low-level patterns such as fork-join, and provides constructs for them in:
– Intel Threading Building Blocks (TBB): template library for C++ to support parallelism
– Intel Cilk Plus: compiler extensions for C/C++ to support parallelism
– Intel Array Building Blocks (ArBB): pure C++ library-based solution for vector parallelism

The above are somewhat competing tools obtained through takeovers of small companies. Each is implemented differently.

New book due out in 2012 from Intel authors

“Structured Parallel Programming: Patterns for Efficient Computation,” Michael McCool, James Reinders, Arch Robison, Morgan Kaufmann, 2012.

Focuses on Intel tools. B. Wilkinson was a reviewer for this book.

Using patterns with Microsoft C#

ails.aspx?displaylang=en&id=19222

Again very low-level, with patterns such as parallel for loops.

Our approach (Jeremy Villalobos’ UNC-C PhD thesis)

Focuses on a few patterns of wide applicability (workpool, pipeline, stencil, and dense matrix patterns), but Jeremy took it much further than UPCRC and Intel. He developed a higher-level framework called “Seeds”, which uses the pattern approach to automatically distribute code across geographical sites and execute the parallel code.

“Seeds” Parallel Grid Application Framework

Some Key Features
– Pattern-programming (Java) user interface
– Self-deploys on computers, clusters, and geographically distributed computers
– Load balances
– Three levels of user interface

Seeds Development Layers

Basic:
– Intended for programmers who have a basic parallel computing background
– Based on skeletons and patterns

Advanced: used to add or extend functionality, such as:
– Create new patterns
– Optimize existing patterns
– Adapt an existing pattern to non-functional requirements specific to the application

Expert: used to provide basic services:
– Deployment
– Security
– Communication/Connectivity
– Changes in the environment

Derived from Jeremy Villalobos’s PhD thesis defense

Deployment

Deployment with Globus
– Globus GSIFTP to transfer the “seeds” folder
– GRAM to submit the job to run seed nodes

Deployment with SSH
– Now preferred
– Globus will be deprecated in Seeds

Basic User Programmer Interface

To create and execute parallel programs, the programmer selects a pattern and implements three principal Java methods:
– Diffuse method: distributes pieces of data
– Compute method: performs the actual computation
– Gather method: gathers the results

The programmer also has to fill in details in a “bootstrap” class to deploy and start the framework.

[Figure: Diffuse, Compute, Gather; Bootstrap class]

The framework self-deploys on a geographically distributed platform and executes the pattern.
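The division of labour among the three methods can be sketched in plain Java. This is not the Seeds API: the interface SimpleWorkpool, the runner LocalWorkpoolRunner, and the example module SumOfSquares are hypothetical names, and a local thread pool stands in for the distributed worker nodes that a real framework would manage.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical stand-in for a workpool pattern interface; NOT the Seeds API.
interface SimpleWorkpool<I, O> {
    int getDataCount();              // number of work units
    I diffuse(int segment);          // master: produce the input for one unit
    O compute(I input);              // worker: process one unit
    void gather(int segment, O out); // master: collect one result
}

final class LocalWorkpoolRunner {
    // Runs the pattern on local threads; a real framework would use remote nodes.
    static <I, O> void run(SimpleWorkpool<I, O> module, int workers) {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            List<Future<O>> pending = new ArrayList<>();
            for (int s = 0; s < module.getDataCount(); s++) {
                final I input = module.diffuse(s);       // diffuse on the master
                Callable<O> task = () -> module.compute(input);
                pending.add(pool.submit(task));          // compute on a worker
            }
            for (int s = 0; s < pending.size(); s++) {
                module.gather(s, pending.get(s).get());  // gather on the master
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}

// Example module: sums the squares of 0..n-1, one number per work unit.
final class SumOfSquares implements SimpleWorkpool<Integer, Long> {
    private final int n;
    long total = 0;

    SumOfSquares(int n) { this.n = n; }

    public int getDataCount() { return n; }
    public Integer diffuse(int segment) { return segment; }
    public Long compute(Integer x) { return (long) x * x; }
    public void gather(int segment, Long out) { total += out; }

    public static void main(String[] args) {
        SumOfSquares module = new SumOfSquares(10);
        LocalWorkpoolRunner.run(module, 4);
        System.out.println(module.total); // prints 285
    }
}
```

The module contains no thread or message-passing code. Only the runner knows how work units reach the workers, which is the separation the pattern approach relies on.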

Example: Deploy a workpool pattern to compute π using the Monte Carlo method

Monte Carlo π calculation
– The basis of Monte Carlo calculations is the use of random selections
– In this case, a circle is formed within a square
– Points within the square are chosen randomly
– Fraction of points within the circle = π/4
– Only one quadrant is used in the code
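The π/4 fraction follows from a ratio of areas. As a short worked derivation for the one-quadrant case, assume a unit square with a quarter circle of radius 1 inside it; the symbols N (total number of random points) and N_inside (points with x² + y² ≤ 1) are introduced here for illustration.

```latex
\frac{\text{area of quarter circle}}{\text{area of unit square}}
  = \frac{\pi \cdot 1^{2} / 4}{1^{2}}
  = \frac{\pi}{4},
\qquad
\frac{N_{\text{inside}}}{N} \approx \frac{\pi}{4}
\;\Longrightarrow\;
\pi \approx 4 \, \frac{N_{\text{inside}}}{N}
```

The same ratio holds for a full circle of radius r inscribed in a square of side 2r, since πr²/(2r)² = π/4, which is why using one quadrant loses nothing.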

Complete code for computation

    package edu.uncc.grid.example.workpool;

    import java.util.Random;
    import java.util.logging.Level;
    import edu.uncc.grid.pgaf.datamodules.Data;
    import edu.uncc.grid.pgaf.datamodules.DataMap;
    import edu.uncc.grid.pgaf.interfaces.basic.Workpool;
    import edu.uncc.grid.pgaf.p2p.Node;

    public class MonteCarloPiModule extends Workpool {
        private static final long serialVersionUID = 1L;
        private static final int DoubleDataSize = 1000; // points generated per job unit
        double total;
        int random_samples;
        Random R;

        public MonteCarloPiModule() {
            R = new Random();
        }

        public void initializeModule(String[] args) {
            total = 0;
            Node.getLog().setLevel(Level.WARNING); // reduce verbosity for logging
            random_samples = 3000; // set number of random samples
        }

        public Data Compute(Data data) {
            // input gets the data produced by DiffuseData()
            DataMap input = (DataMap) data;
            // output will emit the partial answers done by this method
            DataMap output = new DataMap();
            Long seed = (Long) input.get("seed"); // get random seed
            Random r = new Random();
            r.setSeed(seed);
            Long inside = 0L;
            for (int i = 0; i < DoubleDataSize; i++) {
                double x = r.nextDouble();
                double y = r.nextDouble();
                double dist = x * x + y * y;
                if (dist <= 1.0) {
                    ++inside;
                }
            }
            output.put("inside", inside); // store partial answer to return to GatherData()
            return output;
        }

        public Data DiffuseData(int segment) {
            DataMap d = new DataMap();
            d.put("seed", R.nextLong());
            return d; // returns a random seed for each job unit
        }

        public void GatherData(int segment, Data dat) {
            DataMap out = (DataMap) dat;
            Long inside = (Long) out.get("inside");
            total += inside; // aggregate answer from all the worker nodes
        }

        public double getPi() {
            // returns value of pi based on the job done by all the workers
            double pi = (total / (random_samples * DoubleDataSize)) * 4;
            return pi;
        }

        public int getDataCount() {
            return random_samples;
        }
    }

Note: No message passing (MPI etc.)

Bootstrap class

    package edu.uncc.grid.example.workpool;

    import java.io.IOException;
    import net.jxta.pipe.PipeID;
    import edu.uncc.grid.pgaf.Anchor;
    import edu.uncc.grid.pgaf.Operand;
    import edu.uncc.grid.pgaf.Seeds;
    import edu.uncc.grid.pgaf.p2p.Types;

    public class RunMonteCarloPiModule {
        public static void main(String[] args) {
            try {
                MonteCarloPiModule pi = new MonteCarloPiModule();
                // deploy the framework from the local seeds folder
                Seeds.start("/path/to/seeds/seed/folder", false);
                // start the pattern, anchored on the named host as source and sink
                PipeID id = Seeds.startPattern(new Operand(
                        (String[]) null,
                        new Anchor("hostname", Types.DataFlowRoll.SINK_SOURCE),
                        pi));
                System.out.println(id.toString());
                Seeds.waitOnPattern(id);
                System.out.println("The result is: " + pi.getPi());
                Seeds.stop();
            } catch (SecurityException e) {
                e.printStackTrace();
            } catch (IOException e) {
                e.printStackTrace();
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
    }

This code deploys the framework and starts execution of the pattern. Different patterns have similar code.

Compiling/executing

Can be done on the command line (an ant script is provided) or through an IDE (Eclipse).

Another example: Bubble sort with the pipeline pattern

– Static test: program executed using a fixed number of cores as given.
– Dynamic test: number of cores dynamically changed during execution to improve performance.

Platform: Dell 900 server with four quad-core processors and 64GB shared memory.

From Jeremy Villalobos’ PhD thesis. See the thesis for more details and results.
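The slide does not spell out how the bubble sort is mapped onto pipeline stages. One plausible mapping, assumed here, is that each stage performs a single bubble pass on every array that flows through it, so that n-1 stages sort arrays of length n. The sketch below is plain Java with threads and blocking queues, not the Seeds pipeline API; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical plain-Java pipeline (not the Seeds API).
final class PipelineBubbleSort {
    private static final int[] END = new int[0]; // sentinel marking end of stream

    // One bubble pass over the first `limit` items: the largest ends at index limit-1.
    static void bubblePass(int[] a, int limit) {
        for (int i = 0; i + 1 < limit; i++) {
            if (a[i] > a[i + 1]) {
                int t = a[i];
                a[i] = a[i + 1];
                a[i + 1] = t;
            }
        }
    }

    // Sorts copies of all inputs; every input is assumed to have `length` items.
    static List<int[]> sortAll(List<int[]> inputs, int length) {
        int stages = Math.max(length - 1, 0);
        List<BlockingQueue<int[]>> queues = new ArrayList<>();
        for (int q = 0; q <= stages; q++) {
            queues.add(new LinkedBlockingQueue<int[]>());
        }
        List<Thread> threads = new ArrayList<>();
        for (int k = 0; k < stages; k++) {
            final BlockingQueue<int[]> in = queues.get(k);
            final BlockingQueue<int[]> out = queues.get(k + 1);
            final int limit = length - k; // items still unsorted on reaching stage k
            Thread t = new Thread(() -> {
                try {
                    while (true) {
                        int[] a = in.take();
                        if (a == END) {      // forward the sentinel and stop
                            out.put(END);
                            return;
                        }
                        bubblePass(a, limit);
                        out.put(a);          // one-way connection to the next stage
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            t.start();
            threads.add(t);
        }
        List<int[]> results = new ArrayList<>();
        try {
            for (int[] a : inputs) {
                queues.get(0).put(a.clone()); // source: feed copies into stage 0
            }
            queues.get(0).put(END);
            BlockingQueue<int[]> last = queues.get(stages);
            for (int[] a = last.take(); a != END; a = last.take()) {
                results.add(a);               // sink: collect in arrival order
            }
            for (Thread t : threads) {
                t.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return results;
    }

    public static void main(String[] args) {
        List<int[]> in = Arrays.asList(new int[] {5, 1, 4, 2}, new int[] {3, 3, 0, -1});
        for (int[] sorted : sortAll(in, 4)) {
            System.out.println(Arrays.toString(sorted));
        }
    }
}
```

The speed-up comes from streaming many arrays: while one array is in its second pass, the next is already in its first. A single array gains nothing from this arrangement.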

Pattern operators

Patterns can be combined. Example: adding the Stencil and All-to-All synchronous patterns.

Example use: heat distribution simulation (Laplace’s equation). Multiple cells on a stencil pattern work in a loop-parallel fashion, computing and synchronizing on each iteration. However, every x iterations they must implement an all-to-all communication pattern to run an algorithm that detects termination.

Directly from Jeremy Villalobos’s PhD thesis.
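The control structure of the heat distribution example can be sketched on a single machine. The code below is not Seeds code and does not use its pattern operators: a parallel stream over rows stands in for the stencil pattern, and a global maximum reduction stands in for the all-to-all exchange. The names HeatDistribution, checkEvery, tolerance, and maxIterations are assumptions for illustration.

```java
import java.util.stream.IntStream;

// Hypothetical single-machine sketch (not the Seeds API): Jacobi iteration for
// Laplace's equation with a termination test only every `checkEvery` iterations.
final class HeatDistribution {

    // Updates `grid` in place (boundary cells stay fixed) and returns the
    // number of iterations performed.
    static int solve(double[][] grid, int checkEvery, double tolerance, int maxIterations) {
        final int rows = grid.length;
        final int cols = grid[0].length;
        double[][] cur = new double[rows][];
        double[][] nxt = new double[rows][];
        for (int i = 0; i < rows; i++) {
            cur[i] = grid[i].clone();
            nxt[i] = grid[i].clone(); // both buffers carry the boundary values
        }
        int iteration = 0;
        while (iteration < maxIterations) {
            final double[][] a = cur;
            final double[][] b = nxt;
            // Stencil step: each interior cell becomes the average of its four neighbours.
            IntStream.range(1, rows - 1).parallel().forEach(i -> {
                for (int j = 1; j < cols - 1; j++) {
                    b[i][j] = 0.25 * (a[i - 1][j] + a[i + 1][j] + a[i][j - 1] + a[i][j + 1]);
                }
            });
            iteration++;
            boolean done = false;
            if (iteration % checkEvery == 0) {
                // Termination test: global reduction standing in for the all-to-all step.
                double maxChange = IntStream.range(1, rows - 1).parallel()
                        .mapToDouble(i -> {
                            double rowMax = 0.0;
                            for (int j = 1; j < cols - 1; j++) {
                                rowMax = Math.max(rowMax, Math.abs(b[i][j] - a[i][j]));
                            }
                            return rowMax;
                        })
                        .max()
                        .orElse(0.0);
                done = maxChange < tolerance;
            }
            cur = b; // swap buffers for the next iteration
            nxt = a;
            if (done) {
                break;
            }
        }
        for (int i = 0; i < rows; i++) {
            grid[i] = cur[i];
        }
        return iteration;
    }

    public static void main(String[] args) {
        double[][] grid = new double[5][5];
        for (int j = 0; j < 5; j++) {
            grid[0][j] = 100.0; // hot top edge, other edges at 0
        }
        int iterations = solve(grid, 10, 1e-9, 10_000);
        System.out.println(iterations + " iterations, centre = " + grid[2][2]);
    }
}
```

Checking for termination only every few iterations keeps the expensive global step off the common path, at the cost of running a few extra stencil iterations after convergence.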

[Screenshot: Tutorial page]

[Screenshot: Download page]

Open source: Apache License in progress

Work in progress

– Tutorial on using the all-to-all pattern to solve the n-body problem, and other documentation
– Documentation on the advanced layer to enable programmers to develop their own patterns
– Exploring various ways to enhance and expand the work, using for example Aparapi

ITCS 4145/5145 Parallel Programming

– Pattern programming to be introduced into ITCS 4145/5145 Parallel Programming in Fall
– To be taught on NCREN in the same way as ITCS 4/5146 Grid Computing, with instructors at UNC-Charlotte and UNC-Wilmington, but in Fall 2012 at just two sites. Subsequently it will be offered across NC.
– Regional/national workshops planned
– External funding for this work pending

Pattern Programming Research Group

2011
– Jeremy Villalobos (PhD awarded, continuing involvement)
– Saurav Bhattara (MS thesis, graduated)

Spring 2012
– Yawo Adibolo (ITCS 6880 Individual Study)
– Ayay Ramesh (ITCS 6880 Individual Study)

Fall 2012
– Haoqi Zhao (MS thesis)

Loosely related, Spring 2012
– Tim Lukacik and Phil Chung (UG senior projects on Eclipse PTP)

Openings!

Some publications

Jeremy F. Villalobos and Barry Wilkinson, “Skeleton/Pattern Programming with an Adder Operator for Grid and Cloud Platforms,” The 2010 International Conference on Grid Computing and Applications (GCA’10), July 12-15, 2010, Las Vegas, Nevada, USA.

Jeremy F. Villalobos and Barry Wilkinson, “Using Hierarchical Dependency Data Flows to Enable Dynamic Scalability on Parallel Patterns,” High-Performance Grid and Cloud Computing Workshop, 25th IEEE International Parallel & Distributed Processing Symposium, Anchorage (Alaska) USA, May 16-20,

Also presented by B. Wilkinson as Session 4 in “Short Course on Grid Computing,” Jornadas Chilenas de Computación, INFONOR-CHILE 2010, Nov. 18th - 19th, 2010, Antofagasta, Chile.

I regard this conference as a top conference in the field.

Questions