Pattern Programming PP-1.1 ITCS 4/5145 Parallel Programming, UNC-Charlotte, B. Wilkinson, August 29, 2013. PatternProg-1

Problem Addressed
To make parallel programming more usable and scalable. Parallel programming -- writing programs that use multiple computers and processors collectively to solve problems -- has a very long history but remains a challenge.

Traditional Approach
Explicitly specifying message passing (MPI), and explicitly using low-level thread APIs (Pthreads, Java threads, OpenMP, ...). A better-structured approach is needed.

Pattern Programming Concept
The programmer begins by constructing the program from established computational or algorithmic "patterns" that provide a structure.
"Design patterns" have been part of software engineering for many years:
- Reusable solutions to commonly occurring problems
- Provide a guide to "best practices", not a final implementation
- Provide a good, scalable design structure
- Make it easier to reason about programs
- Offer the potential for automatic conversion into executable code, avoiding low-level programming -- we do that here
- Particularly useful for the complexities of parallel/distributed computing

In Parallel/Distributed Computing
What patterns are we talking about?
- Low-level algorithmic patterns that might be embedded into a program, such as fork-join and broadcast/scatter/gather (a minimal fork-join sketch follows below).
- Higher-level algorithmic patterns for forming a complete program, such as workpool, pipeline, stencil, and map-reduce.
We concentrate on the higher-level "computational/algorithmic" patterns rather than the lower-level patterns.
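
To illustrate the low-level fork-join pattern mentioned above, here is a minimal sketch using Java's standard java.util.concurrent fork-join facilities. The class ForkJoinSum and the array-summing task are illustrative only -- they are not part of the Seeds framework or the original slides.

import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

public class ForkJoinSum extends RecursiveTask<Long> {
    private static final int THRESHOLD = 1000;  // below this, sum sequentially
    private final long[] data;
    private final int lo, hi;

    public ForkJoinSum(long[] data, int lo, int hi) {
        this.data = data; this.lo = lo; this.hi = hi;
    }

    @Override
    protected Long compute() {
        if (hi - lo <= THRESHOLD) {             // small enough: compute directly
            long sum = 0;
            for (int i = lo; i < hi; i++) sum += data[i];
            return sum;
        }
        int mid = (lo + hi) >>> 1;
        ForkJoinSum left = new ForkJoinSum(data, lo, mid);
        ForkJoinSum right = new ForkJoinSum(data, mid, hi);
        left.fork();                            // fork: run left half asynchronously
        long rightSum = right.compute();        // compute right half in this thread
        return left.join() + rightSum;          // join: wait for left half, combine
    }

    public static void main(String[] args) {
        long[] data = new long[1_000_000];
        java.util.Arrays.fill(data, 1L);
        long total = new ForkJoinPool().invoke(new ForkJoinSum(data, 0, data.length));
        System.out.println("Sum = " + total);   // prints 1000000
    }
}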

Some Patterns
Workpool: [diagram -- a master (source/sink) connected to worker compute nodes by two-way connections]
A minimal plain-Java sketch of the workpool idea follows below.
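
To show the shape of the workpool pattern outside any framework, here is a minimal plain-Java sketch (illustrative only, not Seeds code): a master thread diffuses independent work units to a fixed pool of workers and gathers the results.

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class WorkpoolSketch {
    public static void main(String[] args) throws Exception {
        ExecutorService workers = Executors.newFixedThreadPool(4);  // pool of 4 workers
        List<Future<Long>> results = new ArrayList<>();
        for (int unit = 0; unit < 16; unit++) {                     // master diffuses work units
            final long n = 1_000_000L + unit;
            results.add(workers.submit(() -> {
                long sum = 0;                                       // each worker computes independently
                for (long i = 0; i < n; i++) sum += i % 7;
                return sum;
            }));
        }
        long total = 0;
        for (Future<Long> f : results) total += f.get();            // master gathers results
        workers.shutdown();
        System.out.println("Total = " + total);
    }
}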

Pipeline: [diagram -- a master (source/sink) feeding worker compute nodes arranged as Stage 1, Stage 2, Stage 3, with one-way connections between stages and two-way connections to the master]

Divide and Conquer: [diagram -- a tree of compute nodes with two-way connections; a divide phase fans out from the source and a merge phase fans back in to the sink]

All-to-All: [diagram -- a master (source/sink) and compute nodes fully interconnected by two-way connections]
All compute nodes can communicate with all the other nodes. Usually a synchronous computation that performs a number of iterations to obtain a solution, e.g. the N-body problem.

Stencil: [diagram -- a grid of compute nodes with two-way connections between neighbors]
All compute nodes can communicate only with neighboring nodes. Usually a synchronous computation that performs a number of iterations to converge on a solution, e.g. solving Laplace's/the heat equation. On each iteration, each node communicates with its neighbors to get their stored computed values. A sequential sketch of this update follows below.
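
To make the per-iteration stencil update concrete, here is a minimal sequential sketch of one Jacobi step for Laplace's equation on a 2-D grid. The class JacobiSketch and all names are illustrative only; in the parallel stencil pattern each compute node would apply this update to its own block of the grid and exchange boundary values with its neighbors.

public class JacobiSketch {
    // One Jacobi iteration: each interior point becomes the average of its
    // four neighbors' values from the previous iteration.
    static void jacobiStep(double[][] oldGrid, double[][] newGrid) {
        int n = oldGrid.length;
        for (int i = 1; i < n - 1; i++) {
            for (int j = 1; j < n - 1; j++) {
                newGrid[i][j] = 0.25 * (oldGrid[i - 1][j] + oldGrid[i + 1][j]
                                      + oldGrid[i][j - 1] + oldGrid[i][j + 1]);
            }
        }
    }

    public static void main(String[] args) {
        int n = 8;
        double[][] a = new double[n][n];
        double[][] b = new double[n][n];
        for (int j = 0; j < n; j++) {           // fixed hot top boundary on both grids
            a[0][j] = 100.0;
            b[0][j] = 100.0;
        }
        for (int iter = 0; iter < 100; iter++) {  // iterate toward convergence
            jacobiStep(a, b);
            double[][] t = a; a = b; b = t;       // swap old and new grids
        }
        System.out.println("Center value: " + a[n / 2][n / 2]);
    }
}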

Parallel Patterns -- Advantages
- Abstracts/hides the underlying computing environment
- Generally avoids deadlocks and race conditions
- Reduces source code size (lines of code)
- Leads to automated conversion into parallel programs without the need to write low-level message-passing routines such as MPI
- Allows hierarchical designs, with patterns embedded into patterns and pattern operators to combine patterns
Disadvantages
- A new approach to learn
- Takes away some freedom from the programmer
- Performance may be reduced (cf. using high-level languages instead of assembly language)

Previous/Existing Work
Patterns have been explored in several projects.
Industrial efforts:
- Intel Threading Building Blocks (TBB), Intel Cilk Plus, Intel Array Building Blocks (ArBB). These focus on very low-level patterns such as fork-join.
Universities:
- University of Illinois at Urbana-Champaign and University of California, Berkeley
- University of Torino / Università di Pisa, Italy

Book by Intel authors: "Structured Parallel Programming: Patterns for Efficient Computation," Michael McCool, Arch Robison, and James Reinders, Morgan Kaufmann, 2012. Focuses on Intel tools.

Note on Terminology: "Skeletons"
The term "skeleton" is sometimes used elsewhere to describe patterns, especially directed acyclic graphs with a source, a computation, and a sink. We do not make that distinction, and use the term "pattern" whether the graph is directed or undirected, acyclic or cyclic.

Our Approach (Jeremy Villalobos' UNC-Charlotte PhD thesis)
Focuses on a few patterns of wide applicability (e.g. workpool, synchronous all-to-all, pipeline, stencil), but Jeremy took it much further than UPCRC and Intel: he developed a higher-level framework called "Seeds" that uses the pattern approach to automatically distribute code across processor cores, computers, or geographically distributed computers, and to execute the parallel code.

Pattern Programming with the Seeds Framework

"Seeds" Parallel Grid Application Framework
Some key features:
- Pattern programming
- Java user interface (C++ version in development)
- Self-deploys on computers, clusters, and geographically distributed computers

Seeds Development Layers
Basic: Intended for programmers who have a basic parallel computing background. Based on skeletons and patterns.
Advanced: Used to add or extend functionality, such as creating new patterns, optimizing existing patterns, or adapting an existing pattern to non-functional requirements specific to the application.
Expert: Used to provide basic services: deployment, security, communication/connectivity, and changes in the environment.
(Derived from Jeremy Villalobos's PhD thesis defense.)

Basic User Programmer Interface
The programmer selects a pattern and implements three principal Java methods in a "module" class:
- Diffuse method -- distributes pieces of data
- Compute method -- the actual computation
- Gather method -- gathers the results
The programmer also fills in details in a "run module" bootstrap class that creates an instance of the module class and starts the framework. The framework then self-deploys on a specified parallel/distributed computing platform and executes the pattern.
[diagram -- the "module" class (Diffuse, Compute, Gather) and the "run module" bootstrap class]

Example module class -- complete code (Monte Carlo pi, used in Assignment 1; see later for more details). Note: no explicit message passing. The computation itself is in the Compute() method.

package edu.uncc.grid.example.workpool;

import java.util.Random;
import java.util.logging.Level;
import edu.uncc.grid.pgaf.datamodules.Data;
import edu.uncc.grid.pgaf.datamodules.DataMap;
import edu.uncc.grid.pgaf.interfaces.basic.Workpool;
import edu.uncc.grid.pgaf.p2p.Node;

public class MonteCarloPiModule extends Workpool {
    private static final long serialVersionUID = 1L;
    private static final int DoubleDataSize = 1000;
    double total;
    int random_samples;
    Random R;

    public MonteCarloPiModule() {
        R = new Random();
    }

    public void initializeModule(String[] args) {
        total = 0;
        Node.getLog().setLevel(Level.WARNING);  // reduce verbosity for logging
        random_samples = 3000;                  // set number of random samples
    }

    public Data Compute(Data data) {
        // input gets the data produced by DiffuseData()
        DataMap<String, Long> input = (DataMap<String, Long>) data;
        DataMap<String, Long> output = new DataMap<String, Long>();
        Long seed = (Long) input.get("seed");   // get random seed
        Random r = new Random();
        r.setSeed(seed);
        Long inside = 0L;
        for (int i = 0; i < DoubleDataSize; i++) {
            double x = r.nextDouble();
            double y = r.nextDouble();
            double dist = x * x + y * y;
            if (dist <= 1.0) {
                ++inside;
            }
        }
        output.put("inside", inside);           // store partial answer to return to GatherData()
        return output;                          // output emits the partial answer computed by this method
    }

    public Data DiffuseData(int segment) {
        DataMap<String, Long> d = new DataMap<String, Long>();
        d.put("seed", R.nextLong());
        return d;                               // returns a random seed for each job unit
    }

    public void GatherData(int segment, Data dat) {
        DataMap<String, Long> out = (DataMap<String, Long>) dat;
        Long inside = (Long) out.get("inside");
        total += inside;                        // aggregate answers from all the worker nodes
    }

    public double getPi() {
        // returns the value of pi based on the work done by all the workers
        double pi = (total / (random_samples * DoubleDataSize)) * 4;
        return pi;
    }

    public int getDataCount() {
        return random_samples;
    }
}
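
Why this estimates pi: each work unit draws DoubleDataSize points (x, y) uniformly at random in the unit square, and the test x*x + y*y <= 1.0 checks whether a point falls inside the quarter circle of radius 1, whose area is pi/4. The fraction of points inside therefore approaches pi/4, and with random_samples work units of DoubleDataSize points each, getPi() returns 4 * total / (random_samples * DoubleDataSize).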

Seeds Implementations
Three Java versions available (2013):
- Full JXTA P2P version, requiring an Internet connection
- JXTA P2P version not needing an external network, suitable for a single computer
- Multicore (thread-based) version for operation on a single computer
The multicore version executes much faster on a single computer. The only difference to the programmer is a minor change in the bootstrap class.

Bootstrap class -- JXTA P2P version
This code deploys the framework and starts execution of the pattern. Different patterns have similar code.

package edu.uncc.grid.example.workpool;

import java.io.IOException;
import net.jxta.pipe.PipeID;
import edu.uncc.grid.pgaf.Anchor;
import edu.uncc.grid.pgaf.Operand;
import edu.uncc.grid.pgaf.Seeds;
import edu.uncc.grid.pgaf.p2p.Types;

public class RunMonteCarloPiModule {
    public static void main(String[] args) {
        try {
            MonteCarloPiModule pi = new MonteCarloPiModule();
            Seeds.start("/path/to/seeds/seed/folder", false);
            PipeID id = Seeds.startPattern(new Operand(
                    (String[]) null,
                    new Anchor("hostname", Types.DataFlowRoll.SINK_SOURCE),
                    pi));
            System.out.println(id.toString());
            Seeds.waitOnPattern(id);
            Seeds.stop();
            System.out.println("The result is: " + pi.getPi());
        } catch (SecurityException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}

Bootstrap class -- Multicore version
Much faster on a multicore platform. Thread-based: the bootstrap class does not need to start and stop JXTA P2P, so Seeds.start() and Seeds.stop() are not needed. Otherwise the user code is similar.

public class RunMonteCarloPiModule {
    public static void main(String[] args) {
        try {
            MonteCarloPiModule pi = new MonteCarloPiModule();
            Thread id = Seeds.startPatternMulticore(new Operand(
                    (String[]) null,
                    new Anchor(args[0], Types.DataFlowRole.SINK_SOURCE),
                    pi), 4);
            id.join();
            System.out.println("The result is: " + pi.getPi());
        } catch (SecurityException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}

Measuring Time
Code can be instrumented in the bootstrap class:

public class RunMyModule {
    public static void main(String[] args) {
        try {
            long start = System.currentTimeMillis();
            MyModule m = new MyModule();
            Seeds.start( ... );
            PipeID id = Seeds.startPattern( ... );
            Seeds.waitOnPattern(id);
            Seeds.stop();
            long stop = System.currentTimeMillis();
            double time = (double) (stop - start) / 1000.0;  // milliseconds to seconds
            System.out.println("Execution time = " + time);
        } catch (SecurityException e) {
            ...

Compiling/Executing
Can be done on the command line (an ant script is provided) or through an IDE (Eclipse).

Tutorial page: [screenshot of the Seeds tutorial web page]

Acknowledgements
The extension of this work to the teaching environment is supported by the National Science Foundation under the grant "Collaborative Research: Teaching Multicore and Many-Core Programming at a Higher Level of Abstraction" # / ( ). Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation.
This work was initiated by Jeremy Villalobos in his PhD thesis, "Running Parallel Applications on a Heterogeneous Environment with Accessible Development Practices and Automatic Scalability," UNC-Charlotte. Jeremy developed the "Seeds" pattern programming software.

UNC-Charlotte Pattern Programming Research Group, Fall 2013
- Jeremy Villalobos (PhD awarded, continuing involvement)
- PhD student: Yasaman Kamyab Hessary (course TA)
- CS MS students: Haoqi Zhao (MS thesis); Yawo Adibolo (developed a C++ version of the framework software, for interest)
- CS BS students: Matthew Edge (senior project); Kevin Silliman (senior project evaluating Yawo's C++ framework)
Please contact B. Wilkinson if you would like to be involved in this work for academic credit.

Questions

Next step: Assignment 1 -- using the Seeds framework