
1 Pattern Programming Barry Wilkinson University of North Carolina Charlotte Computer Science Colloquium University of North Carolina at Greensboro September 11, 2012

2 Acknowledgment This work was initiated by Jeremy Villalobos and described in his PhD thesis: “Running Parallel Applications on a Heterogeneous Environment with Accessible Development Practices and Automatic Scalability,” UNC-Charlotte, 2011. Jeremy developed the so-called “Seeds” pattern programming software described here.

3 Problem Addressed To make parallel programming more usable and scalable. Parallel programming -- writing programs that use multiple computers and processors collectively to solve problems -- has a very long history but remains a challenge. Traditional approach: explicitly specifying message passing (MPI) and low-level threads APIs (Pthreads, Java threads, OpenMP, …). A better-structured approach is needed.

4 Pattern Programming Concept The programmer begins by constructing the program using established computational or algorithmic “patterns” that provide a structure. What patterns are we talking about? Low-level algorithmic patterns that might be embedded into a program, such as fork-join and broadcast/scatter/gather. Higher-level algorithmic patterns for forming a complete program, such as workpool, pipeline, stencil, and map-reduce. We concentrate on the higher-level “computational/algorithmic” patterns rather than the lower-level ones.

5 Some Patterns: Workpool. [Diagram: a master (source/sink) with two-way connections to a set of worker compute nodes. Derived from Jeremy Villalobos’s PhD thesis defense.]

6 Pipeline. [Diagram: workers arranged in Stage 1, Stage 2, and Stage 3, linked by one-way connections, with a master source/sink and two-way connections to it.]

7 Divide and Conquer. [Diagram: a source/sink node that divides work down a tree of compute nodes and merges the results back up, over two-way connections.]

8 All-to-All. [Diagram: compute nodes fully interconnected by two-way connections, with a source/sink.]

9 Stencil. [Diagram: a grid of compute nodes, each connected to its neighbors by two-way connections, with a source/sink.] Usually a synchronous computation: it performs a number of iterations to converge on the solution, e.g. for solving Laplace’s/heat equation. On each iteration, each node communicates with its neighbors to get their stored computed values.
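The stencil update described above can be sketched sequentially in plain Java (an illustrative sketch, not Seeds code; all names are invented for the example). Each iteration replaces every interior grid point with the average of its four neighbours, the Jacobi-style update used when solving Laplace's equation:

```java
// Sequential sketch of the stencil computation (illustrative, not Seeds code).
// Each iteration replaces every interior point with the average of its
// four neighbours -- the update used for Laplace's equation.
public class StencilSketch {
    public static double[][] relax(double[][] grid, int iterations) {
        int n = grid.length;
        double[][] cur = grid;
        for (int it = 0; it < iterations; it++) {
            double[][] next = new double[n][n];
            for (int i = 0; i < n; i++) {        // boundary values stay fixed
                next[i][0] = cur[i][0];
                next[i][n - 1] = cur[i][n - 1];
                next[0][i] = cur[0][i];
                next[n - 1][i] = cur[n - 1][i];
            }
            for (int i = 1; i < n - 1; i++)
                for (int j = 1; j < n - 1; j++)
                    next[i][j] = 0.25 * (cur[i - 1][j] + cur[i + 1][j]
                                       + cur[i][j - 1] + cur[i][j + 1]);
            cur = next;
        }
        return cur;
    }

    public static void main(String[] args) {
        double[][] g = new double[8][8];
        for (int j = 0; j < 8; j++) g[0][j] = 100.0;  // hot top edge
        double[][] r = relax(g, 500);
        System.out.println("centre value: " + r[4][4]);
    }
}
```

With a hot top edge and enough iterations, the interior settles toward the converged temperature distribution; the parallel stencil pattern distributes the grid points across compute nodes but performs the same per-point update.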

10 Note on Terminology: “Skeletons” The term “skeleton” is sometimes used elsewhere to describe “patterns”, especially directed acyclic graphs with a source, a computation, and a sink. We do not make that distinction, and use the term “pattern” whether the graph is directed or undirected, and whether it is acyclic or cyclic.

11 Design Patterns “Design patterns” have been part of software engineering for many years: –Reusable solutions to commonly occurring problems * –Patterns provide a guide to “best practices”, not a final implementation –They provide a good, scalable design structure for parallel programs –One can reason more easily about programs Pattern programming takes this concept further and applies it to parallel programming. * http://en.wikipedia.org/wiki/Design_pattern_(computer_science)

12 Using Parallel Patterns Advantages: Abstracts/hides the underlying computing environment; Generally avoids deadlocks and race conditions; Reduces source code size (lines of code); Allows hierarchical designs with patterns embedded into patterns, and pattern operators to combine patterns; Leads to an automated conversion into parallel programs without the need to write low-level message-passing routines such as MPI. Disadvantages: A new approach to learn; Takes away some freedom from the programmer; Performance is reduced slightly (cf. using high-level languages instead of assembly language).

13 Previous/Existing Work Patterns/skeletons have been explored in several projects. Universities: –University of Illinois at Urbana-Champaign and University of California, Berkeley –University of Torino/Università di Pisa, Italy Industrial efforts: –Intel –Microsoft

14 Universal Parallel Computing Research Centers (UPCRC) University of Illinois at Urbana-Champaign and University of California, Berkeley, with Microsoft and Intel, in 2008 (with combined funding of at least $35 million). Co-developed OPL (Our Pattern Language). A group of twelve computational patterns was identified, including: Finite State Machines, Circuits, Graph Algorithms, Structured Grid, Dense Matrix, Sparse Matrix, in seven general application areas.

15 Intel Focuses on very low-level patterns such as fork-join, and provides constructs for them in: Intel Threading Building Blocks (TBB) –Template library for C++ to support parallelism Intel Cilk Plus –Compiler extensions for C/C++ to support parallelism Intel Array Building Blocks (ArBB) –Pure C++ library-based solution for vector parallelism These are somewhat competing tools, obtained through takeovers of small companies, and each is implemented differently.

16 New book from Intel authors: “Structured Parallel Programming: Patterns for Efficient Computation,” Michael McCool, James Reinders, Arch Robison, Morgan Kaufmann, 2012. Focuses entirely on Intel tools.

17 Using patterns with Microsoft C# http://www.microsoft.com/download/en/details.aspx?displaylang=en&id=19222 Again very low-level, with patterns such as parallel for loops.

18 Closest to our work: University of Torino/Università di Pisa, Italy http://calvados.di.unipi.it/dokuwiki/doku.php?id=ffnamespace:about

19 Our approach (Jeremy Villalobos’ UNC-C PhD thesis) Focuses on a few patterns of wide applicability: Workpool, Synchronous all-to-all, Pipeline, Stencil, and a few others. But Jeremy took it much further: he developed a higher-level framework called “Seeds” that automatically distributes code across processor cores, computers, or geographically distributed computers, and executes the parallel code according to the pattern.

20 “Seeds” Parallel Grid Application Framework http://coit-grid01.uncc.edu/seeds/ Some Key Features: Pattern-programming (Java) user interface; Self-deploys on computers, clusters, and geographically distributed computers; Load balances; Three levels of user interface.

21 Seeds Development Layers Basic: intended for programmers who have a basic parallel computing background; based on skeletons and patterns. Advanced: used to add or extend functionality, such as: create new patterns; optimize existing patterns; or adapt an existing pattern to non-functional requirements specific to the application. Expert: used to provide basic services: deployment, security, communication/connectivity, changes in the environment. Derived from Jeremy Villalobos’s PhD thesis defense

22 Deployment Several different ways were implemented during the PhD work, including using Globus grid computing software. Deployment with SSH is now preferred.

23 Basic User Programmer Interface The programmer selects a pattern and implements three principal Java methods: the Diffuse method, to distribute pieces of data; the Compute method, the actual computation; and the Gather method, to gather the results. The programmer also has to fill in details in a “bootstrap” class to deploy and start the framework. [Diagram: Diffuse, Compute, Gather, and the bootstrap class.] The framework self-deploys on a geographically distributed or local platform and executes the pattern.

24 Example: Deploy a workpool pattern to compute π using the Monte Carlo method The basis of Monte Carlo calculations is the use of random selections. In this case, a circle is formed within a square, and points within the square are chosen randomly. The fraction of points within the circle = π/4. Only one quadrant is used in the code.
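The idea on this slide can be shown with a minimal sequential sketch (illustrative, not the Seeds workpool code; names are invented): random points in the unit square, counting the fraction that falls inside the unit quarter-circle, which approaches π/4:

```java
import java.util.Random;

// Sequential sketch of the Monte Carlo idea from the slide (not Seeds code):
// random points in the unit square; the fraction landing inside the unit
// quarter-circle approaches pi/4, so pi ~= 4 * inside / samples.
public class MonteCarloSketch {
    public static double estimatePi(long samples, long seed) {
        Random r = new Random(seed);
        long inside = 0;
        for (long i = 0; i < samples; i++) {
            double x = r.nextDouble();
            double y = r.nextDouble();
            if (x * x + y * y <= 1.0) inside++;  // point falls inside quadrant
        }
        return 4.0 * inside / samples;
    }

    public static void main(String[] args) {
        System.out.println("pi ~= " + estimatePi(1_000_000L, 42L));
    }
}
```

The workpool version on the next slide does exactly this, but splits the samples into independent packets, one per worker task.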

25 Complete code for the Monte Carlo π computation. Note: no message passing (MPI etc.)

package edu.uncc.grid.example.workpool;

import java.util.Random;
import java.util.logging.Level;
import edu.uncc.grid.pgaf.datamodules.Data;
import edu.uncc.grid.pgaf.datamodules.DataMap;
import edu.uncc.grid.pgaf.interfaces.basic.Workpool;
import edu.uncc.grid.pgaf.p2p.Node;

public class MonteCarloPiModule extends Workpool {
    private static final long serialVersionUID = 1L;
    private static final int DoubleDataSize = 1000;
    double total;
    int random_samples;
    Random R;

    public MonteCarloPiModule() {
        R = new Random();
    }

    public void initializeModule(String[] args) {
        total = 0;
        Node.getLog().setLevel(Level.WARNING);
        random_samples = 3000;
    }

    public Data Compute(Data data) {
        DataMap input = (DataMap) data;
        DataMap output = new DataMap();
        Long seed = (Long) input.get("seed");
        Random r = new Random();
        r.setSeed(seed);
        Long inside = 0L;
        for (int i = 0; i < DoubleDataSize; i++) {
            double x = r.nextDouble();
            double y = r.nextDouble();
            double dist = x * x + y * y;
            if (dist <= 1.0) {
                ++inside;
            }
        }
        output.put("inside", inside);
        return output;
    }

    public Data DiffuseData(int segment) {
        DataMap d = new DataMap();
        d.put("seed", R.nextLong());
        return d;
    }

    public void GatherData(int segment, Data dat) {
        DataMap out = (DataMap) dat;
        Long inside = (Long) out.get("inside");
        total += inside;
    }

    public double getPi() {
        double pi = (total / (random_samples * DoubleDataSize)) * 4;
        return pi;
    }

    public int getDataCount() {
        return random_samples;
    }
}

26 Seeds Workpool DiffuseData, Compute, and GatherData methods. [Diagram: on the master, DiffuseData creates a DataMap d, and the framework delivers d to each slave as the Data argument data of Compute, where it is cast to the DataMap input; Compute creates a DataMap output, which returns to the master as the Data argument dat of GatherData, which aggregates into the private variable total (the answer).] Note: DiffuseData, Compute, and GatherData start with a capital letter, although Java method names conventionally should not!
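The data flow just described can be mimicked sequentially to make the framework's control loop concrete. This is an illustrative mock with invented names, using plain HashMaps in place of Seeds' DataMap: the master diffuses one packet per segment, a worker computes on it, and the master gathers the result:

```java
import java.util.HashMap;
import java.util.Map;

// Sequential mock of the workpool control flow the Seeds framework runs
// (illustrative only): one packet per segment is diffused, computed on,
// and gathered back. Real Seeds does this across distributed workers.
public class WorkpoolFlowSketch {
    static int total = 0;                        // aggregated in "gather"

    static Map<String, Integer> diffuse(int segment) {
        Map<String, Integer> d = new HashMap<>();
        d.put("value", segment);                 // piece of work for this segment
        return d;
    }

    static Map<String, Integer> compute(Map<String, Integer> in) {
        Map<String, Integer> out = new HashMap<>();
        out.put("result", in.get("value") * in.get("value"));  // square it
        return out;
    }

    static void gather(int segment, Map<String, Integer> out) {
        total += out.get("result");
    }

    public static int run(int dataCount) {
        total = 0;
        for (int seg = 0; seg < dataCount; seg++)  // the framework's loop
            gather(seg, compute(diffuse(seg)));
        return total;
    }

    public static void main(String[] args) {
        System.out.println("sum of squares 0..4 = " + run(5)); // 0+1+4+9+16 = 30
    }
}
```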

27 Data and DataMap classes For implementation convenience there are two classes: the Data class, used to pass data between master and slaves (it uses a “segment” number to keep track of packets as they go from one method to another), and the DataMap class, used inside the compute method. DataMap is a subclass of Data, which allows casting between them. DataMap methods: put(String, data) – puts data into the DataMap, identified by the string; get(String) – gets the stored data identified by the string. In the pi code, the data is a Long. DataMap extends Java’s HashMap, which implements Map; see http://doc.java.sun.com/DocWeb/api/java.util.HashMap
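Since DataMap extends HashMap, the put/get-with-cast usage in the pi code can be illustrated with a plain HashMap&lt;String, Object&gt; (DataMap itself is Seeds-specific, so HashMap stands in here; the method name is invented for the example):

```java
import java.util.HashMap;

// Plain-Java illustration of the DataMap usage pattern from the slides:
// values go in as Object and come back out with a cast, exactly as the
// pi code does with (Long) input.get("seed"). HashMap stands in for DataMap.
public class DataMapUsage {
    public static Long roundTrip(long seed) {
        HashMap<String, Object> d = new HashMap<>();
        d.put("seed", seed);                // store a Long under a String key
        return (Long) d.get("seed");        // retrieve and cast back
    }

    public static void main(String[] args) {
        System.out.println("seed = " + roundTrip(12345L));
    }
}
```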

28 Workpool method skeletons, annotated:

public Data DiffuseData(int segment) {    // segment assigned by the framework
    DataMap d = new DataMap();
    inputData = ….
    d.put("name_of_inputdata", inputData);
    return d;
}

public Data Compute(Data data) {          // data produced by DiffuseData()
    DataMap input = (DataMap) data;       // Data cast into a DataMap
    DataMap output = new DataMap();       // output returned to GatherData()
    inputData = input.get("name_of_inputdata");
    …                                     // computation
    output.put("name_of_results", results);  // to return to GatherData()
    return output;
}

public void GatherData(int segment, Data dat) {  // given back a Data object
    DataMap out = (DataMap) dat;                 // with a segment number
    outdata = out.get("name_of_results");
    result …       // aggregate outdata from all the worker nodes;
}                  // result is a private variable

The segment number is used by the framework to keep track of where to put results.

29 Bootstrap class This code deploys the framework and starts execution of the pattern. Different patterns have similar code.

package edu.uncc.grid.example.workpool;

import java.io.IOException;
import net.jxta.pipe.PipeID;
import edu.uncc.grid.pgaf.Anchor;
import edu.uncc.grid.pgaf.Operand;
import edu.uncc.grid.pgaf.Seeds;
import edu.uncc.grid.pgaf.p2p.Types;

public class RunMonteCarloPiModule {
    public static void main(String[] args) {
        try {
            MonteCarloPiModule pi = new MonteCarloPiModule();
            Seeds.start("/path/to/seeds/seed/folder", false);
            PipeID id = Seeds.startPattern(new Operand(
                (String[]) null,
                new Anchor("hostname", Types.DataFlowRoll.SINK_SOURCE),
                pi));
            System.out.println(id.toString());
            Seeds.waitOnPattern(id);
            System.out.println("The result is: " + pi.getPi());
            Seeds.stop();
        } catch (SecurityException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}

30 Compiling/executing Can be done on the command line (an ant script is provided) or through an IDE (Eclipse).

31 Another Workpool example – Matrix addition. MatrixAddModule.java (continues on several slides):

package edu.uncc.grid.example.workpool;

import …

public class MatrixAddModule extends Workpool {
    private static final long serialVersionUID = 1L;
    int[][] matrixA, matrixB, matrixC;

    public MatrixAddModule() {
        matrixC = new int[3][3];
    }

    public void initMatrices() {
        matrixA = new int[][]{{2,5,8},{3,4,9},{1,5,2}};
        matrixB = new int[][]{{2,5,8},{3,4,9},{1,5,2}};
    }

    public void initializeModule(String[] args) {
        Node.getLog().setLevel(Level.WARNING);
    }

32
    public Data DiffuseData(int segment) {
        int[] rowA = new int[3];
        int[] rowB = new int[3];
        DataMap d = new DataMap();
        int k = segment;                 // segment variable identifies slave task
        for (int i = 0; i < 3; i++) {    // copy one row of A and one row of B into d
            rowA[i] = matrixA[k][i];
            rowB[i] = matrixB[k][i];
        }
        d.put("rowA", rowA);
        d.put("rowB", rowB);
        return d;
    }

33
    public Data Compute(Data data) {
        int[] rowC = new int[3];
        DataMap input = (DataMap) data;
        DataMap output = new DataMap();
        int[] rowA = (int[]) input.get("rowA");
        int[] rowB = (int[]) input.get("rowB");
        for (int i = 0; i < 3; i++) {    // computation
            rowC[i] = rowA[i] + rowB[i];
        }
        output.put("rowC", rowC);
        return output;
    }

34
    public void GatherData(int segment, Data dat) {
        DataMap out = (DataMap) dat;
        int[] rowC = (int[]) out.get("rowC");
        for (int i = 0; i < 3; i++) {
            matrixC[segment][i] = rowC[i];
        }
    }

    public void printResult() {
        for (int i = 0; i < 3; i++) {
            System.out.println("");
            for (int j = 0; j < 3; j++) {
                System.out.print(matrixC[i][j] + ",");
            }
        }
    }

    public int getDataCount() {
        return 3;
    }
}
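The row-per-segment decomposition of this workpool can be checked with a short sequential sketch (illustrative, not Seeds code; names are invented): segment k carries row k of A and B, and the gathered row k of C is their elementwise sum:

```java
// Sequential check of the row-per-segment decomposition used by the
// matrix-addition workpool: segment k carries row k of A and B, and the
// gathered row k of C is their elementwise sum. Illustrative, not Seeds code.
public class MatrixAddSketch {
    public static int[][] add(int[][] a, int[][] b) {
        int n = a.length;
        int[][] c = new int[n][n];
        for (int k = 0; k < n; k++)           // one "segment" per row
            for (int i = 0; i < n; i++)
                c[k][i] = a[k][i] + b[k][i];  // the Compute step for row k
        return c;
    }

    public static void main(String[] args) {
        int[][] a = {{2,5,8},{3,4,9},{1,5,2}};
        int[][] b = {{2,5,8},{3,4,9},{1,5,2}};
        System.out.println(java.util.Arrays.deepToString(add(a, b)));
    }
}
```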

35 Bootstrap class to run the framework – RunMatrixAddModule.java:

package edu.uncc.grid.example.workpool;

import …

public class RunMatrixAddModule {
    public static String localhost = "T5400";
    public static String seedslocation = "C:\\seeds_2.0\\pgaf";

    public static void main(String[] args) {
        try {
            Seeds.start(seedslocation, false);
            MatrixAddModule m = new MatrixAddModule();
            m.initMatrices();
            PipeID id = Seeds.startPattern(new Operand(
                (String[]) null,
                new Anchor(localhost, Types.DataFlowRoll.SINK_SOURCE),
                m));
            Seeds.waitOnPattern(id);
            m.printResult();
            Seeds.stop();
        } … // exceptions
    }
}

36 Multicore version of Seeds Just completed by Jeremy (September 2012). Designed for a multicore shared-memory platform; much faster. Thread-based; does not use the JXTA P2P network to run cluster nodes. The bootstrap class does not need to start and stop JXTA P2P; otherwise the user code is similar.

public class RunMonteCarloPiModule {
    public static void main(String[] args) {
        try {
            MonteCarloPiModule pi = new MonteCarloPiModule();
            Thread id = Seeds.startPatternMulticore(new Operand(
                (String[]) null,
                new Anchor(args[0], Types.DataFlowRole.SINK_SOURCE),
                pi), 4);
            id.join();
            System.out.println("The result is: " + pi.getPi());
        } catch (SecurityException e) { … }
    }
}

37 Matrix multiplication Courtesy of Ben Barbour, Research Assistant/graduate student, Dept. of Computer Science, UNC-Wilmington

38 Seeds “CompleteSyncGraph” pattern A pattern that combines all-to-all with synchronous iteration. Slave processes can exchange data with each other at each iteration (synchronization point) without stopping the pattern, i.e. without returning from Seeds. Some problems of this type require a number of iterations to converge on the solution. Examples: the N-body problem; solving a general system of linear equations by iteration.

39 Solving a General System of Linear Equations by Iteration Synchronous all-to-all pattern (the Seeds CompleteSyncGraph pattern); an example with termination after converging on the solution. Suppose the equations are of a general form with n equations and n unknowns: a_i,0 x_0 + a_i,1 x_1 + … + a_i,n-1 x_n-1 = b_i (0 <= i < n), where the unknowns are x_0, x_1, x_2, …, x_n-1.

40 [Diagram: processes P_0 … P_n-1; each P_i broadcasts its result to every other process (all processes excluding P_i) after each iteration, until convergence.] By rearranging the ith equation, P_i computes: x_i = (b_i − Σ_{j≠i} a_i,j x_j) / a_i,i.
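A sequential sketch of the iteration each P_i performs (illustrative, not Seeds code; it assumes the standard Jacobi rearrangement x_i = (b_i − Σ_{j≠i} a_i,j x_j) / a_i,i and a diagonally dominant system so the iteration converges):

```java
// Sequential sketch of the iteration each process Pi performs (illustrative):
// Jacobi update x_i = (b_i - sum_{j != i} a[i][j] * x_j) / a[i][i],
// repeated until successive iterates agree within a tolerance.
public class JacobiSketch {
    public static double[] solve(double[][] a, double[] b, double tol, int maxIter) {
        int n = b.length;
        double[] x = new double[n];             // start from x = 0
        for (int it = 0; it < maxIter; it++) {
            double[] next = new double[n];
            double diff = 0.0;
            for (int i = 0; i < n; i++) {
                double s = b[i];
                for (int j = 0; j < n; j++)
                    if (j != i) s -= a[i][j] * x[j];
                next[i] = s / a[i][i];
                diff = Math.max(diff, Math.abs(next[i] - x[i]));
            }
            x = next;
            if (diff < tol) break;              // converged
        }
        return x;
    }

    public static void main(String[] args) {
        // diagonally dominant system: 4x + y = 6, x + 3y = 7  ->  x = 1, y = 2
        double[][] a = {{4, 1}, {1, 3}};
        double[] b = {6, 7};
        double[] x = solve(a, b, 1e-10, 1000);
        System.out.println("x = " + x[0] + ", y = " + x[1]);
    }
}
```

In the parallel pattern, each P_i owns one row and performs the inner update for its own x_i, then broadcasts the new value to all other processes before the next iteration.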

41 Pattern operators You can create your own combined patterns with a pattern operator. Example: adding the Stencil and All-to-All synchronous patterns. Example use: heat distribution simulation (Laplace’s eq.). Multiple cells in a stencil pattern work in a loop-parallel fashion, computing and synchronizing on each iteration. However, every x iterations they must perform an all-to-all communication pattern to run an algorithm that detects termination. Directly from Jeremy Villalobos’s PhD thesis
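The combined pattern's control flow can be sketched sequentially (an illustrative sketch with invented names): stencil-style relaxation in which the global termination test, which is an all-to-all exchange in the Seeds version, runs only every x iterations:

```java
// Sketch of the combined pattern's control flow (illustrative): stencil-style
// relaxation on a 1-D rod, with a global termination test (in Seeds, an
// all-to-all exchange) performed only every CHECK_EVERY iterations.
public class StencilWithTerminationSketch {
    static final int CHECK_EVERY = 10;  // the "every x iterations" from the slide

    public static int iterationsToConverge(double[] v, double tol) {
        int iter = 0;
        while (true) {
            double[] next = new double[v.length];
            next[0] = v[0];
            next[v.length - 1] = v[v.length - 1];  // fixed boundary values
            double change = 0.0;
            for (int i = 1; i < v.length - 1; i++) {
                next[i] = 0.5 * (v[i - 1] + v[i + 1]);  // neighbour average
                change = Math.max(change, Math.abs(next[i] - v[i]));
            }
            v = next;
            iter++;
            // periodic global test, standing in for the all-to-all exchange
            if (iter % CHECK_EVERY == 0 && change < tol)
                return iter;
        }
    }

    public static void main(String[] args) {
        double[] v = new double[16];
        v[15] = 100.0;                             // hot right end
        System.out.println("iterations: " + iterationsToConverge(v, 1e-6));
    }
}
```

Because the test runs only at multiples of CHECK_EVERY, the loop may overshoot convergence by up to x−1 iterations; that is the trade-off the slide's combined pattern makes to reduce all-to-all communication.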

42 Tutorial page http://coit-grid01.uncc.edu/seeds/

43

44 Download page

45 Introducing pattern programming into the undergraduate curriculum Pattern programming has been introduced into our regular undergraduate parallel programming course before lower-level tools such as MPI, OpenMP, and CUDA. A prototype course ran on NCREN between UNC-Charlotte and UNC-Wilmington. Future offerings will include other sites.

46 Acknowledgments Introducing pattern programming into the undergraduate curriculum is supported by the National Science Foundation under grants #1141005 and #1141006, “Collaborative Research: Teaching Multicore and Many-Core Programming at a Higher Level of Abstraction,” a collaborative project with Dr. C. Ferner, co-PI at UNC-Wilmington. Note: Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.

47 Pattern Programming Research Group 2011 –Jeremy Villalobos (PhD awarded, continuing involvement) –Saurav Bhattara (MS thesis, graduated) Spring 2012 –Yawo Adibolo (ITCS 6880 Individual Study) –Ayay Ramesh (ITCS 6880 Individual Study) Fall 2012 –Haoqi Zhao (MS thesis, implementing a C++ version of Seeds) –Pohua Lee (BS senior project)

48 Research group home page http://coitweb.uncc.edu/~abw/PatternProgGroup Currently needs a password

49 Some publications Jeremy F. Villalobos and Barry Wilkinson, “Skeleton/Pattern Programming with an Adder Operator for Grid and Cloud Platforms,” The 2010 International Conference on Grid Computing and Applications (GCA’10), July 12-15, 2010, Las Vegas, Nevada, USA. Jeremy F. Villalobos and Barry Wilkinson, “Using Hierarchical Dependency Data Flows to Enable Dynamic Scalability on Parallel Patterns,” High-Performance Grid and Cloud Computing Workshop, 25th IEEE International Parallel & Distributed Processing Symposium, Anchorage (Alaska) USA, May 16-20, 2011. Also presented by B. Wilkinson as Session 4 in “Short Course on Grid Computing,” Jornadas Chilenas de Computación, INFONOR-CHILE 2010, Nov. 18-19, 2010, Antofagasta, Chile. http://coitweb.uncc.edu/~abw/Infornor-Chile2010/GridWorkshop/index.html

50 Questions

