CISC 879 : Software Support for Multicore Architectures John Cavazos Dept of Computer & Information Sciences University of Delaware www.cis.udel.edu/~cavazos/cisc879.

Slides:



Advertisements
Similar presentations
Parallel Programming Patterns Eun-Gyu Kim June 10, 2004.
Advertisements

Programmability Issues
U NIVERSITY OF D ELAWARE C OMPUTER & I NFORMATION S CIENCES D EPARTMENT Optimizing Compilers CISC 673 Spring 2009 Potential Languages of the Future Chapel,
Starting Parallel Algorithm Design David Monismith Based on notes from Introduction to Parallel Programming 2 nd Edition by Grama, Gupta, Karypis, and.
Creating Computer Programs lesson 27. This lesson includes the following sections: What is a Computer Program? How Programs Solve Problems Two Approaches:
CISC 879 : Software Support for Multicore Architectures John Cavazos Dept of Computer & Information Sciences University of Delaware
Includes slides from “Multicore Programming Primer” course at Massachusetts Institute of Technology (MIT) by Prof. Saman Amarasinghe and Dr. Rodric Rabbah.
CISC October Goals for today: Foster’s parallel algorithm design –Partitioning –Task dependency graph Granularity Concurrency Collective communication.
An Introduction To PARALLEL PROGRAMMING Ing. Andrea Marongiu
Revisiting a slide from the syllabus: CS 525 will cover Parallel and distributed computing architectures – Shared memory processors – Distributed memory.
Reference: Message Passing Fundamentals.
ECE669 L4: Parallel Applications February 10, 2004 ECE 669 Parallel Computer Architecture Lecture 4 Parallel Applications.
Algorithms and Problem Solving-1 Algorithms and Problem Solving.
Algorithms and Problem Solving. Learn about problem solving skills Explore the algorithmic approach for problem solving Learn about algorithm development.
CISC 879 : Software Support for Multicore Architectures John Cavazos Dept of Computer & Information Sciences University of Delaware
Message Passing Fundamentals Self Test. 1.A shared memory computer has access to: a)the memory of other nodes via a proprietary high- speed communications.
Parallel Programming Models and Paradigms
Chapter 2: Impact of Machine Architectures What is the Relationship Between Programs, Programming Languages, and Computers.
OPL: Our Pattern Language. Background Design Patterns: Elements of Reusable Object-Oriented Software o Introduced patterns o Very influential book Pattern.
A Pattern Language for Parallel Programming Beverly Sanders University of Florida.
Copyright Arshi Khan1 System Programming Instructor Arshi Khan.
Prince Sultan College For Woman
INTEL CONFIDENTIAL Parallel Decomposition Methods Introduction to Parallel Programming – Part 2.
INTEL CONFIDENTIAL Finding Parallelism Introduction to Parallel Programming – Part 3.
Rechen- und Kommunikationszentrum (RZ) Parallelization at a Glance Christian Terboven / Aachen, Germany Stand: Version 2.3.
Recognizing Potential Parallelism Intel Software College Introduction to Parallel Programming – Part 1.
ICOM 5995: Performance Instrumentation and Visualization for High Performance Computer Systems Lecture 7 October 16, 2002 Nayda G. Santiago.
Parallel Programming Models Jihad El-Sana These slides are based on the book: Introduction to Parallel Computing, Blaise Barney, Lawrence Livermore National.
Designing and Evaluating Parallel Programs Anda Iamnitchi Federated Distributed Systems Fall 2006 Textbook (on line): Designing and Building Parallel Programs.
Software Pipelining for Stream Programs on Resource Constrained Multi-core Architectures IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEM 2012 Authors:
Chapter 3 Parallel Algorithm Design. Outline Task/channel model Task/channel model Algorithm design methodology Algorithm design methodology Case studies.
Architectural Support for Fine-Grained Parallelism on Multi-core Architectures Sanjeev Kumar, Corporate Technology Group, Intel Corporation Christopher.
TRIPS – An EDGE Instruction Set Architecture Chirag Shah April 24, 2008.
Porting Irregular Reductions on Heterogeneous CPU-GPU Configurations Xin Huo, Vignesh T. Ravi, Gagan Agrawal Department of Computer Science and Engineering.
9/20/6Lecture 3 - Instruction Set - Al1 Program Design.
SJSU SPRING 2011 PARALLEL COMPUTING Parallel Computing CS 147: Computer Architecture Instructor: Professor Sin-Min Lee Spring 2011 By: Alice Cotti.
Recognizing Potential Parallelism Introduction to Parallel Programming Part 1.
Lecture 1 Introduction Figures from Lewis, “C# Software Solutions”, Addison Wesley Richard Gesick.
Lecture 4 TTH 03:30AM-04:45PM Dr. Jianjun Hu CSCE569 Parallel Computing University of South Carolina Department of.
CS 484 Designing Parallel Algorithms Designing a parallel algorithm is not easy. There is no recipe or magical ingredient Except creativity We can benefit.
Processor Architecture
FOUNDATION IN INFORMATION TECHNOLOGY (CS-T-101) TOPIC : INFORMATION SYSTEM – SOFTWARE.
Lecture 3 : Performance of Parallel Programs Courtesy : MIT Prof. Amarasinghe and Dr. Rabbah’s course note.
Finding concurrency Jakub Yaghob. Finding concurrency design space Starting point for design of a parallel solution Analysis The patterns will help identify.
CDP Tutorial 3 Basics of Parallel Algorithm Design uses some of the slides for chapters 3 and 5 accompanying “Introduction to Parallel Computing”, Addison.
A Pattern Language for Parallel Programming Beverly Sanders University of Florida.
ECE 526 – Network Processing Systems Design Programming Model Chapter 21: D. E. Comer.
Uses some of the slides for chapters 3 and 5 accompanying “Introduction to Parallel Computing”, Addison Wesley, 2003.
Static Translation of Stream Program to a Parallel System S. M. Farhad The University of Sydney.
COMP7330/7336 Advanced Parallel and Distributed Computing Task Partitioning Dr. Xiao Qin Auburn University
Department of Computer Science, Johns Hopkins University Lecture 7 Finding Concurrency EN /420 Instructor: Randal Burns 26 February 2014.
Parallel Patterns.
Algorithms and Problem Solving
Conception of parallel algorithms
Parallel Programming By J. H. Wang May 2, 2017.
Parallel Programming Patterns
Auburn University COMP7330/7336 Advanced Parallel and Distributed Computing Mapping Techniques Dr. Xiao Qin Auburn University.
CS 584 Lecture 3 How is the assignment going?.
Parallel Algorithm Design
Mattan Erez The University of Texas at Austin
Mattan Erez The University of Texas at Austin
Mattan Erez The University of Texas at Austin
CSCE569 Parallel Computing
CS 584.
Algorithms and Problem Solving
Mattan Erez The University of Texas at Austin
Parallel Programming in C with MPI and OpenMP
COMP60611 Fundamentals of Parallel and Distributed Systems
Mattan Erez The University of Texas at Austin
Presentation transcript:

CISC 879 : Software Support for Multicore Architectures John Cavazos Dept of Computer & Information Sciences University of Delaware Lecture 5 Patterns for Parallel Programming

CISC 879 : Software Support for Multicore Architectures Lecture 5: Overview Writing a Parallel Program Design Patterns for Parallel Programs Finding Concurrency Algorithmic Structure Supporting Structures Implementation Mechanisms Slide Source: Dr. Rabbah, IBM, MIT Course IAP 2007

CISC 879 : Software Support for Multicore Architectures Parallelization Common Steps Slide Source: Dr. Rabbah, IBM, MIT Course IAP Study problem or code 2. Look for parallelism opportunities 3. Try to keep all cores busy doing useful work

CISC 879 : Software Support for Multicore Architectures Decomposition Slide Source: Dr. Rabbah, IBM, MIT Course IAP 2007 Identify concurrency Decide level to exploit it Requires understanding the algorithm! May require restructing program/algorithm May require entirely new algorithm Break computation into tasks Divided among processes Tasks may become available dynamically Number of tasks can vary with time Want enough tasks to keep processors busy.

CISC 879 : Software Support for Multicore Architectures Assignment Slide Source: Dr. Rabbah, IBM, MIT Course IAP 2007 Specify mechanism to divide work Balance of computation Reduce communication Structured approaches work well Code inspection and understanding algorithm Using design patterns (second half part of lecture)

CISC 879 : Software Support for Multicore Architectures Granularity Ratio of computation and communication Fine-grain parallelism Tasks execute little comp. between comm. Easy to load balance If too fine, comm. may take longer than comp. Coarse-grain parallelism Long computations between communication More opportunity for performance increase Harder to load balance Most efficient granularity depends on algorithm/hardware.

CISC 879 : Software Support for Multicore Architectures Orchestration and Mapping Slide Source: Dr. Rabbah, IBM, MIT Course IAP 2007 Computation and communication concurrency Preserve locality of data Schedule task to satisfy dependences

CISC 879 : Software Support for Multicore Architectures Lecture 5: Overview Parallelizing a Program Design Patterns for Parallelization Finding Concurrency Algorithmic Structure Supporting Structures Implementation Mechanisms

CISC 879 : Software Support for Multicore Architectures Patterns for Parallelization Parallelization is a difficult problem Hard to fully exploit parallel hardware Solution: Design Patterns Cookbook for parallel programmers Can lead to high quality solutions Provides a common vocabulary Each pattern has a name and associated vocabulary for discussing solutions Helps with software reusability and modularity Allows developer to understand solution and its context

CISC 879 : Software Support for Multicore Architectures Architecture Patterns Slide Source: Dr. Rabbah, IBM, MIT Course IAP 2007 Christopher Alexander Berkeley architecture professor Developed patterns for architecture City planning Layout of windows in a room Attempt to capture principles for “living” designs

CISC 879 : Software Support for Multicore Architectures Patterns for OOP Slide Source: Dr. Rabbah, IBM, MIT Course IAP 2007 First to bring pattens to CS Design Patterns: Elements of Reusable Object-Oriented Software (1994) Gamma et al. (Gang of Four) Catalogue of “patterns” Solutions to common problems in software design Not a finished solution! Rather a template for how to solve a problem

CISC 879 : Software Support for Multicore Architectures Patterns Parallelization Book Slide Source: Dr. Rabbah, IBM, MIT Course IAP 2007 Patterns for Parallel Programming. Mattson et al. (2005) Four Design Spaces Finding Concurrency Expose concurrent task or data Algorithm Structure Map tasks to processes Supporting Structures Code and data structuring patterns Implementation Mechanisms Low-level mechanisms for writing programs

CISC 879 : Software Support for Multicore Architectures Finding Concurrency Decomposition Task, Data, Pipeline Dependency Analysis Control dependences Data dependences Design Evaluation Suitability for target platform Design quality

CISC 879 : Software Support for Multicore Architectures Decomposition Data (domain) decomposition Break data up independent units Task (functional) decomposition Break problem into parallel tasks Case for Pipeline decomposition In Algorithm Structure design pattern (next class), but I’ll discuss here.

CISC 879 : Software Support for Multicore Architectures Data (Domain) Decomposition Also known as Domain Decomposition Implied by Task Decomposition Which decomposition more natural to start with: 1) Decide how data elements divided among cores 2) Decide which tasks each core should be performing Data decomposition is good starting point when Main computation manipulating a large data structure Similar operations applied to different parts of a data structure (SPMD) Example : Vector addition Slide Source: Intel Software College, Intel Corp.

CISC 879 : Software Support for Multicore Architectures Data Decomposition Slide Source: Intel Software College, Intel Corp. Find the largest element of an array

CISC 879 : Software Support for Multicore Architectures Data Decomposition Slide Source: Intel Software College, Intel Corp. Find the largest element of an array CPU 0CPU 1CPU 2CPU 3

CISC 879 : Software Support for Multicore Architectures Data Decomposition Slide Source: Intel Software College, Intel Corp. Find the largest element of an array CPU 0CPU 1CPU 2CPU 3

CISC 879 : Software Support for Multicore Architectures Data Decomposition Slide Source: Intel Software College, Intel Corp. Find the largest element of an array CPU 0CPU 1CPU 2CPU 3

CISC 879 : Software Support for Multicore Architectures Data Decomposition Slide Source: Intel Software College, Intel Corp. Find the largest element of an array CPU 0CPU 1CPU 2CPU 3

CISC 879 : Software Support for Multicore Architectures Data Decomposition Slide Source: Intel Software College, Intel Corp. Find the largest element of an array CPU 0CPU 1CPU 2CPU 3

CISC 879 : Software Support for Multicore Architectures Data Decomposition Slide Source: Intel Software College, Intel Corp. Find the largest element of an array CPU 0CPU 1CPU 2CPU 3

CISC 879 : Software Support for Multicore Architectures Data Decomposition Slide Source: Intel Software College, Intel Corp. Find the largest element of an array CPU 0CPU 1CPU 2CPU 3

CISC 879 : Software Support for Multicore Architectures Data Decomposition Slide Source: Intel Software College, Intel Corp. Find the largest element of an array CPU 0CPU 1CPU 2CPU 3

CISC 879 : Software Support for Multicore Architectures Data Decomposition Slide Source: Intel Software College, Intel Corp. Find the largest element of an array CPU 0CPU 1CPU 2CPU 3

CISC 879 : Software Support for Multicore Architectures Data Decomposition Slide Source: Intel Software College, Intel Corp. Find the largest element of an array CPU 0CPU 1CPU 2CPU 3

CISC 879 : Software Support for Multicore Architectures Data Decomposition Slide Source: Intel Software College, Intel Corp. Find the largest element of an array CPU 0CPU 1CPU 2CPU 3

CISC 879 : Software Support for Multicore Architectures Data Decomposition Forces Flexibility Size of data chunks should support a range of systems Granularity knobs Efficiency Data chunks should have comparable computation (load balancing) Simplicity Complex data decomposition difficult to debug

CISC 879 : Software Support for Multicore Architectures Task (Functional) Decomposition Slide Source: Intel Software College, Intel Corp. Programs often naturally decompose into tasks Functions Distinct loop iterations Loop splitting algorithms Divide tasks among cores Easier to start with too many task and fuse some later Decide data accessed (read/written) by each core Example: Event-handler for a GUI

CISC 879 : Software Support for Multicore Architectures Task Decomposition Slide Source: Intel Software College, Intel Corp. f() s() r() q() h() g()

CISC 879 : Software Support for Multicore Architectures Task Decomposition Slide Source: Intel Software College, Intel Corp. f() s() r() q() h() g() CPU 2 CPU 1 CPU 0

CISC 879 : Software Support for Multicore Architectures Task Decomposition Slide Source: Intel Software College, Intel Corp. f() s() r() q() h() g() CPU 0 CPU 2 CPU 1

CISC 879 : Software Support for Multicore Architectures Task Decomposition Slide Source: Intel Software College, Intel Corp. f() s() r() q() h() g() CPU 0 CPU 2 CPU 1

CISC 879 : Software Support for Multicore Architectures Task Decomposition Slide Source: Intel Software College, Intel Corp. f() s() r() q() h() g() CPU 2 CPU 1 CPU 0

CISC 879 : Software Support for Multicore Architectures Task Decomposition Slide Source: Intel Software College, Intel Corp. f() s() r() q() h() g() CPU 0 CPU 2 CPU 1

CISC 879 : Software Support for Multicore Architectures Task Decomposition Forces Slide Source: Intel Software College, Intel Corp. Flexibility in number and size of tasks Task should not be tied to a specific architecture Parameterize number of task Flexible to any architecture topology Efficiency Task have enough computation to amortize creation costs Sufficiently independent so dependencies are manageable Simplicity Easy to understand and debug

CISC 879 : Software Support for Multicore Architectures Pipeline Decomposition Slide Source: Intel Software College, Intel Corp. Special kind of task decomposition Data flows through a sequence of tasks “Assembly line” parallelism Example: 3D rendering in computer graphics RasterizeClipProjectModel Input Output

CISC 879 : Software Support for Multicore Architectures Pipeline Decomposition Slide Source: Intel Software College, Intel Corp. RasterizeClipProjectModel Processing one data set (Step 1)

CISC 879 : Software Support for Multicore Architectures Pipeline Decomposition Slide Source: Intel Software College, Intel Corp. Processing one data set (Step 2) RasterizeClipProjectModel

CISC 879 : Software Support for Multicore Architectures Pipeline Decomposition Slide Source: Intel Software College, Intel Corp. Processing one data set (Step 3) RasterizeClipProjectModel

CISC 879 : Software Support for Multicore Architectures Pipeline Decomposition Slide Source: Intel Software College, Intel Corp. Processing one data set (Step 4) Pipeline processes 1 data set in 4 steps RasterizeClipProjectModel

CISC 879 : Software Support for Multicore Architectures Pipeline Decomposition Slide Source: Intel Software College, Intel Corp. Processing five data set (Step 1) Data set 0 Data set 1 Data set 2 Data set 3 Data set 4 CPU 0CPU 1CPU 2CPU 3

CISC 879 : Software Support for Multicore Architectures Pipeline Decomposition Slide Source: Intel Software College, Intel Corp. Processing five data set (Step 2) Data set 0 Data set 1 Data set 2 Data set 3 Data set 4 CPU 0CPU 1CPU 2CPU 3

CISC 879 : Software Support for Multicore Architectures Pipeline Decomposition Slide Source: Intel Software College, Intel Corp. Processing five data set (Step 3) Data set 0 Data set 1 Data set 2 Data set 3 Data set 4 CPU 0CPU 1CPU 2CPU 3

CISC 879 : Software Support for Multicore Architectures Pipeline Decomposition Slide Source: Intel Software College, Intel Corp. Processing five data set (Step 4) Data set 0 Data set 1 Data set 2 Data set 3 Data set 4 CPU 0CPU 1CPU 2CPU 3

CISC 879 : Software Support for Multicore Architectures Pipeline Decomposition Slide Source: Intel Software College, Intel Corp. Processing five data set (Step 5) Data set 0 Data set 1 Data set 2 Data set 3 Data set 4 CPU 0CPU 1CPU 2CPU 3

CISC 879 : Software Support for Multicore Architectures Pipeline Decomposition Slide Source: Intel Software College, Intel Corp. Processing five data set (Step 6) Data set 0 Data set 1 Data set 2 Data set 3 Data set 4 CPU 0CPU 1CPU 2CPU 3

CISC 879 : Software Support for Multicore Architectures Pipeline Decomposition Slide Source: Intel Software College, Intel Corp. Processing five data set (Step 7) Data set 0 Data set 1 Data set 2 Data set 3 Data set 4 CPU 0CPU 1CPU 2CPU 3

CISC 879 : Software Support for Multicore Architectures Pipeline Decomposition Slide Source: Intel Software College, Intel Corp. Processing five data set (Step 8) Data set 0 Data set 1 Data set 2 Data set 3 Data set 4 CPU 0CPU 1CPU 2CPU 3

CISC 879 : Software Support for Multicore Architectures Pipeline Decomposition Forces Slide Source: Intel Software College, Intel Corp. Flexibility Deeper pipelines are better Will scale better and can later merge pipeline stages Efficiency Stages of pipeline should not cause bottleneck Simplicity More pipeline stages break down problem into more manageable chunks of code

CISC 879 : Software Support for Multicore Architectures Dependency Analysis Control and Data Dependences Dependence Graph Graph = (nodes, edges) Node for each Variable assignment Constant Operator or Function call Edge indicates use of variables and constants Data flow Control flow Slide Source: Intel Software College, Intel Corp.

CISC 879 : Software Support for Multicore Architectures Dependency Analysis Slide Source: Intel Software College, Intel Corp. for (i = 0; i < 3; i++) a[i] = b[i] / 2.0; b[0]b[1]b[2] a[0]a[1]a[2] /// 222

CISC 879 : Software Support for Multicore Architectures Dependency Analysis Slide Source: Intel Software College, Intel Corp. for (i = 0; i < 3; i++) a[i] = b[i] / 2.0; b[0]b[1]b[2] a[0]a[1]a[2] /// 222 Domain decomposition possible

CISC 879 : Software Support for Multicore Architectures Dependency Analysis Slide Source: Intel Software College, Intel Corp. for (i = 1; i < 4; i++) a[i] = a[i-1] * b[i]; b[1]b[2]b[3] a[1]a[2]a[3] *** a[0]

CISC 879 : Software Support for Multicore Architectures Dependency Analysis Slide Source: Intel Software College, Intel Corp. for (i = 1; i < 4; i++) a[i] = a[i-1] * b[i]; b[1]b[2]b[3] a[1]a[2]a[3] *** a[0] No domain decomposition

CISC 879 : Software Support for Multicore Architectures Dependency Analysis Slide Source: Intel Software College, Intel Corp. a = f(x, y, z); b = g(w, x); t = a + b; c = h(z); s = t / c; x f wyz ab g t c s / h

CISC 879 : Software Support for Multicore Architectures Dependency Analysis Slide Source: Intel Software College, Intel Corp. a = f(x, y, z); b = g(w, x); t = a + b; c = h(z); s = t / c; x f wyz ab g t c s / h Task decomposition with 3 CPUs.

CISC 879 : Software Support for Multicore Architectures Evaluate Design Is the design good enough YES - move to next design space NO - re-evaluate previous patterns Forces Suitability to target platform Should not depend on underlying architecture Design quality Trade-offs between simplicity, flexibility, and efficiency Pick any two! Preparation for next phase Understand design to help in next phase: Algorithm Structure

CISC 879 : Software Support for Multicore Architectures Read for Next Time Reengineering for Parallelism: An Entry Point for PLPP (Pattern Language for Parallel Programming) for Legacy Applications