Parallel Programming Models Jihad El-Sana These slides are based on the book: Introduction to Parallel Computing, Blaise Barney, Lawrence Livermore National Laboratory

Overview
Parallel programming models in common use:
– Shared Memory
– Threads
– Message Passing
– Data Parallel
– Hybrid
Parallel programming models are abstractions above hardware and memory architectures.

Shared Memory Model Tasks share a common address space, which they read and write asynchronously. Various mechanisms such as locks / semaphores may be used to control access to the shared memory. An advantage of this model from the programmer's point of view is that the notion of data "ownership" is lacking, so there is no need to specify explicitly the communication of data between tasks. Program development can often be simplified.
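As a minimal sketch of this model (the thread count, iteration count, and shared counter are illustrative assumptions, not from the slides), the POSIX threads example below has two threads reading and writing the same variable asynchronously, with a mutex used as the lock that controls access to the shared memory:

```c
#include <pthread.h>
#include <stdio.h>

/* Shared address space: both threads see the same counter. */
static long counter = 0;
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

static void *worker(void *arg) {
    for (int i = 0; i < 100000; i++) {
        pthread_mutex_lock(&lock);   /* control access to shared memory */
        counter++;                   /* no explicit data "communication" needed */
        pthread_mutex_unlock(&lock);
    }
    return NULL;
}

int main(void) {
    pthread_t t1, t2;
    pthread_create(&t1, NULL, worker, NULL);
    pthread_create(&t2, NULL, worker, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    printf("counter = %ld\n", counter);  /* 200000 with the lock in place */
    return 0;
}
```

Note how neither thread "sends" data to the other: both simply operate on the same address, which is exactly why no explicit communication of data needs to be specified.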

Disadvantage
It is difficult to understand and manage data locality.
– Keeping data local to the processor that works on it conserves memory accesses, cache refreshes and the bus traffic that occurs when multiple processors use the same data.
– Unfortunately, controlling data locality is hard to understand and may be beyond the control of the average user.

Implementations
Native compilers translate user program variables into actual memory addresses, which are global.
No common distributed memory platform implementations currently exist.
However, some implementations have provided a shared memory view of data even though the physical memory of the machine was distributed; this approach is usually referred to as virtual shared memory.

Threads Model A single process can have multiple, concurrent execution paths. The main program loads and acquires all of the necessary system and user resources. It performs some serial work, and then creates a number of tasks (threads) that run concurrently.

Threads Cont. The work of a thread can best be described as a subroutine within the main program. All threads share the process's memory space, and each thread also has its own local data. Threads save the overhead of replicating a program's resources. Threads communicate with each other through global memory. This requires synchronization constructs to ensure that no two threads are updating the same global address at the same time. Threads can come and go, but the main thread remains present to provide the necessary shared resources until the application has completed.
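The sketch below illustrates that lifecycle (the thread count and the work each thread does are hypothetical): the main program creates worker threads that each run a subroutine with their own local data, publish results through global memory, and are joined by the main thread before it exits.

```c
#include <pthread.h>
#include <stdio.h>

#define NUM_THREADS 4

/* Global memory: threads communicate results through this shared array. */
static double results[NUM_THREADS];

static void *work(void *arg) {
    long id = (long)arg;        /* local data: each thread has its own id */
    double local = 0.0;         /* local data: private partial result */
    for (int i = 0; i < 1000; i++)
        local += id * 0.5 + i;
    results[id] = local;        /* publish the result via global memory */
    return NULL;
}

int main(void) {
    pthread_t threads[NUM_THREADS];

    /* Main program: serial setup, then create concurrent threads. */
    for (long id = 0; id < NUM_THREADS; id++)
        pthread_create(&threads[id], NULL, work, (void *)id);

    /* The main thread remains until all threads have completed. */
    for (int id = 0; id < NUM_THREADS; id++)
        pthread_join(threads[id], NULL);

    for (int id = 0; id < NUM_THREADS; id++)
        printf("thread %d -> %f\n", id, results[id]);
    return 0;
}
```

Because each thread writes to its own slot of the shared array, no lock is needed here; updates to a common location would require the synchronization constructs mentioned above.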

Message Passing Model
The message passing model is characterized as follows:
– A set of tasks use their own local memory during computation.
– Multiple tasks can reside on the same physical machine as well as across an arbitrary number of machines.

Message Passing Model Tasks exchange data by sending and receiving messages. Data transfer usually requires cooperative operations to be performed by each process; for example, a send operation must have a matching receive operation. The communicating processes may exist on the same machine or on different machines.
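A minimal MPI sketch of this cooperative exchange (the value sent and the choice of ranks are illustrative): rank 0 sends an integer that lives in its own local memory, and rank 1 posts the matching receive.

```c
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    int rank;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        int data = 42;                      /* lives in rank 0's local memory */
        MPI_Send(&data, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        int data;
        /* The receive must match the send: cooperative data transfer. */
        MPI_Recv(&data, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        printf("rank 1 received %d\n", data);
    }

    MPI_Finalize();
    return 0;
}
```

Run with at least two processes (e.g., mpirun -np 2 ./a.out); the two ranks may be placed on the same machine or on different machines.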

Data Parallel Model Most of the parallel work focuses on performing operations on a data set. The data set is typically organized into a common structure. A set of tasks works collectively on the same data structure, with each task working on a different partition of it.

Data Parallel Model Cont. Tasks perform the same operation on their partition of work. On shared memory architectures, all tasks may have access to the data structure through global memory. On distributed memory architectures the data structure is split up and resides as "chunks" in the local memory of each task.
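On a shared memory machine this model maps naturally onto an OpenMP loop; in the sketch below (the array size and the operation applied are assumptions), every thread performs the same operation on its own partition of the array.

```c
#include <stdio.h>

#define N 1000000

int main(void) {
    static double a[N];

    /* Each thread performs the same operation on a different
       partition of the same data structure. */
    #pragma omp parallel for
    for (int i = 0; i < N; i++)
        a[i] = 2.0 * i;

    printf("a[N-1] = %f\n", a[N - 1]);
    return 0;
}
```

Compiled with OpenMP support (e.g., -fopenmp) the iterations are divided among threads; without it the loop simply runs serially. On a distributed memory machine the same idea would instead have each task allocate and operate on its own local "chunk" of the array.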

Designing Parallel Algorithms The programmer is typically responsible for both identifying and actually implementing parallelism. Manually developing parallel codes is a time-consuming, complex, error-prone and iterative process. Currently, the most common type of tool used to automatically parallelize a serial program is a parallelizing compiler or pre-processor.

A Parallelizing Compiler
Fully Automatic
– The compiler analyzes the source code and identifies opportunities for parallelism.
– The analysis includes identifying inhibitors to parallelism and possibly a cost weighting of whether the parallelism would actually improve performance.
– Loops (do, for) are the most frequent target for automatic parallelization.
Programmer Directed
– Using "compiler directives" or possibly compiler flags, the programmer explicitly tells the compiler how to parallelize the code.
– This may be used in conjunction with some degree of automatic parallelization.
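For the programmer-directed case, a compiler directive can state exactly how a loop should be parallelized. The OpenMP sketch below (the loop body is a made-up example) also uses a reduction clause to tell the compiler how to combine the per-thread partial sums, something a fully automatic analysis might refuse to do on its own.

```c
#include <stdio.h>

int main(void) {
    const int n = 1000000;
    double sum = 0.0;

    /* The directive, not the compiler's own analysis, specifies the
       parallelization strategy; reduction(+:sum) resolves what would
       otherwise be a shared-update conflict on sum. */
    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < n; i++)
        sum += 1.0 / (i + 1);

    printf("sum = %f\n", sum);
    return 0;
}
```

If the code is compiled without OpenMP support the directive is ignored and the loop runs serially, which is part of the appeal of directive-based approaches.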

Automatic Parallelization Limitations
– Wrong results may be produced
– Performance may actually degrade
– Much less flexible than manual parallelization
– Limited to a subset (mostly loops) of code
– May actually not parallelize code if the analysis suggests there are inhibitors or the code is too complex

The Problem & The Program
Determine whether or not the problem is one that can actually be parallelized.
Identify the program's hotspots:
– Know where most of the real work is being done.
– Profilers and performance analysis tools can help here.
– Focus on parallelizing the hotspots and ignore those sections of the program that account for little CPU usage.
Identify bottlenecks in the program:
– Identify areas where the program is disproportionately slow, or bounded (for example, by I/O).
– It may be possible to restructure the program or use a different algorithm to reduce or eliminate unnecessary slow areas.
Identify inhibitors to parallelism. One common class of inhibitor is data dependence, as demonstrated by the Fibonacci sequence.
Investigate other algorithms if possible. This may be the single most important consideration when designing a parallel application.

Partitioning
Break the problem into discrete "chunks" of work that can be distributed to multiple tasks:
– Domain decomposition
– Functional decomposition

Domain Decomposition
The data associated with the problem is decomposed, and each parallel task then works on a portion of the data. The partitioning can be done in different ways:
– Rows, columns, blocks, cyclic, etc.
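A common way to realize a block (row-wise) partition is to compute each task's index range from its id. The sketch below (the number of rows and tasks are arbitrary assumptions) divides N rows as evenly as possible among the tasks.

```c
#include <stdio.h>

/* Compute the half-open range [start, end) of rows owned by task `id`
   when N rows are block-partitioned among `ntasks` tasks. */
static void block_range(int N, int ntasks, int id, int *start, int *end) {
    int base = N / ntasks;          /* minimum rows per task            */
    int rem  = N % ntasks;          /* first `rem` tasks get one extra  */
    *start = id * base + (id < rem ? id : rem);
    *end   = *start + base + (id < rem ? 1 : 0);
}

int main(void) {
    const int N = 10, ntasks = 4;
    for (int id = 0; id < ntasks; id++) {
        int s, e;
        block_range(N, ntasks, id, &s, &e);
        printf("task %d owns rows [%d, %d)\n", id, s, e);
    }
    return 0;
}
```

The same helper works for column, block, or cyclic schemes with only the index arithmetic changing.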

Functional Decomposition The problem is decomposed according to the work that must be done. Each task then performs a portion of the overall work.
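As a sketch of functional decomposition (the stage functions and the use of MPI ranks are illustrative assumptions): each task executes a different piece of the overall work, rather than the same operation on different data.

```c
#include <mpi.h>
#include <stdio.h>

/* Hypothetical stages of one overall computation. */
static void read_input(void)    { printf("task 0: reading input\n"); }
static void run_model(void)     { printf("task 1: running the model\n"); }
static void write_results(void) { printf("task 2: writing results\n"); }

int main(int argc, char **argv) {
    int rank;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* Each task performs a different function of the overall work. */
    if (rank == 0)      read_input();
    else if (rank == 1) run_model();
    else if (rank == 2) write_results();

    MPI_Finalize();
    return 0;
}
```

In a real application the stages would also exchange intermediate data by passing messages; run with at least three processes for all stages to execute.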

Communications
Factors to consider when designing inter-task communications:
– Cost of communications
– Latency vs. bandwidth
– Visibility of communications
– Synchronous vs. asynchronous communications
– Scope of communications: point-to-point or collective
– Efficiency of communications
– Overhead and complexity
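The latency vs. bandwidth trade-off is often summarized with a simple cost model; in the sketch below (the latency and bandwidth figures are made-up examples), the time to send a message is roughly a fixed latency plus the message size divided by the bandwidth, which is why sending many small messages usually costs more than packing them into one large message.

```c
#include <stdio.h>

/* Rough cost model: time = latency + bytes / bandwidth. */
static double msg_time(double latency_s, double bytes, double bandwidth_Bps) {
    return latency_s + bytes / bandwidth_Bps;
}

int main(void) {
    double latency = 1e-6;        /* 1 microsecond (assumed)  */
    double bandwidth = 10e9;      /* 10 GB/s (assumed)        */

    /* 1000 small messages vs. one message carrying the same data. */
    double many_small = 1000 * msg_time(latency, 1e3, bandwidth);
    double one_large  = msg_time(latency, 1e6, bandwidth);
    printf("1000 x 1 KB: %g s, 1 x 1 MB: %g s\n", many_small, one_large);
    return 0;
}
```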

Synchronization
Types of synchronization:
– Barrier
– Lock / semaphore
– Synchronous communication operations
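A sketch of barrier synchronization (the thread count and the two phases are hypothetical, and POSIX barriers are not available on every platform): no thread starts the second phase until every thread has finished the first.

```c
#include <pthread.h>
#include <stdio.h>

#define NUM_THREADS 4

static pthread_barrier_t barrier;

static void *phased_work(void *arg) {
    long id = (long)arg;
    printf("thread %ld: phase 1\n", id);
    pthread_barrier_wait(&barrier);   /* wait until all threads reach this point */
    printf("thread %ld: phase 2\n", id);
    return NULL;
}

int main(void) {
    pthread_t threads[NUM_THREADS];
    pthread_barrier_init(&barrier, NULL, NUM_THREADS);

    for (long id = 0; id < NUM_THREADS; id++)
        pthread_create(&threads[id], NULL, phased_work, (void *)id);
    for (int id = 0; id < NUM_THREADS; id++)
        pthread_join(threads[id], NULL);

    pthread_barrier_destroy(&barrier);
    return 0;
}
```

Locks serve a different purpose than barriers: a lock serializes access to shared data, whereas a barrier aligns all tasks at a point in the program.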

Data Dependencies A dependence exists between program statements when the order of statement execution affects the results of the program. A data dependence results from multiple uses of the same location(s) in storage by different tasks. Dependencies are important to parallel programming because they are one of the primary inhibitors to parallelism.
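A classic illustration (the array contents and scaling factor are arbitrary): each iteration below reads the value produced by the previous iteration, so the iterations cannot safely be distributed among tasks without changing the algorithm.

```c
#include <stdio.h>

int main(void) {
    double a[8] = {1.0};

    /* Loop-carried data dependence: a[i] needs a[i-1] from the previous
       iteration, so execution order matters and the loop cannot simply
       be split among parallel tasks. */
    for (int i = 1; i < 8; i++)
        a[i] = a[i - 1] * 2.0;

    for (int i = 0; i < 8; i++)
        printf("%g ", a[i]);
    printf("\n");
    return 0;
}
```

The Fibonacci recurrence mentioned earlier has the same character: each value depends on the two values computed before it.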

Load Balancing Load balancing refers to the practice of distributing work among tasks so that all tasks are kept busy all of the time. It can be considered a minimization of task idle time. Load balancing is important to parallel programs for performance reasons. For example, if all tasks are subject to a barrier synchronization point, the slowest task will determine the overall performance.
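One way a programming model can help with load balancing is dynamic work assignment; in the sketch below (the work function and its uneven cost are hypothetical), OpenMP's dynamic schedule hands iterations to threads as they become free instead of splitting the loop into fixed equal chunks, so no thread sits idle while another finishes an expensive block.

```c
#include <stdio.h>

/* Hypothetical work whose cost varies a lot between iterations. */
static double work(int i) {
    double x = 0.0;
    for (int k = 0; k < (i % 100) * 1000; k++)
        x += k * 1e-9;
    return x;
}

int main(void) {
    const int n = 10000;
    double total = 0.0;

    /* schedule(dynamic) hands out iterations on demand, so a thread that
       draws cheap iterations keeps taking more instead of waiting idle
       at the end of the loop. */
    #pragma omp parallel for schedule(dynamic) reduction(+:total)
    for (int i = 0; i < n; i++)
        total += work(i);

    printf("total = %f\n", total);
    return 0;
}
```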