Parallel Programming

Introduction
The idea has been around since the 1960s:
– pseudo-parallel systems on multiprogrammable computers
True parallelism requires many processors connected to run in concert:
– multiprocessor systems
– distributed systems: stand-alone systems connected by high-speed networks; more complex

Programming Languages
– used to express algorithms that solve the problems presented by parallel processing systems
– used to write the OSs that implement these solutions
– used to harness the capabilities of multiple processors efficiently
– used to implement and express communication across networks

Two kinds of parallelism
Parallelism existing in the underlying hardware, versus parallelism as expressed in the programming language:
– may not result in actual parallel processing
– could be implemented with pseudo-parallelism
– concurrent programming expresses only the potential for parallelism

Some Basics
Process – an instance of a program or program part that has been scheduled for independent execution
Heavy-weight process – a full-fledged independent entity, with all the memory and other resources ordinarily allocated by the OS
Light-weight process, or thread – shares resources with the program it came from

Process states
Blocked – waiting for a resource
Executing – in possession of a processor
Waiting – waiting to be scheduled

Primary requirements for organization
There must be a way for processors to synchronize their activities:
– e.g., the 1st processor inputs and sorts data; the 2nd processor waits to perform computations on the sorted data
There must be a way for processors to communicate data among themselves:
– e.g., the 2nd processor needs the sorted data

Architectures
SIMD (single-instruction, multiple-data):
– one processor is the controller
– all processors execute the same instructions on their respective registers or data sets
– multiprocessing
– synchronous (all processors operate at the same speed)
– implicit solution to the synchronization problem
MIMD (multiple-instruction, multiple-data):
– all processors act independently
– multiprocessor or distributed-processor systems
– asynchronous (synchronization becomes a critical problem)

Memory
Shared memory (one central memory):
– used in multiprocessor systems
Distributed memory:
– each processor has its own independent memory
– communication is critical here

More terms
Mutual exclusion – synchronizing access to shared memory
Race condition – without mutual exclusion, e.g., interleaved modifications to memory
Deadlock – processes waiting on each other to unblock
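
To make the race condition concrete, here is a small Java sketch; it is not from the slides, and the class names are illustrative. Two threads increment a shared counter without mutual exclusion, so interleaved read-modify-write updates are lost.

  // Two threads increment a shared counter without mutual
  // exclusion. count++ is a read-modify-write sequence, so
  // interleaved updates can be lost and the printed total is
  // frequently less than 200000. Declaring increment()
  // synchronized restores mutual exclusion.
  class Counter {
      private int count = 0;
      public void increment() { count++; }   // not atomic
      public int value() { return count; }
  }

  class Incrementer implements Runnable {
      private final Counter c;
      public Incrementer(Counter c) { this.c = c; }
      public void run() {
          for (int i = 0; i < 100000; i++) c.increment();
      }
  }

  public class RaceDemo {
      public static void main(String[] args) throws InterruptedException {
          Counter c = new Counter();
          Thread t1 = new Thread(new Incrementer(c));
          Thread t2 = new Thread(new Incrementer(c));
          t1.start(); t2.start();
          t1.join(); t2.join();            // wait for both threads
          System.out.println(c.value());   // often < 200000
      }
  }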

OS requirements for parallelism
– a means of creating and destroying processes
– a means of managing the number of processors used by processes
– a mechanism for ensuring mutual exclusion on shared-memory systems
– a mechanism for creating and maintaining communication channels between processors on distributed-memory systems

Language requirements
– machine independence
– adherence to language design principles
Some languages use the shared-memory model and provide facilities for mutual exclusion through a library; some assume the distributed-memory model and provide communication facilities; a few include both.

Common mechanisms
– threads
– semaphores
– monitors
– message passing
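
As a hedged aside, since semaphores are listed here but the later slides illustrate only monitors: Java's standard library provides counting semaphores in java.util.concurrent. A minimal sketch using a one-permit Semaphore as a mutual-exclusion lock (the names SemaphoreDemo and shared are illustrative):

  import java.util.concurrent.Semaphore;

  // A one-permit semaphore acts as a mutual-exclusion lock:
  // acquire() blocks until the permit is free, release() returns it.
  public class SemaphoreDemo {
      private static final Semaphore mutex = new Semaphore(1);
      private static int shared = 0;

      public static void main(String[] args) throws InterruptedException {
          Runnable work = new Runnable() {
              public void run() {
                  for (int i = 0; i < 100000; i++) {
                      try {
                          mutex.acquire();   // enter critical section
                          shared++;
                          mutex.release();   // leave critical section
                      } catch (InterruptedException e) {
                          return;
                      }
                  }
              }
          };
          Thread t1 = new Thread(work), t2 = new Thread(work);
          t1.start(); t2.start();
          t1.join(); t2.join();
          System.out.println(shared);        // 200000 every time
      }
  }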

2 common sample problems
Bounded buffer problem:
– similar to the producer–consumer problem
Parallel matrix multiplication:
– an N³ algorithm
– assign a process to compute each element, each process on a separate processor → N steps

Without explicit language facilities
One approach is not to be explicit at all:
– possible in some functional, logic, and object-oriented languages
– certain inherent parallelism is left implicit
The language translator uses optimization techniques to make automatic use of OS utilities, assigning different processors to different parts of the program.
The results are usually suboptimal.

Another alternative without explicit language facilities
The translator offers compiler options that let the programmer explicitly mark areas where parallelism is called for.
This is most effective in nested loops.
Example: Fortran

      integer a(100,100), b(100,100), c(100,100)
      integer i, j, k, numprocs, err
      numprocs = 10
C     code to read in a and b goes here
      err = m_set_procs(numprocs)
C$doacross share(a, b, c), local(j, k)
      do 10 i = 1, 100
        do 10 j = 1, 100
          c(i,j) = 0
          do 10 k = 1, 100
            c(i,j) = c(i,j) + a(i,k) * b(k,j)
10    continue
      call m_kill_procs
C     code to write out c goes here
      end

– the compiler directive synchronizes the processes: all processes wait for the entire loop to finish, and one process continues after the loop
– local: local to each process
– share: accessible by all processes
– m_set_procs: sets the number of processes

3rd way, with explicit constructs
Provide a library of functions.
This passes the facilities provided by the OS directly to the programmer, which is essentially the same as providing them in the language.
Example: C with the library parallel.h

#include <parallel.h>
#define SIZE 100
#define NUMPROCS 10

shared int a[SIZE][SIZE], b[SIZE][SIZE], c[SIZE][SIZE];

void multiply (void)
{ int i, j, k;
  for (i = m_get_myid(); i < SIZE; i += NUMPROCS)
    for (j = 0; j < SIZE; j++)
      for (k = 0; k < SIZE; k++)
        c[i][j] += a[i][k] * b[k][j];
}

main ()
{ /* code to read in a and b goes here */
  m_set_procs(NUMPROCS);
  m_fork(multiply);
  m_kill_procs();
  /* code to write out c goes here */
  return 0;
}

– m_set_procs sets the number of processes; m_fork then creates the 10 processes, all instances of multiply

4th and final alternative
Simply rely on the OS.
Example: pipes in Unix:

  ls | grep "java"

– runs ls and grep in parallel
– the output of ls is piped to grep

Languages with explicit mechanisms
2 basic ways to create new processes:
– SPMD (single program, multiple data): split the current process into 2 or more processes that execute copies of the same program
– MPMD (multiple program, multiple data): a segment of code is associated with each new process
The typical MPMD case is the fork-join model, in which a process creates several child processes, each with its own code (a fork), and then waits for the children to complete their execution (a join).
The last example was similar, but m_kill_procs took the place of the join.
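
As a rough illustration (not from the slides), the fork-join model maps directly onto Java threads: the parent forks children, each with its own code (MPMD style), then joins them before continuing.

  // Fork-join in Java: the parent creates two children with
  // different code (a fork), then waits for both (a join).
  public class ForkJoinDemo {
      public static void main(String[] args) throws InterruptedException {
          Thread child1 = new Thread(new Runnable() {
              public void run() { System.out.println("child 1: reading input"); }
          });
          Thread child2 = new Thread(new Runnable() {
              public void run() { System.out.println("child 2: computing"); }
          });
          child1.start();   // fork
          child2.start();
          child1.join();    // join: wait for the children
          child2.join();
          System.out.println("parent: children finished");
      }
  }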

Granularity
The size of the code assignable to separate processes:
– fine-grained: statement-level parallelism
– medium-grained: procedure-level parallelism
– large-grained: program-level parallelism
Granularity can be an issue for program efficiency:
– too fine-grained: process-management overhead dominates
– too large-grained: may not exploit all opportunities for parallelism

Thread
A thread gives fine-grained or medium-grained parallelism without the overhead of full-blown process creation.

Issues
– Does the parent suspend execution while its child processes are executing, or does it continue to execute alongside them?
– What memory, if any, does a parent share with its children, or the children share among themselves?

Answers in the last example
– The parent process suspended execution.
– Sharing was indicated explicitly: global variables are shared by all processes.

Process Termination
Simplest case: a process executes its code to completion and then ceases to exist.
More complex case: a process may need to continue executing until a certain condition is met, and only then terminate.

Statement-Level Parallelism (parbegin/parend pseudocode)

  parbegin
    S1;
    S2;
    ...
    Sn;
  parend;

Statement-Level Parallelism (Fortran 95)

  FORALL (I = 1:100, J = 1:100)
    C(I,J) = 0
    DO 10 K = 1, 100
      C(I,J) = C(I,J) + A(I,K) * B(K,J)
10  CONTINUE
  END FORALL

Procedure-Level Parallelism

  x = newprocess(p);
  ...
  killprocess(x);

where p is a declared procedure and x is a process designator; this pseudocode is similar to tasks in Ada.

Program-Level Parallelism (Unix)
fork creates a process that is an exact copy of the calling process:

  if (fork() == 0)
    { /* ..child executes this part.. */ }
  else
    { /* ..parent executes this part.. */ }

A return value of 0 indicates that the process is the child.

Java threads
– threads are built into Java; the Thread class is part of the java.lang package
– the reserved word synchronized establishes mutual exclusion
– create an instance of a Thread object and define its run method, which executes when the thread starts

Java threads, 2 ways
(The second, more versatile way is shown here.)
– define a class that implements the Runnable interface (defining its run method)
– pass an object of this class to the Thread constructor
Note: every Java program already executes inside a thread; the main method runs in that initial thread.
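
For contrast, a minimal sketch of the first way, subclassing Thread directly (the class names are illustrative). Implementing Runnable is usually preferred because a Java class can extend only one superclass.

  // The first way: subclass Thread, override run(), call start().
  class MyThread extends Thread {
      public void run() {
          System.out.println("running in a separate thread");
      }
  }

  public class FirstWayDemo {
      public static void main(String[] args) {
          MyThread t = new MyThread();
          t.start();   // runs MyThread.run in a new thread
      }
  }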

Java Thread Example

  class MyRunner implements Runnable {
    public void run() { ... }
  }

  MyRunner m = new MyRunner();
  Thread t = new Thread(m);
  t.start();   // t will now execute the run method

Destroying threads
Let each thread run to completion; wait for other threads to finish:

  t.start();
  // do some other work
  t.join();       // wait for t to finish

Or interrupt it:

  t.start();
  // do some other work
  t.interrupt();  // tell t we are waiting
  t.join();       // wait for t to finish
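
Putting thread creation and join together, here is a hedged sketch of the matrix-multiplication example from earlier in these slides, now with Java threads, one thread per strip of rows (names such as MatMul and Multiplier are illustrative, not from the slides):

  // Parallel matrix multiplication with one thread per strip of
  // rows: thread id computes rows id, id+NUMTHREADS, ... of
  // c = a * b. No mutual exclusion is needed, because the threads
  // write disjoint rows of c.
  public class MatMul {
      static final int SIZE = 100, NUMTHREADS = 10;
      static int[][] a = new int[SIZE][SIZE],
                     b = new int[SIZE][SIZE],
                     c = new int[SIZE][SIZE];

      static class Multiplier implements Runnable {
          private final int id;
          Multiplier(int id) { this.id = id; }
          public void run() {
              for (int i = id; i < SIZE; i += NUMTHREADS)
                  for (int j = 0; j < SIZE; j++)
                      for (int k = 0; k < SIZE; k++)
                          c[i][j] += a[i][k] * b[k][j];
          }
      }

      public static void main(String[] args) throws InterruptedException {
          // code to fill a and b would go here
          Thread[] threads = new Thread[NUMTHREADS];
          for (int p = 0; p < NUMTHREADS; p++) {
              threads[p] = new Thread(new Multiplier(p));
              threads[p].start();                // fork
          }
          for (Thread t : threads) t.join();     // join
          // code to write out c would go here
      }
  }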

Mutual exclusion

  class Queue {
    ...
    synchronized public Object dequeue() {
      if (empty()) throw ...
    }
    synchronized public Object enqueue(Object obj) {
      ...
    }
    ...
  }

Mutual exclusion

  class Remover implements Runnable {
    public Remover(Queue q) { ... }
    public void run() { ... q.dequeue() ... }
  }

  class Inserter implements Runnable {
    public Inserter(Queue q) { ... }
    public void run() { ... q.enqueue(...) ... }
  }

Mutual exclusion

  Queue myqueue = new Queue(...);
  ...
  Remover r = new Remover(myqueue);
  Inserter i = new Inserter(myqueue);
  Thread t1 = new Thread(r);
  Thread t2 = new Thread(i);
  t1.start();
  t2.start();

Manually stalling a thread and then reawakening it

  class Queue {
    ...
    synchronized public Object dequeue() {
      try {
        while (empty()) wait();
      } catch (InterruptedException e) {
        // reset interrupt
        ...
      }
      ...
    }
    synchronized public Object enqueue(Object obj) {
      ...
      notifyAll();
    }
    ...
  }

Note that wait() sits inside a while loop rather than an if: after reawakening, the condition must be re-tested, since another consumer may have emptied the queue again before this thread reacquires the lock.
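
Tying this back to the bounded buffer problem mentioned earlier, here is an illustrative sketch (not from the slides) that also bounds the queue's capacity, so producers wait while the buffer is full as well:

  // Bounded buffer as a monitor: dequeue waits while empty,
  // enqueue waits while full; notifyAll wakes waiters after each
  // change of state, and the while loops re-test the condition
  // after wakeup.
  class BoundedBuffer {
      private final Object[] items;
      private int count = 0, head = 0, tail = 0;

      public BoundedBuffer(int capacity) { items = new Object[capacity]; }

      public synchronized Object dequeue() throws InterruptedException {
          while (count == 0) wait();            // stall consumers while empty
          Object obj = items[head];
          items[head] = null;                   // let the slot be collected
          head = (head + 1) % items.length;
          count--;
          notifyAll();                          // wake any waiting producer
          return obj;
      }

      public synchronized void enqueue(Object obj) throws InterruptedException {
          while (count == items.length) wait(); // stall producers while full
          items[tail] = obj;
          tail = (tail + 1) % items.length;
          count++;
          notifyAll();                          // wake any waiting consumer
      }
  }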