Introduction in algorithms and applications Introduction in algorithms and applications Parallel machines and architectures Parallel machines and architectures.

Slides:



Advertisements
Similar presentations
Multiple Processor Systems
Advertisements

MPI Message Passing Interface
Operating Systems Lecture 7.
COMMUNICATING SEQUENTIAL PROCESSES C. A. R. Hoare The Queen’s University Belfast, North Ireland.
Concurrency Important and difficult (Ada slides copied from Ed Schonberg)
Parallel programming in Java. Java has 2 forms of support for parallel programming: –Multithreading Multiple threads of control (sub processes), useful.
Course Outline Introduction in algorithms and applications Parallel machines and architectures Overview of parallel machines, trends in top-500 Cluster.
Distributed Systems and Message Passing. 2 Context (Inter Process Communication, IPC) So far, we have studied concurrency in the context of Multiprogramming,
Concurrency: Mutual Exclusion and Synchronization Why we need Mutual Exclusion? Classical examples: Bank Transactions:Read Account (A); Compute A = A +
Introduction in algorithms and applications Introduction in algorithms and applications Parallel machines and architectures Parallel machines and architectures.
DISTRIBUTED AND HIGH-PERFORMANCE COMPUTING CHAPTER 7: SHARED MEMORY PARALLEL PROGRAMMING.
1 Semaphores Special variable called a semaphore is used for signaling If a process is waiting for a signal, it is suspended until that signal is sent.
Concurrency CS 510: Programming Languages David Walker.
Introduction in algorithms and applications Introduction in algorithms and applications Parallel machines and architectures Parallel machines and architectures.
Chapter 3: Processes. 3.2 Silberschatz, Galvin and Gagne ©2005 Operating System Concepts - 7 th Edition, Feb 7, 2006 Chapter 3: Processes Process Concept.
Chapter 11: Distributed Processing Parallel programming Principles of parallel programming languages Concurrent execution –Programming constructs –Guarded.
Parallel programming in Java. Java has 2 forms of support for parallel programming: –Multithreading Multiple threads of control (sub processes), useful.
Chapter 3: Processes. Process Concept Process Scheduling Operations on Processes Cooperating Processes Interprocess Communication Communication in Client-Server.
02/02/2004CSCI 315 Operating Systems Design1 Interprocesses Communication Notice: The slides for this lecture have been largely based on those accompanying.
Communication in Distributed Systems –Part 2
Asynchronous Message Passing EE 524/CS 561 Wanliang Ma 03/08/2000.
Chapter 3: Processes. 3.2 Silberschatz, Galvin and Gagne ©2005 Operating System Concepts - 7 th Edition, Feb 7, 2006 Chapter 3: Processes Process Concept.
02/01/2010CSCI 315 Operating Systems Design1 Interprocess Communication Notice: The slides for this lecture have been largely based on those accompanying.
1 Organization of Programming Languages-Cheng (Fall 2004) Concurrency u A PROCESS or THREAD:is a potentially-active execution context. Classic von Neumann.
A. Frank - P. Weisberg Operating Systems Introduction to Cooperating Processes.
Concurrency - 1 Tasking Concurrent Programming Declaration, creation, activation, termination Synchronization and communication Time and delays conditional.
Mapping Techniques for Load Balancing
Lecture 4: Parallel Programming Models. Parallel Programming Models Parallel Programming Models: Data parallelism / Task parallelism Explicit parallelism.
Chapter 9 Message Passing Copyright © Operating Systems, by Dhananjay Dhamdhere Copyright © Operating Systems, by Dhananjay Dhamdhere2 Introduction.
L15: Putting it together: N-body (Ch. 6) October 30, 2012.
Chapter 3: Processes Process Concept Process Scheduling Operations on Processes Interprocess Communication Examples of IPC Systems Communication in Client-Server.
Parallel Programming Models Jihad El-Sana These slides are based on the book: Introduction to Parallel Computing, Blaise Barney, Lawrence Livermore National.
Chapter 3: Processes. 3.2 Silberschatz, Galvin and Gagne ©2005 Operating System Concepts - 7 th Edition, Feb 7, 2006 Process Concept Process – a program.
Chapter 3 Parallel Programming Models. Abstraction Machine Level – Looks at hardware, OS, buffers Architectural models – Looks at interconnection network,
1 Concurrency Architecture Types Tasks Synchronization –Semaphores –Monitors –Message Passing Concurrency in Ada Java Threads.
1 Lecture 5 (part2) : “Interprocess communication” n reasons for process cooperation n types of message passing n direct and indirect message passing n.
Chapter 3: Processes. 3.2 Silberschatz, Galvin and Gagne ©2005 Operating System Concepts - 7 th Edition, Feb 7, 2006 Outline n Process Concept n Process.
Processes. Chapter 3: Processes Process Concept Process Scheduling Operations on Processes Cooperating Processes Interprocess Communication Communication.
Chapter 5 Concurrency: Mutual Exclusion and Synchronization Operating Systems: Internals and Design Principles, 6/E William Stallings Patricia Roy Manatee.
ICS 313: Programming Language Theory Chapter 13: Concurrency.
CS212: OPERATING SYSTEM Lecture 2: Process 1. Silberschatz, Galvin and Gagne ©2009 Operating System Concepts – 8 th Edition, Chapter 3: Process-Concept.
Computer Architecture and Operating Systems CS 3230: Operating System Section Lecture OS-4 Process Communication Department of Computer Science and Software.
Copyright © 2006 by The McGraw-Hill Companies, Inc. All rights reserved. McGraw-Hill Technology Education Lecture 3 Operating Systems.
PREPARED BY : MAZHAR JAVED AWAN BSCS-III Process Communication & Threads 4 December 2015.
Concurrency Control 1 Fall 2014 CS7020: Game Design and Development.
Message-Passing Computing Chapter 2. Programming Multicomputer Design special parallel programming language –Occam Extend existing language to handle.
Orca A language for parallel programming of distributed systems.
CS- 492 : Distributed system & Parallel Processing Lecture 7: Sun: 15/5/1435 Foundations of designing parallel algorithms and shared memory models Lecturer/
1 BİL 542 Parallel Computing. 2 Message Passing Chapter 2.
13-1 Chapter 13 Concurrency Topics Introduction Introduction to Subprogram-Level Concurrency Semaphores Monitors Message Passing Java Threads C# Threads.
Stacey Levine Chapter 4.1 Message Passing Communication.
Chapter 5 Concurrency: Mutual Exclusion and Synchronization Operating Systems: Internals and Design Principles, 6/E William Stallings Patricia Roy Manatee.
 Process Concept  Process Scheduling  Operations on Processes  Cooperating Processes  Interprocess Communication  Communication in Client-Server.
Agenda  Quick Review  Finish Introduction  Java Threads.
Channels. Models for Communications Synchronous communications – E.g. Telephone call Asynchronous communications – E.g. .
Gokul Kishan CS8 1 Inter-Process Communication (IPC)
Message passing model. buffer producerconsumer PRODUCER-CONSUMER PROBLEM.
Orca A language for parallel programming of distributed systems.
Chapter 5 Concurrency: Mutual Exclusion and Synchronization Operating Systems: Internals and Design Principles, 6/E William Stallings Patricia Roy Manatee.
Interprocess Communication and Synchronization based on Message Passing.
03 – Remote invoaction Request-reply RPC RMI Coulouris 5
“Language Mechanism for Synchronization”
Processes Overview: Process Concept Process Scheduling
L21: Putting it together: Tree Search (Ch. 6)
Lecture 2: Processes Part 1
Course Outline Introduction in algorithms and applications
Channels.
Subject : T0152 – Programming Language Concept
Parallel programming in Java
Presentation transcript:

Introduction in algorithms and applications Introduction in algorithms and applications Parallel machines and architectures Parallel machines and architectures Programming methods, languages, and environments Programming methods, languages, and environments Message passing (SR, MPI, Java) Higher-level languages: HPF Applications Applications N-body problems, search algorithms, bioinformatics Grid computing Multimedia content analysis on Grids (guest lecture Frank Seinstra) (guest lecture Frank Seinstra) Course Outline

Approaches to Parallel Programming Sequential language + library MPI, PVM Extend sequential language C/Linda, Concurrent C++ New languages designed for parallel or distributed programming SR, occam, Ada, Orca

Paradigms for Parallel Programming Processes + shared variables Processes + message passing Concurrent object-oriented languages Concurrent functional languages Concurrent logic languages Data-parallelism (SPMD model) - SR and MPI Java - -HPF

Interprocess Communication and Synchronization based on Message Passing Henri Bal

Overview Message passing Naming the sender and receiver Explicit or implicit receipt of messages Synchronous versus asynchronous messages Nondeterminism Select statement Example language: SR (Synchronizing Resources) Traveling Salesman Problem in SR Example library: MPI (Message Passing Interface)

Point-to-point Message Passing Basic primitives: send & receive As library routines: send(destination, & MsgBuffer) receive(source, &MsgBuffer) As language constructs send MsgName(arguments) to destination receive MsgName(arguments) from source

Direct naming Sender and receiver directly name each other S: send M to R R: receive M from S Asymmetric direct naming (more flexible): S: send M to R R: receive M Direct naming is easy to implement Destination of message is know in advance Implementation just maps logical names to machine addresses

Indirect naming Indirect naming uses extra indirection level R: send M to P -- P is a port name S: receive M from P Sender and receiver need not know each other Port names can be moved around (e.g., in a message) send ReplyPort(P) to U -- P is name of reply port Most languages allow only a single process at a time to receive from any given port Some languages allow multiple receivers that service messages on demand -> called a mailbox

Explicit Message Receipt Explicit receive by an existing process Receiving process only handles message when it is willing to do so process main() { // regular computation here receive M( ….); // explicit message receipt // code to handle message // more regular computations …. }

Implicit message receipt Receipt by a new thread of control, created for handling the incoming message int X; process main( ) { // just regular computations, this code can access X } message-handler M( ) // created whenever a message M arrives { // code to handle the message, can also access X }

Threads Threads run in (pseudo-) parallel on the same node Each thread has its own program counter and local variables Threads share global variables mainMM X time

Differences (1) Implicit receipt is used if it’s unknown when a message will arrive; example: request for remote data int X; process main( ) { // regular computations } int message-handler readX( S) { send valueX(X) to S } process main() { int X; while (true) { if (there is a message readX) { receive readX(S); send valueX(X) to S } // regular computations }

Differences (2) Explicit receive gives more control over when to accept which messages; e.g., SR allows: receive ReadFile(file, offset, NrBytes) by NrBytes // sorts messages by (increasing) 3rd parameter, i.e. small reads go first // sorts messages by (increasing) 3rd parameter, i.e. small reads go first MPI has explicit receive (+ polling for implicit receive) Java has implicit receive: Remote Method Invocation (RMI) SR has both

Synchronous vs. asynchronous Message Passing Synchronous message passing: Sender is blocked until receiver has accepted the message Too restrictive for many parallel applications Asynchronous message passing: Sender continues immediately More efficient Ordering problems Buffering problems

Ordering with asynchronous message passing SENDER: RECEIVER: send message(1)receive message(N); print N send message(2)receive message(M); print M Messages may be received in any order, depending on the protocol Message ordering message(1) message(2)

Example: AT&T crash P2P1 Are you still alive? P2P1P1 crashesP1 is dead P2P1 I’m back Regular message Something’s wrong, I’d better crash! P2P1P2 is dead

Message buffering Keep messages in a buffer until the receive( ) is done What if the buffer overflows? Continue, but delete some messages (e.g., oldest one), or Use flow control: block the sender temporarily Flow control changes the semantics since it introduces synchronization S: send zillion messages to R; receive messages R: send zillion messages to S; receive messages -> deadlock!

Nondeterminism Interactions may depend on run-time conditions e.g.: wait for a message from either A or B, whichever comes first Need to express and control nondeterminism specify when to accept which message Example (bounded buffer): do simultaneously when buffer not full: accept request to store message when buffer not empty: accept request to fetch message

Select statement several alternatives of the form: WHEN condition => RECEIVE message DO statement Each alternative may succeed, if condition=true & a message is available fail, if condition=false suspend, if condition=true & no message available yet Entire select statement may succeed, if any alternative succeeds -> pick one nondeterministically fail, if all alternatives fail suspend, if some alternatives suspend and none succeeds yet

Example: bounded buffer select when not FULL(BUFFER) => receive STORE_ITEM(X: INTEGER) do ‘store X in buffer’ end; or when not EMPTY(BUFFER) => receive FETCH_ITEM(X: out INTEGER) do X := ‘first item from buffer’ end; end select;

Synchronizing Resources (SR) Developed at University of Arizona Goals of SR: Expressiveness Many message passing primitives Ease of use Minimize number of underlying concepts Clean integration of language constructs Efficiency Each primitive must be efficient

Overview of SR Multiple forms of message passing Asynchronous message passing Rendezvous (synchronous send, explicit receipt) Remote Procedure Call (synchronous send, implicit receipt) Multicast (many receivers) Powerful receive-statement Conditional & ordered receive, based on contents of message Select statement Resource = module run on 1 node (uni/multiprocessor) Contains multiple threads that share variables

Orthogonality in SR The send and receive primitives can be combined in all 4 possible ways

Example body S #sender send R.m1 #asynchr. mp send R.m2 # fork call R.m1 # rendezvous call R.m2 # RPC end S body R #receiver proc M2( ) # implicit receipt # code to handle M2 end initial # main process of R do true -> #infinite loop in m1( ) # explicit receive # code to handle m1 ni od end end R

Traveling Salesman Problem (TSP) in SR Find shortest route for salesman among given set of cities Each city must be visited once, no return to initial city Saint Louis Miami Chicago New York

Sequential branch-and-bound Structure the entire search space as a tree, sorted using nearest-city first heuristic n csm cs m sm scc c m ms

Pruning the search tree Keep track of best solution found so far (the “bound”) Cut-off partial routes >= bound n csm cs m sm scc c m ms Length=6 Can be pruned

Parallelizing TSP Distribute the search tree over the CPUs CPUs analyze different routes Results in reasonably large-grain jobs

Distribution of TSP search tree n csm cs m sm scc c m ms CPU 1CPU 2CPU 3 Subtasks: - New York -> Chicago - New York -> Saint Louis - New York -> Miami

Distribution of the tree (2) Static distribution: each CPU gets a fixed part of the tree Load balancing problem: subtrees take different amounts of time n csm cs m sm scc c m m s

Dynamic distribution: Replicated Workers Model Master process generates large number of jobs (subtrees) and repeatedly hands them out Worker processes (subcontractors) repeatedly take work and execute it 1 worker per processor General, frequently-used model for parallel processing

Implementing TSP in SR Need communication to distribute work Need communication to implement global bound

Distributing work Master generates jobs to be executed by workers Not known in advance which worker will execute which job A “mailbox” (port with >1 receivers) would have helped Use intermediate buffer process instead Masterbuffer workers

Implementing the global bound Problem: the bound is a global variable, but it must be implemented with message passing The bound is accessed millions of times, but updated only when a better route is found Only efficient solution is to manually replicate it

Managing a replicated variable in SR Use a BoundManager process to serialize updates BoundManagerWorker 1Worker 2 MMM M = copy of global Minimum M := 3 Assign(M,3) Update(M,3) Process 2 assigns to M Assign: asynchr. + explicit ordered recv. Update: synchr.+implicit recv.+multicast

SR code fragments for TSP body worker var M: int := Infinite # copy of bound sem sema # semaphore proc update(value: int) P(sema) # lock copy M := value V(sema) # unlock end update initial # main code for worker - can read M (using sema) - can use send BoundManager.Assign(value) body BoundManager var M: int := Infinite do true -> # handle requests 1 by 1 in Assign(value) by value -> if value M := value co(i := 1 to ncpus) # multicast call worker[i].update(value) co fi ni od end BoundManager

Search overhead n csm cs m sm scc c m ms CPU 1CPU 2CPU 3 Problem Path with length=6 not yet computed by CPU 1 when CPU 3 starts n->m->s Parallel algorithm does more work than sequential algorithm: search overhead Not pruned :-(

Performance of TSP in SR Communication overhead Distribution of jobs + updating the global bound (small overhead) Load imbalances Replicated worker model has automatic load balancing Synchronization overhead Mutual exclusion (locking) needed for accessing copy of bound Search overhead Main performance problem In practice: high speedups possible