Interprocess Communication and Synchronization based on Message Passing

Approaches to Parallel Programming
- Sequential language + library: MPI, PVM
- Extend an existing sequential language: C/Linda, Concurrent C++
- New languages designed for parallel or distributed programming: SR, occam, Ada, Orca

Paradigms for Parallel Programming
- Processes + shared variables
- Processes + message passing
- Concurrent object-oriented languages
- Concurrent functional languages
- Concurrent logic languages
- Data-parallelism (SPMD model)
- Advanced communication models

Overview
- Message passing: general issues
- Examples: rendezvous, Remote Procedure Call, broadcast
- Nondeterminism: select statement
- Example language: SR (Synchronizing Resources); Traveling Salesman Problem in SR
- Example library: MPI (Message Passing Interface)

Point-to-point Message Passing
Basic primitives: send & receive
As library routines:
  send(destination, &MsgBuffer)
  receive(source, &MsgBuffer)
As language constructs:
  send MsgName(arguments) to destination
  receive MsgName(arguments) from source
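
To make the library-routine form concrete, here is a minimal point-to-point sketch using MPI (the library discussed at the end of these slides); the value, tag, and ranks are arbitrary and error handling is omitted.

/* Minimal MPI point-to-point sketch: rank 0 sends one integer to rank 1.
   Compile with mpicc, run with at least 2 processes. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    int rank, value;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    if (rank == 0) {
        value = 42;
        MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);    /* send(destination, &MsgBuffer) */
    } else if (rank == 1) {
        MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);                            /* receive(source, &MsgBuffer) */
        printf("received %d\n", value);
    }
    MPI_Finalize();
    return 0;
}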

Issues in Message Passing
- Naming the sender and receiver
- Explicit or implicit receipt of messages
- Synchronous versus asynchronous messages

Direct naming
Sender and receiver directly name each other:
  S: send M to R
  R: receive M from S
Asymmetric direct naming (more flexible):
  S: send M to R
  R: receive M
Direct naming is easy to implement:
- Destination of the message is known in advance
- Implementation just maps logical names to machine addresses
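
For instance, MPI expresses asymmetric direct naming with MPI_ANY_SOURCE: in the sketch below every non-zero rank sends to rank 0, and rank 0 receives without naming a particular sender (the message values are arbitrary).

/* Asymmetric direct naming sketch: the receiver does not name the sender. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    int rank, size, msg;
    MPI_Status status;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    if (rank != 0) {
        msg = rank;
        MPI_Send(&msg, 1, MPI_INT, 0, 0, MPI_COMM_WORLD);       /* "send M to R" */
    } else {
        for (int i = 1; i < size; i++) {
            MPI_Recv(&msg, 1, MPI_INT, MPI_ANY_SOURCE, 0,
                     MPI_COMM_WORLD, &status);                   /* "receive M" -- sender not named */
            printf("got %d from rank %d\n", msg, status.MPI_SOURCE);
        }
    }
    MPI_Finalize();
    return 0;
}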

Indirect naming
Indirect naming uses an extra indirection level (ports):
  S: send M to P    -- P is a port name
  R: receive M from P
Sender and receiver need not know each other
Port names can be moved around (e.g., in a message):
  send ReplyPort(P) to U    -- P is the name of a reply port
Most languages allow only a single process at a time to receive from any given port
Some languages allow multiple receivers that service messages on demand -> called a mailbox
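
As a rough C analogue of a port, two processes can rendezvous through a named POSIX message queue instead of naming each other; the queue name "/port_M" is made up for this sketch (Linux, link with -lrt), and it is only an illustration of the idea, not the mechanism the slide describes.

/* Indirect naming sketch: sender and receiver only share the port name. */
#include <mqueue.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void) {
    struct mq_attr attr = { .mq_maxmsg = 8, .mq_msgsize = 64 };
    mqd_t port = mq_open("/port_M", O_CREAT | O_RDWR, 0600, &attr);

    if (fork() == 0) {                        /* child acts as the sender S    */
        mq_send(port, "hello", 6, 0);         /* S: send M to P                */
        _exit(0);
    }
    char buf[64];                             /* parent acts as the receiver R */
    mq_receive(port, buf, sizeof buf, NULL);  /* R: receive M from P           */
    printf("got: %s\n", buf);
    wait(NULL);
    mq_unlink("/port_M");
    return 0;
}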

Explicit Message Receipt
Explicit receive by an existing process
Receiving process only handles the message when it is willing to do so

process main()
{
  // regular computation here
  receive M( .... );   // explicit message receipt
  // code to handle message
  // more regular computations ....
}

Implicit message receipt
Receipt by a new thread of control, created for handling the incoming message

int X;

process main( )
{
  // just regular computations, this code can access X
}

message-handler M( )   // created whenever a message M arrives
{
  // code to handle the message, can also access X
}

Differences
- Implicit receipt is used if it's unknown when a message will arrive (e.g., a request for data)
- Explicit receive gives more control over when to accept which messages; e.g., SR allows:
    receive ReadFile(file, offset, NrBytes) by NrBytes
    # sorts messages by (increasing) 3rd parameter, i.e., small reads go first

Synchronous vs. asynchronous message passing
Synchronous message passing:
- Sender is blocked until the receiver has accepted the message
- Too restrictive for many parallel applications
Asynchronous message passing:
- Sender continues immediately
- More efficient
- Ordering problems
- Buffering problems
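
In MPI terms the contrast can be sketched with a synchronous send (MPI_Ssend, blocks until the receiver accepts) versus a nonblocking send (MPI_Isend, returns immediately); the ranks, tags, and values below are arbitrary.

/* Synchronous vs. asynchronous send sketch; run with 2 processes. */
#include <mpi.h>

int main(int argc, char **argv) {
    int rank, a = 1, b = 2, dummy;
    MPI_Request req;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    if (rank == 0) {
        MPI_Ssend(&a, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);         /* blocks until receiver accepts */
        MPI_Isend(&b, 1, MPI_INT, 1, 1, MPI_COMM_WORLD, &req);   /* returns immediately           */
        /* ... sender may continue computing here ... */
        MPI_Wait(&req, MPI_STATUS_IGNORE);                       /* b may only be reused now      */
    } else if (rank == 1) {
        MPI_Recv(&dummy, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        MPI_Recv(&dummy, 1, MPI_INT, 0, 1, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    }
    MPI_Finalize();
    return 0;
}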

Ordering with asynchronous message passing

SENDER:               RECEIVER:
send message(1)       receive message(N); print N
send message(2)       receive message(M); print M

Messages may be received in any order, depending on the protocol

Example: AT&T crash
[Figure: P1 asks P2 "Are you still alive?"; P1 crashes, so P2 decides "P1 is dead"; when P1 comes back and sends a regular message, P2 concludes "Something's wrong, I'd better crash!", after which P1 decides "P2 is dead".]

Message buffering
Keep messages in a buffer until the receive( ) is done
What if the buffer overflows?
- Continue, but delete some messages (e.g., the oldest one), or
- Use flow control: block the sender temporarily
Flow control changes the semantics, since it introduces synchronization:
  S: send zillion messages to R; receive messages
  R: send zillion messages to S; receive messages
  -> deadlock!
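
A minimal sketch of that deadlock, assuming both sides use a synchronous (i.e., flow-controlled) send before their receive; with two MPI processes the program intentionally hangs, since neither send can complete.

/* Flow-control deadlock sketch: both ranks send before they receive. */
#include <mpi.h>

int main(int argc, char **argv) {
    int rank, out = 0, in;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    if (rank < 2) {
        int peer = 1 - rank;
        MPI_Ssend(&out, 1, MPI_INT, peer, 0, MPI_COMM_WORLD);   /* blocks: the peer is also sending */
        MPI_Recv(&in, 1, MPI_INT, peer, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);                            /* never reached                    */
    }
    MPI_Finalize();
    return 0;
}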

Example communication primitives Rendezvous (Ada) Remote Procedure Call (RPC) Broadcast

Rendezvous (Ada)
- Two-way interaction
- Synchronous (blocking) send
- Explicit receive
- Output parameters sent back to caller
- Entry = procedure implemented by a task that can be called remotely

Example

task SERVER is
  entry INCREMENT(X: integer; Y: out integer);
end;

Entry call:
  S.INCREMENT(2, A);   -- invoke entry of task S

Accept statement

task body SERVER is
begin
  accept INCREMENT(X: integer; Y: out integer) do
    Y := X + 1;   -- handle entry call
  end;
  ......
end;

Entry call is fully synchronous:
- Invoker waits until the server is ready to accept
- Accept statement waits for an entry call
- Caller proceeds after the accept statement has been executed

Remote Procedure Call (RPC)
Similar to a traditional procedure call, but:
- Caller and receiver are different processes
- Possibly on different machines
Fully synchronous: sender waits for the RPC to complete
Implicit message receipt: new thread of control within the receiver

Broadcast
Many networks (e.g., Ethernet) support:
- broadcast: send a message to all machines
- multicast: send a message to a set of machines
Hardware multicast is very efficient (Ethernet: same delay as for a unicast)
Multicast can be made reliable using software protocols
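
At the library level, MPI exposes broadcast as a collective operation; in the sketch below (value and root rank chosen arbitrarily) rank 0's value is delivered to every process.

/* Broadcast sketch: rank 0's value reaches all processes. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    int rank, value = 0;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    if (rank == 0) value = 99;                         /* only the root has the data initially */
    MPI_Bcast(&value, 1, MPI_INT, 0, MPI_COMM_WORLD);  /* root = rank 0                        */
    printf("rank %d now has %d\n", rank, value);
    MPI_Finalize();
    return 0;
}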

Nondeterminism
- Interactions may depend on run-time conditions, e.g.: wait for a message from either A or B, whichever comes first
- Need to express and control nondeterminism: specify when to accept which message
Example (bounded buffer):
  do simultaneously
    when buffer not full: accept request to store a message
    when buffer not empty: accept request to fetch a message

Select statement
Several alternatives of the form:
  WHEN condition => ACCEPT message DO statement
Each alternative may:
- succeed, if condition = true & a message is available
- fail, if condition = false
- suspend, if condition = true & no message is available yet
The entire select statement may:
- succeed, if any alternative succeeds -> pick one nondeterministically
- fail, if all alternatives fail
- suspend, if some alternatives suspend and none succeeds yet

Example: bounded buffer in Ada

select
  when not FULL(BUFFER) =>
    accept STORE_ITEM(X: INTEGER) do
      'store X in buffer'
    end;
or
  when not EMPTY(BUFFER) =>
    accept FETCH_ITEM(X: out INTEGER) do
      X := 'first item from buffer'
    end;
end select;

Synchronizing Resources (SR)
Developed at the University of Arizona
Goals of SR:
- Expressiveness: many message passing primitives
- Ease of use: minimize the number of underlying concepts; clean integration of language constructs
- Efficiency: each primitive must be efficient

Overview of SR
- Multiple forms of message passing: asynchronous message passing, rendezvous (explicit receipt), Remote Procedure Call (implicit receipt), multicast
- Powerful receive statement: conditional & ordered receive, based on contents of message; select statement
- Resource = module run on 1 node (uni/multiprocessor); contains multiple threads that share variables

Orthogonality in SR The send and receive primitives can be combined in all 4 possible ways

Example

body S            # sender
  send R.m1       # asynchr. mp
  send R.m2       # fork
  call R.m1       # rendezvous
  call R.m2       # RPC
end S

body R            # receiver
  proc m2( )      # implicit receipt
    # code to handle m2
  end
  initial         # main process of R
    do true ->    # infinite loop
      in m1( )    # explicit receive
        # code to handle m1
      ni
    od
  end
end R

Traveling Salesman Problem (TSP) in SR
Find shortest route for salesman among given set of cities
Each city must be visited once, no return to initial city
[Figure: example map with New York, Chicago, Saint Louis, and Miami]

Sequential branch-and-bound
Structure the entire search space as a tree, sorted using the nearest-city-first heuristic
[Figure: search tree over the cities n (New York), c (Chicago), s (Saint Louis), m (Miami)]

Pruning the search tree
Keep track of the best solution found so far (the "bound")
Cut off partial routes with length >= bound
[Figure: search tree in which a complete route of length 6 sets the bound, so another partial route can be pruned]
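
A compact sequential branch-and-bound sketch in C under the same rules (visit each city once, no return to the start); the 4-city distance matrix is invented for illustration and is not taken from the slides.

/* Sequential branch-and-bound for TSP: prune partial routes >= bound. */
#include <limits.h>
#include <stdio.h>

#define N 4
static int dist[N][N] = {            /* made-up symmetric distances */
    { 0,  2,  9, 10},
    { 2,  0,  6,  4},
    { 9,  6,  0,  3},
    {10,  4,  3,  0},
};
static int best = INT_MAX;           /* the bound: best route so far */

static void search(int city, int visited, int length, int count) {
    if (length >= best) return;                  /* prune: cannot beat the bound       */
    if (count == N) { best = length; return; }   /* complete route: new, tighter bound */
    for (int next = 0; next < N; next++)
        if (!(visited & (1 << next)))
            search(next, visited | (1 << next),
                   length + dist[city][next], count + 1);
}

int main(void) {
    search(0, 1, 0, 1);              /* start the route in city 0 */
    printf("shortest route length: %d\n", best);
    return 0;
}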

Parallelizing TSP
- Distribute the search tree over the CPUs
- CPUs analyze different routes
- Results in reasonably large-grain jobs

Distribution of TSP search tree
[Figure: the three subtrees of the search tree are assigned to CPU 1, CPU 2, and CPU 3]
Subtasks:
- New York -> Chicago
- New York -> Saint Louis
- New York -> Miami

Distribution of the tree (2)
Static distribution: each CPU gets a fixed part of the tree
Load balancing problem: subtrees take different amounts of time
[Figure: static split of the search tree over the CPUs]

Dynamic distribution: Replicated Workers Model
- Master process generates a large number of jobs (subtrees) and repeatedly hands them out
- Worker processes (subcontractors) repeatedly take work and execute it
- 1 worker per processor
- General, frequently used model for parallel processing
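
A hedged MPI sketch of the replicated-workers model: rank 0 plays the master and hands out job numbers on request, the other ranks are workers; the number of jobs, the tags, and the empty work() routine are placeholders, not part of the original slides.

/* Replicated workers sketch: master (rank 0) hands out jobs on demand. */
#include <mpi.h>

#define TAG_WORK 1
#define TAG_STOP 2
#define NJOBS   20

static void work(int job) { (void)job; /* placeholder: analyze one subtree */ }

int main(int argc, char **argv) {
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    if (rank == 0) {                                   /* master */
        int next = 0, stopped = 0, dummy;
        MPI_Status st;
        while (stopped < size - 1) {
            MPI_Recv(&dummy, 1, MPI_INT, MPI_ANY_SOURCE, 0,
                     MPI_COMM_WORLD, &st);             /* a worker asks for a job */
            if (next < NJOBS) {
                MPI_Send(&next, 1, MPI_INT, st.MPI_SOURCE, TAG_WORK, MPI_COMM_WORLD);
                next++;
            } else {
                MPI_Send(&next, 1, MPI_INT, st.MPI_SOURCE, TAG_STOP, MPI_COMM_WORLD);
                stopped++;
            }
        }
    } else {                                           /* worker */
        int job, ask = 0;
        MPI_Status st;
        for (;;) {
            MPI_Send(&ask, 1, MPI_INT, 0, 0, MPI_COMM_WORLD);
            MPI_Recv(&job, 1, MPI_INT, 0, MPI_ANY_TAG, MPI_COMM_WORLD, &st);
            if (st.MPI_TAG == TAG_STOP) break;         /* no more jobs            */
            work(job);
        }
    }
    MPI_Finalize();
    return 0;
}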

Implementing TSP in SR Need communication to distribute work Need communication to implement global bound

Distributing work
Master generates jobs to be executed by workers
Not known in advance which worker will execute which job
A "mailbox" (port with >1 receivers) would have helped
Use an intermediate buffer process instead:
  Master -> buffer -> workers

Implementing the global bound
Problem: the bound is a global variable, but it must be implemented with message passing
The bound is accessed millions of times, but updated only when a better route is found
Only efficient solution is to manually replicate it

Managing a replicated variable in SR
Use a BoundManager process to serialize updates
[Figure: the BoundManager and each worker hold M, a copy of the global minimum; when worker 2 assigns M := 3, it sends Assign(M,3) to the BoundManager, which multicasts Update(M,3) to all workers]
Assign: asynchronous + explicit ordered receive
Update: synchronous + implicit receipt + multicast

SR code fragments for TSP

body worker
  var M: int := Infinite          # copy of bound
  sem sema                        # semaphore
  proc update(value: int)
    P(sema)                       # lock copy
    M := value
    V(sema)                       # unlock
  end update
  initial                         # main code for worker
    # can read M (using sema)
    # can use: send BoundManager.Assign(value)
  end
end worker

body BoundManager
  var M: int := Infinite
  do true ->                      # handle requests 1 by 1
    in Assign(value) by value ->
      if value < M ->
        M := value
        co (i := 1 to ncpus)      # multicast
          call worker[i].update(value)
        oc
      fi
    ni
  od
end BoundManager

Search overhead
[Figure: the search tree split over CPU 1, CPU 2, and CPU 3; the route n -> m -> s on CPU 3 is not pruned]
Problem: the path with length=6 not yet computed by CPU 1 when CPU 3 starts n -> m -> s
Parallel algorithm does more work than the sequential algorithm: search overhead

Performance of TSP in SR
- Communication overhead: distribution of jobs + updating the global bound (small overhead)
- Load imbalances: replicated workers model has automatic load balancing
- Synchronization overhead: mutual exclusion (locking) needed for accessing the copy of the bound
- Search overhead: main performance problem
In practice: high speedups possible