Communication Models for Parallel Computer Architectures
Two distinct models have been proposed for how the CPUs in a parallel computer system should communicate.
–In the first model, all CPUs share a common physical memory. This kind of system is called a multiprocessor or shared memory system.
–In the second design, each CPU has its own private memory. Such a design is called a multicomputer or distributed memory system.

Multiprocessors
–Consider a program to find all of the objects in a bit-map image. One copy of the image is kept in memory, and each CPU runs a single process that inspects one section of the image. Some objects occupy multiple sections, so it is essential that each process have access to the entire image.
–Example multiprocessors include:
 Sun Enterprise 10000
 Sequent NUMA-Q
 SGI Origin 2000
 HP/Convex Exemplar
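As a concrete illustration of the shared-memory style, here is a minimal sketch using POSIX threads to stand in for the per-CPU processes; the image dimensions, thread count, and the pixel-counting "work" are assumptions made only for this example.

```c
/* Minimal sketch (hypothetical sizes and names): each thread scans one
 * slice of a single shared bit-map.  Because the memory is shared, any
 * thread could also read outside its slice when an object crosses a
 * section boundary. */
#include <pthread.h>
#include <stdio.h>

#define WIDTH    1024
#define HEIGHT   1024
#define NTHREADS 4

static unsigned char image[HEIGHT][WIDTH];   /* one shared copy of the image */
static long counts[NTHREADS];                /* per-thread result: set pixels found */

static void *scan_slice(void *arg)
{
    long id    = (long)arg;
    long first = id * (HEIGHT / NTHREADS);
    long last  = first + (HEIGHT / NTHREADS);

    for (long y = first; y < last; y++)
        for (long x = 0; x < WIDTH; x++)
            if (image[y][x])                 /* any CPU can touch any pixel */
                counts[id]++;
    return NULL;
}

int main(void)
{
    pthread_t tid[NTHREADS];

    for (long i = 0; i < NTHREADS; i++)
        pthread_create(&tid[i], NULL, scan_slice, (void *)i);
    for (long i = 0; i < NTHREADS; i++)
        pthread_join(tid[i], NULL);

    long total = 0;
    for (int i = 0; i < NTHREADS; i++)
        total += counts[i];
    printf("set pixels: %ld\n", total);
    return 0;
}
```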

Multiprocessors

Multicomputers
–In a multicomputer solving the same problem, each CPU has a section of the image in its local memory. If a CPU needs to follow an object across the border of its section, it must request the information from a neighboring CPU.
–This is done via message passing.
–Programming multicomputers is more difficult than programming multiprocessors, but multicomputers are more scalable: building a multicomputer with 10,000 CPUs is straightforward.
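A sketch of the corresponding multicomputer style, assuming MPI is available (compiled with mpicc): each rank keeps only its own slice in private memory and obtains border rows from its neighbors by explicit messages. The slice size and variable names are illustrative.

```c
/* Sketch only: explicit message passing between private memories. */
#include <mpi.h>
#include <string.h>

#define WIDTH      1024
#define LOCAL_ROWS  256

int main(int argc, char **argv)
{
    int rank, nprocs;
    unsigned char slice[LOCAL_ROWS][WIDTH];   /* this CPU's private section    */
    unsigned char halo_above[WIDTH];          /* copy of upper neighbor's row  */
    unsigned char halo_below[WIDTH];          /* copy of lower neighbor's row  */

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);
    memset(slice, 0, sizeof(slice));

    int up   = (rank == 0)          ? MPI_PROC_NULL : rank - 1;
    int down = (rank == nprocs - 1) ? MPI_PROC_NULL : rank + 1;

    /* Exchange border rows by message passing: my top row goes up while the
     * lower neighbor's top row arrives from below, and vice versa. */
    MPI_Sendrecv(slice[0],              WIDTH, MPI_UNSIGNED_CHAR, up,   0,
                 halo_below,            WIDTH, MPI_UNSIGNED_CHAR, down, 0,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    MPI_Sendrecv(slice[LOCAL_ROWS - 1], WIDTH, MPI_UNSIGNED_CHAR, down, 1,
                 halo_above,            WIDTH, MPI_UNSIGNED_CHAR, up,   1,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);

    /* ...each rank can now follow objects that cross its borders... */

    MPI_Finalize();
    return 0;
}
```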

Multicomputers

–Example multicomputers include:
 IBM SP/2
 Intel/Sandia Option Red
 Wisconsin COW
–Much research focuses on hybrid systems combining the best of both worlds.
–Shared memory might be implemented at a higher level than the hardware. The operating system might simulate a shared memory by providing a single system-wide paged shared address space.
–This approach is called DSM (Distributed Shared Memory).

Shared Memory

Each machine has its own virtual memory and its own page table. When a CPU does a LOAD or STORE on a page it does not have, a trap to the OS occurs. The OS locates the page and asks the CPU currently holding it to unmap the page and send it over the interconnection network. When it arrives, the page is mapped in and the faulting instruction is restarted.
–A third possibility is to have a user-level runtime system implement a form of shared memory.
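The mechanism can be mimicked at user level. The sketch below (a Linux-specific toy, not a real DSM) protects a region so that the first LOAD or STORE to each page faults; the handler then makes the page accessible locally, standing in for the step where the real owner would unmap the page and ship it over the network.

```c
/* Illustrative toy only: fault-driven "paging in" of a protected region. */
#define _GNU_SOURCE
#include <signal.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static unsigned char *region;      /* the "distributed shared" region */
static size_t page_size;

static void fault_handler(int sig, siginfo_t *info, void *ctx)
{
    (void)sig; (void)ctx;
    /* Page-align the faulting address. */
    unsigned char *page = (unsigned char *)
        ((uintptr_t)info->si_addr & ~((uintptr_t)page_size - 1));

    /* A real DSM would ask the current owner to unmap this page and send
     * it over the interconnection network.  Here we just map it locally. */
    mprotect(page, page_size, PROT_READ | PROT_WRITE);
    memset(page, 0, page_size);    /* stand-in for the received page contents */
}

int main(void)
{
    page_size = (size_t)sysconf(_SC_PAGESIZE);

    struct sigaction sa;
    memset(&sa, 0, sizeof(sa));
    sa.sa_sigaction = fault_handler;
    sa.sa_flags = SA_SIGINFO;
    sigaction(SIGSEGV, &sa, NULL);

    /* Initially inaccessible, so every first LOAD/STORE traps. */
    region = mmap(NULL, 4 * page_size, PROT_NONE,
                  MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

    region[0] = 42;                        /* STORE faults, handler maps the page in */
    printf("read back: %d\n", region[0]);  /* the restarted instruction succeeded */
    return 0;
}
```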

Shared Memory
–The programming language provides a shared memory abstraction implemented by the compiler and runtime system.
–The Linda model is based on the abstraction of a shared space of tuples. Processes can input a tuple from the shared tuple space or output a tuple to it.
–The Orca model allows shared generic objects. Processes can execute object-specific methods on shared objects. When a change occurs to the internal state of some object, it is up to the runtime system to simultaneously update all copies of the object.
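Linda itself is a language extension, but the flavor of its out/in operations can be sketched with an ordinary, deliberately tiny tuple space. Everything below, including the one-slot capacity, is a simplification for illustration and not the real Linda API.

```c
/* Toy sketch: a one-slot "tuple space" shared by threads.  out() deposits
 * a tuple; in() blocks until a tuple with the requested key is present,
 * then withdraws it. */
#include <pthread.h>
#include <stdio.h>
#include <string.h>

struct tuple { char key[16]; int value; int full; };

static struct tuple space;                       /* the shared tuple space */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  cond = PTHREAD_COND_INITIALIZER;

static void out(const char *key, int value)      /* deposit a tuple */
{
    pthread_mutex_lock(&lock);
    while (space.full)
        pthread_cond_wait(&cond, &lock);
    strncpy(space.key, key, sizeof(space.key) - 1);
    space.value = value;
    space.full  = 1;
    pthread_cond_broadcast(&cond);
    pthread_mutex_unlock(&lock);
}

static int in(const char *key)                   /* withdraw a matching tuple */
{
    pthread_mutex_lock(&lock);
    while (!space.full || strcmp(space.key, key) != 0)
        pthread_cond_wait(&cond, &lock);
    int v = space.value;
    space.full = 0;
    pthread_cond_broadcast(&cond);
    pthread_mutex_unlock(&lock);
    return v;
}

static void *producer(void *arg) { (void)arg; out("result", 42); return NULL; }

int main(void)
{
    pthread_t t;
    pthread_create(&t, NULL, producer, NULL);
    printf("in(\"result\") = %d\n", in("result"));   /* blocks until out() runs */
    pthread_join(t, NULL);
    return 0;
}
```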

Interconnection Networks
Multicomputers are held together by interconnection networks which move packets between CPUs and memory.
–The CPUs and memory modules of multiprocessors are also interconnected.
–Interconnection networks consist of:
 CPUs
 Memory modules
 Interfaces
 Links
 Switches

Interconnection Networks
–The links are the physical channels over which bits move. They can be:
 electrical or optical fiber
 serial or parallel
 simplex, half-duplex, or full-duplex
–The switches are devices with multiple input ports and multiple output ports. When a packet arrives at an input port on a switch, some bits are used to select the output port to which the packet is sent.
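For instance (with a made-up header layout, purely for illustration), a four-port switch might take two bits of the destination field to pick the output port:

```c
/* Tiny sketch, hypothetical header layout: a 4-port switch selects the
 * output port from two bits of the packet's destination field. */
#include <stdio.h>

struct packet { unsigned dest; /* destination CPU id */ };

/* Assume CPUs are numbered so that the low two bits of the destination
 * give the output port at this switch. */
static int select_output_port(const struct packet *p)
{
    return p->dest & 0x3;          /* 4 output ports -> 2 header bits */
}

int main(void)
{
    struct packet p = { .dest = 6 };
    printf("forward to port %d\n", select_output_port(&p));   /* port 2 */
    return 0;
}
```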

Topology

Switching
–An interconnection network consists of switches and the wires connecting them.
–The following slide shows an example. Each switch has four input ports and four output ports. In addition, each switch has some CPUs and interconnect circuitry. The job of the switch is to accept packets arriving on any input port and send each one out on the correct output port. Each output port is connected to an input port of another switch by a parallel or serial line.

Switching

–Several switching strategies are possible.
–In circuit switching, before a packet is sent, the entire path from the source to the destination is reserved in advance. All ports and buffers are claimed, so that when transmission starts, all necessary resources are guaranteed to be available and the bits can move at full speed from the source, through the switches, to the destination.
–In store-and-forward packet switching, no advance reservation is needed. The source sends a complete packet to the first switch, where it is stored in its entirety. The switches may need to buffer packets if an output port is busy.
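A back-of-the-envelope comparison of the two strategies, using assumed link speed, packet size, and set-up cost (none of these numbers come from the slides): store-and-forward pays the full packet transmission time at every hop, while a reserved circuit pays it essentially once after set-up.

```c
/* Rough model with assumed numbers, for intuition only. */
#include <stdio.h>

int main(void)
{
    double packet_bits = 8.0 * 1024;   /* 1 KB packet (assumed)                */
    double link_bps    = 1e9;          /* 1 Gb/s links (assumed)               */
    double hops        = 4.0;          /* links between source and destination */
    double setup_s     = 2e-6;         /* circuit set-up cost (assumed)        */

    double t_xmit = packet_bits / link_bps;        /* time to transmit one packet  */

    double store_forward = hops * t_xmit;          /* stored fully at each hop     */
    double circuit       = setup_s + t_xmit;       /* bits stream straight through */

    printf("store-and-forward: %.2f us\n", store_forward * 1e6);
    printf("circuit switched : %.2f us\n", circuit * 1e6);
    return 0;
}
```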

Switching

Communication Methods
–When a program is split up into pieces, the pieces (processes) often need to communicate with one another.
–This communication can be done in one of two ways:
 shared variables
 explicit message passing
–Logical sharing of variables is possible even on a multicomputer. Likewise, message passing is easy to implement on a multiprocessor by simply copying from the sender to the receiver.

Communication Methods

Taxonomy of Parallel Computers
Although many researchers have tried to come up with a taxonomy of parallel computers, the only one which is widely used is that of Flynn (1972).
This classification is based on two concepts:
–instruction streams, corresponding to a program counter
–data streams, consisting of a set of operands

Taxonomy of Parallel Computers

Memory Semantics
–Even though all multiprocessors present the CPUs with the image of a single shared address space, there are often many memory modules present, each holding some portion of the physical memory. The CPUs and memories are often interconnected by a complex interconnection network.
–Several CPUs may be attempting to read a memory word at the same time that several other CPUs are attempting to write the same word, and multiple copies of some blocks may be in caches.

Memory Semantics
–One view of memory semantics is as a contract between the software and the memory hardware. The rules are called consistency models, and many different ones have been proposed and implemented.
–For example, suppose that CPU 0 writes the value 1 to some memory word and a little later CPU 1 writes the value 2 to the same word. Now CPU 2 reads the word and gets the value 1. Is this an error?

Memory Semantics
–The simplest model is strict consistency. With this model, any read of a location x always returns the value of the most recent write to x. This model is great for programmers, but almost impossible to implement.
–The next best model is called sequential consistency. The basic idea is that in the presence of multiple read and write requests, some interleaving of all the requests is chosen by the hardware (nondeterministically), but all CPUs see the same order.
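The classic "store buffering" example makes the sequential-consistency guarantee concrete. In the sketch below (an assumed example using C11 atomics), sequential consistency means some interleaving of the four memory operations is chosen, so at least one thread must observe the other's write; the outcome r1 == 0 and r2 == 0 is therefore forbidden. With ordinary non-atomic variables, real hardware and compilers can and do produce it.

```c
/* Store-buffering litmus test under sequentially consistent atomics. */
#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>

static atomic_int x, y;
static int r1, r2;

static void *t1(void *arg)
{
    (void)arg;
    atomic_store(&x, 1);        /* defaults to memory_order_seq_cst */
    r1 = atomic_load(&y);
    return NULL;
}

static void *t2(void *arg)
{
    (void)arg;
    atomic_store(&y, 1);
    r2 = atomic_load(&x);
    return NULL;
}

int main(void)
{
    pthread_t a, b;
    pthread_create(&a, NULL, t1, NULL);
    pthread_create(&b, NULL, t2, NULL);
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    printf("r1=%d r2=%d (0/0 is forbidden)\n", r1, r2);
    return 0;
}
```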

Memory Semantics

–A looser consistency model, but one that is easier to implement on large multiprocessors, is processor consistency. It has two properties:
 Writes by any CPU are seen by all CPUs in the order they were issued.
 For every memory word, all CPUs see all writes to it in the same order.
–If CPU 1 issues writes with values 1A, 1B, and 1C to some memory location in that sequence, then all other processors see them in that order too.
–Every memory word has an unambiguous value after several CPUs write to it and stop.

Memory Semantics
–Weak consistency does not even guarantee that writes from a single CPU are seen in that order: one CPU might see 1A before 1B and another CPU might see 1A after 1B.
–However, to add some order, weakly consistent memories provide synchronization variables (or a single synchronization operation). When a synchronization is executed, all pending writes are finished and no new ones are started until all the old ones are done and the synchronization itself is done.
–In effect, a synchronization “flushes the pipeline” and brings the memory to a stable state with no operations pending.
–Time is divided into epochs delimited by the synchronizations.
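A rough sketch of the epoch idea, using a C11 sequentially consistent fence to play the role of the synchronization operation (the variable names and the simple flag are assumptions for the example): everything written before the fence is guaranteed visible to a CPU that observes the flag and issues its own fence.

```c
/* Minimal sketch of a synchronization point separating epochs. */
#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>

static int data_a, data_b;          /* ordinary writes inside the epoch        */
static atomic_int epoch_done;       /* flag written after the synchronization  */

static void *producer(void *arg)
{
    (void)arg;
    data_a = 1;                     /* these two writes may be reordered...    */
    data_b = 2;
    atomic_thread_fence(memory_order_seq_cst);      /* ...but not past here    */
    atomic_store_explicit(&epoch_done, 1, memory_order_relaxed);
    return NULL;
}

static void *consumer(void *arg)
{
    (void)arg;
    while (atomic_load_explicit(&epoch_done, memory_order_relaxed) == 0)
        ;                           /* spin until the epoch is closed          */
    atomic_thread_fence(memory_order_seq_cst);
    printf("a=%d b=%d\n", data_a, data_b);          /* guaranteed 1 and 2      */
    return NULL;
}

int main(void)
{
    pthread_t p, c;
    pthread_create(&c, NULL, consumer, NULL);
    pthread_create(&p, NULL, producer, NULL);
    pthread_join(p, NULL);
    pthread_join(c, NULL);
    return 0;
}
```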

Memory Semantics

–Weak consistency has the problem that it is quite inefficient, because it must finish off all pending memory operations and hold all new ones until the current ones are done.
–Release consistency improves matters by adopting a model akin to critical sections. The idea behind this model is that when a process exits a critical region, it is not necessary to force all writes to complete immediately; it is only necessary to make sure that they are done before any process enters the critical region again.

Memory Semantics
–In this model, the synchronization operation offered by weak consistency is split into two different operations.
–To read or write a shared data variable, a CPU must first do an acquire operation on the synchronization variable to get exclusive access to the shared data.
–When it is done, the CPU does a release operation on the synchronization variable to indicate that it is finished.
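A sketch of the acquire/release pairing using C11 atomics (the spin-lock form and the names are an illustrative choice, not the only way release consistency is realized): the acquire must complete before the protected data is touched, and the writes made inside the region only have to be visible by the time the release completes.

```c
/* Acquire/release on a synchronization variable guarding shared data. */
#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>

static atomic_flag sync_var = ATOMIC_FLAG_INIT;  /* the synchronization variable */
static int shared_counter;                       /* the shared data it protects  */

static void acquire(void)
{
    /* Spin until we get exclusive access; acquire ordering keeps the
     * protected accesses from moving above this point. */
    while (atomic_flag_test_and_set_explicit(&sync_var, memory_order_acquire))
        ;
}

static void release(void)
{
    /* Release ordering guarantees our writes are visible before the next
     * acquirer gets in; they did not have to be pushed out any earlier. */
    atomic_flag_clear_explicit(&sync_var, memory_order_release);
}

static void *worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < 100000; i++) {
        acquire();
        shared_counter++;          /* critical region */
        release();
    }
    return NULL;
}

int main(void)
{
    pthread_t t1, t2;
    pthread_create(&t1, NULL, worker, NULL);
    pthread_create(&t2, NULL, worker, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    printf("counter = %d\n", shared_counter);    /* 200000 */
    return 0;
}
```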