Multiprocessor System Distributed System
Multiprocessor System
Multiprocessor
A shared-memory multiprocessor (or just multiprocessor henceforth) is a computer system in which two or more CPUs share full access to a common RAM. A program running on any of the CPUs sees a normal (usually paged) virtual address space. The only unusual property this system has is that a CPU can write some value into a memory word and then read the word back and get a different value (because another CPU has changed it).
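To make this concrete, here is a minimal sketch (not from the slides; POSIX threads stand in for CPUs) of one thread writing a shared memory word and possibly reading back a value that another thread has changed in the meantime:

```c
#include <pthread.h>
#include <stdio.h>

volatile int shared_word = 0;        /* the memory word in the common RAM */

static void *other_cpu(void *arg)
{
    (void)arg;
    for (int i = 0; i < 1000000; i++)
        shared_word = i;             /* another "CPU" keeps changing the word */
    return NULL;
}

int main(void)
{
    pthread_t t;
    pthread_create(&t, NULL, other_cpu, NULL);

    shared_word = -1;                /* this CPU writes a value ...            */
    int readback = shared_word;      /* ... and may read back a different one  */
    if (readback != -1)
        printf("wrote -1 but read back %d\n", readback);
    else
        printf("read back -1 this time (the race did not hit)\n");

    pthread_join(t, NULL);
    return 0;
}
```

Compile with -pthread; whether the race actually shows up depends on timing, which is exactly the point of the property described above.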
Comparison: Multiprocessor vs. Single Processor
For the most part, multiprocessor operating systems are just regular operating systems. They handle system calls, do memory management, provide a file system, and manage I/O devices. Unique features include process synchronization, resource management, and scheduling.
Two types of multiprocessor systems
UMA (Uniform Memory Access) multiprocessors have the property that every memory word can be read as fast as every other memory word. In contrast, NUMA (Nonuniform Memory Access) multiprocessors do not provide equally fast access to every memory word.
UMA Bus-Based SMP Architectures
UMA system without caching
The simplest multiprocessors are based on a single bus (Fig. a). Two or more CPUs and one or more memory modules all use the same bus for communication. When a CPU wants to read a memory word, it first checks to see if the bus is busy. If the bus is idle, the CPU puts the address of the word it wants on the bus, asserts a few control signals, and waits until the memory puts the desired word on the bus.
Drawback: waiting for the bus, and the waiting increases with the number of CPUs.
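A rough way to picture the drawback is to model the single bus as one lock that every CPU must acquire for each memory access. The sketch below is an illustrative assumption, not the actual hardware protocol: all accesses serialize on the one shared resource, so adding CPUs only adds waiting.

```c
#include <pthread.h>
#include <stdio.h>

#define NCPUS    4
#define ACCESSES 100000

static pthread_mutex_t bus = PTHREAD_MUTEX_INITIALIZER;  /* the single shared bus */
static int memory[1024];                                  /* the shared memory     */

static void *cpu(void *arg)
{
    long id = (long)arg;
    for (int i = 0; i < ACCESSES; i++) {
        pthread_mutex_lock(&bus);        /* wait until the bus is idle          */
        memory[i % 1024] = (int)id;      /* put the address/data on the bus     */
        pthread_mutex_unlock(&bus);      /* release the bus for the next CPU    */
    }
    return NULL;
}

int main(void)
{
    pthread_t t[NCPUS];
    for (long i = 0; i < NCPUS; i++)
        pthread_create(&t[i], NULL, cpu, (void *)i);
    for (int i = 0; i < NCPUS; i++)
        pthread_join(t[i], NULL);
    printf("all CPUs done; every access went through the single bus\n");
    return 0;
}
```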
With caching
The solution to this problem is to add a cache to each CPU, as depicted in Fig. (b). The cache can be inside the CPU chip, next to the CPU chip, or on the processor board. Since many reads can now be satisfied out of the local cache, there will be much less bus traffic, and the system can support more CPUs. In general, caching is done on the basis of 32- or 64-byte blocks. When a word is referenced, its entire block is fetched into the cache of the CPU touching it. Each cache block is marked as being either read-only or read-write.
With caching
If a CPU attempts to write a word that is in one or more remote caches, the bus hardware detects the write and puts a signal on the bus informing all other caches of the write. If the other caches have a “clean” copy, that is, an exact copy of what is in memory, they can just discard their copies and let the writer fetch the cache block from memory before modifying it. Many cache-transfer protocols exist.
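As a concrete illustration, here is a minimal sketch of a write-invalidate scheme for a single cache block. It is an assumption for illustration only: it uses write-through and ignores most of the details that real cache-transfer protocols handle.

```c
#include <stdio.h>
#include <stdbool.h>

#define NCPUS 4

static int  memory_block = 42;        /* the block as stored in RAM          */
static int  cache[NCPUS];             /* each CPU's cached copy of the block */
static bool valid[NCPUS];             /* does this cache hold the block?     */

static void read_block(int cpu)
{
    if (!valid[cpu]) {                /* miss: fetch the block from memory   */
        cache[cpu] = memory_block;
        valid[cpu] = true;
    }
    printf("CPU %d reads %d\n", cpu, cache[cpu]);
}

static void write_block(int cpu, int value)
{
    for (int i = 0; i < NCPUS; i++)   /* bus signal: other caches discard    */
        if (i != cpu)                 /* their clean copies                  */
            valid[i] = false;
    memory_block = value;             /* write-through to memory (simplified) */
    cache[cpu] = value;
    valid[cpu] = true;
}

int main(void)
{
    read_block(0);                    /* CPUs 0 and 1 both cache the block    */
    read_block(1);
    write_block(1, 99);               /* CPU 1 writes: CPU 0 is invalidated   */
    read_block(0);                    /* CPU 0 misses and refetches 99        */
    return 0;
}
```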
With Caching and Local Memory
Yet another possibility is the design of Fig. (c), in which each CPU has not only a cache but also a local, private memory that it accesses over a dedicated (private) bus. To use this configuration optimally, the compiler should place all the program text, strings, constants and other read-only data, stacks, and local variables in the private memories. The shared memory is then used only for writable shared variables. This greatly reduces bus traffic, but it does require active cooperation from the compiler.
UMA Multiprocessor implementation
Using Crossbar switches
Using Multistage switching networks
Using Crossbar switches
The simplest circuit for connecting n CPUs to k memories is the crossbar switch. A crosspoint, shown in the figure, can be opened or closed. One of the nicest properties of the crossbar switch is that it is a nonblocking network, meaning that no CPU is ever denied the connection it needs because some crosspoint or line is already occupied.
Drawback: the number of crosspoints grows as n^2, so a large crossbar switch is not feasible.
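The sketch below is an illustrative assumption, not a hardware description: it represents the crossbar as an n × k matrix of crosspoints. Each connection closes exactly one crosspoint, so connections to different memories never block one another, but the matrix itself needs n × k crosspoints.

```c
#include <stdio.h>
#include <stdbool.h>

#define N 8                           /* CPUs     */
#define K 8                           /* memories */

static bool closed[N][K];             /* a closed crosspoint = an active connection */

static void connect(int cpu, int mem)
{
    closed[cpu][mem] = true;          /* close exactly one crosspoint; no other
                                         connection's crosspoints are needed      */
    printf("CPU %d <-> memory %d (crosspoint %d,%d closed)\n", cpu, mem, cpu, mem);
}

int main(void)
{
    connect(0, 3);                    /* simultaneous connections to different    */
    connect(5, 1);                    /* memories never block one another         */
    connect(7, 7);
    printf("crosspoints required: %d (grows as n^2 when k = n)\n", N * K);
    return 0;
}
```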
Using Multistage switching networks
A completely different multiprocessor design is based on the humble 2 × 2 switch shown in the figure. This switch has two inputs and two outputs. Messages arriving on either input line can be switched to either output line. For our purposes, messages will contain up to four parts.
Using Multistage switching networks
Using Multistage switching networks
The Module field tells which memory to use. The Address specifies an address within a module. The Opcode gives the operation, such as READ or WRITE. Finally, the optional Value field may contain an operand, such as a 32-bit word to be written on a WRITE. The switch inspects the Module field and uses it to determine whether the message should be sent on X or on Y. The 2 × 2 switches can be arranged in many ways to build larger multistage switching networks.
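As a sketch of the message format and the routing decision (the struct layout and the one-bit-per-stage rule are assumptions for illustration, not the exact hardware format), a 2 × 2 switch could inspect one bit of the Module field at each stage:

```c
#include <stdio.h>
#include <stdint.h>

enum opcode { READ, WRITE };

struct message {
    unsigned    module;               /* which memory module to use         */
    unsigned    address;              /* address within that module         */
    enum opcode opcode;               /* READ or WRITE                      */
    uint32_t    value;                /* operand, e.g. the word to write    */
};

/* Returns 0 for output X or 1 for output Y, based on the Module bit this
 * stage is responsible for (most significant bit at the first stage). */
static int switch_2x2(const struct message *m, int stage, int nstages)
{
    return (m->module >> (nstages - 1 - stage)) & 1;
}

int main(void)
{
    struct message m = { .module = 6, .address = 0x2c,
                         .opcode = WRITE, .value = 12345 };
    int nstages = 3;                  /* 8 memories -> log2(8) = 3 stages   */
    for (int s = 0; s < nstages; s++)
        printf("stage %d: send message on %c\n", s,
               switch_2x2(&m, s, nstages) ? 'Y' : 'X');
    return 0;
}
```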
Using Multistage switching networks
Here we have connected eight CPUs to eight memories using 12 switches. More generally, for n CPUs and n memories we would need log2 n stages, with n/2 switches per stage, for a total of (n/2) log2 n switches, which is a lot better than n^2 crosspoints, especially for large values of n.
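A quick calculation makes the comparison concrete; the short loop below tabulates (n/2) log2 n switches against n^2 crosspoints for a few values of n (for n = 8 it gives the 12 switches versus 64 crosspoints mentioned above):

```c
#include <stdio.h>

int main(void)
{
    for (unsigned n = 8; n <= 1024; n *= 2) {
        unsigned stages = 0;
        for (unsigned v = n; v > 1; v >>= 1)
            stages++;                           /* stages = log2(n)           */
        unsigned switches = (n / 2) * stages;   /* (n/2) * log2(n) switches   */
        printf("n=%4u: %7u 2x2 switches vs %7u crosspoints\n",
               n, switches, n * n);
    }
    return 0;
}
```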
NUMA Multiprocessors
Single-bus UMA multiprocessors are generally limited to no more than a few dozen CPUs, and crossbar or switched multiprocessors need a lot of (expensive) hardware and are not that much bigger. To get to more than 100 CPUs, something has to give. In NUMA machines, access to local memory modules is faster than access to remote ones.
NUMA Multiprocessors
NUMA machines have three key characteristics that all of them possess and which together distinguish them from other multiprocessors:
There is a single address space visible to all CPUs.
Access to remote memory is via LOAD and STORE instructions.
Access to remote memory is slower than access to local memory.
When the access time to remote memory is not hidden (because there is no caching), the system is called NC-NUMA. When coherent caches are present, the system is called CC-NUMA (Cache-Coherent NUMA).
Multiprocessor Operating System Types
Each CPU Has Its Own Operating System
Each CPU Has Its Own Operating System
The simplest possible way to organize a multiprocessor operating system is to statically divide memory into as many partitions as there are CPUs and give each CPU its own private memory and its own private copy of the operating system. In effect, the n CPUs then operate as n independent computers. One consequence is that when a process makes a system call, the system call is caught and handled on its own CPU.
Master-Slave Multiprocessors
One copy of the operating system and its tables is present on CPU 1 and not on any of the others. All system calls are redirected to CPU 1 for processing there. CPU 1 may also run user processes if there is CPU time left over.
Master-Slave Multiprocessors
The master-slave model solves most of the problems of the first model. There is a single data structure (e.g., one list or a set of prioritized lists) that keeps track of ready processes. When a CPU goes idle, it asks the operating system for a process to run and it is assigned one. Thus it can never happen that one CPU is idle while another is overloaded.
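A minimal sketch of that single ready-process data structure (an assumption for illustration: one mutex-protected list that the master fills and that idle CPUs pull work from) might look like this:

```c
#include <pthread.h>
#include <stdio.h>

#define MAXREADY 64

static pthread_mutex_t ready_lock = PTHREAD_MUTEX_INITIALIZER;
static int ready[MAXREADY];           /* IDs of processes that are ready to run */
static int nready = 0;

static void add_ready(int pid)        /* the master adds a ready process        */
{
    pthread_mutex_lock(&ready_lock);
    if (nready < MAXREADY)
        ready[nready++] = pid;
    pthread_mutex_unlock(&ready_lock);
}

static int get_work(void)             /* an idle CPU asks for a process to run  */
{
    int pid = -1;                     /* -1 means "nothing ready right now"     */
    pthread_mutex_lock(&ready_lock);
    if (nready > 0)
        pid = ready[--nready];
    pthread_mutex_unlock(&ready_lock);
    return pid;
}

int main(void)
{
    add_ready(10);
    add_ready(11);
    printf("idle CPU is assigned process %d\n", get_work());
    printf("idle CPU is assigned process %d\n", get_work());
    return 0;
}
```

Because every CPU draws from the same list, no CPU sits idle while runnable processes wait elsewhere.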
Symmetric Multiprocessors (SMP)
Our third model, the SMP (Symmetric MultiProcessor), eliminates this asymmetry. There is one copy of the operating system in memory, but any CPU can run it.
Most modern multiprocessors use this arrangement.
The hard part about writing the operating system for such a machine is not that the actual code is so different from a regular operating system. The hard part is splitting it into critical regions that can be executed concurrently by different CPUs without interfering with one another. Furthermore, great care must be taken to avoid deadlocks.
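As an illustrative sketch (the lock names and subsystems are assumptions, not actual OS code), splitting the operating system into independently locked critical regions, with a fixed lock order to avoid deadlock, could look like this:

```c
#include <pthread.h>
#include <stdio.h>

static pthread_mutex_t sched_lock = PTHREAD_MUTEX_INITIALIZER; /* scheduler tables   */
static pthread_mutex_t fs_lock    = PTHREAD_MUTEX_INITIALIZER; /* file system tables */

static void schedule(void)            /* may run on any CPU                      */
{
    pthread_mutex_lock(&sched_lock);  /* only the scheduler data is locked       */
    /* ... pick the next process to run ... */
    pthread_mutex_unlock(&sched_lock);
}

static void file_syscall(void)        /* another CPU can execute this at the
                                         same time as schedule() above           */
{
    pthread_mutex_lock(&fs_lock);
    /* ... touch file system tables ... */
    pthread_mutex_unlock(&fs_lock);
}

static void needs_both(void)          /* always sched_lock before fs_lock: a
                                         fixed acquisition order avoids deadlock */
{
    pthread_mutex_lock(&sched_lock);
    pthread_mutex_lock(&fs_lock);
    /* ... work that touches both subsystems ... */
    pthread_mutex_unlock(&fs_lock);
    pthread_mutex_unlock(&sched_lock);
}

int main(void)
{
    schedule();
    file_syscall();
    needs_both();
    printf("OS split into independently locked critical regions\n");
    return 0;
}
```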