Chapter 9: Distributed Shared Memory
Introduction Distributed shared memory (DSM) makes the main memory of a cluster of computers look as though it is a single memory with a single address space, so that shared-memory programming techniques can be used.
1. Distributed Shared Memory Messages or other mechanisms are still needed to get data to each processor, but these are hidden from the programmer.
1. Distributed Shared Memory Advantages of DSM System is scalable Hides the message passing - no need to explicitly specify message sending between processes Can use simple extensions to sequential programming Can handle complex and large databases without replicating or sending the data to processes
1. Distributed Shared Memory Disadvantages of DSM May incur a performance penalty Must provide protection against simultaneous access to shared data (locks, etc.) Little programmer control over the actual messages being generated Performance of irregular problems in particular may be hard to predict or tune
2. Implementing Distributed Shared Memory Hardware: special network interfaces and cache-coherence circuits Software: modifying the OS kernel, or adding a software layer between the operating system and the application - the most convenient way for teaching purposes
2.1 Software DSM Systems Page based - uses the system's virtual memory Shared-variable approach - uses routines to access shared variables Object based - shared data held within a collection of objects; access to shared data through an object-oriented discipline (ideally)
2.1 Software DSM Systems Software Page Based DSM Implementation
2.1 Software DSM Systems Some Software DSM Systems TreadMarks - page-based DSM system; apparently no longer available JIAJIA - C based; some users have said it required significant modifications to work (in its message-passing calls) Adsmith - object based; C++ library routines
2.2 Hardware DSM Implementation Special network interfaces and cache-coherence circuits are added to the system to make a reference to a remote memory location look like a reference to a local memory location Typical examples are clusters of SMP machines
2.3 Managing Shared Data There are several ways that a processor could be given access to shared data The simplest solution is to have a central server responsible for all reads and writes Problem - the server becomes a bottleneck Solution?
2.3 Managing Shared Data Multiple copies of data allow simultaneous access by different processors. How do you keep these copies coherent? One option is Multiple Reader/Single Writer. When the writer changes the data it has two choices: an update policy or an invalidate policy
2.3 Managing Shared Data Another option is Multiple Reader / Multiple Writer This is the most complex scenario.
2.4 Multiple Reader/Single Writer Policy in a Page-Based System Remember that a page holds more than one variable location. Problem: processors A and B can change different variables in the same page (false sharing). How do you maintain consistency?
3. Achieving Consistent Memory in a DSM System A memory consistency model specifies when a write to a shared variable becomes visible to other processors. There are various models; relaxing the constraints provides higher performance.
3. Achieving Consistent Memory in a DSM System Consistency Models Strict Consistency - every processor sees the most recent update, i.e., a read returns the most recent write to the location. Sequential Consistency - the result of any execution is the same as some interleaving of the individual programs. Relaxed Consistency - delay making a write visible to reduce messages.
3. Achieving Consistent Memory in a DSM System Consistency Models (cont) Weak Consistency - the programmer must use synchronization operations to enforce sequential consistency when necessary. Release Consistency - the programmer must use specific synchronization operators, acquire and release. Lazy Release Consistency - the update is only done at the time of acquire.
3. Achieving Consistent Memory in a DSM System Strict Consistency Every write is immediately visible Disadvantages: number of messages, latency, and the updates may be unnecessary
3. Achieving Consistent Memory in a DSM System Release Consistency An extension of weak consistency in which the synchronization operations are specified: acquire operation - used before a shared variable or variables are to be read; release operation - used after the shared variable or variables have been altered (written), and allows another process access to the variable(s). Typically acquire is done with a lock operation and release with an unlock operation (although not necessarily).
3. Achieving Consistent Memory in a DSM System Release Consistency (Figure: arrows show messages.)
3. Achieving Consistent Memory in a DSM System Lazy Release Consistency Messages are sent only on acquire Advantage: fewer messages
4. Distributed Shared Memory Programming Primitives 4 fundamental primitives Process/thread creation (and termination) Shared-data creation Mutual-exclusion synchronization (controlled access to shared data) Process/thread and event synchronization.
4.1 Process Creation Simple routines such as: dsm_spawn(filename, num_processes); dsm_wait();
4.2 Shared Data Creation Routines to construct shared data dsm_malloc(); dsm_free();
4.3 Shared Data Access In a DSM system employing a relaxed-consistency model (most DSM systems) dsm_lock(lock1); dsm_refresh(sum); (*sum)++; dsm_flush(sum); dsm_unlock(lock1);
4.4 Synchronization Access Two types must be provided: global synchronization and process-pair synchronization. Typically done with barriers named by identifiers dsm_barrier(identifier);
4.5 Features to Improve Performance Overlapping computation with communication. dsm_prefetch(sum); This tells the system to get the variable because we will need it soon. Similar to speculative loads used in processor/cache algorithms.
4.5 Features to Improve Performance Reducing the Number of Messages New way: dsm_acquire(sum); (*sum)++; dsm_release(sum); Old way: dsm_lock(lock1); dsm_refresh(sum); (*sum)++; dsm_flush(sum); dsm_unlock(lock1);
5. Distributed Shared Memory Programming Uses the same concepts as shared memory programming. Message passing occurs but is hidden from the user, and thus there are additional efficiency considerations. You may have to modify the code to request variables before you need them. It would be nice if the compiler could do this.
6. Implementing a Simple DSM System It is relatively straightforward to write your own simple DSM system. In this section we will review how this can be done.
6.1 User Interface Using Classes and Methods. Using the OO methodology of C++ we might have a wrapper class: SharedInteger *sum = new SharedInteger(); This class could wrap an integer value and provide the following methods: sum->lock() sum->refresh() (*sum)++ sum->flush() sum->unlock()
6.2 Basic Shared-Variable Implementation Option 1: Centralized Server
6.2 Basic Shared-Variable Implementation Option 2: Multiple Servers
6.2 Basic Shared-Variable Implementation Option 3: Multiple Readers (1 owner)
6.2 Basic Shared-Variable Implementation Option 4: Migrating the owner
DSM Projects Write a DSM system in C++ using MPI for the underlying message passing and process communication. (More advanced) One of the fundamental disadvantages of software DSM systems is the lack of control over the underlying message passing. Provide parameters in the DSM routines to control the message passing. Write routines that allow communication and computation to be overlapped.