Distributed Shared Memory

Outline:
- motivation and the main idea
- consistency models
  - strict and sequential
  - causal
  - PRAM and processor
  - weak and release
- implementation of sequential consistency
- implementation issues
  - granularity
  - thrashing
  - page replacement

DSM idea
- All computers share a single paged, virtual address space.
- Pages can be physically located on any computer.
- When a process accesses data in the shared address space, a mapping manager maps the request to the physical page.
- The mapping manager is either the kernel or a runtime library.
- If the page is remote, the process is blocked while the page is fetched (see the sketch below).
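A minimal sketch of how a user-level mapping manager might realize the last point on a Unix-like system: shared pages are access-protected, a remote access raises SIGSEGV, and the handler fetches the page before the faulting instruction is retried. The helpers page_owner and fetch_remote_page are hypothetical placeholders, not part of any real DSM library.

    #include <signal.h>
    #include <sys/mman.h>
    #include <stdint.h>

    #define PAGE_SIZE 4096

    /* Hypothetical helpers: locate the page's current owner and copy its
     * contents over the network into the local frame. */
    extern int  page_owner(uintptr_t page_addr);
    extern void fetch_remote_page(int owner, uintptr_t page_addr, void *dest);

    static void dsm_fault_handler(int sig, siginfo_t *info, void *ctx)
    {
        (void)sig; (void)ctx;
        uintptr_t page = (uintptr_t)info->si_addr & ~(uintptr_t)(PAGE_SIZE - 1);

        /* Make the local frame accessible so the fetched copy can be installed. */
        mprotect((void *)page, PAGE_SIZE, PROT_READ | PROT_WRITE);

        /* The faulting process is effectively blocked here until the page arrives. */
        fetch_remote_page(page_owner(page), page, (void *)page);

        /* Returning from the handler re-executes the faulting instruction. */
    }

    void dsm_init(void)
    {
        struct sigaction sa = {0};
        sa.sa_sigaction = dsm_fault_handler;
        sa.sa_flags = SA_SIGINFO;
        sigaction(SIGSEGV, &sa, NULL);
    }

(Page-fault-driven systems such as IVY or TreadMarks add ownership tracking, copysets, and consistency-protocol messages on top of this basic fault-and-fetch loop.)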

Advantages of DSM
- Simpler abstraction: the programmer does not have to worry about data movement; DSM may be easier to implement than RPC since the address space is the same.
- Easier portability: sequential programs can in principle run directly on DSM systems.
- Possibly better performance:
  - locality of data: data is moved in large blocks, which helps programs with good locality of reference
  - on-demand data movement
  - larger memory space: no need to page to disk
- Flexible communication: the sender and receiver do not need to coexist; nodes can join and leave the DSM system without affecting the others.
- Process migration is simplified: a process can easily be moved to a different machine since all machines share the address space.

Maintaining memory coherency
- DSM systems allow concurrent access to shared data.
- Concurrency may lead to unexpected results: what if a read does not return the value stored by the most recent write (because the write has not propagated yet)?
- Memory is coherent if the value returned by a read operation is always the value the programmer expected.
- To maintain coherency of shared data, a mechanism that controls (and synchronizes) memory accesses is used.
- This mechanism allows only a restricted set of memory access orderings.
- Memory consistency model: the set of allowable memory access orderings.

Strict and sequential consistency
- Strict consistency (strongest model)
  - the value returned by a read operation is always the value written by the most recent write operation
  - hard to implement
- Sequential consistency (Lamport 1979)
  - the result of any execution is the same as if the operations of all processors were executed in some sequential order, and each process's operations appear in that sequence in program order
  - which interleaving of operations occurs does not matter, as long as all processes see the same ordering
  - a read operation may not return the result of the most recent write operation!
  - running a program twice may give different results
  - allows little concurrency (see the litmus test below)
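The standard litmus test for the sequential-consistency definition above, as a plain C/pthreads sketch; the data race is deliberate, to show what sequential consistency rules out rather than how to write correct code.

    #include <stdio.h>
    #include <pthread.h>

    /* Two shared variables, both initially 0. */
    int x = 0, y = 0;
    int r1, r2;

    void *p1(void *arg) { (void)arg; x = 1; r1 = y; return NULL; }
    void *p2(void *arg) { (void)arg; y = 1; r2 = x; return NULL; }

    int main(void)
    {
        pthread_t t1, t2;
        pthread_create(&t1, NULL, p1, NULL);
        pthread_create(&t2, NULL, p2, NULL);
        pthread_join(t1, NULL);
        pthread_join(t2, NULL);

        /* Under sequential consistency, some interleaving of the four operations
         * must place x = 1 or y = 1 before the later of the two reads, so
         * r1 == 0 and r2 == 0 cannot both hold.  (On real hardware, and in C
         * without atomics, weaker memory models can and do produce 0/0.) */
        printf("r1=%d r2=%d\n", r1, r2);
        return 0;
    }

Different runs may legally print 1/0, 0/1, or 1/1, which is the "running a program twice may give different results" point above.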

Causal consistency
- Proposed by Hutto and Ahamad (1990).
- There is no single (even logical) ordering of operations: two processes may "see" the same write operations ordered differently.
- Operations must be seen in the same order only if they are potentially causally related.
- A read and a write (or two writes) on the same item are causally related.
- All operations of the same process are causally related.
- Causality is transitive: if a process carries out an operation B that causally depends on a preceding operation A, all subsequent operations by this process are causally related to A (even if they are on different items). See the trace below.
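A small worked trace showing what causal consistency allows; W(x)v denotes a write of value v to x, R(x)v a read of x that returns v (the notation is used only for this example).

    P1: W(x)1
    P2:         R(x)1  W(x)2
    P3:                        R(x)2  R(x)1     <- violates causal consistency
    P4:                        R(x)1  R(x)2     <- allowed

Because P2 read x = 1 before writing x = 2, W(x)1 and W(x)2 are causally related, so every process must observe W(x)1 before W(x)2; P3's view breaks that. If P2 had written x = 2 without first reading x, the two writes would be concurrent and both orders of observation would be allowed.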

PRAM and processor consistency
- PRAM consistency (Lipton and Sandberg 1988)
  - all processes see the memory writes done by a single process in the same order, the order in which that process issued them
  - PRAM = pipelined RAM: writes done by a single process can be pipelined; the process does not have to wait for one write to finish before starting another
  - writes by different processes may be seen in different orders by a third process (see the trace below)
  - easy to implement: order the writes of each processor independently of all the others
- Processor consistency (Goodman 1989)
  - PRAM consistency, plus
  - coherency on the same data item: all processes agree on the order of write operations to the same data item
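A companion trace, in the same W/R notation as above, of an outcome PRAM consistency permits but sequential and processor consistency forbid.

    P1: W(x)1
    P2: W(x)2
    P3:          R(x)1  R(x)2
    P4:          R(x)2  R(x)1

The two writes come from different processes, so PRAM consistency lets P3 and P4 observe them in different orders. Sequential consistency forbids this outcome because all processes must agree on one interleaving, and processor consistency forbids it because both writes are to the same item x.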

Weak and release consistency
- Weak consistency (Dubois 1988)
  - consistency need only apply to a group of memory accesses rather than to individual accesses
  - synchronization variables are used to make all memory changes visible to all other processes (e.g., on exiting a critical section)
    - all accesses to synchronization variables must be sequentially consistent
    - all pending write operations must complete before an access to a synchronization variable
    - access to a non-synchronization variable is allowed only after the synchronization-variable access has completed
- Release consistency (Gharachorloo 1990)
  - the synchronization variable works like an OS lock, with two operations:
    - acquire: all changes to the synchronized variables are propagated to the acquiring process
    - release: all changes to the synchronized variables are propagated to the other processes
  - the programmer has to bracket accesses to shared data with these operations (see the sketch below)
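A minimal sketch of what release consistency looks like to the programmer; dsm_acquire and dsm_release are hypothetical runtime calls standing in for whatever primitives a particular DSM system provides.

    /* Hypothetical DSM runtime interface. */
    extern void dsm_acquire(int lock_id);   /* pull in other nodes' updates */
    extern void dsm_release(int lock_id);   /* push our updates out         */

    extern int shared_counter;              /* lives in the DSM address space */

    void increment(void)
    {
        dsm_acquire(0);        /* changes made under lock 0 become visible here   */
        shared_counter += 1;   /* ordinary reads/writes; no messages needed yet   */
        dsm_release(0);        /* our changes are propagated to the other holders */
    }

Eager implementations push the updates at release time; lazy release consistency (as in TreadMarks) defers propagation until another process performs the matching acquire.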

Comparison of consistency models
- Models differ in difficulty of implementation, ease of use, and performance.
- Strict consistency: most restrictive, but hard to implement.
- Sequential consistency: widely used, intuitive semantics, not much extra burden on the programmer, but does not allow much concurrency.
- Causal and PRAM consistency: allow more concurrency, but have non-intuitive semantics and put more of a burden on the programmer to avoid doing things that require stronger consistency.
- Weak and release consistency: intuitive semantics, but put an extra burden on the programmer.

Implementation issues
- How to keep track of the location of remote data.
- How to overcome the communication delays and the high overhead of executing communication protocols.
- How to make shared data concurrently accessible at several nodes to improve system performance.

Implementing sequential consistency on page-based DSM
- Can a page move? Can it be replicated?
- Nonreplicated, nonmigrating pages
  - all requests for the page are sent to the owner of the page
  - easy to enforce sequential consistency: the owner orders all access requests
  - no concurrency
- Nonreplicated, migrating pages
  - all requests for the page are sent to the owner of the page
  - each time a remote page is accessed, it migrates to the processor that accessed it
  - easy to enforce sequential consistency: only processes on that processor can access the page
  - no concurrency

Implementing sequential consistency on page-based DSM (cont.)
- Replicated, migrating pages
  - all requests for the page are sent to the owner of the page
  - each time a remote page is accessed, it is copied to the processor that accessed it
  - multiple read operations can be done concurrently
  - hard to enforce sequential consistency: the other copies of the page must be invalidated (the most common approach) or updated during a write operation (see the sketch below)
- Replicated, nonmigrating pages
  - pages are replicated at fixed locations
  - all requests for the page are sent to one of the owners of the page
  - hard to enforce sequential consistency: the other copies of the page must be updated during a write operation
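A rough sketch of the invalidation path for replicated, migrating pages, assuming the owner keeps a per-page copyset of nodes holding read-only replicas; meta_of and send_invalidate are hypothetical helpers, not a real protocol implementation.

    #include <stdint.h>
    #include <stdbool.h>

    #define MAX_NODES 64

    /* Hypothetical bookkeeping kept by the page's current owner. */
    typedef struct {
        bool copyset[MAX_NODES];   /* nodes holding a read-only copy of the page */
        int  owner;                /* node currently allowed to write the page   */
    } page_meta;

    extern page_meta *meta_of(uintptr_t page_addr);
    extern void send_invalidate(int node, uintptr_t page_addr);

    /* Called on a write fault, before granting write access to 'writer'. */
    void handle_write_fault(int writer, uintptr_t page_addr)
    {
        page_meta *m = meta_of(page_addr);

        /* Invalidate every other replica so only one writable copy remains;
         * this is what keeps all readers agreeing on a single write order. */
        for (int node = 0; node < MAX_NODES; node++) {
            if (m->copyset[node] && node != writer) {
                send_invalidate(node, page_addr);
                m->copyset[node] = false;
            }
        }
        m->owner = writer;   /* ownership (and the page) migrates to the writer */
    }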

Granularity
- Granularity: the size of the shared memory unit.
- Page-based DSM
  - single page: simple to implement
  - multiple pages: take advantage of locality of reference and amortize the network overhead over several pages
  - disadvantage: false sharing (see the example below)
- Shared-variable DSM
  - share only those variables that are needed by multiple processes
  - updating is easier and false sharing can be avoided, but more burden is put on the programmer
- Object-based DSM
  - retrieve not only the data but the entire object: data, methods, etc.
  - old programs have to be heavily modified
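A small C illustration of false sharing on page-based DSM: two logically unrelated variables happen to land on the same shared page, so two nodes that never touch each other's data still bounce the page back and forth. The placement assumption (both fields on one DSM page) is the whole point of the example.

    /* Two unrelated counters that happen to share one DSM page. */
    struct counters {
        long owned_by_node0;   /* only node 0 ever writes this field */
        long owned_by_node1;   /* only node 1 ever writes this field */
    };

    /* Assume 'c' points into the shared DSM address space, within one page. */
    void node0_work(struct counters *c)
    {
        for (int i = 0; i < 1000000; i++)
            c->owned_by_node0++;   /* each write invalidates node 1's copy of the page */
    }

    void node1_work(struct counters *c)
    {
        for (int i = 0; i < 1000000; i++)
            c->owned_by_node1++;   /* each write invalidates node 0's copy of the page */
    }

Even though the two nodes never access the same variable, page-granularity coherence makes the page ping-pong between them; padding each field to a full page, or switching to shared-variable DSM, removes the false sharing.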

Thrashing
- Occurs when the system spends a large amount of time transferring shared data blocks from one node to another (compared to the time spent on useful computation):
  - interleaved data access by two nodes causes a data block to move back and forth
  - read-only blocks are invalidated as soon as they are replicated
- Handling thrashing:
  - the application specifies when to prevent other nodes from moving a block; requires modifying the application
  - "nailing" a block after a transfer for a minimum amount of time t; it is hard to select t, and a wrong choice makes inefficient use of the DSM
    - adaptive nailing?
  - tailoring the coherence semantics to the use of the data, as in Munin's object-based sharing

Page replacement
- What to do when local memory is full?
  - swap to disk?
  - swap over the network?
  - what if the page is replicated?
  - what if it is read-only?
  - what if it is read/write but clean? what if it is dirty?
  - are shared pages given priority over private (non-shared) pages?

Multicomputers and Multiprocessors
- Multiprocessors (shared memory)
  - any process can use ordinary load/store operations to access any memory word
  - complex hardware: the bus becomes a bottleneck as the number of CPUs grows
  - simple communication between processes, through shared memory locations
  - synchronization is well understood and uses classical techniques (semaphores, etc.)
- Multicomputers (no shared memory)
  - each CPU has its own private memory
  - simple hardware, plus a network connection
  - complex communication: processes have to use message passing and deal with lost messages, blocking, etc.
  - RPCs help, but are not perfect (no globals, cannot pass large data structures, etc.)

Bus-based multiprocessors
- Symmetric Multiprocessor (SMP)
  - multiple CPUs (2-30), one shared physical memory, connected by a bus
- Caches must be kept consistent
  - each CPU has a "snooping" cache, which "snoops" on what is happening on the bus
    - on a read hit, fetch the data from the local cache
    - on a read miss, fetch the data from memory and store it in the local cache; the same data may end up in multiple caches
    - (write-through) on a write, store to memory and to the local cache; the other caches are snooping, and if they hold that word they invalidate their cache entry (see the sketch below)
    - after the write completes, memory is up to date and the word is in only one cache

[Figure: several CPUs, each with a snooping cache, attached to a shared bus and memory.]
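A toy sketch of the write-through invalidation rule above; the cache is modeled as a tiny direct-mapped array of (tag, valid) entries purely for illustration, not as a description of any real hardware.

    #include <stdint.h>
    #include <stdbool.h>

    #define CACHE_LINES 256

    typedef struct {
        bool     valid[CACHE_LINES];
        uint32_t tag[CACHE_LINES];
    } snoop_cache;

    static unsigned line_of(uint32_t addr) { return (addr >> 2) % CACHE_LINES; }
    static uint32_t tag_of(uint32_t addr)  { return addr >> 2; }

    /* Run by every other cache when it observes a write to 'addr' on the bus:
     * with write-through + invalidate, a snooping cache simply drops its copy. */
    void snoop_bus_write(snoop_cache *c, uint32_t addr)
    {
        unsigned line = line_of(addr);
        if (c->valid[line] && c->tag[line] == tag_of(addr))
            c->valid[line] = false;      /* invalidate the now-stale entry */
    }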

NUMA Machines
- NUMA = Non-Uniform Memory Access.
- NUMA multiprocessor
  - multiple CPUs, each with its own memory
  - the CPUs share a single virtual memory
  - accesses to local memory locations are much faster (maybe 10x) than accesses to remote memory locations
- No hardware caching, so it matters which data is stored in which memory
  - a reference to a remote page causes a hardware page fault, which traps to the OS, which decides whether or not to move the page to the local machine

[Figure: two CPU + memory nodes connected by a bus or LAN.]

Comparison of shared memory systems
- SMP (single-bus multiprocessor): hardware access to all of memory, hardware caching of blocks.
- NUMA: hardware access to all of memory, software caching of pages.
- Distributed Shared Memory (DSM): software access to remote memory, software caching of pages.