Chapter 10 Distributed Shared Memory


Chapter 10 Distributed Shared Memory
- Introduction
- Design and implementation issues
- Sequential consistency and IVY
- Release consistency and Munin
- Summary

10.1 Introduction to DSM

Why DSM (Distributed Shared Memory)?
- Message passing is complex (dropped messages, etc.)
- It is hard to pass complex data structures in RPCs
- Shared-memory multiprocessors are easier to program (e.g., for synchronization)
- Processes communicating via DSM may execute with non-overlapping lifetimes
- But multicomputers are easier to build and cheaper

What is distributed shared memory?
- Processors share a virtual address space (VAS)
- Perhaps there is no global memory
- Perhaps they have private memories
- Private memories cache pages from the VAS
- Consistency may then be needed between the memory caches

Main approaches to DSM
- Hardware-based: rely on specialized hardware
- Page-based: DSM is a region of virtual memory occupying the same address range in the address space of every participating process
- Library-based: library calls are responsible for accessing DSM and maintaining consistency

Example: the Mether system (both programs share the declarations from world.h)

    #include "world.h"

    struct shared {
        int a, b;
    };

    Program Writer:

    main()
    {
        struct shared *p;
        methersetup();
        p = (struct shared *) METHERBASE;
        p->a = p->b = 0;
        while (TRUE) {
            p->a = p->a + 1;
            p->b = p->b - 1;
        }
    }

    Program Reader:

    main()
    {
        struct shared *p;
        methersetup();
        p = (struct shared *) METHERBASE;
        while (TRUE) {
            printf("a = %d, b = %d\n", p->a, p->b);
            sleep(1);
        }
    }

10.2 Design and implementation issues

Structure
- Byte-oriented
- Shared objects
- Immutable data

Synchronization model
- A distributed synchronization service is needed to help applications maintain constraints on the values in DSM.
- Example: two processes share a and b under the constraint a = b (initially a = b = 0). The updates a++; b++; must be applied atomically, e.g., inside a critical section, or another process could observe a state where a != b.

Consistency model
- This is a different issue from the synchronization model above.

Consistency model
- Example: two processes accessing shared variables (initially a = b = 0)

    Process 1:                          Process 2:
    br := b;                            a := a + 1;
    ar := a;                            b := b + 1;
    if (ar >= br) then print("OK");

  Could ar = 0, br = 1 occur? If it did, would that be acceptable?
- A consistency model is a contract between software and memory:
  if the software obeys certain rules, the memory provides certain guarantees
- The fewer the guarantees, the better the performance
- Some consistency models: strict consistency, sequential consistency, causal consistency, weak consistency, release consistency
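To make the example concrete, here is a minimal C sketch of the two processes, assuming the shared variables live in a DSM segment; the struct layout and function names are illustrative, not from any particular system:

    #include <stdio.h>

    /* Shared variables a and b live in a DSM segment; both start at 0. */
    struct shared { int a, b; };

    /* Process 1: reads b, then a, then tests the invariant. */
    void process1(volatile struct shared *s)
    {
        int br = s->b;          /* br := b */
        int ar = s->a;          /* ar := a */
        if (ar >= br)
            printf("OK\n");
    }

    /* Process 2: increments a, then b, in program order. */
    void process2(volatile struct shared *s)
    {
        s->a = s->a + 1;        /* a := a + 1 */
        s->b = s->b + 1;        /* b := b + 1 */
    }

Whether ar = 0, br = 1 can be observed depends entirely on the consistency model the DSM provides, which is exactly what the following slides examine.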

Strict consistency
- Any read of location x returns the value stored by the most recent write to x
- We expect this on uniprocessors
- Assumes absolute global time
- Is this even possible in a distributed system? What if two writes from different processors are one nanosecond apart?
- Use locking and critical sections instead

Sequential consistency
- Slightly weaker; due to Lamport (1979)
- Any interleaving of operations is acceptable, as long as:
  - all the reads and writes issued by each individual process are satisfied in program order;
  - the memory operations belonging to different processes occur in some serial order
- Programmer-friendly, but slow

- Example: a serialization under sequential consistency (figure: the reads and writes arranged along a time axis)

    Process 1:                          Process 2:
    br := b;                            a := a + 1;
    ar := a;                            b := b + 1;
    if (ar >= br) then print("OK");

  Could ar = 0, br = 1 occur under the sequential consistency model? No: br = 1 means Process 2's write to b preceded Process 1's read of b; since Process 2 wrote a before b, and Process 1 reads a after reading b, the read of a must return at least 1.

Causal consistency
- Only concerned with events that are potentially causally related
- Causally related writes must be seen by all processes in the same order
- Concurrent writes may be seen in different orders
- Example: a newsgroup holds 3 messages:
    Msg 1: Does anyone know the time of our next meeting?
    Msg 2: 3 pm
    Msg 3: A good recipe for mushrooms stuffed with salmon is ...
  Msg 1 & 2 are causally related; Msg 3 is not.
  Msg 2 wouldn't make sense unless you had already received Msg 1; Msg 3 makes sense at any time.

Weak consistency
- Not all applications even need to see writes
  - Example: writes inside a critical section; who cares what order others see them, they shouldn't look!
- A synchronization variable is needed
  - Accesses to synchronization variables are sequentially consistent
  - Memory is synchronized only at synchronization points
  - Accessing a synchronization variable "flushes the pipeline": no data access is allowed until all previous synchronization accesses have been performed
  - To get a consistent value, synchronize before reading
- Example (W(x)1 means "write 1 to x"; S is a synchronization access):

    valid ordering:
        P1: W(x)1  W(x)2  S
        P2:        R(x)1  R(x)2  S
        P3:        R(x)2  R(x)1  S

    invalid ordering:
        P1: W(x)1  W(x)2  S
        P2:                   S  R(x)1

  (Once P2 has synchronized, it must see the final value x = 2.)
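As a sketch of the resulting programming pattern, assume a hypothetical synchronization-variable call dsm_sync(); the name is invented for illustration, real systems differ:

    #include <stdio.h>

    extern volatile int x;       /* a variable in weakly consistent DSM */
    extern void dsm_sync(void);  /* hypothetical synchronization access */

    /* Writer: others may see the intermediate writes in any order, or
       not at all; the sync makes them visible consistently. */
    void writer(void)
    {
        x = 1;
        x = 2;
        dsm_sync();   /* all previous writes are performed before this returns */
    }

    /* Reader: synchronize first, then read a consistent value. */
    void reader(void)
    {
        dsm_sync();               /* brings this replica up to date */
        printf("x = %d\n", x);    /* sees x = 2 if the writer's sync came first */
    }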

Release consistency
- With one kind of synchronization access, when a sync is done, is it the end of the writes or the beginning of the reads? The kernel had to both finish the local writes and gather the remote ones.
- Release consistency fixes this with two kinds of synchronization accesses:
  - Acquire means you're about to enter a critical section
  - Release means you've exited the critical section
  - Barrier synchronization can also be used
- Example: valid ordering

    P1: Acq(L)  W(x)1  W(x)2  Rel(L)
    P2:                        Acq(L)  R(x)2  Rel(L)
    P3: R(x)1

  (P3 does not acquire the lock, so it may still see the stale value 1.)
- Eager versus lazy release consistency
  - Eager: the process doing the release pushes the modified data to all processors with cached copies. But do they all need it?
  - Lazy: pull the most recent values upon acquire. With a critical section inside a loop, this saves a lot!
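The same pattern under release consistency, as a sketch with hypothetical dsm_acquire()/dsm_release() calls (names invented for illustration):

    #include <stdio.h>

    extern void dsm_acquire(int *lock);  /* hypothetical acquire access */
    extern void dsm_release(int *lock);  /* hypothetical release access */
    extern volatile int x;               /* variable in release-consistent DSM */
    static int L;                        /* lock protecting x */

    /* P1: all ordinary writes between acquire and release have taken
       effect at other processes before the release completes. */
    void p1(void)
    {
        dsm_acquire(&L);
        x = 1;
        x = 2;
        dsm_release(&L);
    }

    /* P2: acquiring the same lock afterwards guarantees it reads x = 2.
       Under lazy release consistency, the up-to-date value is pulled
       here, at the acquire, instead of being pushed at P1's release. */
    void p2(void)
    {
        dsm_acquire(&L);
        printf("x = %d\n", x);
        dsm_release(&L);
    }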

Update options
- How are updates propagated? Write-update vs. write-invalidate
- Write-update
  - multiple-reader-multiple-writer sharing
  - the resulting memory consistency model depends mainly on the ordering property of the multicast used
- Example: DSM using write-update (figure: three processes against time axes)

    P1: a := 7;  b := 7;  if (b = 8) then print("after");
    P2: if (a = 7) then b := b + 1;
    P3: if (b = a) then print("before");

- Write-invalidate
  - multiple-reader-single-writer sharing
  - this scheme can achieve sequential consistency

Granularity
- Page-based implementations must choose: large or small pages?
- False sharing: data items A and B laid out on the same page n are "falsely shared" when different processes write them independently (see the sketch below)

Thrashing
- occurs when several processes compete for the same data item, or for falsely shared data items
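A sketch of false sharing and one common remedy, assuming 4 KB pages; the layout and padding below are illustrative:

    /* False sharing on page-based DSM: A and B are unrelated, but they
       land on the same page, so writes to A by one process and writes
       to B by another make the page ping-pong between their kernels. */
    #define PAGE_SIZE 4096

    struct bad_layout {
        int A;    /* written only by process 1 */
        int B;    /* written only by process 2, yet on the same page as A */
    };

    /* One remedy: pad each item to a page boundary so that each item
       lives on a page of its own. */
    struct padded_int {
        int  value;
        char pad[PAGE_SIZE - sizeof(int)];
    };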

10.3 Sequential consistency and Ivy

The system model (figure: processes accessing a paged DSM segment; page faults are taken by the kernel, and pages are transferred over the network)
- sequentially consistent, page-based DSM
- paging is transparent to the processes
- the DSM runtime restricts page access permissions: none, read-only, or read-write

Write-invalidation
- Write-update can't be used here. Why not? Every single write would have to be caught, either by write-protecting the page and taking a page fault or by running in trace mode with a trace-exception handler, which is far too expensive.
- Write-invalidation
  - page protection: read-only permission & read/write permission
  - per-page bookkeeping: owner & copyset
- State transitions under write-invalidation (figure): each page is either "multiple reader" or "single writer", where R = a read fault occurs and W = a write fault occurs. A write fault invalidates all other copies and leaves the page with a single writer; a read fault returns it to the multiple-reader state.
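The state diagram reads naturally as a transition function; a minimal sketch, where invalidate_other_copies() is an illustrative helper:

    /* Per-page state machine under write-invalidation. */
    enum page_state { MULTIPLE_READER, SINGLE_WRITER };

    extern void invalidate_other_copies(int page);  /* illustrative helper */

    enum page_state on_fault(enum page_state s, int is_write, int page)
    {
        (void) s;  /* the next state depends only on the fault type */
        if (is_write) {
            /* W: invalidate all other copies; one writer remains. */
            invalidate_other_copies(page);
            return SINGLE_WRITER;
        }
        /* R: the faulting reader obtains a copy; any single writer is
           demoted to read-only, so the page has multiple readers. */
        return MULTIPLE_READER;
    }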

- The write-fault handling procedure is as follows (Pw is the process attempting to write page p):
  1. The page is transferred to Pw's kernel, if it does not already have an up-to-date read-only copy.
  2. All other copies are invalidated: the page permissions are set to no-access at all members of copyset(p).
  3. copyset(p) := {Pw}.
  4. owner(p) := Pw.
  5. The kernel maps the page with read-write permissions into Pw's address space, and Pw is restarted.
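The same procedure in C, as a sketch; all types and helper functions (owner, copyset_member, and so on) are illustrative stand-ins, not Ivy's real interface:

    typedef int page_t;
    typedef int process_t;

    /* Illustrative helpers, not a real kernel API: */
    extern process_t owner(page_t p);
    extern int       copyset_size(page_t p);
    extern process_t copyset_member(page_t p, int i);
    extern void      copyset_set_only(page_t p, process_t q);
    extern void      set_owner(page_t p, process_t q);
    extern int       has_read_only_copy(process_t q, page_t p);
    extern void      transfer_page(process_t from, process_t to, page_t p);
    extern void      set_no_access(process_t q, page_t p);
    extern void      map_read_write(process_t q, page_t p);
    extern void      restart(process_t q);

    void handle_write_fault(page_t p, process_t Pw)
    {
        /* 1. Transfer the page to Pw's kernel if it lacks an up-to-date
              read-only copy. */
        if (!has_read_only_copy(Pw, p))
            transfer_page(owner(p), Pw, p);

        /* 2. Invalidate all other copies: set no-access at every member
              of copyset(p). (Pw gets fresh permissions in step 5.) */
        for (int i = 0; i < copyset_size(p); i++)
            set_no_access(copyset_member(p, i), p);

        /* 3. and 4. copyset(p) := {Pw}; owner(p) := Pw. */
        copyset_set_only(p, Pw);
        set_owner(p, Pw);

        /* 5. Map the page read-write into Pw's address space; restart Pw. */
        map_read_write(Pw, p);
        restart(Pw);
    }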

- The read-fault handling procedure is as follows (Pr is the process attempting to read page p; a matching sketch follows below):
  1. The page is copied from owner(p) to Pr's kernel.
  2. If owner(p) is a single writer, then its access permission for p is set to read-only and it remains p's owner.
  3. copyset(p) := copyset(p) ∪ {Pr}.
  4. Pr's kernel maps the page with read-only permissions into Pr's address space, and Pr continues.
- Two problems remain to be addressed:
  - How is owner(p) located for a given page p?
  - Where is copyset(p) stored?
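The matching read-fault sketch, using the same illustrative types and helpers as the write-fault sketch above:

    typedef int page_t;
    typedef int process_t;

    extern process_t owner(page_t p);
    extern int       is_single_writer(process_t q, page_t p);
    extern void      copy_page(process_t from, process_t to, page_t p);
    extern void      set_read_only(process_t q, page_t p);
    extern void      copyset_add(page_t p, process_t q);
    extern void      map_read_only(process_t q, page_t p);
    extern void      resume(process_t q);

    void handle_read_fault(page_t p, process_t Pr)
    {
        /* 1. Copy the page from owner(p) to Pr's kernel. */
        copy_page(owner(p), Pr, p);

        /* 2. A single-writer owner is demoted to read-only access;
              it remains the owner. */
        if (is_single_writer(owner(p), p))
            set_read_only(owner(p), p);

        /* 3. copyset(p) := copyset(p) U {Pr}. */
        copyset_add(p, Pr);

        /* 4. Map the page read-only into Pr's address space; Pr continues. */
        map_read_only(Pr, p);
        resume(Pr);
    }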

- The approaches to these problems that Kai Li described include:
  - centralized manager algorithm
  - fixed distributed page management
  - multicast-based distributed management
  - dynamic distributed management

Centralized manager algorithm (figure):
  1. The faulting process (the client) sends (page no., access R/W) to the manager.
  2. The manager forwards (requestor, page no., access) to the current owner.
  3. The owner sends the page to the client.
  The manager maintains a table mapping each page number to its owner.
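A sketch of the manager's side, with an illustrative owner table and message primitive; recording the requestor as the new owner on a write request is a simplification of the ownership transfer:

    #define MAX_PAGES 1024

    typedef int process_t;
    enum access { ACCESS_READ, ACCESS_WRITE };

    /* The manager's page-ownership table (illustrative). */
    static process_t owner_of[MAX_PAGES];

    /* Illustrative message primitive: */
    extern void forward_request(process_t to, process_t requestor,
                                int page_no, enum access a);

    /* Steps 1-3 above: a faulting client has sent (page no., access);
       the manager forwards the request to the current owner, which will
       send the page directly to the client. */
    void manager_handle_fault(process_t requestor, int page_no, enum access a)
    {
        forward_request(owner_of[page_no], requestor, page_no, a);
        if (a == ACCESS_WRITE)
            owner_of[page_no] = requestor;  /* ownership moves to the writer */
    }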

Fixed distributed page management
- pages are divided statically among multiple managers, but processes may not access the pages equally

Using multicast to locate the owner
- (figure: clients C1 and C2 multicast their requests for the same page to owner O; the numbered messages 1-4 show two requests arriving at about the same time)
- If two or more clients request the same page at more or less the same time, we must ensure that each client eventually obtains the page!

A dynamic distributed manager algorithm
- allows page ownership to be transferred between kernels
- divides the overhead of locating pages among the computers that access them
- the probable owner of p, probOwner(p), is just a hint; the owner of a page is located by following chains of hints
- hints are updated and requests are forwarded as follows (see the sketch after this list):
  - when a kernel transfers ownership of page p to another kernel, it updates probOwner(p) to be the recipient
  - when a kernel (not the owner) handles an invalidation request for page p, it updates probOwner(p) to be the requester
  - when a kernel that has requested read access to page p receives it, it updates probOwner(p) to be the provider
  - when a kernel receives a request (whether for read or write access) for a page p that it does not own, it forwards the request to probOwner(p) and resets probOwner(p) to be the requester
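The four rules map directly onto per-event hint updates; a sketch in C, where the types and the forward() primitive are illustrative:

    #define MAX_PAGES 1024

    typedef int process_t;
    typedef struct { process_t requester; int page; int want_write; } request_t;

    /* This kernel's hint about each page's probable owner. */
    static process_t probOwner[MAX_PAGES];

    extern void forward(process_t to, request_t r);  /* illustrative send */

    /* Rule 1: we transferred ownership of p to `recipient`. */
    void on_ownership_transferred(int p, process_t recipient)
    {
        probOwner[p] = recipient;
    }

    /* Rule 2: we (a non-owner) handled an invalidation request for p. */
    void on_invalidation(int p, process_t requester)
    {
        probOwner[p] = requester;
    }

    /* Rule 3: the read copy of p we requested arrived from `provider`. */
    void on_read_copy_received(int p, process_t provider)
    {
        probOwner[p] = provider;
    }

    /* Rule 4: a request for a page we do not own: follow the hint chain,
       then point our own hint at the requester. */
    void on_foreign_request(int p, request_t r)
    {
        forward(probOwner[p], r);
        probOwner[p] = r.requester;
    }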

(Figure: probOwner pointer chains among kernels A, B, C, D and E, where E is initially the owner.)
(a) probOwner pointers just before process A takes a page fault for a page owned by E.
(b) Write fault: probOwner pointers after A's write request is forwarded.
(c) Read fault: probOwner pointers after A's read request is forwarded.

- Optimization 1: periodically broadcast the current owner's location to all kernels, to reduce the average length of the pointer chains. Simulation results:

    Page faults per broadcast    Average length of pointer chains
    1024                         3.64
    256                          2.34

- Optimization 2: a client can obtain a copy from any kernel with a valid copy (figure: invalidation messages 1-8 fan out from the owner through the copy holders, so some invalidations can occur in parallel!)

10.4 Release consistency and Munin
- Sequential consistency is costly to implement: (1) it uses multicasts; (2) the owner of a page must be located.
- Release consistency is weaker than sequential consistency and cheaper to implement, but has reasonable semantics. E.g., DASH, Munin.

Re-examining release consistency
- the idea is to reduce DSM overhead by exploiting the fact that programmers use synchronization objects
- ordinary accesses (to DSM) vs. synchronization accesses
- the main guarantee is as follows: all ordinary memory accesses issued prior to a release have taken effect at all other processes before the release completes, including those accesses issued prior to the preceding acquire
- by using appropriate synchronization accesses, it can give results equivalent to executions under the sequential consistency model

- Munin's synchronization objects: acquireLock, releaseLock and waitAtBarrier

  E.g.:

    Process 1:
        acquireLock();      /* enter critical section */
        a := a + 1;
        b := b + 1;
        releaseLock();      /* leave critical section */

    Process 2:
        acquireLock();      /* enter critical section */
        print("the values of a & b are: ", a, b);
        releaseLock();      /* leave critical section */

Advantages of release consistency:
- some blocking of processes is avoided
- some communication is delayed until a release occurs

Munin
- runs shared-memory programs on distributed-memory multiprocessors, in this case a workstation cluster
- release-consistent memory with an eager approach: Munin sends update or invalidation information as soon as a lock is released
- multiple consistency protocols, parameterized according to options such as:
  - whether to use a write-update or write-invalidate protocol
  - whether the data item has a fixed owner
  - whether or not to delay updates or invalidations
- shared variables are annotated with their expected access patterns (a hypothetical sketch follows below), for example:
  - read-only: replication on demand
  - producer-consumer: update rather than invalidate
  - write-shared: avoids false sharing; only the differences between the two versions are sent in an update
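To illustrate the annotation idea only, here is a sketch with an invented call name and constants, not Munin's actual interface:

    #include <stddef.h>

    enum access_pattern { READ_ONLY, PRODUCER_CONSUMER, WRITE_SHARED };

    /* Hypothetical annotation call (invented, not Munin's API): */
    extern void annotate(void *var, size_t size, enum access_pattern p);

    struct config { int nworkers; };   /* written once, read everywhere */
    struct buffer { int data[256]; };  /* filled by a producer, read by a consumer */

    static struct config cfg;
    static struct buffer buf;
    static int counts[64];             /* each process updates its own slots */

    void setup(void)
    {
        annotate(&cfg, sizeof cfg, READ_ONLY);          /* replicate on demand */
        annotate(&buf, sizeof buf, PRODUCER_CONSUMER);  /* update rather than invalidate */
        annotate(counts, sizeof counts, WRITE_SHARED);  /* send diffs; tolerates false sharing */
    }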

That's it, thanks for your attention!

In the next class we'll continue with:
Chapter 11 Time and Coordination
- Synchronizing physical clocks
- Logical time and logical clocks
- Distributed coordination