CSS434 DSM1 CSS434 Distributed Shared Memory Textbook Ch18 Professor: Munehiro Fukuda.

Presentation transcript:

CSS434 DSM2 Basic Concept
Figure (diagram): Nodes 0, 1, 2, … are connected by a communication network. Each node has CPUs 1 to n, an MMU, a page manager, and local memory. The distributed shared memory spans all nodes and exists only virtually.
Access interface: data = read(address); write(address, data);
A cache line or a page containing the address is transferred to and cached in the requesting computer.
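The read path described on this slide can be sketched as follows. This is a minimal single-process simulation, not a real DSM: `remote_memory`, `cache`, `cached`, `transfers`, `dsm_read`, and the page size are all hypothetical names and values chosen for illustration, and the "network transfer" is just a memory copy.

```c
#include <assert.h>
#include <string.h>

#define PAGE_SIZE 4   /* words per page (illustrative) */
#define NPAGES    4

static int remote_memory[NPAGES][PAGE_SIZE]; /* stands in for another node's memory */
static int cache[NPAGES][PAGE_SIZE];         /* pages cached on the local node */
static int cached[NPAGES];                   /* 1 if the page is cached locally */
static int transfers;                        /* number of page transfers so far */

/* Reads one word. On a miss, the whole page is fetched and cached first. */
static int dsm_read(int address)
{
    assert(address >= 0 && address < NPAGES * PAGE_SIZE);
    int page = address / PAGE_SIZE;
    if (!cached[page]) {
        memcpy(cache[page], remote_memory[page], sizeof cache[page]);
        cached[page] = 1;
        transfers++;
    }
    return cache[page][address % PAGE_SIZE];
}
```

A second read that falls in the same page is served from the local cache without another transfer, which is where the data-locality benefit comes from.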

CSS434 DSM3 Writer Process on DSM
    /* Program Writer */
    #include "world.h"

    struct shared { int a, b; };

    int main(void)
    {
        struct shared *p;

        methersetup();                     /* initialize the Mether run-time */
        p = (struct shared *)METHERBASE;   /* overlay structure on METHER segment */
        p->a = p->b = 0;                   /* initialize fields to zero */
        while (TRUE) {                     /* continuously update structure fields */
            p->a = p->a + 1;
            p->b = p->b - 1;
        }
    }

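CSS434 DSM4 Reader Process on DSM
    /* Program Reader */
    #include <stdio.h>     /* printf */
    #include <unistd.h>    /* sleep */
    #include "world.h"

    struct shared { int a, b; };   /* same layout as in the writer */

    int main(void)
    {
        struct shared *p;

        methersetup();                     /* initialize the Mether run-time */
        p = (struct shared *)METHERBASE;   /* overlay structure on METHER segment */
        while (TRUE) {                     /* read the fields once every second */
            printf("a = %d, b = %d\n", p->a, p->b);
            sleep(1);
        }
    }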

CSS434 DSM5 Why DSM?
- Simpler abstraction
  - The underlying, tedious communication primitives are all hidden behind memory accesses.
- Better portability of distributed application programs
  - Natural transition from sequential to distributed applications
- Better performance of some applications
  - Data locality, on-demand data movement, and a large memory space reduce network traffic and paging/swapping activity.
- Flexible communication environment
  - Sender and receiver need not know each other. They need not even coexist.
- Ease of process migration
  - Migration is completed simply by transferring the corresponding PCB (process control block) to the destination.

CSS434 DSM6 Main Issues
- Granularity
  - From fine (less false sharing but more network traffic) to coarse (more false sharing but less network traffic): cache line (e.g. Dash and Alewife), object (e.g. Orca and Linda), page (e.g. Ivy)
- Memory coherence and access synchronization
  - Strict, sequential, causal, weak, and release consistency models
- Data location and access
  - Broadcasting, centralized data locator, fixed distributed data locator, and dynamic distributed data locator
- Replacement strategy
  - LRU or FIFO (the same issue as in OS virtual memory)
- Thrashing
  - How to prevent a block from being exchanged back and forth between two nodes
- Heterogeneity
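To make the granularity trade-off concrete, the sketch below tests whether two unrelated data items land in the same transfer unit, which is the precondition for false sharing. The function names `block_of` and `falsely_shared` and the block sizes used afterwards are illustrative assumptions, not part of any system named on the slide.

```c
#include <assert.h>
#include <stdint.h>

/* Block number that contains a byte address, for a given block size. */
static uint64_t block_of(uint64_t addr, uint64_t block_size)
{
    assert(block_size > 0);
    return addr / block_size;
}

/* Two distinct data items can be falsely shared when they lie in the same block. */
static int falsely_shared(uint64_t addr_a, uint64_t addr_b, uint64_t block_size)
{
    return addr_a != addr_b &&
           block_of(addr_a, block_size) == block_of(addr_b, block_size);
}
```

For example, items at byte addresses 100 and 200 fall into the same block with a 4096-byte page but into different blocks with a 64-byte cache line, so only the coarser granularity exposes them to false sharing.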

CSS434 DSM7 Consistency Models
Two processes access shared variables. At the beginning, a = b = 0.
Process 1:
    a := a + 1;
    b := b + 1;
Process 2:
    br := b;
    ar := a;
    if (ar ≥ br) then print("OK");
What Process 2 may observe:
- a == 1, b == 1: condition satisfied
- a == 1, b == 0: condition satisfied
- a == 0, b == 1: condition violated. This may happen if the new contents of a and b are transmitted through different routes.
DSM needs a consistency model.
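The claim that the bad outcome needs out-of-order delivery can be checked by brute force. The sketch below enumerates every interleaving of the two processes that preserves each process's program order (as sequential consistency would) and counts those in which ar ≥ br fails. It assumes each statement executes atomically; `run_interleaving` and `count_violations` are hypothetical helper names.

```c
#include <assert.h>

/* Runs one interleaving: if bit i of mask is set, slot i executes the next
 * operation of Process 1, otherwise the next operation of Process 2.
 * Returns 1 if Process 2 ends with ar >= br. */
static int run_interleaving(unsigned mask)
{
    int a = 0, b = 0, ar = 0, br = 0;
    int p1_step = 0, p2_step = 0;
    for (int slot = 0; slot < 4; slot++) {
        if (mask & (1u << slot)) {
            if (p1_step++ == 0) a = a + 1; else b = b + 1;
        } else {
            if (p2_step++ == 0) br = b; else ar = a;
        }
    }
    return ar >= br;
}

/* Counts the program-order-preserving interleavings that violate ar >= br. */
static int count_violations(void)
{
    int violations = 0;
    for (unsigned mask = 0; mask < 16; mask++) {
        int bits = 0;
        for (int i = 0; i < 4; i++) bits += (int)((mask >> i) & 1u);
        if (bits != 2) continue;   /* each process runs exactly two operations */
        if (!run_interleaving(mask)) violations++;
    }
    return violations;
}
```

None of the six interleavings violates the condition, so the outcome a == 0, b == 1 can only appear when the memory system lets the update to b overtake the update to a.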

CSS434 DSM8 Consistency Models: Strict Consistency
Notation:
- Wi(x, a): Processor i writes a to variable x (i.e., x = a;).
- b ← Ri(x): Processor i reads b from variable x (i.e., y = x; and then y == b).
Any read on x must return the value of the most recent write on x.
Strict consistency:
    P2: W2(x, a)
    P1: a ← R1(x)
    P3: a ← R3(x)
Not strict consistency:
    P2: W2(x, a)
    P1: nil ← R1(x), then a ← R1(x)
    P3: a ← R3(x)

CSS434 DSM9 Consistency Models: Linearizability and Sequential Consistency
- Linearizability: operations of each individual process appear to all processes in the same order, and that order is the order in which they actually happened.
- Sequential consistency: operations of each individual process appear in the same order to all processes. The common order need not match real time.
Linearizability:
    P2: W2(x, a)
    P3: W3(x, b)
    P1: a ← R1(x), b ← R1(x)
    P4: a ← R4(x), b ← R4(x)
Sequential consistency:
    P2: W2(x, a)
    P3: W3(x, b)
    P1: nil ← R1(x), b ← R1(x), a ← R1(x)
    P4: b ← R4(x), a ← R4(x)

CSS434 DSM10 Consistency Models: FIFO and Processor Consistency
- FIFO consistency: writes by a single process are visible to all other processes in the order in which they were issued.
- Processor consistency: FIFO consistency, plus all writes to the same memory location must be visible in the same order to all processes.
Figure (diagrams): "FIFO Consistency" and "Processor Consistency" timelines for P1 to P4, showing writes issued by P2 and P3 to variables x, y, and z and the values subsequently read by the other processes.
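The FIFO rule can be phrased as a check on one reader's observations of one writer. The sketch below is an illustrative checker, not part of any DSM system: it assumes the writer's values to a single variable are distinct, and the names `issue_index` and `fifo_order_respected` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Position of value in the writer's issue order, or -1 if it was never written. */
static int issue_index(const int *issued, size_t n, int value)
{
    for (size_t i = 0; i < n; i++)
        if (issued[i] == value) return (int)i;
    return -1;
}

/* Returns 1 if a reader's observations of one writer never go backwards
 * relative to the order in which that writer issued its writes. */
static int fifo_order_respected(const int *issued, size_t n,
                                const int *observed, size_t m)
{
    int last = -1;
    for (size_t i = 0; i < m; i++) {
        int idx = issue_index(issued, n, observed[i]);
        if (idx < 0 || idx < last) return 0;
        last = idx;
    }
    return 1;
}
```

A reader may skip a value or see the same value twice, but seeing a later write and then an earlier one from the same writer breaks FIFO consistency. Writes from different writers are not compared here, which is exactly the freedom that processor consistency removes for writes to the same location.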

CSS434 DSM11 Consistency Models: Causal Consistency
- Causally related writes must be visible to all processes in the same order.
- Concurrent writes may be propagated in a different order.
Causal consistency:
    P2: W2(x, a), W2(x, c)
    P3: a ← R3(x), W3(x, b)
    P1: b ← R1(x), c ← R1(x)
    P4: a ← R4(x), c ← R4(x), b ← R4(x)
Not causal consistency:
    P2: W2(x, a)
    P3: a ← R3(x), W3(x, b)
    P1: a ← R1(x), b ← R1(x)
    P4: b ← R4(x), a ← R4(x)

CSS434 DSM12 Consistency Models: Weak Consistency
- Accesses to synchronization variables must obey sequential consistency.
- All previous writes must be completed before an access to a synchronization variable.
- All previous accesses to synchronization variables must be completed before an access to a non-synchronization variable.
Figure (diagrams): "Weak Consistency" and "Not Weak Consistency" timelines for P1 to P3. In both, P2 performs W2(x, a), W2(x, b), W2(y, c) and then synchronizes (S2); the other processes read x and y around their own synchronizations (S1, S3).

CSS434 DSM13 Consistency Models: Release Consistency
- Accesses to acquire and release variables obey processor consistency.
- Previous acquires requested by a process must be completed before the process performs a data access.
- All previous data accesses performed by a process must be completed before the process performs a release.
Example:
    P1: Acq1(L), W1(x, a), W1(x, b), Rel1(L)
    P2: Acq2(L), b ← R2(x), Rel2(L)
    P3: a ← R3(x)

CSS434 DSM14 Consistency Models: Release Consistency (Example)
Process 1:
    acquireLock();    // enter critical section
    a := a + 1;
    b := b + 1;
    releaseLock();    // leave critical section
Process 2:
    acquireLock();    // enter critical section
    print("The values of a and b are: ", a, b);
    releaseLock();    // leave critical section
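One way to picture why this example works under release consistency is to buffer writes locally and propagate them only at release time. The sketch below is a single-threaded simulation under that assumption. It does not model mutual exclusion itself, and `struct home`, `struct node`, `acquire_lock`, `write_var`, and `release_lock` are hypothetical names rather than the API of any real DSM.

```c
#include <assert.h>
#include <string.h>

#define NVARS 2   /* index 0 is a, index 1 is b */

struct home { int value[NVARS]; };                   /* hypothetical master copy */
struct node { int local[NVARS]; int dirty[NVARS]; }; /* one process's local view */

/* Acquire: refresh the local copy so that earlier releases become visible. */
static void acquire_lock(struct node *n, const struct home *h)
{
    memcpy(n->local, h->value, sizeof n->local);
    memset(n->dirty, 0, sizeof n->dirty);
}

/* A write inside the critical section only touches the local copy. */
static void write_var(struct node *n, int var, int value)
{
    assert(var >= 0 && var < NVARS);
    n->local[var] = value;
    n->dirty[var] = 1;
}

/* Release: propagate every buffered write before the lock is given up. */
static void release_lock(struct node *n, struct home *h)
{
    for (int i = 0; i < NVARS; i++) {
        if (n->dirty[i]) {
            h->value[i] = n->local[i];
            n->dirty[i] = 0;
        }
    }
}
```

If Process 1 increments a and b between `acquire_lock` and `release_lock`, a Process 2 that acquires afterwards sees both updates together and never the intermediate state in which only a has changed.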

CSS434 DSM15 Implementing Sequential Consistency: Replicated and Migrating Data Blocks
Figure (diagram): Nodes 1, 2, and 3 each have a processor, a memory, and a cache. The memories hold blocks x and y, m and n, and a and b. The caches hold copies of m, b, and x, with x duplicated in two caches.
Then what if Node 2 updates x?

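CSS434 DSM16 Implementing Sequential Consistency: Write Invalidation
Figure (diagram): several nodes each hold a copy of the block; the writing client receives a new copy.
Client wants to write:
1. Request block
2. Replicate block (the client receives a new copy)
3. Invalidate block (the other copies)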

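CSS434 DSM17 Implementing Sequential Consistency: Write Update
Figure (diagram): several nodes each hold a copy of the block; the writing client receives a new copy, and the other copies are replaced by new copies.
Client wants to write:
1. Request block
2. Replicate block (the client receives a new copy)
3. Update block (the other copies become new copies)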
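The difference between the write-invalidation and write-update slides comes down to what happens to the other nodes' copies. The sketch below contrasts the two on an in-memory array of copies. It is a simplified illustration with no messaging or ownership transfer, and `struct block_copy`, `write_invalidate`, and `write_update` are hypothetical names.

```c
#include <assert.h>

#define NNODES 3

struct block_copy {
    int valid;   /* 1 if this node holds a usable copy */
    int value;   /* contents of the copy */
};

/* Write-invalidate: after the write, the writer holds the only valid copy. */
static void write_invalidate(struct block_copy copies[NNODES], int writer, int value)
{
    assert(writer >= 0 && writer < NNODES);
    for (int i = 0; i < NNODES; i++)
        if (i != writer) copies[i].valid = 0;
    copies[writer].valid = 1;
    copies[writer].value = value;
}

/* Write-update: every node that holds a valid copy receives the new value. */
static void write_update(struct block_copy copies[NNODES], int writer, int value)
{
    assert(writer >= 0 && writer < NNODES);
    copies[writer].valid = 1;
    for (int i = 0; i < NNODES; i++)
        if (copies[i].valid) copies[i].value = value;
}
```

Invalidation costs later readers a re-fetch, while update pays for a transfer to every replica on every write, so which one is cheaper depends on the read/write mix.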

CSS434 DSM18 Implementing Sequential Consistency: Read/Write Request
Figure (state diagram): block states Unused/Nil, Read only, Read-owned, and Writable, with these transition labels:
- Read (read a copy from the owner)
- Read (read from memory and get ownership)
- Write (invalidate others if they have a copy, and get ownership)
- Write (invalidate others if they have a copy)
- Write invalidate
- Replacement

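CSS434 DSM19 Implementing Sequential Consistency: Locating Data, Fixed Distributed-Server Algorithms
Figure (diagram): Processors 0, 1, and 2. Each holds an ownership table for a fixed range of addresses, plus its own blocks.
Ownership tables:
    Address  Owner      Address  Owner      Address  Owner
    0        P0         3        P1         6        P2
    1                   4        P2         7        P1
    2        P2         5        P0         8        P2
Block states: Addr0 writable, Addr1 read owned, Addr2 read owned, Addr3 read owned, Addr4 read owned, Addr5 writable, Addr6 writable, Addr7 writable, Addr8 read owned.
Example: a processor reads addr2.
1. Location search: the server responsible for address 2 reports the owner.
2. Read request: the request is sent to the owner.
3. Block replication: the reader receives a copy (Addr2 read only).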
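The location search in a fixed distributed-server scheme can be sketched as a two-step table lookup. The code below makes two labeled assumptions beyond the slide: server s is taken to handle addresses 3s to 3s+2, and the owner of address 1, which is not legible in the table, is taken to be P0. `server_of` and `locate_owner` are hypothetical names.

```c
#include <assert.h>

#define NPROCS 3
#define ADDRS_PER_SERVER 3
#define NADDRS (NPROCS * ADDRS_PER_SERVER)

/* owner_table[s][k] is the owner recorded by server s for address
 * s * ADDRS_PER_SERVER + k. Values follow the slide's tables; the owner of
 * address 1 is an assumption (P0). */
static const int owner_table[NPROCS][ADDRS_PER_SERVER] = {
    {0, 0, 2},   /* addresses 0..2 */
    {1, 2, 0},   /* addresses 3..5 */
    {2, 1, 2},   /* addresses 6..8 */
};

/* The server for an address is fixed by the address itself (assumed mapping). */
static int server_of(int addr)
{
    assert(addr >= 0 && addr < NADDRS);
    return addr / ADDRS_PER_SERVER;
}

/* Location search: ask the fixed server which processor owns the block. */
static int locate_owner(int addr)
{
    return owner_table[server_of(addr)][addr % ADDRS_PER_SERVER];
}
```

Because the server is computed from the address, no broadcast is needed: a read of address 2 goes to server 0, which answers P2, and the read request is then sent to P2.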

CSS434 DSM20 Implementing Sequential Consistency: Locating Data, Dynamic Distributed-Server Algorithms
Figure (diagram): Processors 0, 1, and 2. Each holds a table of probable owners, plus its own blocks.
Probable-owner tables:
    Address  Probable   Address  Probable   Address  Probable
    0        P0         3        P1         2        P2
    1                   4        P2         7        P1
    2        P2         5        P0         8        P2
Block states: Addr0 writable, Addr1 read owned, Addr2 read owned, Addr3 read owned, Addr4 read owned, Addr5 writable, Addr7 writable, Addr8 read owned.
Example: a processor reads addr2.
1. Location search: the request follows the probable-owner entries.
2. Read request: the request reaches the owner (Addr2 read owned).
3. Block replication: the reader receives a copy (Addr2 read only).
Breaking the chain of nodes: a node points to a new owner
- when the node receives an invalidation,
- when the node relinquishes ownership, or
- when the node forwards a fault request.
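Following probable-owner hints can be sketched as a pointer chase. The code below is a simplification for illustration: it handles a single block, and after the owner is found it points every node on the path directly at that owner, which is one possible way to shorten the chain rather than the exact update rule of any particular system. `struct dsm_node` and `find_owner` are hypothetical names.

```c
#include <assert.h>

#define NPROCS 4

struct dsm_node {
    int is_owner;        /* 1 if this node currently owns the block */
    int probable_owner;  /* hint: where to forward a fault request */
};

/* Follows probable-owner hints from `start` until the true owner is found.
 * Every node that forwarded the request is then pointed at the owner, which
 * shortens the chain for later requests. Returns the owner's id. */
static int find_owner(struct dsm_node nodes[NPROCS], int start)
{
    int path[NPROCS];
    int hops = 0;
    int cur = start;
    assert(start >= 0 && start < NPROCS);
    while (!nodes[cur].is_owner) {
        assert(hops < NPROCS);   /* hints must not form a cycle */
        path[hops++] = cur;
        cur = nodes[cur].probable_owner;
        assert(cur >= 0 && cur < NPROCS);
    }
    for (int i = 0; i < hops; i++)
        nodes[path[i]].probable_owner = cur;
    return cur;
}
```

With hints 0 → 1 → 2 → 3 and node 3 as owner, a request from node 0 takes three hops the first time; afterwards nodes 0, 1, and 2 all point directly at node 3.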

CSS434 DSM21 Replacement Strategy
Which block to replace:
- Non-usage based (e.g. FIFO)
- Usage based (e.g. LRU)
- A mix of both (e.g. Ivy):
  - Unused/Nil: replaced with the highest priority
  - Read-only: the second priority
  - Read-owned: the third priority
  - Writable: the lowest priority, with LRU used
Where to place a replaced block:
- Invalidating the block if other nodes have a copy
- Using secondary store
- Using the memory space of other nodes
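The mixed, Ivy-style priority order can be sketched as a victim-selection routine. The code below is an illustration under one stated assumption: the slide mentions LRU only for writable blocks, whereas this sketch uses least-recent use as the tie-breaker within every state. The names `enum block_state`, `struct block`, `last_used`, and `choose_victim` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Block states ordered by replacement priority: a lower value is evicted first. */
enum block_state { BLOCK_NIL = 0, BLOCK_READ_ONLY, BLOCK_READ_OWNED, BLOCK_WRITABLE };

struct block {
    enum block_state state;
    unsigned long last_used;   /* hypothetical logical timestamp of last access */
};

/* Returns the index of the block to replace, or -1 if the table is empty. */
static int choose_victim(const struct block *blocks, size_t n)
{
    int victim = -1;
    for (size_t i = 0; i < n; i++) {
        if (victim < 0) { victim = (int)i; continue; }
        const struct block *v = &blocks[victim];
        if (blocks[i].state < v->state ||
            (blocks[i].state == v->state && blocks[i].last_used < v->last_used))
            victim = (int)i;
    }
    return victim;
}
```

A read-only copy is cheap to drop because the owner still has the data, while a writable block holds the only up-to-date copy and must be saved somewhere before it is replaced, which is why it comes last.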

CSS434 DSM22 Thrashing
Thrashing occurs when:
- Two or more processes try to write the same shared block.
- An owner keeps writing its block, which is shared by two or more reader processes.
- The larger a block, the greater the chance of false sharing, which causes thrashing.
Solutions:
- Allow a process to prevent a block from being accessed by others, using a lock.
- Allow a process to hold a block for a certain amount of time.
- Apply a different coherence algorithm to each block.
What do those solutions require users to do? Are there any perfect solutions?
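The second solution, holding a block for a certain amount of time, can be sketched as a simple guard evaluated before a block is handed to another node. The sketch uses logical time and assumes `now` is never earlier than the arrival time. `struct held_block`, `min_hold`, and `may_release` are hypothetical names.

```c
#include <assert.h>

struct held_block {
    unsigned long acquired_at;  /* logical time the block arrived at this node */
    unsigned long min_hold;     /* hypothetical minimum holding time */
};

/* Returns 1 if the block may be given to another node at time `now`. */
static int may_release(const struct held_block *b, unsigned long now)
{
    assert(now >= b->acquired_at);
    return now - b->acquired_at >= b->min_hold;
}
```

A request that arrives too early is delayed rather than refused, so the holder gets some useful work done before the block moves again. Choosing the holding time is left to the system or the user, which is part of what the slide's closing questions point at.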

CSS434 DSM23 Paper Review by Students
Systems:
- IVY
- Dash
- Munin
- Linda/Jini/JavaSpace
Discussions:
- Classify which system is based on sequential consistency, release consistency, and lazy release consistency.
- Classify the shared data granularity of these systems: cache-line based, page-based, and object-based.
- Classify the implementation of these systems: hardware implementation, OS implementation, and user-level implementation.

CSS434 DSM24 Non-Turn-In Exercises
1. Is the memory underlying the following execution of two processes sequentially consistent (assuming that, initially, all variables are set to zero)?
    P1: R(x)1; R(x)2; W(y)1
    P2: W(x)1; R(y)1; W(x)2
2. Show that the following history is not causally consistent.
    P1: W(a)0; W(a)1
    P2: R(a)1; W(b)2
    P3: R(b)2; R(a)0
3. Explain the relationship between false sharing and data granularity in DSM.

CSS434 DSM25 Non-Turn-In Exercises
4. There is a DSM system that is based on the write-invalidation protocol, uses a fixed distributed-server algorithm for locating a given data item, and consists of three processors: 1, 2, and 3. Each processor has the following data items and an ownership/sharing-processor table.
Figure (diagram): Processors 1, 2, and 3, each with an ownership table (columns: addr, owner, shared) and its data items. Processor 1 holds addr 0 and addr 1; Processor 2 holds addr 3, addr 7, and addr 8; Processor 3 holds addr 2, addr 4, and addr 6. A copy of addr 1 is also shown.

CSS434 DSM26 Non-Turn-In Exercises
Given the following sequence of memory accesses, draw additional arrows and circles in the above figure as instructed. To distinguish which arrow corresponds to which operation, add the operation number (1 to 8) to each arrow. Also, update the corresponding ownership table entries.
(1) Memory access #1: Processor 2 reads data from address 2.
    Add arrows in the above figure to indicate the operations required for memory access #1:
    1. Send a query to search for address 2.
    2. Send a request to read from address 2.
    3. Read data from address 2 to Processor 2.
    Update the corresponding ownership table entry. (Just add P2 in the "share" field.)
    Draw a circle to indicate that a copy of address 2 was created on Processor 2.
(2) Memory access #2: Processor 1 reads data from address 2.
    Add arrows in the above figure to indicate the operations required for memory access #2:
    4. Send a query to search for address 2.
    5. Send a request to read from address 2.
    6. Read data from address 2 to Processor 1.
    Update the corresponding ownership table entry. (Just add P1 in the "share" field.)
    Draw a circle to indicate that a copy of address 2 was created on Processor 1.
(3) Memory access #3: Processor 2 writes data to address 2.
    Add arrows in the above figure to indicate the operations required for memory access #3:
    7. Send a request to update the ownership information on address 2.
    8. Send a write invalidation to all non-owner processors sharing address 2.
    Update the corresponding ownership table entry. (Make Processor 2 the new owner of address 2 and cross out all other processor IDs in the entry.)
    Cross out all circles to indicate that the old copies of address 2 were all invalidated.