Lightweight Logging for Lazy Release Consistent DSM. Costa et al. CS 717, 11/01/01.



Definition of an SDSM
- In a software distributed shared memory (SDSM), each node runs its own operating system and has its own local physical memory.
- Each node runs a local process; together, these processes form the parallel application.
- The union of the local memories of these processes forms the global memory of the application.
- The global memory appears as one virtual address space: a process accesses all memory locations in the same manner, using standard loads and stores.

Basic Implementation of an SDSM
- The virtual address space is divided into memory pages, which are distributed among the local memories of the different processes.
- Each node has a copy of the page-to-node assignments.
- We use the hardware's virtual memory support (page tables and faults) to provide the appearance of shared memory.
- The SDSM system is implemented as a set of fault handler routines (a sketch follows).
- Such a system is also called an SVM (shared virtual memory) system.
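As a rough sketch of this mechanism (not TreadMarks' actual code), a user-level SDSM on Unix can catch accesses to protected pages with a SIGSEGV handler and use mprotect to grant access once the page has arrived; page_owner and fetch_page below are hypothetical helpers, and a real system would also have to deal with signal safety and concurrent requests.

    #include <signal.h>
    #include <stdint.h>
    #include <sys/mman.h>

    #define DSM_PAGE_SIZE 4096

    /* Hypothetical helpers: find the node that owns a page, and fetch a
       copy of the page (and write permission) from that node. */
    extern int  page_owner(uintptr_t page_addr);
    extern void fetch_page(int owner, void *page_addr, int want_write);

    static void dsm_fault_handler(int sig, siginfo_t *si, void *ctx)
    {
        (void)sig; (void)ctx;
        uintptr_t page = (uintptr_t)si->si_addr & ~(uintptr_t)(DSM_PAGE_SIZE - 1);

        /* Ask the current owner for a copy of the faulting page. */
        fetch_page(page_owner(page), (void *)page, /*want_write=*/1);

        /* Grant local access; the faulting load/store is then retried. */
        mprotect((void *)page, DSM_PAGE_SIZE, PROT_READ | PROT_WRITE);
    }

    void install_dsm_handler(void)
    {
        struct sigaction sa = {0};
        sa.sa_sigaction = dsm_fault_handler;
        sa.sa_flags = SA_SIGINFO;
        sigaction(SIGSEGV, &sa, NULL);
    }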

Illustration
[Figure: nodes N1, N2, N3 each holding some of the pages P1 through P5; the same virtual page might appear in multiple physical pages, on multiple nodes.]

SDSM Operation
If N2 attempts to write x on page P2:
- P2 is marked as invalid in N2's page table, so the access causes a fault.
- The fault handler checks the page-to-node map, then requests that N3 send it P2.
- N3 sends the page and notifies all nodes of the change.
- N3 sets its page access to "invalid".
- N2 sets its page access to "read/write".
- The handler returns.
Multiple nodes can have the same page in their physical address space if the page is "read-only" for all of them, but only one node can hold a copy of a page that is "read/write".

Page Size Granularity
- Memory access is managed at the granularity of an OS page.
- Easy to implement, but can be very inefficient.
- If a node exhibits poor spatial locality, there is a lot of unnecessary data transfer.
- If x and y are on the same page P, and N1 is repeatedly writing to x while N2 is writing to y, P will be continually sent back and forth between N1 and N2: this is false sharing (a sketch follows).
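A minimal sketch of false sharing, assuming the linker happens to place the two unrelated variables on the same OS page:

    /* x and y are logically unrelated, but if the linker places them on
       the same OS page, that page ping-pongs between the two nodes even
       though neither node ever reads the other's variable. */
    int x;   /* updated repeatedly by the process on N1 */
    int y;   /* updated repeatedly by the process on N2 */

    void n1_work(void) { for (int i = 0; i < 1000000; i++) x++; }
    void n2_work(void) { for (int i = 0; i < 1000000; i++) y++; }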

Sequential Consistency
Defined by Lamport as: a multiprocessor is sequentially consistent if the result of any execution is the same as if the operations of all the processors were executed in some sequential order, and the operations of each individual processor occur in this sequence in the order specified by its program.

Is this SDSM Sequentially Consistent?
Assume a and b are on pages P1 and P2, respectively, and both start at 0.

    N1:  a = 1;  print b
    N2:  b = 1;  print a

If N2 does not invalidate its copy of P1 before printing a, and the invalidation of N1's copy of P2 arrives only after N1 has printed b, then both prints output 0, which is invalid under SC (under SC, at least one of the prints must output 1).

Ensuring Sequential Consistency
- For the system to be SC, N1 must ensure that N2 has invalidated its copy of a page before N1 can write to that page.
- Before a write, N1 must tell N2 to invalidate its copy of the page, and then wait for N2 to acknowledge that it has done so.
- Of course, if we know that N2's copy is already invalidated, we don't need to do this: N2 could not have re-obtained access without N1's copy being invalidated.

Ping-Pong Effect
SC, combined with the large sharing granularity (an OS page), can lead to the ping-pong effect: substantial, expensive communication due to false sharing.

A Problem With SC
- N1 is continually writing to x while N2 is continually reading from y, both on the same page P.
- N2 has P in "read-only"; N1 has P in "read-only".
- N1 attempts to write to x, faults, and tells N2 to go to "invalid".
- N1 waits for N2 to go to "invalid", N1 goes to "read/write", N1 does the write.
- N2 tries to read, faults, tells N1 to go to "read-only" and send its current copy of P; N2 goes to "read-only".
- N2 gets P and does the read.

Ping-Pong Effect
[Figure: message timeline between N1 and N2; every write triggers an invalidate/ack exchange and every read a request/reply, so P cycles endlessly between read-only and read/write on the two nodes.]

Relaxing the Consistency Model
- The memory consistency model specifies constraints on the order in which memory operations appear to execute with respect to each other.
- Can we relax the consistency model to improve performance?

Release Consistency
- Certain operations are designated as 'acquire' and 'release' operations.
- Code below an acquire can never be moved above the acquire.
- Code above a release can never be moved below the release.
- As long as there are no race conditions, the behavior of a program is the same under RC as under SC (a usage sketch follows).
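A minimal usage sketch, assuming the DSM exports lock_acquire and lock_release (hypothetical names) as its acquire and release operations:

    /* Hypothetical synchronization primitives exported by the DSM. */
    extern void lock_acquire(int lock_id);
    extern void lock_release(int lock_id);

    int shared_counter;          /* lives in the global (shared) memory */

    void increment(void)
    {
        lock_acquire(0);         /* acquire: later accesses cannot move above this   */
        shared_counter++;        /* protected access to shared data                  */
        lock_release(0);         /* release: earlier accesses cannot move below this */
    }

Because the counter is only touched between the acquire and the release, any reordering RC permits is invisible to a data-race-free program.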

RC Illustration
[Figure: code segments I, acquire, II, release, III; segment II cannot move outside the acquire/release pair, while I may move below the acquire and III may move above the release.]

Lazy Release Consistency (LRC)
- For a system to be RC, it must ensure that all memory writes above a release become visible before that release becomes visible.
- That is, before issuing a release, it must invalidate all other copies of the pages it wrote.
- Can we relax this further?

LRC
- LRC is a further relaxation: don't invalidate pages until absolutely necessary.
- N1: I, acquire, II, release
- N2: III, acquire, IV, release
- Only when N2 is about to issue an acquire does N1 ensure that all changes it made before its release are visible.
- N1 invalidates N2's copy of the pages before N2 completes its acquire.

Illustration
[Figure: timelines for eager RC vs. LRC between N1 and N2; under RC each release triggers an invalidate/ack exchange, while under LRC the invalidations are deferred until N2's next acquire.]

TreadMarks
- A high-performance SDSM that implements LRC.
- Keleher, Cox, Dwarkadas, and Zwaenepoel, 1994.

Intervals
- The execution of each process is divided into intervals, each beginning at a synchronization access (acquire or release).
- Intervals form a partial order: intervals on the same process are totally ordered, and interval x precedes y if the release that ended x corresponds to the acquire that began y.
- When a process begins a new interval, it creates a new IntervalRecord.

Vector Clocks
- Each process also keeps a current vector clock VC.
- If VC_N is process N's vector clock, VC_N(M) is the most recent interval of process M that process N knows about.
- VC_N(N) is therefore the current interval of process N (a sketch follows).
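A minimal sketch of this bookkeeping; NPROCS and the field names are assumptions for illustration, not TreadMarks' declarations:

    #define NPROCS 8                     /* assumed number of processes */

    typedef struct {
        unsigned int t[NPROCS];          /* t[M] = latest interval of process M known here */
    } VectorClock;

    /* a "covers" b if a knows at least as much as b about every process. */
    static int vc_covers(const VectorClock *a, const VectorClock *b)
    {
        for (int m = 0; m < NPROCS; m++)
            if (a->t[m] < b->t[m])
                return 0;
        return 1;
    }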

Interval Records
An IntervalRecord is a structure containing:
- the pid of the process that created this record,
- the vector-clock timestamp of when this interval was created,
- a list of WriteNotices.

Write Notices
A WriteNotice is a record containing:
- the page number of the page written to,
- a diff showing the changes made to this page,
- a pointer to the corresponding IntervalRecord.
(A sketch of both records follows.)
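A sketch of both records as C structs, under the same assumptions as the vector-clock sketch above; the field names and the flat diff representation are illustrative only:

    #include <stddef.h>

    #define NPROCS 8
    typedef struct { unsigned int t[NPROCS]; } VectorClock;   /* as sketched above */

    typedef struct IntervalRecord IntervalRecord;

    typedef struct WriteNotice {
        unsigned int        page_num;   /* page written to during the interval          */
        void               *diff;       /* encoded changes; may stay NULL until fetched */
        size_t              diff_len;
        IntervalRecord     *interval;   /* back-pointer to the owning IntervalRecord    */
        struct WriteNotice *next;
    } WriteNotice;

    struct IntervalRecord {
        int             pid;        /* process that created this record               */
        VectorClock     vc;         /* vector-clock timestamp when the interval began */
        WriteNotice    *notices;    /* write notices produced in this interval        */
        IntervalRecord *next;       /* per-process list of interval records           */
    };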

Acquiring A Lock
- When N1 wants to acquire a lock, it sends its current vector clock to the lock manager.
- The lock manager forwards this message to the last process that acquired this lock (assume N2).

N2 replies (to N1) with all the IntervalRecords that have a timestamp between the VC sent by N1 and the VC of the IR that ended with the most recent release of that lock

- N1 receives the IntervalRecords from N2.
- N1 stores these IntervalRecords in volatile memory.
- N1 invalidates all pages for which it received a WriteNotice (in the IRs).
- On a page fault, N1 obtains a copy of the page and then applies all the diffs for that page in interval order.
- If N1 is about to write to that page, it first makes a copy of it (so that it can later compute the diff of its own changes).
(A sketch of the acquire path follows.)
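A sketch of the acquiring side of this protocol, reusing the record types sketched above; lock_manager, send_to, recv_interval_records, invalidate_page, and vc_merge are hypothetical helpers standing in for the real messaging layer:

    /* VectorClock, IntervalRecord, WriteNotice as sketched above.
       The helpers below are hypothetical. */
    extern VectorClock my_vc;                       /* this process's current VC   */
    extern int  lock_manager(int lock_id);          /* node id of the lock manager */
    extern void send_to(int node, int msg, const void *buf, unsigned long len);
    extern IntervalRecord *recv_interval_records(void);
    extern void invalidate_page(unsigned int page_num);   /* e.g. mprotect(PROT_NONE) */
    extern void vc_merge(VectorClock *dst, const VectorClock *src);
    #define ACQ_REQUEST 1

    void acquire_lock(int lock_id)
    {
        /* Send our current vector clock to the lock manager, which forwards
           the request to the last process that held this lock. */
        send_to(lock_manager(lock_id), ACQ_REQUEST, &my_vc, sizeof my_vc);

        /* The last holder replies with every IntervalRecord we have not yet seen. */
        for (IntervalRecord *ir = recv_interval_records(); ir; ir = ir->next) {
            /* Invalidate each page named in a write notice; the diff itself is
               fetched and applied lazily, at the next fault on that page. */
            for (WriteNotice *wn = ir->notices; wn; wn = wn->next)
                invalidate_page(wn->page_num);
            vc_merge(&my_vc, &ir->vc);              /* advance our knowledge of ir->pid */
        }
    }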

Example
[Figure: timeline of N1, N2, N3 passing a lock; each acquire pulls IntervalRecords/diffs, the needed diffs are requested and applied on demand, and each node then writes page P and releases.]

Example (cont.)
If N1 were to issue another acquire, it would only have to apply the diffs of the IntervalRecords with timestamps later than its current vector clock; the intervals it already learned about do not have to be applied again.

Improvement: Garbage Collection
- Each node keeps a log of all the shared-memory writes it made, along with all the writes it needed to know about.
- At a barrier, the nodes can synchronize so that each node has the most up-to-date copy of its pages; the logs can then be discarded.

Improvement: Sending Diffs
- Notice that if N1 writes to pages P1, P2, P3 during an interval and N2 acquires the lock next, N1 would need to send all three diffs to N2, regardless of whether N2 will actually need those pages.
- In fact, N1 does not send the diffs; it sends a pointer to the location in its local memory where each diff is stored.
- If N2 needs to apply a diff, it requests that diff from N1 using the pointer.
(A sketch of this lazy fetch follows.)
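A sketch of the lazy fetch on the fault path; pending_notice_count, pending_notice, request_diff, and apply_diff are hypothetical helpers:

    /* WriteNotice as sketched above; the helpers are hypothetical. */
    extern int          pending_notice_count(unsigned int page_num);
    extern WriteNotice *pending_notice(unsigned int page_num, int i);   /* in interval order */
    extern void        *request_diff(int from_pid, const WriteNotice *wn);
    extern void         apply_diff(void *page_addr, const void *diff, unsigned long len);

    /* On a page fault, pull and apply only the diffs this page actually needs. */
    void apply_pending_diffs(unsigned int page_num, void *page_addr)
    {
        int n = pending_notice_count(page_num);
        for (int i = 0; i < n; i++) {
            WriteNotice *wn = pending_notice(page_num, i);
            if (wn->diff == 0)                                   /* we only hold a pointer so far */
                wn->diff = request_diff(wn->interval->pid, wn);  /* fetch the diff on demand      */
            apply_diff(page_addr, wn->diff, wn->diff_len);
        }
    }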

Adding Fault Tolerance
- Assume we would like the ability to survive single-node failures (only one node fails at a time, but multiple failures may occur over the run of the application).
- What information would we need to log, and where?
- Remember, we already log IntervalRecords and WriteNotices as part of the normal operation of TreadMarks.

Ni fails and then restarts
- If Ni then acquires a lock, it must see the same version of each page that it saw during the original run.
- Therefore Nj must send it the same WriteNotices (diffs) as before, even though Nj's current version of the page might be very different and Nj's vector clock has also advanced.

Example
If N3 is restarted, then when it reissues the acquire it must receive the same set of WriteNotices as it did during its original run. If we run the algorithm unmodified, N3 would instead receive WriteNotices reflecting the current state, and the application would be incorrect.
[Figure: timeline of N1, N2, N3 exchanging IntervalRecords around acquire/write/release operations; X marks the point where N3 fails.]

Send-Log
- Therefore, N2 needs some way of recording which IntervalRecords it had sent to N3.
- It does this by storing the VC that N3 sent with its acquire request, together with N2's own VC at the time it received the request.
- This pair is stored in N2's send-log.
- From these two VCs, N2 can determine which IntervalRecords it had sent to N3.
(A sketch of the send-log follows.)
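A sketch of what a send-log entry might contain and how it could be recorded when an acquire request is served; the names are illustrative, not the paper's:

    #include <stdlib.h>

    typedef struct { unsigned int t[8]; } VectorClock;   /* as in the earlier sketch */

    /* One send-log entry: enough to replay exactly which IntervalRecords
       were sent in response to a given acquire request. */
    typedef struct SendLogEntry {
        int         requester;   /* pid of the acquiring process (N3 in the example)  */
        VectorClock req_vc;      /* the requester's VC carried in the acquire request */
        VectorClock my_vc;       /* our own VC when we served the request             */
        struct SendLogEntry *next;
    } SendLogEntry;

    static SendLogEntry *send_log;   /* kept in volatile memory */

    void log_send(int requester, const VectorClock *req_vc, const VectorClock *cur_vc)
    {
        SendLogEntry *e = malloc(sizeof *e);    /* sketch: no error handling */
        e->requester = requester;
        e->req_vc    = *req_vc;
        e->my_vc     = *cur_vc;
        e->next      = send_log;
        send_log     = e;
    }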

Example
[Figure: timeline of N1, N2, N3 with an acquire/write/release sequence; N1's send-log records an entry for N2 and N2's send-log records an entry for N3, each keyed by the vector clocks exchanged at the acquire.]

Restart
- When N3 restarts, it reissues the acquire with its old vector time.
- N2 looks in its send-log and finds the entry recording the VC of N3's original request and N2's own VC at that time, so it sends the IntervalRecords of all the intervening intervals.
- Therefore, N3 receives the same diffs as it did before.

Logging, cont.
- Is the send-log sufficient to provide the level of fault tolerance we wanted?
- Imagine N2 had failed and then restarted; could we then survive the failure of N3?

Logging
- No, we could not survive the subsequent failure of N3, because N2 would no longer have its send-log.
- We also need a way to recreate N2's send-log.

Receive-Log
- On every acquire, N logs in its receive-log its vector time before the acquire and its new vector time after seeing the IntervalRecords sent to it by M.
- If M fails, M's send-log can be recreated from N's receive-log.
(A sketch of the receive-log follows.)
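A corresponding sketch of a receive-log entry, recorded on every acquire; again the names are illustrative:

    #include <stdlib.h>

    typedef struct { unsigned int t[8]; } VectorClock;   /* as in the earlier sketch */

    /* One receive-log entry: the two vector times bracketing an acquire,
       which is enough to rebuild the sender's send-log if it fails. */
    typedef struct RecvLogEntry {
        int         sender;      /* pid of the process that supplied the IntervalRecords */
        VectorClock vc_before;   /* our VC just before the acquire                        */
        VectorClock vc_after;    /* our VC after applying the received IntervalRecords    */
        struct RecvLogEntry *next;
    } RecvLogEntry;

    static RecvLogEntry *recv_log;

    void log_receive(int sender, const VectorClock *before, const VectorClock *after)
    {
        RecvLogEntry *e = malloc(sizeof *e);    /* sketch: no error handling */
        e->sender    = sender;
        e->vc_before = *before;
        e->vc_after  = *after;
        e->next      = recv_log;
        recv_log     = e;
    }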

Example
[Figure: the same timeline as before, now with N2's receive-log recording an entry for N1 and N3's receive-log recording an entry for N2, alongside the send-logs.]

- If N2 were to fail, it would get restarted.
- N1's send-log ensures that N2 sees the same pages as it did originally.
- When, in the future, N3 sees a VC time from N2 later than the one in its receive-log (with respect to N2), it forwards the information in its receive-log to N2.
- N2 then recreates its send-log.
- We can now survive future failures.

Checkpointing
- When we arrive at a garbage-collection point, we could checkpoint all processes.
- This minimizes rollback, lets the system survive concurrent failures, and empties the logs.

Results

Results 2
[Table: log size (MB) and average checkpoint size (MB) for the Water, SOR, and TSP applications.]

Results 3