Lazy Release Consistency for Software Distributed Shared Memory Pete Keleher Alan L. Cox Willy Z.

Lazy Release Consistency for Software Distributed Shared Memory Pete Keleher Alan L. Cox Willy Z. By Nooruddin Shaik.

Overview
- Software DSM
- Release Consistency
- Eager Release Consistency
- Lazy Release Consistency
- Conclusion

Software DSM
- Provides a shared address space using software support
- Relies on (user-level) memory management techniques to detect accesses and updates to shared data
- Memory coherence protocol maintains the illusion of shared memory
- High communication overhead and large, page-sized coherence units
- Sending messages is expensive in software DSM

Release Consistency
- Extension of weak consistency
- Weak Consistency
  - Synchronization: globally update memory
  - Local changes are propagated to all processors
- Release Consistency
  - Propagates only locked memory, as needed

RC – Shared Memory Accesses
- Shared Memory Accesses
  - Ordinary
  - Special
    - Sync
      - Acquire
      - Release
    - Nsync

RC – Formal Definition
- A system is release consistent if:
  - Before an ordinary access is allowed to perform with respect to any other processor, all previous acquires must be performed.
  - Before a release is allowed to perform with respect to any other processor, all previous ordinary reads and writes must be performed.
  - Special accesses are sequentially consistent with respect to one another.

Eager Release Consistency (based on Munin’s write-shared protocol)
- Release
  - Modifications are propagated at the release
- Invalidate protocol
  - Sends invalidations
- Update protocol
  - Sends diffs, which limit the amount of data exchanged

Eager Release Consistency (contd.)
- Acquire
  - No consistency-related operations
  - Protocol locates the processor that last executed a release on the same variable
- Access Miss
  - Message is sent to the directory manager
  - Directory manager forwards the request to the current owner

Eager Release Consistency
[Figure: Repeated updates of cached copies in eager RC. Timelines for P1 to P4 with the events w(x), rel, acq, w(x), rel, acq, r(x).]

Lazy Release Consistency
- Rather than eagerly “syncing up” data at the release point, why not “lazily” wait until the subsequent acquire?
- Propagation of modifications is postponed until the time of an acquire.
- To do so, the happened-before-1 partial order is used.

Lazy Release Consistency
[Figure: Message traffic in LRC. Timelines for P1 to P4 with the events w(x), rel, acq, w(x), rel, acq, r(x).]

happened-before-1 Partial Order
- Shared memory accesses are partially ordered by happened-before-1, denoted $\xrightarrow{hb1}$, defined as follows:
  - If $a_1$ and $a_2$ are accesses on the same processor, and $a_1$ occurs before $a_2$ in program order, then $a_1 \xrightarrow{hb1} a_2$.
  - If $a_1$ is a release on processor $p_1$, and $a_2$ is an acquire on the same location on processor $p_2$, and $a_2$ returns the value written by $a_1$, then $a_1 \xrightarrow{hb1} a_2$.
  - If $a_1 \xrightarrow{hb1} a_2$ and $a_2 \xrightarrow{hb1} a_3$, then $a_1 \xrightarrow{hb1} a_3$.
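The three rules above can be sketched as a small closure computation. This is only an illustration: the function name, the access labels, and the input layout (a per-processor list of accesses plus a list of release/acquire pairs) are assumptions made for the example, not part of the presentation.

```python
from itertools import product

def happened_before_1(program_order, release_acquire_pairs):
    """Return the set of (a, b) pairs such that a happened-before-1 b.

    program_order: dict mapping a processor name to its accesses in program order.
    release_acquire_pairs: (release, acquire) pairs where the acquire returns
    the value written by the release (hypothetical input format).
    """
    edges = set()
    # Rule 1: program order on each processor.
    for accesses in program_order.values():
        for i in range(len(accesses)):
            for j in range(i + 1, len(accesses)):
                edges.add((accesses[i], accesses[j]))
    # Rule 2: a release ordered before the acquire that reads its value.
    edges.update(release_acquire_pairs)
    # Rule 3: transitivity, computed as a naive fixed point.
    changed = True
    while changed:
        changed = False
        for (a, b), (c, d) in product(list(edges), repeat=2):
            if b == c and (a, d) not in edges:
                edges.add((a, d))
                changed = True
    return edges

# Hypothetical three-processor lock chain.
order = {
    "p1": ["p1.w(x)", "p1.rel"],
    "p2": ["p2.acq", "p2.w(x)", "p2.rel"],
    "p3": ["p3.acq", "p3.r(x)"],
}
pairs = [("p1.rel", "p2.acq"), ("p2.rel", "p3.acq")]
hb1 = happened_before_1(order, pairs)
```

In this example the write on p1 is ordered before the read on p3 only through the chain of release/acquire pairs, which is the ordering LRC relies on when deciding what must be visible at an acquire.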

Write Notices
- RC requires that before a processor may continue past an acquire, all shared accesses that precede the acquire must be performed at the acquiring processor.
- In LRC, this is guaranteed by write notices.
- Write notice
  - An indication that a page has been modified in a particular interval

Write Notice Propagation
- Execution of each processor is divided into intervals.
- A new interval begins each time the processor executes a special access.
- An interval is performed at a processor when all modifications made during that interval have been performed at that processor.
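As a rough sketch of the interval rule above, the following splits one processor's access trace into intervals. The trace format and the labels "acq" and "rel" are assumptions for illustration; only acquires and releases are treated as special accesses here.

```python
SPECIAL_ACCESSES = {"acq", "rel"}  # assumed labels for special accesses

def split_into_intervals(trace):
    """Split one processor's access trace into intervals.

    A new interval begins at every special access. Interval 0 holds whatever
    precedes the first special access and may be empty.
    """
    intervals = [[]]
    for access in trace:
        if access in SPECIAL_ACCESSES:
            intervals.append([access])
        else:
            intervals[-1].append(access)
    return intervals

example_intervals = split_into_intervals(["w(x)", "rel", "acq", "r(x)"])
```

Here `example_intervals` has three entries: the write, the interval opened by the release, and the interval opened by the acquire that contains the read.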

Write Notice Propagation
[Figure: Timelines for P1 to P4 with the events w(x), rel, acq, w(x), rel, acq, r(x), with the intervals $i_{p1}$, $i_{p2}$, $i_{p3}$, and $i_{p4}$ marked.]

Write Notice Propagation
- $V_p(i)$: vector timestamp for interval $i$ of processor $p$
- Number of elements in $V_p(i)$ = number of processors
- Entry for $p$ in $V_p(i)$ = $i$
- Entry for $q$ in $V_p(i)$ = most recent interval of $q$ performed at $p$

Write Notice Propagation
- $V_{p1}(i_{p1}) = \{i_{p1}, 0, 0, 0\}$
- $V_{p2}(i_{p2}) = \{i_{p1}, i_{p2}, 0, 0\}$
- $V_{p3}(i_{p3}) = \{0, i_{p2}, i_{p3}, 0\}$
- $V_{p4}(i_{p4}) = \{0, 0, i_{p3}, i_{p4}\}$
- On an acquire, the acquiring processor p3 sends its current vector timestamp to the previous releaser p2.
- Processor p2 uses this information to send p3 the write notices for all intervals of all processors that have been performed at p2 but not at p3.
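The selection step on the releaser can be sketched as a comparison of two vector timestamps. This is a minimal illustration under stated assumptions: intervals are numbered from 1 (0 means "none performed yet"), processors are identified by list index, and the `notices` table and page name are hypothetical.

```python
def write_notices_to_send(releaser_vt, acquirer_vt, notices):
    """Pick the write notices the releaser should send to the acquirer.

    releaser_vt, acquirer_vt: vector timestamps as lists of interval numbers,
    one entry per processor.
    notices: dict mapping (processor_index, interval) to the set of pages
    modified in that interval, as known at the releaser (assumed layout).
    """
    to_send = {}
    for q, (performed_here, performed_there) in enumerate(zip(releaser_vt, acquirer_vt)):
        # Intervals of q performed at the releaser but not yet at the acquirer.
        for interval in range(performed_there + 1, performed_here + 1):
            pages = notices.get((q, interval), set())
            if pages:
                to_send[(q, interval)] = pages
    return to_send

# Hypothetical state: the releaser has performed interval 1 of processors 0 and 1.
known_notices = {(0, 1): {"page_x"}, (1, 1): {"page_x"}}
sent_to_fresh = write_notices_to_send([1, 1, 0, 0], [0, 0, 1, 0], known_notices)
sent_to_partial = write_notices_to_send([1, 1, 0, 0], [1, 0, 1, 0], known_notices)
```

An acquirer that has already performed processor 0's interval receives only the notice for processor 1, which is how the timestamp comparison avoids resending information.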

Data Movement Protocols
- Multiple Writer Protocol
- False Sharing
  - Occurs when two or more processors access different variables within a page, with at least one of the accesses being a write
  - Generates a large amount of message traffic
  - Handling false sharing is important for software DSM because of the large page size
- LRC allows a multiple writer protocol:
  - Allows concurrent writes to different parts of the page
  - No message traffic
  - Modifications are merged using diffs
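Merging concurrent writes with diffs can be sketched as follows. The sketch assumes each writer keeps an unmodified copy of the page (often called a twin in the DSM literature; the slides do not name it) and encodes a diff as a map from byte offset to new value. Function names and the 8-byte page are illustrative only.

```python
def make_diff(twin, page):
    """Record the bytes that differ between the unmodified twin and the written page."""
    return {offset: new for offset, (old, new) in enumerate(zip(twin, page)) if old != new}

def apply_diff(page, diff):
    """Apply a diff to a mutable page copy in place."""
    for offset, value in diff.items():
        page[offset] = value

# Two writers modify different parts of the same (tiny, hypothetical) page.
original = bytes(8)

copy_a = bytearray(original)
copy_a[0:2] = b"\x01\x02"          # writer A touches bytes 0-1
copy_b = bytearray(original)
copy_b[6:8] = b"\x03\x04"          # writer B touches bytes 6-7

diff_a = make_diff(original, copy_a)
diff_b = make_diff(original, copy_b)

# A third processor merges both diffs into its own copy of the page.
merged = bytearray(original)
apply_diff(merged, diff_a)
apply_diff(merged, diff_b)
```

Because the two diffs touch disjoint offsets, the order in which they are applied does not matter, and neither writer had to send the whole page.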

Invalidate vs. Update
- Invalidate
  - The acquiring processor invalidates all pages in its cache for which it receives write notices.
- Update
  - The acquiring processor updates those pages.
  - Diffs must be obtained for all concurrent modifiers.
  - For interval $i$, diffs must be obtained from all intervals $j$ such that $j \xrightarrow{hb1} i$ and there exists no $k$ such that $j \xrightarrow{hb1} k \xrightarrow{hb1} i$.

Access Misses
- A copy of the page, as well as a number of diffs, may have to be retrieved.
- Modifications summarized by the diffs are merged before the access proceeds.
- Access miss:
  - At interval $i$, diffs must be obtained from all intervals $j$ such that $j \xrightarrow{hb1} i$ and there exists no $k$ such that $j \xrightarrow{hb1} k \xrightarrow{hb1} i$.
- If the processor holds an invalidated copy of the page:
  - The whole page is not sent.
  - Write notices contain all the information needed to identify the required diffs.
  - This reduces the amount of data sent.
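The selection rule for diffs can be sketched as a filter over a happened-before-1 relation. Two assumptions are made here and are not stated on the slide: the relation is given as a transitively closed set of (earlier, later) pairs, and both $j$ and $k$ range only over the intervals that modified the page in question. The interval names are hypothetical.

```python
def intervals_to_fetch(i, modifiers, hb1):
    """Return the intervals whose diffs are needed at interval i.

    modifiers: intervals that modified the page (assumed candidate set).
    hb1: set of (earlier, later) pairs, assumed to be transitively closed.
    An interval j qualifies if j precedes i and no other modifier k lies
    between j and i in the hb1 order.
    """
    predecessors = [j for j in modifiers if (j, i) in hb1]
    return [
        j for j in predecessors
        if not any((j, k) in hb1 and (k, i) in hb1 for k in modifiers)
    ]

# Hypothetical history: a precedes b, both precede i; c is concurrent with a and b.
hb1_relation = {("a", "b"), ("a", "i"), ("b", "i"), ("c", "i")}
needed = intervals_to_fetch("i", ["a", "b", "c"], hb1_relation)
```

In this example interval `a` is skipped because `b` lies between it and `i`, while `b` and `c` are both kept because they are concurrent last modifiers.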

Conclusion
- Performance of software DSM is sensitive to the number of messages and the amount of data exchanged to create the shared memory abstraction.
- LRC aims to reduce both the number of messages and the amount of data exchanged by allowing changes to propagate lazily, only when needed.