Implementation and Performance of Munin (Distributed Shared Memory System) Dongying Li Department of Electrical and Computer Engineering University of Toronto (Original Authors: J. B. Carter, et al.) ECE 1147, Parallel Computation Oct. 30, 2006

2 Distributed Shared Memory Shared address space spanning the processors of a distributed memory multiprocessor proc1proc3 X=0 proc2 X=0

3 Distributed Shared Memory mem0 proc0 mem1 proc1 mem2 proc2 memN procN network... shared memory

4 Distributed Shared Memory Challenges –Good performance comparable to shared memory programs –No significant deviation from shared memory coding model –Low communication and message passing overheads

5 Munin System Characteristic features –Software release consistency –Multiple consistency protocols Deviations from the shared memory model –Shared variables annotated with their access pattern –All synchronization visible to the system

6 Contents Basic concepts –Shared object –Software release consistency –Multiple consistency protocols Software implementation –Prototype overview –Execution process –Advanced programming features –Data object directory and delayed update queue –Synchronization Performance Overview of other DSM systems Conclusion

7 Basic Concepts

8 Shared Object x y x x 8-kilo

9 Software Release Consistency Sequential Consistency –All processors observe the same order –Must correspond to some serial order –The only ordering constraint is that each processor's reads/writes appear in its own program order; there is no restriction on relative ordering between processors. Synchronous read/write –Writes must be propagated before moving on to the next operation

10 Software consistency Problems –Message passing overhead –False sharing w(x) r(y) r(x) w(x)

11 Weak Consistency Data modifications only propagated at synchronization. Works fine if program properly synchronized through system primitives. w(x) r(y) r(x) synch w(x)

12 Weak Consistency w(x) r(y) r(x) synch

13 Software Release Consistency Special weak consistency protocol Reduction of message passing overhead Two categories of shared variable operations –Ordinary access Read Write –Synchronization access (lock, semaphore, barrier) Acquire Release

14 Software Release Consistency Before an ordinary access (read, write) is allowed, all previous acquires must be performed Before a release is allowed, all previous ordinary accesses must be performed Before an acquire is allowed, all previous releases must be performed Before a release is allowed, all previous acquires must be performed In short, the results of writes made before a release are propagated before the next processor acquires the released lock

15 Eager Release Consistency Writes are propagated at release

16 Lazy Release Consistency Writes are propagated at acquire
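The difference between the eager and lazy variants can be illustrated with a toy model. This is only a sketch under stated assumptions, not Munin's implementation: one shared integer is replicated on a fixed number of nodes, and the names `shared_int`, `si_write`, `si_release`, and `si_acquire` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define NODES 3

/* Hypothetical toy model: one shared integer replicated on NODES nodes. */
typedef struct {
    int  copy[NODES];     /* each node's local replica */
    int  released_value;  /* last value made visible by a release */
    bool lazy;            /* true: propagate at acquire; false: at release */
} shared_int;

void si_init(shared_int *s, bool lazy) {
    for (int n = 0; n < NODES; n++) s->copy[n] = 0;
    s->released_value = 0;
    s->lazy = lazy;
}

/* Ordinary write: touches only the writer's own replica. */
void si_write(shared_int *s, int node, int value) {
    assert(node >= 0 && node < NODES);
    s->copy[node] = value;
}

/* Release: in eager mode the value is pushed to every replica right away. */
void si_release(shared_int *s, int node) {
    s->released_value = s->copy[node];
    if (!s->lazy)
        for (int n = 0; n < NODES; n++) s->copy[n] = s->released_value;
}

/* Acquire: in lazy mode the acquirer pulls the last released value now. */
void si_acquire(shared_int *s, int node) {
    if (s->lazy) s->copy[node] = s->released_value;
}

int si_read(const shared_int *s, int node) { return s->copy[node]; }
```

In the eager mode every replica is updated at the release, whether or not its node ever acquires the lock; in the lazy mode only the node that performs the next acquire pays for the update.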

17 Multiple Consistency Protocols No single consistency protocol is suitable for all parallelization purposes Shared variables are accessed in different ways within a single program Variable access patterns change during execution Multiple protocols allow access-pattern-oriented tuning for different shared variables

18 Multiple Consistency Protocols High-level sharing pattern annotation –Specified in shared variable declaration –Combinations of low-level protocol parameters Low-level protocol parameter –Specified in shared variable directory –Specific aspect of protocol

19 Protocol Parameters
–I: Invalidate or update?
–R: Replicas allowed?
–D: Delayed operations allowed?
–FO: Fixed owner?
–M: Multiple writers allowed?
–S: Stable access pattern?
–FL: Flush changes to owner?
–W: Writable? (write protected?)

20 Sharing annotations Read-only –Simplest pattern: once initialized, never modified –Suitable for constants etc. Migratory –Only one thread accesses the object at any one time –Suitable for variables accessed only inside a critical section Write-shared –Can be written concurrently by multiple threads –Different threads update different words of the variable Producer-consumer –Written by only one thread and read by others –Replicate and update the object rather than invalidate it

21 Sharing annotations Example: producer-consumer
    /* grid and temp are assumed to be (n+1) x (n+1); only interior points are updated */
    for (t = 0; t < timesteps; t++) {    /* some number of timesteps/iterations */
        for (i = 1; i < n; i++)
            for (j = 1; j < n; j++)
                temp[i][j] = 0.25 * (grid[i-1][j] + grid[i+1][j] +
                                     grid[i][j-1] + grid[i][j+1]);
        for (i = 1; i < n; i++)
            for (j = 1; j < n; j++)
                grid[i][j] = temp[i][j];
    }

22 Sharing annotations Reduction –Accessed by fetch-and-operate (read, write, then release) –Example: min(), a++ Result –Phase 1: multiple writers allowed –Phase 2: one thread (the one collecting the result) accesses exclusively Conventional –Conventional update protocol for shared variables

23 Sharing annotations: protocol parameters for each sharing annotation
Sharing annotation   I   R   D   FO  M   S   FL  W
Read-only            N   Y   -   -   -   -   -   N
Migratory            Y   N   -   N   N   -   N   Y
Write-shared         N   Y   Y   N   Y   N   N   Y
Producer-consumer    N   Y   Y   N   Y   Y   N   Y
Reduction            N   Y   N   Y   N   -   N   Y
Result               N   Y   Y   Y   Y   -   Y   Y
Conventional         Y   Y   N   N   N   -   N   Y
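The annotation-to-parameter table above maps naturally onto a static lookup table. The following is a minimal sketch, assuming a tri-state value for the "-" entries; the type and function names (`pval`, `protocol_row`, `find_protocol`) are hypothetical and are not Munin's actual data structures.

```c
#include <assert.h>
#include <string.h>

/* Tri-state parameter value: the table uses Y, N and "-" (not applicable). */
typedef enum { PV_N = 0, PV_Y = 1, PV_NA = 2 } pval;

/* One row of the table: a sharing annotation and its eight parameters. */
typedef struct {
    const char *annotation;
    pval I, R, D, FO, M, S, FL, W;
} protocol_row;

static const protocol_row PROTOCOLS[] = {
    /* annotation           I     R     D      FO     M      S      FL     W   */
    {"read-only",         PV_N, PV_Y, PV_NA, PV_NA, PV_NA, PV_NA, PV_NA, PV_N},
    {"migratory",         PV_Y, PV_N, PV_NA, PV_N,  PV_N,  PV_NA, PV_N,  PV_Y},
    {"write-shared",      PV_N, PV_Y, PV_Y,  PV_N,  PV_Y,  PV_N,  PV_N,  PV_Y},
    {"producer-consumer", PV_N, PV_Y, PV_Y,  PV_N,  PV_Y,  PV_Y,  PV_N,  PV_Y},
    {"reduction",         PV_N, PV_Y, PV_N,  PV_Y,  PV_N,  PV_NA, PV_N,  PV_Y},
    {"result",            PV_N, PV_Y, PV_Y,  PV_Y,  PV_Y,  PV_NA, PV_Y,  PV_Y},
    {"conventional",      PV_Y, PV_Y, PV_N,  PV_N,  PV_N,  PV_NA, PV_N,  PV_Y},
};

/* Return the row for an annotation name, or NULL if it is unknown. */
const protocol_row *find_protocol(const char *annotation) {
    size_t rows = sizeof PROTOCOLS / sizeof PROTOCOLS[0];
    for (size_t i = 0; i < rows; i++)
        if (strcmp(PROTOCOLS[i].annotation, annotation) == 0)
            return &PROTOCOLS[i];
    return NULL;
}
```

A runtime could consult such a row on each access fault, for example checking `M` before allowing a second concurrent writer.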

24 Software Implementation

25 Prototype Overview A simple preprocessor that converts annotations to a suitable format A linker that creates the shared memory segment Library routines linked into the program Operating system support for fault handling and page table manipulation

26 Execution Process Compiling: sharing annotations -> Munin preprocessor -> auxiliary file -> linker -> shared data segment, shared data description table

27 Execution Process Initialization P1 P2 Pn.... Munin root thread Munin worker thread User_init() Code copy Data segment Code copy Data segment user root thread

28 Execution Process Synchronization P1 P2 Pn.... Munin root thread Munin worker thread Synchronization operation User thread

29 Advanced Programming Features Associate data & Synch msg acq(m) r(x) rel(m) msg acq(m) r(x) rel(m) w(x)

30 Advanced Programming Features PhaseChange() –Change the producer-consumer relationship –Example: adaptive mesh SOR ChangeAnnotation() –Change the access pattern during execution Invalidate() Flush() SingleObject() PreAcquire()

31 Data Object Directory Start address and size Protocol parameters Object state (valid, writable, invalid) Copyset (which remote nodes have copies) Synchq (corresponding synchronization object) Probable owner Home node Access control semaphore Links
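The directory fields listed above could be laid out roughly as follows. This is a sketch only: the field types, the 32-node bitmask for the copyset, and the helper names (`dir_add_copy`, `dir_has_copy`, `dir_invalidate_others`) are assumptions made for illustration, not taken from the Munin sources.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef enum { OBJ_INVALID, OBJ_VALID, OBJ_WRITABLE } obj_state;

/* Hypothetical layout of one data object directory entry. */
typedef struct dir_entry {
    void             *start;          /* start address of the object */
    size_t            size;           /* size in bytes */
    uint8_t           params;         /* protocol parameter bits (I, R, D, ...) */
    obj_state         state;          /* valid, writable, or invalid */
    uint32_t          copyset;        /* bit n set: node n holds a copy */
    int               synchq;         /* associated synchronization object, -1 if none */
    int               probable_owner; /* best guess at the current owner */
    int               home_node;      /* node where the object was created */
    int               access_sem;     /* stand-in for the access control semaphore */
    struct dir_entry *next;           /* link to the next entry */
} dir_entry;

static int valid_node(int node) { return node >= 0 && node < 32; }

/* Record that `node` now holds a copy of the object. */
void dir_add_copy(dir_entry *e, int node) {
    assert(valid_node(node));
    e->copyset |= UINT32_C(1) << node;
}

/* Does `node` currently hold a copy? */
bool dir_has_copy(const dir_entry *e, int node) {
    assert(valid_node(node));
    return ((e->copyset >> node) & 1u) != 0;
}

/* Drop every copy except the one on `keeper`; returns how many nodes
   would need an invalidation message. */
int dir_invalidate_others(dir_entry *e, int keeper) {
    assert(valid_node(keeper));
    uint32_t others = e->copyset & ~(UINT32_C(1) << keeper);
    int count = 0;
    for (uint32_t m = others; m != 0; m &= m - 1) count++;
    e->copyset &= UINT32_C(1) << keeper;
    return count;
}
```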

32 Delayed Update Queue acq(m) w(x) w(y) rel(m) x x y
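One way to picture the delayed update queue is as a small list of objects written since the last acquire, emptied when the lock is released. The sketch below rests on assumptions of its own: objects are identified by integer ids, repeated writes to the same object are coalesced, and the names `duq`, `duq_note_write`, and `duq_flush` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define DUQ_CAP 16

/* Hypothetical delayed update queue: ids of objects modified since the
   last acquire, in first-write order. */
typedef struct {
    int ids[DUQ_CAP];
    int len;
} duq;

void duq_init(duq *q) { q->len = 0; }

/* Called on a write. Returns true if the object was newly queued,
   false if an update for it was already pending. */
bool duq_note_write(duq *q, int id) {
    for (int i = 0; i < q->len; i++)
        if (q->ids[i] == id) return false;
    assert(q->len < DUQ_CAP);
    q->ids[q->len++] = id;
    return true;
}

/* Called at release. Copies the pending ids into `sent` (standing in for
   the update messages), empties the queue, and returns how many there were. */
int duq_flush(duq *q, int *sent, int cap) {
    assert(cap >= q->len);
    int n = q->len;
    for (int i = 0; i < n; i++) sent[i] = q->ids[i];
    q->len = 0;
    return n;
}
```

For the sequence acq(m), w(x), w(y), w(x), rel(m), the queue would hold x and y once each, and both updates would leave at the release.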

33 Multiple Writer Handling

34 Multiple Writer Handling
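The slides give no detail here. A common way to support multiple concurrent writers, and the one the original Munin paper describes for write-shared objects, is to keep an unmodified copy (a "twin") of the object and send only the words that differ from it at release. The sketch below illustrates that idea under simplifying assumptions (32-bit words, no run-length encoding); the names `diff_entry`, `make_twin`, `encode_diff`, and `apply_diff` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* One changed word: its position in the object and its new value. */
typedef struct {
    size_t   index;
    uint32_t value;
} diff_entry;

/* Before the first write, save an unmodified copy of the object. */
void make_twin(uint32_t *twin, const uint32_t *obj, size_t words) {
    memcpy(twin, obj, words * sizeof *obj);
}

/* At release, compare the object with its twin word by word.
   `out` must have room for `words` entries. Returns the diff length. */
size_t encode_diff(const uint32_t *obj, const uint32_t *twin,
                   size_t words, diff_entry *out) {
    size_t n = 0;
    for (size_t i = 0; i < words; i++) {
        if (obj[i] != twin[i]) {
            out[n].index = i;
            out[n].value = obj[i];
            n++;
        }
    }
    return n;
}

/* On the receiving side, merge a diff into the local copy. */
void apply_diff(uint32_t *obj, size_t words, const diff_entry *d, size_t n) {
    for (size_t i = 0; i < n; i++) {
        assert(d[i].index < words);
        obj[d[i].index] = d[i].value;
    }
}
```

Because each writer ships only the words it changed, two nodes that update different words of the same object can both merge cleanly, which is how this approach sidesteps false sharing.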

35 Synchronization Queue based synchronization Request – reply – lock forward mechanism AcquireLock(), Unlock(), WaitAtBarrier()
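The request, reply, and lock-forward pattern can be sketched as a lock that keeps a FIFO queue of waiting nodes and hands ownership directly to the next waiter on unlock. This is a single-process illustration with hypothetical names (`dlock`, `dlock_acquire`, `dlock_unlock`); it does not reproduce the signatures of Munin's AcquireLock() or Unlock(), which the slides do not give.

```c
#include <assert.h>
#include <stdbool.h>

#define LOCK_FREE   (-1)
#define MAX_WAITERS 8

/* Hypothetical distributed lock state: current holder plus a circular
   FIFO queue of nodes whose requests are pending. */
typedef struct {
    int holder;
    int waiters[MAX_WAITERS];
    int head;
    int count;
} dlock;

void dlock_init(dlock *l) {
    l->holder = LOCK_FREE;
    l->head = 0;
    l->count = 0;
}

/* Request: granted immediately (returns true) if the lock is free,
   otherwise the requester is queued (returns false). */
bool dlock_acquire(dlock *l, int node) {
    if (l->holder == LOCK_FREE) {
        l->holder = node;
        return true;
    }
    assert(l->count < MAX_WAITERS);
    l->waiters[(l->head + l->count) % MAX_WAITERS] = node;
    l->count++;
    return false;
}

/* Unlock: forward the lock to the oldest waiter, if any.
   Returns the new holder, or LOCK_FREE when nobody is waiting. */
int dlock_unlock(dlock *l, int node) {
    assert(l->holder == node);
    if (l->count == 0) {
        l->holder = LOCK_FREE;
        return LOCK_FREE;
    }
    l->holder = l->waiters[l->head];
    l->head = (l->head + 1) % MAX_WAITERS;
    l->count--;
    return l->holder;
}
```

Forwarding the lock straight to the next waiter saves a round trip compared with releasing to a central manager and having the waiter request it again.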

36 Performance

37 Matrix Multiply

38 Matrix Multiply Optimized

39 SOR

40 Effect of Multiple Protocols
Protocol        Matrix Multiply   SOR
Multiple
Write-shared
Conventional

41 Overview of Other DSM Systems

42 Overview of Other DSM Systems Clouds: per-segment (object) based consistency protocol Mirage: per-page based Orca: reliable ordered broadcast protocol Amber: user responsible for the data distribution among processors Linda: shared variables in tuple space, atomic operations: insertion, removal, reading Midway: uses entry consistency (weaker than release consistency) DASH: hardware DSM

43 Conclusion Objective: an efficient DSM system with a programming model similar to shared memory programming and small message passing overhead Special features: multiple protocols, software release consistency Implementation: synchronization realized by the Munin root thread and Munin worker threads

44 Thank you