Download presentation
Presentation is loading. Please wait.
Published byKelley Parks Modified over 9 years ago
1
Treadmarks: Distributed Shared Memory on Standard Workstations and Operating Systems P. Keleher, A. Cox, S. Dwarkadas, and W. Zwaenepoel The Winter Usenix Conference 1994 2008-22952 Jun Lee
2
DSM (distributed shared memory) A software system for parallel computation Shares distributed memories Easier programming −Provide a single global address space
3
DSM (distributed shared memory) No widely available DSM implementations In-house research platforms Kernel modifications Poor performance −Imitating consistency protocols of hardware −False sharing
4
Treadmarks Objectives Commercially available workstations and OS −Standard Unix system on DECstation Efficient user-level DSM implementation −Reduce communication overhead Design LRC (lazy release consistency) Multiple writer protocol Lazy diff creation
5
Consistency protocol (SC) Sequential Consistency Every write visible “immediately” Single writer P0P1 R(a):0 W(a):1 R(a):1
6
Consistency protocol (SC) Sequential Consistency Every write visible “immediately” Single writer P0P1 R(a):0 W(a):1 R(a):? R(a):1 Big problem with page size granularity
7
Page X Consistency protocol (SC) Sequential Consistency Every write visible “immediately” Single writer W(x0):a W(x1):b a W(x2):c W(x3):d P0P1 a Page X bbccd False sharing
8
Consistency protocol (RC) Release Consistency Relaxed memory consistency model −delay making its changes visible to other processors until certain synchronization accesses occurs Synchronization points −Acquire(), Release() (similar to locks, barriers) Two types −ERC (eager), LRC (lazy)
9
Consistency protocol (RC) Release Consistency Acquire() and release() are sequentially consistent −Release() is performed after all previous operations have completed −Operations are performed after previous acquire() have been performed Acquire() and release() pair between conflicting accesses −SC and RC produce the same results.
10
Consistency protocol (RC) ERC Write information is delivered at the release point P0P1 Acquire(L) R(a):0 W(a):1 Release(L) Acquire(L) R(a):? Release(L) R(a):1 Write Notice
11
Consistency protocol (RC) ERC Write information is delivered at the release point P0P1 Acquire(L) R(a):0 W(a):1 Release(L) Acquire(L) R(a):1 Release(L) Acquire(K) Release(K)
12
Consistency protocol (RC) LRC The delivery is postponed until the acquire Fewer messages than ERC P0P1 Acquire(L) R(a):0 W(a):1 Release(L) Acquire(L) R(a):? Release(L) R(a):1
13
Consistency protocol (RC) ERC vs. LRC P0P1 Acquire(L) R(a):0 W(a):1 Release(L) P2 R(a):0 P3 R(a):0 Acquire(L) Release(L) R(a):1 ERC
14
Consistency protocol (RC) ERC vs. LRC P0P1 Acquire(L) R(a):0 W(a):1 Release(L) P2 R(a):0 P3 R(a):0 Acquire(L) Release(L) R(a):1 ERC P0P1 Acquire(L) R(a):0 W(a):1 Release(L) P2 R(a):0 P3 R(a):0 Acquire(L) Release(L) R(a):1 LRC
15
Page X Multiple writer protocol Page X W(x0):a W(x1):b W(x2):c W(x3):? P0P1 abc Acquire(L) Release(L) Acquire(L) ac W(x0):a W(x1):b W(x2):c W(x3):d P0 P1 a Page X bc abcd False sharing
16
Page X Multiple writer protocol Page X W(x0):a W(x1):b W(x2):c W(x3):? P0P1 abcd W(x0):a W(x1):b W(x2):c W(x3):d P0 P1 a Page X bc abcd False sharing Acquire(L) Release(L) Acquire(L) Release(L) W(x3):d ac
17
Page X Twin and Diff Page X W(x0):a W(x1):b W(x2):c W(x3):? P0P1 abcd Acquire(L) Release(L) Acquire(L) Release(L) W(x3):d Twin X Diff ac diff ac
18
Page X Twin and Diff Page X W(x0):a W(x1):b W(x2):c W(x3):? P0P1 abcd Acquire(L) Release(L) Acquire(L) Release(L) W(x3):d Twin X Diff ac diff ac Twin X b ac Diff b diff ac interval
19
Implementation
20
Etc. Lock & barrier Statically assigned manager Garbage collection reclaim the space used by write notice records, interval records, and diffs Triggered when the free space drops below a threshold
21
Evaluation Experimental Environment 8 DECstation-5000/240 connected to a 100-Mbps ATM LAN and a 10-Mbps Ethernet Applications Water – molecular dynamics simulation Jacobi – Successive Over-Relaxation TSP – branch & bound algorithm to solve the traveling salesman problem Quicksort – using bubblesort to sort subarray of less than 1K element ILINK – genetic linkage analysis
22
Evaluation Speedup Execution statistics
23
Evaluation Execution time breakdown
24
Evaluation Unix overhead breakdownTreadMarks overhead breakdown
25
Evaluation Execution time breakdown for Water
26
Evaluation ERC vs. LRC Speedup Message rate
27
Evaluation ERC vs. LRC Data rate Diff creation rate
28
Implementation... pages time stamp 000... pages time stamp 000 Acq(L) P0 sideP1 side P P
29
Implementation... pages time stamp 100... pages time stamp 000 Acq(L) P0 sideP1 side W(a) Rel(L) a W(b) b twin P0 W.N P1 W.N P P
30
Implementation... pages time stamp 200... pages time stamp 000 Acq(L) P0 sideP1 side W(a) Rel(L) a Acq(L) W(b) b twin P0 W.N P1 W.N
31
Implementation... pages time stamp 200... pages time stamp 000 Acq(L) P0 sideP1 side W(a) Rel(L) a Acq(L) W(b) b twin P0 W.N P0 100 P1 W.N
32
Implementation... pages time stamp 200... pages time stamp 110 Acq(L) P0 sideP1 side W(a) Rel(L) a Acq(L) W(c) W(b) b twin P0 W.N P0 W.N P1 b diff a P a W.N P0P1 100
33
Implementation... pages time stamp 200 Acq(L) P0 sideP1 side W(a) Rel(L) a Acq(L) W(c) Rel(L) W(b) P0 W.N a diff P... pages time stamp 110 b P0 W.N P1 b diff a a W.N P0P1 100 c W.N P twin b a
34
Implementation... pages time stamp 200 Acq(L) P0 sideP1 side W(a) Rel(L) a Acq(L) W(c) Rel(L) W(b) P0 W.N a diff P... pages time stamp 120 b P0 W.N P1 b diff a a W.N P0P1 100 c W.N twin b a
35
Implementation P1 sideP2 side P0P1 100... pages time stamp 000 P Acq(L)... pages time stamp 120 b P0 W.N P1 b diff a a W.N c P1 W.N twin b a 000110
36
Implementation P1 sideP2 side P0P1 100... pages time stamp 111 P Acq(L)... pages time stamp 120 b P0 W.N P1 b diff a a W.N c P1 W.N twin b a 000110 P0 W.N P1 W.N P1 W.N
37
Thank you ! Any questions?
Similar presentations
© 2024 SlidePlayer.com. Inc.
All rights reserved.