Published by Caroline Blake. Modified over 9 years ago.
1
TreadMarks: Distributed Shared Memory on Standard Workstations and Operating Systems. Pete Keleher, Alan Cox, Sandhya Dwarkadas, Willy Zwaenepoel
2
Agenda: DSM Overview; TreadMarks Overview; Vector Clocks; Multi-writer Protocol (diffs); TreadMarks Algorithm; Implementation; Limitations
3
DSM Overview: A global address space virtualizes disparate physical memories. Programs use normal thread/locking techniques (no MPI). [Figure: four nodes, each pairing a processor (Proc) with its own memory (Mem)]
4
DSM Overview: Communication overhead is incurred to synchronize memory. To improve performance, maximize parallel computation and limit communication. [Figure: four nodes, each pairing a processor (Proc) with its own memory (Mem)]
5
TreadMarks Overview: Minimize communication to improve DSM performance. Lazy Release Consistency (vector clocks). Multiple writers (lazy diff creation). Delay communication as long as possible (and possibly avoid it altogether).
6
TreadMarks Overview: Release Consistency. Shared memory updates must be visible by the time the release is visible. There is no need to send updates immediately upon each write. [Figure: timelines for P1 and P2 with a write w(x)]
7
TreadMarks Overview: Lazy Release Consistency. Shared memory updates are not made visible until the time of the acquire. No update is propagated if it is never acquired. [Figure: timelines for P1 and P2 with a write w(x)]
8
Vector Clocks: A system-wide logical clock mechanism for identifying the causal ordering of events in distributed systems. Mattern (1989) and Fidge (1991). [Figure: timelines for P1, P2, P3]
9
Vector Clocks: Each process maintains a vector of counters, one for each process in the system. [Figure: timelines for P1, P2, P3, each starting at clock (0,0,0)]
11
Vector Clocks: A process increments its own counter upon a local event. [Figure: timelines for P1, P2, P3 starting at (0,0,0); a further clock value (1,0,0) is shown]
12
Vector Clocks: A process increments its own counter upon a local event. [Figure: timelines for P1, P2, P3 starting at (0,0,0); further clock values (1,0,0) and (0,0,1) are shown]
13
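Vector Clocks: Upon receiving a message, a process increments its own counter and updates every other counter to the larger of its own value and the value carried by the message. [Figure: timelines for P1, P2, P3 starting at (0,0,0); further clock values (1,0,0), (0,0,1), (2,0,2), (0,0,2) are shown]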
14
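Vector Clocks: Upon receiving a message, a process increments its own counter and updates every other counter to the larger of its own value and the value carried by the message. [Figure: timelines for P1, P2, P3 starting at (0,0,0); further clock values (1,0,0), (0,0,1), (2,0,2), (0,0,2), (3,0,2), (3,1,2) are shown]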
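
The update rules on the vector clock slides can be sketched in a few lines of Python. This is a minimal illustration, not TreadMarks code: the class name `VectorClock`, the helper `happened_before`, and the 0-based process indices (P1 is index 0) are assumptions made for the example, and a send is treated as an ordinary local event.

```python
class VectorClock:
    """One vector of counters per process, one slot per process in the system."""

    def __init__(self, n_procs, pid):
        self.pid = pid              # hypothetical 0-based index: P1 -> 0, P2 -> 1, ...
        self.v = [0] * n_procs      # every counter starts at zero

    def local_event(self):
        # A local event (here also used for a send) bumps only our own slot.
        self.v[self.pid] += 1
        return tuple(self.v)

    def receive(self, msg_clock):
        # Take the element-wise maximum with the sender's clock,
        # then count the receive itself as an event on this process.
        self.v = [max(a, b) for a, b in zip(self.v, msg_clock)]
        self.v[self.pid] += 1
        return tuple(self.v)


def happened_before(a, b):
    """True if timestamp a causally precedes timestamp b."""
    return a != b and all(x <= y for x, y in zip(a, b))


# Replaying the clock values listed for the figure:
p1, p2, p3 = (VectorClock(3, i) for i in range(3))
p1.local_event()            # P1: (1, 0, 0)
p3.local_event()            # P3: (0, 0, 1)
msg = p3.local_event()      # P3: (0, 0, 2), sent to P1
p1.receive(msg)             # P1: (2, 0, 2)
msg2 = p1.local_event()     # P1: (3, 0, 2), sent to P2
p2.receive(msg2)            # P2: (3, 1, 2)
```

Two timestamps for which `happened_before` is false in both directions, such as (1,0,0) and (0,0,1), belong to concurrent events.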
15
Diff Creation: A process retains a copy of the page (a twin) upon first writing to it. [Figure: P2 and P1]
17
Diff Creation: Create the diff by comparing the modified page against the original copy; the diff is run-length encoded (RLE). [Figure: P2 and P1]
18
Diff Creation: Send the diff to other processes. [Figure: P2 and P1]
19
Lazy Diff Creation: Diffs are created only when a page is invalidated, or when the modifications are requested explicitly (an access miss on an invalidated page). [Figure: P2 and P1]
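
The twin-and-diff mechanism can be sketched as follows. This is an illustrative sketch under stated assumptions: the function names `make_twin`, `create_diff`, and `apply_diff` are hypothetical, pages are modeled as small byte arrays, and the comparison is byte by byte for simplicity. A diff here is a list of (offset, changed bytes) runs, which is the run-length-encoded form of the modifications.

```python
def make_twin(page):
    """Copy the page before the first write so the original contents survive."""
    return bytes(page)


def create_diff(twin, page):
    """Compare page against its twin; return runs of (offset, new_bytes)."""
    diff = []
    i, n = 0, len(page)
    while i < n:
        if page[i] != twin[i]:
            start = i
            # Extend the run while consecutive bytes still differ.
            while i < n and page[i] != twin[i]:
                i += 1
            diff.append((start, bytes(page[start:i])))
        else:
            i += 1
    return diff


def apply_diff(page, diff):
    """Patch a local copy of the page with someone else's modifications."""
    for offset, data in diff:
        page[offset:offset + len(data)] = data


# Two writers modify disjoint parts of the same (16-byte, hypothetical) page.
base = bytes(16)
page1, page2 = bytearray(base), bytearray(base)
twin1, twin2 = make_twin(page1), make_twin(page2)
page1[0:2] = b"\x01\x02"
page2[8:11] = b"\x07\x08\x09"

# In the lazy scheme these calls would happen only on invalidation or request.
diff1 = create_diff(twin1, page1)
diff2 = create_diff(twin2, page2)
apply_diff(page1, diff2)
apply_diff(page2, diff1)
```

After the exchange both copies hold both sets of modifications, which is what lets multiple writers share a page without sending the whole page back and forth.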
20
TreadMarks Algorithm: P1 cannot proceed past an acquire until all modifications have been received from processes whose vector timestamps are smaller than P1's. [Figure: timelines for P1 and P3 starting at (0,0,0); further clock values (1,0,0) and (0,0,1) are shown]
21
TreadMarks Algorithm, on acquire: P1 sends its vector timestamp to the releaser. [Figure: timelines for P1 and P3 starting at (0,0,0), with clock values (1,0,0) and (0,0,1); the timestamp (1,0,0) is sent]
22
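TreadMarks Algorithm, on acquire: P1 sends its vector timestamp to the releaser. P2 attaches invalidations for all updated counters. [Figure: timelines for P1 and P3 starting at (0,0,0), with clock values (1,0,0) and (0,0,1); the timestamp (1,0,0) is sent and a reply labeled "invalidate" carries (1,0,1)]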
23
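TreadMarks Algorithm, on acquire: P1 sends its vector timestamp to the releaser. P2 attaches invalidations for all updated counters. P2 sends the updated vector timestamp with the invalidations. [Figure: timelines for P1 and P3 starting at (0,0,0), with clock values (1,0,0) and (0,0,1); a reply labeled "invalidate" carries (1,0,1), which becomes the acquirer's clock]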
24
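TreadMarks Algorithm: Diffs are generated when an invalidation is received (i.e., P1 had also made prior updates to this page) or when the page is accessed (miss). [Figure: timelines for P1 and P3 with clock values (0,0,0), (1,0,0), (0,0,1), (1,0,1); an "invalidate" message is followed by a diff transfer when w(x) is performed]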
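
The acquire steps above can be sketched as a small simulation. This is a simplified model under stated assumptions, not the TreadMarks implementation: `Node`, `Interval`, `notices_for`, and `acquire_from` are hypothetical names, each release closes one interval, pages are identified by strings, and process indices are 0-based (index 0 is P1, index 2 is P3, as in the figure).

```python
from collections import namedtuple

# One interval of one process: its index in that process's own counter
# and the pages written during it (the write notices).
Interval = namedtuple("Interval", "proc index pages")


class Node:
    def __init__(self, pid, n_procs):
        self.pid = pid
        self.vt = [0] * n_procs     # vector timestamp
        self.intervals = []         # every interval this node knows about
        self.dirty = set()          # pages written in the current interval
        self.invalid = set()        # pages invalidated by write notices

    def write(self, page):
        self.dirty.add(page)

    def release(self):
        # Close the current interval; nothing is sent to anyone yet.
        self.vt[self.pid] += 1
        self.intervals.append(
            Interval(self.pid, self.vt[self.pid], frozenset(self.dirty)))
        self.dirty.clear()

    def notices_for(self, acquirer_vt):
        # Releaser side: intervals the acquirer's timestamp does not cover.
        return [iv for iv in self.intervals if iv.index > acquirer_vt[iv.proc]]

    def acquire_from(self, releaser):
        # Acquirer side: send our timestamp, receive write notices,
        # invalidate the named pages, and advance our timestamp.
        notices = releaser.notices_for(list(self.vt))
        for iv in notices:
            self.invalid |= iv.pages
            self.intervals.append(iv)
            self.vt[iv.proc] = max(self.vt[iv.proc], iv.index)
        return notices


acquirer, releaser = Node(0, 3), Node(2, 3)
acquirer.write("y"); acquirer.release()     # acquirer at (1, 0, 0)
releaser.write("x"); releaser.release()     # releaser at (0, 0, 1)
acquirer.acquire_from(releaser)             # acquirer now at (1, 0, 1)
```

Only invalidations travel with the lock in this sketch; the contents of page "x" would be fetched as a diff later, and only if the acquirer actually touches the page.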
25
TreadMarks Implementation: Data Structures. [Figure: a page array (entries per page and per proc_id) pointing to write notice records; a proc array pointing to interval records, each holding a vector clock (VC); a diff pool]
26
TreadMarks Implementation: Locks. Each lock is statically assigned a manager (round-robin). The manager keeps track of which processor last requested the lock. Lock acquire requests are sent to the manager, which forwards them to the last processor to obtain the lock. Upon release, the releaser sends (for each interval) the processor ID and vector timestamp, plus any invalidations that are necessary.
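
The manager's routing role can be sketched as below. This is an assumption-laden illustration: `manager_of` and `LockManager.route` are hypothetical names, round-robin assignment is modeled as lock ID modulo the number of processors, and a return value of `None` stands for "no previous requester, the manager grants the lock itself".

```python
def manager_of(lock_id, n_procs):
    """Static round-robin assignment of locks to manager processors."""
    return lock_id % n_procs


class LockManager:
    def __init__(self):
        self.last_requester = {}    # lock_id -> pid of the most recent requester

    def route(self, lock_id, requester):
        """Return the pid the acquire request should be forwarded to."""
        previous = self.last_requester.get(lock_id)
        # Record the new requester so the next request is forwarded to it.
        self.last_requester[lock_id] = requester
        return previous


mgr = LockManager()
first = mgr.route(7, requester=2)    # nobody asked before
second = mgr.route(7, requester=0)   # forwarded to processor 2
third = mgr.route(7, requester=3)    # forwarded to processor 0
```

Because each request is forwarded to the previous requester, the lock (together with the interval information sent on release) passes directly between the processors that use it rather than through the manager.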
27
TreadMarks Implementation: Barriers. A centralized barrier manager is used. Upon arrival at the barrier, each client notifies the manager of intervals that the manager does not already have; these are incorporated when the manager arrives at the barrier. When all clients have arrived, the manager notifies each client of the intervals it does not already have. Expensive.
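
The exchange at a centralized barrier can be sketched as a single function. This is a simplification for illustration only: the name `barrier` is hypothetical, intervals are modeled as (processor, index) pairs, and each client is assumed to report the full set of intervals it knows rather than only those the manager lacks.

```python
def barrier(manager_known, arrivals):
    """Merge interval knowledge at the manager and compute each client's reply.

    manager_known: set of (proc, index) intervals the manager already has.
    arrivals: dict mapping client pid -> set of intervals that client knows.
    Returns (merged set, dict of pid -> intervals to send to that client).
    """
    merged = set(manager_known)
    for known in arrivals.values():
        merged |= known                         # incorporate arrival messages
    departures = {pid: merged - known           # send only what each client lacks
                  for pid, known in arrivals.items()}
    return merged, departures


merged, departures = barrier(
    manager_known={(0, 1)},
    arrivals={1: {(1, 1)}, 2: {(2, 1), (0, 1)}},
)
```

Every client sends one message in and receives one message out, and the manager must wait for all arrivals before replying, which is one way to read the "Expensive" remark on the slide.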
28
Limitations: Achieved nearly linear speedup for the TSP, Jacobi, Quicksort, and ILINK algorithms. Water: each molecule in the simulation is protected by a lock and is frequently accessed; barriers are used for synchronization. Speedup is limited by the algorithm's low computation-to-communication ratio (many fine-grained messages).
29
Limitations, TSP: Eager Release Consistency performs better than Lazy Release Consistency (Fig. 9). Updates occur only on invalidations and access misses (writes/synchronization points). The TSP algorithm reads a stale 'current minimum' value without synchronization.
30
Limitations: Depends on events (writes/synchronization) to trigger consistency operations. More opportunities to read stale data (TSP). Reduced redundancy increases the risk of data loss.
31
Summary: Improves performance by improving the computation-to-communication ratio. Delays consistency updates until access to the page is acquired. Weaker consistency implies a greater likelihood of reading stale data and of data loss. Procrastination = Performance.