CSCS: A Concise Implementation of User-Level Distributed Shared Memory
Zhi Zhai, Feng Shen
Computer Science and Engineering, University of Notre Dame
Dec. 11, 2009
Final Presentation
DSM Overview
DSM Characteristics:
–Physically: distributed memory
–Logically: a single shared address space
Figure 1 DSM architecture
Related Work
Models and Main Features:
–IVY (Yale): divided space (shared and private)
–Mirage (UCLA): time interval d to avoid page thrashing
–TreadMarks (Rice): lazy release consistency to improve efficiency
–SAM (Stanford)
System Design Figure 2 Server/Client mode
System Design
Server
–Holder of metadata only
–Thread-based connection
–Event-based service
System Design Figure 3 Server Process/Threads
System Design
Client
–Physical memory owner
–UI/Work/Page Fetch threads
–Fixed-home protocol
–Not aware of peer clients
System Design Figure 4 Client process/thread
System Design Figure 5 Sample Operation
Implementation Message Passing: TCP socket Figure 6 Message Passing
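The slide only says that messages travel over TCP sockets. Because TCP is a byte stream with no message boundaries, some framing is needed. The sketch below shows one common approach, a length-prefixed frame. The header layout, the message type constants, and the function names are all hypothetical and are not taken from the CSCS implementation.

```python
import socket
import struct

# Hypothetical wire format (an assumption, not from the slides):
# 1-byte message type, 4-byte big-endian payload length, then the payload.
HEADER = struct.Struct("!BI")
MSG_PAGE_REQUEST = 1  # hypothetical message type
MSG_PAGE_REPLY = 2    # hypothetical message type


def send_msg(sock, msg_type, payload=b""):
    """Send one framed message over a connected stream socket."""
    sock.sendall(HEADER.pack(msg_type, len(payload)) + payload)


def _recv_exact(sock, n):
    """Read exactly n bytes; recv() may return fewer than requested."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed the connection mid-message")
        buf.extend(chunk)
    return bytes(buf)


def recv_msg(sock):
    """Receive one framed message and return (msg_type, payload)."""
    msg_type, length = HEADER.unpack(_recv_exact(sock, HEADER.size))
    return msg_type, _recv_exact(sock, length)
```

A page request could then carry the page number as its payload, and the reply could carry the page contents.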
Implementation
Server/Client Page Table
–Server holds the most up-to-date metadata
–Server manages the whole virtual memory space
–Server records the IDs and addresses of all nodes
–Client owns the most up-to-date local memory segment
–Client caches referenced pages from peer nodes
Client ID   IP Address (e.g.)
…           …
Figure 7 Connection Table

Page #   Frame #   Access Bits            Page Owner
0        57        PROT_READ              1
1        67        PROT_READ|PROT_WRITE   1
2        57        PROT_READ              3
…        …         …                      …
Figure 8 Server Page Table
Implementation
Page #   Frame #   Access Bits                      Page Owner   Ref Count
0        30        PROT_READ                        1            0
1        31        PROT_READ                        1            0
2        32        PROT_READ                        1            4
3        60        PROT_READ|PROT_WRITE PROT_READ   5            0
…        …         …                                …            …
Figure 9 Client Page Table
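As a rough illustration of the tables in Figures 7 to 9, the sketch below models one row of each as a small record. The class and function names are hypothetical, the IP addresses are placeholders from the documentation range, and the sample rows only mirror the kind of values shown on the slides.

```python
from dataclasses import dataclass

PROT_READ = 0x1   # numeric values as defined on Linux
PROT_WRITE = 0x2


@dataclass
class ServerPageEntry:
    """One row of the server page table (metadata only)."""
    frame: int
    access: int
    owner: int


@dataclass
class ClientPageEntry:
    """One row of a client page table; adds the reference count."""
    frame: int
    access: int
    owner: int
    ref_count: int = 0


# Connection table: client ID -> IP address (placeholder addresses).
connection_table = {1: "192.0.2.1", 3: "192.0.2.3"}

# Server page table: page number -> entry (sample values).
server_page_table = {
    0: ServerPageEntry(frame=57, access=PROT_READ, owner=1),
    1: ServerPageEntry(frame=67, access=PROT_READ | PROT_WRITE, owner=1),
    2: ServerPageEntry(frame=57, access=PROT_READ, owner=3),
}


def locate_page(page):
    """Return (owner ID, owner address) for a page, as the server would."""
    entry = server_page_table[page]
    return entry.owner, connection_table[entry.owner]
```

Keeping the owner ID in the page table and the address in a separate connection table means a node's address is stored once, however many pages it owns.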
Implementation
Page fault handler
–Client to Server
  Check the access right
  Fetch the page owner's ID/address
  Update global access bits
–Client to Client
  Connect to the page owner
  Cache the referenced page
  Update local access bits
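The two-stage exchange above can be sketched as an in-process simulation with no sockets. Everything here is illustrative: the class and method names are hypothetical, object references stand in for IP addresses, only the read path is shown, and the rule that a remote read marks the page read-only in the server table is an assumption about what "update global access bits" means.

```python
PROT_READ, PROT_WRITE = 0x1, 0x2


class Server:
    """Holds metadata only: who owns each page and its access bits."""

    def __init__(self):
        self.page_table = {}   # page -> {"owner": id, "access": bits}
        self.connections = {}  # id -> client object (stands in for an address)

    def register(self, client, pages):
        self.connections[client.cid] = client
        for page in pages:
            self.page_table[page] = {"owner": client.cid,
                                     "access": PROT_READ | PROT_WRITE}

    def resolve(self, page, wanted):
        """Check the access right, update global bits, return the owner."""
        entry = self.page_table[page]
        if entry["access"] & wanted != wanted:
            raise PermissionError(f"page {page}: access denied")
        if wanted == PROT_READ:
            entry["access"] = PROT_READ  # assumption: page becomes read-shared
        return self.connections[entry["owner"]]


class Client:
    """Owns a local memory segment and caches pages fetched from peers."""

    def __init__(self, cid, server, owned):
        self.cid = cid
        self.server = server
        self.memory = dict(owned)  # page -> bytes owned by this node
        self.cache = {}            # page -> bytes cached from a peer
        self.access = {page: PROT_READ | PROT_WRITE for page in owned}
        server.register(self, owned)

    def serve_page(self, page):
        return self.memory[page]

    def read(self, page):
        if page in self.memory:
            return self.memory[page]
        if page not in self.cache:                        # simulated page fault
            owner = self.server.resolve(page, PROT_READ)  # client to server
            self.cache[page] = owner.serve_page(page)     # client to client
            self.access[page] = PROT_READ                 # local access bits
        return self.cache[page]
```

Note that a client never holds a list of peers; it learns the owner of a page from the server only at fault time.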
Implementation
Page fault handler
–Page fault type
  Read remote page
  Write on a page
–Assumption
  Reading happens more often than writing
  Writing needs the most up-to-date copy more than reading does
Implementation
Assume the fault is a read of a remote page
  dsm call: dsm_do_no_page()
Truly a remote reading fault?
  YES: continue
  NO: double page fault
    dsm call: dsm_do_wrt_page()
Figure 10 Page fault handler workflow
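The workflow in Figure 10 can be simulated as follows. The handler first treats every fault as a remote read. If the faulting access was really a write, the page then faults a second time and the write path runs. The names dsm_do_no_page and dsm_do_wrt_page come from the slide, but their real signatures and bodies are not shown there, so the versions below are hypothetical stand-ins that only adjust protection bits.

```python
PROT_NONE, PROT_READ, PROT_WRITE = 0x0, 0x1, 0x2


class FaultHandler:
    """Simulates the optimistic 'assume read first' fault handling."""

    def __init__(self):
        self.prot = {}  # page -> local protection bits
        self.log = []   # sequence of dsm calls, for inspection

    def dsm_do_no_page(self, page):
        # Stand-in: fetch the page and map it read-only.
        self.log.append(("dsm_do_no_page", page))
        self.prot[page] = PROT_READ

    def dsm_do_wrt_page(self, page):
        # Stand-in: obtain write access to an already present page.
        self.log.append(("dsm_do_wrt_page", page))
        self.prot[page] = PROT_READ | PROT_WRITE

    def on_fault(self, page):
        if self.prot.get(page, PROT_NONE) == PROT_NONE:
            self.dsm_do_no_page(page)   # first fault: assume a remote read
        else:
            self.dsm_do_wrt_page(page)  # page is present: it was a write

    def access(self, page, write=False):
        """Retry the access until the protection bits allow it."""
        needed = PROT_WRITE if write else PROT_READ
        while self.prot.get(page, PROT_NONE) & needed != needed:
            self.on_fault(page)
```

Under the stated assumption that reads dominate, most faults finish after the first call, and only writes pay for the second fault.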
Implementation
Memory Consistency Model
–Assumption revisited
  Reading happens more often than writing
  Writing needs the most up-to-date copy more than reading does
–Multi-Reader/Single-Writer
  Snapshot for reading
  Every write triggers a page fault
–Locks on pages being referenced
  Semaphore-like reference counts: if ref_count > 0, Waiting/Re-random
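One way to picture the reference-count lock is the per-page sketch below: readers raise ref_count while they hold a page, and a writer may proceed only when ref_count is zero. This is a minimal illustration under assumptions, not the CSCS code. The class and method names are hypothetical, and both a blocking and a non-blocking writer entry are offered because the slide does not say how a blocked writer retries.

```python
import threading


class PageRefLock:
    """Multi-reader/single-writer guard driven by a reference count."""

    def __init__(self):
        self._cond = threading.Condition()
        self.ref_count = 0     # number of readers currently using the page
        self._writing = False  # True while a single writer holds the page

    def begin_read(self):
        with self._cond:
            while self._writing:
                self._cond.wait()
            self.ref_count += 1

    def end_read(self):
        with self._cond:
            self.ref_count -= 1
            if self.ref_count == 0:
                self._cond.notify_all()  # wake writers waiting for zero

    def try_begin_write(self):
        """Non-blocking: return False if the page is referenced or being written."""
        with self._cond:
            if self.ref_count > 0 or self._writing:
                return False
            self._writing = True
            return True

    def begin_write(self):
        """Blocking: wait until no reader or writer holds the page."""
        with self._cond:
            while self.ref_count > 0 or self._writing:
                self._cond.wait()
            self._writing = True

    def end_write(self):
        with self._cond:
            self._writing = False
            self._cond.notify_all()
```

A caller that gets False from try_begin_write can back off and retry later, while begin_write simply waits.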
DSM Evaluation Figure 11 Parallel Computation on ASP Problem
DSM Evaluation Figure 12 Execution time comparison
DSM Evaluation Figure 13 Message Transmission Comparison
DSM Evaluation Figure 14 Network Traffic Comparison
Future Work
–Enhance system robustness
–Evaluate scalability boundary
–Provide better programmability