Memory Coherence in Shared Virtual Memory System ACM Transactions on Computer Science(TOCS), 1989 KAI LI Princeton University PAUL HUDAK Yale University.

Slides:



Advertisements
Similar presentations
Serverless Network File Systems. Network File Systems Allow sharing among independent file systems in a transparent manner Mounting a remote directory.
Advertisements

Distributed Shared Memory
Distributed Operating Systems CS551 Colorado State University at Lockheed-Martin Lecture 4 -- Spring 2001.
12a.1 Introduction to Parallel Computing UNC-Wilmington, C. Ferner, 2008 Nov 4, 2008.
1 Multiprocessors. 2 Idea: create powerful computers by connecting many smaller ones good news: works for timesharing (better than supercomputer) bad.
Distributed Resource Management: Distributed Shared Memory
1 Lecture 23: Multiprocessors Today’s topics:  RAID  Multiprocessor taxonomy  Snooping-based cache coherence protocol.
Computer Organization and Architecture
PRASHANTHI NARAYAN NETTEM.
CS252/Patterson Lec /28/01 CS 213 Lecture 10: Multiprocessor 3: Directory Organization.
ECE669 L17: Memory Systems April 1, 2004 ECE 669 Parallel Computer Architecture Lecture 17 Memory Systems.
Dr. Kalpakis CMSC 621, Advanced Operating Systems. Fall 2003 URL: Distributed Shared Memory.
Design and Implementation of a Single System Image Operating System for High Performance Computing on Clusters Christine MORIN PARIS project-team, IRISA/INRIA.
Multi-level Hashing for Peer-to-Peer System in Wireless Ad Hoc Environment Dewan Tanvir Ahmed and Shervin Shirmohammadi Distributed & Collaborative Virtual.
Distributed Shared Memory Systems and Programming
TreadMarks Distributed Shared Memory on Standard Workstations and Operating Systems Pete Keleher, Alan Cox, Sandhya Dwarkadas, Willy Zwaenepoel.
Distributed Shared Memory: A Survey of Issues and Algorithms B,. Nitzberg and V. Lo University of Oregon.
Distributed Shared Memory (DSM)
2008 dce Distributed Shared Memory Pham Quoc Cuong & Phan Dinh Khoi Use some slides of James Deak - NJIT.
Shared Address Space Computing: Hardware Issues Alistair Rendell See Chapter 2 of Lin and Synder, Chapter 2 of Grama, Gupta, Karypis and Kumar, and also.
Lazy Release Consistency for Software Distributed Shared Memory Pete Keleher Alan L. Cox Willy Z.
CSE 486/586, Spring 2012 CSE 486/586 Distributed Systems Distributed Shared Memory Steve Ko Computer Sciences and Engineering University at Buffalo.
B. Prabhakaran 1 Distributed Shared Memory DSM provides a virtual address space that is shared among all nodes in the distributed system. Programs access.
Memory Coherence in Shared Virtual Memory System ACM Transactions on Computer Science(TOCS), 1989 KAI LI Princeton University PAUL HUDAK Yale University.
CS425/CSE424/ECE428 – Distributed Systems Nikita Borisov - UIUC1 Some material derived from slides by I. Gupta, M. Harandi, J. Hou, S. Mitra,
Serverless Network File Systems Overview by Joseph Thompson.
Ch 10 Shared memory via message passing Problems –Explicit user action needed –Address spaces are distinct –Small Granularity of Transfer Distributed Shared.
Distributed Shared Memory Based on Reference paper: Distributed Shared Memory, Concepts and Systems.
Cache Coherence Protocols 1 Cache Coherence Protocols in Shared Memory Multiprocessors Mehmet Şenvar.
Distributed Shared Memory Presentation by Deepthi Reddy.
Distributed Shared Memory (part 1). Distributed Shared Memory (DSM) mem0 proc0 mem1 proc1 mem2 proc2 memN procN network... shared memory.
Page 1 Distributed Shared Memory Paul Krzyzanowski Distributed Systems Except as otherwise noted, the content of this presentation.
Memory Coherence in Shared Virtual Memory Systems Yeong Ouk Kim, Hyun Gi Ahn.
Computer Science and Engineering Copyright by Hesham El-Rewini Advanced Computer Architecture CSE 8383 March 20, 2008 Session 9.
Lazy Release Consistency for Software Distributed Shared Memory Pete Keleher Alan L. Cox Willy Z. By Nooruddin Shaik.
1 Lecture 19: Scalable Protocols & Synch Topics: coherence protocols for distributed shared-memory multiprocessors and synchronization (Sections )
OpenMP for Networks of SMPs Y. Charlie Hu, Honghui Lu, Alan L. Cox, Willy Zwaenepoel ECE1747 – Parallel Programming Vicky Tsang.
Distributed shared memory u motivation and the main idea u consistency models F strict and sequential F causal F PRAM and processor F weak and release.
Virtual Memory Pranav Shah CS147 - Sin Min Lee. Concept of Virtual Memory Purpose of Virtual Memory - to use hard disk as an extension of RAM. Personal.
Multiprocessor  Use large number of processor design for workstation or PC market  Has an efficient medium for communication among the processor memory.
Running Commodity Operating Systems on Scalable Multiprocessors Edouard Bugnion, Scott Devine and Mendel Rosenblum Presentation by Mark Smith.
1 Computer Architecture & Assembly Language Spring 2001 Dr. Richard Spillman Lecture 26 – Alternative Architectures.
Memory Hierarchy Ideal memory is fast, large, and inexpensive
Computer Organization
Understanding Operating Systems Seventh Edition
Distributed Shared Memory
ITEC 202 Operating Systems
The University of Adelaide, School of Computer Science
Lecture 18: Coherence and Synchronization
Swapping Segmented paging allows us to have non-contiguous allocations
Ivy Eva Wu.
Review.
Introduction to Operating Systems
CMSC 611: Advanced Computer Architecture
Page Replacement.
Outline Midterm results summary Distributed file systems – continued
Multiprocessors - Flynn’s taxonomy (1966)
Distributed Shared Memory
User-level Distributed Shared Memory
CSS490 Distributed Shared Memory
Lecture 25: Multiprocessors
High Performance Computing
The University of Adelaide, School of Computer Science
Cache coherence CEG 4131 Computer Architecture III
Lecture 24: Virtual Memory, Multiprocessors
Lecture 23: Virtual Memory, Multiprocessors
Distributed Resource Management: Distributed Shared Memory
Lecture 19: Coherence and Synchronization
Lecture 18: Coherence and Synchronization
The University of Adelaide, School of Computer Science
Presentation transcript:

Memory Coherence in Shared Virtual Memory System ACM Transactions on Computer Science(TOCS), 1989 KAI LI Princeton University PAUL HUDAK Yale University 박정호

Contents Introduction Shared Virtual Memory – Memory Mapping manager Memory Coherence Problem Solution to the memory coherence problem – Centralized Managed Algorithm – Distributed Managed Algorithm Experiments Conclusion Memory Coherence in Shared Virtual Memory System2

Introduction Shared virtual memory – Provides a virtual address space that is shared among all processors – Applications can use it just as they do a traditional virtual memory – Not only “pages” data between physical memories and disks Conventional virtual memory system – But also “pages” data between the physical memories of the individual processors Data can naturally migrate between processors on demand Memory coherence problem for a shared virtual memory – Main difficulty in building a shared virtual memory – This paper concentrate on the memory coherence problem Memory Coherence in Shared Virtual Memory System3

Shared Virtual Memory A single address space shared by a number of processors Address space is partitioned into pages Memory Coherence in Shared Virtual Memory System4 CPU 1 Memory 1 Mapping Manager Shared virtual memory CPU 2 Memory 2 Mapping Manager CPU 3 Memory 3 Mapping Manager CPU 4 Memory 4 Mapping Manager

Shared Virtual Memory (cont’d) Memory mapping managers – Mapping between local memories and the shared virtual memory – Keeping the address space coherent The value returned by a read operation is always the same as the value written by the most recent write operation to the same address – Views its local memory as a large cache Key goals – To allow processes of a program to execute on different processors in parallel Memory Coherence in Shared Virtual Memory System5

Memory Coherence Problem Memory coherent – The value returned by a read operation is always the same as the value written by the most recent write operation to the same address Two design choices for build a shared virtual memory system – Granularity of the memory units(“page sizes”) Sending large packets – is not much more expensive than small one – is the greater chance for contention than small one Right size is clearly application dependent – 1K bytes is suitable with respect to contention and communication overhead – Strategy for maintaining coherence Page synchronization and page ownership Memory Coherence in Shared Virtual Memory System6

Memory Coherence Problem (cont’d) Page synchronization – Invalidation Invalidates all copies of the page when write page fault occurs – Write-broadcast Writes to all copies of the page when write page fault occurs Page Ownership – Fixed A page is always owned by the same processor – Dynamic Centralized manager Distributed manager – Fixed – Dynamic Memory Coherence in Shared Virtual Memory System7

Memory Coherence Problem (cont’d) Memory Coherence in Shared Virtual Memory System8

Memory Coherence Problem (cont’d) Page Table – Access: indicates the accessibility to the page (nil/read/write) – Copy set: contains the processor number that have read copies of the page – Lock: synchronizes multiple page faults by different processes on the same processor and synchronizes remote page requests Invalidate – A primitive that invalidate copies of a page Memory Coherence in Shared Virtual Memory System9 invalidate ( p, copy_set ) for i in copy_set DO send an invalidation request to processor i;

Centralized Manager Algorithms The centralized manager – Resides on a single processor – Maintains a table called “info” per page Owner: owner that page(= the most recent processor to have write access to it) Copy set: lists all processors that have copies of the page Lock: used for synchronizing request to the page Each processors – Maintains a page table called “PTable” per page Access, Lock Memory Coherence in Shared Virtual Memory System10

Centralized Manager Algorithms(cont’d) Memory Coherence in Shared Virtual Memory System11

Centralized Manager Algorithms(cont’d) e.g. [Read page fault in P2] Memory Coherence in Shared Virtual Memory System12 P1, Centralized Manager P2 P3 Info owner copy set lock P3 { P1 } 0 PTable access lock0 PTable access lock R 0

Centralized Manager Algorithms(cont’d) e.g. [Read page fault in P2] Memory Coherence in Shared Virtual Memory System13 P1, Centralized Manager P2 P3 Info owner copy set lock P3 { P1 } 0 PTable access lock 1 PTable access lock R 0

Centralized Manager Algorithms(cont’d) e.g. [Read page fault in P2] Memory Coherence in Shared Virtual Memory System14 P1, Centralized Manager P2 P3 1. Request Info owner copy set lock P3 { P1, P2 } 1 PTable access lock1 PTable access lock R 0

e.g. [Read page fault in P2] lock1 Centralized Manager Algorithms(cont’d) Memory Coherence in Shared Virtual Memory System15 P1, Centralized Manager P2 P3 1. Request Info owner copy set P3 { P1, P2 } PTable access lock1 PTable access lock R 0

e.g. [Read page fault in P2] lock1 Centralized Manager Algorithms(cont’d) Memory Coherence in Shared Virtual Memory System16 P1, Centralized Manager P2 P3 1. Request 2. Ask to send copy to P2 Info owner copy set P3 { P1, P2 } PTable access lock1 PTable access lock R 1

e.g. [Read page fault in P2] lock1 Centralized Manager Algorithms(cont’d) Memory Coherence in Shared Virtual Memory System17 P1, Centralized Manager P2 P3 1. Request 2. Ask to send copy to P2 Info owner copy set P3 { P1, P2 } PTable access lock1 PTable access lock R 1

e.g. [Read page fault in P2] lock1 Centralized Manager Algorithms(cont’d) Memory Coherence in Shared Virtual Memory System18 P1, Centralized Manager P2 P3 1. Request 2. Ask to send copy to P2 Info owner copy set P3 { P1, P2 } PTable access lock1 PTable access lock R 0 3. Send copy 4. Confirmation

e.g. [Read page fault in P2] lock Centralized Manager Algorithms(cont’d) Memory Coherence in Shared Virtual Memory System19 P1, Centralized Manager P2 P3 1. Request 2. Ask to send copy to P2 Info owner copy set P3 { P1, P2 } PTable access lock 0 PTable access lock R 0 3. Send copy R 4. Confirmation 0

e.g. [Write page fault in P2] lock Centralized Manager Algorithms(cont’d) Memory Coherence in Shared Virtual Memory System20 P1, Centralized Manager P2 P3 Info owner copy set P3 { P1, P2 } PTable access lock0 PTable access lock R 0 R 0

e.g. [Write page fault in P2] lock Centralized Manager Algorithms(cont’d) Memory Coherence in Shared Virtual Memory System21 P1, Centralized Manager P2 P3 Info owner copy set P3 { P1, P2 } PTable access lock PTable access lock R 0 R 0 1. Write Request 1

e.g. [Write page fault in P2] lock Centralized Manager Algorithms(cont’d) Memory Coherence in Shared Virtual Memory System22 P1, Centralized Manager P2 P3 Info owner copy set { P1, P2 } PTable access lock1 PTable access lock R Write Request 2. Invalidate of copy set P2

e.g. [Write page fault in P2] lock Centralized Manager Algorithms(cont’d) Memory Coherence in Shared Virtual Memory System23 P1, Centralized Manager P2 P3 Info owner copy set P2 PTable access lock1 PTable access lock R Write Request 2. Invalidate of copy set { } 3. Ask to send page to P2

e.g. [Write page fault in P2] lock Centralized Manager Algorithms(cont’d) Memory Coherence in Shared Virtual Memory System24 P1, Centralized Manager P2 P3 Info owner copy set P2 PTable access lock1 PTable access lock Write Request 2. Invalidate of copy set { } 3. Ask to send page to P2 4. Send page to P2

e.g. [Write page fault in P2] lock Centralized Manager Algorithms(cont’d) Memory Coherence in Shared Virtual Memory System25 P1, Centralized Manager P2 P3 Info owner copy set PTable access lock 0 PTable access lock 0 W 1. Write Request { } 3. Ask to send page to P2 4. Send page to P2 5. confirmation 0 P2

Centralized Manager Algorithms(cont’d) An Improved Centralized Manager Algorithm – The synchronization of page ownership move to the individual owners Eliminating the confirmation message to the manager – The locking mechanism on each processor now deals not only with multiple local requests, but also with remote requests – But, for large N there still might be a bottleneck at the manager processor because it must respond to every page fault Memory Coherence in Shared Virtual Memory System26 Added

Distributed Manager Algorithms A Fixed Distributed Manager Algorithm – Every processor is given a predetermined subset of the pages to manager – e.g., H(p) = p mod N, N is # of processors H(p) = (p/s) mod N, s is # of pages per segment (e.g., s = 3) – Proceeds as in the centralized manager algorithm – The fixed distributed manager algorithm is superior to the centralized manager algorithms when a parallel program exhibits a high rate of page faults Memory Coherence in Shared Virtual Memory System27 P1P2P3P1P2P3P1P2P3P1P2P P1P2P1P2P3P1P2P3P1P P1

Distributed Manager Algorithms(cont’d) A Broadcast Distributed Manager Algorithm – Each processor managers precisely those pages that it owns – Faulting processor send broadcasts into the network to find the true owner of a page The Owner table is eliminated Information of ownership is stored in each processor’s PTable Memory Coherence in Shared Virtual Memory System28

Distributed Manager Algorithms(cont’d) e.g., Memory Coherence in Shared Virtual Memory System29 e.g. [Read page fault in P2] P1 P2 P3 PTable access lock copy set owner PTable access lock R copy set{ P1 } ownerP3 PTable access lock R copy set ownerP3

Distributed Manager Algorithms(cont’d) e.g., Memory Coherence in Shared Virtual Memory System30 e.g. [Read page fault in P2] P1 P2 P3 PTable access lock copy set owner PTable access lock R copy set{ P1 } ownerP3 PTable access lock R copy set ownerP3 1. Broadcast read request

Distributed Manager Algorithms(cont’d) e.g., Memory Coherence in Shared Virtual Memory System31 e.g. [Read page fault in P2] P1 P2 P3 PTable access lock copy set owner PTable access lock copy set ownerP3 PTable access lock R copy set ownerP3 1. Broadcast read request 2. Send copy of p { P1, P2 } R

Distributed Manager Algorithms(cont’d) A Dynamic Distributed Manager Algorithm – The heart of this algorithm is keeping track of the ownership of all pages in each processors’ local Ptable There is no fixed owner or manager Owner field -> probOwner field The information that it contains is just a hint When a processor has a page fault, it sends request to the processor indicated by the probOwner field – If that processor is the true owner, it proceeds as in the centralized manager algorithm – Else, forwards the request to the processor indicated by its probOwner field Memory Coherence in Shared Virtual Memory System32 P1P2P3Pn …… page fault true ownerprobable owner request forward

Distributed Manager Algorithms(cont’d) Memory Coherence in Shared Virtual Memory System33 e.g. [Read page fault in P2] e.g., e.g. [Read page fault in P2] P1 P2 P3 PTable access lock copy set probOwner P1 PTable access lock R copy set{ P1 } P3 PTable access lock R copy set{ P1 } P3 probOwner

Distributed Manager Algorithms(cont’d) Memory Coherence in Shared Virtual Memory System34 e.g. [Read page fault in P2] e.g., e.g. [Read page fault in P2] P1 P2 P3 PTable access lock copy set probOwner P1 PTable access lock R copy set{ P1 } P3 PTable access lock R copy set{ P1 } P3 probOwner 1. read request

Distributed Manager Algorithms(cont’d) Memory Coherence in Shared Virtual Memory System35 e.g. [Read page fault in P2] e.g., e.g. [Read page fault in P2] P1 P2 P3 PTable access lock copy set probOwner P1 PTable access lock R copy set{ P1 } P3 PTable access lock R copy set{ P1 } probOwner 1. read request 2. forward request P3

Distributed Manager Algorithms(cont’d) Memory Coherence in Shared Virtual Memory System36 e.g. [Read page fault in P2] e.g., P1 P2 P3 PTable access lock copy set { P1, P3 } probOwner P2 PTable access lock R copy set PTable access lock R copy set{ P1 } probOwner 1. read request 2. forward request 3. Send copy of p and copy set { P1, P3 } R P2

Distributed Manager Algorithms(cont’d) Memory Coherence in Shared Virtual Memory System37 e.g. [Write page fault in P1] P1 P2 P3 PTable access lock copy set { P1, P3 } probOwner P2 PTable access lock R copy set PTable access lock R copy set{ P1 } probOwner { P1, P3 } R P2

Distributed Manager Algorithms(cont’d) Memory Coherence in Shared Virtual Memory System38 e.g. [Write page fault in P1] P1 P2 P3 PTable access lock copy set { P1, P3 } probOwner P2 PTable access lock R copy set PTable access lock R copy set{ P1 } probOwner { P1, P3 } R P2 1. write request

Distributed Manager Algorithms(cont’d) Memory Coherence in Shared Virtual Memory System39 e.g. [Write page fault in P1] P1 P2 P3 PTable access lock copy set { P1, P3 } probOwner P2 PTable access lock R copy set PTable access lock R copy set{ P1 } probOwner { P1, P3 } P2 1. write request 2. Send p and copy set

Distributed Manager Algorithms(cont’d) Memory Coherence in Shared Virtual Memory System40 e.g. [Write page fault in P1] P1 P2 P3 PTable access lock copy set { } probOwner P1 PTable access lock copy set PTable access lock R copy set probOwner P1 P2 1. write request 2. Send p and copy set 3. Invalidate in copy set { }

Distributed Manager Algorithms(cont’d) Memory Coherence in Shared Virtual Memory System41 e.g. [Write page fault in P1] P1 P2 P3 PTable access lock copy set { } probOwner P1 PTable access lock copy set PTable access lock W copy set probOwner P1 1. write request 2. Send p and copy set 3. Invalidate in copy set { }

Distributed Manager Algorithms(cont’d) Distribution of Copy Set – Copy set data is distributed and stored as a tree – Improves system performance “Divide and conquer” effect – Balanced tree -> invalidation process takes log m for m read copies A read fault only needs to find a single processor that holds a copy of the page (not necessarily the owner) Memory Coherence in Shared Virtual Memory System42 P1 P2 P3P4 P5P6 P7 propagate Invalidation message owner

Experiments Experimental Setup – Implement a prototype shared virtual memory system called IVY (Integrated shared Virtual memory at Yale) Implemented on top of a modified Aegis operating system of Apollo DOMAIN computer system IVY can be used to run parallel programs on any number of processors on an Apollo ring network – Parallel Programs Parallel Jacobi program – Solving three dimensional PDEs(partial differential equations) Parallel sorting Parallel matrix multiplication Parallel dot-product Memory Coherence in Shared Virtual Memory System43

Experiments (cont’d) Speed up of 3D-PDE, where n = 50^3 – Super-linear speed up Data structure for the problem is greater than the size of physical memory on a single processor Memory Coherence in Shared Virtual Memory System44 1 processor Processor 1 with initialization Processor 2 without initialization Ideal Solution Experimental Result

Experiments (cont’d) Speed up of 3-D PDE, where n=40^3 Memory Coherence in Shared Virtual Memory System45 Ideal Solution Experimental Result

Experiments (cont’d) Speedup of the merge-split sort, where 200K elements – Ideal solution Even with no communication costs, merge-split sort does not yield linear speed up the costs of all memory references are the same Memory Coherence in Shared Virtual Memory System46 Ideal Solution Experimental Result

Experiments (cont’d) Speed up of parallel dot-product where each vector has 128K elements – Shared virtual memory system is not likely to provide speed up Ratio of the communication cost to the computation cost is large Memory Coherence in Shared Virtual Memory System47 Ideal Solution Two vector have a random distribution on each processor Two vector is in one processor

Experiments (cont’d) Speed up of matrix multiplication, where 128 x 128 square matrix Memory Coherence in Shared Virtual Memory System48 Ideal Solution Experimental Result

Experiments (cont’d) Compare to memory coherence algorithms – All algorithms have similar number of page faults – Number or requests of fixed distributed manager algorithm is similar to improved centralized manager algorithm Both algorithm need a request to locate the owner of the page for almost every page fault that occurs on a non-manager processor Memory Coherence in Shared Virtual Memory System49

Conclusion This paper studied two general classes of algorithms for solving the memory coherence problem – Centralized manager Straightforward and easy to implement Traffic bottleneck at the central manager – Distributed Manager Fixed dist. Manager – Alleviates the bottleneck – On average, still need to spend about two message to locate an owner Dynamic dist. Manager – The most desirable overall features This paper gives possibility of using a shared virtual memory system to construct a large-scale shared memory multiprocessor system Memory Coherence in Shared Virtual Memory System50

Q & A Thank you! Memory Coherence in Shared Virtual Memory System51