CS 425 / ECE 428 Distributed Systems
Fall 2014
Indranil Gupta (Indy)
Lecture 25: Distributed Shared Memory
All slides © IG

So Far …
[Diagram: processes "send message" / "receive message" over a message-passing network]

But what if …
Processes could share memory pages instead?
–Makes it convenient to write programs
–Reuse programs
[Diagram: a process does "write to page 5" / "read page 5" against shared pages Page 0, Page 1, Page 2, …, Page N-1]

Distributed Shared Memory
Distributed Shared Memory = processes virtually share pages
How do you implement DSM over a message-passing network?
[Diagram: a process issues "write to page 5" and "read page 5" against the virtually shared pages]

In fact …
1. Message-passing can be implemented over DSM!
–Use a common page as buffer to read/write messages
2. DSM can be implemented over a message-passing network!
[Diagram: a process doing "write to page 5" / "read page 5"]
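To make point 1 concrete, here is a minimal sketch (not from the lecture) of a mailbox laid out inside one common DSM page: "send" writes a message into the page and "receive" reads it back out. The struct layout, sizes, and names are illustrative assumptions, and correctness relies on the DSM layer keeping the page consistent (at least sequentially consistent) between the two processes.

```c
/* A single-producer / single-consumer mailbox inside one common DSM page. */
#include <string.h>

#define MSG_SIZE 64
#define NSLOTS   63   /* 63 slots of 64 bytes + head/tail fit in one 4 KB page */

typedef struct {
    unsigned head;                 /* next slot to read  (advanced by receiver) */
    unsigned tail;                 /* next slot to write (advanced by sender)   */
    char slots[NSLOTS][MSG_SIZE];  /* fixed-size message slots                  */
} shared_mailbox;                  /* placed at the start of the common page    */

/* "send message" = write the message into the common page */
int mailbox_send(shared_mailbox *m, const char *msg) {
    unsigned next = (m->tail + 1) % NSLOTS;
    if (next == m->head) return -1;              /* full: one slot kept empty */
    strncpy(m->slots[m->tail], msg, MSG_SIZE - 1);
    m->slots[m->tail][MSG_SIZE - 1] = '\0';
    m->tail = next;                              /* publish the new message   */
    return 0;
}

/* "receive message" = read the message back out of the common page */
int mailbox_recv(shared_mailbox *m, char *out) {
    if (m->head == m->tail) return -1;           /* empty */
    memcpy(out, m->slots[m->head], MSG_SIZE);
    m->head = (m->head + 1) % NSLOTS;
    return 0;
}
```

The rest of the lecture is about point 2: implementing DSM itself over a message-passing network.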

DSM over Message-Passing Network
Cache maintained at each process
–Cache stores pages accessed recently by that process
Read/write first goes to cache
[Diagram: four processes, each with its own local cache]

DSM over Message-Passing Network (2)
Pages can be mapped in local memory
When page is present in memory, page hit
Otherwise, page fault (kernel trap) occurs
–Kernel trap handler: invokes the DSM software
–May contact other processes in DSM group, via multicast
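The trap-and-fetch mechanism can be sketched at user level on Linux/POSIX: the DSM region is mapped with no access, so the first touch of any page faults into a handler that stands in for the DSM software. `dsm_fetch_page`, the region size, and the layout are assumptions for illustration, not the lecture's code.

```c
/* User-level page-fault trapping for a DSM runtime (sketch). */
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#define NPAGES 16
static char  *dsm_base;    /* start of the DSM region */
static size_t page_size;

/* Stub: "contact other processes in the DSM group, via multicast". */
static void dsm_fetch_page(size_t pageno) {
    /* fprintf is for illustration only; real handlers must stick to
     * async-signal-safe calls. */
    fprintf(stderr, "DSM: fetching page %zu from its current holder(s)\n", pageno);
}

static void fault_handler(int sig, siginfo_t *si, void *ctx) {
    (void)sig; (void)ctx;
    size_t pageno = (size_t)((char *)si->si_addr - dsm_base) / page_size;
    dsm_fetch_page(pageno);                               /* obtain a copy of the page */
    mprotect(dsm_base + pageno * page_size, page_size,    /* then allow local reads    */
             PROT_READ);
    /* returning re-executes the faulting access, which now succeeds */
}

int main(void) {
    page_size = (size_t)sysconf(_SC_PAGESIZE);
    /* No access at first, so every first touch traps into the handler. */
    dsm_base = mmap(NULL, NPAGES * page_size, PROT_NONE,
                    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = fault_handler;
    sa.sa_flags = SA_SIGINFO;
    sigaction(SIGSEGV, &sa, NULL);

    char c = dsm_base[5 * page_size];   /* "read page 5": page fault -> handler */
    (void)c;
    return 0;
}
```

A write fault would similarly upgrade the page to PROT_READ|PROT_WRITE, after running the invalidate protocol described next.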

DSM: Invalidate Protocol
Owner = process with latest version of page
Each page is in either R or W state
When page is in R state, owner has an R copy, but other processes may also have R copies
–but no W copies exist
When page is in W state, only owner has a copy
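A minimal sketch of the per-page bookkeeping this implies at each process; the type and field names are illustrative, not from the lecture.

```c
/* Per-page state kept by each process for the invalidate protocol. */
typedef enum {
    PAGE_ABSENT,   /* no local copy of the page                              */
    PAGE_R,        /* read-only copy; other processes may also hold R copies */
    PAGE_W         /* writable copy; only the owner holds the page           */
} page_state;

typedef struct {
    page_state state;   /* ABSENT, R, or W                                   */
    int        owner;   /* nonzero if this process holds the latest version  */
    void      *data;    /* local copy of the page contents, if any           */
} page_entry;

/* Protocol invariants:
 *   state == PAGE_W  =>  owner != 0 and no other process holds any copy.
 *   state == PAGE_R  =>  no W copy exists anywhere (other R copies may).    */
```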

Process 1 Attempting a Read: Scenario 1
Process 1 is owner (O) and has page in R state
Read from cache. No messages sent.
[Diagram: Process 1: page (R)(O); Processes 2, 3, 4: no copy]

Process 1 Attempting a Read: Scenario 2
Process 1 is owner (O) and has page in W state
Read from cache. No messages sent.
[Diagram: Process 1: page (W)(O); Processes 2, 3, 4: no copy]

Process 1 Attempting a Read: Scenario 3
Process 1 is owner (O) and has page in R state
Other processes also have page in R state
Read from cache. No messages sent.
[Diagram: Process 1: page (R)(O); Processes 3, 4: page (R); Process 2: no copy]

Process 1 Attempting a Read: Scenario 4
Process 1 has page in R state
Other processes also have page in R state, and someone else is owner
Read from cache. No messages sent.
[Diagram: Process 1: page (R); Process 3: page (R); Process 4: page (R)(O); Process 2: no copy]
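All four scenarios above share the same fast path: the page is already cached locally in R or W state, so the read is served from the cache with no messages. A sketch of that path; the types repeat the earlier sketch so the fragment stands alone, and `dsm_read_fault` is the slow path sketched after the fault scenarios below. All names are illustrative.

```c
/* Read fast path (scenarios 1-4): serve the read from the local cache. */
#include <stddef.h>

typedef enum { PAGE_ABSENT, PAGE_R, PAGE_W } page_state;
typedef struct { page_state state; int owner; void *data; } page_entry;

char dsm_read_fault(page_entry *pt, size_t pageno, size_t offset);  /* scenarios 5-6 */

char dsm_read(page_entry *pt, size_t pageno, size_t offset) {
    page_entry *e = &pt[pageno];
    if (e->state == PAGE_R || e->state == PAGE_W)
        return ((char *)e->data)[offset];        /* read from cache, no messages   */
    return dsm_read_fault(pt, pageno, offset);   /* page absent: contact the group */
}
```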

Process 1 Attempting a Read: Scenario 5
Process 1 does not have page
Other process(es) has/have page in (R) state
Ask for a copy of page. Use multicast.
Mark it as R
Do Read
[Diagram: Processes 1, 2: no copy; Process 3: page (R); Process 4: page (R)(O)]

End State: Read Scenario 5
Process 1 does not have page
Other process(es) has/have page in (R) state
Ask for a copy of page. Use multicast.
Mark it as R
Do Read
[Diagram: Process 1: page (R); Process 3: page (R); Process 4: page (R)(O); Process 2: no copy]

Process 1 Attempting a Read: Scenario 6
Process 1 does not have page
Another process has page in (W) state
Ask other process to degrade its copy to (R). Locate process via multicast
Get page; mark it as R
Do Read
[Diagram: Processes 1, 2, 3: no copy; Process 4: page (W)(O)]

End State: Read Scenario 6
Process 1 does not have page
Another process has page in (W) state
Ask other process to degrade its copy to (R). Locate process via multicast
Get page; mark it as R
Do Read
[Diagram: Process 1: page (R); Process 4: page (R)(O); Processes 2, 3: no copy]
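Scenarios 5 and 6 form the read-fault path: the page is absent locally, so the process obtains an R copy from the group (after the W copy at its holder, if any, has been degraded to R), marks it R, and does the read. A sketch under the same assumptions as before; the `dsm_*` helpers are stubs standing in for the real multicast, and the types are repeated from the earlier sketch.

```c
/* Read-fault path (scenarios 5 and 6). */
#include <stddef.h>
#include <stdlib.h>

#define PAGE_SIZE 4096

typedef enum { PAGE_ABSENT, PAGE_R, PAGE_W } page_state;
typedef struct { page_state state; int owner; void *data; } page_entry;

/* Stub: does some process hold this page in W state? (learned via multicast) */
static int dsm_w_copy_exists(size_t pageno) { (void)pageno; return 0; }

/* Stub, scenario 6: locate the W-holder via multicast, ask it to degrade its
 * copy to (R), and receive the page contents. */
static void *dsm_degrade_and_fetch(size_t pageno) { (void)pageno; return calloc(1, PAGE_SIZE); }

/* Stub, scenario 5: ask any R-holder for a copy of the page via multicast. */
static void *dsm_fetch_r_copy(size_t pageno) { (void)pageno; return calloc(1, PAGE_SIZE); }

char dsm_read_fault(page_entry *pt, size_t pageno, size_t offset) {
    page_entry *e = &pt[pageno];
    if (dsm_w_copy_exists(pageno))
        e->data = dsm_degrade_and_fetch(pageno);   /* scenario 6 */
    else
        e->data = dsm_fetch_r_copy(pageno);        /* scenario 5 */
    e->state = PAGE_R;                             /* mark the local copy as R */
    return ((char *)e->data)[offset];              /* do the read */
}
```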

Process 1 Attempting a Write: Scenario 1
Process 1 is owner (O) and has page in W state
Write to cache. No messages sent.
[Diagram: Process 1: page (W)(O); Processes 2, 3, 4: no copy]

Process 1 Attempting a Write: Scenario 2
Process 1 is owner (O) and has page in R state
Other processes may also have page in R state
Ask other processes to invalidate their copies of page. Use multicast.
Mark page as (W). Do write.
[Diagram: Process 1: page (R)(O); Processes 3, 4: page (R); Process 2: no copy]

End State: Write Scenario 2
Process 1 is owner (O) and has page in R state
Other processes may also have page in R state
Ask other processes to invalidate their copies of page. Use multicast.
Mark page as (W). Do write.
[Diagram: Process 1: page (W)(O); the (R) copies at Processes 3 and 4 are invalidated]

Process 1 Attempting a Write: Scenario 3
Process 1 has page in R state
Other processes may also have page in R state, and someone else is owner
Ask other processes to invalidate their copies of page. Use multicast.
Mark page as (W), become owner
Do write
[Diagram: Process 1: page (R); Process 3: page (R); Process 4: page (R)(O); Process 2: no copy]

End State: Write Scenario 3
Process 1 has page in R state
Other processes may also have page in R state, and someone else is owner
Ask other processes to invalidate their copies of page. Use multicast.
Mark page as (W), become owner
Do write
[Diagram: Process 1: page (W)(O); the (R) copies at Processes 3 and 4 are invalidated, and Process 4 is no longer owner]

Process 1 Attempting a Write: Scenario 4
Process 1 does not have page
Other process(es) has/have page in (R) or (W) state
Ask other processes to invalidate their copies of the page. Use multicast.
Fetch all copies; use the latest copy; mark it as (W); become owner
Do Write
[Diagram: Process 1: no copy; Process 3: page (R); Process 4: page (R)(O); Process 2: no copy]

End State: Write Scenario 4
Process 1 does not have page
Other process(es) has/have page in (R) or (W) state
Ask other processes to invalidate their copies of the page. Use multicast.
Fetch all copies; use the latest copy; mark it as (W); become owner
Do Write
[Diagram: Process 1: page (W)(O); the copies at Processes 3 and 4 are invalidated, and Process 4 is no longer owner]
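The four write scenarios collapse into one handler: if the page is already held in W state (scenario 1), the write is purely local; otherwise the process multicasts an invalidation, fetches the latest copy if it holds none (scenario 4), becomes owner, marks the page (W), and then writes. A sketch with illustrative stub helpers, not the lecture's code; types repeat the earlier sketch.

```c
/* Write path covering scenarios 1-4 of the invalidate protocol. */
#include <stddef.h>
#include <stdlib.h>

#define PAGE_SIZE 4096

typedef enum { PAGE_ABSENT, PAGE_R, PAGE_W } page_state;
typedef struct { page_state state; int owner; void *data; } page_entry;

/* Stub: multicast "invalidate your copies of this page" to the DSM group. */
static void dsm_multicast_invalidate(size_t pageno) { (void)pageno; }

/* Stub, scenario 4: fetch all copies and keep the latest version. */
static void *dsm_fetch_latest_copy(size_t pageno) { (void)pageno; return calloc(1, PAGE_SIZE); }

void dsm_write(page_entry *pt, size_t pageno, size_t offset, char value) {
    page_entry *e = &pt[pageno];
    if (e->state != PAGE_W) {                        /* scenarios 2-4             */
        dsm_multicast_invalidate(pageno);            /* others drop their copies  */
        if (e->state == PAGE_ABSENT)                 /* scenario 4: no local copy */
            e->data = dsm_fetch_latest_copy(pageno);
        e->state = PAGE_W;                           /* mark page as (W)          */
        e->owner = 1;                                /* become owner              */
    }
    /* scenario 1 (and after the upgrade above): write to the cached copy */
    ((char *)e->data)[offset] = value;
}
```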

Invalidate Downsides
That was the invalidate approach
If two processes write the same page concurrently:
–Flip-flopping behavior where one process invalidates the other
–Lots of network transfer
–Can happen when unrelated variables fall on the same page
–Called false sharing
Need to set page size to capture a process’ locality of interest
–If page size is much larger, we get false sharing
–If page size is much smaller, we get too many page transfers => also inefficient
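A tiny illustration of false sharing under the invalidate protocol; the layout is an assumption for illustration, since actual placement depends on the compiler and linker.

```c
/* Two logically unrelated counters that happen to land on the same DSM page.
 * Process A writes only a_count and process B writes only b_count, yet every
 * write by A invalidates B's copy of the whole page and vice versa. */
struct same_page {
    long a_count;   /* written only by process A */
    long b_count;   /* written only by process B */
};
```

Placing the two counters on separate pages (or shrinking the page) removes the flip-flopping, at the cost of more pages and potentially more transfers, which is exactly the page-size trade-off above.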

An Alternative Approach: Update
Instead: could use the Update approach
–Multiple processes allowed to have page in W state
–On a write to a page, multicast the newly written value (or part of the page) to all other holders of that page
–Other processes can then continue reading and writing the page
Update preferable over Invalidate when:
–There is lots of sharing among processes
–Writes are to small variables
–Page sizes are large
Generally, though, Invalidate is the better and preferred option
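A sketch of the update alternative: each write is applied locally and the newly written bytes are multicast to every other holder of the page, which applies them to its own copy. The helper and names are illustrative assumptions.

```c
/* Update protocol (sketch): push each write to all other holders of the page. */
#include <stddef.h>
#include <string.h>

#define PAGE_SIZE 4096

typedef struct {
    char data[PAGE_SIZE];   /* local copy of the page */
} page_copy;

/* Stub: multicast (pageno, offset, bytes) to every other holder of the page. */
static void dsm_multicast_update(size_t pageno, size_t offset,
                                 const void *bytes, size_t len) {
    (void)pageno; (void)offset; (void)bytes; (void)len;
}

/* Writer side: apply the write locally, then push it to the other holders. */
void dsm_update_write(page_copy *p, size_t pageno, size_t offset,
                      const void *bytes, size_t len) {
    memcpy(p->data + offset, bytes, len);
    dsm_multicast_update(pageno, offset, bytes, len);
}

/* Receiver side: apply an update pushed by another writer. */
void dsm_apply_update(page_copy *p, size_t offset, const void *bytes, size_t len) {
    memcpy(p->data + offset, bytes, len);
}
```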

Consistency
Whenever multiple processes share data, consistency comes into the picture
DSM systems can be implemented with:
–Linearizability
–Sequential Consistency
–Causal Consistency
–Pipelined RAM (FIFO) Consistency
–Eventual Consistency
–(Also other models like Release Consistency)
–These should be familiar to you from the course!
As one goes down this list, speed increases while consistency gets weaker

Is it Alive?
DSM was very popular over a decade ago
But it may be making a comeback now
–Faster networks like InfiniBand + SSDs => Remote Direct Memory Access (RDMA) becoming popular
–Will this grow? Or stay the same as it is right now?
–Time will tell!

Summary
DSM = Distributed Shared Memory
–Processes share pages, rather than sending/receiving messages
–Useful abstraction: allows processes to use same code as if they were all running over the same OS (multiprocessor OS)
DSM can be implemented over a message-passing interface
Invalidate vs. Update protocols