Distributed Systems UNIT-III

The Byzantine Generals Problem
Hi everyone. The paper I am going to present today, "The Byzantine Generals Problem," is by Leslie Lamport, Robert Shostak, and Marshall Pease of SRI International.
Once upon a time...
The background of this problem is as follows. Once upon a time, Byzantine troops set out to attack a village of Gauls: a group of generals of the Byzantine army camped with their troops around an enemy city. Communicating only by messenger, the generals must agree upon a common battle plan, but some of them may be traitors who will try to confuse the others. (The pictures are taken from R. Goscinny and A. Uderzo, Asterix and Latraviata.)
Byzantine Generals Problem & Impossibility Results
Here comes our task: find an algorithm to ensure that the loyal generals reach agreement, such that a small number of traitors cannot cause the loyal generals to adopt a bad plan. For ease of description, the problem is remodeled as a commanding general sending an order to his lieutenants, and we are required to satisfy the two so-called "interactive consistency" conditions:
IC1: All loyal generals get the same result.
IC2: If the commander is loyal, all loyal generals follow his choice.
No solution will work unless more than 2/3 of the generals are loyal.
Example: Poor Lieutenant 1's Dilemma
Let's look at an example with only three generals. In the first situation, Lieutenant 2 is the traitor: the commander sends "attack" to both lieutenants, but Lieutenant 2 tells Lieutenant 1, "he said retreat." Lieutenant 1 now holds two conflicting reports of the commander's order. Which one should he trust? Is there a good strategy for him?
In the second situation, the commander is the traitor and Lieutenant 2 is loyal: the commander sends "attack" to Lieutenant 1 and "retreat" to Lieutenant 2, so the loyal Lieutenant 2 truthfully tells Lieutenant 1, "he said retreat."
To Lieutenant 1 the two situations are identical; he has no information to distinguish them, so whatever strategy he uses must produce the same decision in both cases. Since he cannot tell who the traitor is, IC2 forces him to obey the commander's order, whatever the other lieutenant says; the same holds for Lieutenant 2. So in the second situation Lieutenant 1 attacks while Lieutenant 2 retreats: IC1 is violated!
Solutions
Solution 1: Using Oral Messages
Solution 2: Using Signed Messages
Solution Using Oral Messages
Although no algorithm can cope with traitors making up a third or more of the generals, oral messages yield a solution to the Byzantine generals problem for 3m+1 or more generals with m traitors. What is an oral message? First, every message that is sent is delivered correctly; second, the receiver of a message knows who sent it; and third, the absence of a message can be detected. In other words, the contents of an oral message are fully controlled by the sender, so a traitor can send any kind of message.
To make a decision we also need the function majority, which, as the name implies, performs majority voting: if a majority of the values v_i equals v, then majority(v_1, ..., v_{n-1}) equals v (a sketch follows below).
Each lieutenant keeps an order set V_i in which he stores the orders received from the others. With these tools we have an algorithm OM(m), defined recursively, that can deal with m traitors.
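To make the majority function concrete, here is a minimal Python sketch (ours, not the paper's); the paper leaves ties unspecified, so defaulting to "retreat" is our assumption.

from collections import Counter

RETREAT = "retreat"

def majority(*values):
    # If a strict majority of the values equals v, return v; otherwise
    # fall back to RETREAT (an assumed default; the paper leaves it open).
    v, c = Counter(values).most_common(1)[0]
    return v if c > len(values) // 2 else RETREAT

print(majority("attack", "attack", "retreat"))   # -> attack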
Base Case: OM(0)
The commander sends his order to the lieutenants, and each lieutenant receives and records it. The figure illustrates a loyal commander sending "attack": each lieutenant adds the order to his order set, so V_i = {v_0: attack} for every lieutenant i. Remember that the commander need not be loyal.
OM(m)
In the general case, when a lieutenant receives a value from the commander, he records it and then acts as the commander in an execution of OM(m-1), sending the value to the other n-2 lieutenants. Every other lieutenant does the same, and the relaying repeats recursively, each round invoking OM with a smaller argument, until the base case OM(0) is reached.
Step 3: Majority Vote
After all the lieutenants have sent their values and received the others' values, each makes a decision by majority vote over his information vector: my decision is majority(v_1, v_2, ..., v_{n-1}). Since there are more than 3m generals and at most m traitors, the loyal generals always form a majority large enough for the vote to work. Theorem: for any m, algorithm OM(m) satisfies conditions IC1 and IC2 if there are more than 3m generals and at most m traitors. A sketch of the recursion follows.
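The following is a minimal, illustrative simulation of the OM(m) recursion in Python (our sketch, not the paper's notation). The traitor model is a deliberate simplification: here a traitor deterministically relays the opposite order, whereas a real traitor may send anything.

from collections import Counter

ATTACK, RETREAT = "attack", "retreat"

def majority(values):
    # Returns v if a strict majority of the values equals v, else RETREAT.
    v, c = Counter(values).most_common(1)[0]
    return v if c > len(values) // 2 else RETREAT

def send(sender, value, traitors):
    # Simplifying assumption: a traitor always relays the opposite order.
    if sender in traitors:
        return RETREAT if value == ATTACK else ATTACK
    return value

def om(m, commander, value, lieutenants, traitors):
    # Step 1: the commander sends his (possibly corrupted) value to everyone.
    received = {l: send(commander, value, traitors) for l in lieutenants}
    if m == 0:
        return received                  # OM(0): use the value as received
    # Step 2: each lieutenant j relays by acting as commander in OM(m-1).
    relayed = {j: om(m - 1, j, received[j],
                     [l for l in lieutenants if l != j], traitors)
               for j in lieutenants}
    # Step 3: lieutenant i votes over his own copy plus the relayed values.
    return {i: majority([received[i]] +
                        [relayed[j][i] for j in lieutenants if j != i])
            for i in lieutenants}

# n = 4, m = 1, lieutenant 3 is a traitor: both loyal lieutenants decide "attack".
print(om(1, 0, ATTACK, [1, 2, 3], {3}))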
OM(1): Lieutenant 3 Is a Traitor
The loyal commander sends "attack" to all three lieutenants. Lieutenants 1 and 2 loyally relay "attack" to each other, but the traitorous Lieutenant 3 relays "retreat." Each loyal lieutenant therefore computes majority(attack, attack, retreat) = attack. IC1 is achieved, since both loyal lieutenants decide "attack"; IC2 is achieved, since they follow the loyal commander's order.
OM(1): The Commander Is a Traitor
The traitorous commander sends "attack" to Lieutenant 1 and "retreat" to Lieutenants 2 and 3. The loyal lieutenants faithfully relay what they received, so each of them computes majority(attack, retreat, retreat) = retreat. IC1 is achieved, since all loyal lieutenants decide "retreat"; IC2 need not be satisfied, because the commander is a traitor.
Solution with Signed Messages
Are there any better answers? If we allow the use of signed messages, the situation improves. What is a signed message? Each signed message carries at least the sender's signature: a loyal general's signature cannot be forged, any alteration of the contents of his signed messages can be detected, and anyone can verify the authenticity of a general's signature.
We also need a function choice(V) for decision making. It needs to satisfy only two conditions: if the set V consists of the single element v, then choice(V) = v; and choice of the empty set equals "retreat." No other characteristics are needed.
Step 1: The Commander Sends a Signed Message to Each Lieutenant
For any Lieutenant i, if he receives the message v:0 and has not yet received any order, he sets V_i = {v} and sends v:0:i to the other lieutenants. In the figure, the traitorous commander sends attack:0 to Lieutenants i and j but retreat:0 to Lieutenant k, so initially V_i = V_j = {attack} and V_k = {retreat}; the sets then grow as the lieutenants receive each other's countersigned relays.
Step 2
If Lieutenant i receives a message v:0:j1:...:jk and v is not in his set V_i, he adds v to V_i, and if k < m he sends v:0:j1:...:jk:i to every lieutenant except j1, ..., jk. When a lieutenant will receive no more messages, he makes his decision using choice(V_i). In the figure, the traitorous commander sent attack:0 to two lieutenants and retreat:0 to the third, yet after the relays every lieutenant holds the same order set, V_i = V_j = V_k = {attack, retreat}: they get the same order set, hence the same decision! A sketch follows.
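Below is a small illustrative sketch of SM(m) in Python (ours; the message encoding and names are invented). Signature chains are modeled as lists of general IDs with 0 as the commander; forgery simply cannot happen in this model, standing in for the unforgeability assumption.

RETREAT = "retreat"

def choice(V):
    # The paper requires: choice of a singleton is its element, and
    # choice of the empty set is RETREAT. Returning RETREAT for larger
    # sets is our assumption; any deterministic rule would do.
    return next(iter(V)) if len(V) == 1 else RETREAT

def sm(m, commander_orders, lieutenants):
    # commander_orders maps each lieutenant to the signed order the
    # (possibly traitorous) commander sent it, e.g. ("attack", [0]).
    V = {l: set() for l in lieutenants}
    inbox = [(v, sigs, l) for l, (v, sigs) in commander_orders.items()]
    while inbox:
        value, sigs, i = inbox.pop()
        if value in V[i]:
            continue                     # order already recorded: ignore
        V[i].add(value)
        if len(sigs) <= m:               # fewer than m lieutenant signatures
            for j in lieutenants:
                if j != i and j not in sigs:
                    inbox.append((value, sigs + [i], j))
    return {l: choice(V[l]) for l in lieutenants}

# A traitorous commander sends conflicting signed orders; both loyal
# lieutenants still end up with the same set and the same decision.
print(sm(1, {1: ("attack", [0]), 2: ("retreat", [0])}, [1, 2]))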
Example: The Traitor Cannot Cheat Now!
The traitorous commander sends Attack:0 to Lieutenant 1 and Retreat:0 to Lieutenant 2. Lieutenant 1 relays Attack:0:1 and Lieutenant 2 relays Retreat:0:2, so V1 = V2 = {Attack, Retreat}: both lieutenants have the same information and thus reach the same decision. Theorem: for any m, algorithm SM(m) solves the Byzantine Generals Problem if there are at most m traitors.
Conclusion
The requirements (interactive consistency conditions):
IC1: All loyal generals get the same result.
IC2: If the commander is loyal, all loyal generals follow his choice.
Theorems to remember:
1. For any m, algorithm OM(m) satisfies conditions IC1 and IC2 if there are more than 3m generals and at most m traitors.
2. For any m, algorithm SM(m) solves the Byzantine Generals Problem if there are at most m traitors.
Discussion
These solutions are not used in practice. Why? What if many messages get lost during communication? Are there any other ways besides "majority" and "same information"?
Naïve Solution
The i-th general sends v(i) to all the other generals. To deal with the two requirements, all generals combine their information v(1), v(2), ..., v(n) in the same way, taking majority(v(1), v(2), ..., v(n)) and ignoring the minority of traitors. The naïve solution does not work: traitors may send different values to different generals, so loyal generals might receive conflicting values from them. Requirement: any two loyal generals must use the same value of v(i) to decide on the same plan of action.
Reduction of the General Problem
Insight: we can restrict ourselves to the problem of one general sending his order to the others. Byzantine Generals Problem (BGP): a commanding general (commander) must send an order to his n-1 lieutenants. Interactive consistency conditions:
IC1: All loyal lieutenants obey the same order.
IC2: If the commanding general is loyal, then every loyal lieutenant obeys the order he sends.
Note: if the general is loyal, IC2 implies IC1. To solve the original problem, each general sends his value v(i) using the above solution, with the other generals acting as lieutenants.
3-General Impossibility Example
Three generals, one of whom is a traitor (shaded in the figures); two possible messages, Attack or Retreat. L1 sees (A, R): who is the traitor, C or L2? In Fig. 1, L1 has to attack to satisfy IC2. In Fig. 2, L1 attacks while L2 retreats: IC1 is violated.
General Impossibility
In general, no solution with fewer than 3m+1 generals can cope with m traitors. Proof by contradiction: assume there is a solution for 3m "Albanians" with m traitors, and reduce it to the 3-general problem. A solution to the 3m problem would yield a solution to the 3-general problem, which we have just shown to be impossible!
Solution I – Oral Messages
If there are 3m+1 generals, the solution tolerates up to m traitors. With oral messages, the content of every message is entirely under the control of its sender. Assumptions on oral messages:
A1 – Each message that is sent is delivered correctly.
A2 – The receiver of a message knows who sent it.
A3 – The absence of a message can be detected.
These assure that traitors cannot interfere with communication as a third party, cannot send fake messages on others' behalf, and cannot interfere by being silent: the order from a silent traitor defaults to "retreat."
Oral Messages (cont.)
Algorithm OM(0): the commander sends his value to every lieutenant; each lieutenant uses the value received from the commander, or RETREAT if no value is received.
Algorithm OM(m), m > 0:
(1) The commander sends his value to every lieutenant; call the value Lieutenant i receives v_i.
(2) Each Lieutenant i acts as commander for OM(m-1) and sends v_i (or RETREAT) to the other n-2 lieutenants.
(3) For each i and each j ≠ i, let v_j be the value Lieutenant i receives from Lieutenant j in step (2) using OM(m-1). Lieutenant i uses the value majority(v_1, ..., v_{n-1}).
Why j ≠ i? "I trust myself more than what others said I said."
Restating Algorithm OM(m)
The commander sends out his command. Each lieutenant acts as commander in OM(m-1) and sends out the command to the other lieutenants. Each lieutenant then uses majority to compute a value from the commands received from the other lieutenants in OM(m-1).
Revisit the interactive consistency goals:
IC1: All loyal lieutenants obey the same command.
IC2: If the commanding general is loyal, then every loyal lieutenant obeys the command he sends.
Example (n = 4, m = 1): Algorithm OM(1), L3 Is a Traitor
L1 and L2 both receive (v, v, x), so IC1 is met. IC2 is met because L1 and L2 obey C.
Example (n = 4, m = 1): Algorithm OM(1), the Commander Is a Traitor
All lieutenants receive (x, y, z), so IC1 is met. IC2 is irrelevant, since the commander is a traitor.
Expensive Communication
OM(m) invokes n-1 executions of OM(m-1); each OM(m-1) invokes n-2 executions of OM(m-2); each OM(m-2) invokes n-3 executions of OM(m-3); and so on. OM(m-k) is therefore invoked (n-1)(n-2)...(n-k) times, for a total of O(n^m) messages – expensive! The count can be computed directly, as in the sketch below.
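A tiny sketch (ours) of the invocation count:

def om_invocations(n, m):
    # OM(m) starts n-1 copies of OM(m-1), each of which starts n-2
    # copies of OM(m-2), and so on down to OM(0).
    if m == 0:
        return 1
    return 1 + (n - 1) * om_invocations(n - 1, m - 1)

# n = 7 generals tolerating m = 2 traitors:
print(om_invocations(7, 2))   # 37 invocations; growth is O(n^m)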
Distributed File Systems
Introduction
File systems are responsible for the organization, storage, retrieval, naming, sharing, and protection of files. Files contain both data and attributes. A typical attribute record structure contains: file length, creation timestamp, read timestamp, write timestamp, attribute timestamp, reference count, owner, file type, and access control list.
Introduction Distributed file systems support the sharing of information in the form of files and hardware resources. With the advent of distributed object systems (CORBA, Java) and the web, the picture has become more complex.
Definition of a DFS
A DFS supports multiple users, multiple sites, and (possibly) distributed storage of files. Benefits: file sharing, a uniform view of the system from different clients, and centralized administration. Goals of a distributed file system: network transparency (access transparency) and availability.
Goals: Network (Access) Transparency
Users should be able to access files over a network as easily as if the files were stored locally, and should not have to know the physical location of a file to access it. Transparency can be addressed through naming and file-mounting mechanisms.
Components of Access Transparency
Location transparency: the file name doesn't specify the file's physical location. Location independence: files can be moved to a new physical location without changing references to them (a name is independent of its addresses). Location independence implies location transparency, but the reverse is not necessarily true.
Goals: Availability
Files should be easily and quickly accessible. The number of users, system failures, and other consequences of distribution shouldn't compromise availability. Availability is addressed mainly through replication.
Distributed File System Requirements
Related requirements in distributed file systems are:
Transparency
Concurrency
Replication
Heterogeneity
Fault tolerance
Consistency
Security
Efficiency
Architectures
Client-Server: traditional; e.g., Sun Microsystems' Network File System (NFS).
Cluster-Based Client-Server: e.g., the Google File System (GFS).
Symmetric: fully decentralized, based on peer-to-peer technology; e.g., Ivy (which uses a Chord DHT approach).
Client-Server Architecture
One or more machines (file servers) manage the file system. Files are stored on disks at the servers, and clients send requests for file operations to the servers. Client-server systems centralize storage and management; P2P systems decentralize them.
[Figure: architecture of a distributed file system (client-server model): clients with local caches connected through a communication network to servers with caches and disks.]
Sun's Network File System
Sun's NFS was for many years the most widely used distributed file system. NFSv3, version three, was used for many years; NFSv4, introduced in 2003, made significant changes.
Overview
NFS goals: each file server presents a standard view of its local file system; transparent access to remote files; compatibility with multiple operating systems and platforms; easy crash recovery at the server (at least in v1-v3). Originally UNIX based, NFS is now available for most operating systems. The NFS communication protocols let processes running in different environments share a file system.
Access Models
Clients access the server transparently, through an interface similar to the local file system interface. Client-side caching may be used to save time and network traffic. The server defines and performs all file operations.
Distributed File System Services
Services provided by the distributed file system:
(1) Name server: provides the mapping (name resolution) of names supplied by clients into objects (files and directories). Resolution takes place when a process attempts to access a file or directory for the first time.
(2) Cache manager: improves performance through file caching. Caching at the client: when a client references a file at the server, a copy of the data is brought from the server to the client machine, and subsequent accesses are done locally at the client. Caching at the server: files are kept in memory to reduce subsequent access time. Issue: different cached copies can become inconsistent, so the cache managers (at the server and at the clients) have to coordinate.
Typical Data Access in a Client/File Server Architecture
Mechanisms Used in Distributed File Systems
(1) Mounting: the mount mechanism binds together several filename spaces (collections of files and directories) into a single hierarchically structured name space (example: UNIX and its derivatives). A name space 'A' can be mounted (bound) at an internal node, the mount point, of a name space 'B'. Implementation: the kernel maintains a mount table mapping mount points to storage devices, as in the sketch below.
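A highly simplified illustration of such a mount table in Python (the paths and server names are invented for the example):

# Hypothetical mount table: mount point -> (file server, exported path).
mount_table = {
    "/users": ("fileserver1", "/export/users"),
    "/projects": ("fileserver2", "/export/projects"),
}

def resolve(path):
    # Map a path in the client's single name space to the server that
    # stores it, using longest-prefix match on the mount points.
    best = max((mp for mp in mount_table if path.startswith(mp)),
               key=len, default=None)
    if best is None:
        return ("local", path)
    server, export = mount_table[best]
    return (server, export + path[len(best):])

print(resolve("/users/alice/notes.txt"))
# -> ('fileserver1', '/export/users/alice/notes.txt')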
Mechanisms Used in Distributed File Systems (cont.)
(1) Mounting (cont.): where is the mount information located?
a. Mount information maintained at clients: each client mounts every file system, so different clients may not see the same filename space, and if files move to another server, every client must update its mount table. Example: Sun NFS.
b. Mount information maintained at servers: every client sees the same filename space, and if files move to another server, only the mount information at the server needs to change. Example: the Sprite file system.
Mechanisms Used in Distributed File Systems (cont.)
(2) Caching: improves file system performance by exploiting locality of reference. When a client references a remote file, the file is cached in the main memory of the server (the server cache) and at the client (the client cache). When multiple clients modify shared (cached) data, cache consistency becomes a problem, and it is very difficult to implement a solution that guarantees consistency.
(3) Hints: treat cached data as hints, i.e., data that may not be completely accurate. Hints can be used by applications that can discover that the cached data is invalid and recover. Example: after the name of a file is mapped to an address, that address is stored as a hint in the cache; if the address later fails, it is purged from the cache, the name server is consulted for the actual location of the file, and the cache is updated. This pattern is sketched below.
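The hint pattern can be sketched as follows (Python; name_server and fetch are hypothetical callables standing in for the real services):

class HintCache:
    # Cached name -> address mappings are only hints: use them
    # optimistically, purge on failure, then re-ask the name server.
    def __init__(self, name_server):
        self.name_server = name_server   # authoritative resolver
        self.hints = {}

    def lookup(self, name):
        if name not in self.hints:
            self.hints[name] = self.name_server(name)
        return self.hints[name]

    def access(self, name, fetch):
        try:
            return fetch(self.lookup(name))      # may fail on a stale hint
        except ConnectionError:
            del self.hints[name]                 # purge the invalid hint
            return fetch(self.lookup(name))      # retry with a fresh address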
Mechanisms Used in Distributed File Systems (cont.)
(4) Bulk data transfer. Observations: the overhead introduced by protocols does not depend on the amount of data transferred in one transaction, and most files are accessed in their entirety. Common practice: when a client requests one block of data, multiple consecutive blocks are transferred.
(5) Encryption: needed to provide security in distributed systems. Entities that need to communicate send a request to an authentication server, which provides a key for the conversation.
Design Issues: 1. Naming and Name Resolution
Terminology:
Name: each object in a file system (file, directory) has a unique name.
Name resolution: mapping a name to an object, or to multiple objects under replication.
Name space: a collection of names, with or without a common resolution mechanism.
Approaches to naming files in a distributed system:
(a) Concatenate the name of the host to the names of files on that host. Advantage: unique filenames and simple resolution. Disadvantages: conflicts with network transparency, and moving a file to another host requires changing its name and the applications that use it.
(b) Mount remote directories onto local directories. This requires that the host of the remote directory be known; after mounting, files are referenced location-transparently (i.e., a file name does not reveal its location).
(c) Have a single global directory: all files belong to a single name space. Limitation: system-wide unique filenames require a single computing facility or cooperating facilities.
Design Issues (cont.): 1. Naming and Name Resolution (cont.)
Contexts solve the problem of system-wide unique names by partitioning a name space into contexts (geographical, organizational, etc.). Name resolution is performed within a context, and interpretation may lead to another context. File name = context + name local to the context.
Name server: a process that maps file names to objects (files, directories). Implementation options:
A single name server: simple to implement, but with reliability and performance issues.
Several name servers (on different hosts), each responsible for a domain. Example: a client requests access to file 'A/B/C'; the local name server looks up a table (in the kernel) and points to a remote server for the '/B/C' mapping.
Design Issues (cont.): 3. Writing Policy
Question: once a client writes into a file (and the local cache), when should the modified cache be sent to the server? Options:
Write-through: all writes at the clients are immediately transferred to the servers. Advantage: reliability. Disadvantage: performance, since it does not take advantage of the cache.
Delayed writing: transfers to the servers are delayed. Advantages: many writes (including those of intermediate results) can take place before a transfer, and some data may be deleted before ever being sent. Disadvantage: reliability.
Delayed writing until the file is closed at the client: for short open intervals this is the same as delayed writing; for long intervals it has the same reliability problems. The first two policies are contrasted in the sketch below.
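A schematic contrast of the two policies (Python; server is a hypothetical object with a write(block, data) method):

class WriteThroughCache:
    # Every write goes to the server immediately: reliable but slow.
    def __init__(self, server):
        self.server, self.cache = server, {}

    def write(self, block, data):
        self.cache[block] = data
        self.server.write(block, data)      # synchronous transfer

class DelayedWriteCache:
    # Writes accumulate locally and are flushed later (e.g., on close):
    # fast, and short-lived data may never be sent, but buffered
    # updates are lost if the client crashes first.
    def __init__(self, server):
        self.server, self.cache, self.dirty = server, {}, set()

    def write(self, block, data):
        self.cache[block] = data
        self.dirty.add(block)               # no network traffic yet

    def close(self):
        for block in self.dirty:            # flush once, coalescing writes
            self.server.write(block, self.cache[block])
        self.dirty.clear()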
Design Issues (cont.): 4. Availability
Issue: what level of availability do files have in a distributed file system? Resolution: use replication to increase availability, i.e., maintain many copies (replicas) of files at different sites/servers. Replication issues: how to keep replicas consistent, and how to detect inconsistency among replicas. Unit of replication: a file or a group of files.
a) Volume: the group of all files of a user or group, or all files in a server. Advantage: ease of implementation. Disadvantage: wasteful, since a user may need only a subset replicated.
b) Primary pack vs. pack. Primary pack: all files of a user. Pack: a subset of the primary pack, which can receive its own degree of replication.
Design Issues (cont.): 5. Scalability and 6. Semantics
5. Scalability. Issue: can the design support a growing system? Example: with server-initiated cache invalidation, complexity and load grow with the size of the system. Possible solutions: do not provide a cache-invalidation service for read-only files; design so that users can share cached data; design file servers for scalability (threads, SMPs, clusters).
6. Semantics. Expected semantics: a read returns the data stored by the latest write. Possible options: route all reads and writes through the server (disadvantage: communication overhead), or use a lock mechanism (disadvantage: the file is not always available).
Case Study: The Sun Network File System (NFS)
Developed by Sun Microsystems to provide a distributed file system independent of the hardware and operating system. Architecture:
Virtual File System (VFS): a file system interface that allows NFS to support different file systems. Requests for operations on remote files are routed by the VFS to NFS and sent to the VFS on the remote server using remote procedure call (RPC) and external data representation (XDR); the VFS on the remote server then initiates the file system operation locally.
Vnode (virtual node): there is a network-wide vnode for every object in the file system (file or directory), the equivalent of a UNIX inode. The vnode layer has a mount table, allowing any node to be a mount node.
Cluster-Based (Clustered) File Systems
A cluster-based distributed file system consists of several servers that share the responsibilities of the system, as opposed to a single (possibly replicated) server. The design decisions for cluster-based systems mostly concern how the data is distributed across the cluster and how it is managed.
Cluster-Based DFS
Some cluster-based systems organize their clusters in an application-specific manner. For file systems used primarily for parallel applications, the data in a file might be striped across several servers so it can be read in parallel; alternatively, it may make more sense to partition the file system itself, storing some portion of the total number of files on each server. For systems that process huge numbers of requests, e.g., large data centers, reliability and management issues take precedence; e.g., the Google File System.
Google File System (GFS)
GFS uses a cluster-based approach implemented on ordinary commodity Linux boxes (not high-end servers). Servers fail on a regular basis simply because there are so many of them, so the system is designed to be fault tolerant. There are a number of replicated clusters, and DNS servers map requests to the clusters in round-robin fashion as a load-balancing mechanism; locality is also considered.
Scalability in GFS
Clients contact the master only to get metadata, so it isn't a bottleneck. Updates are performed by having a client update the nearest server, which pushes the update to one of the backups, which in turn sends it on to the next, and so on; updates aren't committed until all replicas are complete. The information for mapping file names to contact addresses is efficiently organized and stored (mostly) in the master's memory, so access time is optimized thanks to infrequent disk accesses.
Distributed Resource Management: Distributed Shared Memory
Distributed Shared Memory (DSM)
What: distributed shared memory (DSM) implements the shared-memory model in distributed systems, which have no physical shared memory. The shared-memory model provides a virtual address space shared among all nodes. To overcome the high cost of communication in distributed systems, DSM systems move data to the location of access.
How: data moves between main memory and secondary memory (within a node) and between the main memories of different nodes. Each data object is owned by a node; the initial owner is the node that created the object, and ownership can change as the object moves from node to node. When a process accesses data in the shared address space, the mapping manager maps the shared-memory address to physical memory (local or remote).
[Figure: distributed shared memory: three nodes, each with its own memory and mapping manager, sharing a single virtual shared-memory space.]
Advantages of Distributed Shared Memory (DSM)
Data sharing is implicit, hiding data movement (as opposed to 'send'/'receive' in the message-passing model).
Passing data structures containing pointers is easier (in the message-passing model, data moves between different address spaces).
Moving the entire object to the user takes advantage of locality of reference.
Less expensive to build than a tightly coupled multiprocessor system: off-the-shelf hardware, and no expensive interface to shared physical memory.
Very large total physical memory across all nodes, so large programs can run more efficiently.
No serial access to a common bus for shared physical memory, as in multiprocessor systems.
Programs written for shared-memory multiprocessors can be run on DSM systems with minimal changes.
Algorithms for Implementing DSM
Issues: how to keep track of the location of remote data; how to minimize communication overhead when accessing remote data; and how to access remote data concurrently at several nodes.
1. The Central Server Algorithm. A central server maintains all shared data: a read request returns the data item, and a write request updates the data and returns an acknowledgement. Implementation: a timeout is used to resend a request if the acknowledgement fails, and sequence numbers are used to detect duplicate write requests; if an application's request to access shared data fails repeatedly, a failure condition is sent to the application. Issues: performance and reliability. Possible solutions: partition the shared data between several servers, and use a mapping function to distribute and locate the data. A sketch follows.
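A minimal sketch of the central server with duplicate-write detection (Python; the interface is invented for illustration):

class CentralServer:
    # All shared data lives at one server; clients send read and write
    # requests. Per-client sequence numbers let the server discard
    # duplicate writes caused by timeout-driven retransmissions.
    def __init__(self):
        self.data = {}
        self.last_seq = {}                  # client -> highest seq applied

    def read(self, addr):
        return self.data.get(addr)

    def write(self, client, seq, addr, value):
        if self.last_seq.get(client, -1) >= seq:
            return "ack"                    # duplicate request: ignore
        self.data[addr] = value
        self.last_seq[client] = seq
        return "ack"                        # client resends if this is lost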
Algorithms for Implementing DSM (cont.)
2. The Migration Algorithm. Operation: ship (migrate) the entire data object (page or block) containing the requested item to the requesting location, and allow only one node to access a shared datum at a time. Advantages: it takes advantage of locality of reference, and DSM can be integrated with virtual memory (VM) at each node. If the DSM page is a multiple of the VM page size, a locally held shared-memory page can be mapped into the VM page address space; if the page is not local, the fault handler migrates it and removes it from the address space at the remote node. To locate a remote data object: use a location server, maintain hints at each node, or broadcast a query. Issues: only one node can access a data object at a time, and thrashing can occur; to minimize thrashing, set a minimum time that a data object resides at a node.
Algorithms for Implementing DSM (cont.)
3. The Read-Replication Algorithm. Data objects are replicated to multiple nodes, and DSM keeps track of their locations. Multiple nodes can have read access, or one node write access (the multiple readers-one writer protocol). After a write, all other copies are invalidated or updated, so DSM has to keep track of the locations of all copies of each data object. Examples of implementations: in IVY, the owner node of a data object knows all nodes that have copies; in PLUS, a distributed linked list tracks all nodes that have copies. Advantage: read replication can lead to substantial performance improvements if the ratio of reads to writes is large.
Algorithms for Implementing DSM (cont.)
4. The Full-Replication Algorithm. An extension of the read-replication algorithm: multiple nodes can read and multiple nodes can write (the multiple readers, multiple writers protocol). Issue: keeping the data consistent for multiple writers. Solution: use a gap-free sequencer. All writes are sent to the sequencer, which assigns each a sequence number and sends the write request to all sites that have copies; each node performs the writes in sequence-number order. A gap in the sequence numbers indicates a missing write request, and the node asks for retransmission of the missing writes. A sketch follows.
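A sketch of the gap-free sequencer idea (Python; the names are ours):

class Replica:
    # Applies writes strictly in global sequence order; a gap in the
    # numbers means a write was lost and must be retransmitted.
    def __init__(self):
        self.memory, self.next_seq, self.pending = {}, 0, {}

    def deliver(self, seq, addr, value):
        self.pending[seq] = (addr, value)
        while self.next_seq in self.pending:         # apply in order
            a, v = self.pending.pop(self.next_seq)
            self.memory[a] = v
            self.next_seq += 1

    def missing(self):
        # Sequence numbers to re-request from the sequencer.
        top = max(self.pending, default=-1)
        return [s for s in range(self.next_seq, top + 1)
                if s not in self.pending]

class Sequencer:
    def __init__(self, replicas):
        self.replicas, self.seq = replicas, 0

    def write(self, addr, value):
        for r in self.replicas:             # broadcast in one global order
            r.deliver(self.seq, addr, value)
        self.seq += 1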
Memory Coherence
DSM systems are based on replicated shared data objects and concurrent access to data objects at many nodes. Memory is coherent when the value returned by a read operation is the expected value (e.g., the value of the most recent write); a mechanism that controls and synchronizes accesses is needed to maintain memory coherence.
Sequential consistency: a system is sequentially consistent if the result of any execution of the operations of all processors is the same as if they were executed in some sequential order, and the operations of each processor appear in this sequence in the order specified by its program.
General consistency: all copies of a memory location (replicas) eventually contain the same data once all writes issued by every processor have completed.
Memory Coherence (cont.)
Processor consistency: operations issued by a processor are performed in the order in which they are issued, but operations issued by several processors may not be performed in the same order (e.g., simultaneous reads of the same location by different processors may yield different results).
Weak consistency: memory is consistent only (immediately) after a synchronization operation; a regular data access can be performed only after all previous synchronization accesses have completed.
Release consistency: a further relaxation of weak consistency, in which synchronization operations must be consistent with each other only within a processor. The synchronization operations are acquire (i.e., lock) and release (i.e., unlock), used in the sequence: acquire, regular accesses, release.
Coherence Protocols
Issues: how do we ensure that all replicas have the same information, and how do we ensure that nodes do not access stale data?
1. Write-invalidate protocol: a write to shared data invalidates all copies except one before the write executes, and the invalidated copies are no longer accessible. Advantage: good performance when there are many updates between reads and good per-node locality of reference. Disadvantage: invalidations are sent to all nodes that have copies, which is inefficient if many nodes access the same object. Examples: most DSM systems, including IVY, Clouds, Dash, Memnet, Mermaid, and Mirage.
2. Write-update protocol: a write to shared data causes all copies to be updated (the new value is sent, instead of an invalidation). This is more difficult to implement. The invalidate variant's bookkeeping is sketched below.
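A sketch of write-invalidate bookkeeping at a directory (Python; ours, with a minimal stand-in for a caching node):

class CacheNode:
    # Minimal stand-in for a node's local cache.
    def __init__(self):
        self.cached = set()

    def invalidate(self, page):
        self.cached.discard(page)           # stale copy no longer accessible

class Directory:
    # Write-invalidate: before a write proceeds, every other cached
    # copy is invalidated; readers re-fetch on their next access.
    def __init__(self):
        self.copies = {}                    # page -> set of caching nodes

    def on_read(self, node, page):
        node.cached.add(page)
        self.copies.setdefault(page, set()).add(node)

    def on_write(self, node, page):
        for other in self.copies.get(page, set()) - {node}:
            other.invalidate(page)          # invalidations go to all holders
        self.copies[page] = {node}          # the writer holds the only copy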
Design Issues: Granularity
Granularity: the size of the shared-memory unit. If the DSM page size is a multiple of the local virtual memory (VM) management page size (supported by hardware), then DSM can be integrated with VM, i.e., it can use the VM page handling.
Advantages vs. disadvantages of a large page size: (+) exploits locality of reference; (+) less overhead in page transport; (-) more contention for a page by many processes.
Advantages vs. disadvantages of a small page size: (+) less contention; (+) less false sharing (a page containing two items that are not shared but are needed by two different processes); (-) more page traffic.
Examples: PLUS uses a page size of 4 Kbytes, with a 32-bit word as the unit of memory access; in Clouds and Munin, an object is the unit of shared data.
Design Issues (cont.): Page Replacement
The replacement algorithm (e.g., LRU) must take page access modes into account: shared, private, read-only, writable. Example, LRU with access modes: private (local) pages are replaced before shared ones, with private pages swapped to the local disk and shared pages sent over the network to their owner; read-only pages may simply be discarded, since the owners have a copy. One plausible encoding is sketched below.
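A mode-aware LRU victim choice might look like this (Python; the exact eviction-cost ordering is our reading of the slide):

from collections import OrderedDict

def choose_victim(pages):
    # pages is LRU-ordered (least recently used first) and maps a page
    # to its access mode. Prefer the cheapest evictions: read-only
    # pages can be discarded (the owner has a copy), private pages go
    # to the local disk, and shared pages must travel to their owner.
    for preferred in ("read-only", "private", "shared"):
        for name, mode in pages.items():    # LRU order within each class
            if mode == preferred:
                return name, mode
    return None

pages = OrderedDict([("p1", "shared"), ("p2", "private"), ("p3", "read-only")])
print(choose_victim(pages))                 # -> ('p3', 'read-only')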