Naming in Distributed System


Naming in Distributed System G.Ramesh Babu

Contents
- Naming Entities
- Name Resolution
- Implementation of Name Space
- Names, Identifiers and Addresses
- Name Spaces
- Name Resolution
- Closure Mechanism
- Linking and Mounting
- Implementation of Name Space
- Implementation of Resolution
- Conclusion

Why is naming important?
- Names are used to share resources, to uniquely identify entities, to refer to locations, and so on.
- Name resolution allows a process to access the named entity.

Naming Entities
- Name: a string of characters used to refer to an entity.
- An entity in a distributed system can be anything, e.g., hosts, printers, disks, files, mailboxes, web pages.
- Access point: used to access an entity.
- Address: the name of an access point.
- The access points of an entity may change.

Identifiers and True Identifiers
- We need a single name for an entity that is independent of the entity's address, i.e., location independent.
- Identifier: a name that uniquely identifies an entity.
- A true identifier has three properties:
  - It refers to at most one entity.
  - Each entity is referred to by at most one identifier.
  - It is never reused.
- These properties are what differentiate an identifier from an address.

Name Space
- Names in a distributed system are organized into name spaces.
- A name space is represented as a labeled, directed graph.
- Leaf node: no outgoing edges.
- Directory node: a number of labeled outgoing edges. It stores a directory table with one entry per outgoing edge, as a pair (edge label, node identifier).
- Root node: only outgoing edges.
- Path name: a sequence of labels.
- Absolute path: the first node in the path name is the root.
- Relative path: the opposite case.
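
The directory-table idea above can be sketched in a few lines. This is only an illustration: the node identifiers (`n0`, `n1`, ...), the edge labels, and the `resolve` function are made-up names, not part of the slides.

```python
# Hypothetical naming graph: each directory node maps edge labels to node identifiers.
directory_tables = {
    "n0": {"home": "n1", "keys": "n5"},   # n0 plays the role of the root node
    "n1": {"steen": "n2", "max": "n4"},
    "n2": {"mbox": "n3"},
}
ROOT = "n0"


def resolve(path, start=ROOT):
    """Follow the labels in `path` and return the node identifier that is reached."""
    # Absolute paths start at the root; relative paths start at the given node.
    node = ROOT if path.startswith("/") else start
    for label in [part for part in path.split("/") if part]:
        table = directory_tables.get(node)
        if table is None:
            raise KeyError(f"{node} is a leaf node, cannot look up {label!r}")
        if label not in table:
            raise KeyError(f"no edge labeled {label!r} leaves {node}")
        node = table[label]
    return node
```

For example, `resolve("/home/steen/mbox")` walks three directory tables starting at the root, while `resolve("steen/mbox", start="n1")` resolves the same node through a relative path.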

General Naming Graph

Name Resolution
- Name resolution: the process of looking up a name.
- Closure mechanism: knowing how and where to start name resolution.
- Mounting: a transparent way to resolve names across different name spaces.
- Mounted file system: a directory node stores the identifier of a directory node from a different (foreign) name space.
- Mount point: the directory node storing the node identifier.
- Mounting point: the directory node in the foreign name space.
- Normally the mounting point is the root of a name space.

Mounted File System
- During resolution, the mounting point is looked up and resolution proceeds by accessing its directory table.
- Mounting requires at least:
  - the name of an access protocol (for communication),
  - the name of the server (resolved to an address),
  - the name of the mounting point in the foreign name space (resolved to a node identifier in the foreign name space).
- Each of these names needs to be resolved.
- The three names can be represented as a URL: nfs://oslab.khu.ac.kr/home/faraz
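
As a small illustration of how the three names are packed into one URL, the sketch below splits the example URL with Python's standard `urllib.parse`. The function name `parse_mount_url` and the dictionary keys are illustrative choices, not an API from the slides.

```python
from urllib.parse import urlsplit


def parse_mount_url(url):
    """Split a mount URL into the three names that each need to be resolved."""
    parts = urlsplit(url)
    return {
        "protocol": parts.scheme,        # name of the access protocol
        "server": parts.hostname,        # name of the server, to be resolved to an address
        "mounting_point": parts.path,    # name of the mounting point in the foreign name space
    }


info = parse_mount_url("nfs://oslab.khu.ac.kr/home/faraz")
```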

Mounted File System

Global Name Service (GNS)
- Another way to merge different name spaces.
- Mechanism: add a new root node and make the existing root nodes its children.
- Problem: existing names need to be changed, e.g., home/faraz becomes people/home/faraz.
- The expansion is generally hidden from the user.
- It has a significant performance overhead when merging hundreds or thousands of name spaces.

Global Name Service (GNS)

Implementation of Name Space
- For large-scale distributed systems, name spaces are organized hierarchically.
- Name spaces are partitioned into three logical layers:
  - Global layer: formed by the highest-level nodes.
  - Administrational layer: formed by directory nodes managed within a single organization.
  - Managerial layer: formed by nodes that may typically change regularly.

Implementation of Name Space

Implementation of Name Space
Item                              Global      Administrational   Managerial
Geographical scale of network     Worldwide   Organization       Department
Total number of nodes             Few         Many               Vast numbers
Responsiveness to lookups         Seconds     Milliseconds       Immediate
Update propagation                Lazy        Immediate          Immediate
Number of replicas                Many        None or few        None
Is client-side caching applied?   Yes         Yes                Sometimes

Implementation of Name Resolution
- Assumptions:
  - No replication of name servers.
  - No client-side caching.
  - Each client has access to a local name server.
- Two possible implementations:
  - Iterative name resolution: a server resolves the path name as far as it can and returns each intermediate result to the client, which then contacts the next server itself.
  - Recursive name resolution: a name server passes the intermediate result on to the next name server it finds, instead of returning it to the client.

Iterative Name Resolution
- Advantage: less burden on each name server.
- Disadvantage: higher communication cost.

Recursive Name Resolution
- Advantages: caching results is more effective; communication cost is reduced.
- Disadvantage: demands high performance from each name server.
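
The difference in communication cost can be made concrete with a toy model. The sketch below assumes a made-up chain of four name servers (`root`, `ns-nl`, `ns-vu`, `ns-cs`) that each resolve one label, and it counts only the request/reply exchanges the client itself takes part in. The address is a documentation example value.

```python
# Hypothetical chain of name servers: each one resolves a single label.
SERVERS = {
    "root": {"nl": "ns-nl"},
    "ns-nl": {"vu": "ns-vu"},
    "ns-vu": {"cs": "ns-cs"},
    "ns-cs": {"ftp": "192.0.2.10"},   # the last label maps to an example address
}


def iterative(labels):
    """The client contacts every server itself. Returns (address, client exchanges)."""
    server, exchanges = "root", 0
    for label in labels:
        server = SERVERS[server][label]   # one request/reply between client and server
        exchanges += 1
    return server, exchanges


def _forward(server, labels):
    """A server resolves one label and hands the rest of the name to the next server."""
    nxt = SERVERS[server][labels[0]]
    return nxt if len(labels) == 1 else _forward(nxt, labels[1:])


def recursive(labels):
    """The client sends one request to the root and receives one final reply."""
    return _forward("root", labels), 1
```

With the name `["nl", "vu", "cs", "ftp"]`, the iterative version costs the client four exchanges and the recursive version one, at the price of every server doing forwarding work on the client's behalf.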

Domain Name System (DNS)
- An example implementation of name resolution.
- Primarily used for looking up host addresses and mail servers.
- The DNS name space is hierarchically organized as a rooted tree.
- A label is a case-insensitive string with a maximum length of 63 characters.
- The maximum length of a complete path name is 255 characters.
- The root is represented by a dot, which is generally omitted for readability.
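
The two length limits quoted above can be checked with a short function. This is a simplified sketch that uses the slide's numbers directly and checks lengths only; it does not validate the allowed character set, and `is_valid_dns_name` is an illustrative name.

```python
MAX_LABEL = 63    # maximum label length, as quoted on the slide
MAX_NAME = 255    # maximum length of a complete path name, as quoted on the slide


def is_valid_dns_name(name):
    """Check only the length limits of a dotted DNS name."""
    if name.endswith("."):
        name = name[:-1]              # a trailing dot denotes the root and may be omitted
    if not name or len(name) > MAX_NAME:
        return False
    return all(0 < len(label) <= MAX_LABEL for label in name.split("."))
```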

Locating Mobile Entities

Naming versus Locating Entities
- Entities are named for lookup and subsequent access. There are three kinds of names:
  - Human-friendly names
  - Identifiers
  - Addresses
- Virtually all naming systems maintain a mapping from human-friendly names to addresses.
- Partitioning of the name space:
  - Global level
  - Administrational level
  - Managerial level

Naming versus Locating Entities
[Figure: example name space with the names ftp.cs.vu.nl, cs.vu.nl, ftp.abc.cs.vu.nl, abc, and ftp.khu.ac.kr]

Naming versus Locating Entities
Possible solutions when an entity moves to a new machine:
- Record the address of the new machine.
  - The lookup operation keeps working.
  - Another update to the database is required if the address changes again.
- Record the name of the new machine.
  - Less efficient: first find the name of the new machine, then look up the address associated with that name.
  - This adds a step to the lookup operation.
- For highly mobile entities, it only becomes worse.

Naming versus Locating Entities
- (a) Direct, single-level mapping between names and addresses.
- (b) Two-level mapping using identifiers.
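
A two-level mapping separates the naming service (name to identifier) from the location service (identifier to address), so a move touches only one table. The sketch below is a minimal illustration; the identifier `id-42` and the addresses are invented example values.

```python
# Naming service: human-friendly names map to an identifier (changes rarely).
name_to_id = {
    "ftp.cs.vu.nl": "id-42",
    "ftp.abc.cs.vu.nl": "id-42",   # a second name for the same entity
}

# Location service: identifier maps to the current address (changes on every move).
id_to_address = {"id-42": "192.0.2.10"}


def lookup(name):
    """Resolve a name in two steps: name -> identifier -> current address."""
    return id_to_address[name_to_id[name]]


def move(identifier, new_address):
    """When the entity moves, only the location service is updated."""
    id_to_address[identifier] = new_address
```

After `move("id-42", "198.51.100.7")`, both names resolve to the new address without any change to the name-to-identifier table.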

Simple Solutions: Broadcasting and Multicasting
- A location service accepts an identifier as input and returns the current address of the identified entity.
- Simple solutions exist that work in local-area networks.
- The Address Resolution Protocol (ARP) uses broadcasting to map the IP address of a machine to its data-link address.
- Multicasting can be used to locate entities in point-to-point networks (such as the Internet).
- Each multicast address can be associated with multiple replicated entities.

Forwarding Pointers (1) The principle of forwarding pointers using (proxy, skeleton) pairs.

Forwarding Pointers (2) Redirecting a forwarding pointer by storing a shortcut in a proxy.
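
The chain-and-shortcut idea can be sketched without the (proxy, skeleton) machinery. In this simplified model each location holds at most one forwarding pointer, and the shortcut is stored only at the location where the lookup started. `Location` and `locate` are illustrative names.

```python
class Location:
    """A place the entity has lived; `forward` points to where it went next."""

    def __init__(self, name):
        self.name = name
        self.forward = None   # None means the entity is currently here


def locate(start):
    """Follow the chain from `start`, then redirect `start` straight to the end.

    Returns (current location, number of pointers followed).
    """
    node, hops = start, 0
    while node.forward is not None:
        node = node.forward
        hops += 1
    if hops > 1:
        start.forward = node   # store the shortcut so the next lookup is one hop
    return node, hops
```

A first lookup over a chain of three pointers follows all three; a second lookup from the same starting point follows just the shortcut.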

Home-Based Approaches
- Example: the principle of Mobile IP (Perkins, 1997).
- Used as a fall-back mechanism for location services based on forwarding pointers.
- Drawbacks:
  - Increased communication latency: the home has to be contacted even if the host is present in the local network.
  - The home location must always exist.
- Solution: a two-tiered scheme. Look for the entity in a local registry first, then contact the entity's home location. Mohan and Jain (1994) applied it to mobile telephony.
- The home location can be kept in a traditional naming service, so that the client first looks up the location of the home. That location can be cached.
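
The two-tiered scheme amounts to checking a local registry before falling back to the home location. The sketch below is a toy model; the entity names, the addresses, and the two dictionaries are assumptions made for illustration.

```python
# Entities currently visiting the local network (first tier).
local_registry = {"host-3": "10.0.0.3"}

# Home locations, which always know the current address of their entities (second tier).
home_locations = {
    "host-3": "10.0.0.3",
    "host-7": "198.51.100.7",
}


def locate(entity):
    """Try the local registry first; contact the home location only on a miss."""
    if entity in local_registry:
        return local_registry[entity], "local registry"
    return home_locations[entity], "home location"
```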

Hierarchical Approaches (1)
- The Global Location Service (Van Steen et al, 1998) is representative of many personal communication systems (Pitoura and Samaras, 2001; Wang, 1993).
- The network is divided into a hierarchy of domains, similar to DNS: domain, subdomains, and leaf domains (a LAN or a cell).
- Each entity present in a domain D is represented by a location record in the directory node dir(D).
- A location record in a leaf domain stores the entity's address; a location record at a higher level stores a pointer to the directory node of the next lower-level subdomain that contains the entity.
- Figure: hierarchical organization of a location service into domains, each having an associated directory node.

Hierarchical Approaches (2) An example of storing information of an entity having two addresses in different leaf domains.

Hierarchical Approaches (3) Looking up a location in a hierarchically organized location service. Lookup operations exploit LOCALITY

Hierarchical Approaches (4)
- Insertion installs a chain of pointers in top-down fashion.
- Deletion is analogous to insertion. The delete process continues upward until a pointer is removed from a location record that remains nonempty afterwards.
- Figure: (a) an insert request is forwarded to the first node that knows about entity E; (b) a chain of forwarding pointers to the leaf node is created.
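
The insert and lookup operations of Hierarchical Approaches (1) to (4) can be sketched on a small tree of directory nodes. This is a simplified model that assumes each entity has a single address; `DirNode`, `insert`, and `lookup` are illustrative names, and the count returned by `lookup` is only there to show how locality is exploited.

```python
class DirNode:
    """Directory node of a domain; `records` maps an entity to a child node or an address."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.records = {}


def insert(leaf, entity, address):
    """Forward the request up to the first node that knows the entity (or the root),
    then install a chain of pointers top-down and store the address at the leaf."""
    path, node = [leaf], leaf.parent
    while node is not None and entity not in node.records:
        path.append(node)
        node = node.parent
    for upper, lower in zip(reversed(path[1:]), reversed(path[:-1])):
        upper.records[entity] = lower     # pointer to the next lower-level directory node
    leaf.records[entity] = address


def lookup(start_leaf, entity):
    """Climb until a node has a record for the entity, then follow pointers down.

    Returns (address, number of directory nodes visited).
    """
    node, visited = start_leaf, 1
    while entity not in node.records:
        if node.parent is None:
            raise KeyError(entity)
        node = node.parent
        visited += 1
    while isinstance(node.records[entity], DirNode):
        node = node.records[entity]
        visited += 1
    return node.records[entity], visited
```

A lookup that starts in a sibling leaf domain only climbs one level before finding a pointer, whereas a lookup from a distant domain has to go through the root.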

Pointer Caches (1)
- Figure: caching a reference to a directory node of the lowest-level domain in which an entity will reside most of the time.
- Storing lookup results in traditional location services is highly effective because the entities are stationary.
- For mobile entities, caching addresses is not effective. But if E moves within a domain D regularly, then a reference to dir(D) can, in principle, be cached at every node along the path from the leaf node where the lookup was initiated.
- The pointer caching approach is described by (Jain, 1996); Global Location Service (Van Steen, 1998); (Baggio et al, 2000).
- Improvement: let dir(D) store the actual location of E instead of a pointer to a subdomain. A lookup then takes only two steps: 1) get the appropriate directory node, 2) get the actual location of E.
- Open questions:
  - Which domain pointer should be cached if E moves between two domains regularly?
  - When should a cache entry be invalidated?

Pointer Caches (2) A cache entry that needs to be invalidated because it returns a nonlocal address, while such an address is available.

Scalability Issues
- Problem: the root node is required to store a location record for each entity and to process requests for each entity.
  - Storage: with a location record of about 1 KB, a billion records take one terabyte, i.e., ten 100 GB disks.
  - Lookup/update request processing: a single root node becomes a bottleneck.
- Solution: partition the root node and other high-level directory nodes into subnodes. Each subnode is responsible for a specific subset of all entities.
- Question: where should each subnode be physically placed in the network?
- Answer 1: centralized approach. Keep all subnodes at the same place and implement the root node by means of a parallel computer.
- Figure: the scalability issues related to uniformly placing subnodes of a partitioned root node across the network covered by a location service.
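
The storage estimate and the partitioning step can be worked through in a few lines. The record size and entity count come from the slide. Assigning entities to subnodes by hashing their identifier is only one possible way to choose "a specific subset" and is an assumption of this sketch, as is the function name `subnode_for`.

```python
import hashlib

RECORD_BYTES = 1_000         # about 1 KB per location record
ENTITIES = 10**9             # one billion entities
DISK_BYTES = 100 * 10**9     # one 100 GB disk

total_bytes = RECORD_BYTES * ENTITIES       # 10**12 bytes, i.e. one terabyte
disks_needed = total_bytes // DISK_BYTES    # number of 100 GB disks


def subnode_for(entity_id, n_subnodes):
    """Pick the subnode responsible for an entity by hashing its identifier
    (an assumed partitioning rule, not one prescribed by the slides)."""
    digest = hashlib.sha256(entity_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % n_subnodes
```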

The Problem of Unreferenced Objects
- Figure: an example of a graph representing objects containing references to each other.
- Having a remote reference to an object does not mean that the object will always be accessible.
- Uniprocessor systems versus distributed systems: see (Plainfosse and Shapiro, 1995) and (Abdullah and Ringwood, 1998).

Reference Counting (1)
- Figure: the problem of maintaining a proper reference count in the presence of unreliable communication.
- Reference counting is a popular method in uniprocessor systems.
- Problems caused by unreliable communication:
  - If no special measures are taken to detect duplicate messages, the skeleton may falsely increment its reference counter again.
  - The same problem arises when a remote reference is to be removed and the message is lost or retransmitted.

Reference Counting (2)
- Passing a reference requires three messages, which has a performance impact in distributed systems.
- Figure: (a) copying a reference to another process and incrementing the counter too late; (b) a solution.

Advanced Reference Counting (1)
- Reference integrity can be maintained using only decrement operations.
- Weighted reference counting.
- Figure: (a) the initial assignment of weights in weighted reference counting; (b) weight assignment when creating a new reference.

Advanced Reference Counting (2)
- Figure: weight assignment when copying a reference.
- Problem: only a limited number of references can be made.
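
Weighted reference counting can be sketched as follows. The sketch assumes the usual formulation: the skeleton starts with equal total and partial weight, creating or copying a reference hands over half of the giver's partial weight, and deleting a reference sends only a decrement carrying the proxy's weight. The class names and the initial weight of 128 are illustrative.

```python
class Skeleton:
    def __init__(self, weight=128):
        self.total = weight      # total weight handed out, including what is kept here
        self.partial = weight    # weight still held by the skeleton itself

    def new_reference(self):
        """Create a proxy by giving it half of the skeleton's partial weight."""
        if self.partial < 2:
            raise RuntimeError("weight exhausted: no more references can be created")
        half = self.partial // 2
        self.partial -= half
        return Proxy(self, half)

    def on_decrement(self, weight):
        """Handle the only message type needed: a decrement with the removed weight."""
        self.total -= weight

    def has_no_remote_references(self):
        return self.total == self.partial


class Proxy:
    def __init__(self, skeleton, weight):
        self.skeleton = skeleton
        self.weight = weight

    def copy(self):
        """Copy the reference locally by splitting this proxy's weight in half."""
        if self.weight < 2:
            raise RuntimeError("weight exhausted: an indirection would be needed")
        half = self.weight // 2
        self.weight -= half
        return Proxy(self.skeleton, half)

    def delete(self):
        self.skeleton.on_decrement(self.weight)
        self.weight = 0
```

Because weights are halved on every copy, a proxy with weight 1 cannot be copied again, which is the limitation noted on the slide.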

Advanced Reference Counting (3)
- Figure: creating an indirection when the partial weight of a reference has reached 1.
- Use of indirection: this is similar to forwarding pointers and suffers from the same problems.
  - Chains degrade performance.
  - Chains are more susceptible to failure.

Advanced Reference Counting (4)
- Generation reference counting: each remote reference is created as a (proxy, skeleton) pair (p, s).
- Each proxy has a generation number, set to zero when the proxy is first created, and a copy counter.
- When a reference is copied and a new proxy is made, the new proxy gets a generation number one higher than that of the proxy it was copied from, and the copy counter of the original proxy is incremented.
- The skeleton maintains a table G in which G[i] denotes the number of outstanding copies of generation i.
- When a proxy is removed, its copy counter (n) and generation number (k) are sent to the skeleton.
- The skeleton adjusts G by decrementing G[k] by one and incrementing G[k+1] by n.
- Advantage: copying (duplicating) a reference is handled without the need to contact the skeleton at proxy creation time.
- Figure: creating and copying a remote reference in generation reference counting.
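
The bookkeeping described above is small enough to sketch directly. The table `G`, the generation number, and the copy counter follow the slide; the class and method names are illustrative, and the rule that the object is unreferenced once every entry of `G` is zero is how this sketch interprets the table.

```python
from collections import defaultdict


class Skeleton:
    def __init__(self):
        self.G = defaultdict(int)   # G[i]: outstanding copies of generation i
        self.G[0] = 1               # accounts for the first proxy (generation 0)

    def on_delete(self, generation, copies):
        """Process a delete message carrying generation number k and copy counter n."""
        self.G[generation] -= 1
        self.G[generation + 1] += copies

    def unreferenced(self):
        return all(count == 0 for count in self.G.values())


class Proxy:
    def __init__(self, skeleton, generation=0):
        self.skeleton = skeleton
        self.generation = generation
        self.copies = 0             # how many times this proxy has been copied

    def copy(self):
        """Copy the reference without sending any message to the skeleton."""
        self.copies += 1
        return Proxy(self.skeleton, self.generation + 1)

    def delete(self):
        self.skeleton.on_delete(self.generation, self.copies)
```

Entries of `G` can be negative for a while when a copy is deleted before the proxy it was copied from; they balance out once the older proxy reports its copy counter.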

Reference Listing (1)
- The skeleton keeps track of the proxies: instead of counting them, it maintains an explicit list of references.
- Adding a proxy that is already on the list, or removing one that is no longer on it, has no effect.
- Idempotent operations: repeatable without affecting the end result.
- Increment and decrement operations are clearly not idempotent.
- Used in Java RMI, based on (Birrell et al, 1993).

Reference Listing (2)
- Advantages:
  - Does not require reliable communication.
  - Duplicate messages need not be detected.
  - Only insertions and deletions need to be acknowledged.
  - It is easier to keep the system consistent in case of process failures.
- Drawback: scales badly.
- Solution: leasing.
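
The contrast between counting and listing shows up as soon as a message is delivered twice. The sketch below models a skeleton that counts and one that keeps a set of proxy identifiers; the class names and proxy identifiers are illustrative.

```python
class CountingSkeleton:
    """Reference counting: increments are not idempotent."""

    def __init__(self):
        self.count = 0

    def on_increment(self, proxy_id):
        self.count += 1             # a duplicated message is counted twice


class ListingSkeleton:
    """Reference listing: adding and removing are idempotent."""

    def __init__(self):
        self.proxies = set()

    def on_add(self, proxy_id):
        self.proxies.add(proxy_id)        # adding a known proxy changes nothing

    def on_remove(self, proxy_id):
        self.proxies.discard(proxy_id)    # removing an unknown proxy changes nothing


counting, listing = CountingSkeleton(), ListingSkeleton()
for message in ["proxy-1", "proxy-1"]:    # the same message delivered twice
    counting.on_increment(message)
    listing.on_add(message)
```

After the duplicated delivery the counter is off by one, while the list still holds exactly one proxy.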

Identifying Unreachable Entities
- Trace-based garbage collection has scalability problems.
- Naive tracing: mark-and-sweep collectors, using white, grey, and black marks.
- Drawbacks:
  - The reachability graph needs to remain the same during both phases.
  - No process can run while the garbage collector is running.
- Naive tracing in distributed systems: the Emerald system (Jul et al, 1998).
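
A single-process mark-and-sweep pass with white, grey, and black marks can be sketched as below. It assumes the reachability graph does not change while the function runs, which is exactly the drawback listed above. The function name and the graph representation are illustrative.

```python
def mark_and_sweep(objects, roots):
    """Return the set of unreachable object ids.

    `objects` maps each object id to the list of ids it references;
    `roots` lists the ids that are reachable by definition.
    """
    white = set(objects)    # not yet reached
    grey = []               # reached, references not yet scanned
    black = set()           # reached and fully scanned

    # Mark phase: start from the roots and scan outgoing references.
    for root in roots:
        if root in white:
            white.discard(root)
            grey.append(root)
    while grey:
        obj = grey.pop()
        for ref in objects[obj]:
            if ref in white:
                white.discard(ref)
                grey.append(ref)
        black.add(obj)

    # Sweep phase: everything still white is garbage.
    return white
```

Unlike reference counting, tracing also finds cycles of objects that reference only each other.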

Tracing in Groups (1)
- Figure: initial marking of skeletons.
- Proposed by (Lang et al, 1992); works only on (proxy, skeleton) sets.
- A distributed garbage collector that addresses scalability.
- The basic idea is to let low-level groups collect their garbage and leave the analysis of intergroup references to higher-level groups.

Tracing in Groups (2) After local propagation in each process.

Tracing in Groups (3) Final marking. Garbage reclamation is actually performed by the local garbage collector.

Conclusion
- Naming, the organization of names, and name resolution are key issues in any distributed system.
- Locating entities is an open research issue. There are a few methods, such as forwarding pointers, hierarchical approaches, home-based approaches, and pointer caches, but each has its own shortcomings.
- Reference counting, advanced reference counting, and reference listing are a few methods that can be used to deal with unreferenced objects.

All is well that ends well! Thank you all. Questions / Comments?