Distributed Systems Topic 4: Naming, Trading, and Peer-to-Peer Systems

Dr. Michael R. Lyu
Computer Science & Engineering Department
The Chinese University

In this topic we are going to look at the question: how can we find distributed components in a location-transparent way? We are going to review naming and trading, two well-known techniques. Naming is concerned with defining external names for components so that a name server can identify the components by means of their external names. This approach closely resembles identifying participants in the telephone network by means of the white pages: given the name of a person or company, one looks up a telephone number (an identifier for the network participant). For this to work, network participants have to register themselves in the directory. Trading is concerned with locating components based on the services they offer. This approach closely resembles finding company phone numbers in the yellow pages, where the phone numbers of companies are organized by the services the companies offer. For this to work, companies have to register themselves with the yellow pages publisher. Peer-to-peer systems represent a paradigm for the construction of distributed systems and applications in which data and computational resources are contributed by many hosts on the Internet, all of which participate in the provision of a uniform service.

Outline
1. Naming
2. Trading
3. P2P Systems
4. Summary
In this lecture, we first define naming service, and review different name services, such as directories in the network file system, the internet domain service and the X.500 directory service. We are then going to discuss the distributed system name service in more detail. We shall then discuss trading as a technique to register and identify components by the set of services they offer. In the third part, we are going to examine P2P systems. As usual, a summary indicates what you should remember from this lecture and the source of textbook chapters.

1 The Need: Location Transparency
Avoid using physical locations for locating components!
Naming: locating components by external names (similar to white pages).
Trading: locating components by service characteristics (similar to yellow pages).
The location transparency principle suggests keeping the physical location of components transparent both to the component itself and to all clients of the component. Only then can the component be migrated to other servers without having to change the component or its clients. This means that we have to avoid using physical locations for identifying components. In a typical distributed system framework, e.g., CORBA, location transparency is already supported by the fact that objects are identified by object references, which are independent of the object's location. A client that possesses a reference to a server object can invoke operations on that object. Naming supports the definition of external names for components; clients that know a component's external name can then locate it. Trading supports the registration of a component's service characteristics with a trader; clients can then ask the trader to identify those components that provide the type of service the client is interested in.

1 Naming
1. Names, Addresses and Other Attributes
2. Naming Service Examples
- Network File System (NFS)
- Internet Domain Service
3. Common Characteristics
4. Limitations
Names facilitate communication and resource sharing. Names are not the only useful means of identification: descriptive attributes are another. Examples of names:
- Physical network addresses and logical internetwork addresses
- Port, process and group identifiers
- Textual, human-readable service names
- Resource identifiers
- Files
For the discussion of naming, we are going to review a number of name servers that you have almost certainly used already: the network file system and the Internet domain service. We are then going to distill the common characteristics of these name servers. Knowledge of these characteristics will prove very useful for the design and assessment of any naming service. The assessment of the distributed system Naming Service reveals a number of limitations, which motivate the need for the trading service as another service for identifying components.

1.1 Names and Their Attributes
Names are used to refer to a wide variety of resources (computers, services, remote files, or users). Any process that requires access to a specific resource must possess a name or an identifier for it. Names are not the only useful means of identification: descriptive attributes are another. A name is resolved when it is translated into data about the named resource or object, often in order to invoke an action upon it. The association between a name and an object is called a binding. In general, names are bound to attributes of the named objects, rather than to the implementation of the objects themselves. An attribute is the value of a property associated with an object. Names are also needed to refer to entities in a distributed system that are beyond the scope of any single service. These entities include users (with proper names, login names, user identifiers and electronic mail addresses), computers (with host names) and services themselves (such as a file service or printer service).

1.2 Name Services
Name service: stores a collection of one or more naming contexts, which are sets of bindings between textual names and attributes for objects such as users, computers, services and remote objects.
Name service operations: bind a name and resolve a name.
Name management is separated from other services for openness:
- Unification: using the same naming scheme.
- Integration: sharing name resources in different administrative domains.
Name space: the collection of all valid names recognized by a name service.
Hierarchic name space: names are hierarchically organized so that each part of a name is resolved relative to a separate context.
Name service examples: Network File System (NFS) directories, Internet Domain Name Service, and X.500 Directory Service.
A name service stores a collection of one or more naming contexts: sets of bindings between textual names and attributes for objects such as users, computers, services and remote objects. The major operation that a name service supports is to resolve a name, that is, to look up attributes from a given name. Name management is separated from other services largely because of the openness of distributed systems, which brings the following motivations:
Unification: It is often convenient for resources managed by different services to use the same naming scheme. URIs are a good example of this.
Integration: It is not always possible to predict the scope of sharing in a distributed system. It may become necessary to share, and therefore name, resources that were created in different administrative domains. Without a common name service, the administrative domains may use entirely different naming conventions.
A name space is the collection of all valid names recognized by a particular service. For a name to be valid means that the service will attempt to look it up, even though that name may prove not to correspond to any object, that is, to be unbound.
Names may have an internal structure that represents their position in a hierarchic name space, or they may be chosen from a flat set of numeric or symbolic identifiers. The most important advantage of hierarchic name spaces is that each part of a name is resolved relative to a separate context, so the same name may be used with different meanings in different contexts. Three examples are given: Network File System (NFS) directories, Internet Domain Name Service, and X.500 Directory Service.
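To make "each part of a name is resolved relative to a separate context" concrete, here is a minimal sketch of a hierarchic name space with the two operations bind and resolve. The class `NamingContext`, the '/' separator and the bound values ("laser-1", "inkjet-7") are hypothetical illustrations, not the interface of NFS, DNS or X.500.

```python
from typing import Dict, Union


class NamingContext:
    """A naming context: a set of bindings from simple names to objects or sub-contexts."""

    def __init__(self) -> None:
        self.bindings: Dict[str, Union["NamingContext", object]] = {}

    def bind(self, path: str, obj: object) -> None:
        """Bind a '/'-separated compound name, creating intermediate contexts as needed."""
        *parents, last = path.strip("/").split("/")
        ctx = self
        for part in parents:
            child = ctx.bindings.setdefault(part, NamingContext())
            if not isinstance(child, NamingContext):
                raise ValueError(f"{part!r} in {path!r} is not a naming context")
            ctx = child
        ctx.bindings[last] = obj

    def resolve(self, path: str) -> object:
        """Resolve each component of the name relative to the context reached so far."""
        current: object = self
        for part in path.strip("/").split("/"):
            if not isinstance(current, NamingContext) or part not in current.bindings:
                raise KeyError(f"name {path!r} is unbound")
            current = current.bindings[part]
        return current


# Usage: the simple name "printer" means different things in different contexts.
root = NamingContext()
root.bind("cse/printer", "laser-1")
root.bind("ee/printer", "inkjet-7")
print(root.resolve("cse/printer"))  # laser-1
print(root.resolve("ee/printer"))   # inkjet-7
```

Resolving a valid but unbound name such as "cse/scanner" raises `KeyError`, mirroring the distinction above between a name being valid and being bound.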

1.2.1 NFS Directories
[Figure: a subset of the departmental NFS directory tree; labels include uac, lec, lyu, cprj, king, teaching, course_web, www, usr, sbin, bin, inetd, ls, rlogin.]
A naming service that you have already used is part of the network file system (NFS). The figure presents a subset of the departmental NFS. NFS is based on directories. Directories include a number of name bindings, each of which maps a name to a file or a subdirectory. Names are unique within the scope of the directory, and they are used to identify the file or the directory they refer to. Names can be composed into path names by delimiting the name components with a '/'. These composite path names identify files or directories in exactly the same way as simple names. Every file or directory of the file system must have at least one entry in some directory. If the last binding is removed, the file or the directory ceases to exist. A file or directory can have more than one name. An example is the directory that is shared by users 'lyu' and 'king'. In lyu's home directory that directory has the name 'course_web', while user king has given it the name 'www'. We call these names "aliases". The naming scheme for files in NFS supports location transparency because files can be identified using path names rather than physical addresses (such as hard-disk drive names like C:) or the IP address of the server machine to which a partition of the file system is connected. Location is only visible to administrators, who have to mount partitions of the file system from remote servers. While doing so, they attach these remote partitions to local directories.
Example 1: /uac/lec/lyu/course_web
Example 2: /uac/cprj/king/www
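The rules above (names unique per directory, composite path names, aliases, and a node ceasing to exist when its last binding is removed) can be sketched as a small in-memory model. This is a simplified illustration under stated assumptions: `Node`, `link`, `mkdir`, `lookup` and `unlink` are hypothetical helpers, and real NFS servers implement directories and link counts quite differently.

```python
from typing import Dict


class Node:
    """A file or directory; `links` counts the directory entries that name it."""

    def __init__(self, is_dir: bool) -> None:
        self.is_dir = is_dir
        self.entries: Dict[str, "Node"] = {}  # name bindings (directories only)
        self.links = 0


def link(directory: Node, name: str, target: Node) -> None:
    """Bind `name` in `directory` to `target`; a second binding creates an alias."""
    if name in directory.entries:
        raise FileExistsError(name)  # names are unique within a directory
    directory.entries[name] = target
    target.links += 1


def mkdir(parent: Node, name: str) -> Node:
    child = Node(is_dir=True)
    link(parent, name, child)
    return child


def lookup(root: Node, path: str) -> Node:
    """Resolve a '/'-separated path name one component at a time."""
    node = root
    for name in path.strip("/").split("/"):
        if not node.is_dir or name not in node.entries:
            raise FileNotFoundError(path)
        node = node.entries[name]
    return node


def unlink(directory: Node, name: str) -> bool:
    """Remove a binding; return True if it was the last one (the node ceases to exist)."""
    target = directory.entries.pop(name)
    target.links -= 1
    return target.links == 0


# Usage: rebuild the shared directory from the two example paths.
root = Node(is_dir=True)
uac = mkdir(root, "uac")
lyu = mkdir(mkdir(uac, "lec"), "lyu")
king = mkdir(mkdir(uac, "cprj"), "king")
shared = mkdir(lyu, "course_web")
link(king, "www", shared)  # alias for the same directory

same = lookup(root, "/uac/lec/lyu/course_web") is lookup(root, "/uac/cprj/king/www")
first_removed = unlink(lyu, "course_web")  # False: still reachable as .../king/www
last_removed = unlink(king, "www")         # True: no bindings left
```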

1.2.2 Internet Domain Name Service
The objects named by DNS are primarily computers, whose IP addresses are stored as attributes, and naming domains are DNS domains. In DNS, any name can be resolved by any client, due to:
- Hierarchical partitioning of the name database
- Replication of naming data
- Caching
The Internet DNS name space is partitioned both organizationally and according to geography.
The original Internet naming scheme held all host names and addresses in a single central master file, which was downloaded by FTP to all computers that required them. This scheme suffered from many shortcomings. The objects named by the DNS that replaced it are primarily computers, for which mainly IP addresses are stored as attributes, and what we have referred to as naming domains are simply domains in the DNS. The top-level organizational domains (also called generic domains) in use today across the Internet are:
com – Commercial organizations
edu – Universities and other educational institutes
gov – US governmental agencies
mil – US military organizations
net – Major network support centres
org – Organizations not mentioned above
int – International organizations
In addition, every country or special region has its own domain, for example:
uk – United Kingdom
hk – Hong Kong
cn – China
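Because the DNS name space is partitioned hierarchically, every name implies a chain of enclosing domains, from the top-level domain down to the name itself; resolution from the root walks this chain. The sketch below computes that chain and classifies the top-level domain using the lists above. The helper names are hypothetical, and treating every two-letter top-level label as a country or region code is a simplifying heuristic.

```python
from typing import List

# Generic top-level domains listed above; two-letter labels are treated as country/region codes.
GENERIC_DOMAINS = {"com", "edu", "gov", "mil", "net", "org", "int"}


def domain_chain(name: str) -> List[str]:
    """Return the domains enclosing a DNS name, from the top-level domain down."""
    labels = name.rstrip(".").lower().split(".")
    return [".".join(labels[i:]) for i in range(len(labels) - 1, -1, -1)]


def top_level_kind(name: str) -> str:
    tld = domain_chain(name)[0]
    if tld in GENERIC_DOMAINS:
        return "generic"
    if len(tld) == 2:
        return "country/region"
    return "other"


print(domain_chain("cse.cuhk.edu.hk"))
# ['hk', 'edu.hk', 'cuhk.edu.hk', 'cse.cuhk.edu.hk']
print(top_level_kind("www.city.ac.uk"))  # country/region
```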

1.2.2 Internet Domain Name Service
[Figure: part of the DNS name server hierarchy. Name servers, with the domains they serve in parentheses: a.root-servers.net (root); ns1.cuhk.edu.hk (edu.hk); beryl.cuhk.edu.hk (cse.cuhk.edu.hk); ns1.cs.ucl.ac.uk (ac.uk); nameserv.city.ac.uk (city.ac.uk). Other domains shown: se.cuhk.edu.hk, ic.ac.uk, qmw.ac.uk, *.cse.cuhk.edu.hk, *.city.ac.uk.]
Another global name service that has become very prominent is the Internet Domain Name Service (DNS). The root servers of DNS, such as a.root-servers.net, hold entries for first-level domain names as well as for several further levels of domain. In the figure, name servers are labeled by name, with the domains they serve in parentheses. Each DNS node maintains a table of domains for which it knows the name servers. The root node, for instance, would have entries identifying the domains '.hk' and '.edu.hk', representing Hong Kong and all academic sites in Hong Kong. The name server for academic sites in Hong Kong is operated by the Chinese University, and it knows the name servers serving the local area networks of the universities in Hong Kong. As such, it has an entry for the CUHK server operated by the Computing Services Center, and that name server knows all machines in the university. A lookup, issued from a machine on CSE's local network, for a machine in the 'city.ac.uk' network would first be handled by the name server for cse.cuhk.edu.hk. If that name server could not resolve the binding, it would ask the next higher-level name server, and so on until the request reaches the root. The root would know the name server for the domain 'ac.uk' (i.e., ns1.cs.ucl.ac.uk) and ask that name server for the name server of the subdomain 'city.ac.uk' (i.e., nameserv.city.ac.uk). The power of DNS lies in the fact that it performs replication and caching of name bindings. Thus, once a name binding is found, it can be cached on all name servers that were involved in the search.
In most cases, therefore, the root name server would not be involved, since local name servers already know which name servers to talk to in order to resolve a name binding.

1.2.2 Composed Naming Domains from URL
[Figure: a URL is resolved step by step. A DNS lookup yields a resource ID (IP number, port number 8888, pathname WebExamples/earth.html); the IP number is mapped to a network address such as 2:60:8c:2:b0:5a; a socket connects to the web server, which resolves the pathname to a file.]
This figure shows the composed naming domains used to access a resource from a URL. The domain name portion of a URL is resolved first via DNS into an IP address and then via the Address Resolution Protocol (ARP) to an Ethernet address for the web server. The last part of the URL is resolved by the file system on the web server to locate the relevant file.
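Each name service in that chain is responsible for a different part of the URL. As a sketch, Python's standard `urllib.parse.urlsplit` separates those parts; the host name www.example.org is a hypothetical stand-in, while the port and path follow the figure. The DNS step itself could be performed with `socket.gethostbyname(host)`, and the ARP step is handled by the operating system; neither is called here so that the example runs without a network.

```python
from urllib.parse import urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


def naming_steps(url: str) -> dict:
    """Split a URL into the parts that different name services resolve."""
    parts = urlsplit(url)
    return {
        "dns_name": parts.hostname,  # resolved by DNS into an IP address
        "port": parts.port or DEFAULT_PORTS.get(parts.scheme),  # selects the server process
        "path": parts.path,  # resolved by the web server's file system
    }


print(naming_steps("http://www.example.org:8888/WebExamples/earth.html"))
# {'dns_name': 'www.example.org', 'port': 8888, 'path': '/WebExamples/earth.html'}
```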

1.2.2 Uniform Resource Identifier (URI)
A URL is a particular type of URI for identifying web resources. URLs scale to an unlimited set of web resources and allow efficient access. URLs are essentially addresses of web resources and therefore do not support mobility transparency. A Uniform Resource Name (URN) allows web resources to move around. A Uniform Resource Characteristic (URC), a subset of URN, describes attributes of web resources. URLs are the principal means of identifying web resources. They are in fact a particular type of Uniform Resource Identifier (URI). URLs have the important properties that they scale to an unlimited set of web resources and that they are efficient handles for resources. It is easy to access a resource from the information in its URL (a DNS computer name and a pathname on that machine). Because URLs are essentially addresses of web resources, they suffer from the disadvantage that if a resource is deleted or moved, there will in general be dangling links that contain the old URL. In other words, URLs do not support mobility transparency. The other main type of URI is the Uniform Resource Name (URN). URNs are intended to solve the dangling link problem and to provide richer modes of finding resources on the Web. A web resource's URN persists even though the resource may move. The owner of the resource registers its name, along with its current URL, with a URN lookup service that will provide the URL when given the URN. The owner registers the new URL if the resource moves. A URN is of the form urn:nameSpace:nameSpace-specificName. Uniform Resource Characteristics (URCs, also known as Uniform Resource Citations) are a subset of URNs for the description of a web resource, consisting of attributes of the resource. URCs are used both for describing web resources and for looking up web resources that match their attribute specification.
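The URN mechanism described above (the owner registers a URN with its current URL and re-registers when the resource moves) can be sketched as a tiny lookup table. The class `URNLookupService`, its methods, the URN and both URLs are hypothetical; real URN resolution infrastructures are considerably more elaborate.

```python
from typing import Dict, Tuple


class URNLookupService:
    """Hypothetical URN lookup service: maps persistent URNs to current URLs."""

    def __init__(self) -> None:
        self._current_url: Dict[str, str] = {}

    @staticmethod
    def parse(urn: str) -> Tuple[str, str]:
        """Split 'urn:nameSpace:nameSpace-specificName' into its two name parts."""
        parts = urn.split(":", 2)
        if len(parts) != 3 or parts[0].lower() != "urn" or not parts[1] or not parts[2]:
            raise ValueError(f"not a URN: {urn!r}")
        return parts[1], parts[2]

    def register(self, urn: str, url: str) -> None:
        """Called by the resource owner initially and again whenever the resource moves."""
        self.parse(urn)
        self._current_url[urn] = url

    def lookup(self, urn: str) -> str:
        return self._current_url[urn]


service = URNLookupService()
urn = "urn:example:lecture-notes-topic4"  # hypothetical URN
service.register(urn, "http://old.example.org/notes.html")
service.register(urn, "http://new.example.org/notes.html")  # the resource moved
print(service.lookup(urn))  # http://new.example.org/notes.html
```

Links that store the URN rather than the URL keep working after the move, which is exactly the dangling-link problem URNs are meant to solve.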

1.2.2 Navigation
Navigation is the process of locating naming data from among more than one name server in order to resolve a name.
- Iterative navigation: a client presents the name to the local name server, which attempts to resolve it.
- Multicast navigation: a client multicasts the name to be resolved to the group of name servers.
- Server-controlled navigation: a name server coordinates the resolution of the name and passes the result back to the user agent. Non-recursive vs. recursive.
Any name service, such as DNS, that stores a very large database and is used by a large population will not store all of its naming information on a single server computer, so as to avoid a single point of failure. Replication is used to achieve both high availability and performance. Data is generally partitioned among servers according to its domain. The partitioning of data implies that the local name server cannot answer all enquiries without the help of other name servers. The process of locating naming data from among more than one name server in order to resolve a name is called navigation. The client name resolution software carries out navigation on behalf of the client. It communicates with name servers as necessary to resolve a name. Navigation schemes include iterative navigation, which is controlled by the client, and server-controlled navigation, which is controlled by a dedicated server. Multicast communication can be used in these navigation schemes. Server-controlled navigation is further divided into non-recursive and recursive variants.

1.2.2 Iterative Navigation
[Figure: iterative navigation. A client iteratively contacts name servers NS1, NS2 and NS3 (steps 1 to 3) in order to resolve a name.]
DNS supports the model known as iterative navigation. To resolve a name, a client presents the name to the local name server, which attempts to resolve it. If the local name server has the name, it returns the result immediately. If it does not, it suggests another server that will be able to help. Resolution proceeds at the new server, with further navigation as necessary until the name is located or is discovered to be unbound. In multicast navigation, a client multicasts the name to be resolved and the required object type to the group of name servers. Only the server that holds the named attributes responds to the request. Multicast navigation is also used in discovery services.
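The iterative model can be sketched as follows: each server either answers or refers the client to a server for a longer matching domain suffix, and the client itself contacts each server in turn, as in the NS1 to NS3 figure. The `NameServer` class, the server names and the address 192.0.2.10 (a documentation address) are hypothetical; this is not the DNS wire protocol.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class NameServer:
    bindings: Dict[str, str] = field(default_factory=dict)   # names this server holds
    referrals: Dict[str, str] = field(default_factory=dict)  # domain suffix -> better server

    def query(self, name: str) -> Tuple[str, Optional[str]]:
        """Answer if possible; otherwise suggest another server (a referral)."""
        if name in self.bindings:
            return "answer", self.bindings[name]
        matches = [d for d in self.referrals if name == d or name.endswith("." + d)]
        if matches:
            return "referral", self.referrals[max(matches, key=len)]
        return "unbound", None


def resolve_iteratively(name: str, servers: Dict[str, NameServer],
                        start: str) -> Tuple[str, List[str]]:
    """Client-controlled navigation: the client itself contacts each server in turn."""
    server, visited = start, []
    while server not in visited:  # guard against referral loops
        visited.append(server)
        kind, value = servers[server].query(name)
        if kind == "answer":
            return value, visited
        if kind == "unbound":
            break
        server = value
    raise KeyError(f"{name!r} could not be resolved")


servers = {
    "NS1": NameServer(referrals={"uk": "NS2"}),
    "NS2": NameServer(referrals={"city.ac.uk": "NS3"}),
    "NS3": NameServer(bindings={"www.city.ac.uk": "192.0.2.10"}),
}
address, path = resolve_iteratively("www.city.ac.uk", servers, "NS1")
print(address, path)  # 192.0.2.10 ['NS1', 'NS2', 'NS3']
```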

1.2.2 Server-Controlled Navigation
[Figure: non-recursive and recursive server-controlled navigation. A name server NS1 communicates with other name servers (NS2, NS3) on behalf of a client.]
Under non-recursive server-controlled navigation, any name server may be chosen by the client. This server communicates by multicast or iteratively with its peers, as though it were a client. Under recursive server-controlled navigation, the client once more contacts a single server. If this server does not store the name, it contacts a peer storing a (larger) prefix of the name, which in turn attempts to resolve it. This procedure continues recursively until the name is resolved. If a name service spans distinct administrative domains, the clients executing in one administrative domain may be prohibited from accessing name servers belonging to another such domain. Moreover, even name servers may be prohibited from discovering the disposition of naming data across name servers in another administrative domain. In that case, both client-controlled and non-recursive server-controlled navigation are inappropriate, and recursive server-controlled navigation must be used.
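For contrast with the iterative sketch, here is recursive server-controlled navigation: the client sends one request to NS1, and each server forwards the request to a peer itself. In DNS-style names, which are read right to left, the "(larger) prefix" of the text corresponds to a longer domain suffix. `RecursiveServer` and its delegation table are hypothetical, and the sketch does not guard against delegation cycles.

```python
from typing import Dict, List, Optional


class RecursiveServer:
    def __init__(self, name: str, bindings: Optional[Dict[str, str]] = None,
                 delegations: Optional[Dict[str, "RecursiveServer"]] = None) -> None:
        self.name = name
        self.bindings = bindings or {}
        self.delegations = delegations or {}  # domain suffix -> peer server

    def resolve(self, name: str, trace: List[str]) -> str:
        """Resolve locally or ask a peer on the client's behalf; the client sees one reply."""
        trace.append(self.name)
        if name in self.bindings:
            return self.bindings[name]
        matches = [d for d in self.delegations if name == d or name.endswith("." + d)]
        if not matches:
            raise KeyError(f"{name!r} is unbound")
        peer = self.delegations[max(matches, key=len)]
        return peer.resolve(name, trace)  # the server, not the client, contacts the peer


ns3 = RecursiveServer("NS3", bindings={"www.city.ac.uk": "192.0.2.10"})
ns2 = RecursiveServer("NS2", delegations={"city.ac.uk": ns3})
ns1 = RecursiveServer("NS1", delegations={"uk": ns2})

trace: List[str] = []
address = ns1.resolve("www.city.ac.uk", trace)  # the client talks only to NS1
print(address, trace)  # 192.0.2.10 ['NS1', 'NS2', 'NS3']
```

Because the client only ever contacts NS1, it never needs access to name servers in other administrative domains, which is why this variant suits the cross-domain situation described above.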

Name Caching
Caching is a general computing technique for improving the performance of data access. Name caching keeps the results of previous name resolutions in memory. Besides improving the name service's performance, caching helps to maintain the availability of both the name service and other services. Caching is widely applicable because naming data change relatively rarely.
In DNS and other name services, client name resolution software and servers maintain a cache of the results of previous name resolutions. When a client requests a name lookup, the name resolution software consults its cache. If it holds a recent result from a previous lookup for the name, it returns it to the client; otherwise, it sets about finding it from a server. That server, in turn, may return data cached from other servers. Caching is key to a name service's performance and helps maintain the availability of both the name service and other services despite name server crashes. Its role in improving response times by saving communication with name servers is clear. Caching can be used to eliminate high-level name servers, the root server in particular, from the navigation path, allowing resolution to proceed despite some server failures. Caching by client name resolvers is widely applied in name services and is particularly successful because naming data change relatively rarely.
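A client-side name cache that returns "a recent result from a previous lookup" needs some notion of how recent is recent enough. The sketch below assumes a single fixed time-to-live for all entries (DNS instead attaches a TTL to each record); `CachingResolver` and the stand-in lookup function are hypothetical. A fake clock is injected so that expiry can be demonstrated without waiting.

```python
import time
from typing import Callable, Dict, Tuple


class CachingResolver:
    """Client-side name cache; entries expire after `ttl` seconds (assumed policy)."""

    def __init__(self, lookup: Callable[[str], str], ttl: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._lookup = lookup  # contacts the name servers on a cache miss
        self._ttl = ttl
        self._clock = clock
        self._cache: Dict[str, Tuple[str, float]] = {}  # name -> (value, expiry time)
        self.server_queries = 0

    def resolve(self, name: str) -> str:
        now = self._clock()
        entry = self._cache.get(name)
        if entry is not None and entry[1] > now:
            return entry[0]  # recent result: no server is contacted
        self.server_queries += 1
        value = self._lookup(name)
        self._cache[name] = (value, now + self._ttl)
        return value


# Usage with a fake clock.
fake_now = [0.0]
resolver = CachingResolver(lambda n: "192.0.2.10", ttl=300, clock=lambda: fake_now[0])
resolver.resolve("www.city.ac.uk")
resolver.resolve("www.city.ac.uk")  # served from the cache
fake_now[0] = 400.0                 # the cached entry has now expired
resolver.resolve("www.city.ac.uk")
print(resolver.server_queries)      # 2
```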

1.3 Common Characteristics
Concerns to be addressed by a naming service: names and name spaces. A naming service provides operations for defining names of components (bind) and looking up components by name (resolve). Persistence of bindings. All the naming services we looked at include the concept of external names that can be defined for distributed components, be they file names, names of organizations or Internet domain names. All names are defined within the scope of hierarchically organized name spaces. These are directories in NFS, the X.500 directory tree, or name servers in the Internet. All naming services provide two fundamental operations to define and look up names. The operation that defines a new name is usually referred to as 'bind', while the operation that searches for a component is commonly denoted as 'resolve'. Moreover, the name bindings are stored persistently by the name servers. Directory and file names are stored as part of the file system on disks. Directory entries in X.500 are stored persistently by the respective servers, and the Internet domain name servers store name bindings persistently in configuration databases.
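The three common characteristics (bind, resolve, persistent bindings) fit in a few lines. This minimal sketch keeps bindings in a JSON file so that they survive a server restart; `PersistentNameServer`, the file name and the bound name are hypothetical, and real name servers use far more robust storage. Writing to a temporary file and swapping it in with `os.replace` is an assumed design choice so that a crash mid-write does not leave a half-written bindings file.

```python
import json
import os
import tempfile
from typing import Dict


class PersistentNameServer:
    """bind/resolve with bindings kept in a JSON file so they survive a restart."""

    def __init__(self, path: str) -> None:
        self._path = path
        self._bindings: Dict[str, str] = {}
        if os.path.exists(path):
            with open(path, encoding="utf-8") as f:
                self._bindings = json.load(f)

    def bind(self, name: str, ref: str) -> None:
        if name in self._bindings:
            raise ValueError(f"{name!r} is already bound")
        self._bindings[name] = ref
        tmp = self._path + ".tmp"
        with open(tmp, "w", encoding="utf-8") as f:
            json.dump(self._bindings, f)
        os.replace(tmp, self._path)  # swap in the new file in a single step

    def resolve(self, name: str) -> str:
        return self._bindings[name]


with tempfile.TemporaryDirectory() as d:
    db = os.path.join(d, "bindings.json")
    PersistentNameServer(db).bind("ticket-service", "host-a:9000")
    restarted = PersistentNameServer(db)  # simulates a server restart
    value = restarted.resolve("ticket-service")
print(value)  # host-a:9000
```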

1.3 Common Characteristics
Qualities of service:
- Distribution of name spaces
- Performance profile
- Caching
- Replication
- Transaction properties of concurrent naming operations
The implementations of the different naming services also have to choose from a common portfolio of qualities of service. The naming services can be regarded as distributed systems themselves, and as such they are subject to the various properties and principles that we discussed in the first week's Introduction session. In order to avoid slowing down applications, the naming services have to achieve a decent performance profile. This is usually done by trading the performance of the bind and resolve operations against each other. In some naming services, such as the Internet DNS or the X.500 service, new components are given names relatively rarely. The performance of bind is, therefore, less important and is often traded for an improvement in lookup performance. In these cases caching and replication may be used to speed up the lookup. Finally, it has to be decided whether the lookup and bind operations have all or some of the transaction properties, so that they are atomic and/or isolated against concurrent naming operations.

1.4 Limitations
Limitation of naming:
- The client always has to identify the server by name.
- This is inappropriate if the client just wants to use a service at a certain quality but does not know who the service provider is.
- Examples: automatic cinema ticketing, video on demand, electronic commerce.
With naming, we have now seen one technique by which client components can identify server components. Naming is useful if the client component knows exactly from which server component it wants to invoke a certain service. Naming, however, is not appropriate if a client knows which type of service it wants to use but does not know (and does not need to know) the server component. There are various examples of these situations. A distributed cinema ticketing system, with which clients specify which movie they want to see with which quality of service (area of cinema, size of screen, Dolby Surround, etc.), while the particular cinema or even the seats are not necessarily important. A video on demand server, where clients specify the film they want to rent but do not care from which video provider they obtain it. An electronic commerce system that supports trading in shares, where clients specify which shares to sell or buy but do not care which stock exchange is selected for the deal. In all these situations clients do not need to know the server component that provides the service, but only negotiate with a trader component that operates on the clients' behalf. We now look at a trading service designed for the situations sketched above.

2 Trading 1. Characteristics 2. Example 3. A Typical Trading Service
To discuss trading, we first review the common characteristics of trading systems. We then explain these characteristics using an example, a video on demand server. Finally, we take a closer look at the trading approach supported by a typical distributed system.

2.1 Trading Characteristics
Trader operates as a broker between client and server. It enables the client to change perspective from 'who?' to 'what?'. Selection between multiple service providers. Similar ideas in: yellow pages, insurance brokers, stock brokerage. Trading is another primitive for locating components in a location-transparent way. The principal idea of a trading service is to have a mediator that acts as a broker between clients and servers. This broker enables clients to change their perspective when locating a server component: from locating individual server components (who is the server that can help me?) to the set of services the client is interested in (which server offers the services that I need?). The broker then selects a suitable service provider on behalf of the client. The idea is not restricted to distributed systems but is also found in other (non-computerized) systems. A good example is the yellow pages. Service providers, such as plumbers or lawyers, register with the publisher of the yellow pages. Clients who wish to use a particular service can then look up the yellow pages to find a provider that offers the required service. Other examples similar to the yellow pages are stock brokers and insurance brokers.

Common language between client and server: service types and qualities of service. The server registers its service with the trader. The server defines the assured quality of service: static QoS definition or dynamic QoS definition. There has to be a language for expressing types of services that both client and server understand. This language has to be expressive enough to define the different types of services that a server offers or that a client may wish to use. Moreover, the language has to be expressive enough for clients to ask for certain degrees of quality of a service, such as performance, reliability or privacy. Servers use the expressiveness of the language to advertise the quality of their services. In order to enable a trading service to act as a broker between clients and servers, the servers have to register the services they offer with the trader. During registration, they use the above language to declare the types of services they offer and their qualities. The quality of service may be defined statically or dynamically. A static definition is appropriate (because it is simpler) if the quality of service is independent of the state of the server. This might be the case for qualities such as precision, privacy or reliability. For qualities such as performance, however, the server may not be able to ensure a particular quality of service statically at the time it registers the service with the trader. In that case a dynamic definition of the quality is used, which makes the trader ask the server about the quality whenever a client needs to know it.
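The difference between static and dynamic QoS definitions can be sketched by letting a registered property be either a constant or a function the trader calls at query time. `Offer`, `Trader`, the service type "printing", the server reference "printer-A" and the property names are all hypothetical; a real trader would hold object references and use a standardized constraint language rather than Python callables.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple, Union

QoSValue = Union[float, Callable[[], float]]  # a constant (static) or a probe (dynamic)


@dataclass
class Offer:
    service_type: str
    server_ref: str  # stands in for an object reference
    qos: Dict[str, QoSValue]

    def current_qos(self) -> Dict[str, float]:
        """Static properties are returned as registered; dynamic ones are asked for now."""
        return {k: (v() if callable(v) else v) for k, v in self.qos.items()}


class Trader:
    def __init__(self) -> None:
        self.offers: List[Offer] = []

    def register(self, offer: Offer) -> None:
        self.offers.append(offer)

    def query(self, service_type: str) -> List[Tuple[str, Dict[str, float]]]:
        return [(o.server_ref, o.current_qos())
                for o in self.offers if o.service_type == service_type]


# Usage: reliability is static; response time can only be known at query time.
current_response_ms = [20.0]
trader = Trader()
trader.register(Offer("printing", "printer-A",
                      {"reliability": 0.99, "response_ms": lambda: current_response_ms[0]}))
current_response_ms[0] = 75.0
print(trader.query("printing"))
# [('printer-A', {'reliability': 0.99, 'response_ms': 75.0})]
```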

Clients ask the trader for a service of a certain type at a certain level of quality. The trader supports service matching and service shopping. After servers have registered the types of services they offer with the trader, the trader is in a position to respond to service inquiries from clients. The situation is similar to that after the yellow pages have been published. A client would then use the common language to ask the trader for a server that provides the type of service the client is interested in. Clients may or may not include specifications of the quality of service that they expect the server to provide. The trader can react to such a client enquiry in different ways. The trader may itself attempt to match the client's request with the best offer and return just the identification of a single server that provides the service with the intended quality. This technique is known as service matching. Alternatively, the trader may compile a list of those servers that offer a service matching the client's request and return that list to the client, so that the client can select the most appropriate server itself. This technique is known as service shopping.

2.2 Example
Distributed system for video-on-demand:
[Figure: the trader of a video-on-demand provider, video servers run by MGM, Warner and an independent publisher, and a user.]
As an example, let us consider a distributed system for video-on-demand. Users in such a system would be interested in watching certain films on video. Users are not interested in which company published a film or where an electronic copy of it can be obtained. Users would therefore use a client component that connects to a trader running on a machine of a video-on-demand provider. Film publishers, such as MGM and Warner, as well as independent companies, would run video server components that store electronic copies of films in different formats for download. The different formats imply different qualities of service: high-resolution formats take longer to download but are nicer to watch. The video server components would register the titles they offer as different service types with the trader of the video-on-demand server, specifying resolution and size (and hence download time) as quality of service information. When a user wants to watch a particular video, the client component on the user's workstation would ask the video-on-demand server for video servers that keep an electronic copy of that video. The client may add a quality of service specification to request a certain resolution (say 1024x768 and 256 colors). With service matching, the video-on-demand server would return the identification of a server that provides the video in the required resolution. With service shopping, the video-on-demand server would return a list of servers that have the video in the requested (or a better) resolution.
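Service matching and service shopping differ only in who makes the final choice. The sketch below applies both to the video-on-demand example; the offer records, server names, title "Film X" and file sizes are invented for illustration, and choosing the smallest acceptable download during matching is an assumed trader policy, not one prescribed by the text.

```python
from typing import List, NamedTuple, Optional


class VideoOffer(NamedTuple):
    server: str
    title: str
    width: int
    height: int
    colors: int
    size_mb: int  # larger files take longer to download


def service_shopping(offers: List[VideoOffer], title: str,
                     width: int, height: int, colors: int) -> List[VideoOffer]:
    """Return every offer of the title at the requested (or a better) resolution."""
    return [o for o in offers
            if o.title == title and o.width >= width
            and o.height >= height and o.colors >= colors]


def service_matching(offers: List[VideoOffer], title: str,
                     width: int, height: int, colors: int) -> Optional[VideoOffer]:
    """Let the trader pick one offer; 'smallest download' is an assumed policy."""
    candidates = service_shopping(offers, title, width, height, colors)
    return min(candidates, key=lambda o: o.size_mb, default=None)


offers = [
    VideoOffer("mgm-video", "Film X", 1024, 768, 256, 900),
    VideoOffer("warner-video", "Film X", 1280, 1024, 65536, 1500),
    VideoOffer("independent-video", "Film X", 640, 480, 256, 400),
]
print([o.server for o in service_shopping(offers, "Film X", 1024, 768, 256)])
# ['mgm-video', 'warner-video']
print(service_matching(offers, "Film X", 1024, 768, 256).server)  # mgm-video
```

With shopping, the client receives both acceptable servers and can apply its own preference (for example, Warner's higher color depth); with matching, that decision is delegated to the trader's policy.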

2.3 Typical Trading Service
[Figure: a typical trading service. (1) The server registers with the trader; (2) the client looks up a service at the trader; (2a) the trader monitors the server's QoS; (3) the client applies the server's service.]
There are three principal components involved in the application of a typical trading service: server, client and trader. These components participate in three steps during trading. In the first step, the server registers its services with the trader using a registration interface exported by the trader. While doing so, it specifies the services and the properties of these services. After that, clients can look up services using a lookup interface offered by the trader. If qualities of service are specified dynamically by the server, the trader will invoke operations on the server to monitor the quality of service at run time. Once the client has obtained an object reference of a server from the trader, the client can use the object reference to apply the server's services.

3 Peer-to-Peer Systems
- Introduction
- Napster and its legacy
- Peer-to-peer middleware
- Routing overlays
Peer-to-peer systems represent a paradigm for the construction of distributed systems and applications in which data and computational resources are contributed by many hosts on the Internet, all of which participate in the provision of a uniform service. Their emergence is a consequence of the very rapid growth of the Internet, embracing many millions of computers and similar numbers of users requiring access to shared resources.

3.1 Introduction
The goal of peer-to-peer systems is to enable the sharing of data and resources. They aim to support useful distributed services and applications using data and computing resources available in personal computers and workstations. Shirky: “Applications that exploit resources available at the edges of the Internet – storage, cycles, content, human presence.” Discuss the general techniques that simplify the construction of peer-to-peer applications and enhance their scalability, reliability and security.
The goal of peer-to-peer systems is to enable the sharing of data and resources on a very large scale by eliminating any requirement for separately-managed servers and their associated infrastructure. Peer-to-peer systems aim to support useful distributed services and applications using data and computing resources available in personal computers and workstations. This is increasingly attractive as the performance difference between desktop and server machines narrows and broadband network connections proliferate. But there is another, broader aim, captured by Shirky's definition of peer-to-peer applications: “applications that exploit resources available at the edges of the Internet – storage, cycles, content, human presence”. Each type of resource sharing mentioned in that definition is already represented by distributed applications available for most types of personal computer. The purpose of this part is to discuss the general techniques that simplify the construction of peer-to-peer applications and enhance their scalability, reliability and security.

3.1.1 Characteristics of Peer-to-Peer Systems
Ensure each user contributes resources to the system. All nodes have the same functional capabilities and responsibilities. Their correct operation does not depend on the existence of any centrally-administered systems. Offer a limited degree of anonymity to the providers and users of resources. Choice of algorithm for the placement of data.
The design of peer-to-peer systems ensures that each user contributes resources to the system. Although they may differ in the resources that they contribute, all nodes in a peer-to-peer system have the same functional capabilities and responsibilities. Their correct operation does not depend on the existence of any centrally-administered systems. They can be designed to offer a limited degree of anonymity to the providers and users of resources. A key issue for their efficient operation is the choice of an algorithm for the placement of data across many hosts and subsequent access to it in a manner that balances the workload and ensures availability without adding undue overhead.
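One widely used data placement algorithm is consistent hashing, which the slides do not name; it appears here only as an example of the kind of algorithm meant. In this sketch, hosts and objects are hashed onto the same identifier ring, each object is stored on the first few distinct hosts clockwise from its identifier, and each host is given several virtual positions to even out the workload. The host names, the 32-bit ring, the 50 virtual positions per host and the 3 replicas are assumptions chosen for illustration.

```python
from __future__ import annotations

import bisect
import hashlib
from collections import Counter


def ring_id(key: str) -> int:
    """Map a string onto a 32-bit identifier ring using SHA-1."""
    return int.from_bytes(hashlib.sha1(key.encode()).digest()[:4], "big")


class Ring:
    def __init__(self, hosts: list[str], vnodes: int = 50) -> None:
        # Each host occupies `vnodes` pseudo-random ring positions to spread load.
        self._points = sorted((ring_id(f"{host}#{i}"), host)
                              for host in hosts for i in range(vnodes))
        self._ids = [point for point, _ in self._points]

    def place(self, obj: str, replicas: int = 3) -> list[str]:
        """Return up to `replicas` distinct hosts, walking clockwise from obj's id."""
        start = bisect.bisect_right(self._ids, ring_id(obj))
        chosen: list[str] = []
        for k in range(len(self._points)):
            host = self._points[(start + k) % len(self._points)][1]
            if host not in chosen:
                chosen.append(host)
                if len(chosen) == replicas:
                    break
        return chosen


hosts = [f"host{i}" for i in range(10)]
ring = Ring(hosts)
print(ring.place("song.mp3"))    # three distinct hosts that hold the replicas
primary = Counter(ring.place(f"file{j}")[0] for j in range(10_000))
print(sorted(primary.values()))  # primary copies per host: roughly even
```

Keeping copies on several consecutive hosts addresses availability, and the virtual positions address load balancing, at the cost of a somewhat larger placement table.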

3.1.1 Example: Distributed Video Streaming
Traditional client/server solutions are not appropriate for video streaming.
Bottlenecks: server load (the server bandwidth is the major bottleneck); edge capacity (one connection to one client, and the connection may degrade); end-to-end bandwidth.
Scalability? The server becomes a bottleneck when too many clients want information from it.
Figure: the three bottlenecks (server load, edge capacity, end-to-end bandwidth).

3.1.1 Example: P2P-Based Solution
Collaboration between client peers alleviates the load on the source server, giving better scalability and better reliability.
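The scalability argument can be made concrete with an idealized fluid model, which is our simplifying assumption rather than something the slides state: the source server has upload capacity u_s, there are N clients, peer i uploads at u_i, download capacity is unlimited and relaying is perfectly cooperative. With client/server streaming each client receives at most u_s / N. With peers relaying the stream, the sustainable rate is bounded by min(u_s, (u_s + Σ u_i) / N), because the server must send every bit at least once and the total upload capacity is shared by N receivers. The capacities below are hypothetical.

```python
from __future__ import annotations


def client_server_rate(server_upload: float, n_clients: int) -> float:
    """Each of the N clients gets an equal share of the server's upload link."""
    return server_upload / n_clients


def p2p_rate(server_upload: float, peer_uploads: list[float]) -> float:
    """Idealized upper bound on the streaming rate when peers relay to each other."""
    n = len(peer_uploads)
    # The server must emit every bit at least once, and all N receivers share the
    # combined upload capacity of the server and the peers.
    return min(server_upload, (server_upload + sum(peer_uploads)) / n)


server = 10.0          # hypothetical source-server upload, Mbit/s
peers = [1.0] * 1000   # hypothetical: 1,000 peers with 1 Mbit/s uplinks
print(f"client/server: {client_server_rate(server, len(peers)):.3f} Mbit/s")  # 0.010
print(f"peer-to-peer:  {p2p_rate(server, peers):.3f} Mbit/s")                 # 1.010
```

In this model the client/server rate per client falls as 1/N, while the peer-to-peer bound stays roughly constant as long as each newcomer also contributes upload capacity.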

3.1.2 Peer-to-Peer Systems
Three generations of peer-to-peer system and application development can be identified. The first generation was launched by the Napster music exchange service. A second generation of file sharing applications, offering greater scalability, anonymity and fault tolerance, quickly followed. The third generation is characterized by the emergence of middleware layers for the application-independent management of distributed resources on a global scale.

3.2 Napster
The first application in which a demand for a globally-scalable information storage and retrieval service emerged. Popular for music exchange. Napster’s architecture included centralized indexes, but users supplied the files.
The first application in which a demand for a globally-scalable information storage and retrieval service emerged was the downloading of digital music files. Both the need for and the feasibility of a peer-to-peer solution were first demonstrated by the Napster file sharing system, which provided a means for users to share files. Napster became very popular for music exchange soon after its launch. At its peak, several million users were registered and thousands were swapping music files simultaneously. Napster’s architecture included centralized indexes, but users supplied the files, which were stored and accessed on their personal computers.

3.2.1 Napster’s Method of Operation
Napster’s method of operation is illustrated by the sequence of steps shown in the figure. Note that in step 5 clients are expected to add their own music files to the pool of shared resources by transmitting a link to the Napster indexing service for each available file. Thus the motivation for Napster and the key to its success was to make a large, widely-distributed set of files available to users throughout the Internet, fulfilling Shirky’s dictum by providing access to “shared resources at the edges of the Internet.”
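A centralized index in the style described here might be sketched as follows. The sketch assumes a single in-memory index keyed by file name; the class and method names, the peer addresses and the `hops` function (a hypothetical locality estimate, echoing the remark that Napster took account of network locality) are all illustrative, and no part of the actual Napster protocol is reproduced.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Callable


class NapsterIndex:
    """Centralized index mapping file names to the peers that share them.

    Only links (peer addresses) are stored; the files stay on the peers' machines.
    """

    def __init__(self) -> None:
        self._holders: defaultdict[str, set[str]] = defaultdict(set)

    def add_links(self, peer: str, filenames: list[str]) -> None:
        """A peer publishes one link per shared file (step 5 in the text)."""
        for name in filenames:
            self._holders[name].add(peer)

    def remove_peer(self, peer: str) -> None:
        """A departing peer's files are no longer available."""
        for holders in self._holders.values():
            holders.discard(peer)

    def locate(self, filename: str, hops: Callable[[str], int]) -> list[str]:
        """Return the peers holding `filename`, nearest first according to `hops`."""
        return sorted(self._holders.get(filename, ()), key=hops)


index = NapsterIndex()
index.add_links("10.0.0.5", ["song.mp3", "talk.mp3"])
index.add_links("192.168.1.7", ["song.mp3"])
hop_count = {"10.0.0.5": 12, "192.168.1.7": 3}.get   # hypothetical locality data
print(index.locate("song.mp3", hop_count))   # ['192.168.1.7', '10.0.0.5']
# The client would then fetch song.mp3 directly from 192.168.1.7, not via the index.
```

The index never handles file contents; it only tells a client which peers to contact.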

3.2.2 Napster’s Legacy
The original Napster was shut down as a result of legal proceedings. Anonymity for the receivers and the providers of shared data and other resources is a concern for the designers of peer-to-peer systems. Napster demonstrated the feasibility of building a useful large-scale service which depends almost wholly on data and computers owned by ordinary Internet users.
Napster was shut down as a result of legal proceedings instituted against the operators of the Napster service by the owners of the copyright in some of the material that was made available on it. Anonymity for the receivers and the providers of shared data and other resources is a concern for the designers of peer-to-peer systems. In systems with many nodes, the routing of requests and results can be made sufficiently tortuous to conceal their sources, and the contents of files can be distributed across multiple nodes, spreading the responsibility for making them available. Napster demonstrated the feasibility of building a useful large-scale service which depends almost wholly on data and computers owned by ordinary Internet users. To avoid swamping the computing resources of individual users and their network connections, Napster took account of network locality.

3.3 Peer-to-Peer Middleware
Provide a mechanism to enable clients to access data resources quickly and dependably wherever they are located in the network. Designed specifically to meet the need for the automatic placement and subsequent location of distributed objects.
Peer-to-peer middleware systems provide a mechanism to enable clients to access data resources quickly and dependably wherever they are located in the network. This location problem exists in several services that predate the peer-to-peer paradigm. Peer-to-peer middleware systems are designed specifically to meet the need for the automatic placement and subsequent location of the distributed objects managed by peer-to-peer systems and applications.

3.3.1 Functional Requirements
Simplify the construction of services that are implemented across many hosts in a widely distributed network. Enable clients to locate and communicate with any individual resource. Ability to add new resources (or hosts) and to remove them.
The function of peer-to-peer middleware is to simplify the construction of services that are implemented across many hosts in a widely distributed network. To achieve this, it must enable clients to locate and communicate with any individual resource made available to a service, even though the resources are widely distributed amongst the hosts. Other important requirements include the ability to add new resources and to remove them at will, and to add hosts to the service and remove them.
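The functional requirements can be read as a small programming interface: access a resource by name, add and remove resources, and let hosts join and leave. The sketch below simulates this in one process. It assumes `put`/`get`/`remove` operations in the style of a distributed hash table and places each object with rendezvous (highest-random-weight) hashing, a technique the slides do not mention; the class name, host names and keys are hypothetical, and at least one host must be present.

```python
from __future__ import annotations

import hashlib


def score(host: str, key: str) -> int:
    """Pseudo-random weight for a (host, key) pair; the highest weight wins."""
    return int.from_bytes(hashlib.sha256(f"{host}|{key}".encode()).digest()[:8], "big")


class Middleware:
    """Single-process simulation of hosts sharing objects (hypothetical names)."""

    def __init__(self) -> None:
        self.stores: dict[str, dict[str, bytes]] = {}   # host -> objects it holds

    def _owner(self, key: str) -> str:
        # Rendezvous hashing: every party computes the same owner independently.
        return max(self.stores, key=lambda host: score(host, key))

    def put(self, key: str, data: bytes) -> None:
        self.stores[self._owner(key)][key] = data

    def get(self, key: str) -> bytes | None:
        return self.stores[self._owner(key)].get(key)

    def remove(self, key: str) -> None:
        self.stores[self._owner(key)].pop(key, None)

    def add_host(self, host: str) -> int:
        """A host joins; only objects it now owns move to it. Returns the count."""
        self.stores[host] = {}
        moved = 0
        for other, store in self.stores.items():
            if other == host:
                continue
            for key in [k for k in store if self._owner(k) == host]:
                self.stores[host][key] = store.pop(key)
                moved += 1
        return moved

    def remove_host(self, host: str) -> None:
        """A host leaves gracefully, handing each object to its next-best host."""
        for key, data in self.stores.pop(host).items():
            self.put(key, data)


mw = Middleware()
for name in ["alpha", "beta", "gamma"]:
    mw.add_host(name)
for i in range(300):
    mw.put(f"obj{i}", f"data{i}".encode())
moved = mw.add_host("delta")
print(moved)               # about a quarter of the 300 objects move
mw.remove_host("beta")
print(mw.get("obj42"))     # b'data42'
mw.remove("obj42")
print(mw.get("obj42"))     # None
```

When a host joins, only the objects for which it now has the highest score move to it, so adding capacity does not reshuffle the whole store.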

3.3.2 Non-Functional Requirements
Global scalability: one of the aims of P2P applications is to exploit the hardware resources of very large numbers of hosts connected to the Internet. P2P middleware must therefore be designed to support applications that access millions of objects on tens of thousands or hundreds of thousands of hosts.
Load balancing: the performance of any system designed to exploit a large number of computers depends upon the balanced distribution of workload across them.
Optimization for local interactions between neighboring peers: the “network distance” between nodes that interact has a substantial impact on the latency of individual interactions and on network traffic loads.
Accommodating highly dynamic host availability: most P2P systems are constructed from host computers that are free to join or leave the system at any time. The hosts and network segments used in P2P systems are not owned or managed by any single authority.
Security of data in an environment with heterogeneous trust: in global-scale systems with participating hosts of diverse ownership, trust must be built up by the use of authentication and encryption mechanisms to ensure the integrity and privacy of information.
Anonymity, deniability and resistance to censorship: in addition to anonymity for the holders and recipients of data, hosts that hold data should be able to plausibly deny responsibility for holding or supplying it. The use of large numbers of hosts in P2P systems can be helpful in achieving these properties.
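The requirement to accommodate highly dynamic host availability is one reason replication matters. As a worked example, assume (a simplification not made in the text) that each host is online independently with probability p. An object stored on k hosts is then reachable with probability A = 1 - (1 - p)^k, and the smallest k meeting a target availability can be found by trying k = 1, 2, ... in turn. The values of p and the target below are hypothetical.

```python
def availability(p: float, k: int) -> float:
    """Probability that at least one of k independently available replicas is online."""
    return 1 - (1 - p) ** k


def replicas_needed(p: float, target: float) -> int:
    """Smallest number of replicas k with availability(p, k) >= target."""
    if not (0 < p <= 1 and 0 <= target < 1):
        raise ValueError("need 0 < p <= 1 and 0 <= target < 1")
    k = 1
    while availability(p, k) < target:
        k += 1
    return k


# Hypothetical figures: hosts online 30% of the time, 99% availability wanted.
print(replicas_needed(0.3, 0.99))        # 13
print(f"{availability(0.3, 13):.4f}")    # 0.9903
```

Real host failures are often correlated, so the independence assumption tends to make this estimate optimistic.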

3.3.3 Routing Overlays versus IP Routing
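Routing overlays share many characteristics with IP packet routing infrastructure. An additional application-level routing mechanism is required.
Routing overlays share many characteristics with the IP packet routing infrastructure that constitutes the primary communication mechanism of the Internet. An additional routing mechanism is nevertheless required at the application level, because requests must be routed to objects identified by location-independent names rather than to hosts identified by IP addresses.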

3.4 Routing Overlay Information Distribution
Figure: distribution of information in a routing overlay, showing the routing knowledge held by nodes A, B, C and D about nodes and objects.
Knowledge of the locations of objects must be partitioned and distributed throughout the network. Each node is made responsible for maintaining detailed knowledge of the locations of nodes and objects in a portion of the name space, as well as a general knowledge of the topology of the entire name space, as shown in this figure. A high degree of replication of the name space knowledge is necessary to ensure dependability in the face of the volatile availability of hosts and intermittent network connectivity.
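As one concrete way of partitioning routing knowledge, consider a Chord-style overlay, which the slides do not name and which is used here purely as an illustration. Each node keeps detailed knowledge of its immediate successors on an identifier ring (its own portion of the name space) plus coarse knowledge of the whole space through “fingers” at distances 1, 2, 4, ..., 2^(M-1). The 6-bit identifier space, the node identifiers and the two-entry successor list are assumptions.

```python
from __future__ import annotations

M = 6              # assumed identifier length in bits: 2**M = 64 ring positions
RING = 2 ** M


def successor(nodes: list[int], ident: int) -> int:
    """First node whose identifier is at or clockwise after `ident`."""
    ident %= RING
    for n in sorted(nodes):
        if n >= ident:
            return n
    return min(nodes)   # wrap around past the top of the ring


def routing_state(nodes: list[int], me: int, leaf: int = 2) -> dict[str, list[int]]:
    ordered = sorted(nodes)
    i = ordered.index(me)
    # Detailed knowledge: the next `leaf` nodes, i.e. me's own part of the name space.
    successors = [ordered[(i + k) % len(ordered)] for k in range(1, leaf + 1)]
    # General knowledge: one pointer per power-of-two distance around the whole ring.
    fingers = [successor(nodes, me + 2 ** j) for j in range(M)]
    return {"successors": successors, "fingers": fingers}


nodes = [1, 8, 14, 21, 32, 38, 42, 48, 51, 56]   # hypothetical node identifiers
print(routing_state(nodes, 8))
# {'successors': [14, 21], 'fingers': [14, 14, 14, 21, 32, 42]}
```

Each node stores only M + 2 entries (some repeated) rather than the full membership, yet its fingers reach every part of the ring.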

3.4 Routing Overlays
The distributed algorithm known as a routing overlay takes responsibility for locating nodes and objects. The routing overlay ensures that any node can access any object by routing each request through a sequence of nodes. Peer-to-peer systems usually store multiple replicas of objects to ensure availability.
The distributed algorithm known as a routing overlay takes responsibility for locating nodes and objects. The name denotes the fact that the middleware takes the form of a layer that is responsible for routing requests from any client to a host that holds the object to which the request is addressed. The objects of interest may be placed at, and subsequently relocated to, any node in the network without client involvement. The routing overlay ensures that any node can access any object by routing each request through a sequence of nodes, exploiting knowledge at each of them to locate the destination object. Peer-to-peer systems usually store multiple replicas of objects to ensure availability. In that case, the routing overlay maintains knowledge of the location of all available replicas and delivers requests to the nearest “live” node that has a copy of the relevant object.
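Routing a request through a sequence of nodes can be sketched with greedy Chord-style lookups, again an illustrative choice rather than the algorithm the slides describe. Each hop forwards the request to the farthest known node that does not overshoot the target identifier, so the request gets strictly closer at every step. The sketch assumes that each object is replicated on the node responsible for its identifier and on the next two nodes, that every node on the routing path is alive, and, as a simplification, that the “nearest” live replica is the first live one in ring order; the node identifiers are hypothetical.

```python
from __future__ import annotations

M = 6              # assumed identifier length in bits
RING = 2 ** M


def successor(nodes: list[int], ident: int) -> int:
    """Node responsible for `ident`: the first node at or clockwise after it."""
    ident %= RING
    return next((n for n in sorted(nodes) if n >= ident), min(nodes))


def in_open(x: int, a: int, b: int) -> bool:
    """True if x lies strictly between a and b, going clockwise."""
    return a < x < b if a < b else (x > a or x < b)


def in_half_open(x: int, a: int, b: int) -> bool:
    """True if x lies in the clockwise interval (a, b]."""
    return a < x <= b if a < b else (x > a or x <= b)


def lookup(nodes: list[int], start: int, key: int) -> tuple[int, list[int]]:
    """Route from `start` towards `key`; return the responsible node and the path."""
    fingers = {n: [successor(nodes, n + 2 ** j) for j in range(M)] for n in nodes}
    node, path = start, [start]
    while not in_half_open(key, node, fingers[node][0]):   # fingers[n][0] = successor
        # Forward to the farthest finger that does not overshoot the key.
        node = next(f for f in reversed(fingers[node]) if in_open(f, node, key))
        path.append(node)
    return fingers[node][0], path


def deliver(nodes: list[int], start: int, key: int, live: set[int],
            replicas: int = 3) -> tuple[int | None, list[int]]:
    """Deliver to the first live replica, assuming copies on consecutive nodes."""
    owner, path = lookup(nodes, start, key)
    ordered = sorted(nodes)
    i = ordered.index(owner)
    for k in range(replicas):
        holder = ordered[(i + k) % len(ordered)]
        if holder in live:
            return holder, path
    return None, path


nodes = [1, 8, 14, 21, 32, 38, 42, 48, 51, 56]          # hypothetical identifiers
print(lookup(nodes, 8, 54))                              # (56, [8, 42, 51])
print(deliver(nodes, 8, 54, live=set(nodes) - {56}))     # (1, [8, 42, 51])
```

Because the finger distances double, each hop roughly halves the remaining distance to the key, which keeps paths short even when each node knows only a few others.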

4 Summary
Naming services provide facilities to give external names to components. Trading services match service types requested by clients to servers that can satisfy them. P2P systems exploit existing naming, routing, data replication and security techniques for resource sharing. Read Chapters 10 and 13 of the textbook.
In summary, we have seen that location transparency requires ways of identifying components other than physical addresses. We have seen two basic techniques to locate servers: naming and trading. Naming services, such as the Internet Domain Name System, the X.500 directory service or the distributed system naming service, map hierarchically organized names to component identifications such as IP addresses or distributed-system object references (e.g., CORBA object references). Trading services provide an even higher level of abstraction than naming in that they use a common language between servers and clients to describe service types and their qualities. Servers register the services they offer with a trader, and clients use the trader to inquire about services. Once the trader has matched a client inquiry with a service offer, clients and servers communicate privately without involvement of the trader. Hence traders usually impose no overhead on the communication between clients and servers; they just enable clients to find the most suitable server. Emerging P2P systems have the capacity to share the computing resources, storage and data present in computers ‘at the edges of the Internet’ on a global scale. They exploit existing naming, routing, data replication and security techniques for resource sharing. Read Chapter 10 for P2P systems and Chapter 13 for name services.
