1
November 2005Distributed systems: general services1 Distributed Systems: General Services
2
November 2005Distributed systems: general services2 Overview of chapters Introduction Co-ordination models and languages General services –Ch 11 Time Services, 11.1-11.3 –Ch 9 Name Services –Ch 8 Distributed File Systems –Ch 10 Peer-to-peer Systems Distributed algorithms Shared data
3
November 2005Distributed systems: general services3 This chapter: overview Introduction Time services Name services Distributed file systems Peer-to-peer systems
4
November 2005Distributed systems: general services4 Introduction Objectives –Study existing subsystems Services offered Architecture Limitations – Learn about non-functional requirements Techniques used Results Limitations
5
November 2005Distributed systems: general services5 This chapter: overview Introduction Time services Name services Distributed file systems Peer-to-peer systems
6
November 2005Distributed systems: general services6 Time services: Overview What & Why Clock synchronization Logical clocks (see chapter on algorithms )
7
November 2005Distributed systems: general services7 Time services Definition of time –International Atomic Time (TAI) A second is defined as 9,192,631,770 periods of the radiation corresponding to the transition between the two hyperfine levels of the ground state of Caesium-133 –Astronomical time Based on the rotation of the earth on its axis and its orbit around the sun. But due to tidal friction the period of the earth's rotation is gradually getting longer –Co-ordinated Universal Time (UTC) International standard based on atomic time, but a leap second is occasionally inserted or deleted to keep in step with astronomical time
8
November 2005Distributed systems: general services8 Time services (cont.) Why is time important? –Accounting (e.g. logins, files,…) –Performance measurement –Resource management –some algorithms depend on it: Transactions using timestamps Authentication (e.g. Kerberos) Example Make
9
November 2005Distributed systems: general services9 Time services (cont.) An example: Unix tool “make” –large program: multiple source files (graphics.c, …) and multiple object files (graphics.o, …); a change of one file should not require recompilation of all files –use of make to handle changes: graphics.c changed at 3h45; make compares this with the creation time of graphics.o, e.g. 3h43, and recompiles graphics.c to create an up-to-date graphics.o
10
November 2005Distributed systems: general services10 Time services (cont.) Make in a distributed environment: source file on system A, object file on system B. Figure: clock B reads 3h47-3h50 when graphics.o is created; clock A reads only 3h41-3h44 when graphics.c is modified afterwards. Because the clocks disagree, the modified graphics.c appears no newer than graphics.o, so make has no effect!!
11
November 2005Distributed systems: general services11 Time services (cont.) Hardware clocks –each computer has a physical clock H(t): value of hardware clock/counter; different clocks give different values e.g. crystal-based clocks drift due to –different shapes, temperatures, … Typical drift rates: –1 µsec/sec, i.e. about 1 sec in 11.6 days –High precision quartz clocks: 0.1-0.01 µsec/sec Software clocks –C(t) = αH(t) + β, e.g. nanosecs elapsed since a reference time
12
November 2005Distributed systems: general services12 Time services (cont.) Changing local time –Hardware: influence H(t) –Software: change α and/or β –Forward: set the clock forward; some clock ticks appear to have been missed –Backward: setting the clock backward is unacceptable; instead, let the clock run slower for a short period of time
13
November 2005Distributed systems: general services13 Time services (cont.) Properties –External synchronization to UTC source S: | S(t) - C(t) | < D –Internal synchronization: | Ci(t) - Cj(t) | < D –Correctness: drift rate bounded by ρ –Monotonicity: t' > t implies C(t') > C(t)
14
November 2005Distributed systems: general services14 Time services: Overview What & Why Synchronising Physical clocks –Synchronising with UTC –Cristian's algorithm –The Berkeley algorithm –Network Time Protocol Logical clocks (see chapter on algorithms )
15
November 2005Distributed systems: general services15 Time services synchronising with UTC Broadcast by radio stations –WWV (USA) (accuracy 0.1 - 10 msec.) Broadcast by satellites –GEOS (0.1 msec.) –GPS (about 1 µsec) Receivers attached to workstations
16
November 2005Distributed systems: general services16 Time services Cristian's algorithm assumption: time server process Ts on a computer with a UTC receiver; process P: request/reply with the time server. Figure: P sends a T-request to Ts, which answers with a reply containing the time t.
17
November 2005Distributed systems: general services17 Time services Cristian's algorithm how to set clock at P? –time t inserted in the reply by Ts –time for P: t + Ttrans –Ttrans = min + x min = minimum transmission time x? depends on –network delay –process scheduling
18
November 2005Distributed systems: general services18 Time services Cristian's algorithm In a synchronous system –Upper bound on delay: max –P sets time to t + (max + min) / 2 –Accuracy: (max – min) / 2 In an asynchronous system?
19
November 2005Distributed systems: general services19 Time services Cristian's algorithm practical approach –P measures round-trip time: Tround –P sets time to t + Tround/2 –accuracy? time at Ts when the message arrives at P is in range: [t + min, t + Tround - min] accuracy = ±(Tround/2 - min) problems: –single point of failure –impostor time server
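A minimal sketch of this practical approach, assuming a hypothetical request_time() callable that performs the request/reply exchange with the time server and returns the timestamp t found in the reply:

```python
import time

MIN_TRANSMISSION = 0.001  # assumed minimum one-way transmission time (seconds)

def cristian_sync(request_time):
    """Estimate the server's current time with Cristian's practical approach."""
    t0 = time.monotonic()            # local time when the request is sent
    t = request_time()               # server inserts its UTC time t in the reply
    t_round = time.monotonic() - t0  # measured round-trip time

    estimate = t + t_round / 2                 # assume symmetric delays
    accuracy = t_round / 2 - MIN_TRANSMISSION  # +/- bound on the error
    return estimate, accuracy

# Example with a fake server whose clock is 2 seconds ahead of ours.
if __name__ == "__main__":
    fake_server = lambda: time.time() + 2.0
    est, acc = cristian_sync(fake_server)
    print(f"estimated server time {est:.3f} (accuracy +/- {acc * 1000:.1f} ms)")
```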
20
November 2005Distributed systems: general services20 Time services The Berkeley algorithm Used on computers running Berkeley Unix active master elected among computers whose clocks have to be synchronised –no central time server –upon failure of master: re-election
21
November 2005Distributed systems: general services21 Time services The Berkeley algorithm algorithm of master: –periodically polls the clock value of the other computers –estimates the slaves' local clock times (based on round-trip delays) –averages the obtained values –returns the amount by which each individual slave's clock needs adjustment –a fault-tolerant average (outlying values excluded) may be used
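The master's averaging step could look like the sketch below; the fault-tolerant average (discarding values beyond a chosen deviation) and the input format are simplifying assumptions, not the exact Berkeley implementation:

```python
def berkeley_adjustments(master_time, slave_times, max_deviation=1.0):
    """Compute the adjustment each clock (master included) should apply.

    slave_times maps a slave id to the master's estimate of that slave's clock
    (already corrected for round-trip delay).  Values deviating more than
    max_deviation seconds from the master are excluded from the average.
    """
    clocks = {"master": master_time, **slave_times}
    good = {k: v for k, v in clocks.items()
            if abs(v - master_time) <= max_deviation}
    average = sum(good.values()) / len(good)
    # Each node is told by how much to adjust, not what the new time is,
    # so the correction is independent of further message delays.
    return {k: average - v for k, v in clocks.items()}

# Example: the master polls three slaves; one of them is wildly off.
print(berkeley_adjustments(100.0, {"s1": 100.4, "s2": 99.8, "s3": 250.0}))
```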
22
November 2005Distributed systems: general services22 Time services Network Time Protocol Standard for the Internet Design aims and features: –Accurate synchronization despite variable message delays Statistical techniques to filter timing data from different servers –Reliable service that can survive losses of connectivity Redundant servers hierarchy of servers in synchronised subnets
23
November 2005Distributed systems: general services23 Time services Network Time Protocol Figure: hierarchy of servers (level = stratum): a stratum-1 server at the root has a UTC receiver; stratum-2 servers synchronise with it, stratum-3 servers synchronise with stratum-2 servers, and so on.
24
November 2005Distributed systems: general services24 Time services Network Time Protocol Three working modes for NTP servers –Multicast mode used on high speed LAN slave sets clock assuming small delay –Procedure-call mode similar to Cristian's algorithm –Symmetric mode used by master servers and higher levels association between servers
25
November 2005Distributed systems: general services25 Time services Network Time Protocol Procedure-call and Symmetric mode –each message bears timestamps of recent message events. Figure: message m is sent by server A at Ti-3 and received by server B at Ti-2; message m' is sent by B at Ti-1 and received by A at Ti.
26
November 2005Distributed systems: general services26 Time services Network Time Protocol –Compute approximations of offset and delay o = true offset of B's clock relative to A's; t, t' = transmission times of m and m' Ti-2 = Ti-3 + t + o and Ti = Ti-1 + t' - o a = Ti-2 - Ti-3 = t + o b = Ti-1 - Ti = o - t' estimates: delay di = a - b = t + t' (total transmission time), offset oi = (a + b) / 2, with oi - di/2 <= o <= oi + di/2
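With the four timestamps above, the offset and delay estimates follow directly; a small sketch (timestamps as plain floats, example values invented):

```python
def ntp_estimates(t_i_minus_3, t_i_minus_2, t_i_minus_1, t_i):
    """Return (offset estimate o_i, delay estimate d_i) for one message pair.

    t_i-3: m sent by A      t_i-2: m received by B
    t_i-1: m' sent by B     t_i  : m' received by A
    """
    a = t_i_minus_2 - t_i_minus_3   # = t + o
    b = t_i_minus_1 - t_i           # = o - t'
    o_i = (a + b) / 2               # estimate of the true offset o
    d_i = a - b                     # = t + t', total transmission time
    return o_i, d_i                 # and  o_i - d_i/2 <= o <= o_i + d_i/2

# Example: B's clock is 5 ms ahead of A's; both one-way delays are 25 ms.
o, d = ntp_estimates(0.000, 0.030, 0.045, 0.065)
print(f"offset ~ {o * 1000:.1f} ms, delay {d * 1000:.1f} ms")
```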
27
November 2005Distributed systems: general services27 Time services Network Time Protocol Procedure-call and Symmetric mode –data filtering applied to successive pairs –NTP has complex peer-selection algorithm; favoured are peers with : a lower stratum number a lower synchronisation dispersion –Accuracy: 10 msec on Internet 1 msec on LAN
28
November 2005Distributed systems: general services28 This chapter: overview Introduction Time services Name services Distributed file systems Peer-to-peer systems
29
November 2005Distributed systems: general services29 Name services Overview –Introduction –Basic service –Case study: Domain name system –Directory services
30
November 2005Distributed systems: general services30 Name services Introduction Types of name: –users –files, devices –service names (e.g. RPC) –port, process, group identifiers Attributes associated with names –Email address, login name, password –disk, block number –network address of server –...
31
November 2005Distributed systems: general services31 Name services Introduction Goal: –binding between name and attributes History –originally: quite simple problem scope: a single LAN naïve solution: all names + attributes in a single file –now: interconnected networks different administrative domains in a single name space
32
November 2005Distributed systems: general services32 Name services Introduction naming service separate from other services: –unification general naming scheme for different objects e.g. local + remote files in same scheme e.g. files + devices+ pipes in same scheme e.g. URL –integration scope of sharing difficult to predict merge different administrative domains requires extensible name space
33
November 2005Distributed systems: general services33 Name services Introduction General requirements: –scalable –long lifetime –high availability –fault isolation –tolerance of mistrust
34
November 2005Distributed systems: general services34 Name services Overview –Introduction –Basic service –Domain name system –Directory services
35
November 2005Distributed systems: general services35 Name services Basic service Operations: lookup (name, context) -> identifier register (name, context, identifier) delete (name, context) From a single server to multiple servers: –hierarchical names –efficiency –availability
36
November 2005Distributed systems: general services36 Name services Basic service Hierarchical name space –introduction of directories or subdomains /users/pv/edu/distri/services.ppt cs.kuleuven.ac.be –easier administration delegation of authority fewer name conflicts: names unique per directory unit of distribution: server per directory + single root server
37
November 2005Distributed systems: general services37 Name services Basic service Hierarchical name space –navigation process of locating naming data from more than one server iterative or recursive –result of lookup: attributes references to another name server
38
November 2005Distributed systems: general services38 Name services Basic service Figure: a name tree whose root contains src, users and sys; users contains pv and ann. Client P sends Lookup(“/users/pv/edu/…”,…) to the root server and receives a reference to the server holding “users”.
39
November 2005Distributed systems: general services39 Name services Basic service Figure (continued): P then sends Lookup(“pv/edu/…”,…) to the “users” server and receives a reference to the server holding “pv”; navigation continues until the attributes are found.
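The two figures above describe iterative navigation; a toy sketch of that process follows (server objects, names and the data layout are invented for illustration only):

```python
class NameServer:
    """Toy name server: each entry maps a first path component either to
    attributes (a leaf value) or to another NameServer (a referral)."""
    def __init__(self, entries):
        self.entries = entries

    def lookup(self, name):
        head, _, rest = name.strip("/").partition("/")
        result = self.entries[head]
        if isinstance(result, NameServer) and rest:
            return ("referral", result, rest)   # client must continue elsewhere
        return ("attributes", result, None)

def iterative_lookup(root, name):
    """Client-side iterative navigation: follow referrals until attributes found."""
    server, remaining = root, name
    while True:
        kind, value, rest = server.lookup(remaining)
        if kind == "attributes":
            return value
        server, remaining = value, rest

# Invented data loosely mirroring the figures above.
server_pv    = NameServer({"edu": "UFID-0042"})
server_users = NameServer({"pv": server_pv, "ann": NameServer({})})
root         = NameServer({"users": server_users, "src": NameServer({}), "sys": NameServer({})})

print(iterative_lookup(root, "/users/pv/edu"))   # -> UFID-0042
```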
40
November 2005Distributed systems: general services40 Name services Basic service Efficiency: caching –the hierarchy introduces multiple servers: more communication overhead, but also more processing capacity –caching at the client side: lookup results are cached, so fewer requests are sent and lookup performance is higher inconsistency? –data is rarely changed –a limit on validity is imposed
41
November 2005Distributed systems: general services41 Name services Basic service Availability: replication –at least two failure-independent servers –primary server: accepts updates –secondary server(s): get a copy of the primary's data and periodically check for updates at the primary –level of consistency?
42
November 2005Distributed systems: general services42 Name services Overview –Introduction –Basic service –Case study: Domain name system –Directory services
43
November 2005Distributed systems: general services43 Name services Domain name system DNS = Internet name service originally: an ftp-able file –not scalable –centralised administration a general name service for computers and domains –partitioning –delegation –replication and caching
44
November 2005Distributed systems: general services44 Name services Domain name system Domain name –hierarchy, e.g. nix.cs.kuleuven.ac.be read right to left: be = country (Belgium), ac = academic, kuleuven = the university (K.U.Leuven), cs = dept. of computer science, nix = name of the computer system
45
November 2005Distributed systems: general services45 Name services Domain name system 3 kinds of Top Level Domains (TLD) –2-letter country codes (ISO 3166) –generic names (similar organisations): com = commercial organisations, org = non-profit organisations (e.g. a vzw), int = international organisations (NATO, EU, …), net = network providers –USA-oriented names: edu = universities, gov = American government, mil = American army –new generic names: biz, info, name, aero, museum,...
46
November 2005Distributed systems: general services46 Name services Domain name system For each TLD: –administrator (assigns names within the domain) –“be”: DNS BE vzw (previously: the dept. of computer science) Each organisation with a name: –responsible for new (sub)names –e.g. cs.kuleuven.ac.be Hierarchical naming + delegation give a workable structure
47
November 2005Distributed systems: general services47 Name services Domain name system Name servers –root name server –server per (group of) subdomains –replication for high availability –caching for acceptable performance –time-to-live to limit inconsistencies
48
November 2005Distributed systems: general services48 Name services Domain name system Name server cs.kuleuven.ac.be (A = Address record); systems/subdomains of cs.kuleuven.ac.be: nix A 134.58.42.36, idefix A 134.58.41.7, droopy A 134.58.41.10, stevin A 134.58.41.16, ...
49
November 2005Distributed systems: general services49 Name services Domain name system Name server kuleuven.ac.be (NS = NameServer record); machines/subdomains of kuleuven.ac.be: cs NS 134.58.39.1, esat NS …, www A …, ...
50
November 2005Distributed systems: general services50 Name services Domain name system Example: resolving www.cs.vu.nl. The local name server (cs.kuleuven.ac.be) queries the root NS, the NS for nl, the NS for vu.nl and finally the NS for cs.vu.nl, obtains the address 130.37.24.11, and returns 130.37.24.11 to the client.
51
November 2005Distributed systems: general services51 Name services Domain name system Extensions: –mail host location used by electronic mail software requests where mail for a domain should be delivered –reverse resolution (IP address -> domain name) –host information Weak points: –security
52
November 2005Distributed systems: general services52 Name services Domain name system Good system design: –partitioning of data multiple servers –replication of servers high availability limited inconsistencies NO load balancing; NO server preference –caching acceptable performance limited inconsistencies
53
November 2005Distributed systems: general services53 Name services Overview –Introduction –Basic service –Case study: Domain name system –Directory services
54
November 2005Distributed systems: general services54 Name services Directory services Name service: exact name -> attributes Directory service: some attributes of an object -> all attributes Examples: –request: e-mail address of Janssen at KULeuven –result: list of names of all persons called Janssen –select the person and read the attributes
55
November 2005Distributed systems: general services55 Name services Directory services Directory information base –list of entries –each entry set of type/list-of-values pairs optional and mandatory pairs some pairs determine name –distributed implementation
56
November 2005Distributed systems: general services56 Name services Directory services X.500 –CCITT standard LDAP –Lightweight Directory Access Protocol –Internet standard
57
November 2005Distributed systems: general services57 This chapter: overview Introduction Time services Name services Distributed file systems Peer-to-peer systems
58
November 2005Distributed systems: general services58 Distributed file systems Overview –File service architecture –case study: NFS –case study AFS –comparison NFS <> AFS
59
November 2005Distributed systems: general services59 Distributed file systems File service architecture Definitions: –file –directory –file system cf. Operating systems
60
November 2005Distributed systems: general services60 Distributed file systems File service architecture Requirements: –addressed by most file systems: access transparency location transparency failure transparency performance transparency –related to distribution: concurrent updates hardware and operating system heterogeneity scalability
61
November 2005Distributed systems: general services61 Distributed file systems File service architecture Requirements (cont.) : –scalable to a very large number of nodes replication transparency migration transparency –future: support for fine-grained distribution of data tolerance to network partitioning
62
November 2005Distributed systems: general services62 Distributed file systems File service architecture File service components. Figure: a user program calls the client module's API; the client module talks across the network to the directory service (which maps names to UFIDs) and to the flat file service (accessed by UFID).
63
November 2005Distributed systems: general services63 Distributed file systems File service architecture Flat file service –file = data + attributes –data = sequence of items –operations: simple read & write –attributes: in flat file & directory service –UFID: unique file identifier
64
November 2005Distributed systems: general services64 Distributed file systems File service architecture Flat file service: operations –Read (FileID, Position, n) -> Data –Write (FileID, Position, Data) –Create () -> FileID –Delete (FileID) –GetAttributes (FileID) -> Attr –SetAttributes (FileID, Attr)
65
November 2005Distributed systems: general services65 Distributed file systems File service architecture Flat file service: fault tolerance –straightforward for simple servers –idempotent operations –stateless servers
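A sketch of why this interface is easy to make fault tolerant: every operation carries a UFID and an absolute position, so requests are idempotent and the server keeps no per-client state (names, signatures and storage below are illustrative only):

```python
import uuid

class FlatFileServer:
    """Stateless flat file service: every operation is self-contained and
    idempotent, so a client may simply repeat a request after a timeout."""
    def __init__(self):
        self.files = {}          # UFID -> bytearray (data); attributes omitted

    def create(self):
        ufid = str(uuid.uuid4())
        self.files[ufid] = bytearray()
        return ufid

    def read(self, ufid, position, n):
        return bytes(self.files[ufid][position:position + n])

    def write(self, ufid, position, data):
        f = self.files[ufid]
        f.extend(b"\0" * max(0, position + len(data) - len(f)))  # grow if needed
        f[position:position + len(data)] = data                  # absolute position

    def delete(self, ufid):
        self.files.pop(ufid, None)   # deleting twice has the same effect as once

server = FlatFileServer()
ufid = server.create()
server.write(ufid, 0, b"hello")
server.write(ufid, 0, b"hello")          # repeated request: no harm done
print(server.read(ufid, 0, 5))           # b'hello'
```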
66
November 2005Distributed systems: general services66 Distributed file systems File service architecture Directory service –translates a file name into a UFID –substitute for open –responsible for access control
67
November 2005Distributed systems: general services67 Distributed file systems File service architecture Directory service: operations –Lookup (Dir, Name, AccessMode, UserID) -> UFID –AddName (Dir, Name, FileID, UserID) –UnName (Dir, Name) –ReName (Dir, OldName, NewName) –GetNames (Dir, Pattern) -> list-of-names
68
November 2005Distributed systems: general services68 Distributed file systems File service architecture Implementation techniques: –known techniques from OS experience –remain important –distributed file service: comparable in performance reliability
69
November 2005Distributed systems: general services69 Distributed file systems File service architecture Implementation techniques: overview –file groups –space leaks –capabilities and access control –access modes –file representation –file location –group location –file addressing –caching
70
November 2005Distributed systems: general services70 Distributed file systems File service architecture Implementation techniques: file groups –(similar: file system, partition) = collection of files mounted on a server computer –unit of distribution over servers –transparent migration of file groups –once created, a file stays in its file group –UFID = file group identifier + file identifier
71
November 2005Distributed systems: general services71 Distributed file systems File service architecture Implementation techniques: space leaks –2 steps for creating a file: create an (empty) file and get a new UFID enter name + UFID in a directory –failure after step 1: the file exists in the file server but is unreachable (its UFID is not in any directory), so disk space is lost –detection requires co-operation between the file server and the directory server
72
November 2005Distributed systems: general services72 Distributed file systems File service architecture Implementation techniques: capabilities = digital key: access to a resource is granted on presentation of the capability –request to the directory server (file name + user id + mode of access) returns a UFID including the permitted access modes –construction of the UFID: –unique –encodes access –unforgeable
73
November 2005Distributed systems: general services73 Distributed file systems File service architecture Implementation techniques: capabilities. Figure: successive UFID designs: (1) file group id + file nr: reuse? (2) + random nr: access check? (3) + random nr + access bits: forgeable? (4) file group id + file nr + access bits + encrypted (access bits + random number).
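One way to obtain an unforgeable UFID of the last kind is to protect it with a keyed hash (here HMAC) over the file identifier, access bits and a random number, using a key known only to the servers; the sketch below illustrates the idea and is not the historical construction:

```python
import hashlib
import hmac
import os

SERVER_KEY = os.urandom(32)   # secret shared by file server and directory server

def make_capability(file_group_id, file_nr, access_bits):
    """Issued by the directory server after checking the user's rights."""
    random_nr = os.urandom(8).hex()
    body = f"{file_group_id}:{file_nr}:{access_bits}:{random_nr}"
    tag = hmac.new(SERVER_KEY, body.encode(), hashlib.sha256).hexdigest()
    return f"{body}:{tag}"

def check_capability(capability, requested_access):
    """Checked by the file server on every request; forging a capability
    would require guessing the HMAC tag."""
    body, _, tag = capability.rpartition(":")
    expected = hmac.new(SERVER_KEY, body.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(tag, expected):
        return False
    access_bits = body.split(":")[2]
    return requested_access in access_bits

cap = make_capability("FG-7", 4711, "r")
print(check_capability(cap, "r"), check_capability(cap, "w"))   # True False
```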
74
November 2005Distributed systems: general services74 Distributed file systems File service architecture Implementation techniques: file location –from the UFID to the location of the file server –use of a replicated group location database: file group id -> PortId –why replication? –why is the location not encoded in the UFID?
75
November 2005Distributed systems: general services75 Distributed file systems File service architecture Implementation techniques: caching –server cache: reduce delay for disk I/O selecting blocks for release coherence: –dirty flags –write-through caching –client cache: reduce network delay always use write-through synchronisation problems with multiple caches
76
November 2005Distributed systems: general services76 Distributed file systems Overview –File service model –case study: Network File System -- NFS –case study: AFS –comparison NFS <> AFS
77
November 2005Distributed systems: general services77 Distributed file systems NFS Background and aims –first file service product –emulate UNIX file system interface –de facto standard key interfaces published in public domain source code available for reference implementation –supports diskless workstations not important anymore
78
November 2005Distributed systems: general services78 Distributed file systems NFS Design characteristics –client and server modules can be in any node Large installations include a few servers –Clients: On Unix: emulation of standard UNIX file system for MS/DOS, Windows, Apple, … –Integrated file and directory service –Integration of remote file systems in a local one: mount remote mount
79
November 2005Distributed systems: general services79 Distributed file systems NFS: configuration Unix mount system call –each disk partition contains a hierarchical FS –how to integrate? either name the partitions (e.g. a:/usr/students/john) or glue the partitions together –invisible for the user –partitions remain useful for system managers
80
November 2005Distributed systems: general services80 Distributed file systems NFS: configuration Unix mount system call. Figure: the root partition holds / with vmunix, usr, ..., and /usr contains students, staff, x; partition c holds its own root with pierre and ann (ann containing network, proje...). Mounting the root of partition c on the directory staff makes a path such as /usr/staff/ann/network resolve into partition c.
81
November 2005Distributed systems: general services81 Distributed file systems NFS: configuration Remote mount. Figure: the client's root partition holds / with vmunix, usr, ..., and /usr contains students, staff, x; Server 1 exports a file system whose users directory contains ann and pierre. Remote-mounting the server's users directory on the client's directory staff makes ann's files accessible as /usr/staff/ann/...
82
November 2005Distributed systems: general services82 Distributed file systems NFS: configuration Mount service on server –enables clients to integrate (part of) remote file system in the local name space –exported file systems in /etc/exports + access list (= hosts permitted to mount; secure?) On client side –file systems to mount enumerated in /etc/rc –typically mount at start up time
83
November 2005Distributed systems: general services83 Distributed file systems NFS: configuration Mounting semantics –hard: the client waits until a request for a remote file eventually succeeds, possibly forever –soft: a failure is returned if a request does not succeed after n retries; this breaks Unix failure semantics
84
November 2005Distributed systems: general services84 Distributed file systems NFS: configuration Automounter –principle: empty mount points on clients mount on first request for remote file –acts as a server for a local client gets references to empty mount points maps mount points to remote file systems referenced file system mounted on mount point via a symbolic link, to avoid redundant requests to automounter
85
November 2005Distributed systems: general services85 Distributed file systems NFS: implementation In UNIX: client and server modules implemented in kernel virtual file system: –internal key interface, based on file handles for remote files
86
November 2005Distributed systems: general services86 Distributed file systems NFS: implementation Figure: on the client computer, a user process issues system calls to the UNIX kernel; the virtual file system passes local requests to the UNIX file system and remote requests to the NFS client, which communicates over the network (NFS protocol) with the NFS server in the kernel of the server computer, where the virtual file system and UNIX file system handle the actual files.
87
November 2005Distributed systems: general services87 Distributed file systems NFS: implementation Virtual file system –Added to UNIX kernel to make distinction Local files Remote files –File handles: file ID in NFS Base: inode number –File ID in partition on UNIX system Extended with: –File system identifier –inode generation number (to enable reuse)
88
November 2005Distributed systems: general services88 Distributed file systems NFS: implementation Client integration: –NFS client module integrated in the kernel offers the standard UNIX interface no client recompilation/reloading single client module for all user-level processes encryption can be done in the kernel Server integration in the kernel –only for performance reasons –a user-level server achieves about 80% of the performance of the kernel-level version
89
November 2005Distributed systems: general services89 Distributed file systems NFS: implementation Directory service –name resolution co-ordinated in client –step-by-step process for multi-part file names –mapping tables in server: high overhead reduced by caching Access control and authentication –based on UNIX user ID and group ID –included and checked for every NFS request –secure NFS 4.0 thanks to use of DES encryption
90
November 2005Distributed systems: general services90 Distributed file systems NFS: implementation Caching –Unix caching based on disk blocks delayed write (why?) read ahead (why?) periodically sync to flush buffers in cache –Caching in NFS Server caching Client caching
91
November 2005Distributed systems: general services91 Distributed file systems NFS: implementation Server caching in NFS –based on standard UNIX caching; 2 modes –write-through (instead of delayed write): good failure semantics, poorer performance –delayed write: data is stored in the cache until a commit operation is received –a close on the client triggers a commit operation on the server failure semantics? better performance
92
November 2005Distributed systems: general services92 Distributed file systems NFS: implementation Client caching –cached are results of read, write, getattr, lookup, readdir –problem: multiple copies of same data at different NFS clients –NFS clients use read-ahead and delayed write
93
November 2005Distributed systems: general services93 Distributed file systems NFS: implementation Client caching (cont.) –handling of writes block of file is fetched and updated changed block is marked as dirty dirty pages of files are flushed to server asynchronously –on close of file –sync operation on client –by bio-daemon (when block is filled) dirty pages of directories are flushed to server –by bio-daemon without further delay
94
November 2005Distributed systems: general services94 Distributed file systems NFS: implementation Client caching (cont.) –consistency checks based on time-stamps indicating last modification of file on server validation checks –when file is opened –when a new block is fetched –assumed to remain valid for a fixed time (3 sec for file, 30 sec for directory) –next operation causes another check costly procedure
95
November 2005Distributed systems: general services95 Distributed file systems NFS: implementation Caching –A cached entry is valid if (T - Tc) < t or (Tm_client = Tm_server) –T: current time –Tc: time when the cache entry was last validated –t: freshness interval (3..30 secs in Solaris) –Tm: time when the block was last modified at the client / at the server –consistency level acceptable: most UNIX applications do not depend critically on synchronisation of file updates
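The validity condition on this slide translates almost directly into code; the sketch below assumes simple attribute fields and a getattr round trip to the server only when the freshness interval has expired:

```python
import time

FRESHNESS_T = {"file": 3.0, "directory": 30.0}     # Solaris uses 3..30 s

class CacheEntry:
    def __init__(self, kind, tm_server, data):
        self.kind = kind            # "file" or "directory"
        self.data = data
        self.tm_client = tm_server  # last modification time as known by client
        self.tc = time.time()       # time this entry was last validated

def is_valid(entry, getattr_from_server):
    """(T - Tc) < t  or  Tm_client == Tm_server."""
    now = time.time()
    if now - entry.tc < FRESHNESS_T[entry.kind]:
        return True                           # fresh enough, no server contact
    tm_server = getattr_from_server()         # costly: one getattr RPC
    entry.tc = now
    return entry.tm_client == tm_server       # stale entries must be re-fetched

entry = CacheEntry("file", tm_server=1000.0, data=b"...")
print(is_valid(entry, getattr_from_server=lambda: 1000.0))   # True (still fresh)
```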
96
November 2005Distributed systems: general services96 Distributed file systems NFS: implementation Performance –reasonable performance: remote files on a fast disk are better than local files on a slow disk; RPC packets are 9 Kb to contain 8 Kb disk blocks –the lookup operation accounts for about 50% of server calls –Drawbacks: frequent getattr calls for time-stamps (cache validation) poor performance of (relatively infrequent) writes (because of write-through)
97
November 2005Distributed systems: general services97 Distributed file systems Overview –File service model –case study: NFS –case study: Andrew File System -- AFS –comparison NFS <> AFS
98
November 2005Distributed systems: general services98 Distributed file systems AFS Background and aims –Base: observation of UNIX file systems files are small ( < 10 Kb) read is more common than write sequential access is common, random access is not most files are not shared shared files are often modified by one user file references come in bursts –Aim: combine best of personal computers and time-sharing systems
99
November 2005Distributed systems: general services99 Distributed file systems AFS Background and aims (cont.) –Assumptions about environment secured file servers public workstations workstations with local disk no private files on local disk –Key targets: scalability and security CMU 1991: 800 workstations, 40 servers
100
November 2005Distributed systems: general services100 Distributed file systems AFS Design characteristics –whole-file serving –whole-file caching entire files are transmitted, not blocks –client cache realised on local disk (relatively large) lower number of open requests on the network –separation between file and directory service
101
November 2005Distributed systems: general services101 Distributed file systems AFS Configuration –single (global) name space –local files temporary files system files for start-up –volume = unit of configuration and management –each server maintains a replicated location database (volume-server mappings)
102
November 2005Distributed systems: general services102 Distributed file systems AFS File name space. Figure: the local name space holds / with bin, afs and tmp; the directory afs is the root of the shared file system, whose own root contains users (with ann, pierre, ...) and bin; local directories such as bin can be symbolic links into the shared space.
103
November 2005Distributed systems: general services103 Distributed file systems AFS Implementation –Vice = file server secured systems, controlled by system management understands only file identifiers runs in user space –Venus = user/client software runs on workstations workstations keep autonomy implements directory services –kernel modifications for open and close
104
November 2005Distributed systems: general services104 Distributed file systems AFS Figure: each workstation runs user programs, the Unix kernel and Venus; each server runs Vice on top of the Unix kernel.
105
November 2005Distributed systems: general services105 Distributed file systems AFS Figure: within a workstation, a user program issues Unix file system calls to the kernel; local operations go to the Unix file system, while non-local file operations are passed to Venus.
106
November 2005Distributed systems: general services106 Distributed file systems AFS Implementation of file system calls –open, close: UNIX kernel of the workstation -> Venus (on the workstation) -> Vice (on the server) –read, write: UNIX kernel of the workstation only
107
November 2005Distributed systems: general services107 place copy of file in local file system and store FileName open local file and return descriptor to user process Distributed file systems AFS Open system call open( FileName,…) if FileName in shared space /afs then pass request to Venus kernel if file not present in local cache OR file present with invalid callback then pass request to Vice server venus network transfer copy of file and valid callback vice
108
November 2005Distributed systems: general services108 Distributed file systems AFS Read and Write system calls: read(FileDescriptor,…), write(FileDescriptor,…) –kernel: perform a normal UNIX read or write operation on the local copy of the file
109
November 2005Distributed systems: general services109 Distributed file systems AFS Close system call: close(FileDescriptor,…) –kernel: close the local copy and inform Venus about the close –Venus: if the local copy has changed, send the copy to the Vice server over the network –Vice: store the copy of the file and send a callback to all other clients holding a callback promise
110
November 2005Distributed systems: general services110 Distributed file systems AFS Caching –callback principle service supplies callback promise at open promise is stored with file in client cache –callback promise can be valid or cancelled initially valid server sends message to cancel callback promise –to all clients that cache the file –whenever file is updated
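A sketch of the callback-promise bookkeeping on the server side (method names and the client notification mechanism are invented; real Vice servers do this per file with RPCs):

```python
class ViceServer:
    """Tracks which clients hold a callback promise for each file."""
    def __init__(self):
        self.files = {}       # fid -> file contents
        self.promises = {}    # fid -> set of client ids holding a valid promise

    def fetch(self, fid, client):
        """Called on a client cache miss: return the file plus a callback promise."""
        self.promises.setdefault(fid, set()).add(client)
        return self.files[fid]

    def store(self, fid, data, client, notify):
        """Called when a client closes a modified file."""
        self.files[fid] = data
        # Cancel the callback promise at every *other* client caching the file.
        for other in self.promises.get(fid, set()) - {client}:
            notify(other, fid)               # "callback": promise now cancelled
        self.promises[fid] = {client}

server = ViceServer()
server.files["f1"] = b"v1"
server.fetch("f1", "ws-A"); server.fetch("f1", "ws-B")
server.store("f1", b"v2", "ws-A", notify=lambda c, f: print(f"cancel {f} at {c}"))
```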
111
November 2005Distributed systems: general services111 Distributed file systems AFS Caching maintenance –when client workstation reboots cache validation necessary because of missed messages cache validation requests are sent for each valid promise –valid callback promises are renewed on open when no communication has occurred with the server during a period T
112
November 2005Distributed systems: general services112 Distributed file systems AFS Update semantics –guarantee after a successful open of file F on server S: latest(F, S, 0) or (lostCallback(S, T) and inCache(F) and latest(F, S, T)) –no other concurrency control: 2 copies can be updated at different workstations; all updates except those from the last close are (silently) lost, unlike normal UNIX operation
113
November 2005Distributed systems: general services113 Distributed file systems AFS Performance: impressive compared with NFS –benchmark: load on the server 40% for AFS vs 100% for NFS –whole-file caching: reduction of the load on servers; minimises the effect of network latency –read-only volumes are replicated (master copy for occasional updates) Andrew is optimised for a specific pattern of use!!
114
November 2005Distributed systems: general services114 Distributed file systems Overview –File service model –case study: NFS –case study AFS –comparison NFS <> AFS
115
November 2005Distributed systems: general services115 Distributed file systems NFS <>AFS Access transparency –both in NFS and AFS –Unix file system interface is offered Location transparency –uniform view on shared files in AFS –in NFS mounting freedom same view possible if same mounting; discipline!
116
November 2005Distributed systems: general services116 Distributed file systems NFS <>AFS Failure transparency –NFS no state of clients stored in servers idempotent operations transparency limited by soft mounting –AFS state about clients stored in servers cache maintenance protocol handles server crashes limitations?
117
November 2005Distributed systems: general services117 Distributed file systems NFS <>AFS Performance transparency –NFS acceptable performance degradation –AFS only delay for first open operation on file better than NFS for small files
118
November 2005Distributed systems: general services118 Distributed file systems NFS <>AFS Migration transparency –limited: update of locations required –NFS update configuration files on all clients –AFS update of replicated database
119
November 2005Distributed systems: general services119 Distributed file systems NFS <>AFS Replication transparency –NFS not supported –AFS limited support; for read-only volumes one master for exceptional updates; manual procedure for propagating changes to other volumes
120
November 2005Distributed systems: general services120 Distributed file systems NFS <>AFS Concurrency transparency –not supported (in UNIX) Scalability transparency –AFS better than NFS
121
November 2005Distributed systems: general services121 Distributed file systems Overview –File service architecture –case study: NFS –case study AFS –comparison NFS <> AFS
122
November 2005Distributed systems: general services122 Distributed file systems Conclusions –Standards <> quality –evolution to standards and common key interface AFS-3 incorporates Kerberos, vnode interface, large messages for file blocks (64Kb) –network (mainly access) transparency causes inheritance of weakness: e.g. no concurrency control –evolution mainly performance driven
123
November 2005Distributed systems: general services123 Distributed file systems Good system design: –partitioning different storage volumes –replication of servers high availability limited inconsistencies load balancing? server preference? –caching acceptable performance limited inconsistencies
124
November 2005Distributed systems: general services124 This chapter: overview Introduction Time services Name services Distributed file systems Peer-to-peer systems
125
November 2005Distributed systems: general services125 Peer-to-peer systems Overview –Introduction –Napster –Middleware –Routing algorithms –OceanStore –Pond file store
126
November 2005Distributed systems: general services126 Peer-to-peer systems Introduction Definition 1: –Support useful distributed services –Using data and computer resources in the PCs and workstations available on the Internet Definition 2: –Applications exploiting resources available at the edges of the Internet
127
November 2005Distributed systems: general services127 Peer-to-peer systems Introduction Characteristics: –Each user contributes resources to the system –All nodes have the same functional capabilities and responsibilities –Correct operation does not depend on the existence of any centrally administered systems –Can offer a limited degree of anonymity –Efficient algorithms for the placement of data across many hosts and subsequent access –Efficient algorithms for load balancing and availability of data, even though the availability of participating computers is unpredictable
128
November 2005Distributed systems: general services128 Peer-to-peer systems Introduction 3 generations of peer-to-peer systems and application development: –Napster: music exchange service –File-sharing applications offering greater scalability, anonymity and fault tolerance: Freenet, Gnutella, Kazaa –Middleware for the application-independent management of distributed resources on a global scale: Pastry, Tapestry,…
129
November 2005Distributed systems: general services129 Peer-to-peer systems Napster Download digital music files Architecture: –Centralised indices –Files stored and accessed on PCs Method of operation (next slide)
130
November 2005Distributed systems: general services130 Peer-to-peer systems Napster Figure: Napster's method of operation: a client queries the central index, receives the locations of peers holding the file, and downloads the file from one of those peers; in step 5, the requester is expected to share its own files in turn!
131
November 2005Distributed systems: general services131 Peer-to-peer systems Napster Conclusion: –Feasibility demonstrated –Simple load balancing techniques –Replicated unified index of all available music files –No updates of files –Availability not a hard concern –Legal issues: anonymity
132
November 2005Distributed systems: general services132 Peer-to-peer systems Overview –Introduction –Napster –Middleware –Routing algorithms –OceanStore –Pond file store
133
November 2005Distributed systems: general services133 Peer-to-peer systems middleware Goal: –Automatic placement –Subsequent location of distributed objects Functional requirements: –Locate and communicate with any resource –Add and remove resources –Add and remove hosts –API independent of the type of resource
134
November 2005Distributed systems: general services134 Peer-to-peer systems middleware Non-functional requirements: –Global scalability (millions of object on hundreds of thousands of hosts) –Optimization for local interactions between neighbouring peers –Accommodating to highly dynamic host availability –Security of data in an environment with heterogeneous trust –Anonymity, deniability and resistance to censorship
135
November 2005Distributed systems: general services135 Peer-to-peer systems middleware Approach: –Knowledge of the locations of objects is partitioned, distributed and replicated (e.g. 16-fold) throughout the network = Routing overlay: a distributed algorithm for locating objects and nodes
136
November 2005Distributed systems: general services136 Peer-to-peer systems middleware
137
November 2005Distributed systems: general services137 Peer-to-peer systems middleware Resource identification: –GUID = globally unique identifier –a secure hash computed from all of the object's state makes the GUID self-certifying, but only for immutable objects –otherwise the hash is computed from part of the state, e.g. name + …
138
November 2005Distributed systems: general services138 Peer-to-peer systems middleware API for a DHT in Pastry DHT = distributed hash table put(GUID, data) The data is stored in replicas at all nodes responsible for the object identified by GUID. remove(GUID) Deletes all references to GUID and the associated data. value = get(GUID) The data associated with GUID is retrieved from one of the nodes responsible for it.
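A local, in-memory stand-in for this API can make the contract concrete; real Pastry replicates each (GUID, data) item at the nodes whose GUIDs are numerically closest, which the dictionary below obviously does not:

```python
import hashlib

class ToyDHT:
    """Single-process imitation of the put/get/remove DHT abstraction."""
    def __init__(self):
        self.store = {}

    @staticmethod
    def guid(data: bytes) -> str:
        # GUIDs are commonly derived from a secure hash of the object's state.
        return hashlib.sha1(data).hexdigest()

    def put(self, guid, data):
        self.store[guid] = data      # real DHT: stored at all responsible nodes

    def get(self, guid):
        return self.store[guid]      # real DHT: fetched from one responsible node

    def remove(self, guid):
        self.store.pop(guid, None)

dht = ToyDHT()
g = ToyDHT.guid(b"hello world")
dht.put(g, b"hello world")
print(dht.get(g))
```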
139
November 2005Distributed systems: general services139 Peer-to-peer systems middleware API for a DOLR in Tapestry DOLR = distributed object location and routing publish(GUID) This function makes the node performing a publish operation the host for the object corresponding to GUID. unpublish(GUID) Makes the object corresponding to GUID inaccessible. sendToObj(msg, GUID, [n]) an invocation message is sent to an object in order to access it. The optional parameter [n], requests the delivery of the same message to n replicas of the object.
140
November 2005Distributed systems: general services140 Peer-to-peer systems Overview –Introduction –Napster –Middleware –Routing algorithms –OceanStore –Pond file store
141
November 2005Distributed systems: general services141 Peer-to-peer systems Routing algorithms prefix routing –Used in Pastry, Tapestry GUID: 128-bit –Host: secure hash on public key –Object: secure hash on name or part of state N participating nodes –O(log N) steps to route a message to any GUID –O(log N) messages to integrate a new node
142
November 2005Distributed systems: general services142 Peer-to-peer systems Routing algorithms Simplified algorithm: circular routing –Leaf set in each active node –Element in leaf set: GUID and IP address of the nodes whose GUIDs are numerically closest on either side of the node's own GUID –Size L = 2*l –Leaf sets are updated when nodes join or leave
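The simplified algorithm can be sketched in a few lines: if the destination GUID falls within the current node's leaf set, deliver at the member numerically closest to it; otherwise forward to the leaf that makes the most progress. For simplicity the sketch uses small integer GUIDs laid out on a line rather than the real 128-bit circular space:

```python
def route_circular(leaf_sets, start, dest):
    """Simplified Pastry-style routing over integer GUIDs on a line (no wrap-around).

    leaf_sets maps each live GUID to its l numerically nearest neighbours on
    each side.  Deliver when dest lies within the current leaf set's range;
    otherwise forward to the neighbour closest to dest.
    """
    hops, current = [start], start
    while True:
        neighbourhood = leaf_sets[current] + [current]
        best = min(neighbourhood, key=lambda g: abs(g - dest))
        if best != current:
            hops.append(best)
        if best == current or min(leaf_sets[current]) <= dest <= max(leaf_sets[current]):
            return hops
        current = best

# Invented example: live nodes 0, 5, ..., 95 with a leaf set of 2 per side.
nodes = list(range(0, 100, 5))
leaf_sets = {g: [n for n in nodes if n != g and abs(nodes.index(n) - nodes.index(g)) <= 2]
             for g in nodes}
print(route_circular(leaf_sets, start=0, dest=63))   # -> [0, 10, 20, ..., 60, 65]
```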
143
November 2005Distributed systems: general services143 Peer-to-peer systems Routing algorithms Circular space for GUIDs Dot = live node Leaf size = 8 Shown: routing of a message from node 65A1FC to D46A1C
144
November 2005Distributed systems: general services144 Peer-to-peer systems Routing algorithms Full algorithm (Pastry) –Tree-structured routing table GUID – IP pairs spread throughout entire range of GUIDs Increased density of coverage for GUIDs numerically close to GUID of local node GUIDs viewed as hexadecimal values Table classifies GUIDs based on hexadecimal prefixes As many rows as hexadecimal digits
145
November 2005Distributed systems: general services145 Peer-to-peer systems Routing algorithms Full algorithm (Pastry)
146
November 2005Distributed systems: general services146 Peer-to-peer systems Routing algorithms Routing a message from 65A1FC to D46A1C Well-populated table: ~log16(N) hops
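The prefix-matching step of the full algorithm, forwarding each time to a node whose GUID shares at least one more leading hexadecimal digit with the destination, can be illustrated with a toy routing table keyed by (shared-prefix length, next digit); the table contents and intermediate GUIDs below are illustrative only:

```python
def shared_prefix_len(a: str, b: str) -> int:
    n = 0
    while n < len(a) and a[n] == b[n]:
        n += 1
    return n

def route_prefix(routing_tables, start, dest):
    """routing_tables: guid -> {(row, next_digit): guid of a known node}."""
    current, hops = start, [start]
    while current != dest:
        p = shared_prefix_len(current, dest)
        nxt = routing_tables[current].get((p, dest[p]))
        if nxt is None:
            break                  # would fall back to the leaf set in real Pastry
        hops.append(nxt)
        current = nxt
    return hops

# Invented 6-digit hexadecimal GUIDs, mimicking the 65A1FC -> D46A1C example.
routing_tables = {
    "65A1FC": {(0, "D"): "D13DA3"},
    "D13DA3": {(1, "4"): "D4213F"},
    "D4213F": {(2, "6"): "D462BA"},
    "D462BA": {(3, "A"): "D46A1C"},
}
print(route_prefix(routing_tables, "65A1FC", "D46A1C"))
```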
147
November 2005Distributed systems: general services147 Peer-to-peer systems Routing algorithms Full algorithm (Pastry)
148
November 2005Distributed systems: general services148 Peer-to-peer systems Routing algorithms Full algorithm (Pastry): host integration –New node X: computes GUID_X X should know at least one nearby Pastry node A with GUID_A X sends a special join message to A (destination GUID_X) –A despatches the join message via Pastry Route of the message: A, B, …, Z (GUID_Z numerically closest to GUID_X) –Each node on the route sends the relevant part of its routing table and leaf set to X
149
November 2005Distributed systems: general services149 Peer-to-peer systems Routing algorithms Full algorithm (Pastry): host integration –Route of message: A, B, …, Z –Routing table of X: row 0 of A -> row 0 of X; B and X share a prefix of i hexadecimal digits, so row i of B -> row i of X; … Choice for an entry? proximity neighbour selection algorithm (metric: IP hops, measured latency,…) –Leaf set of Z -> leaf set of X –X sends its leaf set and routing table to all nodes in its routing table and leaf set
150
November 2005Distributed systems: general services150 Peer-to-peer systems Routing algorithms Full algorithm (Pastry): host failure –Repair actions to update leafsets: leaf set requested from node(s) close to failed node –Routing table: repairs on a “when discovered” basis
151
November 2005Distributed systems: general services151 Peer-to-peer systems Routing algorithms Full algorithm (Pastry): Fault tolerance –Are all nodes in the leaf set alive? Each node sends heartbeat messages to the nodes in its leaf set –Malicious nodes? Clients can use an at-least-once delivery mechanism Use limited randomness in node selection in the routing algorithm
152
November 2005Distributed systems: general services152 Peer-to-peer systems Routing algorithms Full algorithm (Pastry): dependability –Include acks in the routing algorithm: no ack -> route via an alternative node and treat the silent node as a suspected failure –Heartbeat messages –Suspected failed nodes in routing tables are probed; if they fail, alternative nodes are used –Simple gossip protocol to exchange routing table info between nodes
153
November 2005Distributed systems: general services153 Peer-to-peer systems Routing algorithms Full algorithm (Pastry): evaluation –Message loss –Performance: Relative Delay Penalty (RDP) = Pastry delivery time / IP/UDP delivery time Results for 100 000 messages:
IP loss   Pastry loss   Pastry wrong node   RDP
0%        1.5%          0                   1.8
5%        3.3%          1.6%                2.2
154
November 2005Distributed systems: general services154 Peer-to-peer systems Overview –Introduction –Napster –Middleware –Routing algorithms –OceanStore –Pond file store
155
November 2005Distributed systems: general services155 Distributed file systems OceanStore Objectives: –Very large scale, incrementally-scalable, persistent storage facility –Mutable data objects with long-term persistence and reliability –Environment of network and computing resources constantly changing Intended use Mechanisms used
156
November 2005Distributed systems: general services156 Distributed file systems OceanStore Intended use: –NFS-like file system –Electronic mail hosting Mechanism needed: –Consistency between replicas Tailored to application needs by a Bayou like system –Privacy and integrity Encryption of data Byzantine agreement protocol for updates
157
November 2005Distributed systems: general services157 Distributed file systems OceanStore Storage organization. Figure: the AGUID refers via a signed certificate to the VGUID of the current version; a version's root block points through indirection blocks to its data blocks (d1 … d5), each identified by a BGUID, and also records the VGUID of version i-1; version i+1 is built copy-on-write, re-using the unchanged blocks of version i (here d1 and d3, with d2 replaced).
158
November 2005Distributed systems: general services158 Distributed file systems OceanStore Storage organization (cont) –BGUID (block GUID): secure hash of a data block –VGUID (version GUID): BGUID of the root block of a version –AGUID (active GUID): uniquely identifies all the versions of an object
159
November 2005Distributed systems: general services159 Distributed file systems OceanStore Storage organization (cont) –New object -> new AGUID a small set of hosts acts as the inner ring and publishes the AGUID the AGUID refers to a signed certificate recording the sequence of versions of the object –Update of an object Byzantine agreement between the hosts in the inner ring (primary copy) result disseminated to the secondary replicas
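A sketch of the copy-on-write versioning idea behind this storage organisation: blocks are identified by a hash of their contents (BGUID), a version's root block lists its data-block GUIDs plus the previous version's VGUID, and a new version automatically re-uses the GUIDs of unchanged blocks. This illustrates the principle only, not Pond's actual data structures:

```python
import hashlib
import json

store = {}                                   # GUID -> bytes (the "network" store)

def put(data: bytes) -> str:
    guid = hashlib.sha1(data).hexdigest()    # BGUID / VGUID = secure hash of block
    store[guid] = data
    return guid

def new_version(blocks, prev_vguid=None):
    """Store data blocks and a root block; return the version's VGUID."""
    bguids = [put(b) for b in blocks]        # unchanged blocks hash to the same GUID
    root = json.dumps({"blocks": bguids, "prev": prev_vguid}).encode()
    return put(root)

v1 = new_version([b"d1", b"d2", b"d3"])
v2 = new_version([b"d1", b"d2 (edited)", b"d3"], prev_vguid=v1)   # copy on write
shared = set(json.loads(store[v1])["blocks"]) & set(json.loads(store[v2])["blocks"])
print(f"version 2 shares {len(shared)} of 3 data blocks with version 1")
```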
160
November 2005Distributed systems: general services160 Distributed file systems OceanStore Performance (per benchmark phase):
Phase   LAN Linux NFS   LAN Pond   WAN Linux NFS   WAN Pond   Predominant operations
1       0.0             1.9        0.9             2.8        Read and write
2       0.3             11.0       9.4             16.8       Read and write
3       1.1             1.8        8.3             1.8        Read
4       0.5             1.5        6.9             1.5        Read
5       2.6             21.0       21.5            32.0       Read and write
Total   4.5             37.2       47.0            54.9
161
November 2005Distributed systems: general services161 Distributed file systems OceanStore Performance (cont) –Benchmarks: Create subdirectories recursively Copy a source tree Examine the status of all files in the tree Examine every byte of data in all files Compile and link the files
162
November 2005Distributed systems: general services162 This chapter: overview Introduction Time services Name services Distributed file systems Peer-to-peer systems
163
November 2005Distributed systems: general services163 Distributed Systems: General Services