ScaLAB seminar 21st October Intrinsic References in Distributed Systems Presented by: Nimish Pachapurkar
Snapshot:
- Compare and contrast intrinsic references with physical references.
- Storage and retrieval mechanism using intrinsic references: Elephant Store.
- Use of intrinsic references in hierarchical data structures.

Terminology:
- Collision resistance: It is extremely difficult to find two sequences with the same hash. This implies that the hash is unique (sufficiently so…).
- One-way hash: Given the hash of a sequence, it is difficult to reconstruct the sequence.
- For intrinsic references: reference => hash, referent => byte sequence. (Other reference/referent pairs: memory addresses and data, URLs and web pages, etc.)
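As a minimal sketch of the terminology above, the reference for a byte sequence can be computed with any collision-resistant one-way hash. SHA-256 and the helper name `intrinsic_reference` are assumptions made for illustration; the slides do not name a specific hash function.

```python
import hashlib

def intrinsic_reference(referent: bytes) -> str:
    """Return the hex digest used as the reference for a byte sequence.

    SHA-256 is an assumed choice of collision-resistant one-way hash.
    """
    return hashlib.sha256(referent).hexdigest()

page_v1 = b"<html>hello</html>"
page_v2 = b"<html>hello!</html>"

ref_v1 = intrinsic_reference(page_v1)
ref_v2 = intrinsic_reference(page_v2)

# The same bytes always give the same reference; different bytes give a different one.
assert ref_v1 == intrinsic_reference(b"<html>hello</html>")
assert ref_v1 != ref_v2
```

Note that nothing in the computation depends on where the bytes are stored, which is the property the next slide calls state independence.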
Physical References:
- The relationship between reference and referent is defined by the state of the physical system.
- A change in that state changes the referent.
- All accesses to the referent have to go through the system, which makes it a bottleneck and a potential point of failure.

Intrinsic References:
- The reference is a collision-resistant (unique), one-way hash value of the referent.
- State independence: The relationship between a byte sequence S and its reference R depends only on the hash function.
- Uniqueness: A given R refers only to the particular S from which it was obtained.
- Physical storage is still required to store and retrieve the referents.
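State independence means a client can accept the referent from any replica and check it locally. The sketch below assumes replicas are plain dictionaries and uses SHA-256; `fetch_verified` and the replica names are hypothetical and only illustrate the idea.

```python
import hashlib

def digest(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

def fetch_verified(reference: str, replicas) -> bytes:
    """Return the referent from the first replica whose copy hashes to the reference."""
    for replica in replicas:
        candidate = replica.get(reference)
        # Re-hashing the bytes is the only check needed; no replica has to be trusted.
        if candidate is not None and digest(candidate) == reference:
            return candidate
    raise KeyError(reference)

data = b"seminar notes"
ref = digest(data)

empty_replica = {}                          # does not hold the data
corrupt_replica = {ref: b"seminar n0tes"}   # holds a damaged copy
good_replica = {ref: data}

result = fetch_verified(ref, [empty_replica, corrupt_replica, good_replica])
assert result == data
```

With a physical reference such as a URL, the client would have no comparable way to tell the damaged copy from the good one.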
Intrinsic References and Distributed Storage:
- Useful for distributed, replicated storage mechanisms.
- No reference-referent inconsistency (the hash gives the reference).
- Simple hashing can check the correctness of the data.

Opaque Storage:
- Used for storing an instance of a data structure in Elephant Store.
- Serialize the data structure and store the byte sequence.
- Called an OPAQUE representation because the data structure is hidden behind the byte sequence.
- The hash of the sequence is the reference (digest).
- Retrieval: Retrieve the byte sequence from the store and de-serialize it.

[Figure labels: Data Structure; Serialization (makes the structure opaque); Opaque Reference (hash digest)]
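The store and retrieve steps for opaque storage can be sketched as follows. This is not Elephant Store's actual API: the class `OpaqueStore`, the use of JSON for serialization, and SHA-256 for the digest are all assumptions made for illustration.

```python
import hashlib
import json

class OpaqueStore:
    """Toy in-memory store keyed by the digest of the serialized structure."""

    def __init__(self):
        self._blobs = {}

    def put(self, structure) -> str:
        # Serialization hides the structure behind a byte sequence.
        # sort_keys makes equal structures serialize to equal bytes.
        blob = json.dumps(structure, sort_keys=True).encode("utf-8")
        reference = hashlib.sha256(blob).hexdigest()
        self._blobs[reference] = blob
        return reference

    def get(self, reference: str):
        # Retrieval: fetch the byte sequence, then de-serialize it.
        blob = self._blobs[reference]
        return json.loads(blob.decode("utf-8"))

store = OpaqueStore()
mailbox = {"inbox": ["msg-1", "msg-2"], "sent": []}
ref = store.put(mailbox)
assert store.get(ref) == mailbox
```

Storing the same structure twice yields the same reference, so the store never holds two inconsistent copies under one key.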
HDAGs:
- HDAG: hash-based directed acyclic graph.
- Nodes are directories; arcs are directory/sub-directory relationships.
- The root digest of a rooted HDAG is used as the intrinsic reference to the whole HDAG.
- Application: Can be used to represent a file system or a mail system.
- The root digest uniquely represents the state of the whole directory structure, not just the root directory, because each directory node contains the digests of its children.
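A minimal sketch of how a root digest can cover a whole directory tree, assuming each directory node is encoded as a sorted list of "name digest" lines and files are stored as leaf nodes. The encoding, the function `store_tree`, and SHA-256 are illustrative assumptions, not the format used in the presented work.

```python
import hashlib

def store_tree(tree, store) -> str:
    """Store a nested dict (directory) or bytes (file) and return its digest."""
    if isinstance(tree, bytes):
        blob = b"file\n" + tree
    else:
        # A directory node lists its children by name and digest,
        # so its own digest depends on everything beneath it.
        lines = [f"{name} {store_tree(child, store)}"
                 for name, child in sorted(tree.items())]
        blob = ("dir\n" + "\n".join(lines)).encode("utf-8")
    reference = hashlib.sha256(blob).hexdigest()
    store[reference] = blob
    return reference

store = {}
fs = {"docs": {"a.txt": b"alpha"}, "mail": {"inbox": {"1": b"hi"}}}
root = store_tree(fs, store)

# Changing a file deep in the tree changes the root digest.
fs_changed = {"docs": {"a.txt": b"alpha"}, "mail": {"inbox": {"1": b"hi there"}}}
root_changed = store_tree(fs_changed, {})
assert root != root_changed
```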
Versions and Change (Problems with the Opaque Representation):
- For a file system, an example of an opaque representation is a tarball of the directory structure.
- A change in any file causes the opaque representation to change, so the hash digest also changes.
- There is no relationship between the old and new representations.

Solution: Use HDAGs
- Adding a file to a directory is the same as a new mail arriving in the Inbox.
- The representation of all other files and directories is not changed.
- More efficient than the opaque representation.
- Saves communication cost among replicas in distributed storage.
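The saving can be made concrete with a small sketch: two versions of a directory tree are stored in the same content-addressed store, and only the nodes on the path from the changed directory to the root are new. The node encoding, `put_node`, and SHA-256 are assumptions for illustration.

```python
import hashlib

def put_node(node, store) -> str:
    """Store a file (bytes) or directory (dict) node and return its digest."""
    if isinstance(node, bytes):
        blob = b"file\n" + node
    else:
        entries = sorted((name, put_node(child, store)) for name, child in node.items())
        blob = ("dir\n" + "\n".join(f"{n} {d}" for n, d in entries)).encode("utf-8")
    reference = hashlib.sha256(blob).hexdigest()
    store[reference] = blob  # identical nodes collapse onto the same key
    return reference

store = {}
v1 = {"docs": {"a.txt": b"alpha", "b.txt": b"beta"}, "mail": {"inbox": {"1": b"hi"}}}
root_v1 = put_node(v1, store)
nodes_after_v1 = set(store)

# Version 2: one new mail arrives in the inbox.
v2 = {"docs": v1["docs"], "mail": {"inbox": {"1": b"hi", "2": b"hello"}}}
root_v2 = put_node(v2, store)
new_nodes = set(store) - nodes_after_v1

# Only the new file plus inbox, mail and root are new; docs and its files are shared.
assert len(nodes_after_v1) == 7
assert len(new_nodes) == 4
```

A replica that already holds version 1 would need to receive only those four nodes, whereas a tarball-style opaque representation would have to be transferred in full.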
Advantages of HDAGs:
- Efficient for distributed systems (version management).
- Every version is represented by a unique intrinsic reference that is independent of the physical system.
- Replication and caching will never lead to inconsistencies.
- Two versions of an object share most of their representation, and with it the majority of the storage and communication costs.

Conclusions:
- HDAGs promise to be a useful mechanism for building and maintaining distributed storage systems.
OS Support for P2P Programming: a Case for TPS
Presented by: Nimish Pachapurkar
Introduction:
- P2P infrastructures need an RPC-like interaction mechanism.
- It must be decoupled: anonymous and asynchronous.
- Layers over RPC would certainly hamper performance.
- Type-based Publish/Subscribe (TPS) as a candidate.
- An abstraction over a low-level P2P library: JXTA.

What's in the paper:
- Comparison of the implementation of TPS with pure JXTA.
- A "first" experience.
- Design and source code of applications.
JXTA

Three layers:
- Core Layer: Several protocols ensuring basic communication between peers, message routing, or peer group creation.
- Service Layer: Ready-made services such as a content management system and the wire service.
- Application Layer: All the code written by the programmer.

Six concepts:
- ID: Identifies any resource (peer, pipe, peergroup, codat).
- Peer: Any device with an electronic pulse. Peers are normal or special; the special ones are rendez-vous peers and routers.
- Pipe: Virtual communication channel, asynchronous and unidirectional (wire for many-to-many), independent of IP.
- PeerGroup: Collection of peers.
- Advertisement: XML message with information about a new resource.
- Message: Any kind of communication (using XML).
Protocols for JXTA:
- PDP (Peer Discovery Protocol): Allows different peers to find each other.
- PRP (Peer Resolver Protocol): Sits just above the transport layer and dispatches each JXTA message to the right service.
- PIP (Peer Information Protocol): Reports the status of a peer (time the peer has been up, channels available).
- PMP (Peer Membership Protocol): Obtains group membership requirements (credentials, password, etc.).
- PBP (Pipe Binding Protocol): Keeps the different peers on a pipe bound together (even when they move).
- ERP (Endpoint Routing Protocol): Routes messages between peers and enables communication between two peers even when they do not know how to connect to each other (due to a firewall, etc.).
TPS over JXTA

Publish/Subscribe paradigm:
- Time decoupling: Publisher and subscriber do not need to be up at the same time.
- Space decoupling: Publisher and subscriber do not need to know each other.
- Flow decoupling: Sending or receiving messages does not block the participants.
- This decoupling suits server-less architectures.

Subscription is based on subject and content. In type-based publish/subscribe:
- Subject => event object type.
- Content => state of an instance of that type.
- Type safety: The subscriber knows the event type in advance.
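The subject/content split can be sketched with a small in-process dispatcher: the subscription names an event type (the subject) and an optional predicate over the event's fields (the content). This is not the TPS or JXTA API; `TypedBus` and the event classes are hypothetical, and because delivery here is a synchronous local call, the sketch shows only space decoupling and type-based matching, not time or flow decoupling.

```python
from dataclasses import dataclass

@dataclass
class Offer:
    shop: str
    price_per_day: float

@dataclass
class SkiRentalOffer(Offer):
    pass

@dataclass
class SnowboardRentalOffer(Offer):
    pass

class TypedBus:
    """Delivers each published event to subscribers of its type or a supertype."""

    def __init__(self):
        self._subscriptions = []

    def subscribe(self, event_type, handler, content_filter=lambda event: True):
        self._subscriptions.append((event_type, content_filter, handler))

    def publish(self, event):
        # The publisher never names a subscriber: matching is by type, then by content.
        for event_type, content_filter, handler in self._subscriptions:
            if isinstance(event, event_type) and content_filter(event):
                handler(event)

bus = TypedBus()
cheap_skis = []
all_offers = []
bus.subscribe(SkiRentalOffer, cheap_skis.append, lambda e: e.price_per_day < 30)
bus.subscribe(Offer, all_offers.append)

bus.publish(SkiRentalOffer("Alpine", 25.0))
bus.publish(SkiRentalOffer("Summit", 45.0))
bus.publish(SnowboardRentalOffer("Boardroom", 20.0))

assert cheap_skis == [SkiRentalOffer("Alpine", 25.0)]
assert len(all_offers) == 3
```

Because handlers receive instances of a known class, a subscriber can read `price_per_day` directly instead of parsing an untyped message, which is the type-safety point made on the slide.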
Example: Ski renting application
- Need to find ski rentals with reasonable rates.
- Without TPS, one must surf the net for a long time.
- Alternative: Use the TPS-based P2P infrastructure.
- Subscriber: Subscribes to the ski-rental type and waits for answers.
- Publisher (a new shop is opened): A search is launched for the ski-rental advertisement; if none is found, a new one is created.

Programming phases:
Performance:
- Invocation time: time for sendMessage(); the publisher produces 50 events.
- JXTA-WIRE is quicker.
- No difference between SR-JXTA and SR-TPS.
- Throughput: similar trends.

Conclusion:
- TPS is a viable alternative abstraction to RPC for future Internet-wide operating systems supporting P2P applications.
- Simple to use, type-safe, and preserves the decoupled nature of P2P.
- Makes programming easier than with pure JXTA.