CS294, YelickDataStructs, p1 CS 294-8 Distributed Data Structures

CS294, YelickDataStructs, p2 Agenda
Overview
Interface Issues
Implementation Techniques
Fault Tolerance
Performance

CS294, YelickDataStructs, p3 Overview
Distributed data structures are an obvious abstraction for distributed systems. Right?
What do you want to hide within one?
–Data layout?
–When communication is required?
–Number and location of replicas
–Load balancing

CS294, YelickDataStructs, p4 Distributed Data Structures
Most of these are containers. There are two fundamentally different kinds (contrasted in the sketch below):
–Those with iterators, or the ability to look at all container elements
  Arrays, meshes, databases*, graphs*, and trees* (sometimes)
–Those with only single-element ops
  Queues, directories (hash table or tree), and all *'d items above
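
One quick way to see the split, in Java terms (the interface names are illustrative, not from the lecture): the first kind can hand out an iterator over all of its elements, while the second exposes only per-element operations.

    import java.util.Iterator;

    // Kind 1: the container can enumerate all of its elements.
    interface IterableContainer<E> {
        Iterator<E> elements();          // arrays, meshes, (sometimes) trees
    }

    // Kind 2: only single-element operations are exposed.
    interface SingleElementContainer<K, E> {
        void insert(K key, E elem);      // queues, directories (hash tables)
        E lookup(K key);
        E remove(K key);
    }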

CS294, YelickDataStructs, p5 DDS in Ninja
Described in Gribble, Brewer, Hellerstein, and Culler
A distributed data structure (DDS) is a self-managing layer for persistent data:
–High availability, concurrency, consistency, durability, fault tolerance, scalability
A distributed hash table is an example
–Uses two-phase commit for consistency
–Uses partitioning for scalability

CS294, YelickDataStructs, p6 Scheduling Structures
In serial code, most scheduling is done with a stack (often implicit), a FIFO queue, or a priority queue.
Do all of these make sense in a distributed setting? Are there others?

CS294, YelickDataStructs, p7 Distributed Queues
Load balancing (work stealing…); see the sketch below:
–Push new work onto a stack
–Execute locally by popping from the stack
–Steal remotely by removing from the bottom of the stack (FIFO)
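
A minimal sketch of this discipline in Java, assuming a single lock guards the deque (real work stealers use careful non-blocking protocols); the class and method names are illustrative:

    import java.util.ArrayDeque;
    import java.util.Deque;

    // The owner pushes and pops at the top (LIFO); thieves remove from the
    // bottom (FIFO), so they take the oldest work items.
    class WorkStealingDeque<T> {
        private final Deque<T> tasks = new ArrayDeque<>();

        public synchronized void push(T task) {   // owner: new work
            tasks.addFirst(task);
        }

        public synchronized T pop() {             // owner: execute locally
            return tasks.pollFirst();             // null if empty
        }

        public synchronized T steal() {           // idle processor: steal
            return tasks.pollLast();              // from the bottom, FIFO
        }
    }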

CS294, YelickDataStructs, p8 Interfaces (1)
Blocking atomic interfaces: operations happen between invocation and return
–Internally, each operation performs locking or some other form of synchronization
Non-blocking "atomic" interfaces: the operation happens sometime after invocation
–Often paired with completion synchronization:
  Request/response for each operation
  Wait for all "my" operations to complete
  Wait for all operations in the world to complete
The contrast is sketched below.
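
One way to render the contrast in Java (the interface names and the Future-based completion scheme are assumptions for illustration, not the Ninja API):

    import java.util.concurrent.CompletableFuture;

    // Blocking atomic interface: put() returns only after the operation
    // has happened; synchronization is internal.
    interface BlockingMap<K, V> {
        void put(K key, V value);
        V get(K key);
    }

    // Non-blocking "atomic" interface: calls return immediately. The caller
    // synchronizes per operation on the returned future (request/response),
    // or calls flush() to wait for all of its own pending operations.
    interface NonBlockingMap<K, V> {
        CompletableFuture<Void> put(K key, V value);
        CompletableFuture<V> get(K key);
        void flush() throws InterruptedException;  // wait for all "my" ops
    }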

CS294, YelickDataStructs, p9 Interfaces (2)
Non-atomic interfaces: use external synchronization
–Undefined under certain kinds of (or all) concurrency
–May be paired with bracketing synchronization:
  Acquire-insert-lock, insert, insert, Release-insert-lock
  Begin-transaction…
Operations with no semantics (no-ops)
–Prefetch, flush copies, …
Operations that allow for failures
–Signal "failed"

CS294, YelickDataStructs, p10 DDS Interfaces
Contrast:
–RDBMSs provide ACID semantics on transactions
–Distributed file systems: NFS is weak; Frangipani and AFS are stronger
DDS:
–All operations on elements are atomic (indivisible, all or nothing)
  This seems to mean that hash-table operations involving a single element are atomic
–One-copy equivalence: replication of elements is invisible
–No transactions across elements or operations

CS294, YelickDataStructs, p11 Implementation Strategies (1)
Two simple techniques:
–Partitioning:
  Used when the data structure is large
  Used when writes/updates are frequent
–Replication:
  Used when writes are infrequent and reads are very frequent
  Used to tolerate failures
  Full static replication is one extreme; dynamic partial replication is more common
Many hybrids and variations exist.

CS294, YelickDataStructs, p12 Implementation Strategies (2)
Moving data to computation is good for:
–Dynamic load balancing, i.e., idle processors grab work
–The smaller objects in operations involving more than one object
Moving computation to data is good for:
–Large data structures
Others?

CS294, YelickDataStructs, p13 DDS: Distributed Hash Table
Operations include:
–Create, Destroy
–Put, Get, and Remove
Built from storage "bricks"
–Each brick manages a single-node, network-visible hash table
–Each contains a buffer cache, a lock manager, and network stubs and skeletons
Data is partitioned, and partitions are replicated
–A replica group serves each partition (see the lookup sketch below)
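
A sketch of how a key might reach its replica group, assuming keys are routed by matching the longest prefix of their hash bits against the RG map (an illustrative reading of the DP/RG maps on slide p15; the real DDS metadata maps are more elaborate):

    import java.util.List;
    import java.util.Map;

    // Route a key to its replica group by the longest matching prefix of
    // its hash bits, e.g., a hash starting "011..." maps to [dds5, dds3].
    class ReplicaGroupMap {
        private final Map<String, List<String>> groups;  // bit prefix -> bricks

        ReplicaGroupMap(Map<String, List<String>> groups) {
            this.groups = groups;
        }

        List<String> replicasFor(Object key) {
            String bits = String.format("%32s",
                    Integer.toBinaryString(key.hashCode())).replace(' ', '0');
            for (int len = bits.length(); len > 0; len--) {
                List<String> g = groups.get(bits.substring(0, len));
                if (g != null) return g;
            }
            throw new IllegalStateException("no replica group covers " + bits);
        }
    }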

CS294, YelickDataStructs, p14 DDS: Distributed Hash Table
Operations on elements:
–Get: use any replica in the appropriate group
–Put or Remove: update all replicas in the group using two-phase commit (sketched below)
  The DDS library is the commit coordinator
  If an individual node crashes during the commit phase, it is removed from its replica group
  If the DDS library fails during the commit phase, the individual nodes coordinate among themselves: if any has committed, all must commit
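
A bare-bones view of the commit round from the coordinator's side, under the simplifying assumptions that replicas answer prepare() synchronously and that crash handling is omitted (names are illustrative, not the DDS library's API):

    import java.util.List;

    // Participant view of two-phase commit for a single-element update.
    interface Replica {
        boolean prepare(Object key, Object value);  // phase 1: vote yes/no
        void commit(Object key);                    // phase 2: make it durable
        void abort(Object key);                     // phase 2: discard it
    }

    // The DDS library acts as coordinator: a put succeeds only if every
    // replica in the group votes yes; otherwise all replicas abort.
    class TwoPhaseCommitPut {
        boolean put(List<Replica> group, Object key, Object value) {
            boolean allYes = true;
            for (Replica r : group) {               // phase 1: collect votes
                if (!r.prepare(key, value)) { allYes = false; break; }
            }
            for (Replica r : group) {               // phase 2: commit or abort
                if (allYes) r.commit(key); else r.abort(key);
            }
            return allYes;
        }
    }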

CS294, YelickDataStructs, p15 DDS: Hash Table
[Figure: a key's lookup flows through the DP (data partitioning) map to an entry in the RG (replica group) map.]

RG name   RG members
000       dds1, dds2
100       dds2
10        dds5, dds4
01        dds7
011       dds5, dds3
111       dds2

CS294, YelickDataStructs, p16 Example: Aleph Directory
Maps names to mobile objects
–Files, locks (?), processes, …
Interested in performance at scale, not reliability
Two basic protocols:
–Home: each object has a fixed "home" PE that keeps track of cached copies
–Arrow: based on the path-reversal idea

CS294, YelickDataStructs, p17 Path Reversal
[Figure: a Find request follows the directory arrows hop by hop toward the object's current location.]

CS294, YelickDataStructs, p18 Path Reversal
[Figure: each arrow the find crosses is reversed, so the arrows end up pointing toward the requester, the object's new location. A toy rendition follows.]
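
A toy, single-threaded rendition of path reversal in Java (assuming each node keeps one "arrow" toward where the object was last sent, as in Demmer and Herlihy's arrow protocol; real Aleph runs this over RMI messages):

    // Each node keeps one link ("arrow") pointing toward the object.
    // A find flips each arrow it crosses to point back at the sender,
    // so later requests are routed toward the object's new location.
    class ArrowNode {
        final int id;
        ArrowNode arrow;   // next hop toward the object; null = object is here

        ArrowNode(int id) { this.id = id; }

        // Called on the requester itself to start a find.
        // Precondition: this node does not currently hold the object.
        ArrowNode request() {
            ArrowNode target = arrow;
            arrow = null;              // the object will end up here
            return target.find(this);  // returns the previous owner
        }

        // Handle a find that arrived from 'sender'.
        private ArrowNode find(ArrowNode sender) {
            ArrowNode next = arrow;
            arrow = sender;            // path reversal
            return (next == null) ? this : next.find(this);
        }
    }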

CS294, YelickDataStructs, p19 Aleph Directory Performance
Aleph is implemented as Java packages on top of RMI (and UDP?)
Run on small systems (up to 16 nodes)
–It was assumed that the centralized "home" solution would be faster at this scale: 2 messages to request, 2 to retrieve
–Arrow was actually faster: log2(p) messages to request, 1 to retrieve; in practice, only 2 to request (on the counter example)

CS294, YelickDataStructs, p20 Hybrid Directory Protocol
Essentially the same as the "home" protocol, except that waiting processors are linked into a chain (across the processors)
–Each keeps the id of the processor ahead of it in the chain
Under high contention, the resource moves down the chain (see the sketch below)
Performance:
–Faster than home and arrow on the counter benchmark and some others…
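
The chain behaves much like an MCS-style queue lock; here is a shared-memory sketch under that reading (an illustrative analogy, simplified to a single address space, whereas the actual protocol passes messages through the home node):

    import java.util.concurrent.atomic.AtomicReference;

    // Waiters link into a chain: a new waiter becomes the tail, learns who
    // is ahead of it, and announces itself to that predecessor; the resource
    // is then handed down the chain. Waiter objects are one-shot.
    class ChainDirectory {
        static class Waiter {
            volatile Waiter behind;       // set when a successor announces itself
            volatile boolean hasResource;
        }

        private final AtomicReference<Waiter> tail = new AtomicReference<>();

        void acquire(Waiter me) {
            Waiter ahead = tail.getAndSet(me);      // become the new tail
            if (ahead == null) {
                me.hasResource = true;              // chain was empty
            } else {
                ahead.behind = me;                  // tell the one ahead of us
                while (!me.hasResource) Thread.onSpinWait();
            }
        }

        void release(Waiter me) {
            if (me.behind == null) {
                if (tail.compareAndSet(me, null)) return;      // nobody waiting
                while (me.behind == null) Thread.onSpinWait(); // successor in flight
            }
            me.behind.hasResource = true;           // resource moves down the chain
        }
    }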

CS294, YelickDataStructs, p21 How Many Data Structures?
Gribble et al. claim:
–"We believe that given a small set of DDS types (such as a hash table, a tree, and an administrative log), authors will be able to build a large class of interesting and sophisticated servers."
Do you believe this?
What does it imply about tools vs. libraries?

CS294, YelickDataStructs, p22 Administrivia
Gautam Kar and Joe L. Hellerstein are speaking Thursday
–Papers are online
–Contact me about meeting with them
Final projects:
–Send mail to schedule a meeting with me
Next week:
–Tuesday: guest lecture by Aaron Brown on benchmarks, related to the Kar and Hellerstein work
–Still to come: Gray, Lamport, and Liskov