SDIMS: A Scalable Distributed Information Management System
Praveen Yalagandula, Mike Dahlin
Department of Computer Sciences, UT Austin
November 17, 2015


Slide 1: SDIMS: A Scalable Distributed Information Management System
Praveen Yalagandula, Mike Dahlin
Laboratory for Advanced Systems Research (LASR), University of Texas at Austin

Slide 2: Goal
- A distributed operating system backbone
  - Information collection and management
    - Core functionality of large distributed systems
    - Monitor, query, and react to changes in the system
  - Examples:
    - System administration and management
    - Service placement and location
    - Sensor monitoring and control
    - Distributed Denial-of-Service attack detection
    - File location service
    - Multicast tree construction
    - Naming and request routing
- Benefits
  - Ease development of new services
  - Facilitate deployment
  - Avoid repetition of the same task by different services
  - Optimize system performance

Slide 3: Contributions - SDIMS
- Provides an important basic building block: information collection and management
- Satisfies key requirements:
  - Scalability: with both nodes and attributes; leverage Distributed Hash Tables (DHTs)
  - Flexibility: enable applications to control the aggregation; provide a flexible API: install, update, and probe
  - Autonomy: enable administrators to control the flow of information; build Autonomous DHTs
  - Robustness: handle failures gracefully; re-aggregate upon failures, lazily (by default) or on demand (optional)

Slide 4: Outline
- SDIMS: a distributed operating system backbone
- Aggregation abstraction
- Our approach
  - Design
  - Prototype
- Simulation and experimental results
- Conclusions

Slide 5: Aggregation Abstraction
- Attributes: information at machines
- Aggregation tree
  - Physical nodes are leaves
  - Each virtual node represents a logical group of nodes (administrative domains, groups within domains, etc.)
  - Each virtual node is simulated by one or more machines
- Aggregation function f for attribute A
  - Computes the aggregated value A_i for a level-i subtree
  - A_0 = locally stored value at the physical node, or NULL
  - A_i = f(A_{i-1}^0, A_{i-1}^1, ..., A_{i-1}^k) for a virtual node whose children hold the level-(i-1) aggregates A_{i-1}^0 through A_{i-1}^k
- Example: total users logged into the system
  - Attribute: numUsers; aggregation function: summation
- [Figure: four leaves a, b, c, d at level A_0; level A_1 holds f(a,b) and f(c,d); the root at level A_2 holds f(f(a,b), f(c,d)).]
- Astrolabe [Van Renesse et al., TOCS '03]
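The aggregation rule above (A_0 at the leaves, A_i computed by applying f to the children's level-(i-1) aggregates) can be sketched in a few lines of Python. This is an illustrative model with made-up names (plain dicts for nodes, a recursive `aggregate` helper), not SDIMS's actual implementation; it uses the slide's numUsers example with f = summation.

```python
# Sketch of the aggregation abstraction: leaves hold locally stored
# values; each virtual node applies the aggregation function f to the
# aggregates of its children. Names here are illustrative only.

def aggregate(node, f):
    """Return the aggregated value A_i for the subtree rooted at node."""
    if not node["children"]:               # physical node (leaf)
        return node["value"]               # A_0: locally stored value
    # A_i = f over the children's level-(i-1) aggregates
    return f([aggregate(c, f) for c in node["children"]])

# The slide's example tree: leaves a=1, b=2, c=3, d=4, f = summation
leaf = lambda v: {"value": v, "children": []}
tree = {"value": None, "children": [
    {"value": None, "children": [leaf(1), leaf(2)]},   # f(a, b)
    {"value": None, "children": [leaf(3), leaf(4)]},   # f(c, d)
]}
print(aggregate(tree, sum))  # f(f(a,b), f(c,d)) = 10
```

Swapping in a different f (e.g. `max`) changes the attribute's semantics without touching the tree, which is the point of keeping the function separate from the structure.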


Slide 7: Scalability with nodes and attributes
- To be a basic building block, SDIMS should support:
  - A large number of machines (enterprise and global-scale services; trend: large numbers of small devices)
  - Applications with a large number of attributes (example: a file location system, where each file is an attribute)
- Challenge: build aggregation trees in a scalable way
  - Build multiple trees: a single tree for all attributes leads to load imbalance
  - Ensure a small number of children per node in the tree, which reduces the maximum node stress

Slide 8: Building aggregation trees: exploit DHTs
- Distributed Hash Tables (DHTs)
  - Each node is assigned an ID from a large space (160-bit)
  - For each prefix length k, a node keeps a link to another node that matches it on the first k bits but differs on bit k+1
  - Each key from the ID space is mapped to the node with the closest ID
  - Routing for a key proceeds by bit correction; example: from node 0101 toward key 1001: 0101 → 1110 → 1000 → 1001
  - Many algorithms with different properties (Pastry, Tapestry, CAN, Chord, SkipNet, etc.): load balancing, robustness, etc.
- A DHT can be viewed as a mesh of aggregation trees
  - The routes from all nodes to a particular key form a tree
  - Different keys yield different trees
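The bit-correction routing above can be sketched as a toy model (four-bit IDs and illustrative function names, not any real DHT's API). Each hop moves to a node that matches the key on at least one more leading bit, which is why the routes from all nodes toward a single key merge into a tree; for simplicity the sketch assumes the key's owner node is itself in the node set.

```python
# Toy bit-correction routing over 4-bit string IDs (illustrative only;
# real DHTs route to the node with the closest ID, correct bits in
# digit groups, and pick hops by proximity).

def shared_prefix_len(a, b):
    n = 0
    while n < len(a) and a[n] == b[n]:
        n += 1
    return n

def route(node, key, nodes):
    """Return the hop sequence from node to the node owning key."""
    path = [node]
    while path[-1] != key:
        p = shared_prefix_len(path[-1], key)
        # jump to any node matching at least one more leading bit of the key
        path.append(next(n for n in nodes if shared_prefix_len(n, key) > p))
    return path

nodes = ["0101", "1110", "1000", "1001"]
print(route("0101", "1001", nodes))  # ['0101', '1110', '1000', '1001']
```

Because every node's route toward key 1001 passes through progressively longer matching prefixes, the union of all such routes is exactly the aggregation tree the slide describes.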

Slide 9: DHT trees as aggregation trees
[Figure: the routes from all nodes toward key 111 form a tree; interior virtual nodes correspond to the prefixes 1xx, 11x, and 111.]

Slide 10: DHT trees as aggregation trees
[Figure: two different aggregation trees, one per key: routes toward key 111 pass through prefixes 1xx and 11x, while routes toward key 000 pass through prefixes 0xx and 00x.]

Slide 11: API design goals
- Expose the scalable aggregation trees built from the DHT
- Flexibility: expose several aggregation mechanisms
  - Attributes have different read-to-write ratios
    - "CPU load" changes often (write-dominated): aggregating on every write costs too much communication
    - "NumCPUs" changes rarely (read-dominated): aggregating on reads adds unnecessary latency
  - Spatial and temporal heterogeneity: non-uniform and changing read-to-write rates across the tree (example: a multicast session with changing membership)
- Support sparse attributes of the same functionality efficiently
  - Examples: file location, multicast, etc.
  - Not all nodes are interested in all attributes

Slide 12: Design of the flexible API
- New abstraction: separate the attribute type from the attribute name
  - Attribute = (attribute type, attribute name)
  - Example: type = "fileLocation", name = "fileFoo"
- install: an aggregation function for a type
  - Amortizes installation cost across attributes of the same type
  - Arguments up and down control aggregation on update
- update: the value of a particular attribute
  - Aggregation is performed according to up and down, along the tree with key = hash(attribute)
- probe: for an aggregated value at some level
  - If required, aggregation is done on demand to produce the result
  - Two modes: one-shot and continuous
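A minimal sketch of the three calls as described on this slide. The class and argument names are hypothetical stand-ins (the real prototype is built in Java on FreePastry), and the body deliberately ignores the aggregation tree so that only the API shape is visible: install once per type, update per (type, name), probe on demand.

```python
# Hypothetical in-process stand-in for the SDIMS API surface.
# Illustrative only: no tree, no network, no up/down propagation.

class Sdims:
    def __init__(self):
        self.funcs = {}    # attribute type -> (function, up, down)
        self.values = {}   # (type, name) -> {node_id: local value}

    def install(self, attr_type, func, up="ALL", down=0):
        # Installed once per type; the cost is amortized across all
        # attributes of that type. up/down steer update-time aggregation.
        self.funcs[attr_type] = (func, up, down)

    def update(self, node_id, attr_type, name, value):
        # The real system aggregates along the tree rooted at
        # key = hash((type, name)) per the installed up/down strategy.
        self.values.setdefault((attr_type, name), {})[node_id] = value

    def probe(self, attr_type, name, level=None, mode="one-shot"):
        # one-shot: aggregate on demand; continuous would re-deliver
        # results as the aggregate changes.
        func, _, _ = self.funcs[attr_type]
        return func(self.values.get((attr_type, name), {}).values())

s = Sdims()
# Sparse file-location attribute: aggregate = union of holders
s.install("fileLocation", lambda vs: set().union(*vs) if vs else set())
s.update("node-a", "fileLocation", "fileFoo", {"node-a"})
s.update("node-b", "fileLocation", "fileFoo", {"node-b"})
print(s.probe("fileLocation", "fileFoo"))  # set of holders, order may vary
```

The type/name split is what makes the sparse case cheap: one `install` of "fileLocation" serves every file, and each file's tree only involves nodes along its own hash key.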

Slide 13: Scalability and Flexibility
- Install-time aggregation and propagation strategy: applications specify up and down
  - Update-Local: up=0, down=0
  - Update-Up: up=ALL, down=0
  - Update-All: up=ALL, down=ALL
- Spatial and temporal heterogeneity: applications can exploit the continuous mode of the probe API
  - To propagate aggregate values only to interested nodes
  - With an expiry time, to propagate for a fixed time
- This also implies scalability with sparse attributes
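The three strategies trade write cost against read latency. A back-of-the-envelope way to see it, assuming a complete tree of a given height and branching factor (illustrative arithmetic, not numbers from the paper): an update climbs `up` levels toward the root, and each node it reaches re-broadcasts the new aggregate `down` levels toward the leaves.

```python
# Rough per-update message-count model for the up/down knobs
# (illustrative only, not SDIMS code or measured data).

def update_cost(height, branching, up, down):
    up_hops = height if up == "ALL" else min(up, height)
    cost = up_hops  # one message per level climbed toward the root
    for level in range(up_hops + 1):  # each node the update reached
        d = level if down == "ALL" else min(down, level)
        # broadcasting d levels down fans out branching^1 .. branching^d
        cost += sum(branching ** i for i in range(1, d + 1))
    return cost

h, b = 3, 2
print("Update-Local:", update_cost(h, b, 0, 0))         # 0: free writes, slow reads
print("Update-Up:   ", update_cost(h, b, "ALL", 0))     # 3: root kept fresh
print("Update-All:  ", update_cost(h, b, "ALL", "ALL")) # 25: everyone fresh
```

Update-Local makes writes free and pushes all cost to probes; Update-All makes probes free and pays on every write; Update-Up sits in between, which is why letting each attribute pick its own point on this curve matters.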


Slide 18: Autonomy
- Systems span multiple administrative domains
  - Allow a domain administrator to control information flow
    - Prevent an external observer from seeing information inside the domain
    - Prevent external failures from affecting operations inside the domain
- Support efficient domain-wise aggregation; plain DHT trees might not conform to domain boundaries
- Autonomous DHT: two properties
  - Path locality: routes between nodes in a domain stay inside the domain
  - Path convergence: routes reach the domain root first, before leaving the domain
[Figure: three administrative domains (Domain 1, Domain 2, Domain 3); within each, routes converge at a domain root before crossing domain boundaries.]
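One illustrative way to picture path locality as a next-hop rule (an assumption for illustration, not the paper's actual Autonomous DHT algorithm): among the peers that correct at least one more bit of the key, prefer one inside the caller's own administrative domain, leaving the domain only when no in-domain peer can make progress.

```python
# Illustrative domain-aware next-hop choice (not the real algorithm):
# candidates must extend the shared prefix with the key; in-domain
# candidates win ties, so routes stay local as long as possible.

def prefix_len(a, b):
    n = 0
    while n < len(a) and a[n] == b[n]:
        n += 1
    return n

def next_hop(node, key, peers, domain_of):
    p = prefix_len(node, key)
    candidates = [n for n in peers if prefix_len(n, key) > p]
    if not candidates:
        return None  # this node is the root for the key
    same_domain = [n for n in candidates if domain_of[n] == domain_of[node]]
    return (same_domain or candidates)[0]

domain_of = {"000": "A", "011": "A", "010": "B", "110": "B"}
peers = list(domain_of)
# From 000 toward key 010: hop to 011 (same domain A) even though
# 010 itself, in domain B, would also correct the next bit.
print(next_hop("000", "010", peers, domain_of))  # '011'
```

With a rule like this, all of a domain's traffic for a key funnels through one in-domain node before exiting, which is the convergence point where domain-level aggregates can be computed and cached.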

Slide 19: Outline
- SDIMS: a distributed operating system backbone
- Aggregation abstraction
- Our approach
  - Leverage Distributed Hash Tables
  - Separate attribute type from name
  - Flexible API
- Prototype and evaluation
- Conclusions

Slide 20: Prototype and Evaluation
- SDIMS prototype
  - Built on top of FreePastry [Druschel et al., Rice U.]
  - Two layers: an Autonomous DHT at the bottom, an Aggregation Management Layer on top
- Methodology
  - Simulation: scalability, flexibility
  - Prototype: micro-benchmarks on real networks (PlanetLab; the CS department network)

Slide 21: Simulation Results - Scalability
- Methodology: sparse attributes, modeled as multicast sessions with small memberships
- Node stress = amount of incoming and outgoing information
- Two points:
  - Maximum node stress in SDIMS is an order of magnitude less than in Astrolabe
  - Maximum node stress decreases as the number of nodes increases
[Graph: maximum node stress for Astrolabe and SDIMS at 256, 4096, and 65536 nodes.]

Slide 22: Simulation Results - Flexibility
- Simulation with 4096 nodes
- Attributes installed with different up and down strategies: Update-Local, Update-Up, Update-All, (up=5, down=0), and (up=all, down=5)
[Graph: comparison of the five strategies.]

Slide 23: Prototype Results
- CS department network: 180 machines (283 SDIMS nodes)
- PlanetLab: 70 machines
[Table: latency (ms) of Update-All, Update-Up, and Update-Local on the department network and on PlanetLab.]

Slide 24: Conclusions
- SDIMS is a basic building block for large-scale distributed services, providing information collection and management
- Our approach
  - Scalability and flexibility through leveraging DHTs, separating attribute type from attribute name, and providing a flexible API: install, update, and probe
  - Autonomy through building Autonomous DHTs
  - Robustness through lazy re-aggregation by default, with optional on-demand re-aggregation

Slide 25: For more information: