Indexing data-oriented overlay networks, September 1st, 2005. Presented by: Anwitaman Datta.

Presentation transcript:

Indexing data-oriented overlay networks
September 1st, 2005
Presented by: Anwitaman Datta
Joint work with Karl Aberer, Manfred Hauswirth, Roman Schmidt
Ecole Polytechnique Fédérale de Lausanne (EPFL)
Patrons:
NCCR-MICS: Swiss National Centres of Competence in Research, Mobile Information & Communication Systems
Evergrow: EC FP6, IST priority “Complex System Research”, Contract no (FET-IP), Ever-growing global scale-free networks, their provisioning, repair and unique functions

Structured overlays
♫ Associate each peer with some part of the load, i.e., a partition of the key-space
  ♪ e.g., as in Distributed Hash Tables (DHTs)
♫ Provide an efficient routing mechanism to locate the peer responsible for a particular part of the key-space
  ♪ Various choices of topology are possible

Structured overlay maintenance
♫ Dynamics
  ♪ Churn: peers join/leave
  ♪ New data inserted
♫ Standard maintenance mechanisms
  ♪ Correspond to updating a database index
  ♪ Traditionally: overlay evolution has been studied for an incrementally changing peer population
Challenge #1: Fast construction of a structured overlay from scratch

Overlays for data-oriented applications
♫ Hash tables give constant-time look-ups
  ♪ At the cost of losing ordering information
  ♪ DHTs need log(n) network hops
♫ Can we preserve (semantic) ordering information?
  ♪ Skewed load-distribution
Challenge #2: The structured overlay should deal with arbitrary skew of load
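To make the trade-off on this slide concrete, here is a minimal Python sketch that contrasts a uniform hash with an order-preserving mapping of keys onto [0, 1). The function names, the use of SHA-1, and the 8-byte prefix width are illustrative assumptions and do not come from the slides.

```python
import hashlib

def hashed_position(key: str) -> float:
    """Uniform hashing: spreads keys evenly but scrambles their order."""
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") / 2 ** 160

def ordered_position(key: str, width: int = 8) -> float:
    """Order-preserving mapping (assumed for illustration): read the first
    `width` bytes of the key as a base-256 fraction, so byte-wise
    lexicographic order is kept."""
    data = key.encode("utf-8")[:width].ljust(width, b"\0")
    return int.from_bytes(data, "big") / 256 ** width

keys = [f"key{i:02d}" for i in range(20)]
by_hash = sorted(keys, key=hashed_position)    # order is scrambled
by_order = sorted(keys, key=ordered_position)  # order is preserved
spread = max(map(ordered_position, keys)) - min(map(ordered_position, keys))
```

The order-preserving mapping keeps neighbouring keys adjacent, which is what range queries need, but it also inherits the skew of the data: all twenty keys share the prefix "key" and so fall into a tiny interval of the key-space (`spread` is very small).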

Toy example: Distributing skewed load
[Figure: load-distribution over the key-space, from 0 to 1]

A globally coordinated recursive bisection approach
♫ The key-space can be divided into two partitions
  ♪ Assign peers in proportion to the load in the two sub-partitions
♫ Recursively repeat the process to repartition the sub-partitions
♫ Partitioning of the key-space such that there is equal load in each partition
  ♪ Uniform replication of the partitions
  ♪ Important for fault-tolerance
♫ Note: A novel and general load-balancing problem.
[Figure: load-distribution over the key-space, shown at successive bisection steps]
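The recursive bisection described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: keys are fixed-length bit strings, and the stopping parameters `max_load` (keys per partition) and `min_replicas` (peers per partition) are hypothetical names that do not appear on the slides.

```python
def bisect_keyspace(keys, peers, prefix="", max_load=2, min_replicas=1):
    """Return {partition prefix: (load, peers)} for the leaf partitions."""
    depth = len(prefix)
    can_split = (
        len(keys) > max_load                    # partition is still overloaded
        and peers >= 2 * min_replicas           # enough peers for two halves
        and all(len(k) > depth for k in keys)   # a further bit is available
    )
    if not can_split:
        return {prefix: (len(keys), peers)}

    # Bisect the partition on the next bit of the key.
    left = [k for k in keys if k[depth] == "0"]
    right = [k for k in keys if k[depth] == "1"]

    # Assign peers in proportion to the load, keeping both halves replicated.
    share = round(peers * len(left) / len(keys))
    peers_left = min(max(share, min_replicas), peers - min_replicas)

    result = bisect_keyspace(left, peers_left, prefix + "0",
                             max_load, min_replicas)
    result.update(bisect_keyspace(right, peers - peers_left, prefix + "1",
                                  max_load, min_replicas))
    return result

# Skewed toy load: six of the eight keys fall in the "0" half.
keys = ["0000", "0001", "0010", "0011", "0100", "0101", "1000", "1100"]
partitions = bisect_keyspace(keys, peers=8)
```

With this toy input the heavily loaded "0" half is bisected further than the "1" half, and each leaf ends up with 2 keys and 2 peers. The rounding of `share` is where real inputs stop dividing evenly, so in general the balance is only approximate.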

Lessons from the globally coordinated algorithm
♫ The intermediate partitions may be such that they cannot be perfectly repartitioned.
  ♪ There is a fundamental limitation with any bisection-based approach, as well as for any fixed key-space partitioned overlay network.
♫ Limit of dealing with load skews
♫ Nonetheless practical
  ♪ For realistic load-skews and peer populations
Achieves an approximate load-balance.

Step: Distributed proportional partitioning (for overlay construction)
♫ Given:
  ♪ A mechanism to meet other random peers
  ♪ A parameter p for partitioning the space
♫ Proportional partitioning: peers partition in proportion to the load distribution
  ♪ In a ratio p : 1-p
  ♪ Let's say we call the sub-partitions 0 and 1
♫ Referential integrity: obtain a reference to the other partition
  ♪ Needed to enable overlay routing
♫ Sorting the load/keys: peers exchange their locally stored keys so that each peer stores only the keys for its own partition.
[Figure: example of a random interaction between peer 1 (keys 000, 010, 100) and peer 3 (keys 101, 001), showing each peer's routing table and keys after partitioning into 0 and 1. Legend: routing table, peer id, keys (only part of the prefix is shown).]
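A single interaction of this kind can be sketched as follows. The `Peer` class and `split` function are hypothetical names introduced for illustration; which peer takes 0 and which takes 1 is fixed by hand here, whereas in the talk that decision is made by the partitioning heuristics. The starting keys are the ones that survive in the figure.

```python
from dataclasses import dataclass, field

@dataclass
class Peer:
    peer_id: int
    keys: set
    path: str = ""                               # key-space prefix this peer covers
    routing: dict = field(default_factory=dict)  # other partition's prefix -> peer id

def split(a: Peer, b: Peer) -> None:
    """One interaction: `a` takes sub-partition 0, `b` takes sub-partition 1.

    Both peers are assumed to be responsible for the same prefix beforehand.
    """
    prefix, level = a.path, len(a.path)
    pooled = a.keys | b.keys
    # Sorting the keys: each peer keeps only the keys of its own partition.
    a.keys = {k for k in pooled if k[level] == "0"}
    b.keys = {k for k in pooled if k[level] == "1"}
    a.path, b.path = prefix + "0", prefix + "1"
    # Referential integrity: each peer stores a reference to the other partition.
    a.routing[prefix + "1"] = b.peer_id
    b.routing[prefix + "0"] = a.peer_id

p1 = Peer(1, {"000", "010", "100"})
p3 = Peer(3, {"101", "001"})
split(p1, p3)
```

After the call, peer 1 holds only keys starting with 0 and a routing entry pointing to peer 3 for partition 1, and vice versa.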

Heuristic 1: Autonomous partitioning (AUT)
♫ Make an a priori probabilistic decision (parameterized by p) for a sub-partition
  ♪ Proportionality constraint is automatically met
♫ Find a peer from the other partition
  ♪ In order to meet the referential integrity constraint
♫ Markovian asymptotic analysis of the process (for p = 0.5)
  ♪ 2 log(2) interactions (on average) per peer
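A small Monte Carlo sketch can be used to sanity-check the interaction count for AUT. The interaction model below is an assumption, not taken from the slides: only peers that still lack a reference initiate contacts, and when a contact succeeds the contacted peer also records the initiator if it had no reference yet. If log is read as the natural logarithm, 2 log(2) ≈ 1.39, and this toy model should come out close to that value for p = 0.5.

```python
import math
import random

def simulate_aut(n: int, p: float = 0.5, seed: int = 0) -> float:
    """Average interactions per peer until every peer holds a reference
    to some peer in the other sub-partition (toy model, see assumptions)."""
    rng = random.Random(seed)
    # A priori decision: sub-partition 0 with probability p, otherwise 1.
    side = [0 if rng.random() < p else 1 for _ in range(n)]
    if len(set(side)) < 2:
        raise ValueError("both sub-partitions must be populated")
    has_ref = [False] * n
    missing = n
    interactions = 0
    while missing > 0:
        a = rng.randrange(n)
        if has_ref[a]:
            continue          # only peers lacking a reference initiate
        b = rng.randrange(n)
        if b == a:
            continue
        interactions += 1
        if side[a] != side[b]:
            has_ref[a] = True
            missing -= 1
            if not has_ref[b]:    # assumption: the contacted peer benefits too
                has_ref[b] = True
                missing -= 1
        # Otherwise the interaction is wasted: b is in the same sub-partition.
    return interactions / n

aut_cost = simulate_aut(20000, seed=1)
```

The wasted interactions in the last branch are what the later slides refer to when they call AUT relatively inefficient.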

Heuristic 2: Eager partitioning (for p = 0.5)
♫ Undecided peers initiate contact with another random peer
  ♪ If the contacted peer is also undecided, the contacting and contacted peers decide for different partitions (balanced split)
  ♪ They refer to each other
  ♪ If the contacted peer has already decided, the contacting peer decides for the other partition (unbalanced split)
  ♪ The contacting peer refers to the contacted peer
♫ Markovian asymptotic analysis of the process (for p = 0.5)
  ♪ log(2) interactions (on average) per peer
♫ AUT is relatively inefficient
  ♪ AUT wastes interactions in order to find a suitable peer, i.e., to meet referential integrity
Challenge: Can we have a strategy which works for all values of p, and is as efficient as eager partitioning when p = 0.5?
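The eager rules translate almost directly into a simulation. The sketch below is a toy model with assumed details (uniform random contacts, interactions counted only when an undecided peer reaches another peer); it is not the authors' simulator. Reading log as the natural logarithm, the measured cost should land near log(2) ≈ 0.69 interactions per peer.

```python
import math
import random

def simulate_eager(n: int, seed: int = 0):
    """Return (interactions per peer, final decisions) for eager partitioning."""
    if n < 2:
        raise ValueError("need at least two peers")
    rng = random.Random(seed)
    side = [None] * n         # None = undecided, otherwise 0 or 1
    undecided = n
    interactions = 0
    while undecided > 0:
        a = rng.randrange(n)
        if side[a] is not None:
            continue          # only undecided peers initiate contact
        b = rng.randrange(n)
        if b == a:
            continue
        interactions += 1
        if side[b] is None:
            # Balanced split: the two peers take different partitions.
            side[a] = rng.randrange(2)
            side[b] = 1 - side[a]
            undecided -= 2
        else:
            # Unbalanced split: take the partition the contacted peer did not.
            side[a] = 1 - side[b]
            undecided -= 1
    return interactions / n, side

eager_cost, eager_side = simulate_eager(20000, seed=1)
```

Every interaction decides at least one peer, which is why no contact is wasted here, in contrast to AUT.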

AEP: Adaptive eager partitioning (w.l.o.g., p ≤ 0.5)
♫ Undecided peers initiate contact with other random peers
  ♪ If the contacted peer is also undecided, perform a balanced split with probability: [expression missing from transcript]
  ♪ Since we need more peers (a fraction of 1-p) in sub-partition 1
  ♪ If the contacted peer has already decided for 0, the contacting peer decides for 1
  ♪ If the contacted peer has already decided for 1, the contacting peer decides for 0 with probability: [expression missing from transcript]
  ♪ It decides for 1 otherwise, since we need more peers in sub-partition 1
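The rules above can be explored with a toy simulation in which the two probabilities are plain inputs. The names `alpha` (balanced-split probability) and `beta` (probability of deciding for 0 after meeting a peer in 1) are placeholders chosen here; the talk's actual expressions for them are not preserved in the transcript, and this sketch does not try to reproduce them. It also assumes that when two undecided peers meet and do not split, both simply stay undecided.

```python
import math
import random

def simulate_aep(n: int, alpha: float, beta: float, seed: int = 0) -> float:
    """Return the fraction of peers that end up in sub-partition 0."""
    if n < 2 or not 0 < alpha <= 1 or not 0 <= beta <= 1:
        raise ValueError("need n >= 2, 0 < alpha <= 1 and 0 <= beta <= 1")
    rng = random.Random(seed)
    side = [None] * n         # None = undecided, otherwise 0 or 1
    undecided = n
    while undecided > 0:
        a = rng.randrange(n)
        if side[a] is not None:
            continue          # only undecided peers initiate contact
        b = rng.randrange(n)
        if b == a:
            continue
        if side[b] is None:
            if rng.random() < alpha:      # balanced split
                side[a] = rng.randrange(2)
                side[b] = 1 - side[a]
                undecided -= 2
            # Assumption: otherwise both peers remain undecided.
        elif side[b] == 0:
            side[a] = 1                   # always join the larger partition
            undecided -= 1
        else:
            side[a] = 0 if rng.random() < beta else 1
            undecided -= 1
    return side.count(0) / n

frac_eager = simulate_aep(20000, alpha=1.0, beta=1.0, seed=1)
frac_skewed = simulate_aep(20000, alpha=1.0, beta=0.0, seed=1)
```

With `alpha = 1` and `beta = 1` the sketch reduces to eager partitioning and splits the peers roughly evenly. Under the assumptions stated above, `alpha = 1` with `beta = 0` puts a fraction of about 1 - ln(2) ≈ 0.31 of the peers in sub-partition 0, which appears consistent with the boundary 1-log(2) between the two parameter regimes.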

Adaptive eager partitioning: choice of parameters
♫ Markovian analysis of the interactions
  ♪ Parameterized equations for the two probabilities [symbols missing from transcript]
♫ 0 ≤ p ≤ 1-log(2)
  ♪ [equation missing from transcript]
♫ 1-log(2) ≤ p ≤ 0.5
  ♪ [equation missing from transcript]

AEP: Without global knowledge of p
♫ If we only have local estimates of p
  ♪ Error analysis: What is the distribution of the estimates, and how does it affect the partitioning process?
  ♪ Introduces systematic skew
  ♪ Favors the larger partition
  ♪ Compensating the skew

COR: Skew compensated for AEP

Algorithmic Issues: Overlay Construction
♫ Initiating the indexing process
♫ Synchronizing and terminating the process
  ♪ Synchronizing replicas
♫ Complexity
  ♪ Latency: O(log^2(n)) - linear for sequential processes
  ♪ Communication: O(n log^2(n)) - same as in sequential processes

Simulation results
♫ Discrete time simulation
  ♪ Mathematica based proprietary simulator
♫ Workloads
  ♪ Uniform, Pareto, Normal, real text collection from IR apps. (EU project: Alvis)
♫ Evaluation
  ♪ Deviation w.r.t. what is obtained by the globally coordinated algorithm
  ♪ Measured in terms of the Euclidean distance
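The deviation measure can be written down in a few lines. The slide does not say exactly which quantities are compared, so this sketch assumes two equally long vectors, for example the number of peers per partition obtained by a decentralized run and by the globally coordinated algorithm; the function name and example numbers are illustrative only.

```python
import math

def deviation(observed, reference):
    """Euclidean distance between two equally long vectors."""
    if len(observed) != len(reference):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((o - r) ** 2 for o, r in zip(observed, reference)))

reference = [2, 2, 2, 2]   # e.g. peers per partition, coordinated algorithm
observed = [3, 1, 2, 2]    # e.g. peers per partition, decentralized run
dev = deviation(observed, reference)
```

A deviation of 0 means the decentralized run reproduced the reference assignment exactly; larger values mean a poorer load balance.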

Simulation results: How useful is the theory?
[Figure: Theory vs. Heuristic (256 peers); deviation per load distribution. Load-distribution legend: U: Uniform, P: Pareto, N: Normal, A: Alvis IR proj. text]

Quality of load-balancing w.r.t. peer population
[Figure: deviation for experiments over peer population and load distribution. Load-distribution legend: U: Uniform, P: Pareto, N: Normal]

Scalability
[Figure: interactions required per peer for overlay construction, for experiments over peer population and load distribution. Load-distribution legend: U: Uniform, P: Pareto, N: Normal, A: Alvis IR proj. text]

From theory to practice: PlanetLab experiments
♫ PlanetLab testbed
  ♪ 400+ computers spread over various organizations and continents ( lab.org)
♫ Java implementation integrated with P-Grid
  ♪ P-Grid is a full-fledged P2P software
♫ Workload
  ♪ Text from IR applications studied under EU project Alvis
[Figure: experiment timeline: bootstrap the peers and form an unstructured network; structured overlay construction; experiments evaluating search performance; churn]
[Figure: Simulation vs. Expts; axes: deviation, peers; series: Sim, Expt]
"All models are wrong, but some are useful." - George E.P. Box

Bandwidth consumption
[Figure: bandwidth consumption over the experiment period, covering the overlay construction phase and the overlay operational phase]
♪ The construction process involves sorting keys.
♪ Initially it has a higher bandwidth requirement.
♪ (Later) In the operational phase, the queries dominate the bandwidth consumption.

Overlay performance
♪ Overlay construction was complete and peers discovered all their replicas
♪ Plots show absolute query latency
♪ In terms of overlay hops, experiments match theory
♪ Churn leads to larger deviation, but 95% to 100% success rate
[Figure: query latency over the experiment period, with churn and with no churn]

Related work
♫ Mostly sequential construction
  ♪ Recent work on fast overlay construction [SPAA 2005]
  ♪ Does not deal with load-balancing
♫ Load-balancing
  ♪ Mostly addresses uniform load-distribution case
  ♪ Some work on skewed loads [e.g., VLDB 2004]
  ♪ Incremental load/peer population changes
  ♪ No dynamic adaptation of replication

♫ Java implementation source-code available for download
♫ Also: Range query, IEEE P2P 2005