ICDE 2004 1 A Peer-to-peer Framework for Caching Range Queries Ozgur D. Sahin Abhishek Gupta Divyakant Agrawal Amr El Abbadi Department of Computer Science.

Slides:



Advertisements
Similar presentations
CAN 1.Distributed Hash Tables a)DHT recap b)Uses c)Example – CAN.
Advertisements

P2P data retrieval DHT (Distributed Hash Tables) Partially based on Hellerstein’s presentation at VLDB2004.
Ion Stoica, Robert Morris, David Karger, M. Frans Kaashoek, Hari Balakrishnan MIT and Berkeley presented by Daniel Figueiredo Chord: A Scalable Peer-to-peer.
Peer to Peer and Distributed Hash Tables
Data Currency in Replicated DHTs Reza Akbarinia, Esther Pacitti and Patrick Valduriez University of Nantes, France, INIRA ACM SIGMOD 2007 Presenter Jerry.
Digital Library Service – An overview Introduction System Architecture Components and their functionalities Experimental Results.
Scalable Content-Addressable Network Lintao Liu
Peer-to-Peer Systems Chapter 25. What is Peer-to-Peer (P2P)? Napster? Gnutella? Most people think of P2P as music sharing.
Sylvia Ratnasamy, Paul Francis, Mark Handley, Richard Karp, Scott Schenker Presented by Greg Nims.
Expediting Searching Processes via Long Paths in P2P Systems 05/30 IDEA Lab.
An Overview of Peer-to-Peer Networking CPSC 441 (with thanks to Sami Rollins, UCSB)
Thomas ZahnCST1 Seminar: Information Management in the Web Query Processing Over Peer- to-Peer Data Sharing Systems (UC Santa Barbara)
Peer-to-Peer Networks João Guerreiro Truong Cong Thanh Department of Information Technology Uppsala University.
A Scalable Content Addressable Network (CAN)
Peer to Peer File Sharing Huseyin Ozgur TAN. What is Peer-to-Peer?  Every node is designed to(but may not by user choice) provide some service that helps.
Storage Management and Caching in PAST, a large-scale, persistent peer- to-peer storage utility Authors: Antony Rowstorn (Microsoft Research) Peter Druschel.
Fault-tolerant Routing in Peer-to-Peer Systems James Aspnes Zoë Diamadi Gauri Shah Yale University PODC 2002.
A Scalable Content-Addressable Network Authors: S. Ratnasamy, P. Francis, M. Handley, R. Karp, S. Shenker University of California, Berkeley Presenter:
SkipNet: A Scalable Overlay Network with Practical Locality Properties Nick Harvey, Mike Jones, Stefan Saroiu, Marvin Theimer, Alec Wolman Microsoft Research.
Distributed Lookup Systems
XtreemOS IP project is funded by the European Commission under contract IST-FP XtreemOS WP3.2 - T3.2.3 Scalable Directory Service Design State.
Object Naming & Content based Object Search 2/3/2003.
Chord-over-Chord Overlay Sudhindra Rao Ph.D Qualifier Exam Department of ECECS.
Freenet A Distributed Anonymous Information Storage and Retrieval System I Clarke O Sandberg I Clarke O Sandberg B WileyT W Hong.
Topics in Reliable Distributed Systems Fall Dr. Idit Keidar.
1 CS 194: Distributed Systems Distributed Hash Tables Scott Shenker and Ion Stoica Computer Science Division Department of Electrical Engineering and Computer.
1 Seminar: Information Management in the Web Gnutella, Freenet and more: an overview of file sharing architectures Thomas Zahn.
Peer-to-Peer Networks Slides largely adopted from Ion Stoica’s lecture at UCB.
Peer-to-peer file-sharing over mobile ad hoc networks Gang Ding and Bharat Bhargava Department of Computer Sciences Purdue University Pervasive Computing.
Storage management and caching in PAST PRESENTED BY BASKAR RETHINASABAPATHI 1.
INTRODUCTION TO PEER TO PEER NETWORKS Z.M. Joseph CSE 6392 – DB Exploration Spring 2006 CSE, UT Arlington.
1 A scalable Content- Addressable Network Sylvia Rathnasamy, Paul Francis, Mark Handley, Richard Karp, Scott Shenker Pirammanayagam Manickavasagam.
Roger ZimmermannCOMPSAC 2004, September 30 Spatial Data Query Support in Peer-to-Peer Systems Roger Zimmermann, Wei-Shinn Ku, and Haojun Wang Computer.
Other Structured P2P Systems CAN, BATON Lecture 4 1.
A Survey of Peer-to-Peer Content Distribution Technologies Stephanos Androutsellis-Theotokis and Diomidis Spinellis ACM Computing Surveys, December 2004.
Introduction to Peer-to-Peer Networks. What is a P2P network A P2P network is a large distributed system. It uses the vast resource of PCs distributed.
CONTENT ADDRESSABLE NETWORK Sylvia Ratsanamy, Mark Handley Paul Francis, Richard Karp Scott Shenker.
A Distributed Architecture for Multi-dimensional Indexing and Data Retrieval in Grid Environments Athanasia Asiki, Katerina Doka, Ioannis Konstantinou,
Sylvia Ratnasamy (UC Berkley Dissertation 2002) Paul Francis Mark Handley Richard Karp Scott Shenker A Scalable, Content Addressable Network Slides by.
Chord: A Scalable Peer-to-peer Lookup Protocol for Internet Applications Xiaozhou Li COS 461: Computer Networks (precept 04/06/12) Princeton University.
Caching and Data Consistency in P2P Dai Bing Tian Zeng Yiming.
Routing Indices For P-to-P Systems ICDCS Introduction Search in a P2P system –Mechanisms without an index –Mechanisms with specialized index nodes.
Vincent Matossian September 21st 2001 ECE 579 An Overview of Decentralized Discovery mechanisms.
Ion Stoica, Robert Morris, David Karger, M. Frans Kaashoek, Hari Balakrishnan MIT and Berkeley presented by Daniel Figueiredo Chord: A Scalable Peer-to-peer.
Content Addressable Network CAN. The CAN is essentially a distributed Internet-scale hash table that maps file names to their location in the network.
1 Slides from Richard Yang with minor modification Peer-to-Peer Systems: DHT and Swarming.
A Scalable Content-Addressable Network (CAN) Seminar “Peer-to-peer Information Systems” Speaker Vladimir Eske Advisor Dr. Ralf Schenkel November 2003.
An IP Address Based Caching Scheme for Peer-to-Peer Networks Ronaldo Alves Ferreira Joint work with Ananth Grama and Suresh Jagannathan Department of Computer.
SIGCOMM 2001 Lecture slides by Dr. Yingwu Zhu Chord: A Scalable Peer-to-peer Lookup Service for Internet Applications.
Content Addressable Networks CAN is a distributed infrastructure, that provides hash table-like functionality on Internet-like scales. Keys hashed into.
Page 1 MD-HBase: A Scalable Multi-dimensional Data Infrastructure for Location Aware Services Shoji Nishimura (NEC Service Platforms Labs.), Sudipto Das,
1 Peer-to-Peer Technologies Seminar by: Kunal Goswami (05IT6006) School of Information Technology Guided by: Prof. C.R.Mandal, School of Information Technology.
Scalable Content- Addressable Networks Prepared by Kuhan Paramsothy March 5, 2007.
Peer to Peer A Survey and comparison of peer-to-peer overlay network schemes And so on… Chulhyun Park
1 Distributed Hash Table CS780-3 Lecture Notes In courtesy of Heng Yin.
BATON A Balanced Tree Structure for Peer-to-Peer Networks H. V. Jagadish, Beng Chin Ooi, Quang Hieu Vu.
Algorithms and Techniques in Structured Scalable Peer-to-Peer Networks
LOOKING UP DATA IN P2P SYSTEMS Hari Balakrishnan M. Frans Kaashoek David Karger Robert Morris Ion Stoica MIT LCS.
Two Peer-to-Peer Networking Approaches Ken Calvert Net Seminar, 23 October 2001 Note: Many slides “borrowed” from S. Ratnasamy’s Qualifying Exam talk.
P2P Search COP6731 Advanced Database Systems. P2P Computing  Powerful personal computer Share computing resources P2P Computing  Advantages: Shared.
P2P Search COP P2P Search Techniques Centralized P2P systems  e.g. Napster, Decentralized & unstructured P2P systems  e.g. Gnutella.
CSCI 599: Beyond Web Browsers Professor Shahram Ghandeharizadeh Computer Science Department Los Angeles, CA
Plethora: A Locality Enhancing Peer-to-Peer Network Ronaldo Alves Ferreira Advisor: Ananth Grama Co-advisor: Suresh Jagannathan Department of Computer.
Distributed Caching and Adaptive Search in Multilayer P2P Networks Chen Wang, Li Xiao, Yunhao Liu, Pei Zheng The 24th International Conference on Distributed.
CS Spring 2010 CS 414 – Multimedia Systems Design Lecture 24 – Introduction to Peer-to-Peer (P2P) Systems Klara Nahrstedt (presented by Long Vu)
CS 268: Lecture 22 (Peer-to-Peer Networks)
Early Measurements of a Cluster-based Architecture for P2P Systems
EE 122: Peer-to-Peer (P2P) Networks
A Scalable content-addressable network
Presentation transcript:

ICDE A Peer-to-peer Framework for Caching Range Queries Ozgur D. Sahin Abhishek Gupta Divyakant Agrawal Amr El Abbadi Department of Computer Science University of California at Santa Barbara

ICDE Outline Motivation Range mapping System overview Experimental results Conclusion and future work

University of California at Santa Barbara ICDE Motivation All queries are answered by the server Server is overloaded Scalability, availability Same/similar queries are evaluated multiple times Clients Central Data Server

University of California at Santa Barbara ICDE Motivation Users share their cached answers Server is contacted only if the P2P layer cannot find an answer Clients Central Data Server P2P Cache

University of California at Santa Barbara ICDE P2P Systems File sharing: Napster, Gnutella, KaZaA, … Central index or flooding Structured P2P systems: CAN, Chord, Pastry, Tapestry, … DHT/DOLR Efficient Routing: logarithmic/sublinear

University of California at Santa Barbara ICDE CAN Uses a d-dimensional virtual space for routing and object location Virtual space is partitioned into zones and each zone is maintained by a peer Every peer is responsible for the objects that are hashed into its zone 2-dimensional CAN

University of California at Santa Barbara ICDE Extending DHT functionality DHTs are designed for exact-match queries Piazza [Univ. of Washington], Hyperion [Univ. of Toronto], PIER [UC Berkeley] Extend DHTs for supporting range queries Selection of ranges is a primary operation for any kind of data analysis Main Goal: Utilize a DHT in order to materialize and locate cached answers of range queries

University of California at Santa Barbara ICDE Range Queries Given a range query, find the cached answers that can be used to compute the query answer Example: If the result of is already cached in the system, then the query can be answered using the cached result is subsumed by ; so the cached result is the super-set of the answer

University of California at Santa Barbara ICDE Use range string as key: Query: Hash string: “ ” DHTs for locating ranges Can we use original DHTs? Finds exact answers but not the similar ones!

University of California at Santa Barbara ICDE Extending CAN For single attribute, the virtual space is a 2- dimensional CAN The boundaries are determined by the domain of the range attribute x y x Virtual space when attribute domain is [20,80]

University of California at Santa Barbara ICDE Mapping Scheme Range is mapped to point (x,y) Super-ranges are only in the upper- left region (40,60) (20,20)(80,20) (80,80)(20,80) (40,60) (30,70) (30,50) Start value End value

University of California at Santa Barbara ICDE Space Partitioning Virtual space is partitioned into rectangular zones Each zone is assigned to an active peer With this mapping, the data source is responsible for the top-left zone

University of California at Santa Barbara ICDE Space Partitioning Active/Passive peers Passive Peers Active Peers S S Data Source

University of California at Santa Barbara ICDE S Space Partitioning Active/Passive peers Passive Peers Active Peers

University of California at Santa Barbara ICDE S Space Partitioning Active/Passive peers Passive Peers Active Peers

University of California at Santa Barbara ICDE S Space Partitioning Active/Passive peers Passive Peers Active Peers Each active peer keeps a list of passive peers Passive peers register with active peers

University of California at Santa Barbara ICDE Zone Split An active peer splits its zone when it is overloaded Load can be due to storage or bandwidth, etc. Split line is selected by the owner of the zone Even partitioning of the zone and the cached results New zone is assigned to a passive peer

University of California at Santa Barbara ICDE Routing Same as in CAN (Greedy routing) Each zone passes the message to the neighbor closest to the destination (50,55) Query:

University of California at Santa Barbara ICDE A Sharing cached answers Map the range to a point and send a notification message towards that point Destination peer keeps the index information P 1) caches 2) notify : P.. Local index at A 3) insert to index

University of California at Santa Barbara ICDE A Querying Map the range to a point and send a query message towards that point Destination peer searches the local index C requires 1) query : P.. Local index at A 2) return P P 3) transfer

University of California at Santa Barbara ICDE Forwarding The zones on the upper- left region may have super-ranges Destination zone forwards the request to upper-left zones (50,55) If no result is found at the destination, then…

University of California at Santa Barbara ICDE Acceptable Fit (50,55) How far to forward?  Forwarding is controlled by a parameter: AcceptableFit  It is a real value between [0,1]: offset = AcceptableFit x |domain|  Acceptable range for a range query is then: offset

University of California at Santa Barbara ICDE Forwarding Schemes Two schemes for forwarding: Flooding: Flood to all candidate zones Directed Forwarding: Iteratively forward to a single neighbor, that has the largest overlap with the acceptable region Stop if a result is found or a certain number of peers are contacted (DirectedLimit)

University of California at Santa Barbara ICDE Flooding vs. Directed Forwarding FloodingDirected Forwarding (Directed Limit=2)

University of California at Santa Barbara ICDE Updates Tuple with value 40 is updated! (40,40) Go to the corresponding point, (40,40), and flood to the upper-left region Costly, so we need better solutions Batching updates

University of California at Santa Barbara ICDE Forwarding Decreasing coordinates along odd dimensions Increasing coordinates along even dimensions Multiple range attributes Each attribute maps to two dimensions A range query over k attributes is mapped to a point in 2k-dimensional CAN ( 20<A<40, 50<B<60 ) (20,40,50,60) ( 10<A<50, 40<B<70 )  (10,50,40,70) ( -, -, -, - )

University of California at Santa Barbara ICDE Experiment Settings Single attribute with domain [0,500] The system is initially empty Range queries are selected uniformly at random For every zone: Split Point=5, Routing Threshold=3

University of California at Santa Barbara ICDE Flooding vs. Directed Forwarding Performance with flood forwardingPerformance with directed forwarding

University of California at Santa Barbara ICDE Routing is scalable Visited zones with Flood forwardingVisited zones with Directed forwarding

University of California at Santa Barbara ICDE Load Distribution 1000 peers, queries

University of California at Santa Barbara ICDE Conclusion and Future Work We presented a simple yet powerful mapping for ranges which allows us to leverage DHT infrastructure for range queries Limitations/Future Work Number of attributes should be fixed Does not work with other DHTs Assumes the existence of passive peers for load balancing

University of California at Santa Barbara ICDE Questions?