XtreemOS IP project is funded by the European Commission under contract IST-FP6-033576 XtreemOS WP3.2 - T3.2.3 Scalable Directory Service Design State.

Slides:



Advertisements
Similar presentations
Peer-to-Peer Infrastructure and Applications Andrew Herbert Microsoft Research, Cambridge
Advertisements

Efficient Event-based Resource Discovery Wei Yan*, Songlin Hu*, Vinod Muthusamy +, Hans-Arno Jacobsen +, Li Zha* * Chinese Academy of Sciences, Beijing.
Peer to Peer and Distributed Hash Tables
Pastry Peter Druschel, Rice University Antony Rowstron, Microsoft Research UK Some slides are borrowed from the original presentation by the authors.
Digital Library Service – An overview Introduction System Architecture Components and their functionalities Experimental Results.
Scalable Content-Addressable Network Lintao Liu
Peer-to-Peer (P2P) Distributed Storage 1Dennis Kafura – CS5204 – Operating Systems.
A P2P REcommender system based on Gossip Overlays (PREGO) ‏ R.Baraglia, P.Dazzi M.Mordacchini, L.Ricci A P2P REcommender system based on Gossip Overlays.
Building of P2P Overlay Networks via Voronoi and Gossip Ranieri Baraglia.
Multidimensional Indexing
CHORD – peer to peer lookup protocol Shankar Karthik Vaithianathan & Aravind Sivaraman University of Central Florida.
Searching on Multi-Dimensional Data
Chord: A Scalable Peer-to-peer Lookup Protocol for Internet Applications Speaker: Cathrin Weiß 11/23/2004 Proseminar Peer-to-Peer Information Systems.
Chord: A scalable peer-to- peer lookup service for Internet applications Ion Stoica, Robert Morris, David Karger, M. Frans Kaashock, Hari Balakrishnan.
Sylvia Ratnasamy, Paul Francis, Mark Handley, Richard Karp, Scott Schenker Presented by Greg Nims.
Pastry Peter Druschel, Rice University Antony Rowstron, Microsoft Research UK Some slides are borrowed from the original presentation by the authors.
Common approach 1. Define space: assign random ID (160-bit) to each node and key 2. Define a metric topology in this space,  that is, the space of keys.
Spatial Indexing I Point Access Methods. PAMs Point Access Methods Multidimensional Hashing: Grid File Exponential growth of the directory Hierarchical.
Presented by Elisavet Kozyri. A distributed application architecture that partitions tasks or work loads between peers Main actions: Find the owner of.
Applications over P2P Structured Overlays Antonino Virgillito.
Scalable Resource Information Service for Computational Grids Nian-Feng Tzeng Center for Advanced Computer Studies University of Louisiana at Lafayette.
1 IMPROVING RESPONSIVENESS BY LOCALITY IN DISTRIBUTED VIRTUAL ENVIRONMENTS Luca Genovali, Laura Ricci, Fabrizio Baiardi Lucca Institute for Advanced Studies.
Scalable and Distributed Similarity Search in Metric Spaces Michal Batko Claudio Gennaro Pavel Zezula.
Overlay Networks EECS 122: Lecture 18 Department of Electrical Engineering and Computer Sciences University of California Berkeley.
Sericce and Resource Discovery Supports over P2P Overlays Emanuele Carlini, Massimo Coppola Patrizio Dazzi, Domenico Laforenza, Laura Ricci SERVICE AND.
1 Geometric index structures April 15, 2004 Based on GUW Chapter , [Arge01] Sections 1, 2.1 (persistent B- trees), 3-4 (static versions.
Object Naming & Content based Object Search 2/3/2003.
Topics in Reliable Distributed Systems Fall Dr. Idit Keidar.
1 CS 194: Distributed Systems Distributed Hash Tables Scott Shenker and Ion Stoica Computer Science Division Department of Electrical Engineering and Computer.
Spatial Indexing I Point Access Methods. Spatial Indexing Point Access Methods (PAMs) vs Spatial Access Methods (SAMs) PAM: index only point data Hierarchical.
Peer-to-Peer Networks Slides largely adopted from Ion Stoica’s lecture at UCB.
ICDE A Peer-to-peer Framework for Caching Range Queries Ozgur D. Sahin Abhishek Gupta Divyakant Agrawal Amr El Abbadi Department of Computer Science.
Spatial Indexing I Point Access Methods. Spatial Indexing Point Access Methods (PAMs) vs Spatial Access Methods (SAMs) PAM: index only point data Hierarchical.
Focus on Distributed Hash Tables Distributed hash tables (DHT) provide resource locating and routing in peer-to-peer networks –But, more than object locating.
DOMAIN NAME SYSTEM. Introduction  There are several applications that follow client server paradigm.  The client/server programs can be divided into.
Roger ZimmermannCOMPSAC 2004, September 30 Spatial Data Query Support in Peer-to-Peer Systems Roger Zimmermann, Wei-Shinn Ku, and Haojun Wang Computer.
Achieving fast (approximate) event matching in large-scale content- based publish/subscribe networks Yaxiong Zhao and Jie Wu The speaker will be graduating.
Other Structured P2P Systems CAN, BATON Lecture 4 1.
A Distributed Architecture for Multi-dimensional Indexing and Data Retrieval in Grid Environments Athanasia Asiki, Katerina Doka, Ioannis Konstantinou,
Peer-to-Peer Support for Massively Multiplayer Games Zone Federation of Game Servers : a Peer-to-Peer Approach to Scalable Multi-player Online Games [INFOCOM.
SEMILARITY JOIN COP6731 Advanced Database Systems.
Chord: A Scalable Peer-to-peer Lookup Protocol for Internet Applications Xiaozhou Li COS 461: Computer Networks (precept 04/06/12) Princeton University.
Document retrieval Similarity –Vector space model –Multi dimension Search –Range query –KNN query Query processing example.
1 Distributed Hash Tables (DHTs) Lars Jørgen Lillehovde Jo Grimstad Bang Distributed Hash Tables (DHTs)
Content Addressable Network CAN. The CAN is essentially a distributed Internet-scale hash table that maps file names to their location in the network.
Handling Spatial Data In P2P Systems Verena Kantere, Timos Sellis, Yannis Kouvaras.
Multidimensional Indexes Applications: geographical databases, data cubes. Types of queries: –partial match (give only a subset of the dimensions) –range.
Peer-to-Peer Name Service (P2PNS) Ingmar Baumgart Institute of Telematics, Universität Karlsruhe IETF 70, Vancouver.
Neural Networks - Lecture 81 Unsupervised competitive learning Particularities of unsupervised learning Data clustering Neural networks for clustering.
1. Outline  Introduction  Different Mechanisms Broadcasting Multicasting Forward Pointers Home-based approach Distributed Hash Tables Hierarchical approaches.
1. Efficient Peer-to-Peer Lookup Based on a Distributed Trie 2. Complex Queries in DHT-based Peer-to-Peer Networks Lintao Liu 5/21/2002.
LOOKING UP DATA IN P2P SYSTEMS Hari Balakrishnan M. Frans Kaashoek David Karger Robert Morris Ion Stoica MIT LCS.
Tanenbaum & Van Steen, Distributed Systems: Principles and Paradigms, 2e, (c) 2007 Prentice-Hall, Inc. All rights reserved DISTRIBUTED SYSTEMS.
INTERNET TECHNOLOGIES Week 10 Peer to Peer Paradigm 1.
P2P Search COP6731 Advanced Database Systems. P2P Computing  Powerful personal computer Share computing resources P2P Computing  Advantages: Shared.
P2P Search COP P2P Search Techniques Centralized P2P systems  e.g. Napster, Decentralized & unstructured P2P systems  e.g. Gnutella.
NCLAB 1 Supporting complex queries in a distributed manner without using DHT NodeWiz: Peer-to-Peer Resource Discovery for Grids Sujoy Basu, Sujata Banerjee,
Incrementally Improving Lookup Latency in Distributed Hash Table Systems Hui Zhang 1, Ashish Goel 2, Ramesh Govindan 1 1 University of Southern California.
Chapter 29 Peer-to-Peer Paradigm Copyright © The McGraw-Hill Companies, Inc. Permission required for reproduction or display.
CS Spring 2010 CS 414 – Multimedia Systems Design Lecture 24 – Introduction to Peer-to-Peer (P2P) Systems Klara Nahrstedt (presented by Long Vu)
Multidimensional Access Structures
Distributed Hash Tables
COS 461: Computer Networks
Spatial Indexing I Point Access Methods.
EE 122: Peer-to-Peer (P2P) Networks
5.2 FLAT NAMING.
Multidimensional Indexes
COS 461: Computer Networks
A Semantic Peer-to-Peer Overlay for Web Services Discovery
Consistent Hashing and Distributed Hash Table
Presentation transcript:

XtreemOS IP project is funded by the European Commission under contract IST-FP XtreemOS WP3.2 - T3.2.3 Scalable Directory Service Design State of Arts and Proposals Martina Baldanzi, Massimo Coppola, Domenico Laforenza, Laura Ricci ISTI/CNR Pisa

U 2 WP3.2 Meeting, Amsterdam, 09/10/2007 DISTRIBUTED DIRECTORY SERVICES Consider a system where any resources R is defined by k attributes. R corresponds to a point in a k-dimensional case Directory services (DS) = A service returning the name(s) or address(es) of all the items (resources) characterized by a k given values of the attributes. Some functionalities of the DS: ● indexing of distributed resources ● complex queries requiring resources satisfying a set of constraints ● dynamic attributes ● pub/sub functionalities. notification of a resource state updates

U 3 WP3.2 Meeting, Amsterdam, 09/10/2007 COMPLEX QUERIES: CLASSIFICATION Queries may be classifed according to the number of attributes considered by the query  k-dimensional queries, k, k  2 (DHT support 1-dimensional queries only) type of attributes: static, dynamic, semi-dynamic constraint defined on each attribute  exact match query: Arch.='x86' and CPU-Speed='3 Ghz' and RAM='256MB'  partial match queries: CPU-Speed='3 Ghz' and RAM='256MB' (and Arch.=*)  range queries 1Ghz<CPU-Speed<'3Ghz' and 512MB<RAM<1Gb  similarity queries (o nearest neighbour queries)  require the definition of a metric in the attribute space  the user submit an exact match query, which defines a point P in the attribute space. P may not correspond to any resource.  Output: k resources nearest to P, according to the defined metric

U 4 WP3.2 Meeting, Amsterdam, 09/10/2007 COMPLEX QUERIES: CLASSIFICATION Query Uni-Dimensional Multi-Dimensional Exact Match Partial MatchRange Queries Similarity Queries  Data returned by these queries are close in the attribute space  Data locality is destroyed by the hash mapping defined in DHT. Points, i.e. resources, close in the attribute space may be mapped to nodes far from each other on the overlay  This is due to the uniform mapping defined by the hash function  Definition of an indexing layer above the DHT to recover loss of locality

U 5 WP3.2 Meeting, Amsterdam, 09/10/2007 NODE ARCHITECTURE Query Layer Indexing Layer TCP/IP DHT Layer DHT (Chord, Pastry, CAN, Kademlia...) put(key, data) Look-up (key)

U 6 WP3.2 Meeting, Amsterdam, 09/10/2007 EXISTING APPROACHES DHT based approaches Locality preserving hash functions  1-dimensional-range queries (MAAN,CHORD#)  k-dimesional range queries: locality preserving mapping from a k- dimensional space to a 1-dimensional space based on space filling curves (Squid) Space partitioning based approaches  The k-dimensional attribute space is partitioned into a set of zones  The granularity of the domain of the hash function is increased  Mapping of zones to peers (Gao-Steenkiste, Ratnasamy,....) Other approaches (not DHT based)  Voronoi based overlays definition Important Remark: No proposal defines a single framework for range queries, multidimensional queries, dynamic attributes

U 7 WP3.2 Meeting, Amsterdam, 09/10/2007 K-DIMENSIONAL SPACES ON DHT Red-blue Line= Space filling curve Space linearization: Each point of the k-dimensional space is mapped to a point of the blue-red line Points on the blue-red line are indexed by integer numbers. Mapping of the points of the line on the DHT: keys= indexes of points Example: Point (010, 010) (red point ) is mapped on Successor(001000). Points close on the blue-red line  are close in the k-dimensional space  are mapped to the same node of the DHT or to close nodes

U 8 WP3.2 Meeting, Amsterdam, 09/10/2007 SPACE FILLING CURVES: QUERY RESOLUTION ● Range Query Resolution points which are close on the red line are also close in the k- dimensional space Points close in the k-dimensional space may be not close on the red line ● Cluster = set of points belonging to the same segment of the red line ● Range Query Resolution Detection of clusters covering data covered by the query For each cluster, the query is sent to the nodes storing that cluster A single message for each cluster. Range query (101, ) covers the orange zone defines two different clusters

U 9 WP3.2 Meeting, Amsterdam, 09/10/2007 SPACE PARTITIONING APPROACHES ● Attributes space is partitioned into zones ● Each zone is assigned to a different peer ● Hash functions maps zones, instead that single points. ● Attribute space partitioning is described by a tree-like index structure which is distributed to the nodes of the system ● Given a k-dimensional query corresponding to a point P in the k-dimensional space, the tree is exploited to detect the zone Z including P ● The querying node exploits the hash function to map Z to a node

U 10 WP3.2 Meeting, Amsterdam, 09/10/2007 SPACE PARTITIONING APPROACHES Open research problems: Range query support Definition of highly distributed data structures Replication/consistency of the data structure Dynamic indexes Load Balancing techniques

U 11 WP3.2 Meeting, Amsterdam, 09/10/2007 FIRST PERIOD PROTOTYPE The prototype developed in the first period of the project must integrate k-dimensional and range queries within a single framework  based on locality preserving hash functions  dynamic attributes?  experiments on a real distributed platform (GRID 5000) Afterwards, investigate more complex solutions  Tree based indexes  Space filling curves

U 12 WP3.2 Meeting, Amsterdam, 09/10/2007 FIRST PERIOD PROTOTYPE A directory service supporting k-dimensional range queries: based on DHT exploiting locality preserving hash functions (Locality Preserving Bamboo, Chord#,...?) load balancing: insertion of new nodes in 'crowded regions' k-dimensional range queries  replication. A resource R defined by k attributes is registered under k different keys. One key for each attribute value  k-dimensional Range Queries  simple, but inefficient solution: intersection of k 1-dimensional range sub-queries (one for each attribute).  improvement: – definition of a dominant attribute. decreasing the size of the search space – A query for the dominant attribute, sub-query managed by each detected node

U 13 WP3.2 Meeting, Amsterdam, 09/10/2007 FIRST PERIOD PROTOTYPE

U 14 WP3.2 Meeting, Amsterdam, 09/10/2007 DYNAMIC ATTRIBUTES detect static and dynamic attributes define groups of resources characterized by the same values of a static attribute  Ex: all the host with CPU=2.5GHz define a multicast group G for each group of resources Hashing is applied to the static attribute. The resulting node returns a node R acting as the root of the a multicast tree associated to G. R forward the query to any peer P belonging to the multicast group. Each P checks the value of the dynamic attributes Exploit the DHT routing level (ex: Scribe application level multicast on Pastry) to define an application level multicast