Published by Marsha Paul. Modified over 8 years ago.
1
Data Indexing in Peer-to-Peer DHT Networks. Garces-Erice, P.A. Felber, E.W. Biersack, G. Urvoy-Keller, K.W. Ross. ICDCS 2004.
2
DHT Structure. A P2P Distributed Hash Table provides the mapping between a file identifier and the file's location. Ex: search for the file "Starwars.divx". Convert "Starwars.divx" to a key, say "123456789". Look up "123456789" in the DHT to find the file location. Download the file.
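The lookup steps on this slide can be sketched in a few lines. This is a toy, single-process model and not the system from the paper: the SHA-1 hash, the successor-style placement of keys on a ring, and the names `ToyDHT`, `key_for`, `node-a`, and `peer-42` are all assumptions made for illustration.

```python
import hashlib
from bisect import bisect_left


def key_for(name: str, bits: int = 32) -> int:
    """Hash a name to an integer key (SHA-1 is an assumption, not from the slides)."""
    digest = hashlib.sha1(name.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (1 << bits)


class ToyDHT:
    """In-memory stand-in for a DHT: each key belongs to one node on a ring."""

    def __init__(self, node_names):
        # Nodes are placed on the ring by hashing their (hypothetical) names.
        self.ring = sorted((key_for(n), n) for n in node_names)
        self.store = {n: {} for n in node_names}

    def node_for(self, key: int) -> str:
        # The responsible node is the first node id >= key, wrapping around.
        ids = [node_id for node_id, _ in self.ring]
        return self.ring[bisect_left(ids, key) % len(self.ring)][1]

    def put(self, name: str, location: str) -> None:
        key = key_for(name)
        self.store[self.node_for(key)][key] = location

    def get(self, name: str):
        key = key_for(name)
        return self.store[self.node_for(key)].get(key)


dht = ToyDHT(["node-a", "node-b", "node-c"])
dht.put("Starwars.divx", "peer-42:/shared/Starwars.divx")
location = dht.get("Starwars.divx")  # where to download the file from
```

A real DHT routes the lookup across many machines; here one object plays every node so that only the name-to-key-to-location mapping is visible.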
3
Indexing. Indexes do not contain a key-to-data mapping. They provide a key-to-key service or, more precisely, a query-to-query service. Ex: a query q returns a list of more specific queries covered by q. The user selects one of these queries and repeats the lookup. If the selected query is the most specific query of a file, the lookup returns the file.
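One way to picture "covered by" is to treat a query as a set of field constraints, so that a broad query covers any query that repeats its constraints and possibly adds more. The slides do not define the relation formally, so the representation below, the function name `covers`, and the sample field values are assumptions.

```python
def covers(general: dict, specific: dict) -> bool:
    """Return True if every constraint in `general` also appears in `specific`."""
    return all(specific.get(field) == value for field, value in general.items())


# Hypothetical bibliographic queries, from broad to most specific.
q_author = {"author": "Smith"}
q_author_title = {"author": "Smith", "title": "TCP"}
most_specific = {"author": "Smith", "title": "TCP", "year": "1996"}

broad_covers_narrow = covers(q_author, q_author_title)
narrow_covers_broad = covers(q_author_title, q_author)
```

Under this assumption the index only ever maps a query to queries that it covers, which is what lets a user refine a broad query step by step.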
5
Maintenance. To maintain an index that consists of query-to-query mappings, each node provides two functions. Insert(q, qi), with q covering qi, adds a mapping (q; qi) to the index of the node responsible for key q. Lookup(q), with q not being the most specific query of a file, returns a list of all the queries qi such that there is a mapping (q; qi) in the index of the node responsible for key q.
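The two functions can be sketched as follows. This is a minimal single-process model, assuming one dictionary stands in for the indexes of all nodes. The class name `QueryIndex`, the helpers `store_file` and `search`, the `max_hops` guard, and the string encoding of queries are hypothetical and not taken from the paper.

```python
from collections import defaultdict


class QueryIndex:
    """Toy query-to-query index; one dict plays the role of every node."""

    def __init__(self):
        self.mappings = defaultdict(list)  # q -> list of more specific q_i
        self.files = {}                    # most specific query -> file

    def insert(self, q, qi):
        """Add the mapping (q; qi). The caller must ensure that q covers qi."""
        if qi not in self.mappings[q]:
            self.mappings[q].append(qi)

    def lookup(self, q):
        """Return every qi for which a mapping (q; qi) exists."""
        return list(self.mappings.get(q, []))

    def store_file(self, most_specific_query, data):
        self.files[most_specific_query] = data

    def search(self, q, choose=lambda options: options[0], max_hops=16):
        """Refine q until it is the most specific query of a file.

        `hops` counts index lookups only; the final file fetch is not counted.
        """
        hops = 0
        while q not in self.files:
            options = self.lookup(q)
            if not options or hops >= max_hops:
                return None, hops
            q = choose(options)  # the user picks one of the returned queries
            hops += 1
        return self.files[q], hops


index = QueryIndex()
index.store_file("author=Smith/title=TCP/year=1996", "smith-tcp.pdf")
index.insert("author=Smith", "author=Smith/title=TCP")
index.insert("author=Smith/title=TCP", "author=Smith/title=TCP/year=1996")
result, hops = index.search("author=Smith")
```

In this example the broad author query reaches the file after two index lookups, one per level of the index chain.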
6
Example: bibliographic database. [Figure: query-to-key and query-to-query mappings]
9
Discussion. Some interesting properties of this indexing technique: space efficiency; scalability; loose coupling between data and indexes; versatility; adaptability; decentralized architecture; resilience to arbitrary linking.
10
System point of view. The search process should be simple. The amount of network traffic should be minimized. The storage space dedicated to the indexing metadata should remain within reasonable limits.
11
Evaluation: distributed bibliographic database. Bibliographic database sites: BibFinder (http://kilimanjaro.eas.asu.edu) and NetBib (http://edas.info/S.cgi?search=1).
12
Indexing scheme. [Figures: simple indexing scheme; flat indexing scheme]
13
Indexing scheme. [Figure: complex indexing scheme]
14
Indexing scheme. Simple: a query for an author or a title returns a set of author and title pairs. It is the most space-efficient of the three, requiring 152 MB of extra storage in the system. Flat: the index query length is always 2. It requires 37% more space. Complex: some queries in the simple scheme are split into more specific queries. It requires 25% more space.
15
Probability vs. Ranking
16
Caching. Multi-cache: shortcuts are created on each node along the lookup path; the cache size is unbounded. Single-cache: shortcuts are created only on the first node that was contacted; the cache size is unbounded. LRU (least recently used): only a limited number of shortcuts can be stored on each node.
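The LRU policy can be illustrated with a small bounded cache. This is a sketch only, assuming a shortcut maps a query directly to a target that an earlier search reached. The class name `ShortcutCache`, its methods, and the sample entries are hypothetical.

```python
from collections import OrderedDict


class ShortcutCache:
    """Bounded per-node shortcut cache with least-recently-used eviction."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.entries = OrderedDict()  # query -> shortcut target, oldest first

    def get(self, query):
        if query not in self.entries:
            return None  # cache miss: fall back to the normal index lookup
        self.entries.move_to_end(query)  # mark as most recently used
        return self.entries[query]

    def put(self, query, target):
        self.entries[query] = target
        self.entries.move_to_end(query)
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict the least recently used


cache = ShortcutCache(capacity=2)
cache.put("author=Smith", "author=Smith/title=TCP/year=1996")
cache.put("author=Jones", "author=Jones/title=DHT/year=2003")
cache.get("author=Smith")  # refreshes the Smith shortcut
cache.put("author=Lee", "author=Lee/title=P2P/year=2001")  # evicts Jones
```

The multi-cache and single-cache policies differ only in which nodes call `put` after a successful search (every node on the path or just the first one) and would use an unbounded capacity.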
17
Average number of interactions required to find data.
18
Average network traffic (bytes) generated per query.
19
Cache efficiency: distributed hit ratio.
20
Conclusion. The technique indexes the data stored in the peer-to-peer network. Indexes are distributed across the nodes of the network and contain key-to-key (or query-to-query) mappings. Given a broad query, a user can look up the more specific queries that match the original query.