Unstructured P2P Overlay. Improving Search in Peer-to-Peer Networks, ICDCS 2002, Beverly Yang and Hector Garcia-Molina.

Presentation transcript:

Unstructured P2P Overlay

Improving Search in Peer-to-Peer Networks ICDCS 2002 Beverly Yang Hector Garcia-Molina

Current Techniques
Gnutella
– BFS with depth limit D.
– Wastes bandwidth and processing resources.
Freenet
– DFS with depth limit D.
– Poor response time.

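Iterative Deepening
The basic idea is to reduce the number of nodes that process a query.
Under policy P = {a, b, c} with waiting time W, the source first sends a BFS of depth a; if the query is not satisfied after waiting W, it deepens the search to depth b, and then to depth c.
See example.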
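A minimal sketch of iterative deepening, assuming an in-memory adjacency-list graph and a hypothetical `has_result` predicate (neither is from the slides). The waiting time W is not modeled; each loop iteration stands for "wait W, then deepen". For brevity the sketch restarts the BFS at every depth instead of resuming from the previous frontier.

```python
from collections import deque

def nodes_within(graph, source, depth):
    """Return the set of nodes reachable from source in at most `depth` hops."""
    seen = {source}
    frontier = deque([(source, 0)])
    while frontier:
        node, d = frontier.popleft()
        if d == depth:
            continue  # depth limit reached, do not expand further
        for nb in graph[node]:
            if nb not in seen:
                seen.add(nb)
                frontier.append((nb, d + 1))
    return seen

def iterative_deepening(graph, source, has_result, policy, wanted=1):
    """Try each depth in `policy` in order; stop once `wanted` results are found."""
    results = set()
    for depth in policy:
        reached = nodes_within(graph, source, depth)
        results = {n for n in reached if n != source and has_result(n)}
        if len(results) >= wanted:
            return depth, results
    return policy[-1], results
```

With a chain A-B-C-D and the only result stored at D, the policy [1, 3, 5] stops at depth 3, so nodes beyond depth 3 never process the query.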

Directed BFS
A source sends query messages to just a subset of its neighbors.
A node maintains simple statistics on its neighbors:
– Number of results received from each neighbor
– Latency of the connection

Candidate nodes
Neighbors that returned the highest number of results
Neighbors that return response messages that have taken the lowest average number of hops
Neighbors with a high message count
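A small sketch of how a node might pick the subset of neighbors for Directed BFS from its per-neighbor statistics. The `NeighborStats` fields and the tie-breaking rule (more past results first, then lower latency) are assumptions for illustration; the slides list the heuristics separately and do not say how to combine them.

```python
from dataclasses import dataclass

@dataclass
class NeighborStats:
    results_returned: int = 0   # results received through this neighbor so far
    latency_ms: float = 0.0     # measured latency of the connection

def pick_neighbors(stats, fanout):
    """Return up to `fanout` neighbor ids, best candidates first."""
    ranked = sorted(
        stats,
        key=lambda n: (-stats[n].results_returned, stats[n].latency_ms),
    )
    return ranked[:fanout]
```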

Local Indices
Each node n maintains an index over the data of all nodes within a radius of r hops.
All nodes at depths not listed in the policy simply forward the query.
Example: policy P = {1, 5}

Experimental Setup
For each response, we log:
– Number of hops it took
– IP address from which the Response message came
– Response time
– Individual results

Experimental result

Routing Indices For P-to-P Systems ICDCS 2002

Introduction
Search in a P2P system
– Mechanisms without an index
– Mechanisms with specialized index nodes (centralized search)
– Mechanisms with indices at each node
  Structured P2P network
  Unstructured P2P network
Parallel vs. sequential search
– Response time
– Network traffic

Routing indices (RI)
Query
– Documents are on zero or more “topics”, and queries request documents on particular topics.
– Document topics are independent.
Local index
RI
– Each node has a local routing index, which contains the following information:
  The number of documents along each path
  The number of documents on each topic of interest
– Allows a node to select the “best” neighbors to send a query to

The RI may be “coarser” than the local indices

Goodness measure
– Number of results in a path
Using routing indices

– Storage space
  N: number of nodes in the P2P network
  b: branching factor
  c: number of categories
  s: counter size in bytes
  Centralized index: s*(c+1)*N
  Distributed system: s*(c+1)*b (each node)
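A worked example of the two storage formulas may help. The numbers plugged in below (4-byte counters, 100 categories, 10,000 nodes, branching factor 4) are hypothetical and chosen only to show the scale difference.

```python
def centralized_index_bytes(s, c, n):
    """Storage of a centralized index: s*(c+1)*N."""
    return s * (c + 1) * n

def routing_index_bytes_per_node(s, c, b):
    """Storage of a routing index at one node: s*(c+1)*b."""
    return s * (c + 1) * b

# Hypothetical parameters: s = 4 bytes, c = 100 categories, N = 10000 nodes, b = 4
central = centralized_index_bytes(4, 100, 10_000)    # 4,040,000 bytes
per_node = routing_index_bytes_per_node(4, 100, 4)   # 1,616 bytes
```

The per-node cost depends on the branching factor b rather than on the network size N.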

Creating routing indices

Maintaining Routing Indices
– Trade-off between RI freshness and update cost
– Does not require the participation of a disconnecting node
Discussion
– What if the search topics are dependent?
– Can the number of “hops” necessary to reach a document be estimated?

Alternative Routing Indices
Hop-count RI
– Aggregated RIs for each “hop” up to a maximum number of hops are stored

– Search cost
  Number of messages
– The goodness of a neighbor
  The ratio between the number of documents available through that neighbor and the number of messages required to get those documents
– Regular tree with fanout F
– It takes F^h messages to find all documents at hop h
– Storage cost?
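The ratio above can be sketched as a sum over hops of documents divided by messages. This assumes hops are numbered from 1 and uses the slide's regular-tree cost of F to the power h messages for hop h; the function name and the example counts are hypothetical.

```python
def hop_count_goodness(docs_per_hop, fanout):
    """Sum of (documents at hop h) / (messages to reach hop h) over all stored hops.

    docs_per_hop[0] is the document count at hop 1, docs_per_hop[1] at hop 2, etc.
    The message cost of hop h is taken to be fanout**h (regular-tree assumption).
    """
    return sum(docs / fanout ** h for h, docs in enumerate(docs_per_hop, start=1))

# Hypothetical hop-count RI rows for two neighbors, fanout F = 3
x = hop_count_goodness([13, 10], 3)   # 13/3 + 10/9
y = hop_count_goodness([0, 31], 3)    # 0/3 + 31/9
```

Neighbor Y offers more documents in total (31 against 23), yet X scores higher because its documents are cheaper to reach.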

Exponentially aggregated RI
– Stores the result of applying the regular-tree cost formula to a hop-count RI
– How to compute the goodness of a path for a query containing several topics?

Cycles in the P2P network

Efficient Content Location Using Interest-Based Locality in Peer-to-Peer Systems, Kunwadee Sripanidkulchai, Bruce Maggs, Hui Zhang, IEEE INFOCOM 2003

Motivation
Although flooding is simple and robust, it is not scalable.
The proposed content location solution organizes peers into an interest-based structure on top of Gnutella.
The algorithm is called interest-based shortcuts.

Interest-based locality

Shortcuts Architecture and Design Goals
To create additional links on top of a peer-to-peer system’s overlay
As a separate performance enhancement layer on top of existing content location mechanisms

Content location paths

Shortcut Discovery
The first lookup returns a set of peers that store the content.
These are potential candidates.
One peer is selected at random from the set and added to the shortcut list.
For scalability, each peer allocates a fixed-size amount of storage to implement shortcuts.
Alternatives for shortcut discovery
– Exchanging shortcut lists between peers

Shortcut selection
We rank shortcuts based on their perceived utility.
A peer sequentially asks all of the shortcuts on its list.

Ranking metrics
Probability of providing content
Latency of the path to the shortcut
Load at the shortcut
A combination of metrics can be used based on each peer’s preference

Potential and Limitations
Adding 5 shortcuts at a time produces success rates that are close to the best possible.
Slightly increasing the shortest path length from 1 to 2 hops yields a better success rate.

Efficient and Scalable Query Routing for Unstructured Peer-to-Peer Networks A.Kumar, J. Xu and W.W. Zegura

Overview
As the distance from the node hosting the object increases, fewer bits are used to represent information about the direction in which the object is located.

Design
Exponential decay Bloom filter (EDBF)
– A Bloom filter is a data structure for approximately answering set membership questions
  k hash functions h_1, …, h_k and a bit array A
  Inserting x sets A[h_i(x)] = 1, for i = 1…k
  θ(x) = |{i | A[h_i(x)] = 1, i = 1…k}|
– θ(x) is the # of 1’s in the filter at the k positions of x
– θ(x)/k roughly indicates the probability of finding x along a specific link in the overlay
– Noise?
– When there is no noise
  One hop away from the object x, θ(x) is approximately k bits
  Two hops away from the object x, θ(x) is approximately k/d
– Decay implementation
  Decay rate is 1/d
  Nodes reset each of the bits in the EDBFs received from upstream neighbors with probability (1 − 1/d)
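A minimal sketch of the filter operations, under stated assumptions: the salted SHA-256 hash family and the function names are hypothetical, and decay keeps each set bit with probability 1/d (so it is reset with probability 1 − 1/d), which is the reading consistent with a decay rate of 1/d and θ(x) dropping from about k to about k/d per hop.

```python
import hashlib
import random

def positions(item, k, m):
    """k bit positions for item in an m-bit array (hypothetical salted-hash family)."""
    return [
        int.from_bytes(hashlib.sha256(f"{i}:{item}".encode()).digest()[:8], "big") % m
        for i in range(k)
    ]

def insert(bits, item, k):
    """Set A[h_i(x)] = 1 for i = 1..k."""
    for p in positions(item, k, len(bits)):
        bits[p] = 1

def theta(bits, item, k):
    """Count how many of the item's k positions are set in the filter."""
    return sum(bits[p] for p in positions(item, k, len(bits)))

def decay(bits, d, rng):
    """Return a copy where each set bit survives with probability 1/d."""
    return [b if b and rng.random() < 1.0 / d else 0 for b in bits]
```

Right after insertion θ(x) equals k; after one decay step with d = 2 roughly half of those bits are expected to survive.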

Creation and Maintenance of routing tables

The initial advertisement is created by taking the union of all advertisements received from neighbors other than the target neighbor
Decay the combined advertisement by the decay factor d
Union the result with the local EDBF
– The local EDBF is propagated without attenuation
Loops
– Split horizon with poisoned reverse
  Information received from a neighbor j will not be advertised back to j
– Exponential decay
  The count-to-infinity problem manifests itself as a “decay to an infinitely small amount of information”
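The three steps (union, decay, union with the local filter) can be sketched as follows. Filters are plain lists of 0/1 here, the function and parameter names are hypothetical, and decay is assumed to keep each set bit with probability 1/d.

```python
import random

def make_advertisement(local, received, target, d, rng):
    """Build the EDBF that this node advertises to neighbor `target`.

    local:    this node's own filter (list of 0/1), sent without attenuation
    received: dict mapping neighbor id -> filter last advertised by that neighbor
    """
    combined = [0] * len(local)
    for neighbor, bits in received.items():
        if neighbor == target:
            continue  # do not advertise the target's own information back to it
        combined = [a | b for a, b in zip(combined, bits)]
    # Decay only the information learned from other neighbors
    decayed = [b if b and rng.random() < 1.0 / d else 0 for b in combined]
    # The local filter is merged in after decay, so it is not attenuated
    return [a | b for a, b in zip(local, decayed)]
```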

Query forwarding

If the query is satisfied locally, it is answered
Otherwise, if the TTL of the query has not expired
– If the query was previously seen, it is forwarded to a randomly chosen neighbor
– Otherwise, the query is forwarded to the neighbor with the highest θ(x)
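The forwarding rule can be written as one decision function. This is a sketch with hypothetical names: `theta_by_neighbor` is assumed to hold, per neighbor, the count of set bits for the queried object in that neighbor's advertised filter, and ties are broken by neighbor id.

```python
import random

def route_query(has_object, seen_before, ttl, theta_by_neighbor, rng):
    """Return ("answer", None), ("drop", None) or ("forward", neighbor_id)."""
    if has_object:
        return ("answer", None)
    if ttl <= 0 or not theta_by_neighbor:
        return ("drop", None)
    neighbors = sorted(theta_by_neighbor)
    if seen_before:
        # A repeated query takes a random step
        return ("forward", rng.choice(neighbors))
    # First visit: follow the strongest advertisement
    return ("forward", max(neighbors, key=theta_by_neighbor.get))
```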

Structured P2P Overlay

Similarity Discovery in Structured P2P Overlays ICPP

Introduction
Structured P2P network
– Only supports search with a single keyword
Similarity between two documents
– Keyword sets
– Vector space
– Measure
Problems
– Search problem
– New keyword?

Meteorograph Absolute angle

Publishing and Searching
Publish
– Hash
– Publish the item to the node n_p whose hash key is closest to the hash value

Search problem –Nearest answers –K-nearest answers –  Partial Comprehensive Search strategy Discussions What happens when the keyword vector is represented by  ?

Other issues
Load balance
Changes of vector space
– Republished?
– Comprehensive set of keywords
– Other methods?

SWAM: A Family of Access Methods for Similarity-Search in Peer-to-Peer Data Networks Farnoush Banaei-Kashani Cyrus Shahabi (CIKM04)

PDN access method
Defines
– How to organize the PDN topology into an index-like structure
– How to use the index structure

Hilbert space
Hilbert space (V, L_p)
Key k = (a_1, a_2, …, a_d)
– d: the dimension of the vector space
– The domain is a contiguous and finite interval of R
The L_p norm, with p in Z+
– The distance function used to measure dissimilarity

Topology
The topology of a PDN can be modelled as a directed graph G(N, E)
A(n) is the set of neighbors of node n
A node maintains
– A limited amount of information about its neighbors
  Includes the keys of the tuples maintained at neighbors
  The physical addresses of neighbors

The processing of a query is completed when all expected tuples in the relevant result set have been visited
Access methods
– Join and leave, for virtual nodes
– Forward, for using local information to process queries and make forwarding decisions

The small world example
Grid component
Random graph component
Processing of queries (exact, range, kNN) in a topology with high locality

Flat partitioning SWAM also employs the space partitioning idea: flat partitioning

Query Processing
Exact-match query processing
Range query processing
kNN query processing

Similarity Search in Peer-to-Peer Databases IEEE International Conference on Distributed Computing Systems 2005

Data and Query Model
All data objects are unit vectors in a d-dimensional Euclidean space
Cosine distance
Can

Design Details
The indexing scheme
– A locality-sensitive hashing function is used to reduce the dimensionality
  r is a d-dimensional unit vector
  h(x) is the concatenation of the bits b_r1(x), b_r2(x), …, b_rk(x)
– Objects with the same hash value belong to the same cluster and are stored at the node which owns the DHT key h(x)
  Group nearby objects to indices with low Hamming distance
  To avoid the situation that nearby objects differ in some bit positions in their index
– t hashing functions are used (replication)
  » To ensure that there is a high probability of two related objects hashing onto indices with low Hamming distance in at least one of these sets
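A sketch of one such hash function, assuming the common random-hyperplane construction in which the bit for unit vector r is 1 when r · x ≥ 0 and 0 otherwise. The slide does not define the bit function, so that definition and all names below are assumptions.

```python
import math
import random

def random_unit_vector(d, rng):
    """Draw a d-dimensional unit vector with a uniformly random direction."""
    v = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v]

def lsh_index(x, hyperplanes):
    """Concatenate one sign bit per vector r: '1' if r . x >= 0, else '0'."""
    return "".join(
        "1" if sum(r_i * x_i for r_i, x_i in zip(r, x)) >= 0 else "0"
        for r in hyperplanes
    )

rng = random.Random(1)
planes = [random_unit_vector(3, rng) for _ in range(8)]  # k = 8 bits, d = 3
key = lsh_index([1.0, 0.0, 0.0], planes)                 # 8-character bit string
```

Using t independent sets of k vectors gives the t indices per object mentioned above.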

The search algorithm
– Node u generates Query (x,  )
– Compute h(x)
– Compute the set V of all indices whose Hamming distance from h(x) is at most r
– Node u queries each of the nodes in V
– Nodes in V return all data objects which match u’s query
– How to determine r?
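The set V can be enumerated directly by flipping up to r bits of h(x). The sketch below represents indices as bit strings (an assumption) and also counts the size of V: for k-bit indices it is the sum of the binomial coefficients C(k, i) for i = 0…r.

```python
from itertools import combinations
from math import comb

def hamming_ball(index, radius):
    """All bit strings within Hamming distance `radius` of `index`."""
    k = len(index)
    out = []
    for dist in range(radius + 1):
        for flips in combinations(range(k), dist):
            bits = list(index)
            for i in flips:
                bits[i] = "1" if bits[i] == "0" else "0"
            out.append("".join(bits))
    return out

def ball_size(k, radius):
    """Number of k-bit indices within Hamming distance `radius` of a given index."""
    return sum(comb(k, i) for i in range(radius + 1))
```

The number of nodes contacted grows quickly with r, which is one reason the choice of r matters.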

Adaptive replication
– Ensure the number of copies of each key in the network is proportional to its popularity
  The number of copies of each key is proportional to the rate at which queries arrive for this key
Randomized lookup
– The lookup for a specific key terminates uniformly at random at one of the copies of this key
– Guarantees that the load is balanced uniformly across all copies of all keys in the system

Discussion
Search cost?
What is the cardinality of set V?
Availability?

Guaranteeing Correctness and Availability in P2P Range Indices SIGMOD 2005

Introduction
Hashing destroys the value ordering among the search key values
– Cannot be used to process range queries efficiently
Solution
– Range indices assign data items to peers directly based on their search key value
– Load balance?

P-ring overview
Two types of peers
– Live peers
  Used to store data items
  The number of data items stored in each live peer is between sf and 2*sf (sf: storage factor)
– Free peers
Overflow (> 2*sf)
– Split its assigned range with a free peer
Underflow (< sf)
– Merge with its successor in the ring to obtain more entries
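The overflow and underflow rules can be sketched as a single decision on a live peer's item list. The function name, the return labels, the halving split point and the case where no free peer is available are all hypothetical; the slide only states the two thresholds and the two actions.

```python
def rebalance(items, sf, free_peers):
    """Decide what a live peer should do with its sorted item list.

    Returns (action, kept_items, handed_over_items).
    """
    if len(items) > 2 * sf:
        if not free_peers:
            # Not covered by the slide: nothing to split with
            return ("overflow_no_free_peer", items, [])
        mid = len(items) // 2
        # Overflow: keep the lower half, hand the upper half to a free peer
        return ("split", items[:mid], items[mid:])
    if len(items) < sf:
        # Underflow: merge with the successor in the ring
        return ("merge_with_successor", items, [])
    return ("ok", items, [])
```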

Incorrect query results Inconsistent Ring

Concurrency in the data store

Solution
Handling ring inconsistency
– Two states
  Joined and joining
  Peer p remains in the joining state until all relevant peers know about p
  Only store items in peers in the joined state
Handling data store concurrency
– p stays in a locked state until p_succ (p’s successor) locks its range

Supporting Complex Multi-dimensional Queries in P2P Systems IEEE International Conference on Distributed Computing Systems 2005 (HW)

Data Indexing in Peer-to-Peer DHT Networks ICDCS 2004

Locating data using incomplete information
– How to search data in a DHT
Data descriptors and queries
– Semi-structured XML data

– Query
  Most specific query for d
  Relationship between queries

Given the most specific query, finding the location of the file is simple
How about less specific queries?
Solution
– Provide a query-to-query service
  For a given query q, the index service returns a list of more specific queries covered by q
– The DHT storage system must be extended
  Insert(q, q_i), with q -> q_i, adds a mapping (q; q_i) to the index of the node responsible for key q
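A toy sketch of the query-to-query idea, in which a single in-memory object stands in for the whole DHT. The class, its method names other than `insert`, and the XPath-like query strings are hypothetical, and the mappings are assumed to be acyclic (each q_i is strictly more specific than q).

```python
from collections import defaultdict

class QueryIndex:
    """Stand-in for the DHT: one dict plays the role of all responsible nodes."""

    def __init__(self):
        self.mappings = defaultdict(list)  # query q -> more specific queries q_i
        self.files = {}                    # most specific query -> file location

    def insert(self, q, q_i):
        """Add the mapping (q; q_i) under key q."""
        if q_i not in self.mappings[q]:
            self.mappings[q].append(q_i)

    def store(self, most_specific_query, location):
        self.files[most_specific_query] = location

    def resolve(self, q):
        """Follow q -> q_i mappings until most specific queries are reached."""
        if q in self.files:
            return [self.files[q]]
        found = []
        for q_i in self.mappings.get(q, []):
            found.extend(self.resolve(q_i))
        return found
```

A lookup by author alone then takes two steps: the index returns the more specific query, and that query locates the file.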