Efficient Search in Peer to Peer Networks By: Beverly Yang Hector Garcia-Molina Presented By: Anshumaan Rajshiva Date: May 20,2002.

Slides:



Advertisements
Similar presentations
C. Mastroianni, D. Talia, O. Verta - A Super-Peer Model for Resource Discovery Services in Grids A Super-Peer Model for Building Resource Discovery Services.
Advertisements

Chapter 5: Tree Constructions
Scalable Content-Addressable Network Lintao Liu
Resource Management §A resource can be a logical, such as a shared file, or physical, such as a CPU (a node of the distributed system). One of the functions.
CMPE 521 Improving Search In P2P Systems by Yang and Molina Prepared by Ayhan Molla.
Efficient Search - Overview Improving Search In Peer-to-Peer Systems Presented By Jon Hess cs294-4 Fall 2003.
Improving Search in Peer-to-Peer Networks Beverly Yang Hector Garcia-Molina Presented by Shreeram Sahasrabudhe
Technion –Israel Institute of Technology Computer Networks Laboratory A Comparison of Peer-to-Peer systems by Gomon Dmitri and Kritsmer Ilya under Roi.
Search and Replication in Unstructured Peer-to-Peer Networks Pei Cao, Christine Lv., Edith Cohen, Kai Li and Scott Shenker ICS 2002.
Routing Indices For Peer-to-Peer Systems Arturo Crespo, Hector Garcia-Molina Stanford ICDCS 2002.
Farnoush Banaei-Kashani and Cyrus Shahabi Criticality-based Analysis and Design of Unstructured P2P Networks as “ Complex Systems ” Mohammad Al-Rifai.
Gnutella 2 GNUTELLA A Summary Of The Protocol and it’s Purpose By
Mobile and Wireless Computing Institute for Computer Science, University of Freiburg Western Australian Interactive Virtual Environments Centre (IVEC)
Peer-to-Peer Networks João Guerreiro Truong Cong Thanh Department of Information Technology Uppsala University.
P2p, Spring 05 1 Topics in Database Systems: Data Management in Peer-to-Peer Systems March 29, 2005.
Internet Networking Spring 2006 Tutorial 12 Web Caching Protocols ICP, CARP.
CPSC 322, Lecture 12Slide 1 CSPs: Search and Arc Consistency Computer Science cpsc322, Lecture 12 (Textbook Chpt ) January, 29, 2010.
Graph COMP171 Fall Graph / Slide 2 Graphs * Extremely useful tool in modeling problems * Consist of: n Vertices n Edges D E A C F B Vertex Edge.
A Trust Based Assess Control Framework for P2P File-Sharing System Speaker : Jia-Hui Huang Adviser : Kai-Wei Ke Date : 2004 / 3 / 15.
Improving Search in P2P Networks By Shadi Lahham.
Communication operations Efficient Parallel Algorithms COMP308.
Efficient Content Location Using Interest-based Locality in Peer-to-Peer Systems Presented by: Lin Wing Kai.
1 Spring Semester 2007, Dept. of Computer Science, Technion Internet Networking recitation #13 Web Caching Protocols ICP, CARP.
Exploiting Content Localities for Efficient Search in P2P Systems Lei Guo 1 Song Jiang 2 Li Xiao 3 and Xiaodong Zhang 1 1 College of William and Mary,
presented by Hasan SÖZER1 Scalable P2P Search Daniel A. Menascé George Mason University.
Internet Networking Spring 2002 Tutorial 13 Web Caching Protocols ICP, CARP.
Blind Search-Part 2 Ref: Chapter 2. Search Trees The search for a solution can be described by a tree - each node represents one state. The path from.
Online Data Gathering for Maximizing Network Lifetime in Sensor Networks IEEE transactions on Mobile Computing Weifa Liang, YuZhen Liu.
1 Seminar: Information Management in the Web Gnutella, Freenet and more: an overview of file sharing architectures Thomas Zahn.
Searching in Unstructured Networks Joining Theory with P-P2P.
Improving Data Access in P2P Systems Karl Aberer and Magdalena Punceva Swiss Federal Institute of Technology Manfred Hauswirth and Roman Schmidt Technical.
1CS 6401 Peer-to-Peer Networks Outline Overview Gnutella Structured Overlays BitTorrent.
Gnutella & Searching Algorithms in Unstructured Peer-to-Peer Networks CS780-3 Lecture Notes In Courtesy of David Bryan.
P2P File Sharing Systems
INTRODUCTION TO PEER TO PEER NETWORKS Z.M. Joseph CSE 6392 – DB Exploration Spring 2006 CSE, UT Arlington.
Freenet. Anonymity  Napster, Gnutella, Kazaa do not provide anonymity  Users know who they are downloading from  Others know who sent a query  Freenet.
Peer-to-Peer Computing CS587x Lecture Department of Computer Science Iowa State University.
1 Napster & Gnutella An Overview. 2 About Napster Distributed application allowing users to search and exchange MP3 files. Written by Shawn Fanning in.
Introduction Widespread unstructured P2P network
IR Techniques For P2P Networks1 Information Retrieval Techniques For Peer-To-Peer Networks Demetrios Zeinalipour-Yazti, Vana Kalogeraki and Dimitrios Gunopulos.
09/07/2004Peer-to-Peer Systems in Mobile Ad-hoc Networks 1 Lookup Service for Peer-to-Peer Systems in Mobile Ad-hoc Networks M. Tech Project Presentation.
Searching In Peer-To-Peer Networks Chunlin Yang. What’s P2P - Unofficial Definition All of the computers in the network are equal Each computer functions.
Distance Vector Routing Protocols W.lilakiatsakun.
Peer to Peer Research survey TingYang Chang. Intro. Of P2P Computers of the system was known as peers which sharing data files with each other. Build.
1 ICS 214B: Transaction Processing and Distributed Data Management Lecture 18: Data Management in Peer-to-Peer Systems Professor Chen Li Based on slides.
Routing Indices For P-to-P Systems ICDCS Introduction Search in a P2P system –Mechanisms without an index –Mechanisms with specialized index nodes.
Switching breaks up large collision domains into smaller ones Collision domain is a network segment with two or more devices sharing the same Introduction.
Structuring P2P networks for efficient searching Rishi Kant and Abderrahim Laabid Abderrahim Laabid.
AODV: Introduction Reference: C. E. Perkins, E. M. Royer, and S. R. Das, “Ad hoc On-Demand Distance Vector (AODV) Routing,” Internet Draft, draft-ietf-manet-aodv-08.txt,
1 Peer-to-Peer Technologies Seminar by: Kunal Goswami (05IT6006) School of Information Technology Guided by: Prof. C.R.Mandal, School of Information Technology.
The new protocol of freenet Taken from Ian Clarke and Oskar Sandberg (The Freenet Project)
By Jonathan Drake.  The Gnutella protocol is simply not scalable  This is due to the flooding approach it currently utilizes  As the nodes increase.
LightFlood: An Efficient Flooding Scheme for File Search in Unstructured P2P Systems Song Jiang, Lei Guo, and Xiaodong Zhang College of William and Mary.
P2p, Fall 06 1 Topics in Database Systems: Data Management in Peer-to-Peer Systems Search in Unstructured P2p.
a/b/g Networks Routing Herbert Rubens Slides taken from UIUC Wireless Networking Group.
Peer to Peer Network Design Discovery and Routing algorithms
Ricochet Robots Mitch Powell Daniel Tilgner. Abstract Ricochet robots is a board game created in Germany in A player is given 30 seconds to find.
1 Reading Report 3 Yin Chen 20 Feb 2004 Reference: Efficient Search in Peer-to-Peer Networks, Beverly Yang, Hector Garcia-Molina, In 22 nd Int. Conf. on.
Evaluation GUESS and Non-Forwarding Peer-to-Peer search ICDCS paper Beverly Yang Patrick Vinograd Hector Garcia-Molina Computer Science Department, Stanford.
1 Improve search in unstructured P2P overlay. 2 Peer-to-peer Networks Peers are connected by an overlay network. Users cooperate to share files (e.g.,
School of Electrical Engineering &Telecommunications UNSW Cost-effective Broadcast for Fully Decentralized Peer-to-peer Networks Marius Portmann & Aruna.
09/13/04 CDA 6506 Network Architecture and Client/Server Computing Peer-to-Peer Computing and Content Distribution Networks by Zornitza Genova Prodanoff.
Distance Vector Routing
Distributed Caching and Adaptive Search in Multilayer P2P Networks Chen Wang, Li Xiao, Yunhao Liu, Pei Zheng The 24th International Conference on Distributed.
Composing Web Services and P2P Infrastructure. PRESENTATION FLOW Related Works Paper Idea Our Project Infrastructure.
Switching Techniques In large networks there might be multiple paths linking sender and receiver. Information may be switched as it travels through various.
Internet Networking recitation #12
EE 122: Peer-to-Peer (P2P) Networks
Communication operations
Peer-to-Peer Information Systems Week 6: Performance
Presentation transcript:

Efficient Search in Peer to Peer Networks By: Beverly Yang Hector Garcia-Molina Presented By: Anshumaan Rajshiva Date: May 20,2002

P2P Networks Distributed systems in which nodes of equal roles and capabilities exchange information and services directly with each other.

Key Challenges Efficient techniques for search and retrieval of data. Best search techniques for a system depends on the needs of the application. Current search techniques in “loose” P2P systems tend to be very inefficient, either generating too much load on the system, or providing for a very bad user experience.

Current Reason for Inefficiency Queries are processed by more nodes than desired.

Suggested Improvement Processing queries through fewer nodes.

Techniques Iterative Deepening Directed BFS Local Indices

Problem Framework P2P: Undirected graph Vertices: nodes in the n/w Edges: Open connections between neighbors. Message will travel from A to B in hops. Length of the path: Number of hops Source of query: Node submitting the query

Problem Framework When a node receives a query it should process the query locally and respond to the query/forward/drop the query Address of the source node will be unknown to the responding node (scheme used by Gnutella)

Metrics Cost: Average Aggregate Bandwidth Average Aggregate Processing Cost Quality of Results: Number of Results Satisfaction of the query

Cost Message propagates across nodes,each node spend some processing resources on behalf of the query Main cost described in terms of bandwidth and processing cost

Cost Average Aggregate Bandwidth: The average over a set of representative queries of the aggregate BW consumed(in bytes) over each edge on behalf of the query Average Aggregate Processing Cost: The average over a set of representative queries of the aggregate processing power consumed at each node on behalf of the query

Quality of Results Results from the perspective of user Number of results: the size of total result set Satisfaction of the query:a query is satisfied if Z or more results are returned, where Z is some value specified by user Time to satisfaction: how long the user must wait for the Z th result to arrive

Current Techniques Gnutella: BFS technique is used with depth limit of D, where D= TTL of the message.At all levels <D query is processed by each node and results are sent to source and at level D query is dropped. Freenet: uses DFS with depth limit D.Each node forwards the query to a single neighbor and waits for a definite response from the neighbor before forwarding the query to another neighbor(if the query was not satisfied), or forwarding the results back to the query source(if query was satisfied).

Stop and Think… Quality of results measured only by number of results then BFS is ideal If Satisfaction is metrics of choice BFS wastes much bandwidth and processing power With DFS each node processes the query sequentially,searches can be terminated as soon as the query is satisfied, thereby minimizing cost.But poor response time due to the above

Broadcast Policy BFS and DFS falls on opposite extremes of bandwidth/processing cost and response time. Need to find some middle ground between the two extremes, while maintaining quality of results.

Iterative Deepening When satisfaction is the metric of choice Multiple BFS are initiated with successively larger depths, until query is satisfied or the maximum depth limit D is reached

Iterative Deepening System wide policy specifying at what depth the iterations are to occur Last depth in policy must be set to D A waiting period W ( time between successive iterations in the policy)must be specified

Working of Iterative Deepening Policy P{a,b,c} S initiates a BFS of depth a by sending out a query message with TTL=a to all its neighbors Once a node at depth a receives and process the message, instead of dropping it, the node will store the message temporarily Query becomes frozen there at all nodes a hops away from S (Frontier nodes) S receives response from those nodes that have processed the query so far.

Working of Iterative Deepening After waiting for time W if S finds that the query has already been satisfied, then it does nothing. Otherwise, if the query is not yet satisfied,s will start the next iteration, initiating BFS at depth b. S send out a resend message with TTL=a. A node that receives a resend message,simply unfreeze the query(stored temporarily) and forward the same with TTL= b-a to its neighbors. This process continues in the similar fashion till TTL=D is reached.At depth D, the query is dropped

Working of Iterative Deepening To identify queries with Resend messages, every query is assigned a system wide “unique identifier”. The resend message will contain the identifier of the query it is representing and nodes at the frontier of a search will know which query to unfreeze by inspecting this identifier.

Iterative Deepening Source Node1Node2Level 1 Node3Node4Level 2

Directed BFS If minimizing response time is important then Directed BFS. Strategy used is to send queries to a subset of nodes that will return many returns quickly by intelligently selecting those nodes based on some parameters. For this purpose, a node will maintain statistics on its neighbors

Directed BFS These statistics will be based on the number of results that were received through the neighbors for past queries By sending the queries to small subset of the nodes, the cost incurred will be reduced significantly The quality of results is not decreased significantly,provided we make neighbor selection intelligently

Local Indices A node maintains an index over the data of each node within r hops of itself, where r is a system wide variable called radius When a node receives a Query message, it can then process the query on behalf of every node within r hops of itself Collections of many nodes can be searched by processing the query at few nodes, while keeping the cost low

Local Indices R should be small. The index will be small- typically of the order of 50 KB- independent of the size of the network

Working of Local Indices Policy specifies at which depth query will be processed To create and maintain the indices at each node All nodes at depths not listed in the policy will simply forward the query to the next depth

Maintaining Indices Joining a new node: sends a join message with TTL=r and all the nodes within r hops update their indices. Join message contains the metadata about the joining node When a node receives this join message it, in turn, send join message containing its meta data directly to the new node.New node updates its indices

Maintaining Indices Node dies: Other nodes update their indices based on the timeouts Updating the node: When a node updates its collection, his node will send out a small update message with TTL= r, containing the metadata of the affected item.All nodes receiving this message subsequently update their index.

Results for Iterative Deepening

Results for Directed BFS

Results for Local Indices

Conclusions Compared to current techniques used in existing systems, the discussed techniques greatly reduce the aggregate cost of processing query over the entire system, while maintaining the quality of results Schemes are simple and practical to implement on the existing systems.

Conclusions

References Beverly Yang, Hector Garcia Molina Efficient search in peer to peer network, Available at