P2p, Fall 06 1 Topics in Database Systems: Data Management in Peer-to-Peer Systems Search in Unstructured P2p.

Slides:



Advertisements
Similar presentations
Peer-to-Peer and Social Networks An overview of Gnutella.
Advertisements

Replication Strategies in Unstructured Peer-to-Peer Networks Edith Cohen Scott Shenker This is a modified version of the original presentation by the authors.
CS 484. Discrete Optimization Problems A discrete optimization problem can be expressed as (S, f) S is the set of all feasible solutions f is the cost.
Efficient Search - Overview Improving Search In Peer-to-Peer Systems Presented By Jon Hess cs294-4 Fall 2003.
Modeling and Analysis of Random Walk Search Algorithms in P2P Networks Nabhendra Bisnik, Alhussein Abouzeid ECSE, Rensselaer Polytechnic Institute.
Generated Waypoint Efficiency: The efficiency considered here is defined as follows: As can be seen from the graph, for the obstruction radius values (200,
Technion –Israel Institute of Technology Computer Networks Laboratory A Comparison of Peer-to-Peer systems by Gomon Dmitri and Kritsmer Ilya under Roi.
1 An Overview of Gnutella. 2 History The Gnutella network is a fully distributed alternative to the centralized Napster. Initial popularity of the network.
Search and Replication in Unstructured Peer-to-Peer Networks Pei Cao, Christine Lv., Edith Cohen, Kai Li and Scott Shenker ICS 2002.
P2p, Spring 05 1 Topics in Database Systems: Data Management in Peer-to-Peer Systems Routing indexes A. Crespo & H. Garcia-Molina ICDCS 02.
Farnoush Banaei-Kashani and Cyrus Shahabi Criticality-based Analysis and Design of Unstructured P2P Networks as “ Complex Systems ” Mohammad Al-Rifai.
1 Accessing nearby copies of replicated objects Greg Plaxton, Rajmohan Rajaraman, Andrea Richa SPAA 1997.
LightFlood: An Optimal Flooding Scheme for File Search in Unstructured P2P Systems Song Jiang, Lei Guo, and Xiaodong Zhang College of William and Mary.
Denial-of-Service Resilience in Peer-to-Peer Systems D. Dumitriu, E. Knightly, A. Kuzmanovic, I. Stoica and W. Zwaenepoel Presenter: Yan Gao.
P2p, Spring 05 1 Topics in Database Systems: Data Management in Peer-to-Peer Systems March 29, 2005.
1 Data Persistence in Large-scale Sensor Networks with Decentralized Fountain Codes Yunfeng Lin, Ben Liang, Baochun Li INFOCOM 2007.
Peer-to-Peer Based Multimedia Distribution Service Zhe Xiang, Qian Zhang, Wenwu Zhu, Zhensheng Zhang IEEE Transactions on Multimedia, Vol. 6, No. 2, April.
1 Denial-of-Service Resilience in P2P File Sharing Systems Dan Dumitriu (EPFL) Ed Knightly (Rice) Aleksandar Kuzmanovic (Northwestern) Ion Stoica (Berkeley)
Evolution of P2P Content Distribution Pei Cao. Outline History of P2P Content Distribution Architectures History of P2P Content Distribution Architectures.
Teknik Routing Pertemuan 20 Matakuliah: H0484/Jaringan Komputer Tahun: 2007.
Building Low-Diameter P2P Networks Eli Upfal Department of Computer Science Brown University Joint work with Gopal Pandurangan and Prabhakar Raghavan.
A Hierarchical Energy-Efficient Framework for Data Aggregation in Wireless Sensor Networks IEEE TRANSACTIONS ON VEHICULAR TECHNOLOGY, VOL. 55, NO. 3, MAY.
Search and Replication in Unstructured Peer-to-Peer Networks Pei Cao Cisco Systems, Inc. (Joint work with Christine Lv, Edith Cohen, Kai Li and Scott Shenker)
Topics in Database Systems: Data Management in Peer-to-Peer Systems
Vassilios V. Dimakopoulos and Evaggelia Pitoura Distributed Data Management Lab Dept. of Computer Science, Univ. of Ioannina, Greece
P2p, Spring 05 1 Topics in Database Systems: Data Management in Peer-to-Peer Systems Replication.
Efficient Search in Peer to Peer Networks By: Beverly Yang Hector Garcia-Molina Presented By: Anshumaan Rajshiva Date: May 20,2002.
Searching in Unstructured Networks Joining Theory with P-P2P.
CS401 presentation1 Effective Replica Allocation in Ad Hoc Networks for Improving Data Accessibility Takahiro Hara Presented by Mingsheng Peng (Proc. IEEE.
1CS 6401 Peer-to-Peer Networks Outline Overview Gnutella Structured Overlays BitTorrent.
Cache Updates in a Peer-to-Peer Network of Mobile Agents Elias Leontiadis Vassilios V. Dimakopoulos Evaggelia Pitoura Department of Computer Science University.
INTRODUCTION TO PEER TO PEER NETWORKS Z.M. Joseph CSE 6392 – DB Exploration Spring 2006 CSE, UT Arlington.
1 Napster & Gnutella An Overview. 2 About Napster Distributed application allowing users to search and exchange MP3 files. Written by Shawn Fanning in.
Searching In Peer-To-Peer Networks Chunlin Yang. What’s P2P - Unofficial Definition All of the computers in the network are equal Each computer functions.
IEEE P2P, Aachen, Germany, September Ad-hoc Limited Scale-Free Models for Unstructured Peer-to-Peer Networks Hasan Guclu
© Janice Regan, CMPT 128, CMPT 371 Data Communications and Networking BGP, Flooding, Multicast routing.
Peer to Peer Research survey TingYang Chang. Intro. Of P2P Computers of the system was known as peers which sharing data files with each other. Build.
1 BitHoc: BitTorrent for wireless ad hoc networks Jointly with: Chadi Barakat Jayeoung Choi Anwar Al Hamra Thierry Turletti EPI PLANETE 28/02/2008 MAESTRO/PLANETE.
P2p, Fall 06 1 Topics in Database Systems: Data Management in Peer-to-Peer Systems Routing indexes A. Crespo & H. Garcia-Molina ICDCS 02.
Using the Small-World Model to Improve Freenet Performance Hui Zhang Ashish Goel Ramesh Govindan USC.
CCAN: Cache-based CAN Using the Small World Model Shanghai Jiaotong University Internet Computing R&D Center.
Replication Strategies in Unstructured Peer-to-Peer Networks Edith CohenScott Shenker Some slides are taken from the authors’ original presentation.
Structuring P2P networks for efficient searching Rishi Kant and Abderrahim Laabid Abderrahim Laabid.
Quantitative Evaluation of Unstructured Peer-to-Peer Architectures Fabrício Benevenuto José Ismael Jr. Jussara M. Almeida Department of Computer Science.
Many random walks are faster than one Noga AlonTel Aviv University Chen AvinBen Gurion University Michal KouckyCzech Academy of Sciences Gady KozmaWeizmann.
Computer Networks Dr. Jorge A. Cobb The Performance of Query Control Schemes for the Zone Routing Protocol.
Lecture 3: Uninformed Search
Geo Location Service CS218 Fall 2008 Yinzhe Yu, et al : Enhancing Location Service Scalability With HIGH-GRADE Yinzhe Yu, et al : Enhancing Location Service.
K-Anycast Routing Schemes for Mobile Ad Hoc Networks 指導老師 : 黃鈴玲 教授 學生 : 李京釜.
An overview of Gnutella
LightFlood: An Efficient Flooding Scheme for File Search in Unstructured P2P Systems Song Jiang, Lei Guo, and Xiaodong Zhang College of William and Mary.
1 Gossip-Based Ad Hoc Routing Zygmunt J. Haas, Joseph Halpern, LiLi Cornell University Presented By Charuka Silva.
On Reducing Mesh Delay for Peer- to-Peer Live Streaming Dongni Ren, Y.-T. Hillman Li, S.-H. Gary Chan Department of Computer Science and Engineering The.
Teknik Routing Pertemuan 10 Matakuliah: H0524/Jaringan Komputer Tahun: 2009.
Gerhard Haßlinger Search Methods in Dynamic Wireless Networks  Challenges for search in wireless networks  Random walks and flooding for search with.
Project funded by the Future and Emerging Technologies arm of the IST Programme Are Proliferation Techniques more efficient than Random Walk with respect.
Aug 22, 2002Sigcomm 2002 Replication Strategies in Unstructured Peer-to-Peer Networks Edith Cohen AT&T Labs-research Scott Shenker ICIR.
Design of a Robust Search Algorithm for P2P Networks
Project funded by the Future and Emerging Technologies arm of the IST Programme Search in Unstructured Networks Niloy Ganguly, Andreas Deutsch Center for.
School of Electrical Engineering &Telecommunications UNSW Cost-effective Broadcast for Fully Decentralized Peer-to-peer Networks Marius Portmann & Aruna.
Distance Vector Routing
Incrementally Improving Lookup Latency in Distributed Hash Table Systems Hui Zhang 1, Ashish Goel 2, Ramesh Govindan 1 1 University of Southern California.
Performance Comparison of Ad Hoc Network Routing Protocols Presented by Venkata Suresh Tamminiedi Computer Science Department Georgia State University.
1 “Hybrid Search Schemes for Unstructured Peer- to-Peer Networks” “Random Walks in Peer-to-Peer Networks” Christos Gkantsidis, Milena Mihail, Amin Saberi.
Distributed Caching and Adaptive Search in Multilayer P2P Networks Chen Wang, Li Xiao, Yunhao Liu, Pei Zheng The 24th International Conference on Distributed.
Unstructured Networks: Search Márk Jelasity. 2 Outline ● Emergence of decentralized networks ● The Gnutella network: how it worked and looked like ● Search.
Lecture 3: Uninformed Search
On Growth of Limited Scale-free Overlay Network Topologies
Joydeep Chandra, Santosh Shaw and Niloy Ganguly
CMSC 471 Fall 2011 Class #4 Tue 9/13/11 Uninformed Search
Presentation transcript:

p2p, Fall 06 1 Topics in Database Systems: Data Management in Peer-to-Peer Systems Search in Unstructured P2p

p2p, Fall 06 2 Topics in Database Systems: Data Management in Peer-to-Peer Systems Q. Lv et al, “Search and Replication in Unstructured Peer-to- Peer Networks”, ICS02

p2p, Fall 06 3 Search and Replication in Unstructured Peer-to-Peer Networks Type of replication depends on the search strategy used (i)A number of blind-search variations of flooding (ii)A number of (metadata) replication strategies Evaluation Method: Study how they work for a number of different topologies and query distributions

p2p, Fall 06 4 Methodology Three aspects of P2P Performance of search depends on  Network topology: graph formed by the p2p overlay network  Query distribution: the distribution of query frequencies for individual files  Replication: number of nodes that have a particular file Assumption: fixed network topology and fixed query distribution Results still hold, if one assumes that the time to complete a search is short compared to the time of change in network topology and in query distribution

p2p, Fall 06 5 Network Topology (1) Power-Law Random Graph A 9239-node random graph Node degrees follow a power law distribution when ranked from the most connected to the least, the i-th ranked has ω/i a, where ω is a constant Once the node degrees are chosen, the nodes are connected randomly log scale x:i-st node, y: #linls

p2p, Fall 06 6 Network Topology (2) Normal Random Graph A 9836-node random graph

p2p, Fall 06 7 Network Topology (3) Gnutella Graph (Gnutella) A 4736-node graph obtained in Oct 2000 Node degrees roughly follow a two-segment power law distribution

p2p, Fall 06 8 Network Topology (4) Two-Dimensional Grid (Grid) A two dimensional 100x100 grid

p2p, Fall 06 9 Network Topology

p2p, Fall Query Distribution Assume m objects Let q i be the relative popularity of the i-th object (in terms of queries issued for it) Values are normalized Σ i=1, m q i = 1 (1) Uniform: All objects are equally popular q i = 1/m (2) Zipf-like q i  1 / i α

p2p, Fall Replication Each object i is replicated on r i nodes and the total number of objects stored is R, that is Σ i=1, m r i = R (1) Uniform: All objects are replicated at the same number of nodes r i = R/m (2) Proportional: The replication of an object is proportional to the query probability of the object r i  q i (3) Square-root: The replication of an object i is proportional to the square root of its query probability q i r i  √q i (next week: we shall show that this is optimal)

p2p, Fall Query Distribution & Replication When the replication is uniform, the query distribution is irrelevant (since all objects are replicated by the same amount, search times are equivalent for both hot and cold items) When the query distribution is uniform, all three replication distributions are equivalent (uniform!) Thus, three relevant combinations query-distribution/replication (1)Uniform/Uniform (2)Zipf-like/Proportional (3)Zipf-like/Square-root

p2p, Fall Metrics Performance aspects Pr(success): probability of finding the queried object before the search terminates (note: in structured success was guaranteed) #hops: delay in finding an object as measured in number of (overlay) hops

p2p, Fall Metrics Load metrics #msgs per node: (routing load) Overhead of an algorithm as measured in average number of search messages each node in the p2p has to process #nodes visited per query Percentage of message duplication (total_#msgs - #nodes visited)/ total_#msgs Peak #msgs: the number of messages that the busiest node has to process (to identify hot spots) All are workload metrics

p2p, Fall Metrics These are per-query measures An aggregated performance measure, each query convoluted with its probability Σ i q i * a(i)

p2p, Fall Simulation Methodology For each experiment, First select the topology and the query/replication distributions For each object i with replication r i, generate numPlace different sets of random replica placements (each set contains r i random nodes on which to place the replicas of object i) For each replica placement, randomly choose numQuery different nodes from which to initiate the query for object i Thus, we get numPlace x numQuery runs Run each experiment independent for each object In the paper, numPlace = 10 and numQuery = 100 -> 1000 different queries per object

p2p, Fall Limitation of Flooding Choice of TTL  Too low, the node may not find the object, even if it exists  Too high, burdens the network unnecessarily Search for an object that is replicated at 0.125% of the nodes (~11 nodes of total 9000) Note that TTL depends on the topology Also depends on replication (which is however unknown to the user who determines TTL)

p2p, Fall Limitation of Flooding Choice of TTL Overhead Also depends on the topology Maximum Load: Random graph: maximum load of any node is logarithmic to the total number of nodes that the search visits High degree nodes in PLRG and Gnutella much higher loads

p2p, Fall Limitation of Flooding There are many duplicate messages (due to cycles) particularly in high connectivity graphs Multiple copies of a query are sent to a node by multiple neighbors Duplicated messages can be detected and not forwarded BUT, the number of duplicate messages can still be excessive and worsens as TTL increases

p2p, Fall Limitation of Flooding Different nodes

p2p, Fall Limitation of Flooding: Comparison of the topologies Power-law and Gnutella-style graphs particularly bad with flooding Highly connected nodes means higher duplication messages, because many nodes’ neighbors overlap Random graph best, Because in truly random graph the duplication ratio (the likelihood that the next node already received the query) is the same as the fraction of nodes visited so far, as long as that fraction is small Random graph better load distribution among nodes

p2p, Fall Two New Blind Search Strategies 1.Expanding Ring – not a fixed TTL (iterative deepening) 2. Random Walks (more details) – reduce number of duplicate messages

p2p, Fall Expanding Ring or Iterative Deepening Note that since flooding queries node in parallel, search may not stop even if the object is located Use successive floods with increasing TTL  A node starts a flood with a small TTL  If the search is not successful, the node increases the TTL and starts another flood  The process repeats until the object is found Works well when hot objects are replicated more widely than cold objects

p2p, Fall Expanding Ring or Iterative Deepening (details) Need to define  A policy: at which depths the iterations are to occur (i.e. the successive TTLs)  A time period W between successive iterations  after waiting for a time period W, if it has not received a positive response (i.e. the requested object), the query initiator resends the query with a larger TTL Nodes maintain ID of queries for W + ε Α node that receives the same message as in the previous round does not process it, it just forwards it

p2p, Fall Expanding Ring Start with TTL = 1 and increase it linearly at each time by a step of 2 For replication over 10%, search stops at TTL 1 or 2

p2p, Fall Expanding Ring Comparison of message overhead between flooding and expanding ring Even for objects that are replicated at 0.125% of the nodes, even if flooding uses the best TTL for each topology, expending ring still halves the per-node message overhead

p2p, Fall Expanding Ring More pronounced improvement for Random and Gnutella graphs than for the PLRG partly because the very high degree nodes in PLGR reduce the opportunity for incremental retries in the expanding ring Introduce slight increase in the delays of finding an object: From 2 to 4 in flooding to 3 to 6 in expanding ring Duplicate messages still a problem

p2p, Fall Random Walks Forward the query to a randomly chosen neighbor at each step Each message a walker k-walkers The requesting node sends k query messages and each query message takes its own random walk k walkers after T steps should reach roughly the same number of nodes as 1 walker after kT steps So cut delay by a factor of k 16 to 64 walkers give good results

p2p, Fall Random Walks When to terminate the walks  TTL-based  Checking: the walker periodically checks with the original requestor before walking to the next node (again uses (a larger) TTL, just to prevent loops) Experiments show that checking once at every 4th step strikes a good balance between the overhead of the checking message and the benefits of checking

p2p, Fall Random Walks The 32-walker random walk When compared to flooding, reduces message overhead by roughly two orders of magnitude for all queries across all network topologies at the expense of a slight increase in the number of hops (increasing from 2-6 to 4-15) When compared to expanding ring, The 32-walkers random walk outperforms expanding ring as well, particularly in PLRG and Gnutella graphs

p2p, Fall Random Walks Keeping State  Each query has a unique ID and its k-walkers are tagged with this ID  For each ID, a node remembers the neighbor it has forwarded the query  When a new query with the same ID arrives, the node forwards it to a different neighbor (randomly chosen) Improves Random and Grid by reducing up to 30% the message overhead and up to 30% the number of hops Small improvements for Gnutella and PLRG

p2p, Fall Principles of Search  Adaptive termination is very important Must avoid message explosion at the requester Expanding ring or the checking method  Message duplication should be minimized Preferably, each query should visit a node just once  Granularity of the coverage should be small Increase of each additional step should not significantly increase the number of nodes visited

p2p, Fall Replication Next time