Search in Power-Law Networks
By Lada A. Adamic, Amit R. Puniyani*, Rajan M. Lukose, and Bernardo A. Huberman (HP Labs and Stanford University*)
Presented by Hakim Weatherspoon, CS294-4: Peer-to-Peer Systems
Slides also borrowed from the following paper: Path Finding Strategies in Scale-Free Networks, by Beom Jun Kim, Chang No Yoon, Seung Kee Han, and Hawoong Jeong
[Figure: Napster and Gnutella diagrams (nodes labeled P and S; messages labeled Q, R, D)]

P2P Systems 2003 ©2003 Hakim Weatherspoon/UC Berkeley Search in Power-Law Networks: 2
Review of Existing File Location Mechanisms
Various design objectives
  – Gnutella: works well for short-lived nodes and highly dynamic environments [slide label: Haystack]
  – Freenet: provides “anonymity”
  – CAN, Chord, Pastry, Tapestry: low, scalable response latency; no false negatives [slide label: Needles]
Independent of usage behavior
  – Applied to music-sharing communities
  – What about others? Scientific collaborations? Others?

Questions
Do networks, collaborations, file-sharing systems, etc. exhibit any particular network patterns (small-world? scale-free? power-law? Poisson?)?

High-Energy Physics Collaboration
Fermi National Accelerator Laboratory’s D0 experiment: thousands of physicists (not all actively accessing data at any moment), 18 countries, 70+ institutions
Small world!

Exploit Small-World Behavior
Kleinberg’s algorithm:
  – Search in a (structured) small-world network
  – Greedy search: $O(\log^2 N)$ expected steps
  – However, works only for a particular type of small world
Assumes global knowledge

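The greedy rule on this slide can be illustrated with a minimal sketch. It is not Kleinberg's original construction: it assumes a one-dimensional ring instead of a two-dimensional grid, gives each node one long-range contact drawn with probability proportional to 1/d, and uses hypothetical names (`ring_distance`, `build_small_world_ring`, `greedy_route`). The "global knowledge" the slide mentions shows up as the need to know the target's position on the ring.

```python
import random


def ring_distance(a, b, n):
    """Shortest distance between positions a and b on a ring of n nodes."""
    d = abs(a - b) % n
    return min(d, n - d)


def build_small_world_ring(n, rng):
    """Ring lattice plus one outgoing long-range contact per node.

    Assumption: the contact at ring distance d is drawn with probability
    proportional to 1/d (a one-dimensional analogue of the inverse-power rule).
    """
    distances = list(range(1, n // 2 + 1))
    weights = [1.0 / d for d in distances]
    neighbors = {}
    for u in range(n):
        d = rng.choices(distances, weights=weights)[0]
        sign = rng.choice((-1, 1))
        neighbors[u] = {(u - 1) % n, (u + 1) % n, (u + sign * d) % n}
    return neighbors


def greedy_route(neighbors, source, target, n):
    """Forward to the neighbor closest to the target until it is reached.

    Each node uses only its own links, but it must know where the target
    sits on the ring to measure distance.
    """
    path = [source]
    current = source
    while current != target:
        current = min(neighbors[current],
                      key=lambda v: ring_distance(v, target, n))
        path.append(current)
    return path


if __name__ == "__main__":
    rng = random.Random(0)
    n = 256
    ring = build_small_world_ring(n, rng)
    print(len(greedy_route(ring, 3, 200, n)) - 1, "hops")
```

Because every node keeps its two ring neighbors, each greedy step moves at least one position closer, so the walk always terminates; the long-range contacts are what shorten it.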
Search in Dynamic P2P Networks
Not possible to know the target host
  – Due to the dynamic nature of the network, the node storing a file is not known until a real-time search is performed.
No global information about the position of the target
  – Not possible to determine forward progress
Solutions
  – Central index (e.g. Napster)
  – Distributed queries (e.g. Gnutella)
  – Distributed directories + queries (e.g. this paper)

Power-Law
Many communication and social networks have power-law link distributions
A few nodes have a very high degree and many have a low degree.
The high-connectivity nodes play the important role of hubs in communication and networking.

Power-Law Dynamic Generation
Start with a small number ($m_0$) of nodes
Add a node with $m$ edges at each time step
  – The probability $\Pi_i$ of being connected to the existing vertex $i$ is proportional to the connectivity $k_i$ of that vertex
  – $k_i$ is the number of vertices directly connected to $i$.
  – $\Pi_i = (k_i + 1) / \sum_j (k_j + 1)$, with the summation over the whole network at a given instant.
  – Constructs a scale-free network
Ideas based on growth and preferential attachment
Shows power-law behavior in the connectivity distribution.
Need a path connecting two nodes in the network.

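The growth rule above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' code: the function name `grow_scale_free` is hypothetical, the initial m_0 nodes start with no edges (which is why the "+ 1" in the attachment weight matters), and each new node attaches to m distinct existing nodes.

```python
import random


def grow_scale_free(n, m0=3, m=2, seed=None):
    """Grow an undirected graph by preferential attachment.

    Returns a dict mapping each node to the set of its neighbors.
    A new node picks existing vertex i with weight k_i + 1, where k_i
    is the current degree of i.
    """
    if not (1 <= m <= m0 <= n):
        raise ValueError("need 1 <= m <= m0 <= n")
    rng = random.Random(seed)
    adj = {i: set() for i in range(m0)}  # assumption: no initial edges
    for new in range(m0, n):
        existing = list(adj)
        targets = set()
        while len(targets) < m:
            # Zero weight for already-chosen targets keeps the m edges distinct.
            weights = [0 if v in targets else len(adj[v]) + 1
                       for v in existing]
            targets.add(rng.choices(existing, weights=weights)[0])
        adj[new] = set()
        for t in targets:
            adj[new].add(t)
            adj[t].add(new)
    return adj


if __name__ == "__main__":
    graph = grow_scale_free(1000, seed=1)
    degrees = sorted((len(nb) for nb in graph.values()), reverse=True)
    print("largest degrees:", degrees[:5])
    print("smallest degrees:", degrees[-5:])
```

Recomputing the weights for every draw costs quadratic time overall, which is fine for small experiments; a larger simulation would want an incremental sampling structure.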
Search in Power-Law
High connectivity can be exploited when designing efficient search algorithms.
As shown by AT&T
  – The out-link degree distribution for a massive graph of telephone calls between individuals has a clean power-law form with an exponent of approximately 2.1.
  – This reflects the presence of central individuals who interact with many others on a daily basis and play a key role in relaying information.

Goals
Propose local search strategies
  – Utilize high-degree nodes in power-law graphs
  – Search costs scale sub-linearly with the size of the graph
  – Better than random
Demonstrate the strategies on the Gnutella P2P network.

Search Algorithms
Random
  – Choose a random neighbor to forward the message to.
Max
  – Choose the neighbor with the highest degree
  – Requires that the local node knows its neighbors' degrees
  – Effect: a short initial climb, then continue down the degree sequence

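The two forwarding rules can be compared on a toy graph. The sketch below is illustrative only: the names (`random_step`, `max_degree_step`, `search`, `GRAPH`) are hypothetical, and it assumes the Max walk skips already-visited neighbors when it can and falls back to a random neighbor otherwise, a detail the slide does not specify.

```python
import random


def random_step(adj, current, visited, rng):
    """Random strategy: forward to a uniformly chosen neighbor."""
    return rng.choice(sorted(adj[current]))


def max_degree_step(adj, current, visited, rng):
    """Max strategy: forward to the highest-degree neighbor.

    Assumption: visited neighbors are skipped when possible; if every
    neighbor was already visited, fall back to a random one.
    """
    unvisited = sorted(v for v in adj[current] if v not in visited)
    if not unvisited:
        return rng.choice(sorted(adj[current]))
    return max(unvisited, key=lambda v: len(adj[v]))


def search(adj, source, target, step, max_hops=1000, seed=None):
    """Walk from source until target is reached; return the hop count.

    Returns None if the target is not reached within max_hops.
    """
    rng = random.Random(seed)
    current, visited, hops = source, {source}, 0
    while current != target:
        if hops == max_hops:
            return None
        current = step(adj, current, visited, rng)
        visited.add(current)
        hops += 1
    return hops


# Toy graph with one well-connected node ("hub").
GRAPH = {
    "s": {"a", "b"},
    "a": {"s"},
    "b": {"s", "hub", "c"},
    "c": {"b"},
    "hub": {"b", "d", "e", "t"},
    "d": {"hub"},
    "e": {"hub"},
    "t": {"hub", "u"},
    "u": {"t"},
}

if __name__ == "__main__":
    print("max:", search(GRAPH, "s", "t", max_degree_step))
    print("random:", search(GRAPH, "s", "t", random_step, seed=0))
```

On this toy graph the Max walk climbs s, b, hub and then reaches t in three hops, while the random walk may wander into the degree-1 nodes and backtrack before it arrives.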
Results I
  – Scaling of the average search time vs. graph size
  – Max performs better than Random
      But not as well as the analytical model: nodes are revisited

Results II
  – Max finds 50% of nodes in 10 steps
      It takes 10 + 2 = 12 hops to reach 50% of the graph
  – Average number of hops: 217!
      Large number of degree-1 and degree-2 nodes

Power-Law compared to Poisson
Poisson graph
  – All links are randomly distributed (i.e. nodes have roughly the same degree)
Revisits are more likely in power-law graphs.
[Figure: panels labeled Power-Law and Poisson]

Summary of Power-Law Search
Start at a random node
Follow the degree sequence
  – That is, go to the node with the richest links
  – Followed by the node with the second richest
Scan the maximum number of nodes
  – In the minimum number of steps

Gnutella Modifications
Use Max instead of broadcast
  – Store an index of neighbors' files.
Results in efficient search for 50% of files.
Needs nodes rich in CPU, bandwidth, and storage capacity
  – The “rich get richer”

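One way to picture the neighbor index is the sketch below. It is an assumption-laden illustration, not the Gnutella protocol or the paper's implementation: `build_neighbor_index` and `lookup` are hypothetical names, each node indexes only its own files and those of its direct neighbors, and queries are forwarded to the highest-degree unvisited neighbor.

```python
def build_neighbor_index(adj, files):
    """For every node, map file name -> holders among the node and its neighbors."""
    index = {}
    for node, nbrs in adj.items():
        entry = {}
        for holder in {node} | nbrs:
            for name in files.get(holder, ()):
                entry.setdefault(name, set()).add(holder)
        index[node] = entry
    return index


def lookup(adj, index, source, filename, max_hops=20):
    """Forward a query along high-degree nodes, checking each local index.

    Returns (hops, holders) on success, or None if the walk runs out of
    unvisited neighbors or exceeds max_hops.
    """
    current, visited, hops = source, {source}, 0
    while True:
        holders = index[current].get(filename)
        if holders:
            return hops, sorted(holders)
        unvisited = sorted(v for v in adj[current] if v not in visited)
        if not unvisited or hops == max_hops:
            return None
        current = max(unvisited, key=lambda v: len(adj[v]))
        visited.add(current)
        hops += 1


# Toy overlay: "hub" has four neighbors, the others have one or two.
ADJ = {
    "n1": {"n2"},
    "n2": {"n1", "hub"},
    "hub": {"n2", "n3", "n4", "n5"},
    "n3": {"hub"},
    "n4": {"hub"},
    "n5": {"hub"},
}
FILES = {"n1": {"notes.txt"}, "n4": {"song.mp3"}}

if __name__ == "__main__":
    idx = build_neighbor_index(ADJ, FILES)
    print(lookup(ADJ, idx, "n1", "song.mp3"))
```

In the toy overlay a query from n1 for song.mp3 is answered at the hub after two hops, because the hub's index already covers n4. The hub also stores the largest index, which is the resource cost the slide points to.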
Notes about Power-Law
The algorithm creates local directories valid within a two-hop radius.
The network is resilient to random failures
  – More resistant than a central server
Power-law networks in general
  – More resistant to random failure than Poisson networks
  – Less resilient to attacks on high-degree nodes.

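The contrast between random failures and targeted attacks can be seen even on a tiny hand-built graph. The sketch below is only an illustration under assumptions: `two_hub_graph` is a hypothetical stand-in for a hub-dominated network rather than a true power-law graph, and resilience is measured by the size of the largest connected component.

```python
from collections import deque


def largest_component(adj, removed=()):
    """Size of the largest connected component after deleting `removed` nodes."""
    seen = set(removed)  # removed nodes are never entered
    best = 0
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        size = 0
        while queue:  # breadth-first search over surviving nodes
            u = queue.popleft()
            size += 1
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    queue.append(v)
        best = max(best, size)
    return best


def two_hub_graph(leaves_per_hub=4):
    """Two linked hubs, each with its own degree-1 leaves (toy example)."""
    adj = {}

    def link(u, v):
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    link("hubA", "hubB")
    for i in range(leaves_per_hub):
        link("hubA", f"a{i}")
        link("hubB", f"b{i}")
    return adj


if __name__ == "__main__":
    g = two_hub_graph()
    print("intact:", largest_component(g))
    print("one leaf fails:", largest_component(g, ["a0"]))
    print("one hub attacked:", largest_component(g, ["hubA"]))
```

Losing a leaf, the most likely outcome of a random failure since leaves are the majority, barely changes connectivity, while removing a single hub strands all of its leaves.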
Closing Questions
How is file sharing different from maintaining personal storage?