Efficient P2P Search by Exploiting Localities in Peer Community and Individual Peers. A DISC’04 paper by Lei Guo (1), Song Jiang (2), Li Xiao (3), and Xiaodong Zhang (1).

(1) College of William and Mary, (2) Los Alamos National Laboratory, (3) Michigan State University

Peer-to-Peer Search
Two performance objectives:
- Individual peer: improve the search quality
- Internet management: minimize the search cost
P2P user: “Fast, fast, fast, and the more the better!”
Network manager: “Don’t be so greedy, the Internet is shared by all the people!”

Existing Solutions
Each generally targets one of the two objectives and performs poorly on the other.
- Flooding: most effective for the user’s experience, but least efficient in network resource utilization
- Random walk: traffic efficient, but suffers long response times and a limited number of search results
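The traffic contrast between the two baselines can be sketched on a toy overlay. This is a minimal illustration, not the authors' simulator: it assumes an adjacency-dict graph, counts one message per forwarded copy, has flooding peers drop duplicates, and uses a single random walker. The overlay shape, TTL, and walk length are hypothetical.

```python
import random
from collections import deque

def flood(graph, source, ttl):
    """TTL-limited flooding: each peer forwards the query to all of its
    neighbors; a peer that has already seen the query drops the duplicate.
    Returns (peers_reached, messages_sent)."""
    seen = {source}
    frontier = deque([(source, ttl)])
    messages = 0
    while frontier:
        node, hops_left = frontier.popleft()
        if hops_left == 0:
            continue
        for neighbor in graph[node]:
            messages += 1                      # every forwarded copy costs one message
            if neighbor not in seen:
                seen.add(neighbor)
                frontier.append((neighbor, hops_left - 1))
    return seen, messages

def random_walk(graph, source, steps, rng):
    """Single random walker: one message and one probed peer per step."""
    visited = {source}
    node = source
    for _ in range(steps):
        node = rng.choice(graph[node])
        visited.add(node)
    return visited, steps

if __name__ == "__main__":
    rng = random.Random(42)
    n = 200
    # Hypothetical overlay: a ring plus random chords.
    graph = {i: [(i - 1) % n, (i + 1) % n] for i in range(n)}
    for _ in range(300):
        a, b = rng.sample(range(n), 2)
        if b not in graph[a]:
            graph[a].append(b)
            graph[b].append(a)
    for name, (reached, msgs) in [("flooding, TTL=4", flood(graph, 0, 4)),
                                  ("random walk, 40 steps", random_walk(graph, 0, 40, rng))]:
        print(f"{name}: reached {len(reached)} peers with {msgs} messages")
```

Flooding reaches many peers in few hops but pays for every duplicate copy, while the walker's cost grows only linearly with its length, which is why its response time and result count are limited.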

Super-Node Architecture
- Super-node: an index server for its leaf nodes
Problems:
- Index-based search has limits: full-text search is hard, and searching encrypted content is impossible
- A super-node is not responsible for the content quality of its leaf nodes
- The structure becomes large and inefficient: a leaf node has to connect to multiple super-nodes to avoid a single point of failure, which generates an increasingly large number of super-nodes

Gnutella Population in One Day (2003)
(Figure: number of peers and number of super-peers over one day.)
On average, one super-node connects to only 3-4 peers!

Outline
- Our Measurement Study
- CAC: Constructing Content Abundant Cluster
- SPIRP: Selectively Prefetching Indices from Responding Peers
- CAC-SPIRP: Combining CAC and SPIRP
- Performance Evaluation
- Conclusion

Our Measurement Study
Existing measurement studies:
- A small percentage of popular files accounts for most of the shared storage and transmissions in P2P systems
- A small number of peers contribute the majority of files in P2P systems
- These findings are only indirect evidence of content locality: some files may never be accessed, or are accessed only rarely
Our purpose:
- Fully understand the localities in the peer community and in individual peers
- Obtain first-hand traces for our simulation study

Trace Collection
Four-day crawl of the Gnutella network:
- Based on the open-source LimeWire Gnutella code
- Session-based collection (covering the whole lifetime of each peer)
Traces of queries sent by different peers:
- 25,764 peers
- 409,129 queries
Content indices of different peers:
- Full indices of 18,255 peers
- 37% free riders

Content Locality in the Peer Community
A small group of peers can reply to nearly all queries and provide most of the results.
(Figures: queries replied by top query responders (%) and results replied by top result providers (%), plotted against top content providers (in percentage of peers); companion plots show number of queries and number of results versus percentage of peers (%).)
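One way to compute a "queries replied by top query responders" curve from a query trace is sketched below. It is a hypothetical reconstruction of the analysis, assuming the trace is reduced to a map from query id to the set of peers that replied, that peers are ranked by how many queries they answered, and that the percentage is taken over all peers in the system.

```python
import math
from collections import Counter

def top_responder_coverage(responses, num_peers, top_percent):
    """responses: dict mapping query id -> set of peer ids that replied.
    Ranks peers by the number of queries they replied to, keeps the top
    `top_percent` percent of all `num_peers` peers, and returns the fraction
    of queries that at least one of those peers replied to."""
    if not responses:
        return 0.0
    counts = Counter(p for responders in responses.values() for p in responders)
    k = math.ceil(num_peers * top_percent / 100)
    top = {peer for peer, _ in counts.most_common(k)}
    answered = sum(1 for responders in responses.values() if responders & top)
    return answered / len(responses)
```

Sweeping `top_percent` from 0 to 100 yields one curve; replacing the query count with a result count per peer gives the analogous "results replied by top result providers" curve.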

The Localities of Search Interests of Individual Peers
A peer can get its search results from a small number of its top query responders, which share its search interests. This is similar to the idea of the Locality of Interest scheme, but our conclusion is based on real P2P systems.
(Figures: query contributions (%) and result contributions (%) of the top 1, top 10, top 5%, top 10%, and top 20% query responders and result providers.)
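The per-peer view can be sketched the same way, from the replies seen by one requesting peer. This is an assumed formulation: the input is a hypothetical list of `(query_id, responder_id, num_results)` records, responders are ranked by how many of this peer's queries they answered, and the two shares correspond to "query contributions" and "result contributions".

```python
from collections import defaultdict

def top_responder_contribution(replies, top_n):
    """replies: list of (query_id, responder_id, num_results) seen by ONE
    requesting peer. Returns (query_share, result_share): the fraction of the
    peer's answered queries, and of all results it received, that come from
    its top_n responders (ranked by number of queries answered)."""
    queries_by = defaultdict(set)
    results_by = defaultdict(int)
    for query, responder, n in replies:
        queries_by[responder].add(query)
        results_by[responder] += n
    ranked = sorted(queries_by, key=lambda r: len(queries_by[r]), reverse=True)
    top = ranked[:top_n]
    all_answered = set().union(*queries_by.values())
    top_answered = set().union(*(queries_by[r] for r in top))
    total_results = sum(results_by.values())
    query_share = len(top_answered) / len(all_answered) if all_answered else 0.0
    result_share = (sum(results_by[r] for r in top) / total_results
                    if total_results else 0.0)
    return query_share, result_share
```

A high `query_share` for a small `top_n` is the signal SPIRP relies on: caching indices from a handful of responders covers most of a peer's future queries.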

Reorganizing the P2P Management Structure
- Cluster the small number of content-abundant peers
- Prefetch indices from each peer’s top query responders

CAC: Constructing Content Abundant Cluster
Objectives:
- Cluster the small number of content-abundant peers in the P2P overlay
- Provide high-quality and fast service
Content Abundant Cluster:
- An overlay on top of the P2P network
- Self-evaluating, self-identifying, and self-organizing
- A persistent public service for all peers in the system
- Strongly content-based (not index-based)
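A minimal sketch of the self-evaluate / self-identify / self-organize cycle follows. The slides do not give the membership criterion or the cluster topology, so both are assumptions here: content abundance is measured as the number of shared files, membership requires meeting a hypothetical `threshold`, and members are linked in a ring-like overlay.

```python
from dataclasses import dataclass, field

@dataclass
class Peer:
    peer_id: str
    shared_files: int                  # assumed content-abundance measure
    in_cac: bool = False
    cac_links: set = field(default_factory=set)

def self_organize(peers, threshold, degree=2):
    """One hypothetical maintenance round. Each peer self-evaluates its
    content abundance and self-identifies as a CAC member if it meets
    `threshold`; members then link to their next `degree` members in a
    ring ordered by peer id (links are bidirectional)."""
    for p in peers:                    # self-evaluate and self-identify
        p.in_cac = p.shared_files >= threshold
        p.cac_links.clear()            # rebuild the cluster overlay each round
    members = sorted((p for p in peers if p.in_cac), key=lambda p: p.peer_id)
    n = len(members)
    for i, p in enumerate(members):    # self-organize into a cluster overlay
        for step in range(1, min(degree, n - 1) + 1):
            q = members[(i + step) % n]
            p.cac_links.add(q.peer_id)
            q.cac_links.add(p.peer_id)
    return [p.peer_id for p in members]
```

Rerunning the round after peers join, leave, or change their shared content gives a crude form of dynamic update: peers that fall below the threshold drop out, and new content-abundant peers join.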

CAC: System Structure
(Figure: clustering, leveling, and dynamic update of the CAC overlay.)

CAC: Search Operations
Queries are sent to the CAC first:
- Up-flowing operation
- Flooding within the CAC
Unsatisfied queries are propagated from the CAC to the whole system:
- Down-flooding operation
- Propagated from low levels to high levels
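The two-phase search can be sketched on a small graph. This is an illustrative reading of the slides, not the paper's code: "levels" are assumed to be hop distances from the nearest CAC member, up-flowing hands the query to a lower-level neighbor until it reaches the CAC, and down-flooding forwards only from level L to level L + 1, leaving same-level and upward links unused. The `files` map and `min_results` satisfaction rule are hypothetical inputs.

```python
from collections import deque

def assign_levels(graph, cac):
    """Leveling (assumed): CAC members are level 0; every other peer's level
    is its hop distance to the nearest CAC member (multi-source BFS)."""
    level = {p: 0 for p in cac}
    queue = deque(cac)
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if v not in level:
                level[v] = level[u] + 1
                queue.append(v)
    return level

def up_flow(graph, level, source):
    """Up-flowing: hand the query to a lower-level neighbor until it reaches
    a CAC member. BFS leveling guarantees such a neighbor always exists."""
    path = [source]
    while level[path[-1]] > 0:
        path.append(min(graph[path[-1]], key=lambda v: level[v]))
    return path

def cac_search(graph, cac, files, source, keyword, min_results):
    """Returns (sorted peers holding `keyword`, messages used).
    `files` maps peer -> set of shared keywords."""
    level = assign_levels(graph, cac)
    path = up_flow(graph, level, source)
    messages = len(path) - 1
    # Flooding inside the CAC, over member-to-member links only.
    seen, queue = {path[-1]}, deque([path[-1]])
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if level[v] == 0:
                messages += 1
                if v not in seen:
                    seen.add(v)
                    queue.append(v)
    hits = sorted(p for p in seen if keyword in files.get(p, set()))
    if len(hits) >= min_results:
        return hits, messages
    # Down-flooding: forward only from level L to level L + 1; links between
    # peers on the same level, and upward links, are left unused.
    reached, queue = set(seen), deque(sorted(seen))
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if level[v] == level[u] + 1:
                messages += 1
                if v not in reached:
                    reached.add(v)
                    queue.append(v)
    hits = sorted(p for p in reached if keyword in files.get(p, set()))
    return hits, messages
```

When the CAC alone satisfies the query, the message count stays small; only unsatisfied queries pay for down-flooding, and even then same-level links carry no traffic.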

Up-flowing
(Figure: up-flowing of a query to the CAC.)

Down-flooding
(Figure: down-flooding from the CAC; some links are marked as unused.)

SPIRP: Selectively Prefetching Indices from Responding Peers
Basic operations:
- Peer I initiates a query q: on a query hit in its stored indices, it displays the results; on a miss, it sends q
- Peer R responds to query q: it sends the query results and piggybacks the indices of all its shared files
- Peer I receives the response: it displays the search results and stores the piggybacked indices
Index updating:
- Responding peers actively update their indices
- Requesting peers update indices on demand
Replacement of file indices
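The basic SPIRP operations above can be sketched as a small client-side index cache. This is a hypothetical sketch: capacity is counted in index entries rather than megabytes, keyword matching is a substring test on file names, and the replacement policy (LRU over responders) is an assumption because the slides do not name one. `send_query` stands in for the network and is hypothetical.

```python
from collections import OrderedDict

class SpirpCache:
    """Index cache kept by a requesting peer I (hypothetical sketch).
    Stores the full file index piggybacked by each responding peer R and
    evicts the least recently used responder's index when full."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.indices = OrderedDict()   # responder id -> frozenset of file names
        self.size = 0                  # total cached index entries

    def lookup(self, keyword):
        """Responders whose cached index has a file name containing `keyword`."""
        hits = [r for r, idx in self.indices.items()
                if any(keyword in name for name in idx)]
        for r in hits:
            self.indices.move_to_end(r)          # refresh recency on a hit
        return hits

    def store(self, responder, index):
        """Store or refresh a piggybacked index; evict LRU indices if full."""
        index = frozenset(index)
        if len(index) > self.capacity:
            return False                          # too large to cache at all
        if responder in self.indices:             # an updated index replaces the old one
            self.size -= len(self.indices.pop(responder))
        while self.size + len(index) > self.capacity:
            _, evicted = self.indices.popitem(last=False)
            self.size -= len(evicted)
        self.indices[responder] = index
        self.size += len(index)
        return True

def spirp_query(cache, keyword, send_query):
    """Peer I: on a cache hit, answer locally without sending the query;
    on a miss, send it and store each response's piggybacked index.
    `send_query` returns a list of (responder, results, piggybacked_index).
    Returns (responders, answered_from_cache)."""
    hits = cache.lookup(keyword)
    if hits:
        return hits, True
    responders = []
    for responder, _results, index in send_query(keyword):
        cache.store(responder, index)
        responders.append(responder)
    return responders, False
```

A query that hits the cache costs no network messages at all, which is where SPIRP's response-time and traffic savings come from; a miss costs the same as an ordinary query.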

SPIRP Technique
(Figure sequence: peer I and responding peers R1 and R2, whose shared content is pop music and classical music.)
- Query = “Beethoven mp3”: “Where are these files?”
- Query = “Beetle mp3”: “Where are these files?” (stored index slots: pop, classic, NULL)
- Query = “Beetle mp3”: not enough space to save indices
- Query = “Beetle mp3”: replacement complete

CAC-SPIRP
CAC: an application-level infrastructure
- Significantly reduces bandwidth consumption
- Good response time when queries succeed in the CAC
- Long response time when queries fail in the CAC
SPIRP: client-oriented and overlay-independent
- Significantly reduces response time
- Little traffic when queries can be satisfied from the cache
- The same traffic as flooding on cache misses
CAC-SPIRP
- The two techniques are easy to combine
- Balances the trade-off between the two performance objectives
- Combines good search quality with low search cost
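A plausible ordering of the combined search is sketched below: consult the SPIRP index cache first, then the CAC, and fall back to down-flooding only while the query is still unsatisfied. The ordering is an assumption consistent with the trade-offs listed above, not a documented algorithm, and each stage is a hypothetical callable returning result locations.

```python
def cac_spirp_search(keyword, min_results, local_lookup, cac_lookup, down_flood):
    """Hypothetical CAC-SPIRP pipeline. Each stage is a callable taking a
    keyword and returning a list of result locations; stages run in order of
    increasing cost and stop as soon as `min_results` distinct results exist.
    Returns (results, stages_used)."""
    results, stages = [], []
    for stage, lookup in (("spirp-cache", local_lookup),
                          ("cac", cac_lookup),
                          ("down-flooding", down_flood)):
        stages.append(stage)
        for r in lookup(keyword):
            if r not in results:       # de-duplicate result locations
                results.append(r)
        if len(results) >= min_results:
            break
    return results, stages
```

The cheap stages absorb most queries, so the expensive down-flooding stage runs only for the remainder; this is the sense in which the combination serves both the user and the network.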

Simulation Environment
- Content trace and query trace: the 4-day Gnutella crawl from our measurement
- Overlay topology: traces by Clip2 Distributed Search Solutions, and topologies generated by GT-ITM
- Session duration: a Pareto distribution fitted from measurement results, $P(x) = \alpha \beta^{\alpha} x^{-(\alpha+1)}$ for $x \ge \beta$, with shape $\alpha$ and scale $\beta$
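Pareto-distributed session durations can be generated by inverse-transform sampling, as sketched below. The shape `alpha` and scale (minimum duration) `beta` are hypothetical placeholders; the fitted values are not given on the slide.

```python
import random

def sample_session_durations(n, alpha, beta, seed=0):
    """Draw n session durations from a Pareto distribution with shape `alpha`
    and scale `beta` (minimum duration) by inverse-transform sampling.
    The CDF is F(x) = 1 - (beta / x) ** alpha for x >= beta, so with
    U ~ Uniform(0, 1], x = beta * U ** (-1 / alpha) has that distribution."""
    rng = random.Random(seed)
    # rng.random() lies in [0, 1); 1 - rng.random() lies in (0, 1], avoiding 0.
    return [beta * (1.0 - rng.random()) ** (-1.0 / alpha) for _ in range(n)]

if __name__ == "__main__":
    # Hypothetical parameters, for illustration only.
    durations = sample_session_durations(10_000, alpha=1.5, beta=60.0)
    print(f"median session: {sorted(durations)[len(durations) // 2]:.1f} s")
```

Python's `random.paretovariate(alpha)` draws the same shape with scale 1, so `beta * rng.paretovariate(alpha)` is an equivalent one-liner.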

Evaluation Metrics
- Query success rate relative to the success rate of flooding: the cluster relative success rate for CAC, and the local relative success rate for SPIRP
- Overall network traffic: accumulated communication traffic for all queries, responses, and index transfers
- Average response time: measured in the number of routing hops
- Evaluated for different query satisfaction levels: 1, 10, and 50 results, representing different user demands
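The relative success rate and hop-based response time can be computed as sketched below. The input format and the response-time rule are assumptions: each query is represented by the hop distances at which its results arrived, a query succeeds if it gets at least `min_results` results, and its response time is the hop count of the `min_results`-th closest result.

```python
def evaluate(per_query, min_results):
    """per_query: list of lists, each holding the hop distances at which the
    results of one query arrived. Returns (success_rate, average_hops) over
    the satisfied queries; average_hops is NaN if none succeeded."""
    satisfied, hops = 0, []
    for result_hops in per_query:
        if len(result_hops) >= min_results:
            satisfied += 1
            hops.append(sorted(result_hops)[min_results - 1])
    rate = satisfied / len(per_query) if per_query else 0.0
    avg_hops = sum(hops) / len(hops) if hops else float("nan")
    return rate, avg_hops

def relative_success_rate(scheme_queries, flooding_queries, min_results):
    """Success rate of a scheme divided by flooding's rate on the same queries."""
    scheme_rate, _ = evaluate(scheme_queries, min_results)
    flood_rate, _ = evaluate(flooding_queries, min_results)
    return scheme_rate / flood_rate if flood_rate else float("nan")
```

Running both functions with `min_results` set to 1, 10, and 50 reproduces the three user-demand levels used throughout the evaluation.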

Performance Evaluation for CAC
The top 5% of content-abundant peers are sufficient for cluster construction.
(Figures: overall traffic (normalized) versus cluster size (in percentage of P2P network size), for minimum results = 1, 10, and 50.)

CAC Member Selection
(Figures: relative cluster success rate, average response time (normalized), and overall traffic (normalized) versus threshold of content-abundant peers, for minimum results = 1, 10, and 50.)
- Overall traffic is not sensitive to CAC member quality
- Traffic can be significantly reduced even with randomly selected CAC members
- CAC down-flooding is very efficient

CAC-SPIRP Overall Performance
(Figures: average response time (normalized) versus size of incoming index set buffer (in MBytes), for query satisfaction levels including 1 and 10; curves for peers having 1 to 5, 10 to 20, 30 to 40, and at least 50 queries satisfied.)
CAC-SPIRP significantly reduces both the overall traffic and the response time.

Conclusion
CAC-SPIRP addresses the P2P search problem from two conflicting aspects:
- For Internet management, CAC: a content abundant cluster that provides high-quality and fast service without introducing much traffic
- For user experience, SPIRP: a client prefetching technique that speeds up search by avoiding unnecessary queries