Adaptive Content Management in Structured P2P Communities
Jussi Kangasharju, Keith W. Ross, David A. Turner

Presentation transcript:


Contents
  – Introduction
  – Related Work
  – Adaptive Algorithms
  – Experimental Results
  – Optimization Theory
  – Conclusion

Introduction (1)
P2P file sharing is the dominant traffic type on the Internet
Two types of P2P system
  – Unstructured, e.g. KaZaA and Gnutella
      – Nodes are not organized into highly structured overlays
      – Content is randomly assigned to nodes
  – Structured, e.g. CAN and Chord
      – Distributed hash table (DHT) substrates are used
      – Nodes are organized into highly structured overlays
      – Keys are deterministically assigned to nodes

Introduction (2)
Assume DHT-based P2P file-sharing communities
  – P2P community: a collection of intermittently connected nodes
  – Nodes contribute storage, content and bandwidth to the rest of the community
  – When a node in the community wants a file:
      – It retrieves the file from the other nodes in the community
      – If the file is not found, the community retrieves it from outside
      – The file is then cached and a copy is forwarded to the requesting node

Introduction (3)
Address the problem of content management in P2P file-sharing communities
Propose algorithms to adaptively manage content
  – Goal: minimize the average delay, i.e. the time from when a node issues a query for a file until it has received the file in its entirety
  – File transfer delay >> lookup delay
  – Intra-community file transfers are fast compared with file transfers into the community

Introduction (4)
The problem is equivalent to “adaptively managing content to maximize intra-community hit rates”
  – Replication: how should content be replicated to provide satisfactory hit rates?
  – Replacement: how does a node decide which files to keep and which to evict?
Contributions
  – Algorithms for dynamically replicating and replacing files in a P2P community
      – No a priori assumptions about file request rates or nodal up probabilities
      – Simple, adaptive and fully distributed
  – Analytical optimization theory to benchmark the adaptive replication algorithms
      – For complete-file replication
      – For the case when files are segmented and erasure codes are used

Related Work
Squirrel [8]
  – Distributed, serverless P2P web caching system
  – Built on top of the Pastry DHT substrate
  – Focuses on protocol design and implementation
  – Does not address replication or file replacement
[13] and [14] study optimal replication in unstructured peer-to-peer networks
  – Goal: reduce random search times

DHT Substrate
Each node has access to the API of a DHT substrate
The substrate takes a file j as input and returns an ordered list of the nodes that are currently up
For a given value of K, the list is (i_1, i_2, …, i_K)
i_1 is the first-place winner for file j
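
The substrate interface described on this slide can be sketched as follows. This is only an illustration: the slides do not say how the substrate ranks nodes, so the sketch assumes rendezvous (highest-random-weight) hashing as a stand-in, and the function name `winners` and the node labels are hypothetical.

```python
import hashlib

def winners(file_id, up_nodes, k):
    """Return the top-k up nodes for file_id, best first.

    Assumption: nodes are ranked by rendezvous hashing. A real substrate
    (e.g. Chord, CAN or Pastry) may order nodes differently.
    """
    def score(node):
        digest = hashlib.sha256(f"{node}:{file_id}".encode()).hexdigest()
        return int(digest, 16)
    return sorted(up_nodes, key=score, reverse=True)[:k]

# Hypothetical usage: five up nodes, ask for the top three winners of a file.
nodes = ["n1", "n2", "n3", "n4", "n5"]
top3 = winners("song.mp3", nodes, 3)
```

With this ranking, when the first-place winner goes down the former second-place winner simply moves up, because each node's score depends only on the node and the file.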

LRU Algorithms (1)
Fundamental problem
  – “How can we adaptively add and remove replicas, in a distributed manner and as a function of evolving demand, to maximize the hit probability?”
Suppose X is a node that wants file j
Basic LRU algorithm
  – X uses the substrate to determine i_1, the first-place winner for j
  – If i_1 doesn’t have j, i_1 retrieves j from outside the community and stores a copy
  – If i_1 needs to make room for j, it evicts files using the LRU replacement policy
  – i_1 sends j to X
  – X does not put j in its own storage

LRU Algorithms (2)
Basic LRU algorithm
  – A request can be a “miss” even when the file is cached in some up node within the community
Top-K LRU algorithm
  – When i_1 doesn’t have j, i_1 determines i_2, …, i_K and pings each of these K-1 nodes to see whether any of them has j
  – If so, i_1 retrieves j from one of those nodes and puts a copy in its storage
  – Otherwise, i_1 retrieves j from outside the community
The algorithm replicates content
  – Without any a priori knowledge of request patterns or nodal up probabilities
  – In a fully distributed manner
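
A minimal sketch of the Top-K LRU steps above, assuming all files have the same size so that a node's capacity can be counted in files. The class and function names (`Node`, `top_k_lru_request`) are hypothetical, and a request counts as a hit whenever the file is found inside the community.

```python
from collections import OrderedDict

class Node:
    """A peer holding up to `capacity` equal-size files with LRU eviction."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.store = OrderedDict()  # file id -> True, least recently used first

    def has(self, f):
        return f in self.store

    def touch(self, f):
        self.store.move_to_end(f)   # mark f as most recently used

    def put(self, f):
        if f in self.store:
            self.store.move_to_end(f)
            return
        if len(self.store) >= self.capacity:
            self.store.popitem(last=False)  # evict the least recently used file
        self.store[f] = True

def top_k_lru_request(f, ranked_nodes):
    """Serve a request for f; ranked_nodes is [i_1, ..., i_K].

    With a single node in the list this reduces to the basic LRU algorithm.
    """
    i1 = ranked_nodes[0]
    if i1.has(f):
        i1.touch(f)
        return "hit"
    # i_1 pings i_2..i_K; if none has f, the file comes from outside.
    found_in_community = any(n.has(f) for n in ranked_nodes[1:])
    i1.put(f)  # i_1 keeps a copy in either case
    return "hit" if found_in_community else "miss"
```

The requesting node X does not appear in the sketch because, as stated above, X does not store the file it receives.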

Observations
The Top-K LRU algorithm is simple, but its performance is significantly below the theoretical optimum
Observed that
  – LRU lets unpopular files linger in nodes. Intuitively, if the less popular files are not stored, the popular files can have more replicas
  – Searching more than one node is needed to find files in the file-sharing system

MFR Algorithm (1)
Most Frequently Requested (MFR) algorithm
  – Has near-optimal performance
Each node i maintains an estimate of a_j(i), the local request rate for file j
a_j(i) is the number of requests that node i has seen for file j divided by the amount of time node i has been up
Each node i stores the files with the highest a_j(i) values, packing in as many files as possible

MFR Algorithm (2)
MFR retrieval and replacement policy
  – When node i receives a request for file j, it updates a_j(i)
  – If i doesn’t have j and MFR says it should, i retrieves j from outside and puts j in its storage
  – If i needs to make room for j, the MFR replacement policy is used
Searching more than one node is needed
  – Use “ping” dynamics to influence a_j(i) so that the number of replicas across all nodes becomes nearly optimal

MFR Algorithm (3)
“Ping” the top-K winners in parallel
  – Retrieve the file from any node that has it
  – Each “ping” could be counted as a request
  – Nodes update their request rates and manage their storage with MFR
  – However, this approach does not give better performance
Sequentially request j from the top-K winners
  – Stop the sequential requests once j is found
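
The MFR policy with sequential requests can be sketched as below. The sketch assumes equal-size files and ranks files by raw request counts: at a single node every count is divided by the same uptime, so ordering by count matches ordering by a_j(i). The names `MFRNode` and `sequential_request` are hypothetical.

```python
class MFRNode:
    """A peer that keeps the files it has seen requested most often."""
    def __init__(self, capacity):
        self.capacity = capacity  # number of equal-size files that fit
        self.count = {}           # file id -> number of requests seen
        self.store = set()

    def on_request(self, f):
        """Record a request for f; return True if f was already stored here."""
        self.count[f] = self.count.get(f, 0) + 1
        if f in self.store:
            return True
        if len(self.store) < self.capacity:
            self.store.add(f)     # free space: MFR says keep it
            return False
        coldest = min(self.store, key=lambda g: self.count[g])
        if self.count[f] > self.count[coldest]:
            self.store.remove(coldest)  # replace the least requested file
            self.store.add(f)
        return False

def sequential_request(f, ranked_nodes):
    """Ask i_1, i_2, ..., i_K in order and stop at the first node that has f."""
    for node in ranked_nodes:
        if node.on_request(f):
            return "hit"
    return "miss"
```

Because the loop stops at the first node holding the file, lower-ranked winners only see requests that higher-ranked winners could not serve, which is what shapes their local request rates.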

Experiment Results (1)
Simulation experiments
  – 100 nodes and files
  – Request probabilities follow a Zipf distribution with parameters 0.8 and 1.2
  – All files have the same size
  – Each node contributes the same amount of storage
Measure the hit performance of the algorithms
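
A request stream of the kind used in these experiments could be generated as in the sketch below. It assumes the common definition of a Zipf distribution in which the file of rank r is requested with probability proportional to 1 / r^alpha; the function names, the file count and the seed are illustrative and not taken from the slides.

```python
import random

def zipf_probabilities(num_files, alpha):
    """Request probability of each file, most popular first."""
    weights = [1.0 / (rank ** alpha) for rank in range(1, num_files + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def sample_requests(probs, n, seed=0):
    """Draw n file indices according to probs (seeded for repeatability)."""
    rng = random.Random(seed)
    return rng.choices(range(len(probs)), weights=probs, k=n)

# Illustrative values only: 1000 files, Zipf parameter 0.8, 10 requests.
probs = zipf_probabilities(1000, 0.8)
requests = sample_requests(probs, 10)
```

A larger Zipf parameter (1.2 rather than 0.8) concentrates more of the request probability on the few most popular files.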

Experiment Results (2)
LRU performs better than the non-cooperative algorithm but significantly worse than the theoretical optimum

Experiment Results (3)

Experiment Results (4)
Using K greater than 1 improves the hit probability
Increasing K beyond 5 gives only insignificant improvement

Experiment Results (5)
The number of replicas changes over time; the graphs report average values
The optimal scheme replicates the more popular files much more aggressively
The optimal scheme does not store the less popular files

Experiment Results (6)

Experiment Results (7)
The MFR algorithm is very close to optimal
Thus, the hit rates are also very close to optimal

Analysis of MFR (1)
Analytical procedure for calculating the steady-state replica profile and hit probability of Top-K MFR for the case K = I
The results still serve as excellent approximations when K is small
Assume
  – I is the number of nodes
  – J is the number of distinct files
  – p_i is the “up” probability of node i
  – S_i is the amount of shared storage (in bytes) in node i
  – b_j is the size (in bytes) of file j
  – q_j is the request probability for file j
The request probabilities for the J files are known

Analysis of MFR (2)
The procedure sequentially places copies of files into the nodes
  – T_i is the remaining unallocated storage in node i
  – x_ij is equal to 1 if a copy of file j has been placed in node i
Initialize T_i = S_i, x_ij = 0 and v_j = q_j / b_j
  – Step 1: find the file j with the largest value of v_j
  – Step 2: sequentially examine the winning nodes for j until a node i is found such that T_i >= b_j and x_ij = 0; then set x_ij = 1, v_j = v_j (1 - p_i) and T_i = T_i - b_j
  – Step 3: if there is no node such that T_i >= b_j and x_ij = 0, remove file j from further consideration
  – Step 4: return to Step 1 unless all files have been removed from consideration
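
The placement procedure above translates fairly directly into code. In the sketch below, `order[j]` is the ordered list of winning nodes for file j and is supplied by the caller (an assumption, since the slides leave the substrate abstract). The final hit probability, sum over j of q_j times (1 minus the product of (1 - p_i) over the nodes holding j), is also an assumption; it is the natural reading of the v_j update but is not spelled out in the slide text.

```python
def mfr_steady_state(p, S, b, q, order):
    """Return (x, hit) where x[i][j] = 1 if node i holds a copy of file j."""
    I, J = len(p), len(q)
    T = list(S)                          # remaining storage per node
    x = [[0] * J for _ in range(I)]
    v = [q[j] / b[j] for j in range(J)]  # current value of one more copy
    active = set(range(J))
    while active:
        j = max(active, key=lambda f: v[f])       # Step 1
        for i in order[j]:                        # Step 2
            if T[i] >= b[j] and x[i][j] == 0:
                x[i][j] = 1
                v[j] *= (1 - p[i])
                T[i] -= b[j]
                break
        else:
            active.discard(j)                     # Step 3: no node can take j
    hit = 0.0
    for j in range(J):
        all_down = 1.0
        for i in range(I):
            if x[i][j]:
                all_down *= (1 - p[i])
        hit += q[j] * (1 - all_down)
    return x, hit

# Hypothetical toy instance: two nodes, two unit-size files.
x, hit = mfr_steady_state(
    p=[0.5, 0.5], S=[2, 1], b=[1, 1], q=[0.9, 0.1], order=[[0, 1], [1, 0]]
)
```

In the toy instance the popular file ends up on both nodes and the unpopular file on one, giving a hit probability of 0.9 × 0.75 + 0.1 × 0.5 = 0.725.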

Optimization Theory (1)
Analytical theory for optimal replication in P2P communities
  – Complete-file replication (no fragmentation)
  – Files are segmented and erasure coded
No fragmentation: [optimization problem; formulas not preserved in the transcript]
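
The objective and constraints for the no-fragmentation case are missing from the slide text. A formulation consistent with the notation of the Analysis of MFR slides (p_i, S_i, b_j, q_j, x_ij) would be the following; it is a reconstruction under the assumption that nodes are up independently, not a quotation from the slides.

```latex
\max_{x_{ij} \in \{0,1\}} \;
  \sum_{j=1}^{J} q_j \left( 1 - \prod_{i=1}^{I} (1 - p_i)^{x_{ij}} \right)
\quad \text{subject to} \quad
  \sum_{j=1}^{J} b_j \, x_{ij} \le S_i , \qquad i = 1, \dots, I .
```

The product is the probability that every node holding a copy of file j is down, so the term in parentheses is the probability that a request for file j is a hit.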

Optimization Theory (2)
The problem is NP-complete
Consider a special case
  – p_i = p for all nodes
  – n_j = number of replicas of file j
The special case can be solved efficiently by dynamic programming
[optimization problem for the special case; formulas not preserved in the transcript]
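
One way the dynamic program for the special case might look is sketched below. Several things here are assumptions rather than content from the slides: storage is pooled into a single budget `total_storage`, file sizes are integers, each file has at most `num_nodes` replicas, and the objective is the sum over j of q_j (1 - (1 - p)^n_j).

```python
def optimal_replicas(q, b, p, num_nodes, total_storage):
    """Return (hit probability, [n_j]) for the homogeneous special case."""
    # value[c] = best hit probability for the files processed so far
    # when at most c units of storage are used.
    value = [0.0] * (total_storage + 1)
    choice = []  # choice[j][c] = replicas of file j chosen at capacity c
    for qj, bj in zip(q, b):
        new_value = value[:]                 # n = 0 replicas of this file
        picks = [0] * (total_storage + 1)
        for c in range(total_storage + 1):
            for n in range(1, min(num_nodes, c // bj) + 1):
                cand = value[c - n * bj] + qj * (1 - (1 - p) ** n)
                if cand > new_value[c]:
                    new_value[c], picks[c] = cand, n
        value = new_value
        choice.append(picks)
    # Walk back through the recorded choices to recover n_j.
    counts, c = [], total_storage
    for j in range(len(q) - 1, -1, -1):
        n = choice[j][c]
        counts.append(n)
        c -= n * b[j]
    counts.reverse()
    return value[total_storage], counts

# Hypothetical toy instance: two unit-size files, two nodes, three storage units.
hit, counts = optimal_replicas(
    q=[0.9, 0.1], b=[1, 1], p=0.5, num_nodes=2, total_storage=3
)
```

The running time is proportional to the number of files times the storage budget times the number of nodes, which is why integer sizes (or a coarse size unit) are assumed.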

Optimization Theory (3)
Upper bound on the performance of adaptive management algorithms for the case of erasures
  – File j is made up of R_j erasures
  – Any M_j of the R_j erasures are sufficient to reconstruct the file
  – The size of each erasure is b_j / M_j
  – Assume homogeneous “up” probabilities, p_i = p
  – Denote the r-th erasure of file j as erasure jr, r = 1, …, R_j
  – n_jr is the number of copies of erasure jr stored in the community of nodes

Optimization Theory (4)
Define a 0-1 random variable that is 1 if any of the n_jr copies of erasure jr is in some up node
A request for file j is a hit if any M_j of the R_j erasures for file j are available
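
The hit condition above can be evaluated numerically as in this sketch. It assumes that nodes are up independently with probability p and that copies of different erasures sit on different nodes, so erasure jr is available with probability 1 - (1 - p)^n_jr independently of the others. The function name is hypothetical.

```python
def erasure_hit_probability(p, copies, m):
    """P(at least m erasures are available), copies[r] = n_jr for one file j."""
    avail = [1 - (1 - p) ** n for n in copies]
    # dist[k] = probability that exactly k of the erasures seen so far
    # are available (distribution of a sum of independent 0-1 variables).
    dist = [1.0]
    for a in avail:
        nxt = [0.0] * (len(dist) + 1)
        for k, pr in enumerate(dist):
            nxt[k] += pr * (1 - a)
            nxt[k + 1] += pr * a
        dist = nxt
    return sum(dist[m:])

# Hypothetical example: R_j = 2 erasures, one copy each, any M_j = 1 suffices.
example = erasure_hit_probability(0.5, [1, 1], 1)
```

With R_j = M_j = 1 the function reduces to 1 - (1 - p)^n, the hit probability of a file stored as n complete replicas.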

Optimization Theory (5)
By Theorem 2.2 of Boland et al. [24], the function is Schur concave
[optimization problem; formulas not preserved in the transcript]

Optimization Theory (6)
Special case: no erasures
  – R_j = M_j = 1
  – q_j / b_j plays a key role in determining the number of replicas
The result is an upper bound on the true optimum because it is the optimum over continuous variables rather than integer variables
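
The closed-form result on this slide is not preserved in the transcript. The role of q_j / b_j can still be seen from a standard Lagrangian argument, sketched here under the assumptions that storage is pooled into a total S, that the n_j are treated as continuous, and that the non-negativity constraint n_j >= 0 is ignored; the symbols S, lambda and c are introduced for this sketch only.

```latex
\max_{n_j} \; \sum_{j=1}^{J} q_j \left( 1 - (1-p)^{n_j} \right)
\quad \text{subject to} \quad \sum_{j=1}^{J} b_j n_j = S .

\text{Stationarity, with } c = \ln\frac{1}{1-p}: \qquad
q_j \, c \, (1-p)^{n_j} = \lambda \, b_j
\;\;\Longrightarrow\;\;
n_j = \frac{1}{c} \left[ \ln\frac{q_j}{b_j} + \ln\frac{c}{\lambda} \right] .
```

The multiplier lambda is the same for every file and is fixed by the storage constraint, so files differ in their replica counts only through ln(q_j / b_j). Files for which the expression is negative would receive no replicas.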

Conclusion
Claim that structured (DHT-based) designs will potentially improve search and download performance
Proposed the Top-K MFR algorithm, which is simple, fully distributed, adaptive and near-optimal
Introduced an optimization methodology for benchmarking the performance of adaptive algorithms
The methodology can also be applied to designs that use erasures