Information-Centric Networks05b-1 Week 5 / Paper 2 A survey of peer-to-peer content distribution technologies –Stephanos Androutsellis-Theotokis, Diomidis.

Slides:



Advertisements
Similar presentations
Peer-to-Peer and Social Networks An overview of Gnutella.
Advertisements

Ion Stoica, Robert Morris, David Karger, M. Frans Kaashoek, Hari Balakrishnan MIT and Berkeley presented by Daniel Figueiredo Chord: A Scalable Peer-to-peer.
Peer to Peer and Distributed Hash Tables
Peer-to-Peer Systems Chapter 25. What is Peer-to-Peer (P2P)? Napster? Gnutella? Most people think of P2P as music sharing.
Peer-to-Peer (P2P) Distributed Storage 1Dennis Kafura – CS5204 – Operating Systems.
University of Cincinnati1 Towards A Content-Based Aggregation Network By Shagun Kakkar May 29, 2002.
1 An Overview of Gnutella. 2 History The Gnutella network is a fully distributed alternative to the centralized Napster. Initial popularity of the network.
Denial-of-Service Resilience in Peer-to-Peer Systems D. Dumitriu, E. Knightly, A. Kuzmanovic, I. Stoica and W. Zwaenepoel Presenter: Yan Gao.
Gnutella 2 GNUTELLA A Summary Of The Protocol and it’s Purpose By
An Overview of Peer-to-Peer Networking CPSC 441 (with thanks to Sami Rollins, UCSB)
Peer-to-Peer Networks as a Distribution and Publishing Model Jorn De Boever (june 14, 2007)
Peer to Peer File Sharing Huseyin Ozgur TAN. What is Peer-to-Peer?  Every node is designed to(but may not by user choice) provide some service that helps.
Efficient Content Location Using Interest-based Locality in Peer-to-Peer Systems Presented by: Lin Wing Kai.
Gnutella, Freenet and Peer to Peer Networks By Norman Eng Steven Hnatko George Papadopoulos.
Object Naming & Content based Object Search 2/3/2003.
Chord-over-Chord Overlay Sudhindra Rao Ph.D Qualifier Exam Department of ECECS.
Freenet A Distributed Anonymous Information Storage and Retrieval System I Clarke O Sandberg I Clarke O Sandberg B WileyT W Hong.
Topics in Reliable Distributed Systems Fall Dr. Idit Keidar.
1 Seminar: Information Management in the Web Gnutella, Freenet and more: an overview of file sharing architectures Thomas Zahn.
Wide-area cooperative storage with CFS
Improving Data Access in P2P Systems Karl Aberer and Magdalena Punceva Swiss Federal Institute of Technology Manfred Hauswirth and Roman Schmidt Technical.
Peer-to-Peer Networks Slides largely adopted from Ion Stoica’s lecture at UCB.
1CS 6401 Peer-to-Peer Networks Outline Overview Gnutella Structured Overlays BitTorrent.
A Survey of Peer-to-Peer Content Distribution Technologies Stephanos Androutsellis-Theotokis and Diomidis Spinellis ACM Computing Surveys, December 2004.
Introduction to Peer-to-Peer Networks. What is a P2P network Uses the vast resource of the machines at the edge of the Internet to build a network that.
Freenet. Anonymity  Napster, Gnutella, Kazaa do not provide anonymity  Users know who they are downloading from  Others know who sent a query  Freenet.
1 Napster & Gnutella An Overview. 2 About Napster Distributed application allowing users to search and exchange MP3 files. Written by Shawn Fanning in.
A Survey of Peer-to-Peer Content Distribution Technologies Stephanos Androutsellis-Theotokis and Diomidis Spinellis ACM Computing Surveys, December 2004.
Peer-to-Peer Overlay Networks. Outline Overview of P2P overlay networks Applications of overlay networks Classification of overlay networks – Structured.
Distributed Systems Concepts and Design Chapter 10: Peer-to-Peer Systems Bruce Hammer, Steve Wallis, Raymond Ho.
Introduction to Peer-to-Peer Networks. What is a P2P network A P2P network is a large distributed system. It uses the vast resource of PCs distributed.
Content Overlays (Nick Feamster). 2 Content Overlays Distributed content storage and retrieval Two primary approaches: –Structured overlay –Unstructured.
Introduction of P2P systems
Peer to Peer Research survey TingYang Chang. Intro. Of P2P Computers of the system was known as peers which sharing data files with each other. Build.
Jonathan Walpole CSE515 - Distributed Computing Systems 1 Teaching Assistant for CSE515 Rahul Dubey.
Chord: A Scalable Peer-to-peer Lookup Protocol for Internet Applications Xiaozhou Li COS 461: Computer Networks (precept 04/06/12) Princeton University.
Peer to Peer Networks By Cathy Chen CMSC 621, Fall 2007.
Peer-to-Pee Computing HP Technical Report Chin-Yi Tsai.
1 Distributed Hash Tables (DHTs) Lars Jørgen Lillehovde Jo Grimstad Bang Distributed Hash Tables (DHTs)
Peer-to-Peer Network Tzu-Wei Kuo. Outline What is Peer-to-Peer(P2P)? P2P Architecture Applications Advantages and Weaknesses Security Controversy.
FastTrack Network & Applications (KaZaA & Morpheus)
1 Peer-to-Peer Technologies Seminar by: Kunal Goswami (05IT6006) School of Information Technology Guided by: Prof. C.R.Mandal, School of Information Technology.
PEER TO PEER (P2P) NETWORK By: Linda Rockson 11/28/06.
Paper Survey of DHT Distributed Hash Table. Usages Directory service  Very little amount of information, such as URI, metadata, … Storage  Data, such.
Peer to Peer A Survey and comparison of peer-to-peer overlay network schemes And so on… Chulhyun Park
Freenet “…an adaptive peer-to-peer network application that permits the publication, replication, and retrieval of data while protecting the anonymity.
1 Secure Peer-to-Peer File Sharing Frans Kaashoek, David Karger, Robert Morris, Ion Stoica, Hari Balakrishnan MIT Laboratory.
Computer Networking P2P. Why P2P? Scaling: system scales with number of clients, by definition Eliminate centralization: Eliminate single point.
ADVANCED COMPUTER NETWORKS Peer-Peer (P2P) Networks 1.
Peer to Peer Network Design Discovery and Routing algorithms
Information-Centric Networks Section # 5.2: Content Distribution Instructor: George Xylomenos Department: Informatics.
Algorithms and Techniques in Structured Scalable Peer-to-Peer Networks
LOOKING UP DATA IN P2P SYSTEMS Hari Balakrishnan M. Frans Kaashoek David Karger Robert Morris Ion Stoica MIT LCS.
Bruce Hammer, Steve Wallis, Raymond Ho
CS Spring 2014 CS 414 – Multimedia Systems Design Lecture 37 – Introduction to P2P (Part 1) Klara Nahrstedt.
Freenet: Anonymous Storage and Retrieval of Information
INTERNET TECHNOLOGIES Week 10 Peer to Peer Paradigm 1.
P2P Search COP6731 Advanced Database Systems. P2P Computing  Powerful personal computer Share computing resources P2P Computing  Advantages: Shared.
P2P Search COP P2P Search Techniques Centralized P2P systems  e.g. Napster, Decentralized & unstructured P2P systems  e.g. Gnutella.
P2P Networking: Freenet Adriane Lau November 9, 2004 MIE456F.
Peer-to-peer systems ”Sharing is caring”. Why P2P? Client-server systems limited by management and bandwidth P2P uses network resources at the edges.
CS Spring 2010 CS 414 – Multimedia Systems Design Lecture 24 – Introduction to Peer-to-Peer (P2P) Systems Klara Nahrstedt (presented by Long Vu)
Distributed Web Systems Peer-to-Peer Systems Lecturer Department University.
A Survey of Peer-to-Peer Content Distribution Technologies Stephanos Androutsellis-Theotokis and Diomidis Spinellis ACM Computing Surveys, December 2004.
Advanced Computer Networks: Part 1
CS 268: Lecture 22 (Peer-to-Peer Networks)
Peer-to-Peer Data Management
CHAPTER 3 Architectures for Distributed Systems
EE 122: Peer-to-Peer (P2P) Networks
Presentation transcript:

Information-Centric Networks05b-1 Week 5 / Paper 2 A survey of peer-to-peer content distribution technologies –Stephanos Androutsellis-Theotokis, Diomidis Spinellis –ΑCM Computing Surveys, Volume 36, Issue 4, 2004 Main point –Content distribution is one of many uses of P2P –Design decisions influence nonfunctional characteristics –A taxonomy of P2P systems for content distribution is given

Information-Centric Networks05b-2 What is P2P? Gnutella? Kazaa? Or Napster? –From purely distributed to based on centralized servers Some emphasize the use of resources at the edge –But Napster also does that, although it relies on servers Two defining characteristics –Sharing of resources by direct exchange Centralized servers only for specific tasks (e.g. bootstrap) Nodes need to participate in maintenance tasks –Instability and variable connectivity is the norm Fault-tolerant and self-organizing capacity Definition of P2P based on above characteristics –Covers Gnutella and Kazaa but not Napster However Napster is also mentioned as many see it as P2P

Information-Centric Networks05b-3 Classification of P2P architectures Communication and collaboration –Chat and instant messaging Distributed computation –Use of CPU cycles from the edge Internet service support –Multicast, indirection, security Database systems Content distribution –Focus of this study –Support data exchange in various ways Direct file exchange between nodes (Gnutella) Shared storage media (Oceanstore)

Information-Centric Networks05b-4 Content distribution with P2P P2P applications –Content distribution systems based on P2P technology –P2P file exchange systems Facilities to search and transfer files between peers Generally best-effort –P2P content publishing and storage Create a distributed storage medium Generally controlled access to the data P2P infrastructures –Not working applications but substrates for them –Routing and location –Anonymity –Reputation management

Information-Centric Networks05b-5 P2P object location and routing P2P nodes form an overlay network –Connections between nodes are paths on a physical network Typically the physical (underlay) network is IP with UDP or TCP Overlay network centralization types –Purely decentralized: all nodes are the same (servents) –Partially centralized: some supernodes are more important –Hybrid decentralized: central servers facilitate the network Network structure –Unstructured: content placement unrelated to topology Content location is tricky (flooding or random walks) –Structured: pointers to content are algorithmically placed Only works well with exact-match queries –Loosely structured: routing hints to simplify content location

Information-Centric Networks05b-6 Unstructured architectures Hybrid decentralized (e.g. Napster, Publius) –Central directory server Holds client info and file table –Clients report their files to server The server updates its tables –Clients send queries to server The server returns query results to clients –Clients exchange files directly –Inherently non-scalable, easy to take down It is sufficient to take down the centralized components

Information-Centric Networks05b-7 Unstructured architectures Purely decentralized (e.g. Gnutella) –All nodes are equal (SERVer ENTIties, servents) –Servents exchange 4 messages Ping: probe to a neighbor Pong: response to ping, contains number and size of files Query: search request (string) to neighbor Query Hit: response to query with matching file pointers –First locate some hosts via public server, then send pings –Then start sending queries Pings and queries are flooded up to a number of hops Duplicate detection mechanisms also limit flooding Responses are returned via the reverse path –Download files directly from the servents having them

Information-Centric Networks05b-8 Unstructured architectures Partially centralized (e.g. Kazaa) –Supernodes are dynamically assigned to assist others Supernodes are more powerful than other nodes (CPU, network) Index files shared by peers and proxy queries on their behalf All queries initially sent to supernodes –Supernodes make queries much faster –But they can be replaced if they fail Improving unstructured architectures –The biggest problem is scalability –Replace flooding by random walks –Proactively replicate data –Maintain local indices of data

Information-Centric Networks05b-9 Structured architectures Loosely structured systems (e.g. Freenet) –Estimate which node is responsible for content –Each node makes a local decision to forward queries –In Freenet file names are hashed to produce keys –There are 4 types of messages Data insert: includes data and key Data request: includes key and TTL Data reply: includes actual data Data failed: cannot find data –On insert, each node estimates where similar keys are stored New files are placed close to files with similar keys Each node maintains a table with routing hints –On query, local estimates towards possible targets are used Each successful response leaves behind a chain of pointers

Information-Centric Networks05b-10 Structured architectures Chord: uses a ring of nodes and keys –Nodes are placed on a ring –Each node knows its successor –Keys are placed on the same ring –Each key is stored at its successor –Insertions/deletions of nodes only have local effect Some keys need to move to the next/previous node –A finger table is used to speed up lookups Pointers to nodes at exponential distances Allows quick traversal of the ring

Information-Centric Networks05b-11 Structured architectures Tapestry (and Pastry, which is very similar) –Nodes and keys are treated as n digit numbers –Each node has a routing table with n rows –In each row there are pointers to nodes with matching prefixes At row k nodes match the first k digits of the present node –Routing proceeds by matching one digit at a time –There is some leeway in choosing routing entries Any node with a matching prefix will do Normally choose the closest one –Extensions allow replication of data

Information-Centric Networks05b-12 Structured architectures CAN (Content Addressable Network) –Multi-dimensional key space –Keys and nodes are mapped to points in the space –Nodes split the space between them based on proximity –Keys are stored at the node within which their point resides Insertions/deletions of nodes require splitting/merging areas –Routing proceeds between neighbor nodes Improving structured architectures –The biggest problem is inexact matches –Various proposals for building indexes on top Insert pointers to files with the right keywords as names –Maintenance is also quite heavy At least compared to unstructured architectures

Information-Centric Networks05b-13 Caching, replication and migration Replication improves availability and performance Passive replication –Occurs as nodes exchange content Cache-based replication –Cache responses passing through the node Active replication –Push data proactively Introspective replica management –Observe traffic to decide on replication Dynamic replica management –If queries are slow, place additional replicas

Information-Centric Networks05b-14 Security Secure storage of data Self-certification –File names contain a hash of the content Information dispersal –Encode the data as m blocks and spread them in the network –Require any n<m blocks to retrieve the data –Can be extended to secretly share keys Anonymous cryptographic relays –Data is sent to forwarders with send it to storers –The publisher destroys the data and publishes the storers

Information-Centric Networks05b-15 Security Access control, authentication and identity management –P2P systems are decentralized –The same entity can appear with many names This can subvert content replication, routing, etc. Reputation management also cannot work –The use of multiple fake identities is called a Sybil attack –Some systems use access control mechanisms Only users with specific keys can access content –Other systems use challenges to prevent Sybil attacks Need to spend some CPU cycles to be accepted Small numbers of nodes cannot mount large attacks

Information-Centric Networks05b-16 Anonymity and deniability Anonymity –May refer to publisher, storer, content or query –Dissassociation of content source and requestor Prevent linking files with origin or destination –Anonymous connection layers Onion routing and mix routing obscure the routing process –Censorship resistant lookup Deniability –Deniability of stored content Normally based on encryption or steganography –Deniability of content in transit Via anonymous connection layers

Information-Centric Networks05b-17 Incentives and accountability Trust-based versus trade-based incentive mechanisms –Prevent free-rider behavior in decentralized systems Reputation mechanisms –Need to propagate local reputation estimates globally –Quite tricky in a decentralized environment –Many simple mechanisms are easily subverted Micropayment mechanisms –Implement a micro currency scheme Resource trading schemes –Nodes issue receipts for services offered –Receipts indicate the quality of an entity

Information-Centric Networks05b-18 Content management capabilities Content deletion and update –Some systems only support immutable files –Others search for and remove old copies of files Content expiration –Easy way to get rid of stale data Content versioning –Used in Oceanstore for archival storage Directory structures –Mnemosyne provides UNIX like directories Content searching (see structured vs. unstructured) Storage and bandwidth management