Scalable File Sharing System Using Distributed Hash Table
Idea Proposal, April 14, 2005
Presentation by Jaesun Han, Network Computing Laboratory


Network Computing Laboratory, Korea Advanced Institute of Science and Technology

Contents
- One Line Comment
- Motivation & Problems
- My Idea
- Key Idea
- Distributed Hash Table
- P2P file sharing system using DHT
- Technical challenges
- Conclusion

One-line comment
Achieve a fully decentralized P2P file sharing system by distributing the file index structure as a distributed hash table (DHT).

Scalability in file sharing is a key practical issue!
- Even worse are requests for infamous (hot) files and network attacks such as DDoS.

[Figure: Internet, Korean users, US users, and the file sharing infrastructure; labels "Hot!!" and "Explosion".]

Solution Approach
Scalable solution for file sharing
- Investigate existing file sharing solutions
  - P2P-based file sharing currently seems the most appropriate
- Investigate methods to provide scalability to P2P-based approaches
  - Fully decentralized architecture for P2P-based file sharing

Key Idea
Decentralized indexing
- Existing schemes are either centralized or self-indexing (e.g., …)
- Self-indexing is not really an index scheme: such systems have no index at all
- They make up for the missing index with a flooding-based search mechanism → high search overhead

[Figure: Search(k.mp3) against a central index table, compared with Search(k.mp3) against a distributed index table split into the ranges a-e, f-m, n-r, s-z.]

Central index table:
node    file
Node1   k.mp3
Node2   s.mp3
Node3   n.mp3
Node4   b.mp3
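
The split of the index table into letter ranges can be sketched as follows. This is a minimal illustration only: the helper names (`partition_for`, `publish`, `search`) are hypothetical, and the node/file pairs are the ones shown in the central index table on the slide.

```python
# Hypothetical range-partitioned index, mirroring the a-e / f-m / n-r / s-z split.
RANGES = [("a", "e"), ("f", "m"), ("n", "r"), ("s", "z")]
partitions = [dict() for _ in RANGES]  # one index fragment per node


def partition_for(file_name: str) -> int:
    """Return the position of the fragment whose letter range covers the name."""
    first = file_name[0].lower()
    for i, (low, high) in enumerate(RANGES):
        if low <= first <= high:
            return i
    raise ValueError(f"no partition covers {file_name!r}")


def publish(file_name: str, node: str) -> None:
    """Record that `node` shares `file_name` in the matching fragment."""
    partitions[partition_for(file_name)][file_name] = node


def search(file_name: str):
    """Ask only the fragment that can hold the name; no flooding needed."""
    return partitions[partition_for(file_name)].get(file_name)


for name, node in [("k.mp3", "Node1"), ("s.mp3", "Node2"),
                   ("n.mp3", "Node3"), ("b.mp3", "Node4")]:
    publish(name, node)

print(partition_for("k.mp3"), search("k.mp3"))
```

A search touches exactly one fragment. Splitting by first letter is for illustration only, since real file names are not spread evenly over the alphabet.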

Key Idea
Distributed indexing
- Split the index table and distribute one part to each node
Hash table for distributed indexing
- Enables fast lookup
- Input to the hash table: file name
- Output from the hash table: node address
Distributed hash table for distributed indexing
- Split the hash table and distribute it across the nodes
- Look up through shortcut paths
→ P2P file sharing with DHT

DHT-based File Sharing: Technical Challenges

[Figure: Search(k.mp3) over a distributed index table split into the ranges a-e, f-m, n-r, s-z.]

Challenges:
- Routing?!
- Nodes often join and leave!

Related Works
Peer-to-peer file sharing systems
- Share files among personal computers (e.g., Soribada, eDonkey, KaZaa, Gnutella)
- 33.4% of Internet traffic in a KT investigation (2004.2)
- Millions of simultaneous users
Key technical issue: file indexing in existing P2P file sharing systems
- Indexing schemes have evolved to improve scalability
  - 1st generation: centralized indexing
  - 2nd generation: fully decentralized self-indexing
  - 3rd generation: semi-centralized indexing

Related Works: First-generation file sharing systems
- Centralized indexing (e.g., Soribada, Napster)
- Problems: not scalable, single point of failure

[Figure: nodes N1 to N5 around a centralized directory server (napster.com) whose file/node table maps a.mp3 to N5. Messages: Search(a.mp3), N5 IP addr., Request(a.mp3), File(a.mp3).]

Related Works: Second-generation file sharing systems
- Fully decentralized self-indexing (e.g., Gnutella)
- Problems: flooding overhead, partial searching

[Figure: nodes N1 to N9; Search(a.mp3) is flooded through the overlay. Search result → N3, N5, N8. Selected node → N5.]
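
The two problems named above, flooding overhead and partial searching, can be seen in a small simulation. This is a sketch under stated assumptions: the overlay links in `NEIGHBORS` are invented for illustration (the slide's topology is not recoverable), the TTL-limited breadth-first flood is a generic model rather than the exact Gnutella protocol, and only the holders N3, N5, N8 come from the slide.

```python
from collections import deque

# Hypothetical overlay links; not the topology drawn on the slide.
NEIGHBORS = {
    "N1": ["N2", "N3"], "N2": ["N1", "N4"], "N3": ["N1", "N5"],
    "N4": ["N2", "N6"], "N5": ["N3", "N7"], "N6": ["N4", "N8"],
    "N7": ["N5", "N9"], "N8": ["N6"], "N9": ["N7"],
}
# Nodes that share a.mp3 (the search result listed on the slide).
SHARED = {"N3": {"a.mp3"}, "N5": {"a.mp3"}, "N8": {"a.mp3"}}


def flood_search(start, file_name, ttl):
    """Flood a query up to `ttl` hops; return (sorted hits, messages sent)."""
    seen = {start}
    frontier = deque([(start, ttl)])
    hits, messages = [], 0
    while frontier:
        node, remaining = frontier.popleft()
        if node != start and file_name in SHARED.get(node, set()):
            hits.append(node)
        if remaining == 0:
            continue  # TTL exhausted: the query is not forwarded further
        for peer in NEIGHBORS[node]:
            messages += 1  # every forwarded copy costs a message, duplicates too
            if peer not in seen:
                seen.add(peer)
                frontier.append((peer, remaining - 1))
    return sorted(hits), messages


print(flood_search("N1", "a.mp3", ttl=2))
print(flood_search("N1", "a.mp3", ttl=4))
```

With the small TTL the query never reaches N8, so the result is partial; raising the TTL finds it but sends more messages.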

Related Works: Third-generation file sharing systems
- Semi-centralized indexing (e.g., eDonkey, KaZaa)
- Problems: partial searching, vulnerable to DoS attacks

[Figure: supernode overlay; messages Search(a.mp3) and File(a.mp3).]

Distributed Hash Table, Basic (1)
Distributed hash table
- File name → H(x) → File ID; node address → H(x) → Node ID
- Each File ID is mapped to a Node ID

[Figure: file names a.mp3, k.txt, x.mpg, g.doc and node addresses k, b, n, w are hashed with H(x) into one ID space, next to a hash table with key and node columns.]

Node ID   File IDs covered
30        0-30
71        31-71
89        71-89
127       89-127
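
A minimal sketch of hashing both file names and node addresses into one ID space. The slide does not say which hash function H(x) is; SHA-1 reduced to 7 bits is an assumption chosen here so that IDs fall in the 0-127 range the slide shows.

```python
import hashlib

M_BITS = 7  # assumption: 2**7 = 128 IDs, matching the 0-127 range on the slide


def H(name: str) -> int:
    """Map any string (file name or node address) to an ID in [0, 2**M_BITS)."""
    digest = hashlib.sha1(name.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (2 ** M_BITS)


# File names from the slide; node addresses k, b, n, w are the slide's labels.
for file_name in ["a.mp3", "k.txt", "x.mpg", "g.doc"]:
    print("file", file_name, "-> File ID", H(file_name))
for node_addr in ["k", "b", "n", "w"]:
    print("node", node_addr, "-> Node ID", H(node_addr))
```

The IDs printed here will not match the slide's 30, 71, 89, 127, because the slide's hash function is unknown. The point is only that files and nodes share one ID space.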

Distributed Hash Table, Basic (2)
- Keys and nodes are uniformly distributed in the same ID space
- Each node is responsible for the keys between its predecessor node and itself

[Figure: H(g.txt), H(a.mp3), H(x.doc), H(s.mpg), H(k.mp3) placed on the ID space; index entries g.txt(2,8), a.mp3(1), x.doc(4), s.mpg(1,4), k.mp3(2).]
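
The responsibility rule can be sketched with a sorted list of node IDs. The node IDs 30, 71, 89, 127 are the ones shown on the Basic (1) slide; the function name `responsible_node` and the use of `bisect` are illustrative choices, not the author's implementation.

```python
from bisect import bisect_left

ID_SPACE = 128                 # 7-bit ID space, as on the Basic (1) slide
NODE_IDS = [30, 71, 89, 127]   # sorted node IDs from the Basic (1) slide


def responsible_node(key_id: int) -> int:
    """Return the first node ID >= key_id, wrapping to the smallest node ID.

    That node owns every key after its predecessor up to and including itself.
    """
    i = bisect_left(NODE_IDS, key_id % ID_SPACE)
    return NODE_IDS[i % len(NODE_IDS)]


for key in (0, 30, 31, 72, 127):
    print(key, "->", responsible_node(key))
```

The modulo on the index handles the wrap-around case where a key is larger than every node ID; with a node at 127 that case never triggers here, but it matters for other node sets.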

Distributed Hash Table, Routing (1)
Naïve approach
- Each node knows only its successor node
- A lookup request is forwarded from successor to successor until Node ID < File ID ≤ Successor Node ID
- Worst-case performance: O(N)

[Figure: ring of nodes with successor pointers (successor=010, successor=100, successor=111, successor=001); index entries s.mpg(1,8), k.mp3(2); example Lookup(H(k.mp3)) → Lookup(101).]
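
The successor-only lookup can be sketched as below. The ring of nodes 001, 010, 100, 111 is read off the successor labels on the slide; the helper names are hypothetical.

```python
# Successor pointers read off the slide's labels: 001 -> 010 -> 100 -> 111 -> 001.
SUCCESSOR = {0b001: 0b010, 0b010: 0b100, 0b100: 0b111, 0b111: 0b001}


def in_half_open(x: int, a: int, b: int) -> bool:
    """True if x lies in the ring interval (a, b], allowing wrap past zero."""
    if a < b:
        return a < x <= b
    return x > a or x <= b


def naive_lookup(start: int, key_id: int):
    """Forward along successors; return (responsible node, forwarding hops)."""
    node, hops = start, 0
    while not in_half_open(key_id, node, SUCCESSOR[node]):
        node = SUCCESSOR[node]
        hops += 1
    return SUCCESSOR[node], hops


owner, hops = naive_lookup(0b001, 0b101)   # the slide's Lookup(101)
print(format(owner, "03b"), hops)
```

Each hop advances by exactly one node, so a ring of N nodes needs up to N - 1 forwarding hops, which is the O(N) worst case stated above.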

Distributed Hash Table, Routing (2)
Tree-based routing table
- Each node keeps shortcuts to nodes whose IDs differ from its own at each bit position
- 2^m ID space → m entries
- Lookup performance → O(log N)

[Figure: nodes a, b, c, d, each with a shortcut table keyed by ID prefix (e.g., 0xx, 1xx); example Lookup(101).]
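
A minimal sketch of bit-by-bit shortcut routing. It rests on a strong simplifying assumption that is not in the slides: every ID in the 2^m space is occupied by a node, so the shortcut for bit i is simply the node whose ID differs in that bit. Real DHTs handle sparse ID spaces; this sketch only shows why m entries give at most m hops.

```python
M = 3  # assumption: 3-bit IDs, as in the slide's Lookup(101) example


def shortcuts(node_id: int):
    """m entries, most significant bit first: entry i flips one bit of node_id."""
    return [node_id ^ (1 << bit) for bit in reversed(range(M))]


def route(start: int, key_id: int):
    """Fix the highest differing bit on every hop; return the node path."""
    path, node = [start], start
    while node != key_id:
        top_bit = (node ^ key_id).bit_length() - 1   # highest bit that differs
        node = shortcuts(node)[M - 1 - top_bit]      # follow that shortcut
        path.append(node)
    return path


print([format(n, "03b") for n in route(0b010, 0b101)])
```

Every hop corrects one bit, so a lookup takes at most m = log2(N) hops when all N = 2^m IDs are in use.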

Distributed Hash Table, Routing (3)
Complete example of routing table and routing algorithm
- Lookup from node 65a1fc with key d46a1c

[Figure: routing tables of the nodes along the lookup path.]
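
The slide's figure is lost, so the following is only a sketch of prefix-based routing with hexadecimal IDs. Only 65a1fc and d46a1c come from the slide; the intermediate node IDs are hypothetical, and a single global node list stands in for the per-node routing tables.

```python
def shared_prefix_len(a: str, b: str) -> int:
    """Number of leading hex digits that a and b have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


# Only 65a1fc is from the slide; the other node IDs are hypothetical.
KNOWN_NODES = ["65a1fc", "d13da3", "d4213f", "d462ba"]


def prefix_route(start: str, key: str):
    """Each hop moves to a node sharing a longer prefix with the key."""
    path, node = [start], start
    while True:
        have = shared_prefix_len(node, key)
        better = [n for n in KNOWN_NODES if shared_prefix_len(n, key) > have]
        if not better:
            return path  # no node matches the key more closely
        # Take the smallest improvement, i.e. roughly one more digit per hop.
        node = min(better, key=lambda n: shared_prefix_len(n, key))
        path.append(node)


print(prefix_route("65a1fc", "d46a1c"))
```

Each hop matches at least one more digit of the key, so the number of hops is bounded by the number of digits in the ID.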

Distributed Hash Table, Join
Join process
- The joining node issues a lookup with its own node ID as the lookup key
- It gathers routing table entries from the nodes along the lookup route

[Figure: node d46a1c joins by looking up key d46a1c. Routing table creation proceeds row by row, with prefixes a-, b-, c-, e-, f- in the first row; d0-, d1-, d2-, d3-, d5-, ….., dc-, dd-, de-, df- in the second; and d40-, d41-, d42-, d43-, d44-, d45-, ….., d4f- in the third.]
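
How a joining node might fill its routing table from the lookup route can be sketched as follows. This is a simplification under stated assumptions: the route nodes are hypothetical (only d46a1c is from the slide), and the new node records just the nodes it meets, whereas a real protocol would typically copy whole table rows from them.

```python
def shared_prefix_len(a: str, b: str) -> int:
    """Number of leading hex digits that a and b have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def gather_entries(new_id: str, route):
    """Build {row: {next_digit: node}} from the nodes met during the join lookup.

    A node sharing `row` digits with new_id is stored in that row, under the
    first digit where it differs from new_id.
    """
    table = {}
    for node in route:
        row = shared_prefix_len(new_id, node)
        if row == len(new_id):
            continue  # identical ID: nothing to store
        table.setdefault(row, {})[node[row]] = node
    return table


# Hypothetical route taken by the lookup for the new node's own ID.
join_route = ["65a1fc", "d13da3", "d4213f", "d462ba"]
print(gather_entries("d46a1c", join_route))
```

Because each node on the route shares a longer prefix with the new ID than the one before, the route fills successive rows of the new node's table.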

P2P File Sharing with DHT
Storing the file index in the DHT
Example: node a shares a new file g.txt, and node b looks up g.txt
1. Node a hashes g.txt → File ID = 101
2. Node a inserts the file info with ID = 101
3. Node b hashes g.txt → File ID = 101
4. Node b looks up ID = 101
5. Node b downloads g.txt from node a

File index table:
ID    File name   Node addr
101   g.txt       addr(a)

[Figure: nodes a, b, c, d with their shortcut tables (prefixes 0xx, 1xx) and the path of the insert and lookup messages.]
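
The publish and lookup steps above can be sketched end to end. Everything here is an illustrative assumption: the 3-bit ID space, the node IDs, SHA-1 as the hash, and the function names. In particular, the hash used here will not necessarily map g.txt to 101 as on the slide.

```python
import hashlib
from bisect import bisect_left

ID_BITS = 3                                # assumption: 3-bit IDs, as in ID=101
NODE_IDS = [0b001, 0b010, 0b100, 0b111]    # hypothetical sorted node IDs
index_tables = {nid: {} for nid in NODE_IDS}   # one index fragment per node


def file_id(name: str) -> int:
    """Hash a file name into the ID space (SHA-1 is an assumed choice)."""
    digest = hashlib.sha1(name.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (2 ** ID_BITS)


def responsible_node(key_id: int) -> int:
    """First node ID >= key_id, wrapping around to the smallest node ID."""
    i = bisect_left(NODE_IDS, key_id)
    return NODE_IDS[i % len(NODE_IDS)]


def publish(name: str, addr: str) -> None:
    """Steps 1-2: hash the name and store (name, addr) at the responsible node."""
    fid = file_id(name)
    index_tables[responsible_node(fid)].setdefault(fid, []).append((name, addr))


def lookup(name: str):
    """Steps 3-4: hash the name and read the entry from the responsible node."""
    fid = file_id(name)
    entries = index_tables[responsible_node(fid)].get(fid, [])
    return [addr for stored, addr in entries if stored == name]


publish("g.txt", "addr(a)")     # node a shares g.txt
print(lookup("g.txt"))          # node b learns where to download it (step 5)
```

Only the index entry lives in the DHT; the file itself stays on node a and is downloaded directly from the returned address.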

File Sharing with DHT: Technical Challenges
- Frequent node join & leave → index replication & fast routing table adaptation
- Exact-match search by hashing the file name → keyword search scheme
- Hotspot problem at the node indexing a popular file → load balancing mechanism

Conclusion
New approach for P2P file sharing systems
- Uses a new distributed data structure, the distributed hash table (DHT)
- Fully decentralized indexing
- Guarantees O(log N) lookup performance
- Makes full search possible
- Robust to node failure and network attacks such as DoS