

P2P Databases

Overview
0. Data objects, pointers (URLs), and attributes
1. Freeform versus structured attribute data
2. Centralized indices for attribute data and pointers (ex: Napster)
3. Query by flooding (ex: Gnutella)
4. DHTs (ex: Chord)
5. Problems with DHTs
6. Keyword queries in DHTs (Magnolia)
7. Popularity queries
8. Demo of system
9. (if time) Data transmission
– Overlay vs DHT Multicast
– BitTorrent / SplitStream
10. (if time) P2P file systems and versioning (precursor to undo/redo logging from later in the course)

P2P Today napster gnutella morpheus kazaa bearshare ebay limewire icq fiorana mojo nation jxta united devices open cola uddi process tree can chord ocean store farsite pastry tapestry ? grove netmeeting freenet popular power aim jabber bittorrent edonkey

Object representation and storage
Attributes: Name, Artist, Album, Genre
[Figure: objects, their attributes, and a pointer to each object]

P2P vs. Distributed DBMS
Traditional DDBMS issues:
– Transactions
– Distributed query optimization
– Interoperation of heterogeneous data sources
– Reliability/failure of nodes
Complex features do not scale

P2P vs. Distributed DBMS
Example application: file-sharing
Simple data model and query language
– No complex query optimization
– Easy interoperation
No guarantee on quality of results
– Individual site availability unimportant
Local updates
– No transactions
– Network partitions OK
Simple, and amenable to a large-scale network of PCs

Example: file sharing
Challenge #1: Performance
– Asking everyone is expensive!
– If I am smart, I only need to ask one peer
– How can I be smart?
[Figure: a peer asking several other peers “File X?”]

Search in P2P
System can control:
– Connections made by users/topology
– Data placement
– Query type
Tight control: “Structured”
– Efficient, comprehensive
Loose control: “Unstructured”
– Inefficient, not comprehensive, simple, expressive
– Used in real life
Both are useful to study

Centralized: Napster model
Benefits:
– Efficient search
– Limited bandwidth usage
– No per-node state
Drawbacks:
– Central point of failure
– Limited scale
[Figure: peers Bob, Alice, Jane, and Judy]
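
A minimal sketch of the centralized model, assuming a hypothetical `CentralIndex` class and made-up peer and file names. The server stores only attribute values and pointers; the files themselves stay on the peers.

```python
from collections import defaultdict

class CentralIndex:
    """Toy Napster-style index (hypothetical): maps attribute values to pointers."""

    def __init__(self):
        # lowercased attribute value -> set of (peer, filename) pointers
        self._index = defaultdict(set)

    def publish(self, peer, filename, attributes):
        # Register one pointer under every attribute value of the object.
        for value in attributes.values():
            self._index[value.lower()].add((peer, filename))

    def search(self, value):
        # One request to the server answers the query; no peer is contacted.
        return sorted(self._index.get(value.lower(), set()))

index = CentralIndex()
index.publish("alice", "song1.mp3", {"artist": "Band A", "genre": "Rock"})
index.publish("bob", "song2.mp3", {"artist": "Band B", "genre": "Rock"})
print(index.search("rock"))
```

Search cost is a single lookup, but every publish and every query goes through the one server, which is the central point of failure listed above.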

Unstructured – Query Flooding
[Figure legend: forward query, processed query, query source, found result, forward response]

Problems with unstructured
Inefficient
– Query messages are flooded
– Even if routing is intelligent, worst-case load is still O(n), where n is the number of nodes in the system
Not comprehensive
– If I do not get a result for my query, is it because none exists?
(Of course, many optimizations are possible…)
Structured systems address these problems
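
A rough sketch of hop-limited flooding, to make both problems concrete. The overlay graph and the `flood` helper are hypothetical, and for simplicity a peer forwards to all of its neighbors, including the one the query came from.

```python
from collections import deque

def flood(graph, source, ttl):
    """Breadth-first flood with a hop limit; returns (peers reached, messages sent)."""
    seen = {source}
    queue = deque([(source, ttl)])
    messages = 0
    while queue:
        node, hops_left = queue.popleft()
        if hops_left == 0:
            continue  # hop limit exhausted, stop forwarding here
        for neighbor in graph[node]:
            messages += 1  # every forward costs a message, even a duplicate
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, hops_left - 1))
    return seen, messages

# Hypothetical 6-peer overlay as undirected adjacency lists.
graph = {
    "A": ["B", "C"], "B": ["A", "D"], "C": ["A", "D"],
    "D": ["B", "C", "E"], "E": ["D", "F"], "F": ["E"],
}
reached, messages = flood(graph, "A", ttl=2)
print(sorted(reached), messages)
```

With a hop limit of 2, peers E and F are never asked, so an empty result does not prove that no match exists; raising the limit reaches everyone, at a cost that grows with the size of the network.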

Distributed Hash Tables (DHTs)
Model:
– Key/Object pair; the key is hashed to get an ID
– Example: objects are files
  The key is the content of the file
  The ID is the hash of the file contents
Single operation: Lookup(ID)
– Input: integer ID
– Output: the object with the corresponding ID
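
One way to derive an m-bit ID from file contents, as a sketch. The choice of SHA-1 and of keeping the top m bits is an assumption made here for illustration; the slides do not name a hash function.

```python
import hashlib

def object_id(content: bytes, m: int) -> int:
    """Hash the content with SHA-1 (160 bits) and keep the top m bits as the ID."""
    digest = hashlib.sha1(content).digest()
    return int.from_bytes(digest, "big") >> (160 - m)

# The same content always maps to the same ID, on every peer.
print(object_id(b"file contents", 3))
```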

Identifiers
IDs are m-bit integers
Nodes are also assigned IDs
– Commonly assigned by hashing a node’s IP address, although there are many problems with this
An object is stored on the node with the smallest ID greater than or equal to the object’s ID
– This node is called the successor of the object’s ID
– IDs are arranged on a circle, so 0 comes after 2^m - 1
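
A small sketch of the placement rule with wraparound. The node IDs are made up, and a node whose ID equals the object's ID is treated as that object's successor, which matches the query example later in the deck.

```python
from bisect import bisect_left

def successor(node_ids, object_id):
    """First node ID >= object_id on the ring, wrapping around to the smallest node."""
    ring = sorted(node_ids)
    i = bisect_left(ring, object_id)
    return ring[i % len(ring)]  # i == len(ring) means we passed the top of the ring

nodes = [0, 1, 3, 6]  # hypothetical node IDs for m = 3
for obj in (2, 4, 6, 7):
    print(obj, "->", successor(nodes, obj))
```

Object 7 has no node at or above it, so it wraps around and lands on node 0.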

Data Placement
m = 3
[Figure: ID ring showing which nodes exist and where each data item is placed]

Connections
“Finger pointers”
Distance: …, 2^(m-1)
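
A sketch of how finger pointers could be computed, assuming the usual Chord convention that the fingers of a node sit at power-of-two distances 2^0 through 2^(m-1) and that each finger points to the successor of that target ID. The node IDs are hypothetical.

```python
def finger_table(node_id, node_ids, m):
    """Return (distance, node) pairs for distances 2^0 .. 2^(m-1)."""
    ring = sorted(node_ids)
    table = []
    for i in range(m):
        start = (node_id + 2 ** i) % 2 ** m
        # Successor of the target ID, wrapping to the smallest node if needed.
        owner = next((n for n in ring if n >= start), ring[0])
        table.append((2 ** i, owner))
    return table

nodes = [0, 1, 3, 6]  # hypothetical node IDs for m = 3
print(finger_table(0, nodes, 3))
print(finger_table(6, nodes, 3))
```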

Query
Lookup(objectID)
– objectID is typically the ID of the object you are looking for, but not necessarily
Approach:
– Find the predecessor of the object, i.e. the node with the largest ID that is smaller than the object ID
– Return the successor of the predecessor

Query Example
Say node 0 wants to find the object with ID = 7.
For simplicity, we will assume a node exists at every ID in the space.

Query Example
Node 0: Lookup(7)
Node 0: FindPred(7)

Query Example
Node 4: FindPred(7)

Query Example
Node 6: FindPred(7)
Node 6 is the predecessor
Return successor node 7
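
The trace above can be reproduced with a short simulation. This is a sketch under the slides' simplification (m = 3, a node at every ID); the helper names `dist`, `fingers`, `find_pred`, and `lookup` are chosen here and are not taken from the deck.

```python
M = 3
SIZE = 2 ** M  # IDs 0..7, with a node at every ID

def dist(a, b):
    """Clockwise distance from a to b on the ring."""
    return (b - a) % SIZE

def successor_of(node):
    return (node + 1) % SIZE  # valid only because every ID has a node

def fingers(node):
    return [(node + 2 ** i) % SIZE for i in range(M)]

def find_pred(start, object_id):
    node, path = start, [start]
    while True:
        # A distance of 0 means a full lap, so treat it as SIZE.
        remaining = dist(node, object_id) or SIZE
        if remaining <= dist(node, successor_of(node)):
            return node, path  # object ID lies in (node, successor]
        # Closest preceding finger: the farthest finger still before the object ID.
        node = max((f for f in fingers(node) if dist(node, f) < remaining),
                   key=lambda f: dist(node, f))
        path.append(node)

def lookup(start, object_id):
    pred, path = find_pred(start, object_id)
    return successor_of(pred), path

print(lookup(0, 7))
```

Each hop at least halves the remaining distance to the object ID, which is where the logarithmic hop count comes from.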

Query characteristics
With high probability, a query can be answered by contacting O(log N) nodes
– N = total nodes in the network
=> Efficient!
Also notice: if an object with the ID exists in the network, it will be found
=> Comprehensive!
State is also O(log N) in size

Query characteristics
Note that finger pointers are not required for correct operation
– Only successor pointers are needed
– But then the cost of a query increases to O(N) in the worst case
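
A quick sketch comparing the two cases on a fully populated ring (an assumption made to keep the arithmetic simple). The hypothetical `hops` helper routes all the way to the target node and counts forwarding steps.

```python
def hops(start, target, m, use_fingers):
    """Count forwarding steps from start to target on a full ring of 2^m nodes."""
    size = 2 ** m
    node, count = start, 0
    while node != target:
        remaining = (target - node) % size
        if use_fingers:
            # Largest power-of-two finger that does not overshoot the target.
            step = 1 << (remaining.bit_length() - 1)
        else:
            step = 1  # successor pointer only
        node = (node + step) % size
        count += 1
    return count

print(hops(0, 63, 6, use_fingers=False))
print(hops(0, 63, 6, use_fingers=True))
```

On this 64-node ring the worst-case target costs 63 steps with successor pointers alone and 6 with fingers.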

Advantages of Structured?
Scalability/Efficiency
– Load grows with O(log N)
Comprehensiveness

Disadvantages? (cont)
Availability of Data
– If a node dies suddenly, what happens to the data it was storing?
– MUST replicate data across multiple nodes
Query Language
– How can we express keyword queries efficiently?
– Many useful applications require different languages
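
One possible replication scheme, shown only as a sketch: keep each object on its successor and on the next r - 1 nodes clockwise. The slides do not say which scheme is intended; the node IDs and the `replica_nodes` helper are hypothetical.

```python
def replica_nodes(node_ids, object_id, r):
    """The successor of object_id plus the next r - 1 nodes clockwise."""
    ring = sorted(node_ids)
    first = next((i for i, n in enumerate(ring) if n >= object_id), 0)
    return [ring[(first + k) % len(ring)] for k in range(min(r, len(ring)))]

nodes = [0, 1, 3, 6]  # hypothetical node IDs for m = 3
print(replica_nodes(nodes, 4, r=2))

# If node 6 dies, the new successor of ID 4 is node 0, which already holds a copy.
survivors = [n for n in nodes if n != 6]
print(replica_nodes(survivors, 4, r=1))
```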

Magnolia
Current approach: hash each keyword separately and store pointers at h(keyword)
[Figure: the title “Seven Innovation Myths” is stored at h(seven), h(innovation), and h(myths); the title “Innovation” is also stored at h(innovation)]
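
A sketch of the per-keyword approach, with a plain dictionary standing in for the DHT. The hash function, the 32-bit ID width, and the example pointers are assumptions made for illustration.

```python
import hashlib
from collections import defaultdict

M = 32  # hypothetical ID width in bits

def h(keyword):
    digest = hashlib.sha1(keyword.lower().encode("utf-8")).digest()
    return int.from_bytes(digest, "big") >> (160 - M)

store = defaultdict(set)  # stand-in for the DHT: ID -> set of pointers

def insert(title, pointer):
    # One pointer is stored per keyword of the title.
    for word in title.split():
        store[h(word)].add(pointer)

def query(*keywords):
    # Fetch the pointer set of each keyword and intersect them.
    results = [store.get(h(word), set()) for word in keywords]
    return set.intersection(*results) if results else set()

insert("Seven Innovation Myths", "http://example.org/a")
insert("Innovation", "http://example.org/b")
print(sorted(query("innovation")))
print(sorted(query("innovation", "myths")))
```

Every object whose title contains a popular keyword lands on the single node responsible for h(keyword), so that node carries a disproportionate share of storage and query load.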

Resulting Distribution

Prefix Hashing
[Figure: an m-bit ID whose first m’ bits form the prefix; the keyword “Innovation” maps to h_P(innovation), shown as prefix 100]
h_P = m’-bit hash function
Partitions the network into 2^m’ separate sibling groups of about n/2^m’ nodes each
– n = number of nodes, m’ = partitioning factor
For m’ = 12 and n = 1 million (about 2^20), ~256 nodes will share the same prefix
Assumption: h is uniformly distributed
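
The arithmetic can be checked with a short sketch. Each m’-bit prefix names one sibling group, so there are 2^m’ groups and n / 2^m’ nodes per group on average. The hash function and the 32-bit ID width are assumptions made here.

```python
import hashlib

M = 32        # hypothetical full ID width in bits
M_PRIME = 12  # partitioning factor used on the slide

def h(keyword):
    digest = hashlib.sha1(keyword.lower().encode("utf-8")).digest()
    return int.from_bytes(digest, "big") >> (160 - M)

def group_id(keyword):
    """h_P: keep only the m' most significant bits of the keyword hash."""
    return h(keyword) >> (M - M_PRIME)

def expected_group_size(n, m_prime):
    return n / 2 ** m_prime

print(group_id("innovation"))
print(expected_group_size(2 ** 20, M_PRIME))    # 256.0
print(expected_group_size(1_000_000, M_PRIME))  # about 244
```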

Balancing
[Figure: the keyword “Innovation” balanced over the sibling group with ID = 100]
All siblings in a group share the same prefix

Random Sibling
Insert: Keyword -> h_P -> sibling group ID
– Locate a sibling node via SIFT
Lookup: Keyword
– O(1) group broadcast or multicast
– Replies

Advantages Good Balancing Properties

Advantages
– Low traffic load on nodes for popular queries
– Quick lookup
– Popularity ranking of objects
– Distributed replication for resilience

Implementing Magnolia
Developed on top of a Chord clone written in Python
– If you’re going to write a peer-to-peer app, why not leverage existing modules and libraries?
Challenge: How do we implement group-based stores and queries without requiring additional network maintenance?

Chord’s Finger Table
A Chord node maintains a finger table of M IPs pointing to nodes ahead of it in the ring.
– The pointer at index i is the successor of node id + 2^(i-1).
This lets us reach any node in the network in O(log N) hops, and never more than M.
We use the M’ most significant bits of a node’s ID to indicate its group.
We want to reach any group in at most M’ hops.
– Do we need another table?
– Nope. The last M’ entries in our finger table provide this.
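
A small check of why the last M’ finger entries are enough, under stated assumptions: fingers are indexed 1..M with target id + 2^(i-1), and the toy sizes M = 6 and M’ = 2 are chosen here for readability.

```python
M, M_PRIME = 6, 2  # hypothetical sizes: 64 IDs, 4 sibling groups

def group(node_id):
    """A node's group is given by the M' most significant bits of its ID."""
    return node_id >> (M - M_PRIME)

def finger_targets(node_id):
    """Target IDs of fingers 1..M: id + 2^(i-1), modulo the ring size."""
    return [(node_id + 2 ** (i - 1)) % 2 ** M for i in range(1, M + 1)]

node = 5
targets = finger_targets(node)
print(targets)                                # [6, 7, 9, 13, 21, 37]
print([group(t) for t in targets[-M_PRIME:]])  # [1, 2]
```

The last M’ targets are offset by whole multiples of the group size, so they land exactly 1, 2, 4, … groups ahead, and chaining them reaches any group in at most M’ hops.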

Talking to Siblings
How do we propagate queries through the group?
Naïve solution: send to our predecessor and successor.
A better solution: we can send a query throughout the group by treating the sibling group as a tree.
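
One way to treat a group as a tree is a binomial broadcast tree, sketched below. This is an assumption for illustration, not necessarily the tree Magnolia builds: it supposes a group of 16 siblings that all exist, identified by their offset 0..15 within the group.

```python
def children(offset, span):
    """Children of a node that is responsible for offsets [offset, offset + span)."""
    kids = []
    step = span // 2
    while step >= 1:
        # The child at offset + step takes over [offset + step, offset + 2 * step).
        kids.append((offset + step, step))
        step //= 2
    return kids

def broadcast(offset, span, depth=0):
    """Return (offset, depth) for every sibling reached from this node."""
    reached = [(offset, depth)]
    for child, child_span in children(offset, span):
        reached.extend(broadcast(child, child_span, depth + 1))
    return reached

reached = broadcast(0, 16)
print(sorted(reached))
```

Every edge goes from a node to that node plus a power of two, so it corresponds to a finger distance, and each sibling receives the query exactly once within 4 levels.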

Sibling Tree
[Figure: tree over a sibling group, with edges labeled by finger offsets of the form id + 2^k]
N/N’ = 16; M/M’ = 4
Every edge can be found in the finger table!

Sibling Tree Problems
Problems:
– Not every possible node will exist
– Not every node will have results to report
– The query maker needs to know when the search is done
But we’re okay!
– Nodes can determine if a child sub-tree is dead
– Even if a child node in our sibling table has a higher ID than expected, its sub-tree contains all existing descendants of the expected ID
– We can predict when a child is in a sibling’s or an ancestor’s tree

Bigger Problems
What if a pointer in our finger table fails?
– We either have to find the successor to its ID or fail to query the sub-tree
What if the lowest-ID node isn’t the root of our tree?
– Some of our edges won’t be in our finger table

Popularity queries

Yulania, Demo

BitTorrent

SplitStream