Querying the Internet with PIER (PIER = Peer-to-peer Information Exchange and Retrieval)

Presentation transcript:

Querying the Internet with PIER (PIER = Peer-to-peer Information Exchange and Retrieval)

What is PIER? Peer-to-Peer Information Exchange and Retrieval.
- A query engine that runs on top of a P2P network
- A step toward distributed query processing at a much larger scale
- A way to achieve massive distribution: querying heterogeneous data
- The architecture marries traditional database query processing with recent peer-to-peer technologies

The key goal is a scalable indexing system for large-scale decentralized storage applications on the Internet: P2P file sharing, large-scale storage management systems (OceanStore, Publius), and wide-area name resolution services.

What is Very Large? Depends on Who You Are.
- Database community: single-site systems, clusters, and distributed systems of 10's – 100's of nodes
- Network community: Internet-scale systems of 1000's to millions of nodes
The challenge: how to run DB-style queries at Internet scale (Internet-scale systems vs. hundred-node systems).

What are the Key Properties? Lots of data that is:
1. Naturally distributed (stored where it is generated)
2. Undesirable to collect centrally
3. Homogeneous in schema
4. More useful when viewed as a whole

Who Needs Internet Scale? Example 1: Filenames
- Simple, ubiquitous schemas: filenames, sizes, ID3 tags
- Born from early P2P systems such as Napster, Gnutella, etc.
- Content is shared by "normal" non-expert home users
- Systems were built by a few individuals "in their garages", so the barrier to entry is low

Example 2: Network Traces
- Schemas are mostly standardized: IP, SMTP, HTTP, SNMP log formats
- Network administrators look for patterns within their own site AND across other sites: DoS attacks cross administrative boundaries, virus/worm infections need to be tracked, and timeliness is very helpful
- It might surprise you how useful this is: network bandwidth on PlanetLab (a world-wide distributed research test bed) is mostly consumed by people monitoring the network status

Our Challenge
Our focus is on the challenge of scale:
- Applications are homogeneous and distributed
- There is already significant interest
- Provide a flexible framework for a wide variety of applications

Four Design Principles (I)
Relaxed consistency
- ACID transactions severely limit the scalability and availability of distributed databases
- We provide best-effort results
Organic scaling
- Applications may start small, without a priori knowledge of their eventual size

Four Design Principles (II)
Natural habitat
- No CREATE TABLE/INSERT
- No "publish to web server"
- Wrappers or gateways allow the information to be accessed where it is created
Standard schemas via grassroots software
- Data is produced by widespread software, providing a de-facto schema to utilize

[Architecture diagram: Declarative Queries are compiled into a Query Plan, which executes over the Overlay Network (based on CAN) running on the Physical Network.]

Applications
- P2P databases: highly distributed and available data
- Network monitoring: intrusion detection, fingerprint queries

DHTs
- Implemented with CAN (Content Addressable Network)
- Each node is identified by a hyper-rectangle (zone) in d-dimensional space
- A key is hashed to a point and stored at the corresponding node
- Each node maintains a routing table of O(d) neighbors

[Diagram: a 2-D CAN coordinate space with corners (0,0), (16,0), (0,16), and (16,16); the data key hashes to the point (15,14).]
Given a message with an ID, route the message to the node currently responsible for that ID.

DHT Design
- Routing layer: mapping for keys (dynamic as nodes leave and join)
- Storage manager: DHT-based data storage
- Provider: storage access interface for higher levels

DHT – Routing
The routing layer maps a key to the IP address of the node currently responsible for that key. It provides exact-match lookups and calls back to higher levels when the set of keys a node is responsible for changes.
Routing layer API:
- lookup(key) → ipaddr (asynchronous)
- join(landmarkNode)
- leave()
- locationMapChange()
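To make this interface concrete, here is a minimal sketch of a routing-layer class exposing the calls listed above. The method names follow the slide; everything else (a single-process map standing in for CAN routing, the callback plumbing) is an assumption for illustration only.

from typing import Callable, Dict, Optional

class RoutingLayer:
    def __init__(self):
        # key -> IP address of the node currently responsible for it
        self._owners: Dict[str, str] = {}
        self._callbacks = []

    def lookup(self, key: str, callback: Callable[[Optional[str]], None]) -> None:
        # Asynchronous in spirit: deliver the owning node's IP via callback.
        callback(self._owners.get(key))

    def join(self, landmark_node: str) -> None:
        # Join the overlay by contacting a known landmark node (stubbed).
        print(f"joining overlay via landmark {landmark_node}")

    def leave(self) -> None:
        # Leave the overlay; responsibility for our keys migrates to neighbors.
        print("leaving overlay")

    def location_map_change(self, callback: Callable[[], None]) -> None:
        # Register a callback fired when the set of keys this node owns changes.
        self._callbacks.append(callback)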

DHT – Storage
The storage manager stores and retrieves records, which consist of key/value pairs. Keys are used to locate items and can be any supported data type or structure.
Storage Manager API:
- store(key, item)
- retrieve(key) → item
- remove(key)
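A correspondingly small sketch of the storage-manager API, with a plain in-memory dictionary standing in for the node-local store (an assumption; PIER's actual storage layer is not shown here).

class StorageManager:
    # Node-local store of key/value records behind the DHT (illustrative only).
    def __init__(self):
        self._items = {}

    def store(self, key, item) -> None:
        self._items[key] = item

    def retrieve(self, key):
        return self._items.get(key)

    def remove(self, key) -> None:
        self._items.pop(key, None)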

DHT – Provider (1)
The provider ties the routing and storage manager layers together and provides the interface used by higher levels. Each object in the DHT has a namespace, resourceID and instanceID.
- DHT key = hash(namespace, resourceID)
- namespace: the application or group of objects, e.g. a table or relation
- resourceID: the primary key or any other attribute of the object
- instanceID: an integer used to separate items with the same namespace and resourceID
- lifetime: how long the item is stored
CAN's mapping of resourceID to object is equivalent to an index.
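To make the key derivation concrete, a hedged sketch of DHT key = hash(namespace, resourceID); the choice of SHA-1 and the example namespace/resourceID values are illustrative assumptions, not PIER's actual encoding.

import hashlib

def dht_key(namespace: str, resource_id: str) -> int:
    # DHT key = hash(namespace, resourceID); SHA-1 is an illustrative choice.
    digest = hashlib.sha1(f"{namespace}|{resource_id}".encode()).digest()
    return int.from_bytes(digest, "big")

# Hypothetical example: a network-trace tuple keyed by its primary key.
# Items sharing (namespace, resourceID) would be told apart by instanceID.
key = dht_key(namespace="network_traces", resource_id="pkt-42")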

DHT – Provider (2)
Provider API:
- get(namespace, resourceID) → item
- put(namespace, resourceID, item, lifetime)
- renew(namespace, resourceID, instanceID, lifetime) → bool
- multicast(namespace, resourceID, item)
- lscan(namespace) → items
- newData(namespace, item)
[Diagram: table R is a namespace whose tuples are partitioned across nodes by resourceID; tuples 1..n are stored at node R1 and tuples n+1..m at node R2.]
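The diagram suggests how a relation's tuples end up spread over nodes by resourceID. A hedged sketch of how a higher layer might publish and locally scan a relation through the provider API above; the provider object, tuple format, and lifetime value are hypothetical.

NAMESPACE = "R"        # the relation acts as the namespace
LIFETIME_SECS = 3600   # items expire unless renewed

def publish(provider, tuples):
    # resourceID = primary key; the DHT spreads tuples across nodes R1, R2, ...
    for t in tuples:
        provider.put(NAMESPACE, t["id"], t, LIFETIME_SECS)

def scan_local(provider):
    # lscan returns only the items stored in this node's portion of the namespace
    return provider.lscan(NAMESPACE)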

Query Processor
How it works:
- performs selection, projection, joins, grouping, and aggregation as operators
- operators push and pull data
- multiple operators execute simultaneously, pipelined together
- results are produced and queued as quickly as possible
How it modifies data:
- inserts, updates and deletes items via the DHT interface
How it selects data to process:
- dilated-reachable snapshot: the data published by reachable nodes at the time the query arrives
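A toy illustration of the operator pipelining described above, using Python generators as pull-based operators; this sketches the idea only and is not PIER's actual operator implementation (the provider object and relation name are assumed).

def scan(provider, namespace):
    # leaf operator: read the locally stored portion of a relation
    for item in provider.lscan(namespace):
        yield item

def select(rows, predicate):
    for row in rows:
        if predicate(row):
            yield row          # each result flows downstream as soon as it is produced

def project(rows, columns):
    for row in rows:
        yield {c: row[c] for c in columns}

# Operators execute simultaneously in a pipeline; results are queued for the
# consumer as quickly as possible rather than being fully materialized first.
# plan = project(select(scan(provider, "R"), lambda r: r["port"] == 80), ["src", "dst"])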

Join Algorithms
Bandwidth is limited.
- Symmetric hash join: rehashes both tables
- Semi joins: transfer only matching tuples
At 40% selectivity, the bottleneck switches from the computation nodes to the query sites.

Future Research
- Routing, storage and layering
- Catalogs and query optimization
- Hierarchical aggregations
- Range predicates
- Continuous queries over streams
- Sharing between queries
- Semi-structured data

Distributed Hash Tables (DHTs)
What is a DHT?
- Take an abstract ID space and partition it among a changing set of computers (nodes)
- Given a message with an ID, route the message to the computer currently responsible for that ID
- Messages can be stored at the nodes
- This works like a "distributed hash table": it provides a put()/get() API
- Maintenance is cheap when nodes come and go

Distributed Hash Tables (DHTs)
Lots of effort is put into making DHTs better:
- Scalable (thousands to millions of nodes)
- Resilient to failure
- Secure (anonymity, encryption, etc.)
- Efficient (fast access with minimal state)
- Load balanced
- etc.

PIER's Three Uses for DHTs
A single elegant mechanism with many uses:
- Search: index (like a hash index)
- Partitioning: value (key)-based routing (like Gamma/Volcano)
- Routing: network routing for query-processor messages (query dissemination, Bloom filters, hierarchical QP operators such as aggregation and join)
It is not clear there is another substrate that supports all these uses.

Metrics
We are primarily interested in 3 metrics:
- Answer quality (recall and precision)
- Bandwidth utilization
- Latency
Different DHTs provide different properties:
- Resilience to failures (recovery time) affects answer quality
- Path length affects bandwidth and latency
- Path convergence affects bandwidth and latency
Different QP join strategies: symmetric hash join, fetch matches, symmetric semi-join, Bloom filters, etc.
Big picture: trade off bandwidth (extra rehashing) against latency.

Symmetric Hash Join (SHJ)

Fetch Matches (FM)

Symmetric Semi Join (SSJ)
- Both R and S are projected to save bandwidth
- The complete R and S tuples are then fetched in parallel to improve latency
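A rough sketch of the symmetric semi-join idea on this slide: both inputs are first projected to (join key, source key) and rehashed, matches are found at the rehash sites, and only then are the complete tuples fetched to cut latency. The dht object and its calls (put, scan_groups, get) are hypothetical stand-ins, not PIER's real API, and a real system would issue the final fetches concurrently.

def symmetric_semi_join(dht, left, right, join_attr):
    # 1. Project both inputs to (join key, source key) and rehash by join key.
    for rel, rows in (("L", left), ("R", right)):
        for row in rows:
            dht.put("ssj_tmp", row[join_attr],
                    {"rel": rel, "source_key": row["id"]}, 60)

    # 2. At each rehash site the matching projections meet; collect source keys.
    matches = []
    for group in dht.scan_groups("ssj_tmp"):          # hypothetical helper
        lefts = [g for g in group if g["rel"] == "L"]
        rights = [g for g in group if g["rel"] == "R"]
        matches += [(l["source_key"], r["source_key"]) for l in lefts for r in rights]

    # 3. Fetch the complete L and R tuples (in parallel in a real system).
    return [(dht.get("L", lk), dht.get("R", rk)) for lk, rk in matches]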

Overview
- CAN is a distributed system that maps keys onto values
- Keys are hashed into a d-dimensional space
- Interface: insert(key, value), retrieve(key)

Overview
[Diagram: state of the system at time t in a 2-dimensional space; each peer owns a zone and each resource key is mapped to a point (x, y).]

DESIGN
- d-dimensional Cartesian coordinate space (a d-torus)
- Every node owns a distinct zone
- A key k1 is mapped onto a point p1 using a uniform hash function
- (k1, v1) is stored at the node Nx that owns the zone containing p1
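A small sketch of the key-to-point mapping: a uniform hash of the key yields d coordinates, and (k1, v1) is stored at the node whose zone contains that point. The hash function and the [0, 16) coordinate range (matching the earlier diagram) are assumptions.

import hashlib

DIMENSIONS = 2
SPACE_SIZE = 16.0   # coordinates in [0, 16) per dimension, as in the earlier diagram

def key_to_point(key: str):
    # Map a key to a point in the d-dimensional space via a uniform hash.
    digest = hashlib.sha1(key.encode()).digest()
    point = []
    for i in range(DIMENSIONS):
        chunk = int.from_bytes(digest[4 * i: 4 * (i + 1)], "big")
        point.append((chunk / 2**32) * SPACE_SIZE)
    return tuple(point)

def owns(zone, point):
    # zone = ((lo_x, lo_y), (hi_x, hi_y)); true if the point falls inside it
    lo, hi = zone
    return all(lo[i] <= point[i] < hi[i] for i in range(DIMENSIONS))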

Each node maintains a routing table of its neighbors; for example, node A holds {B, C, E, D}.
Messages follow the straight-line path through the Cartesian space.

Routing
[Diagram: a query for resource key Q(x, y) is routed across the 2-D space toward the peer owning the point (x, y).]
- d-dimensional space partitioned into n zones
- Two zones are neighbors if they overlap in d-1 dimensions
- Average routing path length: O(d · n^(1/d)) hops
- Algorithm: forward to the neighbor nearest to the destination Q(x, y)
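The greedy rule above ("forward to the neighbor nearest to the destination") can be sketched as below; the torus wrap-around distance and the neighbor-table representation are assumptions for illustration.

def torus_distance(p, q, size=16.0):
    # Euclidean distance on the d-torus: each axis wraps around at `size`.
    total = 0.0
    for a, b in zip(p, q):
        d = abs(a - b)
        d = min(d, size - d)
        total += d * d
    return total ** 0.5

def next_hop(neighbors, destination):
    # Greedy CAN routing: pick the neighbor whose zone center is nearest
    # (on the torus) to the destination point. Each neighbor entry is assumed
    # to carry a precomputed "zone_center".
    return min(neighbors, key=lambda n: torus_distance(n["zone_center"], destination))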

CAN: construction
[Diagram: a new node and the bootstrap node.]

CAN: construction
1) The new node discovers, via the bootstrap node, some node "I" already in CAN.

CAN: construction
2) The new node picks a random point (x,y) in the space.

CAN: construction
3) I routes to (x,y) and discovers node J, the current owner of that point.

CAN: construction
4) J's zone is split in half; the new node owns one half.
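A hedged sketch of step 4: node J splits its zone in half along one dimension and hands one half to the new node; keys whose points fall in the given half migrate, and both nodes update their neighbor tables. The zone representation below is an assumption.

def split_zone(zone, dim=0):
    # zone = (lo, hi) with lo/hi as coordinate tuples; split in half along `dim`.
    lo, hi = list(zone[0]), list(zone[1])
    mid = (lo[dim] + hi[dim]) / 2.0
    kept_hi = hi[:]; kept_hi[dim] = mid     # J keeps the lower half
    given_lo = lo[:]; given_lo[dim] = mid   # the new node gets the upper half
    return (tuple(lo), tuple(kept_hi)), (tuple(given_lo), tuple(hi))

def handle_join(j_node, new_node):
    kept, given = split_zone(j_node.zone)
    j_node.zone = kept
    new_node.zone = given
    # Keys whose hashed points now lie in `given` migrate to the new node,
    # and both nodes update their neighbor tables.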

Maintenance
- Zone takeover is used when a node fails or leaves
- Each node sends its neighbor table to its neighbors at a discrete time interval t to show it is alive
- If a neighbor has not reported itself alive within time t, take over its zone
- Zone reassignment is then needed

Node Departure
- Someone has to take over the zone
- The departing node explicitly hands its zone over to one of its neighbors
- The zones are merged into a valid zone if possible
- If that is not possible, the zone is temporarily handled by the neighbor with the smallest zone

Zone reassignment
[Diagram: the zoning of the coordinate space and the corresponding partition tree used to decide how zones are reassigned.]


Design Improvements
- Multi-dimension
- Multi-coordinate spaces
- Overloading the zones
- Multiple hash functions
- Topologically sensitive construction
- Uniform partitioning
- Caching

Multi-Dimension
Increasing the number of dimensions reduces the path length.

Multi-Coordinate Spaces
- Maintain multiple coordinate spaces, with each node assigned a different zone in each of them
- Increases availability and reduces the path length

Overloading the Zones
- More than one peer is assigned to each zone
- Increases availability, reduces path length, and reduces per-hop latency

Uniform Partitioning
- Instead of directly splitting the occupant node's zone, compare the volume of its zone with those of its neighbors
- The zone that is split is the one with the largest volume
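A final sketch of this heuristic: on a join, instead of always splitting the zone the new node landed in, the occupant compares zone volumes with its neighbors and the largest zone is the one that gets split. The zone representation is the same assumed (lo, hi) form as above.

def volume(zone):
    lo, hi = zone
    v = 1.0
    for a, b in zip(lo, hi):
        v *= (b - a)
    return v

def zone_to_split(occupant_zone, neighbor_zones):
    # Uniform partitioning: split whichever candidate zone has the largest volume.
    return max([occupant_zone] + list(neighbor_zones), key=volume)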