Querying The Internet With PIER Nitin Khandelwal.

Motivation
- Inject a degree of distribution into databases
- Internet-scale systems vs. hundred-node systems
- Large-scale applications requiring database functionality

Applications
- P2P databases: highly distributed and available data
- Network monitoring: intrusion detection, fingerprint queries

Design Principles
- Relaxed consistency: sacrifice consistency in the face of availability and partition tolerance
- Organic scaling: growth with deployment
- Natural habitats for data: data remains in its original format, with a DB interface on top
- Standard schemas: achieved through common software

DHTs
- Implemented with CAN (Content Addressable Network)
- Each node owns a hyper-rectangle (zone) in a d-dimensional space
- A key is hashed to a point; the item is stored at the node whose zone contains that point
- Each node maintains a routing table of its neighbors: O(d) state
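
The key-to-point mapping above can be illustrated with a minimal sketch; the hashing scheme, names, and zone layout below are hypothetical choices for illustration, not CAN's or PIER's actual code.

```python
import hashlib

D = 4  # dimensions of the CAN coordinate space (matches the workload's d = 4)

def key_to_point(key: str, d: int = D):
    """Hash a key to a point in the unit d-dimensional space [0, 1)^d."""
    point = []
    for i in range(d):
        h = hashlib.sha1(f"{i}:{key}".encode()).digest()
        # Use 8 bytes of the digest as a fraction in [0, 1).
        point.append(int.from_bytes(h[:8], "big") / 2**64)
    return tuple(point)

class Zone:
    """A node's hyper-rectangle: one [lo, hi) interval per dimension."""
    def __init__(self, bounds):
        self.bounds = bounds

    def contains(self, point):
        return all(lo <= x < hi for (lo, hi), x in zip(self.bounds, point))

# Example: a node owning the "lower half" of the first dimension.
node_zone = Zone([(0.0, 0.5)] + [(0.0, 1.0)] * (D - 1))
p = key_to_point("netflow:flow-42")
print(p, node_zone.contains(p))
```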

DHT Design
- Routing layer: maps keys to nodes; the mapping is dynamic as nodes join and leave
- Storage manager: stores the DHT-based data on the local node
- Provider: storage-access interface for the higher levels

Provider
- Couples the routing and storage layers
- namespace: corresponds to a relation
- resourceId: corresponds to a primary key
- namespace + resourceId hash to the DHT key
- instanceId: distinguishes objects with the same namespace and resourceId
- lifetime: how long the item is stored
- Interface calls: LScan, Multicast, NewData
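
A rough sketch of how the provider's addressing and soft-state storage might look; all names (dht_key, LocalStore, Item) are invented for illustration and simplify PIER's real interface.

```python
import hashlib
import time
from dataclasses import dataclass

def dht_key(namespace: str, resource_id: str) -> str:
    """Addressing as on the slide: namespace + resourceId -> DHT key."""
    return hashlib.sha1(f"{namespace}/{resource_id}".encode()).hexdigest()

@dataclass
class Item:
    namespace: str           # roughly a relation name
    resource_id: str         # roughly a primary key
    instance_id: int         # distinguishes items sharing namespace + resourceId
    payload: dict
    expires_at: float = 0.0  # soft state: set from the lifetime when published

class LocalStore:
    """Toy stand-in for one node's storage manager."""
    def __init__(self):
        self.items = {}  # DHT key -> list of Item

    def put(self, item: Item, lifetime_s: float):
        item.expires_at = time.time() + lifetime_s
        self.items.setdefault(dht_key(item.namespace, item.resource_id), []).append(item)

    def lscan(self, namespace: str):
        """Yield locally stored, unexpired items of one namespace (the slide's LScan)."""
        now = time.time()
        for bucket in self.items.values():
            for it in bucket:
                if it.namespace == namespace and it.expires_at > now:
                    yield it

store = LocalStore()
store.put(Item("netflow", "flow-42", instance_id=1, payload={"bytes": 1200}), lifetime_s=300)
print([it.resource_id for it in store.lscan("netflow")])
```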

PIER Query Processor
- Operators: selection, projection, join, grouping, aggregation
- Operators both push and pull data
- Relaxed consistency and the reachable snapshot:
  - ideally, a query would cover the nodes reachable at query-issue time
  - instead, each node contributes the data it holds when the query multicast message arrives
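
As a loose illustration of operators that both push and pull data, here is a sketch in which the network pushes arriving tuples into an operator's inbox while the parent operator pulls its output; the class and method names are hypothetical, not PIER's API.

```python
from queue import Empty, Queue

class Selection:
    """Pull-based operator fed by pushed data.

    The DHT layer pushes newly arrived tuples into the operator's inbox
    (push), while the parent operator iterates over its output (pull).
    """
    def __init__(self, predicate):
        self.predicate = predicate
        self.inbox = Queue()

    def on_new_data(self, tup):
        # Push side: called when a tuple arrives from the network.
        self.inbox.put(tup)

    def pull(self, timeout=0.05):
        # Pull side: yield matching tuples until the inbox stays empty.
        while True:
            try:
                tup = self.inbox.get(timeout=timeout)
            except Empty:
                return
            if self.predicate(tup):
                yield tup

sel = Selection(lambda t: t["bytes"] > 1000)
sel.on_new_data({"bytes": 500})
sel.on_new_data({"bytes": 2000})
print(list(sel.pull()))  # -> [{'bytes': 2000}]
```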

Join Algorithms
- R, S: the relations to join; Nr, Ns: their namespaces; Nq: a DHT-based temporary table (namespace)
- Symmetric hash join:
  - rehashes both relations
  - each node scans its local fragments and copies the tuples into the new namespace Nq
- Fetch matches:
  - usable when one relation (S) is already hashed on the join attribute
  - selections on non-join attributes of S cannot be pushed into the DHT
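
A single-node sketch of the symmetric hash join logic, with plain in-memory dictionaries standing in for the temporary DHT namespace Nq; names and tuple shapes are illustrative only.

```python
from collections import defaultdict

def symmetric_hash_join(r_tuples, s_tuples, r_key, s_key):
    """Two hash tables stand in for the temporary namespace Nq
    that both relations are rehashed into."""
    r_table, s_table = defaultdict(list), defaultdict(list)
    results = []

    def insert(tup, key_fn, own_table, other_table, from_r):
        k = key_fn(tup)
        own_table[k].append(tup)              # "publish" into Nq
        for match in other_table.get(k, []):  # probe the other side
            results.append((tup, match) if from_r else (match, tup))

    # Tuples may arrive from either relation in any order.
    for tup in r_tuples:
        insert(tup, r_key, r_table, s_table, from_r=True)
    for tup in s_tuples:
        insert(tup, s_key, s_table, r_table, from_r=False)
    return results

R = [{"id": 1, "num1": 7}, {"id": 2, "num1": 9}]
S = [{"key": 7, "val": "a"}, {"key": 8, "val": "b"}]
print(symmetric_hash_join(R, S, lambda t: t["num1"], lambda t: t["key"]))
# -> [({'id': 1, 'num1': 7}, {'key': 7, 'val': 'a'})]
```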

Join Rewriting
- Aimed at lowering bandwidth utilization
- Symmetric semi-join:
  - locally project both relations to resourceId + join keys
  - run a symmetric hash join on the two projections
  - fetch the full tuples with fetch-matches joins on the resourceIds of R and S
- Bloom joins (hashed semi-join):
  - a Bloom filter is a hash-based bit vector
  - each node publishes its local Bloom filters into temporary namespaces
  - the filters are OR-ed together and multicast to the opposite relation's nodes, so only tuples that may match are rehashed
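
A toy Bloom filter showing the publish/OR/multicast idea from the slide; the sizes, hash scheme, and names are arbitrary choices for illustration.

```python
import hashlib

class BloomFilter:
    """Tiny Bloom filter sketch (parameters chosen arbitrarily)."""
    def __init__(self, n_bits=1024, n_hashes=3):
        self.n_bits, self.n_hashes = n_bits, n_hashes
        self.bits = bytearray(n_bits // 8)

    def _positions(self, value):
        for i in range(self.n_hashes):
            h = hashlib.sha1(f"{i}:{value}".encode()).digest()
            yield int.from_bytes(h[:8], "big") % self.n_bits

    def add(self, value):
        for pos in self._positions(value):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, value):
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(value))

    def union(self, other):
        """OR-ing filters, as when per-node filters are combined before multicast."""
        out = BloomFilter(self.n_bits, self.n_hashes)
        out.bits = bytearray(a | b for a, b in zip(self.bits, other.bits))
        return out

# Two nodes build filters over their local join keys of R...
f1, f2 = BloomFilter(), BloomFilter()
for k in (7, 9):
    f1.add(k)
f2.add(11)
combined = f1.union(f2)
# ...and S-side nodes rehash only tuples whose join key may be present.
print([k for k in (7, 8, 11) if combined.might_contain(k)])  # -> [7, 11]
```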

Workload Parameters
- CAN configuration: d = 4
- R is 10 times larger than S
- Constants provide 50% selectivity
- f(x, y) is evaluated after the join
- 90% of R tuples match a tuple in S
- Result tuples are 1 KB each
- Symmetric hash join used

Simulation Setup
- Up to 10,000 nodes
- Network cross-traffic, CPU, and memory utilization ignored
- Data shipped from source to computation node for every query operation
- Two topologies: (1) fully connected links with … ms latency and 10 Mbps bandwidth; (2) GT-ITM transit-stub topology (similar results)

Join Algorithms: Infinite Bandwidth
- Infinite bandwidth isolates the impact of propagation delay alone
- 1024 data and computation nodes
- Core join algorithms perform faster
- Rewrites add work: the Bloom filter needs two multicasts; the semi-join needs two CAN lookups

Join Algorithms 2: Limited Bandwidth
- Symmetric hash join: rehashes both tables
- Semi-joins: transfer only matching tuples
- At 40% selectivity, the bottleneck switches from the computation nodes to the query sites

Conclusions
- PIER's scalability derives from its relaxed design principles:
  - adoption of soft state
  - dilated-snapshot semantics
- Limitation: only equality predicates
- Directions:
  - pushing selections down into the DHT
  - caching and replication of DHT data
  - a catalog manager (stringent consistency and availability requirements)

Sophia: An Information Plane Nitin Khandelwal

Shared Information Plane
- A distributed system running throughout the network
- Collects information about network elements: local state (load, memory usage) and local perspective (reachability of other nodes)
- Evaluates statements (questions) about that state
- Reacts according to the conclusions, e.g. killing a misbehaving service

Challenges
- Information is widely distributed and dynamic
- Statements are formulated at run time, not a priori
- Centralized analysis is not practical
- Push the analysis to the nodes (into the network)

Approach
- Use a logic-programming model
  - the system is dynamic and distributed, so the logic is temporal and positional
- Why?
  - Expressivity: intuitive to make statements about the state of the system
  - Performance: logic expressions can be transformed for efficient evaluation, and partial results can be cached

Time and Position in the Language
- Every term in the system has an environment containing a time and a location
- Example (the constant after Time > is missing in the transcript):
  Eval(bandwidth(env(at(node(Node)), time(Time), Time > …), BwVar), BwVar > 40000)
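
One way to read the example is that evaluation is restricted by the environment's node and time bounds before the value predicate is applied; the following sketch mimics that reading with invented data and names, not Sophia's actual evaluator.

```python
import time

# Invented readings: (node, metric) -> list of (timestamp, value).
READINGS = {
    ("planetlab1", "bandwidth"): [(time.time() - 30, 52000), (time.time() - 5, 38000)],
}

def eval_in_env(metric, node, after_ts):
    """Evaluate a term only over readings taken at `node` after `after_ts`,
    loosely mirroring env(at(node(...)), time(...), Time > ...)."""
    return [(ts, val) for ts, val in READINGS.get((node, metric), []) if ts > after_ts]

# bandwidth(...) with BwVar > 40000, restricted to recent readings on one node.
recent = eval_in_env("bandwidth", "planetlab1", time.time() - 60)
print([val for _, val in recent if val > 40000])  # -> [52000]
```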

Performance
- Aggressive caching:
  - evaluation results are cached
  - sometimes latency is more important than freshness
  - the time environment is used to control freshness
- Scheduling:
  - results are pre-scheduled so they are available when and where they may be needed
  - the cache can be refreshed with fresh values
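
A minimal sketch of trading freshness for latency, where a max-age bound plays the role of the time environment; the class and parameter names are hypothetical.

```python
import time

class FreshnessCache:
    """Cache of evaluation results with a caller-specified freshness bound."""
    def __init__(self):
        self._entries = {}  # key -> (timestamp, value)

    def get(self, key, compute, max_age_s):
        now = time.time()
        hit = self._entries.get(key)
        if hit and now - hit[0] <= max_age_s:
            return hit[1]                  # cached result is fresh enough
        value = compute()                  # otherwise (re-)evaluate
        self._entries[key] = (now, value)  # refresh the cache
        return value

cache = FreshnessCache()
# Accept results up to 30 s old rather than re-evaluating on every call.
load = cache.get(("node7", "load"), lambda: 0.42, max_age_s=30)
print(load)
```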

Evaluation Planning
- Given an expression, plan:
  - where to evaluate (close to the data)
  - when to evaluate (once its dependencies are resolved)
  - what to evaluate
- Logic expressions can be transformed at run time
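
A toy illustration of the "where: close to the data" step, choosing the node that holds most of an expression's inputs; this is only a guess at the idea with invented names, not Sophia's planner.

```python
from collections import Counter

def plan_where(expr_terms, term_locations):
    """Pick the evaluation site holding most of the expression's inputs."""
    votes = Counter()
    for term in expr_terms:
        for node in term_locations.get(term, []):
            votes[node] += 1
    return votes.most_common(1)[0][0] if votes else None

# bandwidth and loss readings live mostly on planetlab3, so evaluate there.
print(plan_where(["bandwidth", "loss"],
                 {"bandwidth": ["planetlab3"], "loss": ["planetlab3", "planetlab7"]}))
```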

Extensibility
- Users can add new functionality at run time
- Capabilities protect modules and let privileges be granted and revoked:
  cap569354(Val) :- read sensor.
  cap435456(Val) :- cap569354(Val).
  bandwidth(Val) :- cap435456(Val).
- Module protection: all predicates are transformed into capabilities, shared through a master-key capability
- Danger in caching: different interfaces
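
A sketch of the capability-indirection idea, where public predicate names resolve through revocable, hard-to-guess tokens; the registry below is an invented analogue, not Sophia's mechanism.

```python
import secrets

class ModuleRegistry:
    """Public names resolve to unguessable capability tokens,
    which can be granted, shared, and revoked individually."""
    def __init__(self):
        self._impls = {}    # capability token -> implementation
        self._exports = {}  # public name -> capability token

    def register(self, impl):
        token = f"cap{secrets.randbelow(10**6):06d}"
        self._impls[token] = impl
        return token                       # hand the token only to trusted callers

    def export(self, public_name, token):
        self._exports[public_name] = token

    def revoke(self, token):
        self._impls.pop(token, None)       # existing exports now fail to resolve

    def call(self, public_name, *args):
        token = self._exports[public_name]
        return self._impls[token](*args)   # KeyError if the capability was revoked

reg = ModuleRegistry()
tok = reg.register(lambda: 48000)          # e.g. a bandwidth reading
reg.export("bandwidth", tok)
print(reg.call("bandwidth"))
```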

PIER and Sophia
- Sophia: the location of code execution is explicit in the language and can itself be computed in the course of evaluation
- PIER: the details of query execution are left to the underlying implementation to optimize
- Consequence: Sophia queries are more sophisticated; both the user and the system participate in evaluation planning