Presentation transcript:

The Architecture of PIER: an Internet-Scale Query Processor
(PIER = Peer-to-peer Information Exchange and Retrieval)
Ryan Huebsch
Brent Chun, Joseph M. Hellerstein, Boon Thau Loo, Petros Maniatis, Timothy Roscoe, Scott Shenker, Ion Stoica, and Aydan R. Yumerefendi
p2p@db.cs.berkeley.edu
UC Berkeley and Intel Research Berkeley
CIDR 1/5/05
OS/Net/DB researchers. Talk geared for a DB audience, but all these viewpoints affected our research goals and system design.

Outline
- Application Space and Context
- Design Decisions
- Overview of Distributed Hash Tables
- Architecture
  - Native Simulation
  - Non-blocking Iterator Dataflow
  - Query Dissemination
  - Hierarchical Operators
- System Status
- Future Work

What is Very Large? Depends on Who You Are
- Single-site clusters; Distributed: 10's - 100's of nodes; Internet scale: 1000's - millions
- Database community challenge: how to run database-style queries at Internet scale?
- Network community challenge: can DB concepts influence the next Internet architecture?
With title / mention VLDB 03 paper. DB point of view = queries at scale; Nets point of view = using DB ideas for a new generation of Internet.

Application Space
- Key properties:
  - Data is naturally distributed
  - Centralized collection undesirable (legal, social, etc.)
  - Homogeneous schemas
  - Data is more useful when viewed as a whole
- This is the design space we have chosen to investigate: mostly systems/algorithms challenges
- As opposed to Enterprise Information Integration and the Semantic Web, which face data semantics & cleaning challenges
Homogeneous schemas = we'll come back to this

A Guiding Example: File Sharing
- Simple ubiquitous schemas: filenames, sizes, ID3 tags
- Early P2P file-sharing apps: Napster, Gnutella, KaZaA, etc.
- Simple, and not the greatest example: often used to violate copyright, and fairly trivial technology
- But it points to key social issues driving adoption of decentralized systems, and provides real workloads to validate more complex designs
Define filesharing apps = file stealing. Simple = Gnutella was built by small teams, used by thousands of non-expert college students.

Example 2: Network Traces
- Schemas are mostly standardized: IP, SMTP, HTTP, SNMP log formats, firewall log formats, etc.
- Network administrators are looking for patterns within their site AND with other sites:
  - DDoS attacks cross administrative boundaries
  - Tracking epidemiology of viruses/worms
- Timeliness is very helpful
Might surprise you just how useful this is. Networks do not work unless they are standardized. DDoS = attacks cross boundaries to evade detection. Tracking = the Blaster worm spread rapidly; security people would love to see realtime info to determine threat levels. Timeliness = centralization not an option. Last = the network on PlanetLab (distributed research test bed) is mostly filled with people monitoring the network status / all-pairs ping.

Demo app we built using PIER: shows the geographic locations of the sources and destinations of firewall log events.

Hard Systems Issues Here
- Scale
- Network churn
- Soft-state maintenance
- Timing/synchronization
- No central administration
- Debugging and software engineering
- Not to mention: optimization, security, semantics, etc.
Scale = query broadcast on a tree, not a bus; more generally, need to do things sublinearly using trees rather than linearly. Churn = data and computation nodes changing constantly; must adapt, particularly when executing continuous queries. Soft-state = in-network data must be periodically refreshed, which causes tradeoffs in processing. Timing = use example: data before query. No central admin = everything bound at runtime; must tolerate all exceptions. Debugging and SW engineering = unexpected interactions in a distributed system are difficult to isolate and fix; will come up again in this talk.

Context for this Talk
[Figure: layered stack: Declarative Queries are optimized into a Query Plan, executed by the Core Dataflow Engine ("boxes and arrows" dataflow; snapshot, continuous, and recursive queries), which runs over an Overlay Network on top of the Physical Network]
You start with a declarative query, which is optimized into a query plan. The query plan must then be executed; rather than running directly on the physical network, we use a layer of abstraction, an overlay network, to hide the complexities of the physical world. This talk is about how we execute a query plan over a given overlay network.

Initial Design Assumptions
- Database oriented:
  - Data independence, from disk to network
  - General-purpose dataflow engine
  - Focus on relational operators
- Network oriented:
  - "Best effort" P2P architecture
  - All nodes are "equal": no explicit hierarchy, no single owner
  - Overlay Network / Distributed Hash Tables (DHTs)
  - Highly scalable: per-operation overheads grow logarithmically, little routing state stored at each node
  - Resilient to network churn

Design Decisions
- Decouple storage: PIER is just the query engine, no storage
  - Query data that is in situ
  - Give up ACID guarantees
  - Why not design a p2p storage manager too? Not needed for many applications, and a hard problem in itself; leave it for others to solve (or not)
- Software engineering: distributed systems are complicated
  - Important design decision: "native simulation" (simulated network, native application code)
- Reuse of complex distributed logic
  - The overlay network provides this logic, with a narrow interface
  - Design challenge: get lots of functionality from this simple interface
Traditional DBs couple storage to help provide ACID => we have to give it up anyway => the data is already there; for example, in filesharing the files are already in a filesystem. Leave for others = examples are Farsite, OceanStore, Ivy, PAST, etc.; these systems are still being developed. Distributed systems are complicated = debugging is hard, there are few tools [[ WAR STORY ]]. Native simulation = we'll come back to this shortly.

Overview of Distributed Hash Tables (DHTs)
- The DHT interface is just a "hash table": Put(key, value), Get(key)
[Figure: a ring of nodes, each storing (key, value) pairs; put(K1,V1) is routed to the node responsible for K1, and a later get(K1) reaches that same node]
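To make the interface concrete, here is a minimal Java sketch of the two-call API the slide describes, plus an illustrative ownership rule for the identifier ring in the figure. The names (DHT, Ring.owner) are hypothetical, not PIER's actual classes.

import java.util.List;

// Hypothetical sketch of the slide's DHT interface: just put and get.
interface DHT<K, V> {
    void put(K key, V value);   // routed to the node that owns hash(key)
    List<V> get(K key);         // routed to that same node; returns the stored values
}

// Illustrative ownership rule: a key belongs to the first node whose
// id is >= hash(key), wrapping around the identifier ring at the top.
class Ring {
    static int owner(int keyHash, int[] sortedNodeIds) {
        for (int id : sortedNodeIds)
            if (id >= keyHash) return id;
        return sortedNodeIds[0];    // wrapped past the largest id
    }
}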

Integrating Network and Database Research
- Initial design goal was to use the DHT; it became a major piece of the architecture
- On that simple interface, we can build many DBMS components:
  - Query dissemination: broadcast (scan), content-based unicast (hash index), content-based multicast (range index)
  - Partitioned parallelism (Exchange, Map/Reduce)
  - Operator internal state
  - Hierarchical operators (aggregation and joins)
- Essentially, the DHT is a data independence mechanism for Nets
- Our DB viewpoint led us to reuse DHTs far more broadly
Last point: other DHT researchers tend to build single-function "systems" on top of DHTs (e.g. SIGCOMM 2004: systems on agg query, or multi-d range query).
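As one concrete example of building a DBMS component on put()/get(), here is a hedged sketch of partitioned parallelism (the Exchange pattern), reusing the DHT interface sketched above: each node rehashes its local tuples on the join attribute, so tuples with equal join keys meet at the node owning that key. The tuple representation, the "join42" namespace, and the attribute names are all illustrative assumptions, not PIER's actual operator code.

import java.util.List;
import java.util.Map;

// Hedged sketch: DHT put() acting as a distributed Exchange for a join.
class ExchangeSketch {
    record Tagged(String relation, Map<String, String> tuple) {}

    static void rehashForJoin(DHT<String, Tagged> dht,
                              List<Map<String, String>> localR,
                              List<Map<String, String>> localS) {
        for (Map<String, String> r : localR)
            dht.put("join42:" + r.get("a"), new Tagged("R", r)); // key = join value
        for (Map<String, String> s : localS)
            dht.put("join42:" + s.get("b"), new Tagged("S", s));
        // The node owning each "join42:<value>" key now holds every R- and
        // S-tagged tuple sharing that join value and can join them locally.
    }
}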

Outline
- Application Space and Context
- Design Decisions
- Overview of Distributed Hash Tables
- Architecture
  - Native Simulation
  - Non-blocking Iterator Dataflow
  - Query Dissemination
  - Hierarchical Operators
- System Status
- Future Work

Native Simulation
- Idea: simulated network, but the very same application code; no #ifdef SIMULATOR
- What's it good for:
  - Simulation: algorithmic logic bugs & scaling experiments
  - Native simulation: implementation errors, large-system issues
- Architecture:
  - PIER uses events, not threads. Nice for efficiency and asynchronous I/O; more critically, it fits naturally with a discrete-event network simulator
  - The Virtual Runtime Interface (VRI) consists only of: system clock, event scheduler, UDP/TCP network calls, local storage
  - At runtime, bind the VRI to either the simulator or the OS
#ifdef = a matter of discipline not to do special-case code -> ensures you debug the real code. Large-system issues = many components, interaction of components. Discrete-event = standard practice. VRI = layer of indirection; challenging to decouple what needs to be simulated, but we were able to make it very narrow, consisting only of the system clock, scheduler, network calls, and local storage.
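A hedged sketch of what the VRI's narrow surface might look like in Java; the method names are illustrative assumptions, and the two bindings named in the comment are stand-ins for the discrete-event simulator and the real OS.

// Illustrative VRI: only the four facilities the slide lists.
interface VirtualRuntime {
    long now();                                    // system clock
    void schedule(long delayMillis, Runnable ev);  // event scheduler
    void send(String nodeAddr, byte[] payload);    // UDP/TCP network calls
    void store(String key, byte[] value);          // local storage
}

// At startup the application binds the VRI once; all other code is
// identical under simulation and native execution ("native simulation"):
//   VirtualRuntime rt = simulate ? new SimulatorRuntime() : new OSRuntime();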

Architecture Overview
[Figure: the same application code runs over both bindings of the VRI: the physical runtime and the simulation]
Just showing off the same code. I don't have time to discuss the details below the VRI, but they are in our paper.

Non-Blocking Iterator
- Problem: the traditional iterator (pull) model is blocking
  - This didn't matter much in disk-based DBMSs
  - Many have looked at this problem; it turns out none of the literature fit naturally
  - Recall: event-driven, network-bound system
- Our solution: the non-blocking iterator
  - Always decouple the control flow from the data flow: pull for the control flow, push for the data flow
- A natural combination of DB and Net SW engineering, e.g. "iterators" meet "active messages"
- Simple function calls except at points of asynchrony
Disk-based DBMS -> with network I/O, which usually has comparatively long latency and is underutilized, the same or different I/O sources may be able to handle additional requests. Our solution = we designed a new solution, a non-blocking iterator. Iterator = comes from databases; "active messages" = comes from networking. Simple function call = which we illustrate on the next slide.

Non-Blocking Iterator (cont'd)
[Figure: a local index join plan in the PIER backend (Selection 1, Selection 2, Join R & S over Data R and Data S): control flows down the operator stack as probes (probe *, probe s=x) while data flows up from Data R and Data S through the join and selections to the Result]
More detail in talk about the function calls; point out what's going on, use laser.
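A hedged Java sketch of the pattern in the figure: probe() is pulled down the plan as pure control, while tuples are pushed up through push() callbacks, so an operator waiting on the network simply returns instead of blocking. Class and method names are illustrative assumptions, not PIER's.

import java.util.Map;
import java.util.function.Predicate;

// Control is pulled down via probe(); data is pushed up via push().
interface Sink { void push(Map<String, String> tuple); }

abstract class Op implements Sink {
    protected Sink parent;  // results are pushed upward to the parent
    protected Op child;     // probes are forwarded downward to the child
    void probe() { if (child != null) child.probe(); }  // "produce if you can"
}

class Selection extends Op {
    private final Predicate<Map<String, String>> pred;
    Selection(Predicate<Map<String, String>> pred) { this.pred = pred; }
    @Override public void push(Map<String, String> t) {
        if (pred.test(t)) parent.push(t);  // a simple function call, no blocking
    }
}

// A network access method would override probe() to issue an async request
// and return immediately; when data arrives it calls parent.push(tuple) --
// the point of asynchrony is the only place control and data flow decouple.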

Query Dissemination
- Problem: need to get the query to the right nodes. Which are they? How do we reach just them?
- Akin to DB "access methods" steering queries to disk blocks; traditional DB indexes are not well suited to Internet scale
- Networking view: content-based multicast, a topic of research in overlay networks. Note IP multicast is not content-based: it is a list of IP addresses
- Our solution: leverage the DHT
  - Queries are disseminated by "put()-ing" them
  - E.g., the DHT can route equality selections natively
  - For more complex queries (e.g. range selections), we add more machinery on top of DHTs
TRANSITION FROM PREVIOUS SLIDE = local dataflow implementation => distributed issues. We do not have time to go into how these techniques work.
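A hedged sketch of dissemination by put(), again over the DHT interface sketched earlier: a query with an equality predicate is itself stored under the key it selects, so only the owning node receives and runs it; anything else falls back to a broadcast along the overlay's links. QuerySpec and the key scheme are hypothetical names, not PIER's.

// Hypothetical query descriptor: non-null attr/value means an equality
// selection such as "WHERE attr = value".
record QuerySpec(String attr, String value, String queryText) {
    boolean hasEqualitySelection() { return attr != null && value != null; }
}

class Disseminator {
    static void disseminate(DHT<String, QuerySpec> dht, QuerySpec q) {
        if (q.hasEqualitySelection())
            dht.put("query:" + q.attr() + "=" + q.value(), q); // content-based unicast
        else
            broadcast(q);  // stub: flood along the DHT's multi-hop tree links
    }
    static void broadcast(QuerySpec q) { /* omitted */ }
}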

Hierarchical Operators
- We use DHTs as our basic routing infrastructure, a multi-hop network
- If all nodes route toward a single node, a tree is formed
- This provides a natural hierarchical distributed QP infrastructure
- Opportunities for optimization:
  - Hierarchical aggregation: combine data early in the path, spread in-bandwidth (fan-in)
  - Hierarchical joins: produce answers early, spread out-bandwidth
[Figure: nodes 1-15 on a DHT identifier ring; all routes converge toward a single node, forming a tree]

Hierarchical Aggregation
[Figure: an aggregation tree; each leaf sends a partial count of 1, an interior node combines three of them into a partial count of 3, and the root combines its children's partials into the total of 6]
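A hedged sketch of the combine step the figure shows for a COUNT: each node adds its children's partial counts to its own and forwards a single number up the tree, so fan-in is spread across the path instead of concentrated at the root. Names are illustrative.

import java.util.List;

class HierarchicalCount {
    // Leaves call this with no child partials and forward a 1; interior
    // nodes combine (e.g. 1+1+1 -> 3) and the root sums to the total (6).
    static long combine(long localCount, List<Long> childPartials) {
        long partial = localCount;
        for (long c : childPartials) partial += c;  // combine early, in-path
        return partial;  // forwarded one hop closer to the aggregation root
    }
}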

Hierarchical Joins
[Figure: R tuples (R1-R3) and S tuples (S1-S3) flow up the tree and are joined along the way; assuming a cross product, 3 R tuples and 3 S tuples = 9 results (A11-A33), with some answers produced early at interior nodes rather than all at the root]
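A hedged sketch of producing join answers early in the tree, reusing the Sink interface from the iterator sketch above: each interior node runs a symmetric hash join over the R and S tuples flowing through it, emitting matches (the A results in the figure) immediately instead of waiting for everything to reach the root. The join attribute and all names are illustrative assumptions.

import java.util.*;

class InPathJoin {
    private final Map<String, List<Map<String, String>>> seenR = new HashMap<>();
    private final Map<String, List<Map<String, String>>> seenS = new HashMap<>();

    void onR(Map<String, String> r, Sink out) {
        String k = r.get("a");
        seenR.computeIfAbsent(k, x -> new ArrayList<>()).add(r);
        for (Map<String, String> s : seenS.getOrDefault(k, List.of()))
            out.push(merge(r, s));  // early answer emitted at an interior node
    }

    void onS(Map<String, String> s, Sink out) { /* symmetric to onR */ }

    static Map<String, String> merge(Map<String, String> r, Map<String, String> s) {
        Map<String, String> t = new HashMap<>(r);
        t.putAll(s);
        return t;
    }
}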

PIER Status
- Running 24x7 on 400+ PlanetLab nodes (global test bed on 5 continents)
- Demo application for network security monitoring
- Gnutella proxy implementation [VLDB 04]
- Network route construction with recursive PIER queries [HotNets 04]
Gnutella = 50 PIER nodes proxying a 100,000-node network, 63k queries in one hour. Last bullet = an example of pulling database technology into the network fabric.

Future Work
- Continuing research:
  - Optimization: static optimization vs. distributed eddies; multi-query optimization
  - Security: result fidelity, resource management, accountability
  - Politics and privacy
Discussion of these topics is in the paper.