Building diagnosable distributed systems. Petros Maniatis, Intel Research Berkeley. ICSI – Security Crystal Ball

…and how that’s going to solve all of our (security) problems. Petros Maniatis, Intel Research Berkeley. ICSI – Security Crystal Ball

Why distributed systems are hard
In every application, organizing distributed resources has unique needs: low latency, high bandwidth, high reliability, tolerance to churn, anonymity, long-term preservation, undeniable transactions, …
So, for every application one must figure out the right properties, get the algorithms/protocols, implement them, tune them, test them, debug them, and repeat.

Why distributed systems are hard (among other things)
In every application, organizing distributed resources has unique needs: low latency, high bandwidth, high reliability, tolerance to churn, anonymity, long-term preservation, undeniable transactions, …
So, for every application one must:
Figure out the right properties → No global view
Get the algorithms/protocols → Bad algorithm choice
Implement them → Incorrect implementation
Tune them → Psychotic timeouts
Test them → Partial failures
Debug them → Biased introspection
Repeat → Homicidal boredom

(Security) Problems in Distributed Systems
Local problems contributing to the general mess:
Stepping stones
Statistics fudging – generate traffic that reduces the discrimination value of traffic analysis
Information hiding – decorrelate links in multi-hop causal analyses
Distributed problems contributing to the mess:
Graph properties inconsistent with detection assumptions – statistical tools assume trees, but the graph is an expander

Diagnostic vs. Diagnosable Systems
Diagnostic is good:
Complex query processing
Probes into the distributed system export data to be queried
Nice properties – privacy, efficiency, timeliness, consistency, accuracy, verifiability
Diagnosable is even better:
Information flow can be observed
Control flow can be observed
Observations can be related to the system specification – no need for “translation”
If you are really lucky, you can get both.

Strawman Design: P2 (Intel, UCB, Rice, MPI)
Intended for user-level distributed system design, implementation, and monitoring.
Specify high-level system properties:
Abstract topology
Transport properties
Currently specified in a variant of Datalog:
Looks like event-condition-action rules
Location specifiers place conditions in space
Rendezvous happens under the covers
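As a minimal sketch of what such a rule pair might look like (the predicates ping, pong, and neighbor are illustrative, not part of P2’s built-in vocabulary, and the exact syntax may differ): every 10 seconds each node pings its neighbors, and each neighbor answers. The location specifiers (@X, @Y) place each condition and each derived event at a particular node; the runtime performs the rendezvous.

  ping@Y(Y, X, E) :- periodic@X(X, E, 10), neighbor@X(X, Y).
  pong@X(X, Y, E) :- ping@Y(Y, X, E).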

Strawman Design: P2
Intended for user-level distributed system design, implementation, and monitoring.
Specify high-level system properties:
Abstract topology
Transport properties
Currently specified in a variant of Datalog:
Looks like event-condition-action rules
Location specifiers place conditions in space
Rendezvous happens under the covers

  response@ReqAddr(ReqAddr, Key, SAddr1) :-
    lookup@NAddr(NAddr, ReqAddr, Key),
    node@NAddr(NAddr, NodeID),
    succ@NAddr(NAddr, Succ0, SAddr),
    succ@SAddr(SAddr, Succ, SAddr1),
    Key in (NodeID, Succ].

In English: return a response to the requester ReqAddr when node NAddr receives a lookup for key Key, and that key lies between NAddr’s identifier and that of its successor’s successor.
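A purely illustrative companion rule, hedged as a sketch (P2’s published Chord routes through finger tables and a best-lookup-distance computation, omitted here, and the negated range test below is written informally): if the key does not fall in the range this node answers for, forward the lookup onward and let the next node repeat the same test.

  lookup@SAddr1(SAddr1, ReqAddr, Key) :-
    lookup@NAddr(NAddr, ReqAddr, Key),
    node@NAddr(NAddr, NodeID),
    succ@NAddr(NAddr, Succ0, SAddr),
    succ@SAddr(SAddr, Succ, SAddr1),
    not (Key in (NodeID, Succ]).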

Strawman Design: P2 – Specification to Implementation
Automatically compile the specification into a dataflow graph:
Elements are little-functionality operators – filters, multiplexers, loads, stores
Elements are interlinked with asynchronous data flows – the data unit is a typed, flat tuple
Reversible optimizations happen below.

Strawman Design: P2 – Diagnosis
Flows are observable: treat them as any other data stream; store them, query them, …
Element state transitions are observable: similarly, store them, query them, export them, …
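For instance, a hedged sketch of querying a flow (the lookupLog table, the 10-second period, and the aggregate syntax are assumptions for illustration): log every lookup tuple that crosses a node, and periodically compute a per-node lookup count that can itself be stored and queried like any other table.

  materialize(lookupLog, infinity, infinity, keys(1,2,3)).
  lookupLog@NAddr(NAddr, Key, ReqAddr) :- lookup@NAddr(NAddr, ReqAddr, Key).
  lookupRate@NAddr(NAddr, count<Key>) :- periodic@NAddr(NAddr, E, 10), lookupLog@NAddr(NAddr, Key, ReqAddr).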

Strawman Design: P2 – Execution Tracing
Execution is observable: whenever a rule creates an output, I can log the inputs that caused that output.
This generates a causal stream at the granularity of the user’s specification steps.
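One way to picture this, as a hedged sketch only (the ruleTrace table and the rule label "R1" are illustrative assumptions; the slide implies P2 can produce such logging automatically, whereas here it is spelled out by hand): a shadow rule fires on exactly the same inputs as the lookup-response rule above and records which input tuples produced which output.

  materialize(ruleTrace, infinity, infinity, keys(1,2,3,4)).
  ruleTrace@NAddr(NAddr, "R1", Key, ReqAddr, SAddr1) :-
    lookup@NAddr(NAddr, ReqAddr, Key),
    node@NAddr(NAddr, NodeID),
    succ@NAddr(NAddr, Succ0, SAddr),
    succ@SAddr(SAddr, Succ, SAddr1),
    Key in (NodeID, Succ].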

What we know we can do (e.g., in overlays)
Monitoring invariants: a distributed watchpoint (sketched below)
“A node’s in-degree is greater than k”
“Routing is inconsistent”
“A node oscillates into and out of a neighborhood’s routing tables”
Execution tracing at “pseudo-code” granularity: logical stepping
Obtain the causality graph of a failed lookup (what steps led to the failure)
Find failed lookups with state oscillations in their history (filter by failure cause)
Install additional sensors where state oscillations usually cause lookup failure
Mine the causality graph on-line
Complex programmatic system views
Run a consistent snapshot on the state of the system
Query the consistent snapshot for stable properties
Take system checkpoints
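A hedged sketch of the first watchpoint (the neighbor, inEdge, and monitor tables and the threshold 16 are illustrative assumptions): each node advertises its outgoing edges to their targets, each node counts how many others point at it, and an alarm tuple is shipped to a monitor when that in-degree exceeds k.

  materialize(inEdge, infinity, infinity, keys(1,2)).
  inEdge@M(M, N) :- neighbor@N(N, M).
  inDegree@M(M, count<N>) :- inEdge@M(M, N).
  alarm@Monitor(Monitor, M, D) :- inDegree@M(M, D), monitor@M(M, Monitor), D > 16.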

Opportunities

What we’d love to be able to do but don’t quite know exactly how yet
Small set of basic components: relational operators, flow operators, network operators
Should be able to formally validate them
Then compose them to statically validate the entire graph (termination, information flow, etc.)
Verifiable dataflow graph execution:
Attest all dataflow elements in isolation
Attest the resulting dataflow graph
Undeniable causality graphs:
Tamper-evident logs per dataflow element
(Automatically) composable proofs of compliant execution of the dataflow graph
These are smaller combinatorial problems: the Chord overlay is ~10,000 individual lines of C++ code; compare that to our P2Chord: ~300 elements, ~12 reusable types, plus the engine.

What we’d love to be able to do but don’t quite know exactly how yet
What might be a useful checked invariant? A known protocol weakness we don’t know how to fix:
Design a distributed query checking for it (see the sketch below)
When it triggers: abort, work more slowly, notify authorities, give a random answer, …
Graph rewrites that preserve specific properties:
Information declassification (see Sabelfeld et al.)
Routing path decorrelation
Who cares about verifiable, user-level distributed applications? Monitoring within disparate organizations.
Who cares about undeniable, user-level distributed applications? The same, when liability is involved (e.g., anti-framing).
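As a hedged illustration of such a checked invariant (the lookupResult log, the Monitor node, and the comparison syntax are assumptions): collect lookup results at a monitor and flag inconsistent routing whenever two results for the same key name different owners.

  materialize(lookupResult, infinity, infinity, keys(1,2,3)).
  lookupResult@Monitor(Monitor, Key, SAddr1) :- response@ReqAddr(ReqAddr, Key, SAddr1), monitor@ReqAddr(ReqAddr, Monitor).
  inconsistent@Monitor(Monitor, Key, Owner1, Owner2) :-
    lookupResult@Monitor(Monitor, Key, Owner1),
    lookupResult@Monitor(Monitor, Key, Owner2),
    Owner1 != Owner2.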

What do we have here?
A constrained programming/execution environment:
Typed information flow
Very modular design – a small set of functional modules can build a large set of combinations
Only reversible optimizations allowed
But we’re nowhere near solving the malicious-code-understanding problem.
Can we afford to take the performance hit?
Fairly coarse granularity – can you write a general-purpose distributed FS on this? Hmm…
Thesis: defining a usable (if not complete) programming paradigm that allows “effortless” due diligence might improve the situation.

Thank You!