Disorderly Distributed Programming with Bloom

Slides:

Advertisements

Similar presentations

Declarative Networking: Language, Execution and Optimization Boon Thau Loo 1, Tyson Condie 1, Minos Garofalakis 2, David E. Gay 2, Joseph M. Hellerstein.

Advertisements

Recap: Mining association rules from large datasets

Edelweiss: Automatic Storage Reclamation for Distributed Programming Neil Conway Peter Alvaro Emily Andrews Joseph M. Hellerstein University of California,

CSCI 115 Chapter 6 Order Relations and Structures.

16.4 Estimating the Cost of Operations Project GuidePrepared By Dr. T. Y. LinVinayan Verenkar Computer Science Dept San Jose State University.

Implementing Declarative Overlays From two talks by: Boon Thau Loo 1 Tyson Condie 1, Joseph M. Hellerstein 1,2, Petros Maniatis 2, Timothy Roscoe 2, Ion.

BloomUnit Declarative testing for distributed programs Peter Alvaro UC Berkeley.

Cloud Programming: From Doom and Gloom to BOOM and Bloom Neil Conway UC Berkeley Joint work with Peter Alvaro, Ras Bodik, Tyson Condie, Joseph M. Hellerstein,

1 Combinational Logic Design&Analysis. 2 Introduction We have learned all the prerequisite material: – Truth tables and Boolean expressions describe functions.

Title: Intelligent Agents A uthor: Michael Woolridge Chapter 1 of Multiagent Systems by Weiss Speakers: Tibor Moldovan and Shabbir Syed CSCE976, April.

P ROGRAMMING L ANGUAGES : C ONTROL 1. S LIDES R EFERENCES Kenneth C. Louden, Control I: Expressions and Statements: Principles and Practice. 2.

Declarative Distributed Programming with Dedalus and Bloom Peter Alvaro, Neil Conway UC Berkeley.

Logic and Lattices for Distributed Programming Neil Conway, William R. Marczak, Peter Alvaro, Joseph M. Hellerstein UC Berkeley David Maier Portland State.

Logic and Lattices for Distributed Programming Neil Conway UC Berkeley Joint work with: Peter Alvaro, Peter Bailis, David Maier, Bill Marczak, Joe Hellerstein,

Distributed Programming and Consistency: Principles and Practice Peter Alvaro Neil Conway Joseph M. Hellerstein UC Berkeley.

Distributed databases

1 Relational Algebra & Calculus. 2 Relational Query Languages  Query languages: Allow manipulation and retrieval of data from a database.  Relational.

Query Evaluation. An SQL query and its RA equiv. Employees (sin INT, ename VARCHAR(20), rating INT, age REAL) Maintenances (sin INT, planeId INT, day.

Distributed Systems Spring 2009

PTIDES: Programming Temporally Integrated Distributed Embedded Systems Yang Zhao, EECS, UC Berkeley Edward A. Lee, EECS, UC Berkeley Jie Liu, Microsoft.

7th Biennial Ptolemy Miniconference Berkeley, CA February 13, 2007 Causality Interfaces for Actor Networks Ye Zhou and Edward A. Lee University of California,

Discrete Mathematics Lecture 4 Harper Langston New York University.

Programming Language Semantics Mooly SagivEran Yahav Schrirber 317Open space html://

Technische Universität München Institut für Informatik D München, Germany Realizability of System Interface Specifications Manfred Broy.

A Denotational Semantics For Dataflow with Firing Edward A. Lee Jike Chong Wei Zheng Paper Discussion for.

Data Flow Analysis Compiler Design Nov. 8, 2005.

Copyright © 2011 Pearson Education, Inc. Publishing as Pearson Addison-Wesley Chapter 6 The Relational Algebra and Relational Calculus.

1 Ivan Lanese Computer Science Department University of Bologna Italy Concurrent and located synchronizations in π-calculus.

Orderings and Bounds Parallel FSM Decomposition Prof. K. J. Hintz Department of Electrical and Computer Engineering Lecture 10 Update and modified by Marek.

Cs3431 Relational Algebra : #I Based on Chapter 2.4 & 5.1.

Chapter 14: Object-Oriented Data Modeling

1 Relational Algebra and Calculus Chapter 4. 2 Relational Query Languages  Query languages: Allow manipulation and retrieval of data from a database.

Chapter 6: Objections to the Physical Symbol System Hypothesis.

Solving fixpoint equations

Disorderly programming for a distributed world Peter Alvaro UC Berkeley.

Database Management 9. course. Execution of queries.

Peter van Emde Boas: Games and Computer Science 1999 GAMES AND COMPUTER SCIENCE Theoretical Models 1999 Peter van Emde Boas References available at:

Boolean Algebra and Computer Logic Mathematical Structures for Computer Science Chapter 7.1 – 7.2 Copyright © 2006 W.H. Freeman & Co.MSCS Slides Boolean.

CSE314 Database Systems The Relational Algebra and Relational Calculus Doç. Dr. Mehmet Göktürk src: Elmasri & Navanthe 6E Pearson Ed Slide Set.

1 Relational Algebra. 2 Relational Query Languages v Query languages: Allow manipulation and retrieval of data from a database. v Relational model supports.

Cloud Programming: From Doom and Gloom to BOOM and Bloom Peter Alvaro, Neil Conway Faculty Recs: Joseph M. Hellerstein, Rastislav Bodik Collaborators:

1 Relational Algebra & Calculus Chapter 4, Part A (Relational Algebra)

1 Relational Algebra and Calculas Chapter 4, Part A.

Relational Algebra.

MQTT QoS2 Considerations Konstantin Dotchkoff. Challenges associated with implementing QoS 2 in large scale distributed systems Replication of QoS 2 messages.

Chapter 4 Wenbing Zhao Department of Electrical and Computer Engineering Cleveland State University Building Dependable Distributed Systems.

Lecture 7: Foundations of Query Languages Tuesday, January 23, 2001.

A Lattice Model of Secure Information Flow By Dorothy E. Denning Presented by Drayton Benner March 22, 2000.

Blazes: coordination analysis for distributed program Peter Alvaro, Neil Conway, Joseph M. Hellerstein David Maier UC Berkeley Portland State.

CS 240A: Databases and Knowledge Bases Analysis of Active Databases Carlo Zaniolo Department of Computer Science University of California, Los Angeles.

Replication predicates for dependent-failure algorithms Flavio Junqueira and Keith Marzullo University of California, San Diego Euro-Par Conference, Lisbon,

LECTURE 4 Logic Design. LOGIC DESIGN We already know that the language of the machine is binary – that is, sequences of 1’s and 0’s. But why is this?

1 Plaxton Routing. 2 History Greg Plaxton, Rajmohan Rajaraman, Andrea Richa. Accessing nearby copies of replicated objects, SPAA 1997 Used in several.

Network Management Lecture 13. MACHINE LEARNING TECHNIQUES 2 Dr. Atiq Ahmed Université de Balouchistan.

Continuous Monitoring of Distributed Data Streams over a Time-based Sliding Window MADALGO – Center for Massive Data Algorithmics, a Center of the Danish.

Bloom: Big Systems, Small Programs Neil Conway UC Berkeley.

Consistency Analysis in Bloom: a CALM and Collected Approach Authors: Peter Alvaro, Neil Conway, Joseph M. Hellerstein, William R. Marczak Presented by:

Lec 3: Object-Oriented Data Modeling

Distributed Systems, Consensus and Replicated State Machines

Relational Algebra : #I

Concurrent Models of Computation

EEC 688/788 Secure and Dependable Computing

EEC 688/788 Secure and Dependable Computing

Lecture 20: Dataflow Analysis Frameworks 11 Mar 02

Background material.

EEC 688/788 Secure and Dependable Computing

Background material.

A task of induction to find patterns

Index Structures Chapter 13 of GUW September 16, 2019

Presentation transcript:

Disorderly Distributed Programming with Bloom Emily Andrews, Peter Alvaro, Peter Bailis, Neil Conway, Joseph M. Hellerstein, William R. Marczak UC Berkeley David Maier Portland State University

Conventional Programming

Distributed Programming

Different nodes might perceive different event orders Problem: Different nodes might perceive different event orders

Taking Order For Granted Data (Ordered) array of bytes Compute (Ordered) sequence of instructions Writing order-sensitive programs is too easy!

Enforce consistent event order at all nodes Alternative #1: Enforce consistent event order at all nodes Extensive literature: Replicated state machines Consensus, coordination Group communication “Strong Consistency”

Alternative #1: Enforce consistent event order at all nodes Problems: Latency Availability

Alternative #2: Analyze all event orders, ensure correct behavior

Alternative #2: Analyze all event orders to ensure correct behavior Problem: That is really, really hard

Alternative #3: Write order-independent (“disorderly”) programs

Alternative #3: Write order-independent (“disorderly”) programs Questions: How to write such programs? What can we express in a disorderly way?

Disorderly Programming Program analysis: CALM Where is order needed? And why? Language design: Bloom Order-independent by default Ordering is possible but explicit Mixing order and disorder: Blazes Order synthesis and optimization Algebraic programming

CALM: Consistency As Logical Monotonicity

History Roots in UCB database research (~2005) High-level, declarative languages for network protocols & distributed systems “Small programs for large clusters” (BOOM) Distributed programming with logic State: sets (relations) Computation: deductive rules over sets SQL, Datalog, etc.

Observation: Much of Datalog is order-independent.

As input set grows, output set does not shrink Monotonic Logic As input set grows, output set does not shrink “Mistake-free” Order independent e.g., map, filter, join, union, intersection Non-Monotonic Logic New inputs might invalidate previous outputs Order sensitive e.g., aggregation, negation

Agents learn strictly more knowledge over time Different learning order, same final outcome Deterministic outcome, despite network non-determinism (“Confluent”)

Confluent Programming

Consistency As Logical Monotonicity CALM Analysis (CIDR’11) Monotone programs are deterministic Simple syntactic test for monotonicity Result: Whole-program static analysis for eventual consistency Big deal!! Contrast with CRDT guarantees.

CALM: Beyond Sets Monotonic logic: growing sets Partial order: set containment Expressive but sometimes awkward Timestamps, version numbers, threshold tests, directories, sequences, …

Challenge: Extend monotone logic to support other flavors of “growth over time”

hS,t,?i is a bounded join semilattice iff: S is a set t is a binary operator (“least upper bound”) Induces a partial order on S: x ·S y if x t y = y Associative, Commutative, and Idempotent “ACID 2.0” Informally, LUB is “merge function” for S ? is the “least” element in S 8x 2 S: ? t x = x

Increasing Int (t = Max) Time Set (t = Union) Increasing Int (t = Max) Boolean (t = Or)

f : ST is a monotone function iff: 8a,b 2 S : a ·S b ) f(a) ·T f(b)

Increasing Int (t = Max) Time Monotone function: set  increase-int Monotone function: increase-int  boolean size() >= 3 Set (t = Union) Increasing Int (t = Max) Boolean (t = Or)

The Bloom Programming Language

Bloom Basics Communication Message passing between agents State Lattices Computation Functions over lattices “Disorderly” Computation Monotone functions

Bloom Operational Model

Quorum Vote Annotated Ruby class Program state Program logic QUORUM_SIZE = 5 RESULT_ADDR = "example.org" class QuorumVote include Bud state do channel :vote_chn, [:@addr, :voter_id] channel :result_chn, [:@addr] lset :votes lmax :vote_cnt lbool :got_quorum end bloom do votes <= vote_chn {|v| v.voter_id} vote_cnt <= votes.size got_quorum <= vote_cnt.gt_eq(QUORUM_SIZE) result_chn <~ got_quorum.when_true { [RESULT_ADDR] } Annotated Ruby class Communication interfaces: non-deterministic delivery order! Program state Lattice state declarations Accumulate votes into set Merge at non-deterministic future time Monotone function: set  max Monotone function: max  bool Program logic Merge new votes together with stored votes (set LUB) Merge using lmax LUB Threshold test on bool (monotone)

Some Features of Bloom Library of built-in lattice types Booleans, increasing/decreasing integers, sets, multisets, maps, sequences, … API for defining custom lattice types Support both relational-style rules and functions over lattices Model-theoretic semantics (“Dedalus”) Logic + state update + async messaging

Ongoing Work Runtime Tools Software stack Current implementation: Ruby DSL Next generation: JavaScript, code generation Also target JVM, CLR, MapReduce Tools BloomUnit: Distributed testing / debugging Verification of lattice type implementations Software stack Concurrent editing, version control Geo-replicated consistency control

CRDTs vs. Bloom Similarities Differences Focus on commutative operations Formalized via join semilattices Monotone functions  composition of CRDTs Similar design patterns (e.g., need for GC) Differences Approach: language design vs. ADTs Correctness: confluence vs. convergence Confluence is strictly stronger CRDT “query” is not necessarily monotone CRDTs more expressive?

Ongoing Work Runtime Tools Software stack Current implementation: Ruby DSL Next generation: JavaScript, code generation Also target JVM, CLR, MapReduce Tools BloomUnit: Distributed testing / debugging Verification of lattice type implementations Software stack Built: Paxos, HDFS, 2PC, lock manager, causal delivery, distributed ML, shopping carts, routing, task scheduling, etc. Working on: Concurrent editing, version control Geo-replicated consistency control

Blazes: Intelligent Coordination Synthesis

Mixing Order and Disorder Can these ideas scale to large systems? Ordering can rarely be avoided entirely Make order part of the design process Annotate modules with ordering semantics If needed, coordinate at module boundaries Philosophy Start with what we’re given (disorder) Create only what we need (order)

Tool Support Path analysis Coordination synthesis How does disorder flow through a program? Persistent vs. transient divergence Coordination synthesis Add “missing” coordination logic automatically

Coordination Synthesis Coordination is costly Help programmers use it wisely! Automatic synthesis of coordination logic Customize coordination code to match: Application semantics (logical) Network topology (physical)

Application Semantics Common pattern: “sessions” (Mostly) independent, finite duration During a session: Only coordinate among participants After a session: Session contents are sealed (immutable) Coordination is unnecessary!

Sealing Non-monotonicity  arbitrary change Very conservative! Common pattern in practice: Mutable for a short period Immutable forever after Example: bank accounts at end-of-day Example: distributed GC Once (global) refcount = 0, remains 0

Affinity Network structure affects coordination cost Example: m clients, n storage servers 1 client request  many storage messages Possible strategies: Coordinate among (slow?) clients Coordinate among (fast?) servers Related: geo-replication, intra- vs. inter-DC coordination, “sticky” sessions

Algebraic Programming

Adversarial Programming A confluent program must behave correctly for any network schedule Network as “adversary” What if we could control the network? Schedule only influences performance, not correctness Sounds like an optimizer!

Algebra vs. Ordering The developer writes two programs: Algebra defines program behavior Guaranteed to be order independent Language: high-level, declarative Ordering Spec controls input order Ordering, batching, timing Language: arbitrary (e.g., imperative) Need not be deterministic

Benefits Separate correctness and performance Might be developed independently! Wide range of freedom for optimization No risk of harming correctness Randomness, batching, parallelism, CPU affinity, data locality, … Auto-tuning/synthesis of ordering spec?

Examples Quicksort Algebra: Ordering Spec: Matrix Multiplication Input: values to sort, pivot elements Output: sorted list Ordering Spec: Ordering of pivots Matrix Multiplication Algebra: Input: sub-matrices Output: result of matrix multiply Ordering Spec: Tiling i.e., division of input matrices into pieces

In a distributed system, Order is precious! Let’s stop taking it for granted.

Recap The network is disorderly – embrace it! How can we write disorderly programs? State: join semilattices Computation: monotone functions When order is necessary, use it wisely A program’s ordering requirements should be a first-class concern!

Thank You! Questions Welcome