Blazes: coordination analysis for distributed programs. Peter Alvaro, Neil Conway, Joseph M. Hellerstein (UC Berkeley); David Maier (Portland State University).

Distributed systems are hard: asynchrony and partial failure.

Asynchrony isn't that hard. Amelioration: logical timestamps, deterministic interleaving.

Partial failure isn't that hard. Amelioration: replication, replay.

Asynchrony × partial failure is hard²: logical timestamps, deterministic interleaving, replication, and replay are no longer enough on their own.

Asynchrony × partial failure is hard². Today: consistency criteria for fault-tolerant distributed systems; Blazes: analysis and enforcement.

This talk is all setup. Frame of mind:
1. Dataflow: a model of distributed computation
2. Anomalies: what can go wrong?
3. Remediation strategies: component properties; delivery mechanisms
Framework: Blazes – coordination analysis and synthesis

Little boxes: the dataflow model. A generalization of distributed services; components interact via asynchronous calls (streams).

Components have input interfaces and an output interface.

Streams: messages arrive in nondeterministic order.

Example: a join operator, with input streams R and S and output stream T.
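As an illustration (my own sketch in Python, not the authors' code), a join component of this shape can be written as a streaming symmetric join over R and S. Its output *set* depends only on the input sets, never on arrival order, which previews the confluence property discussed later.

```python
class JoinOperator:
    """Streaming symmetric join: input streams R and S, output stream T.

    The output set is a function of the input sets alone, so any
    interleaving of R and S messages produces the same T.
    """
    def __init__(self):
        self.r_seen, self.s_seen = set(), set()
        self.t = set()  # output stream T, viewed as a set

    def on_r(self, key, val):
        self.r_seen.add((key, val))
        # join the new R tuple against everything seen on S
        self.t |= {(key, val, sv) for (sk, sv) in self.s_seen if sk == key}

    def on_s(self, key, val):
        self.s_seen.add((key, val))
        # join the new S tuple against everything seen on R
        self.t |= {(key, rv, val) for (rk, rv) in self.r_seen if rk == key}
```

Feeding the same messages in any order leaves the same contents in `t`.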

Example: a key/value store, with input streams put and get and output stream response.
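By contrast, a key/value store is order-sensitive. A hypothetical minimal sketch (mine, not from the talk): with last-writer-wins puts, the contents of a get response depend on the order in which puts arrive.

```python
class KVStore:
    """put/get/response interfaces; the last put to a key wins."""
    def __init__(self):
        self.store = {}

    def put(self, key, val):
        self.store[key] = val  # overwrite: delivery order matters

    def get(self, ident, key):
        # response stream tuple: (ident, key, current value)
        return (ident, key, self.store.get(key))
```

Two runs that deliver the same set of puts in different orders answer the same get differently.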

Example: a pub/sub service, with input streams publish and subscribe and output stream deliver.

Logical dataflow – the "software architecture". (Diagram: a data source feeding Service X, a filter, and a cache, ending at the client; streams labeled a, b, c.)

Dataflow is compositional: components are recursively defined. (Diagram: data source → Service X → filter → aggregator → client.)

Dataflow exhibits self-similarity

(Diagram: a full web-service dataflow – DB, HDFS, Hadoop, Index, Combine, static HTTP, App1, App2, Buy, Content – handling user requests and producing App1 and App2 answers.)

Physical dataflow

(Diagram: the logical dataflow again – data source → Service X → filter → aggregator → client, with streams a, b, c.)

Physical dataflow – the "system architecture": the logical components (data source, Service X, filter, aggregator, client) mapped onto physical processes.

What could go wrong?

Cross-run nondeterminism: nondeterministic replays (Run 1 of the dataflow).

Cross-run nondeterminism: nondeterministic replays (Run 2 of the same dataflow).

Cross-instance nondeterminism: transient disagreement between replicas of Service X.

Divergence: permanent disagreement between replicas of Service X.

Hazards: does message delivery order affect stream contents? (Order ⇒ contents?)

Preventing the anomalies:
1. Understand component semantics (and disallow certain compositions).

Component properties:
Convergence – component replicas receiving the same messages reach the same state; rules out divergence.

Convergence via a convergent data structure (e.g., a Set CRDT) supporting insert and read: commutativity makes it tolerant to reordering, associativity to batching, and idempotence to retry/duplication.
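A minimal Python sketch of such a structure (my own illustration, not from the talk): a grow-only set CRDT whose merge is set union. Union is commutative (tolerates reordering), associative (tolerates batching), and idempotent (tolerates retry/duplication), so replicas receiving the same inserts converge.

```python
class GSet:
    """Grow-only set CRDT: state is a set; merge is set union."""
    def __init__(self):
        self.elems = set()

    def insert(self, x):
        self.elems.add(x)

    def merge(self, other):
        # commutative, associative, idempotent
        self.elems |= other.elems

    def read(self):
        return frozenset(self.elems)
```

Replicas that see the same inserts, in any order and with duplicates, read the same state after merging.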

Convergence isn't compositional. Each convergent component guarantees identical input contents ⇒ identical state, but a composition of convergent components need not preserve that guarantee end to end.

Component properties:
Convergence – component replicas receiving the same messages reach the same state; rules out divergence.
Confluence – output streams have deterministic contents; rules out all stream anomalies.
Confluent ⇒ convergent.

Confluence: output set = f(input set).

Confluence is compositional: output set = (f ∘ g)(input set).
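An illustrative sketch (mine, with made-up stage functions): two confluent set-to-set stages compose into a pipeline whose output set is still a function of the input set alone, regardless of input order or duplication.

```python
def keep_evens(xs):
    """Stage f: a confluent filter - output depends only on the input set."""
    return {x for x in xs if x % 2 == 0}

def scale(xs):
    """Stage g: a confluent map."""
    return {x * 10 for x in xs}

def pipeline(xs):
    """f then g: the composition is still a function of the input set."""
    return scale(keep_evens(xs))
```

Any reordering or duplication of the input yields the same output set from the composed pipeline.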

Preventing the anomalies:
1. Understand component semantics (and disallow certain compositions).
2. Constrain message delivery orders: ordering.

Ordering – global coordination: makes even an order-sensitive component's outputs deterministic.

Ordering – global coordination. "The first principle of successful scalability is to batter the consistency mechanisms down to a minimum." – James Hamilton
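One hypothetical mechanism for such global ordering (my sketch, not Blazes machinery): a central sequencer stamps every message with a total order, and each replica applies messages strictly in stamp order, buffering out-of-order arrivals. Outputs become deterministic at the cost of a coordination bottleneck.

```python
class Sequencer:
    """Global coordination point: assigns a total order to messages."""
    def __init__(self):
        self.next_seq = 0

    def stamp(self, msg):
        stamped = (self.next_seq, msg)
        self.next_seq += 1
        return stamped

class OrderedReplica:
    """Buffers out-of-order deliveries; applies messages in stamp order."""
    def __init__(self):
        self.pending = {}
        self.applied = []
        self.expected = 0

    def deliver(self, stamped):
        seq, msg = stamped
        self.pending[seq] = msg
        # apply every message whose predecessors have all arrived
        while self.expected in self.pending:
            self.applied.append(self.pending.pop(self.expected))
            self.expected += 1
```

Replicas that receive the stamped messages in any order apply them identically.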

Preventing the anomalies:
1. Understand component semantics (and disallow certain compositions).
2. Constrain message delivery orders: ordering; barriers and sealing.

Barriers – local coordination: make an order-sensitive component's outputs deterministic.

Barriers – local coordination. (Diagram: a barrier placed between the data source and the client.)

Sealing – continuous barriers Do partitions of (infinite) input streams “end”? Can components produce deterministic results given “complete” input partitions? Sealing: partition barriers for infinite streams

Sealing – continuous barriers. Finite partitions of infinite inputs are common:
…in distributed systems – sessions, transactions, epochs/views
…and in applications – auctions, chats, shopping carts
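A sketch of sealing under my own simplified assumptions: messages are tagged with a partition identifier (say, a session id), and the component emits a partition's result only after that partition's seal arrives, so each result is a deterministic function of a complete, finite partition of an otherwise infinite stream.

```python
from collections import defaultdict

class SealedAggregator:
    """Buffers per-partition messages; emits only once a partition is sealed."""
    def __init__(self):
        self.buffers = defaultdict(set)
        self.results = {}

    def deliver(self, partition, msg):
        self.buffers[partition].add(msg)

    def seal(self, partition):
        # The partition is now complete, so any order-insensitive
        # function of its contents (here, sum) is deterministic.
        self.results[partition] = sum(self.buffers.pop(partition))
```

Different delivery orders within a partition produce the same sealed result.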

Blazes: consistency analysis + coordination selection

Blazes: Mode 1: Grey boxes

Grey boxes. Example: pub/sub (x = publish, y = subscribe, z = deliver). Streams are deterministic but unordered.

Severity  Label     Confluent  Stateless
1         CR        yes        yes
2         CW        yes        no
3         OR^gate   no         yes
4         OW^gate   no         no

Annotations: x→z: CW; y→z: CW

Grey boxes. Example: key/value store (x = put, y = get, z = response). Streams are deterministic but unordered.

Severity  Label     Confluent  Stateless
1         CR        yes        yes
2         CW        yes        no
3         OR^gate   no         yes
4         OW^gate   no         no

Annotations: x→z: OW^key; y→z: OR^key

Label propagation – confluent composition: a CW component feeding a CR component yields deterministic outputs; the composite is CW.

Label propagation – unsafe composition: an OW component feeding a CR component taints the outputs; the edge between them is an interposition point for coordination.

Label propagation – sealing: an OW^key component feeding a CR component yields deterministic outputs when the stream is sealed per key (Seal(key=x)); the composite is OW^key.

Blazes: Mode 2: White boxes

White boxes

module KVS
  state do
    interface input, :put, [:key, :val]
    interface input, :get, [:ident, :key]
    interface output, :response, [:response_id, :key, :val]
    table :log, [:key, :val]
  end

  bloom do
    log <+ put
    log <- (log * put).lefts(:key => :key)
    response <= (log * get).pairs(:key => :key) do |s, l|
      [l.ident, s.key, s.val]
    end
  end
end

put → response: OW^key (the deletion rule uses negation, hence order-sensitive; partitioned by :key)
get → response: OR^key

White boxes

module PubSub
  state do
    interface input, :publish, [:key, :val]
    interface input, :subscribe, [:ident, :key]
    interface output, :response, [:response_id, :key, :val]
    table :log, [:key, :val]
    table :sub_log, [:ident, :key]
  end

  bloom do
    log <= publish
    sub_log <= subscribe
    response <= (log * sub_log).pairs(:key => :key) do |s, l|
      [l.ident, s.key, s.val]
    end
  end
end

publish → response: CW
subscribe → response: CR

The Blazes frame of mind:
Asynchronous dataflow model.
Focus on consistency of data in motion – component semantics; delivery mechanisms and costs.
Automatic, minimal coordination.

Queries?