Geo-Distribution for Orleans. Phil Bernstein, Sebastian Burckhardt, Sergey Bykov, Natacha Crooks, José Faleiro, Gabriel Kliot, Alok Kumbhare, Vivek Shah, Adriana Szekeres, Jorgen Thelin

Geo-Distribution for Orleans Phil Bernstein, Sebastian Burckhardt, Sergey Bykov, Natacha Crooks, José Faleiro, Gabriel Kliot, Alok Kumbhare, Vivek Shah, Adriana Szekeres, Jorgen Thelin October 23, 2015

Goals for the meeting
- Explain the four main components of the proposed architecture: Multi-Cluster Configuration, Global Single-Instance Grains, Queued Grains, Replication Providers
- Solicit feedback on user requirements and design ideas
- Connect with early adopters

Why Geo-Distribution?
- Better service latency: use the datacenter close to a client for servicing that client
- Better fault tolerance: the application is available despite a datacenter failure

Design Challenges / Proposed Components
- Administration and configuration
  - How to connect multiple clusters?
  - How to add/remove clusters on the fly?
- Choose & implement a programming model
  - Independent grain directories?
  - One grain instance globally?
  - Multiple grain copies that synchronize?
Proposed components: Multi-Cluster Configuration, Global Single-Instance Grains, Queued Grains, Replication Providers (default)

Talk Outline & Component Dependencies: Multi-Cluster Configuration, Global Single-Instance Grains, Queued Grains, Replication Providers

Part 1: Multi-Cluster Configuration

Multi-Cluster  Orleans manages each cluster as usual  But now, there can be more than one such cluster. 7 ACB Multi-Cluster Configuration

Physical Network
- We assume that all nodes can connect via TCP.
- This may require some networking configuration (outside the scope of Orleans).

Azure Multi-Cluster Example (diagram): clusters A, B, C in US-West, Europe (Production), and Europe (Staging), connected through VNETs and a VNET gateway, with shared storage.

Discovery, Routing, Administration
- To share routing and configuration information, connect clusters by two or more gossip channels (Channel 1 and Channel 2 in the diagram).
- No single point of failure.
- Currently, each gossip channel is an Azure table.

Example: Configuration
- On Cluster A:
- On Cluster B:
- On Cluster C:

Connected ≠ Joined
- Connecting to the multi-cluster network does not mean a cluster is part of the active configuration yet.
- The active configuration:
  - is controlled by the administrator
  - carries a timestamp and a comment
  - never changes on its own (i.e., it is completely independent of which nodes are connected, working, or down)

Changing the active configuration
- The administrator injects a timestamped configuration on any cluster.
- It automatically spreads through the gossip channels.

var systemManagement = GrainClient.GrainFactory.GetGrain<IManagementGrain>(
    RuntimeInterfaceConstants.SYSTEM_MANAGEMENT_ID);
systemManagement.InjectMultiClusterConfiguration(
    new Orleans.MultiCluster.MultiClusterConfiguration(
        DateTime.UtcNow, "A,B,C", "I am now adding cluster C"));
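An injected configuration wins by timestamp as it spreads. A minimal Python sketch of how timestamped configurations converge over gossip (all names are hypothetical; this is an illustration, not the Orleans implementation):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class MultiClusterConfiguration:
    timestamp: float   # admin-supplied; totally orders configurations
    clusters: tuple    # e.g. ("A", "B", "C")
    comment: str = ""

class Cluster:
    def __init__(self, name):
        self.name = name
        self.config = None   # newest configuration seen so far

    def merge(self, incoming):
        # Gossip rule: keep whichever configuration has the newer timestamp.
        if incoming and (self.config is None
                         or incoming.timestamp > self.config.timestamp):
            self.config = incoming

def gossip_round(clusters):
    # Every pair exchanges configurations once (one channel read/write).
    for src in clusters:
        for dst in clusters:
            dst.merge(src.config)

# Administrator injects a new configuration on cluster A only.
a, b, c = Cluster("A"), Cluster("B"), Cluster("C")
a.config = MultiClusterConfiguration(1.0, ("A", "B"), "initial")
gossip_round([a, b, c])
a.merge(MultiClusterConfiguration(2.0, ("A", "B", "C"), "I am now adding cluster C"))
gossip_round([a, b, c])
assert b.config.clusters == ("A", "B", "C")
assert c.config.clusters == ("A", "B", "C")
```

Because the administrator's timestamp alone decides which configuration wins, the outcome is the same no matter which cluster received the injection or in which order the channels deliver it.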

Part 2: Global Single-Instance Grains

Global Single-Instance Grains
- Grains marked as single-instance are kept in a separate directory that is coordinated globally.
- Calls to the grain are routed to:
  - the cluster containing the current activation (if it exists)
  - a new activation in the caller's cluster (otherwise)

[GlobalSingleInstance]
public class MyGrain : Orleans.Grain, IMyGrain { ... }

Typical indications for using a global-single-instance grain
- The vast majority of accesses to the grain are in one cluster.
- But the grain needs to be instantiable in different clusters at different times, e.g., due to:
  - changing access patterns
  - cluster failure

Example: A Game Grain (diagram): clients reach frontends and silos in DC1 and DC2; a single-instance game grain is activated in one datacenter.

Example: A Game Grain, continued (diagram): a new single-instance game grain is activated in the other datacenter.

Guarantees
- If cross-cluster communication is functional, there is at most one activation globally.
- If cross-cluster communication is broken, there may temporarily be multiple activations.
  - Duplicate instances are automatically deactivated once communication is restored.
- This is the same guarantee as for regular single-cluster grains today.

Part 3: Queued Grains

Concerning Performance...
- Global coordination is relatively slow and not always needed:
  - it is often o.k. to read a cached state instead of the latest state
  - it is often o.k. to update state asynchronously
- Idea: improve read/write performance by means of an API that supports caching and queueing.

Queued Grains
- One local state is cached per cluster.
- One global state lives somewhere (e.g., in storage).
- The states remain automatically synchronized.

Grain vs. QueuedGrain (diagram): a regular Grain accesses its State in Storage via ReadStateAsync / WriteStateAsync / ClearStateAsync; a QueuedGrain (new API) interposes a cache + queue between its state and storage.

Queued Grain API (confirmed state vs. tentative state):

TGrainState ConfirmedState { get; }
TGrainState TentativeState { get; }
void EnqueueUpdate(IUpdateOperation<TGrainState> update);
IEnumerable<IUpdateOperation<TGrainState>> UnconfirmedUpdates { get; }

public interface IUpdateOperation<TGrainState>
{
    void Update(TGrainState state);
}

Automatic Propagation: Local -> Global
- A background process applies queued updates to the global state.
- It keeps retrying on failures (e.g., e-tag mismatch, storage offline).
- Updates are applied in order, without duplication.

Automatic Propagation: Global -> Local
- All changes to the global state are pushed to the confirmed state.
- Updates are removed from the queue when confirmed.
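The two propagation directions can be summarized in a small simulation. This is a Python sketch of the cache + queue discipline (hypothetical names, not the actual QueuedGrain code): the tentative state is derived by replaying the unconfirmed updates on top of the confirmed state.

```python
class QueuedGrainSketch:
    """Confirmed state plus a FIFO of unconfirmed updates; the tentative
    state is derived, never stored."""
    def __init__(self, initial_state):
        self.confirmed_state = initial_state
        self.unconfirmed_updates = []   # applied in order, never duplicated

    def enqueue_update(self, update):
        # update is a function(state) -> new state, like IUpdateOperation.Update
        self.unconfirmed_updates.append(update)

    @property
    def tentative_state(self):
        # Confirmed state with all queued updates applied on top.
        state = self.confirmed_state
        for u in self.unconfirmed_updates:
            state = u(state)
        return state

    def confirm(self, n, new_global_state):
        # Storage accepted the first n queued updates and pushed back the
        # resulting global state: drop them and advance the confirmed state.
        del self.unconfirmed_updates[:n]
        self.confirmed_state = new_global_state

g = QueuedGrainSketch([])
g.enqueue_update(lambda msgs: msgs + ["hello"])
g.enqueue_update(lambda msgs: msgs + ["world"])
assert g.tentative_state == ["hello", "world"]   # reads never block on storage
assert g.confirmed_state == []                   # nothing propagated yet
g.confirm(2, ["hello", "world"])                 # background writer succeeded
assert g.unconfirmed_updates == []
```

While storage is offline the queue simply grows; once it is back, `confirm` drains the queue and the tentative and confirmed states coincide again.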

How to make a queued grain
- Define the grain interface (as usual).
- Define the grain state (as usual).
- For each update operation, define a new class that describes the update (as for event sourcing).
- Write the grain implementation.

Example: Chat Application
- Users connect to a chat room (one grain = one chat room).
- Users enter chat messages.
- The chat room displays the last 100 messages.

Example: Chat Grain Interface + State

public interface IChatGrain : IGrain
{
    Task<List<string>> GetMessages();
    Task AddMessage(string message);
}

[Serializable]
public class ChatState : GrainState
{
    // list that stores the last 100 messages
    public List<string> Messages { get; set; }

    public ChatState() { Messages = new List<string>(); }
}

Example: Chat Grain Update Operation

[Serializable]
public class MessageAddedEvent : IUpdateOperation<ChatState>
{
    public string Content { get; set; }

    public void Update(ChatState state)
    {
        state.Messages.Add(Content);
        // remove the oldest message if necessary
        if (state.Messages.Count > 100)
            state.Messages.RemoveAt(0);
    }
}

Example: Chat Grain Implementation

public class ChatGrain : QueuedGrain<ChatState>, IChatGrain
{
    public Task<List<string>> GetMessages()
    {
        return Task.FromResult(this.TentativeState.Messages);
    }

    public Task AddMessage(string message)
    {
        EnqueueUpdate(new MessageAddedEvent() { Content = message });
        return TaskDone.Done;
    }
}

Chat Room – Normal Operation (diagram: chat room objects in Datacenter US and Datacenter Europe)
- Queues drain quickly; the copies stay in sync.

Failure Scenario: (1) Storage Offline (diagram: chat room objects in Datacenter US and Datacenter Europe)
- Queues accumulate unconfirmed updates.
- Users can see only messages from their own datacenter.

Failure Scenario: (2) Storage Is Back (diagram: chat room objects in Datacenter US and Datacenter Europe)
- Queues drain, messages get ordered consistently, and normal operation resumes.

Beyond geo-distribution...
- A queued grain helps to mitigate slow storage even in a single cluster.
- A queued grain is a good match for the event-sourcing pattern.

Part 4: Replication Providers

Under the Hood: the Global State Is an Illusion
- Many possible configurations, to support variable degrees of:
  - persistence
  - fault tolerance
  - reactivity
- The same programming API for all (diagram: configurations range from shared storage, to storage replicas, to replicas in RAM).

Status & Next Steps
- All code, including samples and benchmarks, is currently in an internal VSO repository.
- Estimated 1-3 months to pull all pieces into GitHub and have them reviewed by the community, in order of dependencies.
- Documents available by Monday:
  - research paper draft
  - documentation for developers


BACKUP SLIDES

Orleans Today
- The programmer defines grains (virtual actors).
- Configuration defines a cluster of silos.
- The runtime manages grains (directory, load, communication, silo failures).

Single-Activation Design
- A request to activate grain G in cluster C runs a protocol to get agreement from all other clusters that C owns G.
- First, optimistically activate the grain:
  - most accesses of G are in C, so a race where another cluster is concurrently activating G is rare.
- Allow duplicate activations when there is a partition:
  - prioritize availability over consistency.
- An anti-entropy protocol resolves duplicate activations:
  - one activation is killed; its application-specific logic in the Deactivate function merges state, as appropriate.

Cross-Cluster Single-Activation Protocol
- Cluster C receives a local activation request for grain G. If G is referenced in C's grain directory, then return the reference.
- Else activate G in C and tell every other cluster C′ that C requests ownership.
- Each C′ replies "I own G", or it stores "C might own G" and replies "I don't own G".
- If all C′ reply "I don't own G", then C stores "I own G" and lazily replies "I own G" to the other clusters, which change "C might own G" to "C owns G".
- If some C′ replies "I own G", then a reconciliation oracle decides who wins.
  - The losing cluster's Deactivate function merges or discards its state, as appropriate.
- Else, C's ownership is in doubt until all clusters reply.
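Assuming synchronous message delivery and no races or partitions, the happy path of this protocol can be sketched in Python (hypothetical names; the reconciliation oracle and the lazy confirmation step are omitted):

```python
class ClusterSketch:
    def __init__(self, name):
        self.name = name
        self.peers = []       # other clusters, wired up after construction
        self.directory = {}   # grain -> "OWNED" | ("MAYBE", requester name)

    def request_ownership(self, grain, requester):
        # Invoked remotely by a cluster that wants to activate `grain`.
        if self.directory.get(grain) == "OWNED":
            return "I own it"
        self.directory[grain] = ("MAYBE", requester)   # remember the doubt
        return "I don't own it"

    def activate(self, grain):
        # Returns the name of the cluster where the grain ends up activated.
        if self.directory.get(grain) == "OWNED":
            return self.name
        replies = [p.request_ownership(grain, self.name) for p in self.peers]
        if all(r == "I don't own it" for r in replies):
            self.directory[grain] = "OWNED"            # claim ownership
            return self.name
        # Some peer already owns it: route the call there instead.
        for p in self.peers:
            if p.directory.get(grain) == "OWNED":
                return p.name

a, b, c = ClusterSketch("A"), ClusterSketch("B"), ClusterSketch("C")
a.peers, b.peers, c.peers = [b, c], [a, c], [a, b]
assert a.activate("game-1") == "A"   # first activation wins ownership
assert b.activate("game-1") == "A"   # later requests route to the owner
```

The real protocol must additionally handle concurrent requests, broken channels, and the resulting duplicate activations, which is exactly what the reconciliation oracle and Deactivate merging are for.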

Replicated Grain: the programmer chooses availability/consistency.
- For updates (mutating methods), choose:
  - wait for the update to globally propagate, or
  - just do it in the background.
- For reads (non-mutating methods), choose:
  - I need the latest state, or
  - an approximate state is o.k.

Chat Room Example
User1: Append(m1), Append(m2)
User2: Append(m3), Append(m4)
The global sequence is any shuffle of the two user-local sequences. Possible global orders:
S1 = A(m1) A(m2) A(m3) A(m4)
S2 = A(m1) A(m3) A(m2) A(m4)
S3 = A(m1) A(m3) A(m4) A(m2)
S4 = A(m3) A(m4) A(m1) A(m2)
S5 = A(m3) A(m1) A(m4) A(m2)
S6 = A(m3) A(m1) A(m2) A(m4)
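That S1-S6 are exactly the shuffles preserving each user's local order can be checked mechanically; a small Python sketch:

```python
def interleavings(s1, s2):
    # All merges of s1 and s2 that preserve each sequence's internal order.
    if not s1:
        return [list(s2)]
    if not s2:
        return [list(s1)]
    return ([[s1[0]] + rest for rest in interleavings(s1[1:], s2)] +
            [[s2[0]] + rest for rest in interleavings(s1, s2[1:])])

orders = interleavings(["m1", "m2"], ["m3", "m4"])
assert len(orders) == 6                        # matches S1..S6 on the slide
assert ["m1", "m3", "m2", "m4"] in orders      # e.g. S2
assert ["m2", "m1", "m3", "m4"] not in orders  # per-user order is preserved
```

In general two sequences of lengths n and m have C(n+m, n) such shuffles, here C(4, 2) = 6.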

Chat Room Example (m1 = a question, m2 = a follow-on question, m3 = an explanation that covers both questions)
User1: Append(m1), Append(m2)
User2: Append(m3), Append(m4)
Gossiping Protocol:
- U1 executes A(m1)
- U1 sends A(m1) to U2
- U1 executes A(m2)
- U2 executes A(m3) and sends [A(m1), A(m3)] to U1
- U1 must merge [A(m1), A(m2)] and [A(m1), A(m3)]

Chat Room Example (m1 = a question, m2 = a follow-on question, m3 = an explanation that covers both questions)
User1: Append(m1), Append(m2)
User2: Append(m3), Append(m4)
Local/Global Protocol:
- U1 executes A(m1)
- U1 sends A(m1) to Global
- Global returns [A(m1)] to U1
- U1 executes A(m2)
- U1 sends A(m2) to Global
- U2 executes A(m3)
- U2 sends A(m3) to Global
- Global returns [A(m1), A(m2), A(m3)] to U1

Example: Chat Application
- Users connect to a chat room.
- Users enter chat messages.
- The chat room displays messages in order.
- The chat room object is instantiated in two datacenters.
- Chat room state = the set of connected users and the sequence of recent messages.
- Each user interacts with its datacenter-local chat room object.
- The chat room program updates state using updateLocal.
- Updates propagate to storage in the background.
- No application code is needed to handle failures or recoveries of storage or datacenters.