Yang Zhang, Eman Ramadan, Hesham Mekky, Zhi-Li Zhang

Slides:

Advertisements

Similar presentations

Split Brain Detection Version 00 Nigel Bragg September 4 th,

Advertisements

Networks & Communications

SDN Controller Challenges

Resilience Issues in Information Centric Networks Ning Wang University of Surrey.

Failure Recovery of Overlay Tree-based Structures Ing. Vladimír Dynda Doc. RNDr. Ing. Petr Zemánek, CSc. (supervisor) Czech Technical University in Prague.

Mobile and Wireless Computing Institute for Computer Science, University of Freiburg Western Australian Interactive Virtual Environments Centre (IVEC)

Small-world Overlay P2P Network

Distributed Systems Fall 2010 Replication Fall 20105DV0203 Outline Group communication Fault-tolerant services –Passive and active replication Highly.

ZIGZAG A Peer-to-Peer Architecture for Media Streaming By Duc A. Tran, Kien A. Hua and Tai T. Do Appear on “Journal On Selected Areas in Communications,

CS 582 / CMPE 481 Distributed Systems

Scalable Application Layer Multicast Suman Banerjee Bobby Bhattacharjee Christopher Kommareddy ACM SIGCOMM Computer Communication Review, Proceedings of.

Chapter 10 Introduction to Wide Area Networks Data Communications and Computer Networks: A Business User’s Approach.

Distributed Systems Fall 2009 Replication Fall 20095DV0203 Outline Group communication Fault-tolerant services –Passive and active replication Highly.

Handout # 4: Scaling Controllers in SDN - HyperFlow

Microsoft Load Balancing and Clustering. Outline Introduction Load balancing Clustering.

1 Content Distribution Networks. 2 Replication Issues Request distribution: how to transparently distribute requests for content among replication servers.

Module 13: Network Load Balancing Fundamentals. Server Availability and Scalability Overview Windows Network Load Balancing Configuring Windows Network.

Tanenbaum & Van Steen, Distributed Systems: Principles and Paradigms, 2e, (c) 2007 Prentice-Hall, Inc. All rights reserved Chapter 6 Synchronization.

EITnotes.com For more notes and topics visit:

SPREAD TOOLKIT High performance messaging middleware Presented by Sayantam Dey Vipin Mehta.

Spring 2006Computer Networks1 Chapter 2 Network Models.

An Introduction to Consensus with Raft

Information-Centric Networks10b-1 Week 10 / Paper 2 Hermes: a distributed event-based middleware architecture –P.R. Pietzuch, J.M. Bacon –ICDCS 2002 Workshops.

Antidio Viguria Ann Krueger A Nonblocking Quorum Consensus Protocol for Replicated Data Divyakant Agrawal and Arthur J. Bernstein Paper Presentation: Dependable.

CSci8211: Distributed Systems: Raft Consensus Algorithm 1 Distributed Systems : Raft Consensus Alg. Developed by Stanford Platform Lab  Motivated by the.

The Raft Consensus Algorithm Diego Ongaro and John Ousterhout Stanford University.

Clustering in OpenDaylight

Logical Architecture and UML Package Diagrams. The logical architecture is the large-scale organization of the software classes into packages, subsystems,

The Network Aware IoT Service at Edge Guoxi Wang.

William Stallings Data and Computer Communications

SDN and Security Security as a service in the cloud

Chapter 3: Packet Switching (overview)

On-Site PBX Vs Hosted PBX.

Coordination and Agreement

CCNA 3 Chapter 10 Virtual Trunking Protocol

The consensus problem in distributed systems

Integrating HA Legacy Products into OpenSAF based system

Chapter 4 Introduction to Network Layer

Author: Ragalatha P, Manoj Challa, Sundeep Kumar. K

Cluster Communications

CPS 512 midterm exam #1, 10/7/2016 Your name please: ___________________ NetID:___________ /60 /40 /10.

CHAPTER 3 Architectures for Distributed Systems

COS 418: Distributed Systems

SWITCHING Switched Network Circuit-Switched Network Datagram Networks

Multipath Routing Using Distributed Proxy Servers

SCTP: Stream Control Transport Protocol

Chapter 4 Introduction to Network Layer

Group 6-SDN Based Prioritized Information Dissemination

Commit Protocols CS60002: Distributed Systems

Outline Announcements Fault Tolerance.

2018/12/10 Energy Efficient SDN Commodity Switch based Practical Flow Forwarding Method Author: Amer AlGhadhban and Basem Shihada Publisher: 2016 IEEE/IFIP.

GENI Summer Camp Project Resilient Networks with DAG

CS 4594 Broadband PNNI Signaling.

Active replication for fault tolerance

EEC 688/788 Secure and Dependable Computing

EEC 688/788 Secure and Dependable Computing

EEC 688/788 Secure and Dependable Computing

Replication and Availability in Distributed Systems

EEC 688/788 Secure and Dependable Computing

Distributed Computing:

Our Final Consensus Protocol: RAFT

EEC 688/788 Secure and Dependable Computing

EEC 688/788 Secure and Dependable Computing

EEC 688/788 Secure and Dependable Computing

Prof. Leonardo Mostarda University of Camerino

EEC 688/788 Secure and Dependable Computing

EEC 688/788 Secure and Dependable Computing

Overview of Networking

Virtual LAN (VLAN).

Realizing a Peer-to-Peer System using a common API

Presentation transcript:

When Raft Meets SDN: How to Elect a Leader and Reach Consensus in an Unruly Network Yang Zhang, Eman Ramadan, Hesham Mekky, Zhi-Li Zhang University of Minnesota

Introduction Consensus Algorithm

Network Operating Systems Introduction Software Defined Network Consensus Algorithm Application Layer Applications API API API Network Operating Systems Control Plane Control to Data Plane Interface Data Plane Network Device Network Device Network Device Network Device Consensus algorithm is essential for SDN distributed control plane

Problems in SDN Distributed Control Plane Cyclic dependencies control network connectivity consensus algorithm control logic managing the network SDN control plane setup In consensus algorithm server failure has been studied for decades full mesh connectivity is assumed what if network fails? new failure scenarios arise in SDN

RAFT: a representative consensus algorithm At any given time, each server is either: – Leader: handles all client interactions, log replication, etc. – Follower: completely passive – Candidate: used to elect a new leader • Normal operation: 1 leader, N-1 followers Follower Candidate Leader

RAFT Leader Election times out, new election times out, receives votes start election receives votes from majority start Follower Candidate Leader or a higher term higher term Vote criteria: 1) highest term, 2) latest log *Term is defined as virtual time period in Raft

RAFT MEETS SDN R1 R2 R5 R3 R4 Control Cluster under Normal Operations.

RAFT MEETS SDN Oscillating Leadership. Control Cluster under Normal Operations. Condition. Up-to-date servers have a quorum, but they cannot communicate with each other.

RAFT MEETS SDN No Leader Exists. Condition. Some servers have a quorum, but they have obsolete logs, and servers having up-to-date logs, do not have a quorum.

POSSIBLE SOLUTIONS Solution Expectation Gossiping (overlay network) all-to-all connectivity among cluster members as long as the network is not partitioned. Gossiping (overlay network) Pros: easy to implement Cons: no guarantee to work in all scenarios; heavy overhead Routing via Preorders Pros: built-in resiliency in control plane; no modification to consensus algorithm Cons: requires path calculation ahead of time

Routing via Preorder: Failure Handling Failure Handling Process: Upon failures, use alternative outgoing links if exist Group table is used for implementing all possible alternative paths d = G E F C D B s = A proved It guarantees a network where two nodes are always reachable as long as there is no partition.

PRELIMINARY RESULTS Experiment Setup Raft Implementation: Raft C++ implementation in LogCabin Six Docker containers: 5 servers and 1 client Five Software switches: Open vSwitch Simulating the two failure scenarios: Oscillating Leadership No Existing Leader

PRELIMINARY RESULTS Raft: leadership keeps oscillating among servers (unstable). Raft: no viable leader (liveness lost). PrOG: leadership is stable. Vanilla Raft is not stable under failure scenarios, while PrOG-assisted Raft is stable.

PRELIMINARY RESULTS Client suffers much more failed attempts for accessing cluster leader in vanilla Raft. Latency of a request operation increases under failure scenarios

Summary SDN controller liveness depends on all-to-all message delivery between cluster servers Raft is used to illustrate the problem induced by interdependency in the design of SDN distributed control plane Possible solutions are discussed to circumvent interdependency issues. Preliminary results show the effectiveness of PrOG in improving the availability of leadership in Raft used by critical applications like ONOS.