Yang Zhang, Eman Ramadan, Hesham Mekky, Zhi-Li Zhang

Slides:



Advertisements
Similar presentations
Split Brain Detection Version 00 Nigel Bragg September 4 th,
Advertisements

Networks & Communications
SDN Controller Challenges
Resilience Issues in Information Centric Networks Ning Wang University of Surrey.
Failure Recovery of Overlay Tree-based Structures Ing. Vladimír Dynda Doc. RNDr. Ing. Petr Zemánek, CSc. (supervisor) Czech Technical University in Prague.
Mobile and Wireless Computing Institute for Computer Science, University of Freiburg Western Australian Interactive Virtual Environments Centre (IVEC)
Small-world Overlay P2P Network
Distributed Systems Fall 2010 Replication Fall 20105DV0203 Outline Group communication Fault-tolerant services –Passive and active replication Highly.
ZIGZAG A Peer-to-Peer Architecture for Media Streaming By Duc A. Tran, Kien A. Hua and Tai T. Do Appear on “Journal On Selected Areas in Communications,
CS 582 / CMPE 481 Distributed Systems
Scalable Application Layer Multicast Suman Banerjee Bobby Bhattacharjee Christopher Kommareddy ACM SIGCOMM Computer Communication Review, Proceedings of.
Chapter 10 Introduction to Wide Area Networks Data Communications and Computer Networks: A Business User’s Approach.
Distributed Systems Fall 2009 Replication Fall 20095DV0203 Outline Group communication Fault-tolerant services –Passive and active replication Highly.
Handout # 4: Scaling Controllers in SDN - HyperFlow
Microsoft Load Balancing and Clustering. Outline Introduction Load balancing Clustering.
1 Content Distribution Networks. 2 Replication Issues Request distribution: how to transparently distribute requests for content among replication servers.
Module 13: Network Load Balancing Fundamentals. Server Availability and Scalability Overview Windows Network Load Balancing Configuring Windows Network.
Tanenbaum & Van Steen, Distributed Systems: Principles and Paradigms, 2e, (c) 2007 Prentice-Hall, Inc. All rights reserved Chapter 6 Synchronization.
EITnotes.com For more notes and topics visit:
SPREAD TOOLKIT High performance messaging middleware Presented by Sayantam Dey Vipin Mehta.
Spring 2006Computer Networks1 Chapter 2 Network Models.
An Introduction to Consensus with Raft
Information-Centric Networks10b-1 Week 10 / Paper 2 Hermes: a distributed event-based middleware architecture –P.R. Pietzuch, J.M. Bacon –ICDCS 2002 Workshops.
Antidio Viguria Ann Krueger A Nonblocking Quorum Consensus Protocol for Replicated Data Divyakant Agrawal and Arthur J. Bernstein Paper Presentation: Dependable.
CSci8211: Distributed Systems: Raft Consensus Algorithm 1 Distributed Systems : Raft Consensus Alg. Developed by Stanford Platform Lab  Motivated by the.
The Raft Consensus Algorithm Diego Ongaro and John Ousterhout Stanford University.
Clustering in OpenDaylight
Logical Architecture and UML Package Diagrams. The logical architecture is the large-scale organization of the software classes into packages, subsystems,
The Network Aware IoT Service at Edge Guoxi Wang.
William Stallings Data and Computer Communications
SDN and Security Security as a service in the cloud
Chapter 3: Packet Switching (overview)
On-Site PBX Vs Hosted PBX.
Coordination and Agreement
CCNA 3 Chapter 10 Virtual Trunking Protocol
The consensus problem in distributed systems
Integrating HA Legacy Products into OpenSAF based system
Chapter 4 Introduction to Network Layer
Author: Ragalatha P, Manoj Challa, Sundeep Kumar. K
Cluster Communications
CPS 512 midterm exam #1, 10/7/2016 Your name please: ___________________ NetID:___________ /60 /40 /10.
CHAPTER 3 Architectures for Distributed Systems
COS 418: Distributed Systems
SWITCHING Switched Network Circuit-Switched Network Datagram Networks
Multipath Routing Using Distributed Proxy Servers
SCTP: Stream Control Transport Protocol
Chapter 4 Introduction to Network Layer
Group 6-SDN Based Prioritized Information Dissemination
Commit Protocols CS60002: Distributed Systems
Outline Announcements Fault Tolerance.
2018/12/10 Energy Efficient SDN Commodity Switch based Practical Flow Forwarding Method Author: Amer AlGhadhban and Basem Shihada Publisher: 2016 IEEE/IFIP.
GENI Summer Camp Project Resilient Networks with DAG
CS 4594 Broadband PNNI Signaling.
Active replication for fault tolerance
EEC 688/788 Secure and Dependable Computing
EEC 688/788 Secure and Dependable Computing
EEC 688/788 Secure and Dependable Computing
Replication and Availability in Distributed Systems
EEC 688/788 Secure and Dependable Computing
Distributed Computing:
Our Final Consensus Protocol: RAFT
EEC 688/788 Secure and Dependable Computing
EEC 688/788 Secure and Dependable Computing
EEC 688/788 Secure and Dependable Computing
Prof. Leonardo Mostarda University of Camerino
EEC 688/788 Secure and Dependable Computing
EEC 688/788 Secure and Dependable Computing
Overview of Networking
Virtual LAN (VLAN).
Realizing a Peer-to-Peer System using a common API
Presentation transcript:

When Raft Meets SDN: How to Elect a Leader and Reach Consensus in an Unruly Network Yang Zhang, Eman Ramadan, Hesham Mekky, Zhi-Li Zhang University of Minnesota

Introduction Consensus Algorithm

Network Operating Systems Introduction Software Defined Network Consensus Algorithm Application Layer Applications API API API Network Operating Systems Control Plane Control to Data Plane Interface Data Plane Network Device Network Device Network Device Network Device Consensus algorithm is essential for SDN distributed control plane

Problems in SDN Distributed Control Plane Cyclic dependencies control network connectivity consensus algorithm control logic managing the network SDN control plane setup In consensus algorithm server failure has been studied for decades full mesh connectivity is assumed what if network fails? new failure scenarios arise in SDN

RAFT: a representative consensus algorithm At any given time, each server is either: – Leader: handles all client interactions, log replication, etc. – Follower: completely passive – Candidate: used to elect a new leader • Normal operation: 1 leader, N-1 followers Follower Candidate Leader

RAFT Leader Election times out, new election times out, receives votes start election receives votes from majority start Follower Candidate Leader or a higher term higher term Vote criteria: 1) highest term, 2) latest log *Term is defined as virtual time period in Raft

RAFT MEETS SDN R1 R2 R5 R3 R4 Control Cluster under Normal Operations.

RAFT MEETS SDN Oscillating Leadership. Control Cluster under Normal Operations. Condition. Up-to-date servers have a quorum, but they cannot communicate with each other.

RAFT MEETS SDN No Leader Exists. Condition. Some servers have a quorum, but they have obsolete logs, and servers having up-to-date logs, do not have a quorum.

POSSIBLE SOLUTIONS Solution Expectation Gossiping (overlay network) all-to-all connectivity among cluster members as long as the network is not partitioned. Gossiping (overlay network) Pros: easy to implement Cons: no guarantee to work in all scenarios; heavy overhead Routing via Preorders Pros: built-in resiliency in control plane; no modification to consensus algorithm Cons: requires path calculation ahead of time

Routing via Preorder: Failure Handling Failure Handling Process: Upon failures, use alternative outgoing links if exist Group table is used for implementing all possible alternative paths d = G E F C D B s = A proved It guarantees a network where two nodes are always reachable as long as there is no partition. 

PRELIMINARY RESULTS Experiment Setup Raft Implementation: Raft C++ implementation in LogCabin Six Docker containers: 5 servers and 1 client Five Software switches: Open vSwitch Simulating the two failure scenarios: Oscillating Leadership No Existing Leader

PRELIMINARY RESULTS Raft: leadership keeps oscillating among servers (unstable). Raft: no viable leader (liveness lost). PrOG: leadership is stable. Vanilla Raft is not stable under failure scenarios, while PrOG-assisted Raft is stable.

PRELIMINARY RESULTS Client suffers much more failed attempts for accessing cluster leader in vanilla Raft. Latency of a request operation increases under failure scenarios

Summary SDN controller liveness depends on all-to-all message delivery between cluster servers Raft is used to illustrate the problem induced by interdependency in the design of SDN distributed control plane Possible solutions are discussed to circumvent interdependency issues. Preliminary results show the effectiveness of PrOG in improving the availability of leadership in Raft used by critical applications like ONOS.