A Network-State Management Service Peng Sun Ratul Mahajan, Jennifer Rexford, Lihua Yuan, Ming Zhang, Ahsan Arefin Princeton & Microsoft.


1 A Network-State Management Service Peng Sun Ratul Mahajan, Jennifer Rexford, Lihua Yuan, Ming Zhang, Ahsan Arefin Princeton & Microsoft

2 Complex Infrastructure (Microsoft Azure)
  Variety of vendors/models/time
  Number of          2010           2014
  Data Center        A few          10s
  Network Device     1,000s         10s of 1,000s
  Network Capacity   10s of Tbps    Pbps

3 Management Applications
  Traffic Engineering
  Load Balancing
  Link Corruption Mitigation
  Device Firmware Upgrade
  …

4 Our Question
  How to safely run multiple management applications on shared infrastructure?

5 Naïve Solution
  Run independently
  It does not work due to 2 problems
  [Diagram: Traffic Engineering, Link Corruption Mitigation, Firmware Upgrade; Network Devices]

6 Problem #1: Conflict
  Link-corruption-mitigation adjusts traffic away from Core 1
  TE tunes traffic among links to Core 1 and Core 2
  [Diagram: ToRs, Agg A, Agg B, Core 1, Core 2]

7 Problem #2: Safety Violation
  Link-corruption-mitigation shuts down faulty Agg A
  Firmware-upgrade schedules Agg B to upgrade
  [Diagram: ToRs, Agg A, Agg B, Core 1, Core 2]

8 Potential Solution #1
  One monolithic application
  Central control of all actions
  [Diagram: Traffic Engineering, Firmware Upgrade, Link Corruption Mitigation]

9 Too Complex to Build
  Difficult to develop
    Combines applications that are already individually complicated
  High maintenance cost for such huge software in practice

10 Potential Solution #2
  Explicit coordination among applications
  Consensus over network changes
  [Diagram: Traffic Engineering, Firmware Upgrade, Link Corruption Mitigation]

11 Still Too Complex
  Hard to understand each other
  Diverse network interactions
  [Table: columns Application, Routing, Device Config; rows Traffic Engineering, Firmware upgrade]

12 Main Enemy: Complexity
  Application development
  Application coordination
  [Diagram: Monolithic, Independent, and Explicitly coordinate placed between Simple and Complex]

13 What We Advocate
  Loose coupling of applications
  Design principle: Simplicity with safety guarantees
  Forgo joint optimization
    Worthwhile tradeoff for simplicity
    Applications could do it out-of-band

14 Overview of Statesman
  Network operating system for safe multi-application operation
  Uses network state abstraction
    Three views of network state
    Dependency model of states

15 The “State” in Statesman
  Complexity of dealing with devices
    Heterogeneity
    Device-specific commands
  [Diagram: Network Devices, Network State]

16 State Variable Examples
  State Variable          Value
  Device Power Status     Up, down
  Device Firmware         Version number
  Device SDN Agent Boot   Up, down
  Device Routing State    Routing rules
  Link Admin Status       Up, down
  Link Control Plane      BGP, OpenFlow, …
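The state variables listed above can be pictured as entries in a key-value map. The following is a minimal sketch, not Statesman's actual data model: the `StateKey` tuple, the entity names (`agg-a`, `link-1`), the variable spellings, and the helper functions are all assumptions made for illustration.

```python
from typing import Dict, Tuple

# Hypothetical key: (entity name, state-variable name).
StateKey = Tuple[str, str]

# One "view" of network state, modelled as a plain dictionary.
# Entity names and values are made up for illustration.
network_state: Dict[StateKey, str] = {
    ("agg-a", "DevicePowerStatus"): "up",
    ("agg-a", "DeviceFirmware"): "1.0",
    ("agg-a", "DeviceSdnAgentBoot"): "up",
    ("link-1", "LinkAdminStatus"): "up",
    ("link-1", "LinkControlPlane"): "BGP",
}


def read_variable(state: Dict[StateKey, str], entity: str, variable: str) -> str:
    """Return the value of one state variable for one entity."""
    return state[(entity, variable)]


def write_variable(
    state: Dict[StateKey, str], entity: str, variable: str, value: str
) -> Dict[StateKey, str]:
    """Return a copy of the state with one variable changed."""
    updated = dict(state)
    updated[(entity, variable)] = value
    return updated
```

An application would then read and write these entries instead of issuing device-specific commands.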

17 Simplify Device Interaction
  Past: Application reads device statistics from and writes device-specific cmds to Network Devices (SNMP, OF, vendor API, …)
  Now: Application reads and writes Network State

18 Views of Network State
  Observed State: Actual state of the whole network
  Target State: Desired state to be updated on the whole network
  [Diagram: Application, Network State, Network Devices]

19 Two Views Are Not Enough
  One More View
  Proposed State: A group of entity-variable-values desired by an application
  [Diagram: Application, Proposed State, Target State, Observed State, Network Devices]

20 How Merging Works
  Combine multiple proposed states into a safe target state
  Conflict resolution
    Last-writer-wins
    Priority-based locking
    Sufficient for current deployment
  Safety invariant checking
    Partial rejection & Skip update
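The merging step can be sketched as a small function. This is an illustration under stated assumptions, not Statesman's implementation: the `Proposal` record, the `locks` map (entity to lock-holding application), and the `is_safe` callback standing in for the invariant checker are all hypothetical. Proposals are applied oldest first, so the last writer wins; a proposal from an application that does not hold the lock on a locked entity, or one that would make the state unsafe, is rejected while the rest still go through.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

StateKey = Tuple[str, str]  # (entity, variable), a hypothetical key
State = Dict[StateKey, str]


@dataclass(frozen=True)
class Proposal:
    app: str        # application that wrote the proposed state
    entity: str
    variable: str
    value: str
    timestamp: int  # a larger timestamp means a later writer


def merge(
    target: State,
    proposals: List[Proposal],
    locks: Dict[str, str],
    is_safe: Callable[[State], bool],
) -> Tuple[State, List[Proposal]]:
    """Merge proposals into a new target state; return it with the rejects."""
    merged = dict(target)
    rejected: List[Proposal] = []
    # Applying proposals oldest-first makes the last writer win.
    for p in sorted(proposals, key=lambda p: p.timestamp):
        holder = locks.get(p.entity)
        if holder is not None and holder != p.app:
            rejected.append(p)  # another application holds the lock
            continue
        candidate = dict(merged)
        candidate[(p.entity, p.variable)] = p.value
        if is_safe(candidate):
            merged = candidate
        else:
            rejected.append(p)  # partial rejection: skip only this update
    return merged, rejected


target = {("br1", "FirmwareVersion"): "1.0", ("br1", "RoutingState"): "r0"}
proposals = [
    Proposal("te", "br1", "RoutingState", "r1", timestamp=1),
    Proposal("firmware-upgrade", "br1", "FirmwareVersion", "2.0", timestamp=2),
]
locks = {"br1": "firmware-upgrade"}
merged, rejected = merge(target, proposals, locks, is_safe=lambda s: True)
```

In this example the firmware change is accepted and the TE routing change is rejected, because the firmware-upgrade application holds the lock on `br1`.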

21 Choose Safety Invariants
  Tight: Hinder application too frequently
  Loose: Cannot protect network operation
  Our current choice
    Connectivity: Every pair of ToRs in one DC is connected
    Capacity: 99% of ToR pairs have at least 50% capacity
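The two invariants can be checked with a few lines of code. The sketch below is an assumption-laden illustration: the topology is a plain list of links that are up, and the per-pair capacity is assumed to be precomputed as a percentage of the baseline (how that percentage is derived is not covered here). The function and parameter names are hypothetical.

```python
from collections import deque
from typing import Dict, Iterable, List, Set, Tuple

Link = Tuple[str, str]


def connectivity_ok(tors: List[str], up_links: Iterable[Link]) -> bool:
    """True if every pair of ToRs is connected over links that are up."""
    if len(tors) < 2:
        return True
    neighbors: Dict[str, Set[str]] = {}
    for a, b in up_links:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    # Breadth-first search from one ToR; all other ToRs must be reached.
    seen = {tors[0]}
    queue = deque([tors[0]])
    while queue:
        node = queue.popleft()
        for nxt in neighbors.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return all(t in seen for t in tors)


def capacity_ok(
    pair_capacity_percent: List[int],
    min_percent: int = 50,
    required_share_percent: int = 99,
) -> bool:
    """True if enough ToR pairs keep at least `min_percent` of their capacity."""
    if not pair_capacity_percent:
        return True
    good = sum(1 for c in pair_capacity_percent if c >= min_percent)
    # Integer arithmetic avoids floating-point rounding at the threshold.
    return good * 100 >= required_share_percent * len(pair_capacity_percent)
```

A checker could evaluate both functions on a candidate target state and reject the update that would make either one false.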

22 Recap of Three-View Model
  Simplify network management
  Views: Observed State, Target State, Proposed State
  What we see from the network
  What we want the network to be
  What can be actually done on the network
  [Diagram: Application, Statesman]

23 Yet Another Problem
  What’s in Proposed State
    A small number of state variables that the application cares about
  Implicit conflicts arise
    Caused by state dependency

24 Implicit Conflict
  TE writes new value of routing state of B for tunneling traffic
  Firmware-upgrade writes new value of firmware state of B
  [Diagram: devices A, B, C, D]

25 Dependency Relations
  [Diagram: dependencies among Device and Link state variables. Listed in order: PowerState, FirmwareVersion, ConfigurationState, RoutingState; AdminState, ConfigurationState, PathState. Labels: bgpd, SDN]

26 Build in Dependency Model
  Statesman calculates it internally
  Only exposes the result for each state variable
    Whether the variable is controllable
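One way to picture the "controllable" flag is as a walk along a dependency chain. The sketch below is hypothetical: the chain order for a device is an assumption read off the order in which the variables are listed on the Dependency Relations slide, and the `healthy` map (variable name to a boolean) is an invented simplification of the observed state.

```python
from typing import Dict, List

# Assumed dependency order for a device: each variable depends on all
# variables listed before it. The order is an assumption, not a confirmed model.
DEVICE_CHAIN: List[str] = [
    "PowerState",
    "FirmwareVersion",
    "ConfigurationState",
    "RoutingState",
]


def is_controllable(
    variable: str, healthy: Dict[str, bool], chain: List[str] = DEVICE_CHAIN
) -> bool:
    """A variable is controllable only if everything it depends on is healthy."""
    position = chain.index(variable)
    return all(healthy.get(dep, False) for dep in chain[:position])


# Device B during a firmware upgrade: power is fine, firmware is in flux.
device_b = {"PowerState": True, "FirmwareVersion": False, "ConfigurationState": True}
```

With this input, `FirmwareVersion` is controllable but `RoutingState` is not, which is how a routing change proposed by TE and a firmware change proposed by Firmware-upgrade on the same device would surface as a conflict.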

27 Statesman System
  [Diagram: Monitor, Checker, Updater; Observed State, Proposed State, Target State; Storage Service]

28 Deployment Overview
  Operational in Microsoft Azure for 10 months
  Covers 10 DCs with 20K devices

29 Production Applications
  3 diverse applications built
    Device firmware upgrade
    Link corruption mitigation
    Traffic engineering
  Finished within months
  Only thousands of lines of code

30 Case #1: Resolve Conflict
  Inter-DC TE & Firmware-upgrade
  DC = Data Center, BR = Border Router
  [Diagram: DC 1 (BR 1, BR 2), DC 2 (BR 3, BR 4), DC 3 (BR 5, BR 6), DC 4 (BR 7, BR 8)]

31 [Timeline]
  Firmware-upgrade acquires lock of BR1
  TE fails to acquire lock, and moves traffic away
  BR1 firmware upgrade starts
  BR1 firmware upgrade ends. Lock released.
  TE re-acquires lock, and moves traffic back
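The lock hand-off in this timeline can be replayed with a toy lock table. This is a sketch only: the `LockTable` class, the numeric priorities, and the rule that a higher-priority application may take over a held lock are assumptions for illustration, not a description of Statesman's locking API.

```python
from typing import Dict, Optional, Tuple


class LockTable:
    """Per-entity locks; a higher-priority app may take over a lock (assumed rule)."""

    def __init__(self) -> None:
        self._holders: Dict[str, Tuple[str, int]] = {}

    def acquire(self, entity: str, app: str, priority: int) -> bool:
        holder = self._holders.get(entity)
        if holder is None or holder[0] == app or priority > holder[1]:
            self._holders[entity] = (app, priority)
            return True
        return False

    def release(self, entity: str, app: str) -> bool:
        holder = self._holders.get(entity)
        if holder is not None and holder[0] == app:
            del self._holders[entity]
            return True
        return False

    def holder(self, entity: str) -> Optional[str]:
        entry = self._holders.get(entity)
        return entry[0] if entry else None


locks = LockTable()
timeline = [
    locks.acquire("BR1", "te", priority=1),                # TE holds the lock at first
    locks.acquire("BR1", "firmware-upgrade", priority=2),  # upgrade takes the lock
    locks.acquire("BR1", "te", priority=1),                # TE is refused, moves traffic away
    locks.release("BR1", "firmware-upgrade"),              # upgrade ends, lock released
    locks.acquire("BR1", "te", priority=1),                # TE re-acquires, moves traffic back
]
```

Neither application needs to know about the other; each only sees whether its own lock request succeeded.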

32 Case #1 Summary
  Each application:
    Simple logic
    Unaware of the other
  Statesman enables:
    Conflict resolution
    Necessary coordination

33 Case #2: Maintain Capacity Invariant
  Firmware-upgrade & Link-corruption-mitigation
  [Diagram: Core, Agg, and ToR layers across Pod 1 to Pod 10; label "Link corrupting packets"]

34 [Timeline]
  Upgrade proceeds at normal speed in Pods 3 and 5
  Upgrade in Pod 4 is slowed down by the checker due to lost capacity

35 Case #2 Summary
  Statesman:
    Automatically adjusts application progress
    Keeps the network within safety requirements

36 Conclusion
  Need network operating system for multiple management applications
  Statesman
    Loose coupling of applications
    Network state abstraction
    Deployed and operational in Azure

37 Thanks! Questions?
  Check the paper for related work

