Mendosus A SAN-Based Fault Injection Test-Bed for Construction of Highly Available Network Services Xiaoyan Li, Richard Martin, Kiran Nagaraja, Thu D.


1 Mendosus: A SAN-Based Fault Injection Test-Bed for Construction of Highly Available Network Services
Xiaoyan Li, Richard Martin, Kiran Nagaraja, Thu D. Nguyen and Bin Zhang
Dept. of Computer Science, Rutgers University
http://www.panic-lab.rutgers.edu

2 Talk Outline
- Motivation
- Design
- Implementation
- Benchmarks
- Case Studies
- Related Work
- Future Work

3 Motivation
- Ubiquitous network access: exponential growth in network services
- Availability is one key challenge
  - Networked systems comprise large numbers of heterogeneous components
  - Faults are not uncommon
  - Complex interaction between components
- Examples of costly failures: eBay, Britannica
- Currently difficult to assess service availability
  - How to analyze impact of failures?
  - How to set up an appropriate test-bed?

4 Mendosus
- Goal: provide infrastructure for service designers to assess the availability of network services
- Overview:
  - Provide flexible infrastructure to accurately model a variety of different networking systems from the application's point of view
  - Run application in real time and inject faults to assess application's behavior
- Two key components:
  - Real-time emulation of a variety of interconnects
  - General fault injection infrastructure

5 Vision  Map available resources to emulated network

6 Design

7 Mendosus Architecture
[Figure: architecture diagram. Labels: Applications, Mendosus daemon, User Level, Kernel, Emulator Module (Routing, Fault Inclusion, Latency), Network State, Events, Central Controller, Fast & Reliable SAN]

8 Design Decisions
- Central controller
  - Advantage: consistent network and fault information
  - Disadvantage: limits scalability
  - Not involved in network emulation, so should still scale well to targeted system sizes (thousands or tens of thousands of components)
- Entire network state is maintained at each end node
  - Advantage: performance
  - Disadvantage: limits scalability
  - Only maintain state for LAN
- Emulation module embedded within kernel
  - Advantage: no modifications to application code
  - Disadvantage: more difficult to modify and extend

9 Functional Components  Topology Maintenance  Fault Injection  Emulation

10 Topology Maintenance
- Specification: simple ns-2-like topology scripts
  - Specify available resources
- Central controller manages topology
  - Initializes original topology on each node
  - Consistent view
- Real-time topology changes
  - Specified as scripted events
- Controller monitors network connectivity
  - Detects partitions

11 Fault Injection
- Every network component can have a fault profile
  - Switches, hubs, NICs, links, end nodes
- Fault specification:
  - Trace files or theoretical distributions
  - Exponential, Weibull, constant
- Simulate fail-stop components
  - MTTR: constant or following a distribution
  - E.g. unplugging, port shutdown
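A fault profile of this kind can be sketched as a pair of distributions, one for time to failure and one for time to repair, from which a fail/recover schedule is drawn. The following is a minimal illustration only, not Mendosus code: the profile dictionary layout and the names `sample_duration` and `fault_schedule` are assumptions made for this sketch.

```python
import random


def sample_duration(spec, rng):
    """Draw one duration (in seconds) from a distribution spec.

    `spec` is a hypothetical dict such as {"kind": "exponential", "mean": 3600}.
    """
    kind = spec["kind"]
    if kind == "constant":
        return float(spec["value"])
    if kind == "exponential":
        # expovariate takes the rate, i.e. 1 / mean.
        return rng.expovariate(1.0 / spec["mean"])
    if kind == "weibull":
        # weibullvariate(alpha, beta): alpha is the scale, beta the shape.
        return rng.weibullvariate(spec["scale"], spec["shape"])
    raise ValueError(f"unknown distribution kind: {kind}")


def fault_schedule(profile, horizon, seed=0):
    """Return [(time, "fail" | "recover"), ...] for one fail-stop component."""
    rng = random.Random(seed)
    events, now = [], 0.0
    while True:
        now += sample_duration(profile["time_to_failure"], rng)
        if now >= horizon:
            break
        events.append((now, "fail"))
        now += sample_duration(profile["time_to_repair"], rng)
        if now >= horizon:
            break
        events.append((now, "recover"))
    return events


# Example: a link that fails every 10 s and takes 2 s to repair.
link_profile = {
    "time_to_failure": {"kind": "constant", "value": 10},
    "time_to_repair": {"kind": "constant", "value": 2},
}
schedule = fault_schedule(link_profile, horizon=30)
```

Swapping the `time_to_failure` spec for an exponential or Weibull one changes only the profile, not the schedule logic; a trace file could be supported the same way by replaying recorded times instead of sampling.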

12 Emulation
- Completely distributed
  - Every node has enough network state
- Emulation messaging sequence
  - Application initiates communication
  - Routing: determine route
  - Fault Inclusion: effect of injected faults
  - Latency: corresponding to route taken
- We do not implement the innards of network components
  - Switching
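The messaging sequence above (routing, then fault inclusion, then latency) can be sketched as a three-stage function evaluated locally on the sending node. This is an illustrative sketch under assumed data structures, not the kernel module's actual interface: `routes`, `failed`, `latency_us`, and `emulate_send` are hypothetical names.

```python
def emulate_send(src, dst, routes, failed, latency_us):
    """Return the emulated one-way delay in microseconds, or None if dropped.

    routes:     {(src, dst): [component, ...]} precomputed path per node pair
    failed:     set of components currently marked faulty
    latency_us: {component: delay in microseconds}
    """
    # Stage 1, routing: look up the emulated path for this pair.
    path = routes.get((src, dst))
    if path is None:
        return None  # no route, e.g. the nodes are in different partitions

    # Stage 2, fault inclusion: any failed component on the path drops the packet.
    if any(component in failed for component in path):
        return None

    # Stage 3, latency: delay the packet by the sum along the route taken.
    return sum(latency_us[component] for component in path)


routes = {("n1", "n2"): ["link1", "switch1", "link2"]}
latency_us = {"link1": 5, "switch1": 20, "link2": 5}

delay_ok = emulate_send("n1", "n2", routes, set(), latency_us)
delay_faulty = emulate_send("n1", "n2", routes, {"switch1"}, latency_us)
```

Because every node holds the full network state, each stage is a local lookup and no component's internals need to be modelled.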

13 Implementation

14 Ethernet LAN Emulation
- Routing
  - Emulate computation of Ethernet spanning tree
  - Controller chooses root of tree
  - Emulator on each node computes identical spanning tree
  - Reconfiguration performed periodically (every 2 secs)
- Broadcast & Multicast
  - Emulate using sequence of unicast
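One way for every node to arrive at the same tree without exchanging messages is to run a deterministic traversal from the controller-chosen root over the shared topology. The sketch below uses a breadth-first search with neighbours visited in sorted order; this is a simplification chosen for illustration (the real Ethernet spanning tree protocol uses bridge IDs and path costs), and `spanning_tree` is a hypothetical name.

```python
from collections import deque


def spanning_tree(links, root):
    """Return {node: parent} for a BFS tree rooted at `root`.

    `links` is an iterable of undirected (a, b) pairs. Neighbours are visited
    in sorted order, so the result does not depend on the order of `links`.
    """
    adjacency = {}
    for a, b in links:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)

    parent = {root: None}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for neighbour in sorted(adjacency.get(node, ())):
            if neighbour not in parent:
                parent[neighbour] = node
                queue.append(neighbour)
    return parent


# Three switches in a triangle plus one host; the redundant s2-s3 link
# is left out of the tree.
links = [("s1", "s2"), ("s2", "s3"), ("s1", "s3"), ("s3", "h1")]
tree = spanning_tree(links, root="s1")
```

Given the same link list and root, any node computes the same `tree`, which is the property the slide relies on.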

15 Ethernet LAN Emulation - Faults
- Network partitions
  - Controller monitors connectivity
  - Multiple roots: one for each partition
- NIC fail-over
  - Multiple interfaces using IP aliasing support in Linux
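Partition detection with one root per partition amounts to finding the connected components of the graph of live links and picking a root in each. The sketch below is an assumption-laden illustration: the slides do not say how the controller picks a root, so the rule used here (smallest node name in each partition) and the name `partition_roots` are hypothetical.

```python
def partition_roots(nodes, live_links):
    """Map each partition's root to the sorted list of nodes in that partition.

    The root is taken to be the smallest node name in the partition
    (an illustrative rule, not one stated in the talk).
    """
    adjacency = {node: set() for node in nodes}
    for a, b in live_links:
        adjacency[a].add(b)
        adjacency[b].add(a)

    seen = set()
    roots = {}
    for start in sorted(nodes):
        if start in seen:
            continue
        # Depth-first walk collecting everything reachable from `start`.
        component, stack = [], [start]
        seen.add(start)
        while stack:
            node = stack.pop()
            component.append(node)
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        # `start` is the first unseen node in sorted order, hence the minimum.
        roots[start] = sorted(component)
    return roots


nodes = ["A", "B", "C", "D"]
# The B-C link has failed, splitting the LAN in two.
split = partition_roots(nodes, [("A", "B"), ("C", "D")])
healed = partition_roots(nodes, [("A", "B"), ("B", "C"), ("C", "D")])
```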

16 Emulation completeness…
Feature: Ethernet / Emulated Ethernet
- Multicast: Hardware / Software (broadcast w/ filters)
- Layer 3, 4 services (e.g. VLAN, IGMP): Some advanced switches / Not implemented
- Broadcast: Hardware / Software (multiple unicast)
- P-to-P: Yes

17 Micro-benchmarks

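18 Emulation Limits
Columns: Network / No. of Switches in Topology (values not preserved in this transcript) / Throughput (MB/sec) / RTT (usec)
- Fast Ethernet: 11.81 MB/sec, 88.9 usec
- Gigabit Ethernet: 66.00 MB/sec, 130.0 usec
- Emulator: 79.18 MB/sec, 54.8 usec
- Emulator: 79.61 MB/sec, 53.4 usec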

19 Software Broadcast Scaling

20 Fault View Convergence

21 Case Studies

22 Group Membership
- Test protocol behavior under faults
  - Subtle interactions in distributed protocols
- Three Round Membership algorithm
  - Robust against multiple node failures, packet drops and network partitions
  - Two modes of operation: normal and FCM

23 Membership Observations
[Figure: nodes A, B, C, D and link L, with a timeline of events 1 to 5]
1. NIC failure at B
2. Link L down
3. NIC at B recovers
4. Packet drops at A
5. Link L up

24 Multi-Level Switched Network
- Large enterprise LANs have multiple layers of network components
  - Access, core and aggregation switches
- How to evaluate availability vs. cost vs. complexity?
- Study service availability with increased redundancy
  - Faults following exponential distributions
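For intuition about why redundancy helps, the standard steady-state reliability formulas are useful. They are textbook results that assume independent failures, not results from this talk, and the numbers in the worked example are illustrative only.

```latex
% Steady-state availability of one component
% (for exponentially distributed failures with rate \lambda, MTTF = 1/\lambda):
A = \frac{\mathrm{MTTF}}{\mathrm{MTTF} + \mathrm{MTTR}}

% Components in series (all must work), e.g. access, aggregation and core switches on one path:
A_{\mathrm{series}} = \prod_{i=1}^{n} A_i

% k redundant copies in parallel (at least one must work):
A_{\mathrm{parallel}} = 1 - (1 - A)^{k}

% Illustrative example: MTTF = 1000 h, MTTR = 10 h
A = \frac{1000}{1010} \approx 0.9901, \qquad
A_{\mathrm{parallel}}\big|_{k=2} = 1 - (1 - 0.9901)^{2} \approx 0.9999
```

Correlated faults and fail-over delays break the independence assumption, which is one reason to measure availability on an emulated network rather than rely on the formulas alone.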

25 Enterprise LAN

26 Availability Vs Redundancy

27 Related Work
- Network emulation
  - Distributed emulation: Emulab [Utah], DelayLine
  - Centralized emulation: NISTNET, Lancaster emulator
- Fault injection
  - Script-based probing and fault injection: Orchestra, DOCTOR
  - Correlated faults: Loki [UIUC]
- Simulation
  - NS-2, REAL [Cornell], SSFNet, x-sim [Arizona]

28 Future Work
- Extend Mendosus to emulate other networks
  - WAN: build in performance dynamics model
  - Wireless LAN: realistic fault and performance models
- Support pluggable modules within network components which add functionality and additional failures
  - Intelligent routing protocols (e.g. HSRP)
  - Dynamic DNS, RR DNS

29 Summary
- Test-bed for service designers to systematically analyze network and protocol design against failures
- Results show that real-time emulation is feasible given capability of current SAN networks
- Demonstrated the flexibility and usefulness of Mendosus through 2 case studies
- Another step towards building highly available services…

