Published by Polly Cummings. Modified over 9 years ago.
1
Proactive Network Configuration Validation with Batfish
Meg Walraed-Sullivan, Ratul Mahajan, Jitu Padhye, Ari Fogel, Todd Millstein, Luis Pedrosa, Ramesh Govindan
I'm going to be discussing Batfish, a tool we developed to find bugs in network configurations offline. It is currently available and open source, and it has been used to find bugs in real networks. The purpose of this talk is to get feedback on improving the technology. This is joint work.
2
Misconfigurations are common
Misconfigurations are also expensive: last year a bug caused an outage for all Time Warner customers for multiple hours.
3
Configuration is Hard
Low-level directives: interface-level metrics, protocol metrics, per-network policy, per-peer policy
Multiple protocols: BGP, IS-IS, OSPF, static, connected
Protocol interactions: route redistribution, protocol preference, re-advertisement
Example directives:
  ospf interface int3_1 metric 1
  ospf redistribute static metric 10
  bgp neighbor p1 AS P Accept ALL
  static route /24 drop, log
Why do these misconfigurations happen? Configurations contain low-level directives that we hope map to high-level intent. Operators must correctly specify parameters for multiple protocols, and those protocols interact in complicated ways. A typical enterprise network also uses diverse configuration languages: Arista, Cisco IOS, Cisco NX-OS, Juniper, Quanta.
4
Example
[Figure: example topology. Labels: Customer 10.0.0.0/24, n10, C, c1, c2, Provider P, p1, network N with routers n1, n2, n3, n4.]
/24 should be: reachable from C, n1, n2; unreachable from P, n3, n4. (The slide text abbreviates this to: reachable from C; unreachable from P, n4.)
The example is an idealized version of one of the networks we analyzed.
5
/24 should be: reachable from C; unreachable from P, n4
[Figure: same topology. Labels: Customer /24, n10, C, c1, c2, Provider P, p1, N, n1, n2, n3, n4.]
// Configuration of n
1 ospf interface int2_1 metric 1
2 ospf interface int2_3 metric 1
3 interface int2_10 ip /24
4 ospf redistribute connected metric 10
5 prefix-list PL_C /24
6 bgp neighbor c1 AS C apply PL_C out
// Configuration of n
1 ospf interface int3_1 metric 1
2 ospf interface int3_2 metric 1
3 ospf interface int3_4 metric 1
4 static route /24 drop
5 ospf redistribute static metric 10
6 bgp neighbor p1 AS P Accept ALL
Highlighted lines: lines 3 and 4 of the first configuration (interface int2_10 ip /24; ospf redistribute connected metric 10) and lines 4 and 5 of the second (static route /24 drop; ospf redistribute static metric 10).
The intent is simple and the configuration is relatively simple, yet the bug is hard to find until deployment. Both configurations redistribute a route for /24 into OSPF with metric 10, so a router that learns both sees two equal-cost routes, one of which ends at the drop route. As a result, traffic from c2 to /24 is intermittently dropped. The bug is a violation of what we call multipath consistency, and it is based on a real bug we found, buried in configurations thousands of lines long. Each line on its own looks reasonable. We need a deep automated checker that understands forwarding state.
6
Batfish Offline configuration safety checker
Available as open source. Has found real bugs in real networks.
4 stages:
1. Configuration processing
2. Configuration analysis
3. Forwarding table generation
4. Forwarding table analysis
Enter Batfish. It works offline: you don't need to run equipment, which enables checking both the current network and proposed changes.
7
Stage 1: Process router configurations
// Configuration of n
1 ospf interface int3_1 metric 1
2 ospf interface int3_2 metric 1
3 ospf interface int3_4 metric 1
4 static route /24 drop
5 ospf redistribute static metric 10
6 bgp neighbor p1 AS P Accept ALL
[Figure: network N with routers n1, n2, n3, n4.]
Our declarative model provides a set of relations; here are a couple. Some simple relations come directly from the configuration.
Fact about OSPF interface costs:
  OspfCost( node:n3, interface:int3_1, cost:1).
Fact about topology:
  LanNeighbors( node1:n3, interface1:int3_1, node2:n1, interface2:int1_3).
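As a rough illustration of how a fact such as OspfCost could be derived from configuration text, here is a minimal Python sketch. It assumes the simplified one-directive-per-line syntax used on the slides, not any real vendor grammar, and the names `extract_ospf_costs` and `_OSPF_IFACE` are hypothetical; it is not how Batfish itself parses configurations.

```python
import re
from typing import NamedTuple


class OspfCost(NamedTuple):
    """Mirrors the OspfCost(node, interface, cost) relation from the slide."""
    node: str
    interface: str
    cost: int


# Hypothetical pattern for the slides' simplified syntax (an assumption).
_OSPF_IFACE = re.compile(r"^ospf interface (\S+) metric (\d+)$")


def extract_ospf_costs(node: str, config: str) -> list[OspfCost]:
    """Emit one OspfCost fact per 'ospf interface ... metric ...' line."""
    facts = []
    for line in config.splitlines():
        match = _OSPF_IFACE.match(line.strip())
        if match:
            facts.append(OspfCost(node, match.group(1), int(match.group(2))))
    return facts


N3_CONFIG = """\
ospf interface int3_1 metric 1
ospf interface int3_2 metric 1
ospf interface int3_4 metric 1
ospf redistribute static metric 10
"""

facts = extract_ospf_costs("n3", N3_CONFIG)
for fact in facts:
    print(fact)
```

Lines that do not match the pattern (here, the redistribute directive) are simply skipped; a fuller extractor would map them to other relations.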
8
Stage 2: Analyze configurations
// Parsing
No parsing errors
// Basic checks
Undefined reference to route-map ‘loch_ness_policy’
// Custom checks
// No IP reuse
IP ‘ ’ assigned to both rtr1:int5 and rtr3:int6
// All loopback networks exported into OSPF
rtr5:loopback0 neither active nor passive for any OSPF process
You can ask custom questions using our query language.
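The "No IP reuse" check can be pictured as grouping interface addresses and reporting any address with more than one owner. The sketch below is plain Python, not the Batfish query language, and the addresses are made-up documentation addresses (the slide's actual IP is not preserved); `find_ip_reuse` is a hypothetical helper name.

```python
from collections import defaultdict


def find_ip_reuse(assignments):
    """Return {ip: [owners]} for every IP assigned to more than one interface.

    `assignments` is an iterable of (node, interface, ip) tuples.
    """
    owners = defaultdict(list)
    for node, interface, ip in assignments:
        owners[ip].append(f"{node}:{interface}")
    return {ip: who for ip, who in owners.items() if len(who) > 1}


# Made-up example data (assumption): rtr1:int5 and rtr3:int6 share an address.
assignments = [
    ("rtr1", "int5", "192.0.2.1"),
    ("rtr2", "int1", "192.0.2.2"),
    ("rtr3", "int6", "192.0.2.1"),
]

reused = find_ip_reuse(assignments)
for ip, who in reused.items():
    print(f"IP '{ip}' assigned to both {' and '.join(who)}")
```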
9
Stage 3: Compute forwarding tables
OspfExport( node=n2, network= /24, cost=10, type=ospfE2).
InstalledRoute(route={ node=n1, network= /24, nextHop=n2, administrativeCost=110, protocolCost=10, protocol=ospfE2}).
Fib( node=n1, network= /24, egressInterface=int1_2).
TODO: Mention somehow that we understand intricacies of protocols
TODO: merge InstalledRoute and Fib into table, remove OspfExport
In addition to basic facts, we also have derived relations. The data plane generator contains rules for deriving these relations. This stage is the key contribution. The OSPF computation, for example, is modeled in part by OspfExport.
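To make the step from candidate routes to installed routes concrete, here is a small Python sketch of one plausible selection rule: prefer the lowest administrative cost, break ties on protocol cost, and keep every route that ties. This is a deliberate simplification and an assumption about the rule, not Batfish's actual logic. The prefix string is a placeholder taken from the figure label on the Example slide, and the iBGP candidate via n4 is entirely hypothetical.

```python
from typing import NamedTuple


class Route(NamedTuple):
    node: str
    network: str
    next_hop: str
    admin_cost: int
    protocol_cost: int
    protocol: str


def install_best(candidates):
    """Per (node, network), keep all routes tied for the lowest
    (administrative cost, protocol cost) pair."""
    best = {}
    for route in candidates:
        key = (route.node, route.network)
        rank = (route.admin_cost, route.protocol_cost)
        if key not in best or rank < best[key][0]:
            best[key] = (rank, [route])      # strictly better: replace
        elif rank == best[key][0]:
            best[key][1].append(route)       # tie: install alongside
    return {key: routes for key, (_rank, routes) in best.items()}


PREFIX = "10.0.0.0/24"  # placeholder prefix (assumption)
candidates = [
    Route("n1", PREFIX, "n4", 200, 0, "ibgp"),     # hypothetical worse route
    Route("n1", PREFIX, "n2", 110, 10, "ospfE2"),  # redistributed connected
    Route("n1", PREFIX, "n3", 110, 10, "ospfE2"),  # redistributed static drop
]

installed = install_best(candidates)
next_hops = sorted(r.next_hop for r in installed[("n1", PREFIX)])
print(next_hops)
```

Under this rule n1 installs both OSPF routes, which is the equal-cost situation behind the multipath inconsistency in the running example.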
10
Stage 4a: Identify forwarding violations
Counterexample of multipath consistency:
{ IngressNode=n1, SrcIp= , DstIp= , IpProtocol=0 }
Now that you have the data plane, you can check any forwarding property using your favorite data plane (dp) analyzer. Multiple data planes are available if you want to check properties expressible as a difference between data planes.
11
Stage 4b: Explain forwarding violations
Counterexample packet traces:
ViolationTraceRoute( flow={ node=n1, … ,dstIp= },
  1st hop:[ n1:int1_2 -> n2:int2_1 ]
  2nd hop:[ n2:int2_10 -> n10:int10_2 ]
  fate=accepted).
  1st hop:[ n1:int1_3 -> n3:int3_1 ]
  fate=nullRouted by n3).
Traces are produced automatically.
12
New Consistency Properties
Multipath: disposition consistent on all paths
[Figure: topology fragment. Labels: /24, n10, n2, n1, n3.]
Recall that since we produced the data plane, you can ask any question expressible as a forwarding invariant. Part of our contribution is the introduction of three novel invariants. The first is multipath consistency.
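One way to picture the multipath invariant is a walk over every equal-cost forwarding choice for a single destination, collecting the final fate of each path; the invariant holds when only one fate is possible. The Python below is a toy sketch under that assumption, using the hops from the counterexample traces. The function name `fates` and the "loop" and "noRoute" labels are hypothetical, and a real analyzer reasons over packet headers rather than one fixed destination.

```python
def fates(fib, dispositions, node, seen=()):
    """Return the set of final dispositions reachable from `node`.

    fib:          node -> list of next hops (all equal-cost choices)
    dispositions: terminal node -> fate string
    """
    if node in dispositions:
        return {dispositions[node]}
    if node in seen:
        return {"loop"}
    next_hops = fib.get(node, [])
    if not next_hops:
        return {"noRoute"}
    result = set()
    for hop in next_hops:
        result |= fates(fib, dispositions, hop, seen + (node,))
    return result


# Forwarding choices for the customer prefix, as in the traces:
# n1 may forward to n2 (then n10 accepts) or to n3 (which null-routes).
fib = {"n1": ["n2", "n3"], "n2": ["n10"]}
dispositions = {"n10": "accepted", "n3": "nullRouted"}

result = fates(fib, dispositions, "n1")
multipath_consistent = len(result) == 1
print(sorted(result), multipath_consistent)
```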
13
New Consistency Properties
Multipath: disposition consistent on all paths
Differential reachability: reachability unaffected by change
[Figure: labels C, c1, c2, N, n1, n2, n3 and three /24 prefixes.]
The reachability matrix should be unchanged after a failure. n1 accepts both; n2 accepts only one. After the failure, /24 is no longer reachable.
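Differential reachability can be sketched as computing a reachability matrix for two forwarding graphs (before and after a change) and diffing them. The Python below is an illustrative sketch only: it uses plain graph reachability over a made-up three-hop chain rather than per-packet forwarding, and the helper names are hypothetical.

```python
from collections import deque


def reach_set(adj, src):
    """Nodes reachable from `src` by following directed forwarding edges."""
    seen = {src}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        for nxt in adj.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen - {src}


def reach_matrix(adj, nodes):
    """All (src, dst) pairs such that dst is reachable from src."""
    return {(s, d) for s in nodes for d in reach_set(adj, s)}


def differential_reachability(before, after, nodes):
    """Pairs whose reachability differs between the two graphs."""
    b, a = reach_matrix(before, nodes), reach_matrix(after, nodes)
    return {"lost": b - a, "gained": a - b}


nodes = ["c1", "n1", "n2", "dst"]
before = {"c1": ["n1"], "n1": ["n2"], "n2": ["dst"]}
after = {"c1": ["n1"], "n2": ["dst"]}  # made-up failure of the n1 -> n2 hop

diff = differential_reachability(before, after, nodes)
print(sorted(diff["lost"]), sorted(diff["gained"]))
```

The invariant holds when both the "lost" and "gained" sets are empty.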
14
New Consistency Properties
Multipath: disposition consistent on all paths
Differential reachability: reachability unaffected by change
Destination: at most one customer per delegated address
[Figure: labels CA, CB, B, ca1, cb1, b1, n1, N; /24 with AS length=1 and /24 with AS length=2.]
TODO: decide how to change destination consistency mention/name/results
This property is intended for networks that delegate a unique portion of the address space to each customer. Packets for any given destination address should be sent to only one customer under any network conditions.
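As a sketch of the destination invariant, suppose we already know, for each network condition, which customer receives packets for each delegated prefix. The check then reduces to finding prefixes that are delivered to more than one customer across all conditions. The input format, the condition labels, and the prefix below are assumptions made for illustration; only the customer names CA and CB come from the slide.

```python
def destination_violations(deliveries):
    """Return {prefix: [customers]} for prefixes delivered to more than
    one customer over all observed network conditions.

    `deliveries` is an iterable of (condition, prefix, customer) tuples.
    """
    customers_by_prefix = {}
    for _condition, prefix, customer in deliveries:
        customers_by_prefix.setdefault(prefix, set()).add(customer)
    return {
        prefix: sorted(customers)
        for prefix, customers in customers_by_prefix.items()
        if len(customers) > 1
    }


# Made-up observations (assumption): the same prefix reaches CA normally
# but CB once a link has failed.
deliveries = [
    ("all links up", "203.0.113.0/24", "CA"),
    ("link to ca1 down", "203.0.113.0/24", "CB"),
    ("all links up", "198.51.100.0/24", "CB"),
]

violations = destination_violations(deliveries)
print(violations)
```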
15
Implementation
Support multiple configuration languages: IOS, NX-OS, Juniper, Arista, …
Broad feature support: route redistribution, OSPF internal/external, BGP communities, …
Unified, vendor-neutral intermediate representation
Languages: IOS, NX-OS, Juniper, Arista.
Features: OSPF external type-1 and type-2, inter-/intra-area, route reflection, redistribution, aggregation; community and AS-path matching; route-maps, policy-statements, interface ACLs, route tagging.
Feature coverage is sufficient to model several large networks. In the face of broad feature support, we make analysis tractable by filtering down to a unified, vendor-neutral format. As we've added more features and built out a sufficient set of primitives, adding a new directive often requires only a conversion to the unified format.
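The idea of a vendor-neutral format can be illustrated with a single directive: a static route that discards traffic. The sketch below converts one Cisco IOS form and one Juniper form into the same record. The `StaticRoute` type and the two converter functions are hypothetical, handle only these exact shapes, and say nothing about Batfish's real intermediate representation.

```python
import ipaddress
from typing import NamedTuple, Optional


class StaticRoute(NamedTuple):
    """Hypothetical vendor-neutral record for a static route."""
    prefix: str
    action: str


def from_ios(line: str) -> Optional[StaticRoute]:
    """Handle 'ip route <network> <netmask> Null0' only."""
    parts = line.split()
    if len(parts) == 5 and parts[:2] == ["ip", "route"] and parts[4].lower() == "null0":
        network = ipaddress.ip_network(f"{parts[2]}/{parts[3]}")
        return StaticRoute(str(network), "drop")
    return None


def from_junos(line: str) -> Optional[StaticRoute]:
    """Handle 'set routing-options static route <prefix> discard' only."""
    parts = line.split()
    head = ["set", "routing-options", "static", "route"]
    if len(parts) == 6 and parts[:4] == head and parts[5] == "discard":
        return StaticRoute(str(ipaddress.ip_network(parts[4])), "drop")
    return None


ios_route = from_ios("ip route 10.0.0.0 255.255.255.0 Null0")
junos_route = from_junos("set routing-options static route 10.0.0.0/24 discard")
print(ios_route, junos_route)
```

Once both vendors' directives map to the same record, later analysis stages only need to understand the unified form.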
16
Demo
Simplified version of Net1
Cisco configuration files
Multiple seeded bugs
17
Evaluation
Two large university networks
Net1: 21 core routers; federated network; each department is its own AS; heavy use of BGP
Net2: 17 core routers; centrally controlled; heavy use of VLANs; single AS; BGP communication only with ISPs
[Figure: core network diagrams connecting ISP1 … ISPm and Dept1 … Deptn. Labels: IGP, VLAN; Net1 Core; BGP.]
18
Results
“P.S. WRT the prefix that was dual assigned from yesterday, one of my NOC [network operations center] guys stopped by today to ask what voodoo I was using to find such things :)” [emphasis added]
(from the head of the Net1 NOC)
We had some good feedback from the network operators of Net1.
19
Results
Network  Invariant    Total violations  Confirmed by operators  Fixed by operators
Net1     Multipath    32                32(4)                   21(3)
Net1     Diff.Reach.  16                3(2)                    0(0)
Net1     Destination  55                55(6)                   1(1)
Net2     Multipath    11                11(3)
Net2     Diff.Reach.  77                18(7)
There is no destination row for Net2 because it has no customer ASes. The gap between total and confirmed violations for failure (differential reachability) consistency is due primarily to a conscious decision not to deploy additional resources. The gap between confirmed and fixed violations is due to various factors: reluctance to disturb a working system, the need to coordinate with other operators, or the need to install new equipment, which cannot be solved with configuration changes. Failure and destination consistency cannot be checked with dp analysis.
20
Selected Violations
(Multipath) Black-hole route cost too low (equal)
(Diff.Reach.) Only one interface underlying VLAN
(Destination) Prefix assigned to multiple departments
21
Conclusion
Take the survey so we can support your network features and requirements in forthcoming versions:
Send feedback/questions to: arifogel@ucla.edu