Verifying & Testing SDN Data & Control Planes: Header Space Analysis
SDN Stack
State layers hold a representation of the network's configuration. Code layers implement the logic that maintains the mapping between two adjacent state layers.
[Figure: state layers, top to bottom: Policy, Logical View, Physical View, Device State, Hardware; code layers between them: App, Network Hypervisor, Network OS, Firmware]
Troubleshooting Workflow
Tools for Finding the Code Layer
A: Actual Behavior ~ Policy? Automatic Test Packet Generation, Network Debugger
B: Device State ~ Policy? Anteater, Header Space Analysis, VeriFlow
C: Physical View ~ Device State? OFRewind
D: Device State ~ Hardware? SOFT
E: Logical View ~ Physical View? Correspondence Checking
Tools for Localizing within a Code Layer
Within the controller (control plane): OFRewind, Retrospective Causal Inference, Synoptic, NICE, VeriCon, …
Within the switch firmware (data plane): Network Debugger, Automatic Test Packet Generation, SOFT, …
Systematic troubleshooting spans design (policy specification & verification), configuration analysis, runtime dynamic analysis, troubleshooting, …
Verification vs. testing vs. debugging vs. troubleshooting, …
Goal: summarize existing tools and reveal missing ones
Some Existing Approaches to SDN Verification & Testing
Specification: CTL, NoD, Klee Assert
Policy Language, Semantics: NetKAT, NetCore
Verification (e.g. reachability):
–Data Plane: Anteater, VeriFlow, HSA, Atomic Predicates, First-order + Transitive Closure, local checks
–Control Plane: VeriCon, BatFish
Synthesis (e.g. forwarding): One big switch, VLAN, NetCore, NetKAT, FlowLog
Testing: NICE, ATPG
SDN: Verification & Testing
Data Plane Verification:
–Given data plane FIBs f, check that every packet p satisfies the designed network spec/property
–If not, generate one counterexample or all counterexamples
–Or provide optimizations or help add functionality
Data Plane Testing:
–Detour via symbolic testing
Control Plane Verification:
–Given configuration c, verify that for every packet & network environment the control plane generates correct data plane states meeting the desired network specs/properties
Control Plane Testing:
Synthesis:
–Design a control plane CP that, ∀ packet p and network state s, satisfies Φ (policy specs, e.g., reachability, no loop, waypoint traversal) by construction
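As a toy illustration of data plane verification, the sketch below checks a property against every header in a deliberately tiny 3-bit header space and returns all counterexamples (the first one is "one counterexample"). The FIB encoding (ternary patterns with `x` as a wildcard bit, first match wins), the topology, and the names `verify`, `trace`, and `FIBS` are hypothetical. It enumerates headers one at a time; HSA-style tools instead manipulate wildcard header sets symbolically, which is what lets them scale beyond toy header widths.

```python
from itertools import product

HEADER_BITS = 3  # toy header width (assumption)


def matches(pattern, header):
    """Ternary match: 'x' in the pattern is a wildcard bit."""
    return all(p in ("x", h) for p, h in zip(pattern, header))


def lookup(fib, header):
    """First matching rule wins; None means no rule matched."""
    for pattern, next_hop in fib:
        if matches(pattern, header):
            return next_hop
    return None


def trace(fibs, node, header):
    """Follow FIBs from `node`; nodes without a FIB are treated as end hosts."""
    path = [node]
    while node in fibs:
        hop = lookup(fibs[node], header)
        if hop is None:
            return path, "drop"  # black hole
        if hop in path:
            # Headers are never rewritten here, so revisiting a node means a loop.
            return path + [hop], "loop"
        path.append(hop)
        node = hop
    return path, "delivered"


def verify(fibs, ingress, prop):
    """Check `prop` for every header; return ALL counterexamples."""
    counterexamples = []
    for bits in product("01", repeat=HEADER_BITS):
        header = "".join(bits)
        path, outcome = trace(fibs, ingress, header)
        if not prop(header, path, outcome):
            counterexamples.append((header, path, outcome))
    return counterexamples


# Hypothetical FIBs for three switches and two hosts (H1, H2).
FIBS = {
    "S1": [("1xx", "S2"), ("01x", "S3")],  # headers 00x match nothing
    "S2": [("11x", "H1"), ("10x", "S3")],
    "S3": [("10x", "S2"), ("xxx", "H2")],  # headers 10x bounce between S2 and S3
}

# Property: every packet entering at S1 is delivered (no black holes, no loops).
for cex in verify(FIBS, "S1", lambda header, path, outcome: outcome == "delivered"):
    print(cex)
```

A reachability spec, such as "headers starting with 11 must reach H1", fits the same `prop` hook by testing `path[-1]`.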
Verification: Values and Obstacles
Hardware (Chips) | Software (Devices: PC, phone) | Networks (Service)
Bugs are: Burned into silicon | Exploitable, workarounds | Latent, exposed
Dealing with bugs: Costly recalls | Online updates | Live site incidents
Obstacles to eradication: Design complexity | Code churn, legacy, false positives | Topology, configuration churn
Value proposition: Cut time to market | Safety/OS critical systems, quality of code base | Meet SLA, utilize bandwidth, enable richer policies
Goal: Guarantee network invariants to ensure correctness, safety, etc.
The network should always satisfy certain invariants, yet it is difficult to write an SDN application that always guarantees them.
VeriCon: Towards Verifying Controller Programs in SDNs
Limitations of Existing Approaches
1. They establish the existence, but not the absence, of bugs
–NICE (finite-state model checking): unexplored topologies may cause bugs to be missed
–HSA (checks network snapshots): snapshots may not capture the situations in which bugs exist
2. Runtime overhead
–VeriFlow & NetPlumber (check in real time): bugs are identified only when the app is actually running
VeriCon
Verifies network-wide invariants for any event sequence and all admissible topologies
[Figure: inputs are an SDN application written in Core SDN plus topology constraints & invariants in first-order logic; VeriCon checks the verification conditions with the Z3 theorem prover and outputs either a guarantee that the invariants are satisfied OR a concrete counterexample]
Example: Stateful Firewall
Always forward from trusted to untrusted hosts
Only forward from untrusted to trusted hosts if a trusted host previously sent a packet to the untrusted host
[Figure: trusted hosts and untrusted hosts attached to the firewall switch]
Core SDN (CSDN) Language
Define and initialize relations
–Topology: link(S, O, H), link(S1, I1, I2, S2)
–Forwarding: S.ft(Src → Dst, I → O), S.sent(Src → Dst, I → O)
Write event handlers: pktIn(S, Pkt, I)
–Update relation
–Install rule (insert into ft)
–Forward packet (insert into sent)
–If-then-else
CSDN: Built-In Relations Describing Network States
First-Order Formulas & Invariants
Three types of invariants:
–topo: defines the admissible topologies
–safety: holds initially & is preserved after event executions
–trans: holds after an event execution
Example: no black hole
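The slide names "no black hole" as an example invariant without spelling out the formula. One hypothetical reading, sketched here over CSDN-style relations encoded as Python sets of tuples: every flow-table entry must output on a port that `link` connects to a host or to another switch. The tuple encoding, the reading of link(S1, I1, I2, S2) as "port I1 of S1 is wired to port I2 of S2", and the function name `black_holes` are assumptions. The check evaluates one concrete state, whereas VeriCon proves the formula for all states and admissible topologies.

```python
# Hypothetical encoding of CSDN relations as sets of tuples:
#   link_host: (switch, port, host)          ~ link(S, O, H)
#   link_sw:   (s1, port1, port2, s2)        ~ link(S1, I1, I2, S2)
#   ft:        (switch, src, dst, in, out)   ~ S.ft(Src -> Dst, I -> O)


def black_holes(link_host, link_sw, ft):
    """Return every flow entry whose output port is wired to neither a host nor a switch.

    An empty list means this (hypothetical) 'no black hole' invariant holds.
    """
    wired = {(s, o) for (s, o, _h) in link_host}
    wired |= {(s1, i1) for (s1, i1, _i2, _s2) in link_sw}
    return sorted(entry for entry in ft if (entry[0], entry[4]) not in wired)


LINK_HOST = {("S1", 1, "H1"), ("S2", 1, "H2")}
LINK_SW = {("S1", 2, 2, "S2"), ("S2", 2, 2, "S1")}
FT = {
    ("S1", "H1", "H2", 1, 2),  # toward S2: wired
    ("S2", "H1", "H2", 2, 1),  # toward H2: wired
    ("S2", "H2", "H1", 1, 3),  # port 3 of S2 is not wired: black hole
}

print(black_holes(LINK_HOST, LINK_SW, FT))
```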
Example: Learning Switch Controller Code
Learning Switch Controller Code: Some Invariants
Stateful Firewall in CSDN
rel tr(SW, HO) = {}
pktIn(s, pkt, prt(1)) →
  s.forward(pkt, prt(1), prt(2))
  tr.insert(s, pkt.dst)
  s.install(pkt.src → pkt.dst, prt(1), prt(2))
pktIn(s, pkt, prt(2)) →
  if tr(s, pkt.src) then
    s.forward(pkt, prt(2), prt(1))
    s.install(pkt.src → pkt.dst, prt(2), prt(1))
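To make the handler semantics concrete, here is a hypothetical Python transcription of the firewall for a single switch, with `tr`, `ft`, and `sent` as plain sets. Two assumptions go beyond the CSDN code: port 1 faces the trusted hosts and port 2 the untrusted ones, and a packet that exactly matches an installed rule is forwarded by the switch without raising `pktIn`. The class and method names are illustrative only.

```python
from dataclasses import dataclass, field


@dataclass
class FirewallSwitch:
    """Hypothetical transcription of the CSDN stateful firewall for one switch s.

    Assumption: port 1 faces trusted hosts, port 2 faces untrusted hosts.
    """
    tr: set = field(default_factory=set)    # rel tr(SW, HO), restricted to this switch
    ft: set = field(default_factory=set)    # installed rules: (src, dst, in_port, out_port)
    sent: set = field(default_factory=set)  # forwarded packets: (src, dst, in_port, out_port)

    def packet_arrives(self, src, dst, in_port):
        out_port = 2 if in_port == 1 else 1
        rule = (src, dst, in_port, out_port)
        if rule in self.ft:      # assumption: a flow-table hit is forwarded without pktIn
            self.sent.add(rule)
        elif in_port == 1:       # pktIn(s, pkt, prt(1))
            self.sent.add(rule)  #   s.forward(pkt, prt(1), prt(2))
            self.tr.add(dst)     #   tr.insert(s, pkt.dst)
            self.ft.add(rule)    #   s.install(pkt.src -> pkt.dst, prt(1), prt(2))
        elif src in self.tr:     # pktIn(s, pkt, prt(2)): if tr(s, pkt.src) then
            self.sent.add(rule)  #   s.forward(pkt, prt(2), prt(1))
            self.ft.add(rule)    #   s.install(pkt.src -> pkt.dst, prt(2), prt(1))
        # otherwise the packet from the untrusted side is dropped


fw = FirewallSwitch()
fw.packet_arrives("u", "t", 2)  # untrusted u speaks first: dropped
fw.packet_arrives("t", "u", 1)  # trusted t contacts u: forwarded, u recorded in tr
fw.packet_arrives("u", "t", 2)  # u may now reach t
print(fw.tr, fw.sent)
```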
Invariants
Topology: define admissible topologies (assumed to hold initially)
Safety: define the required consistency of network-wide states (checked initially & after each event)
Transition: define the effect of executing event handlers
Stateful Firewall Invariants
Topology: at least one switch with two ports, prt(1) & prt(2)
Safety (I1): for every packet P forwarded from an untrusted host U to a trusted host T, there exists a packet sent from some trusted host T' to U
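Counterexample
I1 is not inductive: starting from an arbitrary state that satisfies I1, some event execution leads to a state that violates it
[Figure: counterexample state with switch SW:0 (ports prt(0) to prt(3)) and host HO:0; the flow table of s holds a wildcard entry (Src = *, Dst = *) that forwards pkt.src → pkt.dst]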
Additional Firewall Invariants
I2: flow table entries only contain forwarding rules from trusted hosts
I3: the controller relation tr records the correct hosts
I1 ∧ I2 ∧ I3 is inductive
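The induction argument can be replayed by brute force on a tiny, hypothetical topology: two trusted hosts behind prt(1) and one untrusted host behind prt(2). The sketch enumerates every state, keeps those satisfying a candidate invariant, fires every possible packet arrival, and reports a counterexample to induction if the invariant breaks. The predicates `i2` and `i3` are our reading of the two bullets above, and packets that hit an installed rule are assumed to be forwarded without invoking the controller. Unlike VeriCon's Z3-based check, a `None` result here covers only this one small topology: it is evidence, not a proof for all admissible topologies.

```python
from itertools import chain, combinations, product

# Toy universe (assumption): trusted hosts behind prt(1), untrusted behind prt(2).
TRUSTED = ("t1", "t2")
UNTRUSTED = ("u1",)
# Every (src, dst, in_port, out_port) record possible in this topology.
RULES = (
    [(t, u, 1, 2) for t in TRUSTED for u in UNTRUSTED]
    + [(u, t, 2, 1) for u in UNTRUSTED for t in TRUSTED]
)


def subsets(items):
    """All subsets of `items`, as frozensets."""
    items = list(items)
    return [frozenset(c) for c in chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1))]


def step(state, src, dst, in_port):
    """One packet arrival: a flow-table hit is forwarded directly, a miss runs pktIn."""
    tr, ft, sent = state
    rule = (src, dst, in_port, 2 if in_port == 1 else 1)
    if rule in ft:
        return tr, ft, sent | {rule}
    if in_port == 1:
        return tr | {dst}, ft | {rule}, sent | {rule}
    if src in tr:
        return tr, ft | {rule}, sent | {rule}
    return state  # untrusted source not in tr: dropped


def i1(state):  # every u -> t packet is matched by some earlier t' -> u packet
    _tr, _ft, sent = state
    return all(any((t, src, 1, 2) in sent for t in TRUSTED)
               for (src, _dst, in_port, _out) in sent if in_port == 2)


def i2(state):  # inbound flow entries exist only for hosts recorded in tr
    tr, ft, _sent = state
    return all(src in tr for (src, _dst, in_port, _out) in ft if in_port == 2)


def i3(state):  # tr only records hosts that some trusted host really sent to
    tr, _ft, sent = state
    return all(any((t, u, 1, 2) in sent for t in TRUSTED) for u in tr)


def find_cti(inv):
    """Return (state, event) where inv holds before the event but not after, else None."""
    for tr, ft, sent in product(subsets(UNTRUSTED), subsets(RULES), subsets(RULES)):
        state = (tr, ft, sent)
        if inv(state):
            for src, dst, in_port, _out in RULES:
                if not inv(step(state, src, dst, in_port)):
                    return state, (src, dst, in_port)
    return None


print("I1 alone:", find_cti(i1))
print("I1 & I2 & I3:", find_cti(lambda s: i1(s) and i2(s) and i3(s)))
```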
Non-buggy Verification Examples
Program | LOC | Topo Inv. | Safety + Trans Inv. | Time (sec)
Firewall
Stateless Firewall
Firewall + Host Migration
Learning Switch
Learning Switch + Auth
Resonance (simplified)
Stratos (simplified)
Buggy Verification Examples
Benchmark | Counterex.
Host + Sw Auth: Rules for unauth host not removed | 3 + 2
Firewall: Forgot part of consistency inv | 5 + 3
Firewall: No check if host is trusted | 6 + 4
Firewall: No inv defining trusted host | 6 + 4
Learning: Packets not forwarded | 1 + 1
Resonance: No inv for host to have one state |
StatelessFW: Rule allowing all port 2 traffic | 4 + 2
CSDN: Abstract Syntax
From Formulas to Theorems & Models
Includes a separate utility for inferring inductive invariants using iterated weakest preconditions (Dijkstra's weakest-precondition calculus)
Automatic theorem proving (verification) by Z3
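Iterated weakest preconditions can be illustrated on an explicit finite system: start from the safe states and repeatedly intersect with the weakest precondition of all event handlers until nothing changes. The result is the largest set inside the safety property that every handler maps back into itself; if it contains the initial state, safety is proved. This is a toy analogue under stated assumptions: a hypothetical 8-state system with a single handler `x := x + 2 (mod 8)` and the function name `infer_inductive_invariant` are made up, while VeriCon's utility performs the same strengthening symbolically on first-order formulas.

```python
def infer_inductive_invariant(states, events, safe):
    """Iterated weakest preconditions: inv_{k+1} = inv_k ∩ wp(events, inv_k) until a fixpoint."""
    inv = frozenset(s for s in states if safe(s))
    while True:
        # wp(events, inv) restricted to inv: keep states whose every successor stays in inv.
        stronger = frozenset(s for s in inv if all(e(s) in inv for e in events))
        if stronger == inv:
            return inv
        inv = stronger


def safe(x):
    """Hypothetical safety property: the system must never reach x == 5."""
    return x != 5


STATES = range(8)                 # toy state space: x in 0..7
EVENTS = [lambda x: (x + 2) % 8]  # single event handler: x := x + 2 (mod 8)

inv = infer_inductive_invariant(STATES, EVENTS, safe)
print(sorted(inv))  # [0, 2, 4, 6]; it contains the initial state 0, so safety holds
```

Here the safety property alone is not inductive (3 steps to 5), so the loop strengthens it three times before reaching the fixpoint.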
Further Work Needed
Assumes events are executed atomically
–Enforceable using barriers, with a performance hit
–Consider out-of-order rule installs
Rule timeouts
–App handles timeout events to update its ft relation and check invariants
–Need to reason about event ordering
Summary of VeriCon
Verifies network-wide invariants for any event sequence and all admissible topologies
Guarantees the invariants are satisfied, or provides a concrete counterexample
An application with 93 LOC and 13 invariants is verified in 0.21 s
NDB: Debugging SDNs
Bugs can be anywhere in the SDN stack
–Hardware, control plane logic, race conditions
Switch state might change rapidly
Bugs might show up rarely
How can we exploit the SDN architecture to systematically track down the root cause of bugs?
Bug Story: Incomplete Handover
[Figure: host A sends to host B through Switch X; B is handed over from WiFi AP Y to WiFi AP Z]
ndb: Network Debugger
Goal
–Capture and reconstruct the sequence of events leading to the errant behavior
Allow users to define a Network Breakpoint
–A (header, switch) filter to identify the errant behavior
Produce a Packet Backtrace
–Path taken by the packet
–State of the flow table at each switch
Debugging software programs
Function A(): i = …; j = …; u = B(i, j)
Function B(x, y): k = …; v = C(x, k)
Function C(x, y): … w = abort()
Breakpoint: “line 25, w = abort()”
Backtrace:
File “A”, line 10, Function A()
File “B”, line 43, Function B()
File “C”, line 21, Function C()
Debugging Networks
[Figure: hosts A and B, Switch X, WiFi APs Y and Z]
Breakpoint: “ICMP packets A->B, arriving at X, but not Z”
Backtrace:
Switch X: { inport: p0, outports: [p1], mods: [...], matched flow: 23 [...], matched table version: 3 }
Switch Y: { inport: p1, outports: [p3], mods: }
Using ndb to Debug Common Issues
Reachability
–Symptom: A is not able to talk to B
–Breakpoint: “Packet A->B, not reaching B”
Isolation
–Symptom: A is talking to B, but it shouldn't
–Breakpoint: “Packet A->B, reaching B”
Race conditions
–Symptom: flow entries not installed in time
–Breakpoint: “Packet-in at switch S, port P”
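How Does ndb Work?
[Figure: the control plane installs Match/Action entries into switch flow tables; a Flow Table State Recorder sits on the control channel, and a Postcard Collector gathers postcards from the switches]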
[Figure (continued): postcards flow from the switches to the Postcard Collector, alongside the Flow Table State Recorder]
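A minimal sketch of how a collector could turn postcards into packet backtraces and apply a network breakpoint such as “ICMP packets A->B, arriving at X, but not Z”. The `Postcard` fields mirror the backtrace fields in the debugging example (inport, outports, matched flow, table version), but the record layout, the `packet_id` and `hop` fields, and the helper names are hypothetical. In particular, real ndb has to work out which postcards belong to the same packet and in which order, which the `hop` field simply assumes away.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Postcard:
    """Hypothetical postcard: one switch's report about one packet it forwarded."""
    packet_id: str  # assumption: the collector knows which postcards belong to one packet
    hop: int        # assumption: position along the path (real ndb must infer ordering)
    switch: str
    src: str
    dst: str
    proto: str
    inport: str
    outports: tuple
    matched_flow: int
    table_version: int


def backtraces(postcards):
    """Group postcards per packet and order each group hop by hop."""
    groups = defaultdict(list)
    for pc in postcards:
        groups[pc.packet_id].append(pc)
    return [sorted(pcs, key=lambda pc: pc.hop) for pcs in groups.values()]


def breakpoint_hits(postcards, header_filter, seen_at, not_seen_at):
    """Backtraces of matching packets that visited `seen_at` but never `not_seen_at`."""
    hits = []
    for trace in backtraces(postcards):
        switches = {pc.switch for pc in trace}
        if header_filter(trace[0]) and seen_at in switches and not_seen_at not in switches:
            hits.append(trace)
    return hits


# Illustrative data: only X's flow 23 / table version 3 and the X/Y ports come from the slide.
CARDS = [
    Postcard("pkt1", 0, "X", "A", "B", "icmp", "p0", ("p1",), 23, 3),
    Postcard("pkt1", 1, "Y", "A", "B", "icmp", "p1", ("p3",), 5, 2),
    Postcard("pkt2", 0, "X", "A", "B", "icmp", "p0", ("p2",), 24, 4),
    Postcard("pkt2", 1, "Z", "A", "B", "icmp", "p1", ("p3",), 7, 1),
]

# Breakpoint: "ICMP packets A->B, arriving at X, but not Z"
hits = breakpoint_hits(
    CARDS,
    lambda pc: pc.proto == "icmp" and (pc.src, pc.dst) == ("A", "B"),
    seen_at="X",
    not_seen_at="Z",
)
for trace in hits:
    for pc in trace:
        print(f"Switch {pc.switch}: inport {pc.inport}, outports {list(pc.outports)}, "
              f"matched flow {pc.matched_flow}, table version {pc.table_version}")
```

Only `pkt1` hits the breakpoint: it reached X but was forwarded toward Y and never reached Z, which is the incomplete-handover symptom.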
Who Benefits
Network developers
–Programmers debugging control programs
Network operators
–Find policy errors
–Send error report to switch vendor
–Send error report to control program vendor
Performance and Scalability
Control channel
–Negligible overhead: no postcards, only extra flow-mods
Postcards in the datapath
–Single collector server for the entire Stanford backbone
–Selective postcard generation to reduce overhead
–Parallelize postcard collection
Summary
ndb: Network Breakpoint + Packet Backtrace
Systematically tracks down the root cause of bugs
Practical and deployable today