Published by Christine Newman. Modified over 9 years ago.
1
NSDI (April 24, 2009) © 2009 Andreas Haeberlen, MPI-SWS
NetReview: Detecting when interdomain routing goes wrong
Andreas Haeberlen (MPI-SWS / Rice), Ioannis Avramopoulos (Deutsche Telekom Laboratories), Peter Druschel (MPI-SWS), Jennifer Rexford (Princeton)
2
Motivation
News headline: "YouTube outage underscores big Internet problem"
BGP (Border Gateway Protocol) routing
This is just the tip of the iceberg: a considerable fraction of Internet prefixes is affected by routing problems every day
3
BGP and its problems
ASes exchange routing information via BGP
BGP routing suffers from many problems: misconfigurations, hijacks, oscillation, equipment failures, policy conflicts, bugs, attacks by spammers, instabilities, ...
[Figure: a network of autonomous systems (ASes) labeled A through N, with the announcement "I know how to get to AS A"]
4
Related Work
Fault prevention
  Secure routing protocols: S-BGP, soBGP, SPV, ...
  Trusted monitors: N-BGP
Drawbacks of fault prevention systems:
  Effective against many problems, but not all
  Need significant buy-in and require an Internet-wide public key infrastructure (PKI)
5
Basic idea of this paper
If we cannot prevent every routing problem, why not at least ensure that each problem is detected and linked to the ISP that caused it?
6
Approach: Fault detection
Goals:
  1. Reliably detect each routing problem, and
  2. link it to the AS that caused it
Benefits:
  ASes can respond to problems quickly
  No need to diagnose faults manually
  Works for a very broad class of problems
  Provides an incentive for reliable routing
  Easy to deploy incrementally
7
Challenges in BGP fault detection
Idea: Upload all router logs to a central entity, which inspects them for problems
  Sufficient to find almost any routing problem
Why wouldn't this work in practice?
  Privacy: Logs contain sensitive information
  Reliability: Logs may be inaccurate (bugs, hackers)
  Automation: Can't manually inspect that much data
  Deployability: Can't assume global deployment
  Decentralization: ASes wouldn't accept a single detector entity
8
NetReview from 10,000 feet
Border routers maintain logs of all BGP messages
Logs are tamper-evident: can reliably detect and obtain proof if faulty routers omit, forge, or modify log entries
Neighbors periodically audit each other's logs and check them for routing problems
  If a problem is found, the auditor can prove its existence to a third party
[Figure: ASes A through F, with logs of BGP messages]
9
Outline
Introduction
  Motivation: Internet routing problems
  Approach: Fault detection
What is a BGP fault?
The NetReview system
Practical challenges
Evaluation
Summary
10
BGP routing policies
How do ASes decide what to announce via BGP?
Each AS has a routing policy, which is based on:
  Peering agreements: Customer/provider, ...
  Best practices: Limited path length, ...
  Internal goals: Choose the shortest/cheapest path, ...
  Address assignments: IP address prefixes, ...
[Figure: ASes A through G, with one AS labeled "C's provider"]
11
What is a BGP fault?
Expected behavior of the AS := Combination of its peering agreements, best practices, internal goals, ...
BGP fault := The BGP messages sent by the AS do not conform to its expected behavior
How do we know what BGP messages the AS sent?
  Need a complete and accurate message trace, even if some routers are faulty in arbitrary, unknown ways
  Requires a robust and secure tracing mechanism
How do we know what its expected behavior is?
  Different for every AS, so we need a specification
12
BGP rules
For example, D might specify the following:
  "I will filter out routes with excessive paths" (best practice)
  "I will act as C's provider" (peering agreement)
  "I will prefer routes through B, if available" (internal goals)
Some rules may be confidential, but the AS need not reveal all of them to each auditor
[Figure: ASes A through G, with a "Rules" label]
13
Outline
Introduction
What is a BGP fault?
The NetReview system
Practical challenges
Evaluation
Summary
14
The tamper-evident log
Based on the tamper-evident log in PeerReview [SOSP'07]
If a router omits, modifies, or forges entries, neighbors can detect this and obtain evidence
Log entries form a hash chain
  Messages include a signed hash
  Tampering breaks the hash chain and is thus detectable
Messages are acknowledged
  Detects if a message is ignored
Neighbors gossip about the hash values they've seen
[Figure: B's log as a hash chain H0, H1, H2, H3, H4 over the entries Send(X), Recv(Y), Send(Z), Recv(M); labels Message, ACK, and Hash(log) between A and B]
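The hash-chain idea on this slide can be illustrated with a minimal sketch. It is not NetReview's or PeerReview's actual log format: the entry layout, the all-zero starting value `H0`, the use of SHA-256, and the helper names `entry_hash`, `chain_hashes`, and `consistent_with` are all assumptions made for illustration. Signatures, acknowledgments, and gossip are left out; the sketch only shows why a neighbor that remembers one hash value can later detect a modified or omitted entry.

```python
import hashlib

H0 = b"\x00" * 32  # assumed fixed starting value of the chain


def entry_hash(prev_hash: bytes, seq: int, kind: str, payload: bytes) -> bytes:
    """Hash of one log entry, bound to the hash of the entry before it."""
    h = hashlib.sha256()
    h.update(prev_hash)
    h.update(seq.to_bytes(8, "big"))
    h.update(kind.encode("ascii") + b"\x00")
    h.update(payload)
    return h.digest()


def chain_hashes(entries):
    """Return [H0, H1, ..., Hn] for a list of (kind, payload) entries."""
    hashes = [H0]
    for seq, (kind, payload) in enumerate(entries, start=1):
        hashes.append(entry_hash(hashes[-1], seq, kind, payload))
    return hashes


def consistent_with(entries, seq, seen_hash):
    """Check a presented log against a hash that a neighbor saw earlier.

    `seen_hash` is the chain value after entry `seq`. In the real system it
    would arrive signed inside a message; signatures are omitted here.
    """
    hashes = chain_hashes(entries)
    return seq < len(hashes) and hashes[seq] == seen_hash


# B's log from the slide: Send(X), Recv(Y), Send(Z), Recv(M)
log = [("send", b"X"), ("recv", b"Y"), ("send", b"Z"), ("recv", b"M")]
seen_by_neighbor = chain_hashes(log)[3]  # hash attached to Send(Z)

# A faulty router later rewrites the second entry.
tampered = list(log)
tampered[1] = ("recv", b"Y-forged")

print(consistent_with(log, 3, seen_by_neighbor))       # honest log
print(consistent_with(tampered, 3, seen_by_neighbor))  # rewritten log
```

Because every hash covers its predecessor, changing or dropping any earlier entry changes all later hash values, so the rewritten log no longer matches the value the neighbor holds.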
15
Writing rules
Rules are predicates on the AS's routing state
  Declarative; easy to get correct
Even simple rules can be very powerful
  Describes everything that S-BGP can check, and more!
[Figure: example rules for AS D involving ownPrefixes and the neighbors AS 1, AS 2, and AS 3]
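To make "rules are predicates on the routing state" concrete, here is a hedged sketch in Python. The slides do not show NetReview's actual rule language, so everything below is an assumed encoding: the `Route` type, the shape of the `exported` mapping (neighbor AS number to the routes currently announced to it), and the two rule functions are hypothetical. The first rule mirrors the `ownPrefixes` idea from the figure: an AS may originate only prefixes it owns.

```python
from dataclasses import dataclass
from ipaddress import ip_network


@dataclass(frozen=True)
class Route:
    prefix: str      # e.g. "10.1.0.0/16"
    as_path: tuple   # AS numbers, nearest AS first


def originates_only_own_prefixes(exported, own_as, own_prefixes):
    """Rule: every route this AS originates lies inside one of its prefixes."""
    owned = [ip_network(p) for p in own_prefixes]
    for routes in exported.values():
        for route in routes:
            if route.as_path != (own_as,):
                continue  # not originated by this AS
            net = ip_network(route.prefix)
            if not any(net.version == o.version and net.subnet_of(o)
                       for o in owned):
                return False
    return True


def path_length_at_most(exported, limit):
    """Rule: no announced route has an AS path longer than `limit`."""
    return all(len(route.as_path) <= limit
               for routes in exported.values()
               for route in routes)


# Hypothetical routing state of AS 1, which owns 10.0.0.0/8.
good_state = {2: [Route("10.1.0.0/16", (1,))]}
bad_state = {2: [Route("192.0.2.0/24", (1,))]}  # originates a foreign prefix

print(originates_only_own_prefixes(good_state, 1, ["10.0.0.0/8"]))
print(originates_only_own_prefixes(bad_state, 1, ["10.0.0.0/8"]))
```

Each rule is a pure function of the state, so an auditor can evaluate it without knowing how the state came about.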
16
Auditing and rule evaluation
To audit a neighboring AS:
  1. Auditor requests the logs from each border router
  2. Auditor checks logs for inconsistencies and tampering
  3. Auditor locally replays the logs to obtain a series of routing states
  4. Auditor evaluates the rules over each routing state
  5. If a rule is violated during some time interval, auditor extracts verifiable evidence from the logs
[Figure: auditor E replays D's log; a timeline of routing states, with the interval in which a rule is violated highlighted]
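Steps 3 to 5 can be sketched as follows. This is an illustration under stated assumptions rather than the NetReview implementation: the log entry layout `(time, kind, neighbor, prefix, as_path)`, the state representation, and the function names `replay` and `violation_intervals` are hypothetical, and the tamper check of step 2 and the extraction of signed evidence are omitted.

```python
def replay(log):
    """Step 3: turn a log of sent BGP messages into a series of routing states.

    Each log entry is (time, kind, neighbor, prefix, as_path), where kind is
    "announce" or "withdraw". The state maps (neighbor, prefix) to the AS
    path currently announced to that neighbor.
    """
    state = {}
    for time, kind, neighbor, prefix, as_path in log:
        if kind == "announce":
            state[(neighbor, prefix)] = as_path
        elif kind == "withdraw":
            state.pop((neighbor, prefix), None)
        yield time, dict(state)  # snapshot after applying this entry


def violation_intervals(log, rule):
    """Steps 4 and 5: evaluate `rule` on each state, report violated intervals.

    Returns (start, end) pairs; end is None if the rule is still violated
    when the log ends.
    """
    intervals = []
    start = None
    for time, state in replay(log):
        ok = rule(state)
        if not ok and start is None:
            start = time
        elif ok and start is not None:
            intervals.append((start, time))
            start = None
    if start is not None:
        intervals.append((start, None))
    return intervals


def short_paths_only(state):
    """Example rule (assumed): no announced AS path has more than 3 hops."""
    return all(len(path) <= 3 for path in state.values())


# Hypothetical log of messages that D sent to its neighbor E.
d_log = [
    (0, "announce", "E", "10.0.0.0/8", (4, 1)),
    (10, "announce", "E", "10.2.0.0/16", (4, 3, 2, 1)),  # path too long
    (25, "withdraw", "E", "10.2.0.0/16", None),
]

print(violation_intervals(d_log, short_paths_only))
```

The log entries that fall inside a reported interval are what the auditor would present as evidence, since they are covered by the tamper-evident hash chain.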
17
Outline
Introduction
What is a BGP fault?
The NetReview system
Practical challenges
  Incentives for incremental deployment
  Partial deployment
  Working without a certificate authority
  Using existing routers
Evaluation
Summary
18
Incremental deployment
What is the smallest useful deployment?
  One AS can find bugs, misconfigurations, ...
  Two adjacent ASes can check peering agreements, ...
What are the incentives for deployment?
  Reliable ASes can attract more customers
  Logs can be used for root-cause analysis
19
Outline
Introduction
What is a BGP fault?
The NetReview system
Practical challenges
Evaluation
Summary
20
Experimental setup
Synthetic network of 35 Zebra BGP daemons
  Default routing policies (Gao-Rexford)
Injected real BGP trace (Equinix) to get scale
Results in this talk are from AS 5 (92% of Internet ASes have degree five or less)
[Figure: topology with Tier 1, Tier 2, and stub ASes (AS 1 through AS 10) connected to the Internet]
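For readers unfamiliar with the Gao-Rexford policies mentioned above, the sketch below encodes the commonly cited guidelines: routes learned from customers are preferred over routes from peers, which are preferred over routes from providers, and routes learned from peers or providers are exported only to customers. It is an illustration only. The slides do not show the Zebra configuration used in the experiments, and the names `Rel`, `may_export`, and `best_route` are hypothetical.

```python
from enum import Enum


class Rel(Enum):
    """Business relationship of a neighbor, from this AS's point of view."""
    CUSTOMER = 0
    PEER = 1
    PROVIDER = 2


def may_export(learned_from: Rel, export_to: Rel) -> bool:
    """Export guideline: customer routes go to everyone; routes learned
    from peers or providers go only to customers."""
    return learned_from is Rel.CUSTOMER or export_to is Rel.CUSTOMER


def best_route(candidates):
    """Preference guideline: customer over peer over provider, with the
    shorter AS path as an assumed tie-breaker.

    `candidates` is a list of (relationship, as_path) pairs.
    """
    return min(candidates, key=lambda c: (c[0].value, len(c[1])))


routes = [
    (Rel.PROVIDER, (7,)),
    (Rel.CUSTOMER, (3, 9, 12)),
    (Rel.PEER, (5, 12)),
]
print(best_route(routes))                    # the customer route wins
print(may_export(Rel.PEER, Rel.PROVIDER))    # peer route to a provider
print(may_export(Rel.CUSTOMER, Rel.PEER))    # customer route to a peer
```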
21
Evaluation: Functionality check
Fault injection experiment with five rules based on common routing problems:
  No origin misconfiguration
  Export customer routes
  Honor NO_ADVERTISE community
  Consistent path length
  Backup link
NetReview detected all the injected faults
  Also produced diagnostic information, such as the time when the fault occurred and the prefixes that were affected
22
Evaluation: Overhead
Processing power: 15-minute log segment can be checked in 41.5s on a P4
  A single commodity PC is sufficient for small networks
Storage space: 710kB/minute, ≈356 GB/year
  Fits comfortably on a single hard disk
Bandwidth: 420kbps, including BGP updates
  Insignificant compared to typical traffic volume
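The storage and processing figures can be cross-checked with a few lines of arithmetic. The unit interpretation is an assumption: the slide does not say whether kB and GB are binary or decimal, but reading them as 1024 bytes and 2^30 bytes reproduces the ≈356 GB/year figure, whereas decimal units would give about 373 GB/year. A 365-day year is also assumed.

```python
KIB = 1024              # assumed: "kB" on the slide means 1024 bytes
GIB = 1024 ** 3         # assumed: "GB" on the slide means 2**30 bytes
MINUTES_PER_YEAR = 365 * 24 * 60

# Storage: 710 kB of log data per minute, accumulated over one year.
bytes_per_year = 710 * KIB * MINUTES_PER_YEAR
storage_binary = bytes_per_year / GIB
storage_decimal = 710 * 1000 * MINUTES_PER_YEAR / 1e9

# Processing: 41.5 s of checking for every 15 minutes of log.
cpu_fraction = 41.5 / (15 * 60)

print(f"storage, binary units:  {storage_binary:.1f} GB/year")
print(f"storage, decimal units: {storage_decimal:.1f} GB/year")
print(f"auditing load: {cpu_fraction:.1%} of one P4 per audited log stream")
```

Under these assumptions, checking one 15-minute segment keeps the auditing machine busy for under 5% of that interval.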
23
Summary
NetReview: A fault detection system for interdomain routing
  Automatically detects a wide variety of routing problems
  Links each problem to the responsible AS
  Not a heuristic: produces proof of each fault
NetReview is practical
  Easy to deploy incrementally
  No PKI required
  Reasonable overhead
Thank you!
24
Discussion
Some assumptions in NetReview:
  Each AS has at least one diligent neighbor. Does this hold?
  Each AS can eventually send control messages to any other AS. Does this work for real-time services such as video streams?
  No attacker can invert the hash function or break cryptographic keys. Is this always true?
If the numbers of rules and neighbors are much greater than the five rules and five neighbors evaluated, is the processing time still tolerable?
Can we afford such a large amount of storage space?