Download presentation
Presentation is loading. Please wait.
1
Shivkumar Kalyanaraman Rensselaer Polytechnic Institute 1 Exterior Gateway Protocols: EGP, BGP-4, CIDR Shivkumar Kalyanaraman Rensselaer Polytechnic Institute shivkuma@ecse.rpi.edu http://www.ecse.rpi.edu/Homepages/shivkuma Based in part upon slides of Tim Griffin (AT&T), Ion Stoica (UCB), J. Kurose (U Mass), Noel Chiappa (MIT)
2
Shivkumar Kalyanaraman Rensselaer Polytechnic Institute 2 q Cores, Peers, and the limit of default routes q Autonomous systems & EGP q BGP4 q CIDR: reducing router table sizes q Refs: Chap 10,14,15. Books: “Routing in Internet” by Huitema, “Interconnections” by Perlman, “BGP4” by Stewart, Sam Halabi, Danny McPherson, Internet Routing ArchitecturesInternet Routing Architectures q Reading: Geoff Huston, Commentary on Inter-domain Routing in the InternetCommentary on Inter-domain Routing in the Internet q Reference: BGP-4 Standards Document: In TXTIn TXT q Reading: Norton, Internet Service Providers and PeeringInternet Service Providers and Peering q Reading: Labovitz et al, Delayed Internet Routing ConvergenceDelayed Internet Routing Convergence q Reference: Paxson, End-to-End Routing Behavior in the Internet,End-to-End Routing Behavior in the Internet, q Reading: Interdomain Routing: Additional Notes: In PDF | In MS WordIn PDF In MS Word q Reference Site: Griffin, Interdomain Routing LinksInterdomain Routing Links Overview
3
Shivkumar Kalyanaraman Rensselaer Polytechnic Institute 3 History: Default Routes: limits q Default routes => partial information q Routers/hosts w/ default routes rely on other routers to complete the picture. q In general routing “signposts” should be: q Consistent, I.e., if packet is sent off in one direction then another direction should not be more optimal. q Complete, I.e., should be able to reach all destinations
4
Shivkumar Kalyanaraman Rensselaer Polytechnic Institute 4 Core q A small set of routers that have consistent & complete information about all destinations. q Outlying routers can have partial information provided they point default routes to the core q Partial info allows site administrators to make local routing changes independently. CORE S1S2Sm...
5
Shivkumar Kalyanaraman Rensselaer Polytechnic Institute 5 Peer Backbones q Initially NSFNET had only one connection to ARPANET (router in Pittsburg) => only one route between the two. q Addition of multiple interconnections => multiple possible routes => need for dynamic routing q Single core replaced by a network of peer backbones => more scalable q Today there are over 30 backbones! q Routing protocol at cores/peers: GGP -> EGP-> BGP-4
6
Shivkumar Kalyanaraman Rensselaer Polytechnic Institute 6 Exterior Gateway Protocol (EGP) q A mechanism that allows non-core routers to learn routes from core (external routes) routers so that they can choose optimal backbone routes q A mechanism for non-core routers to inform core routers about hidden networks (internal routes) q Autonomous System (AS) has the responsibility of advertising reachability info to other ASs. q One+ routers may be designated per AS. q Important that reachability info propagates to core routers
7
Shivkumar Kalyanaraman Rensselaer Polytechnic Institute 7 Purpose of EGP R border router internal router EGP R2R1R3 A AS1 AS2 you can reach net A via me traffic to A table at R1: dest next hop AR2 Share connectivity information across ASes
8
Shivkumar Kalyanaraman Rensselaer Polytechnic Institute 8 EGP Operation q Neighbor Acquisition: Reliable 2-way handshake q Neighbor Reachability: q Hellos: j out of m hellos OK => Neighbor UP q k out of n hellos NOT OK => Neighbor DOWN q Updates/Queries: q EGP is an incremental protocol. New info => send updates q Each router can query neighbors as well q Reachability advertized; metrics ignored q Requires a tree topology of ASes to avoid loops (eg: see next slide)
9
Shivkumar Kalyanaraman Rensselaer Polytechnic Institute 9 Why EGP Requires a Tree Structure..
10
Shivkumar Kalyanaraman Rensselaer Polytechnic Institute 10 EGP weaknesses q EGP does not interpret the distance metrics in routing update messages => cannot be compute shorter of two routes q As a result it restricts the topology to a tree structure, with the core as the root q Rapid growth => many networks may be temporarily unreachable q Only one path to destination => no load sharing q Need new protocol => BGP-4
11
Shivkumar Kalyanaraman Rensselaer Polytechnic Institute 11 Today’s Big Picture Large ISP Dial-Up ISP Access Network Small ISP Stub Large number of diverse networks
12
Shivkumar Kalyanaraman Rensselaer Polytechnic Institute 12 Internet AS Map: caida.org
13
Shivkumar Kalyanaraman Rensselaer Polytechnic Institute 13 Autonomous System(AS) q Internet is not a single network q Collection of networks controlled by different administrations q An autonomous system is a network under a single administrative control q An AS owns an IP prefix q Every AS has a unique AS number q ASes need to inter-network themselves to form a single virtual global network q Need a common protocol for communication
14
Shivkumar Kalyanaraman Rensselaer Polytechnic Institute 14 Intra-AS and Inter-AS routing inter-AS, intra-AS routing in gateway A.c network layer link layer physical layer a b b a a C A B d Gateways: perform inter-AS routing amongst themselves perform intra-AS routers with other routers in their AS A.c A.a C.b B.a c b c
15
Shivkumar Kalyanaraman Rensselaer Polytechnic Institute 15 Who speaks Inter-AS routing? R border routerinternal router BGP R2R1R3 AS1 AS2 Two types of routers Border router(Edge), Internal router(Core) Two border routers of different ASes will have a BGP session
16
Shivkumar Kalyanaraman Rensselaer Polytechnic Institute 16 Intra-AS vs Inter-AS q An AS is a routing domain q Within an AS: q Can run a link-state routing protocol q Trust other routers q Scale of network is relatively small q Between ASes: q Lack of information about other AS’s network (Link- state not possible) q Crossing trust boundaries q Link-state protocol will not scale q Routing protocol based on route propagation
17
Shivkumar Kalyanaraman Rensselaer Polytechnic Institute 17 Autonomous Systems (ASes) An autonomous system is an autonomous routing domain that has been assigned an Autonomous System Number (ASN). All parts within an AS remain connected. RFC 1930: Guidelines for creation, selection, and registration of an Autonomous System … the administration of an AS appears to other ASes to have a single coherent interior routing plan and presents a consistent picture of what networks are reachable through it.
18
Shivkumar Kalyanaraman Rensselaer Polytechnic Institute 18 IP Address Allocation and Assignment: Internet Registries IANA www.iana.org RFC 2050 - Internet Registry IP Allocation Guidelines RFC 1918 - Address Allocation for Private Internets RFC 1518 - An Architecture for IP Address Allocation with CIDR ARIN www.arin.org APNIC www.apnic.org RIPE www.ripe.org Allocate to National and local registries and ISPs Addresses assigned to customers by ISPs
19
Shivkumar Kalyanaraman Rensselaer Polytechnic Institute 19 AS Numbers (ASNs) ASNs are 16 bit values. 64512 through 65535 are “private” Genuity: 1 MIT: 3 Harvard: 11 UC San Diego: 7377 AT&T: 7018, 6341, 5074, … UUNET: 701, 702, 284, 12199, … Sprint: 1239, 1240, 6211, 6242, … … ASNs represent units of routing policy Currently over 11,000 in use.
20
Shivkumar Kalyanaraman Rensselaer Polytechnic Institute 20 Nontransit vs. Transit ASes ISP 1 ISP 2 Nontransit AS might be a corporate or campus network. Could be a “content provider” NET A Traffic NEVER flows from ISP 1 through NET A to ISP 2 Internet Service providers (ISPs) have transit networks
21
Shivkumar Kalyanaraman Rensselaer Polytechnic Institute 21 Selective Transit NET B NET C NET A provides transit between NET B and NET C and between NET D and NET C NET A NET D NET A DOES NOT provide transit Between NET D and NET B Most transit ASes allow only selective transit key impact of commercialization
22
Shivkumar Kalyanaraman Rensselaer Polytechnic Institute 22 Customers and Providers Customer pays provider for access to the Internet provider customer IP traffic provider customer
23
Shivkumar Kalyanaraman Rensselaer Polytechnic Institute 23 Customer-Provider Hierarchy IP traffic provider customer
24
Shivkumar Kalyanaraman Rensselaer Polytechnic Institute 24 The Peering Relationship peer customerprovider Peers provide transit between their respective customers Peers do not provide transit between peers Peers (often) do not exchange $$$ traffic allowed traffic NOT allowed
25
Shivkumar Kalyanaraman Rensselaer Polytechnic Institute 25 Peering Wars q Reduces upstream transit costs q Can increase end-to-end performance q May be the only way to connect your customers to some part of the Internet (“Tier 1”) q You would rather have customers q Peers are usually your competition q Peering relationships may require periodic renegotiation Peering struggles are by far the most contentious issues in the ISP world! Peering agreements are often confidential. PeerDon’t Peer
26
Shivkumar Kalyanaraman Rensselaer Polytechnic Institute 26 Requirements for Inter-AS Routing q Should scale for the size of the global Internet. q Focus on reachability, not optimality q Use address aggregation techniques to minimize core routing table sizes and associated control traffic q At the same time, it should allow flexibility in topological structure (eg: don’t restrict to trees etc) q Allow policy-based routing between autonomous systems q Policy refers to arbitrary preference among a menu of available routes (based upon routes’ attributes) q Fully distributed routing (as opposed to a signaled approach) is the only possibility. q Extensible to meet the demands for newer policies.
27
Shivkumar Kalyanaraman Rensselaer Polytechnic Institute 27 q Topology information is flooded within the routing domain q Best end-to-end paths are computed locally at each router. q Best end-to-end paths determine next-hops. q Based on minimizing some notion of distance q Works only if policy is shared and uniform q Examples: OSPF, IS-IS q Each router knows little about network topology q Only best next-hops are chosen by each router for each destination network. q Best end-to-end paths result from composition of all next- hop choices q Does not require any notion of distance q Does not require uniform policies at all routers q Examples: RIP, BGP Link StateVectoring Recall: Distributed Routing Techniques
28
Shivkumar Kalyanaraman Rensselaer Polytechnic Institute 28 BGP-4 q BGP = Border Gateway Protocol q Is a Policy-Based routing protocol q Is the de facto EGP of today’s global Internet q Relatively simple protocol, but configuration is complex and the entire world can see, and be impacted by, your mistakes. 1989 : BGP-1 [RFC 1105] –Replacement for EGP (1984, RFC 904) 1990 : BGP-2 [RFC 1163] 1991 : BGP-3 [RFC 1267] 1995 : BGP-4 [RFC 1771] –Support for Classless Interdomain Routing (CIDR)
29
Shivkumar Kalyanaraman Rensselaer Polytechnic Institute 29 BGP Operations (Simplified) Establish session on TCP port 179 Exchange all active routes Exchange incremental updates AS1 AS2 While connection is ALIVE exchange route UPDATE messages BGP session
30
Shivkumar Kalyanaraman Rensselaer Polytechnic Institute 30 Four Types of BGP Messages q Open : Establish a peering session. q Keep Alive : Handshake at regular intervals. q Notification : Shuts down a peering session. q Update : Announcing new routes or withdrawing previously announced routes. announcement = prefix + attributes values
31
Shivkumar Kalyanaraman Rensselaer Polytechnic Institute 31 Border Gateway Protocol (BGP) q Allows multiple cores and arbitrary topologies of AS interconnection. q Uses a path-vector concept which enables loop prevention in complex topologies q In AS-level, shortest path may not be preferred for policy, security, cost reasons. q Different routers have different preferences (policy) => as packet goes thru network it will encounter different policies q Bellman-Ford/Dijkstra don’t work! q BGP allows attributes for AS and paths which could include policies (policy-based routing).
32
Shivkumar Kalyanaraman Rensselaer Polytechnic Institute 32 BGP (Cont’d) q When a BGP Speaker A advertises a prefix to its B that it has a path to IP prefix C, B can be certain that A is actively using that AS-path to reach that destination q BGP uses TCP between 2 peers (reliability) q Exchange entire BGP table first (50K+ routes!) q Later exchanges only incremental updates q Application (BGP)-level keepalive messages q Hold-down timer (at least 3 sec) locally config q Interior and exterior peers: need to exchange reachability information among interior peers before updating intra- AS forwarding table.
33
Shivkumar Kalyanaraman Rensselaer Polytechnic Institute 33 Two Types of BGP Neighbor Relationships External Neighbor (eBGP) in a different Autonomous Systems Internal Neighbor (iBGP) in the same Autonomous System AS1 AS2 eBGP iBGP iBGP is routed (using IGP!)
34
Shivkumar Kalyanaraman Rensselaer Polytechnic Institute 34 I-BGP and E-BGP R border router internal router R1 AS1 R4 R5 B AS3 E-BGP R2R3 A AS2 announce B IGP: Interior Gateway Protocol. Examples: IS-IS, OSPF I-BGP IGP
35
Shivkumar Kalyanaraman Rensselaer Polytechnic Institute 35 I-BGP q Why is IGP (OSPF, ISIS) not used ? q In large ASs full route table is very large (100K routes!) q Rate of change of routes is frequent q Tremendous amount of control traffic q Not to mention Dijkstra computation being evoked for any change… q BGP policy information may be lost q I-BGP :Within an AS q Same protocol/state machines as EBGP q But different rules about advertising prefixes q Prefix learned from an I-BGP neighbor cannot be advertised to another I-BGP neighbor to avoid looping => need full IBGP mesh ! q AS-PATH cannot be used internally. Why ?
36
Shivkumar Kalyanaraman Rensselaer Polytechnic Institute 36 IBGP vs EBGP q I-BGP nodes: typically ABRs, or other nodes where default routes terminate q I-BGP peering sessions between every pair of routers within an AS: full mesh. A B D C Physical link IBGP session AS1
37
Shivkumar Kalyanaraman Rensselaer Polytechnic Institute 37 iBGP Peers: Fully Meshed eBGP update iBGP updates iBGP is needed to avoid routing loops within an AS Full Mesh => Independent of physical connectivity. Single link may see same update multiple times! iBGP neighbors do not announce routes received via iBGP to other iBGP neighbors. Is iBGP an IGP? NO! Set of neighbor relationships to transfer BGP info
38
Shivkumar Kalyanaraman Rensselaer Polytechnic Institute 38 IBGP Scaling: Route Reflection q Add hierarchy to I-BGP q Route reflector: A router whose BGP implementation supports the re-advertisement of routes between I-BGP neighbors q Route reflector client: A router which depends on route reflector to re-advertise its routes to entire AS and learn routes from the route reflector
39
Shivkumar Kalyanaraman Rensselaer Polytechnic Institute 39 Route Reflection RR-C1 RR-C2 RR1 RR2 RR3 RR-C3 RR-C4 AS1 AS2 ER 10.0.0.0/24 128.23.0.0/16 EBGP IBGP
40
Shivkumar Kalyanaraman Rensselaer Polytechnic Institute 40 AS Confederations q Divide and conquer: Divides a large AS into sub- ASs AS-1 12 10 14 11 13 R1 R2 Sub-AS
41
Shivkumar Kalyanaraman Rensselaer Polytechnic Institute 41 CIDR q Shortage of class Bs => give out a set of class Cs instead of one class B address q Problem: every class C n/w needs a routing entry ! q Solution: Classless Inter-domain Routing (CIDR). q Also called “supernetting” q Key: allocate addresses such that they can be summarized, I.e., contiguously. q Share same higher order bits (I.e. prefix) q Routing tables and protocols must be capable of carrying a subnet mask. Notation: 128.13.0/23 q When an IP address matches multiple entries (eg 194.0.22.1), choose the one which had the longest mask (“longest-prefix match”)
42
Shivkumar Kalyanaraman Rensselaer Polytechnic Institute 42 RFC 1519: Classless Inter-Domain Routing (CIDR) IP Address : 12.4.0.0 IP Mask: 255.254.0.0 0000110000000100 00000000 1111111111111110 00000000 Address Mask for hostsNetwork Prefix Pre-CIDR: Network ID ended on 8-, 16, 24- bit boundary CIDR: Network ID can end at any bit boundary Usually written as 12.4.0.0/15, a.k.a “supernetting”
43
Shivkumar Kalyanaraman Rensselaer Polytechnic Institute 43 Understanding Prefixes and Masks (Recap) 0000110000000100 00000000 1111111111111110 00000000 12.4.0.0/15 0000110000000101 0000100100010000 0000110000000111 0000100100010000 12.5.9.16 12.7.9.16 12.5.9.16 is covered by prefix 12.4.0.0/15 12.7.9.16 is not covered by prefix 12.4.0.0/15
44
Shivkumar Kalyanaraman Rensselaer Polytechnic Institute 44 Service Provider Global Internet Routing Mesh 204.71.0.0 204.71.1.0 204.71.2.0 204.71.255.0 …...……. 204.71.0.0 204.71.1.0 204.71.2.0 204.71.255.0 …...……. Inter-domain Routing Without CIDR Service Provider Global Internet Routing Mesh 204.71.0.0 204.71.1.0 204.71.2.0 204.71.255.0 …...……. 204.71.0.0/16 Inter-domain Routing With CIDR
45
Shivkumar Kalyanaraman Rensselaer Polytechnic Institute 45 Longest Prefix Match (Classless) Forwarding Destination =12.5.9.16 ------------------------------- payload PrefixInterfaceNext Hop 12.0.0.0/810.14.22.19ATM 5/0/8 12.4.0.0/15 12.5.8.0/23attached Ethernet 0/1/3 Serial 1/0/7 10.1.3.77 IP Forwarding Table 0.0.0.0/010.14.11.33ATM 5/0/9 even better OK better best!
46
Shivkumar Kalyanaraman Rensselaer Polytechnic Institute 46 What is Routing Policy q Policy refers to arbitrary preference among a menu of available routes (based upon routes’ attributes) q Public description of the relationship between external BGP peers q Can also describe internal BGP peer relationship q Eg: Who are my BGP peers q What routes are q Originated by a peer q Imported from each peer q Exported to each peer q Preferred when multiple routes exist q What to do if no route exists?
47
Shivkumar Kalyanaraman Rensselaer Polytechnic Institute 47 Routing Policy Example q AS1 originates prefix “d” q AS1 exports “d” to AS2, AS2 imports q AS2 exports “d” to AS3, AS3 imports q AS3 exports “d” to AS5, AS5 imports
48
Shivkumar Kalyanaraman Rensselaer Polytechnic Institute 48 Routing Policy Example (cont) q AS5 also imports “d” from AS4 q Which route does it prefer? q Does it matter? q Consider case where q AS3 = Commercial Internet q AS4 = Internet2
49
Shivkumar Kalyanaraman Rensselaer Polytechnic Institute 49 Import and Export Policies q Inbound filtering controls outbound traffic q filters route updates received from other peers q filtering based on IP prefixes, AS_PATH, community q Outbound Filtering controls inbound traffic q forwarding a route means others may choose to reach the prefix through you q not forwarding a route means others must use another router to reach the prefix q Attribute Manipulation q Import: LOCAL_PREF (manipulate trust) q Export: AS_PATH and MEDs
50
Shivkumar Kalyanaraman Rensselaer Polytechnic Institute 50 Attributes are Used to Select Best Routes 192.0.2.0/24 pick me! 192.0.2.0/24 pick me! 192.0.2.0/24 pick me! 192.0.2.0/24 pick me! Given multiple routes to the same prefix, a BGP speaker must pick at most one best route (Note: it could reject them all!)
51
Shivkumar Kalyanaraman Rensselaer Polytechnic Institute 51 BGP Policy Knob: Attributes Value Code Reference ----- --------------------------------- --------- 1 ORIGIN [RFC1771] 2 AS_PATH [RFC1771] 3 NEXT_HOP [RFC1771] 4 MULTI_EXIT_DISC [RFC1771] 5 LOCAL_PREF [RFC1771] 6 ATOMIC_AGGREGATE [RFC1771] 7 AGGREGATOR [RFC1771] 8 COMMUNITY [RFC1997] 9 ORIGINATOR_ID [RFC2796] 10 CLUSTER_LIST [RFC2796] 11 DPA [Chen] 12 ADVERTISER [RFC1863] 13 RCID_PATH / CLUSTER_ID [RFC1863] 14 MP_REACH_NLRI [RFC2283] 15 MP_UNREACH_NLRI [RFC2283] 16 EXTENDED COMMUNITIES [Rosen]... 255 reserved for development From IANA: http://www.iana.org/assignments/bgp-parameters We will cover a subset of these attributes Not all attributes need to be present in every announcement
52
Shivkumar Kalyanaraman Rensselaer Polytechnic Institute 52 BGP Route Processing Best Route Selection Apply Import Policies Best Route Table Apply Export Policies Install forwarding Entries for best Routes. Receive BGP Updates Best Routes Transmit BGP Updates Apply Policy = filter routes & tweak attributes Based on Attribute Values IP Forwarding Table Apply Policy = filter routes & tweak attributes
53
Shivkumar Kalyanaraman Rensselaer Polytechnic Institute 53 Import and Export Policies q For inbound traffic q Filter outbound routes q Tweak attributes on outbound routes in the hope of influencing your neighbor’s best route selection q For outbound traffic q Filter inbound routes q Tweak attributes on inbound routes to influence best route selection outbound routes inbound routes inbound traffic outbound traffic In general, an AS has more control over outbound traffic
54
Shivkumar Kalyanaraman Rensselaer Polytechnic Institute 54 Policy Implementation Flow Main BGP RIB Adj RIB Out Outgo- ing Adj RIB In Incom- ing Main RIB/ FIB IGPs Static & HW Info
55
Shivkumar Kalyanaraman Rensselaer Polytechnic Institute 55 Conceptual Model of BGP Operation q RIB : Routing Information Base q Adj-RIB-In: Prefixes learned from neighbors. As many Adj-RIB-In as there are peers q Loc-RIB: Prefixes selected for local use after analyzing Adj-RIB-Ins. This RIB is advertised internally. q Adj-RIB-Out : Stores prefixes advertised to a particular neighbor. As many Adj-RIB-Out as there are neighbors
56
Shivkumar Kalyanaraman Rensselaer Polytechnic Institute 56 UPDATE message in BGP q Primary message between two BGP speakers. q Used to advertise/withdraw IP prefixes (NLRI) q Path attributes field : unique to BGP q Apply to all prefixes specified in NLRI field q Optional vs Well-known; Transitive vs Non-transitive Withdrawn Routes Length 2 octets Withdrawn Routes (variable length) Total Path Attributes Length Path Attributes (variable length) Network Layer Reachability Info. (NLRI: variable length)
57
Shivkumar Kalyanaraman Rensselaer Polytechnic Institute 57 Path Attributes: ORIGIN q ORIGIN: q Describes how a prefix came to BGP at the origin AS q Prefixes are learned from a source and “injected” into BGP: q Directly connected interfaces, manually configured static routes, dynamic IGP or EGP q Values: q IGP (EGP): Prefix learnt from IGP (EGP) q INCOMPLETE: Static routes
58
Shivkumar Kalyanaraman Rensselaer Polytechnic Institute 58 Path Attributes: AS-PATH q List of ASs thru which the prefix announcement has passed. AS on path adds ASN to AS-PATH q Eg: 138.39.0.0/16 originates at AS1 and is advertised to AS3 via AS2. q Eg: AS-SEQUENCE: “100 200” q Used for loop detection and path selection AS1 (100) AS2 (200) AS3 (15) 138.39.0.0/16
59
Shivkumar Kalyanaraman Rensselaer Polytechnic Institute 59 Traffic Often Follows ASPATH AS 4AS 3 AS 2 AS 1 135.207.0.0/16 ASPATH = 3 2 1 IP Packet Dest = 135.207.44.66
60
Shivkumar Kalyanaraman Rensselaer Polytechnic Institute 60 … But It Might Not AS 4AS 3 AS 2 AS 1 135.207.0.0/16 ASPATH = 3 2 1 IP Packet Dest = 135.207.44.66 AS 5 135.207.44.0/25 ASPATH = 5 135.207.44.0/25 AS 2 filters all subnets with masks longer than /24 135.207.0.0/16 ASPATH = 1 From AS 4, it may look like this packet will take path 3 2 1, but it actually takes path 3 2 5
61
Shivkumar Kalyanaraman Rensselaer Polytechnic Institute 61 Shorter AS-PATH Doesn’t Mean Shorter # Hops AS 4 AS 3 AS 2 AS 1 BGP says that path 4 1 is better than path 3 2 1 Duh!
62
Shivkumar Kalyanaraman Rensselaer Polytechnic Institute 62 Path Attributes: NEXT-HOP q Next-hop: node to which packets must be sent for the IP prefixes. May not be same as peer. q UPDATE for 180.20.0.0, NEXT-HOP= 170.10.20.3 BGP Speakers Not a BGP Speaker
63
Shivkumar Kalyanaraman Rensselaer Polytechnic Institute 63 Recursive Lookup q If routes (prefix) are learnt thru iBGP, NEXT-HOP is the iBGP router which originated the route. q Note: iBGP peer might be several IP-level hops away as determined by the IGP q Hence BGP NEXT-HOP is not the same as IP next- hop q BGP therefore checks if the “NEXT-HOP” is reachable through its IGP. q If so, it installs the IGP next-hop for the prefix q This process is known as “recursive lookup” – the lookup is done in the control-plane (not data-plane) before populating the forwarding table. q Example in next slide
64
Shivkumar Kalyanaraman Rensselaer Polytechnic Institute 64 Forwarding Table Join EGP with IGP For Connectivity AS 1AS 2 192.0.2.1 135.207.0.0/16 10.10.10.10 EGP 192.0.2.1135.207.0.0/16 destinationnext hop 10.10.10.10192.0.2.0/30 destinationnext hop 135.207.0.0/16 Next Hop = 192.0.2.1 192.0.2.0/30 135.207.0.0/16 destinationnext hop 10.10.10.10 + 192.0.2.0/3010.10.10.10
65
Shivkumar Kalyanaraman Rensselaer Polytechnic Institute 65 Local Preference AS1 AS2 MED Load-Balancing Knobs in BGP q LOCAL-PREF: outbound traffic, local preference (box- level knob) q MED: Inbound-traffic, typically from the same ISP (link- level knob)
66
Shivkumar Kalyanaraman Rensselaer Polytechnic Institute 66 Path Attribute: LOCAL-PREF q Locally configured indication about which path is preferred to exit the AS in order to reach a certain network. Default value = 100. Higher is better.
67
Shivkumar Kalyanaraman Rensselaer Polytechnic Institute 67 Attributes: MULTI-EXIT Discriminator q Also called METRIC or MED Attribute. Lower is better q AS1:multihomed customer. q AS2 (provider) includes MED to AS1 q AS1 chooses which link (NEXTHOP) to use q Eg: traffic to AS3 can go thru Link1, and AS2 thru Link2 AS1 AS2 AS3 AS4 Link A Link B
68
Shivkumar Kalyanaraman Rensselaer Polytechnic Institute 68 Hot Potato Routing: Closest Egress Point 192.44.78.0/24 15 56 IGP distances egress 1 egress 2 This Router has two BGP routes to 192.44.78.0/24. Hot potato: get traffic off of your network as Soon as possible. Go for egress 1!
69
Shivkumar Kalyanaraman Rensselaer Polytechnic Institute 69 Getting Burned by the Hot Potato 15 56 17 2865 High bandwidth Provider backbone Low b/w customer backbone Heavy Content Web Farm Many customers want their provider to carry the bits! tiny http request huge http reply SFFNYC San Diego
70
Shivkumar Kalyanaraman Rensselaer Polytechnic Institute 70 Cold Potato Routing with MEDs (Multi-Exit Discriminator Attribute) 15 56 17 2865 Heavy Content Web Farm 192.44.78.0/24 MED = 15 192.44.78.0/24 MED = 56 This means that MEDs must be considered BEFORE IGP distance! Prefer lower MED values Note1 : some providers will not listen to MEDs Note2 : MEDs need not be tied to IGP distance
71
Shivkumar Kalyanaraman Rensselaer Polytechnic Institute 71 MEDs Can Export Internal Instability 15 17 2865 Heavy Content Web Farm 192.44.78.0/24 MED = 15 192.44.78.0/24 MED = 56 OR 10 56 10 FLAP
72
Shivkumar Kalyanaraman Rensselaer Polytechnic Institute 72 ASPATH Padding: Shed inbound traffic Padding will (usually) force inbound traffic from AS 1 to take primary link AS 1 192.0.2.0/24 ASPATH = 2 2 2 customer AS 2 provider 192.0.2.0/24 backupprimary 192.0.2.0/24 ASPATH = 2
73
Shivkumar Kalyanaraman Rensselaer Polytechnic Institute 73 Padding May Not Shut Off All Traffic AS 1 192.0.2.0/24 ASPATH = 2 2 2 2 2 2 2 2 2 2 2 2 2 2 customer AS 2 provider 192.0.2.0/24 ASPATH = 2 AS 3 provider AS 3 will send traffic on “backup” link because it prefers customer routes and local preference is considered before ASPATH length! Padding in this way is often used as a form of load balancing backupprimary
74
Shivkumar Kalyanaraman Rensselaer Polytechnic Institute 74 Deaggregation + Multihoming AS 1 customer AS 2 provider 12.0.0.0/8 AS 3 provider 12.2.0.0/16 If AS 1 does not announce the more specific prefix, then most traffic to AS 2 will go through AS 3 because it is a longer match AS 2 is “punching a hole” in the CIDR block of AS 1=> subverts CIDR
75
Shivkumar Kalyanaraman Rensselaer Polytechnic Institute 75 CIDR at Work, No load balancing ISP3 AS1 128.40/16 140.127/16 Link A Link B ISP1 128.32/11 ISP2 140.64/10 128.40/16 140.127/16 PrefixNext Hop ORIGIN AS 128.32/11ISP1 140.64/10ISP2 Table at ISP3
76
Shivkumar Kalyanaraman Rensselaer Polytechnic Institute 76 CIDR Subverted for Load Balancing ISP3 AS1 128.40/16 140.127/16 Link A Link B ISP1 128.32/11 ISP2 140.64/10 140.255.20/24, 128.40/16 128.42.10/24, 140.127/16 PrefixNext Hop ORIGIN AS 128.32/11ISP1 140.64/10ISP2 140.255.20/24ISP1AS1 128.42.10/24ISP2AS1 Table at ISP3
77
Shivkumar Kalyanaraman Rensselaer Polytechnic Institute 77 How Can Routes be Colored? BGP Communities A community value is 32 bits By convention, first 16 bits is ASN indicating who is giving it an interpretation community number Community Attribute = a list of community values. (So one route can belong to multiple communities) RFC 1997 (August 1996) Used within and between ASes The set of ASes must agree on how to interpret the community value Very powerful BECAUSE it has no (predefined) meaning Two reserved communities no_advertise 0xFFFFFF02: don’t pass to BGP neighbors no_export = 0xFFFFFF01: don’t export out of AS
78
Shivkumar Kalyanaraman Rensselaer Polytechnic Institute 78 Communities Example q 1:100 q Customer routes q 1:200 q Peer routes q 1:300 q Provider Routes q To Customers q 1:100, 1:200, 1:300 q To Peers q 1:100 q To Providers q 1:100 AS 1 ImportExport
79
Shivkumar Kalyanaraman Rensselaer Polytechnic Institute 79 BGP Route Selection Process q If NEXTHOP is inaccessible do not consider the route. q Prefer largest LOCAL-PREF q If same LOCAL-PREF prefer the shortest AS-PATH. q If all paths are external prefer the lowest ORIGIN code (IGP<EGP<INCOMPLETE). q If ORIGIN codes are the same prefer the lowest MED. q If MED is same, prefer min-cost NEXT-HOP q If routes learned from EBGP or IBGP, prefer paths learnt from EBGP q Final tie-break: Prefer the route with I-BGP ID (IP address) Series of tie-breaker decisions...
80
Shivkumar Kalyanaraman Rensselaer Polytechnic Institute 80 Route Selection Summary Highest Local Preference Shortest ASPATH Lowest MED i-BGP < e-BGP Lowest IGP cost to BGP egress Lowest router ID traffic engineering Enforce relationships Throw up hands and break ties
81
Shivkumar Kalyanaraman Rensselaer Polytechnic Institute 81 Caveat BGP is not guaranteed to converge on a stable routing. Policy interactions could lead to “livelock” protocol oscillations. See “Persistent Route Oscillations in Inter-domain Routing” by K. Varadhan, R. Govindan, and D. Estrin. ISI report, 1996 Corollary: BGP is not guaranteed to recover from network failures.
82
Shivkumar Kalyanaraman Rensselaer Polytechnic Institute 82 BGP Table Growth Thanks: Geoff Huston. http://www.telstra.net/ops/bgptable.html
83
Shivkumar Kalyanaraman Rensselaer Polytechnic Institute 83 Large BGP Tables Considered Harmful Routing tables must store best routes and alternate routes Burden can be large for routers with many alternate routes (route reflectors for example) Routers have been known to die Increases CPU load, especially during session reset
84
Shivkumar Kalyanaraman Rensselaer Polytechnic Institute 84 ASNs Growth From: Geoff Huston. http://www.telstra.net/ops
85
Shivkumar Kalyanaraman Rensselaer Polytechnic Institute 85 Dealing with ASN growth… q Make ASNs larger than 16 bits q How about 32 bits? q See Internet Draft: “BGP support for four-octet AS number space” (draft-ietf-idr-as4bytes-03.txt) q Requires protocol change and wide deployment q Change the way ASNs are used q Allow multihomed, non-transit networks to use private ASNs q Uses ASE (AS number Substitution on Egress ) q See Internet Draft: “Autonomous System Number Substitution on Egress” (draft-jhaas-ase-00.txt) q Works at edge, requires protocol change (for loop prevention)
86
Shivkumar Kalyanaraman Rensselaer Polytechnic Institute 86 Daily Update Count
87
Shivkumar Kalyanaraman Rensselaer Polytechnic Institute 87 A Few Bad Apples … Thanks to Madanlal Musuvathi for this plot. Data source: RIPE NCC Typically, 80% of the updates are for less than 5% Of the prefixes. Most prefixes are stable most of the time. On this day, about 83% of the prefixes were not updated. Percent of BGP table prefixes
88
Shivkumar Kalyanaraman Rensselaer Polytechnic Institute 88 Squashing Updates q Rate limiting on sending updates q Send batch of updates every MinRouteAdvertisementInterval seconds (+/- random fuzz) q Default value is 30 seconds q A router can change its mind about best routes many times within this interval without telling neighbors q Route Flap Dampening q Punish routes for misbehaving Effective in dampening oscillations inherent in the vectoring approach Must be turned on with configuration
89
Shivkumar Kalyanaraman Rensselaer Polytechnic Institute 89 Route Flap Dampening (RFC 2439) Routes are given a penalty for changing. If penalty exceeds suppress limit, the route is dampened. When the route is not changing, its penalty decays exponentially. If the penalty goes below reuse limit, then it is announced again. Can dramatically reduce the number of BGP updates Requires additional router resources Applied on eBGP inbound only
90
Shivkumar Kalyanaraman Rensselaer Polytechnic Institute 90 Route Flap Dampening Example route dampened for nearly 1 hour penalty for each flap = 1000
91
Shivkumar Kalyanaraman Rensselaer Polytechnic Institute 91 How Long Does BGP Take to Adapt to Changes? From: Abha Ahuja and Craig Labovitz
92
Shivkumar Kalyanaraman Rensselaer Polytechnic Institute 92 Two Main Factors in Delayed Convergence q Rate limiting timer slows everything down q BGP can explore many alternate paths before giving up or arriving at a new path q No global knowledge in vectoring protocols
93
Shivkumar Kalyanaraman Rensselaer Polytechnic Institute 93 Implementation Does Matter! stateless withdraws widely deployed stateful withdraws widely deployed
94
Shivkumar Kalyanaraman Rensselaer Polytechnic Institute 94 What is RPSL? Why ? q Object oriented language (developed by RIPE 181) q Structured objects q Describes things interesting to routing policy q Routes, ASNs, Peer Relationships etc q Allows consistent configuration between BGP peers q Expertise encoded in the tools that generate the policy rather than engineer configuring peering session q Automatic, manageable solution for filter generation RFC 2622 - “Routing Policy Specification Language (RPSL)” FOR MORE INFO...
95
Shivkumar Kalyanaraman Rensselaer Polytechnic Institute 95 Summary q BGP is a fairly simple protocol … q … but it is not easy to configure q BGP is running on more than 100K routers making it one of world’s largest and most visible distributed systems q Global dynamics and scaling principles are still not well understood q Traffic Engineering hacked in as an afterthought…
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.