1
Traffic Engineering for ISP Networks Jennifer Rexford Computer Science Department Princeton University http://www.cs.princeton.edu/~jrex
2
A Challenge in ISP Backbone Networks Finding a good way to route the data packets –Given the current network topology and offered traffic –For good performance and efficient use of resources
3
Why Is the Problem Hard? IP traffic varies, and the service is best effort –The offered traffic is not known in advance –The resources in the network are not reserved The routers do not adapt on their own –Load-sensitive routing is not widely deployed –Due to control overhead and stability challenges Routing protocols were not designed to be managed –At best indirect control over the flow of traffic Fine-grain traffic measurements are often unavailable –E.g., only have coarse-grain link load statistics
4
In This Talk… TE with traditional IP routing protocols –Shortest-path protocols with configurable link weights Two main research challenges –Optimization: tuning link weights to the offered traffic –Tomography: inferring the offered traffic from link load Deployed solutions in AT&T’s U.S. backbone –Our experiences working with the network operators –And how we improved the tools over time Ongoing research on traffic management
5
Optimization: Tuning Link Weights
6
Routing Inside an Internet Service Provider Routers flood information to learn the topology –Routers determine “next hop” to reach other routers… –By computing shortest paths based on the link weights Routers forward packets via the “next hop” link(s) [Figure: example topology labeled with link weights]
7
Link Weights Control the Flow of Traffic Routers compute paths –Shortest paths as sum of link weights Operators set the link weights –To control where the traffic goes [Figure: example topology labeled with link weights]
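As a rough illustration of how a single weight change moves traffic, the sketch below runs Dijkstra's algorithm over configured link weights. The four-router topology and the names `shortest_path` and `weights` are assumptions made for this example; they are not the topology from the slide figure.

```python
import heapq

def shortest_path(weights, source, target):
    """Return (cost, path) for the lowest-weight path, or None if unreachable.

    weights maps each unidirectional link (u, v) to its configured weight.
    """
    neighbors = {}
    for (u, v), w in weights.items():
        neighbors.setdefault(u, []).append((v, w))

    dist = {source: 0}
    prev = {}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry, a shorter route was already found
        for v, w in neighbors.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))

    if target not in dist:
        return None
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return dist[target], path[::-1]

# Hypothetical topology: two routes from A to D, via B or via C.
weights = {
    ("A", "B"): 1, ("B", "A"): 1,
    ("B", "D"): 1, ("D", "B"): 1,
    ("A", "C"): 2, ("C", "A"): 2,
    ("C", "D"): 2, ("D", "C"): 2,
}
before = shortest_path(weights, "A", "D")  # route via B
weights[("A", "B")] = 5                    # operator raises one link weight
after = shortest_path(weights, "A", "D")   # route shifts to C
```

Raising the weight on one link is enough to shift the A-to-D traffic onto the other route, which is the kind of indirect control the slide describes.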
8
Heuristics for Setting the Link Weights Proportional to physical distance –Cross-country links have higher weights than local ones –Minimizes end-to-end propagation delay Inversely proportional to link capacity –Smaller weights for higher-bandwidth links –Attracts more traffic to links with more capacity Tuned based on the offered traffic –Network-wide optimization of weights based on traffic –Directly minimizes key metrics like max link utilization
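The inverse-capacity heuristic can be sketched as follows. The reference bandwidth, the clamping range, and the function name `inverse_capacity_weights` are assumptions chosen for illustration, not values taken from the talk.

```python
def inverse_capacity_weights(capacity_mbps, reference_mbps=100_000, max_weight=65_535):
    """Assign each link a weight inversely proportional to its capacity.

    reference_mbps is a hypothetical scaling constant: a link with exactly
    this capacity gets weight 1, and slower links get larger weights.
    """
    weights = {}
    for link, capacity in capacity_mbps.items():
        raw = round(reference_mbps / capacity)
        # Clamp to a valid positive integer range.
        weights[link] = min(max(raw, 1), max_weight)
    return weights

# Hypothetical link capacities in Mbps.
capacities = {("A", "B"): 100_000, ("B", "C"): 10_000, ("A", "C"): 2_500}
weights = inverse_capacity_weights(capacities)
```

With these assumed capacities, the fastest link gets the smallest weight and therefore attracts the most shortest paths.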
9
Why Are the Link Weights Static? Strawman alternative: load-sensitive routing –Link metrics based on traffic load –Flood dynamic metrics as they change –Adapt automatically to changes in offered load Reasons why this is typically not done –Delay-based routing unsuccessful in the early days –Oscillation as routers adapt to out-of-date information –Most Internet transfers are very short-lived Research and standards work continues… –… but operators have to work with what they have
10
Big Picture: Measure, Model, and Control [Figure: loop between the operational network and a network-wide “what if” model; labels: measure (topology/configuration, offered traffic) and control (changes to the network)]
11
Traffic Engineering in an ISP Backbone Topology –Connectivity and capacity of routers and links Traffic matrix –Offered load between points in the network Link weights –Configurable weights for shortest-path routing Performance objective –Balanced load, low latency, service level agreements … Question: Given the topology and traffic matrix in an IP network, which link weights should be used?
12
Key Ingredients of Our Approach Measurement –Topology: monitoring of the routing protocols –Traffic matrix: widely deployed traffic measurement Network-wide models –Representations of topology and traffic –“What-if” models of shortest-path routing Network optimization –Efficient algorithms to find good configurations –Operational experience to identify key constraints
13
Formalizing the Optimization Problem Input: graph $G(R,L)$ –$R$ is the set of routers –$L$ is the set of unidirectional links –$c_l$ is the capacity of link $l$ Input: traffic matrix –$M_{i,j}$ is traffic load from router $i$ to $j$ Output: setting of the link weights –$w_l$ is weight on unidirectional link $l$ –$P_{i,j,l}$ is fraction of traffic from $i$ to $j$ traversing link $l$
14
Multiple Shortest Paths With Even Splitting [Figure: example topology with values of $P_{i,j,l}$ marked on the links, e.g., 1.0, 0.5, 0.25]
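Even splitting over equal-cost shortest paths can be sketched as below: compute each router's distance to the destination, then push one unit of traffic from the source and divide it equally among the next hops that lie on a shortest path. The topology and the names `distances_to` and `split_fractions` are assumptions for this example, and the sketch assumes strictly positive integer weights.

```python
import heapq
from collections import defaultdict

def distances_to(weights, target):
    """Shortest-path cost from every router to target (Dijkstra on reversed links)."""
    incoming = defaultdict(list)
    for (u, v), w in weights.items():
        incoming[v].append((u, w))
    dist = {target: 0}
    heap = [(0, target)]
    while heap:
        d, v = heapq.heappop(heap)
        if d > dist[v]:
            continue
        for u, w in incoming[v]:
            if d + w < dist.get(u, float("inf")):
                dist[u] = d + w
                heapq.heappush(heap, (d + w, u))
    return dist

def split_fractions(weights, source, target):
    """Return {link: fraction of source->target traffic carried by that link}."""
    dist = distances_to(weights, target)
    if source not in dist:
        return {}
    next_hops = defaultdict(list)
    for (u, v), w in weights.items():
        # A link is on a shortest path if it accounts exactly for the distance gap.
        if u in dist and v in dist and dist[u] == w + dist[v]:
            next_hops[u].append(v)
    arriving = defaultdict(float)
    arriving[source] = 1.0
    fractions = {}
    # Visit routers farthest-first so all inflow is known before it is split.
    for u in sorted(dist, key=dist.get, reverse=True):
        if u == target or arriving[u] == 0.0:
            continue
        share = arriving[u] / len(next_hops[u])
        for v in next_hops[u]:
            fractions[(u, v)] = share
            arriving[v] += share
    return fractions

# Hypothetical topology with three equal-cost paths from S to T.
weights = {
    ("S", "A"): 1, ("A", "T"): 2,
    ("S", "B"): 1, ("B", "C"): 1, ("B", "D"): 1,
    ("C", "T"): 1, ("D", "T"): 1,
}
fractions = split_fractions(weights, "S", "T")
```

In this assumed topology, S splits evenly between A and B, and B splits its half again, so the links out of B each carry a quarter of the S-to-T traffic.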
15
Defining the Objective Function Computing the link utilization –Link load: $u_l = \sum_{i,j} M_{i,j} P_{i,j,l}$ –Utilization: $u_l / c_l$ Objective functions –$\min \max_l (u_l / c_l)$ –$\min \sum_l f(u_l / c_l)$ [Figure: plot of $f(x)$ against $x$, with $x = 1$ marked on the axis]
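The two objective functions can be evaluated directly from a traffic matrix and the routing fractions. In the sketch below, the shape of `penalty` is an assumption (any convex function that rises steeply past full utilization would serve), and the example traffic, fractions, and capacities are made up.

```python
def link_loads(traffic, fractions):
    """Link load: sum over (i, j) of M[i, j] * P[i, j, l]."""
    loads = {}
    for pair, volume in traffic.items():
        for link, frac in fractions.get(pair, {}).items():
            loads[link] = loads.get(link, 0.0) + volume * frac
    return loads

def max_utilization(loads, capacity):
    """First objective: the utilization of the most heavily loaded link."""
    return max(loads.get(link, 0.0) / c for link, c in capacity.items())

def penalty(x):
    # Assumed convex penalty: linear up to full utilization, ten times steeper above it.
    return x if x <= 1.0 else 1.0 + 10.0 * (x - 1.0)

def total_penalty(loads, capacity):
    """Second objective: sum of the penalty over all links."""
    return sum(penalty(loads.get(link, 0.0) / c) for link, c in capacity.items())

# Hypothetical inputs: one demand of 8 units split evenly over two links.
traffic = {("S", "T"): 8.0}
fractions = {("S", "T"): {("S", "A"): 0.5, ("S", "B"): 0.5}}
capacity = {("S", "A"): 10.0, ("S", "B"): 2.0}

loads = link_loads(traffic, fractions)
worst = max_utilization(loads, capacity)
cost = total_penalty(loads, capacity)
```

Here the low-capacity link is loaded to twice its capacity, so it determines the max-utilization objective and dominates the summed penalty as well.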
16
Complexity of the Optimization Problem NP-hard optimization problem –No efficient algorithm to find the link weights –Even for the simple convex objective functions Why can’t we just do multi-commodity flow? –E.g., solve the multi-commodity flow problem… –… and the link weights pop out as the dual –Because IP routers cannot split arbitrarily over ties What are the implications? –Have to resort to searching through weight settings
17
Optimization Based on Local Search Start with an initial setting of the link weights –E.g., same integer weight on every link –E.g., weights inversely proportional to link capacity –E.g., existing weights in the operational network Compute the objective function –Compute the all-pairs shortest paths to get $P_{i,j,l}$ –Apply the traffic matrix $M_{i,j}$ to get link loads $u_l$ –Evaluate the objective function from the $u_l / c_l$ Generate a new setting of the link weights, and repeat
18
Making the Search Efficient Avoid repeating the same weight setting –Keep track of past values of the weight setting –… or keep a small signature (e.g., a hash) of past values –Do not evaluate a weight setting if signatures match Avoid computing the shortest paths from scratch –Explore weight settings that change just one weight –Apply fast incremental shortest-path algorithms Limit the number of unique values of link weights –Do not explore all $2^{16}$ possible values for each weight Stop early, before exploring the whole search space
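A minimal sketch of the search loop, under several simplifying assumptions: it is plain hill climbing that accepts only improving moves, it recomputes shortest paths from scratch rather than incrementally, it minimizes max utilization, and it draws weights from a small hypothetical candidate set. The function names and the example network are illustrative, and the sketch assumes every demand has a route.

```python
import heapq
import random
from collections import defaultdict

def link_loads(weights, traffic):
    """Route each demand over even-split shortest paths; return load per link."""
    loads = defaultdict(float)
    for target in {j for (_, j) in traffic}:
        # Dijkstra on reversed links gives each router's cost to the target.
        incoming = defaultdict(list)
        for (u, v), w in weights.items():
            incoming[v].append((u, w))
        dist, heap = {target: 0}, [(0, target)]
        while heap:
            d, v = heapq.heappop(heap)
            if d > dist[v]:
                continue
            for u, w in incoming[v]:
                if d + w < dist.get(u, float("inf")):
                    dist[u] = d + w
                    heapq.heappush(heap, (d + w, u))
        # Inject the demands for this target, then push flow toward it.
        arriving = defaultdict(float)
        for (i, j), volume in traffic.items():
            if j == target:
                arriving[i] += volume
        for u in sorted(dist, key=dist.get, reverse=True):
            if u == target or arriving[u] == 0.0:
                continue
            hops = [v for (a, v), w in weights.items()
                    if a == u and v in dist and dist[u] == w + dist[v]]
            for v in hops:
                loads[(u, v)] += arriving[u] / len(hops)
                arriving[v] += arriving[u] / len(hops)
    return loads

def max_utilization(weights, traffic, capacity):
    loads = link_loads(weights, traffic)
    return max(loads[link] / capacity[link] for link in capacity)

def local_search(weights, traffic, capacity,
                 candidates=(1, 2, 3, 5, 10), iterations=200, seed=0):
    """Hill climbing over single-weight changes, skipping settings already seen."""
    rng = random.Random(seed)
    links = sorted(weights)
    current = dict(weights)
    best_cost = max_utilization(current, traffic, capacity)
    seen = {hash(tuple(current[l] for l in links))}
    for _ in range(iterations):
        trial = dict(current)
        trial[rng.choice(links)] = rng.choice(candidates)  # change just one weight
        signature = hash(tuple(trial[l] for l in links))
        if signature in seen:
            continue  # do not re-evaluate a setting with a known signature
        seen.add(signature)
        cost = max_utilization(trial, traffic, capacity)
        if cost < best_cost:
            current, best_cost = trial, cost
    return current, best_cost

# Hypothetical network: two routes from S to T, initially all traffic via A.
weights = {("S", "A"): 1, ("A", "T"): 1, ("S", "B"): 1, ("B", "T"): 5}
capacity = {link: 10.0 for link in weights}
traffic = {("S", "T"): 10.0}
start_cost = max_utilization(weights, traffic, capacity)
tuned, tuned_cost = local_search(weights, traffic, capacity)
```

In this toy case a single weight change that ties the two routes halves the maximum utilization. A production search would add the incremental shortest-path computation and early stopping listed on the slide.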
19
Incorporating Operational Realities Minimize number of changes to the network –Changing just 1 or 2 link weights is often enough Tolerate failure of network equipment –Weight settings usually remain good after failure –… or can be fixed by changing one or two weights Limit dependence on measurement accuracy –Good weights remain good, despite random noise Limit frequency of changes to the weights –Joint optimization for day and night traffic matrices
20
Application to AT&T’s Backbone Network Performance of the optimized weights –Search finds a good solution within a few minutes –Much better than link capacity or physical distance –Competitive with multi-commodity flow solution How AT&T changes the link weights –Maintenance done every night from midnight to 6am –Predict effects of removing link(s) from the network –Reoptimize the link weights to avoid congestion –Configure new weights before disabling equipment
21
Example from My Visit to AT&T’s Operations Center Amtrak repairing/moving part of the train track –Need to move some of the fiber optic cables –Or, heightened risk of the cables being cut –Amtrak notifies us of the time the work will be done AT&T engineers model the effects –Determine which IP links go over the affected fiber –Pretend the network no longer has these links –Evaluate the new shortest paths and traffic flow –Identify whether link loads will be too high
22
Example Continued If load will be too high –Reoptimize the weights on the remaining links –Schedule the time for the new weights to be configured –Roll back to the old weight setting after Amtrak is done Same process applied to other cases –Assessing the network’s risk to possible failures –Planning for maintenance of existing equipment –Adapting the link weights to installation of new links –Adapting the link weights in response to traffic shifts
23
Conclusions on Traffic Engineering IP networks do not adapt on their own –Routers compute shortest paths based on static weights Service providers need to adapt the weights –Due to failures, congestion, or planned maintenance Leads to interesting optimization problems –Optimize link weights based on topology and traffic Optimization problem is computationally difficult –Forces the use of efficient local-search techniques Results of the local search are pretty good –Near-optimal solutions that minimize disruptions
24
Extensions Robust link-weight assignments –Link/node failures –Range of traffic matrices More complex routing models –Destinations reachable via multiple “egress points” –Interdomain routing policies Interaction between ISPs –Inter-ISP negotiation for joint optimization –Grappling with scalability and trust issues
25
Tomography: Inferring the Traffic Matrix
26
Computing the Traffic Matrix $M_{i,j}$ Hard to measure the traffic matrix –IP networks transmit data as individual packets –Routers do not keep traffic statistics, except link utilization on (say) a five-minute time scale Need to infer the traffic matrix $M_{i,j}$ from –Current topology $G(R,L)$ –Current routing $P_{i,j,l}$ –Current link load $u_l$ –Link capacity $c_l$
27
Inference: Network Tomography From link counts to the traffic matrix [Figure: sources and destinations joined by links with measured loads of 4 Mbps, 3 Mbps, and 5 Mbps]
28
Tomography: Formalizing the Problem Ingress-egress pairs –$p$ is an ingress-egress pair of nodes $(i,j)$ –$x_p$ is the (unknown) traffic volume for this pair, i.e., $M_{i,j}$ Routing –$P_{l,p}$ is the proportion of $p$’s traffic that traverses $l$ Links in the network –$l$ is a unidirectional edge –$u_l$ is the observed traffic volume on this link Relationship: $u = Px$ (work backwards to get $x$)
29
Tomography: One Observation Not Enough Linear system of $n$ nodes is underdetermined –Number of links $e$ is around $O(n)$ –Number of ingress-egress pairs $c$ is $O(n^2)$ –Dimension of solution sub-space at least $c - e$ Multiple observations are needed –$k$ independent observations (over time) –Stochastic model with Poisson iid counts –Maximum likelihood estimation to infer matrix Doesn’t work all that well in practice…
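A small made-up example shows why one set of link counts is not enough: two different traffic matrices can produce identical link loads. The star topology, the link labels, and the name `link_loads` are assumptions for this illustration.

```python
def link_loads(routing, demands):
    """Compute u = P x, where routing[l][p] is the share of pair p's traffic on link l."""
    return {link: sum(frac * demands[pair] for pair, frac in row.items())
            for link, row in routing.items()}

# Hypothetical star: sources A and B reach destinations C and D through router R.
routing = {
    "A->R": {("A", "C"): 1.0, ("A", "D"): 1.0},
    "B->R": {("B", "C"): 1.0, ("B", "D"): 1.0},
    "R->C": {("A", "C"): 1.0, ("B", "C"): 1.0},
    "R->D": {("A", "D"): 1.0, ("B", "D"): 1.0},
}

# Two different candidate traffic matrices.
skewed = {("A", "C"): 3.0, ("A", "D"): 1.0, ("B", "C"): 1.0, ("B", "D"): 3.0}
uniform = {("A", "C"): 2.0, ("A", "D"): 2.0, ("B", "C"): 2.0, ("B", "D"): 2.0}

same_counts = link_loads(routing, skewed) == link_loads(routing, uniform)
```

Both matrices put 4 units on every link, so the link counts alone cannot tell them apart.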
30
Approach Used at AT&T: Tomo-gravity Gravitational assumption –Ingress point $a$ has traffic $v^{i}_a$ –Egress point $b$ has traffic $v^{e}_b$ –Pair $(a,b)$ has traffic proportional to $v^{i}_a \cdot v^{e}_b$ [Figure: example network annotated with ingress and egress traffic volumes]
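The gravitational assumption can be sketched in a few lines. Dividing by the total egress volume is an assumption made here so that each ingress point's estimates add up to its measured ingress volume; the slide only states proportionality. The point names and volumes are made up.

```python
def gravity_estimate(ingress, egress):
    """Estimate traffic for each (a, b) as ingress[a] * egress[b] / total egress."""
    total = sum(egress.values())
    return {(a, b): ingress[a] * egress[b] / total
            for a in ingress for b in egress}

# Hypothetical edge measurements (arbitrary units).
ingress = {"a1": 30.0, "a2": 10.0}
egress = {"b1": 20.0, "b2": 20.0}
estimate = gravity_estimate(ingress, egress)
```

With these numbers, the ingress point carrying three times the traffic gets three times the estimated volume toward every egress point.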
31
Approach Used at AT&T: Tomo-gravity Problem with gravity model –Gravity model ignores the load on the inside links –Gravity assumption isn’t always 100% correct –Resulting traffic matrix might not satisfy the link loads Combining the two techniques –Gravity: find a traffic matrix using the gravity model –Tomography: find the family of traffic matrices consistent with all link load statistics –Tomo-gravity: find the tomography solution that is closest to the output of the gravity model Works extremely well (and fast) in practice
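One simple way to read "closest to the output of the gravity model" is as a least-squares projection onto the set of traffic vectors consistent with the link loads. The formulation below is a sketch under that assumption. It uses plain Euclidean distance and ignores non-negativity constraints, so it may differ from the weighting used in the deployed method. Here $x_g$ denotes the gravity-model estimate and $P^{+}$ the Moore-Penrose pseudoinverse of $P$.

```latex
\begin{aligned}
\hat{x} &= \arg\min_{x} \; \lVert x - x_g \rVert_2^2
          \quad \text{subject to } P x = u \\
\hat{x} &= x_g + P^{+}\,(u - P x_g)
\end{aligned}
```

The second line holds when the link loads are consistent, i.e., when some $x$ satisfies $Px = u$. The correction term adjusts the gravity estimate just enough to match the observed link loads.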
32
Conclusions Managing IP networks is challenging –Routers don’t adapt on their own to congestion –Routers don’t reveal much information about traffic Measurement provides a network-wide view –Topology –Traffic matrix Optimization enables the network to adapt –Inferring the traffic matrix from the link loads –Optimizing the link weights based on the traffic matrix
33
New Research Direction: Design for Manage-ability Two main parts of network management –Control: optimization –Measurement: tomography Two research approaches –Bottom up: do the best with what you have –Top down: design systems that are easier to manage Design for manage-ability –“If you are both the professor and the student, you create exam questions that are easy to answer.”
34
Example: Changing the Path Computation Routers split traffic over multiple paths –More traffic on shorter paths, less on longer ones –In proportion to the exponential of path cost Exciting result –Can achieve optimal distribution of the traffic –With polynomial-time algorithm for setting the weights [Figure: example topology labeled with link weights]
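To give a feel for exponential splitting, the sketch below divides traffic among whole candidate paths in proportion to the exponential of each path's extra cost over the shortest one, with a negative sign so that shorter paths get more traffic. Treating paths as a flat list, the unit scaling of the exponent, and the name `exponential_split` are all assumptions made for illustration; the slide does not give the exact rule routers would apply.

```python
import math

def exponential_split(path_costs):
    """Return each path's traffic share, proportional to exp(-(cost - min cost))."""
    shortest = min(path_costs.values())
    scores = {path: math.exp(-(cost - shortest)) for path, cost in path_costs.items()}
    total = sum(scores.values())
    return {path: score / total for path, score in scores.items()}

# Hypothetical path costs (sum of link weights along each path).
shares = exponential_split({"short": 3, "medium": 4, "long": 6})
tied = exponential_split({"p1": 2, "p2": 2})
```

Unlike even splitting over ties, longer paths still carry some traffic here, and their share shrinks quickly as their extra cost grows.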
35
New Research Direction: Logically-Central Control Traditional division of labor –Routers: real-time, distributed protocols –Management system: offline, centralized algorithms Example: routing protocols and traffic engineering –Routing: routers react automatically to link failures –TE: management system sets the link weights The case for separating routing from routers –Better decisions with network-wide visibility –Routers only collect measurements and forward packets
36
Example: Routing Control Platform (RCP) Logically-centralized server –Collects measurement data from the network –Pushes forwarding tables into the routers Benefits –Network-wide policies –Flexible, easy to customize –Fewer nodes to upgrade Feasibility –High-end PC can compute routes for large ISP –Simple replication to survive failures [Figure: an ISP network with an RCP server]
37
References Traffic engineering using traditional protocols –http://www.cs.princeton.edu/~jrex/papers/ieeecomm02.pdf –http://www.cs.princeton.edu/~jrex/papers/opthand04.pdf –http://www.cs.princeton.edu/~jrex/papers/ton-whatif.pdf Tomo-gravity to infer the traffic matrix –http://www.cs.utexas.edu/~yzhang/papers/mmi-ton05.pdf –http://www.cs.utexas.edu/~yzhang/papers/tomogravity-sigm03.pdf –http://www.cs.princeton.edu/~jrex/papers/sfi.pdf
38
References Design for manage-ability –http://www.cs.princeton.edu/~jrex/papers/pefti.pdf –http://www.cs.princeton.edu/~jrex/papers/optimizability.pdf –http://www.cs.princeton.edu/~jrex/papers/tie-long.pdf Routing Control Platform –http://www.cs.princeton.edu/~jrex/papers/rcp.pdf –http://www.cs.princeton.edu/~jrex/papers/ccr05-4d.pdf –http://www.cs.princeton.edu/~jrex/papers/rcp-nsdi.pdf –http://www.research.att.com/~kobus/docs/irscp.inm.pdf