
1 © 2010 AT&T Intellectual Property. All rights reserved. AT&T, the AT&T logo and all other AT&T marks contained herein are trademarks of AT&T Intellectual Property and/or AT&T affiliated companies. All other marks contained herein are the property of their respective owners. Mining Port-level IP Traffic Data Errol Caby AT&T Labs

2 Outline
- IP traffic metrics
- Exploring the relationship between IP traffic metrics
- Classifying IP traffic patterns
- Making IP traffic projections

3 IP Traffic Metrics
IP services such as VPN (Virtual Private Network) are provided through ports, which are identified by IP address and circuit ID. High utilization levels (high traffic levels compared to the port's bandwidth) may degrade these services. Consequently, it is valuable to analyze IP traffic data at the port level to identify or predict the ports that currently have high utilization or will have high utilization within a given period of time.
Two IP traffic metrics:
- Monthly utilization – the monthly utilization of a circuit is the average of the daily peak utilization for the month, where utilization measures the fraction (or percent) of bandwidth used.
- Hours of over-utilization – the hours of over-utilization of a circuit is the length of time (in hours) during a month that the utilization exceeds a specified threshold.
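As a rough illustration of the two definitions, the sketch below computes both metrics from a series of utilization samples. It assumes one sample per hour and whole days of data; neither the sampling interval nor the function names come from the slides.

```python
import numpy as np

def monthly_utilization(hourly_util, hours_per_day=24):
    """Average of the daily peak utilization (assumes whole days of hourly samples)."""
    u = np.asarray(hourly_util, dtype=float)
    days = u.reshape(-1, hours_per_day)  # one row per day
    return days.max(axis=1).mean()       # mean of the daily peaks

def hours_over_utilization(hourly_util, threshold):
    """Count of hourly samples above the threshold (one sample = one hour, by assumption)."""
    u = np.asarray(hourly_util, dtype=float)
    return int((u > threshold).sum())

# Hypothetical two-day series; utilization is a fraction of bandwidth.
day1 = [0.2] * 20 + [0.9, 0.85, 0.5, 0.3]
day2 = [0.1] * 22 + [0.7, 0.6]
sample = day1 + day2

m1 = monthly_utilization(sample)           # mean of the daily peaks 0.9 and 0.7
m2 = hours_over_utilization(sample, 0.8)   # samples 0.9 and 0.85 exceed 0.8
```

A real month would supply 28 to 31 rows instead of two; the threshold of 0.8 is only a placeholder.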

4 Exploring The Relationship Between The Two IP Traffic Metrics
Let m_1 and m_2 denote the monthly utilization and hours of over-utilization metrics, respectively. (That is, if x is a port, then m_1(x) and m_2(x) denote its monthly utilization and hours of over-utilization, respectively.) We would like to examine the relationship between m_1 and m_2; in particular, we would like to find a mapping f such that m_1(x) = f(m_2(x)) for any port x.
The challenge: the available data consisted of the two traffic metrics evaluated on disjoint sets of ports, i.e., monthly utilization was calculated for one set and hours of over-utilization was calculated for a different, disjoint set.

5 Exploring The Relationship Between The Two IP Traffic Metrics (cont.)
A definition – consistency: Let n_1 and n_2 be two metrics on a set W. We say that n_1 and n_2 are consistent on W if, for all u and v in W, n_1(u) < n_1(v) if and only if n_2(u) < n_2(v).
Assume that m_1 and m_2 (the monthly utilization and hours of over-utilization, respectively) are consistent on the set of ports y for which m_2(y) > 0. Then some consequences are the following:
- If Y is a set of ports y with m_2(y) > 0 for all y in Y, then f maps the p-th percentile in {m_2(y) | y in Y} into the p-th percentile in {m_1(y) | y in Y}, i.e., f maps percentiles into corresponding percentiles.
- Furthermore, if Y is a set of ports with m_2(y) > 0 for all y in Y, and if X is a set of ports on which m_1 has been evaluated such that {m_1(x) | x in X} and {m_1(y) | y in Y} can be considered to be samples from the same distribution (note that the values in {m_1(y) | y in Y} are assumed to be unknown but the values in {m_1(x) | x in X} are known), then the mapping f can be determined from the above result. That is, if m_2(y_0) is the p-th percentile in {m_2(y) | y in Y}, then f(m_2(y_0)) can be estimated by the p-th percentile in {m_1(x) | x in X}.
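The percentile-matching estimate described above can be sketched as follows. This is only an illustration under stated assumptions: the function name is hypothetical, the percentile rank of a value is taken as the share of values in Y that are less than or equal to it, and NumPy's default linear interpolation is used for percentiles. The slides do not specify either convention.

```python
import numpy as np

def estimate_mapping(m2_on_Y, m1_on_X):
    """Estimate f(m2(y)) for each y in Y by matching percentiles across the two samples."""
    m2 = np.asarray(m2_on_Y, dtype=float)
    m1 = np.asarray(m1_on_X, dtype=float)
    if np.any(m2 <= 0):
        raise ValueError("the method assumes m2(y) > 0 for every port in Y")
    # Percentile rank (0-100) of each m2 value within Y: share of values <= it.
    ranks = np.searchsorted(np.sort(m2), m2, side="right") / m2.size * 100.0
    # The same percentile of the known m1 values on X estimates f(m2(y)).
    return np.percentile(m1, ranks)

# Hypothetical data: over-utilization hours on Y, monthly utilization on X.
hours_Y = [10, 1, 20, 5]
util_X = [0.5, 0.6, 0.7, 0.8, 0.9]
estimates = estimate_mapping(hours_Y, util_X)
```

Because the ranks are monotone in m_2 and percentiles are monotone in the rank, the estimated points are non-decreasing in m_2, which is what consistency of the two metrics requires.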

6 Illustration – Exploring The Relationship Between The IP Traffic Metrics At The Circuit Level
[Figure: plot of estimated points of the mapping f. Axis labels: Over-Utilization Hrs, Monthly Utilization.]

7 Illustration – Exploring The Relationship Between The IP Traffic Metrics At The Circuit Level (cont.)
A closed form of the mapping f may be estimated through curve fitting. A good fit was found using a curve of the form [equation not preserved in the transcript].
[Figure: axis labels Over-Utilization Hrs, Monthly Utilization.]

8 Mining IP Traffic Patterns
Objective – devise an algorithm for mining the time series history of the monthly utilization for a large number of ports that:
- classifies the time series pattern for each port
- forecasts the monthly utilization a number of months into the future, port by port, in order to identify ports whose utilization would soon exceed the over-utilization threshold
A desirable quality is that the algorithm be simple, so that it runs quickly and places few requirements on the computing environment (e.g., it does not require any sophisticated computing platform).

9 Normalizing Utilization
The IP environment is dynamic; a port's bandwidth may change. Because monthly utilization expresses the percent of the bandwidth used, the monthly utilization must be adjusted to reveal the true pattern of the traffic. This can be done by normalizing monthly utilization, i.e., expressing it in terms of a single bandwidth for the entire time period considered.
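One plausible way to carry out this normalization is sketched below. The slides do not give a formula; the sketch assumes that utilization is traffic divided by the bandwidth in effect that month, so traffic can be recovered and re-expressed against one reference bandwidth. Using the most recent bandwidth as the reference is also an assumption, as are the function and variable names.

```python
import numpy as np

def normalize_utilization(utilization, bandwidth, reference_bandwidth=None):
    """Re-express monthly utilization against a single reference bandwidth."""
    u = np.asarray(utilization, dtype=float)
    bw = np.asarray(bandwidth, dtype=float)
    if reference_bandwidth is None:
        reference_bandwidth = bw[-1]    # assumption: use the latest bandwidth
    traffic = u * bw                    # recover traffic in bandwidth units
    return traffic / reference_bandwidth

# Hypothetical port whose bandwidth doubles in month 4.
util = [0.40, 0.60, 0.80, 0.45, 0.50]
bw = [100, 100, 100, 200, 200]
adjusted = normalize_utilization(util, bw)
```

In this made-up series the raw utilization drops in month 4 only because the bandwidth doubled, while the adjusted series rises steadily, which is the kind of difference in pattern that normalization is meant to expose.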

10 Example – Normalizing Traffic Patterns
The plot on the left is the original time series of monthly utilization; the plot on the right is the normalized monthly utilization. Note that the patterns are different.
[Figure: two panels. Left: Utilization by Month. Right: Adj. Utilization by Month.]

11 Traffic Pattern Classification (cont.)
To describe a port's traffic, the curve (from a small set of families of curves) that is closest to the traffic time series is found. This curve, together with the root-mean-square error, describes the traffic pattern: the curve gives the general trend of the traffic, and the root-mean-square error captures the fluctuation about this trend. The traffic pattern can consequently be classified according to the family to which its curve belongs.
For simplicity, the families of curves considered were 2-parameter families of the form y = a*f(x) + b, where f is a function of x. It was found that the three functions f(x) = x, f(x) = x^2 and f(x) = log_e(x) were sufficient to capture many of the patterns occurring. The resulting three families of curves are:
- y = a*x + b -- constant growth rate
- y = a*x^2 + b -- increasing growth rate
- y = a*log_e(x) + b -- slowing growth rate
Since the curves (models) are linear in the parameters, the best-fitting curve in a family can be found by the usual least squares technique. Also, since the curves all have two parameters, the best-fitting curve overall can be found by choosing the one that minimizes the residual sum of squares (equivalently, maximizes the coefficient of determination R^2).
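The classification step can be sketched as below: fit each family by ordinary least squares and keep the one with the smallest root-mean-square error. The names `FAMILIES`, `fit_family` and `classify` are illustrative, and taking x as a month index starting at 1 (so that the logarithm is defined) is an assumption not stated in the slides.

```python
import numpy as np

# The three transforms f(x) named in the slides; the labels are illustrative.
FAMILIES = {
    "linear": lambda x: x,          # constant growth rate
    "quadratic": lambda x: x ** 2,  # increasing growth rate
    "log": np.log,                  # slowing growth rate
}

def fit_family(x, y, f):
    """Least-squares fit of y = a*f(x) + b; returns (a, b, rmse)."""
    fx = f(np.asarray(x, dtype=float))
    y = np.asarray(y, dtype=float)
    a, b = np.polyfit(fx, y, 1)     # model is linear in a and b
    rmse = np.sqrt(np.mean((a * fx + b - y) ** 2))
    return a, b, rmse

def classify(x, y):
    """Return the best-fitting family name and its (a, b, rmse)."""
    fits = {name: fit_family(x, y, f) for name, f in FAMILIES.items()}
    best = min(fits, key=lambda name: fits[name][2])
    return best, fits[best]

# Hypothetical noise-free series with slowing growth.
months = np.arange(1, 13)
series = 0.05 * np.log(months) + 0.30
label, (a, b, rmse) = classify(months, series)
```

Since all three models have the same number of parameters and are fitted to the same points, ranking them by root-mean-square error gives the same winner as ranking by residual sum of squares.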

12 IP Traffic Projection
To evaluate how well the three families of curves succeeded in describing/differentiating traffic patterns and how well they predicted future traffic:
- the set of points (say n points) in the available time series was divided into two sets, the first n – k and the last k points, where k < n – k.
- the curve (from all three families of curves) that best fitted the first n – k points, i.e., the one with the smallest residual sum of squares (equivalently, the largest coefficient of determination R^2), was selected as the one describing the traffic pattern.
- the mean absolute error between this curve and the traffic time series, calculated for the last k points, was then compared with the corresponding mean absolute errors for the best-fitting curves (based on the first n – k points) from the other two families of curves.
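The evaluation procedure above might look like the following sketch. It fits each family to the first n – k points, selects the family with the smallest training error, and reports every family's mean absolute error on the last k points. The function name, the month index starting at 1, and the synthetic series are all assumptions made for illustration.

```python
import numpy as np

FAMILIES = {
    "linear": lambda x: x,
    "quadratic": lambda x: x ** 2,
    "log": np.log,
}

def holdout_evaluation(y, k):
    """Fit on the first n-k points; return the selected family and per-family errors."""
    y = np.asarray(y, dtype=float)
    n = y.size
    if not 0 < k < n - k:
        raise ValueError("need 0 < k < n - k")
    x = np.arange(1, n + 1, dtype=float)    # month index 1..n (assumption)
    results = {}
    for name, f in FAMILIES.items():
        fx = f(x)
        a, b = np.polyfit(fx[: n - k], y[: n - k], 1)   # fit on training part only
        pred = a * fx + b
        train_rmse = np.sqrt(np.mean((pred[: n - k] - y[: n - k]) ** 2))
        test_mae = np.mean(np.abs(pred[n - k:] - y[n - k:]))
        results[name] = (train_rmse, test_mae)
    selected = min(results, key=lambda name: results[name][0])
    return selected, results

# Hypothetical 22-month series with increasing growth; hold out the last 5 months.
x_all = np.arange(1, 23)
series = 0.001 * x_all ** 2 + 0.2
selected, results = holdout_evaluation(series, k=5)
```

The selected family describes the pattern well if its held-out mean absolute error is no larger than that of the other two families, which is the comparison the slides describe.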

13 IP Traffic Projection – Example 1
The best-fitting curve to the first 17 points is of the form y = a*log_e(x) + b. The mean absolute error between this curve and the last 5 points of the traffic time series is smaller than the mean absolute errors of the best-fitting curves from the other families.

14 IP Traffic Projection – Example 2
The best-fitting curve to the first 17 points is of the form y = a*x^2 + b. The mean absolute error between this curve and the last 5 points of the traffic time series is smaller than the mean absolute errors of the best-fitting curves from the other families.

15 Conclusion
Testing the algorithm on a small set of ports has yielded results suggesting that the three families of 2-parameter curves may be sufficient to capture the key elements of the traffic patterns. Full evaluation awaits the full-scale implementation of the algorithm.

