TCOM 5143 Lecture 9: Traffic and cost generators
1. Introduction

Traffic and cost generators are useful when actual traffic and cost data are not available. They also allow sensitivity analysis of candidate designs over a range of network design problems with varying traffic and cost data.
2. The Structure of a network design problem

The input to an algorithm that solves a network design problem, either optimally or heuristically (approximately), usually consists of six components organized as tables.

1. Sites table: contains information about the sites, such as names, locations, and types (e.g., terminal, concentrator, or backbone).
2. Line type table: contains information about the lines/links available for the design, such as types (e.g., fiber, satellite) and their capacities (speeds).
3. Traffic table: contains information about the traffic flow (Erlangs, bps, etc.) between sites.
4. Tariff table: contains information about the costs of the links available for the design.
5. Equipment table: contains information about the cost and capacity of the equipment placed at every site (see Appendix D in Cahn textbook).
6. Parameters table: contains information about the network parameters guiding the design (e.g., topology, link utilization, message length).

Example: see the Cahn textbook.
3. The sites table for network generators

Design Principle 4.1. The first thing a network designer needs to know is the location of the sites to be connected, just as a builder needs a survey of a building site.

Information contained in a sites table is:

1. identity of a site (location, name, etc.)
2. type (functionality) of a site (terminal, concentrator, backbone, server, etc.)
3. geographical information of a site
4. coordinates of a site
5. parent (homing site) of a site
6. population (user population or census population) of a site
7. traffic-out of a site
8. traffic-in of a site
9. level of a site in the level hierarchy (traffic from a higher-level site to a lower-level site exceeds the traffic in the reverse direction; see section 4.4 in Cahn textbook)
4. Traffic generators

A traffic generator produces a traffic matrix consisting of realistic traffic data from every site to every other site. We study various models that efficiently generate traffic data.

The traffic matrix has one row for each source site u and one column for each destination site v; the entry in row u and column v is traffic(u, v).
Uniform traffic

Traffic data can be generated by setting traffic(u, v) equal to a constant for all sites u and v. Uniform traffic is unrealistic.
Random Traffic

Assume that rand() is a (pseudo-)random function with values in S = {0, 1, ..., M}, i.e., each element of S is selected with the same constant probability $1/(M+1)$. Random traffic from a site u to a site v in the range $[t_{\min}, t_{\max}]$ can be generated by

$$\text{traffic}(u, v) = t_{\min} + (t_{\max} - t_{\min}) \cdot \frac{\text{rand}()}{M}$$

Note that traffic(u, v) = t_min when rand() = 0, and traffic(u, v) = t_max when rand() = M.
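A minimal sketch of the random traffic generator follows. It assumes the linear mapping t_min + (t_max - t_min) * rand() / M, which matches the two endpoints noted above. The function name random_traffic, the default M = 1000, the seed argument, and the choice to set the diagonal to zero are illustrative assumptions, not part of the lecture.

```python
import random


def random_traffic(sites, t_min, t_max, M=1000, seed=None):
    """Return {(u, v): traffic} with off-diagonal entries in [t_min, t_max]."""
    rng = random.Random(seed)  # seeded generator so a run can be repeated
    traffic = {}
    for u in sites:
        for v in sites:
            if u == v:
                # Assumption: a site sends no traffic to itself over the network.
                traffic[(u, v)] = 0.0
            else:
                r = rng.randint(0, M)  # uniform on {0, 1, ..., M}
                traffic[(u, v)] = t_min + (t_max - t_min) * r / M
    return traffic


demo_random = random_traffic(["A", "B", "C"], t_min=100.0, t_max=500.0, seed=1)
```

Because each entry is drawn independently, traffic(u, v) and traffic(v, u) generally differ.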
Realistic Traffic Models

For sites u and v with populations (user or census) pop_u and pop_v at a distance dist(u, v), where u and v have vertical and horizontal coordinates (vert_u, hori_u) and (vert_v, hori_v), respectively, a realistic traffic assignment is

$$\text{traffic}(u, v) = \alpha \cdot \frac{(\text{pop}_u \cdot \text{pop}_v)^{\text{pop\_power}}}{\text{dist}(u, v)^{\text{dist\_power}}}$$

where pop_power and dist_power are properly chosen exponents and $\alpha$ is a scaling factor (forcing traffic(u, v) into an acceptable range).
Example 1: pop_power = 1 and dist_power = 0 (as in the Three-Location Data Network Problem in Lecture Notes 4).

There are two problems with the above formulation.

1. The numerical value of the numerator $(\text{pop}_u \cdot \text{pop}_v)^{\text{pop\_power}}$ may not be in a manageable range relative to the denominator.
2. The denominator $\text{dist}(u, v)^{\text{dist\_power}}$ may be zero.
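The basic model can be sketched as below, using the numerator and denominator named in the two problems above. The helper names dist and gravity_traffic, the scaling argument alpha, and the use of plain Euclidean distance on the (vert, hori) coordinates are assumptions made for illustration; a coordinate system such as V&H may need a different conversion to distance.

```python
import math


def dist(a, b):
    """Assumed Euclidean distance between two (vert, hori) coordinate pairs."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


def gravity_traffic(pop_u, pop_v, d, alpha=1.0, pop_power=1.0, dist_power=1.0):
    """Basic model: alpha * (pop_u * pop_v)^pop_power / d^dist_power.

    Raises ZeroDivisionError when d == 0 and dist_power > 0, which is the
    second problem listed above.
    """
    return alpha * (pop_u * pop_v) ** pop_power / d ** dist_power


# Example 1 setting: pop_power = 1 and dist_power = 0, so distance is ignored.
example_1 = gravity_traffic(100, 200, dist((0, 0), (3, 4)), pop_power=1, dist_power=0)
```

With dist_power = 0 the traffic depends only on the product of the two populations, which is why the value can be very large before scaling.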
We can normalize the sizes of pop_u, pop_v, and dist(u, v) as follows. Let

pop_max = max{pop_w : w is a site},
dist_max = max{dist(x, y) : x and y are sites}, and
pop_offset and dist_offset be properly chosen small positive real numbers.
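We can improve the traffic assignment to

$$\text{traffic}(u, v) = \alpha \cdot \frac{\left(\dfrac{\text{pop}_u \cdot \text{pop}_v}{\text{pop\_max}} + \text{pop\_offset}\right)^{\text{pop\_power}}}{\left(\dfrac{\text{dist}(u, v)}{\text{dist\_max}} + \text{dist\_offset}\right)^{\text{dist\_power}}}$$

The factor-adjustments of 1/pop_max and 1/dist_max normalize the population and distance values, respectively, and the sum-adjustment of dist_offset avoids division by zero when dist(u, v) = 0.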
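The sketch below shows one way to apply the normalization. It assumes the population product is divided by pop_max, the distance is divided by dist_max, and the offsets are added before the exponents are applied; the exact placement of the offsets, the function name normalized_traffic, and the default values are assumptions, not taken from the lecture.

```python
import math


def normalized_traffic(pop_u, pop_v, d, pop_max, dist_max, alpha=1.0,
                       pop_power=1.0, dist_power=1.0,
                       pop_offset=0.01, dist_offset=0.01):
    """Normalized model with offsets (assumed form, see lead-in)."""
    pop_term = (pop_u * pop_v) / pop_max + pop_offset   # factor 1/pop_max
    dist_term = d / dist_max + dist_offset              # never zero for d >= 0
    return alpha * pop_term ** pop_power / dist_term ** dist_power


# Two sites at the same location (d = 0) no longer cause a division by zero.
same_place = normalized_traffic(100, 100, 0.0, pop_max=200, dist_max=50.0)
far_apart = normalized_traffic(100, 100, 50.0, pop_max=200, dist_max=50.0)
```

A smaller dist_offset gives co-located sites a larger share of the traffic, so the offset is itself a tuning parameter.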
5. Normalization of Traffic Matrices

The traffic generation models described above do not address the consistency between the generated traffic and the total traffic observed for a site or for the overall network. We now learn how to normalize a generated traffic matrix so that it agrees with the observed traffic.

Example: A traffic generator produces the following incoming traffic to site A from four other sites: 100, 200, 150, and 100. Suppose we know that the total incoming traffic to site A is 600. How should we adjust the individual incoming flows so that they sum to the observed total?
Total Normalization

Suppose that we have a traffic matrix T from a traffic generator (Sections 4.1, 4.2, or 4.3). How should we normalize T so that it agrees with an observed (known) traffic_total? A simple way is to scale T. Since the underlying traffic generator produces entries T(u, v) (where u and v are sites) that reflect the relative traffic flows between sites, this property is preserved in $\alpha T$ for every positive constant $\alpha$. Thus, we determine a total-scale factor $\alpha$ so that $\alpha T$ reflects the observed traffic_total, i.e.,

$$\alpha = \frac{\text{traffic\_total}}{\sum_{u, v} T(u, v)}$$
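Total normalization can be sketched as follows, assuming the matrix is stored as a list of rows and the scale factor is the observed total divided by the sum of all generated entries. The function name total_normalize and the demo values are illustrative.

```python
def total_normalize(T, traffic_total):
    """Scale every entry of T so that the entries sum to traffic_total."""
    generated_total = sum(sum(row) for row in T)
    alpha = traffic_total / generated_total  # total-scale factor
    return alpha, [[alpha * x for x in row] for row in T]


T_generated = [[0, 2, 2],
               [1, 0, 3],
               [1, 1, 0]]
alpha_total, T_scaled = total_normalize(T_generated, traffic_total=50)
```

Scaling by a single constant keeps every ratio T(u, v) / T(x, y) unchanged, so the relative flows produced by the generator are preserved.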
Example 4.1 (p. 108 in Cahn textbook)

Suppose that a company has 50 sites linked by 85 E1 lines (2048 kbps each). The average number of hops in a route is 2.75, and the links have an average utilization of 55%. What value of $\alpha$ should be chosen to generate the traffic?

Since we are not given the traffic matrix T yet, we can only express $\alpha$ as

$$\alpha = \frac{\text{traffic\_total}}{\sum_{u, v} T(u, v)}$$

With additional information on the populations and coordinates of the sites, a traffic generator can generate T. Hence $\alpha$ can then be determined.
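But how do we determine the value of traffic_total? Recall from Lecture Notes 5 (Scalability Problem for MSTs) that

$$\overline{\text{hops}} = \frac{\sum_{\text{links } uv} \text{link\_flow}(u, v)}{\text{traffic\_total}}$$

where $\overline{\text{hops}}$ is the average number of hops in a route and link_flow(u, v) is the total amount of flow on the link uv. Hence

$$\text{traffic\_total} = \frac{\sum_{\text{links } uv} \text{link\_flow}(u, v)}{\overline{\text{hops}}}$$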
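Applied to Example 4.1, the relation can be sketched as below. It assumes that the total link flow equals the number of links times the link capacity times the average utilization, and that each link is counted once; the directions parameter is a hypothetical knob for counting both directions of a full-duplex link, which the example statement does not settle.

```python
import math


def traffic_total_from_links(n_links, capacity_bps, utilization, avg_hops,
                             directions=1):
    """traffic_total = (sum of link flows) / (average number of hops)."""
    total_link_flow = n_links * directions * capacity_bps * utilization
    return total_link_flow / avg_hops


# 85 E1 lines at 2,048,000 bps, 55% utilization, 2.75 hops on average.
example_total = traffic_total_from_links(85, 2_048_000, 0.55, 2.75)
```

Under the one-direction assumption this gives about 34.8 Mbps; setting directions=2 doubles it.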
Row Normalization

What if we want to normalize T so that it reflects an observed traffic_out_v for a site v? Instead of scaling the entire matrix T, we scale only the row of T indexed by v. Denote by $\alpha_v$ the row-scale factor for the site v with observed traffic_out_v. Then

$$\alpha_v = \frac{\text{traffic\_out}_v}{\sum_{\text{all sites } u} T(v, u)}$$

Note that the summation in the denominator, $\sum_{\text{all sites } u} T(v, u)$, gives the row-sum of the row indexed by v, which measures the total generated out-bound traffic from the site v. Other rows in T can be normalized (with respect to their corresponding traffic_out) similarly.
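A minimal sketch of row normalization, assuming T is a list of rows with rows indexed by the source site and that the row-scale factor is the observed traffic_out divided by the generated row-sum. The function name row_normalize and the demo matrix are illustrative.

```python
def row_normalize(T, v, traffic_out_v):
    """Scale row v of T so that it sums to the observed traffic_out_v."""
    row_sum = sum(T[v])                 # generated out-bound traffic of site v
    factor = traffic_out_v / row_sum    # row-scale factor for site v
    scaled = [row[:] for row in T]      # copy so the generated matrix is kept
    scaled[v] = [factor * x for x in T[v]]
    return factor, scaled


T_rows = [[0, 10, 30],
          [5, 0, 15],
          [20, 20, 0]]
row_factor, T_row_scaled = row_normalize(T_rows, v=0, traffic_out_v=100)
```

Only the chosen row changes, so the in-bound totals of the other sites shift as a side effect.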
Column Normalization

Similar to row normalization, we denote by $\beta_v$ the column-scale factor for a site v with observed traffic_in_v. Then

$$\beta_v = \frac{\text{traffic\_in}_v}{\sum_{\text{all sites } u} T(u, v)}$$

Note that the summation in the denominator, $\sum_{\text{all sites } u} T(u, v)$, gives the column-sum of the column indexed by v, which measures the total generated in-bound traffic into the site v. Other columns in T can be normalized (with respect to their corresponding traffic_in) similarly.
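The earlier example (generated incoming traffic of 100, 200, 150, and 100 into site A, observed total 600) can be worked as a column normalization. The sketch assumes site A is stored at index 0 and that all other entries are zero for simplicity; the function name column_normalize is illustrative.

```python
import math


def column_normalize(T, v, traffic_in_v):
    """Scale column v of T so that it sums to the observed traffic_in_v."""
    col_sum = sum(row[v] for row in T)  # generated in-bound traffic of site v
    factor = traffic_in_v / col_sum     # column-scale factor for site v
    scaled = [row[:] for row in T]
    for row in scaled:
        row[v] *= factor
    return factor, scaled


# Column 0 holds the traffic into site A from the four other sites.
T_into_A = [[0, 0, 0, 0, 0],
            [100, 0, 0, 0, 0],
            [200, 0, 0, 0, 0],
            [150, 0, 0, 0, 0],
            [100, 0, 0, 0, 0]]
factor_A, T_col_scaled = column_normalize(T_into_A, v=0, traffic_in_v=600)
incoming_A = [row[0] for row in T_col_scaled[1:]]
```

The factor is 600/550, so each incoming flow grows by about 9% and the four flows keep their relative sizes.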
Row and column normalization

Here we know the total traffic into and out of each site in the network, and we want to normalize the traffic produced by a traffic generator so that it agrees with all of these totals. A necessary condition for row and column normalization is that the total traffic out of all sites equals the total traffic into all sites:

$$\sum_{\text{all sites } v} \text{traffic\_out}_v = \sum_{\text{all sites } v} \text{traffic\_in}_v$$

The method for row and column normalization is not covered in this lecture. If you are interested, consult the Cahn textbook.
Asymmetric traffic flows and their traffic model

The uniform and realistic traffic models studied so far generate symmetric traffic, that is, T(i, j) = T(j, i) for all sites i and j, where T is the traffic matrix generated. To model asymmetric traffic, we assign to each site a level that is proportional to its outgoing traffic. In general, all sites of the same type are assigned the same level. For example, all terminal sites are assigned level 1 and all web server sites are assigned level 2, etc. Assume that we have information about the ratio of the traffic from a site u to a site v to the traffic from site v to u; then the traffic levels are chosen so that

$$\frac{\text{traffic\_level}(u, v)}{\text{traffic\_level}(v, u)} = \frac{\text{traffic from } u \text{ to } v}{\text{traffic from } v \text{ to } u}$$
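For example, if site u is a web server and v is a terminal, and if a request generated by the terminal to the server is on average 600 bytes and the response from the server is on average 5,000 bytes long, then, taking traffic_level(v, u) = 1, we get traffic_level(u, v) = 5000/600 ≈ 8.3. Note that traffic_level(u, u) = 0, since a site (a terminal, for example) does not send traffic to itself through the network. In general, the formula to generate asymmetric traffic is

$$\text{traffic}(u, v) = \alpha \cdot \text{traffic\_level}(u, v) \cdot \frac{\left(\dfrac{\text{pop}_u \cdot \text{pop}_v}{\text{pop\_max}} + \text{pop\_offset}\right)^{\text{pop\_power}}}{\left(\dfrac{\text{dist}(u, v)}{\text{dist\_max}} + \text{dist\_offset}\right)^{\text{dist\_power}}}$$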
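One way to sketch asymmetric generation is to multiply a symmetric base matrix by a level factor looked up from the levels of the source and destination sites. The function name asymmetric_traffic, the base matrix, and the level table below are illustrative assumptions; the table uses a 10-to-1 ratio between the two directions purely as an example.

```python
def asymmetric_traffic(base, levels, traffic_level):
    """Multiply base[u][v] by traffic_level[(level of u, level of v)]."""
    n = len(base)
    return [[traffic_level[(levels[u], levels[v])] * base[u][v]
             for v in range(n)]
            for u in range(n)]


# Site 0 is a server (level 2); sites 1 and 2 are terminals (level 1).
site_levels = [2, 1, 1]
level_table = {(2, 1): 10, (1, 2): 1, (1, 1): 0, (2, 2): 0}
symmetric_base = [[0, 4, 6],
                  [4, 0, 5],
                  [6, 5, 0]]
T_asym = asymmetric_traffic(symmetric_base, site_levels, level_table)
```

The server-to-terminal entries come out ten times larger than the terminal-to-server entries, and terminal-to-terminal traffic is suppressed by the zero level.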
6. A case study in the use of traffic generators

See Section 4.7 in Cahn textbook. Consider the network design problem of seven sites {N1, N2, ..., N7} given in Table 4.4 in Cahn textbook, in which N4 and N7 are host sites and the others are non-host sites. Five major applications are identified:

1. remote terminal/user sessions to the two hosts N4 and N7,
2. internal within seven sites,
3. external ,
4. Web activities (internal and external), and
5. database services by the two hosts N4 and N7 and an additional database server N1.
Based on Table 4.4 in Cahn textbook, we compute the distance matrix between the sites (see page 115 in Cahn textbook). For each major activity, we consider the underlying assumptions and develop the corresponding traffic matrix.

Remote Terminal/User Sessions

Assumptions:
1. Per hour: each non-host site has 50% of its site population engaged in remote terminal sessions.
2. Per hour, per person: sends 15 packets of 200 bytes to the remote host and receives 30 packets of 1,000 bytes from the remote host.
4. Level-1 sites are N1, N2, N3, N5, and N6, and level-2 sites are N4 and N7. There are 100 users in each of the sites N1, N2, N3, N4, and N5, and 200 users in each of the sites N6 and N7.
5. traffic_level(level-2 site, level-1 site) = 10 × traffic_level(level-1 site, level-2 site),
   traffic_level(level-1 site, level-1 site) = 0, and
   traffic_level(level-2 site, level-2 site) = 0.

traffic_total = (number of users at non-host sites) × 50% × (15 × 200 + 30 × 1,000) × (8/3600) bps = 600 × 0.5 × 33,000 × 8/3600 = 22,000 bps.

Parameter table: see Table 4.5 in Cahn textbook.
Traffic matrix: see page 116 in Cahn textbook.
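The arithmetic behind the 22,000 bps figure can be checked with a short script. The site populations and packet counts come from the assumptions above; the variable and function names are illustrative.

```python
import math

users = {"N1": 100, "N2": 100, "N3": 100, "N4": 100,
         "N5": 100, "N6": 200, "N7": 200}
hosts = {"N4", "N7"}


def bytes_per_hour_to_bps(n_bytes):
    """Convert a volume in bytes per hour to bits per second."""
    return n_bytes * 8 / 3600


non_host_users = sum(n for site, n in users.items() if site not in hosts)
active_users = 0.5 * non_host_users            # 50% are in a remote session
bytes_per_user = 15 * 200 + 30 * 1000          # sent plus received per hour
remote_total_bps = bytes_per_hour_to_bps(active_users * bytes_per_user)
```

The 30,000 received bytes against 3,000 sent bytes per user is also where the factor of 10 between the two traffic levels comes from.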
Internal

Assumptions:
1. Per hour, per person: sends 8 articles of 2,000 bytes each internally (including to the local site).
2. pop_power = 1.0 and dist_power = 0.5.

Notes:
1. traffic_total = (number of users in all sites) × (8 × 2,000) × (8/3600) bps = 900 × 16,000 × 8/3600 = 32,000 bps.
2. Because the total includes articles sent within a site, the generator must also produce traffic from each site to itself; these diagonal entries are then set to zero, since local traffic does not cross the network.

Parameter table: see Table 4.7 in Cahn textbook.
Traffic matrix: see page 119 in Cahn textbook.
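The two notes can be sketched together. The script below assumes that the generated matrix (including its diagonal) is first scaled to the 32,000 bps total and that the diagonal is cleared afterwards; this ordering is one reading of the notes, and the 2-by-2 matrix is made-up demo data.

```python
def zero_diagonal(T):
    """Return a copy of T with the site-to-itself entries set to zero."""
    return [[0.0 if u == v else x for v, x in enumerate(row)]
            for u, row in enumerate(T)]


internal_total_bps = 900 * (8 * 2000) * 8 / 3600   # 900 users, 8 articles each

# Made-up generated matrix that still contains local (diagonal) traffic.
T_raw = [[4.0, 1.0],
         [1.0, 2.0]]
scale = internal_total_bps / sum(sum(row) for row in T_raw)
T_network = zero_diagonal([[scale * x for x in row] for row in T_raw])
```

After the diagonal is cleared, the matrix sums to less than 32,000 bps, because the local share of the traffic never reaches the network.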
External

Assumptions:
1. Host sites N4 and N7 are Internet gateways.
2. Per hour, per person: sends 3 articles of 2,000 bytes each, and receives 6 articles of 2,000 bytes each.

This case is similar to Remote Terminal/User Sessions (also see [Cah98] Exercise 4.9).

Solution:
Incoming traffic_total = (number of users in the 5 non-host sites) × (6 × 2,000 × 8/3600) bps = 600 × 12,000 × 8/3600 = 16,000 bps.
Outgoing traffic_total = (number of users in the 5 non-host sites) × (3 × 2,000 × 8/3600) bps = 600 × 6,000 × 8/3600 = 8,000 bps.

The traffic can be divided in any fashion between the two gateways, but it is reasonable to assume that it divides evenly. Thus, the incoming external traffic is 1333 bps from each gateway to each of N1, N2, N3, and N5, and 2667 bps from each gateway to N6.
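The even split between the two gateways can be reproduced as follows, using the per-person figures from the assumptions. The dictionary layout and variable names are illustrative.

```python
import math

non_host_users = {"N1": 100, "N2": 100, "N3": 100, "N5": 100, "N6": 200}
gateways = ["N4", "N7"]

per_user_in_bps = 6 * 2000 * 8 / 3600    # 6 articles of 2,000 bytes per hour

# Incoming external traffic from each gateway to each non-host site,
# assuming the load divides evenly between the gateways.
external_in = {(g, site): n * per_user_in_bps / len(gateways)
               for g in gateways
               for site, n in non_host_users.items()}

incoming_total_bps = sum(external_in.values())
```

The outgoing direction follows the same pattern with 3 articles per person, giving half of each value.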
Web Activities

Assumptions:
1. Internal and external Web activities: 25% of Web activities are internal within the seven sites and 75% are external.
2. Per hour, per person: generates 23 Uniform Resource Locator (URL) requests; each request generates 5 datagrams of 128 bytes in each direction and an inbound transfer of a datagram of 3,500 bytes.
Internal Web Activities

Per hour, per person:
outbound_traffic = 25% × 23 × (5 × 128) × (8/3600) bps ≈ 8.18 bps, and
inbound_traffic = 25% × 23 × 3,500 × (8/3600) bps ≈ 44.72 bps.

Parameter table: see Table 4.9 in Cahn textbook.
Traffic matrices: Generate the initial outbound traffic matrix T and set all the diagonal entries to zero. See page 121 in Cahn textbook. Note that we cannot perform row and column normalization on T, since the necessary condition (total traffic out equals total traffic in) is not satisfied. The inbound traffic matrix is given by

$$T_{\text{in}} = \frac{\text{inbound\_traffic}}{\text{outbound\_traffic}} \cdot T^{t}$$

where $T^{t}$ denotes the transpose of T. The traffic matrix for internal Web activities is

$$T + T_{\text{in}}$$
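The construction from the transpose can be sketched as below. It assumes the inbound matrix is the transpose of the outbound matrix scaled by the ratio of the per-person inbound rate to the per-person outbound rate, and that the application matrix is the sum of the two; both the scaling rule and the demo numbers are assumptions made for illustration.

```python
def transpose(T):
    """Swap rows and columns, so entry (u, v) moves to (v, u)."""
    return [list(col) for col in zip(*T)]


def web_matrix(T_out, inbound_bps, outbound_bps):
    """Return (T_in, T_out + T_in) with T_in = ratio * transpose(T_out)."""
    ratio = inbound_bps / outbound_bps
    T_in = [[ratio * x for x in row] for row in transpose(T_out)]
    total = [[a + b for a, b in zip(r_out, r_in)]
             for r_out, r_in in zip(T_out, T_in)]
    return T_in, total


T_out_demo = [[0, 2],
              [6, 0]]
T_in_demo, T_web_demo = web_matrix(T_out_demo, inbound_bps=30, outbound_bps=10)
```

The transpose appears because a request from u to v triggers a response that flows from v back to u.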
External Web Activities

Assume that the two hosts N4 and N7 are gateways to the Internet. Also, assign the sites N1, N3, and N5 to N4, and the sites N2 and N6 to N7.

Per hour, per person:
outbound_traffic = 75% × 23 × (5 × 128) × (8/3600) bps ≈ 24.53 bps, and
inbound_traffic = 75% × 23 × 3,500 × (8/3600) bps ≈ 134.17 bps.

Parameter table: see Table 4.9 in Cahn textbook.
Traffic matrix: see page 122 in Cahn textbook.
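The per-person Web rates for both the internal (25%) and external (75%) shares can be computed with one helper. The sketch assumes the outbound rate counts the 5 datagrams of 128 bytes per request and the inbound rate counts only the 3,500-byte transfer; whether the inbound side should also include the 128-byte datagrams is not settled by the text above. The function name web_rates_bps is illustrative.

```python
def web_rates_bps(share, requests=23, datagrams=5, datagram_bytes=128,
                  transfer_bytes=3500):
    """Return (outbound, inbound) bps per person for a share of Web activity."""
    to_bps = 8 / 3600   # bytes per hour -> bits per second
    outbound = share * requests * datagrams * datagram_bytes * to_bps
    inbound = share * requests * transfer_bytes * to_bps
    return outbound, inbound


internal_out, internal_in = web_rates_bps(0.25)
external_out, external_in_rate = web_rates_bps(0.75)
```

Multiplying these per-person rates by the site populations gives the traffic totals used to scale the generated matrices.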
Database Services

See section 4.7.9 and the related sections in Cahn textbook.

The Complete Model for the Case Study

The complete traffic matrix for the design problem in the case study is the sum of the five traffic matrices obtained above. It is important to keep the individual tables after combining them into one table, because their availability allows the designers to do a more refined sensitivity study corresponding to individual changes.
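Keeping the per-application matrices and summing them on demand can be sketched as follows. The application names, the 2-by-2 demo matrices, and the helper names are made up for illustration; a real study would use the five 7-by-7 matrices from the case study.

```python
def add_matrices(matrices):
    """Entry-wise sum of equally sized square matrices."""
    n = len(matrices[0])
    return [[sum(m[u][v] for m in matrices) for v in range(n)]
            for u in range(n)]


def scale_matrix(T, factor):
    """Multiply every entry of T by factor."""
    return [[factor * x for x in row] for row in T]


per_application = {
    "remote": [[0, 1], [2, 0]],
    "web": [[0, 3], [4, 0]],
}
complete = add_matrices(list(per_application.values()))

# Sensitivity study: double the remote-session traffic only, then re-sum.
what_if = dict(per_application, remote=scale_matrix(per_application["remote"], 2))
complete_what_if = add_matrices(list(what_if.values()))
```

Because the individual matrices are retained, a change to one application does not require regenerating the others.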