An update on HNTES
M. Veeraraghavan, University of Virginia (UVA), mvee@virginia.edu
Chris Tracy, ESnet, ctracy@es.net
Feb. 24, 2014
Thanks to the US DOE ASCR for grants DE-SC0002350 and DE-SC0007341 (UVA), and for DE-AC02-05CH11231 (ESnet)
Thanks to Brian Tierney, Chin Guok, Eric Pouyoul
UVA students: Zhenzhen Yan, Tian Jin, Zhengyang Liu, Hanke (Casey) Meng, Ranjana Addanki, Haoyu Chen, and Sam Elliott
Outline
Three main contributions:
- HNTES: Hybrid Network Traffic Engineering System
- AFCS: Alpha Flow Characterization System (or EFCS)
- QoS provisioning
Goal: Operationalize AFCS on ESnet5
Future work: feedback?
Contributions
- HNTES: Tested the hypothesis that if IP address prefixes extracted from offline analysis of completed alpha flows are used to redirect future alpha flows to traffic-engineered MPLS LSPs, the solution will be effective
- AFCS: Characterize alpha flows (size, duration)
- QoS provisioning: Requested support for rate-unspecified circuits, since policing can throttle throughput
  - Two new classes added in the new ESnet QoS document: Best-Effort Circuit Class (different from Best-Effort Class) and Assured Forwarding Class
Publications
Published:
- Z. Yan, M. Veeraraghavan, C. Tracy, C. Guok, "On how to provision Quality of Service (QoS) for large dataset transfers," CTRQ 2013, Best Paper Award
- T. Jin, C. Tracy, M. Veeraraghavan, Z. Yan, "Traffic Engineering of High-Rate Large-Sized Flows," IEEE HPSR 2013
- Z. Liu, M. Veeraraghavan, Z. Yan, C. Tracy, J. Tie, I. Foster, J. Dennis, J. Hick, Y. Li and W. Yang, "On using virtual circuits for GridFTP transfers," IEEE SC2012, Nov. 10-16, 2012
- Z. Yan, C. Tracy, M. Veeraraghavan, "A hybrid network traffic engineering system," IEEE HPSR 2012, June 24-27, 2012
Submitted: two journal papers and one conference paper
HNTES vs. AFCS
Goal of HNTES: identify IP addresses of data transfer nodes that were sourcing/sinking alpha flows
- Analyzes only single NetFlow records (one generated per minute per flow)
Goal of AFCS: characterize the size, rate and duration of alpha flows
- Requires concatenation of multiple NetFlow records to characterize individual flows
- Not aggregation as done by commercial tools
AFCS
AFCS work is newer: current focus
Easier to operationalize than HNTES:
- HNTES requires an additional step to redirect flows to the AF class through firewall filter configuration
- Needs new work for the ALU routers; previous QoS experiments were on Junipers
Goal: characterize alpha flows
- Determine size (bytes), duration, rate
AFCS Algorithm
Find NetFlow records for all gamma flows:
- A gamma flow is defined to be a flow that has at least one "Large" NetFlow record
- Large NetFlow record: size > threshold (1 GB)
- Maximum duration of a NetFlow record is 1 min because of the "active timeout interval" value configured in ESnet routers
Start the concatenation procedure to reconstruct "flows" out of "records"
Use size/rate thresholds to find alpha flows
- e.g., 10 GB and 200 Mbps
Step 1: Finding NetFlow records of gamma flows
- Find all Large NetFlow records
- Extract the five-tuple IDs of these Large records: srcIP, dstIP, srcport, dstport, protocol
- Find all Small NetFlow records corresponding to those five-tuple IDs
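Step 1 can be sketched as follows. This is a minimal sketch, not the AFCS implementation: the Record layout, function names, and in-memory list of records are assumptions; only the 1 GB "Large" threshold and the five-tuple matching come from the slides.

```python
# Sketch of Step 1: keep every NetFlow record whose five-tuple matches
# at least one "Large" (> 1 GB) record. Record layout is hypothetical.
from collections import namedtuple

Record = namedtuple(
    "Record", "src_ip dst_ip src_port dst_port proto nbytes first_ts last_ts")

LARGE_BYTES = 10**9  # "Large" NetFlow record threshold: 1 GB


def five_tuple(r):
    return (r.src_ip, r.dst_ip, r.src_port, r.dst_port, r.proto)


def gamma_records(records):
    """Records of gamma flows: flows with at least one Large record."""
    gamma_ids = {five_tuple(r) for r in records if r.nbytes > LARGE_BYTES}
    return [r for r in records if five_tuple(r) in gamma_ids]
```

In practice the records would come from nfdump output rather than a Python list, but the filtering logic is the same.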
Step 2: Concatenation procedure (using an example)
- All records (reports) observed on the same day
- Time gap between the last-pkt timestamp of one record and the first-pkt timestamp of the next record must be < 1 min for grouping
(Timeline example from the slide: a 180 ms gap groups two records into one flow; gaps of 889.798 sec and 40665 sec separate distinct flows.)
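The grouping rule above can be sketched as follows. The Rec type is a hypothetical stand-in for one five-tuple's NetFlow records; the 60-second gap threshold follows the 1-min active-timeout reasoning on the slide.

```python
# Sketch of Step 2: concatenate one five-tuple's records into flows.
# A new flow starts whenever the gap between the last-packet timestamp
# of one record and the first-packet timestamp of the next is >= 1 min.
from collections import namedtuple

Rec = namedtuple("Rec", "first_ts last_ts nbytes")  # timestamps in seconds


def concatenate(records, max_gap=60.0):
    flows, current = [], []
    for r in sorted(records, key=lambda r: r.first_ts):
        if current and r.first_ts - current[-1].last_ts >= max_gap:
            flows.append(current)  # gap too large: close the current flow
            current = []
        current.append(r)
    if current:
        flows.append(current)
    return flows
```

On the slide's example, a 180 ms gap keeps two records in one flow, while a 40665 sec gap starts a new one.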
Step 3: Find alpha flows
Total size of each gamma flow:
- Sum the sizes of the concatenated NetFlow records and multiply by 1000
- Packet sampling rate: 1-in-1000
Total duration of each gamma flow:
- Last packet timestamp of the last NetFlow record minus first packet timestamp of the first NetFlow record in the group
Rate: size/duration
Alpha flows: gamma flows whose size and rate exceed preset thresholds
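The size/duration/rate computation can be sketched as below; the x1000 factor undoes the 1-in-1000 packet sampling, and the default thresholds are the slide's example values (10 GB, 200 Mbps). The Rec fields and function names are assumptions.

```python
# Sketch of Step 3: compute per-flow size, duration and rate, then
# apply size/rate thresholds to pick out alpha flows.
from collections import namedtuple

Rec = namedtuple("Rec", "first_ts last_ts nbytes")

SAMPLING = 1000  # 1-in-1000 packet sampling at the routers


def flow_stats(flow):
    """flow: time-ordered list of concatenated NetFlow records."""
    size = SAMPLING * sum(r.nbytes for r in flow)        # bytes, scaled up
    duration = flow[-1].last_ts - flow[0].first_ts       # seconds
    rate = 8 * size / duration if duration > 0 else 0.0  # bits/sec
    return size, duration, rate


def is_alpha(flow, min_size=10e9, min_rate=200e6):
    size, _, rate = flow_stats(flow)
    return size >= min_size and rate >= min_rate
```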
Validated algorithm
Because of the NetFlow packet sampling rate, we needed to validate our size/duration computation algorithm:
- Found GridFTP logs from a NERSC data transfer node
- Found the corresponding NetFlow records from an ESnet router
- Found additional NetFlow records with the same flow IDs
- Applied the algorithm to find size/duration of flows from the NetFlow records
- Recreated "sessions" from the GridFTP transfer logs (-fast option: multiple files transferred on one TCP connection); found session size and compared it with the flow size determined from NetFlow records
Accuracy is close to 100% but decreases with size; the size accuracy ratio is > 100% for smaller sizes.
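The comparison step can be sketched as follows. The GridFTP log fields and the session key are assumptions for illustration; the idea from the slide is that with -fast, several file transfers share one TCP connection and are summed into a single session whose size is compared against the NetFlow-derived flow size.

```python
# Sketch of the validation: rebuild GridFTP "sessions" (multiple files
# on one TCP connection under -fast) and compare each session's size
# with the flow size derived from NetFlow records.
from collections import defaultdict


def session_sizes(log_entries):
    """log_entries: (src_host, dst_host, tcp_port, nbytes) tuples."""
    sessions = defaultdict(int)
    for src, dst, port, nbytes in log_entries:
        sessions[(src, dst, port)] += nbytes
    return dict(sessions)


def size_accuracy_pct(netflow_size, session_size):
    """> 100% means the sampled, x1000-scaled NetFlow size over-estimates."""
    return 100.0 * netflow_size / session_size
```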
NetFlow observation points (OPs) (data obtained from ESnet4: May-Nov. 2011)
- router-1, router-2: BNL and NERSC PE routers
- router-3: sunn-cr1 (REN peerings)
- router-4: eqx-sj (commercial peerings)
Characterization of flows (May-Nov. 2011 data)
Provider edge routers (downloads): router-1 (bnl), router-2 (nersc). Core routers (uploads to DOE labs): router-3 (sunn-cr1, REN peerings), router-4 (eqx-sj, commercial peerings).

                           router-1   router-2   router-3    router-4
# flows                    2868       5279       6325        16212
# unique flow src-dst      1479       1611       193         158
max flow size              633.3 GB   811.6 GB   233.6 GB    112.8 GB
max flow rate              5.1 Gbps   5.7 Gbps   0.97 Gbps   0.78 Gbps
longest flow               9 hr       8.8 hr     3.87 hr     2.77 hr
Size (MB) of flows (May-Nov. 2011 data)
Provider edge routers (downloads): router-1, router-2. Core routers (uploads to DOE labs): router-3 (REN peerings), router-4 (commercial peerings).

           router-1   router-2   router-3   router-4
Min        1001       1005       1010
1st Qu.    1149       1540       4050       1203
Median     1275       2869       4360       1532
Mean       2513       9046       17540      3612
3rd Qu.    1701       8768       21380      3772
Max        633300     811600     233600     112800
IQR        552        7227       17330      2569
CV         5.20       2.56       1.4        2.43
skewness   25.35      12.56      2.37       10.09

(Callouts on the slide mark the max values: 811 GB at router-2 vs 112 GB at router-4.)
Duration (s) of flows (May-Nov. 2011 data)
Provider edge routers (downloads): router-1, router-2. Core routers (uploads to DOE labs): router-3 (REN peerings), router-4 (commercial peerings).

           router-1   router-2   router-3   router-4
Min        4.21       28.04      49.55      12.03
1st Qu.    41.85      60.94      190.9      54.97
Median     54.17      121.1      272        94.28
Mean       122.8      414.2      1098       235.6
3rd Qu.    73.58      398.9      1169       227.6
Max        32460      31910      13940      9978
IQR        31.73      338.01     977.94     172.67
CV         7.39       2.34       1.50       3.18
skewness   23.76      10.33      2.32       10.99

The mean is above the median under right (positive) skew. (Callouts mark the extremes: 9 hours at router-1 vs 2.8 hours at router-4.)
Rate (Mbps) of flows (May-Nov. 2011 data)
Provider edge routers (downloads): router-1, router-2. Core routers (uploads to DOE labs): router-3 (REN peerings), router-4 (commercial peerings).

           router-1   router-2   router-3   router-4
Min        11.7       3.63       4.64       9.2
1st Qu.    160.9      147        117.6      130.9
Median     199.3      181.9      132.6      156.4
Mean       245.2      230.9      159        182.7
3rd Qu.    258.9      252.1      159.2      195.8
99%        881        944        503        649
Max        5154       5757       979        776
CV         0.71       0.72       0.56       0.61
skewness   7.36       3.95       3.82       2.86
Characterization of flows (May-Nov. 2011 data)
Results: number of flows over 214 days (sensitivity to the size-rate threshold)

size    rate       per-router counts (Router-1 through Router-4)
10 GB   100 Mbps   52654607263
10 GB   150 Mbps   39941212971
10 GB   180 Mbps   37530371240
10 GB   200 Mbps   3572443920
50 GB   200 Mbps   19505280
80 GB   500 Mbps   02000
Persistency measure (May-Nov. 2011 data)
CDF of the number of gamma flows and of flows exceeding 5 GB and 100 Mbps, per src/dst pair (the router-1 plot is close to the router-2 plot and hence omitted).
Discussion
- Largest-sized flow rate: 301 Mbps; fastest-flow size: 7.14 GB; longest-flow size: 370 GB
- At the low end, one 1.9 GB flow lasted 4181 sec
- High skewness in size for downloads
- Larger-sized flows for downloads than uploads, and more frequent
- Max numbers of flows per src-dst pair were (2913, 1596) for router-2 (nersc)
- The amount of data analyzed is a small subset of our total dataset, both in time and in the number of routers analyzed. Concatenating flows is somewhat of an intensive task, so we tried to choose routers that would be representative.
Potential application
Find src-dst pairs that are experiencing high variance in throughput, to initiate diagnostics and improve the user experience:
- In the 2913-flow set between the same src-dst pair, 75% of the flows experienced less than 161.2 Mbps, while the highest rate experienced was 1.1 Gbps (size: 3.5 GB).
- In the 1596-flow set, 75% of the flows experienced less than 167 Mbps, while the highest rate experienced was 536 Mbps (size: 11 GB).
Other applications
- Identify suboptimal paths
  - Science flows should typically enter ESnet via REN peerings, but some of the observed alpha flows at eqx-sj could have occurred because of suboptimal BGP configurations
  - Correlate AFCS findings with BGP data
- HNTES: traffic engineering of alpha flows
Ongoing work
ESnet4 upgrade to ESnet5 (2012):
- Juniper to ALU routers
- NetFlow v5 to NetFlow v9
- flow-tools to nfdump
Rewrote the AFCS code
Running on an ESnet VM
- Crypto-PAn IP address anonymization
Demo: D3.js GUI (preliminary)
Numbers for Oct. 1-Nov. 12, 2013 data from bnl-mr2 (24255 flows)

           Size (MB)   Duration (sec)   Rate (Mbps)
Min        1000        8                15
1st Qu.    1992        19.2             227
Median     2300        22.25            651
Mean       4147        80.18            1217
3rd Qu.    4839        65               938.8
90%        6400        259              2626
99%        23396       381              9249
99.9%      40721       1904             10129
Max        313600      36190            10670
IQR        2847.25     45.8             711.8
CV         1.44        5.15             1.48
skewness   17.19       6                12.87

Notes: the Max values are not from the same flow; the max rate reflects a size over-estimate; values have increased relative to the 2011 data.
Feedback?
Goal: Integrate operational AFCS output with my.es.net
Current plan:
- Run the software every night, computing numbers for the gamma flows observed that day
- Pre-calculate last-24-hours, last-7-days, and last-30-days JSON files for quick visualization
- Per-site alpha flows (configurable thresholds)
- Store gamma-flow information in an SQL database for easier querying of other types of requests
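The nightly pre-calculation in the plan above could look roughly like this sketch. The file names, JSON layout, and flow-record fields are illustrative assumptions, not the actual my.es.net integration.

```python
# Sketch: nightly job that precomputes last-24h / 7-day / 30-day
# summaries of flows as JSON files for quick visualization.
import json
import time

WINDOWS = {"24hours": 1 * 86400, "7days": 7 * 86400, "30days": 30 * 86400}


def write_rollups(flows, outdir=".", now=None):
    """flows: dicts with 'end_ts' (epoch secs), 'size' (bytes), 'rate' (bps)."""
    now = time.time() if now is None else now
    for name, span in WINDOWS.items():
        recent = [f for f in flows if now - f["end_ts"] <= span]
        summary = {
            "window": name,
            "flow_count": len(recent),
            "total_bytes": sum(f["size"] for f in recent),
        }
        with open(f"{outdir}/alpha-{name}.json", "w") as fp:
            json.dump(summary, fp)
```

Pre-computing these small files once per night keeps the GUI's page loads independent of the size of the underlying flow database.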