Oak Ridge National Laboratory, U.S. Department of Energy

Infrastructure and Protocols for Dedicated Bandwidth Channels
Nagi Rao, Computer Science and Mathematics Division, Oak Ridge National Laboratory, raons@ornl.gov
March 14, 2005
1st Annual Workshop of the Cyber Security and Information Infrastructure Research Group (CSIIR) and Information Operations Center (IOC), Oak Ridge, TN
Research sponsored by the Department of Energy, the National Science Foundation, and the Defense Advanced Research Projects Agency
Collaborators
- Steven Carter, Oak Ridge National Laboratory
- Leon O. Chua, University of California at Berkeley
- Jianbo Gao, University of Florida
- Qishi Wu, Oak Ridge National Laboratory
- William Wing, Oak Ridge National Laboratory

Sponsors
- Department of Energy, High-Performance Networking Program
- National Science Foundation, Advanced Network Infrastructure Program
- Defense Advanced Research Projects Agency, Network Modeling and Simulation Program
- Oak Ridge National Laboratory, Laboratory Directed R&D Program
Outline of Presentation
- Network infrastructure projects: DOE UltraScienceNet, NSF CHEETAH
- Dynamics and control of transport protocols: TCP AIMD dynamics, analytical results, experimental results, a new class of protocols, throughput stabilization for control transport
- Probabilistic quickest path problem: quickest path algorithm, probabilistic algorithm
Motivation for Networking Projects: Terascale Supernova Initiative (TSI)
Science objective: understand supernova evolution. A DOE large-scale science application.
DOE SciDAC project: ORNL and 8 universities; teams of field experts across the country (hydrodynamics, fusion energy, high-energy physics) collaborate on computations.
Massive computational code: currently about a terabyte of data is generated per day, archived at a nearby HPSS, and visualized locally on clusters using only archival data.
Desired network capabilities:
- Archive and supply massive amounts of data to supercomputers and visualization engines
- Monitor, visualize, collaborate, and steer computations (visualization channel, visualization control channel, steering channel)
DOE UltraScience Net: The Need
DOE large-scale science applications on supercomputers and experimental facilities require high-performance networking: petabyte data sets, collaborative visualization, and computational steering. Application areas span the disciplinary spectrum: high-energy physics, climate, astrophysics, fusion energy, genomics, and others.
Promising solution: a high-bandwidth, agile network capable of providing on-demand dedicated channels, from multiple 10s of Gbps down to 150 Mbps; protocols are simpler for high-throughput and control channels.
Challenges: several technologies need to be (fully) developed
- User-/application-driven agile control plane: dynamic scheduling and provisioning
- Security: encryption, authentication, authorization
- Protocols, middleware, and applications optimized for dedicated channels
Contacts: Bill Wing (wrw@ornl.gov), Nagi Rao (raons@ornl.gov)
DOE UltraScience Net
Connects ORNL, Chicago, Seattle, and Sunnyvale: dynamically provisioned, dedicated dual 10 Gbps SONET links.
Proximity to several DOE locations: SNS, NLCF, FNL, ANL, NERSC. Peering with ESnet, NSF CHEETAH, and other networks.
Data-plane user connections: direct connections to core switches (SONET channels) and MSPPs (Ethernet channels); utilize UltraScience Net hosts.
Funded by the U.S. DOE High-Performance Networking Program at Oak Ridge National Laboratory: $4.5M for 3 years.
Control Plane
Phase I: centralized VPN connectivity; TL1-based communication with core switches and MSPPs; user access via a centralized web-based scheduler.
Phase II: GMPLS direct enhancements and wrappers for TL1; user access via GMPLS and the web to the bandwidth scheduler; inter-domain GMPLS-based interface.
Web-based user interface and API: allows users to log on to a website and request dedicated circuits; based on CGI scripts.
Bandwidth scheduler:
- Computes a path with the target bandwidth if bandwidth is available now (an extension of Dijkstra's algorithm)
- Otherwise provides all available slots (an extension of the closed-semiring structure to sequences of reals)
- Both are polynomial-time algorithms; GMPLS does not have this capability
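The slide does not spell out the scheduler's path computation, but one plausible "extension of Dijkstra's algorithm" for the is-bandwidth-available-now check is a widest-path (maximum-bottleneck) search: a sketch, with an illustrative adjacency-map graph representation, not the scheduler's actual code.

```python
import heapq

def widest_path(graph, src, dst):
    """Dijkstra-style search maximizing the bottleneck (minimum-edge)
    available bandwidth of a path.  graph[u] maps neighbor -> available
    bandwidth on edge (u, v).  A path satisfies a bandwidth request if
    the returned bottleneck meets the target."""
    best = {src: float("inf")}           # best known bottleneck to each node
    heap = [(-best[src], src)]
    while heap:
        neg_w, u = heapq.heappop(heap)
        w = -neg_w
        if w < best.get(u, 0):
            continue                     # stale heap entry
        for v, cap in graph.get(u, {}).items():
            cand = min(w, cap)           # bottleneck along the extended path
            if cand > best.get(v, 0):
                best[v] = cand
                heapq.heappush(heap, (-cand, v))
    return best.get(dst, 0)

# two s-d routes: via "a" (bottleneck 3) and via "b" (bottleneck 2)
example = {"s": {"a": 5, "b": 2}, "a": {"d": 3}, "b": {"d": 10}}
```

A request for up to 3 units on this example graph would be admitted; a request for 4 would fall through to the slot-finding step.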
NSF CHEETAH: Circuit-switched High-speed End-to-End Transport ArcHitecture
Objective: develop the infrastructure and networking technologies to support a broad class of eScience projects, and specifically the Terascale Supernova Initiative.
Main technical components: optical network testbed; transport protocols; middleware and applications.
Collaborative project ($3.5M for 3 years): U. Virginia, ORNL, NC State, CUNY. Sponsor: National Science Foundation.
Contacts: Malathi Veeraraghavan (mv@cs.virginia.edu), Nagi Rao (raons@ornl.gov)
CHEETAH Project Concept
Network: create a network that offers applications on-demand, end-to-end dedicated bandwidth channels; it operates as a PARALLEL network to existing high-speed IP networks, NOT as an alternative.
Transport protocols: designed to take advantage of dedicated and dual end-to-end paths (an IP path plus a dedicated channel).
eScience application requirements: high-throughput file/data transfers; interactive remote visualization; remote computational steering; multipoint collaborative computation.
CHEETAH: Initial Configuration
(Network diagram: GbE/10GbE Ethernet switches connect hosts at NCSU and ORNL to MSPPs, each with control, OC192, and GbE/10GbE cards; an NCSU/MCNC/NLR MSPP links to an Atlanta (NLR/SOX) MSPP, with a connection to DC via DRAGON, which implements GMPLS protocols)
Peering: UltraScience Net and CHEETAH
Coast-to-coast dedicated channels; access to ORNL supercomputers and storage.
Applications: TSI on a larger scale.
Outline of Presentation
- Network infrastructure projects: DOE UltraScienceNet, NSF CHEETAH
- Dynamics and control of transport protocols: TCP AIMD dynamics, analytical results, experimental results, a new class of protocols, throughput stabilization for control transport
- Probabilistic quickest path problem: quickest path algorithm, probabilistic algorithm
Transport Dynamics Are Important
Data transport: high bandwidth for large data transfers over dedicated channels; a suitable sending rate must be maintained to achieve effective throughput.
Control of end devices: remote control of visualizations, computations, and instruments; jittery dynamics will destabilize the control loops, making it impossible to effectively execute interactive simulations.
Study of Transport Dynamics
Understanding of transport dynamics:
- Analytically showed that TCP-AIMD contains chaotic regimes (concept of the w-update map)
- Internet traces are shown to be both chaotic and stochastic; the underlying process is anomalous diffusion
Development and tuning of protocols:
- Protocols for stable flows of fixed rate (ONTCOU), based on the classical Robbins-Monro method
- Transport protocols with statistical stability (RUNAT), a combination of AIAD and the Kiefer-Wolfowitz method
Complicated TCP AIMD Dynamics: History
Simulation results: TCP-AIMD exhibits "complicated" trajectories
- TCP streams competing with each other (Veres and Boda, 2000)
- TCP competing with UDP (Rao and Chua, 2002)
Analytical results (Rao and Chua, 2002): TCP-AIMD has chaotic regimes; developed state-space analysis and Poincaré maps.
Internet measurements (2004): TCP-AIMD traces are a complicated mixture of stochastic and chaotic components.
Working definition of chaotic trajectories:
- Nearby starting points result in trajectories that move far apart at a rate determined by a positive Lyapunov exponent
- Trajectories are non-periodic for some starting points
- The attractor is geometrically complicated
Simplified View: Dynamics of TCP
Transmission Control Protocol (TCP) outline:
- Uses a window mechanism to send W bytes per RTT
- Dynamically adjusts W to network and receiver state: keeps increasing if there are no losses, shrinks when losses are detected
Slow-start phase: W increases exponentially until a threshold is reached or a loss occurs.
Congestion control: additively increase W with delivered packets (by 1/W per ACK); multiplicatively decrease on loss.
(Figure: window evolution over time, showing slow-start growth, congestion-control growth, and an early loss slowing throughput)
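The window dynamics described above can be sketched as a per-RTT simulation; the slow-start threshold, loss times, and initial window below are illustrative constants, not values from the slides.

```python
def aimd_trace(rtts=60, ssthresh=16, loss_at=frozenset({25, 40}), w0=1.0):
    """Simplified per-RTT evolution of the TCP congestion window W:
    exponential slow start up to ssthresh, then additive increase of
    +1 window per RTT (~1/W per ACK), halving at the RTTs in loss_at."""
    w, trace = w0, []
    for t in range(rtts):
        trace.append(w)
        if t in loss_at:
            w = max(1.0, w / 2)          # multiplicative decrease on loss
        elif w < ssthresh:
            w = min(2 * w, ssthresh)     # slow start: double per RTT
        else:
            w += 1                       # additive increase
    return trace                         # the classic saw-tooth profile
```

Iterating `aimd_trace()` reproduces the textbook saw-tooth: exponential growth to 16 over the first five RTTs, linear climb, then a halving at each loss event.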
Chaotic Dynamics of TCP
Competing TCP streams: window dynamics are chaotic
- Hard to predict: resemblance to random noise
- Hard to conclude from experiments: nearby orbits move far away later
- Hard to characterize: chaotic attractor
(Figures: Poincaré maps of two window sizes for the two-stream and four-stream cases)
Veres and Boda (2000) did not rigorously establish chaos in a formal sense: the attractor could have been generated by a periodic orbit with a large period. We repeated the simulation and found only quasi-periodic trajectories.
Noisy Nature of TCP (Simulation)
Simple random traffic generates complicated attractors: TCP reacts to the randomness of network traffic and to jittery end-to-end delays. Chaos is not needed to generate complicated attractors.
(Figure: Poincaré map of message delay vs. window size, for a TCP source sending through a router with uniform random drops to a destination)
TCP Competing with UDP (ns-2 Simulation)
As the CBR rate is varied, TCP competing with UDP/CBR at the router generates a variety of dynamics.
(Figure: TCP/Reno and UDP/CBR sources share a router with drop-tail queues over 2 Mb and 1.7 Mb, 10 ms links to a sink; Poincaré phase plot of window size W(t) vs. packet end-to-end delay D(t) over time, with UDP/CBR = 1 Mbps)
TCP Competing with UDP
(Figures: phase plots for UDP/CBR rates of 0, 0.5, 1.0, 1.7, and 1.75 Mbps)
Summary of Our Analytical Results
State space of TCP: congestion window; packet delay including retransmits; acknowledgements since the last multiplicative decrease; losses inferred since the last additive increase.
TCP-AIMD dynamics have two qualitatively different regimes:
- Regime one, highlighted in the usual TCP literature: the window increases while acknowledgements arrive
- Regime two: the window decreases with inferred losses; its effect and duration are enhanced by network delay and high buffer occupancy
Trajectories move back and forth between these two regimes.
We define a Poincaré map M that updates the window w: the w-update map.
- M is 1-dimensional if regime two is short-lived
- M is 2-dimensional and complicated if regime two is significant
- M is qualitatively similar to the tent map, which generates chaotic trajectories
Dynamics of Transitions Between Regimes
(Figure: the w-update map for long TCP transfers; trajectories alternate between regime 1 and regime 2 over time)
Both regimes are unstable (eigenvalue analysis).
M: The w-Update Map
Given a window value w, M gives its next updated value after some (not fixed) time period.
The regime-1 and regime-2 updates depend on the number of dropped packets, the buffer occupancy at that time, and the delay between the source and the bottleneck buffer.
Result: M is parametrized, and each piece resembles a twisted version of the classical tent map.
Rao, Gao, and Chua, chapter in Complex Dynamics in Communications Networks, 2004
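For reference, the classical tent map that M is said to resemble, together with the sensitive dependence on initial conditions that makes such maps chaotic; the starting points and perturbation size are illustrative.

```python
def tent(x, mu=2.0):
    """Classical tent map on [0, 1]: linear rise then fold, slope mu."""
    return mu * x if x < 0.5 else mu * (1.0 - x)

def orbit(x0, n, mu=2.0):
    """Iterate the tent map n times from x0."""
    xs = [x0]
    for _ in range(n):
        xs.append(tent(xs[-1], mu))
    return xs

# sensitive dependence: orbits starting 1e-6 apart separate quickly,
# since the gap roughly doubles (slope 2) on every iteration
a = orbit(0.2, 25)
b = orbit(0.2 + 1e-6, 25)
```

After a handful of steps the two orbits are still indistinguishable, but within about twenty iterations the gap has grown by a factor of roughly 2^20, which is the positive-Lyapunov-exponent behavior in the working definition of chaos given earlier.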
Internet Measurements (Joint Work with Jianbo Gao)
Question 1: How relevant are previous simulation and analytical results on chaotic trajectories?
Answer: Relevant from an analysis perspective, to a certain extent.
Question 2: Do actual Internet TCP measurements exhibit chaotic behavior?
Answer: Yes, and they are more complicated than purely chaotic (deterministic) behavior.
Internet Measurements
Internet (net100) traces show that TCP-AIMD dynamics are a complicated mixture of chaotic and stochastic regimes:
- Chaotic: TCP-AIMD dynamics
- Stochastic: TCP response to network traffic
Basic point: TCP traces collected on all Internet connections showed complicated dynamics; the classical "saw-tooth" profile is not seen even once.
This is not a criticism of TCP; it was not intended for smooth dynamics.
Cwnd Time Series for the ORNL-LSU Connection
Connection: OC192 to Atlanta SOX; Internet2 to Houston; LAnet to LSU.
Time series: cwnd = x(t), collected at approximately 1 ms resolution using net100 instruments.
Time-Dependent Exponent Plots
Informally, a measure of how separated close-by states become in time; exponential separation is characteristic of a chaotic regime.
Form state vectors of size m from the sampled time series x(t), denoted x(1), x(2), ...; for pairs of state vectors that start close together, the time-dependent exponent is defined from their separation after a given number of steps.
(Figures: the Lorenz (chaotic) series shows a common envelope; the uniform random series is spread out)
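A sketch of this computation: average the log separation after k steps over pairs of delay vectors that start within distance r. The parameter values and the logistic-map test series are illustrative; the exact formulation in the measurement study may differ in detail.

```python
import math

def time_dependent_exponent(x, m=2, k=1, r=0.05):
    """Average log separation after k steps over pairs of m-dimensional
    delay vectors that start within distance r of each other."""
    n = len(x) - m + 1
    vecs = [x[i:i + m] for i in range(n)]

    def dist(a, b):
        return math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)))

    logs = []
    for i in range(n - k):
        for j in range(i + m, n - k):        # skip temporally adjacent pairs
            d0 = dist(vecs[i], vecs[j])
            if 0.0 < d0 <= r:
                dk = dist(vecs[i + k], vecs[j + k])
                if dk > 0.0:
                    logs.append(math.log(dk / d0))
    return sum(logs) / len(logs) if logs else 0.0

# chaotic test series: the logistic map at full chaos (Lyapunov exponent ln 2)
xs = [0.3]
for _ in range(799):
    xs.append(4.0 * xs[-1] * (1.0 - xs[-1]))
```

On a chaotic series such as this one, the averaged exponent is positive (nearby states separate exponentially); on pure noise the quantity does not show the common exponential envelope seen in the Lorenz panel of the slide.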
Internet cwnd Measurements: Both Stochastic and Chaotic Parts Are Dominant
TCP traces have a common envelope (chaotic) and are spread out (stochastic) at certain scales.
Observations: from the analysis, the chaotic dynamics come from AIMD; the stochastic component is a response to network traffic, losses, and RTT variations.
Gao and Rao, IEEE Communications Letters, 2005, in press
Design of Transport Protocols with Smooth Dynamics
Observation 1: avoid AIMD-like behavior to avoid chaotic dynamics.
Challenge: randomness is inherent in Internet connections and will not go away even if the protocol is non-chaotic.
Our solution: explicitly account for randomness in the protocol design via stochastic approximation.
Throughput Stabilization
Niche application requirement: provide stable throughput at a target rate, typically much below peak bandwidth
- High-priority channels
- Commands for computational steering and visualization
- Control loops for remote instrumentation
TCP AIMD is not suited for stable throughput: complicated dynamics; underflows with sustained traffic.
Important consideration: the stochasticity of Internet connections must be explicitly accounted for.
Rao, Wu, and Iyengar, IEEE Communications Letters, 2004
Stochastic Approximation: UDP Window-Based Method
Transport control loop objective: adjust the source rate to achieve (almost) fixed goodput at the destination application.
Difficulty: data packets and ACKs are subject to random processes.
Approach: rely on statistical properties of the data paths.
UDP-Based Framework
Send a window of datagrams and wait for an idle period. The source sending rate, destination goodput, and loss rate are related through goodput and loss regressions estimated from measurements.
Channel Throughput Profile
A plot of receiving rate as a function of sending rate. Its precise interpretation depends on the sending and receiving mechanisms and on the definition of rates; for protocol optimization, it is important to generate the profile using the protocol's own sending mechanism.
Window-based sending process for UDP datagrams: send W datagrams in one step (the window size), then wait for a time called the idle time or wait time; the sending rate is computed at this time resolution.
This is an ad hoc mechanism facilitated by a 1 GigE NIC.
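The rate computation for this window-based sending process reduces to simple arithmetic over one send-plus-wait cycle; the function and parameter names below are illustrative, since the slide's exact formula was lost in extraction.

```python
def sending_rate_mbps(window, datagram_bytes, idle_time_s, send_time_s):
    """Average sending rate over one cycle of the window-based UDP
    sending process: `window` datagrams sent back-to-back taking
    send_time_s, followed by an idle wait of idle_time_s."""
    bits = window * datagram_bytes * 8          # bits sent per cycle
    cycle = send_time_s + idle_time_s           # cycle duration in seconds
    return bits / cycle / 1e6                   # rate in Mbps
```

For example, 100 datagrams of 1000 bytes sent in 1 ms followed by a 7 ms wait give 800,000 bits per 8 ms cycle, i.e. 100 Mbps; lengthening the idle time lowers the rate without changing the burst.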
Throughput Profile: Throughput and Loss Rates vs. Sending Rate (Window Size, Cycle Time)
Objective: adjust the source rate to yield the desired throughput at the destination.
(Figures: profiles for a typical day and for Christmas day, showing a stabilization zone and a peak zone)
Adaptation of Source Rate
Sending process: send a window of datagrams and wait for a set duration; adjust either the window size or the cycle time.
Both adjustments are special cases of the classical Robbins-Monro method, which drives a noisy goodput estimate toward the target throughput.
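A minimal sketch of the Robbins-Monro iteration behind this adaptation: the rate is nudged by a decreasing gain times the gap between the target and a noisy goodput measurement. The gain schedule, noise model, and channel model below are all illustrative assumptions, not the protocol's actual parameters.

```python
import random

def stabilize_rate(target, goodput, r0=1.0, steps=500, seed=1):
    """Robbins-Monro style source-rate adaptation:
        r_{n+1} = r_n + a_n * (target - noisy goodput at r_n)
    with gains a_n = c / n, so sum(a_n) diverges and sum(a_n^2) < inf,
    the standard conditions for almost-sure convergence."""
    rng = random.Random(seed)
    r = r0
    for n in range(1, steps + 1):
        g = goodput(r) + rng.gauss(0.0, 0.05)   # noisy goodput estimate
        r += (0.5 / n) * (target - g)
    return r

# illustrative channel: goodput tracks the sending rate up to a 10 Mbps cap
rate = stabilize_rate(target=2.0, goodput=lambda r: min(r, 10.0))
```

Despite the measurement noise, the iterate settles near the 2.0 Mbps target because the decreasing gains average the noise out, which is the stabilization property claimed on the following slides.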
Performance Guarantees
Summary: stabilization is achieved with high probability using a very simple estimation of the source rate.
The basic result bounds the deviation from the target for the general stochastic-approximation update (the precise probabilistic bound appeared on the slide and is given in the paper).
Internet Measurements
ORNL-LSU connection (before a recent upgrade): hosts with 10 Mbps NICs; 2000-mile network distance; ORNL-NYC via ESnet, NYC-DC-Houston via Abilene, Houston-LSU via a local network.
ORNL-GaTech connection: hosts with GigE NICs; ORNL to a Juniper router over a 1 Gbps link; Juniper to Atlanta SOX over OC192 (1 Gbps link); SOX to GaTech over a 1 Gbps link.
ORNL-LSU Connection (figure)
Goodput Stabilization: ORNL-LSU Experimental Results
Case 1: target goodput = 1.0 Mbps, rate control through the congestion window, a = 0.8.
Case 2: target goodput = 2.0 Mbps, rate control through the congestion window, a = 0.8.
(Figures: datagram acknowledging time vs. source rate (Mbps) and goodput (Mbps))
Throughput Stabilization: ORNL-GaTech
Target goodput = 20.0 Mbps, a = 0.8, adjusting the congestion window size.
Target goodput = 2.0 Mbps, a = 0.8, adjusting the sleep time.
RUNAT: Reliable UDP-based Network Adaptive Transport
Transport protocol that maximizes connection utilization by tracking peak goodput; uses Kiefer-Wolfowitz stochastic approximation to handle ACKs and losses.
Features:
- Tailored to random loss rate and RTT: segmented rate control with 3 control zones (bottleneck link underutilized, saturated, and overloaded)
- Explicit accounting for random components: stochastic-approximation methods based on goodput estimates
- TCP-friendliness: rate-increasing and rate-decreasing coefficients are dynamically adjusted
- Adaptable to diverse network environments: measurement and control periods are not constant but link-specific (based on RTT)
Wu and Rao, INFOCOM 2005
Three Zones of the Goodput Profile
- Zone I (adaptive increase): bottleneck link is underutilized; low packet loss due to occasional congestion or transmission errors; roughly fixed loss with increasing source rate
- Zone II (transitional; dynamic Kiefer-Wolfowitz SA method): bottleneck link is saturated; peak goodput falls within this zone; stochastic approximation determines whether to increase or decrease the source rate
- Zone III (adaptive decrease): bottleneck link is overloaded; large packet loss due to network congestion; back off to recover from congestion collapse
(Figure: goodput regression vs. sending rate r, with Zone I at near-zero loss, Zone II at low loss, and Zone III at high loss; the sending rate is stabilized at the peak)
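The three-zone control can be sketched as a classifier on the measured loss rate plus a per-zone rate update. The thresholds and coefficients below are illustrative assumptions, not RUNAT's actual constants, and the Zone II step is only a stand-in for the Kiefer-Wolfowitz search.

```python
def classify_zone(loss_rate, low=0.01, high=0.10):
    """Map a measured loss rate to a control zone (thresholds illustrative)."""
    if loss_rate < low:
        return "I"        # underutilized: adaptively increase
    if loss_rate < high:
        return "II"       # saturated: stochastic-approximation search
    return "III"          # overloaded: adaptively decrease

def next_rate(r, loss_rate, alpha=1.0, beta=0.5, delta=0.2):
    """One step of a segmented rate controller in the spirit of the
    slide: additive increase in Zone I, a small corrective step in
    Zone II (standing in for the SA search), and multiplicative
    decrease in Zone III."""
    zone = classify_zone(loss_rate)
    if zone == "I":
        return r + alpha
    if zone == "III":
        return r * beta
    return r + delta * (0.05 - loss_rate) / 0.05
```

Zones I and III push the rate back toward the saturated region, so the controller spends most of its time in Zone II near the peak, matching the convergence claim made two slides later.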
Segmented Rate Control Algorithm
Basic idea: control the sending rate based on a loss-rate estimate to achieve peak goodput (the loss-rate estimator and the per-zone update equations appeared on the slide).
Convergence Properties of RUNAT
Informal statement: if in zone I or III, the protocol exits to zone II; if in zone II, it converges to the maximum throughput.
Condition A1: loss statistics vary slowly.
Condition A2: the loss regression is differentiable and its derivative is monotonically increasing with respect to r in zone II.
Result: from zone I or III, RUNAT enters zone II in a finite number of steps almost surely; in zone II, RUNAT almost surely converges to the peak goodput.
Experimental Results on the Link Between ozy4 (ORNL) and robot (LSU): Microscopic RUNAT Behavior During a 20 MB Transfer
- Observed phases: slow start, Zone I (loss rate 0%), Zone II (loss rate 3.33%), and Zone III (loss rate 37.33%)
- When far from the saturation (peak) point, the step size is adjusted to large values to move quickly toward the peak; when approaching it, the step size is adjusted to small values to converge slowly to, and remain at, the peak
- The decrement of the source rate upon packet loss is determined by the congestion level (local loss-rate measurements): higher congestion levels result in larger rate drops; the increment is likewise determined by the congestion level
Experimental Results on the Link Between ozy4 (ORNL) and robot (LSU): RUNAT Performance During a 2 GB Transfer with a Concurrent 50 MB TCP Transfer
Case 1 (RUNAT and TCP run concurrently): RUNAT throughput 10.49 Mbps; TCP throughput 0.376 Mbps.
Case 2 (a single TCP only): TCP throughput 0.377 Mbps.
Note: the low throughputs were due to the high traffic volume at the time of the experiments. On a normal day with regular traffic volume, TCP achieves 3-6 Mbps and RUNAT may reach 15-30 Mbps at lower loss rates without significantly affecting concurrent TCP on this link.
Experimental Results on the Link from ozy4 (ORNL) to orbitty (NC State)

Transport method | Data sent (MB) | Throughput (Mbps) | TCP friendly?
iperf (TCP) | 100 | 8.7 | Yes
iperf (UDP) | 1000 | 95.6 | No (no congestion control, no reliability)
FTP (TCP) | 100 | 18.6 | Yes
SABUL | 1000 | 80.0 | Seems not (concurrent TCP drops from 18.6 to 10.1 Mbps)
RUNAT | 1500 | 80.0 | Statistically yes (concurrent TCP drops from 18.6 to 18.2 Mbps)
ORNL-Atlanta-ORNL 1 Gbps Channel
Hosts: Dell dual Xeon 3.2 GHz and dual Opteron 2.2 GHz, each with a dedicated 1 GigE NIC to a Juniper M160 router (GigE blade, SONET blade); OC192 ORNL-Atlanta with an IP loop.
ORNL router: filter-based forwarding overrides both the input and middle queues and disables other traffic to the GigE interfaces; IP packets on both GigE interfaces are forwarded to the outgoing SONET port.
Atlanta SOX router: default IP loopback.
Only 1 Gbps of the OC192 link is used for production traffic, leaving 9 Gbps of spare capacity.
1 Gbps Dedicated IP Channel
Non-uniform physical channel: GigE - SONET - GigE, about 500 network miles.
End-to-end IP path: both GigE links are dedicated to the channel; other host traffic is handled through a second NIC; routers, OC192, and hosts are lightly loaded; IP-based applications and protocols execute readily.
(Figure: Dell dual Xeon 3.2 GHz and dual Opteron 2.2 GHz hosts connected via GigE to Juniper M160 routers at ORNL and Atlanta, over OC192 with IP loopback)
Dedicated Hosts
Hosts: Linux 2.4 kernel (Red Hat, SUSE); two NICs: an optical connection to the Juniper M160 router and a copper connection to an Ethernet switch/router.
Disks: RAID 0 dual disks (140 GB SCSI) with the XFS file system; peak disk data rate is about 1.2 Gbps (IOzone measurements), so the disk is not a bottleneck for 1 Gbps data rates.
UDP Goodput and Loss Profile
For each point in the horizontal plane (window size, idle time): the goodput plateaus at about 990 Mbps with a non-zero, random loss rate; high goodput is achieved at non-trivial loss.
1 GigE NICs Act as Rate Controllers
Data rates from the application through the kernel buffer to the NIC can exceed 1 Gbps at times (the flow is ON/OFF), but the NIC limits the flow to 1 Gbps, matching the link rate into the Juniper M160.
Our window-based method relies on this: the flow is regulated to 1 Gbps because the NIC rate matches the link rates.
This method does not work well if the NIC rate is higher than the link rate or the router port rate: the NIC may send at a higher rate, causing losses at the router port.
Best Performance of Existing Protocols
Disk-to-disk transfers (unet2 to unet1):
- tsunami: 919 Mbps
- UDT: 890 Mbps
- FOBS: 708 Mbps
Memory-to-memory transfers: UDT 958 Mbps.
Both iperf and the throughput profiles indicated 990 Mbps levels; such rates are potentially achievable if disk access and protocol parameters are tuned.
Hurricane Protocol
Composed based on principles of, and experience with, UDT and SABUL; it was not easy for us to figure out all the tweaks needed for peak performance.
UDP window-based flow control: nothing fundamentally new, but needed for fine tuning.
990 Mbps disk-to-disk on the dedicated 1 Gbps connection. No attempt at congestion control.
Hurricane Control Structure
Sender: reads from disk, sends datagrams, and reloads lost datagrams.
Receiver: the receiver buffer reorders datagrams; NACKs are grouped (k per message) and returned to the sender over TCP.
Different subtasks are handled by threads that are woken up on demand; thread invocations are reduced by using clustered NACKs instead of individual ACKs.
Hurricane (figure)
Ad Hoc Optimizations
Manual tuning of parameters:
- Wait-time parameter: initial value chosen from the throughput profile; empirically, goodput is unimodal in the wait time, so pairwise measurements drive a binary search
- Group size k for NACKs: empirically, goodput is unimodal in k and is tuned the same way
Disk-specific details: reads are done in batches with no input buffer; NACKed data is handled using fseek and attached to the next batch.
This tuning is not likely to be transferable to other configurations and different host loads; more work is needed on automatic tuning and systematic analysis.
Outline of Presentation
- Network infrastructure projects: DOE UltraScienceNet, NSF CHEETAH
- Dynamics and control of transport protocols: TCP AIMD dynamics, analytical results, experimental results, a new class of protocols, throughput stabilization for control transport
- Probabilistic quickest path problem: quickest path algorithm, probabilistic algorithm
Shortest Path Problem
Classical problem: given a graph with a distance function on the edges, the distance of a path is the sum of its edge distances; compute a path with the smallest distance from the source node to the destination node.
Solved using Dijkstra's algorithm with polynomial complexity.
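For reference, a textbook Dijkstra sketch over an adjacency-map graph; the representation is illustrative.

```python
import heapq

def dijkstra(graph, src):
    """Dijkstra's algorithm: graph[u] maps neighbor -> nonnegative edge
    distance; returns the least path distance from src to each
    reachable node."""
    dist = {src: 0.0}
    heap = [(0.0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue                          # stale heap entry
        for v, w in graph.get(u, {}).items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist
```

This routine is the building block reused by the quickest-path and bandwidth-scheduling algorithms discussed in this talk: each of them invokes a least-delay computation of exactly this form on a (sub)network.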
Quickest Path Problem
Problem: given a graph with (1) a delay function and (2) a bandwidth function on the edges, the total delay of sending a message of size sigma over a path P is T(sigma) = d(P) + sigma / b(P), where d(P) is the sum of the edge delays and b(P) is the minimum edge bandwidth on P; compute a path with the smallest total delay from the source node to the destination node.
Solved using Chen and Chin's algorithm with polynomial complexity.
Important observation: a subpath of a quickest path is not necessarily quickest.
(Figure: an s-d example with edges labeled (delay, bandwidth) = (5,20), (15,5), (15,20), and candidate paths with T(60) = 29, 32, 52, 57)
Quickest Path Algorithm (Chen and Chin)
Let b_1 < b_2 < ... < b_m denote the distinct edge bandwidths; let G_b denote the subnetwork in which edges with bandwidth smaller than b are removed, and let P_b be the path with the least delay in G_b.
The quickest path is the best candidate over all bandwidth levels: minimize d(P_{b_i}) + sigma / b_i over i = 1, ..., m.
Typically implemented using m invocations of Dijkstra's algorithm; m can be quite large.
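The scheme above can be sketched directly: one least-delay (Dijkstra) computation per distinct bandwidth value, then a minimization over the candidates. The edge list used in the example is made up for illustration; the slide's own figure used different numbers.

```python
import heapq

def least_delay(edges, src, dst, bmin):
    """Least path delay in the subnetwork G_b: keep only edges whose
    bandwidth is at least bmin.  edges: (u, v, delay, bandwidth)."""
    adj = {}
    for u, v, d, b in edges:
        if b >= bmin:
            adj.setdefault(u, []).append((v, d))
    dist, heap = {src: 0.0}, [(0.0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue
        for v, w in adj.get(u, []):
            if d + w < dist.get(v, float("inf")):
                dist[v] = d + w
                heapq.heappush(heap, (d + w, v))
    return dist.get(dst, float("inf"))

def quickest_path_delay(edges, src, dst, sigma):
    """Chen-Chin scheme: for each distinct bandwidth b, take the
    least-delay path in G_b and minimize d(P_b) + sigma / b."""
    best = float("inf")
    for b in {bw for _, _, _, bw in edges}:   # one Dijkstra per bandwidth
        d = least_delay(edges, src, dst, b)
        if d < float("inf"):
            best = min(best, d + sigma / b)
    return best

# illustrative network: a low-delay, low-bandwidth direct edge competes
# with a higher-delay, high-bandwidth two-hop route
edges = [("s", "a", 5, 20), ("a", "d", 10, 20), ("s", "d", 1, 5)]
```

Note how the winner depends on the message size sigma: small messages favor the low-delay route, large ones the high-bandwidth route, which is exactly why a subpath of a quickest path need not be quickest. Sampling only a fraction of the bandwidth values, as on the next slide, runs fewer `least_delay` calls over the same candidate structure.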
Simple Probabilistic Quickest Path Algorithm
Randomly choose a fraction of the bandwidth values b_i and compute least-delay paths only on the corresponding subnetworks.
For larger networks, we needed fewer than 10% of the shortest-delay computations.
Question: is there a fundamental reason for this?
Analysis
Critical observation: as a function of the bandwidth threshold b, the least delay d(P_b) is non-decreasing, and its Vapnik-Chervonenkis dimension is 1; this makes it efficient to approximate by random sampling.
(Figure: the optimal delay curve and a piecewise-linear approximation based on p shortest-path computations)
Rao 2004, Theoretical Computer Science
Conclusions
TCP-AIMD dynamics: analytically established chaotic dynamics; analyzed Internet traces, which show a combination of chaotic and stochastic dynamics.
New classes of protocols:
- ONTCOU: achieves a stable target flow level
- RUNAT: a statistical approach to congestion control
Both are based on stochastic approximation, with convergence proofs under general conditions. Experimental results are promising on both the Internet and dedicated connections.
Thank You