Sting: a TCP-based Network Measurement Tool Stefan Savage Jianxuan Xu
Measurement & Analysis
The Internet is supremely hard to measure
–VERY heterogeneous
–VERY large
–Heisenberg effects: the act of observing or measuring an event changes the event
Still… lots of effort goes into measuring and understanding traffic dynamics, routing, user characteristics, etc.
Understanding wide-area network characteristics is critical for evaluating the performance of Internet applications.
Measurement & Analysis
Existing approaches fall short:
–ICMP-based tools (e.g. ping, traceroute): can't measure one-way loss
–Measurement infrastructures (e.g. NIMI): require cooperation from remote endpoints
Features
–Measures one-way packet loss rates
–TCP-based measurement traffic (not filtered)
–Relies only on standard TCP behavior
–Target only needs to run a TCP service, such as a web server
–Does not require remote cooperation
Basic approach Send selected TCP packets to remote host Analyze TCP behavior to deduce which packets were lost in each direction
Deducing losses in a TCP transfer
What we know:
–How many data packets we sent
–How many acknowledgements we received
What we need to know:
–How many data packets were received? The remote host's TCP MUST know
–How many acknowledgements were sent? Easy, if one ACK is sent for each data packet (ACK parity)
How TCP reveals packet loss Data packets ordered by seq# ACK packets specify next seq# expected
Basic loss deduction algorithm: Forward Loss
Data seeding:
–Source sends in-sequence TCP data packets to target; each is a loss sample
Hole filling:
–Send a TCP data packet with sequence number one greater than the last seeding packet
–If the target ACKs this new packet, no loss occurred
–Otherwise, each ACK indicates the first missing packet
–Hole filling must be reliable: retransmit each missing packet until it is acknowledged
Data Seeding phase
for i := 1 to n
    send packet w/ seq# i
    dataSent++
for each ack received
    ackReceived++
wait for a long time
Hole Filling Phase
lastAck := 0
for each ack received w/ ack# j
    lastAck := MAX(lastAck, j)
send packet w/ seq# n+1
while lastAck = 0
    wait
while lastAck < n+1
    dataLost++
    retransPkt := lastAck
    send packet w/ seq# retransPkt
    while lastAck = retransPkt
        wait
dataReceived := dataSent – dataLost
acksSent := dataReceived
Example
Basic loss deduction algorithm: Reverse Loss
Data seeding:
–Skip the first sequence number, so all data arrives out of sequence (triggering fast retransmit ACKs)
–Receiver immediately acknowledges each data packet received
–Count the ACKs that were lost
Hole filling:
–Transmit the first sequence number
–Continue as before
Guaranteeing ACK parity
How do we know one ACK is sent for each data packet received?
Exploit TCP's fast retransmit algorithm:
–TCP must send an immediate ACK for each out-of-order packet it receives
–Send all data packets out of order: skip the first sequence number
–Don't count the first "hole" in the hole-filling phase
Sending Large Bursts
–Large bursts of packets can overflow the receiver's buffer
–Mitigate by sending packets with overlapping sequence numbers
Delaying connection termination
Some Web servers/firewalls terminate connections abruptly by sending a RST
Solutions:
–Format data packets as a valid HTTP request
–Set the advertised receiver window to 0 bytes
Sting implementation details
–Raw sockets to send TCP segments
–Packet filter (libpcap) to capture responses
–Currently runs on Tru64 and FreeBSD
Last-generation user interface
# sting –c 100 –f poisson –m –p 80
Source =
Target = :80
dataSent = 100    dataReceived = 98
acksSent = 98     acksReceived = 97
Forward drop rate =
Reverse drop rate =
Forward Loss Results
Reverse Loss Results
“Popular” Web Servers
Random Web Servers
Results
–Loss rates increase during business hours, then decrease
–Forward and reverse loss rates vary independently
–For popular web servers, the reverse loss rate is on average more than 10 times the forward loss rate
Conclusions
–TCP protocol features can be leveraged for non-standard purposes
–Packet loss is highly asymmetric
Ongoing work: using TCP to estimate one-way queuing delays, bottleneck bandwidth, propagation delay, and server load
Useful or Useless?
Purpose of network measurement:
–Diagnose current problems
–Design future services
Real-time data is needed for network control
Data sampling:
–Event-driven
–Fixed interval
Research Goal
–Implement a new TCP congestion control algorithm based on fuzzy logic control
–Develop, test, and debug it in Linux
–Evaluate its performance
Traditional protocol hacking
–Directly modify the kernel source
–Migrate the protocol stack and related machinery to user space
–Simulate the algorithm with NS-2
Kernel Hacking
Insert and modify the algorithm directly in the kernel source
Example:
–Vegas, Westwood+ and BIC were implemented this way in kernels predating the pluggable congestion control framework
Kernel Hacking
Pros:
–Welcome to the Real World
–Less overhead
Cons:
–Hard to develop, trace, debug and maintain
–Tied to a specific kernel version
User space migration
Move the whole protocol stack and related machinery to user space
Gains total control over, and visibility into, protocol state
Example:
–Sting
User space migration
Pros:
–High flexibility in protocol hacking
–Can use general-purpose debugging tools, e.g. gdb
Cons:
–Migrating the protocol stack to user space is a large and thorny project
–Tied to a specific kernel version
–High overhead
Simulation
The algorithm is implemented on a virtual testbed
Virtual experiments are easy to run
NS-2 is the usual simulator
–e.g. research on FAST TCP, HighSpeed TCP
Simulation
Pros:
–Quick implementation of the algorithm
–Low experimental cost
–Easy data collection and statistics
Cons:
–Results can be too idealistic
–Further development is needed for a final product
Traditional methods are not suitable
–Source code modification and user space migration require a thorough understanding of the kernel architecture
–NS-2 is not as realistic as testing on top of PlanetLab
–All of them are kernel-version dependent
My new approach
–Combine the pluggable congestion control framework with kernel hacking
–Implement the new control algorithm within a single kernel module
Pluggable congestion control module
–Recent kernels introduced a new way to hack TCP congestion control
–New algorithms are written as module files and inserted into the kernel at run time, just like ordinary device drivers
–BIC, CUBIC, HighSpeed, H-TCP, Hybla, Scalable, Vegas and Westwood+ are already implemented as modules
Pluggable congestion control module
–A congestion control mechanism is registered via the functions in tcp_cong.c
–Its entry points are registered by passing a tcp_congestion_ops struct to tcp_register_congestion_control()
–At a minimum, name, ssthresh, cong_avoid and min_cwnd must be valid
Pluggable congestion control module
–The congestion control mechanism in use is selected by the sysctl net.ipv4.tcp_congestion_control
–The default congestion control is the last one registered (LIFO)
–NewReno is built in and always available
–A particular default can be set via sysctl
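Selecting an algorithm from the slide above is a one-line sysctl operation; a sketch (vegas is just an example of a registered module, and the write requires root):

```shell
# Show which congestion control algorithm is currently the default
sysctl net.ipv4.tcp_congestion_control

# Switch the default to another registered algorithm, e.g. vegas
# (the tcp_vegas module must be loaded)
sysctl -w net.ipv4.tcp_congestion_control=vegas
```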
Pluggable congestion control module
The tcp_congestion_ops struct provides the following function entry points:
–init
–release
–ssthresh
–min_cwnd
–cong_avoid
–rtt_sample
–set_state
–cwnd_event
–undo_cwnd
–pkts_acked
–get_info
Pluggable congestion control module
–All algorithm-related code is packed into a single module file
–A standardized framework can be followed
–The code required to implement an algorithm shrinks greatly: NewReno takes 77 lines where BIC takes 335
–The module stays compatible unless the framework itself changes
Kernel Hacking Still Needed
Raw, accurate, real-time data is needed by the control algorithm:
–Packet loss rate
–Bandwidth estimation
–RTT
–(TCP Vegas uses RTT; Westwood uses bandwidth estimation, …)
PLR Calculation in Linux Kernel
tcp_input.c is the core of the TCP protocol implementation; it:
–handles incoming packets and ACKs
–identifies duplicate ACKs and packet losses
–adjusts the congestion window accordingly
PLR Calculation in Linux Kernel
Two kinds of events are attributed to congestion: retransmission timeout (RTO) and packet loss. The timeout event is detected by tcp_head_timedout(); the packet-loss event by tcp_mark_head_lost().
PLR Calculation in Linux Kernel
TCP's congestion avoidance (CA) machinery is decomposed into five states (defined in the ca_state field of the tcp_opt data structure):
–TCP_CA_Open
–TCP_CA_Disorder
–TCP_CA_CWR
–TCP_CA_Recovery
–TCP_CA_Loss
PLR Calculation in Linux Kernel
The state machine is driven by tcp_fastretrans_alert(), which processes "dubious" ACK events.
PLR Calculation in Linux Kernel (tcp_update_scoreboard)
tcp_update_scoreboard:
–Marks as lost every packet that was not SACKed, up to the highest SACKed sequence number
–Also marks as lost packets that have waited for an ACK for a full retransmission timeout
–Does the accounting for lost, sacked and left-out packets
PLR Calculation in Linux Kernel
left_out = sacked_out + lost_out
–sacked_out: packets that arrived at the receiver out of order and hence were not cumulatively ACKed. With SACK this is simply the amount of SACKed data; even without SACK, counting duplicate ACKs gives a fairly reliable estimate.
–lost_out: packets lost by the network. TCP has no explicit "loss notification" feedback from the network (for now), so this number can only be guessed; the heuristics used to predict loss are what distinguish the different algorithms.