Networks ∙ Services ∙ People
udpmon for Network Troubleshooting: A Hands-On Session
Richard Hughes-Jones, Senior Network Advisor, Office of the CTO, GÉANT Association - Cambridge
eduPERT Training Session, Porto, 18/06/2015
What is udpmon?
A software package for investigating end-host and network performance using UDP/IP frames.
Programs work in client-server pairs to:
- Transmit streams of sequenced UDP packets at regular, carefully controlled intervals. Frame size and frame transmit spacing can be varied.
- Receive and check the sequence and timing of the packets.
- Identify whether packets were lost in the end host or in the network.
Allows measurement of:
- Request-response latency.
- Achievable UDP bandwidth, packet loss, packet ordering, jitter.
- Packet dynamics and packet loss patterns.
- Quality of the connection path and its stability.
The client-server pairs
- udpmon_bw_mon / udpmon_resp: achievable UDP bandwidth, packet loss, packet ordering, jitter; packet dynamics and packet loss patterns.
- udpmon_req / udpmon_resp: request-response latency.
- udpmon_send / udpmon_recv: quality of the connection path and its stability; time series of achievable UDP bandwidth and packet loss.
Latency Measurements
Round-trip times are measured using request-response UDP frames (udpmon_req sends the request, udpmon_resp returns the response).
Latency as a function of frame size:
- Slope is given by: mem-mem copy(s) + pci + Gig Ethernet + pci + mem-mem copy(s)
- Intercept indicates: processing times + HW latencies
Histograms of ‘singleton’ latency measurements
Tells us about:
- Behavior of the IP stack
- The way the HW operates
- Interrupt coalescence
- Performance of the LAN / MAN / WAN
[Diagram: udpmon_req sends a Request and udpmon_resp returns a Response; the latency is the time between the two, and the exchange is repeated over time.]
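The slope and intercept described above can be extracted with a straight-line fit of latency against frame size. The following is a minimal sketch, not part of udpmon: the function name and the sample data (a 20 µs fixed cost plus 0.01 µs per byte) are hypothetical and chosen only to show the calculation.

```python
def fit_line(sizes, latencies):
    """Least-squares fit of latency = intercept + slope * size."""
    n = len(sizes)
    mean_x = sum(sizes) / n
    mean_y = sum(latencies) / n
    sxx = sum((x - mean_x) ** 2 for x in sizes)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(sizes, latencies))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical measurements: frame sizes in bytes, latencies in microseconds.
sizes = [64, 256, 512, 1024, 1472]
latencies = [20.0 + 0.01 * s for s in sizes]

slope, intercept = fit_line(sizes, latencies)
print(f"slope = {slope:.4f} us/byte, intercept = {intercept:.2f} us")
```

The slope (µs per byte) reflects the per-byte costs of the copies, buses and link; the intercept reflects the fixed processing and hardware latencies.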
Achievable UDP Throughput Measurements
Send a controlled stream of UDP frames spaced at regular intervals, each carrying a 64-bit sequence number and a send time stamp. Record the packet receive time.
Exchange between Sender and Receiver, in time order:
- Zero stats (set concurrent lockout); the receiver replies "OK done".
- Send data frames at regular intervals (n bytes, number of packets, wait time). Measured: time to send, time to receive, inter-packet time (histogram).
- Signal end of test; the receiver replies "OK done".
- Get remote statistics; the receiver sends statistics back:
  - No. received
  - No. lost + loss pattern
  - No. out-of-order
  - No. lost in network
  - CPU load
  - No. interrupts & SNMP
  - Tx, Rx times & 1-way delay
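How sequence numbers let the receiver count received, lost and out-of-order packets can be sketched as below. This is an illustration under stated assumptions, not udpmon's actual code: the function name is hypothetical, duplicates are simply ignored, and a packet counts as out of order when it arrives after a packet with a higher sequence number.

```python
def sequence_stats(num_sent, received_seq):
    """Count received, lost and out-of-order packets from sequence numbers."""
    seen = set()
    highest = -1
    num_badorder = 0
    for seq in received_seq:
        if seq in seen:
            continue  # assumption: duplicates are ignored
        seen.add(seq)
        if seq < highest:
            num_badorder += 1  # arrived after a later packet
        else:
            highest = seq
    num_recv = len(seen)
    return {
        "num_recv": num_recv,
        "num_lost": num_sent - num_recv,
        "num_badorder": num_badorder,
    }


# Hypothetical test: 10 packets sent, 6 and 7 lost, 3 arrives after 4.
stats = sequence_stats(10, [0, 1, 2, 4, 3, 5, 8, 9])
print(stats)
```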
What udpmon records
Packets:
- Num received
- Num lost: in network, in total (also: loss pattern)
- Num arrived out of order
- Timestamps when packet sent & received
  - Packet jitter: inter-packet arrival times
  - Relative 1-way delay
CPU load on end hosts
Bytes received and bytes/frame rate
Elapsed time (microseconds)
Receiver data rate and wire rate (Mbit/s)
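The difference between the user data rate and the wire rate can be illustrated as follows. This sketch assumes UDP over IPv4 on Ethernet and counts 66 bytes of per-frame overhead (UDP and IP headers, Ethernet header and FCS, preamble and inter-frame gap); udpmon's own accounting may differ, and the function name is hypothetical.

```python
# Assumed per-frame overheads in bytes (UDP/IPv4 over Ethernet).
UDP_HEADER = 8
IPV4_HEADER = 20
ETH_HEADER = 14
ETH_FCS = 4
ETH_PREAMBLE = 8
ETH_INTERFRAME_GAP = 12
OVERHEAD = (UDP_HEADER + IPV4_HEADER + ETH_HEADER
            + ETH_FCS + ETH_PREAMBLE + ETH_INTERFRAME_GAP)


def rates_mbit(user_bytes, num_frames, elapsed_us):
    """Return (user data rate, wire rate) in Mbit/s.

    Bits per microsecond is numerically the same as Mbit/s.
    """
    wire_bytes = user_bytes + num_frames * OVERHEAD
    user_rate = 8 * user_bytes / elapsed_us
    wire_rate = 8 * wire_bytes / elapsed_us
    return user_rate, wire_rate


# Hypothetical test: 100 frames of 1472 bytes received in 5000 us.
user_rate, wire_rate = rates_mbit(100 * 1472, 100, 5000)
print(f"user {user_rate:.2f} Mbit/s, wire {wire_rate:.2f} Mbit/s")
```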
udpmon in Burst Mode
- Send a set of regularly spaced UDP frames.
- Wait for a specified period: the gap.
- Emulates TCP slow start.
Useful to investigate:
- Bandwidth impedance mismatches
- Buffering issues
Burst parameters: n bytes per packet, number of packets, wait time between packets, gap time between bursts.
Time-Series Measurements
Useful for stability tests and checking for intermittent faults. Send a steady stream of regularly spaced UDP frames for a given (long) period.
Packet Dynamics (udpmon_bw_mon / udpmon_resp)
- Record packet statistics and, for each packet, the send and receive time stamps.
- Plot:
  - Lost packets as a function of packet number / time
  - Inter-packet transmit times as a function of packet number / time
  - Inter-packet arrival times as a function of packet number / time
Packet Loss Patterns (udpmon_bw_mon / udpmon_resp)
- Record the lost packets: info from the last valid received packet for each "lost packet".
Network Stability (udpmon_send / udpmon_recv)
- Send a UDP flow for several days.
- At the receiver, take a snapshot of the packet statistics every few seconds (e.g. 10 s) and record the incremental statistics for that period together with the time of that period.
- Plot packet loss as a function of the elapsed time during the measurement.
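Turning periodic snapshots of cumulative counters into per-period incremental statistics can be sketched like this. The tuple layout (time, cumulative received, cumulative lost) and the function name are assumptions for illustration, not the output format of udpmon_recv.

```python
def incremental_stats(snapshots):
    """Convert cumulative snapshots into per-period increments.

    snapshots: list of (time_s, cumulative_received, cumulative_lost).
    Returns a list of (period_end_time_s, received_in_period, lost_in_period).
    """
    periods = []
    for (t0, recv0, lost0), (t1, recv1, lost1) in zip(snapshots, snapshots[1:]):
        periods.append((t1, recv1 - recv0, lost1 - lost0))
    return periods


# Hypothetical snapshots taken every 10 s; 5 packets are lost in the second period.
snapshots = [(0, 0, 0), (10, 1000, 0), (20, 1995, 5), (30, 2995, 5)]
periods = incremental_stats(snapshots)
for end_time, received, lost in periods:
    print(end_time, received, lost)
```

Plotting the third column against the first gives packet loss as a function of elapsed time.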
Start the receiver and bind to a specific port
On the receiver side, start udpmon_resp. The -S option sets the receive and send buffer size in bytes.
    [sbin]$ ./udpmon_resp -S
    [sbin]$
udpmon uses a default port; it can be changed with the -u option. The sending udpmon program must use the same port.
    [sbin]$ ./udpmon_resp -S -u5001
    [sbin]$
Send a train of equally spaced packets
On the sender side, we can send a train of 100 packets with 50 µs spacing. The -d option takes the name or IP address of the receiving host.
    $ udpmon_bw_mon -d -w50 -l100
    pkt len; num_sent; inter-pkt_time us; send_user_data_rate Mbit; num_recv; num_lost; num_badorder; %lost; num_lost_innet; %lost_innet; recv_user_data_rate Mbit; recv_wire_rate Mbit;
    64; 100; 50; ; 100; 0; 0; 0; 0; 0; ;
The receive-side values are sent back from udpmon_resp.
Histograms: Packet jitter (inter-packet arrival times)
Sending a train of 100 packets with 50 µs spacing and creating a histogram:
    $ udpmon_bw_mon -d -w50 -H -B10 -l100
    64; 100; 50; ; 100; 0; 0; 0; 0; 0; ; ;
    Hist 0 Time between frames us counts 99 mean underflows 0 overflows ; 4 40 ; ; ; 3...
A simple signature: counts in the 0-4 µs bin indicate that interrupt coalescence is in use at the receiving host.
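Why interrupt coalescence shows up in the lowest bin can be seen in a small sketch that histograms inter-packet arrival times. The function name, bin settings and arrival times are hypothetical: the data imitates a receiver that delivers packets in batches of four every 200 µs, 1 µs apart within a batch. It is not a reproduction of udpmon's histogram code.

```python
def interarrival_histogram(arrival_us, bin_width_us, num_bins, low_edge_us=0.0):
    """Histogram the gaps between consecutive arrival times (microseconds)."""
    counts = [0] * num_bins
    underflows = 0
    overflows = 0
    for previous, current in zip(arrival_us, arrival_us[1:]):
        gap = current - previous
        if gap < low_edge_us:
            underflows += 1
            continue
        index = int((gap - low_edge_us) // bin_width_us)
        if index >= num_bins:
            overflows += 1
        else:
            counts[index] += 1
    return counts, underflows, overflows


# Hypothetical coalesced arrivals: 5 batches of 4 packets, one batch per 200 us.
arrivals = [200 * batch + k for batch in range(5) for k in range(4)]
counts, underflows, overflows = interarrival_histogram(arrivals, 4.0, 64)
print("0-4 us bin:", counts[0], " 196-200 us bin:", counts[49])
```

Most gaps fall in the 0-4 µs bin even though the packets were sent at a regular spacing, which is the coalescence signature described above.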
Making sets of throughput measurements
Make a set of measurements with the wait time -w incremented by -i (increment) until -e (end):
    $ udpmon_bw_mon -d -w0 -i1 -e7 -l100
    pkt len; num_sent; inter-pkt_time us; send_user_data_rate Mbit; num_recv; num_lost; num_badorder; %lost; num_lost_innet; %lost_innet; recv_user_data_rate Mbit; recv_wire_rate Mbit;
    64; 100; 0; ; 100; 0; 0; 0; 0; 0; ; ;
    64; 100; 1; ; 100; 0; 0; 0; 0; 0; ; ;
    64; 100; 2; ; 100; 0; 0; 0; 0; 0; ; ;
    64; 100; 3; ; 100; 0; 0; 0; 0; 0; ; ;
    64; 100; 4; ; 100; 0; 0; 0; 0; 0; ; ;
    64; 100; 5; ; 100; 0; 0; 0; 0; 0; ; ;
    64; 100; 6; ; 100; 0; 0; 0; 0; 0; ; ;
    64; 100; 7; ; 100; 0; 0; 0; 0; 0; ; ;
Changing the packet size
Make a set of measurements with the wait time -w incremented by -i (increment) until -e (end).
The -p option sets the packet size. -p is the size of the user data:
- for a 1500-byte MTU the maximum is 1472 bytes;
- for a 9000-byte MTU the maximum is 8972 bytes.
    $ udpmon_bw_mon -d -p w0 -i1 -e7 -l100
    pkt len; num_sent; inter-pkt_time us; send_user_data_rate Mbit; num_recv; num_lost; num_badorder; %lost; num_lost_innet; %lost_innet; recv_user_data_rate Mbit; recv_wire_rate Mbit;
    1472; 100; 0; ; 100; 0; 0; 0; 0; 0; ; ;
    1472; 100; 1; ; 100; 0; 0; 0; 0; 0; ; ;
    1472; 100; 2; ; 100; 0; 0; 0; 0; 0; ; ;
    1472; 100; 3; ; 100; 0; 0; 0; 0; 0; ; ;
    1472; 100; 4; ; 100; 0; 0; 0; 0; 0; ; ;
    1472; 100; 5; ; 100; 0; 0; 0; 0; 0; ; ;
    1472; 100; 6; ; 100; 0; 0; 0; 0; 0; ; ;
    1472; 100; 7; ; 100; 0; 0; 0; 0; 0; ; ;
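The maximum user data sizes quoted above follow from subtracting the IP and UDP headers from the MTU. A minimal sketch, assuming IPv4 without options (20-byte header) and the 8-byte UDP header; the function name is hypothetical:

```python
IPV4_HEADER_BYTES = 20  # assumption: IPv4 with no options
UDP_HEADER_BYTES = 8


def max_udp_payload(mtu_bytes):
    """Largest UDP user data size that fits in one unfragmented IP packet."""
    return mtu_bytes - IPV4_HEADER_BYTES - UDP_HEADER_BYTES


for mtu in (1500, 9000):
    print(f"MTU {mtu}: max user data {max_udp_payload(mtu)} bytes")
```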
Using the option -L for a packet loss report
With the -L option the program prints a detailed report for each of the first (10) LOST packets:
    $ udpmon_bw_mon -d -w50 -l100 -L10
    pkt len; num_sent; inter-pkt_time us; send_user_data_rate Mbit; num_recv; num_lost; num_badorder; %lost; num_lost_innet; %lost_innet; recv_user_data_rate Mbit; recv_wire_rate Mbit;
    64; 100; 50; ; 82; 18; 0; 18; 18; 18; ; ;
    lost event; recv_time 0.1us; send_time 0.1us; diff 0.1us; one_way time us; lost packet num; ;delta recv_time us; delta send_time us; num packets between losses;
    1; ; ; 59422; e+07; 3; ; e+07; e+07; 3
    2; ; ; 59993; e+07; 7; ; 260.6; 203.5; 4
    3; ; ; 59125; e+07; 9; ; 13.3; 100.1; 2
    4; ; ; 59170; e+07; 14; ; 259.4; 254.9; 5
    5; ; ; 59170; e+07; 15; ; 0; 0; 1
    6; ; ; 60323; e+07; 17; ; 267.9; 152.6; 2
    7; ; ; 59323; e+07; 34; ; 782.1; 882.1; 17
    8; ; ; 60183; e+07; 37; ; 239.6; 153.6; 3
    9; ; ; 59473; e+07; 48; ; 510.3; 581.3; 11
    10; ; ; 58677; e+07; 50; ; 22.3; 101.9; 2
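The "num packets between losses" column in the report above is the difference between consecutive lost packet numbers. The sketch below recomputes it from the "lost packet num" column; the function name is hypothetical, and counting the first gap from packet 0 is an assumption that matches the listing.

```python
def packets_between_losses(lost_packet_nums, first_packet=0):
    """Gap, in packets, between each loss and the previous one."""
    gaps = []
    previous = first_packet  # assumption: the first gap is counted from packet 0
    for num in lost_packet_nums:
        gaps.append(num - previous)
        previous = num
    return gaps


# "lost packet num" values taken from the -L report above.
lost = [3, 7, 9, 14, 15, 17, 34, 37, 48, 50]
gaps = packets_between_losses(lost)
print(gaps)
```

Small gaps (1 or 2) point to losses in bursts, while large gaps such as 17 separate distinct loss events.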
General approach for testing (1)
- Run ping and traceroute in both directions to check the path.
- Use udpmon to check the connection:
    Receiving host  $ udpmon_resp -S
    Sending host    $ udpmon_bw_mon -d -p 1472 -w 123 -l1000
- Then run a udpmon bandwidth and packet loss test:
    Receiving host  $ udpmon_resp -S
    Sending host    $ ./cmd_throughput_lite.pl -d -o sto-man -l 10000
General approach for testing (2)
If it fails...
- Try to identify which direction fails to pass UDP packets:
    Receiving host  $ udpmon_recv -S
    Sending host    $ udpmon_send -d -p 1472 -w 123 -l1000
- Check the firewalls in the host; an iptables term like the following is needed:
    # udpmon
    -A INPUT -p udp -m udp --dport -j ACCEPT
- Call your NOC for help with router ACLs.
Some examples of looking at udpmon data
UDP achievable throughput graph
Ideal shape:
- Flat portions
  - Limited by capacity of link
  - Available BW on a loaded link
- Cannot send packets back-to-back
  - End host: NIC setup time on PCI / context switches
- Shape follows 1/t
  - Packet spacing most important.
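The ideal shape, flat at the link capacity and then falling as 1/t, can be reproduced with a short calculation. This is a sketch under stated assumptions: the function name is hypothetical, 66 bytes of per-frame overhead is assumed (UDP/IPv4 over Ethernet including preamble and inter-frame gap), and end-host effects such as NIC setup time are ignored.

```python
def ideal_wire_rate_mbit(payload_bytes, spacing_us, link_mbit, overhead_bytes=66):
    """Ideal wire rate in Mbit/s for frames sent every spacing_us microseconds."""
    frame_bits = 8 * (payload_bytes + overhead_bytes)
    # Time one frame occupies the link; Mbit/s equals bits per microsecond.
    min_spacing_us = frame_bits / link_mbit
    if spacing_us <= min_spacing_us:
        return float(link_mbit)  # flat portion: limited by link capacity
    return frame_bits / spacing_us  # 1/t portion: set by packet spacing


# Hypothetical sweep: 1472-byte packets on a 1 Gbit/s link.
for spacing in (0, 5, 10, 15, 20, 50, 100):
    rate = ideal_wire_rate_mbit(1472, spacing, 1000)
    print(f"{spacing:4d} us  {rate:8.2f} Mbit/s")
```

In this example the curve leaves the flat portion once the spacing exceeds about 12.3 µs, the time a 1538-byte frame occupies a 1 Gbit/s link.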
Packet jitter plots
Histograms of inter-packet arrival times for equally spaced packets (1472-byte packets with 50 µs spacing in this case).
This is a really good jitter plot: really narrow, with no side bands.
Using packet jitter to discover queuing
Histograms of inter-packet arrival times for equally spaced packets.
Queuing along the path shows on a jitter plot as:
- Side bands
- Multiple peaks
This is a typical shape for a busy link with cross traffic. There may not be any packet loss.
One-way delay on a link with queuing and losses
[Plot annotations: packet loss signature; queuing signature]
Looking for Lost Packet Distributions
- Three trials at about 600 Mbit/s.
- Plot shows packets lost in long bursts at different times into the test.
    ./udpmon_bw_mon -w20 -L500 -x -d -p l
Network Stability: Lost Packet Events
- 1 Mbit/s flow for a 24-hour period.
- Plot shows packet loss events as a function of time.
- Histogram of the number of loss events per 2-hour period.
    ./udpmon_recv -S T10 > udp_tseries_HK-netmon.txt &
    ./udpmon_send -d -d p1472 -w t &
London-Wellington: Throughput
- Most packets lost at receiver
- Total packet loss: Network + Receiver
- Some packets lost in the network
- Packets lost in the end host
London-Wellington: Jitter (1-way delay variation) distribution
Distribution of inter-packet arrival times for equally spaced packets (packets sent at 100 µs spacing, top, and 200 µs spacing, bottom).
The narrower the peak, the smaller the queues from source to destination.
FWHM is ~50 µs for 1472-byte packets and up to ~70 µs for 100-byte packets.
Network limits the bandwidth
EXPReS 4 Gigabit GÉANT Plus circuit, Stockholm to London PoP, January 2008
- Alcatel Metro Core Connect (MCC)
- Flow control OFF
- rx-usecs=25, so interrupt coalescence ON
- MTU 9000 bytes
- Max throughput 4.05 Gbit/s
- Packet loss as expected: falls to zero at 4.05 Gbit/s
[Plot label: BBC to NTT]
Network switch limits behaviour
End-to-end packets from udpmon:
- Only 700 Mbit/s throughput
- Lots of packet loss
- 1-way delay and packet loss distribution show the throughput is limited
Thank you
Richard Hughes-Jones
© GEANT Limited on behalf of the GN4 Phase 1 project (GN4-1). The research leading to these results has received funding from the European Union’s Horizon 2020 research and innovation programme under Grant Agreement No (GN4-1).