Networks ∙ Services ∙ People www.geant.org
udpmon for Network Troubleshooting: A Hands-On Session
Richard Hughes-Jones, eduPERT Training Session, Porto, 18/06/2015

Presentation transcript:

A Hands-On Session: udpmon for Network Troubleshooting
Richard Hughes-Jones, Senior Network Advisor, Office of the CTO, GÉANT Association - Cambridge
eduPERT Training Session, Porto, 18/06/2015

What is udpmon?
A software package for investigating end-host and network performance using UDP/IP frames.
Programs work in client-server pairs to:
  Transmit streams of sequenced UDP packets at regular, carefully controlled intervals. The frame size and the frame transmit spacing can be varied.
  Receive and check the sequence and timing of the packets.
  Identify whether packets were lost in the end host or in the network.
Allows measurement of:
  Request-response latency.
  Achievable UDP bandwidth, packet loss, packet ordering, jitter.
  Packet dynamics and packet loss patterns.
  Quality of the connection path and its stability.

The client-server pairs
udpmon_bw_mon with udpmon_resp:
  Achievable UDP bandwidth, packet loss, packet ordering, jitter
  Packet dynamics and packet loss patterns
udpmon_req with udpmon_resp:
  Request-response latency
udpmon_send with udpmon_recv:
  Quality of the connection path and its stability
  Time series of achievable UDP bandwidth and packet loss

Latency Measurements
Round-trip times are measured using request-response UDP frames.
Latency as a function of frame size:
  The slope is given by: mem-mem copy(s) + PCI + Gigabit Ethernet + PCI + mem-mem copy(s)
  The intercept indicates: processing times + hardware latencies
Histograms of 'singleton' latency measurements tell us about:
  Behaviour of the IP stack
  The way the hardware operates
  Interrupt coalescence
  Performance of the LAN / MAN / WAN
[Diagram: udpmon_req sends a Request to udpmon_resp, which returns a Response; the latency is the time between the two at the requester.]
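The slope and intercept mentioned on this slide can be extracted with an ordinary least-squares straight-line fit of latency against frame size. The sketch below is only an illustration: the function name fit_latency and the sample numbers are made up for this example and are not udpmon output.

```python
def fit_latency(sizes_bytes, latencies_us):
    """Least-squares fit of latency = intercept + slope * size.

    Returns (slope in us/byte, intercept in us).
    """
    n = len(sizes_bytes)
    if n < 2 or n != len(latencies_us):
        raise ValueError("need two or more (size, latency) pairs")
    mean_x = sum(sizes_bytes) / n
    mean_y = sum(latencies_us) / n
    sxx = sum((x - mean_x) ** 2 for x in sizes_bytes)
    if sxx == 0:
        raise ValueError("all frame sizes are identical")
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(sizes_bytes, latencies_us))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: 20 us fixed cost plus 0.02 us per byte.
sizes = [64, 256, 512, 1024, 1472]
latencies = [20.0 + 0.02 * s for s in sizes]
slope, intercept = fit_latency(sizes, latencies)
print(f"slope = {slope:.4f} us/byte, intercept = {intercept:.1f} us")
```

With real measurements, the slope would reflect the per-byte costs listed above (memory copies, PCI and Ethernet transfers) and the intercept the fixed processing and hardware latencies.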

Achievable UDP Throughput Measurements
Send a controlled stream of UDP frames spaced at regular intervals, each carrying a 64-bit sequence number and a send time stamp. Record the packet receive time.
Stream parameters: n bytes per packet, number of packets, wait time between packets.
Message sequence between Sender and Receiver:
  Zero stats (set concurrent lockout); the Receiver replies "OK done".
  Send data frames at regular intervals. Measured: time to send, time to receive, inter-packet time (histogram).
  Signal end of test; the Receiver replies "OK done".
  Get remote statistics; the Receiver sends the statistics back:
    No. received
    No. lost + loss pattern
    No. out-of-order
    No. lost in network
    CPU load
    No. interrupts & SNMP
    Tx, Rx times & 1-way delay
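How a receiver can turn sequence numbers into loss and ordering counts can be sketched as follows. This is an assumption about one reasonable way to do the bookkeeping, not udpmon's actual implementation; the function name analyse_sequence and the rule "a packet is out of order if it arrives after a higher sequence number" are choices made for this example.

```python
def analyse_sequence(received_seq, num_sent):
    """Count received, lost and out-of-order packets.

    received_seq: sequence numbers (0 .. num_sent-1) in arrival order.
    """
    seen = set()
    highest = -1
    out_of_order = 0
    for seq in received_seq:
        if seq in seen:
            continue  # ignore duplicates in this simple sketch
        seen.add(seq)
        if seq < highest:
            out_of_order += 1  # arrived after a later packet
        else:
            highest = seq
    lost = sorted(set(range(num_sent)) - seen)
    return {
        "num_recv": len(seen),
        "num_lost": len(lost),
        "num_badorder": out_of_order,
        "lost": lost,
    }


# Hypothetical arrival order: packet 2 overtaken by 3, packets 4 and 7 missing.
result = analyse_sequence([0, 1, 3, 2, 5, 6], num_sent=8)
print(result)
```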

What udpmon records
Packets:
  Number received
  Number lost: in the network, and in total. Also: the loss pattern
  Number arrived out of order
Timestamps when each packet was sent and received:
  Packet jitter (inter-packet arrival times)
  Relative 1-way delay
CPU load on the end hosts
Bytes received and Bytes/frame rate
Elapsed time (microseconds)
Receiver data rate and wire rate (Mbit/s)
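The difference between the user data rate and the wire rate can be illustrated with a short calculation. The overhead figures below are an assumption for IPv4 over standard Ethernet (8 bytes UDP header, 20 bytes IPv4 header, and 38 bytes of Ethernet header, FCS, preamble and inter-frame gap); how udpmon itself accounts for overhead is not stated on the slide, and padding of very small frames is ignored here.

```python
UDP_HEADER_BYTES = 8
IPV4_HEADER_BYTES = 20          # no IP options assumed
ETHERNET_OVERHEAD_BYTES = 38    # 14 header + 4 FCS + 8 preamble + 12 inter-frame gap


def data_rates_mbit(num_packets, payload_bytes, elapsed_us):
    """Return (user data rate, wire rate) in Mbit/s.

    bits per microsecond is numerically equal to Mbit/s.
    """
    if elapsed_us <= 0:
        raise ValueError("elapsed time must be positive")
    per_frame = (payload_bytes + UDP_HEADER_BYTES
                 + IPV4_HEADER_BYTES + ETHERNET_OVERHEAD_BYTES)
    user_bits = num_packets * payload_bytes * 8
    wire_bits = num_packets * per_frame * 8
    return user_bits / elapsed_us, wire_bits / elapsed_us


# Hypothetical run: 100 packets of 1472 bytes received in 5000 us.
user_rate, wire_rate = data_rates_mbit(100, 1472, 5000)
print(f"user data rate {user_rate:.2f} Mbit/s, wire rate {wire_rate:.2f} Mbit/s")
# prints: user data rate 235.52 Mbit/s, wire rate 246.08 Mbit/s
```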

udpmon in Burst Mode
Send a set of regularly spaced UDP frames, then wait for a specified period (the gap).
Emulates TCP slow start.
Useful to investigate:
  Bandwidth impedance mismatches
  Buffering issues
Burst parameters: n bytes per packet, number of packets, wait time between packets, gap time between bursts.

Time-Series Measurements
Useful for stability tests and for checking for intermittent faults. Send a steady stream of regularly spaced UDP frames for a given (long) period.
udpmon_bw_mon with udpmon_resp:
  Packet Dynamics
    Record the packet statistics and, for each packet, the send and receive time stamps.
    Plot:
      Lost packets as a function of packet number / time
      Inter-packet transmit times as a function of packet number / time
      Inter-packet arrival times as a function of packet number / time
  Packet Loss Patterns
    Record the lost packets: information from the last valid received packet for each "lost packet".
udpmon_send with udpmon_recv:
  Network Stability
    Send a UDP flow for several days.
    At the receiver, take a snapshot of the packet statistics every few seconds (e.g. 10 s) and record the incremental statistics for that period together with the time of that period.
    Plot packet loss as a function of the elapsed time during the measurement.
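The "snapshot, then record the increment" step for network stability tests can be sketched as a difference between consecutive cumulative counters. The tuple layout (elapsed seconds, cumulative received, cumulative lost) and the function name incremental_stats are assumptions made for this illustration; they do not describe the udpmon_recv output format.

```python
def incremental_stats(snapshots):
    """Convert cumulative snapshots into per-period increments.

    snapshots: list of (elapsed_s, cumulative_recv, cumulative_lost),
    ordered by time. Returns a list of (period_end_s, recv, lost).
    """
    periods = []
    for (_, r0, l0), (t1, r1, l1) in zip(snapshots, snapshots[1:]):
        periods.append((t1, r1 - r0, l1 - l0))
    return periods


# Hypothetical snapshots taken every 10 s.
snaps = [(0, 0, 0), (10, 1000, 0), (20, 1990, 10), (30, 2990, 10)]
periods = incremental_stats(snaps)
for end_s, recv, lost in periods:
    print(f"t={end_s:>3} s  received={recv}  lost={lost}")
```

Plotting the lost column against the period end time gives the kind of loss-versus-elapsed-time plot described above.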

Start the receiver and bind to a specific port
On the receiver side, start udpmon_resp.
The -S option sets the receiver and sender buffer to Bytes
[sbin]$ ./udpmon_resp -S
[sbin]$
By default udpmon uses port
It is possible to change the port with option -u (the same port must be used by the udpmon sender):
[sbin]$ ./udpmon_resp -S -u5001
[sbin]$

Send a train of equally spaced packets
On the sender side, we can send a train of 100 packets with 50 µs spacing (-d takes the name or IP address of the remote host):
$ udpmon_bw_mon -d -w50 -l100
pkt len; num_sent; inter-pkt_time us; send_user_data_rate Mbit; num_recv; num_lost; num_badorder; %lost; num_lost_innet; %lost_innet; recv_user_data_rate Mbit; recv_wire_rate Mbit;
64; 100; 50; ; 100; 0; 0; 0; 0; 0; ;
The receive-side values in this output are sent back by udpmon_resp.

Histograms: packet jitter (inter-packet arrival times)
Sending a train of 100 packets with 50 µs spacing and creating a histogram:
$ udpmon_bw_mon -d -w50 -H -B10 -l100
64; 100; 50; ; 100; 0; 0; 0; 0; 0; ; ;
Hist 0 Time between frames us counts 99 mean underflows 0 overflows ; 4 40 ; ; ; 3...
A simple signature: counts in the 0-4 µs bin indicate that interrupt coalescence is in use at the receiving host.
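Why interrupt coalescence puts counts into the lowest histogram bin can be seen with a small sketch. The binning function below is an illustration only (its name, its arguments and the 4 µs bin width are assumptions, not a description of udpmon's -H or -B options). The sample receive times model packets sent 50 µs apart but handed to the application in batches of four.

```python
def interarrival_histogram(recv_times_us, bin_width_us, num_bins):
    """Histogram of time between consecutive frames.

    Returns (counts per bin, underflows, overflows).
    """
    counts = [0] * num_bins
    underflows = 0
    overflows = 0
    for prev, cur in zip(recv_times_us, recv_times_us[1:]):
        delta = cur - prev
        if delta < 0:
            underflows += 1
            continue
        index = int(delta // bin_width_us)
        if index >= num_bins:
            overflows += 1
        else:
            counts[index] += 1
    return counts, underflows, overflows


# Hypothetical coalesced arrivals: batches of 4 packets every 200 us,
# with 1 us between packets inside a batch.
times = [batch * 200 + i for batch in range(3) for i in range(4)]
counts, under, over = interarrival_histogram(times, bin_width_us=4, num_bins=64)
print("0-4 us bin:", counts[0], " bin holding 197 us:", counts[49])
```

Most inter-arrival times land in the first bin even though the packets were sent 50 µs apart, which is the signature described on the slide.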

Making sets of throughput measurements
Make a set of measurements with the wait time -w incremented by -i (increment) until -e (end):
$ udpmon_bw_mon -d -w0 -i1 -e7 -l100
pkt len; num_sent; inter-pkt_time us; send_user_data_rate Mbit; num_recv; num_lost; num_badorder; %lost; num_lost_innet; %lost_innet; recv_user_data_rate Mbit; recv_wire_rate Mbit;
64; 100; 0; ; 100; 0; 0; 0; 0; 0; ; ;
64; 100; 1; ; 100; 0; 0; 0; 0; 0; ; ;
64; 100; 2; ; 100; 0; 0; 0; 0; 0; ; ;
64; 100; 3; ; 100; 0; 0; 0; 0; 0; ; ;
64; 100; 4; ; 100; 0; 0; 0; 0; 0; ; ;
64; 100; 5; ; 100; 0; 0; 0; 0; 0; ; ;
64; 100; 6; ; 100; 0; 0; 0; 0; 0; ; ;
64; 100; 7; ; 100; 0; 0; 0; 0; 0; ; ;

Changing the packet size
Make a set of measurements with the wait time -w incremented by -i (increment) until -e (end).
The -p option sets the packet size. -p is the size of the user data:
  for a 1500 Byte MTU the maximum is 1472 Bytes;
  for a 9000 Byte MTU the maximum is 8972 Bytes.
$ udpmon_bw_mon -d -p w0 -i1 -e7 -l100
pkt len; num_sent; inter-pkt_time us; send_user_data_rate Mbit; num_recv; num_lost; num_badorder; %lost; num_lost_innet; %lost_innet; recv_user_data_rate Mbit; recv_wire_rate Mbit;
1472; 100; 0; ; 100; 0; 0; 0; 0; 0; ; ;
1472; 100; 1; ; 100; 0; 0; 0; 0; 0; ; ;
1472; 100; 2; ; 100; 0; 0; 0; 0; 0; ; ;
1472; 100; 3; ; 100; 0; 0; 0; 0; 0; ; ;
1472; 100; 4; ; 100; 0; 0; 0; 0; 0; ; ;
1472; 100; 5; ; 100; 0; 0; 0; 0; 0; ; ;
1472; 100; 6; ; 100; 0; 0; 0; 0; 0; ; ;
1472; 100; 7; ; 100; 0; 0; 0; 0; 0; ; ;
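The two maximum sizes quoted above follow from subtracting the IPv4 and UDP headers from the MTU. A minimal check, assuming IPv4 without options (the function name is made up for this example):

```python
IPV4_HEADER_BYTES = 20  # assumes no IP options
UDP_HEADER_BYTES = 8


def max_udp_payload(mtu_bytes):
    """Largest UDP user-data size that fits in one unfragmented IPv4 packet."""
    payload = mtu_bytes - IPV4_HEADER_BYTES - UDP_HEADER_BYTES
    if payload < 0:
        raise ValueError("MTU is smaller than the IPv4 + UDP headers")
    return payload


for mtu in (1500, 9000):
    print(f"MTU {mtu}: max user data {max_udp_payload(mtu)} Bytes")
# prints: MTU 1500: max user data 1472 Bytes
#         MTU 9000: max user data 8972 Bytes
```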

Using the option -L for a packet loss report
With the option -L the program prints a detailed report for each of the first (here 10) lost packets:
$ udpmon_bw_mon -d -w50 -l100 -L10
pkt len; num_sent; inter-pkt_time us; send_user_data_rate Mbit; num_recv; num_lost; num_badorder; %lost; num_lost_innet; %lost_innet; recv_user_data_rate Mbit; recv_wire_rate Mbit;
64; 100; 50; ; 82; 18; 0; 18; 18; 18; ; ;
lost event; recv_time 0.1us; send_time 0.1us; diff 0.1us; one_way time us; lost packet num; ;delta recv_time us; delta send_time us; num packets between losses;
1; ; ; 59422; e+07; 3; ; e+07; e+07; 3
2; ; ; 59993; e+07; 7; ; 260.6; 203.5; 4
3; ; ; 59125; e+07; 9; ; 13.3; 100.1; 2
4; ; ; 59170; e+07; 14; ; 259.4; 254.9; 5
5; ; ; 59170; e+07; 15; ; 0; 0; 1
6; ; ; 60323; e+07; 17; ; 267.9; 152.6; 2
7; ; ; 59323; e+07; 34; ; 782.1; 882.1; 17
8; ; ; 60183; e+07; 37; ; 239.6; 153.6; 3
9; ; ; 59473; e+07; 48; ; 510.3; 581.3; 11
10; ; ; 58677; e+07; 50; ; 22.3; 101.9; 2
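The last column of the loss report, "num packets between losses", appears to be the difference between consecutive lost packet numbers, with the first loss counted from packet 0. The sketch below reproduces that column from the "lost packet num" values in the report above; the function name is made up, and counting the first gap from 0 is an inference from the listed numbers rather than documented behaviour.

```python
def packets_between_losses(lost_packet_nums):
    """Gap between each lost packet number and the previous one.

    The first gap is measured from packet 0 (inferred from the report).
    """
    gaps = []
    previous = 0
    for num in lost_packet_nums:
        gaps.append(num - previous)
        previous = num
    return gaps


# "lost packet num" column from the report above.
lost_nums = [3, 7, 9, 14, 15, 17, 34, 37, 48, 50]
gaps = packets_between_losses(lost_nums)
print(gaps)
# prints: [3, 4, 2, 5, 1, 2, 17, 3, 11, 2]
```

Small gaps (1 or 2) point to losses clustered in bursts, while large gaps such as 17 mark a quiet stretch between loss events.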

General approach for testing (1)
ping
traceroute in both directions to check the path
udpmon to check the connection:
  Receiving host $ udpmon_resp -S
  Sending host $ udpmon_bw_mon -d -p 1472 -w 123 -l1000
Then run a udpmon bandwidth and packet loss test:
  Receiving host $ udpmon_resp -S
  Sending host $ ./cmd_throughput_lite.pl -d -o sto-man -l 10000

General approach for testing (2)
If it fails...
Try to identify which direction fails to pass UDP packets:
  Receiving host $ udpmon_recv -S
  Sending host $ udpmon_send -d -p 1472 -w 123 -l1000
Check the firewalls in the host; an iptables term like this is needed:
  # udpmon
  -A INPUT -p udp -m udp --dport j ACCEPT
Call your NOC for help with router ACLs.

Some examples of looking at udpmon data

UDP achievable throughput graph
Ideal shape:
  Flat portions:
    Limited by the capacity of the link
    Available BW on a loaded link
  Cannot send packets back-to-back:
    End host: NIC setup time on PCI / context switches
  Shape follows 1/t:
    Packet spacing is most important.
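The ideal shape (a flat portion followed by a 1/t fall-off) can be reproduced with a few lines of arithmetic. This is a sketch under stated assumptions: a 1 Gbit/s link, IPv4 over Ethernet with 66 bytes of per-frame overhead, and an ideal sender that never fails to send back-to-back. The function name and the default link speed are illustrative choices, not values from the slides.

```python
PER_FRAME_OVERHEAD_BYTES = 66  # 8 UDP + 20 IPv4 + 38 Ethernet (incl. preamble, gap)


def ideal_wire_rate_mbit(payload_bytes, spacing_us, link_mbit=1000.0):
    """Ideal wire rate for a given inter-packet spacing.

    Flat at the link rate while the requested spacing is shorter than the
    time one frame occupies the wire, then falling as 1/spacing.
    """
    wire_bits = (payload_bytes + PER_FRAME_OVERHEAD_BYTES) * 8
    frame_time_us = wire_bits / link_mbit  # Mbit/s equals bits per us
    if spacing_us <= frame_time_us:
        return link_mbit
    return wire_bits / spacing_us


for spacing in (0, 5, 10, 15, 20, 30, 50):
    rate = ideal_wire_rate_mbit(1472, spacing)
    print(f"spacing {spacing:>2} us -> {rate:7.1f} Mbit/s")
```

Under these assumptions a 1472-byte packet occupies the wire for about 12.3 µs, so spacings below that sit on the flat portion and larger spacings follow the 1/t curve; measured curves that fall below this shape point to the end-host or network limits listed above.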

Packet jitter plots
Histograms of inter-packet arrival times for equally spaced packets (1472 Byte packets with 50 µs spacing in this case).
This is a really good jitter plot: really narrow, with no side bands.

Using packet jitter to discover queuing
Histograms of inter-packet arrival times for equally spaced packets.
Queuing along the path shows on a jitter plot as:
  Side bands
  Multiple peaks
This is a typical shape for a busy link with cross traffic.
There may not be any packet loss.

One-way delay on a link with queuing and losses
[Plot annotations: packet loss signature; queuing signature]

Looking for Lost Packet Distributions
Three trials at about 600 Mbit/s.
The plot shows packets lost in long bursts at different times into the test.
./udpmon_bw_mon -w20 -L500 -x -d -p l

Network Stability: Lost Packet Events
1 Mbit/s flow for a 24-hour period.
The plot shows packet loss events as a function of time.
Histogram of the number of loss events per 2-hour period.
./udpmon_recv -S T10 > udp_tseries_HK-netmon.txt &
./udpmon_send -d -d p1472 -w t &
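Counting loss events per 2-hour period is a simple binning of event times. The sketch below assumes the loss event times have already been extracted as seconds since the start of the test; that input format, the function name and the sample times are assumptions for illustration, not the layout of the udpmon_recv time-series file.

```python
def loss_events_per_period(event_times_s, period_s=7200, duration_s=86400):
    """Number of loss events in each period of a fixed-length test.

    Defaults: 2-hour periods over a 24-hour test, giving 12 bins.
    """
    num_bins = -(-duration_s // period_s)  # ceiling division
    counts = [0] * num_bins
    for t in event_times_s:
        if 0 <= t < duration_s:
            counts[int(t // period_s)] += 1
    return counts


# Hypothetical loss event times in seconds since the start of the test.
events = [100, 7300, 7400, 80000]
per_period = loss_events_per_period(events)
print(per_period)
# prints: [1, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1]
```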

London-Wellington: Throughput
[Plot annotations: most packets lost at the receiver; total packet loss: network + receiver; some packets lost in the network; packets lost in the end host]

London-Wellington: Jitter (1-way delay variation) distribution
Distribution of inter-packet arrival times for equally spaced packets (packets sent at 100 µs spacing, top, and 200 µs spacing, bottom).
The narrower the peak, the smaller the queues from source to destination.
FWHM ~50 µs for 1472 Byte packets and up to ~70 µs for 100 Byte packets.
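The full width at half maximum (FWHM) quoted above can be estimated directly from a jitter histogram. The sketch below is a coarse, bin-resolution estimate with no interpolation, and it assumes a single dominant peak; the function name and the sample counts are made up for this example.

```python
def fwhm_us(counts, bin_width_us):
    """Width of the contiguous region around the peak with counts >= half the peak."""
    if not counts or max(counts) <= 0:
        raise ValueError("histogram is empty")
    peak = max(range(len(counts)), key=counts.__getitem__)
    half = counts[peak] / 2
    left = peak
    while left > 0 and counts[left - 1] >= half:
        left -= 1
    right = peak
    while right < len(counts) - 1 and counts[right + 1] >= half:
        right += 1
    return (right - left + 1) * bin_width_us


# Hypothetical histogram with 10 us bins.
width = fwhm_us([0, 1, 4, 10, 5, 1, 0], bin_width_us=10)
print(f"FWHM is about {width} us")
# prints: FWHM is about 20 us
```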

Network limits the bandwidth
EXPReS 4 Gigabit GÉANT Plus circuit, Stockholm to London PoP, January 2008
Alcatel Metro Core Connect (MCC)
Flow control OFF
rx-usecs=25, so interrupt coalescence is ON
MTU 9000 bytes
Max throughput 4.05 Gbit/s
Packet loss as expected: falls to zero at 4.05 Gbit/s
BBC to NTT

Network switch limits behaviour
End-to-end packets from udpmon
Only 700 Mbit/s throughput
Lots of packet loss
The 1-way delay and packet loss distribution show that the throughput is limited

Thank you
Richard Hughes-Jones
© GEANT Limited on behalf of the GN4 Phase 1 project (GN4-1). The research leading to these results has received funding from the European Union’s Horizon 2020 research and innovation programme under Grant Agreement No (GN4-1).