Linux Traffic Control
- Linux Traffic Control Essentials
- TCNG Overview
- Study of a Token Bucket Scenario
Papadimitriou Panagiotis, 17/06/2004
Components of Linux Traffic Control
The basic components of the Linux QoS architecture are:
- Queuing Disciplines
- Classes
- Filters
Queuing Disciplines
Queuing Disciplines (qdiscs) have:
- an enqueue function, called whenever the network layer of the operating system wants to transmit a packet, and
- a dequeue function, called when the device is able to transmit the next packet
The available qdiscs can be divided into two groups:
- The simple qdiscs which have no inner structure, known as queues. These can be used to shape traffic for an entire interface, without any subdivisions.
- The qdiscs which have classes, known as schedulers. These are very useful when different kinds of traffic should be treated differently.
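The enqueue/dequeue contract described above can be sketched with a toy classless qdisc. This is a minimal Python model, not kernel code: the class name PacketFifo, the packet-count limit and the True/False return values are assumptions made for illustration.

```python
from collections import deque


class PacketFifo:
    """Toy classless qdisc: a FIFO bounded by a packet count (hypothetical model)."""

    def __init__(self, limit):
        self.limit = limit      # maximum number of queued packets (assumed policy)
        self.queue = deque()

    def enqueue(self, packet):
        """Called when the network layer wants to transmit; False means dropped."""
        if len(self.queue) >= self.limit:
            return False
        self.queue.append(packet)
        return True

    def dequeue(self):
        """Called when the device can send the next packet; None if queue is empty."""
        return self.queue.popleft() if self.queue else None


fifo = PacketFifo(limit=2)
results = [fifo.enqueue(p) for p in ("p1", "p2", "p3")]  # third packet exceeds the limit
first_out = fifo.dequeue()
```

A scheduler (classful qdisc) exposes the same two functions but forwards each call to one of several inner qdiscs.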
Path of a Data Socket through the Linux Network Stack
Queues
The first group of queuing disciplines includes:
- pfifo_fast: a 3-band priority FIFO queue (default)
- sfq: a stochastic fair queuing discipline
- tbf: a Token Bucket Filter queue
- red: implements the Random Early Detection (RED) behavior
- gred: a generalized RED implementation used for DiffServ support
- ingress: a queue used for policing ingress traffic
Schedulers
The second group of queuing disciplines includes:
- cbq: implementation of the class based queuing link-sharing scheme
- atm: a special qdisc which supports the re-direction of flows to ATM virtual channels
- csz: a Clark-Shenker-Zhang scheduling discipline
- dsmark: qdisc for DiffServ support (uses DSCP)
- wrr: a Weighted Round Robin scheduler
Sample qdisc which has inner classes, filters and qdiscs
Sample Scenario for Traffic Control
A small company has a 10 Mbit/s link which connects its workstations and one FTP server to an Internet service provider. Since bandwidth is a scarce resource, the company wants to limit the share of the FTP traffic to 20%, and at times when FTP needs less bandwidth the rest should be available for the workstations. On the other hand, FTP traffic must never exceed its 20% share, even if the rest of the bandwidth is currently unused, because the company's ISP charges extra for any bandwidth consumed above a rate of 2 Mbit/s. To solve this problem, a Linux router is installed at the edge of the corporate network.
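The bandwidth policy of this scenario reduces to simple arithmetic, sketched here in Python. The function allocate and the demand figures are hypothetical, and the model is only an idealized split; it does not describe how CBQ actually shares a link.

```python
LINK_KBIT = 10_000        # 10 Mbit/s link from the scenario
FTP_SHARE_PERCENT = 20    # FTP may use at most 20% of the link

FTP_CAP_KBIT = LINK_KBIT * FTP_SHARE_PERCENT // 100   # 2000 kbit/s = 2 Mbit/s


def allocate(ftp_demand_kbit, workstation_demand_kbit):
    """Return (ftp, workstations) rates in kbit/s under the scenario's policy."""
    ftp = min(ftp_demand_kbit, FTP_CAP_KBIT)                      # hard cap, never exceeded
    workstations = min(workstation_demand_kbit, LINK_KBIT - ftp)  # unused FTP share is reusable
    return ftp, workstations


busy_ftp = allocate(5_000, 9_000)    # FTP asks for more than its cap
idle_ftp = allocate(500, 12_000)     # FTP is mostly idle
```

In the first case FTP is held at 2 Mbit/s; in the second the workstations pick up the bandwidth that FTP leaves unused.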
Traffic Control Configuration
The first Ethernet interface of the Linux router (eth0) is connected to the ISP; the second interface (eth1) is the link towards the internal network. Since it is only possible to limit outgoing traffic, the setup consists of two parts:
- the CBQ configuration limiting the outgoing traffic on eth0 (the “upstream” traffic from the internal network’s point of view), and
- a second part limiting outgoing traffic on eth1 (the “downstream”).
Introduction to TCNG
The Traffic Control Next Generation (TCNG) project focuses on:
- providing a compact and user-friendly configuration language, in which traffic control systems can be expressed in an intuitive way
- supporting hardware accelerators in traffic control
TCNG consists of two major components:
- the Traffic Control Compiler (TCC)
- the Traffic Control Simulator (TCSIM)
Traffic Control Compiler
The TCNG language is closely modelled after common programming languages, such as C, Perl or Java. Consequently, the learning effort is reduced for anyone who is familiar with one of these languages. The Traffic Control Compiler (tcc) translates configuration scripts from the TCNG language into a multitude of output formats used to configure traffic control subsystems.
TCC in Operation
The Traffic Control Compiler:
- gets its input from a script program
- invokes the appropriate input parser to translate the configuration data into a common internal data structure
- invokes one or more output generators (named “targets”) to issue commands to the corresponding output processor(s)
Finally, output processors translate the output from tcc into actions understood by lower-level components.
TCC Internal Structure & Interface
Traffic Control Simulator
The Traffic Control Simulator (tcsim) is used to simulate the behavior of Linux Traffic Control at a very high level of detail. It has been developed mainly for the following purposes:
- validation of configurations generated by tcc
- development of configuration scripts
- testing of traffic control components
TCSIM in Operation (1)
The Traffic Control Simulator:
- directly supports configuration using the standard traffic control language (tc), and
- supports the new TCNG language by automatically invoking TCC and integrating its output
Furthermore, the Traffic Control Simulator:
- combines the original traffic control code from the Linux kernel with the user-space code of the configuration utility tc, and
- adds the framework for communication among them, plus an event-driven simulation engine
TCSIM in Operation (2)
The resulting program runs entirely in user space, but executes almost exactly the same code as a “real system”, approximating the behavior of traffic control in a Linux system much more accurately than a more general simulator (e.g. NS-2) would.
The Traffic Control Simulator:
- processes a script defining the system configuration and the data to send, and
- generates a message trace, which can then be processed to obtain statistics or graphs
TCSIM Internals and Helper Programs
TCNG Example: Steps 1-2
Step 1: We write the following TCNG code in the file example.tc:
    dev eth0 {
        egress {
            drop if tcp_sport != PORT_HTTP;
        }
    }
Step 2: We run tcc to convert the TCNG configuration to tc commands. We save the output in the file example.sh:
    tcc example.tc > example.sh
TCNG Example: After Step 2
After Step 2 the file example.sh contains the following tc configuration:
    tc qdisc add dev eth0 handle 1:0 root dsmark indices 1 default_index 0
    tc filter add dev eth0 parent 1:0 protocol all prio 1 handle 1:0:0 u32 divisor 1
    tc filter add dev eth0 parent 1:0 protocol all prio 1 u32 match u8 0x6 0xff at 9 offset at 0 mask 0f00 shift 6 eat link 1:0:0
    tc filter add dev eth0 parent 1:0 protocol all prio 1 handle 1:0:1 u32 ht 1:0:0 match u16 0x50 0xffff at 0 classid 1:0
    tc filter add dev eth0 parent 1:0 protocol all prio 1 u32 match u8 0x6 0xff at 9 classid 1:0 police index 1 rate 1bps burst 1 action drop/drop
    tc filter add dev eth0 parent 1:0 protocol all prio 1 u32 match u32 0x0 0x0 at 0 classid 1:0
TCNG Example: Step 3
Step 3: We define a simulation scenario in the file example.tcsim, with one interface called eth0, running at 100 Mbps. The simulation scenario consists of sending two packets.
    #include "packet.def"
    #include "ports.tc"

    dev eth0 100 Mbps {
        #include "example.tc"
    }

    send TCP_PCK($tcp_sport = PORT_HTTP);
    send TCP_PCK($tcp_sport = PORT_SSH);
    end
TCNG Example: Step 4
Step 4: We run the simulation with tcsim:
    tcsim -s 22 example.tcsim
The output looks like this:
    E : 0x80bd : eth0: a a
    D : 0x80bd : eth0: a a
    E : 0x80bd : eth0: a a
    * : 0x80bd : eth0: enqueue returns POLICED (3)
TCNG Example: Steps 5-6
Step 5: We verify that the configuration did indeed work. The first packet was enqueued (“E”) and then dequeued (“D”). The attempt to enqueue the second packet was rejected (“*”, enqueue returns POLICED).
Step 6: We can try this example on a live system. We execute the tc commands to create the configuration in the kernel:
    sh example.sh
A more comprehensive TCNG example (1)
This example illustrates most of the elements found in a typical TCNG configuration:
    dev "eth0" {
        egress {
            class (<$high>) if tcp_dport == PORT_HTTP;
            class (<$low>) if 1;

            prio {
                $high = class (1) {
                    fifo (limit 20kB);
                }
                $low = class (2) {
                    fifo (limit 100kB);
                }
            }
        }
    }
A more comprehensive TCNG example (2)
The dev and egress lines determine what is being configured, i.e. the egress (outbound) side of the network interface eth0:
    dev "eth0" {
        egress {
The configuration consists of two parts:
- the classification:
    class (<$high>) if tcp_dport == PORT_HTTP;
    class (<$low>) if 1;
- the setup of the queuing system:
    prio {
        $high = class (1) { fifo (limit 20kB); }
        $low = class (2) { fifo (limit 100kB); }
    }
In this example, we use a priority scheduler with two classes for the priorities “high” and “low”.
A more comprehensive TCNG example (3)
In this configuration, packets:
- with TCP destination port 80 (HTTP) are sent to the high priority class, while
- all other packets (if 1;) are sent to the low priority class
The queuing part defines the queuing discipline for static priorities, with the two classes:
- Inside the high priority class, there is another queuing discipline: a simple FIFO with a capacity of 20 KB.
- Likewise, the low priority class contains a FIFO with a capacity of 100 KB.
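The behavior described above (classify by destination port, then serve the high priority FIFO first) can be modelled in a few lines of Python. This is a sketch under assumptions, not what tcc generates: ByteFifo and PrioScheduler are hypothetical names, packets are plain dicts, and kB is taken as 1000 bytes.

```python
from collections import deque

PORT_HTTP = 80


class ByteFifo:
    """FIFO limited by total queued bytes, loosely modelled on fifo (limit ...)."""

    def __init__(self, limit_bytes):
        self.limit_bytes = limit_bytes
        self.queued_bytes = 0
        self.queue = deque()

    def enqueue(self, packet):
        if self.queued_bytes + packet["len"] > self.limit_bytes:
            return False                      # no room left: drop the packet
        self.queue.append(packet)
        self.queued_bytes += packet["len"]
        return True

    def dequeue(self):
        if not self.queue:
            return None
        packet = self.queue.popleft()
        self.queued_bytes -= packet["len"]
        return packet


class PrioScheduler:
    """Static priorities: the low class is served only when the high class is empty."""

    def __init__(self):
        self.high = ByteFifo(20_000)          # assumption: 20kB = 20000 bytes
        self.low = ByteFifo(100_000)          # assumption: 100kB = 100000 bytes

    def classify(self, packet):
        # HTTP goes to the high class; everything else falls through to low ("if 1").
        return self.high if packet.get("tcp_dport") == PORT_HTTP else self.low

    def enqueue(self, packet):
        return self.classify(packet).enqueue(packet)

    def dequeue(self):
        packet = self.high.dequeue()
        return packet if packet is not None else self.low.dequeue()


sched = PrioScheduler()
sched.enqueue({"tcp_dport": 22, "len": 100})   # SSH arrives first
sched.enqueue({"tcp_dport": 80, "len": 100})   # HTTP arrives second
order = [sched.dequeue()["tcp_dport"] for _ in range(2)]
```

Although the SSH packet arrives first, the HTTP packet leaves first, which is the effect of the static priority scheduler.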
A more comprehensive TCNG example (4)
The compilation of this TCNG code results in the following tc configuration:
    tc qdisc add dev eth0 handle 1:0 root dsmark indices 4 default_index 0
    tc qdisc add dev eth0 handle 2:0 parent 1:0 prio
    tc qdisc add dev eth0 handle 3:0 parent 2:1 bfifo limit
    tc qdisc add dev eth0 handle 4:0 parent 2:2 bfifo limit
    tc filter add dev eth0 parent 2:0 protocol all prio 1 tcindex mask 0x3 shift 0
    tc filter add dev eth0 parent 2:0 protocol all prio 1 handle 2 tcindex classid 2:2
    tc filter add dev eth0 parent 2:0 protocol all prio 1 handle 1 tcindex classid 2:1
    tc filter add dev eth0 parent 1:0 protocol all prio 1 handle 1:0:0 u32 divisor 1
    tc filter add dev eth0 parent 1:0 protocol all prio 1 u32 match u8 0x6 0xff at 9 offset at 0 mask 0f00 shift 6 eat link 1:0:0
    tc filter add dev eth0 parent 1:0 protocol all prio 1 handle 1:0:1 u32 ht 1:0:0 match u16 0x50 0xffff at 2 classid 1:1
    tc filter add dev eth0 parent 1:0 protocol all prio 1 u32 match u32 0x0 0x0 at 0 classid 1:2
Simulation Output
By default, TCSIM prints a message:
- whenever a packet is enqueued or dequeued, or
- when some exceptional condition (e.g. an error) occurs.
This output can be post-processed:
- to extract statistical data, or
- to generate a graphical representation of traffic characteristics
TCSIM can also provide more detailed information on the inner workings of the traffic control subsystem, which is useful:
- for testing configurations, and
- for the development of new traffic control elements
Pretty-printing Traces (1)
The script tcsim_pretty can be used to format traces in a more human-readable way. Running the simulation script example.tcsim with the command:
    tcsim example.tcsim
produces the following output:
    E : 0x93a87c8 40 : eth0: a a
    D : 0x93a87c8 40 : eth0: a a
    E : 0x93a88c0 40 : eth0: a a
    * : 0x93a88c0 40 : eth0: enqueue returns POLICED (3)
Pretty-printing Traces (2)
Running the same simulation script with the command:
    tcsim example.tcsim | tcsim_pretty
produces a more readable output:
    x9d207c8 E 40: eth0: a a
    = D 40: eth0: a a
    x9d208c0 E 40: eth0: a a
    = * eth0: enqueue returns POLICED (3)
Output Filtering (1)
Enqueue and dequeue records can be selected in trace output with the tcsim_filter script. Additional filtering is supported, according to a selection of fields. The following fields are recognized:
- tos: TOS byte
- len: Total length field
- src: Source IP address
- dst: Destination IP address
- sport: Source port (TCP or UDP)
- dport: Destination port (TCP or UDP)
- dev: Device name (e.g. eth0)
Output Filtering (2)
When printing records, each line contains:
- the time
- the ID string
- the packet length in bytes
The tcsim_filter script also supports counting the results instead of printing data points on standard output. In this case, the records with the same ID string are counted.
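The counting mode can be pictured as grouping records by their ID string. The Python sketch below is not the tcsim_filter implementation; the tuple layout and all sample values are invented for illustration.

```python
from collections import Counter

# Hypothetical filtered records: (time in seconds, ID string, packet length in bytes).
records = [
    (0.000, "E:b8", 100),
    (0.001, "D:b8", 100),
    (0.002, "E:00", 100),
    (0.003, "E:b8", 100),
    (0.004, "D:b8", 100),
]


def count_by_id(rows):
    """Count how many records share each ID string, ignoring time and length."""
    return Counter(record_id for _time, record_id, _length in rows)


counts = count_by_id(records)
for record_id in sorted(counts):
    print(record_id, counts[record_id])
```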
Examples of Output Filtering
Running the simulation script dsmark+policing with the command:
    tcsim dsmark+policing | tcsim_filter -c tos
produces the following output:
    D: D:b8 139 E: E:
Likewise, the command:
    tcsim dsmark+policing | tcsim_filter -c tos=0xb8
produces the output:
    D 139
Graphical Output
Filtered output can be further processed with the script tcsim_plot, which uses gnuplot to generate plots. The following plot types are available:
- rate: Bit rate (based on the inter-arrival time)
- iat: Packet inter-arrival time
- cumul: Cumulative amount of data
- delay: Queuing delay, measured at dequeue time
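The four plot types correspond to simple quantities derived from packet timestamps and lengths. The Python sketch below shows one plausible way to compute them; the exact formulas used by tcsim_plot are not given in these slides, so the definitions and the sample data are assumptions.

```python
def inter_arrival_times(times):
    """iat: gap between each packet and the one before it."""
    return [later - earlier for earlier, later in zip(times, times[1:])]


def bit_rates(times, lengths):
    """rate: packet size in bits divided by the gap since the previous packet."""
    gaps = inter_arrival_times(times)
    return [8 * length / gap for length, gap in zip(lengths[1:], gaps)]


def cumulative_bytes(lengths):
    """cumul: running total of the data seen so far."""
    totals, running = [], 0
    for length in lengths:
        running += length
        totals.append(running)
    return totals


def queuing_delays(enqueue_times, dequeue_times):
    """delay: time each packet spent queued, known once it is dequeued."""
    return [deq - enq for enq, deq in zip(enqueue_times, dequeue_times)]


# Hypothetical sample: four 100-byte packets.
times = [0.0, 0.5, 1.0, 2.0]
lengths = [100, 100, 100, 100]

iat = inter_arrival_times(times)
rates = bit_rates(times, lengths)
cumul = cumulative_bytes(lengths)
delays = queuing_delays([0.0, 0.5], [0.25, 1.0])
```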
Token Bucket Scenario
    #define RATE 1Mbit
    #define BURST 3kB
    #define LIMIT 20kB
    #define NOTHING
    #define PACKET IP_PCK(NOTHING) 0 x 80   /* 100-sizeof(iphdr) = 80 bytes. */

    dev eth /* 10 Mbps */

    tc qdisc add dev eth0 root handle 1:0 tbf limit LIMIT rate RATE burst BURST

    every s send PACKET   /* 1.6 Mbps */
    time 1s
    end
Packet Losses
    Parameter   Scenario 1   Scenario 2   Scenario 3   Scenario 4
    Rate        1 Mbps       1 Mbps       1 Mbps       1.1 Mbps
    Burst       3 KB         3 KB         5 KB         3 KB
    Limit       20 KB        10 KB        20 KB        20 KB
Running the simulation script tbf with the command:
    tcsim tbf | tcsim_filter -c
which counts the packets enqueued and dequeued, produces the following output for each scenario:
    Count       Scenario 1   Scenario 2   Scenario 3   Scenario 4
    D           1590         1488         1612         1727
    E           2002         2002         2002         2002
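The role of the three parameters can be explored with a small token bucket model in Python. It is a simplified sketch and is not expected to reproduce the tcsim counts above: the function simulate_tbf is hypothetical, the sender is assumed to emit one 100-byte packet every 500 microseconds (1.6 Mbps), KB is taken as 1000 bytes, the queue is served only at arrival instants, and packets still queued when the simulated second ends are not counted as dequeued.

```python
def simulate_tbf(rate_bps, burst_bytes, limit_bytes,
                 packet_bytes=100, interval_us=500, duration_us=1_000_000):
    """Count offered, dequeued and dropped packets for a token bucket with a byte-limited queue."""
    unit = 8 * 1_000_000                  # token units (micro-bits) per byte, keeps math in integers
    cost = packet_bytes * unit            # tokens needed to send one packet
    bucket_size = burst_bytes * unit      # bucket depth (burst)
    tokens = bucket_size                  # the bucket starts full
    queued = 0                            # packets currently waiting
    stats = {"offered": 0, "dequeued": 0, "dropped": 0}

    def drain():
        nonlocal tokens, queued
        while queued and tokens >= cost:  # send as long as tokens allow
            tokens -= cost
            queued -= 1
            stats["dequeued"] += 1

    for _now in range(0, duration_us, interval_us):
        # Refill for the elapsed interval, capped at the bucket depth.
        tokens = min(bucket_size, tokens + rate_bps * interval_us)
        drain()
        stats["offered"] += 1
        if (queued + 1) * packet_bytes > limit_bytes:
            stats["dropped"] += 1         # queue limit reached: tail drop
        else:
            queued += 1
        drain()
    return stats


scenario_1 = simulate_tbf(rate_bps=1_000_000, burst_bytes=3_000, limit_bytes=20_000)
fast_bucket = simulate_tbf(rate_bps=2_000_000, burst_bytes=3_000, limit_bytes=20_000)
print(scenario_1, fast_bucket)
```

When the token rate exceeds the 1.6 Mbps arrival rate, nothing is dropped; with a 1 Mbps rate the burst is used up first, then the queue fills to its limit, and after that arriving packets are dropped.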
Time of 1st Packet Loss (1)
Scenario
    E : 0x9ecf : eth0: a
    E : 0x9ed3c : eth0: a
    * : 0x9ed3c : eth0: enqueue returns DROP (1)
    E : 0x9ed3c : eth0: a
    * : 0x9ed3c : eth0: enqueue returns DROP (1)
Scenario
    E : 0x82b0ee8 100 : eth0: a
    E : 0x82b0ff8 100 : eth0: a
    * : 0x82b0ff8 100 : eth0: enqueue returns DROP (1)
    D : 0x82aece8 100 : eth0: a
Time of 1st Packet Loss (2)
Scenario
    E : 0x96dbb : eth0: a
    E : 0x96dbc : eth0: a
    * : 0x96dbc : eth0: enqueue returns DROP (1)
    E : 0x96dbc : eth0: a
    * : 0x96dbc : eth0: enqueue returns DROP (1)
Scenario
    E : 0x8583b : eth0: a
    E : 0x8583c : eth0: a
    * : 0x8583c : eth0: enqueue returns DROP (1)
    E : 0x8583c : eth0: a
    * : 0x8583c : eth0: enqueue returns DROP (1)
Cumulative Amount of Data for Scenarios
Queuing Delay for Scenarios
TCSIM Restrictions & Extensions
- TCSIM only includes a small part of the network stack, and does not support full routing or firewalling. Therefore, the route classifier is not available in tcsim, and the usability of the fw classifier is limited.
- tc bugs may crash TCSIM.
- TCSIM supports only the simulation of constant bit-rate flows (using the every keyword) and the sending of single packets at a specified point in time.
- To support the simulation of Poisson-distributed and bursty flows, a simple tool, the TCSIM Traffic Generator (TrafGen), was developed; it creates trace files to be used in a simulation.