This blog post is about a Linux feature that introduces a delay when sending packets on the network.
A first question obviously is: why would you want this? Well, for several reasons actually.
If you deploy anything in the cloud across multiple availability zones, there will be a delay between nodes in these zones. Part of it comes from the physical distance: a packet has to travel that distance, and that takes time, which is ultimately limited by the speed of light. On top of that comes the logical distance, which is how the network between the two is shaped, and that can introduce more latency.
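As a rough worked example of that physical lower bound: light in optical fibre travels at roughly 200,000 km/s, about two thirds of its speed in vacuum, so distance alone sets a minimum round-trip time. The function name and the distances below are illustrative assumptions, and real cable routes are longer than the straight-line distance.

```shell
# Lower bound on round-trip time (ms) for a given one-way fibre distance in km.
# Assumes a signal speed of about 200,000 km/s; ignores routing and queueing.
min_rtt_ms() {
    awk -v km="$1" 'BEGIN { printf "%.1f\n", 2 * km / 200000 * 1000 }'
}

min_rtt_ms 100     # zones in the same region
min_rtt_ms 6000    # roughly transatlantic
```

Anything measured above these numbers comes from the logical distance: routers, switches and queueing along the way.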
Another reason is to test how something behaves when a certain network latency is introduced to a networked application. YugabyteDB is a distributed database, and uses the network to communicate between the nodes in the YugabyteDB cluster.
I am using Alma Linux version 8.5.
It actually looks very simple: a quick search on Google shows how to add a delay (in this case, 100 milliseconds):
tc qdisc add dev eth1 root netem delay 100ms
This means you must have the tc utility installed. If it's not installed, you should install the iproute-tc package.
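A minimal sketch of that check, assuming a POSIX shell. The helper name `have_cmd` is made up for this example, and the install hint uses the package name mentioned above.

```shell
# Succeeds if the named command is on PATH.
have_cmd() { command -v "$1" >/dev/null 2>&1; }

if have_cmd tc; then
    echo "tc found at $(command -v tc)"
else
    # Package name as used in this post on Alma Linux 8; install as root.
    echo "tc not found; try: dnf install iproute-tc"
fi
```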
However, this throws the following error on my Linux box:
# tc qdisc add dev eth1 root netem delay 100ms
Error: Specified qdisc not found.
Checking the tc (traffic control) settings shows no indication of any delay being set:
# tc qdisc show dev eth1
qdisc fq_codel 0: root refcnt 2 limit 10240p flows 1024 quantum 1514 target 5ms interval 100ms memory_limit 32Mb ecn drop_batch 64
Lots of details, but no delay.
Introducing network delays, as well as other features such as simulating packet loss or duplicate packets, is part of a 'qdisc' (queueing discipline) called 'netem' (network emulator).
It turns out that although the 'netem' qdisc is named in the manpage of tc, it is not part of the tc utility itself: it is provided by a kernel module, which is not installed by default. To make it available, the package kernel-modules-extra must be installed, which installs the sch_netem kernel module (among others) that provides the tc netem functionality. Once the kernel module is available, the tc command succeeds:
tc qdisc add dev eth1 root netem delay 100ms
(it silently succeeds)
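To find out up front whether the sch_netem module is there, something along these lines could be used. It is a sketch: the function name `module_status` is made up, and it assumes `/proc/modules` and the `modinfo` utility are available, as they normally are on Alma Linux.

```shell
# Print "loaded", "available" or "missing" for a kernel module.
module_status() {
    mod=$1
    if grep -q "^${mod} " /proc/modules 2>/dev/null; then
        echo loaded        # already loaded into the running kernel
    elif modinfo "$mod" >/dev/null 2>&1; then
        echo available     # on disk for the running kernel, loaded on demand
    else
        echo missing
    fi
}

case "$(module_status sch_netem)" in
    missing) echo "sch_netem not found; try: dnf install kernel-modules-extra" ;;
    *)       echo "sch_netem is present" ;;
esac
```

As far as I know the module has to match the running kernel (`uname -r`), so if the package gets installed for a newer kernel version, a reboot into that kernel may be needed before tc can use it.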
Checking with tc qdisc show dev eth1 shows netem is set:
# tc qdisc show dev eth1
qdisc netem 8003: root refcnt 2 limit 1000 delay 100ms
Any network traffic sent out of eth1 on the machine that has the netem (network emulator) qdisc set is now delayed by 100ms. A ping to that machine from another host shows the delay, because the replies leave through eth1.
Without netem:
$ ping -c 3 192.168.66.82
PING 192.168.66.82 (192.168.66.82) 56(84) bytes of data.
64 bytes from 192.168.66.82: icmp_seq=1 ttl=64 time=0.373 ms
64 bytes from 192.168.66.82: icmp_seq=2 ttl=64 time=0.367 ms
64 bytes from 192.168.66.82: icmp_seq=3 ttl=64 time=0.264 ms
With netem:
$ ping -c 3 192.168.66.82
PING 192.168.66.82 (192.168.66.82) 56(84) bytes of data.
64 bytes from 192.168.66.82: icmp_seq=1 ttl=64 time=101 ms
64 bytes from 192.168.66.82: icmp_seq=2 ttl=64 time=100 ms
64 bytes from 192.168.66.82: icmp_seq=3 ttl=64 time=101 ms
Remove:
# tc qdisc del dev eth1 root
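Because it is easy to forget the removal step, the add and delete can be wrapped around the test itself. This is a sketch with a made-up helper name (`with_delay`). By default it only prints the tc commands (a dry run), and `TC=tc` as root would apply them for real.

```shell
# Dry run by default: TC only prints the commands. Set TC=tc and run as root to apply.
TC=${TC:-echo tc}

# Hypothetical helper: add a netem delay on a device, run a command,
# then always remove the delay again and return the command's exit status.
with_delay() {
    dev=$1; delay=$2; shift 2
    $TC qdisc add dev "$dev" root netem delay "$delay" || return 1
    "$@"
    rc=$?
    $TC qdisc del dev "$dev" root
    return "$rc"
}

with_delay eth1 100ms echo "workload runs here"
```

The `echo` stands in for whatever workload you want to run under the delay.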
However, this is a rather blunt delay: everything that is sent from the device eth1 on the host that has traffic control set up is impacted. If you want the host to apply the delay only for a limited number of hosts (IP addresses), you can split normal and delayed output, and match the output to be delayed with a filter.
In my case I want to only apply the delay for anything that is sent to hosts 192.168.66.80 and 192.168.66.81 (from node 192.168.66.82). In this way I can mimic node 192.168.66.82 being "far away" (and thus having higher latency):
# tc qdisc add dev eth1 root handle 1: prio
# tc qdisc add dev eth1 parent 1:3 handle 30: netem delay 100ms
# tc filter add dev eth1 protocol ip parent 1:0 priority 3 u32 match ip dst 192.168.66.80 flowid 1:3
# tc filter add dev eth1 protocol ip parent 1:0 priority 3 u32 match ip dst 192.168.66.81 flowid 1:3
The first line creates a prio queueing discipline with handle 1: attached to the root of the device. The second line attaches a netem qdisc with a 100ms delay to class 1:3 of that prio qdisc. The third and fourth lines add filters, which direct traffic going out of device eth1 to the IP addresses 192.168.66.80 and 192.168.66.81 to flowid 1:3, so it passes through netem.
Remove:
# tc qdisc del dev eth1 root
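With more than a couple of destinations, the filter lines become repetitive. A sketch of the same setup as a loop, with a made-up helper name (`delay_to_hosts`). It prints the commands by default, and `TC=tc` as root would run them.

```shell
# Dry run by default: TC only prints the commands. Set TC=tc and run as root to apply.
TC=${TC:-echo tc}

# Hypothetical helper: delay egress traffic on a device only for the listed destination IPs.
delay_to_hosts() {
    dev=$1; delay=$2; shift 2
    $TC qdisc add dev "$dev" root handle 1: prio
    $TC qdisc add dev "$dev" parent 1:3 handle 30: netem delay "$delay"
    for ip in "$@"; do
        # One u32 filter per destination, all pointing at the delayed class 1:3.
        $TC filter add dev "$dev" protocol ip parent 1:0 priority 3 \
            u32 match ip dst "$ip" flowid 1:3
    done
}

delay_to_hosts eth1 100ms 192.168.66.80 192.168.66.81
```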
If you like this, and think the examples, even with the filtering to specific IP addresses, are still rather simple: there is a whole world of traffic shaping possibilities, such as variable latencies, fixed or variable packet loss, and generating duplicate packets to simulate network issues! See the manpage of the tc utility.
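A few of those netem options, as a sketch. The helper `netem_set` is made up for this example and the values are arbitrary. It prints the commands by default, and `TC=tc` as root would apply them.

```shell
# Dry run by default: TC only prints the commands. Set TC=tc and run as root to apply.
TC=${TC:-echo tc}
DEV=${DEV:-eth1}

# Hypothetical helper: install or update the root netem qdisc with the given options.
# "replace" is used so the variants can be tried one after another.
netem_set() { $TC qdisc replace dev "$DEV" root netem "$@"; }

netem_set delay 100ms 20ms              # 100ms delay with about 20ms of random variation
netem_set loss 1%                       # drop 1% of packets at random
netem_set duplicate 1%                  # send 1% of packets twice
netem_set delay 100ms 20ms loss 0.5%    # options can be combined
```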
Conclusion
Traffic shaping is a valuable tool for testing the influence of the network on any application that depends on network traffic, such as the Yugabyte database, but also sharded databases and databases with replication set up. Without it, such tests would only be possible by physically deploying nodes around the world. The Linux tc utility allows you to test this on your own laptop.
If you want to take this further, and more closely simulate high latencies for a node in a (local) cluster of nodes, you should set a delay on sending from the local nodes to one or more nodes deemed "far away", and vice versa on sending from the "remote" nodes to the local ones.
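A sketch of what that could look like for the three nodes used in this post. It only prints the commands to run on each node. The helper name `plan_node` is made up, and it assumes every node uses eth1. Since both directions are delayed, each side gets half of the intended round-trip time.

```shell
# Hypothetical planner: print the commands for one node that delay its traffic
# to the nodes on the "other side".
plan_node() {
    node=$1; dev=$2; delay=$3; shift 3
    echo "# run on $node"
    echo "tc qdisc add dev $dev root handle 1: prio"
    echo "tc qdisc add dev $dev parent 1:3 handle 30: netem delay $delay"
    for ip in "$@"; do
        echo "tc filter add dev $dev protocol ip parent 1:0 priority 3 u32 match ip dst $ip flowid 1:3"
    done
}

# Assumed layout: .80 and .81 are local, .82 is "far away".
# 50ms in each direction adds up to roughly 100ms round trip.
plan_node 192.168.66.82 eth1 50ms 192.168.66.80 192.168.66.81
plan_node 192.168.66.80 eth1 50ms 192.168.66.82
plan_node 192.168.66.81 eth1 50ms 192.168.66.82
```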
Top comments (3)
Your example has one rather fatal error in using netem, in that as you increase delay, you also need to increase the packet limit to be able to store the packets correctly in that virtual length. There are many other ways to get netem wrong, but if you want to observe something more realistic at that delay, try 10000 packets or more at that limit. This long post needs some updating...
bufferbloat.net/projects/codel/wik...
While I'm pleased that modern-day systems like yours now default to fq_codel, and for limited tests "TSQ" helps get something closer to an accurate result - using just "delay" as a parameter to netem will lead to incorrect conclusions.
Another way to check your work is with tc -s qdisc show. If you are getting packet drops from netem, your test is not measuring what you thought it was.
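A small sketch of that check: it sums the "dropped" counters that `tc -s qdisc show` prints on its statistics lines. The function name `dropped_total` is made up, and the parsing assumes the usual `(dropped N, overlimits ...)` layout of those lines.

```shell
# Sum the "dropped N" counters from `tc -s qdisc show` output read on stdin.
dropped_total() {
    sed -n 's/.*(dropped \([0-9][0-9]*\),.*/\1/p' |
        awk '{ s += $1 } END { print s + 0 }'
}

# Usage (needs tc and an existing qdisc on the device):
#   tc -s qdisc show dev eth1 | dropped_total
```

Anything other than 0 after a test run suggests that packets were dropped on the sending side rather than only delayed.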
Thank you Dave for your comments. Like always, the devil is in the details, and this is no exception!
If I read your link correctly, the problem is netem will build up packets when delaying execution, and it will drop any packet that is added when the number of packets reaches its default limit, which is 1000 packets.
If that is happening, and you expect such a setting to just delay packets and not drop them, obviously the test is flawed, because something else is happening than you think is happening.
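To put a number on this: netem has to hold every packet for the duration of the delay, so the queue needs room for roughly rate × delay worth of packets. A rough sketch, assuming full-size 1500-byte packets and a link running at line rate (both are simplifying assumptions, and the function name `netem_limit` is made up):

```shell
# Packets queued during the delay = rate (bit/s) / 8 / packet size (bytes) * delay (s).
# Arguments: link rate in Mbit/s, delay in ms, optional packet size in bytes (default 1500).
netem_limit() {
    awk -v mbit="$1" -v delay_ms="$2" -v pkt="${3:-1500}" 'BEGIN {
        n = mbit * 1000000 / 8 / pkt * delay_ms / 1000
        printf "%d\n", (n == int(n)) ? n : int(n) + 1   # round up to whole packets
    }'
}

netem_limit 1000 100    # 1 Gbit/s with a 100ms delay
```

For 1 Gbit/s and 100ms this comes to a bit over 8,000 packets, well above the default limit of 1000 and in line with the suggested limit of 10000 packets or more. The limit can be given on the netem line itself, for example `tc qdisc add dev eth1 root netem delay 100ms limit 10000`.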
Would you deem a test to be correct if tc -s qdisc show does not show drops?
In principle it's logical that when delaying any form of packet delivery, the packets need to be stored while they wait out the delay. A network device and network stack have limits to what they can do, and will drop any packet that they cannot store because they have run out of buffers.
(in the past I've seen a fair share of systems where simply too much traffic was sent over a (network) cluster interconnect, leading to packet drops because the network buffer was full)
The link does mention that netem with qdisc doesn't work, and requires a separate machine; how does the separate machine need to be configured to perform predictable network delay?