DEV Community

Franck Pachot for YugabyteDB Distributed PostgreSQL Database


Simulate Clock Skew in Docker Container

In real deployments without atomic clocks, clocks synchronized by NTP can still drift, and servers in a distributed system can show a clock skew of hundreds of milliseconds. A simple way to test this in a Docker lab is to fake the clock_gettime function. Here is an example with a 2-node RF1 YugabyteDB cluster (a PostgreSQL-compatible Distributed SQL database).

I create a yb network and start the first node, yb1, in the background:

docker network create yb
docker run -d  --rm --network yb --hostname yb1 -p 7000:7000 yugabytedb/yugabyte yugabyted start --background=false --tserver_flags="TEST_docdb_log_write_batches=true"


I start a shell in a second node:

docker run -it --rm --network yb --hostname yb2              yugabytedb/yugabyte bash


In this container, I wait until yb1 is up, then start yb2, which joins yb1:

until postgres/bin/pg_isready -h yb1.yb ; do sleep 1 ; done
yugabyted start --join yb1.yb --tserver_flags="TEST_docdb_log_write_batches=true"


Because both containers run on the same host, they show the same Physical Time at http://localhost:7000/tablet-server-clocks.

I install gcc and compile a fake_clock_gettime.so that overrides clock_gettime, calls the original one, and subtracts 499 milliseconds from its result:

cat > fake_clock_gettime.c <<'C'
#define _GNU_SOURCE
#include <dlfcn.h>
#include <time.h>

// overrides clock_gettime: calls the real one, then subtracts the skew
int clock_gettime(clockid_t clk_id, struct timespec *tp)
{
  static const long skew_ns = 499 * 1000000L;  // 499 milliseconds
  static int (*origin_clock_gettime)(clockid_t, struct timespec *);
  int ret;  // local, not static, so concurrent threads do not share it
  // look up the real clock_gettime once and call it
  if (!origin_clock_gettime) {
    origin_clock_gettime = (int (*)(clockid_t, struct timespec *)) dlsym(RTLD_NEXT, "clock_gettime");
  }
  ret = origin_clock_gettime(clk_id, tp);
  if (ret != 0) return ret;  // on error, leave tp untouched
  // subtract the clock skew, borrowing one second when needed
  if (tp->tv_nsec >= skew_ns) {
    tp->tv_nsec -= skew_ns;
  } else {
    tp->tv_sec -= 1;
    tp->tv_nsec += 1000000000L - skew_ns;
  }
  return ret;
}
C

dnf install -y gcc

gcc -o fake_clock_gettime.so -fPIC -shared fake_clock_gettime.c -ldl


This library can be loaded with LD_PRELOAD, and I test it by calling date:

[root@yb2 yugabyte]# date +"%T:%N" ; LD_PRELOAD=$PWD/fake_clock_gettime.so date +"%T:%N" ; date +"%T:%N"
21:31:44:015385334
21:31:43:518894559
21:31:44:020271039
[root@yb2 yugabyte]# date +"%T:%N" ; LD_PRELOAD=$PWD/fake_clock_gettime.so date +"%T:%N" ; date +"%T:%N"
21:31:45:955772746
21:31:45:459189786
21:31:45:960576587


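The date called with the library shows a time about 497 milliseconds earlier than the call just before it: the 499 milliseconds of skew, minus the couple of milliseconds that elapsed between the two commands.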
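The borrow from tv_sec is the only subtle part of the override, so it can help to check it in isolation. The sketch below factors the subtraction into a helper called `subtract_skew`. That name is hypothetical and not part of the library built earlier, and the helper assumes a non-negative skew. Unlike the original, it also accepts skews of one second or more.

```c
#include <time.h>

/* Hypothetical helper: subtract skew_ms (assumed >= 0) milliseconds from *tp,
   keeping tv_nsec in the range [0, 999999999]. */
void subtract_skew(struct timespec *tp, long skew_ms)
{
    long skew_ns = (skew_ms % 1000) * 1000000L;  /* sub-second part of the skew */

    tp->tv_sec -= skew_ms / 1000;                /* whole seconds of the skew */
    if (tp->tv_nsec >= skew_ns) {
        tp->tv_nsec -= skew_ns;                  /* no borrow needed */
    } else {
        tp->tv_sec -= 1;                         /* borrow one second */
        tp->tv_nsec += 1000000000L - skew_ns;
    }
}
```

Applied to the first reading of the test, 44.015385334 seconds, this gives 43.516385334. The preloaded date printed 43.518894559 because about 2.5 milliseconds elapsed between the two commands.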

I restart YugabyteDB on yb2 with this hack:

yugabyted stop
LD_PRELOAD=$PWD/fake_clock_gettime.so yugabyted start


The same page now shows the clock skew in both the Physical Time and the Hybrid Time of yb2.

I run a workload that involves tablets on both nodes, so that the messages they exchange synchronize the logical clock (Lamport-style):

/home/yugabyte/postgres/bin/ysql_bench -i -h $(hostname) -s 10


After the nodes have exchanged messages, the Physical Time still shows the clock skew, but the Logical Time is synchronized.
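To illustrate why the logical component converges while the physical clocks stay apart, here is a minimal sketch of a generic hybrid logical clock update rule. It is not YugabyteDB's implementation: the `hybrid_time` struct and the `hlc_now` and `hlc_receive` functions are hypothetical names, and the microsecond values used with them are made up.

```c
#include <stdint.h>

/* Hypothetical hybrid timestamp: physical microseconds plus a logical counter. */
typedef struct {
    uint64_t physical_us;
    uint32_t logical;
} hybrid_time;

/* Timestamp a local event or an outgoing message. */
hybrid_time hlc_now(hybrid_time *last, uint64_t wall_us)
{
    if (wall_us > last->physical_us) {
        last->physical_us = wall_us;  /* wall clock moved ahead: follow it */
        last->logical = 0;
    } else {
        last->logical++;              /* wall clock is behind: only tick the counter */
    }
    return *last;
}

/* Merge the timestamp carried by an incoming message. */
hybrid_time hlc_receive(hybrid_time *last, hybrid_time msg, uint64_t wall_us)
{
    uint64_t prev = last->physical_us;
    uint64_t top = prev;              /* largest of local, message, and wall time */

    if (msg.physical_us > top) top = msg.physical_us;
    if (wall_us > top) top = wall_us;

    if (top == prev && top == msg.physical_us) {
        uint32_t c = last->logical > msg.logical ? last->logical : msg.logical;
        last->logical = c + 1;
    } else if (top == prev) {
        last->logical++;
    } else if (top == msg.physical_us) {
        last->logical = msg.logical + 1;  /* adopt the sender's time, one tick later */
    } else {
        last->logical = 0;                /* local wall clock is ahead of both */
    }
    last->physical_us = top;
    return *last;
}
```

With a receiver whose wall clock is 499 milliseconds behind, a message stamped at 1,000,000 microseconds lifts the receiver's timestamp to that physical value. Later local events on the receiver only increment the logical counter until its own wall clock catches up, so its timestamps keep ordering after the sender's despite the skew.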

If you are curious, here is more information about clock synchronisation in distributed databases: https://www.yugabyte.com/blog/evolving-clock-sync-for-distributed-databases/
