DEV Community

Cover image for InfluxDB: Optimize your Data with Data Retention Policies and Continuous Queries
Sebastian
Sebastian

Posted on

InfluxDB: Optimize your Data with Data Retention Policies and Continuous Queries

InfluxDB is a time series database that you will find in the context of IOT projects. What makes these databases time-based is the simple fact that every record is inserted with a fixed timestamp. The rest of the data are either tags - discrete values that can be used for indexing - or values, continuous data like temperature, humidity, energy consumption and everything else that sensors detect and record. InfluxDB is easy to setup as a Docker container that comes preconfigured with the required binary programs and a working default configuration. It’s very easy to start collecting data from your sensors or telemetry data from your IOT devices - see my earlier article about monitoring a Raspberry Pi. Usually, you visualize and consume this data with additional external tools, for example Grafana.

But there is more. InfluxDB provides two query languages to select and manipulate the data: InfluxQL, a language that resembles SQL, and Flux, a powerful scripting language. This article introduces the InfluxQL language. You will learn how to open a database shell, see the basis commands to list data, and finally learn about retention policies and continuous queries, two powerful ways to trim large datasets.

The technical context of this article is InfluxDB v1.11.8, but it should work with newer versions as well.

This article originally appeared at my blog admantium.com

InfluxDB Shell

To start, connect to the computer or exec into the Docker container that runs Influx, and then open the shell by executing the following command:

> influx

Connected to http://localhost:8086 version 1.8.9
InfluxDB shell version: 1.8.9
Enter fullscreen mode Exit fullscreen mode

The shell loads and we can start writing queries. Let’s start simple and see the list of all databases.


> SHOW databases

name: databases
name
----
_internal
test
telegraf
telegraf-macbook
>
Enter fullscreen mode Exit fullscreen mode

The database telegraf contains the telemetry data from my Raspberry Pi. Let’s see which measurements, akin to a database table, are included in it.

> USE telegraf

Using database telegraf

> SHOW MEASUREMENTS

name: measurements
name
----
alive
cpu
cpu_temperatures
disk
environment
mem
system
>
Enter fullscreen mode Exit fullscreen mode

To see the individual data inside a measurement, and limit the number of records being shown.

> select * from cpu_temperatures limit 10

name: cpu_temperatures
time                node        temperature
----                ----        -----------
1629979793360140718 raspberrypi 51.5
1629981542154078749 raspberrypi 52.6
1629981601660331504 raspberrypi 53.7
1629981662117939950 raspberrypi 53.2
1629981721628356266 raspberrypi 53.7
1629981782141150209 raspberrypi 53.7
1629981841613292250 raspberrypi 53.7
1629981902069031601 raspberrypi 53.7
1629981961562980036 raspberrypi 53.7
1629982022056184813 raspberrypi 53.7
Enter fullscreen mode Exit fullscreen mode

These are the basics of the InfluxQL language. Let’s continue to learn about the powerful concept of data retention.

Data Retention

A data retention defines how long data is kept in your database. Normally you store data one way, indefinitely. And over time, as the storage grows, so grows the query execution time. Also, keeping data indefinitely might not be required or desired, or is telemetry data like the CPU load of your Raspberry Pi from July 2021 still valuable?

When you start using influxdb, a default retention policy called autogen is generated. Let’s see it.

> show retention policies on telegraf

name    duration shardGroupDuration replicaN default
----    -------- ------------------ -------- -------
autogen 0s       168h0m0s           1        true
Enter fullscreen mode Exit fullscreen mode

This policy has an indefinite duration. To create a new retention policy that keeps data for 4 weeks, execute this:

CREATE RETENTION POLICY "4w" ON "telegraf" DURATION 4w REPLICATION 1 SHARD DURATION 1w
Enter fullscreen mode Exit fullscreen mode

Now you have two policies:

> show retention policies on telegraf
name    duration shardGroupDuration replicaN default
----    -------- ------------------ -------- -------
autogen 0s       168h0m0s           1        true
4w      672h0m0s 168h0m0s           1        false
>
Enter fullscreen mode Exit fullscreen mode

If you want to delete old data, you can either explicitly add new data to the database by using the new retention policy, or set a new default retention policy. I would choose the latter. But before we do this, InfluxDB also offers a handy way to aggregate fine-grained time-based data into more coarse-grained datasets. For example, the telemetry data collected every minute can be compressed to about 1-hour timeslots. Let’s see how to do this.

Down Sampling Data with Continuous Queries

Down sampling is the process of reducing the timestamp granularity of your data. You reduce the data points to a bigger interval, e.g. from one measurement each minute to one measurement each hour. In this way, you still keep that data, but reduce it to the time intervals that matter to you.

The typical process of down sampling is to first check the data you want to compact: Identify the most important fields, determine their values, and select an appropriate aggregation or selectors to apply. Then, create new tables that contain these values and execute the query.

Let’s see an example for this process by down sampling the telegraf data about CPU usage. The complete data set has these values:

select * from cpu LIMIT 10
name: cpu
time                cpu       host   usage_idle       usage_iowait        usage_irq usage_nice usage_softirq        usage_steal usage_system       usage_user
----                ---       ----  -----------       ----------------    --------- ---------  --------------       ----------- ------------       -----------
1638993720000000000 cpu-total nexus 88.48673160825874 0.5542492442081621  0         0          0.03359086328513486  0           7.285018474984913  3.640409808547399
1638993900000000000 cpu-total nexus 88.4373427276703  0.5494044623344672  0         0          0.020969635967077953 0           7.356148297215396  3.6361348766956656
1638993960000000000 cpu-total nexus 85.5886417890801  0.3434267286484769  0         0          0.0753863550696145   0           9.837919336614593  4.15462579048627
Enter fullscreen mode Exit fullscreen mode

The interesting values are usage_idle, usage_system and usage_user. These numerical values are averaged with the mean aggregator.

Let’s create a new database first.

CREATE DATABASE telegraf_historic
Enter fullscreen mode Exit fullscreen mode

The following query down samples this data to 24hour intervals.

SELECT  mean("usage_system") AS usage_system,
        mean("usage_user") AS "usage_user",
        mean("usage_idle") AS usage_idle
INTO "telegraf_historic"."autogen"."cpu"
FROM cpu
GROUP BY time(4h), *
Enter fullscreen mode Exit fullscreen mode

Now this new table contains the values exactly as wanted.

> select * from cpu Limit 3

time                cpu       host         usage_idle        usage_system       usage_user
----                ---       ----         ----------        ------------       ----------
1629903600000000000 cpu-total 149d0f0d1652 96.29634109411532 1.7514928790992803 1.8512490785491578
1629907200000000000 cpu-total 149d0f0d1652 91.8826191680272  4.19285087804316   2.0645028164087944
1629910800000000000 cpu-total 149d0f0d1652 88.40234061999482 6.022559852282428  2.0730418859291064
Enter fullscreen mode Exit fullscreen mode

One big caveat: As you see, the tag host is transformed to a number format, not a string. Astonishingly, there is no way in InfluxDB to change the tag values with a select INTO values, as documented in this issue from 2015, or a way to rename tags of existing data as discussed in this issue. The workarounds are to use a client software or to use the Flux language, but both is outside the scope of this article.

Combining Data Retention and Continuous Queries

Now we are ready to form the solution for effective data management.

The queries used to average data are executed at specific defined time points with continuous queries. Basically, you wrap the working query into another one, specify the target database and the timespan after which it should trigger. To define the above shown query as a continuous query, we can execute the following statement:

CREATE CONTINUOUS QUERY cq_cpu_temperatures
ON telegraf
RESAMPLE EVERY 24h
BEGIN
  SELECT  mean("usage_system") AS usage_system,
          mean("usage_user") AS "usage_user",
          mean("usage_idle") AS usage_idle
  INTO "telegraf_historic"."autogen"."cpu"
  FROM cpu
  GROUP BY time(4h), *
END
Enter fullscreen mode Exit fullscreen mode

We repeat this for all other tables. Once completed, see all defined continuous queries with the following statement:

SHOW CONTINOUS QUERIES

name: telegraf
name                query
----                -----
cq_cpu              CREATE CONTINUOUS QUERY cq_cpu ON telegraf RESAMPLE EVERY 1d BEGIN SELECT mean(usage_system) AS usage_system, mean(usage_user) AS usage_user, mean(usage_idle) AS usage_idle INTO telegraf_historic.autogen.cpu FROM telegraf.autogen.cpu GROUP BY time(1h), * END
cq_mem              CREATE CONTINUOUS QUERY cq_mem ON telegraf RESAMPLE EVERY 1d BEGIN SELECT mean(used) AS used, mean(available) AS available INTO telegraf_historic.autogen.mem FROM telegraf.autogen.mem GROUP BY time(1h), *END
cq_system           CREATE CONTINUOUS QUERY cq_system ON telegraf RESAMPLE EVERY 1d BEGIN SELECT mean(load5) AS load5, max(uptime) AS uptime INTO telegraf_historic.autogen.system FROM telegraf.autogen.system GROUP BY time(1h),* END
cq_cpu_temperatures CREATE CONTINUOUS QUERY cq_cpu_temperatures ON telegraf RESAMPLE EVERY 1d BEGIN SELECT mean(temperature) AS temperature, mode(node) AS node INTO telegraf_historic.autogen.cpu_temperature FROM telegraf.autogen.cpu_temperatures GROUP BY time(1h), * END
Enter fullscreen mode Exit fullscreen mode

Once in place, wait for the timespan and check that the data is written correctly. And then, we simple change the default autogen retention policy to the newly define one.

SHOW RETENTION POLICIES "4w" ON telegraf DEFAULT

SHOW RETENTION POLICIES
name    duration shardGroupDuration replicaN default
----    -------- ------------------ -------- -------
autogen 0s       168h0m0s           1        false
4w      672h0m0s 168h0m0s           1        true
Enter fullscreen mode Exit fullscreen mode

Conclusion

The time series database InfluxDB is a helpful tool, quick to setup for gathering data continuously. Over time, this data will grow, and at the same time, its importance diminishes - for example when you recorded telemetry data from your nodes. Instead of clinging to this data forever, in a very fine-grained time period, consider to down sample the data to coarser timeframes. This article showed you how to combine data retention policies, rules that govern how long is data kept, and continuous queries for down sampling queries, which trigger at defined time and average the data. Combining both provides you with an effective way to shape the date to only keep fields and timeframes that are interesting to you.

Top comments (0)