Leon Adato for New Relic

Posted on Mar 9, 2023 • Originally published at adatosystems.com

Mind the Gap

#monitoring #observability #nrflex #pihole

(This post originally appeared on AdatoSystems.com)

Recently I posted a tutorial on how I monitored my Raspberry-pi based "pihole" server using New Relic and Flex.

Like many tutorials, what you read there is the end-result, a narrative of the perfect execution of a well-conceived idea where all steps and variables are foreseen beforehand.

This, my friends, is not how a day in I.T. typically goes.

The truth is that I had a lot of false starts and a couple of sleepless nights trying to get everything put together just the way I wanted it. Aspects of both the pihole itself and New Relic didn't work the way I initially expected, and I had to find work-arounds. In the end, if it weren't for the help of several colleagues - including Zameer Fouzan, Kav Pather, Haihong Ren,

Before you call for your fainting couch and smelling salts, shocked as I know you are to hear me imply that New Relic isn't pure perfection and elegant execution, I want to be clear: While no tool is perfect or does all things for all people, the issues I ran into were completely normal and the ultimate solutions were both simple to understand and easy to execute.

What I initially struggled with was trying to make New Relic operate based on my biases of how I thought things ought to work, rather than understanding and accepting how they did work.

The problem in a nutshell

But I'm being un-necessarily vague. Let me get to the specifics:

On the pihole, you can query the API for data like this:
http://pi.hole/admin/api.php?summary

And it will give you output like this:

{"domains_being_blocked":"177,888","dns_queries_today":"41,240","ads_blocked_today":"2,802","ads_percentage_today":"6.8","unique_domains":"8,001","queries_forwarded":"18,912","queries_cached":"19,266","clients_ever_seen":"34","unique_clients":"28","dns_queries_all_types":"41,240","reply_UNKNOWN":"258","reply_NODATA":"1,155","reply_NXDOMAIN":"11,989","reply_CNAME":"12,296","reply_IP":"15,436","reply_DOMAIN":"48","reply_RRNAME":"0","reply_SERVFAIL":"2","reply_REFUSED":"0","reply_NOTIMP":"0","reply_OTHER":"0","reply_DNSSEC":"0","reply_NONE":"0","reply_BLOB":"56","dns_queries_all_replies":"41,240","privacy_level":"0","status":"enabled","gravity_last_updated":{"file_exists":true,"absolute":1676309149,"relative":{"days":4,"hours":0,"minutes":27}}}

While it may not be immediate obvious, let me draw your attention to the issue:
"domains_being_blocked":"177,888"

Being surrounded by quotes, New Relic will treat the number "177,888" as a string (text) rather than as a number.

NRQL to the rescue!

My first attempt to fix this leveraged the obvious (and ultimately incomplete) approach of changing the type of the input using a function. In this case numeric() is purpose-built for to do just that - take data that's "typed" as a string and treat it as a number. Easy-peasy right?

If you've worked in IT for more than 15 minutes, you know the answer is, "of course not". This technique only worked for numbers less than 1,000

The reason for this is that numeric() can't handle formatted numbers - meaning strings with symbols for currency, percentage, or - to my chagrin - commas.

At that point, my colleague and fellow DevRel Advocate Zameer Fouzan came to the rescue. He helped me leverage one of the newer capabilities in NRQL - the ability to parse out sub-elements in a table.

The feature is named aparse() which stands for "anchor parse" You can find more information about it here, but in brief it lets you name a field, describe how you want to separate it, and then rename the separated parts. Like this:

aparse(unique_domains ,'*,*' ) As (n1,n2)

In plain English this says, "take the data in the unique_domains field, put everything before the comma into one variable (called n1), and everything after the comma into another variable (called n2)."

Now I have the two halves of my number and I can recombine them:

numeric(concat(n1,n2))

The result looks like this:

Which might have been the end of my problems, except if the numbers go into the millions.

A more FLEX-able approach

The penultimate step for resolving this issue was to take it back to the source - in this case the New Relic Flex integration, to see if I couldn't reformat the numbers before sending them into New Relic. Which is absolutely possible.

Within the Flex YAML file, there are a lot of possibilities for parsing, re-arranging, and reformatting the data prior to passing it into the data store. One of the most powerful of these is jq. You can find the New Relic documentation on jq here. But for a deeper dive into the utility itself, you should go to the source.

I can't describe how jq works any better than the author:

"A jq program is a "filter": it takes an input, and produces an output. There are a lot of builtin filters for extracting a particular field of an object, or converting a number to a string, or various other standard tasks."

Being a little more specific, jq will take JSON input, and for every key that matches your search parameters, it will output the value of that key, and reformat it in the process if you tell it to.

Therefore, I could create the most basic search filter like this:
jq > .domains_being_blocked

...and it would output "177,888". But that wouldn't solve my issue. HOWEVER, using additional filters you can split the output by it's comma, join the two parts back together, set the output as a number, and come out the other side with a beautiful (and correct) output set.

But I don't want you to think this solution occurred to me all on it's own, or that I was able to slap it all together with minimal effort. This was all as new to me as it may be to you, and what you're reading below comes from the amazing and generous minds of Kav Pather (who basically invented the Flex integration) and Senior Solutions Architect Haihong Ren.

Unwinding the fullness of the jq string below is far beyond the scope of this blog. But Kav and Haihong have helped me to understand enough to summarize it as:

Pull out the "status:" key and the entire "gravity_last_updated" section and keep it as-is.
For everything else,
- split the value on the comma
- put the component parts back together (without the comma)
- output it as a number, rather than a string
Finally, output everything (status, gravity_last_updated, and all the other values) as a single data block which Flex will pack up and send to New Relic.

The full YAML file looks like this::

integrations:
  - name: nri-flex
    config:
      name: pihole_test
      apis:
        - name: pihole_test
          url: http://pi.hole/admin/api.php?summary&auth=a11049ddbf38fc1b678f4c4b17b87999a35a1d56617a9e2dcc36f1cc176ab7ce
          jq: >
            .[]|with_entries( select(.key | test("^gravity_last_updated|status|api"))) as $xx | with_entries( select(.key | test("^gravity_last_updated|status|api")|not)) |to_entries|map({(.key):(.value|split(",")|join("")|tonumber)})|add+$xx
          headers:
            accept: application/json
      remove_keys:
        - timestamp

Which results in data that looks like this:

"domains_being_blocked": 182113
"dns_queries_today": 41258
"ads_blocked_today": 3152
(and so on)

This is a wonderfully effective solution to the entire issue!

A summary, with a plot twist

To sum everything up:

We learned how to convert strings to numbers in NRQL using numeric().
We learned how to split strings in NRQL based on a delimiter (or even text) using aparse().
We learned how to put the split parts back together in NRQL using concat().
We learned how to use jq in Flex to perform fairly complex string and data manipulations.
But most importantly, we learned that asking colleagues for help isn't a sign of weakness, it's a sign of maturity and that, as author and speaker Ken Blanchard said, "None of us is as smart as all of us."

But... after all of my searching, all of my queries to coworkers, all of my testing and troubleshooting; and after all the associated tears, rage, and frustration - I discovered I didn't need any of it.

Remember what I said at the start of this blog?

What I initially struggled with was trying to make New Relic operate based on my biases of how I thought things ought to work, rather than understanding and accepting how they did work.

In my race to force New Relic to do all the heavy lifting; in my egocentric need to force New Relic to accept any type of data I threw at it, I ignored the simplest and most elegent solution of all: Pihole has an option to output raw, unformatted data.

In the place of the standard URL:
http://pi.hole/admin/api.php?summary

If instead I had used:
http://pi.hole/admin/api.php?summaryRaw

...all of the numbers come out in a way that New Relic can take and use without any additional manipulation.

{
  "domains_being_blocked": 182113,
  "dns_queries_today": 42825,
  "ads_blocked_today": 1846,
  "ads_percentage_today": 4.310566,
  "unique_domains": 9228,
  "queries_forwarded": 25224,
  "queries_cached": 15547,
  "clients_ever_seen": 36,
  (and so on)

Which simply goes to show that the solutions to our problems are out there, as long as we have the patience and perseverance to find it.

DEV Community

Mind the Gap

The problem in a nutshell

NRQL to the rescue!

A more FLEX-able approach

A summary, with a plot twist

Top comments (0)

Read next

Prometheus 3.0: A Quantum Leap in Monitoring

AWS CloudWatch Observability Solutions: Game-Changer or Just a Glossy Wrapper? Honest First Impressions!

Get Started with NAGIOS: Basic to Intermediate

End-to-End Observability Project - Zero-to-Hero