Wesley Skeen

Posted on Mar 13, 2023

Detecting PII leakage in logs

#grafana

First, I want to mention that I collaborated on this project and article with @mereta.

Before we begin, I want to point you to the post I published on setting up Grafana locally using Docker. It has simple steps for getting your environment set up so you can experiment.

Setup Grafana, Jaeger & Zipkin locally

Wesley Skeen ・ Nov 21 '22

#grafana #dotnet

Once you have this running, open the `promtail.yml` file. This is the file we are going to change so that Promtail applies our PII detection logic.

Pipeline Stages

We are going to add a `pipeline_stages` section to this file.

Simply put, each log line that Promtail processes goes through these stages in order. Stages can perform a number of actions, which you can read about in detail in the Grafana docs, but here I will go through stages to:

Detect PII
Validate the result of the detection
Create a label to hold the result

Detect PII

As part of the stages section, we add a `regex` stage:

- regex:
    expression: '(?P<sensitive_email>([a-zA-Z0-9._-]+@[a-zA-Z0-9._-]+\.[a-zA-Z0-9_-]+))'

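The `expression` is built from 2 parts, following the pattern
(?P<{0}>({1}))

0 - The name of the capture group. The regex stage stores the matched text in Promtail's extracted data under this name, so it acts as the variable holding the result of the match
1 - The actual regex applied to the log content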
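Promtail is written in Go, and its regex stage uses RE2 syntax, the same syntax as Go's `regexp` package. The sketch below is a simplified model, not Promtail's source: the hypothetical `extract` helper runs the expression above against a log line and, like the regex stage, stores each named capture group under its name, adding nothing when there is no match.

```go
package main

import (
	"fmt"
	"regexp"
)

// emailPattern is the expression from the regex stage above, unchanged.
var emailPattern = regexp.MustCompile(`(?P<sensitive_email>([a-zA-Z0-9._-]+@[a-zA-Z0-9._-]+\.[a-zA-Z0-9_-]+))`)

// extract is a hypothetical helper: every named capture group becomes a key
// in the returned map, and nothing is added when the expression does not match.
func extract(re *regexp.Regexp, line string) map[string]string {
	extracted := map[string]string{}
	match := re.FindStringSubmatch(line)
	if match == nil {
		return extracted // no match: the key is simply absent
	}
	for i, name := range re.SubexpNames() {
		// Index 0 is the whole match; unnamed groups (like the inner one) have no name.
		if i != 0 && name != "" {
			extracted[name] = match[i]
		}
	}
	return extracted
}

func main() {
	fmt.Println(extract(emailPattern, "My email is JP@mail.com")) // map[sensitive_email:JP@mail.com]
	fmt.Println(extract(emailPattern, "My email is ***"))         // map[]
}
```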

Validate the result of the detection

Next, we add a `template` stage:

- template:
    source: sensitive_email
    template: '{{ not (empty .Value) }}'

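This stage takes the value that the regex stage stored in `sensitive_email`, makes it available to the template as `.Value`, and writes the template's output back to `sensitive_email` (creating the entry if the regex did not match). Here the template outputs `true` when a match was found and `false` otherwise.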

Log line	Value extracted into `sensitive_email`	Result of `{{ not (empty .Value) }}`	New value of `sensitive_email`
My email is JP@mail.com	JP@mail.com	true	true
My email is ***	(no match)	false	false
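The template stage uses Go's `text/template` syntax, so the table above can be reproduced with the standard library. `empty` is not a `text/template` builtin (it comes from the Sprig function library), so this sketch registers a hypothetical `isEmpty` stand-in that only handles a missing value or an empty string. The `render` helper is also illustrative, and an empty string stands in for "the regex did not match".

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// isEmpty is a hypothetical stand-in for Sprig's `empty`: it only treats a
// missing value (nil) or an empty string as empty.
func isEmpty(v any) bool {
	s, ok := v.(string)
	return v == nil || (ok && s == "")
}

// check is the template from the stage above; `not` is a text/template builtin.
var check = template.Must(template.New("check").
	Funcs(template.FuncMap{"empty": isEmpty}).
	Parse(`{{ not (empty .Value) }}`))

// render evaluates the template with .Value set to the extracted match.
func render(value string) string {
	var sb strings.Builder
	if err := check.Execute(&sb, map[string]string{"Value": value}); err != nil {
		panic(err)
	}
	return sb.String()
}

func main() {
	fmt.Println(render("JP@mail.com")) // true
	fmt.Println(render(""))            // false
}
```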

Create a label to hold the result

For this, all we have to do is add a `labels` stage:

- labels:
    sensitive_email:

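This adds a `sensitive_email` label to the log line and sets its value to the extracted `sensitive_email` value. Because nothing follows the colon, Promtail reads the extracted value with the same name as the label.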
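The labels stage copies values from the extracted data into the log line's label set. Below is a simplified model of that step (not Promtail's code), using a hypothetical `applyLabels` helper. It includes the rule that an empty value after the colon means "read the extracted key with the same name as the label", and, as an assumption of this model, it skips labels whose source was never extracted.

```go
package main

import "fmt"

// applyLabels is a hypothetical model of a labels stage. config maps each
// label name to the extracted key to read; "" means "same name as the label".
func applyLabels(config, extracted, labels map[string]string) {
	for label, source := range config {
		if source == "" {
			source = label
		}
		// This model skips labels whose source key was never extracted.
		if value, ok := extracted[source]; ok {
			labels[label] = value
		}
	}
}

func main() {
	extracted := map[string]string{"sensitive_email": "true"} // output of the template stage
	labels := map[string]string{"app": "api"}                  // labels already on the stream
	applyLabels(map[string]string{"sensitive_email": ""}, extracted, labels)
	fmt.Println(labels) // map[app:api sensitive_email:true]
}
```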

Example of it working

I added a log statement to my API:

_logger.LogInformation($"my data is JP@mail.com");

Querying the logs in Loki shows the result: the log line contains `my data is JP@mail.com`, and the value of the `sensitive_email` label is `true`.

New content of `promtail.yml`

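With the above `pipeline_stages` added, this file should look like the following. I have also added a second example that detects credit card numbers. The stages sit inside a `match` stage, so they only run on log lines whose labels match the selector `{app="api"}`; the `app: 'api'` label comes from `static_configs`.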

server:
  http_listen_port: 9080
  grpc_listen_port: 0

positions:
  filename: /tmp/positions.yaml

clients:
  - url: http://loki:3100/loki/api/v1/push

scrape_configs:
  - job_name: system
    pipeline_stages:      
      - match:
          pipeline_name: "security"
          selector: '{app="api"}'
          stages:

            - regex:
                expression: '(?P<sensitive_creditcard>(?:\d[ -]*?){13,16})'
            - regex:
                expression: '(?P<sensitive_email>([a-zA-Z0-9._-]+@[a-zA-Z0-9._-]+\.[a-zA-Z0-9_-]+))'

            - template:
                source: sensitive_creditcard
                template: '{{ not (empty .Value) }}'
            - template:
                source: sensitive_email
                template: '{{ not (empty .Value) }}'            

            - labels:
                sensitive_creditcard:
                sensitive_email:

    static_configs:
      - targets:
          - localhost
        labels:
          job: varlogs
          __path__: /var/log/*local.log
          app: 'api'
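To check what the two expressions in this file actually flag, the sketch below runs both against a few sample lines with Go's `regexp` package (RE2, as in Promtail). The hypothetical `detect` helper reports `true`/`false` per named group, mirroring the output of the `{{ not (empty .Value) }}` templates; the sample lines are made up for illustration. Note the last one: the credit card expression matches any run of 13 to 16 digits (optionally separated by spaces or dashes), so long numeric IDs or compact timestamps are flagged too.

```go
package main

import (
	"fmt"
	"regexp"
)

// The two expressions from the security pipeline above.
var patterns = []*regexp.Regexp{
	regexp.MustCompile(`(?P<sensitive_creditcard>(?:\d[ -]*?){13,16})`),
	regexp.MustCompile(`(?P<sensitive_email>([a-zA-Z0-9._-]+@[a-zA-Z0-9._-]+\.[a-zA-Z0-9_-]+))`),
}

// detect is a hypothetical helper: for each expression it records whether the
// outer named group matched, as the string "true" or "false".
func detect(line string) map[string]string {
	result := map[string]string{}
	for _, re := range patterns {
		name := re.SubexpNames()[1] // group 1 is the outer named group
		result[name] = fmt.Sprint(re.MatchString(line))
	}
	return result
}

func main() {
	fmt.Println(detect("card 4111 1111 1111 1111 used"))
	// map[sensitive_creditcard:true sensitive_email:false]
	fmt.Println(detect("request took 12 ms"))
	// map[sensitive_creditcard:false sensitive_email:false]
	fmt.Println(detect("order id 20230313094512"))
	// map[sensitive_creditcard:true sensitive_email:false] (a 14-digit id, not a card)
}
```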

Using the results of these stages

There are several things you can do with these new log labels. Among other things, you could:

Create an alert to detect if PII has leaked into your logs.
Create dashboards that monitor based on the new labels.
Route these logs to a different tenant that has special privileges to view logs containing PII.

Improvements

Merge the results of the regex matches into a single label.

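First, we update the template for each source so that it outputs `true` when there is a match and nothing otherwise. Outputting nothing instead of `false` matters because `false` is a non-empty string, which would stop the `or` in the next step from looking at the other result. There are also no spaces around `true`: text outside the `{{ }}` actions is copied into the output as-is, and a value of ` true ` would not match a `true` label query.

- template:
    source: sensitive_email
    template: '{{ if not (empty .Value) }}true{{ end }}'

- template:
    source: sensitive_creditcard
    template: '{{ if not (empty .Value) }}true{{ end }}'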
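The effect of spaces inside the template body is easy to check with Go's `text/template`, which the template stage is based on. As before, `isEmpty` is a hypothetical stand-in for Sprig's `empty`, `render` is an illustrative helper, and an empty string stands in for a missing match. The sketch compares `{{ if not (empty .Value) }} true {{ end }}` with the same template without the spaces.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// isEmpty is a hypothetical stand-in for Sprig's `empty` (nil or "" only).
func isEmpty(v any) bool {
	s, ok := v.(string)
	return v == nil || (ok && s == "")
}

// render parses tmpl and evaluates it with .Value set to value.
func render(tmpl, value string) string {
	t := template.Must(template.New("t").Funcs(template.FuncMap{"empty": isEmpty}).Parse(tmpl))
	var sb strings.Builder
	if err := t.Execute(&sb, map[string]string{"Value": value}); err != nil {
		panic(err)
	}
	return sb.String()
}

func main() {
	spaced := `{{ if not (empty .Value) }} true {{ end }}`
	tight := `{{ if not (empty .Value) }}true{{ end }}`
	fmt.Printf("%q\n", render(spaced, "JP@mail.com")) // " true "
	fmt.Printf("%q\n", render(tight, "JP@mail.com"))  // "true"
	fmt.Printf("%q\n", render(tight, ""))             // ""
}
```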

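Then we add a new template stage that writes a merged `sensitive` value, and a `labels` stage to expose it. `or` returns its first non-empty argument, so `sensitive` is `true` if either detection matched and `false` otherwise:

- template:
    source: sensitive
    template: '{{ or .sensitive_email .sensitive_creditcard false }}'

- labels:
    sensitive: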
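The merge relies on the builtin `or` function of Go's `text/template`, which returns its first non-empty argument, or the last argument when all are empty. The sketch below renders the merge template directly. As a simplification, the hypothetical `sensitive` helper always passes both keys, using an empty string for "no match"; in Promtail a key may instead be missing from the extracted data.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// merge is the template from the stage above; it needs no custom functions.
var merge = template.Must(template.New("merge").
	Parse(`{{ or .sensitive_email .sensitive_creditcard false }}`))

// sensitive renders the merge template for one combination of results,
// where "true" means the detection matched and "" means it did not.
func sensitive(email, card string) string {
	data := map[string]string{"sensitive_email": email, "sensitive_creditcard": card}
	var sb strings.Builder
	if err := merge.Execute(&sb, data); err != nil {
		panic(err)
	}
	return sb.String()
}

func main() {
	fmt.Println(sensitive("true", "")) // true
	fmt.Println(sensitive("", "true")) // true
	fmt.Println(sensitive("", ""))     // false
}
```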
