Anton Golub

config: enhancing declarations with graphs

Typically, application configs are neither a bottleneck nor a development issue. But everything can suddenly change when the context expands to the infrastructure domain, interactions with external sources, and multiple environments.

Config mess

Many years ago configs were pretty simple. They looked more or less like .properties or INI files: plain key-value maps with sections or composite keys to bring some kind of context:

# https://docs.oracle.com/cd/E23095_01/Platform.93/ATGProgGuide/html/s0204propertiesfileformat01.html

# You are reading a comment in ".properties" file.
! The exclamation mark can also be used for comments.
# Lines with "properties" contain a key and a value separated by a delimiting character.
# There are 3 delimiting characters: '=' (equal), ':' (colon) and whitespace (space, \t and \f).
website = https://en.wikipedia.org/
language : English
topic .properties files
# A word on a line will just create a key with no value.
empty
And the INI flavor:
; last modified 1 April 2001 by John Doe
[owner]
name = John Doe
organization = Acme Widgets Inc.

[database]
; use IP address in case network name resolution is not working
server = 192.0.2.62     
port = 143
file = "payroll.dat"

At the same time, another part of the configuration was supplied via environment variables or CLI parameters, reflecting the idea of dynamic settings.

Now we keep those very variables in dotenv files. Ironic:

# https://hexdocs.pm/dotenvy/0.2.0/dotenv-file-format.html
S3_BUCKET=YOURS3BUCKET
SECRET_KEY=YOURSECRETKEYGOESHERE
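
Loading them back into the runtime is usually a one-liner with the dotenv package:

// dotenv reads the local .env file and merges its entries into process.env
import 'dotenv/config'

console.log(process.env.S3_BUCKET) // 'YOURS3BUCKET'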

Even then, resolution logic began to leak into the application layer.

// Just an illustration. This problem existed before JS was invented

const config = require('config')
const logLevel = process.env.DEBUG ? 'trace' : config.get('log.level') || 'info'
//...
const dbConfig = config.get('Customer.dbConfig')
db.connect(dbConfig, ...)

if (config.has('optionalFeature.detail')) {
  const detail = config.get('optionalFeature.detail')
  //...
}

When centralized configuration management arrived, settings were partially moved to remote storage. A local pre-config (entrypoints, DB credentials) was used to fetch the rest, and configuration assembly itself became multi-stage.
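
A minimal sketch of such multi-stage assembly, with an illustrative file name and endpoint (not any particular library's API):

import fs from 'node:fs/promises'

// stage 1: the local pre-config holds just enough to reach the remote storage
const pre = JSON.parse(await fs.readFile('./config.json', 'utf8'))
// stage 2: the rest of the settings are fetched and merged over the pre-config
const remote = await (await fetch(`${pre.configServiceUrl}/settings`)).json()
const config = {...pre, ...remote}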

Later, specialized systems such as Vault added new stages: now the environment holds an access token and selects an entrypoint based on the running mode; a POST request then reveals the credentials profile, which gets merged into the final config.

Here's how uniconfig obtains secrets from Vault storage:

{
  "data": {
    "secret": "$vault:data"
  },
  "sources": {
    "vault": {
      "data": {
        "data": {
          "method": "GET",
          "url": "$url:",
          "opts": {
            "headers": {
              "X-Vault-Token": "$token:auth.client_token"
            }
          }
        },
        "sources": {
          "url": {
            "data": {
              "data": {
                "data": {
                  "name": "$pkg:name",
                  "space": "openapi",
                  "env": "$env:ENVIRONMENT_PROFILE_NAME",
                  "vaultHost": "$env:VAULT_HOST",
                  "vaultPort": "$env:VAULT_PORT"
                },
                "template": "{{=it.env==='production' ? 'https': 'http'}}://{{=it.vaultHost}}:{{=it.vaultPort}}/v1/secret/applications/{{=it.space}}/{{=it.name}}"
              },
              "sources": {
                "env": {
                  "pipeline": "env"
                },
                "pkg": {
                  "pipeline": "pkg"
                }
              }
            },
            "pipeline": "datatree>dot"
          },
          "token": {
            "data": {
              "data": {
                "method": "POST",
                "url": "$url:",
                "opts": {
                  "json": {
                    "role": "$pkg:name",
                    "jwt": "$jwt:"
                  }
                }
              },
              "sources": {
                "pkg": {
                  "pipeline": "pkg"
                },
                "jwt": {
                  "data": {
                    "data": {
                      "data": {
                        "tokenPath": "$env:TOKEN_FILE",
                        "defaultTokenPath": "/var/run/secrets/kubernetes.io/serviceaccount/token"
                      },
                      "template": "{{=it.tokenPath || it.defaultTokenPath}}"
                    },
                    "sources": {
                      "env": {
                        "pipeline": "env"
                      }
                    }
                  },
                  "pipeline": "datatree>dot>file"
                },
                "url": {
                  "data": {
                    "data": {
                      "data": {
                        "env": "$env:ENVIRONMENT_PROFILE_NAME",
                        "vaultHost": "$env:VAULT_HOST",
                        "vaultPort": "$env:VAULT_PORT"
                      },
                      "template": "{{=it.env==='production' ? 'https': 'http'}}://{{=it.vaultHost}}:{{=it.vaultPort}}/v1/auth/kubernetes/login"
                    },
                    "sources": {
                      "env": {
                        "pipeline": "env"
                      },
                      "pkg": {
                        "pipeline": "pkg"
                      }
                    }
                  },
                  "pipeline": "datatree>dot"
                }
              }
            },
            "pipeline": "datatree>http>json"
          }
        }
      },
      "pipeline": "datatree>http>json"
    }
  }
}

Meanwhile, formats kept evolving (JSON5, YAML) and config entry points kept changing. Fortunately, these fluctuations were covered by tools like cosmiconfig, which probes a long list of well-known locations:

[
  'package.json',
  `.${moduleName}rc`,
  `.${moduleName}rc.json`,
  `.${moduleName}rc.yaml`,
  `.${moduleName}rc.yml`,
  `.${moduleName}rc.js`,
  `.${moduleName}rc.ts`,
  `.${moduleName}rc.mjs`,
  `.${moduleName}rc.cjs`,
  `.config/${moduleName}rc`,
  `.config/${moduleName}rc.json`,
  `.config/${moduleName}rc.yaml`,
  `.config/${moduleName}rc.yml`,
  `.config/${moduleName}rc.js`,
  `.config/${moduleName}rc.ts`,
  `.config/${moduleName}rc.cjs`,
  `${moduleName}.config.js`,
  `${moduleName}.config.ts`,
  `${moduleName}.config.mjs`,
  `${moduleName}.config.cjs`,
]
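
In practice, the whole lookup collapses into a single call ('myapp' below is a placeholder module name):

import {cosmiconfig} from 'cosmiconfig'

// walks the well-known locations listed above, from cwd upwards
const result = await cosmiconfig('myapp').search()
if (result) {
  console.log(result.filepath) // where the config was found
  console.log(result.config)   // its parsed contents
}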

Configs still try to remain declarative, but they can't. Templates appeared first.

template:
    metadata:
      annotations:
        cni.projectcalico.org/ipv4pools: '["${APP_NAME}"]'
        vault.hashicorp.com/agent-init-first: "true"
        vault.hashicorp.com/agent-inject: "true"
        vault.hashicorp.com/secrets-injection-method: "env"
        vault.hashicorp.com/secrets-type: "static"
        vault.hashicorp.com/agent-inject-secret-${APP_NAME}: secret-v2/applications/${DEPLOYMENT_NAMESPACE}/${APP_NAME}
        vault.hashicorp.com/agent-inject-template-${APP_NAME}: |
          {{ with secret "secret-v2/applications/${DEPLOYMENT_NAMESPACE}/${APP_NAME}" }}
            {{- range $secret_key, $secret_value := .Data.data }}
            export {{ $secret_key }}={{ $secret_value }}
            {{- end }}
          {{ end }}
        vault.hashicorp.com/auth-path: ${AUTH_PATH}
        vault.hashicorp.com/role: ${APP_NAME}

Then came templates inside templates, with command and script invocations inside dynamic DSLs wrapped into matrices.

      - uses: actions/cache@v3
        id: yarn-cache
        with:
          path: ${{ needs.init.outputs.yarn-cache-dir }}
          key: ${{ runner.os }}-yarn-${{ hashFiles('**/yarn.lock') }}
          restore-keys: |
            ${{ runner.os }}-yarn-

      - name: Restore artifact from cache (if exists)
        uses: actions/cache@v3
        with:
          path: artifact.tar
          key: artifact-${{ needs.init.outputs.checksum }}

      - name: Check artifact
        if: always()
        id: check-artifact
        run: echo "::set-output name=exists::$([ -e "artifact.tar" ] && echo true || echo false)"

As we can see, syntax complexity grows as the price of declarativeness. It's still unclear how this problem can be mitigated. Perhaps new specialized formats will appear, or stricter forms (schemas) of using the existing ones will be introduced.

Budget loss

Anyway, ::$([ is definitely not an optimal solution: confusing, fragile and overcomplicated for most developers. For example, here is how a Python engineer fought against kube.yaml:

fix vault in kube yaml Jul 04   XS      
fix vault in kube yaml Jul 04   XS
fix vault in kube yaml Jul 04   XS
fix vault in kube yaml Jul 04   XS
fix vault in kube yaml Jul 04   XS
fix vault in kube yaml Jul 04   XS
fix vault in kube yaml Jul 04   XS
fix vault in kube yaml Jul 04   XS
fix vault in kube yaml Jul 03   XS
fix vault in kube yaml Jul 03   XS
fix vault in kube yaml Jul 03   XS
fix vault in kube yaml Jul 03   XS
fix vault in kube yaml Jul 03   XS
fix vault in kube yaml Jul 03   XS
...

This is not so much configuring as guessing. At company scale, such exercises are a significant waste of resources. And the experience is almost one-off: it cannot be formalized and transferred except by copy-paste. Every time we see the same picture, just with a different number of attempts.

What we need

The overcomplexity problem seems to arise from the fact that we combine resolving, processing, and accessing data in one structure, although the entire body of CS theory instructs us to do exactly the opposite. Separation of concerns: imagine a config that explicitly separates value resolution, composition, and operations.

{
  "data": "<how to expose values>",
  "sources": "<how to resolve values>",
  "cmds": "<available cmds/ops/fns>"
}
  • Let data represent how the result structure is built once all the required transformations are applied, like a mapping.
{
  "data": {
    "a": {
      "b": "$b.some.nested.prop.value.of.b",
      "c": "$external.prop.of.prop"
    }
  }
}

Templating is based on plain substring replacement:

String.format("foo %s", "bar")                   // gives 'foobar'
// But positional contract is enhanced with named refmap
String.format("foo $a $b $a", {"a": "A", "b": "B"}) // returns 'foo A B A'
//            ↑ data chunks ↑ sources map
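
A named refmap is easy to emulate; here is a minimal sketch of the substitution step (an illustration, not the actual library code):

// replaces $-prefixed refs with values from the refs map, leaving unknown refs intact
const format = (tpl, refs) =>
  tpl.replace(/\$(\w+)/g, (m, key) => key in refs ? refs[key] : m)

format('foo $a $b $a', {a: 'A', b: 'B'}) // 'foo A B A'
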
  • Let sources describe how to obtain and process the values referenced in the data map, like reducing pipelines.
{
  "sources": {
    "a": "<pipeline 1>",
    "b": "<pipeline 2>"
  }
}
  • Let a pipeline compose actions in a natural, dev-readable format resembling a CLI command:
cmd param > cmd2 param param > ... > cmd3
  • Let intermediate values be referenced from lateral (bubbling) or nested contexts.
{
  "sources": {
    "a" : "cmd param",
    "b": "cmd $a" // b refers to a
  }
}
  • Apply a DAG walker for consistency checks and processing (a minimal sketch follows below).
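
Here is a minimal sketch of such a walker, assuming flat string sources and refs matching /\$(\w+)/: sources are vertices, $-refs are edges, and a depth-first pass both detects cycles and yields a valid resolution order.

const resolutionOrder = (sources) => {
  const order = []
  const state = new Map() // 'visiting' | 'done'
  const visit = (name) => {
    if (state.get(name) === 'done') return
    if (state.get(name) === 'visiting') throw new Error(`cycle detected at ${name}`)
    state.set(name, 'visiting')
    // recurse into every $ref that points at another source
    for (const [, ref] of sources[name].matchAll(/\$(\w+)/g))
      if (ref in sources) visit(ref)
    state.set(name, 'done')
    order.push(name)
  }
  Object.keys(sources).forEach(visit)
  return order
}

resolutionOrder({a: 'cmd param', b: 'cmd $a'}) // ['a', 'b']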

Working draft

https://github.com/antongolub/misc/tree/master/packages/topoconfig

Install

yarn add topoconfig@draft

API

import {topoconfig} from 'topoconfig'

const config = await topoconfig({
  // define functions to use in pipelines: sync or async
  cmds: {
    foo: () => 'bar',
    baz: async (v) => v + 'qux'
  },
  // pipelines to resolve intermediate variables
  sources: {
    a: 'foo > baz', // pipeline returns 'barqux'
    b: {            // b refers to b.data
      data: {
        c: {
          d: 'e'
        }
      }
    }
  },
  // output value
  data: {
    // $name.inner.path populates var ref with its value
    x: '$b.c.d',  // 'e'
    y: {
      z: '$a'     // 'barqux'
    }
  }
})

A real-world usage example may look like this:

import {topoconfig} from 'topoconfig'
// deps used by the cmds below
import fs from 'node:fs/promises'
import dot from 'dot'
import lodash from 'lodash'
import minimist from 'minimist'

const config = await topoconfig({
  data: {
    foo:      '$a',
    url:      'https://some.url',
    param:    'regular param value',
    num:      123,
    pwd:      '\\$to.prevent.value.inject.use.\.prefix',
    a: {
      b:      '$b.some.nested.prop.value.of.b',
      c:      '$external.prop.of.prop'
    },
    log: {
      level:  `$loglevel`
    }
  },
  sources: {
    a:        'file ./file.json utf8',
    b:        'json $a',
    c:        'get $b > assert type number',
    cwd:      'cwd',
    schema:   'file $cwd/schema.json utf8 > json',
    external: 'fetch http://foo.example.com > get .body > json > get .prop > ajv $schema',
    extended: 'extend $b $external',
    loglevel: 'find $env.LOG_LEVEL $argv.log-level $argv.log.level info',
    template: `dot {{? $name }}
<div>Oh, I love your name, {{=$name}}!</div>
{{?? $age === 0}}
<div>Guess nobody named you yet!</div>
{{??}}
You are {{=$age}} and still don't have a name?
{{?}} > assert $foo`,
  },
  cmds: {
    // http://olado.github.io/doT/index.html
    dot:      (...chunks) => dot.template(chunks.join(' '))({}),
    extend:   Object.assign,
    cwd:      () => process.cwd(),
    file:     (file, opts) => fs.readFile(file, opts),
    json:     JSON.parse,
    get:      lodash.get,
    argv:     () => minimist(process.argv.slice(2)),
    env:      () => process.env,
    find:     (...args) => args.find(Boolean),
    fetch:    async (url) => {
      const res = await fetch(url)
      const code = res.status
      const headers = Object.fromEntries(res.headers)
      const body = await res.text()

      return {
        res,
        headers,
        body,
        code
      }
    },
    //...
  }
})

Notes

export type TData = number | string | { [key: string]: TData } | { [key: number]: TData }
export type TCmd = (...opts: any[]) => any
export type TCmds = Record<string | symbol, TCmd>
export type TConfigDeclaration = {
  data: TData,
  sources: Record<string, string | TConfigDeclaration>
  cmds?: TCmds
}

TConfigDeclaration defines two required sections, data and sources, plus an optional cmds map:

  • data describes how to build the result value from the bound sources: every $-prefixed ref is populated with its resolved value.
  • sources is a map that declares how to resolve intermediate values through composed cmd calls: fetch remote data, read a file, convert a format, and so on.
{
  "data": "$res",
  "sources": {
    "res": "fetch https://example.com > get .body > json"
  }
}
  • cmd is a provider that performs a specific action.
type TCmd = (...opts: any[]) => any
  • directive is a template for defining a value transformation pipeline:
// fetch http://foo.example.com > get .body > json > get .prop
//  ↑cmd ↑opts                  ↑ pipe delimiter
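
A hedged sketch of directive evaluation: split the pipeline by the delimiter, then reduce left to right, feeding each cmd its literal opts plus the previous result. How the real implementation passes the piped value may differ; this just shows the idea.

const evaluate = async (directive, cmds) =>
  directive.split(' > ').reduce(async (prev, step) => {
    const [cmd, ...opts] = step.trim().split(' ')
    const input = await prev
    // assumption: the piped value is appended after the literal opts
    return cmds[cmd](...opts, ...(input === undefined ? [] : [input]))
  }, Promise.resolve(undefined))

await evaluate('foo > baz', {foo: () => 'bar', baz: async (v) => v + 'qux'}) // 'barqux'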

Next steps

  • Add ternaries: cmd ? cmd1 > cmd2 ... : cmd
  • Handle the or operator: cmd > cmd || cmd > cmd
  • Expose commands presets: import {cmds} from 'topoconfig/cmds' or @topoconfig/cmds
  • Provide lazy-loading for cmds:
{
  cmds: {
    foo: 'some/package',
    bar: './local/plugin.js'
  }
}
  • Support a pipeline factory as cmd declaration.
{
  cmds: {
    readjson: 'path $0 resolve > file $1 > json'
  }
}
  • Use vars as cmd refs:
{
  sources: {
    files: 'glob ./*.*',
    reader: 'detect $files',
    foo: 'file $files.0 > $reader'
  }
}
  • Bring something like watchers to trigger graph re-resolution from a specified vertex
