I am a self-professed wannabe keyboard wizard, brought up on amazing text-manipulation tools like grep, sed and awk. Text-based data manipulation can still be fast and powerful, and might save you dozens of lines of code. This post shows how a little knowledge of JSON-parsing tools can go a long way.
Tools
The tools used in this article are as follows:
- AWS CLI
- JMESPath parser
- JQ
- Bash
This combination means some examples below may not map exactly to your particular use case or configuration, so please bear that in mind.
Basic JSON Formatting and Filtering
I looked at basic formatting and filtering in this post about essential AWS CLI skills. As a quick recap, you can use the `--query` command-line parameter to pass a string specifying a query expression to run on the JSON results, and sometimes these expressions can get quite complex.
If you want a bit more power in your querying, it is worth looking at the JQ tool, which also lets you process JSON structures. The good thing about JQ is that it can operate on files as well as STDIN, so you can save your JSON output into a file and run JQ over it again and again. This is definitely faster when writing your queries, and may also save you some AWS request costs.
As a start, JQ is great for simply pretty-printing JSON output from something like a Lambda function: just pipe it into `jq '.'`. This isn't necessary with the AWS CLI, because it already formats its JSON output.
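For example, a minimal sketch (assuming jq is installed):

```shell
# Pretty-print a compact JSON string by piping it through jq '.'
echo '{"ok":true,"items":[1,2]}' | jq '.'
```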
AWS CLI examples
The simplest use of the CLI filter is to print a reduced amount of data so that it is more manageable. This uses JMESPath expressions to process the JSON data in the output.
Here are some basic examples. Note that the query expression is added using the `--query` argument. In this case we will use CloudFormation output:
aws cloudformation describe-stacks --query "<filter-goes-here>"
| Use Case | Command |
|---|---|
| Show only stack ID, name and update time | `"Stacks[*].[StackId, StackName, LastUpdatedTime]"` |
| Show stack name/ID for stacks whose name contains 'foo' | `"Stacks[?StackId.contains(@, 'foo')].[StackId, StackName]"` |
| Show stack name/ID for stacks which have outputs exported and have been updated since Nov 2022 | `"Stacks[?Outputs && LastUpdatedTime>'2022-11'].[StackId, StackName, LastUpdatedTime]"` |
(Note that the final example only works because the date format in `LastUpdatedTime` can be compared as a string - more on that later.)
JQ
If you want to do serious local processing of AWS (or any other) JSON output, you need to get familiar with JQ, an awesome tool which lets you not only filter JSON structures like the AWS CLI does, but also restructure them.
Here are a few basic ways to use it. Note that you can either pipe input into JQ or provide a filename which contains your JSON.
| Use Case | Command |
|---|---|
| Simply pretty-print JSON output | `jq '.'` |
| Get all the sort keys from a DynamoDB query response (assuming they are strings) | `jq '.Items[].SortKey.S'` |
| Put the result into a list | `jq '[.Items[].SortKey.S]'` |
| Only print keys, not values | `jq 'keys[]'` |
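Here is a minimal sketch of those filters, using a hypothetical DynamoDB-style response saved to a file (assuming jq is installed):

```shell
# Hypothetical DynamoDB query response saved locally
cat > items.json <<'EOF'
{"Items":[{"SortKey":{"S":"1675071762"}},{"SortKey":{"S":"1675071862"}}]}
EOF

jq '.Items[].SortKey.S' items.json    # one quoted string per line
jq '[.Items[].SortKey.S]' items.json  # the same values as a JSON list
jq '.Items[0] | keys[]' items.json    # keys of the first item, no values
```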
An Important Note
It should be stressed here that QUOTING IS IMPORTANT. You may have noticed above that for the AWS `--query` expressions I used double-quotes to enclose the whole expression, and single-quotes for literals within it. For JQ you use the opposite, as the manual makes clear. At least in the environment I am using (bash), getting this wrong leads to endless miserable debugging.
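As a quick illustration of the JQ convention (a hypothetical stack name, assuming bash and jq):

```shell
# Single-quote the whole jq program so bash leaves it alone; the
# string literal inside the program then uses double-quotes.
echo '{"name":"foo-stack"}' | jq 'select(.name == "foo-stack") | .name'
# Wrapping the program in double-quotes instead would expose it to
# bash expansion and quoting headaches.
```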
Comparing The Two
In general, the `--query` filters using JMESPath are a little more concise than their JQ alternatives, but in my opinion the sequential nature of JQ's pipes (`|`) is more readable than JMESPath. However, there are some other important differences to consider:
| | CLI `--query` expression | JQ expression |
|---|---|---|
| Usage | Only with AWS CLI commands | With any output or file |
| Output | Only outputs filtered results | Can restructure into new JSON |
| Types | Does not handle dates natively | Handles date conversions |
| Scripting | Expression must be entered on the command line | Expression can be stored in a file with comments |
Below are some common queries I've used, with the CLI query and JQ query side by side. You should note that all the JQ expressions are wrapped in `[]`, because by default JQ does not output a list. The AWS CLI query function does output a list, so the additional `[]` makes the outputs match. For these examples, I am using DynamoDB output which looks something like this:
{
"Items": [
{
"PartitionKey": { "S": "blog/books/2023-01-drive-daniel-pink/" },
"SortKey": { "S": "1675071762" },
"SomeField": { "N": "54" },
// Other Fields...
},
{
"PartitionKey": { "S": "blog/books/2023-01-drive-daniel-pink/" },
"SortKey": { "S": "1675071862" },
"SomeField": { "N": "44" },
// Other Fields...
},
// More items...
]
}
| Example | AWS CLI query expression | JQ expression |
|---|---|---|
| Show all sort keys from DynamoDB output | `Items[*].SortKey.S` | `[.Items[].SortKey.S]` |
| Particular attributes from DynamoDB output | `Items[*].[SortKey.S, SomeField.N]` | `[.Items[] \| [.SortKey.S, .SomeField.N]]` |
| Filter by field value | `Items[?SortKey.S>'1674000000'].SomeField.N` | `[.Items[] \| select(.SortKey.S > "1674000000").SomeField.N]` |
| Filter on string prefix | `Items[?starts_with(SortKey.S, 'TEXT')].SomeField.N` | `[.Items[] \| select(.SortKey.S \| startswith("TEXT")).SomeField.N]` |
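To see the filter-by-value row in action, here is a sketch run against a trimmed version of the sample data above (assuming jq is installed):

```shell
# Keep only items whose SortKey is greater than the cutoff (string
# comparison works here because the keys are fixed-width timestamps).
echo '{"Items":[
  {"SortKey":{"S":"1675071762"},"SomeField":{"N":"54"}},
  {"SortKey":{"S":"1675071862"},"SomeField":{"N":"44"}}
]}' | jq '[.Items[] | select(.SortKey.S > "1674000000").SomeField.N]'
```

Both items pass the filter, so this prints a list containing "54" and "44".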
If you want an example not using the data above, here is one you can run on your CloudFormation stacks right now. Each command (one JMESPath and one JQ) will show the last updated time of only your 'dev'-stage stacks:
bash-5.1$ aws cloudformation describe-stacks --query "Stacks[?contains(Tags[], {Key: 'STAGE', Value: 'dev'})].[StackName,LastUpdatedTime]"
bash-5.1$ aws cloudformation describe-stacks | jq '[.Stacks[] | select(.Tags[] | contains({Key: "STAGE", Value: "dev"})) | [.StackName,.LastUpdatedTime]]'
[
[
"my-sls-stack-dev",
"2023-02-03T07:56:49.848Z"
],
[
"my-www-stack-dev",
"2022-12-20T11:07:21.155Z"
]
]
Getting more complex
Let's do some sorting. Yes, both tools can do that, and they have many other functions built in!
| Example | AWS CLI query expression | JQ expression |
|---|---|---|
| Sort output numerically by field values* | `sort_by(Items[*], &to_number(SomeField.N))[*].[SortKey.S, SomeField.N]` | `[.Items[] \| [.SortKey.S, .SomeField.N]] \| sort_by(.[1] \| tonumber)` |
| Sum fields (e.g. get total page access time) | `sum(map(&to_number(ServiceTime.N), Items[*]))` | `[.Items[].ServiceTime.N \| tonumber] \| add` |
| Perform counting, e.g. number of pages accessed by Mozilla | `length(Items[?AgentString && starts_with(AgentString.S, 'Mozilla')])` | `[.Items[] \| select(.AgentString.S \| startswith("Mozilla"))] \| length` |
* Note the expressions convert the fields to numbers here so as to sort numerically rather than textually
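A minimal sketch of the JQ sort and sum expressions, on hypothetical data (assuming jq is installed):

```shell
# Hypothetical items with numeric fields stored as strings
DATA='{"Items":[{"SortKey":{"S":"a"},"ServiceTime":{"N":"30"}},{"SortKey":{"S":"b"},"ServiceTime":{"N":"4"}}]}'

# Numeric sort: "4" sorts before "30" only after tonumber conversion
echo "$DATA" | jq '[.Items[] | [.SortKey.S, .ServiceTime.N]] | sort_by(.[1] | tonumber)'

# Sum the ServiceTime fields
echo "$DATA" | jq '[.Items[].ServiceTime.N | tonumber] | add'
```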
Wrapping Up
This post shows that it is possible to perform some complicated transformations on JSON output data. The commands above, which you can almost call one-liners, can replace a whole JavaScript or Python function and let you perform complicated ad-hoc (and maybe even regular) tasks with much less development overhead.
In fact, because JQ can read your filter expression from a file (which can also contain comments), complex filters can still turn into one-liners, with JQ as your script interpreter. This also means that you can version-control and track your JQ scripts.
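For instance, a minimal sketch with a hypothetical filter file (assuming jq is installed):

```shell
# A commented jq filter stored in its own file ('#' starts a comment)
cat > sortkeys.jq <<'EOF'
# Collect every sort key into a list
[.Items[].SortKey.S]
EOF

# Run the saved filter against some input with -f
echo '{"Items":[{"SortKey":{"S":"1675071762"}}]}' | jq -f sortkeys.jq
```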
The specifications of these tools are quite similar, and both are available in library form in JavaScript, Python, Go and many other languages.
More Resources
Filtering output from the AWS CLI
This post was adapted from a larger post on my blog. See the full post here