Please reach out to me on Twitter @nathangloverAUS if you have follow up questions!
This post was originally written on DevOpStar. Check it out here
Amazon Managed Grafana has an unfortunate limitation where API keys have a maximum expiration of 30 days. This limitation is quite frustrating if you are trying to automate the deployment of Grafana dashboards and datasources as part of your CI/CD pipeline - you would need to manually update the API key every 30 days or your deployments would fail.
This problem is exacerbated by the fact that Amazon Managed Grafana API keys are billed at the cost of a full user license - so you cannot simply create a new API key every time you deploy your dashboards either. I found this out the expensive way when I didn't read the pricing guide properly and created an API key for each deployment ($8 a key).
Hopefully, Amazon will address this limitation in the future - but in the meantime, I've written a simple pattern that automatically rotates an API key before it expires and stores it in AWS Secrets Manager for later use. At the end of this post, I outline my plea to Amazon to address this limitation along with some suggestions on how they could do it.
Overview of the solution
I've opted to build this example in Terraform, as you will most likely want to deploy Grafana dashboards as part of your CI/CD pipeline. Terraform is arguably the best tool for this - however, there is no reason why the code in this post couldn't be adapted to work with other tools such as CloudFormation or AWS CDK.
The solution is made up of two components:
- An AWS Secrets Manager secret with a rotation schedule that triggers a Lambda function every 29 days, just inside the 30-day key lifetime
- An AWS Lambda function that creates a new API key in Amazon Managed Grafana and updates the secret with the new key
It is expected that you will have already created an Amazon Managed Grafana instance (though you can copy the terraform from this post side by side with your existing terraform and it will work).
The source code for this example can be found at t04glovern/amazon-managed-grafana-api-key-rotation-terraform.
Solution Walkthrough
We'll begin by looking at the Python code that is in charge of rotating the Managed Grafana API keys - as understanding how that works will help when we look at the Terraform code.
Look at src/rotate.py in the source code for this example.
The function expects three environment variables to be present:
grafana_secret_arn = os.environ['GRAFANA_API_SECRET_ARN']
grafana_api_key_name = os.environ['GRAFANA_API_KEY_NAME']
grafana_workspace_id = os.environ['GRAFANA_WORKSPACE_ID']
NOTE: While you don't technically need to pass the secret ARN, as it is available in the event payload (as SecretId) when the function is invoked by Secrets Manager, I've opted to do so as it makes the code easier to understand.
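To illustrate the note above: a rotation invocation from Secrets Manager includes `SecretId`, `ClientRequestToken` and `Step` fields in its event. The following is a minimal sketch of preferring the environment variable and falling back to the event; `resolve_secret_arn` is a hypothetical helper and is not part of the original rotate.py.

```python
import os


def resolve_secret_arn(event, environ=os.environ):
    """Return the secret ARN for this rotation.

    Hypothetical helper: prefers the GRAFANA_API_SECRET_ARN environment
    variable and falls back to the `SecretId` field that Secrets Manager
    places in the rotation event.
    """
    arn = environ.get('GRAFANA_API_SECRET_ARN') or event.get('SecretId')
    if not arn:
        raise KeyError('No secret ARN found in environment or event')
    return arn
```

Passing `environ` as a parameter is only there to make the helper easy to exercise without touching the real process environment.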
Next, we attempt to delete any existing API keys with the same name as the one we are about to create. This is to ensure that we clean up old API keys and we don't get billed for duplicates (even though you cannot have multiple API keys with the same name).
try:
    grafana_client.delete_workspace_api_key(
        keyName=grafana_api_key_name,
        workspaceId=grafana_workspace_id
    )
except grafana_client.exceptions.ResourceNotFoundException:
    pass
Following cleanup, we create a new API key with a 30-day expiration.
try:
    new_api_key = grafana_client.create_workspace_api_key(
        keyName=grafana_api_key_name,
        keyRole='ADMIN',
        secondsToLive=2592000,
        workspaceId=grafana_workspace_id
    )['key']
except botocore.exceptions.ClientError as error:
    logger.error(error)
    return {
        'statusCode': 500,
        'message': 'Error: Failed to generate new API key'
    }
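The `secondsToLive` value is simply 30 days expressed in seconds. As a small sketch (the constant names are my own, not from the original code), the number can be derived rather than hardcoded, which also makes the one-day margin against the 29-day rotation schedule used in the Terraform explicit:

```python
from datetime import timedelta

# Maximum API key lifetime described in this post.
MAX_KEY_LIFETIME = timedelta(days=30)

# Rotation interval configured in the Terraform rotation schedule.
ROTATION_INTERVAL = timedelta(days=29)

seconds_to_live = int(MAX_KEY_LIFETIME.total_seconds())
overlap = MAX_KEY_LIFETIME - ROTATION_INTERVAL

print(seconds_to_live)  # 2592000
print(overlap.days)     # 1
```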
The last step is to update the AWS Secret with the new API key.
try:
    secretmanager_client.update_secret(
        SecretId=grafana_secret_arn,
        SecretString=new_api_key
    )
except botocore.exceptions.ClientError as error:
    logger.error(error)
    return {
        'statusCode': 500,
        'message': 'Error: Failed to update secret'
    }
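Putting the three steps together, here is a rough sketch of the delete, create, store flow as a single function with the clients injected. This is not the original rotate.py: `rotate_api_key`, `FakeGrafana`, `FakeSecrets` and `KeyNotFound` are hypothetical names, and the fakes exist only so the flow can be run locally without AWS.

```python
def rotate_api_key(grafana_client, secrets_client, *, workspace_id, key_name,
                   secret_arn, not_found_error, seconds_to_live=2592000):
    """Delete the old key (if any), create a new one, and store it."""
    try:
        grafana_client.delete_workspace_api_key(
            keyName=key_name, workspaceId=workspace_id)
    except not_found_error:
        pass  # Nothing to clean up on the first rotation.

    new_key = grafana_client.create_workspace_api_key(
        keyName=key_name, keyRole='ADMIN',
        secondsToLive=seconds_to_live, workspaceId=workspace_id)['key']

    secrets_client.update_secret(SecretId=secret_arn, SecretString=new_key)
    return new_key


class KeyNotFound(Exception):
    """Stand-in for grafana_client.exceptions.ResourceNotFoundException."""


class FakeGrafana:
    """In-memory stand-in for the Managed Grafana client."""

    def __init__(self):
        self.keys = {}
        self.created = 0

    def delete_workspace_api_key(self, keyName, workspaceId):
        if keyName not in self.keys:
            raise KeyNotFound(keyName)
        del self.keys[keyName]

    def create_workspace_api_key(self, keyName, keyRole, secondsToLive,
                                 workspaceId):
        self.created += 1
        self.keys[keyName] = f'fake-key-{self.created}'
        return {'key': self.keys[keyName]}


class FakeSecrets:
    """In-memory stand-in for the Secrets Manager client."""

    def __init__(self):
        self.values = {}

    def update_secret(self, SecretId, SecretString):
        self.values[SecretId] = SecretString


grafana, secrets = FakeGrafana(), FakeSecrets()
for _ in range(2):
    latest = rotate_api_key(
        grafana, secrets,
        workspace_id='g-example', key_name='ci-key',
        secret_arn='arn:example', not_found_error=KeyNotFound)

# Two rotations leave exactly one key, and the secret holds the newest one.
print(len(grafana.keys), secrets.values['arn:example'])  # 1 fake-key-2
```

With real boto3 clients you would pass `grafana_client.exceptions.ResourceNotFoundException` as `not_found_error`, mirroring the exception handled in the snippet earlier.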
Now that we've seen how the Lambda function works, let's look at the terraform code that will deploy it.
variables.tf
This solution expects two variables. You can, however, substitute these with hardcoded values or references to your own Terraform resources instead.
variable "name" {
  type        = string
  description = "Named identifier for the workspace and related resources"
}

variable "grafana_workspace_id" {
  type        = string
  description = "The ID of the Grafana workspace to manage"
}
main.tf
The main.tf file is where the bulk of the solution is defined. The first thing we do is create an AWS Secret that will store the API key.
resource "aws_secretsmanager_secret" "api_key" {
  name = "${var.name}-api-key"
}
The next part looks complicated but is required to bundle the Python code into a zip file that can be deployed to Lambda. The code is zipped up and stored in a zip directory alongside src and is not checked into source control.
resource "random_uuid" "lambda_src_hash" {
  keepers = {
    for filename in setunion(
      fileset("${path.module}/src/", "*.py"),
      fileset("${path.module}/src/", "requirements.txt"),
    ) :
    filename => filemd5("${path.module}/src/${filename}")
  }
}

data "archive_file" "lambda_zip" {
  depends_on = [
    null_resource.install_dependencies
  ]
  type       = "zip"
  source_dir = "${path.module}/src/"
  excludes = [
    "__pycache__"
  ]
  output_path = "${path.module}/zip/${random_uuid.lambda_src_hash.result}.zip"
}
Unfortunately, because the AWS Lambda Python runtime does not ship the most up-to-date boto3 version, we must force Terraform to install and zip a more recent version of boto3, which complicates this solution quite a lot. If you are reading this post in the future, check out the AWS Lambda Python Runtime page to see if this is still required. As of writing this, boto3-1.20.32 is the latest version available in the runtime and boto3-1.26.65 is needed.
resource "null_resource" "install_dependencies" {
  provisioner "local-exec" {
    command = "pip install -r ${path.module}/src/requirements.txt -t ${path.module}/src/ --upgrade"
  }
  triggers = {
    dependencies_versions = filemd5("${path.module}/src/requirements.txt")
  }
}
Skipping over the IAM role and policy terraform (which I won't explain in this post, but you can find in the source code on GitHub), we can see that the Lambda function is created and provided the environment variables we defined earlier.
resource "aws_lambda_function" "api_key_rotation" {
  function_name    = "${var.name}-api-key-rotation"
  filename         = data.archive_file.lambda_zip.output_path
  source_code_hash = data.archive_file.lambda_zip.output_base64sha256
  handler          = "rotate.lambda_handler"
  runtime          = "python3.9"
  environment {
    variables = {
      GRAFANA_API_SECRET_ARN = aws_secretsmanager_secret.api_key.arn
      GRAFANA_API_KEY_NAME   = "${var.name}-mangement-api-key"
      GRAFANA_WORKSPACE_ID   = var.grafana_workspace_id
    }
  }
  role = aws_iam_role.api_key_rotation_lambda_role.arn
}
With both the Lambda function and the secret created, a Secrets Manager rotation schedule is added to invoke the Lambda function every 29 days, one day inside the 30-day key lifetime.
resource "aws_lambda_permission" "secrets_manager_api_key_rotation" {
  statement_id  = "AllowExecutionFromSecretsManager"
  action        = "lambda:InvokeFunction"
  function_name = aws_lambda_function.api_key_rotation.function_name
  principal     = "secretsmanager.amazonaws.com"
}

resource "aws_secretsmanager_secret_rotation" "api_key" {
  secret_id           = aws_secretsmanager_secret.api_key.id
  rotation_lambda_arn = aws_lambda_function.api_key_rotation.arn
  rotation_rules {
    automatically_after_days = 29
  }
}
If you only need the Terraform to deploy an API key, but do not necessarily use it straight away, then you could get away with not including the final part of the Terraform code. However, if there is a requirement to use the API key immediately after deployment in the same Terraform stack, then you will need to add a null_resource to delay the Terraform execution until the secret has been rotated.
I set an arbitrary 20 second delay, but if you wanted to be safe you could increase that.
resource "null_resource" "api_key_delay" {
  provisioner "local-exec" {
    command = "sleep 20"
  }
  triggers = {
    after = aws_secretsmanager_secret_rotation.api_key.id
  }
}
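A fixed sleep is a guess at how long the first rotation takes. One alternative, offered here as an assumption rather than something from the original repository, is to poll Secrets Manager until the secret has a version staged as AWSCURRENT. The sketch below uses a hypothetical helper, `wait_for_current_version`, which relies only on the `VersionIdsToStages` map returned by `describe_secret`; a small script built around it could be called from the `local-exec` provisioner in place of `sleep 20`.

```python
import time


def wait_for_current_version(secrets_client, secret_id, timeout=60.0,
                             interval=2.0, sleep=time.sleep,
                             clock=time.monotonic):
    """Poll until the secret has a version labelled AWSCURRENT.

    Returns the version ID, or raises TimeoutError once `timeout`
    seconds have passed without one appearing.
    """
    deadline = clock() + timeout
    while True:
        response = secrets_client.describe_secret(SecretId=secret_id)
        # The map is absent until the secret has at least one version.
        stages = response.get('VersionIdsToStages', {})
        for version_id, labels in stages.items():
            if 'AWSCURRENT' in labels:
                return version_id
        if clock() >= deadline:
            raise TimeoutError(f'{secret_id} has no AWSCURRENT version')
        sleep(interval)


# Example usage (requires boto3 and AWS credentials):
#   import boto3
#   wait_for_current_version(boto3.client('secretsmanager'), '<secret ARN>')
```

The `sleep` and `clock` parameters are injectable purely so the loop can be tested without real waiting.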
The following terraform can be used to retrieve the API key from the newly created and updated secret.
data "aws_secretsmanager_secret" "api_key" {
  depends_on = [
    null_resource.api_key_delay
  ]
  arn = aws_secretsmanager_secret.api_key.arn
}

data "aws_secretsmanager_secret_version" "api_key" {
  secret_id = data.aws_secretsmanager_secret.api_key.id
}
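Outside of Terraform, a pipeline step could read the same secret and present it to the Grafana HTTP API as a bearer token. This is a minimal sketch, assuming the secret holds the raw key as a plain string (which is how the rotation function stores it); `grafana_auth_headers` is a hypothetical helper name.

```python
def grafana_auth_headers(secrets_client, secret_id):
    """Fetch the rotated API key and build headers for Grafana API calls."""
    response = secrets_client.get_secret_value(SecretId=secret_id)
    api_key = response['SecretString']
    return {
        'Authorization': f'Bearer {api_key}',
        'Content-Type': 'application/json',
    }


# Example usage (requires boto3 and AWS credentials):
#   import boto3
#   headers = grafana_auth_headers(
#       boto3.client('secretsmanager'), '<secret ARN or name>')
```

Because the key is fetched at call time, a pipeline using this approach always picks up the most recently rotated value rather than one baked into its configuration.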
Plea to Amazon
In this post, we have seen how to use Terraform to deploy an AWS Secret and Lambda function that will rotate the API key for an Amazon Managed Grafana workspace. We did this to get around some frustrating limitations with the way that Amazon Managed Grafana provides API keys that hopefully will be addressed in the future.
My request to Amazon is for them to make this solution I outlined redundant by doing the following:
- Provide a way to vend short-lived session tokens for API access against Managed Grafana's API for use in CI/CD pipelines.
- If this is not possible, provide a way to create API keys that are not tied to a specific user account - but manage the storage and rotation of these keys in Secrets Manager similar to RDS: https://aws.amazon.com/about-aws/whats-new/2022/12/amazon-rds-integration-aws-secrets-manager/
- If this is not possible, reduce the price of API keys to $0.01 per month so we can use them in CI/CD pipelines without worrying about the cost.
If you've had this same problem and can think of a better way to solve it, please let me know on Twitter @nathangloverAUS or in the comments below.