When designing cloud environments, it is often recommended to set up multiple accounts. While this approach offers resource isolation and clearer security, access, and billing boundaries, it also comes with its own challenges. One of them is efficiently promoting and tracking applications between environments.
The GitOps approach, along with tools like ArgoCD and Kustomize, simplifies tracking and promotion. However, image promotion is often overlooked. Many enterprises adopt a shared image registry, but it soon becomes bloated with many unused versions.
This article explores a recent journey during which we examined the problem of promoting images and the innovative solution that was adopted, all while adhering to the principles of GitOps.
Challenge
Recently, we were presented with a scenario in which a company using a shared ECR registry was considering migrating to separate ECR registries for cost-effectiveness, better governance, and streamlined lifecycle management.
Here is a look at the existing state of infrastructure and pipelines:
Each environment has a dedicated AWS account with its own cluster and ArgoCD installation.
Kustomize is used for managing configuration differences across environments.
├── infra
│   ├── charts/
└── overlays
    ├── dev
    │   ├── patch-image.yaml
    └── production
        ├── patch-image.yaml
        └── patch-replicas.yaml
Jenkins is used to continuously build new images in the development environment.
However, none of these tools provides out-of-the-box support for promoting images between ECR registries, so we explored custom solutions with the following considerations in mind.
Considerations:
Selective Promotion: The company’s application landscape is composed of multiple modules and teams with different timelines. Therefore, it is necessary to support the promotion of images for only selected modules in each release.
Optimized Storage: Environments such as production only need to store promoted image versions, reducing clutter and optimizing resource usage.
Image Tag and Digest Replication: Replicating image tags and digests unchanged between ECR registries is critical for security and traceability.
Potential Solutions
At the outset, two potential solutions were proposed:
ECR Cross-Account Replication: ECR natively supports replicating images between two accounts. However, replication rules can only be filtered by repository name prefix; there is currently no way to replicate only selected image tags. Alternatively, AWS recommends an event-based design that selectively replicates images based on tag naming conventions. Since we do not know in advance which versions will be promoted, this would require an additional retagging step before promotion.
Jenkins Promotion Pipeline: A Jenkins pipeline that parses the Kustomize overlays for image tags and programmatically replicates them.
Both options are viable, but they introduce an additional layer of complexity to the promotion process. Additionally, you need to ensure that images are promoted before the Kustomize overlays are updated.
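To give a sense of what the second option involves, here is a minimal sketch of the step such a Jenkins job might run. It assumes (this is not taken from the client's setup) that each patch-image.yaml contains plain `image: <registry>/<repository>:<tag>` lines pointing at the destination registry. It only prints the copy commands, using crane (the tool introduced later in this article), rather than running them. The function name `plan_promotion` and the account IDs are hypothetical.

```shell
#!/bin/sh
# Print one "crane copy SRC DST" command per ECR image referenced in an overlay file.
# Usage: plan_promotion OVERLAY_FILE SOURCE_ACCOUNT DEST_ACCOUNT
# Example (hypothetical account IDs):
#   plan_promotion overlays/production/patch-image.yaml 111111111111 222222222222
plan_promotion() {
  overlay=$1 src_account=$2 dst_account=$3
  # Extract the value of every "image:" key (also matches "- image:"), drop quotes.
  sed -n 's/^[[:space:]-]*image:[[:space:]]*//p' "$overlay" |
  tr -d "\"'" |
  while IFS= read -r dst_image; do
    case $dst_image in
      "$dst_account".dkr.ecr.*)
        # Swap the leading account ID to obtain the source reference.
        rest=${dst_image#"$dst_account"}
        src_image=$src_account$rest
        printf 'crane copy %s %s\n' "$src_image" "$dst_image"
        ;;
      # Images hosted outside the destination ECR registry are skipped.
    esac
  done
}
```

Even in this reduced form, the job has to understand the overlay layout and run before the overlay change is merged, which is the extra coordination mentioned above.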
The Winning Strategy: ArgoCD PreSync Job
In this scenario, the client was already using ArgoCD for continuous deployment of the application changes. Therefore, we decided to also assign ArgoCD the responsibility of delivering images to the target environment cluster.
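ArgoCD supports resource hooks, which run custom workloads at defined points of a sync, such as before (PreSync) or after (PostSync) the manifests are applied. A PreSync hook that copies the image therefore ensures the image is present in the destination registry before the deployment referencing it is applied. The setup has two parts.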
1. ECR Repository Permission: Authorize cross-account pull access for Docker images
To enable ArgoCD to pull images from the source ECR, we need to add a resource-based policy to our repository.
Save the following as cross-account-ecr-read-policy.json, replacing {DESTINATION_ACCOUNT} with the destination account ID:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AllowPull",
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::{DESTINATION_ACCOUNT}:root"
      },
      "Action": [
        "ecr:BatchCheckLayerAvailability",
        "ecr:BatchGetImage",
        "ecr:GetDownloadUrlForLayer"
      ]
    }
  ]
}
Apply the policy to ECR repositories:
aws ecr set-repository-policy --repository-name example \
  --policy-text "file://cross-account-ecr-read-policy.json"

# For multiple repositories (--output text prints one repository name per line):
aws ecr describe-repositories --query "repositories[].[repositoryName]" --output text \
  | xargs -I {} aws ecr set-repository-policy --repository-name {} --policy-text "file://cross-account-ecr-read-policy.json"
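Before moving on, it can be worth confirming which repositories actually received a policy. The sketch below is one way to do that. `repos_without_policy` is a hypothetical helper, and it relies on `aws ecr get-repository-policy` exiting with an error for a repository that has no policy attached. Note that it would also list repositories for which the call fails for other reasons, such as missing permissions.

```shell
#!/bin/sh
# Print the name of every repository in the current account and region
# for which no repository policy could be retrieved.
repos_without_policy() {
  aws ecr describe-repositories --query 'repositories[].repositoryName' --output text |
  tr '\t' '\n' |
  while IFS= read -r repo; do
    [ -n "$repo" ] || continue
    # A non-zero exit status here usually means no policy is attached.
    aws ecr get-repository-policy --repository-name "$repo" </dev/null >/dev/null 2>&1 ||
      printf '%s\n' "$repo"
  done
}
```

An empty result from `repos_without_policy` suggests every repository has a policy attached; it does not verify the policy's contents.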
2. PreSync Hook Job: Copy image between accounts
We use Crane to copy images without changing their tag and digest.
The PreSync hook Job is stored in Git along with the other application manifests and monitored by ArgoCD. ArgoCD runs the Job before synchronizing the changes.
The source account is the Development or DevOps account from which the images will be pulled.
The destination account is the Production or target environment where the image needs to be copied.
# Helm template example
apiVersion: batch/v1
kind: Job
metadata:
  generateName: argo-presync-promote-image-
  annotations:
    argocd.argoproj.io/hook: PreSync
    argocd.argoproj.io/hook-delete-policy: HookSucceeded
spec:
  template:
    spec:
      volumes:
        - name: creds
          emptyDir: {}
      initContainers:
        # Fetch an ECR login password and share it through the emptyDir volume
        - name: aws-creds
          image: public.ecr.aws/aws-cli/aws-cli
          command:
            - sh
            - -c
            - |
              aws ecr get-login-password > /creds/ecr
          volumeMounts:
            - name: creds
              mountPath: /creds
      containers:
        # For brevity, I have assumed that all Helm values are available on the root.
        - name: promote-image
          image: gcr.io/go-containerregistry/crane:debug
          command:
            - sh
            - -c
            - |
              # Log in to both ECR registries
              cat /creds/ecr | crane auth login {{.Values.sourceAccount}}.dkr.ecr.us-east-1.amazonaws.com -u AWS --password-stdin
              cat /creds/ecr | crane auth login {{.Values.destinationAccount}}.dkr.ecr.us-east-1.amazonaws.com -u AWS --password-stdin
              # Copy image from source account to destination account
              crane copy {{.Values.image | replace .Values.destinationAccount .Values.sourceAccount}} {{.Values.image}}
          volumeMounts:
            - name: creds
              mountPath: /creds
      restartPolicy: Never
  backoffLimit: 2
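Since preserving the digest is one of the stated requirements, a quick check after a promotion can be reassuring. The following is a minimal sketch, not part of the setup described here: `same_digest` is a hypothetical helper that compares the output of `crane digest` for the source and destination references, and it assumes crane is already logged in to both registries.

```shell
#!/bin/sh
# Succeed only if both image references resolve to the same digest.
# Returns 2 if either digest cannot be looked up.
# Usage: same_digest SOURCE_IMAGE DESTINATION_IMAGE
same_digest() {
  src_digest=$(crane digest "$1") || return 2
  dst_digest=$(crane digest "$2") || return 2
  [ "$src_digest" = "$dst_digest" ]
}
```

For example, `same_digest "$SRC_IMAGE" "$DST_IMAGE" && echo "digests match"`, where both variables hold full image references including the tag.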
Conclusion
In conclusion, the team was able to promote images on demand by using the pre-sync hook. This made production promotion a single step of updating the Kustomize overlays.
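As an illustration of that single step, the sketch below bumps an image tag in an overlay file before the change is committed. It assumes the overlay contains `image: <registry>/<repository>:<tag>` lines, which may not match your layout; `set_image_tag` is a hypothetical helper, the repository name is treated as a regular expression, and the tag must not contain `|` or `&`.

```shell
#!/bin/sh
# Rewrite the tag of one repository's image reference in an overlay file.
# Usage: set_image_tag OVERLAY_FILE REPOSITORY NEW_TAG
# Example: set_image_tag overlays/production/patch-image.yaml example 1.5.0
set_image_tag() {
  file=$1 repo=$2 tag=$3
  tmp=$(mktemp) || return 1
  # Only lines containing "image:" are touched; everything after
  # "/<repository>:" up to the next space or quote is replaced by the new tag.
  sed "/image:/ s|\(/${repo}\):[^[:space:]\"']*|\1:${tag}|" "$file" > "$tmp" &&
    cat "$tmp" > "$file"
  status=$?
  rm -f "$tmp"
  return "$status"
}
```

Once the updated overlay is committed and ArgoCD picks up the change, the PreSync hook copies the new image before the workload is updated.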
I would love to hear about other options that you have adopted. For instance, an alternative approach could be to use Kubernetes Dynamic Admission Control to intercept and pull missing images on demand.