How to enable reconciliation windows for a GitOps setup using the suspension feature of the Flux Kustomization resource and K8s CronJobs.
When using Flux to manage a K8s cluster, every new change in your repository is applied to the cluster's state as soon as Flux reconciles. In some use cases, the newest changes to a GitOps repository should only apply to the cluster within a designated time window. For example, the cluster should reconcile to the newest changes of the GitOps repository only between Monday 8am and Thursday 5pm. Any change coming in to the GitOps repository after Thursday 5pm, on Friday, or on the weekend will have to wait until Monday 8am to be applied.
What are the scenarios this could be used for in real life?
- Sometimes the cluster is connected to external systems, which need to be in maintenance mode before updates can be applied.
- You want to be able to determine a designated time window when the next changes go into production, so that in case of issues you are able to react quickly.
So our problem in short:
We want to be able to predefine time windows to deploy all new changes to a cluster that is managed by Flux.
To make things easier, let's call these time windows "reconciliation windows" and dig right into how to solve the problem.
Prerequisites:
Core principles
Now how do we create such reconciliation windows using Flux and K8s native resources?
To get there, we first need to understand how the Flux Kustomization and Flux Source resources work, and how we can leverage them to solve our problem.
When setting up a cluster with Flux, there will always be a Source resource that reconciles the changes from the GitOps repository into the cluster.
After that, the Kustomization resource polls the newest changes from the Source resource and applies them to the cluster.
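As a rough illustration of the two resources described above, the following sketch pairs a Git Source with a Kustomization that consumes it. The repository URL, resource names, path, and intervals are placeholders and not taken from the original setup. The v1beta2 API versions are an assumption chosen to match the Flux v0.36 CLI image used later in this post; newer Flux releases serve v1.

```yaml
# Source: fetches the GitOps repository into the cluster at a fixed interval.
apiVersion: source.toolkit.fluxcd.io/v1beta2
kind: GitRepository
metadata:
  name: gitops-repo                              # hypothetical name
  namespace: flux-system
spec:
  interval: 1m                                   # placeholder interval
  url: https://example.com/org/gitops-repo.git   # placeholder URL
  ref:
    branch: main
---
# Kustomization: applies the manifests found under `path` in the Source above.
apiVersion: kustomize.toolkit.fluxcd.io/v1beta2
kind: Kustomization
metadata:
  name: apps
  namespace: flux-system
spec:
  interval: 10m                                  # placeholder interval
  path: ./apps                                   # hypothetical path
  prune: true
  sourceRef:
    kind: GitRepository
    name: gitops-repo
```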
Now interestingly enough both of the reconciliations of these resources can be suspended.
Suspend the Source/Kustomization resource from reconciling:
flux suspend source git <name>
flux suspend kustomization <name>
Resume reconciling of the Source/Kustomization resource:
flux resume source git <name>
flux resume kustomization <name>
Suspending the Kustomization resource means no changes are applied to the cluster.
Since our goal is to suspend the reconciliation of the cluster state, just suspending the Kustomization resource is enough. The Source resource can continue syncing content at the predefined interval.
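Declaratively, suspension comes down to a single field on the resource. As a minimal sketch (the name apps is borrowed from the jobs further down, and the API version is assumed to match Flux v0.36), a suspended Kustomization has spec.suspend set to true, which is the field the flux suspend kustomization command sets:

```yaml
apiVersion: kustomize.toolkit.fluxcd.io/v1beta2
kind: Kustomization
metadata:
  name: apps
  namespace: flux-system
spec:
  suspend: true   # the kustomize-controller skips this object until it is set back to false
  # the remaining fields (interval, path, sourceRef, ...) stay unchanged
```

The Source resource has the same spec.suspend field, but for reconciliation windows it can stay untouched.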
Schedule opening and closing of reconciliation windows
So far so good. But how do we automate this?
Well, K8s already has a native way to schedule jobs, namely CronJob resources, so why not use them?
With CronJobs we can create an open-reconciliation-window-job and a close-reconciliation-window-job, which will use the Flux CLI and a ServiceAccount to resume/suspend the Kustomizations.
Let's use the “No-deployment Friday” example. For the reconciliation window from every Monday 8:00 am to Thursday 5:00 pm, this is how the jobs would look.
Note: The ServiceAccount and the corresponding RoleBinding and Role are needed to give the job the right access to perform operations on the cluster resources. For more information on this, see the K8s docs on configuring service accounts.
# open-reconciliation-window-job.yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: open-reconciliation-window
  namespace: jobs
spec:
  schedule: "0 8 * * MON" # every Monday at 08:00
  suspend: false # must be false for the schedule to fire; true pauses the CronJob
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: sa-job-runner
          containers:
            - name: hello
              image: ghcr.io/fluxcd/flux-cli:v0.36.0
              imagePullPolicy: IfNotPresent
              command: ["/bin/sh", "-c"]
              args:
                - flux resume kustomization infra -n flux-system;
                  flux resume kustomization apps -n flux-system;
          restartPolicy: Never
# close-reconciliation-window-job.yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: close-reconciliation-window
  namespace: jobs
spec:
  schedule: "0 17 * * THU" # every Thursday at 17:00
  suspend: false # must be false for the schedule to fire; true pauses the CronJob
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: sa-job-runner
          containers:
            - name: hello
              image: ghcr.io/fluxcd/flux-cli:v0.36.0
              imagePullPolicy: IfNotPresent
              command: ["/bin/sh", "-c"]
              args:
                - flux suspend kustomization infra -n flux-system;
                  flux suspend kustomization apps -n flux-system;
          restartPolicy: Never
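The access objects mentioned in the note above could look roughly like the following. This is a sketch under assumptions, not the manifests from the original sample: the name kustomization-suspender is hypothetical, a namespaced Role is used rather than a ClusterRole, and the verb list is a guess at what the Flux CLI needs to read and patch Kustomizations. Extend it if the job logs report forbidden errors.

```yaml
# ServiceAccount the CronJob pods run as.
apiVersion: v1
kind: ServiceAccount
metadata:
  name: sa-job-runner
  namespace: jobs
---
# Role in flux-system: allows reading and patching Flux Kustomizations only.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: kustomization-suspender   # hypothetical name
  namespace: flux-system
rules:
  - apiGroups: ["kustomize.toolkit.fluxcd.io"]
    resources: ["kustomizations"]
    verbs: ["get", "list", "watch", "patch", "update"]   # assumed verb set
---
# RoleBinding: grants the Role to the ServiceAccount from the jobs namespace.
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: kustomization-suspender   # hypothetical name
  namespace: flux-system
subjects:
  - kind: ServiceAccount
    name: sa-job-runner
    namespace: jobs
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: kustomization-suspender
```

The Role and RoleBinding live in flux-system because that is where the Kustomizations are, while the ServiceAccount lives next to the jobs.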
Note: You can customize the window times as you want by adjusting the cron string set in spec.schedule. There are a few online tools to help you understand how these cron strings work, e.g. crontab guru.
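As a small reminder of how the cron string is read, here is the relevant fragment of the CronJob spec with the five fields annotated. The timeZone field is an optional addition that is not part of the original jobs; it is available in recent Kubernetes versions, and the zone shown is only a hypothetical choice. Without it, the schedule is interpreted in the local time zone of the kube-controller-manager.

```yaml
spec:
  # Fields: minute hour day-of-month month day-of-week
  # "0 8 * * MON" = at 08:00 every Monday
  schedule: "0 8 * * MON"
  # Optional: evaluate the schedule in a named time zone.
  timeZone: "Europe/Berlin"   # hypothetical choice, not from the original sample
```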
Scale by using GitOps to manage reconciliation windows
At this point, we have the capabilities to resume and suspend, but we still need to create the CronJobs manually for each cluster.
Imagine we have a GitOps repository that manages 10+ clusters. Not all of these clusters will probably have their reconciliation window set at the same time. Also, you don't want to manually have to create these jobs, let alone maintain the jobs if, for example, more Kustomization resources get added to the cluster.
Not to worry, there is also a solution for that ;)
I mean, we are already using GitOps, so why not put the definition of the jobs into the repository as part of our infrastructure?
And why not use kustomize's patch functionality to overwrite the CronJob's cron string to be able to customize the reconciliation window times for each cluster?
If that sounds interesting, check out the full sample here.
Now, instead of having to manually create the ClusterRole, RoleBinding, ServiceAccount, and CronJobs, Flux will take care of that for us.
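To make the patching idea concrete, a per-cluster kustomization.yaml could override the schedule of the shared CronJobs roughly as follows. The directory layout (a shared base folder referenced from a cluster folder) and the replacement schedules are hypothetical and not taken from the linked sample. Only the CronJob names come from the jobs defined earlier.

```yaml
# clusters/<cluster-name>/reconciliation-windows/kustomization.yaml (hypothetical path)
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - ../../../base/reconciliation-windows   # hypothetical shared base with both CronJobs
patches:
  # Open the window on Tuesday 06:00 for this cluster instead of Monday 08:00.
  - target:
      kind: CronJob
      name: open-reconciliation-window
    patch: |-
      - op: replace
        path: /spec/schedule
        value: "0 6 * * TUE"
  # Close the window on Wednesday 18:00 instead of Thursday 17:00.
  - target:
      kind: CronJob
      name: close-reconciliation-window
    patch: |-
      - op: replace
        path: /spec/schedule
        value: "0 18 * * WED"
```

Each cluster keeps its own small overlay like this, while the job definitions themselves are maintained once in the base.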
Conclusion
Now this is how we can leverage Flux and K8s native approaches to restrict the application of changes to a cluster to happen only in a reconciliation window.
There are a few advantages to this approach:
- For clusters running on the edge, if the connectivity goes down during a reconciliation window, simple changes will still reconcile normally. This is because the Source resource already pulled the newest changes.
Note: Be careful, this only works for image tag changes if there is a local ACR. Otherwise, the new images need to be pre-downloaded to the device.
- The GitOps repository reflects the desired state of the cluster after a reconciliation window.
- No need to maintain a custom gateway or such. All the used components are open-source and there is no need for custom logic.
- During the reconciliation windows, changes are applied the way we are used to from Flux.
What we are not solving with this, however, is the scheduling of fine-grained changes. As you might have noticed, the granularity ends at the Kustomization resource: the CronJobs suspend and resume everything that a given Kustomization manages. So individual configurations cannot be scheduled with this approach.
Did that not solve your problem yet, because your cluster needs real-time changes as well as changes within a reconciliation window? Not to worry, got you ;) Check out the next part.
Top comments (3)
what did you use for creating the diagrams in those GIFs?
thanks in advance :)
Hi Assaf, just simple powerpoint. You can create each step in an individual slide and export it as a GIF :)
It is simple but very clear. Sometimes we don't want to suspend the Kustomization. For example, if some DaemonSets are gone from your nodes, what would you do, given that you have suspended the Kustomization? So suspension is not a good solution. We could think of introducing a webhook between the kustomize-controller and the source-controller, or between the kustomize-controller and the K8s cluster, which would give us a gated approval solution.