It can be challenging to manage costs if your developers use Kubernetes clusters running in the cloud, whether they use shared clusters or have their own dedicated clusters. It’s difficult to keep track of what workloads are running where, and that gets even harder as you add clusters. Sure, you could rely on people to manually clean up after themselves, but we all know that nobody really likes to clean up. So you’re likely going to end up with a lot of idle containers and wasted resources. And besides increased costs and management headaches, wasting resources is terrible for the environment.
In many cases, you don’t need applications running in your clusters to be always available, especially when these applications run in dev clusters and engineers don’t work 24 hours a day. Let’s do some quick math on this: Say that a typical engineer at your company works 40 hours a week. A week contains 168 hours. If the applications in their dev environment run all of the time, that leaves 168 - 40 = 128 hours a week where applications are up and ready to take traffic even though the engineer is not working. That equals roughly 76% (128 idle hours / 168 hours) of every week. And that’s assuming the engineer is actively using that app for the full 40 hours, which isn’t likely because they will also be in meetings, grabbing lunch, or working on non-coding tasks.
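The arithmetic above can be written as a small Python sketch. The function name `idle_fraction` is made up for illustration; it simply repeats the calculation for any number of active hours.

```python
HOURS_PER_WEEK = 7 * 24  # 168 hours


def idle_fraction(active_hours_per_week: float) -> float:
    """Return the share of the week in which an always-on app sits unused."""
    if not 0 <= active_hours_per_week <= HOURS_PER_WEEK:
        raise ValueError("active hours must be between 0 and 168")
    idle_hours = HOURS_PER_WEEK - active_hours_per_week
    return idle_hours / HOURS_PER_WEEK


# A 40-hour work week leaves 128 idle hours out of 168.
print(f"{idle_fraction(40):.0%}")  # prints "76%"
```

Plugging in a more realistic number of hands-on hours, say 25, pushes the idle share even higher.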
What if I told you that you could automatically suspend workloads in your clusters when they’re not being used? Or even delete unused namespaces?
Sleep Mode is one of our favorite features in Loft because of the significant benefits that our customers get from it and because it's so easy to measure the financial impact. Sleep Mode can automatically scale down your apps when they’re not used to save on cloud resources and overall cost.
How Sleep Mode Works
Sleep mode works based on ReplicaSets. Let’s say, for example, that you have an NGINX Deployment that is set to run 5 replicas. When Loft detects that the namespace the app is in has been idle for a predefined amount of time, it will automatically scale the NGINX ReplicaSet down to 0 replicas, deleting all of the pods that belong to this NGINX Deployment. Loft remembers that there should be 5 replicas running. Once it detects activity in the namespace (e.g. a kubectl request such as kubectl get pods), it will restore the previous number of 5 replicas, and Kubernetes will spin them back up again.
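To make the scale-down and restore cycle concrete, here is a minimal in-memory simulation in Python. It is not Loft’s implementation and does not talk to a cluster; the `Workload` class and its `saved_replicas` field are hypothetical names used only to show the idea of remembering the original replica count.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Workload:
    name: str
    replicas: int
    # Hypothetical field: the replica count to restore on wake-up.
    saved_replicas: Optional[int] = None


def sleep(workload: Workload) -> None:
    """Scale to 0 and remember the previous count (no-op if already asleep)."""
    if workload.saved_replicas is None:
        workload.saved_replicas = workload.replicas
        workload.replicas = 0


def wake(workload: Workload) -> None:
    """Restore the remembered replica count (no-op if not asleep)."""
    if workload.saved_replicas is not None:
        workload.replicas = workload.saved_replicas
        workload.saved_replicas = None


nginx = Workload(name="nginx", replicas=5)
sleep(nginx)
print(nginx.replicas)  # prints 0
wake(nginx)
print(nginx.replicas)  # prints 5
```

The guard in `sleep` matters: putting an already sleeping workload to sleep again must not overwrite the remembered 5 with 0.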
Loft detects whether the namespace is idle by examining incoming API requests. The Loft API Gateway acts as a proxy for the Kubernetes API server, so it can see when a request is made for a particular namespace. As soon as it sees an API request coming in, like a kubectl command, it will fire the pods back up. The same is true for any other API request, such as one from Helm or from other tooling that uses the kube-context for any of the clusters you connect to Loft.
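As a rough illustration of how a proxy could track activity per namespace, the Python sketch below pulls the namespace out of a Kubernetes API request path (namespaced resources live under `/namespaces/<name>/`) and records when it was last seen. This is an assumption about the general technique, not a description of Loft’s gateway; `namespace_from_path` and `ActivityTracker` are hypothetical names.

```python
from typing import Dict, Optional


def namespace_from_path(path: str) -> Optional[str]:
    """Extract the namespace from a path like /api/v1/namespaces/dev/pods."""
    parts = [p for p in path.split("?")[0].split("/") if p]
    if "namespaces" in parts:
        i = parts.index("namespaces")
        if i + 1 < len(parts):
            return parts[i + 1]
    return None  # cluster-scoped request, e.g. /api/v1/nodes


class ActivityTracker:
    """Remember the last time each namespace received an API request."""

    def __init__(self) -> None:
        self.last_seen: Dict[str, float] = {}

    def record(self, path: str, now: float) -> Optional[str]:
        namespace = namespace_from_path(path)
        if namespace is not None:
            self.last_seen[namespace] = now
        return namespace

    def is_idle(self, namespace: str, now: float, timeout_seconds: float) -> bool:
        last = self.last_seen.get(namespace)
        return last is None or now - last >= timeout_seconds


tracker = ActivityTracker()
tracker.record("/api/v1/namespaces/dev-alice/pods", now=0.0)
print(tracker.is_idle("dev-alice", now=600.0, timeout_seconds=3600.0))   # False
print(tracker.is_idle("dev-alice", now=3600.0, timeout_seconds=3600.0))  # True
```

Timestamps are passed in explicitly so the logic is easy to test; a real proxy would read the clock itself.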
A developer could simply run kubectl get pods -n namespace as soon as they sit down at their laptop, and within a few seconds, they’d have their dev environment back up. Since nothing has been changed in their namespace besides the number of replicas running, they’ll quickly be back where they left off, without having wasted resources while they were away.
Sleep mode works with both the classic Kubernetes namespaces and virtual clusters.
Enabling Sleep Mode
Let’s look at an example of enabling sleep mode for a user’s namespace. If you’d like to follow along and you’re not currently using Loft, you can run through the first two steps of our quickstart to get Loft installed and running in your cluster.
Add a user by clicking on the Users icon in the left navigation bar. Fill in the relevant information and hit the Create button.
Next, click on Clusters in the left navigation bar and then loft-cluster. Then click on the Accounts tab at the top of the screen.
Next, click on the user that you added. This is their Loft account in the Kubernetes cluster. The settings for sleep mode are under the Space Creation Settings. A space is a virtual representation of a self-service namespace that has additional functionality (like sleep mode).
Here you can adjust the number of minutes before sleep mode kicks in for this user’s spaces. You can also set a time to delete inactive spaces. Auto-delete is a great way to make sure that unused resources inside your clusters get cleaned up automatically.
You could use sleep mode and auto-delete in conjunction, like setting the user’s spaces to sleep after 60 minutes of inactivity and to be deleted after one month of inactivity. Or whatever combination of values works best for your users’ workflows. You can also define separate sleep mode and auto-delete timeout values for each individual namespace if needed.
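The way the two timeouts interact can be sketched in a few lines of Python. The function `space_action` is hypothetical, and “one month” is approximated as 30 days; the defaults mirror the 60-minute and one-month example above, not any Loft default.

```python
from datetime import timedelta
from typing import Optional


def space_action(
    inactive_for: timedelta,
    sleep_after: Optional[timedelta] = timedelta(minutes=60),
    delete_after: Optional[timedelta] = timedelta(days=30),
) -> str:
    """Decide what should happen to a space given how long it has been idle.

    Pass None for a timeout to disable that behavior.
    """
    # Deletion is checked first because it is the longer, stronger timeout.
    if delete_after is not None and inactive_for >= delete_after:
        return "delete"
    if sleep_after is not None and inactive_for >= sleep_after:
        return "sleep"
    return "keep running"


print(space_action(timedelta(minutes=15)))  # keep running
print(space_action(timedelta(hours=3)))     # sleep
print(space_action(timedelta(days=45)))     # delete
```

A space that is idle for three hours is put to sleep, and only after it stays untouched for the full deletion window is it removed.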
And if you’re using virtual clusters, the process is the same. Each virtual cluster in Loft has a corresponding Loft space with the same name. You simply apply the sleep mode or auto-delete settings you want for the virtual cluster to either that space or the account that owns it.
You probably don’t want to edit every individual account to enable sleep mode. There are more options, like creating an account template that will enable sleep mode for all accounts created, or adding an annotation to accounts if you create them with YAML and kubectl. This is particularly useful if you use single sign-on for Loft and you want to auto-configure sleep mode and auto-delete for certain Active Directory or Okta user groups, for example.
There’s a lot of flexibility with how you can handle sleep mode, and there are more details in the docs.
Manually Triggering Sleep Mode
Users in your Loft-managed clusters can also trigger sleep mode manually. Just click on Spaces in the left menu, and click the sleep icon above the space.
If the user prefers the command line, they can also run:
loft sleep [SPACE_NAME]
Manually waking up spaces is just as easy. You can wake them up from the Spaces section of the web UI.
As we mentioned earlier, spaces will wake up automatically when they receive API requests, so you could also just run any kubectl command that touches that space, like:
kubectl get pods -n [SPACE_NAME]
Or use the Loft CLI to explicitly wake up a namespace:
loft wakeup [SPACE_NAME]
Conclusion
As you can see, there are a lot of possibilities with sleep mode and auto-delete. Sleep mode can suspend workloads that aren’t being used, and auto-delete can automatically clean up idle Kubernetes namespaces and virtual clusters. These features can help you eliminate waste in your infrastructure and reduce some of the headaches that come with managing a large number of tenants operating in shared Kubernetes clusters.
At Loft Labs, we believe that self-service Kubernetes is essential, both for developers who don’t want to wait for infrastructure to be provisioned and for platform engineers who have huge backlogs and would rather not be responsible for creating every new namespace. With Loft, developers can provision namespaces and virtual clusters when needed, and platform engineers can ensure guardrails are in place to reduce waste. In the end, we all just want to do our jobs the best way we can, and self-service infrastructure is a critical part of making that happen.