A while ago, I wrote about using IAM Roles for ServiceAccounts on kOps.
In short, this feature lets you define an AWS IAM Policy for a given ServiceAccount, and kOps will create the respective AWS IAM Role,
assign the policy and establish a trust relationship allowing the ServiceAccount to assume the IAM Role.
Challenge of configuring workloads
While kOps elegantly handles what happens on the AWS side, we had not implemented anything that configures Pods to actually make
use of the IAM Role. Indeed, some of the more frequently asked support questions
in the kOps Slack channels have been around how to configure applications to assume roles.
The kOps documentation
recommended directly adding the volumes and environment variables to the Pod spec,
but it was not obvious exactly what needed to be added, and you had to manually fetch the actual role ARN that kOps creates from the AWS API or console.
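For reference, the manual approach amounts to wiring a projected ServiceAccount token into the Pod and setting the environment variables the AWS SDKs look for. A rough sketch (the role ARN, paths, and names below are illustrative, not the exact values kOps generates):

```yaml
spec:
  containers:
  - name: my-app
    image: my-app:latest
    env:
    # The role to assume and the token file the AWS SDKs read.
    - name: AWS_ROLE_ARN
      value: arn:aws:iam::<account>:role/my-serviceaccount.default.sa.<cluster>
    - name: AWS_WEB_IDENTITY_TOKEN_FILE
      value: /var/run/secrets/amazonaws.com/token
    volumeMounts:
    - name: token-amazonaws-com
      mountPath: /var/run/secrets/amazonaws.com/
      readOnly: true
  volumes:
  # Projected ServiceAccount token scoped to the AWS audience.
  - name: token-amazonaws-com
    projected:
      sources:
      - serviceAccountToken:
          audience: amazonaws.com
          expirationSeconds: 86400
          path: token
```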
The pod identity webhook
On EKS, the pod identity webhook is commonly used as the mechanism for adding the necessary parts of the Pod spec.
This webhook looks for ServiceAccounts with a specific set of annotations telling it what ARN it can assume and various other settings. When a Pod is created that uses one of
these ServiceAccounts, the webhook mutates the Pod using information found in the ServiceAccount annotations.
Configuring these annotations is a lot simpler than directly configuring the Pod spec.
Typically, EKS-specific tooling "owns" the ServiceAccount, which makes linking the role/ServiceAccount pair simpler, but also means that
ServiceAccounts cannot be managed together with the application using them.
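On EKS, for example, the link is typically expressed with a single annotation on the ServiceAccount (the role ARN here is a placeholder):

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: my-serviceaccount
  namespace: default
  annotations:
    # Tells the webhook which IAM role Pods using this ServiceAccount should assume.
    eks.amazonaws.com/role-arn: arn:aws:iam::<account>:role/my-role
```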
For various reasons, installing the webhook on kOps was not that straightforward. For example, one could not tell the webhook to use mounted TLS secrets; it could only use the CSR API.
And even when the webhook was installed, you had to manually annotate ServiceAccounts with the role ARN that the Pods should try to assume.
kOps could have "owned" the ServiceAccounts configured in the Cluster spec as well, but I feel the ownership of ServiceAccounts should be with the application and not the cluster.
Webhook the kOps way
As mentioned towards the end of my previous article,
because kOps already knows the mapping between ServiceAccounts and IAM roles, there shouldn't be any need for
users to copy the ARN from AWS into the ServiceAccount annotation. Something should be able to just read the mapping in the Cluster spec
and configure workloads accordingly.
I wrote that this could be a webhook similar to the pod identity webhook. But why not implement it as a feature of the pod identity webhook itself?
The EKS team was very open to the idea, and a PR later, the webhook can be configured to look for additional Pods to mutate.
After this PR, the webhook will:
- First look for annotations on the ServiceAccount as before.
- If no annotations are found on the ServiceAccount, the webhook will look for a mapping configured in the pod-identity-webhook ConfigMap.
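For illustration, the mapping is a JSON document under the `config` key, keyed by `namespace/serviceaccount`. A sketch of a populated ConfigMap (the name, namespace, and values mirror the kOps-generated output shown later in this post):

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: pod-identity-webhook
  namespace: kube-system
data:
  config: |
    {
      "default/pod-identity-webhook-test": {
        "RoleARN": "arn:aws:iam::<account>:role/pod-identity-webhook-test.default.sa.<cluster>",
        "Audience": "amazonaws.com",
        "UseRegionalSTS": true,
        "TokenExpiration": 0
      }
    }
```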
Using the pod identity webhook addon
As of kOps 1.23, kOps supports the webhook as a managed addon. When installed, kOps will populate the webhook ConfigMap based on the `spec.iam.serviceAccountExternalPermissions` struct.
Installing
Before continuing, make sure you already have a kOps 1.23 cluster with an AWS OIDC provider enabled.
See my previous article on how to go about that.
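If you are unsure, the relevant part of the cluster spec looks roughly like this (the bucket is a placeholder; see the linked article for the full walkthrough):

```yaml
spec:
  serviceAccountIssuerDiscovery:
    # Publicly readable location for the OIDC discovery documents.
    discoveryStore: s3://<publicly-readable-bucket>
    enableAWSOIDCProvider: true
```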
Once your cluster is running 1.23, you can enable the webhook by adding the following to your cluster spec:
```yaml
spec:
  certManager:
    enabled: true
  podIdentityWebhook:
    enabled: true
```
The cert manager addon is required to establish the trust between the webhook and the API server.
Now run `kops update cluster --yes` and wait a minute or so for the control plane to deploy the addon(s).
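You can check that the webhook came up with something like the following (the label matches the one used for the logs command later in this post):

```bash
kubectl get pods -n kube-system -l app=pod-identity-webhook
```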
Adding a ServiceAccount mapping
Start by granting a set of AWS privileges to a ServiceAccount:
```yaml
spec:
  iam:
    serviceAccountExternalPermissions:
    - aws:
        policyARNs:
        - arn:aws:iam::aws:policy/AmazonEC2ReadOnlyAccess
      name: pod-identity-webhook-test
      namespace: default
```
Running `kops update cluster`, you will see something like the following:
```
IAMRole/pod-identity-webhook-test.default.sa.<cluster>
  Tags                  {Name: pod-identity-webhook-test.default.sa.<cluster>, KubernetesCluster: <cluster>, kubernetes.io/cluster/<cluster>: owned}
  ExportWithID          default-pod-identity-webhook-test

IAMRolePolicy/external-pod-identity-webhook-test.default.sa.<cluster>
  Role                  name:pod-identity-webhook-test.default.sa.<cluster>
  ExternalPolicies      [arn:aws:iam::aws:policy/AmazonEC2ReadOnlyAccess]
  Managed               true

...

+ config: '{"default/pod-identity-webhook-test":{"RoleARN":"arn:aws:iam::<account>:role/pod-identity-webhook-test.default.sa.<cluster>","Audience":"amazonaws.com","UseRegionalSTS":true,"TokenExpiration":0}}'
- config: '{}'
```
kOps wants to create an IAM role for the ServiceAccount and assign it the `AmazonEC2ReadOnlyAccess` policy. You can also see that it populates the mapping information into the `pod-identity-webhook` ConfigMap.
Run `kops update cluster --yes` to apply the changes. Then run `kubectl logs -n kube-system -l app=pod-identity-webhook -f` and observe the webhook picking up the mapping:
```
I0319 07:10:28.312786       1 cache.go:186] Adding SA default/pod-identity-webhook-test to CM cache: &{RoleARN:arn:aws:iam::<account>:role/pod-identity-webhook-test.default.sa.<cluster> Audience:amazonaws.com UseRegionalSTS:true TokenExpiration:86400}
```
Deploying the workload
Once the mapping is in place, we can deploy the ServiceAccount and a Pod using that ServiceAccount. It's important to remember that the webhook will only mutate Pods on creation, so it must be aware of the mapping before the Pod is created.
Deploy the following to the cluster:
```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: pod-identity-webhook-test
  namespace: default
---
apiVersion: v1
kind: Pod
metadata:
  name: pod-identity-webhook-test
  namespace: default
spec:
  containers:
  - name: aws-cli
    image: amazon/aws-cli:latest
    command:
    - sleep
    - "300"
  serviceAccountName: "pod-identity-webhook-test"
```
You should now see the following in the webhook logs:
```
I0319 07:39:33.373273       1 cache.go:80] Fetching sa default/pod-identity-webhook-test from cache
I0319 07:39:33.373346       1 handler.go:423] Pod was mutated. Pod=pod-identity-webhook-test, ServiceAccount=pod-identity-webhook-test, Namespace=default
I0319 07:39:33.373522       1 middleware.go:132] path=/mutate method=POST status=200 user_agent=kube-apiserver-admission body_bytes=1441
```
Running `kubectl get pod pod-identity-webhook-test -o yaml`, you should see that the Pod has been mutated and now contains the expected volumes and environment variables.
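For example, a quick way to pull out just the injected role ARN (the `AWS_ROLE_ARN` variable name follows the IRSA convention):

```bash
kubectl get pod pod-identity-webhook-test -n default \
  -o jsonpath='{.spec.containers[0].env[?(@.name=="AWS_ROLE_ARN")].value}'
```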
Testing that it works
To confirm everything is good, you can run the following:
```
$ kubectl exec -it -n default pod-identity-webhook-test -- aws sts get-caller-identity
{
    "UserId": "AROAV6PNU2XQTMAZ64FBK:botocore-session-1647675906",
    "Account": "<account>",
    "Arn": "arn:aws:sts::<account>:assumed-role/pod-identity-webhook-test.default.sa.<cluster>/botocore-session-1647675906"
}
```
You can also check that the Pod is allowed to use the granted privileges by running something like the following:
```bash
kubectl exec -it -n default pod-identity-webhook-test -- aws ec2 describe-instances --region eu-central-1
```
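Conversely, since the role only carries `AmazonEC2ReadOnlyAccess`, a mutating call should fail with an `UnauthorizedOperation` error, confirming the Pod holds exactly the privileges you granted (the instance ID below is a placeholder):

```bash
kubectl exec -it -n default pod-identity-webhook-test -- \
  aws ec2 stop-instances --instance-ids i-0123456789abcdef0 --region eu-central-1
```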
Conclusion
Hopefully this makes using IRSA on kOps-based clusters much simpler, and I hope this post explains how things work under the hood.
As always, I appreciate feedback on this feature and whether it is useful for you.