Kubernetes is a platform for building platforms. It's a better place to start; not the endgame.
- Kelsey Hightower (@kelseyhightower)
What is EKS?
Amazon Elastic Kubernetes Service (EKS) is a managed Kubernetes service running on AWS. It takes away the bulk of the pain of managing a Kubernetes cluster by running the master tier for you. As with all AWS services, security follows the Shared Responsibility Model: Amazon ensures the security of the master tier, but what you run inside the cluster is up to you.
What is Pod Security Policy?
In Kubernetes, workloads are deployed as Pods, which expose much of the functionality of the underlying containers.
Pod Security Policies are cluster-wide resources that control security-sensitive aspects of the pod specification. A PSP object defines a set of conditions that a pod must satisfy in order to be accepted into the system, as well as defaults for its related fields. Pod security policy control is implemented as an optional admission controller, and on EKS it is enabled by default; policies can be created even when the admission plugin is not enabled, but they are only enforced when it is. The PSP admission controller functions as a validating and a mutating controller simultaneously.
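On EKS you can confirm that the admission controller is active by listing the policies that already exist on the cluster:
kubectl get psp
# output (on a default EKS cluster)
NAME             PRIV   CAPS   SELINUX    RUNASUSER   FSGROUP    SUPGROUP   READONLYROOTFS   VOLUMES
eks.privileged   true   *      RunAsAny   RunAsAny    RunAsAny   RunAsAny   false            *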
Pod Security Policies allow you to control:
- The running of privileged containers
- Usage of host namespaces
- Usage of host networking and ports
- Usage of volume types
- Usage of the host filesystem
- A whitelist of FlexVolume drivers
- The allocation of an FSGroup that owns the pod’s volumes
- Requirements for use of a read-only root filesystem
- The user and group IDs of the container
- Restricting escalation to root privileges
- Linux capabilities, SELinux context, AppArmor, seccomp and sysctl profiles
Let's approach this subject from an application standpoint. Most applications are deployed into EKS as Deployments that run pods. Our sample deployment is shown below.
Assuming we have a green-field EKS cluster with no special security controls on the cluster or its namespaces:
In the manifest alpine-restricted.yml, we define a few security contexts at the pod and container level.
- runAsUser: 1000 means all containers in the pod will run as UID 1000
- fsGroup: 2000 means mounted volumes, and any files created in those volumes, will be owned by GID 2000
- allowPrivilegeEscalation: false means processes in the container cannot gain more privileges than their parent
- readOnlyRootFilesystem: true means the container's root filesystem is mounted read-only
To make these restrictions take effect, we first need to delete a PSP. By default, pods are in fact already validated against a PSP; it's just that the default policy allows everything and is accessible to everyone. It's called eks.privileged.
Note that by doing this you can bork your cluster, so make sure you're ready to recreate it.
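Before deleting the policy, it's worth saving a copy so you can restore it later if needed:
kubectl get psp eks.privileged -o yaml > eks-privileged-backup.yml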
kubectl delete psp eks.privileged
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: alpine-restricted
  labels:
    app: alpine-restricted
spec:
  replicas: 1
  selector:
    matchLabels:
      app: alpine-restricted
  template:
    metadata:
      labels:
        app: alpine-restricted
    spec:
      securityContext:
        runAsUser: 1000
        fsGroup: 2000
      volumes:
        - name: sec-ctx-vol
          emptyDir: {}
      containers:
        - name: alpine-restricted
          image: alpine:3.9
          command: ["sleep", "3600"]
          volumeMounts:
            - name: sec-ctx-vol
              mountPath: /data/demo
          securityContext:
            allowPrivilegeEscalation: false
            readOnlyRootFilesystem: true
Upon deployment, the pods will fail to launch with the following error:
> kubectl apply -f alpine-restricted.yml
> kubectl get deploy,rs,pod
# output
NAME                                READY   UP-TO-DATE   AVAILABLE   AGE
deployment.extensions/alpine-test   0/1     0            0           67s
NAME                                          DESIRED   CURRENT   READY   AGE
replicaset.extensions/alpine-test-85c976cdd   1         0         0       67s
> kubectl describe replicaset.extensions/alpine-test-85c976cdd | tail -n3
# output
Type     Reason        Age                    From                   Message
----     ------        ---                    ----                   -------
Warning  FailedCreate  114s (x16 over 4m38s)  replicaset-controller  Error creating: pods "alpine-test-85c976cdd-" is forbidden: unable to validate against any pod security policy: []
Without a Pod Security Policy it is allowed to use, the ReplicaSet controller cannot create the pod. Let's deploy a restricted PSP and create a ClusterRole and ClusterRoleBinding which allow the controller service accounts (including replicaset-controller) to use the PSP defined below.
---
apiVersion: policy/v1beta1
kind: PodSecurityPolicy
metadata:
  name: psp.restricted
  annotations:
    seccomp.security.alpha.kubernetes.io/defaultProfileName: 'docker/default'
    seccomp.security.alpha.kubernetes.io/allowedProfileNames: 'docker/default'
spec:
  readOnlyRootFilesystem: true
  privileged: false
  allowPrivilegeEscalation: false
  runAsUser:
    rule: 'MustRunAsNonRoot'
  supplementalGroups:
    rule: 'MustRunAs'
    ranges:
      - min: 1
        max: 65535
  fsGroup:
    rule: 'MustRunAs'
    ranges:
      - min: 1
        max: 65535
  seLinux:
    rule: 'RunAsAny'
  volumes:
    - configMap
    - emptyDir
    - secret
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: psp:restricted
rules:
  - apiGroups:
      - policy
    resourceNames:
      - psp.restricted
    resources:
      - podsecuritypolicies
    verbs:
      - use
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: psp:restricted:binding
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: psp:restricted
subjects:
  # Apply the PSP to the service accounts of the controllers in kube-system that bring up pods.
  # DaemonSets are excluded since they are used to interact with the host.
  - kind: ServiceAccount
    namespace: kube-system
    name: replication-controller
  - kind: ServiceAccount
    namespace: kube-system
    name: replicaset-controller
  - kind: ServiceAccount
    namespace: kube-system
    name: job-controller
The psp.restricted PSP places several restrictions on pods:
- the root filesystem must be read-only (readOnlyRootFilesystem: true)
- privileged containers are not allowed (privileged: false)
- privilege escalation is forbidden (allowPrivilegeEscalation: false)
- the user must be non-root (runAsUser: rule: 'MustRunAsNonRoot')
- fsGroup and supplementalGroups cannot be root; they must fall within the defined range of 1-65535
- the pod can only use the volume types specified: configMap, emptyDir and secret
- the annotations define a default seccomp profile for containers; because of these annotations, the PSP will mutate the pod spec before admitting it, and if we check the pod manifest in Kubernetes we will see the seccomp annotations defined there as well
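Apply the policy and the RBAC objects (assuming the manifests above are saved as psp-restricted.yml):
kubectl apply -f psp-restricted.yml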
Re-deploy alpine-restricted.yml and inspect the pod annotations:
kubectl get pod/alpine-test-85c976cdd-k69cn -o jsonpath='{.metadata.annotations}'
# output
map[kubernetes.io/psp:psp.restricted seccomp.security.alpha.kubernetes.io/pod:docker/default]
We can see that there are two annotations set:
- kubernetes.io/psp:psp.restricted
- seccomp.security.alpha.kubernetes.io/pod:docker/default
The first is the PodSecurityPolicy (psp.restricted) used by the pod.
The second is the seccomp profile used by the pod. Seccomp (secure computing mode) is a Linux kernel feature used to restrict the system calls available to processes inside a container.
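We can also verify the policy's effect from inside the running pod (the pod name comes from the output above; yours will differ, and the exact output may vary):
# the container runs as UID 1000, as required by MustRunAsNonRoot
kubectl exec alpine-test-85c976cdd-k69cn -- id
# uid=1000 gid=0(root) groups=2000
# a seccomp filter is applied to PID 1 (mode 2 = filter)
kubectl exec alpine-test-85c976cdd-k69cn -- grep Seccomp /proc/1/status
# Seccomp: 2
# the root filesystem is read-only
kubectl exec alpine-test-85c976cdd-k69cn -- touch /test
# touch: /test: Read-only file system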
Let's edit our deployment a bit and deploy a more privileged version, alpine-privileged.yml:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: alpine-privileged
  namespace: privileged
  labels:
    app: alpine-privileged
spec:
  replicas: 1
  selector:
    matchLabels:
      app: alpine-privileged
  template:
    metadata:
      labels:
        app: alpine-privileged
    spec:
      securityContext:
        runAsUser: 1000
        fsGroup: 2000
      volumes:
        - name: sec-ctx-vol
          emptyDir: {}
      containers:
        - name: alpine-privileged
          image: alpine:3.9
          command: ["sleep", "1800"]
          volumeMounts:
            - name: sec-ctx-vol
              mountPath: /data/demo
          securityContext:
            allowPrivilegeEscalation: true
            readOnlyRootFilesystem: false
This deployment wants to start a pod with write access to the root filesystem and with privilege escalation enabled.
kubectl describe replicaset.extensions/alpine-privileged-7bdb64b569 -n privileged | tail -n3
# output
Type     Reason        Age                   From                   Message
----     ------        ---                   ----                   -------
Warning  FailedCreate  29s (x17 over 5m56s)  replicaset-controller  Error creating: pods "alpine-privileged-7bdb64b569-" is forbidden: unable to validate against any pod security policy: [spec.containers[0].securityContext.readOnlyRootFilesystem: Invalid value: false: ReadOnlyRootFilesystem must be set to true spec.containers[0].securityContext.allowPrivilegeEscalation: Invalid value: true: Allowing privilege escalation for containers is not allowed]
It's a long error string, but it basically states the following:
The pod is created with the replicaset-controller service account, which is allowed to use the psp.restricted policy. Pod creation failed because the pod spec contains allowPrivilegeEscalation: true and readOnlyRootFilesystem: false in its security contexts, and those are invalid values as far as psp.restricted is concerned.
To deploy this workload, let's run it under a service account that is authorized to use a more permissive PSP. Make the following modification to the deployment manifest so that it uses the privileged-sa service account:
...
  template:
    metadata:
      labels:
        app: alpine-privileged
    spec:
      serviceAccountName: privileged-sa
      securityContext:
        runAsUser: 1
        fsGroup: 1
...
Then deploy a psp.privileged policy and create a Role and a RoleBinding that allow privileged-sa to use it.
Create a separate namespace and a service account in it:
kubectl create namespace privileged
kubectl create sa privileged-sa -n privileged
Next, create the PSP and the Role and RoleBinding to go with it:
apiVersion: policy/v1beta1
kind: PodSecurityPolicy
metadata:
  creationTimestamp: null
  name: psp.privileged
spec:
  readOnlyRootFilesystem: false
  privileged: true
  allowPrivilegeEscalation: true
  runAsUser:
    rule: 'RunAsAny'
  supplementalGroups:
    rule: 'RunAsAny'
  fsGroup:
    rule: 'RunAsAny'
  seLinux:
    rule: 'RunAsAny'
  volumes:
    - configMap
    - emptyDir
    - secret
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: psp:privileged
  namespace: privileged
rules:
  - apiGroups:
      - policy
    resourceNames:
      - psp.privileged
    resources:
      - podsecuritypolicies
    verbs:
      - use
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: psp:privileged:binding
  namespace: privileged
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: psp:privileged
subjects:
  - kind: ServiceAccount
    name: privileged-sa
    namespace: privileged
The psp.privileged policy allows readOnlyRootFilesystem: false and allowPrivilegeEscalation: true. The privileged-sa service account in the privileged namespace is allowed to use the psp.privileged policy, so if we deploy the modified alpine-privileged.yml, the pod should start.
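Before deploying, we can sanity-check that the service account is actually allowed to use the policy:
kubectl auth can-i use podsecuritypolicy/psp.privileged -n privileged --as=system:serviceaccount:privileged:privileged-sa
# output
yes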
Deploy the pod and inspect the pod annotation:
kubectl get pod/alpine-privileged-667dc5c859-q7mf6 -n privileged -o jsonpath='{.metadata.annotations}'
# output
map[kubernetes.io/psp:psp.privileged]
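Unlike the restricted pod, this one can write to its root filesystem (again, the pod name is taken from the output above):
kubectl exec -n privileged alpine-privileged-667dc5c859-q7mf6 -- sh -c 'touch /test && echo "root fs is writable"'
# output
root fs is writable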
What if we have multiple PSPs?
When multiple policies are available, the pod security policy admission controller selects a policy in the following order:
- Policies that successfully validate the pod without altering it are preferred.
- If it is a pod creation request, the first valid policy in alphabetical order is used.
- Otherwise, on a pod update request, an error is returned, because pod mutations are disallowed during update operations.
We'll create two Pod Security Policies, along with the RBAC objects needed to use them:
apiVersion: policy/v1beta1
kind: PodSecurityPolicy
metadata:
  name: psp.1
  annotations:
    seccomp.security.alpha.kubernetes.io/defaultProfileName: 'docker/default'
    seccomp.security.alpha.kubernetes.io/allowedProfileNames: 'docker/default'
spec:
  readOnlyRootFilesystem: false
  privileged: false
  allowPrivilegeEscalation: true
  runAsUser:
    rule: 'RunAsAny'
  supplementalGroups:
    rule: 'RunAsAny'
  fsGroup:
    rule: 'RunAsAny'
  seLinux:
    rule: 'RunAsAny'
  volumes:
    - configMap
    - emptyDir
    - secret
---
apiVersion: policy/v1beta1
kind: PodSecurityPolicy
metadata:
  name: psp.2
  annotations:
    seccomp.security.alpha.kubernetes.io/defaultProfileName: 'docker/default'
    seccomp.security.alpha.kubernetes.io/allowedProfileNames: 'docker/default'
spec:
  readOnlyRootFilesystem: false
  privileged: false
  allowPrivilegeEscalation: false
  runAsUser:
    rule: 'MustRunAsNonRoot'
  supplementalGroups:
    rule: 'RunAsAny'
  fsGroup:
    rule: 'RunAsAny'
  seLinux:
    rule: 'RunAsAny'
  volumes:
    - configMap
    - emptyDir
    - secret
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: psp:multy
  namespace: multy
rules:
  - apiGroups:
      - policy
    resourceNames:
      - psp.1
      - psp.2
    resources:
      - podsecuritypolicies
    verbs:
      - use
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: psp:multy:binding
  namespace: multy
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: psp:multy
subjects:
  - kind: ServiceAccount
    name: replicaset-controller
    namespace: kube-system
There are two almost identical PSPs in psp-multiple.yaml. The Role and RoleBinding in the multy namespace allow the replicaset-controller to use both psp.1 and psp.2.
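Create the multy namespace and apply the file:
kubectl create namespace multy
kubectl apply -f psp-multiple.yaml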
kubectl get psp
# output
NAME    PRIV    CAPS   SELINUX    RUNASUSER          FSGROUP    SUPGROUP   READONLYROOTFS   VOLUMES
...
psp.1   false          RunAsAny   RunAsAny           RunAsAny   RunAsAny   false            configMap,emptyDir,secret
psp.2   false          RunAsAny   MustRunAsNonRoot   RunAsAny   RunAsAny   false            configMap,emptyDir,secret
...
Create the deployment:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: alpine-multy
  namespace: multy
  labels:
    app: alpine-multy
spec:
  replicas: 1
  selector:
    matchLabels:
      app: alpine-multy
  template:
    metadata:
      labels:
        app: alpine-multy
    spec:
      securityContext:
        runAsUser: 1000
        fsGroup: 2000
      containers:
        - name: alpine-multy
          image: alpine:3.9
          command: ["sleep", "2400"]
          securityContext:
            allowPrivilegeEscalation: false
            readOnlyRootFilesystem: false
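Apply it (assuming the manifest is saved as alpine-multy.yml):
kubectl apply -f alpine-multy.yml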
Checking the annotation again:
kubectl get pod alpine-multy-5864df9bc8-mgg68 -n multy -o jsonpath='{.metadata.annotations}'
# output
map[kubernetes.io/psp:psp.1 seccomp.security.alpha.kubernetes.io/pod:docker/default]
As you can see, this pod uses the psp.1 PodSecurityPolicy. Both policies mutate the pod (via the seccomp annotations), so neither is preferred under the first rule, and psp.1 wins as the first valid policy in alphabetical order, the second rule in PSP selection.
Let's modify psp.2 by deleting the seccomp annotations:
apiVersion: policy/v1beta1
kind: PodSecurityPolicy
metadata:
  name: psp.2
spec:
  readOnlyRootFilesystem: false
  # ...the rest of the spec is unchanged; only the seccomp annotations are removed
Delete and redeploy the pod after modifying psp.2; PSPs are only evaluated when a pod is created, so a running pod will not pick up the change.
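A minimal way to do that (the ReplicaSet recreates the deleted pod under the updated policy):
kubectl apply -f psp-multiple.yaml
kubectl delete pod -n multy -l app=alpine-multy
Then check the annotations again: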
kubectl get pod alpine-multy-5864df9bc8-lj2fz -n multy -o jsonpath='{.metadata.annotations}'
# output
map[kubernetes.io/psp:psp.2]
Now the pod uses psp.2: with the annotations deleted, the policy validates the pod without altering it, so it is preferred under the first rule of PSP selection.
Conclusion
The security model around nodes is well thought out: you cannot escalate to cluster admin just by compromising a node. However, without PSPs, anything already running on that node is fair game. Configuring appropriate PSPs closes off these avenues.