Description
What happened?
Even when PSP or another policy engine enforces the values of PodSecurityContext.SupplementalGroups, a user can bypass the setting by building a custom container image, as shown below.
Assume the below PSP is enforced:
kind: PodSecurityPolicy
...
spec:
  # this only allows uid=1000, gid=1000, supplementalGroups=[60000]
  runAsUser:
    ranges: [{ min: 1000, max: 1000 }]
    rule: MustRunAs
  runAsGroup:
    ranges: [{ min: 1000, max: 1000 }]
    rule: MustRunAs
  supplementalGroups:
    ranges: [{ min: 60000, max: 60000 }]
    rule: MustRunAs
A user can then create a pod with a custom image that the PSP admits, yet still bypass the SupplementalGroups setting through the group membership baked into that image.
# Dockerfile
# In the image, uid=1000 (alice) belongs to gid=50000 (bypassed-group)
FROM ubuntu:22.04
RUN groupadd -g 50000 bypassed-group \
    && useradd -m -u 1000 alice \
    && gpasswd -a alice bypassed-group
# Pod
kind: Pod
metadata:
  name: bypass-supplementalgroups-pod
spec:
  # the securityContext satisfies the above PSP
  securityContext:
    runAsUser: 1000
    runAsGroup: 1000
    supplementalGroups: [60000]
  containers:
  # However, because uid=1000 belongs to gid=50000 in the image,
  # the process's identity in the container has groups=50000,60000
  # Note that 50000 is bypassed by container image
  # even though PSP enforces supplementalGroups=60000
  - image: built-above
    # the command outputs: 'uid=1000(alice) gid=1000(alice) groups=1000(alice),50000(bypassed-group),60000'
    command: ["bash", "-c", "id; true"]
...
Why does this behavior matter?
In a multi-tenant Kubernetes cluster that uses hostPath volumes, this behavior can be very confusing for cluster administrators. hostPath volumes are access-controlled by uid/gid in the usual Linux manner. In such clusters, I think it is common practice for the cluster administrator to use PSP or a policy engine to restrict the values of the PodSecurityContext.{runAsUser, runAsGroup, SupplementalGroups} fields in order to protect private directories in hostPath volumes. However, a custom container image can easily bypass the setting and gain access to a private directory in the hostPath volume. Administrators might reasonably ask, "What can the supplementalGroups policy in PSP protect?"
See here for a more detailed and practical impact.
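To make the impact concrete, here is a minimal Python sketch of the classic Unix owner/group/other permission check. It is a simplified model, not kernel code: it ignores root, capabilities, and ACLs. The directory owner, group, and mode (`root:50000`, `0770`) are hypothetical values chosen for illustration.

```python
def can_access(uid, gid, groups, owner_uid, owner_gid, mode, want):
    """Simplified Unix permission check (no root, ACLs, or capabilities).

    `want` is a 3-bit mask: 4 = read, 2 = write, 1 = execute/search.
    """
    if uid == owner_uid:
        bits = (mode >> 6) & 0o7          # owner permission bits
    elif gid == owner_gid or owner_gid in groups:
        bits = (mode >> 3) & 0o7          # group permission bits
    else:
        bits = mode & 0o7                 # "other" permission bits
    return (bits & want) == want


# Hypothetical private directory on the hostPath volume: root:50000, mode 0770.
OWNER_UID, OWNER_GID, MODE = 0, 50000, 0o770

# Identity the administrator intended: only the policy-approved group 60000.
intended = can_access(1000, 1000, {60000}, OWNER_UID, OWNER_GID, MODE, 0o5)
# Identity the container actually gets once the image adds gid 50000.
actual = can_access(1000, 1000, {50000, 60000}, OWNER_UID, OWNER_GID, MODE, 0o5)

print(intended, actual)  # False True
```

Under these assumptions, the only difference between being denied and being allowed to list the directory is the extra gid that comes from the image.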
What did you expect to happen?
bypass-supplementalgroups-pod should have only the group identities defined in PodSecurityContext.SupplementalGroups. That is, the pod's log is expected to be:
uid=1000(alice) gid=1000(alice) groups=1000(alice),60000
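A quick way to check a pod's log for this condition is to compare the `groups=` field of the `id` output with what the pod spec requested. The following is a rough sketch only: the helper names are made up for this example, and it assumes group names contain no spaces.

```python
def parse_id_groups(id_output):
    """Return the numeric gids listed in the `groups=` field of `id` output."""
    for field in id_output.split():
        if field.startswith("groups="):
            entries = field[len("groups="):].split(",")
            # Each entry looks like "50000(bypassed-group)" or just "60000".
            return {int(entry.split("(", 1)[0]) for entry in entries}
    return set()


def unexpected_groups(id_output, run_as_group, supplemental_groups):
    """Return gids in the process identity that the pod spec did not request."""
    allowed = {run_as_group, *supplemental_groups}
    return parse_id_groups(id_output) - allowed


observed = "uid=1000(alice) gid=1000(alice) groups=1000(alice),50000(bypassed-group),60000"
expected = "uid=1000(alice) gid=1000(alice) groups=1000(alice),60000"

print(sorted(unexpected_groups(observed, 1000, [60000])))  # [50000]
print(sorted(unexpected_groups(expected, 1000, [60000])))  # []
```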
How can we reproduce it (as minimally and precisely as possible)?
See https://github.com/pfnet-research/strict-supplementalgroups-container-runtime/tree/reproduce-bypass-supplementalgroups. You can reproduce this behavior with a single command, on both containerd and cri-o.
Anything else we need to know?
Is this a security issue??
We reported this behavior on hackerone.com (#1688374) following the Kubernetes Security and Disclosure Information. The Kubernetes Security Response Committee responded that this behavior "works as intended" and recommended discussing how to handle it in a public k/k issue.
However, I believe this behavior is not widely known. For clusters that use NFS (or similar filesystems) as hostPath volumes, which I think is common on supercomputers and in the HPC area, this behavior might have critical consequences, because many cluster administrators might assume that PSP (or another policy engine) provides enough protection for private directories in such hostPath volumes.
See here for a more detailed and practical impact.
cc/ @cjcullen (as a responder to our report)
Root Cause: ambiguous definition
In the Kubernetes API, Pod.SecurityContext.SupplementalGroups is defined as "A list of groups applied to the first process run in each container, in addition to the container's primary GID. If unspecified, no groups will be added to any container. Note that this field cannot be set when spec.os.name is windows."
I think this definition is somewhat ambiguous about the gids defined in the container image. In practice, the most popular CRI implementations (containerd and cri-o) create the OCI runtime spec (config.json) by merging the gids defined in the container image with supplementalGroups from the pod spec, like this:
{
  "ociVersion": "1.0.2-dev",
  "process": {
    "user": {
      "uid": 1000,
      "gid": 1000,
      # CRI merges group gids for uid=1000
      "additionalGids": [
        50000, # defined in container image (uid=1000 belongs to it)
        60000  # defined in securityContext.supplementalGroups
      ]
    },
    ...
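The merge can be modeled roughly as follows. This is an illustrative Python sketch and not the actual containerd or cri-o code: it assumes the runtime has already resolved uid=1000 to the user name `alice`, and the `/etc/group` content is what the Dockerfile above would be expected to produce.

```python
def image_gids_for_user(etc_group_text, username):
    """Collect gids of groups in /etc/group that list `username` as a member."""
    gids = set()
    for line in etc_group_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(":")
        if len(parts) < 4:
            continue  # skip malformed entries
        _name, _passwd, gid, members = parts[:4]
        if username in members.split(","):
            gids.add(int(gid))
    return gids


def merged_additional_gids(etc_group_text, username, supplemental_groups):
    """Model of the merge: image-defined gids plus pod-level supplementalGroups."""
    return sorted(image_gids_for_user(etc_group_text, username) | set(supplemental_groups))


# Assumed /etc/group content inside the image built from the Dockerfile above.
ETC_GROUP = """\
root:x:0:
alice:x:1000:
bypassed-group:x:50000:alice
"""

print(merged_additional_gids(ETC_GROUP, "alice", [60000]))  # [50000, 60000]
```

The primary group (gid 1000) is not part of this list because it comes from `runAsGroup`, not from group membership entries.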
How to resolve this issue?
added: as described in #112879 (comment), we will improve the API description first, and keep discussing new API in KEP.
First, I would like to discuss how to resolve this in the community. I propose several ways to resolve this.
- Step 1. Improve the description in the API spec and warn users about this behavior in a blog post or similar
  - Pros
    - Easy. Just documentation. No breaking changes.
    - This behavior can be mitigated by implementing a custom RuntimeClass which enforces it.
    - (I'm welcome to write an article for this and its mitigations)
  - Cons
    - This behavior will become official. It might break the "secure by design" principle.
- Step 2. Add a new API: PodSecurityContext.SupplementalGroupsPolicy={Merge(default), Strict} or similar (KEP-3619)
  - Pros
    - No breaking change.
    - Users can have an official API to select the behavior.
  - Cons
    - Hard. Large impact.
    - Needs KEP (I'm also welcome to write a KEP)
    - This affects all the CRI implementations
- Change an API behavior
  - Pros
    - None.
  - Cons
    - Breaking change.
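As a rough illustration of how the policy proposed in Step 2 might behave, here is a small Python sketch. The `Merge` and `Strict` values come from the proposal above; the function name and exact semantics are my assumptions, not a finalized API.

```python
def effective_supplemental_gids(policy, image_gids, supplemental_groups):
    """Hypothetical semantics of the proposed SupplementalGroupsPolicy."""
    if policy == "Merge":
        # Current behavior: image-defined gids are merged with the pod spec.
        return sorted(set(image_gids) | set(supplemental_groups))
    if policy == "Strict":
        # Proposed behavior: only gids from the pod spec are applied.
        return sorted(set(supplemental_groups))
    raise ValueError(f"unknown policy: {policy!r}")


print(effective_supplemental_gids("Merge", [50000], [60000]))   # [50000, 60000]
print(effective_supplemental_gids("Strict", [50000], [60000]))  # [60000]
```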
/area security
/sig security
/area api
/sig architecture
Tasks for Step 1
- [x] https://github.com/kubernetes/kubernetes/pull/113047
- [x] https://github.com/kubernetes-sigs/cri-tools/pull/1005
- [ ] https://github.com/kubernetes/enhancements/issues/3619
Tasks for Step 2
- [ ] https://github.com/kubernetes/enhancements/issues/3619
Kubernetes version
$ kubectl version
Client Version: v1.24.6
Kustomize Version: v4.5.4
Server Version: v1.25.2
Cloud provider
OS version
# On Linux:
$ cat /etc/os-release
PRETTY_NAME="Ubuntu 22.04.1 LTS"
NAME="Ubuntu"
VERSION_ID="22.04"
VERSION="22.04.1 LTS (Jammy Jellyfish)"
VERSION_CODENAME=jammy
ID=ubuntu
ID_LIKE=debian
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
UBUNTU_CODENAME=jammy
$ uname -a
Linux reproduce-bypass-supplementalgroups-control-plane 5.10.104-linuxkit #1 SMP PREEMPT Thu Mar 17 17:05:54 UTC 2022 aarch64 aarch64 aarch64 GNU/Linux
Install tools
Container runtime (CRI) and version (if applicable)
I tested this behavior both on containerd and cri-o.
crio version 1.25.0