Can bypass PodSecurityContext.SupplementalGroups with a custom container image even though PSP (or another policy engine) enforces the field #112879

Closed
@everpeace


What happened?

Even when PSP or another policy engine enforces the values of PodSecurityContext.SupplementalGroups, a user can bypass the setting by building a custom container image, as shown below.

Assume the below PSP is enforced:

kind: PodSecurityPolicy
...
spec:
  # this only allows uid=1000, gid=1000, supplementalGroups=[60000]
  runAsUser:
    ranges: { min: 1000, max: 1000 }
    rule: MustRunAs
  runAsGroup:
    ranges: { min: 1000, max: 1000 }
    rule: MustRunAs
  supplementalGroups:
    ranges: { min: 60000, max: 60000 }
    rule: MustRunAs

A user can then create a pod that the PSP admits but that still bypasses the SupplementalGroups restriction, by using a crafted container image.

# Dockerfile
# In the image, uid=1000 (alice) belongs to gid=50000 (bypassed-group)
FROM ubuntu:22.04
RUN groupadd -g 50000 bypassed-group \
    && useradd -m -u 1000 alice \
    && gpasswd -a alice bypassed-group

# Pod
kind: Pod
metadata:
  name: bypass-supplementalgroups-pod
spec:
  # the securityContext satisfies the above PSP
  securityContext:
    runAsUser: 1000
    runAsGroup: 1000
    supplementalGroups: [60000]
  containers:
  # However, because uid=1000 belongs to gid=50000 in the image,
  # the process in the container runs with groups=50000,60000.
  # Note that gid=50000 comes from the container image and slips past the policy,
  # even though the PSP enforces supplementalGroups=[60000].
  - image: built-above
    # the command outputs: 'uid=1000(alice) gid=1000(alice) groups=1000(alice),50000(bypassed-group),60000'
    command: ["bash", "-c", "id; true"]
...
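
The following is a minimal sketch of why `id` reports 50000 in the example above. It assumes the runtime resolves the user's group memberships from the image's /etc/passwd and /etc/group and unions them with supplementalGroups. The helper names and the sample file contents are hypothetical; the real CRI implementations are written in Go and differ in detail.

```python
from __future__ import annotations


def groups_from_image(etc_passwd: str, etc_group: str, uid: int) -> set[int]:
    """Return the gids of all groups in etc_group that list the user with `uid` as a member."""
    # Resolve the user name for the uid from passwd-format text.
    username = None
    for line in etc_passwd.splitlines():
        fields = line.split(":")
        if len(fields) >= 3 and fields[2] == str(uid):
            username = fields[0]
            break
    if username is None:
        return set()

    # Collect every group whose member list (4th field) contains the user.
    gids = set()
    for line in etc_group.splitlines():
        fields = line.split(":")
        if len(fields) >= 4:
            members = [m for m in fields[3].split(",") if m]
            if username in members:
                gids.add(int(fields[2]))
    return gids


def effective_groups(primary_gid: int, image_gids: set[int], supplemental_groups: list[int]) -> list[int]:
    """Union of the primary gid, image-defined gids, and pod-spec supplementalGroups."""
    return sorted({primary_gid} | image_gids | set(supplemental_groups))


# Hypothetical file contents approximating the image built from the Dockerfile above.
SAMPLE_PASSWD = "root:x:0:0:root:/root:/bin/bash\nalice:x:1000:1000::/home/alice:/bin/sh"
SAMPLE_GROUP = "root:x:0:\nalice:x:1000:\nbypassed-group:x:50000:alice"

image_gids = groups_from_image(SAMPLE_PASSWD, SAMPLE_GROUP, uid=1000)
print(effective_groups(1000, image_gids, [60000]))  # [1000, 50000, 60000]
```

The policy engine only ever sees the `[60000]` argument; the 50000 entry originates from data inside the image, which admission control does not inspect.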

Why does this behavior matter?

In a multi-tenant Kubernetes cluster using hostPath volumes, this behavior may be very confusing for cluster administrators.

Access to hostPath volumes is controlled by uid/gid in the usual Linux manner. In such clusters, I think it is common practice for the cluster administrator to use PSP or a policy engine to restrict the values of the PodSecurityContext.{runAsUser, runAsGroup, SupplementalGroups} fields in order to protect private directories in hostPath volumes. However, a custom container image can easily bypass the setting and gain access to a private directory in the hostPath volume. This confuses cluster administrators, who might ask, "What does the supplementalGroups policy in PSP actually protect?"

See here for a more detailed and practical impact.
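
The access-control concern can be illustrated with a simplified model of the Linux owner/group/other permission check. This is only a sketch: the function name `can_access`, the directory ownership (root:50000), and the mode 0770 are assumptions chosen for illustration, and the model ignores root privileges, capabilities, and ACLs.

```python
R, W, X = 4, 2, 1  # permission bits within one rwx triplet


def can_access(mode, owner_uid, owner_gid, uid, gid, groups, want):
    """Simplified DAC check: pick the owner, group, or other triplet and test the wanted bits."""
    if uid == owner_uid:
        bits = (mode >> 6) & 7
    elif gid == owner_gid or owner_gid in groups:
        bits = (mode >> 3) & 7
    else:
        bits = mode & 7
    return (bits & want) == want


# Hypothetical private directory on a hostPath volume: owned by root:50000, mode 0770.
DIR_MODE, DIR_UID, DIR_GID = 0o770, 0, 50000

# Groups intended by the policy: only 60000.
print(can_access(DIR_MODE, DIR_UID, DIR_GID, 1000, 1000, [60000], R | X))         # False
# Groups actually obtained with the crafted image: 50000 and 60000.
print(can_access(DIR_MODE, DIR_UID, DIR_GID, 1000, 1000, [50000, 60000], R | X))  # True
```

Under these assumptions, the extra membership in gid 50000 is the only difference between being denied and being able to list the directory.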

What did you expect to happen?

The processes in bypass-supplementalgroups-pod should have only the supplementary group identities defined in PodSecurityContext.SupplementalGroups. That is, the pod's log is expected to be:

uid=1000(alice) gid=1000(alice) groups=1000(alice),60000

How can we reproduce it (as minimally and precisely as possible)?

See https://github.com/pfnet-research/strict-supplementalgroups-container-runtime/tree/reproduce-bypass-supplementalgroups. You can reproduce this behavior with a single command, on both containerd and cri-o.

Anything else we need to know?

Is this a security issue??

We reported this behavior on hackerone.com (#1688374), following the Kubernetes Security and Disclosure Information. The Kubernetes Security Response Committee responded that this behavior "works as intended" and recommended discussing how to handle it in a public k/k issue.

However, I believe this behavior is not widely known. For clusters that use NFS (or similar filesystems) as hostPath volumes, which I think is common in supercomputing and HPC environments, this behavior might have critical consequences, because many cluster administrators may assume that PSP (or other policy engines) sufficiently protects private directories in such hostPath volumes.

See here for a more detailed and practical impact.

cc/ @cjcullen (as a responder to our report)

Root Cause: ambiguous definition

In the Kubernetes API, Pod.SecurityContext.SupplementalGroups is defined as "A list of groups applied to the first process run in each container, in addition to the container's primary GID. If unspecified, no groups will be added to any container. Note that this field cannot be set when spec.os.name is windows."

I think this definition is somewhat ambiguous about the gids defined in the container image. In practice, the most popular CRI implementations (containerd and cri-o) create an OCI runtime spec (config.json) in which the gids defined in the container image are merged with the supplementalGroups from the pod spec, like this:

{
  "ociVersion": "1.0.2-dev",
  "process": {
    "user": {
      "uid": 1000,
      "gid": 1000,
      # CRI merges group gids for uid=1000
      "additionalGids": [
        50000, # defined in container image (uid=1000 belongs to it)
        60000  # defined in securityContext.supplementalGroups
      ]
    },
...

How to resolve this issue?

Added: as described in #112879 (comment), we will improve the API description first and keep discussing a new API in a KEP.

First, I would like to discuss how to resolve this in the community. I propose several ways to resolve this.

  • Step 1. Improve the description in the API spec and warn users about this behavior in a blog post or by similar means
    • Pros
    • Cons
      • This behavior will become official. It might break the "secure by design" principle.
  • Step 2. Add a new API: PodSecurityContext.SupplementalGroupsPolicy={Merge(default), Strict} or similar (KEP-3619)
    • Pros
      • No breaking change.
      • Users can have an official API to select the behavior.
    • Cons
      • Hard. Large impact.
      • Needs a KEP (I'm happy to write one)
      • This affects all the CRI implementations
  • Change the API behavior
    • Pros
      • None.
    • Cons
      • Breaking change.

/area security
/sig security
/area api
/sig architecture

Tasks for Step 1

- [x] https://github.com/kubernetes/kubernetes/pull/113047
- [x] https://github.com/kubernetes-sigs/cri-tools/pull/1005
- [ ] https://github.com/kubernetes/enhancements/issues/3619

Tasks for Step 2

- [ ] https://github.com/kubernetes/enhancements/issues/3619

Kubernetes version

I checked this behavior on the version below, but I think it might affect all recent versions.

$ kubectl version
Client Version: v1.24.6
Kustomize Version: v4.5.4
Server Version: v1.25.2

Cloud provider

OS version

# On Linux:
$ cat /etc/os-release
PRETTY_NAME="Ubuntu 22.04.1 LTS"
NAME="Ubuntu"
VERSION_ID="22.04"
VERSION="22.04.1 LTS (Jammy Jellyfish)"
VERSION_CODENAME=jammy
ID=ubuntu
ID_LIKE=debian
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
UBUNTU_CODENAME=jammy
$ uname -a
Linux reproduce-bypass-supplementalgroups-control-plane 5.10.104-linuxkit #1 SMP PREEMPT Thu Mar 17 17:05:54 UTC 2022 aarch64 aarch64 aarch64 GNU/Linux

Install tools

Container runtime (CRI) and version (if applicable)

I tested this behavior on both containerd and cri-o.

containerd github.com/containerd/containerd v1.6.8 9cd3357b7fd7218e4aec3eae239db1f68a5a6ec6
crio version 1.25.0

Related plugins (CNI, CSI, ...) and versions (if applicable)


Labels

area/api: Indicates an issue on api area.
area/security
kind/documentation: Categorizes issue or PR as related to documentation.
sig/node: Categorizes an issue or PR as relevant to SIG Node.
sig/security: Categorizes an issue or PR as relevant to SIG Security.
triage/accepted: Indicates an issue or PR is ready to be actively worked on.
