Commit a2edf56

Fix docs usability and known issues (NVIDIA#355)
* known issue for 25.10.1 and lifecycle terminology update
* update release note, update full lifecycle section
* Highlight openshift docs better
* Update section links
* Update section title

Signed-off-by: Abigail McCarthy <20771501+a-mccarthy@users.noreply.github.com>
1 parent d682b5b commit a2edf56

5 files changed

Lines changed: 40 additions & 16 deletions

‎gpu-operator/getting-started.rst‎

Lines changed: 5 additions & 0 deletions
@@ -31,6 +31,11 @@ Installing the NVIDIA GPU Operator
 
 The current patch release of this version of the NVIDIA GPU Operator is ``${version}``.
 
+.. admonition:: Red Hat OpenShift Container Platform Install
+   :class: tip
+
+   For installation on Red Hat OpenShift Container Platform, refer to :external+ocp:doc:`steps-overview`.
+
 *************
 Prerequisites
 *************

‎gpu-operator/index.rst‎

Lines changed: 8 additions & 1 deletion
@@ -68,13 +68,20 @@
    Service Mesh <install-gpu-operator-service-mesh.rst>
 
 .. toctree::
-   :caption: CSP configurations
+   :titlesonly:
+   :hidden:
+
+
+
+.. toctree::
+   :caption: Platform-Specific Configurations
    :titlesonly:
    :hidden:
 
    Amazon EKS <amazon-eks.rst>
    Azure AKS <microsoft-aks.rst>
    Google GKE <google-gke.rst>
+   NVIDIA GPU Operator on Red Hat OpenShift Container Platform <https://docs.nvidia.com/datacenter/cloud-native/openshift/latest/index.html>
 
 
 .. include:: overview.rst

‎gpu-operator/life-cycle-policy.rst‎

Lines changed: 5 additions & 6 deletions
@@ -39,9 +39,8 @@ Patch releases typically include critical bug and CVE fixes, but can include min
 NVIDIA GPU Operator Life Cycle
 ******************************
 
-When a major version of NVIDIA GPU Operator is released, the previous major version enters maintenance support
-and only receives patch release updates for critical bug and CVE fixes.
-All prior major versions enter end-of-life (EOL) and are no longer supported and do not receive patch release updates.
+When a new major version of NVIDIA GPU Operator is released, the previous major version enters deprecated support and only receives patch release updates for critical bug and CVE fixes.
+All prior major versions enter end of support and are no longer supported and do not receive patch release updates.
 
 The product life cycle and versioning are subject to change in the future.
 
@@ -56,13 +55,13 @@ The product life cycle and versioning are subject to change in the future.
      - Status
 
    * - 25.10.x
-     - Generally Available
+     - Supported
 
    * - 25.3.x
-     - Maintenance
+     - Deprecated
 
    * - 24.9.x and lower
-     - EOL
+     - End of Support
 
 
 .. _operator-component-matrix:

‎gpu-operator/overview.rst‎

Lines changed: 6 additions & 9 deletions
@@ -34,18 +34,15 @@ Kubernetes device plugin for GPUs, the `NVIDIA Container Toolkit <https://github
 automatic node labeling using `GFD <https://github.com/NVIDIA/gpu-feature-discovery>`_, `DCGM <https://developer.nvidia.com/dcgm>`_ based monitoring and others.
 
 
-.. card:: Red Hat OpenShift Container Platform
-
-   For information about installing, managing, and upgrading the Operator,
-   refer to :external+ocp:doc:`index`.
-
-   Information about supported versions is available in :ref:`Supported Operating Systems and Kubernetes Platforms`.
-
-
 About This Documentation
 ========================
 
-Browse through the following documents for getting started, platform support and release notes.
+Browse through the following documents for getting started, platform support and release notes for the NVIDIA GPU Operator.
+
+.. admonition:: Red Hat OpenShift Container Platform
+   :class: tip
+
+   Refer to :external+ocp:doc:`index` for information about installing, managing, and upgrading the Operator on Red Hat OpenShift Container Platform.
 
 Getting Started
 ---------------

‎gpu-operator/release-notes.rst‎

Lines changed: 16 additions & 0 deletions
@@ -83,6 +83,22 @@ Fixed Issues
 * Fixed a bug where the k8s-driver-manager would wait indefinitely when MOFED is enabled and ``USE_HOST_MOFED`` is set to true despite the MOFED being pre-installed on the host.
 
 
+Known Issues
+------------
+
+* When deploying the GPU Operator on systems with SELinux in enforcing mode, the MIG Manager does not get scheduled on GPU nodes.
+  This happens because the GPU Feature Discovery pod has insufficient permissions on Node Feature Discovery's feature-file drop-in directory, so it cannot add the label that indicates a MIG-capable GPU is present.
+  To work around this issue, configure NVIDIA GPU Feature Discovery to use the Node Feature API instead of feature files in ClusterPolicy:
+
+  .. code-block:: yaml
+
+     gfd:
+       env:
+       - name: USE_NODE_FEATURE_API
+         value: "true"
+
+
+
 .. _v25.10.0:
 
 25.10.0
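
For reference, here is a minimal sketch of where the workaround from the new Known Issues entry would sit in a complete ClusterPolicy resource. The ``apiVersion``, ``kind``, and the resource name ``cluster-policy`` are assumptions based on a default Operator install and are not part of this commit; verify them against your cluster before applying anything.

```yaml
# Sketch only. metadata.name is assumed to be the default "cluster-policy";
# all other ClusterPolicy fields are omitted for brevity.
apiVersion: nvidia.com/v1
kind: ClusterPolicy
metadata:
  name: cluster-policy
spec:
  gfd:
    env:
      # From the release note: have GPU Feature Discovery use the
      # Node Feature API instead of feature files.
      - name: USE_NODE_FEATURE_API
        value: "true"
```

Only the ``gfd.env`` entry comes from the release note. In practice it would be merged into the existing ClusterPolicy spec instead of replacing it.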
