cuda.core: complete the naming audit from #1945 by Andy-Jost · Pull Request #1986 · NVIDIA/cuda-python

Andy-Jost · 2026-04-29T00:06:24Z

Summary

Address the remaining items from the cuda.core naming audit (#1945)
in preparation for v1.0.0. Three logical chunks, three commits:

Mechanical batch: convert no-arg deterministic getters to
properties, rename boolean / non-noun properties (with one polarity
fix), align graph allocation/event/embed verbs across the API.
Conditional API: unify the conditional-graph factory on
GraphCondition, rename if_cond -> if_then, and let a
GraphCondition be passed straight to launch().
KernelAttributes: convert the 16 attribute methods to
properties; expose per-device queries via kernel.attributes[device].

Changes

Property conversions (no-arg deterministic getters):

Buffer.get_ipc_descriptor() -> Buffer.ipc_descriptor
Event.get_ipc_descriptor() -> Event.ipc_descriptor
DeviceMemoryResource.get_allocation_handle() -> .allocation_handle
PinnedMemoryResource.get_allocation_handle() -> .allocation_handle

Boolean / non-noun cleanup (with polarity correction):

LaunchConfig.cooperative_launch -> .is_cooperative (also the
constructor keyword argument).
Event.is_timing_disabled -> .is_timing_enabled (polarity flipped
to avoid the double-negative form). The constructor keyword
EventOptions.enable_timing is renamed to .timing_enabled (bare
adjective).
Event.is_sync_busy_waited -> .is_blocking_sync. The new name
matches the underlying CU_EVENT_BLOCKING_SYNC flag: when True,
the calling CPU thread blocks (yields) on Event.sync instead of
busy-waiting. The previous name was inverted with respect to its
semantics. The constructor keyword EventOptions.busy_waited_sync
is correspondingly renamed to .blocking_sync. This also closes
[DOC]: Description of EventOptions.busy_waited_sync is contradictory #657.
This standardizes the *Options / property convention: bare
adjective on the option (timing_enabled, blocking_sync) and
is_-prefixed boolean on the property (is_timing_enabled,
is_blocking_sync).
The C++ EventBox field/accessor pair and the Cython mirrors are
renamed in parallel to is_blocking_sync /
get_event_is_blocking_sync; IPCEventDescriptor._busy_waited
-> ._is_blocking_sync.

Graph allocation method consistency (matches MemoryResource):

GraphDefinition.alloc / free -> .allocate / .deallocate
GraphNode.alloc / free -> .allocate / .deallocate

Cross-API consistency for graph builders:

GraphBuilder.add_child -> .embed (matches GraphDefinition.embed
and GraphNode.embed).
GraphDefinition.record_event / wait_event -> .record / .wait
and the same on GraphNode, matching Stream.record / Stream.wait.

Conditional graph API unification:

GraphBuilder.create_conditional_handle() -> .create_condition(),
now returning a GraphCondition (matches GraphDefinition.create_condition).
if_cond -> if_then on GraphBuilder, GraphDefinition, and
GraphNode. The new name parallels the existing if_else,
while_loop, and switch methods (verb describing the construct,
not an abbreviation of "condition") and matches Python's own
if/then/else vocabulary.
The four conditional builder methods on GraphBuilder (if_then,
if_else, while_loop, switch) now accept a GraphCondition
instead of a raw CUgraphConditionalHandle, with a TypeError check
that mirrors the one already present in _make_conditional_node.
A GraphCondition can be passed directly as a kernel argument;
the kernel argument handler grew a GraphCondition fast path that
reads the underlying CUgraphConditionalHandle directly. No
__int__ is exposed on the public class.

KernelAttributes redesign:

The 16 attribute methods (max_threads_per_block, num_regs,
cluster_scheduling_policy_preference, etc.) that previously took
a device_id argument are now @property. The view returned by
Kernel.attributes is bound to the current device by default
(resolved at attribute-access time, so it follows context changes);
kernel.attributes[device] returns a sibling view bound to a
specific Device or device ordinal, sharing the underlying cache.
Old: kernel.attributes.num_regs(), kernel.attributes.num_regs(some_dev)
New: kernel.attributes.num_regs, kernel.attributes[some_dev].num_regs

Docs:

New "Breaking changes" entries in 1.0.0-notes.rst covering every
rename above with issue refs.
Three older release notes (0.3.0, 0.4.0, 0.7.0) had stray
Sphinx cross-references to renamed/removed names; converted to inline
literals so latest-only Sphinx builds keep working without rewriting
history.

No backward-compatibility aliases (pre-1.0).

Test Coverage

Existing tests updated to the new names throughout
(test_event.py, test_memory.py, test_module.py, test_launcher.py,
test_object_protocols.py, the tests/graph/ and
tests/memory_ipc/ suites).
pytest.raises blocks adjusted (_ = obj.property) where property
access replaced a method call, to satisfy ruff B018.
test_read_only_kernel_attributes (parametrized over all 16
attributes) was rewritten to exercise the new shape directly:
property access on the default view and indexed access through
kernel.attributes[device] and kernel.attributes[device.device_id],
with a type assertion at every access path (the original test only
checked the loop's last value).
New tests:
- test_kernel_attributes_per_device_view exercises the
  kernel.attributes[device] shape and value parity with the
  default view.
- test_kernel_attributes_indexing_rejects_invalid_device
  confirms kernel.attributes["bad"] errors via the Device(...)
  constructor.
Verified locally on a GPU host: tests/, tests/graph/, and
tests/memory_ipc/ all green.
Editable build clean, pre-commit clean (all-files).

Related Work

Closes the items assigned to me (and Leo's get_allocation_handle)
in cuda.core: method/member/property naming audit #1945, except the DeviceProperties.cooperative_launch rename
(deferred pending team discussion) and the system/fan items (Mike).
Builds on Rename GraphDef -> GraphDefinition and Condition -> GraphCondition #1984 (GraphDef -> GraphDefinition,
Condition -> GraphCondition), which is already merged.
Closes [DOC]: Description of EventOptions.busy_waited_sync is contradictory #657 (the contradictory EventOptions.busy_waited_sync
description is resolved by the rename to EventOptions.blocking_sync
/ Event.is_blocking_sync).

Apply the in-scope renames from the issue checklist. Each change is described in the 1.0.0 release notes' Breaking changes section. Property conversions (no-arg deterministic getters): Buffer.get_ipc_descriptor() -> Buffer.ipc_descriptor Event.get_ipc_descriptor() -> Event.ipc_descriptor DeviceMemoryResource.get_allocation_handle() -> .allocation_handle PinnedMemoryResource.get_allocation_handle() -> .allocation_handle Boolean / non-noun properties (renamed for clarity, polarity fixed): LaunchConfig.cooperative_launch -> .is_cooperative Event.is_timing_disabled -> .is_timing_enabled Event.is_sync_busy_waited -> .uses_blocking_sync EventOptions.busy_waited_sync -> .use_blocking_sync The Event boolean rename also corrects a long-standing inversion in the underlying handle module: the C++ field/accessor pair stored "BLOCKING_SYNC is set" but was named "busy_waited", which is the opposite of what the flag actually does. The new name (and the public property) match CUDA's terminology directly. Graph allocation method consistency (matches MemoryResource): GraphDefinition.alloc() -> .allocate() GraphDefinition.free() -> .deallocate() GraphNode.alloc() -> .allocate() GraphNode.free() -> .deallocate() Cross-API consistency for graph builders: GraphBuilder.add_child() -> .embed() GraphDefinition.record_event() / .wait_event() -> .record() / .wait() GraphNode.record_event() / .wait_event() -> .record() / .wait() Also updates the underlying handle layer to match the renamed public properties: C++ EventBox.timing_disabled -> .timing_enabled (polarity flipped) C++ EventBox.busy_waited -> .uses_blocking_sync C++ get_event_timing_disabled -> get_event_timing_enabled C++ get_event_busy_waited -> get_event_uses_blocking_sync Cython mirrors in _resource_handles.{pxd,pyx} IPCEventDescriptor._busy_waited -> ._uses_blocking_sync Historical references to renamed/removed names in 0.3.0 and 0.4.0 release notes are converted to inline literals so latest-only Sphinx builds do not break. Out of scope (deferred): - KernelAttributes redesign (separate design PR) - Conditional API rework (create_condition unification, if_then, GraphCondition.__int__) -- separate work in progress on this branch - DeviceProperties.cooperative_launch (pending team discussion) - All system/* and fan items (Mike) Made-with: Cursor

GraphBuilder, GraphDefinition, and GraphNode now share one conditional vocabulary. The factory returns a GraphCondition (no raw handle on the public surface), the conditional-node verbs are consistent, and a GraphCondition can be passed straight to launch(). Public API changes: GraphBuilder.create_conditional_handle() -> .create_condition() Returns GraphCondition (matches GraphDefinition.create_condition). The four conditional builder methods (if_then, if_else, while_loop, switch) on GraphBuilder now accept GraphCondition instead of a raw CUgraphConditionalHandle, and they validate the argument type. GraphBuilder.if_cond() -> .if_then() GraphDefinition.if_cond() -> .if_then() GraphNode.if_cond() -> .if_then() The new name parallels if_else / while_loop / switch (verb describing the construct) and matches Python's if/then/else. GraphCondition.__int__ (new) Returns the underlying CUgraphConditionalHandle value, so a GraphCondition can be passed directly as a kernel argument. _kernel_arg_handler grew an explicit GraphCondition fast path that packs it as CUgraphConditionalHandle, mirroring the existing driver.CUgraphConditionalHandle case. Internals: GraphCondition._from_handle (static cdef factory) keeps the GraphBuilder and GraphDefinition implementations of create_condition on the same construction path. Tests updated to use the new flow throughout, including passing GraphCondition directly to set_handle / countdown kernels rather than extracting .handle. Made-with: Cursor

…VIDIA#1945) Replace the 16 (device_id) accessor methods on KernelAttributes with properties, and expose per-device queries via __getitem__: Old: kernel.attributes.num_regs() kernel.attributes.num_regs(dev) New: kernel.attributes.num_regs # current device kernel.attributes[dev].num_regs # specific device The default view returned by Kernel.attributes is bound to the current device (resolved at attribute-access time, so it follows context changes). Indexing returns a sibling view bound to the given Device or device ordinal; the underlying cache (keyed by (device_id, attr_enum)) is shared across views of the same kernel, so a value queried through one view is visible through the others. Internals: _device_id == -1 sentinel marks the "current device" view. _effective_device_id() resolves it via Device().device_id per access. _view_for_device() builds a sibling view that aliases _cache. _resolve_device_id() is removed (no longer needed). The 6 in-tree call sites (5 in test_module.py, 3 in test_graph_definition_lifetime.py) just drop the (); two new tests exercise the per-device view shape and reject invalid device arguments. Made-with: Cursor

…#1945) The parametrized test in test_module.py exercised the previous KernelAttributes(method, optional device_id) shape via getattr() + call. Rewrite it to use property access on the default view and on the per-device views returned by kernel.attributes[device], for both Device-object and ordinal-int forms. Type-check the value at every access path rather than just the last one. Made-with: Cursor

mdboom

I think we shouldn't add __int__ to get underlying handles and pointers... It doesn't look like we do that elsewhere in cuda-core (cuda-bindings does it in a bunch of places).

Other than that, LGTM.

mdboom · 2026-04-30T12:52:44Z

        return hash(<unsigned long long>self._c_handle)

+    def __int__(self) -> int:
+        return <unsigned long long>self._c_handle


I'm not crazy about int implicitly returning the underlying handle (which should be an implementation detail). The other classes (of ours) that need this, can just reach to _c_handle, which would avoid making it accessible from Python.

I'll remove this

mdboom · 2026-04-30T12:54:04Z

-        cdef bint timing_disabled = False
-        cdef bint busy_waited = False
+        cdef bint timing_enabled = True
+        cdef bint uses_blocking_sync = False


Minor nit bc it's not API, but there is uses_blocking_sync vs. use_blocking_sync here.

Good catch. This points to larger issue in the API about matching language between *Options classes and the properties that reflect those options.

Here's what appears in events:

EventOptions Event

enable_timing is_timing_enabled

use_blocking_sync uses_blocking_sync

ipc_enabled is_ipc_enabled

The code base consistently uses adjective -> is_ + adjective, as in ipc_enabled -> is_ipc_enabled, and nonblocking -> is_nonblocking. The imperative forms (e.g. enable_timing) are the odd ones.

Following that, I'll change events to say timing_enabled -> is_timing_enabled and blocking_sync ->is_blocking_sync.

Side note: uses_blocking_sync would be the only instance of uses_. While is_blocking_sync is slightly unnatural, this option is overall more consistent.

Address review feedback on NVIDIA#1986: drop the implicit int() conversion on GraphCondition. Passing a GraphCondition directly to launch() is still supported; the kernel-arg handler now reaches into _c_handle directly instead of going through __int__. Made-with: Cursor

Address review feedback on NVIDIA#1986: align EventOptions field names with the convention used elsewhere in the API (option name = bare adjective, property name = is_/uses_ prefix). - EventOptions.enable_timing -> EventOptions.timing_enabled (matches Event.is_timing_enabled) - EventOptions.use_blocking_sync -> EventOptions.blocking_sync (matches Event.uses_blocking_sync) Same pattern as ipc_enabled / Event.is_ipc_enabled and StreamOptions.nonblocking / Stream.is_nonblocking. Made-with: Cursor

…A#1945) Adopt the is_-prefix convention used everywhere else (is_ipc_enabled, is_timing_enabled, is_nonblocking) and eliminate the lone uses_- form. Keep the _sync suffix so the name disambiguates from Stream's separate "blocking" concept and stays close to the underlying CU_EVENT_BLOCKING_SYNC driver flag. EventOptions.blocking_sync is unchanged. Made-with: Cursor

…ndition.__int__ (NVIDIA#1945) The conditional builder methods (if_then, if_else, switch, while_loop) were still calling int(condition) on the GraphCondition object after __int__ was removed in the prior commit. Use the public .handle property instead, which mirrors the kernel argument handler fix. This was caught by CI: 26 conditional graph tests failed with TypeError in cuda/core/graph/_graph_builder.pyx:549. Made-with: Cursor

mdboom

👍

github-actions · 2026-04-30T19:53:51Z

Doc Preview CI
Preview removed because the pull request was closed or merged.

Andy-Jost added 3 commits April 28, 2026 16:39

Andy-Jost added this to the cuda.core v1.0.0 milestone Apr 29, 2026

Andy-Jost added enhancement Any code-related improvements cuda.core Everything related to the cuda.core module breaking Breaking changes are introduced labels Apr 29, 2026

Andy-Jost self-assigned this Apr 29, 2026

Andy-Jost requested review from leofang, mdboom and rparolin and removed request for leofang April 29, 2026 00:17

This comment has been minimized.

Sign in to view

mdboom mentioned this pull request Apr 30, 2026

Audit cuda.core API for 1.0 release #1919

Open

mdboom requested changes Apr 30, 2026

View reviewed changes

Andy-Jost added 4 commits April 30, 2026 09:50

Merge remote-tracking branch 'origin/main' into naming-audit-1945

813d3d9

Andy-Jost mentioned this pull request Apr 30, 2026

[DOC]: Description of EventOptions.busy_waited_sync is contradictory #657

Closed

1 task

Andy-Jost requested a review from mdboom April 30, 2026 17:23

mdboom approved these changes Apr 30, 2026

View reviewed changes

Andy-Jost merged commit 8b55a6c into NVIDIA:main Apr 30, 2026
95 checks passed

Andy-Jost deleted the naming-audit-1945 branch April 30, 2026 19:37

Andy-Jost mentioned this pull request Apr 30, 2026

cuda.core: fix _arr_is_c_contiguous discriminator for numba arrays #1998

Merged

leofang added the P0 High priority - Must do! label May 1, 2026

leofang mentioned this pull request May 1, 2026

Make Linker.backend a classmethod #1910

Open

3 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

cuda.core: complete the naming audit from #1945#1986

cuda.core: complete the naming audit from #1945#1986
Andy-Jost merged 9 commits intoNVIDIA:mainfrom
Andy-Jost:naming-audit-1945

Andy-Jost commented Apr 29, 2026 •

edited

Loading

This comment has been minimized.

mdboom left a comment

mdboom Apr 30, 2026

Andy-Jost Apr 30, 2026

mdboom Apr 30, 2026

Andy-Jost Apr 30, 2026 •

edited

Loading

mdboom left a comment

Uh oh!

github-actions Bot commented Apr 30, 2026

Labels

3 participants

`EventOptions`	`Event`
`enable_timing`	`is_timing_enabled`
`use_blocking_sync`	`uses_blocking_sync`
`ipc_enabled`	`is_ipc_enabled`

Conversation

Andy-Jost commented Apr 29, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Changes

Test Coverage

Related Work

This comment has been minimized.

mdboom left a comment

Choose a reason for hiding this comment

mdboom Apr 30, 2026

Choose a reason for hiding this comment

Andy-Jost Apr 30, 2026

Choose a reason for hiding this comment

mdboom Apr 30, 2026

Choose a reason for hiding this comment

Andy-Jost Apr 30, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

mdboom left a comment

Choose a reason for hiding this comment

Uh oh!

github-actions Bot commented Apr 30, 2026

Labels

3 participants

Andy-Jost commented Apr 29, 2026 •

edited

Loading

Andy-Jost Apr 30, 2026 •

edited

Loading