Skip to content

cuda.core: complete the naming audit from #1945#1986

Merged
Andy-Jost merged 9 commits intoNVIDIA:mainfrom
Andy-Jost:naming-audit-1945
Apr 30, 2026
Merged

cuda.core: complete the naming audit from #1945#1986
Andy-Jost merged 9 commits intoNVIDIA:mainfrom
Andy-Jost:naming-audit-1945

Conversation

@Andy-Jost
Copy link
Copy Markdown
Contributor

@Andy-Jost Andy-Jost commented Apr 29, 2026

Summary

Address the remaining items from the cuda.core naming audit (#1945)
in preparation for v1.0.0. Three logical chunks, three commits:

  • Mechanical batch: convert no-arg deterministic getters to
    properties, rename boolean / non-noun properties (with one polarity
    fix), align graph allocation/event/embed verbs across the API.
  • Conditional API: unify the conditional-graph factory on
    GraphCondition, rename if_cond -> if_then, and let a
    GraphCondition be passed straight to launch().
  • KernelAttributes: convert the 16 attribute methods to
    properties; expose per-device queries via kernel.attributes[device].

Changes

Property conversions (no-arg deterministic getters):

  • Buffer.get_ipc_descriptor() -> Buffer.ipc_descriptor
  • Event.get_ipc_descriptor() -> Event.ipc_descriptor
  • DeviceMemoryResource.get_allocation_handle() -> .allocation_handle
  • PinnedMemoryResource.get_allocation_handle() -> .allocation_handle

Boolean / non-noun cleanup (with polarity correction):

  • LaunchConfig.cooperative_launch -> .is_cooperative (also the
    constructor keyword argument).
  • Event.is_timing_disabled -> .is_timing_enabled (polarity flipped
    to avoid the double-negative form). The constructor keyword
    EventOptions.enable_timing is renamed to .timing_enabled (bare
    adjective).
  • Event.is_sync_busy_waited -> .is_blocking_sync. The new name
    matches the underlying CU_EVENT_BLOCKING_SYNC flag: when True,
    the calling CPU thread blocks (yields) on Event.sync instead of
    busy-waiting. The previous name was inverted with respect to its
    semantics. The constructor keyword EventOptions.busy_waited_sync
    is correspondingly renamed to .blocking_sync. This also closes
    [DOC]: Description of EventOptions.busy_waited_sync is contradictory #657.
  • This standardizes the *Options / property convention: bare
    adjective on the option (timing_enabled, blocking_sync) and
    is_-prefixed boolean on the property (is_timing_enabled,
    is_blocking_sync).
  • The C++ EventBox field/accessor pair and the Cython mirrors are
    renamed in parallel to is_blocking_sync /
    get_event_is_blocking_sync; IPCEventDescriptor._busy_waited
    -> ._is_blocking_sync.

Graph allocation method consistency (matches MemoryResource):

  • GraphDefinition.alloc / free -> .allocate / .deallocate
  • GraphNode.alloc / free -> .allocate / .deallocate

Cross-API consistency for graph builders:

  • GraphBuilder.add_child -> .embed (matches GraphDefinition.embed
    and GraphNode.embed).
  • GraphDefinition.record_event / wait_event -> .record / .wait
    and the same on GraphNode, matching Stream.record / Stream.wait.

Conditional graph API unification:

  • GraphBuilder.create_conditional_handle() -> .create_condition(),
    now returning a GraphCondition (matches GraphDefinition.create_condition).
  • if_cond -> if_then on GraphBuilder, GraphDefinition, and
    GraphNode. The new name parallels the existing if_else,
    while_loop, and switch methods (verb describing the construct,
    not an abbreviation of "condition") and matches Python's own
    if/then/else vocabulary.
  • The four conditional builder methods on GraphBuilder (if_then,
    if_else, while_loop, switch) now accept a GraphCondition
    instead of a raw CUgraphConditionalHandle, with a TypeError check
    that mirrors the one already present in _make_conditional_node.
  • A GraphCondition can be passed directly as a kernel argument;
    the kernel argument handler grew a GraphCondition fast path that
    reads the underlying CUgraphConditionalHandle directly. No
    __int__ is exposed on the public class.

KernelAttributes redesign:

  • The 16 attribute methods (max_threads_per_block, num_regs,
    cluster_scheduling_policy_preference, etc.) that previously took
    a device_id argument are now @property. The view returned by
    Kernel.attributes is bound to the current device by default
    (resolved at attribute-access time, so it follows context changes);
    kernel.attributes[device] returns a sibling view bound to a
    specific Device or device ordinal, sharing the underlying cache.
  • Old: kernel.attributes.num_regs(), kernel.attributes.num_regs(some_dev)
  • New: kernel.attributes.num_regs, kernel.attributes[some_dev].num_regs

Docs:

  • New "Breaking changes" entries in 1.0.0-notes.rst covering every
    rename above with issue refs.
  • Three older release notes (0.3.0, 0.4.0, 0.7.0) had stray
    Sphinx cross-references to renamed/removed names; converted to inline
    literals so latest-only Sphinx builds keep working without rewriting
    history.

No backward-compatibility aliases (pre-1.0).

Test Coverage

  • Existing tests updated to the new names throughout
    (test_event.py, test_memory.py, test_module.py, test_launcher.py,
    test_object_protocols.py, the tests/graph/ and
    tests/memory_ipc/ suites).
  • pytest.raises blocks adjusted (_ = obj.property) where property
    access replaced a method call, to satisfy ruff B018.
  • test_read_only_kernel_attributes (parametrized over all 16
    attributes) was rewritten to exercise the new shape directly:
    property access on the default view and indexed access through
    kernel.attributes[device] and kernel.attributes[device.device_id],
    with a type assertion at every access path (the original test only
    checked the loop's last value).
  • New tests:
    • test_kernel_attributes_per_device_view exercises the
      kernel.attributes[device] shape and value parity with the
      default view.
    • test_kernel_attributes_indexing_rejects_invalid_device
      confirms kernel.attributes["bad"] errors via the Device(...)
      constructor.
  • Verified locally on a GPU host: tests/, tests/graph/, and
    tests/memory_ipc/ all green.
  • Editable build clean, pre-commit clean (all-files).

Related Work

Apply the in-scope renames from the issue checklist. Each change is
described in the 1.0.0 release notes' Breaking changes section.

Property conversions (no-arg deterministic getters):
  Buffer.get_ipc_descriptor()                    -> Buffer.ipc_descriptor
  Event.get_ipc_descriptor()                     -> Event.ipc_descriptor
  DeviceMemoryResource.get_allocation_handle()   -> .allocation_handle
  PinnedMemoryResource.get_allocation_handle()   -> .allocation_handle

Boolean / non-noun properties (renamed for clarity, polarity fixed):
  LaunchConfig.cooperative_launch                -> .is_cooperative
  Event.is_timing_disabled                       -> .is_timing_enabled
  Event.is_sync_busy_waited                      -> .uses_blocking_sync
  EventOptions.busy_waited_sync                  -> .use_blocking_sync

The Event boolean rename also corrects a long-standing inversion in the
underlying handle module: the C++ field/accessor pair stored
"BLOCKING_SYNC is set" but was named "busy_waited", which is the
opposite of what the flag actually does. The new name (and the public
property) match CUDA's terminology directly.

Graph allocation method consistency (matches MemoryResource):
  GraphDefinition.alloc()  -> .allocate()
  GraphDefinition.free()   -> .deallocate()
  GraphNode.alloc()        -> .allocate()
  GraphNode.free()         -> .deallocate()

Cross-API consistency for graph builders:
  GraphBuilder.add_child()                       -> .embed()
  GraphDefinition.record_event() / .wait_event() -> .record() / .wait()
  GraphNode.record_event()    / .wait_event()    -> .record() / .wait()

Also updates the underlying handle layer to match the renamed public
properties:

  C++ EventBox.timing_disabled  -> .timing_enabled    (polarity flipped)
  C++ EventBox.busy_waited      -> .uses_blocking_sync
  C++ get_event_timing_disabled -> get_event_timing_enabled
  C++ get_event_busy_waited     -> get_event_uses_blocking_sync
  Cython mirrors in _resource_handles.{pxd,pyx}
  IPCEventDescriptor._busy_waited -> ._uses_blocking_sync

Historical references to renamed/removed names in 0.3.0 and 0.4.0
release notes are converted to inline literals so latest-only Sphinx
builds do not break.

Out of scope (deferred):

  - KernelAttributes redesign (separate design PR)
  - Conditional API rework (create_condition unification, if_then,
    GraphCondition.__int__) -- separate work in progress on this branch
  - DeviceProperties.cooperative_launch (pending team discussion)
  - All system/* and fan items (Mike)

Made-with: Cursor
GraphBuilder, GraphDefinition, and GraphNode now share one conditional
vocabulary. The factory returns a GraphCondition (no raw handle on the
public surface), the conditional-node verbs are consistent, and a
GraphCondition can be passed straight to launch().

Public API changes:

  GraphBuilder.create_conditional_handle()  -> .create_condition()
      Returns GraphCondition (matches GraphDefinition.create_condition).
      The four conditional builder methods (if_then, if_else, while_loop,
      switch) on GraphBuilder now accept GraphCondition instead of a raw
      CUgraphConditionalHandle, and they validate the argument type.

  GraphBuilder.if_cond()      -> .if_then()
  GraphDefinition.if_cond()   -> .if_then()
  GraphNode.if_cond()         -> .if_then()
      The new name parallels if_else / while_loop / switch (verb
      describing the construct) and matches Python's if/then/else.

  GraphCondition.__int__      (new)
      Returns the underlying CUgraphConditionalHandle value, so a
      GraphCondition can be passed directly as a kernel argument.
      _kernel_arg_handler grew an explicit GraphCondition fast path that
      packs it as CUgraphConditionalHandle, mirroring the existing
      driver.CUgraphConditionalHandle case.

Internals:

  GraphCondition._from_handle (static cdef factory) keeps the
  GraphBuilder and GraphDefinition implementations of create_condition
  on the same construction path.

Tests updated to use the new flow throughout, including passing
GraphCondition directly to set_handle / countdown kernels rather than
extracting .handle.

Made-with: Cursor
…VIDIA#1945)

Replace the 16 (device_id) accessor methods on KernelAttributes with
properties, and expose per-device queries via __getitem__:

  Old:  kernel.attributes.num_regs()
        kernel.attributes.num_regs(dev)
  New:  kernel.attributes.num_regs              # current device
        kernel.attributes[dev].num_regs        # specific device

The default view returned by Kernel.attributes is bound to the current
device (resolved at attribute-access time, so it follows context
changes). Indexing returns a sibling view bound to the given Device or
device ordinal; the underlying cache (keyed by (device_id, attr_enum))
is shared across views of the same kernel, so a value queried through
one view is visible through the others.

Internals:

  _device_id == -1 sentinel marks the "current device" view.
  _effective_device_id() resolves it via Device().device_id per access.
  _view_for_device() builds a sibling view that aliases _cache.
  _resolve_device_id() is removed (no longer needed).

The 6 in-tree call sites (5 in test_module.py, 3 in
test_graph_definition_lifetime.py) just drop the (); two new tests
exercise the per-device view shape and reject invalid device arguments.

Made-with: Cursor
@Andy-Jost Andy-Jost added this to the cuda.core v1.0.0 milestone Apr 29, 2026
@Andy-Jost Andy-Jost added enhancement Any code-related improvements cuda.core Everything related to the cuda.core module breaking Breaking changes are introduced labels Apr 29, 2026
@Andy-Jost Andy-Jost self-assigned this Apr 29, 2026
…#1945)

The parametrized test in test_module.py exercised the previous
KernelAttributes(method, optional device_id) shape via getattr() +
call. Rewrite it to use property access on the default view and on
the per-device views returned by kernel.attributes[device], for both
Device-object and ordinal-int forms. Type-check the value at every
access path rather than just the last one.

Made-with: Cursor
@Andy-Jost Andy-Jost requested review from leofang, mdboom and rparolin and removed request for leofang April 29, 2026 00:17
@github-actions

This comment has been minimized.

Copy link
Copy Markdown
Contributor

@mdboom mdboom left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we shouldn't add __int__ to get underlying handles and pointers... It doesn't look like we do that elsewhere in cuda-core (cuda-bindings does it in a bunch of places).

Other than that, LGTM.

return hash(<unsigned long long>self._c_handle)

def __int__(self) -> int:
return <unsigned long long>self._c_handle
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not crazy about int implicitly returning the underlying handle (which should be an implementation detail). The other classes (of ours) that need this, can just reach to _c_handle, which would avoid making it accessible from Python.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll remove this

Comment thread cuda_core/cuda/core/_event.pyx Outdated
cdef bint timing_disabled = False
cdef bint busy_waited = False
cdef bint timing_enabled = True
cdef bint uses_blocking_sync = False
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor nit bc it's not API, but there is uses_blocking_sync vs. use_blocking_sync here.

Copy link
Copy Markdown
Contributor Author

@Andy-Jost Andy-Jost Apr 30, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch. This points to larger issue in the API about matching language between *Options classes and the properties that reflect those options.

Here's what appears in events:

EventOptions Event
enable_timing is_timing_enabled
use_blocking_sync uses_blocking_sync
ipc_enabled is_ipc_enabled

The code base consistently uses adjective -> is_ + adjective, as in ipc_enabled -> is_ipc_enabled, and nonblocking -> is_nonblocking. The imperative forms (e.g. enable_timing) are the odd ones.

Following that, I'll change events to say timing_enabled -> is_timing_enabled and blocking_sync ->is_blocking_sync.

Side note: uses_blocking_sync would be the only instance of uses_. While is_blocking_sync is slightly unnatural, this option is overall more consistent.

Address review feedback on NVIDIA#1986: drop the implicit int() conversion on
GraphCondition. Passing a GraphCondition directly to launch() is still
supported; the kernel-arg handler now reaches into _c_handle directly
instead of going through __int__.

Made-with: Cursor
Address review feedback on NVIDIA#1986: align EventOptions field names with
the convention used elsewhere in the API (option name = bare adjective,
property name = is_/uses_ prefix).

- EventOptions.enable_timing -> EventOptions.timing_enabled
  (matches Event.is_timing_enabled)
- EventOptions.use_blocking_sync -> EventOptions.blocking_sync
  (matches Event.uses_blocking_sync)

Same pattern as ipc_enabled / Event.is_ipc_enabled and
StreamOptions.nonblocking / Stream.is_nonblocking.

Made-with: Cursor
…A#1945)

Adopt the is_-prefix convention used everywhere else (is_ipc_enabled,
is_timing_enabled, is_nonblocking) and eliminate the lone uses_- form.
Keep the _sync suffix so the name disambiguates from Stream's separate
"blocking" concept and stays close to the underlying CU_EVENT_BLOCKING_SYNC
driver flag.

EventOptions.blocking_sync is unchanged.

Made-with: Cursor
…ndition.__int__ (NVIDIA#1945)

The conditional builder methods (if_then, if_else, switch, while_loop)
were still calling int(condition) on the GraphCondition object after
__int__ was removed in the prior commit. Use the public .handle property
instead, which mirrors the kernel argument handler fix.

This was caught by CI: 26 conditional graph tests failed with
TypeError in cuda/core/graph/_graph_builder.pyx:549.

Made-with: Cursor
@Andy-Jost Andy-Jost requested a review from mdboom April 30, 2026 17:23
Copy link
Copy Markdown
Contributor

@mdboom mdboom left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@Andy-Jost Andy-Jost merged commit 8b55a6c into NVIDIA:main Apr 30, 2026
95 checks passed
@Andy-Jost Andy-Jost deleted the naming-audit-1945 branch April 30, 2026 19:37
@github-actions
Copy link
Copy Markdown

Doc Preview CI
Preview removed because the pull request was closed or merged.
@leofang leofang mentioned this pull request May 1, 2026
3 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

breaking Breaking changes are introduced cuda.core Everything related to the cuda.core module enhancement Any code-related improvements P0 High priority - Must do!

3 participants