cuda.core: complete the naming audit from #1945#1986
Conversation
Apply the in-scope renames from the issue checklist. Each change is
described in the 1.0.0 release notes' Breaking changes section.
Property conversions (no-arg deterministic getters):
Buffer.get_ipc_descriptor() -> Buffer.ipc_descriptor
Event.get_ipc_descriptor() -> Event.ipc_descriptor
DeviceMemoryResource.get_allocation_handle() -> .allocation_handle
PinnedMemoryResource.get_allocation_handle() -> .allocation_handle
Boolean / non-noun properties (renamed for clarity, polarity fixed):
LaunchConfig.cooperative_launch -> .is_cooperative
Event.is_timing_disabled -> .is_timing_enabled
Event.is_sync_busy_waited -> .uses_blocking_sync
EventOptions.busy_waited_sync -> .use_blocking_sync
The Event boolean rename also corrects a long-standing inversion in the
underlying handle module: the C++ field/accessor pair stored
"BLOCKING_SYNC is set" but was named "busy_waited", which is the
opposite of what the flag actually does. The new name (and the public
property) match CUDA's terminology directly.
Graph allocation method consistency (matches MemoryResource):
GraphDefinition.alloc() -> .allocate()
GraphDefinition.free() -> .deallocate()
GraphNode.alloc() -> .allocate()
GraphNode.free() -> .deallocate()
Cross-API consistency for graph builders:
GraphBuilder.add_child() -> .embed()
GraphDefinition.record_event() / .wait_event() -> .record() / .wait()
GraphNode.record_event() / .wait_event() -> .record() / .wait()
Also updates the underlying handle layer to match the renamed public
properties:
C++ EventBox.timing_disabled -> .timing_enabled (polarity flipped)
C++ EventBox.busy_waited -> .uses_blocking_sync
C++ get_event_timing_disabled -> get_event_timing_enabled
C++ get_event_busy_waited -> get_event_uses_blocking_sync
Cython mirrors in _resource_handles.{pxd,pyx}
IPCEventDescriptor._busy_waited -> ._uses_blocking_sync
Historical references to renamed/removed names in 0.3.0 and 0.4.0
release notes are converted to inline literals so latest-only Sphinx
builds do not break.
Out of scope (deferred):
- KernelAttributes redesign (separate design PR)
- Conditional API rework (create_condition unification, if_then,
GraphCondition.__int__) -- separate work in progress on this branch
- DeviceProperties.cooperative_launch (pending team discussion)
- All system/* and fan items (Mike)
Made-with: Cursor
GraphBuilder, GraphDefinition, and GraphNode now share one conditional
vocabulary. The factory returns a GraphCondition (no raw handle on the
public surface), the conditional-node verbs are consistent, and a
GraphCondition can be passed straight to launch().
Public API changes:
GraphBuilder.create_conditional_handle() -> .create_condition()
Returns GraphCondition (matches GraphDefinition.create_condition).
The four conditional builder methods (if_then, if_else, while_loop,
switch) on GraphBuilder now accept GraphCondition instead of a raw
CUgraphConditionalHandle, and they validate the argument type.
GraphBuilder.if_cond() -> .if_then()
GraphDefinition.if_cond() -> .if_then()
GraphNode.if_cond() -> .if_then()
The new name parallels if_else / while_loop / switch (verb
describing the construct) and matches Python's if/then/else.
GraphCondition.__int__ (new)
Returns the underlying CUgraphConditionalHandle value, so a
GraphCondition can be passed directly as a kernel argument.
_kernel_arg_handler grew an explicit GraphCondition fast path that
packs it as CUgraphConditionalHandle, mirroring the existing
driver.CUgraphConditionalHandle case.
Internals:
GraphCondition._from_handle (static cdef factory) keeps the
GraphBuilder and GraphDefinition implementations of create_condition
on the same construction path.
Tests updated to use the new flow throughout, including passing
GraphCondition directly to set_handle / countdown kernels rather than
extracting .handle.
Made-with: Cursor
…VIDIA#1945) Replace the 16 (device_id) accessor methods on KernelAttributes with properties, and expose per-device queries via __getitem__: Old: kernel.attributes.num_regs() kernel.attributes.num_regs(dev) New: kernel.attributes.num_regs # current device kernel.attributes[dev].num_regs # specific device The default view returned by Kernel.attributes is bound to the current device (resolved at attribute-access time, so it follows context changes). Indexing returns a sibling view bound to the given Device or device ordinal; the underlying cache (keyed by (device_id, attr_enum)) is shared across views of the same kernel, so a value queried through one view is visible through the others. Internals: _device_id == -1 sentinel marks the "current device" view. _effective_device_id() resolves it via Device().device_id per access. _view_for_device() builds a sibling view that aliases _cache. _resolve_device_id() is removed (no longer needed). The 6 in-tree call sites (5 in test_module.py, 3 in test_graph_definition_lifetime.py) just drop the (); two new tests exercise the per-device view shape and reject invalid device arguments. Made-with: Cursor
…#1945) The parametrized test in test_module.py exercised the previous KernelAttributes(method, optional device_id) shape via getattr() + call. Rewrite it to use property access on the default view and on the per-device views returned by kernel.attributes[device], for both Device-object and ordinal-int forms. Type-check the value at every access path rather than just the last one. Made-with: Cursor
This comment has been minimized.
This comment has been minimized.
mdboom
left a comment
There was a problem hiding this comment.
I think we shouldn't add __int__ to get underlying handles and pointers... It doesn't look like we do that elsewhere in cuda-core (cuda-bindings does it in a bunch of places).
Other than that, LGTM.
| return hash(<unsigned long long>self._c_handle) | ||
|
|
||
| def __int__(self) -> int: | ||
| return <unsigned long long>self._c_handle |
There was a problem hiding this comment.
I'm not crazy about int implicitly returning the underlying handle (which should be an implementation detail). The other classes (of ours) that need this, can just reach to _c_handle, which would avoid making it accessible from Python.
| cdef bint timing_disabled = False | ||
| cdef bint busy_waited = False | ||
| cdef bint timing_enabled = True | ||
| cdef bint uses_blocking_sync = False |
There was a problem hiding this comment.
Minor nit bc it's not API, but there is uses_blocking_sync vs. use_blocking_sync here.
There was a problem hiding this comment.
Good catch. This points to larger issue in the API about matching language between *Options classes and the properties that reflect those options.
Here's what appears in events:
EventOptions |
Event |
|---|---|
enable_timing |
is_timing_enabled |
use_blocking_sync |
uses_blocking_sync |
ipc_enabled |
is_ipc_enabled |
The code base consistently uses adjective -> is_ + adjective, as in ipc_enabled -> is_ipc_enabled, and nonblocking -> is_nonblocking. The imperative forms (e.g. enable_timing) are the odd ones.
Following that, I'll change events to say timing_enabled -> is_timing_enabled and blocking_sync ->is_blocking_sync.
Side note: uses_blocking_sync would be the only instance of uses_. While is_blocking_sync is slightly unnatural, this option is overall more consistent.
Address review feedback on NVIDIA#1986: drop the implicit int() conversion on GraphCondition. Passing a GraphCondition directly to launch() is still supported; the kernel-arg handler now reaches into _c_handle directly instead of going through __int__. Made-with: Cursor
Address review feedback on NVIDIA#1986: align EventOptions field names with the convention used elsewhere in the API (option name = bare adjective, property name = is_/uses_ prefix). - EventOptions.enable_timing -> EventOptions.timing_enabled (matches Event.is_timing_enabled) - EventOptions.use_blocking_sync -> EventOptions.blocking_sync (matches Event.uses_blocking_sync) Same pattern as ipc_enabled / Event.is_ipc_enabled and StreamOptions.nonblocking / Stream.is_nonblocking. Made-with: Cursor
…A#1945) Adopt the is_-prefix convention used everywhere else (is_ipc_enabled, is_timing_enabled, is_nonblocking) and eliminate the lone uses_- form. Keep the _sync suffix so the name disambiguates from Stream's separate "blocking" concept and stays close to the underlying CU_EVENT_BLOCKING_SYNC driver flag. EventOptions.blocking_sync is unchanged. Made-with: Cursor
…ndition.__int__ (NVIDIA#1945) The conditional builder methods (if_then, if_else, switch, while_loop) were still calling int(condition) on the GraphCondition object after __int__ was removed in the prior commit. Use the public .handle property instead, which mirrors the kernel argument handler fix. This was caught by CI: 26 conditional graph tests failed with TypeError in cuda/core/graph/_graph_builder.pyx:549. Made-with: Cursor
|
Summary
Address the remaining items from the
cuda.corenaming audit (#1945)in preparation for v1.0.0. Three logical chunks, three commits:
properties, rename boolean / non-noun properties (with one polarity
fix), align graph allocation/event/embed verbs across the API.
GraphCondition, renameif_cond->if_then, and let aGraphConditionbe passed straight tolaunch().KernelAttributes: convert the 16 attribute methods toproperties; expose per-device queries via
kernel.attributes[device].Changes
Property conversions (no-arg deterministic getters):
Buffer.get_ipc_descriptor()->Buffer.ipc_descriptorEvent.get_ipc_descriptor()->Event.ipc_descriptorDeviceMemoryResource.get_allocation_handle()->.allocation_handlePinnedMemoryResource.get_allocation_handle()->.allocation_handleBoolean / non-noun cleanup (with polarity correction):
LaunchConfig.cooperative_launch->.is_cooperative(also theconstructor keyword argument).
Event.is_timing_disabled->.is_timing_enabled(polarity flippedto avoid the double-negative form). The constructor keyword
EventOptions.enable_timingis renamed to.timing_enabled(bareadjective).
Event.is_sync_busy_waited->.is_blocking_sync. The new namematches the underlying
CU_EVENT_BLOCKING_SYNCflag: whenTrue,the calling CPU thread blocks (yields) on
Event.syncinstead ofbusy-waiting. The previous name was inverted with respect to its
semantics. The constructor keyword
EventOptions.busy_waited_syncis correspondingly renamed to
.blocking_sync. This also closes[DOC]: Description of
EventOptions.busy_waited_syncis contradictory #657.*Options/ property convention: bareadjective on the option (
timing_enabled,blocking_sync) andis_-prefixed boolean on the property (is_timing_enabled,is_blocking_sync).EventBoxfield/accessor pair and the Cython mirrors arerenamed in parallel to
is_blocking_sync/get_event_is_blocking_sync;IPCEventDescriptor._busy_waited->
._is_blocking_sync.Graph allocation method consistency (matches
MemoryResource):GraphDefinition.alloc/free->.allocate/.deallocateGraphNode.alloc/free->.allocate/.deallocateCross-API consistency for graph builders:
GraphBuilder.add_child->.embed(matchesGraphDefinition.embedand
GraphNode.embed).GraphDefinition.record_event/wait_event->.record/.waitand the same on
GraphNode, matchingStream.record/Stream.wait.Conditional graph API unification:
GraphBuilder.create_conditional_handle()->.create_condition(),now returning a
GraphCondition(matchesGraphDefinition.create_condition).if_cond->if_thenonGraphBuilder,GraphDefinition, andGraphNode. The new name parallels the existingif_else,while_loop, andswitchmethods (verb describing the construct,not an abbreviation of "condition") and matches Python's own
if/then/elsevocabulary.GraphBuilder(if_then,if_else,while_loop,switch) now accept aGraphConditioninstead of a raw
CUgraphConditionalHandle, with a TypeError checkthat mirrors the one already present in
_make_conditional_node.GraphConditioncan be passed directly as a kernel argument;the kernel argument handler grew a
GraphConditionfast path thatreads the underlying
CUgraphConditionalHandledirectly. No__int__is exposed on the public class.KernelAttributesredesign:max_threads_per_block,num_regs,cluster_scheduling_policy_preference, etc.) that previously tooka
device_idargument are now@property. The view returned byKernel.attributesis bound to the current device by default(resolved at attribute-access time, so it follows context changes);
kernel.attributes[device]returns a sibling view bound to aspecific
Deviceor device ordinal, sharing the underlying cache.kernel.attributes.num_regs(),kernel.attributes.num_regs(some_dev)kernel.attributes.num_regs,kernel.attributes[some_dev].num_regsDocs:
1.0.0-notes.rstcovering everyrename above with issue refs.
0.3.0,0.4.0,0.7.0) had straySphinx cross-references to renamed/removed names; converted to inline
literals so latest-only Sphinx builds keep working without rewriting
history.
No backward-compatibility aliases (pre-1.0).
Test Coverage
(
test_event.py,test_memory.py,test_module.py,test_launcher.py,test_object_protocols.py, thetests/graph/andtests/memory_ipc/suites).pytest.raisesblocks adjusted (_ = obj.property) where propertyaccess replaced a method call, to satisfy
ruff B018.test_read_only_kernel_attributes(parametrized over all 16attributes) was rewritten to exercise the new shape directly:
property access on the default view and indexed access through
kernel.attributes[device]andkernel.attributes[device.device_id],with a type assertion at every access path (the original test only
checked the loop's last value).
test_kernel_attributes_per_device_viewexercises thekernel.attributes[device]shape and value parity with thedefault view.
test_kernel_attributes_indexing_rejects_invalid_deviceconfirms
kernel.attributes["bad"]errors via theDevice(...)constructor.
tests/,tests/graph/, andtests/memory_ipc/all green.Related Work
get_allocation_handle)in cuda.core: method/member/property naming audit #1945, except the
DeviceProperties.cooperative_launchrename(deferred pending team discussion) and the system/fan items (Mike).
GraphDef->GraphDefinition,Condition->GraphCondition), which is already merged.EventOptions.busy_waited_syncis contradictory #657 (the contradictoryEventOptions.busy_waited_syncdescription is resolved by the rename to
EventOptions.blocking_sync/
Event.is_blocking_sync).