Move driver and nvrtc cython and internal layers to new generator #1972

Open
mdboom wants to merge 20 commits into NVIDIA:main from mdboom:driver-v2

mdboom commented Apr 24, 2026

Contributor

This continues the work in #1900. It adds driver to the mix, so both nvrtc and driver are now generated by the "real" new generator.

copy-pr-bot Bot commented Apr 24, 2026

Contributor

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

@github-actions github-actions Bot added the cuda.bindings Everything related to the cuda.bindings module label Apr 24, 2026
mdboom commented Apr 24, 2026

Contributor Author

/ok to test

mdboom commented Apr 24, 2026

Contributor Author

/ok to test

mdboom commented Apr 24, 2026

Contributor Author

/ok to test

mdboom commented Apr 24, 2026

Contributor Author

/ok to test

mdboom commented Apr 24, 2026

Contributor Author

/ok to test

@github-actions github-actions Bot added CI/CD CI/CD infrastructure cuda.core Everything related to the cuda.core module labels Apr 24, 2026
mdboom commented Apr 24, 2026

Contributor Author

/ok to test

mdboom commented Apr 24, 2026

Contributor Author

/ok to test

@leofang leofang self-requested a review April 24, 2026 23:59
@leofang leofang added this to the cuda.bindings 13.3.0 & 12.9.7 milestone Apr 24, 2026
mdboom commented Apr 25, 2026

Contributor Author

/ok to test

mdboom commented Apr 25, 2026

Contributor Author

/ok to test

mdboom commented Apr 25, 2026

Contributor Author

/ok to test

mdboom commented Apr 29, 2026

Contributor Author

/ok to test

@leofang leofang added P0 High priority - Must do! enhancement Any code-related improvements labels May 1, 2026
@mdboom mdboom requested a review from leofang May 13, 2026 13:38
@mdboom mdboom marked this pull request as draft May 19, 2026 20:32
mdboom commented Jun 9, 2026

Contributor Author

/ok to test

@mdboom mdboom marked this pull request as ready for review June 9, 2026 14:17
@github-actions github-actions Bot added the cuda.core Everything related to the cuda.core module label Jun 9, 2026
Comment thread cuda_bindings/cuda/bindings/_internal/_fast_enum.py Outdated
Comment thread cuda_bindings/cuda/bindings/_lib/utils.pxi
Comment thread .gitignore
cuda_bindings/cuda/bindings/_bindings/cyruntime_ptds.pxd
cuda_bindings/cuda/bindings/_bindings/cyruntime_ptds.pyx
cuda_bindings/cuda/bindings/_internal/_nvml.pyx
cuda_bindings/cuda/bindings/_internal/cufile.pyx

Member

nit: we should add them here now that both are auto-gen'd?

Suggested change
- cuda_bindings/cuda/bindings/_internal/cufile.pyx
+ cuda_bindings/cuda/bindings/_internal/driver.pyx
+ cuda_bindings/cuda/bindings/_internal/nvrtc.pyx
+ cuda_bindings/cuda/bindings/_internal/cufile.pyx

Contributor Author

We only want to ignore the .pyx files that come from .pyx.in files. .pyx files that are checked into cuda-python (even if generated by cybind) should not be on this list.

Member

Ah @mdboom, I see the confusion. Like cufile.pyx & co. below, which are generated at build time from cufile_<platform>.pyx, these files are all auto-generated at build time too. That is different from the .pyx.in templates, but it is still generated code, so it would be nice to be consistent and ignore the generated files.
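The two readings in this thread can be sketched in Python as follows. This is only an illustration of the disagreement, not the repository's build logic: the helper names, the `_linux`/`_windows` suffixes, and the example input paths are assumptions (the thread itself only shows `cufile_<platform>.pyx`).

```python
from __future__ import annotations

from pathlib import PurePosixPath

# Assumed platform suffixes; the thread only mentions `cufile_<platform>.pyx`.
PLATFORM_SUFFIXES = ("_linux", "_windows")


def output_from_template(path: str) -> str | None:
    """Return the .pyx produced from a `.pyx.in` template, or None."""
    return path[: -len(".in")] if path.endswith(".pyx.in") else None


def output_from_platform_source(path: str) -> str | None:
    """Return the .pyx produced from a `<name>_<platform>.pyx` source, or None."""
    p = PurePosixPath(path)
    if p.suffix != ".pyx":
        return None
    for suffix in PLATFORM_SUFFIXES:
        if p.stem.endswith(suffix):
            return str(p.with_name(p.stem[: -len(suffix)] + ".pyx"))
    return None


def ignore_entries(checked_in: list[str], include_platform_outputs: bool) -> list[str]:
    """Collect build outputs that a .gitignore could list under either reading."""
    entries = set()
    for path in checked_in:
        out = output_from_template(path)
        if out is None and include_platform_outputs:
            out = output_from_platform_source(path)
        if out is not None:
            entries.add(out)
    return sorted(entries)
```

With `include_platform_outputs=False` only outputs of `.pyx.in` templates are listed (the first reading); with `True` the per-platform outputs are listed as well (the second reading).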

leofang commented Jun 9, 2026

Member

LGTM overall, @mdboom it seems the build fails at cythonization again 😛

mdboom commented Jun 10, 2026

Contributor Author

> LGTM overall, @mdboom it seems the build fails at cythonization again 😛

Yeah, my agent found something easy, but it just fixed Python 3.15 by breaking everyone else ;) Need to work deeper.

mdboom commented Jun 10, 2026

Contributor Author

> > LGTM overall, @mdboom it seems the build fails at cythonization again 😛
>
> Yeah, my agent found something easy, but it just fixed Python 3.15 by breaking everyone else ;) Need to work deeper.

It turns out the exception signature needs to match exactly when declaring function pointer types (which cuda-core does in one spot). cython-gen was generating except ?CUDA_ERROR_NOT_FOUND, whereas cybind generates except ?_CURESULT_INTERNAL_LOADING_ERROR. The frustrating thing is that these are equivalent values (there is a line _CURESULT_INTERNAL_LOADING_ERROR = CUDA_ERROR_NOT_FOUND), but Cython can't see through that. The solution I arrived at was to add another override so that cybind generates CUDA_ERROR_NOT_FOUND for driver only, since cuda-core has to work with cuda-bindings from both before and after this PR. Yet another special case for these "legacy" core libraries, but not too bad, I suppose.
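A minimal Python sketch of what such a per-library override could look like on the generator side. The names `DEFAULT_SENTINEL_TEMPLATE`, `SENTINEL_OVERRIDES`, and `except_clause` are hypothetical and are not cybind's actual configuration; only the two sentinel spellings come from the comment above.

```python
from __future__ import annotations

# Hypothetical generator settings, for illustration only.
# Default: derive the sentinel name from the result enum, e.g. CUresult.
DEFAULT_SENTINEL_TEMPLATE = "_{enum}_INTERNAL_LOADING_ERROR"

# Per-library override: driver keeps the spelling the old generator emitted,
# so existing function pointer declarations still match textually.
SENTINEL_OVERRIDES = {"driver": "CUDA_ERROR_NOT_FOUND"}


def except_clause(
    library: str,
    result_enum: str,
    overrides: dict[str, str] | None = None,
) -> str:
    """Render the Cython exception clause for a generated function."""
    if overrides is None:
        overrides = SENTINEL_OVERRIDES
    sentinel = overrides.get(library)
    if sentinel is None:
        sentinel = DEFAULT_SENTINEL_TEMPLATE.format(enum=result_enum.upper())
    return f"except ?{sentinel}"
```

Under these assumptions, `except_clause("driver", "CUresult")` yields the old spelling, while calling it with `overrides={}` yields the default spelling. The two strings name equal values but differ as text, which is the mismatch described above.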

@mdboom mdboom requested a review from leofang June 10, 2026 18:32
mdboom commented Jun 10, 2026

Contributor Author

@leofang: This is passing all the tests now. I think this is good to go (and let's merge the generator-side thing at roughly the same time, if we can).

leofang commented Jun 10, 2026

Member

> It turns out the exception signature needs to match exactly when declaring function pointer types (which cuda-core does in one spot).

I assume that this also helps us avoid any potential (Cython) ABI breakage?

leofang commented Jun 10, 2026

Member

I wish we had the ABI test running in the CI 😛 Eyeball'd as much as I can. Thanks, Mike!

The CI flakiness should be fixed in #2195. I retried a few times here to make it green.

leofang left a comment

Member

Re-reviewing the symbol-loading rework end-to-end against main. Two file-anchored concerns below, plus one branch-state issue:

Missing CUDA 13.3 symbols. The following are declared and loaded in _bindings/cydriver.pyx.in on main (all at cudaVersion=13030) but absent from every new file in this PR (driver_linux.pyx, driver_windows.pyx, _internal/driver.pxd, cydriver.pyx):

  • cuLogicalEndpoint{Create,Destroy,AddDevice,BindAddr,BindMem,Unbind,Export,Import,Query,GetLimits,IdRelease,IdReserve} — 12 functions
  • cuStreamBeginRecaptureToGraph — 1 function

Looks like the branch was regenerated before these landed on main. Calling any of them post-merge would be a hard breakage, so this needs a rebase + regen before going in. Not a design fix.

Cancelling my prior approval until that and the inline comments are addressed.

-- Leo's bot
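A check for missing symbols like the one described above could be scripted roughly as follows. This is a sketch under assumptions: it supposes that both sides load symbols through calls shaped like the `_F_cuGetProcAddress_v2(...)` lines quoted in this thread, and it compares the `__`-prefixed pointer targets rather than the name strings. The function names are hypothetical.

```python
from __future__ import annotations

import re

# Captures the pointer target (the name after `&__`) of each loader call.
# Assumes calls look like the lines quoted in this review thread.
POINTER_TARGET = re.compile(r"_F_cuGetProcAddress_v2\([^)]*?&__(\w+)")


def loaded_symbols(source_text: str) -> set[str]:
    """Return every pointer target that the given source text loads."""
    return set(POINTER_TARGET.findall(source_text))


def missing_symbols(main_text: str, pr_text: str) -> list[str]:
    """Return symbols loaded on main that the PR no longer loads."""
    return sorted(loaded_symbols(main_text) - loaded_symbols(pr_text))
```

Comparing pointer targets instead of the quoted name keeps the check independent of whether a generator passes a base name or a versioned name as the first argument.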

_F_cuGetProcAddress_v2('cuMemHostUnregister', <void **>&__cuMemHostUnregister, 4000, ptds_mode, NULL)

global __cuMemcpy
_F_cuGetProcAddress_v2('cuMemcpy', <void **>&__cuMemcpy, 4000, ptds_mode, NULL)

Member

For ~48 symbols (every cuMemcpy*, cuMemset*, and cuStream* core op), the old code used different cudaVersion args for the PTDS vs DEFAULT load paths: cuMemcpy, for example, was loaded with cudaVersion=7000 in the PTDS branch and cudaVersion=4000 in the default branch. The new unified path uses the default-mode value (4000) for both. The PTDS flag should still steer the driver to the _ptds variant, but this is a real behavior change that I don't think CI covers today.

Could we run a manual pass with CUDA_PYTHON_CUDA_PER_THREAD_DEFAULT_STREAM=1 on the driver range we support, and confirm the resolved function pointers actually point at the _ptds/_ptsz variants (and not the plain ones)? Easiest check would be to snapshot the __cuMemcpy / __cuMemcpyAsync / __cuMemsetD8 etc. addresses from a fixture under both modes against a known-good build.

-- Leo's bot
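The snapshot comparison suggested above might look like this in Python. It is a sketch only: how the pointers are captured is left out, and the snapshots are assumed to map symbol names to offsets from the driver library's base address so that runs in different processes are comparable. It also assumes that a `_ptds`/`_ptsz` variant resolves to a different address than the plain function. Both function names are hypothetical.

```python
from __future__ import annotations


def mismatches(known_good: dict[str, int], candidate: dict[str, int]) -> list[str]:
    """Symbols that resolve differently (or not at all) in the candidate build."""
    return sorted(
        name for name, offset in known_good.items() if candidate.get(name) != offset
    )


def not_steered(default_mode: dict[str, int], ptds_mode: dict[str, int]) -> list[str]:
    """Symbols whose PTDS-mode pointer equals the default-mode pointer."""
    return sorted(
        name for name, offset in ptds_mode.items() if default_mode.get(name) == offset
    )
```

An empty result from `mismatches` for the PTDS snapshots of both builds, together with an empty result from `not_steered` for the core copy, memset, and stream ops, would be the outcome the manual pass is looking for.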

_F_cuGetProcAddress_v2('cuStreamGetCtx', <void **>&__cuStreamGetCtx, 9020, ptds_mode, NULL)

global __cuStreamGetCtx_v2
_F_cuGetProcAddress_v2('cuStreamGetCtx_v2', <void **>&__cuStreamGetCtx_v2, 12050, ptds_mode, NULL)

leofang Jun 11, 2026

Member

The new generator passes the literal _v2-suffixed name to cuGetProcAddress_v2, while the old generator passed the base name and let cudaVersion disambiguate:

- _F_cuGetProcAddress_v2('cuStreamGetCtx',    &__cuStreamGetCtx_v2, 12050, PTDS,      NULL)  # main, _bindings/cydriver.pyx.in:799
+ _F_cuGetProcAddress_v2('cuStreamGetCtx_v2', &__cuStreamGetCtx_v2, 12050, ptds_mode, NULL)  # this PR

Same pattern applies to cuLaunchHostFunc_v2 (line 2090), cuCtxGetDevice_v2, cuCtxSynchronize_v2, cuMulticastBindAddr_v2, cuMulticastBindMem_v2 — six symbols in total.

The documented cuGetProcAddress contract is to pass the base name; that's the whole point of the API and what gives us versioned-symbol mapping for free. Passing the literal versioned name happens to work today (the driver likely also indexes by the literal name), but it's not the documented contract and bypasses the version arg. If a future driver tightens resolution to base-name-only, these six symbols silently become NULL and we get a hard-to-diagnose crash — exactly the breakage cuGetProcAddress was designed to prevent.

This is also the "re-creating the versioned symbol mapping" piece you flagged as missing from the new generator — could cybind be updated to emit the base name + cudaVersion and reconstruct the mapping, matching the old behavior?

-- Leo's bot
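The requested behavior, emitting the base name plus cudaVersion while still storing into the versioned pointer, can be sketched in Python as below. The helper names are hypothetical and this is not cybind's code. It also assumes that every `_vN` suffix should be stripped, which would need to be checked against the old generator's mapping.

```python
from __future__ import annotations

import re

# Splits a trailing `_v<digits>` suffix from a symbol name.
_VERSION_SUFFIX = re.compile(r"^(?P<base>\w+?)_v(?P<n>\d+)$")


def base_name(symbol: str) -> str:
    """Return the name without a trailing `_vN` suffix, if there is one."""
    m = _VERSION_SUFFIX.match(symbol)
    return m.group("base") if m else symbol


def load_call(symbol: str, cuda_version: int) -> str:
    """Render a loader line that passes the base name but fills the versioned pointer."""
    return (
        f"_F_cuGetProcAddress_v2('{base_name(symbol)}', "
        f"<void **>&__{symbol}, {cuda_version}, ptds_mode, NULL)"
    )
```

For `cuStreamGetCtx_v2` at 12050 this renders a call that passes `'cuStreamGetCtx'` and targets `__cuStreamGetCtx_v2`, leaving the cudaVersion argument to select the versioned entry point.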

Labels

cuda.bindings Everything related to the cuda.bindings module cuda.core Everything related to the cuda.core module enhancement Any code-related improvements P0 High priority - Must do!
