cuda

FFI binding to the CUDA driver interface for programming NVIDIA GPUs

https://github.com/tmcdonell/cuda

LTS Haskell 23.28:0.11.0.1
Stackage Nightly 2026-03-31:0.13.0.0
Latest on Hackage:0.13.0.0

Haskell FFI Bindings to CUDA

The CUDA library provides a direct, general-purpose, C-like SPMD programming model for NVIDIA graphics cards (G8x series onwards). This package is a collection of bindings that let you call and control, although not write, such functions from Haskell-land. You will need to install the CUDA driver and developer toolkit.

http://developer.nvidia.com/object/cuda.html
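
Once the driver and toolkit are installed, a small device query is a reasonable first check that the bindings work. The following is a minimal sketch, assuming that Foreign.CUDA.Driver exports initialise, count, device and props, and that the device properties record has deviceName and computeCapability fields; these names are written from memory of earlier releases and should be checked against the version you install. It needs a machine with a working CUDA driver to run.

```haskell
import Control.Monad (forM_)
import qualified Foreign.CUDA.Driver as CUDA

-- List the CUDA devices visible to the driver.
-- Assumption: the function and field names below match the installed
-- version of the cuda package; check the Haddocks if this fails to compile.
main :: IO ()
main = do
  CUDA.initialise []                      -- must be called before any other driver function
  n <- CUDA.count                         -- number of CUDA-capable devices
  putStrLn ("CUDA devices found: " ++ show n)
  forM_ [0 .. n - 1] $ \i -> do
    dev <- CUDA.device i                  -- handle to the i-th device
    prp <- CUDA.props dev                 -- static properties of that device
    putStrLn $ show i ++ ": " ++ CUDA.deviceName prp
            ++ ", compute capability " ++ show (CUDA.computeCapability prp)
```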

The configure step looks for your CUDA installation in the standard places and, if the nvcc compiler is found in your PATH, in locations relative to it.
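
To illustrate the idea of locating the toolkit relative to nvcc, here is a small sketch. It is not the package's actual setup script: the helper toolkitRootFromNvcc is hypothetical, and it assumes the usual layout in which nvcc lives in a bin directory directly below the toolkit root.

```haskell
import System.Directory (findExecutable)
import System.FilePath (takeDirectory)

-- Hypothetical helper: given the full path to nvcc
-- (for example /usr/local/cuda/bin/nvcc), guess the toolkit root by
-- dropping the executable name and then the bin directory.
toolkitRootFromNvcc :: FilePath -> FilePath
toolkitRootFromNvcc = takeDirectory . takeDirectory

main :: IO ()
main = do
  found <- findExecutable "nvcc"          -- searches the directories in PATH
  case found of
    Just nvcc -> putStrLn ("candidate toolkit root: " ++ toolkitRootFromNvcc nvcc)
    Nothing   -> putStrLn "nvcc is not in PATH; standard install locations would be tried instead"
```

If this prints a directory other than the installation you intended to build against, adjusting PATH before configuring is the simplest thing to try.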

For important information on installing on Windows, see:

https://github.com/tmcdonell/cuda/blob/master/WINDOWS.md

Missing functionality

This library is currently in maintenance mode. While we are happy to release updates to keep the existing interface working with newer CUDA versions (as long as the underlying APIs remain available), no binding of new features is planned at the moment. Get in touch if you want to contribute.

Here is an incomplete historical list of missing bindings. Pull requests welcome!

CUDA-9

  • cuLaunchCooperativeKernelMultiDevice

CUDA-10.0

  • cuDeviceGetLuid (Windows only?)
  • cuLaunchHostFunc
  • cuGraphHostNode[Get/Set]Params
  • cuGraphKernelNode[Get/Set]Params
  • cuGraphMemcpyNode[Get/Set]Params
  • cuGraphMemsetNode[Get/Set]Params

CUDA-10.2

  • cuDeviceGetNvSciSyncAttributes
  • cuMemAddressFree
  • cuMemAddressReserve
  • cuMemCreate
  • cuMemExportToShareableHandle
  • cuMemGetAccess
  • cuMemGetAllocationGranularity
  • cuMemGetAllocationPropertiesFromHandle
  • cuMemImportFromShareableHandle
  • cuMemMap
  • cuMemRelease
  • cuMemSetAccess
  • cuMemUnmap
  • cuGraphExecHostNodeSetParams
  • cuGraphExecMemcpyNodeSetParams
  • cuGraphExecMemsetNodeSetParams
  • cuGraphExecUpdate

CUDA-11.0

  • cuCtxResetPersistingL2Cache
  • cuMemRetainAllocationHandle
  • cuStreamCopyAttributes
  • cuStreamGetAttribute
  • cuStreamSetAttribute
  • cuGraphKernelNodeCopyAttributes
  • cuGraphKernelNodeGetAttribute
  • cuGraphKernelNodeSetAttribute
  • cuOccupancyAvailableDynamicSMemPerBlock

CUDA-11.1

  • cuDeviceGetTexture1DLinearMaxWidth
  • cuArrayGetSparseProperties
  • cuMipmappedArrayGetSparseProperties
  • cuMemMapArrayAsync
  • cuEventRecordWithFlags
  • cuGraphAddEventRecordNode
  • cuGraphAddEventWaitNode
  • cuGraphEventRecordNodeGetEvent
  • cuGraphEventRecordNodeSetEvent
  • cuGraphEventWaitNodeGetEvent
  • cuGraphEventWaitNodeSetEvent
  • cuGraphExecChildGraphNodeSetParams
  • cuGraphExecEventRecordNodeSetEvent
  • cuGraphExecEventWaitNodeSetEvent
  • cuGraphUpload

CUDA-11.2

  • cuDeviceGetDefaultMemPool
  • cuDeviceGetMemPool
  • cuDeviceSetMemPool
  • cuArrayGetPlane
  • cuMemAllocAsync
  • cuMemAllocFromPoolAsync
  • cuMemFreeAsync
  • cuMemPoolCreate
  • cuMemPoolDestroy
  • cuMemPoolExportPointer
  • cuMemPoolExportToShareableHandle
  • cuMemPoolGetAccess
  • cuMemPoolGetAttribute
  • cuMemPoolImportFromShareableHandle
  • cuMemPoolImportPointer
  • cuMemPoolSetAccess
  • cuMemPoolSetAttribute
  • cuMemPoolTrimTo
  • cuGraphAddExternalSemaphoresSignalNode
  • cuGraphAddExternalSemaphoresWaitNode
  • cuGraphExecExternalSemaphoresSignalNodeSetParams
  • cuGraphExecExternalSemaphoresWaitNodeSetParams
  • cuGraphExternalSemaphoresSignalNodeGetParams
  • cuGraphExternalSemaphoresSignalNodeSetParams
  • cuGraphExternalSemaphoresWaitNodeGetParams
  • cuGraphExternalSemaphoresWaitNodeSetParams

CUDA-11.3

  • cuStreamGetCaptureInfo_v2
  • cuFuncGetModule
  • cuGraphDebugDotPrint
  • cuGraphReleaseUserObject
  • cuGraphRetainUserObject
  • cuUserObjectCreate
  • cuUserObjectRelease
  • cuUserObjectRetain
  • cuGetProcAddress

CUDA-11.4

  • cuDeviceGetUuid_v2
  • cuCtxCreate_v3
  • cuCtxGetExecAffinity
  • cuDeviceGetGraphMemAttribute
  • cuDeviceGraphMemTrim
  • cuDeviceSetGraphMemAttribute
  • cuGraphAddMemAllocNode
  • cuGraphAddMemFreeNode
  • cuGraphInstantiateWithFlags
  • cuGraphMemAllocNodeGetParams
  • cuGraphMemFreeNodeGetParams

CUDA >= 12

A lot. PRs welcome.

  • CUDA-12.3
    • Edge data in the driver Graph API (cuGraphAddDependencies_v2 etc.)

Old compatibility notes

The setup script for this package requires at least Cabal-1.24. If you run into trouble with this:

  • Cabal users: ensure you are using a recent cabal executable and have run cabal update at some point in the last few years. If you have previously run cabal install on libraries and have a broken environment as a result, remove ~/.ghc/<platform>/environments/default.
  • Stack users: one may attempt stack setup --upgrade-cabal.

Due to an interaction between GHC-8 and unified virtual address spaces in CUDA, this package does not currently work with GHCi on ghc-8.0.1 (compiled programs should work). The bug should be fixed in ghc-8.0.2 and beyond.

Changes

Change Log

Notable changes to the project will be documented in this file.

The format is based on Keep a Changelog.

NOTE: The version numbers of this package roughly align to the latest version of the CUDA API this package is built against. This means that this package DOES NOT follow the PVP, or indeed any sensible version scheme, because NVIDIA are A-OK introducing breaking changes in minor updates.

[0.13.0.0] - 2026-03-30

Added

  • Support for CUDA-13

Removed

  • Support for the runtime API (Foreign.CUDA.Runtime). There is an experimental cuda-runtime package in the Git repository; contact us if you depend on this.

[0.12.8.0] - 2025-08-21

Added

  • Support for CUDA-12
    • Thanks to @noahmartinwilliams on GitHub for helping out!

Removed

  • The following modules have been deprecated for a long time, and have finally been removed in CUDA-12:
    • Foreign.CUDA.Driver.Texture
    • Foreign.CUDA.Runtime.Texture

    Support for Texture Objects (their replacement) is missing in these bindings so far. Contributions welcome.

0.11.0.1 - 2023-08-15

Fixed

  • Build fixes for GHC 9.2 .. 9.6

0.11.0.0 - 2021-07-05

Added

  • Add support for CUDA-11.[0..4]

0.10.2.0 - 2020-08-26

Added

  • Add support for CUDA-10.2
  • Add support for Cabal-3
  • Add device properties for SM7.x, SM8

0.10.1.0 - 2019-04-29

Added

  • Add support for CUDA-10.1

Changed

  • The function Foreign.CUDA.Driver.Graph.Capture.start has an extra parameter to specify the capture mode

Removed

  • The following functions have been deprecated (since at least CUDA-8) and are finally removed:
    • Foreign.CUDA.Runtime.Exec.launch
    • Foreign.CUDA.Runtime.Exec.setParams
    • Foreign.CUDA.Runtime.Exec.setConfig

0.10.0.0 - 2018-10-02

Added

  • Device properties for SM7
  • Functions from CUDA-9.2
    • Device.uuid
    • Stream.getContext
  • Functions from CUDA-10.0
    • Foreign.CUDA.Driver.Graph*
  • Additional bindings from older CUDA releases

Changed

  • Replace uses of String with ShortByteString
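
Code that previously passed a String may need a conversion after this change. The following sketch uses only the bytestring package; the helper names toShortName and fromShortName are hypothetical, and the conversion assumes ASCII names, since Data.ByteString.Char8.pack truncates characters above code point 255.

```haskell
import qualified Data.ByteString.Char8 as BC
import qualified Data.ByteString.Short as SBS

-- Hypothetical helper: convert an ASCII name (for example a kernel or
-- symbol name) from String to ShortByteString.
toShortName :: String -> SBS.ShortByteString
toShortName = SBS.toShort . BC.pack

-- Hypothetical helper: convert back, for example for printing.
fromShortName :: SBS.ShortByteString -> String
fromShortName = BC.unpack . SBS.fromShort

main :: IO ()
main = do
  let name = toShortName "vectorAdd"
  print (SBS.length name)        -- length in bytes
  putStrLn (fromShortName name)
```

With the OverloadedStrings extension enabled, string literals can also be used directly where a ShortByteString is expected.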

Removed

  • Support for ghc-7.6

0.9.0.3 - 2018-03-12

Fixed

  • Build fix for Cabal-2.2 (ghc-8.4)

0.9.0.2 - 2018-03-07

Fixed

  • Build fix for Nix (#53)

0.9.0.1 - 2018-02-16

Fixed

  • Build fix for macOS High Sierra (10.13)

0.9.0.0 - 2017-11-15

Fixed

  • Build fixes for CUDA-9

Added

  • Peer.getAttribute
  • Exec.launchKernelCooperative

Changed

  • Changed type of Stream.wait and Stream.write to support 64-bit values

0.8.0.1 - 2017-10-24

Fixed

  • Escape backslashes used in -D flags on Windows (#50)

0.8.0.0 - 2017-08-24

Changed

  • Tested with CUDA toolkit 8.0

Added

  • Add operations for unified addressing in the device API
  • Add write and wait operations for streams in the device API
  • (internals) The paths this module was configured against are exposed by the module Foreign.CUDA.Paths.

0.7.5.3 - 2017-03-23

Fixed

  • Bug fix in occupancy calculator

0.7.5.2 - 2017-01-06

Fixed

  • Build fails with library profiling (#43)
  • On Windows, the Cabal installer is looking in the wrong place (#45)
  • Windows install fix (#47)

0.7.5.1 - 2016-10-21

Fixed

  • Re-enable support for Cabal-1.22
  • Unknown CUDA device compute capability 6.1 (#40)
  • Compilation fails for CUDA-8 [was: ghc 7.10.3 fail to install] (#44)

0.7.5.0 - 2016-10-07

Changed

  • Tested with CUDA toolkit 7.5

Added

  • Add functions from CUDA-7.5
  • Add profiler control functions
  • Add function mallocHostForeignPtr

0.7.0.0 - 2015-11-30

Changed

  • Add support for operations from CUDA-7.0
  • Add support for online linking
  • Add support for inter-process communication
  • Bug fixes, extra documentation, improve library coverage.
  • Mac OS X no longer requires the DYLD_LIBRARY_PATH environment variable in order to compile or run programs that use this package.

0.6.7.0 - 2015-09-12

Added

  • Add support for building on Windows (thanks to @mwu-tow)

0.6.6.2 - 2015-04-04

Fixed

  • Build fix

0.6.6.1 - 2015-04-04 [YANKED]

Fixed

  • Build fixes for ghc-7.6 and ghc-7.10

0.6.6.0 - 2015-03-10

Added

  • Add compute-capability data for 3.7, 5.2 devices.

Changed

  • Combine the definition of the ‘Event’ and ‘Stream’ data types. As of CUDA-3.1 these data structures are equivalent, and can be safely shared between runtime and driver API calls and libraries.

  • Mark FFI imports of potentially long-running API functions as safe. This allows them to be safely called from Haskell threads without blocking the entire HEC.
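
The effect of a safe import can be sketched without CUDA at all. The example below is an illustration only and is not taken from this package: it imports the POSIX sleep function as a stand-in for a long-running API call. When compiled with the threaded runtime (ghc -threaded), a safe foreign call releases the capability (HEC) it was running on, so other Haskell threads can continue while the call blocks.

```haskell
{-# LANGUAGE ForeignFunctionInterface #-}
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import Foreign.C.Types (CUInt (..))

-- POSIX sleep(3), used here as a stand-in for a long-running foreign call.
-- Marked 'safe' so that the runtime can keep scheduling other Haskell
-- threads while the call is in progress (with the threaded RTS).
foreign import ccall safe "unistd.h sleep"
  c_sleep :: CUInt -> IO CUInt

main :: IO ()
main = do
  done <- newEmptyMVar
  _ <- forkIO $ do
    _ <- c_sleep 1               -- blocks only the OS thread making the call
    putMVar done ()
  putStrLn "main thread continues while the foreign call is pending"
  takeMVar done                  -- wait for the forked thread to finish
  putStrLn "foreign call finished"
```

An unsafe import has lower call overhead but holds the capability for the duration of the call, which is why it is usually reserved for functions known to return quickly.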

Removed

  • Drop support for CUDA 3.0 and older.

0.6.5.1 - 2014-12-02

Fixed

  • Build fix for Mac OS X 10.10 (Yosemite)

0.6.5.0 - 2014-09-03

Changed

  • Tested with CUDA toolkit 6.5

Added

  • Add functions from CUDA-6.5