Data.Array.Accelerate
defines an embedded array language for high-performance computing in Haskell.
Computations on multi-dimensional, regular arrays are expressed in the form of
parameterised collective operations, such as maps, reductions, and
permutations. These computations may then be compiled online and executed on a
range of architectures.
- A simple example
As a simple example, consider the computation of a dot product of two vectors
of floating point numbers:
dotp :: Acc (Vector Float) -> Acc (Vector Float) -> Acc (Scalar Float)
dotp xs ys = fold (+) 0 (zipWith (*) xs ys)
Except for the type, this code is almost the same as the corresponding Haskell
code on lists of floats. The types indicate that the computation may be
compiled online for performance - for example, using Data.Array.Accelerate.CUDA
it may be off-loaded to the GPU on the fly.
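For reference, a complete program using dotp might look like the following
minimal sketch (the input values are illustrative only). It evaluates the
computation with the interpreter included in this package; swapping the backend
import for Data.Array.Accelerate.CUDA would run the same program on the GPU.

import Data.Array.Accelerate             as A
import Data.Array.Accelerate.Interpreter as I

dotp :: Acc (Vector Float) -> Acc (Vector Float) -> Acc (Scalar Float)
dotp xs ys = A.fold (+) 0 (A.zipWith (*) xs ys)

main :: IO ()
main = do
  let xs = A.fromList (Z :. 4) [1, 2, 3, 4] :: Vector Float
      ys = A.fromList (Z :. 4) [5, 6, 7, 8] :: Vector Float
  -- 'use' embeds ordinary Haskell arrays into the embedded language, and
  -- 'run' compiles and evaluates the program with the chosen backend.
  print (I.run (dotp (A.use xs) (A.use ys)))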
- Available backends
Currently, there are two backends:
An interpreter, included in this package, that serves as a reference
implementation of the intended semantics of the language.
A CUDA backend generating code for CUDA-capable NVIDIA GPUs:
http://hackage.haskell.org/package/accelerate-cuda
Several experimental and/or incomplete backends also exist. If you are
particularly interested in any of these, especially in helping to finish them,
please contact us.
Cilk/ICC and OpenCL: https://github.com/AccelerateHS/accelerate-backend-kit
Another OpenCL backend: https://github.com/HIPERFIT/accelerate-opencl
A backend to the Repa array library: https://github.com/blambo/accelerate-repa
An infrastructure for generating LLVM code, with backends targeting
multicore CPUs and NVIDIA GPUs: https://github.com/AccelerateHS/accelerate-llvm/
- Additional components
The following support packages are available:
accelerate-cuda: A high-performance parallel backend targeting CUDA-enabled
NVIDIA GPUs. Requires the NVIDIA CUDA SDK and, for full functionality, hardware
with compute capability 1.1 or greater. See the table on Wikipedia for
supported GPUs: http://en.wikipedia.org/wiki/CUDA#Supported_GPUs
accelerate-examples: Computational kernels and applications showcasing
Accelerate, as well as performance and regression tests.
accelerate-io: Fast conversion between Accelerate arrays and other formats,
including vector and repa.
accelerate-fft: Computation of Discrete Fourier Transforms.
Install them from Hackage with cabal install PACKAGE.
- Examples and documentation
Haddock documentation is included in the package, and a tutorial is available
on the GitHub wiki: https://github.com/AccelerateHS/accelerate/wiki
The accelerate-examples package demonstrates a range of computational kernels
and several complete applications, including:
An implementation of the Canny edge detection algorithm
An interactive Mandelbrot set generator
A particle-based simulation of stable fluid flows
An n-body simulation of gravitational attraction between solid particles
A cellular automaton simulation
A "password recovery" tool, for dictionary lookup of MD5 hashes
A simple interactive ray tracer
- Mailing list and contacts
- Release notes
0.15.0.0: Bug fixes and performance improvements.
0.14.0.0: New iteration constructs. Additional Prelude-like functions.
Improved code generation and fusion optimisation. Concurrent kernel
execution. Bug fixes.
0.13.0.0: New array fusion optimisation. New foreign function
interface for array and scalar expressions. Additional Prelude-like
functions. New example programs. Bug fixes and performance improvements.
0.12.0.0: Full sharing recovery in scalar expressions and array computations.
Two new example applications in package accelerate-examples: real-time Canny
edge detection and a fluid flow simulator (both including a graphical
frontend). Bug fixes.
0.11.0.0: New Prelude-like functions zip*, unzip*, fill, enumFrom*, tail, init,
drop, take, slit, gather*, scatter*, and shapeSize. New simplified AST (in
package accelerate-backend-kit) for backend writers who want to avoid the
complexities of the type-safe AST.
0.10.0.0: Complete sharing recovery for scalar expressions (but
currently disabled by default). Also bug fixes in array sharing recovery
and a few new convenience functions.
0.9.0.0: Streaming, precompilation, Repa-style indices, stencils, more scans,
rank-polymorphic fold, generate, block I/O, and many bug fixes.
0.8.1.0: Bug fixes and some performance tweaks.
0.8.0.0: replicate, slice, and foldSeg supported in the CUDA backend; frontend
and interpreter support for stencil. Bug fixes.
0.7.1.0: The CUDA backend and a number of scalar functions.
- Hackage note
The module documentation list generated by Hackage is incorrect. The only
exposed modules should be: