An Embedded Language for Accelerated Array Computations
Data.Array.Accelerate defines an embedded language of array computations for high-performance computing in Haskell. Computations on multi-dimensional, regular arrays are expressed in the form of parameterised collective operations (such as maps, reductions, and permutations). These computations are online-compiled and executed on a range of architectures.
For more details, see our papers:
- Accelerating Haskell Array Codes with Multicore GPUs
- Optimising Purely Functional GPU Programs (slides)
- Embedding Foreign Code
- Type-safe Runtime Code Generation: Accelerate to LLVM (slides) (video)
There are also slides from some fairly recent presentations:
- Embedded Languages for High-Performance Computing in Haskell
- GPGPU Programming in Haskell with Accelerate (video) (workshop)
Chapter 6 of Simon Marlow's book Parallel and Concurrent Programming in Haskell contains a tutorial introduction to Accelerate.
Trevor's PhD thesis details the design and implementation of the frontend optimisations and the CUDA backend.
Table of Contents
- An Embedded Language for Accelerated Array Computations
  - A simple example
  - Availability
  - Additional components
  - Requirements
  - Documentation
  - Examples
  - Mailing list and contacts
  - Citing Accelerate
  - What's missing?
A simple example
As a simple example, consider the computation of a dot product of two vectors of single-precision floating-point numbers:
dotp :: Acc (Vector Float) -> Acc (Vector Float) -> Acc (Scalar Float)
dotp xs ys = fold (+) 0 (zipWith (*) xs ys)
Except for the type, this code is almost the same as the corresponding Haskell code on lists of floats. The types indicate that the computation may be online-compiled for performance; for example, using `Data.Array.Accelerate.LLVM.PTX.run` it may be offloaded to a GPU on the fly.
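A minimal end-to-end sketch, assuming the accelerate package is installed. It runs `dotp` with the reference interpreter from `Data.Array.Accelerate.Interpreter` rather than the PTX backend, so no GPU is needed, and places the list version next to it for comparison. The names `dotpList` and `dotpRun` are hypothetical helpers, not part of the library.

```haskell
import qualified Data.Array.Accelerate             as A
import qualified Data.Array.Accelerate.Interpreter as I

-- The Accelerate version from above, written with qualified names.
dotp :: A.Acc (A.Vector Float) -> A.Acc (A.Vector Float) -> A.Acc (A.Scalar Float)
dotp xs ys = A.fold (+) 0 (A.zipWith (*) xs ys)

-- The corresponding list code: same structure, different types.
dotpList :: [Float] -> [Float] -> Float
dotpList xs ys = foldl (+) 0 (zipWith (*) xs ys)

-- Hypothetical helper: build vectors from lists, run the embedded
-- computation with the interpreter, and read back the single result.
dotpRun :: [Float] -> [Float] -> Float
dotpRun xs ys = head (A.toList (I.run (dotp (A.use (vec xs)) (A.use (vec ys)))))
  where
    vec :: [Float] -> A.Vector Float
    vec l = A.fromList (A.Z A.:. length l) l
```

For example, `dotpRun [1,2,3] [4,5,6]` and `dotpList [1,2,3] [4,5,6]` should both evaluate to `32.0`. Swapping `I.run` for a backend's `run` (such as `Data.Array.Accelerate.LLVM.PTX.run`) leaves `dotp` itself unchanged.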
Availability
Package accelerate is available from:
- Hackage: accelerate - install with `cabal install accelerate`
- GitHub: AccelerateHS/accelerate - get the source with `git clone https://github.com/AccelerateHS/accelerate.git`. The easiest way to compile the source distributions is via the Haskell stack tool.
Additional components
The following supported add-ons are available as separate packages:
- accelerate-llvm-native: Backend targeting multicore CPUs
- accelerate-llvm-ptx: Backend targeting CUDA-enabled NVIDIA GPUs. Requires a GPU with compute capability 2.0 or greater (see the table on Wikipedia)
- accelerate-examples: Computational kernels and applications showcasing the use of Accelerate as well as a regression test suite (supporting function and performance testing)
- accelerate-io: Fast conversion between Accelerate arrays and other array formats (for example, Repa and Vector)
- accelerate-fft: Fast Fourier transform implementation, with FFI bindings to optimised implementations
- accelerate-blas: BLAS and LAPACK operations, with FFI bindings to optimised implementations
- accelerate-bignum: Fixed-width large integer arithmetic
- colour-accelerate: Colour representations in Accelerate (RGB, sRGB, HSV, and HSL)
- gloss-accelerate: Generate gloss pictures from Accelerate
- gloss-raster-accelerate: Parallel rendering of raster images and animations
- lens-accelerate: Lens operators for Accelerate types
- linear-accelerate: Linear vector spaces in Accelerate
- mwc-random-accelerate: Generate Accelerate arrays filled with high quality pseudorandom numbers
- numeric-prelude-accelerate: Lifting the numeric-prelude to Accelerate
- wigner-ville-accelerate: Wigner-Ville time-frequency distribution.
Install them from Hackage with `cabal install PACKAGENAME`.
Documentation
- Haddock documentation is included and linked with the individual package releases on Hackage.
- Haddock documentation for in-development components can be found here.
- The idea behind the HOAS (higher-order abstract syntax) to de-Bruijn conversion used in the library is described separately.
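To give a flavour of the idea, here is a toy, untyped sketch of converting higher-order abstract syntax (binders as Haskell functions) to de Bruijn indices by counting binder depth. It is a deliberate simplification: Accelerate's actual conversion is type-preserving and also performs sharing recovery, neither of which this sketch attempts. All names here are hypothetical.

```haskell
-- HOAS terms: binders are Haskell functions. HLevel is a placeholder that
-- only the converter creates; it records the depth at which a variable
-- was bound.
data HOAS = HLam (HOAS -> HOAS) | HApp HOAS HOAS | HLevel Int

-- First-order terms using de Bruijn indices (0 = innermost binder).
data DB = DLam DB | DApp DB DB | DVar Int
  deriving (Eq, Show)

toDeBruijn :: HOAS -> DB
toDeBruijn = go 0
  where
    -- n is the number of binders enclosing the current subterm
    go :: Int -> HOAS -> DB
    go n (HLam f)   = DLam (go (n + 1) (f (HLevel n)))
    go n (HApp a b) = DApp (go n a) (go n b)
    go n (HLevel l) = DVar (n - l - 1)  -- binding level -> de Bruijn index
```

For example, `toDeBruijn (HLam (\x -> HLam (\y -> HApp x y)))` yields `DLam (DLam (DApp (DVar 1) (DVar 0)))`.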
Examples
The accelerate-examples package provides a range of computational kernels and a few complete applications. To install these from Hackage, issue `cabal install accelerate-examples`. The examples include:
- An implementation of Canny edge detection
- An interactive Mandelbrot set generator
- An N-body simulation of gravitational attraction between solid particles
- An implementation of the PageRank algorithm
- A simple ray-tracer
- A particle-based simulation of stable fluid flows
- A cellular automaton simulation
- A "password recovery" tool, for dictionary lookup of MD5 hashes
LULESH-accelerate is an implementation of the Livermore Unstructured Lagrangian Explicit Shock Hydrodynamics (LULESH) mini-app. LULESH represents a typical hydrodynamics code such as ALE3D, but is a highly simplified application, hard-coded to solve the Sedov blast problem on an unstructured hexahedron mesh.
Λ ○ λ (Lol)
Λ ○ λ (Lol) is a general-purpose library for ring-based lattice cryptography. Lol has applications in, for example, symmetric-key somewhat-homomorphic encryption schemes. The lol-accelerate package provides an Accelerate backend for Lol.
Accelerate users have also built some substantial applications of their own. Please feel free to add your own examples!
- Henning Thielemann, patch-image: Combine a collage of overlapping images
- apunktbau, bildpunkt: A ray-marching distance field renderer
- klarh, hasdy: Molecular dynamics in Haskell using Accelerate
- Alexandros Gremm used Accelerate as part of the 2014 CSCS summer school (code)
Mailing list and contacts
- Mailing list: firstname.lastname@example.org (discussions on both use and development are welcome)
- Sign up for the mailing list at the Accelerate Google Groups page.
- Bug reports and issues tracking: GitHub project page.
The maintainers of Accelerate are Manuel M T Chakravarty and Trevor L McDonell.
Citing Accelerate
If you use Accelerate for academic research, you are encouraged (though not required) to cite the following papers (BibTeX):
Manuel M. T. Chakravarty, Gabriele Keller, Sean Lee, Trevor L. McDonell, and Vinod Grover. Accelerating Haskell Array Codes with Multicore GPUs. In DAMP '11: Declarative Aspects of Multicore Programming, ACM, 2011.
Trevor L. McDonell, Manuel M. T. Chakravarty, Gabriele Keller, and Ben Lippmeier. Optimising Purely Functional GPU Programs. In ICFP '13: The 18th ACM SIGPLAN International Conference on Functional Programming, ACM, 2013.
Robert Clifton-Everest, Trevor L. McDonell, Manuel M. T. Chakravarty, and Gabriele Keller. Embedding Foreign Code. In PADL '14: The 16th International Symposium on Practical Aspects of Declarative Languages, Springer-Verlag, LNCS, 2014.
Trevor L. McDonell, Manuel M. T. Chakravarty, Vinod Grover, and Ryan R. Newton. Type-safe Runtime Code Generation: Accelerate to LLVM. In Haskell '15: The 8th ACM SIGPLAN Symposium on Haskell, ACM, 2015.
Accelerate is primarily developed by academics, so citations matter a lot to us. Citing it also increases Accelerate's exposure and its potential user (and developer!) base, which benefits everyone who uses it. Thanks in advance!
What's missing?
Here is a list of features that are currently missing:
- Preliminary API (parts of the API may still change in subsequent releases)
Change Log
Notable changes to the project will be documented in this file.
1.2.0.0 - 2018-04-03
- Internal debugging/RTS options handling has been changed. Compiling this package now implies that backends are also compiled in debug mode (no need to set the `-fdebug` cabal flag for those packages as well).
- Complex numbers are stored in the C-style array-of-struct representation.
- Improve numeric handling of complex numbers.
- Coercions (`bitcast`) now occur between the underlying representation types
- Front-end performance improvements
- Support for half-precision floating-point numbers.
- Support for struct-of-array-of-struct representations. Currently this is limited to fields 2, 3, 4, 8, or 16 elements wide.
- Add equivalents for
- Add instances and helper functions for
- Add rank generalised versions of
- Implement counters and reporting for
Special thanks to those who contributed patches as part of this release:
- Trevor L. McDonell (@tmcdonell)
- Ryan Scott (@ryanglscott)
- Rinat Striungis (@Haskell-mouse)
1.1.1.0 - 2017-09-26
- Improve and colourise the pretty-printer
1.1.0.0 - 2017-09-21
- Additional EKG monitoring hooks (#340)
- Changed type of `scanr'` to return an `Acc` tuple, rather than a tuple of
- Specialised folds such as `all` now reduce along the innermost dimension only, rather than reducing all elements. You can recover the old behaviour by first `flatten`-ing the input array.
- Add new stencil boundary condition `function`, to apply the given function to out-of-bounds indices.
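A sketch of how the changed operations behave, assuming accelerate 1.1 or later and using the reference interpreter. The helper names (`mat`, `rowSums`, `totalSum`, `scanrPair`, `smooth`) are hypothetical.

```haskell
import qualified Data.Array.Accelerate             as A
import qualified Data.Array.Accelerate.Interpreter as I

-- A 2x3 matrix [[1,2,3],[4,5,6]] in row-major order.
mat :: A.Matrix Int
mat = A.fromList (A.Z A.:. 2 A.:. 3) [1 .. 6]

-- sum reduces along the innermost dimension only: one total per row.
rowSums :: [Int]
rowSums = A.toList (I.run (A.sum (A.use mat)))

-- The old "reduce everything" behaviour: flatten to a vector first.
totalSum :: Int
totalSum = head (A.toList (I.run (A.sum (A.flatten (A.use mat)))))

-- scanr' returns an Acc tuple; running it yields an ordinary Haskell pair:
-- the right scan without its first element, and the overall reduction.
scanrPair :: [Int] -> ([Int], Int)
scanrPair xs = (A.toList scans, head (A.toList total))
  where
    v              = A.fromList (A.Z A.:. length xs) xs :: A.Vector Int
    (scans, total) = I.run (A.scanr' (+) 0 (A.use v))

-- A 3-point average whose out-of-bounds neighbours are supplied by the
-- `function` boundary condition (here: always 0).
smooth :: [Float] -> [Float]
smooth xs = A.toList (I.run (A.stencil avg (A.function (const 0)) (A.use v)))
  where
    v = A.fromList (A.Z A.:. length xs) xs :: A.Vector Float
    avg :: A.Stencil3 Float -> A.Exp Float
    avg (l, c, r) = (l + c + r) / 3
```

Here `rowSums` should be `[6,15]`, `totalSum` should be `21`, `scanrPair [1,2,3]` should be `([5,3,0],6)`, and `smooth [3,6,9]` should be `[3.0,6.0,5.0]`, since the missing neighbours at both ends are read as 0.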
1.0.0.0 - 2017-03-31
- Many API and internal changes
- Bug fixes and other enhancements
Fix type of
Bug fixes and performance improvements.
- New iteration constructs.
- Additional Prelude-like functions.
- Improved code generation and fusion optimisation.
- Concurrent kernel execution in the CUDA backend.
- Bug fixes.
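The entry does not name the new iteration constructs. Current versions of `Data.Array.Accelerate` export the scalar loop combinators `while` and `iterate`; assuming those are representative, a scalar loop can be sketched as follows. The helpers `runExp`, `doubleUntil`, and `doubleTimes` are hypothetical.

```haskell
import qualified Data.Array.Accelerate             as A
import qualified Data.Array.Accelerate.Interpreter as I

-- Run a scalar expression by wrapping it in a singleton (Scalar) array.
runExp :: A.Elt e => A.Exp e -> e
runExp e = head (A.toList (I.run (A.unit e)))

-- Keep doubling the start value while it is below the limit.
doubleUntil :: Int -> Int -> Int
doubleUntil limit start =
  runExp (A.while (\x -> x A.< A.constant limit) (* 2) (A.constant start))

-- Apply the doubling step a fixed number of times.
doubleTimes :: Int -> Int -> Int
doubleTimes n start = runExp (A.iterate (A.constant n) (* 2) (A.constant start))
```

For example, `doubleUntil 100 1` should give `128` and `doubleTimes 3 1` should give `8`.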
- New array fusion optimisation.
- New foreign function interface for array and scalar expressions.
- Additional Prelude-like functions.
- New example programs.
- Bug fixes and performance improvements.
- Full sharing recovery in scalar expressions and array computations.
- Two new example applications in package `accelerate-examples` (both including a graphical frontend):
  - A real-time Canny edge detection
  - An interactive fluid flow simulator
- Bug fixes.
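A small illustration of why sharing recovery matters, assuming the current accelerate API. Because the embedded program is built as a Haskell data structure, Haskell's own `let` sharing is not directly visible to the library; sharing recovery restores it. The names `doubledSum` and `runDoubledSum` are hypothetical.

```haskell
import qualified Data.Array.Accelerate             as A
import qualified Data.Array.Accelerate.Interpreter as I

-- `ys` is used twice. Without sharing recovery, the reified program would
-- contain the `map` subterm twice; with it, `ys` becomes a single
-- let-bound array in the embedded program instead of a duplicated subterm.
doubledSum :: A.Acc (A.Vector Int) -> A.Acc (A.Vector Int)
doubledSum xs =
  let ys = A.map (* 2) xs
  in  A.zipWith (+) ys ys

runDoubledSum :: [Int] -> [Int]
runDoubledSum l =
  A.toList (I.run (doubledSum (A.use (A.fromList (A.Z A.:. length l) l))))
```

For example, `runDoubledSum [1,2,3]` should give `[4,8,12]`; sharing recovery affects how the program is represented, not its result.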
- New Prelude-like functions
- New simplified AST (in package `accelerate-backend-kit`) for backend writers who want to avoid the complexities of the type-safe AST.
- Complete sharing recovery for scalar expressions (but currently disabled by default).
- Also bug fixes in array sharing recovery and a few new convenience functions.
- Streaming computations
- Repa-style array indices
- Additional collective operations supported by the CUDA backend:
- Conversions to other array formats
- Bug fixes
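Repa-style indices describe shapes as snoc-lists built from `Z` and `(:.)`, with the innermost (fastest-varying) dimension last. A minimal sketch, assuming the current accelerate API; `grid`, `element`, and `gridShape` are hypothetical names.

```haskell
import qualified Data.Array.Accelerate as A

-- A 3x4 array filled with 0, 1, 2, ... in row-major order.
grid :: A.Array A.DIM2 Int
grid = A.fromList (A.Z A.:. 3 A.:. 4) [0 ..]

-- Read the element at row 1, column 2: offset 1 * 4 + 2 = 6.
element :: Int
element = A.indexArray grid (A.Z A.:. 1 A.:. 2)

-- The shape is itself an index value.
gridShape :: A.DIM2
gridShape = A.arrayShape grid
```

Here `element` should be `6`, and `gridShape` should equal `Z :. 3 :. 4`.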
Bug fixes and some performance tweaks.
- More collective operations supported by the CUDA backend: `foldSeg`. Frontend and interpreter support for
- Bug fixes.
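`foldSeg` reduces each segment of the innermost dimension independently, where the segment descriptor lists the length of each segment. A minimal sketch, assuming the current accelerate API and the reference interpreter; `segmentSums` is a hypothetical helper.

```haskell
import qualified Data.Array.Accelerate             as A
import qualified Data.Array.Accelerate.Interpreter as I

-- Sum each segment of `xs`, with segment lengths given by `lens`.
segmentSums :: [Int] -> [Int] -> [Int]
segmentSums xs lens =
  A.toList (I.run (A.foldSeg (+) 0 (A.use values) (A.use segs)))
  where
    values = A.fromList (A.Z A.:. length xs) xs     :: A.Vector Int
    segs   = A.fromList (A.Z A.:. length lens) lens :: A.Segments Int
```

For example, `segmentSums [1,2,3,4,5,6] [1,2,3]` should give `[1,5,15]`: the segments are `[1]`, `[2,3]`, and `[4,5,6]`.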
Initial release of the CUDA backend