accelerate
An embedded language for accelerated array processing https://github.com/AccelerateHS/accelerate/
LTS Haskell 11.22:  1.1.1.0 
Stackage Nightly 2018-03-12:  1.1.1.0 
Latest on Hackage:  1.2.0.1 
Module documentation for 1.1.1.0
An Embedded Language for Accelerated Array Computations
Data.Array.Accelerate defines an embedded language of array computations for high-performance computing in Haskell. Computations on multi-dimensional, regular arrays are expressed in the form of parameterised collective operations (such as maps, reductions, and permutations). These computations are online-compiled and executed on a range of architectures.
For more details, see our papers:
 Accelerating Haskell Array Codes with Multicore GPUs
 Optimising Purely Functional GPU Programs (slides)
 Embedding Foreign Code
 Type-safe Runtime Code Generation: Accelerate to LLVM (slides) (video)
There are also slides from some fairly recent presentations:
 Embedded Languages for High-Performance Computing in Haskell
 GPGPU Programming in Haskell with Accelerate (video) (workshop)
Chapter 6 of Simon Marlow’s book Parallel and Concurrent Programming in Haskell contains a tutorial introduction to Accelerate.
Trevor’s PhD thesis details the design and implementation of the frontend optimisations and the CUDA backend.
A simple example
As a simple example, consider the computation of a dot product of two vectors of single-precision floating-point numbers:
dotp :: Acc (Vector Float) -> Acc (Vector Float) -> Acc (Scalar Float)
dotp xs ys = fold (+) 0 (zipWith (*) xs ys)
Except for the type, this code is almost the same as the corresponding Haskell code on lists of floats. The types indicate that the computation may be online-compiled for performance; for example, using Data.Array.Accelerate.LLVM.PTX.run it may be offloaded on the fly to a GPU.
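To try the dot product without a GPU or LLVM toolchain, the following is a minimal sketch of a complete program. It assumes the reference interpreter backend (Data.Array.Accelerate.Interpreter) that ships with the accelerate package; the input vectors xs and ys and their values are made up for illustration.

```haskell
-- Minimal sketch: run dotp with the reference interpreter backend.
-- The input vectors below are illustrative values, not from the README.
import Data.Array.Accelerate             as A
import Data.Array.Accelerate.Interpreter (run)

-- The embedded computation: multiply element-wise, then sum.
dotp :: Acc (Vector Float) -> Acc (Vector Float) -> Acc (Scalar Float)
dotp xs ys = A.fold (+) 0 (A.zipWith (*) xs ys)

-- Ordinary Haskell-side arrays, built from lists.
xs, ys :: Vector Float
xs = fromList (Z :. 3) [1, 2, 3]
ys = fromList (Z :. 3) [4, 5, 6]

main :: IO ()
main =
  -- 'use' embeds the arrays into the Acc language; 'run' executes the
  -- computation; 'toList' reads the single result element back out.
  print (toList (run (dotp (use xs) (use ys))))  -- prints [32.0]
```

The interpreter is intended for testing rather than speed. Importing run from one of the LLVM backends instead (for example Data.Array.Accelerate.LLVM.PTX, as mentioned above) should leave the rest of the program unchanged.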
Availability
Package accelerate is available from
 Hackage: accelerate (install with cabal install accelerate)
 GitHub: AccelerateHS/accelerate (get the source with git clone https://github.com/AccelerateHS/accelerate.git)
The easiest way to compile the source distributions is via the Haskell stack tool.
Additional components
The following supported add-ons are available as separate packages:
 accelerate-llvm-native: Backend targeting multicore CPUs
 accelerate-llvm-ptx: Backend targeting CUDA-enabled NVIDIA GPUs. Requires a GPU with compute capability 2.0 or greater (see the table on Wikipedia)
 accelerate-examples: Computational kernels and applications showcasing the use of Accelerate as well as a regression test suite (supporting function and performance testing)
 accelerate-io: Fast conversion between Accelerate arrays and other array formats (for example, Repa and Vector)
 accelerate-fft: Fast Fourier transform implementation, with FFI bindings to optimised implementations
 accelerate-blas: BLAS and LAPACK operations, with FFI bindings to optimised implementations
 accelerate-bignum: Fixed-width large integer arithmetic
 colour-accelerate: Colour representations in Accelerate (RGB, sRGB, HSV, and HSL)
 gloss-accelerate: Generate gloss pictures from Accelerate
 gloss-raster-accelerate: Parallel rendering of raster images and animations
 lens-accelerate: Lens operators for Accelerate types
 linear-accelerate: Linear vector spaces in Accelerate
 mwc-random-accelerate: Generate Accelerate arrays filled with high-quality pseudorandom numbers
 numeric-prelude-accelerate: Lifting the numeric-prelude to Accelerate
 wigner-ville-accelerate: Wigner-Ville time-frequency distribution.
Install them from Hackage with cabal install PACKAGENAME.
Documentation
 Haddock documentation is included and linked with the individual package releases on Hackage.
 Haddock documentation for in-development components can be found here.
 The idea behind the HOAS (higher-order abstract syntax) to de Bruijn conversion used in the library is described separately.
Examples
accelerate-examples
The accelerate-examples package provides a range of computational kernels and a few complete applications. To install these from Hackage, issue cabal install accelerate-examples. The examples include:
 An implementation of Canny edge detection
 An interactive Mandelbrot set generator
 An N-body simulation of gravitational attraction between solid particles
 An implementation of the PageRank algorithm
 A simple raytracer
 A particle-based simulation of stable fluid flows
 A cellular automata simulation
 A “password recovery” tool, for dictionary lookup of MD5 hashes
LULESH
LULESH-accelerate is an implementation of the Livermore Unstructured Lagrangian Explicit Shock Hydrodynamics (LULESH) mini-app. LULESH represents a typical hydrodynamics code such as ALE3D, but is a highly simplified application, hard-coded to solve the Sedov blast problem on an unstructured hexahedron mesh.
Λ ○ λ (Lol)
Λ ○ λ (Lol) is a general-purpose library for ring-based lattice cryptography. Lol has applications in, for example, symmetric-key somewhat-homomorphic encryption schemes. The lol-accelerate package provides an Accelerate backend for Lol.
Additional examples
Accelerate users have also built some substantial applications of their own. Please feel free to add your own examples!
 Henning Thielemann, patch-image: Combine a collage of overlapping images
 apunktbau, bildpunkt: A ray-marching distance field renderer
 klarh, hasdy: Molecular dynamics in Haskell using Accelerate
 Alexandros Gremm used Accelerate as part of the 2014 CSCS summer school (code)
Mailing list and contacts
 Mailing list: accelerate-haskell@googlegroups.com (discussions on both use and development are welcome)
 Sign up for the mailing list at the Accelerate Google Groups page.
 Bug reports and issues tracking: GitHub project page.
The maintainers of Accelerate are Manuel M T Chakravarty (chak@cse.unsw.edu.au) and Trevor L McDonell (tmcdonell@cse.unsw.edu.au).
Citing Accelerate
If you use Accelerate for academic research, you are encouraged (though not required) to cite the following papers (BibTeX):

Manuel M. T. Chakravarty, Gabriele Keller, Sean Lee, Trevor L. McDonell, and Vinod Grover. Accelerating Haskell Array Codes with Multicore GPUs. In DAMP ’11: Declarative Aspects of Multicore Programming, ACM, 2011.

Trevor L. McDonell, Manuel M. T. Chakravarty, Gabriele Keller, and Ben Lippmeier. Optimising Purely Functional GPU Programs. In ICFP ’13: The 18th ACM SIGPLAN International Conference on Functional Programming, ACM, 2013.

Robert Clifton-Everest, Trevor L. McDonell, Manuel M. T. Chakravarty, and Gabriele Keller. Embedding Foreign Code. In PADL ’14: The 16th International Symposium on Practical Aspects of Declarative Languages, Springer-Verlag, LNCS, 2014.

Trevor L. McDonell, Manuel M. T. Chakravarty, Vinod Grover, and Ryan R. Newton. Type-safe Runtime Code Generation: Accelerate to LLVM. In Haskell ’15: The 8th ACM SIGPLAN Symposium on Haskell, ACM, 2015.
Accelerate is primarily developed by academics, so citations matter a lot to us. As an added benefit, you increase Accelerate’s exposure and potential user (and developer!) base, which is a benefit to all users of Accelerate. Thanks in advance!
What’s missing?
Here is a list of features that are currently missing:
 Preliminary API (parts of the API may still change in subsequent releases)
Changes
Change Log
Notable changes to the project will be documented in this file.
The format is based on Keep a Changelog and the project adheres to the Haskell Package Versioning Policy (PVP)
1.2.0.1 - 2018-10-06
Fixed
 Build fix for ghc-8.6
1.2.0.0 - 2018-04-03
Changed
 Internal debugging/RTS options handling has been changed. Compiling this package now implies that backends are also compiled in debug mode (no need to set the -fdebug cabal flag for those packages as well).
 Complex numbers are stored in the C-style array-of-struct representation.
 Improve numeric handling of complex numbers.
 Coercions (bitcast) now occur between the underlying representation types
 Frontend performance improvements
Added
 Support for half-precision floating-point numbers.
 Support for struct-of-array-of-struct representations. Currently this is limited to fields that are 2, 3, 4, 8, or 16 elements wide.
 Add equivalents for Data.Functor, Data.Semigroup (ghc-8+)
 Add instances and helper functions for Maybe and Either types
 Add rank generalised versions of take, drop, head, tail, init, slit, reverse and transpose.
 Implement counters and reporting for -ddump-gc-stats
Contributors
Special thanks to those who contributed patches as part of this release:
 Trevor L. McDonell (@tmcdonell)
 Ryan Scott (@ryanglscott)
 Rinat Striungis (@Haskellmouse)
1.1.1.0 - 2017-09-26
Changed
 Improve and colourise the pretty-printer
1.1.0.0 - 2017-09-21
Added
 Additional EKG monitoring hooks (#340)
 Operations from RealFloat
Changed
 Changed type of scanl', scanr' to return an Acc tuple, rather than a tuple of Acc arrays.
 Specialised folds sum, product, minimum, maximum, and, or, any, all now reduce along the innermost dimension only, rather than reducing all elements. You can recover the old behaviour by first flattening the input array.
 Add new stencil boundary condition function, to apply the given function to out-of-bounds indices.
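The change to the specialised folds can be illustrated with a small sketch. It assumes the reference interpreter backend (Data.Array.Accelerate.Interpreter) from the accelerate package; the 2x3 matrix mat and the helper names rowSums and total are hypothetical, chosen only for this example.

```haskell
-- Sketch: 'sum' reduces only the innermost dimension; flatten first
-- to reduce every element. 'mat', 'rowSums' and 'total' are
-- illustrative names, not part of the library.
import Data.Array.Accelerate             as A
import Data.Array.Accelerate.Interpreter (run)

-- A 2x3 matrix in row-major order: rows [1,2,3] and [4,5,6].
mat :: Array DIM2 Int
mat = fromList (Z :. 2 :. 3) [1, 2, 3, 4, 5, 6]

-- Reduces along the innermost dimension: one sum per row.
rowSums :: Acc (Array DIM2 Int) -> Acc (Vector Int)
rowSums arr = A.sum arr

-- Old behaviour: flatten to a vector, then reduce all elements.
total :: Acc (Array DIM2 Int) -> Acc (Scalar Int)
total arr = A.sum (A.flatten arr)

main :: IO ()
main = do
  print (toList (run (rowSums (use mat))))  -- prints [6,15]
  print (toList (run (total (use mat))))    -- prints [21]
```

The same pattern should apply to the other specialised folds listed above, such as product, minimum and maximum.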
Fixed
 #390: Wrong number of arguments in printf
1.0.0.0 - 2017-03-31
 Many API and internal changes
 Bug fixes and other enhancements
0.15.1.0
 Fix type of allocateArray
0.15.0.0
 Bug fixes and performance improvements.
0.14.0.0
 New iteration constructs.
 Additional Prelude-like functions.
 Improved code generation and fusion optimisation.
 Concurrent kernel execution in the CUDA backend.
 Bug fixes.
0.13.0.0
 New array fusion optimisation.
 New foreign function interface for array and scalar expressions.
 Additional Prelude-like functions.
 New example programs.
 Bug fixes and performance improvements.
0.12.0.0
 Full sharing recovery in scalar expressions and array computations.
 Two new example applications in package accelerate-examples (both including a graphical frontend):
   A real-time Canny edge detection
   An interactive fluid flow simulator
 Bug fixes.
0.11.0.0
 New Prelude-like functions zip*, unzip*, fill, enumFrom*, tail, init, drop, take, slit, gather*, scatter*, and shapeSize.
 New simplified AST (in package accelerate-backend-kit) for backend writers who want to avoid the complexities of the type-safe AST.
0.10.0.0
 Complete sharing recovery for scalar expressions (but currently disabled by default).
 Also bug fixes in array sharing recovery and a few new convenience functions.
0.9.0.0
 Streaming computations
 Precompilation
 Repa-style array indices
 Additional collective operations supported by the CUDA backend: stencils, more scans, rank-polymorphic fold, generate.
 Conversions to other array formats
 Bug fixes
0.8.1.0
 Bug fixes and some performance tweaks.
0.8.0.0
 More collective operations supported by the CUDA backend: replicate, slice and foldSeg. Frontend and interpreter support for stencil.
 Bug fixes.
0.7.1.0
 Initial release of the CUDA backend