massiv

massiv is a Haskell library for array manipulation. Performance is one of its main goals, thus it is capable of seamless parallelization of most of the operations provided by the library

The name for this library comes from the Russian word Massiv (Масси́в), which means an Array.

Status

Language Github Actions Coveralls Gitter.im
GitHub top language Build Status Coverage Status Join the chat at https://gitter.im/haskell-massiv/Lobby
Package Hackage Nightly LTS
massiv Hackage Nightly Nightly
massiv-io Hackage Nightly Nightly
massiv-test Hackage Nightly Nightly
haskell-scheduler Hackage Nightly Nightly

Introduction

Everything in the library revolves around an Array r ix e - a data family for anything that can be thought of as an array. The type variables, from the end, are:

  • e - element of an array.
  • ix - an index that will map to an actual element. The index must be an instance of the Index class with the default one being an Ix n type family and an optional being tuples of Ints.
  • r - underlying representation. There are two main categories of representations described below.

Manifest

These are your classical arrays that are located in memory and allow constant time lookup of elements. Another main property they share is that they have a mutable interface. An Array with manifest representation can be thawed into a mutable MArray and then frozen back into its immutable counterpart after some destructive operation is applied to the mutable copy. The differences among representations below is in the way that elements are being accessed in memory:

  • P - Array with elements that are an instance of Prim type class, i.e. common Haskell primitive types: Int, Word, Char, etc. It is backed by unpinned memory and based on ByteArray.
  • U - Unboxed arrays. The elements are instances of the Unbox type class. Usually just as fast as P, but has a slightly wider range of data types that it can work with. Notable data types that can be stored as elements are Bool, tuples and Ix n.
  • S - Storable arrays. Backed by pinned memory and based on ForeignPtr, while elements are instances of the Storable type class.
  • B - Boxed arrays that don’t have restrictions on their elements, since they are represented as pointers to elements, thus making them the slowest type of array, but also the most general. Arrays of this representation are element strict, in other words its elements are kept in Weak-Head Normal Form (WHNF).
  • BN - Also boxed arrays, but unlike the other representation B, its elements are in Normal Form, i.e. in a fully evaluated state and no thunks or memory leaks are possible. It does require an NFData instance for the elements though.
  • BL - Boxed lazy array. Just like B and BN, except values are evaluated on demand.

Delayed

Main trait of delayed arrays is that they do not exist in memory and instead describe the contents of an array as a function or a composition of functions. In fact all of the fusion capabilities in massiv can be attributed to delayed arrays.

  • D - Delayed “pull” array is just a function from an index to an element: (ix -> e). Therefore indexing into this type of array is not possible, instead elements are evaluated with the evaluateM function each time when applied to an index. It gives us a nice ability to compose functions together when applied to an array and possibly even fold over without ever allocating intermediate manifest arrays.
  • DW - Delayed windowed array is very similar to the version above, except it has two functions that describe it, one for the near border elements and one for the interior, aka. the window. This is used for Stencil computation and things that derive from it, such as convolution, for instance.
  • DL - Delayed “push” array contains a monadic action that describes how an array can be loaded into memory. This is most useful for composing arrays together.
  • DS - Delayed stream array is a sequence of elements, possibly even an infinite one. This is most useful for situations when we don’t know the size of our resulting array ahead of time, which is common in operations such as filter, mapMaybe, unfold etc. Naturally, in the end we can only load such an array into a flat vector.
  • DI - Is just like D, except loading is interleaved and is useful for parallel loading arrays with unbalanced computation, such as Mandelbrot set or ray tracing, for example.

Construct

Creating a delayed type of array allows us to fuse any future operations we decide to perform on it. Let’s look at this example:

λ> import Data.Massiv.Array as A
λ> makeVectorR D Seq 10 id
Array D Seq (Sz1 10)
  [ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 ]

Here we created a delayed vector of size 10, which is in reality just an id function from its index to an element (see the Computation section for the meaning of Seq). So let’s go ahead and square its elements

λ> makeVectorR D Seq 10 id
λ> evaluateM vec 4
4
λ> vec2 = A.map (^ (2 :: Int)) vec
λ> evaluateM vec2 4
16

It’s not that exciting, since every time we call evaluateM it will recompute the element, every time, therefore this function should be avoided at all costs! Instead we can use all of the functions that take Source like arrays and then fuse that computation together by calling compute, or a handy computeAs function and only afterwards apply an indexM function or its partial synonym: (!). Any delayed array can also be reduced using one of the folding functions, thus completely avoiding any memory allocation, or converted to a list, if that’s what you need:

λ> vec2U = computeAs U vec2
λ> vec2U
Array U Seq (Sz1 10)
  [ 0, 1, 4, 9, 16, 25, 36, 49, 64, 81 ]
λ> vec2U ! 4
16
λ> toList vec2U
[0,1,4,9,16,25,36,49,64,81]
λ> A.sum vec2U
285

There is a whole multitude of ways to construct arrays:

  • by using one of many helper functions: makeArray, range, rangeStepFrom, enumFromN, etc.
  • through conversion: from lists, from Vectors in vector library, from ByteStrings in bytestring;
  • with a mutable interface in PrimMonad (IO, ST, etc.), eg: makeMArray, generateArray, unfoldrPrim, etc.

It’s worth noting that, in the next example, nested lists will be loaded into an unboxed manifest array and the sum of its elements will be computed in parallel on all available cores.

λ> A.sum (fromLists' Par [[0,0,0,0,0],[0,1,2,3,4],[0,2,4,6,8]] :: Array U Ix2 Double)
30.0

The above wouldn’t run in parallel in ghci of course, as the program would have to be compiled with ghc using -threaded -with-rtsopts=-N flags in order to use all available cores. Alternatively we could compile with the -threaded flag and then pass the number of capabilities directly to the runtime with +RTS -N<n>, where <n> is the number of cores you’d like to utilize.

Index

The main Ix n closed type family can be somewhat confusing, but there is no need to fully understand how it works in order to start using it. GHC might ask you for the DataKinds language extension if IxN n is used in a type signature, but there are type and pattern synonyms for the first five dimensions: Ix1, Ix2, Ix3, Ix4 and Ix5.

There are three distinguishable constructors for the index:

  • The first one is simply an int: Ix1 = Ix 1 = Int, therefore vectors can be indexed in a usual way without some extra wrapping data type, just as it was demonstrated in a previous section.
  • The second one is Ix2 for operating on 2-dimensional arrays and has a constructor :.
λ> makeArrayR D Seq (Sz (3 :. 5)) (\ (i :. j) -> i * j)
Array D Seq (Sz (3 :. 5))
  [ [ 0, 0, 0, 0, 0 ]
  , [ 0, 1, 2, 3, 4 ]
  , [ 0, 2, 4, 6, 8 ]
  ]
  • The third one is IxN n and is designed for working with N-dimensional arrays, and has a similar looking constructor :>, except that it can be chained indefinitely on top of :.
λ> arr3 = makeArrayR P Seq (Sz (3 :> 2 :. 5)) (\ (i :> j :. k) -> i * j + k)
λ> :t arr3
arr3 :: Array P (IxN 3) Int
λ> arr3
Array P Seq (Sz (3 :> 2 :. 5))
  [ [ [ 0, 1, 2, 3, 4 ]
    , [ 0, 1, 2, 3, 4 ]
    ]
  , [ [ 0, 1, 2, 3, 4 ]
    , [ 1, 2, 3, 4, 5 ]
    ]
  , [ [ 0, 1, 2, 3, 4 ]
    , [ 2, 3, 4, 5, 6 ]
    ]
  ]
λ> arr3 ! (2 :> 1 :. 4)
6
λ> ix10 = 10 :> 9 :> 8 :> 7 :> 6 :> 5 :> 4 :> 3 :> 2 :. 1
λ> :t ix10
ix10 :: IxN 10
λ> ix10 -- 10-dimensional index
10 :> 9 :> 8 :> 7 :> 6 :> 5 :> 4 :> 3 :> 2 :. 1

Here is how we can construct a 4-dimensional array and sum its elements in constant memory:

λ> arr = makeArrayR D Seq (Sz (10 :> 20 :> 30 :. 40)) $ \ (i :> j :> k :. l) -> (i * j + k) * k + l
λ> :t arr -- a 4-dimensional array
arr :: Array D (IxN 4) Int
λ> A.sum arr
221890000

There are quite a few helper functions that can operate on indices, but these are only needed when writing functions that work for arrays of arbitrary dimension, as such they are scarcely used:

λ> pullOutDim' ix10 5
(5,10 :> 9 :> 8 :> 7 :> 6 :> 4 :> 3 :> 2 :. 1)
λ> unconsDim ix10
(10,9 :> 8 :> 7 :> 6 :> 5 :> 4 :> 3 :> 2 :. 1)
λ> unsnocDim ix10
(10 :> 9 :> 8 :> 7 :> 6 :> 5 :> 4 :> 3 :. 2,1)

All of the Ix n indices are instances of Num so basic numeric operations are made easier:

λ> (1 :> 2 :. 3) + (4 :> 5 :. 6)
5 :> 7 :. 9
λ> 5 :: Ix4
5 :> 5 :> 5 :. 5

It is important to note that the size type is distinct from the index by the newtype wrapper Sz ix. There is a constructor Sz, which will make sure that none of the dimensions are negative:

λ> Sz (2 :> 3 :. 4)
Sz (2 :> 3 :. 4)
λ> Sz (10 :> 2 :> -30 :. 4)
Sz (10 :> 2 :> 0 :. 4)

Same as with indices, there are helper pattern synonyms: Sz1, Sz2, Sz3, Sz4 and Sz5.

λ> Sz3 2 3 4
Sz (2 :> 3 :. 4)
λ> Sz4 10 2 (-30) 4
Sz (10 :> 2 :> 0 :. 4)

As well as the Num instance:

λ> 4 :: Sz5
Sz (4 :> 4 :> 4 :> 4 :. 4)
λ> (Sz2 1 2) + 3
Sz (4 :. 5)
λ> (Sz2 1 2) - 3
Sz (0 :. 0)

Alternatively tuples of Ints can be used for working with arrays, up to and including 5-tuples (type synonyms: Ix2T .. Ix5T), but since tuples are polymorphic it is necessary to restrict the resulting array type. Not all operations in the library support tuples, so it is advised to avoid them for indexing.

λ> makeArray Seq (4, 20) (uncurry (*)) :: Array P Ix2T Int
(Array P Seq ((4,20))
  [ [ 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0 ]
  , [ 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19 ]
  , [ 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38 ]
  , [ 0,3,6,9,12,15,18,21,24,27,30,33,36,39,42,45,48,51,54,57 ]
  ])
λ> :i Ix2T
type Ix2T = (Int, Int)

There are helper functions that can go back and forth between tuples and Ix n indices.

λ> fromIx4 (3 :> 4 :> 5 :. 6)
(3,4,5,6)
λ> toIx5 (3, 4, 5, 6, 7)
3 :> 4 :> 5 :> 6 :. 7

Slicing

In order to get a subsection of an array there is no need to recompute it, unless we want to free up the no longer memory, of course. So, there are a few slicing, resizing and extraction operators that can do it all in constant time, modulo the index manipulation:

λ> arr = makeArrayR U Seq (Sz (4 :> 2 :. 6)) fromIx3
λ> arr !> 3 !> 1
Array M Seq (Sz1 6)
  [ (3,1,0), (3,1,1), (3,1,2), (3,1,3), (3,1,4), (3,1,5) ]

As you might suspect all of the slicing, indexing, extracting, resizing operations are partial, and those are frowned upon in Haskell. So there are matching functions that can do the same operations safely by using MonadThrow and thus returning Nothing, Left SomeException or throwing an exception in case of IO on failure, for example:

λ> arr !?> 3 ??> 1
Array M Seq (Sz1 6)
  [ (3,1,0), (3,1,1), (3,1,2), (3,1,3), (3,1,4), (3,1,5) ]
λ> arr !?> 3 ??> 1 ?? 0 :: Maybe (Int, Int, Int)
Just (3,1,0)

In above examples we first take a slice at the 4th page (index 3, since we start at 0), then another one at the 2nd row (index 1). While in the last example we also take 1st element at position 0. Pretty neat, huh? Naturally, by doing a slice we always reduce dimension by one. We can do slicing from the outside as well as from the inside:

λ> Ix1 1 ... 9
Array D Seq (Sz1 10)
  [ 1, 2, 3, 4, 5, 6, 7, 8, 9 ]
λ> a <- resizeM (Sz (3 :> 2 :. 4)) $ Ix1 11 ... 34
λ> a
Array D Seq (Sz (3 :> 2 :. 4))
  [ [ [ 11, 12, 13, 14 ]
    , [ 15, 16, 17, 18 ]
    ]
  , [ [ 19, 20, 21, 22 ]
    , [ 23, 24, 25, 26 ]
    ]
  , [ [ 27, 28, 29, 30 ]
    , [ 31, 32, 33, 34 ]
    ]
  ]
λ> a !> 0
Array D Seq (Sz (2 :. 4))
  [ [ 11, 12, 13, 14 ]
  , [ 15, 16, 17, 18 ]
  ]
λ> a <! 0
Array D Seq (Sz (3 :. 2))
  [ [ 11, 15 ]
  , [ 19, 23 ]
  , [ 27, 31 ]
  ]

Or we can slice along any other available dimension:

λ> a <!> (Dim 2, 0)
Array D Seq (Sz (3 :. 4))
  [ [ 11, 12, 13, 14 ]
  , [ 19, 20, 21, 22 ]
  , [ 27, 28, 29, 30 ]
  ]

In order to extract sub-array while preserving dimensionality we can use extractM or extractFromToM.

λ> extractM (0 :> 1 :. 1) (Sz (3 :> 1 :. 2)) a
Array D Seq (Sz (3 :> 1 :. 2))
  [ [ [ 16, 17 ]
    ]
  , [ [ 24, 25 ]
    ]
  , [ [ 32, 33 ]
    ]
  ]
λ> extractFromToM (1 :> 0 :. 1) (3 :> 2 :. 4) a
Array D Seq (Sz (2 :> 2 :. 3))
  [ [ [ 20, 21, 22 ]
    , [ 24, 25, 26 ]
    ]
  , [ [ 28, 29, 30 ]
    , [ 32, 33, 34 ]
    ]
  ]

Computation and parallelism

There is a data type Comp that controls how elements will be computed when calling the compute function. It has a few constructors, although most of the time either Seq or Par will be sufficient:

  • Seq - computation will be done sequentially on one core (capability in ghc).
  • ParOn [Int] - perform computation in parallel while pinning the workers to particular cores. Providing an empty list will result in the computation being distributed over all available cores, or better known in Haskell as capabilities.
  • ParN Word16 - similar to ParOn, except it simply specifies the number of cores to use, with 0 meaning all cores.
  • Par - isn’t really a constructor but a pattern for constructing ParOn [], which will result in Scheduler using all cores, thus should be used instead of ParOn.
  • Par' - similar to Par, except it uses ParN 0 underneath.

Just to make sure a simple novice mistake is prevented, which I have seen in the past, make sure your source code is compiled with ghc -O2 -threaded -with-rtsopts=-N, otherwise no parallelization and poor performance are waiting for you. Also a bit later you might notice the {-# INLINE funcName #-} pragma being used, oftentimes it is a good idea to do that, but not always required. It is worthwhile to benchmark and experiment.

Stencil

Instead of manually iterating over a multi-dimensional array and applying a function to each element, while reading its neighboring elements (as you would do in an imperative language) in a functional language it is much more efficient to apply a stencil function and let the library take care of all of bounds checking and iterating in a cache friendly manner.

What’s a stencil? It is a declarative way of specifying a pattern for how elements of an array in a neighborhood will be used in order to update each element of the newly created array. In massiv a Stencil is a function that can read the neighboring elements of the stencil’s center (the zero index), and only those, and then outputs a new value for the center element.

stencil

Let’s create a simple, but somewhat meaningful array and create an averaging stencil. There is nothing special about the array itself, but the averaging filter is a stencil that sums the elements in a Moore neighborhood and divides the result by 9, i.e. finds the average of a 3 by 3 square.

arrLightIx2 :: Comp -> Sz Ix2 -> Array D Ix2 Double
arrLightIx2 comp arrSz = makeArray comp arrSz $ \ (i :. j) -> sin (fromIntegral (i * i + j * j))
{-# INLINE arrLightIx2 #-}

average3x3Filter :: Fractional a => Stencil Ix2 a a
average3x3Filter = makeStencil (Sz (3 :. 3)) (1 :. 1) $ \ get ->
  (  get (-1 :. -1) + get (-1 :. 0) + get (-1 :. 1) +
     get ( 0 :. -1) + get ( 0 :. 0) + get ( 0 :. 1) +
     get ( 1 :. -1) + get ( 1 :. 0) + get ( 1 :. 1)   ) / 9
{-# INLINE average3x3Filter #-}

Here is what it would look like in GHCi. We create a delayed array with some funky periodic function, and make sure it is computed prior to mapping an average stencil over it:

λ> arr = computeAs U $ arrLightIx2 Par (Sz (600 :. 800))
λ> :t arr
arr :: Array U Ix2 Double
λ> :t mapStencil Edge average3x3Filter arr
mapStencil Edge average3x3Filter arr :: Array DW Ix2 Double

As you can see, that operation produced an array of the earlier mentioned representation Delayed Windowed DW. In its essence DW is an array type that does no bounds checking in order to gain performance, except when it’s near the border, where it uses a border resolution technique supplied by the user (Edge in the example above). Currently it is used only in stencils and not much else can be done to an array of this type besides further computing it into a manifest representation.

This example will be continued in the next section, but before that I would like to mention that some might notice that it looks very much like convolution, and in fact convolution can be implemented with a stencil. There is a helper function makeConvolutionStencil that lets you do just that. For the sake of example we’ll do a sum of all neighbors by hand instead:

sum3x3Filter :: Fractional a => Stencil Ix2 a a
sum3x3Filter = makeConvolutionStencil (Sz (3 :. 3)) (1 :. 1) $ \ get ->
  get (-1 :. -1) 1 . get (-1 :. 0) 1 . get (-1 :. 1) 1 .
  get ( 0 :. -1) 1 . get ( 0 :. 0) 1 . get ( 0 :. 1) 1 .
  get ( 1 :. -1) 1 . get ( 1 :. 0) 1 . get ( 1 :. 1) 1
{-# INLINE sum3x3Filter #-}

There is not a single plus or multiplication sign, that is because convolutions is actually summation of elements multiplied by a kernel element, so instead we have composition of functions applied to an offset index and a multiplier. After we map that stencil, we can further divide each element of the array by 9 in order to get the average. Yeah, I lied a bit, Array DW ix is an instance of Functor class, so we can map functions over it, which will be fused as with a regular Delayed array:

computeAs U $ fmap (/9) $ mapStencil Edge sum3x3Filter arr

If you are still confused of what a stencil is, but you are familiar with Conway’s Game of Life this should hopefully clarify it a bit more. The function life below is a single iteration of Game of Life:

lifeRules :: Word8 -> Word8 -> Word8
lifeRules 0 3 = 1
lifeRules 1 2 = 1
lifeRules 1 3 = 1
lifeRules _ _ = 0

lifeStencil :: Stencil Ix2 Word8 Word8
lifeStencil = makeStencil (Sz (3 :. 3)) (1 :. 1) $ \ get ->
  lifeRules (get (0 :. 0)) $ get (-1 :. -1) + get (-1 :. 0) + get (-1 :. 1) +
                             get ( 0 :. -1)         +         get ( 0 :. 1) +
                             get ( 1 :. -1) + get ( 1 :. 0) + get ( 1 :. 1)

life :: Array S Ix2 Word8 -> Array S Ix2 Word8
life = compute . mapStencil Wrap lifeStencil

The full working example that uses GLUT and OpenGL is located in GameOfLife. You can run it if you have the GLUT dependencies installed:

$ cd massiv-examples && stack run GameOfLife

massiv-io

In order to do anything useful with arrays we often need to be able to read some data from a file. Considering that most common array-like files are images, massiv-io provides an interface to read, write and display images in common formats using Haskell native JuicyPixels and Netpbm packages.

Color package provides a variety of color spaces and conversions between them, which are used by massiv-io package as pixels during reading and writing images.

An earlier example wasn’t particularly interesting, since we couldn’t visualize what is actually going on, so let’s expand on it:

import Data.Massiv.Array
import Data.Massiv.Array.IO

main :: IO ()
main = do
  let arr = computeAs S $ arrLightIx2 Par (600 :. 800)
      toImage ::
           (Functor (Array r Ix2), Load r Ix2 (Pixel (Y' SRGB) Word8))
        => Array r Ix2 Double
        -> Image S (Y' SRGB) Word8
      toImage = computeAs S . fmap (PixelY' . toWord8)
      lightPath = "files/light.png"
      lightImage = toImage $ delay arr
      lightAvgPath = "files/light_avg.png"
      lightAvgImage = toImage $ mapStencil Edge (avgStencil 3) arr
      lightSumPath = "files/light_sum.png"
      lightSumImage = toImage $ mapStencil Edge (sumStencil 3) arr
  writeImage lightPath lightImage
  putStrLn $ "written: " ++ lightPath
  writeImage lightAvgPath lightAvgImage
  putStrLn $ "written: " ++ lightAvgPath
  writeImage lightSumPath lightSumImage
  putStrLn $ "written: " ++ lightSumPath
  displayImageUsing defaultViewer True . computeAs S
    =<< concatM 1 [lightAvgImage, lightImage, lightSumImage]

massiv-examples/vision/files/light.png:

Light

massiv-examples/vision/files/light_avg.png:

Light Average

The full example is in the example vision package and if you have stack installed you can run it as:

$ cd massiv-examples && stack run avg-sum

Other libraries

A natural question might come to mind: Why even bother with a new array library when we already have a few really good ones in the Haskell world? The main reasons for me are performance and usability. I personally felt like there was much room for improvement before I even started working on this package, and it seems like it turned out to be true. For example, the most common goto library for dealing with multidimensional arrays and parallel computation used to be Repa, which I personally was a big fan of for quite some time, to the point that I even wrote a Haskell Image Processing library based on top of it.

Here is a quick summary of how massiv is better than Repa:

  • It is actively maintained.
  • Much more sophisticated scheduler. It is resumable and is capable of handling nested parallel computation.
  • Improved indexing data types.
  • Safe stencils for arbitrary dimensions, not only 2D convolution. Stencils are composable
  • Improved performance on almost all operations.
  • Structural parallel folds (i.e. left/right - direction is preserved)
  • Super easy slicing.
  • Extensive mutable interface
  • More fusion capabilities with delayed stream and push array representations.
  • Delayed arrays aren’t indexable, only Manifest are (saving user from common pitfall in Repa of trying to read elements of delayed array)

As far as usability of the library goes, it is very subjective, thus I’ll let you be a judge of that. When talking about performance it is the facts that do matter. Thus, let’s not continue this discussion in pure abstract words, below is a glimpse into benchmarks against Repa library running with GHC 8.8.4 on Intel® Core™ i7-3740QM CPU @ 2.70GHz × 8

Matrix multiplication:

benchmarking Repa/MxM U Double - (500x800 X 800x500)/Par
time                 120.5 ms   (115.0 ms .. 127.2 ms)
                     0.998 R²   (0.996 R² .. 1.000 R²)
mean                 124.1 ms   (121.2 ms .. 127.3 ms)
std dev              5.212 ms   (2.422 ms .. 6.620 ms)
variance introduced by outliers: 11% (moderately inflated)

benchmarking Massiv/MxM U Double - (500x800 X 800x500)/Par
time                 41.46 ms   (40.67 ms .. 42.45 ms)
                     0.998 R²   (0.994 R² .. 0.999 R²)
mean                 38.45 ms   (37.22 ms .. 39.68 ms)
std dev              2.342 ms   (1.769 ms .. 3.010 ms)
variance introduced by outliers: 19% (moderately inflated)

Sobel operator:

benchmarking Sobel/Par/Operator - Repa
time                 17.82 ms   (17.30 ms .. 18.32 ms)
                     0.997 R²   (0.994 R² .. 0.998 R²)
mean                 17.42 ms   (17.21 ms .. 17.69 ms)
std dev              593.0 μs   (478.1 μs .. 767.5 μs)
variance introduced by outliers: 12% (moderately inflated)

benchmarking Sobel/Par/Operator - Massiv
time                 7.421 ms   (7.230 ms .. 7.619 ms)
                     0.994 R²   (0.991 R² .. 0.997 R²)
mean                 7.537 ms   (7.422 ms .. 7.635 ms)
std dev              334.3 μs   (281.3 μs .. 389.9 μs)
variance introduced by outliers: 20% (moderately inflated)

Sum all elements of a 2D array:

benchmarking Sum/Seq/Repa
time                 539.7 ms   (523.2 ms .. 547.9 ms)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 540.1 ms   (535.7 ms .. 543.2 ms)
std dev              4.727 ms   (2.208 ms .. 6.609 ms)
variance introduced by outliers: 19% (moderately inflated)

benchmarking Sum/Seq/Vector
time                 16.95 ms   (16.78 ms .. 17.07 ms)
                     0.999 R²   (0.998 R² .. 1.000 R²)
mean                 17.23 ms   (17.13 ms .. 17.43 ms)
std dev              331.4 μs   (174.1 μs .. 490.0 μs)

benchmarking Sum/Seq/Massiv
time                 16.78 ms   (16.71 ms .. 16.85 ms)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 16.80 ms   (16.76 ms .. 16.88 ms)
std dev              127.8 μs   (89.95 μs .. 186.2 μs)

benchmarking Sum/Par/Repa
time                 81.76 ms   (78.52 ms .. 84.37 ms)
                     0.997 R²   (0.990 R² .. 1.000 R²)
mean                 79.20 ms   (78.03 ms .. 80.91 ms)
std dev              2.613 ms   (1.565 ms .. 3.736 ms)

benchmarking Sum/Par/Massiv
time                 8.102 ms   (7.971 ms .. 8.216 ms)
                     0.999 R²   (0.998 R² .. 1.000 R²)
mean                 7.967 ms   (7.852 ms .. 8.028 ms)
std dev              236.4 μs   (168.4 μs .. 343.2 μs)
variance introduced by outliers: 11% (moderately inflated)

Here is also a blog post that compares Performance of Haskell Array libraries through Canny edge detection

Further resources on learning massiv:

Changes

1.0.1

  • Relax constraint on computeInto by removing requirement for Size
  • Fix BL, which due to a forgotten seq was not lazy.

1.0.0

  • Addition of sumArrays', sumArraysM and productArrays', productArraysM.
  • Remove Num/Fractional/Floating instances for D and DI arrays. This was done to prevent surprises as in: #97
  • Remove helper class Nested and type family NestedStuct
  • Make negate in Num instance throw error for Sz in order to avoid surprising behavior reported in: #114
  • Add of munsafeResize
  • Add uniformArray and uniformRangeArray
  • Replace isNonEmpty with isNotZeroSz and added isZeroSz
  • Consolidate Construct class into Load
  • Introduce Shape, the parent of Size
  • Move size from Load into new class Size
  • Consolidate Resize into Size
  • Removed maxSize and replaced it with maxLinearSize
  • Remove specialized DW instances that used tuples as indices.
  • Get rid of M representation
  • Remove R type family and Slice, InnerSlice and Extract classes in favor of D.
  • Consolidate OuterSlice into Source
  • Add Strategy and move setComp (from Construct) and getComp (from Load) in there.
  • Remove ix from Mutable, Manifest, Source
  • Remove liftArray2. Instead add liftArray2' and liftArray2M that don’t behave like a map for singleton argument.
  • Expose liftNumArray2M
  • Prevent showsArrayPrec from changing index type
  • Change function argument to monadic action for unstablePartitionM and unsafeUnstablePartitionM
  • Replace snull with a more generic isNull
  • Switch DL loading function to run in ST monad, rather than in any Monad m.
  • Rename msize -> sizeOfMArray
  • Add unsafeResizeMArray and unsafeLinearSliceMArray
  • Rename:
    • loadArrayM -> iterArrayLinearM_
    • loadArrayWithSetM -> iterArrayLinearWithSetM_.
    • loadArrayWithStrideM -> iterArrayLinearWithStrideM_.
  • Add iterArrayLinearST_ and iterArrayLinearWithSetST_ to Load class instead of loadArrayM and loadArrayWithSetM.
  • Add iterArrayLinearWithStrideST_ to LoadStride class instead of loadArrayWithStrideM.
  • Add new mutable functions:
    • resizeMArrayM and flattenMArray,
    • outerSliceMArrayM and outerSlicesMArray,
    • for2PrimM_ and ifor2PrimM_,
    • zipSwapM_
  • Switch effectful mapping functions to use the representation specific iteration. This means that they are now restricted to Load instead of Source. Functions affected:
    • mapIO_, imapIO_, forIO_ and iforIO_
    • mapIO, imapIO, forIO and iforIO
  • Add Uniform, UniformRange and Random instances for Ix2, IxN, Dim, Sz and Stride
  • Consolidate Mutable into Manifest type class and move the MArray data family outside of the class.
  • Make sure empty arrays are always equal, regardless of their size.
  • Remove LN representation in favor of a standalone List newtype wrapper around lists.

0.6.1

  • Addition of withLoadMArray_, withLoadMArrayS, withLoadMArrayS_, withLoadMArrayST, withLoadMArrayST_
  • Addition of replaceSlice and replaceOuterSlice
  • Addition of quicksortBy, quicksortByM and quicksortByM_
  • Fix performance regression for quicksort and quicksortM_ introduced in previous release.

0.6.0

  • Fix semantics of Applicative, Num and Fractional instance for D arrays: mismatched sizes will throw an error.
  • 20% speed improvement of matrix multiplication: multiplyMatrices, .><. and !><!. Type signature has changed to Mutable for both arguments, thus it’s a breaking change.
  • Switch ><. and ><! from returning a delayed array to mutable, since that’s what multiplyVectorByMatrix returns.
  • Addition of synonym HighIxN and removing redundant 1 <= n constraint.
  • Deprecating makeStencilDef, unsafeMapStencil and fix dangers of invalid stencils reading out of bounds. Get rid of Value. Fix for #109.
  • Addition of appComp
  • Addition of mkSzM
  • Addition of SizeOverflowException and SizeNegativeException
  • Fix setting computation for boxed vector when converted with fromVectorM and fromVector'
  • Add computation strategy argument to fromUnboxedVector, just so it matches other vector conversion functions.
  • Removed defaultElement
  • Removed deprecated functions: #>, |*|, multiplyTransposed, fromIntegerA, fromRationalA, piA
  • Addition of BL representation and related functionality, fix for #111.
    • Addition of functions: wrapLazyArray, unwrapLazyArray, toLazyArray, evalLazyArray, forceLazyArray, unwrapMutableLazyArray, fromBoxedVector, fromBoxedMVector.
    • Rename:
      • unsafeNormalBoxedArray -> coerceNormalBoxedArray
      • unsafeBoxedArray -> coerceBoxedArray
    • Remove unsafeFromBoxedVector
    • Conversion from vector with castFromVector will return BL representation for boxed vector
    • Change type B -> BL for functions: toBoxedVector and toBoxedMVector
  • Rename N -> BN and add backwards compatibility shim.
  • Make replicate a function in Construct class
  • Add newMArray, newMArray' and deprecate new
  • Add custom implementation for <$ in Functor instances for BL and B.

0.5.9

  • Add mallocCompute, mallocCopy and unsafeMallocMArray
  • Fix .><., ><. and .>< on empty matrices. Result is now guaranteed to be empty too.
  • Add unwrapByteArrayOffset and unwrapMutableByteArrayOffset
  • Add fromByteArrayOffsetM and fromMutableByteArrayOffsetM

0.5.8

  • Improve loading of push arrays by adding loadArrayWithSetM and deprecating defaultElement.

0.5.9

  • Add mallocCompute, mallocCopy and unsafeMallocMArray

0.5.8

  • Improve loading of push arrays by adding loadArrayWithSetM and deprecating defaultElement.

0.5.7

  • Improve performance of ><. and ><! while making their constraints a bit more relaxed.
  • Expose unsafeLoadIntoM and unsafeLoadIntoS
  • Expose eqArrays and compareArrays
  • Add multiplyMatrixByVector and multiplyVectorByMatrix

0.5.6

  • Fix (-.) (it was incorrectly implemented as a flip of (.-)
  • Addition of numeric functions:
    • Partial: !+!, !-!, !*!, !/!
    • Reciprocal division /.
    • More efficient matrix-matrix multiplication: .><. and !><! (also helpers multiplyMatrices and multiplyMatricesTransposed)
    • More efficient matrix-vector multiplication: .>< and !><
    • New vector-matrix multiplication: ><. and ><!
    • Dot product dotM and !.!
    • Norm normL2
  • Deprecated |*| and #>

0.5.5

  • Add takeWhile, dropWhile and findIndex
  • Improve performance of any, and, or, all
  • Add elem

0.5.4

  • Addition of unsafeTransformStencil
  • Add zip4, unzip4, zipWith4 and izipWith4
  • Make Resize a superclass of Source
  • Addition of outerSlices, innerSlices, withinSlices and withinSlicesM
  • Addition of stackSlicesM, stackOuterSlicesM and stackInnerSlicesM
  • Addition of computeP
  • Fix perfomrmance issue of folding functions applied to arrays with Seq computation strategy.

0.5.3

  • Fix tanA and tanhA. #96
  • Relax argument of snoc and cons constraint to Load vectors
  • Improve unsnocM and unconsM by switching to unsafeLinearSlice, instead of delaying the array.
  • Fix parallelization for windowed array when computed with stride
  • Fix massiv doctests not being able to find massiv.h under NixOS

0.5.2

  • Addition of lowerTriangular and upperTriangular
  • Relax identityMatrix type to return an array of any Num type, not just Int.
  • Addition of unsafeMakeLoadArrayAdjusted
  • Add matrix-vector product ((#>))
  • Addition of siterate

0.5.1

  • Fix sfromListN accepting a plain Int instead of Sz1, as well as switch to upper bound.
  • Fix order of argumetns in iforM
  • Restrict szip*, szipWith* and sizipWith* functions to flat vectors.
  • Addition of unsafeSUnfoldrN, unsafeSUnfoldrNM and unsafeSFromListN
  • Fix sunfoldrN, sunfoldrNM and sfromListN to not trust the supplied size.
  • Move isEmpty into Load class
  • Add isNotEmpty

0.5.0

  • Remove Show instance from Value.
  • Addition of unsafeCreateArray, unsafeCreateArray_ and unsafeCreateArrayS
  • Remove Comp argument from functions that ignore it and set it to Seq:
    • createArrayS_, createArrayS, createArrayST_, createArrayST
    • unfoldrPrimM_, iunfoldrPrimM_, unfoldrPrimM, iunfoldrPrimM
    • unfoldlPrimM_, iunfoldlPrimM_, unfoldlPrimM, iunfoldlPrimM
  • Addition of fromStorableVector and fromStorableMVector
  • Modify toMutableByteArray to produce a copy if dealing with slice.
  • Addition of toByteArrayM, toMutableByteArrayM
  • Change replicate to produce delayed load array DL
  • Export unsafe stencil functions from Data.Array.Massiv.Unsafe, rather than from Data.Massiv.Array.Stencil.Unsafe.
  • Implement unsafeMapStencil and deprecate mapStencilUnsafe and forStencilUnsafe
  • Addition of castToBuilder
  • Addition of conversion functions:
    • unwrapNormalForm and evalNormalForm
    • toBoxedVector, toBoxedMVector, evalBoxedVector and evalBoxedMVector
    • unwrapByteArray and unwrapMutableByteArray
    • toPrimitiveVector, toPrimitiveMVector, fromPrimitiveVector and fromPrimitiveMVector
    • toStorableVector, toStorableMVector, fromStorableVector and fromStorableMVector
    • fromUnboxedVector and fromUnboxedMVector
    • unsafeBoxedArray, unsafeNormalBoxedArray, unsafeFromBoxedVector
  • Removed deprecated traverseAR, itraverseAR, traversePrimR and itraversePrimR
  • Removed: imapMR, imapMR, iforMR, and iforMR
  • Renamed:
    • withMArray to withMArray_,
    • withMArrayS to withMArrayS_ and
    • withMArrayST to withMArrayST_
  • Added versions that keep the artifact of mutable action: withMArray, withMArrayS, withMArrayST.

0.4.5

  • Addition of computeIO and computePrimM
  • Addition of makeArrayLinearA
  • Addition of traverseS
  • Fix regression in performance introduced in massiv-0.4.0

0.4.4

  • Addition of appendOuterM and concatOuterM
  • Addition of zoom
  • Addition of write_, modify_ and swap_

0.4.3

  • Addition of catMaybesS and tally

0.4.3

  • Addition of applyStencil and Padding with helper functions noPadding and samePadding.
  • Addition of foldlStencil, foldrStencil and monoidal foldStencil.
  • Addition of common generic stencils: sumStencil, productStencil, avgStencil, maxStencil, minStencil and idStencil.
  • Addition of mapStencilUnsafe for the brave.
  • Improve compile time error reporting for invalid dimensions.
  • Fix incorrect loading of DW arrays of dimension higher than 3
  • Addition of foldOuterSlice, ifoldOuterSlice, foldInnerSlice and ifoldInnerSlice. Fix for #56

0.4.2

  • Fix loading empty DS stream arrays of unknown size. Fix for #83.

0.4.1

  • Introduction of Stream and DS representation:
    • filterS, filterM, ifilterS, ifilterM
    • mapMaybeS, mapMaybeM, imapMaybeS, imapMaybeM
    • unfoldr, unfoldrN
    • takeS and dropS
  • Deprecated traverseAR, itraverseAR, traversePrimR, itraversePrimR (not feasible to keep duplicate functions just for representation, TypeApplications or ScopedVariables should be used instead.)
  • Fix performance issue with copying of unboxed arrays and initialization of storable array.
  • Addition of unsafeLoadIntoS, unsafeLoadInto and maxSize
  • Addition of reverse, reverse' and reverseM
  • Addition of modifyDimension, modifyDimM, and modifyDim'

0.4.0

  • Made Construct a super class of Mutable
  • Reimplement a safe version of makeLoadArray, that is parallelizable.
  • Switch from EltRepr r ix to much simpler R r
  • Remove Construct instance for M representation.
  • unsafeLinearSet - length argument now accepts Sz1 instead of an Int
  • Renamed:
    • forPrimM_ -> forPrimM
    • iforPrimM_ -> iforPrimM
    • iforLinearPrimM_ -> iforLinearPrimM
  • Introduced new functions that do not mutate the original array: forPrimM_, iforPrimM_ and iforLinearPrimM_
  • Addition of readM, writeM, modifyM, swapM, modifyM_, swapM_
  • Add an orphan instance of MonadThrow for ST monad for older versions of exceptions. See ekmett/exceptions#72
  • Deprecation of read', write' modify' and swap'
  • Make modify accept a monadic action, rather than a pure function. Also now it returns the old element.
  • Make swap return the swapped elements.
  • Addition of unsafeLinearSwap and unsafeSwap
  • Expose unsafeLinearModify and unsafeModify
  • Expose Data.Massiv.Core.List
  • Expose indexWith, so macro INDEX_CHECK from massiv.h could be used outside massiv.
  • Addition of liftSz
  • Fixed expand* functions by making them accept Sz1 instead of an Int
  • Addition of expandWithinM
  • Bunch of minor fixes to Show instances
  • Extracted test-suite into it’s own package.
  • Stop accepting computation strategy for all functions that can be performed sequentially only:
    • iterateN
    • iiterateN
    • unfoldrS_
    • iunfoldrS_
    • unfoldlS_
    • iunfoldlS_
    • makeArrayA
    • makeArrayAR
    • generateArrayLinearS
    • generateArrayS
  • Redefined most of the numeric operators with Numeric and NumericFloat. Will be required for SIMD operations.
  • Num, Fractional and Applicative for D and DI changed behavior: instead of treating singleton as a special array of any size it is treated as singleton.

0.3.6

  • Addition of unsafeArrayLinearCopy, unsafeLinearCopy, unsafeLinearShrink, unsafeLinearGrow
  • Implementation of iterateUntil and iterateUntilM
  • identityMatrix - generation of identity matrix

0.3.5

  • Fix and export guardNumberOfElements
  • Eq instances for IndexException and SizeException
  • Fix upsample implementation and improve its performance.
  • Addition of deleteRegionM, deleteRowsM and deleteColumnsM

0.3.4

  • Use the the new stateful workers feature of scheduler-1.4.0
  • Addition of:
    • randomArrayS
    • randomArrayWS
    • generateArrayWS
    • generateArrayLinearWS
    • mapWS, forWS, imapWS and iforWS
    • and splitLinearlyWithStatefulM_

0.3.3

  • Fix type signature for createArray.
  • Support for new version of scheduler
  • Addition of randomArray

0.3.2.1

  • Fix sqrtA function: #76

0.3.2

  • Exported withMArrayS
  • Switch to pure exception throwing for read', write', modify' and swap'. MonadThrow constraint prevented those functions to be used in ST monad.
  • Addition of quicksort, quicksortM_, unstablePartitionRegionM and unsafeUnstablePartitionRegionM

0.3.1

  • Addition of rangeStepInclusive'
  • Addition of flatten
  • makeLoadArray has been deprecated into unsafeMakeLoadArray.
  • A new safe makeLoadArrayS has been added.
  • Fix infix 4 for (...) and (..:) range functions, so they can be easily composed with numeric operations
  • Addition of imapSchedulerM_ and iforSchedulerM_

0.3.0

  • Class hierarchy an associated methods:
    • getComp moved from Construct to Load
    • Size class lost array value parameter e. unsafeResize and unsafeExtract became their own classes
  • New classes:
    • Resize with unsafeResize from old Size, except with array type parameter for applicability to mutable MArrays
    • Extract with unsafeExtract from old Size
    • StrideLoad, child of Load
  • ifoldlIO and related no longer take list of capabilities, but instead respect the inner computation strategy. For that reason these folds have been removed: foldlOnP, ifoldlOnP, foldrOnP, ifoldrOnP
  • fold now is just like the one from Data.Foldable takes no arguments and requires elements to be a monoid
  • singleton does not accept computation strategy any more and creates Seq array by default
  • New function empty.
  • Ragged functions are no longer exported, until the interface stabilizes and proper implementation of ragged arrays is in place.
  • Partial functions read', write' and swap' now live in IO and throw proper exceptions.
  • loadArray is renamed to loadArrayM and there is a new separate function (not part of Load class) with the name loadArray that actually uses loadArrayM
  • Moved unsafeWithPtr into Data.Massiv.Array.Unsafe
  • Addition of:
    • unsafeArrayToForeignPtr,
    • unsafeMArrayToForeignPtr,
    • unsafeArrayFromForeignPtr,
    • unsafeArrayFromForeignPtr0,
    • unsafeMArrayFromForeignPtr,
    • unsafeMArrayFromForeignPtr0
  • Addition of castToByteString, castFromByteString
  • Addition of makeUnsafeStencil
  • Window now has an windowUnrollIx2 field.
  • Addition of insertWindow and dropWindow

0.2.8.1

  • Fix sqrtA function. Backport of #76

0.2.8

  • Fixed a problem where convolution stencil size was not inverted, causing out of bounds memory read: #72
  • Fixed an issue with windowed array where a stencil size is smaller than the array it is applied to
  • Fixed incorrect cross-correlation stencil construction

0.2.7

  • Fixed a serious performance regression in Stencil’s Functor instance, which was introduced in version 0.2.3
  • Added type and pattern synonyms Sz for future compatibility with version 0.3. Could be useful for migration.

0.2.6

  • Add expand* family of functions.
  • Long awaited makeArrayM/makeArrayA and mapM/forM/imapM/iforM/traverseA/itraverseA alnog with corresponding functions allowing for supplying representation.
  • Deprecate mapP and mapP_ in favor of mapIO and mapIO_, while making latter respect the Comp.
  • Addition of a whole collection of mutable operators:
    • mapIO/mapIO_/imapIO/imapIO_/forIO/forIO_/iforIO/iforIO_
    • createArray/createArrayST/createArrayST_
    • generateArray/generateArrayIO
    • unfoldlPrim/unfoldlPrim_
    • makeArrayA, makeArrayAR
  • Addition of cute synonyms: (...) and (..:)

0.2.5

  • Fix for insertDimension #62

0.2.4.1

  • Fix a bug in zip functions, where resulting array size would not take into account the size of one of the input arrays.

0.2.4

  • Addition of inner folds: ifoldlInner, foldlInner, ifoldrInner and foldrInner
  • Addition of functions that can fold over any dimension (foldlWithin, foldlWithin', etc.)
  • Addition of ifoldMono and ifoldSemi, thus fixing: #54
  • Improvement over manipulating index dimensions with addition of type level Dimension n data type and functions like getDimension, dropDimension.
  • Addition of insertDim and type level insertDimension as well as pullOutDim and pullOutDimension
  • Add partial extractFromTo'

0.2.3

  • Addition of Profunctor functions for Stencil: lmapStencil, rmapStencil and bimapStencil
  • Addition of integration approximation: Data.Massiv.Array.Numeric.Integral
  • Removed overlapping instances for DW in favor of concrete instances.
  • Relaxed contraint restrictions on matrix multiplication (|*|) and slightly improved performance with rewrite rules to avoid double transform.

0.2.2

  • Addition of withMArray, withMArrayST.
  • Improved preformance of matrix multiplication

0.2.1

  • Addition of Stride and related functions computeWithStride and computeWithStrideAs.

  • Addition of Window

  • Addition of loadArray adn loadArrayWithStride with default implementations that will become new loading functions in a subsequent release. loadArray will replace loadS and loadP, which will be deprecated in the next release and removed in the next major release. Some of this is discussed in #41

  • Addition of various conversion functions:

    • fromByteString, toByteString and toBuilder
    • unwrapArray, evalArray, unwrapMutableArray, evalMutableArray
    • unwrapNormalFormArray, evalNormalFormArray, unwrapNormalFormMutableArray, evalNormalFormMutableArray
  • Fix: Eq instance for Array M ix e

0.2.0

  • Fixed type signatures for convertAs and convertProxy
  • Added type constructors for DW and DI
  • Show instance for DW arrays.
  • Addition of unsafeBackpermuteDW.
  • Breaking changes:
    • Create new Data.Massiv.Array.Stencil.Unsafe module and move forStencilUnsafe into it.
    • Rename of rank -> dimensions #25
      • Removal Eq and Ord instances for Value #19
      • Move border resolution to mapStencil from makeStencil.
    • Updated iterators iterM, iterM_, etc. to have a separate step per dimension.

0.1.6

  • Semigroup and Monoid instance for Value.
  • Addition of forStencilUnsafe.
  • Fix minimum behaving as maximum.
  • Addition of foldSemi.

0.1.5

  • Fix inverted stencil index calculation #12
  • Add support for cross-correlation.

0.1.4

  • Addition of Monoidal folding foldMono.
  • Expose liftArray2.

0.1.3

  • Addition of withPtr and unsafeWithPtr for Storable arrays
  • Addition of computeInto.
  • Exposed makeWindowedArray.

0.1.2

  • Support for GHC-8.4 - instance of Comp for Semigroup
  • Brought back support for GHC-7.10

0.1.1

  • Addition of experimental mapM, imapM, forM, iforM, generateM and generateLinearM functions. Fixes #5
  • Addition of Ord instances for some array representations.

0.1.0

  • Initial Release