Heterogeneous automatic differentiation (backpropagation)

Available in LTS Haskell 11.10 and Stackage Nightly 2018-03-12.


Automatic heterogeneous back-propagation.

Write your functions to compute your result, and the library will automatically generate functions to compute your gradient.
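To make the idea concrete without depending on the library, here is a toy reverse-mode sketch over plain Double values. The names Rev, constRev, grad, and example are hypothetical, and this is not how backprop is implemented: it is homogeneous (Double only), and its closure-based backward pass re-walks shared sub-expressions, which a real implementation avoids by recording a dependency graph. The point is only the workflow: example is written as an ordinary numeric function, and grad derives its gradient.

```haskell
-- Toy reverse-mode AD (hypothetical; not backprop's implementation).
-- A Rev carries its forward value and a function that maps an incoming
-- gradient to contributions for each numbered input.
data Rev = Rev
  { primal   :: Double
  , backward :: Double -> [(Int, Double)]
  }

-- Constants contribute no gradient.
constRev :: Double -> Rev
constRev x = Rev x (const [])

-- Each operation pushes the incoming gradient g to its operands,
-- scaled by the local partial derivative (chain rule).
instance Num Rev where
  Rev x dx + Rev y dy = Rev (x + y) (\g -> dx g ++ dy g)
  Rev x dx - Rev y dy = Rev (x - y) (\g -> dx g ++ dy (negate g))
  Rev x dx * Rev y dy = Rev (x * y) (\g -> dx (g * y) ++ dy (g * x))
  negate (Rev x dx)   = Rev (negate x) (dx . negate)
  abs (Rev x dx)      = Rev (abs x) (\g -> dx (g * signum x))
  signum (Rev x _)    = constRev (signum x)
  fromInteger n       = constRev (fromInteger n)

-- An ordinary function, written once and polymorphic in Num:
-- f(x, y) = x*y + x^2.
example :: Num a => [a] -> a
example (x : y : _) = x * y + x * x
example _           = 0

-- Tag each input with its index, run forward, push a gradient of 1
-- backwards, then sum the contributions that reached each input.
grad :: ([Rev] -> Rev) -> [Double] -> [Double]
grad f xs = [ sum [ d | (j, d) <- contribs, j == i ] | i <- [0 .. length xs - 1] ]
  where
    inputs   = [ Rev x (\g -> [(i, g)]) | (i, x) <- zip [0 ..] xs ]
    contribs = backward (f inputs) 1
```

In GHCi, `grad example [3, 4]` gives the gradient of x*y + x*x at (3, 4), namely (y + 2x, x) = (10, 3), while `example [3, 4 :: Double]` still evaluates normally to 21.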

Differs from the ad library by offering full heterogeneity: each intermediate step and the resulting value can have different types. Mostly intended for use with gradient descent and other numeric optimization techniques.

Currently on Hackage (with 100% documentation coverage), but more up-to-date documentation is rendered on GitHub Pages!

If you want to provide backprop for users of your library, see this guide to equipping your library with backprop.

MNIST Digit Classifier Example

My blog post introduces the concepts in this library in the context of training a handwritten digit classifier. I recommend reading that first.

There are also some literate Haskell examples in the source (rendered as PDF here), which can be built (if stack is installed) using:

$ ./Build.hs exe

There is a follow-up tutorial on using the library with more advanced types, with extensible neural networks à la this blog post, available as literate Haskell and also rendered as a PDF.

Brief example

(This is a really brief version of my blog post)

The quick example below runs a neural network with one hidden layer, parameterized by two weight matrices and two bias vectors, and computes its error (the Euclidean norm of the difference) with respect to a target vector targ. Vector and matrix types are from the hmatrix package.

Let's make a data type to store our parameters, with convenient accessors using lens:

data Network i h o = Net { _weight1 :: L h i
                         , _bias1   :: R h
                         , _weight2 :: L o h
                         , _bias2   :: R o
                         }

-- makeLenses (TemplateHaskell) generates the lenses weight1, bias1, weight2 and bias2
makeLenses ''Network

Normally, we might write code to "run" a neural network on an input like this:

neuralNet
    :: (KnownNat i, KnownNat h, KnownNat o)
    => R i
    -> Network i h o
    -> R o
neuralNet x n = z
  where
    y = logistic $ (n ^. weight1) #> x + (n ^. bias1)
    z = logistic $ (n ^. weight2) #> y + (n ^. bias2)

logistic :: Floating a => a -> a
logistic x = 1 / (1 + exp (-x))

(R i is an i-length vector, L h i is an h-by-i matrix, etc., #> is matrix-vector multiplication, and ^. is access to a field via lens.)

Given an input vector and the network, this computes the result of running the neural network on the input vector.
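For readers without hmatrix at hand, here is a dependency-free sketch of the same forward pass. Vec, Mat, and the local #> are hypothetical list-based stand-ins for hmatrix's R, L, and #>; unlike hmatrix's static interface, nothing here checks the sizes at the type level.

```haskell
-- Hypothetical list-based stand-ins for hmatrix's R and L.
type Vec = [Double]
type Mat = [[Double]]   -- an h-by-i matrix is h rows, each of length i

-- Matrix-vector multiplication: dot each row with the vector.
(#>) :: Mat -> Vec -> Vec
m #> v = [ sum (zipWith (*) row v) | row <- m ]

-- Plain record; no lenses needed for this sketch.
data Network = Net
  { weight1 :: Mat
  , bias1   :: Vec
  , weight2 :: Mat
  , bias2   :: Vec
  }

logistic :: Floating a => a -> a
logistic x = 1 / (1 + exp (-x))

-- Hidden layer y, then output layer z, each an affine map followed
-- by an element-wise logistic.
neuralNet :: Vec -> Network -> Vec
neuralNet x n = z
  where
    y = map logistic (zipWith (+) (weight1 n #> x) (bias1 n))
    z = map logistic (zipWith (+) (weight2 n #> y) (bias2 n))
```

With every weight and bias set to zero, each output is logistic 0 = 0.5.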

We can write it, instead, using backprop:

neuralNet
    :: (KnownNat i, KnownNat h, KnownNat o, Reifies s W)
    => BVar s (R i)
    -> BVar s (Network i h o)
    -> BVar s (R o)
neuralNet x n = z
  where
    y = logistic $ (n ^^. weight1) #>! x + (n ^^. bias1)
    z = logistic $ (n ^^. weight2) #>! y + (n ^^. bias2)

logistic :: Floating a => a -> a
logistic x = 1 / (1 + exp (-x))

(#>! is a backprop-aware version of #>, and ^^. is access to a field via lens in a BVar)

And that's it! neuralNet is now backpropagatable!
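Notice that logistic did not change: its type only asks for Floating, so the same definition serves plain values and BVar s values alike. The sketch below demonstrates that kind of reuse with a different, simpler technique, forward-mode dual numbers; Dual and derivative are hypothetical names, and backprop itself works in reverse mode.

```haskell
-- Forward-mode dual numbers (a swapped-in technique for illustration):
-- a value paired with its derivative with respect to one input.
data Dual = Dual { primalPart :: Double, tangent :: Double }
  deriving Show

instance Num Dual where
  Dual a a' + Dual b b' = Dual (a + b) (a' + b')
  Dual a a' - Dual b b' = Dual (a - b) (a' - b')
  Dual a a' * Dual b b' = Dual (a * b) (a' * b + a * b')
  negate (Dual a a')    = Dual (negate a) (negate a')
  abs (Dual a a')       = Dual (abs a) (a' * signum a)
  signum (Dual a _)     = Dual (signum a) 0
  fromInteger n         = Dual (fromInteger n) 0

instance Fractional Dual where
  Dual a a' / Dual b b' = Dual (a / b) ((a' * b - a * b') / (b * b))
  fromRational r        = Dual (fromRational r) 0

instance Floating Dual where
  pi                = Dual pi 0
  exp (Dual a a')   = Dual (exp a) (a' * exp a)
  log (Dual a a')   = Dual (log a) (a' / a)
  sin (Dual a a')   = Dual (sin a) (a' * cos a)
  cos (Dual a a')   = Dual (cos a) (negate a' * sin a)
  asin (Dual a a')  = Dual (asin a) (a' / sqrt (1 - a * a))
  acos (Dual a a')  = Dual (acos a) (negate a' / sqrt (1 - a * a))
  atan (Dual a a')  = Dual (atan a) (a' / (1 + a * a))
  sinh (Dual a a')  = Dual (sinh a) (a' * cosh a)
  cosh (Dual a a')  = Dual (cosh a) (a' * sinh a)
  asinh (Dual a a') = Dual (asinh a) (a' / sqrt (a * a + 1))
  acosh (Dual a a') = Dual (acosh a) (a' / sqrt (a * a - 1))
  atanh (Dual a a') = Dual (atanh a) (a' / (1 - a * a))

-- The unchanged, Floating-polymorphic definition from the example.
logistic :: Floating a => a -> a
logistic x = 1 / (1 + exp (-x))

-- Seed the input's tangent with 1 and read off the output's tangent.
derivative :: (Dual -> Dual) -> Double -> Double
derivative f x = tangent (f (Dual x 1))
```

`derivative logistic 0` evaluates to 0.25, matching σ'(x) = σ(x)(1 - σ(x)) at x = 0, while `logistic (0 :: Double)` is still 0.5.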

We can "run" it using evalBP:

evalBP (neuralNet (constVar x)) :: Network i h o -> R o

And we can find the gradient using gradBP:

gradBP (neuralNet (constVar x)) :: Network i h o -> Network i h o
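A common way to sanity-check a gradient such as the one gradBP returns is to compare it against a central finite-difference approximation. This is a different technique from backprop (numeric, approximate, and needing two function evaluations per input); numGrad, maxAbsDiff, example, and the step size h = 1e-6 are hypothetical choices for illustration.

```haskell
-- Central-difference approximation of the gradient of f at xs:
-- df/dx_i ~ (f(x + h*e_i) - f(x - h*e_i)) / (2h).
numGrad :: ([Double] -> Double) -> [Double] -> [Double]
numGrad f xs =
  [ (f (bump i h) - f (bump i (negate h))) / (2 * h) | i <- [0 .. length xs - 1] ]
  where
    h = 1e-6
    -- Perturb only the i-th coordinate by d.
    bump i d = [ if j == i then x + d else x | (j, x) <- zip [0 :: Int ..] xs ]

-- Hypothetical test function: f(x, y) = x*y + x^2, so grad f = (y + 2x, x).
example :: [Double] -> Double
example (x : y : _) = x * y + x * x
example _           = 0

-- Largest absolute element-wise difference between two gradients.
maxAbsDiff :: [Double] -> [Double] -> Double
maxAbsDiff a b = maximum (0 : map abs (zipWith (-) a b))
```

`numGrad example [3, 4]` should agree with the exact gradient (10, 3) to several decimal places.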

If we write a function to compute errors:

netError
    :: (KnownNat i, KnownNat h, KnownNat o, Reifies s W)
    => BVar s (R i)
    -> BVar s (R o)
    -> BVar s (Network i h o)
    -> BVar s Double
netError x targ n = norm_2 (neuralNet x n - targ)

(norm_2 is a backprop-aware Euclidean norm)
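For reference, here is a plain-list sketch of what that error computes, together with the hand-derived backward rule a backprop-aware Euclidean norm has to supply: the gradient of ‖v‖ with respect to v is v / ‖v‖ (undefined at v = 0). The names norm2, norm2Grad, and outputError are hypothetical.

```haskell
-- Euclidean (L2) norm of a vector.
norm2 :: [Double] -> Double
norm2 v = sqrt (sum (map (^ (2 :: Int)) v))

-- Gradient of the norm: d||v||/dv = v / ||v||.
-- Divides by zero when v is the zero vector.
norm2Grad :: [Double] -> [Double]
norm2Grad v = map (/ norm2 v) v

-- Plain-value analogue of netError, given the network's output directly.
outputError :: [Double] -> [Double] -> Double
outputError out targ = norm2 (zipWith (-) out targ)
```

For example, `norm2 [3, 4]` is 5 and `norm2Grad [3, 4]` is [0.6, 0.8].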

Now, we can perform gradient descent!

gradDescent
    :: (KnownNat i, KnownNat h, KnownNat o)
    => R i
    -> R o
    -> Network i h o
    -> Network i h o
gradDescent x targ n0 = n0 - 0.1 * gradient
  where
    gradient = gradBP (netError (constVar x) (constVar targ)) n0

Ta-da! We were able to compute the gradient of our error function just by saying how to compute the error itself.
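The update rule itself can be tried in isolation. In this dependency-free sketch, the two-parameter model Params (y = w*x + b), its hand-written gradient function, and the learning rate 0.1 are hypothetical stand-ins; in the example above the gradient comes from gradBP instead. It also shows why the parameter type needs component-wise Num and Fractional instances: 0.1 is a Fractional literal, so `n0 - 0.1 * gradient` only type-checks if the parameter type is Fractional.

```haskell
-- Hypothetical two-parameter model y = w * x + b, standing in for Network.
data Params = Params { w :: Double, b :: Double }
  deriving (Show, Eq)

-- Component-wise arithmetic, so that p - 0.1 * g type-checks.
instance Num Params where
  Params a c + Params a' c' = Params (a + a') (c + c')
  Params a c - Params a' c' = Params (a - a') (c - c')
  Params a c * Params a' c' = Params (a * a') (c * c')
  abs (Params a c)          = Params (abs a) (abs c)
  signum (Params a c)       = Params (signum a) (signum c)
  fromInteger n             = Params (fromInteger n) (fromInteger n)

instance Fractional Params where
  Params a c / Params a' c' = Params (a / a') (c / c')
  fromRational r            = Params (fromRational r) (fromRational r)

-- Gradient of the squared error (w*x + b - t)^2, derived by hand here;
-- this is the part gradBP would compute automatically.
gradient :: Double -> Double -> Params -> Params
gradient x t (Params w0 b0) = Params (2 * e * x) (2 * e)
  where
    e = w0 * x + b0 - t

-- One gradient-descent step with learning rate 0.1.
gradDescent :: Double -> Double -> Params -> Params
gradDescent x t p0 = p0 - 0.1 * gradient x t p0
```

Starting from Params 0 0 with x = 1 and t = 1, one step gives Params 0.2 0.2; repeating it drives w + b toward 1.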

For a more fleshed-out example, see my blog post and the MNIST tutorial (also rendered as a PDF).

Lens Access

A lot of the friction of dealing with BVar s a instead of a directly is alleviated by the lens interface.

With a lens, you can "view" and "set" items inside a BVar, as if they were the actual values:

(^.)  ::        a -> Lens' a b ->        b
(^^.) :: BVar s a -> Lens' a b -> BVar s b

(.~)  :: Lens' a b ->        b ->        a ->        a
(.~~) :: Lens' a b -> BVar s b -> BVar s a -> BVar s a
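A minimal sketch of the plain-value half of the table, with van Laarhoven lenses defined locally in base so nothing extra is needed. The Lens' synonym follows the usual lens/microlens encoding, but these local definitions, the Layer record, and its weight and bias lenses are hypothetical; ^^. and .~~ do the same jobs for values wrapped in BVar s.

```haskell
{-# LANGUAGE RankNTypes #-}

import Data.Functor.Const (Const (..))
import Data.Functor.Identity (Identity (..))

-- The van Laarhoven lens encoding.
type Lens' s a = forall f. Functor f => (a -> f a) -> s -> f s

infixl 8 ^.
infixr 4 .~

-- View: run the lens in Const, which carries the focus out unchanged.
(^.) :: s -> Lens' s a -> a
x ^. l = getConst (l Const x)

-- Set: run the lens in Identity, replacing the focus with v.
(.~) :: Lens' s a -> a -> s -> s
(l .~ v) x = runIdentity (l (const (Identity v)) x)

-- Hypothetical record standing in for Network.
data Layer = Layer { _weight :: Double, _bias :: Double }
  deriving (Show, Eq)

weight :: Lens' Layer Double
weight f l = fmap (\v -> l { _weight = v }) (f (_weight l))

bias :: Lens' Layer Double
bias f l = fmap (\v -> l { _bias = v }) (f (_bias l))
```

For example, `Layer 1 2 ^. bias` is 2, and `(weight .~ 5) (Layer 1 2)` is `Layer 5 2`.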

You can also extract multiple potential targets using Traversals and Prisms:

-- | Actually takes a Traversal, to be more general.
-- Can be used to implement "pattern matching" on BVars
(^?)  ::        a -> Prism' a b -> Maybe (       b)
(^^?) :: BVar s a -> Prism' a b -> Maybe (BVar s b)

(^..)  ::        a -> Traversal' a b -> [       b]
(^^..) :: BVar s a -> Traversal' a b -> [BVar s b]

Note that the library itself has no lens dependency, using microlens instead.
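The plain-value versions of ^? and ^.. can be sketched the same way, again in base only. The Traversal' synonym follows the usual encoding; _Left (which plays the role a Prism' would, as a "pattern match" on Left) and both are hypothetical local traversals, not the microlens definitions.

```haskell
{-# LANGUAGE RankNTypes #-}

import Data.Functor.Const (Const (..))
import Data.Monoid (First (..))

-- A traversal can focus on zero, one, or many targets.
type Traversal' s a = forall f. Applicative f => (a -> f a) -> s -> f s

infixl 8 ^.., ^?

-- Collect every target, using the list monoid inside Const.
(^..) :: s -> Traversal' s a -> [a]
x ^.. t = getConst (t (\a -> Const [a]) x)

-- Keep only the first target, if there is one.
(^?) :: s -> Traversal' s a -> Maybe a
x ^? t = getFirst (getConst (t (\a -> Const (First (Just a))) x))

-- Hypothetical traversal: zero targets for Right, one for Left.
_Left :: Traversal' (Either a b) a
_Left f (Left a)  = Left <$> f a
_Left _ (Right r) = pure (Right r)

-- Hypothetical traversal over both halves of a homogeneous pair.
both :: Traversal' (a, a) a
both f (x, y) = (,) <$> f x <*> f y
```

For example, `Left 3 ^? _Left` is `Just 3`, `Right "no" ^? _Left` is `Nothing`, and `(1, 2) ^.. both` is `[1, 2]`.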


Benchmarks

Here are some basic benchmarks comparing the library's automatic differentiation to "manual" differentiation by hand, using the MNIST tutorial as an example:


  • For computing the gradient, there is about 2.5ms of overhead (about 3.5x) compared to computing the gradients by hand. More profiling and investigation could be done, since there are two main potential sources of slowdown:

    1. "Inefficient" gradient computations, because of automated differentiation not being as efficient as what you might get from doing things by hand and simplifying. This sort of cost is probably not avoidable.
    2. Overhead incurred by the book-keeping and actual automatic differentiating system, which involves keeping track of a dependency graph and propagating gradients backwards in memory. This sort of overhead is what we would be aiming to reduce.

    It is unclear which one dominates the current slowdown.

  • However, this isn't necessarily a significant bottleneck: updating the networks using hmatrix actually dominates the training runtime. Manual gradient descent takes 3.2ms, so the extra overhead is about 60%-70%.

  • Running the network (and the backprop-aware functions) incurs virtually zero overhead (about 4%), meaning that library authors could actually export backprop-aware functions by default and not lose any performance.


Todo

  1. Benchmark against competing back-propagation libraries like ad, and auto-differentiating tensor libraries like grenade.

  2. Write tests!

  3. Explore opportunities for parallelization. There are some naive ways of directly parallelizing right now, but potential overhead should be investigated.

  4. Some open questions:

    a. Is it possible to support constructors with existential types?

    b. How to support "monadic" operations that depend on results of previous operations? (ApBP already exists for situations that don't)




Changelog

May 12, 2018

  • evalBP0 added, for convenience for no-argument values that need to be evaluated without backpropagation.
  • splitBV and joinBV for "higher-kinded data" style BVar manipulation, via the BVGroup helper typeclass.
  • toList, mapAccumL, and mapAccumR for Prelude.Backprop modules
  • Backprop instance for BVar
  • COMPLETE pragmas for T2 and T3
  • Un-exported gzero, gadd, and gone from Numeric.Backprop.Class
  • Many, many more instances of Backprop
  • Backprop instance for Proxy made non-strict for add
  • Swapped type variable order for a few library functions, which might potentially be breaking changes.


  • Fixed documentation for Num and Explicit Prelude modules, and rewrote normal and Num Prelude modules in terms of canonical Prelude definitions
  • Switched to errorWithoutStackTrace wherever appropriate (in Internal module)


May 8, 2018

  • Added ABP newtype wrapper to Numeric.Backprop.Class (re-exported from Numeric.Backprop and Numeric.Backprop.Explicit) to give free Backprop instances for Applicative actions.
  • Added NumBP newtype wrapper to Numeric.Backprop.Class (re-exported in the same places as ABP) to give free Backprop instances for Num instances.
  • Added ^^?! (unsafe access) to Numeric.Backprop and Numeric.Backprop.Num.
  • Backprop instance for Natural from Numeric.Natural. Should actually be safe, unlike its Num instance!
  • zfFunctor and ofFunctor for instances of Functor for Numeric.Backprop.Explicit.
  • realToFrac and fromIntegral added to Prelude modules
  • T2 and T3 patterns for Numeric.Backprop, for conveniently constructing and deconstructing tuples.


May 1, 2018

  • Added Backprop class in Numeric.Backprop.Class, which is a typeclass specifically for "backpropagatable" values. This will replace Num.
  • API of Numeric.Backprop completely rewritten to require values be instances of Backprop instead of Num. This closes some outstanding issues with the reliance on Num, and allows backpropagation to work with non-Num instances like variable-length vectors, matrices, lists, tuples, etc. (including types from accelerate)
  • Numeric.Backprop.Num and Prelude.Backprop.Num modules added, providing the old interface that uses Num instances instead of Backprop instances, for those who wish to avoid writing orphan instances when working with external types.
  • Numeric.Backprop.Explicit and Prelude.Backprop.Explicit modules added, providing an interface that allows users to manually specify how zeroing, addition, and one-ing works on a per-value basis. Useful for those who wish to avoid writing orphan instances of Backprop for types with no Num instances, or if you are mixing and matching styles.
  • backpropWith variants added, allowing you to specify a "final gradient", instead of assuming it to be 1.
  • Added auto, a shorter alias for constVar inspired by the ad library.
  • Numeric.Backprop.Tuple module removed. I couldn't find a significant reason to keep it now that Num is no longer required for backpropagation.


Apr 26, 2018

  • Added coerceVar to Numeric.Backprop
  • Added Random instances for all tuple types. As with Binary, this incurs random and time dependencies only because of the tuple types. Again, because these packages are part of GHC's boot libraries, this is hopefully not too bad.


Apr 9, 2018

  • Fixed NFData instance for T; previously, it was shallow.
  • Added Typeable instances for all tuple types, and for BVar.
  • Added Eq, Ord, Show, etc. instances for T.
  • Added Binary instances for all tuple types. Note that this incurs a binary dependency only because of the tuple types; however, this will hopefully not be much of an issue because binary is a GHC library anyway.


Mar 30, 2018

  • T added to Numeric.Backprop.Tuple: basically an HList with a Num instance.
  • Eq and Ord instances for BVar. Is this sound?


  • Refactored Monoid instances in Numeric.Backprop.Tuple


Mar 25, 2018

  • isoVar, isoVar2, isoVar3, and isoVarN: convenient aliases for applying isomorphisms to BVars. Helpful for use with constructors and deconstructors.
  • opIso2 and opIso3 added to Numeric.Backprop.Op, for convenience.
  • T0 (Unit with numeric instances) added to Numeric.Backprop.Tuple.


  • Completely decoupled the internal implementation from Num, which showed some performance benefits. This was mostly done to make the code slightly cleaner, and to prepare for potentially decoupling the external API from Num as well someday.


Feb 12, 2018

  • Prelude.Backprop module added with lifted versions of several Prelude and base functions.
  • liftOpX family of operators now have a more logical ordering for type variables. This change breaks backwards-compatibility.
  • opIsoN added to export list of Numeric.Backprop
  • noGrad and noGrad1 added to Numeric.Backprop.Op, for functions with no defined gradient.


  • Completely decoupled the internal implementation from Num, which showed some performance benefits.


Feb 7, 2018

  • Added currying and uncurrying functions for tuples in Numeric.Backprop.Tuple.
  • opIsoN, for isomorphisms between a tuple of values and a value.
  • (Internal) AD engine now using Any from ghc-prim instead of Some I from type-combinators


Feb 6, 2018

  • Added canonical strict tuple types with Num instances, in the module Numeric.Backprop.Tuple. This is meant to be a band-aid for the problem of orphan instances and potential mismatched tuple types.
  • Fixed bug in collectVar that occurs if container sizes change


  • Internal tweaks to the underlying automatic differentiation types that decouple backpropagation from Num, internally. Num is now just used externally as a part of the API, which might someday be made optional.


Feb 5, 2018

  • First non-alpha release.
  • More or less complete redesign of library. The entire API is completely changed, and there is no backwards compatibility!

    • Everything is now "implicit" style, and there is no more BP monad.
    • Accessing items in BVars is now lens-, prism-, and traversal- based, instead of iso- and generics-based.
    • Op is no longer monadic
    • Mono modules are removed.
    • Implicit modules are removed, since they are the default
    • Iso module is removed, since Isos no longer play major role in the implementation of the library.
  • Removed dependency on ad and ad-based ops, which had been pulling in the vast majority of dependencies.
  • Moved from .cabal file to hpack system.



  • Removed samples as registered executables in the cabal file, to reduce dependencies to a bare minimum. For convenience, the build script now also compiles the samples into the local directory if stack is installed.

  • Added experimental (unsafe) combinators for working with GADTs with existential types, withGADT, to Numeric.Backprop module.

  • Fixed broken links in changelog.



  • Added optimized numeric Ops, and rewrote Num/Fractional/Floating instances in terms of them.

  • Removed all traces of Summer/Unity from the library, eliminating a whole swath of "explicit-Summer"/"explicit-Unity" versions of functions. As a consequence, the library now only works with Num instances. The API, however, is now much more simple.

  • Benchmark suite added for MNIST example.



  • Initial pre-release, as a request for comments. API is in a usable form and everything is fully documented, but there are definitely some things left to be done. (See the README.)
