ad
Automatic Differentiation
LTS Haskell 17.0:  4.4.1 
Stackage Nightly 20210127:  4.4.1 
Latest on Hackage:  4.4.1 
ad4.4.1@sha256:4e6b44de894b6edef317b928a75cd2a0c83a5f0595bbf4f23fcac0de4cbfa3e0,6551
Module documentation for 4.4.1
 Numeric
 Numeric.AD
 Numeric.AD.Halley
 Numeric.AD.Internal
 Numeric.AD.Jacobian
 Numeric.AD.Jet
 Numeric.AD.Mode
 Numeric.AD.Newton
 Numeric.AD.Rank1
 Numeric.AD
ad
A package that provides an intuitive API for Automatic Differentiation (AD) in Haskell. Automatic differentiation provides a means to calculate the derivatives of a function while evaluating it. Unlike numerical methods based on running the program with multiple inputs or symbolic approaches, automatic differentiation typically only decreases performance by a small multiplier.
AD employs the fact that any program y = F(x)
that computes one or more value does so by composing multiple primitive operations. If the (partial) derivatives of each of those operations is known, then they can be composed to derive the answer for the derivative of the entire program at a point.
This library contains at its core a single implementation that describes how to compute the partial derivatives of a wide array of primitive operations. It then exposes an API that enables a user to safely combine them using standard higherorder functions, just as you would with any other Haskell numerical type.
There are several ways to compose these individual Jacobian matrices. We hide the choice used by the API behind an explicit “Mode” typeclass and universal quantification. This prevents users from confusing infinitesimals. If you want to risk infinitesimal confusion in order to get greater flexibility in how you curry, flip and generally combine the differential operators, then the Rank1.*
modules are probably your cup of tea.
Features
 Provides forward and reverse mode AD combinators with a common API.
 Optional typelevel “branding” is available to prevent the end user from confusing infinitesimals
 Each mode has a separate module full of combinators, with a consistent look and feel.
Examples
You can compute derivatives of functions
Prelude Numeric.AD> diff sin 0 { cos 0 }
1.0
Or both the answer and the derivative of a function:
Prelude Numeric.AD> diff' (exp . log) 2
(2.0,1.0)
You can compute the derivative of a function with a constant parameter using auto
:
Prelude Numeric.AD> let t = 2.0 :: Double
Prelude Numeric.AD> diff (\ x > auto t * sin x) 0
2.0
You can use a symbolic numeric type, like the one from simplereflect
to obtain symbolic derivatives:
Prelude Debug.SimpleReflect Numeric.AD> diff atanh x
recip (1  x * x) * 1
You can compute gradients for functions that take nonscalar values in the form of a Traversable functor full of AD variables.
Prelude Numeric.AD Debug.SimpleReflect> grad (\[x,y,z] > x * sin (x + log y)) [x,y,z]
[ 0 + (0 + sin (x + log y) * 1 + 1 * (0 + cos (x + log y) * (0 + x * 1)))
, 0 + (0 + recip y * (0 + 1 * (0 + cos (x + log y) * (0 + x * 1))))
, 0
]
which one can simplify to:
[ sin (x + log y) + cos (x + log y) * x, recip y * cos (x + log y) * x, 0 ]
If you need multiple derivatives you can calculate them with diffs
:
Prelude Numeric.AD> take 10 $ diffs sin 1
[0.8414709848078965,0.5403023058681398,0.8414709848078965,0.5403023058681398,0.8414709848078965,0.5403023058681398,0.8414709848078965,0.5403023058681398,0.8414709848078965,0.5403023058681398]
or if your function takes multiple inputs, you can use grads, which returns an ‘fbranching stream’ of derivatives, that you can
inspect lazily. Somewhat more intuitive answers can be obtained by converting the stream into the polymorphically recursive
Jet
data type. With that we can look at a single “layer” of the answer at a time:
The answer:
Prelude Numeric.AD> headJet $ jet $ grads (\[x,y] > exp (x * y)) [1,2]
7.38905609893065
The gradient:
Prelude Numeric.AD> headJet $ tailJet $ jet $ grads (\[x,y] > exp (x * y)) [1,2]
[14.7781121978613,7.38905609893065]
The hessian (n * n matrix of 2nd derivatives)
Prelude Numeric.AD> headJet $ tailJet $ tailJet $ jet $ grads (\[x,y] > exp (x * y)) [1,2]
[[29.5562243957226,22.16716829679195],[22.16716829679195,7.38905609893065]]
Or even higher order tensors of derivatives as a jet.
Prelude Numeric.AD> headJet $ tailJet $ tailJet $ tailJet $ jet $ grads (\[x,y] > exp (x * y)) [1,2]
[[[59.1124487914452,44.3343365935839],[44.3343365935839,14.7781121978613]],[[44.3343365935839,14.7781121978613],[14.7781121978613,7.38905609893065]]]
Note the redundant values caused by the various symmetries in the tensors. The ad
library is careful to compute
each distinct derivative only once, lazily and to share the resulting computation.
Overview
Modules
Numeric.AD
computes using whichever mode or combination thereof is suitable to each individual combinator. This mode is the default, reexported byNumeric.AD
Numeric.AD.Mode.Forward
provides basic forwardmode AD. It is good for computing simple derivatives.Numeric.AD.Mode.Sparse
computes a sparse forwardmode AD tower. It is good for higher derivatives or large numbers of outputs.Numeric.AD.Mode.Kahn
computes with reversemode AD. It is good for computing a few outputs given many inputs.Numeric.AD.Mode.Reverse
computes with reversemode AD. It is good for computing a few outputs given many inputs, when not using sparks heavily.Numeric.AD.Mode.Tower
computes a dense forwardmode AD tower useful for higher derivatives of single input functions.Numeric.AD.Newton
provides a number of combinators for root finding using Newton’s method with quadratic convergence.Numeric.AD.Halley
provides a number of combinators for root finding using Halley’s method with cubic convergence.Numeric.AD.Rank1.*
provides combinators for AD that are strictly rank1. This makes it easier to flip and contort them with higher order functions at the expense of type safety when it comes to infinitsimal confusion.
Combinators
While not every mode can provide all operations, the following basic operations are supported, modified as appropriate by the suffixes below:
grad
computes the gradient (vector of partial derivatives at a given point) of a function.jacobian
computes the Jacobian matrix of a function at a point.diff
computes the derivative of a function at a point.du
computes a directional derivative of a function at a point.hessian
computes the Hessian matrix (matrix of second partial derivatives) of a function at a point.
Combinator Suffixes
The following suffixes alter the meanings of the functions above as follows:
'
also return the answerWith
lets the user supply a function to blend the input with the outputF
is a version of the base function lifted to return aTraversable
(orFunctor
) results
means the function returns all higher derivatives in a list or fbranchingStream
T
means the result is transposed with respect to the traditional formulation (usually to avoid paying for transposing back)0
means that the resulting derivative list is padded with 0s at the end.NoEq
means that an infinite list of converging values is returned rather than truncating the list when they become constant
Contact Information
Contributions and bug reports are welcome!
Please feel free to contact me through github or on the #haskell IRC channel on irc.freenode.net.
Edward Kmett
Changes
4.4.1 [2020.10.13]
 Change the fixity of
:
inNumeric.AD.Jet
to be rightassociative. Previously, it wasinfixl
, which made things likex : y : z
nearly unusable.  Fix backpropagation error in Kahn mode.
 Fix bugs in the
Erf
instance forForwardDouble
.  Add
Numeric.AD.Mode.Reverse.Double
, a variant ofNumeric.AD.Mode.Reverse
that is specialized toDouble
.  Reexport
Jet(..)
,headJet
,tailJet
andjet
fromNumeric.AD
.
4.4 [2020.02.03]

Generalize the type of
stochasticGradientDescent
:stochasticGradientDescent :: (Traversable f, Fractional a, Ord a) => (forall s. Reifies s Tape => f (Scalar a) > f (Reverse s a) > Reverse s a) > [f (Scalar a)] > f a > [f a] +stochasticGradientDescent :: (Traversable f, Fractional a, Ord a) => (forall s. Reifies s Tape => e > f (Reverse s a) > Reverse s a) > [e] > f a > [f a]
4.3.6 [2019.02.28]
 Make the test suite pass when built against
musl
libc
.
4.3.5 [2018.01.18]
 Add
Semigroup
instance forId
.
4.3.4
 Support
doctest0.12
4.3.3
 Revamp
Setup.hs
to usecabaldoctest
. This makes it build withCabal2.0
, and makes thedoctest
s work withcabal newbuild
and sandboxes.
4.3.2.1
 GHC 8 support
 Fix Kahn mode’s
**
implementation  Fix multiple problems in Erf and InvErf methods
4.3.2
 Added
NoEq
versions of several combinators that can be used whenEq
isn’t available on the numeric type involved.
4.3.1
 Further improvements have been made in the performance of
Sparse
mode, at least asymptotically, when used on functions with many variables. Since this is the target usecase forSparse
in the first place, this seems like a good tradeoff. Note: this results in an API change, but only in the API of anInternal
module, so this is treated as a minor version bump.
4.3
 Made drastic improvements in the performance of
Tower
andSparse
modes thanks to the help of Björn von Sydow.  Added constrained convex optimization.
 Incorporated some suggestions from herbie for improving floating point accuracy.
4.2.4
 Added
Newton.Double
modules for performance.
4.2.3
reflection
2 support
4.2.2
 Major bug fix for
grads
,jacobians
, and anything that usesSparse
mode inNumeric.AD
. Derivatives after the first two were previously incorrect.
4.2.1.1
 Support
nats
version 1
4.2.1
 Added
stochasticGradientDescent
.
4.2
 Removed broken
Directed
mode.  Added
Numeric.AD.Rank1
combinators and moved most infinitesimal handling back out of the modes and into anAD
wrapper.
4.1
 Fixed a bug in the type of
conjugateGradientAscent
andconjugateGradientDescent
that prevent users from being able to ever call it.
4.0.0.1
 Added the missing
instances.h
header file toextrasourcefiles
.
4.0
 An overhaul permitting monomorphic modes was completed by @alang9.
 Add a
ForwardDouble
monomorphic mode
3.4
 Added support for
erf
andinverf
, etc. fromData.Number.Erf
.  Split the infinitesimal and mode into two separate parameters to facilitate inlining and easier extension of the API.
3.3.1
 Build system improvements
 Removed unused LANGUAGE pragmas
 Added HLint configuration
 We now use exactly the same versions of the packages used to build
ad
when running the doctests.
3.3
 Renamed
Reverse
toKahn
andWengert
toReverse
. We use Arthur Kahn’s topological sorting algorithm to sort the tape after the fact in Kahn mode, while the stock Reverse mode builds a Wengert list as it goes, which is more efficient in practice.
3.2.2
 Export of the
conjugateGradientDescent
andgradientDescent
fromNumeric.AD
3.2.1
conjugateGradientDescent
now stops before it starts returning NaN results.
3.2
 Renamed
Chain
toWengert
to reflect its use of Wengert lists for reverse mode.  Renamed
lift
toauto
to avoid conflict with the more prevalenttransformers
library.  Fixed a bug in
Numeric.AD.Forward.gradWith'
, which caused it to return the wrong value for the primal.
3.1.4
 Added a better “convergence” test for
findZero
 Compute
tan
andtanh
derivatives directly.
3.1.3
 Added
conjugateGradientDescent
andconjugateGradientAscent
toNumeric.AD.Newton
.
3.1.2
 Dependency bump
3.1
 Added
Chain
mode, which isReverse
using a linear tape that doesn’t need to be sorted.  Added a suite of doctests.
 Bug fix in
Forward
mode. It was previously yielding incorrect results for anything that usedbind
orbind'
internally.
3.0
 Moved the contents of
Numeric.AD.Mode.Mixed
intoNumeric.AD
 Split off
Numeric.AD.Variadic
for the variadic combinators  Removed the
UU
,FU
,UF
, andFF
type aliases.  Stopped exporting the types for
Mode
andAD
from almost every module. ImportNumeric.AD.Types
if necessary.  Renamed
Tensors
toJet
 Dependency bump to be compatible with ghc 7.4.1 and mtl 2.1
 More aggressive zero tracking.
diff (**n) 0
for constant n anddiff (0**)
both now yield the correct answer for all modes.