ad
Automatic Differentiation
Version on this page: | 4.5.4@rev:1 |
LTS Haskell 22.37: | 4.5.6 |
Stackage Nightly 2024-10-11: | 4.5.6 |
Latest on Hackage: | 4.5.6 |
ad-4.5.4@sha256:6fcf0e8538b265145298b9dec8219e0780cd162d6ab97221254c80d2688df1d9,7456
Module documentation for 4.5.4
- Numeric
- Numeric.AD
- Numeric.AD.Double
- Numeric.AD.Halley
- Numeric.AD.Internal
- Numeric.AD.Jacobian
- Numeric.AD.Jet
- Numeric.AD.Mode
- Numeric.AD.Newton
- Numeric.AD.Rank1
- Numeric.AD
ad
A package that provides an intuitive API for Automatic Differentiation (AD) in Haskell. Automatic differentiation provides a means to calculate the derivatives of a function while evaluating it. Unlike numerical methods based on running the program with multiple inputs or symbolic approaches, automatic differentiation typically only decreases performance by a small multiplier.
AD employs the fact that any program y = F(x)
that computes one or more value does so by composing multiple primitive operations. If the (partial) derivatives of each of those operations is known, then they can be composed to derive the answer for the derivative of the entire program at a point.
This library contains at its core a single implementation that describes how to compute the partial derivatives of a wide array of primitive operations. It then exposes an API that enables a user to safely combine them using standard higher-order functions, just as you would with any other Haskell numerical type.
There are several ways to compose these individual Jacobian matrices. We hide the choice used by the API behind an explicit “Mode” type-class and universal quantification. This prevents users from confusing infinitesimals. If you want to risk infinitesimal confusion in order to get greater flexibility in how you curry, flip and generally combine the differential operators, then the Rank1.*
modules are probably your cup of tea.
Features
- Provides forward- and reverse- mode AD combinators with a common API.
- Optional type-level “branding” is available to prevent the end user from confusing infinitesimals
- Each mode has a separate module full of combinators, with a consistent look and feel.
Examples
You can compute derivatives of functions
Prelude Numeric.AD> diff sin 0 {- cos 0 -}
1.0
Or both the answer and the derivative of a function:
Prelude Numeric.AD> diff' (exp . log) 2
(2.0,1.0)
You can compute the derivative of a function with a constant parameter using auto
:
Prelude Numeric.AD> let t = 2.0 :: Double
Prelude Numeric.AD> diff (\ x -> auto t * sin x) 0
2.0
You can use a symbolic numeric type, like the one from simple-reflect
to obtain symbolic derivatives:
Prelude Debug.SimpleReflect Numeric.AD> diff atanh x
recip (1 - x * x) * 1
You can compute gradients for functions that take non-scalar values in the form of a Traversable functor full of AD variables.
Prelude Numeric.AD Debug.SimpleReflect> grad (\[x,y,z] -> x * sin (x + log y)) [x,y,z]
[ 0 + (0 + sin (x + log y) * 1 + 1 * (0 + cos (x + log y) * (0 + x * 1)))
, 0 + (0 + recip y * (0 + 1 * (0 + cos (x + log y) * (0 + x * 1))))
, 0
]
which one can simplify to:
[ sin (x + log y) + cos (x + log y) * x, recip y * cos (x + log y) * x, 0 ]
If you need multiple derivatives you can calculate them with diffs
:
Prelude Numeric.AD> take 10 $ diffs sin 1
[0.8414709848078965,0.5403023058681398,-0.8414709848078965,-0.5403023058681398,0.8414709848078965,0.5403023058681398,-0.8414709848078965,-0.5403023058681398,0.8414709848078965,0.5403023058681398]
or if your function takes multiple inputs, you can use grads, which returns an ‘f-branching stream’ of derivatives, that you can
inspect lazily. Somewhat more intuitive answers can be obtained by converting the stream into the polymorphically recursive
Jet
data type. With that we can look at a single “layer” of the answer at a time:
The answer:
Prelude Numeric.AD> headJet $ jet $ grads (\[x,y] -> exp (x * y)) [1,2]
7.38905609893065
The gradient:
Prelude Numeric.AD> headJet $ tailJet $ jet $ grads (\[x,y] -> exp (x * y)) [1,2]
[14.7781121978613,7.38905609893065]
The hessian (n * n matrix of 2nd derivatives)
Prelude Numeric.AD> headJet $ tailJet $ tailJet $ jet $ grads (\[x,y] -> exp (x * y)) [1,2]
[[29.5562243957226,22.16716829679195],[22.16716829679195,7.38905609893065]]
Or even higher order tensors of derivatives as a jet.
Prelude Numeric.AD> headJet $ tailJet $ tailJet $ tailJet $ jet $ grads (\[x,y] -> exp (x * y)) [1,2]
[[[59.1124487914452,44.3343365935839],[44.3343365935839,14.7781121978613]],[[44.3343365935839,14.7781121978613],[14.7781121978613,7.38905609893065]]]
Note the redundant values caused by the various symmetries in the tensors. The ad
library is careful to compute
each distinct derivative only once, lazily and to share the resulting computation.
Overview
Modules
Numeric.AD
computes using whichever mode or combination thereof is suitable to each individual combinator. This mode is the default, re-exported byNumeric.AD
Numeric.AD.Mode.Forward
provides basic forward-mode AD. It is good for computing simple derivatives.Numeric.AD.Mode.Sparse
computes a sparse forward-mode AD tower. It is good for higher derivatives or large numbers of outputs.Numeric.AD.Mode.Kahn
computes with reverse-mode AD. It is good for computing a few outputs given many inputs.Numeric.AD.Mode.Reverse
computes with reverse-mode AD. It is good for computing a few outputs given many inputs, when not using sparks heavily.Numeric.AD.Mode.Tower
computes a dense forward-mode AD tower useful for higher derivatives of single input functions.Numeric.AD.Newton
provides a number of combinators for root finding using Newton’s method with quadratic convergence.Numeric.AD.Halley
provides a number of combinators for root finding using Halley’s method with cubic convergence.Numeric.AD.Rank1.*
provides combinators for AD that are strictly rank-1. This makes it easier to flip and contort them with higher order functions at the expense of type safety when it comes to infinitsimal confusion.
Combinators
While not every mode can provide all operations, the following basic operations are supported, modified as appropriate by the suffixes below:
grad
computes the gradient (vector of partial derivatives at a given point) of a function.jacobian
computes the Jacobian matrix of a function at a point.diff
computes the derivative of a function at a point.du
computes a directional derivative of a function at a point.hessian
computes the Hessian matrix (matrix of second partial derivatives) of a function at a point.
Combinator Suffixes
The following suffixes alter the meanings of the functions above as follows:
'
also return the answerWith
lets the user supply a function to blend the input with the outputF
is a version of the base function lifted to return aTraversable
(orFunctor
) results
means the function returns all higher derivatives in a list or f-branchingStream
T
means the result is transposed with respect to the traditional formulation (usually to avoid paying for transposing back)0
means that the resulting derivative list is padded with 0s at the end.NoEq
means that an infinite list of converging values is returned rather than truncating the list when they become constant
Contact Information
Contributions and bug reports are welcome!
Please feel free to contact me through github or on the #haskell IRC channel on irc.freenode.net.
-Edward Kmett
Changes
4.5.4 [2023.02.19]
-
Add a
Num (Scalar (Scalar t))
constraint toOn
’sMode
instance, which is required to make it typecheck with GHC 9.6.(Note that this constraint was already present implicitly due to superclass expansion, so this is not a breaking change. The only reason that it must be added explicitly with GHC 9.6 or later is due to 9.6 being more conservative with superclass expansion.)
4.5.3 [2023.01.21]
- Support building with GHC 9.6.
4.5.2 [2022.06.17]
- Fix a bug that would cause
Numeric.AD.Mode.Reverse.diff
andNumeric.AD.Mode.Reverse.Double.diff
to compute different answers under certain circumstances whenad
was compiled with the+ffi
flag.
4.5.1 [2022.05.18]
- Allow building with
transformers-0.6.*
.
4.5 [2021.11.07]
- The build-type has been changed from
Custom
toSimple
. To achieve this, thedoctests
test suite has been removed in favor of usingcabal-docspec
to run the doctests. - Expose
Dense
mode AD again. - Add a
Dense.Representable
mode, which is a variant ofDense
that exploitsRepresentable
functors rather thanTraversable
functors. Representable
can now also be useful as it can allow us tounjet
to convert a value of typeJet f a
safely back intoCofree f a
.- Improve
Reverse.Double
mode performance by increasing strictness and using an FFI-based tape. - Reverse mode AD uses
reifyTypeable
internally. This means the region parameter/infinitesimals that mark each tape areTypeable
, allowing you to do things like define instances ofException
that name the region parameter and perform similar shenanigans. - Drastically reduce code duplication in
Double
-based modes, enabling more of them. - Fixed a number of modes that were handling
(**)
improperly due to the aforementioned code duplication problem. - Add a
Tower.Double
mode (internally) that uses lazy lists of strict doubles. - Add a
Kahn.Double
mode (internally) that holds strict doubles in the graph. - Switch to using pattern synonyms internally for detecting “known” zeros.
- Drop support for versions of GHC before 8.0
- The
.Double
modes have been modified to exploit the fact that we can definitely check a Double for equality with 0. In future releases we may require a typeclass that offers the ability to check for known zeroes for all types you process. This will allow us to improve the quality of the results, but may require you to either write an small instance declaration if you are processing some esoteric data type of your own, or put on/off a newtype that indicates to skip known zero optimizations or to use Eq. If there are particularly common types with tricky cases, a futuread-instances
package might be the right way forward for them to find a home. - Add
Numeric.AD.Double
, which tries to mix and match between all the different AD modes to produce optimal results but uses the various.Double
specializations to reduce the amount of boxing and indirection on the heap. - Add
Numeric.AD.Halley.Double
. - Removed the
fooNoEq
variants fromNewton.Double
,Double
s always have anEq
instance.
4.4.1 [2020.10.13]
- Change the fixity of
:-
inNumeric.AD.Jet
to be right-associative. Previously, it wasinfixl
, which made things likex :- y :- z
nearly unusable. - Fix backpropagation error in Kahn mode.
- Fix bugs in the
Erf
instance forForwardDouble
. - Add
Numeric.AD.Mode.Reverse.Double
, a variant ofNumeric.AD.Mode.Reverse
that is specialized toDouble
. - Re-export
Jet(..)
,headJet
,tailJet
andjet
fromNumeric.AD
.
4.4 [2020.02.03]
-
Generalize the type of
stochasticGradientDescent
:-stochasticGradientDescent :: (Traversable f, Fractional a, Ord a) => (forall s. Reifies s Tape => f (Scalar a) -> f (Reverse s a) -> Reverse s a) -> [f (Scalar a)] -> f a -> [f a] +stochasticGradientDescent :: (Traversable f, Fractional a, Ord a) => (forall s. Reifies s Tape => e -> f (Reverse s a) -> Reverse s a) -> [e] -> f a -> [f a]
4.3.6 [2019.02.28]
- Make the test suite pass when built against
musl
libc
.
4.3.5 [2018.01.18]
- Add
Semigroup
instance forId
.
4.3.4
- Support
doctest-0.12
4.3.3
- Revamp
Setup.hs
to usecabal-doctest
. This makes it build withCabal-2.0
, and makes thedoctest
s work withcabal new-build
and sandboxes.
4.3.2.1
- GHC 8 support
- Fix Kahn mode’s
**
implementation - Fix multiple problems in Erf and InvErf methods
4.3.2
- Added
NoEq
versions of several combinators that can be used whenEq
isn’t available on the numeric type involved.
4.3.1
- Further improvements have been made in the performance of
Sparse
mode, at least asymptotically, when used on functions with many variables. Since this is the target use-case forSparse
in the first place, this seems like a good trade-off. Note: this results in an API change, but only in the API of anInternal
module, so this is treated as a minor version bump.
4.3
- Made drastic improvements in the performance of
Tower
andSparse
modes thanks to the help of Björn von Sydow. - Added constrained convex optimization.
- Incorporated some suggestions from herbie for improving floating point accuracy.
4.2.4
- Added
Newton.Double
modules for performance.
4.2.3
reflection
2 support
4.2.2
- Major bug fix for
grads
,jacobians
, and anything that usesSparse
mode inNumeric.AD
. Derivatives after the first two were previously incorrect.
4.2.1.1
- Support
nats
version 1
4.2.1
- Added
stochasticGradientDescent
.
4.2
- Removed broken
Directed
mode. - Added
Numeric.AD.Rank1
combinators and moved most infinitesimal handling back out of the modes and into anAD
wrapper.
4.1
- Fixed a bug in the type of
conjugateGradientAscent
andconjugateGradientDescent
that prevent users from being able to ever call it.
4.0.0.1
- Added the missing
instances.h
header file toextra-source-files
.
4.0
- An overhaul permitting monomorphic modes was completed by @alang9.
- Add a
ForwardDouble
monomorphic mode
3.4
- Added support for
erf
andinverf
, etc. fromData.Number.Erf
. - Split the infinitesimal and mode into two separate parameters to facilitate inlining and easier extension of the API.
3.3.1
- Build system improvements
- Removed unused LANGUAGE pragmas
- Added HLint configuration
- We now use exactly the same versions of the packages used to build
ad
when running the doctests.
3.3
- Renamed
Reverse
toKahn
andWengert
toReverse
. We use Arthur Kahn’s topological sorting algorithm to sort the tape after the fact in Kahn mode, while the stock Reverse mode builds a Wengert list as it goes, which is more efficient in practice.
3.2.2
- Export of the
conjugateGradientDescent
andgradientDescent
fromNumeric.AD
3.2.1
conjugateGradientDescent
now stops before it starts returning NaN results.
3.2
- Renamed
Chain
toWengert
to reflect its use of Wengert lists for reverse mode. - Renamed
lift
toauto
to avoid conflict with the more prevalenttransformers
library. - Fixed a bug in
Numeric.AD.Forward.gradWith'
, which caused it to return the wrong value for the primal.
3.1.4
- Added a better “convergence” test for
findZero
- Compute
tan
andtanh
derivatives directly.
3.1.3
- Added
conjugateGradientDescent
andconjugateGradientAscent
toNumeric.AD.Newton
.
3.1.2
- Dependency bump
3.1
- Added
Chain
mode, which isReverse
using a linear tape that doesn’t need to be sorted. - Added a suite of doctests.
- Bug fix in
Forward
mode. It was previously yielding incorrect results for anything that usedbind
orbind'
internally.
3.0
- Moved the contents of
Numeric.AD.Mode.Mixed
intoNumeric.AD
- Split off
Numeric.AD.Variadic
for the variadic combinators - Removed the
UU
,FU
,UF
, andFF
type aliases. - Stopped exporting the types for
Mode
andAD
from almost every module. ImportNumeric.AD.Types
if necessary. - Renamed
Tensors
toJet
- Dependency bump to be compatible with ghc 7.4.1 and mtl 2.1
- More aggressive zero tracking.
diff (**n) 0
for constant n anddiff (0**)
both now yield the correct answer for all modes.