ad

Hackage Build Status

A package that provides an intuitive API for Automatic Differentiation (AD) in Haskell. Automatic differentiation provides a means to calculate the derivatives of a function while evaluating it. Unlike numerical methods based on running the program with multiple inputs or symbolic approaches, automatic differentiation typically only decreases performance by a small multiplier.

AD employs the fact that any program y = F(x) that computes one or more value does so by composing multiple primitive operations. If the (partial) derivatives of each of those operations is known, then they can be composed to derive the answer for the derivative of the entire program at a point.

This library contains at its core a single implementation that describes how to compute the partial derivatives of a wide array of primitive operations. It then exposes an API that enables a user to safely combine them using standard higher-order functions, just as you would with any other Haskell numerical type.

There are several ways to compose these individual Jacobian matrices. We hide the choice used by the API behind an explicit “Mode” type-class and universal quantification. This prevents users from confusing infinitesimals. If you want to risk infinitesimal confusion in order to get greater flexibility in how you curry, flip and generally combine the differential operators, then the Rank1.* modules are probably your cup of tea.

Features

  • Provides forward- and reverse- mode AD combinators with a common API.
  • Optional type-level “branding” is available to prevent the end user from confusing infinitesimals
  • Each mode has a separate module full of combinators, with a consistent look and feel.

Examples

You can compute derivatives of functions

Prelude Numeric.AD> diff sin 0 {- cos 0 -}
1.0

Or both the answer and the derivative of a function:

Prelude Numeric.AD> diff' (exp . log) 2
(2.0,1.0)

You can compute the derivative of a function with a constant parameter using auto:

Prelude Numeric.AD> let t = 2.0 :: Double
Prelude Numeric.AD> diff (\ x -> auto t * sin x) 0
2.0

You can use a symbolic numeric type, like the one from simple-reflect to obtain symbolic derivatives:

Prelude Debug.SimpleReflect Numeric.AD> diff atanh x
recip (1 - x * x) * 1

You can compute gradients for functions that take non-scalar values in the form of a Traversable functor full of AD variables.

Prelude Numeric.AD Debug.SimpleReflect> grad (\[x,y,z] -> x * sin (x + log y)) [x,y,z]
[ 0 + (0 + sin (x + log y) * 1 + 1 * (0 + cos (x + log y) * (0 + x * 1)))
, 0 + (0 + recip y * (0 + 1 * (0 + cos (x + log y) * (0 + x * 1))))
, 0
]

which one can simplify to:

[ sin (x + log y) + cos (x + log y) * x, recip y * cos (x + log y) * x, 0 ]

If you need multiple derivatives you can calculate them with diffs:

Prelude Numeric.AD> take 10 $ diffs sin 1
[0.8414709848078965,0.5403023058681398,-0.8414709848078965,-0.5403023058681398,0.8414709848078965,0.5403023058681398,-0.8414709848078965,-0.5403023058681398,0.8414709848078965,0.5403023058681398]

or if your function takes multiple inputs, you can use grads, which returns an ‘f-branching stream’ of derivatives, that you can inspect lazily. Somewhat more intuitive answers can be obtained by converting the stream into the polymorphically recursive Jet data type. With that we can look at a single “layer” of the answer at a time:

The answer:

Prelude Numeric.AD> headJet $ jet $  grads (\[x,y] -> exp (x * y)) [1,2]
7.38905609893065

The gradient:

Prelude Numeric.AD> headJet $ tailJet $ jet $  grads (\[x,y] -> exp (x * y)) [1,2]
[14.7781121978613,7.38905609893065]

The hessian (n * n matrix of 2nd derivatives)

Prelude Numeric.AD> headJet $ tailJet $ tailJet $ jet $  grads (\[x,y] -> exp (x * y)) [1,2]
[[29.5562243957226,22.16716829679195],[22.16716829679195,7.38905609893065]]

Or even higher order tensors of derivatives as a jet.

Prelude Numeric.AD> headJet $ tailJet $ tailJet $ tailJet $ jet $  grads (\[x,y] -> exp (x * y)) [1,2]
[[[59.1124487914452,44.3343365935839],[44.3343365935839,14.7781121978613]],[[44.3343365935839,14.7781121978613],[14.7781121978613,7.38905609893065]]]

Note the redundant values caused by the various symmetries in the tensors. The ad library is careful to compute each distinct derivative only once, lazily and to share the resulting computation.

Overview

Modules

  • Numeric.AD computes using whichever mode or combination thereof is suitable to each individual combinator. This mode is the default, re-exported by Numeric.AD
  • Numeric.AD.Mode.Forward provides basic forward-mode AD. It is good for computing simple derivatives.
  • Numeric.AD.Mode.Sparse computes a sparse forward-mode AD tower. It is good for higher derivatives or large numbers of outputs.
  • Numeric.AD.Mode.Kahn computes with reverse-mode AD. It is good for computing a few outputs given many inputs.
  • Numeric.AD.Mode.Reverse computes with reverse-mode AD. It is good for computing a few outputs given many inputs, when not using sparks heavily.
  • Numeric.AD.Mode.Tower computes a dense forward-mode AD tower useful for higher derivatives of single input functions.
  • Numeric.AD.Newton provides a number of combinators for root finding using Newton’s method with quadratic convergence.
  • Numeric.AD.Halley provides a number of combinators for root finding using Halley’s method with cubic convergence.
  • Numeric.AD.Rank1.* provides combinators for AD that are strictly rank-1. This makes it easier to flip and contort them with higher order functions at the expense of type safety when it comes to infinitsimal confusion.

Combinators

While not every mode can provide all operations, the following basic operations are supported, modified as appropriate by the suffixes below:

  • grad computes the gradient (vector of partial derivatives at a given point) of a function.
  • jacobian computes the Jacobian matrix of a function at a point.
  • diff computes the derivative of a function at a point.
  • du computes a directional derivative of a function at a point.
  • hessian computes the Hessian matrix (matrix of second partial derivatives) of a function at a point.

Combinator Suffixes

The following suffixes alter the meanings of the functions above as follows:

  • ' also return the answer
  • With lets the user supply a function to blend the input with the output
  • F is a version of the base function lifted to return a Traversable (or Functor) result
  • s means the function returns all higher derivatives in a list or f-branching Stream
  • T means the result is transposed with respect to the traditional formulation (usually to avoid paying for transposing back)
  • 0 means that the resulting derivative list is padded with 0s at the end.
  • NoEq means that an infinite list of converging values is returned rather than truncating the list when they become constant

Contact Information

Contributions and bug reports are welcome!

Please feel free to contact me through github or on the #haskell IRC channel on irc.freenode.net.

-Edward Kmett

Changes

4.3.3

  • Revamp Setup.hs to use cabal-doctest. This makes it build with Cabal-2.0, and makes the doctests work with cabal new-build and sandboxes.

4.3.2.1

  • GHC 8 support
  • Fix Kahn mode’s ** implementation
  • Fix multiple problems in Erf and InvErf methods

4.3.2

  • Added NoEq versions of several combinators that can be used when Eq isn’t available on the numeric type involved.

4.3.1

  • Further improvements have been made in the performance of Sparse mode, at least asymptotically, when used on functions with many variables. Since this is the target use-case for Sparse in the first place, this seems like a good trade-off. Note: this results in an API change, but only in the API of an Internal module, so this is treated as a minor version bump.

4.3

  • Made drastic improvements in the performance of Tower and Sparse modes thanks to the help of Björn von Sydow.
  • Added constrained convex optimization.
  • Incorporated some suggestions from herbie for improving floating point accuracy.

4.2.4

  • Added Newton.Double modules for performance.

4.2.3

  • reflection 2 support

4.2.2

  • Major bug fix for grads, jacobians, and anything that uses Sparse mode in Numeric.AD. Derivatives after the first two were previously incorrect.

4.2.1.1

  • Support nats version 1

4.2.1

  • Added stochasticGradientDescent.

4.2

  • Removed broken Directed mode.
  • Added Numeric.AD.Rank1 combinators and moved most infinitesimal handling back out of the modes and into an AD wrapper.

4.1

  • Fixed a bug in the type of conjugateGradientAscent and conjugateGradientDescent that prevent users from being able to ever call it.

4.0.0.1

  • Added the missing instances.h header file to extra-source-files.

4.0

  • An overhaul permitting monomorphic modes was completed by @alang9.
  • Add a ForwardDouble monomorphic mode

3.4

  • Added support for erf and inverf, etc. from Data.Number.Erf.
  • Split the infinitesimal and mode into two separate parameters to facilitate inlining and easier extension of the API.

3.3.1

  • Build system improvements
  • Removed unused LANGUAGE pragmas
  • Added HLint configuration
  • We now use exactly the same versions of the packages used to build ad when running the doctests.

3.3

  • Renamed Reverse to Kahn and Wengert to Reverse. We use Arthur Kahn’s topological sorting algorithm to sort the tape after the fact in Kahn mode, while the stock Reverse mode builds a Wengert list as it goes, which is more efficient in practice.

3.2.2

  • Export of the conjugateGradientDescent and gradientDescent from Numeric.AD

3.2.1

  • conjugateGradientDescent now stops before it starts returning NaN results.

3.2

  • Renamed Chain to Wengert to reflect its use of Wengert lists for reverse mode.
  • Renamed lift to auto to avoid conflict with the more prevalent transformers library.
  • Fixed a bug in Numeric.AD.Forward.gradWith', which caused it to return the wrong value for the primal.

3.1.4

  • Added a better “convergence” test for findZero
  • Compute tan and tanh derivatives directly.

3.1.3

  • Added conjugateGradientDescent and conjugateGradientAscent to Numeric.AD.Newton.

3.1.2

  • Dependency bump

3.1

  • Added Chain mode, which is Reverse using a linear tape that doesn’t need to be sorted.
  • Added a suite of doctests.
  • Bug fix in Forward mode. It was previously yielding incorrect results for anything that used bind or bind' internally.

3.0

  • Moved the contents of Numeric.AD.Mode.Mixed into Numeric.AD
  • Split off Numeric.AD.Variadic for the variadic combinators
  • Removed the UU, FU, UF, and FF type aliases.
  • Stopped exporting the types for Mode and AD from almost every module. Import Numeric.AD.Types if necessary.
  • Renamed Tensors to Jet
  • Dependency bump to be compatible with ghc 7.4.1 and mtl 2.1
  • More aggressive zero tracking.
  • diff (**n) 0 for constant n and diff (0**) both now yield the correct answer for all modes.