backprop
Heterogeneous automatic differentiation (backpropagation)
https://github.com/mstksg/backprop#readme
- Version on this page: 0.1.5.2
- LTS Haskell 22.34: 0.2.6.5
- Stackage Nightly 2024-09-10: 0.2.6.5
- Latest on Hackage: 0.2.6.5
backprop
Automatic heterogeneous back-propagation.
Write your functions to compute your result, and the library will automatically generate functions to compute your gradient.
Differs from ad by offering full heterogeneity – each intermediate step and the resulting value can have different types. Mostly intended for usage with gradient descent and other numeric optimization techniques.
The library is up on Hackage (with 100% documentation coverage), but more up-to-date documentation is rendered on GitHub Pages!
MNIST Digit Classifier Example
My blog post introduces the concepts in this library in the context of training a handwritten digit classifier. I recommend reading that first.
There are also some literate Haskell examples in the source (rendered as a PDF here), which can be built (if stack is installed) using:
```bash
$ ./Build.hs exe
```
There is a follow-up tutorial on using the library with more advanced types, with extensible neural networks a la this blog post, available as literate Haskell and also rendered as a PDF.
Brief example
(This is a really brief version of my blog post)
The quick example below describes the running of a neural network with one hidden layer to calculate its squared error with respect to target `targ`, which is parameterized by two weight matrices and two bias vectors.
Vector/matrix types are from the hmatrix package.
Let’s make a data type to store our parameters, with convenient accessors using lens:
```haskell
data Network i h o = Net { _weight1 :: L h i
                         , _bias1   :: R h
                         , _weight2 :: L o h
                         , _bias2   :: R o
                         }

makeLenses ''Network
```
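(For reference, this is roughly the preamble the snippets here assume. It is a sketch rather than part of the original write-up: the exact extensions, module names, and choice of lens vs. microlens may differ in your setup, and the backprop-aware hmatrix operations used below come from a separate hmatrix-specific module that is not shown.)

```haskell
{-# LANGUAGE DataKinds       #-}
{-# LANGUAGE TemplateHaskell #-}

import           Lens.Micro.TH                (makeLenses)   -- or Control.Lens
import           Numeric.Backprop                            -- BVar, evalBP, gradBP, constVar, ...
import           Numeric.LinearAlgebra.Static (L, R, (#>))   -- statically-sized hmatrix types
```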
Normally, we might write code to “run” a neural network on an input like this:
```haskell
neuralNet
    :: R i
    -> Network i h o
    -> R o
neuralNet x n = z
  where
    y = logistic $ (n ^. weight1) #> x + (n ^. bias1)
    z = logistic $ (n ^. weight2) #> y + (n ^. bias2)
    logistic :: Floating a => a -> a
    logistic x = 1 / (1 + exp (-x))
```
(`R i` is an i-length vector, `L h i` is an h-by-i matrix, etc., `#>` is matrix-vector multiplication, and `^.` is access to a field via lens.)
When given an input vector and the network, we compute the result of the neural network run on the input vector.
We can write it, instead, using backprop:
```haskell
neuralNet
    :: Reifies s W
    => BVar s (R i)
    -> BVar s (Network i h o)
    -> BVar s (R o)
neuralNet x n = z
  where
    y = logistic $ (n ^^. weight1) #>! x + (n ^^. bias1)
    z = logistic $ (n ^^. weight2) #>! y + (n ^^. bias2)
    logistic :: Floating a => a -> a
    logistic x = 1 / (1 + exp (-x))
```
(`#>!` is a backprop-aware version of `#>`, and `^^.` is access to a field via lens in a `BVar`.)
And that’s it! `neuralNet` is now backpropagatable!

We can “run” it using `evalBP`:

```haskell
evalBP (neuralNet (constVar x)) :: Network i h o -> R o
```

And we can find the gradient using `gradBP`:

```haskell
gradBP (neuralNet (constVar x)) :: Network i h o -> Network i h o
```
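For instance, a hypothetical usage sketch (not from the original write-up): the sizes 784/300/10 and the values `x` and `net0` are made up for illustration, and `KnownNat`/`Num` constraints are elided here just as in the snippets above.

```haskell
-- Assumed in scope: x :: R 784, net0 :: Network 784 300 10 (illustrative only).

-- Run the network forward on the concrete input...
output :: R 10
output = evalBP (neuralNet (constVar x)) net0

-- ...and ask for a gradient with respect to the network parameters.
grads :: Network 784 300 10
grads = gradBP (neuralNet (constVar x)) net0
```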
If we write a function to compute errors:
```haskell
netError
    :: Reifies s W
    => BVar s (R i)
    -> BVar s (R o)
    -> BVar s (Network i h o)
    -> BVar s Double
netError x targ n = norm_2 (neuralNet x n - targ)
```
(`norm_2` is a backprop-aware euclidean norm.)
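As an aside not in the original write-up: the library also exports a `backprop` function that computes the result and the gradient together in one pass. A minimal sketch of pairing it with `netError`, with constraints elided as above:

```haskell
-- Get the current error and its gradient at the same time.
errorAndGrad :: R i -> R o -> Network i h o -> (Double, Network i h o)
errorAndGrad x targ = backprop (netError (constVar x) (constVar targ))
```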
Now, we can perform gradient descent!
```haskell
gradDescent
    :: R i
    -> R o
    -> Network i h o
    -> Network i h o
gradDescent x targ n0 = n0 - 0.1 * gradient
  where
    gradient = gradBP (netError (constVar x) (constVar targ)) n0
```
Ta dah! We were able to compute the gradient of our error function just by saying how to compute the error itself.
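One might then fold `gradDescent` over a list of samples for a simple training loop. Here is a small sketch; `trainList` is a hypothetical helper, not part of the library, and constraints are again elided:

```haskell
import Data.List (foldl')

-- One pass of gradient descent over a list of (input, target) samples,
-- threading the updated network through a strict left fold.
trainList :: [(R i, R o)] -> Network i h o -> Network i h o
trainList samples n0 = foldl' (\n (x, targ) -> gradDescent x targ n) n0 samples
```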
For a more fleshed-out example, see my blog post and the MNIST tutorial (also rendered as a PDF).
Lens Access
A lot of the friction of dealing with `BVar s a`s instead of `a`s directly is alleviated with the lens interface.

With a lens, you can “view” and “set” items inside a `BVar`, as if they were the actual values:
```haskell
(^.)  ::        a -> Lens' a b ->        b
(^^.) :: BVar s a -> Lens' a b -> BVar s b

(.~)  :: Lens' a b ->        b ->        a ->        a
(.~~) :: Lens' a b -> BVar s b -> BVar s a -> BVar s a
```
And you can also extract multiple potential targets, as well, using `Traversal`s and `Prism`s:
```haskell
-- | Actually takes a Traversal, to be more general.
-- Can be used to implement "pattern matching" on BVars
(^?)  ::        a -> Prism' a b -> Maybe (       b)
(^^?) :: BVar s a -> Prism' a b -> Maybe (BVar s b)

(^..)  ::        a -> Traversal' a b -> [       b]
(^^..) :: BVar s a -> Traversal' a b -> [BVar s b]
```
Note that the library itself has no lens dependency, using microlens instead.
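As a tiny illustration (my own sketch, using the `Network` type and lenses from the example above, and eliding `KnownNat`/`Num` constraints as the rest of this README does), viewing and setting a field of a network inside a `BVar` looks just like the pure versions, with `^^.` and `.~~` in place of `^.` and `.~`:

```haskell
-- Pull the hidden-layer bias out of a network inside a BVar; gradients
-- flowing into the result are routed back into that field.
hiddenBias :: Reifies s W => BVar s (Network i h o) -> BVar s (R h)
hiddenBias n = n ^^. bias1

-- Overwrite the same field with a new BVar.
setHiddenBias
    :: Reifies s W
    => BVar s (R h)
    -> BVar s (Network i h o)
    -> BVar s (Network i h o)
setHiddenBias b = bias1 .~~ b
```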
Benchmarks
Here are some basic benchmarks comparing the library’s automatic differentiation process to “manual” differentiation by hand. When using the MNIST tutorial as an example:
- For computing the gradient, there is about a 2.5ms overhead (or about 3.5x) compared to computing the gradients by hand. Some more profiling and investigation can be done, since there are two main sources of potential slow-downs:

    - “Inefficient” gradient computations, because of automated differentiation not being as efficient as what you might get from doing things by hand and simplifying. This sort of cost is probably not avoidable.
    - Overhead incurred by the book-keeping and actual automatic differentiating system, which involves keeping track of a dependency graph and propagating gradients backwards in memory. This sort of overhead is what we would be aiming to reduce.

    It is unclear which one dominates the current slowdown.

- However, it may be worth noting that this isn’t necessarily a significant bottleneck. Updating the networks using hmatrix actually dominates the runtime of the training. Manual gradient descent takes 3.2ms, so the extra overhead is about 60%-70%.

- Running the network (and the backprop-aware functions) incurs virtually zero overhead (about 4%), meaning that library authors could actually export backprop-aware functions by default and not lose any performance.
Todo
- Benchmark against competing back-propagation libraries like ad, and auto-differentiating tensor libraries like grenade

- Write tests!

- Explore potentially ditching `Num` for another typeclass that only has `+`, `0`, and `1`. Currently, `Num` is required for all backpropagated types, but only `+`, `fromInteger 0`, and `fromInteger 1` are ever used.

    The main upside to using `Num` is that it integrates well with the rest of the Haskell ecosystem, and many things already have useful `Num` instances.

    There are two downsides – one minor and one major.

    - It requires more work to make a type backpropagatable. Instead of writing only `+`, `0` and `1`, users must also define `*`, `-` or `negate`, `abs`, `signum`, and all of `fromInteger`. However, I don’t see this being a big issue in practice, since most values that will be used with backprop would presumably also benefit from having a full `Num` instance even without the need to backprop.

    - Automatically generated prisms (used with `^^?`) work with tuples, and so cannot work out-of-the-box without a `Num` instance for tuples. In addition, it’s often useful to have anonymous products and tuples in general.

        This is bandaided-over by having backprop provide canonical tuple-with-`Num` types for different libraries to use, but it’s not a perfect solution.

        This can be resolved by using the orphan instances in the NumInstances package. Still, there might be some headache for application developers if different libraries using backprop accidentally pull in their orphan instances from different places.

        Alternatively, one day we can get `Num` instances for tuples into base!

    The extra complexity that would come from adding a custom typeclass just for `+`/`0`/`1`, though, I feel, might not be worth the benefit. The entire numeric Haskell ecosystem, at the time, revolves around `Num`.

    However, it is worth noting that it wouldn’t be too hard to add “Additive Typeclass” instances for any custom types – one would just need to define `(<+>) = (+)`, `zero = fromInteger 0`, and `one = fromInteger 1` (a three-liner; see the sketch after this list), so it might not be too bad.

    But really, a lot of this would all resolve itself if we got `Num` instances for tuples in base :)

- Explore opportunities for parallelization. There are some naive ways of directly parallelizing right now, but potential overhead should be investigated.

- Some open questions:

    a. Is it possible to support constructors with existential types?
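To make that last point concrete, here is a sketch of what such a minimal class and the “three-liner” could look like. The names `Additive`, `<+>`, `zero`, and `one` are purely illustrative and not an actual backprop API:

```haskell
-- A hypothetical minimal class carrying only what backprop actually needs.
class Additive a where
    (<+>) :: a -> a -> a
    zero  :: a
    one   :: a

-- The “three-liner” for any type that already has a Num instance, written
-- against a newtype here just to avoid a blanket instance in the sketch.
newtype ViaNum a = ViaNum a

instance Num a => Additive (ViaNum a) where
    ViaNum x <+> ViaNum y = ViaNum (x + y)
    zero = ViaNum (fromInteger 0)
    one  = ViaNum (fromInteger 1)
```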
Changelog
Version 0.1.5.2
Apr 26, 2018
https://github.com/mstksg/backprop/releases/tag/v0.1.5.2
- Added `coerceVar` to Numeric.Backprop
- Added `Random` instances for all tuple types. Same as for `Binary`, this does incur a random and time dependency only from the tuple types. Again, because these packages are a part of GHC’s boot libraries, this is hopefully not too bad.
Version 0.1.5.1
Apr 9, 2018
https://github.com/mstksg/backprop/releases/tag/v0.1.5.1
- Fixed `NFData` instance for `T`; before, was shallow.
- Added `Typeable` instances for all tuple types, and for `BVar`.
- Added `Eq`, `Ord`, `Show`, etc. instances for `T`.
- Added `Binary` instances for all tuple types. Note that this does incur a binary dependency only because of the tuple types; however, this will hopefully be not too much of an issue because binary is a GHC library anyway.
Version 0.1.5.0
Mar 30, 2018
https://github.com/mstksg/backprop/releases/tag/v0.1.5.0
- `T` added to Numeric.Backprop.Tuple: basically an `HList` with a `Num` instance.
- `Eq` and `Ord` instances for `BVar`. Is this sound?
Internal
- Refactored `Monoid` instances in Numeric.Backprop.Tuple
Version 0.1.4.0
Mar 25, 2018
https://github.com/mstksg/backprop/releases/tag/v0.1.4.0
- `isoVar`, `isoVar2`, `isoVar3`, and `isoVarN`: convenient aliases for applying isomorphisms to `BVar`s. Helpful for use with constructors and deconstructors.
- `opIso2` and `opIso3` added to Numeric.Backprop.Op, for convenience.
- `T0` (Unit with numeric instances) added to Numeric.Backprop.Tuple.
Internal
- Completely decoupled the internal implementation from `Num`, which showed some performance benefits. Mostly just to make the code slightly cleaner, and to prepare for some day potentially decoupling the external API from `Num` as well.
Version 0.1.3.0
Feb 12, 2018
https://github.com/mstksg/backprop/releases/tag/v0.1.3.0
- Prelude.Backprop module added with lifted versions of several Prelude and base functions.
- `liftOpX` family of operators now have a more logical ordering for type variables. This change breaks backwards-compatibility.
- `opIsoN` added to export list of Numeric.Backprop
- `noGrad` and `noGrad1` added to Numeric.Backprop.Op, for functions with no defined gradient.
Internal
- Completely decoupled the internal implementation from `Num`, which showed some performance benefits.
Version 0.1.2.0
Feb 7, 2018
https://github.com/mstksg/backprop/releases/tag/v0.1.2.0
- Added currying and uncurrying functions for tuples in Numeric.Backprop.Tuple.
- `opIsoN`, for isomorphisms between a tuple of values and a value.
- (Internal) AD engine now using `Any` from ghc-prim instead of `Some I` from type-combinators
Version 0.1.1.0
Feb 6, 2018
https://github.com/mstksg/backprop/releases/tag/v0.1.1.0
- Added canonical strict tuple types with `Num` instances, in the module Numeric.Backprop.Tuple. This is meant to be a band-aid for the problem of orphan instances and potential mismatched tuple types.
- Fixed bug in `collectVar` that occurs if container sizes change
Internal
- Internal tweaks to the underlying automatic differentiation types that decouple backpropagation from `Num`, internally. `Num` is now just used externally as a part of the API, which might someday be made optional.
Version 0.1.0.0
Feb 5, 2018
https://github.com/mstksg/backprop/releases/tag/v0.1.0.0
- First non-alpha release.
- More or less complete redesign of library. The entire API is completely changed, and there is no backwards compatibility!
    - Everything is now “implicit” style, and there is no more `BP` monad.
    - Accessing items in `BVar`s is now lens-, prism-, and traversal-based, instead of iso- and generics-based.
    - `Op` is no longer monadic
    - Mono modules are removed.
    - Implicit modules are removed, since they are the default
    - Iso module is removed, since `Iso`s no longer play a major role in the implementation of the library.
- Removed dependency on ad and ad-based ops, which had been pulling in the vast majority of dependencies.
- Moved from .cabal file to hpack system.
Version 0.0.3.0
Alpha
https://github.com/mstksg/backprop/releases/tag/v0.0.3.0
- Removed samples as registered executables in the cabal file, to reduce dependencies to a bare minimum. For convenience, the build script now also compiles the samples into the local directory if stack is installed.
- Added experimental (unsafe) combinators for working with GADTs with existential types, `withGADT`, to Numeric.Backprop module.
- Fixed broken links in changelog.
Version 0.0.2.0
Alpha
https://github.com/mstksg/backprop/releases/tag/v0.0.2.0
- Added optimized numeric `Op`s, and re-wrote `Num`/`Fractional`/`Floating` instances in terms of them.
- Removed all traces of `Summer`/`Unity` from the library, eliminating a whole swath of “explicit-Summer”/“explicit-Unity” versions of functions. As a consequence, the library now only works with `Num` instances. The API, however, is now much simpler.
- Benchmark suite added for MNIST example.
Version 0.0.1.0
Alpha
https://github.com/mstksg/backprop/releases/tag/v0.0.1.0
- Initial pre-release, as a request for comments. API is in a usable form and everything is fully documented, but there are definitely some things left to be done. (See README.md)