A newtype over `Bool` with a better `Vector` instance.

The `vector` package represents unboxed arrays of `Bool`s
spending 1 byte (8 bits) per boolean.
This library provides a newtype wrapper
`Bit` and a custom instance of unboxed
`Vector`, which packs bits densely,
achieving an 8x smaller memory footprint.
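To see where the 8x saving comes from, here is a minimal, base-only sketch of dense bit packing: a flat bit index is split into a word index and a bit offset within a 64-bit word. The names (`bitCoords`, `setPacked`, `readPacked`) are illustrative and are not part of this library's API.

```haskell
import Data.Bits (setBit, testBit)
import Data.Word (Word64)

-- Map a flat bit index to (word index, bit offset) within a packed store.
bitCoords :: Int -> (Int, Int)
bitCoords i = i `divMod` 64

-- Set bit i in a list of 64-bit words (an immutable sketch of a packed store).
setPacked :: [Word64] -> Int -> [Word64]
setPacked ws i =
  let (w, b) = bitCoords i
  in [ if k == w then setBit x b else x | (k, x) <- zip [0 ..] ws ]

-- Read bit i back out of the packed store.
readPacked :: [Word64] -> Int -> Bool
readPacked ws i = let (w, b) = bitCoords i in testBit (ws !! w) b
```

Two `Word64`s here hold 128 booleans in 16 bytes, where `Vector Bool` would spend 128 bytes.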
The performance stays mostly the same;
the most significant degradation happens for random writes
(up to 10% slower).
On the other hand, for certain bulk bit operations
`Vector Bit` is up to 64x faster than `Vector Bool`.

- `Data.Bit` is faster, but writes and flips are thread-unsafe. This is because naive updates are not atomic: they read the whole word from memory, modify a bit, then write the whole word back.
- `Data.Bit.ThreadSafe` is slower (up to 20%), but writes and flips are thread-safe.
- The `array` package is memory-efficient for `Bool`, but lacks a handy `Vector` interface and is not thread-safe.
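The non-atomicity issue can be made concrete with a base-only sketch: a single-bit write is really a word-level read-modify-write, so two threads updating different bits of the same word can silently lose one update. `writeBitInWord` is a hypothetical helper for illustration, not this library's code.

```haskell
import Data.Bits (clearBit, setBit)
import Data.Word (Word64)

-- A single-bit write desugars to: read the whole word, flip one bit,
-- write the whole word back.
writeBitInWord :: Word64 -> Int -> Bool -> Word64
writeBitInWord w i True  = setBit w i
writeBitInWord w i False = clearBit w i

-- Suppose the shared word is 0 and two threads both read it before
-- either writes:
--   thread A computes writeBitInWord 0 3 True  -- 8
--   thread B computes writeBitInWord 0 5 True  -- 32
-- Whichever thread stores last overwrites the other's result, so the
-- word ends up as 8 or 32 instead of 40. Data.Bit.ThreadSafe avoids
-- this by performing atomic updates.
```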
Consider the following (very naive) implementation of
the sieve of Eratosthenes. It returns a vector with
`True` at prime indices and
`False` at composite indices.
```haskell
import Control.Monad
import Control.Monad.ST
import qualified Data.Vector.Unboxed as U
import qualified Data.Vector.Unboxed.Mutable as MU

eratosthenes :: U.Vector Bool
eratosthenes = runST $ do
  let len = 100
  sieve <- MU.replicate len True
  MU.write sieve 0 False
  MU.write sieve 1 False
  forM_ [2 .. floor (sqrt (fromIntegral len))] $ \p -> do
    isPrime <- MU.read sieve p
    when isPrime $
      forM_ [2 * p, 3 * p .. len - 1] $ \i ->
        MU.write sieve i False
  U.unsafeFreeze sieve
```
We can switch from `Bool` to
`Bit` just by adding newtype constructors:
```haskell
import Data.Bit
import Control.Monad
import Control.Monad.ST
import qualified Data.Vector.Unboxed as U
import qualified Data.Vector.Unboxed.Mutable as MU

eratosthenes :: U.Vector Bit
eratosthenes = runST $ do
  let len = 100
  sieve <- MU.replicate len (Bit True)
  MU.write sieve 0 (Bit False)
  MU.write sieve 1 (Bit False)
  forM_ [2 .. floor (sqrt (fromIntegral len))] $ \p -> do
    Bit isPrime <- MU.read sieve p
    when isPrime $
      forM_ [2 * p, 3 * p .. len - 1] $ \i ->
        MU.write sieve i (Bit False)
  U.unsafeFreeze sieve
```
The `Bit`-based implementation requires 8x less memory to store
the vector. For large sizes it lets you crunch more data in RAM
without swapping. For smaller arrays it helps to fit into CPU caches.

```
> listBits eratosthenes
[2,3,5,7,11,13,17,19,23,29,31,37,41,43,47,53,59,61,67,71,73,79,83,89,97]
```
There are several high-level helpers, digesting bits in bulk,
which makes them up to 64x faster than their respective counterparts for
`Vector Bool`. One can query the population count (popcount)
of a vector (giving us the prime-counting function):

```
> countBits eratosthenes
25
```
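The bulk speedup has a simple explanation: on a packed representation, counting bits reduces to one hardware popcount per 64-bit word instead of 64 separate `Bool` reads. A base-only sketch of the idea (the name `countPacked` is made up for illustration, not this library's internals):

```haskell
import Data.Bits (popCount)
import Data.Word (Word64)

-- Count set bits across a packed store: one popCount instruction per word
-- replaces a loop over 64 individual booleans.
countPacked :: [Word64] -> Int
countPacked = sum . map popCount
```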
And vice versa, query the index of the n-th set bit (which corresponds to the n-th prime number here):

```
> nthBitIndex (Bit True) 10 eratosthenes
Just 29
```
One may notice that the order of the inner traversal by `i`
does not matter and be tempted to run it in several parallel threads.
In this case it is vital to switch from `Data.Bit` to `Data.Bit.ThreadSafe`,
because the former is thread-unsafe with regard to writes.
There is a moderate performance penalty (up to 20%)
for using the thread-safe interface.
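A sketch of what such a parallel variant might look like, splitting each inner loop between two workers. The chunking and the `MVar`-based finish barrier are illustrative choices, not a recommendation; the essential point is importing `Data.Bit.ThreadSafe` instead of `Data.Bit` so that concurrent single-bit writes are safe.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import Control.Monad (forM_, replicateM_, when)
import Data.Bit.ThreadSafe -- NB: not Data.Bit
import qualified Data.Vector.Unboxed as U
import qualified Data.Vector.Unboxed.Mutable as MU

parEratosthenes :: IO (U.Vector Bit)
parEratosthenes = do
  let len = 100
  sieve <- MU.replicate len (Bit True)
  MU.write sieve 0 (Bit False)
  MU.write sieve 1 (Bit False)
  forM_ [2 .. floor (sqrt (fromIntegral len :: Double))] $ \p -> do
    Bit isPrime <- MU.read sieve p
    when isPrime $ do
      -- Split the multiples of p between two workers; the order of these
      -- writes does not matter, and Data.Bit.ThreadSafe makes them atomic.
      let is = [2 * p, 3 * p .. len - 1]
          (front, back) = splitAt (length is `div` 2) is
      done <- newEmptyMVar
      forM_ [front, back] $ \chunk -> forkIO $ do
        forM_ chunk $ \i -> MU.write sieve i (Bit False)
        putMVar done ()
      -- Wait for both workers before moving to the next p.
      replicateM_ 2 (takeMVar done)
  U.unsafeFreeze sieve
```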
- Performance improvements.
- Redesign the API from scratch.
- Add a thread-safe implementation.
- Add ‘nthBitIndex’ function.
- Fix ‘Read’ instance.
- Remove hand-written ‘Num’, ‘Real’, ‘Integral’, ‘Bits’ instances.
- Derive ‘Bits’ and ‘FiniteBits’ instances.
- Expose ‘Bit’ constructor directly and remove ‘fromBool’ function.
- Rename ‘toBool’ to ‘unBit’.
- Fix bugs in
- Speed up