Perfect minimal hashing implementation in native Haskell
|LTS Haskell 20.10:||1.0.0|
|Stackage Nightly 2023-02-06:||1.0.0|
|Latest on Hackage:||1.0.0|
A perfect hash function for a set
S is a hash function that maps distinct elements in
S to a set of integers, with no collisions. A minimal perfect hash function is a perfect hash function that maps
n keys to
n consecutive integers, e.g. the numbers from
In contrast with the PerfectHash package, which is a binding to a C-based library, this package is a fully-native Haskell implementation.
It is intended primarily for generating C code for embedded applications (compare to
gperf). The output of this tool is a pair of arrays that can be included in generated C code for allocation-free hash tables.
Although this data structure could conceivably be used directly in Haskell applications as a read-only hash table, this is not recommended, as lookups are about 10x slower than HashMap.
This implementation was adapted from Steve Hanov's Blog.
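To make the "pair of arrays" idea concrete, here is a minimal sketch of a two-step lookup in the style of Hanov's post. It is not this package's API: the `Table` type, its field names, `lookupSlot`, and the toy hash are all hypothetical, and the table is built by hand for three keys. The package itself hashes with FNV-1a, and its internal representation may differ.

```haskell
-- Hypothetical two-array table in the style of Hanov's post; the names
-- below are illustrative and are not this package's API.
data Table v = Table
  { seeds  :: [Int]  -- intermediate array: one entry per first-level bucket
  , values :: [v]    -- values stored at their final slots
  }

-- Toy seeded hash, for illustration only (the package uses FNV-1a).
toyHash :: Int -> Int -> Int
toyHash d k = k `div` (d + 1)

-- Two-step lookup: a negative seed encodes a direct slot, while a
-- non-negative seed is fed back into the hash to find the slot.
lookupSlot :: Table v -> Int -> v
lookupSlot t k
  | d < 0     = values t !! (negate d - 1)
  | otherwise = values t !! (toyHash d k `mod` n)
  where
    n = length (values t)
    d = seeds t !! (toyHash 0 k `mod` n)

-- Hand-built table for the keys 10, 13 and 20.
-- Keys 10 and 13 collide at the first level and are separated by seed 1;
-- key 20 is alone in its bucket and is given a direct slot (-2 means slot 1).
demoTable :: Table String
demoTable = Table { seeds = [0, 1, -2], values = ["thirteen", "twenty", "ten"] }
```

Because the keys themselves are not stored in this sketch, looking up a key outside the original set still returns some value. Code that may receive unknown keys would need to store the keys as well and compare on lookup.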
The library is written generically to hash both strings and raw integers according to the FNV-1a algorithm. Integers are split by octets before hashing.
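The hashing step can be sketched as follows. This is the standard 32-bit FNV-1a with its published offset basis and prime, written independently of the package. The function names, the least-significant-first octet order, and the ASCII assumption for strings are illustrative choices, and the package's own seeding and byte order may differ.

```haskell
import Data.Bits (shiftR, xor, (.&.))
import Data.Char (ord)
import Data.List (foldl')
import Data.Word (Word32, Word8)

-- Standard 32-bit FNV-1a parameters.
fnvOffsetBasis, fnvPrime :: Word32
fnvOffsetBasis = 0x811c9dc5
fnvPrime       = 0x01000193

-- FNV-1a: XOR each octet into the state, then multiply by the prime.
-- Word32 arithmetic wraps modulo 2^32. The left fold consumes octets
-- in order, which matters because the hash is order-sensitive.
fnv1a :: [Word8] -> Word32
fnv1a = foldl' step fnvOffsetBasis
  where
    step h b = (h `xor` fromIntegral b) * fnvPrime

-- Split a non-negative integer into octets, least significant first.
-- The byte order is an assumption, not taken from the package.
-- Negative inputs are not handled by this sketch.
intOctets :: Int -> [Word8]
intOctets 0 = [0]
intOctets n = go n
  where
    go 0 = []
    go m = fromIntegral (m .&. 0xff) : go (m `shiftR` 8)

-- Octets of a string, assuming ASCII input.
stringOctets :: String -> [Word8]
stringOctets = map (fromIntegral . ord)
```

With these definitions, `fnv1a (stringOctets "a")` evaluates to `0xe40c292c`, which matches the published FNV-1a test vector, and `intOctets 1000` yields `[232, 3]`.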
    import Data.PerfectHash.Construction (createMinimalPerfectHash)
    import qualified Data.Map as Map

    tuples =
      [ (1000, 1)
      , (5555, 2)
      , (9876, 3)
      ]

    lookup_table = createMinimalPerfectHash $ Map.fromList tuples
Generation of C code based on the arrays in
lookup_table is left as an exercise to the reader. Algorithm documentation in the
Data.PerfectHash.Lookup modules will be helpful.
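As a starting point for that exercise, one possible approach is a small helper that renders a list of integers as a C array definition. `renderCArray` is a hypothetical helper, not part of this package, and the sketch assumes the caller has already extracted each array as a plain `[Int]`. How to do that depends on the package's own types and is not shown here.

```haskell
import Data.List (intercalate)

-- Render a list of integers as a C array definition.
-- The element type and array name are supplied by the caller; the
-- values used in the usage note below are illustrative only.
renderCArray :: String -> String -> [Int] -> String
renderCArray cType name xs =
  "static const " ++ cType ++ " " ++ name
    ++ "[" ++ show (length xs) ++ "] = { "
    ++ intercalate ", " (map show xs)
    ++ " };"
```

For example, `renderCArray "int32_t" "seeds" [0, 1, -2]` produces the string `static const int32_t seeds[3] = { 0, 1, -2 };`. A signed element type is used in that example because negative entries may be needed, depending on how the intermediate array encodes direct slots.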
See hash-perfectly-ints-demo, as well as the test suite, for working examples.
    $ stack build
    $ stack exec hash-perfectly-strings-demo
0.2.0.1 (Feb. 2018)
- Fixed a foldr vs. foldl bug with algorithmic implications
1.0.0 (June 2022)
- Changed input type from
- Removed superfluous internal map lookups by threading values alongside keys throughout the algorithm
- Used newtypes internally for algorithmic clarity