construct
Haskell version of the Construct library for easy specification of file formats
Version on this page: | 0.3.1.1 |
LTS Haskell 22.36: | 0.3.1.2 |
Stackage Nightly 2024-10-05: | 0.3.1.2 |
Latest on Hackage: | 0.3.1.2 |
construct-0.3.1.1@sha256:dc0c57fd2fc3eb4b619a884c6070d44b460cc3f7990ccc0973aa94ab5d8b1dd7,2960
Module documentation for 0.3.1.1
Construct.hs
This is a Haskell implementation of Python’s Construct library. It provides a succinct and easy way to specify data formats. Before you get to the succinct part, though, you’ll probably need a bunch of extensions and imports:
{-# LANGUAGE FlexibleInstances, StandaloneDeriving, TemplateHaskell #-}
module README where
import Data.Functor.Identity (Identity(Identity))
import Data.Word (Word8)
import Data.ByteString (ByteString)
import qualified Data.ByteString as ByteString
import qualified Data.ByteString.Char8 as ASCII
import qualified Rank2.TH
import Text.ParserCombinators.Incremental.LeftBiasedLocal (Parser, completeResults, feed, feedEof)
import Construct
import Prelude hiding ((*>), (<*))
Example
With that out of the way, let’s take the simple example format from the original. Here’s what its specification looks like in Haskell:
data BitMap f = BitMap{
width :: f Word8,
height :: f Word8,
pixels :: f [[Word8]]
}
deriving instance Show (BitMap Identity)
$(Rank2.TH.deriveAll ''BitMap)
format :: Format (Parser ByteString) Maybe ByteString (BitMap Identity)
format = literal (ASCII.pack "BMP") *> mfix (\this-> record
BitMap{
width= byte,
height= byte,
pixels= count (fromIntegral $ height this) $
count (fromIntegral $ width this) byte
})
There are two parts to the specification.
The data BitMap
declaration specifies the in-memory layout of a simple bitmap. Note that it’s declared as a record
with every field wrapped in a type constructor parameter. This declaration style is sometimes called Higher-Kinded
Data. To regain a regular record, just instantiate the
parameter to Identity
— a BitMap Identity
contains exactly one value of each field.
The other part of the specification is the format
definition that specifies the bi-directional mapping between the
in-memory and the serialized form of the bitmap. A bijection, to be precise. The two definitions are enough to automatically
serialize the in-memory record form into the binary form:
-- |
-- >>> let record = BitMap{width= pure 3, height= pure 2, pixels= pure [[7,8,9], [11,12,13]]}
-- >>> serialize format record
-- Just "BMP\ETX\STX\a\b\t\v\f\r"
and to parse the serialized binary form back into the record structure:
-- |
-- >>> let bytes = ASCII.pack "BMP\ETX\STX\a\b\t\v\f\r"
-- >>> completeResults $ feedEof $ feed bytes $ parse format
-- [(BitMap {width = Identity 3, height = Identity 2, pixels = Identity [[7,8,9],[11,12,13]]},"")]
Examples of more complex and realistic formats can be found in the
test
directory.
Acknowledgements
I owe the inspiration for this library to Yair Chuchem and his post that introduced me to Construct. I must also express gratitude to the authors of the original library of course. And finally, to the authors of the paper Invertible Syntax Descriptions:Unifying Parsing and Pretty Printing which I remembered reading just in time to avoid following some bad ideas.
Implementation notes
I had to overcome two problems in the course of the implementation. The first difficulty, already touched on in the
aforementioned blog post, is how to convert a record of formats into a format of the record. As the author of
rank2classes, I went for an obvious solution: parameterize the
record as seen in the example, make it an instance of the
Rank2.Traversable
class,
and apply Rank2.traverse
to
it.
That little trick alone would have been enough for a nearly complete package, except the Python library also enables a record field to refer to any of the preceding fields. This is not a problem when serializing, but when parsing it means that a parser must be able to refer to an already-parsed part of the value that’s still being parsed.
The standard solution to the problem of accessing the results (parsed values) of a computation (parsing) while within
the computation is known as the
MonadFix
class, though
perhaps not as widely as it deserves. You won’t find any parsers in the list of its instances, though, and for a good
reason - it’s quite impossible for a parser to obtain a value that it hasn’t parsed yet. All we need, luckily, is
access an already parsed part of a partially parsed structure. A limited MonadFix
instance can do that, provided
that we also use a specialized form of Rank2.traverse
capable of preserving the record structure while it’s being
parsed.
As it happens, I have an old parser library named
incremental-parser on Hackage, and the name seems quite
appropriate for what I’m doing here. I added the necessary functionality there, but another parser combinator library
should be capable of the same feat. It just needs to implement the mfix
and fixSequence
combinators and it can be
used with the present library.
Changes
Revision history for construct
0.3.1.1 – 2022-10-03
- Increment the
monoid-sublasses
andinput-parsers
dependencies’ upper bounds
0.3.1 – 2022-03-25
- Increment the
text
dependency upper bound - Move the Template Haskell splices so they compile with GHC 9
0.3.0.2 – 2021-03-22
- Increment the attoparsec dependency upper bounds
0.3.0.1 – 2021-03-07
- Increment the dependencies’ upper bounds
0.3 – 2020-07-18
- Import the Input[Char]Parser classes from the
input-parsers
package
0.2.0.1 – 2020-03-08
- Incremented the upper bound for the
rank2classes
dependency
0.2 – 2020-01-27
- Updated for
base-4.13
- Eliminated the
Monoid/Semigroup (n s)
constraints - Added
recordWith
- Fixed the URI example and test
- Constrained the URI format serialization to ByteString
- Switched the URI parser to Attoparsec
0.1 – 2020-01-22
- First version. Released on an unsuspecting world.