A collection of tools for processing PDF files.


Version on this page: 0.1.1@rev:1
LTS Haskell 22.28: 0.1.3
Stackage Nightly 2024-07-13: 0.1.3
Latest on Hackage: 0.1.3

BSD-3-Clause licensed by Yuras Shumovich
Maintained by Yuras Shumovich
This version can be pinned in stack with: pdf-toolbox-core-0.1.1@sha256:9f3a9eea11420982f4f84addda9994d6ee756e9c9ed5c1691214ab0fcc80b6c0,3944

Low level tools for processing PDF files.

Level of abstraction: cross reference, trailer, indirect object, object

The API is based on random access input streams and is designed to be memory efficient. We don't need to parse the entire PDF file and store it in memory when only one or two pages are needed. This usually leads to time efficiency as well, but we don't try to optimize performance by, e.g., maintaining an xref or object cache. Higher level layers should take care of that.
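
To illustrate why random access helps, here is a minimal sketch that locates the cross reference section by reading only the tail of a file. It uses just `base` and `bytestring`, not this package's API; the names `readTail`, `lastAfter` and `parseStartXRef` are hypothetical helpers introduced for the example. It relies on the PDF convention that a file ends with `startxref`, a byte offset, and `%%EOF`.

```haskell
import Data.Char (isSpace)
import qualified Data.ByteString.Char8 as BS
import System.IO (Handle, SeekMode (AbsoluteSeek), hFileSize, hSeek)

-- | Read at most the last @n@ bytes of a file without touching the rest.
-- (Hypothetical helper, not part of pdf-toolbox-core.)
readTail :: Handle -> Integer -> IO BS.ByteString
readTail h n = do
  size <- hFileSize h
  let start = max 0 (size - n)
  hSeek h AbsoluteSeek start
  BS.hGet h (fromIntegral (size - start))

-- | Return the bytes that follow the last occurrence of @needle@, if any.
lastAfter :: BS.ByteString -> BS.ByteString -> Maybe BS.ByteString
lastAfter needle = go Nothing
  where
    go acc hay =
      let (_, rest) = BS.breakSubstring needle hay
      in if BS.null rest
           then acc
           else
             let after = BS.drop (BS.length needle) rest
             in go (Just after) after

-- | Extract the byte offset written after the last "startxref" keyword.
parseStartXRef :: BS.ByteString -> Maybe Int
parseStartXRef bytes = do
  after <- lastAfter (BS.pack "startxref") bytes
  (offset, _) <- BS.readInt (BS.dropWhile isSpace after)
  pure offset
```

A caller could open the file in binary mode, pass something like `readTail h 1024` to `parseStartXRef`, and then seek directly to the returned offset, so the bulk of the file is never read.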

The library is low level, so you may need to be familiar with PDF file internals to actually use it.




  • rework API
  • support GHC from 8.0 to 8.10 and drop older versions
  • interpret unknown xref stream entry type as reference to null object
  • support 1- and 2-digit escape sequences in literal strings
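
For context on the last item: PDF literal strings allow octal escapes of one to three digits after a backslash, with high-order overflow ignored. The sketch below shows the idea on a plain `String`; `decodeOctalEscapes` is a hypothetical name, not the library's parser, and every other escape is deliberately left verbatim.

```haskell
import Data.Char (chr, digitToInt, isOctDigit)

-- | Decode backslash-octal escapes (\d, \dd, \ddd) in a string.
-- Any other escaped pair is passed through unchanged.
-- (Illustrative only; not the parser used by pdf-toolbox-core.)
decodeOctalEscapes :: String -> String
decodeOctalEscapes ('\\' : rest)
  | not (null digits) = chr (code `mod` 256) : decodeOctalEscapes rest'
  where
    -- At most three octal digits belong to the escape.
    digits = takeWhile isOctDigit (take 3 rest)
    rest' = drop (length digits) rest
    code = foldl (\acc d -> acc * 8 + digitToInt d) 0 digits
-- Non-octal escape: keep the backslash and the escaped character as-is.
decodeOctalEscapes ('\\' : c : cs) = '\\' : c : decodeOctalEscapes cs
decodeOctalEscapes (c : cs) = c : decodeOctalEscapes cs
decodeOctalEscapes [] = []
```

With this sketch, `\53` and `\053` both decode to `+`, while `\0053` decodes to the character with code 5 followed by the digit `3`.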

  • add Functor and Applicative instances to fix AMP warnings
  • fix attoparsec module deprecation warnings
  • add scientific dependency (the latest attoparsec uses it for numbers)