Haskell PDB file format parser.


The Protein Data Bank (PDB) file format is one of the most popular formats for storing biomolecular data.

This is a very fast parser:

  • under 7s for 1HTQ, the largest entry in PDB, which is over 70MB,
  • compared with 11s for RASMOL 2.7.5,
  • or 2m15s for BioPython with the Python 2.6 interpreter.

It aims to deliver not only an event-based interface, but also a high-level data structure for manipulating data, in the spirit of BioPython’s PDB parser.
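To make the distinction between the two layers concrete, here is a minimal sketch in plain Haskell. It is not this package's actual API: `Atom`, `Event`, `parseLine` and `atomsByChain` are hypothetical names chosen for illustration. Only the fixed-column layout of `ATOM`/`HETATM` records (chain in column 22, residue number in columns 23-26, coordinates in columns 31-54) comes from the PDB format itself. Each line becomes one event, and a fold over the events builds a higher-level structure.

```haskell
import Data.Char (isSpace)
import qualified Data.Map.Strict as Map
import Text.Read (readMaybe)

-- Hypothetical high-level record for one atom (not this package's type).
data Atom = Atom
  { atomName :: String
  , resName  :: String
  , chainId  :: Char
  , resSeq   :: Int
  , coords   :: (Double, Double, Double)
  } deriving (Show, Eq)

-- Hypothetical event type: every input line yields exactly one event.
data Event
  = AtomEvent Atom
  | OtherRecord String  -- record name of a line this sketch does not interpret
  | Malformed String    -- ATOM/HETATM line whose fields failed to parse
  deriving (Show, Eq)

-- Extract 1-based, inclusive columns, as the PDB format numbers them.
cols :: Int -> Int -> String -> String
cols from to = take (to - from + 1) . drop (from - 1)

-- Strip leading and trailing whitespace.
trim :: String -> String
trim = dropWhile isSpace . reverse . dropWhile isSpace . reverse

-- Event-based layer: turn one line into one event.
parseLine :: String -> Event
parseLine l
  | recName `elem` ["ATOM", "HETATM"] = maybe (Malformed l) AtomEvent atom
  | otherwise                         = OtherRecord recName
  where
    recName   = trim (cols 1 6 l)
    field a b = trim (cols a b l)
    atom = do
      chain <- case cols 22 22 l of
                 [c] -> Just c
                 _   -> Nothing
      sq <- readMaybe (field 23 26)
      x  <- readMaybe (field 31 38)
      y  <- readMaybe (field 39 46)
      z  <- readMaybe (field 47 54)
      pure (Atom (field 13 16) (field 18 20) chain sq (x, y, z))

-- High-level layer: fold the event stream into atoms grouped by chain,
-- keeping the atoms of each chain in file order.
atomsByChain :: [Event] -> Map.Map Char [Atom]
atomsByChain evs =
  foldr (\a -> Map.insertWith (++) (chainId a) [a]) Map.empty
        [a | AtomEvent a <- evs]
```

In this sketch a caller that only needs to stream over records can stop at `map parseLine . lines`, while a caller that wants to navigate the structure uses `atomsByChain`. A real parser would work on packed byte strings rather than `String` for speed.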

Details on official releases are on Hackage.

This package is also part of Stackage, a stable subset of Hackage.

Projects for the future:

Please let me know if you would be willing to push the project further.

In particular, one may consider these features:

  • Implement basic spatial operations: RMS superposition (with SVD) and affine transforms on a substructure.
  • Use lens to facilitate access to the data structures.
    • torsion angles within a protein/RNA chain.
  • Add an Octree to the default data structure (with automatic update).
  • Migrate away from text-format, since it causes portability trouble and slows down printing.
  • Write a combinator library for generic fast parsing.
  • Check whether GHC 7.8 improved the efficiency of fixed-point arithmetic, since PDB coordinates have a dynamic range of just ~2^20 (about 20 bits), with a smallest step of 0.001.
  • Class-based wrappers exposing a Structure-Model-Chain-Residue-Atom interface, possibly wrapping Repa/Accelerate arrays for fast computation.
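The fixed-point idea from the list above can be sketched as follows. This is only an illustration under stated assumptions: the `Fixed3` newtype and its helpers are hypothetical names, not part of this package, and `Int32` is chosen simply because it comfortably covers the coordinate range. A coordinate is stored as an integer count of 0.001 steps, so addition, subtraction and comparison become exact integer operations.

```haskell
import Data.Int (Int32)

-- Hypothetical fixed-point coordinate: an integer count of 0.001 steps.
newtype Fixed3 = Fixed3 Int32
  deriving (Eq, Ord, Show)

-- Steps per unit (three decimal places).
scale :: Int32
scale = 1000

-- Convert from Double, rounding to the nearest step.
fromDouble :: Double -> Fixed3
fromDouble x = Fixed3 (round (x * fromIntegral scale))

-- Convert back to Double, e.g. for trigonometry.
toDouble :: Fixed3 -> Double
toDouble (Fixed3 n) = fromIntegral n / fromIntegral scale

-- Exact addition and subtraction on the underlying integers.
addF, subF :: Fixed3 -> Fixed3 -> Fixed3
addF (Fixed3 a) (Fixed3 b) = Fixed3 (a + b)
subF (Fixed3 a) (Fixed3 b) = Fixed3 (a - b)

padLeft :: Int -> Char -> String -> String
padLeft n c s = replicate (n - length s) c ++ s

-- Render right-aligned in 8 characters with 3 decimals, the width
-- used by the coordinate columns of an ATOM record.
render :: Fixed3 -> String
render (Fixed3 n) =
  padLeft 8 ' ' (sign ++ show whole ++ "." ++ padLeft 3 '0' (show frac))
  where
    sign          = if n < 0 then "-" else ""
    (whole, frac) = abs (toInteger n) `quotRem` 1000
```

One consequence of this design is that `addF (fromDouble 0.1) (fromDouble 0.2) == fromDouble 0.3` holds, whereas the same comparison on raw `Double` values does not. Printing also needs no floating-point formatting. Whether this is actually faster than `Double` arithmetic on a given GHC version is exactly the open question in the list item.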

Please ask me any questions on Gitter.


-*-Changelog-*-

Jul 2018
* Moved from AC-Vector to Linear.
* Added stack configuration.

Nov 2017
* Relax deps for GHC 8.2

Jun 2016
* Use hashmap for element properties.
Great thanks to Ben Gamari:
https://ghc.haskell.org/trac/ghc/ticket/917

Jun 2016
* Updated bounds again.

Jun 2016
* Updated bounds for GHC 8.0.

Jun 2015
* Cleaned, added some documentation.

Apr 2015
* Version bump.

Apr 2015
* Update for zlib version

Dec 2014
* Relaxed upper bounds

1.2.0 May 2014
* Iterable 3.0

1.1.1 Jan 2014
* Exposed PDBWritable class for all objects that can be written to PDB

1.1 Jan 2014
* Octree building with Bio.PDB.Structure.Neighbours.

0.9999.1 Sep 2013
* Removed most compilation options and replaced them with conditionals
on library versions.

0.9999 Sep 2013
* Parallel parsing.
* Change of Iterable interface from imap -> itmap etc.

0.99 Sep 2013
* First public release.