A package with basic parsing utilities for several Bioinformatic data formats.

Version on this page: 1.4.1

GPL-3.0-only licensed by Stephan Schiffels
Maintained by [email protected]
This version can be pinned in stack with: sequence-formats-1.4.1@sha256:bac0b2e7e614b5311a5743b69144e6c04d7d5097556b68a28fbccae39cfff947,2244

Sequence-Formats is a Haskell package that provides file parsers and writers for some commonly and less commonly used file formats in bioinformatics, specifically population genetics.

The library makes heavy use of the pipes library to provide Producers and Consumers for reading and writing files.
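As a rough illustration of the Producer/Consumer style, the sketch below builds a small pipeline with the pipes package itself. It does not use this library's API: the `SnpRow` record, `parseRow` and `parseRows` are hypothetical names invented for the example, and the two-column input layout is an assumption.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import qualified Data.ByteString.Char8 as B
import Pipes (Producer, each, runEffect, (>->))
import qualified Pipes.Prelude as P

-- Hypothetical record for illustration; NOT a type from sequence-formats.
data SnpRow = SnpRow { rowChrom :: B.ByteString, rowPos :: Int }
  deriving (Eq, Show)

-- Parse one whitespace-separated line of the assumed form "chrom pos".
-- Returns Nothing for lines that do not match.
parseRow :: B.ByteString -> Maybe SnpRow
parseRow line = case B.words line of
  [c, p] -> case B.readInt p of
    Just (n, rest) | B.null rest -> Just (SnpRow c n)
    _                            -> Nothing
  _ -> Nothing

-- Turn a Producer of raw lines into a Producer of parsed rows,
-- silently dropping lines that fail to parse.
parseRows :: Monad m => Producer B.ByteString m () -> Producer SnpRow m ()
parseRows src = src >-> P.mapFoldable parseRow

main :: IO ()
main = runEffect $
  parseRows (each ["1 100", "bad line", "2 250"]) >-> P.print
```

Because the parsing stage is itself a Producer, it can be connected to any Consumer (printing, writing to a file, folding into a summary) without reading the whole input into memory.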


V First entry in the Changelog. Added Haddock documentation to all modules and prepared for release on Hackage.

V Exporting readVCFfromProd

V 1.1.5: Fixed VCF parser: Now breaks if lines end prematurely

V 1.1.6: VCF parser now allows for truncated VCF files with no Format and Genotypes (sites-only VCF files)

V 1.1.7: Added option to parse Bim files (slightly different layout from the Eigenstrat Snp format), and added genetic position and snpId to the EigenstratSnpEntry data structure. This will cause breaking changes in code linking against this library.

V 1.1.8: Added more consumers for Bim and Eigenstrat Snp files

V Added Eq class to EigenstratInd and Sex

V Added Eq and Show classes to various FreqSum entities. Fixed writing function, added tests.

V Added tests for Fasta import, which pass.

V 1.2.0: Added tests for VCF, and several bugfixes. Now runs on LTS-14.1 with pipes-text as legacy dependency.

V 1.3.0: Removed the pipes-text, text and turtle dependencies, among others. Restructured all datatypes to use ByteString instead of Text.

V 1.3.1: Moved test suite outside of the main library into the test source directory. Cleaner setup.

V 1.3.2: Added testDat to Cabal file to make tests work off the tarball.

V Fixed a hard-coded absolute path in the test-suite.

V 1.3.3: Added Pileup as new format. Changed all tests to Hspec.

V 1.4.0: Added three features:

  • Chromosomes now include X, Y and MT (or chrX, chrY, chrMT), in that order after chr22.
  • SNP rsId information is now internally included as an option in the FreqSum data format.
  • Pileup format now also records strand orientation.
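The chromosome ordering described in the first item can be sketched as a ranking function. This is a minimal illustration using only base, not the library's actual implementation; `chromRank` and `sortChroms` are hypothetical names, and chromosomes are modelled as plain `String`s for simplicity.

```haskell
import Data.Char (isDigit)
import Data.List (sortOn, stripPrefix)
import Data.Maybe (fromMaybe)

-- Hypothetical helper: map a chromosome name to a sort rank.
-- Autosomes 1..22 keep their number; X, Y and MT follow in that order.
-- An optional "chr" prefix is ignored. Unrecognised names yield Nothing.
chromRank :: String -> Maybe Int
chromRank name =
  case fromMaybe name (stripPrefix "chr" name) of
    "X"  -> Just 23
    "Y"  -> Just 24
    "MT" -> Just 25
    ds | not (null ds) && all isDigit ds ->
           let n = read ds :: Int
           in if n >= 1 && n <= 22 then Just n else Nothing
    _ -> Nothing

-- Sort names by rank. Nothing compares less than any Just value,
-- so unrecognised names end up at the front of the result.
sortChroms :: [String] -> [String]
sortChroms = sortOn chromRank

main :: IO ()
main = print (sortChroms ["chrMT", "chr2", "chrX", "chr22", "chrY", "chr1"])
-- prints ["chr1","chr2","chr22","chrX","chrY","chrMT"]
```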

V Added test file example.pileup to cabal extra-source-files to make tests work.

V 1.4.1:

  • Added optional genetic position to the FreqSum format.
  • Changed various internal strings to ByteStrings and vice versa.