Library for parsing Clustal tools output

GPL-3.0-only licensed by Florian Eggenhofer

ClustalParser

Currently contains parsers and datatypes for: clustalw2, clustalo, mlocarna, cmalign

Clustal tools are multiple sequence alignment tools for biological sequences such as DNA, RNA, and protein. For more information on Clustal tools refer to

Mlocarna is a multiple sequence alignment tool for RNA sequences with secondary structure output. For more information on mlocarna refer to

cmalign is a multiple sequence alignment program based on RNA family models that produces, among other formats, Clustal output. It is part of Infernal.

Four types of output are parsed:

  • Alignment file (.aln):
      • Parsing with readClustalAlignment from filepath (Bio.ClustalParser)
      • Parsing with parseClustalAlignment from String (Bio.ClustalParser)
  • Alignment file with secondary structure (.aln):
      • Parsing with readStructuralClustalAlignment from filepath (Bio.ClustalParser)
      • Parsing with parseStructuralClustalAlignment from String (Bio.ClustalParser)
  • Summary (printed to STDOUT):
      • Parsing with readClustalSummary from filepath (Bio.ClustalParser)
      • Parsing with parseClustalSummary from String (Bio.ClustalParser)
  • Phylogenetic Tree (.dnd):
      • Parsing with readGraphNewick from filepath (Bio.Phylogeny)
      • Parsing with readGraphNewick from String (Bio.Phylogeny)
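
To illustrate what parsing an alignment file from a String involves, here is a minimal sketch that uses only the base library. It is not the ClustalParser implementation, and the names parseSimpleClustal, Entry and appendChunk are hypothetical and not part of this package. The sketch assumes that the first non-empty line is a header starting with "CLUSTAL", that each sequence line has the form "identifier chunk", and that lines beginning with whitespace are conservation tracks that can be skipped.

```haskell
import Data.Char (isSpace)
import Data.List (isPrefixOf)

-- | One aligned sequence: identifier and concatenated aligned residues.
-- (Hypothetical type for this sketch, not a ClustalParser datatype.)
type Entry = (String, String)

-- | Parse a minimal Clustal alignment from a String.
-- Assumptions: a "CLUSTAL" header line comes first, sequence lines are
-- "<identifier> <chunk>", and lines starting with whitespace are
-- conservation tracks, which are ignored here.
parseSimpleClustal :: String -> Either String [Entry]
parseSimpleClustal input =
  case dropWhile (all isSpace) (lines input) of
    [] -> Left "empty input"
    (header : rest)
      | "CLUSTAL" `isPrefixOf` header ->
          Right (foldl addLine [] (filter isSequenceLine rest))
      | otherwise -> Left "missing CLUSTAL header"
  where
    -- Sequence lines start with a non-whitespace character.
    isSequenceLine (c : _) = not (isSpace c)
    isSequenceLine []      = False
    -- Take identifier and chunk; any trailing residue count is dropped.
    addLine acc l =
      case words l of
        (ident : chunk : _) -> appendChunk ident chunk acc
        _                   -> acc

-- | Append a chunk to the entry with the given identifier,
-- keeping identifiers in the order they were first seen.
appendChunk :: String -> String -> [Entry] -> [Entry]
appendChunk ident chunk [] = [(ident, chunk)]
appendChunk ident chunk ((i, s) : es)
  | i == ident = (i, s ++ chunk) : es
  | otherwise  = (i, s) : appendChunk ident chunk es
```

Merging chunks by identifier is needed because Clustal output wraps long alignments into several blocks, each repeating the sequence identifiers. The sketch does not handle secondary structure annotation or other tool-specific fields.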



1.3.0 Florian Eggenhofer 14. November 2019

  • Fixed requested tick number for compilation with GHC 8.6.*
  • Changed to Biobase style

1.2.3 Florian Eggenhofer 12. March 2018

  • Fixed parsing of additional newline in Biopython's AlignIO output without conservation track

1.2.2 Florian Eggenhofer 07. March 2018

  • Clustal parser can now parse alignments with missing consensus annotation

1.2.1 Florian Eggenhofer 06. February 2017

  • Structural alignment parser now works with multiline consensus structures

1.2.0 Florian Eggenhofer 07. January 2017

  • Changed data structures for sequence identifiers and sequences to Data.Text

1.1.4 Florian Eggenhofer 30. May 2016

  • Fixed a bug in output of clustal alignments with sequence length of 60

1.1.3 Florian Eggenhofer 4. July 2015

  • Nucleotide sequences are now parsed by a unified function in line with IUPAC nucleotide code

1.1.2 Florian Eggenhofer 3. July 2015

  • Included parsing of optional field in mlocarna clustal output

1.1.1 Florian Eggenhofer 2. July 2015

  • Added support for cmalign clustal output

1.1.0 Florian Eggenhofer 1. July 2015

  • Added Hspec test-suite for parsing functions
  • Added Show instances for ClustalAlignment and StructuralClustalAlignment

1.0.3 Florian Eggenhofer 19. April 2015

  • Added Y (pyrimidine) and R (purine) to sequence characters

1.0.2 Florian Eggenhofer 19. March 2015

  • Linebreaks are now filtered from structural alignment sequence identifiers

1.0.1 Florian Eggenhofer 27. October 2014

  • Fixed compiler warnings and updated documentation to mention structural clustal format
  • Added -Wall and -O2 compiler options
  • Added support for clustal alignments with secondary structure annotation