Repeats from ESTs

Latest on Hackage: 0.3.1


GPL licensed by Ketil Malde
Maintained by Ketil Malde


rselect - select a random set of sequences from a FASTA file.
reps - extract exact k-word repeats that occur in
sequences grouped in different clusters.


You'll need GHC or possibly another Haskell system, and the
Haskell bioinformatics library. The Makefile should build and
install the programs (by default to your home directory).


rselect [-r] n [m] input.seq

Selects n sequences from the file input.seq. If the optional
m is given, this limits the selection to happen only from the first
m sequences in the file, which may be more efficient. If -r is given,
the sequences will be reoriented randomly.

The selected sequences are written to standard output, so you
probably want to redirect them to a file.
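
For illustration only, the selection described above can be sketched in
Python. This is not the rselect source (which is Haskell). All function
names here are hypothetical, and two things are assumptions: that the
sequences are drawn without replacement, and that "reoriented randomly"
means each selected sequence is reverse-complemented with probability 1/2.

```python
import random

def read_fasta(lines):
    """Parse FASTA text lines into a list of (header, sequence) pairs."""
    records, header, chunks = [], None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif line and header is not None:
            chunks.append(line)  # sequence data may be wrapped over lines
    if header is not None:
        records.append((header, "".join(chunks)))
    return records

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")

def reverse_complement(seq):
    """Return the reverse complement of a nucleotide string."""
    return seq.translate(_COMPLEMENT)[::-1]

def select_random(records, n, m=None, reorient=False, rng=None):
    """Pick n records at random, optionally only from the first m.

    Assumption: sampling is without replacement, and reorienting means
    reverse-complementing each pick with probability 1/2.
    """
    if rng is None:
        rng = random.Random()
    pool = records if m is None else records[:m]
    chosen = rng.sample(pool, n)  # raises ValueError if n > len(pool)
    if reorient:
        chosen = [(h, reverse_complement(s)) if rng.random() < 0.5 else (h, s)
                  for h, s in chosen]
    return chosen
```

Restricting the pool to the first m records means only that prefix has to
be considered, which is one way to read the efficiency remark above.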

reps k clustered.seq

Generate a list of repeated k-words (or k-grams) found in the sequences.
The sequences are expected to be in UniGene format, i.e. a FASTA file
with #-initiated comments separating the clusters.

A k-word is considered repeated if it is found in more than one of the
clusters.
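
As a rough sketch of the idea (not the reps implementation itself), the
Python below splits a UniGene-style file into clusters and reports the
k-words seen in more than one cluster. The function names are
hypothetical. It assumes that every line starting with # begins a new
cluster, that "repeated" means present in at least two clusters, and that
only the given strand is considered (no reverse complements).

```python
from collections import defaultdict

def read_clusters(lines):
    """Split UniGene-style FASTA lines into clusters of sequences.

    Assumption: a line starting with '#' starts a new cluster and a line
    starting with '>' starts a new sequence within the current cluster.
    """
    clusters = [[]]
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith("#"):
            clusters.append([])       # comment line: begin a new cluster
        elif line.startswith(">"):
            clusters[-1].append("")   # header: begin a new sequence
        elif clusters[-1]:
            clusters[-1][-1] += line  # sequence data, possibly wrapped
    return [c for c in clusters if c]

def kwords(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def repeated_kwords(clusters, k):
    """Return, sorted, the k-words that occur in more than one cluster."""
    seen_in = defaultdict(set)  # k-word -> indices of clusters containing it
    for idx, seqs in enumerate(clusters):
        for seq in seqs:
            for word in kwords(seq.upper(), k):
                seen_in[word].add(idx)
    return sorted(w for w, ids in seen_in.items() if len(ids) > 1)
```

A k-word that occurs many times within a single cluster is not reported
by this sketch, since its set of cluster indices has only one element.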

reps k clustering.lst sequences.seq

As above, but take a separate input clustering (and ignore any #'s in the
sequences). The clustering should consist of one line per cluster, with each
line containing the sequence identifiers (the first word after the initial '>'
in each FASTA header) of the sequences in that cluster.
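
A minimal Python sketch of how such a clustering file could be matched
against the sequences is given here. It is not taken from reps. The names
are hypothetical, and it assumes that identifiers on a line are separated
by whitespace and that sequences not mentioned in the clustering are
skipped.

```python
def read_clustering(lines):
    """Map each sequence identifier to the index of its cluster line.

    Assumption: identifiers on one line are separated by whitespace.
    """
    cluster_of = {}
    for idx, line in enumerate(lines):
        for seq_id in line.split():
            cluster_of[seq_id] = idx
    return cluster_of

def group_by_clustering(cluster_of, fasta_lines):
    """Group FASTA sequences into clusters by their identifier.

    The identifier is the first word after '>' in the header. Lines
    starting with '#' are ignored, and sequences whose identifier is not
    in the clustering are skipped (an assumption of this sketch).
    """
    groups = {}     # cluster index -> list of sequences
    current = None  # cluster list receiving the sequence being read
    for line in fasta_lines:
        line = line.strip()
        if line.startswith(">"):
            words = line[1:].split()
            seq_id = words[0] if words else ""
            idx = cluster_of.get(seq_id)
            if idx is None:
                current = None
            else:
                current = groups.setdefault(idx, [])
                current.append("")
        elif not line or line.startswith("#"):
            continue
        elif current is not None:
            current[-1] += line
    return [groups[i] for i in sorted(groups)]
```

The result has the same shape as a list of clusters read from a
UniGene-style file, so the same k-word counting can be applied to it.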


Do let me know about them, at <>!
