elynx-seq
Handle molecular sequences
https://github.com/dschrempf/elynx#readme
Version on this page: 0.5.1.1
LTS Haskell 22.37: 0.7.2.2
Stackage Nightly 2024-10-09: 0.7.2.2
Latest on Hackage: 0.7.2.2
elynx-seq-0.5.1.1@sha256:838275d08597de458179aec95ddc0b1cb6319049b1f0a55d44b4283f625ffe41,3139
Module documentation for 0.5.1.1
- ELynx
- ELynx.Data
- ELynx.Data.Alphabet
- ELynx.Data.Character
- ELynx.Data.Sequence
- ELynx.Export
- ELynx.Export.Sequence
- ELynx.Import
- ELynx.Import.Sequence
The ELynx Suite
Version: 0.5.1.0. Reproducible evolution made easy.
A Haskell library and tool set for computational biology. The goal of ELynx is reproducible research. Evolutionary sequences and phylogenetic trees can be read, viewed, modified and simulated. The command line with all arguments is logged consistently and automatically. Data integrity is verified using SHA256 sums so that validation of past analyses is possible without the need to recompute the result.
The ELynx Suite consists of library packages and executables providing a range of subcommands.
The library packages are:
- elynx-nexus: Nexus file support.
- elynx-markov: Simulate multi sequence alignments along phylogenetic trees.
- elynx-seq: Handle evolutionary sequences and multi sequence alignments.
- elynx-tools: Tools for the provided executables.
- elynx-tree: Handle phylogenetic trees.
The executables are:
- slynx: Analyze, modify, and simulate evolutionary sequences.
- tlynx: Analyze, modify, and simulate phylogenetic trees.
- elynx: Validate and redo past analyses.
Documentation is available on Hackage.
ELynx is actively developed. We happily receive comments, ideas, feature requests, and pull requests!
Installation
ELynx is written in Haskell and can be installed with Stack.
- Install Stack with your package manager, or directly from the web page.
  curl -sSL https://get.haskellstack.org/ | sh
- Clone the ELynx repository.
  git clone https://github.com/dschrempf/elynx
- Navigate to the newly created elynx folder and build the binaries. This will take a while.
  stack build
- Run a binary from within the project directory. For example,
  stack exec tlynx -- --help
- If needed, install the binaries.
  stack install
  The binaries are installed into ~/.local/bin/, which has to be added to the PATH environment variable. Then, they can be used directly.
SLynx
Handle evolutionary sequences.
stack exec slynx -- --help | head -n -16
ELynx Suite version 0.5.1.0.
Developed by Dominik Schrempf.
Compiled on June 12, 2021, at 14:54 pm, UTC.
Usage: slynx [-v|--verbosity VALUE] [-o|--output-file-basename NAME]
[-f|--force] [--no-elynx-file] COMMAND
Analyze, and simulate multi sequence alignments.
Available options:
-h,--help Show this help text
-V,--version Show version
-v,--verbosity VALUE Be verbose; one of: Quiet Warning Info Debug
(default: Info)
-o,--output-file-basename NAME
Specify base name of output file
-f,--force Ignore previous analysis and overwrite existing
output files.
--no-elynx-file Do not write data required to reproduce an analysis.
Available commands:
concatenate Concatenate sequences found in input files.
examine Examine sequences. If data is a multi sequence alignment, additionally analyze columns.
filter-columns Filter columns of multi sequence alignments.
filter-rows Filter rows (or sequences) found in input files.
simulate Simulate multi sequence alignments.
sub-sample Sub-sample columns from multi sequence alignments.
translate Translate from DNA to Protein or DNAX to ProteinX.
Available sequence file formats:
- FASTA
Available alphabets:
- DNA (nucleotides)
- DNAX (nucleotides; including gaps)
- DNAI (nucleotides; including gaps, and IUPAC codes)
- Protein (amino acids)
- ProteinX (amino acids; including gaps)
- ProteinS (amino acids; including gaps, and translation stops)
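The alphabets above differ in which characters they accept. The following minimal sketch illustrates the idea in plain Haskell. The Alphabet type, the exact character sets, and the isValidFor function are hypothetical stand-ins written for this example (for instance, lowercase letters and alternative gap symbols are not modeled); they are not the API of ELynx.Data.Alphabet.

```haskell
-- Hypothetical sketch: the character sets below are assumptions for
-- illustration and are not taken from the elynx-seq source.
data Alphabet = DNA | DNAX | DNAI | Protein | ProteinX | ProteinS
  deriving (Show, Eq)

-- Characters accepted by each alphabet (upper case only, by assumption).
characters :: Alphabet -> String
characters DNA      = "ACGT"
characters DNAX     = characters DNA ++ "-"               -- add gaps
characters DNAI     = characters DNAX ++ "RYSWKMBDHVN"    -- add IUPAC ambiguity codes
characters Protein  = "ACDEFGHIKLMNPQRSTVWY"              -- 20 standard amino acids
characters ProteinX = characters Protein ++ "-"           -- add gaps
characters ProteinS = characters ProteinX ++ "*"          -- add translation stops

-- True if every character of the sequence belongs to the alphabet.
isValidFor :: Alphabet -> String -> Bool
isValidFor a = all (`elem` characters a)
```

For example, a gapped sequence such as "AC-GT" is rejected by DNA but accepted by DNAX, which is why the alphabet has to be chosen with -a for every command.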
Concatenate
Concatenate multi sequence alignments.
stack exec slynx -- concatenate --help
ELynx Suite version 0.5.1.0.
Developed by Dominik Schrempf.
Compiled on June 12, 2021, at 14:54 pm, UTC.
Usage: slynx concatenate (-a|--alphabet NAME) INPUT-FILE
Concatenate sequences found in input files.
Available options:
-h,--help Show this help text
-V,--version Show version
-a,--alphabet NAME Specify alphabet type NAME
INPUT-FILE Read sequences from INPUT-FILE
-h,--help Show this help text
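Concatenating multi sequence alignments appends the columns of one alignment to the rows of another. The sketch below assumes that rows are matched by sequence name and that every input contains the same set of names; the Alignment type and concatAlignments function are hypothetical and only illustrate the operation, not how slynx concatenate is implemented.

```haskell
import qualified Data.Map.Strict as M

-- A toy alignment: sequence name -> aligned characters (hypothetical type).
type Alignment = M.Map String String

-- Concatenate alignments column-wise, matching rows by sequence name.
-- Returns Nothing if the alignments do not contain the same set of names.
concatAlignments :: [Alignment] -> Maybe Alignment
concatAlignments [] = Just M.empty
concatAlignments (a : as)
  | all ((== M.keys a) . M.keys) as = Just (M.unionsWith (++) (a : as))
  | otherwise = Nothing
```

Because unionsWith folds from the left, the columns appear in the order in which the alignments are given.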
Examine
Examine sequences with slynx examine.
stack exec slynx -- examine --help
ELynx Suite version 0.5.1.0.
Developed by Dominik Schrempf.
Compiled on June 12, 2021, at 14:54 pm, UTC.
Usage: slynx examine (-a|--alphabet NAME) INPUT-FILE [--per-site]
Examine sequences. If data is a multi sequence alignment, additionally analyze columns.
Available options:
-h,--help Show this help text
-V,--version Show version
-a,--alphabet NAME Specify alphabet type NAME
INPUT-FILE Read sequences from INPUT-FILE
--per-site Report per site summary statistics
-h,--help Show this help text
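Two of the statistics reported by slynx examine are the Hamming distance between sequences and the constant sites of an alignment (see the changelog entry for version 0.3.4). A minimal sketch of both, assuming rows of equal length and treating every character, including gaps, as an ordinary state; ELynx may treat gaps and ambiguous characters differently.

```haskell
import Data.List (transpose)

-- Hamming distance: number of positions at which two aligned sequences differ.
hamming :: String -> String -> Int
hamming xs ys = length (filter id (zipWith (/=) xs ys))

-- 0-based indices of constant sites, i.e., columns in which all rows agree.
constantSites :: [String] -> [Int]
constantSites rows =
  [i | (i, col) <- zip [0 ..] (transpose rows), allEqual col]
  where
    allEqual []       = True
    allEqual (c : cs) = all (== c) cs
```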
Filter
Filter sequences with filter-rows.
stack exec slynx -- filter-rows --help
ELynx Suite version 0.5.1.0.
Developed by Dominik Schrempf.
Compiled on June 12, 2021, at 14:54 pm, UTC.
Usage: slynx filter-rows (-a|--alphabet NAME) INPUT-FILE [--longer-than LENGTH]
[--shorter-than LENGTH] [--standard-characters]
Filter rows (or sequences) found in input files.
Available options:
-h,--help Show this help text
-V,--version Show version
-a,--alphabet NAME Specify alphabet type NAME
INPUT-FILE Read sequences from INPUT-FILE
--longer-than LENGTH Only keep sequences longer than LENGTH
--shorter-than LENGTH Only keep sequences shorter than LENGTH
--standard-characters Only keep sequences containing at least one standard
(i.e., non-IUPAC) character
-h,--help Show this help text
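The three row filters combine as a conjunction: a sequence is kept only if it passes every filter that was requested. The sketch below mirrors --longer-than, --shorter-than, and --standard-characters with a hypothetical RowFilter record; it assumes DNA data, counts gaps towards the sequence length, and treats only A, C, G, and T as standard characters. These are illustrative assumptions, not the slynx implementation.

```haskell
-- Hypothetical options; Nothing or False means the filter is not applied.
data RowFilter = RowFilter
  { longerThan   :: Maybe Int
  , shorterThan  :: Maybe Int
  , standardOnly :: Bool
  }

-- Standard (non-IUPAC, non-gap) nucleotides; an assumption for DNA data.
isStandard :: Char -> Bool
isStandard = (`elem` "ACGT")

-- Keep a sequence only if it passes all requested filters.
keepRow :: RowFilter -> String -> Bool
keepRow f s =
  maybe True (\n -> length s > n) (longerThan f)
    && maybe True (\n -> length s < n) (shorterThan f)
    && (not (standardOnly f) || any isStandard s)

-- Filter named sequences given as (name, characters) pairs.
filterRows :: RowFilter -> [(String, String)] -> [(String, String)]
filterRows f = filter (keepRow f . snd)
```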
Filter columns of multi sequence alignments with filter-columns.
stack exec slynx -- filter-columns --help
ELynx Suite version 0.5.1.0.
Developed by Dominik Schrempf.
Compiled on June 12, 2021, at 14:54 pm, UTC.
Usage: slynx filter-columns (-a|--alphabet NAME) INPUT-FILE
[--standard-chars DOUBLE]
Filter columns of multi sequence alignments.
Available options:
-h,--help Show this help text
-V,--version Show version
-a,--alphabet NAME Specify alphabet type NAME
INPUT-FILE Read sequences from INPUT-FILE
--standard-chars DOUBLE Keep columns with a proportion standard (non-IUPAC)
characters larger than DOUBLE in [0,1]
-h,--help Show this help text
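The --standard-chars option keeps a column only if the proportion of standard characters in it is larger than the given threshold. A minimal sketch on a list of equally long rows, assuming DNA data where A, C, G, and T are the standard characters; the function names are hypothetical.

```haskell
import Data.List (transpose)

-- Proportion of standard nucleotides (assumed to be A, C, G, T) in a column.
standardProportion :: String -> Double
standardProportion col =
  fromIntegral (length (filter (`elem` "ACGT") col)) / fromIntegral (length col)

-- Keep only columns whose proportion of standard characters is larger than p.
-- The number of rows is preserved even if all columns are removed.
filterColumns :: Double -> [String] -> [String]
filterColumns p rows = map (\r -> [c | (c, keep) <- zip r mask, keep]) rows
  where
    mask = map ((> p) . standardProportion) (transpose rows)
```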
Simulate
Simulate sequences with slynx simulate.
stack exec slynx -- simulate --help
ELynx Suite version 0.5.1.0.
Developed by Dominik Schrempf.
Compiled on June 12, 2021, at 14:54 pm, UTC.
Usage: slynx simulate (-t|--tree-file Name) [-s|--substitution-model MODEL]
[-m|--mixture-model MODEL] [-e|--edm-file NAME]
[-p|--siteprofile-files NAMES]
[-w|--mixture-model-weights "[DOUBLE,DOUBLE,...]"]
[-g|--gamma-rate-heterogeneity "(NCAT,SHAPE)"]
(-l|--length NUMBER) [-S|--seed [INT]]
Simulate multi sequence alignments.
Available options:
-h,--help Show this help text
-V,--version Show version
-t,--tree-file Name Read tree from Newick file NAME
-s,--substitution-model MODEL
Set the phylogenetic substitution model; available
models are shown below (mutually exclusive with -m
option)
-m,--mixture-model MODEL Set the phylogenetic mixture model; available models
are shown below (mutually exclusive with -s option)
-e,--edm-file NAME Empirical distribution model file NAME in Phylobayes
format
-p,--siteprofile-files NAMES
File names of site profiles in Phylobayes format
-w,--mixture-model-weights "[DOUBLE,DOUBLE,...]"
Weights of mixture model components
-g,--gamma-rate-heterogeneity "(NCAT,SHAPE)"
Number of gamma rate categories and shape parameter
-l,--length NUMBER Set alignment length to NUMBER
-S,--seed [INT] Seed for random number generator; list of 32 bit
integers with up to 256 elements (default: random)
-h,--help Show this help text
Substitution models:
-s "MODEL[PARAMETER,PARAMETER,...]{STATIONARY_DISTRIBUTION}"
Supported DNA models: JC, F81, HKY, GTR4.
For example,
-s HKY[KAPPA]{DOUBLE,DOUBLE,DOUBLE,DOUBLE}
-s GTR4[e_AC,e_AG,e_AT,e_CG,e_CT,e_GT]{DOUBLE,DOUBLE,DOUBLE,DOUBLE}
where the 'e_XY' are the exchangeabilities from nucleotide X to Y.
Supported Protein models: Poisson, Poisson-Custom, LG, LG-Custom, WAG, WAG-Custom, GTR20.
MODEL-Custom means that only the exchangeabilities of MODEL are used,
and a custom stationary distribution is provided.
For example,
-s LG
-s LG-Custom{...}
-s GTR20[e_AR,e_AN,...]{...}
the 'e_XY' are the exchangeabilities from amino acid X to Y (alphabetical order).
Notes: The F81 model for DNA is equivalent to the Poisson-Custom for proteins.
The GTR4 model for DNA is equivalent to the GTR20 for proteins.
Mixture models:
-m "MIXTURE(SUBSTITUTION_MODEL_1,SUBSTITUTION_MODEL_2[PARAMETERS]{STATIONARY_DISTRIBUTION},...)"
For example,
-m "MIXTURE(JC,HKY[6.0]{0.3,0.2,0.2,0.3})"
Mixture weights have to be provided with the -w option.
Special mixture models:
-m CXX
where XX is 10, 20, 30, 40, 50, or 60; CXX models, Quang et al., 2008.
-m "EDM(EXCHANGEABILITIES)"
Arbitrary empirical distribution mixture (EDM) models.
Stationary distributions have to be provided with the -e or -p option.
For example,
LG exchangeabilities with stationary distributions given in FILE.
-m "EDM(LG-Custom)" -e FILE
LG exchangeabilities with site profiles (Phylobayes) given in FILES.
-m "EDM(LG-Custom)" -p FILES
For special mixture models, mixture weights are optional.
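The substitution models above determine how characters change along a branch. For the simplest DNA model, JC (Jukes-Cantor), the transition probabilities have a closed form: for a branch of length t, measured in expected substitutions per site, a nucleotide stays the same with probability 1/4 + 3/4 exp(-4t/3) and changes to each of the three other nucleotides with probability 1/4 - 1/4 exp(-4t/3). The sketch below only evaluates these formulas; the branch-length unit is an assumption, and ELynx's own implementation is not reproduced here.

```haskell
-- Jukes-Cantor (JC) transition probability from one nucleotide to another
-- along a branch of length t (assumed unit: expected substitutions per site).
jcProbability :: Double -> Char -> Char -> Double
jcProbability t from to
  | from == to = 0.25 + 0.75 * e
  | otherwise  = 0.25 - 0.25 * e
  where
    e = exp (-4 * t / 3)
```

Each row of the resulting 4x4 matrix sums to one, and for long branches all probabilities approach the uniform stationary distribution 1/4.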
Sub-sample
Sub-sample columns from multi sequence alignments.
stack exec slynx -- sub-sample --help
ELynx Suite version 0.5.1.0.
Developed by Dominik Schrempf.
Compiled on June 12, 2021, at 14:54 pm, UTC.
Usage: slynx sub-sample (-a|--alphabet NAME) INPUT-FILE
(-n|--number-of-sites INT)
(-m|--number-of-alignments INT) [-S|--seed [INT]]
Sub-sample columns from multi sequence alignments.
Available options:
-h,--help Show this help text
-V,--version Show version
-a,--alphabet NAME Specify alphabet type NAME
INPUT-FILE Read sequences from INPUT-FILE
-n,--number-of-sites INT Number of sites randomly drawn with replacement
-m,--number-of-alignments INT
Number of multi sequence alignments to be created
-S,--seed [INT] Seed for random number generator; list of 32 bit
integers with up to 256 elements (default: random)
-h,--help Show this help text
Create a given number of multi sequence alignments, each of which contains a given number of random sites drawn from the original multi sequence alignment.
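Sub-sampling is a resampling of columns with replacement. The sketch below assumes a non-empty alignment given as equally long rows. Because base Haskell has no random number generator, it uses a toy linear congruential generator (Numerical Recipes constants) as a stand-in for the seeded generator behind -S/--seed; it is not a reproduction of ELynx's generator, and it has a small modulo bias.

```haskell
-- Toy linear congruential generator with modulus 2^32 (stand-in only).
lcg :: Integer -> Integer
lcg s = (1664525 * s + 1013904223) `mod` 4294967296

-- Create nAlignments alignments, each with nSites columns drawn with
-- replacement from the columns of rows. Assumes non-empty rows.
subSamples :: Integer -> Int -> Int -> [String] -> [[String]]
subSamples seed nSites nAlignments rows =
  [ [ [row !! i | i <- idxs] | row <- rows ]
  | idxs <- take nAlignments (chunksOf nSites draws)
  ]
  where
    len = case rows of
      (r : _) -> length r
      []      -> 1
    -- Infinite stream of column indices in [0, len).
    draws = [fromIntegral (s `mod` toInteger len) | s <- drop 1 (iterate lcg seed)]
    chunksOf k xs = let (a, b) = splitAt k xs in a : chunksOf k b
```

Drawing the same column index for every row keeps the columns intact, so each sub-sampled alignment consists of whole columns of the original.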
Translate
Translate sequences.
stack exec slynx -- translate --help
ELynx Suite version 0.5.1.0.
Developed by Dominik Schrempf.
Compiled on June 12, 2021, at 14:54 pm, UTC.
Usage: slynx translate (-a|--alphabet NAME) INPUT-FILE (-r|--reading-frame INT)
(-u|--universal-code CODE)
Translate from DNA to Protein or DNAX to ProteinX.
Available options:
-h,--help Show this help text
-V,--version Show version
-a,--alphabet NAME Specify alphabet type NAME
INPUT-FILE Read sequences from INPUT-FILE
-r,--reading-frame INT Reading frame [0|1|2].
-u,--universal-code CODE universal code; one of: Standard,
VertebrateMitochondrial.
-h,--help Show this help text
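Translation reads the DNA sequence codon by codon, starting at the given reading frame. The sketch below implements only the standard genetic code (NCBI translation table 1, encoded as the usual 64-character string with bases ordered T, C, A, G) and assumes that the reading frame is the number of leading nucleotides to skip; the vertebrate mitochondrial code, gaps, and lowercase input are not handled. The function names are hypothetical.

```haskell
import Data.List (elemIndex)

-- Standard genetic code; codon index = 16 * first + 4 * second + third,
-- with bases numbered T = 0, C = 1, A = 2, G = 3. '*' marks a stop codon.
standardCode :: String
standardCode = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

-- Translate one codon; Nothing if it contains a character other than ACGT.
translateCodon :: String -> Maybe Char
translateCodon [a, b, c] = do
  i <- elemIndex a "TCAG"
  j <- elemIndex b "TCAG"
  k <- elemIndex c "TCAG"
  pure (standardCode !! (16 * i + 4 * j + k))
translateCodon _ = Nothing

-- Translate in reading frame r (0, 1, or 2); an incomplete trailing codon
-- is dropped.
translateDNA :: Int -> String -> Maybe String
translateDNA r = mapM translateCodon . codons . drop r
  where
    codons xs = case splitAt 3 xs of
      (cod, rest) | length cod == 3 -> cod : codons rest
      _ -> []
```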
TLynx
Handle phylogenetic trees in Newick format.
stack exec tlynx -- --help | head -n -16
ELynx Suite version 0.5.1.0.
Developed by Dominik Schrempf.
Compiled on June 12, 2021, at 14:54 pm, UTC.
Usage: tlynx [-v|--verbosity VALUE] [-o|--output-file-basename NAME]
[-f|--force] [--no-elynx-file] COMMAND
Compare, examine, and simulate phylogenetic trees.
Available options:
-h,--help Show this help text
-V,--version Show version
-v,--verbosity VALUE Be verbose; one of: Quiet Warning Info Debug
(default: Info)
-o,--output-file-basename NAME
Specify base name of output file
-f,--force Ignore previous analysis and overwrite existing
output files.
--no-elynx-file Do not write data required to reproduce an analysis.
Available commands:
compare Compare two phylogenetic trees (compute distances and branch-wise differences).
connect Connect two phylogenetic trees in all ways (possibly honoring constraints).
distance Compute distances between many phylogenetic trees.
examine Compute summary statistics of phylogenetic trees.
shuffle Shuffle a phylogenetic tree (keep coalescent times, but shuffle topology and leaves).
simulate Simulate phylogenetic trees using a birth and death or coalescent process.
Available tree file formats:
- Newick Standard: Branch support values are stored in square brackets after branch lengths.
- Newick IqTree: Branch support values are stored as node names after the closing bracket of forests.
- Newick RevBayes: Key-value pairs is provided in square brackets after node names as well as branch lengths. XXX: Key value pairs are ignored at the moment.
Compare
Compute distances between phylogenetic trees.
stack exec tlynx -- compare --help
ELynx Suite version 0.5.1.0.
Developed by Dominik Schrempf.
Compiled on June 12, 2021, at 14:54 pm, UTC.
Usage: tlynx compare [-n|--normalize] [-b|--bipartitions] [-t|--intersect]
[-f|--newick-format FORMAT] NAMES
Compare two phylogenetic trees (compute distances and branch-wise differences).
Available options:
-h,--help Show this help text
-V,--version Show version
-n,--normalize Normalize trees before comparison
-b,--bipartitions Print and plot common and missing bipartitions
-t,--intersect Compare intersections; i.e., before comparison, drop
leaves that are not present in the other tree
-f,--newick-format FORMAT
Newick tree format: Standard, IqTree, or RevBayes;
default: Standard; for detailed help, see 'tlynx
--help'
NAMES Tree files
-h,--help Show this help text
Connect
Connect two phylogenetic trees in all ways (possibly honoring constraints).
stack exec tlynx -- connect --help
ELynx Suite version 0.5.1.0.
Developed by Dominik Schrempf.
Compiled on June 12, 2021, at 14:54 pm, UTC.
Usage: tlynx connect [-f|--newick-format FORMAT] [-c|--contraints CONSTRAINTS]
TREE-FILE-A TREE-FILE-B
Connect two phylogenetic trees in all ways (possibly honoring constraints).
Available options:
-h,--help Show this help text
-V,--version Show version
-f,--newick-format FORMAT
Newick tree format: Standard, IqTree, or RevBayes;
default: Standard; for detailed help, see 'tlynx
--help'
-c,--contraints CONSTRAINTS
File containing one or more Newick trees to be used
as constraints
TREE-FILE-A File containing the first Newick tree
TREE-FILE-B File containing the second Newick tree
-h,--help Show this help text
Distance
Compute distances between many phylogenetic trees.
stack exec tlynx -- distance --help
ELynx Suite version 0.5.1.0.
Developed by Dominik Schrempf.
Compiled on June 12, 2021, at 14:54 pm, UTC.
Usage: tlynx distance (-d|--distance MEASURE) [-n|--normalize] [-t|--intersect]
[-s|--summary-statistics]
[-m|--master-tree-file MASTER-TREE-File]
[-f|--newick-format FORMAT] [INPUT-FILES]
Compute distances between many phylogenetic trees.
Available options:
-h,--help Show this help text
-V,--version Show version
-d,--distance MEASURE Type of distance to calculate (available distance
measures are listed below)
-n,--normalize Normalize trees before distance calculation; only
affect distances depending on branch lengths
-t,--intersect Compare intersections; i.e., before comparison, drop
leaves that are not present in the other tree
-s,--summary-statistics Report summary statistics only
-m,--master-tree-file MASTER-TREE-File
Compare all trees to the tree in the master tree
file.
-f,--newick-format FORMAT
Newick tree format: Standard, IqTree, or RevBayes;
default: Standard; for detailed help, see 'tlynx
--help'
INPUT-FILES Read tree(s) from INPUT-FILES; if more files are
given, one tree is expected per file
-h,--help Show this help text
Distance measures:
symmetric Symmetric distance (Robinson-Foulds distance).
incompatible-split[VAL] Incompatible split distance. Collapse branches with (normalized)
support less than 0.0<=VAL<=1.0 before distance calculation;
if, let's say, VAL>0.7, only well supported differences contribute
to the total distance.
branch-score Branch score distance.
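The symmetric (Robinson-Foulds) distance counts the bipartitions (splits) that are present in exactly one of the two trees. The sketch below assumes that the splits of each tree have already been extracted, and stores each split as the side that does not contain the smallest leaf name, so that a split and its complement compare equal. The names are hypothetical, and ELynx may count or normalize the distance differently.

```haskell
import qualified Data.Set as S

-- A bipartition of the leaf set, represented by one of its two sides.
type Split = S.Set String

-- Canonical representation: the side NOT containing the smallest leaf name.
-- Assumes a non-empty leaf set.
normalizeSplit :: S.Set String -> S.Set String -> Split
normalizeSplit leaves side
  | S.member (S.findMin leaves) side = leaves `S.difference` side
  | otherwise = side

-- Number of splits present in exactly one of the two trees.
symmetricDistance :: S.Set Split -> S.Set Split -> Int
symmetricDistance a b = S.size (S.difference a b) + S.size (S.difference b a)
```

For example, ((A,B),C,(D,E)) and ((A,C),B,(D,E)) share the split {D,E} but differ in {A,B} versus {A,C}, giving a symmetric distance of 2.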
Examine
Compute summary statistics of phylogenetic trees.
stack exec tlynx -- examine --help
ELynx Suite version 0.5.1.0.
Developed by Dominik Schrempf.
Compiled on June 12, 2021, at 14:54 pm, UTC.
Usage: tlynx examine INPUT-FILE [-f|--newick-format FORMAT]
Compute summary statistics of phylogenetic trees.
Available options:
-h,--help Show this help text
-V,--version Show version
INPUT-FILE Read trees from INPUT-FILE
-f,--newick-format FORMAT
Newick tree format: Standard, IqTree, or RevBayes;
default: Standard; for detailed help, see 'tlynx
--help'
-h,--help Show this help text
Shuffle
Shuffle a phylogenetic tree (keep coalescent times, but shuffle topology and leaves).
stack exec tlynx -- shuffle --help
ELynx Suite version 0.5.1.0.
Developed by Dominik Schrempf.
Compiled on June 12, 2021, at 14:54 pm, UTC.
Usage: tlynx shuffle [-f|--newick-format FORMAT] [-n|--replicates N] TREE-FILE
[-S|--seed [INT]]
Shuffle a phylogenetic tree (keep coalescent times, but shuffle topology and leaves).
Available options:
-h,--help Show this help text
-V,--version Show version
-f,--newick-format FORMAT
Newick tree format: Standard, IqTree, or RevBayes;
default: Standard; for detailed help, see 'tlynx
--help'
-n,--replicates N Number of trees to generate
TREE-FILE File containing a Newick tree
-S,--seed [INT] Seed for random number generator; list of 32 bit
integers with up to 256 elements (default: random)
-h,--help Show this help text
Simulate
Simulate phylogenetic trees using birth and death or coalescent processes.
stack exec tlynx -- simulate --help
ELynx Suite version 0.5.1.0.
Developed by Dominik Schrempf.
Compiled on June 12, 2021, at 14:54 pm, UTC.
Usage: tlynx simulate (-t|--nTrees INT) (-n|--nLeaves INT) PROCESS
[-u|--sub-sample DOUBLE] [-s|--summary-statistics]
[-S|--seed [INT]]
Simulate phylogenetic trees using a birth and death or coalescent process.
Available options:
-h,--help Show this help text
-V,--version Show version
-t,--nTrees INT Number of trees
-n,--nLeaves INT Number of leaves per tree
-u,--sub-sample DOUBLE Perform sub-sampling; see below.
-s,--summary-statistics For each branch, print length and number of children
-S,--seed [INT] Seed for random number generator; list of 32 bit
integers with up to 256 elements (default: random)
-h,--help Show this help text
Available processes:
birthdeath Birth and death process
coalescent Coalescent process
See, for example, 'tlynx simulate birthdeath --help'.
Sub-sample with probability p:
1. Simulate one big tree with n'=round(n/p), n'>=n, leaves;
2. Randomly sample sub-trees with n leaves.
(With p=1.0, the same tree is reported over and over again.)
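Step 1 above determines the size of the big tree from the requested number of leaves n and the sampling probability p. A small sketch of that arithmetic; it uses Haskell's round, which rounds halves to the nearest even number, and whether ELynx rounds in exactly the same way is an assumption. Since p is at most 1 and n is an integer, the result is never smaller than n.

```haskell
-- Number of leaves of the big tree, n' = round(n / p), for p in (0, 1].
bigTreeSize :: Int -> Double -> Int
bigTreeSize n p
  | p <= 0 || p > 1 = error "sub-sampling probability must be in (0, 1]"
  | otherwise = round (fromIntegral n / p)
```

For example, n = 10 and p = 0.5 give a big tree with 20 leaves, from which sub-trees with 10 leaves are sampled.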
ELynx
Validate and (optionally) redo past ELynx analyses.
stack exec elynx -- --help | head -n -16
ELynx Suite version 0.5.1.0.
Developed by Dominik Schrempf.
Compiled on June 12, 2021, at 14:54 pm, UTC.
Usage: elynx COMMAND
Validate and redo past ELynx analyses
Available options:
-h,--help Show this help text
-V,--version Show version
Available commands:
validate Validate an ELynx analysis
redo Redo an ELynx analysis
Changes
Revision history for ELynx
Unreleased changes
Version 0.5.1.0
- elynx-tree: new functions isValidPath, isLeaf, depth; add conversion topology -> tree; various internal algorithmic improvements; improved error messages; simplified interface to Newick parsers; parallel fold map; Nix flake.
- Remove unneeded dependencies.
Version 0.5.0.2
- Speed up mixture model simulation.
- Improve rooting functions.
- Improve Topology data type (but still a lot to do).
- Various additions to the documentation.
- Rename Measurable to HasLength, Supported to HasSupport, and Named to HasName.
- Cabal and stack file changes.
Version 0.5.0.1
- modLen, modSup.
- Newtype wrappers for branch length, branch support, and node name. Those data types and some functions were also renamed.
- Add Path and getSubTreeUnsafe to the tree zipper.
- Rename unsafe functions so that unsafe is at the end.
- Many small changes.
Version 0.4.1
- Improve TimeSpec (Point process).
- Parallel evaluation strategies.
- Change names of some functions involving partitions. For example, mp was renamed to pt.
- Improve documentation for (bi)partitions.
- Bugfix tlynx compare; do not throw an error when branch support values are not set.
- Add --no-elynx-file option.
- Also parse Nexus files with tlynx commands.
- Bugfix subSample; the sub-sample was reversed.
Version 0.4.0
- Major refactor of elynx-tree. All required functions are now conveniently re-exported by ELynx.Tree.
Version 0.3.4
- Improve slynx examine; show Hamming distance; show constant sites.
- PhyloStrict -> PhyloExplicit; and some conversion functions were changed.
- tlynx coalesce was merged into tlynx simulate; the syntax has changed; see tlynx simulate --help.
Version 0.3.3
- Fix test suites.
Version 0.3.2
- Remove llvm dependency.
- Move away from hpack.
Version 0.3.1
- Use Attoparsec.
- Use ByteString consistently.
- Remove elynx-tools dependency from libraries.
Version 0.3.0
- elynx-nexus: library to import and export Nexus files.
- elynx-tree: major refactor and big cleanup; use rose trees with branch labels.
- elynx-tree: provide zippers.
Version 0.2.2
- Validation and repetition of previous analyses is finally possible with the new elynx binary.
- A library elynx-markov for running Markov processes along phylogenetic trees has been split off from elynx-seq. This library performs the computations when executing slynx simulate ...
- Many other small improvements.