gsc-weighting

Generic implementation of Gerstein/Sonnhammer/Chothia weighting.

Latest on Hackage: 0.2.2


BSD-3-Clause licensed by Felipe Almeida Lessa
Maintained by [email protected]

In their 1994 paper "Volume Changes in Protein Evolution", Gerstein, Sonnhammer and Chothia developed a weighting procedure for protein sequences that compensates for over-represented sequences. It is described in the paper's appendix, "A Method to Weight Protein Sequences to Correct for Unequal Representation". Although the method was developed for protein sequences, it extends readily to any set whose elements can be compared with a distance measure.

This package calculates GSC weights for any reasonable dendrogram. To recreate the original algorithm, use UPGMA as the linkage and residue identity as the distance function when creating the dendrogram.
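To make the weighting step concrete, here is a minimal, self-contained sketch of the GSC scheme as it is commonly described: each leaf starts with the length of the edge above it, and every edge higher up is shared among the leaves below it in proportion to the weights they have accumulated so far. This is an illustration under stated assumptions, not this package's API or source. The `Dendro` type is a local stand-in for a binary dendrogram with merge heights, the names `gscWeights` and `normalise` are hypothetical, and splitting an edge equally when all weights below it are zero is one plausible way to handle zero-distance branches, not necessarily the package's choice.

```haskell
-- Local stand-in for a binary dendrogram (an assumption for this sketch).
-- The Double stored in a Branch is the height (distance) at which its
-- two subtrees were merged; leaves sit at height 0.
data Dendro a = Leaf a | Branch Double (Dendro a) (Dendro a)
  deriving Show

-- Height of a node in the dendrogram.
height :: Dendro a -> Double
height (Leaf _)       = 0
height (Branch d _ _) = d

-- Unnormalised GSC-style weights, computed bottom-up.
gscWeights :: Dendro a -> [(a, Double)]
gscWeights (Leaf x)       = [(x, 0)]
gscWeights (Branch d l r) =
    share (d - height l) (gscWeights l) ++ share (d - height r) (gscWeights r)
  where
    -- Distribute the length of the edge above a subtree among its leaves.
    share edge ws
      | total > 0 = [ (x, w + edge * w / total) | (x, w) <- ws ]
      -- All weights below are zero (a single leaf, or a zero-distance
      -- branch): split the edge equally. Assumption of this sketch.
      | otherwise = [ (x, w + edge / n) | (x, w) <- ws ]
      where
        total = sum (map snd ws)
        n     = fromIntegral (length ws)

-- Optionally rescale the weights so that they sum to 1.
normalise :: [(a, Double)] -> [(a, Double)]
normalise ws
  | total > 0 = [ (x, w / total) | (x, w) <- ws ]
  | otherwise = ws
  where
    total = sum (map snd ws)

-- 'A' and 'B' merge at height 2, then join 'C' at height 4.
example :: Dendro Char
example = Branch 4 (Leaf 'C') (Branch 2 (Leaf 'A') (Leaf 'B'))

-- Same shape, but 'A' and 'B' are identical under the distance metric.
zeroExample :: Dendro Char
zeroExample = Branch 3 (Leaf 'C') (Branch 0 (Leaf 'A') (Leaf 'B'))
```

Loaded in GHCi, `gscWeights example` evaluates to `[('C',4.0),('A',3.0),('B',3.0)]`: the outlier 'C' gets the most weight, while the closely related pair shares the edge above it. The weights add up to the total branch length of the tree (10 here). For `zeroExample` the result is `[('C',3.0),('A',1.5),('B',1.5)]`, so the two identical elements together count as much as 'C' alone.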

Changes in version 0.2:

  • Updated to work with hierarchical-clustering 0.4.

Changes in version 0.1.1.1:

  • Use a stricter upper bound on hierarchical-clustering.

Changes in version 0.1.1:

  • Now works even when some (or all) branches have distance zero (i.e. the elements below those branches are all equal with respect to the distance metric that was used to create the dendrogram).