Online entropy-based model of lexical category acquisition. https://bitbucket.org/gchrupala/delta-h
Latest on Hackage: 0.0.3
Grzegorz Chrupala and Afra Alishahi
Install the Haskell Platform: http://hackage.haskell.org/platform/
On Linux, the following command installs the delta-h executable under the
current directory (the commands below invoke it as ./bin/delta-h):
> cabal install --prefix=`pwd`
The data directory contains an example input file, data/goat.txt.
The other files are CHILDES.
To induce a model (i.e. a set of clusters), execute the following:
> ./bin/delta-h learn '[-12,0,12]' data/goat.txt
The argument '[-12,0,12]' specifies the features to be used (in this case
preceding bigram, focus word, and following bigram). Feature ids can be
inspected in the source file src/Entropy/Features.hs.
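As a rough illustration of what such a feature specification selects, the
sketch below extracts a preceding bigram, the focus word, and a following
bigram for one token position. It is written in Python rather than Haskell and
is not taken from the package. The function names, the padding symbol, and the
reading of a bigram as the two words directly before or after the focus word
are assumptions; src/Entropy/Features.hs remains the authoritative definition.

```python
import ast

PAD = "<pad>"  # hypothetical padding symbol for positions outside the sentence


def parse_feature_spec(spec):
    """Parse a spec string such as '[-12,0,12]' into a list of integer ids."""
    ids = ast.literal_eval(spec)
    if not isinstance(ids, list) or not all(isinstance(i, int) for i in ids):
        raise ValueError("expected a list of integer feature ids: %r" % (spec,))
    return ids


def context_features(tokens, i, feature_ids):
    """Return a dict mapping each requested feature id to its value at position i.

    Only the three ids mentioned in the text are handled; any other id
    raises KeyError, since its meaning is not documented here.
    """
    def word(j):
        return tokens[j] if 0 <= j < len(tokens) else PAD

    # Assumed meaning of the three ids named in the text above.
    extractors = {
        -12: lambda: (word(i - 2), word(i - 1)),  # preceding bigram
        0: lambda: word(i),                       # focus word
        12: lambda: (word(i + 1), word(i + 2)),   # following bigram
    }
    return {fid: extractors[fid]() for fid in feature_ids}


tokens = "the goat eats the grass".split()
features = context_features(tokens, 2, parse_feature_spec("[-12,0,12]"))
print(features)
```

For the focus word "eats", this yields the preceding bigram ("the", "goat")
and the following bigram ("the", "grass").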
The model will be stored in data/goat.txt.[-12,0,12].learn.model
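Judging from this one example, the model path appears to be the input path
followed by the feature specification and the suffix .learn.model. The Python
sketch below reproduces that naming pattern, which can help when scripting
several runs. The helper name model_path is hypothetical, and the pattern is
inferred from the example rather than from the source code.

```python
import shlex


def model_path(input_path, feature_spec):
    """Build the model file name following the pattern seen in the example."""
    return "{}.{}.learn.model".format(input_path, feature_spec)


path = model_path("data/goat.txt", "[-12,0,12]")
print(path)               # data/goat.txt.[-12,0,12].learn.model
print(shlex.quote(path))  # the same path, quoted for safe use in a shell
```

The name contains square brackets, which shells treat as glob characters, so
quoting it on the command line is the safer choice.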
You can display the model in a human-readable format with:
> ./bin/delta-h display data/goat.txt.[-12,0,12].learn.model
The learned model can also be used to label input data, without
> ./bin/delta-h label True True data/goat.txt.[-12,0,12].learn.model < \
The first argument specifies whether to use focus word for labeling,
the second argument whether to avoid outputting new cluster ids (not
in the model).
There is also a command which tests the learned model on the word
> ./bin/delta-h eval-mrr True True data/goat.txt.[-12,0,12].learn.model < \
The first argument specifies whether to marginalize over all cluster
assignments, the second whether to output detailed information.
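The command name suggests that the reported score is mean reciprocal rank
(MRR). As a reminder of how that metric is computed in general, here is a
minimal Python sketch. It is not the package's implementation; the function
name, the input format, and the convention of scoring a missing target as 0
are assumptions made for illustration.

```python
def mean_reciprocal_rank(ranked_predictions, targets):
    """Average of 1/rank of the correct item over all test cases.

    ranked_predictions: one list of candidates per test case, best guess first.
    targets: the correct item for each test case.
    A target missing from its candidate list contributes 0 (an assumption).
    """
    if len(ranked_predictions) != len(targets):
        raise ValueError("need exactly one target per prediction list")
    if not targets:
        return 0.0
    total = 0.0
    for candidates, target in zip(ranked_predictions, targets):
        if target in candidates:
            # list.index is 0-based, ranks are 1-based
            total += 1.0 / (candidates.index(target) + 1)
    return total / len(targets)


predictions = [["goat", "cow"], ["cow", "goat"], ["dog"]]
gold = ["goat", "goat", "cat"]
print(mean_reciprocal_rank(predictions, gold))  # (1 + 1/2 + 0) / 3 = 0.5
```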
The semantic property prediction task can be run with the eval-sem command:
> ./bin/delta-h eval-sem False data/lexicon TRAIN.pos TRAIN.cluster \
The meaning of the arguments to this command:
False - do not produce verbose output
data/lexicon - semantic property lexicon file (generated from Wordnet)
TRAIN.pos - POS tagged train data
TRAIN.cluster - train data labeled with cluster IDs (use the label command to
TEST.pos - POS tagged test data
TEST.cluster - test data labeled with cluster IDs (use the label command to
There are some other (currently undocumented) commands; see src/Main.hs.
The main part of the model is implemented in src/Entropy/Algorithm.hs.