delta-h

Online entropy-based model of lexical category acquisition. https://bitbucket.org/gchrupala/delta-h

Latest on Hackage: 0.0.3


BSD3 licensed by Grzegorz Chrupala and Afra Alishahi
Maintained by pitekus@gmail.com
= DELTA-H

Online entropy-based model of lexical category acquisition.
Grzegorz Chrupala and Afra Alishahi
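The model's learning criterion is built around Shannon entropy. As a purely illustrative sketch (not the package's actual code; `entropy` and `labelEntropy` are hypothetical names), the quantity in question can be computed like this:

```haskell
import Data.List (group, sort)

-- | Shannon entropy (in bits) of a discrete probability distribution.
entropy :: [Double] -> Double
entropy ps = negate (sum [ p * logBase 2 p | p <- ps, p > 0 ])

-- | Entropy of the empirical distribution of a list of labels,
-- e.g. cluster assignments of word tokens.
labelEntropy :: Ord a => [a] -> Double
labelEntropy xs = entropy [ fromIntegral (length g) / n | g <- group (sort xs) ]
  where n = fromIntegral (length xs)

main :: IO ()
main = print (labelEntropy "aabb")  -- uniform over two labels: 1 bit
```

The actual criterion used by the package is implemented in src/Entropy/Algorithm.hs.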

= INSTALL

Install the Haskell Platform: http://hackage.haskell.org/platform/

On Linux, the following command will install the delta-h executable in the
bin directory:

cabal install --prefix=`pwd`

= USAGE

The data directory contains an example input file, data/goat.txt.
The other files there come from the CHILDES corpus.

To induce a model (i.e. a set of clusters), execute the following:

> ./bin/delta-h learn '[-12,0,12]' data/goat.txt

The argument '[-12,0,12]' specifies the features to be used (in this case
the preceding bigram, the focus word, and the following bigram). Feature IDs
can be inspected in the source file src/Entropy/Features.hs.
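To make the feature specification concrete, here is a rough, hypothetical sketch of what extracting a preceding bigram, focus word, and following bigram from a sentence could look like (the authoritative definitions are in src/Entropy/Features.hs; the `"<pad>"` boundary marker and the function name are assumptions for illustration):

```haskell
-- Hypothetical context-feature extraction for position i in a sentence:
-- ((w[i-2], w[i-1]), w[i], (w[i+1], w[i+2])), padded at the boundaries.
contextFeatures :: [String] -> Int -> ((String, String), String, (String, String))
contextFeatures ws i = ((at (i - 2), at (i - 1)), at i, (at (i + 1), at (i + 2)))
  where
    at j
      | j < 0 || j >= length ws = "<pad>"  -- boundary padding (assumed)
      | otherwise               = ws !! j

main :: IO ()
main = print (contextFeatures (words "the goat eats grass") 2)
```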

The model will be stored in data/goat.txt.[-12,0,12].learn.model

You can display the model in a human-readable format with:

> ./bin/delta-h display data/goat.txt.[-12,0,12].learn.model

The learned model can also be used to label input data, without
further learning:

> ./bin/delta-h label True True data/goat.txt.[-12,0,12].learn.model < \
data/goat.txt

The first argument specifies whether to use the focus word for labeling,
the second whether to avoid outputting new cluster IDs (i.e. IDs not
in the model).

There is also a command which tests the learned model on the word
prediction task:

> ./bin/delta-h eval-mrr True True data/goat.txt.[-12,0,12].learn.model < \
data/goat.txt

The first argument specifies whether to marginalize over all cluster
assignments, the second whether to output detailed information.
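The standard metric for this task is mean reciprocal rank: the average, over test cases, of 1/rank of the correct word in the model's ranked prediction list. A minimal illustrative sketch (not the package's code; function names are made up here):

```haskell
import Data.List (elemIndex)

-- | 1/rank of the gold item in a ranked candidate list (0 if absent).
reciprocalRank :: Eq a => a -> [a] -> Double
reciprocalRank gold ranked = case elemIndex gold ranked of
  Just i  -> 1 / fromIntegral (i + 1)  -- elemIndex is 0-based, ranks are 1-based
  Nothing -> 0

-- | Mean reciprocal rank over (gold, ranked predictions) pairs.
mrr :: Eq a => [(a, [a])] -> Double
mrr xs = sum [ reciprocalRank g r | (g, r) <- xs ] / fromIntegral (length xs)

main :: IO ()
main = print (mrr [("goat", ["the", "goat"]), ("eats", ["eats"])])
```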

The semantic property prediction task can be run with the eval-sem command:

> ./bin/delta-h eval-sem False data/lexicon TRAIN.pos TRAIN.cluster \
TEST.pos TEST.cluster

The meaning of the arguments to this command:

False         - do not produce verbose output
data/lexicon  - semantic property lexicon file (generated from WordNet)
TRAIN.pos     - POS-tagged training data
TRAIN.cluster - training data labeled with cluster IDs (use the label
                command to generate it)
TEST.pos      - POS-tagged test data
TEST.cluster  - test data labeled with cluster IDs (use the label
                command to generate it)

= SOURCES

There are some other (currently undocumented) commands; inspect src/Main.hs to find them.

The main part of the model is implemented in src/Entropy/Algorithm.hs.
