Gene Expression Programming evolutionary algorithm in Haskell http://github.com/mjsottile/hsgep/

Latest on Hackage:0.1.5

This package is not currently in any snapshots. If you're interested in using it, we recommend adding it to Stackage Nightly. Doing so will make builds more reliable, and allow stackage.org to host generated Haddocks.

BSD3 licensed by Matthew Sottile
Maintained by Matthew Sottile
= HSGEP: Gene Expression Programming in Haskell =
= Version 0.1.3 =
= Author: Matthew Sottile (mjsottile@computer.org) =

** This code is released under the BSD3 Open Source License **

0.0: Credits

- Matthew Sottile (mjsottile@computer.org)
- Dmitrij Naumov

1.0: Introduction

This package implements the Gene Expression Programming algorithm invented
by Candida Ferreira. See the following paper for a good, concise explanation
of the method:

Ferreira, C., 2001. Gene Expression Programming: A New Adaptive
Algorithm for Solving Problems. Complex Systems, Vol. 13, issue 2:87-129.

GEP is an evolutionary algorithm for solving optimization problems. The
introduction to the paper cited above provides a good explanation of what
differentiates GEP from GP and GAs.

2.0: Background

This project is an ancestor of an earlier effort to build a generic framework
for using GEP, originally in Java. The move to a functional language
(originally SML, now Haskell) was because:

- At its core, GEP is focused on manipulating symbolic sequences and
tree structures. List and user defined data types in Haskell are very
well suited to this.

- Functional languages naturally support functions being first class
citizens, being passed around as arguments to functions. While
this abstraction is possible using object interfaces and
hierarchies (which is precisely what the Java version used), it
felt more cumbersome to manage and code up.

- Pattern matching and strict type checking provide very strong
checks on the core of the library to ensure that some classes of
bugs are not present. For example, being able to guarantee that a pattern
match is exhaustive at compile time is preferable to potential runtime
errors that may result if such compilation time checks are not performed.

3.0: Usage

At this point, the best places to look for documentation on using the
library is:

- The Haddock documentation.
- Looking at examples.

The most mature example currently included in the released code is the
regression example. In this example, a set of data points are provided
and the optimization phase seeks to evolve a polynomial composed of
basic arithmetic (+-*/), sqrt, and exponentiation operations that best fits
the data points. The example provided is fairly simplistic, and fails to
include useful things like the ability to evolve constants as part of the
polynomials. In any case, it is sufficient to demonstrate the library.

To run a regression example, you can use example input parameter and data
files from the Examples/Regression directory. For example, after building
the code, you can run the example "test1" as:

./dist/build/HSGEP_Regression/HSGEP_Regression -i ./Examples/Regression/test1.in -f ./Examples/Regression/test1.csv

The current code will then evolve a solution that maximizes the fitness
function (goodness of fit to the given data points), and will print it out.
The example will also attempt to connect to a machine running a Maxima
server to perform polynomial simplification to turn the long string of
basic arithmetic operators into a more useful polynomial. This is likely
going to not work for most people, so either ignore the error, or tweak
the code to either disable it or run the Maxima server code somewhere.

4.0: FAQ

- Q: Is this library intended to be more than a toy?
A: Yes. It has been a testbed for me to get used to some of the
details related to releasing a properly packaged library to
the world on Hackage, so some of the core has been neglected while I
did things related to build, organization, and documentation.

- Q: What are the plans for near-term new versions?
A: Lots.

- If a plotting library is available (e.g.: Chart), produce plots of
fitness over time.
- Add more examples, such as the CA Density classification task.
- Performance improvements.
- Go parallel -- there are many opportunities for parallelism during
the run of the algorithm, so it would be worth taking advantage of
- Abstract out some of the patterns currently residing in each example,
such as the process of expressing indiduals as structures. The
ultimate goal is to make the end-user code that uses the library as
simple as possible, so absorbing as much of this into the core of the
library will be a good step in that direction.
comments powered byDisqus