MIT licensed by Tom Nielsen
Maintained by Marco Zocca <ocramz fripost org>
This version can be pinned in stack with: datasets-0.4.0@sha256:9bfd5b54c6c5e1e72384a890cf29bf85a02007e0a31c98753f7d225be3c7fa6a,4929

Module documentation for 0.4.0

  • Numeric
    • Numeric.Dataloader
    • Numeric.Datasets
      • Numeric.Datasets.Abalone
      • Numeric.Datasets.Adult
      • Numeric.Datasets.Anscombe
      • Numeric.Datasets.BostonHousing
      • Numeric.Datasets.BreastCancerWisconsin
      • Numeric.Datasets.CIFAR10
      • Numeric.Datasets.CO2
      • Numeric.Datasets.Car
      • Numeric.Datasets.Coal
      • Numeric.Datasets.Gapminder
      • Numeric.Datasets.Internal
        • Numeric.Datasets.Internal.Streaming
      • Numeric.Datasets.Iris
      • Numeric.Datasets.Michelson
      • Numeric.Datasets.Mushroom
      • Numeric.Datasets.Netflix
      • Numeric.Datasets.Nightingale
      • Numeric.Datasets.OldFaithful
      • Numeric.Datasets.Quakes
      • Numeric.Datasets.States
      • Numeric.Datasets.Sunspots
      • Numeric.Datasets.Titanic
      • Numeric.Datasets.UN
      • Numeric.Datasets.Vocabulary
      • Numeric.Datasets.Wine
      • Numeric.Datasets.WineQuality

Classical machine learning and statistics datasets from the UCI Machine Learning Repository and other sources.

The datasets package defines two different kinds of datasets:

  • small datasets which are embedded in the package as pure values, either directly in source or via `file-embed`, and therefore require no network or IO to obtain. These include Iris, Anscombe and OldFaithful.

  • other datasets which must be fetched over the network with Numeric.Datasets.getDataset; these are cached in a local temporary directory.

The datafiles/ directory of this package includes copies of a few famous datasets, such as Titanic, Nightingale and Michelson.

Example:

import Numeric.Datasets (getDataset)
import Numeric.Datasets.Iris (iris)
import Numeric.Datasets.Abalone (abalone)

main :: IO ()
main = do
  -- The Iris data set is embedded
  print (length iris)
  print (head iris)
  -- The Abalone dataset is fetched
  abas <- getDataset abalone
  print (length abas)
  print (head abas)

Changes

0.4

* Get rid of dependency on 'data-default' (introduced by previous versions of 'req')

* Bump 'req' dependency to 2.0.0

0.3

* 'datasets' hosted within the DataHaskell/dh-core project

* use 'req' for HTTP and HTTPS requests, instead of 'wreq'

* Mushroom and Titanic datasets

* Restructured top-level documentation

* Removed 'csvDatasetPreprocess' and added 'withPreprocess'. Bytestring preprocessing is now more compositional: 'withPreprocess' can be applied to JSON datasets as well as CSV ones.
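The change above can be illustrated with a short sketch. This assumes 'withPreprocess' wraps a dataset with a lazy-bytestring transformation applied before parsing; the 'dropComments' helper and its use with the Wine dataset are hypothetical here, so check the Numeric.Datasets haddocks for the exact signatures in your version.

```haskell
import Numeric.Datasets (getDataset, withPreprocess)
import Numeric.Datasets.Wine (wine)
import qualified Data.ByteString.Lazy.Char8 as BL

-- Hypothetical preprocessing step: drop leading '#' comment lines
-- from the raw bytes before the CSV parser sees them.
dropComments :: BL.ByteString -> BL.ByteString
dropComments = BL.unlines . filter (not . ("#" `BL.isPrefixOf`)) . BL.lines

main :: IO ()
main = do
  -- The same combinator composes with any dataset, CSV or JSON,
  -- because it transforms the raw bytes rather than parsed rows.
  ws <- getDataset (withPreprocess dropComments wine)
  print (length ws)
```

Because preprocessing is just a function on the raw bytes, several transformations can be chained with ordinary function composition before parsing.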

0.2.5

* Old Faithful matches R dataset

0.2.4

* Netflix dataset

0.2.3

* Coal dataset

* New internal API

* Ord instance for IrisClass

0.2.2

* Enum, Bounded instances for IrisClass

* Gapminder dataset

* Use wreq for HTTP and HTTPS requests

0.2.1

* Wine quality datasets

* Vocabulary, UN, States datasets

* CO2, Sunspots and Quakes datasets

0.2.0.3

* Further GHC portability

0.2.0.2

* Improve GHC portability

0.2.0.1

* Bugfix: include embedded data files in cabal extra-source-files

0.2

* iris dataset is a pure value (with file-embed)

* Michelson, Nightingale and BostonHousing datasets