MIT licensed by Tom Nielsen
Maintained by Marco Zocca <ocramz fripost org>
This version can be pinned in stack with: datasets-0.4.0@sha256:9bfd5b54c6c5e1e72384a890cf29bf85a02007e0a31c98753f7d225be3c7fa6a,4929

Module documentation for 0.4.0

  • Numeric
    • Numeric.Dataloader
    • Numeric.Datasets
      • Numeric.Datasets.Abalone
      • Numeric.Datasets.Adult
      • Numeric.Datasets.Anscombe
      • Numeric.Datasets.BostonHousing
      • Numeric.Datasets.BreastCancerWisconsin
      • Numeric.Datasets.CIFAR10
      • Numeric.Datasets.CO2
      • Numeric.Datasets.Car
      • Numeric.Datasets.Coal
      • Numeric.Datasets.Gapminder
      • Numeric.Datasets.Internal
        • Numeric.Datasets.Internal.Streaming
      • Numeric.Datasets.Iris
      • Numeric.Datasets.Michelson
      • Numeric.Datasets.Mushroom
      • Numeric.Datasets.Netflix
      • Numeric.Datasets.Nightingale
      • Numeric.Datasets.OldFaithful
      • Numeric.Datasets.Quakes
      • Numeric.Datasets.States
      • Numeric.Datasets.Sunspots
      • Numeric.Datasets.Titanic
      • Numeric.Datasets.UN
      • Numeric.Datasets.Vocabulary
      • Numeric.Datasets.Wine
      • Numeric.Datasets.WineQuality

Classical machine learning and statistics datasets from the UCI Machine Learning Repository and other sources.

The datasets package defines two different kinds of datasets:

  • small datasets which are embedded in the package as pure values, either directly in source or via `file-embed`, and therefore require no network or IO to obtain. These include Iris, Anscombe and OldFaithful.

  • other datasets which must be fetched over the network with Numeric.Datasets.getDataset; these are cached in a local temporary directory.

The datafiles/ directory of this package includes copies of a few famous datasets, such as Titanic, Nightingale and Michelson.

Example:

import Numeric.Datasets (getDataset)
import Numeric.Datasets.Iris (iris)
import Numeric.Datasets.Abalone (abalone)

main :: IO ()
main = do
  -- The Iris data set is embedded
  print (length iris)
  print (head iris)
  -- The Abalone dataset is fetched
  abas <- getDataset abalone
  print (length abas)
  print (head abas)

Changes

0.4

* Get rid of dependency on 'data-default' (introduced by previous versions of 'req')

* Bump 'req' dependency to 2.0.0

0.3

* 'datasets' hosted within the DataHaskell/dh-core project

* use 'req' for HTTP and HTTPS requests, instead of 'wreq'

* Mushroom and Titanic datasets

* Restructured top-level documentation

* Removed 'csvDatasetPreprocess' and added 'withPreprocess'. Bytestring preprocessing is now more compositional: 'withPreprocess' can be applied to JSON datasets as well as CSV ones.
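The change above can be illustrated with a short sketch. This assumes 'withPreprocess' wraps a dataset with a lazy-bytestring transformation applied before parsing; the 'dropComments' helper and its use with the Wine dataset are hypothetical here, so check the Numeric.Datasets haddocks for the exact signatures in your version.

```haskell
import Numeric.Datasets (getDataset, withPreprocess)
import Numeric.Datasets.Wine (wine)
import qualified Data.ByteString.Lazy.Char8 as BL

-- Hypothetical preprocessing step: drop leading '#' comment lines
-- from the raw bytes before the CSV parser sees them.
dropComments :: BL.ByteString -> BL.ByteString
dropComments = BL.unlines . filter (not . ("#" `BL.isPrefixOf`)) . BL.lines

main :: IO ()
main = do
  -- The same combinator composes with any dataset, CSV or JSON,
  -- because it transforms the raw bytes rather than parsed rows.
  ws <- getDataset (withPreprocess dropComments wine)
  print (length ws)
```

Because preprocessing is just a function on the raw bytes, several transformations can be chained with ordinary function composition before parsing.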

0.2.5

* Old Faithful matches R dataset

0.2.4

* Netflix dataset

0.2.3

* Coal dataset

* New internal API

* Ord instance for IrisClass

0.2.2

* Enum, Bounded instances for IrisClass

* Gapminder dataset

* Use wreq for HTTP and HTTPS requests

0.2.1

* Wine quality datasets

* Vocabulary, UN, States datasets

* CO2, Sunspots and Quakes datasets

0.2.0.3

* Further GHC portability

0.2.0.2

* Improve GHC portability

0.2.0.1

* Bugfix: include embedded data files in cabal extra-source-files

0.2

* iris dataset is a pure value (with file-embed)

* Michelson, Nightingale and BostonHousing datasets