datasets

Classical data sets for statistics and machine learning

https://github.com/diffusionkinetics/open/datasets

Version on this page: 0.2.5
LTS Haskell 20.26: 0.4.0
Stackage Nightly 2022-11-17: 0.4.0
Latest on Hackage: 0.4.0


MIT licensed and maintained by Tom Nielsen
This version can be pinned in stack with: datasets-0.2.5@sha256:3188930dae7df41c4632f95728536a2e9eac0aaf9ad3431a35c6ce3b994a03c0,4100

Module documentation for 0.2.5

  • Numeric
    • Numeric.Datasets
      • Numeric.Datasets.Abalone
      • Numeric.Datasets.Adult
      • Numeric.Datasets.Anscombe
      • Numeric.Datasets.BostonHousing
      • Numeric.Datasets.BreastCancerWisconsin
      • Numeric.Datasets.CO2
      • Numeric.Datasets.Car
      • Numeric.Datasets.Coal
      • Numeric.Datasets.Gapminder
      • Numeric.Datasets.Iris
      • Numeric.Datasets.Michelson
      • Numeric.Datasets.Netflix
      • Numeric.Datasets.Nightingale
      • Numeric.Datasets.OldFaithful
      • Numeric.Datasets.Quakes
      • Numeric.Datasets.States
      • Numeric.Datasets.Sunspots
      • Numeric.Datasets.UN
      • Numeric.Datasets.Vocabulary
      • Numeric.Datasets.Wine
      • Numeric.Datasets.WineQuality

Classical machine learning and statistics datasets from the UCI Machine Learning Repository and other sources.

The datasets package defines two different kinds of datasets:

  • small data sets that are embedded in the package as pure values, either directly or via `file-embed`, and need no network access or IO to load. These include Iris, Anscombe and OldFaithful.

  • other data sets, which must be fetched over the network with Numeric.Datasets.getDataset and are cached in a local temporary directory.

import Numeric.Datasets (getDataset)
import Numeric.Datasets.Iris (iris)
import Numeric.Datasets.Abalone (abalone)

main :: IO ()
main = do
  -- The Iris data set is embedded in the package, so it is a pure list
  print (length iris)
  print (head iris)
  -- The Abalone data set is fetched over the network by getDataset
  abas <- getDataset abalone
  print (length abas)
  print (head abas)
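
Because an embedded data set is an ordinary list, it can be summarised with plain list and map functions. The following is a minimal sketch rather than part of the package: it counts Iris rows per class, relying on the Ord, Enum and Bounded instances for IrisClass listed in the changelog. It assumes that the Iris record exposes a field named irisClass and that the containers package is available; classCounts is a hypothetical helper introduced here for illustration.

```haskell
import qualified Data.Map.Strict as Map
import Numeric.Datasets.Iris (Iris (..), IrisClass, iris)

-- Hypothetical helper: count rows per class.
-- Map.fromListWith needs the Ord instance for IrisClass.
-- Assumes the Iris record has a field called irisClass.
classCounts :: [Iris] -> Map.Map IrisClass Int
classCounts rows = Map.fromListWith (+) [(irisClass r, 1) | r <- rows]

main :: IO ()
main = do
  let counts = classCounts iris
  -- Enumerate every class through Enum and Bounded,
  -- so a class with no rows would still be printed with a count of 0.
  mapM_
    (\c -> putStrLn (show c ++ ": " ++ show (Map.findWithDefault 0 c counts)))
    ([minBound .. maxBound] :: [IrisClass])
```

The same approach should work for a fetched data set once getDataset has returned its list, provided the record type has a field with an Ord instance to group on.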

Changes

0.2.5

  • OldFaithful dataset now matches the R dataset

0.2.4

  • Netflix dataset

0.2.3

  • Coal dataset

  • New internal API

  • Ord instance for IrisClass

0.2.2

  • Enum and Bounded instances for IrisClass

  • Gapminder dataset

  • Use wreq for HTTP and HTTPS requests

0.2.1

  • Wine quality datasets

  • Vocabulary, UN, States datasets

  • CO2, Sunspots and Quakes datasets

0.2.0.3

  • Further GHC portability

0.2.0.2

  • Improve GHC portability

0.2.0.1

  • Bugfix: include embedded data files in cabal extra-source-files

0.2

  • Iris dataset is now a pure value (embedded with file-embed)

  • Michelson, Nightingale and BostonHousing datasets