tdigest

On-line accumulation of rank-based statistics

https://github.com/phadej/haskell-tdigest#readme

Version on this page: 0.2.1.1@rev:3
LTS Haskell 22.14: 0.3@rev:1
Stackage Nightly 2024-03-28: 0.3@rev:1
Latest on Hackage: 0.3@rev:1


BSD-3-Clause licensed and maintained by Oleg Grenrus
This version can be pinned in stack with: tdigest-0.2.1.1@sha256:1607bb1fb9a5b5d7284b6ce67edf2d40c6c3d7c874a563b30170c2331cdf6928,2855

tdigest

A new data structure for accurate on-line accumulation of rank-based statistics such as quantiles and trimmed means.

See the original paper: “Computing extremely accurate quantiles using t-digests” by Ted Dunning and Otmar Ertl.

Synopsis

λ *Data.TDigest > median (tdigest [1..1000] :: TDigest 3)
Just 499.0090729817737
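
The same query can be written as a small standalone program. This is a minimal sketch, assuming that Data.TDigest in this version exports quantile alongside tdigest and median, and that the DataKinds extension is enabled so the compression parameter can be written as a type-level number. The compression value 25 and the 0.99 quantile are arbitrary choices for illustration.

```haskell
{-# LANGUAGE DataKinds #-}
module Main (main) where

-- Assumption: these four names are exported by Data.TDigest in this version.
import Data.TDigest (TDigest, median, quantile, tdigest)

-- Build a digest from a list of Doubles.
-- The compression parameter (25) is an arbitrary choice for this sketch;
-- the synopsis above uses 3.
digest :: TDigest 25
digest = tdigest [1 .. 1000]

main :: IO ()
main = do
  -- Both queries return a Maybe, as the synopsis output shows for median.
  print (median digest)
  print (quantile 0.99 digest)
```

A larger compression parameter should keep more centroids and so give a closer estimate, at the cost of a bigger structure.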

Benchmarks

Using 50M exponentially distributed numbers:

  • average: 16s; not a valid approximation of the median, included mostly to measure PRNG speed
  • sorting using vector-algorithms: 33s; using 1000 MB of memory
  • sparking t-digest (using some par): 53s
  • buffered t-digest: 68s
  • sequential t-digest: 65s
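
The average is listed only as a baseline because, for an exponential distribution with rate λ, the mean is 1/λ while the median is ln 2 / λ, so the two differ by about 30%. The sketch below illustrates that gap using only base. It is not the benchmark code: it assumes rate 1, replaces the random generator with a deterministic inverse-CDF sample, and uses Data.List.sort in place of vector-algorithms. The names exponentialSample, average, and exactMedian are hypothetical helpers.

```haskell
module Main (main) where

import Data.List (sort)

-- Deterministic stand-in for exponentially distributed samples (rate 1):
-- apply the inverse CDF, -log (1 - u), to n evenly spaced points u in (0, 1).
exponentialSample :: Int -> [Double]
exponentialSample n =
  [ negate (log (1 - (fromIntegral i - 0.5) / fromIntegral n))
  | i <- [1 .. n]
  ]

-- Arithmetic mean; assumes a non-empty list.
average :: [Double] -> Double
average xs = sum xs / fromIntegral (length xs)

-- Exact median by sorting the whole input, which needs all data in memory.
exactMedian :: [Double] -> Double
exactMedian xs
  | null xs   = error "exactMedian: empty list"
  | even n    = (sorted !! (h - 1) + sorted !! h) / 2
  | otherwise = sorted !! h
  where
    sorted = sort xs
    n      = length xs
    h      = n `div` 2

main :: IO ()
main = do
  let xs = exponentialSample 1000
  putStrLn ("average: " ++ show (average xs))      -- close to 1
  putStrLn ("median:  " ++ show (exactMedian xs))  -- close to ln 2
  putStrLn ("ln 2:    " ++ show (log 2 :: Double))
```

Sorting gives the exact answer but has to hold every sample, which is the memory cost the benchmark notes; a t-digest trades some accuracy for a bounded summary.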

Example histogram

tdigest-simple -m tdigest -d standard -s 100000 -c 10 -o output.svg -i 34
cp output.svg example.svg
inkscape --export-png=example.png --export-dpi=80 --export-background-opacity=0 --without-gui example.svg


Changes

0.2.1.1

  • build-type: Simple

0.2.1

  • Add size, valid, validate, and debugPrint for NonEmpty #26

0.2

  • Add Data.TDigest.Vector module.

0.1

  • Add validateHistogram and debugPrint
  • Fix a pointy centroid bug.
  • Add Data.TDigest.NonEmpty module
  • Add mean, variance, stddev