On-line accumulation of rank-based statistics

Latest on Hackage:0.1

This package is not currently in any snapshots. If you're interested in using it, we recommend adding it to Stackage Nightly. Doing so will make builds more reliable, and allow to host generated Haddocks.

BSD3 licensed and maintained by Oleg Grenrus


A new data structure for accurate on-line accumulation of rank-based statistics such as quantiles and trimmed means.

See original paper: "Computing extremely accurate quantiles using t-digest" by Ted Dunning and Otmar Ertl


λ *Data.TDigest > median (tdigest [1..1000] :: TDigest 3)
Just 499.0090729817737


Using 50M exponentially distributed numbers:

  • average: 16s; incorrect approximation of median, mostly to measure prng speed
  • sorting using vector-algorithms: 33s; using 1000MB of memory
  • sparking t-digest (using some par): 53s
  • buffered t-digest: 68s
  • sequential t-digest: 65s

Example histogram

tdigest-simple -m tdigest -d standard -s 100000 -c 10 -o output.svg -i 34
cp output.svg example.svg
inkscape --export-png=example.png --export-dpi=80 --export-background-opacity=0 --without-gui example.svg




  • Add validateHistogram and debugPrint
  • Fix a pointy centroid bug.
  • Add Data.TDigest.NonEmpty module
  • Add mean, variance, stddev
comments powered byDisqus