# intset

Pure, mergeable, succinct Int sets. https://github.com/pxqr/intset

Latest on Hackage: 0.1.1.0


**Sam Truzjan**

### Synopsis

This package provides efficient integer interval sets.

### Description

Persistent... is it trees?

Yes, radix trees. The trees are balanced by prefix bits, so merge operations such as union, intersection and difference are fast. Chris Okasaki and Andrew Gill show that Patricia-tree-based integer maps can be an order of magnitude faster than their red-black tree counterparts on these operations. The same applies to integer sets: we just have keys without values.

What does "dense" mean?

That means we keep suffixes in bitmaps, so we can pack, say, 10 integers that lie close together into one bitmap. This optimization doesn't affect execution times for sparse sets, but it makes dense sets much more memory efficient: roughly 10-50 times less space, depending on the machine word size and the actual density of the set. Basically, this leaves us 3-4 times less memory efficient than arrays of tightly packed bits, but see...
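As a rough illustration of keeping suffixes in a bitmap, here is a minimal sketch. It assumes a 64-bit bitmap (so 6 suffix bits); the names `suffixBits`, `prefixOf`, `suffixOf`, `packSuffixes` and `memberBitmap` are made up for this example and are not the package's API.

```haskell
import Data.Bits (complement, setBit, shiftL, testBit, (.&.))
import Data.List (foldl')
import Data.Word (Word64)

-- Assumption: one bitmap is a 64-bit word, so the suffix is the low 6 bits.
suffixBits :: Int
suffixBits = 6

suffixMask :: Int
suffixMask = (1 `shiftL` suffixBits) - 1

-- | High bits of a key, shared by every key stored in the same bitmap.
prefixOf :: Int -> Int
prefixOf x = x .&. complement suffixMask

-- | Low bits of a key: its bit position inside the bitmap.
suffixOf :: Int -> Int
suffixOf x = x .&. suffixMask

-- | Pack keys that share one prefix into a single machine word.
packSuffixes :: [Int] -> Word64
packSuffixes = foldl' (\bm x -> setBit bm (suffixOf x)) 0

-- | Membership test for a key whose prefix is already known to match.
memberBitmap :: Int -> Word64 -> Bool
memberBitmap x bm = testBit bm (suffixOf x)
```

With these definitions the keys 64, 65 and 70 share the prefix 64 and occupy bits 0, 1 and 6 of one word, instead of three separate tree nodes.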

How is suffix compaction performed?

There is a pretty simple algorithm used in memory allocators,
called the "buddy memory allocator". In a nutshell, we have
a big block that is split in half when we remove an element from one
of the halves, and the halves are merged back when we insert. It's
somewhat the inverse of the ordinary tree approach: in a buddy tree we
hold more information about the elements it *doesn't* contain, while
in a prefix tree we hold more information about the elements it *does*
contain. It's easy to guess what we should do with this: take the two
structures and fuse them into one that performs *better*.
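The split-on-remove, merge-on-insert behaviour can be sketched with a toy buddy tree. This is only a model of the idea under simplifying assumptions (a single block of 2^k consecutive integers, offsets in the range 0 to 2^k - 1); the type `Block` and the functions `insertB`, `deleteB` and `memberB` are hypothetical and do not come from the package.

```haskell
import Data.Bits (testBit)

-- | Toy buddy tree over a block of 2^k consecutive integers.
data Block
  = None              -- no element of the block is present
  | All               -- every element of the block is present
  | Split Block Block -- lower half and upper half, which differ
  deriving (Eq, Show)

-- | Smart constructor: two uniform buddies collapse into one block.
split :: Block -> Block -> Block
split All  All  = All
split None None = None
split l    r    = Split l r

-- | View any block as its two halves.
halves :: Block -> (Block, Block)
halves None        = (None, None)
halves All         = (All, All)
halves (Split l r) = (l, r)

-- | Insert offset x into a block of size 2^k; buddies merge on the way up.
insertB :: Int -> Int -> Block -> Block
insertB _ _ All = All
insertB 0 _ _   = All
insertB k x b
  | testBit x (k - 1) = split l (insertB (k - 1) x r)
  | otherwise         = split (insertB (k - 1) x l) r
  where (l, r) = halves b

-- | Delete offset x from a block of size 2^k; a full block splits as needed.
deleteB :: Int -> Int -> Block -> Block
deleteB _ _ None = None
deleteB 0 _ _    = None
deleteB k x b
  | testBit x (k - 1) = split l (deleteB (k - 1) x r)
  | otherwise         = split (deleteB (k - 1) x l) r
  where (l, r) = halves b

-- | Membership: follow one bit of x per level, from the top bit down.
memberB :: Int -> Int -> Block -> Bool
memberB _ _ None = False
memberB _ _ All  = True
memberB k x (Split l r)
  | testBit x (k - 1) = memberB (k - 1) x r
  | otherwise         = memberB (k - 1) x l
```

Inserting every offset of a 16-element block (`foldr (insertB 4) None [0..15]`) collapses back to the single constructor `All`, and deleting one element from `All` splits only the blocks along the path to that element.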

Indeed, the key idea of the design is right here: we switch back and forth between representations on a per-subtree basis, interspersing different representations in different tree branches. It's like a chameleon:

- If some subset is *sparse*, we just keep a radix tree with bitmaps at the leaves.
- If some subset becomes *full*, we turn it into a block.
- If a buddy block appears, we join the buddy blocks into one. And so forth.
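To make the mixing of representations concrete, here is a hedged sketch of what such a hybrid tree could look like. The constructor names echo the `Bin`, `Tip` and `Fin` counters printed by `ppStats`, but the fields, the fixed 64-bit bitmap width and the function `memberH` are assumptions made for this example, not the package's actual definitions.

```haskell
import Data.Bits (bit, complement, testBit, (.&.), (.|.))
import Data.Word (Word64)

-- | Hypothetical hybrid set: each branch picks its own representation.
data HSet
  = Nil                 -- empty set
  | Tip Int Word64      -- sparse leaf: prefix and a bitmap of 64 suffixes
  | Fin Int Int         -- full block: start and size, all keys present
  | Bin Int HSet HSet   -- branch on one bit: left = bit clear, right = bit set

memberH :: Int -> HSet -> Bool
memberH _ Nil              = False
memberH x (Tip p bm)       = x .&. complement 63 == p && testBit bm (x .&. 63)
memberH x (Fin start size) = x >= start && x < start + size
memberH x (Bin b l r)
  | testBit x b = memberH x r
  | otherwise   = memberH x l

-- | The set [0..127] plus the keys 200 and 203: a dense branch stored as
-- one full block, and a sparse branch stored as one bitmap leaf.
example :: HSet
example = Bin 7 (Fin 0 128) (Tip 192 (bit 8 .|. bit 11))
```

Here 128 consecutive keys cost one `Fin` node rather than two bitmap leaves, while the two stray keys 200 and 203 still fit in a single `Tip`.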

That is, we get a structure that dynamically chooses the optimal
representation depending on the *density* of the set. Moreover, in the
best case this leads to huge space savings:

`> ppStats (fromList [0..123456])`

gives:

```
Bin count: 6
Tip count: 1
Fin count: 6
Size in bytes: 408
Saved space over dense set: 123072
Saved space over bytestring: 11879
Percent saved over dense set: 99.6695821185617%
Percent saved over bytestring: 96.67941727028567%
```

`ppStats` is not an exposed function, but you can play with it
using `cabal-dev ghci`.

I don't know if it is an old idea, but this works just fine.

So when is this data structure a good choice?

In many situations. It can be used as a persistent and compact replacement for bool arrays or Data.IntSet, with the following advantages:

- Purity is extremely useful in multithreaded settings: we can keep a set in a mutable transactional variable or an IORef and atomically update/modify the set. So it can be used as a replacement for TArray Int Bool as well.
- By merging intervals together we achieve compactness. In the best case some of the main operations take O(1) time and space, so if you need an interval set, it's here.
- Fast serialization: useful if you need conversion to/from bytestrings.
Because of the bitmaps, this conversion can be done
*extremely* fast.
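The first point, a pure set held in an `IORef` and updated atomically, might look like the sketch below. It uses `Data.IntSet` from the containers package as a stand-in, on the assumption that this package offers a similar `member`/`insert` interface; the helper `claim` is hypothetical.

```haskell
import Data.IORef (IORef, atomicModifyIORef', newIORef, readIORef)
import qualified Data.IntSet as IS -- containers, used here as a stand-in

-- | Atomically mark a slot as taken.
-- Returns True when the slot was free before the call.
claim :: IORef IS.IntSet -> Int -> IO Bool
claim ref x = atomicModifyIORef' ref $ \s ->
  if IS.member x s
    then (s, False)            -- already taken: leave the set unchanged
    else (IS.insert x s, True) -- free: store the updated set

-- | Create a shared, initially empty set of taken slots.
newSlots :: IO (IORef IS.IntSet)
newSlots = newIORef IS.empty

-- | Snapshot of the currently taken slots, in ascending order.
takenSlots :: IORef IS.IntSet -> IO [Int]
takenSlots ref = IS.toList <$> readIORef ref
```

Because the set is an immutable value, each update swaps in a whole new set in one atomic step, which is what makes this a workable substitute for a mutable array of flags.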

How does this implementation relate to the containers version?

It is heavily based on it. Essentially we just add the buddy interval compaction, but it turns out that some operations become more complicated and require much more effort to implement: in order to maintain all the tree invariants we need to take more cases into account. This is the reason why some operations are not implemented yet (e.g. views are missing), but I hope to fix that in time.

### Documentation

For documentation, see the Haddock-generated documentation.

### Maintainer

This library is written and maintained by Sam T.

Feel free to report bugs and suggestions via the GitHub issue tracker or by mail.

## Changes

* 0.1.1.0: Make Show instance compatible with containers package

2013-10-04 Sam Truzjan <pxqr.sta@gmail.com>

* 0.1.0.3: FIX: Add changelog to tarball.

2013-10-04 Sam Truzjan <pxqr.sta@gmail.com>

* 0.1.0.2: Move release notes to changelog file.

2013-08-12 Sam Truzjan <pxqr.sta@gmail.com>

* 0.1.0.1: Fix build failure on 32 bit arch.

2013-06-08 Sam Truzjan <pxqr.sta@gmail.com>

* 0.1.0.0: Initial version.