html-parse

A high-performance HTML tokenizer http://github.com/bgamari/html-parse

Latest on Hackage:0.2.0.0

This package is not currently in any snapshots. If you're interested in using it, we recommend adding it to Stackage Nightly. Doing so will make builds more reliable, and allow stackage.org to host generated Haddocks.

BSD3 licensed by Ben Gamari
Maintained by ben@smart-cactus.org

This package provides a fast and reasonably robust HTML5 tokenizer built upon the attoparsec library. The parsing strategy is based upon the HTML5 parsing specification with few deviations.

The package targets similar use-cases to the venerable tagsoup library, but is significantly more efficient, achieving parsing speeds of over 50 megabytes per second on modern hardware with and typical web documents.

Depends on:
comments powered byDisqus