Fast parsing and extracting information from (possibly malformed) HTML/XML documents https://github.com/vshabanov/fast-tagsoup
|Latest on Hackage:||1.0.14|
This package is not currently in any snapshots. If you're interested in using it, we recommend adding it to Stackage Nightly. Doing so will make builds more reliable, and allow stackage.org to host generated Haddocks.
Fast TagSoup parser. Speeds of 20-200MB/sec were observed.
Works only with strict bytestrings.
This library is intended to be used in conjunction with the original
import Text.HTML.TagSoup hiding (parseTags, renderTags) import Text.HTML.TagSoup.Fast
fast-tagsoup correctly handles HTML
<style> tags, converts tags to lower case and can decode non UTF-8 XML for you.
This parser is used in production in BazQux Reader feeds and comments crawler.