Parse HTML documents using xml-conduit datatypes.

Version on this page: 1.3.2
LTS Haskell 14.10:
Stackage Nightly 2019-09-21: 1.3.2
Latest on Hackage:

MIT licensed by Michael Snoyman

This version can be pinned in stack with: html-conduit-1.3.2@sha256:e24bf7bd12e41ed960566804989672eeed8027905f9da4260e60ad184897ce46,2102

This package uses tagstream-conduit for its parser. It automatically balances mismatched tags, so parsing should never fail. It does not repair a document into full HTML, for example by adding missing html and head tags. Since version 1.3.1, it uses an inlined copy of tagstream-conduit with entity-decoding bug fixes applied.
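To illustrate the tag balancing described above, here is a minimal sketch that parses a fragment with an unclosed tag using `parseLBS` from `Text.HTML.DOM`. The input markup and the helper name `fragmentText` are assumptions made for illustration; they do not come from the package documentation.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import qualified Data.ByteString.Lazy as L
import qualified Data.Text            as T
import qualified Data.Text.IO         as T
import           Text.HTML.DOM        (parseLBS)
import           Text.XML             (Document (..), Element (..), Name (..))
import           Text.XML.Cursor      (content, fromDocument, ($//))

-- | Hypothetical helper: collect every text node below the root element.
fragmentText :: L.ByteString -> [T.Text]
fragmentText html = fromDocument (parseLBS html) $// content

main :: IO ()
main = do
    -- The <b> tag is never closed in this input.
    let input = "<p>Hello <b>world</p>" :: L.ByteString
    -- parseLBS returns a Document directly; there is no error case to handle.
    T.putStrLn (nameLocalName (elementName (documentRoot (parseLBS input))))
    -- Print the text found anywhere below the root element.
    T.putStrLn (T.concat (fragmentText input))
```

Because `parseLBS` is a pure function returning a `Document`, malformed input shows up as a differently shaped tree rather than as a parse error.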

Simple usage example:

#!/usr/bin/env stack
{- stack --install-ghc --resolver lts-6.23 runghc
   --package http-conduit --package html-conduit
-}
{-# LANGUAGE OverloadedStrings #-}
import qualified Data.Text.IO        as T
import           Network.HTTP.Simple (httpSink)
import           Text.HTML.DOM       (sinkDoc)
import           Text.XML.Cursor     (attributeIs, content, element,
                                      fromDocument, ($//), (&/), (&//))

main :: IO ()
main = do
    -- The request URL is empty in this copy of the example; supply the page to fetch.
    doc <- httpSink "" $ const sinkDoc
    let cursor = fromDocument doc
    T.putStrLn "Chapters in the Yesod book:\n"
    mapM_ T.putStrLn
      $ cursor
      $// attributeIs "class" "main-listing"
      &// element "li"
      &/ element "a"
      &/ content
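The same cursor query can be tried without network access. The sketch below runs it against an in-memory document; the `sample` markup and the name `linkTitles` are hypothetical, chosen only to match the shape the query expects. The fragment is wrapped in html and body elements because `$//` searches the descendants of the root, not the root itself.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import qualified Data.ByteString.Lazy as L
import qualified Data.Text            as T
import qualified Data.Text.IO         as T
import           Text.HTML.DOM        (parseLBS)
import           Text.XML.Cursor      (attributeIs, content, element,
                                       fromDocument, ($//), (&/), (&//))

-- Hypothetical markup standing in for a fetched page.
sample :: L.ByteString
sample = mconcat
    [ "<html><body><div class=\"main-listing\"><ul>"
    , "<li><a href=\"/one\">Introduction</a></li>"
    , "<li><a href=\"/two\">Haskell</a></li>"
    , "</ul></div></body></html>"
    ]

-- | Text of every link inside a list item below the "main-listing" element.
linkTitles :: L.ByteString -> [T.Text]
linkTitles html =
    fromDocument (parseLBS html)
      $// attributeIs "class" "main-listing"
      &// element "li"
      &/ element "a"
      &/ content

main :: IO ()
main = mapM_ T.putStrLn (linkTitles sample)
```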



  • Fix a bug that was removing < symbols in script tags.

  • Inline tagstream-conduit to fix an entity-decoding bug in attribute values.

  • Upgrade to conduit 1.3

  • Remove an upper bound
  • Documentation improvement

  • Allow xml-conduit 1.4

  • Add strict and lazy text parsing #66

  • Drop system-filepath

  • Fix a bug with double-unescaping of entities
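
The changelog mentions strict and lazy text parsing and entity handling. As a minimal sketch, assuming the text entry refers to `parseLT` and `parseSTChunks` in `Text.HTML.DOM`, the following parses already-decoded text and prints the content of a fragment containing an entity. The input strings and the helper name `allText` are illustrative assumptions.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import qualified Data.Text       as T
import qualified Data.Text.IO    as T
import qualified Data.Text.Lazy  as TL
import           Text.HTML.DOM   (parseLT, parseSTChunks)
import           Text.XML        (Document)
import           Text.XML.Cursor (content, fromDocument, ($//))

-- | Hypothetical helper: concatenate every text node below the root element.
allText :: Document -> T.Text
allText doc = T.concat (fromDocument doc $// content)

main :: IO ()
main = do
    -- Lazy text input, given as a single value.
    let lazyDoc = parseLT ("<p>Fish &amp; chips</p>" :: TL.Text)
    -- Strict text input, given as a list of chunks.
        chunkDoc = parseSTChunks ["<p>Fish &amp; ", "chips</p>"]
    T.putStrLn (allText lazyDoc)
    T.putStrLn (allText chunkDoc)
```

These entry points skip the byte-decoding step, which may be convenient when the HTML is already held as `Text`.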