html-conduit

Parse HTML documents using xml-conduit datatypes.

Version on this page:	1.2.1.2
LTS Haskell 24.28:	1.3.2.2
Stackage Nightly 2026-01-17:	1.3.2.2
Latest on Hackage:	1.3.2.2

See all snapshots html-conduit appears in

MIT licensed by Michael Snoyman

Maintained by [email protected]

This version can be pinned in stack with:html-conduit-1.2.1.2@sha256:4b25a32f85f4e976750e3697c55a7d0eaef1caa5776bf9db8a6a6c410b9f6592,1892

Module documentation for 1.2.1.2

Text
- Text.HTML
  - Text.HTML.DOM

Depends on 11 packages(full list with versions):

base, bytestring, conduit, conduit-extra, containers, resourcet, tagstream-conduit, text, transformers, xml-conduit, xml-types

Used by 3 packages in nightly-2018-01-03(full list with versions):

stackage-curator, xml-html-qq, yesod-test

This package uses tagstream-conduit for its parser. It automatically balances mismatched tags, so that there shouldn’t be any parse failures. It does not handle a full HTML document rendering, such as adding missing html and head tags.

Simple usage example:

#!/usr/bin/env stack
{- stack --install-ghc --resolver lts-6.23 runghc
   --package http-conduit --package html-conduit
-}
{-# LANGUAGE OverloadedStrings #-}
import qualified Data.Text.IO        as T
import           Network.HTTP.Simple (httpSink)
import           Text.HTML.DOM       (sinkDoc)
import           Text.XML.Cursor     (attributeIs, content, element,
                                      fromDocument, ($//), (&/), (&//))

main :: IO ()
main = do
    doc <- httpSink "http://www.yesodweb.com/book" $ const sinkDoc
    let cursor = fromDocument doc
    T.putStrLn "Chapters in the Yesod book:\n"
    mapM_ T.putStrLn
      $ cursor
      $// attributeIs "class" "main-listing"
      &// element "li"
      &/ element "a"
      &/ content

Changes

1.2.1.2

Remove an upper bound
Doc improvement

1.2.1.1

Allow xml-conduit 1.4

1.2.1

Add strict and lazy text parsing #66

1.2.0

Drop system-filepath

1.1.1.2

Fix a bug with double-unescaping of entities