scalpel-core
A high level web scraping library for Haskell.
https://github.com/fimad/scalpel
| LTS Haskell 24.28: | 0.6.2.2 |
| Stackage Nightly 2026-01-18: | 0.6.2.2 |
| Latest on Hackage: | 0.6.2.2 |
Apache-2.0 licensed by Will Coster
Maintained by [email protected]
This version can be pinned in stack with:
scalpel-core-0.6.2.2@sha256:703de024e3c2abc90d1e86dec583456d60d7f594904abab4f923715ac20e56bd,2436
Module documentation for 0.6.2.2
- Text.HTML.Scalpel.Core
Depends on 13 packages (full list with versions).
Used by 1 package in nightly-2026-01-18 (full list with versions).
Scalpel Core
Scalpel core provides a subset of the scalpel web scraping library that is intended to have lightweight dependencies and to be free of all non-Haskell dependencies.
Notably, this package does not contain any networking support. Users who want a batteries-included solution should depend on scalpel, which does include networking support, instead of scalpel-core.
More thorough documentation including example code can be found in the documentation of the scalpel package.
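As a minimal usage sketch, assuming the `Text.HTML.Scalpel.Core` module and TagSoup's `parseTags` (the HTML snippet and scraper name are illustrative):

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Text.HTML.Scalpel.Core
import Text.HTML.TagSoup (parseTags)

-- Scrape the text of every <a> tag inside a <div class="links"> element.
linkTexts :: Scraper String [String]
linkTexts = chroot ("div" @: [hasClass "links"]) (texts "a")

main :: IO ()
main = print (scrape linkTexts (parseTags page))
  where
    page = "<div class=\"links\"><a>one</a><a>two</a></div>"
```

Because scalpel-core has no networking support, the tags are produced locally with `parseTags`; fetching the page is left to the caller (or to the full scalpel package).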
Changes
Change Log
HEAD
0.6.2.2
- Fix build breakage on GHC 9.8 / mtl 2.3
0.6.2.1
- Match Content-Type case-insensitively.
0.6.2
- Add the `ScraperT` monad transformer (see the sketch below).
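A minimal sketch of the transformer, assuming `scrapeT` runs a `ScraperT` over a base monad and that `lift` is available via `MonadTrans` (the scraper name and HTML are illustrative):

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Control.Monad.Trans.Class (lift)
import Text.HTML.Scalpel.Core
import Text.HTML.TagSoup (parseTags)

-- Print each heading as a side effect while scraping, then return them all.
loggedHeadings :: ScraperT String IO [String]
loggedHeadings = chroots "h2" $ do
    heading <- text anySelector
    lift (putStrLn ("found: " ++ heading))
    return heading

main :: IO ()
main = do
    result <- scrapeT loggedHeadings (parseTags "<h2>One</h2><h2>Two</h2>")
    print result
```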
0.6.1
- Support GHC 8.8.
0.6.0
Breaking Changes
- `anySelector` now captures text nodes. This causes different results when used with a plural scraper (e.g. `chroots`). Usage with a singular scraper (e.g. `chroot`) should be unaffected.
- The dependency on `curl` has been replaced with `http-client` and `http-client-tls`. This has the following observable changes:
  - `scrapeURLWithOpts` is removed.
  - The `Config` type used with `scrapeURLWithConfig` no longer contains a list of curl options. Instead it now takes a `Maybe Manager` from `http-client`.
  - The `Decoder` function type now takes in a `Response` type from `http-client`.
  - `scrapeURL` will now throw an exception if there is a problem connecting to a URL.
Other Changes
- Remove `Ord` constraint from public APIs.
- Add `atDepth` operator which allows for selecting nodes at a specified depth in relation to another node (#21); see the sketch after this list.
- Fix issue selecting malformed HTML where `"a" // "c"` would not match `<a><b><c></c></a></b>`.
- Add `textSelector` for selecting text nodes.
- Add `SerialScraper` type and associated primitives (#48).
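A sketch of the `atDepth` and `textSelector` additions, under the assumption that `atDepth` constrains a node's depth relative to the preceding selector in a `//` chain (the HTML is illustrative):

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Text.HTML.Scalpel.Core
import Text.HTML.TagSoup (parseTags)

page :: String
page = "<div>intro<a>one</a><span><a>two</a></span></div>"

-- Only <a> tags that are direct children of the <div>; the nested one is skipped.
directLinks :: Scraper String [String]
directLinks = texts ("div" // ("a" `atDepth` 1))

-- Text nodes inside the <div>, captured via textSelector.
textNodes :: Scraper String [String]
textNodes = chroot "div" (texts textSelector)

main :: IO ()
main = do
    print (scrape directLinks (parseTags page))
    print (scrape textNodes (parseTags page))
```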
0.5.1
- Fix bug (#59, #54) in DFS traversal order.
0.5.0
- Split `scalpel` into two packages: `scalpel` and `scalpel-core`. The latter does not provide networking support and does not depend on curl.
0.4.1
- Added `notP` attribute predicate.
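A hedged sketch of `notP` combined with `hasClass` (the class name and attribute values are made up for illustration):

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Text.HTML.Scalpel.Core
import Text.HTML.TagSoup (parseTags)

-- Collect the src of every <img> that does NOT carry the "ad" class.
nonAdImages :: Scraper String [String]
nonAdImages = attrs "src" ("img" @: [notP (hasClass "ad")])

main :: IO ()
main = print (scrape nonAdImages (parseTags
    "<img class=\"ad\" src=\"banner.png\"/><img src=\"photo.png\"/>"))
```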
0.4.0
- Add the `chroot` tricks (#23 and #25) to README.md and added examples.
- Fix backtracking that occurs when using `guard` and `chroot`.
- Fix bug where the same tag may appear in the result set multiple times.
- Performance optimizations when using the `(//)` operator.
- Make Scraper an instance of MonadFail. Practically this means that failed pattern matches in `<-` expressions within a do block will evaluate to `mzero` instead of throwing an error and bringing down the entire script.
- Pluralized scrapers will now return the empty list instead of `mzero` when there are no matches.
- Add the `position` scraper which provides the index of the current sub-tree within the context of a `chroots`'s do-block; see the sketch after this list.
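A sketch of `position` inside a `chroots` do-block, assuming it yields the zero-based index of the current sub-tree (the list markup is illustrative):

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Text.HTML.Scalpel.Core
import Text.HTML.TagSoup (parseTags)

-- Pair each <li> with its index among the matched <li> sub-trees.
numberedItems :: Scraper String [(Int, String)]
numberedItems = chroots "li" $ do
    index <- position
    item  <- text anySelector
    return (index, item)

main :: IO ()
main = print (scrape numberedItems
    (parseTags "<ul><li>a</li><li>b</li><li>c</li></ul>"))
```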
0.3.1
- Added the `innerHTML` and `innerHTMLs` scrapers.
- Added the `match` function which allows for the creation of arbitrary attribute predicates; see the sketch after this list.
- Fixed build breakage with GHC 8.0.1.
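A sketch of `match`, assuming it takes a predicate over an attribute's key and value (the id naming scheme is invented for illustration):

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.List (isPrefixOf)
import Text.HTML.Scalpel.Core
import Text.HTML.TagSoup (parseTags)

-- Keep only <div>s whose id starts with "comment".
commentDivs :: Scraper String [String]
commentDivs = htmls ("div" @: [match isCommentId])
  where
    isCommentId key value = key == "id" && "comment" `isPrefixOf` value

main :: IO ()
main = print (scrape commentDivs (parseTags
    "<div id=\"comment-1\">hi</div><div id=\"nav\">menu</div>"))
```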
0.3.0.1
- Make tag and attribute matching case-insensitive.
0.3.0
- Added benchmarks and many optimizations.
- The `select` method is removed from the public API.
- Many methods now have a constraint that the string type parametrizing TagSoup's tag type must be orderable.
- Added `scrapeUrlWithConfig` that will hopefully put an end to multiplying `scrapeUrlWith*` methods.
- The default behaviour of the `scrapeUrl*` methods is to attempt to infer the character encoding from the `Content-Type` header.
0.2.1.1
- Cleanup stale instance references in documentation of TagName and AttributeName.
0.2.1
- Made Scraper an instance of MonadPlus.
0.2.0.1
- Fixed examples in documentation and added an examples folder of ready-to-compile examples. Added Travis tests to ensure that examples remain compilable.
0.2.0
- Removed the StringLike parameter from the Selector, Selectable, AttributePredicate, AttributeName, and TagName types. Instead they are now agnostic to the underlying string type, and are only constructable with Strings and the Any type.
0.1.3.1
- Tighten dependencies and drop download-curl altogether.
0.1.3
- Add the html and htmls scraper primitives for extracting raw HTML.
0.1.2
- Make scrapeURL follow redirects by default.
- Expose a new function scrapeURLWithOpts that takes a list of curl options.
- Fix bug (#2) where image tags that do not have a trailing “/” are not selectable.
0.1.1
- Tighten dependencies on download-curl.
0.1.0
- First version!