pcre2

Regular expressions via the PCRE2 C library (included)

https://github.com/sjshuck/hs-pcre2#readme

Version on this page: 2.1.1.1
LTS Haskell 20.2: 2.1.1.1
Stackage Nightly 2022-11-28: 2.2.1
Latest on Hackage: 2.2.1


Apache-2.0 licensed by Shlomo Shuck and contributors
This version can be pinned in stack with: pcre2-2.1.1.1@sha256:bb9a011e3c10d81112174838ce756eedbf56bb7426c44239ec680868aa2766c5,6353


Regular expressions for Haskell.

Teasers

licensePlate :: Text -> Maybe Text
licensePlate = match "[A-Z]{3}[0-9]{3,4}"

licensePlates :: Text -> [Text]
licensePlates = match "[A-Z]{3}[0-9]{3,4}"

case "The quick brown fox" of
    [regex|\bbrown\s+(?<animal>[A-z]+)\b|] -> Text.putStrLn animal
    _                                      -> error "nothing brown"

let kv'd = lined . packed . [_regex|(?x)  # Extended PCRE2 syntax
        ^\s*          # Ignore leading whitespace
        ([^=:\s].*?)  # Capture the non-empty key
        \s*           # Ignore trailing whitespace
        [=:]          # Separator
        \s*           # Ignore leading whitespace
        (.*?)         # Capture the possibly-empty value
        \s*$          # Ignore trailing whitespace
    |]

forMOf kv'd file $ execStateT $ do
    k <- gets $ capture @1
    v <- gets $ capture @2
    liftIO $ Text.putStrLn $ "found " <> k <> " set to " <> v

    case myMap ^. at k of
        Just v' | v /= v' -> do
            liftIO $ Text.putStrLn $ "setting " <> k <> " to " <> v'
            _capture @2 .= v'
        _ -> liftIO $ Text.putStrLn "no change"

Features

  • No opaque “Regex” object. Instead, quiet functions with simple types—for the most part it’s Text (pattern) -> Text (subject) -> result. Use partial application to create performant, compile-once-match-many code.
  • No custom typeclasses.
  • A single datatype for both compile and match options, the Option monoid.
  • Text everywhere.
  • Match success expressed via Alternative.
  • Opt-in Template Haskell facilities for compile-time verification of patterns, indexing captures, and memoizing inline regexes.
  • Opt-in lens support.
  • No failure monads to express compile errors, preferring pure functions and throwing imprecise exceptions with pretty Show instances. Write simple code and debug it. Or, don’t, and use the Template Haskell features instead. Both are first-class.
  • Vast presentation of PCRE2 functionality. We can even register Haskell callbacks to run during matching!
  • Zero-copying of substrings where beneficial. Benchmarks show a 10× speedup over pcre-light, and 20× over regex-pcre, for longer captures.
  • Few dependencies.
  • Bundled, statically-linked UTF-16 build of up-to-date PCRE2 (version 10.40), with a complete, exposed Haskell binding.
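
As a rough illustration of the first few features, the sketch below partially applies `match` and `matchOpt` to a pattern and selects the `Alternative` container through the type signature alone. It assumes the `pcre2` and `text` packages are installed. The names `plate`, `plates`, and `todoLines`, the sample inputs, and the choice of the `Caseless` and `Multiline` options are illustrative and not taken from the README.

```haskell
{-# LANGUAGE OverloadedStrings #-}
module Main (main) where

import Data.Text (Text)
import Text.Regex.Pcre2 (Option (..), match, matchOpt)

-- Applied to the pattern only; the README recommends this style so the
-- compiled regex can be reused across subjects.
plate :: Text -> Maybe Text
plate = match "[A-Z]{3}[0-9]{3,4}"

-- Same pattern and same function, but the list instance of Alternative
-- collects every match instead of just the first.
plates :: Text -> [Text]
plates = match "[A-Z]{3}[0-9]{3,4}"

-- Options combine with (<>) because Option is a monoid.
-- (Illustrative choice of options: case-insensitive, line-anchored.)
todoLines :: Text -> [Text]
todoLines = matchOpt (Caseless <> Multiline) "^todo:.*$"

main :: IO ()
main = do
    let sighting = "Saw ABC123, then XYZ9876." :: Text
    print (plate sighting)
    print (plates sighting)
    print (todoLines "TODO: write docs\nnothing here\ntodo: add tests")
```

Only the type signature differs between `plate` and `plates`, which is the practical meaning of "match success expressed via Alternative".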

Wishlist

  • Many performance optimizations. Currently we are as much as 2–3× slower than other libraries for some operations, although things are improving. (We are already faster than regex-base/regex-pcre when working with Text, even without zero-copying.) If it’s really regex processing that’s causing a bottleneck, pcre-light/-heavy/lens-regex-pcre are recommended instead of this library for the very best performance.
  • Make use of DFA matching and JIT compilation.
  • Improve PCRE2 C compile time.
  • Add splitting support.

License

Apache 2.0.
PCRE2 is distributed under the 3-clause BSD license.

Main Author

©2020–2022 Shlomo Shuck

Changes

Changelog and Acknowledgements

2.1.1.1

  • Updated library, tests, and docs for mtl 2.3 and microlens-platform 0.4.3.0. The mtl part of this is pursuant to #30.

2.1.1

  • Added pattern serialization API, which fixes #23.
  • Updated PCRE2 to 10.40 (no API changes).

2.1.0.1

  • Explicitly required text < 2.
  • Minor docs adjustments.

2.1.0

  • Replaced Proxy :: Proxy info with type applications in splices from regex/_regex. This significantly shortens the splices, producing nicer error messages. As a very minor consequence, we now require the user to turn on {-# LANGUAGE TypeApplications #-} when using regex/_regex with patterns with parenthesized captures, even when not using capture/_capture.
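
To make the extension requirement concrete, here is a minimal sketch of a module that uses `regex` as an expression on a pattern with a parenthesized capture. It assumes that `regex` in expression position yields an `Alternative` of `Captures`, and that `capture @1` selects the first group, mirroring the teaser near the top of this page. Enabling `DataKinds` for the type-level literal is also an assumption. `keyOf` and the sample input are hypothetical.

```haskell
{-# LANGUAGE DataKinds         #-}
{-# LANGUAGE OverloadedStrings #-}
{-# LANGUAGE QuasiQuotes       #-}
{-# LANGUAGE TypeApplications  #-}
module Main (main) where

import Data.Text (Text)
import Text.Regex.Pcre2 (capture, regex)

-- Extract the key from a "key = value" or "key: value" line.
-- The pattern is checked at compile time by the quasi-quoter, and
-- `capture @1` picks out the first parenthesized group of the match.
keyOf :: Text -> Maybe Text
keyOf line = capture @1 <$> [regex|^\s*([^=:\s]+)\s*[=:]|] line

main :: IO ()
main = print (keyOf "  timeout = 30")
```

Without `TypeApplications` enabled, a module like this is expected to fail to compile as of 2.1.0, even if `capture` were not called directly.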

2.0.5

  • Enabled PCRE2’s built-in Unicode support, which fixes #21.

2.0.4

  • Added Show instance for Captures to ease debugging user code.

2.0.3

2.0.2

  • Fixed a minor issue where the caret indicating pattern location of a Pcre2CompileException was misplaced if the pattern contained a newline.
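
A small sketch of how such a compile error might surface at runtime. Only `match` is taken from this library. The malformed pattern is made up, and the exception is caught as a generic `SomeException` so that the example does not depend on the exact exception type the library exports. The message text itself is not reproduced here.

```haskell
{-# LANGUAGE OverloadedStrings #-}
module Main (main) where

import Control.Exception (SomeException, evaluate, try)
import Data.Text (Text)
import Text.Regex.Pcre2 (match)

-- Try a pattern against a fixed subject. Forcing the result with
-- `evaluate` makes any imprecise exception appear inside `try`.
checkPattern :: Text -> IO (Either String (Maybe Text))
checkPattern patt = do
    result <- try (evaluate (match patt "subject" :: Maybe Text))
    pure $ case result of
        Left e  -> Left (show (e :: SomeException))
        Right m -> Right m

main :: IO ()
main = do
    -- Unbalanced parenthesis: expected to be rejected by the PCRE2 compiler.
    checkPattern "sub(ject" >>= either putStrLn print
    -- Well-formed pattern: expected to match.
    checkPattern "sub(ject)" >>= either putStrLn print
```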

2.0.1

  • Added microlens as a dependency to improve Haddock docs (Traversal' et al. are clickable) and relieve maintenance burden somewhat.
  • Moderate refactoring of internals.

2.0.0

This release introduces significant breaking changes in order to make the API smaller, more consistent, and safer.

  • Implemented #18:
    • Removed matchAll, matchAllOpt, capturesAll, and capturesAllOpt.
    • Upgraded match, matchOpt, captures, and capturesOpt to offer the functionality of the removed functions, respectively.
    • Renamed capturesA and capturesAOpt to captures and capturesOpt, replacing the latter two functions altogether. captures/-Opt were intended to be extreme convenience functions that required no special datatypes beyond the Prelude. However, this was of doubtful benefit, since that’s false anyway—they required Text, not to mention {-# LANGUAGE OverloadedStrings #-}. Their names are simple and valuable, and no other Alternative-producing function has the naming convention “-A”, so repurposing their names was in order.
  • Moved the callout interface to a new module, Text.Regex.Pcre2.Unsafe. This includes the options UnsafeCompileRecGuard, UnsafeCallout, UnsafeSubCallout, and AutoCallout, and the types CalloutInfo, CalloutIndex, CalloutResult, SubCalloutInfo, and SubCalloutResult.
  • Also moved option BadEscapeIsLiteral there.
  • Removed the ineffectual options DupNames and Utf.
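
For code migrating from 1.x, the sketch below uses the merged function at two result types. It assumes that, after the rename described above, `captures` has the shape `Alternative f => Text -> Text -> f (NonEmpty Text)`. The helper names and the sample pattern are hypothetical.

```haskell
{-# LANGUAGE OverloadedStrings #-}
module Main (main) where

import Data.List.NonEmpty (NonEmpty)
import Data.Text (Text)
import Text.Regex.Pcre2 (captures)

-- At Maybe: only the first match, the role capturesA played before 2.0.0.
firstPair :: Text -> Maybe (NonEmpty Text)
firstPair = captures "(\\w+)=(\\w+)"

-- At list: every match, the role capturesAll played before 2.0.0.
allPairs :: Text -> [NonEmpty Text]
allPairs = captures "(\\w+)=(\\w+)"

main :: IO ()
main = do
    print (firstPair "a=1 b=2")
    print (allPairs "a=1 b=2")
```

The call sites differ only in their type annotations, so a migration mostly amounts to dropping the `-All` and `-A` suffixes and checking that each use has the intended container type.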

Other improvements with no API impact:

  • Updated PCRE2 to 10.37.
  • Replaced copied C files with symlinks, shrinking the codebase by 1.5K lines and simplifying future PCRE2 updates.
  • Reduced size of Template Haskell splices to make error messages less obnoxious.
  • Moderate refactoring of internals and documentation.

1.1.5

  • Fixed #17, where functions returning Alternative containers were not restricted to single results despite their documentation.
  • Minor improvements to docs and examples.

1.1.4

  • Fixed some incorrect foreign imports’ safety.

1.1.3.1

  • Fixed a very minor issue where pcreVersion still reported “10.35” even though it in fact was using 10.36.

1.1.3

  • Built an in-house streaming abstraction modeled on the streaming library and removed that library as a dependency.
  • Updated PCRE2 to 10.36 (no API changes).
  • Docs fixes.

1.1.2

  • Refactored using the streaming library. Fixed #11, where large global matches were very slow.

1.1.1

  • Fixed #12, where some functions returned too many match results.

1.1.0

  • Added global matching.
    • New functions matchAll, matchAllOpt, capturesAll, capturesAllOpt.
    • Changed all traversals from affine to non-affine.
  • Changed capturesOptA to capturesAOpt for naming consistency.

1.0.2

  • Fixed #4, where multiple named captures were not type-indexed correctly.
  • Established automated builds using GitHub Workflows. Thanks amesgen!

1.0.1.1

  • Temporarily eliminated all dependency version bounds to get the package building on Hackage.

1.0.1

  • Fixed #1, where building on Windows would succeed but not run. Thanks Andrew!
  • Tried adjusting dependency version bounds to get the package building on Hackage. Thanks snoyberg!

1.0.0

  • Initial release.