BSD-3-Clause licensed by Harendra Kumar
Maintained by [email protected]
This version can be pinned in stack with: unicode-transforms-0.3.4@sha256:951629204ae02fd47ff52487edef459071afb76b66a4a5e56151e4e6d2f62c5b,5462

Module documentation for 0.3.4

  • Data
    • Data.ByteString
      • Data.ByteString.UTF8
        • Data.ByteString.UTF8.Normalize
    • Data.Text
      • Data.Text.Normalize
    • Data.Unicode
      • Data.Unicode.Types

Unicode Transforms

Fast Unicode 9.0 normalization in Haskell (NFC, NFKC, NFD, NFKD).

What is normalization?

Unicode characters with diacritics (e.g. Á) can be represented in two different forms: as a single composed character (U+00C1 = Á) or as a sequence of decomposed characters (U+0041 (A) followed by U+0301 (combining acute accent) = Á). The two forms are encoded as different byte sequences, but to a human reader they look exactly the same.
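To make the two representations concrete, here is a minimal sketch that uses only `base` and does not touch this package. The helper `showCodePoint` and the names `composed` and `decomposed` are illustrative assumptions, not part of any library.

```haskell
import Data.Char (ord, toUpper)
import Numeric (showHex)

-- | Illustrative helper: render a character as a "U+XXXX" label.
showCodePoint :: Char -> String
showCodePoint c = "U+" ++ replicate (4 - length hex) '0' ++ hex
  where
    hex = map toUpper (showHex (ord c) "")

composed, decomposed :: String
composed   = "\x00C1"        -- one code point: A with acute
decomposed = "\x0041\x0301"  -- 'A' followed by a combining acute accent

main :: IO ()
main = do
  print (map showCodePoint composed)    -- ["U+00C1"]
  print (map showCodePoint decomposed)  -- ["U+0041","U+0301"]
  print (composed == decomposed)        -- False: different code point sequences
```

Both strings render as Á, yet a plain equality check treats them as different.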

A plain byte comparison may report that two strings are different even though they are canonically equivalent. To compare them for equivalence, we first need to convert both strings to a normalized form using the Unicode Character Database. For example:

>> :set -XOverloadedStrings
>> import Data.Text.Normalize
>> normalize NFC "\193" == normalize NFC "\65\769"
True
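The same comparison can be wrapped in a small compiled program. This is a sketch rather than an API of this package: it assumes `Data.Text.Normalize` exports `normalize` and the `NormalizationMode` constructors (as the session above suggests), and `canonicallyEqual` is a hypothetical helper defined here.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.Text (Text)
import qualified Data.Text as T
import Data.Text.Normalize (NormalizationMode (..), normalize)

-- | Hypothetical helper (not provided by the package): treat two texts
-- as equal when their NFC normal forms are equal.
canonicallyEqual :: Text -> Text -> Bool
canonicallyEqual a b = normalize NFC a == normalize NFC b

main :: IO ()
main = do
  let composed   = "\193"    :: Text  -- U+00C1
      decomposed = "\65\769" :: Text  -- U+0041 U+0301
  print (composed == decomposed)               -- False: raw contents differ
  print (canonicallyEqual composed decomposed) -- True
  print (T.length (normalize NFD composed))    -- 2: base letter + combining mark
  print (T.length (normalize NFC decomposed))  -- 1: single composed character
```

Normalizing both sides to the same form (NFC here) is what makes the comparison meaningful. NFD would work equally well for an equality check, as long as both inputs use the same mode.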

Contributing

Please use https://github.com/harendra-kumar/unicode-transforms to raise issues or send pull requests.

Changes

0.3.4

  • GHC 8.4.1 support

0.3.3

  • GHC 8.2.1 support

0.3.2

  • Work around a GHC/LLVM issue for ARM

0.3.1

  • Update dependency versions

0.3.0

  • Support Unicode version 9.0

0.2.1

  • Improve compilation speed and reduce resource usage during compilation

0.2.0

  • Support Unicode version 8.0
  • Switch to pure Haskell implementation

0.1.0.1

  • Initial release based on utf8proc C implementation