unicode-transforms

Unicode normalization

http://github.com/harendra-kumar/unicode-transforms

Version on this page:	0.3.3
LTS Haskell 23.21:	0.4.0.1@rev:7
Stackage Nightly 2025-05-07:	0.4.0.1@rev:7
Latest on Hackage:	0.4.0.1@rev:7


BSD-3-Clause licensed by Harendra Kumar

Maintained by [email protected]

This version can be pinned in stack with: unicode-transforms-0.3.3@sha256:e76f7027dfbbbf9f8658bd5b545249fbaa5e01e18675152b07617c0702759561,5709

Module documentation for 0.3.3

Data
- Data.ByteString
  - Data.ByteString.UTF8
    - Data.ByteString.UTF8.Normalize
- Data.Text
  - Data.Text.Normalize
- Data.Unicode
  - Data.Unicode.Types
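The `Data.ByteString.UTF8.Normalize` module normalizes UTF-8 encoded bytes directly. The sketch below assumes that this module exports `normalize :: NormalizationMode -> ByteString -> ByteString`, mirroring the `Data.Text.Normalize` version, and that `NormalizationMode` is defined in `Data.Unicode.Types`. `decomposedBytes` and `composedBytes` are illustrative names. Load it in GHCi to try it.

```haskell
import Data.ByteString (ByteString)
import qualified Data.ByteString as B
import qualified Data.ByteString.UTF8.Normalize as U
import Data.Unicode.Types (NormalizationMode (..))

-- UTF-8 bytes for "A" (0x41) followed by U+0301 COMBINING ACUTE ACCENT (0xCC 0x81).
decomposedBytes :: ByteString
decomposedBytes = B.pack [0x41, 0xCC, 0x81]

-- Assumption: U.normalize :: NormalizationMode -> ByteString -> ByteString.
-- NFC composes A + U+0301 into U+00C1, whose UTF-8 encoding is 0xC3 0x81.
composedBytes :: ByteString
composedBytes = U.normalize NFC decomposedBytes
```

Evaluating `B.unpack composedBytes` should give `[195,129]`, which is 0xC3 0x81.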

Depends on 4 packages:

base, bitarray, bytestring, text

Unicode Transforms

Fast Unicode 9.0 normalization in Haskell (NFC, NFKC, NFD, NFKD).
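The four modes differ in two ways. The D forms decompose and the C forms recompose, and the K forms additionally apply compatibility mappings. Below is a minimal sketch that runs one input through all four modes. It uses the `normalize` function and mode constructors from `Data.Text.Normalize`. The input mixes U+FB01 (LATIN SMALL LIGATURE FI), which has only a compatibility decomposition, with U+00C1 (Á), which has a canonical one. `sample`, `codePoints` and `results` are illustrative names. Load it in GHCi to try it.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.Char (ord)
import Data.Text (Text)
import qualified Data.Text as T
import Data.Text.Normalize (NormalizationMode (..), normalize)

-- "\xFB01" is LATIN SMALL LIGATURE FI (compatibility decomposition only);
-- "\x00C1" is the precomposed A with acute accent (canonical decomposition).
sample :: Text
sample = "\xFB01\x00C1"

-- Code points of a Text, for inspecting what normalization produced.
codePoints :: Text -> [Int]
codePoints = map ord . T.unpack

-- Normalize the same input under each mode, labelled by name.
results :: [(String, [Int])]
results =
  [ (name, codePoints (normalize mode sample))
  | (name, mode) <- [("NFD", NFD), ("NFC", NFC), ("NFKD", NFKD), ("NFKC", NFKC)]
  ]
```

The expected code points are as follows:

- NFD gives `[64257,65,769]`. The ligature is kept and Á is split into A and U+0301.
- NFC gives `[64257,193]`.
- NFKD gives `[102,105,65,769]`.
- NFKC gives `[102,105,193]`.

Only the K forms replace the ligature with a plain `f` and `i`.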

What is normalization?

Unicode characters with adornments (e.g. Á) can be represented in two different forms: as a single precomposed character (U+00C1 = Á) or as a sequence of decomposed characters (U+0041 "A" followed by U+0301 COMBINING ACUTE ACCENT = Á). The two forms are encoded as different byte sequences, but to a human reader they look exactly the same.
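The following sketch confirms that the two forms are different data. It uses only the `text` and `bytestring` packages, with no normalization involved. `composed`, `decomposed` and `utf8Bytes` are illustrative names. Load it in GHCi to try it.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import qualified Data.ByteString as B
import Data.Text (Text)
import qualified Data.Text as T
import Data.Text.Encoding (encodeUtf8)
import Data.Word (Word8)

-- U+00C1: a single precomposed character.
composed :: Text
composed = "\x00C1"

-- U+0041 followed by U+0301 COMBINING ACUTE ACCENT.
decomposed :: Text
decomposed = "A\x0301"

-- The UTF-8 bytes of a Text value.
utf8Bytes :: Text -> [Word8]
utf8Bytes = B.unpack . encodeUtf8
```

The expected results are as follows:

- `composed == decomposed` is `False`.
- `T.length` gives 1 and 2 respectively.
- `utf8Bytes` gives `[195,129]` (0xC3 0x81) for the composed form and `[65,204,129]` (0x41 0xCC 0x81) for the decomposed form.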

A plain byte comparison may report that two strings differ even though they are equivalent. To compare them for equivalence, we first convert both strings to the same normalized form using the Unicode Character Database. For example:

>> :set -XOverloadedStrings
>> import Data.Text.Normalize
>> normalize NFC "\193" == normalize NFC "\65\769"
True
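The comparison above can be wrapped in small helper functions. This sketch assumes the `normalize` function and mode constructors from `Data.Text.Normalize` shown above. `canonicallyEqual` and `compatiblyEqual` are hypothetical names and are not part of the package. Load it in GHCi to try it.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.Text (Text)
import Data.Text.Normalize (NormalizationMode (..), normalize)

-- Hypothetical helper: canonical equivalence, i.e. equal after NFD.
-- Comparing after NFC gives the same answer.
canonicallyEqual :: Text -> Text -> Bool
canonicallyEqual a b = normalize NFD a == normalize NFD b

-- Hypothetical helper: compatibility equivalence, i.e. equal after NFKD.
-- This is looser: it also treats the "fi" ligature (U+FB01) as equal to "fi".
compatiblyEqual :: Text -> Text -> Bool
compatiblyEqual a b = normalize NFKD a == normalize NFKD b
```

For example, `canonicallyEqual "\193" "\65\769"` is `True`. `canonicallyEqual "\xFB01" "fi"` is `False`, while `compatiblyEqual "\xFB01" "fi"` is `True`. These helpers normalize both sides on every call. If the same strings are compared often, it may be cheaper to normalize each string once when it is stored.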

Contributing

Please use https://github.com/harendra-kumar/unicode-transforms to raise issues or send pull requests.

Changes

0.3.3

GHC 8.2.1 support

0.3.2

Work around a GHC/LLVM issue for ARM

0.3.1

Update dependency versions

0.3.0

Support Unicode version 9.0

0.2.1

Improve compilation speed and reduce resource usage during compilation

0.2.0

Support Unicode version 8.0
Switch to pure Haskell implementation

0.1.0.1

Initial release based on the utf8proc C implementation