Fast Unicode 9.0 normalization in Haskell (NFC, NFKC, NFD, NFKD).
What is normalization?
Unicode characters with adornments (e.g. Á) can be represented in two different forms: as a single composed character (U+00C1 = Á) or as a sequence of decomposed characters (U+0041 (A) followed by U+0301 ( ́ ) = Á). The two forms are encoded as different byte sequences, but to a human reader they look exactly the same.
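To make the difference between the two forms concrete, here is a minimal sketch that prints the UTF-8 bytes of each. It assumes only the `text` and `bytestring` packages; the names `composed` and `decomposed` are illustrative and not part of this library.

```haskell
import qualified Data.ByteString as B
import qualified Data.Text as T
import qualified Data.Text.Encoding as TE

main :: IO ()
main = do
  let composed   = T.pack "\193"     -- U+00C1, a single code point
      decomposed = T.pack "\65\769"  -- U+0041 followed by U+0301
  -- The UTF-8 encodings of the two forms are different byte sequences.
  print (B.unpack (TE.encodeUtf8 composed))    -- [195,129]
  print (B.unpack (TE.encodeUtf8 decomposed))  -- [65,204,129]
  -- A plain comparison therefore sees two different strings.
  print (composed == decomposed)               -- False
```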
A plain byte comparison may report that two strings are different even
though they are equivalent. To compare two strings for equivalence, we
first need to convert both of them to the same normalized form, using
the Unicode Character Database. For example:
    >> :set -XOverloadedStrings
    >> import Data.Text.Normalize
    >> normalize NFC "\193" == normalize NFC "\65\769"
    True
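The same check can be written as a small standalone program. This is a sketch, assuming the `text` package alongside this library; `equivalent` is a hypothetical helper defined here for illustration, not something the library exports. It also contrasts the canonical forms (NFC, NFD) with a compatibility form (NFKC), using the "fi" ligature U+FB01 as an example.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.Text (Text)
import qualified Data.Text as T
import Data.Text.Normalize (NormalizationMode (..), normalize)

-- Hypothetical helper: compare two strings after normalizing both to NFC.
equivalent :: Text -> Text -> Bool
equivalent a b = normalize NFC a == normalize NFC b

main :: IO ()
main = do
  let composed   = "\193"    :: Text  -- U+00C1
      decomposed = "\65\769" :: Text  -- U+0041 U+0301
  print (equivalent composed decomposed)            -- True
  -- NFD splits the composed character; NFC recombines the sequence.
  print (T.length (normalize NFD composed))         -- 2
  print (T.length (normalize NFC decomposed))       -- 1
  -- Canonical forms keep the ligature U+FB01; compatibility forms expand it.
  print (normalize NFC  "\64257" == ("fi" :: Text)) -- False
  print (normalize NFKC "\64257" == ("fi" :: Text)) -- True
```

Which mode to use depends on the application: NFC and NFD preserve compatibility characters such as ligatures, while NFKC and NFKD fold them into their plain equivalents.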
Please use https://github.com/harendra-kumar/unicode-transforms to raise issues or send pull requests.
- GHC 8.2.1 support
- Work around a GHC/LLVM issue for ARM
- Update dependency versions
- Support Unicode version 9.0
- Reduce compile time and resource usage during compilation
- Support Unicode version 8.0
- Switch to pure Haskell implementation
- Initial release based on utf8proc C implementation