Module documentation for 18.104.22.168
Text: Fast, packed Unicode strings, using stream fusion
This package provides the Data.Text library, a library for the space- and time-efficient manipulation of Unicode text in Haskell.
Normalization, conversion, and collation, oh my!
This library intentionally provides a simple API based on the Haskell prelude’s list manipulation functions. For more complicated real-world tasks, such as Unicode normalization, conversion to and from a larger variety of encodings, and collation, use the text-icu package.
That library uses the well-respected and liberally licensed ICU library to provide these facilities.
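As a rough illustration of the Prelude-like API described above, the following sketch filters and joins words in the same way one would with ordinary lists. The helper name longWords and the sample sentence are hypothetical and chosen only for this example.

```haskell
import qualified Data.Text as T
import qualified Data.Text.IO as TIO

-- Keep only the words longer than n characters, written the same way
-- as the equivalent code over String would be with Prelude functions.
-- (longWords is an illustrative helper, not part of the library.)
longWords :: Int -> T.Text -> [T.Text]
longWords n = filter ((> n) . T.length) . T.words

main :: IO ()
main = do
  let sentence = T.pack "space and time efficient Unicode text"
  TIO.putStrLn (T.intercalate (T.pack ", ") (longWords 4 sentence))
  TIO.putStrLn (T.toUpper (T.take 5 sentence))
```

The module is normally imported qualified, since many of its names clash with the Prelude.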
Please report bugs via the GitHub issue tracker.
Master git repository:
git clone git://github.com/bos/text.git
There’s also a Mercurial mirror:
hg clone https://bitbucket.org/bos/text
(You can create and contribute changes using either Mercurial or git.)
The base code for this library was originally written by Tom Harper, based on the stream fusion framework developed by Roman Leshchinskiy, Duncan Coutts, and Don Stewart.
The core library was fleshed out, debugged, and tested by Bryan O’Sullivan, and he is the current maintainer.
The toTitle function now correctly handles letters that immediately follow punctuation. Before, "there's" would turn into "There'S". Now, it becomes "There's".
The implementation of unstreaming is faster, resulting in operations such as intersperse speeding up by up to 30%, with smaller code generated.
The optimised length comparison function is now more likely to be used after some rewrite rule tweaking.
Bug fix: an off-by-one bug in
Bug fix: a logic error in
The switch to integer-pure in 22.214.171.124 was apparently mistaken. The build flag has been renamed accordingly. Your army of diligent maintainers apologizes for the churn.
toCaseFold now follows the Unicode 8.0 spec (updated from 7.0)
An STG lint error has been fixed
The integer-simple package, upon which this package optionally depended, has been replaced with integer-pure. The build flag has been renamed accordingly.
Bug fix: For the Binary instance, if UTF-8 decoding fails during a get, the error is propagated via fail instead of an uncatchable crash.
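One way to observe the behaviour described in this fix is to decode with decodeOrFail from the binary package, which turns a fail inside get into a Left value. This is a minimal sketch: the wrapper decodeText is hypothetical, and the malformed payload relies on the assumption that a serialized Text shares its wire format with a serialized strict ByteString.

```haskell
import Data.Binary (decodeOrFail, encode)
import qualified Data.ByteString as B
import qualified Data.ByteString.Lazy as BL
import qualified Data.Text as T

-- Decode a serialized Text, reporting any failure as a Left value.
-- (decodeText is an illustrative wrapper, not part of either library.)
decodeText :: BL.ByteString -> Either String T.Text
decodeText bytes = case decodeOrFail bytes of
  Left (_, _, err)   -> Left err
  Right (_, _, text) -> Right text

main :: IO ()
main = do
  -- Round trip of a valid value.
  print (decodeText (encode (T.pack "hello")))
  -- The byte 0xFF never occurs in valid UTF-8, so this payload should be
  -- rejected. Assumption: the Text instance reads the same length-prefixed
  -- layout that encode produces for a strict ByteString.
  print (decodeText (encode (B.pack [0xFF])))
```

With a version of the package that predates this fix, the second call would be expected to raise an exception rather than return Left.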
New instances for the
- Bug fix: As it turns out, moving the literal rewrite rules to simplifier phase 2 does not prevent competition with the unpack rule, which is also active in this phase. Unfortunately this was hidden due to a silly test environment mistake. Moving literal rules back to phase 1 finally fixes GHC Trac #10528 correctly.
- Bug fix: Run literal rewrite rules in simplifier phase 2. The behavior of the simplifier changed in GHC 7.10.2, causing these rules to fail to fire, leading to poor code generation and long compilation times. See GHC Trac #10528.
- Expose unpackCString#, which you should never use.
- Added Binary instances for both Text types. (If you have previously been using the text-binary package to get a Binary instance, it is now obsolete.)
- Fixed a space leak in UTF-8 decoding
Feature parity: repeat, cycle, iterate are now implemented for lazy Text, and the Data instance is more complete
Build speed: an inliner space explosion has been fixed with toCaseFold
Bug fix: encoding Int to a Builder would infinite-loop if the integer-simple package was used
Deprecation: OnEncodeError and EncodeError are deprecated, as they are never used
Internals: some types that are used internally in fusion-related functions have moved around, been renamed, or been deleted (we don’t bump the major version if .Internal modules change)
Spec compliance: toCaseFold now follows the Unicode 7.0 spec (updated from 6.3)
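The lazy repeat, cycle, and iterate functions mentioned above produce infinite lazy Text values, so they are normally combined with take. A small sketch follows; the binding names are hypothetical.

```haskell
import qualified Data.Text.Lazy as TL
import qualified Data.Text.Lazy.IO as TLIO

-- Each definition takes a finite prefix of an infinite lazy Text.
-- (The names below are illustrative only.)
dashes :: TL.Text
dashes = TL.take 5 (TL.repeat '-')

stripes :: TL.Text
stripes = TL.take 7 (TL.cycle (TL.pack "ab"))

letters :: TL.Text
letters = TL.take 5 (TL.iterate succ 'a')

main :: IO ()
main = mapM_ TLIO.putStrLn [dashes, stripes, letters]
```

Note that take in Data.Text.Lazy counts with Int64 rather than Int; the literals above are resolved to that type automatically.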
- Fixed an incompatibility with base < 4.5
- Update formatRealFloat to correspond to the definition in versions of base newer than 4.5 (https://github.com/bos/text/issues/105)
- Bumped lower bound on deepseq to 1.4 for compatibility with the upcoming GHC 7.10
- Fixed a buffer overflow in rendering of large Integers (https://github.com/bos/text/issues/99)
Fixed an integer overflow in the replace function (https://github.com/bos/text/issues/81)
Fixed a hang in lazy decodeUtf8With (https://github.com/bos/text/issues/87)
Reduced codegen bloat caused by use of empty and single-character literals
Added an instance of IsList for GHC 7.8 and above
The Data.Data instance now allows gunfold to work, via a virtual pack constructor
dropEnd, takeEnd: new functions
Comparing the length of a Text against a number can now short-circuit in more cases
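A brief sketch of the new dropEnd and takeEnd functions, together with compareLength, which is the explicit way to compare a Text's length against a number without necessarily traversing the whole value. The helpers fileParts and longerThan are hypothetical, and the four-character extension is an assumption made to keep the example short.

```haskell
import qualified Data.Text as T

-- Split a file name into its base and a fixed-width, four-character
-- extension such as ".txt". (Illustrative helper; assumes that width.)
fileParts :: T.Text -> (T.Text, T.Text)
fileParts name = (T.dropEnd 4 name, T.takeEnd 4 name)

-- True when the text has more than n characters. compareLength is
-- documented as able to stop early instead of measuring the full length.
longerThan :: Int -> T.Text -> Bool
longerThan n t = T.compareLength t n == GT

main :: IO ()
main = do
  print (fileParts (T.pack "notes.txt"))
  print (longerThan 3 (T.pack "abcd"))
```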
- streamDecodeUtf8: fixed gh-70, did not return all unconsumed bytes in single-byte chunks
encodeUtf8: Performance is improved by up to 4x.
encodeUtf8Builder, encodeUtf8BuilderEscaped: new functions, available only if bytestring >= 0.10.4.0 is installed, that allow very fast and flexible encoding of a Text value to a bytestring Builder.
As an example of the performance gain to be had, the encodeUtf8BuilderEscaped function helps to double the speed of JSON encoding in the latest version of aeson! (Note: if all you need is a plain ByteString, encodeUtf8 is still the faster way to go.)
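To make the Builder-based encoding concrete, here is a minimal sketch that renders key/value pairs directly into a bytestring Builder. The function renderPairs and the "key=value" output format are hypothetical, and the code assumes a bytestring version new enough to provide encodeUtf8Builder, as noted above.

```haskell
import qualified Data.ByteString.Builder as BB
import qualified Data.ByteString.Lazy as BL
import qualified Data.Text as T
import qualified Data.Text.Encoding as TE

-- Render pairs as "key=value" lines, appending each Text straight into
-- the Builder rather than encoding to an intermediate ByteString first.
-- (renderPairs is an illustrative helper, not part of the library.)
renderPairs :: [(T.Text, T.Text)] -> BL.ByteString
renderPairs = BB.toLazyByteString . foldMap line
  where
    line (k, v) =
      TE.encodeUtf8Builder k <> BB.char7 '='
        <> TE.encodeUtf8Builder v <> BB.char7 '\n'

main :: IO ()
main = BL.putStr (renderPairs [(T.pack "lang", T.pack "Haskell"), (T.pack "pkg", T.pack "text")])
```

A Builder pays off when many small pieces are concatenated; for a single value, encodeUtf8 remains the simpler choice, as the note above says.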
All of the internal module hierarchy is now publicly exposed. If a module is in the .Internal hierarchy, or is documented as internal, use at your own risk - there are no API stability guarantees for internal modules!
- decodeUtf8: Fixed a regression that caused us to incorrectly identify truncated UTF-8 as valid (gh-61)
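The kind of input affected by this regression can be checked with decodeUtf8', which returns a Left value instead of throwing on invalid input. In this sketch the helper isValidUtf8 is hypothetical, and the truncated input is made by dropping the last byte of a three-byte code point.

```haskell
import qualified Data.ByteString as B
import qualified Data.Text as T
import qualified Data.Text.Encoding as TE

-- True if the bytes form complete, valid UTF-8.
-- (isValidUtf8 is an illustrative helper, not part of the library.)
isValidUtf8 :: B.ByteString -> Bool
isValidUtf8 = either (const False) (const True) . TE.decodeUtf8'

main :: IO ()
main = do
  -- U+20AC (euro sign) encodes to three bytes in UTF-8.
  let euro = TE.encodeUtf8 (T.pack "\8364")
  print (isValidUtf8 euro)
  -- Dropping the final byte leaves a truncated sequence.
  print (isValidUtf8 (B.init euro))
```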
Added support for Unicode 6.3.0 to case conversion functions
New function toTitle converts words in a string to title case
New functions peekCStringLen and withCStringLen simplify interoperability with C functions
Added support for decoding UTF-8 in stream-friendly fashion
Fixed a bug in mapAccumL
Added trusted Haskell support
Removed support for GHC 6.10 (released in 2008) and older
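The stream-friendly UTF-8 decoding listed above is exposed as streamDecodeUtf8 in Data.Text.Encoding. The sketch below feeds it a list of chunks and carries an incomplete code point across a chunk boundary. The driver decodeChunks is hypothetical, and invalid (as opposed to merely incomplete) input would still raise an exception here.

```haskell
import qualified Data.ByteString as B
import qualified Data.Text as T
import qualified Data.Text.Encoding as TE

-- Decode a list of chunks, returning the decoded text and any bytes
-- left over at the end that do not yet form a complete code point.
-- (decodeChunks is an illustrative driver, not part of the library.)
decodeChunks :: [B.ByteString] -> (T.Text, B.ByteString)
decodeChunks = go TE.streamDecodeUtf8 []
  where
    go _ acc [] = (T.concat (reverse acc), B.empty)
    go step acc (c : cs) = case step c of
      TE.Some t leftover next
        | null cs   -> (T.concat (reverse (t : acc)), leftover)
        | otherwise -> go next (t : acc) cs

main :: IO ()
main = do
  -- The three bytes of U+20AC arrive split across two chunks.
  let (text, rest) = decodeChunks [B.pack [0xE2], B.pack [0x82, 0xAC]]
  print (T.length text, B.length rest)
```

Each Some value carries the text decoded so far, the undecoded tail, and a continuation that accepts the next chunk, so the caller never has to splice leftover bytes by hand.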