string-interpolate

Haskell string/text/bytestring interpolation that just works

https://gitlab.com/williamyaoh/string-interpolate/blob/master/README.md

Version on this page:0.3.1.0
LTS Haskell 22.14:0.3.3.0
Stackage Nightly 2024-03-28:0.3.3.0
Latest on Hackage:0.3.3.0

See all snapshots string-interpolate appears in

BSD-3-Clause licensed by William Yao
Maintained by [email protected]
This version can be pinned in stack with:string-interpolate-0.3.1.0@sha256:7641c72fe2e9be59b5bdebf5d97fa91305bf63f46c4ff948b58724c103aa4f75,4122

string-interpolate pipeline status hackage version license

Haskell having 5 different textual types in common use (String, strict and lazy Text, strict and lazy ByteString) means that doing any kind of string manipulation becomes a complicated game of type tetris with constant conversion back and forth. What if string handling was as simple and easy as it is in literally any other language?

Behold:

showWelcomeMessage :: Text -> Integer -> Text
showWelcomeMessage username visits =
  [i|Welcome to my website, #{username}! You are visitor #{visits}!|]

No more needing to mconcat, mappend, and (<>) to glue strings together. No more having to remember a gajillion different functions for converting between strict and lazy versions of Text, or having to worry about encoding between Text <=> ByteString. No more getting bitten by trying to work with Unicode ByteStrings. It just works!

string-interpolate provides a quasiquoter, i, that allows you to interpolate expressions directly into your string. It can produce anything that is an instance of IsString, and can interpolate anything which is an instance of Show.

In addition to the main quasiquoter i, there are two additional quasiquoters for handling multiline strings. If you need to remove extra whitespace and collapse into a single line, use iii. If you need to remove extra indentation but keep linebreaks, use __i.

If you need even more specific functionality in how you handle whitespace, there are variants of __i and iii with different behavior for surrounding newlines. These are suffixed by either 'E or 'L depending on what behavior you need. For instance, __i'E will remove extra indentation from its body, but will leave any surrounding newlines intact. iii'L will collapse its body into a single line, and collapse any surrounding newlines at the beginning/end into a single newline.

Unicode handling

string-interpolate handles converting to/from Unicode when converting String/Text to ByteString and vice versa. Lots of libraries use ByteString to represent human-readable text, even though this is not safe. There are lots of useful libraries in the ecosystem that are unfortunately annoying to work with because of the need to generate ByteStrings containing application-specific info. Insisting on explicitly converting to/from UTF-8 in these cases and handling decoding failures adds lots of syntactic noise, when often you can reasonably assume that a given ByteString will, 95% of the time, contain Unicode text. So string-interpolate aims to provide reasonable defaults around conversion between ByteString and real textual types so that developers don’t need to constantly be aware of text encodings.

When converting a String/Text to a ByteString, string-interpolate will automatically encode it as a sequence of UTF-8 bytes. When converting a ByteString to String/Text, string-interpolate will assume that the ByteString contains a UTF-8 string, and convert the characters accordingly. Any invalid characters in the ByteString will be converted to the Unicode replacement character � (U+FFFD).

Remember: string-interpolate is not designed for 100% correctness around text encodings, just for convenience in the most common case. If you absolutely need to be aware of text encodings and to handle decode failures, take a look at text-conversions.

Usage

First things first: add string-interpolate to your dependencies:

dependencies:
  - string-interpolate

and import the quasiquoter and enable -XQuasiQuotes:

{-# LANGUAGE QuasiQuotes #-}

import Data.String.Interpolate ( i )

Wrap anything you want to be interpolated with #{}:

λ> name = "William"
λ> [i|Hello, #{name}!|] :: String
>>> "Hello, William!"

You can interpolate in anything which implements Show:

λ> import Data.Time
λ> now <- getCurrentTime
λ> [i|The current time is #{now}.|] :: String
>>> "The current time is 2019-03-10 18:58:40.573892546 UTC."

…and interpolate into anything which implements IsString.

string-interpolate must know what concrete type it’s producing; it cannot be used to generate a IsString a => a. If you’re using string-interpolate from GHCi, make sure to add type signatures to toplevel usages!

string-interpolate also needs to know what concrete type it’s interpolating. For instance, the following code won’t work:

showIt :: Show a => a -> String
showIt it = [i|The value: #{it}|]

You would need to convert it to a String using show first.

Strings and characters are always interpolated without surrounding quotes.

λ> verb = 'c'
λ> noun = "sea"
λ> [i|We went to go #{verb} the #{noun}.|] :: String
>>> "We went to go c the sea."

You can interpolate arbitrary expressions:

λ> [i|Tomorrow's date is #{addDays 1 $ utctDay now}.|] :: String
>>> "Tomorrow's date is 2019-03-11."

string-interpolate, by default, handles multiline strings by copying the newline verbatim into the output.

λ> :{
 | [i|
 |   a
 |   b
 |   c
 | |] :: String
 | :}
>>> "\n  a\n  b\n  c\n"

Another quasiquoter, iii, is provided that handles multiline strings/whitespace in a different way, by collapsing any whitespace into a single space. The intention is to use it when you want to split something across multiple lines in source for readability but want it emitted like a normal sentence. iii is otherwise identical to i, with the ability to interpolate arbitrary values.

λ> :{
 | [iii|
 |   Lorum
 |   ipsum
 |   dolor
 |   sit
 |   amet.
 | |] :: String
 | :}
>>> "Lorum ipsum dolor sit amet."

One last quasiquoter, __i, is provided that handles removing indentation without removing line breaks, perhaps if you need to output code samples or error messages. Again, __i is otherwise identical to i, with the ability to interpolate arbitrary values.

λ> :{
 | [__i|
 |   id :: a -> a
 |   id x = y
 |     where y = x
 | |] :: String
 | :}
>>> "id :: a -> a\nid x = y\n  where y = x"

The intended mnemonics for remembering what iii and __i do:

  • iii: Look at the i’s as individual lines which have been collapsed into a single line
  • __i: Look at the i as being indented

In addition, there are variants of iii and __i, desginated by a letter suffix. For instance, __i'L will reduce indentation, while collapsing any surrounding newlines into a single newline.

λ> :{
 | [__i'L|
 |
 |   id :: a -> a
 |   id x = y
 |     where y = x
 |
 | |] :: String
 | :}
>>> "\nid :: a -> a\nid x = y\n  where y = x\n"

Currently there are two variant suffixes, 'E and 'L

  • 'E: Leave any surrounding newlines intact. To remember what this does, look visually at the capital E; the multiple horizontal lines suggests multiple newlines.
  • 'L: Collapse any surrounding newlines into a single newline. To remember what this does, look visually at the capital L; the single horizontal line suggests a single newline.

Check the Haddock documentation for all the available variants.

Backslashes are handled exactly the same way they are in normal Haskell strings. If you need to put a literal #{ into your string, prefix the pound symbol with a backslash:

λ> [i|\#{ some inner text }#|] :: String
>>> "#{ some inner text }#"

Comparison to other interpolation libraries

Some other interpolation libraries available:

Of these, Text.Printf isn’t exception-safe, and neat-interpolation can only produce strict Text values. interpolate, formatting, Interpolation, and interpolatedstring-perl6 provide different solutions to the problem of providing a general way of interpolating any value, into any kind of text.

Features

string-interpolate interpolate formatting Interpolation interpolatedstring-perl6 neat-interpolation
String/Text support ⚠️ ⚠️
ByteString support ⚠️
Can interpolate arbitrary Show instances
Unicode-aware ⚠️ ⚠️
Multiline strings
Indentation handling
Whitespace/newline chomping

⚠ Since formatting doesn’t support ByteStrings, it technically supports Unicode.

Interpolation supports all five textual formats, but doesn’t allow you to mix and match; that is, you can’t interpolate a String into an output string of type Text, and vice versa.

neat-interpolation only supports strict Text. Because of that, it technically supports Unicode.

Performance

Overall: string-interpolate is competitive with the fastest interpolation libraries, only getting outperformed on ByteStrings by Interpolation and interpolatedstring-perl6, and on large, strict Text specifically by formatting.

We run three benchmarks: small string interpolation (<100 chars) with a single interpolation parameter; small strings with multiple interpolation parameters, and large string (~100KB) interpolation. Each of these benchmarks is then run against String, both Text types, and both ByteString types. Numbers are runtime in relation to string-interpolate; smaller is better.

string-interpolate formatting Interpolation interpolatedstring-perl6 neat-interpolation interpolate
small String 1x 2.8x 1x 1x 1x
multi interp, String 1x 4.3x 1x 1x 7.9x
small Text 1x 4.3x 1.8x 1.9x 5.8x 61x
multi interp, Text 1x 3.5x 5.3x 5.3x 3.3x 29x
large Text 1x 0.6x 11x 11x 22x 10,000x
small lazy Text 1x 6.1x 14.5x 14.5x 93x
multi interp, lazy Text 1x 3.7x 5.8x 6x 34x
large lazy Text 1x 3.9x 22,000x 22,000x 3,500,000x
small ByteString 1x 1x 1x 47x
multi interp, ByteString 1x 0.7x 0.7x 17x
large ByteString 1x 1x 1x 31,000x
small lazy ByteString 1x 1x 1x 85x
multi interp, lazy ByteString 1x 0.4x 0.4x 19x
large lazy ByteString 1x 0.8x 0.8x 1,300,000x

(We don’t bother running tests on large Strings, because no one is working with data that large using String anyways.)

In particular, notice that Interpolation and interpolatedstring-perl6 blow up on both Text types; string-interpolate and formatting have consistent performance across all benchmarks, with string-interpolation leading the pack in Text cases.

All results were tested on an AWS EC2 t2.medium, with GHC 8.6.5. If you’d like to replicate the results, the benchmarks are located in bench/, and can be run with cabal v2-run string-interpolate-bench -O2 -fextended-benchmarks.

Larger Text and ByteString

By default, string-interpolate is performance tuned for outputting smaller strings. If you find yourself regularly needing extremely large outputs, however, you can change the way output strings are constructed to optimize accordingly. Enable either the text-builder or bytestring-builder Cabal flag, depending on your need, and you should see speedups constructing large strings, at the cost of slowing down smaller outputs.

Changes

CHANGELOG

Unreleased

v0.3.1.0 (2021-02-12)

  • Added variant interpolators, __i'E, __i'L, iii'E, iii'L, that handle surrounding newlines in a different way based on suffix.

v0.3.0.2 (2020-10-01)

  • Removed upper bounds on base16-bytestring. Safe to do so now that text-converions has added upper bounds. This change should not affect any users of string-interpolate.

v0.3.0.1 (2020-09-19)

  • Downgraded version of base16-bytestring to avoid breaking API changes in new versions. (We would just upgrade the version ourselves, but it’s one of our dependencies text-conversions that’s not building due to the API changes, not us.)

v0.3.0.0 (2020-06-30)

  • Changed the behavior of iii to only collapse statically-available whitespace, which also removes the performance penalty of using iii.
  • Removed noise on compile errors if quasiquoters are misused
  • Changed behavior of backslash escapes to error on unknown escape characters

v0.2.1.0 (2020-05-04)

  • Added __i interpolator for stripping indentation in multiline strings
  • Added benchmarks for lazy Text and lazy ByteString
  • Changed default behavior for Text and ByteString to use the actual types themselves as intermediate objects rather than construct Builders. This should give significant speedups in the common case of interpolating smaller outputs. Old behavior can be reenabled using Cabal flags -ftext-builder and -fbytestring-builder
  • Gated benchmarks for Interpolation and interpolatedstring-perl6 behind a Cabal flag so that we can still be in Stackage without needing to remove these dependencies

v0.2.0.3 (2020-04-26)

  • Commented out interpolatedstring-perl6 benchmarks, since that library does not build on GHC 8.8.2

v0.2.0.2 (2020-04-25)

  • Updated interpolation parser to use enabled language extensions (Thanks Cary Robbins!)

v0.2.0.1 (2020-04-09)

  • Fixed bug caused when escaping a backslash right before an interpolation (Thanks Vladimir Stepchenko!)
  • Fixed behavior of iii (Thanks Vladimir Stepchenko!)

v0.2.0.0 (2019-12-16)

  • Added iii interpolator for collapsing whitespace/newlines into single spaces
  • Added feature comparison to/benchmark with neat-interpolation
  • Just generally make the documentation better
  • Add homepage info to cabal file/Haddock documentation

v0.1.0.1 (2019-05-06)

  • Remove Interpolation from the default benchmarks because it’s not on Stackage

v0.1.0.0 (2019-03-17)

  • Add support for using Text and ByteString Builders as both sinks and sources
  • Add support for interpolating Chars without the surrounding quotes

v0.0.1.0 (2019-03-10)

  • Initial release