BSD-3-Clause licensed by michaelt
This version can be pinned in stack with:streaming-bytestring-0.2.1@sha256:3b85f79fa0ab92c23d81b4117911572881583478c48304bec37e37c2f6dad075,3011

streaming-bytestring

Build Build Status Hackage

This library enables fast and safe streaming of byte data, in either Word8 or Char form. It is a core addition to the streaming ecosystem and avoids the usual pitfalls of combinbing lazy ByteStrings with lazy IO.

This library is used by streaming-attoparsec to enable vanilla Attoparsec parsers to work with streaming “for free”.

Usage

Importing and Types

Modules from this library are intended to be imported qualified. To avoid conflicts with both the bytestring library and streaming, we recommended Q as the qualified name:

import qualified Streaming.ByteString.Char8 as Q

Like the bytestring library, leaving off the Char8 will expose an API based on Word8. Following the philosophy of streaming that “the best API is the one you already know”, these APIs are based closely on bytestring. The core type is ByteStream m r, where:

  • m: The Monad used to fetch further chunks from the “source”, usually IO.
  • r: The final return value after all streaming has concluded, usually () as in streaming.

You can imagine this type to represent an infinitely-sized collection of bytes, although internally it references a strict ByteString no larger than 32kb, followed by monadic instructions to fetch further chunks.

Examples

File Input

To open a file of any size and count its characters:

import Control.Monad.Trans.Resource (runResourceT)
import qualified Streaming.Streaming.Char8 as Q

-- | Represents a potentially-infinite stream of `Char`.
chars :: ByteStream IO ()
chars = Q.readFile "huge-file.txt"

main :: IO ()
main = runResourceT (Q.length_ chars) >>= print

Note that file IO specifically requires the resourcet library.

Line splitting and Stream interop

In the example above you may have noticed a lack of Of that we usually see with Stream. Our old friend lines hints at this too:

lines :: Monad m => ByteStream m r -> Stream (ByteStream m) m r

A stream-of-streams, yet no Of here either. The return type can’t naively be Stream (Of ByteString) m r, since the first line break might be at the very end of a large file. Forcing that into a single strict ByteString would crash your program.

To count the number of lines whose first letter is i:

countOfI :: IO Int
countOfI = runResourceT
  . S.length_                   -- IO Int
  . S.filter (== 'i')           -- Stream (Of Char) IO ()
  . S.concat                    -- Stream (Of Char) IO ()
  . S.mapped Q.head             -- Stream (Of (Maybe Char)) IO ()
  . Q.lines                     -- Stream (ByteStream IO) IO ()
  $ Q.readFile "huge-file.txt"  -- ByteStream IO ()

Critically, there are several functions which when combined with mapped can bring us back into Of-land:

head     :: Monad m => ByteStream m r -> m (Of (Maybe Char) r)
last     :: Monad m => ByteStream m r -> m (Of (Maybe Char) r)
null     :: Monad m => ByteStream m r -> m (Of Bool) r)
count    :: Monad m => ByteStream m r -> m (Of Int) r)
toLazy   :: Monad m => ByteStream m r -> m (Of ByteString r) -- Be careful with this.
toStrict :: Monad m => ByteStream m r -> m (Of ByteString r) -- Be even *more* careful with this.

When moving in the opposite direction API-wise, consider:

fromChunks :: Stream (Of ByteString) m r -> ByteStream m r

Changes

0.2.1 (2021-06-23)

Changed

  • Performance improvement when using GHC 9.

0.2.0 (2020-10-26)

Note: The deprecations added in 0.1.7 have not been removed in this version. Instead of 0.1.7, that release should have been 0.2 in the first place.

Added

  • Add missing exports of zipWithStream, materialize, and dematerialize.

Changed

  • Breaking: Switched names of fold and fold_ in the non-Char8 modules. The corresponding Char8 functions and the rest of the library uses _ for the variant that forgets the r value.
  • Breaking: Unified nextByte/nextChar with uncons. The old uncons returned Maybe instead of the more natural Either r.
  • Breaking: Similarly, unconsChunk and nextChunk have been unified.
  • nextByte, nextChar, and nextChunk have been deprecated.
  • Relaxed signature of toStrict_ to allow any r, not just ().
  • Permance improvements for packChars and denull.
  • Various documentation improvements.
  • Improved performance of w8IsSpace to more quickly filter out non-whitespace characters, and updated words to use it instead of the internal function isSpaceWord8 from the bytestring package. See also bytestring#315.

Fixed

  • An edge case involving overflow in readInt.
  • A potential crash in uncons.
  • intersperse now ignores any initial empty chunks.
  • intercalate now does not insert anything between the final substream and the outer stream end.
  • unlines now correctly handles Chunk "" (Empty r) and Empty r.

0.1.7 (2020-10-14)

Thanks to Viktor Dukhovni and Colin Woodbury for their contributions to this release.

Added

  • The skipSomeWS function for efficiently skipping leading whitespace of both ASCII and non-ASCII.

Changed

  • The ByteString type has been renamed to ByteStream. This fixes a well-reported confusion from users. An alias to the old name has been provided for back-compatibility, but is deprecated and be removed in the next major release.
  • Modules have been renamed to match the precedent set by the main streaming library. Aliases to the old names have been provided, but will be removed in the next major release.
    • Data.ByteString.Streaming -> Streaming.ByteString
    • Data.ByteString.Streaming.Char8 -> Streaming.ByteString.Char8
  • An order-of-magnitude performance improvement in line splitting. #18
  • Performance and correctness improvements for the readInt function. #31
  • Documentation improved, and docstring coverage is now 100%. #27

Fixed

  • An incorrect comment about Handles being automatically closed upon EOF with hGetContents and hGetContentsN. #9
  • A crash in group and groupBy when reading too many bytes. #22
  • groupBy incorrectly ordering its output elements. #4

0.1.6

  • Semigroup instance for ByteString m r added
  • New function lineSplit

0.1.5

  • Update for streaming-0.2