streaming-bytestring
Fast, effectful byte streams.
https://github.com/haskell-streaming/streaming-bytestring
LTS Haskell 22.33: 0.3.2
Stackage Nightly 2024-09-08: 0.3.2
Latest on Hackage: 0.3.2
streaming-bytestring-0.3.2@sha256:176d93d14d71dba3e2a5cc746a92e240c77472068845977e5896ed6fcd1af702,3070
streaming-bytestring
This library enables fast and safe streaming of byte data, in either Word8 or
Char form. It is a core addition to the streaming ecosystem and avoids the
usual pitfalls of combining lazy ByteStrings with lazy IO.
This library is used by streaming-attoparsec to enable vanilla Attoparsec
parsers to work with streaming “for free”.
Usage
Importing and Types
Modules from this library are intended to be imported qualified. To avoid
conflicts with both the bytestring library and streaming, we recommend Q as
the qualified name:
import qualified Streaming.ByteString.Char8 as Q
Like the bytestring library, leaving off the Char8 will expose an API based
on Word8. Following the philosophy of streaming that “the best API is the
one you already know”, these APIs are based closely on bytestring. The core
type is ByteStream m r, where:
- m: The Monad used to fetch further chunks from the “source”, usually IO.
- r: The final return value after all streaming has concluded, usually () as in streaming.
You can imagine this type to represent an infinitely-sized collection of bytes,
although internally it references a strict ByteString no larger than 32kb,
followed by monadic instructions to fetch further chunks.
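To make the roles of m and r concrete, here is a minimal sketch that builds a small in-memory stream and consumes it twice, once in IO and once in Identity. The name greeting is hypothetical, and running in Identity is only an illustration that m can be any Monad when no real effects are needed.

```haskell
import qualified Data.ByteString.Char8 as B
import Data.Functor.Identity (runIdentity)
import qualified Streaming.ByteString.Char8 as Q

-- A small in-memory stream (hypothetical example data).
-- `m` is left polymorphic and the final return value `r` is ().
greeting :: Monad m => Q.ByteStream m ()
greeting = Q.fromStrict (B.pack "hello, ") >> Q.fromStrict (B.pack "world")

main :: IO ()
main = do
  -- Here m = IO: count the characters in the stream.
  n <- Q.length_ greeting
  print n
  -- Here m = Identity: collect the whole stream into one strict ByteString.
  -- Only sensible because the stream is known to be tiny.
  B.putStrLn (runIdentity (Q.toStrict_ greeting))
```

The same greeting value is reused at two different monads because nothing in its definition commits to a particular m.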
Examples
File Input
To open a file of any size and count its characters:
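import Control.Monad.Trans.Resource (MonadResource, runResourceT)
import qualified Streaming.ByteString.Char8 as Q

-- | Represents a potentially-infinite stream of `Char`.
-- Reading from a file needs a `MonadResource` context so the handle is released.
chars :: MonadResource m => Q.ByteStream m ()
chars = Q.readFile "huge-file.txt"

-- | Count the characters without loading the whole file into memory.
main :: IO ()
main = runResourceT (Q.length_ chars) >>= print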
Note that file IO specifically requires the resourcet library.
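As a further sketch of file IO under resourcet, a file can be copied by piping readFile straight into writeFile. The helper name copyFile and both file names are placeholders chosen for illustration.

```haskell
import Control.Monad.Trans.Resource (runResourceT)
import qualified Streaming.ByteString.Char8 as Q

-- Copy a file chunk by chunk (hypothetical helper).
-- runResourceT ensures both handles are released when the copy finishes.
copyFile :: FilePath -> FilePath -> IO ()
copyFile src dst = runResourceT (Q.writeFile dst (Q.readFile src))

main :: IO ()
main = copyFile "huge-file.txt" "huge-file.copy.txt"
```

Because the bytes flow through in chunks, memory use should stay roughly constant regardless of the file size.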
Line splitting and Stream interop
In the example above you may have noticed a lack of Of that we usually see
with Stream. Our old friend lines hints at this too:
lines :: Monad m => ByteStream m r -> Stream (ByteStream m) m r
A stream-of-streams, yet no Of here either. The return type can’t naively be
Stream (Of ByteString) m r, since the first line break might be at the very
end of a large file. Forcing that into a single strict ByteString would crash
your program.
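One way to work with this stream-of-streams is to reduce each line to a small value as it passes by. The sketch below, with the hypothetical name lineLengths, measures every line without ever holding a whole line in memory.

```haskell
import qualified Data.ByteString.Char8 as B
import Streaming (Of, Stream)
import qualified Streaming.ByteString.Char8 as Q
import qualified Streaming.Prelude as S

-- Length of every line, computed one sub-stream at a time.
-- Q.length consumes a single line and hands back the rest of the stream.
lineLengths :: Monad m => Q.ByteStream m r -> Stream (Of Int) m r
lineLengths = S.mapped Q.length . Q.lines

main :: IO ()
main = S.print (lineLengths (Q.fromStrict (B.pack "ab\ncde\nf")))
```

The in-memory input is only there to keep the example self-contained; the same function applies unchanged to a stream produced by Q.readFile.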
To count the number of lines whose first letter is i:
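import Control.Monad.Trans.Resource (runResourceT)
import qualified Streaming.ByteString.Char8 as Q
import qualified Streaming.Prelude as S

-- In the comments below, m stands for ResourceT IO.
countOfI :: IO Int
countOfI = runResourceT
  . S.length_                   -- m Int
  . S.filter (== 'i')           -- Stream (Of Char) m ()
  . S.concat                    -- Stream (Of Char) m ()
  . S.mapped Q.head             -- Stream (Of (Maybe Char)) m ()
  . Q.lines                     -- Stream (ByteStream m) m ()
  $ Q.readFile "huge-file.txt"  -- ByteStream m ()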
Critically, there are several functions which when combined with mapped can
bring us back into Of-land:
head :: Monad m => ByteStream m r -> m (Of (Maybe Char) r)
last :: Monad m => ByteStream m r -> m (Of (Maybe Char) r)
null :: Monad m => ByteStream m r -> m (Of Bool r)
count :: Monad m => Char -> ByteStream m r -> m (Of Int r)
toLazy :: Monad m => ByteStream m r -> m (Of ByteString r) -- Be careful with this.
toStrict :: Monad m => ByteStream m r -> m (Of ByteString r) -- Be even *more* careful with this.
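For instance, mapped combined with toStrict turns each line into a strict ByteString. This is a sketch that assumes every line is known to be short; the name shortLines is hypothetical, and on input with very long lines this would defeat the purpose of streaming.

```haskell
import qualified Data.ByteString.Char8 as B
import Streaming (Of, Stream)
import qualified Streaming.ByteString.Char8 as Q
import qualified Streaming.Prelude as S

-- Collect each line into a strict ByteString.
-- Assumption: individual lines are small enough to hold in memory.
shortLines :: Monad m => Q.ByteStream m r -> Stream (Of B.ByteString) m r
shortLines = S.mapped Q.toStrict . Q.lines

main :: IO ()
main = S.print (shortLines (Q.fromStrict (B.pack "alpha\nbeta\ngamma")))
```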
When moving in the opposite direction API-wise, consider:
fromChunks :: Stream (Of ByteString) m r -> ByteStream m r
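A minimal sketch of that direction: an ordinary Stream of strict chunks is glued back into a single ByteStream and written to standard output. The name joined and the chunk contents are made up for illustration.

```haskell
import qualified Data.ByteString.Char8 as B
import qualified Streaming.ByteString.Char8 as Q
import qualified Streaming.Prelude as S

-- Rebuild a ByteStream from a Stream of strict chunks (example data).
joined :: Monad m => Q.ByteStream m ()
joined = Q.fromChunks (S.each [B.pack "foo", B.pack "bar"])

main :: IO ()
main = Q.stdout joined
```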
Changes
0.3.2 (2023-11-17)
Changed
- Ensure support for GHC 9.8.
0.3.1 (2023-06-28)
Changed
- Ensure support for GHC 9.6.
0.3.0 (2023-04-24)
Changed
- Dropped support for GHC 7.
- Tightened PVP version bounds, for GHC 8.0 through to GHC 9.4.4.
0.2.4 (2022-08-26)
Changed
- Changed for’s callback to return ByteStream m x, to clarify that it is not used.
0.2.3 (2022-08-18)
Added
- Add for :: Monad m => ByteStream m r -> (P.ByteString -> ByteStream m r) -> ByteStream m r
0.2.2 (2022-05-18)
Changed
- Dependency adjustments.
0.2.1 (2021-06-23)
Changed
- Performance improvement when using GHC 9.
0.2.0 (2020-10-26)
Note: The deprecations added in 0.1.7 have not been removed in this
version. Instead of 0.1.7, that release should have been 0.2 in the first
place.
Added
- Add missing exports of zipWithStream, materialize, and dematerialize.
Changed
- Breaking: Switched names of fold and fold_ in the non-Char8 modules. The corresponding Char8 functions and the rest of the library use _ for the variant that forgets the r value.
- Breaking: Unified nextByte/nextChar with uncons. The old uncons returned Maybe instead of the more natural Either r.
- Breaking: Similarly, unconsChunk and nextChunk have been unified.
- nextByte, nextChar, and nextChunk have been deprecated.
- Relaxed signature of toStrict_ to allow any r, not just ().
- Performance improvements for packChars and denull.
- Various documentation improvements.
- Improved performance of w8IsSpace to more quickly filter out non-whitespace characters, and updated words to use it instead of the internal function isSpaceWord8 from the bytestring package. See also bytestring#315.
Fixed
- An edge case involving overflow in readInt.
- A potential crash in uncons.
- intersperse now ignores any initial empty chunks.
- intercalate now does not insert anything between the final substream and the outer stream end.
- unlines now correctly handles Chunk "" (Empty r) and Empty r.
0.1.7 (2020-10-14)
Thanks to Viktor Dukhovni and Colin Woodbury for their contributions to this release.
Added
- The skipSomeWS function for efficiently skipping leading whitespace of both ASCII and non-ASCII.
Changed
- The ByteString type has been renamed to ByteStream. This fixes a well-reported confusion from users. An alias to the old name has been provided for back-compatibility, but is deprecated and will be removed in the next major release.
- Modules have been renamed to match the precedent set by the main streaming library. Aliases to the old names have been provided, but will be removed in the next major release.
  - Data.ByteString.Streaming -> Streaming.ByteString
  - Data.ByteString.Streaming.Char8 -> Streaming.ByteString.Char8
- An order-of-magnitude performance improvement in line splitting. #18
- Performance and correctness improvements for the readInt function. #31
- Documentation improved, and docstring coverage is now 100%. #27
Fixed
- An incorrect comment about Handles being automatically closed upon EOF with hGetContents and hGetContentsN. #9
- A crash in group and groupBy when reading too many bytes. #22
- groupBy incorrectly ordering its output elements. #4
0.1.6
- Semigroup instance for ByteString m r added
- New function lineSplit
0.1.5
- Update for streaming-0.2