This library enables fast and safe streaming of byte data, in either Word8 or
Char form. It is a core addition to the streaming ecosystem and avoids the usual
pitfalls of combining lazy ByteStrings with lazy IO.
Importing and Types
Modules from this library are intended to be imported qualified. To avoid
conflicts with both the bytestring library and streaming, we recommend Q
as the qualified name:

```haskell
import qualified Streaming.ByteString.Char8 as Q
```
As with the bytestring library, leaving off the Char8 will expose an API based
on Word8. Following the philosophy of streaming that “the best API is the
one you already know”, these APIs are based closely on bytestring. The core
type is ByteStream m r, where:
- m: The Monad used to fetch further chunks from the “source”, usually IO.
- r: The final return value after all streaming has concluded, usually ().
You can imagine this type to represent an infinitely-sized collection of bytes,
although internally it references a strict
ByteString no larger than 32kb,
followed by monadic instructions to fetch further chunks.
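As a rough illustration of the two type parameters, the sketch below builds a small ByteStream from an in-memory strict ByteString and consumes it twice in IO. The names greeting and main are made up for this example, and it assumes Q.fromStrict, Q.length_ and Q.stdout behave as in recent releases of this library.

```haskell
import qualified Data.ByteString.Char8 as B8
import qualified Streaming.ByteString.Char8 as Q

-- | A tiny stream. The monad m is left open, and r is () because
-- nothing is returned once the bytes run out.
greeting :: Monad m => Q.ByteStream m ()
greeting = Q.fromStrict (B8.pack "hello, streaming\n")

main :: IO ()
main = do
  -- Choose m = IO and keep only the character count.
  n <- Q.length_ greeting
  print n
  -- Choose m = IO again and write the bytes to standard output.
  Q.stdout greeting
```

Because greeting is a description of a stream and not a consumed resource, it can be run more than once here. A stream backed by a file or handle would not generally allow that.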
To open a file of any size and count its characters:
```haskell
import Control.Monad.Trans.Resource (ResourceT, runResourceT)
import qualified Streaming.ByteString.Char8 as Q

-- | Represents a potentially-infinite stream of `Char`.
-- `Q.readFile` needs a `MonadResource`, hence `ResourceT IO` and not plain `IO`.
chars :: Q.ByteStream (ResourceT IO) ()
chars = Q.readFile "huge-file.txt"

main :: IO ()
main = runResourceT (Q.length_ chars) >>= print
```
Note that file IO specifically requires the resourcet library.
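The same pattern extends to writing. The following is a minimal sketch of a constant-memory file copy, assuming Q.readFile and Q.writeFile both run in a MonadResource as in current releases; copyStreaming and the file names are hypothetical.

```haskell
import Control.Monad.Trans.Resource (runResourceT)
import qualified Streaming.ByteString.Char8 as Q

-- | Copy a file chunk by chunk. runResourceT makes sure both file
-- handles are released when the copy finishes or fails.
copyStreaming :: FilePath -> FilePath -> IO ()
copyStreaming src dst = runResourceT (Q.writeFile dst (Q.readFile src))

main :: IO ()
main = copyStreaming "huge-file.txt" "huge-file.copy.txt"
```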
Line splitting and
In the example above you may have noticed a lack of Of that we usually see
with Stream. Our old friend lines hints at this too:

```haskell
lines :: Monad m => ByteStream m r -> Stream (ByteStream m) m r
```
A stream-of-streams, yet no Of here either. The return type can’t naively be
Stream (Of ByteString) m r, since the first line break might be at the very
end of a large file. Forcing that into a single strict ByteString would crash
your program.

To count the number of lines whose first letter is i:
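```haskell
import Control.Monad.Trans.Resource (runResourceT)
import qualified Streaming.ByteString.Char8 as Q
import qualified Streaming.Prelude as S

-- In the type comments below, m stands for ResourceT IO.
countOfI :: IO Int
countOfI = runResourceT
  . S.length_                   -- m Int
  . S.filter (== 'i')           -- Stream (Of Char) m ()
  . S.concat                    -- Stream (Of Char) m ()
  . S.mapped Q.head             -- Stream (Of (Maybe Char)) m ()
  . Q.lines                     -- Stream (ByteStream m) m ()
  $ Q.readFile "huge-file.txt"  -- ByteStream m ()
```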
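The same mapped-over-lines shape works with other folds. Here is a small sketch that reports the length of every line without ever holding a whole line in memory. It uses an in-memory input so it can be run as-is; lineLengths is a hypothetical helper, and Q.length is assumed to return the count paired with the rest of the stream.

```haskell
import qualified Data.ByteString.Char8 as B8
import Streaming (Of, Stream)
import qualified Streaming.ByteString.Char8 as Q
import qualified Streaming.Prelude as S

-- | Replace each line (itself a ByteStream) with its length.
lineLengths :: Monad m => Q.ByteStream m r -> Stream (Of Int) m r
lineLengths = S.mapped Q.length . Q.lines

main :: IO ()
main = S.print (lineLengths (Q.fromStrict (B8.pack "one\nthree\nsix")))
```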
Critically, there are several functions which, when combined with mapped, can
bring us back into Of-based streams:

```haskell
head     :: Monad m => ByteStream m r -> m (Of (Maybe Char) r)
last     :: Monad m => ByteStream m r -> m (Of (Maybe Char) r)
null     :: Monad m => ByteStream m r -> m (Of Bool r)
count    :: Monad m => Char -> ByteStream m r -> m (Of Int r)
toLazy   :: Monad m => ByteStream m r -> m (Of ByteString r) -- Be careful with this.
toStrict :: Monad m => ByteStream m r -> m (Of ByteString r) -- Be even *more* careful with this.
```
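For inputs whose lines are known to be short, one possible use of these functions is to materialise each line as a strict ByteString. The sketch below assumes that guarantee holds; strictLines is a hypothetical name, and the sample input is in memory only to keep the example self-contained.

```haskell
import qualified Data.ByteString.Char8 as B8
import Streaming (Of, Stream)
import qualified Streaming.ByteString.Char8 as Q
import qualified Streaming.Prelude as S

-- | Each line as a strict ByteString. A single very long line is
-- loaded into memory in full, so use this only on trusted input.
strictLines :: Monad m => Q.ByteStream m r -> Stream (Of B8.ByteString) m r
strictLines = S.mapped Q.toStrict . Q.lines

main :: IO ()
main = do
  ls <- S.toList_ (strictLines (Q.fromStrict (B8.pack "alpha\nbeta\ngamma")))
  mapM_ B8.putStrLn ls
```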
When moving in the opposite direction API-wise, consider:
```haskell
fromChunks :: Monad m => Stream (Of ByteString) m r -> ByteStream m r
```
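A brief sketch of that direction: a Stream of strict chunks is stitched into one ByteStream and then collected again. The chunk contents are arbitrary, and Q.toStrict_ is used only because the total size here is tiny.

```haskell
import qualified Data.ByteString.Char8 as B8
import qualified Streaming.ByteString.Char8 as Q
import qualified Streaming.Prelude as S

main :: IO ()
main = do
  -- An ordinary Stream (Of ByteString) built from a list of chunks.
  let chunks = S.each [B8.pack "stream", B8.pack "ing-", B8.pack "bytestring"]
  -- fromChunks forgets the chunk boundaries and exposes a single byte stream.
  joined <- Q.toStrict_ (Q.fromChunks chunks)
  B8.putStrLn joined
```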
- Performance improvement when using GHC 9.
Note: The deprecations added in 0.1.7 have not been removed in this
version. Instead of 0.1.7, that release should have been 0.2 in the first place.
- Add missing exports of
- Breaking: Switched names of
fold_ in the non-
Char8 modules. The corresponding
Char8 functions and the rest of the library use
_ for the variant that forgets the
- Breaking: Unified
uncons. The old
Maybe instead of the more natural
- Breaking: Similarly,
nextChunk have been unified.
nextChunk have been deprecated.
- Relaxed signature of
toStrict_ to allow any
r, not just
- Performance improvements for
- Various documentation improvements.
- Improved performance of
w8IsSpace to more quickly filter out non-whitespace characters, and updated
words to use it instead of the internal function
bytestring package. See also bytestring#315.
- An edge case involving overflow in
- A potential crash in
- intersperse now ignores any initial empty chunks.
- intercalate now does not insert anything between the final substream and the outer stream end.
- unlines now correctly handles
Chunk "" (Empty r)and
Thanks to Viktor Dukhovni and Colin Woodbury for their contributions to this release.
- The skipSomeWS function for efficiently skipping leading whitespace, both ASCII and non-ASCII.
- The ByteString type has been renamed to ByteStream. This fixes a confusion frequently reported by users. An alias to the old name has been provided for back-compatibility, but is deprecated and will be removed in the next major release.
- Modules have been renamed to match the precedent set by the main streaming library. Aliases to the old names have been provided, but will be removed in the next major release.
- An order-of-magnitude performance improvement in line splitting. #18
- Performance and correctness improvements for the
- Documentation improved, and docstring coverage is now 100%. #27
- An incorrect comment about
Handles being automatically closed upon EOF with
- A crash in
groupBy when reading too many bytes. #22
- groupBy incorrectly ordering its output elements. #4
ByteString m r added
- New function
- Update for