The Haskell Stream Processor command line utility
Latest on Hackage: 0.3
Haskell Stream Processor
Haskell Stream Processor is a command line utility to process streams using Haskell code.
There are many reasons why Haskell is suitable for stream processing from the command line. Code written in Haskell is concise thanks to its clean syntax and to type inference, which lets you write code without type annotations. It is also easy to define one-line transformations by composing functions. For example,
hsp "L.map (L.head . words) . lines"
prints the first word of each line of the input stream.
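To make the one-liner concrete, here is a minimal sketch of a standalone Haskell program that does the same job without hsp. The name firstWords and the use of interact and unlines to print the result are assumptions made for illustration; how hsp itself renders results is not shown here.

```haskell
import qualified Data.ByteString.Lazy.Char8 as B
import qualified Data.List as L

-- First word of every line, mirroring the expression passed to hsp.
-- Like L.head, this fails on a blank line, where words returns an empty list.
firstWords :: B.ByteString -> [B.ByteString]
firstWords = L.map (L.head . B.words) . B.lines

main :: IO ()
main = B.interact (B.unlines . firstWords)
```

Compared with this program, the hsp invocation only needs the body of firstWords; the imports and the I/O wrapper are supplied by the tool.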
From the project directory
This will compile and install the executable
hsp and the library
hsp supports different modes:
Evaluate an expression
It is possible to use hsp to evaluate a user expression without
input using the option -e:
hsp -e "1"
Work on the stream
The standard mode of hsp processes the whole stream. It accepts a
string representing a transformation from the stream, which has type
Data.ByteString.Lazy.ByteString, to a value whose type is an instance of Rows:
ByteString -> Rows a
Rows is a special case of Show for representing data on the
command line. For example, to print on stdout what it gets from stdin:
Split stream in chunks and process them
Stream processing often means splitting the stream on some delimiter,
such as '\n', and processing each chunk of data. With the standard mode of
hsp this can be achieved using the split function:
hsp "L.filter (not . null) . split '\n'"
This happens so often that hsp has a mode to automatically split the
stream on a delimiter using -d [<delimiter>]. If <delimiter> is omitted,
then it is set to '\n'. The function provided must have type:
[ByteString] -> Rows a
The command before can be rewritten as:
hsp -d "L.filter (not . null)"
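The relationship between the standard mode and the -d mode can be sketched in plain Haskell. The helper withDelimiter is hypothetical and only models the idea of splitting first and then applying the user function; it is not taken from the hsp sources, and the result type is fixed to a list of ByteStrings for simplicity.

```haskell
import qualified Data.ByteString.Lazy.Char8 as B
import qualified Data.List as L

-- Hypothetical model of the -d mode: split the stream on a delimiter,
-- then hand the list of chunks to the user function.
withDelimiter :: Char -> ([B.ByteString] -> [B.ByteString]) -> B.ByteString -> [B.ByteString]
withDelimiter d f = f . B.split d

-- Same user function as in the example: keep only the non-empty chunks.
nonEmptyLines :: B.ByteString -> [B.ByteString]
nonEmptyLines = withDelimiter '\n' (L.filter (not . B.null))

main :: IO ()
main = B.interact (B.unlines . nonEmptyLines)
```

Note that split yields an empty trailing chunk when the input ends with the delimiter, which the filter in this example happens to remove.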
Map a function on each chunk of data
A specific case of hsp -d <delimiter> is hsp -d <delimiter> -m, which
is equivalent to mapping the supplied function over the input list. In this case
the function must have type:
ByteString -> Row a
For example, to take the first word of each line:
hsp -m "L.head . words"
If -m is specified, -d can be omitted and the delimiter is
automatically set to '\n'.
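As a rough model of the map mode, the user function is applied to each chunk independently. In the sketch below, mapChunks and firstWord are hypothetical names, and firstWord returns an empty ByteString for a blank chunk instead of failing like L.head would; that choice is an assumption of this example, not documented hsp behaviour.

```haskell
import qualified Data.ByteString.Lazy.Char8 as B
import qualified Data.List as L

-- Hypothetical model of -d <delimiter> -m: split, then map over the chunks.
mapChunks :: Char -> (B.ByteString -> B.ByteString) -> B.ByteString -> [B.ByteString]
mapChunks d f = L.map f . B.split d

-- Total variant of "L.head . words": a blank chunk maps to an empty ByteString.
firstWord :: B.ByteString -> B.ByteString
firstWord chunk = case B.words chunk of
  (w : _) -> w
  []      -> B.empty

main :: IO ()
main = B.interact (B.unlines . mapChunks '\n' firstWord)
```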
Haskell Stream Processor is a command line utility, and for this reason it needs
information, such as which modules should be loaded, that cannot easily be passed
as arguments. There are two configuration files located under
$HOME/.hsp: one to import modules and one to import user-defined functions.
Haskell Stream Processor reads a list of modules to load from the file
$HOME/.hsp/modules. Each line of this file consists of the name of a
module, optionally followed by a space and its qualified name. An example could be:
Control.Monad
Data.List L
which means that all the functions from Control.Monad will be available to the
user without qualification, while Data.List functions must be qualified with L.
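The file format described above is simple enough to sketch a parser for. Everything in the following block (ModuleSpec, parseLine, parseModules, toImport) is a hypothetical illustration of the format, not the code hsp uses to read the file.

```haskell
import Data.Maybe (mapMaybe)

-- One entry of the modules file: a module name and an optional qualified name.
data ModuleSpec = ModuleSpec
  { moduleName  :: String
  , qualifiedAs :: Maybe String
  } deriving (Eq, Show)

-- Parse a single line; blank or malformed lines are skipped (an assumption).
parseLine :: String -> Maybe ModuleSpec
parseLine line = case words line of
  [name]        -> Just (ModuleSpec name Nothing)
  [name, alias] -> Just (ModuleSpec name (Just alias))
  _             -> Nothing

parseModules :: String -> [ModuleSpec]
parseModules = mapMaybe parseLine . lines

-- Render an entry as the Haskell import declaration it stands for.
toImport :: ModuleSpec -> String
toImport (ModuleSpec name Nothing)      = "import " ++ name
toImport (ModuleSpec name (Just alias)) = "import qualified " ++ name ++ " as " ++ alias

main :: IO ()
main = mapM_ (putStrLn . toImport) (parseModules "Control.Monad\nData.List L\n")
```

Read this way, a line with a single name corresponds to a plain import, and a line with two names to a qualified import.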
There are some modules that are loaded automatically without qualification. In particular,
Data.ByteString.Lazy.Char8 is loaded automatically because
hsp works on lazy ByteStrings. This means that functions which in
Prelude work on lists, such as null, work on ByteStrings in hsp.
The same holds for functions that work on String, such as words and lines.
Prelude is loaded with the qualified name P, so its
functions are not directly visible.
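The name clash behind this choice can be reproduced in an ordinary Haskell module. The import list below is only one way to obtain a similar naming scheme and is an assumption of this sketch; the exact imports hsp generates may differ.

```haskell
-- Hide the Prelude names that clash with the ByteString versions,
-- and also make the whole Prelude reachable under the qualifier P.
import Prelude hiding (lines, null, words)
import qualified Prelude as P
import Data.ByteString.Lazy.Char8 (ByteString, lines, null, pack, words)

-- lines, null and words are the ByteString versions here,
-- while P.map, P.filter and P.length are the list versions from Prelude.
nonBlankWordCounts :: ByteString -> [Int]
nonBlankWordCounts = P.map (P.length . words) . P.filter (not . null) . lines

main :: IO ()
main = print (nonBlankWordCounts (pack "a b\n\nc\n"))
```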
An example of module file can be found in the example directory.
User defined functions
It is possible to define new functions to be used in Haskell Stream Processor
inside the file $HOME/.hsp/toolkit.hs.
An example of toolkit can be found in the example directory.
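As an illustration of what such a file might contain, here is a minimal sketch of two conversion helpers. The names asInt and asFloat are used in the examples of this document, but the definitions below, including the fallback to 0 on unparsable input, are assumptions and may differ from the toolkit shipped in the example directory.

```haskell
import qualified Data.ByteString.Lazy.Char8 as B

-- Parse the leading integer of a chunk; 0 if there is none (assumed fallback).
asInt :: B.ByteString -> Int
asInt = maybe 0 fst . B.readInt

-- Parse a chunk as a floating point number; 0 if it does not parse (assumed fallback).
asFloat :: B.ByteString -> Double
asFloat s = case reads (B.unpack s) of
  [(x, _)] -> x
  _        -> 0

main :: IO ()
main = do
  print (asInt (B.pack "42"))
  print (asFloat (B.pack "3.5"))
```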
Differences with the Glasgow Haskell Compiler
It is already possible to evaluate a function with the
Glasgow Haskell Compiler by using the option -e and passing the custom function to
interact:
ghc -e "interact id"
The main differences are that Haskell Stream Processor works on (lazy)
ByteString instead of the slower String, loads modules automatically from the
module file, and loads user-defined functions from the toolkit.hs file.
Haskell Stream Processor also supports modes other than working on the entire
stream, such as working on each line.
In all the examples, Data.ByteString is loaded without qualification and
Data.List is qualified as L. The function
match is an
hsp -e "2^100"
Print numbers from 1 to 100:
hsp -e "[1 .. 100]"
Take the first line of a stream:
... | hsp -d "L.take 1"
Take the last two lines of a stream:
... | hsp -d "L.reverse . L.take 2 . L.reverse"
Print the 10th element of each line:
... | hsp -m "(L.!! 9) . words"
Print the elements from the 2nd to the 20th of each line:
... | hsp -m "L.take 19 . L.drop 1 . words"
Get the number of words:
... | hsp -d "L.length . L.concatMap words"
Get the number of lines:
... | hsp -d "L.length"
Sort integers and remove duplicates:
... | hsp -d "L.nub . L.sort . L.map asInt"
Sum the 2nd elements of every line:
... | hsp -d "P.sum . L.map (asFloat . (L.!! 1) . words)"
Split each line on a delimiter ':' and print the second element:
... | hsp -m "(L.!! 1) . split ':'"
Remove empty lines:
... | hsp -d "L.filter (not . null)"
Filter lines that match a pattern:
... | hsp -d 'L.filter (`match` "t\\w\\wt")'
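Several of the expressions above can be checked outside hsp as ordinary Haskell functions. The sketch below restates four of them with hypothetical names (lastTwo, wordCount, secondField, uniqueSorted); the sample inputs in main are made up for illustration.

```haskell
import qualified Data.ByteString.Lazy.Char8 as B
import qualified Data.List as L

-- Last two chunks, in their original order.
lastTwo :: [B.ByteString] -> [B.ByteString]
lastTwo = L.reverse . L.take 2 . L.reverse

-- Total number of words over all chunks.
wordCount :: [B.ByteString] -> Int
wordCount = L.length . L.concatMap B.words

-- Second field of a ':'-separated chunk; fails if there are fewer than two fields.
secondField :: B.ByteString -> B.ByteString
secondField = (L.!! 1) . B.split ':'

-- Sort integers and remove duplicates.
uniqueSorted :: [Int] -> [Int]
uniqueSorted = L.nub . L.sort

main :: IO ()
main = do
  print (lastTwo (map B.pack ["a", "b", "c"]))
  print (wordCount (map B.pack ["a b", "c"]))
  print (secondField (B.pack "x:y:z"))
  print (uniqueSorted [3, 1, 3, 2])
```

Because L.!! is zero-based, index 1 selects the second element, which is the convention the examples above rely on.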