csv-conduit
A flexible, fast, conduit-based CSV parser library for Haskell.
http://github.com/ozataman/csv-conduit
| Snapshot | Version |
| --- | --- |
| LTS Haskell 24.28 | 1.0.1.1 |
| Stackage Nightly 2026-01-18 | 1.0.1.1 |
| Latest on Hackage | 1.0.1.1 |
Module documentation for 1.0.1.1
- Data
- Data.CSV
- Data.CSV.Conduit
- Data.CSV.Conduit.Conversion
- Data.CSV.Conduit.Parser
- Data.CSV.Conduit.Parser.ByteString
- Data.CSV.Conduit.Parser.Text
- Data.CSV.Conduit.Types
README
CSV Files and Haskell
CSV files are the de facto standard in many cases of data transfer, particularly when dealing with enterprise applications or disparate database systems.
While there are a number of csv libraries in Haskell, at the time of this project’s start, there wasn’t one that provided all of the following:
- Full flexibility in quote characters, separators, input/output
- Constant space operation
- Robust parsing and error resiliency
- Battle-tested reliability in real-world datasets
- Fast operation
- Convenient interface that supports a variety of use cases
Over time, other capable CSV packages such as cassava have appeared. The major benefits of this library remain:
- Direct participation in the conduit ecosystem, which is now quite large, and all the benefits that come with it.
- Flexibility in CSV format definition.
- Resiliency to errors in the input data.
This package
csv-conduit is a conduit-based CSV parsing library that is easy to use, flexible, and fast. It leverages the conduit infrastructure to provide constant-space operation, which is critical in many real-world use cases.
For example, you can use http-conduit to download a CSV file from the internet and plug its Source into intoCSV to stream-convert the download into the Row data type and process it as the data streams in, that is, without having to download the entire file to disk first.
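As an illustration, here is a minimal sketch of that pattern using Network.HTTP.Simple from http-conduit. The URL is a placeholder, and the ByteString-to-Row Text CSV instance is assumed.
import           Data.Conduit ((.|))
import qualified Data.Conduit.List as CL
import           Data.CSV.Conduit
import           Data.Text (Text)
import           Network.HTTP.Simple (httpSink, parseRequest)
-- Stream a remote CSV through intoCSV and print each parsed Row as it
-- arrives, without ever holding the whole file in memory.
printRemoteCSV :: IO ()
printRemoteCSV = do
  req <- parseRequest "https://example.com/data.csv"
  httpSink req $ \_response ->
    intoCSV defCSVSettings .| CL.mapM_ (print :: Row Text -> IO ())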
Author & Contributors
- Ozgun Ataman (@ozataman)
- Daniel Bergey (@bergey)
- BJTerry (@BJTerry)
- Mike Craig (@mkscrg)
- Daniel Corson (@dancor)
- Dmitry Dzhus (@dzhus)
- Niklas Hambüchen (@nh2)
- Facundo Domínguez (@facundominguez)
- Daniel Vianna (@dmvianna)
Introduction
- The CSVeable typeclass implements the key operations.
- CSVeable is parameterized on both a stream type and a target CSV row type.
- There are two basic row types that implement exactly the same operations, so you can choose the right one for the job at hand:
type MapRow t = Map t t
type Row t = [t]
- You basically use the conduits defined in this library to parse from a CSV stream and render back into a CSV stream; see the sketch after this list.
- Use the full flexibility and modularity of conduits for sources and sinks.
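For instance, here is a minimal sketch that parses a tab-separated Text chunk into Row Text and re-renders it as ordinary comma-separated output. The csvSep field name of CSVSettings is an assumption; check Data.CSV.Conduit.Types for your version.
{-# LANGUAGE OverloadedStrings #-}
import           Data.Conduit (ConduitT, runConduit, (.|))
import qualified Data.Conduit.List as CL
import           Data.CSV.Conduit
import           Data.Text (Text)
-- Settings for tab-separated input; csvSep is an assumed field name.
tsvSettings :: CSVSettings
tsvSettings = defCSVSettings { csvSep = '\t' }
-- Parse a TSV chunk into Row Text values, then render them back out
-- as comma-separated CSV chunks, all in constant space.
tsvToCSV :: IO [Text]
tsvToCSV = runConduit $
     CL.sourceList (["name\tage\nalice\t30\n"] :: [Text])
  .| intoCSV tsvSettings
  .| (fromCSV defCSVSettings :: ConduitT (Row Text) Text IO ())
  .| CL.consume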
Speed
While fast operation is of concern, I have so far cared more about correct operation and a flexible API. Please let me know if you notice any performance regressions or optimization opportunities.
Usage Examples
Example #1: Basics Using Convenience API
{-# LANGUAGE OverloadedStrings #-}
import Data.Conduit
import Data.Conduit.Binary
import qualified Data.Conduit.List as CL
import Data.CSV.Conduit
import Data.Text (Text)
import Control.Monad.Trans.Resource (runResourceT)
-- Just reverse the columns
myProcessor :: Monad m => Conduit (Row Text) m (Row Text)
myProcessor = CL.map reverse
test :: IO ()
test = runResourceT $
transformCSV defCSVSettings
(sourceFile "input.csv")
myProcessor
(sinkFile "output.csv")
Example #2: Basics Using Conduit API
{-# LANGUAGE OverloadedStrings #-}
import Data.Conduit
import Data.Conduit.Binary
import Data.CSV.Conduit
import Data.Text (Text)
import Control.Monad.Trans.Resource (runResourceT)
myProcessor :: Monad m => Conduit (Row Text) m (Row Text)
myProcessor = awaitForever $ yield
-- Let's simply stream from a file, parse the CSV, reserialize it
-- and push back into another file.
test :: IO ()
test = runResourceT $
sourceFile "test/BigFile.csv" $=
intoCSV defCSVSettings $=
myProcessor $=
fromCSV defCSVSettings $$
sinkFile "test/BigFileOut.csv"
Changes
1.0.1.1
- Fix test suite to build with text 2.1.2 and ghc 9.12.2, resolving #62
1.0.1.0
- Use ConduitT instead of ConduitM (prettier type inference with newer conduit imports)
1.0.0.2
- Fixed #17, where CSV created with Excel in Mac OS failed to parse due to its newline characters.
1.0.0.1
- Removed dependencies: mmorph, monad-control, mtl, unordered-containers, primitive
1.0.0.0
- Removed return from the Monad instance for Parser, and transferred its definition to pure in the Applicative instance of Parser. This was necessary to support GHC 9.6.4.
- Create new API to choose whether to handle empty CSV cells as empty strings or NULLs.
- Added imports that were removed from Prelude in GHC 9.6.4.
- Bumped the default Stack resolver to LTS-22.20.
0.7.3.0
- Add ordered versions of named records for consistent, controllable header column ordering. PR 44
- Add support for GHC 9.0.1
0.7.2.0
- Remove some dependency upper bounds for forward compatibility.
0.7.1.0
- Add MonadFail instance for Parser. PR 38
0.7.0.0
- BREAKING: Switch from partial Monoid instance on Parser to total Semigroup instance.
- Compatibility with GHC 8.4.x/base-4.11.1.0
0.6.8.1
- Fix documentation mistake in FromNamedRecord/ToNamedRecord examples.
0.6.8
- Haddocks improvements
- Fix inlining and specialization rules around formatDecimal
- Updates to permit newest conduit/resourcet packages
0.6.7
- Fix build for GHC 8.0.1