Native Haskell implementation of Avro

This is a Haskell Avro library for decoding and encoding Avro data structures. Avro can be thought of as a serialization format and an RPC specification, which gives rise to the following separable tasks:

  • Serialization/Deserialization - This library has been used “in anger” for:
    • Deserialization of Avro container files
    • Serialization/deserialization of Avro messages to/from Kafka topics
  • RPC - There is currently no support for Avro RPC in this library.

This library also provides functionality for automatically generating Avro-related data types and instances from Avro schemas (using TemplateHaskell).

Quickstart

This library provides the following conversions between Haskell types and Avro types:

Haskell type         Avro type
()                   "null"
Bool                 "boolean"
Int, Int64           "long"
Int32                "int"
Double               "double"
Text                 "string"
ByteString           "bytes"
Maybe a              ["null", "a"]
Either a b           ["a", "b"]
Map Text a           {"type": "map", "values": "a"}
Map String a         {"type": "map", "values": "a"}
HashMap Text a       {"type": "map", "values": "a"}
HashMap String a     {"type": "map", "values": "a"}
[a]                  {"type": "array", "items": "a"}
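The "int" and "long" rows above determine how integers are laid out on the wire. Per the Avro specification, both are written as zig-zag encoded, variable-length integers. The following standalone sketch illustrates that encoding using only base; it is not this library's implementation, and all names in it (zigZag, varint, encodeLong, decodeLong, and so on) are hypothetical.

```haskell
import Data.Bits (shiftL, shiftR, xor, (.&.), (.|.))
import Data.Int (Int64)
import Data.Word (Word64, Word8)

-- Hypothetical helpers; not part of the avro package API.

-- Zig-zag maps signed values so that small magnitudes stay small:
-- 0 -> 0, -1 -> 1, 1 -> 2, -2 -> 3, ...
-- (shiftR on Int64 is an arithmetic shift, so n `shiftR` 63 is 0 or -1.)
zigZag :: Int64 -> Word64
zigZag n = fromIntegral ((n `shiftL` 1) `xor` (n `shiftR` 63))

unZigZag :: Word64 -> Int64
unZigZag w = fromIntegral (w `shiftR` 1) `xor` negate (fromIntegral (w .&. 1))

-- Base-128 varint: 7 bits per byte, least significant group first,
-- high bit set on every byte except the last.
varint :: Word64 -> [Word8]
varint w
  | w < 0x80  = [fromIntegral w]
  | otherwise = fromIntegral ((w .&. 0x7f) .|. 0x80) : varint (w `shiftR` 7)

-- Reads one varint and returns it with the remaining bytes, or Nothing if
-- the input ends mid-number. (Overlong inputs are not rejected here.)
unVarint :: [Word8] -> Maybe (Word64, [Word8])
unVarint = go 0 0
  where
    go acc sh (b : rest)
      | b .&. 0x80 == 0 = Just (acc .|. (fromIntegral b `shiftL` sh), rest)
      | otherwise       = go (acc .|. (fromIntegral (b .&. 0x7f) `shiftL` sh)) (sh + 7) rest
    go _ _ [] = Nothing

encodeLong :: Int64 -> [Word8]
encodeLong = varint . zigZag

decodeLong :: [Word8] -> Maybe (Int64, [Word8])
decodeLong bs = do
  (w, rest) <- unVarint bs
  pure (unZigZag w, rest)
```

Because zig-zag maps a value to the same number regardless of integer width, an Int32 value encoded as Avro "int" produces the same bytes as the equal Int64 value encoded as "long"; for example, 64 becomes the two bytes 0x80 0x01.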

User-defined data types need HasAvroSchema, ToAvro, and FromAvro instances in order to be encoded to and decoded from Avro.

Defining types and HasAvroSchema / FromAvro / ToAvro manually

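These imports are typically useful (the language extension and the last three imports are needed by the Person example below):

{-# LANGUAGE OverloadedStrings #-}
import           Data.Avro
import           Data.Avro.Schema as S
import qualified Data.Avro.Types  as AT
import           Data.Int (Int32)
import           Data.List.NonEmpty (NonEmpty (..))
import           Data.Text (Text)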

Assume there is a data type to be encoded to and decoded from Avro:

data Gender = Male | Female deriving (Eq, Ord, Show, Enum)
data Person = Person
     { fullName :: Text
     , age      :: Int32
     , gender   :: Gender
     , ssn      :: Maybe Text
     } deriving (Show, Eq)

The Avro schema for this type can be defined as:

genderSchema :: Schema
genderSchema = mkEnum "Gender" [] Nothing Nothing ["Male", "Female"]

personSchema :: Schema
personSchema =
  Record "Person" Nothing [] Nothing Nothing
    [ fld "name"   String       Nothing
    , fld "age"    Int          Nothing
    , fld "gender" genderSchema Nothing
    , fld "ssn" (mkUnion $ Null :| [String]) Nothing
    ]
    where
     fld nm ty def = Field nm [] Nothing Nothing ty def

instance HasAvroSchema Person where
  schema = pure personSchema

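The ToAvro instance for Person can be defined as shown below. Note that encoding and decoding the "gender" field also requires HasAvroSchema, ToAvro, and FromAvro instances for Gender; these are not shown here.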

-- schema is provided by the HasAvroSchema instance above, not by ToAvro
instance ToAvro Person where
  toAvro p = record personSchema
             [ "name"   .= fullName p
             , "age"    .= age p
             , "gender" .= gender p
             , "ssn"    .= ssn p
             ]

The FromAvro instance for Person can be defined as:

instance FromAvro Person where
  fromAvro (AT.Record _ r) =
    Person <$> r .: "name"
           <*> r .: "age"
           <*> r .: "gender"
           <*> r .: "ssn"
  fromAvro r = badValue r "Person"
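To make the pattern behind the .= and .: operators in the two instances above concrete, here is a self-contained toy model that mirrors it without depending on this library. Everything in it (Value, ToValue, FromValue, record, and its own .= and .: operators) is hypothetical and much simpler than the library's real types: it has no schemas, omits the Gender field, and does not record which union branch was taken.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import           Data.Int (Int32)
import qualified Data.Map.Strict as M
import           Data.Text (Text)

-- Hypothetical, heavily simplified stand-in for an Avro value
-- (not this library's Data.Avro.Types.Value).
data Value
  = VNull
  | VInt Int32
  | VString Text
  | VRecord (M.Map Text Value)
  deriving (Show, Eq)

-- Hypothetical counterparts of ToAvro / FromAvro.
class ToValue a where
  toValue :: a -> Value

class FromValue a where
  fromValue :: Value -> Either String a

instance ToValue Int32 where
  toValue = VInt

instance FromValue Int32 where
  fromValue (VInt i) = Right i
  fromValue v        = Left ("expected int, got " ++ show v)

instance ToValue Text where
  toValue = VString

instance FromValue Text where
  fromValue (VString s) = Right s
  fromValue v           = Left ("expected string, got " ++ show v)

-- Simplification: a real ["null", "a"] union also encodes the branch index.
instance ToValue a => ToValue (Maybe a) where
  toValue = maybe VNull toValue

instance FromValue a => FromValue (Maybe a) where
  fromValue VNull = Right Nothing
  fromValue v     = Just <$> fromValue v

-- Pair a field name with its converted value (plays the role of .=).
(.=) :: ToValue a => Text -> a -> (Text, Value)
k .= v = (k, toValue v)

-- Look up a field and convert it, or fail with a message (plays the role of .:).
(.:) :: FromValue a => M.Map Text Value -> Text -> Either String a
r .: k = maybe (Left ("missing field " ++ show k)) fromValue (M.lookup k r)

record :: [(Text, Value)] -> Value
record = VRecord . M.fromList

-- Person without the Gender field, to keep the sketch short.
data Person = Person
  { fullName :: Text
  , age      :: Int32
  , ssn      :: Maybe Text
  } deriving (Show, Eq)

instance ToValue Person where
  toValue p = record
    [ "name" .= fullName p
    , "age"  .= age p
    , "ssn"  .= ssn p
    ]

-- Either's Applicative stops at the first missing or mistyped field.
instance FromValue Person where
  fromValue (VRecord r) =
    Person <$> r .: "name"
           <*> r .: "age"
           <*> r .: "ssn"
  fromValue v = Left ("expected Person record, got " ++ show v)
```

The design point the model shows is that each field lookup can fail independently, and the applicative chain combines those lookups into a single result for the whole record.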

Defining types and HasAvroSchema / FromAvro / ToAvro “automatically”

This library provides functionality to derive Haskell data types and HasAvroSchema/FromAvro/ToAvro instances “automatically” from already existing Avro schemas (using TemplateHaskell).

Examples

deriveAvro derives data types together with FromAvro and ToAvro instances from a provided Avro schema file:

{-# LANGUAGE TemplateHaskell #-}
{-# LANGUAGE DeriveGeneric   #-}
import Data.Avro.Deriving

deriveAvro "schemas/contract.avsc"

Similarly, deriveFromAvro can be used to only derive data types and FromAvro, but not ToAvro instances.

If you prefer to define the Avro schema in Haskell rather than in an .avsc file, deriveAvro' can be used instead of deriveAvro.

Conventions

When Haskell data types are generated, these conventions are followed:

  • Type and field names are “sanitized”: all characters except [a-z,A-Z,',_] are removed from names
  • Field names are prefixed with the name of the record they are declared in.

For example, if an Avro schema defines a Person record as:

{ "type": "record",
  "name": "Person",
  "fields": [
    { "name": "name", "type": "string"}
  ]
}

then the generated Haskell type will look like:

data Person = Person
     { personName :: Text
     } deriving (Show, Eq)
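The two naming conventions can be sketched as plain string functions. This is a hypothetical illustration, not the code the library uses: the helper names are made up, and the capitalization rule (lower-case the first letter of the record name, upper-case the first letter of the field name) is an assumption inferred from the personName example above.

```haskell
import Data.Char (isAsciiLower, isAsciiUpper, toLower, toUpper)

-- Hypothetical helpers mirroring the documented conventions.

-- Keep only the characters the convention allows: [a-zA-Z'_].
sanitize :: String -> String
sanitize = filter allowed
  where
    allowed c = isAsciiLower c || isAsciiUpper c || c == '\'' || c == '_'

lowerFirst, upperFirst :: String -> String
lowerFirst []       = []
lowerFirst (c : cs) = toLower c : cs
upperFirst []       = []
upperFirst (c : cs) = toUpper c : cs

-- Field name = sanitized record name (first letter lower-cased)
-- followed by sanitized field name (first letter upper-cased).
-- The casing rule is an assumption based on the Person example.
fieldName :: String -> String -> String
fieldName recordName field =
  lowerFirst (sanitize recordName) ++ upperFirst (sanitize field)
```

With these definitions, fieldName "Person" "name" yields "personName", matching the generated type above.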

Limitations

Two-part unions like ["null", "MyType"] or ["MyType", "YourType"] are supported (as Haskell's Maybe MyType and Either MyType YourType), but unions with more than two parts are currently not supported. This is not due to any fundamental problem; it simply has not been done yet. PRs are welcome! :)
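Two-part unions map naturally onto Maybe and Either because, per the Avro specification, a union value is written as the zero-based index of the chosen branch (encoded as an Avro long) followed by the value itself. The sketch below shows that layout for the ["null", "a"] and ["a", "b"] cases; the AvroBinary class and encodeLong are hypothetical names and not part of this library's API.

```haskell
import Data.Bits (shiftL, shiftR, xor, (.&.), (.|.))
import Data.Int (Int64)
import Data.Word (Word64, Word8)

-- Avro "long": zig-zag, then base-128 varint (low 7-bit group first).
encodeLong :: Int64 -> [Word8]
encodeLong n = varint (fromIntegral ((n `shiftL` 1) `xor` (n `shiftR` 63)))
  where
    varint :: Word64 -> [Word8]
    varint w
      | w < 0x80  = [fromIntegral w]
      | otherwise = fromIntegral ((w .&. 0x7f) .|. 0x80) : varint (w `shiftR` 7)

-- Hypothetical class for the binary encoding of a value.
class AvroBinary a where
  encodeBin :: a -> [Word8]

instance AvroBinary Int64 where
  encodeBin = encodeLong

-- ["null", "a"]: branch 0 is null (no payload), branch 1 carries the value.
instance AvroBinary a => AvroBinary (Maybe a) where
  encodeBin Nothing  = encodeLong 0
  encodeBin (Just x) = encodeLong 1 ++ encodeBin x

-- ["a", "b"]: branch 0 for Left, branch 1 for Right.
instance (AvroBinary a, AvroBinary b) => AvroBinary (Either a b) where
  encodeBin (Left x)  = encodeLong 0 ++ encodeBin x
  encodeBin (Right y) = encodeLong 1 ++ encodeBin y
```

For example, Just (64 :: Int64) is written as the branch index 1 (byte 0x02) followed by 64 (bytes 0x80 0x01). A union with more branches would use indices beyond 1, which neither Maybe nor Either can express on its own.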

TODO

Please see the TODO

Changes

Revision history for avro

HEAD

  • Replace entropy library with tf-random, which is easier to use with ghcjs

0.4.1.1

  • Fixed bugs in handling of namespaces when parsing and printing avro types
  • Fixed a schema overlay test

0.4.1.0

  • Fixed an omitted data fixture from the cabal sdist
  • Improvements on experimental lazy decoding (up to 25% faster on our tests)
  • Useful instances for EitherN

0.4.0.0

  • Technical release (major version bump) to reflect potentially breaking changes introduced earlier.

0.3.6.1

  • Fixed Data.Avro.Schema.extractBindings by @TikhonJelvis

0.1.0.0 – YYYY-mm-dd

  • First version. Released on an unsuspecting world.