Native Haskell implementation of Avro
This is a Haskell Avro library useful for decoding
and encoding Avro data structures. Avro can be thought of as a serialization
format and RPC specification which induces three separable tasks:
- Serialization/Deserialization - This library has been used “in anger” for:
  - Deserialization of Avro container files
  - Serialization/deserialization of Avro messages to/from Kafka topics
- RPC - There is currently no support for Avro RPC in this library.
This library also provides functionality for automatically generating Avro-related data types and instances from Avro schemas (using TemplateHaskell).
Quickstart
This library provides the following conversions between Haskell types and Avro types:
| Haskell type | Avro type |
|---|---|
| () | "null" |
| Bool | "boolean" |
| Int, Int64 | "long" |
| Int32 | "int" |
| Double | "double" |
| Text | "string" |
| ByteString | "bytes" |
| Maybe a | ["null", "a"] |
| Either a b | ["a", "b"] |
| Map Text a | {"type": "map", "values": "a"} |
| Map String a | {"type": "map", "values": "a"} |
| HashMap Text a | {"type": "map", "values": "a"} |
| HashMap String a | {"type": "map", "values": "a"} |
| [a] | {"type": "array", "items": "a"} |
User-defined data types should provide HasAvroSchema/ToAvro/FromAvro instances to be encoded/decoded to/from Avro.
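As a quick check of the built-in conversions listed above, a round trip through the binary encoding can be sketched as follows. This is only a sketch: it assumes that the version of the library described here exports encode, decode and Result from Data.Avro, and roundTrip is a hypothetical helper name, not part of the library.

```haskell
import           Data.Avro (FromAvro, Result (..), ToAvro, decode, encode)
import           Data.Int  (Int32)
import           Data.Text (Text, pack)

-- | Hypothetical helper: encode a value to Avro binary and decode it back.
-- Assumes 'encode' produces the bytes that 'decode' expects for the same type.
roundTrip :: (ToAvro a, FromAvro a) => a -> Result a
roundTrip = decode . encode

-- | An "int" value (Int32 on the Haskell side).
intExample :: Result Int32
intExample = roundTrip 42

-- | A ["null", "string"] union (Maybe Text on the Haskell side).
unionExample :: Result (Maybe Text)
unionExample = roundTrip (Just (pack "hello"))
```

Under those assumptions, both examples should come back as a Success wrapping the original value; an Error result would carry a description of what failed to decode.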
Defining types and HasAvroSchema / FromAvro / ToAvro manually
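Typically these imports are useful (the examples below also assume the OverloadedStrings extension for the string literals used as names):
import Data.Avro
import Data.Avro.Schema as S
import qualified Data.Avro.Types as AT
import Data.Int (Int32)
import Data.List.NonEmpty (NonEmpty (..))
import Data.Text (Text)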
Assuming there is a data type to be encoded to and decoded from Avro:
data Gender = Male | Female deriving (Eq, Ord, Show, Enum)
data Person = Person
  { fullName :: Text
  , age      :: Int32
  , gender   :: Gender
  , ssn      :: Maybe Text
  } deriving (Show, Eq)
An Avro schema for this type can be defined as:
genderSchema :: Schema
genderSchema = mkEnum "Gender" [] Nothing Nothing ["Male", "Female"]

personSchema :: Schema
personSchema =
  Record "Person" Nothing [] Nothing Nothing
    [ fld "name"   String Nothing
    , fld "age"    Int Nothing
    , fld "gender" genderSchema Nothing
    , fld "ssn"    (mkUnion $ Null :| [String]) Nothing
    ]
  where
    fld nm ty def = Field nm [] Nothing Nothing ty def

instance HasAvroSchema Person where
  schema = pure personSchema
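A ToAvro instance for Person can be defined as follows. The schema is already supplied by the HasAvroSchema instance, so it is not repeated here. The gender field additionally requires ToAvro and FromAvro instances for Gender, which are not shown.
instance ToAvro Person where
  toAvro p = record personSchema
    [ "name"   .= fullName p
    , "age"    .= age p
    , "gender" .= gender p
    , "ssn"    .= ssn p
    ]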
A FromAvro instance for Person can be defined as:
instance FromAvro Person where
  fromAvro (AT.Record _ r) =
    Person <$> r .: "name"
           <*> r .: "age"
           <*> r .: "gender"
           <*> r .: "ssn"
  fromAvro r = badValue r "Person"
Defining types and HasAvroSchema / FromAvro / ToAvro “automatically”
This library provides functionality to derive Haskell data types and HasAvroSchema/FromAvro/ToAvro instances “automatically” from already existing Avro schemas (using TemplateHaskell).
Examples
deriveAvro will derive data types, FromAvro and ToAvro instances from a provided Avro schema file:
{-# LANGUAGE TemplateHaskell #-}
{-# LANGUAGE DeriveGeneric #-}
import Data.Avro.Deriving
deriveAvro "schemas/contract.avsc"
Similarly, deriveFromAvro can be used to derive only data types and FromAvro instances, but not ToAvro instances.
If you prefer defining the Avro schema in Haskell rather than in an avsc file, then deriveAvro' can be used instead of deriveAvro.
Conventions
When Haskell data types are generated, these conventions are followed:
- Type and field names are “sanitized”: all the characters except [a-z,A-Z,',_] are removed from names.
- Field names are prefixed with the name of the record they are declared in.

For example, if an Avro schema defines the Person record as:
{ "type": "record",
  "name": "Person",
  "fields": [
    { "name": "name", "type": "string" }
  ]
}
then the generated Haskell type will look like:
data Person = Person
  { personName :: Text
  } deriving (Show, Eq)
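The two naming conventions above can be sketched in plain Haskell. This is an illustration of the rules as stated in this section, not the library's actual implementation: sanitise and fieldName are hypothetical names, and the capitalisation step is inferred from the personName example.

```haskell
import Data.Char (isAsciiLower, isAsciiUpper, toLower, toUpper)

-- | Keep only ASCII letters, the apostrophe and the underscore,
-- dropping every other character (the rule as stated in this README).
sanitise :: String -> String
sanitise = filter allowed
  where
    allowed c = isAsciiLower c || isAsciiUpper c || c == '\'' || c == '_'

-- | Hypothetical helper: prefix a field name with its record name,
-- lower-casing the first letter of the record name and upper-casing
-- the first letter of the field name (assumed from the example above).
fieldName :: String -> String -> String
fieldName recordName field =
  lowerFirst (sanitise recordName) ++ upperFirst (sanitise field)
  where
    lowerFirst (c:cs) = toLower c : cs
    lowerFirst []     = []
    upperFirst (c:cs) = toUpper c : cs
    upperFirst []     = []
```

With these definitions, fieldName "Person" "name" evaluates to "personName", matching the generated record field shown above, and sanitise "first-name" evaluates to "firstname".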
Limitations
Two-part unions like ["null", "MyType"] or ["MyType", "YourType"] are supported (as Haskell’s Maybe MyType and Either MyType YourType), but unions with more than two parts are currently not supported.
This is not due to any fundamental problem; it simply has not been done yet. PRs are welcome! :)
TODO
Please see the TODO