I’m a big fan of Avro. Why?
- It has a direct mapping to and from JSON.
- It has a very compact format. The bulk of JSON — repeating every field name with every single record — is what makes JSON inefficient for high-volume use.
- It is very fast.
- It has great bindings for a wide variety of programming languages so you can generate Java objects that make working with event data easier.
- It does not require code generation so tools can be written generically for any data stream.
- It has a rich, extensible schema language defined in pure JSON.
- It has the best notion of compatibility for evolving your data over time.
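To make the points above concrete, here is a small hand-written Avro schema; the record and field names are invented for the example. Note that the schema itself is plain JSON, field names appear once here rather than in every record, and the nullable `referrer` field with a default shows how the schema can evolve compatibly:

```json
{
  "type": "record",
  "name": "Click",
  "namespace": "com.example.events",
  "fields": [
    {"name": "page", "type": "string"},
    {"name": "userId", "type": "long"},
    {"name": "referrer", "type": ["null", "string"], "default": null}
  ]
}
```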
Confluent, of whom I’m also a big fan, thinks so too and has built a schema registry to integrate Kafka with Avro.
However, I’ve recently been working with Scala. I’m no expert or advanced user, but I can use it to get things done. Among the things I like are case classes and pattern matching. These don’t sit well with the Java classes that come from Avro’s generic API, or with the classes generated by the sbt and Maven plugins or by avro-tools. Using runtime reflection on case classes or regular Scala classes to derive the schema is another option, but it is awkward and carries a runtime cost.
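To show why case classes and pattern matching are so pleasant for event data, here is a small, self-contained sketch (the event types are invented for the example):

```scala
// Events modeled as Scala case classes: immutable, with equals/hashCode,
// and usable in pattern matches - things Avro's generated Java classes
// don't give you out of the box.
sealed trait Event
case class Click(page: String, userId: Long) extends Event
case class Purchase(sku: String, amountCents: Long) extends Event

// Pattern matching over the sealed hierarchy: the compiler warns if a
// case is missing, which generated Java classes can't offer.
def describe(e: Event): String = e match {
  case Click(page, user)    => s"user $user clicked $page"
  case Purchase(sku, cents) => s"sold $sku for $cents cents"
}
```

The `sealed` modifier is what buys the exhaustiveness check: all subtypes must live in the same file, so the compiler knows the full set of cases.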
Additionally, I’ve been working with inbound JSON and converting it to Avro, this is relatively straightforward and I used Kite to make it simpler.
You can use this in a Flume interceptor for example.
I discussed this with some Scala experts and they put together the following:
https://github.com/51zero/avro4s by Stephen Samuel
Avro4s is a schema/class generation and serializing/deserializing library for Avro written in Scala. The objective is to allow seamless use with Scala without the need to write boilerplate conversions yourself, and without the runtime overhead of reflection.
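The way avro4s avoids reflection is the typeclass pattern: schema knowledge lives in implicit instances that the compiler resolves at compile time. Here is a minimal sketch of that technique in plain Scala; the names `SchemaFor` and `schemaOf` are illustrative of the idea, not avro4s’s actual API:

```scala
// A typeclass: a SchemaFor[T] instance knows the Avro schema for T.
trait SchemaFor[T] { def schema: String }

object SchemaFor {
  // Instances for primitive types, found by implicit search at
  // compile time - no runtime reflection involved.
  implicit val stringSchema: SchemaFor[String] =
    new SchemaFor[String] { def schema = "\"string\"" }
  implicit val longSchema: SchemaFor[Long] =
    new SchemaFor[Long] { def schema = "\"long\"" }
}

// The compiler supplies the right instance for T; a missing instance
// is a compile error, not a runtime failure.
def schemaOf[T](implicit sf: SchemaFor[T]): String = sf.schema
```

Avro4s takes this further by deriving instances for whole case classes automatically, so a `Click(page: String, userId: Long)` gets a full record schema without any hand-written instance.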
And it is used at http://www.landoop.com by Antonios Chalkiopoulos.