Avro is a row-oriented remote procedure call and data serialization framework developed within Apache's Hadoop project. It uses JSON for defining data types and protocols, and serializes data in a compact binary format. Its primary use is in Apache Hadoop, where it can provide both a serialization format for persistent data and a wire format for communication between Hadoop nodes, and from client programs to the Hadoop services. Avro uses a schema to structure the data that is being encoded. It has two schema languages: one intended for human editing (Avro IDL) and another, more machine-readable, based on JSON. It is similar to Thrift and Protocol Buffers, but does not require running a code-generation program when a schema changes. Apache Spark SQL can access Avro as a data source.
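As a rough illustration of the Spark SQL integration mentioned above, an Avro container file can be loaded through Spark's external spark-avro module; this is only a sketch, and the file name and package coordinates are illustrative assumptions rather than details from the original text:

from pyspark.sql import SparkSession

# spark-avro is an external module and must be on the classpath, e.g.
#   spark-submit --packages org.apache.spark:spark-avro_2.12:<spark version> ...
spark = SparkSession.builder.appName("avro-example").getOrCreate()

# Load an Avro object container file (such as the users.avro file built
# later in this article) into a DataFrame and display it.
df = spark.read.format("avro").load("users.avro")
df.show()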
Avro object container file
An Avro object container file consists of a file header followed by one or more blocks of serialized data. The file header consists of the following (a minimal Python check of these bytes is sketched after the list):
Four bytes, ASCII 'O', 'b', 'j', followed by the Avro version number, which is 1 (0x01).
File metadata, including the schema definition.
The 16-byte, randomly generated sync marker for this file.
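A minimal sketch of such a check (the file name is taken from the example later in this article; this code is illustrative, not from the Avro documentation):

# An Avro object container file starts with a 4-byte magic:
# ASCII 'O', 'b', 'j' followed by the format version byte 1.
with open("users.avro", "rb") as f:
    magic = f.read(4)
    assert magic == b"Obj\x01", "not an Avro object container file"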
For data blocks Avro specifies two serialization encodings: binary and JSON. Most applications will use the binary encoding, as it is smaller and faster. For debugging and web-based applications, the JSON encoding may sometimes be appropriate.
Schema definition
Avro schemas are defined using JSON. Schemas are composed of primitive types (null, boolean, int, long, float, double, bytes, and string) and complex types (record, enum, array, map, union, and fixed). Simple schema example:
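The following sketch shows what such a schema could look like for the User records used in the serialization example below; the namespace and field names are assumptions inferred from that example rather than quoted from the Avro documentation:

{
   "namespace": "example.avro",
   "type": "record",
   "name": "User",
   "fields": [
      {"name": "name", "type": "string"},
      {"name": "favorite_number", "type": ["int", "null"]},
      {"name": "favorite_color", "type": ["string", "null"]}
   ]
}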
Serializing and deserializing
When Avro data is written to an object container file, the corresponding schema is stored with it, so a serialized item can be read back later without knowing the schema ahead of time.
Example serialization and deserialization code in Python
Serialization:

import avro.schema
from avro.datafile import DataFileReader, DataFileWriter
from avro.io import DatumReader, DatumWriter

# Need to know the schema to write; avro.schema.Parse is the Python 3 API
# as of Apache Avro 1.8.2 (the schema file name here is assumed).
schema = avro.schema.Parse(open("user.avsc", "rb").read())

writer = DataFileWriter(open("users.avro", "wb"), DatumWriter(), schema)
writer.append({"name": "Alyssa", "favorite_number": 256})
writer.append({"name": "Ben", "favorite_number": 7, "favorite_color": "red"})
writer.close()
File "users.avro" will contain the schema in JSON and a compact binary representation of the data: $ od -v -t x1z users.avro 0000000 4f 62 6a 01 04 14 61 76 72 6f 2e 63 6f 64 65 63 >Obj...avro.codec< 0000020 08 6e 75 6c 6c 16 61 76 72 6f 2e 73 63 68 65 6d >.null.avro.schem< 0000040 61 ba 03 7b 22 74 79 70 65 22 3a 20 22 72 65 63 >a..< 0000400 00 05 f9 a3 80 98 47 54 62 bf 68 95 a2 ab 42 ef >......GTb.h...B.< 0000420 24 04 2c 0c 41 6c 79 73 73 61 00 80 04 02 06 42 >$.,.Alyssa.....B< 0000440 65 6e 00 0e 00 06 72 65 64 05 f9 a3 80 98 47 54 >en....red.....GT< 0000460 62 bf 68 95 a2 ab 42 ef 24 >b.h...B.$< 0000471
Deserialization:

# The schema is embedded in the data file, so no schema is needed to read it.
reader = DataFileReader(open("users.avro", "rb"), DatumReader())
for user in reader:
    print(user)
reader.close()
This outputs:
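The exact dict formatting depends on the Avro and Python versions, but for the two records written above the output is roughly:

{'name': 'Alyssa', 'favorite_number': 256, 'favorite_color': None}
{'name': 'Ben', 'favorite_number': 7, 'favorite_color': 'red'}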
Languages with APIs
Though theoretically any language could use Avro, the following languages have APIs written for them:
C
C++
C#
Go
Haskell
Java
JavaScript
Perl
PHP
Python
Ruby
Rust
Scala
Avro IDL
In addition to supporting JSON for type and protocol definitions, Avro includes experimental support for an alternative interface description language syntax known as Avro IDL. Previously known as GenAvro, this format is designed to ease adoption by users familiar with more traditional IDLs and programming languages, with a syntax similar to C/C++, Protocol Buffers and others.
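As an illustrative sketch (not taken from the Avro documentation), the User record from the earlier schema example might be written in Avro IDL roughly as follows; the protocol name and namespace are assumptions:

@namespace("example.avro")
protocol UserProtocol {
  record User {
    string name;
    union { int, null } favorite_number;
    union { string, null } favorite_color;
  }
}

Avro IDL files conventionally use the .avdl extension and are translated into the equivalent JSON protocol definitions by the Avro tooling.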