Apache Parquet

Apache Parquet is an open-source columnar data format developed under the Apache Software Foundation.

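Because Parquet is a columnar format, it stores the values of each column together, and this makes it highly compressible: values within one column share a type and often repeat. It also supports several data types natively (integers, floating-point numbers, timestamps, binary data and more), which makes it far more robust than text-only formats such as CSV, where every value is stored as text and must be parsed back.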

Parquet is a good choice of intermediate storage format in a data lake, before the data is brought into Big Data systems. Snappy compression is supported by the format and is a common default, but one may choose Gzip instead for better compression, at the cost of slower compression and decompression.

One should always prefer Parquet over CSV as intermediate storage for data loads. The resulting files are far smaller, and BLOB data does not cause the havoc it may cause in CSV, where delimiters, quotes and line breaks embedded in the data must be escaped correctly.
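The CSV problems with BLOB data and with types can be reproduced with Python's standard library alone. The sketch below writes one hypothetical record to CSV, shows that a naive line-based reader splits it at the embedded newline and that even a correct CSV parser returns only strings, and then encodes the binary value the way Parquet's PLAIN encoding stores a BYTE_ARRAY value: a 4-byte little-endian length followed by the raw bytes. It illustrates only that one encoding detail; the helper names are hypothetical, and writing real Parquet files is normally left to a library.

```python
import csv
import io
import struct

# A hypothetical record: an integer, a float and a binary payload (BLOB)
# that happens to contain a comma, double quotes and a newline.
record = (42, 3.5, b'raw,"bytes"\nhere')

# CSV: every value becomes text. Bytes have no text form in CSV, so they are
# decoded as Latin-1 here purely for illustration.
buf = io.StringIO()
csv.writer(buf).writerow([record[0], record[1], record[2].decode("latin-1")])
text = buf.getvalue()

# A naive line-based reader splits the single record at the embedded newline.
naive_lines = text.splitlines()

# Even a proper CSV parser returns strings, not int/float/bytes.
parsed = next(csv.reader(io.StringIO(text)))


# Length-prefixed encoding, as used for BYTE_ARRAY values in Parquet's PLAIN
# encoding: no escaping is needed, whatever bytes the value contains.
def encode_byte_array(value: bytes) -> bytes:
    return struct.pack("<I", len(value)) + value


def decode_byte_array(data: bytes, offset: int = 0) -> tuple[bytes, int]:
    (length,) = struct.unpack_from("<I", data, offset)
    start = offset + 4
    return data[start:start + length], start + length


encoded = encode_byte_array(record[2])
decoded, _ = decode_byte_array(encoded)

print(len(naive_lines))      # 2: one record, two "lines"
print(parsed)                # ['42', '3.5', 'raw,"bytes"\nhere']
print(decoded == record[2])  # True
```

The length prefix is what lets a reader skip or copy a value without inspecting its contents, which is why binary data round-trips unchanged.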

Data types we support

Integral: bigint, int
Decimal: decimal, double, float
Text: ntext
Binary: binary (graphic), varbinary (varbin, binary varying)
Date/Time: timestamp
Large objects: byte array, ntext
Other: boolean, timespan