Data Lakes and Warehouses

Event Transactions

Transactions (everyday events stored in rows as they occur) are similar to how you would naturally store data in Excel. However, analysis is typically performed over long time periods, and it would be very inefficient to traverse millions or even billions of rows for each analytical query.

How does loading data into Data Warehouses work?

Data warehouses use columnar storage (storing data column by column), which allows them exceptionally strong compression and quick data aggregation.

However, this means that loading data into data warehouses require a different process than standard row-by-row insertion coming from relational databases optimized for transaction processing. One should prepare the data in a Blob storage and instruct the warehouse to ingest those files. The warehouse does so in a massively parallel manner.

Such files must be of an optimal size, usually in the range of 100 to 200 megabytes each, which means that data from a single large table will be spread over many files.

Preparing Data

One can prepare data in several different formats depending on what the data warehouse supports. Typically, it can be CSV, Parquet, Orc, or Avro. Omni Loader supports CSV and Parquet file formats in both uncompressed and compressed variants. Specifically for CSV, we support GZIP compression, while for Parquet, we support both GZIP compression and Snappy.

Migrating row-based data from the source is a multi-step process. Data has to be read sequentially, data types mapped into the types supported by the target database (and the intermediate file format), row-based structure transformed into columnar-based one, compressed, uploaded to the data lake, and only then can the target warehouse be notified there is data to ingest.

Omni Loader hides the complexity of that process so loading the data into a data warehouse looks as if one simply copies the data over from the standard OLTP source to the standard OLTP target. That is exactly what one wants - a boring and predictable copying process.

FAQ

Frequently asked questions

By clicking “Accept All Cookies”, you agree to the storing of cookies on your device to enhance site navigation, analyze site usage, and assist in our marketing efforts. View our Privacy Policy for more information.

Preferences Deny Accept

Data Lakes and Warehouses

Event Transactions

How does loading data into Data Warehouses work?

Preparing Data

Lorem ipsum dolor sit amet, consectetur adipiscing elit. Mauris imperdiet egestas blandit. Suspendisse rhoncus congue lorem ut malesuada.

Frequently asked questions

TECHNOLOGY

RESOURCES

LEGAL