Data Lakes and Warehouses

Relational databases are typically used for two distinct purposes: transaction processing and analysis. Because these are very different use cases, the optimal underlying architecture varies significantly between the two.


Event Transactions

Transaction processing stores everyday events as individual rows in real time, much like logging data in Excel. This structure becomes inefficient for analysis: examining such data over long periods would require processing millions of records for each query.
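To make the contrast concrete, here is a minimal sketch, using SQLite purely for illustration (the table and column names are hypothetical), of how the same row-based table serves both workloads:

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (id INTEGER, amount REAL, created_at TEXT)")

    # Transaction processing: each everyday event lands as one row, in real time.
    conn.execute("INSERT INTO events VALUES (1, 19.99, '2024-01-15T10:31:00')")

    # Analysis: even a single aggregate over a long period must scan every row,
    # which is where row-by-row storage becomes the bottleneck.
    total, = conn.execute(
        "SELECT SUM(amount) FROM events WHERE created_at >= '2023-01-01'"
    ).fetchone()
    print(total)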

How does loading data into Data Warehouses work?

Unlike transaction databases, which store data row by row, data warehouses use columnar storage, organizing data column by column. This structure enables exceptional compression and fast aggregation, but it also requires a different loading approach.
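A toy sketch of the two layouts (plain Python, hypothetical sample data) shows why: once a column's values sit next to each other, repetition compresses extremely well, and an aggregate only has to read the one column it needs.

    import json, zlib

    rows = [{"country": "US", "amount": i % 100} for i in range(10_000)]

    # Row by row: values from different columns interleave.
    row_layout = json.dumps(rows).encode()

    # Column by column: each column is one contiguous run of values.
    country = [r["country"] for r in rows]
    amount = [r["amount"] for r in rows]
    col_layout = (json.dumps(country) + json.dumps(amount)).encode()

    # Same data, but the columnar layout compresses far better, because
    # runs of identical values ("US", "US", ...) are easy to encode.
    print(len(zlib.compress(row_layout)), len(zlib.compress(col_layout)))

    # Aggregation touches only the column it needs:
    print(sum(amount))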

Instead of inserting individual rows, data warehouses work with prepared data files stored in blob storage. The warehouse ingests these files using massively parallel processing for maximum speed.
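What triggering that ingestion looks like varies by product. As one hedged example, assuming a Snowflake warehouse with an external stage already pointing at the blob container (all names and credentials below are hypothetical), a single COPY INTO statement kicks off the parallel load:

    import snowflake.connector

    conn = snowflake.connector.connect(
        account="my_account", user="loader", password="***",
        warehouse="LOAD_WH", database="ANALYTICS", schema="PUBLIC",
    )
    # One statement; the warehouse itself fans the staged files out
    # across its compute nodes and ingests them in parallel.
    conn.cursor().execute("""
        COPY INTO sales
        FROM @landing_stage/sales/
        FILE_FORMAT = (TYPE = CSV COMPRESSION = GZIP)
    """)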

These files must be optimally sized, typically 100 to 200 megabytes each, which means data from a single large table is distributed across multiple files so they can be processed efficiently in parallel.
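Here is a sketch of that splitting step, assuming pyarrow and a source reader that yields RecordBatches; the 150 MB target and file naming are illustrative, and in-memory batch size is used as a rough proxy for on-disk file size:

    import pyarrow as pa
    import pyarrow.parquet as pq

    TARGET_BYTES = 150 * 1024 * 1024  # aim inside the 100-200 MB guideline

    def write_chunks(batches, prefix):
        part, buffered, size = 0, [], 0
        for batch in batches:            # batches: iterable of pyarrow.RecordBatch
            buffered.append(batch)
            size += batch.nbytes
            if size >= TARGET_BYTES:     # flush once one file's worth accumulates
                pq.write_table(pa.Table.from_batches(buffered),
                               f"{prefix}-{part:05d}.parquet")
                part, buffered, size = part + 1, [], 0
        if buffered:                     # final, possibly smaller, file
            pq.write_table(pa.Table.from_batches(buffered),
                           f"{prefix}-{part:05d}.parquet")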

Preparing Data

Data warehouses accept files in various formats — typically CSV, Parquet, ORC, or Avro. Omni Loader supports CSV and Parquet formats, both uncompressed and compressed (GZIP for CSV; GZIP and Snappy for Parquet).
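Producing those format/compression combinations is straightforward with off-the-shelf tooling; for instance, with pandas (the DataFrame below is hypothetical sample data):

    import pandas as pd

    df = pd.DataFrame({"id": [1, 2, 3], "amount": [19.99, 5.00, 42.50]})

    df.to_csv("orders.csv.gz", index=False, compression="gzip")    # GZIP-compressed CSV
    df.to_parquet("orders.snappy.parquet", compression="snappy")   # Snappy Parquet
    df.to_parquet("orders.gzip.parquet", compression="gzip")       # GZIP Parquet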

Converting row-based source data for warehouse ingestion involves multiple complex steps: reading data sequentially, mapping data types, transforming from row-based to columnar structure, compressing files, uploading to data lake storage, and finally notifying the warehouse to begin ingestion.
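Strung together, the steps look roughly like the sketch below. Every name here is hypothetical: read_batches stands in for a reader that yields already-serialized, file-sized payloads (covering the type mapping and columnar conversion from the earlier sketches), and copy_into_warehouse stands in for the warehouse's ingest command. A production pipeline also needs retries and cleanup.

    import gzip
    from azure.storage.blob import BlobServiceClient  # assuming Azure Blob Storage

    def stage_and_ingest(conn_str, container, read_batches, copy_into_warehouse):
        service = BlobServiceClient.from_connection_string(conn_str)
        staged = []
        for i, file_bytes in enumerate(read_batches()):   # read + transform upstream
            payload = gzip.compress(file_bytes)           # compress each file
            name = f"stage/part-{i:05d}.csv.gz"
            service.get_blob_client(container, name).upload_blob(payload)  # upload
            staged.append(name)
        copy_into_warehouse(staged)                       # tell the warehouse to ingest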

Omni Loader automates this entire process. What looks like a simple data copy between databases actually takes care of all of this complexity behind the scenes, giving you a straightforward, reliable migration experience.

An optimized process

Omni Loader reads the data, transforms it into a columnar format, compresses it, uploads it to blob storage, and then has the data warehouse ingest the files as they become ready, all without requiring you to set up any ETL pipelines.
