What is a Microbatch?
Microbatch is an innovative incremental strategy designed for large time-series datasets. Introduced in dbt Core version 1.9 (currently in beta), it complements existing incremental strategies by offering a structured and efficient way to process data in batches.
Key features of Microbatch include:
- Utilizes a time column to define batch ranges.
- Supports reprocessing failed batches.
- Auto-detects whether batches can be processed in parallel.
- Eliminates complex conditional logic for backfilling.
However, it is not suitable for datasets that lack a reliable time column, or for models that require fine-grained control over the processing logic.
How Microbatches Work
Microbatching works by splitting model processing into multiple queries (batches) based on:
- event_time: The time column defining batch ranges.
- batch_size: The time period for each batch (hour, day [default], month, year).
Each batch functions as an independent, atomic unit, meaning:
- Batches can be processed, retried, or replaced individually.
- Parallel execution enables separate, idempotent batch processing.
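Conceptually, for a model with a daily batch_size, dbt wraps the model's query in a time filter for each batch. A rough sketch of the query one batch might produce (the source table analytics.stg_events and the column event_occurred_at are illustrative, not from the article):

select *
from analytics.stg_events
where event_occurred_at >= '2025-02-01 00:00:00'  -- batch start (inclusive)
  and event_occurred_at <  '2025-02-02 00:00:00'  -- batch end (exclusive)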
Batch replacement strategies vary by database adapter:
- Postgres: uses merge.
- BigQuery, Spark: use insert_overwrite.
- Databricks: uses replace_where.
- Redshift, Snowflake: use delete + insert.
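As a rough illustration of the delete + insert pattern on Redshift or Snowflake, replacing one daily batch can be pictured as follows. The statements are generated by dbt; the table analytics.fct_events, the batch relation name, and the column event_occurred_at are illustrative:

delete from analytics.fct_events
where event_occurred_at >= '2025-02-01' and event_occurred_at < '2025-02-02';

insert into analytics.fct_events
select * from analytics.fct_events__batch_20250201;  -- relation built from that batch's filtered query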
Microbatch Model Configurations
When setting up a Microbatch model, the following configurations are required:
- event_time: Specifies the time column in UTC.
- batch_size: Defines batch granularity (hour, day, month, year).
- begin: Sets the start point for initial or full-refresh builds.
Optional configurations include:
- lookback: Processes prior batches for late-arriving records.
- concurrent_batches: Controls parallel execution (auto-detected by default).
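Putting these options together, a minimal microbatch model might look like the following sketch (the source stg_events and the column event_occurred_at are illustrative names, not from the article):

{{
    config(
        materialized='incremental',
        incremental_strategy='microbatch',
        event_time='event_occurred_at',
        batch_size='day',
        begin='2024-01-01',
        lookback=3
    )
}}

select
    event_id,
    user_id,
    event_occurred_at
from {{ ref('stg_events') }}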
Running Batches in Parallel
Parallel execution is automatically detected based on batch conditions and adapter support, but users can override this behavior with the concurrent_batches setting (a configuration sketch follows the list below).
Parallel execution is possible when:
- The batch is neither the first nor last in the sequence.
- The database adapter supports parallel execution.
- The model logic does not depend on execution order.
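If the model's logic does depend on batches being built in order, parallelism can be switched off explicitly. A minimal sketch, again with an illustrative event_occurred_at column:

{{
    config(
        materialized='incremental',
        incremental_strategy='microbatch',
        event_time='event_occurred_at',
        batch_size='day',
        begin='2024-01-01',
        concurrent_batches=false
    )
}}

Setting concurrent_batches to true instead forces parallel processing where the adapter allows it.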
How to Backload Data
Backloading allows reprocessing historical data within a specific time range using the following command:
dbt run --event-time-start "2025-02-01" --event-time-end "2025-02-03"
This ensures that only batches within the defined range are processed independently.
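In practice, the time window is usually combined with a node selector so that only the affected model is rebuilt; fct_events below is an illustrative model name:

dbt run --select fct_events --event-time-start "2025-02-01" --event-time-end "2025-02-03"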
Microbatch vs. Other Incremental Strategies
Microbatch differs from traditional incremental strategies by:
- Using independent queries for time-based batches.
- Eliminating the need for is_incremental() and complex SQL logic.
- Automatically selecting the most efficient operation (insert, update, replace) for each platform.
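For contrast, a traditional incremental model typically embeds the conditional filter itself. A simplified sketch of that older pattern (table and column names illustrative):

select
    event_id,
    user_id,
    event_occurred_at
from {{ ref('stg_events') }}
{% if is_incremental() %}
where event_occurred_at > (select max(event_occurred_at) from {{ this }})
{% endif %}

With Microbatch, this conditional block disappears because dbt applies the time filter to each batch automatically.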
Conclusion
Microbatch is a powerful new approach to incremental data processing in dbt Core. By breaking down large datasets into manageable, parallelizable chunks, it simplifies data modeling while improving efficiency and scalability. However, it is essential to consider whether Microbatch suits your data pipeline’s requirements before implementing it.
Watch the Video
Meet the Speaker

Dmytro Polishchuk
Senior BI Consultant
Dmytro Polishchuk has seven years of experience in business intelligence and works as a Senior BI Consultant for Scalefree. He is a proven Data Vault 2.0 expert with excellent knowledge of various (cloud) architectures, data modeling, and the implementation of automation frameworks. He excels in team integration and structured project work, and holds a bachelor's degree in Finance and Financial Management.