Snapshots in dbt: A Quick Overview
dbt snapshots allow you to “look back” at historical data by capturing changes in your database tables. This is achieved by implementing type-2 Slowly Changing Dimensions (SCDs), which track how a row has changed over time. For example, you can keep track of an order’s status as it moves from ‘pending’ to ‘shipped’ to ‘delivered’.
In diesem Artikel:
How Snapshots Work
When you run dbt snapshot
, it creates a new table with the same columns as your source table, plus additional metadata columns like dbt_valid_from
und dbt_valid_to
. On subsequent runs, dbt updates the dbt_valid_to
column for changed records and adds new records with dbt_valid_to
set to null.
dbt offers different strategies for detecting changes, including comparing timestamps, specific column values, or a unique key. You can also configure dbt to track deletions.
Snapshots Best Practices
- Store snapshots in a separate schema to keep your historical data organized.
- Snapshot raw data using the
source
function and select all columns. - Avoid joins in your snapshot queries; instead, create separate snapshots and join them downstream.
- If you need to perform transformations, do so in ephemeral models before snapshotting.
- Schedule snapshots to run frequently to capture changes regularly.
By following these best practices, you can ensure that your dbt snapshots are accurate, efficient, and easy to maintain.
Das Video ansehen
Treffen mit dem Sprecher
Dmytro Polischtschuk
Senior BI-Berater
Dmytro Polishchuk has 7 years of experience in business intelligence and works as a Senior BI Consultant for Scalefree. Dmytro is a proven Data Vault 2.0 expert and has excellent knowledge of various (cloud) architectures, data modeling, and the implementation of automation frameworks. Dmytro excels in team integration and structured project work. Dmytro has a bachelor’s degree in Finance and Financial Management.